Age | Commit message (Collapse) | Author |
|
commit bbe77c14ee6185a61b [net/sched: Retire dsmark qdisc] upstream
has removed CONFIG_NET_SCH_DSMARK so we drop it from our fragments
as well.
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
commit 051d442098421c28c7 [net/sched: Retire CBQ qdisc] removes
CONFIG_NET_SCH_CBQ from the tree, so we drop it from our fragments.
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Unfortunately linux-stable backported this:
Subject: ima: Remove deprecated IMA_TRUSTED_KEYRING Kconfig
From: Nayna Jain <nayna@linux.ibm.com>
[ Upstream commit 5087fd9e80e539d2163accd045b73da64de7de95 ]
Time to remove "IMA_TRUSTED_KEYRING".
...to all releases still being maintained.
stable-queue$git grep -l 5087fd9e80e539
releases/5.10.195/ima-remove-deprecated-ima_trusted_keyring-kconfig.patch
releases/5.15.132/ima-remove-deprecated-ima_trusted_keyring-kconfig.patch
releases/5.4.257/ima-remove-deprecated-ima_trusted_keyring-kconfig.patch
releases/6.1.53/ima-remove-deprecated-ima_trusted_keyring-kconfig.patch
releases/6.4.16/ima-remove-deprecated-ima_trusted_keyring-kconfig.patch
releases/6.5.3/ima-remove-deprecated-ima_trusted_keyring-kconfig.patch
So now when someone uses the feature, it triggers a do_kernel_configcheck
warning when the audit runs.
We added this file way back in 2019 so this fix will be needed on all
active branches that are using an LTS linux-stable kernel listed above.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
xt_NOTRACK was removed in linux-stable commit
965505015beccc4ec900798070165875b8e8dccf
and has been superseded by xt_CT.
Signed-off-by: Eero Aaltonen <eero.aaltonen@vaisala.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Since Linux 67951f6380732 ("[SCSI] Fix dependency problems in SCSI drivers"),
SCSI_DEBUG is protected by 'if SCSI_LOWLEVEL && SCSI' conditional, so select
SCSI_LOWLEVEL too.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Bruce Ashfield <bruce.ashfield@gmail.com>
Cc: Mariano López <just.another.mariano@gmail.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
[INFO]: Fragments with badly formatted configuration options:
- fragment configs/v5.10/standard/features/ima/ima.cfg has the following issues: #CONFIG_INTEGRITY_SIGNATURE=y
- fragment configs/v5.10/standard/features/ima/ima.cfg has the following issues: #CONFIG_INTEGRITY_ASYMMETRIC_KEYS=y
- fragment configs/v5.10/standard/features/ima/ima.cfg has the following issues: #CONFIG_INTEGRITY_TRUSTED_KEYRING=y
Signed-off-by: Armin Kuster <akuster808@gmail.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This use to enable the below option to support Bosch M_CAN controller
on Elkhart Lake:
CONFIG_CAN=m
CONFIG_CAN_M_CAN=m
CONFIG_CAN_M_CAN_PCI=m
CONFIG_CAN_M_CAN_PLATFORM=m
CONFIG_CAN_M_CAN_TCAN4X5X=m
Signed-off-by: Liwei Song <liwei.song@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This will enable CONFIG_DPTF_PCH_FIVR for intel-x86 bsp to support
Dynamic Platform and Thermal Framework PCH FIVR Participant device.
Signed-off-by: Liwei Song <liwei.song@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This will enable INTEL_UNCORE_FREQ_CONTROL for intel-x86 bsp
Signed-off-by: Liwei Song <liwei.song@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This will enable CONFIG_INTEL_IDXD for intel-x86 bsp.
Signed-off-by: Liwei Song <liwei.song@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This is 'y' selected by other options, so we can't force it to
=m.
We could drop the value completely, but having it around shows
our explicit intent to use the option.
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This option is m selected by our dependencies, so lets align with
them and avoid a warning.
We keep the option, so that if it changes in the future, we'll get
the warning again.
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Fixing the following warning, we can't set it to 'y', when it is 'm'
selected otherwise:
[NOTE]: 'CONFIG_DRM_KMS_HELPER' last val (y) and .config val (m) do not match
[INFO]: CONFIG_DRM_KMS_HELPER : m ## .config: 2331 :configs/v5.10/standard/tiny/features/i915/i915.cfg (y)
[INFO]: raw config text:
config DRM_KMS_HELPER
tristate
depends on DRM && HAS_IOMEM
help
CRTC helpers for KMS drivers.
Config 'DRM_KMS_HELPER' has the following Direct dependencies (DRM_KMS_HELPER=y):
DRM(=y) && HAS_IOMEM(=y)
Parent dependencies are:
DRM [y] HAS_IOMEM [y]
[INFO]: config 'CONFIG_DRM_KMS_HELPER' was set, but it wasn't assignable, check (parent) dependencies
[INFO]: selection details for 'CONFIG_DRM_KMS_HELPER':
Symbols currently m-selecting this symbol:
- DRM_I915
- DRM_CIRRUS_QEMU
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This patch fixes:
Warning: Value of CONFIG_NFT_REJECT_IPV6 is defined multiple times
within fragment ... features/nf_tables/nf_tables.cfg:
CONFIG_NFT_REJECT_IPV6=m
CONFIG_NFT_REJECT_IPV6=m
Signed-off-by: Sam Zeter <szeter@digi.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
MEDIA_SUPPORT is set by default to m so DVB_CORE can't be y and this is
the only tristate that we set to y that is controlled by MEDIA_SUPPORT.
Change it to m. Fixes:
[NOTE]: 'CONFIG_DVB_CORE' last val (y) and .config val (m) do not match
[INFO]: CONFIG_DVB_CORE : m ## .config: 3617 :configs/v5.4/standard/features/media/media.cfg (y)
[INFO]: raw config text:
config DVB_CORE
tristate
default y
select CRC32
depends on MEDIA_SUPPORT && MEDIA_DIGITAL_TV_SUPPORT && (I2C || I2C = n) && MEDIA_SUPPORT
Config 'DVB_CORE' has the following Direct dependencies (DVB_CORE=m):
MEDIA_SUPPORT(=m) && MEDIA_DIGITAL_TV_SUPPORT(=y) && I2C(=y) || I2C(=y) = n (=y)
Parent dependencies are:
MEDIA_DIGITAL_TV_SUPPORT [y] MEDIA_SUPPORT [m] I2C [y]
[INFO]: config 'CONFIG_DVB_CORE' was set, but it wasn't assignable, check (parent) dependencies
Signed-off-by: Anuj Mittal <anuj.mittal@intel.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Sai Hari Chandana Kalluri <chandana.kalluri@xilinx.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Sai Hari Chandana Kalluri <chandana.kalluri@xilinx.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Add configuration fragment to features/xilinx to enable v4l2 drivers
Signed-off-by: Sai Hari Chandana Kalluri <chandana.kalluri@xilinx.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
The oci containers fragment was imported from the configuration of
meta-virtualization. But the structure of fragments in meta-virt is
flat, compared to the kernel-cache.
We drop the duplicate lxc fragment, and sync with the main fragment.
We also split out the vswitch and docker fragments so that they can
be more easily found and re-used.
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
The group scheduling options in the lxc fragment were initially
used to support performance guaranteed systems using containers.
This option now causes issues with systemd runtimes and the
original feature it implemented is no longer relevant
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This will enable CONFIG_ICE=m for intel-x86 bsp.
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
Signed-off-by: Liwei Song <liwei.song@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Naveen Saini <naveen.kumar.saini@intel.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Creating an initial feature fragment that can be included when a
reproducible kernel build is desired. This is currently only one
option, but will have more in the future.
Signen-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This reverts commit 5ed9a90606e58962adada68e77ab387e8ae43f58.
|
|
The uptime feature was specifically requested in the early days of
linux-yocto. It is no longer useful, and is causing build issues, so
we remove it completely.
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
bcc and bpftrace may require kernel headers to be able to build tools.
Otherwise it would fail as
modprobe: module kheaders not found in modules.dep
Unable to find kernel headers. Try rebuilding kernel with CONFIG_IKHEADERS=m (module)
Signed-off-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Enable cgroups, docker, ebtables, lxc, vmswitch, xt-checksum configs to support
ocicontainers
Signed-off-by: Sai Hari Chandana Kalluri <chandana.kalluri@xilinx.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Enable xen configurations
Signed-off-by: Sai Hari Chandana Kalluri <chandana.kalluri@xilinx.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Forgot to refresh prior to sending patch for below commit:
Commit: 97c1acde apparmor/apparmor_on_boot: drop config no longer exists in 5.4
Signed-off-by: Armin Kuster <akuster808@gmail.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Armin Kuster <akuster808@gmail.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This fragment enbale linux kernel f2fs related options etc
Signed-off-by: qzhang2 <qiang.zhang@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This reverts commit 567dbf1f54dc70169d7edec603ca12846e1fa444.
|
|
1/1 [
Author: Charles Manning
Email: cdhmanning@gmail.com
Subject: yaffs2: fix memory leak when /proc/yaffs is read
Date: Mon, 16 Mar 2020 16:52:51 +0800
commit 27f18203551940abf35826a66978daf1b8124c6b from
git://www.aleph1.co.uk/yaffs2
Thanks to Jisheng Zhang <Jisheng.Zhang@synaptics.com> for supplying this patch
There is a kernel memory leak observed when the proc file /proc/yaffs
is read. This reason is that in yaffs_proc_open, single_open is called
and the respective release function is not called during release.
Fix with correct release function - single_release().
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
CONFIG_DM_VERITY depends on CONFIG_MD & CONFIG_BLK_DEV_DM. Without these
options in the .cfg file, the support for dm-verity will not be included.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This adds a new config fragment that allows building of the GPIO testing
module as a kernel feature. This is useful when enabling ptest for
libgpiod.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
1/6 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: v5.4.13-rt6
Date: Mon, 20 Jan 2020 09:59:29 +0100
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
2/6 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: Revert "hrtimer: move state change before hrtimer_cancel in do_nanosleep()"
Date: Mon, 20 Jan 2020 10:55:01 +0100
Since the rewrite of the hrtimer canceling code, the sleeping part
acquries a sleeping lock which preserves the state and it no longer
triggers a warning here.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
3/6 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: smp: Use smp_cond_func_t as type for the conditional function
Date: Thu, 16 Jan 2020 12:00:31 +0100
Use a typdef for the conditional function instead defining it each time in
the function prototype.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
4/6 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: smp: Add a smp_cond_func_t argument to smp_call_function_many()
Date: Thu, 16 Jan 2020 12:14:38 +0100
on_each_cpu_cond_mask() allocates a new CPU mask. The newly allocated
mask is a subset of the provided mask based on the conditional function.
This memory allocation could be avoided by extending
smp_call_function_many() with the conditional function and performing the
remote function call based on the mask and the conditional function.
Rename smp_call_function_many() to smp_call_function_many_cond() and add
the smp_cond_func_t argument. If smp_cond_func_t is provided then it is
used before invoking the function.
Provide smp_call_function_many() with cond_func set to NULL.
Let on_each_cpu_cond_mask() use smp_call_function_many_cond().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
5/6 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: smp: Remove allocation mask from on_each_cpu_cond.*()
Date: Thu, 16 Jan 2020 13:13:41 +0100
The allocation mask is no longer used by on_each_cpu_cond() and
on_each_cpu_cond_mask() and ca be removed.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
6/6 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: v5.4.13-rt7
Date: Mon, 20 Jan 2020 11:48:34 +0100
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
https://github.com/torvalds/linux/commit/8c5dc8d9f19c7992b5ed557b865127a80149041b
Applicable for v5.2-rc1 onward
Signed-off-by: Naveen Saini <naveen.kumar.saini@intel.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Lynxpoint GPIO driver moved under Intel pin control umbrella
for further transformation to a real pin control driver.
CONFIG_GPIO_LYNXPOINT renamed to PINCTRL_LYNXPOINT
https://github.com/intel/linux-intel-lts/commit/cb81404ffe750270f072d1fc839a4afe288046f3
Signed-off-by: Naveen Saini <naveen.kumar.saini@intel.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
1/7 [
Author: Kevin Hao
Email: kexin.hao@windriver.com
Subject: fs: yaffs2: replace CURRENT_TIME by other appropriate apis
Date: Fri, 25 Aug 2017 14:58:34 +0800
The macro CURRENT_TIME has already been deleted by commit bfe1c566453a
("time: delete CURRENT_TIME_SEC and CURRENT_TIME"). So we need to
replace all the uses of CURRENT_TIME by current_time() for filesystem
times, and ktime_get_* function for others.
Signed-off-by: Kevin Hao <kexin.hao@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
]
2/7 [
Author: Jiang Lu
Email: lu.jiang@windriver.com
Subject: Yaffs:check oob size before auto selecting Yaffs1
Date: Fri, 13 Dec 2013 11:18:18 +0800
Yaffs will select Yaffs1 for deives with 512 byte writing size.
Moreover, it will enable inband_tags automatically for devices with
small oob.
However, Yaffs1 can not work with inband_tags. So move the
oob size checking before auto selecting Yaffs1.
Signed-off-by: Jiang Lu <lu.jiang@windriver.com>
]
3/7 [
Author: Zhang Xiao
Email: xiao.zhang@windriver.com
Subject: yaffs: Avoid setting any ACL releated xattr
Date: Wed, 27 Aug 2014 13:34:15 +0800
YAFFS doesn't sopport ACL yet, it must refuse any related settings.
Signed-off-by: Zhang Xiao <xiao.zhang@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
]
4/7 [
Author: czou
Email: cao.zou@windriver.com
Subject: yaffs2: fix memory leak in mount/umount
Date: Mon, 13 Jul 2015 09:48:03 +0800
when using yaffs2 filesystem, after umounting, yaffs_put_super
function doesn't free the context_os memory, it is malloced in
yaffs_internal_read_super.
unreferenced object 0xd9103980 (size 64):
comm "mount", pid 5694, jiffies 159571 (age 45.150s)
hex dump (first 32 bytes):
80 39 10 d9 80 39 10 d9 00 b0 83 dc 00 b0 07 d9 .9...9..........
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<c0131b20>] kmem_cache_alloc_trace+0x188/0x350
[<c0386720>] yaffs_internal_read_super.isra.18+0x1d0/0x7e4
[<c0386d58>] yaffs2_internal_read_super_mtd+0x24/0x34
[<c0147110>] mount_bdev+0x178/0x1a0
[<c03835c8>] yaffs2_mount+0x28/0x30
[<c0147a74>] mount_fs+0x54/0x194
[<c01620e0>] vfs_kern_mount+0x58/0xf0
[<c0164b18>] do_mount+0x210/0xa48
[<c01656a4>] SyS_mount+0x94/0xc8
[<c000ec80>] ret_fast_syscall+0x0/0x30
[<ffffffff>] 0xffffffff
Signed-off-by: czou <cao.zou@windriver.com>
]
5/7 [
Author: He Zhe
Email: zhe.he@windriver.com
Subject: yaffs: Fix build failure by handling inode i_version with proper atomic API
Date: Tue, 19 Jun 2018 00:58:17 -0700
i_version in struct inode has changed to atomic64_t in mainline kernel.
This patch handles i_version with proper atomic API.
Signed-off-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
]
6/7 [
Author: Quanyang Wang
Email: quanyang.wang@windriver.com
Subject: yaffs: repair yaffs_get_mtd_device
Date: Tue, 3 Dec 2019 17:01:17 +0800
The function yaffs_get_mtd_device use wrong function to retrieve
mtd_info structure (using yaffs_get_mtd_device itself will cause
dead loop).
Use get_mtd_device to do this.
Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
7/7 [
Author: Quanyang Wang
Email: quanyang.wang@windriver.com
Subject: yaffs: add strict check when call yaffs_internal_read_super
Date: Tue, 3 Dec 2019 17:01:18 +0800
When kernel booting, mount_block_root will be called to judge
the filesystem type of root device. Then .mount function in file_system_type
structure will do the check operation. But yaffs filesystem has a
relaxed examination because as a filesystem for NAND Flash, it doesn't
examinate whether the root device is the MTD NAND device. This results
that yaffs filesystem will do mount operation even though the root device
is a MMC card with a btrfs filesystem, and will crash kernel after
mounting failed. The crash log is as below:
md: Waiting for all devices to be available before autodetect
md: If you don't use raid, use raid=noautodetect
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
yaffs: dev is 187695107 name is "mmcblk0p3" rw
yaffs: passed flags ""
yaffs: dev is 187695107 name is "mmcblk0p3" rw
yaffs: passed flags ""
------------[ cut here ]------------
kernel BUG at fs/yaffs2/yaffs_getblockinfo.h:30!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.2.24-yocto-standard+ #250
Hardware name: ZynqMP ZCU102 Rev1.0 (DT)
pstate: 80000005 (Nzcv daif -PAN -UAO)
pc : yaffs_rd_chunk_tags_nand+0xf0/0x110
lr : yaffs_rd_chunk_tags_nand+0x108/0x110
sp : ffffff801003b770
x29: ffffff801003b770 x28: ffffffc876fe8000
x27: 00000000000c0000 x26: 0000000000000000
x25: 00000000ffffffe1 x24: 0000000000010000
x23: 0000000000000000 x22: ffffff8011228000
x21: 000000000000005f x20: ffffff801003b890
x19: ffffffc876fe8000 x18: ffffffffffffffff
x17: 0000000000000000 x16: 0000000000000000
x15: ffffff80112285c8 x14: ffffff801137d228
x13: ffffff801137ce74 x12: ffffff8011246000
x11: 0000000000000000 x10: ffffff801137c000
x9 : 0000000000000000 x8 : 0000000000000007
x7 : 000000000000015c x6 : ffffff801137c490
x5 : 0000000000000000 x4 : 0000000000000000
x3 : 00000000ffffffff x2 : 50c80792e0663400
x1 : 0000000000000000 x0 : 0000000000000037
Call trace:
yaffs_rd_chunk_tags_nand+0xf0/0x110
yaffs_summary_read+0x10c/0x2e0
yaffs2_scan_backwards+0x28c/0xf58
yaffs_guts_initialise+0x71c/0x7a0
yaffs_internal_read_super.isra.20+0x4ec/0x838
yaffs2_internal_read_super_mtd+0x2c/0x48
mount_bdev+0x1a4/0x1e0
yaffs2_mount+0x44/0x58
legacy_get_tree+0x34/0x60
vfs_get_tree+0x34/0x120
do_mount+0x708/0x980
ksys_mount+0x9c/0x110
mount_block_root+0x128/0x29c
mount_root+0x148/0x17c
prepare_namespace+0x178/0x1c0
kernel_init_freeable+0x370/0x390
kernel_init+0x18/0x110
ret_from_fork+0x10/0x1c
Code: d65f03c0 f00069c0 b9440400 37f00060 (d4210000)
---[ end trace 68aa0995bdf59f76 ]---
BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
in_atomic(): 1, irqs_disabled(): 128, pid: 1, name: swapper/0
Preemption disabled at:
[<ffffff80100a4598>] debug_exception_enter+0x30/0x40
CPU: 3 PID: 1 Comm: swapper/0 Tainted: G D 5.2.24-yocto-standard+ #250
Hardware name: ZynqMP ZCU102 Rev1.0 (DT)
Call trace:
dump_backtrace+0x0/0x148
show_stack+0x24/0x30
dump_stack+0x98/0xbc
___might_sleep+0x130/0x188
__might_sleep+0x58/0x90
exit_signals+0x44/0x258
do_exit+0xb4/0xa38
die+0x1bc/0x1e0
bug_handler+0x48/0x98
call_break_hook+0x7c/0xa8
brk_handler+0x28/0x68
do_debug_exception+0xc4/0x188
el1_dbg+0x18/0x8c
yaffs_rd_chunk_tags_nand+0xf0/0x110
yaffs_summary_read+0x10c/0x2e0
yaffs2_scan_backwards+0x28c/0xf58
yaffs_guts_initialise+0x71c/0x7a0
yaffs_internal_read_super.isra.20+0x4ec/0x838
yaffs2_internal_read_super_mtd+0x2c/0x48
mount_bdev+0x1a4/0x1e0
yaffs2_mount+0x44/0x58
legacy_get_tree+0x34/0x60
vfs_get_tree+0x34/0x120
do_mount+0x708/0x980
ksys_mount+0x9c/0x110
mount_block_root+0x128/0x29c
mount_root+0x148/0x17c
prepare_namespace+0x178/0x1c0
kernel_init_freeable+0x370/0x390
kernel_init+0x18/0x110
ret_from_fork+0x10/0x1c
note: swapper/0[1] exited with preempt_count 1
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x0002,20002004
Memory Limit: none
---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
Use yaffs_get_mtd_device to add strict check.
Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
1/2 [
Author: Bruce Ashfield
Email: bruce.ashfield@gmail.com
Subject: yaffs2: adjust to proper location of MS_RDONLY
Date: Tue, 14 Jan 2020 13:53:11 -0500
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
2/2 [
Author: Kevin Hao
Email: kexin.hao@windriver.com
Subject: fs: yaffs2: replace CURRENT_TIME by other appropriate apis
Date: Fri, 25 Aug 2017 14:58:34 +0800
The macro CURRENT_TIME has already been deleted by commit bfe1c566453a
("time: delete CURRENT_TIME_SEC and CURRENT_TIME"). So we need to
replace all the uses of CURRENT_TIME by current_time() for filesystem
times, and ktime_get_* function for others.
Signed-off-by: Kevin Hao <kexin.hao@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@windriver.com>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Commit 8060c47ba853f1 [block: rename CONFIG_DEBUG_BLK_CGROUP to
CONFIG_BFQ_CGROUP_DEBUG] changes the name of this option. So we
update to match and avoid a configuration warning.
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
1/5 [
Author: Bruce Ashfield
Email: bruce.ashfield@gmail.com
Subject: aufs5: kbuild
Date: Mon, 13 Jan 2020 12:30:06 -0500
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
2/5 [
Author: Bruce Ashfield
Email: bruce.ashfield@gmail.com
Subject: aufs5: base
Date: Mon, 13 Jan 2020 12:30:59 -0500
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
3/5 [
Author: Bruce Ashfield
Email: bruce.ashfield@gmail.com
Subject: aufs5: mmap
Date: Mon, 13 Jan 2020 12:31:31 -0500
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
4/5 [
Author: Bruce Ashfield
Email: bruce.ashfield@gmail.com
Subject: aufs5: standalone
Date: Mon, 13 Jan 2020 12:31:58 -0500
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
5/5 [
Author: Bruce Ashfield
Email: bruce.ashfield@gmail.com
Subject: aufs5: core
Date: Mon, 13 Jan 2020 12:33:15 -0500
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
1/1 [
Author: Bruce Ashfield
Email: bruce.ashfield@gmail.com
Subject: yaffs2: import git revision b4ce1bb (jan, 2020)
Date: Mon, 13 Jan 2020 12:19:30 -0500
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
1/251 [
Author: Waiman Long
Email: longman@redhat.com
Subject: lib/smp_processor_id: Don't use cpumask_equal()
Date: Thu, 3 Oct 2019 16:36:08 -0400
The check_preemption_disabled() function uses cpumask_equal() to see
if the task is bounded to the current CPU only. cpumask_equal() calls
memcmp() to do the comparison. As x86 doesn't have __HAVE_ARCH_MEMCMP,
the slow memcmp() function in lib/string.c is used.
On a RT kernel that call check_preemption_disabled() very frequently,
below is the perf-record output of a certain microbenchmark:
42.75% 2.45% testpmd [kernel.kallsyms] [k] check_preemption_disabled
40.01% 39.97% testpmd [kernel.kallsyms] [k] memcmp
We should avoid calling memcmp() in performance critical path. So the
cpumask_equal() call is now replaced with an equivalent simpler check.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
2/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jbd2: Simplify journal_unmap_buffer()
Date: Fri, 9 Aug 2019 14:42:27 +0200
journal_unmap_buffer() checks first whether the buffer head is a journal.
If so it takes locks and then invokes jbd2_journal_grab_journal_head()
followed by another check whether this is journal head buffer.
The double checking is pointless.
Replace the initial check with jbd2_journal_grab_journal_head() which
alredy checks whether the buffer head is actually a journal.
Allows also early access to the journal head pointer for the upcoming
conversion of state lock to a regular spinlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
3/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jbd2: Remove jbd_trylock_bh_state()
Date: Fri, 9 Aug 2019 14:42:28 +0200
No users.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
4/251 [
Author: Jan Kara
Email: jack@suse.cz
Subject: jbd2: Move dropping of jh reference out of un/re-filing functions
Date: Fri, 9 Aug 2019 14:42:29 +0200
__jbd2_journal_unfile_buffer() and __jbd2_journal_refile_buffer() drop
transaction's jh reference when they remove jh from a transaction. This
will be however inconvenient once we move state lock into journal_head
itself as we still need to unlock it and we'd need to grab jh reference
just for that. Move dropping of jh reference out of these functions into
the few callers.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
5/251 [
Author: Jan Kara
Email: jack@suse.cz
Subject: jbd2: Drop unnecessary branch from jbd2_journal_forget()
Date: Fri, 9 Aug 2019 14:42:30 +0200
We have cleared both dirty & jbddirty bits from the bh. So there's no
difference between bforget() and brelse(). Thus there's no point jumping
to no_jbd branch.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
6/251 [
Author: Jan Kara
Email: jack@suse.cz
Subject: jbd2: Don't call __bforget() unnecessarily
Date: Fri, 9 Aug 2019 14:42:31 +0200
jbd2_journal_forget() jumps to 'not_jbd' branch which calls __bforget()
in cases where the buffer is clean which is pointless. In case of failed
assertion, it can be even argued that it is safer not to touch buffer's
dirty bits. Also logically it makes more sense to just jump to 'drop'
and that will make logic also simpler when we switch bh_state_lock to a
spinlock.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
7/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jbd2: Make state lock a spinlock
Date: Fri, 9 Aug 2019 14:42:32 +0200
Bit-spinlocks are problematic on PREEMPT_RT if functions which might sleep
on RT, e.g. spin_lock(), alloc/free(), are invoked inside the lock held
region because bit spinlocks disable preemption even on RT.
A first attempt was to replace state lock with a spinlock placed in struct
buffer_head and make the locking conditional on PREEMPT_RT and
DEBUG_BIT_SPINLOCKS.
Jan pointed out that there is a 4 byte hole in struct journal_head where a
regular spinlock fits in and he would not object to convert the state lock
to a spinlock unconditionally.
Aside of solving the RT problem, this also gains lockdep coverage for the
journal head state lock (bit-spinlocks are not covered by lockdep as it's
hard to fit a lockdep map into a single bit).
The trivial change would have been to convert the jbd_*lock_bh_state()
inlines, but that comes with the downside that these functions take a
buffer head pointer which needs to be converted to a journal head pointer
which adds another level of indirection.
As almost all functions which use this lock have a journal head pointer
readily available, it makes more sense to remove the lock helper inlines
and write out spin_*lock() at all call sites.
Fixup all locking comments as well.
Suggested-by: Jan Kara <jack@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jan Kara <jack@suse.com>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
8/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jbd2: Free journal head outside of locked region
Date: Fri, 9 Aug 2019 14:42:33 +0200
On PREEMPT_RT bit-spinlocks have the same semantics as on PREEMPT_RT=n,
i.e. they disable preemption. That means functions which are not safe to be
called in preempt disabled context on RT trigger a might_sleep() assert.
The journal head bit spinlock is mostly held for short code sequences with
trivial RT safe functionality, except for one place:
jbd2_journal_put_journal_head() invokes __journal_remove_journal_head()
with the journal head bit spinlock held. __journal_remove_journal_head()
invokes kmem_cache_free() which must not be called with preemption disabled
on RT.
Jan suggested to rework the removal function so the actual free happens
outside the bit-spinlocked region.
Split it into two parts:
- Do the sanity checks and the buffer head detach under the lock
- Do the actual free after dropping the lock
There is error case handling in the free part which needs to dereference
the b_size field of the now detached buffer head. Due to paranoia (caused
by ignorance) the size is retrieved in the detach function and handed into
the free function. Might be over-engineered, but better safe than sorry.
This makes the journal head bit-spinlock usage RT compliant and also avoids
nested locking which is not covered by lockdep.
Suggested-by: Jan Kara <jack@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-ext4@vger.kernel.org
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
9/251 [
Author: Joel Fernandes (Google)
Email: joel@joelfernandes.org
Subject: workqueue: Convert for_each_wq to use built-in list check
Date: Thu, 15 Aug 2019 10:18:42 -0400
Because list_for_each_entry_rcu() can now check for holding a
lock as well as for being in an RCU read-side critical section,
this commit replaces the workqueue_sysfs_unregister() function's
use of assert_rcu_or_wq_mutex() and list_for_each_entry_rcu() with
list_for_each_entry_rcu() augmented with a lockdep_is_held() optional
argument.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
10/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86/ioapic: Rename misnamed functions
Date: Thu, 17 Oct 2019 12:19:02 +0200
ioapic_irqd_[un]mask() are misnomers as both functions do way more than
masking and unmasking the interrupt line. Both deal with the moving the
affinity of the interrupt within interrupt context. The mask/unmask is just
a tiny part of the functionality.
Rename them to ioapic_prepare/finish_move(), fixup the call sites and
rename the related variables in the code to reflect what this is about.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/20191017101938.412489856@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
11/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: percpu-refcount: use normal instead of RCU-sched"
Date: Wed, 4 Sep 2019 17:59:36 +0200
This is a revert of commit
a4244454df129 ("percpu-refcount: use RCU-sched insted of normal RCU")
which claims the only reason for using RCU-sched is
"rcu_read_[un]lock() … are slightly more expensive than preempt_disable/enable()"
and
"As the RCU critical sections are extremely short, using sched-RCU
shouldn't have any latency implications."
The problem with RCU-sched is that it disables preemption and the
callback must not acquire any sleeping locks like spinlock_t on
PREEMPT_RT which is the case.
Convert back to normal RCU.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
12/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: Don't disable interrupts independently of the lock
Date: Wed, 10 Apr 2019 11:01:37 +0200
The locks (active.lock and rq->lock) need to be taken with disabled
interrupts. This is done in i915_request_retire() by disabling the
interrupts independently of the locks itself.
While local_irq_disable()+spin_lock() equals spin_lock_irq() on vanilla
it does not on PREEMPT_RT.
Chris Wilson confirmed that local_irq_disable() was just introduced as
an optimisation to avoid enabling/disabling interrupts during
lock/unlock combo.
Enable/disable interrupts as part of the locking instruction.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
13/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block: Don't disable interrupts in trigger_softirq()
Date: Fri, 15 Nov 2019 21:37:22 +0100
trigger_softirq() is always invoked as a SMP-function call which is
always invoked with disables interrupts.
Don't disable interrupt in trigger_softirq() because interrupts are
already disabled.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
14/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm64: KVM: Invoke compute_layout() before alternatives are applied
Date: Thu, 26 Jul 2018 09:13:42 +0200
compute_layout() is invoked as part of an alternative fixup under
stop_machine(). This function invokes get_random_long() which acquires a
sleeping lock on -RT which can not be acquired in this context.
Rename compute_layout() to kvm_compute_layout() and invoke it before
stop_machine() applies the alternatives. Add a __init prefix to
kvm_compute_layout() because the caller has it, too (and so the code can be
discarded after boot).
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
15/251 [
Author: Marc Kleine-Budde
Email: mkl@pengutronix.de
Subject: net: sched: Use msleep() instead of yield()
Date: Wed, 5 Mar 2014 00:49:47 +0100
On PREEMPT_RT enabled systems the interrupt handler run as threads at prio 50
(by default). If a high priority userspace process tries to shut down a busy
network interface it might spin in a yield loop waiting for the device to
become idle. With the interrupt thread having a lower priority than the
looping process it might never be scheduled and so result in a deadlock on UP
systems.
With Magic SysRq the following backtrace can be produced:
> test_app R running 0 174 168 0x00000000
> [<c02c7070>] (__schedule+0x220/0x3fc) from [<c02c7870>] (preempt_schedule_irq+0x48/0x80)
> [<c02c7870>] (preempt_schedule_irq+0x48/0x80) from [<c0008fa8>] (svc_preempt+0x8/0x20)
> [<c0008fa8>] (svc_preempt+0x8/0x20) from [<c001a984>] (local_bh_enable+0x18/0x88)
> [<c001a984>] (local_bh_enable+0x18/0x88) from [<c025316c>] (dev_deactivate_many+0x220/0x264)
> [<c025316c>] (dev_deactivate_many+0x220/0x264) from [<c023be04>] (__dev_close_many+0x64/0xd4)
> [<c023be04>] (__dev_close_many+0x64/0xd4) from [<c023be9c>] (__dev_close+0x28/0x3c)
> [<c023be9c>] (__dev_close+0x28/0x3c) from [<c023f7f0>] (__dev_change_flags+0x88/0x130)
> [<c023f7f0>] (__dev_change_flags+0x88/0x130) from [<c023f904>] (dev_change_flags+0x10/0x48)
> [<c023f904>] (dev_change_flags+0x10/0x48) from [<c024c140>] (do_setlink+0x370/0x7ec)
> [<c024c140>] (do_setlink+0x370/0x7ec) from [<c024d2f0>] (rtnl_newlink+0x2b4/0x450)
> [<c024d2f0>] (rtnl_newlink+0x2b4/0x450) from [<c024cfa0>] (rtnetlink_rcv_msg+0x158/0x1f4)
> [<c024cfa0>] (rtnetlink_rcv_msg+0x158/0x1f4) from [<c0256740>] (netlink_rcv_skb+0xac/0xc0)
> [<c0256740>] (netlink_rcv_skb+0xac/0xc0) from [<c024bbd8>] (rtnetlink_rcv+0x18/0x24)
> [<c024bbd8>] (rtnetlink_rcv+0x18/0x24) from [<c02561b8>] (netlink_unicast+0x13c/0x198)
> [<c02561b8>] (netlink_unicast+0x13c/0x198) from [<c025651c>] (netlink_sendmsg+0x264/0x2e0)
> [<c025651c>] (netlink_sendmsg+0x264/0x2e0) from [<c022af98>] (sock_sendmsg+0x78/0x98)
> [<c022af98>] (sock_sendmsg+0x78/0x98) from [<c022bb50>] (___sys_sendmsg.part.25+0x268/0x278)
> [<c022bb50>] (___sys_sendmsg.part.25+0x268/0x278) from [<c022cf08>] (__sys_sendmsg+0x48/0x78)
> [<c022cf08>] (__sys_sendmsg+0x48/0x78) from [<c0009320>] (ret_fast_syscall+0x0/0x2c)
This patch works around the problem by replacing yield() by msleep(1), giving
the interrupt thread time to finish, similar to other changes contained in the
rt patch set. Using wait_for_completion() instead would probably be a better
solution.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
16/251 [
Author: Uladzislau Rezki (Sony)
Email: urezki@gmail.com
Subject: mm/vmalloc: remove preempt_disable/enable when doing preloading
Date: Sat, 30 Nov 2019 17:54:33 -0800
Some background. The preemption was disabled before to guarantee that a
preloaded object is available for a CPU, it was stored for. That was
achieved by combining the disabling the preemption and taking the spin
lock while the ne_fit_preload_node is checked.
The aim was to not allocate in atomic context when spinlock is taken
later, for regular vmap allocations. But that approach conflicts with
CONFIG_PREEMPT_RT philosophy. It means that calling spin_lock() with
disabled preemption is forbidden in the CONFIG_PREEMPT_RT kernel.
Therefore, get rid of preempt_disable() and preempt_enable() when the
preload is done for splitting purpose. As a result we do not guarantee
now that a CPU is preloaded, instead we minimize the case when it is
not, with this change, by populating the per cpu preload pointer under
the vmap_area_lock.
This implies that at least each caller that has done the preallocation
will not fallback to an atomic allocation later. It is possible that
the preallocation would be pointless or that no preallocation is done
because of the race but the data shows that this is really rare.
For example i run the special test case that follows the preload pattern
and path. 20 "unbind" threads run it and each does 1000000 allocations.
Only 3.5 times among 1000000 a CPU was not preloaded. So it can happen
but the number is negligible.
[mhocko@suse.com: changelog additions]
Link: http://lkml.kernel.org/r/20191016095438.12391-1-urezki@gmail.com
Fixes: 82dd23e84be3 ("mm/vmalloc.c: preload a CPU with one object for split purpose")
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Daniel Wagner <dwagner@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
17/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: KVM: arm/arm64: Let the timer expire in hardirq context on RT
Date: Tue, 13 Aug 2019 14:29:41 +0200
The timers are canceled from an preempt-notifier which is invoked with
disabled preemption which is not allowed on PREEMPT_RT.
The timer callback is short so in could be invoked in hard-IRQ context
on -RT.
Let the timer expire on hard-IRQ context even on -RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Tested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
18/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add printk ring buffer documentation
Date: Tue, 12 Feb 2019 15:29:39 +0100
The full documentation file for the printk ring buffer.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
19/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add prb locking functions
Date: Tue, 12 Feb 2019 15:29:40 +0100
Add processor-reentrant spin locking functions. These allow
restricting the number of possible contexts to 2, which can simplify
implementing code that also supports NMI interruptions.
prb_lock();
/*
* This code is synchronized with all contexts
* except an NMI on the same processor.
*/
prb_unlock();
In order to support printk's emergency messages, a
processor-reentrant spin lock will be used to control raw access to
the emergency console. However, it must be the same
processor-reentrant spin lock as the one used by the ring buffer,
otherwise a deadlock can occur:
CPU1: printk lock -> emergency -> serial lock
CPU2: serial lock -> printk lock
By making the processor-reentrant implemtation available externally,
printk can use the same atomic_t for the ring buffer as for the
emergency console and thus avoid the above deadlock.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
20/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: define ring buffer struct and initializer
Date: Tue, 12 Feb 2019 15:29:41 +0100
See Documentation/printk-ringbuffer.txt for details about the
initializer arguments.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
21/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add writer interface
Date: Tue, 12 Feb 2019 15:29:42 +0100
Add the writer functions prb_reserve() and prb_commit(). These make
use of processor-reentrant spin locks to limit the number of possible
interruption scenarios for the writers.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
22/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add basic non-blocking reading interface
Date: Tue, 12 Feb 2019 15:29:43 +0100
Add reader iterator static declaration/initializer, dynamic
initializer, and functions to iterate and retrieve ring buffer data.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
23/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add blocking reader support
Date: Tue, 12 Feb 2019 15:29:44 +0100
Add a blocking read function for readers. An irq_work function is
used to signal the wait queue so that write notification can
be triggered from any context.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
24/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add functionality required by printk
Date: Tue, 12 Feb 2019 15:29:45 +0100
The printk subsystem needs to be able to query the size of the ring
buffer, seek to specific entries within the ring buffer, and track
if records could not be stored in the ring buffer.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
25/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: add ring buffer and kthread
Date: Tue, 12 Feb 2019 15:29:46 +0100
The printk ring buffer provides an NMI-safe interface for writing
messages to a ring buffer. Using such a buffer for alleviates printk
callers from the current burdens of disabled preemption while calling
the console drivers (and possibly printing out many messages that
another task put into the log buffer).
Create a ring buffer to be used for storing messages to be
printed to the consoles.
Create a dedicated printk kthread to block on the ring buffer
and call the console drivers for the read messages.
NOTE: The printk_delay is relocated to _after_ the message is
printed, where it makes more sense.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
26/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: remove exclusive console hack
Date: Tue, 12 Feb 2019 15:29:47 +0100
In order to support printing the printk log history when new
consoles are registered, a global exclusive_console variable is
temporarily set. This only works because printk runs with
preemption disabled.
When console printing is moved to a fully preemptible dedicated
kthread, this hack no longer works.
Remove exclusive_console usage.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
27/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: redirect emit/store to new ringbuffer
Date: Tue, 12 Feb 2019 15:29:48 +0100
vprintk_emit and vprintk_store are the main functions that all printk
variants eventually go through. Change these to store the message in
the new printk ring buffer that the printk kthread is reading.
Remove functions no longer in use because of the changes to
vprintk_emit and vprintk_store.
In order to handle interrupts and NMIs, a second per-cpu ring buffer
(sprint_rb) is added. This ring buffer is used for NMI-safe memory
allocation in order to format the printk messages.
NOTE: LOG_CONT is ignored for now and handled as individual messages.
LOG_CONT functions are masked behind "#if 0" blocks until their
functionality can be restored
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
28/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk_safe: remove printk safe code
Date: Tue, 12 Feb 2019 15:29:49 +0100
vprintk variants are now NMI-safe so there is no longer a need for
the "safe" calls.
NOTE: This also removes printk flushing functionality.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
29/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: minimize console locking implementation
Date: Tue, 12 Feb 2019 15:29:50 +0100
Since printing of the printk buffer is now handled by the printk
kthread, minimize the console locking functions to just handle
locking of the console.
NOTE: With this console_flush_on_panic will no longer flush.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
30/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: track seq per console
Date: Tue, 12 Feb 2019 15:29:51 +0100
Allow each console to track which seq record was last printed. This
simplifies identifying dropped records.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
31/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: do boot_delay_msec inside printk_delay
Date: Tue, 12 Feb 2019 15:29:52 +0100
Both functions needed to be called one after the other, so just
integrate boot_delay_msec into printk_delay for simplification.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
32/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: print history for new consoles
Date: Tue, 12 Feb 2019 15:29:53 +0100
When new consoles register, they currently print how many messages
they have missed. However, many (or all) of those messages may still
be in the ring buffer. Add functionality to print as much of the
history as available. This is a clean replacement of the old
exclusive console hack.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
33/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: implement CON_PRINTBUFFER
Date: Tue, 12 Feb 2019 15:29:54 +0100
If the CON_PRINTBUFFER flag is not set, do not replay the history
for that console.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
34/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: add processor number to output
Date: Tue, 12 Feb 2019 15:29:55 +0100
It can be difficult to sort printk out if multiple processors are
printing simultaneously. Add the processor number to the printk
output to allow the messages to be sorted.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
35/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: console: add write_atomic interface
Date: Tue, 12 Feb 2019 15:29:56 +0100
Add a write_atomic callback to the console. This is an optional
function for console drivers. The function must be atomic (including
NMI safe) for writing to the console.
Console drivers must still implement the write callback. The
write_atomic callback will only be used for emergency messages.
Creating an NMI safe write_atomic that must synchronize with write
requires a careful implementation of the console driver. To aid with
the implementation, a set of console_atomic_* functions are provided:
void console_atomic_lock(unsigned int *flags);
void console_atomic_unlock(unsigned int flags);
These functions synchronize using the processor-reentrant cpu lock of
the printk buffer.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
36/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: introduce emergency messages
Date: Tue, 12 Feb 2019 15:29:57 +0100
Console messages are generally either critical or non-critical.
Critical messages are messages such as crashes or sysrq output.
Critical messages should never be lost because generally they provide
important debugging information.
Since all console messages are output via a fully preemptible printk
kernel thread, it is possible that messages are not output because
that thread cannot be scheduled (BUG in scheduler, run-away RT task,
etc).
To allow critical messages to be output independent of the
schedulability of the printk task, introduce an emergency mechanism
that _immediately_ outputs the message to the consoles. To avoid
possible unbounded latency issues, the emergency mechanism only
outputs the printk line provided by the caller and ignores any
pending messages in the log buffer.
Critical messages are identified as messages (by default) with log
level LOGLEVEL_WARNING or more critical. This is configurable via the
kernel option CONSOLE_LOGLEVEL_EMERGENCY.
Any messages output as emergency messages are skipped by the printk
thread on those consoles that output the emergency message.
In order for a console driver to support emergency messages, the
write_atomic function must be implemented by the driver. If not
implemented, the emergency messages are handled like all other
messages and are printed by the printk thread.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
37/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: serial: 8250: implement write_atomic
Date: Tue, 12 Feb 2019 15:29:58 +0100
Implement a non-sleeping NMI-safe write_atomic console function in
order to support emergency printk messages.
Since interrupts need to be disabled during transmit, all usage of
the IER register was wrapped with access functions that use the
console_atomic_lock function to synchronize register access while
tracking the state of the interrupts. This was necessary because
write_atomic is can be calling from an NMI context that has
preempted write_atomic.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
38/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: implement KERN_CONT
Date: Tue, 12 Feb 2019 15:29:59 +0100
Implement KERN_CONT based on the printing CPU rather than on the
printing task. As long as the KERN_CONT messages are coming from the
same CPU and no non-KERN_CONT messages come, the messages are assumed
to belong to each other.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
39/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: implement /dev/kmsg
Date: Tue, 12 Feb 2019 15:30:00 +0100
Since printk messages are now logged to a new ring buffer, update
the /dev/kmsg functions to pull the messages from there.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
40/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: implement syslog
Date: Tue, 12 Feb 2019 15:30:01 +0100
Since printk messages are now logged to a new ring buffer, update
the syslog functions to pull the messages from there.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
41/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: implement kmsg_dump
Date: Tue, 12 Feb 2019 15:30:02 +0100
Since printk messages are now logged to a new ring buffer, update
the kmsg_dump functions to pull the messages from there.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
42/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: remove unused code
Date: Tue, 12 Feb 2019 15:30:03 +0100
Code relating to the safe context and anything dealing with the
previous log buffer implementation is no longer in use. Remove it.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
43/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: set deferred to default loglevel, enforce mask
Date: Thu, 14 Feb 2019 23:13:30 +0100
All messages printed via vpritnk_deferred() were being
automatically treated as emergency messages.
Messages printed via vprintk_deferred() should be set to the
default loglevel. LOGLEVEL_SCHED is no longer relevant.
Also, enforce the loglevel mask for emergency messages.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
44/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: serial: 8250: remove that trylock in serial8250_console_write_atomic()
Date: Thu, 14 Feb 2019 17:38:24 +0100
This does not work as rtmutex in NMI context. As per John, it is not
needed.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
45/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: serial: 8250: export symbols which are used by symbols
Date: Sat, 16 Feb 2019 09:02:00 +0100
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
46/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm: remove printk_nmi_.*()
Date: Fri, 15 Feb 2019 14:34:20 +0100
It is no longer provided by the printk core code.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
47/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: only allow kernel to emergency message
Date: Sun, 17 Feb 2019 03:11:20 +0100
Emergency messages exist as a mechanism for the kernel to
communicate critical information to users. It is not meant for
use by userspace. Only allow facility=0 messages to be
processed by the emergency message code.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
48/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: devkmsg: llseek: reset clear if it is lost
Date: Fri, 22 Feb 2019 23:02:44 +0100
SEEK_DATA will seek to the last clear record. If this clear record
is no longer in the ring buffer, devkmsg_llseek() will go into an
infinite loop. Fix that by resetting the clear sequence if the old
clear record is no longer in the ring buffer.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
49/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: printk: print "rate-limitted" message as info
Date: Fri, 22 Feb 2019 12:47:13 +0100
If messages which are injected via kmsg are dropped then they don't need
to be printed as warnings. This is to avoid latency spikes if the
interface decides to print a lot of important messages.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
50/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: kmsg_dump: remove mutex usage
Date: Wed, 24 Apr 2019 16:36:04 +0200
The kmsg dumper can be called from any context, but the dumping
helpers were using a mutex to synchronize the iterator against
concurrent dumps.
Rather than trying to synchronize the iterator, use a local copy
of the iterator during the dump. Then no synchronization is
required.
Reported-by: Scott Wood <swood@redhat.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
51/251 [
Author: He Zhe
Email: zhe.he@windriver.com
Subject: printk: devkmsg: read: Return EPIPE when the first message user-space wants has gone
Date: Tue, 24 Sep 2019 15:26:39 +0800
When user-space wants to read the first message, that is when user->seq
is 0, and that message has gone, it currently automatically resets
user->seq to current first seq. This mis-aligns with mainline kernel.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/dev-kmsg#n39
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/printk/printk.c#n899
We should inform user-space that what it wants has gone by returning EPIPE
in such scenario.
Link: https://lore.kernel.org/r/20190924072639.25986-1-zhe.he@windriver.com
Signed-off-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
52/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: handle iterating while buffer changing
Date: Mon, 7 Oct 2019 16:20:39 +0200
The syslog and kmsg_dump readers are provided buffers to fill.
Both try to maximize the provided buffer usage by calculating the
maximum number of messages that can fit. However, if after the
calculation, messages are dropped and new messages added, the
calculation will no longer match.
For syslog, add a check to make sure the provided buffer is not
overfilled.
For kmsg_dump, start over by recalculating the messages
available.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
53/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: hack out emergency loglevel usage
Date: Tue, 3 Dec 2019 09:14:57 +0100
Instead of using an emergency loglevel to determine if atomic
messages should be printed, use oops_in_progress. This conforms
to the decision that latency-causing atomic messages never be
generated during normal operation.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
54/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: serial: 8250: only atomic lock for console
Date: Fri, 10 Jan 2020 16:45:31 +0106
The atomic console implementation requires that IER is synchronized
between atomic and non-atomic usage. However, it was implemented such
that the console_atomic_lock was performed for all IER access, even
if that port was not a console.
The implementation also used a usage counter to keep track of IER
clear/restore windows. However, this is not needed because the
console_atomic_lock synchronization of IER access with prevent any
situations where IER is prematurely restored or left cleared.
Move the IER access functions to inline macros. They will only
console_atomic_lock if the port is a console. Remove the
restore_ier() function by having clear_ier() return the prior IER
value so that the caller can restore it using set_ier(). Rename the
IER access functions to match other 8250 wrapper macros.
Suggested-by: Dick Hollenbeck <dick@softplc.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
55/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: serial: 8250: fsl/ingenic/mtk: fix atomic console
Date: Fri, 10 Jan 2020 16:45:32 +0106
A few 8250 implementations have their own IER access. If the port
is a console, wrap the accesses with console_atomic_lock.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
56/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t
Date: Fri, 15 Nov 2019 18:54:20 +0100
Bit spinlocks are problematic if PREEMPT_RT is enabled, because they
disable preemption, which is undesired for latency reasons and breaks when
regular spinlocks are taken within the bit_spinlock locked region because
regular spinlocks are converted to 'sleeping spinlocks' on RT. So RT
replaces the bit spinlocks with regular spinlocks to avoid this problem.
Bit spinlocks are also not covered by lock debugging, e.g. lockdep.
Substitute the BH_Uptodate_Lock bit spinlock with a regular spinlock.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: remove the wrapper and use always spinlock_t and move it into
the padding hole]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
57/251 [
Author: Clark Williams
Email: williams@redhat.com
Subject: thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t
Date: Mon, 15 Jul 2019 15:25:00 -0500
The spinlock pkg_temp_lock has the potential of being taken in atomic
context because it can be acquired from the thermal IRQ vector.
It's static and limited scope so go ahead and make it a raw spinlock.
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
58/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: perf/core: Add SRCU annotation for pmus list walk
Date: Fri, 15 Nov 2019 18:04:07 +0100
Since commit
28875945ba98d ("rcu: Add support for consolidated-RCU reader checking")
there is an additional check to ensure that a RCU related lock is held
while the RCU list is iterated.
This section holds the SRCU reader lock instead.
Add annotation to list_for_each_entry_rcu() that pmus_srcu must be
acquired during the list traversal.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
59/251 [
Author: He Zhe
Email: zhe.he@windriver.com
Subject: kmemleak: Turn kmemleak_lock and object->lock to raw_spinlock_t
Date: Wed, 19 Dec 2018 16:30:57 +0100
kmemleak_lock as a rwlock on RT can possibly be acquired in atomic context
which does work on RT.
Since the kmemleak operation is performed in atomic context make it a
raw_spinlock_t so it can also be acquired on RT. This is used for
debugging and is not enabled by default in a production like environment
(where performance/latency matters) so it makes sense to make it a
raw_spinlock_t instead trying to get rid of the atomic context.
Turn also the kmemleak_object->lock into raw_spinlock_t which is
acquired (nested) while the kmemleak_lock is held.
The time spent in "echo scan > kmemleak" slightly improved on 64core box
with this patch applied after boot.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lkml.kernel.org/r/20181218150744.GB20197@arrakis.emea.arm.com
Link: https://lkml.kernel.org/r/1542877459-144382-1-git-send-email-zhe.he@windriver.com
Link: https://lkml.kernel.org/r/20190927082230.34152-1-yongxin.liu@windriver.com
Signed-off-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Liu Haitao <haitao.liu@windriver.com>
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
[bigeasy: Redo the description. Merge the individual bits: He Zhe did
the kmemleak_lock, Liu Haitao the ->lock and Yongxin Liu forwarded the
patch.]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
60/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: Use CONFIG_PREEMPTION
Date: Fri, 26 Jul 2019 11:30:49 +0200
Thisi is an all-in-one patch of the current `PREEMPTION' branch.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
61/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: BPF: Disable on PREEMPT_RT
Date: Thu, 10 Oct 2019 16:54:45 +0200
Disable BPF on PREEMPT_RT because
- it allocates and frees memory in atomic context
- it uses up_read_non_owner()
- BPF_PROG_RUN() expects to be invoked in non-preemptible context
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
62/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: workqueue: Don't assume that the callback has interrupts disabled
Date: Tue, 11 Jun 2019 11:21:02 +0200
Due to the TIMER_IRQSAFE flag, the timer callback is invoked with
disabled interrupts. On -RT the callback is invoked in softirq context
with enabled interrupts. Since the interrupts are threaded, there are
are no in_irq() users. The local_bh_disable() around the threaded
handler ensures that there is either a timer or a threaded handler
active on the CPU.
Disable interrupts before __queue_work() is invoked from the timer
callback.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
63/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: sched/swait: Add swait_event_lock_irq()
Date: Wed, 22 May 2019 12:42:26 +0200
The swait_event_lock_irq() is inspired by wait_event_lock_irq(). This is
required by the workqueue code once it switches to swait.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
64/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: workqueue: Use swait for wq_manager_wait
Date: Tue, 11 Jun 2019 11:21:09 +0200
In order for the workqueue code use raw_spinlock_t typed locking there
must not be a spinlock_t typed lock be acquired. A wait_queue_head uses
a spinlock_t lock for its list protection.
Use a swait based queue head to avoid raw_spinlock_t -> spinlock_t
locking.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
65/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: workqueue: Convert the locks to raw type
Date: Wed, 22 May 2019 12:43:56 +0200
After all the workqueue and the timer rework, we can finally make the
worker_pool lock raw.
The lock is not held over an unbounded period of time/iterations.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
66/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/compaction: Disable compact_unevictable_allowed on RT
Date: Fri, 8 Nov 2019 12:55:47 +0100
Since commit
5bbe3547aa3ba ("mm: allow compaction of unevictable pages")
it is allowed to examine mlocked pages for pages to compact by default.
On -RT even minor pagefaults are problematic because it may take a few
100us to resolve them and until then the task is blocked.
Make compact_unevictable_allowed = 0 default and remove it from /proc on
RT.
Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
67/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroup: Remove ->css_rstat_flush()
Date: Thu, 15 Aug 2019 18:14:16 +0200
I was looking at the lifetime of the the ->css_rstat_flush() to see if
cgroup_rstat_cpu_lock should remain a raw_spinlock_t. I didn't find any
users and is unused since it was introduced in commit
8f53470bab042 ("cgroup: Add cgroup_subsys->css_rstat_flush()")
Remove the css_rstat_flush callback because it has no users.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
68/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroup: Consolidate users of cgroup_rstat_lock.
Date: Fri, 16 Aug 2019 12:20:42 +0200
cgroup_rstat_flush_irqsafe() has no users, remove it.
cgroup_rstat_flush_hold() and cgroup_rstat_flush_release() are only used within
this file. Make it static.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
69/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroup: Remove `may_sleep' from cgroup_rstat_flush_locked()
Date: Fri, 16 Aug 2019 12:25:35 +0200
cgroup_rstat_flush_locked() is always invoked with `may_sleep' set to
true so that this case can be made default and the parameter removed.
Remove the `may_sleep' parameter.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
70/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroup: Acquire cgroup_rstat_lock with enabled interrupts
Date: Fri, 16 Aug 2019 12:49:36 +0200
There is no need to disable interrupts while cgroup_rstat_lock is
acquired. The lock is never used in-IRQ context so a simple spin_lock()
is enough for synchronisation purpose.
Acquire cgroup_rstat_lock without disabling interrupts and ensure that
cgroup_rstat_cpu_lock is acquired with disabled interrupts (this one is
acquired in-IRQ context).
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
71/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: workingset: replace IRQ-off check with a lockdep assert.
Date: Mon, 11 Feb 2019 10:40:46 +0100
Commit
68d48e6a2df57 ("mm: workingset: add vmstat counter for shadow nodes")
introduced an IRQ-off check to ensure that a lock is held which also
disabled interrupts. This does not work the same way on -RT because none
of the locks, that are held, disable interrupts.
Replace this check with a lockdep assert which ensures that the lock is
held.
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
72/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: tpm: remove tpm_dev_wq_lock
Date: Mon, 11 Feb 2019 11:33:11 +0100
Added in commit
9e1b74a63f776 ("tpm: add support for nonblocking operation")
but never actually used it.
Cc: Philip Tricca <philip.b.tricca@intel.com>
Cc: Tadeusz Struk <tadeusz.struk@intel.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
73/251 [
Author: Rob Herring
Email: robh@kernel.org
Subject: of: Rework and simplify phandle cache to use a fixed size
Date: Wed, 11 Dec 2019 17:23:45 -0600
The phandle cache was added to speed up of_find_node_by_phandle() by
avoiding walking the whole DT to find a matching phandle. The
implementation has several shortcomings:
- The cache is designed to work on a linear set of phandle values.
This is true for dtc generated DTs, but not for other cases such as
Power.
- The cache isn't enabled until of_core_init() and a typical system
may see hundreds of calls to of_find_node_by_phandle() before that
point.
- The cache is freed and re-allocated when the number of phandles
changes.
- It takes a raw spinlock around a memory allocation which breaks on
RT.
Change the implementation to a fixed size and use hash_32() as the
cache index. This greatly simplifies the implementation. It avoids
the need for any re-alloc of the cache and taking a reference on nodes
in the cache. We only have a single source of removing cache entries
which is of_detach_node().
Using hash_32() removes any assumption on phandle values improving
the hit rate for non-linear phandle values. The effect on linear values
using hash_32() is about a 10% collision. The chances of thrashing on
colliding values seems to be low.
To compare performance, I used a RK3399 board which is a pretty typical
system. I found that just measuring boot time as done previously is
noisy and may be impacted by other things. Also bringing up secondary
cores causes some issues with measuring, so I booted with 'nr_cpus=1'.
With no caching, calls to of_find_node_by_phandle() take about 20124 us
for 1248 calls. There's an additional 288 calls before time keeping is
up. Using the average time per hit/miss with the cache, we can calculate
these calls to take 690 us (277 hit / 11 miss) with a 128 entry cache
and 13319 us with no cache or an uninitialized cache.
Comparing the 3 implementations the time spent in
of_find_node_by_phandle() is:
no cache: 20124 us (+ 13319 us)
128 entry cache: 5134 us (+ 690 us)
current cache: 819 us (+ 13319 us)
We could move the allocation of the cache earlier to improve the
current cache, but that just further complicates the situation as it
needs to be after slab is up, so we can't do it when unflattening (which
uses memblock).
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lkml.kernel.org/r/20191211232345.24810-1-robh@kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
74/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: timekeeping: Split jiffies seqlock
Date: Thu, 14 Feb 2013 22:36:59 +0100
Replace jiffies_lock seqlock with a simple seqcounter and a rawlock so
it can be taken in atomic context on RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
75/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: signal: Revert ptrace preempt magic
Date: Wed, 21 Sep 2011 19:57:12 +0200
Upstream commit '53da1d9456fe7f8 fix ptrace slowness' is nothing more
than a bandaid around the ptrace design trainwreck. It's not a
correctness issue, it's merily a cosmetic bandaid.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
76/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: dma-buf: Use seqlock_t instread disabling preemption
Date: Wed, 14 Aug 2019 16:38:43 +0200
"dma reservation" disables preemption while acquiring the write access
for "seqcount".
Replace the seqcount with a seqlock_t which provides seqcount like
semantic and lock for writer.
Link: https://lkml.kernel.org/r/f410b429-db86-f81c-7c67-f563fa808b62@free.fr
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
77/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: seqlock: Prevent rt starvation
Date: Wed, 22 Feb 2012 12:03:30 +0100
If a low prio writer gets preempted while holding the seqlock write
locked, a high prio reader spins forever on RT.
To prevent this let the reader grab the spinlock, so it blocks and
eventually boosts the writer. This way the writer can proceed and
endless spinning is prevented.
For seqcount writers we disable preemption over the update code
path. Thanks to Al Viro for distangling some VFS code to make that
possible.
Nicholas Mc Guire:
- spin_lock+unlock => spin_unlock_wait
- __write_seqcount_begin => __raw_write_seqcount_begin
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
78/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: NFSv4: replace seqcount_t with a seqlock_t
Date: Fri, 28 Oct 2016 23:05:11 +0200
The raw_write_seqcount_begin() in nfs4_reclaim_open_state() causes a
preempt_disable() on -RT. The spin_lock()/spin_unlock() in that section does
not work.
The lockdep part was removed in commit
abbec2da13f0 ("NFS: Use raw_write_seqcount_begin/end int nfs4_reclaim_open_state")
because lockdep complained.
The whole seqcount thing was introduced in commit
c137afabe330 ("NFSv4: Allow the state manager to mark an open_owner as being recovered").
The recovery threads runs only once.
write_seqlock() does not work on !RT because it disables preemption and it the
writer side is preemptible (has to remain so despite the fact that it will
block readers).
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Link: https://lkml.kernel.org/r/20161021164727.24485-1-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
79/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/Qdisc: use a seqlock instead seqcount
Date: Wed, 14 Sep 2016 17:36:35 +0200
The seqcount disables preemption on -RT while it is held which can't
remove. Also we don't want the reader to spin for ages if the writer is
scheduled out. The seqlock on the other hand will serialize / sleep on
the lock while writer is active.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
80/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: Add a mutex around devnet_rename_seq
Date: Wed, 20 Mar 2013 18:06:20 +0100
On RT write_seqcount_begin() disables preemption and device_rename()
allocates memory with GFP_KERNEL and grabs later the sysfs_mutex
mutex. Serialize with a mutex and add use the non preemption disabling
__write_seqcount_begin().
To avoid writer starvation, let the reader grab the mutex and release
it when it detects a writer in progress. This keeps the normal case
(no reader on the fly) fast.
[ tglx: Instead of replacing the seqcount by a mutex, add the mutex ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
81/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: userfaultfd: Use a seqlock instead of seqcount
Date: Wed, 18 Dec 2019 12:25:09 +0100
On RT write_seqcount_begin() disables preemption which leads to warning
in add_wait_queue() while the spinlock_t is acquired.
The waitqueue can't be converted to swait_queue because
userfaultfd_wake_function() is used as a custom wake function.
Use seqlock instead seqcount to avoid the preempt_disable() section
during add_wait_queue().
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
82/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/nfs: turn rmdir_sem into a semaphore
Date: Thu, 15 Sep 2016 10:51:27 +0200
The RW semaphore had a reader side which used the _non_owner version
because it most likely took the reader lock in one thread and released it
in another which would cause lockdep to complain if the "regular"
version was used.
On -RT we need the owner because the rw lock is turned into a rtmutex.
The semaphores on the hand are "plain simple" and should work as
expected. We can't have multiple readers but on -RT we don't allow
multiple readers anyway so that is not a loss.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
83/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: disable preemption on i_dir_seq's write side
Date: Fri, 20 Oct 2017 11:29:53 +0200
i_dir_seq is an opencoded seqcounter. Based on the code it looks like we
could have two writers in parallel despite the fact that the d_lock is
held. The problem is that during the write process on RT the preemption
is still enabled and if this process is interrupted by a reader with RT
priority then we lock up.
To avoid that lock up I am disabling the preemption during the update.
The rename of i_dir_seq is here to ensure to catch new write sides in
future.
Cc: stable-rt@vger.kernel.org
Reported-by: Oleg.Karfich@wago.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
84/251 [
Author: Paul Gortmaker
Email: paul.gortmaker@windriver.com
Subject: list_bl: Make list head locking RT safe
Date: Fri, 21 Jun 2013 15:07:25 -0400
As per changes in include/linux/jbd_common.h for avoiding the
bit_spin_locks on RT ("fs: jbd/jbd2: Make state lock and journal
head lock rt safe") we do the same thing here.
We use the non atomic __set_bit and __clear_bit inside the scope of
the lock to preserve the ability of the existing LIST_DEBUG code to
use the zero'th bit in the sanity checks.
As a bit spinlock, we had no lockdep visibility into the usage
of the list head locking. Now, if we were to implement it as a
standard non-raw spinlock, we would see:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
in_atomic(): 1, irqs_disabled(): 0, pid: 122, name: udevd
5 locks held by udevd/122:
#0: (&sb->s_type->i_mutex_key#7/1){+.+.+.}, at: [<ffffffff811967e8>] lock_rename+0xe8/0xf0
#1: (rename_lock){+.+...}, at: [<ffffffff811a277c>] d_move+0x2c/0x60
#2: (&dentry->d_lock){+.+...}, at: [<ffffffff811a0763>] dentry_lock_for_move+0xf3/0x130
#3: (&dentry->d_lock/2){+.+...}, at: [<ffffffff811a0734>] dentry_lock_for_move+0xc4/0x130
#4: (&dentry->d_lock/3){+.+...}, at: [<ffffffff811a0747>] dentry_lock_for_move+0xd7/0x130
Pid: 122, comm: udevd Not tainted 3.4.47-rt62 #7
Call Trace:
[<ffffffff810b9624>] __might_sleep+0x134/0x1f0
[<ffffffff817a24d4>] rt_spin_lock+0x24/0x60
[<ffffffff811a0c4c>] __d_shrink+0x5c/0xa0
[<ffffffff811a1b2d>] __d_drop+0x1d/0x40
[<ffffffff811a24be>] __d_move+0x8e/0x320
[<ffffffff811a278e>] d_move+0x3e/0x60
[<ffffffff81199598>] vfs_rename+0x198/0x4c0
[<ffffffff8119b093>] sys_renameat+0x213/0x240
[<ffffffff817a2de5>] ? _raw_spin_unlock+0x35/0x60
[<ffffffff8107781c>] ? do_page_fault+0x1ec/0x4b0
[<ffffffff817a32ca>] ? retint_swapgs+0xe/0x13
[<ffffffff813eb0e6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8119b0db>] sys_rename+0x1b/0x20
[<ffffffff817a3b96>] system_call_fastpath+0x1a/0x1f
Since we are only taking the lock during short lived list operations,
lets assume for now that it being raw won't be a significant latency
concern.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
[julia@ni.com: Use #define instead static inline to avoid false positive from
lockdep]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
85/251 [
Author: Clark Williams
Email: williams@redhat.com
Subject: fscache: initialize cookie hash table raw spinlocks
Date: Tue, 3 Jul 2018 13:34:30 -0500
The fscache cookie mechanism uses a hash table of hlist_bl_head structures. The
PREEMPT_RT patcheset adds a raw spinlock to this structure and so on PREEMPT_RT
the structures get used uninitialized, causing warnings about bad magic numbers
when spinlock debugging is turned on.
Use the init function for fscache cookies.
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
86/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: bring back explicit INIT_HLIST_BL_HEAD init
Date: Wed, 13 Sep 2017 12:32:34 +0200
Commit 3d375d78593c ("mm: update callers to use HASH_ZERO flag") removed
INIT_HLIST_BL_HEAD and uses the ZERO flag instead for the init. However
on RT we have also a spinlock which needs an init call so we can't use
that.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
87/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: use swait_queue instead of waitqueue
Date: Wed, 14 Sep 2016 14:35:49 +0200
__d_lookup_done() invokes wake_up_all() while holding a hlist_bl_lock()
which disables preemption. As a workaround convert it to swait.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
88/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: kconfig: Disable config options which are not RT compatible
Date: Sun, 24 Jul 2011 12:11:43 +0200
Disable stuff which is known to have issues on RT
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
89/251 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm: Allow only SLUB on RT
Date: Fri, 3 Jul 2009 08:44:03 -0500
Memory allocation disables interrupts as part of the allocation and freeing
process. For -RT it is important that this section remain short and don't
depend on the size of the request or an internal state of the memory allocator.
At the beginning the SLAB memory allocator was adopted for RT's needs and it
required substantial changes. Later, with the addition of the SLUB memory
allocator we adopted this one as well and the changes were smaller. More
important, due to the design of the SLUB allocator it performs better and its
worst case latency was smaller. In the end only SLUB remained supported.
Disable SLAB and SLOB on -RT. Only SLUB is adopted to -RT needs.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
90/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rcu: make RCU_BOOST default on RT
Date: Fri, 21 Mar 2014 20:19:05 +0100
Since it is no longer invoked from the softirq people run into OOM more
often if the priority of the RCU thread is too low. Making boosting
default on RT should help in those case and it can be switched off if
someone knows better.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
91/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Disable CONFIG_RT_GROUP_SCHED on RT
Date: Mon, 18 Jul 2011 17:03:52 +0200
Carsten reported problems when running:
taskset 01 chrt -f 1 sleep 1
from within rc.local on a F15 machine. The task stays running and
never gets on the run queue because some of the run queues have
rt_throttled=1 which does not go away. Works nice from a ssh login
shell. Disabling CONFIG_RT_GROUP_SCHED solves that as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
92/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/core: disable NET_RX_BUSY_POLL on RT
Date: Sat, 27 May 2017 19:02:06 +0200
napi_busy_loop() disables preemption and performs a NAPI poll. We can't acquire
sleeping locks with disabled preemption so we would have to work around this
and add explicit locking for synchronisation against ksoftirqd.
Without explicit synchronisation a low priority process would "own" the NAPI
state (by setting NAPIF_STATE_SCHED) and could be scheduled out (no
preempt_disable() and BH is preemptible on RT).
In case a network packages arrives then the interrupt handler would set
NAPIF_STATE_MISSED and the system would wait until the task owning the NAPI
would be scheduled in again.
Should a task with RT priority busy poll then it would consume the CPU instead
allowing tasks with lower priority to run.
The NET_RX_BUSY_POLL is disabled by default (the system wide sysctls for
poll/read are set to zero) so disable NET_RX_BUSY_POLL on RT to avoid wrong
locking context on RT. Should this feature be considered useful on RT systems
then it could be enabled again with proper locking and synchronisation.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
93/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: md: disable bcache
Date: Thu, 29 Aug 2013 11:48:57 +0200
It uses anon semaphores
|drivers/md/bcache/request.c: In function ‘cached_dev_write_complete’:
|drivers/md/bcache/request.c:1007:2: error: implicit declaration of function ‘up_read_non_owner’ [-Werror=implicit-function-declaration]
| up_read_non_owner(&dc->writeback_lock);
| ^
|drivers/md/bcache/request.c: In function ‘request_write’:
|drivers/md/bcache/request.c:1033:2: error: implicit declaration of function ‘down_read_non_owner’ [-Werror=implicit-function-declaration]
| down_read_non_owner(&dc->writeback_lock);
| ^
either we get rid of those or we have to introduce them…
Link: http://lkml.kernel.org/r/20130820111602.3cea203c@gandalf.local.home
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
94/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: efi: Disable runtime services on RT
Date: Thu, 26 Jul 2018 15:03:16 +0200
Based on meassurements the EFI functions get_variable /
get_next_variable take up to 2us which looks okay.
The functions get_time, set_time take around 10ms. Those 10ms are too
much. Even one ms would be too much.
Ard mentioned that SetVariable might even trigger larger latencies if
the firware will erase flash blocks on NOR.
The time-functions are used by efi-rtc and can be triggered during
runtimed (either via explicit read/write or ntp sync).
The variable write could be used by pstore.
These functions can be disabled without much of a loss. The poweroff /
reboot hooks may be provided by PSCI.
Disable EFI's runtime wrappers.
This was observed on "EFI v2.60 by SoftIron Overdrive 1000".
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
95/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: efi: Allow efi=runtime
Date: Thu, 26 Jul 2018 15:06:10 +0200
In case the command line option "efi=noruntime" is default at built-time, the user
could overwrite its state by `efi=runtime' and allow it again.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
96/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Disable HAVE_ARCH_JUMP_LABEL
Date: Mon, 1 Jul 2019 17:39:28 +0200
__text_poke() does:
| local_irq_save(flags);
…
| ptep = get_locked_pte(poking_mm, poking_addr, &ptl);
which does not work on -RT because the PTE-lock is a spinlock_t typed lock.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
97/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Add local irq locks
Date: Mon, 20 Jun 2011 09:03:47 +0200
Introduce locallock. For !RT this maps to preempt_disable()/
local_irq_disable() so there is not much that changes. For RT this will
map to a spinlock. This makes preemption possible and locked "ressource"
gets the lockdep anotation it wouldn't have otherwise. The locks are
recursive for owner == current. Also, all locks user migrate_disable()
which ensures that the task is not migrated to another CPU while the lock
is held and the owner is preempted.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
98/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: softirq: Add preemptible softirq
Date: Mon, 20 May 2019 13:09:08 +0200
Add preemptible softirq for RT's needs. By removing the softirq count
from the preempt counter, the softirq becomes preemptible. A per-CPU
lock ensures that there is no parallel softirq processing or that
per-CPU variables are not access in parallel by multiple threads.
local_bh_enable() will process all softirq work that has been raised in
its BH-disabled section once the BH counter gets to 0.
[+ rcu_read_lock() as part of local_bh_disable() by Scott Wood]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
99/251 [
Author: Oleg Nesterov
Email: oleg@redhat.com
Subject: signal/x86: Delay calling signals in atomic
Date: Tue, 14 Jul 2015 14:26:34 +0200
On x86_64 we must disable preemption before we enable interrupts
for stack faults, int3 and debugging, because the current task is using
a per CPU debug stack defined by the IST. If we schedule out, another task
can come in and use the same stack and cause the stack to be corrupted
and crash the kernel on return.
When CONFIG_PREEMPT_RT is enabled, spin_locks become mutexes, and
one of these is the spin lock used in signal handling.
Some of the debug code (int3) causes do_trap() to send a signal.
This function calls a spin lock that has been converted to a mutex
and has the possibility to sleep. If this happens, the above issues with
the corrupted stack is possible.
Instead of calling the signal right away, for PREEMPT_RT and x86_64,
the signal information is stored on the stacks task_struct and
TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
code will send the signal when preemption is enabled.
[ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: also needed on 32bit as per Yang Shi <yang.shi@linaro.org>]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
100/251 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: Split IRQ-off and zone->lock while freeing pages from PCP list #2
Date: Mon, 28 May 2018 15:24:21 +0200
Split the IRQ-off section while accessing the PCP list from zone->lock
while freeing pages.
Introcude isolate_pcp_pages() which separates the pages from the PCP
list onto a temporary list and then free the temporary list via
free_pcppages_bulk().
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
101/251 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: Split IRQ-off and zone->lock while freeing pages from PCP list #2
Date: Mon, 28 May 2018 15:24:21 +0200
Split the IRQ-off section while accessing the PCP list from zone->lock
while freeing pages.
Introcude isolate_pcp_pages() which separates the pages from the PCP
list onto a temporary list and then free the temporary list via
free_pcppages_bulk().
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
102/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/SLxB: change list_lock to raw_spinlock_t
Date: Mon, 28 May 2018 15:24:22 +0200
The list_lock is used with used with IRQs off on RT. Make it a raw_spinlock_t
otherwise the interrupts won't be disabled on -RT. The locking rules remain
the same on !RT.
This patch changes it for SLAB and SLUB since both share the same header
file for struct kmem_cache_node defintion.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
103/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/SLUB: delay giving back empty slubs to IRQ enabled regions
Date: Thu, 21 Jun 2018 17:29:19 +0200
__free_slab() is invoked with disabled interrupts which increases the
irq-off time while __free_pages() is doing the work.
Allow __free_slab() to be invoked with enabled interrupts and move
everything from interrupts-off invocations to a temporary per-CPU list
so it can be processed later.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
104/251 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm: page_alloc: rt-friendly per-cpu pages
Date: Fri, 3 Jul 2009 08:29:37 -0500
rt-friendly per-cpu pages: convert the irqs-off per-cpu locking
method into a preemptible, explicit-per-cpu-locks method.
Contains fixes from:
Peter Zijlstra <a.p.zijlstra@chello.nl>
Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
105/251 [
Author: Anna-Maria Gleixner
Email: anna-maria@linutronix.de
Subject: mm/page_alloc: Split drain_local_pages()
Date: Thu, 18 Apr 2019 11:09:04 +0200
Splitting the functionality of drain_local_pages() into a separate
function. This is a preparatory work for introducing the static key
dependend locking mechanism.
No functional change.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
106/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/swap: Add static key dependent pagevec locking
Date: Thu, 18 Apr 2019 11:09:05 +0200
The locking of struct pagevec is done by disabling preemption. In case the
struct has be accessed form interrupt context then interrupts are
disabled. This means the struct can only be accessed locally from the
CPU. There is also no lockdep coverage which would scream during if it
accessed from wrong context.
Create struct swap_pagevec which contains of a pagevec member and a
spin_lock_t. Introduce a static key, which changes the locking behavior
only if the key is set in the following way: Before the struct is accessed
the spin_lock has to be acquired instead of using preempt_disable(). Since
the struct is used CPU-locally there is no spinning on the lock but the
lock is acquired immediately. If the struct is accessed from interrupt
context, spin_lock_irqsave() is used.
No functional change yet because static key is not enabled.
[anna-maria: introduce static key]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
107/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/swap: Access struct pagevec remotely
Date: Thu, 18 Apr 2019 11:09:06 +0200
When the newly introduced static key would be enabled, struct pagevec is
locked during access. So it is possible to access it from a remote CPU. The
advantage is that the work can be done from the "requesting" CPU without
firing a worker on a remote CPU and waiting for it to complete the work.
No functional change because static key is not enabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
108/251 [
Author: Anna-Maria Gleixner
Email: anna-maria@linutronix.de
Subject: mm/swap: Enable "use_pvec_lock" nohz_full dependent
Date: Thu, 18 Apr 2019 11:09:07 +0200
When a system runs with CONFIG_NO_HZ_FULL enabled, the tick of CPUs listed
in 'nohz_full=' kernel command line parameter should be stopped whenever
possible. The tick stays longer stopped, when work for this CPU is handled
by another CPU.
With the already introduced static key 'use_pvec_lock' there is the
possibility to prevent firing a worker for mm/swap work on a remote CPU
with a stopped tick.
Therefore enabling the static key in case kernel command line parameter
'nohz_full=' setup was successful, which implies that CONFIG_NO_HZ_FULL is
set.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
109/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/swap: Enable use pvec lock on RT
Date: Mon, 12 Aug 2019 11:20:44 +0200
On RT we also need to avoid preempt disable/IRQ-off regions so have to enable
the locking while accessing pvecs.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
110/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: preempt: Provide preempt_*_(no)rt variants
Date: Fri, 24 Jul 2009 12:38:56 +0200
RT needs a few preempt_disable/enable points which are not necessary
otherwise. Implement variants to avoid #ifdeffery.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
111/251 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm/vmstat: Protect per cpu variables with preempt disable on RT
Date: Fri, 3 Jul 2009 08:30:13 -0500
Disable preemption on -RT for the vmstat code. On vanila the code runs in
IRQ-off regions while on -RT it is not. "preempt_disable" ensures that the
same ressources is not updated in parallel due to preemption.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
112/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm: Enable SLUB for RT
Date: Thu, 25 Oct 2012 10:32:35 +0100
Avoid the memory allocation in IRQ section
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: factor out everything except the kcalloc() workaorund ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
113/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: slub: Enable irqs for __GFP_WAIT
Date: Wed, 9 Jan 2013 12:08:15 +0100
SYSTEM_RUNNING might be too late for enabling interrupts. Allocations
with GFP_WAIT can happen before that. So use this as an indicator.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
114/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: slub: Disable SLUB_CPU_PARTIAL
Date: Wed, 15 Apr 2015 19:00:47 +0200
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
|in_atomic(): 1, irqs_disabled(): 0, pid: 87, name: rcuop/7
|1 lock held by rcuop/7/87:
| #0: (rcu_callback){......}, at: [<ffffffff8112c76a>] rcu_nocb_kthread+0x1ca/0x5d0
|Preemption disabled at:[<ffffffff811eebd9>] put_cpu_partial+0x29/0x220
|
|CPU: 0 PID: 87 Comm: rcuop/7 Tainted: G W 4.0.0-rt0+ #477
|Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
| 000000000007a9fc ffff88013987baf8 ffffffff817441c7 0000000000000007
| 0000000000000000 ffff88013987bb18 ffffffff810eee51 0000000000000000
| ffff88013fc10200 ffff88013987bb48 ffffffff8174a1c4 000000000007a9fc
|Call Trace:
| [<ffffffff817441c7>] dump_stack+0x4f/0x90
| [<ffffffff810eee51>] ___might_sleep+0x121/0x1b0
| [<ffffffff8174a1c4>] rt_spin_lock+0x24/0x60
| [<ffffffff811a689a>] __free_pages_ok+0xaa/0x540
| [<ffffffff811a729d>] __free_pages+0x1d/0x30
| [<ffffffff811eddd5>] __free_slab+0xc5/0x1e0
| [<ffffffff811edf46>] free_delayed+0x56/0x70
| [<ffffffff811eecfd>] put_cpu_partial+0x14d/0x220
| [<ffffffff811efc98>] __slab_free+0x158/0x2c0
| [<ffffffff811f0021>] kmem_cache_free+0x221/0x2d0
| [<ffffffff81204d0c>] file_free_rcu+0x2c/0x40
| [<ffffffff8112c7e3>] rcu_nocb_kthread+0x243/0x5d0
| [<ffffffff810e951c>] kthread+0xfc/0x120
| [<ffffffff8174abc8>] ret_from_fork+0x58/0x90
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
115/251 [
Author: Yang Shi
Email: yang.shi@windriver.com
Subject: mm/memcontrol: Don't call schedule_work_on in preemption disabled context
Date: Wed, 30 Oct 2013 11:48:33 -0700
The following trace is triggered when running ltp oom test cases:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 0, pid: 17188, name: oom03
Preemption disabled at:[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
CPU: 2 PID: 17188 Comm: oom03 Not tainted 3.10.10-rt3 #2
Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
ffff88007684d730 ffff880070df9b58 ffffffff8169918d ffff880070df9b70
ffffffff8106db31 ffff88007688b4a0 ffff880070df9b88 ffffffff8169d9c0
ffff88007688b4a0 ffff880070df9bc8 ffffffff81059da1 0000000170df9bb0
Call Trace:
[<ffffffff8169918d>] dump_stack+0x19/0x1b
[<ffffffff8106db31>] __might_sleep+0xf1/0x170
[<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50
[<ffffffff81059da1>] queue_work_on+0x61/0x100
[<ffffffff8112b361>] drain_all_stock+0xe1/0x1c0
[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
[<ffffffff8112beda>] __mem_cgroup_try_charge+0x41a/0xc40
[<ffffffff810f1c91>] ? release_pages+0x1b1/0x1f0
[<ffffffff8106f200>] ? sched_exec+0x40/0xb0
[<ffffffff8112cc87>] mem_cgroup_charge_common+0x37/0x70
[<ffffffff8112e2c6>] mem_cgroup_newpage_charge+0x26/0x30
[<ffffffff8110af68>] handle_pte_fault+0x618/0x840
[<ffffffff8103ecf6>] ? unpin_current_cpu+0x16/0x70
[<ffffffff81070f94>] ? migrate_enable+0xd4/0x200
[<ffffffff8110cde5>] handle_mm_fault+0x145/0x1e0
[<ffffffff810301e1>] __do_page_fault+0x1a1/0x4c0
[<ffffffff8169c9eb>] ? preempt_schedule_irq+0x4b/0x70
[<ffffffff8169e3b7>] ? retint_kernel+0x37/0x40
[<ffffffff8103053e>] do_page_fault+0xe/0x10
[<ffffffff8169e4c2>] page_fault+0x22/0x30
So, to prevent schedule_work_on from being called in preempt disabled context,
replace the pair of get/put_cpu() to get/put_cpu_light().
Signed-off-by: Yang Shi <yang.shi@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
116/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/memcontrol: Replace local_irq_disable with local locks
Date: Wed, 28 Jan 2015 17:14:16 +0100
There are a few local_irq_disable() which then take sleeping locks. This
patch converts them local locks.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
117/251 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: mm/zsmalloc: copy with get_cpu_var() and locking
Date: Tue, 22 Mar 2016 11:16:09 +0100
get_cpu_var() disables preemption and triggers a might_sleep() splat later.
This is replaced with get_locked_var().
This bitspinlocks are replaced with a proper mutex which requires a slightly
larger struct to allocate.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: replace the bitspin_lock() with a mutex, get_locked_var(). Mike then
fixed the size magic]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
118/251 [
Author: Luis Claudio R. Goncalves
Email: lclaudio@uudg.org
Subject: mm/zswap: Do not disable preemption in zswap_frontswap_store()
Date: Tue, 25 Jun 2019 11:28:04 -0300
Zswap causes "BUG: scheduling while atomic" by blocking on a rt_spin_lock() with
preemption disabled. The preemption is disabled by get_cpu_var() in
zswap_frontswap_store() to protect the access of the zswap_dstmem percpu variable.
Use get_locked_var() to protect the percpu zswap_dstmem variable, making the
code preemptive.
As get_cpu_ptr() also disables preemption, replace it by this_cpu_ptr() and
remove the counterpart put_cpu_ptr().
Steps to Reproduce:
1. # grubby --args "zswap.enabled=1" --update-kernel DEFAULT
2. # reboot
3. Calculate the amount o memory to be used by the test:
---> grep MemAvailable /proc/meminfo
---> Add 25% ~ 50% to that value
4. # stress --vm 1 --vm-bytes ${MemAvailable+25%} --timeout 240s
Usually, in less than 5 minutes the backtrace listed below appears, followed
by a kernel panic:
| BUG: scheduling while atomic: kswapd1/181/0x00000002
|
| Preemption disabled at:
| [<ffffffff8b2a6cda>] zswap_frontswap_store+0x21a/0x6e1
|
| Kernel panic - not syncing: scheduling while atomic
| CPU: 14 PID: 181 Comm: kswapd1 Kdump: loaded Not tainted 5.0.14-rt9 #1
| Hardware name: AMD Pence/Pence, BIOS WPN2321X_Weekly_12_03_21 03/19/2012
| Call Trace:
| panic+0x106/0x2a7
| __schedule_bug.cold+0x3f/0x51
| __schedule+0x5cb/0x6f0
| schedule+0x43/0xd0
| rt_spin_lock_slowlock_locked+0x114/0x2b0
| rt_spin_lock_slowlock+0x51/0x80
| zbud_alloc+0x1da/0x2d0
| zswap_frontswap_store+0x31a/0x6e1
| __frontswap_store+0xab/0x130
| swap_writepage+0x39/0x70
| pageout.isra.0+0xe3/0x320
| shrink_page_list+0xa8e/0xd10
| shrink_inactive_list+0x251/0x840
| shrink_node_memcg+0x213/0x770
| shrink_node+0xd9/0x450
| balance_pgdat+0x2d5/0x510
| kswapd+0x218/0x470
| kthread+0xfb/0x130
| ret_from_fork+0x27/0x50
Cc: stable-rt@vger.kernel.org
Reported-by: Ping Fang <pifang@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
119/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: radix-tree: use local locks
Date: Wed, 25 Jan 2017 16:34:27 +0100
The preload functionality uses per-CPU variables and preempt-disable to
ensure that it does not switch CPUs during its usage. This patch adds
local_locks() instead preempt_disable() for the same purpose and to
remain preemptible on -RT.
Cc: stable-rt@vger.kernel.org
Reported-and-debugged-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
120/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: kvm Require const tsc for RT
Date: Sun, 6 Nov 2011 12:26:18 +0100
Non constant TSC is a nightmare on bare metal already, but with
virtualization it becomes a complete disaster because the workarounds
are horrible latency wise. That's also a preliminary for running RT in
a guest on top of a RT host.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
121/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: pci/switchtec: Don't use completion's wait queue
Date: Wed, 4 Oct 2017 10:24:23 +0200
The poll callback is using completion's wait_queue_head_t member and
puts it in poll_wait() so the poll() caller gets a wakeup after command
completed. This does not work on RT because we don't have a
wait_queue_head_t in our completion implementation. Nobody in tree does
like that in tree so this is the only driver that breaks.
Instead of using the completion here is waitqueue with a status flag as
suggested by Logan.
I don't have the HW so I have no idea if it works as expected, so please
test it.
Cc: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
122/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: wait.h: include atomic.h
Date: Mon, 28 Oct 2013 12:19:57 +0100
| CC init/main.o
|In file included from include/linux/mmzone.h:9:0,
| from include/linux/gfp.h:4,
| from include/linux/kmod.h:22,
| from include/linux/module.h:13,
| from init/main.c:15:
|include/linux/wait.h: In function ‘wait_on_atomic_t’:
|include/linux/wait.h:982:2: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration]
| if (atomic_read(val) == 0)
| ^
This pops up on ARM. Non-RT gets its atomic.h include from spinlock.h
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
123/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: completion: Use simple wait queues
Date: Fri, 11 Jan 2013 11:23:51 +0100
Completions have no long lasting callbacks and therefor do not need
the complex waitqueue variant. Use simple waitqueues which reduces the
contention on the waitqueue lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[cminyard@mvista.com: Move __prepare_to_swait() into the do loop because
swake_up_locked() removes the waiter on wake from the queue while in the
original code it is not the case]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
124/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: hrtimer: Allow raw wakeups during boot
Date: Fri, 9 Aug 2019 15:25:21 +0200
There are a few wake-up timers during the early boot which are essencial for
the system to make progress. At this stage there are no softirq spawn for the
softirq processing so there is no timer processing in softirq.
The wakeup in question:
smpboot_create_thread()
-> kthread_create_on_cpu()
-> kthread_bind()
-> wait_task_inactive()
-> schedule_hrtimeout()
Let the timer fire in hardirq context during the system boot.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
125/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: hrtimer: move state change before hrtimer_cancel in do_nanosleep()
Date: Thu, 6 Dec 2018 10:15:13 +0100
There is a small window between setting t->task to NULL and waking the
task up (which would set TASK_RUNNING). So the timer would fire, run and
set ->task to NULL while the other side/do_nanosleep() wouldn't enter
freezable_schedule(). After all we are peemptible here (in
do_nanosleep() and on the timer wake up path) and on KVM/virt the
virt-CPU might get preempted.
So do_nanosleep() wouldn't enter freezable_schedule() but cancel the
timer which is still running and wait for it via
hrtimer_wait_for_timer(). Then wait_event()/might_sleep() would complain
that it is invoked with state != TASK_RUNNING.
This isn't a problem since it would be reset to TASK_RUNNING later
anyway and we don't rely on the previous state.
Move the state update to TASK_RUNNING before hrtimer_cancel() so there
are no complains from might_sleep() about wrong state.
Cc: stable-rt@vger.kernel.org
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
126/251 [
Author: John Stultz
Email: johnstul@us.ibm.com
Subject: posix-timers: Thread posix-cpu-timers on -rt
Date: Fri, 3 Jul 2009 08:29:58 -0500
posix-cpu-timer code takes non -rt safe locks in hard irq
context. Move it to a thread.
[ 3.0 fixes from Peter Zijlstra <peterz@infradead.org> ]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
127/251 [
Author: Anna-Maria Gleixner
Email: anna-maria@linutronix.de
Subject: posix-timers: Add expiry lock
Date: Mon, 27 May 2019 16:54:06 +0200
If a about to be removed posix timer is active then the code will retry the
delete operation until it succeeds / the timer callback completes.
Use hrtimer_grab_expiry_lock() for posix timers which use a hrtimer underneath
to spin on a lock until the callback finished.
Introduce cpu_timers_grab_expiry_lock() for the posix-cpu-timer. This will
acquire the proper per-CPU spin_lock which is acquired by the CPU which is
expirering the timer.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
[bigeasy: keep the posix-cpu timer bits, everything else got applied]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
128/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Limit the number of task migrations per batch
Date: Mon, 6 Jun 2011 12:12:51 +0200
Put an upper limit on the number of tasks which are migrated per batch
to avoid large latencies.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
129/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Move mmdrop to RCU on RT
Date: Mon, 6 Jun 2011 12:20:33 +0200
Takes sleeping locks and calls into the memory allocator, so nothing
we want to do in task switch and oder atomic contexts.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
130/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/sched: move stack + kprobe clean up to __put_task_struct()
Date: Mon, 21 Nov 2016 19:31:08 +0100
There is no need to free the stack before the task struct (except for reasons
mentioned in commit 68f24b08ee89 ("sched/core: Free the stack early if
CONFIG_THREAD_INFO_IN_TASK")). This also comes handy on -RT because we can't
free memory in preempt disabled region.
vfree_atomic() delays the memory cleanup to a worker. Since we move everything
to the RCU callback, we can also free it immediately.
Cc: stable-rt@vger.kernel.org #for kprobe_flush_task()
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
131/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add saved_state for tasks blocked on sleeping locks
Date: Sat, 25 Jun 2011 09:21:04 +0200
Spinlocks are state preserving in !RT. RT changes the state when a
task gets blocked on a lock. So we need to remember the state before
the lock contention. If a regular wakeup (not a RTmutex related
wakeup) happens, the saved_state is updated to running. When the lock
sleep is done, the saved state is restored.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
132/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Do not account rcu_preempt_depth on RT in might_sleep()
Date: Tue, 7 Jun 2011 09:19:06 +0200
RT changes the rcu_preempt_depth semantics, so we cannot check for it
in might_sleep().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
133/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Disable TTWU_QUEUE on RT
Date: Tue, 13 Sep 2011 16:42:35 +0200
The queued remote wakeup mechanism can introduce rather large
latencies if the number of migrated tasks is high. Disable it for RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
134/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: softirq: Avoid a cancel dead-lock in tasklet handling due to preemptible-softirq
Date: Sat, 22 Jun 2019 00:09:22 +0200
A pending / active tasklet which is preempted by a task on the same CPU
will spin indefinitely because the tasklet makes no progress.
To avoid this deadlock we can disable BH which will acquire the
softirq-lock which will force the completion of the softirq and so the
tasklet.
The BH off/on in tasklet_kill() will force tasklets which are not yet
running but scheduled (because ksoftirqd was preempted before it could
start the tasklet).
The BH off/on in tasklet_unlock_wait() will force tasklets which got
preempted while running.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
135/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Check preemption after reenabling interrupts
Date: Sun, 13 Nov 2011 17:17:09 +0100
raise_softirq_irqoff() disables interrupts and wakes the softirq
daemon, but after reenabling interrupts there is no preemption check,
so the execution of the softirq thread might be delayed arbitrarily.
In principle we could add that check to local_irq_enable/restore, but
that's overkill as the rasie_softirq_irqoff() sections are the only
ones which show this behaviour.
Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
136/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Disable softirq stacks for RT
Date: Mon, 18 Jul 2011 13:59:17 +0200
Disable extra stacks for softirqs. We want to preempt softirqs and
having them on special IRQ-stack does not make this easier.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
137/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/core: use local_bh_disable() in netif_rx_ni()
Date: Fri, 16 Jun 2017 19:03:16 +0200
In 2004 netif_rx_ni() gained a preempt_disable() section around
netif_rx() and its do_softirq() + testing for it. The do_softirq() part
is required because netif_rx() raises the softirq but does not invoke
it. The preempt_disable() is required to remain on the same CPU which added the
skb to the per-CPU list.
All this can be avoided be putting this into a local_bh_disable()ed
section. The local_bh_enable() part will invoke do_softirq() if
required.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
138/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Handle the various new futex race conditions
Date: Fri, 10 Jun 2011 11:04:15 +0200
RT opens a few new interesting race conditions in the rtmutex/futex
combo due to futex hash bucket lock being a 'sleeping' spinlock and
therefor not disabling preemption.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
139/251 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: futex: Fix bug on when a requeued RT task times out
Date: Tue, 14 Jul 2015 14:26:34 +0200
Requeue with timeout causes a bug with PREEMPT_RT.
The bug comes from a timed out condition.
TASK 1 TASK 2
------ ------
futex_wait_requeue_pi()
futex_wait_queue_me()
<timed out>
double_lock_hb();
raw_spin_lock(pi_lock);
if (current->pi_blocked_on) {
} else {
current->pi_blocked_on = PI_WAKE_INPROGRESS;
run_spin_unlock(pi_lock);
spin_lock(hb->lock); <-- blocked!
plist_for_each_entry_safe(this) {
rt_mutex_start_proxy_lock();
task_blocks_on_rt_mutex();
BUG_ON(task->pi_blocked_on)!!!!
The BUG_ON() actually has a check for PI_WAKE_INPROGRESS, but the
problem is that, after TASK 1 sets PI_WAKE_INPROGRESS, it then tries to
grab the hb->lock, which it fails to do so. As the hb->lock is a mutex,
it will block and set the "pi_blocked_on" to the hb->lock.
When TASK 2 goes to requeue it, the check for PI_WAKE_INPROGESS fails
because the task1's pi_blocked_on is no longer set to that, but instead,
set to the hb->lock.
The fix:
When calling rt_mutex_start_proxy_lock() a check is made to see
if the proxy tasks pi_blocked_on is set. If so, exit out early.
Otherwise set it to a new flag PI_REQUEUE_INPROGRESS, which notifies
the proxy task that it is being requeued, and will handle things
appropriately.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
140/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: futex: Ensure lock/unlock symetry versus pi_lock and hash bucket lock
Date: Fri, 1 Mar 2013 11:17:42 +0100
In exit_pi_state_list() we have the following locking construct:
spin_lock(&hb->lock);
raw_spin_lock_irq(&curr->pi_lock);
...
spin_unlock(&hb->lock);
In !RT this works, but on RT the migrate_enable() function which is
called from spin_unlock() sees atomic context due to the held pi_lock
and just decrements the migrate_disable_atomic counter of the
task. Now the next call to migrate_disable() sees the counter being
negative and issues a warning. That check should be in
migrate_enable() already.
Fix this by dropping pi_lock before unlocking hb->lock and reaquire
pi_lock after that again. This is safe as the loop code reevaluates
head again under the pi_lock.
Reported-by: Yong Zhang <yong.zhang@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
141/251 [
Author: Grygorii Strashko
Email: Grygorii.Strashko@linaro.org
Subject: pid.h: include atomic.h
Date: Tue, 21 Jul 2015 19:43:56 +0300
This patch fixes build error:
CC kernel/pid_namespace.o
In file included from kernel/pid_namespace.c:11:0:
include/linux/pid.h: In function 'get_pid':
include/linux/pid.h:78:3: error: implicit declaration of function 'atomic_inc' [-Werror=implicit-function-declaration]
atomic_inc(&pid->count);
^
which happens when
CONFIG_PROVE_LOCKING=n
CONFIG_DEBUG_SPINLOCK=n
CONFIG_DEBUG_MUTEXES=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_PID_NS=y
Vanilla gets this via spinlock.h.
Signed-off-by: Grygorii Strashko <Grygorii.Strashko@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
142/251 [
Author: Wolfgang M. Reimer
Email: linuxball@gmail.com
Subject: locking: locktorture: Do NOT include rwlock.h directly
Date: Tue, 21 Jul 2015 16:20:07 +0200
Including rwlock.h directly will cause kernel builds to fail
if CONFIG_PREEMPT_RT is defined. The correct header file
(rwlock_rt.h OR rwlock.h) will be included by spinlock.h which
is included by locktorture.c anyway.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Wolfgang M. Reimer <linuxball@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
143/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Add rtmutex_lock_killable()
Date: Thu, 9 Jun 2011 11:43:52 +0200
Add "killable" type to rtmutex. We need this since rtmutex are used as
"normal" mutexes which do use this type.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
144/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Make lock_killable work
Date: Sat, 1 Apr 2017 12:50:59 +0200
Locking an rt mutex killable does not work because signal handling is
restricted to TASK_INTERRUPTIBLE.
Use signal_pending_state() unconditionaly.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
145/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: spinlock: Split the lock types header
Date: Wed, 29 Jun 2011 19:34:01 +0200
Split raw_spinlock into its own file and the remaining spinlock_t into
its own non-RT header. The non-RT header will be replaced later by sleeping
spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
146/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Avoid include hell
Date: Wed, 29 Jun 2011 20:06:39 +0200
Include only the required raw types. This avoids pulling in the
complete spinlock header which in turn requires rtmutex.h at some point.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
147/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rbtree: don't include the rcu header
Date: Mon, 13 Jan 2020 12:11:23 -0500
The RCU header pulls in spinlock.h and fails due not yet defined types:
|In file included from include/linux/spinlock.h:275:0,
| from include/linux/rcupdate.h:38,
| from include/linux/rbtree.h:34,
| from include/linux/rtmutex.h:17,
| from include/linux/spinlock_types.h:18,
| from kernel/bounds.c:13:
|include/linux/rwlock_rt.h:16:38: error: unknown type name ‘rwlock_t’
| extern void __lockfunc rt_write_lock(rwlock_t *rwlock);
| ^
This patch moves the required RCU function from the rcupdate.h header file into
a new header file which can be included by both users.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
148/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Provide rt_mutex_slowlock_locked()
Date: Thu, 12 Oct 2017 16:14:22 +0200
This is the inner-part of rt_mutex_slowlock(), required for rwsem-rt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
149/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: export lockdep-less version of rt_mutex's lock, trylock and unlock
Date: Thu, 12 Oct 2017 16:36:39 +0200
Required for lock implementation ontop of rtmutex.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
150/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: add sleeping lock implementation
Date: Thu, 12 Oct 2017 17:11:19 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
151/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Use the proper LOCK_OFFSET for cond_resched()
Date: Sun, 17 Jul 2011 22:51:33 +0200
RT does not increment preempt count when a 'sleeping' spinlock is
locked. Update PREEMPT_LOCK_OFFSET for that case.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
152/251 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: locking/rtmutex: Clean ->pi_blocked_on in the error case
Date: Mon, 30 Sep 2019 18:15:44 +0200
The function rt_mutex_wait_proxy_lock() cleans ->pi_blocked_on in case
of failure (timeout, signal). The same cleanup is required in
__rt_mutex_start_proxy_lock().
In both the cases the tasks was interrupted by a signal or timeout while
acquiring the lock and after the interruption it longer blocks on the
lock.
Fixes: 1a1fb985f2e2b ("futex: Handle early deadlock return correctly")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
153/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rtmutex: trylock is okay on -RT
Date: Wed, 2 Dec 2015 11:34:07 +0100
non-RT kernel could deadlock on rt_mutex_trylock() in softirq context. On
-RT we don't run softirqs in IRQ context but in thread context so it is
not a issue here.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
154/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: add mutex implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:17:03 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
155/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: add rwsem implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:28:34 +0200
The RT specific R/W semaphore implementation restricts the number of readers
to one because a writer cannot block on multiple readers and inherit its
priority or budget.
The single reader restricting is painful in various ways:
- Performance bottleneck for multi-threaded applications in the page fault
path (mmap sem)
- Progress blocker for drivers which are carefully crafted to avoid the
potential reader/writer deadlock in mainline.
The analysis of the writer code pathes shows, that properly written RT tasks
should not take them. Syscalls like mmap(), file access which take mmap sem
write locked have unbound latencies which are completely unrelated to mmap
sem. Other R/W sem users like graphics drivers are not suitable for RT tasks
either.
So there is little risk to hurt RT tasks when the RT rwsem implementation is
changed in the following way:
- Allow concurrent readers
- Make writers block until the last reader left the critical section. This
blocking is not subject to priority/budget inheritance.
- Readers blocked on a writer inherit their priority/budget in the normal
way.
There is a drawback with this scheme. R/W semaphores become writer unfair
though the applications which have triggered writer starvation (mostly on
mmap_sem) in the past are not really the typical workloads running on a RT
system. So while it's unlikely to hit writer starvation, it's possible. If
there are unexpected workloads on RT systems triggering it, we need to rethink
the approach.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
156/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: add rwlock implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:18:06 +0200
The implementation is bias-based, similar to the rwsem implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
157/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: wire up RT's locking
Date: Thu, 12 Oct 2017 17:31:14 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
158/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rtmutex: add ww_mutex addon for mutex-rt
Date: Thu, 12 Oct 2017 17:34:38 +0200
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
159/251 [
Author: Mikulas Patocka
Email: mpatocka@redhat.com
Subject: locking/rt-mutex: fix deadlock in device mapper / block-IO
Date: Mon, 13 Nov 2017 12:56:53 -0500
When some block device driver creates a bio and submits it to another
block device driver, the bio is added to current->bio_list (in order to
avoid unbounded recursion).
However, this queuing of bios can cause deadlocks, in order to avoid them,
device mapper registers a function flush_current_bio_list. This function
is called when device mapper driver blocks. It redirects bios queued on
current->bio_list to helper workqueues, so that these bios can proceed
even if the driver is blocked.
The problem with CONFIG_PREEMPT_RT is that when the device mapper
driver blocks, it won't call flush_current_bio_list (because
tsk_is_pi_blocked returns true in sched_submit_work), so deadlocks in
block device stack can happen.
Note that we can't call blk_schedule_flush_plug if tsk_is_pi_blocked
returns true - that would cause
BUG_ON(rt_mutex_real_waiter(task->pi_blocked_on)) in
task_blocks_on_rt_mutex when flush_current_bio_list attempts to take a
spinlock.
So the proper fix is to call blk_schedule_flush_plug in rt_mutex_fastlock,
when fast acquire failed and when the task is about to block.
CC: stable-rt@vger.kernel.org
[bigeasy: The deadlock is not device-mapper specific, it can also occur
in plain EXT4]
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
160/251 [
Author: Scott Wood
Email: swood@redhat.com
Subject: locking/rt-mutex: Flush block plug on __down_read()
Date: Fri, 4 Jan 2019 15:33:21 -0500
__down_read() bypasses the rtmutex frontend to call
rt_mutex_slowlock_locked() directly, and thus it needs to call
blk_schedule_flush_flug() itself.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
161/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: re-init the wait_lock in rt_mutex_init_proxy_locked()
Date: Thu, 16 Nov 2017 16:48:48 +0100
We could provide a key-class for the lockdep (and fixup all callers) or
move the init to all callers (like it was) in order to avoid lockdep
seeing a double-lock of the wait_lock.
Reported-by: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
162/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ptrace: fix ptrace vs tasklist_lock race
Date: Thu, 29 Aug 2013 18:21:04 +0200
As explained by Alexander Fyodorov <halcy@yandex.ru>:
|read_lock(&tasklist_lock) in ptrace_stop() is converted to mutex on RT kernel,
|and it can remove __TASK_TRACED from task->state (by moving it to
|task->saved_state). If parent does wait() on child followed by a sys_ptrace
|call, the following race can happen:
|
|- child sets __TASK_TRACED in ptrace_stop()
|- parent does wait() which eventually calls wait_task_stopped() and returns
| child's pid
|- child blocks on read_lock(&tasklist_lock) in ptrace_stop() and moves
| __TASK_TRACED flag to saved_state
|- parent calls sys_ptrace, which calls ptrace_check_attach() and wait_task_inactive()
The patch is based on his initial patch where an additional check is
added in case the __TASK_TRACED moved to ->saved_state. The pi_lock is
taken in case the caller is interrupted between looking into ->state and
->saved_state.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
163/251 [
Author: Scott Wood
Email: swood@redhat.com
Subject: sched: __set_cpus_allowed_ptr(): Check cpus_mask, not cpus_ptr
Date: Sat, 27 Jul 2019 00:56:32 -0500
This function is concerned with the long-term cpu mask, not the
transitory mask the task might have while migrate disabled. Before
this patch, if a task was migrate disabled at the time
__set_cpus_allowed_ptr() was called, and the new mask happened to be
equal to the cpu that the task was running on, then the mask update
would be lost.
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
164/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/sched/core: add migrate_disable()
Date: Sat, 27 May 2017 19:02:06 +0200
[bristot@redhat.com: rt: Increase/decrease the nr of migratory tasks when enabling/disabling migration
Link: https://lkml.kernel.org/r/e981d271cbeca975bca710e2fbcc6078c09741b0.1498482127.git.bristot@redhat.com
]
[swood@redhat.com: fixups and optimisations
Link:https://lkml.kernel.org/r/20190727055638.20443-1-swood@redhat.com
Link:https://lkml.kernel.org/r/20191012065214.28109-1-swood@redhat.com
]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
165/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: sched/core: migrate_enable() must access takedown_cpu_task on !HOTPLUG_CPU
Date: Fri, 29 Nov 2019 17:24:55 +0100
The variable takedown_cpu_task is never declared/used on !HOTPLUG_CPU
except for migrate_enable(). This leads to a link error.
Don't use takedown_cpu_task in !HOTPLUG_CPU.
Reported-by: Dick Hollenbeck <dick@softplc.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
166/251 [
Author: Scott Wood
Email: swood@redhat.com
Subject: sched: migrate_enable: Use stop_one_cpu_nowait()
Date: Sat, 12 Oct 2019 01:52:14 -0500
migrate_enable() can be called with current->state != TASK_RUNNING.
Avoid clobbering the existing state by using stop_one_cpu_nowait().
Since we're stopping the current cpu, we know that we won't get
past __schedule() until migration_cpu_stop() has run (at least up to
the point of migrating us to another cpu).
Signed-off-by: Scott Wood <swood@redhat.com>
[bigeasy: spin until the request has been processed]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
167/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: trace: Add migrate-disabled counter to tracing output
Date: Sun, 17 Jul 2011 21:56:42 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
168/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: futex: workaround migrate_disable/enable in different context
Date: Wed, 8 Mar 2017 14:23:35 +0100
migrate_enable() invokes __schedule() and it expects a preempt count of one.
Holding a raw_spinlock_t with disabled interrupts should not allow scheduling.
These little hacks ensure that we don't schedule while we lock the hb lockwith
interrupts enabled and unlock it with interrupts disabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[XXX: As per PeterZ suggesstion
set_thread_flag(TIF_NEED_RESCHED); preempt_fold_need_resched()
would trigger a scheduler invocation on the last preempt_enable() which in
turn would allow to drop this.
]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
169/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking: don't check for __LINUX_SPINLOCK_TYPES_H on -RT archs
Date: Fri, 4 Aug 2017 17:40:42 +0200
Upstream uses arch_spinlock_t within spinlock_t and requests that
spinlock_types.h header file is included first.
On -RT we have the rt_mutex with its raw_lock wait_lock which needs
architectures' spinlock_types.h header file for its definition. However
we need rt_mutex first because it is used to build the spinlock_t so
that check does not work for us.
Therefore I am dropping that check.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
170/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking: Make spinlock_t and rwlock_t a RCU section on RT
Date: Tue, 19 Nov 2019 09:25:04 +0100
On !RT a locked spinlock_t and rwlock_t disables preemption which
implies a RCU read section. There is code that relies on that behaviour.
Add an explicit RCU read section on RT while a sleeping lock (a lock
which would disables preemption on !RT) acquired.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
171/251 [
Author: Scott Wood
Email: swood@redhat.com
Subject: rcu: Use rcuc threads on PREEMPT_RT as we did
Date: Wed, 11 Sep 2019 17:57:28 +0100
While switching to the reworked RCU-thread code, it has been forgotten
to enable the thread processing on -RT.
Besides restoring behavior that used to be default on RT, this avoids
a deadlock on scheduler locks.
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
172/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: srcu: replace local_irqsave() with a locallock
Date: Thu, 12 Oct 2017 18:37:12 +0200
There are two instances which disable interrupts in order to become a
stable this_cpu_ptr() pointer. The restore part is coupled with
spin_unlock_irqrestore() which does not work on RT.
Replace the local_irq_save() call with the appropriate local_lock()
version of it.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
173/251 [
Author: Julia Cartwright
Email: julia@ni.com
Subject: rcu: enable rcu_normal_after_boot by default for RT
Date: Wed, 12 Oct 2016 11:21:14 -0500
The forcing of an expedited grace period is an expensive and very
RT-application unfriendly operation, as it forcibly preempts all running
tasks on CPUs which are preventing the gp from expiring.
By default, as a policy decision, disable the expediting of grace
periods (after boot) on configurations which enable PREEMPT_RT.
Suggested-by: Luiz Capitulino <lcapitulino@redhat.com>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
174/251 [
Author: Scott Wood
Email: swood@redhat.com
Subject: rcutorture: Avoid problematic critical section nesting on RT
Date: Wed, 11 Sep 2019 17:57:29 +0100
rcutorture was generating some nesting scenarios that are not
reasonable. Constrain the state selection to avoid them.
Example #1:
1. preempt_disable()
2. local_bh_disable()
3. preempt_enable()
4. local_bh_enable()
On PREEMPT_RT, BH disabling takes a local lock only when called in
non-atomic context. Thus, atomic context must be retained until after BH
is re-enabled. Likewise, if BH is initially disabled in non-atomic
context, it cannot be re-enabled in atomic context.
Example #2:
1. rcu_read_lock()
2. local_irq_disable()
3. rcu_read_unlock()
4. local_irq_enable()
If the thread is preempted between steps 1 and 2,
rcu_read_unlock_special.b.blocked will be set, but it won't be
acted on in step 3 because IRQs are disabled. Thus, reporting of the
quiescent state will be delayed beyond the local_irq_enable().
For now, these scenarios will continue to be tested on non-PREEMPT_RT
kernels, until debug checks are added to ensure that they are not
happening elsewhere.
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
175/251 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: rt: Improve the serial console PASS_LIMIT
Date: Wed, 14 Dec 2011 13:05:54 +0100
Beyond the warning:
drivers/tty/serial/8250/8250.c:1613:6: warning: unused variable ‘pass_counter’ [-Wunused-variable]
the solution of just looping infinitely was ugly - up it to 1 million to
give it a chance to continue in some really ugly situation.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
176/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs/epoll: Do not disable preemption on RT
Date: Fri, 8 Jul 2011 16:35:35 +0200
ep_call_nested() takes a sleeping lock so we can't disable preemption.
The light version is enough since ep_call_nested() doesn't mind beeing
invoked twice on the same CPU.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
177/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/vmalloc: Another preempt disable region which sucks
Date: Tue, 12 Jul 2011 11:39:36 +0200
Avoid the preempt disable version of get_cpu_var(). The inner-lock should
provide enough serialisation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
178/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block/mq: do not invoke preempt_disable()
Date: Tue, 14 Jul 2015 14:26:34 +0200
preempt_disable() and get_cpu() don't play well together with the sleeping
locks it tries to allocate later.
It seems to be enough to replace it with get_cpu_light() and migrate_disable().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
179/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block/mq: don't complete requests via IPI
Date: Thu, 29 Jan 2015 15:10:08 +0100
The IPI runs in hardirq context and there are sleeping locks. Assume caches are
shared and complete them on the local CPU.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
180/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: md: raid5: Make raid5_percpu handling RT aware
Date: Tue, 6 Apr 2010 16:51:31 +0200
__raid_run_ops() disables preemption with get_cpu() around the access
to the raid5_percpu variables. That causes scheduling while atomic
spews on RT.
Serialize the access to the percpu data with a lock and keep the code
preemptible.
Reported-by: Udo van den Heuvel <udovdh@xs4all.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Udo van den Heuvel <udovdh@xs4all.nl>
]
181/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: scsi/fcoe: Make RT aware.
Date: Sat, 12 Nov 2011 14:00:48 +0100
Do not disable preemption while taking sleeping locks. All user look safe
for migrate_diable() only.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
182/251 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: sunrpc: Make svc_xprt_do_enqueue() use get_cpu_light()
Date: Wed, 18 Feb 2015 16:05:28 +0100
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
|in_atomic(): 1, irqs_disabled(): 0, pid: 3194, name: rpc.nfsd
|Preemption disabled at:[<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
|CPU: 6 PID: 3194 Comm: rpc.nfsd Not tainted 3.18.7-rt1 #9
|Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014
| ffff880409630000 ffff8800d9a33c78 ffffffff815bdeb5 0000000000000002
| 0000000000000000 ffff8800d9a33c98 ffffffff81073c86 ffff880408dd6008
| ffff880408dd6000 ffff8800d9a33cb8 ffffffff815c3d84 ffff88040b3ac000
|Call Trace:
| [<ffffffff815bdeb5>] dump_stack+0x4f/0x9e
| [<ffffffff81073c86>] __might_sleep+0xe6/0x150
| [<ffffffff815c3d84>] rt_spin_lock+0x24/0x50
| [<ffffffffa06beec0>] svc_xprt_do_enqueue+0x80/0x230 [sunrpc]
| [<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
| [<ffffffffa06c03ed>] svc_add_new_perm_xprt+0x6d/0x80 [sunrpc]
| [<ffffffffa06b2693>] svc_addsock+0x143/0x200 [sunrpc]
| [<ffffffffa072e69c>] write_ports+0x28c/0x340 [nfsd]
| [<ffffffffa072d2ac>] nfsctl_transaction_write+0x4c/0x80 [nfsd]
| [<ffffffff8117ee83>] vfs_write+0xb3/0x1d0
| [<ffffffff8117f889>] SyS_write+0x49/0xb0
| [<ffffffff815c4556>] system_call_fastpath+0x16/0x1b
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
183/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Introduce cpu_chill()
Date: Wed, 7 Mar 2012 20:51:03 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill()
defaults to cpu_relax() for non RT. On RT it puts the looping task to
sleep for a tick so the preempted task can make progress.
Steven Rostedt changed it to use a hrtimer instead of msleep():
|
|Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken
|up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is
|called from softirq context, it may block the ksoftirqd() from running, in
|which case, it may never wake up the msleep() causing the deadlock.
+ bigeasy later changed to schedule_hrtimeout()
|If a task calls cpu_chill() and gets woken up by a regular or spurious
|wakeup and has a signal pending, then it exits the sleep loop in
|do_nanosleep() and sets up the restart block. If restart->nanosleep.type is
|not TI_NONE then this results in accessing a stale user pointer from a
|previously interrupted syscall and a copy to user based on the stale
|pointer or a BUG() when 'type' is not supported in nanosleep_copyout().
+ bigeasy: add PF_NOFREEZE:
| [....] Waiting for /dev to be fully populated...
| =====================================
| [ BUG: udevd/229 still has locks held! ]
| 3.12.11-rt17 #23 Not tainted
| -------------------------------------
| 1 lock held by udevd/229:
| #0: (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98
|
| stack backtrace:
| CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23
| (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14)
| (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc)
| (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160)
| (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110)
| (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38)
| (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec)
| (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c)
| (dput+0x74/0x15c) from (lookup_real+0x4c/0x50)
| (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44)
| (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98)
| (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc)
| (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60)
| (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c)
| (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c)
| (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94)
| (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30)
| (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
184/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: block: Use cpu_chill() for retry loops
Date: Thu, 20 Dec 2012 18:28:26 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Steven also observed a live lock when there was a
concurrent priority boosting going on.
Use cpu_chill() instead of cpu_relax() to let the system
make progress.
[bigeasy: After all those changes that occured over the years, this one hunk is
left and should not cause any starvation on -RT anymore]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
185/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs: namespace: Use cpu_chill() in trylock loops
Date: Wed, 7 Mar 2012 21:00:34 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
186/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Use cpu_chill() instead of cpu_relax()
Date: Wed, 7 Mar 2012 21:10:04 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
187/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: debugobjects: Make RT aware
Date: Sun, 17 Jul 2011 21:41:35 +0200
Avoid filling the pool / allocating memory with irqs off().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
188/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Use skbufhead with raw lock
Date: Tue, 12 Jul 2011 15:38:34 +0200
Use the rps lock as rawlock so we can keep irq-off regions. It looks low
latency. However we can't kfree() from this context therefore we defer this
to the softirq and use the tofree_queue list for it (similar to process_queue).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
189/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: dev: always take qdisc's busylock in __dev_xmit_skb()
Date: Wed, 30 Mar 2016 13:36:29 +0200
The root-lock is dropped before dev_hard_start_xmit() is invoked and after
setting the __QDISC___STATE_RUNNING bit. If this task is now pushed away
by a task with a higher priority then the task with the higher priority
won't be able to submit packets to the NIC directly instead they will be
enqueued into the Qdisc. The NIC will remain idle until the task(s) with
higher priority leave the CPU and the task with lower priority gets back
and finishes the job.
If we take always the busylock we ensure that the RT task can boost the
low-prio task and submit the packet.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
190/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: irqwork: push most work into softirq context
Date: Tue, 23 Jun 2015 15:32:51 +0200
Initially we defered all irqwork into softirq because we didn't want the
latency spikes if perf or another user was busy and delayed the RT task.
The NOHZ trigger (nohz_full_kick_work) was the first user that did not work
as expected if it did not run in the original irqwork context so we had to
bring it back somehow for it. push_irq_work_func is the second one that
requires this.
This patch adds the IRQ_WORK_HARD_IRQ which makes sure the callback runs
in raw-irq context. Everything else is defered into softirq context. Without
-RT we have the orignal behavior.
This patch incorporates tglx orignal work which revoked a little bringing back
the arch_irq_work_raise() if possible and a few fixes from Steven Rostedt and
Mike Galbraith,
[bigeasy: melt tglx's irq_work_tick_soft() which splits irq_work_tick() into a
hard and soft variant]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
191/251 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: x86: crypto: Reduce preempt disabled regions
Date: Mon, 14 Nov 2011 18:19:27 +0100
Restrict the preempt disabled regions to the actual floating point
operations and enable preemption for the administrative actions.
This is necessary on RT to avoid that kfree and other operations are
called with preemption disabled.
Reported-and-tested-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
192/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: Reduce preempt disabled regions, more algos
Date: Fri, 21 Feb 2014 17:24:04 +0100
Don Estabrook reported
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2462 migrate_enable+0x17b/0x200()
| kernel: WARNING: CPU: 3 PID: 865 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
and his backtrace showed some crypto functions which looked fine.
The problem is the following sequence:
glue_xts_crypt_128bit()
{
blkcipher_walk_virt(); /* normal migrate_disable() */
glue_fpu_begin(); /* get atomic */
while (nbytes) {
__glue_xts_crypt_128bit();
blkcipher_walk_done(); /* with nbytes = 0, migrate_enable()
* while we are atomic */
};
glue_fpu_end() /* no longer atomic */
}
and this is why the counter get out of sync and the warning is printed.
The other problem is that we are non-preemptible between
glue_fpu_begin() and glue_fpu_end() and the latency grows. To fix this,
I shorten the FPU off region and ensure blkcipher_walk_done() is called
with preemption enabled. This might hurt the performance because we now
enable/disable the FPU state more often but we gain lower latency and
the bug is gone.
Reported-by: Don Estabrook <don.estabrook@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
193/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: limit more FPU-enabled sections
Date: Thu, 30 Nov 2017 13:40:10 +0100
Those crypto drivers use SSE/AVX/… for their crypto work and in order to
do so in kernel they need to enable the "FPU" in kernel mode which
disables preemption.
There are two problems with the way they are used:
- the while loop which processes X bytes may create latency spikes and
should be avoided or limited.
- the cipher-walk-next part may allocate/free memory and may use
kmap_atomic().
The whole kernel_fpu_begin()/end() processing isn't probably that cheap.
It most likely makes sense to process as much of those as possible in one
go. The new *_fpu_sched_rt() schedules only if a RT task is pending.
Probably we should measure the performance those ciphers in pure SW
mode and with this optimisations to see if it makes sense to keep them
for RT.
This kernel_fpu_resched() makes the code more preemptible which might hurt
performance.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
194/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: cryptd - add a lock instead preempt_disable/local_bh_disable
Date: Thu, 26 Jul 2018 18:52:00 +0200
cryptd has a per-CPU lock which protected with local_bh_disable() and
preempt_disable().
Add an explicit spin_lock to make the locking context more obvious and
visible to lockdep. Since it is a per-CPU lock, there should be no lock
contention on the actual spinlock.
There is a small race-window where we could be migrated to another CPU
after the cpu_queue has been obtain. This is not a problem because the
actual ressource is protected by the spinlock.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
195/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: panic: skip get_random_bytes for RT_FULL in init_oops_id
Date: Tue, 14 Jul 2015 14:26:34 +0200
Disable on -RT. If this is invoked from irq-context we will have problems
to acquire the sleeping lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
196/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: stackprotector: Avoid random pool on rt
Date: Thu, 16 Dec 2010 14:25:18 +0100
CPU bringup calls into the random pool to initialize the stack
canary. During boot that works nicely even on RT as the might sleep
checks are disabled. During CPU hotplug the might sleep checks
trigger. Making the locks in random raw is a major PITA, so avoid the
call on RT is the only sensible solution. This is basically the same
randomness which we get during boot where the random pool has no
entropy and we rely on the TSC randomnness.
Reported-by: Carsten Emde <carsten.emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
197/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: random: Make it work on rt
Date: Tue, 21 Aug 2012 20:38:50 +0200
Delegate the random insertion to the forced threaded interrupt
handler. Store the return IP of the hard interrupt handler in the irq
descriptor and feed it into the random generator as a source of
entropy.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
198/251 [
Author: Priyanka Jain
Email: Priyanka.Jain@freescale.com
Subject: net: Remove preemption disabling in netif_rx()
Date: Thu, 17 May 2012 09:35:11 +0530
1)enqueue_to_backlog() (called from netif_rx) should be
bind to a particluar CPU. This can be achieved by
disabling migration. No need to disable preemption
2)Fixes crash "BUG: scheduling while atomic: ksoftirqd"
in case of RT.
If preemption is disabled, enqueue_to_backog() is called
in atomic context. And if backlog exceeds its count,
kfree_skb() is called. But in RT, kfree_skb() might
gets scheduled out, so it expects non atomic context.
3)When CONFIG_PREEMPT_RT is not defined,
migrate_enable(), migrate_disable() maps to
preempt_enable() and preempt_disable(), so no
change in functionality in case of non-RT.
-Replace preempt_enable(), preempt_disable() with
migrate_enable(), migrate_disable() respectively
-Replace get_cpu(), put_cpu() with get_cpu_light(),
put_cpu_light() respectively
Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com>
Cc: <rostedt@goodmis.orgn>
Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
199/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: lockdep: Make it RT aware
Date: Sun, 17 Jul 2011 18:51:23 +0200
teach lockdep that we don't really do softirqs on -RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
200/251 [
Author: Yong Zhang
Email: yong.zhang@windriver.com
Subject: lockdep: selftest: Only do hardirq context test for raw spinlock
Date: Mon, 16 Apr 2012 15:01:56 +0800
On -rt there is no softirq context any more and rwlock is sleepable,
disable softirq context test and rwlock+irq test.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Yong Zhang <yong.zhang@windriver.com>
Link: http://lkml.kernel.org/r/1334559716-18447-3-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
201/251 [
Author: Josh Cartwright
Email: josh.cartwright@ni.com
Subject: lockdep: selftest: fix warnings due to missing PREEMPT_RT conditionals
Date: Wed, 28 Jan 2015 13:08:45 -0600
"lockdep: Selftest: Only do hardirq context test for raw spinlock"
disabled the execution of certain tests with PREEMPT_RT, but did
not prevent the tests from still being defined. This leads to warnings
like:
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:580:1: warning: 'irqsafe1_soft_spin_12' defined but not used [-Wunused-function]
...
Fixed by wrapping the test definitions in #ifndef CONFIG_PREEMPT_RT
conditionals.
Signed-off-by: Josh Cartwright <josh.cartwright@ni.com>
Signed-off-by: Xander Huff <xander.huff@ni.com>
Acked-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
202/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: lockdep: disable self-test
Date: Tue, 17 Oct 2017 16:36:18 +0200
The self-test wasn't always 100% accurate for RT. We disabled a few
tests which failed because they had a different semantic for RT. Some
still reported false positives. Now the selftest locks up the system
during boot and it needs to be investigated…
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
203/251 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm,radeon,i915: Use preempt_disable/enable_rt() where recommended
Date: Sat, 27 Feb 2016 08:09:11 +0100
DRM folks identified the spots, so use them.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
204/251 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm,i915: Use local_lock/unlock_irq() in intel_pipe_update_start/end()
Date: Sat, 27 Feb 2016 09:01:42 +0100
[ 8.014039] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:918
[ 8.014041] in_atomic(): 0, irqs_disabled(): 1, pid: 78, name: kworker/u4:4
[ 8.014045] CPU: 1 PID: 78 Comm: kworker/u4:4 Not tainted 4.1.7-rt7 #5
[ 8.014055] Workqueue: events_unbound async_run_entry_fn
[ 8.014059] 0000000000000000 ffff880037153748 ffffffff815f32c9 0000000000000002
[ 8.014063] ffff88013a50e380 ffff880037153768 ffffffff815ef075 ffff8800372c06c8
[ 8.014066] ffff8800372c06c8 ffff880037153778 ffffffff8107c0b3 ffff880037153798
[ 8.014067] Call Trace:
[ 8.014074] [<ffffffff815f32c9>] dump_stack+0x4a/0x61
[ 8.014078] [<ffffffff815ef075>] ___might_sleep.part.93+0xe9/0xee
[ 8.014082] [<ffffffff8107c0b3>] ___might_sleep+0x53/0x80
[ 8.014086] [<ffffffff815f9064>] rt_spin_lock+0x24/0x50
[ 8.014090] [<ffffffff8109368b>] prepare_to_wait+0x2b/0xa0
[ 8.014152] [<ffffffffa016c04c>] intel_pipe_update_start+0x17c/0x300 [i915]
[ 8.014156] [<ffffffff81093b40>] ? prepare_to_wait_event+0x120/0x120
[ 8.014201] [<ffffffffa0158f36>] intel_begin_crtc_commit+0x166/0x1e0 [i915]
[ 8.014215] [<ffffffffa00c806d>] drm_atomic_helper_commit_planes+0x5d/0x1a0 [drm_kms_helper]
[ 8.014260] [<ffffffffa0171e9b>] intel_atomic_commit+0xab/0xf0 [i915]
[ 8.014288] [<ffffffffa00654c7>] drm_atomic_commit+0x37/0x60 [drm]
[ 8.014298] [<ffffffffa00c6fcd>] drm_atomic_helper_plane_set_property+0x8d/0xd0 [drm_kms_helper]
[ 8.014301] [<ffffffff815f77d9>] ? __ww_mutex_lock+0x39/0x40
[ 8.014319] [<ffffffffa0053b3d>] drm_mode_plane_set_obj_prop+0x2d/0x90 [drm]
[ 8.014328] [<ffffffffa00c8edb>] restore_fbdev_mode+0x6b/0xf0 [drm_kms_helper]
[ 8.014337] [<ffffffffa00cae49>] drm_fb_helper_restore_fbdev_mode_unlocked+0x29/0x80 [drm_kms_helper]
[ 8.014346] [<ffffffffa00caec2>] drm_fb_helper_set_par+0x22/0x50 [drm_kms_helper]
[ 8.014390] [<ffffffffa016dfba>] intel_fbdev_set_par+0x1a/0x60 [i915]
[ 8.014394] [<ffffffff81327dc4>] fbcon_init+0x4f4/0x580
[ 8.014398] [<ffffffff8139ef4c>] visual_init+0xbc/0x120
[ 8.014401] [<ffffffff813a1623>] do_bind_con_driver+0x163/0x330
[ 8.014405] [<ffffffff813a1b2c>] do_take_over_console+0x11c/0x1c0
[ 8.014408] [<ffffffff813236e3>] do_fbcon_takeover+0x63/0xd0
[ 8.014410] [<ffffffff81328965>] fbcon_event_notify+0x785/0x8d0
[ 8.014413] [<ffffffff8107c12d>] ? __might_sleep+0x4d/0x90
[ 8.014416] [<ffffffff810775fe>] notifier_call_chain+0x4e/0x80
[ 8.014419] [<ffffffff810779cd>] __blocking_notifier_call_chain+0x4d/0x70
[ 8.014422] [<ffffffff81077a06>] blocking_notifier_call_chain+0x16/0x20
[ 8.014425] [<ffffffff8132b48b>] fb_notifier_call_chain+0x1b/0x20
[ 8.014428] [<ffffffff8132d8fa>] register_framebuffer+0x21a/0x350
[ 8.014439] [<ffffffffa00cb164>] drm_fb_helper_initial_config+0x274/0x3e0 [drm_kms_helper]
[ 8.014483] [<ffffffffa016f1cb>] intel_fbdev_initial_config+0x1b/0x20 [i915]
[ 8.014486] [<ffffffff8107912c>] async_run_entry_fn+0x4c/0x160
[ 8.014490] [<ffffffff81070ffa>] process_one_work+0x14a/0x470
[ 8.014493] [<ffffffff81071489>] worker_thread+0x169/0x4c0
[ 8.014496] [<ffffffff81071320>] ? process_one_work+0x470/0x470
[ 8.014499] [<ffffffff81076606>] kthread+0xc6/0xe0
[ 8.014502] [<ffffffff81070000>] ? queue_work_on+0x80/0x110
[ 8.014506] [<ffffffff81076540>] ? kthread_worker_fn+0x1c0/0x1c0
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
205/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: disable tracing on -RT
Date: Thu, 6 Dec 2018 09:52:20 +0100
Luca Abeni reported this:
| BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
| CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
| Call Trace:
| rt_spin_lock+0x3f/0x50
| gen6_read32+0x45/0x1d0 [i915]
| g4x_get_vblank_counter+0x36/0x40 [i915]
| trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]
The tracing events use trace_i915_pipe_update_start() among other events
use functions acquire spin locks. A few trace points use
intel_get_crtc_scanline(), others use ->get_vblank_counter() wich also
might acquire a sleeping lock.
Based on this I don't see any other way than disable trace points on RT.
Cc: stable-rt@vger.kernel.org
Reported-by: Luca Abeni <lucabe72@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
206/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE
Date: Wed, 19 Dec 2018 10:47:02 +0100
The order of the header files is important. If this header file is
included after tracepoint.h was included then the NOTRACE here becomes a
nop. Currently this happens for two .c files which use the tracepoitns
behind DRM_I915_LOW_LEVEL_TRACEPOINTS.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
207/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: Don't disable interrupts for intel_engine_breadcrumbs_irq()
Date: Thu, 26 Sep 2019 12:29:05 +0200
The function intel_engine_breadcrumbs_irq() is always invoked from an interrupt
handler and for that reason it invokes (as an optimisation) only spin_lock()
for locking assuming that the interrupts are already disabled. The
function intel_engine_signal_breadcrumbs() is provided to disable
interrupts while the former function is invoked so that assumption is
also true for callers from preemptible context.
On PREEMPT_RT local_irq_disable() really disables interrupts and this
forbids to invoke spin_lock() which becomes a sleeping spinlock.
This is also problematic with `threadirqs' in conjunction with
irq_work. With force threading the interrupt handler, the handler is
invoked with disabled BH but with interrupts enabled. This is okay and
the lock itself is never acquired in IRQ context. This changes with
irq_work (signal_irq_work()) which _still_ invokes
intel_engine_breadcrumbs_irq() from IRQ context. Lockdep should see this
and complain.
Acquire the locks in intel_engine_breadcrumbs_irq() with _irqsave()
suffix and let all callers invoke intel_engine_breadcrumbs_irq()
directly instead using intel_engine_signal_breadcrumbs().
Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
208/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: Drop the IRQ-off asserts
Date: Thu, 26 Sep 2019 12:30:21 +0200
The lockdep_assert_irqs_disabled() check is needless. The previous
lockdep_assert_held() check ensures that the lock is acquired and while
the lock is acquired lockdep also prints a warning if the interrupts are
not disabled if they have to be.
These IRQ-off asserts trigger on PREEMPT_RT because the locks become
sleeping locks and do not really disable interrupts.
Remove lockdep_assert_irqs_disabled().
Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
209/251 [
Author: Mike Galbraith
Email: efault@gmx.de
Subject: cpuset: Convert callback_lock to raw_spinlock_t
Date: Sun, 8 Jan 2017 09:32:25 +0100
The two commits below add up to a cpuset might_sleep() splat for RT:
8447a0fee974 cpuset: convert callback_mutex to a spinlock
344736f29b35 cpuset: simplify cpuset_node_allowed API
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:995
in_atomic(): 0, irqs_disabled(): 1, pid: 11718, name: cset
CPU: 135 PID: 11718 Comm: cset Tainted: G E 4.10.0-rt1-rt #4
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014
Call Trace:
? dump_stack+0x5c/0x81
? ___might_sleep+0xf4/0x170
? rt_spin_lock+0x1c/0x50
? __cpuset_node_allowed+0x66/0xc0
? ___slab_alloc+0x390/0x570 <disables IRQs>
? anon_vma_fork+0x8f/0x140
? copy_page_range+0x6cf/0xb00
? anon_vma_fork+0x8f/0x140
? __slab_alloc.isra.74+0x5a/0x81
? anon_vma_fork+0x8f/0x140
? kmem_cache_alloc+0x1b5/0x1f0
? anon_vma_fork+0x8f/0x140
? copy_process.part.35+0x1670/0x1ee0
? _do_fork+0xdd/0x3f0
? _do_fork+0xdd/0x3f0
? do_syscall_64+0x61/0x170
? entry_SYSCALL64_slow_path+0x25/0x25
The later ensured that a NUMA box WILL take callback_lock in atomic
context by removing the allocator and reclaim path __GFP_HARDWALL
usage which prevented such contexts from taking callback_mutex.
One option would be to reinstate __GFP_HARDWALL protections for
RT, however, as the 8447a0fee974 changelog states:
The callback_mutex is only used to synchronize reads/updates of cpusets'
flags and cpu/node masks. These operations should always proceed fast so
there's no reason why we can't use a spinlock instead of the mutex.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
210/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: apparmor: use a locallock instead preempt_disable()
Date: Wed, 11 Oct 2017 17:43:49 +0200
get_buffers() disables preemption which acts as a lock for the per-CPU
variable. Since we can't disable preemption here on RT, a local_lock is
lock is used in order to remain on the same CPU and not to have more
than one user within the critical section.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
211/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Allow to enable RT
Date: Wed, 7 Aug 2019 18:15:38 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
212/251 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: mm, rt: kmap_atomic scheduling
Date: Thu, 28 Jul 2011 10:43:51 +0200
In fact, with migrate_disable() existing one could play games with
kmap_atomic. You could save/restore the kmap_atomic slots on context
switch (if there are any in use of course), this should be esp easy now
that we have a kmap_atomic stack.
Something like the below.. it wants replacing all the preempt_disable()
stuff with pagefault_disable() && migrate_disable() of course, but then
you can flip kmaps around like below.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[dvhart@linux.intel.com: build fix]
Link: http://lkml.kernel.org/r/1311842631.5890.208.camel@twins
[tglx@linutronix.de: Get rid of the per cpu variable and store the idx
and the pte content right away in the task struct.
Shortens the context switch code. ]
]
213/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86/highmem: Add a "already used pte" check
Date: Mon, 11 Mar 2013 17:09:55 +0100
This is a copy from kmap_atomic_prot().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
214/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm/highmem: Flush tlb on unmap
Date: Mon, 11 Mar 2013 21:37:27 +0100
The tlb should be flushed on unmap and thus make the mapping entry
invalid. This is only done in the non-debug case which does not look
right.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
215/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Enable highmem for rt
Date: Wed, 13 Feb 2013 11:03:11 +0100
fixup highmem for ARM.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
216/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/scatterlist: Do not disable irqs on RT
Date: Fri, 3 Jul 2009 08:44:34 -0500
For -RT it is enough to keep pagefault disabled (which is currently handled by
kmap_atomic()).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
217/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add support for lazy preemption
Date: Fri, 26 Oct 2012 18:50:54 +0100
It has become an obsession to mitigate the determinism vs. throughput
loss of RT. Looking at the mainline semantics of preemption points
gives a hint why RT sucks throughput wise for ordinary SCHED_OTHER
tasks. One major issue is the wakeup of tasks which are right away
preempting the waking task while the waking task holds a lock on which
the woken task will block right after having preempted the wakee. In
mainline this is prevented due to the implicit preemption disable of
spin/rw_lock held regions. On RT this is not possible due to the fully
preemptible nature of sleeping spinlocks.
Though for a SCHED_OTHER task preempting another SCHED_OTHER task this
is really not a correctness issue. RT folks are concerned about
SCHED_FIFO/RR tasks preemption and not about the purely fairness
driven SCHED_OTHER preemption latencies.
So I introduced a lazy preemption mechanism which only applies to
SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside of the
existing preempt_count each tasks sports now a preempt_lazy_count
which is manipulated on lock acquiry and release. This is slightly
incorrect as for lazyness reasons I coupled this on
migrate_disable/enable so some other mechanisms get the same treatment
(e.g. get_cpu_light).
Now on the scheduler side instead of setting NEED_RESCHED this sets
NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and
therefor allows to exit the waking task the lock held region before
the woken task preempts. That also works better for cross CPU wakeups
as the other side can stay in the adaptive spinning loop.
For RT class preemption there is no change. This simply sets
NEED_RESCHED and forgoes the lazy preemption counter.
Initial test do not expose any observable latency increasement, but
history shows that I've been proven wrong before :)
The lazy preemption mode is per default on, but with
CONFIG_SCHED_DEBUG enabled it can be disabled via:
# echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features
and reenabled via
# echo PREEMPT_LAZY >/sys/kernel/debug/sched_features
The test results so far are very machine and workload dependent, but
there is a clear trend that it enhances the non RT workload
performance.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
218/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: Support for lazy preemption
Date: Thu, 1 Nov 2012 11:03:47 +0100
Implement the x86 pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
219/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Add support for lazy preemption
Date: Wed, 31 Oct 2012 12:04:11 +0100
Implement the arm pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
220/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Add support for lazy preemption
Date: Thu, 1 Nov 2012 10:14:11 +0100
Implement the powerpc pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
221/251 [
Author: Anders Roxell
Email: anders.roxell@linaro.org
Subject: arch/arm64: Add lazy preempt support
Date: Thu, 14 May 2015 17:52:17 +0200
arm64 is missing support for PREEMPT_RT. The main feature which is
lacking is support for lazy preemption. The arch-specific entry code,
thread information structure definitions, and associated data tables
have to be extended to provide this support. Then the Kconfig file has
to be extended to indicate the support is available, and also to
indicate that support for full RT preemption is now available.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
]
222/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jump-label: disable if stop_machine() is used
Date: Wed, 8 Jul 2015 17:14:48 +0200
Some architectures are using stop_machine() while switching the opcode which
leads to latency spikes.
The architectures which use stop_machine() atm:
- ARM stop machine
- s390 stop machine
The architecures which use other sorcery:
- MIPS
- X86
- powerpc
- sparc
- arm64
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: only ARM for now]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
223/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: leds: trigger: disable CPU trigger on -RT
Date: Thu, 23 Jan 2014 14:45:59 +0100
as it triggers:
|CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141
|[<c0014aa4>] (unwind_backtrace+0x0/0xf8) from [<c0012788>] (show_stack+0x1c/0x20)
|[<c0012788>] (show_stack+0x1c/0x20) from [<c043c8dc>] (dump_stack+0x20/0x2c)
|[<c043c8dc>] (dump_stack+0x20/0x2c) from [<c004c5e8>] (__might_sleep+0x13c/0x170)
|[<c004c5e8>] (__might_sleep+0x13c/0x170) from [<c043f270>] (__rt_spin_lock+0x28/0x38)
|[<c043f270>] (__rt_spin_lock+0x28/0x38) from [<c043fa00>] (rt_read_lock+0x68/0x7c)
|[<c043fa00>] (rt_read_lock+0x68/0x7c) from [<c036cf74>] (led_trigger_event+0x2c/0x5c)
|[<c036cf74>] (led_trigger_event+0x2c/0x5c) from [<c036e0bc>] (ledtrig_cpu+0x54/0x5c)
|[<c036e0bc>] (ledtrig_cpu+0x54/0x5c) from [<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c)
|[<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) from [<c00590b8>] (cpu_startup_entry+0xa8/0x234)
|[<c00590b8>] (cpu_startup_entry+0xa8/0x234) from [<c043b2cc>] (rest_init+0xb8/0xe0)
|[<c043b2cc>] (rest_init+0xb8/0xe0) from [<c061ebe0>] (start_kernel+0x2c4/0x380)
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
224/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/omap: Make the locking RT aware
Date: Thu, 28 Jul 2011 13:32:57 +0200
The lock is a sleeping lock and local_irq_save() is not the
optimsation we are looking for. Redo it to make it work on -RT and
non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
225/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/pl011: Make the locking work on RT
Date: Tue, 8 Jan 2013 21:36:51 +0100
The lock is a sleeping lock and local_irq_save() is not the optimsation
we are looking for. Redo it to make it work on -RT and non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
226/251 [
Author: Kurt Kanzenbach
Email: kurt@linutronix.de
Subject: tty: serial: pl011: explicitly initialize the flags variable
Date: Mon, 24 Sep 2018 10:29:01 +0200
Silence the following gcc warning:
drivers/tty/serial/amba-pl011.c: In function ‘pl011_console_write’:
./include/linux/spinlock.h:260:3: warning: ‘flags’ may be used uninitialized in this function [-Wmaybe-uninitialized]
_raw_spin_unlock_irqrestore(lock, flags); \
^~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/tty/serial/amba-pl011.c:2214:16: note: ‘flags’ was declared here
unsigned long flags;
^~~~~
The code is correct. Thus, initializing flags to zero doesn't change the
behavior and resolves the warning.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
227/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm: include definition for cpumask_t
Date: Thu, 22 Dec 2016 17:28:33 +0100
This definition gets pulled in by other files. With the (later) split of
RCU and spinlock.h it won't compile anymore.
The split is done in ("rbtree: don't include the rcu header").
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
228/251 [
Author: Yadi.hu
Email: yadi.hu@windriver.com
Subject: ARM: enable irq in translation/section permission fault handlers
Date: Wed, 10 Dec 2014 10:32:09 +0800
Probably happens on all ARM, with
CONFIG_PREEMPT_RT
CONFIG_DEBUG_ATOMIC_SLEEP
This simple program....
int main() {
*((char*)0xc0001000) = 0;
};
[ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
[ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a
[ 512.743217] INFO: lockdep is turned off.
[ 512.743360] irq event stamp: 0
[ 512.743482] hardirqs last enabled at (0): [< (null)>] (null)
[ 512.743714] hardirqs last disabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744013] softirqs last enabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744303] softirqs last disabled at (0): [< (null)>] (null)
[ 512.744631] [<c041872c>] (unwind_backtrace+0x0/0x104)
[ 512.745001] [<c09af0c4>] (dump_stack+0x20/0x24)
[ 512.745355] [<c0462490>] (__might_sleep+0x1dc/0x1e0)
[ 512.745717] [<c09b6770>] (rt_spin_lock+0x34/0x6c)
[ 512.746073] [<c0441bf0>] (do_force_sig_info+0x34/0xf0)
[ 512.746457] [<c0442668>] (force_sig_info+0x18/0x1c)
[ 512.746829] [<c041d880>] (__do_user_fault+0x9c/0xd8)
[ 512.747185] [<c041d938>] (do_bad_area+0x7c/0x94)
[ 512.747536] [<c041d990>] (do_sect_fault+0x40/0x48)
[ 512.747898] [<c040841c>] (do_DataAbort+0x40/0xa0)
[ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8)
Oxc0000000 belongs to kernel address space, user task can not be
allowed to access it. For above condition, correct result is that
test case should receive a “segment fault” and exits but not stacks.
the root cause is commit 02fe2845d6a8 ("avoid enabling interrupts in
prefetch/data abort handlers"),it deletes irq enable block in Data
abort assemble code and move them into page/breakpiont/alignment fault
handlers instead. But author does not enable irq in translation/section
permission fault handlers. ARM disables irq when it enters exception/
interrupt mode, if kernel doesn't enable irq, it would be still disabled
during translation/section permission fault.
We see the above splat because do_force_sig_info is still called with
IRQs off, and that code eventually does a:
spin_lock_irqsave(&t->sighand->siglock, flags);
As this is architecture independent code, and we've not seen any other
need for other arch to have the siglock converted to raw lock, we can
conclude that we should enable irq for ARM translation/section
permission exception.
Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
229/251 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: genirq: update irq_set_irqchip_state documentation
Date: Thu, 11 Feb 2016 11:54:00 -0600
On -rt kernels, the use of migrate_disable()/migrate_enable() is
sufficient to guarantee a task isn't moved to another CPU. Update the
irq_set_irqchip_state() documentation to reflect this.
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
230/251 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: KVM: arm/arm64: downgrade preempt_disable()d region to migrate_disable()
Date: Thu, 11 Feb 2016 11:54:01 -0600
kvm_arch_vcpu_ioctl_run() disables the use of preemption when updating
the vgic and timer states to prevent the calling task from migrating to
another CPU. It does so to prevent the task from writing to the
incorrect per-CPU GIC distributor registers.
On -rt kernels, it's possible to maintain the same guarantee with the
use of migrate_{disable,enable}(), with the added benefit that the
migrate-disabled region is preemptible. Update
kvm_arch_vcpu_ioctl_run() to do so.
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Reported-by: Manish Jaggi <Manish.Jaggi@caviumnetworks.com>
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
231/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm64: fpsimd: Delay freeing memory in fpsimd_flush_thread()
Date: Wed, 25 Jul 2018 14:02:38 +0200
fpsimd_flush_thread() invokes kfree() via sve_free() within a preempt disabled
section which is not working on -RT.
Delay freeing of memory until preemption is enabled again.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
232/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm: at91: do not disable/enable clocks in a row
Date: Wed, 9 Mar 2016 10:51:06 +0100
Currently the driver will disable the clock and enable it one line later
if it is switching from periodic mode into one shot.
This can be avoided and causes a needless warning on -RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
233/251 [
Author: Benedikt Spranger
Email: b.spranger@linutronix.de
Subject: clocksource: TCLIB: Allow higher clock rates for clock events
Date: Mon, 8 Mar 2010 18:57:04 +0100
As default the TCLIB uses the 32KiHz base clock rate for clock events.
Add a compile time selection to allow higher clock resulution.
(fixed up by Sami Pietikäinen <Sami.Pietikainen@wapice.com>)
Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
234/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Enable RT also on 32bit
Date: Thu, 7 Nov 2019 17:49:20 +0100
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
235/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:29 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
236/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM64: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:35 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
237/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/pseries/iommu: Use a locallock instead local_irq_save()
Date: Tue, 26 Mar 2019 18:31:54 +0100
The locallock protects the per-CPU variable tce_page. The function
attempts to allocate memory while tce_page is protected (by disabling
interrupts).
Use local_irq_save() instead of local_irq_disable().
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
238/251 [
Author: Bogdan Purcareata
Email: bogdan.purcareata@freescale.com
Subject: powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT
Date: Fri, 24 Apr 2015 15:53:13 +0000
While converting the openpic emulation code to use a raw_spinlock_t enables
guests to run on RT, there's still a performance issue. For interrupts sent in
directed delivery mode with a multiple CPU mask, the emulated openpic will loop
through all of the VCPUs, and for each VCPUs, it call IRQ_check, which will loop
through all the pending interrupts for that VCPU. This is done while holding the
raw_lock, meaning that in all this time the interrupts and preemption are
disabled on the host Linux. A malicious user app can max both these number and
cause a DoS.
This temporary fix is sent for two reasons. First is so that users who want to
use the in-kernel MPIC emulation are aware of the potential latencies, thus
making sure that the hardware MPIC and their usage scenario does not involve
interrupts sent in directed delivery mode, and the number of possible pending
interrupts is kept small. Secondly, this should incentivize the development of a
proper openpic emulation that would be better suited for RT.
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
239/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Disable highmem on RT
Date: Mon, 18 Jul 2011 17:08:34 +0200
The current highmem handling on -RT is not compatible and needs fixups.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
240/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/stackprotector: work around stack-guard init from atomic
Date: Tue, 26 Mar 2019 18:31:29 +0100
This is invoked from the secondary CPU in atomic context. On x86 we use
tsc instead. On Power we XOR it against mftb() so lets use stack address
as the initial value.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
241/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: POWERPC: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:41 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
242/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mips: Disable highmem on RT
Date: Mon, 18 Jul 2011 17:10:12 +0200
The current highmem handling on -RT is not compatible and needs fixups.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
243/251 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: connector/cn_proc: Protect send_msg() with a local lock on RT
Date: Sun, 16 Oct 2016 05:11:54 +0200
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931
|in_atomic(): 1, irqs_disabled(): 0, pid: 31807, name: sleep
|Preemption disabled at:[<ffffffff8148019b>] proc_exit_connector+0xbb/0x140
|
|CPU: 4 PID: 31807 Comm: sleep Tainted: G W E 4.8.0-rt11-rt #106
|Call Trace:
| [<ffffffff813436cd>] dump_stack+0x65/0x88
| [<ffffffff8109c425>] ___might_sleep+0xf5/0x180
| [<ffffffff816406b0>] __rt_spin_lock+0x20/0x50
| [<ffffffff81640978>] rt_read_lock+0x28/0x30
| [<ffffffff8156e209>] netlink_broadcast_filtered+0x49/0x3f0
| [<ffffffff81522621>] ? __kmalloc_reserve.isra.33+0x31/0x90
| [<ffffffff8156e5cd>] netlink_broadcast+0x1d/0x20
| [<ffffffff8147f57a>] cn_netlink_send_mult+0x19a/0x1f0
| [<ffffffff8147f5eb>] cn_netlink_send+0x1b/0x20
| [<ffffffff814801d8>] proc_exit_connector+0xf8/0x140
| [<ffffffff81077f71>] do_exit+0x5d1/0xba0
| [<ffffffff810785cc>] do_group_exit+0x4c/0xc0
| [<ffffffff81078654>] SyS_exit_group+0x14/0x20
| [<ffffffff81640a72>] entry_SYSCALL_64_fastpath+0x1a/0xa4
Since ab8ed951080e ("connector: fix out-of-order cn_proc netlink message
delivery") which is v4.7-rc6.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
244/251 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drivers/block/zram: Replace bit spinlocks with rtmutex for -rt
Date: Thu, 31 Mar 2016 04:08:28 +0200
They're nondeterministic, and lead to ___might_sleep() splats in -rt.
OTOH, they're a lot less wasteful than an rtmutex per page.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
245/251 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drivers/zram: Don't disable preemption in zcomp_stream_get/put()
Date: Thu, 20 Oct 2016 11:15:22 +0200
In v4.7, the driver switched to percpu compression streams, disabling
preemption via get/put_cpu_ptr(). Use a per-zcomp_strm lock here. We
also have to fix an lock order issue in zram_decompress_page() such
that zs_map_object() nests inside of zcomp_stream_put() as it does in
zram_bvec_write().
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: get_locked_var() -> per zcomp_strm lock]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
246/251 [
Author: Julia Cartwright
Email: julia@ni.com
Subject: squashfs: make use of local lock in multi_cpu decompressor
Date: Mon, 7 May 2018 08:58:57 -0500
Currently, the squashfs multi_cpu decompressor makes use of
get_cpu_ptr()/put_cpu_ptr(), which unconditionally disable preemption
during decompression.
Because the workload is distributed across CPUs, all CPUs can observe a
very high wakeup latency, which has been seen to be as much as 8000us.
Convert this decompressor to make use of a local lock, which will allow
execution of the decompressor with preemption-enabled, but also ensure
concurrent accesses to the percpu compressor data on the local CPU will
be serialized.
Cc: stable-rt@vger.kernel.org
Reported-by: Alexander Stein <alexander.stein@systec-electronic.com>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
247/251 [
Author: Haris Okanovic
Email: haris.okanovic@ni.com
Subject: tpm_tis: fix stall after iowrite*()s
Date: Tue, 15 Aug 2017 15:13:08 -0500
ioread8() operations to TPM MMIO addresses can stall the cpu when
immediately following a sequence of iowrite*()'s to the same region.
For example, cyclitest measures ~400us latency spikes when a non-RT
usermode application communicates with an SPI-based TPM chip (Intel Atom
E3940 system, PREEMPT_RT kernel). The spikes are caused by a
stalling ioread8() operation following a sequence of 30+ iowrite8()s to
the same address. I believe this happens because the write sequence is
buffered (in cpu or somewhere along the bus), and gets flushed on the
first LOAD instruction (ioread*()) that follows.
The enclosed change appears to fix this issue: read the TPM chip's
access register (status code) after every iowrite*() operation to
amortize the cost of flushing data to chip across multiple instructions.
Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
248/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: signals: Allow rt tasks to cache one sigqueue struct
Date: Fri, 3 Jul 2009 08:44:56 -0500
To avoid allocation allow rt tasks to cache one sigqueue struct in
task struct.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
249/251 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: genirq: Disable irqpoll on -rt
Date: Fri, 3 Jul 2009 08:29:57 -0500
Creates long latencies for no value
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
250/251 [
Author: Clark Williams
Email: williams@redhat.com
Subject: sysfs: Add /sys/kernel/realtime entry
Date: Sat, 30 Jul 2011 21:55:53 -0500
Add a /sys/kernel entry to indicate that the kernel is a
realtime kernel.
Clark says that he needs this for udev rules, udev needs to evaluate
if its a PREEMPT_RT kernel a few thousand times and parsing uname
output is too slow or so.
Are there better solutions? Should it exist and return 0 on !-rt?
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
]
251/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: Add localversion for -RT release
Date: Fri, 8 Jul 2011 20:25:16 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
1/251 [
Author: Waiman Long
Email: longman@redhat.com
Subject: lib/smp_processor_id: Don't use cpumask_equal()
Date: Thu, 3 Oct 2019 16:36:08 -0400
The check_preemption_disabled() function uses cpumask_equal() to see
if the task is bounded to the current CPU only. cpumask_equal() calls
memcmp() to do the comparison. As x86 doesn't have __HAVE_ARCH_MEMCMP,
the slow memcmp() function in lib/string.c is used.
On a RT kernel that call check_preemption_disabled() very frequently,
below is the perf-record output of a certain microbenchmark:
42.75% 2.45% testpmd [kernel.kallsyms] [k] check_preemption_disabled
40.01% 39.97% testpmd [kernel.kallsyms] [k] memcmp
We should avoid calling memcmp() in performance critical path. So the
cpumask_equal() call is now replaced with an equivalent simpler check.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
2/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jbd2: Simplify journal_unmap_buffer()
Date: Fri, 9 Aug 2019 14:42:27 +0200
journal_unmap_buffer() checks first whether the buffer head is a journal.
If so it takes locks and then invokes jbd2_journal_grab_journal_head()
followed by another check whether this is journal head buffer.
The double checking is pointless.
Replace the initial check with jbd2_journal_grab_journal_head() which
alredy checks whether the buffer head is actually a journal.
Allows also early access to the journal head pointer for the upcoming
conversion of state lock to a regular spinlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
3/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jbd2: Remove jbd_trylock_bh_state()
Date: Fri, 9 Aug 2019 14:42:28 +0200
No users.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
4/251 [
Author: Jan Kara
Email: jack@suse.cz
Subject: jbd2: Move dropping of jh reference out of un/re-filing functions
Date: Fri, 9 Aug 2019 14:42:29 +0200
__jbd2_journal_unfile_buffer() and __jbd2_journal_refile_buffer() drop
transaction's jh reference when they remove jh from a transaction. This
will be however inconvenient once we move state lock into journal_head
itself as we still need to unlock it and we'd need to grab jh reference
just for that. Move dropping of jh reference out of these functions into
the few callers.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
5/251 [
Author: Jan Kara
Email: jack@suse.cz
Subject: jbd2: Drop unnecessary branch from jbd2_journal_forget()
Date: Fri, 9 Aug 2019 14:42:30 +0200
We have cleared both dirty & jbddirty bits from the bh. So there's no
difference between bforget() and brelse(). Thus there's no point jumping
to no_jbd branch.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
6/251 [
Author: Jan Kara
Email: jack@suse.cz
Subject: jbd2: Don't call __bforget() unnecessarily
Date: Fri, 9 Aug 2019 14:42:31 +0200
jbd2_journal_forget() jumps to 'not_jbd' branch which calls __bforget()
in cases where the buffer is clean which is pointless. In case of failed
assertion, it can be even argued that it is safer not to touch buffer's
dirty bits. Also logically it makes more sense to just jump to 'drop'
and that will make logic also simpler when we switch bh_state_lock to a
spinlock.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
7/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jbd2: Make state lock a spinlock
Date: Fri, 9 Aug 2019 14:42:32 +0200
Bit-spinlocks are problematic on PREEMPT_RT if functions which might sleep
on RT, e.g. spin_lock(), alloc/free(), are invoked inside the lock held
region because bit spinlocks disable preemption even on RT.
A first attempt was to replace state lock with a spinlock placed in struct
buffer_head and make the locking conditional on PREEMPT_RT and
DEBUG_BIT_SPINLOCKS.
Jan pointed out that there is a 4 byte hole in struct journal_head where a
regular spinlock fits in and he would not object to convert the state lock
to a spinlock unconditionally.
Aside of solving the RT problem, this also gains lockdep coverage for the
journal head state lock (bit-spinlocks are not covered by lockdep as it's
hard to fit a lockdep map into a single bit).
The trivial change would have been to convert the jbd_*lock_bh_state()
inlines, but that comes with the downside that these functions take a
buffer head pointer which needs to be converted to a journal head pointer
which adds another level of indirection.
As almost all functions which use this lock have a journal head pointer
readily available, it makes more sense to remove the lock helper inlines
and write out spin_*lock() at all call sites.
Fixup all locking comments as well.
Suggested-by: Jan Kara <jack@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jan Kara <jack@suse.com>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
8/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jbd2: Free journal head outside of locked region
Date: Fri, 9 Aug 2019 14:42:33 +0200
On PREEMPT_RT bit-spinlocks have the same semantics as on PREEMPT_RT=n,
i.e. they disable preemption. That means functions which are not safe to be
called in preempt disabled context on RT trigger a might_sleep() assert.
The journal head bit spinlock is mostly held for short code sequences with
trivial RT safe functionality, except for one place:
jbd2_journal_put_journal_head() invokes __journal_remove_journal_head()
with the journal head bit spinlock held. __journal_remove_journal_head()
invokes kmem_cache_free() which must not be called with preemption disabled
on RT.
Jan suggested to rework the removal function so the actual free happens
outside the bit-spinlocked region.
Split it into two parts:
- Do the sanity checks and the buffer head detach under the lock
- Do the actual free after dropping the lock
There is error case handling in the free part which needs to dereference
the b_size field of the now detached buffer head. Due to paranoia (caused
by ignorance) the size is retrieved in the detach function and handed into
the free function. Might be over-engineered, but better safe than sorry.
This makes the journal head bit-spinlock usage RT compliant and also avoids
nested locking which is not covered by lockdep.
Suggested-by: Jan Kara <jack@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-ext4@vger.kernel.org
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
9/251 [
Author: Joel Fernandes (Google)
Email: joel@joelfernandes.org
Subject: workqueue: Convert for_each_wq to use built-in list check
Date: Thu, 15 Aug 2019 10:18:42 -0400
Because list_for_each_entry_rcu() can now check for holding a
lock as well as for being in an RCU read-side critical section,
this commit replaces the workqueue_sysfs_unregister() function's
use of assert_rcu_or_wq_mutex() and list_for_each_entry_rcu() with
list_for_each_entry_rcu() augmented with a lockdep_is_held() optional
argument.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
10/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86/ioapic: Rename misnamed functions
Date: Thu, 17 Oct 2019 12:19:02 +0200
ioapic_irqd_[un]mask() are misnomers as both functions do way more than
masking and unmasking the interrupt line. Both deal with the moving the
affinity of the interrupt within interrupt context. The mask/unmask is just
a tiny part of the functionality.
Rename them to ioapic_prepare/finish_move(), fixup the call sites and
rename the related variables in the code to reflect what this is about.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/20191017101938.412489856@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
11/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: percpu-refcount: use normal instead of RCU-sched"
Date: Wed, 4 Sep 2019 17:59:36 +0200
This is a revert of commit
a4244454df129 ("percpu-refcount: use RCU-sched insted of normal RCU")
which claims the only reason for using RCU-sched is
"rcu_read_[un]lock() … are slightly more expensive than preempt_disable/enable()"
and
"As the RCU critical sections are extremely short, using sched-RCU
shouldn't have any latency implications."
The problem with RCU-sched is that it disables preemption and the
callback must not acquire any sleeping locks like spinlock_t on
PREEMPT_RT which is the case.
Convert back to normal RCU.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
12/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: Don't disable interrupts independently of the lock
Date: Wed, 10 Apr 2019 11:01:37 +0200
The locks (active.lock and rq->lock) need to be taken with disabled
interrupts. This is done in i915_request_retire() by disabling the
interrupts independently of the locks itself.
While local_irq_disable()+spin_lock() equals spin_lock_irq() on vanilla
it does not on PREEMPT_RT.
Chris Wilson confirmed that local_irq_disable() was just introduced as
an optimisation to avoid enabling/disabling interrupts during
lock/unlock combo.
Enable/disable interrupts as part of the locking instruction.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
13/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block: Don't disable interrupts in trigger_softirq()
Date: Fri, 15 Nov 2019 21:37:22 +0100
trigger_softirq() is always invoked as a SMP-function call which is
always invoked with disables interrupts.
Don't disable interrupt in trigger_softirq() because interrupts are
already disabled.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
14/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm64: KVM: Invoke compute_layout() before alternatives are applied
Date: Thu, 26 Jul 2018 09:13:42 +0200
compute_layout() is invoked as part of an alternative fixup under
stop_machine(). This function invokes get_random_long() which acquires a
sleeping lock on -RT which can not be acquired in this context.
Rename compute_layout() to kvm_compute_layout() and invoke it before
stop_machine() applies the alternatives. Add a __init prefix to
kvm_compute_layout() because the caller has it, too (and so the code can be
discarded after boot).
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
15/251 [
Author: Marc Kleine-Budde
Email: mkl@pengutronix.de
Subject: net: sched: Use msleep() instead of yield()
Date: Wed, 5 Mar 2014 00:49:47 +0100
On PREEMPT_RT enabled systems the interrupt handler run as threads at prio 50
(by default). If a high priority userspace process tries to shut down a busy
network interface it might spin in a yield loop waiting for the device to
become idle. With the interrupt thread having a lower priority than the
looping process it might never be scheduled and so result in a deadlock on UP
systems.
With Magic SysRq the following backtrace can be produced:
> test_app R running 0 174 168 0x00000000
> [<c02c7070>] (__schedule+0x220/0x3fc) from [<c02c7870>] (preempt_schedule_irq+0x48/0x80)
> [<c02c7870>] (preempt_schedule_irq+0x48/0x80) from [<c0008fa8>] (svc_preempt+0x8/0x20)
> [<c0008fa8>] (svc_preempt+0x8/0x20) from [<c001a984>] (local_bh_enable+0x18/0x88)
> [<c001a984>] (local_bh_enable+0x18/0x88) from [<c025316c>] (dev_deactivate_many+0x220/0x264)
> [<c025316c>] (dev_deactivate_many+0x220/0x264) from [<c023be04>] (__dev_close_many+0x64/0xd4)
> [<c023be04>] (__dev_close_many+0x64/0xd4) from [<c023be9c>] (__dev_close+0x28/0x3c)
> [<c023be9c>] (__dev_close+0x28/0x3c) from [<c023f7f0>] (__dev_change_flags+0x88/0x130)
> [<c023f7f0>] (__dev_change_flags+0x88/0x130) from [<c023f904>] (dev_change_flags+0x10/0x48)
> [<c023f904>] (dev_change_flags+0x10/0x48) from [<c024c140>] (do_setlink+0x370/0x7ec)
> [<c024c140>] (do_setlink+0x370/0x7ec) from [<c024d2f0>] (rtnl_newlink+0x2b4/0x450)
> [<c024d2f0>] (rtnl_newlink+0x2b4/0x450) from [<c024cfa0>] (rtnetlink_rcv_msg+0x158/0x1f4)
> [<c024cfa0>] (rtnetlink_rcv_msg+0x158/0x1f4) from [<c0256740>] (netlink_rcv_skb+0xac/0xc0)
> [<c0256740>] (netlink_rcv_skb+0xac/0xc0) from [<c024bbd8>] (rtnetlink_rcv+0x18/0x24)
> [<c024bbd8>] (rtnetlink_rcv+0x18/0x24) from [<c02561b8>] (netlink_unicast+0x13c/0x198)
> [<c02561b8>] (netlink_unicast+0x13c/0x198) from [<c025651c>] (netlink_sendmsg+0x264/0x2e0)
> [<c025651c>] (netlink_sendmsg+0x264/0x2e0) from [<c022af98>] (sock_sendmsg+0x78/0x98)
> [<c022af98>] (sock_sendmsg+0x78/0x98) from [<c022bb50>] (___sys_sendmsg.part.25+0x268/0x278)
> [<c022bb50>] (___sys_sendmsg.part.25+0x268/0x278) from [<c022cf08>] (__sys_sendmsg+0x48/0x78)
> [<c022cf08>] (__sys_sendmsg+0x48/0x78) from [<c0009320>] (ret_fast_syscall+0x0/0x2c)
This patch works around the problem by replacing yield() by msleep(1), giving
the interrupt thread time to finish, similar to other changes contained in the
rt patch set. Using wait_for_completion() instead would probably be a better
solution.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
16/251 [
Author: Uladzislau Rezki (Sony)
Email: urezki@gmail.com
Subject: mm/vmalloc: remove preempt_disable/enable when doing preloading
Date: Sat, 30 Nov 2019 17:54:33 -0800
Some background. The preemption was disabled before to guarantee that a
preloaded object is available for a CPU, it was stored for. That was
achieved by combining the disabling the preemption and taking the spin
lock while the ne_fit_preload_node is checked.
The aim was to not allocate in atomic context when spinlock is taken
later, for regular vmap allocations. But that approach conflicts with
CONFIG_PREEMPT_RT philosophy. It means that calling spin_lock() with
disabled preemption is forbidden in the CONFIG_PREEMPT_RT kernel.
Therefore, get rid of preempt_disable() and preempt_enable() when the
preload is done for splitting purpose. As a result we do not guarantee
now that a CPU is preloaded, instead we minimize the case when it is
not, with this change, by populating the per cpu preload pointer under
the vmap_area_lock.
This implies that at least each caller that has done the preallocation
will not fallback to an atomic allocation later. It is possible that
the preallocation would be pointless or that no preallocation is done
because of the race but the data shows that this is really rare.
For example i run the special test case that follows the preload pattern
and path. 20 "unbind" threads run it and each does 1000000 allocations.
Only 3.5 times among 1000000 a CPU was not preloaded. So it can happen
but the number is negligible.
[mhocko@suse.com: changelog additions]
Link: http://lkml.kernel.org/r/20191016095438.12391-1-urezki@gmail.com
Fixes: 82dd23e84be3 ("mm/vmalloc.c: preload a CPU with one object for split purpose")
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Daniel Wagner <dwagner@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
17/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: KVM: arm/arm64: Let the timer expire in hardirq context on RT
Date: Tue, 13 Aug 2019 14:29:41 +0200
The timers are canceled from an preempt-notifier which is invoked with
disabled preemption which is not allowed on PREEMPT_RT.
The timer callback is short so in could be invoked in hard-IRQ context
on -RT.
Let the timer expire on hard-IRQ context even on -RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Tested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
18/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add printk ring buffer documentation
Date: Tue, 12 Feb 2019 15:29:39 +0100
The full documentation file for the printk ring buffer.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
19/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add prb locking functions
Date: Tue, 12 Feb 2019 15:29:40 +0100
Add processor-reentrant spin locking functions. These allow
restricting the number of possible contexts to 2, which can simplify
implementing code that also supports NMI interruptions.
prb_lock();
/*
* This code is synchronized with all contexts
* except an NMI on the same processor.
*/
prb_unlock();
In order to support printk's emergency messages, a
processor-reentrant spin lock will be used to control raw access to
the emergency console. However, it must be the same
processor-reentrant spin lock as the one used by the ring buffer,
otherwise a deadlock can occur:
CPU1: printk lock -> emergency -> serial lock
CPU2: serial lock -> printk lock
By making the processor-reentrant implemtation available externally,
printk can use the same atomic_t for the ring buffer as for the
emergency console and thus avoid the above deadlock.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
20/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: define ring buffer struct and initializer
Date: Tue, 12 Feb 2019 15:29:41 +0100
See Documentation/printk-ringbuffer.txt for details about the
initializer arguments.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
21/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add writer interface
Date: Tue, 12 Feb 2019 15:29:42 +0100
Add the writer functions prb_reserve() and prb_commit(). These make
use of processor-reentrant spin locks to limit the number of possible
interruption scenarios for the writers.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
22/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add basic non-blocking reading interface
Date: Tue, 12 Feb 2019 15:29:43 +0100
Add reader iterator static declaration/initializer, dynamic
initializer, and functions to iterate and retrieve ring buffer data.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
23/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add blocking reader support
Date: Tue, 12 Feb 2019 15:29:44 +0100
Add a blocking read function for readers. An irq_work function is
used to signal the wait queue so that write notification can
be triggered from any context.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
24/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add functionality required by printk
Date: Tue, 12 Feb 2019 15:29:45 +0100
The printk subsystem needs to be able to query the size of the ring
buffer, seek to specific entries within the ring buffer, and track
if records could not be stored in the ring buffer.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
25/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: add ring buffer and kthread
Date: Tue, 12 Feb 2019 15:29:46 +0100
The printk ring buffer provides an NMI-safe interface for writing
messages to a ring buffer. Using such a buffer for alleviates printk
callers from the current burdens of disabled preemption while calling
the console drivers (and possibly printing out many messages that
another task put into the log buffer).
Create a ring buffer to be used for storing messages to be
printed to the consoles.
Create a dedicated printk kthread to block on the ring buffer
and call the console drivers for the read messages.
NOTE: The printk_delay is relocated to _after_ the message is
printed, where it makes more sense.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
26/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: remove exclusive console hack
Date: Tue, 12 Feb 2019 15:29:47 +0100
In order to support printing the printk log history when new
consoles are registered, a global exclusive_console variable is
temporarily set. This only works because printk runs with
preemption disabled.
When console printing is moved to a fully preemptible dedicated
kthread, this hack no longer works.
Remove exclusive_console usage.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
27/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: redirect emit/store to new ringbuffer
Date: Tue, 12 Feb 2019 15:29:48 +0100
vprintk_emit and vprintk_store are the main functions that all printk
variants eventually go through. Change these to store the message in
the new printk ring buffer that the printk kthread is reading.
Remove functions no longer in use because of the changes to
vprintk_emit and vprintk_store.
In order to handle interrupts and NMIs, a second per-cpu ring buffer
(sprint_rb) is added. This ring buffer is used for NMI-safe memory
allocation in order to format the printk messages.
NOTE: LOG_CONT is ignored for now and handled as individual messages.
LOG_CONT functions are masked behind "#if 0" blocks until their
functionality can be restored
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
28/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk_safe: remove printk safe code
Date: Tue, 12 Feb 2019 15:29:49 +0100
vprintk variants are now NMI-safe so there is no longer a need for
the "safe" calls.
NOTE: This also removes printk flushing functionality.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
29/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: minimize console locking implementation
Date: Tue, 12 Feb 2019 15:29:50 +0100
Since printing of the printk buffer is now handled by the printk
kthread, minimize the console locking functions to just handle
locking of the console.
NOTE: With this console_flush_on_panic will no longer flush.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
30/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: track seq per console
Date: Tue, 12 Feb 2019 15:29:51 +0100
Allow each console to track which seq record was last printed. This
simplifies identifying dropped records.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
31/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: do boot_delay_msec inside printk_delay
Date: Tue, 12 Feb 2019 15:29:52 +0100
Both functions needed to be called one after the other, so just
integrate boot_delay_msec into printk_delay for simplification.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
32/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: print history for new consoles
Date: Tue, 12 Feb 2019 15:29:53 +0100
When new consoles register, they currently print how many messages
they have missed. However, many (or all) of those messages may still
be in the ring buffer. Add functionality to print as much of the
history as available. This is a clean replacement of the old
exclusive console hack.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
33/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: implement CON_PRINTBUFFER
Date: Tue, 12 Feb 2019 15:29:54 +0100
If the CON_PRINTBUFFER flag is not set, do not replay the history
for that console.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
34/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: add processor number to output
Date: Tue, 12 Feb 2019 15:29:55 +0100
It can be difficult to sort printk out if multiple processors are
printing simultaneously. Add the processor number to the printk
output to allow the messages to be sorted.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
35/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: console: add write_atomic interface
Date: Tue, 12 Feb 2019 15:29:56 +0100
Add a write_atomic callback to the console. This is an optional
function for console drivers. The function must be atomic (including
NMI safe) for writing to the console.
Console drivers must still implement the write callback. The
write_atomic callback will only be used for emergency messages.
Creating an NMI safe write_atomic that must synchronize with write
requires a careful implementation of the console driver. To aid with
the implementation, a set of console_atomic_* functions are provided:
void console_atomic_lock(unsigned int *flags);
void console_atomic_unlock(unsigned int flags);
These functions synchronize using the processor-reentrant cpu lock of
the printk buffer.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
36/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: introduce emergency messages
Date: Tue, 12 Feb 2019 15:29:57 +0100
Console messages are generally either critical or non-critical.
Critical messages are messages such as crashes or sysrq output.
Critical messages should never be lost because generally they provide
important debugging information.
Since all console messages are output via a fully preemptible printk
kernel thread, it is possible that messages are not output because
that thread cannot be scheduled (BUG in scheduler, run-away RT task,
etc).
To allow critical messages to be output independent of the
schedulability of the printk task, introduce an emergency mechanism
that _immediately_ outputs the message to the consoles. To avoid
possible unbounded latency issues, the emergency mechanism only
outputs the printk line provided by the caller and ignores any
pending messages in the log buffer.
Critical messages are identified as messages (by default) with log
level LOGLEVEL_WARNING or more critical. This is configurable via the
kernel option CONSOLE_LOGLEVEL_EMERGENCY.
Any messages output as emergency messages are skipped by the printk
thread on those consoles that output the emergency message.
In order for a console driver to support emergency messages, the
write_atomic function must be implemented by the driver. If not
implemented, the emergency messages are handled like all other
messages and are printed by the printk thread.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
37/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: serial: 8250: implement write_atomic
Date: Tue, 12 Feb 2019 15:29:58 +0100
Implement a non-sleeping NMI-safe write_atomic console function in
order to support emergency printk messages.
Since interrupts need to be disabled during transmit, all usage of
the IER register was wrapped with access functions that use the
console_atomic_lock function to synchronize register access while
tracking the state of the interrupts. This was necessary because
write_atomic is can be calling from an NMI context that has
preempted write_atomic.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
38/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: implement KERN_CONT
Date: Tue, 12 Feb 2019 15:29:59 +0100
Implement KERN_CONT based on the printing CPU rather than on the
printing task. As long as the KERN_CONT messages are coming from the
same CPU and no non-KERN_CONT messages come, the messages are assumed
to belong to each other.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
39/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: implement /dev/kmsg
Date: Tue, 12 Feb 2019 15:30:00 +0100
Since printk messages are now logged to a new ring buffer, update
the /dev/kmsg functions to pull the messages from there.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
40/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: implement syslog
Date: Tue, 12 Feb 2019 15:30:01 +0100
Since printk messages are now logged to a new ring buffer, update
the syslog functions to pull the messages from there.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
41/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: implement kmsg_dump
Date: Tue, 12 Feb 2019 15:30:02 +0100
Since printk messages are now logged to a new ring buffer, update
the kmsg_dump functions to pull the messages from there.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
42/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: remove unused code
Date: Tue, 12 Feb 2019 15:30:03 +0100
Code relating to the safe context and anything dealing with the
previous log buffer implementation is no longer in use. Remove it.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
43/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: set deferred to default loglevel, enforce mask
Date: Thu, 14 Feb 2019 23:13:30 +0100
All messages printed via vpritnk_deferred() were being
automatically treated as emergency messages.
Messages printed via vprintk_deferred() should be set to the
default loglevel. LOGLEVEL_SCHED is no longer relevant.
Also, enforce the loglevel mask for emergency messages.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
44/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: serial: 8250: remove that trylock in serial8250_console_write_atomic()
Date: Thu, 14 Feb 2019 17:38:24 +0100
This does not work as rtmutex in NMI context. As per John, it is not
needed.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
45/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: serial: 8250: export symbols which are used by symbols
Date: Sat, 16 Feb 2019 09:02:00 +0100
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
46/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm: remove printk_nmi_.*()
Date: Fri, 15 Feb 2019 14:34:20 +0100
It is no longer provided by the printk core code.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
47/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: only allow kernel to emergency message
Date: Sun, 17 Feb 2019 03:11:20 +0100
Emergency messages exist as a mechanism for the kernel to
communicate critical information to users. It is not meant for
use by userspace. Only allow facility=0 messages to be
processed by the emergency message code.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
48/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: devkmsg: llseek: reset clear if it is lost
Date: Fri, 22 Feb 2019 23:02:44 +0100
SEEK_DATA will seek to the last clear record. If this clear record
is no longer in the ring buffer, devkmsg_llseek() will go into an
infinite loop. Fix that by resetting the clear sequence if the old
clear record is no longer in the ring buffer.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
49/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: printk: print "rate-limitted" message as info
Date: Fri, 22 Feb 2019 12:47:13 +0100
If messages which are injected via kmsg are dropped then they don't need
to be printed as warnings. This is to avoid latency spikes if the
interface decides to print a lot of important messages.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
50/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: kmsg_dump: remove mutex usage
Date: Wed, 24 Apr 2019 16:36:04 +0200
The kmsg dumper can be called from any context, but the dumping
helpers were using a mutex to synchronize the iterator against
concurrent dumps.
Rather than trying to synchronize the iterator, use a local copy
of the iterator during the dump. Then no synchronization is
required.
Reported-by: Scott Wood <swood@redhat.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
51/251 [
Author: He Zhe
Email: zhe.he@windriver.com
Subject: printk: devkmsg: read: Return EPIPE when the first message user-space wants has gone
Date: Tue, 24 Sep 2019 15:26:39 +0800
When user-space wants to read the first message, that is when user->seq
is 0, and that message has gone, it currently automatically resets
user->seq to current first seq. This mis-aligns with mainline kernel.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/dev-kmsg#n39
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/printk/printk.c#n899
We should inform user-space that what it wants has gone by returning EPIPE
in such scenario.
Link: https://lore.kernel.org/r/20190924072639.25986-1-zhe.he@windriver.com
Signed-off-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
52/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: handle iterating while buffer changing
Date: Mon, 7 Oct 2019 16:20:39 +0200
The syslog and kmsg_dump readers are provided buffers to fill.
Both try to maximize the provided buffer usage by calculating the
maximum number of messages that can fit. However, if after the
calculation, messages are dropped and new messages added, the
calculation will no longer match.
For syslog, add a check to make sure the provided buffer is not
overfilled.
For kmsg_dump, start over by recalculating the messages
available.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
53/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: hack out emergency loglevel usage
Date: Tue, 3 Dec 2019 09:14:57 +0100
Instead of using an emergency loglevel to determine if atomic
messages should be printed, use oops_in_progress. This conforms
to the decision that latency-causing atomic messages never be
generated during normal operation.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
54/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: serial: 8250: only atomic lock for console
Date: Fri, 10 Jan 2020 16:45:31 +0106
The atomic console implementation requires that IER is synchronized
between atomic and non-atomic usage. However, it was implemented such
that the console_atomic_lock was performed for all IER access, even
if that port was not a console.
The implementation also used a usage counter to keep track of IER
clear/restore windows. However, this is not needed because the
console_atomic_lock synchronization of IER access with prevent any
situations where IER is prematurely restored or left cleared.
Move the IER access functions to inline macros. They will only
console_atomic_lock if the port is a console. Remove the
restore_ier() function by having clear_ier() return the prior IER
value so that the caller can restore it using set_ier(). Rename the
IER access functions to match other 8250 wrapper macros.
Suggested-by: Dick Hollenbeck <dick@softplc.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
55/251 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: serial: 8250: fsl/ingenic/mtk: fix atomic console
Date: Fri, 10 Jan 2020 16:45:32 +0106
A few 8250 implementations have their own IER access. If the port
is a console, wrap the accesses with console_atomic_lock.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
56/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t
Date: Fri, 15 Nov 2019 18:54:20 +0100
Bit spinlocks are problematic if PREEMPT_RT is enabled, because they
disable preemption, which is undesired for latency reasons and breaks when
regular spinlocks are taken within the bit_spinlock locked region because
regular spinlocks are converted to 'sleeping spinlocks' on RT. So RT
replaces the bit spinlocks with regular spinlocks to avoid this problem.
Bit spinlocks are also not covered by lock debugging, e.g. lockdep.
Substitute the BH_Uptodate_Lock bit spinlock with a regular spinlock.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: remove the wrapper and use always spinlock_t and move it into
the padding hole]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
57/251 [
Author: Clark Williams
Email: williams@redhat.com
Subject: thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t
Date: Mon, 15 Jul 2019 15:25:00 -0500
The spinlock pkg_temp_lock has the potential of being taken in atomic
context because it can be acquired from the thermal IRQ vector.
It's static and limited scope so go ahead and make it a raw spinlock.
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
58/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: perf/core: Add SRCU annotation for pmus list walk
Date: Fri, 15 Nov 2019 18:04:07 +0100
Since commit
28875945ba98d ("rcu: Add support for consolidated-RCU reader checking")
there is an additional check to ensure that a RCU related lock is held
while the RCU list is iterated.
This section holds the SRCU reader lock instead.
Add annotation to list_for_each_entry_rcu() that pmus_srcu must be
acquired during the list traversal.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
59/251 [
Author: He Zhe
Email: zhe.he@windriver.com
Subject: kmemleak: Turn kmemleak_lock and object->lock to raw_spinlock_t
Date: Wed, 19 Dec 2018 16:30:57 +0100
kmemleak_lock as a rwlock on RT can possibly be acquired in atomic context
which does work on RT.
Since the kmemleak operation is performed in atomic context make it a
raw_spinlock_t so it can also be acquired on RT. This is used for
debugging and is not enabled by default in a production like environment
(where performance/latency matters) so it makes sense to make it a
raw_spinlock_t instead trying to get rid of the atomic context.
Turn also the kmemleak_object->lock into raw_spinlock_t which is
acquired (nested) while the kmemleak_lock is held.
The time spent in "echo scan > kmemleak" slightly improved on 64core box
with this patch applied after boot.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lkml.kernel.org/r/20181218150744.GB20197@arrakis.emea.arm.com
Link: https://lkml.kernel.org/r/1542877459-144382-1-git-send-email-zhe.he@windriver.com
Link: https://lkml.kernel.org/r/20190927082230.34152-1-yongxin.liu@windriver.com
Signed-off-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Liu Haitao <haitao.liu@windriver.com>
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
[bigeasy: Redo the description. Merge the individual bits: He Zhe did
the kmemleak_lock, Liu Haitao the ->lock and Yongxin Liu forwarded the
patch.]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
60/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: Use CONFIG_PREEMPTION
Date: Fri, 26 Jul 2019 11:30:49 +0200
Thisi is an all-in-one patch of the current `PREEMPTION' branch.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
61/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: BPF: Disable on PREEMPT_RT
Date: Thu, 10 Oct 2019 16:54:45 +0200
Disable BPF on PREEMPT_RT because
- it allocates and frees memory in atomic context
- it uses up_read_non_owner()
- BPF_PROG_RUN() expects to be invoked in non-preemptible context
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
62/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: workqueue: Don't assume that the callback has interrupts disabled
Date: Tue, 11 Jun 2019 11:21:02 +0200
Due to the TIMER_IRQSAFE flag, the timer callback is invoked with
disabled interrupts. On -RT the callback is invoked in softirq context
with enabled interrupts. Since the interrupts are threaded, there are
are no in_irq() users. The local_bh_disable() around the threaded
handler ensures that there is either a timer or a threaded handler
active on the CPU.
Disable interrupts before __queue_work() is invoked from the timer
callback.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
63/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: sched/swait: Add swait_event_lock_irq()
Date: Wed, 22 May 2019 12:42:26 +0200
The swait_event_lock_irq() is inspired by wait_event_lock_irq(). This is
required by the workqueue code once it switches to swait.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
64/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: workqueue: Use swait for wq_manager_wait
Date: Tue, 11 Jun 2019 11:21:09 +0200
In order for the workqueue code use raw_spinlock_t typed locking there
must not be a spinlock_t typed lock be acquired. A wait_queue_head uses
a spinlock_t lock for its list protection.
Use a swait based queue head to avoid raw_spinlock_t -> spinlock_t
locking.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
65/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: workqueue: Convert the locks to raw type
Date: Wed, 22 May 2019 12:43:56 +0200
After all the workqueue and the timer rework, we can finally make the
worker_pool lock raw.
The lock is not held over an unbounded period of time/iterations.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
66/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/compaction: Disable compact_unevictable_allowed on RT
Date: Fri, 8 Nov 2019 12:55:47 +0100
Since commit
5bbe3547aa3ba ("mm: allow compaction of unevictable pages")
it is allowed to examine mlocked pages for pages to compact by default.
On -RT even minor pagefaults are problematic because it may take a few
100us to resolve them and until then the task is blocked.
Make compact_unevictable_allowed = 0 default and remove it from /proc on
RT.
Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
67/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroup: Remove ->css_rstat_flush()
Date: Thu, 15 Aug 2019 18:14:16 +0200
I was looking at the lifetime of the the ->css_rstat_flush() to see if
cgroup_rstat_cpu_lock should remain a raw_spinlock_t. I didn't find any
users and is unused since it was introduced in commit
8f53470bab042 ("cgroup: Add cgroup_subsys->css_rstat_flush()")
Remove the css_rstat_flush callback because it has no users.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
68/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroup: Consolidate users of cgroup_rstat_lock.
Date: Fri, 16 Aug 2019 12:20:42 +0200
cgroup_rstat_flush_irqsafe() has no users, remove it.
cgroup_rstat_flush_hold() and cgroup_rstat_flush_release() are only used within
this file. Make it static.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
69/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroup: Remove `may_sleep' from cgroup_rstat_flush_locked()
Date: Fri, 16 Aug 2019 12:25:35 +0200
cgroup_rstat_flush_locked() is always invoked with `may_sleep' set to
true so that this case can be made default and the parameter removed.
Remove the `may_sleep' parameter.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
70/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroup: Acquire cgroup_rstat_lock with enabled interrupts
Date: Fri, 16 Aug 2019 12:49:36 +0200
There is no need to disable interrupts while cgroup_rstat_lock is
acquired. The lock is never used in-IRQ context so a simple spin_lock()
is enough for synchronisation purpose.
Acquire cgroup_rstat_lock without disabling interrupts and ensure that
cgroup_rstat_cpu_lock is acquired with disabled interrupts (this one is
acquired in-IRQ context).
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
71/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: workingset: replace IRQ-off check with a lockdep assert.
Date: Mon, 11 Feb 2019 10:40:46 +0100
Commit
68d48e6a2df57 ("mm: workingset: add vmstat counter for shadow nodes")
introduced an IRQ-off check to ensure that a lock is held which also
disabled interrupts. This does not work the same way on -RT because none
of the locks, that are held, disable interrupts.
Replace this check with a lockdep assert which ensures that the lock is
held.
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
72/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: tpm: remove tpm_dev_wq_lock
Date: Mon, 11 Feb 2019 11:33:11 +0100
Added in commit
9e1b74a63f776 ("tpm: add support for nonblocking operation")
but never actually used it.
Cc: Philip Tricca <philip.b.tricca@intel.com>
Cc: Tadeusz Struk <tadeusz.struk@intel.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
73/251 [
Author: Rob Herring
Email: robh@kernel.org
Subject: of: Rework and simplify phandle cache to use a fixed size
Date: Wed, 11 Dec 2019 17:23:45 -0600
The phandle cache was added to speed up of_find_node_by_phandle() by
avoiding walking the whole DT to find a matching phandle. The
implementation has several shortcomings:
- The cache is designed to work on a linear set of phandle values.
This is true for dtc generated DTs, but not for other cases such as
Power.
- The cache isn't enabled until of_core_init() and a typical system
may see hundreds of calls to of_find_node_by_phandle() before that
point.
- The cache is freed and re-allocated when the number of phandles
changes.
- It takes a raw spinlock around a memory allocation which breaks on
RT.
Change the implementation to a fixed size and use hash_32() as the
cache index. This greatly simplifies the implementation. It avoids
the need for any re-alloc of the cache and taking a reference on nodes
in the cache. We only have a single source of removing cache entries
which is of_detach_node().
Using hash_32() removes any assumption on phandle values improving
the hit rate for non-linear phandle values. The effect on linear values
using hash_32() is about a 10% collision. The chances of thrashing on
colliding values seems to be low.
To compare performance, I used a RK3399 board which is a pretty typical
system. I found that just measuring boot time as done previously is
noisy and may be impacted by other things. Also bringing up secondary
cores causes some issues with measuring, so I booted with 'nr_cpus=1'.
With no caching, calls to of_find_node_by_phandle() take about 20124 us
for 1248 calls. There's an additional 288 calls before time keeping is
up. Using the average time per hit/miss with the cache, we can calculate
these calls to take 690 us (277 hit / 11 miss) with a 128 entry cache
and 13319 us with no cache or an uninitialized cache.
Comparing the 3 implementations the time spent in
of_find_node_by_phandle() is:
no cache: 20124 us (+ 13319 us)
128 entry cache: 5134 us (+ 690 us)
current cache: 819 us (+ 13319 us)
We could move the allocation of the cache earlier to improve the
current cache, but that just further complicates the situation as it
needs to be after slab is up, so we can't do it when unflattening (which
uses memblock).
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lkml.kernel.org/r/20191211232345.24810-1-robh@kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
74/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: timekeeping: Split jiffies seqlock
Date: Thu, 14 Feb 2013 22:36:59 +0100
Replace jiffies_lock seqlock with a simple seqcounter and a rawlock so
it can be taken in atomic context on RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
75/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: signal: Revert ptrace preempt magic
Date: Wed, 21 Sep 2011 19:57:12 +0200
Upstream commit '53da1d9456fe7f8 fix ptrace slowness' is nothing more
than a bandaid around the ptrace design trainwreck. It's not a
correctness issue, it's merily a cosmetic bandaid.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
76/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: dma-buf: Use seqlock_t instread disabling preemption
Date: Wed, 14 Aug 2019 16:38:43 +0200
"dma reservation" disables preemption while acquiring the write access
for "seqcount".
Replace the seqcount with a seqlock_t which provides seqcount like
semantic and lock for writer.
Link: https://lkml.kernel.org/r/f410b429-db86-f81c-7c67-f563fa808b62@free.fr
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
77/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: seqlock: Prevent rt starvation
Date: Wed, 22 Feb 2012 12:03:30 +0100
If a low prio writer gets preempted while holding the seqlock write
locked, a high prio reader spins forever on RT.
To prevent this let the reader grab the spinlock, so it blocks and
eventually boosts the writer. This way the writer can proceed and
endless spinning is prevented.
For seqcount writers we disable preemption over the update code
path. Thanks to Al Viro for distangling some VFS code to make that
possible.
Nicholas Mc Guire:
- spin_lock+unlock => spin_unlock_wait
- __write_seqcount_begin => __raw_write_seqcount_begin
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
78/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: NFSv4: replace seqcount_t with a seqlock_t
Date: Fri, 28 Oct 2016 23:05:11 +0200
The raw_write_seqcount_begin() in nfs4_reclaim_open_state() causes a
preempt_disable() on -RT. The spin_lock()/spin_unlock() in that section does
not work.
The lockdep part was removed in commit
abbec2da13f0 ("NFS: Use raw_write_seqcount_begin/end int nfs4_reclaim_open_state")
because lockdep complained.
The whole seqcount thing was introduced in commit
c137afabe330 ("NFSv4: Allow the state manager to mark an open_owner as being recovered").
The recovery threads runs only once.
write_seqlock() does not work on !RT because it disables preemption and it the
writer side is preemptible (has to remain so despite the fact that it will
block readers).
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Link: https://lkml.kernel.org/r/20161021164727.24485-1-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
79/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/Qdisc: use a seqlock instead seqcount
Date: Wed, 14 Sep 2016 17:36:35 +0200
The seqcount disables preemption on -RT while it is held which can't
remove. Also we don't want the reader to spin for ages if the writer is
scheduled out. The seqlock on the other hand will serialize / sleep on
the lock while writer is active.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
80/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: Add a mutex around devnet_rename_seq
Date: Wed, 20 Mar 2013 18:06:20 +0100
On RT write_seqcount_begin() disables preemption and device_rename()
allocates memory with GFP_KERNEL and grabs later the sysfs_mutex
mutex. Serialize with a mutex and add use the non preemption disabling
__write_seqcount_begin().
To avoid writer starvation, let the reader grab the mutex and release
it when it detects a writer in progress. This keeps the normal case
(no reader on the fly) fast.
[ tglx: Instead of replacing the seqcount by a mutex, add the mutex ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
81/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: userfaultfd: Use a seqlock instead of seqcount
Date: Wed, 18 Dec 2019 12:25:09 +0100
On RT write_seqcount_begin() disables preemption which leads to warning
in add_wait_queue() while the spinlock_t is acquired.
The waitqueue can't be converted to swait_queue because
userfaultfd_wake_function() is used as a custom wake function.
Use seqlock instead seqcount to avoid the preempt_disable() section
during add_wait_queue().
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
82/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/nfs: turn rmdir_sem into a semaphore
Date: Thu, 15 Sep 2016 10:51:27 +0200
The RW semaphore had a reader side which used the _non_owner version
because it most likely took the reader lock in one thread and released it
in another which would cause lockdep to complain if the "regular"
version was used.
On -RT we need the owner because the rw lock is turned into a rtmutex.
The semaphores on the hand are "plain simple" and should work as
expected. We can't have multiple readers but on -RT we don't allow
multiple readers anyway so that is not a loss.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
83/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: disable preemption on i_dir_seq's write side
Date: Fri, 20 Oct 2017 11:29:53 +0200
i_dir_seq is an opencoded seqcounter. Based on the code it looks like we
could have two writers in parallel despite the fact that the d_lock is
held. The problem is that during the write process on RT the preemption
is still enabled and if this process is interrupted by a reader with RT
priority then we lock up.
To avoid that lock up I am disabling the preemption during the update.
The rename of i_dir_seq is here to ensure to catch new write sides in
future.
Cc: stable-rt@vger.kernel.org
Reported-by: Oleg.Karfich@wago.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
84/251 [
Author: Paul Gortmaker
Email: paul.gortmaker@windriver.com
Subject: list_bl: Make list head locking RT safe
Date: Fri, 21 Jun 2013 15:07:25 -0400
As per changes in include/linux/jbd_common.h for avoiding the
bit_spin_locks on RT ("fs: jbd/jbd2: Make state lock and journal
head lock rt safe") we do the same thing here.
We use the non atomic __set_bit and __clear_bit inside the scope of
the lock to preserve the ability of the existing LIST_DEBUG code to
use the zero'th bit in the sanity checks.
As a bit spinlock, we had no lockdep visibility into the usage
of the list head locking. Now, if we were to implement it as a
standard non-raw spinlock, we would see:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
in_atomic(): 1, irqs_disabled(): 0, pid: 122, name: udevd
5 locks held by udevd/122:
#0: (&sb->s_type->i_mutex_key#7/1){+.+.+.}, at: [<ffffffff811967e8>] lock_rename+0xe8/0xf0
#1: (rename_lock){+.+...}, at: [<ffffffff811a277c>] d_move+0x2c/0x60
#2: (&dentry->d_lock){+.+...}, at: [<ffffffff811a0763>] dentry_lock_for_move+0xf3/0x130
#3: (&dentry->d_lock/2){+.+...}, at: [<ffffffff811a0734>] dentry_lock_for_move+0xc4/0x130
#4: (&dentry->d_lock/3){+.+...}, at: [<ffffffff811a0747>] dentry_lock_for_move+0xd7/0x130
Pid: 122, comm: udevd Not tainted 3.4.47-rt62 #7
Call Trace:
[<ffffffff810b9624>] __might_sleep+0x134/0x1f0
[<ffffffff817a24d4>] rt_spin_lock+0x24/0x60
[<ffffffff811a0c4c>] __d_shrink+0x5c/0xa0
[<ffffffff811a1b2d>] __d_drop+0x1d/0x40
[<ffffffff811a24be>] __d_move+0x8e/0x320
[<ffffffff811a278e>] d_move+0x3e/0x60
[<ffffffff81199598>] vfs_rename+0x198/0x4c0
[<ffffffff8119b093>] sys_renameat+0x213/0x240
[<ffffffff817a2de5>] ? _raw_spin_unlock+0x35/0x60
[<ffffffff8107781c>] ? do_page_fault+0x1ec/0x4b0
[<ffffffff817a32ca>] ? retint_swapgs+0xe/0x13
[<ffffffff813eb0e6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8119b0db>] sys_rename+0x1b/0x20
[<ffffffff817a3b96>] system_call_fastpath+0x1a/0x1f
Since we are only taking the lock during short lived list operations,
lets assume for now that it being raw won't be a significant latency
concern.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
[julia@ni.com: Use #define instead static inline to avoid false positive from
lockdep]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
85/251 [
Author: Clark Williams
Email: williams@redhat.com
Subject: fscache: initialize cookie hash table raw spinlocks
Date: Tue, 3 Jul 2018 13:34:30 -0500
The fscache cookie mechanism uses a hash table of hlist_bl_head structures. The
PREEMPT_RT patcheset adds a raw spinlock to this structure and so on PREEMPT_RT
the structures get used uninitialized, causing warnings about bad magic numbers
when spinlock debugging is turned on.
Use the init function for fscache cookies.
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
86/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: bring back explicit INIT_HLIST_BL_HEAD init
Date: Wed, 13 Sep 2017 12:32:34 +0200
Commit 3d375d78593c ("mm: update callers to use HASH_ZERO flag") removed
INIT_HLIST_BL_HEAD and uses the ZERO flag instead for the init. However
on RT we have also a spinlock which needs an init call so we can't use
that.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
87/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: use swait_queue instead of waitqueue
Date: Wed, 14 Sep 2016 14:35:49 +0200
__d_lookup_done() invokes wake_up_all() while holding a hlist_bl_lock()
which disables preemption. As a workaround convert it to swait.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
88/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: kconfig: Disable config options which are not RT compatible
Date: Sun, 24 Jul 2011 12:11:43 +0200
Disable stuff which is known to have issues on RT
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
89/251 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm: Allow only SLUB on RT
Date: Fri, 3 Jul 2009 08:44:03 -0500
Memory allocation disables interrupts as part of the allocation and freeing
process. For -RT it is important that this section remain short and don't
depend on the size of the request or an internal state of the memory allocator.
At the beginning the SLAB memory allocator was adopted for RT's needs and it
required substantial changes. Later, with the addition of the SLUB memory
allocator we adopted this one as well and the changes were smaller. More
important, due to the design of the SLUB allocator it performs better and its
worst case latency was smaller. In the end only SLUB remained supported.
Disable SLAB and SLOB on -RT. Only SLUB is adopted to -RT needs.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
90/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rcu: make RCU_BOOST default on RT
Date: Fri, 21 Mar 2014 20:19:05 +0100
Since it is no longer invoked from the softirq people run into OOM more
often if the priority of the RCU thread is too low. Making boosting
default on RT should help in those case and it can be switched off if
someone knows better.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
91/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Disable CONFIG_RT_GROUP_SCHED on RT
Date: Mon, 18 Jul 2011 17:03:52 +0200
Carsten reported problems when running:
taskset 01 chrt -f 1 sleep 1
from within rc.local on a F15 machine. The task stays running and
never gets on the run queue because some of the run queues have
rt_throttled=1 which does not go away. Works nice from a ssh login
shell. Disabling CONFIG_RT_GROUP_SCHED solves that as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
92/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/core: disable NET_RX_BUSY_POLL on RT
Date: Sat, 27 May 2017 19:02:06 +0200
napi_busy_loop() disables preemption and performs a NAPI poll. We can't acquire
sleeping locks with disabled preemption so we would have to work around this
and add explicit locking for synchronisation against ksoftirqd.
Without explicit synchronisation a low priority process would "own" the NAPI
state (by setting NAPIF_STATE_SCHED) and could be scheduled out (no
preempt_disable() and BH is preemptible on RT).
In case a network packages arrives then the interrupt handler would set
NAPIF_STATE_MISSED and the system would wait until the task owning the NAPI
would be scheduled in again.
Should a task with RT priority busy poll then it would consume the CPU instead
allowing tasks with lower priority to run.
The NET_RX_BUSY_POLL is disabled by default (the system wide sysctls for
poll/read are set to zero) so disable NET_RX_BUSY_POLL on RT to avoid wrong
locking context on RT. Should this feature be considered useful on RT systems
then it could be enabled again with proper locking and synchronisation.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
93/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: md: disable bcache
Date: Thu, 29 Aug 2013 11:48:57 +0200
It uses anon semaphores
|drivers/md/bcache/request.c: In function ‘cached_dev_write_complete’:
|drivers/md/bcache/request.c:1007:2: error: implicit declaration of function ‘up_read_non_owner’ [-Werror=implicit-function-declaration]
| up_read_non_owner(&dc->writeback_lock);
| ^
|drivers/md/bcache/request.c: In function ‘request_write’:
|drivers/md/bcache/request.c:1033:2: error: implicit declaration of function ‘down_read_non_owner’ [-Werror=implicit-function-declaration]
| down_read_non_owner(&dc->writeback_lock);
| ^
either we get rid of those or we have to introduce them…
Link: http://lkml.kernel.org/r/20130820111602.3cea203c@gandalf.local.home
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
94/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: efi: Disable runtime services on RT
Date: Thu, 26 Jul 2018 15:03:16 +0200
Based on meassurements the EFI functions get_variable /
get_next_variable take up to 2us which looks okay.
The functions get_time, set_time take around 10ms. Those 10ms are too
much. Even one ms would be too much.
Ard mentioned that SetVariable might even trigger larger latencies if
the firware will erase flash blocks on NOR.
The time-functions are used by efi-rtc and can be triggered during
runtimed (either via explicit read/write or ntp sync).
The variable write could be used by pstore.
These functions can be disabled without much of a loss. The poweroff /
reboot hooks may be provided by PSCI.
Disable EFI's runtime wrappers.
This was observed on "EFI v2.60 by SoftIron Overdrive 1000".
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
95/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: efi: Allow efi=runtime
Date: Thu, 26 Jul 2018 15:06:10 +0200
In case the command line option "efi=noruntime" is default at built-time, the user
could overwrite its state by `efi=runtime' and allow it again.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
96/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Disable HAVE_ARCH_JUMP_LABEL
Date: Mon, 1 Jul 2019 17:39:28 +0200
__text_poke() does:
| local_irq_save(flags);
…
| ptep = get_locked_pte(poking_mm, poking_addr, &ptl);
which does not work on -RT because the PTE-lock is a spinlock_t typed lock.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
97/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Add local irq locks
Date: Mon, 20 Jun 2011 09:03:47 +0200
Introduce locallock. For !RT this maps to preempt_disable()/
local_irq_disable() so there is not much that changes. For RT this will
map to a spinlock. This makes preemption possible and locked "ressource"
gets the lockdep anotation it wouldn't have otherwise. The locks are
recursive for owner == current. Also, all locks user migrate_disable()
which ensures that the task is not migrated to another CPU while the lock
is held and the owner is preempted.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
98/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: softirq: Add preemptible softirq
Date: Mon, 20 May 2019 13:09:08 +0200
Add preemptible softirq for RT's needs. By removing the softirq count
from the preempt counter, the softirq becomes preemptible. A per-CPU
lock ensures that there is no parallel softirq processing or that
per-CPU variables are not access in parallel by multiple threads.
local_bh_enable() will process all softirq work that has been raised in
its BH-disabled section once the BH counter gets to 0.
[+ rcu_read_lock() as part of local_bh_disable() by Scott Wood]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
99/251 [
Author: Oleg Nesterov
Email: oleg@redhat.com
Subject: signal/x86: Delay calling signals in atomic
Date: Tue, 14 Jul 2015 14:26:34 +0200
On x86_64 we must disable preemption before we enable interrupts
for stack faults, int3 and debugging, because the current task is using
a per CPU debug stack defined by the IST. If we schedule out, another task
can come in and use the same stack and cause the stack to be corrupted
and crash the kernel on return.
When CONFIG_PREEMPT_RT is enabled, spin_locks become mutexes, and
one of these is the spin lock used in signal handling.
Some of the debug code (int3) causes do_trap() to send a signal.
This function calls a spin lock that has been converted to a mutex
and has the possibility to sleep. If this happens, the above issues with
the corrupted stack is possible.
Instead of calling the signal right away, for PREEMPT_RT and x86_64,
the signal information is stored on the stacks task_struct and
TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
code will send the signal when preemption is enabled.
[ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: also needed on 32bit as per Yang Shi <yang.shi@linaro.org>]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
100/251 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: Split IRQ-off and zone->lock while freeing pages from PCP list #2
Date: Mon, 28 May 2018 15:24:21 +0200
Split the IRQ-off section while accessing the PCP list from zone->lock
while freeing pages.
Introcude isolate_pcp_pages() which separates the pages from the PCP
list onto a temporary list and then free the temporary list via
free_pcppages_bulk().
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
101/251 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: Split IRQ-off and zone->lock while freeing pages from PCP list #2
Date: Mon, 28 May 2018 15:24:21 +0200
Split the IRQ-off section while accessing the PCP list from zone->lock
while freeing pages.
Introcude isolate_pcp_pages() which separates the pages from the PCP
list onto a temporary list and then free the temporary list via
free_pcppages_bulk().
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
102/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/SLxB: change list_lock to raw_spinlock_t
Date: Mon, 28 May 2018 15:24:22 +0200
The list_lock is used with used with IRQs off on RT. Make it a raw_spinlock_t
otherwise the interrupts won't be disabled on -RT. The locking rules remain
the same on !RT.
This patch changes it for SLAB and SLUB since both share the same header
file for struct kmem_cache_node defintion.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
103/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/SLUB: delay giving back empty slubs to IRQ enabled regions
Date: Thu, 21 Jun 2018 17:29:19 +0200
__free_slab() is invoked with disabled interrupts which increases the
irq-off time while __free_pages() is doing the work.
Allow __free_slab() to be invoked with enabled interrupts and move
everything from interrupts-off invocations to a temporary per-CPU list
so it can be processed later.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
104/251 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm: page_alloc: rt-friendly per-cpu pages
Date: Fri, 3 Jul 2009 08:29:37 -0500
rt-friendly per-cpu pages: convert the irqs-off per-cpu locking
method into a preemptible, explicit-per-cpu-locks method.
Contains fixes from:
Peter Zijlstra <a.p.zijlstra@chello.nl>
Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
105/251 [
Author: Anna-Maria Gleixner
Email: anna-maria@linutronix.de
Subject: mm/page_alloc: Split drain_local_pages()
Date: Thu, 18 Apr 2019 11:09:04 +0200
Splitting the functionality of drain_local_pages() into a separate
function. This is a preparatory work for introducing the static key
dependend locking mechanism.
No functional change.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
106/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/swap: Add static key dependent pagevec locking
Date: Thu, 18 Apr 2019 11:09:05 +0200
The locking of struct pagevec is done by disabling preemption. In case the
struct has be accessed form interrupt context then interrupts are
disabled. This means the struct can only be accessed locally from the
CPU. There is also no lockdep coverage which would scream during if it
accessed from wrong context.
Create struct swap_pagevec which contains of a pagevec member and a
spin_lock_t. Introduce a static key, which changes the locking behavior
only if the key is set in the following way: Before the struct is accessed
the spin_lock has to be acquired instead of using preempt_disable(). Since
the struct is used CPU-locally there is no spinning on the lock but the
lock is acquired immediately. If the struct is accessed from interrupt
context, spin_lock_irqsave() is used.
No functional change yet because static key is not enabled.
[anna-maria: introduce static key]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
107/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/swap: Access struct pagevec remotely
Date: Thu, 18 Apr 2019 11:09:06 +0200
When the newly introduced static key would be enabled, struct pagevec is
locked during access. So it is possible to access it from a remote CPU. The
advantage is that the work can be done from the "requesting" CPU without
firing a worker on a remote CPU and waiting for it to complete the work.
No functional change because static key is not enabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
108/251 [
Author: Anna-Maria Gleixner
Email: anna-maria@linutronix.de
Subject: mm/swap: Enable "use_pvec_lock" nohz_full dependent
Date: Thu, 18 Apr 2019 11:09:07 +0200
When a system runs with CONFIG_NO_HZ_FULL enabled, the tick of CPUs listed
in 'nohz_full=' kernel command line parameter should be stopped whenever
possible. The tick stays longer stopped, when work for this CPU is handled
by another CPU.
With the already introduced static key 'use_pvec_lock' there is the
possibility to prevent firing a worker for mm/swap work on a remote CPU
with a stopped tick.
Therefore enabling the static key in case kernel command line parameter
'nohz_full=' setup was successful, which implies that CONFIG_NO_HZ_FULL is
set.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
109/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/swap: Enable use pvec lock on RT
Date: Mon, 12 Aug 2019 11:20:44 +0200
On RT we also need to avoid preempt disable/IRQ-off regions so have to enable
the locking while accessing pvecs.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
110/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: preempt: Provide preempt_*_(no)rt variants
Date: Fri, 24 Jul 2009 12:38:56 +0200
RT needs a few preempt_disable/enable points which are not necessary
otherwise. Implement variants to avoid #ifdeffery.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
111/251 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm/vmstat: Protect per cpu variables with preempt disable on RT
Date: Fri, 3 Jul 2009 08:30:13 -0500
Disable preemption on -RT for the vmstat code. On vanila the code runs in
IRQ-off regions while on -RT it is not. "preempt_disable" ensures that the
same ressources is not updated in parallel due to preemption.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
112/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm: Enable SLUB for RT
Date: Thu, 25 Oct 2012 10:32:35 +0100
Avoid the memory allocation in IRQ section
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: factor out everything except the kcalloc() workaorund ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
113/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: slub: Enable irqs for __GFP_WAIT
Date: Wed, 9 Jan 2013 12:08:15 +0100
SYSTEM_RUNNING might be too late for enabling interrupts. Allocations
with GFP_WAIT can happen before that. So use this as an indicator.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
114/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: slub: Disable SLUB_CPU_PARTIAL
Date: Wed, 15 Apr 2015 19:00:47 +0200
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
|in_atomic(): 1, irqs_disabled(): 0, pid: 87, name: rcuop/7
|1 lock held by rcuop/7/87:
| #0: (rcu_callback){......}, at: [<ffffffff8112c76a>] rcu_nocb_kthread+0x1ca/0x5d0
|Preemption disabled at:[<ffffffff811eebd9>] put_cpu_partial+0x29/0x220
|
|CPU: 0 PID: 87 Comm: rcuop/7 Tainted: G W 4.0.0-rt0+ #477
|Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
| 000000000007a9fc ffff88013987baf8 ffffffff817441c7 0000000000000007
| 0000000000000000 ffff88013987bb18 ffffffff810eee51 0000000000000000
| ffff88013fc10200 ffff88013987bb48 ffffffff8174a1c4 000000000007a9fc
|Call Trace:
| [<ffffffff817441c7>] dump_stack+0x4f/0x90
| [<ffffffff810eee51>] ___might_sleep+0x121/0x1b0
| [<ffffffff8174a1c4>] rt_spin_lock+0x24/0x60
| [<ffffffff811a689a>] __free_pages_ok+0xaa/0x540
| [<ffffffff811a729d>] __free_pages+0x1d/0x30
| [<ffffffff811eddd5>] __free_slab+0xc5/0x1e0
| [<ffffffff811edf46>] free_delayed+0x56/0x70
| [<ffffffff811eecfd>] put_cpu_partial+0x14d/0x220
| [<ffffffff811efc98>] __slab_free+0x158/0x2c0
| [<ffffffff811f0021>] kmem_cache_free+0x221/0x2d0
| [<ffffffff81204d0c>] file_free_rcu+0x2c/0x40
| [<ffffffff8112c7e3>] rcu_nocb_kthread+0x243/0x5d0
| [<ffffffff810e951c>] kthread+0xfc/0x120
| [<ffffffff8174abc8>] ret_from_fork+0x58/0x90
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
115/251 [
Author: Yang Shi
Email: yang.shi@windriver.com
Subject: mm/memcontrol: Don't call schedule_work_on in preemption disabled context
Date: Wed, 30 Oct 2013 11:48:33 -0700
The following trace is triggered when running ltp oom test cases:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 0, pid: 17188, name: oom03
Preemption disabled at:[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
CPU: 2 PID: 17188 Comm: oom03 Not tainted 3.10.10-rt3 #2
Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
ffff88007684d730 ffff880070df9b58 ffffffff8169918d ffff880070df9b70
ffffffff8106db31 ffff88007688b4a0 ffff880070df9b88 ffffffff8169d9c0
ffff88007688b4a0 ffff880070df9bc8 ffffffff81059da1 0000000170df9bb0
Call Trace:
[<ffffffff8169918d>] dump_stack+0x19/0x1b
[<ffffffff8106db31>] __might_sleep+0xf1/0x170
[<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50
[<ffffffff81059da1>] queue_work_on+0x61/0x100
[<ffffffff8112b361>] drain_all_stock+0xe1/0x1c0
[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
[<ffffffff8112beda>] __mem_cgroup_try_charge+0x41a/0xc40
[<ffffffff810f1c91>] ? release_pages+0x1b1/0x1f0
[<ffffffff8106f200>] ? sched_exec+0x40/0xb0
[<ffffffff8112cc87>] mem_cgroup_charge_common+0x37/0x70
[<ffffffff8112e2c6>] mem_cgroup_newpage_charge+0x26/0x30
[<ffffffff8110af68>] handle_pte_fault+0x618/0x840
[<ffffffff8103ecf6>] ? unpin_current_cpu+0x16/0x70
[<ffffffff81070f94>] ? migrate_enable+0xd4/0x200
[<ffffffff8110cde5>] handle_mm_fault+0x145/0x1e0
[<ffffffff810301e1>] __do_page_fault+0x1a1/0x4c0
[<ffffffff8169c9eb>] ? preempt_schedule_irq+0x4b/0x70
[<ffffffff8169e3b7>] ? retint_kernel+0x37/0x40
[<ffffffff8103053e>] do_page_fault+0xe/0x10
[<ffffffff8169e4c2>] page_fault+0x22/0x30
So, to prevent schedule_work_on from being called in preempt disabled context,
replace the pair of get/put_cpu() to get/put_cpu_light().
Signed-off-by: Yang Shi <yang.shi@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
116/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/memcontrol: Replace local_irq_disable with local locks
Date: Wed, 28 Jan 2015 17:14:16 +0100
There are a few local_irq_disable() which then take sleeping locks. This
patch converts them local locks.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
117/251 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: mm/zsmalloc: copy with get_cpu_var() and locking
Date: Tue, 22 Mar 2016 11:16:09 +0100
get_cpu_var() disables preemption and triggers a might_sleep() splat later.
This is replaced with get_locked_var().
This bitspinlocks are replaced with a proper mutex which requires a slightly
larger struct to allocate.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: replace the bitspin_lock() with a mutex, get_locked_var(). Mike then
fixed the size magic]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
118/251 [
Author: Luis Claudio R. Goncalves
Email: lclaudio@uudg.org
Subject: mm/zswap: Do not disable preemption in zswap_frontswap_store()
Date: Tue, 25 Jun 2019 11:28:04 -0300
Zswap causes "BUG: scheduling while atomic" by blocking on a rt_spin_lock() with
preemption disabled. The preemption is disabled by get_cpu_var() in
zswap_frontswap_store() to protect the access of the zswap_dstmem percpu variable.
Use get_locked_var() to protect the percpu zswap_dstmem variable, making the
code preemptive.
As get_cpu_ptr() also disables preemption, replace it by this_cpu_ptr() and
remove the counterpart put_cpu_ptr().
Steps to Reproduce:
1. # grubby --args "zswap.enabled=1" --update-kernel DEFAULT
2. # reboot
3. Calculate the amount o memory to be used by the test:
---> grep MemAvailable /proc/meminfo
---> Add 25% ~ 50% to that value
4. # stress --vm 1 --vm-bytes ${MemAvailable+25%} --timeout 240s
Usually, in less than 5 minutes the backtrace listed below appears, followed
by a kernel panic:
| BUG: scheduling while atomic: kswapd1/181/0x00000002
|
| Preemption disabled at:
| [<ffffffff8b2a6cda>] zswap_frontswap_store+0x21a/0x6e1
|
| Kernel panic - not syncing: scheduling while atomic
| CPU: 14 PID: 181 Comm: kswapd1 Kdump: loaded Not tainted 5.0.14-rt9 #1
| Hardware name: AMD Pence/Pence, BIOS WPN2321X_Weekly_12_03_21 03/19/2012
| Call Trace:
| panic+0x106/0x2a7
| __schedule_bug.cold+0x3f/0x51
| __schedule+0x5cb/0x6f0
| schedule+0x43/0xd0
| rt_spin_lock_slowlock_locked+0x114/0x2b0
| rt_spin_lock_slowlock+0x51/0x80
| zbud_alloc+0x1da/0x2d0
| zswap_frontswap_store+0x31a/0x6e1
| __frontswap_store+0xab/0x130
| swap_writepage+0x39/0x70
| pageout.isra.0+0xe3/0x320
| shrink_page_list+0xa8e/0xd10
| shrink_inactive_list+0x251/0x840
| shrink_node_memcg+0x213/0x770
| shrink_node+0xd9/0x450
| balance_pgdat+0x2d5/0x510
| kswapd+0x218/0x470
| kthread+0xfb/0x130
| ret_from_fork+0x27/0x50
Cc: stable-rt@vger.kernel.org
Reported-by: Ping Fang <pifang@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
119/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: radix-tree: use local locks
Date: Wed, 25 Jan 2017 16:34:27 +0100
The preload functionality uses per-CPU variables and preempt-disable to
ensure that it does not switch CPUs during its usage. This patch adds
local_locks() instead preempt_disable() for the same purpose and to
remain preemptible on -RT.
Cc: stable-rt@vger.kernel.org
Reported-and-debugged-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
120/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: kvm Require const tsc for RT
Date: Sun, 6 Nov 2011 12:26:18 +0100
Non constant TSC is a nightmare on bare metal already, but with
virtualization it becomes a complete disaster because the workarounds
are horrible latency wise. That's also a preliminary for running RT in
a guest on top of a RT host.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
121/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: pci/switchtec: Don't use completion's wait queue
Date: Wed, 4 Oct 2017 10:24:23 +0200
The poll callback is using completion's wait_queue_head_t member and
puts it in poll_wait() so the poll() caller gets a wakeup after command
completed. This does not work on RT because we don't have a
wait_queue_head_t in our completion implementation. Nobody in tree does
like that in tree so this is the only driver that breaks.
Instead of using the completion here is waitqueue with a status flag as
suggested by Logan.
I don't have the HW so I have no idea if it works as expected, so please
test it.
Cc: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
122/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: wait.h: include atomic.h
Date: Mon, 28 Oct 2013 12:19:57 +0100
| CC init/main.o
|In file included from include/linux/mmzone.h:9:0,
| from include/linux/gfp.h:4,
| from include/linux/kmod.h:22,
| from include/linux/module.h:13,
| from init/main.c:15:
|include/linux/wait.h: In function ‘wait_on_atomic_t’:
|include/linux/wait.h:982:2: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration]
| if (atomic_read(val) == 0)
| ^
This pops up on ARM. Non-RT gets its atomic.h include from spinlock.h
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
123/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: completion: Use simple wait queues
Date: Fri, 11 Jan 2013 11:23:51 +0100
Completions have no long lasting callbacks and therefor do not need
the complex waitqueue variant. Use simple waitqueues which reduces the
contention on the waitqueue lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[cminyard@mvista.com: Move __prepare_to_swait() into the do loop because
swake_up_locked() removes the waiter on wake from the queue while in the
original code it is not the case]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
124/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: hrtimer: Allow raw wakeups during boot
Date: Fri, 9 Aug 2019 15:25:21 +0200
There are a few wake-up timers during the early boot which are essencial for
the system to make progress. At this stage there are no softirq spawn for the
softirq processing so there is no timer processing in softirq.
The wakeup in question:
smpboot_create_thread()
-> kthread_create_on_cpu()
-> kthread_bind()
-> wait_task_inactive()
-> schedule_hrtimeout()
Let the timer fire in hardirq context during the system boot.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
125/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: hrtimer: move state change before hrtimer_cancel in do_nanosleep()
Date: Thu, 6 Dec 2018 10:15:13 +0100
There is a small window between setting t->task to NULL and waking the
task up (which would set TASK_RUNNING). So the timer would fire, run and
set ->task to NULL while the other side/do_nanosleep() wouldn't enter
freezable_schedule(). After all we are peemptible here (in
do_nanosleep() and on the timer wake up path) and on KVM/virt the
virt-CPU might get preempted.
So do_nanosleep() wouldn't enter freezable_schedule() but cancel the
timer which is still running and wait for it via
hrtimer_wait_for_timer(). Then wait_event()/might_sleep() would complain
that it is invoked with state != TASK_RUNNING.
This isn't a problem since it would be reset to TASK_RUNNING later
anyway and we don't rely on the previous state.
Move the state update to TASK_RUNNING before hrtimer_cancel() so there
are no complains from might_sleep() about wrong state.
Cc: stable-rt@vger.kernel.org
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
126/251 [
Author: John Stultz
Email: johnstul@us.ibm.com
Subject: posix-timers: Thread posix-cpu-timers on -rt
Date: Fri, 3 Jul 2009 08:29:58 -0500
posix-cpu-timer code takes non -rt safe locks in hard irq
context. Move it to a thread.
[ 3.0 fixes from Peter Zijlstra <peterz@infradead.org> ]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
127/251 [
Author: Anna-Maria Gleixner
Email: anna-maria@linutronix.de
Subject: posix-timers: Add expiry lock
Date: Mon, 27 May 2019 16:54:06 +0200
If a about to be removed posix timer is active then the code will retry the
delete operation until it succeeds / the timer callback completes.
Use hrtimer_grab_expiry_lock() for posix timers which use a hrtimer underneath
to spin on a lock until the callback finished.
Introduce cpu_timers_grab_expiry_lock() for the posix-cpu-timer. This will
acquire the proper per-CPU spin_lock which is acquired by the CPU which is
expirering the timer.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
[bigeasy: keep the posix-cpu timer bits, everything else got applied]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
128/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Limit the number of task migrations per batch
Date: Mon, 6 Jun 2011 12:12:51 +0200
Put an upper limit on the number of tasks which are migrated per batch
to avoid large latencies.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
129/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Move mmdrop to RCU on RT
Date: Mon, 6 Jun 2011 12:20:33 +0200
Takes sleeping locks and calls into the memory allocator, so nothing
we want to do in task switch and oder atomic contexts.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
130/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/sched: move stack + kprobe clean up to __put_task_struct()
Date: Mon, 21 Nov 2016 19:31:08 +0100
There is no need to free the stack before the task struct (except for reasons
mentioned in commit 68f24b08ee89 ("sched/core: Free the stack early if
CONFIG_THREAD_INFO_IN_TASK")). This also comes handy on -RT because we can't
free memory in preempt disabled region.
vfree_atomic() delays the memory cleanup to a worker. Since we move everything
to the RCU callback, we can also free it immediately.
Cc: stable-rt@vger.kernel.org #for kprobe_flush_task()
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
131/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add saved_state for tasks blocked on sleeping locks
Date: Sat, 25 Jun 2011 09:21:04 +0200
Spinlocks are state preserving in !RT. RT changes the state when a
task gets blocked on a lock. So we need to remember the state before
the lock contention. If a regular wakeup (not a RTmutex related
wakeup) happens, the saved_state is updated to running. When the lock
sleep is done, the saved state is restored.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
132/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Do not account rcu_preempt_depth on RT in might_sleep()
Date: Tue, 7 Jun 2011 09:19:06 +0200
RT changes the rcu_preempt_depth semantics, so we cannot check for it
in might_sleep().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
133/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Disable TTWU_QUEUE on RT
Date: Tue, 13 Sep 2011 16:42:35 +0200
The queued remote wakeup mechanism can introduce rather large
latencies if the number of migrated tasks is high. Disable it for RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
134/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: softirq: Avoid a cancel dead-lock in tasklet handling due to preemptible-softirq
Date: Sat, 22 Jun 2019 00:09:22 +0200
A pending / active tasklet which is preempted by a task on the same CPU
will spin indefinitely because the tasklet makes no progress.
To avoid this deadlock we can disable BH which will acquire the
softirq-lock which will force the completion of the softirq and so the
tasklet.
The BH off/on in tasklet_kill() will force tasklets which are not yet
running but scheduled (because ksoftirqd was preempted before it could
start the tasklet).
The BH off/on in tasklet_unlock_wait() will force tasklets which got
preempted while running.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
135/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Check preemption after reenabling interrupts
Date: Sun, 13 Nov 2011 17:17:09 +0100
raise_softirq_irqoff() disables interrupts and wakes the softirq
daemon, but after reenabling interrupts there is no preemption check,
so the execution of the softirq thread might be delayed arbitrarily.
In principle we could add that check to local_irq_enable/restore, but
that's overkill as the rasie_softirq_irqoff() sections are the only
ones which show this behaviour.
Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
136/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Disable softirq stacks for RT
Date: Mon, 18 Jul 2011 13:59:17 +0200
Disable extra stacks for softirqs. We want to preempt softirqs and
having them on special IRQ-stack does not make this easier.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
137/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/core: use local_bh_disable() in netif_rx_ni()
Date: Fri, 16 Jun 2017 19:03:16 +0200
In 2004 netif_rx_ni() gained a preempt_disable() section around
netif_rx() and its do_softirq() + testing for it. The do_softirq() part
is required because netif_rx() raises the softirq but does not invoke
it. The preempt_disable() is required to remain on the same CPU which added the
skb to the per-CPU list.
All this can be avoided be putting this into a local_bh_disable()ed
section. The local_bh_enable() part will invoke do_softirq() if
required.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
138/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Handle the various new futex race conditions
Date: Fri, 10 Jun 2011 11:04:15 +0200
RT opens a few new interesting race conditions in the rtmutex/futex
combo due to futex hash bucket lock being a 'sleeping' spinlock and
therefor not disabling preemption.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
139/251 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: futex: Fix bug on when a requeued RT task times out
Date: Tue, 14 Jul 2015 14:26:34 +0200
Requeue with timeout causes a bug with PREEMPT_RT.
The bug comes from a timed out condition.
TASK 1 TASK 2
------ ------
futex_wait_requeue_pi()
futex_wait_queue_me()
<timed out>
double_lock_hb();
raw_spin_lock(pi_lock);
if (current->pi_blocked_on) {
} else {
current->pi_blocked_on = PI_WAKE_INPROGRESS;
run_spin_unlock(pi_lock);
spin_lock(hb->lock); <-- blocked!
plist_for_each_entry_safe(this) {
rt_mutex_start_proxy_lock();
task_blocks_on_rt_mutex();
BUG_ON(task->pi_blocked_on)!!!!
The BUG_ON() actually has a check for PI_WAKE_INPROGRESS, but the
problem is that, after TASK 1 sets PI_WAKE_INPROGRESS, it then tries to
grab the hb->lock, which it fails to do so. As the hb->lock is a mutex,
it will block and set the "pi_blocked_on" to the hb->lock.
When TASK 2 goes to requeue it, the check for PI_WAKE_INPROGESS fails
because the task1's pi_blocked_on is no longer set to that, but instead,
set to the hb->lock.
The fix:
When calling rt_mutex_start_proxy_lock() a check is made to see
if the proxy tasks pi_blocked_on is set. If so, exit out early.
Otherwise set it to a new flag PI_REQUEUE_INPROGRESS, which notifies
the proxy task that it is being requeued, and will handle things
appropriately.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
140/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: futex: Ensure lock/unlock symetry versus pi_lock and hash bucket lock
Date: Fri, 1 Mar 2013 11:17:42 +0100
In exit_pi_state_list() we have the following locking construct:
spin_lock(&hb->lock);
raw_spin_lock_irq(&curr->pi_lock);
...
spin_unlock(&hb->lock);
In !RT this works, but on RT the migrate_enable() function which is
called from spin_unlock() sees atomic context due to the held pi_lock
and just decrements the migrate_disable_atomic counter of the
task. Now the next call to migrate_disable() sees the counter being
negative and issues a warning. That check should be in
migrate_enable() already.
Fix this by dropping pi_lock before unlocking hb->lock and reaquire
pi_lock after that again. This is safe as the loop code reevaluates
head again under the pi_lock.
Reported-by: Yong Zhang <yong.zhang@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
141/251 [
Author: Grygorii Strashko
Email: Grygorii.Strashko@linaro.org
Subject: pid.h: include atomic.h
Date: Tue, 21 Jul 2015 19:43:56 +0300
This patch fixes build error:
CC kernel/pid_namespace.o
In file included from kernel/pid_namespace.c:11:0:
include/linux/pid.h: In function 'get_pid':
include/linux/pid.h:78:3: error: implicit declaration of function 'atomic_inc' [-Werror=implicit-function-declaration]
atomic_inc(&pid->count);
^
which happens when
CONFIG_PROVE_LOCKING=n
CONFIG_DEBUG_SPINLOCK=n
CONFIG_DEBUG_MUTEXES=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_PID_NS=y
Vanilla gets this via spinlock.h.
Signed-off-by: Grygorii Strashko <Grygorii.Strashko@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
142/251 [
Author: Wolfgang M. Reimer
Email: linuxball@gmail.com
Subject: locking: locktorture: Do NOT include rwlock.h directly
Date: Tue, 21 Jul 2015 16:20:07 +0200
Including rwlock.h directly will cause kernel builds to fail
if CONFIG_PREEMPT_RT is defined. The correct header file
(rwlock_rt.h OR rwlock.h) will be included by spinlock.h which
is included by locktorture.c anyway.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Wolfgang M. Reimer <linuxball@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
143/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Add rtmutex_lock_killable()
Date: Thu, 9 Jun 2011 11:43:52 +0200
Add "killable" type to rtmutex. We need this since rtmutex are used as
"normal" mutexes which do use this type.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
144/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Make lock_killable work
Date: Sat, 1 Apr 2017 12:50:59 +0200
Locking an rt mutex killable does not work because signal handling is
restricted to TASK_INTERRUPTIBLE.
Use signal_pending_state() unconditionaly.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
145/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: spinlock: Split the lock types header
Date: Wed, 29 Jun 2011 19:34:01 +0200
Split raw_spinlock into its own file and the remaining spinlock_t into
its own non-RT header. The non-RT header will be replaced later by sleeping
spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
146/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Avoid include hell
Date: Wed, 29 Jun 2011 20:06:39 +0200
Include only the required raw types. This avoids pulling in the
complete spinlock header which in turn requires rtmutex.h at some point.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
147/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rbtree: don't include the rcu header
Date: Mon, 13 Jan 2020 11:23:28 -0500
The RCU header pulls in spinlock.h and fails due not yet defined types:
|In file included from include/linux/spinlock.h:275:0,
| from include/linux/rcupdate.h:38,
| from include/linux/rbtree.h:34,
| from include/linux/rtmutex.h:17,
| from include/linux/spinlock_types.h:18,
| from kernel/bounds.c:13:
|include/linux/rwlock_rt.h:16:38: error: unknown type name ‘rwlock_t’
| extern void __lockfunc rt_write_lock(rwlock_t *rwlock);
| ^
This patch moves the required RCU function from the rcupdate.h header file into
a new header file which can be included by both users.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
148/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Provide rt_mutex_slowlock_locked()
Date: Thu, 12 Oct 2017 16:14:22 +0200
This is the inner-part of rt_mutex_slowlock(), required for rwsem-rt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
149/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: export lockdep-less version of rt_mutex's lock, trylock and unlock
Date: Thu, 12 Oct 2017 16:36:39 +0200
Required for lock implementation ontop of rtmutex.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
150/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: add sleeping lock implementation
Date: Thu, 12 Oct 2017 17:11:19 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
151/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Use the proper LOCK_OFFSET for cond_resched()
Date: Sun, 17 Jul 2011 22:51:33 +0200
RT does not increment preempt count when a 'sleeping' spinlock is
locked. Update PREEMPT_LOCK_OFFSET for that case.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
152/251 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: locking/rtmutex: Clean ->pi_blocked_on in the error case
Date: Mon, 30 Sep 2019 18:15:44 +0200
The function rt_mutex_wait_proxy_lock() cleans ->pi_blocked_on in case
of failure (timeout, signal). The same cleanup is required in
__rt_mutex_start_proxy_lock().
In both the cases the tasks was interrupted by a signal or timeout while
acquiring the lock and after the interruption it longer blocks on the
lock.
Fixes: 1a1fb985f2e2b ("futex: Handle early deadlock return correctly")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
153/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rtmutex: trylock is okay on -RT
Date: Wed, 2 Dec 2015 11:34:07 +0100
non-RT kernel could deadlock on rt_mutex_trylock() in softirq context. On
-RT we don't run softirqs in IRQ context but in thread context so it is
not a issue here.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
154/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: add mutex implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:17:03 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
155/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: add rwsem implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:28:34 +0200
The RT specific R/W semaphore implementation restricts the number of readers
to one because a writer cannot block on multiple readers and inherit its
priority or budget.
The single reader restricting is painful in various ways:
- Performance bottleneck for multi-threaded applications in the page fault
path (mmap sem)
- Progress blocker for drivers which are carefully crafted to avoid the
potential reader/writer deadlock in mainline.
The analysis of the writer code pathes shows, that properly written RT tasks
should not take them. Syscalls like mmap(), file access which take mmap sem
write locked have unbound latencies which are completely unrelated to mmap
sem. Other R/W sem users like graphics drivers are not suitable for RT tasks
either.
So there is little risk to hurt RT tasks when the RT rwsem implementation is
changed in the following way:
- Allow concurrent readers
- Make writers block until the last reader left the critical section. This
blocking is not subject to priority/budget inheritance.
- Readers blocked on a writer inherit their priority/budget in the normal
way.
There is a drawback with this scheme. R/W semaphores become writer unfair
though the applications which have triggered writer starvation (mostly on
mmap_sem) in the past are not really the typical workloads running on a RT
system. So while it's unlikely to hit writer starvation, it's possible. If
there are unexpected workloads on RT systems triggering it, we need to rethink
the approach.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
156/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: add rwlock implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:18:06 +0200
The implementation is bias-based, similar to the rwsem implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
157/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: wire up RT's locking
Date: Thu, 12 Oct 2017 17:31:14 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
158/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rtmutex: add ww_mutex addon for mutex-rt
Date: Thu, 12 Oct 2017 17:34:38 +0200
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
159/251 [
Author: Mikulas Patocka
Email: mpatocka@redhat.com
Subject: locking/rt-mutex: fix deadlock in device mapper / block-IO
Date: Mon, 13 Nov 2017 12:56:53 -0500
When some block device driver creates a bio and submits it to another
block device driver, the bio is added to current->bio_list (in order to
avoid unbounded recursion).
However, this queuing of bios can cause deadlocks, in order to avoid them,
device mapper registers a function flush_current_bio_list. This function
is called when device mapper driver blocks. It redirects bios queued on
current->bio_list to helper workqueues, so that these bios can proceed
even if the driver is blocked.
The problem with CONFIG_PREEMPT_RT is that when the device mapper
driver blocks, it won't call flush_current_bio_list (because
tsk_is_pi_blocked returns true in sched_submit_work), so deadlocks in
block device stack can happen.
Note that we can't call blk_schedule_flush_plug if tsk_is_pi_blocked
returns true - that would cause
BUG_ON(rt_mutex_real_waiter(task->pi_blocked_on)) in
task_blocks_on_rt_mutex when flush_current_bio_list attempts to take a
spinlock.
So the proper fix is to call blk_schedule_flush_plug in rt_mutex_fastlock,
when fast acquire failed and when the task is about to block.
CC: stable-rt@vger.kernel.org
[bigeasy: The deadlock is not device-mapper specific, it can also occur
in plain EXT4]
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
160/251 [
Author: Scott Wood
Email: swood@redhat.com
Subject: locking/rt-mutex: Flush block plug on __down_read()
Date: Fri, 4 Jan 2019 15:33:21 -0500
__down_read() bypasses the rtmutex frontend to call
rt_mutex_slowlock_locked() directly, and thus it needs to call
blk_schedule_flush_flug() itself.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
161/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: re-init the wait_lock in rt_mutex_init_proxy_locked()
Date: Thu, 16 Nov 2017 16:48:48 +0100
We could provide a key-class for the lockdep (and fixup all callers) or
move the init to all callers (like it was) in order to avoid lockdep
seeing a double-lock of the wait_lock.
Reported-by: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
162/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ptrace: fix ptrace vs tasklist_lock race
Date: Thu, 29 Aug 2013 18:21:04 +0200
As explained by Alexander Fyodorov <halcy@yandex.ru>:
|read_lock(&tasklist_lock) in ptrace_stop() is converted to mutex on RT kernel,
|and it can remove __TASK_TRACED from task->state (by moving it to
|task->saved_state). If parent does wait() on child followed by a sys_ptrace
|call, the following race can happen:
|
|- child sets __TASK_TRACED in ptrace_stop()
|- parent does wait() which eventually calls wait_task_stopped() and returns
| child's pid
|- child blocks on read_lock(&tasklist_lock) in ptrace_stop() and moves
| __TASK_TRACED flag to saved_state
|- parent calls sys_ptrace, which calls ptrace_check_attach() and wait_task_inactive()
The patch is based on his initial patch where an additional check is
added in case the __TASK_TRACED moved to ->saved_state. The pi_lock is
taken in case the caller is interrupted between looking into ->state and
->saved_state.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
163/251 [
Author: Scott Wood
Email: swood@redhat.com
Subject: sched: __set_cpus_allowed_ptr(): Check cpus_mask, not cpus_ptr
Date: Sat, 27 Jul 2019 00:56:32 -0500
This function is concerned with the long-term cpu mask, not the
transitory mask the task might have while migrate disabled. Before
this patch, if a task was migrate disabled at the time
__set_cpus_allowed_ptr() was called, and the new mask happened to be
equal to the cpu that the task was running on, then the mask update
would be lost.
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
164/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/sched/core: add migrate_disable()
Date: Sat, 27 May 2017 19:02:06 +0200
[bristot@redhat.com: rt: Increase/decrease the nr of migratory tasks when enabling/disabling migration
Link: https://lkml.kernel.org/r/e981d271cbeca975bca710e2fbcc6078c09741b0.1498482127.git.bristot@redhat.com
]
[swood@redhat.com: fixups and optimisations
Link:https://lkml.kernel.org/r/20190727055638.20443-1-swood@redhat.com
Link:https://lkml.kernel.org/r/20191012065214.28109-1-swood@redhat.com
]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
165/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: sched/core: migrate_enable() must access takedown_cpu_task on !HOTPLUG_CPU
Date: Fri, 29 Nov 2019 17:24:55 +0100
The variable takedown_cpu_task is never declared/used on !HOTPLUG_CPU
except for migrate_enable(). This leads to a link error.
Don't use takedown_cpu_task in !HOTPLUG_CPU.
Reported-by: Dick Hollenbeck <dick@softplc.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
166/251 [
Author: Scott Wood
Email: swood@redhat.com
Subject: sched: migrate_enable: Use stop_one_cpu_nowait()
Date: Sat, 12 Oct 2019 01:52:14 -0500
migrate_enable() can be called with current->state != TASK_RUNNING.
Avoid clobbering the existing state by using stop_one_cpu_nowait().
Since we're stopping the current cpu, we know that we won't get
past __schedule() until migration_cpu_stop() has run (at least up to
the point of migrating us to another cpu).
Signed-off-by: Scott Wood <swood@redhat.com>
[bigeasy: spin until the request has been processed]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
167/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: trace: Add migrate-disabled counter to tracing output
Date: Sun, 17 Jul 2011 21:56:42 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
168/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: futex: workaround migrate_disable/enable in different context
Date: Wed, 8 Mar 2017 14:23:35 +0100
migrate_enable() invokes __schedule() and it expects a preempt count of one.
Holding a raw_spinlock_t with disabled interrupts should not allow scheduling.
These little hacks ensure that we don't schedule while we lock the hb lockwith
interrupts enabled and unlock it with interrupts disabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[XXX: As per PeterZ suggesstion
set_thread_flag(TIF_NEED_RESCHED); preempt_fold_need_resched()
would trigger a scheduler invocation on the last preempt_enable() which in
turn would allow to drop this.
]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
169/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking: don't check for __LINUX_SPINLOCK_TYPES_H on -RT archs
Date: Fri, 4 Aug 2017 17:40:42 +0200
Upstream uses arch_spinlock_t within spinlock_t and requests that
spinlock_types.h header file is included first.
On -RT we have the rt_mutex with its raw_lock wait_lock which needs
architectures' spinlock_types.h header file for its definition. However
we need rt_mutex first because it is used to build the spinlock_t so
that check does not work for us.
Therefore I am dropping that check.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
170/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking: Make spinlock_t and rwlock_t a RCU section on RT
Date: Tue, 19 Nov 2019 09:25:04 +0100
On !RT a locked spinlock_t and rwlock_t disables preemption which
implies a RCU read section. There is code that relies on that behaviour.
Add an explicit RCU read section on RT while a sleeping lock (a lock
which would disables preemption on !RT) acquired.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
171/251 [
Author: Scott Wood
Email: swood@redhat.com
Subject: rcu: Use rcuc threads on PREEMPT_RT as we did
Date: Wed, 11 Sep 2019 17:57:28 +0100
While switching to the reworked RCU-thread code, it has been forgotten
to enable the thread processing on -RT.
Besides restoring behavior that used to be default on RT, this avoids
a deadlock on scheduler locks.
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
172/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: srcu: replace local_irqsave() with a locallock
Date: Thu, 12 Oct 2017 18:37:12 +0200
There are two instances which disable interrupts in order to become a
stable this_cpu_ptr() pointer. The restore part is coupled with
spin_unlock_irqrestore() which does not work on RT.
Replace the local_irq_save() call with the appropriate local_lock()
version of it.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
173/251 [
Author: Julia Cartwright
Email: julia@ni.com
Subject: rcu: enable rcu_normal_after_boot by default for RT
Date: Wed, 12 Oct 2016 11:21:14 -0500
The forcing of an expedited grace period is an expensive and very
RT-application unfriendly operation, as it forcibly preempts all running
tasks on CPUs which are preventing the gp from expiring.
By default, as a policy decision, disable the expediting of grace
periods (after boot) on configurations which enable PREEMPT_RT.
Suggested-by: Luiz Capitulino <lcapitulino@redhat.com>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
174/251 [
Author: Scott Wood
Email: swood@redhat.com
Subject: rcutorture: Avoid problematic critical section nesting on RT
Date: Wed, 11 Sep 2019 17:57:29 +0100
rcutorture was generating some nesting scenarios that are not
reasonable. Constrain the state selection to avoid them.
Example #1:
1. preempt_disable()
2. local_bh_disable()
3. preempt_enable()
4. local_bh_enable()
On PREEMPT_RT, BH disabling takes a local lock only when called in
non-atomic context. Thus, atomic context must be retained until after BH
is re-enabled. Likewise, if BH is initially disabled in non-atomic
context, it cannot be re-enabled in atomic context.
Example #2:
1. rcu_read_lock()
2. local_irq_disable()
3. rcu_read_unlock()
4. local_irq_enable()
If the thread is preempted between steps 1 and 2,
rcu_read_unlock_special.b.blocked will be set, but it won't be
acted on in step 3 because IRQs are disabled. Thus, reporting of the
quiescent state will be delayed beyond the local_irq_enable().
For now, these scenarios will continue to be tested on non-PREEMPT_RT
kernels, until debug checks are added to ensure that they are not
happening elsewhere.
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
175/251 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: rt: Improve the serial console PASS_LIMIT
Date: Wed, 14 Dec 2011 13:05:54 +0100
Beyond the warning:
drivers/tty/serial/8250/8250.c:1613:6: warning: unused variable ‘pass_counter’ [-Wunused-variable]
the solution of just looping infinitely was ugly - up it to 1 million to
give it a chance to continue in some really ugly situation.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
176/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs/epoll: Do not disable preemption on RT
Date: Fri, 8 Jul 2011 16:35:35 +0200
ep_call_nested() takes a sleeping lock so we can't disable preemption.
The light version is enough since ep_call_nested() doesn't mind beeing
invoked twice on the same CPU.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
177/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/vmalloc: Another preempt disable region which sucks
Date: Tue, 12 Jul 2011 11:39:36 +0200
Avoid the preempt disable version of get_cpu_var(). The inner-lock should
provide enough serialisation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
178/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block/mq: do not invoke preempt_disable()
Date: Tue, 14 Jul 2015 14:26:34 +0200
preempt_disable() and get_cpu() don't play well together with the sleeping
locks it tries to allocate later.
It seems to be enough to replace it with get_cpu_light() and migrate_disable().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
179/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block/mq: don't complete requests via IPI
Date: Thu, 29 Jan 2015 15:10:08 +0100
The IPI runs in hardirq context and there are sleeping locks. Assume caches are
shared and complete them on the local CPU.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
180/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: md: raid5: Make raid5_percpu handling RT aware
Date: Tue, 6 Apr 2010 16:51:31 +0200
__raid_run_ops() disables preemption with get_cpu() around the access
to the raid5_percpu variables. That causes scheduling while atomic
spews on RT.
Serialize the access to the percpu data with a lock and keep the code
preemptible.
Reported-by: Udo van den Heuvel <udovdh@xs4all.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Udo van den Heuvel <udovdh@xs4all.nl>
]
181/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: scsi/fcoe: Make RT aware.
Date: Sat, 12 Nov 2011 14:00:48 +0100
Do not disable preemption while taking sleeping locks. All user look safe
for migrate_diable() only.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
182/251 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: sunrpc: Make svc_xprt_do_enqueue() use get_cpu_light()
Date: Wed, 18 Feb 2015 16:05:28 +0100
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
|in_atomic(): 1, irqs_disabled(): 0, pid: 3194, name: rpc.nfsd
|Preemption disabled at:[<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
|CPU: 6 PID: 3194 Comm: rpc.nfsd Not tainted 3.18.7-rt1 #9
|Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014
| ffff880409630000 ffff8800d9a33c78 ffffffff815bdeb5 0000000000000002
| 0000000000000000 ffff8800d9a33c98 ffffffff81073c86 ffff880408dd6008
| ffff880408dd6000 ffff8800d9a33cb8 ffffffff815c3d84 ffff88040b3ac000
|Call Trace:
| [<ffffffff815bdeb5>] dump_stack+0x4f/0x9e
| [<ffffffff81073c86>] __might_sleep+0xe6/0x150
| [<ffffffff815c3d84>] rt_spin_lock+0x24/0x50
| [<ffffffffa06beec0>] svc_xprt_do_enqueue+0x80/0x230 [sunrpc]
| [<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
| [<ffffffffa06c03ed>] svc_add_new_perm_xprt+0x6d/0x80 [sunrpc]
| [<ffffffffa06b2693>] svc_addsock+0x143/0x200 [sunrpc]
| [<ffffffffa072e69c>] write_ports+0x28c/0x340 [nfsd]
| [<ffffffffa072d2ac>] nfsctl_transaction_write+0x4c/0x80 [nfsd]
| [<ffffffff8117ee83>] vfs_write+0xb3/0x1d0
| [<ffffffff8117f889>] SyS_write+0x49/0xb0
| [<ffffffff815c4556>] system_call_fastpath+0x16/0x1b
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
183/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Introduce cpu_chill()
Date: Wed, 7 Mar 2012 20:51:03 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill()
defaults to cpu_relax() for non RT. On RT it puts the looping task to
sleep for a tick so the preempted task can make progress.
Steven Rostedt changed it to use a hrtimer instead of msleep():
|
|Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken
|up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is
|called from softirq context, it may block the ksoftirqd() from running, in
|which case, it may never wake up the msleep() causing the deadlock.
+ bigeasy later changed to schedule_hrtimeout()
|If a task calls cpu_chill() and gets woken up by a regular or spurious
|wakeup and has a signal pending, then it exits the sleep loop in
|do_nanosleep() and sets up the restart block. If restart->nanosleep.type is
|not TI_NONE then this results in accessing a stale user pointer from a
|previously interrupted syscall and a copy to user based on the stale
|pointer or a BUG() when 'type' is not supported in nanosleep_copyout().
+ bigeasy: add PF_NOFREEZE:
| [....] Waiting for /dev to be fully populated...
| =====================================
| [ BUG: udevd/229 still has locks held! ]
| 3.12.11-rt17 #23 Not tainted
| -------------------------------------
| 1 lock held by udevd/229:
| #0: (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98
|
| stack backtrace:
| CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23
| (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14)
| (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc)
| (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160)
| (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110)
| (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38)
| (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec)
| (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c)
| (dput+0x74/0x15c) from (lookup_real+0x4c/0x50)
| (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44)
| (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98)
| (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc)
| (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60)
| (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c)
| (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c)
| (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94)
| (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30)
| (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
184/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: block: Use cpu_chill() for retry loops
Date: Thu, 20 Dec 2012 18:28:26 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Steven also observed a live lock when there was a
concurrent priority boosting going on.
Use cpu_chill() instead of cpu_relax() to let the system
make progress.
[bigeasy: After all those changes that occured over the years, this one hunk is
left and should not cause any starvation on -RT anymore]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
185/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs: namespace: Use cpu_chill() in trylock loops
Date: Wed, 7 Mar 2012 21:00:34 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
186/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Use cpu_chill() instead of cpu_relax()
Date: Wed, 7 Mar 2012 21:10:04 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
187/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: debugobjects: Make RT aware
Date: Sun, 17 Jul 2011 21:41:35 +0200
Avoid filling the pool / allocating memory with irqs off().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
188/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Use skbufhead with raw lock
Date: Tue, 12 Jul 2011 15:38:34 +0200
Use the rps lock as rawlock so we can keep irq-off regions. It looks low
latency. However we can't kfree() from this context therefore we defer this
to the softirq and use the tofree_queue list for it (similar to process_queue).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
189/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: dev: always take qdisc's busylock in __dev_xmit_skb()
Date: Wed, 30 Mar 2016 13:36:29 +0200
The root-lock is dropped before dev_hard_start_xmit() is invoked and after
setting the __QDISC___STATE_RUNNING bit. If this task is now pushed away
by a task with a higher priority then the task with the higher priority
won't be able to submit packets to the NIC directly instead they will be
enqueued into the Qdisc. The NIC will remain idle until the task(s) with
higher priority leave the CPU and the task with lower priority gets back
and finishes the job.
If we take always the busylock we ensure that the RT task can boost the
low-prio task and submit the packet.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
190/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: irqwork: push most work into softirq context
Date: Tue, 23 Jun 2015 15:32:51 +0200
Initially we defered all irqwork into softirq because we didn't want the
latency spikes if perf or another user was busy and delayed the RT task.
The NOHZ trigger (nohz_full_kick_work) was the first user that did not work
as expected if it did not run in the original irqwork context so we had to
bring it back somehow for it. push_irq_work_func is the second one that
requires this.
This patch adds the IRQ_WORK_HARD_IRQ which makes sure the callback runs
in raw-irq context. Everything else is defered into softirq context. Without
-RT we have the orignal behavior.
This patch incorporates tglx orignal work which revoked a little bringing back
the arch_irq_work_raise() if possible and a few fixes from Steven Rostedt and
Mike Galbraith,
[bigeasy: melt tglx's irq_work_tick_soft() which splits irq_work_tick() into a
hard and soft variant]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
191/251 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: x86: crypto: Reduce preempt disabled regions
Date: Mon, 14 Nov 2011 18:19:27 +0100
Restrict the preempt disabled regions to the actual floating point
operations and enable preemption for the administrative actions.
This is necessary on RT to avoid that kfree and other operations are
called with preemption disabled.
Reported-and-tested-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
192/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: Reduce preempt disabled regions, more algos
Date: Fri, 21 Feb 2014 17:24:04 +0100
Don Estabrook reported
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2462 migrate_enable+0x17b/0x200()
| kernel: WARNING: CPU: 3 PID: 865 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
and his backtrace showed some crypto functions which looked fine.
The problem is the following sequence:
glue_xts_crypt_128bit()
{
blkcipher_walk_virt(); /* normal migrate_disable() */
glue_fpu_begin(); /* get atomic */
while (nbytes) {
__glue_xts_crypt_128bit();
blkcipher_walk_done(); /* with nbytes = 0, migrate_enable()
* while we are atomic */
};
glue_fpu_end() /* no longer atomic */
}
and this is why the counter get out of sync and the warning is printed.
The other problem is that we are non-preemptible between
glue_fpu_begin() and glue_fpu_end() and the latency grows. To fix this,
I shorten the FPU off region and ensure blkcipher_walk_done() is called
with preemption enabled. This might hurt the performance because we now
enable/disable the FPU state more often but we gain lower latency and
the bug is gone.
Reported-by: Don Estabrook <don.estabrook@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
193/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: limit more FPU-enabled sections
Date: Thu, 30 Nov 2017 13:40:10 +0100
Those crypto drivers use SSE/AVX/… for their crypto work and in order to
do so in kernel they need to enable the "FPU" in kernel mode which
disables preemption.
There are two problems with the way they are used:
- the while loop which processes X bytes may create latency spikes and
should be avoided or limited.
- the cipher-walk-next part may allocate/free memory and may use
kmap_atomic().
The whole kernel_fpu_begin()/end() processing isn't probably that cheap.
It most likely makes sense to process as much of those as possible in one
go. The new *_fpu_sched_rt() schedules only if a RT task is pending.
Probably we should measure the performance those ciphers in pure SW
mode and with this optimisations to see if it makes sense to keep them
for RT.
This kernel_fpu_resched() makes the code more preemptible which might hurt
performance.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
194/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: cryptd - add a lock instead preempt_disable/local_bh_disable
Date: Thu, 26 Jul 2018 18:52:00 +0200
cryptd has a per-CPU lock which protected with local_bh_disable() and
preempt_disable().
Add an explicit spin_lock to make the locking context more obvious and
visible to lockdep. Since it is a per-CPU lock, there should be no lock
contention on the actual spinlock.
There is a small race-window where we could be migrated to another CPU
after the cpu_queue has been obtain. This is not a problem because the
actual ressource is protected by the spinlock.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
195/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: panic: skip get_random_bytes for RT_FULL in init_oops_id
Date: Tue, 14 Jul 2015 14:26:34 +0200
Disable on -RT. If this is invoked from irq-context we will have problems
to acquire the sleeping lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
196/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: stackprotector: Avoid random pool on rt
Date: Thu, 16 Dec 2010 14:25:18 +0100
CPU bringup calls into the random pool to initialize the stack
canary. During boot that works nicely even on RT as the might sleep
checks are disabled. During CPU hotplug the might sleep checks
trigger. Making the locks in random raw is a major PITA, so avoid the
call on RT is the only sensible solution. This is basically the same
randomness which we get during boot where the random pool has no
entropy and we rely on the TSC randomnness.
Reported-by: Carsten Emde <carsten.emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
197/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: random: Make it work on rt
Date: Tue, 21 Aug 2012 20:38:50 +0200
Delegate the random insertion to the forced threaded interrupt
handler. Store the return IP of the hard interrupt handler in the irq
descriptor and feed it into the random generator as a source of
entropy.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
198/251 [
Author: Priyanka Jain
Email: Priyanka.Jain@freescale.com
Subject: net: Remove preemption disabling in netif_rx()
Date: Thu, 17 May 2012 09:35:11 +0530
1)enqueue_to_backlog() (called from netif_rx) should be
bind to a particluar CPU. This can be achieved by
disabling migration. No need to disable preemption
2)Fixes crash "BUG: scheduling while atomic: ksoftirqd"
in case of RT.
If preemption is disabled, enqueue_to_backog() is called
in atomic context. And if backlog exceeds its count,
kfree_skb() is called. But in RT, kfree_skb() might
gets scheduled out, so it expects non atomic context.
3)When CONFIG_PREEMPT_RT is not defined,
migrate_enable(), migrate_disable() maps to
preempt_enable() and preempt_disable(), so no
change in functionality in case of non-RT.
-Replace preempt_enable(), preempt_disable() with
migrate_enable(), migrate_disable() respectively
-Replace get_cpu(), put_cpu() with get_cpu_light(),
put_cpu_light() respectively
Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com>
Cc: <rostedt@goodmis.orgn>
Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
199/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: lockdep: Make it RT aware
Date: Sun, 17 Jul 2011 18:51:23 +0200
teach lockdep that we don't really do softirqs on -RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
200/251 [
Author: Yong Zhang
Email: yong.zhang@windriver.com
Subject: lockdep: selftest: Only do hardirq context test for raw spinlock
Date: Mon, 16 Apr 2012 15:01:56 +0800
On -rt there is no softirq context any more and rwlock is sleepable,
disable softirq context test and rwlock+irq test.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Yong Zhang <yong.zhang@windriver.com>
Link: http://lkml.kernel.org/r/1334559716-18447-3-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
201/251 [
Author: Josh Cartwright
Email: josh.cartwright@ni.com
Subject: lockdep: selftest: fix warnings due to missing PREEMPT_RT conditionals
Date: Wed, 28 Jan 2015 13:08:45 -0600
"lockdep: Selftest: Only do hardirq context test for raw spinlock"
disabled the execution of certain tests with PREEMPT_RT, but did
not prevent the tests from still being defined. This leads to warnings
like:
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:580:1: warning: 'irqsafe1_soft_spin_12' defined but not used [-Wunused-function]
...
Fixed by wrapping the test definitions in #ifndef CONFIG_PREEMPT_RT
conditionals.
Signed-off-by: Josh Cartwright <josh.cartwright@ni.com>
Signed-off-by: Xander Huff <xander.huff@ni.com>
Acked-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
202/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: lockdep: disable self-test
Date: Tue, 17 Oct 2017 16:36:18 +0200
The self-test wasn't always 100% accurate for RT. We disabled a few
tests which failed because they had a different semantic for RT. Some
still reported false positives. Now the selftest locks up the system
during boot and it needs to be investigated…
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
203/251 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm,radeon,i915: Use preempt_disable/enable_rt() where recommended
Date: Sat, 27 Feb 2016 08:09:11 +0100
DRM folks identified the spots, so use them.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
204/251 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm,i915: Use local_lock/unlock_irq() in intel_pipe_update_start/end()
Date: Sat, 27 Feb 2016 09:01:42 +0100
[ 8.014039] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:918
[ 8.014041] in_atomic(): 0, irqs_disabled(): 1, pid: 78, name: kworker/u4:4
[ 8.014045] CPU: 1 PID: 78 Comm: kworker/u4:4 Not tainted 4.1.7-rt7 #5
[ 8.014055] Workqueue: events_unbound async_run_entry_fn
[ 8.014059] 0000000000000000 ffff880037153748 ffffffff815f32c9 0000000000000002
[ 8.014063] ffff88013a50e380 ffff880037153768 ffffffff815ef075 ffff8800372c06c8
[ 8.014066] ffff8800372c06c8 ffff880037153778 ffffffff8107c0b3 ffff880037153798
[ 8.014067] Call Trace:
[ 8.014074] [<ffffffff815f32c9>] dump_stack+0x4a/0x61
[ 8.014078] [<ffffffff815ef075>] ___might_sleep.part.93+0xe9/0xee
[ 8.014082] [<ffffffff8107c0b3>] ___might_sleep+0x53/0x80
[ 8.014086] [<ffffffff815f9064>] rt_spin_lock+0x24/0x50
[ 8.014090] [<ffffffff8109368b>] prepare_to_wait+0x2b/0xa0
[ 8.014152] [<ffffffffa016c04c>] intel_pipe_update_start+0x17c/0x300 [i915]
[ 8.014156] [<ffffffff81093b40>] ? prepare_to_wait_event+0x120/0x120
[ 8.014201] [<ffffffffa0158f36>] intel_begin_crtc_commit+0x166/0x1e0 [i915]
[ 8.014215] [<ffffffffa00c806d>] drm_atomic_helper_commit_planes+0x5d/0x1a0 [drm_kms_helper]
[ 8.014260] [<ffffffffa0171e9b>] intel_atomic_commit+0xab/0xf0 [i915]
[ 8.014288] [<ffffffffa00654c7>] drm_atomic_commit+0x37/0x60 [drm]
[ 8.014298] [<ffffffffa00c6fcd>] drm_atomic_helper_plane_set_property+0x8d/0xd0 [drm_kms_helper]
[ 8.014301] [<ffffffff815f77d9>] ? __ww_mutex_lock+0x39/0x40
[ 8.014319] [<ffffffffa0053b3d>] drm_mode_plane_set_obj_prop+0x2d/0x90 [drm]
[ 8.014328] [<ffffffffa00c8edb>] restore_fbdev_mode+0x6b/0xf0 [drm_kms_helper]
[ 8.014337] [<ffffffffa00cae49>] drm_fb_helper_restore_fbdev_mode_unlocked+0x29/0x80 [drm_kms_helper]
[ 8.014346] [<ffffffffa00caec2>] drm_fb_helper_set_par+0x22/0x50 [drm_kms_helper]
[ 8.014390] [<ffffffffa016dfba>] intel_fbdev_set_par+0x1a/0x60 [i915]
[ 8.014394] [<ffffffff81327dc4>] fbcon_init+0x4f4/0x580
[ 8.014398] [<ffffffff8139ef4c>] visual_init+0xbc/0x120
[ 8.014401] [<ffffffff813a1623>] do_bind_con_driver+0x163/0x330
[ 8.014405] [<ffffffff813a1b2c>] do_take_over_console+0x11c/0x1c0
[ 8.014408] [<ffffffff813236e3>] do_fbcon_takeover+0x63/0xd0
[ 8.014410] [<ffffffff81328965>] fbcon_event_notify+0x785/0x8d0
[ 8.014413] [<ffffffff8107c12d>] ? __might_sleep+0x4d/0x90
[ 8.014416] [<ffffffff810775fe>] notifier_call_chain+0x4e/0x80
[ 8.014419] [<ffffffff810779cd>] __blocking_notifier_call_chain+0x4d/0x70
[ 8.014422] [<ffffffff81077a06>] blocking_notifier_call_chain+0x16/0x20
[ 8.014425] [<ffffffff8132b48b>] fb_notifier_call_chain+0x1b/0x20
[ 8.014428] [<ffffffff8132d8fa>] register_framebuffer+0x21a/0x350
[ 8.014439] [<ffffffffa00cb164>] drm_fb_helper_initial_config+0x274/0x3e0 [drm_kms_helper]
[ 8.014483] [<ffffffffa016f1cb>] intel_fbdev_initial_config+0x1b/0x20 [i915]
[ 8.014486] [<ffffffff8107912c>] async_run_entry_fn+0x4c/0x160
[ 8.014490] [<ffffffff81070ffa>] process_one_work+0x14a/0x470
[ 8.014493] [<ffffffff81071489>] worker_thread+0x169/0x4c0
[ 8.014496] [<ffffffff81071320>] ? process_one_work+0x470/0x470
[ 8.014499] [<ffffffff81076606>] kthread+0xc6/0xe0
[ 8.014502] [<ffffffff81070000>] ? queue_work_on+0x80/0x110
[ 8.014506] [<ffffffff81076540>] ? kthread_worker_fn+0x1c0/0x1c0
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
205/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: disable tracing on -RT
Date: Thu, 6 Dec 2018 09:52:20 +0100
Luca Abeni reported this:
| BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
| CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
| Call Trace:
| rt_spin_lock+0x3f/0x50
| gen6_read32+0x45/0x1d0 [i915]
| g4x_get_vblank_counter+0x36/0x40 [i915]
| trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]
The tracing events use trace_i915_pipe_update_start() among other events
use functions acquire spin locks. A few trace points use
intel_get_crtc_scanline(), others use ->get_vblank_counter() wich also
might acquire a sleeping lock.
Based on this I don't see any other way than disable trace points on RT.
Cc: stable-rt@vger.kernel.org
Reported-by: Luca Abeni <lucabe72@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
206/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE
Date: Wed, 19 Dec 2018 10:47:02 +0100
The order of the header files is important. If this header file is
included after tracepoint.h was included then the NOTRACE here becomes a
nop. Currently this happens for two .c files which use the tracepoitns
behind DRM_I915_LOW_LEVEL_TRACEPOINTS.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
207/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: Don't disable interrupts for intel_engine_breadcrumbs_irq()
Date: Thu, 26 Sep 2019 12:29:05 +0200
The function intel_engine_breadcrumbs_irq() is always invoked from an interrupt
handler and for that reason it invokes (as an optimisation) only spin_lock()
for locking assuming that the interrupts are already disabled. The
function intel_engine_signal_breadcrumbs() is provided to disable
interrupts while the former function is invoked so that assumption is
also true for callers from preemptible context.
On PREEMPT_RT local_irq_disable() really disables interrupts and this
forbids to invoke spin_lock() which becomes a sleeping spinlock.
This is also problematic with `threadirqs' in conjunction with
irq_work. With force threading the interrupt handler, the handler is
invoked with disabled BH but with interrupts enabled. This is okay and
the lock itself is never acquired in IRQ context. This changes with
irq_work (signal_irq_work()) which _still_ invokes
intel_engine_breadcrumbs_irq() from IRQ context. Lockdep should see this
and complain.
Acquire the locks in intel_engine_breadcrumbs_irq() with _irqsave()
suffix and let all callers invoke intel_engine_breadcrumbs_irq()
directly instead using intel_engine_signal_breadcrumbs().
Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
208/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: Drop the IRQ-off asserts
Date: Thu, 26 Sep 2019 12:30:21 +0200
The lockdep_assert_irqs_disabled() check is needless. The previous
lockdep_assert_held() check ensures that the lock is acquired and while
the lock is acquired lockdep also prints a warning if the interrupts are
not disabled if they have to be.
These IRQ-off asserts trigger on PREEMPT_RT because the locks become
sleeping locks and do not really disable interrupts.
Remove lockdep_assert_irqs_disabled().
Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
209/251 [
Author: Mike Galbraith
Email: efault@gmx.de
Subject: cpuset: Convert callback_lock to raw_spinlock_t
Date: Sun, 8 Jan 2017 09:32:25 +0100
The two commits below add up to a cpuset might_sleep() splat for RT:
8447a0fee974 cpuset: convert callback_mutex to a spinlock
344736f29b35 cpuset: simplify cpuset_node_allowed API
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:995
in_atomic(): 0, irqs_disabled(): 1, pid: 11718, name: cset
CPU: 135 PID: 11718 Comm: cset Tainted: G E 4.10.0-rt1-rt #4
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014
Call Trace:
? dump_stack+0x5c/0x81
? ___might_sleep+0xf4/0x170
? rt_spin_lock+0x1c/0x50
? __cpuset_node_allowed+0x66/0xc0
? ___slab_alloc+0x390/0x570 <disables IRQs>
? anon_vma_fork+0x8f/0x140
? copy_page_range+0x6cf/0xb00
? anon_vma_fork+0x8f/0x140
? __slab_alloc.isra.74+0x5a/0x81
? anon_vma_fork+0x8f/0x140
? kmem_cache_alloc+0x1b5/0x1f0
? anon_vma_fork+0x8f/0x140
? copy_process.part.35+0x1670/0x1ee0
? _do_fork+0xdd/0x3f0
? _do_fork+0xdd/0x3f0
? do_syscall_64+0x61/0x170
? entry_SYSCALL64_slow_path+0x25/0x25
The later ensured that a NUMA box WILL take callback_lock in atomic
context by removing the allocator and reclaim path __GFP_HARDWALL
usage which prevented such contexts from taking callback_mutex.
One option would be to reinstate __GFP_HARDWALL protections for
RT, however, as the 8447a0fee974 changelog states:
The callback_mutex is only used to synchronize reads/updates of cpusets'
flags and cpu/node masks. These operations should always proceed fast so
there's no reason why we can't use a spinlock instead of the mutex.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
210/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: apparmor: use a locallock instead preempt_disable()
Date: Wed, 11 Oct 2017 17:43:49 +0200
get_buffers() disables preemption which acts as a lock for the per-CPU
variable. Since we can't disable preemption here on RT, a local_lock is
lock is used in order to remain on the same CPU and not to have more
than one user within the critical section.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
211/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Allow to enable RT
Date: Wed, 7 Aug 2019 18:15:38 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
212/251 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: mm, rt: kmap_atomic scheduling
Date: Thu, 28 Jul 2011 10:43:51 +0200
In fact, with migrate_disable() existing one could play games with
kmap_atomic. You could save/restore the kmap_atomic slots on context
switch (if there are any in use of course), this should be esp easy now
that we have a kmap_atomic stack.
Something like the below.. it wants replacing all the preempt_disable()
stuff with pagefault_disable() && migrate_disable() of course, but then
you can flip kmaps around like below.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[dvhart@linux.intel.com: build fix]
Link: http://lkml.kernel.org/r/1311842631.5890.208.camel@twins
[tglx@linutronix.de: Get rid of the per cpu variable and store the idx
and the pte content right away in the task struct.
Shortens the context switch code. ]
]
213/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86/highmem: Add a "already used pte" check
Date: Mon, 11 Mar 2013 17:09:55 +0100
This is a copy from kmap_atomic_prot().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
214/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm/highmem: Flush tlb on unmap
Date: Mon, 11 Mar 2013 21:37:27 +0100
The tlb should be flushed on unmap and thus make the mapping entry
invalid. This is only done in the non-debug case which does not look
right.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
215/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Enable highmem for rt
Date: Wed, 13 Feb 2013 11:03:11 +0100
fixup highmem for ARM.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
216/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/scatterlist: Do not disable irqs on RT
Date: Fri, 3 Jul 2009 08:44:34 -0500
For -RT it is enough to keep pagefault disabled (which is currently handled by
kmap_atomic()).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
217/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add support for lazy preemption
Date: Fri, 26 Oct 2012 18:50:54 +0100
It has become an obsession to mitigate the determinism vs. throughput
loss of RT. Looking at the mainline semantics of preemption points
gives a hint why RT sucks throughput wise for ordinary SCHED_OTHER
tasks. One major issue is the wakeup of tasks which are right away
preempting the waking task while the waking task holds a lock on which
the woken task will block right after having preempted the wakee. In
mainline this is prevented due to the implicit preemption disable of
spin/rw_lock held regions. On RT this is not possible due to the fully
preemptible nature of sleeping spinlocks.
Though for a SCHED_OTHER task preempting another SCHED_OTHER task this
is really not a correctness issue. RT folks are concerned about
SCHED_FIFO/RR tasks preemption and not about the purely fairness
driven SCHED_OTHER preemption latencies.
So I introduced a lazy preemption mechanism which only applies to
SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside of the
existing preempt_count each tasks sports now a preempt_lazy_count
which is manipulated on lock acquiry and release. This is slightly
incorrect as for lazyness reasons I coupled this on
migrate_disable/enable so some other mechanisms get the same treatment
(e.g. get_cpu_light).
Now on the scheduler side instead of setting NEED_RESCHED this sets
NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and
therefor allows to exit the waking task the lock held region before
the woken task preempts. That also works better for cross CPU wakeups
as the other side can stay in the adaptive spinning loop.
For RT class preemption there is no change. This simply sets
NEED_RESCHED and forgoes the lazy preemption counter.
Initial test do not expose any observable latency increasement, but
history shows that I've been proven wrong before :)
The lazy preemption mode is per default on, but with
CONFIG_SCHED_DEBUG enabled it can be disabled via:
# echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features
and reenabled via
# echo PREEMPT_LAZY >/sys/kernel/debug/sched_features
The test results so far are very machine and workload dependent, but
there is a clear trend that it enhances the non RT workload
performance.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
218/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: Support for lazy preemption
Date: Thu, 1 Nov 2012 11:03:47 +0100
Implement the x86 pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
219/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Add support for lazy preemption
Date: Wed, 31 Oct 2012 12:04:11 +0100
Implement the arm pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
220/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Add support for lazy preemption
Date: Thu, 1 Nov 2012 10:14:11 +0100
Implement the powerpc pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
221/251 [
Author: Anders Roxell
Email: anders.roxell@linaro.org
Subject: arch/arm64: Add lazy preempt support
Date: Thu, 14 May 2015 17:52:17 +0200
arm64 is missing support for PREEMPT_RT. The main feature which is
lacking is support for lazy preemption. The arch-specific entry code,
thread information structure definitions, and associated data tables
have to be extended to provide this support. Then the Kconfig file has
to be extended to indicate the support is available, and also to
indicate that support for full RT preemption is now available.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
]
222/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jump-label: disable if stop_machine() is used
Date: Wed, 8 Jul 2015 17:14:48 +0200
Some architectures are using stop_machine() while switching the opcode which
leads to latency spikes.
The architectures which use stop_machine() atm:
- ARM stop machine
- s390 stop machine
The architecures which use other sorcery:
- MIPS
- X86
- powerpc
- sparc
- arm64
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: only ARM for now]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
223/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: leds: trigger: disable CPU trigger on -RT
Date: Thu, 23 Jan 2014 14:45:59 +0100
as it triggers:
|CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141
|[<c0014aa4>] (unwind_backtrace+0x0/0xf8) from [<c0012788>] (show_stack+0x1c/0x20)
|[<c0012788>] (show_stack+0x1c/0x20) from [<c043c8dc>] (dump_stack+0x20/0x2c)
|[<c043c8dc>] (dump_stack+0x20/0x2c) from [<c004c5e8>] (__might_sleep+0x13c/0x170)
|[<c004c5e8>] (__might_sleep+0x13c/0x170) from [<c043f270>] (__rt_spin_lock+0x28/0x38)
|[<c043f270>] (__rt_spin_lock+0x28/0x38) from [<c043fa00>] (rt_read_lock+0x68/0x7c)
|[<c043fa00>] (rt_read_lock+0x68/0x7c) from [<c036cf74>] (led_trigger_event+0x2c/0x5c)
|[<c036cf74>] (led_trigger_event+0x2c/0x5c) from [<c036e0bc>] (ledtrig_cpu+0x54/0x5c)
|[<c036e0bc>] (ledtrig_cpu+0x54/0x5c) from [<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c)
|[<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) from [<c00590b8>] (cpu_startup_entry+0xa8/0x234)
|[<c00590b8>] (cpu_startup_entry+0xa8/0x234) from [<c043b2cc>] (rest_init+0xb8/0xe0)
|[<c043b2cc>] (rest_init+0xb8/0xe0) from [<c061ebe0>] (start_kernel+0x2c4/0x380)
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
224/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/omap: Make the locking RT aware
Date: Thu, 28 Jul 2011 13:32:57 +0200
The lock is a sleeping lock and local_irq_save() is not the
optimsation we are looking for. Redo it to make it work on -RT and
non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
225/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/pl011: Make the locking work on RT
Date: Tue, 8 Jan 2013 21:36:51 +0100
The lock is a sleeping lock and local_irq_save() is not the optimsation
we are looking for. Redo it to make it work on -RT and non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
226/251 [
Author: Kurt Kanzenbach
Email: kurt@linutronix.de
Subject: tty: serial: pl011: explicitly initialize the flags variable
Date: Mon, 24 Sep 2018 10:29:01 +0200
Silence the following gcc warning:
drivers/tty/serial/amba-pl011.c: In function ‘pl011_console_write’:
./include/linux/spinlock.h:260:3: warning: ‘flags’ may be used uninitialized in this function [-Wmaybe-uninitialized]
_raw_spin_unlock_irqrestore(lock, flags); \
^~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/tty/serial/amba-pl011.c:2214:16: note: ‘flags’ was declared here
unsigned long flags;
^~~~~
The code is correct. Thus, initializing flags to zero doesn't change the
behavior and resolves the warning.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
227/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm: include definition for cpumask_t
Date: Thu, 22 Dec 2016 17:28:33 +0100
This definition gets pulled in by other files. With the (later) split of
RCU and spinlock.h it won't compile anymore.
The split is done in ("rbtree: don't include the rcu header").
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
228/251 [
Author: Yadi.hu
Email: yadi.hu@windriver.com
Subject: ARM: enable irq in translation/section permission fault handlers
Date: Wed, 10 Dec 2014 10:32:09 +0800
Probably happens on all ARM, with
CONFIG_PREEMPT_RT
CONFIG_DEBUG_ATOMIC_SLEEP
This simple program....
int main() {
*((char*)0xc0001000) = 0;
};
[ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
[ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a
[ 512.743217] INFO: lockdep is turned off.
[ 512.743360] irq event stamp: 0
[ 512.743482] hardirqs last enabled at (0): [< (null)>] (null)
[ 512.743714] hardirqs last disabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744013] softirqs last enabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744303] softirqs last disabled at (0): [< (null)>] (null)
[ 512.744631] [<c041872c>] (unwind_backtrace+0x0/0x104)
[ 512.745001] [<c09af0c4>] (dump_stack+0x20/0x24)
[ 512.745355] [<c0462490>] (__might_sleep+0x1dc/0x1e0)
[ 512.745717] [<c09b6770>] (rt_spin_lock+0x34/0x6c)
[ 512.746073] [<c0441bf0>] (do_force_sig_info+0x34/0xf0)
[ 512.746457] [<c0442668>] (force_sig_info+0x18/0x1c)
[ 512.746829] [<c041d880>] (__do_user_fault+0x9c/0xd8)
[ 512.747185] [<c041d938>] (do_bad_area+0x7c/0x94)
[ 512.747536] [<c041d990>] (do_sect_fault+0x40/0x48)
[ 512.747898] [<c040841c>] (do_DataAbort+0x40/0xa0)
[ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8)
Oxc0000000 belongs to kernel address space, user task can not be
allowed to access it. For above condition, correct result is that
test case should receive a “segment fault” and exits but not stacks.
the root cause is commit 02fe2845d6a8 ("avoid enabling interrupts in
prefetch/data abort handlers"),it deletes irq enable block in Data
abort assemble code and move them into page/breakpiont/alignment fault
handlers instead. But author does not enable irq in translation/section
permission fault handlers. ARM disables irq when it enters exception/
interrupt mode, if kernel doesn't enable irq, it would be still disabled
during translation/section permission fault.
We see the above splat because do_force_sig_info is still called with
IRQs off, and that code eventually does a:
spin_lock_irqsave(&t->sighand->siglock, flags);
As this is architecture independent code, and we've not seen any other
need for other arch to have the siglock converted to raw lock, we can
conclude that we should enable irq for ARM translation/section
permission exception.
Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
229/251 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: genirq: update irq_set_irqchip_state documentation
Date: Thu, 11 Feb 2016 11:54:00 -0600
On -rt kernels, the use of migrate_disable()/migrate_enable() is
sufficient to guarantee a task isn't moved to another CPU. Update the
irq_set_irqchip_state() documentation to reflect this.
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
230/251 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: KVM: arm/arm64: downgrade preempt_disable()d region to migrate_disable()
Date: Thu, 11 Feb 2016 11:54:01 -0600
kvm_arch_vcpu_ioctl_run() disables the use of preemption when updating
the vgic and timer states to prevent the calling task from migrating to
another CPU. It does so to prevent the task from writing to the
incorrect per-CPU GIC distributor registers.
On -rt kernels, it's possible to maintain the same guarantee with the
use of migrate_{disable,enable}(), with the added benefit that the
migrate-disabled region is preemptible. Update
kvm_arch_vcpu_ioctl_run() to do so.
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Reported-by: Manish Jaggi <Manish.Jaggi@caviumnetworks.com>
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
231/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm64: fpsimd: Delay freeing memory in fpsimd_flush_thread()
Date: Wed, 25 Jul 2018 14:02:38 +0200
fpsimd_flush_thread() invokes kfree() via sve_free() within a preempt disabled
section which is not working on -RT.
Delay freeing of memory until preemption is enabled again.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
232/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm: at91: do not disable/enable clocks in a row
Date: Wed, 9 Mar 2016 10:51:06 +0100
Currently the driver will disable the clock and enable it one line later
if it is switching from periodic mode into one shot.
This can be avoided and causes a needless warning on -RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
233/251 [
Author: Benedikt Spranger
Email: b.spranger@linutronix.de
Subject: clocksource: TCLIB: Allow higher clock rates for clock events
Date: Mon, 8 Mar 2010 18:57:04 +0100
As default the TCLIB uses the 32KiHz base clock rate for clock events.
Add a compile time selection to allow higher clock resulution.
(fixed up by Sami Pietikäinen <Sami.Pietikainen@wapice.com>)
Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
234/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Enable RT also on 32bit
Date: Thu, 7 Nov 2019 17:49:20 +0100
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
235/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:29 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
236/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM64: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:35 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
237/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/pseries/iommu: Use a locallock instead local_irq_save()
Date: Tue, 26 Mar 2019 18:31:54 +0100
The locallock protects the per-CPU variable tce_page. The function
attempts to allocate memory while tce_page is protected (by disabling
interrupts).
Use local_irq_save() instead of local_irq_disable().
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
238/251 [
Author: Bogdan Purcareata
Email: bogdan.purcareata@freescale.com
Subject: powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT
Date: Fri, 24 Apr 2015 15:53:13 +0000
While converting the openpic emulation code to use a raw_spinlock_t enables
guests to run on RT, there's still a performance issue. For interrupts sent in
directed delivery mode with a multiple CPU mask, the emulated openpic will loop
through all of the VCPUs, and for each VCPUs, it call IRQ_check, which will loop
through all the pending interrupts for that VCPU. This is done while holding the
raw_lock, meaning that in all this time the interrupts and preemption are
disabled on the host Linux. A malicious user app can max both these number and
cause a DoS.
This temporary fix is sent for two reasons. First is so that users who want to
use the in-kernel MPIC emulation are aware of the potential latencies, thus
making sure that the hardware MPIC and their usage scenario does not involve
interrupts sent in directed delivery mode, and the number of possible pending
interrupts is kept small. Secondly, this should incentivize the development of a
proper openpic emulation that would be better suited for RT.
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
239/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Disable highmem on RT
Date: Mon, 18 Jul 2011 17:08:34 +0200
The current highmem handling on -RT is not compatible and needs fixups.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
240/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/stackprotector: work around stack-guard init from atomic
Date: Tue, 26 Mar 2019 18:31:29 +0100
This is invoked from the secondary CPU in atomic context. On x86 we use
tsc instead. On Power we XOR it against mftb() so lets use stack address
as the initial value.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
241/251 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: POWERPC: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:41 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
242/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mips: Disable highmem on RT
Date: Mon, 18 Jul 2011 17:10:12 +0200
The current highmem handling on -RT is not compatible and needs fixups.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
243/251 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: connector/cn_proc: Protect send_msg() with a local lock on RT
Date: Sun, 16 Oct 2016 05:11:54 +0200
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931
|in_atomic(): 1, irqs_disabled(): 0, pid: 31807, name: sleep
|Preemption disabled at:[<ffffffff8148019b>] proc_exit_connector+0xbb/0x140
|
|CPU: 4 PID: 31807 Comm: sleep Tainted: G W E 4.8.0-rt11-rt #106
|Call Trace:
| [<ffffffff813436cd>] dump_stack+0x65/0x88
| [<ffffffff8109c425>] ___might_sleep+0xf5/0x180
| [<ffffffff816406b0>] __rt_spin_lock+0x20/0x50
| [<ffffffff81640978>] rt_read_lock+0x28/0x30
| [<ffffffff8156e209>] netlink_broadcast_filtered+0x49/0x3f0
| [<ffffffff81522621>] ? __kmalloc_reserve.isra.33+0x31/0x90
| [<ffffffff8156e5cd>] netlink_broadcast+0x1d/0x20
| [<ffffffff8147f57a>] cn_netlink_send_mult+0x19a/0x1f0
| [<ffffffff8147f5eb>] cn_netlink_send+0x1b/0x20
| [<ffffffff814801d8>] proc_exit_connector+0xf8/0x140
| [<ffffffff81077f71>] do_exit+0x5d1/0xba0
| [<ffffffff810785cc>] do_group_exit+0x4c/0xc0
| [<ffffffff81078654>] SyS_exit_group+0x14/0x20
| [<ffffffff81640a72>] entry_SYSCALL_64_fastpath+0x1a/0xa4
Since ab8ed951080e ("connector: fix out-of-order cn_proc netlink message
delivery") which is v4.7-rc6.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
244/251 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drivers/block/zram: Replace bit spinlocks with rtmutex for -rt
Date: Thu, 31 Mar 2016 04:08:28 +0200
They're nondeterministic, and lead to ___might_sleep() splats in -rt.
OTOH, they're a lot less wasteful than an rtmutex per page.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
245/251 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drivers/zram: Don't disable preemption in zcomp_stream_get/put()
Date: Thu, 20 Oct 2016 11:15:22 +0200
In v4.7, the driver switched to percpu compression streams, disabling
preemption via get/put_cpu_ptr(). Use a per-zcomp_strm lock here. We
also have to fix an lock order issue in zram_decompress_page() such
that zs_map_object() nests inside of zcomp_stream_put() as it does in
zram_bvec_write().
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: get_locked_var() -> per zcomp_strm lock]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
246/251 [
Author: Julia Cartwright
Email: julia@ni.com
Subject: squashfs: make use of local lock in multi_cpu decompressor
Date: Mon, 7 May 2018 08:58:57 -0500
Currently, the squashfs multi_cpu decompressor makes use of
get_cpu_ptr()/put_cpu_ptr(), which unconditionally disable preemption
during decompression.
Because the workload is distributed across CPUs, all CPUs can observe a
very high wakeup latency, which has been seen to be as much as 8000us.
Convert this decompressor to make use of a local lock, which will allow
execution of the decompressor with preemption-enabled, but also ensure
concurrent accesses to the percpu compressor data on the local CPU will
be serialized.
Cc: stable-rt@vger.kernel.org
Reported-by: Alexander Stein <alexander.stein@systec-electronic.com>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
247/251 [
Author: Haris Okanovic
Email: haris.okanovic@ni.com
Subject: tpm_tis: fix stall after iowrite*()s
Date: Tue, 15 Aug 2017 15:13:08 -0500
ioread8() operations to TPM MMIO addresses can stall the cpu when
immediately following a sequence of iowrite*()'s to the same region.
For example, cyclitest measures ~400us latency spikes when a non-RT
usermode application communicates with an SPI-based TPM chip (Intel Atom
E3940 system, PREEMPT_RT kernel). The spikes are caused by a
stalling ioread8() operation following a sequence of 30+ iowrite8()s to
the same address. I believe this happens because the write sequence is
buffered (in cpu or somewhere along the bus), and gets flushed on the
first LOAD instruction (ioread*()) that follows.
The enclosed change appears to fix this issue: read the TPM chip's
access register (status code) after every iowrite*() operation to
amortize the cost of flushing data to chip across multiple instructions.
Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
248/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: signals: Allow rt tasks to cache one sigqueue struct
Date: Fri, 3 Jul 2009 08:44:56 -0500
To avoid allocation allow rt tasks to cache one sigqueue struct in
task struct.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
249/251 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: genirq: Disable irqpoll on -rt
Date: Fri, 3 Jul 2009 08:29:57 -0500
Creates long latencies for no value
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
250/251 [
Author: Clark Williams
Email: williams@redhat.com
Subject: sysfs: Add /sys/kernel/realtime entry
Date: Sat, 30 Jul 2011 21:55:53 -0500
Add a /sys/kernel entry to indicate that the kernel is a
realtime kernel.
Clark says that he needs this for udev rules, udev needs to evaluate
if its a PREEMPT_RT kernel a few thousand times and parsing uname
output is too slow or so.
Are there better solutions? Should it exist and return 0 on !-rt?
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
]
251/251 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: Add localversion for -RT release
Date: Fri, 8 Jul 2011 20:25:16 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
This feature enables thunderbolt support on Apple hardware or on PCs
with Intel Falcon Ridge or newer.
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
1/252 [
Author: Waiman Long
Email: longman@redhat.com
Subject: lib/smp_processor_id: Don't use cpumask_equal()
Date: Thu, 3 Oct 2019 16:36:08 -0400
The check_preemption_disabled() function uses cpumask_equal() to see
if the task is bounded to the current CPU only. cpumask_equal() calls
memcmp() to do the comparison. As x86 doesn't have __HAVE_ARCH_MEMCMP,
the slow memcmp() function in lib/string.c is used.
On a RT kernel that call check_preemption_disabled() very frequently,
below is the perf-record output of a certain microbenchmark:
42.75% 2.45% testpmd [kernel.kallsyms] [k] check_preemption_disabled
40.01% 39.97% testpmd [kernel.kallsyms] [k] memcmp
We should avoid calling memcmp() in performance critical path. So the
cpumask_equal() call is now replaced with an equivalent simpler check.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
2/252 [
Author: Julien Grall
Email: julien.grall@arm.com
Subject: lib/ubsan: Don't seralize UBSAN report
Date: Fri, 20 Sep 2019 11:08:35 +0100
At the moment, UBSAN report will be serialized using a spin_lock(). On
RT-systems, spinlocks are turned to rt_spin_lock and may sleep. This will
result to the following splat if the undefined behavior is in a context
that can sleep:
| BUG: sleeping function called from invalid context at /src/linux/kernel/locking/rtmutex.c:968
| in_atomic(): 1, irqs_disabled(): 128, pid: 3447, name: make
| 1 lock held by make/3447:
| #0: 000000009a966332 (&mm->mmap_sem){++++}, at: do_page_fault+0x140/0x4f8
| Preemption disabled at:
| [<ffff000011324a4c>] rt_mutex_futex_unlock+0x4c/0xb0
| CPU: 3 PID: 3447 Comm: make Tainted: G W 5.2.14-rt7-01890-ge6e057589653 #911
| Call trace:
| dump_backtrace+0x0/0x148
| show_stack+0x14/0x20
| dump_stack+0xbc/0x104
| ___might_sleep+0x154/0x210
| rt_spin_lock+0x68/0xa0
| ubsan_prologue+0x30/0x68
| handle_overflow+0x64/0xe0
| __ubsan_handle_add_overflow+0x10/0x18
| __lock_acquire+0x1c28/0x2a28
| lock_acquire+0xf0/0x370
| _raw_spin_lock_irqsave+0x58/0x78
| rt_mutex_futex_unlock+0x4c/0xb0
| rt_spin_unlock+0x28/0x70
| get_page_from_freelist+0x428/0x2b60
| __alloc_pages_nodemask+0x174/0x1708
| alloc_pages_vma+0x1ac/0x238
| __handle_mm_fault+0x4ac/0x10b0
| handle_mm_fault+0x1d8/0x3b0
| do_page_fault+0x1c8/0x4f8
| do_translation_fault+0xb8/0xe0
| do_mem_abort+0x3c/0x98
| el0_da+0x20/0x24
The spin_lock() will protect against multiple CPUs to output a report
together, I guess to prevent them to be interleaved. However, they can
still interleave with other messages (and even splat from __migth_sleep).
So the lock usefulness seems pretty limited. Rather than trying to
accomodate RT-system by switching to a raw_spin_lock(), the lock is now
completely dropped.
Link: https://lkml.kernel.org/r/20190920100835.14999-1-julien.grall@arm.com
Reported-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
3/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jbd2: Simplify journal_unmap_buffer()
Date: Fri, 9 Aug 2019 14:42:27 +0200
journal_unmap_buffer() checks first whether the buffer head is a journal.
If so it takes locks and then invokes jbd2_journal_grab_journal_head()
followed by another check whether this is journal head buffer.
The double checking is pointless.
Replace the initial check with jbd2_journal_grab_journal_head() which
alredy checks whether the buffer head is actually a journal.
Allows also early access to the journal head pointer for the upcoming
conversion of state lock to a regular spinlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
4/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jbd2: Remove jbd_trylock_bh_state()
Date: Fri, 9 Aug 2019 14:42:28 +0200
No users.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
5/252 [
Author: Jan Kara
Email: jack@suse.cz
Subject: jbd2: Move dropping of jh reference out of un/re-filing functions
Date: Fri, 9 Aug 2019 14:42:29 +0200
__jbd2_journal_unfile_buffer() and __jbd2_journal_refile_buffer() drop
transaction's jh reference when they remove jh from a transaction. This
will be however inconvenient once we move state lock into journal_head
itself as we still need to unlock it and we'd need to grab jh reference
just for that. Move dropping of jh reference out of these functions into
the few callers.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
6/252 [
Author: Jan Kara
Email: jack@suse.cz
Subject: jbd2: Drop unnecessary branch from jbd2_journal_forget()
Date: Fri, 9 Aug 2019 14:42:30 +0200
We have cleared both dirty & jbddirty bits from the bh. So there's no
difference between bforget() and brelse(). Thus there's no point jumping
to no_jbd branch.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
7/252 [
Author: Jan Kara
Email: jack@suse.cz
Subject: jbd2: Don't call __bforget() unnecessarily
Date: Fri, 9 Aug 2019 14:42:31 +0200
jbd2_journal_forget() jumps to 'not_jbd' branch which calls __bforget()
in cases where the buffer is clean which is pointless. In case of failed
assertion, it can be even argued that it is safer not to touch buffer's
dirty bits. Also logically it makes more sense to just jump to 'drop'
and that will make logic also simpler when we switch bh_state_lock to a
spinlock.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
8/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jbd2: Make state lock a spinlock
Date: Fri, 9 Aug 2019 14:42:32 +0200
Bit-spinlocks are problematic on PREEMPT_RT if functions which might sleep
on RT, e.g. spin_lock(), alloc/free(), are invoked inside the lock held
region because bit spinlocks disable preemption even on RT.
A first attempt was to replace state lock with a spinlock placed in struct
buffer_head and make the locking conditional on PREEMPT_RT and
DEBUG_BIT_SPINLOCKS.
Jan pointed out that there is a 4 byte hole in struct journal_head where a
regular spinlock fits in and he would not object to convert the state lock
to a spinlock unconditionally.
Aside of solving the RT problem, this also gains lockdep coverage for the
journal head state lock (bit-spinlocks are not covered by lockdep as it's
hard to fit a lockdep map into a single bit).
The trivial change would have been to convert the jbd_*lock_bh_state()
inlines, but that comes with the downside that these functions take a
buffer head pointer which needs to be converted to a journal head pointer
which adds another level of indirection.
As almost all functions which use this lock have a journal head pointer
readily available, it makes more sense to remove the lock helper inlines
and write out spin_*lock() at all call sites.
Fixup all locking comments as well.
Suggested-by: Jan Kara <jack@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jan Kara <jack@suse.com>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
9/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jbd2: Free journal head outside of locked region
Date: Fri, 9 Aug 2019 14:42:33 +0200
On PREEMPT_RT bit-spinlocks have the same semantics as on PREEMPT_RT=n,
i.e. they disable preemption. That means functions which are not safe to be
called in preempt disabled context on RT trigger a might_sleep() assert.
The journal head bit spinlock is mostly held for short code sequences with
trivial RT safe functionality, except for one place:
jbd2_journal_put_journal_head() invokes __journal_remove_journal_head()
with the journal head bit spinlock held. __journal_remove_journal_head()
invokes kmem_cache_free() which must not be called with preemption disabled
on RT.
Jan suggested to rework the removal function so the actual free happens
outside the bit-spinlocked region.
Split it into two parts:
- Do the sanity checks and the buffer head detach under the lock
- Do the actual free after dropping the lock
There is error case handling in the free part which needs to dereference
the b_size field of the now detached buffer head. Due to paranoia (caused
by ignorance) the size is retrieved in the detach function and handed into
the free function. Might be over-engineered, but better safe than sorry.
This makes the journal head bit-spinlock usage RT compliant and also avoids
nested locking which is not covered by lockdep.
Suggested-by: Jan Kara <jack@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-ext4@vger.kernel.org
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
10/252 [
Author: Joel Fernandes (Google)
Email: joel@joelfernandes.org
Subject: workqueue: Convert for_each_wq to use built-in list check
Date: Thu, 15 Aug 2019 10:18:42 -0400
Because list_for_each_entry_rcu() can now check for holding a
lock as well as for being in an RCU read-side critical section,
this commit replaces the workqueue_sysfs_unregister() function's
use of assert_rcu_or_wq_mutex() and list_for_each_entry_rcu() with
list_for_each_entry_rcu() augmented with a lockdep_is_held() optional
argument.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
11/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86/ioapic: Prevent inconsistent state when moving an interrupt
Date: Thu, 17 Oct 2019 12:19:01 +0200
There is an issue with threaded interrupts which are marked ONESHOT
and using the fasteoi handler:
if (IS_ONESHOT())
mask_irq();
....
cond_unmask_eoi_irq()
chip->irq_eoi();
if (setaffinity_pending) {
mask_ioapic();
...
move_affinity();
unmask_ioapic();
}
So if setaffinity is pending the interrupt will be moved and then
unconditionally unmasked at the ioapic level, which is wrong in two
aspects:
1) It should be kept masked up to the point where the threaded handler
finished.
2) The physical chip state and the software masked state are inconsistent
Guard both the mask and the unmask with a check for the software masked
state. If the line is marked masked then the ioapic line is also masked, so
both mask_ioapic() and unmask_ioapic() can be skipped safely.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Fixes: 3aa551c9b4c4 ("genirq: add threaded interrupt handler support")
Link: https://lkml.kernel.org/r/20191017101938.321393687@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
12/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86/ioapic: Rename misnamed functions
Date: Thu, 17 Oct 2019 12:19:02 +0200
ioapic_irqd_[un]mask() are misnomers as both functions do way more than
masking and unmasking the interrupt line. Both deal with the moving the
affinity of the interrupt within interrupt context. The mask/unmask is just
a tiny part of the functionality.
Rename them to ioapic_prepare/finish_move(), fixup the call sites and
rename the related variables in the code to reflect what this is about.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/20191017101938.412489856@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
13/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: percpu-refcount: use normal instead of RCU-sched"
Date: Wed, 4 Sep 2019 17:59:36 +0200
This is a revert of commit
a4244454df129 ("percpu-refcount: use RCU-sched insted of normal RCU")
which claims the only reason for using RCU-sched is
"rcu_read_[un]lock() … are slightly more expensive than preempt_disable/enable()"
and
"As the RCU critical sections are extremely short, using sched-RCU
shouldn't have any latency implications."
The problem with RCU-sched is that it disables preemption and the
callback must not acquire any sleeping locks like spinlock_t on
PREEMPT_RT which is the case.
Convert back to normal RCU.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
14/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: Don't disable interrupts independently of the lock
Date: Wed, 10 Apr 2019 11:01:37 +0200
The locks (active.lock and rq->lock) need to be taken with disabled
interrupts. This is done in i915_request_retire() by disabling the
interrupts independently of the locks itself.
While local_irq_disable()+spin_lock() equals spin_lock_irq() on vanilla
it does not on PREEMPT_RT.
Chris Wilson confirmed that local_irq_disable() was just introduced as
an optimisation to avoid enabling/disabling interrupts during
lock/unlock combo.
Enable/disable interrupts as part of the locking instruction.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
15/252 [
Author: Julia Cartwright
Email: julia@ni.com
Subject: watchdog: prevent deferral of watchdogd wakeup on RT
Date: Fri, 28 Sep 2018 21:03:51 +0000
When PREEMPT_RT is enabled, all hrtimer expiry functions are
deferred for execution into the context of ksoftirqd unless otherwise
annotated.
Deferring the expiry of the hrtimer used by the watchdog core, however,
is a waste, as the callback does nothing but queue a kthread work item
and wakeup watchdogd.
It's worst then that, too: the deferral through ksoftirqd also means
that for correct behavior a user must adjust the scheduling parameters
of both watchdogd _and_ ksoftirqd, which is unnecessary and has other
side effects (like causing unrelated expiry functions to execute at
potentially elevated priority).
Instead, mark the hrtimer used by the watchdog core as being _HARD to
allow it's execution directly from hardirq context. The work done in
this expiry function is well-bounded and minimal.
A user still must adjust the scheduling parameters of the watchdogd
to be correct w.r.t. their application needs.
Cc: Guenter Roeck <linux@roeck-us.net>
Reported-and-tested-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
Reported-by: Tim Sander <tim@krieglstein.org>
Signed-off-by: Julia Cartwright <julia@ni.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
[bigeasy: use only HRTIMER_MODE_REL_HARD]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
16/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block: Don't disable interrupts in trigger_softirq()
Date: Fri, 15 Nov 2019 21:37:22 +0100
trigger_softirq() is always invoked as a SMP-function call which is
always invoked with disables interrupts.
Don't disable interrupt in trigger_softirq() because interrupts are
already disabled.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
17/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm64: KVM: Invoke compute_layout() before alternatives are applied
Date: Thu, 26 Jul 2018 09:13:42 +0200
compute_layout() is invoked as part of an alternative fixup under
stop_machine(). This function invokes get_random_long() which acquires a
sleeping lock on -RT which can not be acquired in this context.
Rename compute_layout() to kvm_compute_layout() and invoke it before
stop_machine() applies the alternatives. Add a __init prefix to
kvm_compute_layout() because the caller has it, too (and so the code can be
discarded after boot).
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
18/252 [
Author: Marc Kleine-Budde
Email: mkl@pengutronix.de
Subject: net: sched: Use msleep() instead of yield()
Date: Wed, 5 Mar 2014 00:49:47 +0100
On PREEMPT_RT enabled systems the interrupt handler run as threads at prio 50
(by default). If a high priority userspace process tries to shut down a busy
network interface it might spin in a yield loop waiting for the device to
become idle. With the interrupt thread having a lower priority than the
looping process it might never be scheduled and so result in a deadlock on UP
systems.
With Magic SysRq the following backtrace can be produced:
> test_app R running 0 174 168 0x00000000
> [<c02c7070>] (__schedule+0x220/0x3fc) from [<c02c7870>] (preempt_schedule_irq+0x48/0x80)
> [<c02c7870>] (preempt_schedule_irq+0x48/0x80) from [<c0008fa8>] (svc_preempt+0x8/0x20)
> [<c0008fa8>] (svc_preempt+0x8/0x20) from [<c001a984>] (local_bh_enable+0x18/0x88)
> [<c001a984>] (local_bh_enable+0x18/0x88) from [<c025316c>] (dev_deactivate_many+0x220/0x264)
> [<c025316c>] (dev_deactivate_many+0x220/0x264) from [<c023be04>] (__dev_close_many+0x64/0xd4)
> [<c023be04>] (__dev_close_many+0x64/0xd4) from [<c023be9c>] (__dev_close+0x28/0x3c)
> [<c023be9c>] (__dev_close+0x28/0x3c) from [<c023f7f0>] (__dev_change_flags+0x88/0x130)
> [<c023f7f0>] (__dev_change_flags+0x88/0x130) from [<c023f904>] (dev_change_flags+0x10/0x48)
> [<c023f904>] (dev_change_flags+0x10/0x48) from [<c024c140>] (do_setlink+0x370/0x7ec)
> [<c024c140>] (do_setlink+0x370/0x7ec) from [<c024d2f0>] (rtnl_newlink+0x2b4/0x450)
> [<c024d2f0>] (rtnl_newlink+0x2b4/0x450) from [<c024cfa0>] (rtnetlink_rcv_msg+0x158/0x1f4)
> [<c024cfa0>] (rtnetlink_rcv_msg+0x158/0x1f4) from [<c0256740>] (netlink_rcv_skb+0xac/0xc0)
> [<c0256740>] (netlink_rcv_skb+0xac/0xc0) from [<c024bbd8>] (rtnetlink_rcv+0x18/0x24)
> [<c024bbd8>] (rtnetlink_rcv+0x18/0x24) from [<c02561b8>] (netlink_unicast+0x13c/0x198)
> [<c02561b8>] (netlink_unicast+0x13c/0x198) from [<c025651c>] (netlink_sendmsg+0x264/0x2e0)
> [<c025651c>] (netlink_sendmsg+0x264/0x2e0) from [<c022af98>] (sock_sendmsg+0x78/0x98)
> [<c022af98>] (sock_sendmsg+0x78/0x98) from [<c022bb50>] (___sys_sendmsg.part.25+0x268/0x278)
> [<c022bb50>] (___sys_sendmsg.part.25+0x268/0x278) from [<c022cf08>] (__sys_sendmsg+0x48/0x78)
> [<c022cf08>] (__sys_sendmsg+0x48/0x78) from [<c0009320>] (ret_fast_syscall+0x0/0x2c)
This patch works around the problem by replacing yield() by msleep(1), giving
the interrupt thread time to finish, similar to other changes contained in the
rt patch set. Using wait_for_completion() instead would probably be a better
solution.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
19/252 [
Author: Uladzislau Rezki (Sony)
Email: urezki@gmail.com
Subject: mm/vmalloc: remove preempt_disable/enable when doing preloading
Date: Sat, 30 Nov 2019 17:54:33 -0800
Some background. The preemption was disabled before to guarantee that a
preloaded object is available for a CPU, it was stored for. That was
achieved by combining the disabling the preemption and taking the spin
lock while the ne_fit_preload_node is checked.
The aim was to not allocate in atomic context when spinlock is taken
later, for regular vmap allocations. But that approach conflicts with
CONFIG_PREEMPT_RT philosophy. It means that calling spin_lock() with
disabled preemption is forbidden in the CONFIG_PREEMPT_RT kernel.
Therefore, get rid of preempt_disable() and preempt_enable() when the
preload is done for splitting purpose. As a result we do not guarantee
now that a CPU is preloaded, instead we minimize the case when it is
not, with this change, by populating the per cpu preload pointer under
the vmap_area_lock.
This implies that at least each caller that has done the preallocation
will not fallback to an atomic allocation later. It is possible that
the preallocation would be pointless or that no preallocation is done
because of the race but the data shows that this is really rare.
For example i run the special test case that follows the preload pattern
and path. 20 "unbind" threads run it and each does 1000000 allocations.
Only 3.5 times among 1000000 a CPU was not preloaded. So it can happen
but the number is negligible.
[mhocko@suse.com: changelog additions]
Link: http://lkml.kernel.org/r/20191016095438.12391-1-urezki@gmail.com
Fixes: 82dd23e84be3 ("mm/vmalloc.c: preload a CPU with one object for split purpose")
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Daniel Wagner <dwagner@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
20/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: KVM: arm/arm64: Let the timer expire in hardirq context on RT
Date: Tue, 13 Aug 2019 14:29:41 +0200
The timers are canceled from an preempt-notifier which is invoked with
disabled preemption which is not allowed on PREEMPT_RT.
The timer callback is short so in could be invoked in hard-IRQ context
on -RT.
Let the timer expire on hard-IRQ context even on -RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Tested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
21/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add printk ring buffer documentation
Date: Tue, 12 Feb 2019 15:29:39 +0100
The full documentation file for the printk ring buffer.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
22/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add prb locking functions
Date: Tue, 12 Feb 2019 15:29:40 +0100
Add processor-reentrant spin locking functions. These allow
restricting the number of possible contexts to 2, which can simplify
implementing code that also supports NMI interruptions.
prb_lock();
/*
* This code is synchronized with all contexts
* except an NMI on the same processor.
*/
prb_unlock();
In order to support printk's emergency messages, a
processor-reentrant spin lock will be used to control raw access to
the emergency console. However, it must be the same
processor-reentrant spin lock as the one used by the ring buffer,
otherwise a deadlock can occur:
CPU1: printk lock -> emergency -> serial lock
CPU2: serial lock -> printk lock
By making the processor-reentrant implemtation available externally,
printk can use the same atomic_t for the ring buffer as for the
emergency console and thus avoid the above deadlock.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
23/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: define ring buffer struct and initializer
Date: Tue, 12 Feb 2019 15:29:41 +0100
See Documentation/printk-ringbuffer.txt for details about the
initializer arguments.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
24/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add writer interface
Date: Tue, 12 Feb 2019 15:29:42 +0100
Add the writer functions prb_reserve() and prb_commit(). These make
use of processor-reentrant spin locks to limit the number of possible
interruption scenarios for the writers.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
25/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add basic non-blocking reading interface
Date: Tue, 12 Feb 2019 15:29:43 +0100
Add reader iterator static declaration/initializer, dynamic
initializer, and functions to iterate and retrieve ring buffer data.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
26/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add blocking reader support
Date: Tue, 12 Feb 2019 15:29:44 +0100
Add a blocking read function for readers. An irq_work function is
used to signal the wait queue so that write notification can
be triggered from any context.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
27/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk-rb: add functionality required by printk
Date: Tue, 12 Feb 2019 15:29:45 +0100
The printk subsystem needs to be able to query the size of the ring
buffer, seek to specific entries within the ring buffer, and track
if records could not be stored in the ring buffer.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
28/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: add ring buffer and kthread
Date: Tue, 12 Feb 2019 15:29:46 +0100
The printk ring buffer provides an NMI-safe interface for writing
messages to a ring buffer. Using such a buffer for alleviates printk
callers from the current burdens of disabled preemption while calling
the console drivers (and possibly printing out many messages that
another task put into the log buffer).
Create a ring buffer to be used for storing messages to be
printed to the consoles.
Create a dedicated printk kthread to block on the ring buffer
and call the console drivers for the read messages.
NOTE: The printk_delay is relocated to _after_ the message is
printed, where it makes more sense.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
29/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: remove exclusive console hack
Date: Tue, 12 Feb 2019 15:29:47 +0100
In order to support printing the printk log history when new
consoles are registered, a global exclusive_console variable is
temporarily set. This only works because printk runs with
preemption disabled.
When console printing is moved to a fully preemptible dedicated
kthread, this hack no longer works.
Remove exclusive_console usage.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
30/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: redirect emit/store to new ringbuffer
Date: Tue, 12 Feb 2019 15:29:48 +0100
vprintk_emit and vprintk_store are the main functions that all printk
variants eventually go through. Change these to store the message in
the new printk ring buffer that the printk kthread is reading.
Remove functions no longer in use because of the changes to
vprintk_emit and vprintk_store.
In order to handle interrupts and NMIs, a second per-cpu ring buffer
(sprint_rb) is added. This ring buffer is used for NMI-safe memory
allocation in order to format the printk messages.
NOTE: LOG_CONT is ignored for now and handled as individual messages.
LOG_CONT functions are masked behind "#if 0" blocks until their
functionality can be restored
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
31/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk_safe: remove printk safe code
Date: Tue, 12 Feb 2019 15:29:49 +0100
vprintk variants are now NMI-safe so there is no longer a need for
the "safe" calls.
NOTE: This also removes printk flushing functionality.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
32/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: minimize console locking implementation
Date: Tue, 12 Feb 2019 15:29:50 +0100
Since printing of the printk buffer is now handled by the printk
kthread, minimize the console locking functions to just handle
locking of the console.
NOTE: With this console_flush_on_panic will no longer flush.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
33/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: track seq per console
Date: Tue, 12 Feb 2019 15:29:51 +0100
Allow each console to track which seq record was last printed. This
simplifies identifying dropped records.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
34/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: do boot_delay_msec inside printk_delay
Date: Tue, 12 Feb 2019 15:29:52 +0100
Both functions needed to be called one after the other, so just
integrate boot_delay_msec into printk_delay for simplification.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
35/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: print history for new consoles
Date: Tue, 12 Feb 2019 15:29:53 +0100
When new consoles register, they currently print how many messages
they have missed. However, many (or all) of those messages may still
be in the ring buffer. Add functionality to print as much of the
history as available. This is a clean replacement of the old
exclusive console hack.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
36/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: implement CON_PRINTBUFFER
Date: Tue, 12 Feb 2019 15:29:54 +0100
If the CON_PRINTBUFFER flag is not set, do not replay the history
for that console.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
37/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: add processor number to output
Date: Tue, 12 Feb 2019 15:29:55 +0100
It can be difficult to sort printk out if multiple processors are
printing simultaneously. Add the processor number to the printk
output to allow the messages to be sorted.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
38/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: console: add write_atomic interface
Date: Tue, 12 Feb 2019 15:29:56 +0100
Add a write_atomic callback to the console. This is an optional
function for console drivers. The function must be atomic (including
NMI safe) for writing to the console.
Console drivers must still implement the write callback. The
write_atomic callback will only be used for emergency messages.
Creating an NMI safe write_atomic that must synchronize with write
requires a careful implementation of the console driver. To aid with
the implementation, a set of console_atomic_* functions are provided:
void console_atomic_lock(unsigned int *flags);
void console_atomic_unlock(unsigned int flags);
These functions synchronize using the processor-reentrant cpu lock of
the printk buffer.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
39/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: introduce emergency messages
Date: Tue, 12 Feb 2019 15:29:57 +0100
Console messages are generally either critical or non-critical.
Critical messages are messages such as crashes or sysrq output.
Critical messages should never be lost because generally they provide
important debugging information.
Since all console messages are output via a fully preemptible printk
kernel thread, it is possible that messages are not output because
that thread cannot be scheduled (BUG in scheduler, run-away RT task,
etc).
To allow critical messages to be output independent of the
schedulability of the printk task, introduce an emergency mechanism
that _immediately_ outputs the message to the consoles. To avoid
possible unbounded latency issues, the emergency mechanism only
outputs the printk line provided by the caller and ignores any
pending messages in the log buffer.
Critical messages are identified as messages (by default) with log
level LOGLEVEL_WARNING or more critical. This is configurable via the
kernel option CONSOLE_LOGLEVEL_EMERGENCY.
Any messages output as emergency messages are skipped by the printk
thread on those consoles that output the emergency message.
In order for a console driver to support emergency messages, the
write_atomic function must be implemented by the driver. If not
implemented, the emergency messages are handled like all other
messages and are printed by the printk thread.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
40/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: serial: 8250: implement write_atomic
Date: Tue, 12 Feb 2019 15:29:58 +0100
Implement a non-sleeping NMI-safe write_atomic console function in
order to support emergency printk messages.
Since interrupts need to be disabled during transmit, all usage of
the IER register was wrapped with access functions that use the
console_atomic_lock function to synchronize register access while
tracking the state of the interrupts. This was necessary because
write_atomic is can be calling from an NMI context that has
preempted write_atomic.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
41/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: implement KERN_CONT
Date: Tue, 12 Feb 2019 15:29:59 +0100
Implement KERN_CONT based on the printing CPU rather than on the
printing task. As long as the KERN_CONT messages are coming from the
same CPU and no non-KERN_CONT messages come, the messages are assumed
to belong to each other.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
42/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: implement /dev/kmsg
Date: Tue, 12 Feb 2019 15:30:00 +0100
Since printk messages are now logged to a new ring buffer, update
the /dev/kmsg functions to pull the messages from there.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
43/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: implement syslog
Date: Tue, 12 Feb 2019 15:30:01 +0100
Since printk messages are now logged to a new ring buffer, update
the syslog functions to pull the messages from there.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
44/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: implement kmsg_dump
Date: Tue, 12 Feb 2019 15:30:02 +0100
Since printk messages are now logged to a new ring buffer, update
the kmsg_dump functions to pull the messages from there.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
45/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: remove unused code
Date: Tue, 12 Feb 2019 15:30:03 +0100
Code relating to the safe context and anything dealing with the
previous log buffer implementation is no longer in use. Remove it.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
46/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: set deferred to default loglevel, enforce mask
Date: Thu, 14 Feb 2019 23:13:30 +0100
All messages printed via vpritnk_deferred() were being
automatically treated as emergency messages.
Messages printed via vprintk_deferred() should be set to the
default loglevel. LOGLEVEL_SCHED is no longer relevant.
Also, enforce the loglevel mask for emergency messages.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
47/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: serial: 8250: remove that trylock in serial8250_console_write_atomic()
Date: Thu, 14 Feb 2019 17:38:24 +0100
This does not work as rtmutex in NMI context. As per John, it is not
needed.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
48/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: serial: 8250: export symbols which are used by symbols
Date: Sat, 16 Feb 2019 09:02:00 +0100
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
49/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm: remove printk_nmi_.*()
Date: Fri, 15 Feb 2019 14:34:20 +0100
It is no longer provided by the printk core code.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
50/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: only allow kernel to emergency message
Date: Sun, 17 Feb 2019 03:11:20 +0100
Emergency messages exist as a mechanism for the kernel to
communicate critical information to users. It is not meant for
use by userspace. Only allow facility=0 messages to be
processed by the emergency message code.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
51/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: devkmsg: llseek: reset clear if it is lost
Date: Fri, 22 Feb 2019 23:02:44 +0100
SEEK_DATA will seek to the last clear record. If this clear record
is no longer in the ring buffer, devkmsg_llseek() will go into an
infinite loop. Fix that by resetting the clear sequence if the old
clear record is no longer in the ring buffer.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
52/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: printk: print "rate-limitted" message as info
Date: Fri, 22 Feb 2019 12:47:13 +0100
If messages which are injected via kmsg are dropped then they don't need
to be printed as warnings. This is to avoid latency spikes if the
interface decides to print a lot of important messages.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
53/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: kmsg_dump: remove mutex usage
Date: Wed, 24 Apr 2019 16:36:04 +0200
The kmsg dumper can be called from any context, but the dumping
helpers were using a mutex to synchronize the iterator against
concurrent dumps.
Rather than trying to synchronize the iterator, use a local copy
of the iterator during the dump. Then no synchronization is
required.
Reported-by: Scott Wood <swood@redhat.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
54/252 [
Author: He Zhe
Email: zhe.he@windriver.com
Subject: printk: devkmsg: read: Return EPIPE when the first message user-space wants has gone
Date: Tue, 24 Sep 2019 15:26:39 +0800
When user-space wants to read the first message, that is when user->seq
is 0, and that message has gone, it currently automatically resets
user->seq to current first seq. This mis-aligns with mainline kernel.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/ABI/testing/dev-kmsg#n39
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/printk/printk.c#n899
We should inform user-space that what it wants has gone by returning EPIPE
in such scenario.
Link: https://lore.kernel.org/r/20190924072639.25986-1-zhe.he@windriver.com
Signed-off-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
55/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: handle iterating while buffer changing
Date: Mon, 7 Oct 2019 16:20:39 +0200
The syslog and kmsg_dump readers are provided buffers to fill.
Both try to maximize the provided buffer usage by calculating the
maximum number of messages that can fit. However, if after the
calculation, messages are dropped and new messages added, the
calculation will no longer match.
For syslog, add a check to make sure the provided buffer is not
overfilled.
For kmsg_dump, start over by recalculating the messages
available.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
56/252 [
Author: John Ogness
Email: john.ogness@linutronix.de
Subject: printk: hack out emergency loglevel usage
Date: Tue, 3 Dec 2019 09:14:57 +0100
Instead of using an emergency loglevel to determine if atomic
messages should be printed, use oops_in_progress. This conforms
to the decision that latency-causing atomic messages never be
generated during normal operation.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
57/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t
Date: Fri, 15 Nov 2019 18:54:20 +0100
Bit spinlocks are problematic if PREEMPT_RT is enabled, because they
disable preemption, which is undesired for latency reasons and breaks when
regular spinlocks are taken within the bit_spinlock locked region because
regular spinlocks are converted to 'sleeping spinlocks' on RT. So RT
replaces the bit spinlocks with regular spinlocks to avoid this problem.
Bit spinlocks are also not covered by lock debugging, e.g. lockdep.
Substitute the BH_Uptodate_Lock bit spinlock with a regular spinlock.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: remove the wrapper and use always spinlock_t and move it into
the padding hole]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
58/252 [
Author: Clark Williams
Email: williams@redhat.com
Subject: thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t
Date: Mon, 15 Jul 2019 15:25:00 -0500
The spinlock pkg_temp_lock has the potential of being taken in atomic
context because it can be acquired from the thermal IRQ vector.
It's static and limited scope so go ahead and make it a raw spinlock.
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
59/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: perf/core: Add SRCU annotation for pmus list walk
Date: Fri, 15 Nov 2019 18:04:07 +0100
Since commit
28875945ba98d ("rcu: Add support for consolidated-RCU reader checking")
there is an additional check to ensure that a RCU related lock is held
while the RCU list is iterated.
This section holds the SRCU reader lock instead.
Add annotation to list_for_each_entry_rcu() that pmus_srcu must be
acquired during the list traversal.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
60/252 [
Author: He Zhe
Email: zhe.he@windriver.com
Subject: kmemleak: Turn kmemleak_lock and object->lock to raw_spinlock_t
Date: Wed, 19 Dec 2018 16:30:57 +0100
kmemleak_lock as a rwlock on RT can possibly be acquired in atomic context
which does work on RT.
Since the kmemleak operation is performed in atomic context make it a
raw_spinlock_t so it can also be acquired on RT. This is used for
debugging and is not enabled by default in a production like environment
(where performance/latency matters) so it makes sense to make it a
raw_spinlock_t instead trying to get rid of the atomic context.
Turn also the kmemleak_object->lock into raw_spinlock_t which is
acquired (nested) while the kmemleak_lock is held.
The time spent in "echo scan > kmemleak" slightly improved on 64core box
with this patch applied after boot.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lkml.kernel.org/r/20181218150744.GB20197@arrakis.emea.arm.com
Link: https://lkml.kernel.org/r/1542877459-144382-1-git-send-email-zhe.he@windriver.com
Link: https://lkml.kernel.org/r/20190927082230.34152-1-yongxin.liu@windriver.com
Signed-off-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Liu Haitao <haitao.liu@windriver.com>
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
[bigeasy: Redo the description. Merge the individual bits: He Zhe did
the kmemleak_lock, Liu Haitao the ->lock and Yongxin Liu forwarded the
patch.]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
61/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: Use CONFIG_PREEMPTION
Date: Fri, 26 Jul 2019 11:30:49 +0200
Thisi is an all-in-one patch of the current `PREEMPTION' branch.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
62/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: BPF: Disable on PREEMPT_RT
Date: Thu, 10 Oct 2019 16:54:45 +0200
Disable BPF on PREEMPT_RT because
- it allocates and frees memory in atomic context
- it uses up_read_non_owner()
- BPF_PROG_RUN() expects to be invoked in non-preemptible context
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
63/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: workqueue: Don't assume that the callback has interrupts disabled
Date: Tue, 11 Jun 2019 11:21:02 +0200
Due to the TIMER_IRQSAFE flag, the timer callback is invoked with
disabled interrupts. On -RT the callback is invoked in softirq context
with enabled interrupts. Since the interrupts are threaded, there are
are no in_irq() users. The local_bh_disable() around the threaded
handler ensures that there is either a timer or a threaded handler
active on the CPU.
Disable interrupts before __queue_work() is invoked from the timer
callback.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
64/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: sched/swait: Add swait_event_lock_irq()
Date: Wed, 22 May 2019 12:42:26 +0200
The swait_event_lock_irq() is inspired by wait_event_lock_irq(). This is
required by the workqueue code once it switches to swait.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
65/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: workqueue: Use swait for wq_manager_wait
Date: Tue, 11 Jun 2019 11:21:09 +0200
In order for the workqueue code use raw_spinlock_t typed locking there
must not be a spinlock_t typed lock be acquired. A wait_queue_head uses
a spinlock_t lock for its list protection.
Use a swait based queue head to avoid raw_spinlock_t -> spinlock_t
locking.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
66/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: workqueue: Convert the locks to raw type
Date: Wed, 22 May 2019 12:43:56 +0200
After all the workqueue and the timer rework, we can finally make the
worker_pool lock raw.
The lock is not held over an unbounded period of time/iterations.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
67/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/compaction: Disable compact_unevictable_allowed on RT
Date: Fri, 8 Nov 2019 12:55:47 +0100
Since commit
5bbe3547aa3ba ("mm: allow compaction of unevictable pages")
it is allowed to examine mlocked pages for pages to compact by default.
On -RT even minor pagefaults are problematic because it may take a few
100us to resolve them and until then the task is blocked.
Make compact_unevictable_allowed = 0 default and remove it from /proc on
RT.
Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
68/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroup: Remove ->css_rstat_flush()
Date: Thu, 15 Aug 2019 18:14:16 +0200
I was looking at the lifetime of the the ->css_rstat_flush() to see if
cgroup_rstat_cpu_lock should remain a raw_spinlock_t. I didn't find any
users and is unused since it was introduced in commit
8f53470bab042 ("cgroup: Add cgroup_subsys->css_rstat_flush()")
Remove the css_rstat_flush callback because it has no users.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
69/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroup: Consolidate users of cgroup_rstat_lock.
Date: Fri, 16 Aug 2019 12:20:42 +0200
cgroup_rstat_flush_irqsafe() has no users, remove it.
cgroup_rstat_flush_hold() and cgroup_rstat_flush_release() are only used within
this file. Make it static.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
70/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroup: Remove `may_sleep' from cgroup_rstat_flush_locked()
Date: Fri, 16 Aug 2019 12:25:35 +0200
cgroup_rstat_flush_locked() is always invoked with `may_sleep' set to
true so that this case can be made default and the parameter removed.
Remove the `may_sleep' parameter.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
71/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: cgroup: Acquire cgroup_rstat_lock with enabled interrupts
Date: Fri, 16 Aug 2019 12:49:36 +0200
There is no need to disable interrupts while cgroup_rstat_lock is
acquired. The lock is never used in-IRQ context so a simple spin_lock()
is enough for synchronisation purpose.
Acquire cgroup_rstat_lock without disabling interrupts and ensure that
cgroup_rstat_cpu_lock is acquired with disabled interrupts (this one is
acquired in-IRQ context).
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
72/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm: workingset: replace IRQ-off check with a lockdep assert.
Date: Mon, 11 Feb 2019 10:40:46 +0100
Commit
68d48e6a2df57 ("mm: workingset: add vmstat counter for shadow nodes")
introduced an IRQ-off check to ensure that a lock is held which also
disabled interrupts. This does not work the same way on -RT because none
of the locks, that are held, disable interrupts.
Replace this check with a lockdep assert which ensures that the lock is
held.
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
73/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: tpm: remove tpm_dev_wq_lock
Date: Mon, 11 Feb 2019 11:33:11 +0100
Added in commit
9e1b74a63f776 ("tpm: add support for nonblocking operation")
but never actually used it.
Cc: Philip Tricca <philip.b.tricca@intel.com>
Cc: Tadeusz Struk <tadeusz.struk@intel.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
74/252 [
Author: Rob Herring
Email: robh@kernel.org
Subject: of: Rework and simplify phandle cache to use a fixed size
Date: Wed, 11 Dec 2019 17:23:45 -0600
The phandle cache was added to speed up of_find_node_by_phandle() by
avoiding walking the whole DT to find a matching phandle. The
implementation has several shortcomings:
- The cache is designed to work on a linear set of phandle values.
This is true for dtc generated DTs, but not for other cases such as
Power.
- The cache isn't enabled until of_core_init() and a typical system
may see hundreds of calls to of_find_node_by_phandle() before that
point.
- The cache is freed and re-allocated when the number of phandles
changes.
- It takes a raw spinlock around a memory allocation which breaks on
RT.
Change the implementation to a fixed size and use hash_32() as the
cache index. This greatly simplifies the implementation. It avoids
the need for any re-alloc of the cache and taking a reference on nodes
in the cache. We only have a single source of removing cache entries
which is of_detach_node().
Using hash_32() removes any assumption on phandle values improving
the hit rate for non-linear phandle values. The effect on linear values
using hash_32() is about a 10% collision. The chances of thrashing on
colliding values seems to be low.
To compare performance, I used a RK3399 board which is a pretty typical
system. I found that just measuring boot time as done previously is
noisy and may be impacted by other things. Also bringing up secondary
cores causes some issues with measuring, so I booted with 'nr_cpus=1'.
With no caching, calls to of_find_node_by_phandle() take about 20124 us
for 1248 calls. There's an additional 288 calls before time keeping is
up. Using the average time per hit/miss with the cache, we can calculate
these calls to take 690 us (277 hit / 11 miss) with a 128 entry cache
and 13319 us with no cache or an uninitialized cache.
Comparing the 3 implementations the time spent in
of_find_node_by_phandle() is:
no cache: 20124 us (+ 13319 us)
128 entry cache: 5134 us (+ 690 us)
current cache: 819 us (+ 13319 us)
We could move the allocation of the cache earlier to improve the
current cache, but that just further complicates the situation as it
needs to be after slab is up, so we can't do it when unflattening (which
uses memblock).
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lkml.kernel.org/r/20191211232345.24810-1-robh@kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
75/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: timekeeping: Split jiffies seqlock
Date: Thu, 14 Feb 2013 22:36:59 +0100
Replace jiffies_lock seqlock with a simple seqcounter and a rawlock so
it can be taken in atomic context on RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
76/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: signal: Revert ptrace preempt magic
Date: Wed, 21 Sep 2011 19:57:12 +0200
Upstream commit '53da1d9456fe7f8 fix ptrace slowness' is nothing more
than a bandaid around the ptrace design trainwreck. It's not a
correctness issue, it's merily a cosmetic bandaid.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
77/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: dma-buf: Use seqlock_t instread disabling preemption
Date: Wed, 14 Aug 2019 16:38:43 +0200
"dma reservation" disables preemption while acquiring the write access
for "seqcount".
Replace the seqcount with a seqlock_t which provides seqcount like
semantic and lock for writer.
Link: https://lkml.kernel.org/r/f410b429-db86-f81c-7c67-f563fa808b62@free.fr
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
78/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: seqlock: Prevent rt starvation
Date: Wed, 22 Feb 2012 12:03:30 +0100
If a low prio writer gets preempted while holding the seqlock write
locked, a high prio reader spins forever on RT.
To prevent this let the reader grab the spinlock, so it blocks and
eventually boosts the writer. This way the writer can proceed and
endless spinning is prevented.
For seqcount writers we disable preemption over the update code
path. Thanks to Al Viro for distangling some VFS code to make that
possible.
Nicholas Mc Guire:
- spin_lock+unlock => spin_unlock_wait
- __write_seqcount_begin => __raw_write_seqcount_begin
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
79/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: NFSv4: replace seqcount_t with a seqlock_t
Date: Fri, 28 Oct 2016 23:05:11 +0200
The raw_write_seqcount_begin() in nfs4_reclaim_open_state() causes a
preempt_disable() on -RT. The spin_lock()/spin_unlock() in that section does
not work.
The lockdep part was removed in commit
abbec2da13f0 ("NFS: Use raw_write_seqcount_begin/end int nfs4_reclaim_open_state")
because lockdep complained.
The whole seqcount thing was introduced in commit
c137afabe330 ("NFSv4: Allow the state manager to mark an open_owner as being recovered").
The recovery threads runs only once.
write_seqlock() does not work on !RT because it disables preemption and it the
writer side is preemptible (has to remain so despite the fact that it will
block readers).
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Link: https://lkml.kernel.org/r/20161021164727.24485-1-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
80/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/Qdisc: use a seqlock instead seqcount
Date: Wed, 14 Sep 2016 17:36:35 +0200
The seqcount disables preemption on -RT while it is held which can't
remove. Also we don't want the reader to spin for ages if the writer is
scheduled out. The seqlock on the other hand will serialize / sleep on
the lock while writer is active.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
81/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: Add a mutex around devnet_rename_seq
Date: Wed, 20 Mar 2013 18:06:20 +0100
On RT write_seqcount_begin() disables preemption and device_rename()
allocates memory with GFP_KERNEL and grabs later the sysfs_mutex
mutex. Serialize with a mutex and add use the non preemption disabling
__write_seqcount_begin().
To avoid writer starvation, let the reader grab the mutex and release
it when it detects a writer in progress. This keeps the normal case
(no reader on the fly) fast.
[ tglx: Instead of replacing the seqcount by a mutex, add the mutex ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
82/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: userfaultfd: Use a seqlock instead of seqcount
Date: Wed, 18 Dec 2019 12:25:09 +0100
On RT write_seqcount_begin() disables preemption which leads to warning
in add_wait_queue() while the spinlock_t is acquired.
The waitqueue can't be converted to swait_queue because
userfaultfd_wake_function() is used as a custom wake function.
Use seqlock instead seqcount to avoid the preempt_disable() section
during add_wait_queue().
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
83/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/nfs: turn rmdir_sem into a semaphore
Date: Thu, 15 Sep 2016 10:51:27 +0200
The RW semaphore had a reader side which used the _non_owner version
because it most likely took the reader lock in one thread and released it
in another which would cause lockdep to complain if the "regular"
version was used.
On -RT we need the owner because the rw lock is turned into a rtmutex.
The semaphores on the hand are "plain simple" and should work as
expected. We can't have multiple readers but on -RT we don't allow
multiple readers anyway so that is not a loss.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
84/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: disable preemption on i_dir_seq's write side
Date: Fri, 20 Oct 2017 11:29:53 +0200
i_dir_seq is an opencoded seqcounter. Based on the code it looks like we
could have two writers in parallel despite the fact that the d_lock is
held. The problem is that during the write process on RT the preemption
is still enabled and if this process is interrupted by a reader with RT
priority then we lock up.
To avoid that lock up I am disabling the preemption during the update.
The rename of i_dir_seq is here to ensure to catch new write sides in
future.
Cc: stable-rt@vger.kernel.org
Reported-by: Oleg.Karfich@wago.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
85/252 [
Author: Paul Gortmaker
Email: paul.gortmaker@windriver.com
Subject: list_bl: Make list head locking RT safe
Date: Fri, 21 Jun 2013 15:07:25 -0400
As per changes in include/linux/jbd_common.h for avoiding the
bit_spin_locks on RT ("fs: jbd/jbd2: Make state lock and journal
head lock rt safe") we do the same thing here.
We use the non atomic __set_bit and __clear_bit inside the scope of
the lock to preserve the ability of the existing LIST_DEBUG code to
use the zero'th bit in the sanity checks.
As a bit spinlock, we had no lockdep visibility into the usage
of the list head locking. Now, if we were to implement it as a
standard non-raw spinlock, we would see:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
in_atomic(): 1, irqs_disabled(): 0, pid: 122, name: udevd
5 locks held by udevd/122:
#0: (&sb->s_type->i_mutex_key#7/1){+.+.+.}, at: [<ffffffff811967e8>] lock_rename+0xe8/0xf0
#1: (rename_lock){+.+...}, at: [<ffffffff811a277c>] d_move+0x2c/0x60
#2: (&dentry->d_lock){+.+...}, at: [<ffffffff811a0763>] dentry_lock_for_move+0xf3/0x130
#3: (&dentry->d_lock/2){+.+...}, at: [<ffffffff811a0734>] dentry_lock_for_move+0xc4/0x130
#4: (&dentry->d_lock/3){+.+...}, at: [<ffffffff811a0747>] dentry_lock_for_move+0xd7/0x130
Pid: 122, comm: udevd Not tainted 3.4.47-rt62 #7
Call Trace:
[<ffffffff810b9624>] __might_sleep+0x134/0x1f0
[<ffffffff817a24d4>] rt_spin_lock+0x24/0x60
[<ffffffff811a0c4c>] __d_shrink+0x5c/0xa0
[<ffffffff811a1b2d>] __d_drop+0x1d/0x40
[<ffffffff811a24be>] __d_move+0x8e/0x320
[<ffffffff811a278e>] d_move+0x3e/0x60
[<ffffffff81199598>] vfs_rename+0x198/0x4c0
[<ffffffff8119b093>] sys_renameat+0x213/0x240
[<ffffffff817a2de5>] ? _raw_spin_unlock+0x35/0x60
[<ffffffff8107781c>] ? do_page_fault+0x1ec/0x4b0
[<ffffffff817a32ca>] ? retint_swapgs+0xe/0x13
[<ffffffff813eb0e6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8119b0db>] sys_rename+0x1b/0x20
[<ffffffff817a3b96>] system_call_fastpath+0x1a/0x1f
Since we are only taking the lock during short lived list operations,
lets assume for now that it being raw won't be a significant latency
concern.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
[julia@ni.com: Use #define instead static inline to avoid false positive from
lockdep]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
86/252 [
Author: Clark Williams
Email: williams@redhat.com
Subject: fscache: initialize cookie hash table raw spinlocks
Date: Tue, 3 Jul 2018 13:34:30 -0500
The fscache cookie mechanism uses a hash table of hlist_bl_head structures. The
PREEMPT_RT patcheset adds a raw spinlock to this structure and so on PREEMPT_RT
the structures get used uninitialized, causing warnings about bad magic numbers
when spinlock debugging is turned on.
Use the init function for fscache cookies.
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
87/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: bring back explicit INIT_HLIST_BL_HEAD init
Date: Wed, 13 Sep 2017 12:32:34 +0200
Commit 3d375d78593c ("mm: update callers to use HASH_ZERO flag") removed
INIT_HLIST_BL_HEAD and uses the ZERO flag instead for the init. However
on RT we have also a spinlock which needs an init call so we can't use
that.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
88/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: fs/dcache: use swait_queue instead of waitqueue
Date: Wed, 14 Sep 2016 14:35:49 +0200
__d_lookup_done() invokes wake_up_all() while holding a hlist_bl_lock()
which disables preemption. As a workaround convert it to swait.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
89/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: kconfig: Disable config options which are not RT compatible
Date: Sun, 24 Jul 2011 12:11:43 +0200
Disable stuff which is known to have issues on RT
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
90/252 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm: Allow only SLUB on RT
Date: Fri, 3 Jul 2009 08:44:03 -0500
Memory allocation disables interrupts as part of the allocation and freeing
process. For -RT it is important that this section remain short and don't
depend on the size of the request or an internal state of the memory allocator.
At the beginning the SLAB memory allocator was adopted for RT's needs and it
required substantial changes. Later, with the addition of the SLUB memory
allocator we adopted this one as well and the changes were smaller. More
important, due to the design of the SLUB allocator it performs better and its
worst case latency was smaller. In the end only SLUB remained supported.
Disable SLAB and SLOB on -RT. Only SLUB is adopted to -RT needs.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
91/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rcu: make RCU_BOOST default on RT
Date: Fri, 21 Mar 2014 20:19:05 +0100
Since it is no longer invoked from the softirq people run into OOM more
often if the priority of the RCU thread is too low. Making boosting
default on RT should help in those case and it can be switched off if
someone knows better.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
92/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Disable CONFIG_RT_GROUP_SCHED on RT
Date: Mon, 18 Jul 2011 17:03:52 +0200
Carsten reported problems when running:
taskset 01 chrt -f 1 sleep 1
from within rc.local on a F15 machine. The task stays running and
never gets on the run queue because some of the run queues have
rt_throttled=1 which does not go away. Works nice from a ssh login
shell. Disabling CONFIG_RT_GROUP_SCHED solves that as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
93/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/core: disable NET_RX_BUSY_POLL on RT
Date: Sat, 27 May 2017 19:02:06 +0200
napi_busy_loop() disables preemption and performs a NAPI poll. We can't acquire
sleeping locks with disabled preemption so we would have to work around this
and add explicit locking for synchronisation against ksoftirqd.
Without explicit synchronisation a low priority process would "own" the NAPI
state (by setting NAPIF_STATE_SCHED) and could be scheduled out (no
preempt_disable() and BH is preemptible on RT).
In case a network packages arrives then the interrupt handler would set
NAPIF_STATE_MISSED and the system would wait until the task owning the NAPI
would be scheduled in again.
Should a task with RT priority busy poll then it would consume the CPU instead
allowing tasks with lower priority to run.
The NET_RX_BUSY_POLL is disabled by default (the system wide sysctls for
poll/read are set to zero) so disable NET_RX_BUSY_POLL on RT to avoid wrong
locking context on RT. Should this feature be considered useful on RT systems
then it could be enabled again with proper locking and synchronisation.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
94/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: md: disable bcache
Date: Thu, 29 Aug 2013 11:48:57 +0200
It uses anon semaphores
|drivers/md/bcache/request.c: In function ‘cached_dev_write_complete’:
|drivers/md/bcache/request.c:1007:2: error: implicit declaration of function ‘up_read_non_owner’ [-Werror=implicit-function-declaration]
| up_read_non_owner(&dc->writeback_lock);
| ^
|drivers/md/bcache/request.c: In function ‘request_write’:
|drivers/md/bcache/request.c:1033:2: error: implicit declaration of function ‘down_read_non_owner’ [-Werror=implicit-function-declaration]
| down_read_non_owner(&dc->writeback_lock);
| ^
either we get rid of those or we have to introduce them…
Link: http://lkml.kernel.org/r/20130820111602.3cea203c@gandalf.local.home
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
95/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: efi: Disable runtime services on RT
Date: Thu, 26 Jul 2018 15:03:16 +0200
Based on meassurements the EFI functions get_variable /
get_next_variable take up to 2us which looks okay.
The functions get_time, set_time take around 10ms. Those 10ms are too
much. Even one ms would be too much.
Ard mentioned that SetVariable might even trigger larger latencies if
the firware will erase flash blocks on NOR.
The time-functions are used by efi-rtc and can be triggered during
runtimed (either via explicit read/write or ntp sync).
The variable write could be used by pstore.
These functions can be disabled without much of a loss. The poweroff /
reboot hooks may be provided by PSCI.
Disable EFI's runtime wrappers.
This was observed on "EFI v2.60 by SoftIron Overdrive 1000".
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
96/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: efi: Allow efi=runtime
Date: Thu, 26 Jul 2018 15:06:10 +0200
In case the command line option "efi=noruntime" is default at built-time, the user
could overwrite its state by `efi=runtime' and allow it again.
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
97/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Disable HAVE_ARCH_JUMP_LABEL
Date: Mon, 1 Jul 2019 17:39:28 +0200
__text_poke() does:
| local_irq_save(flags);
…
| ptep = get_locked_pte(poking_mm, poking_addr, &ptl);
which does not work on -RT because the PTE-lock is a spinlock_t typed lock.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
98/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Add local irq locks
Date: Mon, 20 Jun 2011 09:03:47 +0200
Introduce locallock. For !RT this maps to preempt_disable()/
local_irq_disable() so there is not much that changes. For RT this will
map to a spinlock. This makes preemption possible and locked "ressource"
gets the lockdep anotation it wouldn't have otherwise. The locks are
recursive for owner == current. Also, all locks user migrate_disable()
which ensures that the task is not migrated to another CPU while the lock
is held and the owner is preempted.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
99/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: softirq: Add preemptible softirq
Date: Mon, 20 May 2019 13:09:08 +0200
Add preemptible softirq for RT's needs. By removing the softirq count
from the preempt counter, the softirq becomes preemptible. A per-CPU
lock ensures that there is no parallel softirq processing or that
per-CPU variables are not access in parallel by multiple threads.
local_bh_enable() will process all softirq work that has been raised in
its BH-disabled section once the BH counter gets to 0.
[+ rcu_read_lock() as part of local_bh_disable() by Scott Wood]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
100/252 [
Author: Oleg Nesterov
Email: oleg@redhat.com
Subject: signal/x86: Delay calling signals in atomic
Date: Tue, 14 Jul 2015 14:26:34 +0200
On x86_64 we must disable preemption before we enable interrupts
for stack faults, int3 and debugging, because the current task is using
a per CPU debug stack defined by the IST. If we schedule out, another task
can come in and use the same stack and cause the stack to be corrupted
and crash the kernel on return.
When CONFIG_PREEMPT_RT is enabled, spin_locks become mutexes, and
one of these is the spin lock used in signal handling.
Some of the debug code (int3) causes do_trap() to send a signal.
This function calls a spin lock that has been converted to a mutex
and has the possibility to sleep. If this happens, the above issues with
the corrupted stack is possible.
Instead of calling the signal right away, for PREEMPT_RT and x86_64,
the signal information is stored on the stacks task_struct and
TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
code will send the signal when preemption is enabled.
[ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: also needed on 32bit as per Yang Shi <yang.shi@linaro.org>]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
101/252 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: Split IRQ-off and zone->lock while freeing pages from PCP list #2
Date: Mon, 28 May 2018 15:24:21 +0200
Split the IRQ-off section while accessing the PCP list from zone->lock
while freeing pages.
Introcude isolate_pcp_pages() which separates the pages from the PCP
list onto a temporary list and then free the temporary list via
free_pcppages_bulk().
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
102/252 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: Split IRQ-off and zone->lock while freeing pages from PCP list #2
Date: Mon, 28 May 2018 15:24:21 +0200
Split the IRQ-off section while accessing the PCP list from zone->lock
while freeing pages.
Introcude isolate_pcp_pages() which separates the pages from the PCP
list onto a temporary list and then free the temporary list via
free_pcppages_bulk().
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
103/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/SLxB: change list_lock to raw_spinlock_t
Date: Mon, 28 May 2018 15:24:22 +0200
The list_lock is used with used with IRQs off on RT. Make it a raw_spinlock_t
otherwise the interrupts won't be disabled on -RT. The locking rules remain
the same on !RT.
This patch changes it for SLAB and SLUB since both share the same header
file for struct kmem_cache_node defintion.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
104/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/SLUB: delay giving back empty slubs to IRQ enabled regions
Date: Thu, 21 Jun 2018 17:29:19 +0200
__free_slab() is invoked with disabled interrupts which increases the
irq-off time while __free_pages() is doing the work.
Allow __free_slab() to be invoked with enabled interrupts and move
everything from interrupts-off invocations to a temporary per-CPU list
so it can be processed later.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
105/252 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm: page_alloc: rt-friendly per-cpu pages
Date: Fri, 3 Jul 2009 08:29:37 -0500
rt-friendly per-cpu pages: convert the irqs-off per-cpu locking
method into a preemptible, explicit-per-cpu-locks method.
Contains fixes from:
Peter Zijlstra <a.p.zijlstra@chello.nl>
Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
106/252 [
Author: Anna-Maria Gleixner
Email: anna-maria@linutronix.de
Subject: mm/page_alloc: Split drain_local_pages()
Date: Thu, 18 Apr 2019 11:09:04 +0200
Splitting the functionality of drain_local_pages() into a separate
function. This is a preparatory work for introducing the static key
dependend locking mechanism.
No functional change.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
107/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/swap: Add static key dependent pagevec locking
Date: Thu, 18 Apr 2019 11:09:05 +0200
The locking of struct pagevec is done by disabling preemption. In case the
struct has be accessed form interrupt context then interrupts are
disabled. This means the struct can only be accessed locally from the
CPU. There is also no lockdep coverage which would scream during if it
accessed from wrong context.
Create struct swap_pagevec which contains of a pagevec member and a
spin_lock_t. Introduce a static key, which changes the locking behavior
only if the key is set in the following way: Before the struct is accessed
the spin_lock has to be acquired instead of using preempt_disable(). Since
the struct is used CPU-locally there is no spinning on the lock but the
lock is acquired immediately. If the struct is accessed from interrupt
context, spin_lock_irqsave() is used.
No functional change yet because static key is not enabled.
[anna-maria: introduce static key]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
108/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/swap: Access struct pagevec remotely
Date: Thu, 18 Apr 2019 11:09:06 +0200
When the newly introduced static key would be enabled, struct pagevec is
locked during access. So it is possible to access it from a remote CPU. The
advantage is that the work can be done from the "requesting" CPU without
firing a worker on a remote CPU and waiting for it to complete the work.
No functional change because static key is not enabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
109/252 [
Author: Anna-Maria Gleixner
Email: anna-maria@linutronix.de
Subject: mm/swap: Enable "use_pvec_lock" nohz_full dependent
Date: Thu, 18 Apr 2019 11:09:07 +0200
When a system runs with CONFIG_NO_HZ_FULL enabled, the tick of CPUs listed
in 'nohz_full=' kernel command line parameter should be stopped whenever
possible. The tick stays longer stopped, when work for this CPU is handled
by another CPU.
With the already introduced static key 'use_pvec_lock' there is the
possibility to prevent firing a worker for mm/swap work on a remote CPU
with a stopped tick.
Therefore enabling the static key in case kernel command line parameter
'nohz_full=' setup was successful, which implies that CONFIG_NO_HZ_FULL is
set.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
110/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/swap: Enable use pvec lock on RT
Date: Mon, 12 Aug 2019 11:20:44 +0200
On RT we also need to avoid preempt disable/IRQ-off regions so have to enable
the locking while accessing pvecs.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
111/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: preempt: Provide preempt_*_(no)rt variants
Date: Fri, 24 Jul 2009 12:38:56 +0200
RT needs a few preempt_disable/enable points which are not necessary
otherwise. Implement variants to avoid #ifdeffery.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
112/252 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: mm/vmstat: Protect per cpu variables with preempt disable on RT
Date: Fri, 3 Jul 2009 08:30:13 -0500
Disable preemption on -RT for the vmstat code. On vanila the code runs in
IRQ-off regions while on -RT it is not. "preempt_disable" ensures that the
same ressources is not updated in parallel due to preemption.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
113/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm: Enable SLUB for RT
Date: Thu, 25 Oct 2012 10:32:35 +0100
Avoid the memory allocation in IRQ section
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: factor out everything except the kcalloc() workaorund ]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
114/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: slub: Enable irqs for __GFP_WAIT
Date: Wed, 9 Jan 2013 12:08:15 +0100
SYSTEM_RUNNING might be too late for enabling interrupts. Allocations
with GFP_WAIT can happen before that. So use this as an indicator.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
115/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: slub: Disable SLUB_CPU_PARTIAL
Date: Wed, 15 Apr 2015 19:00:47 +0200
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
|in_atomic(): 1, irqs_disabled(): 0, pid: 87, name: rcuop/7
|1 lock held by rcuop/7/87:
| #0: (rcu_callback){......}, at: [<ffffffff8112c76a>] rcu_nocb_kthread+0x1ca/0x5d0
|Preemption disabled at:[<ffffffff811eebd9>] put_cpu_partial+0x29/0x220
|
|CPU: 0 PID: 87 Comm: rcuop/7 Tainted: G W 4.0.0-rt0+ #477
|Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
| 000000000007a9fc ffff88013987baf8 ffffffff817441c7 0000000000000007
| 0000000000000000 ffff88013987bb18 ffffffff810eee51 0000000000000000
| ffff88013fc10200 ffff88013987bb48 ffffffff8174a1c4 000000000007a9fc
|Call Trace:
| [<ffffffff817441c7>] dump_stack+0x4f/0x90
| [<ffffffff810eee51>] ___might_sleep+0x121/0x1b0
| [<ffffffff8174a1c4>] rt_spin_lock+0x24/0x60
| [<ffffffff811a689a>] __free_pages_ok+0xaa/0x540
| [<ffffffff811a729d>] __free_pages+0x1d/0x30
| [<ffffffff811eddd5>] __free_slab+0xc5/0x1e0
| [<ffffffff811edf46>] free_delayed+0x56/0x70
| [<ffffffff811eecfd>] put_cpu_partial+0x14d/0x220
| [<ffffffff811efc98>] __slab_free+0x158/0x2c0
| [<ffffffff811f0021>] kmem_cache_free+0x221/0x2d0
| [<ffffffff81204d0c>] file_free_rcu+0x2c/0x40
| [<ffffffff8112c7e3>] rcu_nocb_kthread+0x243/0x5d0
| [<ffffffff810e951c>] kthread+0xfc/0x120
| [<ffffffff8174abc8>] ret_from_fork+0x58/0x90
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
116/252 [
Author: Yang Shi
Email: yang.shi@windriver.com
Subject: mm/memcontrol: Don't call schedule_work_on in preemption disabled context
Date: Wed, 30 Oct 2013 11:48:33 -0700
The following trace is triggered when running ltp oom test cases:
BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 0, pid: 17188, name: oom03
Preemption disabled at:[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
CPU: 2 PID: 17188 Comm: oom03 Not tainted 3.10.10-rt3 #2
Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
ffff88007684d730 ffff880070df9b58 ffffffff8169918d ffff880070df9b70
ffffffff8106db31 ffff88007688b4a0 ffff880070df9b88 ffffffff8169d9c0
ffff88007688b4a0 ffff880070df9bc8 ffffffff81059da1 0000000170df9bb0
Call Trace:
[<ffffffff8169918d>] dump_stack+0x19/0x1b
[<ffffffff8106db31>] __might_sleep+0xf1/0x170
[<ffffffff8169d9c0>] rt_spin_lock+0x20/0x50
[<ffffffff81059da1>] queue_work_on+0x61/0x100
[<ffffffff8112b361>] drain_all_stock+0xe1/0x1c0
[<ffffffff8112ba70>] mem_cgroup_reclaim+0x90/0xe0
[<ffffffff8112beda>] __mem_cgroup_try_charge+0x41a/0xc40
[<ffffffff810f1c91>] ? release_pages+0x1b1/0x1f0
[<ffffffff8106f200>] ? sched_exec+0x40/0xb0
[<ffffffff8112cc87>] mem_cgroup_charge_common+0x37/0x70
[<ffffffff8112e2c6>] mem_cgroup_newpage_charge+0x26/0x30
[<ffffffff8110af68>] handle_pte_fault+0x618/0x840
[<ffffffff8103ecf6>] ? unpin_current_cpu+0x16/0x70
[<ffffffff81070f94>] ? migrate_enable+0xd4/0x200
[<ffffffff8110cde5>] handle_mm_fault+0x145/0x1e0
[<ffffffff810301e1>] __do_page_fault+0x1a1/0x4c0
[<ffffffff8169c9eb>] ? preempt_schedule_irq+0x4b/0x70
[<ffffffff8169e3b7>] ? retint_kernel+0x37/0x40
[<ffffffff8103053e>] do_page_fault+0xe/0x10
[<ffffffff8169e4c2>] page_fault+0x22/0x30
So, to prevent schedule_work_on from being called in preempt disabled context,
replace the pair of get/put_cpu() to get/put_cpu_light().
Signed-off-by: Yang Shi <yang.shi@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
117/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: mm/memcontrol: Replace local_irq_disable with local locks
Date: Wed, 28 Jan 2015 17:14:16 +0100
There are a few local_irq_disable() which then take sleeping locks. This
patch converts them local locks.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
118/252 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: mm/zsmalloc: copy with get_cpu_var() and locking
Date: Tue, 22 Mar 2016 11:16:09 +0100
get_cpu_var() disables preemption and triggers a might_sleep() splat later.
This is replaced with get_locked_var().
This bitspinlocks are replaced with a proper mutex which requires a slightly
larger struct to allocate.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: replace the bitspin_lock() with a mutex, get_locked_var(). Mike then
fixed the size magic]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
119/252 [
Author: Luis Claudio R. Goncalves
Email: lclaudio@uudg.org
Subject: mm/zswap: Do not disable preemption in zswap_frontswap_store()
Date: Tue, 25 Jun 2019 11:28:04 -0300
Zswap causes "BUG: scheduling while atomic" by blocking on a rt_spin_lock() with
preemption disabled. The preemption is disabled by get_cpu_var() in
zswap_frontswap_store() to protect the access of the zswap_dstmem percpu variable.
Use get_locked_var() to protect the percpu zswap_dstmem variable, making the
code preemptive.
As get_cpu_ptr() also disables preemption, replace it by this_cpu_ptr() and
remove the counterpart put_cpu_ptr().
Steps to Reproduce:
1. # grubby --args "zswap.enabled=1" --update-kernel DEFAULT
2. # reboot
3. Calculate the amount o memory to be used by the test:
---> grep MemAvailable /proc/meminfo
---> Add 25% ~ 50% to that value
4. # stress --vm 1 --vm-bytes ${MemAvailable+25%} --timeout 240s
Usually, in less than 5 minutes the backtrace listed below appears, followed
by a kernel panic:
| BUG: scheduling while atomic: kswapd1/181/0x00000002
|
| Preemption disabled at:
| [<ffffffff8b2a6cda>] zswap_frontswap_store+0x21a/0x6e1
|
| Kernel panic - not syncing: scheduling while atomic
| CPU: 14 PID: 181 Comm: kswapd1 Kdump: loaded Not tainted 5.0.14-rt9 #1
| Hardware name: AMD Pence/Pence, BIOS WPN2321X_Weekly_12_03_21 03/19/2012
| Call Trace:
| panic+0x106/0x2a7
| __schedule_bug.cold+0x3f/0x51
| __schedule+0x5cb/0x6f0
| schedule+0x43/0xd0
| rt_spin_lock_slowlock_locked+0x114/0x2b0
| rt_spin_lock_slowlock+0x51/0x80
| zbud_alloc+0x1da/0x2d0
| zswap_frontswap_store+0x31a/0x6e1
| __frontswap_store+0xab/0x130
| swap_writepage+0x39/0x70
| pageout.isra.0+0xe3/0x320
| shrink_page_list+0xa8e/0xd10
| shrink_inactive_list+0x251/0x840
| shrink_node_memcg+0x213/0x770
| shrink_node+0xd9/0x450
| balance_pgdat+0x2d5/0x510
| kswapd+0x218/0x470
| kthread+0xfb/0x130
| ret_from_fork+0x27/0x50
Cc: stable-rt@vger.kernel.org
Reported-by: Ping Fang <pifang@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
120/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: radix-tree: use local locks
Date: Wed, 25 Jan 2017 16:34:27 +0100
The preload functionality uses per-CPU variables and preempt-disable to
ensure that it does not switch CPUs during its usage. This patch adds
local_locks() instead preempt_disable() for the same purpose and to
remain preemptible on -RT.
Cc: stable-rt@vger.kernel.org
Reported-and-debugged-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
121/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: kvm Require const tsc for RT
Date: Sun, 6 Nov 2011 12:26:18 +0100
Non constant TSC is a nightmare on bare metal already, but with
virtualization it becomes a complete disaster because the workarounds
are horrible latency wise. That's also a preliminary for running RT in
a guest on top of a RT host.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
122/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: pci/switchtec: Don't use completion's wait queue
Date: Wed, 4 Oct 2017 10:24:23 +0200
The poll callback is using completion's wait_queue_head_t member and
puts it in poll_wait() so the poll() caller gets a wakeup after command
completed. This does not work on RT because we don't have a
wait_queue_head_t in our completion implementation. Nobody in tree does
like that in tree so this is the only driver that breaks.
Instead of using the completion here is waitqueue with a status flag as
suggested by Logan.
I don't have the HW so I have no idea if it works as expected, so please
test it.
Cc: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
123/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: wait.h: include atomic.h
Date: Mon, 28 Oct 2013 12:19:57 +0100
| CC init/main.o
|In file included from include/linux/mmzone.h:9:0,
| from include/linux/gfp.h:4,
| from include/linux/kmod.h:22,
| from include/linux/module.h:13,
| from init/main.c:15:
|include/linux/wait.h: In function ‘wait_on_atomic_t’:
|include/linux/wait.h:982:2: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration]
| if (atomic_read(val) == 0)
| ^
This pops up on ARM. Non-RT gets its atomic.h include from spinlock.h
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
124/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: completion: Use simple wait queues
Date: Fri, 11 Jan 2013 11:23:51 +0100
Completions have no long lasting callbacks and therefor do not need
the complex waitqueue variant. Use simple waitqueues which reduces the
contention on the waitqueue lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[cminyard@mvista.com: Move __prepare_to_swait() into the do loop because
swake_up_locked() removes the waiter on wake from the queue while in the
original code it is not the case]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
125/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: hrtimer: Allow raw wakeups during boot
Date: Fri, 9 Aug 2019 15:25:21 +0200
There are a few wake-up timers during the early boot which are essencial for
the system to make progress. At this stage there are no softirq spawn for the
softirq processing so there is no timer processing in softirq.
The wakeup in question:
smpboot_create_thread()
-> kthread_create_on_cpu()
-> kthread_bind()
-> wait_task_inactive()
-> schedule_hrtimeout()
Let the timer fire in hardirq context during the system boot.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
126/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: hrtimer: move state change before hrtimer_cancel in do_nanosleep()
Date: Thu, 6 Dec 2018 10:15:13 +0100
There is a small window between setting t->task to NULL and waking the
task up (which would set TASK_RUNNING). So the timer would fire, run and
set ->task to NULL while the other side/do_nanosleep() wouldn't enter
freezable_schedule(). After all we are peemptible here (in
do_nanosleep() and on the timer wake up path) and on KVM/virt the
virt-CPU might get preempted.
So do_nanosleep() wouldn't enter freezable_schedule() but cancel the
timer which is still running and wait for it via
hrtimer_wait_for_timer(). Then wait_event()/might_sleep() would complain
that it is invoked with state != TASK_RUNNING.
This isn't a problem since it would be reset to TASK_RUNNING later
anyway and we don't rely on the previous state.
Move the state update to TASK_RUNNING before hrtimer_cancel() so there
are no complains from might_sleep() about wrong state.
Cc: stable-rt@vger.kernel.org
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
127/252 [
Author: John Stultz
Email: johnstul@us.ibm.com
Subject: posix-timers: Thread posix-cpu-timers on -rt
Date: Fri, 3 Jul 2009 08:29:58 -0500
posix-cpu-timer code takes non -rt safe locks in hard irq
context. Move it to a thread.
[ 3.0 fixes from Peter Zijlstra <peterz@infradead.org> ]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
128/252 [
Author: Anna-Maria Gleixner
Email: anna-maria@linutronix.de
Subject: posix-timers: Add expiry lock
Date: Mon, 27 May 2019 16:54:06 +0200
If a about to be removed posix timer is active then the code will retry the
delete operation until it succeeds / the timer callback completes.
Use hrtimer_grab_expiry_lock() for posix timers which use a hrtimer underneath
to spin on a lock until the callback finished.
Introduce cpu_timers_grab_expiry_lock() for the posix-cpu-timer. This will
acquire the proper per-CPU spin_lock which is acquired by the CPU which is
expirering the timer.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
[bigeasy: keep the posix-cpu timer bits, everything else got applied]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
129/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Limit the number of task migrations per batch
Date: Mon, 6 Jun 2011 12:12:51 +0200
Put an upper limit on the number of tasks which are migrated per batch
to avoid large latencies.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
130/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Move mmdrop to RCU on RT
Date: Mon, 6 Jun 2011 12:20:33 +0200
Takes sleeping locks and calls into the memory allocator, so nothing
we want to do in task switch and oder atomic contexts.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
131/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/sched: move stack + kprobe clean up to __put_task_struct()
Date: Mon, 21 Nov 2016 19:31:08 +0100
There is no need to free the stack before the task struct (except for reasons
mentioned in commit 68f24b08ee89 ("sched/core: Free the stack early if
CONFIG_THREAD_INFO_IN_TASK")). This also comes handy on -RT because we can't
free memory in preempt disabled region.
vfree_atomic() delays the memory cleanup to a worker. Since we move everything
to the RCU callback, we can also free it immediately.
Cc: stable-rt@vger.kernel.org #for kprobe_flush_task()
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
132/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add saved_state for tasks blocked on sleeping locks
Date: Sat, 25 Jun 2011 09:21:04 +0200
Spinlocks are state preserving in !RT. RT changes the state when a
task gets blocked on a lock. So we need to remember the state before
the lock contention. If a regular wakeup (not a RTmutex related
wakeup) happens, the saved_state is updated to running. When the lock
sleep is done, the saved state is restored.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
133/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Do not account rcu_preempt_depth on RT in might_sleep()
Date: Tue, 7 Jun 2011 09:19:06 +0200
RT changes the rcu_preempt_depth semantics, so we cannot check for it
in might_sleep().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
134/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Disable TTWU_QUEUE on RT
Date: Tue, 13 Sep 2011 16:42:35 +0200
The queued remote wakeup mechanism can introduce rather large
latencies if the number of migrated tasks is high. Disable it for RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
135/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: softirq: Avoid a cancel dead-lock in tasklet handling due to preemptible-softirq
Date: Sat, 22 Jun 2019 00:09:22 +0200
A pending / active tasklet which is preempted by a task on the same CPU
will spin indefinitely because the tasklet makes no progress.
To avoid this deadlock we can disable BH which will acquire the
softirq-lock which will force the completion of the softirq and so the
tasklet.
The BH off/on in tasklet_kill() will force tasklets which are not yet
running but scheduled (because ksoftirqd was preempted before it could
start the tasklet).
The BH off/on in tasklet_unlock_wait() will force tasklets which got
preempted while running.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
136/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Check preemption after reenabling interrupts
Date: Sun, 13 Nov 2011 17:17:09 +0100
raise_softirq_irqoff() disables interrupts and wakes the softirq
daemon, but after reenabling interrupts there is no preemption check,
so the execution of the softirq thread might be delayed arbitrarily.
In principle we could add that check to local_irq_enable/restore, but
that's overkill as the rasie_softirq_irqoff() sections are the only
ones which show this behaviour.
Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
137/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: softirq: Disable softirq stacks for RT
Date: Mon, 18 Jul 2011 13:59:17 +0200
Disable extra stacks for softirqs. We want to preempt softirqs and
having them on special IRQ-stack does not make this easier.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
138/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net/core: use local_bh_disable() in netif_rx_ni()
Date: Fri, 16 Jun 2017 19:03:16 +0200
In 2004 netif_rx_ni() gained a preempt_disable() section around
netif_rx() and its do_softirq() + testing for it. The do_softirq() part
is required because netif_rx() raises the softirq but does not invoke
it. The preempt_disable() is required to remain on the same CPU which added the
skb to the per-CPU list.
All this can be avoided be putting this into a local_bh_disable()ed
section. The local_bh_enable() part will invoke do_softirq() if
required.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
139/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Handle the various new futex race conditions
Date: Fri, 10 Jun 2011 11:04:15 +0200
RT opens a few new interesting race conditions in the rtmutex/futex
combo due to futex hash bucket lock being a 'sleeping' spinlock and
therefor not disabling preemption.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
140/252 [
Author: Steven Rostedt
Email: rostedt@goodmis.org
Subject: futex: Fix bug on when a requeued RT task times out
Date: Tue, 14 Jul 2015 14:26:34 +0200
Requeue with timeout causes a bug with PREEMPT_RT.
The bug comes from a timed out condition.
TASK 1 TASK 2
------ ------
futex_wait_requeue_pi()
futex_wait_queue_me()
<timed out>
double_lock_hb();
raw_spin_lock(pi_lock);
if (current->pi_blocked_on) {
} else {
current->pi_blocked_on = PI_WAKE_INPROGRESS;
run_spin_unlock(pi_lock);
spin_lock(hb->lock); <-- blocked!
plist_for_each_entry_safe(this) {
rt_mutex_start_proxy_lock();
task_blocks_on_rt_mutex();
BUG_ON(task->pi_blocked_on)!!!!
The BUG_ON() actually has a check for PI_WAKE_INPROGRESS, but the
problem is that, after TASK 1 sets PI_WAKE_INPROGRESS, it then tries to
grab the hb->lock, which it fails to do so. As the hb->lock is a mutex,
it will block and set the "pi_blocked_on" to the hb->lock.
When TASK 2 goes to requeue it, the check for PI_WAKE_INPROGESS fails
because the task1's pi_blocked_on is no longer set to that, but instead,
set to the hb->lock.
The fix:
When calling rt_mutex_start_proxy_lock() a check is made to see
if the proxy tasks pi_blocked_on is set. If so, exit out early.
Otherwise set it to a new flag PI_REQUEUE_INPROGRESS, which notifies
the proxy task that it is being requeued, and will handle things
appropriately.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
141/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: futex: Ensure lock/unlock symetry versus pi_lock and hash bucket lock
Date: Fri, 1 Mar 2013 11:17:42 +0100
In exit_pi_state_list() we have the following locking construct:
spin_lock(&hb->lock);
raw_spin_lock_irq(&curr->pi_lock);
...
spin_unlock(&hb->lock);
In !RT this works, but on RT the migrate_enable() function which is
called from spin_unlock() sees atomic context due to the held pi_lock
and just decrements the migrate_disable_atomic counter of the
task. Now the next call to migrate_disable() sees the counter being
negative and issues a warning. That check should be in
migrate_enable() already.
Fix this by dropping pi_lock before unlocking hb->lock and reaquire
pi_lock after that again. This is safe as the loop code reevaluates
head again under the pi_lock.
Reported-by: Yong Zhang <yong.zhang@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
142/252 [
Author: Grygorii Strashko
Email: Grygorii.Strashko@linaro.org
Subject: pid.h: include atomic.h
Date: Tue, 21 Jul 2015 19:43:56 +0300
This patch fixes build error:
CC kernel/pid_namespace.o
In file included from kernel/pid_namespace.c:11:0:
include/linux/pid.h: In function 'get_pid':
include/linux/pid.h:78:3: error: implicit declaration of function 'atomic_inc' [-Werror=implicit-function-declaration]
atomic_inc(&pid->count);
^
which happens when
CONFIG_PROVE_LOCKING=n
CONFIG_DEBUG_SPINLOCK=n
CONFIG_DEBUG_MUTEXES=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_PID_NS=y
Vanilla gets this via spinlock.h.
Signed-off-by: Grygorii Strashko <Grygorii.Strashko@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
143/252 [
Author: Wolfgang M. Reimer
Email: linuxball@gmail.com
Subject: locking: locktorture: Do NOT include rwlock.h directly
Date: Tue, 21 Jul 2015 16:20:07 +0200
Including rwlock.h directly will cause kernel builds to fail
if CONFIG_PREEMPT_RT is defined. The correct header file
(rwlock_rt.h OR rwlock.h) will be included by spinlock.h which
is included by locktorture.c anyway.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Wolfgang M. Reimer <linuxball@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
144/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Add rtmutex_lock_killable()
Date: Thu, 9 Jun 2011 11:43:52 +0200
Add "killable" type to rtmutex. We need this since rtmutex are used as
"normal" mutexes which do use this type.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
145/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Make lock_killable work
Date: Sat, 1 Apr 2017 12:50:59 +0200
Locking an rt mutex killable does not work because signal handling is
restricted to TASK_INTERRUPTIBLE.
Use signal_pending_state() unconditionaly.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
146/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: spinlock: Split the lock types header
Date: Wed, 29 Jun 2011 19:34:01 +0200
Split raw_spinlock into its own file and the remaining spinlock_t into
its own non-RT header. The non-RT header will be replaced later by sleeping
spinlocks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
147/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Avoid include hell
Date: Wed, 29 Jun 2011 20:06:39 +0200
Include only the required raw types. This avoids pulling in the
complete spinlock header which in turn requires rtmutex.h at some point.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
148/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rbtree: don't include the rcu header
Date: Fri, 20 Dec 2019 11:38:21 -0500
The RCU header pulls in spinlock.h and fails due not yet defined types:
|In file included from include/linux/spinlock.h:275:0,
| from include/linux/rcupdate.h:38,
| from include/linux/rbtree.h:34,
| from include/linux/rtmutex.h:17,
| from include/linux/spinlock_types.h:18,
| from kernel/bounds.c:13:
|include/linux/rwlock_rt.h:16:38: error: unknown type name ‘rwlock_t’
| extern void __lockfunc rt_write_lock(rwlock_t *rwlock);
| ^
This patch moves the required RCU function from the rcupdate.h header file into
a new header file which can be included by both users.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
149/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: Provide rt_mutex_slowlock_locked()
Date: Thu, 12 Oct 2017 16:14:22 +0200
This is the inner-part of rt_mutex_slowlock(), required for rwsem-rt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
150/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: export lockdep-less version of rt_mutex's lock, trylock and unlock
Date: Thu, 12 Oct 2017 16:36:39 +0200
Required for lock implementation ontop of rtmutex.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
151/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: add sleeping lock implementation
Date: Thu, 12 Oct 2017 17:11:19 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
152/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Use the proper LOCK_OFFSET for cond_resched()
Date: Sun, 17 Jul 2011 22:51:33 +0200
RT does not increment preempt count when a 'sleeping' spinlock is
locked. Update PREEMPT_LOCK_OFFSET for that case.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
153/252 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: locking/rtmutex: Clean ->pi_blocked_on in the error case
Date: Mon, 30 Sep 2019 18:15:44 +0200
The function rt_mutex_wait_proxy_lock() cleans ->pi_blocked_on in case
of failure (timeout, signal). The same cleanup is required in
__rt_mutex_start_proxy_lock().
In both the cases the tasks was interrupted by a signal or timeout while
acquiring the lock and after the interruption it longer blocks on the
lock.
Fixes: 1a1fb985f2e2b ("futex: Handle early deadlock return correctly")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
154/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rtmutex: trylock is okay on -RT
Date: Wed, 2 Dec 2015 11:34:07 +0100
non-RT kernel could deadlock on rt_mutex_trylock() in softirq context. On
-RT we don't run softirqs in IRQ context but in thread context so it is
not a issue here.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
155/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: add mutex implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:17:03 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
156/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: add rwsem implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:28:34 +0200
The RT specific R/W semaphore implementation restricts the number of readers
to one because a writer cannot block on multiple readers and inherit its
priority or budget.
The single reader restricting is painful in various ways:
- Performance bottleneck for multi-threaded applications in the page fault
path (mmap sem)
- Progress blocker for drivers which are carefully crafted to avoid the
potential reader/writer deadlock in mainline.
The analysis of the writer code pathes shows, that properly written RT tasks
should not take them. Syscalls like mmap(), file access which take mmap sem
write locked have unbound latencies which are completely unrelated to mmap
sem. Other R/W sem users like graphics drivers are not suitable for RT tasks
either.
So there is little risk to hurt RT tasks when the RT rwsem implementation is
changed in the following way:
- Allow concurrent readers
- Make writers block until the last reader left the critical section. This
blocking is not subject to priority/budget inheritance.
- Readers blocked on a writer inherit their priority/budget in the normal
way.
There is a drawback with this scheme. R/W semaphores become writer unfair
though the applications which have triggered writer starvation (mostly on
mmap_sem) in the past are not really the typical workloads running on a RT
system. So while it's unlikely to hit writer starvation, it's possible. If
there are unexpected workloads on RT systems triggering it, we need to rethink
the approach.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
157/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: add rwlock implementation based on rtmutex
Date: Thu, 12 Oct 2017 17:18:06 +0200
The implementation is bias-based, similar to the rwsem implementation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
158/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rtmutex: wire up RT's locking
Date: Thu, 12 Oct 2017 17:31:14 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
159/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: rtmutex: add ww_mutex addon for mutex-rt
Date: Thu, 12 Oct 2017 17:34:38 +0200
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
160/252 [
Author: Mikulas Patocka
Email: mpatocka@redhat.com
Subject: locking/rt-mutex: fix deadlock in device mapper / block-IO
Date: Mon, 13 Nov 2017 12:56:53 -0500
When some block device driver creates a bio and submits it to another
block device driver, the bio is added to current->bio_list (in order to
avoid unbounded recursion).
However, this queuing of bios can cause deadlocks, in order to avoid them,
device mapper registers a function flush_current_bio_list. This function
is called when device mapper driver blocks. It redirects bios queued on
current->bio_list to helper workqueues, so that these bios can proceed
even if the driver is blocked.
The problem with CONFIG_PREEMPT_RT is that when the device mapper
driver blocks, it won't call flush_current_bio_list (because
tsk_is_pi_blocked returns true in sched_submit_work), so deadlocks in
block device stack can happen.
Note that we can't call blk_schedule_flush_plug if tsk_is_pi_blocked
returns true - that would cause
BUG_ON(rt_mutex_real_waiter(task->pi_blocked_on)) in
task_blocks_on_rt_mutex when flush_current_bio_list attempts to take a
spinlock.
So the proper fix is to call blk_schedule_flush_plug in rt_mutex_fastlock,
when fast acquire failed and when the task is about to block.
CC: stable-rt@vger.kernel.org
[bigeasy: The deadlock is not device-mapper specific, it can also occur
in plain EXT4]
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
161/252 [
Author: Scott Wood
Email: swood@redhat.com
Subject: locking/rt-mutex: Flush block plug on __down_read()
Date: Fri, 4 Jan 2019 15:33:21 -0500
__down_read() bypasses the rtmutex frontend to call
rt_mutex_slowlock_locked() directly, and thus it needs to call
blk_schedule_flush_flug() itself.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
162/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking/rtmutex: re-init the wait_lock in rt_mutex_init_proxy_locked()
Date: Thu, 16 Nov 2017 16:48:48 +0100
We could provide a key-class for the lockdep (and fixup all callers) or
move the init to all callers (like it was) in order to avoid lockdep
seeing a double-lock of the wait_lock.
Reported-by: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
163/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ptrace: fix ptrace vs tasklist_lock race
Date: Thu, 29 Aug 2013 18:21:04 +0200
As explained by Alexander Fyodorov <halcy@yandex.ru>:
|read_lock(&tasklist_lock) in ptrace_stop() is converted to mutex on RT kernel,
|and it can remove __TASK_TRACED from task->state (by moving it to
|task->saved_state). If parent does wait() on child followed by a sys_ptrace
|call, the following race can happen:
|
|- child sets __TASK_TRACED in ptrace_stop()
|- parent does wait() which eventually calls wait_task_stopped() and returns
| child's pid
|- child blocks on read_lock(&tasklist_lock) in ptrace_stop() and moves
| __TASK_TRACED flag to saved_state
|- parent calls sys_ptrace, which calls ptrace_check_attach() and wait_task_inactive()
The patch is based on his initial patch where an additional check is
added in case the __TASK_TRACED moved to ->saved_state. The pi_lock is
taken in case the caller is interrupted between looking into ->state and
->saved_state.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
164/252 [
Author: Scott Wood
Email: swood@redhat.com
Subject: sched: __set_cpus_allowed_ptr(): Check cpus_mask, not cpus_ptr
Date: Sat, 27 Jul 2019 00:56:32 -0500
This function is concerned with the long-term cpu mask, not the
transitory mask the task might have while migrate disabled. Before
this patch, if a task was migrate disabled at the time
__set_cpus_allowed_ptr() was called, and the new mask happened to be
equal to the cpu that the task was running on, then the mask update
would be lost.
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
165/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: kernel/sched/core: add migrate_disable()
Date: Sat, 27 May 2017 19:02:06 +0200
[bristot@redhat.com: rt: Increase/decrease the nr of migratory tasks when enabling/disabling migration
Link: https://lkml.kernel.org/r/e981d271cbeca975bca710e2fbcc6078c09741b0.1498482127.git.bristot@redhat.com
]
[swood@redhat.com: fixups and optimisations
Link:https://lkml.kernel.org/r/20190727055638.20443-1-swood@redhat.com
Link:https://lkml.kernel.org/r/20191012065214.28109-1-swood@redhat.com
]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
166/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: sched/core: migrate_enable() must access takedown_cpu_task on !HOTPLUG_CPU
Date: Fri, 29 Nov 2019 17:24:55 +0100
The variable takedown_cpu_task is never declared/used on !HOTPLUG_CPU
except for migrate_enable(). This leads to a link error.
Don't use takedown_cpu_task in !HOTPLUG_CPU.
Reported-by: Dick Hollenbeck <dick@softplc.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
167/252 [
Author: Scott Wood
Email: swood@redhat.com
Subject: sched: migrate_enable: Use stop_one_cpu_nowait()
Date: Sat, 12 Oct 2019 01:52:14 -0500
migrate_enable() can be called with current->state != TASK_RUNNING.
Avoid clobbering the existing state by using stop_one_cpu_nowait().
Since we're stopping the current cpu, we know that we won't get
past __schedule() until migration_cpu_stop() has run (at least up to
the point of migrating us to another cpu).
Signed-off-by: Scott Wood <swood@redhat.com>
[bigeasy: spin until the request has been processed]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
168/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: trace: Add migrate-disabled counter to tracing output
Date: Sun, 17 Jul 2011 21:56:42 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
169/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: futex: workaround migrate_disable/enable in different context
Date: Wed, 8 Mar 2017 14:23:35 +0100
migrate_enable() invokes __schedule() and it expects a preempt count of one.
Holding a raw_spinlock_t with disabled interrupts should not allow scheduling.
These little hacks ensure that we don't schedule while we lock the hb lockwith
interrupts enabled and unlock it with interrupts disabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[XXX: As per PeterZ suggesstion
set_thread_flag(TIF_NEED_RESCHED); preempt_fold_need_resched()
would trigger a scheduler invocation on the last preempt_enable() which in
turn would allow to drop this.
]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
170/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking: don't check for __LINUX_SPINLOCK_TYPES_H on -RT archs
Date: Fri, 4 Aug 2017 17:40:42 +0200
Upstream uses arch_spinlock_t within spinlock_t and requests that
spinlock_types.h header file is included first.
On -RT we have the rt_mutex with its raw_lock wait_lock which needs
architectures' spinlock_types.h header file for its definition. However
we need rt_mutex first because it is used to build the spinlock_t so
that check does not work for us.
Therefore I am dropping that check.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
171/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: locking: Make spinlock_t and rwlock_t a RCU section on RT
Date: Tue, 19 Nov 2019 09:25:04 +0100
On !RT a locked spinlock_t and rwlock_t disables preemption which
implies a RCU read section. There is code that relies on that behaviour.
Add an explicit RCU read section on RT while a sleeping lock (a lock
which would disables preemption on !RT) acquired.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
172/252 [
Author: Scott Wood
Email: swood@redhat.com
Subject: rcu: Use rcuc threads on PREEMPT_RT as we did
Date: Wed, 11 Sep 2019 17:57:28 +0100
While switching to the reworked RCU-thread code, it has been forgotten
to enable the thread processing on -RT.
Besides restoring behavior that used to be default on RT, this avoids
a deadlock on scheduler locks.
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
173/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: srcu: replace local_irqsave() with a locallock
Date: Thu, 12 Oct 2017 18:37:12 +0200
There are two instances which disable interrupts in order to become a
stable this_cpu_ptr() pointer. The restore part is coupled with
spin_unlock_irqrestore() which does not work on RT.
Replace the local_irq_save() call with the appropriate local_lock()
version of it.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
174/252 [
Author: Julia Cartwright
Email: julia@ni.com
Subject: rcu: enable rcu_normal_after_boot by default for RT
Date: Wed, 12 Oct 2016 11:21:14 -0500
The forcing of an expedited grace period is an expensive and very
RT-application unfriendly operation, as it forcibly preempts all running
tasks on CPUs which are preventing the gp from expiring.
By default, as a policy decision, disable the expediting of grace
periods (after boot) on configurations which enable PREEMPT_RT.
Suggested-by: Luiz Capitulino <lcapitulino@redhat.com>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
175/252 [
Author: Scott Wood
Email: swood@redhat.com
Subject: rcutorture: Avoid problematic critical section nesting on RT
Date: Wed, 11 Sep 2019 17:57:29 +0100
rcutorture was generating some nesting scenarios that are not
reasonable. Constrain the state selection to avoid them.
Example #1:
1. preempt_disable()
2. local_bh_disable()
3. preempt_enable()
4. local_bh_enable()
On PREEMPT_RT, BH disabling takes a local lock only when called in
non-atomic context. Thus, atomic context must be retained until after BH
is re-enabled. Likewise, if BH is initially disabled in non-atomic
context, it cannot be re-enabled in atomic context.
Example #2:
1. rcu_read_lock()
2. local_irq_disable()
3. rcu_read_unlock()
4. local_irq_enable()
If the thread is preempted between steps 1 and 2,
rcu_read_unlock_special.b.blocked will be set, but it won't be
acted on in step 3 because IRQs are disabled. Thus, reporting of the
quiescent state will be delayed beyond the local_irq_enable().
For now, these scenarios will continue to be tested on non-PREEMPT_RT
kernels, until debug checks are added to ensure that they are not
happening elsewhere.
Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
176/252 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: rt: Improve the serial console PASS_LIMIT
Date: Wed, 14 Dec 2011 13:05:54 +0100
Beyond the warning:
drivers/tty/serial/8250/8250.c:1613:6: warning: unused variable ‘pass_counter’ [-Wunused-variable]
the solution of just looping infinitely was ugly - up it to 1 million to
give it a chance to continue in some really ugly situation.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
177/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs/epoll: Do not disable preemption on RT
Date: Fri, 8 Jul 2011 16:35:35 +0200
ep_call_nested() takes a sleeping lock so we can't disable preemption.
The light version is enough since ep_call_nested() doesn't mind beeing
invoked twice on the same CPU.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
178/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/vmalloc: Another preempt disable region which sucks
Date: Tue, 12 Jul 2011 11:39:36 +0200
Avoid the preempt disable version of get_cpu_var(). The inner-lock should
provide enough serialisation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
179/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block/mq: do not invoke preempt_disable()
Date: Tue, 14 Jul 2015 14:26:34 +0200
preempt_disable() and get_cpu() don't play well together with the sleeping
locks it tries to allocate later.
It seems to be enough to replace it with get_cpu_light() and migrate_disable().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
180/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: block/mq: don't complete requests via IPI
Date: Thu, 29 Jan 2015 15:10:08 +0100
The IPI runs in hardirq context and there are sleeping locks. Assume caches are
shared and complete them on the local CPU.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
181/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: md: raid5: Make raid5_percpu handling RT aware
Date: Tue, 6 Apr 2010 16:51:31 +0200
__raid_run_ops() disables preemption with get_cpu() around the access
to the raid5_percpu variables. That causes scheduling while atomic
spews on RT.
Serialize the access to the percpu data with a lock and keep the code
preemptible.
Reported-by: Udo van den Heuvel <udovdh@xs4all.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Udo van den Heuvel <udovdh@xs4all.nl>
]
182/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: scsi/fcoe: Make RT aware.
Date: Sat, 12 Nov 2011 14:00:48 +0100
Do not disable preemption while taking sleeping locks. All user look safe
for migrate_diable() only.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
183/252 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: sunrpc: Make svc_xprt_do_enqueue() use get_cpu_light()
Date: Wed, 18 Feb 2015 16:05:28 +0100
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
|in_atomic(): 1, irqs_disabled(): 0, pid: 3194, name: rpc.nfsd
|Preemption disabled at:[<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
|CPU: 6 PID: 3194 Comm: rpc.nfsd Not tainted 3.18.7-rt1 #9
|Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014
| ffff880409630000 ffff8800d9a33c78 ffffffff815bdeb5 0000000000000002
| 0000000000000000 ffff8800d9a33c98 ffffffff81073c86 ffff880408dd6008
| ffff880408dd6000 ffff8800d9a33cb8 ffffffff815c3d84 ffff88040b3ac000
|Call Trace:
| [<ffffffff815bdeb5>] dump_stack+0x4f/0x9e
| [<ffffffff81073c86>] __might_sleep+0xe6/0x150
| [<ffffffff815c3d84>] rt_spin_lock+0x24/0x50
| [<ffffffffa06beec0>] svc_xprt_do_enqueue+0x80/0x230 [sunrpc]
| [<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc]
| [<ffffffffa06c03ed>] svc_add_new_perm_xprt+0x6d/0x80 [sunrpc]
| [<ffffffffa06b2693>] svc_addsock+0x143/0x200 [sunrpc]
| [<ffffffffa072e69c>] write_ports+0x28c/0x340 [nfsd]
| [<ffffffffa072d2ac>] nfsctl_transaction_write+0x4c/0x80 [nfsd]
| [<ffffffff8117ee83>] vfs_write+0xb3/0x1d0
| [<ffffffff8117f889>] SyS_write+0x49/0xb0
| [<ffffffff815c4556>] system_call_fastpath+0x16/0x1b
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
184/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: rt: Introduce cpu_chill()
Date: Wed, 7 Mar 2012 20:51:03 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill()
defaults to cpu_relax() for non RT. On RT it puts the looping task to
sleep for a tick so the preempted task can make progress.
Steven Rostedt changed it to use a hrtimer instead of msleep():
|
|Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken
|up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is
|called from softirq context, it may block the ksoftirqd() from running, in
|which case, it may never wake up the msleep() causing the deadlock.
+ bigeasy later changed to schedule_hrtimeout()
|If a task calls cpu_chill() and gets woken up by a regular or spurious
|wakeup and has a signal pending, then it exits the sleep loop in
|do_nanosleep() and sets up the restart block. If restart->nanosleep.type is
|not TI_NONE then this results in accessing a stale user pointer from a
|previously interrupted syscall and a copy to user based on the stale
|pointer or a BUG() when 'type' is not supported in nanosleep_copyout().
+ bigeasy: add PF_NOFREEZE:
| [....] Waiting for /dev to be fully populated...
| =====================================
| [ BUG: udevd/229 still has locks held! ]
| 3.12.11-rt17 #23 Not tainted
| -------------------------------------
| 1 lock held by udevd/229:
| #0: (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98
|
| stack backtrace:
| CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23
| (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14)
| (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc)
| (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160)
| (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110)
| (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38)
| (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec)
| (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c)
| (dput+0x74/0x15c) from (lookup_real+0x4c/0x50)
| (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44)
| (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98)
| (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc)
| (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60)
| (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c)
| (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c)
| (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94)
| (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30)
| (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
185/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: block: Use cpu_chill() for retry loops
Date: Thu, 20 Dec 2012 18:28:26 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Steven also observed a live lock when there was a
concurrent priority boosting going on.
Use cpu_chill() instead of cpu_relax() to let the system
make progress.
[bigeasy: After all those changes that occured over the years, this one hunk is
left and should not cause any starvation on -RT anymore]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
186/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: fs: namespace: Use cpu_chill() in trylock loops
Date: Wed, 7 Mar 2012 21:00:34 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
187/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Use cpu_chill() instead of cpu_relax()
Date: Wed, 7 Mar 2012 21:10:04 +0100
Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
188/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: debugobjects: Make RT aware
Date: Sun, 17 Jul 2011 21:41:35 +0200
Avoid filling the pool / allocating memory with irqs off().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
189/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: net: Use skbufhead with raw lock
Date: Tue, 12 Jul 2011 15:38:34 +0200
Use the rps lock as rawlock so we can keep irq-off regions. It looks low
latency. However we can't kfree() from this context therefore we defer this
to the softirq and use the tofree_queue list for it (similar to process_queue).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
190/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: net: dev: always take qdisc's busylock in __dev_xmit_skb()
Date: Wed, 30 Mar 2016 13:36:29 +0200
The root-lock is dropped before dev_hard_start_xmit() is invoked and after
setting the __QDISC___STATE_RUNNING bit. If this task is now pushed away
by a task with a higher priority then the task with the higher priority
won't be able to submit packets to the NIC directly instead they will be
enqueued into the Qdisc. The NIC will remain idle until the task(s) with
higher priority leave the CPU and the task with lower priority gets back
and finishes the job.
If we take always the busylock we ensure that the RT task can boost the
low-prio task and submit the packet.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
191/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: irqwork: push most work into softirq context
Date: Tue, 23 Jun 2015 15:32:51 +0200
Initially we defered all irqwork into softirq because we didn't want the
latency spikes if perf or another user was busy and delayed the RT task.
The NOHZ trigger (nohz_full_kick_work) was the first user that did not work
as expected if it did not run in the original irqwork context so we had to
bring it back somehow for it. push_irq_work_func is the second one that
requires this.
This patch adds the IRQ_WORK_HARD_IRQ which makes sure the callback runs
in raw-irq context. Everything else is defered into softirq context. Without
-RT we have the orignal behavior.
This patch incorporates tglx orignal work which revoked a little bringing back
the arch_irq_work_raise() if possible and a few fixes from Steven Rostedt and
Mike Galbraith,
[bigeasy: melt tglx's irq_work_tick_soft() which splits irq_work_tick() into a
hard and soft variant]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
192/252 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: x86: crypto: Reduce preempt disabled regions
Date: Mon, 14 Nov 2011 18:19:27 +0100
Restrict the preempt disabled regions to the actual floating point
operations and enable preemption for the administrative actions.
This is necessary on RT to avoid that kfree and other operations are
called with preemption disabled.
Reported-and-tested-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
193/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: Reduce preempt disabled regions, more algos
Date: Fri, 21 Feb 2014 17:24:04 +0100
Don Estabrook reported
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
| kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2462 migrate_enable+0x17b/0x200()
| kernel: WARNING: CPU: 3 PID: 865 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100()
and his backtrace showed some crypto functions which looked fine.
The problem is the following sequence:
glue_xts_crypt_128bit()
{
blkcipher_walk_virt(); /* normal migrate_disable() */
glue_fpu_begin(); /* get atomic */
while (nbytes) {
__glue_xts_crypt_128bit();
blkcipher_walk_done(); /* with nbytes = 0, migrate_enable()
* while we are atomic */
};
glue_fpu_end() /* no longer atomic */
}
and this is why the counter get out of sync and the warning is printed.
The other problem is that we are non-preemptible between
glue_fpu_begin() and glue_fpu_end() and the latency grows. To fix this,
I shorten the FPU off region and ensure blkcipher_walk_done() is called
with preemption enabled. This might hurt the performance because we now
enable/disable the FPU state more often but we gain lower latency and
the bug is gone.
Reported-by: Don Estabrook <don.estabrook@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
194/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: limit more FPU-enabled sections
Date: Thu, 30 Nov 2017 13:40:10 +0100
Those crypto drivers use SSE/AVX/… for their crypto work and in order to
do so in kernel they need to enable the "FPU" in kernel mode which
disables preemption.
There are two problems with the way they are used:
- the while loop which processes X bytes may create latency spikes and
should be avoided or limited.
- the cipher-walk-next part may allocate/free memory and may use
kmap_atomic().
The whole kernel_fpu_begin()/end() processing isn't probably that cheap.
It most likely makes sense to process as much of those as possible in one
go. The new *_fpu_sched_rt() schedules only if a RT task is pending.
Probably we should measure the performance those ciphers in pure SW
mode and with this optimisations to see if it makes sense to keep them
for RT.
This kernel_fpu_resched() makes the code more preemptible which might hurt
performance.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
195/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: crypto: cryptd - add a lock instead preempt_disable/local_bh_disable
Date: Thu, 26 Jul 2018 18:52:00 +0200
cryptd has a per-CPU lock which protected with local_bh_disable() and
preempt_disable().
Add an explicit spin_lock to make the locking context more obvious and
visible to lockdep. Since it is a per-CPU lock, there should be no lock
contention on the actual spinlock.
There is a small race-window where we could be migrated to another CPU
after the cpu_queue has been obtain. This is not a problem because the
actual ressource is protected by the spinlock.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
196/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: panic: skip get_random_bytes for RT_FULL in init_oops_id
Date: Tue, 14 Jul 2015 14:26:34 +0200
Disable on -RT. If this is invoked from irq-context we will have problems
to acquire the sleeping lock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
197/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: stackprotector: Avoid random pool on rt
Date: Thu, 16 Dec 2010 14:25:18 +0100
CPU bringup calls into the random pool to initialize the stack
canary. During boot that works nicely even on RT as the might sleep
checks are disabled. During CPU hotplug the might sleep checks
trigger. Making the locks in random raw is a major PITA, so avoid the
call on RT is the only sensible solution. This is basically the same
randomness which we get during boot where the random pool has no
entropy and we rely on the TSC randomnness.
Reported-by: Carsten Emde <carsten.emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
198/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: random: Make it work on rt
Date: Tue, 21 Aug 2012 20:38:50 +0200
Delegate the random insertion to the forced threaded interrupt
handler. Store the return IP of the hard interrupt handler in the irq
descriptor and feed it into the random generator as a source of
entropy.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
199/252 [
Author: Priyanka Jain
Email: Priyanka.Jain@freescale.com
Subject: net: Remove preemption disabling in netif_rx()
Date: Thu, 17 May 2012 09:35:11 +0530
1)enqueue_to_backlog() (called from netif_rx) should be
bind to a particluar CPU. This can be achieved by
disabling migration. No need to disable preemption
2)Fixes crash "BUG: scheduling while atomic: ksoftirqd"
in case of RT.
If preemption is disabled, enqueue_to_backog() is called
in atomic context. And if backlog exceeds its count,
kfree_skb() is called. But in RT, kfree_skb() might
gets scheduled out, so it expects non atomic context.
3)When CONFIG_PREEMPT_RT is not defined,
migrate_enable(), migrate_disable() maps to
preempt_enable() and preempt_disable(), so no
change in functionality in case of non-RT.
-Replace preempt_enable(), preempt_disable() with
migrate_enable(), migrate_disable() respectively
-Replace get_cpu(), put_cpu() with get_cpu_light(),
put_cpu_light() respectively
Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com>
Cc: <rostedt@goodmis.orgn>
Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
200/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: lockdep: Make it RT aware
Date: Sun, 17 Jul 2011 18:51:23 +0200
teach lockdep that we don't really do softirqs on -RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
201/252 [
Author: Yong Zhang
Email: yong.zhang@windriver.com
Subject: lockdep: selftest: Only do hardirq context test for raw spinlock
Date: Mon, 16 Apr 2012 15:01:56 +0800
On -rt there is no softirq context any more and rwlock is sleepable,
disable softirq context test and rwlock+irq test.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Yong Zhang <yong.zhang@windriver.com>
Link: http://lkml.kernel.org/r/1334559716-18447-3-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
202/252 [
Author: Josh Cartwright
Email: josh.cartwright@ni.com
Subject: lockdep: selftest: fix warnings due to missing PREEMPT_RT conditionals
Date: Wed, 28 Jan 2015 13:08:45 -0600
"lockdep: Selftest: Only do hardirq context test for raw spinlock"
disabled the execution of certain tests with PREEMPT_RT, but did
not prevent the tests from still being defined. This leads to warnings
like:
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:580:1: warning: 'irqsafe1_soft_spin_12' defined but not used [-Wunused-function]
...
Fixed by wrapping the test definitions in #ifndef CONFIG_PREEMPT_RT
conditionals.
Signed-off-by: Josh Cartwright <josh.cartwright@ni.com>
Signed-off-by: Xander Huff <xander.huff@ni.com>
Acked-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
203/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: lockdep: disable self-test
Date: Tue, 17 Oct 2017 16:36:18 +0200
The self-test wasn't always 100% accurate for RT. We disabled a few
tests which failed because they had a different semantic for RT. Some
still reported false positives. Now the selftest locks up the system
during boot and it needs to be investigated…
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
204/252 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm,radeon,i915: Use preempt_disable/enable_rt() where recommended
Date: Sat, 27 Feb 2016 08:09:11 +0100
DRM folks identified the spots, so use them.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
205/252 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drm,i915: Use local_lock/unlock_irq() in intel_pipe_update_start/end()
Date: Sat, 27 Feb 2016 09:01:42 +0100
[ 8.014039] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:918
[ 8.014041] in_atomic(): 0, irqs_disabled(): 1, pid: 78, name: kworker/u4:4
[ 8.014045] CPU: 1 PID: 78 Comm: kworker/u4:4 Not tainted 4.1.7-rt7 #5
[ 8.014055] Workqueue: events_unbound async_run_entry_fn
[ 8.014059] 0000000000000000 ffff880037153748 ffffffff815f32c9 0000000000000002
[ 8.014063] ffff88013a50e380 ffff880037153768 ffffffff815ef075 ffff8800372c06c8
[ 8.014066] ffff8800372c06c8 ffff880037153778 ffffffff8107c0b3 ffff880037153798
[ 8.014067] Call Trace:
[ 8.014074] [<ffffffff815f32c9>] dump_stack+0x4a/0x61
[ 8.014078] [<ffffffff815ef075>] ___might_sleep.part.93+0xe9/0xee
[ 8.014082] [<ffffffff8107c0b3>] ___might_sleep+0x53/0x80
[ 8.014086] [<ffffffff815f9064>] rt_spin_lock+0x24/0x50
[ 8.014090] [<ffffffff8109368b>] prepare_to_wait+0x2b/0xa0
[ 8.014152] [<ffffffffa016c04c>] intel_pipe_update_start+0x17c/0x300 [i915]
[ 8.014156] [<ffffffff81093b40>] ? prepare_to_wait_event+0x120/0x120
[ 8.014201] [<ffffffffa0158f36>] intel_begin_crtc_commit+0x166/0x1e0 [i915]
[ 8.014215] [<ffffffffa00c806d>] drm_atomic_helper_commit_planes+0x5d/0x1a0 [drm_kms_helper]
[ 8.014260] [<ffffffffa0171e9b>] intel_atomic_commit+0xab/0xf0 [i915]
[ 8.014288] [<ffffffffa00654c7>] drm_atomic_commit+0x37/0x60 [drm]
[ 8.014298] [<ffffffffa00c6fcd>] drm_atomic_helper_plane_set_property+0x8d/0xd0 [drm_kms_helper]
[ 8.014301] [<ffffffff815f77d9>] ? __ww_mutex_lock+0x39/0x40
[ 8.014319] [<ffffffffa0053b3d>] drm_mode_plane_set_obj_prop+0x2d/0x90 [drm]
[ 8.014328] [<ffffffffa00c8edb>] restore_fbdev_mode+0x6b/0xf0 [drm_kms_helper]
[ 8.014337] [<ffffffffa00cae49>] drm_fb_helper_restore_fbdev_mode_unlocked+0x29/0x80 [drm_kms_helper]
[ 8.014346] [<ffffffffa00caec2>] drm_fb_helper_set_par+0x22/0x50 [drm_kms_helper]
[ 8.014390] [<ffffffffa016dfba>] intel_fbdev_set_par+0x1a/0x60 [i915]
[ 8.014394] [<ffffffff81327dc4>] fbcon_init+0x4f4/0x580
[ 8.014398] [<ffffffff8139ef4c>] visual_init+0xbc/0x120
[ 8.014401] [<ffffffff813a1623>] do_bind_con_driver+0x163/0x330
[ 8.014405] [<ffffffff813a1b2c>] do_take_over_console+0x11c/0x1c0
[ 8.014408] [<ffffffff813236e3>] do_fbcon_takeover+0x63/0xd0
[ 8.014410] [<ffffffff81328965>] fbcon_event_notify+0x785/0x8d0
[ 8.014413] [<ffffffff8107c12d>] ? __might_sleep+0x4d/0x90
[ 8.014416] [<ffffffff810775fe>] notifier_call_chain+0x4e/0x80
[ 8.014419] [<ffffffff810779cd>] __blocking_notifier_call_chain+0x4d/0x70
[ 8.014422] [<ffffffff81077a06>] blocking_notifier_call_chain+0x16/0x20
[ 8.014425] [<ffffffff8132b48b>] fb_notifier_call_chain+0x1b/0x20
[ 8.014428] [<ffffffff8132d8fa>] register_framebuffer+0x21a/0x350
[ 8.014439] [<ffffffffa00cb164>] drm_fb_helper_initial_config+0x274/0x3e0 [drm_kms_helper]
[ 8.014483] [<ffffffffa016f1cb>] intel_fbdev_initial_config+0x1b/0x20 [i915]
[ 8.014486] [<ffffffff8107912c>] async_run_entry_fn+0x4c/0x160
[ 8.014490] [<ffffffff81070ffa>] process_one_work+0x14a/0x470
[ 8.014493] [<ffffffff81071489>] worker_thread+0x169/0x4c0
[ 8.014496] [<ffffffff81071320>] ? process_one_work+0x470/0x470
[ 8.014499] [<ffffffff81076606>] kthread+0xc6/0xe0
[ 8.014502] [<ffffffff81070000>] ? queue_work_on+0x80/0x110
[ 8.014506] [<ffffffff81076540>] ? kthread_worker_fn+0x1c0/0x1c0
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
206/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: disable tracing on -RT
Date: Thu, 6 Dec 2018 09:52:20 +0100
Luca Abeni reported this:
| BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
| CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
| Call Trace:
| rt_spin_lock+0x3f/0x50
| gen6_read32+0x45/0x1d0 [i915]
| g4x_get_vblank_counter+0x36/0x40 [i915]
| trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]
The tracing events use trace_i915_pipe_update_start() among other events
use functions acquire spin locks. A few trace points use
intel_get_crtc_scanline(), others use ->get_vblank_counter() wich also
might acquire a sleeping lock.
Based on this I don't see any other way than disable trace points on RT.
Cc: stable-rt@vger.kernel.org
Reported-by: Luca Abeni <lucabe72@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
207/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE
Date: Wed, 19 Dec 2018 10:47:02 +0100
The order of the header files is important. If this header file is
included after tracepoint.h was included then the NOTRACE here becomes a
nop. Currently this happens for two .c files which use the tracepoitns
behind DRM_I915_LOW_LEVEL_TRACEPOINTS.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
208/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: Don't disable interrupts for intel_engine_breadcrumbs_irq()
Date: Thu, 26 Sep 2019 12:29:05 +0200
The function intel_engine_breadcrumbs_irq() is always invoked from an interrupt
handler and for that reason it invokes (as an optimisation) only spin_lock()
for locking assuming that the interrupts are already disabled. The
function intel_engine_signal_breadcrumbs() is provided to disable
interrupts while the former function is invoked so that assumption is
also true for callers from preemptible context.
On PREEMPT_RT local_irq_disable() really disables interrupts and this
forbids to invoke spin_lock() which becomes a sleeping spinlock.
This is also problematic with `threadirqs' in conjunction with
irq_work. With force threading the interrupt handler, the handler is
invoked with disabled BH but with interrupts enabled. This is okay and
the lock itself is never acquired in IRQ context. This changes with
irq_work (signal_irq_work()) which _still_ invokes
intel_engine_breadcrumbs_irq() from IRQ context. Lockdep should see this
and complain.
Acquire the locks in intel_engine_breadcrumbs_irq() with _irqsave()
suffix and let all callers invoke intel_engine_breadcrumbs_irq()
directly instead using intel_engine_signal_breadcrumbs().
Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
209/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: drm/i915: Drop the IRQ-off asserts
Date: Thu, 26 Sep 2019 12:30:21 +0200
The lockdep_assert_irqs_disabled() check is needless. The previous
lockdep_assert_held() check ensures that the lock is acquired and while
the lock is acquired lockdep also prints a warning if the interrupts are
not disabled if they have to be.
These IRQ-off asserts trigger on PREEMPT_RT because the locks become
sleeping locks and do not really disable interrupts.
Remove lockdep_assert_irqs_disabled().
Reported-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
210/252 [
Author: Mike Galbraith
Email: efault@gmx.de
Subject: cpuset: Convert callback_lock to raw_spinlock_t
Date: Sun, 8 Jan 2017 09:32:25 +0100
The two commits below add up to a cpuset might_sleep() splat for RT:
8447a0fee974 cpuset: convert callback_mutex to a spinlock
344736f29b35 cpuset: simplify cpuset_node_allowed API
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:995
in_atomic(): 0, irqs_disabled(): 1, pid: 11718, name: cset
CPU: 135 PID: 11718 Comm: cset Tainted: G E 4.10.0-rt1-rt #4
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014
Call Trace:
? dump_stack+0x5c/0x81
? ___might_sleep+0xf4/0x170
? rt_spin_lock+0x1c/0x50
? __cpuset_node_allowed+0x66/0xc0
? ___slab_alloc+0x390/0x570 <disables IRQs>
? anon_vma_fork+0x8f/0x140
? copy_page_range+0x6cf/0xb00
? anon_vma_fork+0x8f/0x140
? __slab_alloc.isra.74+0x5a/0x81
? anon_vma_fork+0x8f/0x140
? kmem_cache_alloc+0x1b5/0x1f0
? anon_vma_fork+0x8f/0x140
? copy_process.part.35+0x1670/0x1ee0
? _do_fork+0xdd/0x3f0
? _do_fork+0xdd/0x3f0
? do_syscall_64+0x61/0x170
? entry_SYSCALL64_slow_path+0x25/0x25
The later ensured that a NUMA box WILL take callback_lock in atomic
context by removing the allocator and reclaim path __GFP_HARDWALL
usage which prevented such contexts from taking callback_mutex.
One option would be to reinstate __GFP_HARDWALL protections for
RT, however, as the 8447a0fee974 changelog states:
The callback_mutex is only used to synchronize reads/updates of cpusets'
flags and cpu/node masks. These operations should always proceed fast so
there's no reason why we can't use a spinlock instead of the mutex.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
211/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: apparmor: use a locallock instead preempt_disable()
Date: Wed, 11 Oct 2017 17:43:49 +0200
get_buffers() disables preemption which acts as a lock for the per-CPU
variable. Since we can't disable preemption here on RT, a local_lock is
lock is used in order to remain on the same CPU and not to have more
than one user within the critical section.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
212/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Allow to enable RT
Date: Wed, 7 Aug 2019 18:15:38 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
213/252 [
Author: Peter Zijlstra
Email: peterz@infradead.org
Subject: mm, rt: kmap_atomic scheduling
Date: Thu, 28 Jul 2011 10:43:51 +0200
In fact, with migrate_disable() existing one could play games with
kmap_atomic. You could save/restore the kmap_atomic slots on context
switch (if there are any in use of course), this should be esp easy now
that we have a kmap_atomic stack.
Something like the below.. it wants replacing all the preempt_disable()
stuff with pagefault_disable() && migrate_disable() of course, but then
you can flip kmaps around like below.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[dvhart@linux.intel.com: build fix]
Link: http://lkml.kernel.org/r/1311842631.5890.208.camel@twins
[tglx@linutronix.de: Get rid of the per cpu variable and store the idx
and the pte content right away in the task struct.
Shortens the context switch code. ]
]
214/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86/highmem: Add a "already used pte" check
Date: Mon, 11 Mar 2013 17:09:55 +0100
This is a copy from kmap_atomic_prot().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
215/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm/highmem: Flush tlb on unmap
Date: Mon, 11 Mar 2013 21:37:27 +0100
The tlb should be flushed on unmap and thus make the mapping entry
invalid. This is only done in the non-debug case which does not look
right.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
216/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Enable highmem for rt
Date: Wed, 13 Feb 2013 11:03:11 +0100
fixup highmem for ARM.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
217/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mm/scatterlist: Do not disable irqs on RT
Date: Fri, 3 Jul 2009 08:44:34 -0500
For -RT it is enough to keep pagefault disabled (which is currently handled by
kmap_atomic()).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
218/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: sched: Add support for lazy preemption
Date: Fri, 26 Oct 2012 18:50:54 +0100
It has become an obsession to mitigate the determinism vs. throughput
loss of RT. Looking at the mainline semantics of preemption points
gives a hint why RT sucks throughput wise for ordinary SCHED_OTHER
tasks. One major issue is the wakeup of tasks which are right away
preempting the waking task while the waking task holds a lock on which
the woken task will block right after having preempted the wakee. In
mainline this is prevented due to the implicit preemption disable of
spin/rw_lock held regions. On RT this is not possible due to the fully
preemptible nature of sleeping spinlocks.
Though for a SCHED_OTHER task preempting another SCHED_OTHER task this
is really not a correctness issue. RT folks are concerned about
SCHED_FIFO/RR tasks preemption and not about the purely fairness
driven SCHED_OTHER preemption latencies.
So I introduced a lazy preemption mechanism which only applies to
SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside of the
existing preempt_count each tasks sports now a preempt_lazy_count
which is manipulated on lock acquiry and release. This is slightly
incorrect as for lazyness reasons I coupled this on
migrate_disable/enable so some other mechanisms get the same treatment
(e.g. get_cpu_light).
Now on the scheduler side instead of setting NEED_RESCHED this sets
NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and
therefor allows to exit the waking task the lock held region before
the woken task preempts. That also works better for cross CPU wakeups
as the other side can stay in the adaptive spinning loop.
For RT class preemption there is no change. This simply sets
NEED_RESCHED and forgoes the lazy preemption counter.
Initial test do not expose any observable latency increasement, but
history shows that I've been proven wrong before :)
The lazy preemption mode is per default on, but with
CONFIG_SCHED_DEBUG enabled it can be disabled via:
# echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features
and reenabled via
# echo PREEMPT_LAZY >/sys/kernel/debug/sched_features
The test results so far are very machine and workload dependent, but
there is a clear trend that it enhances the non RT workload
performance.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
219/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: x86: Support for lazy preemption
Date: Thu, 1 Nov 2012 11:03:47 +0100
Implement the x86 pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
220/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: arm: Add support for lazy preemption
Date: Wed, 31 Oct 2012 12:04:11 +0100
Implement the arm pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
221/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Add support for lazy preemption
Date: Thu, 1 Nov 2012 10:14:11 +0100
Implement the powerpc pieces for lazy preempt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
222/252 [
Author: Anders Roxell
Email: anders.roxell@linaro.org
Subject: arch/arm64: Add lazy preempt support
Date: Thu, 14 May 2015 17:52:17 +0200
arm64 is missing support for PREEMPT_RT. The main feature which is
lacking is support for lazy preemption. The arch-specific entry code,
thread information structure definitions, and associated data tables
have to be extended to provide this support. Then the Kconfig file has
to be extended to indicate the support is available, and also to
indicate that support for full RT preemption is now available.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
]
223/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: jump-label: disable if stop_machine() is used
Date: Wed, 8 Jul 2015 17:14:48 +0200
Some architectures are using stop_machine() while switching the opcode which
leads to latency spikes.
The architectures which use stop_machine() atm:
- ARM stop machine
- s390 stop machine
The architecures which use other sorcery:
- MIPS
- X86
- powerpc
- sparc
- arm64
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bigeasy: only ARM for now]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
224/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: leds: trigger: disable CPU trigger on -RT
Date: Thu, 23 Jan 2014 14:45:59 +0100
as it triggers:
|CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141
|[<c0014aa4>] (unwind_backtrace+0x0/0xf8) from [<c0012788>] (show_stack+0x1c/0x20)
|[<c0012788>] (show_stack+0x1c/0x20) from [<c043c8dc>] (dump_stack+0x20/0x2c)
|[<c043c8dc>] (dump_stack+0x20/0x2c) from [<c004c5e8>] (__might_sleep+0x13c/0x170)
|[<c004c5e8>] (__might_sleep+0x13c/0x170) from [<c043f270>] (__rt_spin_lock+0x28/0x38)
|[<c043f270>] (__rt_spin_lock+0x28/0x38) from [<c043fa00>] (rt_read_lock+0x68/0x7c)
|[<c043fa00>] (rt_read_lock+0x68/0x7c) from [<c036cf74>] (led_trigger_event+0x2c/0x5c)
|[<c036cf74>] (led_trigger_event+0x2c/0x5c) from [<c036e0bc>] (ledtrig_cpu+0x54/0x5c)
|[<c036e0bc>] (ledtrig_cpu+0x54/0x5c) from [<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c)
|[<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) from [<c00590b8>] (cpu_startup_entry+0xa8/0x234)
|[<c00590b8>] (cpu_startup_entry+0xa8/0x234) from [<c043b2cc>] (rest_init+0xb8/0xe0)
|[<c043b2cc>] (rest_init+0xb8/0xe0) from [<c061ebe0>] (start_kernel+0x2c4/0x380)
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
225/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/omap: Make the locking RT aware
Date: Thu, 28 Jul 2011 13:32:57 +0200
The lock is a sleeping lock and local_irq_save() is not the
optimsation we are looking for. Redo it to make it work on -RT and
non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
226/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: tty/serial/pl011: Make the locking work on RT
Date: Tue, 8 Jan 2013 21:36:51 +0100
The lock is a sleeping lock and local_irq_save() is not the optimsation
we are looking for. Redo it to make it work on -RT and non-RT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
227/252 [
Author: Kurt Kanzenbach
Email: kurt@linutronix.de
Subject: tty: serial: pl011: explicitly initialize the flags variable
Date: Mon, 24 Sep 2018 10:29:01 +0200
Silence the following gcc warning:
drivers/tty/serial/amba-pl011.c: In function ‘pl011_console_write’:
./include/linux/spinlock.h:260:3: warning: ‘flags’ may be used uninitialized in this function [-Wmaybe-uninitialized]
_raw_spin_unlock_irqrestore(lock, flags); \
^~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/tty/serial/amba-pl011.c:2214:16: note: ‘flags’ was declared here
unsigned long flags;
^~~~~
The code is correct. Thus, initializing flags to zero doesn't change the
behavior and resolves the warning.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
228/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm: include definition for cpumask_t
Date: Thu, 22 Dec 2016 17:28:33 +0100
This definition gets pulled in by other files. With the (later) split of
RCU and spinlock.h it won't compile anymore.
The split is done in ("rbtree: don't include the rcu header").
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
229/252 [
Author: Yadi.hu
Email: yadi.hu@windriver.com
Subject: ARM: enable irq in translation/section permission fault handlers
Date: Wed, 10 Dec 2014 10:32:09 +0800
Probably happens on all ARM, with
CONFIG_PREEMPT_RT
CONFIG_DEBUG_ATOMIC_SLEEP
This simple program....
int main() {
*((char*)0xc0001000) = 0;
};
[ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
[ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a
[ 512.743217] INFO: lockdep is turned off.
[ 512.743360] irq event stamp: 0
[ 512.743482] hardirqs last enabled at (0): [< (null)>] (null)
[ 512.743714] hardirqs last disabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744013] softirqs last enabled at (0): [<c0426370>] copy_process+0x3b0/0x11c0
[ 512.744303] softirqs last disabled at (0): [< (null)>] (null)
[ 512.744631] [<c041872c>] (unwind_backtrace+0x0/0x104)
[ 512.745001] [<c09af0c4>] (dump_stack+0x20/0x24)
[ 512.745355] [<c0462490>] (__might_sleep+0x1dc/0x1e0)
[ 512.745717] [<c09b6770>] (rt_spin_lock+0x34/0x6c)
[ 512.746073] [<c0441bf0>] (do_force_sig_info+0x34/0xf0)
[ 512.746457] [<c0442668>] (force_sig_info+0x18/0x1c)
[ 512.746829] [<c041d880>] (__do_user_fault+0x9c/0xd8)
[ 512.747185] [<c041d938>] (do_bad_area+0x7c/0x94)
[ 512.747536] [<c041d990>] (do_sect_fault+0x40/0x48)
[ 512.747898] [<c040841c>] (do_DataAbort+0x40/0xa0)
[ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8)
Oxc0000000 belongs to kernel address space, user task can not be
allowed to access it. For above condition, correct result is that
test case should receive a “segment fault” and exits but not stacks.
the root cause is commit 02fe2845d6a8 ("avoid enabling interrupts in
prefetch/data abort handlers"),it deletes irq enable block in Data
abort assemble code and move them into page/breakpiont/alignment fault
handlers instead. But author does not enable irq in translation/section
permission fault handlers. ARM disables irq when it enters exception/
interrupt mode, if kernel doesn't enable irq, it would be still disabled
during translation/section permission fault.
We see the above splat because do_force_sig_info is still called with
IRQs off, and that code eventually does a:
spin_lock_irqsave(&t->sighand->siglock, flags);
As this is architecture independent code, and we've not seen any other
need for other arch to have the siglock converted to raw lock, we can
conclude that we should enable irq for ARM translation/section
permission exception.
Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
230/252 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: genirq: update irq_set_irqchip_state documentation
Date: Thu, 11 Feb 2016 11:54:00 -0600
On -rt kernels, the use of migrate_disable()/migrate_enable() is
sufficient to guarantee a task isn't moved to another CPU. Update the
irq_set_irqchip_state() documentation to reflect this.
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
231/252 [
Author: Josh Cartwright
Email: joshc@ni.com
Subject: KVM: arm/arm64: downgrade preempt_disable()d region to migrate_disable()
Date: Thu, 11 Feb 2016 11:54:01 -0600
kvm_arch_vcpu_ioctl_run() disables the use of preemption when updating
the vgic and timer states to prevent the calling task from migrating to
another CPU. It does so to prevent the task from writing to the
incorrect per-CPU GIC distributor registers.
On -rt kernels, it's possible to maintain the same guarantee with the
use of migrate_{disable,enable}(), with the added benefit that the
migrate-disabled region is preemptible. Update
kvm_arch_vcpu_ioctl_run() to do so.
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Reported-by: Manish Jaggi <Manish.Jaggi@caviumnetworks.com>
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
232/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm64: fpsimd: Delay freeing memory in fpsimd_flush_thread()
Date: Wed, 25 Jul 2018 14:02:38 +0200
fpsimd_flush_thread() invokes kfree() via sve_free() within a preempt disabled
section which is not working on -RT.
Delay freeing of memory until preemption is enabled again.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
233/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: arm: at91: do not disable/enable clocks in a row
Date: Wed, 9 Mar 2016 10:51:06 +0100
Currently the driver will disable the clock and enable it one line later
if it is switching from periodic mode into one shot.
This can be avoided and causes a needless warning on -RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
234/252 [
Author: Benedikt Spranger
Email: b.spranger@linutronix.de
Subject: clocksource: TCLIB: Allow higher clock rates for clock events
Date: Mon, 8 Mar 2010 18:57:04 +0100
As default the TCLIB uses the 32KiHz base clock rate for clock events.
Add a compile time selection to allow higher clock resulution.
(fixed up by Sami Pietikäinen <Sami.Pietikainen@wapice.com>)
Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
235/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: x86: Enable RT also on 32bit
Date: Thu, 7 Nov 2019 17:49:20 +0100
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
236/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:29 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
237/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: ARM64: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:35 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
238/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/pseries/iommu: Use a locallock instead local_irq_save()
Date: Tue, 26 Mar 2019 18:31:54 +0100
The locallock protects the per-CPU variable tce_page. The function
attempts to allocate memory while tce_page is protected (by disabling
interrupts).
Use local_irq_save() instead of local_irq_disable().
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
239/252 [
Author: Bogdan Purcareata
Email: bogdan.purcareata@freescale.com
Subject: powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT
Date: Fri, 24 Apr 2015 15:53:13 +0000
While converting the openpic emulation code to use a raw_spinlock_t enables
guests to run on RT, there's still a performance issue. For interrupts sent in
directed delivery mode with a multiple CPU mask, the emulated openpic will loop
through all of the VCPUs, and for each VCPUs, it call IRQ_check, which will loop
through all the pending interrupts for that VCPU. This is done while holding the
raw_lock, meaning that in all this time the interrupts and preemption are
disabled on the host Linux. A malicious user app can max both these number and
cause a DoS.
This temporary fix is sent for two reasons. First is so that users who want to
use the in-kernel MPIC emulation are aware of the potential latencies, thus
making sure that the hardware MPIC and their usage scenario does not involve
interrupts sent in directed delivery mode, and the number of possible pending
interrupts is kept small. Secondly, this should incentivize the development of a
proper openpic emulation that would be better suited for RT.
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
240/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: powerpc: Disable highmem on RT
Date: Mon, 18 Jul 2011 17:08:34 +0200
The current highmem handling on -RT is not compatible and needs fixups.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
241/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: powerpc/stackprotector: work around stack-guard init from atomic
Date: Tue, 26 Mar 2019 18:31:29 +0100
This is invoked from the secondary CPU in atomic context. On x86 we use
tsc instead. On Power we XOR it against mftb() so lets use stack address
as the initial value.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
242/252 [
Author: Sebastian Andrzej Siewior
Email: bigeasy@linutronix.de
Subject: POWERPC: Allow to enable RT
Date: Fri, 11 Oct 2019 13:14:41 +0200
Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
243/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: mips: Disable highmem on RT
Date: Mon, 18 Jul 2011 17:10:12 +0200
The current highmem handling on -RT is not compatible and needs fixups.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
244/252 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: connector/cn_proc: Protect send_msg() with a local lock on RT
Date: Sun, 16 Oct 2016 05:11:54 +0200
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931
|in_atomic(): 1, irqs_disabled(): 0, pid: 31807, name: sleep
|Preemption disabled at:[<ffffffff8148019b>] proc_exit_connector+0xbb/0x140
|
|CPU: 4 PID: 31807 Comm: sleep Tainted: G W E 4.8.0-rt11-rt #106
|Call Trace:
| [<ffffffff813436cd>] dump_stack+0x65/0x88
| [<ffffffff8109c425>] ___might_sleep+0xf5/0x180
| [<ffffffff816406b0>] __rt_spin_lock+0x20/0x50
| [<ffffffff81640978>] rt_read_lock+0x28/0x30
| [<ffffffff8156e209>] netlink_broadcast_filtered+0x49/0x3f0
| [<ffffffff81522621>] ? __kmalloc_reserve.isra.33+0x31/0x90
| [<ffffffff8156e5cd>] netlink_broadcast+0x1d/0x20
| [<ffffffff8147f57a>] cn_netlink_send_mult+0x19a/0x1f0
| [<ffffffff8147f5eb>] cn_netlink_send+0x1b/0x20
| [<ffffffff814801d8>] proc_exit_connector+0xf8/0x140
| [<ffffffff81077f71>] do_exit+0x5d1/0xba0
| [<ffffffff810785cc>] do_group_exit+0x4c/0xc0
| [<ffffffff81078654>] SyS_exit_group+0x14/0x20
| [<ffffffff81640a72>] entry_SYSCALL_64_fastpath+0x1a/0xa4
Since ab8ed951080e ("connector: fix out-of-order cn_proc netlink message
delivery") which is v4.7-rc6.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
245/252 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drivers/block/zram: Replace bit spinlocks with rtmutex for -rt
Date: Thu, 31 Mar 2016 04:08:28 +0200
They're nondeterministic, and lead to ___might_sleep() splats in -rt.
OTOH, they're a lot less wasteful than an rtmutex per page.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
246/252 [
Author: Mike Galbraith
Email: umgwanakikbuti@gmail.com
Subject: drivers/zram: Don't disable preemption in zcomp_stream_get/put()
Date: Thu, 20 Oct 2016 11:15:22 +0200
In v4.7, the driver switched to percpu compression streams, disabling
preemption via get/put_cpu_ptr(). Use a per-zcomp_strm lock here. We
also have to fix an lock order issue in zram_decompress_page() such
that zs_map_object() nests inside of zcomp_stream_put() as it does in
zram_bvec_write().
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bigeasy: get_locked_var() -> per zcomp_strm lock]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
247/252 [
Author: Julia Cartwright
Email: julia@ni.com
Subject: squashfs: make use of local lock in multi_cpu decompressor
Date: Mon, 7 May 2018 08:58:57 -0500
Currently, the squashfs multi_cpu decompressor makes use of
get_cpu_ptr()/put_cpu_ptr(), which unconditionally disable preemption
during decompression.
Because the workload is distributed across CPUs, all CPUs can observe a
very high wakeup latency, which has been seen to be as much as 8000us.
Convert this decompressor to make use of a local lock, which will allow
execution of the decompressor with preemption-enabled, but also ensure
concurrent accesses to the percpu compressor data on the local CPU will
be serialized.
Cc: stable-rt@vger.kernel.org
Reported-by: Alexander Stein <alexander.stein@systec-electronic.com>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
248/252 [
Author: Haris Okanovic
Email: haris.okanovic@ni.com
Subject: tpm_tis: fix stall after iowrite*()s
Date: Tue, 15 Aug 2017 15:13:08 -0500
ioread8() operations to TPM MMIO addresses can stall the cpu when
immediately following a sequence of iowrite*()'s to the same region.
For example, cyclitest measures ~400us latency spikes when a non-RT
usermode application communicates with an SPI-based TPM chip (Intel Atom
E3940 system, PREEMPT_RT kernel). The spikes are caused by a
stalling ioread8() operation following a sequence of 30+ iowrite8()s to
the same address. I believe this happens because the write sequence is
buffered (in cpu or somewhere along the bus), and gets flushed on the
first LOAD instruction (ioread*()) that follows.
The enclosed change appears to fix this issue: read the TPM chip's
access register (status code) after every iowrite*() operation to
amortize the cost of flushing data to chip across multiple instructions.
Signed-off-by: Haris Okanovic <haris.okanovic@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
]
249/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: signals: Allow rt tasks to cache one sigqueue struct
Date: Fri, 3 Jul 2009 08:44:56 -0500
To avoid allocation allow rt tasks to cache one sigqueue struct in
task struct.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
250/252 [
Author: Ingo Molnar
Email: mingo@elte.hu
Subject: genirq: Disable irqpoll on -rt
Date: Fri, 3 Jul 2009 08:29:57 -0500
Creates long latencies for no value
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
251/252 [
Author: Clark Williams
Email: williams@redhat.com
Subject: sysfs: Add /sys/kernel/realtime entry
Date: Sat, 30 Jul 2011 21:55:53 -0500
Add a /sys/kernel entry to indicate that the kernel is a
realtime kernel.
Clark says that he needs this for udev rules, udev needs to evaluate
if its a PREEMPT_RT kernel a few thousand times and parsing uname
output is too slow or so.
Are there better solutions? Should it exist and return 0 on !-rt?
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
]
252/252 [
Author: Thomas Gleixner
Email: tglx@linutronix.de
Subject: Add localversion for -RT release
Date: Fri, 8 Jul 2011 20:25:16 +0200
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
]
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|
|
Add pinctrl and GPIO driver support for Cedar Fork, Denverton,
Gemini Lake SoC, Ice Lake PCH and Lewisburg.
Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com>
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
|