aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/admin-guide
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/admin-guide')
-rw-r--r--Documentation/admin-guide/LSM/LoadPin.rst6
-rw-r--r--Documentation/admin-guide/LSM/SafeSetID.rst29
-rw-r--r--Documentation/admin-guide/LSM/Yama.rst7
-rw-r--r--Documentation/admin-guide/LSM/tomoyo.rst16
-rw-r--r--Documentation/admin-guide/README.rst147
-rw-r--r--Documentation/admin-guide/abi-obsolete.rst11
-rw-r--r--Documentation/admin-guide/abi-removed.rst5
-rw-r--r--Documentation/admin-guide/abi-stable.rst14
-rw-r--r--Documentation/admin-guide/abi-testing.rst20
-rw-r--r--Documentation/admin-guide/abi.rst11
-rw-r--r--Documentation/admin-guide/acpi/cppc_sysfs.rst8
-rw-r--r--Documentation/admin-guide/acpi/dsdt-override.rst13
-rw-r--r--Documentation/admin-guide/acpi/fan_performance_states.rst28
-rw-r--r--Documentation/admin-guide/acpi/index.rst1
-rw-r--r--Documentation/admin-guide/acpi/initrd_table_override.rst2
-rw-r--r--Documentation/admin-guide/acpi/ssdt-overlays.rst51
-rw-r--r--Documentation/admin-guide/auxdisplay/cfag12864b.rst2
-rw-r--r--Documentation/admin-guide/auxdisplay/ks0108.rst2
-rw-r--r--Documentation/admin-guide/bcache.rst36
-rw-r--r--Documentation/admin-guide/binderfs.rst15
-rw-r--r--Documentation/admin-guide/binfmt-misc.rst4
-rw-r--r--Documentation/admin-guide/blockdev/drbd/figures.rst4
-rw-r--r--Documentation/admin-guide/blockdev/drbd/index.rst2
-rw-r--r--Documentation/admin-guide/blockdev/drbd/peer-states-8.dot (renamed from Documentation/admin-guide/blockdev/drbd/node-states-8.dot)5
-rw-r--r--Documentation/admin-guide/blockdev/floppy.rst6
-rw-r--r--Documentation/admin-guide/blockdev/index.rst6
-rw-r--r--Documentation/admin-guide/blockdev/nbd.rst2
-rw-r--r--Documentation/admin-guide/blockdev/paride.rst388
-rw-r--r--Documentation/admin-guide/blockdev/ramdisk.rst66
-rw-r--r--Documentation/admin-guide/blockdev/zram.rst137
-rw-r--r--Documentation/admin-guide/bootconfig.rst129
-rw-r--r--Documentation/admin-guide/bug-bisect.rst2
-rw-r--r--Documentation/admin-guide/bug-hunting.rst53
-rw-r--r--Documentation/admin-guide/cgroup-v1/blkio-controller.rst155
-rw-r--r--Documentation/admin-guide/cgroup-v1/cgroups.rst2
-rw-r--r--Documentation/admin-guide/cgroup-v1/cpusets.rst4
-rw-r--r--Documentation/admin-guide/cgroup-v1/hugetlb.rst4
-rw-r--r--Documentation/admin-guide/cgroup-v1/index.rst1
-rw-r--r--Documentation/admin-guide/cgroup-v1/memcg_test.rst27
-rw-r--r--Documentation/admin-guide/cgroup-v1/memory.rst420
-rw-r--r--Documentation/admin-guide/cgroup-v1/misc.rst4
-rw-r--r--Documentation/admin-guide/cgroup-v1/rdma.rst2
-rw-r--r--Documentation/admin-guide/cgroup-v2.rst857
-rw-r--r--Documentation/admin-guide/cifs/authors.rst6
-rw-r--r--Documentation/admin-guide/cifs/changes.rst5
-rw-r--r--Documentation/admin-guide/cifs/introduction.rst30
-rw-r--r--Documentation/admin-guide/cifs/todo.rst66
-rw-r--r--Documentation/admin-guide/cifs/usage.rst47
-rwxr-xr-xDocumentation/admin-guide/cifs/winucase_convert.pl2
-rw-r--r--Documentation/admin-guide/cpu-load.rst67
-rw-r--r--Documentation/admin-guide/cputopology.rst124
-rw-r--r--Documentation/admin-guide/dell_rbu.rst2
-rw-r--r--Documentation/admin-guide/device-mapper/cache-policies.rst2
-rw-r--r--Documentation/admin-guide/device-mapper/dm-crypt.rst14
-rw-r--r--Documentation/admin-guide/device-mapper/dm-dust.rst32
-rw-r--r--Documentation/admin-guide/device-mapper/dm-ebs.rst51
-rw-r--r--Documentation/admin-guide/device-mapper/dm-flakey.rst14
-rw-r--r--Documentation/admin-guide/device-mapper/dm-ima.rst715
-rw-r--r--Documentation/admin-guide/device-mapper/dm-init.rst8
-rw-r--r--Documentation/admin-guide/device-mapper/dm-integrity.rst86
-rw-r--r--Documentation/admin-guide/device-mapper/dm-raid.rst4
-rw-r--r--Documentation/admin-guide/device-mapper/dm-zoned.rst70
-rw-r--r--Documentation/admin-guide/device-mapper/index.rst2
-rw-r--r--Documentation/admin-guide/device-mapper/unstriped.rst10
-rw-r--r--Documentation/admin-guide/device-mapper/verity.rst17
-rw-r--r--Documentation/admin-guide/device-mapper/writecache.rst45
-rw-r--r--Documentation/admin-guide/devices.rst9
-rw-r--r--Documentation/admin-guide/devices.txt65
-rw-r--r--Documentation/admin-guide/dynamic-debug-howto.rst283
-rw-r--r--Documentation/admin-guide/efi-stub.rst6
-rw-r--r--Documentation/admin-guide/ext4.rst33
-rw-r--r--Documentation/admin-guide/features.rst3
-rw-r--r--Documentation/admin-guide/filesystem-monitoring.rst78
-rw-r--r--Documentation/admin-guide/gpio/gpio-aggregator.rst111
-rw-r--r--Documentation/admin-guide/gpio/gpio-mockup.rst51
-rw-r--r--Documentation/admin-guide/gpio/gpio-sim.rst134
-rw-r--r--Documentation/admin-guide/gpio/index.rst3
-rw-r--r--Documentation/admin-guide/gpio/sysfs.rst2
-rw-r--r--Documentation/admin-guide/hw-vuln/core-scheduling.rst226
-rw-r--r--Documentation/admin-guide/hw-vuln/cross-thread-rsb.rst91
-rw-r--r--Documentation/admin-guide/hw-vuln/gather_data_sampling.rst109
-rw-r--r--Documentation/admin-guide/hw-vuln/index.rst9
-rw-r--r--Documentation/admin-guide/hw-vuln/l1d_flush.rst69
-rw-r--r--Documentation/admin-guide/hw-vuln/l1tf.rst2
-rw-r--r--Documentation/admin-guide/hw-vuln/mds.rst40
-rw-r--r--Documentation/admin-guide/hw-vuln/multihit.rst4
-rw-r--r--Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst271
-rw-r--r--Documentation/admin-guide/hw-vuln/special-register-buffer-data-sampling.rst150
-rw-r--r--Documentation/admin-guide/hw-vuln/spectre.rst144
-rw-r--r--Documentation/admin-guide/hw-vuln/srso.rst160
-rw-r--r--Documentation/admin-guide/hw-vuln/tsx_async_abort.rst37
-rw-r--r--Documentation/admin-guide/hw_random.rst11
-rw-r--r--Documentation/admin-guide/index.rst26
-rw-r--r--Documentation/admin-guide/init.rst76
-rw-r--r--Documentation/admin-guide/initrd.rst2
-rw-r--r--Documentation/admin-guide/iostats.rst6
-rw-r--r--Documentation/admin-guide/kdump/gdbmacros.txt159
-rw-r--r--Documentation/admin-guide/kdump/kdump.rst230
-rw-r--r--Documentation/admin-guide/kdump/vmcoreinfo.rst254
-rw-r--r--Documentation/admin-guide/kernel-parameters.rst67
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt3295
-rw-r--r--Documentation/admin-guide/kernel-per-CPU-kthreads.rst44
-rw-r--r--Documentation/admin-guide/laptops/disk-shock-protection.rst2
-rw-r--r--Documentation/admin-guide/laptops/laptop-mode.rst11
-rw-r--r--Documentation/admin-guide/laptops/lg-laptop.rst6
-rw-r--r--Documentation/admin-guide/laptops/sonypi.rst2
-rw-r--r--Documentation/admin-guide/laptops/thinkpad-acpi.rst115
-rw-r--r--Documentation/admin-guide/lockup-watchdogs.rst4
-rw-r--r--Documentation/admin-guide/md.rst10
-rw-r--r--Documentation/admin-guide/media/au0828-cardlist.rst39
-rw-r--r--Documentation/admin-guide/media/avermedia.rst94
-rw-r--r--Documentation/admin-guide/media/bt8xx.rst157
-rw-r--r--Documentation/admin-guide/media/bttv-cardlist.rst683
-rw-r--r--Documentation/admin-guide/media/bttv.rst1762
-rw-r--r--Documentation/admin-guide/media/building.rst357
-rw-r--r--Documentation/admin-guide/media/cafe_ccic.rst62
-rw-r--r--Documentation/admin-guide/media/cardlist.rst29
-rw-r--r--Documentation/admin-guide/media/cec.rst380
-rw-r--r--Documentation/admin-guide/media/ci.rst77
-rw-r--r--Documentation/admin-guide/media/cx18-cardlist.rst17
-rw-r--r--Documentation/admin-guide/media/cx231xx-cardlist.rst99
-rw-r--r--Documentation/admin-guide/media/cx23885-cardlist.rst267
-rw-r--r--Documentation/admin-guide/media/cx88-cardlist.rst383
-rw-r--r--Documentation/admin-guide/media/cx88.rst58
-rw-r--r--Documentation/admin-guide/media/dvb-drivers.rst15
-rw-r--r--Documentation/admin-guide/media/dvb-usb-a800-cardlist.rst16
-rw-r--r--Documentation/admin-guide/media/dvb-usb-af9005-cardlist.rst20
-rw-r--r--Documentation/admin-guide/media/dvb-usb-af9015-cardlist.rst80
-rw-r--r--Documentation/admin-guide/media/dvb-usb-af9035-cardlist.rst74
-rw-r--r--Documentation/admin-guide/media/dvb-usb-anysee-cardlist.rst16
-rw-r--r--Documentation/admin-guide/media/dvb-usb-au6610-cardlist.rst16
-rw-r--r--Documentation/admin-guide/media/dvb-usb-az6007-cardlist.rst20
-rw-r--r--Documentation/admin-guide/media/dvb-usb-az6027-cardlist.rst24
-rw-r--r--Documentation/admin-guide/media/dvb-usb-ce6230-cardlist.rst18
-rw-r--r--Documentation/admin-guide/media/dvb-usb-cinergyT2-cardlist.rst16
-rw-r--r--Documentation/admin-guide/media/dvb-usb-cxusb-cardlist.rst40
-rw-r--r--Documentation/admin-guide/media/dvb-usb-dib0700-cardlist.rst162
-rw-r--r--Documentation/admin-guide/media/dvb-usb-dibusb-mb-cardlist.rst42
-rw-r--r--Documentation/admin-guide/media/dvb-usb-dibusb-mc-cardlist.rst30
-rw-r--r--Documentation/admin-guide/media/dvb-usb-digitv-cardlist.rst16
-rw-r--r--Documentation/admin-guide/media/dvb-usb-dtt200u-cardlist.rst22
-rw-r--r--Documentation/admin-guide/media/dvb-usb-dtv5100-cardlist.rst16
-rw-r--r--Documentation/admin-guide/media/dvb-usb-dvbsky-cardlist.rst42
-rw-r--r--Documentation/admin-guide/media/dvb-usb-dw2102-cardlist.rst56
-rw-r--r--Documentation/admin-guide/media/dvb-usb-ec168-cardlist.rst16
-rw-r--r--Documentation/admin-guide/media/dvb-usb-gl861-cardlist.rst20
-rw-r--r--Documentation/admin-guide/media/dvb-usb-gp8psk-cardlist.rst22
-rw-r--r--Documentation/admin-guide/media/dvb-usb-lmedm04-cardlist.rst20
-rw-r--r--Documentation/admin-guide/media/dvb-usb-m920x-cardlist.rst26
-rw-r--r--Documentation/admin-guide/media/dvb-usb-mxl111sf-cardlist.rst36
-rw-r--r--Documentation/admin-guide/media/dvb-usb-nova-t-usb2-cardlist.rst16
-rw-r--r--Documentation/admin-guide/media/dvb-usb-opera1-cardlist.rst16
-rw-r--r--Documentation/admin-guide/media/dvb-usb-pctv452e-cardlist.rst20
-rw-r--r--Documentation/admin-guide/media/dvb-usb-rtl28xxu-cardlist.rst80
-rw-r--r--Documentation/admin-guide/media/dvb-usb-technisat-usb2-cardlist.rst16
-rw-r--r--Documentation/admin-guide/media/dvb-usb-ttusb2-cardlist.rst24
-rw-r--r--Documentation/admin-guide/media/dvb-usb-umt-010-cardlist.rst16
-rw-r--r--Documentation/admin-guide/media/dvb-usb-vp702x-cardlist.rst16
-rw-r--r--Documentation/admin-guide/media/dvb-usb-vp7045-cardlist.rst18
-rw-r--r--Documentation/admin-guide/media/dvb-usb-zd1301-cardlist.rst16
-rw-r--r--Documentation/admin-guide/media/dvb.rst12
-rw-r--r--Documentation/admin-guide/media/dvb_intro.rst616
-rw-r--r--Documentation/admin-guide/media/dvb_references.rst29
-rw-r--r--Documentation/admin-guide/media/em28xx-cardlist.rst440
-rw-r--r--Documentation/admin-guide/media/faq.rst216
-rw-r--r--Documentation/admin-guide/media/fimc.rst153
-rw-r--r--Documentation/admin-guide/media/frontend-cardlist.rst226
-rw-r--r--Documentation/admin-guide/media/gspca-cardlist.rst451
-rw-r--r--Documentation/admin-guide/media/i2c-cardlist.rst288
-rw-r--r--Documentation/admin-guide/media/imx.rst714
-rw-r--r--Documentation/admin-guide/media/imx6q-sabreauto.dot51
-rw-r--r--Documentation/admin-guide/media/imx6q-sabresd.dot56
-rw-r--r--Documentation/admin-guide/media/imx7.rst221
-rw-r--r--Documentation/admin-guide/media/index.rst56
-rw-r--r--Documentation/admin-guide/media/intro.rst27
-rw-r--r--Documentation/admin-guide/media/ipu3.rst600
-rw-r--r--Documentation/admin-guide/media/ipu3_rcb.svg331
-rw-r--r--Documentation/admin-guide/media/ivtv-cardlist.rst139
-rw-r--r--Documentation/admin-guide/media/ivtv.rst218
-rw-r--r--Documentation/admin-guide/media/lmedm04.rst107
-rw-r--r--Documentation/admin-guide/media/mgb4.rst374
-rw-r--r--Documentation/admin-guide/media/misc-cardlist.rst28
-rw-r--r--Documentation/admin-guide/media/omap3isp.rst92
-rw-r--r--Documentation/admin-guide/media/omap4_camera.rst62
-rw-r--r--Documentation/admin-guide/media/opera-firmware.rst33
-rw-r--r--Documentation/admin-guide/media/other-usb-cardlist.rst78
-rw-r--r--Documentation/admin-guide/media/pci-cardlist.rst109
-rw-r--r--Documentation/admin-guide/media/philips.rst247
-rw-r--r--Documentation/admin-guide/media/platform-cardlist.rst89
-rw-r--r--Documentation/admin-guide/media/qcom_camss.rst185
-rw-r--r--Documentation/admin-guide/media/qcom_camss_8x96_graph.dot106
-rw-r--r--Documentation/admin-guide/media/qcom_camss_graph.dot43
-rw-r--r--Documentation/admin-guide/media/radio-cardlist.rst44
-rw-r--r--Documentation/admin-guide/media/rcar-fdp1.rst39
-rw-r--r--Documentation/admin-guide/media/remote-controller.rst76
-rw-r--r--Documentation/admin-guide/media/rkisp1.dot18
-rw-r--r--Documentation/admin-guide/media/rkisp1.rst197
-rw-r--r--Documentation/admin-guide/media/saa7134-cardlist.rst803
-rw-r--r--Documentation/admin-guide/media/saa7134.rst89
-rw-r--r--Documentation/admin-guide/media/saa7164-cardlist.rst71
-rw-r--r--Documentation/admin-guide/media/si470x.rst167
-rw-r--r--Documentation/admin-guide/media/si4713.rst192
-rw-r--r--Documentation/admin-guide/media/si476x.rst160
-rw-r--r--Documentation/admin-guide/media/siano-cardlist.rst56
-rw-r--r--Documentation/admin-guide/media/starfive_camss.rst72
-rw-r--r--Documentation/admin-guide/media/starfive_camss_graph.dot12
-rw-r--r--Documentation/admin-guide/media/technisat.rst100
-rw-r--r--Documentation/admin-guide/media/ttusb-dec.rst45
-rw-r--r--Documentation/admin-guide/media/tuner-cardlist.rst100
-rw-r--r--Documentation/admin-guide/media/usb-cardlist.rst149
-rw-r--r--Documentation/admin-guide/media/v4l-drivers.rst34
-rw-r--r--Documentation/admin-guide/media/vimc.dot26
-rw-r--r--Documentation/admin-guide/media/vimc.rst110
-rw-r--r--Documentation/admin-guide/media/visl.rst177
-rw-r--r--Documentation/admin-guide/media/vivid.rst1416
-rw-r--r--Documentation/admin-guide/media/zoran-cardlist.rst51
-rw-r--r--Documentation/admin-guide/mm/cma_debugfs.rst10
-rw-r--r--Documentation/admin-guide/mm/concepts.rst17
-rw-r--r--Documentation/admin-guide/mm/damon/index.rst17
-rw-r--r--Documentation/admin-guide/mm/damon/lru_sort.rst294
-rw-r--r--Documentation/admin-guide/mm/damon/reclaim.rst274
-rw-r--r--Documentation/admin-guide/mm/damon/start.rst127
-rw-r--r--Documentation/admin-guide/mm/damon/usage.rst911
-rw-r--r--Documentation/admin-guide/mm/hugetlbpage.rst109
-rw-r--r--Documentation/admin-guide/mm/idle_page_tracking.rst9
-rw-r--r--Documentation/admin-guide/mm/index.rst13
-rw-r--r--Documentation/admin-guide/mm/ksm.rst144
-rw-r--r--Documentation/admin-guide/mm/memory-hotplug.rst909
-rw-r--r--Documentation/admin-guide/mm/multigen_lru.rst162
-rw-r--r--Documentation/admin-guide/mm/nommu-mmap.rst283
-rw-r--r--Documentation/admin-guide/mm/numa_memory_policy.rst47
-rw-r--r--Documentation/admin-guide/mm/numaperf.rst22
-rw-r--r--Documentation/admin-guide/mm/pagemap.rst186
-rw-r--r--Documentation/admin-guide/mm/shrinker_debugfs.rst133
-rw-r--r--Documentation/admin-guide/mm/soft-dirty.rst2
-rw-r--r--Documentation/admin-guide/mm/swap_numa.rst78
-rw-r--r--Documentation/admin-guide/mm/transhuge.rst142
-rw-r--r--Documentation/admin-guide/mm/userfaultfd.rst382
-rw-r--r--Documentation/admin-guide/mm/zswap.rst178
-rw-r--r--Documentation/admin-guide/module-signing.rst21
-rw-r--r--Documentation/admin-guide/mono.rst4
-rw-r--r--Documentation/admin-guide/nfs/fault_injection.rst70
-rw-r--r--Documentation/admin-guide/nfs/index.rst1
-rw-r--r--Documentation/admin-guide/nfs/nfs-client.rst19
-rw-r--r--Documentation/admin-guide/nfs/nfs-rdma.rst2
-rw-r--r--Documentation/admin-guide/nfs/nfsroot.rst8
-rw-r--r--Documentation/admin-guide/nfs/pnfs-block-server.rst2
-rw-r--r--Documentation/admin-guide/nfs/pnfs-scsi-server.rst2
-rw-r--r--Documentation/admin-guide/numastat.rst31
-rw-r--r--Documentation/admin-guide/perf-security.rst155
-rw-r--r--Documentation/admin-guide/perf/alibaba_pmu.rst105
-rw-r--r--Documentation/admin-guide/perf/ampere_cspmu.rst29
-rw-r--r--Documentation/admin-guide/perf/arm-ccn.rst2
-rw-r--r--Documentation/admin-guide/perf/arm-cmn.rst65
-rw-r--r--Documentation/admin-guide/perf/cxl.rst68
-rw-r--r--Documentation/admin-guide/perf/dwc_pcie_pmu.rst94
-rw-r--r--Documentation/admin-guide/perf/hisi-pcie-pmu.rst130
-rw-r--r--Documentation/admin-guide/perf/hisi-pmu.rst66
-rw-r--r--Documentation/admin-guide/perf/hns3-pmu.rst136
-rw-r--r--Documentation/admin-guide/perf/imx-ddr.rst47
-rw-r--r--Documentation/admin-guide/perf/index.rst9
-rw-r--r--Documentation/admin-guide/perf/meson-ddr-pmu.rst70
-rw-r--r--Documentation/admin-guide/perf/nvidia-pmu.rst299
-rw-r--r--Documentation/admin-guide/pm/amd-pstate.rst720
-rw-r--r--Documentation/admin-guide/pm/cpufreq.rst17
-rw-r--r--Documentation/admin-guide/pm/cpuidle.rst127
-rw-r--r--Documentation/admin-guide/pm/intel-speed-select.rst939
-rw-r--r--Documentation/admin-guide/pm/intel_idle.rst33
-rw-r--r--Documentation/admin-guide/pm/intel_pstate.rst139
-rw-r--r--Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst115
-rw-r--r--Documentation/admin-guide/pm/working-state.rst3
-rw-r--r--Documentation/admin-guide/pmf.rst24
-rw-r--r--Documentation/admin-guide/pnp.rst4
-rw-r--r--Documentation/admin-guide/pstore-blk.rst234
-rw-r--r--Documentation/admin-guide/quickly-build-trimmed-linux.rst1097
-rw-r--r--Documentation/admin-guide/ramoops.rst22
-rw-r--r--Documentation/admin-guide/ras.rst30
-rw-r--r--Documentation/admin-guide/reporting-bugs.rst182
-rw-r--r--Documentation/admin-guide/reporting-issues.rst1764
-rw-r--r--Documentation/admin-guide/reporting-regressions.rst451
-rw-r--r--Documentation/admin-guide/security-bugs.rst89
-rw-r--r--Documentation/admin-guide/serial-console.rst38
-rw-r--r--Documentation/admin-guide/spkguide.txt1625
-rw-r--r--Documentation/admin-guide/svga.rst7
-rw-r--r--Documentation/admin-guide/syscall-user-dispatch.rst94
-rw-r--r--Documentation/admin-guide/sysctl/abi.rst73
-rw-r--r--Documentation/admin-guide/sysctl/fs.rst254
-rw-r--r--Documentation/admin-guide/sysctl/kernel.rst576
-rw-r--r--Documentation/admin-guide/sysctl/net.rst114
-rw-r--r--Documentation/admin-guide/sysctl/vm.rst183
-rw-r--r--Documentation/admin-guide/sysrq.rst41
-rw-r--r--Documentation/admin-guide/tainted-kernels.rst36
-rw-r--r--Documentation/admin-guide/thermal/index.rst8
-rw-r--r--Documentation/admin-guide/thermal/intel_powerclamp.rst345
-rw-r--r--Documentation/admin-guide/thunderbolt.rst63
-rw-r--r--Documentation/admin-guide/unicode.rst13
-rw-r--r--Documentation/admin-guide/wimax/i2400m.rst283
-rw-r--r--Documentation/admin-guide/wimax/index.rst19
-rw-r--r--Documentation/admin-guide/wimax/wimax.rst89
-rw-r--r--Documentation/admin-guide/workload-tracing.rst606
-rw-r--r--Documentation/admin-guide/xfs.rst91
301 files changed, 38353 insertions, 4796 deletions
diff --git a/Documentation/admin-guide/LSM/LoadPin.rst b/Documentation/admin-guide/LSM/LoadPin.rst
index 716ad9b23c9a..dd3ca68b5df1 100644
--- a/Documentation/admin-guide/LSM/LoadPin.rst
+++ b/Documentation/admin-guide/LSM/LoadPin.rst
@@ -11,8 +11,8 @@ restrictions without needing to sign the files individually.
The LSM is selectable at build-time with ``CONFIG_SECURITY_LOADPIN``, and
can be controlled at boot-time with the kernel command line option
-"``loadpin.enabled``". By default, it is enabled, but can be disabled at
-boot ("``loadpin.enabled=0``").
+"``loadpin.enforce``". By default, it is enabled, but can be disabled at
+boot ("``loadpin.enforce=0``").
LoadPin starts pinning when it sees the first file loaded. If the
block device backing the filesystem is not read-only, a sysctl is
@@ -28,4 +28,4 @@ different mechanisms such as ``CONFIG_MODULE_SIG`` and
``CONFIG_KEXEC_VERIFY_SIG`` to verify kernel module and kernel image while
still use LoadPin to protect the integrity of other files kernel loads. The
full list of valid file types can be found in ``kernel_read_file_str``
-defined in ``include/linux/fs.h``.
+defined in ``include/linux/kernel_read_file.h``.
diff --git a/Documentation/admin-guide/LSM/SafeSetID.rst b/Documentation/admin-guide/LSM/SafeSetID.rst
index 7bff07ce4fdd..0ec34863c674 100644
--- a/Documentation/admin-guide/LSM/SafeSetID.rst
+++ b/Documentation/admin-guide/LSM/SafeSetID.rst
@@ -3,9 +3,9 @@ SafeSetID
=========
SafeSetID is an LSM module that gates the setid family of syscalls to restrict
UID/GID transitions from a given UID/GID to only those approved by a
-system-wide whitelist. These restrictions also prohibit the given UIDs/GIDs
+system-wide allowlist. These restrictions also prohibit the given UIDs/GIDs
from obtaining auxiliary privileges associated with CAP_SET{U/G}ID, such as
-allowing a user to set up user namespace UID mappings.
+allowing a user to set up user namespace UID/GID mappings.
Background
@@ -98,10 +98,21 @@ Directions for use
==================
This LSM hooks the setid syscalls to make sure transitions are allowed if an
applicable restriction policy is in place. Policies are configured through
-securityfs by writing to the safesetid/add_whitelist_policy and
-safesetid/flush_whitelist_policies files at the location where securityfs is
-mounted. The format for adding a policy is '<UID>:<UID>', using literal
-numbers, such as '123:456'. To flush the policies, any write to the file is
-sufficient. Again, configuring a policy for a UID will prevent that UID from
-obtaining auxiliary setid privileges, such as allowing a user to set up user
-namespace UID mappings.
+securityfs by writing to the safesetid/uid_allowlist_policy and
+safesetid/gid_allowlist_policy files at the location where securityfs is
+mounted. The format for adding a policy is '<UID>:<UID>' or '<GID>:<GID>',
+using literal numbers, and ending with a newline character such as '123:456\n'.
+Writing an empty string "" will flush the policy. Again, configuring a policy
+for a UID/GID will prevent that UID/GID from obtaining auxiliary setid
+privileges, such as allowing a user to set up user namespace UID/GID mappings.
+
+Note on GID policies and setgroups()
+====================================
+In v5.9 we are adding support for limiting CAP_SETGID privileges as was done
+previously for CAP_SETUID. However, for compatibility with common sandboxing
+related code conventions in userspace, we currently allow arbitrary
+setgroups() calls for processes with CAP_SETGID restrictions. Until we add
+support in a future release for restricting setgroups() calls, these GID
+policies add no meaningful security. setgroups() restrictions will be enforced
+once we have the policy checking code in place, which will rely on GID policy
+configuration code added in v5.9.
diff --git a/Documentation/admin-guide/LSM/Yama.rst b/Documentation/admin-guide/LSM/Yama.rst
index d0a060de3973..d9cd937ebd2d 100644
--- a/Documentation/admin-guide/LSM/Yama.rst
+++ b/Documentation/admin-guide/LSM/Yama.rst
@@ -19,9 +19,10 @@ attach to other running processes (e.g. Firefox, SSH sessions, GPG agent,
etc) to extract additional credentials and continue to expand the scope
of their attack without resorting to user-assisted phishing.
-This is not a theoretical problem. SSH session hijacking
-(http://www.storm.net.nz/projects/7) and arbitrary code injection
-(http://c-skills.blogspot.com/2007/05/injectso.html) attacks already
+This is not a theoretical problem. `SSH session hijacking
+<https://www.blackhat.com/presentations/bh-usa-05/bh-us-05-boileau.pdf>`_
+and `arbitrary code injection
+<https://c-skills.blogspot.com/2007/05/injectso.html>`_ attacks already
exist and remain possible if ptrace is allowed to operate as before.
Since ptrace is not commonly used by non-developers and non-admins, system
builders should be allowed the option to disable this debugging system.
diff --git a/Documentation/admin-guide/LSM/tomoyo.rst b/Documentation/admin-guide/LSM/tomoyo.rst
index e2d6b6e15082..4bc9c2b4da6f 100644
--- a/Documentation/admin-guide/LSM/tomoyo.rst
+++ b/Documentation/admin-guide/LSM/tomoyo.rst
@@ -27,29 +27,29 @@ Where is documentation?
=======================
User <-> Kernel interface documentation is available at
-http://tomoyo.osdn.jp/2.5/policy-specification/index.html .
+https://tomoyo.osdn.jp/2.5/policy-specification/index.html .
Materials we prepared for seminars and symposiums are available at
-http://osdn.jp/projects/tomoyo/docs/?category_id=532&language_id=1 .
+https://osdn.jp/projects/tomoyo/docs/?category_id=532&language_id=1 .
Below lists are chosen from three aspects.
What is TOMOYO?
TOMOYO Linux Overview
- http://osdn.jp/projects/tomoyo/docs/lca2009-takeda.pdf
+ https://osdn.jp/projects/tomoyo/docs/lca2009-takeda.pdf
TOMOYO Linux: pragmatic and manageable security for Linux
- http://osdn.jp/projects/tomoyo/docs/freedomhectaipei-tomoyo.pdf
+ https://osdn.jp/projects/tomoyo/docs/freedomhectaipei-tomoyo.pdf
TOMOYO Linux: A Practical Method to Understand and Protect Your Own Linux Box
- http://osdn.jp/projects/tomoyo/docs/PacSec2007-en-no-demo.pdf
+ https://osdn.jp/projects/tomoyo/docs/PacSec2007-en-no-demo.pdf
What can TOMOYO do?
Deep inside TOMOYO Linux
- http://osdn.jp/projects/tomoyo/docs/lca2009-kumaneko.pdf
+ https://osdn.jp/projects/tomoyo/docs/lca2009-kumaneko.pdf
The role of "pathname based access control" in security.
- http://osdn.jp/projects/tomoyo/docs/lfj2008-bof.pdf
+ https://osdn.jp/projects/tomoyo/docs/lfj2008-bof.pdf
History of TOMOYO?
Realities of Mainlining
- http://osdn.jp/projects/tomoyo/docs/lfj2008.pdf
+ https://osdn.jp/projects/tomoyo/docs/lfj2008.pdf
What is future plan?
====================
diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst
index cc6151fc0845..9a969c0157f1 100644
--- a/Documentation/admin-guide/README.rst
+++ b/Documentation/admin-guide/README.rst
@@ -1,9 +1,9 @@
.. _readme:
-Linux kernel release 5.x <http://kernel.org/>
+Linux kernel release 6.x <http://kernel.org/>
=============================================
-These are the release notes for Linux version 5. Read them carefully,
+These are the release notes for Linux version 6. Read them carefully,
as they tell you what this is all about, explain how to install the
kernel, and what to do if something goes wrong.
@@ -63,7 +63,7 @@ Installing the kernel source
directory where you have permissions (e.g. your home directory) and
unpack it::
- xz -cd linux-5.x.tar.xz | tar xvf -
+ xz -cd linux-6.x.tar.xz | tar xvf -
Replace "X" with the version number of the latest kernel.
@@ -72,12 +72,12 @@ Installing the kernel source
files. They should match the library, and not get messed up by
whatever the kernel-du-jour happens to be.
- - You can also upgrade between 5.x releases by patching. Patches are
+ - You can also upgrade between 6.x releases by patching. Patches are
distributed in the xz format. To install by patching, get all the
newer patch files, enter the top level directory of the kernel source
- (linux-5.x) and execute::
+ (linux-6.x) and execute::
- xz -cd ../patch-5.x.xz | patch -p1
+ xz -cd ../patch-6.x.xz | patch -p1
Replace "x" for all versions bigger than the version "x" of your current
source tree, **in_order**, and you should be ok. You may want to remove
@@ -85,13 +85,13 @@ Installing the kernel source
that there are no failed patches (some-file-name# or some-file-name.rej).
If there are, either you or I have made a mistake.
- Unlike patches for the 5.x kernels, patches for the 5.x.y kernels
+ Unlike patches for the 6.x kernels, patches for the 6.x.y kernels
(also known as the -stable kernels) are not incremental but instead apply
- directly to the base 5.x kernel. For example, if your base kernel is 5.0
- and you want to apply the 5.0.3 patch, you must not first apply the 5.0.1
- and 5.0.2 patches. Similarly, if you are running kernel version 5.0.2 and
- want to jump to 5.0.3, you must first reverse the 5.0.2 patch (that is,
- patch -R) **before** applying the 5.0.3 patch. You can read more on this in
+ directly to the base 6.x kernel. For example, if your base kernel is 6.0
+ and you want to apply the 6.0.3 patch, you must not first apply the 6.0.1
+ and 6.0.2 patches. Similarly, if you are running kernel version 6.0.2 and
+ want to jump to 6.0.3, you must first reverse the 6.0.2 patch (that is,
+ patch -R) **before** applying the 6.0.3 patch. You can read more on this in
:ref:`Documentation/process/applying-patches.rst <applying_patches>`.
Alternatively, the script patch-kernel can be used to automate this
@@ -114,7 +114,7 @@ Installing the kernel source
Software requirements
---------------------
- Compiling and running the 5.x kernels requires up-to-date
+ Compiling and running the 6.x kernels requires up-to-date
versions of various software packages. Consult
:ref:`Documentation/process/changes.rst <changes>` for the minimum version numbers
required and how to get updates for these packages. Beware that using
@@ -132,12 +132,12 @@ Build directory for the kernel
place for the output files (including .config).
Example::
- kernel source code: /usr/src/linux-5.x
+ kernel source code: /usr/src/linux-6.x
build directory: /home/name/build/kernel
To configure and build the kernel, use::
- cd /usr/src/linux-5.x
+ cd /usr/src/linux-6.x
make O=/home/name/build/kernel menuconfig
make O=/home/name/build/kernel
sudo make O=/home/name/build/kernel modules_install install
@@ -209,20 +209,28 @@ Configuring the kernel
store the lsmod of that machine into a file
and pass it in as a LSMOD parameter.
+ Also, you can preserve modules in certain folders
+ or kconfig files by specifying their paths in
+ parameter LMC_KEEP.
+
target$ lsmod > /tmp/mylsmod
target$ scp /tmp/mylsmod host:/tmp
- host$ make LSMOD=/tmp/mylsmod localmodconfig
+ host$ make LSMOD=/tmp/mylsmod \
+ LMC_KEEP="drivers/usb:drivers/gpu:fs" \
+ localmodconfig
The above also works when cross compiling.
"make localyesconfig" Similar to localmodconfig, except it will convert
- all module options to built in (=y) options.
+ all module options to built in (=y) options. You can
+ also preserve modules by LMC_KEEP.
- "make kvmconfig" Enable additional options for kvm guest kernel support.
+ "make kvm_guest.config" Enable additional options for kvm guest kernel
+ support.
- "make xenconfig" Enable additional options for xen dom0 guest kernel
- support.
+ "make xen.config" Enable additional options for xen dom0 guest kernel
+ support.
"make tinyconfig" Configure the tiniest possible kernel.
@@ -251,11 +259,9 @@ Configuring the kernel
Compiling the kernel
--------------------
- - Make sure you have at least gcc 4.6 available.
+ - Make sure you have at least gcc 5.1 available.
For more information, refer to :ref:`Documentation/process/changes.rst <changes>`.
- Please note that you can still run a.out user programs with this kernel.
-
- Do a ``make`` to create a compressed kernel image. It is also
possible to do ``make install`` if you have lilo installed to suit the
kernel makefiles, but you may want to check your particular lilo setup first.
@@ -315,94 +321,19 @@ Compiling the kernel
reboot, and enjoy!
If you ever need to change the default root device, video mode,
- ramdisk size, etc. in the kernel image, use the ``rdev`` program (or
- alternatively the LILO boot options when appropriate). No need to
- recompile the kernel to change these parameters.
+ etc. in the kernel image, use your bootloader's boot options
+ where appropriate. No need to recompile the kernel to change
+ these parameters.
- Reboot with the new kernel and enjoy.
If something goes wrong
-----------------------
- - If you have problems that seem to be due to kernel bugs, please check
- the file MAINTAINERS to see if there is a particular person associated
- with the part of the kernel that you are having trouble with. If there
- isn't anyone listed there, then the second best thing is to mail
- them to me (torvalds@linux-foundation.org), and possibly to any other
- relevant mailing-list or to the newsgroup.
-
- - In all bug-reports, *please* tell what kernel you are talking about,
- how to duplicate the problem, and what your setup is (use your common
- sense). If the problem is new, tell me so, and if the problem is
- old, please try to tell me when you first noticed it.
-
- - If the bug results in a message like::
-
- unable to handle kernel paging request at address C0000010
- Oops: 0002
- EIP: 0010:XXXXXXXX
- eax: xxxxxxxx ebx: xxxxxxxx ecx: xxxxxxxx edx: xxxxxxxx
- esi: xxxxxxxx edi: xxxxxxxx ebp: xxxxxxxx
- ds: xxxx es: xxxx fs: xxxx gs: xxxx
- Pid: xx, process nr: xx
- xx xx xx xx xx xx xx xx xx xx
-
- or similar kernel debugging information on your screen or in your
- system log, please duplicate it *exactly*. The dump may look
- incomprehensible to you, but it does contain information that may
- help debugging the problem. The text above the dump is also
- important: it tells something about why the kernel dumped code (in
- the above example, it's due to a bad kernel pointer). More information
- on making sense of the dump is in Documentation/admin-guide/bug-hunting.rst
-
- - If you compiled the kernel with CONFIG_KALLSYMS you can send the dump
- as is, otherwise you will have to use the ``ksymoops`` program to make
- sense of the dump (but compiling with CONFIG_KALLSYMS is usually preferred).
- This utility can be downloaded from
- https://www.kernel.org/pub/linux/utils/kernel/ksymoops/ .
- Alternatively, you can do the dump lookup by hand:
-
- - In debugging dumps like the above, it helps enormously if you can
- look up what the EIP value means. The hex value as such doesn't help
- me or anybody else very much: it will depend on your particular
- kernel setup. What you should do is take the hex value from the EIP
- line (ignore the ``0010:``), and look it up in the kernel namelist to
- see which kernel function contains the offending address.
-
- To find out the kernel function name, you'll need to find the system
- binary associated with the kernel that exhibited the symptom. This is
- the file 'linux/vmlinux'. To extract the namelist and match it against
- the EIP from the kernel crash, do::
-
- nm vmlinux | sort | less
-
- This will give you a list of kernel addresses sorted in ascending
- order, from which it is simple to find the function that contains the
- offending address. Note that the address given by the kernel
- debugging messages will not necessarily match exactly with the
- function addresses (in fact, that is very unlikely), so you can't
- just 'grep' the list: the list will, however, give you the starting
- point of each kernel function, so by looking for the function that
- has a starting address lower than the one you are searching for but
- is followed by a function with a higher address you will find the one
- you want. In fact, it may be a good idea to include a bit of
- "context" in your problem report, giving a few lines around the
- interesting one.
-
- If you for some reason cannot do the above (you have a pre-compiled
- kernel image or similar), telling me as much about your setup as
- possible will help. Please read the :ref:`admin-guide/reporting-bugs.rst <reportingbugs>`
- document for details.
-
- - Alternatively, you can use gdb on a running kernel. (read-only; i.e. you
- cannot change values or set break points.) To do this, first compile the
- kernel with -g; edit arch/x86/Makefile appropriately, then do a ``make
- clean``. You'll also need to enable CONFIG_PROC_FS (via ``make config``).
-
- After you've rebooted with the new kernel, do ``gdb vmlinux /proc/kcore``.
- You can now use all the usual gdb commands. The command to look up the
- point where your system crashed is ``l *0xXXXXXXXX``. (Replace the XXXes
- with the EIP value.)
-
- gdb'ing a non-running kernel currently fails because ``gdb`` (wrongly)
- disregards the starting offset for which the kernel is compiled.
+If you have problems that seem to be due to kernel bugs, please follow the
+instructions at 'Documentation/admin-guide/reporting-issues.rst'.
+
+Hints on understanding kernel bug reports are in
+'Documentation/admin-guide/bug-hunting.rst'. More on debugging the kernel
+with gdb is in 'Documentation/dev-tools/gdb-kernel-debugging.rst' and
+'Documentation/dev-tools/kgdb.rst'.
diff --git a/Documentation/admin-guide/abi-obsolete.rst b/Documentation/admin-guide/abi-obsolete.rst
new file mode 100644
index 000000000000..594e697aa1b2
--- /dev/null
+++ b/Documentation/admin-guide/abi-obsolete.rst
@@ -0,0 +1,11 @@
+ABI obsolete symbols
+====================
+
+Documents interfaces that are still remaining in the kernel, but are
+marked to be removed at some later point in time.
+
+The description of the interface will document the reason why it is
+obsolete and when it can be expected to be removed.
+
+.. kernel-abi:: ABI/obsolete
+ :rst:
diff --git a/Documentation/admin-guide/abi-removed.rst b/Documentation/admin-guide/abi-removed.rst
new file mode 100644
index 000000000000..f9e000c81828
--- /dev/null
+++ b/Documentation/admin-guide/abi-removed.rst
@@ -0,0 +1,5 @@
+ABI removed symbols
+===================
+
+.. kernel-abi:: ABI/removed
+ :rst:
diff --git a/Documentation/admin-guide/abi-stable.rst b/Documentation/admin-guide/abi-stable.rst
new file mode 100644
index 000000000000..fc3361d847b1
--- /dev/null
+++ b/Documentation/admin-guide/abi-stable.rst
@@ -0,0 +1,14 @@
+ABI stable symbols
+==================
+
+Documents the interfaces that the developer has defined to be stable.
+
+Userspace programs are free to use these interfaces with no
+restrictions, and backward compatibility for them will be guaranteed
+for at least 2 years.
+
+Most interfaces (like syscalls) are expected to never change and always
+be available.
+
+.. kernel-abi:: ABI/stable
+ :rst:
diff --git a/Documentation/admin-guide/abi-testing.rst b/Documentation/admin-guide/abi-testing.rst
new file mode 100644
index 000000000000..19767926b344
--- /dev/null
+++ b/Documentation/admin-guide/abi-testing.rst
@@ -0,0 +1,20 @@
+ABI testing symbols
+===================
+
+Documents interfaces that are felt to be stable,
+as the main development of this interface has been completed.
+
+The interface can be changed to add new features, but the
+current interface will not break by doing this, unless grave
+errors or security problems are found in them.
+
+Userspace programs can start to rely on these interfaces, but they must
+be aware of changes that can occur before these interfaces move to
+be marked stable.
+
+Programs that use these interfaces are strongly encouraged to add their
+name to the description of these interfaces, so that the kernel
+developers can easily notify them if any changes occur.
+
+.. kernel-abi:: ABI/testing
+ :rst:
diff --git a/Documentation/admin-guide/abi.rst b/Documentation/admin-guide/abi.rst
new file mode 100644
index 000000000000..bcab3ef2597c
--- /dev/null
+++ b/Documentation/admin-guide/abi.rst
@@ -0,0 +1,11 @@
+=====================
+Linux ABI description
+=====================
+
+.. toctree::
+ :maxdepth: 2
+
+ abi-stable
+ abi-testing
+ abi-obsolete
+ abi-removed
diff --git a/Documentation/admin-guide/acpi/cppc_sysfs.rst b/Documentation/admin-guide/acpi/cppc_sysfs.rst
index a4b99afbe331..36981c667823 100644
--- a/Documentation/admin-guide/acpi/cppc_sysfs.rst
+++ b/Documentation/admin-guide/acpi/cppc_sysfs.rst
@@ -4,11 +4,13 @@
Collaborative Processor Performance Control (CPPC)
==================================================
+.. _cppc_sysfs:
+
CPPC
====
CPPC defined in the ACPI spec describes a mechanism for the OS to manage the
-performance of a logical processor on a contigious and abstract performance
+performance of a logical processor on a contiguous and abstract performance
scale. CPPC exposes a set of registers to describe abstract performance scale,
to request performance levels and to measure per-cpu delivered performance.
@@ -45,7 +47,7 @@ for each cpu X::
* lowest_freq : CPU frequency corresponding to lowest_perf (in MHz).
* nominal_freq : CPU frequency corresponding to nominal_perf (in MHz).
The above frequencies should only be used to report processor performance in
- freqency instead of abstract scale. These values should not be used for any
+ frequency instead of abstract scale. These values should not be used for any
functional decisions.
* feedback_ctrs : Includes both Reference and delivered performance counter.
@@ -73,4 +75,4 @@ taking two different snapshots of feedback counters at time T1 and T2.
delivered_counter_delta = fbc_t2[del] - fbc_t1[del]
reference_counter_delta = fbc_t2[ref] - fbc_t1[ref]
- delivered_perf = (refernce_perf x delivered_counter_delta) / reference_counter_delta
+ delivered_perf = (reference_perf x delivered_counter_delta) / reference_counter_delta
diff --git a/Documentation/admin-guide/acpi/dsdt-override.rst b/Documentation/admin-guide/acpi/dsdt-override.rst
deleted file mode 100644
index 50bd7f194bf4..000000000000
--- a/Documentation/admin-guide/acpi/dsdt-override.rst
+++ /dev/null
@@ -1,13 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-===============
-Overriding DSDT
-===============
-
-Linux supports a method of overriding the BIOS DSDT:
-
-CONFIG_ACPI_CUSTOM_DSDT - builds the image into the kernel.
-
-When to use this method is described in detail on the
-Linux/ACPI home page:
-https://01.org/linux-acpi/documentation/overriding-dsdt
diff --git a/Documentation/admin-guide/acpi/fan_performance_states.rst b/Documentation/admin-guide/acpi/fan_performance_states.rst
index 98fe5c333121..b9e4b4d146c1 100644
--- a/Documentation/admin-guide/acpi/fan_performance_states.rst
+++ b/Documentation/admin-guide/acpi/fan_performance_states.rst
@@ -60,3 +60,31 @@ For example::
When a given field is not populated or its value provided by the platform
firmware is invalid, the "not-defined" string is shown instead of the value.
+
+ACPI Fan Fine Grain Control
+=============================
+
+When _FIF object specifies support for fine grain control, then fan speed
+can be set from 0 to 100% with the recommended minimum "step size" via
+_FSL object. User can adjust fan speed using thermal sysfs cooling device.
+
+Here use can look at fan performance states for a reference speed (speed_rpm)
+and set it by changing cooling device cur_state. If the fine grain control
+is supported then user can also adjust to some other speeds which are
+not defined in the performance states.
+
+The support of fine grain control is presented via sysfs attribute
+"fine_grain_control". If fine grain control is present, this attribute
+will show "1" otherwise "0".
+
+This sysfs attribute is presented in the same directory as performance states.
+
+ACPI Fan Performance Feedback
+=============================
+
+The optional _FST object provides status information for the fan device.
+This includes field to provide current fan speed in revolutions per minute
+at which the fan is rotating.
+
+This speed is presented in the sysfs using the attribute "fan_speed_rpm",
+in the same directory as performance states.
diff --git a/Documentation/admin-guide/acpi/index.rst b/Documentation/admin-guide/acpi/index.rst
index 71277689ad97..b078fdb8f4c9 100644
--- a/Documentation/admin-guide/acpi/index.rst
+++ b/Documentation/admin-guide/acpi/index.rst
@@ -9,7 +9,6 @@ the Linux ACPI support.
:maxdepth: 1
initrd_table_override
- dsdt-override
ssdt-overlays
cppc_sysfs
fan_performance_states
diff --git a/Documentation/admin-guide/acpi/initrd_table_override.rst b/Documentation/admin-guide/acpi/initrd_table_override.rst
index cbd768207631..bb24fa6b5fbe 100644
--- a/Documentation/admin-guide/acpi/initrd_table_override.rst
+++ b/Documentation/admin-guide/acpi/initrd_table_override.rst
@@ -102,7 +102,7 @@ Where to retrieve userspace tools
=================================
iasl and acpixtract are part of Intel's ACPICA project:
-http://acpica.org/
+https://acpica.org/
and should be packaged by distributions (for example in the acpica package
on SUSE).
diff --git a/Documentation/admin-guide/acpi/ssdt-overlays.rst b/Documentation/admin-guide/acpi/ssdt-overlays.rst
index da37455f96c9..5ea9f4a3b76e 100644
--- a/Documentation/admin-guide/acpi/ssdt-overlays.rst
+++ b/Documentation/admin-guide/acpi/ssdt-overlays.rst
@@ -30,22 +30,21 @@ following ASL code can be used::
{
Device (STAC)
{
- Name (_ADR, Zero)
Name (_HID, "BMA222E")
+ Name (RBUF, ResourceTemplate ()
+ {
+ I2cSerialBus (0x0018, ControllerInitiated, 0x00061A80,
+ AddressingMode7Bit, "\\_SB.I2C6", 0x00,
+ ResourceConsumer, ,)
+ GpioInt (Edge, ActiveHigh, Exclusive, PullDown, 0x0000,
+ "\\_SB.GPO2", 0x00, ResourceConsumer, , )
+ { // Pin list
+ 0
+ }
+ })
Method (_CRS, 0, Serialized)
{
- Name (RBUF, ResourceTemplate ()
- {
- I2cSerialBus (0x0018, ControllerInitiated, 0x00061A80,
- AddressingMode7Bit, "\\_SB.I2C6", 0x00,
- ResourceConsumer, ,)
- GpioInt (Edge, ActiveHigh, Exclusive, PullDown, 0x0000,
- "\\_SB.GPO2", 0x00, ResourceConsumer, , )
- { // Pin list
- 0
- }
- })
Return (RBUF)
}
}
@@ -63,7 +62,7 @@ which can then be compiled to AML binary format::
ASL Input: minnomax.asl - 30 lines, 614 bytes, 7 keywords
AML Output: minnowmax.aml - 165 bytes, 6 named objects, 1 executable opcodes
-[1] http://wiki.minnowboard.org/MinnowBoard_MAX#Low_Speed_Expansion_Connector_.28Top.29
+[1] https://www.elinux.org/Minnowboard:MinnowMax#Low_Speed_Expansion_.28Top.29
The resulting AML code can then be loaded by the kernel using one of the methods
below.
@@ -75,7 +74,7 @@ This option allows loading of user defined SSDTs from initrd and it is useful
when the system does not support EFI or when there is not enough EFI storage.
It works in a similar way with initrd based ACPI tables override/upgrade: SSDT
-aml code must be placed in the first, uncompressed, initrd under the
+AML code must be placed in the first, uncompressed, initrd under the
"kernel/firmware/acpi" path. Multiple files can be used and this will translate
in loading multiple tables. Only SSDT and OEM tables are allowed. See
initrd_table_override.txt for more details.
@@ -103,12 +102,14 @@ This is the preferred method, when EFI is supported on the platform, because it
allows a persistent, OS independent way of storing the user defined SSDTs. There
is also work underway to implement EFI support for loading user defined SSDTs
and using this method will make it easier to convert to the EFI loading
-mechanism when that will arrive.
+mechanism when that will arrive. To enable it, the
+CONFIG_EFI_CUSTOM_SSDT_OVERLAYS should be chosen to y.
-In order to load SSDTs from an EFI variable the efivar_ssdt kernel command line
-parameter can be used. The argument for the option is the variable name to
-use. If there are multiple variables with the same name but with different
-vendor GUIDs, all of them will be loaded.
+In order to load SSDTs from an EFI variable the ``"efivar_ssdt=..."`` kernel
+command line parameter can be used (the name has a limitation of 16 characters).
+The argument for the option is the variable name to use. If there are multiple
+variables with the same name but with different vendor GUIDs, all of them will
+be loaded.
In order to store the AML code in an EFI variable the efivarfs filesystem can be
used. It is enabled and mounted by default in /sys/firmware/efi/efivars in all
@@ -127,7 +128,7 @@ variable with the content from a given file::
#!/bin/sh -e
- while ! [ -z "$1" ]; do
+ while [ -n "$1" ]; do
case "$1" in
"-f") filename="$2"; shift;;
"-g") guid="$2"; shift;;
@@ -167,14 +168,14 @@ variable with the content from a given file::
Loading ACPI SSDTs from configfs
================================
-This option allows loading of user defined SSDTs from userspace via the configfs
+This option allows loading of user defined SSDTs from user space via the configfs
interface. The CONFIG_ACPI_CONFIGFS option must be select and configfs must be
mounted. In the following examples, we assume that configfs has been mounted in
-/config.
+/sys/kernel/config.
-New tables can be loading by creating new directories in /config/acpi/table/ and
-writing the SSDT aml code in the aml attribute::
+New tables can be loading by creating new directories in /sys/kernel/config/acpi/table
+and writing the SSDT AML code in the aml attribute::
- cd /config/acpi/table
+ cd /sys/kernel/config/acpi/table
mkdir my_ssdt
cat ~/ssdt.aml > my_ssdt/aml
diff --git a/Documentation/admin-guide/auxdisplay/cfag12864b.rst b/Documentation/admin-guide/auxdisplay/cfag12864b.rst
index 18c2865bd322..da385d851acc 100644
--- a/Documentation/admin-guide/auxdisplay/cfag12864b.rst
+++ b/Documentation/admin-guide/auxdisplay/cfag12864b.rst
@@ -3,7 +3,7 @@ cfag12864b LCD Driver Documentation
===================================
:License: GPLv2
-:Author & Maintainer: Miguel Ojeda Sandonis
+:Author & Maintainer: Miguel Ojeda <ojeda@kernel.org>
:Date: 2006-10-27
diff --git a/Documentation/admin-guide/auxdisplay/ks0108.rst b/Documentation/admin-guide/auxdisplay/ks0108.rst
index c0b7faf73136..a7d3fe509373 100644
--- a/Documentation/admin-guide/auxdisplay/ks0108.rst
+++ b/Documentation/admin-guide/auxdisplay/ks0108.rst
@@ -3,7 +3,7 @@ ks0108 LCD Controller Driver Documentation
==========================================
:License: GPLv2
-:Author & Maintainer: Miguel Ojeda Sandonis
+:Author & Maintainer: Miguel Ojeda <ojeda@kernel.org>
:Date: 2006-10-27
diff --git a/Documentation/admin-guide/bcache.rst b/Documentation/admin-guide/bcache.rst
index c0ce64d75bbf..6fdb495ac466 100644
--- a/Documentation/admin-guide/bcache.rst
+++ b/Documentation/admin-guide/bcache.rst
@@ -5,11 +5,14 @@ A block layer cache (bcache)
Say you've got a big slow raid 6, and an ssd or three. Wouldn't it be
nice if you could use them as cache... Hence bcache.
-Wiki and git repositories are at:
+The bcache wiki can be found at:
+ https://bcache.evilpiepirate.org
- - http://bcache.evilpiepirate.org
- - http://evilpiepirate.org/git/linux-bcache.git
- - http://evilpiepirate.org/git/bcache-tools.git
+This is the git repository of bcache-tools:
+ https://git.kernel.org/pub/scm/linux/kernel/git/colyli/bcache-tools.git/
+
+The latest bcache kernel code can be found from mainline Linux kernel:
+ https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
It's designed around the performance characteristics of SSDs - it only allocates
in erase block sized buckets, and it uses a hybrid btree/log to track cached
@@ -41,17 +44,21 @@ in the cache it first disables writeback caching and waits for all dirty data
to be flushed.
Getting started:
-You'll need make-bcache from the bcache-tools repository. Both the cache device
+You'll need bcache util from the bcache-tools repository. Both the cache device
and backing device must be formatted before use::
- make-bcache -B /dev/sdb
- make-bcache -C /dev/sdc
+ bcache make -B /dev/sdb
+ bcache make -C /dev/sdc
-make-bcache has the ability to format multiple devices at the same time - if
+`bcache make` has the ability to format multiple devices at the same time - if
you format your backing devices and cache device at the same time, you won't
have to manually attach::
- make-bcache -B /dev/sda /dev/sdb -C /dev/sdc
+ bcache make -B /dev/sda /dev/sdb -C /dev/sdc
+
+If your bcache-tools is not updated to latest version and does not have the
+unified `bcache` utility, you may use the legacy `make-bcache` utility to format
+bcache device with same -B and -C parameters.
bcache-tools now ships udev rules, and bcache devices are known to the kernel
immediately. Without udev, you can manually register devices like this::
@@ -188,7 +195,7 @@ D) Recovering data without bcache:
If bcache is not available in the kernel, a filesystem on the backing
device is still available at an 8KiB offset. So either via a loopdev
of the backing device created with --offset 8K, or any value defined by
---data-offset when you originally formatted bcache with `make-bcache`.
+--data-offset when you originally formatted bcache with `bcache make`.
For example::
@@ -197,7 +204,7 @@ For example::
This should present your unmodified backing device data in /dev/loop0
If your cache is in writethrough mode, then you can safely discard the
-cache device without loosing data.
+cache device without losing data.
E) Wiping a cache device
@@ -210,7 +217,7 @@ E) Wiping a cache device
After you boot back with bcache enabled, you recreate the cache and attach it::
- host:~# make-bcache -C /dev/sdh2
+ host:~# bcache make -C /dev/sdh2
UUID: 7be7e175-8f4c-4f99-94b2-9c904d227045
Set UUID: 5bc072a8-ab17-446d-9744-e247949913c1
version: 0
@@ -318,7 +325,7 @@ want for getting the best possible numbers when benchmarking.
The default metadata size in bcache is 8k. If your backing device is
RAID based, then be sure to align this by a multiple of your stride
- width using `make-bcache --data-offset`. If you intend to expand your
+ width using `bcache make --data-offset`. If you intend to expand your
disk array in the future, then multiply a series of primes by your
raid stripe size to get the disk multiples that you would like.
@@ -501,9 +508,6 @@ cache_miss_collisions
cache miss, but raced with a write and data was already present (usually 0
since the synchronization for cache misses was rewritten)
-cache_readaheads
- Count of times readahead occurred.
-
Sysfs - cache set
~~~~~~~~~~~~~~~~~
diff --git a/Documentation/admin-guide/binderfs.rst b/Documentation/admin-guide/binderfs.rst
index 8243af9b3510..41a4db00df8d 100644
--- a/Documentation/admin-guide/binderfs.rst
+++ b/Documentation/admin-guide/binderfs.rst
@@ -70,5 +70,18 @@ Deleting binder Devices
Binderfs binder devices can be deleted via `unlink() <unlink_>`_. This means
that the `rm() <rm_>`_ tool can be used to delete them. Note that the
``binder-control`` device cannot be deleted since this would make the binderfs
-instance unuseable. The ``binder-control`` device will be deleted when the
+instance unusable. The ``binder-control`` device will be deleted when the
binderfs instance is unmounted and all references to it have been dropped.
+
+Binder features
+---------------
+
+Assuming an instance of binderfs has been mounted at ``/dev/binderfs``, the
+features supported by the binder driver can be located under
+``/dev/binderfs/features/``. The presence of individual files can be tested
+to determine whether a particular feature is supported by the driver.
+
+Example::
+
+ cat /dev/binderfs/features/oneway_spam_detection
+ 1
diff --git a/Documentation/admin-guide/binfmt-misc.rst b/Documentation/admin-guide/binfmt-misc.rst
index 7a864131e5ea..59cd902e3549 100644
--- a/Documentation/admin-guide/binfmt-misc.rst
+++ b/Documentation/admin-guide/binfmt-misc.rst
@@ -23,7 +23,7 @@ Here is what the fields mean:
- ``name``
is an identifier string. A new /proc file will be created with this
- ``name below /proc/sys/fs/binfmt_misc``; cannot contain slashes ``/`` for
+ name below ``/proc/sys/fs/binfmt_misc``; cannot contain slashes ``/`` for
obvious reasons.
- ``type``
is the type of recognition. Give ``M`` for magic and ``E`` for extension.
@@ -83,7 +83,7 @@ Here is what the fields mean:
``F`` - fix binary
The usual behaviour of binfmt_misc is to spawn the
binary lazily when the misc format file is invoked. However,
- this doesn``t work very well in the face of mount namespaces and
+ this doesn't work very well in the face of mount namespaces and
changeroots, so the ``F`` mode opens the binary as soon as the
emulation is installed and uses the opened image to spawn the
emulator, meaning it is always available once installed,
diff --git a/Documentation/admin-guide/blockdev/drbd/figures.rst b/Documentation/admin-guide/blockdev/drbd/figures.rst
index bd9a4901fe46..9f73253ea353 100644
--- a/Documentation/admin-guide/blockdev/drbd/figures.rst
+++ b/Documentation/admin-guide/blockdev/drbd/figures.rst
@@ -25,6 +25,6 @@ Sub graphs of DRBD's state transitions
:alt: disk-states-8.dot
:align: center
-.. kernel-figure:: node-states-8.dot
- :alt: node-states-8.dot
+.. kernel-figure:: peer-states-8.dot
+ :alt: peer-states-8.dot
:align: center
diff --git a/Documentation/admin-guide/blockdev/drbd/index.rst b/Documentation/admin-guide/blockdev/drbd/index.rst
index 68ecd5c113e9..561fd1e35917 100644
--- a/Documentation/admin-guide/blockdev/drbd/index.rst
+++ b/Documentation/admin-guide/blockdev/drbd/index.rst
@@ -10,7 +10,7 @@ Description
clusters and in this context, is a "drop-in" replacement for shared
storage. Simplistically, you could see it as a network RAID 1.
- Please visit http://www.drbd.org to find out more.
+ Please visit https://www.drbd.org to find out more.
.. toctree::
:maxdepth: 1
diff --git a/Documentation/admin-guide/blockdev/drbd/node-states-8.dot b/Documentation/admin-guide/blockdev/drbd/peer-states-8.dot
index bfa54e1f8016..6dc3954954d6 100644
--- a/Documentation/admin-guide/blockdev/drbd/node-states-8.dot
+++ b/Documentation/admin-guide/blockdev/drbd/peer-states-8.dot
@@ -1,8 +1,3 @@
-digraph node_states {
- Secondary -> Primary [ label = "ioctl_set_state()" ]
- Primary -> Secondary [ label = "ioctl_set_state()" ]
-}
-
digraph peer_states {
Secondary -> Primary [ label = "recv state packet" ]
Primary -> Secondary [ label = "recv state packet" ]
diff --git a/Documentation/admin-guide/blockdev/floppy.rst b/Documentation/admin-guide/blockdev/floppy.rst
index 4a8f31cf4139..0328438ebe2c 100644
--- a/Documentation/admin-guide/blockdev/floppy.rst
+++ b/Documentation/admin-guide/blockdev/floppy.rst
@@ -6,7 +6,7 @@ FAQ list:
=========
A FAQ list may be found in the fdutils package (see below), and also
-at <http://fdutils.linux.lu/faq.html>.
+at <https://fdutils.linux.lu/faq.html>.
LILO configuration options (Thinkpad users, read this)
@@ -220,11 +220,11 @@ It also contains additional documentation about the floppy driver.
The latest version can be found at fdutils homepage:
- http://fdutils.linux.lu
+ https://fdutils.linux.lu
The fdutils releases can be found at:
- http://fdutils.linux.lu/download.html
+ https://fdutils.linux.lu/download.html
http://www.tux.org/pub/knaff/fdutils/
diff --git a/Documentation/admin-guide/blockdev/index.rst b/Documentation/admin-guide/blockdev/index.rst
index b903cf152091..957ccf617797 100644
--- a/Documentation/admin-guide/blockdev/index.rst
+++ b/Documentation/admin-guide/blockdev/index.rst
@@ -1,8 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0
-===========================
-The Linux RapidIO Subsystem
-===========================
+=============
+Block Devices
+=============
.. toctree::
:maxdepth: 1
diff --git a/Documentation/admin-guide/blockdev/nbd.rst b/Documentation/admin-guide/blockdev/nbd.rst
index d78dfe559dcf..faf2ac4b1509 100644
--- a/Documentation/admin-guide/blockdev/nbd.rst
+++ b/Documentation/admin-guide/blockdev/nbd.rst
@@ -14,7 +14,7 @@ to borrow disk space from another computer.
Unlike NFS, it is possible to put any filesystem on it, etc.
For more information, or to download the nbd-client and nbd-server
-tools, go to http://nbd.sf.net/.
+tools, go to https://github.com/NetworkBlockDevice/nbd.
The nbd kernel module need only be installed on the client
system, as the nbd-server is completely in userspace. In fact,
diff --git a/Documentation/admin-guide/blockdev/paride.rst b/Documentation/admin-guide/blockdev/paride.rst
index 87b4278bf314..e85ad37cc0e5 100644
--- a/Documentation/admin-guide/blockdev/paride.rst
+++ b/Documentation/admin-guide/blockdev/paride.rst
@@ -3,6 +3,7 @@ Linux and parallel port IDE devices
===================================
PARIDE v1.03 (c) 1997-8 Grant Guenther <grant@torque.net>
+PATA_PARPORT (c) 2023 Ondrej Zary
1. Introduction
===============
@@ -51,27 +52,15 @@ parallel port IDE subsystem, including:
as well as most of the clone and no-name products on the market.
-To support such a wide range of devices, PARIDE, the parallel port IDE
-subsystem, is actually structured in three parts. There is a base
-paride module which provides a registry and some common methods for
-accessing the parallel ports. The second component is a set of
-high-level drivers for each of the different types of supported devices:
+To support such a wide range of devices, pata_parport is actually structured
+in two parts. There is a base pata_parport module which provides an interface
+to kernel libata subsystem, registry and some common methods for accessing
+the parallel ports.
- === =============
- pd IDE disk
- pcd ATAPI CD-ROM
- pf ATAPI disk
- pt ATAPI tape
- pg ATAPI generic
- === =============
-
-(Currently, the pg driver is only used with CD-R drives).
-
-The high-level drivers function according to the relevant standards.
-The third component of PARIDE is a set of low-level protocol drivers
-for each of the parallel port IDE adapter chips. Thanks to the interest
-and encouragement of Linux users from many parts of the world,
-support is available for almost all known adapter protocols:
+The second component is a set of low-level protocol drivers for each of the
+parallel port IDE adapter chips. Thanks to the interest and encouragement of
+Linux users from many parts of the world, support is available for almost all
+known adapter protocols:
==== ====================================== ====
aten ATEN EH-100 (HK)
@@ -91,251 +80,87 @@ support is available for almost all known adapter protocols:
==== ====================================== ====
-2. Using the PARIDE subsystem
-=============================
+2. Using pata_parport subsystem
+===============================
While configuring the Linux kernel, you may choose either to build
-the PARIDE drivers into your kernel, or to build them as modules.
+the pata_parport drivers into your kernel, or to build them as modules.
In either case, you will need to select "Parallel port IDE device support"
-as well as at least one of the high-level drivers and at least one
-of the parallel port communication protocols. If you do not know
-what kind of parallel port adapter is used in your drive, you could
-begin by checking the file names and any text files on your DOS
+and at least one of the parallel port communication protocols.
+If you do not know what kind of parallel port adapter is used in your drive,
+you could begin by checking the file names and any text files on your DOS
installation floppy. Alternatively, you can look at the markings on
the adapter chip itself. That's usually sufficient to identify the
correct device.
-You can actually select all the protocol modules, and allow the PARIDE
+You can actually select all the protocol modules, and allow the pata_parport
subsystem to try them all for you.
For the "brand-name" products listed above, here are the protocol
and high-level drivers that you would use:
- ================ ============ ====== ========
- Manufacturer Model Driver Protocol
- ================ ============ ====== ========
- MicroSolutions CD-ROM pcd bpck
- MicroSolutions PD drive pf bpck
- MicroSolutions hard-drive pd bpck
- MicroSolutions 8000t tape pt bpck
- SyQuest EZ, SparQ pd epat
- Imation Superdisk pf epat
- Maxell Superdisk pf friq
- Avatar Shark pd epat
- FreeCom CD-ROM pcd frpw
- Hewlett-Packard 5GB Tape pt epat
- Hewlett-Packard 7200e (CD) pcd epat
- Hewlett-Packard 7200e (CD-R) pg epat
- ================ ============ ====== ========
-
-2.1 Configuring built-in drivers
----------------------------------
-
-We recommend that you get to know how the drivers work and how to
-configure them as loadable modules, before attempting to compile a
-kernel with the drivers built-in.
-
-If you built all of your PARIDE support directly into your kernel,
-and you have just a single parallel port IDE device, your kernel should
-locate it automatically for you. If you have more than one device,
-you may need to give some command line options to your bootloader
-(eg: LILO), how to do that is beyond the scope of this document.
-
-The high-level drivers accept a number of command line parameters, all
-of which are documented in the source files in linux/drivers/block/paride.
-By default, each driver will automatically try all parallel ports it
-can find, and all protocol types that have been installed, until it finds
-a parallel port IDE adapter. Once it finds one, the probe stops. So,
-if you have more than one device, you will need to tell the drivers
-how to identify them. This requires specifying the port address, the
-protocol identification number and, for some devices, the drive's
-chain ID. While your system is booting, a number of messages are
-displayed on the console. Like all such messages, they can be
-reviewed with the 'dmesg' command. Among those messages will be
-some lines like::
-
- paride: bpck registered as protocol 0
- paride: epat registered as protocol 1
-
-The numbers will always be the same until you build a new kernel with
-different protocol selections. You should note these numbers as you
-will need them to identify the devices.
+ ================ ============ ========
+ Manufacturer Model Protocol
+ ================ ============ ========
+ MicroSolutions CD-ROM bpck
+ MicroSolutions PD drive bpck
+ MicroSolutions hard-drive bpck
+ MicroSolutions 8000t tape bpck
+ SyQuest EZ, SparQ epat
+ Imation Superdisk epat
+ Maxell Superdisk friq
+ Avatar Shark epat
+ FreeCom CD-ROM frpw
+ Hewlett-Packard 5GB Tape epat
+ Hewlett-Packard 7200e (CD) epat
+ Hewlett-Packard 7200e (CD-R) epat
+ ================ ============ ========
+
+All parports and all protocol drivers are probed automatically unless probe=0
+parameter is used. So just "modprobe epat" is enough for a Imation SuperDisk
+drive to work.
+
+Manual device creation::
+
+ # echo "port protocol mode unit delay" >/sys/bus/pata_parport/new_device
+
+where:
+
+ ======== ================================================
+ port parport name (or "auto" for all parports)
+ protocol protocol name (or "auto" for all protocols)
+ mode mode number (protocol-specific) or -1 for probe
+ unit unit number (for backpack only, see below)
+ delay I/O delay (see troubleshooting section below)
+ ======== ================================================
If you happen to be using a MicroSolutions backpack device, you will
also need to know the unit ID number for each drive. This is usually
the last two digits of the drive's serial number (but read MicroSolutions'
documentation about this).
-As an example, let's assume that you have a MicroSolutions PD/CD drive
-with unit ID number 36 connected to the parallel port at 0x378, a SyQuest
-EZ-135 connected to the chained port on the PD/CD drive and also an
-Imation Superdisk connected to port 0x278. You could give the following
-options on your boot command::
-
- pd.drive0=0x378,1 pf.drive0=0x278,1 pf.drive1=0x378,0,36
-
-In the last option, pf.drive1 configures device /dev/pf1, the 0x378
-is the parallel port base address, the 0 is the protocol registration
-number and 36 is the chain ID.
-
-Please note: while PARIDE will work both with and without the
-PARPORT parallel port sharing system that is included by the
-"Parallel port support" option, PARPORT must be included and enabled
-if you want to use chains of devices on the same parallel port.
-
-2.2 Loading and configuring PARIDE as modules
-----------------------------------------------
-
-It is much faster and simpler to get to understand the PARIDE drivers
-if you use them as loadable kernel modules.
-
-Note 1:
- using these drivers with the "kerneld" automatic module loading
- system is not recommended for beginners, and is not documented here.
-
-Note 2:
- if you build PARPORT support as a loadable module, PARIDE must
- also be built as loadable modules, and PARPORT must be loaded before
- the PARIDE modules.
-
-To use PARIDE, you must begin by::
-
- insmod paride
-
-this loads a base module which provides a registry for the protocols,
-among other tasks.
-
-Then, load as many of the protocol modules as you think you might need.
-As you load each module, it will register the protocols that it supports,
-and print a log message to your kernel log file and your console. For
-example::
-
- # insmod epat
- paride: epat registered as protocol 0
- # insmod kbic
- paride: k951 registered as protocol 1
- paride: k971 registered as protocol 2
-
-Finally, you can load high-level drivers for each kind of device that
-you have connected. By default, each driver will autoprobe for a single
-device, but you can support up to four similar devices by giving their
-individual co-ordinates when you load the driver.
-
-For example, if you had two no-name CD-ROM drives both using the
-KingByte KBIC-951A adapter, one on port 0x378 and the other on 0x3bc
-you could give the following command::
-
- # insmod pcd drive0=0x378,1 drive1=0x3bc,1
-
-For most adapters, giving a port address and protocol number is sufficient,
-but check the source files in linux/drivers/block/paride for more
-information. (Hopefully someone will write some man pages one day !).
-
-As another example, here's what happens when PARPORT is installed, and
-a SyQuest EZ-135 is attached to port 0x378::
-
- # insmod paride
- paride: version 1.0 installed
- # insmod epat
- paride: epat registered as protocol 0
- # insmod pd
- pd: pd version 1.0, major 45, cluster 64, nice 0
- pda: Sharing parport1 at 0x378
- pda: epat 1.0, Shuttle EPAT chip c3 at 0x378, mode 5 (EPP-32), delay 1
- pda: SyQuest EZ135A, 262144 blocks [128M], (512/16/32), removable media
- pda: pda1
-
-Note that the last line is the output from the generic partition table
-scanner - in this case it reports that it has found a disk with one partition.
-
-2.3 Using a PARIDE device
---------------------------
-
-Once the drivers have been loaded, you can access PARIDE devices in the
-same way as their traditional counterparts. You will probably need to
-create the device "special files". Here is a simple script that you can
-cut to a file and execute::
-
- #!/bin/bash
- #
- # mkd -- a script to create the device special files for the PARIDE subsystem
- #
- function mkdev {
- mknod $1 $2 $3 $4 ; chmod 0660 $1 ; chown root:disk $1
- }
- #
- function pd {
- D=$( printf \\$( printf "x%03x" $[ $1 + 97 ] ) )
- mkdev pd$D b 45 $[ $1 * 16 ]
- for P in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
- do mkdev pd$D$P b 45 $[ $1 * 16 + $P ]
- done
- }
- #
- cd /dev
- #
- for u in 0 1 2 3 ; do pd $u ; done
- for u in 0 1 2 3 ; do mkdev pcd$u b 46 $u ; done
- for u in 0 1 2 3 ; do mkdev pf$u b 47 $u ; done
- for u in 0 1 2 3 ; do mkdev pt$u c 96 $u ; done
- for u in 0 1 2 3 ; do mkdev npt$u c 96 $[ $u + 128 ] ; done
- for u in 0 1 2 3 ; do mkdev pg$u c 97 $u ; done
- #
- # end of mkd
-
-With the device files and drivers in place, you can access PARIDE devices
-like any other Linux device. For example, to mount a CD-ROM in pcd0, use::
-
- mount /dev/pcd0 /cdrom
-
-If you have a fresh Avatar Shark cartridge, and the drive is pda, you
-might do something like::
-
- fdisk /dev/pda -- make a new partition table with
- partition 1 of type 83
-
- mke2fs /dev/pda1 -- to build the file system
-
- mkdir /shark -- make a place to mount the disk
-
- mount /dev/pda1 /shark
-
-Devices like the Imation superdisk work in the same way, except that
-they do not have a partition table. For example to make a 120MB
-floppy that you could share with a DOS system::
-
- mkdosfs /dev/pf0
- mount /dev/pf0 /mnt
-
-
-2.4 The pf driver
-------------------
-
-The pf driver is intended for use with parallel port ATAPI disk
-devices. The most common devices in this category are PD drives
-and LS-120 drives. Traditionally, media for these devices are not
-partitioned. Consequently, the pf driver does not support partitioned
-media. This may be changed in a future version of the driver.
-
-2.5 Using the pt driver
-------------------------
-
-The pt driver for parallel port ATAPI tape drives is a minimal driver.
-It does not yet support many of the standard tape ioctl operations.
-For best performance, a block size of 32KB should be used. You will
-probably want to set the parallel port delay to 0, if you can.
-
-2.6 Using the pg driver
-------------------------
-
-The pg driver can be used in conjunction with the cdrecord program
-to create CD-ROMs. Please get cdrecord version 1.6.1 or later
-from ftp://ftp.fokus.gmd.de/pub/unix/cdrecord/ . To record CD-R media
-your parallel port should ideally be set to EPP mode, and the "port delay"
-should be set to 0. With those settings it is possible to record at 2x
-speed without any buffer underruns. If you cannot get the driver to work
-in EPP mode, try to use "bidirectional" or "PS/2" mode and 1x speeds only.
+If you omit the parameters from the end, defaults will be used, e.g.:
+
+Probe all parports with all protocols::
+
+ # echo auto >/sys/bus/pata_parport/new_device
+
+Probe parport0 using protocol epat and mode 4 (EPP-16)::
+
+ # echo "parport0 epat 4" >/sys/bus/pata_parport/new_device
+
+Probe parport0 using all protocols::
+
+ # echo "parport0 auto" >/sys/bus/pata_parport/new_device
+
+Probe all parports using protoocol epat::
+
+ # echo "auto epat" >/sys/bus/pata_parport/new_device
+
+Deleting devices::
+
+ # echo pata_parport.0 >/sys/bus/pata_parport/delete_device
3. Troubleshooting
@@ -344,9 +169,9 @@ in EPP mode, try to use "bidirectional" or "PS/2" mode and 1x speeds only.
3.1 Use EPP mode if you can
----------------------------
-The most common problems that people report with the PARIDE drivers
+The most common problems that people report with the pata_parport drivers
concern the parallel port CMOS settings. At this time, none of the
-PARIDE protocol modules support ECP mode, or any ECP combination modes.
+protocol modules support ECP mode, or any ECP combination modes.
If you are able to do so, please set your parallel port into EPP mode
using your CMOS setup procedure.
@@ -354,17 +179,14 @@ using your CMOS setup procedure.
-------------------------
Some parallel ports cannot reliably transfer data at full speed. To
-offset the errors, the PARIDE protocol modules introduce a "port
+offset the errors, the protocol modules introduce a "port
delay" between each access to the i/o ports. Each protocol sets
a default value for this delay. In most cases, the user can override
the default and set it to 0 - resulting in somewhat higher transfer
rates. In some rare cases (especially with older 486 systems) the
default delays are not long enough. if you experience corrupt data
transfers, or unexpected failures, you may wish to increase the
-port delay. The delay can be programmed using the "driveN" parameters
-to each of the high-level drivers. Please see the notes above, or
-read the comments at the beginning of the driver source files in
-linux/drivers/block/paride.
+port delay.
3.3 Some drives need a printer reset
-------------------------------------
@@ -374,66 +196,12 @@ that do not always power up correctly. We have noticed this with some
drives based on OnSpec and older Freecom adapters. In these rare cases,
the adapter can often be reinitialised by issuing a "printer reset" on
the parallel port. As the reset operation is potentially disruptive in
-multiple device environments, the PARIDE drivers will not do it
+multiple device environments, the pata_parport drivers will not do it
automatically. You can however, force a printer reset by doing::
insmod lp reset=1
rmmod lp
If you have one of these marginal cases, you should probably build
-your paride drivers as modules, and arrange to do the printer reset
-before loading the PARIDE drivers.
-
-3.4 Use the verbose option and dmesg if you need help
-------------------------------------------------------
-
-While a lot of testing has gone into these drivers to make them work
-as smoothly as possible, problems will arise. If you do have problems,
-please check all the obvious things first: does the drive work in
-DOS with the manufacturer's drivers ? If that doesn't yield any useful
-clues, then please make sure that only one drive is hooked to your system,
-and that either (a) PARPORT is enabled or (b) no other device driver
-is using your parallel port (check in /proc/ioports). Then, load the
-appropriate drivers (you can load several protocol modules if you want)
-as in::
-
- # insmod paride
- # insmod epat
- # insmod bpck
- # insmod kbic
- ...
- # insmod pd verbose=1
-
-(using the correct driver for the type of device you have, of course).
-The verbose=1 parameter will cause the drivers to log a trace of their
-activity as they attempt to locate your drive.
-
-Use 'dmesg' to capture a log of all the PARIDE messages (any messages
-beginning with paride:, a protocol module's name or a driver's name) and
-include that with your bug report. You can submit a bug report in one
-of two ways. Either send it directly to the author of the PARIDE suite,
-by e-mail to grant@torque.net, or join the linux-parport mailing list
-and post your report there.
-
-3.5 For more information or help
----------------------------------
-
-You can join the linux-parport mailing list by sending a mail message
-to:
-
- linux-parport-request@torque.net
-
-with the single word::
-
- subscribe
-
-in the body of the mail message (not in the subject line). Please be
-sure that your mail program is correctly set up when you do this, as
-the list manager is a robot that will subscribe you using the reply
-address in your mail headers. REMOVE any anti-spam gimmicks you may
-have in your mail headers, when sending mail to the list server.
-
-You might also find some useful information on the linux-parport
-web pages (although they are not always up to date) at
-
- http://web.archive.org/web/%2E/http://www.torque.net/parport/
+your pata_parport drivers as modules, and arrange to do the printer reset
+before loading the pata_parport drivers.
diff --git a/Documentation/admin-guide/blockdev/ramdisk.rst b/Documentation/admin-guide/blockdev/ramdisk.rst
index b7c2268f8dec..9ce6101e8dd9 100644
--- a/Documentation/admin-guide/blockdev/ramdisk.rst
+++ b/Documentation/admin-guide/blockdev/ramdisk.rst
@@ -6,7 +6,7 @@ Using the RAM disk block device with Linux
1) Overview
2) Kernel Command Line Parameters
- 3) Using "rdev -r"
+ 3) Using "rdev"
4) An Example of Creating a Compressed RAM Disk
@@ -59,51 +59,27 @@ default is 4096 (4 MB).
rd_size
See ramdisk_size.
-3) Using "rdev -r"
-------------------
+3) Using "rdev"
+---------------
-The usage of the word (two bytes) that "rdev -r" sets in the kernel image is
-as follows. The low 11 bits (0 -> 10) specify an offset (in 1 k blocks) of up
-to 2 MB (2^11) of where to find the RAM disk (this used to be the size). Bit
-14 indicates that a RAM disk is to be loaded, and bit 15 indicates whether a
-prompt/wait sequence is to be given before trying to read the RAM disk. Since
-the RAM disk dynamically grows as data is being written into it, a size field
-is not required. Bits 11 to 13 are not currently used and may as well be zero.
-These numbers are no magical secrets, as seen below::
+"rdev" is an obsolete, deprecated, antiquated utility that could be used
+to set the boot device in a Linux kernel image.
- ./arch/x86/kernel/setup.c:#define RAMDISK_IMAGE_START_MASK 0x07FF
- ./arch/x86/kernel/setup.c:#define RAMDISK_PROMPT_FLAG 0x8000
- ./arch/x86/kernel/setup.c:#define RAMDISK_LOAD_FLAG 0x4000
+Instead of using rdev, just place the boot device information on the
+kernel command line and pass it to the kernel from the bootloader.
-Consider a typical two floppy disk setup, where you will have the
-kernel on disk one, and have already put a RAM disk image onto disk #2.
+You can also pass arguments to the kernel by setting FDARGS in
+arch/x86/boot/Makefile and specify in initrd image by setting FDINITRD in
+arch/x86/boot/Makefile.
-Hence you want to set bits 0 to 13 as 0, meaning that your RAM disk
-starts at an offset of 0 kB from the beginning of the floppy.
-The command line equivalent is: "ramdisk_start=0"
+Some of the kernel command line boot options that may apply here are::
-You want bit 14 as one, indicating that a RAM disk is to be loaded.
-The command line equivalent is: "load_ramdisk=1"
-
-You want bit 15 as one, indicating that you want a prompt/keypress
-sequence so that you have a chance to switch floppy disks.
-The command line equivalent is: "prompt_ramdisk=1"
-
-Putting that together gives 2^15 + 2^14 + 0 = 49152 for an rdev word.
-So to create disk one of the set, you would do::
-
- /usr/src/linux# cat arch/x86/boot/zImage > /dev/fd0
- /usr/src/linux# rdev /dev/fd0 /dev/fd0
- /usr/src/linux# rdev -r /dev/fd0 49152
+ ramdisk_start=N
+ ramdisk_size=M
If you make a boot disk that has LILO, then for the above, you would use::
- append = "ramdisk_start=0 load_ramdisk=1 prompt_ramdisk=1"
-
-Since the default start = 0 and the default prompt = 1, you could use::
-
- append = "load_ramdisk=1"
-
+ append = "ramdisk_start=N ramdisk_size=M"
4) An Example of Creating a Compressed RAM Disk
-----------------------------------------------
@@ -151,12 +127,9 @@ f) Put the RAM disk image onto the floppy, after the kernel. Use an offset
dd if=/tmp/ram_image.gz of=/dev/fd0 bs=1k seek=400
-g) Use "rdev" to set the boot device, RAM disk offset, prompt flag, etc.
- For prompt_ramdisk=1, load_ramdisk=1, ramdisk_start=400, one would
- have 2^15 + 2^14 + 400 = 49552::
-
- rdev /dev/fd0 /dev/fd0
- rdev -r /dev/fd0 49552
+g) Make sure that you have already specified the boot information in
+ FDARGS and FDINITRD or that you use a bootloader to pass kernel
+ command line boot options to the kernel.
That is it. You now have your boot/root compressed RAM disk floppy. Some
users may wish to combine steps (d) and (f) by using a pipe.
@@ -167,11 +140,14 @@ users may wish to combine steps (d) and (f) by using a pipe.
Changelog:
----------
+SEPT-2020 :
+
+ Removed usage of "rdev"
+
10-22-04 :
Updated to reflect changes in command line options, remove
obsolete references, general cleanup.
James Nelson (james4765@gmail.com)
-
12-95 :
Original Document
diff --git a/Documentation/admin-guide/blockdev/zram.rst b/Documentation/admin-guide/blockdev/zram.rst
index a6fd1f9b5faf..ee2b0030d416 100644
--- a/Documentation/admin-guide/blockdev/zram.rst
+++ b/Documentation/admin-guide/blockdev/zram.rst
@@ -266,6 +266,7 @@ line of text and contains the following stats separated by whitespace:
No memory is allocated for such pages.
pages_compacted the number of pages freed during compaction
huge_pages the number of incompressible pages
+ huge_pages_since the number of incompressible pages since zram set up
================ =============================================================
File /sys/block/zram<id>/bd_stat
@@ -314,8 +315,8 @@ To use the feature, admin should set up backing device via::
echo /dev/sda5 > /sys/block/zramX/backing_dev
-before disksize setting. It supports only partition at this moment.
-If admin wants to use incompressible page writeback, they could do via::
+before disksize setting. It supports only partitions at this moment.
+If admin wants to use incompressible page writeback, they could do it via::
echo huge > /sys/block/zramX/writeback
@@ -327,12 +328,35 @@ as idle::
From now on, any pages on zram are idle pages. The idle mark
will be removed until someone requests access of the block.
IOW, unless there is access request, those pages are still idle pages.
+Additionally, when CONFIG_ZRAM_TRACK_ENTRY_ACTIME is enabled pages can be
+marked as idle based on how long (in seconds) it's been since they were
+last accessed::
+
+ echo 86400 > /sys/block/zramX/idle
+
+In this example all pages which haven't been accessed in more than 86400
+seconds (one day) will be marked idle.
Admin can request writeback of those idle pages at right timing via::
echo idle > /sys/block/zramX/writeback
-With the command, zram writeback idle pages from memory to the storage.
+With the command, zram will writeback idle pages from memory to the storage.
+
+Additionally, if a user choose to writeback only huge and idle pages
+this can be accomplished with::
+
+ echo huge_idle > /sys/block/zramX/writeback
+
+If a user chooses to writeback only incompressible pages (pages that none of
+algorithms can compress) this can be accomplished with::
+
+ echo incompressible > /sys/block/zramX/writeback
+
+If an admin wants to write a specific page in zram device to the backing device,
+they could write a page index into the interface::
+
+ echo "page_index=1251" > /sys/block/zramX/writeback
If there are lots of write IO with flash device, potentially, it has
flash wearout problem so that admin needs to design write limitation
@@ -340,7 +364,7 @@ to guarantee storage health for entire product life.
To overcome the concern, zram supports "writeback_limit" feature.
The "writeback_limit_enable"'s default value is 0 so that it doesn't limit
-any writeback. IOW, if admin wants to apply writeback budget, he should
+any writeback. IOW, if admin wants to apply writeback budget, they should
enable writeback_limit_enable via::
$ echo 1 > /sys/block/zramX/writeback_limit_enable
@@ -351,7 +375,7 @@ until admin sets the budget via /sys/block/zramX/writeback_limit.
(If admin doesn't enable writeback_limit_enable, writeback_limit's value
assigned via /sys/block/zramX/writeback_limit is meaningless.)
-If admin want to limit writeback as per-day 400M, he could do it
+If admin wants to limit writeback as per-day 400M, they could do it
like below::
$ MB_SHIFT=20
@@ -360,17 +384,17 @@ like below::
/sys/block/zram0/writeback_limit.
$ echo 1 > /sys/block/zram0/writeback_limit_enable
-If admins want to allow further write again once the bugdet is exhausted,
-he could do it like below::
+If admins want to allow further write again once the budget is exhausted,
+they could do it like below::
$ echo $((400<<MB_SHIFT>>4K_SHIFT)) > \
/sys/block/zram0/writeback_limit
-If admin wants to see remaining writeback budget since last set::
+If an admin wants to see the remaining writeback budget since last set::
$ cat /sys/block/zramX/writeback_limit
-If admin want to disable writeback limit, he could do::
+If an admin wants to disable writeback limit, they could do::
$ echo 0 > /sys/block/zramX/writeback_limit_enable
@@ -379,9 +403,90 @@ system reboot, echo 1 > /sys/block/zramX/reset) so keeping how many of
writeback happened until you reset the zram to allocate extra writeback
budget in next setting is user's job.
-If admin wants to measure writeback count in a certain period, he could
+If admin wants to measure writeback count in a certain period, they could
know it via /sys/block/zram0/bd_stat's 3rd column.
+recompression
+-------------
+
+With CONFIG_ZRAM_MULTI_COMP, zram can recompress pages using alternative
+(secondary) compression algorithms. The basic idea is that alternative
+compression algorithm can provide better compression ratio at a price of
+(potentially) slower compression/decompression speeds. Alternative compression
+algorithm can, for example, be more successful compressing huge pages (those
+that default algorithm failed to compress). Another application is idle pages
+recompression - pages that are cold and sit in the memory can be recompressed
+using more effective algorithm and, hence, reduce zsmalloc memory usage.
+
+With CONFIG_ZRAM_MULTI_COMP, zram supports up to 4 compression algorithms:
+one primary and up to 3 secondary ones. Primary zram compressor is explained
+in "3) Select compression algorithm", secondary algorithms are configured
+using recomp_algorithm device attribute.
+
+Example:::
+
+ #show supported recompression algorithms
+ cat /sys/block/zramX/recomp_algorithm
+ #1: lzo lzo-rle lz4 lz4hc [zstd]
+ #2: lzo lzo-rle lz4 [lz4hc] zstd
+
+Alternative compression algorithms are sorted by priority. In the example
+above, zstd is used as the first alternative algorithm, which has priority
+of 1, while lz4hc is configured as a compression algorithm with priority 2.
+Alternative compression algorithm's priority is provided during algorithms
+configuration:::
+
+ #select zstd recompression algorithm, priority 1
+ echo "algo=zstd priority=1" > /sys/block/zramX/recomp_algorithm
+
+ #select deflate recompression algorithm, priority 2
+ echo "algo=deflate priority=2" > /sys/block/zramX/recomp_algorithm
+
+Another device attribute that CONFIG_ZRAM_MULTI_COMP enables is recompress,
+which controls recompression.
+
+Examples:::
+
+ #IDLE pages recompression is activated by `idle` mode
+ echo "type=idle" > /sys/block/zramX/recompress
+
+ #HUGE pages recompression is activated by `huge` mode
+ echo "type=huge" > /sys/block/zram0/recompress
+
+ #HUGE_IDLE pages recompression is activated by `huge_idle` mode
+ echo "type=huge_idle" > /sys/block/zramX/recompress
+
+The number of idle pages can be significant, so user-space can pass a size
+threshold (in bytes) to the recompress knob: zram will recompress only pages
+of equal or greater size:::
+
+ #recompress all pages larger than 3000 bytes
+ echo "threshold=3000" > /sys/block/zramX/recompress
+
+ #recompress idle pages larger than 2000 bytes
+ echo "type=idle threshold=2000" > /sys/block/zramX/recompress
+
+Recompression of idle pages requires memory tracking.
+
+During re-compression for every page, that matches re-compression criteria,
+ZRAM iterates the list of registered alternative compression algorithms in
+order of their priorities. ZRAM stops either when re-compression was
+successful (re-compressed object is smaller in size than the original one)
+and matches re-compression criteria (e.g. size threshold) or when there are
+no secondary algorithms left to try. If none of the secondary algorithms can
+successfully re-compressed the page such a page is marked as incompressible,
+so ZRAM will not attempt to re-compress it in the future.
+
+This re-compression behaviour, when it iterates through the list of
+registered compression algorithms, increases our chances of finding the
+algorithm that successfully compresses a particular page. Sometimes, however,
+it is convenient (and sometimes even necessary) to limit recompression to
+only one particular algorithm so that it will not try any other algorithms.
+This can be achieved by providing a algo=NAME parameter:::
+
+ #use zstd algorithm only (if registered)
+ echo "type=huge algo=zstd" > /sys/block/zramX/recompress
+
memory tracking
===============
@@ -392,9 +497,11 @@ pages of the process with*pagemap.
If you enable the feature, you could see block state via
/sys/kernel/debug/zram/zram0/block_state". The output is as follows::
- 300 75.033841 .wh.
- 301 63.806904 s...
- 302 63.806919 ..hi
+ 300 75.033841 .wh...
+ 301 63.806904 s.....
+ 302 63.806919 ..hi..
+ 303 62.801919 ....r.
+ 304 146.781902 ..hi.n
First column
zram's block index.
@@ -411,6 +518,10 @@ Third column
huge page
i:
idle page
+ r:
+ recompressed page (secondary compression algorithm)
+ n:
+ none (including secondary) of algorithms could compress it
First line of above example says 300th block is accessed at 75.033841sec
and the block's state is huge so it is written back to the backing
diff --git a/Documentation/admin-guide/bootconfig.rst b/Documentation/admin-guide/bootconfig.rst
index d6b3b77a4129..91339efdcb54 100644
--- a/Documentation/admin-guide/bootconfig.rst
+++ b/Documentation/admin-guide/bootconfig.rst
@@ -71,6 +71,16 @@ For example,::
foo = bar, baz
foo = qux # !ERROR! we can not re-define same key
+If you want to update the value, you must use the override operator
+``:=`` explicitly. For example::
+
+ foo = bar, baz
+ foo := qux
+
+then, the ``qux`` is assigned to ``foo`` key. This is useful for
+overriding the default value by adding (partial) custom bootconfigs
+without parsing the default bootconfig.
+
If you want to append the value to existing key as an array member,
you can use ``+=`` operator. For example::
@@ -79,12 +89,35 @@ you can use ``+=`` operator. For example::
In this case, the key ``foo`` has ``bar``, ``baz`` and ``qux``.
-However, a sub-key and a value can not co-exist under a parent key.
-For example, following config is NOT allowed.::
+Moreover, sub-keys and a value can coexist under a parent key.
+For example, following config is allowed.::
foo = value1
- foo.bar = value2 # !ERROR! subkey "bar" and value "value1" can NOT co-exist
+ foo.bar = value2
+ foo := value3 # This will update foo's value.
+
+Note, since there is no syntax to put a raw value directly under a
+structured key, you have to define it outside of the brace. For example::
+
+ foo {
+ bar = value1
+ bar {
+ baz = value2
+ qux = value3
+ }
+ }
+
+Also, the order of the value node under a key is fixed. If there
+are a value and subkeys, the value is always the first child node
+of the key. Thus if user specifies subkeys first, e.g.::
+
+ foo.bar = value1
+ foo = value2
+
+In the program (and /proc/bootconfig), it will be shown as below::
+ foo = value2
+ foo.bar = value1
Comments
--------
@@ -125,18 +158,33 @@ Each key-value pair is shown in each line with following style::
Boot Kernel With a Boot Config
==============================
-Since the boot configuration file is loaded with initrd, it will be added
-to the end of the initrd (initramfs) image file with size, checksum and
-12-byte magic word as below.
+There are two options to boot the kernel with bootconfig: attaching the
+bootconfig to the initrd image or embedding it in the kernel itself.
-[initrd][bootconfig][size(u32)][checksum(u32)][#BOOTCONFIG\n]
+Attaching a Boot Config to Initrd
+---------------------------------
+
+Since the boot configuration file is loaded with initrd by default,
+it will be added to the end of the initrd (initramfs) image file with
+padding, size, checksum and 12-byte magic word as below.
+
+[initrd][bootconfig][padding][size(le32)][checksum(le32)][#BOOTCONFIG\n]
+
+The size and checksum fields are unsigned 32bit little endian value.
+
+When the boot configuration is added to the initrd image, the total
+file size is aligned to 4 bytes. To fill the gap, null characters
+(``\0``) will be added. Thus the ``size`` is the length of the bootconfig
+file + padding bytes.
The Linux kernel decodes the last part of the initrd image in memory to
get the boot configuration data.
Because of this "piggyback" method, there is no need to change or
-update the boot loader and the kernel image itself.
+update the boot loader and the kernel image itself as long as the boot
+loader passes the correct initrd file size. If by any chance, the boot
+loader passes a longer size, the kernel fails to find the bootconfig data.
-To do this operation, Linux kernel provides "bootconfig" command under
+To do this operation, Linux kernel provides ``bootconfig`` command under
tools/bootconfig, which allows admin to apply or delete the config file
to/from initrd image. You can build it by the following command::
@@ -153,6 +201,66 @@ To remove the config from the image, you can use -d option as below::
Then add "bootconfig" on the normal kernel command line to tell the
kernel to look for the bootconfig at the end of the initrd file.
+Alternatively, build your kernel with the ``CONFIG_BOOT_CONFIG_FORCE``
+Kconfig option selected.
+
+Embedding a Boot Config into Kernel
+-----------------------------------
+
+If you can not use initrd, you can also embed the bootconfig file in the
+kernel by Kconfig options. In this case, you need to recompile the kernel
+with the following configs::
+
+ CONFIG_BOOT_CONFIG_EMBED=y
+ CONFIG_BOOT_CONFIG_EMBED_FILE="/PATH/TO/BOOTCONFIG/FILE"
+
+``CONFIG_BOOT_CONFIG_EMBED_FILE`` requires an absolute path or a relative
+path to the bootconfig file from source tree or object tree.
+The kernel will embed it as the default bootconfig.
+
+Just as when attaching the bootconfig to the initrd, you need ``bootconfig``
+option on the kernel command line to enable the embedded bootconfig, or,
+alternatively, build your kernel with the ``CONFIG_BOOT_CONFIG_FORCE``
+Kconfig option selected.
+
+Note that even if you set this option, you can override the embedded
+bootconfig by another bootconfig which attached to the initrd.
+
+Kernel parameters via Boot Config
+=================================
+
+In addition to the kernel command line, the boot config can be used for
+passing the kernel parameters. All the key-value pairs under ``kernel``
+key will be passed to kernel cmdline directly. Moreover, the key-value
+pairs under ``init`` will be passed to init process via the cmdline.
+The parameters are concatenated with user-given kernel cmdline string
+as the following order, so that the command line parameter can override
+bootconfig parameters (this depends on how the subsystem handles parameters
+but in general, earlier parameter will be overwritten by later one.)::
+
+ [bootconfig params][cmdline params] -- [bootconfig init params][cmdline init params]
+
+Here is an example of the bootconfig file for kernel/init parameters.::
+
+ kernel {
+ root = 01234567-89ab-cdef-0123-456789abcd
+ }
+ init {
+ splash
+ }
+
+This will be copied into the kernel cmdline string as the following::
+
+ root="01234567-89ab-cdef-0123-456789abcd" -- splash
+
+If user gives some other command line like,::
+
+ ro bootconfig -- quiet
+
+The final kernel cmdline will be the following::
+
+ root="01234567-89ab-cdef-0123-456789abcd" ro bootconfig -- splash quiet
+
Config File Limitation
======================
@@ -165,7 +273,8 @@ up to 512 key-value pairs. If keys contains 3 words in average, it can
contain 256 key-value pairs. In most cases, the number of config items
will be under 100 entries and smaller than 8KB, so it would be enough.
If the node number exceeds 1024, parser returns an error even if the file
-size is smaller than 32KB.
+size is smaller than 32KB. (Note that this maximum size is not including
+the padding null characters.)
Anyway, since bootconfig command verifies it when appending a boot config
to initrd image, user can notice it before boot.
diff --git a/Documentation/admin-guide/bug-bisect.rst b/Documentation/admin-guide/bug-bisect.rst
index 59567da344e8..325c5d0ed34a 100644
--- a/Documentation/admin-guide/bug-bisect.rst
+++ b/Documentation/admin-guide/bug-bisect.rst
@@ -15,7 +15,7 @@ give up. Report as much as you have found to the relevant maintainer. See
MAINTAINERS for who that is for the subsystem you have worked on.
Before you submit a bug report read
-:ref:`Documentation/admin-guide/reporting-bugs.rst <reportingbugs>`.
+'Documentation/admin-guide/reporting-issues.rst'.
Devices not appearing
=====================
diff --git a/Documentation/admin-guide/bug-hunting.rst b/Documentation/admin-guide/bug-hunting.rst
index 44b8a4edd348..95299b08c405 100644
--- a/Documentation/admin-guide/bug-hunting.rst
+++ b/Documentation/admin-guide/bug-hunting.rst
@@ -49,15 +49,19 @@ the issue, it may also contain the word **Oops**, as on this one::
Despite being an **Oops** or some other sort of stack trace, the offended
line is usually required to identify and handle the bug. Along this chapter,
-we'll refer to "Oops" for all kinds of stack traces that need to be analized.
+we'll refer to "Oops" for all kinds of stack traces that need to be analyzed.
-.. note::
+If the kernel is compiled with ``CONFIG_DEBUG_INFO``, you can enhance the
+quality of the stack trace by using file:`scripts/decode_stacktrace.sh`.
+
+Modules linked in
+-----------------
+
+Modules that are tainted or are being loaded or unloaded are marked with
+"(...)", where the taint flags are described in
+file:`Documentation/admin-guide/tainted-kernels.rst`, "being loaded" is
+annotated with "+", and "being unloaded" is annotated with "-".
- ``ksymoops`` is useless on 2.6 or upper. Please use the Oops in its original
- format (from ``dmesg``, etc). Ignore any references in this or other docs to
- "decoding the Oops" or "running it through ksymoops".
- If you post an Oops from 2.6+ that has been run through ``ksymoops``,
- people will just tell you to repost it.
Where is the Oops message is located?
-------------------------------------
@@ -71,7 +75,7 @@ by running ``journalctl`` command.
Sometimes ``klogd`` dies, in which case you can run ``dmesg > file`` to
read the data from the kernel buffers and save it. Or you can
``cat /proc/kmsg > file``, however you have to break in to stop the transfer,
-``kmsg`` is a "never ending file".
+since ``kmsg`` is a "never ending file".
If the machine has crashed so badly that you cannot enter commands or
the disk is not available then you have three options:
@@ -81,9 +85,9 @@ the disk is not available then you have three options:
planned for a crash. Alternatively, you can take a picture of
the screen with a digital camera - not nice, but better than
nothing. If the messages scroll off the top of the console, you
- may find that booting with a higher resolution (eg, ``vga=791``)
+ may find that booting with a higher resolution (e.g., ``vga=791``)
will allow you to read more of the text. (Caveat: This needs ``vesafb``,
- so won't help for 'early' oopses)
+ so won't help for 'early' oopses.)
(2) Boot with a serial console (see
:ref:`Documentation/admin-guide/serial-console.rst <serial_console>`),
@@ -104,7 +108,7 @@ Kernel source file. There are two methods for doing that. Usually, using
gdb
^^^
-The GNU debug (``gdb``) is the best way to figure out the exact file and line
+The GNU debugger (``gdb``) is the best way to figure out the exact file and line
number of the OOPS from the ``vmlinux`` file.
The usage of gdb works best on a kernel compiled with ``CONFIG_DEBUG_INFO``.
@@ -165,7 +169,7 @@ If you have a call trace, such as::
[<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
...
-this shows the problem likely in the :jbd: module. You can load that module
+this shows the problem likely is in the :jbd: module. You can load that module
in gdb and list the relevant code::
$ gdb fs/jbd/jbd.ko
@@ -199,8 +203,9 @@ in the kernel hacking menu of the menu configuration.) For example::
You need to be at the top level of the kernel tree for this to pick up
your C files.
-If you don't have access to the code you can also debug on some crash dumps
-e.g. crash dump output as shown by Dave Miller::
+If you don't have access to the source code you can still debug some crash
+dumps using the following method (example crash dump output as shown by
+Dave Miller)::
EIP is at +0x14/0x4c0
...
@@ -230,6 +235,9 @@ e.g. crash dump output as shown by Dave Miller::
mov 0x8(%ebp), %ebx ! %ebx = skb->sk
mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt
+file:`scripts/decodecode` can be used to automate most of this, depending
+on what CPU architecture is being debugged.
+
Reporting the bug
-----------------
@@ -241,7 +249,7 @@ used for the development of the affected code. This can be done by using
the ``get_maintainer.pl`` script.
For example, if you find a bug at the gspca's sonixj.c file, you can get
-their maintainers with::
+its maintainers with::
$ ./scripts/get_maintainer.pl -f drivers/media/usb/gspca/sonixj.c
Hans Verkuil <hverkuil@xs4all.nl> (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%)
@@ -253,16 +261,17 @@ their maintainers with::
Please notice that it will point to:
-- The last developers that touched on the source code. On the above example,
- Tejun and Bhaktipriya (in this specific case, none really envolved on the
- development of this file);
+- The last developers that touched the source code (if this is done inside
+ a git tree). On the above example, Tejun and Bhaktipriya (in this
+ specific case, none really involved on the development of this file);
- The driver maintainer (Hans Verkuil);
- The subsystem maintainer (Mauro Carvalho Chehab);
- The driver and/or subsystem mailing list (linux-media@vger.kernel.org);
- the Linux Kernel mailing list (linux-kernel@vger.kernel.org).
Usually, the fastest way to have your bug fixed is to report it to mailing
-list used for the development of the code (linux-media ML) copying the driver maintainer (Hans).
+list used for the development of the code (linux-media ML) copying the
+driver maintainer (Hans).
If you are totally stumped as to whom to send the report, and
``get_maintainer.pl`` didn't provide you anything useful, send it to
@@ -303,9 +312,9 @@ protection fault message can be simply cut out of the message files
and forwarded to the kernel developers.
Two types of address resolution are performed by ``klogd``. The first is
-static translation and the second is dynamic translation. Static
-translation uses the System.map file in much the same manner that
-ksymoops does. In order to do static translation the ``klogd`` daemon
+static translation and the second is dynamic translation.
+Static translation uses the System.map file.
+In order to do static translation the ``klogd`` daemon
must be able to find a system map file at daemon initialization time.
See the klogd man page for information on how ``klogd`` searches for map
files.
diff --git a/Documentation/admin-guide/cgroup-v1/blkio-controller.rst b/Documentation/admin-guide/cgroup-v1/blkio-controller.rst
index 36d43ae7dc13..dabb80cdd25a 100644
--- a/Documentation/admin-guide/cgroup-v1/blkio-controller.rst
+++ b/Documentation/admin-guide/cgroup-v1/blkio-controller.rst
@@ -17,36 +17,37 @@ level logical devices like device mapper.
HOWTO
=====
+
Throttling/Upper Limit policy
-----------------------------
-- Enable Block IO controller::
+Enable Block IO controller::
CONFIG_BLK_CGROUP=y
-- Enable throttling in block layer::
+Enable throttling in block layer::
CONFIG_BLK_DEV_THROTTLING=y
-- Mount blkio controller (see cgroups.txt, Why are cgroups needed?)::
+Mount blkio controller (see cgroups.txt, Why are cgroups needed?)::
mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
-- Specify a bandwidth rate on particular device for root group. The format
- for policy is "<major>:<minor> <bytes_per_second>"::
+Specify a bandwidth rate on particular device for root group. The format
+for policy is "<major>:<minor> <bytes_per_second>"::
echo "8:16 1048576" > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device
- Above will put a limit of 1MB/second on reads happening for root group
- on device having major/minor number 8:16.
+This will put a limit of 1MB/second on reads happening for root group
+on device having major/minor number 8:16.
-- Run dd to read a file and see if rate is throttled to 1MB/s or not::
+Run dd to read a file and see if rate is throttled to 1MB/s or not::
# dd iflag=direct if=/mnt/common/zerofile of=/dev/null bs=4K count=1024
1024+0 records in
1024+0 records out
4194304 bytes (4.2 MB) copied, 4.0001 s, 1.0 MB/s
- Limits for writes can be put using blkio.throttle.write_bps_device file.
+Limits for writes can be put using blkio.throttle.write_bps_device file.
Hierarchical Cgroups
====================
@@ -79,85 +80,89 @@ following::
Various user visible config options
===================================
-CONFIG_BLK_CGROUP
- - Block IO controller.
-CONFIG_BFQ_CGROUP_DEBUG
- - Debug help. Right now some additional stats file show up in cgroup
+ CONFIG_BLK_CGROUP
+ Block IO controller.
+
+ CONFIG_BFQ_CGROUP_DEBUG
+ Debug help. Right now some additional stats file show up in cgroup
if this option is enabled.
-CONFIG_BLK_DEV_THROTTLING
- - Enable block device throttling support in block layer.
+ CONFIG_BLK_DEV_THROTTLING
+ Enable block device throttling support in block layer.
Details of cgroup files
=======================
+
Proportional weight policy files
--------------------------------
-- blkio.weight
- - Specifies per cgroup weight. This is default weight of the group
- on all the devices until and unless overridden by per device rule.
- (See blkio.weight_device).
- Currently allowed range of weights is from 10 to 1000.
-- blkio.weight_device
- - One can specify per cgroup per device rules using this interface.
- These rules override the default value of group weight as specified
- by blkio.weight.
+ blkio.bfq.weight
+ Specifies per cgroup weight. This is default weight of the group
+ on all the devices until and unless overridden by per device rule
+ (see `blkio.bfq.weight_device` below).
+
+ Currently allowed range of weights is from 1 to 1000. For more details,
+ see Documentation/block/bfq-iosched.rst.
+
+ blkio.bfq.weight_device
+ Specifies per cgroup per device weights, overriding the default group
+ weight. For more details, see Documentation/block/bfq-iosched.rst.
Following is the format::
- # echo dev_maj:dev_minor weight > blkio.weight_device
+ # echo dev_maj:dev_minor weight > blkio.bfq.weight_device
Configure weight=300 on /dev/sdb (8:16) in this cgroup::
- # echo 8:16 300 > blkio.weight_device
- # cat blkio.weight_device
+ # echo 8:16 300 > blkio.bfq.weight_device
+ # cat blkio.bfq.weight_device
dev weight
8:16 300
Configure weight=500 on /dev/sda (8:0) in this cgroup::
- # echo 8:0 500 > blkio.weight_device
- # cat blkio.weight_device
+ # echo 8:0 500 > blkio.bfq.weight_device
+ # cat blkio.bfq.weight_device
dev weight
8:0 500
8:16 300
Remove specific weight for /dev/sda in this cgroup::
- # echo 8:0 0 > blkio.weight_device
- # cat blkio.weight_device
+ # echo 8:0 0 > blkio.bfq.weight_device
+ # cat blkio.bfq.weight_device
dev weight
8:16 300
-- blkio.time
- - disk time allocated to cgroup per device in milliseconds. First
+ blkio.time
+ Disk time allocated to cgroup per device in milliseconds. First
two fields specify the major and minor number of the device and
third field specifies the disk time allocated to group in
milliseconds.
-- blkio.sectors
- - number of sectors transferred to/from disk by the group. First
+ blkio.sectors
+ Number of sectors transferred to/from disk by the group. First
two fields specify the major and minor number of the device and
third field specifies the number of sectors transferred by the
group to/from the device.
-- blkio.io_service_bytes
- - Number of bytes transferred to/from the disk by the group. These
+ blkio.io_service_bytes
+ Number of bytes transferred to/from the disk by the group. These
are further divided by the type of operation - read or write, sync
or async. First two fields specify the major and minor number of the
device, third field specifies the operation type and the fourth field
specifies the number of bytes.
-- blkio.io_serviced
- - Number of IOs (bio) issued to the disk by the group. These
+ blkio.io_serviced
+ Number of IOs (bio) issued to the disk by the group. These
are further divided by the type of operation - read or write, sync
or async. First two fields specify the major and minor number of the
device, third field specifies the operation type and the fourth field
specifies the number of IOs.
-- blkio.io_service_time
- - Total amount of time between request dispatch and request completion
+ blkio.io_service_time
+ Total amount of time between request dispatch and request completion
for the IOs done by this cgroup. This is in nanoseconds to make it
meaningful for flash devices too. For devices with queue depth of 1,
this time represents the actual service time. When queue_depth > 1,
@@ -170,8 +175,8 @@ Proportional weight policy files
specifies the operation type and the fourth field specifies the
io_service_time in ns.
-- blkio.io_wait_time
- - Total amount of time the IOs for this cgroup spent waiting in the
+ blkio.io_wait_time
+ Total amount of time the IOs for this cgroup spent waiting in the
scheduler queues for service. This can be greater than the total time
elapsed since it is cumulative io_wait_time for all IOs. It is not a
measure of total time the cgroup spent waiting but rather a measure of
@@ -185,24 +190,24 @@ Proportional weight policy files
minor number of the device, third field specifies the operation type
and the fourth field specifies the io_wait_time in ns.
-- blkio.io_merged
- - Total number of bios/requests merged into requests belonging to this
+ blkio.io_merged
+ Total number of bios/requests merged into requests belonging to this
cgroup. This is further divided by the type of operation - read or
write, sync or async.
-- blkio.io_queued
- - Total number of requests queued up at any given instant for this
+ blkio.io_queued
+ Total number of requests queued up at any given instant for this
cgroup. This is further divided by the type of operation - read or
write, sync or async.
-- blkio.avg_queue_size
- - Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
+ blkio.avg_queue_size
+ Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
The average queue size for this cgroup over the entire time of this
cgroup's existence. Queue size samples are taken each time one of the
queues of this cgroup gets a timeslice.
-- blkio.group_wait_time
- - Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
+ blkio.group_wait_time
+ Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
This is the amount of time the cgroup had to wait since it became busy
(i.e., went from 0 to 1 request queued) to get a timeslice for one of
its queues. This is different from the io_wait_time which is the
@@ -212,8 +217,8 @@ Proportional weight policy files
will only report the group_wait_time accumulated till the last time it
got a timeslice and will not include the current delta.
-- blkio.empty_time
- - Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
+ blkio.empty_time
+ Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
This is the amount of time a cgroup spends without any pending
requests when not being served, i.e., it does not include any time
spent idling for one of the queues of the cgroup. This is in
@@ -221,8 +226,8 @@ Proportional weight policy files
the stat will only report the empty_time accumulated till the last
time it had a pending request and will not include the current delta.
-- blkio.idle_time
- - Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
+ blkio.idle_time
+ Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
This is the amount of time spent by the IO scheduler idling for a
given cgroup in anticipation of a better request than the existing ones
from other queues/cgroups. This is in nanoseconds. If this is read
@@ -230,60 +235,60 @@ Proportional weight policy files
idle_time accumulated till the last idle period and will not include
the current delta.
-- blkio.dequeue
- - Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This
+ blkio.dequeue
+ Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This
gives the statistics about how many a times a group was dequeued
from service tree of the device. First two fields specify the major
and minor number of the device and third field specifies the number
of times a group was dequeued from a particular device.
-- blkio.*_recursive
- - Recursive version of various stats. These files show the
+ blkio.*_recursive
+ Recursive version of various stats. These files show the
same information as their non-recursive counterparts but
include stats from all the descendant cgroups.
Throttling/Upper limit policy files
-----------------------------------
-- blkio.throttle.read_bps_device
- - Specifies upper limit on READ rate from the device. IO rate is
+ blkio.throttle.read_bps_device
+ Specifies upper limit on READ rate from the device. IO rate is
specified in bytes per second. Rules are per device. Following is
the format::
echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.throttle.read_bps_device
-- blkio.throttle.write_bps_device
- - Specifies upper limit on WRITE rate to the device. IO rate is
+ blkio.throttle.write_bps_device
+ Specifies upper limit on WRITE rate to the device. IO rate is
specified in bytes per second. Rules are per device. Following is
the format::
echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.throttle.write_bps_device
-- blkio.throttle.read_iops_device
- - Specifies upper limit on READ rate from the device. IO rate is
+ blkio.throttle.read_iops_device
+ Specifies upper limit on READ rate from the device. IO rate is
specified in IO per second. Rules are per device. Following is
the format::
echo "<major>:<minor> <rate_io_per_second>" > /cgrp/blkio.throttle.read_iops_device
-- blkio.throttle.write_iops_device
- - Specifies upper limit on WRITE rate to the device. IO rate is
+ blkio.throttle.write_iops_device
+ Specifies upper limit on WRITE rate to the device. IO rate is
specified in io per second. Rules are per device. Following is
the format::
echo "<major>:<minor> <rate_io_per_second>" > /cgrp/blkio.throttle.write_iops_device
-Note: If both BW and IOPS rules are specified for a device, then IO is
- subjected to both the constraints.
+ Note: If both BW and IOPS rules are specified for a device, then IO is
+ subjected to both the constraints.
-- blkio.throttle.io_serviced
- - Number of IOs (bio) issued to the disk by the group. These
+ blkio.throttle.io_serviced
+ Number of IOs (bio) issued to the disk by the group. These
are further divided by the type of operation - read or write, sync
or async. First two fields specify the major and minor number of the
device, third field specifies the operation type and the fourth field
specifies the number of IOs.
-- blkio.throttle.io_service_bytes
- - Number of bytes transferred to/from the disk by the group. These
+ blkio.throttle.io_service_bytes
+ Number of bytes transferred to/from the disk by the group. These
are further divided by the type of operation - read or write, sync
or async. First two fields specify the major and minor number of the
device, third field specifies the operation type and the fourth field
@@ -291,6 +296,6 @@ Note: If both BW and IOPS rules are specified for a device, then IO is
Common files among various policies
-----------------------------------
-- blkio.reset_stats
- - Writing an int to this file will result in resetting all the stats
+ blkio.reset_stats
+ Writing an int to this file will result in resetting all the stats
for that cgroup.
diff --git a/Documentation/admin-guide/cgroup-v1/cgroups.rst b/Documentation/admin-guide/cgroup-v1/cgroups.rst
index b0688011ed06..9343148ee993 100644
--- a/Documentation/admin-guide/cgroup-v1/cgroups.rst
+++ b/Documentation/admin-guide/cgroup-v1/cgroups.rst
@@ -80,6 +80,8 @@ access. For example, cpusets (see Documentation/admin-guide/cgroup-v1/cpusets.rs
you to associate a set of CPUs and a set of memory nodes with the
tasks in each cgroup.
+.. _cgroups-why-needed:
+
1.2 Why are cgroups needed ?
----------------------------
diff --git a/Documentation/admin-guide/cgroup-v1/cpusets.rst b/Documentation/admin-guide/cgroup-v1/cpusets.rst
index 7ade3abd342a..ae646d621a8a 100644
--- a/Documentation/admin-guide/cgroup-v1/cpusets.rst
+++ b/Documentation/admin-guide/cgroup-v1/cpusets.rst
@@ -1,3 +1,5 @@
+.. _cpusets:
+
=======
CPUSETS
=======
@@ -717,7 +719,7 @@ There are ways to query or modify cpusets:
cat, rmdir commands from the shell, or their equivalent from C.
- via the C library libcpuset.
- via the C library libcgroup.
- (http://sourceforge.net/projects/libcg/)
+ (https://github.com/libcgroup/libcgroup/)
- via the python application cset.
(http://code.google.com/p/cpuset/)
diff --git a/Documentation/admin-guide/cgroup-v1/hugetlb.rst b/Documentation/admin-guide/cgroup-v1/hugetlb.rst
index 338f2c7d7a1c..0fa724d82abb 100644
--- a/Documentation/admin-guide/cgroup-v1/hugetlb.rst
+++ b/Documentation/admin-guide/cgroup-v1/hugetlb.rst
@@ -29,12 +29,14 @@ Brief summary of control files::
hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded
hugetlb.<hugepagesize>.usage_in_bytes # show current usage for "hugepagesize" hugetlb
hugetlb.<hugepagesize>.failcnt # show the number of allocation failure due to HugeTLB usage limit
+ hugetlb.<hugepagesize>.numa_stat # show the numa information of the hugetlb memory charged to this cgroup
For a system supporting three hugepage sizes (64k, 32M and 1G), the control
files include::
hugetlb.1GB.limit_in_bytes
hugetlb.1GB.max_usage_in_bytes
+ hugetlb.1GB.numa_stat
hugetlb.1GB.usage_in_bytes
hugetlb.1GB.failcnt
hugetlb.1GB.rsvd.limit_in_bytes
@@ -43,6 +45,7 @@ files include::
hugetlb.1GB.rsvd.failcnt
hugetlb.64KB.limit_in_bytes
hugetlb.64KB.max_usage_in_bytes
+ hugetlb.64KB.numa_stat
hugetlb.64KB.usage_in_bytes
hugetlb.64KB.failcnt
hugetlb.64KB.rsvd.limit_in_bytes
@@ -51,6 +54,7 @@ files include::
hugetlb.64KB.rsvd.failcnt
hugetlb.32MB.limit_in_bytes
hugetlb.32MB.max_usage_in_bytes
+ hugetlb.32MB.numa_stat
hugetlb.32MB.usage_in_bytes
hugetlb.32MB.failcnt
hugetlb.32MB.rsvd.limit_in_bytes
diff --git a/Documentation/admin-guide/cgroup-v1/index.rst b/Documentation/admin-guide/cgroup-v1/index.rst
index 226f64473e8e..99fbc8a64ba9 100644
--- a/Documentation/admin-guide/cgroup-v1/index.rst
+++ b/Documentation/admin-guide/cgroup-v1/index.rst
@@ -17,6 +17,7 @@ Control Groups version 1
hugetlb
memcg_test
memory
+ misc
net_cls
net_prio
pids
diff --git a/Documentation/admin-guide/cgroup-v1/memcg_test.rst b/Documentation/admin-guide/cgroup-v1/memcg_test.rst
index 3f7115e07b5d..1f128458ddea 100644
--- a/Documentation/admin-guide/cgroup-v1/memcg_test.rst
+++ b/Documentation/admin-guide/cgroup-v1/memcg_test.rst
@@ -62,7 +62,7 @@ Please note that implementation details can be changed.
At cancel(), simply usage -= PAGE_SIZE.
-Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
+Under below explanation, we assume CONFIG_SWAP=y.
4. Anonymous
============
@@ -97,7 +97,7 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
=============
Page Cache is charged at
- - add_to_page_cache_locked().
+ - filemap_add_folio().
The logic is very clear. (About migration, see below)
@@ -133,18 +133,9 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
8. LRU
======
- Each memcg has its own private LRU. Now, its handling is under global
- VM's control (means that it's handled under global pgdat->lru_lock).
- Almost all routines around memcg's LRU is called by global LRU's
- list management functions under pgdat->lru_lock.
-
- A special function is mem_cgroup_isolate_pages(). This scans
- memcg's private LRU and call __isolate_lru_page() to extract a page
- from LRU.
-
- (By __isolate_lru_page(), the page is removed from both of global and
- private LRU.)
-
+ Each memcg has its own vector of LRUs (inactive anon, active anon,
+ inactive file, active file, unevictable) of pages from each node,
+ each LRU handled under a single lru_lock for that memcg and node.
9. Typical Tests.
=================
@@ -219,13 +210,11 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
This is an easy way to test page migration, too.
-9.5 mkdir/rmdir
----------------
+9.5 nested cgroups
+------------------
- When using hierarchy, mkdir/rmdir test should be done.
- Use tests like the following::
+ Use tests like the following for testing nested cgroups::
- echo 1 >/opt/cgroup/01/memory/use_hierarchy
mkdir /opt/cgroup/01/child_a
mkdir /opt/cgroup/01/child_b
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 0ae4f564c2d6..ca7d9402f6be 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -2,18 +2,18 @@
Memory Resource Controller
==========================
-NOTE:
+.. caution::
This document is hopelessly outdated and it asks for a complete
rewrite. It still contains a useful information so we are keeping it
here but make sure to check the current code if you need a deeper
understanding.
-NOTE:
+.. note::
The Memory Resource Controller has generically been referred to as the
memory controller in this document. Do not confuse memory controller
used here with the memory controller that is used in hardware.
-(For editors) In this document:
+.. hint::
When we mention a cgroup (cgroupfs's directory) with memory controller,
we call it "memory cgroup". When you see git-log and source code, you'll
see patch's title and function names tend to use "memcg".
@@ -23,7 +23,7 @@ Benefits and Purpose of the memory controller
=============================================
The memory controller isolates the memory behaviour of a group of tasks
-from the rest of the system. The article on LWN [12] mentions some probable
+from the rest of the system. The article on LWN [12]_ mentions some probable
uses of the memory controller. The memory controller can be used to
a. Isolate an application or a group of applications
@@ -55,7 +55,8 @@ Features:
- Root cgroup has no limit controls.
Kernel memory support is a work in progress, and the current version provides
- basically functionality. (See Section 2.7)
+ basically functionality. (See :ref:`section 2.7
+ <cgroup-v1-memory-kernel-extension>`)
Brief summary of control files.
@@ -64,6 +65,7 @@ Brief summary of control files.
threads
cgroup.procs show list of processes
cgroup.event_control an interface for event_fd()
+ This knob is not available on CONFIG_PREEMPT_RT systems.
memory.usage_in_bytes show current usage for memory
(See 5.5 for details)
memory.memsw.usage_in_bytes show current usage for memory+Swap
@@ -75,20 +77,28 @@ Brief summary of control files.
memory.max_usage_in_bytes show max memory usage recorded
memory.memsw.max_usage_in_bytes show max memory+Swap usage recorded
memory.soft_limit_in_bytes set/show soft limit of memory usage
+ This knob is not available on CONFIG_PREEMPT_RT systems.
memory.stat show various statistics
memory.use_hierarchy set/show hierarchical account enabled
+ This knob is deprecated and shouldn't be
+ used.
memory.force_empty trigger forced page reclaim
memory.pressure_level set memory pressure notifications
memory.swappiness set/show swappiness parameter of vmscan
(See sysctl's vm.swappiness)
memory.move_charge_at_immigrate set/show controls of moving charges
+ This knob is deprecated and shouldn't be
+ used.
memory.oom_control set/show oom controls.
memory.numa_stat show the number of memory usage per numa
node
- memory.kmem.limit_in_bytes set/show hard limit for kernel memory
- This knob is deprecated and shouldn't be
- used. It is planned that this be removed in
- the foreseeable future.
+ memory.kmem.limit_in_bytes Deprecated knob to set and read the kernel
+ memory hard limit. Kernel hard limit is not
+ supported since 5.16. Writing any value to
+ do file will not have any effect same as if
+ nokmem kernel parameter was specified.
+ Kernel memory is still charged and reported
+ by memory.kmem.usage_in_bytes.
memory.kmem.usage_in_bytes show current kernel memory allocation
memory.kmem.failcnt show the number of kernel memory usage
hits limits
@@ -105,16 +115,16 @@ Brief summary of control files.
==========
The memory controller has a long history. A request for comments for the memory
-controller was posted by Balbir Singh [1]. At the time the RFC was posted
+controller was posted by Balbir Singh [1]_. At the time the RFC was posted
there were several implementations for memory control. The goal of the
RFC was to build consensus and agreement for the minimal features required
-for memory control. The first RSS controller was posted by Balbir Singh[2]
-in Feb 2007. Pavel Emelianov [3][4][5] has since posted three versions of the
-RSS controller. At OLS, at the resource management BoF, everyone suggested
-that we handle both page cache and RSS together. Another request was raised
-to allow user space handling of OOM. The current memory controller is
+for memory control. The first RSS controller was posted by Balbir Singh [2]_
+in Feb 2007. Pavel Emelianov [3]_ [4]_ [5]_ has since posted three versions
+of the RSS controller. At OLS, at the resource management BoF, everyone
+suggested that we handle both page cache and RSS together. Another request was
+raised to allow user space handling of OOM. The current memory controller is
at version 6; it combines both mapped (RSS) and unmapped Page
-Cache Control [11].
+Cache Control [11]_.
2. Memory Control
=================
@@ -145,7 +155,8 @@ specific data structure (mem_cgroup) associated with it.
2.2. Accounting
---------------
-::
+.. code-block::
+ :caption: Figure 1: Hierarchy of Accounting
+--------------------+
| mem_cgroup |
@@ -165,7 +176,6 @@ specific data structure (mem_cgroup) associated with it.
| | | |
+---------------+ +---------------+
- (Figure 1: Hierarchy of Accounting)
Figure 1 shows the important aspects of the controller
@@ -192,18 +202,18 @@ are not accounted. We just account pages under usual VM management.
RSS pages are accounted at page_fault unless they've already been accounted
for earlier. A file page will be accounted for as Page Cache when it's
-inserted into inode (radix-tree). While it's mapped into the page tables of
+inserted into inode (xarray). While it's mapped into the page tables of
processes, duplicate accounting is carefully avoided.
An RSS page is unaccounted when it's fully unmapped. A PageCache page is
-unaccounted when it's removed from radix-tree. Even if RSS pages are fully
+unaccounted when it's removed from xarray. Even if RSS pages are fully
unmapped (by kswapd), they may exist as SwapCache in the system until they
are really freed. Such SwapCaches are also accounted.
-A swapped-in page is not accounted until it's mapped.
+A swapped-in page is accounted after adding into swapcache.
Note: The kernel does swapin-readahead and reads multiple swaps at once.
-This means swapped-in pages may contain pages for other tasks than a task
-causing page fault. So, we avoid accounting at swap-in I/O.
+Since page's memcg recorded into swap whatever memsw enabled, the page will
+be accounted after swapin.
At page migration, accounting information is kept.
@@ -219,21 +229,17 @@ behind this approach is that a cgroup that aggressively uses a shared
page will eventually get charged for it (once it is uncharged from
the cgroup that brought it in -- this will happen on memory pressure).
-But see section 8.2: when moving a task to another cgroup, its pages may
-be recharged to the new cgroup, if move_charge_at_immigrate has been chosen.
-
-Exception: If CONFIG_MEMCG_SWAP is not used.
-When you do swapoff and make swapped-out pages of shmem(tmpfs) to
-be backed into memory in force, charges for pages are accounted against the
-caller of swapoff rather than the users of shmem.
+But see :ref:`section 8.2 <cgroup-v1-memory-movable-charges>` when moving a
+task to another cgroup, its pages may be recharged to the new cgroup, if
+move_charge_at_immigrate has been chosen.
-2.4 Swap Extension (CONFIG_MEMCG_SWAP)
+2.4 Swap Extension
--------------------------------------
-Swap Extension allows you to record charge for swap. A swapped-in page is
-charged back to original page allocator if possible.
+Swap usage is always recorded for each of cgroup. Swap Extension allows you to
+read and limit it.
-When swap is accounted, following files are added.
+When CONFIG_SWAP is enabled, following files are added.
- memory.memsw.usage_in_bytes.
- memory.memsw.limit_in_bytes.
@@ -247,7 +253,8 @@ In this case, setting memsw.limit_in_bytes=3G will prevent bad use of swap.
By using the memsw limit, you can avoid system OOM which can be caused by swap
shortage.
-**why 'memory+swap' rather than swap**
+2.4.1 why 'memory+swap' rather than swap
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
to move account from memory to swap...there is no change in usage of
@@ -255,7 +262,8 @@ memory+swap. In other words, when we want to limit the usage of swap without
affecting global LRU, memory+swap limit is better than just limiting swap from
an OS point of view.
-**What happens when a cgroup hits memory.memsw.limit_in_bytes**
+2.4.2. What happens when a cgroup hits memory.memsw.limit_in_bytes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When a cgroup hits memory.memsw.limit_in_bytes, it's useless to do swap-out
in this cgroup. Then, swap-out will not be done by cgroup routine and file
@@ -271,41 +279,40 @@ global VM. When a cgroup goes over its limit, we first try
to reclaim memory from the cgroup so as to make space for the new
pages that the cgroup has touched. If the reclaim is unsuccessful,
an OOM routine is invoked to select and kill the bulkiest task in the
-cgroup. (See 10. OOM Control below.)
+cgroup. (See :ref:`10. OOM Control <cgroup-v1-memory-oom-control>` below.)
The reclaim algorithm has not been modified for cgroups, except that
pages that are selected for reclaiming come from the per-cgroup LRU
list.
-NOTE:
- Reclaim does not work for the root cgroup, since we cannot set any
- limits on the root cgroup.
+.. note::
+ Reclaim does not work for the root cgroup, since we cannot set any
+ limits on the root cgroup.
-Note2:
- When panic_on_oom is set to "2", the whole system will panic.
+.. note::
+ When panic_on_oom is set to "2", the whole system will panic.
When oom event notifier is registered, event will be delivered.
-(See oom_control section)
+(See :ref:`oom_control <cgroup-v1-memory-oom-control>` section)
2.6 Locking
-----------
- lock_page_cgroup()/unlock_page_cgroup() should not be called under
- the i_pages lock.
+Lock order is as follows::
- Other lock order is following:
+ Page lock (PG_locked bit of page->flags)
+ mm->page_table_lock or split pte_lock
+ folio_memcg_lock (memcg->move_lock)
+ mapping->i_pages lock
+ lruvec->lru_lock.
- PG_locked.
- mm->page_table_lock
- pgdat->lru_lock
- lock_page_cgroup.
+Per-node-per-memcgroup LRU (cgroup's private LRU) is guarded by
+lruvec->lru_lock; PG_lru bit of page->flags is cleared before
+isolating a page from its LRU under lruvec->lru_lock.
- In many cases, just lock_page_cgroup() is called.
+.. _cgroup-v1-memory-kernel-extension:
- per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by
- pgdat->lru_lock, it has no lock of its own.
-
-2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM)
+2.7 Kernel Memory Extension
-----------------------------------------------
With the Kernel memory extension, the Memory Controller is able to limit
@@ -366,17 +373,17 @@ U != 0, K = unlimited:
U != 0, K < U:
Kernel memory is a subset of the user memory. This setup is useful in
- deployments where the total amount of memory per-cgroup is overcommited.
- Overcommiting kernel memory limits is definitely not recommended, since the
+ deployments where the total amount of memory per-cgroup is overcommitted.
+ Overcommitting kernel memory limits is definitely not recommended, since the
box can still run out of non-reclaimable memory.
In this case, the admin could set up K so that the sum of all groups is
never greater than the total memory, and freely set U at the cost of his
QoS.
-WARNING:
- In the current implementation, memory reclaim will NOT be
- triggered for a cgroup when it hits K while staying below U, which makes
- this setup impractical.
+ .. warning::
+ In the current implementation, memory reclaim will NOT be triggered for
+ a cgroup when it hits K while staying below U, which makes this setup
+ impractical.
U != 0, K >= U:
Since kmem charges will also be fed to the user counter and reclaim will be
@@ -387,47 +394,41 @@ U != 0, K >= U:
3. User Interface
=================
-3.0. Configuration
-------------------
-
-a. Enable CONFIG_CGROUPS
-b. Enable CONFIG_MEMCG
-c. Enable CONFIG_MEMCG_SWAP (to use swap extension)
-d. Enable CONFIG_MEMCG_KMEM (to use kmem extension)
-
-3.1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
--------------------------------------------------------------------
+To use the user interface:
-::
+1. Enable CONFIG_CGROUPS and CONFIG_MEMCG options
+2. Prepare the cgroups (see :ref:`Why are cgroups needed?
+ <cgroups-why-needed>` for the background information)::
# mount -t tmpfs none /sys/fs/cgroup
# mkdir /sys/fs/cgroup/memory
# mount -t cgroup none /sys/fs/cgroup/memory -o memory
-3.2. Make the new group and move bash into it::
+3. Make the new group and move bash into it::
# mkdir /sys/fs/cgroup/memory/0
# echo $$ > /sys/fs/cgroup/memory/0/tasks
-Since now we're in the 0 cgroup, we can alter the memory limit::
+4. Since now we're in the 0 cgroup, we can alter the memory limit::
# echo 4M > /sys/fs/cgroup/memory/0/memory.limit_in_bytes
-NOTE:
- We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
- mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes,
- Gibibytes.)
+ The limit can now be queried::
-NOTE:
- We can write "-1" to reset the ``*.limit_in_bytes(unlimited)``.
+ # cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes
+ 4194304
-NOTE:
- We cannot set limits on the root cgroup any more.
+.. note::
+ We can use a suffix (k, K, m, M, g or G) to indicate values in kilo,
+ mega or gigabytes. (Here, Kilo, Mega, Giga are Kibibytes, Mebibytes,
+ Gibibytes.)
-::
+.. note::
+ We can write "-1" to reset the ``*.limit_in_bytes(unlimited)``.
+
+.. note::
+ We cannot set limits on the root cgroup any more.
- # cat /sys/fs/cgroup/memory/0/memory.limit_in_bytes
- 4194304
We can check the usage::
@@ -466,6 +467,8 @@ test because it has noise of shared objects/status.
But the above two are testing extreme situations.
Trying usual test under memory controller is always helpful.
+.. _cgroup-v1-memory-test-troubleshoot:
+
4.1 Troubleshooting
-------------------
@@ -478,8 +481,11 @@ terminated by the OOM killer. There are several causes for this:
A sync followed by echo 1 > /proc/sys/vm/drop_caches will help get rid of
some of the pages cached in the cgroup (page cache pages).
-To know what happens, disabling OOM_Kill as per "10. OOM Control" (below) and
-seeing what happens will be helpful.
+To know what happens, disabling OOM_Kill as per :ref:`"10. OOM Control"
+<cgroup-v1-memory-oom-control>` (below) and seeing what happens will be
+helpful.
+
+.. _cgroup-v1-memory-test-task-migration:
4.2 Task migration
------------------
@@ -490,26 +496,24 @@ remain charged to it, the charge is dropped when the page is freed or
reclaimed.
You can move charges of a task along with task migration.
-See 8. "Move charges at task migration"
+See :ref:`8. "Move charges at task migration" <cgroup-v1-memory-move-charges>`
4.3 Removing a cgroup
---------------------
-A cgroup can be removed by rmdir, but as discussed in sections 4.1 and 4.2, a
-cgroup might have some charge associated with it, even though all
-tasks have migrated away from it. (because we charge against pages, not
-against tasks.)
+A cgroup can be removed by rmdir, but as discussed in :ref:`sections 4.1
+<cgroup-v1-memory-test-troubleshoot>` and :ref:`4.2
+<cgroup-v1-memory-test-task-migration>`, a cgroup might have some charge
+associated with it, even though all tasks have migrated away from it. (because
+we charge against pages, not against tasks.)
-We move the stats to root (if use_hierarchy==0) or parent (if
-use_hierarchy==1), and no change on the charge except uncharging
+We move the stats to parent, and no change on the charge except uncharging
from the child.
Charges recorded in swap information is not updated at removal of cgroup.
Recorded information is discarded and a cgroup which uses swap (swapcache)
will be charged as a new owner of it.
-About use_hierarchy, see Section 6.
-
5. Misc. interfaces
===================
@@ -527,76 +531,70 @@ About use_hierarchy, see Section 6.
charged file caches. Some out-of-use page caches may keep charged until
memory pressure happens. If you want to avoid that, force_empty will be useful.
- Also, note that when memory.kmem.limit_in_bytes is set the charges due to
- kernel pages will still be seen. This is not considered a failure and the
- write will still return success. In this case, it is expected that
- memory.kmem.usage_in_bytes == memory.usage_in_bytes.
-
- About use_hierarchy, see Section 6.
-
5.2 stat file
-------------
-memory.stat file includes following statistics
-
-per-memory cgroup local status
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-=============== ===============================================================
-cache # of bytes of page cache memory.
-rss # of bytes of anonymous and swap cache memory (includes
- transparent hugepages).
-rss_huge # of bytes of anonymous transparent hugepages.
-mapped_file # of bytes of mapped file (includes tmpfs/shmem)
-pgpgin # of charging events to the memory cgroup. The charging
- event happens each time a page is accounted as either mapped
- anon page(RSS) or cache page(Page Cache) to the cgroup.
-pgpgout # of uncharging events to the memory cgroup. The uncharging
- event happens each time a page is unaccounted from the cgroup.
-swap # of bytes of swap usage
-dirty # of bytes that are waiting to get written back to the disk.
-writeback # of bytes of file/anon cache that are queued for syncing to
- disk.
-inactive_anon # of bytes of anonymous and swap cache memory on inactive
- LRU list.
-active_anon # of bytes of anonymous and swap cache memory on active
- LRU list.
-inactive_file # of bytes of file-backed memory on inactive LRU list.
-active_file # of bytes of file-backed memory on active LRU list.
-unevictable # of bytes of memory that cannot be reclaimed (mlocked etc).
-=============== ===============================================================
-
-status considering hierarchy (see memory.use_hierarchy settings)
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-========================= ===================================================
-hierarchical_memory_limit # of bytes of memory limit with regard to hierarchy
- under which the memory cgroup is
-hierarchical_memsw_limit # of bytes of memory+swap limit with regard to
- hierarchy under which memory cgroup is.
-
-total_<counter> # hierarchical version of <counter>, which in
- addition to the cgroup's own value includes the
- sum of all hierarchical children's values of
- <counter>, i.e. total_cache
-========================= ===================================================
-
-The following additional stats are dependent on CONFIG_DEBUG_VM
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-========================= ========================================
-recent_rotated_anon VM internal parameter. (see mm/vmscan.c)
-recent_rotated_file VM internal parameter. (see mm/vmscan.c)
-recent_scanned_anon VM internal parameter. (see mm/vmscan.c)
-recent_scanned_file VM internal parameter. (see mm/vmscan.c)
-========================= ========================================
-
-Memo:
+memory.stat file includes following statistics:
+
+ * per-memory cgroup local status
+
+ =============== ===============================================================
+ cache # of bytes of page cache memory.
+ rss # of bytes of anonymous and swap cache memory (includes
+ transparent hugepages).
+ rss_huge # of bytes of anonymous transparent hugepages.
+ mapped_file # of bytes of mapped file (includes tmpfs/shmem)
+ pgpgin # of charging events to the memory cgroup. The charging
+ event happens each time a page is accounted as either mapped
+ anon page(RSS) or cache page(Page Cache) to the cgroup.
+ pgpgout # of uncharging events to the memory cgroup. The uncharging
+ event happens each time a page is unaccounted from the
+ cgroup.
+ swap # of bytes of swap usage
+ swapcached # of bytes of swap cached in memory
+ dirty # of bytes that are waiting to get written back to the disk.
+ writeback # of bytes of file/anon cache that are queued for syncing to
+ disk.
+ inactive_anon # of bytes of anonymous and swap cache memory on inactive
+ LRU list.
+ active_anon # of bytes of anonymous and swap cache memory on active
+ LRU list.
+ inactive_file # of bytes of file-backed memory and MADV_FREE anonymous
+ memory (LazyFree pages) on inactive LRU list.
+ active_file # of bytes of file-backed memory on active LRU list.
+ unevictable # of bytes of memory that cannot be reclaimed (mlocked etc).
+ =============== ===============================================================
+
+ * status considering hierarchy (see memory.use_hierarchy settings):
+
+ ========================= ===================================================
+ hierarchical_memory_limit # of bytes of memory limit with regard to
+ hierarchy
+ under which the memory cgroup is
+ hierarchical_memsw_limit # of bytes of memory+swap limit with regard to
+ hierarchy under which memory cgroup is.
+
+ total_<counter> # hierarchical version of <counter>, which in
+ addition to the cgroup's own value includes the
+ sum of all hierarchical children's values of
+ <counter>, i.e. total_cache
+ ========================= ===================================================
+
+ * additional vm parameters (depends on CONFIG_DEBUG_VM):
+
+ ========================= ========================================
+ recent_rotated_anon VM internal parameter. (see mm/vmscan.c)
+ recent_rotated_file VM internal parameter. (see mm/vmscan.c)
+ recent_scanned_anon VM internal parameter. (see mm/vmscan.c)
+ recent_scanned_file VM internal parameter. (see mm/vmscan.c)
+ ========================= ========================================
+
+.. hint::
recent_rotated means recent frequency of LRU rotation.
recent_scanned means recent # of scans to LRU.
showing for better debug please see the code for meanings.
-Note:
+.. note::
Only anonymous and swap cache memory is listed as part of 'rss' stat.
This should not be confused with the true 'resident set size' or the
amount of physical memory used by the cgroup.
@@ -680,31 +678,20 @@ hierarchy::
d e
In the diagram above, with hierarchical accounting enabled, all memory
-usage of e, is accounted to its ancestors up until the root (i.e, c and root),
-that has memory.use_hierarchy enabled. If one of the ancestors goes over its
-limit, the reclaim algorithm reclaims from the tasks in the ancestor and the
-children of the ancestor.
-
-6.1 Enabling hierarchical accounting and reclaim
-------------------------------------------------
+usage of e, is accounted to its ancestors up until the root (i.e, c and root).
+If one of the ancestors goes over its limit, the reclaim algorithm reclaims
+from the tasks in the ancestor and the children of the ancestor.
-A memory cgroup by default disables the hierarchy feature. Support
-can be enabled by writing 1 to memory.use_hierarchy file of the root cgroup::
+6.1 Hierarchical accounting and reclaim
+---------------------------------------
- # echo 1 > memory.use_hierarchy
-
-The feature can be disabled by::
+Hierarchical accounting is enabled by default. Disabling the hierarchical
+accounting is deprecated. An attempt to do it will result in a failure
+and a warning printed to dmesg.
- # echo 0 > memory.use_hierarchy
+For compatibility reasons writing 1 to memory.use_hierarchy will always pass::
-NOTE1:
- Enabling/disabling will fail if either the cgroup already has other
- cgroups created below it, or if the parent cgroup has use_hierarchy
- enabled.
-
-NOTE2:
- When panic_on_oom is set to "2", the whole system will panic in
- case of an OOM event in any cgroup.
+ # echo 1 > memory.use_hierarchy
7. Soft limits
==============
@@ -738,15 +725,25 @@ If we want to change this to 1G, we can at any time use::
# echo 1G > memory.soft_limit_in_bytes
-NOTE1:
+.. note::
Soft limits take effect over a long period of time, since they involve
reclaiming memory for balancing between memory cgroups
-NOTE2:
+
+.. note::
It is recommended to set the soft limit always below the hard limit,
otherwise the hard limit will take precedence.
-8. Move charges at task migration
-=================================
+.. _cgroup-v1-memory-move-charges:
+
+8. Move charges at task migration (DEPRECATED!)
+===============================================
+
+THIS IS DEPRECATED!
+
+It's expensive and unreliable! It's better practice to launch workload
+tasks directly from inside their target cgroup. Use dedicated workload
+cgroups to allow fine-grained policy adjustments without having to
+move physical pages between control domains.
Users can move charges associated with a task along with task migration, that
is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
@@ -763,23 +760,29 @@ If you want to enable it::
# echo (some positive value) > memory.move_charge_at_immigrate
-Note:
+.. note::
Each bits of move_charge_at_immigrate has its own meaning about what type
- of charges should be moved. See 8.2 for details.
-Note:
+ of charges should be moved. See :ref:`section 8.2
+ <cgroup-v1-memory-movable-charges>` for details.
+
+.. note::
Charges are moved only when you move mm->owner, in other words,
a leader of a thread group.
-Note:
+
+.. note::
If we cannot find enough space for the task in the destination cgroup, we
try to make space by reclaiming memory. Task migration may fail if we
cannot make enough space.
-Note:
+
+.. note::
It can take several seconds if you move charges much.
And if you want disable it again::
# echo 0 > memory.move_charge_at_immigrate
+.. _cgroup-v1-memory-movable-charges:
+
8.2 Type of charges which can be moved
--------------------------------------
@@ -829,6 +832,8 @@ threshold in any direction.
It's applicable for root and non-root cgroup.
+.. _cgroup-v1-memory-oom-control:
+
10. OOM Control
===============
@@ -873,6 +878,9 @@ At reading, current status of OOM is shown.
(if 1, oom-killer is disabled)
- under_oom 0 or 1
(if 1, the memory cgroup is under OOM, tasks may be stopped.)
+ - oom_kill integer counter
+ The number of processes belonging to this cgroup killed by any
+ kind of OOM killer.
11. Memory Pressure
===================
@@ -907,7 +915,7 @@ experiences some pressure. In this situation, only group C will receive the
notification, i.e. groups A and B will not receive it. This is done to avoid
excessive "broadcasting" of messages, which disturbs the system and which is
especially bad if we are low on memory or thrashing. Group B, will receive
-notification only if there are no event listers for group C.
+notification only if there are no event listeners for group C.
There are three optional modes that specify different propagation behavior:
@@ -981,25 +989,27 @@ commented and discussed quite extensively in the community.
References
==========
-1. Singh, Balbir. RFC: Memory Controller, http://lwn.net/Articles/206697/
-2. Singh, Balbir. Memory Controller (RSS Control),
+.. [1] Singh, Balbir. RFC: Memory Controller, http://lwn.net/Articles/206697/
+.. [2] Singh, Balbir. Memory Controller (RSS Control),
http://lwn.net/Articles/222762/
-3. Emelianov, Pavel. Resource controllers based on process cgroups
- http://lkml.org/lkml/2007/3/6/198
-4. Emelianov, Pavel. RSS controller based on process cgroups (v2)
- http://lkml.org/lkml/2007/4/9/78
-5. Emelianov, Pavel. RSS controller based on process cgroups (v3)
- http://lkml.org/lkml/2007/5/30/244
+.. [3] Emelianov, Pavel. Resource controllers based on process cgroups
+ https://lore.kernel.org/r/45ED7DEC.7010403@sw.ru
+.. [4] Emelianov, Pavel. RSS controller based on process cgroups (v2)
+ https://lore.kernel.org/r/461A3010.90403@sw.ru
+.. [5] Emelianov, Pavel. RSS controller based on process cgroups (v3)
+ https://lore.kernel.org/r/465D9739.8070209@openvz.org
+
6. Menage, Paul. Control Groups v10, http://lwn.net/Articles/236032/
7. Vaidyanathan, Srinivasan, Control Groups: Pagecache accounting and control
subsystem (v3), http://lwn.net/Articles/235534/
8. Singh, Balbir. RSS controller v2 test results (lmbench),
- http://lkml.org/lkml/2007/5/17/232
+ https://lore.kernel.org/r/464C95D4.7070806@linux.vnet.ibm.com
9. Singh, Balbir. RSS controller v2 AIM9 results
- http://lkml.org/lkml/2007/5/18/1
+ https://lore.kernel.org/r/464D267A.50107@linux.vnet.ibm.com
10. Singh, Balbir. Memory controller v6 test results,
- http://lkml.org/lkml/2007/8/19/36
-11. Singh, Balbir. Memory controller introduction (v6),
- http://lkml.org/lkml/2007/8/17/69
-12. Corbet, Jonathan, Controlling memory use in cgroups,
- http://lwn.net/Articles/243795/
+ https://lore.kernel.org/r/20070819094658.654.84837.sendpatchset@balbir-laptop
+
+.. [11] Singh, Balbir. Memory controller introduction (v6),
+ https://lore.kernel.org/r/20070817084228.26003.12568.sendpatchset@balbir-laptop
+.. [12] Corbet, Jonathan, Controlling memory use in cgroups,
+ http://lwn.net/Articles/243795/
diff --git a/Documentation/admin-guide/cgroup-v1/misc.rst b/Documentation/admin-guide/cgroup-v1/misc.rst
new file mode 100644
index 000000000000..661614c24df3
--- /dev/null
+++ b/Documentation/admin-guide/cgroup-v1/misc.rst
@@ -0,0 +1,4 @@
+===============
+Misc controller
+===============
+Please refer "Misc" documentation in Documentation/admin-guide/cgroup-v2.rst
diff --git a/Documentation/admin-guide/cgroup-v1/rdma.rst b/Documentation/admin-guide/cgroup-v1/rdma.rst
index 2fcb0a9bf790..e69369b7252e 100644
--- a/Documentation/admin-guide/cgroup-v1/rdma.rst
+++ b/Documentation/admin-guide/cgroup-v1/rdma.rst
@@ -114,4 +114,4 @@ Following resources can be accounted by rdma controller.
(d) Delete resource limit::
- echo echo mlx4_0 hca_handle=max hca_object=max > /sys/fs/cgroup/rdma/1/rdma.max
+ echo mlx4_0 hca_handle=max hca_object=max > /sys/fs/cgroup/rdma/1/rdma.max
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index bcc80269bb6a..17e6e9565156 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1,3 +1,5 @@
+.. _cgroup-v2:
+
================
Control Group v2
================
@@ -54,6 +56,7 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgrou
5-3-3. IO Latency
5-3-3-1. How IO Latency Throttling Works
5-3-3-2. IO Latency Interface Files
+ 5-3-4. IO Priority
5-4. PID
5-4-1. PID Interface Files
5-5. Cpuset
@@ -63,8 +66,11 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgrou
5-7-1. RDMA Interface Files
5-8. HugeTLB
5.8-1. HugeTLB Interface Files
- 5-8. Misc
- 5-8-1. perf_event
+ 5-9. Misc
+ 5.9-1 Miscellaneous cgroup Interface Files
+ 5.9-2 Migration and Ownership
+ 5-10. Others
+ 5-10-1. perf_event
5-N. Non-normative information
5-N-1. CPU controller root cgroup process behaviour
5-N-2. IO controller root cgroup process behaviour
@@ -172,15 +178,21 @@ disabling controllers in v1 and make them always available in v2.
cgroup v2 currently supports the following mount options.
nsdelegate
-
Consider cgroup namespaces as delegation boundaries. This
option is system wide and can only be set on mount or modified
through remount from the init namespace. The mount option is
ignored on non-init namespace mounts. Please refer to the
Delegation section for details.
- memory_localevents
+ favordynmods
+ Reduce the latencies of dynamic cgroup modifications such as
+ task migrations and controller on/offs at the cost of making
+ hot path operations such as forks and exits more expensive.
+ The static usage pattern of creating a cgroup, enabling
+ controllers, and then seeding it with CLONE_INTO_CGROUP is
+ not affected by this option.
+ memory_localevents
Only populate memory.events with data for the current cgroup,
and not any subtrees. This is legacy behaviour, the default
behaviour without this option is to include subtree counts.
@@ -189,7 +201,6 @@ cgroup v2 currently supports the following mount options.
option is ignored on non-init namespace mounts.
memory_recursiveprot
-
Recursively apply memory.min and memory.low protection to
entire subtrees, without requiring explicit downward
propagation into leaf cgroups. This allows protecting entire
@@ -199,6 +210,35 @@ cgroup v2 currently supports the following mount options.
relying on the original semantics (e.g. specifying bogusly
high 'bypass' protection values at higher tree levels).
+ memory_hugetlb_accounting
+ Count HugeTLB memory usage towards the cgroup's overall
+ memory usage for the memory controller (for the purpose of
+ statistics reporting and memory protetion). This is a new
+ behavior that could regress existing setups, so it must be
+ explicitly opted in with this mount option.
+
+ A few caveats to keep in mind:
+
+ * There is no HugeTLB pool management involved in the memory
+ controller. The pre-allocated pool does not belong to anyone.
+ Specifically, when a new HugeTLB folio is allocated to
+ the pool, it is not accounted for from the perspective of the
+ memory controller. It is only charged to a cgroup when it is
+ actually used (for e.g at page fault time). Host memory
+ overcommit management has to consider this when configuring
+ hard limits. In general, HugeTLB pool management should be
+ done via other mechanisms (such as the HugeTLB controller).
+ * Failure to charge a HugeTLB folio to the memory controller
+ results in SIGBUS. This could happen even if the HugeTLB pool
+ still has pages available (but the cgroup limit is hit and
+ reclaim attempt fails).
+ * Charging HugeTLB memory towards the memory controller affects
+ memory protection and reclaim dynamics. Any userspace tuning
+ (of low, min limits for e.g) needs to take this into account.
+ * HugeTLB pages utilized while this option is not selected
+ will not be tracked by the memory controller (even if cgroup
+ v2 is remounted later on).
+
Organizing Processes and Threads
--------------------------------
@@ -353,6 +393,13 @@ constraint, a threaded controller must be able to handle competition
between threads in a non-leaf cgroup and its child cgroups. Each
threaded controller defines how such competitions are handled.
+Currently, the following controllers are threaded and can be enabled
+in a threaded cgroup::
+
+- cpu
+- cpuset
+- perf_event
+- pids
[Un]populated Notification
--------------------------
@@ -608,10 +655,12 @@ process migrations.
and is an example of this type.
+.. _cgroupv2-limits-distributor:
+
Limits
------
-A child can only consume upto the configured amount of the resource.
+A child can only consume up to the configured amount of the resource.
Limits can be over-committed - the sum of the limits of children can
exceed the amount of resource available to the parent.
@@ -624,15 +673,16 @@ process migrations.
"io.max" limits the maximum BPS and/or IOPS that a cgroup can consume
on an IO device and is an example of this type.
+.. _cgroupv2-protections-distributor:
Protections
-----------
-A cgroup is protected upto the configured amount of the resource
+A cgroup is protected up to the configured amount of the resource
as long as the usages of all its ancestors are under their
protected levels. Protections can be hard guarantees or best effort
soft boundaries. Protections can also be over-committed in which case
-only upto the amount available to the parent is protected among
+only up to the amount available to the parent is protected among
children.
Protections are in the range [0, max] and defaults to 0, which is
@@ -714,9 +764,7 @@ Conventions
- Settings for a single feature should be contained in a single file.
- The root cgroup should be exempt from resource control and thus
- shouldn't have resource control interface files. Also,
- informational files on the root cgroup which end up showing global
- information available elsewhere shouldn't exist.
+ shouldn't have resource control interface files.
- The default time unit is microseconds. If a different unit is ever
used, an explicit unit suffix must be present.
@@ -788,7 +836,6 @@ Core Interface Files
All cgroup core files are prefixed with "cgroup."
cgroup.type
-
A read-write single value file which exists on non-root
cgroups.
@@ -953,9 +1000,49 @@ All cgroup core files are prefixed with "cgroup."
it's possible to delete a frozen (and empty) cgroup, as well as
create new sub-cgroups.
+ cgroup.kill
+ A write-only single value file which exists in non-root cgroups.
+ The only allowed value is "1".
+
+ Writing "1" to the file causes the cgroup and all descendant cgroups to
+ be killed. This means that all processes located in the affected cgroup
+ tree will be killed via SIGKILL.
+
+ Killing a cgroup tree will deal with concurrent forks appropriately and
+ is protected against migrations.
+
+ In a threaded cgroup, writing this file fails with EOPNOTSUPP as
+ killing cgroups is a process directed operation, i.e. it affects
+ the whole thread-group.
+
+ cgroup.pressure
+ A read-write single value file that allowed values are "0" and "1".
+ The default is "1".
+
+ Writing "0" to the file will disable the cgroup PSI accounting.
+ Writing "1" to the file will re-enable the cgroup PSI accounting.
+
+ This control attribute is not hierarchical, so disable or enable PSI
+ accounting in a cgroup does not affect PSI accounting in descendants
+ and doesn't need pass enablement via ancestors from root.
+
+ The reason this control attribute exists is that PSI accounts stalls for
+ each cgroup separately and aggregates it at each level of the hierarchy.
+ This may cause non-negligible overhead for some workloads when under
+ deep level of the hierarchy, in which case this control attribute can
+ be used to disable PSI accounting in the non-leaf cgroups.
+
+ irq.pressure
+ A read-write nested-keyed file.
+
+ Shows pressure stall information for IRQ/SOFTIRQ. See
+ :ref:`Documentation/accounting/psi.rst <psi>` for details.
+
Controllers
===========
+.. _cgroup-v2-cpu:
+
CPU
---
@@ -985,7 +1072,7 @@ CPU Interface Files
All time durations are in microseconds.
cpu.stat
- A read-only flat-keyed file which exists on non-root cgroups.
+ A read-only flat-keyed file.
This file exists whether the controller is enabled or not.
It always reports the following three stats:
@@ -994,17 +1081,23 @@ All time durations are in microseconds.
- user_usec
- system_usec
- and the following three when the controller is enabled:
+ and the following five when the controller is enabled:
- nr_periods
- nr_throttled
- throttled_usec
+ - nr_bursts
+ - burst_usec
cpu.weight
A read-write single value file which exists on non-root
cgroups. The default is "100".
- The weight in the range [1, 10000].
+ For non idle groups (cpu.idle = 0), the weight is in the
+ range [1, 10000].
+
+ If the cgroup has been configured to be SCHED_IDLE (cpu.idle = 1),
+ then the weight will show as a 0.
cpu.weight.nice
A read-write single value file which exists on non-root
@@ -1026,12 +1119,18 @@ All time durations are in microseconds.
$MAX $PERIOD
- which indicates that the group may consume upto $MAX in each
+ which indicates that the group may consume up to $MAX in each
$PERIOD duration. "max" for $MAX indicates no limit. If only
one number is written, $MAX is updated.
+ cpu.max.burst
+ A read-write single value file which exists on non-root
+ cgroups. The default is "0".
+
+ The burst in the range [0, $MAX].
+
cpu.pressure
- A read-only nested-key file which exists on non-root cgroups.
+ A read-write nested-keyed file.
Shows pressure stall information for CPU. See
:ref:`Documentation/accounting/psi.rst <psi>` for details.
@@ -1062,6 +1161,16 @@ All time durations are in microseconds.
values similar to the sched_setattr(2). This maximum utilization
value is used to clamp the task specific maximum utilization clamp.
+ cpu.idle
+ A read-write single value file which exists on non-root cgroups.
+ The default is 0.
+
+ This is the cgroup analog of the per-task SCHED_IDLE sched policy.
+ Setting this value to a 1 will make the scheduling policy of the
+ cgroup SCHED_IDLE. The threads inside the cgroup will retain their
+ own relative priorities, but the cgroup itself will be treated as
+ very low priority relative to its peers.
+
Memory
@@ -1154,27 +1263,67 @@ PAGE_SIZE multiple when read back.
A read-write single value file which exists on non-root
cgroups. The default is "max".
- Memory usage throttle limit. This is the main mechanism to
- control memory usage of a cgroup. If a cgroup's usage goes
+ Memory usage throttle limit. If a cgroup's usage goes
over the high boundary, the processes of the cgroup are
throttled and put under heavy reclaim pressure.
Going over the high limit never invokes the OOM killer and
- under extreme conditions the limit may be breached.
+ under extreme conditions the limit may be breached. The high
+ limit should be used in scenarios where an external process
+ monitors the limited cgroup to alleviate heavy reclaim
+ pressure.
memory.max
A read-write single value file which exists on non-root
cgroups. The default is "max".
- Memory usage hard limit. This is the final protection
- mechanism. If a cgroup's memory usage reaches this limit and
- can't be reduced, the OOM killer is invoked in the cgroup.
- Under certain circumstances, the usage may go over the limit
- temporarily.
+ Memory usage hard limit. This is the main mechanism to limit
+ memory usage of a cgroup. If a cgroup's memory usage reaches
+ this limit and can't be reduced, the OOM killer is invoked in
+ the cgroup. Under certain circumstances, the usage may go
+ over the limit temporarily.
- This is the ultimate protection mechanism. As long as the
- high limit is used and monitored properly, this limit's
- utility is limited to providing the final safety net.
+ In default configuration regular 0-order allocations always
+ succeed unless OOM killer chooses current task as a victim.
+
+ Some kinds of allocations don't invoke the OOM killer.
+ Caller could retry them differently, return into userspace
+ as -ENOMEM or silently ignore in cases like disk readahead.
+
+ memory.reclaim
+ A write-only nested-keyed file which exists for all cgroups.
+
+ This is a simple interface to trigger memory reclaim in the
+ target cgroup.
+
+ This file accepts a single key, the number of bytes to reclaim.
+ No nested keys are currently supported.
+
+ Example::
+
+ echo "1G" > memory.reclaim
+
+ The interface can be later extended with nested keys to
+ configure the reclaim behavior. For example, specify the
+ type of memory to reclaim from (anon, file, ..).
+
+ Please note that the kernel can over or under reclaim from
+ the target cgroup. If less bytes are reclaimed than the
+ specified amount, -EAGAIN is returned.
+
+ Please note that the proactive reclaim (triggered by this
+ interface) is not meant to indicate memory pressure on the
+ memory cgroup. Therefore socket memory balancing triggered by
+ the memory reclaim normally is not exercised in this case.
+ This means that the networking layer will not adapt based on
+ reclaim induced by memory.reclaim.
+
+ memory.peak
+ A read-only single value file which exists on non-root
+ cgroups.
+
+ The max memory usage recorded for the cgroup and its
+ descendants since the creation of the cgroup.
memory.oom.group
A read-write single value file which exists on non-root
@@ -1202,7 +1351,7 @@ PAGE_SIZE multiple when read back.
Note that all fields in this file are hierarchical and the
file modified event can be generated due to an event down the
- hierarchy. For for the local events at the cgroup level see
+ hierarchy. For the local events at the cgroup level see
memory.events.local.
low
@@ -1228,22 +1377,17 @@ PAGE_SIZE multiple when read back.
The number of time the cgroup's memory usage was
reached the limit and allocation was about to fail.
- Depending on context result could be invocation of OOM
- killer and retrying allocation or failing allocation.
-
- Failed allocation in its turn could be returned into
- userspace as -ENOMEM or silently ignored in cases like
- disk readahead. For now OOM in memory cgroup kills
- tasks iff shortage has happened inside page fault.
-
This event is not raised if the OOM killer is not
considered as an option, e.g. for failed high-order
- allocations.
+ allocations or if caller asked to not retry attempts.
oom_kill
The number of processes belonging to this cgroup
killed by any kind of OOM killer.
+ oom_group_kill
+ The number of times a group OOM has occurred.
+
memory.events.local
Similar to memory.events but the fields in the file are local
to the cgroup i.e. not hierarchical. The file modified event
@@ -1262,6 +1406,10 @@ PAGE_SIZE multiple when read back.
can show up in the middle. Don't rely on items remaining in a
fixed position; use the keys to look up specific values!
+ If the entry has no per-node counter (or not show in the
+ memory.numa_stat). We use 'npn' (non-per-node) as the tag
+ to indicate that it will not show in the memory.numa_stat.
+
anon
Amount of memory used in anonymous mappings such as
brk(), sbrk(), and mmap(MAP_ANONYMOUS)
@@ -1270,20 +1418,42 @@ PAGE_SIZE multiple when read back.
Amount of memory used to cache filesystem data,
including tmpfs and shared memory.
+ kernel (npn)
+ Amount of total kernel memory, including
+ (kernel_stack, pagetables, percpu, vmalloc, slab) in
+ addition to other kernel memory use cases.
+
kernel_stack
Amount of memory allocated to kernel stacks.
- slab
- Amount of memory used for storing in-kernel data
- structures.
+ pagetables
+ Amount of memory allocated for page tables.
- sock
+ sec_pagetables
+ Amount of memory allocated for secondary page tables,
+ this currently includes KVM mmu allocations on x86
+ and arm64.
+
+ percpu (npn)
+ Amount of memory used for storing per-cpu kernel
+ data structures.
+
+ sock (npn)
Amount of memory used in network transmission buffers
+ vmalloc (npn)
+ Amount of memory used for vmap backed memory.
+
shmem
Amount of cached filesystem data that is swap-backed,
such as tmpfs, shm segments, shared anonymous mmap()s
+ zswap
+ Amount of memory consumed by the zswap compression backend.
+
+ zswapped
+ Amount of application memory swapped out to zswap.
+
file_mapped
Amount of cached filesystem data mapped with mmap()
@@ -1295,10 +1465,22 @@ PAGE_SIZE multiple when read back.
Amount of cached filesystem data that was modified and
is currently being written back to disk
+ swapcached
+ Amount of swap cached in memory. The swapcache is accounted
+ against both memory and swap usage.
+
anon_thp
Amount of memory used in anonymous mappings backed by
transparent hugepages
+ file_thp
+ Amount of cached filesystem data backed by transparent
+ hugepages
+
+ shmem_thp
+ Amount of shm, tmpfs, shared anonymous mmap()s backed by
+ transparent hugepages
+
inactive_anon, active_anon, inactive_file, active_file, unevictable
Amount of memory, swap-backed and filesystem-backed,
on the internal memory management lists used by the
@@ -1317,52 +1499,123 @@ PAGE_SIZE multiple when read back.
Part of "slab" that cannot be reclaimed on memory
pressure.
- pgfault
- Total number of page faults incurred
+ slab (npn)
+ Amount of memory used for storing in-kernel data
+ structures.
- pgmajfault
- Number of major page faults incurred
+ workingset_refault_anon
+ Number of refaults of previously evicted anonymous pages.
+
+ workingset_refault_file
+ Number of refaults of previously evicted file pages.
+
+ workingset_activate_anon
+ Number of refaulted anonymous pages that were immediately
+ activated.
- workingset_refault
- Number of refaults of previously evicted pages
+ workingset_activate_file
+ Number of refaulted file pages that were immediately activated.
- workingset_activate
- Number of refaulted pages that were immediately activated
+ workingset_restore_anon
+ Number of restored anonymous pages which have been detected as
+ an active workingset before they got reclaimed.
+
+ workingset_restore_file
+ Number of restored file pages which have been detected as an
+ active workingset before they got reclaimed.
workingset_nodereclaim
Number of times a shadow node has been reclaimed
- pgrefill
- Amount of scanned pages (in an active LRU list)
-
- pgscan
+ pgscan (npn)
Amount of scanned pages (in an inactive LRU list)
- pgsteal
+ pgsteal (npn)
Amount of reclaimed pages
- pgactivate
+ pgscan_kswapd (npn)
+ Amount of scanned pages by kswapd (in an inactive LRU list)
+
+ pgscan_direct (npn)
+ Amount of scanned pages directly (in an inactive LRU list)
+
+ pgscan_khugepaged (npn)
+ Amount of scanned pages by khugepaged (in an inactive LRU list)
+
+ pgsteal_kswapd (npn)
+ Amount of reclaimed pages by kswapd
+
+ pgsteal_direct (npn)
+ Amount of reclaimed pages directly
+
+ pgsteal_khugepaged (npn)
+ Amount of reclaimed pages by khugepaged
+
+ pgfault (npn)
+ Total number of page faults incurred
+
+ pgmajfault (npn)
+ Number of major page faults incurred
+
+ pgrefill (npn)
+ Amount of scanned pages (in an active LRU list)
+
+ pgactivate (npn)
Amount of pages moved to the active LRU list
- pgdeactivate
+ pgdeactivate (npn)
Amount of pages moved to the inactive LRU list
- pglazyfree
+ pglazyfree (npn)
Amount of pages postponed to be freed under memory pressure
- pglazyfreed
+ pglazyfreed (npn)
Amount of reclaimed lazyfree pages
- thp_fault_alloc
+ thp_fault_alloc (npn)
Number of transparent hugepages which were allocated to satisfy
- a page fault, including COW faults. This counter is not present
- when CONFIG_TRANSPARENT_HUGEPAGE is not set.
+ a page fault. This counter is not present when CONFIG_TRANSPARENT_HUGEPAGE
+ is not set.
- thp_collapse_alloc
+ thp_collapse_alloc (npn)
Number of transparent hugepages which were allocated to allow
collapsing an existing range of pages. This counter is not
present when CONFIG_TRANSPARENT_HUGEPAGE is not set.
+ thp_swpout (npn)
+ Number of transparent hugepages which are swapout in one piece
+ without splitting.
+
+ thp_swpout_fallback (npn)
+ Number of transparent hugepages which were split before swapout.
+ Usually because failed to allocate some continuous swap space
+ for the huge page.
+
+ memory.numa_stat
+ A read-only nested-keyed file which exists on non-root cgroups.
+
+ This breaks down the cgroup's memory footprint into different
+ types of memory, type-specific details, and other information
+ per node on the state of the memory management system.
+
+ This is useful for providing visibility into the NUMA locality
+ information within an memcg since the pages are allowed to be
+ allocated from any physical node. One of the use case is evaluating
+ application performance by combining this information with the
+ application's CPU allocation.
+
+ All memory amounts are in bytes.
+
+ The output format of memory.numa_stat is::
+
+ type N0=<bytes in node 0> N1=<bytes in node 1> ...
+
+ The entries are ordered to be human readable, and new entries
+ can show up in the middle. Don't rely on items remaining in a
+ fixed position; use the keys to look up specific values!
+
+ The entries can refer to the memory.stat.
+
memory.swap.current
A read-only single value file which exists on non-root
cgroups.
@@ -1370,6 +1623,29 @@ PAGE_SIZE multiple when read back.
The total amount of swap currently being used by the cgroup
and its descendants.
+ memory.swap.high
+ A read-write single value file which exists on non-root
+ cgroups. The default is "max".
+
+ Swap usage throttle limit. If a cgroup's swap usage exceeds
+ this limit, all its further allocations will be throttled to
+ allow userspace to implement custom out-of-memory procedures.
+
+ This limit marks a point of no return for the cgroup. It is NOT
+ designed to manage the amount of swapping a workload does
+ during regular operation. Compare to memory.swap.max, which
+ prohibits swapping past a set amount, but lets the cgroup
+ continue unimpeded as long as other memory can be reclaimed.
+
+ Healthy workloads are not expected to reach this limit.
+
+ memory.swap.peak
+ A read-only single value file which exists on non-root
+ cgroups.
+
+ The max swap usage recorded for the cgroup and its
+ descendants since the creation of the cgroup.
+
memory.swap.max
A read-write single value file which exists on non-root
cgroups. The default is "max".
@@ -1383,6 +1659,10 @@ PAGE_SIZE multiple when read back.
otherwise, a value change in this file generates a file
modified event.
+ high
+ The number of times the cgroup's swap usage was over
+ the high threshold.
+
max
The number of times the cgroup's swap usage was about
to go over the max boundary and swap allocation
@@ -1398,8 +1678,38 @@ PAGE_SIZE multiple when read back.
higher than the limit for an extended period of time. This
reduces the impact on the workload and memory management.
+ memory.zswap.current
+ A read-only single value file which exists on non-root
+ cgroups.
+
+ The total amount of memory consumed by the zswap compression
+ backend.
+
+ memory.zswap.max
+ A read-write single value file which exists on non-root
+ cgroups. The default is "max".
+
+ Zswap usage hard limit. If a cgroup's zswap pool reaches this
+ limit, it will refuse to take any more stores before existing
+ entries fault back in or are written out to disk.
+
+ memory.zswap.writeback
+ A read-write single value file. The default value is "1". The
+ initial value of the root cgroup is 1, and when a new cgroup is
+ created, it inherits the current value of its parent.
+
+ When this is set to 0, all swapping attempts to swapping devices
+ are disabled. This included both zswap writebacks, and swapping due
+ to zswap store failures. If the zswap store failures are recurring
+ (for e.g if the pages are incompressible), users can observe
+ reclaim inefficiency after disabling writeback (because the same
+ pages might be rejected again and again).
+
+ Note that this is subtly different from setting memory.swap.max to
+ 0, as it still allows for pages to be written to the zswap pool.
+
memory.pressure
- A read-only nested-key file which exists on non-root cgroups.
+ A read-only nested-keyed file.
Shows pressure stall information for memory. See
:ref:`Documentation/accounting/psi.rst <psi>` for details.
@@ -1462,8 +1772,7 @@ IO Interface Files
~~~~~~~~~~~~~~~~~~
io.stat
- A read-only nested-keyed file which exists on non-root
- cgroups.
+ A read-only nested-keyed file.
Lines are keyed by $MAJ:$MIN device numbers and not ordered.
The following nested keys are defined.
@@ -1483,7 +1792,7 @@ IO Interface Files
8:0 rbytes=90430464 wbytes=299008000 rios=8950 wios=1252 dbytes=50331648 dios=3021
io.cost.qos
- A read-write nested-keyed file with exists only on the root
+ A read-write nested-keyed file which exists only on the root
cgroup.
This file configures the Quality of Service of the IO cost
@@ -1538,7 +1847,7 @@ IO Interface Files
automatic mode can be restored by setting "ctrl" to "auto".
io.cost.model
- A read-write nested-keyed file with exists only on the root
+ A read-write nested-keyed file which exists only on the root
cgroup.
This file configures the cost model of the IO cost model based
@@ -1639,7 +1948,7 @@ IO Interface Files
8:16 rbps=2097152 wbps=max riops=max wiops=max
io.pressure
- A read-only nested-key file which exists on non-root cgroups.
+ A read-only nested-keyed file.
Shows pressure stall information for IO. See
:ref:`Documentation/accounting/psi.rst <psi>` for details.
@@ -1663,9 +1972,9 @@ per-cgroup dirty memory states are examined and the more restrictive
of the two is enforced.
cgroup writeback requires explicit support from the underlying
-filesystem. Currently, cgroup writeback is implemented on ext2, ext4
-and btrfs. On other filesystems, all writeback IOs are attributed to
-the root cgroup.
+filesystem. Currently, cgroup writeback is implemented on ext2, ext4,
+btrfs, f2fs, and xfs. On other filesystems, all writeback IOs are
+attributed to the root cgroup.
There are inherent differences in memory and writeback management
which affects how cgroup ownership is tracked. Memory is tracked per
@@ -1764,7 +2073,7 @@ IO Latency Interface Files
io.latency
This takes a similar format as the other controllers.
- "MAJOR:MINOR target=<target time in microseconds"
+ "MAJOR:MINOR target=<target time in microseconds>"
io.stat
If the controller is enabled you will see extra stats in io.stat in
@@ -1784,6 +2093,68 @@ IO Latency Interface Files
duration of time between evaluation events. Windows only elapse
with IO activity. Idle periods extend the most recent window.
+IO Priority
+~~~~~~~~~~~
+
+A single attribute controls the behavior of the I/O priority cgroup policy,
+namely the io.prio.class attribute. The following values are accepted for
+that attribute:
+
+ no-change
+ Do not modify the I/O priority class.
+
+ promote-to-rt
+ For requests that have a non-RT I/O priority class, change it into RT.
+ Also change the priority level of these requests to 4. Do not modify
+ the I/O priority of requests that have priority class RT.
+
+ restrict-to-be
+ For requests that do not have an I/O priority class or that have I/O
+ priority class RT, change it into BE. Also change the priority level
+ of these requests to 0. Do not modify the I/O priority class of
+ requests that have priority class IDLE.
+
+ idle
+ Change the I/O priority class of all requests into IDLE, the lowest
+ I/O priority class.
+
+ none-to-rt
+ Deprecated. Just an alias for promote-to-rt.
+
+The following numerical values are associated with the I/O priority policies:
+
++----------------+---+
+| no-change | 0 |
++----------------+---+
+| promote-to-rt | 1 |
++----------------+---+
+| restrict-to-be | 2 |
++----------------+---+
+| idle | 3 |
++----------------+---+
+
+The numerical value that corresponds to each I/O priority class is as follows:
+
++-------------------------------+---+
+| IOPRIO_CLASS_NONE | 0 |
++-------------------------------+---+
+| IOPRIO_CLASS_RT (real-time) | 1 |
++-------------------------------+---+
+| IOPRIO_CLASS_BE (best effort) | 2 |
++-------------------------------+---+
+| IOPRIO_CLASS_IDLE | 3 |
++-------------------------------+---+
+
+The algorithm to set the I/O priority class for a request is as follows:
+
+- If I/O priority class policy is promote-to-rt, change the request I/O
+ priority class to IOPRIO_CLASS_RT and change the request I/O priority
+ level to 4.
+- If I/O priority class policy is not promote-to-rt, translate the I/O priority
+ class policy into a number, then change the request I/O priority class
+ into the maximum of the I/O priority class policy number and the numerical
+ I/O priority class.
+
PID
---
@@ -1904,6 +2275,17 @@ Cpuset Interface Files
The value of "cpuset.mems" stays constant until the next update
and won't be affected by any memory nodes hotplug events.
+ Setting a non-empty value to "cpuset.mems" causes memory of
+ tasks within the cgroup to be migrated to the designated nodes if
+ they are currently using memory outside of the designated nodes.
+
+ There is a cost for this memory migration. The migration
+ may not be complete and some memory pages may be left behind.
+ So it is recommended that "cpuset.mems" should be set properly
+ before spawning new tasks into the cpuset. Even if there is
+ a need to change "cpuset.mems" with active tasks, it shouldn't
+ be done frequently.
+
cpuset.mems.effective
A read-only multiple values file which exists on all
cpuset-enabled cgroups.
@@ -1920,78 +2302,166 @@ Cpuset Interface Files
Its value will be affected by memory nodes hotplug events.
+ cpuset.cpus.exclusive
+ A read-write multiple values file which exists on non-root
+ cpuset-enabled cgroups.
+
+ It lists all the exclusive CPUs that are allowed to be used
+ to create a new cpuset partition. Its value is not used
+ unless the cgroup becomes a valid partition root. See the
+ "cpuset.cpus.partition" section below for a description of what
+ a cpuset partition is.
+
+ When the cgroup becomes a partition root, the actual exclusive
+ CPUs that are allocated to that partition are listed in
+ "cpuset.cpus.exclusive.effective" which may be different
+ from "cpuset.cpus.exclusive". If "cpuset.cpus.exclusive"
+ has previously been set, "cpuset.cpus.exclusive.effective"
+ is always a subset of it.
+
+ Users can manually set it to a value that is different from
+ "cpuset.cpus". The only constraint in setting it is that the
+ list of CPUs must be exclusive with respect to its sibling.
+
+ For a parent cgroup, any one of its exclusive CPUs can only
+ be distributed to at most one of its child cgroups. Having an
+ exclusive CPU appearing in two or more of its child cgroups is
+ not allowed (the exclusivity rule). A value that violates the
+ exclusivity rule will be rejected with a write error.
+
+ The root cgroup is a partition root and all its available CPUs
+ are in its exclusive CPU set.
+
+ cpuset.cpus.exclusive.effective
+ A read-only multiple values file which exists on all non-root
+ cpuset-enabled cgroups.
+
+ This file shows the effective set of exclusive CPUs that
+ can be used to create a partition root. The content of this
+ file will always be a subset of "cpuset.cpus" and its parent's
+ "cpuset.cpus.exclusive.effective" if its parent is not the root
+ cgroup. It will also be a subset of "cpuset.cpus.exclusive"
+ if it is set. If "cpuset.cpus.exclusive" is not set, it is
+ treated to have an implicit value of "cpuset.cpus" in the
+ formation of local partition.
+
+ cpuset.cpus.isolated
+ A read-only and root cgroup only multiple values file.
+
+ This file shows the set of all isolated CPUs used in existing
+ isolated partitions. It will be empty if no isolated partition
+ is created.
+
cpuset.cpus.partition
A read-write single value file which exists on non-root
cpuset-enabled cgroups. This flag is owned by the parent cgroup
and is not delegatable.
- It accepts only the following input values when written to.
-
- "root" - a partition root
- "member" - a non-root member of a partition
-
- When set to be a partition root, the current cgroup is the
- root of a new partition or scheduling domain that comprises
- itself and all its descendants except those that are separate
- partition roots themselves and their descendants. The root
- cgroup is always a partition root.
-
- There are constraints on where a partition root can be set.
- It can only be set in a cgroup if all the following conditions
- are true.
-
- 1) The "cpuset.cpus" is not empty and the list of CPUs are
- exclusive, i.e. they are not shared by any of its siblings.
- 2) The parent cgroup is a partition root.
- 3) The "cpuset.cpus" is also a proper subset of the parent's
- "cpuset.cpus.effective".
- 4) There is no child cgroups with cpuset enabled. This is for
- eliminating corner cases that have to be handled if such a
- condition is allowed.
-
- Setting it to partition root will take the CPUs away from the
- effective CPUs of the parent cgroup. Once it is set, this
- file cannot be reverted back to "member" if there are any child
- cgroups with cpuset enabled.
-
- A parent partition cannot distribute all its CPUs to its
- child partitions. There must be at least one cpu left in the
- parent partition.
-
- Once becoming a partition root, changes to "cpuset.cpus" is
- generally allowed as long as the first condition above is true,
- the change will not take away all the CPUs from the parent
- partition and the new "cpuset.cpus" value is a superset of its
- children's "cpuset.cpus" values.
-
- Sometimes, external factors like changes to ancestors'
- "cpuset.cpus" or cpu hotplug can cause the state of the partition
- root to change. On read, the "cpuset.sched.partition" file
- can show the following values.
-
- "member" Non-root member of a partition
- "root" Partition root
- "root invalid" Invalid partition root
-
- It is a partition root if the first 2 partition root conditions
- above are true and at least one CPU from "cpuset.cpus" is
- granted by the parent cgroup.
-
- A partition root can become invalid if none of CPUs requested
- in "cpuset.cpus" can be granted by the parent cgroup or the
- parent cgroup is no longer a partition root itself. In this
- case, it is not a real partition even though the restriction
- of the first partition root condition above will still apply.
- The cpu affinity of all the tasks in the cgroup will then be
- associated with CPUs in the nearest ancestor partition.
-
- An invalid partition root can be transitioned back to a
- real partition root if at least one of the requested CPUs
- can now be granted by its parent. In this case, the cpu
- affinity of all the tasks in the formerly invalid partition
- will be associated to the CPUs of the newly formed partition.
- Changing the partition state of an invalid partition root to
- "member" is always allowed even if child cpusets are present.
+ It accepts only the following input values when written to.
+
+ ========== =====================================
+ "member" Non-root member of a partition
+ "root" Partition root
+ "isolated" Partition root without load balancing
+ ========== =====================================
+
+ A cpuset partition is a collection of cpuset-enabled cgroups with
+ a partition root at the top of the hierarchy and its descendants
+ except those that are separate partition roots themselves and
+ their descendants. A partition has exclusive access to the
+ set of exclusive CPUs allocated to it. Other cgroups outside
+ of that partition cannot use any CPUs in that set.
+
+ There are two types of partitions - local and remote. A local
+ partition is one whose parent cgroup is also a valid partition
+ root. A remote partition is one whose parent cgroup is not a
+ valid partition root itself. Writing to "cpuset.cpus.exclusive"
+ is optional for the creation of a local partition as its
+ "cpuset.cpus.exclusive" file will assume an implicit value that
+ is the same as "cpuset.cpus" if it is not set. Writing the
+ proper "cpuset.cpus.exclusive" values down the cgroup hierarchy
+ before the target partition root is mandatory for the creation
+ of a remote partition.
+
+ Currently, a remote partition cannot be created under a local
+ partition. All the ancestors of a remote partition root except
+ the root cgroup cannot be a partition root.
+
+ The root cgroup is always a partition root and its state cannot
+ be changed. All other non-root cgroups start out as "member".
+
+ When set to "root", the current cgroup is the root of a new
+ partition or scheduling domain. The set of exclusive CPUs is
+ determined by the value of its "cpuset.cpus.exclusive.effective".
+
+ When set to "isolated", the CPUs in that partition will be in
+ an isolated state without any load balancing from the scheduler
+ and excluded from the unbound workqueues. Tasks placed in such
+ a partition with multiple CPUs should be carefully distributed
+ and bound to each of the individual CPUs for optimal performance.
+
+ A partition root ("root" or "isolated") can be in one of the
+ two possible states - valid or invalid. An invalid partition
+ root is in a degraded state where some state information may
+ be retained, but behaves more like a "member".
+
+ All possible state transitions among "member", "root" and
+ "isolated" are allowed.
+
+ On read, the "cpuset.cpus.partition" file can show the following
+ values.
+
+ ============================= =====================================
+ "member" Non-root member of a partition
+ "root" Partition root
+ "isolated" Partition root without load balancing
+ "root invalid (<reason>)" Invalid partition root
+ "isolated invalid (<reason>)" Invalid isolated partition root
+ ============================= =====================================
+
+ In the case of an invalid partition root, a descriptive string on
+ why the partition is invalid is included within parentheses.
+
+ For a local partition root to be valid, the following conditions
+ must be met.
+
+ 1) The parent cgroup is a valid partition root.
+ 2) The "cpuset.cpus.exclusive.effective" file cannot be empty,
+ though it may contain offline CPUs.
+ 3) The "cpuset.cpus.effective" cannot be empty unless there is
+ no task associated with this partition.
+
+ For a remote partition root to be valid, all the above conditions
+ except the first one must be met.
+
+ External events like hotplug or changes to "cpuset.cpus" or
+ "cpuset.cpus.exclusive" can cause a valid partition root to
+ become invalid and vice versa. Note that a task cannot be
+ moved to a cgroup with empty "cpuset.cpus.effective".
+
+ A valid non-root parent partition may distribute out all its CPUs
+ to its child local partitions when there is no task associated
+ with it.
+
+ Care must be taken to change a valid partition root to "member"
+ as all its child local partitions, if present, will become
+ invalid causing disruption to tasks running in those child
+ partitions. These inactivated partitions could be recovered if
+ their parent is switched back to a partition root with a proper
+ value in "cpuset.cpus" or "cpuset.cpus.exclusive".
+
+ Poll and inotify events are triggered whenever the state of
+ "cpuset.cpus.partition" changes. That includes changes caused
+ by write to "cpuset.cpus.partition", cpu hotplug or other
+ changes that modify the validity status of the partition.
+ This will allow user space agents to monitor unexpected changes
+ to "cpuset.cpus.partition" without the need to do continuous
+ polling.
+
+ A user can pre-configure certain CPUs to an isolated state
+ with load balancing disabled at boot time with the "isolcpus"
+ kernel boot command line option. If those CPUs are to be put
+ into a partition, they have to be used in an isolated partition.
Device controller
@@ -2003,26 +2473,26 @@ existing device files.
Cgroup v2 device controller has no interface files and is implemented
on top of cgroup BPF. To control access to device files, a user may
-create bpf programs of the BPF_CGROUP_DEVICE type and attach them
-to cgroups. On an attempt to access a device file, corresponding
-BPF programs will be executed, and depending on the return value
-the attempt will succeed or fail with -EPERM.
+create bpf programs of type BPF_PROG_TYPE_CGROUP_DEVICE and attach
+them to cgroups with BPF_CGROUP_DEVICE flag. On an attempt to access a
+device file, corresponding BPF programs will be executed, and depending
+on the return value the attempt will succeed or fail with -EPERM.
-A BPF_CGROUP_DEVICE program takes a pointer to the bpf_cgroup_dev_ctx
-structure, which describes the device access attempt: access type
-(mknod/read/write) and device (type, major and minor numbers).
-If the program returns 0, the attempt fails with -EPERM, otherwise
-it succeeds.
+A BPF_PROG_TYPE_CGROUP_DEVICE program takes a pointer to the
+bpf_cgroup_dev_ctx structure, which describes the device access attempt:
+access type (mknod/read/write) and device (type, major and minor numbers).
+If the program returns 0, the attempt fails with -EPERM, otherwise it
+succeeds.
-An example of BPF_CGROUP_DEVICE program may be found in the kernel
-source tree in the tools/testing/selftests/bpf/dev_cgroup.c file.
+An example of BPF_PROG_TYPE_CGROUP_DEVICE program may be found in
+tools/testing/selftests/bpf/progs/dev_cgroup.c in the kernel source tree.
RDMA
----
The "rdma" controller regulates the distribution and accounting of
-of RDMA resources.
+RDMA resources.
RDMA Interface Files
~~~~~~~~~~~~~~~~~~~~
@@ -2085,9 +2555,90 @@ HugeTLB Interface Files
are local to the cgroup i.e. not hierarchical. The file modified event
generated on this file reflects only the local events.
+ hugetlb.<hugepagesize>.numa_stat
+ Similar to memory.numa_stat, it shows the numa information of the
+ hugetlb pages of <hugepagesize> in this cgroup. Only active in
+ use hugetlb pages are included. The per-node values are in bytes.
+
Misc
----
+The Miscellaneous cgroup provides the resource limiting and tracking
+mechanism for the scalar resources which cannot be abstracted like the other
+cgroup resources. Controller is enabled by the CONFIG_CGROUP_MISC config
+option.
+
+A resource can be added to the controller via enum misc_res_type{} in the
+include/linux/misc_cgroup.h file and the corresponding name via misc_res_name[]
+in the kernel/cgroup/misc.c file. Provider of the resource must set its
+capacity prior to using the resource by calling misc_cg_set_capacity().
+
+Once a capacity is set then the resource usage can be updated using charge and
+uncharge APIs. All of the APIs to interact with misc controller are in
+include/linux/misc_cgroup.h.
+
+Misc Interface Files
+~~~~~~~~~~~~~~~~~~~~
+
+Miscellaneous controller provides 3 interface files. If two misc resources (res_a and res_b) are registered then:
+
+ misc.capacity
+ A read-only flat-keyed file shown only in the root cgroup. It shows
+ miscellaneous scalar resources available on the platform along with
+ their quantities::
+
+ $ cat misc.capacity
+ res_a 50
+ res_b 10
+
+ misc.current
+ A read-only flat-keyed file shown in the all cgroups. It shows
+ the current usage of the resources in the cgroup and its children.::
+
+ $ cat misc.current
+ res_a 3
+ res_b 0
+
+ misc.max
+ A read-write flat-keyed file shown in the non root cgroups. Allowed
+ maximum usage of the resources in the cgroup and its children.::
+
+ $ cat misc.max
+ res_a max
+ res_b 4
+
+ Limit can be set by::
+
+ # echo res_a 1 > misc.max
+
+ Limit can be set to max by::
+
+ # echo res_a max > misc.max
+
+ Limits can be set higher than the capacity value in the misc.capacity
+ file.
+
+ misc.events
+ A read-only flat-keyed file which exists on non-root cgroups. The
+ following entries are defined. Unless specified otherwise, a value
+ change in this file generates a file modified event. All fields in
+ this file are hierarchical.
+
+ max
+ The number of times the cgroup's resource usage was
+ about to go over the max boundary.
+
+Migration and Ownership
+~~~~~~~~~~~~~~~~~~~~~~~
+
+A miscellaneous scalar resource is charged to the cgroup in which it is used
+first, and stays charged to that cgroup until that resource is freed. Migrating
+a process to a different cgroup does not move the charge to the destination
+cgroup where the process has moved.
+
+Others
+------
+
perf_event
~~~~~~~~~~
@@ -2144,7 +2695,7 @@ Without cgroup namespace, the "/proc/$PID/cgroup" file shows the
complete path of the cgroup of a process. In a container setup where
a set of cgroups and namespaces are intended to isolate processes the
"/proc/$PID/cgroup" file may leak potential system level information
-to the isolated processes. For Example::
+to the isolated processes. For example::
# cat /proc/self/cgroup
0::/batchjobs/container_id1
diff --git a/Documentation/admin-guide/cifs/authors.rst b/Documentation/admin-guide/cifs/authors.rst
index b02d6dd6c070..5c1d2f0fa7d1 100644
--- a/Documentation/admin-guide/cifs/authors.rst
+++ b/Documentation/admin-guide/cifs/authors.rst
@@ -5,10 +5,10 @@ Authors
Original Author
---------------
-Steve French (sfrench@samba.org)
+Steve French (smfrench@gmail.com, sfrench@samba.org)
The author wishes to express his appreciation and thanks to:
-Andrew Tridgell (Samba team) for his early suggestions about smb/cifs VFS
+Andrew Tridgell (Samba team) for his early suggestions about SMB/CIFS VFS
improvements. Thanks to IBM for allowing me time and test resources to pursue
this project, to Jim McDonough from IBM (and the Samba Team) for his help, to
the IBM Linux JFS team for explaining many esoteric Linux filesystem features.
@@ -51,7 +51,7 @@ Patch Contributors
- Ronnie Sahlberg (for SMB3 xattr work, bug fixes, and lots of great work on compounding)
- Shirish Pargaonkar (for many ACL patches over the years)
- Sachin Prabhu (many bug fixes, including for reconnect, copy offload and security)
-- Paulo Alcantara
+- Paulo Alcantara (for some excellent work in DFS, and in booting from SMB3)
- Long Li (some great work on RDMA, SMB Direct)
diff --git a/Documentation/admin-guide/cifs/changes.rst b/Documentation/admin-guide/cifs/changes.rst
index 71f2ecb62299..8c42c4de510b 100644
--- a/Documentation/admin-guide/cifs/changes.rst
+++ b/Documentation/admin-guide/cifs/changes.rst
@@ -3,6 +3,7 @@ Changes
=======
See https://wiki.samba.org/index.php/LinuxCIFSKernel for summary
-information (that may be easier to read than parsing the output of
-"git log fs/cifs") about fixes/improvements to CIFS/SMB2/SMB3 support (changes
+information about fixes/improvements to CIFS/SMB2/SMB3 support (changes
to cifs.ko module) by kernel version (and cifs internal module version).
+This may be easier to read than parsing the output of
+"git log fs/smb/client" by release.
diff --git a/Documentation/admin-guide/cifs/introduction.rst b/Documentation/admin-guide/cifs/introduction.rst
index 0b98f672d36f..53ea62906aa5 100644
--- a/Documentation/admin-guide/cifs/introduction.rst
+++ b/Documentation/admin-guide/cifs/introduction.rst
@@ -7,19 +7,19 @@ Introduction
protocol which was the successor to the Server Message Block
(SMB) protocol, the native file sharing mechanism for most early
PC operating systems. New and improved versions of CIFS are now
- called SMB2 and SMB3. Use of SMB3 (and later, including SMB3.1.1)
- is strongly preferred over using older dialects like CIFS due to
- security reaasons. All modern dialects, including the most recent,
- SMB3.1.1 are supported by the CIFS VFS module. The SMB3 protocol
- is implemented and supported by all major file servers
- such as all modern versions of Windows (including Windows 2016
- Server), as well as by Samba (which provides excellent
- CIFS/SMB2/SMB3 server support and tools for Linux and many other
- operating systems). Apple systems also support SMB3 well, as
- do most Network Attached Storage vendors, so this network
- filesystem client can mount to a wide variety of systems.
- It also supports mounting to the cloud (for example
- Microsoft Azure), including the necessary security features.
+ called SMB2 and SMB3. Use of SMB3 (and later, including SMB3.1.1
+ the most current dialect) is strongly preferred over using older
+ dialects like CIFS due to security reasons. All modern dialects,
+ including the most recent, SMB3.1.1, are supported by the CIFS VFS
+ module. The SMB3 protocol is implemented and supported by all major
+ file servers such as Windows (including Windows 2019 Server), as
+ well as by Samba (which provides excellent CIFS/SMB2/SMB3 server
+ support and tools for Linux and many other operating systems).
+ Apple systems also support SMB3 well, as do most Network Attached
+ Storage vendors, so this network filesystem client can mount to a
+ wide variety of systems. It also supports mounting to the cloud
+ (for example Microsoft Azure), including the necessary security
+ features.
The intent of this module is to provide the most advanced network
file system function for SMB3 compliant servers, including advanced
@@ -27,8 +27,8 @@ Introduction
POSIX compliance, secure per-user session establishment, encryption,
high performance safe distributed caching (leases/oplocks), optional packet
signing, large files, Unicode support and other internationalization
- improvements. Since both Samba server and this filesystem client support
- the CIFS Unix extensions (and in the future SMB3 POSIX extensions),
+ improvements. Since both Samba server and this filesystem client support the
+ CIFS Unix extensions, and the Linux client also suppors SMB3 POSIX extensions,
the combination can provide a reasonable alternative to other network and
cluster file systems for fileserving in some Linux to Linux environments,
not just in Linux to Windows (or Linux to Mac) environments.
diff --git a/Documentation/admin-guide/cifs/todo.rst b/Documentation/admin-guide/cifs/todo.rst
index 084c25f92dcb..9a65c670774e 100644
--- a/Documentation/admin-guide/cifs/todo.rst
+++ b/Documentation/admin-guide/cifs/todo.rst
@@ -2,7 +2,8 @@
TODO
====
-Version 2.14 December 21, 2018
+As of 6.7 kernel. See https://wiki.samba.org/index.php/LinuxCIFSKernel
+for list of features added by release
A Partial List of Missing Features
==================================
@@ -12,25 +13,27 @@ for visible, important contributions to this module. Here
is a partial list of the known problems and missing features:
a) SMB3 (and SMB3.1.1) missing optional features:
+ multichannel performance optimizations, algorithmic channel selection,
+ directory leases optimizations,
+ support for faster packet signing (GMAC),
+ support for compression over the network,
+ T10 copy offload ie "ODX" (copy chunk, and "Duplicate Extents" ioctl
+ are currently the only two server side copy mechanisms supported)
- - multichannel (started), integration with RDMA
- - directory leases (improved metadata caching), started (root dir only)
- - T10 copy offload ie "ODX" (copy chunk, and "Duplicate Extents" ioctl
- currently the only two server side copy mechanisms supported)
+b) Better optimized compounding and error handling for sparse file support,
+ perhaps addition of new optional SMB3.1.1 fsctls to make collapse range
+ and insert range more atomic
-b) improved sparse file support (fiemap and SEEK_HOLE are implemented
- but additional features would be supportable by the protocol).
+c) Support for SMB3.1.1 over QUIC (and perhaps other socket based protocols
+ like SCTP)
-c) Directory entry caching relies on a 1 second timer, rather than
- using Directory Leases, currently only the root file handle is cached longer
-
-d) quota support (needs minor kernel change since quota calls
- to make it to network filesystems or deviceless filesystems)
+d) quota support (needs minor kernel change since quota calls otherwise
+ won't make it to network filesystems or deviceless filesystems).
e) Additional use cases can be optimized to use "compounding" (e.g.
open/query/close and open/setinfo/close) to reduce the number of
roundtrips to the server and improve performance. Various cases
- (stat, statfs, create, unlink, mkdir) already have been improved by
+ (stat, statfs, create, unlink, mkdir, xattrs) already have been improved by
using compounding but more can be done. In addition we could
significantly reduce redundant opens by using deferred close (with
handle caching leases) and better using reference counters on file
@@ -60,7 +63,9 @@ k) Add tools to take advantage of more smb3 specific ioctls and features
metadata attributes easier from tools (e.g. extending what was done
in smb-info tool).
-l) encrypted file support
+l) encrypted file support (currently the attribute showing the file is
+ encrypted on the server is reported, but changing the attribute is not
+ supported).
m) improved stats gathering tools (perhaps integration with nfsometer?)
to extend and make easier to use what is currently in /proc/fs/cifs/Stats
@@ -69,14 +74,13 @@ n) Add support for claims based ACLs ("DAC")
o) mount helper GUI (to simplify the various configuration options on mount)
-p) Add support for witness protocol (perhaps ioctl to cifs.ko from user space
- tool listening on witness protocol RPC) to allow for notification of share
- move, server failover, and server adapter changes. And also improve other
- failover scenarios, e.g. when client knows multiple DFS entries point to
- different servers, and the server we are connected to has gone down.
+p) Expand support for witness protocol to allow for notification of share
+ move, and server network adapter changes. Currently only notifications by
+ the witness protocol for server move is supported by the Linux client.
q) Allow mount.cifs to be more verbose in reporting errors with dialect
- or unsupported feature errors.
+ or unsupported feature errors. This would now be easier due to the
+ implementation of the new mount API.
r) updating cifs documentation, and user guide.
@@ -87,26 +91,22 @@ t) split cifs and smb3 support into separate modules so legacy (and less
secure) CIFS dialect can be disabled in environments that don't need it
and simplify the code.
-v) POSIX Extensions for SMB3.1.1 (started, create and mkdir support added
- so far).
+v) Additional testing of POSIX Extensions for SMB3.1.1
+
+w) Support for the Mac SMB3.1.1 extensions to improve interop with Apple servers
-w) Add support for additional strong encryption types, and additional spnego
- authentication mechanisms (see MS-SMB2)
+x) Support for additional authentication options (e.g. IAKERB, peer-to-peer
+ Kerberos, SCRAM and others supported by existing servers)
-x) Finish support for SMB3.1.1 compression
+y) Improved tracing, more eBPF trace points, better scripts for performance
+ analysis
Known Bugs
==========
-See http://bugzilla.samba.org - search on product "CifsVFS" for
+See https://bugzilla.samba.org - search on product "CifsVFS" for
current bug list. Also check http://bugzilla.kernel.org (Product = File System, Component = CIFS)
-
-1) existing symbolic links (Windows reparse points) are recognized but
- can not be created remotely. They are implemented for Samba and those that
- support the CIFS Unix extensions, although earlier versions of Samba
- overly restrict the pathnames.
-2) follow_link and readdir code does not follow dfs junctions
- but recognizes them
+and xfstest results e.g. https://wiki.samba.org/index.php/Xfstest-results-smb3
Misc testing to do
==================
diff --git a/Documentation/admin-guide/cifs/usage.rst b/Documentation/admin-guide/cifs/usage.rst
index d3fb67b8a976..aa8290a29dc8 100644
--- a/Documentation/admin-guide/cifs/usage.rst
+++ b/Documentation/admin-guide/cifs/usage.rst
@@ -16,8 +16,7 @@ standard for interoperating between Macs and Windows and major NAS appliances.
Please see
MS-SMB2 (for detailed SMB2/SMB3/SMB3.1.1 protocol specification)
-http://protocolfreedom.org/ and
-http://samba.org/samba/PFIF/
+or https://samba.org/samba/PFIF/
for more details.
@@ -32,7 +31,7 @@ Build instructions
For Linux:
-1) Download the kernel (e.g. from http://www.kernel.org)
+1) Download the kernel (e.g. from https://www.kernel.org)
and change directory into the top of the kernel directory tree
(e.g. /usr/src/linux-2.5.73)
2) make menuconfig (or make xconfig)
@@ -46,7 +45,7 @@ Installation instructions
If you have built the CIFS vfs as module (successfully) simply
type ``make modules_install`` (or if you prefer, manually copy the file to
-the modules directory e.g. /lib/modules/2.4.10-4GB/kernel/fs/cifs/cifs.ko).
+the modules directory e.g. /lib/modules/6.3.0-060300-generic/kernel/fs/smb/client/cifs.ko).
If you have built the CIFS vfs into the kernel itself, follow the instructions
for your distribution on how to install a new kernel (usually you
@@ -67,24 +66,24 @@ If cifs is built as a module, then the size and number of network buffers
and maximum number of simultaneous requests to one server can be configured.
Changing these from their defaults is not recommended. By executing modinfo::
- modinfo kernel/fs/cifs/cifs.ko
+ modinfo <path to cifs.ko>
-on kernel/fs/cifs/cifs.ko the list of configuration changes that can be made
+on kernel/fs/smb/client/cifs.ko the list of configuration changes that can be made
at module initialization time (by running insmod cifs.ko) can be seen.
Recommendations
===============
-To improve security the SMB2.1 dialect or later (usually will get SMB3) is now
+To improve security the SMB2.1 dialect or later (usually will get SMB3.1.1) is now
the new default. To use old dialects (e.g. to mount Windows XP) use "vers=1.0"
on mount (or vers=2.0 for Windows Vista). Note that the CIFS (vers=1.0) is
much older and less secure than the default dialect SMB3 which includes
many advanced security features such as downgrade attack detection
and encrypted shares and stronger signing and authentication algorithms.
There are additional mount options that may be helpful for SMB3 to get
-improved POSIX behavior (NB: can use vers=3.0 to force only SMB3, never 2.1):
+improved POSIX behavior (NB: can use vers=3 to force SMB3 or later, never 2.1):
- ``mfsymlinks`` and ``cifsacl`` and ``idsfromsid``
+ ``mfsymlinks`` and either ``cifsacl`` or ``modefromsid`` (usually with ``idsfromsid``)
Allowing User Mounts
====================
@@ -116,7 +115,7 @@ later source tree in docs/manpages/mount.cifs.8
Allowing User Unmounts
======================
-To permit users to ummount directories that they have user mounted (see above),
+To permit users to unmount directories that they have user mounted (see above),
the utility umount.cifs may be used. It may be invoked directly, or if
umount.cifs is placed in /sbin, umount can invoke the cifs umount helper
(at least for most versions of the umount utility) for umount of cifs
@@ -198,7 +197,7 @@ that is ignored by local server applications and non-cifs clients and that will
not be traversed by the Samba server). This is opaque to the Linux client
application using the cifs vfs. Absolute symlinks will work to Samba 3.0.5 or
later, but only for remote clients using the CIFS Unix extensions, and will
-be invisbile to Windows clients and typically will not affect local
+be invisible to Windows clients and typically will not affect local
applications running on the same server as Samba.
Use instructions
@@ -268,7 +267,7 @@ would be forbidden for Windows/CIFS semantics) as long as the server is
configured for Unix Extensions (and the client has not disabled
/proc/fs/cifs/LinuxExtensionsEnabled). In addition the mount option
``mapposix`` can be used on CIFS (vers=1.0) to force the mapping of
-illegal Windows/NTFS/SMB characters to a remap range (this mount parm
+illegal Windows/NTFS/SMB characters to a remap range (this mount parameter
is the default for SMB3). This remap (``mapposix``) range is also
compatible with Mac (and "Services for Mac" on some older Windows).
@@ -400,7 +399,7 @@ A partial list of the supported mount options follows:
sep
if first mount option (after the -o), overrides
the comma as the separator between the mount
- parms. e.g.::
+ parameters. e.g.::
-o user=myname,password=mypassword,domain=mydom
@@ -715,6 +714,8 @@ DebugData Displays information about active CIFS sessions and
version.
Stats Lists summary resource usage information as well as per
share statistics.
+open_files List all the open file handles on all active SMB sessions.
+mount_params List of all mount parameters available for the module
======================= =======================================================
Configuration pseudo-files:
@@ -734,10 +735,9 @@ SecurityFlags Flags which control security negotiation and
using weaker password hashes is 0x37037 (lanman,
plaintext, ntlm, ntlmv2, signing allowed). Some
SecurityFlags require the corresponding menuconfig
- options to be enabled (lanman and plaintext require
- CONFIG_CIFS_WEAK_PW_HASH for example). Enabling
- plaintext authentication currently requires also
- enabling lanman authentication in the security flags
+ options to be enabled. Enabling plaintext
+ authentication currently requires also enabling
+ lanman authentication in the security flags
because the cifs module only supports sending
laintext passwords using the older lanman dialect
form of the session setup SMB. (e.g. for authentication
@@ -766,7 +766,7 @@ cifsFYI If set to non-zero value, additional debug information
Some debugging statements are not compiled into the
cifs kernel unless CONFIG_CIFS_DEBUG2 is enabled in the
kernel configuration. cifsFYI may be set to one or
- nore of the following flags (7 sets them all)::
+ more of the following flags (7 sets them all)::
+-----------------------------------------------+------+
| log cifs informational messages | 0x01 |
@@ -795,6 +795,8 @@ LinuxExtensionsEnabled If set to one then the client will attempt to
support and want to map the uid and gid fields
to values supplied at mount (rather than the
actual values, then set this to zero. (default 1)
+dfscache List the content of the DFS cache.
+ If set to 0, the client will clear the cache.
======================= =======================================================
These experimental features and tracing can be enabled by changing flags in
@@ -831,7 +833,7 @@ the active sessions and the shares that are mounted.
Enabling Kerberos (extended security) works but requires version 1.2 or later
of the helper program cifs.upcall to be present and to be configured in the
/etc/request-key.conf file. The cifs.upcall helper program is from the Samba
-project(http://www.samba.org). NTLM and NTLMv2 and LANMAN support do not
+project(https://www.samba.org). NTLM and NTLMv2 and LANMAN support do not
require this helper. Note that NTLMv2 security (which does not require the
cifs.upcall helper program), instead of using Kerberos, is sufficient for
some use cases.
@@ -857,12 +859,17 @@ CIFS kernel module parameters
These module parameters can be specified or modified either during the time of
module loading or during the runtime by using the interface::
- /proc/module/cifs/parameters/<param>
+ /sys/module/cifs/parameters/<param>
i.e.::
echo "value" > /sys/module/cifs/parameters/<param>
+More detailed descriptions of the available module parameters and their values
+can be seen by doing:
+
+ modinfo cifs (or modinfo smb3)
+
================= ==========================================================
1. enable_oplocks Enable or disable oplocks. Oplocks are enabled by default.
[Y/y/1]. To disable use any of [N/n/0].
diff --git a/Documentation/admin-guide/cifs/winucase_convert.pl b/Documentation/admin-guide/cifs/winucase_convert.pl
index 322a9c833f23..993186beea20 100755
--- a/Documentation/admin-guide/cifs/winucase_convert.pl
+++ b/Documentation/admin-guide/cifs/winucase_convert.pl
@@ -16,7 +16,7 @@
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
-# along with this program. If not, see <http://www.gnu.org/licenses/>.
+# along with this program. If not, see <https://www.gnu.org/licenses/>.
#
while(<>) {
diff --git a/Documentation/admin-guide/cpu-load.rst b/Documentation/admin-guide/cpu-load.rst
index 2d01ce43d2a2..21a984337080 100644
--- a/Documentation/admin-guide/cpu-load.rst
+++ b/Documentation/admin-guide/cpu-load.rst
@@ -61,51 +61,54 @@ will lead to quite erratic information inside ``/proc/stat``::
static volatile sig_atomic_t stop;
- static void sighandler (int signr)
+ static void sighandler(int signr)
{
- (void) signr;
- stop = 1;
+ (void) signr;
+ stop = 1;
}
+
static unsigned long hog (unsigned long niters)
{
- stop = 0;
- while (!stop && --niters);
- return niters;
+ stop = 0;
+ while (!stop && --niters);
+ return niters;
}
+
int main (void)
{
- int i;
- struct itimerval it = { .it_interval = { .tv_sec = 0, .tv_usec = 1 },
- .it_value = { .tv_sec = 0, .tv_usec = 1 } };
- sigset_t set;
- unsigned long v[HIST];
- double tmp = 0.0;
- unsigned long n;
- signal (SIGALRM, &sighandler);
- setitimer (ITIMER_REAL, &it, NULL);
-
- hog (ULONG_MAX);
- for (i = 0; i < HIST; ++i) v[i] = ULONG_MAX - hog (ULONG_MAX);
- for (i = 0; i < HIST; ++i) tmp += v[i];
- tmp /= HIST;
- n = tmp - (tmp / 3.0);
-
- sigemptyset (&set);
- sigaddset (&set, SIGALRM);
-
- for (;;) {
- hog (n);
- sigwait (&set, &i);
- }
- return 0;
+ int i;
+ struct itimerval it = {
+ .it_interval = { .tv_sec = 0, .tv_usec = 1 },
+ .it_value = { .tv_sec = 0, .tv_usec = 1 } };
+ sigset_t set;
+ unsigned long v[HIST];
+ double tmp = 0.0;
+ unsigned long n;
+ signal(SIGALRM, &sighandler);
+ setitimer(ITIMER_REAL, &it, NULL);
+
+ hog (ULONG_MAX);
+ for (i = 0; i < HIST; ++i) v[i] = ULONG_MAX - hog(ULONG_MAX);
+ for (i = 0; i < HIST; ++i) tmp += v[i];
+ tmp /= HIST;
+ n = tmp - (tmp / 3.0);
+
+ sigemptyset(&set);
+ sigaddset(&set, SIGALRM);
+
+ for (;;) {
+ hog(n);
+ sigwait(&set, &i);
+ }
+ return 0;
}
References
----------
-- http://lkml.org/lkml/2007/2/12/6
-- Documentation/filesystems/proc.txt (1.8)
+- https://lore.kernel.org/r/loom.20070212T063225-663@post.gmane.org
+- Documentation/filesystems/proc.rst (1.8)
Thanks
diff --git a/Documentation/admin-guide/cputopology.rst b/Documentation/admin-guide/cputopology.rst
index b90dafcc8237..d29cacc9b3c3 100644
--- a/Documentation/admin-guide/cputopology.rst
+++ b/Documentation/admin-guide/cputopology.rst
@@ -2,105 +2,28 @@
How CPU topology info is exported via sysfs
===========================================
-Export CPU topology info via sysfs. Items (attributes) are similar
-to /proc/cpuinfo output of some architectures. They reside in
-/sys/devices/system/cpu/cpuX/topology/:
-
-physical_package_id:
-
- physical package id of cpuX. Typically corresponds to a physical
- socket number, but the actual value is architecture and platform
- dependent.
-
-die_id:
-
- the CPU die ID of cpuX. Typically it is the hardware platform's
- identifier (rather than the kernel's). The actual value is
- architecture and platform dependent.
-
-core_id:
-
- the CPU core ID of cpuX. Typically it is the hardware platform's
- identifier (rather than the kernel's). The actual value is
- architecture and platform dependent.
-
-book_id:
-
- the book ID of cpuX. Typically it is the hardware platform's
- identifier (rather than the kernel's). The actual value is
- architecture and platform dependent.
-
-drawer_id:
-
- the drawer ID of cpuX. Typically it is the hardware platform's
- identifier (rather than the kernel's). The actual value is
- architecture and platform dependent.
-
-core_cpus:
-
- internal kernel map of CPUs within the same core.
- (deprecated name: "thread_siblings")
-
-core_cpus_list:
-
- human-readable list of CPUs within the same core.
- (deprecated name: "thread_siblings_list");
-
-package_cpus:
-
- internal kernel map of the CPUs sharing the same physical_package_id.
- (deprecated name: "core_siblings")
-
-package_cpus_list:
-
- human-readable list of CPUs sharing the same physical_package_id.
- (deprecated name: "core_siblings_list")
-
-die_cpus:
-
- internal kernel map of CPUs within the same die.
-
-die_cpus_list:
-
- human-readable list of CPUs within the same die.
-
-book_siblings:
-
- internal kernel map of cpuX's hardware threads within the same
- book_id.
-
-book_siblings_list:
-
- human-readable list of cpuX's hardware threads within the same
- book_id.
-
-drawer_siblings:
-
- internal kernel map of cpuX's hardware threads within the same
- drawer_id.
-
-drawer_siblings_list:
-
- human-readable list of cpuX's hardware threads within the same
- drawer_id.
+CPU topology info is exported via sysfs. Items (attributes) are similar
+to /proc/cpuinfo output of some architectures. They reside in
+/sys/devices/system/cpu/cpuX/topology/. Please refer to the ABI file:
+Documentation/ABI/stable/sysfs-devices-system-cpu.
Architecture-neutral, drivers/base/topology.c, exports these attributes.
-However, the book and drawer related sysfs files will only be created if
-CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are selected, respectively.
-
-CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are currently only used on s390,
-where they reflect the cpu and cache hierarchy.
+However the die, cluster, book, and drawer hierarchy related sysfs files will
+only be created if an architecture provides the related macros as described
+below.
For an architecture to support this feature, it must define some of
these macros in include/asm-XXX/topology.h::
#define topology_physical_package_id(cpu)
#define topology_die_id(cpu)
+ #define topology_cluster_id(cpu)
#define topology_core_id(cpu)
#define topology_book_id(cpu)
#define topology_drawer_id(cpu)
#define topology_sibling_cpumask(cpu)
#define topology_core_cpumask(cpu)
+ #define topology_cluster_cpumask(cpu)
#define topology_die_cpumask(cpu)
#define topology_book_cpumask(cpu)
#define topology_drawer_cpumask(cpu)
@@ -116,15 +39,16 @@ not defined by include/asm-XXX/topology.h:
1) topology_physical_package_id: -1
2) topology_die_id: -1
-3) topology_core_id: 0
-4) topology_sibling_cpumask: just the given CPU
-5) topology_core_cpumask: just the given CPU
-6) topology_die_cpumask: just the given CPU
-
-For architectures that don't support books (CONFIG_SCHED_BOOK) there are no
-default definitions for topology_book_id() and topology_book_cpumask().
-For architectures that don't support drawers (CONFIG_SCHED_DRAWER) there are
-no default definitions for topology_drawer_id() and topology_drawer_cpumask().
+3) topology_cluster_id: -1
+4) topology_core_id: 0
+5) topology_book_id: -1
+6) topology_drawer_id: -1
+7) topology_sibling_cpumask: just the given CPU
+8) topology_core_cpumask: just the given CPU
+9) topology_cluster_cpumask: just the given CPU
+10) topology_die_cpumask: just the given CPU
+11) topology_book_cpumask: just the given CPU
+12) topology_drawer_cpumask: just the given CPU
Additionally, CPU topology information is provided under
/sys/devices/system/cpu and includes these files. The internal
@@ -135,9 +59,9 @@ source for the output is in brackets ("[]").
[NR_CPUS-1]
offline: CPUs that are not online because they have been
- HOTPLUGGED off (see cpu-hotplug.txt) or exceed the limit
- of CPUs allowed by the kernel configuration (kernel_max
- above). [~cpu_online_mask + cpus >= NR_CPUS]
+ HOTPLUGGED off or exceed the limit of CPUs allowed by the
+ kernel configuration (kernel_max above).
+ [~cpu_online_mask + cpus >= NR_CPUS]
online: CPUs that are online and being scheduled [cpu_online_mask]
@@ -173,5 +97,5 @@ online.)::
possible: 0-127
present: 0-3
-See cpu-hotplug.txt for the possible_cpus=NUM kernel start parameter
-as well as more information on the various cpumasks.
+See Documentation/core-api/cpu_hotplug.rst for the possible_cpus=NUM
+kernel start parameter as well as more information on the various cpumasks.
diff --git a/Documentation/admin-guide/dell_rbu.rst b/Documentation/admin-guide/dell_rbu.rst
index 8d70e1fc9f9d..2196caf1b939 100644
--- a/Documentation/admin-guide/dell_rbu.rst
+++ b/Documentation/admin-guide/dell_rbu.rst
@@ -26,7 +26,7 @@ Please go to http://support.dell.com register and you can find info on
OpenManage and Dell Update packages (DUP).
Libsmbios can also be used to update BIOS on Dell systems go to
-http://linux.dell.com/libsmbios/ for details.
+https://linux.dell.com/libsmbios/ for details.
Dell_RBU driver supports BIOS update using the monolithic image and packetized
image methods. In case of monolithic the driver allocates a contiguous chunk
diff --git a/Documentation/admin-guide/device-mapper/cache-policies.rst b/Documentation/admin-guide/device-mapper/cache-policies.rst
index b17fe352fc41..13da4d831d46 100644
--- a/Documentation/admin-guide/device-mapper/cache-policies.rst
+++ b/Documentation/admin-guide/device-mapper/cache-policies.rst
@@ -70,7 +70,7 @@ the entries (each hotspot block covers a larger area than a single
cache block).
All this means smq uses ~25bytes per cache block. Still a lot of
-memory, but a substantial improvement nontheless.
+memory, but a substantial improvement nonetheless.
Level balancing
^^^^^^^^^^^^^^^
diff --git a/Documentation/admin-guide/device-mapper/dm-crypt.rst b/Documentation/admin-guide/device-mapper/dm-crypt.rst
index 8f4a3f889d43..aa2d04d95df6 100644
--- a/Documentation/admin-guide/device-mapper/dm-crypt.rst
+++ b/Documentation/admin-guide/device-mapper/dm-crypt.rst
@@ -46,7 +46,7 @@ Parameters::
capi:authenc(hmac(sha256),xts(aes))-random
capi:rfc7539(chacha20,poly1305)-random
- The /proc/crypto contains a list of curently loaded crypto modes.
+ The /proc/crypto contains a list of currently loaded crypto modes.
<key>
Key used for encryption. It is encoded either as a hexadecimal number
@@ -67,7 +67,7 @@ Parameters::
the value passed in <key_size>.
<key_type>
- Either 'logon' or 'user' kernel key type.
+ Either 'logon', 'user', 'encrypted' or 'trusted' kernel key type.
<key_description>
The kernel keyring key description crypt target should look for
@@ -92,7 +92,7 @@ Parameters::
<#opt_params>
Number of optional parameters. If there are no optional parameters,
- the optional paramaters section can be skipped or #opt_params can be zero.
+ the optional parameters section can be skipped or #opt_params can be zero.
Otherwise #opt_params is the number of following arguments.
Example of optional parameters section:
@@ -121,6 +121,14 @@ submit_from_crypt_cpus
thread because it benefits CFQ to have writes submitted using the
same context.
+no_read_workqueue
+ Bypass dm-crypt internal workqueue and process read requests synchronously.
+
+no_write_workqueue
+ Bypass dm-crypt internal workqueue and process write requests synchronously.
+ This option is automatically enabled for host-managed zoned block devices
+ (e.g. host-managed SMR hard-disks).
+
integrity:<bytes>:<type>
The device requires additional <bytes> metadata per-sector stored
in per-bio integrity structure. This metadata must by provided
diff --git a/Documentation/admin-guide/device-mapper/dm-dust.rst b/Documentation/admin-guide/device-mapper/dm-dust.rst
index b6e7e7ead831..e35ec8cd2f88 100644
--- a/Documentation/admin-guide/device-mapper/dm-dust.rst
+++ b/Documentation/admin-guide/device-mapper/dm-dust.rst
@@ -69,10 +69,11 @@ Create the dm-dust device:
$ sudo dmsetup create dust1 --table '0 33552384 dust /dev/vdb1 0 4096'
Check the status of the read behavior ("bypass" indicates that all I/O
-will be passed through to the underlying device)::
+will be passed through to the underlying device; "verbose" indicates that
+bad block additions, removals, and remaps will be verbosely logged)::
$ sudo dmsetup status dust1
- 0 33552384 dust 252:17 bypass
+ 0 33552384 dust 252:17 bypass verbose
$ sudo dd if=/dev/mapper/dust1 of=/dev/null bs=512 count=128 iflag=direct
128+0 records in
@@ -164,7 +165,7 @@ following message command::
A message will print with the number of bad blocks currently
configured on the device::
- kernel: device-mapper: dust: countbadblocks: 895 badblock(s) found
+ countbadblocks: 895 badblock(s) found
Querying for specific bad blocks
--------------------------------
@@ -176,11 +177,11 @@ following message command::
The following message will print if the block is in the list::
- device-mapper: dust: queryblock: block 72 found in badblocklist
+ dust_query_block: block 72 found in badblocklist
The following message will print if the block is not in the list::
- device-mapper: dust: queryblock: block 72 not found in badblocklist
+ dust_query_block: block 72 not found in badblocklist
The "queryblock" message command will work in both the "enabled"
and "disabled" modes, allowing the verification of whether a block
@@ -198,12 +199,28 @@ following message command::
After clearing the bad block list, the following message will appear::
- kernel: device-mapper: dust: clearbadblocks: badblocks cleared
+ dust_clear_badblocks: badblocks cleared
If there were no bad blocks to clear, the following message will
appear::
- kernel: device-mapper: dust: clearbadblocks: no badblocks found
+ dust_clear_badblocks: no badblocks found
+
+Listing the bad block list
+--------------------------
+
+To list all bad blocks in the bad block list (using an example device
+with blocks 1 and 2 in the bad block list), run the following message
+command::
+
+ $ sudo dmsetup message dust1 0 listbadblocks
+ 1
+ 2
+
+If there are no bad blocks in the bad block list, the command will
+execute with no output::
+
+ $ sudo dmsetup message dust1 0 listbadblocks
Message commands list
---------------------
@@ -223,6 +240,7 @@ Single argument message commands::
countbadblocks
clearbadblocks
+ listbadblocks
disable
enable
quiet
diff --git a/Documentation/admin-guide/device-mapper/dm-ebs.rst b/Documentation/admin-guide/device-mapper/dm-ebs.rst
new file mode 100644
index 000000000000..c09f66db5621
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/dm-ebs.rst
@@ -0,0 +1,51 @@
+======
+dm-ebs
+======
+
+
+This target is similar to the linear target except that it emulates
+a smaller logical block size on a device with a larger logical block
+size. Its main purpose is to provide emulation of 512 byte sectors on
+devices that do not provide this emulation (i.e. 4K native disks).
+
+Supported emulated logical block sizes 512, 1024, 2048 and 4096.
+
+Underlying block size can be set to > 4K to test buffering larger units.
+
+
+Table parameters
+----------------
+ <dev path> <offset> <emulated sectors> [<underlying sectors>]
+
+Mandatory parameters:
+
+ <dev path>:
+ Full pathname to the underlying block-device,
+ or a "major:minor" device-number.
+ <offset>:
+ Starting sector within the device;
+ has to be a multiple of <emulated sectors>.
+ <emulated sectors>:
+ Number of sectors defining the logical block size to be emulated;
+ 1, 2, 4, 8 sectors of 512 bytes supported.
+
+Optional parameter:
+
+ <underlying sectors>:
+ Number of sectors defining the logical block size of <dev path>.
+ 2^N supported, e.g. 8 = emulate 8 sectors of 512 bytes = 4KiB.
+ If not provided, the logical block size of <dev path> will be used.
+
+
+Examples:
+
+Emulate 1 sector = 512 bytes logical block size on /dev/sda starting at
+offset 1024 sectors with underlying devices block size automatically set:
+
+ebs /dev/sda 1024 1
+
+Emulate 2 sector = 1KiB logical block size on /dev/sda starting at
+offset 128 sectors, enforce 2KiB underlying device block size.
+This presumes 2KiB logical blocksize on /dev/sda or less to work:
+
+ebs /dev/sda 128 2 4
diff --git a/Documentation/admin-guide/device-mapper/dm-flakey.rst b/Documentation/admin-guide/device-mapper/dm-flakey.rst
index 86138735879d..f967c5fea219 100644
--- a/Documentation/admin-guide/device-mapper/dm-flakey.rst
+++ b/Documentation/admin-guide/device-mapper/dm-flakey.rst
@@ -39,6 +39,10 @@ Optional feature parameters:
If no feature parameters are present, during the periods of
unreliability, all I/O returns errors.
+ error_reads:
+ All read I/O is failed with an error signalled.
+ Write I/O is handled correctly.
+
drop_writes:
All write I/O is silently ignored.
Read I/O is handled correctly.
@@ -63,6 +67,16 @@ Optional feature parameters:
Perform the replacement only if bio->bi_opf has all the
selected flags set.
+ random_read_corrupt <probability>
+ During <down interval>, replace random byte in a read bio
+ with a random value. probability is an integer between
+ 0 and 1000000000 meaning 0% to 100% probability of corruption.
+
+ random_write_corrupt <probability>
+ During <down interval>, replace random byte in a write bio
+ with a random value. probability is an integer between
+ 0 and 1000000000 meaning 0% to 100% probability of corruption.
+
Examples:
Replaces the 32nd byte of READ bios with the value 1::
diff --git a/Documentation/admin-guide/device-mapper/dm-ima.rst b/Documentation/admin-guide/device-mapper/dm-ima.rst
new file mode 100644
index 000000000000..a4aa50a828e0
--- /dev/null
+++ b/Documentation/admin-guide/device-mapper/dm-ima.rst
@@ -0,0 +1,715 @@
+======
+dm-ima
+======
+
+For a given system, various external services/infrastructure tools
+(including the attestation service) interact with it - both during the
+setup and during rest of the system run-time. They share sensitive data
+and/or execute critical workload on that system. The external services
+may want to verify the current run-time state of the relevant kernel
+subsystems before fully trusting the system with business-critical
+data/workload.
+
+Device mapper plays a critical role on a given system by providing
+various important functionalities to the block devices using various
+target types like crypt, verity, integrity etc. Each of these target
+types’ functionalities can be configured with various attributes.
+The attributes chosen to configure these target types can significantly
+impact the security profile of the block device, and in-turn, of the
+system itself. For instance, the type of encryption algorithm and the
+key size determines the strength of encryption for a given block device.
+
+Therefore, verifying the current state of various block devices as well
+as their various target attributes is crucial for external services before
+fully trusting the system with business-critical data/workload.
+
+IMA kernel subsystem provides the necessary functionality for
+device mapper to measure the state and configuration of
+various block devices -
+
+- by device mapper itself, from within the kernel,
+- in a tamper resistant way,
+- and re-measured - triggered on state/configuration change.
+
+Setting the IMA Policy:
+=======================
+For IMA to measure the data on a given system, the IMA policy on the
+system needs to be updated to have following line, and the system needs
+to be restarted for the measurements to take effect.
+
+::
+
+ /etc/ima/ima-policy
+ measure func=CRITICAL_DATA label=device-mapper template=ima-buf
+
+The measurements will be reflected in the IMA logs, which are located at:
+
+::
+
+ /sys/kernel/security/integrity/ima/ascii_runtime_measurements
+ /sys/kernel/security/integrity/ima/binary_runtime_measurements
+
+Then IMA ASCII measurement log has the following format:
+
+::
+
+ <PCR> <TEMPLATE_DATA_DIGEST> <TEMPLATE_NAME> <TEMPLATE_DATA>
+
+ PCR := Platform Configuration Register, in which the values are registered.
+ This is applicable if TPM chip is in use.
+
+ TEMPLATE_DATA_DIGEST := Template data digest of the IMA record.
+ TEMPLATE_NAME := Template name that registered the integrity value (e.g. ima-buf).
+
+ TEMPLATE_DATA := <ALG> ":" <EVENT_DIGEST> <EVENT_NAME> <EVENT_DATA>
+ It contains data for the specific event to be measured,
+ in a given template data format.
+
+ ALG := Algorithm to compute event digest
+ EVENT_DIGEST := Digest of the event data
+ EVENT_NAME := Description of the event (e.g. 'dm_table_load').
+ EVENT_DATA := The event data to be measured.
+
+|
+
+| *NOTE #1:*
+| The DM target data measured by IMA subsystem can alternatively
+ be queried from userspace by setting DM_IMA_MEASUREMENT_FLAG with
+ DM_TABLE_STATUS_CMD.
+
+|
+
+| *NOTE #2:*
+| The Kernel configuration CONFIG_IMA_DISABLE_HTABLE allows measurement of duplicate records.
+| To support recording duplicate IMA events in the IMA log, the Kernel needs to be configured with
+ CONFIG_IMA_DISABLE_HTABLE=y.
+
+Supported Device States:
+========================
+Following device state changes will trigger IMA measurements:
+
+ 1. Table load
+ #. Device resume
+ #. Device remove
+ #. Table clear
+ #. Device rename
+
+1. Table load:
+---------------
+When a new table is loaded in a device's inactive table slot,
+the device information and target specific details from the
+targets in the table are measured.
+
+The IMA measurement log has the following format for 'dm_table_load':
+
+::
+
+ EVENT_NAME := "dm_table_load"
+ EVENT_DATA := <dm_version_str> ";" <device_metadata> ";" <table_load_data>
+
+ dm_version_str := "dm_version=" <N> "." <N> "." <N>
+ Same as Device Mapper driver version.
+ device_metadata := <device_name> "," <device_uuid> "," <device_major> "," <device_minor> ","
+ <minor_count> "," <num_device_targets> ";"
+
+ device_name := "name=" <dm-device-name>
+ device_uuid := "uuid=" <dm-device-uuid>
+ device_major := "major=" <N>
+ device_minor := "minor=" <N>
+ minor_count := "minor_count=" <N>
+ num_device_targets := "num_targets=" <N>
+ dm-device-name := Name of the device. If it contains special characters like '\', ',', ';',
+ they are prefixed with '\'.
+ dm-device-uuid := UUID of the device. If it contains special characters like '\', ',', ';',
+ they are prefixed with '\'.
+
+ table_load_data := <target_data>
+ Represents the data (as name=value pairs) from various targets in the table,
+ which is being loaded into the DM device's inactive table slot.
+ target_data := <target_data_row> | <target_data><target_data_row>
+
+ target_data_row := <target_index> "," <target_begin> "," <target_len> "," <target_name> ","
+ <target_version> "," <target_attributes> ";"
+ target_index := "target_index=" <N>
+ Represents nth target in the table (from 0 to N-1 targets specified in <num_device_targets>)
+ If all the data for N targets doesn't fit in the given buffer - then the data that fits
+ in the buffer (say from target 0 to x) is measured in a given IMA event.
+ The remaining data from targets x+1 to N-1 is measured in the subsequent IMA events,
+ with the same format as that of 'dm_table_load'
+ i.e. <dm_version_str> ";" <device_metadata> ";" <table_load_data>.
+
+ target_begin := "target_begin=" <N>
+ target_len := "target_len=" <N>
+ target_name := Name of the target. 'linear', 'crypt', 'integrity' etc.
+ The targets that are supported for IMA measurements are documented below in the
+ 'Supported targets' section.
+ target_version := "target_version=" <N> "." <N> "." <N>
+ target_attributes := Data containing comma separated list of name=value pairs of target specific attributes.
+
+ For instance, if a linear device is created with the following table entries,
+ # dmsetup create linear1
+ 0 2 linear /dev/loop0 512
+ 2 2 linear /dev/loop0 512
+ 4 2 linear /dev/loop0 512
+ 6 2 linear /dev/loop0 512
+
+ Then IMA ASCII measurement log will have the following entry:
+ (converted from ASCII to text for readability)
+
+ 10 a8c5ff755561c7a28146389d1514c318592af49a ima-buf sha256:4d73481ecce5eadba8ab084640d85bb9ca899af4d0a122989252a76efadc5b72
+ dm_table_load
+ dm_version=4.45.0;
+ name=linear1,uuid=,major=253,minor=0,minor_count=1,num_targets=4;
+ target_index=0,target_begin=0,target_len=2,target_name=linear,target_version=1.4.0,device_name=7:0,start=512;
+ target_index=1,target_begin=2,target_len=2,target_name=linear,target_version=1.4.0,device_name=7:0,start=512;
+ target_index=2,target_begin=4,target_len=2,target_name=linear,target_version=1.4.0,device_name=7:0,start=512;
+ target_index=3,target_begin=6,target_len=2,target_name=linear,target_version=1.4.0,device_name=7:0,start=512;
+
+2. Device resume:
+------------------
+When a suspended device is resumed, the device information and the hash of the
+data from previous load of an active table are measured.
+
+The IMA measurement log has the following format for 'dm_device_resume':
+
+::
+
+ EVENT_NAME := "dm_device_resume"
+ EVENT_DATA := <dm_version_str> ";" <device_metadata> ";" <active_table_hash> ";" <current_device_capacity> ";"
+
+ dm_version_str := As described in the 'Table load' section above.
+ device_metadata := As described in the 'Table load' section above.
+ active_table_hash := "active_table_hash=" <table_hash_alg> ":" <table_hash>
+ Rerpresents the hash of the IMA data being measured for the
+ active table for the device.
+ table_hash_alg := Algorithm used to compute the hash.
+ table_hash := Hash of the (<dm_version_str> ";" <device_metadata> ";" <table_load_data> ";")
+ as described in the 'dm_table_load' above.
+ Note: If the table_load data spans across multiple IMA 'dm_table_load'
+ events for a given device, the hash is computed combining all the event data
+ i.e. (<dm_version_str> ";" <device_metadata> ";" <table_load_data> ";")
+ across all those events.
+ current_device_capacity := "current_device_capacity=" <N>
+
+ For instance, if a linear device is resumed with the following command,
+ #dmsetup resume linear1
+
+ then IMA ASCII measurement log will have an entry with:
+ (converted from ASCII to text for readability)
+
+ 10 56c00cc062ffc24ccd9ac2d67d194af3282b934e ima-buf sha256:e7d12c03b958b4e0e53e7363a06376be88d98a1ac191fdbd3baf5e4b77f329b6
+ dm_device_resume
+ dm_version=4.45.0;
+ name=linear1,uuid=,major=253,minor=0,minor_count=1,num_targets=4;
+ active_table_hash=sha256:4d73481ecce5eadba8ab084640d85bb9ca899af4d0a122989252a76efadc5b72;current_device_capacity=8;
+
+3. Device remove:
+------------------
+When a device is removed, the device information and a sha256 hash of the
+data from an active and inactive table are measured.
+
+The IMA measurement log has the following format for 'dm_device_remove':
+
+::
+
+ EVENT_NAME := "dm_device_remove"
+ EVENT_DATA := <dm_version_str> ";" <device_active_metadata> ";" <device_inactive_metadata> ";"
+ <active_table_hash> "," <inactive_table_hash> "," <remove_all> ";" <current_device_capacity> ";"
+
+ dm_version_str := As described in the 'Table load' section above.
+ device_active_metadata := Device metadata that reflects the currently loaded active table.
+ The format is same as 'device_metadata' described in the 'Table load' section above.
+ device_inactive_metadata := Device metadata that reflects the inactive table.
+ The format is same as 'device_metadata' described in the 'Table load' section above.
+ active_table_hash := Hash of the currently loaded active table.
+ The format is same as 'active_table_hash' described in the 'Device resume' section above.
+ inactive_table_hash := Hash of the inactive table.
+ The format is same as 'active_table_hash' described in the 'Device resume' section above.
+ remove_all := "remove_all=" <yes_no>
+ yes_no := "y" | "n"
+ current_device_capacity := "current_device_capacity=" <N>
+
+ For instance, if a linear device is removed with the following command,
+ #dmsetup remove l1
+
+ then IMA ASCII measurement log will have the following entry:
+ (converted from ASCII to text for readability)
+
+ 10 790e830a3a7a31590824ac0642b3b31c2d0e8b38 ima-buf sha256:ab9f3c959367a8f5d4403d6ce9c3627dadfa8f9f0e7ec7899299782388de3840
+ dm_device_remove
+ dm_version=4.45.0;
+ device_active_metadata=name=l1,uuid=,major=253,minor=2,minor_count=1,num_targets=2;
+ device_inactive_metadata=name=l1,uuid=,major=253,minor=2,minor_count=1,num_targets=1;
+ active_table_hash=sha256:4a7e62efaebfc86af755831998b7db6f59b60d23c9534fb16a4455907957953a,
+ inactive_table_hash=sha256:9d79c175bc2302d55a183e8f50ad4bafd60f7692fd6249e5fd213e2464384b86,remove_all=n;
+ current_device_capacity=2048;
+
+4. Table clear:
+----------------
+When an inactive table is cleared from the device, the device information and a sha256 hash of the
+data from an inactive table are measured.
+
+The IMA measurement log has the following format for 'dm_table_clear':
+
+::
+
+ EVENT_NAME := "dm_table_clear"
+ EVENT_DATA := <dm_version_str> ";" <device_inactive_metadata> ";" <inactive_table_hash> ";" <current_device_capacity> ";"
+
+ dm_version_str := As described in the 'Table load' section above.
+ device_inactive_metadata := Device metadata that was captured during the load time inactive table being cleared.
+ The format is same as 'device_metadata' described in the 'Table load' section above.
+ inactive_table_hash := Hash of the inactive table being cleared from the device.
+ The format is same as 'active_table_hash' described in the 'Device resume' section above.
+ current_device_capacity := "current_device_capacity=" <N>
+
+ For instance, if a linear device's inactive table is cleared,
+ #dmsetup clear l1
+
+ then IMA ASCII measurement log will have an entry with:
+ (converted from ASCII to text for readability)
+
+ 10 77d347408f557f68f0041acb0072946bb2367fe5 ima-buf sha256:42f9ca22163fdfa548e6229dece2959bc5ce295c681644240035827ada0e1db5
+ dm_table_clear
+ dm_version=4.45.0;
+ name=l1,uuid=,major=253,minor=2,minor_count=1,num_targets=1;
+ inactive_table_hash=sha256:75c0dc347063bf474d28a9907037eba060bfe39d8847fc0646d75e149045d545;current_device_capacity=1024;
+
+5. Device rename:
+------------------
+When an device's NAME or UUID is changed, the device information and the new NAME and UUID
+are measured.
+
+The IMA measurement log has the following format for 'dm_device_rename':
+
+::
+
+ EVENT_NAME := "dm_device_rename"
+ EVENT_DATA := <dm_version_str> ";" <device_active_metadata> ";" <new_device_name> "," <new_device_uuid> ";" <current_device_capacity> ";"
+
+ dm_version_str := As described in the 'Table load' section above.
+ device_active_metadata := Device metadata that reflects the currently loaded active table.
+ The format is same as 'device_metadata' described in the 'Table load' section above.
+ new_device_name := "new_name=" <dm-device-name>
+ dm-device-name := Same as <dm-device-name> described in 'Table load' section above
+ new_device_uuid := "new_uuid=" <dm-device-uuid>
+ dm-device-uuid := Same as <dm-device-uuid> described in 'Table load' section above
+ current_device_capacity := "current_device_capacity=" <N>
+
+ E.g 1: if a linear device's name is changed with the following command,
+ #dmsetup rename linear1 --setuuid 1234-5678
+
+ then IMA ASCII measurement log will have an entry with:
+ (converted from ASCII to text for readability)
+
+ 10 8b0423209b4c66ac1523f4c9848c9b51ee332f48 ima-buf sha256:6847b7258134189531db593e9230b257c84f04038b5a18fd2e1473860e0569ac
+ dm_device_rename
+ dm_version=4.45.0;
+ name=linear1,uuid=,major=253,minor=2,minor_count=1,num_targets=1;new_name=linear1,new_uuid=1234-5678;
+ current_device_capacity=1024;
+
+ E.g 2: if a linear device's name is changed with the following command,
+ # dmsetup rename linear1 linear=2
+
+ then IMA ASCII measurement log will have an entry with:
+ (converted from ASCII to text for readability)
+
+ 10 bef70476b99c2bdf7136fae033aa8627da1bf76f ima-buf sha256:8c6f9f53b9ef9dc8f92a2f2cca8910e622543d0f0d37d484870cb16b95111402
+ dm_device_rename
+ dm_version=4.45.0;
+ name=linear1,uuid=1234-5678,major=253,minor=2,minor_count=1,num_targets=1;
+ new_name=linear\=2,new_uuid=1234-5678;
+ current_device_capacity=1024;
+
+Supported targets:
+==================
+
+Following targets are supported to measure their data using IMA:
+
+ 1. cache
+ #. crypt
+ #. integrity
+ #. linear
+ #. mirror
+ #. multipath
+ #. raid
+ #. snapshot
+ #. striped
+ #. verity
+
+1. cache
+---------
+The 'target_attributes' (described as part of EVENT_DATA in 'Table load'
+section above) has the following data format for 'cache' target.
+
+::
+
+ target_attributes := <target_name> "," <target_version> "," <metadata_mode> "," <cache_metadata_device> ","
+ <cache_device> "," <cache_origin_device> "," <writethrough> "," <writeback> ","
+ <passthrough> "," <no_discard_passdown> ";"
+
+ target_name := "target_name=cache"
+ target_version := "target_version=" <N> "." <N> "." <N>
+ metadata_mode := "metadata_mode=" <cache_metadata_mode>
+ cache_metadata_mode := "fail" | "ro" | "rw"
+ cache_device := "cache_device=" <cache_device_name_string>
+ cache_origin_device := "cache_origin_device=" <cache_origin_device_string>
+ writethrough := "writethrough=" <yes_no>
+ writeback := "writeback=" <yes_no>
+ passthrough := "passthrough=" <yes_no>
+ no_discard_passdown := "no_discard_passdown=" <yes_no>
+ yes_no := "y" | "n"
+
+ E.g.
+ When a 'cache' target is loaded, then IMA ASCII measurement log will have an entry
+ similar to the following, depicting what 'cache' attributes are measured in EVENT_DATA
+ for 'dm_table_load' event.
+ (converted from ASCII to text for readability)
+
+ dm_version=4.45.0;name=cache1,uuid=cache_uuid,major=253,minor=2,minor_count=1,num_targets=1;
+ target_index=0,target_begin=0,target_len=28672,target_name=cache,target_version=2.2.0,metadata_mode=rw,
+ cache_metadata_device=253:4,cache_device=253:3,cache_origin_device=253:5,writethrough=y,writeback=n,
+ passthrough=n,metadata2=y,no_discard_passdown=n;
+
+
+2. crypt
+---------
+The 'target_attributes' (described as part of EVENT_DATA in 'Table load'
+section above) has the following data format for 'crypt' target.
+
+::
+
+ target_attributes := <target_name> "," <target_version> "," <allow_discards> "," <same_cpu_crypt> ","
+ <submit_from_crypt_cpus> "," <no_read_workqueue> "," <no_write_workqueue> ","
+ <iv_large_sectors> "," <iv_large_sectors> "," [<integrity_tag_size> ","] [<cipher_auth> ","]
+ [<sector_size> ","] [<cipher_string> ","] <key_size> "," <key_parts> ","
+ <key_extra_size> "," <key_mac_size> ";"
+
+ target_name := "target_name=crypt"
+ target_version := "target_version=" <N> "." <N> "." <N>
+ allow_discards := "allow_discards=" <yes_no>
+ same_cpu_crypt := "same_cpu_crypt=" <yes_no>
+ submit_from_crypt_cpus := "submit_from_crypt_cpus=" <yes_no>
+ no_read_workqueue := "no_read_workqueue=" <yes_no>
+ no_write_workqueue := "no_write_workqueue=" <yes_no>
+ iv_large_sectors := "iv_large_sectors=" <yes_no>
+ integrity_tag_size := "integrity_tag_size=" <N>
+ cipher_auth := "cipher_auth=" <string>
+ sector_size := "sector_size=" <N>
+ cipher_string := "cipher_string="
+ key_size := "key_size=" <N>
+ key_parts := "key_parts=" <N>
+ key_extra_size := "key_extra_size=" <N>
+ key_mac_size := "key_mac_size=" <N>
+ yes_no := "y" | "n"
+
+ E.g.
+ When a 'crypt' target is loaded, then IMA ASCII measurement log will have an entry
+ similar to the following, depicting what 'crypt' attributes are measured in EVENT_DATA
+ for 'dm_table_load' event.
+ (converted from ASCII to text for readability)
+
+ dm_version=4.45.0;
+ name=crypt1,uuid=crypt_uuid1,major=253,minor=0,minor_count=1,num_targets=1;
+ target_index=0,target_begin=0,target_len=1953125,target_name=crypt,target_version=1.23.0,
+ allow_discards=y,same_cpu=n,submit_from_crypt_cpus=n,no_read_workqueue=n,no_write_workqueue=n,
+ iv_large_sectors=n,cipher_string=aes-xts-plain64,key_size=32,key_parts=1,key_extra_size=0,key_mac_size=0;
+
+3. integrity
+-------------
+The 'target_attributes' (described as part of EVENT_DATA in 'Table load'
+section above) has the following data format for 'integrity' target.
+
+::
+
+ target_attributes := <target_name> "," <target_version> "," <dev_name> "," <start>
+ <tag_size> "," <mode> "," [<meta_device> ","] [<block_size> ","] <recalculate> ","
+ <allow_discards> "," <fix_padding> "," <fix_hmac> "," <legacy_recalculate> ","
+ <journal_sectors> "," <interleave_sectors> "," <buffer_sectors> ";"
+
+ target_name := "target_name=integrity"
+ target_version := "target_version=" <N> "." <N> "." <N>
+ dev_name := "dev_name=" <device_name_str>
+ start := "start=" <N>
+ tag_size := "tag_size=" <N>
+ mode := "mode=" <integrity_mode_str>
+ integrity_mode_str := "J" | "B" | "D" | "R"
+ meta_device := "meta_device=" <meta_device_str>
+ block_size := "block_size=" <N>
+ recalculate := "recalculate=" <yes_no>
+ allow_discards := "allow_discards=" <yes_no>
+ fix_padding := "fix_padding=" <yes_no>
+ fix_hmac := "fix_hmac=" <yes_no>
+ legacy_recalculate := "legacy_recalculate=" <yes_no>
+ journal_sectors := "journal_sectors=" <N>
+ interleave_sectors := "interleave_sectors=" <N>
+ buffer_sectors := "buffer_sectors=" <N>
+ yes_no := "y" | "n"
+
+ E.g.
+ When a 'integrity' target is loaded, then IMA ASCII measurement log will have an entry
+ similar to the following, depicting what 'integrity' attributes are measured in EVENT_DATA
+ for 'dm_table_load' event.
+ (converted from ASCII to text for readability)
+
+ dm_version=4.45.0;
+ name=integrity1,uuid=,major=253,minor=1,minor_count=1,num_targets=1;
+ target_index=0,target_begin=0,target_len=7856,target_name=integrity,target_version=1.10.0,
+ dev_name=253:0,start=0,tag_size=32,mode=J,recalculate=n,allow_discards=n,fix_padding=n,
+ fix_hmac=n,legacy_recalculate=n,journal_sectors=88,interleave_sectors=32768,buffer_sectors=128;
+
+
+4. linear
+----------
+The 'target_attributes' (described as part of EVENT_DATA in 'Table load'
+section above) has the following data format for 'linear' target.
+
+::
+
+ target_attributes := <target_name> "," <target_version> "," <device_name> <,> <start> ";"
+
+ target_name := "target_name=linear"
+ target_version := "target_version=" <N> "." <N> "." <N>
+ device_name := "device_name=" <linear_device_name_str>
+ start := "start=" <N>
+
+ E.g.
+ When a 'linear' target is loaded, then IMA ASCII measurement log will have an entry
+ similar to the following, depicting what 'linear' attributes are measured in EVENT_DATA
+ for 'dm_table_load' event.
+ (converted from ASCII to text for readability)
+
+ dm_version=4.45.0;
+ name=linear1,uuid=linear_uuid1,major=253,minor=2,minor_count=1,num_targets=1;
+ target_index=0,target_begin=0,target_len=28672,target_name=linear,target_version=1.4.0,
+ device_name=253:1,start=2048;
+
+5. mirror
+----------
+The 'target_attributes' (described as part of EVENT_DATA in 'Table load'
+section above) has the following data format for 'mirror' target.
+
+::
+
+ target_attributes := <target_name> "," <target_version> "," <nr_mirrors> ","
+ <mirror_device_data> "," <handle_errors> "," <keep_log> "," <log_type_status> ";"
+
+ target_name := "target_name=mirror"
+ target_version := "target_version=" <N> "." <N> "." <N>
+ nr_mirrors := "nr_mirrors=" <NR>
+ mirror_device_data := <mirror_device_row> | <mirror_device_data><mirror_device_row>
+ mirror_device_row is repeated <NR> times - for <NR> described in <nr_mirrors>.
+ mirror_device_row := <mirror_device_name> "," <mirror_device_status>
+ mirror_device_name := "mirror_device_" <X> "=" <mirror_device_name_str>
+ where <X> ranges from 0 to (<NR> -1) - for <NR> described in <nr_mirrors>.
+ mirror_device_status := "mirror_device_" <X> "_status=" <mirror_device_status_char>
+ where <X> ranges from 0 to (<NR> -1) - for <NR> described in <nr_mirrors>.
+ mirror_device_status_char := "A" | "F" | "D" | "S" | "R" | "U"
+ handle_errors := "handle_errors=" <yes_no>
+ keep_log := "keep_log=" <yes_no>
+ log_type_status := "log_type_status=" <log_type_status_str>
+ yes_no := "y" | "n"
+
+ E.g.
+ When a 'mirror' target is loaded, then IMA ASCII measurement log will have an entry
+ similar to the following, depicting what 'mirror' attributes are measured in EVENT_DATA
+ for 'dm_table_load' event.
+ (converted from ASCII to text for readability)
+
+ dm_version=4.45.0;
+ name=mirror1,uuid=mirror_uuid1,major=253,minor=6,minor_count=1,num_targets=1;
+ target_index=0,target_begin=0,target_len=2048,target_name=mirror,target_version=1.14.0,nr_mirrors=2,
+ mirror_device_0=253:4,mirror_device_0_status=A,
+ mirror_device_1=253:5,mirror_device_1_status=A,
+ handle_errors=y,keep_log=n,log_type_status=;
+
+6. multipath
+-------------
+The 'target_attributes' (described as part of EVENT_DATA in 'Table load'
+section above) has the following data format for 'multipath' target.
+
+::
+
+ target_attributes := <target_name> "," <target_version> "," <nr_priority_groups>
+ ["," <pg_state> "," <priority_groups> "," <priority_group_paths>] ";"
+
+ target_name := "target_name=multipath"
+ target_version := "target_version=" <N> "." <N> "." <N>
+ nr_priority_groups := "nr_priority_groups=" <NPG>
+ priority_groups := <priority_groups_row>|<priority_groups_row><priority_groups>
+ priority_groups_row := "pg_state_" <X> "=" <pg_state_str> "," "nr_pgpaths_" <X> "=" <NPGP> ","
+ "path_selector_name_" <X> "=" <string> "," <priority_group_paths>
+ where <X> ranges from 0 to (<NPG> -1) - for <NPG> described in <nr_priority_groups>.
+ pg_state_str := "E" | "A" | "D"
+ <priority_group_paths> := <priority_group_paths_row> | <priority_group_paths_row><priority_group_paths>
+ priority_group_paths_row := "path_name_" <X> "_" <Y> "=" <string> "," "is_active_" <X> "_" <Y> "=" <is_active_str>
+ "fail_count_" <X> "_" <Y> "=" <N> "," "path_selector_status_" <X> "_" <Y> "=" <path_selector_status_str>
+ where <X> ranges from 0 to (<NPG> -1) - for <NPG> described in <nr_priority_groups>,
+ and <Y> ranges from 0 to (<NPGP> -1) - for <NPGP> described in <priority_groups_row>.
+ is_active_str := "A" | "F"
+
+ E.g.
+ When a 'multipath' target is loaded, then IMA ASCII measurement log will have an entry
+ similar to the following, depicting what 'multipath' attributes are measured in EVENT_DATA
+ for 'dm_table_load' event.
+ (converted from ASCII to text for readability)
+
+ dm_version=4.45.0;
+ name=mp,uuid=,major=253,minor=0,minor_count=1,num_targets=1;
+ target_index=0,target_begin=0,target_len=2097152,target_name=multipath,target_version=1.14.0,nr_priority_groups=2,
+ pg_state_0=E,nr_pgpaths_0=2,path_selector_name_0=queue-length,
+ path_name_0_0=8:16,is_active_0_0=A,fail_count_0_0=0,path_selector_status_0_0=,
+ path_name_0_1=8:32,is_active_0_1=A,fail_count_0_1=0,path_selector_status_0_1=,
+ pg_state_1=E,nr_pgpaths_1=2,path_selector_name_1=queue-length,
+ path_name_1_0=8:48,is_active_1_0=A,fail_count_1_0=0,path_selector_status_1_0=,
+ path_name_1_1=8:64,is_active_1_1=A,fail_count_1_1=0,path_selector_status_1_1=;
+
+7. raid
+--------
+The 'target_attributes' (described as part of EVENT_DATA in 'Table load'
+section above) has the following data format for 'raid' target.
+
+::
+
+ target_attributes := <target_name> "," <target_version> "," <raid_type> "," <raid_disks> "," <raid_state>
+ <raid_device_status> ["," journal_dev_mode] ";"
+
+ target_name := "target_name=raid"
+ target_version := "target_version=" <N> "." <N> "." <N>
+ raid_type := "raid_type=" <raid_type_str>
+ raid_disks := "raid_disks=" <NRD>
+ raid_state := "raid_state=" <raid_state_str>
+ raid_state_str := "frozen" | "reshape" |"resync" | "check" | "repair" | "recover" | "idle" |"undef"
+ raid_device_status := <raid_device_status_row> | <raid_device_status_row><raid_device_status>
+ <raid_device_status_row> is repeated <NRD> times - for <NRD> described in <raid_disks>.
+ raid_device_status_row := "raid_device_" <X> "_status=" <raid_device_status_str>
+ where <X> ranges from 0 to (<NRD> -1) - for <NRD> described in <raid_disks>.
+ raid_device_status_str := "A" | "D" | "a" | "-"
+ journal_dev_mode := "journal_dev_mode=" <journal_dev_mode_str>
+ journal_dev_mode_str := "writethrough" | "writeback" | "invalid"
+
+ E.g.
+ When a 'raid' target is loaded, then IMA ASCII measurement log will have an entry
+ similar to the following, depicting what 'raid' attributes are measured in EVENT_DATA
+ for 'dm_table_load' event.
+ (converted from ASCII to text for readability)
+
+ dm_version=4.45.0;
+ name=raid_LV1,uuid=uuid_raid_LV1,major=253,minor=12,minor_count=1,num_targets=1;
+ target_index=0,target_begin=0,target_len=2048,target_name=raid,target_version=1.15.1,
+ raid_type=raid10,raid_disks=4,raid_state=idle,
+ raid_device_0_status=A,
+ raid_device_1_status=A,
+ raid_device_2_status=A,
+ raid_device_3_status=A;
+
+
+8. snapshot
+------------
+The 'target_attributes' (described as part of EVENT_DATA in 'Table load'
+section above) has the following data format for 'snapshot' target.
+
+::
+
+ target_attributes := <target_name> "," <target_version> "," <snap_origin_name> ","
+ <snap_cow_name> "," <snap_valid> "," <snap_merge_failed> "," <snapshot_overflowed> ";"
+
+ target_name := "target_name=snapshot"
+ target_version := "target_version=" <N> "." <N> "." <N>
+ snap_origin_name := "snap_origin_name=" <string>
+ snap_cow_name := "snap_cow_name=" <string>
+ snap_valid := "snap_valid=" <yes_no>
+ snap_merge_failed := "snap_merge_failed=" <yes_no>
+ snapshot_overflowed := "snapshot_overflowed=" <yes_no>
+ yes_no := "y" | "n"
+
+ E.g.
+ When a 'snapshot' target is loaded, then IMA ASCII measurement log will have an entry
+ similar to the following, depicting what 'snapshot' attributes are measured in EVENT_DATA
+ for 'dm_table_load' event.
+ (converted from ASCII to text for readability)
+
+ dm_version=4.45.0;
+ name=snap1,uuid=snap_uuid1,major=253,minor=13,minor_count=1,num_targets=1;
+ target_index=0,target_begin=0,target_len=4096,target_name=snapshot,target_version=1.16.0,
+ snap_origin_name=253:11,snap_cow_name=253:12,snap_valid=y,snap_merge_failed=n,snapshot_overflowed=n;
+
+9. striped
+-----------
+The 'target_attributes' (described as part of EVENT_DATA in 'Table load'
+section above) has the following data format for 'striped' target.
+
+::
+
+ target_attributes := <target_name> "," <target_version> "," <stripes> "," <chunk_size> ","
+ <stripe_data> ";"
+
+ target_name := "target_name=striped"
+ target_version := "target_version=" <N> "." <N> "." <N>
+ stripes := "stripes=" <NS>
+ chunk_size := "chunk_size=" <N>
+ stripe_data := <stripe_data_row>|<stripe_data><stripe_data_row>
+ stripe_data_row := <stripe_device_name> "," <stripe_physical_start> "," <stripe_status>
+ stripe_device_name := "stripe_" <X> "_device_name=" <stripe_device_name_str>
+ where <X> ranges from 0 to (<NS> -1) - for <NS> described in <stripes>.
+ stripe_physical_start := "stripe_" <X> "_physical_start=" <N>
+ where <X> ranges from 0 to (<NS> -1) - for <NS> described in <stripes>.
+ stripe_status := "stripe_" <X> "_status=" <stripe_status_str>
+ where <X> ranges from 0 to (<NS> -1) - for <NS> described in <stripes>.
+ stripe_status_str := "D" | "A"
+
+ E.g.
+ When a 'striped' target is loaded, then IMA ASCII measurement log will have an entry
+ similar to the following, depicting what 'striped' attributes are measured in EVENT_DATA
+ for 'dm_table_load' event.
+ (converted from ASCII to text for readability)
+
+ dm_version=4.45.0;
+ name=striped1,uuid=striped_uuid1,major=253,minor=5,minor_count=1,num_targets=1;
+ target_index=0,target_begin=0,target_len=640,target_name=striped,target_version=1.6.0,stripes=2,chunk_size=64,
+ stripe_0_device_name=253:0,stripe_0_physical_start=2048,stripe_0_status=A,
+ stripe_1_device_name=253:3,stripe_1_physical_start=2048,stripe_1_status=A;
+
+10. verity
+----------
+The 'target_attributes' (described as part of EVENT_DATA in 'Table load'
+section above) has the following data format for 'verity' target.
+
+::
+
+ target_attributes := <target_name> "," <target_version> "," <hash_failed> "," <verity_version> ","
+ <data_device_name> "," <hash_device_name> "," <verity_algorithm> "," <root_digest> ","
+ <salt> "," <ignore_zero_blocks> "," <check_at_most_once> ["," <root_hash_sig_key_desc>]
+ ["," <verity_mode>] ";"
+
+ target_name := "target_name=verity"
+ target_version := "target_version=" <N> "." <N> "." <N>
+ hash_failed := "hash_failed=" <hash_failed_str>
+ hash_failed_str := "C" | "V"
+ verity_version := "verity_version=" <verity_version_str>
+ data_device_name := "data_device_name=" <data_device_name_str>
+ hash_device_name := "hash_device_name=" <hash_device_name_str>
+ verity_algorithm := "verity_algorithm=" <verity_algorithm_str>
+ root_digest := "root_digest=" <root_digest_str>
+ salt := "salt=" <salt_str>
+ salt_str := "-" <verity_salt_str>
+ ignore_zero_blocks := "ignore_zero_blocks=" <yes_no>
+ check_at_most_once := "check_at_most_once=" <yes_no>
+ root_hash_sig_key_desc := "root_hash_sig_key_desc="
+ verity_mode := "verity_mode=" <verity_mode_str>
+ verity_mode_str := "ignore_corruption" | "restart_on_corruption" | "panic_on_corruption" | "invalid"
+ yes_no := "y" | "n"
+
+ E.g.
+ When a 'verity' target is loaded, then IMA ASCII measurement log will have an entry
+ similar to the following, depicting what 'verity' attributes are measured in EVENT_DATA
+ for 'dm_table_load' event.
+ (converted from ASCII to text for readability)
+
+ dm_version=4.45.0;
+ name=test-verity,uuid=,major=253,minor=2,minor_count=1,num_targets=1;
+ target_index=0,target_begin=0,target_len=1953120,target_name=verity,target_version=1.8.0,hash_failed=V,
+ verity_version=1,data_device_name=253:1,hash_device_name=253:0,verity_algorithm=sha256,
+ root_digest=29cb87e60ce7b12b443ba6008266f3e41e93e403d7f298f8e3f316b29ff89c5e,
+ salt=e48da609055204e89ae53b655ca2216dd983cf3cb829f34f63a297d106d53e2d,
+ ignore_zero_blocks=n,check_at_most_once=n;
diff --git a/Documentation/admin-guide/device-mapper/dm-init.rst b/Documentation/admin-guide/device-mapper/dm-init.rst
index e5242ff17e9b..981d6a907699 100644
--- a/Documentation/admin-guide/device-mapper/dm-init.rst
+++ b/Documentation/admin-guide/device-mapper/dm-init.rst
@@ -123,3 +123,11 @@ Other examples (per target):
0 1638400 verity 1 8:1 8:2 4096 4096 204800 1 sha256
fb1a5a0f00deb908d8b53cb270858975e76cf64105d412ce764225d53b8f3cfd
51934789604d1b92399c52e7cb149d1b3a1b74bbbcb103b2a0aaacbed5c08584
+
+For setups using device-mapper on top of asynchronously probed block
+devices (MMC, USB, ..), it may be necessary to tell dm-init to
+explicitly wait for them to become available before setting up the
+device-mapper tables. This can be done with the "dm-mod.waitfor="
+module parameter, which takes a list of devices to wait for::
+
+ dm-mod.waitfor=<device1>[,..,<deviceN>]
diff --git a/Documentation/admin-guide/device-mapper/dm-integrity.rst b/Documentation/admin-guide/device-mapper/dm-integrity.rst
index 8439d2ae689b..d8a5f14d0e3c 100644
--- a/Documentation/admin-guide/device-mapper/dm-integrity.rst
+++ b/Documentation/admin-guide/device-mapper/dm-integrity.rst
@@ -25,7 +25,7 @@ mode it calculates and verifies the integrity tag internally. In this
mode, the dm-integrity target can be used to detect silent data
corruption on the disk or in the I/O path.
-There's an alternate mode of operation where dm-integrity uses bitmap
+There's an alternate mode of operation where dm-integrity uses a bitmap
instead of a journal. If a bit in the bitmap is 1, the corresponding
region's data and integrity tags are not synchronized - if the machine
crashes, the unsynchronized regions will be recalculated. The bitmap mode
@@ -38,6 +38,15 @@ the device. But it will only format the device if the superblock contains
zeroes. If the superblock is neither valid nor zeroed, the dm-integrity
target can't be loaded.
+Accesses to the on-disk metadata area containing checksums (aka tags) are
+buffered using dm-bufio. When an access to any given metadata area
+occurs, each unique metadata area gets its own buffer(s). The buffer size
+is capped at the size of the metadata area, but may be smaller, thereby
+requiring multiple buffers to represent the full metadata area. A smaller
+buffer size will produce a smaller resulting read/write operation to the
+metadata area for small reads/writes. The metadata is still read even in
+a full write to the data covered by a single buffer.
+
To use the target for the first time:
1. overwrite the superblock with zeroes
@@ -45,7 +54,7 @@ To use the target for the first time:
will format the device
3. unload the dm-integrity target
4. read the "provided_data_sectors" value from the superblock
-5. load the dm-integrity target with the the target size
+5. load the dm-integrity target with the target size
"provided_data_sectors"
6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target
with the size "provided_data_sectors"
@@ -93,31 +102,27 @@ journal_sectors:number
device. If the device is already formatted, the value from the
superblock is used.
-interleave_sectors:number
+interleave_sectors:number (default 32768)
The number of interleaved sectors. This values is rounded down to
a power of two. If the device is already formatted, the value from
the superblock is used.
meta_device:device
- Don't interleave the data and metadata on on device. Use a
+ Don't interleave the data and metadata on the device. Use a
separate device for metadata.
-buffer_sectors:number
- The number of sectors in one buffer. The value is rounded down to
- a power of two.
-
- The tag area is accessed using buffers, the buffer size is
- configurable. The large buffer size means that the I/O size will
- be larger, but there could be less I/Os issued.
+buffer_sectors:number (default 128)
+ The number of sectors in one metadata buffer. The value is rounded
+ down to a power of two.
-journal_watermark:number
+journal_watermark:number (default 50)
The journal watermark in percents. When the size of the journal
exceeds this watermark, the thread that flushes the journal will
be started.
-commit_time:number
+commit_time:number (default 10000)
Commit time in milliseconds. When this time passes, the journal is
- written. The journal is also written immediatelly if the FLUSH
+ written. The journal is also written immediately if the FLUSH
request is received.
internal_hash:algorithm(:key) (the key is optional)
@@ -143,11 +148,11 @@ recalculate
journal_crypt:algorithm(:key) (the key is optional)
Encrypt the journal using given algorithm to make sure that the
attacker can't read the journal. You can use a block cipher here
- (such as "cbc(aes)") or a stream cipher (for example "chacha20",
- "salsa20" or "ctr(aes)").
+ (such as "cbc(aes)") or a stream cipher (for example "chacha20"
+ or "ctr(aes)").
The journal contains history of last writes to the block device,
- an attacker reading the journal could see the last sector nubmers
+ an attacker reading the journal could see the last sector numbers
that were written. From the sector numbers, the attacker can infer
the size of files that were written. To protect against this
situation, you can encrypt the journal.
@@ -163,11 +168,10 @@ journal_mac:algorithm(:key) (the key is optional)
the journal. Thus, modified sector number would be detected at
this stage.
-block_size:number
- The size of a data block in bytes. The larger the block size the
+block_size:number (default 512)
+ The size of a data block in bytes. The larger the block size the
less overhead there is for per-block integrity metadata.
- Supported values are 512, 1024, 2048 and 4096 bytes. If not
- specified the default block size is 512 bytes.
+ Supported values are 512, 1024, 2048 and 4096 bytes.
sectors_per_bit:number
In the bitmap mode, this parameter specifies the number of
@@ -177,14 +181,31 @@ bitmap_flush_interval:number
The bitmap flush interval in milliseconds. The metadata buffers
are synchronized when this interval expires.
+allow_discards
+ Allow block discard requests (a.k.a. TRIM) for the integrity device.
+ Discards are only allowed to devices using internal hash.
+
fix_padding
Use a smaller padding of the tag area that is more
space-efficient. If this option is not present, large padding is
used - that is for compatibility with older kernels.
-allow_discards
- Allow block discard requests (a.k.a. TRIM) for the integrity device.
- Discards are only allowed to devices using internal hash.
+fix_hmac
+ Improve security of internal_hash and journal_mac:
+
+ - the section number is mixed to the mac, so that an attacker can't
+ copy sectors from one journal section to another journal section
+ - the superblock is protected by journal_mac
+ - a 16-byte salt stored in the superblock is mixed to the mac, so
+ that the attacker can't detect that two disks have the same hmac
+ key and also to disallow the attacker to move sectors from one
+ disk to another
+
+legacy_recalculate
+ Allow recalculating of volumes with HMAC keys. This is disabled by
+ default for security reasons - an attacker could modify the volume,
+ set recalc_sector to zero, and the kernel would not detect the
+ modification.
The journal mode (D/J), buffer_sectors, journal_watermark, commit_time and
allow_discards can be changed when reloading the target (load an inactive
@@ -192,6 +213,20 @@ table and swap the tables with suspend and resume). The other arguments
should not be changed when reloading the target because the layout of disk
data depend on them and the reloaded target would be non-functional.
+For example, on a device using the default interleave_sectors of 32768, a
+block_size of 512, and an internal_hash of crc32c with a tag size of 4
+bytes, it will take 128 KiB of tags to track a full data area, requiring
+256 sectors of metadata per data area. With the default buffer_sectors of
+128, that means there will be 2 buffers per metadata area, or 2 buffers
+per 16 MiB of data.
+
+Status line:
+
+1. the number of integrity mismatches
+2. provided data sectors - that is the number of sectors that the user
+ could use
+3. the current recalculating position (or '-' if we didn't recalculate)
+
The layout of the formatted block device:
@@ -261,7 +296,8 @@ The layout of the formatted block device:
Each run contains:
* tag area - it contains integrity tags. There is one tag for each
- sector in the data area
+ sector in the data area. The size of this area is always 4KiB or
+ greater.
* data area - it contains data sectors. The number of data sectors
in one run must be a power of two. log2 of this value is stored
in the superblock.
diff --git a/Documentation/admin-guide/device-mapper/dm-raid.rst b/Documentation/admin-guide/device-mapper/dm-raid.rst
index 695a2ea1d1ae..bb17e26e3c1b 100644
--- a/Documentation/admin-guide/device-mapper/dm-raid.rst
+++ b/Documentation/admin-guide/device-mapper/dm-raid.rst
@@ -71,7 +71,7 @@ The target is named "raid" and it accepts the following parameters::
============= ===============================================================
Reference: Chapter 4 of
- http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf
+ https://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf
<#raid_params>: The number of parameters that follow.
@@ -418,6 +418,6 @@ Version History
specific devices are requested via rebuild. Fix RAID leg
rebuild errors.
1.15.0 Fix size extensions not being synchronized in case of new MD bitmap
- pages allocated; also fix those not occuring after previous reductions
+ pages allocated; also fix those not occurring after previous reductions
1.15.1 Fix argument count and arguments for rebuild/write_mostly/journal_(dev|mode)
on the status line.
diff --git a/Documentation/admin-guide/device-mapper/dm-zoned.rst b/Documentation/admin-guide/device-mapper/dm-zoned.rst
index 07f56ebc1730..932383fe6e88 100644
--- a/Documentation/admin-guide/device-mapper/dm-zoned.rst
+++ b/Documentation/admin-guide/device-mapper/dm-zoned.rst
@@ -14,7 +14,7 @@ host-aware zoned block devices.
For a more detailed description of the zoned block device models and
their constraints see (for SCSI devices):
-http://www.t10.org/drafts.htm#ZBC_Family
+https://www.t10.org/drafts.htm#ZBC_Family
and (for ATA devices):
@@ -24,7 +24,7 @@ The dm-zoned implementation is simple and minimizes system overhead (CPU
and memory usage as well as storage capacity loss). For a 10TB
host-managed disk with 256 MB zones, dm-zoned memory usage per disk
instance is at most 4.5 MB and as little as 5 zones will be used
-internally for storing metadata and performaing reclaim operations.
+internally for storing metadata and performing reclaim operations.
dm-zoned target devices are formatted and checked using the dmzadm
utility available at:
@@ -37,12 +37,16 @@ Algorithm
dm-zoned implements an on-disk buffering scheme to handle non-sequential
write accesses to the sequential zones of a zoned block device.
Conventional zones are used for caching as well as for storing internal
-metadata.
+metadata. It can also use a regular block device together with the zoned
+block device; in that case the regular block device will be split logically
+in zones with the same size as the zoned block device. These zones will be
+placed in front of the zones from the zoned block device and will be handled
+just like conventional zones.
-The zones of the device are separated into 2 types:
+The zones of the device(s) are separated into 2 types:
1) Metadata zones: these are conventional zones used to store metadata.
-Metadata zones are not reported as useable capacity to the user.
+Metadata zones are not reported as usable capacity to the user.
2) Data zones: all remaining zones, the vast majority of which will be
sequential zones used exclusively to store user data. The conventional
@@ -98,7 +102,7 @@ the buffer zone assigned. If the accessed chunk has no mapping, or the
accessed blocks are invalid, the read buffer is zeroed and the read
operation terminated.
-After some time, the limited number of convnetional zones available may
+After some time, the limited number of conventional zones available may
be exhausted (all used to map chunks or buffer sequential zones) and
unaligned writes to unbuffered chunks become impossible. To avoid this
situation, a reclaim process regularly scans used conventional zones and
@@ -127,6 +131,13 @@ resumed. Flushing metadata thus only temporarily delays write and
discard requests. Read requests can be processed concurrently while
metadata flush is being executed.
+If a regular device is used in conjunction with the zoned block device,
+a third set of metadata (without the zone bitmaps) is written to the
+start of the zoned block device. This metadata has a generation counter of
+'0' and will never be updated during normal operation; it just serves for
+identification purposes. The first and second copy of the metadata
+are located at the start of the regular block device.
+
Usage
=====
@@ -138,9 +149,46 @@ Ex::
dmzadm --format /dev/sdxx
-For a formatted device, the target can be created normally with the
-dmsetup utility. The only parameter that dm-zoned requires is the
-underlying zoned block device name. Ex::
- echo "0 `blockdev --getsize ${dev}` zoned ${dev}" | \
- dmsetup create dmz-`basename ${dev}`
+If two drives are to be used, both devices must be specified, with the
+regular block device as the first device.
+
+Ex::
+
+ dmzadm --format /dev/sdxx /dev/sdyy
+
+
+Formatted device(s) can be started with the dmzadm utility, too.:
+
+Ex::
+
+ dmzadm --start /dev/sdxx /dev/sdyy
+
+
+Information about the internal layout and current usage of the zones can
+be obtained with the 'status' callback from dmsetup:
+
+Ex::
+
+ dmsetup status /dev/dm-X
+
+will return a line
+
+ 0 <size> zoned <nr_zones> zones <nr_unmap_rnd>/<nr_rnd> random <nr_unmap_seq>/<nr_seq> sequential
+
+where <nr_zones> is the total number of zones, <nr_unmap_rnd> is the number
+of unmapped (ie free) random zones, <nr_rnd> the total number of zones,
+<nr_unmap_seq> the number of unmapped sequential zones, and <nr_seq> the
+total number of sequential zones.
+
+Normally the reclaim process will be started once there are less than 50
+percent free random zones. In order to start the reclaim process manually
+even before reaching this threshold the 'dmsetup message' function can be
+used:
+
+Ex::
+
+ dmsetup message /dev/dm-X 0 reclaim
+
+will start the reclaim process and random zones will be moved to sequential
+zones.
diff --git a/Documentation/admin-guide/device-mapper/index.rst b/Documentation/admin-guide/device-mapper/index.rst
index ec62fcc8eece..cde52cc09645 100644
--- a/Documentation/admin-guide/device-mapper/index.rst
+++ b/Documentation/admin-guide/device-mapper/index.rst
@@ -11,7 +11,9 @@ Device Mapper
dm-clone
dm-crypt
dm-dust
+ dm-ebs
dm-flakey
+ dm-ima
dm-init
dm-integrity
dm-io
diff --git a/Documentation/admin-guide/device-mapper/unstriped.rst b/Documentation/admin-guide/device-mapper/unstriped.rst
index 0a8d3eb3f072..5772ccdd1f5f 100644
--- a/Documentation/admin-guide/device-mapper/unstriped.rst
+++ b/Documentation/admin-guide/device-mapper/unstriped.rst
@@ -35,7 +35,7 @@ An example of undoing an existing dm-stripe
This small bash script will setup 4 loop devices and use the existing
striped target to combine the 4 devices into one. It then will use
-the unstriped target ontop of the striped device to access the
+the unstriped target on top of the striped device to access the
individual backing loop devices. We write data to the newly exposed
unstriped devices and verify the data written matches the correct
underlying device on the striped array::
@@ -110,8 +110,8 @@ to get a 92% reduction in read latency using this device mapper target.
Example dmsetup usage
=====================
-unstriped ontop of Intel NVMe device that has 2 cores
------------------------------------------------------
+unstriped on top of Intel NVMe device that has 2 cores
+------------------------------------------------------
::
@@ -124,8 +124,8 @@ respectively::
/dev/mapper/nvmset0
/dev/mapper/nvmset1
-unstriped ontop of striped with 4 drives using 128K chunk size
---------------------------------------------------------------
+unstriped on top of striped with 4 drives using 128K chunk size
+---------------------------------------------------------------
::
diff --git a/Documentation/admin-guide/device-mapper/verity.rst b/Documentation/admin-guide/device-mapper/verity.rst
index bb02caa45289..a65c1602cb23 100644
--- a/Documentation/admin-guide/device-mapper/verity.rst
+++ b/Documentation/admin-guide/device-mapper/verity.rst
@@ -69,7 +69,7 @@ Construction Parameters
<#opt_params>
Number of optional parameters. If there are no optional parameters,
- the optional paramaters section can be skipped or #opt_params can be zero.
+ the optional parameters section can be skipped or #opt_params can be zero.
Otherwise #opt_params is the number of following arguments.
Example of optional parameters section:
@@ -83,6 +83,10 @@ restart_on_corruption
not compatible with ignore_corruption and requires user space support to
avoid restart loops.
+panic_on_corruption
+ Panic the device when a corrupted block is discovered. This option is
+ not compatible with ignore_corruption and restart_on_corruption.
+
ignore_zero_blocks
Do not verify blocks that are expected to contain zeroes and always return
zeroes instead. This may be useful if the partition contains unused blocks
@@ -130,7 +134,16 @@ root_hash_sig_key_desc <key_description>
the pkcs7 signature of the roothash. The pkcs7 signature is used to validate
the root hash during the creation of the device mapper block device.
Verification of roothash depends on the config DM_VERITY_VERIFY_ROOTHASH_SIG
- being set in the kernel.
+ being set in the kernel. The signatures are checked against the builtin
+ trusted keyring by default, or the secondary trusted keyring if
+ DM_VERITY_VERIFY_ROOTHASH_SIG_SECONDARY_KEYRING is set. The secondary
+ trusted keyring includes by default the builtin trusted keyring, and it can
+ also gain new certificates at run time if they are signed by a certificate
+ already in the secondary trusted keyring.
+
+try_verify_in_tasklet
+ If verity hashes are in cache, verify data blocks in kernel tasklet instead
+ of workqueue. This option can reduce IO latency.
Theory of operation
===================
diff --git a/Documentation/admin-guide/device-mapper/writecache.rst b/Documentation/admin-guide/device-mapper/writecache.rst
index d3d7690f5e8d..60c16b7fd5ac 100644
--- a/Documentation/admin-guide/device-mapper/writecache.rst
+++ b/Documentation/admin-guide/device-mapper/writecache.rst
@@ -12,7 +12,6 @@ first sector should contain valid superblock from previous invocation.
Constructor parameters:
1. type of the cache device - "p" or "s"
-
- p - persistent memory
- s - SSD
2. the underlying device that will be cached
@@ -37,10 +36,10 @@ Constructor parameters:
autocommit_blocks n (default: 64 for pmem, 65536 for ssd)
when the application writes this amount of blocks without
issuing the FLUSH request, the blocks are automatically
- commited
+ committed
autocommit_time ms (default: 1000)
autocommit time in milliseconds. The data is automatically
- commited if this time passes and no FLUSH request is
+ committed if this time passes and no FLUSH request is
received
fua (by default on)
applicable only to persistent memory - use the FUA flag
@@ -53,19 +52,51 @@ Constructor parameters:
- some underlying devices perform better with fua, some
with nofua. The user should test it
+ cleaner
+ when this option is activated (either in the constructor
+ arguments or by a message), the cache will not promote
+ new writes (however, writes to already cached blocks are
+ promoted, to avoid data corruption due to misordered
+ writes) and it will gradually writeback any cached
+ data. The userspace can then monitor the cleaning
+ process with "dmsetup status". When the number of cached
+ blocks drops to zero, userspace can unload the
+ dm-writecache target and replace it with dm-linear or
+ other targets.
+ max_age n
+ specifies the maximum age of a block in milliseconds. If
+ a block is stored in the cache for too long, it will be
+ written to the underlying device and cleaned up.
+ metadata_only
+ only metadata is promoted to the cache. This option
+ improves performance for heavier REQ_META workloads.
+ pause_writeback n (default: 3000)
+ pause writeback if there was some write I/O redirected to
+ the origin volume in the last n milliseconds
Status:
+
1. error indicator - 0 if there was no error, otherwise error number
2. the number of blocks
3. the number of free blocks
4. the number of blocks under writeback
+5. the number of read blocks
+6. the number of read blocks that hit the cache
+7. the number of write blocks
+8. the number of write blocks that hit uncommitted block
+9. the number of write blocks that hit committed block
+10. the number of write blocks that bypass the cache
+11. the number of write blocks that are allocated in the cache
+12. the number of write requests that are blocked on the freelist
+13. the number of flush requests
+14. the number of discarded blocks
Messages:
flush
- flush the cache device. The message returns successfully
+ Flush the cache device. The message returns successfully
if the cache device was flushed without an error
flush_on_suspend
- flush the cache device on next suspend. Use this message
+ Flush the cache device on next suspend. Use this message
when you are going to remove the cache device. The proper
sequence for removing the cache device is:
@@ -77,3 +108,7 @@ Messages:
5. resume the device, so that it will use the linear
target
6. the cache device is now inactive and it can be deleted
+ cleaner
+ See above "cleaner" constructor documentation.
+ clear_stats
+ Clear the statistics that are reported on the status line
diff --git a/Documentation/admin-guide/devices.rst b/Documentation/admin-guide/devices.rst
index d41671aeaef0..e3776d77374b 100644
--- a/Documentation/admin-guide/devices.rst
+++ b/Documentation/admin-guide/devices.rst
@@ -7,17 +7,16 @@ This list is the Linux Device List, the official registry of allocated
device numbers and ``/dev`` directory nodes for the Linux operating
system.
-The LaTeX version of this document is no longer maintained, nor is
-the document that used to reside at lanana.org. This version in the
-mainline Linux kernel is the master document. Updates shall be sent
-as patches to the kernel maintainers (see the
+The version of this document at lanana.org is no longer maintained. This
+version in the mainline Linux kernel is the master document. Updates
+shall be sent as patches to the kernel maintainers (see the
:ref:`Documentation/process/submitting-patches.rst <submittingpatches>` document).
Specifically explore the sections titled "CHAR and MISC DRIVERS", and
"BLOCK LAYER" in the MAINTAINERS file to find the right maintainers
to involve for character and block devices.
This document is included by reference into the Filesystem Hierarchy
-Standard (FHS). The FHS is available from http://www.pathname.com/fhs/.
+Standard (FHS). The FHS is available from https://www.pathname.com/fhs/.
Allocations marked (68k/Amiga) apply to Linux/68k on the Amiga
platform only. Allocations marked (68k/Atari) apply to Linux/68k on
diff --git a/Documentation/admin-guide/devices.txt b/Documentation/admin-guide/devices.txt
index 2a97aaec8b12..94c98be1329a 100644
--- a/Documentation/admin-guide/devices.txt
+++ b/Documentation/admin-guide/devices.txt
@@ -4,7 +4,7 @@
1 char Memory devices
1 = /dev/mem Physical memory access
- 2 = /dev/kmem Kernel virtual memory access
+ 2 = /dev/kmem OBSOLETE - replaced by /proc/kcore
3 = /dev/null Null device
4 = /dev/port I/O port access
5 = /dev/zero Null byte source
@@ -289,7 +289,7 @@
152 = /dev/kpoll Kernel Poll Driver
153 = /dev/mergemem Memory merge device
154 = /dev/pmu Macintosh PowerBook power manager
- 155 = /dev/isictl MultiTech ISICom serial control
+ 155 =
156 = /dev/lcd Front panel LCD display
157 = /dev/ac Applicom Intl Profibus card
158 = /dev/nwbutton Netwinder external button
@@ -375,8 +375,9 @@
239 = /dev/uhid User-space I/O driver support for HID subsystem
240 = /dev/userio Serio driver testing device
241 = /dev/vhost-vsock Host kernel driver for virtio vsock
+ 242 = /dev/rfkill Turning off radio transmissions (rfkill)
- 242-254 Reserved for local use
+ 243-254 Reserved for local use
255 Reserved for MISC_DYNAMIC_MINOR
11 char Raw keyboard device (Linux/SPARC only)
@@ -476,11 +477,6 @@
18 block Sanyo CD-ROM
0 = /dev/sjcd Sanyo CD-ROM
- 19 char Cyclades serial card
- 0 = /dev/ttyC0 First Cyclades port
- ...
- 31 = /dev/ttyC31 32nd Cyclades port
-
19 block "Double" compressed disk
0 = /dev/double0 First compressed disk
...
@@ -492,11 +488,6 @@
See the Double documentation for the meaning of the
mirror devices.
- 20 char Cyclades serial card - alternate devices
- 0 = /dev/cub0 Callout device for ttyC0
- ...
- 31 = /dev/cub31 Callout device for ttyC31
-
20 block Hitachi CD-ROM (under development)
0 = /dev/hitcd Hitachi CD-ROM
@@ -1442,7 +1433,7 @@
...
The driver and documentation may be obtained from
- http://www.winradio.com/
+ https://www.winradio.com/
82 block I2O hard disk
0 = /dev/i2o/hdag 33rd I2O hard disk, whole disk
@@ -1656,12 +1647,12 @@
dynamically, so there is no fixed mapping from subdevice
pathnames to minor numbers.
- See http://www.comedi.org/ for information about the Comedi
+ See https://www.comedi.org/ for information about the Comedi
project.
98 block User-mode virtual block device
0 = /dev/ubda First user-mode block device
- 16 = /dev/udbb Second user-mode block device
+ 16 = /dev/ubdb Second user-mode block device
...
Partitions are handled in the same way as for IDE
@@ -1723,7 +1714,7 @@
implementations a kernel presence for caching and easy
mounting. For more information about the project,
write to <arla-drinkers@stacken.kth.se> or see
- http://www.stacken.kth.se/project/arla/
+ https://www.stacken.kth.se/project/arla/
103 block Audit device
0 = /dev/audit Audit device
@@ -1942,7 +1933,7 @@
...
255= /dev/umem/d15p15 15th partition of 16th board.
- 117 char COSA/SRP synchronous serial card
+ 117 char [REMOVED] COSA/SRP synchronous serial card
0 = /dev/cosa0c0 1st board, 1st channel
1 = /dev/cosa0c1 1st board, 2nd channel
...
@@ -2348,13 +2339,7 @@
disks (see major number 3) except that the limit on
partitions is 31.
- 162 char Raw block device interface
- 0 = /dev/rawctl Raw I/O control device
- 1 = /dev/raw/raw1 First raw I/O device
- 2 = /dev/raw/raw2 Second raw I/O device
- ...
- max minor number of raw device is set by kernel config
- MAX_RAW_DEVS or raw module parameter 'max_raw_devs'
+ 162 char Used for (now removed) raw block device interface
163 char
@@ -2706,18 +2691,9 @@
45 = /dev/ttyMM1 Marvell MPSC - port 1 (obsolete unused)
46 = /dev/ttyCPM0 PPC CPM (SCC or SMC) - port 0
...
- 47 = /dev/ttyCPM5 PPC CPM (SCC or SMC) - port 5
- 50 = /dev/ttyIOC0 Altix serial card
- ...
- 81 = /dev/ttyIOC31 Altix serial card
+ 51 = /dev/ttyCPM5 PPC CPM (SCC or SMC) - port 5
82 = /dev/ttyVR0 NEC VR4100 series SIU
83 = /dev/ttyVR1 NEC VR4100 series DSIU
- 84 = /dev/ttyIOC84 Altix ioc4 serial card
- ...
- 115 = /dev/ttyIOC115 Altix ioc4 serial card
- 116 = /dev/ttySIOC0 Altix ioc3 serial card
- ...
- 147 = /dev/ttySIOC31 Altix ioc3 serial card
148 = /dev/ttyPSC0 PPC PSC - port 0
...
153 = /dev/ttyPSC5 PPC PSC - port 5
@@ -2728,6 +2704,9 @@
...
185 = /dev/ttyNX15 Hilscher netX serial port 15
186 = /dev/ttyJ0 JTAG1 DCC protocol based serial port emulation
+
+ If maximum number of uartlite serial ports is more than 4, then the driver
+ uses dynamic allocation instead of static allocation for major number.
187 = /dev/ttyUL0 Xilinx uartlite - port 0
...
190 = /dev/ttyUL3 Xilinx uartlite - port 3
@@ -2776,10 +2755,7 @@
43 = /dev/ttycusmx2 Callout device for ttySMX2
46 = /dev/cucpm0 Callout device for ttyCPM0
...
- 49 = /dev/cucpm5 Callout device for ttyCPM5
- 50 = /dev/cuioc40 Callout device for ttyIOC40
- ...
- 81 = /dev/cuioc431 Callout device for ttyIOC431
+ 51 = /dev/cucpm5 Callout device for ttyCPM5
82 = /dev/cuvr0 Callout device for ttyVR0
83 = /dev/cuvr1 Callout device for ttyVR1
@@ -3002,10 +2978,10 @@
65 = /dev/infiniband/issm1 Second InfiniBand IsSM device
...
127 = /dev/infiniband/issm63 63rd InfiniBand IsSM device
- 128 = /dev/infiniband/uverbs0 First InfiniBand verbs device
- 129 = /dev/infiniband/uverbs1 Second InfiniBand verbs device
+ 192 = /dev/infiniband/uverbs0 First InfiniBand verbs device
+ 193 = /dev/infiniband/uverbs1 Second InfiniBand verbs device
...
- 159 = /dev/infiniband/uverbs31 31st InfiniBand verbs device
+ 223 = /dev/infiniband/uverbs31 31st InfiniBand verbs device
232 char Biometric Devices
0 = /dev/biometric/sensor0/fingerprint first fingerprint sensor on first device
@@ -3095,6 +3071,11 @@
...
255 = /dev/osd255 256th OSD Device
+ 261 char Compute Acceleration Devices
+ 0 = /dev/accel/accel0 First acceleration device
+ 1 = /dev/accel/accel1 Second acceleration device
+ ...
+
384-511 char RESERVED FOR DYNAMIC ASSIGNMENT
Character devices that request a dynamic allocation of major
number will take numbers starting from 511 and downward,
diff --git a/Documentation/admin-guide/dynamic-debug-howto.rst b/Documentation/admin-guide/dynamic-debug-howto.rst
index 0dc2eb8e44e5..0e9b48daf690 100644
--- a/Documentation/admin-guide/dynamic-debug-howto.rst
+++ b/Documentation/admin-guide/dynamic-debug-howto.rst
@@ -5,138 +5,115 @@ Dynamic debug
Introduction
============
-This document describes how to use the dynamic debug (dyndbg) feature.
+Dynamic debug allows you to dynamically enable/disable kernel
+debug-print code to obtain additional kernel information.
-Dynamic debug is designed to allow you to dynamically enable/disable
-kernel code to obtain additional kernel information. Currently, if
-``CONFIG_DYNAMIC_DEBUG`` is set, then all ``pr_debug()``/``dev_dbg()`` and
-``print_hex_dump_debug()``/``print_hex_dump_bytes()`` calls can be dynamically
-enabled per-callsite.
+If ``/proc/dynamic_debug/control`` exists, your kernel has dynamic
+debug. You'll need root access (sudo su) to use this.
-If ``CONFIG_DYNAMIC_DEBUG`` is not set, ``print_hex_dump_debug()`` is just
-shortcut for ``print_hex_dump(KERN_DEBUG)``.
+Dynamic debug provides:
-For ``print_hex_dump_debug()``/``print_hex_dump_bytes()``, format string is
-its ``prefix_str`` argument, if it is constant string; or ``hexdump``
-in case ``prefix_str`` is built dynamically.
+ * a Catalog of all *prdbgs* in your kernel.
+ ``cat /proc/dynamic_debug/control`` to see them.
-Dynamic debug has even more useful features:
-
- * Simple query language allows turning on and off debugging
- statements by matching any combination of 0 or 1 of:
+ * a Simple query/command language to alter *prdbgs* by selecting on
+ any combination of 0 or 1 of:
- source filename
- function name
- line number (including ranges of line numbers)
- module name
- format string
-
- * Provides a debugfs control file: ``<debugfs>/dynamic_debug/control``
- which can be read to display the complete list of known debug
- statements, to help guide you
-
-Controlling dynamic debug Behaviour
-===================================
-
-The behaviour of ``pr_debug()``/``dev_dbg()`` are controlled via writing to a
-control file in the 'debugfs' filesystem. Thus, you must first mount
-the debugfs filesystem, in order to make use of this feature.
-Subsequently, we refer to the control file as:
-``<debugfs>/dynamic_debug/control``. For example, if you want to enable
-printing from source file ``svcsock.c``, line 1603 you simply do::
-
- nullarbor:~ # echo 'file svcsock.c line 1603 +p' >
- <debugfs>/dynamic_debug/control
-
-If you make a mistake with the syntax, the write will fail thus::
-
- nullarbor:~ # echo 'file svcsock.c wtf 1 +p' >
- <debugfs>/dynamic_debug/control
- -bash: echo: write error: Invalid argument
-
-Note, for systems without 'debugfs' enabled, the control file can be
-found in ``/proc/dynamic_debug/control``.
+ - class name (as known/declared by each module)
Viewing Dynamic Debug Behaviour
===============================
-You can view the currently configured behaviour of all the debug
-statements via::
+You can view the currently configured behaviour in the *prdbg* catalog::
- nullarbor:~ # cat <debugfs>/dynamic_debug/control
+ :#> head -n7 /proc/dynamic_debug/control
# filename:lineno [module]function flags format
- /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:323 [svcxprt_rdma]svc_rdma_cleanup =_ "SVCRDMA Module Removed, deregister RPC RDMA transport\012"
- /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:341 [svcxprt_rdma]svc_rdma_init =_ "\011max_inline : %d\012"
- /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:340 [svcxprt_rdma]svc_rdma_init =_ "\011sq_depth : %d\012"
- /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:338 [svcxprt_rdma]svc_rdma_init =_ "\011max_requests : %d\012"
- ...
+ init/main.c:1179 [main]initcall_blacklist =_ "blacklisting initcall %s\012
+ init/main.c:1218 [main]initcall_blacklisted =_ "initcall %s blacklisted\012"
+ init/main.c:1424 [main]run_init_process =_ " with arguments:\012"
+ init/main.c:1426 [main]run_init_process =_ " %s\012"
+ init/main.c:1427 [main]run_init_process =_ " with environment:\012"
+ init/main.c:1429 [main]run_init_process =_ " %s\012"
+The 3rd space-delimited column shows the current flags, preceded by
+a ``=`` for easy use with grep/cut. ``=p`` shows enabled callsites.
-You can also apply standard Unix text manipulation filters to this
-data, e.g.::
+Controlling dynamic debug Behaviour
+===================================
- nullarbor:~ # grep -i rdma <debugfs>/dynamic_debug/control | wc -l
- 62
+The behaviour of *prdbg* sites are controlled by writing
+query/commands to the control file. Example::
- nullarbor:~ # grep -i tcp <debugfs>/dynamic_debug/control | wc -l
- 42
+ # grease the interface
+ :#> alias ddcmd='echo $* > /proc/dynamic_debug/control'
-The third column shows the currently enabled flags for each debug
-statement callsite (see below for definitions of the flags). The
-default value, with no flags enabled, is ``=_``. So you can view all
-the debug statement callsites with any non-default flags::
+ :#> ddcmd '-p; module main func run* +p'
+ :#> grep =p /proc/dynamic_debug/control
+ init/main.c:1424 [main]run_init_process =p " with arguments:\012"
+ init/main.c:1426 [main]run_init_process =p " %s\012"
+ init/main.c:1427 [main]run_init_process =p " with environment:\012"
+ init/main.c:1429 [main]run_init_process =p " %s\012"
- nullarbor:~ # awk '$3 != "=_"' <debugfs>/dynamic_debug/control
- # filename:lineno [module]function flags format
- /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c:1603 [sunrpc]svc_send p "svc_process: st_sendto returned %d\012"
+Error messages go to console/syslog::
+
+ :#> ddcmd mode foo +p
+ dyndbg: unknown keyword "mode"
+ dyndbg: query parse failed
+ bash: echo: write error: Invalid argument
+
+If debugfs is also enabled and mounted, ``dynamic_debug/control`` is
+also under the mount-dir, typically ``/sys/kernel/debug/``.
Command Language Reference
==========================
-At the lexical level, a command comprises a sequence of words separated
+At the basic lexical level, a command is a sequence of words separated
by spaces or tabs. So these are all equivalent::
- nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' >
- <debugfs>/dynamic_debug/control
- nullarbor:~ # echo -n ' file svcsock.c line 1603 +p ' >
- <debugfs>/dynamic_debug/control
- nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' >
- <debugfs>/dynamic_debug/control
+ :#> ddcmd file svcsock.c line 1603 +p
+ :#> ddcmd "file svcsock.c line 1603 +p"
+ :#> ddcmd ' file svcsock.c line 1603 +p '
Command submissions are bounded by a write() system call.
Multiple commands can be written together, separated by ``;`` or ``\n``::
- ~# echo "func pnpacpi_get_resources +p; func pnp_assign_mem +p" \
- > <debugfs>/dynamic_debug/control
-
-If your query set is big, you can batch them too::
+ :#> ddcmd "func pnpacpi_get_resources +p; func pnp_assign_mem +p"
+ :#> ddcmd <<"EOC"
+ func pnpacpi_get_resources +p
+ func pnp_assign_mem +p
+ EOC
+ :#> cat query-batch-file > /proc/dynamic_debug/control
- ~# cat query-batch-file > <debugfs>/dynamic_debug/control
+You can also use wildcards in each query term. The match rule supports
+``*`` (matches zero or more characters) and ``?`` (matches exactly one
+character). For example, you can match all usb drivers::
-Another way is to use wildcards. The match rule supports ``*`` (matches
-zero or more characters) and ``?`` (matches exactly one character). For
-example, you can match all usb drivers::
+ :#> ddcmd file "drivers/usb/*" +p # "" to suppress shell expansion
- ~# echo "file drivers/usb/* +p" > <debugfs>/dynamic_debug/control
-
-At the syntactical level, a command comprises a sequence of match
-specifications, followed by a flags change specification::
+Syntactically, a command is pairs of keyword values, followed by a
+flags change or setting::
command ::= match-spec* flags-spec
-The match-spec's are used to choose a subset of the known pr_debug()
-callsites to which to apply the flags-spec. Think of them as a query
-with implicit ANDs between each pair. Note that an empty list of
-match-specs will select all debug statement callsites.
+The match-spec's select *prdbgs* from the catalog, upon which to apply
+the flags-spec, all constraints are ANDed together. An absent keyword
+is the same as keyword "*".
+
-A match specification comprises a keyword, which controls the
-attribute of the callsite to be compared, and a value to compare
-against. Possible keywords are:::
+A match specification is a keyword, which selects the attribute of
+the callsite to be compared, and a value to compare against. Possible
+keywords are:::
match-spec ::= 'func' string |
'file' string |
'module' string |
'format' string |
+ 'class' string |
'line' line-range
line-range ::= lineno |
@@ -159,15 +136,18 @@ func
of each callsite. Example::
func svc_tcp_accept
+ func *recv* # in rfcomm, bluetooth, ping, tcp
file
- The given string is compared against either the full pathname, the
- src-root relative pathname, or the basename of the source file of
- each callsite. Examples::
+ The given string is compared against either the src-root relative
+ pathname, or the basename of the source file of each callsite.
+ Examples::
file svcsock.c
- file kernel/freezer.c
- file /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c
+ file kernel/freezer.c # ie column 1 of control file
+ file drivers/usb/* # all callsites under it
+ file inode.c:start_* # parse :tail as a func (above)
+ file inode.c:1-100 # parse :tail as a line-range (above)
module
The given string is compared against the module name
@@ -177,6 +157,7 @@ module
module sunrpc
module nfsd
+ module drm* # both drm, drm_kms_helper
format
The given string is searched for in the dynamic debug format
@@ -194,6 +175,16 @@ format
format "nfsd: SETATTR" // a neater way to match a format with whitespace
format 'nfsd: SETATTR' // yet another way to match a format with whitespace
+class
+ The given class_name is validated against each module, which may
+ have declared a list of known class_names. If the class_name is
+ found for a module, callsite & class matching and adjustment
+ proceeds. Examples::
+
+ class DRM_UT_KMS # a DRM.debug category
+ class JUNK # silent non-match
+ // class TLD_* # NOTICE: no wildcard in class names
+
line
The given line number or range of line numbers is compared
against the line number of each ``pr_debug()`` callsite. A single
@@ -219,20 +210,20 @@ of the characters::
The flags are::
p enables the pr_debug() callsite.
- f Include the function name in the printed message
- l Include line number in the printed message
- m Include module name in the printed message
- t Include thread ID in messages not generated from interrupt context
- _ No flags are set. (Or'd with others on input)
+ _ enables no flags.
-For ``print_hex_dump_debug()`` and ``print_hex_dump_bytes()``, only ``p`` flag
-have meaning, other flags ignored.
+ Decorator flags add to the message-prefix, in order:
+ t Include thread ID, or <intr>
+ m Include module name
+ f Include the function name
+ s Include the source file name
+ l Include line number
-For display, the flags are preceded by ``=``
-(mnemonic: what the flags are currently equal to).
+For ``print_hex_dump_debug()`` and ``print_hex_dump_bytes()``, only
+the ``p`` flag has meaning, other flags are ignored.
-Note the regexp ``^[-+=][flmpt_]+$`` matches a flags specification.
-To clear all flags at once, use ``=_`` or ``-flmpt``.
+Note the regexp ``^[-+=][fslmpt_]+$`` matches a flags specification.
+To clear all flags at once, use ``=_`` or ``-fslmpt``.
Debug messages during Boot Process
@@ -240,14 +231,13 @@ Debug messages during Boot Process
To activate debug messages for core code and built-in modules during
the boot process, even before userspace and debugfs exists, use
-``dyndbg="QUERY"``, ``module.dyndbg="QUERY"``, or ``ddebug_query="QUERY"``
-(``ddebug_query`` is obsoleted by ``dyndbg``, and deprecated). QUERY follows
+``dyndbg="QUERY"`` or ``module.dyndbg="QUERY"``. QUERY follows
the syntax described above, but must not exceed 1023 characters. Your
bootloader may impose lower limits.
These ``dyndbg`` params are processed just after the ddebug tables are
-processed, as part of the arch_initcall. Thus you can enable debug
-messages in all code run after this arch_initcall via this boot
+processed, as part of the early_initcall. Thus you can enable debug
+messages in all code run after this early_initcall via this boot
parameter.
On an x86 system for example ACPI enablement is a subsys_initcall and::
@@ -261,8 +251,7 @@ this boot parameter for debugging purposes.
If ``foo`` module is not built-in, ``foo.dyndbg`` will still be processed at
boot time, without effect, but will be reprocessed when module is
-loaded later. ``ddebug_query=`` and bare ``dyndbg=`` are only processed at
-boot.
+loaded later. Bare ``dyndbg=`` is only processed at boot.
Debug Messages at Module Initialization Time
@@ -270,7 +259,7 @@ Debug Messages at Module Initialization Time
When ``modprobe foo`` is called, modprobe scans ``/proc/cmdline`` for
``foo.params``, strips ``foo.``, and passes them to the kernel along with
-params given in modprobe args or ``/etc/modprob.d/*.conf`` files,
+params given in modprobe args or ``/etc/modprobe.d/*.conf`` files,
in the following order:
1. parameters given via ``/etc/modprobe.d/*.conf``::
@@ -306,7 +295,7 @@ For ``CONFIG_DYNAMIC_DEBUG`` kernels, any settings given at boot-time (or
enabled by ``-DDEBUG`` flag during compilation) can be disabled later via
the debugfs interface if the debug messages are no longer needed::
- echo "module module_name -p" > <debugfs>/dynamic_debug/control
+ echo "module module_name -p" > /proc/dynamic_debug/control
Examples
========
@@ -314,43 +303,75 @@ Examples
::
// enable the message at line 1603 of file svcsock.c
- nullarbor:~ # echo -n 'file svcsock.c line 1603 +p' >
- <debugfs>/dynamic_debug/control
+ :#> ddcmd 'file svcsock.c line 1603 +p'
// enable all the messages in file svcsock.c
- nullarbor:~ # echo -n 'file svcsock.c +p' >
- <debugfs>/dynamic_debug/control
+ :#> ddcmd 'file svcsock.c +p'
// enable all the messages in the NFS server module
- nullarbor:~ # echo -n 'module nfsd +p' >
- <debugfs>/dynamic_debug/control
+ :#> ddcmd 'module nfsd +p'
// enable all 12 messages in the function svc_process()
- nullarbor:~ # echo -n 'func svc_process +p' >
- <debugfs>/dynamic_debug/control
+ :#> ddcmd 'func svc_process +p'
// disable all 12 messages in the function svc_process()
- nullarbor:~ # echo -n 'func svc_process -p' >
- <debugfs>/dynamic_debug/control
+ :#> ddcmd 'func svc_process -p'
// enable messages for NFS calls READ, READLINK, READDIR and READDIR+.
- nullarbor:~ # echo -n 'format "nfsd: READ" +p' >
- <debugfs>/dynamic_debug/control
+ :#> ddcmd 'format "nfsd: READ" +p'
// enable messages in files of which the paths include string "usb"
- nullarbor:~ # echo -n '*usb* +p' > <debugfs>/dynamic_debug/control
+ :#> ddcmd 'file *usb* +p'
// enable all messages
- nullarbor:~ # echo -n '+p' > <debugfs>/dynamic_debug/control
+ :#> ddcmd '+p'
// add module, function to all enabled messages
- nullarbor:~ # echo -n '+mf' > <debugfs>/dynamic_debug/control
+ :#> ddcmd '+mf'
// boot-args example, with newlines and comments for readability
Kernel command line: ...
- // see whats going on in dyndbg=value processing
- dynamic_debug.verbose=1
- // enable pr_debugs in 2 builtins, #cmt is stripped
- dyndbg="module params +p #cmt ; module sys +p"
+ // see what's going on in dyndbg=value processing
+ dynamic_debug.verbose=3
+ // enable pr_debugs in the btrfs module (can be builtin or loadable)
+ btrfs.dyndbg="+p"
+ // enable pr_debugs in all files under init/
+ // and the function parse_one, #cmt is stripped
+ dyndbg="file init/* +p #cmt ; func parse_one +p"
// enable pr_debugs in 2 functions in a module loaded later
pc87360.dyndbg="func pc87360_init_device +p; func pc87360_find +p"
+
+Kernel Configuration
+====================
+
+Dynamic Debug is enabled via kernel config items::
+
+ CONFIG_DYNAMIC_DEBUG=y # build catalog, enables CORE
+ CONFIG_DYNAMIC_DEBUG_CORE=y # enable mechanics only, skip catalog
+
+If you do not want to enable dynamic debug globally (i.e. in some embedded
+system), you may set ``CONFIG_DYNAMIC_DEBUG_CORE`` as basic support of dynamic
+debug and add ``ccflags := -DDYNAMIC_DEBUG_MODULE`` into the Makefile of any
+modules which you'd like to dynamically debug later.
+
+
+Kernel *prdbg* API
+==================
+
+The following functions are cataloged and controllable when dynamic
+debug is enabled::
+
+ pr_debug()
+ dev_dbg()
+ print_hex_dump_debug()
+ print_hex_dump_bytes()
+
+Otherwise, they are off by default; ``ccflags += -DDEBUG`` or
+``#define DEBUG`` in a source file will enable them appropriately.
+
+If ``CONFIG_DYNAMIC_DEBUG`` is not set, ``print_hex_dump_debug()`` is
+just a shortcut for ``print_hex_dump(KERN_DEBUG)``.
+
+For ``print_hex_dump_debug()``/``print_hex_dump_bytes()``, format string is
+its ``prefix_str`` argument, if it is constant string; or ``hexdump``
+in case ``prefix_str`` is built dynamically.
diff --git a/Documentation/admin-guide/efi-stub.rst b/Documentation/admin-guide/efi-stub.rst
index 833edb0d0bc4..090f3a185e18 100644
--- a/Documentation/admin-guide/efi-stub.rst
+++ b/Documentation/admin-guide/efi-stub.rst
@@ -7,15 +7,15 @@ as a PE/COFF image, thereby convincing EFI firmware loaders to load
it as an EFI executable. The code that modifies the bzImage header,
along with the EFI-specific entry point that the firmware loader
jumps to are collectively known as the "EFI boot stub", and live in
-arch/x86/boot/header.S and arch/x86/boot/compressed/eboot.c,
+arch/x86/boot/header.S and drivers/firmware/efi/libstub/x86-stub.c,
respectively. For ARM the EFI stub is implemented in
arch/arm/boot/compressed/efi-header.S and
-arch/arm/boot/compressed/efi-stub.c. EFI stub code that is shared
+drivers/firmware/efi/libstub/arm32-stub.c. EFI stub code that is shared
between architectures is in drivers/firmware/efi/libstub.
For arm64, there is no compressed kernel support, so the Image itself
masquerades as a PE/COFF image and the EFI stub is linked into the
-kernel. The arm64 EFI stub lives in arch/arm64/kernel/efi-entry.S
+kernel. The arm64 EFI stub lives in drivers/firmware/efi/libstub/arm64.c
and drivers/firmware/efi/libstub/arm64-stub.c.
By using the EFI boot stub it's possible to boot a Linux kernel
diff --git a/Documentation/admin-guide/ext4.rst b/Documentation/admin-guide/ext4.rst
index 9443fcef1876..5740d85439ff 100644
--- a/Documentation/admin-guide/ext4.rst
+++ b/Documentation/admin-guide/ext4.rst
@@ -392,9 +392,16 @@ When mounting an ext4 filesystem, the following option are accepted:
dax
Use direct access (no page cache). See
- Documentation/filesystems/dax.txt. Note that this option is
+ Documentation/filesystems/dax.rst. Note that this option is
incompatible with data=journal.
+ inlinecrypt
+ When possible, encrypt/decrypt the contents of encrypted files using the
+ blk-crypto framework rather than filesystem-layer encryption. This
+ allows the use of inline encryption hardware. The on-disk format is
+ unaffected. For more details, see
+ Documentation/block/inline-encryption.rst.
+
Data Mode
=========
There are 3 different data modes:
@@ -522,21 +529,21 @@ Files in /sys/fs/ext4/<devname>:
Ioctls
======
-There is some Ext4 specific functionality which can be accessed by applications
-through the system call interfaces. The list of all Ext4 specific ioctls are
-shown in the table below.
+Ext4 implements various ioctls which can be used by applications to access
+ext4-specific functionality. An incomplete list of these ioctls is shown in the
+table below. This list includes truly ext4-specific ioctls (``EXT4_IOC_*``) as
+well as ioctls that may have been ext4-specific originally but are now supported
+by some other filesystem(s) too (``FS_IOC_*``).
-Table of Ext4 specific ioctls
+Table of Ext4 ioctls
- EXT4_IOC_GETFLAGS
+ FS_IOC_GETFLAGS
Get additional attributes associated with inode. The ioctl argument is
- an integer bitfield, with bit values described in ext4.h. This ioctl is
- an alias for FS_IOC_GETFLAGS.
+ an integer bitfield, with bit values described in ext4.h.
- EXT4_IOC_SETFLAGS
+ FS_IOC_SETFLAGS
Set additional attributes associated with inode. The ioctl argument is
- an integer bitfield, with bit values described in ext4.h. This ioctl is
- an alias for FS_IOC_SETFLAGS.
+ an integer bitfield, with bit values described in ext4.h.
EXT4_IOC_GETVERSION, EXT4_IOC_GETVERSION_OLD
Get the inode i_generation number stored for each inode. The
@@ -611,7 +618,7 @@ kernel source: <file:fs/ext4/>
programs: http://e2fsprogs.sourceforge.net/
-useful links: http://fedoraproject.org/wiki/ext3-devel
+useful links: https://fedoraproject.org/wiki/ext3-devel
http://www.bullopensource.org/ext4/
http://ext4.wiki.kernel.org/index.php/Main_Page
- http://fedoraproject.org/wiki/Features/Ext4
+ https://fedoraproject.org/wiki/Features/Ext4
diff --git a/Documentation/admin-guide/features.rst b/Documentation/admin-guide/features.rst
new file mode 100644
index 000000000000..7651eca38227
--- /dev/null
+++ b/Documentation/admin-guide/features.rst
@@ -0,0 +1,3 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. kernel-feat:: features
diff --git a/Documentation/admin-guide/filesystem-monitoring.rst b/Documentation/admin-guide/filesystem-monitoring.rst
new file mode 100644
index 000000000000..ab8dba76283c
--- /dev/null
+++ b/Documentation/admin-guide/filesystem-monitoring.rst
@@ -0,0 +1,78 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================================
+File system Monitoring with fanotify
+====================================
+
+File system Error Reporting
+===========================
+
+Fanotify supports the FAN_FS_ERROR event type for file system-wide error
+reporting. It is meant to be used by file system health monitoring
+daemons, which listen for these events and take actions (notify
+sysadmin, start recovery) when a file system problem is detected.
+
+By design, a FAN_FS_ERROR notification exposes sufficient information
+for a monitoring tool to know a problem in the file system has happened.
+It doesn't necessarily provide a user space application with semantics
+to verify an IO operation was successfully executed. That is out of
+scope for this feature. Instead, it is only meant as a framework for
+early file system problem detection and reporting recovery tools.
+
+When a file system operation fails, it is common for dozens of kernel
+errors to cascade after the initial failure, hiding the original failure
+log, which is usually the most useful debug data to troubleshoot the
+problem. For this reason, FAN_FS_ERROR tries to report only the first
+error that occurred for a file system since the last notification, and
+it simply counts additional errors. This ensures that the most
+important pieces of information are never lost.
+
+FAN_FS_ERROR requires the fanotify group to be setup with the
+FAN_REPORT_FID flag.
+
+At the time of this writing, the only file system that emits FAN_FS_ERROR
+notifications is Ext4.
+
+A FAN_FS_ERROR Notification has the following format::
+
+ ::
+
+ [ Notification Metadata (Mandatory) ]
+ [ Generic Error Record (Mandatory) ]
+ [ FID record (Mandatory) ]
+
+The order of records is not guaranteed, and new records might be added
+in the future. Therefore, applications must not rely on the order and
+must be prepared to skip over unknown records. Please refer to
+``samples/fanotify/fs-monitor.c`` for an example parser.
+
+Generic error record
+--------------------
+
+The generic error record provides enough information for a file system
+agnostic tool to learn about a problem in the file system, without
+providing any additional details about the problem. This record is
+identified by ``struct fanotify_event_info_header.info_type`` being set
+to FAN_EVENT_INFO_TYPE_ERROR.
+
+ ::
+
+ struct fanotify_event_info_error {
+ struct fanotify_event_info_header hdr;
+ __s32 error;
+ __u32 error_count;
+ };
+
+The `error` field identifies the type of error using errno values.
+`error_count` tracks the number of errors that occurred and were
+suppressed to preserve the original error information, since the last
+notification.
+
+FID record
+----------
+
+The FID record can be used to uniquely identify the inode that triggered
+the error through the combination of fsid and file handle. A file system
+specific application can use that information to attempt a recovery
+procedure. Errors that are not related to an inode are reported with an
+empty file handle of type FILEID_INVALID.
diff --git a/Documentation/admin-guide/gpio/gpio-aggregator.rst b/Documentation/admin-guide/gpio/gpio-aggregator.rst
new file mode 100644
index 000000000000..5cd1e7221756
--- /dev/null
+++ b/Documentation/admin-guide/gpio/gpio-aggregator.rst
@@ -0,0 +1,111 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+GPIO Aggregator
+===============
+
+The GPIO Aggregator provides a mechanism to aggregate GPIOs, and expose them as
+a new gpio_chip. This supports the following use cases.
+
+
+Aggregating GPIOs using Sysfs
+-----------------------------
+
+GPIO controllers are exported to userspace using /dev/gpiochip* character
+devices. Access control to these devices is provided by standard UNIX file
+system permissions, on an all-or-nothing basis: either a GPIO controller is
+accessible for a user, or it is not.
+
+The GPIO Aggregator provides access control for a set of one or more GPIOs, by
+aggregating them into a new gpio_chip, which can be assigned to a group or user
+using standard UNIX file ownership and permissions. Furthermore, this
+simplifies and hardens exporting GPIOs to a virtual machine, as the VM can just
+grab the full GPIO controller, and no longer needs to care about which GPIOs to
+grab and which not, reducing the attack surface.
+
+Aggregated GPIO controllers are instantiated and destroyed by writing to
+write-only attribute files in sysfs.
+
+ /sys/bus/platform/drivers/gpio-aggregator/
+
+ "new_device" ...
+ Userspace may ask the kernel to instantiate an aggregated GPIO
+ controller by writing a string describing the GPIOs to
+ aggregate to the "new_device" file, using the format
+
+ .. code-block:: none
+
+ [<gpioA>] [<gpiochipB> <offsets>] ...
+
+ Where:
+
+ "<gpioA>" ...
+ is a GPIO line name,
+
+ "<gpiochipB>" ...
+ is a GPIO chip label, and
+
+ "<offsets>" ...
+ is a comma-separated list of GPIO offsets and/or
+ GPIO offset ranges denoted by dashes.
+
+ Example: Instantiate a new GPIO aggregator by aggregating GPIO
+ line 19 of "e6052000.gpio" and GPIO lines 20-21 of
+ "e6050000.gpio" into a new gpio_chip:
+
+ .. code-block:: sh
+
+ $ echo 'e6052000.gpio 19 e6050000.gpio 20-21' > new_device
+
+ "delete_device" ...
+ Userspace may ask the kernel to destroy an aggregated GPIO
+ controller after use by writing its device name to the
+ "delete_device" file.
+
+ Example: Destroy the previously-created aggregated GPIO
+ controller, assumed to be "gpio-aggregator.0":
+
+ .. code-block:: sh
+
+ $ echo gpio-aggregator.0 > delete_device
+
+
+Generic GPIO Driver
+-------------------
+
+The GPIO Aggregator can also be used as a generic driver for a simple
+GPIO-operated device described in DT, without a dedicated in-kernel driver.
+This is useful in industrial control, and is not unlike e.g. spidev, which
+allows the user to communicate with an SPI device from userspace.
+
+Binding a device to the GPIO Aggregator is performed either by modifying the
+gpio-aggregator driver, or by writing to the "driver_override" file in Sysfs.
+
+Example: If "door" is a GPIO-operated device described in DT, using its own
+compatible value::
+
+ door {
+ compatible = "myvendor,mydoor";
+
+ gpios = <&gpio2 19 GPIO_ACTIVE_HIGH>,
+ <&gpio2 20 GPIO_ACTIVE_LOW>;
+ gpio-line-names = "open", "lock";
+ };
+
+it can be bound to the GPIO Aggregator by either:
+
+1. Adding its compatible value to ``gpio_aggregator_dt_ids[]``,
+2. Binding manually using "driver_override":
+
+.. code-block:: sh
+
+ $ echo gpio-aggregator > /sys/bus/platform/devices/door/driver_override
+ $ echo door > /sys/bus/platform/drivers/gpio-aggregator/bind
+
+After that, a new gpiochip "door" has been created:
+
+.. code-block:: sh
+
+ $ gpioinfo door
+ gpiochip12 - 2 lines:
+ line 0: "open" unused input active-high
+ line 1: "lock" unused input active-high
diff --git a/Documentation/admin-guide/gpio/gpio-mockup.rst b/Documentation/admin-guide/gpio/gpio-mockup.rst
new file mode 100644
index 000000000000..493071da1738
--- /dev/null
+++ b/Documentation/admin-guide/gpio/gpio-mockup.rst
@@ -0,0 +1,51 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+GPIO Testing Driver
+===================
+
+The GPIO Testing Driver (gpio-mockup) provides a way to create simulated GPIO
+chips for testing purposes. The lines exposed by these chips can be accessed
+using the standard GPIO character device interface as well as manipulated
+using the dedicated debugfs directory structure.
+
+Creating simulated chips using module params
+--------------------------------------------
+
+When loading the gpio-mockup driver a number of parameters can be passed to the
+module.
+
+ gpio_mockup_ranges
+
+ This parameter takes an argument in the form of an array of integer
+ pairs. Each pair defines the base GPIO number (non-negative integer)
+ and the first number after the last of this chip. If the base GPIO
+ is -1, the gpiolib will assign it automatically. while the following
+ parameter is the number of lines exposed by the chip.
+
+ Example: gpio_mockup_ranges=-1,8,-1,16,405,409
+
+ The line above creates three chips. The first one will expose 8 lines,
+ the second 16 and the third 4. The base GPIO for the third chip is set
+ to 405 while for two first chips it will be assigned automatically.
+
+ gpio_mockup_named_lines
+
+ This parameter doesn't take any arguments. It lets the driver know that
+ GPIO lines exposed by it should be named.
+
+ The name format is: gpio-mockup-X-Y where X is mockup chip's ID
+ and Y is the line offset.
+
+Manipulating simulated lines
+----------------------------
+
+Each mockup chip creates its own subdirectory in /sys/kernel/debug/gpio-mockup/.
+The directory is named after the chip's label. A symlink is also created, named
+after the chip's name, which points to the label directory.
+
+Inside each subdirectory, there's a separate attribute for each GPIO line. The
+name of the attribute represents the line's offset in the chip.
+
+Reading from a line attribute returns the current value. Writing to it (0 or 1)
+changes the configuration of the simulated pull-up/pull-down resistor
+(1 - pull-up, 0 - pull-down).
diff --git a/Documentation/admin-guide/gpio/gpio-sim.rst b/Documentation/admin-guide/gpio/gpio-sim.rst
new file mode 100644
index 000000000000..1cc5567a4bbe
--- /dev/null
+++ b/Documentation/admin-guide/gpio/gpio-sim.rst
@@ -0,0 +1,134 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+Configfs GPIO Simulator
+=======================
+
+The configfs GPIO Simulator (gpio-sim) provides a way to create simulated GPIO
+chips for testing purposes. The lines exposed by these chips can be accessed
+using the standard GPIO character device interface as well as manipulated
+using sysfs attributes.
+
+Creating simulated chips
+------------------------
+
+The gpio-sim module registers a configfs subsystem called ``'gpio-sim'``. For
+details of the configfs filesystem, please refer to the configfs documentation.
+
+The user can create a hierarchy of configfs groups and items as well as modify
+values of exposed attributes. Once the chip is instantiated, this hierarchy
+will be translated to appropriate device properties. The general structure is:
+
+**Group:** ``/config/gpio-sim``
+
+This is the top directory of the gpio-sim configfs tree.
+
+**Group:** ``/config/gpio-sim/gpio-device``
+
+**Attribute:** ``/config/gpio-sim/gpio-device/dev_name``
+
+**Attribute:** ``/config/gpio-sim/gpio-device/live``
+
+This is a directory representing a GPIO platform device. The ``'dev_name'``
+attribute is read-only and allows the user-space to read the platform device
+name (e.g. ``'gpio-sim.0'``). The ``'live'`` attribute allows to trigger the
+actual creation of the device once it's fully configured. The accepted values
+are: ``'1'`` to enable the simulated device and ``'0'`` to disable and tear
+it down.
+
+**Group:** ``/config/gpio-sim/gpio-device/gpio-bankX``
+
+**Attribute:** ``/config/gpio-sim/gpio-device/gpio-bankX/chip_name``
+
+**Attribute:** ``/config/gpio-sim/gpio-device/gpio-bankX/num_lines``
+
+This group represents a bank of GPIOs under the top platform device. The
+``'chip_name'`` attribute is read-only and allows the user-space to read the
+device name of the bank device. The ``'num_lines'`` attribute allows to specify
+the number of lines exposed by this bank.
+
+**Group:** ``/config/gpio-sim/gpio-device/gpio-bankX/lineY``
+
+**Attribute:** ``/config/gpio-sim/gpio-device/gpio-bankX/lineY/name``
+
+This group represents a single line at the offset Y. The 'name' attribute
+allows to set the line name as represented by the 'gpio-line-names' property.
+
+**Item:** ``/config/gpio-sim/gpio-device/gpio-bankX/lineY/hog``
+
+**Attribute:** ``/config/gpio-sim/gpio-device/gpio-bankX/lineY/hog/name``
+
+**Attribute:** ``/config/gpio-sim/gpio-device/gpio-bankX/lineY/hog/direction``
+
+This item makes the gpio-sim module hog the associated line. The ``'name'``
+attribute specifies the in-kernel consumer name to use. The ``'direction'``
+attribute specifies the hog direction and must be one of: ``'input'``,
+``'output-high'`` and ``'output-low'``.
+
+Inside each bank directory, there's a set of attributes that can be used to
+configure the new chip. Additionally the user can ``mkdir()`` subdirectories
+inside the chip's directory that allow to pass additional configuration for
+specific lines. The name of those subdirectories must take the form of:
+``'line<offset>'`` (e.g. ``'line0'``, ``'line20'``, etc.) as the name will be
+used by the module to assign the config to the specific line at given offset.
+
+Once the confiuration is complete, the ``'live'`` attribute must be set to 1 in
+order to instantiate the chip. It can be set back to 0 to destroy the simulated
+chip. The module will synchronously wait for the new simulated device to be
+successfully probed and if this doesn't happen, writing to ``'live'`` will
+result in an error.
+
+Simulated GPIO chips can also be defined in device-tree. The compatible string
+must be: ``"gpio-simulator"``. Supported properties are:
+
+ ``"gpio-sim,label"`` - chip label
+
+Other standard GPIO properties (like ``"gpio-line-names"``, ``"ngpios"`` or
+``"gpio-hog"``) are also supported. Please refer to the GPIO documentation for
+details.
+
+An example device-tree code defining a GPIO simulator:
+
+.. code-block :: none
+
+ gpio-sim {
+ compatible = "gpio-simulator";
+
+ bank0 {
+ gpio-controller;
+ #gpio-cells = <2>;
+ ngpios = <16>;
+ gpio-sim,label = "dt-bank0";
+ gpio-line-names = "", "sim-foo", "", "sim-bar";
+ };
+
+ bank1 {
+ gpio-controller;
+ #gpio-cells = <2>;
+ ngpios = <8>;
+ gpio-sim,label = "dt-bank1";
+
+ line3 {
+ gpio-hog;
+ gpios = <3 0>;
+ output-high;
+ line-name = "sim-hog-from-dt";
+ };
+ };
+ };
+
+Manipulating simulated lines
+----------------------------
+
+Each simulated GPIO chip creates a separate sysfs group under its device
+directory for each exposed line
+(e.g. ``/sys/devices/platform/gpio-sim.X/gpiochipY/``). The name of each group
+is of the form: ``'sim_gpioX'`` where X is the offset of the line. Inside each
+group there are two attributes:
+
+ ``pull`` - allows to read and set the current simulated pull setting for
+ every line, when writing the value must be one of: ``'pull-up'``,
+ ``'pull-down'``
+
+ ``value`` - allows to read the current value of the line which may be
+ different from the pull if the line is being driven from
+ user-space
diff --git a/Documentation/admin-guide/gpio/index.rst b/Documentation/admin-guide/gpio/index.rst
index a244ba4e87d5..f6861ca16ffe 100644
--- a/Documentation/admin-guide/gpio/index.rst
+++ b/Documentation/admin-guide/gpio/index.rst
@@ -7,7 +7,10 @@ gpio
.. toctree::
:maxdepth: 1
+ gpio-aggregator
sysfs
+ gpio-mockup
+ gpio-sim
.. only:: subproject and html
diff --git a/Documentation/admin-guide/gpio/sysfs.rst b/Documentation/admin-guide/gpio/sysfs.rst
index ec09ffd983e7..35171d15f78d 100644
--- a/Documentation/admin-guide/gpio/sysfs.rst
+++ b/Documentation/admin-guide/gpio/sysfs.rst
@@ -145,7 +145,7 @@ requested using gpio_request()::
/* export the GPIO to userspace */
int gpiod_export(struct gpio_desc *desc, bool direction_may_change);
- /* reverse gpio_export() */
+ /* reverse gpiod_export() */
void gpiod_unexport(struct gpio_desc *desc);
/* create a sysfs link to an exported GPIO node */
diff --git a/Documentation/admin-guide/hw-vuln/core-scheduling.rst b/Documentation/admin-guide/hw-vuln/core-scheduling.rst
new file mode 100644
index 000000000000..cf1eeefdfc32
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/core-scheduling.rst
@@ -0,0 +1,226 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+Core Scheduling
+===============
+Core scheduling support allows userspace to define groups of tasks that can
+share a core. These groups can be specified either for security usecases (one
+group of tasks don't trust another), or for performance usecases (some
+workloads may benefit from running on the same core as they don't need the same
+hardware resources of the shared core, or may prefer different cores if they
+do share hardware resource needs). This document only describes the security
+usecase.
+
+Security usecase
+----------------
+A cross-HT attack involves the attacker and victim running on different Hyper
+Threads of the same core. MDS and L1TF are examples of such attacks. The only
+full mitigation of cross-HT attacks is to disable Hyper Threading (HT). Core
+scheduling is a scheduler feature that can mitigate some (not all) cross-HT
+attacks. It allows HT to be turned on safely by ensuring that only tasks in a
+user-designated trusted group can share a core. This increase in core sharing
+can also improve performance, however it is not guaranteed that performance
+will always improve, though that is seen to be the case with a number of real
+world workloads. In theory, core scheduling aims to perform at least as good as
+when Hyper Threading is disabled. In practice, this is mostly the case though
+not always: as synchronizing scheduling decisions across 2 or more CPUs in a
+core involves additional overhead - especially when the system is lightly
+loaded. When ``total_threads <= N_CPUS/2``, the extra overhead may cause core
+scheduling to perform more poorly compared to SMT-disabled, where N_CPUS is the
+total number of CPUs. Please measure the performance of your workloads always.
+
+Usage
+-----
+Core scheduling support is enabled via the ``CONFIG_SCHED_CORE`` config option.
+Using this feature, userspace defines groups of tasks that can be co-scheduled
+on the same core. The core scheduler uses this information to make sure that
+tasks that are not in the same group never run simultaneously on a core, while
+doing its best to satisfy the system's scheduling requirements.
+
+Core scheduling can be enabled via the ``PR_SCHED_CORE`` prctl interface.
+This interface provides support for the creation of core scheduling groups, as
+well as admission and removal of tasks from created groups::
+
+ #include <sys/prctl.h>
+
+ int prctl(int option, unsigned long arg2, unsigned long arg3,
+ unsigned long arg4, unsigned long arg5);
+
+option:
+ ``PR_SCHED_CORE``
+
+arg2:
+ Command for operation, must be one off:
+
+ - ``PR_SCHED_CORE_GET`` -- get core_sched cookie of ``pid``.
+ - ``PR_SCHED_CORE_CREATE`` -- create a new unique cookie for ``pid``.
+ - ``PR_SCHED_CORE_SHARE_TO`` -- push core_sched cookie to ``pid``.
+ - ``PR_SCHED_CORE_SHARE_FROM`` -- pull core_sched cookie from ``pid``.
+
+arg3:
+ ``pid`` of the task for which the operation applies.
+
+arg4:
+ ``pid_type`` for which the operation applies. It is one of
+ ``PR_SCHED_CORE_SCOPE_``-prefixed macro constants. For example, if arg4
+ is ``PR_SCHED_CORE_SCOPE_THREAD_GROUP``, then the operation of this command
+ will be performed for all tasks in the task group of ``pid``.
+
+arg5:
+ userspace pointer to an unsigned long for storing the cookie returned by
+ ``PR_SCHED_CORE_GET`` command. Should be 0 for all other commands.
+
+In order for a process to push a cookie to, or pull a cookie from a process, it
+is required to have the ptrace access mode: `PTRACE_MODE_READ_REALCREDS` to the
+process.
+
+Building hierarchies of tasks
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The simplest way to build hierarchies of threads/processes which share a
+cookie and thus a core is to rely on the fact that the core-sched cookie is
+inherited across forks/clones and execs, thus setting a cookie for the
+'initial' script/executable/daemon will place every spawned child in the
+same core-sched group.
+
+Cookie Transferral
+~~~~~~~~~~~~~~~~~~
+Transferring a cookie between the current and other tasks is possible using
+PR_SCHED_CORE_SHARE_FROM and PR_SCHED_CORE_SHARE_TO to inherit a cookie from a
+specified task or a share a cookie with a task. In combination this allows a
+simple helper program to pull a cookie from a task in an existing core
+scheduling group and share it with already running tasks.
+
+Design/Implementation
+---------------------
+Each task that is tagged is assigned a cookie internally in the kernel. As
+mentioned in `Usage`_, tasks with the same cookie value are assumed to trust
+each other and share a core.
+
+The basic idea is that, every schedule event tries to select tasks for all the
+siblings of a core such that all the selected tasks running on a core are
+trusted (same cookie) at any point in time. Kernel threads are assumed trusted.
+The idle task is considered special, as it trusts everything and everything
+trusts it.
+
+During a schedule() event on any sibling of a core, the highest priority task on
+the sibling's core is picked and assigned to the sibling calling schedule(), if
+the sibling has the task enqueued. For rest of the siblings in the core,
+highest priority task with the same cookie is selected if there is one runnable
+in their individual run queues. If a task with same cookie is not available,
+the idle task is selected. Idle task is globally trusted.
+
+Once a task has been selected for all the siblings in the core, an IPI is sent to
+siblings for whom a new task was selected. Siblings on receiving the IPI will
+switch to the new task immediately. If an idle task is selected for a sibling,
+then the sibling is considered to be in a `forced idle` state. I.e., it may
+have tasks on its on runqueue to run, however it will still have to run idle.
+More on this in the next section.
+
+Forced-idling of hyperthreads
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The scheduler tries its best to find tasks that trust each other such that all
+tasks selected to be scheduled are of the highest priority in a core. However,
+it is possible that some runqueues had tasks that were incompatible with the
+highest priority ones in the core. Favoring security over fairness, one or more
+siblings could be forced to select a lower priority task if the highest
+priority task is not trusted with respect to the core wide highest priority
+task. If a sibling does not have a trusted task to run, it will be forced idle
+by the scheduler (idle thread is scheduled to run).
+
+When the highest priority task is selected to run, a reschedule-IPI is sent to
+the sibling to force it into idle. This results in 4 cases which need to be
+considered depending on whether a VM or a regular usermode process was running
+on either HT::
+
+ HT1 (attack) HT2 (victim)
+ A idle -> user space user space -> idle
+ B idle -> user space guest -> idle
+ C idle -> guest user space -> idle
+ D idle -> guest guest -> idle
+
+Note that for better performance, we do not wait for the destination CPU
+(victim) to enter idle mode. This is because the sending of the IPI would bring
+the destination CPU immediately into kernel mode from user space, or VMEXIT
+in the case of guests. At best, this would only leak some scheduler metadata
+which may not be worth protecting. It is also possible that the IPI is received
+too late on some architectures, but this has not been observed in the case of
+x86.
+
+Trust model
+~~~~~~~~~~~
+Core scheduling maintains trust relationships amongst groups of tasks by
+assigning them a tag that is the same cookie value.
+When a system with core scheduling boots, all tasks are considered to trust
+each other. This is because the core scheduler does not have information about
+trust relationships until userspace uses the above mentioned interfaces, to
+communicate them. In other words, all tasks have a default cookie value of 0.
+and are considered system-wide trusted. The forced-idling of siblings running
+cookie-0 tasks is also avoided.
+
+Once userspace uses the above mentioned interfaces to group sets of tasks, tasks
+within such groups are considered to trust each other, but do not trust those
+outside. Tasks outside the group also don't trust tasks within.
+
+Limitations of core-scheduling
+------------------------------
+Core scheduling tries to guarantee that only trusted tasks run concurrently on a
+core. But there could be small window of time during which untrusted tasks run
+concurrently or kernel could be running concurrently with a task not trusted by
+kernel.
+
+IPI processing delays
+~~~~~~~~~~~~~~~~~~~~~
+Core scheduling selects only trusted tasks to run together. IPI is used to notify
+the siblings to switch to the new task. But there could be hardware delays in
+receiving of the IPI on some arch (on x86, this has not been observed). This may
+cause an attacker task to start running on a CPU before its siblings receive the
+IPI. Even though cache is flushed on entry to user mode, victim tasks on siblings
+may populate data in the cache and micro architectural buffers after the attacker
+starts to run and this is a possibility for data leak.
+
+Open cross-HT issues that core scheduling does not solve
+--------------------------------------------------------
+1. For MDS
+~~~~~~~~~~
+Core scheduling cannot protect against MDS attacks between the siblings
+running in user mode and the others running in kernel mode. Even though all
+siblings run tasks which trust each other, when the kernel is executing
+code on behalf of a task, it cannot trust the code running in the
+sibling. Such attacks are possible for any combination of sibling CPU modes
+(host or guest mode).
+
+2. For L1TF
+~~~~~~~~~~~
+Core scheduling cannot protect against an L1TF guest attacker exploiting a
+guest or host victim. This is because the guest attacker can craft invalid
+PTEs which are not inverted due to a vulnerable guest kernel. The only
+solution is to disable EPT (Extended Page Tables).
+
+For both MDS and L1TF, if the guest vCPU is configured to not trust each
+other (by tagging separately), then the guest to guest attacks would go away.
+Or it could be a system admin policy which considers guest to guest attacks as
+a guest problem.
+
+Another approach to resolve these would be to make every untrusted task on the
+system to not trust every other untrusted task. While this could reduce
+parallelism of the untrusted tasks, it would still solve the above issues while
+allowing system processes (trusted tasks) to share a core.
+
+3. Protecting the kernel (IRQ, syscall, VMEXIT)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Unfortunately, core scheduling does not protect kernel contexts running on
+sibling hyperthreads from one another. Prototypes of mitigations have been posted
+to LKML to solve this, but it is debatable whether such windows are practically
+exploitable, and whether the performance overhead of the prototypes are worth
+it (not to mention, the added code complexity).
+
+Other Use cases
+---------------
+The main use case for Core scheduling is mitigating the cross-HT vulnerabilities
+with SMT enabled. There are other use cases where this feature could be used:
+
+- Isolating tasks that needs a whole core: Examples include realtime tasks, tasks
+ that uses SIMD instructions etc.
+- Gang scheduling: Requirements for a group of tasks that needs to be scheduled
+ together could also be realized using core scheduling. One example is vCPUs of
+ a VM.
diff --git a/Documentation/admin-guide/hw-vuln/cross-thread-rsb.rst b/Documentation/admin-guide/hw-vuln/cross-thread-rsb.rst
new file mode 100644
index 000000000000..875616d675fe
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/cross-thread-rsb.rst
@@ -0,0 +1,91 @@
+
+.. SPDX-License-Identifier: GPL-2.0
+
+Cross-Thread Return Address Predictions
+=======================================
+
+Certain AMD and Hygon processors are subject to a cross-thread return address
+predictions vulnerability. When running in SMT mode and one sibling thread
+transitions out of C0 state, the other sibling thread could use return target
+predictions from the sibling thread that transitioned out of C0.
+
+The Spectre v2 mitigations protect the Linux kernel, as it fills the return
+address prediction entries with safe targets when context switching to the idle
+thread. However, KVM does allow a VMM to prevent exiting guest mode when
+transitioning out of C0. This could result in a guest-controlled return target
+being consumed by the sibling thread.
+
+Affected processors
+-------------------
+
+The following CPUs are vulnerable:
+
+ - AMD Family 17h processors
+ - Hygon Family 18h processors
+
+Related CVEs
+------------
+
+The following CVE entry is related to this issue:
+
+ ============== =======================================
+ CVE-2022-27672 Cross-Thread Return Address Predictions
+ ============== =======================================
+
+Problem
+-------
+
+Affected SMT-capable processors support 1T and 2T modes of execution when SMT
+is enabled. In 2T mode, both threads in a core are executing code. For the
+processor core to enter 1T mode, it is required that one of the threads
+requests to transition out of the C0 state. This can be communicated with the
+HLT instruction or with an MWAIT instruction that requests non-C0.
+When the thread re-enters the C0 state, the processor transitions back
+to 2T mode, assuming the other thread is also still in C0 state.
+
+In affected processors, the return address predictor (RAP) is partitioned
+depending on the SMT mode. For instance, in 2T mode each thread uses a private
+16-entry RAP, but in 1T mode, the active thread uses a 32-entry RAP. Upon
+transition between 1T/2T mode, the RAP contents are not modified but the RAP
+pointers (which control the next return target to use for predictions) may
+change. This behavior may result in return targets from one SMT thread being
+used by RET predictions in the sibling thread following a 1T/2T switch. In
+particular, a RET instruction executed immediately after a transition to 1T may
+use a return target from the thread that just became idle. In theory, this
+could lead to information disclosure if the return targets used do not come
+from trustworthy code.
+
+Attack scenarios
+----------------
+
+An attack can be mounted on affected processors by performing a series of CALL
+instructions with targeted return locations and then transitioning out of C0
+state.
+
+Mitigation mechanism
+--------------------
+
+Before entering idle state, the kernel context switches to the idle thread. The
+context switch fills the RAP entries (referred to as the RSB in Linux) with safe
+targets by performing a sequence of CALL instructions.
+
+Prevent a guest VM from directly putting the processor into an idle state by
+intercepting HLT and MWAIT instructions.
+
+Both mitigations are required to fully address this issue.
+
+Mitigation control on the kernel command line
+---------------------------------------------
+
+Use existing Spectre v2 mitigations that will fill the RSB on context switch.
+
+Mitigation control for KVM - module parameter
+---------------------------------------------
+
+By default, the KVM hypervisor mitigates this issue by intercepting guest
+attempts to transition out of C0. A VMM can use the KVM_CAP_X86_DISABLE_EXITS
+capability to override those interceptions, but since this is not common, the
+mitigation that covers this path is not enabled by default.
+
+The mitigation for the KVM_CAP_X86_DISABLE_EXITS capability can be turned on
+using the boolean module parameter mitigate_smt_rsb, e.g. ``kvm.mitigate_smt_rsb=1``.
diff --git a/Documentation/admin-guide/hw-vuln/gather_data_sampling.rst b/Documentation/admin-guide/hw-vuln/gather_data_sampling.rst
new file mode 100644
index 000000000000..264bfa937f7d
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/gather_data_sampling.rst
@@ -0,0 +1,109 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+GDS - Gather Data Sampling
+==========================
+
+Gather Data Sampling is a hardware vulnerability which allows unprivileged
+speculative access to data which was previously stored in vector registers.
+
+Problem
+-------
+When a gather instruction performs loads from memory, different data elements
+are merged into the destination vector register. However, when a gather
+instruction that is transiently executed encounters a fault, stale data from
+architectural or internal vector registers may get transiently forwarded to the
+destination vector register instead. This will allow a malicious attacker to
+infer stale data using typical side channel techniques like cache timing
+attacks. GDS is a purely sampling-based attack.
+
+The attacker uses gather instructions to infer the stale vector register data.
+The victim does not need to do anything special other than use the vector
+registers. The victim does not need to use gather instructions to be
+vulnerable.
+
+Because the buffers are shared between Hyper-Threads cross Hyper-Thread attacks
+are possible.
+
+Attack scenarios
+----------------
+Without mitigation, GDS can infer stale data across virtually all
+permission boundaries:
+
+ Non-enclaves can infer SGX enclave data
+ Userspace can infer kernel data
+ Guests can infer data from hosts
+ Guest can infer guest from other guests
+ Users can infer data from other users
+
+Because of this, it is important to ensure that the mitigation stays enabled in
+lower-privilege contexts like guests and when running outside SGX enclaves.
+
+The hardware enforces the mitigation for SGX. Likewise, VMMs should ensure
+that guests are not allowed to disable the GDS mitigation. If a host erred and
+allowed this, a guest could theoretically disable GDS mitigation, mount an
+attack, and re-enable it.
+
+Mitigation mechanism
+--------------------
+This issue is mitigated in microcode. The microcode defines the following new
+bits:
+
+ ================================ === ============================
+ IA32_ARCH_CAPABILITIES[GDS_CTRL] R/O Enumerates GDS vulnerability
+ and mitigation support.
+ IA32_ARCH_CAPABILITIES[GDS_NO] R/O Processor is not vulnerable.
+ IA32_MCU_OPT_CTRL[GDS_MITG_DIS] R/W Disables the mitigation
+ 0 by default.
+ IA32_MCU_OPT_CTRL[GDS_MITG_LOCK] R/W Locks GDS_MITG_DIS=0. Writes
+ to GDS_MITG_DIS are ignored
+ Can't be cleared once set.
+ ================================ === ============================
+
+GDS can also be mitigated on systems that don't have updated microcode by
+disabling AVX. This can be done by setting gather_data_sampling="force" or
+"clearcpuid=avx" on the kernel command-line.
+
+If used, these options will disable AVX use by turning off XSAVE YMM support.
+However, the processor will still enumerate AVX support. Userspace that
+does not follow proper AVX enumeration to check both AVX *and* XSAVE YMM
+support will break.
+
+Mitigation control on the kernel command line
+---------------------------------------------
+The mitigation can be disabled by setting "gather_data_sampling=off" or
+"mitigations=off" on the kernel command line. Not specifying either will default
+to the mitigation being enabled. Specifying "gather_data_sampling=force" will
+use the microcode mitigation when available or disable AVX on affected systems
+where the microcode hasn't been updated to include the mitigation.
+
+GDS System Information
+------------------------
+The kernel provides vulnerability status information through sysfs. For
+GDS this can be accessed by the following sysfs file:
+
+/sys/devices/system/cpu/vulnerabilities/gather_data_sampling
+
+The possible values contained in this file are:
+
+ ============================== =============================================
+ Not affected Processor not vulnerable.
+ Vulnerable Processor vulnerable and mitigation disabled.
+ Vulnerable: No microcode Processor vulnerable and microcode is missing
+ mitigation.
+ Mitigation: AVX disabled,
+ no microcode Processor is vulnerable and microcode is missing
+ mitigation. AVX disabled as mitigation.
+ Mitigation: Microcode Processor is vulnerable and mitigation is in
+ effect.
+ Mitigation: Microcode (locked) Processor is vulnerable and mitigation is in
+ effect and cannot be disabled.
+ Unknown: Dependent on
+ hypervisor status Running on a virtual guest processor that is
+ affected but with no way to know if host
+ processor is mitigated or vulnerable.
+ ============================== =============================================
+
+GDS Default mitigation
+----------------------
+The updated microcode will enable the mitigation by default. The kernel's
+default action is to leave the mitigation enabled.
diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst
index 0795e3c2643f..de99caabf65a 100644
--- a/Documentation/admin-guide/hw-vuln/index.rst
+++ b/Documentation/admin-guide/hw-vuln/index.rst
@@ -13,4 +13,11 @@ are configurable at compile, boot or run time.
l1tf
mds
tsx_async_abort
- multihit.rst
+ multihit
+ special-register-buffer-data-sampling
+ core-scheduling
+ l1d_flush
+ processor_mmio_stale_data
+ cross-thread-rsb
+ srso
+ gather_data_sampling
diff --git a/Documentation/admin-guide/hw-vuln/l1d_flush.rst b/Documentation/admin-guide/hw-vuln/l1d_flush.rst
new file mode 100644
index 000000000000..210020bc3f56
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/l1d_flush.rst
@@ -0,0 +1,69 @@
+L1D Flushing
+============
+
+With an increasing number of vulnerabilities being reported around data
+leaks from the Level 1 Data cache (L1D) the kernel provides an opt-in
+mechanism to flush the L1D cache on context switch.
+
+This mechanism can be used to address e.g. CVE-2020-0550. For applications
+the mechanism keeps them safe from vulnerabilities, related to leaks
+(snooping of) from the L1D cache.
+
+
+Related CVEs
+------------
+The following CVEs can be addressed by this
+mechanism
+
+ ============= ======================== ==================
+ CVE-2020-0550 Improper Data Forwarding OS related aspects
+ ============= ======================== ==================
+
+Usage Guidelines
+----------------
+
+Please see document: :ref:`Documentation/userspace-api/spec_ctrl.rst
+<set_spec_ctrl>` for details.
+
+**NOTE**: The feature is disabled by default, applications need to
+specifically opt into the feature to enable it.
+
+Mitigation
+----------
+
+When PR_SET_L1D_FLUSH is enabled for a task a flush of the L1D cache is
+performed when the task is scheduled out and the incoming task belongs to a
+different process and therefore to a different address space.
+
+If the underlying CPU supports L1D flushing in hardware, the hardware
+mechanism is used, software fallback for the mitigation, is not supported.
+
+Mitigation control on the kernel command line
+---------------------------------------------
+
+The kernel command line allows to control the L1D flush mitigations at boot
+time with the option "l1d_flush=". The valid arguments for this option are:
+
+ ============ =============================================================
+ on Enables the prctl interface, applications trying to use
+ the prctl() will fail with an error if l1d_flush is not
+ enabled
+ ============ =============================================================
+
+By default the mechanism is disabled.
+
+Limitations
+-----------
+
+The mechanism does not mitigate L1D data leaks between tasks belonging to
+different processes which are concurrently executing on sibling threads of
+a physical CPU core when SMT is enabled on the system.
+
+This can be addressed by controlled placement of processes on physical CPU
+cores or by disabling SMT. See the relevant chapter in the L1TF mitigation
+document: :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <smt_control>`.
+
+**NOTE** : The opt-in of a task for L1D flushing works only when the task's
+affinity is limited to cores running in non-SMT mode. If a task which
+requested L1D flushing is scheduled on a SMT-enabled core the kernel sends
+a SIGBUS to the task.
diff --git a/Documentation/admin-guide/hw-vuln/l1tf.rst b/Documentation/admin-guide/hw-vuln/l1tf.rst
index f83212fae4d5..3eeeb488d955 100644
--- a/Documentation/admin-guide/hw-vuln/l1tf.rst
+++ b/Documentation/admin-guide/hw-vuln/l1tf.rst
@@ -268,7 +268,7 @@ Guest mitigation mechanisms
/proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
available at:
- https://www.kernel.org/doc/Documentation/IRQ-affinity.txt
+ https://www.kernel.org/doc/Documentation/core-api/irq/irq-affinity.rst
.. _smt_control:
diff --git a/Documentation/admin-guide/hw-vuln/mds.rst b/Documentation/admin-guide/hw-vuln/mds.rst
index 2d19c9f4c1fe..48c7b0b72aed 100644
--- a/Documentation/admin-guide/hw-vuln/mds.rst
+++ b/Documentation/admin-guide/hw-vuln/mds.rst
@@ -58,14 +58,14 @@ Because the buffers are potentially shared between Hyper-Threads cross
Hyper-Thread attacks are possible.
Deeper technical information is available in the MDS specific x86
-architecture section: :ref:`Documentation/x86/mds.rst <mds>`.
+architecture section: :ref:`Documentation/arch/x86/mds.rst <mds>`.
Attack scenarios
----------------
-Attacks against the MDS vulnerabilities can be mounted from malicious non
-priviledged user space applications running on hosts or guest. Malicious
+Attacks against the MDS vulnerabilities can be mounted from malicious non-
+privileged user space applications running on hosts or guest. Malicious
guest OSes can obviously mount attacks as well.
Contrary to other speculation based vulnerabilities the MDS vulnerability
@@ -102,9 +102,19 @@ The possible values in this file are:
* - 'Vulnerable'
- The processor is vulnerable, but no mitigation enabled
* - 'Vulnerable: Clear CPU buffers attempted, no microcode'
- - The processor is vulnerable but microcode is not updated.
-
- The mitigation is enabled on a best effort basis. See :ref:`vmwerv`
+ - The processor is vulnerable but microcode is not updated. The
+ mitigation is enabled on a best effort basis.
+
+ If the processor is vulnerable but the availability of the microcode
+ based mitigation mechanism is not advertised via CPUID, the kernel
+ selects a best effort mitigation mode. This mode invokes the mitigation
+ instructions without a guarantee that they clear the CPU buffers.
+
+ This is done to address virtualization scenarios where the host has the
+ microcode update applied, but the hypervisor is not yet updated to
+ expose the CPUID to the guest. If the host has updated microcode the
+ protection takes effect; otherwise a few CPU cycles are wasted
+ pointlessly.
* - 'Mitigation: Clear CPU buffers'
- The processor is vulnerable and the CPU buffer clearing mitigation is
enabled.
@@ -119,24 +129,6 @@ to the above information:
'SMT Host state unknown' Kernel runs in a VM, Host SMT state unknown
======================== ============================================
-.. _vmwerv:
-
-Best effort mitigation mode
-^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
- If the processor is vulnerable, but the availability of the microcode based
- mitigation mechanism is not advertised via CPUID the kernel selects a best
- effort mitigation mode. This mode invokes the mitigation instructions
- without a guarantee that they clear the CPU buffers.
-
- This is done to address virtualization scenarios where the host has the
- microcode update applied, but the hypervisor is not yet updated to expose
- the CPUID to the guest. If the host has updated microcode the protection
- takes effect otherwise a few cpu cycles are wasted pointlessly.
-
- The state in the mds sysfs file reflects this situation accordingly.
-
-
Mitigation mechanism
-------------------------
diff --git a/Documentation/admin-guide/hw-vuln/multihit.rst b/Documentation/admin-guide/hw-vuln/multihit.rst
index ba9988d8bce5..140e4cec38c3 100644
--- a/Documentation/admin-guide/hw-vuln/multihit.rst
+++ b/Documentation/admin-guide/hw-vuln/multihit.rst
@@ -80,6 +80,10 @@ The possible values in this file are:
- The processor is not vulnerable.
* - KVM: Mitigation: Split huge pages
- Software changes mitigate this issue.
+ * - KVM: Mitigation: VMX unsupported
+ - KVM is not vulnerable because Virtual Machine Extensions (VMX) is not supported.
+ * - KVM: Mitigation: VMX disabled
+ - KVM is not vulnerable because Virtual Machine Extensions (VMX) is disabled.
* - KVM: Vulnerable
- The processor is vulnerable, but no mitigation enabled
diff --git a/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst b/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
new file mode 100644
index 000000000000..1302fd1b55e8
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
@@ -0,0 +1,271 @@
+=========================================
+Processor MMIO Stale Data Vulnerabilities
+=========================================
+
+Processor MMIO Stale Data Vulnerabilities are a class of memory-mapped I/O
+(MMIO) vulnerabilities that can expose data. The sequences of operations for
+exposing data range from simple to very complex. Because most of the
+vulnerabilities require the attacker to have access to MMIO, many environments
+are not affected. System environments using virtualization where MMIO access is
+provided to untrusted guests may need mitigation. These vulnerabilities are
+not transient execution attacks. However, these vulnerabilities may propagate
+stale data into core fill buffers where the data can subsequently be inferred
+by an unmitigated transient execution attack. Mitigation for these
+vulnerabilities includes a combination of microcode update and software
+changes, depending on the platform and usage model. Some of these mitigations
+are similar to those used to mitigate Microarchitectural Data Sampling (MDS) or
+those used to mitigate Special Register Buffer Data Sampling (SRBDS).
+
+Data Propagators
+================
+Propagators are operations that result in stale data being copied or moved from
+one microarchitectural buffer or register to another. Processor MMIO Stale Data
+Vulnerabilities are operations that may result in stale data being directly
+read into an architectural, software-visible state or sampled from a buffer or
+register.
+
+Fill Buffer Stale Data Propagator (FBSDP)
+-----------------------------------------
+Stale data may propagate from fill buffers (FB) into the non-coherent portion
+of the uncore on some non-coherent writes. Fill buffer propagation by itself
+does not make stale data architecturally visible. Stale data must be propagated
+to a location where it is subject to reading or sampling.
+
+Sideband Stale Data Propagator (SSDP)
+-------------------------------------
+The sideband stale data propagator (SSDP) is limited to the client (including
+Intel Xeon server E3) uncore implementation. The sideband response buffer is
+shared by all client cores. For non-coherent reads that go to sideband
+destinations, the uncore logic returns 64 bytes of data to the core, including
+both requested data and unrequested stale data, from a transaction buffer and
+the sideband response buffer. As a result, stale data from the sideband
+response and transaction buffers may now reside in a core fill buffer.
+
+Primary Stale Data Propagator (PSDP)
+------------------------------------
+The primary stale data propagator (PSDP) is limited to the client (including
+Intel Xeon server E3) uncore implementation. Similar to the sideband response
+buffer, the primary response buffer is shared by all client cores. For some
+processors, MMIO primary reads will return 64 bytes of data to the core fill
+buffer including both requested data and unrequested stale data. This is
+similar to the sideband stale data propagator.
+
+Vulnerabilities
+===============
+Device Register Partial Write (DRPW) (CVE-2022-21166)
+-----------------------------------------------------
+Some endpoint MMIO registers incorrectly handle writes that are smaller than
+the register size. Instead of aborting the write or only copying the correct
+subset of bytes (for example, 2 bytes for a 2-byte write), more bytes than
+specified by the write transaction may be written to the register. On
+processors affected by FBSDP, this may expose stale data from the fill buffers
+of the core that created the write transaction.
+
+Shared Buffers Data Sampling (SBDS) (CVE-2022-21125)
+----------------------------------------------------
+After propagators may have moved data around the uncore and copied stale data
+into client core fill buffers, processors affected by MFBDS can leak data from
+the fill buffer. It is limited to the client (including Intel Xeon server E3)
+uncore implementation.
+
+Shared Buffers Data Read (SBDR) (CVE-2022-21123)
+------------------------------------------------
+It is similar to Shared Buffer Data Sampling (SBDS) except that the data is
+directly read into the architectural software-visible state. It is limited to
+the client (including Intel Xeon server E3) uncore implementation.
+
+Affected Processors
+===================
+Not all the CPUs are affected by all the variants. For instance, most
+processors for the server market (excluding Intel Xeon E3 processors) are
+impacted by only Device Register Partial Write (DRPW).
+
+Below is the list of affected Intel processors [#f1]_:
+
+ =================== ============ =========
+ Common name Family_Model Steppings
+ =================== ============ =========
+ HASWELL_X 06_3FH 2,4
+ SKYLAKE_L 06_4EH 3
+ BROADWELL_X 06_4FH All
+ SKYLAKE_X 06_55H 3,4,6,7,11
+ BROADWELL_D 06_56H 3,4,5
+ SKYLAKE 06_5EH 3
+ ICELAKE_X 06_6AH 4,5,6
+ ICELAKE_D 06_6CH 1
+ ICELAKE_L 06_7EH 5
+ ATOM_TREMONT_D 06_86H All
+ LAKEFIELD 06_8AH 1
+ KABYLAKE_L 06_8EH 9 to 12
+ ATOM_TREMONT 06_96H 1
+ ATOM_TREMONT_L 06_9CH 0
+ KABYLAKE 06_9EH 9 to 13
+ COMETLAKE 06_A5H 2,3,5
+ COMETLAKE_L 06_A6H 0,1
+ ROCKETLAKE 06_A7H 1
+ =================== ============ =========
+
+If a CPU is in the affected processor list, but not affected by a variant, it
+is indicated by new bits in MSR IA32_ARCH_CAPABILITIES. As described in a later
+section, mitigation largely remains the same for all the variants, i.e. to
+clear the CPU fill buffers via VERW instruction.
+
+New bits in MSRs
+================
+Newer processors and microcode update on existing affected processors added new
+bits to IA32_ARCH_CAPABILITIES MSR. These bits can be used to enumerate
+specific variants of Processor MMIO Stale Data vulnerabilities and mitigation
+capability.
+
+MSR IA32_ARCH_CAPABILITIES
+--------------------------
+Bit 13 - SBDR_SSDP_NO - When set, processor is not affected by either the
+ Shared Buffers Data Read (SBDR) vulnerability or the sideband stale
+ data propagator (SSDP).
+Bit 14 - FBSDP_NO - When set, processor is not affected by the Fill Buffer
+ Stale Data Propagator (FBSDP).
+Bit 15 - PSDP_NO - When set, processor is not affected by Primary Stale Data
+ Propagator (PSDP).
+Bit 17 - FB_CLEAR - When set, VERW instruction will overwrite CPU fill buffer
+ values as part of MD_CLEAR operations. Processors that do not
+ enumerate MDS_NO (meaning they are affected by MDS) but that do
+ enumerate support for both L1D_FLUSH and MD_CLEAR implicitly enumerate
+ FB_CLEAR as part of their MD_CLEAR support.
+Bit 18 - FB_CLEAR_CTRL - Processor supports read and write to MSR
+ IA32_MCU_OPT_CTRL[FB_CLEAR_DIS]. On such processors, the FB_CLEAR_DIS
+ bit can be set to cause the VERW instruction to not perform the
+ FB_CLEAR action. Not all processors that support FB_CLEAR will support
+ FB_CLEAR_CTRL.
+
+MSR IA32_MCU_OPT_CTRL
+---------------------
+Bit 3 - FB_CLEAR_DIS - When set, VERW instruction does not perform the FB_CLEAR
+action. This may be useful to reduce the performance impact of FB_CLEAR in
+cases where system software deems it warranted (for example, when performance
+is more critical, or the untrusted software has no MMIO access). Note that
+FB_CLEAR_DIS has no impact on enumeration (for example, it does not change
+FB_CLEAR or MD_CLEAR enumeration) and it may not be supported on all processors
+that enumerate FB_CLEAR.
+
+Mitigation
+==========
+Like MDS, all variants of Processor MMIO Stale Data vulnerabilities have the
+same mitigation strategy to force the CPU to clear the affected buffers before
+an attacker can extract the secrets.
+
+This is achieved by using the otherwise unused and obsolete VERW instruction in
+combination with a microcode update. The microcode clears the affected CPU
+buffers when the VERW instruction is executed.
+
+Kernel reuses the MDS function to invoke the buffer clearing:
+
+ mds_clear_cpu_buffers()
+
+On MDS affected CPUs, the kernel already invokes CPU buffer clear on
+kernel/userspace, hypervisor/guest and C-state (idle) transitions. No
+additional mitigation is needed on such CPUs.
+
+For CPUs not affected by MDS or TAA, mitigation is needed only for the attacker
+with MMIO capability. Therefore, VERW is not required for kernel/userspace. For
+virtualization case, VERW is only needed at VMENTER for a guest with MMIO
+capability.
+
+Mitigation points
+-----------------
+Return to user space
+^^^^^^^^^^^^^^^^^^^^
+Same mitigation as MDS when affected by MDS/TAA, otherwise no mitigation
+needed.
+
+C-State transition
+^^^^^^^^^^^^^^^^^^
+Control register writes by CPU during C-state transition can propagate data
+from fill buffer to uncore buffers. Execute VERW before C-state transition to
+clear CPU fill buffers.
+
+Guest entry point
+^^^^^^^^^^^^^^^^^
+Same mitigation as MDS when processor is also affected by MDS/TAA, otherwise
+execute VERW at VMENTER only for MMIO capable guests. On CPUs not affected by
+MDS/TAA, guest without MMIO access cannot extract secrets using Processor MMIO
+Stale Data vulnerabilities, so there is no need to execute VERW for such guests.
+
+Mitigation control on the kernel command line
+---------------------------------------------
+The kernel command line allows to control the Processor MMIO Stale Data
+mitigations at boot time with the option "mmio_stale_data=". The valid
+arguments for this option are:
+
+ ========== =================================================================
+ full If the CPU is vulnerable, enable mitigation; CPU buffer clearing
+ on exit to userspace and when entering a VM. Idle transitions are
+ protected as well. It does not automatically disable SMT.
+ full,nosmt Same as full, with SMT disabled on vulnerable CPUs. This is the
+ complete mitigation.
+ off Disables mitigation completely.
+ ========== =================================================================
+
+If the CPU is affected and mmio_stale_data=off is not supplied on the kernel
+command line, then the kernel selects the appropriate mitigation.
+
+Mitigation status information
+-----------------------------
+The Linux kernel provides a sysfs interface to enumerate the current
+vulnerability status of the system: whether the system is vulnerable, and
+which mitigations are active. The relevant sysfs file is:
+
+ /sys/devices/system/cpu/vulnerabilities/mmio_stale_data
+
+The possible values in this file are:
+
+ .. list-table::
+
+ * - 'Not affected'
+ - The processor is not vulnerable
+ * - 'Vulnerable'
+ - The processor is vulnerable, but no mitigation enabled
+ * - 'Vulnerable: Clear CPU buffers attempted, no microcode'
+ - The processor is vulnerable but microcode is not updated. The
+ mitigation is enabled on a best effort basis.
+
+ If the processor is vulnerable but the availability of the microcode
+ based mitigation mechanism is not advertised via CPUID, the kernel
+ selects a best effort mitigation mode. This mode invokes the mitigation
+ instructions without a guarantee that they clear the CPU buffers.
+
+ This is done to address virtualization scenarios where the host has the
+ microcode update applied, but the hypervisor is not yet updated to
+ expose the CPUID to the guest. If the host has updated microcode the
+ protection takes effect; otherwise a few CPU cycles are wasted
+ pointlessly.
+ * - 'Mitigation: Clear CPU buffers'
+ - The processor is vulnerable and the CPU buffer clearing mitigation is
+ enabled.
+ * - 'Unknown: No mitigations'
+ - The processor vulnerability status is unknown because it is
+ out of Servicing period. Mitigation is not attempted.
+
+Definitions:
+------------
+
+Servicing period: The process of providing functional and security updates to
+Intel processors or platforms, utilizing the Intel Platform Update (IPU)
+process or other similar mechanisms.
+
+End of Servicing Updates (ESU): ESU is the date at which Intel will no
+longer provide Servicing, such as through IPU or other similar update
+processes. ESU dates will typically be aligned to end of quarter.
+
+If the processor is vulnerable then the following information is appended to
+the above information:
+
+ ======================== ===========================================
+ 'SMT vulnerable' SMT is enabled
+ 'SMT disabled' SMT is disabled
+ 'SMT Host state unknown' Kernel runs in a VM, Host SMT state unknown
+ ======================== ===========================================
+
+References
+----------
+.. [#f1] Affected Processors
+ https://www.intel.com/content/www/us/en/developer/topic-technology/software-security-guidance/processors-affected-consolidated-product-cpu-model.html
diff --git a/Documentation/admin-guide/hw-vuln/special-register-buffer-data-sampling.rst b/Documentation/admin-guide/hw-vuln/special-register-buffer-data-sampling.rst
new file mode 100644
index 000000000000..966c9b3296ea
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/special-register-buffer-data-sampling.rst
@@ -0,0 +1,150 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+SRBDS - Special Register Buffer Data Sampling
+=============================================
+
+SRBDS is a hardware vulnerability that allows MDS
+Documentation/admin-guide/hw-vuln/mds.rst techniques to
+infer values returned from special register accesses. Special register
+accesses are accesses to off core registers. According to Intel's evaluation,
+the special register reads that have a security expectation of privacy are
+RDRAND, RDSEED and SGX EGETKEY.
+
+When RDRAND, RDSEED and EGETKEY instructions are used, the data is moved
+to the core through the special register mechanism that is susceptible
+to MDS attacks.
+
+Affected processors
+-------------------
+Core models (desktop, mobile, Xeon-E3) that implement RDRAND and/or RDSEED may
+be affected.
+
+A processor is affected by SRBDS if its Family_Model and stepping is
+in the following list, with the exception of the listed processors
+exporting MDS_NO while Intel TSX is available yet not enabled. The
+latter class of processors are only affected when Intel TSX is enabled
+by software using TSX_CTRL_MSR otherwise they are not affected.
+
+ ============= ============ ========
+ common name Family_Model Stepping
+ ============= ============ ========
+ IvyBridge 06_3AH All
+
+ Haswell 06_3CH All
+ Haswell_L 06_45H All
+ Haswell_G 06_46H All
+
+ Broadwell_G 06_47H All
+ Broadwell 06_3DH All
+
+ Skylake_L 06_4EH All
+ Skylake 06_5EH All
+
+ Kabylake_L 06_8EH <= 0xC
+ Kabylake 06_9EH <= 0xD
+ ============= ============ ========
+
+Related CVEs
+------------
+
+The following CVE entry is related to this SRBDS issue:
+
+ ============== ===== =====================================
+ CVE-2020-0543 SRBDS Special Register Buffer Data Sampling
+ ============== ===== =====================================
+
+Attack scenarios
+----------------
+An unprivileged user can extract values returned from RDRAND and RDSEED
+executed on another core or sibling thread using MDS techniques.
+
+
+Mitigation mechanism
+--------------------
+Intel will release microcode updates that modify the RDRAND, RDSEED, and
+EGETKEY instructions to overwrite secret special register data in the shared
+staging buffer before the secret data can be accessed by another logical
+processor.
+
+During execution of the RDRAND, RDSEED, or EGETKEY instructions, off-core
+accesses from other logical processors will be delayed until the special
+register read is complete and the secret data in the shared staging buffer is
+overwritten.
+
+This has three effects on performance:
+
+#. RDRAND, RDSEED, or EGETKEY instructions have higher latency.
+
+#. Executing RDRAND at the same time on multiple logical processors will be
+ serialized, resulting in an overall reduction in the maximum RDRAND
+ bandwidth.
+
+#. Executing RDRAND, RDSEED or EGETKEY will delay memory accesses from other
+ logical processors that miss their core caches, with an impact similar to
+ legacy locked cache-line-split accesses.
+
+The microcode updates provide an opt-out mechanism (RNGDS_MITG_DIS) to disable
+the mitigation for RDRAND and RDSEED instructions executed outside of Intel
+Software Guard Extensions (Intel SGX) enclaves. On logical processors that
+disable the mitigation using this opt-out mechanism, RDRAND and RDSEED do not
+take longer to execute and do not impact performance of sibling logical
+processors memory accesses. The opt-out mechanism does not affect Intel SGX
+enclaves (including execution of RDRAND or RDSEED inside an enclave, as well
+as EGETKEY execution).
+
+IA32_MCU_OPT_CTRL MSR Definition
+--------------------------------
+Along with the mitigation for this issue, Intel added a new thread-scope
+IA32_MCU_OPT_CTRL MSR, (address 0x123). The presence of this MSR and
+RNGDS_MITG_DIS (bit 0) is enumerated by CPUID.(EAX=07H,ECX=0).EDX[SRBDS_CTRL =
+9]==1. This MSR is introduced through the microcode update.
+
+Setting IA32_MCU_OPT_CTRL[0] (RNGDS_MITG_DIS) to 1 for a logical processor
+disables the mitigation for RDRAND and RDSEED executed outside of an Intel SGX
+enclave on that logical processor. Opting out of the mitigation for a
+particular logical processor does not affect the RDRAND and RDSEED mitigations
+for other logical processors.
+
+Note that inside of an Intel SGX enclave, the mitigation is applied regardless
+of the value of RNGDS_MITG_DS.
+
+Mitigation control on the kernel command line
+---------------------------------------------
+The kernel command line allows control over the SRBDS mitigation at boot time
+with the option "srbds=". The option for this is:
+
+ ============= =============================================================
+ off This option disables SRBDS mitigation for RDRAND and RDSEED on
+ affected platforms.
+ ============= =============================================================
+
+SRBDS System Information
+------------------------
+The Linux kernel provides vulnerability status information through sysfs. For
+SRBDS this can be accessed by the following sysfs file:
+/sys/devices/system/cpu/vulnerabilities/srbds
+
+The possible values contained in this file are:
+
+ ============================== =============================================
+ Not affected Processor not vulnerable
+ Vulnerable Processor vulnerable and mitigation disabled
+ Vulnerable: No microcode Processor vulnerable and microcode is missing
+ mitigation
+ Mitigation: Microcode Processor is vulnerable and mitigation is in
+ effect.
+ Mitigation: TSX disabled Processor is only vulnerable when TSX is
+ enabled while this system was booted with TSX
+ disabled.
+ Unknown: Dependent on
+ hypervisor status Running on virtual guest processor that is
+ affected but with no way to know if host
+ processor is mitigated or vulnerable.
+ ============================== =============================================
+
+SRBDS Default mitigation
+------------------------
+This new microcode serializes processor access during execution of RDRAND,
+RDSEED ensures that the shared buffer is overwritten before it is released for
+reuse. Use the "srbds=off" kernel command line to disable the mitigation for
+RDRAND and RDSEED.
diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index e05e581af5cf..32a8893e5617 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -60,8 +60,8 @@ privileged data touched during the speculative execution.
Spectre variant 1 attacks take advantage of speculative execution of
conditional branches, while Spectre variant 2 attacks use speculative
execution of indirect branches to leak privileged memory.
-See :ref:`[1] <spec_ref1>` :ref:`[5] <spec_ref5>` :ref:`[7] <spec_ref7>`
-:ref:`[10] <spec_ref10>` :ref:`[11] <spec_ref11>`.
+See :ref:`[1] <spec_ref1>` :ref:`[5] <spec_ref5>` :ref:`[6] <spec_ref6>`
+:ref:`[7] <spec_ref7>` :ref:`[10] <spec_ref10>` :ref:`[11] <spec_ref11>`.
Spectre variant 1 (Bounds Check Bypass)
---------------------------------------
@@ -131,6 +131,19 @@ steer its indirect branch speculations to gadget code, and measure the
speculative execution's side effects left in level 1 cache to infer the
victim's data.
+Yet another variant 2 attack vector is for the attacker to poison the
+Branch History Buffer (BHB) to speculatively steer an indirect branch
+to a specific Branch Target Buffer (BTB) entry, even if the entry isn't
+associated with the source address of the indirect branch. Specifically,
+the BHB might be shared across privilege levels even in the presence of
+Enhanced IBRS.
+
+Currently the only known real-world BHB attack vector is via
+unprivileged eBPF. Therefore, it's highly recommended to not enable
+unprivileged eBPF, especially when eIBRS is used (without retpolines).
+For a full mitigation against BHB attacks, it's recommended to use
+retpolines (or eIBRS combined with retpolines).
+
Attack scenarios
----------------
@@ -364,13 +377,15 @@ The possible values in this file are:
- Kernel status:
- ==================================== =================================
- 'Not affected' The processor is not vulnerable
- 'Vulnerable' Vulnerable, no mitigation
- 'Mitigation: Full generic retpoline' Software-focused mitigation
- 'Mitigation: Full AMD retpoline' AMD-specific software mitigation
- 'Mitigation: Enhanced IBRS' Hardware-focused mitigation
- ==================================== =================================
+ ======================================== =================================
+ 'Not affected' The processor is not vulnerable
+ 'Mitigation: None' Vulnerable, no mitigation
+ 'Mitigation: Retpolines' Use Retpoline thunks
+ 'Mitigation: LFENCE' Use LFENCE instructions
+ 'Mitigation: Enhanced IBRS' Hardware-focused mitigation
+ 'Mitigation: Enhanced IBRS + Retpolines' Hardware-focused + Retpolines
+ 'Mitigation: Enhanced IBRS + LFENCE' Hardware-focused + LFENCE
+ ======================================== =================================
- Firmware status: Show if Indirect Branch Restricted Speculation (IBRS) is
used to protect against Spectre variant 2 attacks when calling firmware (x86 only).
@@ -407,6 +422,14 @@ The possible values in this file are:
'RSB filling' Protection of RSB on context switch enabled
============= ===========================================
+ - EIBRS Post-barrier Return Stack Buffer (PBRSB) protection status:
+
+ =========================== =======================================================
+ 'PBRSB-eIBRS: SW sequence' CPU is affected and protection of RSB on VMEXIT enabled
+ 'PBRSB-eIBRS: Vulnerable' CPU is vulnerable
+ 'PBRSB-eIBRS: Not affected' CPU is not affected by PBRSB
+ =========================== =======================================================
+
Full mitigation might require a microcode update from the CPU
vendor. When the necessary microcode is not available, the kernel will
report vulnerability.
@@ -456,8 +479,19 @@ Spectre variant 2
On Intel Skylake-era systems the mitigation covers most, but not all,
cases. See :ref:`[3] <spec_ref3>` for more details.
- On CPUs with hardware mitigation for Spectre variant 2 (e.g. Enhanced
- IBRS on x86), retpoline is automatically disabled at run time.
+ On CPUs with hardware mitigation for Spectre variant 2 (e.g. IBRS
+ or enhanced IBRS on x86), retpoline is automatically disabled at run time.
+
+ Systems which support enhanced IBRS (eIBRS) enable IBRS protection once at
+ boot, by setting the IBRS bit, and they're automatically protected against
+ Spectre v2 variant attacks.
+
+ On Intel's enhanced IBRS systems, this includes cross-thread branch target
+ injections on SMT systems (STIBP). In other words, Intel eIBRS enables
+ STIBP, too.
+
+ AMD Automatic IBRS does not protect userspace, and Legacy IBRS systems clear
+ the IBRS bit on exit to userspace, therefore both explicitly enable STIBP.
The retpoline mitigation is turned on by default on vulnerable
CPUs. It can be forced on or off by the administrator
@@ -468,7 +502,7 @@ Spectre variant 2
before invoking any firmware code to prevent Spectre variant 2 exploits
using the firmware.
- Using kernel address space randomization (CONFIG_RANDOMIZE_SLAB=y
+ Using kernel address space randomization (CONFIG_RANDOMIZE_BASE=y
and CONFIG_SLAB_FREELIST_RANDOM=y in the kernel configuration) makes
attacks on the kernel generally more difficult.
@@ -481,18 +515,20 @@ Spectre variant 2
For Spectre variant 2 mitigation, individual user programs
can be compiled with return trampolines for indirect branches.
This protects them from consuming poisoned entries in the branch
- target buffer left by malicious software. Alternatively, the
- programs can disable their indirect branch speculation via prctl()
- (See :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`).
+ target buffer left by malicious software.
+
+ On legacy IBRS systems, at return to userspace, implicit STIBP is disabled
+ because the kernel clears the IBRS bit. In this case, the userspace programs
+ can disable indirect branch speculation via prctl() (See
+ :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`).
On x86, this will turn on STIBP to guard against attacks from the
sibling thread when the user program is running, and use IBPB to
flush the branch target buffer when switching to/from the program.
Restricting indirect branch speculation on a user program will
also prevent the program from launching a variant 2 attack
- on x86. All sand-boxed SECCOMP programs have indirect branch
- speculation restricted by default. Administrators can change
- that behavior via the kernel command line and sysfs control files.
+ on x86. Administrators can change that behavior via the kernel
+ command line and sysfs control files.
See :ref:`spectre_mitigation_control_command_line`.
Programs that disable their indirect branch speculation will have
@@ -584,71 +620,26 @@ kernel command line.
Specific mitigations can also be selected manually:
- retpoline
- replace indirect branches
- retpoline,generic
- google's original retpoline
- retpoline,amd
- AMD-specific minimal thunk
+ retpoline auto pick between generic,lfence
+ retpoline,generic Retpolines
+ retpoline,lfence LFENCE; indirect branch
+ retpoline,amd alias for retpoline,lfence
+ eibrs Enhanced/Auto IBRS
+ eibrs,retpoline Enhanced/Auto IBRS + Retpolines
+ eibrs,lfence Enhanced/Auto IBRS + LFENCE
+ ibrs use IBRS to protect kernel
Not specifying this option is equivalent to
spectre_v2=auto.
-For user space mitigation:
-
- spectre_v2_user=
-
- [X86] Control mitigation of Spectre variant 2
- (indirect branch speculation) vulnerability between
- user space tasks
-
- on
- Unconditionally enable mitigations. Is
- enforced by spectre_v2=on
-
- off
- Unconditionally disable mitigations. Is
- enforced by spectre_v2=off
-
- prctl
- Indirect branch speculation is enabled,
- but mitigation can be enabled via prctl
- per thread. The mitigation control state
- is inherited on fork.
-
- prctl,ibpb
- Like "prctl" above, but only STIBP is
- controlled per thread. IBPB is issued
- always when switching between different user
- space processes.
-
- seccomp
- Same as "prctl" above, but all seccomp
- threads will enable the mitigation unless
- they explicitly opt out.
-
- seccomp,ibpb
- Like "seccomp" above, but only STIBP is
- controlled per thread. IBPB is issued
- always when switching between different
- user space processes.
-
- auto
- Kernel selects the mitigation depending on
- the available CPU features and vulnerability.
-
- Default mitigation:
- If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl"
-
- Not specifying this option is equivalent to
- spectre_v2_user=auto.
-
In general the kernel by default selects
reasonable mitigations for the current CPU. To
disable Spectre variant 2 mitigations, boot with
spectre_v2=off. Spectre variant 1 mitigations
cannot be disabled.
+For spectre_v2_user see Documentation/admin-guide/kernel-parameters.txt
+
Mitigation selection guide
--------------------------
@@ -674,9 +665,8 @@ Mitigation selection guide
off by disabling their indirect branch speculation when they are run
(See :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`).
This prevents untrusted programs from polluting the branch target
- buffer. All programs running in SECCOMP sandboxes have indirect
- branch speculation restricted by default. This behavior can be
- changed via the kernel command line and sysfs control files. See
+ buffer. This behavior can be changed via the kernel command line
+ and sysfs control files. See
:ref:`spectre_mitigation_control_command_line`.
3. High security mode
@@ -730,7 +720,7 @@ AMD white papers:
.. _spec_ref6:
-[6] `Software techniques for managing speculation on AMD processors <https://developer.amd.com/wp-content/resources/90343-B_SoftwareTechniquesforManagingSpeculation_WP_7-18Update_FNL.pdf>`_.
+[6] `Software techniques for managing speculation on AMD processors <https://developer.amd.com/wp-content/resources/Managing-Speculation-on-AMD-Processors.pdf>`_.
ARM white papers:
diff --git a/Documentation/admin-guide/hw-vuln/srso.rst b/Documentation/admin-guide/hw-vuln/srso.rst
new file mode 100644
index 000000000000..e715bfc09879
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/srso.rst
@@ -0,0 +1,160 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Speculative Return Stack Overflow (SRSO)
+========================================
+
+This is a mitigation for the speculative return stack overflow (SRSO)
+vulnerability found on AMD processors. The mechanism is by now the well
+known scenario of poisoning CPU functional units - the Branch Target
+Buffer (BTB) and Return Address Predictor (RAP) in this case - and then
+tricking the elevated privilege domain (the kernel) into leaking
+sensitive data.
+
+AMD CPUs predict RET instructions using a Return Address Predictor (aka
+Return Address Stack/Return Stack Buffer). In some cases, a non-architectural
+CALL instruction (i.e., an instruction predicted to be a CALL but is
+not actually a CALL) can create an entry in the RAP which may be used
+to predict the target of a subsequent RET instruction.
+
+The specific circumstances that lead to this varies by microarchitecture
+but the concern is that an attacker can mis-train the CPU BTB to predict
+non-architectural CALL instructions in kernel space and use this to
+control the speculative target of a subsequent kernel RET, potentially
+leading to information disclosure via a speculative side-channel.
+
+The issue is tracked under CVE-2023-20569.
+
+Affected processors
+-------------------
+
+AMD Zen, generations 1-4. That is, all families 0x17 and 0x19. Older
+processors have not been investigated.
+
+System information and options
+------------------------------
+
+First of all, it is required that the latest microcode be loaded for
+mitigations to be effective.
+
+The sysfs file showing SRSO mitigation status is:
+
+ /sys/devices/system/cpu/vulnerabilities/spec_rstack_overflow
+
+The possible values in this file are:
+
+ * 'Not affected':
+
+ The processor is not vulnerable
+
+* 'Vulnerable':
+
+ The processor is vulnerable and no mitigations have been applied.
+
+ * 'Vulnerable: No microcode':
+
+ The processor is vulnerable, no microcode extending IBPB
+ functionality to address the vulnerability has been applied.
+
+ * 'Vulnerable: Safe RET, no microcode':
+
+ The "Safe RET" mitigation (see below) has been applied to protect the
+ kernel, but the IBPB-extending microcode has not been applied. User
+ space tasks may still be vulnerable.
+
+ * 'Vulnerable: Microcode, no safe RET':
+
+ Extended IBPB functionality microcode patch has been applied. It does
+ not address User->Kernel and Guest->Host transitions protection but it
+ does address User->User and VM->VM attack vectors.
+
+ Note that User->User mitigation is controlled by how the IBPB aspect in
+ the Spectre v2 mitigation is selected:
+
+ * conditional IBPB:
+
+ where each process can select whether it needs an IBPB issued
+ around it PR_SPEC_DISABLE/_ENABLE etc, see :doc:`spectre`
+
+ * strict:
+
+ i.e., always on - by supplying spectre_v2_user=on on the kernel
+ command line
+
+ (spec_rstack_overflow=microcode)
+
+ * 'Mitigation: Safe RET':
+
+ Combined microcode/software mitigation. It complements the
+ extended IBPB microcode patch functionality by addressing
+ User->Kernel and Guest->Host transitions protection.
+
+ Selected by default or by spec_rstack_overflow=safe-ret
+
+ * 'Mitigation: IBPB':
+
+ Similar protection as "safe RET" above but employs an IBPB barrier on
+ privilege domain crossings (User->Kernel, Guest->Host).
+
+ (spec_rstack_overflow=ibpb)
+
+ * 'Mitigation: IBPB on VMEXIT':
+
+ Mitigation addressing the cloud provider scenario - the Guest->Host
+ transitions only.
+
+ (spec_rstack_overflow=ibpb-vmexit)
+
+
+
+In order to exploit vulnerability, an attacker needs to:
+
+ - gain local access on the machine
+
+ - break kASLR
+
+ - find gadgets in the running kernel in order to use them in the exploit
+
+ - potentially create and pin an additional workload on the sibling
+ thread, depending on the microarchitecture (not necessary on fam 0x19)
+
+ - run the exploit
+
+Considering the performance implications of each mitigation type, the
+default one is 'Mitigation: safe RET' which should take care of most
+attack vectors, including the local User->Kernel one.
+
+As always, the user is advised to keep her/his system up-to-date by
+applying software updates regularly.
+
+The default setting will be reevaluated when needed and especially when
+new attack vectors appear.
+
+As one can surmise, 'Mitigation: safe RET' does come at the cost of some
+performance depending on the workload. If one trusts her/his userspace
+and does not want to suffer the performance impact, one can always
+disable the mitigation with spec_rstack_overflow=off.
+
+Similarly, 'Mitigation: IBPB' is another full mitigation type employing
+an indrect branch prediction barrier after having applied the required
+microcode patch for one's system. This mitigation comes also at
+a performance cost.
+
+Mitigation: Safe RET
+--------------------
+
+The mitigation works by ensuring all RET instructions speculate to
+a controlled location, similar to how speculation is controlled in the
+retpoline sequence. To accomplish this, the __x86_return_thunk forces
+the CPU to mispredict every function return using a 'safe return'
+sequence.
+
+To ensure the safety of this mitigation, the kernel must ensure that the
+safe return sequence is itself free from attacker interference. In Zen3
+and Zen4, this is accomplished by creating a BTB alias between the
+untraining function srso_alias_untrain_ret() and the safe return
+function srso_alias_safe_ret() which results in evicting a potentially
+poisoned BTB entry and using that safe one for all function returns.
+
+In older Zen1 and Zen2, this is accomplished using a reinterpretation
+technique similar to Retbleed one: srso_untrain_ret() and
+srso_safe_ret().
diff --git a/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst b/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst
index 68d96f0e9c95..444f84e22a91 100644
--- a/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst
+++ b/Documentation/admin-guide/hw-vuln/tsx_async_abort.rst
@@ -60,10 +60,10 @@ Hyper-Thread attacks are possible.
The victim of a malicious actor does not need to make use of TSX. Only the
attacker needs to begin a TSX transaction and raise an asynchronous abort
-which in turn potenitally leaks data stored in the buffers.
+which in turn potentially leaks data stored in the buffers.
More detailed technical information is available in the TAA specific x86
-architecture section: :ref:`Documentation/x86/tsx_async_abort.rst <tsx_async_abort>`.
+architecture section: :ref:`Documentation/arch/x86/tsx_async_abort.rst <tsx_async_abort>`.
Attack scenarios
@@ -98,7 +98,19 @@ The possible values in this file are:
* - 'Vulnerable'
- The CPU is affected by this vulnerability and the microcode and kernel mitigation are not applied.
* - 'Vulnerable: Clear CPU buffers attempted, no microcode'
- - The system tries to clear the buffers but the microcode might not support the operation.
+ - The processor is vulnerable but microcode is not updated. The
+ mitigation is enabled on a best effort basis.
+
+ If the processor is vulnerable but the availability of the microcode
+ based mitigation mechanism is not advertised via CPUID, the kernel
+ selects a best effort mitigation mode. This mode invokes the mitigation
+ instructions without a guarantee that they clear the CPU buffers.
+
+ This is done to address virtualization scenarios where the host has the
+ microcode update applied, but the hypervisor is not yet updated to
+ expose the CPUID to the guest. If the host has updated microcode the
+ protection takes effect; otherwise a few CPU cycles are wasted
+ pointlessly.
* - 'Mitigation: Clear CPU buffers'
- The microcode has been updated to clear the buffers. TSX is still enabled.
* - 'Mitigation: TSX disabled'
@@ -106,25 +118,6 @@ The possible values in this file are:
* - 'Not affected'
- The CPU is not affected by this issue.
-.. _ucode_needed:
-
-Best effort mitigation mode
-^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-If the processor is vulnerable, but the availability of the microcode-based
-mitigation mechanism is not advertised via CPUID the kernel selects a best
-effort mitigation mode. This mode invokes the mitigation instructions
-without a guarantee that they clear the CPU buffers.
-
-This is done to address virtualization scenarios where the host has the
-microcode update applied, but the hypervisor is not yet updated to expose the
-CPUID to the guest. If the host has updated microcode the protection takes
-effect; otherwise a few CPU cycles are wasted pointlessly.
-
-The state in the tsx_async_abort sysfs file reflects this situation
-accordingly.
-
-
Mitigation mechanism
--------------------
diff --git a/Documentation/admin-guide/hw_random.rst b/Documentation/admin-guide/hw_random.rst
index 121de96e395e..bfc39f1cf470 100644
--- a/Documentation/admin-guide/hw_random.rst
+++ b/Documentation/admin-guide/hw_random.rst
@@ -1,6 +1,6 @@
-==========================================================
-Linux support for random number generator in i8xx chipsets
-==========================================================
+=================================
+Hardware random number generators
+=================================
Introduction
============
@@ -14,10 +14,9 @@ into that core.
To make the most effective use of these mechanisms, you
should download the support software as well. Download the
-latest version of the "rng-tools" package from the
-hw_random driver's official Web site:
+latest version of the "rng-tools" package from:
- http://sourceforge.net/projects/gkernel/
+ https://github.com/nhorman/rng-tools
Those tools use /dev/hwrng to fill the kernel entropy pool,
which is used internally and exported by the /dev/urandom and
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 5a6269fb8593..fb40a1f6f79e 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -18,6 +18,9 @@ etc.
devices
sysctl/index
+ abi
+ features
+
This section describes CPU vulnerabilities and their mitigations.
.. toctree::
@@ -31,8 +34,9 @@ problems and bugs in particular.
.. toctree::
:maxdepth: 1
- reporting-bugs
- security-bugs
+ reporting-issues
+ reporting-regressions
+ quickly-build-trimmed-linux
bug-hunting
bug-bisect
tainted-kernels
@@ -41,6 +45,7 @@ problems and bugs in particular.
init
kdump/index
perf/index
+ pstore-blk
This is the beginning of a section with information of interest to
application developers. Documents covering various aspects of the kernel
@@ -51,6 +56,17 @@ ABI will be found here.
sysfs-rules
+This is the beginning of a section with information of interest to
+application developers and system integrators doing analysis of the
+Linux kernel for safety critical applications. Documents supporting
+analysis of kernel interactions with applications, and key kernel
+subsystems expectations will be found here.
+
+.. toctree::
+ :maxdepth: 1
+
+ workload-tracing
+
The rest of this manual consists of various unordered guides on how to
configure specific aspects of kernel behavior to your liking.
@@ -78,6 +94,7 @@ configure specific aspects of kernel behavior to your liking.
edid
efi-stub
ext4
+ filesystem-monitoring
nfs/index
gpio/index
highuid
@@ -93,6 +110,7 @@ configure specific aspects of kernel behavior to your liking.
lockup-watchdogs
LSM/index
md
+ media/index
mm/index
module-signing
mono
@@ -101,19 +119,21 @@ configure specific aspects of kernel behavior to your liking.
parport
perf-security
pm/index
+ pmf
pnp
rapidio
ras
rtc
serial-console
svga
+ syscall-user-dispatch
sysrq
+ thermal/index
thunderbolt
ufs
unicode
vga-softcursor
video-output
- wimax/index
xfs
.. only:: subproject and html
diff --git a/Documentation/admin-guide/init.rst b/Documentation/admin-guide/init.rst
index e89d97f31eaf..41f06a09152e 100644
--- a/Documentation/admin-guide/init.rst
+++ b/Documentation/admin-guide/init.rst
@@ -1,52 +1,48 @@
-Explaining the dreaded "No init found." boot hang message
+Explaining the "No working init found." boot hang message
=========================================================
+:Authors: Andreas Mohr <andi at lisas period de>
+ Cristian Souza <cristianmsbr at gmail period com>
-OK, so you've got this pretty unintuitive message (currently located
-in init/main.c) and are wondering what the H*** went wrong.
-Some high-level reasons for failure (listed roughly in order of execution)
-to load the init binary are:
-
-A) Unable to mount root FS
-B) init binary doesn't exist on rootfs
-C) broken console device
-D) binary exists but dependencies not available
-E) binary cannot be loaded
-
-Detailed explanations:
-
-A) Set "debug" kernel parameter (in bootloader config file or CONFIG_CMDLINE)
- to get more detailed kernel messages.
-B) make sure you have the correct root FS type
- (and ``root=`` kernel parameter points to the correct partition),
- required drivers such as storage hardware (such as SCSI or USB!)
- and filesystem (ext3, jffs2 etc.) are builtin (alternatively as modules,
- to be pre-loaded by an initrd)
-C) Possibly a conflict in ``console= setup`` --> initial console unavailable.
- E.g. some serial consoles are unreliable due to serial IRQ issues (e.g.
- missing interrupt-based configuration).
+This document provides some high-level reasons for failure
+(listed roughly in order of execution) to load the init binary.
+
+1) **Unable to mount root FS**: Set "debug" kernel parameter (in bootloader
+ config file or CONFIG_CMDLINE) to get more detailed kernel messages.
+
+2) **init binary doesn't exist on rootfs**: Make sure you have the correct
+ root FS type (and ``root=`` kernel parameter points to the correct
+ partition), required drivers such as storage hardware (such as SCSI or
+ USB!) and filesystem (ext3, jffs2, etc.) are builtin (alternatively as
+ modules, to be pre-loaded by an initrd).
+
+3) **Broken console device**: Possibly a conflict in ``console= setup``
+ --> initial console unavailable. E.g. some serial consoles are unreliable
+ due to serial IRQ issues (e.g. missing interrupt-based configuration).
Try using a different ``console= device`` or e.g. ``netconsole=``.
-D) e.g. required library dependencies of the init binary such as
- ``/lib/ld-linux.so.2`` missing or broken. Use
- ``readelf -d <INIT>|grep NEEDED`` to find out which libraries are required.
-E) make sure the binary's architecture matches your hardware.
- E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware.
- In case you tried loading a non-binary file here (shell script?),
- you should make sure that the script specifies an interpreter in its shebang
- header line (``#!/...``) that is fully working (including its library
- dependencies). And before tackling scripts, better first test a simple
- non-script binary such as ``/bin/sh`` and confirm its successful execution.
- To find out more, add code ``to init/main.c`` to display kernel_execve()s
- return values.
+
+4) **Binary exists but dependencies not available**: E.g. required library
+ dependencies of the init binary such as ``/lib/ld-linux.so.2`` missing or
+ broken. Use ``readelf -d <INIT>|grep NEEDED`` to find out which libraries
+ are required.
+
+5) **Binary cannot be loaded**: Make sure the binary's architecture matches
+ your hardware. E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM
+ hardware. In case you tried loading a non-binary file here (shell script?),
+ you should make sure that the script specifies an interpreter in its
+ shebang header line (``#!/...``) that is fully working (including its
+ library dependencies). And before tackling scripts, better first test a
+ simple non-script binary such as ``/bin/sh`` and confirm its successful
+ execution. To find out more, add code ``to init/main.c`` to display
+ kernel_execve()s return values.
Please extend this explanation whenever you find new failure causes
(after all loading the init binary is a CRITICAL and hard transition step
-which needs to be made as painless as possible), then submit patch to LKML.
+which needs to be made as painless as possible), then submit a patch to LKML.
Further TODOs:
- Implement the various ``run_init_process()`` invocations via a struct array
which can then store the ``kernel_execve()`` result value and on failure
log it all by iterating over **all** results (very important usability fix).
-- try to make the implementation itself more helpful in general,
- e.g. by providing additional error messages at affected places.
+- Try to make the implementation itself more helpful in general, e.g. by
+ providing additional error messages at affected places.
-Andreas Mohr <andi at lisas period de>
diff --git a/Documentation/admin-guide/initrd.rst b/Documentation/admin-guide/initrd.rst
index a03dabaaf3a3..67bbad8806e8 100644
--- a/Documentation/admin-guide/initrd.rst
+++ b/Documentation/admin-guide/initrd.rst
@@ -376,7 +376,7 @@ Resources
---------
.. [#f1] Almesberger, Werner; "Booting Linux: The History and the Future"
- http://www.almesberger.net/cv/papers/ols2k-9.ps.gz
+ https://www.almesberger.net/cv/papers/ols2k-9.ps.gz
.. [#f2] newlib package (experimental), with initrd example
https://www.sourceware.org/newlib/
.. [#f3] util-linux: Miscellaneous utilities for Linux
diff --git a/Documentation/admin-guide/iostats.rst b/Documentation/admin-guide/iostats.rst
index 9b14b0c2c9c4..609a3201fd4e 100644
--- a/Documentation/admin-guide/iostats.rst
+++ b/Documentation/admin-guide/iostats.rst
@@ -76,7 +76,7 @@ Field 3 -- # of sectors read (unsigned long)
Field 4 -- # of milliseconds spent reading (unsigned int)
This is the total number of milliseconds spent by all reads (as
- measured from __make_request() to end_that_request_last()).
+ measured from blk_mq_alloc_request() to __blk_mq_end_request()).
Field 5 -- # of writes completed (unsigned long)
This is the total number of writes completed successfully.
@@ -89,7 +89,7 @@ Field 7 -- # of sectors written (unsigned long)
Field 8 -- # of milliseconds spent writing (unsigned int)
This is the total number of milliseconds spent by all writes (as
- measured from __make_request() to end_that_request_last()).
+ measured from blk_mq_alloc_request() to __blk_mq_end_request()).
Field 9 -- # of I/Os currently in progress (unsigned int)
The only field that should go to zero. Incremented as requests are
@@ -120,7 +120,7 @@ Field 14 -- # of sectors discarded (unsigned long)
Field 15 -- # of milliseconds spent discarding (unsigned int)
This is the total number of milliseconds spent by all discards (as
- measured from __make_request() to end_that_request_last()).
+ measured from blk_mq_alloc_request() to __blk_mq_end_request()).
Field 16 -- # of flush requests completed
This is the total number of flush requests completed successfully.
diff --git a/Documentation/admin-guide/kdump/gdbmacros.txt b/Documentation/admin-guide/kdump/gdbmacros.txt
index 220d0a80ca2c..030de95e3e6b 100644
--- a/Documentation/admin-guide/kdump/gdbmacros.txt
+++ b/Documentation/admin-guide/kdump/gdbmacros.txt
@@ -170,57 +170,103 @@ document trapinfo
address the kernel panicked.
end
-define dump_log_idx
- set $idx = $arg0
- if ($argc > 1)
- set $prev_flags = $arg1
+define dump_record
+ set var $desc = $arg0
+ set var $info = $arg1
+ if ($argc > 2)
+ set var $prev_flags = $arg2
else
- set $prev_flags = 0
+ set var $prev_flags = 0
end
- set $msg = ((struct printk_log *) (log_buf + $idx))
- set $prefix = 1
- set $newline = 1
- set $log = log_buf + $idx + sizeof(*$msg)
-
- # prev & LOG_CONT && !(msg->flags & LOG_PREIX)
- if (($prev_flags & 8) && !($msg->flags & 4))
- set $prefix = 0
+
+ set var $prefix = 1
+ set var $newline = 1
+
+ set var $begin = $desc->text_blk_lpos.begin % (1U << prb->text_data_ring.size_bits)
+ set var $next = $desc->text_blk_lpos.next % (1U << prb->text_data_ring.size_bits)
+
+ # handle data-less record
+ if ($begin & 1)
+ set var $text_len = 0
+ set var $log = ""
+ else
+ # handle wrapping data block
+ if ($begin > $next)
+ set var $begin = 0
+ end
+
+ # skip over descriptor id
+ set var $begin = $begin + sizeof(long)
+
+ # handle truncated message
+ if ($next - $begin < $info->text_len)
+ set var $text_len = $next - $begin
+ else
+ set var $text_len = $info->text_len
+ end
+
+ set var $log = &prb->text_data_ring.data[$begin]
+ end
+
+ # prev & LOG_CONT && !(info->flags & LOG_PREIX)
+ if (($prev_flags & 8) && !($info->flags & 4))
+ set var $prefix = 0
end
- # msg->flags & LOG_CONT
- if ($msg->flags & 8)
+ # info->flags & LOG_CONT
+ if ($info->flags & 8)
# (prev & LOG_CONT && !(prev & LOG_NEWLINE))
if (($prev_flags & 8) && !($prev_flags & 2))
- set $prefix = 0
+ set var $prefix = 0
end
- # (!(msg->flags & LOG_NEWLINE))
- if (!($msg->flags & 2))
- set $newline = 0
+ # (!(info->flags & LOG_NEWLINE))
+ if (!($info->flags & 2))
+ set var $newline = 0
end
end
if ($prefix)
- printf "[%5lu.%06lu] ", $msg->ts_nsec / 1000000000, $msg->ts_nsec % 1000000000
+ printf "[%5lu.%06lu] ", $info->ts_nsec / 1000000000, $info->ts_nsec % 1000000000
end
- if ($msg->text_len != 0)
- eval "printf \"%%%d.%ds\", $log", $msg->text_len, $msg->text_len
+ if ($text_len)
+ eval "printf \"%%%d.%ds\", $log", $text_len, $text_len
end
if ($newline)
printf "\n"
end
- if ($msg->dict_len > 0)
- set $dict = $log + $msg->text_len
- set $idx = 0
- set $line = 1
- while ($idx < $msg->dict_len)
- if ($line)
- printf " "
- set $line = 0
+
+ # handle dictionary data
+
+ set var $dict = &$info->dev_info.subsystem[0]
+ set var $dict_len = sizeof($info->dev_info.subsystem)
+ if ($dict[0] != '\0')
+ printf " SUBSYSTEM="
+ set var $idx = 0
+ while ($idx < $dict_len)
+ set var $c = $dict[$idx]
+ if ($c == '\0')
+ loop_break
+ else
+ if ($c < ' ' || $c >= 127 || $c == '\\')
+ printf "\\x%02x", $c
+ else
+ printf "%c", $c
+ end
end
- set $c = $dict[$idx]
+ set var $idx = $idx + 1
+ end
+ printf "\n"
+ end
+
+ set var $dict = &$info->dev_info.device[0]
+ set var $dict_len = sizeof($info->dev_info.device)
+ if ($dict[0] != '\0')
+ printf " DEVICE="
+ set var $idx = 0
+ while ($idx < $dict_len)
+ set var $c = $dict[$idx]
if ($c == '\0')
- printf "\n"
- set $line = 1
+ loop_break
else
if ($c < ' ' || $c >= 127 || $c == '\\')
printf "\\x%02x", $c
@@ -228,35 +274,48 @@ define dump_log_idx
printf "%c", $c
end
end
- set $idx = $idx + 1
+ set var $idx = $idx + 1
end
printf "\n"
end
end
-document dump_log_idx
- Dump a single log given its index in the log buffer. The first
- parameter is the index into log_buf, the second is optional and
- specified the previous log buffer's flags, used for properly
- formatting continued lines.
+document dump_record
+ Dump a single record. The first parameter is the descriptor,
+ the second parameter is the info, the third parameter is
+ optional and specifies the previous record's flags, used for
+ properly formatting continued lines.
end
define dmesg
- set $i = log_first_idx
- set $end_idx = log_first_idx
- set $prev_flags = 0
+ # definitions from kernel/printk/printk_ringbuffer.h
+ set var $desc_committed = 1
+ set var $desc_finalized = 2
+ set var $desc_sv_bits = sizeof(long) * 8
+ set var $desc_flags_shift = $desc_sv_bits - 2
+ set var $desc_flags_mask = 3 << $desc_flags_shift
+ set var $id_mask = ~$desc_flags_mask
+
+ set var $desc_count = 1U << prb->desc_ring.count_bits
+ set var $prev_flags = 0
+
+ set var $id = prb->desc_ring.tail_id.counter
+ set var $end_id = prb->desc_ring.head_id.counter
while (1)
- set $msg = ((struct printk_log *) (log_buf + $i))
- if ($msg->len == 0)
- set $i = 0
- else
- dump_log_idx $i $prev_flags
- set $i = $i + $msg->len
- set $prev_flags = $msg->flags
+ set var $desc = &prb->desc_ring.descs[$id % $desc_count]
+ set var $info = &prb->desc_ring.infos[$id % $desc_count]
+
+ # skip non-committed record
+ set var $state = 3 & ($desc->state_var.counter >> $desc_flags_shift)
+ if ($state == $desc_committed || $state == $desc_finalized)
+ dump_record $desc $info $prev_flags
+ set var $prev_flags = $info->flags
end
- if ($i == $end_idx)
+
+ if ($id == $end_id)
loop_break
end
+ set var $id = ($id + 1) & $id_mask
end
end
document dmesg
diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index ac7e131d2935..5762e7477a0c 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -2,7 +2,7 @@
Documentation for Kdump - The kexec-based Crash Dumping Solution
================================================================
-This document includes overview, setup and installation, and analysis
+This document includes overview, setup, installation, and analysis
information.
Overview
@@ -13,11 +13,11 @@ dump of the system kernel's memory needs to be taken (for example, when
the system panics). The system kernel's memory image is preserved across
the reboot and is accessible to the dump-capture kernel.
-You can use common commands, such as cp and scp, to copy the
-memory image to a dump file on the local disk, or across the network to
-a remote system.
+You can use common commands, such as cp, scp or makedumpfile to copy
+the memory image to a dump file on the local disk, or across the network
+to a remote system.
-Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64,
+Kdump and kexec are currently supported on the x86, x86_64, ppc64,
s390x, arm and arm64 architectures.
When the system kernel boots, it reserves a small section of memory for
@@ -26,13 +26,15 @@ the dump-capture kernel. This ensures that ongoing Direct Memory Access
The kexec -p command loads the dump-capture kernel into this reserved
memory.
-On x86 machines, the first 640 KB of physical memory is needed to boot,
-regardless of where the kernel loads. Therefore, kexec backs up this
-region just before rebooting into the dump-capture kernel.
+On x86 machines, the first 640 KB of physical memory is needed for boot,
+regardless of where the kernel loads. For simpler handling, the whole
+low 1M is reserved to avoid any later kernel or device driver writing
+data into this area. Like this, the low 1M can be reused as system RAM
+by kdump kernel without extra handling.
-Similarly on PPC64 machines first 32KB of physical memory is needed for
-booting regardless of where the kernel is loaded and to support 64K page
-size kexec backs up the first 64KB memory.
+On PPC64 machines first 32KB of physical memory is needed for booting
+regardless of where the kernel is loaded and to support 64K page size
+kexec backs up the first 64KB memory.
For s390x, when kdump is triggered, the crashkernel region is exchanged
with the region [0, crashkernel region size] and then the kdump kernel
@@ -46,14 +48,14 @@ passed to the dump-capture kernel through the elfcorehdr= boot
parameter. Optionally the size of the ELF header can also be passed
when using the elfcorehdr=[size[KMG]@]offset[KMG] syntax.
-
With the dump-capture kernel, you can access the memory image through
/proc/vmcore. This exports the dump as an ELF-format file that you can
-write out using file copy commands such as cp or scp. Further, you can
-use analysis tools such as the GNU Debugger (GDB) and the Crash tool to
-debug the dump file. This method ensures that the dump pages are correctly
-ordered.
-
+write out using file copy commands such as cp or scp. You can also use
+makedumpfile utility to analyze and write out filtered contents with
+options, e.g with '-d 31' it will only write out kernel data. Further,
+you can use analysis tools such as the GNU Debugger (GDB) and the Crash
+tool to debug the dump file. This method ensures that the dump pages are
+correctly ordered.
Setup and Installation
======================
@@ -111,7 +113,7 @@ There are two possible methods of using Kdump.
2) Or use the system kernel binary itself as dump-capture kernel and there is
no need to build a separate dump-capture kernel. This is possible
only with the architectures which support a relocatable kernel. As
- of today, i386, x86_64, ppc64, ia64, arm and arm64 architectures support
+ of today, i386, x86_64, ppc64, arm and arm64 architectures support
relocatable kernel.
Building a relocatable kernel is advantageous from the point of view that
@@ -125,9 +127,18 @@ dump-capture kernels for enabling kdump support.
System kernel config options
----------------------------
-1) Enable "kexec system call" in "Processor type and features."::
+1) Enable "kexec system call" or "kexec file based system call" in
+ "Processor type and features."::
+
+ CONFIG_KEXEC=y or CONFIG_KEXEC_FILE=y
+
+ And both of them will select KEXEC_CORE::
+
+ CONFIG_KEXEC_CORE=y
+
+ Subsequently, CRASH_CORE is selected by KEXEC_CORE::
- CONFIG_KEXEC=y
+ CONFIG_CRASH_CORE=y
2) Enable "sysfs file system support" in "Filesystem" -> "Pseudo
filesystems." This is usually enabled by default::
@@ -135,9 +146,9 @@ System kernel config options
CONFIG_SYSFS=y
Note that "sysfs file system support" might not appear in the "Pseudo
- filesystems" menu if "Configure standard kernel features (for small
- systems)" is not enabled in "General Setup." In this case, check the
- .config file itself to ensure that sysfs is turned on, as follows::
+ filesystems" menu if "Configure standard kernel features (expert users)"
+ is not enabled in "General Setup." In this case, check the .config file
+ itself to ensure that sysfs is turned on, as follows::
grep 'CONFIG_SYSFS' .config
@@ -175,17 +186,19 @@ Dump-capture kernel config options (Arch Dependent, i386 and x86_64)
CONFIG_HIGHMEM4G
-2) On i386 and x86_64, disable symmetric multi-processing support
- under "Processor type and features"::
+2) With CONFIG_SMP=y, usually nr_cpus=1 need specified on the kernel
+ command line when loading the dump-capture kernel because one
+ CPU is enough for kdump kernel to dump vmcore on most of systems.
- CONFIG_SMP=n
+ However, you can also specify nr_cpus=X to enable multiple processors
+ in kdump kernel. In this case, "disable_cpu_apicid=" is needed to
+ tell kdump kernel which cpu is 1st kernel's BSP. Please refer to
+ admin-guide/kernel-parameters.txt for more details.
- (If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line
- when loading the dump-capture kernel, see section "Load the Dump-capture
- Kernel".)
+ With CONFIG_SMP=n, the above things are not related.
-3) If one wants to build and use a relocatable kernel,
- Enable "Build a relocatable kernel" support under "Processor type and
+3) A relocatable kernel is suggested to be built by default. If not yet,
+ enable "Build a relocatable kernel" support under "Processor type and
features"::
CONFIG_RELOCATABLE=y
@@ -223,28 +236,6 @@ Dump-capture kernel config options (Arch Dependent, ppc64)
Make and install the kernel and its modules.
-Dump-capture kernel config options (Arch Dependent, ia64)
-----------------------------------------------------------
-
-- No specific options are required to create a dump-capture kernel
- for ia64, other than those specified in the arch independent section
- above. This means that it is possible to use the system kernel
- as a dump-capture kernel if desired.
-
- The crashkernel region can be automatically placed by the system
- kernel at run time. This is done by specifying the base address as 0,
- or omitting it all together::
-
- crashkernel=256M@0
-
- or::
-
- crashkernel=256M
-
- If the start address is specified, note that the start address of the
- kernel will be aligned to 64Mb, so if the start address is not then
- any space below the alignment point will be wasted.
-
Dump-capture kernel config options (Arch Dependent, arm)
----------------------------------------------------------
@@ -260,54 +251,85 @@ Dump-capture kernel config options (Arch Dependent, arm64)
on non-VHE systems even if it is configured. This is because the CPU
will not be reset to EL2 on panic.
-Extended crashkernel syntax
+crashkernel syntax
===========================
+1) crashkernel=size@offset
-While the "crashkernel=size[@offset]" syntax is sufficient for most
-configurations, sometimes it's handy to have the reserved memory dependent
-on the value of System RAM -- that's mostly for distributors that pre-setup
-the kernel command line to avoid a unbootable system after some memory has
-been removed from the machine.
+ Here 'size' specifies how much memory to reserve for the dump-capture kernel
+ and 'offset' specifies the beginning of this reserved memory. For example,
+ "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
+ starting at physical address 0x01000000 (16MB) for the dump-capture kernel.
-The syntax is::
+ The crashkernel region can be automatically placed by the system
+ kernel at run time. This is done by specifying the base address as 0,
+ or omitting it all together::
- crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
- range=start-[end]
+ crashkernel=256M@0
-For example::
+ or::
- crashkernel=512M-2G:64M,2G-:128M
+ crashkernel=256M
-This would mean:
+ If the start address is specified, note that the start address of the
+ kernel will be aligned to a value (which is Arch dependent), so if the
+ start address is not then any space below the alignment point will be
+ wasted.
- 1) if the RAM is smaller than 512M, then don't reserve anything
- (this is the "rescue" case)
- 2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
- 3) if the RAM size is larger than 2G, then reserve 128M
+2) range1:size1[,range2:size2,...][@offset]
+ While the "crashkernel=size[@offset]" syntax is sufficient for most
+ configurations, sometimes it's handy to have the reserved memory dependent
+ on the value of System RAM -- that's mostly for distributors that pre-setup
+ the kernel command line to avoid a unbootable system after some memory has
+ been removed from the machine.
+ The syntax is::
-Boot into System Kernel
-=======================
+ crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
+ range=start-[end]
+
+ For example::
+
+ crashkernel=512M-2G:64M,2G-:128M
+
+ This would mean:
+
+ 1) if the RAM is smaller than 512M, then don't reserve anything
+ (this is the "rescue" case)
+ 2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
+ 3) if the RAM size is larger than 2G, then reserve 128M
+3) crashkernel=size,high and crashkernel=size,low
+
+ If memory above 4G is preferred, crashkernel=size,high can be used to
+ fulfill that. With it, physical memory is allowed to be allocated from top,
+ so could be above 4G if system has more than 4G RAM installed. Otherwise,
+ memory region will be allocated below 4G if available.
+
+ When crashkernel=X,high is passed, kernel could allocate physical memory
+ region above 4G, low memory under 4G is needed in this case. There are
+ three ways to get low memory:
+
+ 1) Kernel will allocate at least 256M memory below 4G automatically
+ if crashkernel=Y,low is not specified.
+ 2) Let user specify low memory size instead.
+ 3) Specified value 0 will disable low memory allocation::
+
+ crashkernel=0,low
+
+Boot into System Kernel
+-----------------------
1) Update the boot loader (such as grub, yaboot, or lilo) configuration
files as necessary.
-2) Boot the system kernel with the boot parameter "crashkernel=Y@X",
- where Y specifies how much memory to reserve for the dump-capture kernel
- and X specifies the beginning of this reserved memory. For example,
- "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
- starting at physical address 0x01000000 (16MB) for the dump-capture kernel.
+2) Boot the system kernel with the boot parameter "crashkernel=Y@X".
- On x86 and x86_64, use "crashkernel=64M@16M".
+ On x86 and x86_64, use "crashkernel=Y[@X]". Most of the time, the
+ start address 'X' is not necessary, kernel will search a suitable
+ area. Unless an explicit start address is expected.
On ppc64, use "crashkernel=128M@32M".
- On ia64, 256M@256M is a generous value that typically works.
- The region may be automatically placed on ia64, see the
- dump-capture kernel config option notes above.
- If use sparse memory, the size should be rounded to GRANULE boundaries.
-
On s390x, typically use "crashkernel=xxM". The value of xx is dependent
on the memory consumption of the kdump system. In general this is not
dependent on the memory size of the production system.
@@ -331,17 +353,13 @@ of dump-capture kernel. Following is the summary.
For i386 and x86_64:
- - Use vmlinux if kernel is not relocatable.
- Use bzImage/vmlinuz if kernel is relocatable.
+ - Use vmlinux if kernel is not relocatable.
For ppc64:
- Use vmlinux
-For ia64:
-
- - Use vmlinux or vmlinuz.gz
-
For s390x:
- Use image or bzImage
@@ -383,16 +401,12 @@ to load dump-capture kernel::
--initrd=<initrd-for-dump-capture-kernel> \
--append="root=<root-dev> <arch-specific-options>"
-Please note, that --args-linux does not need to be specified for ia64.
-It is planned to make this a no-op on that architecture, but for now
-it should be omitted
-
Following are the arch specific command line options to be used while
loading dump-capture kernel.
-For i386, x86_64 and ia64:
+For i386 and x86_64:
- "1 irqpoll maxcpus=1 reset_devices"
+ "1 irqpoll nr_cpus=1 reset_devices"
For ppc64:
@@ -400,7 +414,7 @@ For ppc64:
For s390x:
- "1 maxcpus=1 cgroup_disable=memory"
+ "1 nr_cpus=1 cgroup_disable=memory"
For arm:
@@ -408,7 +422,7 @@ For arm:
For arm64:
- "1 maxcpus=1 reset_devices"
+ "1 nr_cpus=1 reset_devices"
Notes on loading the dump-capture kernel:
@@ -488,6 +502,14 @@ the following command::
cp /proc/vmcore <dump-file>
+or use scp to write out the dump file between hosts on a network, e.g::
+
+ scp /proc/vmcore remote_username@remote_ip:<dump-file>
+
+You can also use makedumpfile utility to write out the dump file
+with specified options to filter out unwanted contents, e.g::
+
+ makedumpfile -l --message-level 1 -d 31 /proc/vmcore <dump-file>
Analysis
========
@@ -509,9 +531,12 @@ ELF32-format headers using the --elf32-core-headers kernel option on the
dump kernel.
You can also use the Crash utility to analyze dump files in Kdump
-format. Crash is available on Dave Anderson's site at the following URL:
+format. Crash is available at the following URL:
+
+ https://github.com/crash-utility/crash
- http://people.redhat.com/~anderson/
+Crash document can be found at:
+ https://crash-utility.github.io/
Trigger Kdump on WARN()
=======================
@@ -521,11 +546,18 @@ will cause a kdump to occur at the panic() call. In cases where a user wants
to specify this during runtime, /proc/sys/kernel/panic_on_warn can be set to 1
to achieve the same behaviour.
+Trigger Kdump on add_taint()
+============================
+
+The kernel parameter panic_on_taint facilitates a conditional call to panic()
+from within add_taint() whenever the value set in this bitmask matches with the
+bit flag being set by add_taint().
+This will cause a kdump to occur at the add_taint()->panic() call.
+
Contact
=======
-- Vivek Goyal (vgoyal@redhat.com)
-- Maneesh Soni (maneesh@in.ibm.com)
+- kexec@lists.infradead.org
GDB macros
==========
diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
index 007a6b86e0ee..bced9e4b6e08 100644
--- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
+++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -39,6 +39,12 @@ call.
User-space tools can get the kernel name, host name, kernel release
number, kernel version, architecture name and OS type from it.
+(uts_namespace, name)
+---------------------
+
+Offset of the name's member. Crash Utility and Makedumpfile get
+the start address of the init_uts_ns.name from this.
+
node_online_map
---------------
@@ -93,6 +99,11 @@ It exists in the sparse memory mapping model, and it is also somewhat
similar to the mem_map variable, both of them are used to translate an
address.
+MAX_PHYSMEM_BITS
+----------------
+
+Defines the maximum supported physical address space memory.
+
page
----
@@ -130,8 +141,8 @@ nodemask_t
The size of a nodemask_t type. Used to compute the number of online
nodes.
-(page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|compound_order|compound_head)
--------------------------------------------------------------------------------------------------
+(page, flags|_refcount|mapping|lru|_mapcount|private|compound_order|compound_head)
+----------------------------------------------------------------------------------
User-space tools compute their values based on the offset of these
variables. The variables are used when excluding unnecessary pages.
@@ -161,7 +172,7 @@ variables.
Offset of the free_list's member. This value is used to compute the number
of free pages.
-Each zone has a free_area structure array called free_area[MAX_ORDER].
+Each zone has a free_area structure array called free_area[NR_PAGE_ORDERS].
The free_list represents a linked list of free page blocks.
(list_head, next|prev)
@@ -178,56 +189,129 @@ Offsets of the vmap_area's members. They carry vmalloc-specific
information. Makedumpfile gets the start address of the vmalloc region
from this.
-(zone.free_area, MAX_ORDER)
----------------------------
+(zone.free_area, NR_PAGE_ORDERS)
+--------------------------------
Free areas descriptor. User-space tools use this value to iterate the
-free_area ranges. MAX_ORDER is used by the zone buddy allocator.
+free_area ranges. NR_PAGE_ORDERS is used by the zone buddy allocator.
+
+prb
+---
-log_first_idx
+A pointer to the printk ringbuffer (struct printk_ringbuffer). This
+may be pointing to the static boot ringbuffer or the dynamically
+allocated ringbuffer, depending on when the core dump occurred.
+Used by user-space tools to read the active kernel log buffer.
+
+printk_rb_static
+----------------
+
+A pointer to the static boot printk ringbuffer. If @prb has a
+different value, this is useful for viewing the initial boot messages,
+which may have been overwritten in the dynamically allocated
+ringbuffer.
+
+clear_seq
+---------
+
+The sequence number of the printk() record after the last clear
+command. It indicates the first record after the last
+SYSLOG_ACTION_CLEAR, like issued by 'dmesg -c'. Used by user-space
+tools to dump a subset of the dmesg log.
+
+printk_ringbuffer
+-----------------
+
+The size of a printk_ringbuffer structure. This structure contains all
+information required for accessing the various components of the
+kernel log buffer.
+
+(printk_ringbuffer, desc_ring|text_data_ring|dict_data_ring|fail)
+-----------------------------------------------------------------
+
+Offsets for the various components of the printk ringbuffer. Used by
+user-space tools to view the kernel log buffer without requiring the
+declaration of the structure.
+
+prb_desc_ring
-------------
-Index of the first record stored in the buffer log_buf. Used by
-user-space tools to read the strings in the log_buf.
+The size of the prb_desc_ring structure. This structure contains
+information about the set of record descriptors.
-log_buf
--------
+(prb_desc_ring, count_bits|descs|head_id|tail_id)
+-------------------------------------------------
+
+Offsets for the fields describing the set of record descriptors. Used
+by user-space tools to be able to traverse the descriptors without
+requiring the declaration of the structure.
+
+prb_desc
+--------
+
+The size of the prb_desc structure. This structure contains
+information about a single record descriptor.
+
+(prb_desc, info|state_var|text_blk_lpos|dict_blk_lpos)
+------------------------------------------------------
+
+Offsets for the fields describing a record descriptors. Used by
+user-space tools to be able to read descriptors without requiring
+the declaration of the structure.
+
+prb_data_blk_lpos
+-----------------
-Console output is written to the ring buffer log_buf at index
-log_first_idx. Used to get the kernel log.
+The size of the prb_data_blk_lpos structure. This structure contains
+information about where the text or dictionary data (data block) is
+located within the respective data ring.
-log_buf_len
+(prb_data_blk_lpos, begin|next)
+-------------------------------
+
+Offsets for the fields describing the location of a data block. Used
+by user-space tools to be able to locate data blocks without
+requiring the declaration of the structure.
+
+printk_info
-----------
-log_buf's length.
+The size of the printk_info structure. This structure contains all
+the meta-data for a record.
-clear_idx
----------
+(printk_info, seq|ts_nsec|text_len|dict_len|caller_id)
+------------------------------------------------------
-The index that the next printk() record to read after the last clear
-command. It indicates the first record after the last SYSLOG_ACTION
-_CLEAR, like issued by 'dmesg -c'. Used by user-space tools to dump
-the dmesg log.
+Offsets for the fields providing the meta-data for a record. Used by
+user-space tools to be able to read the information without requiring
+the declaration of the structure.
-log_next_idx
-------------
+prb_data_ring
+-------------
-The index of the next record to store in the buffer log_buf. Used to
-compute the index of the current buffer position.
+The size of the prb_data_ring structure. This structure contains
+information about a set of data blocks.
-printk_log
-----------
+(prb_data_ring, size_bits|data|head_lpos|tail_lpos)
+---------------------------------------------------
+
+Offsets for the fields describing a set of data blocks. Used by
+user-space tools to be able to access the data blocks without
+requiring the declaration of the structure.
-The size of a structure printk_log. Used to compute the size of
-messages, and extract dmesg log. It encapsulates header information for
-log_buf, such as timestamp, syslog level, etc.
+atomic_long_t
+-------------
-(printk_log, ts_nsec|len|text_len|dict_len)
--------------------------------------------
+The size of the atomic_long_t structure. Used by user-space tools to
+be able to copy the full structure, regardless of its
+architecture-specific implementation.
+
+(atomic_long_t, counter)
+------------------------
-It represents field offsets in struct printk_log. User space tools
-parse it and check whether the values of printk_log's members have been
-changed.
+Offset for the long value of an atomic_long_t variable. Used by
+user-space tools to access the long value without requiring the
+architecture-specific declaration.
(free_area.free_list, MIGRATE_TYPES)
------------------------------------
@@ -241,8 +325,8 @@ NR_FREE_PAGES
On linux-2.6.21 or later, the number of free pages is in
vm_stat[NR_FREE_PAGES]. Used to get the number of free pages.
-PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision|PG_head_mask
-------------------------------------------------------------------------------
+PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision|PG_head_mask|PG_hugetlb
+-----------------------------------------------------------------------------------------
Page attributes. These flags are used to filter various unnecessary for
dumping pages.
@@ -254,12 +338,6 @@ More page attributes. These flags are used to filter various unnecessary for
dumping pages.
-HUGETLB_PAGE_DTOR
------------------
-
-The HUGETLB_PAGE_DTOR flag denotes hugetlbfs pages. Makedumpfile
-excludes these pages.
-
x86_64
======
@@ -335,36 +413,6 @@ of a higher page table lookup overhead, and also consumes more page
table space per process. Used to check whether PAE was enabled in the
crash kernel when converting virtual addresses to physical addresses.
-ia64
-====
-
-pgdat_list|(pgdat_list, MAX_NUMNODES)
--------------------------------------
-
-pg_data_t array storing all NUMA nodes information. MAX_NUMNODES
-indicates the number of the nodes.
-
-node_memblk|(node_memblk, NR_NODE_MEMBLKS)
-------------------------------------------
-
-List of node memory chunks. Filled when parsing the SRAT table to obtain
-information about memory nodes. NR_NODE_MEMBLKS indicates the number of
-node memory chunks.
-
-These values are used to compute the number of nodes the crashed kernel used.
-
-node_memblk_s|(node_memblk_s, start_paddr)|(node_memblk_s, size)
-----------------------------------------------------------------
-
-The size of a struct node_memblk_s and the offsets of the
-node_memblk_s's members. Used to compute the number of nodes.
-
-PGTABLE_3|PGTABLE_4
--------------------
-
-User-space tools need to know whether the crash kernel was in 3-level or
-4-level paging mode. Used to distinguish the page table.
-
ARM64
=====
@@ -393,6 +441,31 @@ KERNELOFFSET
The kernel randomization offset. Used to compute the page offset. If
KASLR is disabled, this value is zero.
+KERNELPACMASK
+-------------
+
+The mask to extract the Pointer Authentication Code from a kernel virtual
+address.
+
+TCR_EL1.T1SZ
+------------
+
+Indicates the size offset of the memory region addressed by TTBR1_EL1.
+The region size is 2^(64-T1SZ) bytes.
+
+TTBR1_EL1 is the table base address register specified by ARMv8-A
+architecture which is used to lookup the page-tables for the Virtual
+addresses in the higher VA range (refer to ARMv8 ARM document for
+more details).
+
+MODULES_VADDR|MODULES_END|VMALLOC_START|VMALLOC_END|VMEMMAP_START|VMEMMAP_END
+-----------------------------------------------------------------------------
+
+Used to get the correct ranges:
+ MODULES_VADDR ~ MODULES_END-1 : Kernel module space.
+ VMALLOC_START ~ VMALLOC_END-1 : vmalloc() / ioremap() space.
+ VMEMMAP_START ~ VMEMMAP_END-1 : vmemmap region, used for struct page array.
+
arm
===
@@ -486,3 +559,38 @@ X2TLB
-----
Indicates whether the crashed kernel enabled SH extended mode.
+
+RISCV64
+=======
+
+VA_BITS
+-------
+
+The maximum number of bits for virtual addresses. Used to compute the
+virtual memory ranges.
+
+PAGE_OFFSET
+-----------
+
+Indicates the virtual kernel start address of the direct-mapped RAM region.
+
+phys_ram_base
+-------------
+
+Indicates the start physical RAM address.
+
+MODULES_VADDR|MODULES_END|VMALLOC_START|VMALLOC_END|VMEMMAP_START|VMEMMAP_END|KERNEL_LINK_ADDR
+----------------------------------------------------------------------------------------------
+
+Used to get the correct ranges:
+
+ * MODULES_VADDR ~ MODULES_END : Kernel module space.
+ * VMALLOC_START ~ VMALLOC_END : vmalloc() / ioremap() space.
+ * VMEMMAP_START ~ VMEMMAP_END : vmemmap space, used for struct page array.
+ * KERNEL_LINK_ADDR : start address of Kernel link and BPF
+
+va_kernel_pa_offset
+-------------------
+
+Indicates the offset between the kernel virtual and physical mappings.
+Used to translate virtual to physical addresses.
diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst
index 6d421694d98e..4410384596a9 100644
--- a/Documentation/admin-guide/kernel-parameters.rst
+++ b/Documentation/admin-guide/kernel-parameters.rst
@@ -3,8 +3,8 @@
The kernel's command-line parameters
====================================
-The following is a consolidated list of the kernel parameters as
-implemented by the __setup(), core_param() and module_param() macros
+The following is a consolidated list of the kernel parameters as implemented
+by the __setup(), early_param(), core_param() and module_param() macros
and sorted into English Dictionary order (defined as ignoring all
punctuation and sorting digits before letters in a case insensitive
manner), and with descriptions where known.
@@ -60,7 +60,7 @@ Note that for the special case of a range one can split the range into equal
sized groups and for each group use some amount from the beginning of that
group:
- <cpu number>-cpu number>:<used size>/<group size>
+ <cpu number>-<cpu number>:<used size>/<group size>
For example one can add to the command line following parameter:
@@ -68,7 +68,19 @@ For example one can add to the command line following parameter:
where the final item represents CPUs 100,101,125,126,150,151,...
+The value "N" can be used to represent the numerically last CPU on the system,
+i.e "foo_cpus=16-N" would be equivalent to "16-31" on a 32 core system.
+Keep in mind that "N" is dynamic, so if system changes cause the bitmap width
+to change, such as less cores in the CPU list, then N and any ranges using N
+will also change. Use the same on a small 4 core system, and "16-N" becomes
+"16-3" and now the same boot input will be flagged as invalid (start > end).
+
+The special case-tolerant group name "all" has a meaning of selecting all CPUs,
+so that "nohz_full=all" is the equivalent of "nohz_full=0-N".
+
+The semantics of "N" and "all" is supported on a level of bitmaps and holds for
+all users of bitmap_parselist().
This document may not be entirely up to date and comprehensive. The command
"modinfo -p ${modulename}" shows a current list of all parameters of a loadable
@@ -77,16 +89,18 @@ reveal their parameters in /sys/module/${modulename}/parameters/. Some of these
parameters may be changed at runtime by the command
``echo -n ${value} > /sys/module/${modulename}/parameters/${parm}``.
-The parameters listed below are only valid if certain kernel build options were
-enabled and if respective hardware is present. The text in square brackets at
-the beginning of each description states the restrictions within which a
-parameter is applicable::
+The parameters listed below are only valid if certain kernel build options
+were enabled and if respective hardware is present. This list should be kept
+in alphabetical order. The text in square brackets at the beginning
+of each description states the restrictions within which a parameter
+is applicable::
ACPI ACPI support is enabled.
AGP AGP (Accelerated Graphics Port) is enabled.
ALSA ALSA sound support is enabled.
APIC APIC support is enabled.
APM Advanced Power Management support is enabled.
+ APPARMOR AppArmor support is enabled.
ARM ARM architecture is enabled.
ARM64 ARM64 architecture is enabled.
AX25 Appropriate AX.25 support is enabled.
@@ -96,15 +110,15 @@ parameter is applicable::
DYNAMIC_DEBUG Build in debug messages and enable them at runtime
EDD BIOS Enhanced Disk Drive Services (EDD) is enabled
EFI EFI Partitioning (GPT) is enabled
- EIDE EIDE/ATAPI support is enabled.
EVM Extended Verification Module
FB The frame buffer device is enabled.
FTRACE Function tracing enabled.
GCOV GCOV profiling is enabled.
+ HIBERNATION HIBERNATION is enabled.
HW Appropriate hardware is enabled.
+ HYPER_V HYPERV support is enabled.
IA-64 IA-64 architecture is enabled.
IMA Integrity measurement architecture is enabled.
- IOSCHED More than one I/O scheduler is enabled.
IP_PNP IP DHCP, BOOTP, or RARP is enabled.
IPV6 IPv6 support is enabled.
ISAPNP ISA PnP code is enabled.
@@ -114,23 +128,21 @@ parameter is applicable::
KGDB Kernel debugger support is enabled.
KVM Kernel Virtual Machine support is enabled.
LIBATA Libata driver is enabled
- LP Printer support is enabled.
+ LOONGARCH LoongArch architecture is enabled.
LOOP Loopback device support is enabled.
+ LP Printer support is enabled.
M68k M68k architecture is enabled.
These options have more detailed description inside of
- Documentation/m68k/kernel-options.rst.
+ Documentation/arch/m68k/kernel-options.rst.
MDA MDA console support is enabled.
MIPS MIPS architecture is enabled.
MOUSE Appropriate mouse support is enabled.
MSI Message Signaled Interrupts (PCI).
MTD MTD (Memory Technology Device) support is enabled.
NET Appropriate network support is enabled.
- NUMA NUMA support is enabled.
NFS Appropriate NFS support is enabled.
+ NUMA NUMA support is enabled.
OF Devicetree is enabled.
- OSS OSS sound support is enabled.
- PV_OPS A paravirtualized kernel is enabled.
- PARIDE The ParIDE (parallel port IDE) subsystem is enabled.
PARISC The PA-RISC architecture is enabled.
PCI PCI bus support is enabled.
PCIE PCI Express support is enabled.
@@ -139,53 +151,53 @@ parameter is applicable::
PPC PowerPC architecture is enabled.
PPT Parallel port support is enabled.
PS2 Appropriate PS/2 support is enabled.
+ PV_OPS A paravirtualized kernel is enabled.
RAM RAM disk support is enabled.
RDT Intel Resource Director Technology.
+ RISCV RISCV architecture is enabled.
S390 S390 architecture is enabled.
SCSI Appropriate SCSI support is enabled.
A lot of drivers have their options described inside
the Documentation/scsi/ sub-directory.
SECURITY Different security models are enabled.
SELINUX SELinux support is enabled.
- APPARMOR AppArmor support is enabled.
SERIAL Serial support is enabled.
SH SuperH architecture is enabled.
SMP The kernel is an SMP kernel.
SPARC Sparc architecture is enabled.
- SWSUSP Software suspend (hibernation) is enabled.
SUSPEND System suspend states are enabled.
+ SWSUSP Software suspend (hibernation) is enabled.
TPM TPM drivers are enabled.
- TS Appropriate touchscreen support is enabled.
UMS USB Mass Storage support is enabled.
USB USB support is enabled.
USBHID USB Human Interface Device support is enabled.
V4L Video For Linux support is enabled.
- VMMIO Driver for memory mapped virtio devices is enabled.
VGA The VGA console has been enabled.
+ VMMIO Driver for memory mapped virtio devices is enabled.
VT Virtual terminal support is enabled.
WDT Watchdog support is enabled.
- XT IBM PC/XT MFM hard disk support is enabled.
X86-32 X86-32, aka i386 architecture is enabled.
X86-64 X86-64 architecture is enabled.
More X86-64 boot options can be found in
- Documentation/x86/x86_64/boot-options.rst.
+ Documentation/arch/x86/x86_64/boot-options.rst.
X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64)
X86_UV SGI UV support is enabled.
XEN Xen support is enabled
+ XTENSA xtensa architecture is enabled.
In addition, the following text indicates that the option::
+ BOOT Is a boot loader parameter.
BUGS= Relates to possible processor bugs on the said processor.
KNL Is a kernel start-up parameter.
- BOOT Is a boot loader parameter.
Parameters denoted with BOOT are actually interpreted by the boot
loader, and have no meaning to the kernel directly.
Do not modify the syntax of boot loader parameters without extreme
-need or coordination with <Documentation/x86/boot.rst>.
+need or coordination with <Documentation/arch/x86/boot.rst>.
There are also arch-specific kernel-parameters not documented here.
-See for example <Documentation/x86/x86_64/boot-options.rst>.
+See for example <Documentation/arch/x86/x86_64/boot-options.rst>.
Note that ALL kernel parameters listed below are CASE SENSITIVE, and that
a trailing = on the name of any parameter states that that parameter will
@@ -197,7 +209,7 @@ The number of kernel parameters is not limited, but the length of the
complete command line (parameters including spaces etc.) is limited to
a fixed number of characters. This limit depends on the architecture
and is between 256 and 4096 characters. It is defined in the file
-./include/asm/setup.h as COMMAND_LINE_SIZE.
+./include/uapi/asm-generic/setup.h as COMMAND_LINE_SIZE.
Finally, the [KMG] suffix is commonly described after a number of kernel
parameter values. These 'K', 'M', and 'G' letters represent the _binary_
@@ -206,8 +218,3 @@ bytes respectively. Such letter suffixes can also be entirely omitted:
.. include:: kernel-parameters.txt
:literal:
-
-Todo
-----
-
- Add more DRM drivers.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7bc83f3d9bdf..31b3a25680d0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1,17 +1,28 @@
- acpi= [HW,ACPI,X86,ARM64]
+ accept_memory= [MM]
+ Format: { eager | lazy }
+ default: lazy
+ By default, unaccepted memory is accepted lazily to
+ avoid prolonged boot times. The lazy option will add
+ some runtime overhead until all memory is eventually
+ accepted. In most cases the overhead is negligible.
+ For some workloads or for debugging purposes
+ accept_memory=eager can be used to accept all memory
+ at once during boot.
+
+ acpi= [HW,ACPI,X86,ARM64,RISCV64]
Advanced Configuration and Power Interface
Format: { force | on | off | strict | noirq | rsdt |
copy_dsdt }
force -- enable ACPI if default was off
- on -- enable ACPI but allow fallback to DT [arm64]
+ on -- enable ACPI but allow fallback to DT [arm64,riscv64]
off -- disable ACPI if default was on
noirq -- do not use ACPI for IRQ routing
strict -- Be less tolerant of platforms that are not
strictly ACPI specification compliant.
rsdt -- prefer RSDT over (default) XSDT
copy_dsdt -- copy DSDT to memory
- For ARM64, ONLY "acpi=off", "acpi=on" or "acpi=force"
- are available
+ For ARM64 and RISCV64, ONLY "acpi=off", "acpi=on" or
+ "acpi=force" are available
See also Documentation/power/runtime_pm.rst, pci=noacpi
@@ -50,7 +61,7 @@
CONFIG_ACPI_DEBUG must be enabled to produce any ACPI
debug output. Bits in debug_layer correspond to a
_COMPONENT in an ACPI source file, e.g.,
- #define _COMPONENT ACPI_PCI_COMPONENT
+ #define _COMPONENT ACPI_EVENTS
Bits in debug_level correspond to a level in
ACPI_DEBUG_PRINT statements, e.g.,
ACPI_DEBUG_PRINT((ACPI_DB_INFO, ...
@@ -60,8 +71,6 @@
Enable processor driver info messages:
acpi.debug_layer=0x20000000
- Enable PCI/PCI interrupt routing info messages:
- acpi.debug_layer=0x400000
Enable AML "Debug" output, i.e., stores to the Debug
object while interpreting AML:
acpi.debug_layer=0xffffffff acpi.debug_level=0x2
@@ -115,7 +124,7 @@
the GPE dispatcher.
This facility can be used to prevent such uncontrolled
GPE floodings.
- Format: <byte>
+ Format: <byte> or <bitmap-list>
acpi_no_auto_serialize [HW,ACPI]
Disable auto-serialization of AML methods
@@ -227,14 +236,23 @@
For broken nForce2 BIOS resulting in XT-PIC timer.
acpi_sleep= [HW,ACPI] Sleep options
- Format: { s3_bios, s3_mode, s3_beep, s4_nohwsig,
- old_ordering, nonvs, sci_force_enable, nobl }
+ Format: { s3_bios, s3_mode, s3_beep, s4_hwsig,
+ s4_nohwsig, old_ordering, nonvs,
+ sci_force_enable, nobl }
See Documentation/power/video.rst for information on
s3_bios and s3_mode.
s3_beep is for debugging; it makes the PC's speaker beep
as soon as the kernel's real-mode entry point is called.
+ s4_hwsig causes the kernel to check the ACPI hardware
+ signature during resume from hibernation, and gracefully
+ refuse to resume if it has changed. This complies with
+ the ACPI specification but not with reality, since
+ Windows does not do this and many laptops do change it
+ on docking. So the default behaviour is to allow resume
+ and simply warn when the signature changes, unless the
+ s4_hwsig option is enabled.
s4_nohwsig prevents ACPI hardware signature from being
- used during resume from hibernation.
+ used (or even warned about) during resume.
old_ordering causes the ACPI 1.0 ordering of the _PTS
control method, with respect to putting devices into
low power states, to be enforced (the ACPI 2.0 ordering
@@ -289,13 +307,21 @@
do not want to use tracing_snapshot_alloc() as it needs
to be done where GFP_KERNEL allocations are allowed.
+ allow_mismatched_32bit_el0 [ARM64]
+ Allow execve() of 32-bit applications and setting of the
+ PER_LINUX32 personality on systems where only a strict
+ subset of the CPUs support 32-bit EL0. When this
+ parameter is present, the set of CPUs supporting 32-bit
+ EL0 is indicated by /sys/devices/system/cpu/aarch32_el0
+ and hot-unplug operations may be restricted.
+
+ See Documentation/arch/arm64/asymmetric-32bit.rst for more
+ information.
+
amd_iommu= [HW,X86-64]
Pass parameters to the AMD IOMMU driver in the system.
Possible values are:
- fullflush - enable flushing of IO/TLB entries when
- they are unmapped. Otherwise they are
- flushed before they will be reused, which
- is a lot of faster
+ fullflush - Deprecated, equivalent to iommu.strict=1
off - do not initialize any AMD IOMMU found in
the system
force_isolation - Force device isolation for all
@@ -303,6 +329,12 @@
allowed anymore to lift isolation
requirements as needed. This option
does not override iommu=pt
+ force_enable - Force enable the IOMMU on platforms known
+ to be buggy with IOMMU enabled. Use this
+ option with care.
+ pgtbl_v1 - Use v1 page table for DMA-API (Default).
+ pgtbl_v2 - Use v2 page table for DMA-API.
+ irtcachedis - Disable Interrupt Remapping Table (IRT) caching.
amd_iommu_dump= [HW,X86-64]
Enable AMD IOMMU driver option to dump the ACPI table
@@ -319,6 +351,29 @@
This mode requires kvm-amd.avic=1.
(Default when IOMMU HW support is present.)
+ amd_pstate= [X86]
+ disable
+ Do not enable amd_pstate as the default
+ scaling driver for the supported processors
+ passive
+ Use amd_pstate with passive mode as a scaling driver.
+ In this mode autonomous selection is disabled.
+ Driver requests a desired performance level and platform
+ tries to match the same performance level if it is
+ satisfied by guaranteed performance level.
+ active
+ Use amd_pstate_epp driver instance as the scaling driver,
+ driver provides a hint to the hardware if software wants
+ to bias toward performance (0x0) or energy efficiency (0xff)
+ to the CPPC firmware. then CPPC power algorithm will
+ calculate the runtime workload and adjust the realtime cores
+ frequency.
+ guided
+ Activate guided autonomous mode. Driver requests minimum and
+ maximum performance level and the platform autonomously
+ selects a performance level in this range and appropriate
+ to the current workload.
+
amijoy.map= [HW,JOY] Amiga joystick support
Map of devices attached to JOY0DAT and JOY1DAT
Format: <a>,<b>
@@ -356,23 +411,39 @@
shot down by NMI
autoconf= [IPV6]
- See Documentation/networking/ipv6.txt.
-
- show_lapic= [APIC,X86] Advanced Programmable Interrupt Controller
- Limit apic dumping. The parameter defines the maximal
- number of local apics being dumped. Also it is possible
- to set it to "all" by meaning -- no limit here.
- Format: { 1 (default) | 2 | ... | all }.
- The parameter valid if only apic=debug or
- apic=verbose is specified.
- Example: apic=debug show_lapic=all
+ See Documentation/networking/ipv6.rst.
apm= [APM] Advanced Power Management
See header of arch/x86/kernel/apm_32.c.
+ apparmor= [APPARMOR] Disable or enable AppArmor at boot time
+ Format: { "0" | "1" }
+ See security/apparmor/Kconfig help text
+ 0 -- disable.
+ 1 -- enable.
+ Default value is set via kernel config option.
+
arcrimi= [HW,NET] ARCnet - "RIM I" (entirely mem-mapped) cards
Format: <io>,<irq>,<nodeID>
+ arm64.nobti [ARM64] Unconditionally disable Branch Target
+ Identification support
+
+ arm64.nomops [ARM64] Unconditionally disable Memory Copy and Memory
+ Set instructions support
+
+ arm64.nomte [ARM64] Unconditionally disable Memory Tagging Extension
+ support
+
+ arm64.nopauth [ARM64] Unconditionally disable Pointer Authentication
+ support
+
+ arm64.nosme [ARM64] Unconditionally disable Scalable Matrix
+ Extension support
+
+ arm64.nosve [ARM64] Unconditionally disable Scalable Vector
+ Extension support
+
ataflop= [HW,M68k]
atarimouse= [HW,MOUSE] Atari Mouse
@@ -434,13 +505,21 @@
Format: <io>,<irq>,<mode>
See header of drivers/net/hamradio/baycom_ser_hdx.c.
+ bert_disable [ACPI]
+ Disable BERT OS support on buggy BIOSes.
+
+ bgrt_disable [ACPI][X86]
+ Disable BGRT to avoid flickering OEM logo.
+
blkdevparts= Manual partition parsing of block device(s) for
embedded devices based on command line input.
See Documentation/block/cmdline-partition.rst
boot_delay= Milliseconds to delay each printk during boot.
- Values larger than 10 seconds (10000) are changed to
- no delay (0).
+ Only works if CONFIG_BOOT_PRINTK_DELAY is enabled,
+ and you may also have to specify "lpj=". Boot_delay
+ values larger than 10 seconds (10000) are assumed
+ erroneous and ignored.
Format: integer
bootconfig [KNL]
@@ -449,16 +528,10 @@
See Documentation/admin-guide/bootconfig.rst
- bert_disable [ACPI]
- Disable BERT OS support on buggy BIOSes.
-
- bgrt_disable [ACPI][X86]
- Disable BGRT to avoid flickering OEM logo.
-
bttv.card= [HW,V4L] bttv (bt848 + bt878 based grabber cards)
bttv.radio= Most important insmod options are available as
kernel args too.
- bttv.pll= See Documentation/media/v4l-drivers/bttv.rst
+ bttv.pll= See Documentation/admin-guide/media/bttv.rst
bttv.tuner=
bulk_remove=off [PPC] This parameter disables the use of the pSeries
@@ -491,18 +564,23 @@
others).
ccw_timeout_log [S390]
- See Documentation/s390/common_io.rst for details.
+ See Documentation/arch/s390/common_io.rst for details.
- cgroup_disable= [KNL] Disable a particular controller
- Format: {name of the controller(s) to disable}
+ cgroup_disable= [KNL] Disable a particular controller or optional feature
+ Format: {name of the controller(s) or feature(s) to disable}
The effects of cgroup_disable=foo are:
- foo isn't auto-mounted if you mount all cgroups in
a single hierarchy
- foo isn't visible as an individually mountable
subsystem
+ - if foo is an optional feature then the feature is
+ disabled and corresponding cgroup files are not
+ created
{Currently only "memory" controller deal with this and
cut the overhead, others just disable the usage. So
only cgroup_disable=memory is actually worthy}
+ Specifying "pressure" disables per-cgroup pressure
+ stall information accounting feature
cgroup_no_v1= [KNL] Disable cgroup controllers and named hierarchies in v1
Format: { { controller | "all" | "named" }
@@ -513,12 +591,17 @@
named mounts. Specifying both "all" and "named" disables
all v1 hierarchies.
+ cgroup_favordynmods= [KNL] Enable or Disable favordynmods.
+ Format: { "true" | "false" }
+ Defaults to the value of CONFIG_CGROUP_FAVOR_DYNMODS.
+
cgroup.memory= [KNL] Pass options to the cgroup memory controller.
Format: <string>
nosocket -- Disable socket memory accounting.
nokmem -- Disable kernel memory accounting.
+ nobpf -- Disable BPF memory accounting.
- checkreqprot [SELINUX] Set initial checkreqprot flag value.
+ checkreqprot= [SELINUX] Set initial checkreqprot flag value.
Format: { "0" | "1" }
See security/selinux/Kconfig help text.
0 -- check protection applied by kernel (includes
@@ -530,7 +613,26 @@
Setting checkreqprot to 1 is deprecated.
cio_ignore= [S390]
- See Documentation/s390/common_io.rst for details.
+ See Documentation/arch/s390/common_io.rst for details.
+
+ clearcpuid=X[,X...] [X86]
+ Disable CPUID feature X for the kernel. See
+ arch/x86/include/asm/cpufeatures.h for the valid bit
+ numbers X. Note the Linux-specific bits are not necessarily
+ stable over kernel options, but the vendor-specific
+ ones should be.
+ X can also be a string as appearing in the flags: line
+ in /proc/cpuinfo which does not have the above
+ instability issue. However, not all features have names
+ in /proc/cpuinfo.
+ Note that using this option will taint your kernel.
+ Also note that user programs calling CPUID directly
+ or using the feature without checking anything
+ will still see it. This just prevents it from
+ being used by the kernel or shown in /proc/cpuinfo.
+ Also note the kernel might malfunction if you disable
+ some critical bits.
+
clk_ignore_unused
[CLK]
Prevents the clock framework from automatically gating
@@ -577,27 +679,58 @@
loops can be debugged more effectively on production
systems.
- clearcpuid=BITNUM [X86]
- Disable CPUID feature X for the kernel. See
- arch/x86/include/asm/cpufeatures.h for the valid bit
- numbers. Note the Linux specific bits are not necessarily
- stable over kernel options, but the vendor specific
- ones should be.
- Also note that user programs calling CPUID directly
- or using the feature without checking anything
- will still see it. This just prevents it from
- being used by the kernel or shown in /proc/cpuinfo.
- Also note the kernel might malfunction if you disable
- some critical bits.
+ clocksource.max_cswd_read_retries= [KNL]
+ Number of clocksource_watchdog() retries due to
+ external delays before the clock will be marked
+ unstable. Defaults to two retries, that is,
+ three attempts to read the clock under test.
+
+ clocksource.verify_n_cpus= [KNL]
+ Limit the number of CPUs checked for clocksources
+ marked with CLOCK_SOURCE_VERIFY_PERCPU that
+ are marked unstable due to excessive skew.
+ A negative value says to check all CPUs, while
+ zero says not to check any. Values larger than
+ nr_cpu_ids are silently truncated to nr_cpu_ids.
+ The actual CPUs are chosen randomly, with
+ no replacement if the same CPU is chosen twice.
+
+ clocksource-wdtest.holdoff= [KNL]
+ Set the time in seconds that the clocksource
+ watchdog test waits before commencing its tests.
+ Defaults to zero when built as a module and to
+ 10 seconds when built into the kernel.
cma=nn[MG]@[start[MG][-end[MG]]]
- [ARM,X86,KNL]
+ [KNL,CMA]
Sets the size of kernel global memory area for
contiguous memory allocations and optionally the
placement constraint by the physical address range of
memory allocations. A value of 0 disables CMA
altogether. For more information, see
- include/linux/dma-contiguous.h
+ kernel/dma/contiguous.c
+
+ cma_pernuma=nn[MG]
+ [KNL,CMA]
+ Sets the size of kernel per-numa memory area for
+ contiguous memory allocations. A value of 0 disables
+ per-numa CMA altogether. And If this option is not
+ specified, the default value is 0.
+ With per-numa CMA enabled, DMA users on node nid will
+ first try to allocate buffer from the pernuma area
+ which is located in node nid, if the allocation fails,
+ they will fallback to the global default memory area.
+
+ numa_cma=<node>:nn[MG][,<node>:nn[MG]]
+ [KNL,CMA]
+ Sets the size of kernel numa memory area for
+ contiguous memory allocations. It will reserve CMA
+ area for the specified node.
+
+ With numa CMA enabled, DMA users on node nid will
+ first try to allocate buffer from the numa area
+ which is located in node nid, if the allocation fails,
+ they will fallback to the global default memory area.
cmo_free_hint= [PPC] Format: { yes | no }
Specify whether pages are marked as being inactive
@@ -624,6 +757,17 @@
condev= [HW,S390] console device
conmode=
+ con3215_drop= [S390] 3215 console drop mode.
+ Format: y|n|Y|N|1|0
+ When set to true, drop data on the 3215 console when
+ the console buffer is full. In this case the
+ operator using a 3270 terminal emulator (for example
+ x3270) does not have to enter the clear key for the
+ console output to advance and the kernel to continue.
+ This leads to a much faster boot time when a 3270
+ terminal emulator is active. If no 3270 terminal
+ emulator is used, this parameter has no effect.
+
console= [KNL] Output console device and options.
tty<n> Use the virtual console device <n>.
@@ -638,7 +782,7 @@
See Documentation/admin-guide/serial-console.rst for more
information. See
- Documentation/networking/netconsole.txt for an
+ Documentation/networking/netconsole.rst for an
alternative.
uart[8250],io,<addr>[,options]
@@ -659,6 +803,12 @@
hvc<n> Use the hypervisor console device <n>. This is for
both Xen and PowerPC hypervisors.
+ { null | "" }
+ Use to disable console output, i.e., to have kernel
+ console messages discarded.
+ This must be the only console= parameter used on the
+ kernel command line.
+
If the device connected to the port is not a TTY but a braille
device, prepend "brl," before the device type, for instance
console=brl,ttyS0
@@ -694,6 +844,10 @@
0: default value, disable debugging
1: enable debugging at boot time
+ cpcihp_generic= [HW,PCI] Generic port I/O CompactPCI driver
+ Format:
+ <first_slot>,<last_slot>,<port>,<enum_bit>[,<debug>]
+
cpuidle.off=1 [CPU_IDLE]
disable the cpuidle sub-system
@@ -703,15 +857,30 @@
cpufreq.off=1 [CPU_FREQ]
disable the cpufreq sub-system
+ cpufreq.default_governor=
+ [CPU_FREQ] Name of the default cpufreq governor or
+ policy to use. This governor must be registered in the
+ kernel before the cpufreq driver probes.
+
cpu_init_udelay=N
[X86] Delay for N microsec between assert and de-assert
of APIC INIT to start processors. This delay occurs
on every CPU online, such as boot, and resume from suspend.
Default: 10000
- cpcihp_generic= [HW,PCI] Generic port I/O CompactPCI driver
- Format:
- <first_slot>,<last_slot>,<port>,<enum_bit>[,<debug>]
+ cpuhp.parallel=
+ [SMP] Enable/disable parallel bringup of secondary CPUs
+ Format: <bool>
+ Default is enabled if CONFIG_HOTPLUG_PARALLEL=y. Otherwise
+ the parameter has no effect.
+
+ crash_kexec_post_notifiers
+ Run kdump after running panic-notifiers and dumping
+ kmsg. This only for the users who doubt kdump always
+ succeeds in any situation.
+ Note that this also increases risks of kdump failure,
+ because some panic notifiers can make the crashed
+ kernel more unstable.
crashkernel=size[KMG][@offset[KMG]]
[KNL] Using kexec, Linux can switch to a 'crash kernel'
@@ -719,9 +888,9 @@
memory region [offset, offset + size] for that kernel
image. If '@offset' is omitted, then a suitable offset
is selected automatically.
- [KNL, x86_64] select a region under 4G first, and
- fall back to reserve region above 4G when '@offset'
- hasn't been specified.
+ [KNL, X86-64, ARM64, RISCV, LoongArch] Select a region
+ under 4G first, and fall back to reserve region above
+ 4G when '@offset' hasn't been specified.
See Documentation/admin-guide/kdump/kdump.rst for further details.
crashkernel=range1:size1[,range2:size2,...][@offset]
@@ -732,22 +901,28 @@
Documentation/admin-guide/kdump/kdump.rst for an example.
crashkernel=size[KMG],high
- [KNL, x86_64] range could be above 4G. Allow kernel
- to allocate physical memory region from top, so could
- be above 4G if system have more than 4G ram installed.
- Otherwise memory region will be allocated below 4G, if
- available.
+ [KNL, X86-64, ARM64, RISCV, LoongArch] range could be
+ above 4G.
+ Allow kernel to allocate physical memory region from top,
+ so could be above 4G if system have more than 4G ram
+ installed. Otherwise memory region will be allocated
+ below 4G, if available.
It will be ignored if crashkernel=X is specified.
crashkernel=size[KMG],low
- [KNL, x86_64] range under 4G. When crashkernel=X,high
- is passed, kernel could allocate physical memory region
- above 4G, that cause second kernel crash on system
- that require some amount of low memory, e.g. swiotlb
- requires at least 64M+32K low memory, also enough extra
- low memory is needed to make sure DMA buffers for 32-bit
- devices won't run out. Kernel would try to allocate at
- at least 256M below 4G automatically.
- This one let user to specify own low range under 4G
+ [KNL, X86-64, ARM64, RISCV, LoongArch] range under 4G.
+ When crashkernel=X,high is passed, kernel could allocate
+ physical memory region above 4G, that cause second kernel
+ crash on system that require some amount of low memory,
+ e.g. swiotlb requires at least 64M+32K low memory, also
+ enough extra low memory is needed to make sure DMA buffers
+ for 32-bit devices won't run out. Kernel would try to allocate
+ default size of memory below 4G automatically. The default
+ size is platform dependent.
+ --> x86: max(swiotlb_size_or_default() + 8MiB, 256MiB)
+ --> arm64: 128MiB
+ --> riscv: 128MiB
+ --> loongarch: 128MiB
+ This one lets the user specify own low range under 4G
for second kernel instead.
0: to disable low allocation.
It will be ignored when crashkernel=X,high is not used
@@ -762,6 +937,15 @@
cs89x0_media= [HW,NET]
Format: { rj45 | aui | bnc }
+ csdlock_debug= [KNL] Enable or disable debug add-ons of cross-CPU
+ function call handling. When switched on,
+ additional debug data is printed to the console
+ in case a hanging CPU is detected, and that
+ CPU is pinged again in order to try to resolve
+ the hang situation. The default value of this
+ option depends on the CSD_LOCK_WAIT_DEBUG_DEFAULT
+ Kconfig option.
+
dasd= [HW,NET]
See header of drivers/s390/block/dasd_devmap.c.
@@ -770,11 +954,6 @@
Format: <port#>,<type>
See also Documentation/input/devices/joystick-parport.rst
- ddebug_query= [KNL,DYNAMIC_DEBUG] Enable debug messages at early boot
- time. See
- Documentation/admin-guide/dynamic-debug-howto.rst for
- details. Deprecated, see dyndbg.
-
debug [KNL] Enable kernel debugging (events log level).
debug_boot_weak_hash
@@ -786,19 +965,17 @@
insecure, please do not use on production kernels.
debug_locks_verbose=
- [KNL] verbose self-tests
- Format=<0|1>
+ [KNL] verbose locking self-tests
+ Format: <int>
Print debugging info while doing the locking API
self-tests.
- We default to 0 (no extra messages), setting it to
- 1 will print _a lot_ more information - normally
- only useful to kernel developers.
+ Bitmask for the various LOCKTYPE_ tests. Defaults to 0
+ (no extra messages), setting it to -1 (all bits set)
+ will print _a_lot_ more information - normally only
+ useful to lockdep developers.
debug_objects [KNL] Enable object debugging
- no_debug_objects
- [KNL] Disable object debugging
-
debug_guardpage_minorder=
[KNL] When CONFIG_DEBUG_PAGEALLOC is set, this
parameter allows control of the order of pages that will
@@ -806,17 +983,17 @@
buddy allocator. Bigger value increase the probability
of catching random memory corruption, but reduce the
amount of memory for normal system use. The maximum
- possible value is MAX_ORDER/2. Setting this parameter
- to 1 or 2 should be enough to identify most random
- memory corruption problems caused by bugs in kernel or
- driver code when a CPU writes to (or reads from) a
- random memory location. Note that there exists a class
- of memory corruptions problems caused by buggy H/W or
- F/W or by drivers badly programing DMA (basically when
- memory is written at bus level and the CPU MMU is
- bypassed) which are not detectable by
- CONFIG_DEBUG_PAGEALLOC, hence this option will not help
- tracking down these problems.
+ possible value is MAX_PAGE_ORDER/2. Setting this
+ parameter to 1 or 2 should be enough to identify most
+ random memory corruption problems caused by bugs in
+ kernel or driver code when a CPU writes to (or reads
+ from) a random memory location. Note that there exists
+ a class of memory corruptions problems caused by buggy
+ H/W or F/W or by drivers badly programming DMA
+ (basically when memory is written at bus level and the
+ CPU MMU is bypassed) which are not detectable by
+ CONFIG_DEBUG_PAGEALLOC, hence this option will not
+ help tracking down these problems.
debug_pagealloc=
[KNL] When CONFIG_DEBUG_PAGEALLOC is set, this parameter
@@ -827,29 +1004,71 @@
useful to also enable the page_owner functionality.
on: enable the feature
- debugpat [X86] Enable PAT debugging
+ debugfs= [KNL] This parameter enables what is exposed to userspace
+ and debugfs internal clients.
+ Format: { on, no-mount, off }
+ on: All functions are enabled.
+ no-mount:
+ Filesystem is not registered but kernel clients can
+ access APIs and a crashkernel can be used to read
+ its content. There is nothing to mount.
+ off: Filesystem is not registered and clients
+ get a -EPERM as result when trying to register files
+ or directories within debugfs.
+ This is equivalent of the runtime functionality if
+ debugfs was not enabled in the kernel at all.
+ Default value is set in build-time with a kernel configuration.
- decnet.addr= [HW,NET]
- Format: <area>[,<node>]
- See also Documentation/networking/decnet.txt.
+ debugpat [X86] Enable PAT debugging
default_hugepagesz=
- [same as hugepagesz=] The size of the default
- HugeTLB page size. This is the size represented by
- the legacy /proc/ hugepages APIs, used for SHM, and
- default size when mounting hugetlbfs filesystems.
- Defaults to the default architecture's huge page size
- if not specified.
+ [HW] The size of the default HugeTLB page. This is
+ the size represented by the legacy /proc/ hugepages
+ APIs. In addition, this is the default hugetlb size
+ used for shmget(), mmap() and mounting hugetlbfs
+ filesystems. If not specified, defaults to the
+ architecture's default huge page size. Huge page
+ sizes are architecture dependent. See also
+ Documentation/admin-guide/mm/hugetlbpage.rst.
+ Format: size[KMG]
deferred_probe_timeout=
[KNL] Debugging option to set a timeout in seconds for
deferred probe to give up waiting on dependencies to
probe. Only specific dependencies (subsystems or
- drivers) that have opted in will be ignored. A timeout of 0
- will timeout at the end of initcalls. This option will also
+ drivers) that have opted in will be ignored. A timeout
+ of 0 will timeout at the end of initcalls. If the time
+ out hasn't expired, it'll be restarted by each
+ successful driver registration. This option will also
dump out devices still on the deferred probe list after
retrying.
+ delayacct [KNL] Enable per-task delay accounting
+
+ dell_smm_hwmon.ignore_dmi=
+ [HW] Continue probing hardware even if DMI data
+ indicates that the driver is running on unsupported
+ hardware.
+
+ dell_smm_hwmon.force=
+ [HW] Activate driver even if SMM BIOS signature does
+ not match list of supported models and enable otherwise
+ blacklisted features.
+
+ dell_smm_hwmon.power_status=
+ [HW] Report power status in /proc/i8k
+ (disabled by default).
+
+ dell_smm_hwmon.restricted=
+ [HW] Allow controlling fans only if SYS_ADMIN
+ capability is set.
+
+ dell_smm_hwmon.fan_mult=
+ [HW] Factor to multiply fan speed with.
+
+ dell_smm_hwmon.fan_max=
+ [HW] Maximum configurable fan speed.
+
dfltcc= [HW,S390]
Format: { on | off | def_only | inf_only | always }
on: s390 zlib hardware support for compression on
@@ -872,18 +1091,7 @@
miss to occur.
disable= [IPV6]
- See Documentation/networking/ipv6.txt.
-
- hardened_usercopy=
- [KNL] Under CONFIG_HARDENED_USERCOPY, whether
- hardening is enabled for this boot. Hardened
- usercopy checking is used to protect the kernel
- from reading or writing beyond known memory
- allocation boundaries as a proactive defense
- against bounds-checking flaws in the kernel's
- copy_to_user()/copy_from_user() interface.
- on Perform hardened usercopy checks (default).
- off Disable hardened usercopy checks.
+ See Documentation/networking/ipv6.rst.
disable_radix [PPC]
Disable RADIX MMU mode on POWER9
@@ -901,18 +1109,12 @@
causing system reset or hang due to sending
INIT from AP to BSP.
- perf_v4_pmi= [X86,INTEL]
- Format: <bool>
- Disable Intel PMU counter freezing feature.
- The feature only exists starting from
- Arch Perfmon v4 (Skylake and newer).
-
disable_ddw [PPC/PSERIES]
- Disable Dynamic DMA Window support. Use this if
+ Disable Dynamic DMA Window support. Use this
to workaround buggy firmware.
disable_ipv6= [IPV6]
- See Documentation/networking/ipv6.txt.
+ See Documentation/networking/ipv6.rst.
disable_mtrr_cleanup [X86]
The kernel tries to adjust MTRR layout from continuous
@@ -949,7 +1151,10 @@
driver later using sysfs.
driver_async_probe= [KNL]
- List of driver names to be probed asynchronously.
+ List of driver names to be probed asynchronously. *
+ matches with all driver names. If * is specified, the
+ rest of the listed driver names are those that will NOT
+ match the *.
Format: <driver_name1>,<driver_name2>...
drm.edid_firmware=[<connector>:]<file>[,[<connector>:]<file>]
@@ -987,17 +1192,11 @@
what data is available or for reverse-engineering.
dyndbg[="val"] [KNL,DYNAMIC_DEBUG]
- module.dyndbg[="val"]
+ <module>.dyndbg[="val"]
Enable debug messages at boot time. See
Documentation/admin-guide/dynamic-debug-howto.rst
for details.
- nopku [X86] Disable Memory Protection Keys CPU feature found
- in some Intel CPUs.
-
- module.async_probe [KNL]
- Enable asynchronous probe on this module.
-
early_ioremap_debug [KNL]
Enable debug messages in early_ioremap support. This
is useful for tracking down temporary early mappings
@@ -1017,10 +1216,10 @@
specified, the serial port must already be setup and
configured.
- uart[8250],io,<addr>[,options]
- uart[8250],mmio,<addr>[,options]
- uart[8250],mmio32,<addr>[,options]
- uart[8250],mmio32be,<addr>[,options]
+ uart[8250],io,<addr>[,options[,uartclk]]
+ uart[8250],mmio,<addr>[,options[,uartclk]]
+ uart[8250],mmio32,<addr>[,options[,uartclk]]
+ uart[8250],mmio32be,<addr>[,options[,uartclk]]
uart[8250],0x<addr>[,options]
Start an early, polled-mode console on the 8250/16550
UART at the specified I/O port or MMIO address.
@@ -1029,7 +1228,9 @@
If none of [io|mmio|mmio32|mmio32be], <addr> is assumed
to be equivalent to 'mmio'. 'options' are specified
in the same format described for "console=ttyS<n>"; if
- unspecified, the h/w is not initialized.
+ unspecified, the h/w is not initialized. 'uartclk' is
+ the uart clock frequency; if unspecified, it is set
+ to 'BASE_BAUD' * 16.
pl011,<addr>
pl011,mmio32,<addr>
@@ -1040,6 +1241,11 @@
the driver will use only 32-bit accessors to read/write
the device registers.
+ liteuart,<addr>
+ Start an early console on a litex serial port at the
+ specified address. The serial port must already be
+ setup and configured. Options are not yet supported.
+
meson,<addr>
Start an early, polled-mode console on a meson serial
port at the specified address. The serial port must
@@ -1142,6 +1348,7 @@
earlyprintk=dbgp[debugController#]
earlyprintk=pciserial[,force],bus:device.function[,baudrate]
earlyprintk=xdbc[xhciController#]
+ earlyprintk=bios
earlyprintk is useful when the kernel crashes before
the normal console is initialized. It is not enabled by
@@ -1150,7 +1357,7 @@
Append ",keep" to not disable it when the real console
takes over.
- Only one of vga, efi, serial, or usb debug port can
+ Only one of vga, serial, or usb debug port can
be used at a time.
Currently only ttyS0 and ttyS1 may be specified by
@@ -1165,13 +1372,15 @@
Interaction with the standard serial driver is not
very good.
- The VGA and EFI output is eventually overwritten by
+ The VGA output is eventually overwritten by
the real console.
- The xen output can only be used by Xen PV guests.
+ The xen option can only be used in Xen domains.
The sclp output can only be used on s390.
+ The bios output can only be used on SuperH.
+
The optional "force" to "pciserial" enables use of a
PCI device even when its classcode is not of the
UART class.
@@ -1184,34 +1393,27 @@
force: enforce the use of EDAC to report H/W event.
default: on.
- ekgdboc= [X86,KGDB] Allow early kernel console debugging
- ekgdboc=kbd
-
- This is designed to be used in conjunction with
- the boot argument: earlyprintk=vga
-
edd= [EDD]
Format: {"off" | "on" | "skip[mbr]"}
efi= [EFI]
- Format: { "old_map", "nochunk", "noruntime", "debug",
- "nosoftreserve", "disable_early_pci_dma",
- "no_disable_early_pci_dma" }
- old_map [X86-64]: switch to the old ioremap-based EFI
- runtime services mapping. [Needs CONFIG_X86_UV=y]
+ Format: { "debug", "disable_early_pci_dma",
+ "nochunk", "noruntime", "nosoftreserve",
+ "novamap", "no_disable_early_pci_dma" }
+ debug: enable misc debug output.
+ disable_early_pci_dma: disable the busmaster bit on all
+ PCI bridges while in the EFI boot stub.
nochunk: disable reading files in "chunks" in the EFI
boot stub, as chunking can cause problems with some
firmware implementations.
noruntime : disable EFI runtime services support
- debug: enable misc debug output
nosoftreserve: The EFI_MEMORY_SP (Specific Purpose)
attribute may cause the kernel to reserve the
memory range for a memory mapping driver to
claim. Specify efi=nosoftreserve to disable this
reservation and treat the memory by its base type
(i.e. EFI_CONVENTIONAL_MEMORY / "System RAM").
- disable_early_pci_dma: Disable the busmaster bit on all
- PCI bridges while in the EFI boot stub
+ novamap: do not call SetVirtualAddressMap().
no_disable_early_pci_dma: Leave the busmaster bit set
on all PCI bridges while in the EFI boot stub
@@ -1252,11 +1454,22 @@
eisa_irq_edge= [PARISC,HW]
See header of drivers/parisc/eisa.c.
+ ekgdboc= [X86,KGDB] Allow early kernel console debugging
+ Format: ekgdboc=kbd
+
+ This is designed to be used in conjunction with
+ the boot argument: earlyprintk=vga
+
+ This parameter works in place of the kgdboc parameter
+ but can only be used if the backing tty is available
+ very early in the boot process. For early debugging
+ via a serial port see kgdboc_earlycon instead.
+
elanfreq= [X86-32]
See comment before function elanfreq_setup() in
arch/x86/kernel/cpu/cpufreq/elanfreq.c.
- elfcorehdr=[size[KMG]@]offset[KMG] [IA64,PPC,SH,X86,S390]
+ elfcorehdr=[size[KMG]@]offset[KMG] [PPC,SH,X86,S390]
Specifies physical address of start of kernel core
image elf header and optionally the size. Generally
kexec loader will pass this option to capture kernel.
@@ -1273,7 +1486,7 @@
(in particular on some ATI chipsets).
The kernel tries to set a reasonable default.
- enforcing [SELINUX] Set initial enforcing status.
+ enforcing= [SELINUX] Set initial enforcing status.
Format: {"0" | "1"}
See security/selinux/Kconfig help text.
0 -- permissive (log only, no denials).
@@ -1295,22 +1508,30 @@
Permit 'security.evm' to be updated regardless of
current integrity status.
+ early_page_ext [KNL] Enforces page_ext initialization to earlier
+ stages so cover more early boot allocations.
+ Please note that as side effect some optimizations
+ might be disabled to achieve that (e.g. parallelized
+ memory initialization is disabled) so the boot process
+ might take longer, especially on systems with a lot of
+ memory. Available with CONFIG_PAGE_EXTENSION=y.
+
failslab=
+ fail_usercopy=
fail_page_alloc=
fail_make_request=[KNL]
General fault injection mechanism.
Format: <interval>,<probability>,<space>,<times>
See also Documentation/fault-injection/.
+ fb_tunnels= [NET]
+ Format: { initns | none }
+ See Documentation/admin-guide/sysctl/net.rst for
+ fb_tunnels_only_for_init_ns
+
floppy= [HW]
See Documentation/admin-guide/blockdev/floppy.rst.
- force_pal_cache_flush
- [IA-64] Avoid check_sal_cache_flush which may hang on
- buggy SAL_CACHE_FLUSH implementations. Using this
- parameter will force ia64_sal_cache_flush to call
- ia64_pal_cache_flush instead of SAL_CACHE_FLUSH.
-
forcepae [X86-32]
Forcefully enable Physical Address Extension (PAE).
Many Pentium M systems disable PAE but may have a
@@ -1323,6 +1544,23 @@
as early as possible in order to facilitate early
boot debugging.
+ ftrace_boot_snapshot
+ [FTRACE] On boot up, a snapshot will be taken of the
+ ftrace ring buffer that can be read at:
+ /sys/kernel/tracing/snapshot.
+ This is useful if you need tracing information from kernel
+ boot up that is likely to be overridden by user space
+ start up functionality.
+
+ Optionally, the snapshot can also be defined for a tracing
+ instance that was created by the trace_instance= command
+ line parameter.
+
+ trace_instance=foo,sched_switch ftrace_boot_snapshot=foo
+
+ The above will cause the "foo" tracing instance to trigger
+ a snapshot at the end of boot up.
+
ftrace_dump_on_oops[=orig_cpu]
[FTRACE] will dump the trace buffers on oops.
If no parameter is passed, ftrace will dump
@@ -1332,7 +1570,7 @@
ftrace_filter=[function-list]
[FTRACE] Limit the functions traced by the function
- tracer at boot up. function-list is a comma separated
+ tracer at boot up. function-list is a comma-separated
list of functions. This list can be changed at run
time by the set_ftrace_filter file in the debugfs
tracing directory.
@@ -1346,13 +1584,13 @@
ftrace_graph_filter=[function-list]
[FTRACE] Limit the top level callers functions traced
by the function graph tracer at boot up.
- function-list is a comma separated list of functions
+ function-list is a comma-separated list of functions
that can be changed at run time by the
set_graph_function file in the debugfs tracing directory.
ftrace_graph_notrace=[function-list]
[FTRACE] Do not trace from the functions specified in
- function-list. This list is a comma separated list of
+ function-list. This list is a comma-separated list of
functions that can be changed at run time by the
set_graph_notrace file in the debugfs tracing directory.
@@ -1380,6 +1618,25 @@
to enforce probe and suspend/resume ordering.
rpm -- Like "on", but also use to order runtime PM.
+ fw_devlink.strict=<bool>
+ [KNL] Treat all inferred dependencies as mandatory
+ dependencies. This only applies for fw_devlink=on|rpm.
+ Format: <bool>
+
+ fw_devlink.sync_state =
+ [KNL] When all devices that could probe have finished
+ probing, this parameter controls what to do with
+ devices that haven't yet received their sync_state()
+ calls.
+ Format: { strict | timeout }
+ strict -- Default. Continue waiting on consumers to
+ probe successfully.
+ timeout -- Give up waiting on consumers and call
+ sync_state() on any devices that haven't yet
+ received their sync_state() calls after
+ deferred_probe_timeout has expired or by
+ late_initcall() if !CONFIG_MODULES.
+
gamecon.map[2|3]=
[HW,JOY] Multisystem joystick and NES/SNES/PSX pad
support via parallel port (up to 5 devices per port)
@@ -1388,10 +1645,30 @@
gamma= [HW,DRM]
- gart_fix_e820= [X86_64] disable the fix e820 for K8 GART
+ gart_fix_e820= [X86-64] disable the fix e820 for K8 GART
Format: off | on
default: on
+ gather_data_sampling=
+ [X86,INTEL] Control the Gather Data Sampling (GDS)
+ mitigation.
+
+ Gather Data Sampling is a hardware vulnerability which
+ allows unprivileged speculative access to data which was
+ previously stored in vector registers.
+
+ This issue is mitigated by default in updated microcode.
+ The mitigation may have a performance impact but can be
+ disabled. On systems without the microcode mitigation
+ disabling AVX serves as a mitigation.
+
+ force: Disable AVX to mitigate systems without
+ microcode mitigation. No effect if the microcode
+ mitigation is present. Known to cause crashes in
+ userspace with buggy AVX enumeration.
+
+ off: Disable GDS mitigation.
+
gcov_persist= [GCOV] When non-zero (default), profiling data for
kernel modules is saved and remains accessible via
debugfs, even when the module is unloaded/reloaded.
@@ -1402,6 +1679,12 @@
Don't use this when you are not running on the
android emulator
+ gpio-mockup.gpio_mockup_ranges
+ [HW] Sets the ranges of gpiochip of for this device.
+ Format: <start1>,<end1>,<start2>,<end2>...
+ gpio-mockup.gpio_mockup_named_lines
+ [HW] Let the driver know GPIO lines should be named.
+
gpt [EFI] Forces disk with valid GPT signature but
invalid Protective MBR to be treated as GPT. If the
primary GPT is corrupted, it enables the backup/alternate
@@ -1425,14 +1708,21 @@
Format: <unsigned int> such that (rxsize & ~0x1fffc0) == 0.
Default: 1024
- gpio-mockup.gpio_mockup_ranges
- [HW] Sets the ranges of gpiochip of for this device.
- Format: <start1>,<end1>,<start2>,<end2>...
+ hardened_usercopy=
+ [KNL] Under CONFIG_HARDENED_USERCOPY, whether
+ hardening is enabled for this boot. Hardened
+ usercopy checking is used to protect the kernel
+ from reading or writing beyond known memory
+ allocation boundaries as a proactive defense
+ against bounds-checking flaws in the kernel's
+ copy_to_user()/copy_from_user() interface.
+ on Perform hardened usercopy checks (default).
+ off Disable hardened usercopy checks.
hardlockup_all_cpu_backtrace=
[KNL] Should the hard-lockup detector generate
backtraces on all cpus.
- Format: <integer>
+ Format: 0 | 1
hashdist= [KNL,NUMA] Large hashes allocated during boot
are distributed across NUMA nodes. Defaults on
@@ -1449,6 +1739,15 @@
corresponding firmware-first mode error processing
logic will be disabled.
+ hibernate= [HIBERNATION]
+ noresume Don't check if there's a hibernation image
+ present during boot.
+ nocompress Don't compress/decompress hibernation images.
+ no Disable hibernation and resume.
+ protect_image Turn on image protection during restoration
+ (that will set all pages holding image data
+ during restoration read-only).
+
highmem=nn[KMG] [KNL,BOOT] forces the highmem zone to have an exact
size of <nn>. This works even on boxes that have no
highmem otherwise. This also works to reduce highmem
@@ -1460,6 +1759,19 @@
hlt [BUGS=ARM,SH]
+ hostname= [KNL] Set the hostname (aka UTS nodename).
+ Format: <string>
+ This allows setting the system's hostname during early
+ startup. This sets the name returned by gethostname.
+ Using this parameter to set the hostname makes it
+ possible to ensure the hostname is correctly set before
+ any userspace processes run, avoiding the possibility
+ that a process may call gethostname before the hostname
+ has been explicitly set, resulting in the calling
+ process getting an incorrect result. The string must
+ not exceed the maximum allowed hostname length (usually
+ 64 characters) and will be truncated otherwise.
+
hpet= [X86-32,HPET] option to control HPET usage
Format: { enable (default) | disable | force |
verbose }
@@ -1471,27 +1783,62 @@
hpet_mmap= [X86, HPET_MMAP] Allow userspace to mmap HPET
registers. Default set by CONFIG_HPET_MMAP_DEFAULT.
- hugetlb_cma= [HW] The size of a cma area used for allocation
- of gigantic hugepages.
- Format: nn[KMGTPE]
-
- Reserve a cma area of given size and allocate gigantic
- hugepages using the cma allocator. If enabled, the
+ hugepages= [HW] Number of HugeTLB pages to allocate at boot.
+ If this follows hugepagesz (below), it specifies
+ the number of pages of hugepagesz to be allocated.
+ If this is the first HugeTLB parameter on the command
+ line, it specifies the number of pages to allocate for
+ the default huge page size. If using node format, the
+ number of pages to allocate per-node can be specified.
+ See also Documentation/admin-guide/mm/hugetlbpage.rst.
+ Format: <integer> or (node format)
+ <node>:<integer>[,<node>:<integer>]
+
+ hugepagesz=
+ [HW] The size of the HugeTLB pages. This is used in
+ conjunction with hugepages (above) to allocate huge
+ pages of a specific size at boot. The pair
+ hugepagesz=X hugepages=Y can be specified once for
+ each supported huge page size. Huge page sizes are
+ architecture dependent. See also
+ Documentation/admin-guide/mm/hugetlbpage.rst.
+ Format: size[KMG]
+
+ hugetlb_cma= [HW,CMA] The size of a CMA area used for allocation
+ of gigantic hugepages. Or using node format, the size
+ of a CMA area per node can be specified.
+ Format: nn[KMGTPE] or (node format)
+ <node>:nn[KMGTPE][,<node>:nn[KMGTPE]]
+
+ Reserve a CMA area of given size and allocate gigantic
+ hugepages using the CMA allocator. If enabled, the
boot-time allocation of gigantic hugepages is skipped.
- hugepages= [HW,X86-32,IA-64] HugeTLB pages to allocate at boot.
- hugepagesz= [HW,IA-64,PPC,X86-64] The size of the HugeTLB pages.
- On x86-64 and powerpc, this option can be specified
- multiple times interleaved with hugepages= to reserve
- huge pages of different sizes. Valid pages sizes on
- x86-64 are 2M (when the CPU supports "pse") and 1G
- (when the CPU supports the "pdpe1gb" cpuinfo flag).
+ hugetlb_free_vmemmap=
+ [KNL] Requires CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
+ enabled.
+ Control if HugeTLB Vmemmap Optimization (HVO) is enabled.
+ Allows heavy hugetlb users to free up some more
+ memory (7 * PAGE_SIZE for each 2MB hugetlb page).
+ Format: { on | off (default) }
+
+ on: enable HVO
+ off: disable HVO
+
+ Built with CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON=y,
+ the default is on.
+
+ Note that the vmemmap pages may be allocated from the added
+ memory block itself when memory_hotplug.memmap_on_memory is
+ enabled, those vmemmap pages cannot be optimized even if this
+ feature is enabled. Other vmemmap pages not allocated from
+ the added memory block itself do not be affected.
hung_task_panic=
[KNL] Should the hung task detector generate panics.
- Format: <integer>
+ Format: 0 | 1
- A nonzero value instructs the kernel to panic when a
+ A value of 1 instructs the kernel to panic when a
hung task is detected. The default value is controlled
by the CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time
option. The value selected by this boot parameter can
@@ -1507,12 +1854,6 @@
which allow the hypervisor to 'idle' the
guest on lock contention.
- keep_bootcon [KNL]
- Do not unregister boot console at start. This is only
- useful for debugging when something happens in the window
- between unregistering the boot console and initializing
- the real console.
-
i2c_bus= [HW] Override the default board specific I2C bus speed
or register an additional I2C bus that is not
registered from board initialization code.
@@ -1547,20 +1888,11 @@
architectures force reset to be always executed
i8042.unlock [HW] Unlock (ignore) the keylock
i8042.kbdreset [HW] Reset device connected to KBD port
+ i8042.probe_defer
+ [HW] Allow deferred probing upon i8042 probe errors
i810= [HW,DRM]
- i8k.ignore_dmi [HW] Continue probing hardware even if DMI data
- indicates that the driver is running on unsupported
- hardware.
- i8k.force [HW] Activate i8k driver even if SMM BIOS signature
- does not match list of supported models.
- i8k.power_status
- [HW] Report power status in /proc/i8k
- (disabled by default)
- i8k.restricted [HW] Allow controlling fans only if SYS_ADMIN
- capability is set.
-
i915.invert_brightness=
[DRM] Invert the sense of the variable that is used to
set the brightness of the panel backlight. Normally a
@@ -1575,29 +1907,15 @@
0 -- machine default
1 -- force brightness inversion
+ ia32_emulation= [X86-64]
+ Format: <bool>
+ When true, allows loading 32-bit programs and executing 32-bit
+ syscalls, essentially overriding IA32_EMULATION_DEFAULT_DISABLED at
+ boot time. When false, unconditionally disables IA32 emulation.
+
icn= [HW,ISDN]
Format: <io>[,<membase>[,<icn_id>[,<icn_id2>]]]
- ide-core.nodma= [HW] (E)IDE subsystem
- Format: =0.0 to prevent dma on hda, =0.1 hdb =1.0 hdc
- .vlb_clock .pci_clock .noflush .nohpa .noprobe .nowerr
- .cdrom .chs .ignore_cable are additional options
- See Documentation/ide/ide.rst.
-
- ide-generic.probe-mask= [HW] (E)IDE subsystem
- Format: <int>
- Probe mask for legacy ISA IDE ports. Depending on
- platform up to 6 ports are supported, enabled by
- setting corresponding bits in the mask to 1. The
- default value is 0x0, which has a special meaning.
- On systems that have PCI, it triggers scanning the
- PCI bus for the first and the second port, which
- are then probed. On systems without PCI the value
- of 0x0 enables probing the two first ports as if it
- was 0x3.
-
- ide-pci-generic.all-generic-ide [HW] (E)IDE subsystem
- Claim all unknown PCI IDE storage controllers.
idle= [X86]
Format: idle=poll, idle=halt, idle=nomwait
@@ -1609,6 +1927,17 @@
In such case C2/C3 won't be used again.
idle=nomwait: Disable mwait for CPU C-states
+ idxd.sva= [HW]
+ Format: <bool>
+ Allow force disabling of Shared Virtual Memory (SVA)
+ support for the idxd driver. By default it is set to
+ true (1).
+
+ idxd.tc_override= [HW]
+ Format: <bool>
+ Allow override of default traffic class configuration
+ for the device. By default it is set to false (0).
+
ieee754= [MIPS] Select IEEE Std 754 conformance mode
Format: { strict | legacy | 2008 | relaxed }
Default: strict
@@ -1682,7 +2011,7 @@
ima_policy= [IMA]
The builtin policies to load during IMA setup.
Format: "tcb | appraise_tcb | secure_boot |
- fail_securely"
+ fail_securely | critical_data"
The "tcb" policy measures all programs exec'd, files
mmap'd for exec, and all files opened with the read
@@ -1701,6 +2030,9 @@
filesystems with the SB_I_UNVERIFIABLE_SIGNATURE
flag.
+ The "critical_data" policy measures kernel integrity
+ critical data.
+
ima_tcb [IMA] Deprecated. Use ima_policy= instead.
Load a policy which meets the needs of the Trusted
Computing Base. This means IMA will measure all
@@ -1709,7 +2041,8 @@
ima_template= [IMA]
Select one of defined IMA measurements template formats.
- Formats: { "ima" | "ima-ng" | "ima-sig" }
+ Formats: { "ima" | "ima-ng" | "ima-ngv2" | "ima-sig" |
+ "ima-sigv2" }
Default: "ima-ng"
ima_template_fmt=
@@ -1746,8 +2079,27 @@
initcall functions. Useful for debugging built-in
modules and initcalls.
+ initramfs_async= [KNL]
+ Format: <bool>
+ Default: 1
+ This parameter controls whether the initramfs
+ image is unpacked asynchronously, concurrently
+ with devices being probed and
+ initialized. This should normally just work,
+ but as a debugging aid, one can get the
+ historical behaviour of the initramfs
+ unpacking being completed before device_ and
+ late_ initcalls.
+
initrd= [BOOT] Specify the location of the initial ramdisk
+ initrdmem= [KNL] Specify a physical address and size from which to
+ load the initrd. If an initrd is compiled in or
+ specified in the bootparams, it takes priority over this
+ setting.
+ Format: ss[KMG],nn[KMG]
+ Default is 0, 0
+
init_on_alloc= [MM] Fill newly allocated pages and heap objects with
zeroes.
Format: 0 | 1
@@ -1757,7 +2109,7 @@
Format: 0 | 1
Default set by CONFIG_INIT_ON_FREE_DEFAULT_ON.
- init_pkru= [x86] Specify the default memory protection keys rights
+ init_pkru= [X86] Specify the default memory protection keys rights
register contents for all processes. 0x55555554 by
default (disallow access to all but pkey 0). Can
override in debugfs after boot.
@@ -1765,7 +2117,7 @@
inport.irq= [HW] Inport (ATI XL and Microsoft) busmouse driver
Format: <irq>
- int_pln_enable [x86] Enable power limit notification interrupt
+ int_pln_enable [X86] Enable power limit notification interrupt
integrity_audit=[IMA]
Format: { "0" | "1" }
@@ -1783,26 +2135,18 @@
bypassed by not enabling DMAR with this option. In
this case, gfx device will use physical address for
DMA.
- forcedac [x86_64]
- With this option iommu will not optimize to look
- for io virtual address below 32-bit forcing dual
- address cycle on pci bus for cards supporting greater
- than 32-bit addressing. The default is to look
- for translation below 32-bit and if not available
- then look in the higher range.
strict [Default Off]
- With this option on every unmap_single operation will
- result in a hardware IOTLB flush operation as opposed
- to batching them for performance.
+ Deprecated, equivalent to iommu.strict=1.
sp_off [Default Off]
By default, super page will be supported if Intel IOMMU
has the capability. With this option, super page will
not be supported.
- sm_on [Default Off]
- By default, scalable mode will be disabled even if the
- hardware advertises that it has support for the scalable
- mode translation. With this option set, scalable mode
- will be used on hardware which claims to support it.
+ sm_on
+ Enable the Intel IOMMU scalable mode if the hardware
+ advertises that it has support for the scalable mode
+ translation.
+ sm_off
+ Disallow use of the Intel IOMMU scalable mode.
tboot_noforce [Default Off]
Do not force the Intel IOMMU enabled under tboot.
By default, tboot will force Intel IOMMU on, which
@@ -1812,11 +2156,6 @@
Note that using this option lowers the security
provided by tboot because it makes the system
vulnerable to DMA attacks.
- nobounce [Default off]
- Disable bounce buffer for untrusted devices such as
- the Thunderbolt devices. This will treat the untrusted
- devices as the trusted ones, hence might expose security
- risks of DMA attacks.
intel_idle.max_cstate= [KNL,HW,ACPI,X86]
0 disables intel_idle and fall back on acpi_idle.
@@ -1826,6 +2165,16 @@
disable
Do not enable intel_pstate as the default
scaling driver for the supported processors
+ active
+ Use intel_pstate driver to bypass the scaling
+ governors layer of cpufreq and provides it own
+ algorithms for p-state selection. There are two
+ P-state selection algorithms provided by
+ intel_pstate in the active mode: powersave and
+ performance. The way they both operate depends
+ on whether or not the hardware managed P-states
+ (HWP) feature has been enabled in the processor
+ and possibly on the processor model.
passive
Use intel_pstate as a scaling driver, but configure it
to work with generic cpufreq governors (instead of
@@ -1868,7 +2217,7 @@
strict regions from userspace.
relaxed
- iommu= [x86]
+ iommu= [X86]
off
force
noforce
@@ -1878,12 +2227,20 @@
merge
nomerge
soft
- pt [x86]
- nopt [x86]
+ pt [X86]
+ nopt [X86]
nobypass [PPC/POWERNV]
Disable IOMMU bypass, using IOMMU for PCI devices.
- iommu.strict= [ARM64] Configure TLB invalidation behaviour
+ iommu.forcedac= [ARM64, X86] Control IOVA allocation for PCI devices.
+ Format: { "0" | "1" }
+ 0 - Try to allocate a 32-bit DMA address first, before
+ falling back to the full range if needed.
+ 1 - Allocate directly from the full usable range,
+ forcing Dual Address Cycle for PCI cards supporting
+ greater than 32-bit addressing.
+
+ iommu.strict= [ARM64, X86, S390] Configure TLB invalidation behaviour
Format: { "0" | "1" }
0 - Lazy mode.
Request that DMA unmap operations use deferred
@@ -1891,9 +2248,12 @@
throughput at the cost of reduced device isolation.
Will fall back to strict mode if not supported by
the relevant IOMMU driver.
- 1 - Strict mode (default).
+ 1 - Strict mode.
DMA unmap operations invalidate IOMMU hardware TLBs
synchronously.
+ unset - Use value of CONFIG_IOMMU_DEFAULT_DMA_{LAZY,STRICT}.
+ Note: on x86, strict mode specified via one of the
+ legacy driver-specific options takes precedence.
iommu.passthrough=
[ARM64, X86] Configure DMA to bypass the IOMMU by default.
@@ -1902,7 +2262,7 @@
1 - Bypass the IOMMU for DMA.
unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.
- io7= [HW] IO7 for Marvel based alpha systems
+ io7= [HW] IO7 for Marvel-based Alpha systems
See comment before marvel_specify_io7 in
arch/alpha/kernel/core_marvel.c.
@@ -2022,42 +2382,76 @@
iucv= [HW,NET]
- ivrs_ioapic [HW,X86_64]
+ ivrs_ioapic [HW,X86-64]
Provide an override to the IOAPIC-ID<->DEVICE-ID
- mapping provided in the IVRS ACPI table. For
- example, to map IOAPIC-ID decimal 10 to
- PCI device 00:14.0 write the parameter as:
+ mapping provided in the IVRS ACPI table.
+ By default, PCI segment is 0, and can be omitted.
+
+ For example, to map IOAPIC-ID decimal 10 to
+ PCI segment 0x1 and PCI device 00:14.0,
+ write the parameter as:
+ ivrs_ioapic=10@0001:00:14.0
+
+ Deprecated formats:
+ * To map IOAPIC-ID decimal 10 to PCI device 00:14.0
+ write the parameter as:
ivrs_ioapic[10]=00:14.0
+ * To map IOAPIC-ID decimal 10 to PCI segment 0x1 and
+ PCI device 00:14.0 write the parameter as:
+ ivrs_ioapic[10]=0001:00:14.0
- ivrs_hpet [HW,X86_64]
+ ivrs_hpet [HW,X86-64]
Provide an override to the HPET-ID<->DEVICE-ID
- mapping provided in the IVRS ACPI table. For
- example, to map HPET-ID decimal 0 to
- PCI device 00:14.0 write the parameter as:
+ mapping provided in the IVRS ACPI table.
+ By default, PCI segment is 0, and can be omitted.
+
+ For example, to map HPET-ID decimal 10 to
+ PCI segment 0x1 and PCI device 00:14.0,
+ write the parameter as:
+ ivrs_hpet=10@0001:00:14.0
+
+ Deprecated formats:
+ * To map HPET-ID decimal 0 to PCI device 00:14.0
+ write the parameter as:
ivrs_hpet[0]=00:14.0
+ * To map HPET-ID decimal 10 to PCI segment 0x1 and
+ PCI device 00:14.0 write the parameter as:
+ ivrs_ioapic[10]=0001:00:14.0
- ivrs_acpihid [HW,X86_64]
+ ivrs_acpihid [HW,X86-64]
Provide an override to the ACPI-HID:UID<->DEVICE-ID
- mapping provided in the IVRS ACPI table. For
- example, to map UART-HID:UID AMD0020:0 to
- PCI device 00:14.5 write the parameter as:
+ mapping provided in the IVRS ACPI table.
+ By default, PCI segment is 0, and can be omitted.
+
+ For example, to map UART-HID:UID AMD0020:0 to
+ PCI segment 0x1 and PCI device ID 00:14.5,
+ write the parameter as:
+ ivrs_acpihid=AMD0020:0@0001:00:14.5
+
+ Deprecated formats:
+ * To map UART-HID:UID AMD0020:0 to PCI segment is 0,
+ PCI device ID 00:14.5, write the parameter as:
ivrs_acpihid[00:14.5]=AMD0020:0
+ * To map UART-HID:UID AMD0020:0 to PCI segment 0x1 and
+ PCI device ID 00:14.5, write the parameter as:
+ ivrs_acpihid[0001:00:14.5]=AMD0020:0
js= [HW,JOY] Analog joystick
See Documentation/input/joydev/joystick.rst.
- nokaslr [KNL]
- When CONFIG_RANDOMIZE_BASE is set, this disables
- kernel and module base offset ASLR (Address Space
- Layout Randomization).
-
kasan_multi_shot
[KNL] Enforce KASAN (Kernel Address Sanitizer) to print
report on every invalid memory access. Without this
parameter KASAN will print report only for the first
invalid access.
- keepinitrd [HW,ARM]
+ keep_bootcon [KNL]
+ Do not unregister boot console at start. This is only
+ useful for debugging when something happens in the window
+ between unregistering the boot console and initializing
+ the real console.
+
+ keepinitrd [HW,ARM] See retain_initrd.
kernelcore= [KNL,X86,IA-64,PPC]
Format: nn[KMGTPE] | nn% | "mirror"
@@ -2105,10 +2499,25 @@
kms, kbd format: kms,kbd
kms, kbd and serial format: kms,kbd,<ser_dev>[,baud]
+ kgdboc_earlycon= [KGDB,HW]
+ If the boot console provides the ability to read
+ characters and can work in polling mode, you can use
+ this parameter to tell kgdb to use it as a backend
+ until the normal console is registered. Intended to
+ be used together with the kgdboc parameter which
+ specifies the normal console to transition to.
+
+ The name of the early console should be specified
+ as the value of this parameter. Note that the name of
+ the early console might be different than the tty
+ name passed to kgdboc. It's OK to leave the value
+ blank and the first boot console that implements
+ read() will be picked.
+
kgdbwait [KGDB] Stop kernel execution and enter the
kernel debugger at the earliest opportunity.
- kmac= [MIPS] korina ethernet MAC address.
+ kmac= [MIPS] Korina ethernet MAC address.
Configure the RouterBoard 532 series on-chip
Ethernet adapter MAC address.
@@ -2137,16 +2546,43 @@
0: force disabled
1: force enabled
+ kunit.enable= [KUNIT] Enable executing KUnit tests. Requires
+ CONFIG_KUNIT to be set to be fully enabled. The
+ default value can be overridden via
+ KUNIT_DEFAULT_ENABLED.
+ Default is 1 (enabled)
+
kvm.ignore_msrs=[KVM] Ignore guest accesses to unhandled MSRs.
Default is 0 (don't ignore, but inject #GP)
+ kvm.eager_page_split=
+ [KVM,X86] Controls whether or not KVM will try to
+ proactively split all huge pages during dirty logging.
+ Eager page splitting reduces interruptions to vCPU
+ execution by eliminating the write-protection faults
+ and MMU lock contention that would otherwise be
+ required to split huge pages lazily.
+
+ VM workloads that rarely perform writes or that write
+ only to a small region of VM memory may benefit from
+ disabling eager page splitting to allow huge pages to
+ still be used for reads.
+
+ The behavior of eager page splitting depends on whether
+ KVM_DIRTY_LOG_INITIALLY_SET is enabled or disabled. If
+ disabled, all huge pages in a memslot will be eagerly
+ split when dirty logging is enabled on that memslot. If
+ enabled, eager page splitting will be performed during
+ the KVM_CLEAR_DIRTY ioctl, and only for the pages being
+ cleared.
+
+ Eager page splitting is only supported when kvm.tdp_mmu=Y.
+
+ Default is Y (on).
+
kvm.enable_vmware_backdoor=[KVM] Support VMware backdoor PV interface.
Default is false (don't support).
- kvm.mmu_audit= [KVM] This is a R/W parameter which allows audit
- KVM MMU at runtime.
- Default is 0 (off)
-
kvm.nx_huge_pages=
[KVM] Controls the software workaround for the
X86_BUG_ITLB_MULTIHIT bug.
@@ -2164,14 +2600,42 @@
[KVM] Controls how many 4KiB pages are periodically zapped
back to huge pages. 0 disables the recovery, otherwise if
the value is N KVM will zap 1/Nth of the 4KiB pages every
- minute. The default is 60.
+ period (see below). The default is 60.
- kvm-amd.nested= [KVM,AMD] Allow nested virtualization in KVM/SVM.
- Default is 1 (enabled)
+ kvm.nx_huge_pages_recovery_period_ms=
+ [KVM] Controls the time period at which KVM zaps 4KiB pages
+ back to huge pages. If the value is a non-zero N, KVM will
+ zap a portion (see ratio above) of the pages every N msecs.
+ If the value is 0 (the default), KVM will pick a period based
+ on the ratio, such that a page is zapped after 1 hour on average.
+
+ kvm-amd.nested= [KVM,AMD] Control nested virtualization feature in
+ KVM/SVM. Default is 1 (enabled).
+
+ kvm-amd.npt= [KVM,AMD] Control KVM's use of Nested Page Tables,
+ a.k.a. Two-Dimensional Page Tables. Default is 1
+ (enabled). Disable by KVM if hardware lacks support
+ for NPT.
+
+ kvm-arm.mode=
+ [KVM,ARM] Select one of KVM/arm64's modes of operation.
- kvm-amd.npt= [KVM,AMD] Disable nested paging (virtualized MMU)
- for all guests.
- Default is 1 (enabled) if in 64-bit or 32-bit PAE mode.
+ none: Forcefully disable KVM.
+
+ nvhe: Standard nVHE-based mode, without support for
+ protected guests.
+
+ protected: nVHE-based mode with support for guests whose
+ state is kept private from the host.
+
+ nested: VHE-based mode with support for nested
+ virtualization. Requires at least ARMv8.3
+ hardware.
+
+ Defaults to VHE/nVHE based on hardware support. Setting
+ mode to "protected" will disable kexec and hibernation
+ for the host. "nested" is experimental and should be
+ used with extreme caution.
kvm-arm.vgic_v3_group0_trap=
[KVM,ARM] Trap guest accesses to GICv3 group-0
@@ -2189,26 +2653,41 @@
[KVM,ARM] Allow use of GICv4 for direct injection of
LPIs.
- kvm-intel.ept= [KVM,Intel] Disable extended page tables
- (virtualized MMU) support on capable Intel chips.
- Default is 1 (enabled)
+ kvm_cma_resv_ratio=n [PPC]
+ Reserves given percentage from system memory area for
+ contiguous memory allocation for KVM hash pagetable
+ allocation.
+ By default it reserves 5% of total system memory.
+ Format: <integer>
+ Default: 5
+
+ kvm-intel.ept= [KVM,Intel] Control KVM's use of Extended Page Tables,
+ a.k.a. Two-Dimensional Page Tables. Default is 1
+ (enabled). Disable by KVM if hardware lacks support
+ for EPT.
kvm-intel.emulate_invalid_guest_state=
- [KVM,Intel] Enable emulation of invalid guest states
- Default is 0 (disabled)
+ [KVM,Intel] Control whether to emulate invalid guest
+ state. Ignored if kvm-intel.enable_unrestricted_guest=1,
+ as guest state is never invalid for unrestricted
+ guests. This param doesn't apply to nested guests (L2),
+ as KVM never emulates invalid L2 guest state.
+ Default is 1 (enabled).
kvm-intel.flexpriority=
- [KVM,Intel] Disable FlexPriority feature (TPR shadow).
- Default is 1 (enabled)
+ [KVM,Intel] Control KVM's use of FlexPriority feature
+ (TPR shadow). Default is 1 (enabled). Disable by KVM if
+ hardware lacks support for it.
kvm-intel.nested=
- [KVM,Intel] Enable VMX nesting (nVMX).
- Default is 0 (disabled)
+ [KVM,Intel] Control nested virtualization feature in
+ KVM/VMX. Default is 1 (enabled).
kvm-intel.unrestricted_guest=
- [KVM,Intel] Disable unrestricted guest feature
- (virtualized real and unpaged mode) on capable
- Intel chips. Default is 1 (enabled)
+ [KVM,Intel] Control KVM's use of unrestricted guest
+ feature (virtualized real and unpaged mode). Default
+ is 1 (enabled). Disable by KVM if EPT is disabled or
+ hardware lacks support for it.
kvm-intel.vmentry_l1d_flush=[KVM,Intel] Mitigation for L1 Terminal Fault
CVE-2018-3620.
@@ -2222,9 +2701,27 @@
Default is cond (do L1 cache flush in specific instances)
- kvm-intel.vpid= [KVM,Intel] Disable Virtual Processor Identification
- feature (tagged TLBs) on capable Intel chips.
- Default is 1 (enabled)
+ kvm-intel.vpid= [KVM,Intel] Control KVM's use of Virtual Processor
+ Identification feature (tagged TLBs). Default is 1
+ (enabled). Disable by KVM if hardware lacks support
+ for it.
+
+ l1d_flush= [X86,INTEL]
+ Control mitigation for L1D based snooping vulnerability.
+
+ Certain CPUs are vulnerable to an exploit against CPU
+ internal buffers which can forward information to a
+ disclosure gadget under certain conditions.
+
+ In vulnerable processors, the speculatively
+ forwarded data can be used in a cache side channel
+ attack, to access data to which the attacker does
+ not have direct access.
+
+ This parameter controls the mitigation. The
+ options are:
+
+ on - enable the interface for the mitigation
l1tf= [X86] Control mitigation of the L1TF vulnerability on
affected CPUs
@@ -2298,9 +2795,10 @@
lapic [X86-32,APIC] Enable the local APIC even if BIOS
disabled it.
- lapic= [x86,APIC] "notscdeadline" Do not use TSC deadline
+ lapic= [X86,APIC] Do not use TSC deadline
value for LAPIC timer one-shot implementation. Default
back to the programmable timer unit in the LAPIC.
+ Format: notscdeadline
lapic_timer_c2_ok [X86,APIC] trust the local apic timer
in C2 power state.
@@ -2321,14 +2819,14 @@
when set.
Format: <int>
- libata.force= [LIBATA] Force configurations. The format is comma
- separated list of "[ID:]VAL" where ID is
- PORT[.DEVICE]. PORT and DEVICE are decimal numbers
- matching port, link or device. Basically, it matches
- the ATA ID string printed on console by libata. If
- the whole ID part is omitted, the last PORT and DEVICE
- values are used. If ID hasn't been specified yet, the
- configuration applies to all ports, links and devices.
+ libata.force= [LIBATA] Force configurations. The format is a comma-
+ separated list of "[ID:]VAL" where ID is PORT[.DEVICE].
+ PORT and DEVICE are decimal numbers matching port, link
+ or device. Basically, it matches the ATA ID string
+ printed on console by libata. If the whole ID part is
+ omitted, the last PORT and DEVICE values are used. If
+ ID hasn't been specified yet, the configuration applies
+ to all ports, links and devices.
If only DEVICE is omitted, the parameter applies to
the port and all links and devices behind it. DEVICE
@@ -2338,7 +2836,7 @@
host link and device attached to it.
The VAL specifies the configuration to force. As long
- as there's no ambiguity shortcut notation is allowed.
+ as there is no ambiguity, shortcut notation is allowed.
For example, both 1.5 and 1.5G would work for 1.5Gbps.
The following configurations can be forced.
@@ -2351,29 +2849,68 @@
udma[/][16,25,33,44,66,100,133] notation is also
allowed.
+ * nohrst, nosrst, norst: suppress hard, soft and both
+ resets.
+
+ * rstonce: only attempt one reset during hot-unplug
+ link recovery.
+
+ * [no]dbdelay: Enable or disable the extra 200ms delay
+ before debouncing a link PHY and device presence
+ detection.
+
* [no]ncq: Turn on or off NCQ.
- * [no]ncqtrim: Turn off queued DSM TRIM.
+ * [no]ncqtrim: Enable or disable queued DSM TRIM.
+
+ * [no]ncqati: Enable or disable NCQ trim on ATI chipset.
+
+ * [no]trim: Enable or disable (unqueued) TRIM.
+
+ * trim_zero: Indicate that TRIM command zeroes data.
+
+ * max_trim_128m: Set 128M maximum trim size limit.
+
+ * [no]dma: Turn on or off DMA transfers.
+
+ * atapi_dmadir: Enable ATAPI DMADIR bridge support.
+
+ * atapi_mod16_dma: Enable the use of ATAPI DMA for
+ commands that are not a multiple of 16 bytes.
+
+ * [no]dmalog: Enable or disable the use of the
+ READ LOG DMA EXT command to access logs.
+
+ * [no]iddevlog: Enable or disable access to the
+ identify device data log.
+
+ * [no]logdir: Enable or disable access to the general
+ purpose log directory.
- * nohrst, nosrst, norst: suppress hard, soft
- and both resets.
+ * max_sec_128: Set transfer size limit to 128 sectors.
- * rstonce: only attempt one reset during
- hot-unplug link recovery
+ * max_sec_1024: Set or clear transfer size limit to
+ 1024 sectors.
- * dump_id: dump IDENTIFY data.
+ * max_sec_lba48: Set or clear transfer size limit to
+ 65535 sectors.
- * atapi_dmadir: Enable ATAPI DMADIR bridge support
+ * [no]lpm: Enable or disable link power management.
+
+ * [no]setxfer: Indicate if transfer speed mode setting
+ should be skipped.
+
+ * [no]fua: Disable or enable FUA (Force Unit Access)
+ support for devices supporting this feature.
+
+ * dump_id: Dump IDENTIFY data.
* disable: Disable this device.
If there are multiple matching configurations changing
the same attribute, the last one is used.
- memblock=debug [KNL] Enable memblock debug messages.
-
- load_ramdisk= [RAM] List of ramdisks to load from floppy
- See Documentation/admin-guide/blockdev/ramdisk.rst.
+ load_ramdisk= [RAM] [Deprecated]
lockd.nlm_grace_period=P [NFS] Assign grace period.
Format: <integer>
@@ -2396,6 +2933,38 @@
to extract confidential information from the kernel
are also disabled.
+ locktorture.acq_writer_lim= [KNL]
+ Set the time limit in jiffies for a lock
+ acquisition. Acquisitions exceeding this limit
+ will result in a splat once they do complete.
+
+ locktorture.bind_readers= [KNL]
+ Specify the list of CPUs to which the readers are
+ to be bound.
+
+ locktorture.bind_writers= [KNL]
+ Specify the list of CPUs to which the writers are
+ to be bound.
+
+ locktorture.call_rcu_chains= [KNL]
+ Specify the number of self-propagating call_rcu()
+ chains to set up. These are used to ensure that
+ there is a high probability of an RCU grace period
+ in progress at any given time. Defaults to 0,
+ which disables these call_rcu() chains.
+
+ locktorture.long_hold= [KNL]
+ Specify the duration in milliseconds for the
+ occasional long-duration lock hold time. Defaults
+ to 100 milliseconds. Select 0 to disable.
+
+ locktorture.nested_locks= [KNL]
+ Specify the maximum lock nesting depth that
+ locktorture is to exercise, up to a limit of 8
+ (MAX_NESTED_LOCKS). Specify zero to disable.
+ Note that this parameter is ineffective on types
+ of locks that do not support nested acquisition.
+
locktorture.nreaders_stress= [KNL]
Set the number of locking read-acquisition kthreads.
Defaults to being automatically set based on the
@@ -2411,6 +2980,25 @@
Set time (s) between CPU-hotplug operations, or
zero to disable CPU-hotplug testing.
+ locktorture.rt_boost= [KNL]
+ Do periodic testing of real-time lock priority
+ boosting. Select 0 to disable, 1 to boost
+ only rt_mutex, and 2 to boost unconditionally.
+ Defaults to 2, which might seem to be an
+ odd choice, but which should be harmless for
+ non-real-time spinlocks, due to their disabling
+ of preemption. Note that non-realtime mutexes
+ disable boosting.
+
+ locktorture.rt_boost_factor= [KNL]
+ Number that determines how often and for how
+ long priority boosting is exercised. This is
+ scaled down by the number of writers, so that the
+ number of boosts per unit time remains roughly
+ constant as the number of writers increases.
+ On the other hand, the duration of each boost
+ increases with the number of writers.
+
locktorture.shuffle_interval= [KNL]
Set task-shuffle interval (jiffies). Shuffling
tasks allows some CPUs to go into dyntick-idle
@@ -2436,6 +3024,10 @@
locktorture.verbose= [KNL]
Enable additional printk() statements.
+ locktorture.writer_fifo= [KNL]
+ Run the write-side locktorture kthreads at
+ sched_set_fifo() real-time priority.
+
logibm.irq= [HW,MOUSE] Logitech Bus Mouse Driver
Format: <irq>
@@ -2510,11 +3102,11 @@
(machvec) in a generic kernel.
Example: machvec=hpzx1
- machtype= [Loongson] Share the same kernel image file between different
- yeeloong laptop.
+ machtype= [Loongson] Share the same kernel image file between
+ different yeeloong laptops.
Example: machtype=lemote-yeeloong-2f-7inch
- max_addr=nn[KMG] [KNL,BOOT,ia64] All physical memory greater
+ max_addr=nn[KMG] [KNL,BOOT,IA-64] All physical memory greater
than or equal to this physical address is ignored.
maxcpus= [SMP] Maximum number of processors that an SMP kernel
@@ -2535,7 +3127,7 @@
mce [X86-32] Machine Check Exception
- mce=option [X86-64] See Documentation/x86/x86_64/boot-options.rst
+ mce=option [X86-64] See Documentation/arch/x86/x86_64/boot-options.rst
md= [HW] RAID subsystems devices and level
See Documentation/admin-guide/md.rst.
@@ -2576,6 +3168,9 @@
For details see: Documentation/admin-guide/hw-vuln/mds.rst
+ mem=nn[KMG] [HEXAGON] Set the memory size.
+ Must be specified, otherwise memory size will be 0.
+
mem=nn[KMG] [KNL,BOOT] Force usage of a specific amount of memory
Amount of memory to be used in cases as follows:
@@ -2583,6 +3178,13 @@
2 when the kernel is not able to see the whole system memory;
3 memory that lies after 'mem=' boundary is excluded from
the hypervisor, then assigned to KVM guests.
+ 4 to limit the memory available for kdump kernel.
+
+ [ARC,MICROBLAZE] - the limit applies only to low memory,
+ high memory is not affected.
+
+ [ARM64] - only limits memory covered by the linear
+ mapping. The NOMAP regions are not affected.
[X86] Work as limiting max address. Use together
with memmap= to avoid physical address space collisions.
@@ -2593,14 +3195,24 @@
in above case 3, memory may need be hot added after boot
if system memory of hypervisor is not sufficient.
+ mem=nn[KMG]@ss[KMG]
+ [ARM,MIPS] - override the memory layout reported by
+ firmware.
+ Define a memory region of size nn[KMG] starting at
+ ss[KMG].
+ Multiple different regions can be specified with
+ multiple mem= parameters on the command line.
+
mem=nopentium [BUGS=X86-32] Disable usage of 4MB pages for kernel
memory.
+ memblock=debug [KNL] Enable memblock debug messages.
+
memchunk=nn[KMG]
[KNL,SH] Allow user to override the default size for
per-device physically contiguous DMA buffers.
- memhp_default_state=online/offline
+ memhp_default_state=online/offline/online_kernel/online_movable
[KNL] Set the initial state for the memory hotplug
onlining policy. If not specified, the default value is
set according to the
@@ -2615,7 +3227,7 @@
option description.
memmap=nn[KMG]@ss[KMG]
- [KNL] Force usage of a specific region of memory.
+ [KNL, X86, MIPS, XTENSA] Force usage of a specific region of memory.
Region of memory to be used is from ss to ss+nn.
If @ss[KMG] is omitted, it is equivalent to mem=nn[KMG],
which limits max address to nn[KMG].
@@ -2677,7 +3289,26 @@
seconds. Use this parameter to check at some
other rate. 0 disables periodic checking.
- memtest= [KNL,X86,ARM,PPC] Enable memtest
+ memory_hotplug.memmap_on_memory
+ [KNL,X86,ARM] Boolean flag to enable this feature.
+ Format: {on | off (default)}
+ When enabled, runtime hotplugged memory will
+ allocate its internal metadata (struct pages,
+ those vmemmap pages cannot be optimized even
+ if hugetlb_free_vmemmap is enabled) from the
+ hotadded memory which will allow to hotadd a
+ lot of memory without requiring additional
+ memory to do so.
+ This feature is disabled by default because it
+ has some implication on large (e.g. GB)
+ allocations in some configurations (e.g. small
+ memory blocks).
+ The state of the flag can be read in
+ /sys/module/memory_hotplug/parameters/memmap_on_memory.
+ Note that even when enabled, there are a few cases where
+ the feature is not effective.
+
+ memtest= [KNL,X86,ARM,M68K,PPC,RISCV] Enable memtest
Format: <integer>
default : 0 <disable>
Specifies the number of memtest passes to be
@@ -2695,7 +3326,7 @@
mem_encrypt=on: Activate SME
mem_encrypt=off: Do not activate SME
- Refer to Documentation/virt/kvm/amd-memory-encryption.rst
+ Refer to Documentation/virt/kvm/x86/amd-memory-encryption.rst
for details on when memory encryption can be activated.
mem_sleep_default= [SUSPEND] Default system suspend mode:
@@ -2704,9 +3335,6 @@
deep - Suspend-To-RAM or equivalent (if supported)
See Documentation/admin-guide/pm/sleep-states.rst.
- meye.*= [HW] Set MotionEye Camera parameters
- See Documentation/media/v4l-drivers/meye.rst.
-
mfgpt_irq= [IA-32] Specify the IRQ to use for the
Multi-Function General Purpose Timers on AMD Geode
platforms.
@@ -2718,7 +3346,12 @@
mga= [HW,DRM]
- min_addr=nn[KMG] [KNL,BOOT,ia64] All physical memory below this
+ microcode.force_minrev= [X86]
+ Format: <bool>
+ Enable or disable the microcode minimal revision
+ enforcement for the runtime microcode loader.
+
+ min_addr=nn[KMG] [KNL,BOOT,IA-64] All physical memory below this
physical address is ignored.
mini2440= [ARM,HW,KNL]
@@ -2740,7 +3373,7 @@
touchscreen support is not enabled in the mainstream
kernel as of 2.6.30, a preliminary port can be found
in the "bleeding edge" mini2440 support kernel at
- http://repo.or.cz/w/linux-2.6/mini2440.git
+ https://repo.or.cz/w/linux-2.6/mini2440.git
mitigations=
[X86,PPC,S390,ARM64] Control optional mitigations for
@@ -2752,18 +3385,25 @@
Disable all optional CPU mitigations. This
improves system performance, but it may also
expose users to several CPU vulnerabilities.
- Equivalent to: nopti [X86,PPC]
- kpti=0 [ARM64]
- nospectre_v1 [X86,PPC]
+ Equivalent to: if nokaslr then kpti=0 [ARM64]
+ gather_data_sampling=off [X86]
+ kvm.nx_huge_pages=off [X86]
+ l1tf=off [X86]
+ mds=off [X86]
+ mmio_stale_data=off [X86]
+ no_entry_flush [PPC]
+ no_uaccess_flush [PPC]
nobp=0 [S390]
+ nopti [X86,PPC]
+ nospectre_bhb [ARM64]
+ nospectre_v1 [X86,PPC]
nospectre_v2 [X86,PPC,S390,ARM64]
- spectre_v2_user=off [X86]
+ retbleed=off [X86]
spec_store_bypass_disable=off [X86,PPC]
+ spectre_v2_user=off [X86]
+ srbds=off [X86,INTEL]
ssbd=force-off [ARM64]
- l1tf=off [X86]
- mds=off [X86]
tsx_async_abort=off [X86]
- kvm.nx_huge_pages=off [X86]
Exceptions:
This does not have any effect on
@@ -2785,6 +3425,8 @@
Equivalent to: l1tf=flush,nosmt [X86]
mds=full,nosmt [X86]
tsx_async_abort=full,nosmt [X86]
+ mmio_stale_data=full,nosmt [X86]
+ retbleed=auto,nosmt [X86]
mminit_loglevel=
[KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
@@ -2794,6 +3436,62 @@
log everything. Information is printed at KERN_DEBUG
so loglevel=8 may also need to be specified.
+ mmio_stale_data=
+ [X86,INTEL] Control mitigation for the Processor
+ MMIO Stale Data vulnerabilities.
+
+ Processor MMIO Stale Data is a class of
+ vulnerabilities that may expose data after an MMIO
+ operation. Exposed data could originate or end in
+ the same CPU buffers as affected by MDS and TAA.
+ Therefore, similar to MDS and TAA, the mitigation
+ is to clear the affected CPU buffers.
+
+ This parameter controls the mitigation. The
+ options are:
+
+ full - Enable mitigation on vulnerable CPUs
+
+ full,nosmt - Enable mitigation and disable SMT on
+ vulnerable CPUs.
+
+ off - Unconditionally disable mitigation
+
+ On MDS or TAA affected machines,
+ mmio_stale_data=off can be prevented by an active
+ MDS or TAA mitigation as these vulnerabilities are
+ mitigated with the same mechanism so in order to
+ disable this mitigation, you need to specify
+ mds=off and tsx_async_abort=off too.
+
+ Not specifying this option is equivalent to
+ mmio_stale_data=full.
+
+ For details see:
+ Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
+
+ <module>.async_probe[=<bool>] [KNL]
+ If no <bool> value is specified or if the value
+ specified is not a valid <bool>, enable asynchronous
+ probe on this module. Otherwise, enable/disable
+ asynchronous probe on this module as indicated by the
+ <bool> value. See also: module.async_probe
+
+ module.async_probe=<bool>
+ [KNL] When set to true, modules will use async probing
+ by default. To enable/disable async probing for a
+ specific module, use the module specific control that
+ is documented under <module>.async_probe. When both
+ module.async_probe and <module>.async_probe are
+ specified, <module>.async_probe takes precedence for
+ the specific module.
+
+ module.enable_dups_trace
+ [KNL] When CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS is set,
+ this means that duplicate request_module() calls will
+ trigger a WARN_ON() instead of a pr_warn(). Note that
+ if MODULE_DEBUG_AUTOLOAD_DUPS_TRACE is set, WARN_ON()s
+ will always be issued and this option does nothing.
module.sig_enforce
[KNL] When CONFIG_MODULE_SIG is set, this means that
modules without (valid) signatures will fail to load.
@@ -2840,29 +3538,19 @@
mtdparts= [MTD]
See drivers/mtd/parsers/cmdlinepart.c
- multitce=off [PPC] This parameter disables the use of the pSeries
- firmware feature for updating multiple TCE entries
- at a time.
-
- onenand.bdry= [HW,MTD] Flex-OneNAND Boundary Configuration
-
- Format: [die0_boundary][,die0_lock][,die1_boundary][,die1_lock]
-
- boundary - index of last SLC block on Flex-OneNAND.
- The remaining blocks are configured as MLC blocks.
- lock - Configure if Flex-OneNAND boundary should be locked.
- Once locked, the boundary cannot be changed.
- 1 indicates lock status, 0 indicates unlock status.
-
mtdset= [ARM]
ARM/S3C2412 JIVE boot control
- See arch/arm/mach-s3c2412/mach-jive.c
+ See arch/arm/mach-s3c/mach-jive.c
mtouchusb.raw_coordinates=
[HW] Make the MicroTouch USB driver use raw coordinates
('y', default) or cooked coordinates ('n')
+ mtrr=debug [X86]
+ Enable printing debug information related to MTRR
+ registers at boot time.
+
mtrr_chunk_size=nn[KMG] [X86]
used for mtrr cleanup. It is largest continuous chunk
that could hold holes aka. UC entries.
@@ -2880,6 +3568,10 @@
Used for mtrr cleanup. It is spare mtrr entries number.
Set to 2 or more if your graphical card needs more.
+ multitce=off [PPC] This parameter disables the use of the pSeries
+ firmware feature for updating multiple TCE entries
+ at a time.
+
n2= [NET] SDL Inc. RISCom/N2 synchronous serial card
netdev= [NET] Network devices parameters
@@ -2889,20 +3581,24 @@
This usage is only documented in each driver source
file if at all.
+ netpoll.carrier_timeout=
+ [NET] Specifies amount of time (in seconds) that
+ netpoll should wait for a carrier. By default netpoll
+ waits 4 seconds.
+
nf_conntrack.acct=
[NETFILTER] Enable connection tracking flow accounting
0 to disable accounting
1 to enable accounting
Default value is 0.
- nfsaddrs= [NFS] Deprecated. Use ip= instead.
- See Documentation/admin-guide/nfs/nfsroot.rst.
-
- nfsroot= [NFS] nfs root filesystem for disk-less boxes.
- See Documentation/admin-guide/nfs/nfsroot.rst.
+ nfs.cache_getent=
+ [NFS] sets the pathname to the program which is used
+ to update the NFS client cache entries.
- nfsrootdebug [NFS] enable nfsroot debugging messages.
- See Documentation/admin-guide/nfs/nfsroot.rst.
+ nfs.cache_getent_timeout=
+ [NFS] sets the timeout after which an attempt to
+ update a cache entry is deemed to have failed.
nfs.callback_nr_threads=
[NFSv4] set the total number of threads that the
@@ -2913,17 +3609,12 @@
[NFS] set the TCP port on which the NFSv4 callback
channel should listen.
- nfs.cache_getent=
- [NFS] sets the pathname to the program which is used
- to update the NFS client cache entries.
-
- nfs.cache_getent_timeout=
- [NFS] sets the timeout after which an attempt to
- update a cache entry is deemed to have failed.
-
- nfs.idmap_cache_timeout=
- [NFS] set the maximum lifetime for idmapper cache
- entries.
+ nfs.delay_retrans=
+ [NFS] specifies the number of times the NFSv4 client
+ retries the request before returning an EAGAIN error,
+ after a reply of NFS4ERR_DELAY from the server.
+ Only applies if the softerr mount option is enabled,
+ and the specified value is >= 0.
nfs.enable_ino64=
[NFS] enable 64-bit inode numbers.
@@ -2932,6 +3623,10 @@
of returning the full 64-bit number.
The default is to return 64-bit inode numbers.
+ nfs.idmap_cache_timeout=
+ [NFS] set the maximum lifetime for idmapper cache
+ entries.
+
nfs.max_session_cb_slots=
[NFSv4.1] Sets the maximum number of session
slots the client will assign to the callback
@@ -2959,21 +3654,14 @@
will be autodetected by the client, and it will fall
back to using the idmapper.
To turn off this behaviour, set the value to '0'.
+
nfs.nfs4_unique_id=
[NFS4] Specify an additional fixed unique ident-
ification string that NFSv4 clients can insert into
their nfs_client_id4 string. This is typically a
UUID that is generated at system install time.
- nfs.send_implementation_id =
- [NFSv4.1] Send client implementation identification
- information in exchange_id requests.
- If zero, no implementation identification information
- will be sent.
- The default is to send the implementation identification
- information.
-
- nfs.recover_lost_locks =
+ nfs.recover_lost_locks=
[NFSv4] Attempt to recover locks that were lost due
to a lease timeout on the server. Please note that
doing this risks data corruption, since there are
@@ -2985,7 +3673,15 @@
The default parameter value of '0' causes the kernel
not to attempt recovery of lost locks.
- nfs4.layoutstats_timer =
+ nfs.send_implementation_id=
+ [NFSv4.1] Send client implementation identification
+ information in exchange_id requests.
+ If zero, no implementation identification information
+ will be sent.
+ The default is to send the implementation identification
+ information.
+
+ nfs4.layoutstats_timer=
[NFSv4.2] Change the rate at which the kernel sends
layoutstats to the pNFS metadata server.
@@ -2994,6 +3690,11 @@
driver. A non-zero value sets the minimum interval
in seconds between layoutstats transmissions.
+ nfsd.inter_copy_offload_enable=
+ [NFSv4.2] When set to 1, the server will support
+ server-to-server copies for which this server is
+ the destination of the copy.
+
nfsd.nfs4_disable_idmapping=
[NFSv4] When set to the default of '1', the NFSv4
server will return only numeric uids and gids to
@@ -3001,6 +3702,27 @@
and gids from such clients. This is intended to ease
migration from NFSv2/v3.
+ nfsd.nfsd4_ssc_umount_timeout=
+ [NFSv4.2] When used as the destination of a
+ server-to-server copy, knfsd temporarily mounts
+ the source server. It caches the mount in case
+ it will be needed again, and discards it if not
+ used for the number of milliseconds specified by
+ this parameter.
+
+ nfsaddrs= [NFS] Deprecated. Use ip= instead.
+ See Documentation/admin-guide/nfs/nfsroot.rst.
+
+ nfsroot= [NFS] nfs root filesystem for disk-less boxes.
+ See Documentation/admin-guide/nfs/nfsroot.rst.
+
+ nfsrootdebug [NFS] enable nfsroot debugging messages.
+ See Documentation/admin-guide/nfs/nfsroot.rst.
+
+ nmi_backtrace.backtrace_idle [KNL]
+ Dump stacks even of idle CPUs in response to an
+ NMI stack-backtrace request.
+
nmi_debug= [KNL,SH] Specify one or more actions to take
when a NMI is triggered.
Format: [state][,regs][,debounce][,die]
@@ -3021,43 +3743,15 @@
These settings can be accessed at runtime via
the nmi_watchdog and hardlockup_panic sysctls.
- netpoll.carrier_timeout=
- [NET] Specifies amount of time (in seconds) that
- netpoll should wait for a carrier. By default netpoll
- waits 4 seconds.
-
no387 [BUGS=X86-32] Tells the kernel to use the 387 maths
emulation library even if a 387 maths coprocessor
is present.
- no5lvl [X86-64] Disable 5-level paging mode. Forces
- kernel to use 4-level paging instead.
+ no4lvl [RISCV] Disable 4-level and 5-level paging modes. Forces
+ kernel to use 3-level paging instead.
- no_console_suspend
- [HW] Never suspend the console
- Disable suspending of consoles during suspend and
- hibernate operations. Once disabled, debugging
- messages can reach various consoles while the rest
- of the system is being put to sleep (ie, while
- debugging driver suspend/resume hooks). This may
- not work reliably with all consoles, but is known
- to work with serial and VGA consoles.
- To facilitate more flexible debugging, we also add
- console_suspend, a printk module parameter to control
- it. Users could use console_suspend (usually
- /sys/module/printk/parameters/console_suspend) to
- turn on/off it dynamically.
-
- novmcoredd [KNL,KDUMP]
- Disable device dump. Device dump allows drivers to
- append dump data to vmcore so you can collect driver
- specified debug info. Drivers can append the data
- without any limit and this data is stored in memory,
- so this may cause significant memory stress. Disabling
- device dump can help save memory but the driver debug
- data will be no longer available. This parameter
- is only available when CONFIG_PROC_VMCORE_DEVICE_DUMP
- is set.
+ no5lvl [X86-64,RISCV] Disable 5-level paging mode. Forces
+ kernel to use 4-level paging instead.
noaliencache [MM, NUMA, SLAB] Disables the allocation of alien
caches in the slab allocator. Saves per-node memory,
@@ -3073,33 +3767,33 @@
noautogroup Disable scheduler automatic task group creation.
- nobats [PPC] Do not use BATs for mapping kernel lowmem
- on "Classic" PPC cores.
-
nocache [ARM]
- noclflush [BUGS=X86] Don't use the CLFLUSH instruction
+ no_console_suspend
+ [HW] Never suspend the console
+ Disable suspending of consoles during suspend and
+ hibernate operations. Once disabled, debugging
+ messages can reach various consoles while the rest
+ of the system is being put to sleep (ie, while
+ debugging driver suspend/resume hooks). This may
+ not work reliably with all consoles, but is known
+ to work with serial and VGA consoles.
+ To facilitate more flexible debugging, we also add
+ console_suspend, a printk module parameter to control
+ it. Users could use console_suspend (usually
+ /sys/module/printk/parameters/console_suspend) to
+ turn on/off it dynamically.
- nodelayacct [KNL] Disable per-task delay accounting
+ no_debug_objects
+ [KNL] Disable object debugging
nodsp [SH] Disable hardware DSP at boot time.
noefi Disable EFI runtime services support.
- noexec [IA-64]
-
- noexec [X86]
- On X86-32 available only on PAE configured kernels.
- noexec=on: enable non-executable mappings (default)
- noexec=off: disable non-executable mappings
+ no_entry_flush [PPC] Don't flush the L1-D cache when entering the kernel.
- nosmap [X86,PPC]
- Disable SMAP (Supervisor Mode Access Prevention)
- even if it is supported by processor.
-
- nosmep [X86,PPC]
- Disable SMEP (Supervisor Mode Execution Prevention)
- even if it is supported by processor.
+ noexec [IA-64]
noexec32 [X86-64]
This affects only 32-bit executables.
@@ -3108,60 +3802,18 @@
noexec32=off: disable non-executable mappings
read implies executable mappings
+ no_file_caps Tells the kernel not to honor file capabilities. The
+ only way then for a file to be executed with privilege
+ is to be setuid root or executed by root.
+
nofpu [MIPS,SH] Disable hardware FPU at boot time.
+ nofsgsbase [X86] Disables FSGSBASE instructions.
+
nofxsr [BUGS=X86-32] Disables x86 floating point extended
register save and restore. The kernel will only save
legacy floating-point registers on task switch.
- nohugeiomap [KNL,x86,PPC] Disable kernel huge I/O mappings.
-
- nosmt [KNL,S390] Disable symmetric multithreading (SMT).
- Equivalent to smt=1.
-
- [KNL,x86] Disable symmetric multithreading (SMT).
- nosmt=force: Force disable SMT, cannot be undone
- via the sysfs control file.
-
- nospectre_v1 [X86,PPC] Disable mitigations for Spectre Variant 1
- (bounds check bypass). With this option data leaks are
- possible in the system.
-
- nospectre_v2 [X86,PPC_FSL_BOOK3E,ARM64] Disable all mitigations for
- the Spectre variant 2 (indirect branch prediction)
- vulnerability. System may allow data leaks with this
- option.
-
- nospec_store_bypass_disable
- [HW] Disable all mitigations for the Speculative Store Bypass vulnerability
-
- noxsave [BUGS=X86] Disables x86 extended register state save
- and restore using xsave. The kernel will fallback to
- enabling legacy floating-point and sse state.
-
- noxsaveopt [X86] Disables xsaveopt used in saving x86 extended
- register states. The kernel will fall back to use
- xsave to save the states. By using this parameter,
- performance of saving the states is degraded because
- xsave doesn't support modified optimization while
- xsaveopt supports it on xsaveopt enabled systems.
-
- noxsaves [X86] Disables xsaves and xrstors used in saving and
- restoring x86 extended register state in compacted
- form of xsave area. The kernel will fall back to use
- xsaveopt and xrstor to save and restore the states
- in standard form of xsave area. By using this
- parameter, xsave area per process might occupy more
- memory on xsaves enabled systems.
-
- nohlt [BUGS=ARM,SH] Tells the kernel that the sleep(SH) or
- wfi(ARM) instruction doesn't work correctly and not to
- use it. This is also useful when using JTAG debugger.
-
- no_file_caps Tells the kernel not to honor file capabilities. The
- only way then for a file to be executed with privilege
- is to be setuid root or executed by root.
-
nohalt [IA-64] Tells the kernel not to use the power saving
function PAL_HALT_LIGHT when idle. This increases
power-consumption. On the positive side, it reduces
@@ -3169,8 +3821,35 @@
in certain environments such as networked servers or
real-time systems.
+ no_hash_pointers
+ Force pointers printed to the console or buffers to be
+ unhashed. By default, when a pointer is printed via %p
+ format string, that pointer is "hashed", i.e. obscured
+ by hashing the pointer value. This is a security feature
+ that hides actual kernel addresses from unprivileged
+ users, but it also makes debugging the kernel more
+ difficult since unequal pointers can no longer be
+ compared. However, if this command-line option is
+ specified, then all normal pointers will have their true
+ value printed. This option should only be specified when
+ debugging the kernel. Please do not use on production
+ kernels.
+
nohibernate [HIBERNATION] Disable hibernation and resume.
+ nohlt [ARM,ARM64,MICROBLAZE,MIPS,PPC,SH] Forces the kernel to
+ busy wait in do_idle() and not use the arch_cpu_idle()
+ implementation; requires CONFIG_GENERIC_IDLE_POLL_SETUP
+ to be effective. This is useful on platforms where the
+ sleep(SH) or wfi(ARM,ARM64) instructions do not work
+ correctly or when doing power measurements to evaluate
+ the impact of the sleep instructions. This is also
+ useful when using JTAG debugger.
+
+ nohugeiomap [KNL,X86,PPC,ARM64] Disable kernel huge I/O mappings.
+
+ nohugevmalloc [KNL,X86,PPC,ARM64] Disable kernel huge vmalloc mappings.
+
nohz= [KNL] Boottime enable/disable dynamic ticks
Valid arguments: on, off
Default: on
@@ -3185,15 +3864,8 @@
just as if they had also been called out in the
rcu_nocbs= boot parameter.
- noiotrap [SH] Disables trapped I/O port accesses.
-
- noirqdebug [X86-32] Disables the code which attempts to detect and
- disable unhandled interrupt sources.
-
- no_timer_check [X86,APIC] Disables the code which tests for
- broken timer IRQ sources.
-
- noisapnp [ISAPNP] Disables ISA PnP code.
+ Note that this argument takes precedence over
+ the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
noinitrd [RAM] Tells the kernel not to load any configured
initial RAM disk.
@@ -3206,28 +3878,29 @@
noinvpcid [X86] Disable the INVPCID cpu feature.
+ noiotrap [SH] Disables trapped I/O port accesses.
+
+ noirqdebug [X86-32] Disables the code which attempts to detect and
+ disable unhandled interrupt sources.
+
+ noisapnp [ISAPNP] Disables ISA PnP code.
+
nojitter [IA-64] Disables jitter checking for ITC timers.
- no-kvmclock [X86,KVM] Disable paravirtualized KVM clock driver
+ nokaslr [KNL]
+ When CONFIG_RANDOMIZE_BASE is set, this disables
+ kernel and module base offset ASLR (Address Space
+ Layout Randomization).
no-kvmapf [X86,KVM] Disable paravirtualized asynchronous page
fault handling.
- no-vmw-sched-clock
- [X86,PV_OPS] Disable paravirtualized VMware scheduler
- clock and use the default one.
-
- no-steal-acc [X86,PV_OPS,ARM64] Disable paravirtualized steal time
- accounting. steal time is computed, but won't
- influence scheduler behaviour
+ no-kvmclock [X86,KVM] Disable paravirtualized KVM clock driver
nolapic [X86-32,APIC] Do not enable or use the local APIC.
nolapic_timer [X86-32,APIC] Do not use the local APIC timer.
- noltlbs [PPC] Do not use large page/tlb entries for kernel
- lowmem mapping on PPC40x and PPC8xx
-
nomca [IA-64] Disable machine check abort handling
nomce [X86-32] Disable Machine Check Exception
@@ -3235,46 +3908,123 @@
nomfgpt [X86-32] Disable Multi-Function General Purpose
Timer usage (for AMD Geode machines).
+ nomodeset Disable kernel modesetting. Most systems' firmware
+ sets up a display mode and provides framebuffer memory
+ for output. With nomodeset, DRM and fbdev drivers will
+ not load if they could possibly displace the pre-
+ initialized output. Only the system framebuffer will
+ be available for use. The respective drivers will not
+ perform display-mode changes or accelerated rendering.
+
+ Useful as error fallback, or for testing and debugging.
+
+ nomodule Disable module load
+
nonmi_ipi [X86] Disable using NMI IPIs during panic/reboot to
shutdown the other cpus. Instead use the REBOOT_VECTOR
irq.
- nomodule Disable module load
-
nopat [X86] Disable PAT (page attribute table extension of
pagetables) support.
nopcid [X86-64] Disable the PCID cpu feature.
+ nopku [X86] Disable Memory Protection Keys CPU feature found
+ in some Intel CPUs.
+
+ nopti [X86-64]
+ Equivalent to pti=off
+
+ nopv= [X86,XEN,KVM,HYPER_V,VMWARE]
+ Disables the PV optimizations forcing the guest to run
+ as generic guest with no PV drivers. Currently support
+ XEN HVM, KVM, HYPER_V and VMWARE guest.
+
+ nopvspin [X86,XEN,KVM]
+ Disables the qspinlock slow path using PV optimizations
+ which allow the hypervisor to 'idle' the guest on lock
+ contention.
+
norandmaps Don't use address space randomization. Equivalent to
echo 0 > /proc/sys/kernel/randomize_va_space
noreplace-smp [X86-32,SMP] Don't replace SMP instructions
with UP alternatives
- nordrand [X86] Disable kernel use of the RDRAND and
- RDSEED instructions even if they are supported
- by the processor. RDRAND and RDSEED are still
- available to user space applications.
-
noresume [SWSUSP] Disables resume and restores original swap
space.
+ nosbagart [IA-64]
+
no-scroll [VGA] Disables scrollback.
This is required for the Braillex ib80-piezo Braille
reader made by F.H. Papenmeier (Germany).
- nosbagart [IA-64]
+ nosgx [X86-64,SGX] Disables Intel SGX kernel support.
- nosep [BUGS=X86-32] Disables x86 SYSENTER/SYSEXIT support.
+ nosmap [PPC]
+ Disable SMAP (Supervisor Mode Access Prevention)
+ even if it is supported by processor.
+
+ nosmep [PPC64s]
+ Disable SMEP (Supervisor Mode Execution Prevention)
+ even if it is supported by processor.
nosmp [SMP] Tells an SMP kernel to act as a UP kernel,
and disable the IO APIC. legacy for "maxcpus=0".
+ nosmt [KNL,MIPS,PPC,S390] Disable symmetric multithreading (SMT).
+ Equivalent to smt=1.
+
+ [KNL,X86,PPC] Disable symmetric multithreading (SMT).
+ nosmt=force: Force disable SMT, cannot be undone
+ via the sysfs control file.
+
nosoftlockup [KNL] Disable the soft-lockup detector.
+ nospec_store_bypass_disable
+ [HW] Disable all mitigations for the Speculative Store Bypass vulnerability
+
+ nospectre_bhb [ARM64] Disable all mitigations for Spectre-BHB (branch
+ history injection) vulnerability. System may allow data leaks
+ with this option.
+
+ nospectre_v1 [X86,PPC] Disable mitigations for Spectre Variant 1
+ (bounds check bypass). With this option data leaks are
+ possible in the system.
+
+ nospectre_v2 [X86,PPC_E500,ARM64] Disable all mitigations for
+ the Spectre variant 2 (indirect branch prediction)
+ vulnerability. System may allow data leaks with this
+ option.
+
+ no-steal-acc [X86,PV_OPS,ARM64,PPC/PSERIES,RISCV] Disable
+ paravirtualized steal time accounting. steal time is
+ computed, but won't influence scheduler behaviour
+
nosync [HW,M68K] Disables sync negotiation for all devices.
+ no_timer_check [X86,APIC] Disables the code which tests for
+ broken timer IRQ sources.
+
+ no_uaccess_flush
+ [PPC] Don't flush the L1-D cache after accessing user data.
+
+ novmcoredd [KNL,KDUMP]
+ Disable device dump. Device dump allows drivers to
+ append dump data to vmcore so you can collect driver
+ specified debug info. Drivers can append the data
+ without any limit and this data is stored in memory,
+ so this may cause significant memory stress. Disabling
+ device dump can help save memory but the driver debug
+ data will be no longer available. This parameter
+ is only available when CONFIG_PROC_VMCORE_DEVICE_DUMP
+ is set.
+
+ no-vmw-sched-clock
+ [X86,PV_OPS] Disable paravirtualized VMware scheduler
+ clock and use the default one.
+
nowatchdog [KNL] Disable both lockup detectors, i.e.
soft-lockup and NMI watchdog (hard-lockup).
@@ -3282,19 +4032,28 @@
nox2apic [X86-64,APIC] Do not enable x2APIC mode.
- cpu0_hotplug [X86] Turn on CPU0 hotplug feature when
- CONFIG_BOOTPARAM_HOTPLUG_CPU0 is off.
- Some features depend on CPU0. Known dependencies are:
- 1. Resume from suspend/hibernate depends on CPU0.
- Suspend/hibernate will fail if CPU0 is offline and you
- need to online CPU0 before suspend/hibernate.
- 2. PIC interrupts also depend on CPU0. CPU0 can't be
- removed if a PIC interrupt is detected.
- It's said poweroff/reboot may depend on CPU0 on some
- machines although I haven't seen such issues so far
- after CPU0 is offline on a few tested machines.
- If the dependencies are under your control, you can
- turn on cpu0_hotplug.
+ NOTE: this parameter will be ignored on systems with the
+ LEGACY_XAPIC_DISABLED bit set in the
+ IA32_XAPIC_DISABLE_STATUS MSR.
+
+ noxsave [BUGS=X86] Disables x86 extended register state save
+ and restore using xsave. The kernel will fallback to
+ enabling legacy floating-point and sse state.
+
+ noxsaveopt [X86] Disables xsaveopt used in saving x86 extended
+ register states. The kernel will fall back to use
+ xsave to save the states. By using this parameter,
+ performance of saving the states is degraded because
+ xsave doesn't support modified optimization while
+ xsaveopt supports it on xsaveopt enabled systems.
+
+ noxsaves [X86] Disables xsaves and xrstors used in saving and
+ restoring x86 extended register state in compacted
+ form of xsave area. The kernel will fall back to use
+ xsaveopt and xrstor to save and restore the states
+ in standard form of xsave area. By using this
+ parameter, xsave area per process might occupy more
+ memory on xsaves enabled systems.
nps_mtm_hs_ctr= [KNL,ARC]
This parameter sets the maximum duration, in
@@ -3320,7 +4079,11 @@
nr_uarts= [SERIAL] maximum number of UARTs to be registered.
- numa_balancing= [KNL,X86] Enable or disable automatic NUMA balancing.
+ numa=off [KNL, ARM64, PPC, RISCV, SPARC, X86] Disable NUMA, Only
+ set up a single NUMA node spanning all memory.
+
+ numa_balancing= [KNL,ARM64,PPC,RISCV,S390,X86] Enable or disable automatic
+ NUMA balancing.
Allowed values are enable and disable
numa_zonelist_order= [KNL, BOOT] Select zonelist order for NUMA.
@@ -3329,7 +4092,7 @@
See Documentation/admin-guide/sysctl/vm.rst for details.
ohci1394_dma=early [HW] enable debugging via the ohci1394 driver.
- See Documentation/debugging-via-ohci1394.txt for more
+ See Documentation/core-api/debugging-via-ohci1394.rst for more
info.
olpc_ec_timeout= [OLPC] ms delay when issuing EC commands
@@ -3344,19 +4107,15 @@
For example, to override I2C bus2:
omap_mux=i2c2_scl.i2c2_scl=0x100,i2c2_sda.i2c2_sda=0x100
- oprofile.timer= [HW]
- Use timer interrupt instead of performance counters
+ onenand.bdry= [HW,MTD] Flex-OneNAND Boundary Configuration
- oprofile.cpu_type= Force an oprofile cpu type
- This might be useful if you have an older oprofile
- userland or if you want common events.
- Format: { arch_perfmon }
- arch_perfmon: [X86] Force use of architectural
- perfmon on Intel CPUs instead of the
- CPU specific event set.
- timer: [X86] Force use of architectural NMI
- timer mode (see also oprofile.timer
- for generic hr timer mode)
+ Format: [die0_boundary][,die0_lock][,die1_boundary][,die1_lock]
+
+ boundary - index of last SLC block on Flex-OneNAND.
+ The remaining blocks are configured as MLC blocks.
+ lock - Configure if Flex-OneNAND boundary should be locked.
+ Once locked, the boundary cannot be changed.
+ 1 indicates lock status, 0 indicates unlock status.
oops=panic Always panic on oopses. Default is to just kill the
process, but there is a small probability of
@@ -3386,12 +4145,34 @@
off: turn off poisoning (default)
on: turn on poisoning
+ page_reporting.page_reporting_order=
+ [KNL] Minimal page reporting order
+ Format: <integer>
+ Adjust the minimal page reporting order. The page
+ reporting is disabled when it exceeds MAX_PAGE_ORDER.
+
panic= [KNL] Kernel behaviour on panic: delay <timeout>
timeout > 0: seconds before rebooting
timeout = 0: wait forever
timeout < 0: reboot immediately
Format: <timeout>
+ panic_on_taint= Bitmask for conditionally calling panic() in add_taint()
+ Format: <hex>[,nousertaint]
+ Hexadecimal bitmask representing the set of TAINT flags
+ that will cause the kernel to panic when add_taint() is
+ called with any of the flags in this set.
+ The optional switch "nousertaint" can be utilized to
+ prevent userspace forced crashes by writing to sysctl
+ /proc/sys/kernel/tainted any flagset matching with the
+ bitmask set on panic_on_taint.
+ See Documentation/admin-guide/tainted-kernels.rst for
+ extra details on the taint flags that users can pick
+ to compose the bitmask to assign to panic_on_taint.
+
+ panic_on_warn=1 panic() instead of WARN(). Useful to cause kdump
+ on a WARN().
+
panic_print= Bitmask for printing system info when panic happens.
User can chose combination of the following bits:
bit 0: print all tasks info
@@ -3400,17 +4181,11 @@
bit 3: print locks info if CONFIG_LOCKDEP is on
bit 4: print ftrace buffer
bit 5: print all printk messages in buffer
-
- panic_on_warn panic() instead of WARN(). Useful to cause kdump
- on a WARN().
-
- crash_kexec_post_notifiers
- Run kdump after running panic-notifiers and dumping
- kmsg. This only for the users who doubt kdump always
- succeeds in any situation.
- Note that this also increases risks of kdump failure,
- because some panic notifiers can make the crashed
- kernel more unstable.
+ bit 6: print all CPUs backtrace (if available in the arch)
+ *Be aware* that this option may print a _lot_ of lines,
+ so there are risks of losing older messages in the log.
+ Use this option carefully, maybe worth to setup a
+ bigger log buffer with "log_buf_len" along with this.
parkbd.port= [HW] Parallel port number the keyboard adapter is
connected to, default is 0.
@@ -3441,17 +4216,103 @@
Currently this function knows 686a and 8231 chips.
Format: [spp|ps2|epp|ecp|ecpepp]
- pause_on_oops=
+ pata_legacy.all= [HW,LIBATA]
+ Format: <int>
+ Set to non-zero to probe primary and secondary ISA
+ port ranges on PCI systems where no PCI PATA device
+ has been found at either range. Disabled by default.
+
+ pata_legacy.autospeed= [HW,LIBATA]
+ Format: <int>
+ Set to non-zero if a chip is present that snoops speed
+ changes. Disabled by default.
+
+ pata_legacy.ht6560a= [HW,LIBATA]
+ Format: <int>
+ Set to 1, 2, or 3 for HT 6560A on the primary channel,
+ the secondary channel, or both channels respectively.
+ Disabled by default.
+
+ pata_legacy.ht6560b= [HW,LIBATA]
+ Format: <int>
+ Set to 1, 2, or 3 for HT 6560B on the primary channel,
+ the secondary channel, or both channels respectively.
+ Disabled by default.
+
+ pata_legacy.iordy_mask= [HW,LIBATA]
+ Format: <int>
+ IORDY enable mask. Set individual bits to allow IORDY
+ for the respective channel. Bit 0 is for the first
+ legacy channel handled by this driver, bit 1 is for
+ the second channel, and so on. The sequence will often
+ correspond to the primary legacy channel, the secondary
+ legacy channel, and so on, but the handling of a PCI
+ bus and the use of other driver options may interfere
+ with the sequence. By default IORDY is allowed across
+ all channels.
+
+ pata_legacy.opti82c46x= [HW,LIBATA]
+ Format: <int>
+ Set to 1, 2, or 3 for Opti 82c611A on the primary
+ channel, the secondary channel, or both channels
+ respectively. Disabled by default.
+
+ pata_legacy.opti82c611a= [HW,LIBATA]
+ Format: <int>
+ Set to 1, 2, or 3 for Opti 82c465MV on the primary
+ channel, the secondary channel, or both channels
+ respectively. Disabled by default.
+
+ pata_legacy.pio_mask= [HW,LIBATA]
+ Format: <int>
+ PIO mode mask for autospeed devices. Set individual
+ bits to allow the use of the respective PIO modes.
+ Bit 0 is for mode 0, bit 1 is for mode 1, and so on.
+ All modes allowed by default.
+
+ pata_legacy.probe_all= [HW,LIBATA]
+ Format: <int>
+ Set to non-zero to probe tertiary and further ISA
+ port ranges on PCI systems. Disabled by default.
+
+ pata_legacy.probe_mask= [HW,LIBATA]
+ Format: <int>
+ Probe mask for legacy ISA PATA ports. Depending on
+ platform configuration and the use of other driver
+ options up to 6 legacy ports are supported: 0x1f0,
+ 0x170, 0x1e8, 0x168, 0x1e0, 0x160, however probing
+ of individual ports can be disabled by setting the
+ corresponding bits in the mask to 1. Bit 0 is for
+ the first port in the list above (0x1f0), and so on.
+ By default all supported ports are probed.
+
+ pata_legacy.qdi= [HW,LIBATA]
+ Format: <int>
+ Set to non-zero to probe QDI controllers. By default
+ set to 1 if CONFIG_PATA_QDI_MODULE, 0 otherwise.
+
+ pata_legacy.winbond= [HW,LIBATA]
+ Format: <int>
+ Set to non-zero to probe Winbond controllers. Use
+ the standard I/O port (0x130) if 1, otherwise the
+ value given is the I/O port to use (typically 0x1b0).
+ By default set to 1 if CONFIG_PATA_WINBOND_VLB_MODULE,
+ 0 otherwise.
+
+ pata_platform.pio_mask= [HW,LIBATA]
+ Format: <int>
+ Supported PIO mode mask. Set individual bits to allow
+ the use of the respective PIO modes. Bit 0 is for
+ mode 0, bit 1 is for mode 1, and so on. Mode 0 only
+ allowed by default.
+
+ pause_on_oops=<int>
Halt all CPUs after the first oops has been printed for
the specified number of seconds. This is to be used if
your oopses keep scrolling off the screen.
pcbit= [HW,ISDN]
- pcd. [PARIDE]
- See header of drivers/block/paride/pcd.c.
- See also Documentation/admin-guide/blockdev/paride.rst.
-
pci=option[,option...] [PCI] various PCI subsystem options.
Some options herein operate on a specific device
@@ -3566,6 +4427,15 @@
please report a bug.
nocrs [X86] Ignore PCI host bridge windows from ACPI.
If you need to use this, please report a bug.
+ use_e820 [X86] Use E820 reservations to exclude parts of
+ PCI host bridge windows. This is a workaround
+ for BIOS defects in host bridge _CRS methods.
+ If you need to use this, please report a bug to
+ <linux-pci@vger.kernel.org>.
+ no_e820 [X86] Ignore E820 reservations for PCI host
+ bridge windows. This is the default on modern
+ hardware. If you need to use this, please report
+ a bug to <linux-pci@vger.kernel.org>.
routeirq Do IRQ routing for all PCI devices.
This is normally done in pci_enable_device(),
so this option is a temporary workaround
@@ -3618,7 +4488,9 @@
specified, e.g., 12@pci:8086:9c22:103c:198f
for 4096-byte alignment.
ecrc= Enable/disable PCIe ECRC (transaction layer
- end-to-end CRC checking).
+ end-to-end CRC checking). Only effective if
+ OS has native AER control (either granted by
+ ACPI _OSC or forced via "pcie_ports=native")
bios: Use BIOS/firmware settings. This is the
the default.
off: Turn ECRC off
@@ -3669,6 +4541,8 @@
may put more devices in an IOMMU group.
force_floating [S390] Force usage of floating interrupts.
nomio [S390] Do not use MIO instructions.
+ norid [S390] ignore the RID field and force use of
+ one PCI domain per PCI function
pcie_aspm= [PCIE] Forcibly enable or disable PCIe Active State Power
Management.
@@ -3703,9 +4577,6 @@
for debug and development, but should not be
needed on a platform with proper driver support.
- pd. [PARIDE]
- See Documentation/admin-guide/blockdev/paride.rst.
-
pdcchassis= [PARISC,HW] Disable/Enable PDC Chassis Status codes at
boot time.
Format: { 0 | 1 }
@@ -3718,14 +4589,8 @@
allocator. This parameter is primarily for debugging
and performance comparison.
- pf. [PARIDE]
- See Documentation/admin-guide/blockdev/paride.rst.
-
- pg. [PARIDE]
- See Documentation/admin-guide/blockdev/paride.rst.
-
pirq= [SMP,APIC] Manual mp-table setup
- See Documentation/x86/i386/IO-APIC.rst.
+ See Documentation/arch/x86/i386/IO-APIC.rst.
plip= [PPT,NET] Parallel port network link
Format: { parport<nr> | timid | 0 }
@@ -3735,6 +4600,14 @@
Override pmtimer IOPort with a hex value.
e.g. pmtmr=0x508
+ pmu_override= [PPC] Override the PMU.
+ This option takes over the PMU facility, so it is no
+ longer usable by perf. Setting this option starts the
+ PMU counters by setting MMCR0 to 0 (the FC bit is
+ cleared). If a number is given, then MMCR1 is set to
+ that number, otherwise (e.g., 'pmu_override=on'), MMCR1
+ remains 0.
+
pm_debug_messages [SUSPEND,KNL]
Enable suspend/resume debug messages during boot up.
@@ -3787,6 +4660,13 @@
Format: {"off"}
Disable Hardware Transactional Memory
+ preempt= [KNL]
+ Select preemption mode if you have CONFIG_PREEMPT_DYNAMIC
+ none - Limited to cond_resched() calls
+ voluntary - Limited to cond_resched() and might_sleep() calls
+ full - Any section that isn't explicitly preempt disabled
+ can be preempted anytime.
+
print-fatal-signals=
[KNL] debug: print fatal signals
@@ -3806,6 +4686,15 @@
Format: <bool> (1/Y/y=enable, 0/N/n=disable)
default: disabled
+ printk.console_no_auto_verbose=
+ Disable console loglevel raise on oops, panic
+ or lockdep-detected issues (only if lock debug is on).
+ With an exception to setups with low baudrate on
+ serial console, keeping this 0 is a good choice
+ in order to provide more debug information.
+ Format: <bool>
+ default: 0 (auto_verbose is enabled)
+
printk.devkmsg={on,off,ratelimit}
Control writing to /dev/kmsg.
on - unlimited logging to /dev/kmsg from userspace
@@ -3835,9 +4724,7 @@
Param: <number> - step/bucket size as a power of 2 for
statistical time based profiling.
- prompt_ramdisk= [RAM] List of RAM disks to prompt for floppy disk
- before loading.
- See Documentation/admin-guide/blockdev/ramdisk.rst.
+ prompt_ramdisk= [RAM] [Deprecated]
prot_virt= [S390] enable hosting protected virtual machines
isolated from the hypervisor (if hardware supports
@@ -3863,10 +4750,7 @@
pstore.backend= Specify the name of the pstore backend to use
- pt. [PARIDE]
- See Documentation/admin-guide/blockdev/paride.rst.
-
- pti= [X86_64] Control Page Table Isolation of user and
+ pti= [X86-64] Control Page Table Isolation of user and
kernel address spaces. Disabling this feature
removes hardening, but improves performance of
system calls and interrupts.
@@ -3878,9 +4762,6 @@
Not specifying this option is equivalent to pti=auto.
- nopti [X86_64]
- Equivalent to pti=off
-
pty.legacy_count=
[KNL] Number of legacy pty's. Overwrites compiled-in
default number.
@@ -3889,17 +4770,38 @@
r128= [HW,DRM]
+ radix_hcall_invalidate=on [PPC/PSERIES]
+ Disable RADIX GTSE feature and use hcall for TLB
+ invalidate.
+
raid= [HW,RAID]
See Documentation/admin-guide/md.rst.
ramdisk_size= [RAM] Sizes of RAM disks in kilobytes
See Documentation/admin-guide/blockdev/ramdisk.rst.
- random.trust_cpu={on,off}
- [KNL] Enable or disable trusting the use of the
- CPU's random number generator (if available) to
- fully seed the kernel's CRNG. Default is controlled
- by CONFIG_RANDOM_TRUST_CPU.
+ ramdisk_start= [RAM] RAM disk image start address
+
+ random.trust_cpu=off
+ [KNL] Disable trusting the use of the CPU's
+ random number generator (if available) to
+ initialize the kernel's RNG.
+
+ random.trust_bootloader=off
+ [KNL] Disable trusting the use of the a seed
+ passed by the bootloader (if available) to
+ initialize the kernel's RNG.
+
+ randomize_kstack_offset=
+ [KNL] Enable or disable kernel stack offset
+ randomization, which provides roughly 5 bits of
+ entropy, frustrating memory corruption attacks
+ that depend on stack address determinism or
+ cross-syscall address exposures. This is only
+ available on architectures that have defined
+ CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET.
+ Format: <bool> (1/Y/y=enable, 0/N/n=disable)
+ Default is CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.
ras=option[,option,...] [KNL] RAS-specific options
@@ -3907,21 +4809,33 @@
Disable the Correctable Errors Collector,
see CONFIG_RAS_CEC help text.
- rcu_nocbs= [KNL]
- The argument is a cpu list, as described above,
- except that the string "all" can be used to
- specify every CPU on the system.
-
- In kernels built with CONFIG_RCU_NOCB_CPU=y, set
- the specified list of CPUs to be no-callback CPUs.
- Invocation of these CPUs' RCU callbacks will be
- offloaded to "rcuox/N" kthreads created for that
- purpose, where "x" is "p" for RCU-preempt, and
- "s" for RCU-sched, and "N" is the CPU number.
- This reduces OS jitter on the offloaded CPUs,
- which can be useful for HPC and real-time
- workloads. It can also improve energy efficiency
- for asymmetric multiprocessors.
+ rcu_nocbs[=cpu-list]
+ [KNL] The optional argument is a cpu list,
+ as described above.
+
+ In kernels built with CONFIG_RCU_NOCB_CPU=y,
+ enable the no-callback CPU mode, which prevents
+ such CPUs' callbacks from being invoked in
+ softirq context. Invocation of such CPUs' RCU
+ callbacks will instead be offloaded to "rcuox/N"
+ kthreads created for that purpose, where "x" is
+ "p" for RCU-preempt, "s" for RCU-sched, and "g"
+ for the kthreads that mediate grace periods; and
+ "N" is the CPU number. This reduces OS jitter on
+ the offloaded CPUs, which can be useful for HPC
+ and real-time workloads. It can also improve
+ energy efficiency for asymmetric multiprocessors.
+
+ If a cpulist is passed as an argument, the specified
+ list of CPUs is set to no-callback mode from boot.
+
+ Otherwise, if the '=' sign and the cpulist
+ arguments are omitted, no CPU will be set to
+ no-callback mode from boot but the mode may be
+ toggled at runtime via cpusets.
+
+ Note that this argument takes precedence over
+ the CONFIG_RCU_NOCB_CPU_DEFAULT_ALL option.
rcu_nocb_poll [KNL]
Rather than requiring that offloaded CPUs
@@ -3938,6 +4852,13 @@
Set maximum number of finished RCU callbacks to
process in one batch.
+ rcutree.do_rcu_barrier= [KNL]
+ Request a call to rcu_barrier(). This is
+ throttled so that userspace tests can safely
+ hammer on the sysfs variable if they so choose.
+ If triggered before the RCU grace-period machinery
+ is fully active, this will error out with EAGAIN.
+
rcutree.dump_tree= [KNL]
Dump the structure of the rcu_node combining tree
out at early boot. This is used for diagnostic
@@ -3957,26 +4878,6 @@
the propagation of recent CPU-hotplug changes up
the rcu_node combining tree.
- rcutree.use_softirq= [KNL]
- If set to zero, move all RCU_SOFTIRQ processing to
- per-CPU rcuc kthreads. Defaults to a non-zero
- value, meaning that RCU_SOFTIRQ is used by default.
- Specify rcutree.use_softirq=0 to use rcuc kthreads.
-
- rcutree.rcu_fanout_exact= [KNL]
- Disable autobalancing of the rcu_node combining
- tree. This is used by rcutorture, and might
- possibly be useful for architectures having high
- cache-to-cache transfer latencies.
-
- rcutree.rcu_fanout_leaf= [KNL]
- Change the number of CPUs assigned to each
- leaf rcu_node structure. Useful for very
- large systems, which will choose the value 64,
- and for NUMA systems with large remote-access
- latencies, which will choose a value aligned
- with the appropriate hardware boundaries.
-
rcutree.jiffies_till_first_fqs= [KNL]
Set delay from grace-period initialization to
first attempt to force quiescent states.
@@ -4012,14 +4913,21 @@
(the least-favored priority). Otherwise, when
RCU_BOOST is not set, valid values are 0-99 and
the default is zero (non-realtime operation).
-
- rcutree.rcu_nocb_gp_stride= [KNL]
- Set the number of NOCB callback kthreads in
- each group, which defaults to the square root
- of the number of CPUs. Larger numbers reduce
- the wakeup overhead on the global grace-period
- kthread, but increases that same overhead on
- each group's NOCB grace-period kthread.
+ When RCU_NOCB_CPU is set, also adjust the
+ priority of NOCB callback kthreads.
+
+ rcutree.nocb_nobypass_lim_per_jiffy= [KNL]
+ On callback-offloaded (rcu_nocbs) CPUs,
+ RCU reduces the lock contention that would
+ otherwise be caused by callback floods through
+ use of the ->nocb_bypass list. However, in the
+ common non-flooded case, RCU queues directly to
+ the main ->cblist in order to avoid the extra
+ overhead of the ->nocb_bypass list and its lock.
+ But if there are too many callbacks queued during
+ a single jiffy, RCU pre-queues the callbacks into
+ the ->nocb_bypass queue. The definition of "too
+ many" is supplied by this kernel boot parameter.
rcutree.qhimark= [KNL]
Set threshold of queued RCU callbacks beyond which
@@ -4038,15 +4946,55 @@
on rcutree.qhimark at boot time and to zero to
disable more aggressive help enlistment.
- rcutree.rcu_idle_gp_delay= [KNL]
- Set wakeup interval for idle CPUs that have
- RCU callbacks (RCU_FAST_NO_HZ=y).
+ rcutree.rcu_delay_page_cache_fill_msec= [KNL]
+ Set the page-cache refill delay (in milliseconds)
+ in response to low-memory conditions. The range
+ of permitted values is in the range 0:100000.
+
+ rcutree.rcu_divisor= [KNL]
+ Set the shift-right count to use to compute
+ the callback-invocation batch limit bl from
+ the number of callbacks queued on this CPU.
+ The result will be bounded below by the value of
+ the rcutree.blimit kernel parameter. Every bl
+ callbacks, the softirq handler will exit in
+ order to allow the CPU to do other work.
+
+ Please note that this callback-invocation batch
+ limit applies only to non-offloaded callback
+ invocation. Offloaded callbacks are instead
+ invoked in the context of an rcuoc kthread, which
+ scheduler will preempt as it does any other task.
- rcutree.rcu_idle_lazy_gp_delay= [KNL]
- Set wakeup interval for idle CPUs that have
- only "lazy" RCU callbacks (RCU_FAST_NO_HZ=y).
- Lazy RCU callbacks are those which RCU can
- prove do nothing more than free memory.
+ rcutree.rcu_fanout_exact= [KNL]
+ Disable autobalancing of the rcu_node combining
+ tree. This is used by rcutorture, and might
+ possibly be useful for architectures having high
+ cache-to-cache transfer latencies.
+
+ rcutree.rcu_fanout_leaf= [KNL]
+ Change the number of CPUs assigned to each
+ leaf rcu_node structure. Useful for very
+ large systems, which will choose the value 64,
+ and for NUMA systems with large remote-access
+ latencies, which will choose a value aligned
+ with the appropriate hardware boundaries.
+
+ rcutree.rcu_min_cached_objs= [KNL]
+ Minimum number of objects which are cached and
+ maintained per one CPU. Object size is equal
+ to PAGE_SIZE. The cache allows to reduce the
+ pressure to page allocator, also it makes the
+ whole algorithm to behave better in low memory
+ condition.
+
+ rcutree.rcu_nocb_gp_stride= [KNL]
+ Set the number of NOCB callback kthreads in
+ each group, which defaults to the square root
+ of the number of CPUs. Larger numbers reduce
+ the wakeup overhead on the global grace-period
+ kthread, but increases that same overhead on
+ each group's NOCB grace-period kthread.
rcutree.rcu_kick_kthreads= [KNL]
Cause the grace-period kthread to get an extra
@@ -4055,46 +5003,99 @@
This wake_up() will be accompanied by a
WARN_ONCE() splat and an ftrace_dump().
+ rcutree.rcu_resched_ns= [KNL]
+ Limit the time spend invoking a batch of RCU
+ callbacks to the specified number of nanoseconds.
+ By default, this limit is checked only once
+ every 32 callbacks in order to limit the pain
+ inflicted by local_clock() overhead.
+
+ rcutree.rcu_unlock_delay= [KNL]
+ In CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels,
+ this specifies an rcu_read_unlock()-time delay
+ in microseconds. This defaults to zero.
+ Larger delays increase the probability of
+ catching RCU pointer leaks, that is, buggy use
+ of RCU-protected pointers after the relevant
+ rcu_read_unlock() has completed.
+
rcutree.sysrq_rcu= [KNL]
Commandeer a sysrq key to dump out Tree RCU's
rcu_node tree with an eye towards determining
why a new grace period has not yet started.
- rcuperf.gp_async= [KNL]
+ rcutree.use_softirq= [KNL]
+ If set to zero, move all RCU_SOFTIRQ processing to
+ per-CPU rcuc kthreads. Defaults to a non-zero
+ value, meaning that RCU_SOFTIRQ is used by default.
+ Specify rcutree.use_softirq=0 to use rcuc kthreads.
+
+ But note that CONFIG_PREEMPT_RT=y kernels disable
+ this kernel boot parameter, forcibly setting it
+ to zero.
+
+ rcuscale.gp_async= [KNL]
Measure performance of asynchronous
grace-period primitives such as call_rcu().
- rcuperf.gp_async_max= [KNL]
+ rcuscale.gp_async_max= [KNL]
Specify the maximum number of outstanding
callbacks per writer thread. When a writer
thread exceeds this limit, it invokes the
corresponding flavor of rcu_barrier() to allow
previously posted callbacks to drain.
- rcuperf.gp_exp= [KNL]
+ rcuscale.gp_exp= [KNL]
Measure performance of expedited synchronous
grace-period primitives.
- rcuperf.holdoff= [KNL]
+ rcuscale.holdoff= [KNL]
Set test-start holdoff period. The purpose of
this parameter is to delay the start of the
test until boot completes in order to avoid
interference.
- rcuperf.kfree_rcu_test= [KNL]
+ rcuscale.kfree_by_call_rcu= [KNL]
+ In kernels built with CONFIG_RCU_LAZY=y, test
+ call_rcu() instead of kfree_rcu().
+
+ rcuscale.kfree_mult= [KNL]
+ Instead of allocating an object of size kfree_obj,
+ allocate one of kfree_mult * sizeof(kfree_obj).
+ Defaults to 1.
+
+ rcuscale.kfree_rcu_test= [KNL]
Set to measure performance of kfree_rcu() flooding.
- rcuperf.kfree_nthreads= [KNL]
+ rcuscale.kfree_rcu_test_double= [KNL]
+ Test the double-argument variant of kfree_rcu().
+ If this parameter has the same value as
+ rcuscale.kfree_rcu_test_single, both the single-
+ and double-argument variants are tested.
+
+ rcuscale.kfree_rcu_test_single= [KNL]
+ Test the single-argument variant of kfree_rcu().
+ If this parameter has the same value as
+ rcuscale.kfree_rcu_test_double, both the single-
+ and double-argument variants are tested.
+
+ rcuscale.kfree_nthreads= [KNL]
The number of threads running loops of kfree_rcu().
- rcuperf.kfree_alloc_num= [KNL]
+ rcuscale.kfree_alloc_num= [KNL]
Number of allocations and frees done in an iteration.
- rcuperf.kfree_loops= [KNL]
- Number of loops doing rcuperf.kfree_alloc_num number
+ rcuscale.kfree_loops= [KNL]
+ Number of loops doing rcuscale.kfree_alloc_num number
of allocations and frees.
- rcuperf.nreaders= [KNL]
+ rcuscale.minruntime= [KNL]
+ Set the minimum test run time in seconds. This
+ does not affect the data-collection interval,
+ but instead allows better measurement of things
+ like CPU consumption.
+
+ rcuscale.nreaders= [KNL]
Set number of RCU readers. The value -1 selects
N, where N is the number of CPUs. A value
"n" less than -1 selects N-n+1, where N is again
@@ -4103,27 +5104,32 @@
A value of "n" less than or equal to -N selects
a single reader.
- rcuperf.nwriters= [KNL]
+ rcuscale.nwriters= [KNL]
Set number of RCU writers. The values operate
- the same as for rcuperf.nreaders.
+ the same as for rcuscale.nreaders.
N, where N is the number of CPUs
- rcuperf.perf_type= [KNL]
+ rcuscale.scale_type= [KNL]
Specify the RCU implementation to test.
- rcuperf.shutdown= [KNL]
+ rcuscale.shutdown= [KNL]
Shut the system down after performance tests
complete. This is useful for hands-off automated
testing.
- rcuperf.verbose= [KNL]
+ rcuscale.verbose= [KNL]
Enable additional printk() statements.
- rcuperf.writer_holdoff= [KNL]
+ rcuscale.writer_holdoff= [KNL]
Write-side holdoff between grace periods,
in microseconds. The default of zero says
no holdoff.
+ rcuscale.writer_holdoff_jiffies= [KNL]
+ Additional write-side holdoff between grace
+ periods, but in jiffies. The default of zero
+ says no holdoff.
+
rcutorture.fqs_duration= [KNL]
Set duration of force_quiescent_state bursts
in microseconds.
@@ -4137,8 +5143,12 @@
in seconds.
rcutorture.fwd_progress= [KNL]
- Enable RCU grace-period forward-progress testing
+ Specifies the number of kthreads to be used
+ for RCU grace-period forward-progress testing
for the types of RCU supporting this notion.
+ Defaults to 1 kthread, values less than zero or
+ greater than the number of CPUs cause the number
+ of CPUs to be used.
rcutorture.fwd_progress_div= [KNL]
Specify the fraction of a CPU-stall-warning
@@ -4172,6 +5182,18 @@
are zero, rcutorture acts as if is interpreted
they are all non-zero.
+ rcutorture.irqreader= [KNL]
+ Run RCU readers from irq handlers, or, more
+ accurately, from a timer handler. Not all RCU
+ flavors take kindly to this sort of thing.
+
+ rcutorture.leakpointer= [KNL]
+ Leak an RCU-protected pointer out of the reader.
+ This can of course result in splats, and is
+ intended to test the ability of things like
+ CONFIG_RCU_STRICT_GRACE_PERIOD=y to detect
+ such leaks.
+
rcutorture.n_barrier_cbs= [KNL]
Set callbacks/threads for rcu_barrier() testing.
@@ -4180,6 +5202,14 @@
stress RCU, they don't participate in the actual
test, hence the "fake".
+ rcutorture.nocbs_nthreads= [KNL]
+ Set number of RCU callback-offload togglers.
+ Zero (the default) disables toggling.
+
+ rcutorture.nocbs_toggle= [KNL]
+ Set the delay in milliseconds between successive
+ callback-offload toggling attempts.
+
rcutorture.nreaders= [KNL]
Set number of RCU readers. The value -1 selects
N-1, where N is the number of CPUs. A value
@@ -4197,6 +5227,20 @@
Set time (jiffies) between CPU-hotplug operations,
or zero to disable CPU-hotplug testing.
+ rcutorture.read_exit= [KNL]
+ Set the number of read-then-exit kthreads used
+ to test the interaction of RCU updaters and
+ task-exit processing.
+
+ rcutorture.read_exit_burst= [KNL]
+ The number of times in a given read-then-exit
+ episode that a set of read-then-exit kthreads
+ is spawned.
+
+ rcutorture.read_exit_delay= [KNL]
+ The delay, in seconds, between successive
+ read-then-exit testing episodes.
+
rcutorture.shuffle_interval= [KNL]
Set task-shuffle interval (s). Shuffling tasks
allows some CPUs to go into dyntick-idle mode
@@ -4210,12 +5254,33 @@
Duration of CPU stall (s) to test RCU CPU stall
warnings, zero to disable.
+ rcutorture.stall_cpu_block= [KNL]
+ Sleep while stalling if set. This will result
+ in warnings from preemptible RCU in addition to
+ any other stall-related activity. Note that
+ in kernels built with CONFIG_PREEMPTION=n and
+ CONFIG_PREEMPT_COUNT=y, this parameter will
+ cause the CPU to pass through a quiescent state.
+ Given CONFIG_PREEMPTION=n, this will suppress
+ RCU CPU stall warnings, but will instead result
+ in scheduling-while-atomic splats.
+
+ Use of this module parameter results in splats.
+
+
rcutorture.stall_cpu_holdoff= [KNL]
Time to wait (s) after boot before inducing stall.
rcutorture.stall_cpu_irqsoff= [KNL]
Disable interrupts while stalling if set.
+ rcutorture.stall_gp_kthread= [KNL]
+ Duration (s) of forced sleep within RCU
+ grace-period kthread to test RCU CPU stall
+ warnings, zero to disable. If both stall_cpu
+ and stall_gp_kthread are specified, the
+ kthread is starved first, then the CPU.
+
rcutorture.stat_interval= [KNL]
Time (s) between statistics printk()s.
@@ -4250,6 +5315,12 @@
Dump ftrace buffer after reporting RCU CPU
stall warning.
+ rcupdate.rcu_cpu_stall_notifiers= [KNL]
+ Provide RCU CPU stall notifiers, but see the
+ warnings in the RCU_CPU_STALL_NOTIFIER Kconfig
+ option's help text. TL;DR: You almost certainly
+ do not want rcupdate.rcu_cpu_stall_notifiers.
+
rcupdate.rcu_cpu_stall_suppress= [KNL]
Suppress RCU CPU stall warning messages.
@@ -4261,6 +5332,29 @@
rcupdate.rcu_cpu_stall_timeout= [KNL]
Set timeout for RCU CPU stall warning messages.
+ The value is in seconds and the maximum allowed
+ value is 300 seconds.
+
+ rcupdate.rcu_exp_cpu_stall_timeout= [KNL]
+ Set timeout for expedited RCU CPU stall warning
+ messages. The value is in milliseconds
+ and the maximum allowed value is 21000
+ milliseconds. Please note that this value is
+ adjusted to an arch timer tick resolution.
+ Setting this to zero causes the value from
+ rcupdate.rcu_cpu_stall_timeout to be used (after
+ conversion from seconds to milliseconds).
+
+ rcupdate.rcu_cpu_stall_cputime= [KNL]
+ Provide statistics on the cputime and count of
+ interrupts and tasks during the sampling period. For
+ multiple continuous RCU stalls, all sampling periods
+ begin at half of the first RCU stall timeout.
+
+ rcupdate.rcu_exp_stall_task_details= [KNL]
+ Print stack dumps of any tasks blocking the
+ current expedited RCU grace period during an
+ expedited RCU CPU stall warning.
rcupdate.rcu_expedited= [KNL]
Use expedited grace-period primitives, for
@@ -4286,10 +5380,101 @@
only normal grace-period primitives. No effect
on CONFIG_TINY_RCU kernels.
+ But note that CONFIG_PREEMPT_RT=y kernels enables
+ this kernel boot parameter, forcibly setting
+ it to the value one, that is, converting any
+ post-boot attempt at an expedited RCU grace
+ period to instead use normal non-expedited
+ grace-period processing.
+
+ rcupdate.rcu_task_collapse_lim= [KNL]
+ Set the maximum number of callbacks present
+ at the beginning of a grace period that allows
+ the RCU Tasks flavors to collapse back to using
+ a single callback queue. This switching only
+ occurs when rcupdate.rcu_task_enqueue_lim is
+ set to the default value of -1.
+
+ rcupdate.rcu_task_contend_lim= [KNL]
+ Set the minimum number of callback-queuing-time
+ lock-contention events per jiffy required to
+ cause the RCU Tasks flavors to switch to per-CPU
+ callback queuing. This switching only occurs
+ when rcupdate.rcu_task_enqueue_lim is set to
+ the default value of -1.
+
+ rcupdate.rcu_task_enqueue_lim= [KNL]
+ Set the number of callback queues to use for the
+ RCU Tasks family of RCU flavors. The default
+ of -1 allows this to be automatically (and
+ dynamically) adjusted. This parameter is intended
+ for use in testing.
+
+ rcupdate.rcu_task_ipi_delay= [KNL]
+ Set time in jiffies during which RCU tasks will
+ avoid sending IPIs, starting with the beginning
+ of a given grace period. Setting a large
+ number avoids disturbing real-time workloads,
+ but lengthens grace periods.
+
+ rcupdate.rcu_task_lazy_lim= [KNL]
+ Number of callbacks on a given CPU that will
+ cancel laziness on that CPU. Use -1 to disable
+ cancellation of laziness, but be advised that
+ doing so increases the danger of OOM due to
+ callback flooding.
+
+ rcupdate.rcu_task_stall_info= [KNL]
+ Set initial timeout in jiffies for RCU task stall
+ informational messages, which give some indication
+ of the problem for those not patient enough to
+ wait for ten minutes. Informational messages are
+ only printed prior to the stall-warning message
+ for a given grace period. Disable with a value
+ less than or equal to zero. Defaults to ten
+ seconds. A change in value does not take effect
+ until the beginning of the next grace period.
+
+ rcupdate.rcu_task_stall_info_mult= [KNL]
+ Multiplier for time interval between successive
+ RCU task stall informational messages for a given
+ RCU tasks grace period. This value is clamped
+ to one through ten, inclusive. It defaults to
+ the value three, so that the first informational
+ message is printed 10 seconds into the grace
+ period, the second at 40 seconds, the third at
+ 160 seconds, and then the stall warning at 600
+ seconds would prevent a fourth at 640 seconds.
+
rcupdate.rcu_task_stall_timeout= [KNL]
- Set timeout in jiffies for RCU task stall warning
- messages. Disable with a value less than or equal
- to zero.
+ Set timeout in jiffies for RCU task stall
+ warning messages. Disable with a value less
+ than or equal to zero. Defaults to ten minutes.
+ A change in value does not take effect until
+ the beginning of the next grace period.
+
+ rcupdate.rcu_tasks_lazy_ms= [KNL]
+ Set timeout in milliseconds RCU Tasks asynchronous
+ callback batching for call_rcu_tasks().
+ A negative value will take the default. A value
+ of zero will disable batching. Batching is
+ always disabled for synchronize_rcu_tasks().
+
+ rcupdate.rcu_tasks_rude_lazy_ms= [KNL]
+ Set timeout in milliseconds RCU Tasks
+ Rude asynchronous callback batching for
+ call_rcu_tasks_rude(). A negative value
+ will take the default. A value of zero will
+ disable batching. Batching is always disabled
+ for synchronize_rcu_tasks_rude().
+
+ rcupdate.rcu_tasks_trace_lazy_ms= [KNL]
+ Set timeout in milliseconds RCU Tasks
+ Trace asynchronous callback batching for
+ call_rcu_tasks_trace(). A negative value
+ will take the default. A value of zero will
+ disable batching. Batching is always disabled
+ for synchronize_rcu_tasks_trace().
rcupdate.rcu_self_test= [KNL]
Run the RCU early boot self tests
@@ -4309,13 +5494,13 @@
rdt= [HW,X86,RDT]
Turn on/off individual RDT features. List is:
cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
- mba.
+ mba, smba, bmec.
E.g. to turn on cmt and turn off mba use:
rdt=cmt,!mba
reboot= [KNL]
Format (x86 or x86_64):
- [w[arm] | c[old] | h[ard] | s[oft] | g[pio]] \
+ [w[arm] | c[old] | h[ard] | s[oft] | g[pio]] | d[efault] \
[[,]s[mp]#### \
[[,]b[ios] | a[cpi] | k[bd] | t[riple] | e[fi] | p[ci]] \
[[,]f[orce]
@@ -4327,6 +5512,64 @@
reboot_cpu is s[mp]#### with #### being the processor
to be used for rebooting.
+ refscale.holdoff= [KNL]
+ Set test-start holdoff period. The purpose of
+ this parameter is to delay the start of the
+ test until boot completes in order to avoid
+ interference.
+
+ refscale.lookup_instances= [KNL]
+ Number of data elements to use for the forms of
+ SLAB_TYPESAFE_BY_RCU testing. A negative number
+ is negated and multiplied by nr_cpu_ids, while
+ zero specifies nr_cpu_ids.
+
+ refscale.loops= [KNL]
+ Set the number of loops over the synchronization
+ primitive under test. Increasing this number
+ reduces noise due to loop start/end overhead,
+ but the default has already reduced the per-pass
+ noise to a handful of picoseconds on ca. 2020
+ x86 laptops.
+
+ refscale.nreaders= [KNL]
+ Set number of readers. The default value of -1
+ selects N, where N is roughly 75% of the number
+ of CPUs. A value of zero is an interesting choice.
+
+ refscale.nruns= [KNL]
+ Set number of runs, each of which is dumped onto
+ the console log.
+
+ refscale.readdelay= [KNL]
+ Set the read-side critical-section duration,
+ measured in microseconds.
+
+ refscale.scale_type= [KNL]
+ Specify the read-protection implementation to test.
+
+ refscale.shutdown= [KNL]
+ Shut down the system at the end of the performance
+ test. This defaults to 1 (shut it down) when
+ refscale is built into the kernel and to 0 (leave
+ it running) when refscale is built as a module.
+
+ refscale.verbose= [KNL]
+ Enable additional printk() statements.
+
+ refscale.verbose_batched= [KNL]
+ Batch the additional printk() statements. If zero
+ (the default) or negative, print everything. Otherwise,
+ print every Nth verbose statement, where N is the value
+ specified.
+
+ regulator_ignore_unused
+ [REGULATOR]
+ Prevents regulator framework from disabling regulators
+ that are unused, due no driver claiming them. This may
+ be useful for debug and development, but should not be
+ needed on a platform with proper driver support.
+
relax_domain_level=
[KNL, SMP] Set scheduler's default relax_domain_level.
See Documentation/admin-guide/cgroup-v1/cpusets.rst.
@@ -4342,11 +5585,6 @@
Reserves a hole at the top of the kernel virtual
address space.
- reservelow= [X86]
- Format: nn[K]
- Set the amount of memory to reserve for BIOS at
- the bottom of the address space.
-
reset_devices [KNL] Force drivers to reset the underlying device
during initialization.
@@ -4368,16 +5606,45 @@
Useful for devices that are detected asynchronously
(e.g. USB and MMC devices).
- hibernate= [HIBERNATION]
- noresume Don't check if there's a hibernation image
- present during boot.
- nocompress Don't compress/decompress hibernation images.
- no Disable hibernation and resume.
- protect_image Turn on image protection during restoration
- (that will set all pages holding image data
- during restoration read-only).
-
- retain_initrd [RAM] Keep initrd memory after extraction
+ retain_initrd [RAM] Keep initrd memory after extraction. After boot, it will
+ be accessible via /sys/firmware/initrd.
+
+ retbleed= [X86] Control mitigation of RETBleed (Arbitrary
+ Speculative Code Execution with Return Instructions)
+ vulnerability.
+
+ AMD-based UNRET and IBPB mitigations alone do not stop
+ sibling threads from influencing the predictions of other
+ sibling threads. For that reason, STIBP is used on pro-
+ cessors that support it, and mitigate SMT on processors
+ that don't.
+
+ off - no mitigation
+ auto - automatically select a migitation
+ auto,nosmt - automatically select a mitigation,
+ disabling SMT if necessary for
+ the full mitigation (only on Zen1
+ and older without STIBP).
+ ibpb - On AMD, mitigate short speculation
+ windows on basic block boundaries too.
+ Safe, highest perf impact. It also
+ enables STIBP if present. Not suitable
+ on Intel.
+ ibpb,nosmt - Like "ibpb" above but will disable SMT
+ when STIBP is not available. This is
+ the alternative for systems which do not
+ have STIBP.
+ unret - Force enable untrained return thunks,
+ only effective on AMD f15h-f17h based
+ systems.
+ unret,nosmt - Like unret, but will disable SMT when STIBP
+ is not available. This is the alternative for
+ systems which do not have STIBP.
+
+ Selecting 'auto' will choose a mitigation method at run
+ time according to the CPU.
+
+ Not specifying this option is equivalent to retbleed=auto.
rfkill.default_state=
0 "airplane mode". All wifi, bluetooth, wimax, gps, fm,
@@ -4398,11 +5665,20 @@
[KNL] Disable ring 3 MONITOR/MWAIT feature on supported
CPUs.
+ riscv_isa_fallback [RISCV]
+ When CONFIG_RISCV_ISA_FALLBACK is not enabled, permit
+ falling back to detecting extension support by parsing
+ "riscv,isa" property on devicetree systems when the
+ replacement properties are not found. See the Kconfig
+ entry for RISCV_ISA_FALLBACK.
+
ro [KNL] Mount root device read-only on boot
rodata= [KNL]
on Mark read-only kernel memory as read-only (default).
off Leave read-only kernel memory writable for debugging.
+ full Mark read-only kernel memory and aliases as read-only
+ [arm64]
rockchip.usb_uart
Enable the uart passthrough on the designated usb port
@@ -4411,7 +5687,12 @@
port and the regular usb controller gets disabled.
root= [KNL] Root filesystem
- See name_to_dev_t comment in init/do_mounts.c.
+ Usually this a a block device specifier of some kind,
+ see the early_lookup_bdev comment in
+ block/early-lookup.c for details.
+ Alternatively this can be "ram" for the legacy initial
+ ramdisk, "nfs" and "cifs" for root on a network file
+ system, or "mtd" and "ubi" for mounting from raw flash.
rootdelay= [KNL] Delay (in seconds) to pause before attempting to
mount the root filesystem
@@ -4424,6 +5705,10 @@
Useful for devices that are detected asynchronously
(e.g. USB and MMC devices).
+ rootwait= [KNL] Maximum time (in seconds) to wait for root device
+ to show up before attempting to mount the root
+ filesystem.
+
rproc_mem=nn[KMG][@address]
[KNL,ARM,CMA] Remoteproc physical memory block.
Memory area to be used by remote processor image,
@@ -4436,16 +5721,27 @@
s390_iommu= [HW,S390]
Set s390 IOTLB flushing mode
strict
- With strict flushing every unmap operation will result in
- an IOTLB flush. Default is lazy flushing before reuse,
- which is faster.
+ With strict flushing every unmap operation will result
+ in an IOTLB flush. Default is lazy flushing before
+ reuse, which is faster. Deprecated, equivalent to
+ iommu.strict=1.
+
+ s390_iommu_aperture= [KNL,S390]
+ Specifies the size of the per device DMA address space
+ accessible through the DMA and IOMMU APIs as a decimal
+ factor of the size of main memory.
+ The default is 1 meaning that one can concurrently use
+ as many DMA addresses as physical memory is installed,
+ if supported by hardware, and thus map all of memory
+ once. With a value of 2 one can map all of memory twice
+ and so on. As a special case a factor of 0 imposes no
+ restrictions other than those given by hardware at the
+ cost of significant additional memory use for tables.
sa1100ir [NET]
See drivers/net/irda/sa1100_ir.c.
- sbni= [NET] Granch SBNI12 leased line adapter
-
- sched_debug [KNL] Enables verbose scheduler debug messages.
+ sched_verbose [KNL] Enables verbose scheduler debug messages.
schedstats= [KNL,X86] Enable or disable scheduled statistics.
Allowed values are enable and disable. This feature
@@ -4468,6 +5764,98 @@
Format: integer between 0 and 10
Default is 0.
+ scftorture.holdoff= [KNL]
+ Number of seconds to hold off before starting
+ test. Defaults to zero for module insertion and
+ to 10 seconds for built-in smp_call_function()
+ tests.
+
+ scftorture.longwait= [KNL]
+ Request ridiculously long waits randomly selected
+ up to the chosen limit in seconds. Zero (the
+ default) disables this feature. Please note
+ that requesting even small non-zero numbers of
+ seconds can result in RCU CPU stall warnings,
+ softlockup complaints, and so on.
+
+ scftorture.nthreads= [KNL]
+ Number of kthreads to spawn to invoke the
+ smp_call_function() family of functions.
+ The default of -1 specifies a number of kthreads
+ equal to the number of CPUs.
+
+ scftorture.onoff_holdoff= [KNL]
+ Number seconds to wait after the start of the
+ test before initiating CPU-hotplug operations.
+
+ scftorture.onoff_interval= [KNL]
+ Number seconds to wait between successive
+ CPU-hotplug operations. Specifying zero (which
+ is the default) disables CPU-hotplug operations.
+
+ scftorture.shutdown_secs= [KNL]
+ The number of seconds following the start of the
+ test after which to shut down the system. The
+ default of zero avoids shutting down the system.
+ Non-zero values are useful for automated tests.
+
+ scftorture.stat_interval= [KNL]
+ The number of seconds between outputting the
+ current test statistics to the console. A value
+ of zero disables statistics output.
+
+ scftorture.stutter_cpus= [KNL]
+ The number of jiffies to wait between each change
+ to the set of CPUs under test.
+
+ scftorture.use_cpus_read_lock= [KNL]
+ Use use_cpus_read_lock() instead of the default
+ preempt_disable() to disable CPU hotplug
+ while invoking one of the smp_call_function*()
+ functions.
+
+ scftorture.verbose= [KNL]
+ Enable additional printk() statements.
+
+ scftorture.weight_single= [KNL]
+ The probability weighting to use for the
+ smp_call_function_single() function with a zero
+ "wait" parameter. A value of -1 selects the
+ default if all other weights are -1. However,
+ if at least one weight has some other value, a
+ value of -1 will instead select a weight of zero.
+
+ scftorture.weight_single_wait= [KNL]
+ The probability weighting to use for the
+ smp_call_function_single() function with a
+ non-zero "wait" parameter. See weight_single.
+
+ scftorture.weight_many= [KNL]
+ The probability weighting to use for the
+ smp_call_function_many() function with a zero
+ "wait" parameter. See weight_single.
+ Note well that setting a high probability for
+ this weighting can place serious IPI load
+ on the system.
+
+ scftorture.weight_many_wait= [KNL]
+ The probability weighting to use for the
+ smp_call_function_many() function with a
+ non-zero "wait" parameter. See weight_single
+ and weight_many.
+
+ scftorture.weight_all= [KNL]
+ The probability weighting to use for the
+ smp_call_function_all() function with a zero
+ "wait" parameter. See weight_single and
+ weight_many.
+
+ scftorture.weight_all_wait= [KNL]
+ The probability weighting to use for the
+ smp_call_function_all() function with a
+ non-zero "wait" parameter. See weight_single
+ and weight_many.
+
skew_tick= [KNL] Offset the periodic timer tick per cpu to mitigate
xtime_lock contention on larger systems, and/or RCU lock
contention on all systems with CONFIG_MAXSMP set.
@@ -4488,23 +5876,31 @@
1 -- enable.
Default value is 1.
- apparmor= [APPARMOR] Disable or enable AppArmor at boot time
- Format: { "0" | "1" }
- See security/apparmor/Kconfig help text
- 0 -- disable.
- 1 -- enable.
- Default value is set via kernel config option.
-
serialnumber [BUGS=X86-32]
+ sev=option[,option...] [X86-64] See Documentation/arch/x86/x86_64/boot-options.rst
+
shapers= [NET]
Maximal number of shapers.
+ show_lapic= [APIC,X86] Advanced Programmable Interrupt Controller
+ Limit apic dumping. The parameter defines the maximal
+ number of local apics being dumped. Also it is possible
+ to set it to "all" by meaning -- no limit here.
+ Format: { 1 (default) | 2 | ... | all }.
+ The parameter valid if only apic=debug or
+ apic=verbose is specified.
+ Example: apic=debug show_lapic=all
+
simeth= [IA-64]
simscsi=
slram= [HW,MTD]
+ slab_merge [MM]
+ Enable merging of slabs with similar size when the
+ kernel is built without CONFIG_SLAB_MERGE_DEFAULT.
+
slab_nomerge [MM]
Disable merging of slabs with similar size. May be
necessary if there is some reason to distinguish
@@ -4516,7 +5912,7 @@
cache (risks via metadata attacks are mostly
unchanged). Debug options disable merging on their
own.
- For more information see Documentation/vm/slub.rst.
+ For more information see Documentation/mm/slub.rst.
slab_max_order= [MM, SLAB]
Determines the maximum allowed order for slabs.
@@ -4524,27 +5920,19 @@
fragmentation. Defaults to 1 for systems with
more than 32MB of RAM, 0 otherwise.
- slub_debug[=options[,slabs]] [MM, SLUB]
+ slub_debug[=options[,slabs][;[options[,slabs]]...] [MM, SLUB]
Enabling slub_debug allows one to determine the
culprit if slab objects become corrupted. Enabling
slub_debug can create guard zones around objects and
may poison objects when not in use. Also tracks the
last alloc / free. For more information see
- Documentation/vm/slub.rst.
-
- slub_memcg_sysfs= [MM, SLUB]
- Determines whether to enable sysfs directories for
- memory cgroup sub-caches. 1 to enable, 0 to disable.
- The default is determined by CONFIG_SLUB_MEMCG_SYSFS_ON.
- Enabling this can lead to a very high number of debug
- directories and files being created under
- /sys/kernel/slub.
+ Documentation/mm/slub.rst.
slub_max_order= [MM, SLUB]
Determines the maximum allowed order for slabs.
A high setting may cause OOMs due to memory
fragmentation. For more information see
- Documentation/vm/slub.rst.
+ Documentation/mm/slub.rst.
slub_min_objects= [MM, SLUB]
The minimum number of objects per slab. SLUB will
@@ -4553,12 +5941,15 @@
the number of objects indicated. The higher the number
of objects the smaller the overhead of tracking slabs
and the less frequently locks need to be acquired.
- For more information see Documentation/vm/slub.rst.
+ For more information see Documentation/mm/slub.rst.
slub_min_order= [MM, SLUB]
Determines the minimum page order for slabs. Must be
lower than slub_max_order.
- For more information see Documentation/vm/slub.rst.
+ For more information see Documentation/mm/slub.rst.
+
+ slub_merge [MM, SLUB]
+ Same with slab_merge.
slub_nomerge [MM, SLUB]
Same with slab_nomerge. This is supported for legacy.
@@ -4567,6 +5958,24 @@
smart2= [HW]
Format: <io1>[,<io2>[,...,<io8>]]
+ smp.csd_lock_timeout= [KNL]
+ Specify the period of time in milliseconds
+ that smp_call_function() and friends will wait
+ for a CPU to release the CSD lock. This is
+ useful when diagnosing bugs involving CPUs
+ disabling interrupts for extended periods
+ of time. Defaults to 5,000 milliseconds, and
+ setting a value of zero disables this feature.
+ This feature may be more efficiently disabled
+ using the csdlock_debug- kernel parameter.
+
+ smp.panic_on_ipistall= [KNL]
+ If a csd_lock_timeout extends for more than
+ the specified number of milliseconds, panic the
+ system. By default, let CSD-lock acquisition
+ take as long as they take. Specifying 300,000
+ for this value provides a 5-minute timeout.
+
smsc-ircc2.nopnp [HW] Don't use PNP to discover SMC devices
smsc-ircc2.ircc_cfg= [HW] Device configuration I/O port
smsc-ircc2.ircc_sir= [HW] SIR base I/O port
@@ -4578,7 +5987,7 @@
1: Fast pin select (default)
2: ATC IRMode
- smt [KNL,S390] Set the maximum number of threads (logical
+ smt= [KNL,MIPS,S390] Set the maximum number of threads (logical
CPUs) to use per physical CPU on systems capable of
symmetric multithreading (SMT). Will be capped to the
actual hardware limit.
@@ -4587,9 +5996,9 @@
softlockup_panic=
[KNL] Should the soft-lockup detector generate panics.
- Format: <integer>
+ Format: 0 | 1
- A nonzero value instructs the soft-lockup detector
+ A value of 1 instructs the soft-lockup detector
to panic the machine when a soft-lockup occurs. It is
also controlled by the kernel.softlockup_panic sysctl
and CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC, which is the
@@ -4598,7 +6007,7 @@
softlockup_all_cpu_backtrace=
[KNL] Should the soft-lockup detector generate
backtraces on all cpus.
- Format: <integer>
+ Format: 0 | 1
sonypi.*= [HW] Sony Programmable I/O Control Device driver
See Documentation/admin-guide/laptops/sonypi.rst
@@ -4630,8 +6039,13 @@
Specific mitigations can also be selected manually:
retpoline - replace indirect branches
- retpoline,generic - google's original retpoline
- retpoline,amd - AMD-specific minimal thunk
+ retpoline,generic - Retpolines
+ retpoline,lfence - LFENCE; indirect branch
+ retpoline,amd - alias for retpoline,lfence
+ eibrs - Enhanced/Auto IBRS
+ eibrs,retpoline - Enhanced/Auto IBRS + Retpolines
+ eibrs,lfence - Enhanced/Auto IBRS + LFENCE
+ ibrs - use IBRS to protect kernel
Not specifying this option is equivalent to
spectre_v2=auto.
@@ -4672,12 +6086,22 @@
auto - Kernel selects the mitigation depending on
the available CPU features and vulnerability.
- Default mitigation:
- If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl"
+ Default mitigation: "prctl"
Not specifying this option is equivalent to
spectre_v2_user=auto.
+ spec_rstack_overflow=
+ [X86] Control RAS overflow mitigation on AMD Zen CPUs
+
+ off - Disable mitigation
+ microcode - Enable microcode mitigation only
+ safe-ret - Enable sw-only safe RET mitigation (default)
+ ibpb - Enable mitigation by issuing IBPB on
+ kernel entry
+ ibpb-vmexit - Issue IBPB only on VMEXIT
+ (cloud-specific mitigation)
+
spec_store_bypass_disable=
[HW] Control Speculative Store Bypass (SSB) Disable mitigation
(Speculative Store Bypass vulnerability)
@@ -4717,7 +6141,7 @@
will disable SSB unless they explicitly opt out.
Default mitigations:
- X86: If CONFIG_SECCOMP=y "seccomp", otherwise "prctl"
+ X86: "prctl"
On powerpc the options are:
@@ -4736,27 +6160,89 @@
spia_peddr=
split_lock_detect=
- [X86] Enable split lock detection
+ [X86] Enable split lock detection or bus lock detection
When enabled (and if hardware support is present), atomic
instructions that access data across cache line
- boundaries will result in an alignment check exception.
+ boundaries will result in an alignment check exception
+ for split lock detection or a debug exception for
+ bus lock detection.
off - not enabled
- warn - the kernel will emit rate limited warnings
+ warn - the kernel will emit rate-limited warnings
about applications triggering the #AC
- exception. This mode is the default on CPUs
- that supports split lock detection.
+ exception or the #DB exception. This mode is
+ the default on CPUs that support split lock
+ detection or bus lock detection. Default
+ behavior is by #AC if both features are
+ enabled in hardware.
fatal - the kernel will send SIGBUS to applications
- that trigger the #AC exception.
+ that trigger the #AC exception or the #DB
+ exception. Default behavior is by #AC if
+ both features are enabled in hardware.
+
+ ratelimit:N -
+ Set system wide rate limit to N bus locks
+ per second for bus lock detection.
+ 0 < N <= 1000.
+
+ N/A for split lock detection.
+
If an #AC exception is hit in the kernel or in
firmware (i.e. not while executing in user mode)
the kernel will oops in either "warn" or "fatal"
mode.
+ #DB exception for bus lock is triggered only when
+ CPL > 0.
+
+ srbds= [X86,INTEL]
+ Control the Special Register Buffer Data Sampling
+ (SRBDS) mitigation.
+
+ Certain CPUs are vulnerable to an MDS-like
+ exploit which can leak bits from the random
+ number generator.
+
+ By default, this issue is mitigated by
+ microcode. However, the microcode fix can cause
+ the RDRAND and RDSEED instructions to become
+ much slower. Among other effects, this will
+ result in reduced throughput from /dev/urandom.
+
+ The microcode mitigation can be disabled with
+ the following option:
+
+ off: Disable mitigation and remove
+ performance impact to RDRAND and RDSEED
+
+ srcutree.big_cpu_lim [KNL]
+ Specifies the number of CPUs constituting a
+ large system, such that srcu_struct structures
+ should immediately allocate an srcu_node array.
+ This kernel-boot parameter defaults to 128,
+ but takes effect only when the low-order four
+ bits of srcutree.convert_to_big is equal to 3
+ (decide at boot).
+
+ srcutree.convert_to_big [KNL]
+ Specifies under what conditions an SRCU tree
+ srcu_struct structure will be converted to big
+ form, that is, with an rcu_node tree:
+
+ 0: Never.
+ 1: At init_srcu_struct() time.
+ 2: When rcutorture decides to.
+ 3: Decide at boot time (default).
+ 0x1X: Above plus if high contention.
+
+ Either way, the srcu_node tree will be sized based
+ on the actual runtime number of CPUs (nr_cpu_ids)
+ instead of the compile-time CONFIG_NR_CPUS.
+
srcutree.counter_wrap_check [KNL]
Specifies how frequently to check for
grace-period sequence counter wrap for the
@@ -4774,6 +6260,32 @@
expediting. Set to zero to disable automatic
expediting.
+ srcutree.srcu_max_nodelay [KNL]
+ Specifies the number of no-delay instances
+ per jiffy for which the SRCU grace period
+ worker thread will be rescheduled with zero
+ delay. Beyond this limit, worker thread will
+ be rescheduled with a sleep delay of one jiffy.
+
+ srcutree.srcu_max_nodelay_phase [KNL]
+ Specifies the per-grace-period phase, number of
+ non-sleeping polls of readers. Beyond this limit,
+ grace period worker thread will be rescheduled
+ with a sleep delay of one jiffy, between each
+ rescan of the readers, for a grace period phase.
+
+ srcutree.srcu_retry_check_delay [KNL]
+ Specifies number of microseconds of non-sleeping
+ delay between each non-sleeping poll of readers.
+
+ srcutree.small_contention_lim [KNL]
+ Specifies the number of update-side contention
+ events per jiffy will be tolerated before
+ initiating a conversion of an srcu_struct
+ structure to big form. Note that the value of
+ srcutree.convert_to_big must have the 0x10 bit
+ set for contention-based conversions to occur.
+
ssbd= [ARM64,HW]
Speculative Store Bypass Disable control
@@ -4798,12 +6310,18 @@
growing up) the main stack are reserved for no other
mapping. Default value is 256 pages.
+ stack_depot_disable= [KNL]
+ Setting this to true through kernel command line will
+ disable the stack depot thereby saving the static memory
+ consumed by the stack hash table. By default this is set
+ to false.
+
stacktrace [FTRACE]
Enabled the stack tracer on boot up.
stacktrace_filter=[function-list]
[FTRACE] Limit the functions that the stack tracer
- will trace at boot up. function-list is a comma separated
+ will trace at boot up. function-list is a comma-separated
list of functions. This list can be changed at run
time by the stack_trace_filter file in the debugfs
tracing directory. Note, this enables stack tracing
@@ -4822,6 +6340,25 @@
stifb= [HW]
Format: bpp:<bpp1>[:<bpp2>[:<bpp3>...]]
+ strict_sas_size=
+ [X86]
+ Format: <bool>
+ Enable or disable strict sigaltstack size checks
+ against the required signal frame size which
+ depends on the supported FPU features. This can
+ be used to filter out binaries which have
+ not yet been made aware of AT_MINSIGSTKSZ.
+
+ stress_hpt [PPC]
+ Limits the number of kernel HPT entries in the hash
+ page table to increase the rate of hash page table
+ faults on kernel addresses.
+
+ stress_slb [PPC]
+ Limits the number of kernel SLB entries, and flushes
+ them frequently to increase the rate of SLB faults
+ on kernel addresses.
+
sunrpc.min_resvport=
sunrpc.max_resvport=
[NFS,SUNRPC]
@@ -4877,28 +6414,26 @@
This parameter controls use of the Protected
Execution Facility on pSeries.
- swapaccount=[0|1]
- [KNL] Enable accounting of swap in memory resource
- controller if no parameter or 1 is given or disable
- it if 0 is given (See Documentation/admin-guide/cgroup-v1/memory.rst)
-
swiotlb= [ARM,IA-64,PPC,MIPS,X86]
- Format: { <int> | force | noforce }
+ Format: { <int> [,<int>] | force | noforce }
<int> -- Number of I/O TLB slabs
+ <int> -- Second integer after comma. Number of swiotlb
+ areas with their own lock. Will be rounded up
+ to a power of 2.
force -- force using of bounce buffers even if they
wouldn't be automatically used by the kernel
noforce -- Never use bounce buffers (for debugging)
switches= [HW,M68k]
- sysfs.deprecated=0|1 [KNL]
- Enable/disable old style sysfs layout for old udev
- on older distributions. When this option is enabled
- very new udev will not work anymore. When this option
- is disabled (or CONFIG_SYSFS_DEPRECATED not compiled)
- in older udev will not work anymore.
- Default depends on CONFIG_SYSFS_DEPRECATED_V2 set in
- the kernel configuration.
+ sysctl.*= [KNL]
+ Set a sysctl parameter, right before loading the init
+ process, as if the value was written to the respective
+ /proc/sys/... file. Both '.' and '/' are recognized as
+ separators. Unrecognized parameters and invalid values
+ are reported in the kernel log. Sysctls registered
+ later by a loaded module cannot be set this way.
+ Example: sysctl.vm.swappiness=40
sysrq_always_enabled
[KNL]
@@ -4910,12 +6445,13 @@
Set the number of tcp_metrics_hash slots.
Default value is 8192 or 16384 depending on total
ram pages. This is used to specify the TCP metrics
- cache size. See Documentation/networking/ip-sysctl.txt
+ cache size. See Documentation/networking/ip-sysctl.rst
"tcp_no_metrics_save" section for more details.
tdfx= [HW,DRM]
- test_suspend= [SUSPEND][,N]
+ test_suspend= [SUSPEND]
+ Format: { "mem" | "standby" | "freeze" }[,N]
Specify "mem" (for Suspend-to-RAM) or "standby" (for
standby suspend) or "freeze" (for suspend type freeze)
as the system sleep state during system startup with
@@ -4934,10 +6470,6 @@
-1: disable all critical trip points in all thermal zones
<degrees C>: override all critical trip points
- thermal.nocrt= [HW,ACPI]
- Set to disable actions on ACPI thermal zone
- critical and hot trip points.
-
thermal.off= [HW,ACPI]
1: disable ACPI thermal control
@@ -4973,6 +6505,21 @@
Prevent the CPU-hotplug component of torturing
until after init has spawned.
+ torture.ftrace_dump_at_shutdown= [KNL]
+ Dump the ftrace buffer at torture-test shutdown,
+ even if there were no errors. This can be a
+ very costly operation when many torture tests
+ are running concurrently, especially on systems
+ with rotating-rust storage.
+
+ torture.verbose_sleep_frequency= [KNL]
+ Specifies how many verbose printk()s should be
+ emitted between each sleep. The default of zero
+ disables verbose-printk() sleeping.
+
+ torture.verbose_sleep_duration= [KNL]
+ Duration of each verbose-printk() sleep in jiffies.
+
tp720= [HW,PS2]
tpm_suspend_pcr=[HW,TPM]
@@ -4984,22 +6531,102 @@
This will guarantee that all the other pcrs
are saved.
+ tpm_tis.interrupts= [HW,TPM]
+ Enable interrupts for the MMIO based physical layer
+ for the FIFO interface. By default it is set to false
+ (0). For more information about TPM hardware interfaces
+ defined by Trusted Computing Group (TCG) see
+ https://trustedcomputinggroup.org/resource/pc-client-platform-tpm-profile-ptp-specification/
+
+ tp_printk [FTRACE]
+ Have the tracepoints sent to printk as well as the
+ tracing ring buffer. This is useful for early boot up
+ where the system hangs or reboots and does not give the
+ option for reading the tracing buffer or performing a
+ ftrace_dump_on_oops.
+
+ To turn off having tracepoints sent to printk,
+ echo 0 > /proc/sys/kernel/tracepoint_printk
+ Note, echoing 1 into this file without the
+ tracepoint_printk kernel cmdline option has no effect.
+
+ The tp_printk_stop_on_boot (see below) can also be used
+ to stop the printing of events to console at
+ late_initcall_sync.
+
+ ** CAUTION **
+
+ Having tracepoints sent to printk() and activating high
+ frequency tracepoints such as irq or sched, can cause
+ the system to live lock.
+
+ tp_printk_stop_on_boot [FTRACE]
+ When tp_printk (above) is set, it can cause a lot of noise
+ on the console. It may be useful to only include the
+ printing of events during boot up, as user space may
+ make the system inoperable.
+
+ This command line option will stop the printing of events
+ to console at the late_initcall_sync() time frame.
+
trace_buf_size=nn[KMG]
[FTRACE] will set tracing buffer size on each cpu.
+ trace_clock= [FTRACE] Set the clock used for tracing events
+ at boot up.
+ local - Use the per CPU time stamp counter
+ (converted into nanoseconds). Fast, but
+ depending on the architecture, may not be
+ in sync between CPUs.
+ global - Event time stamps are synchronize across
+ CPUs. May be slower than the local clock,
+ but better for some race conditions.
+ counter - Simple counting of events (1, 2, ..)
+ note, some counts may be skipped due to the
+ infrastructure grabbing the clock more than
+ once per event.
+ uptime - Use jiffies as the time stamp.
+ perf - Use the same clock that perf uses.
+ mono - Use ktime_get_mono_fast_ns() for time stamps.
+ mono_raw - Use ktime_get_raw_fast_ns() for time
+ stamps.
+ boot - Use ktime_get_boot_fast_ns() for time stamps.
+ Architectures may add more clocks. See
+ Documentation/trace/ftrace.rst for more details.
+
trace_event=[event-list]
[FTRACE] Set and start specified trace events in order
to facilitate early boot debugging. The event-list is a
- comma separated list of trace events to enable. See
+ comma-separated list of trace events to enable. See
also Documentation/trace/events.rst
+ trace_instance=[instance-info]
+ [FTRACE] Create a ring buffer instance early in boot up.
+ This will be listed in:
+
+ /sys/kernel/tracing/instances
+
+ Events can be enabled at the time the instance is created
+ via:
+
+ trace_instance=<name>,<system1>:<event1>,<system2>:<event2>
+
+ Note, the "<system*>:" portion is optional if the event is
+ unique.
+
+ trace_instance=foo,sched:sched_switch,irq_handler_entry,initcall
+
+ will enable the "sched_switch" event (note, the "sched:" is optional, and
+ the same thing would happen if it was left off). The irq_handler_entry
+ event, and all events under the "initcall" system.
+
trace_options=[option-list]
[FTRACE] Enable or disable tracer options at boot.
The option-list is a comma delimited list of options
that can be enabled or disabled just as if you were
to echo the option name into
- /sys/kernel/debug/tracing/trace_options
+ /sys/kernel/tracing/trace_options
For example, to enable stacktrace option (to dump the
stack trace of each event), add to the command line:
@@ -5009,29 +6636,30 @@
See also Documentation/trace/ftrace.rst "trace options"
section.
- tp_printk[FTRACE]
- Have the tracepoints sent to printk as well as the
- tracing ring buffer. This is useful for early boot up
- where the system hangs or reboots and does not give the
- option for reading the tracing buffer or performing a
- ftrace_dump_on_oops.
+ trace_trigger=[trigger-list]
+ [FTRACE] Add a event trigger on specific events.
+ Set a trigger on top of a specific event, with an optional
+ filter.
- To turn off having tracepoints sent to printk,
- echo 0 > /proc/sys/kernel/tracepoint_printk
- Note, echoing 1 into this file without the
- tracepoint_printk kernel cmdline option has no effect.
+ The format is is "trace_trigger=<event>.<trigger>[ if <filter>],..."
+ Where more than one trigger may be specified that are comma deliminated.
- ** CAUTION **
+ For example:
+
+ trace_trigger="sched_switch.stacktrace if prev_state == 2"
+
+ The above will enable the "stacktrace" trigger on the "sched_switch"
+ event but only trigger it if the "prev_state" of the "sched_switch"
+ event is "2" (TASK_UNINTERUPTIBLE).
+
+ See also "Event triggers" in Documentation/trace/events.rst
- Having tracepoints sent to printk() and activating high
- frequency tracepoints such as irq or sched, can cause
- the system to live lock.
traceoff_on_warning
[FTRACE] enable this option to disable tracing when a
warning is hit. This turns off "tracing_on". Tracing can
be enabled again by echoing '1' into the "tracing_on"
- file located in /sys/kernel/debug/tracing/
+ file located in /sys/kernel/tracing/
This option is useful, as it disables the trace before
the WARNING dump is called, which prevents the trace to
@@ -5048,6 +6676,29 @@
See Documentation/admin-guide/mm/transhuge.rst
for more details.
+ trusted.source= [KEYS]
+ Format: <string>
+ This parameter identifies the trust source as a backend
+ for trusted keys implementation. Supported trust
+ sources:
+ - "tpm"
+ - "tee"
+ - "caam"
+ If not specified then it defaults to iterating through
+ the trust source list starting with TPM and assigns the
+ first trust source as a backend which is initialized
+ successfully during iteration.
+
+ trusted.rng= [KEYS]
+ Format: <string>
+ The RNG used to generate key material for trusted keys.
+ Can be one of:
+ - "kernel"
+ - the same value as trusted.source: "tpm" or "tee"
+ - "default"
+ If not specified, "default" is used. In this case,
+ the RNG's choice is left to each individual trust source.
+
tsc= Disable clocksource stability checks for TSC.
Format: <string>
[x86] reliable: mark tsc clocksource as reliable, this
@@ -5066,6 +6717,22 @@
in situations with strict latency requirements (where
interruptions from clocksource watchdog are not
acceptable).
+ [x86] recalibrate: force recalibration against a HW timer
+ (HPET or PM timer) on systems whose TSC frequency was
+ obtained from HW or FW using either an MSR or CPUID(0x15).
+ Warn if the difference is more than 500 ppm.
+ [x86] watchdog: Use TSC as the watchdog clocksource with
+ which to check other HW timers (HPET or PM timer), but
+ only on systems where TSC has been deemed trustworthy.
+ This will be suppressed by an earlier tsc=nowatchdog and
+ can be overridden by a later tsc=nowatchdog. A console
+ message will flag any such suppression or overriding.
+
+ tsc_early_khz= [X86] Skip early TSC calibration and use the given
+ value instead. Useful when the early TSC frequency discovery
+ procedure is not reliable, such as on overclocked systems
+ with CPUID.16h support and partial CPUID.15h support.
+ Format: <unsigned int>
tsx= [X86] Control Transactional Synchronization
Extensions (TSX) feature in Intel processors that
@@ -5162,9 +6829,15 @@
unknown_nmi_panic
[X86] Cause panic on unknown NMI.
+ unwind_debug [X86-64]
+ Enable unwinder debug output. This can be
+ useful for debugging certain unwinder error
+ conditions, including corrupt stacks and
+ bad/missing unwinder metadata.
+
usbcore.authorized_default=
[USB] Default USB device authorization:
- (default -1 = authorized except for wireless USB,
+ (default -1 = authorized (same as 1),
0 = not authorized, 1 = authorized, 2 = authorized
if device connected to internal port)
@@ -5262,6 +6935,9 @@
pause after every control message);
o = USB_QUIRK_HUB_SLOW_RESET (Hub needs extra
delay after resetting its port);
+ p = USB_QUIRK_SHORT_SET_ADDRESS_REQ_TIMEOUT
+ (Reduce timeout of the SET_ADDRESS
+ request from 5000 ms to 500 ms);
Example: quirks=0781:5580:bk,0a5c:5834:gij
usbhid.mousepoll=
@@ -5306,6 +6982,7 @@
device);
j = NO_REPORT_LUNS (don't use report luns
command, uas only);
+ k = NO_SAME (do not use WRITE_SAME, uas only)
l = NOT_LOCKABLE (don't try to lock and
unlock ejectable media, not on uas);
m = MAX_SECTORS_64 (don't transfer more
@@ -5348,7 +7025,7 @@
HIGHMEM regardless of setting
of CONFIG_HIGHPTE.
- vdso= [X86,SH]
+ vdso= [X86,SH,SPARC]
On X86_32, this is an alias for vdso32=. Otherwise:
vdso=1: enable VDSO (the default)
@@ -5374,11 +7051,12 @@
video= [FB] Frame buffer configuration
See Documentation/fb/modedb.rst.
- video.brightness_switch_enabled= [0,1]
+ video.brightness_switch_enabled= [ACPI]
+ Format: [0|1]
If set to 1, on receiving an ACPI notify event
generated by hotkey, video driver will adjust brightness
level and then send out the event to user space through
- the allocated input device; If set to 0, video driver
+ the allocated input device. If set to 0, video driver
will only send out the event without touching backlight
brightness level.
default: 1
@@ -5400,7 +7078,7 @@
Can be used multiple times for multiple devices.
vga= [BOOT,X86-32] Select a particular video mode
- See Documentation/x86/boot.rst and
+ See Documentation/arch/x86/boot.rst and
Documentation/admin-guide/svga.rst.
Use vga=ask for menu.
This is actually a boot loader parameter; the value is
@@ -5445,11 +7123,11 @@
functions are at fixed addresses, they make nice
targets for exploits that can control RIP.
- emulate [default] Vsyscalls turn into traps and are
- emulated reasonably safely. The vsyscall
- page is readable.
+ emulate Vsyscalls turn into traps and are emulated
+ reasonably safely. The vsyscall page is
+ readable.
- xonly Vsyscalls turn into traps and are
+ xonly [default] Vsyscalls turn into traps and are
emulated reasonably safely. The vsyscall
page is not readable.
@@ -5519,6 +7197,13 @@
disables both lockup detectors. Default is 10
seconds.
+ workqueue.unbound_cpus=
+ [KNL,SMP] Specify to constrain one or some CPUs
+ to use in unbound workqueues.
+ Format: <cpu-list>
+ By default, all online CPUs are available for
+ unbound workqueues.
+
workqueue.watchdog_thresh=
If CONFIG_WQ_WATCHDOG is configured, workqueue can
warn stall conditions and dump internal state to
@@ -5528,14 +7213,17 @@
it can be updated at runtime by writing to the
corresponding sysfs file.
- workqueue.disable_numa
- By default, all work items queued to unbound
- workqueues are affine to the NUMA nodes they're
- issued on, which results in better behavior in
- general. If NUMA affinity needs to be disabled for
- whatever reason, this option can be used. Note
- that this also can be controlled per-workqueue for
- workqueues visible under /sys/bus/workqueue/.
+ workqueue.cpu_intensive_thresh_us=
+ Per-cpu work items which run for longer than this
+ threshold are automatically considered CPU intensive
+ and excluded from concurrency management to prevent
+ them from noticeably delaying other per-cpu work
+ items. Default is 10000 (10ms).
+
+ If CONFIG_WQ_CPU_INTENSIVE_REPORT is set, the kernel
+ will report the work functions which violate this
+ threshold repeatedly. They are likely good
+ candidates for using WQ_UNBOUND workqueues instead.
workqueue.power_efficient
Per-cpu workqueues are generally preferred because
@@ -5552,6 +7240,18 @@
The default value of this parameter is determined by
the config option CONFIG_WQ_POWER_EFFICIENT_DEFAULT.
+ workqueue.default_affinity_scope=
+ Select the default affinity scope to use for unbound
+ workqueues. Can be one of "cpu", "smt", "cache",
+ "numa" and "system". Default is "cache". For more
+ information, see the Affinity Scopes section in
+ Documentation/core-api/workqueue.rst.
+
+ This can be changed after boot by writing to the
+ matching /sys/module/workqueue/parameters file. All
+ workqueues with the "default" affinity scope will be
+ updated accordignly.
+
workqueue.debug_force_rr_cpu
Workqueue used to implicitly guarantee that work
items queued without explicit CPU specified are put
@@ -5563,16 +7263,16 @@
When enabled, memory and cache locality will be
impacted.
+ writecombine= [LOONGARCH] Control the MAT (Memory Access Type) of
+ ioremap_wc().
+
+ on - Enable writecombine, use WUC for ioremap_wc()
+ off - Disable writecombine, use SUC for ioremap_wc()
+
x2apic_phys [X86-64,APIC] Use x2apic physical mode instead of
default x2apic cluster mode on platforms
supporting x2apic.
- x86_intel_mid_timer= [X86-32,APBT]
- Choose timer option for x86 Intel MID platform.
- Two valid options are apbt timer only and lapic timer
- plus one apbt timer for broadcast timer.
- x86_intel_mid_timer=apbt_only | lapic_and_apbt
-
xen_512gb_limit [KNL,X86-64,XEN]
Restricts the kernel running paravirtualized under Xen
to use only up to 512 GB of RAM. The reason to do so is
@@ -5596,9 +7296,16 @@
Crash from Xen panic notifier, without executing late
panic() code such as dumping handler.
+ xen_msr_safe= [X86,XEN]
+ Format: <bool>
+ Select whether to always use non-faulting (safe) MSR
+ access functions when running as Xen PV guest. The
+ default value is controlled by CONFIG_XEN_PV_MSR_SAFE.
+
xen_nopvspin [X86,XEN]
- Disables the ticketlock slowpath using Xen PV
- optimizations.
+ Disables the qspinlock slowpath using Xen PV optimizations.
+ This parameter is obsoleted by "nopvspin" parameter, which
+ has equivalent effect for XEN platform.
xen_nopv [X86]
Disables the PV optimizations forcing the HVM guest to
@@ -5606,6 +7313,10 @@
This option is obsoleted by the "nopv" option, which
has equivalent effect for XEN platform.
+ xen_no_vector_callback
+ [KNL,X86,XEN] Disable the vector callback for Xen
+ event channel interrupts.
+
xen_scrub_pages= [XEN]
Boolean option to control scrubbing pages before giving them back
to Xen, for use by other domains. Can be also changed at runtime
@@ -5619,10 +7330,27 @@
improve timer resolution at the expense of processing
more timer interrupts.
- nopv= [X86,XEN,KVM,HYPER_V,VMWARE]
- Disables the PV optimizations forcing the guest to run
- as generic guest with no PV drivers. Currently support
- XEN HVM, KVM, HYPER_V and VMWARE guest.
+ xen.balloon_boot_timeout= [XEN]
+ The time (in seconds) to wait before giving up to boot
+ in case initial ballooning fails to free enough memory.
+ Applies only when running as HVM or PVH guest and
+ started with less memory configured than allowed at
+ max. Default is 180.
+
+ xen.event_eoi_delay= [XEN]
+ How long to delay EOI handling in case of event
+ storms (jiffies). Default is 10.
+
+ xen.event_loop_timeout= [XEN]
+ After which time (jiffies) the event handling loop
+ should start to delay EOI handling. Default is 2.
+
+ xen.fifo_events= [XEN]
+ Boolean parameter to disable using fifo event handling
+ even if available. Normally fifo event handling is
+ preferred over the 2-level event handling, as it is
+ fairer and the number of possible event channels is
+ much higher. Default is on (use fifo events).
xirc2ps_cs= [NET,PCMCIA]
Format:
@@ -5637,6 +7365,12 @@
controller on both pseries and powernv
platforms. Only useful on POWER9 and above.
+ xive.store-eoi=off [PPC]
+ By default on POWER10 and above, the kernel will use
+ stores for EOI handling when the XIVE interrupt mode
+ is active. This option allows the XIVE driver to use
+ loads instead, as on POWER9.
+
xhci-hcd.quirks [USB,KNL]
A hex value specifying bitmask with supplemental xhci
host controller quirks. Meaning of each bit can be
@@ -5660,3 +7394,4 @@
memory, and other data can't be written using
xmon commands.
off xmon is disabled.
+
diff --git a/Documentation/admin-guide/kernel-per-CPU-kthreads.rst b/Documentation/admin-guide/kernel-per-CPU-kthreads.rst
index 21818aca4708..b6aeae3327ce 100644
--- a/Documentation/admin-guide/kernel-per-CPU-kthreads.rst
+++ b/Documentation/admin-guide/kernel-per-CPU-kthreads.rst
@@ -10,7 +10,7 @@ them to a "housekeeping" CPU dedicated to such work.
References
==========
-- Documentation/IRQ-affinity.txt: Binding interrupts to sets of CPUs.
+- Documentation/core-api/irq/irq-affinity.rst: Binding interrupts to sets of CPUs.
- Documentation/admin-guide/cgroup-v1: Using cgroups to bind tasks to sets of CPUs.
@@ -25,7 +25,7 @@ References
- In order to locate kernel-generated OS jitter on CPU N:
- cd /sys/kernel/debug/tracing
+ cd /sys/kernel/tracing
echo 1 > max_graph_depth # Increase the "1" for more detail
echo function_graph > current_tracer
# run workload
@@ -208,7 +208,7 @@ Do at least one of the following:
2. Enable RCU to do its processing remotely via dyntick-idle by
doing all of the following:
- a. Build with CONFIG_NO_HZ=y and CONFIG_RCU_FAST_NO_HZ=y.
+ a. Build with CONFIG_NO_HZ=y.
b. Ensure that the CPU goes idle frequently, allowing other
CPUs to detect that it has passed through an RCU quiescent
state. If the kernel is built with CONFIG_NO_HZ_FULL=y,
@@ -243,13 +243,9 @@ To reduce its OS jitter, do any of the following:
3. Do any of the following needed to avoid jitter that your
application cannot tolerate:
- a. Build your kernel with CONFIG_SLUB=y rather than
- CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
- use of each CPU's workqueues to run its cache_reap()
- function.
- b. Avoid using oprofile, thus avoiding OS jitter from
+ a. Avoid using oprofile, thus avoiding OS jitter from
wq_sync_buffer().
- c. Limit your CPU frequency so that a CPU-frequency
+ b. Limit your CPU frequency so that a CPU-frequency
governor is not required, possibly enlisting the aid of
special heatsinks or other cooling technologies. If done
correctly, and if you CPU architecture permits, you should
@@ -259,7 +255,7 @@ To reduce its OS jitter, do any of the following:
WARNING: Please check your CPU specifications to
make sure that this is safe on your particular system.
- d. As of v3.18, Christoph Lameter's on-demand vmstat workers
+ c. As of v3.18, Christoph Lameter's on-demand vmstat workers
commit prevents OS jitter due to vmstat_update() on
CONFIG_SMP=y systems. Before v3.18, is not possible
to entirely get rid of the OS jitter, but you can
@@ -273,8 +269,8 @@ To reduce its OS jitter, do any of the following:
However, there is an RFC patch from Christoph Lameter
(based on an earlier one from Gilad Ben-Yossef) that
reduces or even eliminates vmstat overhead for some
- workloads at https://lkml.org/lkml/2013/9/4/379.
- e. If running on high-end powerpc servers, build with
+ workloads at https://lore.kernel.org/r/00000140e9dfd6bd-40db3d4f-c1be-434f-8132-7820f81bb586-000000@email.amazonses.com.
+ d. If running on high-end powerpc servers, build with
CONFIG_PPC_RTAS_DAEMON=n. This prevents the RTAS
daemon from running on each CPU every second or so.
(This will require editing Kconfig files and will defeat
@@ -282,12 +278,12 @@ To reduce its OS jitter, do any of the following:
due to the rtas_event_scan() function.
WARNING: Please check your CPU specifications to
make sure that this is safe on your particular system.
- f. If running on Cell Processor, build your kernel with
+ e. If running on Cell Processor, build your kernel with
CBE_CPUFREQ_SPU_GOVERNOR=n to avoid OS jitter from
spu_gov_work().
WARNING: Please check your CPU specifications to
make sure that this is safe on your particular system.
- g. If running on PowerMAC, build your kernel with
+ f. If running on PowerMAC, build your kernel with
CONFIG_PMAC_RACKMETER=n to disable the CPU-meter,
avoiding OS jitter from rackmeter_do_timer().
@@ -332,23 +328,3 @@ To reduce its OS jitter, do at least one of the following:
kthreads from being created in the first place. However, please
note that this will not eliminate OS jitter, but will instead
shift it to RCU_SOFTIRQ.
-
-Name:
- watchdog/%u
-
-Purpose:
- Detect software lockups on each CPU.
-
-To reduce its OS jitter, do at least one of the following:
-
-1. Build with CONFIG_LOCKUP_DETECTOR=n, which will prevent these
- kthreads from being created in the first place.
-2. Boot with "nosoftlockup=0", which will also prevent these kthreads
- from being created. Other related watchdog and softlockup boot
- parameters may be found in Documentation/admin-guide/kernel-parameters.rst
- and Documentation/watchdog/watchdog-parameters.rst.
-3. Echo a zero to /proc/sys/kernel/watchdog to disable the
- watchdog timer.
-4. Echo a large number of /proc/sys/kernel/watchdog_thresh in
- order to reduce the frequency of OS jitter due to the watchdog
- timer down to a level that is acceptable for your workload.
diff --git a/Documentation/admin-guide/laptops/disk-shock-protection.rst b/Documentation/admin-guide/laptops/disk-shock-protection.rst
index e97c5f78d8c3..22c7ec3e84cf 100644
--- a/Documentation/admin-guide/laptops/disk-shock-protection.rst
+++ b/Documentation/admin-guide/laptops/disk-shock-protection.rst
@@ -135,7 +135,7 @@ single project which, although still considered experimental, is fit
for use. Please feel free to add projects that have been the victims
of my ignorance.
-- http://www.thinkwiki.org/wiki/HDAPS
+- https://www.thinkwiki.org/wiki/HDAPS
See this page for information about Linux support of the hard disk
active protection system as implemented in IBM/Lenovo Thinkpads.
diff --git a/Documentation/admin-guide/laptops/laptop-mode.rst b/Documentation/admin-guide/laptops/laptop-mode.rst
index c984c4262f2e..b61cc601d298 100644
--- a/Documentation/admin-guide/laptops/laptop-mode.rst
+++ b/Documentation/admin-guide/laptops/laptop-mode.rst
@@ -101,17 +101,6 @@ this results in concentration of disk activity in a small time interval which
occurs only once every 10 minutes, or whenever the disk is forced to spin up by
a cache miss. The disk can then be spun down in the periods of inactivity.
-If you want to find out which process caused the disk to spin up, you can
-gather information by setting the flag /proc/sys/vm/block_dump. When this flag
-is set, Linux reports all disk read and write operations that take place, and
-all block dirtyings done to files. This makes it possible to debug why a disk
-needs to spin up, and to increase battery life even more. The output of
-block_dump is written to the kernel output, and it can be retrieved using
-"dmesg". When you use block_dump and your kernel logging level also includes
-kernel debugging messages, you probably want to turn off klogd, otherwise
-the output of block_dump will be logged, causing disk activity that is not
-normally there.
-
Configuration
-------------
diff --git a/Documentation/admin-guide/laptops/lg-laptop.rst b/Documentation/admin-guide/laptops/lg-laptop.rst
index ce9b14671cb9..67fd6932cef4 100644
--- a/Documentation/admin-guide/laptops/lg-laptop.rst
+++ b/Documentation/admin-guide/laptops/lg-laptop.rst
@@ -13,10 +13,8 @@ Hotkeys
The following FN keys are ignored by the kernel without this driver:
- FN-F1 (LG control panel) - Generates F15
-- FN-F5 (Touchpad toggle) - Generates F13
+- FN-F5 (Touchpad toggle) - Generates F21
- FN-F6 (Airplane mode) - Generates RFKILL
-- FN-F8 (Keyboard backlight) - Generates F16.
- This key also changes keyboard backlight mode.
- FN-F9 (Reader mode) - Generates F14
The rest of the FN keys work without a need for a special driver.
@@ -40,7 +38,7 @@ FN lock.
Battery care limit
------------------
-Writing 80/100 to /sys/devices/platform/lg-laptop/battery_care_limit
+Writing 80/100 to /sys/class/power_supply/CMB0/charge_control_end_threshold
sets the maximum capacity to charge the battery. Limiting the charge
reduces battery capacity loss over time.
diff --git a/Documentation/admin-guide/laptops/sonypi.rst b/Documentation/admin-guide/laptops/sonypi.rst
index c6eaaf48f7c1..190da1234314 100644
--- a/Documentation/admin-guide/laptops/sonypi.rst
+++ b/Documentation/admin-guide/laptops/sonypi.rst
@@ -151,7 +151,7 @@ Bugs:
different way to adjust the backlighting of the screen. There
is a userspace utility to adjust the brightness on those models,
which can be downloaded from
- http://www.acc.umu.se/~erikw/program/smartdimmer-0.1.tar.bz2
+ https://www.acc.umu.se/~erikw/program/smartdimmer-0.1.tar.bz2
- since all development was done by reverse engineering, there is
*absolutely no guarantee* that this driver will not crash your
diff --git a/Documentation/admin-guide/laptops/thinkpad-acpi.rst b/Documentation/admin-guide/laptops/thinkpad-acpi.rst
index 822907dcc845..98d304010170 100644
--- a/Documentation/admin-guide/laptops/thinkpad-acpi.rst
+++ b/Documentation/admin-guide/laptops/thinkpad-acpi.rst
@@ -50,6 +50,10 @@ detailed description):
- WAN enable and disable
- UWB enable and disable
- LCD Shadow (PrivacyGuard) enable and disable
+ - Lap mode sensor
+ - Setting keyboard language
+ - WWAN Antenna type
+ - Auxmac
A compatibility table by model and feature is maintained on the web
site, http://ibm-acpi.sf.net/. I appreciate any success or failure
@@ -904,7 +908,7 @@ temperatures:
The mapping of thermal sensors to physical locations varies depending on
system-board model (and thus, on ThinkPad model).
-http://thinkwiki.org/wiki/Thermal_Sensors is a public wiki page that
+https://thinkwiki.org/wiki/Thermal_Sensors is a public wiki page that
tries to track down these locations for various models.
Most (newer?) models seem to follow this pattern:
@@ -925,7 +929,7 @@ For the R51 (source: Thomas Gruber):
- 3: Internal HDD
For the T43, T43/p (source: Shmidoax/Thinkwiki.org)
-http://thinkwiki.org/wiki/Thermal_Sensors#ThinkPad_T43.2C_T43p
+https://thinkwiki.org/wiki/Thermal_Sensors#ThinkPad_T43.2C_T43p
- 2: System board, left side (near PCMCIA slot), reported as HDAPS temp
- 3: PCMCIA slot
@@ -935,7 +939,7 @@ http://thinkwiki.org/wiki/Thermal_Sensors#ThinkPad_T43.2C_T43p
- 11: Power regulator, underside of system board, below F2 key
The A31 has a very atypical layout for the thermal sensors
-(source: Milos Popovic, http://thinkwiki.org/wiki/Thermal_Sensors#ThinkPad_A31)
+(source: Milos Popovic, https://thinkwiki.org/wiki/Thermal_Sensors#ThinkPad_A31)
- 1: CPU
- 2: Main Battery: main sensor
@@ -1432,6 +1436,20 @@ The first command ensures the best viewing angle and the latter one turns
on the feature, restricting the viewing angles.
+DYTC Lapmode sensor
+-------------------
+
+sysfs: dytc_lapmode
+
+Newer thinkpads and mobile workstations have the ability to determine if
+the device is in deskmode or lapmode. This feature is used by user space
+to decide if WWAN transmission can be increased to maximum power and is
+also useful for understanding the different thermal modes available as
+they differ between desk and lap mode.
+
+The property is read-only. If the platform doesn't have support the sysfs
+class is not created.
+
EXPERIMENTAL: UWB
-----------------
@@ -1451,6 +1469,68 @@ Sysfs notes
rfkill controller switch "tpacpi_uwb_sw": refer to
Documentation/driver-api/rfkill.rst for details.
+
+Setting keyboard language
+-------------------------
+
+sysfs: keyboard_lang
+
+This feature is used to set keyboard language to ECFW using ASL interface.
+Fewer thinkpads models like T580 , T590 , T15 Gen 1 etc.. has "=", "(',
+")" numeric keys, which are not displaying correctly, when keyboard language
+is other than "english". This is because the default keyboard language in ECFW
+is set as "english". Hence using this sysfs, user can set the correct keyboard
+language to ECFW and then these key's will work correctly.
+
+Example of command to set keyboard language is mentioned below::
+
+ echo jp > /sys/devices/platform/thinkpad_acpi/keyboard_lang
+
+Text corresponding to keyboard layout to be set in sysfs are: be(Belgian),
+cz(Czech), da(Danish), de(German), en(English), es(Spain), et(Estonian),
+fr(French), fr-ch(French(Switzerland)), hu(Hungarian), it(Italy), jp (Japan),
+nl(Dutch), nn(Norway), pl(Polish), pt(portuguese), sl(Slovenian), sv(Sweden),
+tr(Turkey)
+
+WWAN Antenna type
+-----------------
+
+sysfs: wwan_antenna_type
+
+On some newer Thinkpads we need to set SAR value based on the antenna
+type. This interface will be used by userspace to get the antenna type
+and set the corresponding SAR value, as is required for FCC certification.
+
+The available commands are::
+
+ cat /sys/devices/platform/thinkpad_acpi/wwan_antenna_type
+
+Currently 2 antenna types are supported as mentioned below:
+- type a
+- type b
+
+The property is read-only. If the platform doesn't have support the sysfs
+class is not created.
+
+Auxmac
+------
+
+sysfs: auxmac
+
+Some newer Thinkpads have a feature called MAC Address Pass-through. This
+feature is implemented by the system firmware to provide a system unique MAC,
+that can override a dock or USB ethernet dongle MAC, when connected to a
+network. This property enables user-space to easily determine the MAC address
+if the feature is enabled.
+
+The values of this auxiliary MAC are:
+
+ cat /sys/devices/platform/thinkpad_acpi/auxmac
+
+If the feature is disabled, the value will be 'disabled'.
+
+This property is read-only.
+
Adaptive keyboard
-----------------
@@ -1460,15 +1540,32 @@ This sysfs attribute controls the keyboard "face" that will be shown on the
Lenovo X1 Carbon 2nd gen (2014)'s adaptive keyboard. The value can be read
and set.
-- 1 = Home mode
-- 2 = Web-browser mode
-- 3 = Web-conference mode
-- 4 = Function mode
-- 5 = Layflat mode
+- 0 = Home mode
+- 1 = Web-browser mode
+- 2 = Web-conference mode
+- 3 = Function mode
+- 4 = Layflat mode
For more details about which buttons will appear depending on the mode, please
review the laptop's user guide:
-http://www.lenovo.com/shop/americas/content/user_guides/x1carbon_2_ug_en.pdf
+https://download.lenovo.com/ibmdl/pub/pc/pccbbs/mobiles_pdf/x1carbon_2_ug_en.pdf
+
+Battery charge control
+----------------------
+
+sysfs attributes:
+/sys/class/power_supply/BAT*/charge_control_{start,end}_threshold
+
+These two attributes are created for those batteries that are supported by the
+driver. They enable the user to control the battery charge thresholds of the
+given battery. Both values may be read and set. `charge_control_start_threshold`
+accepts an integer between 0 and 99 (inclusive); this value represents a battery
+percentage level, below which charging will begin. `charge_control_end_threshold`
+accepts an integer between 1 and 100 (inclusive); this value represents a battery
+percentage level, above which charging will stop.
+
+The exact semantics of the attributes may be found in
+Documentation/ABI/testing/sysfs-class-power.
Multiple Commands, Module Parameters
------------------------------------
diff --git a/Documentation/admin-guide/lockup-watchdogs.rst b/Documentation/admin-guide/lockup-watchdogs.rst
index 290840c160af..3e09284a8b9b 100644
--- a/Documentation/admin-guide/lockup-watchdogs.rst
+++ b/Documentation/admin-guide/lockup-watchdogs.rst
@@ -39,7 +39,7 @@ in principle, they should work in any architecture where these
subsystems are present.
A periodic hrtimer runs to generate interrupts and kick the watchdog
-task. An NMI perf event is generated every "watchdog_thresh"
+job. An NMI perf event is generated every "watchdog_thresh"
(compile-time initialized to 10 and configurable through sysctl of the
same name) seconds to check for hardlockups. If any CPU in the system
does not receive any hrtimer interrupt during that time the
@@ -47,7 +47,7 @@ does not receive any hrtimer interrupt during that time the
generate a kernel warning or call panic, depending on the
configuration.
-The watchdog task is a high priority kernel thread that updates a
+The watchdog job runs in a stop scheduling thread that updates a
timestamp every time it is scheduled. If that timestamp is not updated
for 2*watchdog_thresh seconds (the softlockup threshold) the
'softlockup detector' (coded inside the hrtimer callback function)
diff --git a/Documentation/admin-guide/md.rst b/Documentation/admin-guide/md.rst
index 3c51084ffd37..4ff2cc291d18 100644
--- a/Documentation/admin-guide/md.rst
+++ b/Documentation/admin-guide/md.rst
@@ -5,7 +5,7 @@ Boot time assembly of RAID arrays
---------------------------------
Tools that manage md devices can be found at
- http://www.kernel.org/pub/linux/utils/raid/
+ https://www.kernel.org/pub/linux/utils/raid/
You can boot with your md device with the following kernel command
@@ -221,7 +221,7 @@ All md devices contain:
layout
The ``layout`` for the array for the particular level. This is
- simply a number that is interpretted differently by different
+ simply a number that is interpreted differently by different
levels. It can be written while assembling an array.
array_size
@@ -317,7 +317,7 @@ All md devices contain:
suspended (not supported yet)
All IO requests will block. The array can be reconfigured.
- Writing this, if accepted, will block until array is quiessent
+ Writing this, if accepted, will block until array is quiescent
readonly
no resync can happen. no superblocks get written.
@@ -426,6 +426,10 @@ All md devices contain:
The accepted values when writing to this file are ``ppl`` and ``resync``,
used to enable and disable PPL.
+ uuid
+ This indicates the UUID of the array in the following format:
+ xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
+
As component devices are added to an md array, they appear in the ``md``
directory as new directories named::
diff --git a/Documentation/admin-guide/media/au0828-cardlist.rst b/Documentation/admin-guide/media/au0828-cardlist.rst
new file mode 100644
index 000000000000..aaaadc934e7a
--- /dev/null
+++ b/Documentation/admin-guide/media/au0828-cardlist.rst
@@ -0,0 +1,39 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+AU0828 cards list
+=================
+
+.. tabularcolumns:: |p{1.4cm}|p{6.5cm}|p{10.0cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 2 19 18
+ :stub-columns: 0
+
+ * - Card number
+ - Card name
+ - USB IDs
+
+ * - 0
+ - Unknown board
+ -
+
+ * - 1
+ - Hauppauge HVR950Q
+ - 2040:7200, 2040:7210, 2040:7217, 2040:721b, 2040:721e, 2040:721f, 2040:7280, 0fd9:0008, 2040:7260, 2040:7213, 2040:7270
+
+ * - 2
+ - Hauppauge HVR850
+ - 2040:7240
+
+ * - 3
+ - DViCO FusionHDTV USB
+ - 0fe9:d620
+
+ * - 4
+ - Hauppauge HVR950Q rev xxF8
+ - 2040:7201, 2040:7211, 2040:7281
+
+ * - 5
+ - Hauppauge Woodbury
+ - 05e1:0480, 2040:8200
diff --git a/Documentation/admin-guide/media/avermedia.rst b/Documentation/admin-guide/media/avermedia.rst
new file mode 100644
index 000000000000..93ff74002d20
--- /dev/null
+++ b/Documentation/admin-guide/media/avermedia.rst
@@ -0,0 +1,94 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================
+Avermedia DVB-T on BT878 Release Notes
+======================================
+
+February 14th 2006
+
+.. note::
+
+ Several other Avermedia devices are supported. For a more
+ broader and updated content about that, please check:
+
+ https://linuxtv.org/wiki/index.php/AVerMedia
+
+The Avermedia DVB-T
+~~~~~~~~~~~~~~~~~~~
+
+The Avermedia DVB-T is a budget PCI DVB card. It has 3 inputs:
+
+* RF Tuner Input
+* Composite Video Input (RCA Jack)
+* SVIDEO Input (Mini-DIN)
+
+The RF Tuner Input is the input to the tuner module of the
+card. The Tuner is otherwise known as the "Frontend" . The
+Frontend of the Avermedia DVB-T is a Microtune 7202D. A timely
+post to the linux-dvb mailing list ascertained that the
+Microtune 7202D is supported by the sp887x driver which is
+found in the dvb-hw CVS module.
+
+The DVB-T card is based around the BT878 chip which is a very
+common multimedia bridge and often found on Analogue TV cards.
+There is no on-board MPEG2 decoder, which means that all MPEG2
+decoding must be done in software, or if you have one, on an
+MPEG2 hardware decoding card or chipset.
+
+
+Getting the card going
+~~~~~~~~~~~~~~~~~~~~~~
+
+At this stage, it has not been able to ascertain the
+functionality of the remaining device nodes in respect of the
+Avermedia DVBT. However, full functionality in respect of
+tuning, receiving and supplying the MPEG2 data stream is
+possible with the currently available versions of the driver.
+It may be possible that additional functionality is available
+from the card (i.e. viewing the additional analogue inputs
+that the card presents), but this has not been tested yet. If
+I get around to this, I'll update the document with whatever I
+find.
+
+To power up the card, load the following modules in the
+following order:
+
+* modprobe bttv (normally loaded automatically)
+* modprobe dvb-bt8xx (or place dvb-bt8xx in /etc/modules)
+
+Insertion of these modules into the running kernel will
+activate the appropriate DVB device nodes. It is then possible
+to start accessing the card with utilities such as scan, tzap,
+dvbstream etc.
+
+The frontend module sp887x.o, requires an external firmware.
+Please use the command "get_dvb_firmware sp887x" to download
+it. Then copy it to /usr/lib/hotplug/firmware or /lib/firmware/
+(depending on configuration of firmware hotplug).
+
+Known Limitations
+~~~~~~~~~~~~~~~~~
+
+At present I can say with confidence that the frontend tunes
+via /dev/dvb/adapter{x}/frontend0 and supplies an MPEG2 stream
+via /dev/dvb/adapter{x}/dvr0. I have not tested the
+functionality of any other part of the card yet. I will do so
+over time and update this document.
+
+There are some limitations in the i2c layer due to a returned
+error message inconsistency. Although this generates errors in
+dmesg and the system logs, it does not appear to affect the
+ability of the frontend to function correctly.
+
+Further Update
+~~~~~~~~~~~~~~
+
+dvbstream and VideoLAN Client on windows works a treat with
+DVB, in fact this is currently serving as my main way of
+viewing DVB-T at the moment. Additionally, VLC is happily
+decoding HDTV signals, although the PC is dropping the odd
+frame here and there - I assume due to processing capability -
+as all the decoding is being done under windows in software.
+
+Many thanks to Nigel Pearson for the updates to this document
+since the recent revision of the driver.
diff --git a/Documentation/admin-guide/media/bt8xx.rst b/Documentation/admin-guide/media/bt8xx.rst
new file mode 100644
index 000000000000..3589f6ab7e46
--- /dev/null
+++ b/Documentation/admin-guide/media/bt8xx.rst
@@ -0,0 +1,157 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
+How to get the bt8xx cards working
+==================================
+
+Authors:
+ Richard Walker,
+ Jamie Honan,
+ Michael Hunold,
+ Manu Abraham,
+ Uwe Bugla,
+ Michael Krufky
+
+General information
+-------------------
+
+This class of cards has a bt878a as the PCI interface, and require the bttv
+driver for accessing the i2c bus and the gpio pins of the bt8xx chipset.
+
+Please see Documentation/admin-guide/media/bttv-cardlist.rst for a complete
+list of Cards based on the Conexant Bt8xx PCI bridge supported by the
+Linux Kernel.
+
+In order to be able to compile the kernel, some config options should be
+enabled::
+
+ ./scripts/config -e PCI
+ ./scripts/config -e INPUT
+ ./scripts/config -m I2C
+ ./scripts/config -m MEDIA_SUPPORT
+ ./scripts/config -e MEDIA_PCI_SUPPORT
+ ./scripts/config -e MEDIA_ANALOG_TV_SUPPORT
+ ./scripts/config -e MEDIA_DIGITAL_TV_SUPPORT
+ ./scripts/config -e MEDIA_RADIO_SUPPORT
+ ./scripts/config -e RC_CORE
+ ./scripts/config -m VIDEO_BT848
+ ./scripts/config -m DVB_BT8XX
+
+If you want to automatically support all possible variants of the Bt8xx
+cards, you should also do::
+
+ ./scripts/config -e MEDIA_SUBDRV_AUTOSELECT
+
+.. note::
+
+ Please use the following options with care as deselection of drivers which
+ are in fact necessary may result in DVB devices that cannot be tuned due
+ to lack of driver support.
+
+If your goal is to just support an specific board, you may, instead,
+disable MEDIA_SUBDRV_AUTOSELECT and manually select the frontend drivers
+required by your board. With that, you can save some RAM.
+
+You can do that by calling make xconfig/qconfig/menuconfig and look at
+the options on those menu options (only enabled if
+``Autoselect ancillary drivers`` is disabled:
+
+#) ``Device drivers`` => ``Multimedia support`` => ``Customize TV tuners``
+#) ``Device drivers`` => ``Multimedia support`` => ``Customize DVB frontends``
+
+Then, on each of the above menu, please select your card-specific
+frontend and tuner modules.
+
+
+Loading Modules
+---------------
+
+Regular case: If the bttv driver detects a bt8xx-based DVB card, all
+frontend and backend modules will be loaded automatically.
+
+Exceptions are:
+
+- Old TV cards without EEPROMs, sharing a common PCI subsystem ID;
+- Old TwinHan DST cards or clones with or without CA slot and not
+ containing an Eeprom.
+
+In the following cases overriding the PCI type detection for bttv and
+for dvb-bt8xx drivers by passing modprobe parameters may be necessary.
+
+Running TwinHan and Clones
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As shown at Documentation/admin-guide/media/bttv-cardlist.rst, TwinHan and
+clones use ``card=113`` modprobe parameter. So, in order to properly
+detect it for devices without EEPROM, you should use::
+
+ $ modprobe bttv card=113
+ $ modprobe dst
+
+Useful parameters for verbosity level and debugging the dst module::
+
+ verbose=0: messages are disabled
+ 1: only error messages are displayed
+ 2: notifications are displayed
+ 3: other useful messages are displayed
+ 4: debug setting
+ dst_addons=0: card is a free to air (FTA) card only
+ 0x20: card has a conditional access slot for scrambled channels
+ dst_algo=0: (default) Software tuning algorithm
+ 1: Hardware tuning algorithm
+
+
+The autodetected values are determined by the cards' "response string".
+
+In your logs see f. ex.: dst_get_device_id: Recognize [DSTMCI].
+
+For bug reports please send in a complete log with verbose=4 activated.
+Please also see Documentation/admin-guide/media/ci.rst.
+
+Running multiple cards
+~~~~~~~~~~~~~~~~~~~~~~
+
+See Documentation/admin-guide/media/bttv-cardlist.rst for a complete list of
+Card ID. Some examples:
+
+ =========================== ===
+ Brand name ID
+ =========================== ===
+ Pinnacle PCTV Sat 94
+ Nebula Electronics Digi TV 104
+ pcHDTV HD-2000 TV 112
+ Twinhan DST and clones 113
+ Avermedia AverTV DVB-T 77: 123
+ Avermedia AverTV DVB-T 761 124
+ DViCO FusionHDTV DVB-T Lite 128
+ DViCO FusionHDTV 5 Lite 135
+ =========================== ===
+
+.. note::
+
+ When you have multiple cards, the order of the card ID should
+ match the order where they're detected by the system. Please notice
+ that removing/inserting other PCI cards may change the detection
+ order.
+
+Example::
+
+ $ modprobe bttv card=113 card=135
+
+In case of further problems please subscribe and send questions to
+the mailing list: linux-media@vger.kernel.org.
+
+Probing the cards with broken PCI subsystem ID
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There are some TwinHan cards whose EEPROM has become corrupted for some
+reason. The cards do not have a correct PCI subsystem ID.
+Still, it is possible to force probing the cards with::
+
+ $ echo 109e 0878 $subvendor $subdevice > \
+ /sys/bus/pci/drivers/bt878/new_id
+
+The two numbers there are::
+
+ 109e: PCI_VENDOR_ID_BROOKTREE
+ 0878: PCI_DEVICE_ID_BROOKTREE_878
diff --git a/Documentation/admin-guide/media/bttv-cardlist.rst b/Documentation/admin-guide/media/bttv-cardlist.rst
new file mode 100644
index 000000000000..8671d4f7ba7b
--- /dev/null
+++ b/Documentation/admin-guide/media/bttv-cardlist.rst
@@ -0,0 +1,683 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+BTTV cards list
+===============
+
+.. tabularcolumns:: |p{1.4cm}|p{11.1cm}|p{4.2cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 2 19 18
+ :stub-columns: 0
+
+ * - Card number
+ - Card name
+ - PCI subsystem IDs
+
+ * - 0
+ - *** UNKNOWN/GENERIC ***
+ -
+
+ * - 1
+ - MIRO PCTV
+ -
+
+ * - 2
+ - Hauppauge (bt848)
+ -
+
+ * - 3
+ - STB, Gateway P/N 6000699 (bt848)
+ -
+
+ * - 4
+ - Intel Create and Share PCI/ Smart Video Recorder III
+ -
+
+ * - 5
+ - Diamond DTV2000
+ -
+
+ * - 6
+ - AVerMedia TVPhone
+ -
+
+ * - 7
+ - MATRIX-Vision MV-Delta
+ -
+
+ * - 8
+ - Lifeview FlyVideo II (Bt848) LR26 / MAXI TV Video PCI2 LR26
+ -
+
+ * - 9
+ - IMS/IXmicro TurboTV
+ -
+
+ * - 10
+ - Hauppauge (bt878)
+ - 0070:13eb, 0070:3900, 2636:10b4
+
+ * - 11
+ - MIRO PCTV pro
+ -
+
+ * - 12
+ - ADS Technologies Channel Surfer TV (bt848)
+ -
+
+ * - 13
+ - AVerMedia TVCapture 98
+ - 1461:0002, 1461:0004, 1461:0300
+
+ * - 14
+ - Aimslab Video Highway Xtreme (VHX)
+ -
+
+ * - 15
+ - Zoltrix TV-Max
+ - a1a0:a0fc
+
+ * - 16
+ - Prolink Pixelview PlayTV (bt878)
+ -
+
+ * - 17
+ - Leadtek WinView 601
+ -
+
+ * - 18
+ - AVEC Intercapture
+ -
+
+ * - 19
+ - Lifeview FlyVideo II EZ /FlyKit LR38 Bt848 (capture only)
+ -
+
+ * - 20
+ - CEI Raffles Card
+ -
+
+ * - 21
+ - Lifeview FlyVideo 98/ Lucky Star Image World ConferenceTV LR50
+ -
+
+ * - 22
+ - Askey CPH050/ Phoebe Tv Master + FM
+ - 14ff:3002
+
+ * - 23
+ - Modular Technology MM201/MM202/MM205/MM210/MM215 PCTV, bt878
+ - 14c7:0101
+
+ * - 24
+ - Askey CPH05X/06X (bt878) [many vendors]
+ - 144f:3002, 144f:3005, 144f:5000, 14ff:3000
+
+ * - 25
+ - Terratec TerraTV+ Version 1.0 (Bt848)/ Terra TValue Version 1.0/ Vobis TV-Boostar
+ -
+
+ * - 26
+ - Hauppauge WinCam newer (bt878)
+ -
+
+ * - 27
+ - Lifeview FlyVideo 98/ MAXI TV Video PCI2 LR50
+ -
+
+ * - 28
+ - Terratec TerraTV+ Version 1.1 (bt878)
+ - 153b:1127, 1852:1852
+
+ * - 29
+ - Imagenation PXC200
+ - 1295:200a
+
+ * - 30
+ - Lifeview FlyVideo 98 LR50
+ - 1f7f:1850
+
+ * - 31
+ - Formac iProTV, Formac ProTV I (bt848)
+ -
+
+ * - 32
+ - Intel Create and Share PCI/ Smart Video Recorder III
+ -
+
+ * - 33
+ - Terratec TerraTValue Version Bt878
+ - 153b:1117, 153b:1118, 153b:1119, 153b:111a, 153b:1134, 153b:5018
+
+ * - 34
+ - Leadtek WinFast 2000/ WinFast 2000 XP
+ - 107d:6606, 107d:6609, 6606:217d, f6ff:fff6
+
+ * - 35
+ - Lifeview FlyVideo 98 LR50 / Chronos Video Shuttle II
+ - 1851:1850, 1851:a050
+
+ * - 36
+ - Lifeview FlyVideo 98FM LR50 / Typhoon TView TV/FM Tuner
+ - 1852:1852
+
+ * - 37
+ - Prolink PixelView PlayTV pro
+ -
+
+ * - 38
+ - Askey CPH06X TView99
+ - 144f:3000, 144f:a005, a04f:a0fc
+
+ * - 39
+ - Pinnacle PCTV Studio/Rave
+ - 11bd:0012, bd11:1200, bd11:ff00, 11bd:ff12
+
+ * - 40
+ - STB TV PCI FM, Gateway P/N 6000704 (bt878), 3Dfx VoodooTV 100
+ - 10b4:2636, 10b4:2645, 121a:3060
+
+ * - 41
+ - AVerMedia TVPhone 98
+ - 1461:0001, 1461:0003
+
+ * - 42
+ - ProVideo PV951
+ - aa0c:146c
+
+ * - 43
+ - Little OnAir TV
+ -
+
+ * - 44
+ - Sigma TVII-FM
+ -
+
+ * - 45
+ - MATRIX-Vision MV-Delta 2
+ -
+
+ * - 46
+ - Zoltrix Genie TV/FM
+ - 15b0:4000, 15b0:400a, 15b0:400d, 15b0:4010, 15b0:4016
+
+ * - 47
+ - Terratec TV/Radio+
+ - 153b:1123
+
+ * - 48
+ - Askey CPH03x/ Dynalink Magic TView
+ -
+
+ * - 49
+ - IODATA GV-BCTV3/PCI
+ - 10fc:4020
+
+ * - 50
+ - Prolink PV-BT878P+4E / PixelView PlayTV PAK / Lenco MXTV-9578 CP
+ -
+
+ * - 51
+ - Eagle Wireless Capricorn2 (bt878A)
+ -
+
+ * - 52
+ - Pinnacle PCTV Studio Pro
+ -
+
+ * - 53
+ - Typhoon TView RDS + FM Stereo / KNC1 TV Station RDS
+ -
+
+ * - 54
+ - Lifeview FlyVideo 2000 /FlyVideo A2/ Lifetec LT 9415 TV [LR90]
+ -
+
+ * - 55
+ - Askey CPH031/ BESTBUY Easy TV
+ -
+
+ * - 56
+ - Lifeview FlyVideo 98FM LR50
+ - a051:41a0
+
+ * - 57
+ - GrandTec 'Grand Video Capture' (Bt848)
+ - 4344:4142
+
+ * - 58
+ - Askey CPH060/ Phoebe TV Master Only (No FM)
+ -
+
+ * - 59
+ - Askey CPH03x TV Capturer
+ -
+
+ * - 60
+ - Modular Technology MM100PCTV
+ -
+
+ * - 61
+ - AG Electronics GMV1
+ - 15cb:0101
+
+ * - 62
+ - Askey CPH061/ BESTBUY Easy TV (bt878)
+ -
+
+ * - 63
+ - ATI TV-Wonder
+ - 1002:0001
+
+ * - 64
+ - ATI TV-Wonder VE
+ - 1002:0003
+
+ * - 65
+ - Lifeview FlyVideo 2000S LR90
+ -
+
+ * - 66
+ - Terratec TValueRadio
+ - 153b:1135, 153b:ff3b
+
+ * - 67
+ - IODATA GV-BCTV4/PCI
+ - 10fc:4050
+
+ * - 68
+ - 3Dfx VoodooTV FM (Euro)
+ - 10b4:2637
+
+ * - 69
+ - Active Imaging AIMMS
+ -
+
+ * - 70
+ - Prolink Pixelview PV-BT878P+ (Rev.4C,8E)
+ -
+
+ * - 71
+ - Lifeview FlyVideo 98EZ (capture only) LR51
+ - 1851:1851
+
+ * - 72
+ - Prolink Pixelview PV-BT878P+9B (PlayTV Pro rev.9B FM+NICAM)
+ - 1554:4011
+
+ * - 73
+ - Sensoray 311/611
+ - 6000:0311, 6000:0611
+
+ * - 74
+ - RemoteVision MX (RV605)
+ -
+
+ * - 75
+ - Powercolor MTV878/ MTV878R/ MTV878F
+ -
+
+ * - 76
+ - Canopus WinDVR PCI (COMPAQ Presario 3524JP, 5112JP)
+ - 0e11:0079
+
+ * - 77
+ - GrandTec Multi Capture Card (Bt878)
+ -
+
+ * - 78
+ - Jetway TV/Capture JW-TV878-FBK, Kworld KW-TV878RF
+ - 0a01:17de
+
+ * - 79
+ - DSP Design TCVIDEO
+ -
+
+ * - 80
+ - Hauppauge WinTV PVR
+ - 0070:4500
+
+ * - 81
+ - IODATA GV-BCTV5/PCI
+ - 10fc:4070, 10fc:d018
+
+ * - 82
+ - Osprey 100/150 (878)
+ - 0070:ff00
+
+ * - 83
+ - Osprey 100/150 (848)
+ -
+
+ * - 84
+ - Osprey 101 (848)
+ -
+
+ * - 85
+ - Osprey 101/151
+ -
+
+ * - 86
+ - Osprey 101/151 w/ svid
+ -
+
+ * - 87
+ - Osprey 200/201/250/251
+ -
+
+ * - 88
+ - Osprey 200/250
+ - 0070:ff01
+
+ * - 89
+ - Osprey 210/220/230
+ -
+
+ * - 90
+ - Osprey 500
+ - 0070:ff02
+
+ * - 91
+ - Osprey 540
+ - 0070:ff04
+
+ * - 92
+ - Osprey 2000
+ - 0070:ff03
+
+ * - 93
+ - IDS Eagle
+ -
+
+ * - 94
+ - Pinnacle PCTV Sat
+ - 11bd:001c
+
+ * - 95
+ - Formac ProTV II (bt878)
+ -
+
+ * - 96
+ - MachTV
+ -
+
+ * - 97
+ - Euresys Picolo
+ -
+
+ * - 98
+ - ProVideo PV150
+ - aa00:1460, aa01:1461, aa02:1462, aa03:1463, aa04:1464, aa05:1465, aa06:1466, aa07:1467
+
+ * - 99
+ - AD-TVK503
+ -
+
+ * - 100
+ - Hercules Smart TV Stereo
+ -
+
+ * - 101
+ - Pace TV & Radio Card
+ -
+
+ * - 102
+ - IVC-200
+ - 0000:a155, 0001:a155, 0002:a155, 0003:a155, 0100:a155, 0101:a155, 0102:a155, 0103:a155, 0800:a155, 0801:a155, 0802:a155, 0803:a155
+
+ * - 103
+ - Grand X-Guard / Trust 814PCI
+ - 0304:0102
+
+ * - 104
+ - Nebula Electronics DigiTV
+ - 0071:0101
+
+ * - 105
+ - ProVideo PV143
+ - aa00:1430, aa00:1431, aa00:1432, aa00:1433, aa03:1433
+
+ * - 106
+ - PHYTEC VD-009-X1 VD-011 MiniDIN (bt878)
+ -
+
+ * - 107
+ - PHYTEC VD-009-X1 VD-011 Combi (bt878)
+ -
+
+ * - 108
+ - PHYTEC VD-009 MiniDIN (bt878)
+ -
+
+ * - 109
+ - PHYTEC VD-009 Combi (bt878)
+ -
+
+ * - 110
+ - IVC-100
+ - ff00:a132
+
+ * - 111
+ - IVC-120G
+ - ff00:a182, ff01:a182, ff02:a182, ff03:a182, ff04:a182, ff05:a182, ff06:a182, ff07:a182, ff08:a182, ff09:a182, ff0a:a182, ff0b:a182, ff0c:a182, ff0d:a182, ff0e:a182, ff0f:a182
+
+ * - 112
+ - pcHDTV HD-2000 TV
+ - 7063:2000
+
+ * - 113
+ - Twinhan DST + clones
+ - 11bd:0026, 1822:0001, 270f:fc00, 1822:0026
+
+ * - 114
+ - Winfast VC100
+ - 107d:6607
+
+ * - 115
+ - Teppro TEV-560/InterVision IV-560
+ -
+
+ * - 116
+ - SIMUS GVC1100
+ - aa6a:82b2
+
+ * - 117
+ - NGS NGSTV+
+ -
+
+ * - 118
+ - LMLBT4
+ -
+
+ * - 119
+ - Tekram M205 PRO
+ -
+
+ * - 120
+ - Conceptronic CONTVFMi
+ -
+
+ * - 121
+ - Euresys Picolo Tetra
+ - 1805:0105, 1805:0106, 1805:0107, 1805:0108
+
+ * - 122
+ - Spirit TV Tuner
+ -
+
+ * - 123
+ - AVerMedia AVerTV DVB-T 771
+ - 1461:0771
+
+ * - 124
+ - AverMedia AverTV DVB-T 761
+ - 1461:0761
+
+ * - 125
+ - MATRIX Vision Sigma-SQ
+ -
+
+ * - 126
+ - MATRIX Vision Sigma-SLC
+ -
+
+ * - 127
+ - APAC Viewcomp 878(AMAX)
+ -
+
+ * - 128
+ - DViCO FusionHDTV DVB-T Lite
+ - 18ac:db10, 18ac:db11
+
+ * - 129
+ - V-Gear MyVCD
+ -
+
+ * - 130
+ - Super TV Tuner
+ -
+
+ * - 131
+ - Tibet Systems 'Progress DVR' CS16
+ -
+
+ * - 132
+ - Kodicom 4400R (master)
+ -
+
+ * - 133
+ - Kodicom 4400R (slave)
+ -
+
+ * - 134
+ - Adlink RTV24
+ -
+
+ * - 135
+ - DViCO FusionHDTV 5 Lite
+ - 18ac:d500
+
+ * - 136
+ - Acorp Y878F
+ - 9511:1540
+
+ * - 137
+ - Conceptronic CTVFMi v2
+ - 036e:109e
+
+ * - 138
+ - Prolink Pixelview PV-BT878P+ (Rev.2E)
+ -
+
+ * - 139
+ - Prolink PixelView PlayTV MPEG2 PV-M4900
+ -
+
+ * - 140
+ - Osprey 440
+ - 0070:ff07
+
+ * - 141
+ - Asound Skyeye PCTV
+ -
+
+ * - 142
+ - Sabrent TV-FM (bttv version)
+ -
+
+ * - 143
+ - Hauppauge ImpactVCB (bt878)
+ - 0070:13eb
+
+ * - 144
+ - MagicTV
+ -
+
+ * - 145
+ - SSAI Security Video Interface
+ - 4149:5353
+
+ * - 146
+ - SSAI Ultrasound Video Interface
+ - 414a:5353
+
+ * - 147
+ - VoodooTV 200 (USA)
+ - 121a:3000
+
+ * - 148
+ - DViCO FusionHDTV 2
+ - dbc0:d200
+
+ * - 149
+ - Typhoon TV-Tuner PCI (50684)
+ -
+
+ * - 150
+ - Geovision GV-600
+ - 008a:763c
+
+ * - 151
+ - Kozumi KTV-01C
+ -
+
+ * - 152
+ - Encore ENL TV-FM-2
+ - 1000:1801
+
+ * - 153
+ - PHYTEC VD-012 (bt878)
+ -
+
+ * - 154
+ - PHYTEC VD-012-X1 (bt878)
+ -
+
+ * - 155
+ - PHYTEC VD-012-X2 (bt878)
+ -
+
+ * - 156
+ - IVCE-8784
+ - 0000:f050, 0001:f050, 0002:f050, 0003:f050
+
+ * - 157
+ - Geovision GV-800(S) (master)
+ - 800a:763d
+
+ * - 158
+ - Geovision GV-800(S) (slave)
+ - 800b:763d, 800c:763d, 800d:763d
+
+ * - 159
+ - ProVideo PV183
+ - 1830:1540, 1831:1540, 1832:1540, 1833:1540, 1834:1540, 1835:1540, 1836:1540, 1837:1540
+
+ * - 160
+ - Tongwei Video Technology TD-3116
+ - f200:3116
+
+ * - 161
+ - Aposonic W-DVR
+ - 0279:0228
+
+ * - 162
+ - Adlink MPG24
+ -
+
+ * - 163
+ - Bt848 Capture 14MHz
+ -
+
+ * - 164
+ - CyberVision CV06 (SV)
+ -
+
+ * - 165
+ - Kworld V-Stream Xpert TV PVR878
+ -
+
+ * - 166
+ - PCI-8604PW
+ -
diff --git a/Documentation/admin-guide/media/bttv.rst b/Documentation/admin-guide/media/bttv.rst
new file mode 100644
index 000000000000..58cbaf6df694
--- /dev/null
+++ b/Documentation/admin-guide/media/bttv.rst
@@ -0,0 +1,1762 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+The bttv driver
+===============
+
+Release notes for bttv
+----------------------
+
+You'll need at least these config options for bttv::
+
+ ./scripts/config -e PCI
+ ./scripts/config -m I2C
+ ./scripts/config -m INPUT
+ ./scripts/config -m MEDIA_SUPPORT
+ ./scripts/config -e MEDIA_PCI_SUPPORT
+ ./scripts/config -e MEDIA_ANALOG_TV_SUPPORT
+ ./scripts/config -e MEDIA_DIGITAL_TV_SUPPORT
+ ./scripts/config -e MEDIA_RADIO_SUPPORT
+ ./scripts/config -e RC_CORE
+ ./scripts/config -m VIDEO_BT848
+
+If your board has digital TV, you'll also need::
+
+ ./scripts/config -m DVB_BT8XX
+
+In this case, please see Documentation/admin-guide/media/bt8xx.rst
+for additional notes.
+
+Make bttv work with your card
+-----------------------------
+
+If you have bttv compiled and installed, just booting the Kernel
+should be enough for it to try probing it. However, depending
+on the model, the Kernel may require additional information about
+the hardware, as the device may not be able to provide such info
+directly to the Kernel.
+
+If it doesn't bttv likely could not autodetect your card and needs some
+insmod options. The most important insmod option for bttv is "card=n"
+to select the correct card type. If you get video but no sound you've
+very likely specified the wrong (or no) card type. A list of supported
+cards is in Documentation/admin-guide/media/bttv-cardlist.rst.
+
+If bttv takes very long to load (happens sometimes with the cheap
+cards which have no tuner), try adding this to your modules configuration
+file (usually, it is either ``/etc/modules.conf`` or some file at
+``/etc/modules-load.d/``, but the actual place depends on your
+distribution)::
+
+ options i2c-algo-bit bit_test=1
+
+Some cards may require an extra firmware file to work. For example,
+for the WinTV/PVR you need one firmware file from its driver CD,
+called: ``hcwamc.rbf``. It is inside a self-extracting zip file
+called ``pvr45xxx.exe``. Just placing it at the ``/etc/firmware``
+directory should be enough for it to be autoload during the driver's
+probing mode (e. g. when the Kernel boots or when the driver is
+manually loaded via ``modprobe`` command).
+
+If your card isn't listed in Documentation/admin-guide/media/bttv-cardlist.rst
+or if you have trouble making audio work, please read :ref:`still_doesnt_work`.
+
+
+Autodetecting cards
+-------------------
+
+bttv uses the PCI Subsystem ID to autodetect the card type. lspci lists
+the Subsystem ID in the second line, looks like this:
+
+.. code-block:: none
+
+ 00:0a.0 Multimedia video controller: Brooktree Corporation Bt878 (rev 02)
+ Subsystem: Hauppauge computer works Inc. WinTV/GO
+ Flags: bus master, medium devsel, latency 32, IRQ 5
+ Memory at e2000000 (32-bit, prefetchable) [size=4K]
+
+only bt878-based cards can have a subsystem ID (which does not mean
+that every card really has one). bt848 cards can't have a Subsystem
+ID and therefore can't be autodetected. There is a list with the ID's
+at Documentation/admin-guide/media/bttv-cardlist.rst
+(in case you are interested or want to mail patches with updates).
+
+
+.. _still_doesnt_work:
+
+Still doesn't work?
+-------------------
+
+I do NOT have a lab with 30+ different grabber boards and a
+PAL/NTSC/SECAM test signal generator at home, so I often can't
+reproduce your problems. This makes debugging very difficult for me.
+
+If you have some knowledge and spare time, please try to fix this
+yourself (patches very welcome of course...) You know: The linux
+slogan is "Do it yourself".
+
+There is a mailing list at
+http://vger.kernel.org/vger-lists.html#linux-media
+
+If you have trouble with some specific TV card, try to ask there
+instead of mailing me directly. The chance that someone with the
+same card listens there is much higher...
+
+For problems with sound: There are a lot of different systems used
+for TV sound all over the world. And there are also different chips
+which decode the audio signal. Reports about sound problems ("stereo
+doesn't work") are pretty useless unless you include some details
+about your hardware and the TV sound scheme used in your country (or
+at least the country you are living in).
+
+Modprobe options
+----------------
+
+.. note::
+
+
+ The following argument list can be outdated, as we might add more
+ options if ever needed. In case of doubt, please check with
+ ``modinfo <module>``.
+
+ This command prints various information about a kernel
+ module, among them a complete and up-to-date list of insmod options.
+
+
+
+bttv
+ The bt848/878 (grabber chip) driver
+
+ insmod args::
+
+ card=n card type, see CARDLIST for a list.
+ tuner=n tuner type, see CARDLIST for a list.
+ radio=0/1 card supports radio
+ pll=0/1/2 pll settings
+
+ 0: don't use PLL
+ 1: 28 MHz crystal installed
+ 2: 35 MHz crystal installed
+
+ triton1=0/1 for Triton1 (+others) compatibility
+ vsfx=0/1 yet another chipset bug compatibility bit
+ see README.quirks for details on these two.
+
+ bigendian=n Set the endianness of the gfx framebuffer.
+ Default is native endian.
+ fieldnr=0/1 Count fields. Some TV descrambling software
+ needs this, for others it only generates
+ 50 useless IRQs/sec. default is 0 (off).
+ autoload=0/1 autoload helper modules (tuner, audio).
+ default is 1 (on).
+ bttv_verbose=0/1/2 verbose level (at insmod time, while
+ looking at the hardware). default is 1.
+ bttv_debug=0/1 debug messages (for capture).
+ default is 0 (off).
+ irq_debug=0/1 irq handler debug messages.
+ default is 0 (off).
+ gbuffers=2-32 number of capture buffers for mmap'ed capture.
+ default is 4.
+ gbufsize= size of capture buffers. default and
+ maximum value is 0x208000 (~2MB)
+ no_overlay=0 Enable overlay on broken hardware. There
+ are some chipsets (SIS for example) which
+ are known to have problems with the PCI DMA
+ push used by bttv. bttv will disable overlay
+ by default on this hardware to avoid crashes.
+ With this insmod option you can override this.
+ no_overlay=1 Disable overlay. It should be used by broken
+ hardware that doesn't support PCI2PCI direct
+ transfers.
+ automute=0/1 Automatically mutes the sound if there is
+ no TV signal, on by default. You might try
+ to disable this if you have bad input signal
+ quality which leading to unwanted sound
+ dropouts.
+ chroma_agc=0/1 AGC of chroma signal, off by default.
+ adc_crush=0/1 Luminance ADC crush, on by default.
+ i2c_udelay= Allow reduce I2C speed. Default is 5 usecs
+ (meaning 66,67 Kbps). The default is the
+ maximum supported speed by kernel bitbang
+ algorithm. You may use lower numbers, if I2C
+ messages are lost (16 is known to work on
+ all supported cards).
+
+ bttv_gpio=0/1
+ gpiomask=
+ audioall=
+ audiomux=
+ See Sound-FAQ for a detailed description.
+
+ remap, card, radio and pll accept up to four comma-separated arguments
+ (for multiple boards).
+
+tuner
+ The tuner driver. You need this unless you want to use only
+ with a camera or the board doesn't provide analog TV tuning.
+
+ insmod args::
+
+ debug=1 print some debug info to the syslog
+ type=n type of the tuner chip. n as follows:
+ see CARDLIST for a complete list.
+ pal=[bdgil] select PAL variant (used for some tuners
+ only, important for the audio carrier).
+
+tvaudio
+ Provide a single driver for all simple i2c audio control
+ chips (tda/tea*).
+
+ insmod args::
+
+ tda8425 = 1 enable/disable the support for the
+ tda9840 = 1 various chips.
+ tda9850 = 1 The tea6300 can't be autodetected and is
+ tda9855 = 1 therefore off by default, if you have
+ tda9873 = 1 this one on your card (STB uses these)
+ tda9874a = 1 you have to enable it explicitly.
+ tea6300 = 0 The two tda985x chips use the same i2c
+ tea6420 = 1 address and can't be disturgished from
+ pic16c54 = 1 each other, you might have to disable
+ the wrong one.
+ debug = 1 print debug messages
+
+msp3400
+ The driver for the msp34xx sound processor chips. If you have a
+ stereo card, you probably want to insmod this one.
+
+ insmod args::
+
+ debug=1/2 print some debug info to the syslog,
+ 2 is more verbose.
+ simple=1 Use the "short programming" method. Newer
+ msp34xx versions support this. You need this
+ for dbx stereo. Default is on if supported by
+ the chip.
+ once=1 Don't check the TV-stations Audio mode
+ every few seconds, but only once after
+ channel switches.
+ amsound=1 Audio carrier is AM/NICAM at 6.5 Mhz. This
+ should improve things for french people, the
+ carrier autoscan seems to work with FM only...
+
+If the box freezes hard with bttv
+---------------------------------
+
+It might be a bttv driver bug. It also might be bad hardware. It also
+might be something else ...
+
+Just mailing me "bttv freezes" isn't going to help much. This README
+has a few hints how you can help to pin down the problem.
+
+
+bttv bugs
+~~~~~~~~~
+
+If some version works and another doesn't it is likely to be a driver
+bug. It is very helpful if you can tell where exactly it broke
+(i.e. the last working and the first broken version).
+
+With a hard freeze you probably doesn't find anything in the logfiles.
+The only way to capture any kernel messages is to hook up a serial
+console and let some terminal application log the messages. /me uses
+screen. See Documentation/admin-guide/serial-console.rst for details on
+setting up a serial console.
+
+Read Documentation/admin-guide/bug-hunting.rst to learn how to get any useful
+information out of a register+stack dump printed by the kernel on
+protection faults (so-called "kernel oops").
+
+If you run into some kind of deadlock, you can try to dump a call trace
+for each process using sysrq-t (see Documentation/admin-guide/sysrq.rst).
+This way it is possible to figure where *exactly* some process in "D"
+state is stuck.
+
+I've seen reports that bttv 0.7.x crashes whereas 0.8.x works rock solid
+for some people. Thus probably a small buglet left somewhere in bttv
+0.7.x. I have no idea where exactly, it works stable for me and a lot of
+other people. But in case you have problems with the 0.7.x versions you
+can give 0.8.x a try ...
+
+
+hardware bugs
+~~~~~~~~~~~~~
+
+Some hardware can't deal with PCI-PCI transfers (i.e. grabber => vga).
+Sometimes problems show up with bttv just because of the high load on
+the PCI bus. The bt848/878 chips have a few workarounds for known
+incompatibilities, see README.quirks.
+
+Some folks report that increasing the pci latency helps too,
+althrought I'm not sure whenever this really fixes the problems or
+only makes it less likely to happen. Both bttv and btaudio have a
+insmod option to set the PCI latency of the device.
+
+Some mainboard have problems to deal correctly with multiple devices
+doing DMA at the same time. bttv + ide seems to cause this sometimes,
+if this is the case you likely see freezes only with video and hard disk
+access at the same time. Updating the IDE driver to get the latest and
+greatest workarounds for hardware bugs might fix these problems.
+
+
+other
+~~~~~
+
+If you use some binary-only yunk (like nvidia module) try to reproduce
+the problem without.
+
+IRQ sharing is known to cause problems in some cases. It works just
+fine in theory and many configurations. Neverless it might be worth a
+try to shuffle around the PCI cards to give bttv another IRQ or make
+it share the IRQ with some other piece of hardware. IRQ sharing with
+VGA cards seems to cause trouble sometimes. I've also seen funny
+effects with bttv sharing the IRQ with the ACPI bridge (and
+apci-enabled kernel).
+
+Bttv quirks
+-----------
+
+Below is what the bt878 data book says about the PCI bug compatibility
+modes of the bt878 chip.
+
+The triton1 insmod option sets the EN_TBFX bit in the control register.
+The vsfx insmod option does the same for EN_VSFX bit. If you have
+stability problems you can try if one of these options makes your box
+work solid.
+
+drivers/pci/quirks.c knows about these issues, this way these bits are
+enabled automagically for known-buggy chipsets (look at the kernel
+messages, bttv tells you).
+
+Normal PCI Mode
+~~~~~~~~~~~~~~~
+
+The PCI REQ signal is the logical-or of the incoming function requests.
+The inter-nal GNT[0:1] signals are gated asynchronously with GNT and
+demultiplexed by the audio request signal. Thus the arbiter defaults to
+the video function at power-up and parks there during no requests for
+bus access. This is desirable since the video will request the bus more
+often. However, the audio will have highest bus access priority. Thus
+the audio will have first access to the bus even when issuing a request
+after the video request but before the PCI external arbiter has granted
+access to the Bt879. Neither function can preempt the other once on the
+bus. The duration to empty the entire video PCI FIFO onto the PCI bus is
+very short compared to the bus access latency the audio PCI FIFO can
+tolerate.
+
+
+430FX Compatibility Mode
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+When using the 430FX PCI, the following rules will ensure
+compatibility:
+
+ (1) Deassert REQ at the same time as asserting FRAME.
+ (2) Do not reassert REQ to request another bus transaction until after
+ finish-ing the previous transaction.
+
+Since the individual bus masters do not have direct control of REQ, a
+simple logical-or of video and audio requests would violate the rules.
+Thus, both the arbiter and the initiator contain 430FX compatibility
+mode logic. To enable 430FX mode, set the EN_TBFX bit as indicated in
+Device Control Register on page 104.
+
+When EN_TBFX is enabled, the arbiter ensures that the two compatibility
+rules are satisfied. Before GNT is asserted by the PCI arbiter, this
+internal arbiter may still logical-or the two requests. However, once
+the GNT is issued, this arbiter must lock in its decision and now route
+only the granted request to the REQ pin. The arbiter decision lock
+happens regardless of the state of FRAME because it does not know when
+FRAME will be asserted (typically - each initiator will assert FRAME on
+the cycle following GNT). When FRAME is asserted, it is the initiator s
+responsibility to remove its request at the same time. It is the
+arbiters responsibility to allow this request to flow through to REQ and
+not allow the other request to hold REQ asserted. The decision lock may
+be removed at the end of the transaction: for example, when the bus is
+idle (FRAME and IRDY). The arbiter decision may then continue
+asynchronously until GNT is again asserted.
+
+
+Interfacing with Non-PCI 2.1 Compliant Core Logic
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A small percentage of core logic devices may start a bus transaction
+during the same cycle that GNT is de-asserted. This is non PCI 2.1
+compliant. To ensure compatibility when using PCs with these PCI
+controllers, the EN_VSFX bit must be enabled (refer to Device Control
+Register on page 104). When in this mode, the arbiter does not pass GNT
+to the internal functions unless REQ is asserted. This prevents a bus
+transaction from starting the same cycle as GNT is de-asserted. This
+also has the side effect of not being able to take advantage of bus
+parking, thus lowering arbitration performance. The Bt879 drivers must
+query for these non-compliant devices, and set the EN_VSFX bit only if
+required.
+
+
+Other elements of the tvcards array
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you are trying to make a new card work you might find it useful to
+know what the other elements in the tvcards array are good for::
+
+ video_inputs - # of video inputs the card has
+ audio_inputs - historical cruft, not used any more.
+ tuner - which input is the tuner
+ svhs - which input is svhs (all others are labeled composite)
+ muxsel - video mux, input->registervalue mapping
+ pll - same as pll= insmod option
+ tuner_type - same as tuner= insmod option
+ *_modulename - hint whenever some card needs this or that audio
+ module loaded to work properly.
+ has_radio - whenever this TV card has a radio tuner.
+ no_msp34xx - "1" disables loading of msp3400.o module
+ no_tda9875 - "1" disables loading of tda9875.o module
+ needs_tvaudio - set to "1" to load tvaudio.o module
+
+If some config item is specified both from the tvcards array and as
+insmod option, the insmod option takes precedence.
+
+Cards
+-----
+
+.. note::
+
+ For a more updated list, please check
+ https://linuxtv.org/wiki/index.php/Hardware_Device_Information
+
+Supported cards: Bt848/Bt848a/Bt849/Bt878/Bt879 cards
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+All cards with Bt848/Bt848a/Bt849/Bt878/Bt879 and normal
+Composite/S-VHS inputs are supported. Teletext and Intercast support
+(PAL only) for ALL cards via VBI sample decoding in software.
+
+Some cards with additional multiplexing of inputs or other additional
+fancy chips are only partially supported (unless specifications by the
+card manufacturer are given). When a card is listed here it isn't
+necessarily fully supported.
+
+All other cards only differ by additional components as tuners, sound
+decoders, EEPROMs, teletext decoders ...
+
+
+MATRIX Vision
+~~~~~~~~~~~~~
+
+MV-Delta
+- Bt848A
+- 4 Composite inputs, 1 S-VHS input (shared with 4th composite)
+- EEPROM
+
+http://www.matrix-vision.de/
+
+This card has no tuner but supports all 4 composite (1 shared with an
+S-VHS input) of the Bt848A.
+Very nice card if you only have satellite TV but several tuners connected
+to the card via composite.
+
+Many thanks to Matrix-Vision for giving us 2 cards for free which made
+Bt848a/Bt849 single crystal operation support possible!!!
+
+
+
+Miro/Pinnacle PCTV
+~~~~~~~~~~~~~~~~~~
+
+- Bt848
+ some (all??) come with 2 crystals for PAL/SECAM and NTSC
+- PAL, SECAM or NTSC TV tuner (Philips or TEMIC)
+- MSP34xx sound decoder on add on board
+ decoder is supported but AFAIK does not yet work
+ (other sound MUX setting in GPIO port needed??? somebody who fixed this???)
+- 1 tuner, 1 composite and 1 S-VHS input
+- tuner type is autodetected
+
+http://www.miro.de/
+http://www.miro.com/
+
+
+Many thanks for the free card which made first NTSC support possible back
+in 1997!
+
+
+Hauppauge Win/TV pci
+~~~~~~~~~~~~~~~~~~~~
+
+There are many different versions of the Hauppauge cards with different
+tuners (TV+Radio ...), teletext decoders.
+Note that even cards with same model numbers have (depending on the revision)
+different chips on it.
+
+- Bt848 (and others but always in 2 crystal operation???)
+ newer cards have a Bt878
+
+- PAL, SECAM, NTSC or tuner with or without Radio support
+
+e.g.:
+
+- PAL:
+
+ - TDA5737: VHF, hyperband and UHF mixer/oscillator for TV and VCR 3-band tuners
+ - TSA5522: 1.4 GHz I2C-bus controlled synthesizer, I2C 0xc2-0xc3
+
+- NTSC:
+
+ - TDA5731: VHF, hyperband and UHF mixer/oscillator for TV and VCR 3-band tuners
+ - TSA5518: no datasheet available on Philips site
+
+- Philips SAA5246 or SAA5284 ( or no) Teletext decoder chip
+ with buffer RAM (e.g. Winbond W24257AS-35: 32Kx8 CMOS static RAM)
+ SAA5246 (I2C 0x22) is supported
+
+- 256 bytes EEPROM: Microchip 24LC02B or Philips 8582E2Y
+ with configuration information
+ I2C address 0xa0 (24LC02B also responds to 0xa2-0xaf)
+
+- 1 tuner, 1 composite and (depending on model) 1 S-VHS input
+
+- 14052B: mux for selection of sound source
+
+- sound decoder: TDA9800, MSP34xx (stereo cards)
+
+
+Askey CPH-Series
+~~~~~~~~~~~~~~~~
+Developed by TelSignal(?), OEMed by many vendors (Typhoon, Anubis, Dynalink)
+
+- Card series:
+ - CPH01x: BT848 capture only
+ - CPH03x: BT848
+ - CPH05x: BT878 with FM
+ - CPH06x: BT878 (w/o FM)
+ - CPH07x: BT878 capture only
+
+- TV standards:
+ - CPH0x0: NTSC-M/M
+ - CPH0x1: PAL-B/G
+ - CPH0x2: PAL-I/I
+ - CPH0x3: PAL-D/K
+ - CPH0x4: SECAM-L/L
+ - CPH0x5: SECAM-B/G
+ - CPH0x6: SECAM-D/K
+ - CPH0x7: PAL-N/N
+ - CPH0x8: PAL-B/H
+ - CPH0x9: PAL-M/M
+
+- CPH03x was often sold as "TV capturer".
+
+Identifying:
+
+ #) 878 cards can be identified by PCI Subsystem-ID:
+ - 144f:3000 = CPH06x
+ - 144F:3002 = CPH05x w/ FM
+ - 144F:3005 = CPH06x_LC (w/o remote control)
+ #) The cards have a sticker with "CPH"-model on the back.
+ #) These cards have a number printed on the PCB just above the tuner metal box:
+ - "80-CP2000300-x" = CPH03X
+ - "80-CP2000500-x" = CPH05X
+ - "80-CP2000600-x" = CPH06X / CPH06x_LC
+
+ Askey sells these cards as "Magic TView series", Brand "MagicXpress".
+ Other OEM often call these "Tview", "TView99" or else.
+
+Lifeview Flyvideo Series:
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The naming of these series differs in time and space.
+
+Identifying:
+ #) Some models can be identified by PCI subsystem ID:
+
+ - 1852:1852 = Flyvideo 98 FM
+ - 1851:1850 = Flyvideo 98
+ - 1851:1851 = Flyvideo 98 EZ (capture only)
+
+ #) There is a print on the PCB:
+
+ - LR25 = Flyvideo (Zoran ZR36120, SAA7110A)
+ - LR26 Rev.N = Flyvideo II (Bt848)
+ - LR26 Rev.O = Flyvideo II (Bt878)
+ - LR37 Rev.C = Flyvideo EZ (Capture only, ZR36120 + SAA7110)
+ - LR38 Rev.A1= Flyvideo II EZ (Bt848 capture only)
+ - LR50 Rev.Q = Flyvideo 98 (w/eeprom and PCI subsystem ID)
+ - LR50 Rev.W = Flyvideo 98 (no eeprom)
+ - LR51 Rev.E = Flyvideo 98 EZ (capture only)
+ - LR90 = Flyvideo 2000 (Bt878)
+ - LR90 Flyvideo 2000S (Bt878) w/Stereo TV (Package incl. LR91 daughterboard)
+ - LR91 = Stereo daughter card for LR90
+ - LR97 = Flyvideo DVBS
+ - LR99 Rev.E = Low profile card for OEM integration (only internal audio!) bt878
+ - LR136 = Flyvideo 2100/3100 (Low profile, SAA7130/SAA7134)
+ - LR137 = Flyvideo DV2000/DV3000 (SAA7130/SAA7134 + IEEE1394)
+ - LR138 Rev.C= Flyvideo 2000 (SAA7130)
+ - LR138 Flyvideo 3000 (SAA7134) w/Stereo TV
+
+ - These exist in variations w/FM and w/Remote sometimes denoted
+ by suffixes "FM" and "R".
+
+ #) You have a laptop (miniPCI card):
+
+ - Product = FlyTV Platinum Mini
+ - Model/Chip = LR212/saa7135
+
+ - Lifeview.com.tw states (Feb. 2002):
+ "The FlyVideo2000 and FlyVideo2000s product name have renamed to FlyVideo98."
+ Their Bt8x8 cards are listed as discontinued.
+ - Flyvideo 2000S was probably sold as Flyvideo 3000 in some countries(Europe?).
+ The new Flyvideo 2000/3000 are SAA7130/SAA7134 based.
+
+"Flyvideo II" had been the name for the 848 cards, nowadays (in Germany)
+this name is re-used for LR50 Rev.W.
+
+The Lifeview website mentioned Flyvideo III at some time, but such a card
+has not yet been seen (perhaps it was the german name for LR90 [stereo]).
+These cards are sold by many OEMs too.
+
+FlyVideo A2 (Elta 8680)= LR90 Rev.F (w/Remote, w/o FM, stereo TV by tda9821) {Germany}
+
+Lifeview 3000 (Elta 8681) as sold by Plus(April 2002), Germany = LR138 w/ saa7134
+
+lifeview config coding on gpio pins 0-9
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- LR50 rev. Q ("PARTS: 7031505116), Tuner wurde als Nr. 5 erkannt, Eingänge
+ SVideo, TV, Composite, Audio, Remote:
+
+ - CP9..1=100001001 (1: 0-Ohm-Widerstand gegen GND unbestückt; 0: bestückt)
+
+
+Typhoon TV card series:
+~~~~~~~~~~~~~~~~~~~~~~~
+
+These can be CPH, Flyvideo, Pixelview or KNC1 series.
+
+Typhoon is the brand of Anubis.
+
+Model 50680 got re-used, some model no. had different contents over time.
+
+Models:
+
+ - 50680 "TV Tuner PCI Pal BG"(old,red package)=can be CPH03x(bt848) or CPH06x(bt878)
+ - 50680 "TV Tuner Pal BG" (blue package)= Pixelview PV-BT878P+ (Rev 9B)
+ - 50681 "TV Tuner PCI Pal I" (variant of 50680)
+ - 50682 "TView TV/FM Tuner Pal BG" = Flyvideo 98FM (LR50 Rev.Q)
+
+ .. note::
+
+ The package has a picture of CPH05x (which would be a real TView)
+
+ - 50683 "TV Tuner PCI SECAM" (variant of 50680)
+ - 50684 "TV Tuner Pal BG" = Pixelview 878TV(Rev.3D)
+ - 50686 "TV Tuner" = KNC1 TV Station
+ - 50687 "TV Tuner stereo" = KNC1 TV Station pro
+ - 50688 "TV Tuner RDS" (black package) = KNC1 TV Station RDS
+ - 50689 TV SAT DVB-S CARD CI PCI (SAA7146AH, SU1278?) = "KNC1 TV Station DVB-S"
+ - 50692 "TV/FM Tuner" (small PCB)
+ - 50694 TV TUNER CARD RDS (PHILIPS CHIPSET SAA7134HL)
+ - 50696 TV TUNER STEREO (PHILIPS CHIPSET SAA7134HL, MK3ME Tuner)
+ - 50804 PC-SAT TV/Audio Karte = Techni-PC-Sat (ZORAN 36120PQC, Tuner:Alps)
+ - 50866 TVIEW SAT RECEIVER+ADR
+ - 50868 "TV/FM Tuner Pal I" (variant of 50682)
+ - 50999 "TV/FM Tuner Secam" (variant of 50682)
+
+Guillemot
+~~~~~~~~~
+
+Models:
+
+- Maxi-TV PCI (ZR36120)
+- Maxi TV Video 2 = LR50 Rev.Q (FI1216MF, PAL BG+SECAM)
+- Maxi TV Video 3 = CPH064 (PAL BG + SECAM)
+
+Mentor
+~~~~~~
+
+Mentor TV card ("55-878TV-U1") = Pixelview 878TV(Rev.3F) (w/FM w/Remote)
+
+Prolink
+~~~~~~~
+
+- TV cards:
+
+ - PixelView Play TV pro - (Model: PV-BT878P+ REV 8E)
+ - PixelView Play TV pro - (Model: PV-BT878P+ REV 9D)
+ - PixelView Play TV pro - (Model: PV-BT878P+ REV 4C / 8D / 10A )
+ - PixelView Play TV - (Model: PV-BT848P+)
+ - 878TV - (Model: PV-BT878TV)
+
+- Multimedia TV packages (card + software pack):
+
+ - PixelView Play TV Theater - (Model: PV-M4200) = PixelView Play TV pro + Software
+ - PixelView Play TV PAK - (Model: PV-BT878P+ REV 4E)
+ - PixelView Play TV/VCR - (Model: PV-M3200 REV 4C / 8D / 10A )
+ - PixelView Studio PAK - (Model: M2200 REV 4C / 8D / 10A )
+ - PixelView PowerStudio PAK - (Model: PV-M3600 REV 4E)
+ - PixelView DigitalVCR PAK - (Model: PV-M2400 REV 4C / 8D / 10A )
+ - PixelView PlayTV PAK II (TV/FM card + usb camera) PV-M3800
+ - PixelView PlayTV XP PV-M4700,PV-M4700(w/FM)
+ - PixelView PlayTV DVR PV-M4600 package contents:PixelView PlayTV pro, windvr & videoMail s/w
+
+- Further Cards:
+
+ - PV-BT878P+rev.9B (Play TV Pro, opt. w/FM w/NICAM)
+ - PV-BT878P+rev.2F
+ - PV-BT878P Rev.1D (bt878, capture only)
+
+ - XCapture PV-CX881P (cx23881)
+ - PlayTV HD PV-CX881PL+, PV-CX881PL+(w/FM) (cx23881)
+
+ - DTV3000 PV-DTV3000P+ DVB-S CI = Twinhan VP-1030
+ - DTV2000 DVB-S = Twinhan VP-1020
+
+- Video Conferencing:
+
+ - PixelView Meeting PAK - (Model: PV-BT878P)
+ - PixelView Meeting PAK Lite - (Model: PV-BT878P)
+ - PixelView Meeting PAK plus - (Model: PV-BT878P+rev 4C/8D/10A)
+ - PixelView Capture - (Model: PV-BT848P)
+ - PixelView PlayTV USB pro
+ - Model No. PV-NT1004+, PV-NT1004+ (w/FM) = NT1004 USB decoder chip + SAA7113 video decoder chip
+
+Dynalink
+~~~~~~~~
+
+These are CPH series.
+
+Phoebemicro
+~~~~~~~~~~~
+
+- TV Master = CPH030 or CPH060
+- TV Master FM = CPH050
+
+Genius/Kye
+~~~~~~~~~~
+
+- Video Wonder/Genius Internet Video Kit = LR37 Rev.C
+- Video Wonder Pro II (848 or 878) = LR26
+
+Tekram
+~~~~~~
+
+- VideoCap C205 (Bt848)
+- VideoCap C210 (zr36120 +Philips)
+- CaptureTV M200 (ISA)
+- CaptureTV M205 (Bt848)
+
+Lucky Star
+~~~~~~~~~~
+
+- Image World Conference TV = LR50 Rev. Q
+
+Leadtek
+~~~~~~~
+
+- WinView 601 (Bt848)
+- WinView 610 (Zoran)
+- WinFast2000
+- WinFast2000 XP
+
+Support for the Leadtek WinView 601 TV/FM
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Author of this section: Jon Tombs <jon@gte.esi.us.es>
+
+This card is basically the same as all the rest (Bt484A, Philips tuner),
+the main difference is that they have attached a programmable attenuator to 3
+GPIO lines in order to give some volume control. They have also stuck an
+infra-red remote control decoded on the board, I will add support for this
+when I get time (it simple generates an interrupt for each key press, with
+the key code is placed in the GPIO port).
+
+I don't yet have any application to test the radio support. The tuner
+frequency setting should work but it is possible that the audio multiplexer
+is wrong. If it doesn't work, send me email.
+
+
+- No Thanks to Leadtek they refused to answer any questions about their
+ hardware. The driver was written by visual inspection of the card. If you
+ use this driver, send an email insult to them, and tell them you won't
+ continue buying their hardware unless they support Linux.
+
+- Little thanks to Princeton Technology Corp (http://www.princeton.com.tw)
+ who make the audio attenuator. Their publicly available data-sheet available
+ on their web site doesn't include the chip programming information! Hidden
+ on their server are the full data-sheets, but don't ask how I found it.
+
+To use the driver I use the following options, the tuner and pll settings might
+be different in your country. You can force it via modprobe parameters.
+For example::
+
+ modprobe bttv tuner=1 pll=28 radio=1 card=17
+
+Sets tuner type 1 (Philips PAL_I), PLL with a 28 MHz crystal, enables
+FM radio and selects bttv card ID 17 (Leadtek WinView 601).
+
+
+KNC One
+~~~~~~~
+
+- TV-Station
+- TV-Station SE (+Software Bundle)
+- TV-Station pro (+TV stereo)
+- TV-Station FM (+Radio)
+- TV-Station RDS (+RDS)
+- TV Station SAT (analog satellite)
+- TV-Station DVB-S
+
+.. note:: newer Cards have saa7134, but model name stayed the same?
+
+Provideo
+~~~~~~~~
+
+- PV951 or PV-951, now named PV-951T
+ (also are sold as:
+ Boeder TV-FM Video Capture Card,
+ Titanmedia Supervision TV-2400,
+ Provideo PV951 TF,
+ 3DeMon PV951,
+ MediaForte TV-Vision PV951,
+ Yoko PV951,
+ Vivanco Tuner Card PCI Art.-Nr.: 68404
+ )
+
+- Surveillance Series:
+
+ - PV-141
+ - PV-143
+ - PV-147
+ - PV-148 (capture only)
+ - PV-150
+ - PV-151
+
+- TV-FM Tuner Series:
+
+ - PV-951TDV (tv tuner + 1394)
+ - PV-951T/TF
+ - PV-951PT/TF
+ - PV-956T/TF Low Profile
+ - PV-911
+
+Highscreen
+~~~~~~~~~~
+
+Models:
+
+- TV Karte = LR50 Rev.S
+- TV-Boostar = Terratec Terra TV+ Version 1.0 (Bt848, tda9821) "ceb105.pcb"
+
+Zoltrix
+~~~~~~~
+
+Models:
+
+- Face to Face Capture (Bt848 capture only) (PCB "VP-2848")
+- Face To Face TV MAX (Bt848) (PCB "VP-8482 Rev1.3")
+- Genie TV (Bt878) (PCB "VP-8790 Rev 2.1")
+- Genie Wonder Pro
+
+AVerMedia
+~~~~~~~~~
+
+- AVer FunTV Lite (ISA, AV3001 chipset) "M101.C"
+- AVerTV
+- AVerTV Stereo
+- AVerTV Studio (w/FM)
+- AVerMedia TV98 with Remote
+- AVerMedia TV/FM98 Stereo
+- AVerMedia TVCAM98
+- TVCapture (Bt848)
+- TVPhone (Bt848)
+- TVCapture98 (="AVerMedia TV98" in USA) (Bt878)
+- TVPhone98 (Bt878, w/FM)
+
+======== =========== =============== ======= ====== ======== =======================
+PCB PCI-ID Model-Name Eeprom Tuner Sound Country
+======== =========== =============== ======= ====== ======== =======================
+M101.C ISA !
+M108-B Bt848 -- FR1236 US [#f2]_, [#f3]_
+M1A8-A Bt848 AVer TV-Phone FM1216 --
+M168-T 1461:0003 AVerTV Studio 48:17 FM1216 TDA9840T D [#f1]_ w/FM w/Remote
+M168-U 1461:0004 TVCapture98 40:11 FI1216 -- D w/Remote
+M168II-B 1461:0003 Medion MD9592 48:16 FM1216 TDA9873H D w/FM
+======== =========== =============== ======= ====== ======== =======================
+
+.. [#f1] Daughterboard MB68-A with TDA9820T and TDA9840T
+.. [#f2] Sony NE41S soldered (stereo sound?)
+.. [#f3] Daughterboard M118-A w/ pic 16c54 and 4 MHz quartz
+
+- US site has different drivers for (as of 09/2002):
+
+ - EZ Capture/InterCam PCI (BT-848 chip)
+ - EZ Capture/InterCam PCI (BT-878 chip)
+ - TV-Phone (BT-848 chip)
+ - TV98 (BT-848 chip)
+ - TV98 With Remote (BT-848 chip)
+ - TV98 (BT-878 chip)
+ - TV98 With Remote (BT-878)
+ - TV/FM98 (BT-878 chip)
+ - AVerTV
+ - AverTV Stereo
+ - AVerTV Studio
+
+DE hat diverse Treiber fuer diese Modelle (Stand 09/2002):
+
+ - TVPhone (848) mit Philips tuner FR12X6 (w/ FM radio)
+ - TVPhone (848) mit Philips tuner FM12X6 (w/ FM radio)
+ - TVCapture (848) w/Philips tuner FI12X6
+ - TVCapture (848) non-Philips tuner
+ - TVCapture98 (Bt878)
+ - TVPhone98 (Bt878)
+ - AVerTV und TVCapture98 w/VCR (Bt 878)
+ - AVerTVStudio und TVPhone98 w/VCR (Bt878)
+ - AVerTV GO Series (Kein SVideo Input)
+ - AVerTV98 (BT-878 chip)
+ - AVerTV98 mit Fernbedienung (BT-878 chip)
+ - AVerTV/FM98 (BT-878 chip)
+
+ - VDOmate (www.averm.com.cn) = M168U ?
+
+Aimslab
+~~~~~~~
+
+Models:
+
+- Video Highway or "Video Highway TR200" (ISA)
+- Video Highway Xtreme (aka "VHX") (Bt848, FM w/ TEA5757)
+
+IXMicro (former: IMS=Integrated Micro Solutions)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Models:
+
+- IXTV BT848 (=TurboTV)
+- IXTV BT878
+- IMS TurboTV (Bt848)
+
+Lifetec/Medion/Tevion/Aldi
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Models:
+
+- LT9306/MD9306 = CPH061
+- LT9415/MD9415 = LR90 Rev.F or Rev.G
+- MD9592 = Avermedia TVphone98 (PCI_ID=1461:0003), PCB-Rev=M168II-B (w/TDA9873H)
+- MD9717 = KNC One (Rev D4, saa7134, FM1216 MK2 tuner)
+- MD5044 = KNC One (Rev D4, saa7134, FM1216ME MK3 tuner)
+
+Modular Technologies (www.modulartech.com) UK
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Models:
+
+- MM100 PCTV (Bt848)
+- MM201 PCTV (Bt878, Bt832) w/ Quartzsight camera
+- MM202 PCTV (Bt878, Bt832, tda9874)
+- MM205 PCTV (Bt878)
+- MM210 PCTV (Bt878) (Galaxy TV, Galaxymedia ?)
+
+Terratec
+~~~~~~~~
+
+Models:
+
+- Terra TV+ Version 1.0 (Bt848), "ceb105.PCB" printed on the PCB, TDA9821
+- Terra TV+ Version 1.1 (Bt878), "LR74 Rev.E" printed on the PCB, TDA9821
+- Terra TValueRadio, "LR102 Rev.C" printed on the PCB
+- Terra TV/Radio+ Version 1.0, "80-CP2830100-0" TTTV3 printed on the PCB,
+ "CPH010-E83" on the back, SAA6588T, TDA9873H
+- Terra TValue Version BT878, "80-CP2830110-0 TTTV4" printed on the PCB,
+ "CPH011-D83" on back
+- Terra TValue Version 1.0 "ceb105.PCB" (really identical to Terra TV+ Version 1.0)
+- Terra TValue New Revision "LR102 Rec.C"
+- Terra Active Radio Upgrade (tea5757h, saa6588t)
+
+- LR74 is a newer PCB revision of ceb105 (both incl. connector for Active Radio Upgrade)
+
+- Cinergy 400 (saa7134), "E877 11(S)", "PM820092D" printed on PCB
+- Cinergy 600 (saa7134)
+
+Technisat
+~~~~~~~~~
+
+Models:
+
+- Discos ADR PC-Karte ISA (no TV!)
+- Discos ADR PC-Karte PCI (probably no TV?)
+- Techni-PC-Sat (Sat. analog)
+ Rev 1.2 (zr36120, vpx3220, stv0030, saa5246, BSJE3-494A)
+- Mediafocus I (zr36120/zr36125, drp3510, Sat. analog + ADR Radio)
+- Mediafocus II (saa7146, Sat. analog)
+- SatADR Rev 2.1 (saa7146a, saa7113h, stv0056a, msp3400c, drp3510a, BSKE3-307A)
+- SkyStar 1 DVB (AV7110) = Technotrend Premium
+- SkyStar 2 DVB (B2C2) (=Sky2PC)
+
+Siemens
+~~~~~~~
+
+Multimedia eXtension Board (MXB) (SAA7146, SAA7111)
+
+Powercolor
+~~~~~~~~~~
+
+Models:
+
+- MTV878
+ Package comes with different contents:
+
+ a) pcb "MTV878" (CARD=75)
+ b) Pixelview Rev. 4\_
+
+- MTV878R w/Remote Control
+- MTV878F w/Remote Control w/FM radio
+
+Pinnacle
+~~~~~~~~
+
+PCTV models:
+
+- Mirovideo PCTV (Bt848)
+- Mirovideo PCTV SE (Bt848)
+- Mirovideo PCTV Pro (Bt848 + Daughterboard for TV Stereo and FM)
+- Studio PCTV Rave (Bt848 Version = Mirovideo PCTV)
+- Studio PCTV Rave (Bt878 package w/o infrared)
+- Studio PCTV (Bt878)
+- Studio PCTV Pro (Bt878 stereo w/ FM)
+- Pinnacle PCTV (Bt878, MT2032)
+- Pinnacle PCTV Pro (Bt878, MT2032)
+- Pinncale PCTV Sat (bt878a, HM1821/1221) ["Conexant CX24110 with CX24108 tuner, aka HM1221/HM1811"]
+- Pinnacle PCTV Sat XE
+
+M(J)PEG capture and playback models:
+
+- DC1+ (ISA)
+- DC10 (zr36057, zr36060, saa7110, adv7176)
+- DC10+ (zr36067, zr36060, saa7110, adv7176)
+- DC20 (ql16x24b,zr36050, zr36016, saa7110, saa7187 ...)
+- DC30 (zr36057, zr36050, zr36016, vpx3220, adv7176, ad1843, tea6415, miro FST97A1)
+- DC30+ (zr36067, zr36050, zr36016, vpx3220, adv7176)
+- DC50 (zr36067, zr36050, zr36016, saa7112, adv7176 (2 pcs.?), ad1843, miro FST97A1, Lattice ???)
+
+Lenco
+~~~~~
+
+Models:
+
+- MXR-9565 (=Technisat Mediafocus?)
+- MXR-9571 (Bt848) (=CPH031?)
+- MXR-9575
+- MXR-9577 (Bt878) (=Prolink 878TV Rev.3x)
+- MXTV-9578CP (Bt878) (= Prolink PV-BT878P+4E)
+
+Iomega
+~~~~~~
+
+Buz (zr36067, zr36060, saa7111, saa7185)
+
+LML
+~~~
+ LML33 (zr36067, zr36060, bt819, bt856)
+
+Grandtec
+~~~~~~~~
+
+Models:
+
+- Grand Video Capture (Bt848)
+- Multi Capture Card (Bt878)
+
+Koutech
+~~~~~~~
+
+Models:
+
+- KW-606 (Bt848)
+- KW-607 (Bt848 capture only)
+- KW-606RSF
+- KW-607A (capture only)
+- KW-608 (Zoran capture only)
+
+IODATA (jp)
+~~~~~~~~~~~
+
+Models:
+
+- GV-BCTV/PCI
+- GV-BCTV2/PCI
+- GV-BCTV3/PCI
+- GV-BCTV4/PCI
+- GV-VCP/PCI (capture only)
+- GV-VCP2/PCI (capture only)
+
+Canopus (jp)
+~~~~~~~~~~~~
+
+WinDVR = Kworld "KW-TVL878RF"
+
+www.sigmacom.co.kr
+~~~~~~~~~~~~~~~~~~
+
+Sigma Cyber TV II
+
+www.sasem.co.kr
+~~~~~~~~~~~~~~~
+
+Litte OnAir TV
+
+hama
+~~~~
+
+TV/Radio-Tuner Card, PCI (Model 44677) = CPH051
+
+Sigma Designs
+~~~~~~~~~~~~~
+
+Hollywood plus (em8300, em9010, adv7175), (PCB "M340-10") MPEG DVD decoder
+
+Formac
+~~~~~~
+
+Models:
+
+- iProTV (Card for iMac Mezzanine slot, Bt848+SCSI)
+- ProTV (Bt848)
+- ProTV II = ProTV Stereo (Bt878) ["stereo" means FM stereo, tv is still mono]
+
+ATI
+~~~
+
+Models:
+
+- TV-Wonder
+- TV-Wonder VE
+
+Diamond Multimedia
+~~~~~~~~~~~~~~~~~~
+
+DTV2000 (Bt848, tda9875)
+
+Aopen
+~~~~~
+
+- VA1000 Plus (w/ Stereo)
+- VA1000 Lite
+- VA1000 (=LR90)
+
+Intel
+~~~~~
+
+Models:
+
+- Smart Video Recorder (ISA full-length)
+- Smart Video Recorder pro (ISA half-length)
+- Smart Video Recorder III (Bt848)
+
+STB
+~~~
+
+Models:
+
+- STB Gateway 6000704 (bt878)
+- STB Gateway 6000699 (bt848)
+- STB Gateway 6000402 (bt848)
+- STB TV130 PCI
+
+Videologic
+~~~~~~~~~~
+
+Models:
+
+- Captivator Pro/TV (ISA?)
+- Captivator PCI/VC (Bt848 bundled with camera) (capture only)
+
+Technotrend
+~~~~~~~~~~~~
+
+Models:
+
+- TT-SAT PCI (PCB "Sat-PCI Rev.:1.3.1"; zr36125, vpx3225d, stc0056a, Tuner:BSKE6-155A
+- TT-DVB-Sat
+ - revisions 1.1, 1.3, 1.5, 1.6 and 2.1
+ - This card is sold as OEM from:
+
+ - Siemens DVB-s Card
+ - Hauppauge WinTV DVB-S
+ - Technisat SkyStar 1 DVB
+ - Galaxis DVB Sat
+
+ - Now this card is called TT-PCline Premium Family
+ - TT-Budget (saa7146, bsru6-701a)
+ This card is sold as OEM from:
+
+ - Hauppauge WinTV Nova
+ - Satelco Standard PCI (DVB-S)
+ - TT-DVB-C PCI
+
+Teles
+~~~~~
+
+ DVB-s (Rev. 2.2, BSRV2-301A, data only?)
+
+Remote Vision
+~~~~~~~~~~~~~
+
+MX RV605 (Bt848 capture only)
+
+Boeder
+~~~~~~
+
+Models:
+
+- PC ChatCam (Model 68252) (Bt848 capture only)
+- Tv/Fm Capture Card (Model 68404) = PV951
+
+Media-Surfer (esc-kathrein.de)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Models:
+
+- Sat-Surfer (ISA)
+- Sat-Surfer PCI = Techni-PC-Sat
+- Cable-Surfer 1
+- Cable-Surfer 2
+- Cable-Surfer PCI (zr36120)
+- Audio-Surfer (ISA Radio card)
+
+Jetway (www.jetway.com.tw)
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Models:
+
+- JW-TV 878M
+- JW-TV 878 = KWorld KW-TV878RF
+
+Galaxis
+~~~~~~~
+
+Models:
+
+- Galaxis DVB Card S CI
+- Galaxis DVB Card C CI
+- Galaxis DVB Card S
+- Galaxis DVB Card C
+- Galaxis plug.in S [neuer Name: Galaxis DVB Card S CI
+
+Hauppauge
+~~~~~~~~~
+
+Models:
+
+- many many WinTV models ...
+- WinTV DVBs = Technotrend Premium 1.3
+- WinTV NOVA = Technotrend Budget 1.1 "S-DVB DATA"
+- WinTV NOVA-CI "SDVBACI"
+- WinTV Nova USB (=Technotrend USB 1.0)
+- WinTV-Nexus-s (=Technotrend Premium 2.1 or 2.2)
+- WinTV PVR
+- WinTV PVR 250
+- WinTV PVR 450
+
+US models
+
+-990 WinTV-PVR-350 (249USD) (iTVC15 chipset + radio)
+-980 WinTV-PVR-250 (149USD) (iTVC15 chipset)
+-880 WinTV-PVR-PCI (199USD) (KFIR chipset + bt878)
+-881 WinTV-PVR-USB
+-190 WinTV-GO
+-191 WinTV-GO-FM
+-404 WinTV
+-401 WinTV-radio
+-495 WinTV-Theater
+-602 WinTV-USB
+-621 WinTV-USB-FM
+-600 USB-Live
+-698 WinTV-HD
+-697 WinTV-D
+-564 WinTV-Nexus-S
+
+Deutsche Modelle:
+
+-603 WinTV GO
+-719 WinTV Primio-FM
+-718 WinTV PCI-FM
+-497 WinTV Theater
+-569 WinTV USB
+-568 WinTV USB-FM
+-882 WinTV PVR
+-981 WinTV PVR 250
+-891 WinTV-PVR-USB
+-541 WinTV Nova
+-488 WinTV Nova-Ci
+-564 WinTV-Nexus-s
+-727 WinTV-DVB-c
+-545 Common Interface
+-898 WinTV-Nova-USB
+
+UK models:
+
+-607 WinTV Go
+-693,793 WinTV Primio FM
+-647,747 WinTV PCI FM
+-498 WinTV Theater
+-883 WinTV PVR
+-893 WinTV PVR USB (Duplicate entry)
+-566 WinTV USB (UK)
+-573 WinTV USB FM
+-429 Impact VCB (bt848)
+-600 USB Live (Video-In 1x Comp, 1xSVHS)
+-542 WinTV Nova
+-717 WinTV DVB-S
+-909 Nova-t PCI
+-893 Nova-t USB (Duplicate entry)
+-802 MyTV
+-804 MyView
+-809 MyVideo
+-872 MyTV2Go FM
+-546 WinTV Nova-S CI
+-543 WinTV Nova
+-907 Nova-S USB
+-908 Nova-T USB
+-717 WinTV Nexus-S
+-157 DEC3000-s Standalone + USB
+
+Spain:
+
+-685 WinTV-Go
+-690 WinTV-PrimioFM
+-416 WinTV-PCI Nicam Estereo
+-677 WinTV-PCI-FM
+-699 WinTV-Theater
+-683 WinTV-USB
+-678 WinTV-USB-FM
+-983 WinTV-PVR-250
+-883 WinTV-PVR-PCI
+-993 WinTV-PVR-350
+-893 WinTV-PVR-USB
+-728 WinTV-DVB-C PCI
+-832 MyTV2Go
+-869 MyTV2Go-FM
+-805 MyVideo (USB)
+
+
+Matrix-Vision
+~~~~~~~~~~~~~
+
+Models:
+
+- MATRIX-Vision MV-Delta
+- MATRIX-Vision MV-Delta 2
+- MVsigma-SLC (Bt848)
+
+Conceptronic (.net)
+~~~~~~~~~~~~~~~~~~~
+
+Models:
+
+- TVCON FM, TV card w/ FM = CPH05x
+- TVCON = CPH06x
+
+BestData
+~~~~~~~~
+
+Models:
+
+- HCC100 = VCC100rev1 + camera
+- VCC100 rev1 (bt848)
+- VCC100 rev2 (bt878)
+
+Gallant (www.gallantcom.com) www.minton.com.tw
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Models:
+
+- Intervision IV-510 (capture only bt8x8)
+- Intervision IV-550 (bt8x8)
+- Intervision IV-100 (zoran)
+- Intervision IV-1000 (bt8x8)
+
+Asonic (www.asonic.com.cn) (website down)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+SkyEye tv 878
+
+Hoontech
+~~~~~~~~
+
+878TV/FM
+
+Teppro (www.itcteppro.com.tw)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Models:
+
+- ITC PCITV (Card Ver 1.0) "Teppro TV1/TVFM1 Card"
+- ITC PCITV (Card Ver 2.0)
+- ITC PCITV (Card Ver 3.0) = "PV-BT878P+ (REV.9D)"
+- ITC PCITV (Card Ver 4.0)
+- TEPPRO IV-550 (For BT848 Main Chip)
+- ITC DSTTV (bt878, satellite)
+- ITC VideoMaker (saa7146, StreamMachine sm2110, tvtuner) "PV-SM2210P+ (REV:1C)"
+
+Kworld (www.kworld.com.tw)
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+PC TV Station:
+
+- KWORLD KW-TV878R TV (no radio)
+- KWORLD KW-TV878RF TV (w/ radio)
+- KWORLD KW-TVL878RF (low profile)
+- KWORLD KW-TV713XRF (saa7134)
+
+
+ MPEG TV Station (same cards as above plus WinDVR Software MPEG en/decoder)
+
+- KWORLD KW-TV878R -Pro TV (no Radio)
+- KWORLD KW-TV878RF-Pro TV (w/ Radio)
+- KWORLD KW-TV878R -Ultra TV (no Radio)
+- KWORLD KW-TV878RF-Ultra TV (w/ Radio)
+
+JTT/ Justy Corp.(http://www.jtt.ne.jp/)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+JTT-02 (JTT TV) "TV watchmate pro" (bt848)
+
+ADS www.adstech.com
+~~~~~~~~~~~~~~~~~~~
+
+Models:
+
+- Channel Surfer TV ( CHX-950 )
+- Channel Surfer TV+FM ( CHX-960FM )
+
+AVEC www.prochips.com
+~~~~~~~~~~~~~~~~~~~~~
+
+AVEC Intercapture (bt848, tea6320)
+
+NoBrand
+~~~~~~~
+
+TV Excel = Australian Name for "PV-BT878P+ 8E" or "878TV Rev.3\_"
+
+Mach www.machspeed.com
+~~~~~~~~~~~~~~~~~~~~~~
+
+Mach TV 878
+
+Eline www.eline-net.com/
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Models:
+
+- Eline Vision TVMaster / TVMaster FM (ELV-TVM/ ELV-TVM-FM) = LR26 (bt878)
+- Eline Vision TVMaster-2000 (ELV-TVM-2000, ELV-TVM-2000-FM)= LR138 (saa713x)
+
+Spirit
+~~~~~~
+
+- Spirit TV Tuner/Video Capture Card (bt848)
+
+Boser www.boser.com.tw
+~~~~~~~~~~~~~~~~~~~~~~
+
+Models:
+
+- HS-878 Mini PCI Capture Add-on Card
+- HS-879 Mini PCI 3D Audio and Capture Add-on Card (w/ ES1938 Solo-1)
+
+Satelco www.citycom-gmbh.de, www.satelco.de
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Models:
+
+- TV-FM =KNC1 saa7134
+- Standard PCI (DVB-S) = Technotrend Budget
+- Standard PCI (DVB-S) w/ CI
+- Satelco Highend PCI (DVB-S) = Technotrend Premium
+
+
+Sensoray www.sensoray.com
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Models:
+
+- Sensoray 311 (PC/104 bus)
+- Sensoray 611 (PCI)
+
+CEI (Chartered Electronics Industries Pte Ltd [CEI] [FCC ID HBY])
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Models:
+
+- TV Tuner - HBY-33A-RAFFLES Brooktree Bt848KPF + Philips
+- TV Tuner MG9910 - HBY33A-TVO CEI + Philips SAA7110 + OKI M548262 + ST STV8438CV
+- Primetime TV (ISA)
+
+ - acquired by Singapore Technologies
+ - now operating as Chartered Semiconductor Manufacturing
+ - Manufacturer of video cards is listed as:
+
+ - Cogent Electronics Industries [CEI]
+
+AITech
+~~~~~~
+
+Models:
+
+- Wavewatcher TV (ISA)
+- AITech WaveWatcher TV-PCI = can be LR26 (Bt848) or LR50 (BT878)
+- WaveWatcher TVR-202 TV/FM Radio Card (ISA)
+
+MAXRON
+~~~~~~
+
+Maxron MaxTV/FM Radio (KW-TV878-FNT) = Kworld or JW-TV878-FBK
+
+www.ids-imaging.de
+~~~~~~~~~~~~~~~~~~
+
+Models:
+
+- Falcon Series (capture only)
+
+In USA: http://www.theimagingsource.com/
+- DFG/LC1
+
+www.sknet-web.co.jp
+~~~~~~~~~~~~~~~~~~~
+
+SKnet Monster TV (saa7134)
+
+A-Max www.amaxhk.com (Colormax, Amax, Napa)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+APAC Viewcomp 878
+
+Cybertainment
+~~~~~~~~~~~~~
+
+Models:
+
+- CyberMail AV Video Email Kit w/ PCI Capture Card (capture only)
+- CyberMail Xtreme
+
+These are Flyvideo
+
+VCR (http://www.vcrinc.com/)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Video Catcher 16
+
+Twinhan
+~~~~~~~
+
+Models:
+
+- DST Card/DST-IP (bt878, twinhan asic) VP-1020
+ - Sold as:
+
+ - KWorld DVBS Satellite TV-Card
+ - Powercolor DSTV Satellite Tuner Card
+ - Prolink Pixelview DTV2000
+ - Provideo PV-911 Digital Satellite TV Tuner Card With Common Interface ?
+
+- DST-CI Card (DVB Satellite) VP-1030
+- DCT Card (DVB cable)
+
+MSI
+~~~
+
+Models:
+
+- MSI TV@nywhere Tuner Card (MS-8876) (CX23881/883) Not Bt878 compatible.
+- MS-8401 DVB-S
+
+Focus www.focusinfo.com
+~~~~~~~~~~~~~~~~~~~~~~~
+
+InVideo PCI (bt878)
+
+Sdisilk www.sdisilk.com/
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Models:
+
+- SDI Silk 100
+- SDI Silk 200 SDI Input Card
+
+www.euresys.com
+~~~~~~~~~~~~~~~
+
+PICOLO series
+
+PMC/Pace
+~~~~~~~~
+
+www.pacecom.co.uk website closed
+
+Mercury www.kobian.com (UK and FR)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Models:
+
+- LR50
+- LR138RBG-Rx == LR138
+
+TEC sound
+~~~~~~~~~
+
+TV-Mate = Zoltrix VP-8482
+
+Though educated googling found: www.techmakers.com
+
+(package and manuals don't have any other manufacturer info) TecSound
+
+Lorenzen www.lorenzen.de
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+SL DVB-S PCI = Technotrend Budget PCI (su1278 or bsru version)
+
+Origo (.uk) www.origo2000.com
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+PC TV Card = LR50
+
+I/O Magic www.iomagic.com
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+PC PVR - Desktop TV Personal Video Recorder DR-PCTV100 = Pinnacle ROB2D-51009464 4.0 + Cyberlink PowerVCR II
+
+Arowana
+~~~~~~~
+
+TV-Karte / Poso Power TV (?) = Zoltrix VP-8482 (?)
+
+iTVC15 boards
+~~~~~~~~~~~~~
+
+kuroutoshikou.com ITVC15
+yuan.com MPG160 PCI TV (Internal PCI MPEG2 encoder card plus TV-tuner)
+
+Asus www.asuscom.com
+~~~~~~~~~~~~~~~~~~~~
+
+Models:
+
+- Asus TV Tuner Card 880 NTSC (low profile, cx23880)
+- Asus TV (saa7134)
+
+Hoontech
+~~~~~~~~
+
+http://www.hoontech.de/
+
+- HART Vision 848 (H-ART Vision 848)
+- HART Vision 878 (H-Art Vision 878)
+
+
+
+Chips used at bttv devices
+--------------------------
+
+- all boards:
+
+ - Brooktree Bt848/848A/849/878/879: video capture chip
+
+- Board specific
+
+ - Miro PCTV:
+
+ - Philips or Temic Tuner
+
+ - Hauppauge Win/TV pci (version 405):
+
+ - Microchip 24LC02B or Philips 8582E2Y:
+
+ - 256 Byte EEPROM with configuration information
+ - I2C 0xa0-0xa1, (24LC02B also responds to 0xa2-0xaf)
+
+ - Philips SAA5246AGP/E: Videotext decoder chip, I2C 0x22-0x23
+
+ - TDA9800: sound decoder
+
+ - Winbond W24257AS-35: 32Kx8 CMOS static RAM (Videotext buffer mem)
+
+ - 14052B: analog switch for selection of sound source
+
+- PAL:
+
+ - TDA5737: VHF, hyperband and UHF mixer/oscillator for TV and VCR 3-band tuners
+ - TSA5522: 1.4 GHz I2C-bus controlled synthesizer, I2C 0xc2-0xc3
+
+- NTSC:
+
+ - TDA5731: VHF, hyperband and UHF mixer/oscillator for TV and VCR 3-band tuners
+ - TSA5518: no datasheet available on Philips site
+
+- STB TV pci:
+
+ - ???
+ - if you want better support for STB cards send me info!
+ Look at the board! What chips are on it?
+
+
+
+
+Specs
+-----
+
+Philips http://www.Semiconductors.COM/pip/
+
+Conexant http://www.conexant.com/
+
+Micronas http://www.micronas.com/en/home/index.html
+
+Thanks
+------
+
+Many thanks to:
+
+- Markus Schroeder <schroedm@uni-duesseldorf.de> for information on the Bt848
+ and tuner programming and his control program xtvc.
+
+- Martin Buck <martin-2.buck@student.uni-ulm.de> for his great Videotext
+ package.
+
+- Gerd Hoffmann for the MSP3400 support and the modular
+ I2C, tuner, ... support.
+
+
+- MATRIX Vision for giving us 2 cards for free, which made support of
+ single crystal operation possible.
+
+- MIRO for providing a free PCTV card and detailed information about the
+ components on their cards. (E.g. how the tuner type is detected)
+ Without their card I could not have debugged the NTSC mode.
+
+- Hauppauge for telling how the sound input is selected and what components
+ they do and will use on their radio cards.
+ Also many thanks for faxing me the FM1216 data sheet.
+
+Contributors
+------------
+
+Michael Chu <mmchu@pobox.com>
+ AverMedia fix and more flexible card recognition
+
+Alan Cox <alan@lxorguk.ukuu.org.uk>
+ Video4Linux interface and 2.1.x kernel adaptation
+
+Chris Kleitsch
+ Hardware I2C
+
+Gerd Hoffmann
+ Radio card (ITT sound processor)
+
+bigfoot <bigfoot@net-way.net>
+
+Ragnar Hojland Espinosa <ragnar@macula.net>
+ ConferenceTV card
+
+
++ many more (please mail me if you are missing in this list and would
+ like to be mentioned)
diff --git a/Documentation/admin-guide/media/building.rst b/Documentation/admin-guide/media/building.rst
new file mode 100644
index 000000000000..a06473429916
--- /dev/null
+++ b/Documentation/admin-guide/media/building.rst
@@ -0,0 +1,357 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+Building support for a media device
+===================================
+
+The first step is to download the Kernel's source code, either via a
+distribution-specific source file or via the Kernel's main git tree\ [1]_.
+
+Please notice, however, that, if:
+
+- you're a braveheart and want to experiment with new stuff;
+- if you want to report a bug;
+- if you're developing new patches
+
+you should use the main media development tree ``master`` branch:
+
+ https://git.linuxtv.org/media_tree.git/
+
+In this case, you may find some useful information at the
+`LinuxTv wiki pages <https://linuxtv.org/wiki>`_:
+
+ https://linuxtv.org/wiki/index.php/How_to_Obtain,_Build_and_Install_V4L-DVB_Device_Drivers
+
+.. [1] The upstream Linux Kernel development tree is located at
+
+ https://git.kernel.org/pub/scm/li nux/kernel/git/torvalds/linux.git/
+
+Configuring the Linux Kernel
+============================
+
+You can access a menu of Kernel building options with::
+
+ $ make menuconfig
+
+Then, select all desired options and exit it, saving the configuration.
+
+The changed configuration will be at the ``.config`` file. It would
+look like::
+
+ ...
+ # CONFIG_RC_CORE is not set
+ # CONFIG_CEC_CORE is not set
+ CONFIG_MEDIA_SUPPORT=m
+ CONFIG_MEDIA_SUPPORT_FILTER=y
+ ...
+
+The media subsystem is controlled by those menu configuration options::
+
+ Device Drivers --->
+ <M> Remote Controller support --->
+ [ ] HDMI CEC RC integration
+ [ ] Enable CEC error injection support
+ [*] HDMI CEC drivers --->
+ <*> Multimedia support --->
+
+The ``Remote Controller support`` option enables the core support for
+remote controllers\ [2]_.
+
+The ``HDMI CEC RC integration`` option enables integration of HDMI CEC
+with Linux, allowing to receive data via HDMI CEC as if it were produced
+by a remote controller directly connected to the machine.
+
+The ``HDMI CEC drivers`` option allow selecting platform and USB drivers
+that receives and/or transmits CEC codes via HDMI interfaces\ [3]_.
+
+The last option (``Multimedia support``) enables support for cameras,
+audio/video grabbers and TV.
+
+The media subsystem support can either be built together with the main
+Kernel or as a module. For most use cases, it is preferred to have it
+built as modules.
+
+.. note::
+
+ Instead of using a menu, the Kernel provides a script with allows
+ enabling configuration options directly. To enable media support
+ and remote controller support using Kernel modules, you could use::
+
+ $ scripts/config -m RC_CORE
+ $ scripts/config -m MEDIA_SUPPORT
+
+.. [2] ``Remote Controller support`` should also be enabled if you
+ want to use some TV card drivers that may depend on the remote
+ controller core support.
+
+.. [3] Please notice that the DRM subsystem also have drivers for GPUs
+ that use the media HDMI CEC support.
+
+ Those GPU-specific drivers are selected via the ``Graphics support``
+ menu, under ``Device Drivers``.
+
+ When a GPU driver supports HDMI CEC, it will automatically
+ enable the CEC core support at the media subsystem.
+
+Media dependencies
+------------------
+
+It should be noticed that enabling the above from a clean config is
+usually not enough. The media subsystem depends on several other Linux
+core support in order to work.
+
+For example, most media devices use a serial communication bus in
+order to talk with some peripherals. Such bus is called I²C
+(Inter-Integrated Circuit). In order to be able to build support
+for such hardware, the I²C bus support should be enabled, either via
+menu or with::
+
+ ./scripts/config -m I2C
+
+Another example: the remote controller core requires support for
+input devices, with can be enabled with::
+
+ ./scripts/config -m INPUT
+
+Other core functionality may also be needed (like PCI and/or USB support),
+depending on the specific driver(s) you would like to enable.
+
+Enabling Remote Controller Support
+----------------------------------
+
+The remote controller menu allows selecting drivers for specific devices.
+It's menu looks like this::
+
+ --- Remote Controller support
+ <M> Compile Remote Controller keymap modules
+ [*] LIRC user interface
+ [*] Support for eBPF programs attached to lirc devices
+ [*] Remote controller decoders --->
+ [*] Remote Controller devices --->
+
+The ``Compile Remote Controller keymap modules`` option creates key maps for
+several popular remote controllers.
+
+The ``LIRC user interface`` option adds enhanced functionality when using the
+``lirc`` program, by enabling an API that allows userspace to receive raw data
+from remote controllers.
+
+The ``Support for eBPF programs attached to lirc devices`` option allows
+the usage of special programs (called eBPF) that would allow applications
+to add extra remote controller decoding functionality to the Linux Kernel.
+
+The ``Remote controller decoders`` option allows selecting the
+protocols that will be recognized by the Linux Kernel. Except if you
+want to disable some specific decoder, it is suggested to keep all
+sub-options enabled.
+
+The ``Remote Controller devices`` allows you to select the drivers
+that would be needed to support your device.
+
+The same configuration can also be set via the ``script/config``
+script. So, for instance, in order to support the ITE remote controller
+driver (found on Intel NUCs and on some ASUS x86 desktops), you could do::
+
+ $ scripts/config -e INPUT
+ $ scripts/config -e ACPI
+ $ scripts/config -e MODULES
+ $ scripts/config -m RC_CORE
+ $ scripts/config -e RC_DEVICES
+ $ scripts/config -e RC_DECODERS
+ $ scripts/config -m IR_RC5_DECODER
+ $ scripts/config -m IR_ITE_CIR
+
+Enabling HDMI CEC Support
+-------------------------
+
+The HDMI CEC support is set automatically when a driver requires it. So,
+all you need to do is to enable support either for a graphics card
+that needs it or by one of the existing HDMI drivers.
+
+The HDMI-specific drivers are available at the ``HDMI CEC drivers``
+menu\ [4]_::
+
+ --- HDMI CEC drivers
+ < > ChromeOS EC CEC driver
+ < > Amlogic Meson AO CEC driver
+ < > Amlogic Meson G12A AO CEC driver
+ < > Generic GPIO-based CEC driver
+ < > Samsung S5P CEC driver
+ < > STMicroelectronics STiH4xx HDMI CEC driver
+ < > STMicroelectronics STM32 HDMI CEC driver
+ < > Tegra HDMI CEC driver
+ < > SECO Boards HDMI CEC driver
+ [ ] SECO Boards IR RC5 support
+ < > Pulse Eight HDMI CEC
+ < > RainShadow Tech HDMI CEC
+
+.. [4] The above contents is just an example. The actual options for
+ HDMI devices depends on the system's architecture and may vary
+ on new Kernels.
+
+Enabling Media Support
+----------------------
+
+The Media menu has a lot more options than the remote controller menu.
+Once selected, you should see the following options::
+
+ --- Media support
+ [ ] Filter media drivers
+ [*] Autoselect ancillary drivers
+ Media device types --->
+ Media core support --->
+ Video4Linux options --->
+ Media controller options --->
+ Digital TV options --->
+ HDMI CEC options --->
+ Media drivers --->
+ Media ancillary drivers --->
+
+Except if you know exactly what you're doing, or if you want to build
+a driver for a SoC platform, it is strongly recommended to keep the
+``Autoselect ancillary drivers`` option turned on, as it will auto-select
+the needed I²C ancillary drivers.
+
+There are now two ways to select media device drivers, as described
+below.
+
+``Filter media drivers`` menu
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This menu is meant to easy setup for PC and Laptop hardware. It works
+by letting the user to specify what kind of media drivers are desired,
+with those options::
+
+ [ ] Cameras and video grabbers
+ [ ] Analog TV
+ [ ] Digital TV
+ [ ] AM/FM radio receivers/transmitters
+ [ ] Software defined radio
+ [ ] Platform-specific devices
+ [ ] Test drivers
+
+So, if you want to add support to a camera or video grabber only,
+select just the first option. Multiple options are allowed.
+
+Once the options on this menu are selected, the building system will
+auto-select the needed core drivers in order to support the selected
+functionality.
+
+.. note::
+
+ Most TV cards are hybrid: they support both Analog TV and Digital TV.
+
+ If you have an hybrid card, you may need to enable both ``Analog TV``
+ and ``Digital TV`` at the menu.
+
+When using this option, the defaults for the media support core
+functionality are usually good enough to provide the basic functionality
+for the driver. Yet, you could manually enable some desired extra (optional)
+functionality using the settings under each of the following
+``Media support`` sub-menus::
+
+ Media core support --->
+ Video4Linux options --->
+ Media controller options --->
+ Digital TV options --->
+ HDMI CEC options --->
+
+Once you select the desired filters, the drivers that matches the filtering
+criteria will be available at the ``Media support->Media drivers`` sub-menu.
+
+``Media Core Support`` menu without filtering
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If you disable the ``Filter media drivers`` menu, all drivers available
+for your system whose dependencies are met should be shown at the
+``Media drivers`` menu.
+
+Please notice, however, that you should first ensure that the
+``Media Core Support`` menu has all the core functionality your drivers
+would need, as otherwise the corresponding device drivers won't be shown.
+
+Example
+-------
+
+In order to enable modular support for one of the boards listed on
+:doc:`this table <cx231xx-cardlist>`, with modular media core modules, the
+``.config`` file should contain those lines::
+
+ CONFIG_MODULES=y
+ CONFIG_USB=y
+ CONFIG_I2C=y
+ CONFIG_INPUT=y
+ CONFIG_RC_CORE=m
+ CONFIG_MEDIA_SUPPORT=m
+ CONFIG_MEDIA_SUPPORT_FILTER=y
+ CONFIG_MEDIA_ANALOG_TV_SUPPORT=y
+ CONFIG_MEDIA_DIGITAL_TV_SUPPORT=y
+ CONFIG_MEDIA_USB_SUPPORT=y
+ CONFIG_VIDEO_CX231XX=y
+ CONFIG_VIDEO_CX231XX_DVB=y
+
+Building and installing a new Kernel
+====================================
+
+Once the ``.config`` file has everything needed, all it takes to build
+is to run the ``make`` command::
+
+ $ make
+
+And then install the new Kernel and its modules::
+
+ $ sudo make modules_install
+ $ sudo make install
+
+Building just the new media drivers and core
+============================================
+
+Running a new development Kernel from the development tree is usually risky,
+because it may have experimental changes that may have bugs. So, there are
+some ways to build just the new drivers, using alternative trees.
+
+There is the `Linux Kernel backports project
+<https://backports.wiki.kernel.org/index.php/Main_Page>`_, with contains
+newer drivers meant to be compiled against stable Kernels.
+
+The LinuxTV developers, with are responsible for maintaining the media
+subsystem also maintains a backport tree, with just the media drivers
+daily updated from the newest kernel. Such tree is available at:
+
+https://git.linuxtv.org/media_build.git/
+
+It should be noticed that, while it should be relatively safe to use the
+``media_build`` tree for testing purposes, there are not warranties that
+it would work (or even build) on a random Kernel. This tree is maintained
+using a "best-efforts" principle, as time permits us to fix issues there.
+
+If you notice anything wrong on it, feel free to submit patches at the
+Linux media subsystem's mailing list: media@vger.kernel.org. Please
+add ``[PATCH media-build]`` at the e-mail's subject if you submit a new
+patch for the media-build.
+
+Before using it, you should run::
+
+ $ ./build
+
+.. note::
+
+ 1) you may need to run it twice if the ``media-build`` tree gets
+ updated;
+ 2) you may need to do a ``make distclean`` if you had built it
+ in the past for a different Kernel version than the one you're
+ currently using;
+ 3) by default, it will use the same config options for media as
+ the ones defined on the Kernel you're running.
+
+In order to select different drivers or different config options,
+use::
+
+ $ make menuconfig
+
+Then, you can build and install the new drivers::
+
+ $ make && sudo make install
+
+This will override the previous media drivers that your Kernel were
+using.
diff --git a/Documentation/admin-guide/media/cafe_ccic.rst b/Documentation/admin-guide/media/cafe_ccic.rst
new file mode 100644
index 000000000000..ff7fbce1342a
--- /dev/null
+++ b/Documentation/admin-guide/media/cafe_ccic.rst
@@ -0,0 +1,62 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+The cafe_ccic driver
+====================
+
+Author: Jonathan Corbet <corbet@lwn.net>
+
+Introduction
+------------
+
+"cafe_ccic" is a driver for the Marvell 88ALP01 "cafe" CMOS camera
+controller. This is the controller found in first-generation OLPC systems,
+and this driver was written with support from the OLPC project.
+
+Current status: the core driver works. It can generate data in YUV422,
+RGB565, and RGB444 formats. (Anybody looking at the code will see RGB32 as
+well, but that is a debugging aid which will be removed shortly). VGA and
+QVGA modes work; CIF is there but the colors remain funky. Only the OV7670
+sensor is known to work with this controller at this time.
+
+To try it out: either of these commands will work:
+
+.. code-block:: none
+
+ $ mplayer tv:// -tv driver=v4l2:width=640:height=480 -nosound
+ $ mplayer tv:// -tv driver=v4l2:width=640:height=480:outfmt=bgr16 -nosound
+
+The "xawtv" utility also works; gqcam does not, for unknown reasons.
+
+Load time options
+-----------------
+
+There are a few load-time options, most of which can be changed after
+loading via sysfs as well:
+
+ - alloc_bufs_at_load: Normally, the driver will not allocate any DMA
+ buffers until the time comes to transfer data. If this option is set,
+ then worst-case-sized buffers will be allocated at module load time.
+ This option nails down the memory for the life of the module, but
+ perhaps decreases the chances of an allocation failure later on.
+
+ - dma_buf_size: The size of DMA buffers to allocate. Note that this
+ option is only consulted for load-time allocation; when buffers are
+ allocated at run time, they will be sized appropriately for the current
+ camera settings.
+
+ - n_dma_bufs: The controller can cycle through either two or three DMA
+ buffers. Normally, the driver tries to use three buffers; on faster
+ systems, however, it will work well with only two.
+
+ - min_buffers: The minimum number of streaming I/O buffers that the driver
+ will consent to work with. Default is one, but, on slower systems,
+ better behavior with mplayer can be achieved by setting to a higher
+ value (like six).
+
+ - max_buffers: The maximum number of streaming I/O buffers; default is
+ ten. That number was carefully picked out of a hat and should not be
+ assumed to actually mean much of anything.
+
+ - flip: If this boolean parameter is set, the sensor will be instructed to
+ invert the video image. Whether it makes sense is determined by how
+ your particular camera is mounted.
diff --git a/Documentation/admin-guide/media/cardlist.rst b/Documentation/admin-guide/media/cardlist.rst
new file mode 100644
index 000000000000..5b38bfd6a19d
--- /dev/null
+++ b/Documentation/admin-guide/media/cardlist.rst
@@ -0,0 +1,29 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========
+Cards List
+==========
+
+The media subsystem provide support for lots of PCI and USB drivers, plus
+platform-specific drivers. It also contains several ancillary I²C drivers.
+
+The platform-specific drivers are usually present on embedded systems,
+or are supported by the main board. Usually, setting them is done via
+OpenFirmware or ACPI.
+
+The PCI and USB drivers, however, are independent of the system's board,
+and may be added/removed by the user.
+
+You may also take a look at
+https://linuxtv.org/wiki/index.php/Hardware_Device_Information
+for more details about supported cards.
+
+.. toctree::
+ :maxdepth: 2
+
+ usb-cardlist
+ pci-cardlist
+ platform-cardlist
+ radio-cardlist
+ i2c-cardlist
+ misc-cardlist
diff --git a/Documentation/admin-guide/media/cec.rst b/Documentation/admin-guide/media/cec.rst
new file mode 100644
index 000000000000..6b30e355cf23
--- /dev/null
+++ b/Documentation/admin-guide/media/cec.rst
@@ -0,0 +1,380 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========
+HDMI CEC
+========
+
+Supported hardware in mainline
+==============================
+
+HDMI Transmitters:
+
+- Exynos4
+- Exynos5
+- STIH4xx HDMI CEC
+- V4L2 adv7511 (same HW, but a different driver from the drm adv7511)
+- stm32
+- Allwinner A10 (sun4i)
+- Raspberry Pi
+- dw-hdmi (Synopsis IP)
+- amlogic (meson ao-cec and ao-cec-g12a)
+- drm adv7511/adv7533
+- omap4
+- tegra
+- rk3288, rk3399
+- tda998x
+- DisplayPort CEC-Tunneling-over-AUX on i915, nouveau and amdgpu
+- ChromeOS EC CEC
+- CEC for SECO boards (UDOO x86).
+- Chrontel CH7322
+
+
+HDMI Receivers:
+
+- adv7604/11/12
+- adv7842
+- tc358743
+
+USB Dongles (see below for additional information on how to use these
+dongles):
+
+- Pulse-Eight: the pulse8-cec driver implements the following module option:
+ ``persistent_config``: by default this is off, but when set to 1 the driver
+ will store the current settings to the device's internal eeprom and restore
+ it the next time the device is connected to the USB port.
+- RainShadow Tech. Note: this driver does not support the persistent_config
+ module option of the Pulse-Eight driver. The hardware supports it, but I
+ have no plans to add this feature. But I accept patches :-)
+
+Miscellaneous:
+
+- vivid: emulates a CEC receiver and CEC transmitter.
+ Can be used to test CEC applications without actual CEC hardware.
+
+- cec-gpio. If the CEC pin is hooked up to a GPIO pin then
+ you can control the CEC line through this driver. This supports error
+ injection as well.
+
+- cec-gpio and Allwinner A10 (or any other driver that uses the CEC pin
+ framework to drive the CEC pin directly): the CEC pin framework uses
+ high-resolution timers. These timers are affected by NTP daemons that
+ speed up or slow down the clock to sync with the official time. The
+ chronyd server will by default increase or decrease the clock by
+ 1/12th. This will cause the CEC timings to go out of spec. To fix this,
+ add a 'maxslewrate 40000' line to chronyd.conf. This limits the clock
+ frequency change to 1/25th, which keeps the CEC timings within spec.
+
+
+Utilities
+=========
+
+Utilities are available here: https://git.linuxtv.org/v4l-utils.git
+
+``utils/cec-ctl``: control a CEC device
+
+``utils/cec-compliance``: test compliance of a remote CEC device
+
+``utils/cec-follower``: emulate a CEC follower device
+
+Note that ``cec-ctl`` has support for the CEC Hospitality Profile as is
+used in some hotel displays. See http://www.htng.org.
+
+Note that the libcec library (https://github.com/Pulse-Eight/libcec) supports
+the linux CEC framework.
+
+If you want to get the CEC specification, then look at the References of
+the HDMI wikipedia page: https://en.wikipedia.org/wiki/HDMI. CEC is part
+of the HDMI specification. HDMI 1.3 is freely available (very similar to
+HDMI 1.4 w.r.t. CEC) and should be good enough for most things.
+
+
+DisplayPort to HDMI Adapters with working CEC
+=============================================
+
+Background: most adapters do not support the CEC Tunneling feature,
+and of those that do many did not actually connect the CEC pin.
+Unfortunately, this means that while a CEC device is created, it
+is actually all alone in the world and will never be able to see other
+CEC devices.
+
+This is a list of known working adapters that have CEC Tunneling AND
+that properly connected the CEC pin. If you find adapters that work
+but are not in this list, then drop me a note.
+
+To test: hook up your DP-to-HDMI adapter to a CEC capable device
+(typically a TV), then run::
+
+ cec-ctl --playback # Configure the PC as a CEC Playback device
+ cec-ctl -S # Show the CEC topology
+
+The ``cec-ctl -S`` command should show at least two CEC devices,
+ourselves and the CEC device you are connected to (i.e. typically the TV).
+
+General note: I have only seen this work with the Parade PS175, PS176 and
+PS186 chipsets and the MegaChips 2900. While MegaChips 28x0 claims CEC support,
+I have never seen it work.
+
+USB-C to HDMI
+-------------
+
+Samsung Multiport Adapter EE-PW700: https://www.samsung.com/ie/support/model/EE-PW700BBEGWW/
+
+Kramer ADC-U31C/HF: https://www.kramerav.com/product/ADC-U31C/HF
+
+Club3D CAC-2504: https://www.club-3d.com/en/detail/2449/usb_3.1_type_c_to_hdmi_2.0_uhd_4k_60hz_active_adapter/
+
+DisplayPort to HDMI
+-------------------
+
+Club3D CAC-1080: https://www.club-3d.com/en/detail/2442/displayport_1.4_to_hdmi_2.0b_hdr/
+
+CableCreation (SKU: CD0712): https://www.cablecreation.com/products/active-displayport-to-hdmi-adapter-4k-hdr
+
+HP DisplayPort to HDMI True 4k Adapter (P/N 2JA63AA): https://www.hp.com/us-en/shop/pdp/hp-displayport-to-hdmi-true-4k-adapter
+
+Mini-DisplayPort to HDMI
+------------------------
+
+Club3D CAC-1180: https://www.club-3d.com/en/detail/2443/mini_displayport_1.4_to_hdmi_2.0b_hdr/
+
+Note that passive adapters will never work, you need an active adapter.
+
+The Club3D adapters in this list are all MegaChips 2900 based. Other Club3D adapters
+are PS176 based and do NOT have the CEC pin hooked up, so only the three Club3D
+adapters above are known to work.
+
+I suspect that MegaChips 2900 based designs in general are likely to work
+whereas with the PS176 it is more hit-and-miss (mostly miss). The PS186 is
+likely to have the CEC pin hooked up, it looks like they changed the reference
+design for that chipset.
+
+
+USB CEC Dongles
+===============
+
+These dongles appear as ``/dev/ttyACMX`` devices and need the ``inputattach``
+utility to create the ``/dev/cecX`` devices. Support for the Pulse-Eight
+has been added to ``inputattach`` 1.6.0. Support for the Rainshadow Tech has
+been added to ``inputattach`` 1.6.1.
+
+You also need udev rules to automatically start systemd services::
+
+ SUBSYSTEM=="tty", KERNEL=="ttyACM[0-9]*", ATTRS{idVendor}=="2548", ATTRS{idProduct}=="1002", ACTION=="add", TAG+="systemd", ENV{SYSTEMD_WANTS}+="pulse8-cec-inputattach@%k.service"
+ SUBSYSTEM=="tty", KERNEL=="ttyACM[0-9]*", ATTRS{idVendor}=="2548", ATTRS{idProduct}=="1001", ACTION=="add", TAG+="systemd", ENV{SYSTEMD_WANTS}+="pulse8-cec-inputattach@%k.service"
+ SUBSYSTEM=="tty", KERNEL=="ttyACM[0-9]*", ATTRS{idVendor}=="04d8", ATTRS{idProduct}=="ff59", ACTION=="add", TAG+="systemd", ENV{SYSTEMD_WANTS}+="rainshadow-cec-inputattach@%k.service"
+
+and these systemd services:
+
+For Pulse-Eight make /lib/systemd/system/pulse8-cec-inputattach@.service::
+
+ [Unit]
+ Description=inputattach for pulse8-cec device on %I
+
+ [Service]
+ Type=simple
+ ExecStart=/usr/bin/inputattach --pulse8-cec /dev/%I
+
+For the RainShadow Tech make /lib/systemd/system/rainshadow-cec-inputattach@.service::
+
+ [Unit]
+ Description=inputattach for rainshadow-cec device on %I
+
+ [Service]
+ Type=simple
+ ExecStart=/usr/bin/inputattach --rainshadow-cec /dev/%I
+
+
+For proper suspend/resume support create: /lib/systemd/system/restart-cec-inputattach.service::
+
+ [Unit]
+ Description=restart inputattach for cec devices
+ After=suspend.target
+
+ [Service]
+ Type=forking
+ ExecStart=/bin/bash -c 'for d in /dev/serial/by-id/usb-Pulse-Eight*; do /usr/bin/inputattach --daemon --pulse8-cec $d; done; for d in /dev/serial/by-id/usb-RainShadow_Tech*; do /usr/bin/inputattach --daemon --rainshadow-cec $d; done'
+
+ [Install]
+ WantedBy=suspend.target
+
+And run ``systemctl enable restart-cec-inputattach``.
+
+To automatically set the physical address of the CEC device whenever the
+EDID changes, you can use ``cec-ctl`` with the ``-E`` option::
+
+ cec-ctl -E /sys/class/drm/card0-DP-1/edid
+
+This assumes the dongle is connected to the card0-DP-1 output (``xrandr`` will tell
+you which output is used) and it will poll for changes to the EDID and update
+the Physical Address whenever they occur.
+
+To automatically run this command you can use cron. Edit crontab with
+``crontab -e`` and add this line::
+
+ @reboot /usr/local/bin/cec-ctl -E /sys/class/drm/card0-DP-1/edid
+
+This only works for display drivers that expose the EDID in ``/sys/class/drm``,
+such as the i915 driver.
+
+
+CEC Without HPD
+===============
+
+Some displays when in standby mode have no HDMI Hotplug Detect signal, but
+CEC is still enabled so connected devices can send an <Image View On> CEC
+message in order to wake up such displays. Unfortunately, not all CEC
+adapters can support this. An example is the Odroid-U3 SBC that has a
+level-shifter that is powered off when the HPD signal is low, thus
+blocking the CEC pin. Even though the SoC can use CEC without a HPD,
+the level-shifter will prevent this from functioning.
+
+There is a CEC capability flag to signal this: ``CEC_CAP_NEEDS_HPD``.
+If set, then the hardware cannot wake up displays with this behavior.
+
+Note for CEC application implementers: the <Image View On> message must
+be the first message you send, don't send any other messages before.
+Certain very bad but unfortunately not uncommon CEC implementations
+get very confused if they receive anything else but this message and
+they won't wake up.
+
+When writing a driver it can be tricky to test this. There are two
+ways to do this:
+
+1) Get a Pulse-Eight USB CEC dongle, connect an HDMI cable from your
+ device to the Pulse-Eight, but do not connect the Pulse-Eight to
+ the display.
+
+ Now configure the Pulse-Eight dongle::
+
+ cec-ctl -p0.0.0.0 --tv
+
+ and start monitoring::
+
+ sudo cec-ctl -M
+
+ On the device you are testing run::
+
+ cec-ctl --playback
+
+ It should report a physical address of f.f.f.f. Now run this
+ command::
+
+ cec-ctl -t0 --image-view-on
+
+ The Pulse-Eight should see the <Image View On> message. If not,
+ then something (hardware and/or software) is preventing the CEC
+ message from going out.
+
+ To make sure you have the wiring correct just connect the
+ Pulse-Eight to a CEC-enabled display and run the same command
+ on your device: now there is a HPD, so you should see the command
+ arriving at the Pulse-Eight.
+
+2) If you have another linux device supporting CEC without HPD, then
+ you can just connect your device to that device. Yes, you can connect
+ two HDMI outputs together. You won't have a HPD (which is what we
+ want for this test), but the second device can monitor the CEC pin.
+
+ Otherwise use the same commands as in 1.
+
+If CEC messages do not come through when there is no HPD, then you
+need to figure out why. Typically it is either a hardware restriction
+or the software powers off the CEC core when the HPD goes low. The
+first cannot be corrected of course, the second will likely required
+driver changes.
+
+
+Microcontrollers & CEC
+======================
+
+We have seen some CEC implementations in displays that use a microcontroller
+to sample the bus. This does not have to be a problem, but some implementations
+have timing issues. This is hard to discover unless you can hook up a low-level
+CEC debugger (see the next section).
+
+You will see cases where the CEC transmitter holds the CEC line high or low for
+a longer time than is allowed. For directed messages this is not a problem since
+if that happens the message will not be Acked and it will be retransmitted.
+For broadcast messages no such mechanism exists.
+
+It's not clear what to do about this. It is probably wise to transmit some
+broadcast messages twice to reduce the chance of them being lost. Specifically
+<Standby> and <Active Source> are candidates for that.
+
+
+Making a CEC debugger
+=====================
+
+By using a Raspberry Pi 4B and some cheap components you can make
+your own low-level CEC debugger.
+
+The critical component is one of these HDMI female-female passthrough connectors
+(full soldering type 1):
+
+https://elabbay.myshopify.com/collections/camera/products/hdmi-af-af-v1a-hdmi-type-a-female-to-hdmi-type-a-female-pass-through-adapter-breakout-board?variant=45533926147
+
+The video quality is variable and certainly not enough to pass-through 4kp60
+(594 MHz) video. You might be able to support 4kp30, but more likely you will
+be limited to 1080p60 (148.5 MHz). But for CEC testing that is fine.
+
+You need a breadboard and some breadboard wires:
+
+http://www.dx.com/p/diy-40p-male-to-female-male-to-male-female-to-female-dupont-line-wire-3pcs-356089#.WYLOOXWGN7I
+
+If you want to monitor the HPD and/or 5V lines as well, then you need one of
+these 5V to 3.3V level shifters:
+
+https://www.adafruit.com/product/757
+
+(This is just where I got these components, there are many other places you
+can get similar things).
+
+The ground pin of the HDMI connector needs to be connected to a ground
+pin of the Raspberry Pi, of course.
+
+The CEC pin of the HDMI connector needs to be connected to these pins:
+GPIO 6 and GPIO 7. The optional HPD pin of the HDMI connector should
+be connected via the level shifter to these pins: GPIO 23 and GPIO 12.
+The optional 5V pin of the HDMI connector should be connected via the
+level shifter to these pins: GPIO 25 and GPIO 22. Monitoring the HPD and
+5V lines is not necessary, but it is helpful.
+
+This device tree addition in ``arch/arm/boot/dts/bcm2711-rpi-4-b.dts``
+will hook up the cec-gpio driver correctly::
+
+ cec@6 {
+ compatible = "cec-gpio";
+ cec-gpios = <&gpio 6 (GPIO_ACTIVE_HIGH|GPIO_OPEN_DRAIN)>;
+ hpd-gpios = <&gpio 23 GPIO_ACTIVE_HIGH>;
+ v5-gpios = <&gpio 25 GPIO_ACTIVE_HIGH>;
+ };
+
+ cec@7 {
+ compatible = "cec-gpio";
+ cec-gpios = <&gpio 7 (GPIO_ACTIVE_HIGH|GPIO_OPEN_DRAIN)>;
+ hpd-gpios = <&gpio 12 GPIO_ACTIVE_HIGH>;
+ v5-gpios = <&gpio 22 GPIO_ACTIVE_HIGH>;
+ };
+
+If you haven't hooked up the HPD and/or 5V lines, then just delete those
+lines.
+
+This dts change will enable two cec GPIO devices: I typically use one to
+send/receive CEC commands and the other to monitor. If you monitor using
+an unconfigured CEC adapter then it will use GPIO interrupts which makes
+monitoring very accurate.
+
+If you just want to monitor traffic, then a single instance is sufficient.
+The minimum configuration is one HDMI female-female passthrough connector
+and two female-female breadboard wires: one for connecting the HDMI ground
+pin to a ground pin on the Raspberry Pi, and the other to connect the HDMI
+CEC pin to GPIO 6 on the Raspberry Pi.
+
+The documentation on how to use the error injection is here: :ref:`cec_pin_error_inj`.
+
+``cec-ctl --monitor-pin`` will do low-level CEC bus sniffing and analysis.
+You can also store the CEC traffic to file using ``--store-pin`` and analyze
+it later using ``--analyze-pin``.
+
+You can also use this as a full-fledged CEC device by configuring it
+using ``cec-ctl --tv -p0.0.0.0`` or ``cec-ctl --playback -p1.0.0.0``.
diff --git a/Documentation/admin-guide/media/ci.rst b/Documentation/admin-guide/media/ci.rst
new file mode 100644
index 000000000000..ded4d8fbbf92
--- /dev/null
+++ b/Documentation/admin-guide/media/ci.rst
@@ -0,0 +1,77 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Digital TV Conditional Access Interface
+=======================================
+
+
+.. note::
+
+ This documentation is outdated.
+
+This document describes the usage of the high level CI API as
+in accordance to the Linux DVB API. This is a not a documentation for the,
+existing low level CI API.
+
+.. note::
+
+ For the Twinhan/Twinhan clones, the dst_ca module handles the CI
+ hardware handling. This module is loaded automatically if a CI
+ (Common Interface, that holds the CAM (Conditional Access Module)
+ is detected.
+
+ca_zap
+~~~~~~
+
+A userspace application, like ``ca_zap`` is required to handle encrypted
+MPEG-TS streams.
+
+The ``ca_zap`` userland application is in charge of sending the
+descrambling related information to the Conditional Access Module (CAM).
+
+This application requires the following to function properly as of now.
+
+a) Tune to a valid channel, with szap.
+
+ eg: $ szap -c channels.conf -r "TMC" -x
+
+b) a channels.conf containing a valid PMT PID
+
+ eg: TMC:11996:h:0:27500:278:512:650:321
+
+ here 278 is a valid PMT PID. the rest of the values are the
+ same ones that szap uses.
+
+c) after running a szap, you have to run ca_zap, for the
+ descrambler to function,
+
+ eg: $ ca_zap channels.conf "TMC"
+
+d) Hopefully enjoy your favourite subscribed channel as you do with
+ a FTA card.
+
+.. note::
+
+ Currently ca_zap, and dst_test, both are meant for demonstration
+ purposes only, they can become full fledged applications if necessary.
+
+
+Cards that fall in this category
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+At present the cards that fall in this category are the Twinhan and its
+clones, these cards are available as VVMER, Tomato, Hercules, Orange and
+so on.
+
+CI modules that are supported
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The CI module support is largely dependent upon the firmware on the cards
+Some cards do support almost all of the available CI modules. There is
+nothing much that can be done in order to make additional CI modules
+working with these cards.
+
+Modules that have been tested by this driver at present are
+
+(1) Irdeto 1 and 2 from SCM
+(2) Viaccess from SCM
+(3) Dragoncam
diff --git a/Documentation/admin-guide/media/cx18-cardlist.rst b/Documentation/admin-guide/media/cx18-cardlist.rst
new file mode 100644
index 000000000000..26f2da9aa542
--- /dev/null
+++ b/Documentation/admin-guide/media/cx18-cardlist.rst
@@ -0,0 +1,17 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+CX18 cards list
+===============
+
+Those cards are supported by cx18 driver:
+
+- Hauppauge HVR-1600 (ESMT memory)
+- Hauppauge HVR-1600 (Samsung memory)
+- Compro VideoMate H900
+- Yuan MPC718 MiniPCI DVB-T/Analog
+- Conexant Raptor PAL/SECAM
+- Toshiba Qosmio DVB-T/Analog
+- Leadtek WinFast PVR2100
+- Leadtek WinFast DVR3100
+- GoTView PCI DVD3 Hybrid
+- Hauppauge HVR-1600 (s5h1411/tda18271)
diff --git a/Documentation/admin-guide/media/cx231xx-cardlist.rst b/Documentation/admin-guide/media/cx231xx-cardlist.rst
new file mode 100644
index 000000000000..d374101be047
--- /dev/null
+++ b/Documentation/admin-guide/media/cx231xx-cardlist.rst
@@ -0,0 +1,99 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+cx231xx cards list
+==================
+
+.. tabularcolumns:: |p{1.4cm}|p{10.0cm}|p{6.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 2 12 19
+ :stub-columns: 0
+
+ * - Card number
+ - Card name
+ - USB IDs
+ * - 0
+ - Unknown CX231xx video grabber
+ - 0572:5A3C
+ * - 1
+ - Conexant Hybrid TV - CARRAERA
+ - 0572:58A2
+ * - 2
+ - Conexant Hybrid TV - SHELBY
+ - 0572:58A1
+ * - 3
+ - Conexant Hybrid TV - RDE253S
+ - 0572:58A4
+ * - 4
+ - Conexant Hybrid TV - RDU253S
+ - 0572:58A5
+ * - 5
+ - Conexant VIDEO GRABBER
+ - 0572:58A6, 07ca:c039
+ * - 6
+ - Conexant Hybrid TV - rde 250
+ - 0572:589E
+ * - 7
+ - Conexant Hybrid TV - RDU 250
+ - 0572:58A0
+ * - 8
+ - Hauppauge EXETER
+ - 2040:b120, 2040:b140
+ * - 9
+ - Hauppauge USB Live 2
+ - 2040:c200
+ * - 10
+ - Pixelview PlayTV USB Hybrid
+ - 4000:4001
+ * - 11
+ - Pixelview Xcapture USB
+ - 1D19:6109, 4000:4001
+ * - 12
+ - Kworld UB430 USB Hybrid
+ - 1b80:e424
+ * - 13
+ - Iconbit Analog Stick U100 FM
+ - 1f4d:0237
+ * - 14
+ - Hauppauge WinTV USB2 FM (PAL)
+ - 2040:b110
+ * - 15
+ - Hauppauge WinTV USB2 FM (NTSC)
+ - 2040:b111
+ * - 16
+ - Elgato Video Capture V2
+ - 0fd9:0037
+ * - 17
+ - Geniatech OTG102
+ - 1f4d:0102
+ * - 18
+ - Kworld UB445 USB Hybrid
+ - 1b80:e421
+ * - 19
+ - Hauppauge WinTV 930C-HD (1113xx) / HVR-900H (111xxx) / PCTV QuatroStick 521e
+ - 2040:b130, 2040:b138, 2013:0259
+ * - 20
+ - Hauppauge WinTV 930C-HD (1114xx) / HVR-901H (1114xx) / PCTV QuatroStick 522e
+ - 2040:b131, 2040:b139, 2013:025e
+ * - 21
+ - Hauppauge WinTV-HVR-955Q (111401)
+ - 2040:b123, 2040:b124
+ * - 22
+ - Terratec Grabby
+ - 1f4d:0102
+ * - 23
+ - Evromedia USB Full Hybrid Full HD
+ - 1b80:d3b2
+ * - 24
+ - Astrometa T2hybrid
+ - 15f4:0135
+ * - 25
+ - The Imaging Source DFG/USB2pro
+ - 199e:8002
+ * - 26
+ - Hauppauge WinTV-HVR-935C
+ - 2040:b151
+ * - 27
+ - Hauppauge WinTV-HVR-975
+ - 2040:b150
diff --git a/Documentation/admin-guide/media/cx23885-cardlist.rst b/Documentation/admin-guide/media/cx23885-cardlist.rst
new file mode 100644
index 000000000000..c47514fead33
--- /dev/null
+++ b/Documentation/admin-guide/media/cx23885-cardlist.rst
@@ -0,0 +1,267 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+cx23885 cards list
+==================
+
+.. tabularcolumns:: |p{1.4cm}|p{11.1cm}|p{4.2cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 2 19 18
+ :stub-columns: 0
+
+ * - Card number
+ - Card name
+ - PCI subsystem IDs
+
+ * - 0
+ - UNKNOWN/GENERIC
+ - 0070:3400
+
+ * - 1
+ - Hauppauge WinTV-HVR1800lp
+ - 0070:7600
+
+ * - 2
+ - Hauppauge WinTV-HVR1800
+ - 0070:7800, 0070:7801, 0070:7809
+
+ * - 3
+ - Hauppauge WinTV-HVR1250
+ - 0070:7911
+
+ * - 4
+ - DViCO FusionHDTV5 Express
+ - 18ac:d500
+
+ * - 5
+ - Hauppauge WinTV-HVR1500Q
+ - 0070:7790, 0070:7797
+
+ * - 6
+ - Hauppauge WinTV-HVR1500
+ - 0070:7710, 0070:7717
+
+ * - 7
+ - Hauppauge WinTV-HVR1200
+ - 0070:71d1, 0070:71d3
+
+ * - 8
+ - Hauppauge WinTV-HVR1700
+ - 0070:8101
+
+ * - 9
+ - Hauppauge WinTV-HVR1400
+ - 0070:8010
+
+ * - 10
+ - DViCO FusionHDTV7 Dual Express
+ - 18ac:d618
+
+ * - 11
+ - DViCO FusionHDTV DVB-T Dual Express
+ - 18ac:db78
+
+ * - 12
+ - Leadtek Winfast PxDVR3200 H
+ - 107d:6681
+
+ * - 13
+ - Compro VideoMate E650F
+ - 185b:e800
+
+ * - 14
+ - TurboSight TBS 6920
+ - 6920:8888
+
+ * - 15
+ - TeVii S470
+ - d470:9022
+
+ * - 16
+ - DVBWorld DVB-S2 2005
+ - 0001:2005
+
+ * - 17
+ - NetUP Dual DVB-S2 CI
+ - 1b55:2a2c
+
+ * - 18
+ - Hauppauge WinTV-HVR1270
+ - 0070:2211
+
+ * - 19
+ - Hauppauge WinTV-HVR1275
+ - 0070:2215, 0070:221d, 0070:22f2
+
+ * - 20
+ - Hauppauge WinTV-HVR1255
+ - 0070:2251, 0070:22f1
+
+ * - 21
+ - Hauppauge WinTV-HVR1210
+ - 0070:2291, 0070:2295, 0070:2299, 0070:229d, 0070:22f0, 0070:22f3, 0070:22f4, 0070:22f5
+
+ * - 22
+ - Mygica X8506 DMB-TH
+ - 14f1:8651
+
+ * - 23
+ - Magic-Pro ProHDTV Extreme 2
+ - 14f1:8657
+
+ * - 24
+ - Hauppauge WinTV-HVR1850
+ - 0070:8541
+
+ * - 25
+ - Compro VideoMate E800
+ - 1858:e800
+
+ * - 26
+ - Hauppauge WinTV-HVR1290
+ - 0070:8551
+
+ * - 27
+ - Mygica X8558 PRO DMB-TH
+ - 14f1:8578
+
+ * - 28
+ - LEADTEK WinFast PxTV1200
+ - 107d:6f22
+
+ * - 29
+ - GoTView X5 3D Hybrid
+ - 5654:2390
+
+ * - 30
+ - NetUP Dual DVB-T/C-CI RF
+ - 1b55:e2e4
+
+ * - 31
+ - Leadtek Winfast PxDVR3200 H XC4000
+ - 107d:6f39
+
+ * - 32
+ - MPX-885
+ -
+
+ * - 33
+ - Mygica X8502/X8507 ISDB-T
+ - 14f1:8502
+
+ * - 34
+ - TerraTec Cinergy T PCIe Dual
+ - 153b:117e
+
+ * - 35
+ - TeVii S471
+ - d471:9022
+
+ * - 36
+ - Hauppauge WinTV-HVR1255
+ - 0070:2259
+
+ * - 37
+ - Prof Revolution DVB-S2 8000
+ - 8000:3034
+
+ * - 38
+ - Hauppauge WinTV-HVR4400/HVR5500
+ - 0070:c108, 0070:c138, 0070:c1f8
+
+ * - 39
+ - AVerTV Hybrid Express Slim HC81R
+ - 1461:d939
+
+ * - 40
+ - TurboSight TBS 6981
+ - 6981:8888
+
+ * - 41
+ - TurboSight TBS 6980
+ - 6980:8888
+
+ * - 42
+ - Leadtek Winfast PxPVR2200
+ - 107d:6f21
+
+ * - 43
+ - Hauppauge ImpactVCB-e
+ - 0070:7133, 0070:7137
+
+ * - 44
+ - DViCO FusionHDTV DVB-T Dual Express2
+ - 18ac:db98
+
+ * - 45
+ - DVBSky T9580
+ - 4254:9580
+
+ * - 46
+ - DVBSky T980C
+ - 4254:980c
+
+ * - 47
+ - DVBSky S950C
+ - 4254:950c
+
+ * - 48
+ - Technotrend TT-budget CT2-4500 CI
+ - 13c2:3013
+
+ * - 49
+ - DVBSky S950
+ - 4254:0950
+
+ * - 50
+ - DVBSky S952
+ - 4254:0952
+
+ * - 51
+ - DVBSky T982
+ - 4254:0982
+
+ * - 52
+ - Hauppauge WinTV-HVR5525
+ - 0070:f038
+
+ * - 53
+ - Hauppauge WinTV Starburst
+ - 0070:c12a
+
+ * - 54
+ - ViewCast 260e
+ - 1576:0260
+
+ * - 55
+ - ViewCast 460e
+ - 1576:0460
+
+ * - 56
+ - Hauppauge WinTV-QuadHD-DVB
+ - 0070:6a28, 0070:6b28
+
+ * - 57
+ - Hauppauge WinTV-QuadHD-ATSC
+ - 0070:6a18, 0070:6b18
+
+ * - 58
+ - Hauppauge WinTV-HVR-1265(161111)
+ - 0070:2a18
+
+ * - 59
+ - Hauppauge WinTV-Starburst2
+ - 0070:f02a
+
+ * - 60
+ - Hauppauge WinTV-QuadHD-DVB(885)
+ -
+
+ * - 61
+ - Hauppauge WinTV-QuadHD-ATSC(885)
+ -
+
+ * - 62
+ - AVerMedia CE310B
+ - 1461:3100
diff --git a/Documentation/admin-guide/media/cx88-cardlist.rst b/Documentation/admin-guide/media/cx88-cardlist.rst
new file mode 100644
index 000000000000..76dc9a14cf91
--- /dev/null
+++ b/Documentation/admin-guide/media/cx88-cardlist.rst
@@ -0,0 +1,383 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+CX88 cards list
+===============
+
+.. tabularcolumns:: |p{1.4cm}|p{11.1cm}|p{4.2cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 2 19 18
+ :stub-columns: 0
+
+ * - Card number
+ - Card name
+ - PCI subsystem IDs
+
+ * - 0
+ - UNKNOWN/GENERIC
+ -
+
+ * - 1
+ - Hauppauge WinTV 34xxx models
+ - 0070:3400, 0070:3401
+
+ * - 2
+ - GDI Black Gold
+ - 14c7:0106, 14c7:0107
+
+ * - 3
+ - PixelView
+ - 1554:4811
+
+ * - 4
+ - ATI TV Wonder Pro
+ - 1002:00f8, 1002:00f9
+
+ * - 5
+ - Leadtek Winfast 2000XP Expert
+ - 107d:6611, 107d:6613
+
+ * - 6
+ - AverTV Studio 303 (M126)
+ - 1461:000b
+
+ * - 7
+ - MSI TV-@nywhere Master
+ - 1462:8606
+
+ * - 8
+ - Leadtek Winfast DV2000
+ - 107d:6620, 107d:6621
+
+ * - 9
+ - Leadtek PVR 2000
+ - 107d:663b, 107d:663c, 107d:6632, 107d:6630, 107d:6638, 107d:6631, 107d:6637, 107d:663d
+
+ * - 10
+ - IODATA GV-VCP3/PCI
+ - 10fc:d003
+
+ * - 11
+ - Prolink PlayTV PVR
+ -
+
+ * - 12
+ - ASUS PVR-416
+ - 1043:4823, 1461:c111
+
+ * - 13
+ - MSI TV-@nywhere
+ -
+
+ * - 14
+ - KWorld/VStream XPert DVB-T
+ - 17de:08a6
+
+ * - 15
+ - DViCO FusionHDTV DVB-T1
+ - 18ac:db00
+
+ * - 16
+ - KWorld LTV883RF
+ -
+
+ * - 17
+ - DViCO FusionHDTV 3 Gold-Q
+ - 18ac:d810, 18ac:d800
+
+ * - 18
+ - Hauppauge Nova-T DVB-T
+ - 0070:9002, 0070:9001, 0070:9000
+
+ * - 19
+ - Conexant DVB-T reference design
+ - 14f1:0187
+
+ * - 20
+ - Provideo PV259
+ - 1540:2580
+
+ * - 21
+ - DViCO FusionHDTV DVB-T Plus
+ - 18ac:db10, 18ac:db11
+
+ * - 22
+ - pcHDTV HD3000 HDTV
+ - 7063:3000
+
+ * - 23
+ - digitalnow DNTV Live! DVB-T
+ - 17de:a8a6
+
+ * - 24
+ - Hauppauge WinTV 28xxx (Roslyn) models
+ - 0070:2801
+
+ * - 25
+ - Digital-Logic MICROSPACE Entertainment Center (MEC)
+ - 14f1:0342
+
+ * - 26
+ - IODATA GV/BCTV7E
+ - 10fc:d035
+
+ * - 27
+ - PixelView PlayTV Ultra Pro (Stereo)
+ -
+
+ * - 28
+ - DViCO FusionHDTV 3 Gold-T
+ - 18ac:d820
+
+ * - 29
+ - ADS Tech Instant TV DVB-T PCI
+ - 1421:0334
+
+ * - 30
+ - TerraTec Cinergy 1400 DVB-T
+ - 153b:1166
+
+ * - 31
+ - DViCO FusionHDTV 5 Gold
+ - 18ac:d500
+
+ * - 32
+ - AverMedia UltraTV Media Center PCI 550
+ - 1461:8011
+
+ * - 33
+ - Kworld V-Stream Xpert DVD
+ -
+
+ * - 34
+ - ATI HDTV Wonder
+ - 1002:a101
+
+ * - 35
+ - WinFast DTV1000-T
+ - 107d:665f
+
+ * - 36
+ - AVerTV 303 (M126)
+ - 1461:000a
+
+ * - 37
+ - Hauppauge Nova-S-Plus DVB-S
+ - 0070:9201, 0070:9202
+
+ * - 38
+ - Hauppauge Nova-SE2 DVB-S
+ - 0070:9200
+
+ * - 39
+ - KWorld DVB-S 100
+ - 17de:08b2, 1421:0341
+
+ * - 40
+ - Hauppauge WinTV-HVR1100 DVB-T/Hybrid
+ - 0070:9400, 0070:9402
+
+ * - 41
+ - Hauppauge WinTV-HVR1100 DVB-T/Hybrid (Low Profile)
+ - 0070:9800, 0070:9802
+
+ * - 42
+ - digitalnow DNTV Live! DVB-T Pro
+ - 1822:0025, 1822:0019
+
+ * - 43
+ - KWorld/VStream XPert DVB-T with cx22702
+ - 17de:08a1, 12ab:2300
+
+ * - 44
+ - DViCO FusionHDTV DVB-T Dual Digital
+ - 18ac:db50, 18ac:db54
+
+ * - 45
+ - KWorld HardwareMpegTV XPert
+ - 17de:0840, 1421:0305
+
+ * - 46
+ - DViCO FusionHDTV DVB-T Hybrid
+ - 18ac:db40, 18ac:db44
+
+ * - 47
+ - pcHDTV HD5500 HDTV
+ - 7063:5500
+
+ * - 48
+ - Kworld MCE 200 Deluxe
+ - 17de:0841
+
+ * - 49
+ - PixelView PlayTV P7000
+ - 1554:4813
+
+ * - 50
+ - NPG Tech Real TV FM Top 10
+ - 14f1:0842
+
+ * - 51
+ - WinFast DTV2000 H
+ - 107d:665e
+
+ * - 52
+ - Geniatech DVB-S
+ - 14f1:0084
+
+ * - 53
+ - Hauppauge WinTV-HVR3000 TriMode Analog/DVB-S/DVB-T
+ - 0070:1404, 0070:1400, 0070:1401, 0070:1402
+
+ * - 54
+ - Norwood Micro TV Tuner
+ -
+
+ * - 55
+ - Shenzhen Tungsten Ages Tech TE-DTV-250 / Swann OEM
+ - c180:c980
+
+ * - 56
+ - Hauppauge WinTV-HVR1300 DVB-T/Hybrid MPEG Encoder
+ - 0070:9600, 0070:9601, 0070:9602
+
+ * - 57
+ - ADS Tech Instant Video PCI
+ - 1421:0390
+
+ * - 58
+ - Pinnacle PCTV HD 800i
+ - 11bd:0051
+
+ * - 59
+ - DViCO FusionHDTV 5 PCI nano
+ - 18ac:d530
+
+ * - 60
+ - Pinnacle Hybrid PCTV
+ - 12ab:1788
+
+ * - 61
+ - Leadtek TV2000 XP Global
+ - 107d:6f18, 107d:6618, 107d:6619
+
+ * - 62
+ - PowerColor RA330
+ - 14f1:ea3d
+
+ * - 63
+ - Geniatech X8000-MT DVBT
+ - 14f1:8852
+
+ * - 64
+ - DViCO FusionHDTV DVB-T PRO
+ - 18ac:db30
+
+ * - 65
+ - DViCO FusionHDTV 7 Gold
+ - 18ac:d610
+
+ * - 66
+ - Prolink Pixelview MPEG 8000GT
+ - 1554:4935
+
+ * - 67
+ - Kworld PlusTV HD PCI 120 (ATSC 120)
+ - 17de:08c1
+
+ * - 68
+ - Hauppauge WinTV-HVR4000 DVB-S/S2/T/Hybrid
+ - 0070:6900, 0070:6904, 0070:6902
+
+ * - 69
+ - Hauppauge WinTV-HVR4000(Lite) DVB-S/S2
+ - 0070:6905, 0070:6906
+
+ * - 70
+ - TeVii S460 DVB-S/S2
+ - d460:9022
+
+ * - 71
+ - Omicom SS4 DVB-S/S2 PCI
+ - A044:2011
+
+ * - 72
+ - TBS 8920 DVB-S/S2
+ - 8920:8888
+
+ * - 73
+ - TeVii S420 DVB-S
+ - d420:9022
+
+ * - 74
+ - Prolink Pixelview Global Extreme
+ - 1554:4976
+
+ * - 75
+ - PROF 7300 DVB-S/S2
+ - B033:3033
+
+ * - 76
+ - SATTRADE ST4200 DVB-S/S2
+ - b200:4200
+
+ * - 77
+ - TBS 8910 DVB-S
+ - 8910:8888
+
+ * - 78
+ - Prof 6200 DVB-S
+ - b022:3022
+
+ * - 79
+ - Terratec Cinergy HT PCI MKII
+ - 153b:1177
+
+ * - 80
+ - Hauppauge WinTV-IR Only
+ - 0070:9290
+
+ * - 81
+ - Leadtek WinFast DTV1800 Hybrid
+ - 107d:6654
+
+ * - 82
+ - WinFast DTV2000 H rev. J
+ - 107d:6f2b
+
+ * - 83
+ - Prof 7301 DVB-S/S2
+ - b034:3034
+
+ * - 84
+ - Samsung SMT 7020 DVB-S
+ - 18ac:dc00, 18ac:dccd
+
+ * - 85
+ - Twinhan VP-1027 DVB-S
+ - 1822:0023
+
+ * - 86
+ - TeVii S464 DVB-S/S2
+ - d464:9022
+
+ * - 87
+ - Leadtek WinFast DTV2000 H PLUS
+ - 107d:6f42
+
+ * - 88
+ - Leadtek WinFast DTV1800 H (XC4000)
+ - 107d:6f38
+
+ * - 89
+ - Leadtek TV2000 XP Global (SC4100)
+ - 107d:6f36
+
+ * - 90
+ - Leadtek TV2000 XP Global (XC4100)
+ - 107d:6f43
+
+ * - 91
+ - NotOnlyTV LV3H
+ -
diff --git a/Documentation/admin-guide/media/cx88.rst b/Documentation/admin-guide/media/cx88.rst
new file mode 100644
index 000000000000..e4badb18199d
--- /dev/null
+++ b/Documentation/admin-guide/media/cx88.rst
@@ -0,0 +1,58 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+The cx88 driver
+===============
+
+Author: Gerd Hoffmann
+
+This is a v4l2 device driver for the cx2388x chip.
+
+
+Current status
+--------------
+
+video
+ - Works.
+ - Overlay isn't supported.
+
+audio
+ - Works. The TV standard detection is made by the driver, as the
+ hardware has bugs to auto-detect.
+ - audio data dma (i.e. recording without loopback cable to the
+ sound card) is supported via cx88-alsa.
+
+vbi
+ - Works.
+
+
+How to add support for new cards
+--------------------------------
+
+The driver needs some config info for the TV cards. This stuff is in
+cx88-cards.c. If the driver doesn't work well you likely need a new
+entry for your card in that file. Check the kernel log (using dmesg)
+to see whenever the driver knows your card or not. There is a line
+like this one:
+
+.. code-block:: none
+
+ cx8800[0]: subsystem: 0070:3400, board: Hauppauge WinTV \
+ 34xxx models [card=1,autodetected]
+
+If your card is listed as "board: UNKNOWN/GENERIC" it is unknown to
+the driver. What to do then?
+
+1) Try upgrading to the latest snapshot, maybe it has been added
+ meanwhile.
+2) You can try to create a new entry yourself, have a look at
+ cx88-cards.c. If that worked, mail me your changes as unified
+ diff ("diff -u").
+3) Or you can mail me the config information. We need at least the
+ following information to add the card:
+
+ - the PCI Subsystem ID ("0070:3400" from the line above,
+ "lspci -v" output is fine too).
+ - the tuner type used by the card. You can try to find one by
+ trial-and-error using the tuner=<n> insmod option. If you
+ know which one the card has you can also have a look at the
+ list in CARDLIST.tuner
diff --git a/Documentation/admin-guide/media/dvb-drivers.rst b/Documentation/admin-guide/media/dvb-drivers.rst
new file mode 100644
index 000000000000..66fa4edd0606
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-drivers.rst
@@ -0,0 +1,15 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========================================
+Digital TV driver-specific documentation
+========================================
+
+.. toctree::
+ :maxdepth: 2
+
+ avermedia
+ bt8xx
+ lmedm04
+ opera-firmware
+ technisat
+ ttusb-dec
diff --git a/Documentation/admin-guide/media/dvb-usb-a800-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-a800-cardlist.rst
new file mode 100644
index 000000000000..2ec8bb8230ff
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-a800-cardlist.rst
@@ -0,0 +1,16 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-a800 cards list
+=======================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - AVerMedia AverTV DVB-T USB 2.0 (A800)
+ - 07ca:a800, 07ca:a801
diff --git a/Documentation/admin-guide/media/dvb-usb-af9005-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-af9005-cardlist.rst
new file mode 100644
index 000000000000..285160ee82e8
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-af9005-cardlist.rst
@@ -0,0 +1,20 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-af9005 cards list
+=========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - Afatech DVB-T USB1.1 stick
+ - 15a4:9020
+ * - Ansonic DVB-T USB1.1 stick
+ - 10b9:6000
+ * - TerraTec Cinergy T USB XE
+ - 0ccd:0055
diff --git a/Documentation/admin-guide/media/dvb-usb-af9015-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-af9015-cardlist.rst
new file mode 100644
index 000000000000..c557994f796a
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-af9015-cardlist.rst
@@ -0,0 +1,80 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-af9015 cards list
+=========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - AVerMedia A309
+ - 07ca:a309
+ * - AVerMedia AVerTV DVB-T Volar X
+ - 07ca:a815
+ * - Afatech AF9015 reference design
+ - 15a4:9015, 15a4:9016
+ * - AverMedia AVerTV Red HD+ (A850T)
+ - 07ca:850b
+ * - AverMedia AVerTV Volar Black HD (A850)
+ - 07ca:850a
+ * - AverMedia AVerTV Volar GPS 805 (A805)
+ - 07ca:a805
+ * - AverMedia AVerTV Volar M (A815Mac)
+ - 07ca:815a
+ * - Conceptronic USB2.0 DVB-T CTVDIGRCU V3.0
+ - 1b80:e397
+ * - DigitalNow TinyTwin
+ - 13d3:3226
+ * - DigitalNow TinyTwin v2
+ - 1b80:e402
+ * - DigitalNow TinyTwin v3
+ - 1f4d:9016
+ * - Fujitsu-Siemens Slim Mobile USB DVB-T
+ - 07ca:8150
+ * - Genius TVGo DVB-T03
+ - 0458:4012
+ * - KWorld Digital MC-810
+ - 1b80:c810
+ * - KWorld PlusTV DVB-T PCI Pro Card (DVB-T PC160-T)
+ - 1b80:c161
+ * - KWorld PlusTV Dual DVB-T PCI (DVB-T PC160-2T)
+ - 1b80:c160
+ * - KWorld PlusTV Dual DVB-T Stick (DVB-T 399U)
+ - 1b80:e399, 1b80:e400
+ * - KWorld USB DVB-T Stick Mobile (UB383-T)
+ - 1b80:e383
+ * - KWorld USB DVB-T TV Stick II (VS-DVB-T 395U)
+ - 1b80:e396, 1b80:e39b, 1b80:e395, 1b80:e39a
+ * - Leadtek WinFast DTV Dongle Gold
+ - 0413:6029
+ * - Leadtek WinFast DTV2000DS
+ - 0413:6a04
+ * - MSI DIGIVOX Duo
+ - 1462:8801
+ * - MSI Digi VOX mini III
+ - 1462:8807
+ * - Pinnacle PCTV 71e
+ - 2304:022b
+ * - Sveon STV20 Tuner USB DVB-T HDTV
+ - 1b80:e39d
+ * - Sveon STV22 Dual USB DVB-T Tuner HDTV
+ - 1b80:e401
+ * - Telestar Starstick 2
+ - 10b9:8000
+ * - TerraTec Cinergy T Stick Dual RC
+ - 0ccd:0099
+ * - TerraTec Cinergy T Stick RC
+ - 0ccd:0097
+ * - TerraTec Cinergy T USB XE
+ - 0ccd:0069
+ * - TrekStor DVB-T USB Stick
+ - 15a4:901b
+ * - TwinHan AzureWave AD-TU700(704J)
+ - 13d3:3237
+ * - Xtensions XD-380
+ - 1ae7:0381
diff --git a/Documentation/admin-guide/media/dvb-usb-af9035-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-af9035-cardlist.rst
new file mode 100644
index 000000000000..63e4170777c4
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-af9035-cardlist.rst
@@ -0,0 +1,74 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-af9035 cards list
+=========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - AVerMedia AVerTV Volar HD/PRO (A835)
+ - 07ca:a835, 07ca:b835
+ * - AVerMedia HD Volar (A867)
+ - 07ca:1867, 07ca:a867, 07ca:0337
+ * - AVerMedia TD310 DVB-T2
+ - 07ca:1871
+ * - AVerMedia Twinstar (A825)
+ - 07ca:0825
+ * - Afatech AF9035 reference design
+ - 15a4:9035, 15a4:1000, 15a4:1001, 15a4:1002, 15a4:1003
+ * - Asus U3100Mini Plus
+ - 0b05:1779
+ * - Avermedia A835B(1835)
+ - 07ca:1835
+ * - Avermedia A835B(2835)
+ - 07ca:2835
+ * - Avermedia A835B(3835)
+ - 07ca:3835
+ * - Avermedia A835B(4835)
+ - 07ca:4835
+ * - Avermedia AverTV Volar HD 2 (TD110)
+ - 07ca:a110
+ * - Avermedia H335
+ - 07ca:0335
+ * - Digital Dual TV Receiver CTVDIGDUAL_V2
+ - 1b80:e410
+ * - EVOLVEO XtraTV stick
+ - 1f4d:a115
+ * - Hauppauge WinTV-MiniStick 2
+ - 2040:f900
+ * - ITE 9135 Generic
+ - 048d:9135
+ * - ITE 9135(9005) Generic
+ - 048d:9005
+ * - ITE 9135(9006) Generic
+ - 048d:9006
+ * - ITE 9303 Generic
+ - 048d:9306
+ * - Kworld UB499-2T T09
+ - 1b80:e409
+ * - Leadtek WinFast DTV Dongle Dual
+ - 0413:6a05
+ * - Logilink VG0022A
+ - 1d19:0100
+ * - PCTV AndroiDTV (78e)
+ - 2013:025a
+ * - PCTV microStick (79e)
+ - 2013:0262
+ * - Sveon STV22 Dual DVB-T HDTV
+ - 1b80:e411
+ * - TerraTec Cinergy T Stick
+ - 0ccd:0093
+ * - TerraTec Cinergy T Stick (rev. 2)
+ - 0ccd:00aa
+ * - TerraTec Cinergy T Stick Dual RC (rev. 2)
+ - 0ccd:0099
+ * - TerraTec Cinergy TC2 Stick
+ - 0ccd:10b2
+ * - TerraTec T1
+ - 0ccd:10ae
diff --git a/Documentation/admin-guide/media/dvb-usb-anysee-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-anysee-cardlist.rst
new file mode 100644
index 000000000000..1fb5d22a00dc
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-anysee-cardlist.rst
@@ -0,0 +1,16 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-anysee cards list
+=========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - Anysee
+ - 04b4:861f, 1c73:861f
diff --git a/Documentation/admin-guide/media/dvb-usb-au6610-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-au6610-cardlist.rst
new file mode 100644
index 000000000000..02b2b742710b
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-au6610-cardlist.rst
@@ -0,0 +1,16 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-au6610 cards list
+=========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - Sigmatek DVB-110
+ - 058f:6610
diff --git a/Documentation/admin-guide/media/dvb-usb-az6007-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-az6007-cardlist.rst
new file mode 100644
index 000000000000..db27eb47cc8f
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-az6007-cardlist.rst
@@ -0,0 +1,20 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-az6007 cards list
+=========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - Azurewave 6007
+ - 13d3:0ccd
+ * - Technisat CableStar Combo HD CI
+ - 14f7:0003
+ * - Terratec H7
+ - 0ccd:10b4, 0ccd:10a3
diff --git a/Documentation/admin-guide/media/dvb-usb-az6027-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-az6027-cardlist.rst
new file mode 100644
index 000000000000..6d8575e9d90c
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-az6027-cardlist.rst
@@ -0,0 +1,24 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-az6027 cards list
+=========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - AZUREWAVE DVB-S/S2 USB2.0 (AZ6027)
+ - 13d3:3275
+ * - Elgato EyeTV Sat
+ - 0fd9:002a, 0fd9:0025, 0fd9:0036
+ * - TERRATEC S7
+ - 0ccd:10a4
+ * - TERRATEC S7 MKII
+ - 0ccd:10ac
+ * - Technisat SkyStar USB 2 HD CI
+ - 14f7:0001, 14f7:0002
diff --git a/Documentation/admin-guide/media/dvb-usb-ce6230-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-ce6230-cardlist.rst
new file mode 100644
index 000000000000..09750e8ac139
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-ce6230-cardlist.rst
@@ -0,0 +1,18 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-ce6230 cards list
+=========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - AVerMedia A310 USB 2.0 DVB-T tuner
+ - 07ca:a310
+ * - Intel CE9500 reference design
+ - 8086:9500
diff --git a/Documentation/admin-guide/media/dvb-usb-cinergyT2-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-cinergyT2-cardlist.rst
new file mode 100644
index 000000000000..0ee753929eca
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-cinergyT2-cardlist.rst
@@ -0,0 +1,16 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-cinergyT2 cards list
+============================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - TerraTec/qanu USB2.0 Highspeed DVB-T Receiver
+ - 0ccd:0x0038
diff --git a/Documentation/admin-guide/media/dvb-usb-cxusb-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-cxusb-cardlist.rst
new file mode 100644
index 000000000000..a73f15d1acf5
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-cxusb-cardlist.rst
@@ -0,0 +1,40 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-cxusb cards list
+========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - AVerMedia AVerTVHD Volar (A868R)
+ -
+ * - Conexant DMB-TH Stick
+ -
+ * - DViCO FusionHDTV DVB-T Dual Digital 2
+ -
+ * - DViCO FusionHDTV DVB-T Dual Digital 4
+ -
+ * - DViCO FusionHDTV DVB-T Dual Digital 4 (rev 2)
+ -
+ * - DViCO FusionHDTV DVB-T Dual USB
+ -
+ * - DViCO FusionHDTV DVB-T NANO2
+ -
+ * - DViCO FusionHDTV DVB-T USB (LGZ201)
+ -
+ * - DViCO FusionHDTV DVB-T USB (TH7579)
+ -
+ * - DViCO FusionHDTV5 USB Gold
+ -
+ * - DigitalNow DVB-T Dual USB
+ -
+ * - Medion MD95700 (MDUSBTV-HYBRID)
+ -
+ * - Mygica D689 DMB-TH
+ -
diff --git a/Documentation/admin-guide/media/dvb-usb-dib0700-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-dib0700-cardlist.rst
new file mode 100644
index 000000000000..4b76b6f1089b
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-dib0700-cardlist.rst
@@ -0,0 +1,162 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-dib0700 cards list
+==========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - ASUS My Cinema U3000 Mini DVBT Tuner
+ - 0b05:171f
+ * - ASUS My Cinema U3100 Mini DVBT Tuner
+ - 0b05:173f
+ * - AVerMedia AVerTV DVB-T Express
+ - 07ca:b568
+ * - AVerMedia AVerTV DVB-T Volar
+ - 07ca:a807, 07ca:b808
+ * - Artec T14BR DVB-T
+ - 05d8:810f
+ * - Asus My Cinema-U3000Hybrid
+ - 0b05:1736
+ * - Compro Videomate U500
+ - 185b:1e78, 185b:1e80
+ * - DiBcom NIM7090 reference design
+ - 10b8:1bb2
+ * - DiBcom NIM8096MD reference design
+ - 10b8:1fa8
+ * - DiBcom NIM9090MD reference design
+ - 10b8:2384
+ * - DiBcom STK7070P reference design
+ - 10b8:1ebc
+ * - DiBcom STK7070PD reference design
+ - 10b8:1ebe
+ * - DiBcom STK7700D reference design
+ - 10b8:1ef0
+ * - DiBcom STK7700P reference design
+ - 10b8:1e14, 10b8:1e78
+ * - DiBcom STK7770P reference design
+ - 10b8:1e80
+ * - DiBcom STK807xP reference design
+ - 10b8:1f90
+ * - DiBcom STK807xPVR reference design
+ - 10b8:1f98
+ * - DiBcom STK8096-PVR reference design
+ - 2013:1faa, 10b8:1faa
+ * - DiBcom STK8096GP reference design
+ - 10b8:1fa0
+ * - DiBcom STK9090M reference design
+ - 10b8:2383
+ * - DiBcom TFE7090PVR reference design
+ - 10b8:1bb4
+ * - DiBcom TFE7790P reference design
+ - 10b8:1e6e
+ * - DiBcom TFE8096P reference design
+ - 10b8:1f9C
+ * - Elgato EyeTV DTT
+ - 0fd9:0021
+ * - Elgato EyeTV DTT rev. 2
+ - 0fd9:003f
+ * - Elgato EyeTV Diversity
+ - 0fd9:0011
+ * - Elgato EyeTV Dtt Dlx PD378S
+ - 0fd9:0020
+ * - EvolutePC TVWay+
+ - 1e59:0002
+ * - Gigabyte U7000
+ - 1044:7001
+ * - Gigabyte U8000-RH
+ - 1044:7002
+ * - Hama DVB=T Hybrid USB Stick
+ - 147f:2758
+ * - Hauppauge ATSC MiniCard (B200)
+ - 2040:b200
+ * - Hauppauge ATSC MiniCard (B210)
+ - 2040:b210
+ * - Hauppauge Nova-T 500 Dual DVB-T
+ - 2040:9941, 2040:9950
+ * - Hauppauge Nova-T MyTV.t
+ - 2040:7080
+ * - Hauppauge Nova-T Stick
+ - 2040:7050, 2040:7060, 2040:7070
+ * - Hauppauge Nova-TD Stick (52009)
+ - 2040:5200
+ * - Hauppauge Nova-TD Stick/Elgato Eye-TV Diversity
+ - 2040:9580
+ * - Hauppauge Nova-TD-500 (84xxx)
+ - 2040:8400
+ * - Leadtek WinFast DTV Dongle H
+ - 0413:60f6
+ * - Leadtek Winfast DTV Dongle (STK7700P based)
+ - 0413:6f00, 0413:6f01
+ * - Medion CTX1921 DVB-T USB
+ - 1660:1921
+ * - Microsoft Xbox One Digital TV Tuner
+ - 045e:02d5
+ * - PCTV 2002e
+ - 2013:025c
+ * - PCTV 2002e SE
+ - 2013:025d
+ * - Pinnacle Expresscard 320cx
+ - 2304:022e
+ * - Pinnacle PCTV 2000e
+ - 2304:022c
+ * - Pinnacle PCTV 282e
+ - 2013:0248, 2304:0248
+ * - Pinnacle PCTV 340e HD Pro USB Stick
+ - 2304:023d
+ * - Pinnacle PCTV 72e
+ - 2304:0236
+ * - Pinnacle PCTV 73A
+ - 2304:0243
+ * - Pinnacle PCTV 73e
+ - 2304:0237
+ * - Pinnacle PCTV 73e SE
+ - 2013:0245, 2304:0245
+ * - Pinnacle PCTV DVB-T Flash Stick
+ - 2304:0228
+ * - Pinnacle PCTV Dual DVB-T Diversity Stick
+ - 2304:0229
+ * - Pinnacle PCTV HD Pro USB Stick
+ - 2304:023a
+ * - Pinnacle PCTV HD USB Stick
+ - 2304:023b
+ * - Pinnacle PCTV Hybrid Stick Solo
+ - 2304:023e
+ * - Prolink Pixelview SBTVD
+ - 1554:5010
+ * - Sony PlayTV
+ - 1415:0003
+ * - TechniSat AirStar TeleStick 2
+ - 14f7:0004
+ * - Terratec Cinergy DT USB XS Diversity/ T5
+ - 0ccd:0081, 0ccd:10a1
+ * - Terratec Cinergy DT XS Diversity
+ - 0ccd:005a
+ * - Terratec Cinergy HT Express
+ - 0ccd:0060
+ * - Terratec Cinergy HT USB XE
+ - 0ccd:0058
+ * - Terratec Cinergy T Express
+ - 0ccd:0062
+ * - Terratec Cinergy T USB XXS (HD)/ T3
+ - 0ccd:0078, 0ccd:10a0, 0ccd:00ab
+ * - Uniwill STK7700P based (Hama and others)
+ - 1584:6003
+ * - YUAN High-Tech DiBcom STK7700D
+ - 1164:1e8c
+ * - YUAN High-Tech MC770
+ - 1164:0871
+ * - YUAN High-Tech STK7700D
+ - 1164:1efc
+ * - YUAN High-Tech STK7700PH
+ - 1164:1f08
+ * - Yuan EC372S
+ - 1164:1edc
+ * - Yuan PD378S
+ - 1164:2edc
diff --git a/Documentation/admin-guide/media/dvb-usb-dibusb-mb-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-dibusb-mb-cardlist.rst
new file mode 100644
index 000000000000..f25a54721f0d
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-dibusb-mb-cardlist.rst
@@ -0,0 +1,42 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-dibusb-mb cards list
+============================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - AVerMedia AverTV DVBT USB1.1
+ - 14aa:0001, 14aa:0002
+ * - Artec T1 USB1.1 TVBOX with AN2135
+ - 05d8:8105, 05d8:8106
+ * - Artec T1 USB1.1 TVBOX with AN2235
+ - 05d8:8107, 05d8:8108
+ * - Artec T1 USB1.1 TVBOX with AN2235 (faulty USB IDs)
+ - 0547:2235
+ * - Artec T1 USB2.0
+ - 05d8:8109, 05d8:810a
+ * - Compro Videomate DVB-U2000 - DVB-T USB1.1 (please confirm to linux-dvb)
+ - 185b:d000, 145f:010c, 185b:d001
+ * - DiBcom USB1.1 DVB-T reference design (MOD3000)
+ - 10b8:0bb8, 10b8:0bb9
+ * - Grandtec USB1.1 DVB-T
+ - 5032:0fa0, 5032:0bb8, 5032:0fa1, 5032:0bb9
+ * - KWorld V-Stream XPERT DTV - DVB-T USB1.1
+ - eb1a:17de, eb1a:17df
+ * - KWorld Xpert DVB-T USB2.0
+ - eb2a:17de
+ * - KWorld/ADSTech Instant DVB-T USB2.0
+ - 06e1:a333, 06e1:a334
+ * - TwinhanDTV USB-Ter USB1.1 / Magic Box I / HAMA USB1.1 DVB-T device
+ - 13d3:3201, 1822:3201, 13d3:3202, 1822:3202
+ * - Unknown USB1.1 DVB-T device ???? please report the name to the author
+ - 1025:005e, 1025:005f
+ * - VideoWalker DVB-T USB
+ - 0458:701e, 0458:701f
diff --git a/Documentation/admin-guide/media/dvb-usb-dibusb-mc-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-dibusb-mc-cardlist.rst
new file mode 100644
index 000000000000..8d03bae0e084
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-dibusb-mc-cardlist.rst
@@ -0,0 +1,30 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-dibusb-mc cards list
+============================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - Artec T1 USB2.0 TVBOX (please check the warm ID)
+ - 05d8:8109, 05d8:810a
+ * - Artec T14 - USB2.0 DVB-T
+ - 05d8:810b, 05d8:810c
+ * - DiBcom USB2.0 DVB-T reference design (MOD3000P)
+ - 10b8:0bc6, 10b8:0bc7
+ * - GRAND - USB2.0 DVB-T adapter
+ - 5032:0bc6, 5032:0bc7
+ * - Humax/Coex DVB-T USB Stick 2.0 High Speed
+ - 10b9:5000, 10b9:5001
+ * - LITE-ON USB2.0 DVB-T Tuner
+ - 04ca:f000, 04ca:f001
+ * - Leadtek - USB2.0 Winfast DTV dongle
+ - 0413:6025, 0413:6026
+ * - MSI Digivox Mini SL
+ - eb1a:e360, eb1a:e361
diff --git a/Documentation/admin-guide/media/dvb-usb-digitv-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-digitv-cardlist.rst
new file mode 100644
index 000000000000..2b4d8325e8e9
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-digitv-cardlist.rst
@@ -0,0 +1,16 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-digitv cards list
+=========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - Nebula Electronics uDigiTV DVB-T USB2.0)
+ - 0547:0201
diff --git a/Documentation/admin-guide/media/dvb-usb-dtt200u-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-dtt200u-cardlist.rst
new file mode 100644
index 000000000000..b4150a7bf31f
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-dtt200u-cardlist.rst
@@ -0,0 +1,22 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-dtt200u cards list
+==========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - WideView WT-220U PenType Receiver (Miglia)
+ - 18f3:0220
+ * - WideView WT-220U PenType Receiver (Typhoon/Freecom)
+ - 14aa:0222, 14aa:0220, 14aa:0221, 14aa:0225, 14aa:0226
+ * - WideView WT-220U PenType Receiver (based on ZL353)
+ - 14aa:022a, 14aa:022b
+ * - WideView/Yuan/Yakumo/Hama/Typhoon DVB-T USB2.0 (WT-200U)
+ - 14aa:0201, 14aa:0301
diff --git a/Documentation/admin-guide/media/dvb-usb-dtv5100-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-dtv5100-cardlist.rst
new file mode 100644
index 000000000000..91d6e35e6f9d
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-dtv5100-cardlist.rst
@@ -0,0 +1,16 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-dtv5100 cards list
+==========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - AME DTV-5100 USB2.0 DVB-T
+ - 0x06be:0xa232
diff --git a/Documentation/admin-guide/media/dvb-usb-dvbsky-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-dvbsky-cardlist.rst
new file mode 100644
index 000000000000..9f7b619f35f7
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-dvbsky-cardlist.rst
@@ -0,0 +1,42 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-dvbsky cards list
+=========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - DVBSky S960/S860
+ - 0572:6831
+ * - DVBSky S960CI
+ - 0572:960c
+ * - DVBSky T330
+ - 0572:0320
+ * - DVBSky T680CI
+ - 0572:680c
+ * - MyGica Mini DVB-(T/T2/C) USB Stick T230
+ - 0572:c688
+ * - MyGica Mini DVB-(T/T2/C) USB Stick T230C
+ - 0572:c689
+ * - MyGica Mini DVB-(T/T2/C) USB Stick T230C Lite
+ - 0572:c699
+ * - MyGica Mini DVB-(T/T2/C) USB Stick T230C v2
+ - 0572:c68a
+ * - TechnoTrend TT-connect CT2-4650 CI
+ - 0b48:3012
+ * - TechnoTrend TT-connect CT2-4650 CI v1.1
+ - 0b48:3015
+ * - TechnoTrend TT-connect S2-4650 CI
+ - 0b48:3017
+ * - TechnoTrend TVStick CT2-4400
+ - 0b48:3014
+ * - Terratec Cinergy S2 Rev.4
+ - 0ccd:0105
+ * - Terratec H7 Rev.4
+ - 0ccd:10a5
diff --git a/Documentation/admin-guide/media/dvb-usb-dw2102-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-dw2102-cardlist.rst
new file mode 100644
index 000000000000..e39bc8e4bffe
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-dw2102-cardlist.rst
@@ -0,0 +1,56 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-dw2102 cards list
+=========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - DVBWorld DVB-C 3101 USB2.0
+ - 04b4:3101
+ * - DVBWorld DVB-S 2101 USB2.0
+ - 04b4:0x2101
+ * - DVBWorld DVB-S 2102 USB2.0
+ - 04b4:2102
+ * - DVBWorld DW2104 USB2.0
+ - 04b4:2104
+ * - GOTVIEW Satellite HD
+ - 0x1FE1:5456
+ * - Geniatech T220 DVB-T/T2 USB2.0
+ - 0x1f4d:0xD220
+ * - SU3000HD DVB-S USB2.0
+ - 0x1f4d:0x3000
+ * - TeVii S482 (tuner 1)
+ - 0x9022:0xd483
+ * - TeVii S482 (tuner 2)
+ - 0x9022:0xd484
+ * - TeVii S630 USB
+ - 0x9022:d630
+ * - TeVii S650 USB2.0
+ - 0x9022:d650
+ * - TeVii S662
+ - 0x9022:d662
+ * - TechnoTrend TT-connect S2-4600
+ - 0b48:3011
+ * - TerraTec Cinergy S USB
+ - 0ccd:0064
+ * - Terratec Cinergy S2 PCIe Dual Port 1
+ - 153b:1181
+ * - Terratec Cinergy S2 PCIe Dual Port 2
+ - 153b:1182
+ * - Terratec Cinergy S2 USB BOX
+ - 0ccd:0x0105
+ * - Terratec Cinergy S2 USB HD
+ - 0ccd:00a8
+ * - Terratec Cinergy S2 USB HD Rev.2
+ - 0ccd:00b0
+ * - Terratec Cinergy S2 USB HD Rev.3
+ - 0ccd:0102
+ * - X3M TV SPC1400HD PCI
+ - 0x1f4d:0x3100
diff --git a/Documentation/admin-guide/media/dvb-usb-ec168-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-ec168-cardlist.rst
new file mode 100644
index 000000000000..a3660dfa5dcc
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-ec168-cardlist.rst
@@ -0,0 +1,16 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-ec168 cards list
+========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - E3C EC168 reference design
+ - 18b4:1689, 18b4:fffa, 18b4:fffb, 18b4:1001, 18b4:1002
diff --git a/Documentation/admin-guide/media/dvb-usb-gl861-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-gl861-cardlist.rst
new file mode 100644
index 000000000000..5ec62fe03d64
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-gl861-cardlist.rst
@@ -0,0 +1,20 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-gl861 cards list
+========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - 774 Friio White ISDB-T USB2.0
+ - 7a69:0001
+ * - A-LINK DTU DVB-T USB2.0
+ - 05e3:f170
+ * - MSI Mega Sky 55801 DVB-T USB2.0
+ - 0db0:5581
diff --git a/Documentation/admin-guide/media/dvb-usb-gp8psk-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-gp8psk-cardlist.rst
new file mode 100644
index 000000000000..150fa9f7810a
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-gp8psk-cardlist.rst
@@ -0,0 +1,22 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-gp8psk cards list
+=========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - Genpix 8PSK-to-USB2 Rev.1 DVB-S receiver
+ - 09c0:0200, 09c0:0201
+ * - Genpix 8PSK-to-USB2 Rev.2 DVB-S receiver
+ - 09c0:0202
+ * - Genpix SkyWalker-1 DVB-S receiver
+ - 09c0:0203
+ * - Genpix SkyWalker-2 DVB-S receiver
+ - 09c0:0206
diff --git a/Documentation/admin-guide/media/dvb-usb-lmedm04-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-lmedm04-cardlist.rst
new file mode 100644
index 000000000000..2050fbf03d4a
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-lmedm04-cardlist.rst
@@ -0,0 +1,20 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-lmedm04 cards list
+==========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - DM04_LME2510C_DVB-S
+ - 3344:1120
+ * - DM04_LME2510C_DVB-S RS2000
+ - 3344:22f0
+ * - DM04_LME2510_DVB-S
+ - 3344:1122
diff --git a/Documentation/admin-guide/media/dvb-usb-m920x-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-m920x-cardlist.rst
new file mode 100644
index 000000000000..73145940b5c5
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-m920x-cardlist.rst
@@ -0,0 +1,26 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-m920x cards list
+========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - DTV-DVB UDTT7049
+ - 13d3:3219
+ * - Dposh DVB-T USB2.0
+ - 1498:9206, 1498:a090
+ * - LifeView TV Walker Twin DVB-T USB2.0
+ - 10fd:0514, 10fd:0513
+ * - MSI DIGI VOX mini II DVB-T USB2.0
+ - 10fd:1513
+ * - MSI Mega Sky 580 DVB-T USB2.0
+ - 0db0:5580
+ * - Pinnacle PCTV 310e
+ - 13d3:3211
diff --git a/Documentation/admin-guide/media/dvb-usb-mxl111sf-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-mxl111sf-cardlist.rst
new file mode 100644
index 000000000000..6974801c43b6
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-mxl111sf-cardlist.rst
@@ -0,0 +1,36 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-mxl111sf cards list
+===========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - HCW 117xxx
+ - 2040:b702
+ * - HCW 126xxx
+ - 2040:c602, 2040:c60a
+ * - Hauppauge 117xxx ATSC+
+ - 2040:b700, 2040:b703, 2040:b753, 2040:b763, 2040:b757, 2040:b767
+ * - Hauppauge 117xxx DVBT
+ - 2040:b704, 2040:b764
+ * - Hauppauge 126xxx
+ - 2040:c612, 2040:c61a
+ * - Hauppauge 126xxx ATSC
+ - 2040:c601, 2040:c609, 2040:b701
+ * - Hauppauge 126xxx ATSC+
+ - 2040:c600, 2040:c603, 2040:c60b, 2040:c653, 2040:c65b
+ * - Hauppauge 126xxx DVBT
+ - 2040:c604, 2040:c60c
+ * - Hauppauge 138xxx DVBT
+ - 2040:d854, 2040:d864, 2040:d8d4, 2040:d8e4
+ * - Hauppauge Mercury
+ - 2040:d853, 2040:d863, 2040:d8d3, 2040:d8e3, 2040:d8ff
+ * - Hauppauge WinTV-Aero-M
+ - 2040:c613, 2040:c61b
diff --git a/Documentation/admin-guide/media/dvb-usb-nova-t-usb2-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-nova-t-usb2-cardlist.rst
new file mode 100644
index 000000000000..e295f912a585
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-nova-t-usb2-cardlist.rst
@@ -0,0 +1,16 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-nova-t-usb2 cards list
+==============================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - Hauppauge WinTV-NOVA-T usb2
+ - 2040:9300, 2040:9301
diff --git a/Documentation/admin-guide/media/dvb-usb-opera1-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-opera1-cardlist.rst
new file mode 100644
index 000000000000..362245f5a46a
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-opera1-cardlist.rst
@@ -0,0 +1,16 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-opera1 cards list
+=========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - Opera1 DVB-S USB2.0
+ - 04b4:2830, 695c:3829
diff --git a/Documentation/admin-guide/media/dvb-usb-pctv452e-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-pctv452e-cardlist.rst
new file mode 100644
index 000000000000..886d8cc18acb
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-pctv452e-cardlist.rst
@@ -0,0 +1,20 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-pctv452e cards list
+===========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - PCTV HDTV USB
+ - 2304:021f
+ * - Technotrend TT Connect S2-3600
+ - 0b48:3007
+ * - Technotrend TT Connect S2-3650-CI
+ - 0b48:300a
diff --git a/Documentation/admin-guide/media/dvb-usb-rtl28xxu-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-rtl28xxu-cardlist.rst
new file mode 100644
index 000000000000..9f4295331a15
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-rtl28xxu-cardlist.rst
@@ -0,0 +1,80 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-rtl28xxu cards list
+===========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - ASUS My Cinema-U3100Mini Plus V2
+ - 1b80:d3a8
+ * - Astrometa DVB-T2
+ - 15f4:0131
+ * - Compro VideoMate U620F
+ - 185b:0620
+ * - Compro VideoMate U650F
+ - 185b:0650
+ * - Crypto ReDi PC 50 A
+ - 1f4d:a803
+ * - Dexatek DK DVB-T Dongle
+ - 1d19:1101
+ * - Dexatek DK mini DVB-T Dongle
+ - 1d19:1102
+ * - DigitalNow Quad DVB-T Receiver
+ - 0413:6680
+ * - Freecom USB2.0 DVB-T
+ - 14aa:0160, 14aa:0161
+ * - G-Tek Electronics Group Lifeview LV5TDLX DVB-T
+ - 1f4d:b803
+ * - GIGABYTE U7300
+ - 1b80:d393
+ * - Genius TVGo DVB-T03
+ - 0458:707f
+ * - GoTView MasterHD 3
+ - 5654:ca42
+ * - Leadtek WinFast DTV Dongle mini
+ - 0413:6a03
+ * - Leadtek WinFast DTV2000DS Plus
+ - 0413:6f12
+ * - Leadtek Winfast DTV Dongle Mini D
+ - 0413:6f0f
+ * - MSI DIGIVOX Micro HD
+ - 1d19:1104
+ * - MaxMedia HU394-T
+ - 1b80:d394
+ * - PROlectrix DV107669
+ - 1f4d:d803
+ * - Peak DVB-T USB
+ - 1b80:d395
+ * - Realtek RTL2831U reference design
+ - 0bda:2831
+ * - Realtek RTL2832U reference design
+ - 0bda:2832, 0bda:2838
+ * - Sveon STV20
+ - 1b80:d39d
+ * - Sveon STV21
+ - 1b80:d3b0
+ * - Sveon STV27
+ - 1b80:d3af
+ * - TURBO-X Pure TV Tuner DTT-2000
+ - 1b80:d3a4
+ * - TerraTec Cinergy T Stick Black
+ - 0ccd:00a9
+ * - TerraTec Cinergy T Stick RC (Rev. 3)
+ - 0ccd:00d3
+ * - TerraTec Cinergy T Stick+
+ - 0ccd:00d7
+ * - TerraTec NOXON DAB Stick
+ - 0ccd:00b3
+ * - TerraTec NOXON DAB Stick (rev 2)
+ - 0ccd:00e0
+ * - TerraTec NOXON DAB Stick (rev 3)
+ - 0ccd:00b4
+ * - Trekstor DVB-T Stick Terres 2.0
+ - 1f4d:C803
diff --git a/Documentation/admin-guide/media/dvb-usb-technisat-usb2-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-technisat-usb2-cardlist.rst
new file mode 100644
index 000000000000..30ee92ada134
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-technisat-usb2-cardlist.rst
@@ -0,0 +1,16 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-technisat-usb2 cards list
+=================================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - Technisat SkyStar USB HD (DVB-S/S2)
+ - 14f7:0500
diff --git a/Documentation/admin-guide/media/dvb-usb-ttusb2-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-ttusb2-cardlist.rst
new file mode 100644
index 000000000000..faa78e5f3f5d
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-ttusb2-cardlist.rst
@@ -0,0 +1,24 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-ttusb2 cards list
+=========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - Pinnacle 400e DVB-S USB2.0
+ - 2304:020f
+ * - Pinnacle 450e DVB-S USB2.0
+ - 2304:0222
+ * - Technotrend TT-connect CT-3650
+ - 0b48:300d
+ * - Technotrend TT-connect S-2400
+ - 0b48:3006
+ * - Technotrend TT-connect S-2400 (8kB EEPROM)
+ - 0b48:3009
diff --git a/Documentation/admin-guide/media/dvb-usb-umt-010-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-umt-010-cardlist.rst
new file mode 100644
index 000000000000..ce7ce901b5ac
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-umt-010-cardlist.rst
@@ -0,0 +1,16 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-umt-010 cards list
+==========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - Hanftek UMT-010 DVB-T USB2.0
+ - 15f4:0001, 15f4:0015
diff --git a/Documentation/admin-guide/media/dvb-usb-vp702x-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-vp702x-cardlist.rst
new file mode 100644
index 000000000000..101442434268
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-vp702x-cardlist.rst
@@ -0,0 +1,16 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-vp702x cards list
+=========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - TwinhanDTV StarBox DVB-S USB2.0 (VP7021)
+ - 13d3:3207
diff --git a/Documentation/admin-guide/media/dvb-usb-vp7045-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-vp7045-cardlist.rst
new file mode 100644
index 000000000000..2fc8fc4ecc32
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-vp7045-cardlist.rst
@@ -0,0 +1,18 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-vp7045 cards list
+=========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - DigitalNow TinyUSB 2 DVB-t Receiver
+ - 13d3:3223, 13d3:3224
+ * - Twinhan USB2.0 DVB-T receiver (TwinhanDTV Alpha/MagicBox II)
+ - 13d3:3205, 13d3:3206
diff --git a/Documentation/admin-guide/media/dvb-usb-zd1301-cardlist.rst b/Documentation/admin-guide/media/dvb-usb-zd1301-cardlist.rst
new file mode 100644
index 000000000000..9ca446184753
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb-usb-zd1301-cardlist.rst
@@ -0,0 +1,16 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+dvb-usb-zd1301 cards list
+=========================
+
+.. tabularcolumns:: |p{7.0cm}|p{10.5cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 7 13
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - ZyDAS ZD1301 reference design
+ - 0ace:13a1
diff --git a/Documentation/admin-guide/media/dvb.rst b/Documentation/admin-guide/media/dvb.rst
new file mode 100644
index 000000000000..e5258bfa5cd9
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb.rst
@@ -0,0 +1,12 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========
+Digital TV
+==========
+
+.. toctree::
+
+ dvb_intro
+ ci
+ faq
+ dvb_references
diff --git a/Documentation/admin-guide/media/dvb_intro.rst b/Documentation/admin-guide/media/dvb_intro.rst
new file mode 100644
index 000000000000..44eac9b3be6c
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb_intro.rst
@@ -0,0 +1,616 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============================
+Using the Digital TV Framework
+==============================
+
+Introduction
+~~~~~~~~~~~~
+
+One significant difference between Digital TV and Analogue TV that the
+unwary (like myself) should consider is that, although the component
+structure of DVB-T cards are substantially similar to Analogue TV cards,
+they function in substantially different ways.
+
+The purpose of an Analogue TV is to receive and display an Analogue
+Television signal. An Analogue TV signal (otherwise known as composite
+video) is an analogue encoding of a sequence of image frames (25 frames
+per second in Europe) rasterised using an interlacing technique.
+Interlacing takes two fields to represent one frame. Therefore, an
+Analogue TV card for a PC has the following purpose:
+
+* Tune the receiver to receive a broadcast signal
+* demodulate the broadcast signal
+* demultiplex the analogue video signal and analogue audio
+ signal.
+
+ .. note::
+
+ some countries employ a digital audio signal
+ embedded within the modulated composite analogue signal -
+ using NICAM signaling.)
+
+* digitize the analogue video signal and make the resulting datastream
+ available to the data bus.
+
+The digital datastream from an Analogue TV card is generated by
+circuitry on the card and is often presented uncompressed. For a PAL TV
+signal encoded at a resolution of 768x576 24-bit color pixels over 25
+frames per second - a fair amount of data is generated and must be
+processed by the PC before it can be displayed on the video monitor
+screen. Some Analogue TV cards for PCs have onboard MPEG2 encoders which
+permit the raw digital data stream to be presented to the PC in an
+encoded and compressed form - similar to the form that is used in
+Digital TV.
+
+The purpose of a simple budget digital TV card (DVB-T,C or S) is to
+simply:
+
+* Tune the received to receive a broadcast signal. * Extract the encoded
+ digital datastream from the broadcast signal.
+* Make the encoded digital datastream (MPEG2) available to the data bus.
+
+The significant difference between the two is that the tuner on the
+analogue TV card spits out an Analogue signal, whereas the tuner on the
+digital TV card spits out a compressed encoded digital datastream. As
+the signal is already digitised, it is trivial to pass this datastream
+to the PC databus with minimal additional processing and then extract
+the digital video and audio datastreams passing them to the appropriate
+software or hardware for decoding and viewing.
+
+Getting the card going
+~~~~~~~~~~~~~~~~~~~~~~
+
+The Device Driver API for DVB under Linux will the following
+device nodes via the devfs filesystem:
+
+* /dev/dvb/adapter0/demux0
+* /dev/dvb/adapter0/dvr0
+* /dev/dvb/adapter0/frontend0
+
+The ``/dev/dvb/adapter0/dvr0`` device node is used to read the MPEG2
+Data Stream and the ``/dev/dvb/adapter0/frontend0`` device node is used
+to tune the frontend tuner module. The ``/dev/dvb/adapter0/demux0`` is
+used to control what programs will be received.
+
+Depending on the card's feature set, the Device Driver API could also
+expose other device nodes:
+
+* /dev/dvb/adapter0/ca0
+* /dev/dvb/adapter0/audio0
+* /dev/dvb/adapter0/net0
+* /dev/dvb/adapter0/osd0
+* /dev/dvb/adapter0/video0
+
+The ``/dev/dvb/adapter0/ca0`` is used to decode encrypted channels. The
+other device nodes are found only on devices that use the av7110
+driver, with is now obsoleted, together with the extra API whose such
+devices use.
+
+Receiving a digital TV channel
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This section attempts to explain how it works and how this affects the
+configuration of a Digital TV card.
+
+On this example, we're considering tuning into DVB-T channels in
+Australia, at the Melbourne region.
+
+The frequencies broadcast by Mount Dandenong transmitters are,
+currently:
+
+Table 1. Transponder Frequencies Mount Dandenong, Vic, Aus.
+
+=========== ===========
+Broadcaster Frequency
+=========== ===========
+Seven 177.500 Mhz
+SBS 184.500 Mhz
+Nine 191.625 Mhz
+Ten 219.500 Mhz
+ABC 226.500 Mhz
+Channel 31 557.625 Mhz
+=========== ===========
+
+The digital TV Scan utilities (like dvbv5-scan) have use a set of
+compiled-in defaults for various countries and regions. Those are
+currently provided as a separate package, called dtv-scan-tables. It's
+git tree is located at LinuxTV.org:
+
+ https://git.linuxtv.org/dtv-scan-tables.git/
+
+If none of the tables there suit, you can specify a data file on the
+command line which contains the transponder frequencies. Here is a
+sample file for the above channel transponders, in the old "channel"
+format::
+
+ # Data file for DVB scan program
+ #
+ # C Frequency SymbolRate FEC QAM
+ # S Frequency Polarisation SymbolRate FEC
+ # T Frequency Bandwidth FEC FEC2 QAM Mode Guard Hier
+
+ T 177500000 7MHz AUTO AUTO QAM64 8k 1/16 NONE
+ T 184500000 7MHz AUTO AUTO QAM64 8k 1/8 NONE
+ T 191625000 7MHz AUTO AUTO QAM64 8k 1/16 NONE
+ T 219500000 7MHz AUTO AUTO QAM64 8k 1/16 NONE
+ T 226500000 7MHz AUTO AUTO QAM64 8k 1/16 NONE
+ T 557625000 7MHz AUTO AUTO QPSK 8k 1/16 NONE
+
+Nowadays, we prefer to use a newer format, with is more verbose and easier
+to understand. With the new format, the "Seven" channel transponder's
+data is represented by::
+
+ [Seven]
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 177500000
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = AUTO
+ CODE_RATE_LP = AUTO
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/16
+ HIERARCHY = NONE
+ INVERSION = AUTO
+
+For an updated version of the complete table, please see:
+
+ https://git.linuxtv.org/dtv-scan-tables.git/tree/dvb-t/au-Melbourne
+
+When the Digital TV scanning utility runs, it will output a file
+containing the information for all the audio and video programs that
+exists into each channel's transponders which the card's frontend can
+lock onto. (i.e. any whose signal is strong enough at your antenna).
+
+Here's the output of the dvbv5 tools from a channel scan took from
+Melburne::
+
+ [ABC HDTV]
+ SERVICE_ID = 560
+ VIDEO_PID = 2307
+ AUDIO_PID = 0
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 226500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 3/4
+ CODE_RATE_LP = 3/4
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/16
+ HIERARCHY = NONE
+
+ [ABC TV Melbourne]
+ SERVICE_ID = 561
+ VIDEO_PID = 512
+ AUDIO_PID = 650
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 226500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 3/4
+ CODE_RATE_LP = 3/4
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/16
+ HIERARCHY = NONE
+
+ [ABC TV 2]
+ SERVICE_ID = 562
+ VIDEO_PID = 512
+ AUDIO_PID = 650
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 226500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 3/4
+ CODE_RATE_LP = 3/4
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/16
+ HIERARCHY = NONE
+
+ [ABC TV 3]
+ SERVICE_ID = 563
+ VIDEO_PID = 512
+ AUDIO_PID = 650
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 226500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 3/4
+ CODE_RATE_LP = 3/4
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/16
+ HIERARCHY = NONE
+
+ [ABC TV 4]
+ SERVICE_ID = 564
+ VIDEO_PID = 512
+ AUDIO_PID = 650
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 226500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 3/4
+ CODE_RATE_LP = 3/4
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/16
+ HIERARCHY = NONE
+
+ [ABC DiG Radio]
+ SERVICE_ID = 566
+ VIDEO_PID = 0
+ AUDIO_PID = 2311
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 226500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 3/4
+ CODE_RATE_LP = 3/4
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/16
+ HIERARCHY = NONE
+
+ [TEN Digital]
+ SERVICE_ID = 1585
+ VIDEO_PID = 512
+ AUDIO_PID = 650
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 219500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 3/4
+ CODE_RATE_LP = 1/2
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/16
+ HIERARCHY = NONE
+
+ [TEN Digital 1]
+ SERVICE_ID = 1586
+ VIDEO_PID = 512
+ AUDIO_PID = 650
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 219500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 3/4
+ CODE_RATE_LP = 1/2
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/16
+ HIERARCHY = NONE
+
+ [TEN Digital 2]
+ SERVICE_ID = 1587
+ VIDEO_PID = 512
+ AUDIO_PID = 650
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 219500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 3/4
+ CODE_RATE_LP = 1/2
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/16
+ HIERARCHY = NONE
+
+ [TEN Digital 3]
+ SERVICE_ID = 1588
+ VIDEO_PID = 512
+ AUDIO_PID = 650
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 219500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 3/4
+ CODE_RATE_LP = 1/2
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/16
+ HIERARCHY = NONE
+
+ [TEN Digital]
+ SERVICE_ID = 1589
+ VIDEO_PID = 512
+ AUDIO_PID = 650
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 219500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 3/4
+ CODE_RATE_LP = 1/2
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/16
+ HIERARCHY = NONE
+
+ [TEN Digital 4]
+ SERVICE_ID = 1590
+ VIDEO_PID = 512
+ AUDIO_PID = 650
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 219500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 3/4
+ CODE_RATE_LP = 1/2
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/16
+ HIERARCHY = NONE
+
+ [TEN Digital]
+ SERVICE_ID = 1591
+ VIDEO_PID = 512
+ AUDIO_PID = 650
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 219500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 3/4
+ CODE_RATE_LP = 1/2
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/16
+ HIERARCHY = NONE
+
+ [TEN HD]
+ SERVICE_ID = 1592
+ VIDEO_PID = 514
+ AUDIO_PID = 0
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 219500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 3/4
+ CODE_RATE_LP = 1/2
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/16
+ HIERARCHY = NONE
+
+ [TEN Digital]
+ SERVICE_ID = 1593
+ VIDEO_PID = 512
+ AUDIO_PID = 650
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 219500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 3/4
+ CODE_RATE_LP = 1/2
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/16
+ HIERARCHY = NONE
+
+ [Nine Digital]
+ SERVICE_ID = 1072
+ VIDEO_PID = 513
+ AUDIO_PID = 660
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 191625000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 3/4
+ CODE_RATE_LP = 1/2
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/16
+ HIERARCHY = NONE
+
+ [Nine Digital HD]
+ SERVICE_ID = 1073
+ VIDEO_PID = 512
+ AUDIO_PID = 0
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 191625000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 3/4
+ CODE_RATE_LP = 1/2
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/16
+ HIERARCHY = NONE
+
+ [Nine Guide]
+ SERVICE_ID = 1074
+ VIDEO_PID = 514
+ AUDIO_PID = 670
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 191625000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 3/4
+ CODE_RATE_LP = 1/2
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/16
+ HIERARCHY = NONE
+
+ [7 Digital]
+ SERVICE_ID = 1328
+ VIDEO_PID = 769
+ AUDIO_PID = 770
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 177500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 2/3
+ CODE_RATE_LP = 2/3
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/8
+ HIERARCHY = NONE
+
+ [7 Digital 1]
+ SERVICE_ID = 1329
+ VIDEO_PID = 769
+ AUDIO_PID = 770
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 177500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 2/3
+ CODE_RATE_LP = 2/3
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/8
+ HIERARCHY = NONE
+
+ [7 Digital 2]
+ SERVICE_ID = 1330
+ VIDEO_PID = 769
+ AUDIO_PID = 770
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 177500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 2/3
+ CODE_RATE_LP = 2/3
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/8
+ HIERARCHY = NONE
+
+ [7 Digital 3]
+ SERVICE_ID = 1331
+ VIDEO_PID = 769
+ AUDIO_PID = 770
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 177500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 2/3
+ CODE_RATE_LP = 2/3
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/8
+ HIERARCHY = NONE
+
+ [7 HD Digital]
+ SERVICE_ID = 1332
+ VIDEO_PID = 833
+ AUDIO_PID = 834
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 177500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 2/3
+ CODE_RATE_LP = 2/3
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/8
+ HIERARCHY = NONE
+
+ [7 Program Guide]
+ SERVICE_ID = 1334
+ VIDEO_PID = 865
+ AUDIO_PID = 866
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 177500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 2/3
+ CODE_RATE_LP = 2/3
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/8
+ HIERARCHY = NONE
+
+ [SBS HD]
+ SERVICE_ID = 784
+ VIDEO_PID = 102
+ AUDIO_PID = 103
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 536500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 2/3
+ CODE_RATE_LP = 2/3
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/8
+ HIERARCHY = NONE
+
+ [SBS DIGITAL 1]
+ SERVICE_ID = 785
+ VIDEO_PID = 161
+ AUDIO_PID = 81
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 536500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 2/3
+ CODE_RATE_LP = 2/3
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/8
+ HIERARCHY = NONE
+
+ [SBS DIGITAL 2]
+ SERVICE_ID = 786
+ VIDEO_PID = 162
+ AUDIO_PID = 83
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 536500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 2/3
+ CODE_RATE_LP = 2/3
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/8
+ HIERARCHY = NONE
+
+ [SBS EPG]
+ SERVICE_ID = 787
+ VIDEO_PID = 163
+ AUDIO_PID = 85
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 536500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 2/3
+ CODE_RATE_LP = 2/3
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/8
+ HIERARCHY = NONE
+
+ [SBS RADIO 1]
+ SERVICE_ID = 798
+ VIDEO_PID = 0
+ AUDIO_PID = 201
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 536500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 2/3
+ CODE_RATE_LP = 2/3
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/8
+ HIERARCHY = NONE
+
+ [SBS RADIO 2]
+ SERVICE_ID = 799
+ VIDEO_PID = 0
+ AUDIO_PID = 202
+ DELIVERY_SYSTEM = DVBT
+ FREQUENCY = 536500000
+ INVERSION = OFF
+ BANDWIDTH_HZ = 7000000
+ CODE_RATE_HP = 2/3
+ CODE_RATE_LP = 2/3
+ MODULATION = QAM/64
+ TRANSMISSION_MODE = 8K
+ GUARD_INTERVAL = 1/8
+ HIERARCHY = NONE
diff --git a/Documentation/admin-guide/media/dvb_references.rst b/Documentation/admin-guide/media/dvb_references.rst
new file mode 100644
index 000000000000..4f0fd4259cfa
--- /dev/null
+++ b/Documentation/admin-guide/media/dvb_references.rst
@@ -0,0 +1,29 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+References
+==========
+
+The main development site and GIT repository for Digital TV
+drivers is https://linuxtv.org.
+
+The DVB mailing list linux-dvb is hosted at vger. Please see
+http://vger.kernel.org/vger-lists.html#linux-media for details.
+
+There are also some other old lists hosted at:
+https://linuxtv.org/lists.php. If you're interested on that for historic
+reasons, please check the archive at https://linuxtv.org/pipermail/linux-dvb/.
+
+The media subsystem Wiki is hosted at https://linuxtv.org/wiki/.
+There, you'll find lots of information, from both development and usage
+of media boards. Please check it before asking newbie questions on the
+mailing list or IRC channels.
+
+The API documentation is documented at the Kernel tree. You can find it
+in both html and pdf formats, together with other useful documentation at:
+
+ - https://linuxtv.org/docs.php.
+
+You may also find useful material at https://linuxtv.org/downloads/.
+
+In order to get the needed firmware for some drivers to work, there's
+a script at the kernel tree, at scripts/get_dvb_firmware.
diff --git a/Documentation/admin-guide/media/em28xx-cardlist.rst b/Documentation/admin-guide/media/em28xx-cardlist.rst
new file mode 100644
index 000000000000..ace65718ea22
--- /dev/null
+++ b/Documentation/admin-guide/media/em28xx-cardlist.rst
@@ -0,0 +1,440 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+EM28xx cards list
+=================
+
+.. tabularcolumns:: |p{1.4cm}|p{10.0cm}|p{1.9cm}|p{4.2cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 2 12 3 16
+ :stub-columns: 0
+
+ * - Card number
+ - Card name
+ - Empia Chip
+ - USB IDs
+ * - 0
+ - Unknown EM2800 video grabber
+ - em2800
+ - eb1a:2800
+ * - 1
+ - Unknown EM2750/28xx video grabber
+ - em2820 or em2840
+ - eb1a:2710, eb1a:2820, eb1a:2821, eb1a:2860, eb1a:2861, eb1a:2862, eb1a:2863, eb1a:2870, eb1a:2881, eb1a:2883, eb1a:2868, eb1a:2875
+ * - 2
+ - Terratec Cinergy 250 USB
+ - em2820 or em2840
+ - 0ccd:0036
+ * - 3
+ - Pinnacle PCTV USB 2
+ - em2820 or em2840
+ - 2304:0208
+ * - 4
+ - Hauppauge WinTV USB 2
+ - em2820 or em2840
+ - 2040:4200, 2040:4201
+ * - 5
+ - MSI VOX USB 2.0
+ - em2820 or em2840
+ -
+ * - 6
+ - Terratec Cinergy 200 USB
+ - em2800
+ -
+ * - 7
+ - Leadtek Winfast USB II
+ - em2800
+ - 0413:6023
+ * - 8
+ - Kworld USB2800
+ - em2800
+ -
+ * - 9
+ - Pinnacle Dazzle DVC 90/100/101/107 / Kaiser Baas Video to DVD maker / Kworld DVD Maker 2 / Plextor ConvertX PX-AV100U
+ - em2820 or em2840
+ - 1b80:e302, 1b80:e304, 2304:0207, 2304:021a, 093b:a003
+ * - 10
+ - Hauppauge WinTV HVR 900
+ - em2880
+ - 2040:6500
+ * - 11
+ - Terratec Hybrid XS
+ - em2880
+ -
+ * - 12
+ - Kworld PVR TV 2800 RF
+ - em2820 or em2840
+ -
+ * - 13
+ - Terratec Prodigy XS
+ - em2880
+ -
+ * - 14
+ - SIIG AVTuner-PVR / Pixelview Prolink PlayTV USB 2.0
+ - em2820 or em2840
+ -
+ * - 15
+ - V-Gear PocketTV
+ - em2800
+ -
+ * - 16
+ - Hauppauge WinTV HVR 950
+ - em2883
+ - 2040:6513, 2040:6517, 2040:651b
+ * - 17
+ - Pinnacle PCTV HD Pro Stick
+ - em2880
+ - 2304:0227
+ * - 18
+ - Hauppauge WinTV HVR 900 (R2)
+ - em2880
+ - 2040:6502
+ * - 19
+ - EM2860/SAA711X Reference Design
+ - em2860
+ -
+ * - 20
+ - AMD ATI TV Wonder HD 600
+ - em2880
+ - 0438:b002
+ * - 21
+ - eMPIA Technology, Inc. GrabBeeX+ Video Encoder
+ - em2800
+ - eb1a:2801
+ * - 22
+ - EM2710/EM2750/EM2751 webcam grabber
+ - em2750
+ - eb1a:2750, eb1a:2751
+ * - 23
+ - Huaqi DLCW-130
+ - em2750
+ -
+ * - 24
+ - D-Link DUB-T210 TV Tuner
+ - em2820 or em2840
+ - 2001:f112
+ * - 25
+ - Gadmei UTV310
+ - em2820 or em2840
+ -
+ * - 26
+ - Hercules Smart TV USB 2.0
+ - em2820 or em2840
+ -
+ * - 27
+ - Pinnacle PCTV USB 2 (Philips FM1216ME)
+ - em2820 or em2840
+ -
+ * - 28
+ - Leadtek Winfast USB II Deluxe
+ - em2820 or em2840
+ -
+ * - 29
+ - EM2860/TVP5150 Reference Design
+ - em2860
+ - eb1a:5051
+ * - 30
+ - Videology 20K14XUSB USB2.0
+ - em2820 or em2840
+ -
+ * - 31
+ - Usbgear VD204v9
+ - em2821
+ -
+ * - 32
+ - Supercomp USB 2.0 TV
+ - em2821
+ -
+ * - 33
+ - Elgato Video Capture
+ - em2860
+ - 0fd9:0033
+ * - 34
+ - Terratec Cinergy A Hybrid XS
+ - em2860
+ - 0ccd:004f
+ * - 35
+ - Typhoon DVD Maker
+ - em2860
+ -
+ * - 36
+ - NetGMBH Cam
+ - em2860
+ -
+ * - 37
+ - Gadmei UTV330
+ - em2860
+ - eb1a:50a6
+ * - 38
+ - Yakumo MovieMixer
+ - em2861
+ -
+ * - 39
+ - KWorld PVRTV 300U
+ - em2861
+ - eb1a:e300
+ * - 40
+ - Plextor ConvertX PX-TV100U
+ - em2861
+ - 093b:a005
+ * - 41
+ - Kworld 350 U DVB-T
+ - em2870
+ - eb1a:e350
+ * - 42
+ - Kworld 355 U DVB-T
+ - em2870
+ - eb1a:e355, eb1a:e357, eb1a:e359
+ * - 43
+ - Terratec Cinergy T XS
+ - em2870
+ -
+ * - 44
+ - Terratec Cinergy T XS (MT2060)
+ - em2870
+ - 0ccd:0043
+ * - 45
+ - Pinnacle PCTV DVB-T
+ - em2870
+ -
+ * - 46
+ - Compro, VideoMate U3
+ - em2870
+ - 185b:2870
+ * - 47
+ - KWorld DVB-T 305U
+ - em2880
+ - eb1a:e305
+ * - 48
+ - KWorld DVB-T 310U
+ - em2880
+ -
+ * - 49
+ - MSI DigiVox A/D
+ - em2880
+ - eb1a:e310
+ * - 50
+ - MSI DigiVox A/D II
+ - em2880
+ - eb1a:e320
+ * - 51
+ - Terratec Hybrid XS Secam
+ - em2880
+ - 0ccd:004c
+ * - 52
+ - DNT DA2 Hybrid
+ - em2881
+ -
+ * - 53
+ - Pinnacle Hybrid Pro
+ - em2881
+ -
+ * - 54
+ - Kworld VS-DVB-T 323UR
+ - em2882
+ - eb1a:e323
+ * - 55
+ - Terratec Cinergy Hybrid T USB XS (em2882)
+ - em2882
+ - 0ccd:005e, 0ccd:0042
+ * - 56
+ - Pinnacle Hybrid Pro (330e)
+ - em2882
+ - 2304:0226
+ * - 57
+ - Kworld PlusTV HD Hybrid 330
+ - em2883
+ - eb1a:a316
+ * - 58
+ - Compro VideoMate ForYou/Stereo
+ - em2820 or em2840
+ - 185b:2041
+ * - 59
+ - Pinnacle PCTV HD Mini
+ - em2874
+ - 2304:023f
+ * - 60
+ - Hauppauge WinTV HVR 850
+ - em2883
+ - 2040:651f
+ * - 61
+ - Pixelview PlayTV Box 4 USB 2.0
+ - em2820 or em2840
+ -
+ * - 62
+ - Gadmei TVR200
+ - em2820 or em2840
+ -
+ * - 63
+ - Kaiomy TVnPC U2
+ - em2860
+ - eb1a:e303
+ * - 64
+ - Easy Cap Capture DC-60
+ - em2860
+ - 1b80:e309
+ * - 65
+ - IO-DATA GV-MVP/SZ
+ - em2820 or em2840
+ - 04bb:0515
+ * - 66
+ - Empire dual TV
+ - em2880
+ -
+ * - 67
+ - Terratec Grabby
+ - em2860
+ - 0ccd:0096, 0ccd:10AF
+ * - 68
+ - Terratec AV350
+ - em2860
+ - 0ccd:0084
+ * - 69
+ - KWorld ATSC 315U HDTV TV Box
+ - em2882
+ - eb1a:a313
+ * - 70
+ - Evga inDtube
+ - em2882
+ -
+ * - 71
+ - Silvercrest Webcam 1.3mpix
+ - em2820 or em2840
+ -
+ * - 72
+ - Gadmei UTV330+
+ - em2861
+ -
+ * - 73
+ - Reddo DVB-C USB TV Box
+ - em2870
+ -
+ * - 74
+ - Actionmaster/LinXcel/Digitus VC211A
+ - em2800
+ -
+ * - 75
+ - Dikom DK300
+ - em2882
+ -
+ * - 76
+ - KWorld PlusTV 340U or UB435-Q (ATSC)
+ - em2870
+ - 1b80:a340
+ * - 77
+ - EM2874 Leadership ISDBT
+ - em2874
+ -
+ * - 78
+ - PCTV nanoStick T2 290e
+ - em28174
+ - 2013:024f
+ * - 79
+ - Terratec Cinergy H5
+ - em2884
+ - eb1a:2885, 0ccd:10a2, 0ccd:10ad, 0ccd:10b6
+ * - 80
+ - PCTV DVB-S2 Stick (460e)
+ - em28174
+ - 2013:024c
+ * - 81
+ - Hauppauge WinTV HVR 930C
+ - em2884
+ - 2040:1605
+ * - 82
+ - Terratec Cinergy HTC Stick
+ - em2884
+ - 0ccd:00b2
+ * - 83
+ - Honestech Vidbox NW03
+ - em2860
+ - eb1a:5006
+ * - 84
+ - MaxMedia UB425-TC
+ - em2874
+ - 1b80:e425
+ * - 85
+ - PCTV QuatroStick (510e)
+ - em2884
+ - 2304:0242
+ * - 86
+ - PCTV QuatroStick nano (520e)
+ - em2884
+ - 2013:0251
+ * - 87
+ - Terratec Cinergy HTC USB XS
+ - em2884
+ - 0ccd:008e, 0ccd:00ac
+ * - 88
+ - C3 Tech Digital Duo HDTV/SDTV USB
+ - em2884
+ - 1b80:e755
+ * - 89
+ - Delock 61959
+ - em2874
+ - 1b80:e1cc
+ * - 90
+ - KWorld USB ATSC TV Stick UB435-Q V2
+ - em2874
+ - 1b80:e346
+ * - 91
+ - SpeedLink Vicious And Devine Laplace webcam
+ - em2765
+ - 1ae7:9003, 1ae7:9004
+ * - 92
+ - PCTV DVB-S2 Stick (461e)
+ - em28178
+ - 2013:0258
+ * - 93
+ - KWorld USB ATSC TV Stick UB435-Q V3
+ - em2874
+ - 1b80:e34c
+ * - 94
+ - PCTV tripleStick (292e)
+ - em28178
+ - 2013:025f, 2013:0264, 2040:0264, 2040:8264, 2040:8268
+ * - 95
+ - Leadtek VC100
+ - em2861
+ - 0413:6f07
+ * - 96
+ - Terratec Cinergy T2 Stick HD
+ - em28178
+ - eb1a:8179
+ * - 97
+ - Elgato EyeTV Hybrid 2008 INT
+ - em2884
+ - 0fd9:0018
+ * - 98
+ - PLEX PX-BCUD
+ - em28178
+ - 3275:0085
+ * - 99
+ - Hauppauge WinTV-dualHD DVB
+ - em28174
+ - 2040:0265, 2040:8265
+ * - 100
+ - Hauppauge WinTV-dualHD 01595 ATSC/QAM
+ - em28174
+ - 2040:026d, 2040:826d
+ * - 101
+ - Terratec Cinergy H6 rev. 2
+ - em2884
+ - 0ccd:10b2
+ * - 102
+ - :ZOLID HYBRID TV STICK
+ - em2882
+ -
+ * - 103
+ - Magix USB Videowandler-2
+ - em2861
+ - 1b80:e349
+ * - 104
+ - PCTV DVB-S2 Stick (461e v2)
+ - em28178
+ - 2013:0461, 2013:0259
+ * - 105
+ - MyGica iGrabber
+ - em2860
+ - 1f4d:1abe
diff --git a/Documentation/admin-guide/media/faq.rst b/Documentation/admin-guide/media/faq.rst
new file mode 100644
index 000000000000..b63548b6f313
--- /dev/null
+++ b/Documentation/admin-guide/media/faq.rst
@@ -0,0 +1,216 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+FAQ
+===
+
+.. note::
+
+ 1. With Digital TV, a single physical channel may have different
+ contents inside it. The specs call each one as a *service*.
+ This is what a TV user would call "channel". So, in order to
+ avoid confusion, we're calling *transponders* as the physical
+ channel on this FAQ, and *services* for the logical channel.
+ 2. The LinuxTV community maintains some Wiki pages with contain
+ a lot of information related to the media subsystem. If you
+ don't find an answer for your needs here, it is likely that
+ you'll be able to get something useful there. It is hosted
+ at:
+
+ https://www.linuxtv.org/wiki/
+
+Some very frequently asked questions about Linux Digital TV support
+
+1. The signal seems to die a few seconds after tuning.
+
+ It's not a bug, it's a feature. Because the frontends have
+ significant power requirements (and hence get very hot), they
+ are powered down if they are unused (i.e. if the frontend device
+ is closed). The ``dvb-core`` module parameter ``dvb_shutdown_timeout``
+ allow you to change the timeout (default 5 seconds). Setting the
+ timeout to 0 disables the timeout feature.
+
+2. How can I watch TV?
+
+ Together with the Linux Kernel, the Digital TV developers support
+ some simple utilities which are mainly intended for testing
+ and to demonstrate how the DVB API works. This is called DVB v5
+ tools and are grouped together with the ``v4l-utils`` git repository:
+
+ https://git.linuxtv.org/v4l-utils.git/
+
+ You can find more information at the LinuxTV wiki:
+
+ https://www.linuxtv.org/wiki/index.php/DVBv5_Tools
+
+ The first step is to get a list of services that are transmitted.
+
+ This is done by using several existing tools. You can use
+ for example the ``dvbv5-scan`` tool. You can find more information
+ about it at:
+
+ https://www.linuxtv.org/wiki/index.php/Dvbv5-scan
+
+ There are some other applications like ``w_scan`` [#]_ that do a
+ blind scan, trying hard to find all possible channels, but
+ those consumes a large amount of time to run.
+
+ .. [#] https://www.linuxtv.org/wiki/index.php/W_scan
+
+ Also, some applications like ``kaffeine`` have their own code
+ to scan for services. So, you don't need to use an external
+ application to obtain such list.
+
+ Most of such tools need a file containing a list of channel
+ transponders available on your area. So, LinuxTV developers
+ maintain tables of Digital TV channel transponders, receiving
+ patches from the community to keep them updated.
+
+ This list is hosted at:
+
+ https://git.linuxtv.org/dtv-scan-tables.git
+
+ And packaged on several distributions.
+
+ Kaffeine has some blind scan support for some terrestrial standards.
+ It also relies on DTV scan tables, although it contains a copy
+ of it internally (and, if requested by the user, it will download
+ newer versions of it).
+
+ If you are lucky you can just use one of the supplied channel
+ transponders. If not, you may need to seek for such info at
+ the Internet and create a new file. There are several sites with
+ contains physical channel lists. For cable and satellite, usually
+ knowing how to tune into a single channel is enough for the
+ scanning tool to identify the other channels. On some places,
+ this could also work for terrestrial transmissions.
+
+ Once you have a transponders list, you need to generate a services
+ list with a tool like ``dvbv5-scan``.
+
+ Almost all modern Digital TV cards don't have built-in hardware
+ MPEG-decoders. So, it is up to the application to get a MPEG-TS
+ stream provided by the board, split it into audio, video and other
+ data and decode.
+
+3. Which Digital TV applications exist?
+
+ Several media player applications are capable of tuning into
+ digital TV channels, including Kaffeine, Vlc, mplayer and MythTV.
+
+ Kaffeine aims to be very user-friendly, and it is maintained
+ by one of the Kernel driver developers.
+
+ A comprehensive list of those and other apps can be found at:
+
+ https://www.linuxtv.org/wiki/index.php/TV_Related_Software
+
+ Some of the most popular ones are linked below:
+
+ https://kde.org/applications/multimedia/org.kde.kaffeine
+ KDE media player, focused on Digital TV support
+
+ https://www.linuxtv.org/vdrwiki/index.php/Main_Page
+ Klaus Schmidinger's Video Disk Recorder
+
+ https://linuxtv.org/downloads and https://git.linuxtv.org/
+ Digital TV and other media-related applications and
+ Kernel drivers. The ``v4l-utils`` package there contains
+ several swiss knife tools for using with Digital TV.
+
+ http://sourceforge.net/projects/dvbtools/
+ Dave Chapman's dvbtools package, including
+ dvbstream and dvbtune
+
+ http://www.dbox2.info/
+ LinuxDVB on the dBox2
+
+ http://www.tuxbox.org/
+ the TuxBox CVS many interesting DVB applications and the dBox2
+ DVB source
+
+ http://www.nenie.org/misc/mpsys/
+ MPSYS: a MPEG2 system library and tools
+
+ https://www.videolan.org/vlc/index.pt.html
+ Vlc
+
+ http://mplayerhq.hu/
+ MPlayer
+
+ http://xine.sourceforge.net/ and http://xinehq.de/
+ Xine
+
+ http://www.mythtv.org/
+ MythTV - analog TV and digital TV PVR
+
+ http://dvbsnoop.sourceforge.net/
+ DVB sniffer program to monitor, analyze, debug, dump
+ or view dvb/mpeg/dsm-cc/mhp stream information (TS,
+ PES, SECTION)
+
+4. Can't get a signal tuned correctly
+
+ That could be due to a lot of problems. On my personal experience,
+ usually TV cards need stronger signals than TV sets, and are more
+ sensitive to noise. So, perhaps you just need a better antenna or
+ cabling. Yet, it could also be some hardware or driver issue.
+
+ For example, if you are using a Technotrend/Hauppauge DVB-C card
+ *without* analog module, you might have to use module parameter
+ adac=-1 (dvb-ttpci.o).
+
+ Please see the FAQ page at linuxtv.org, as it could contain some
+ valuable information:
+
+ https://www.linuxtv.org/wiki/index.php/FAQ_%26_Troubleshooting
+
+ If that doesn't work, check at the linux-media ML archives, to
+ see if someone else had a similar problem with your hardware
+ and/or digital TV service provider:
+
+ https://lore.kernel.org/linux-media/
+
+ If none of this works, you can try sending an e-mail to the
+ linux-media ML and see if someone else could shed some light.
+ The e-mail is linux-media AT vger.kernel.org.
+
+5. The dvb_net device doesn't give me any packets at all
+
+ Run ``tcpdump`` on the ``dvb0_0`` interface. This sets the interface
+ into promiscuous mode so it accepts any packets from the PID
+ you have configured with the ``dvbnet`` utility. Check if there
+ are any packets with the IP addr and MAC addr you have
+ configured with ``ifconfig`` or with ``ip addr``.
+
+ If ``tcpdump`` doesn't give you any output, check the statistics
+ which ``ifconfig`` or ``netstat -ni`` outputs. (Note: If the MAC
+ address is wrong, ``dvb_net`` won't get any input; thus you have to
+ run ``tcpdump`` before checking the statistics.) If there are no
+ packets at all then maybe the PID is wrong. If there are error packets,
+ then either the PID is wrong or the stream does not conform to
+ the MPE standard (EN 301 192, http://www.etsi.org/). You can
+ use e.g. ``dvbsnoop`` for debugging.
+
+6. The ``dvb_net`` device doesn't give me any multicast packets
+
+ Check your routes if they include the multicast address range.
+ Additionally make sure that "source validation by reversed path
+ lookup" is disabled::
+
+ $ "echo 0 > /proc/sys/net/ipv4/conf/dvb0/rp_filter"
+
+7. What are all those modules that need to be loaded?
+
+ In order to make it more flexible and support different hardware
+ combinations, the media subsystem is written on a modular way.
+
+ So, besides the Digital TV hardware module for the main chipset,
+ it also needs to load a frontend driver, plus the Digital TV
+ core. If the board also has remote controller, it will also
+ need the remote controller core and the remote controller tables.
+ The same happens if the board has support for analog TV: the
+ core support for video4linux need to be loaded.
+
+ The actual module names are Linux-kernel version specific, as,
+ from time to time, things change, in order to make the media
+ support more flexible.
diff --git a/Documentation/admin-guide/media/fimc.rst b/Documentation/admin-guide/media/fimc.rst
new file mode 100644
index 000000000000..267ef52fe387
--- /dev/null
+++ b/Documentation/admin-guide/media/fimc.rst
@@ -0,0 +1,153 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: <isonum.txt>
+
+The Samsung S5P/Exynos4 FIMC driver
+===================================
+
+Copyright |copy| 2012 - 2013 Samsung Electronics Co., Ltd.
+
+The FIMC (Fully Interactive Mobile Camera) device available in Samsung
+SoC Application Processors is an integrated camera host interface, color
+space converter, image resizer and rotator. It's also capable of capturing
+data from LCD controller (FIMD) through the SoC internal writeback data
+path. There are multiple FIMC instances in the SoCs (up to 4), having
+slightly different capabilities, like pixel alignment constraints, rotator
+availability, LCD writeback support, etc. The driver is located at
+drivers/media/platform/samsung/exynos4-is directory.
+
+Supported SoCs
+--------------
+
+S5PC100 (mem-to-mem only), S5PV210, Exynos4210
+
+Supported features
+------------------
+
+- camera parallel interface capture (ITU-R.BT601/565);
+- camera serial interface capture (MIPI-CSI2);
+- memory-to-memory processing (color space conversion, scaling, mirror
+ and rotation);
+- dynamic pipeline re-configuration at runtime (re-attachment of any FIMC
+ instance to any parallel video input or any MIPI-CSI front-end);
+- runtime PM and system wide suspend/resume
+
+Not currently supported
+-----------------------
+
+- LCD writeback input
+- per frame clock gating (mem-to-mem)
+
+User space interfaces
+---------------------
+
+Media device interface
+~~~~~~~~~~~~~~~~~~~~~~
+
+The driver supports Media Controller API as defined at :ref:`media_controller`.
+The media device driver name is "Samsung S5P FIMC".
+
+The purpose of this interface is to allow changing assignment of FIMC instances
+to the SoC peripheral camera input at runtime and optionally to control internal
+connections of the MIPI-CSIS device(s) to the FIMC entities.
+
+The media device interface allows to configure the SoC for capturing image
+data from the sensor through more than one FIMC instance (e.g. for simultaneous
+viewfinder and still capture setup).
+
+Reconfiguration is done by enabling/disabling media links created by the driver
+during initialization. The internal device topology can be easily discovered
+through media entity and links enumeration.
+
+Memory-to-memory video node
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+V4L2 memory-to-memory interface at /dev/video? device node. This is standalone
+video device, it has no media pads. However please note the mem-to-mem and
+capture video node operation on same FIMC instance is not allowed. The driver
+detects such cases but the applications should prevent them to avoid an
+undefined behaviour.
+
+Capture video node
+~~~~~~~~~~~~~~~~~~
+
+The driver supports V4L2 Video Capture Interface as defined at
+:ref:`devices`.
+
+At the capture and mem-to-mem video nodes only the multi-planar API is
+supported. For more details see: :ref:`planar-apis`.
+
+Camera capture subdevs
+~~~~~~~~~~~~~~~~~~~~~~
+
+Each FIMC instance exports a sub-device node (/dev/v4l-subdev?), a sub-device
+node is also created per each available and enabled at the platform level
+MIPI-CSI receiver device (currently up to two).
+
+sysfs
+~~~~~
+
+In order to enable more precise camera pipeline control through the sub-device
+API the driver creates a sysfs entry associated with "s5p-fimc-md" platform
+device. The entry path is: /sys/platform/devices/s5p-fimc-md/subdev_conf_mode.
+
+In typical use case there could be a following capture pipeline configuration:
+sensor subdev -> mipi-csi subdev -> fimc subdev -> video node
+
+When we configure these devices through sub-device API at user space, the
+configuration flow must be from left to right, and the video node is
+configured as last one.
+
+When we don't use sub-device user space API the whole configuration of all
+devices belonging to the pipeline is done at the video node driver.
+The sysfs entry allows to instruct the capture node driver not to configure
+the sub-devices (format, crop), to avoid resetting the subdevs' configuration
+when the last configuration steps at the video node is performed.
+
+For full sub-device control support (subdevs configured at user space before
+starting streaming):
+
+.. code-block:: none
+
+ # echo "sub-dev" > /sys/platform/devices/s5p-fimc-md/subdev_conf_mode
+
+For V4L2 video node control only (subdevs configured internally by the host
+driver):
+
+.. code-block:: none
+
+ # echo "vid-dev" > /sys/platform/devices/s5p-fimc-md/subdev_conf_mode
+
+This is a default option.
+
+5. Device mapping to video and subdev device nodes
+--------------------------------------------------
+
+There are associated two video device nodes with each device instance in
+hardware - video capture and mem-to-mem and additionally a subdev node for
+more precise FIMC capture subsystem control. In addition a separate v4l2
+sub-device node is created per each MIPI-CSIS device.
+
+How to find out which /dev/video? or /dev/v4l-subdev? is assigned to which
+device?
+
+You can either grep through the kernel log to find relevant information, i.e.
+
+.. code-block:: none
+
+ # dmesg | grep -i fimc
+
+(note that udev, if present, might still have rearranged the video nodes),
+
+or retrieve the information from /dev/media? with help of the media-ctl tool:
+
+.. code-block:: none
+
+ # media-ctl -p
+
+7. Build
+--------
+
+If the driver is built as a loadable kernel module (CONFIG_VIDEO_SAMSUNG_S5P_FIMC=m)
+two modules are created (in addition to the core v4l2 modules): s5p-fimc.ko and
+optional s5p-csis.ko (MIPI-CSI receiver subdev).
diff --git a/Documentation/admin-guide/media/frontend-cardlist.rst b/Documentation/admin-guide/media/frontend-cardlist.rst
new file mode 100644
index 000000000000..ba5b7c69a978
--- /dev/null
+++ b/Documentation/admin-guide/media/frontend-cardlist.rst
@@ -0,0 +1,226 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================
+Frontend drivers
+================
+
+.. note::
+
+ #) There is no guarantee that every frontend driver works
+ out of the box with every card, because of different wiring.
+
+ #) The demodulator chips can be used with a variety of
+ tuner/PLL chips, and not all combinations are supported. Often
+ the demodulator and tuner/PLL chip are inside a metal box for
+ shielding, and the whole metal box has its own part number.
+
+
+Common Interface (EN50221) controller drivers
+=============================================
+
+============== =========================================================
+Driver Name
+============== =========================================================
+cxd2099 Sony CXD2099AR Common Interface driver
+sp2 CIMaX SP2
+============== =========================================================
+
+ATSC (North American/Korean Terrestrial/Cable DTV) frontends
+============================================================
+
+============== =========================================================
+Driver Name
+============== =========================================================
+au8522_dig Auvitek AU8522 based DTV demod
+au8522_decoder Auvitek AU8522 based ATV demod
+bcm3510 Broadcom BCM3510
+lg2160 LG Electronics LG216x based
+lgdt3305 LG Electronics LGDT3304 and LGDT3305 based
+lgdt3306a LG Electronics LGDT3306A based
+lgdt330x LG Electronics LGDT3302/LGDT3303 based
+nxt200x NxtWave Communications NXT2002/NXT2004 based
+or51132 Oren OR51132 based
+or51211 Oren OR51211 based
+s5h1409 Samsung S5H1409 based
+s5h1411 Samsung S5H1411 based
+============== =========================================================
+
+DVB-C (cable) frontends
+=======================
+
+============== =========================================================
+Driver Name
+============== =========================================================
+stv0297 ST STV0297 based
+tda10021 Philips TDA10021 based
+tda10023 Philips TDA10023 based
+ves1820 VLSI VES1820 based
+============== =========================================================
+
+DVB-S (satellite) frontends
+===========================
+
+============== =========================================================
+Driver Name
+============== =========================================================
+cx24110 Conexant CX24110 based
+cx24116 Conexant CX24116 based
+cx24117 Conexant CX24117 based
+cx24120 Conexant CX24120 based
+cx24123 Conexant CX24123 based
+ds3000 Montage Technology DS3000 based
+mb86a16 Fujitsu MB86A16 based
+mt312 Zarlink VP310/MT312/ZL10313 based
+s5h1420 Samsung S5H1420 based
+si21xx Silicon Labs SI21XX based
+stb6000 ST STB6000 silicon tuner
+stv0288 ST STV0288 based
+stv0299 ST STV0299 based
+stv0900 ST STV0900 based
+stv6110 ST STV6110 silicon tuner
+tda10071 NXP TDA10071
+tda10086 Philips TDA10086 based
+tda8083 Philips TDA8083 based
+tda8261 Philips TDA8261 based
+tda826x Philips TDA826X silicon tuner
+ts2020 Montage Technology TS2020 based tuners
+tua6100 Infineon TUA6100 PLL
+cx24113 Conexant CX24113/CX24128 tuner for DVB-S/DSS
+itd1000 Integrant ITD1000 Zero IF tuner for DVB-S/DSS
+ves1x93 VLSI VES1893 or VES1993 based
+zl10036 Zarlink ZL10036 silicon tuner
+zl10039 Zarlink ZL10039 silicon tuner
+============== =========================================================
+
+DVB-T (terrestrial) frontends
+=============================
+
+============== =========================================================
+Driver Name
+============== =========================================================
+af9013 Afatech AF9013 demodulator
+cx22700 Conexant CX22700 based
+cx22702 Conexant cx22702 demodulator (OFDM)
+cxd2820r Sony CXD2820R
+cxd2841er Sony CXD2841ER
+cxd2880 Sony CXD2880 DVB-T2/T tuner + demodulator
+dib3000mb DiBcom 3000M-B
+dib3000mc DiBcom 3000P/M-C
+dib7000m DiBcom 7000MA/MB/PA/PB/MC
+dib7000p DiBcom 7000PC
+dib9000 DiBcom 9000
+drxd Micronas DRXD driver
+ec100 E3C EC100
+l64781 LSI L64781
+mt352 Zarlink MT352 based
+nxt6000 NxtWave Communications NXT6000 based
+rtl2830 Realtek RTL2830 DVB-T
+rtl2832 Realtek RTL2832 DVB-T
+rtl2832_sdr Realtek RTL2832 SDR
+s5h1432 Samsung s5h1432 demodulator (OFDM)
+si2168 Silicon Labs Si2168
+sp8870 Spase sp8870 based
+sp887x Spase sp887x based
+stv0367 ST STV0367 based
+tda10048 Philips TDA10048HN based
+tda1004x Philips TDA10045H/TDA10046H based
+zd1301_demod ZyDAS ZD1301
+zl10353 Zarlink ZL10353 based
+============== =========================================================
+
+Digital terrestrial only tuners/PLL
+===================================
+
+============== =========================================================
+Driver Name
+============== =========================================================
+dvb-pll Generic I2C PLL based tuners
+dib0070 DiBcom DiB0070 silicon base-band tuner
+dib0090 DiBcom DiB0090 silicon base-band tuner
+============== =========================================================
+
+ISDB-S (satellite) & ISDB-T (terrestrial) frontends
+===================================================
+
+============== =========================================================
+Driver Name
+============== =========================================================
+mn88443x Socionext MN88443x
+tc90522 Toshiba TC90522
+============== =========================================================
+
+ISDB-T (terrestrial) frontends
+==============================
+
+============== =========================================================
+Driver Name
+============== =========================================================
+dib8000 DiBcom 8000MB/MC
+mb86a20s Fujitsu mb86a20s
+s921 Sharp S921 frontend
+============== =========================================================
+
+Multistandard (cable + terrestrial) frontends
+=============================================
+
+============== =========================================================
+Driver Name
+============== =========================================================
+drxk Micronas DRXK based
+mn88472 Panasonic MN88472
+mn88473 Panasonic MN88473
+si2165 Silicon Labs si2165 based
+tda18271c2dd NXP TDA18271C2 silicon tuner
+============== =========================================================
+
+Multistandard (satellite) frontends
+===================================
+
+============== =========================================================
+Driver Name
+============== =========================================================
+m88ds3103 Montage Technology M88DS3103
+mxl5xx MaxLinear MxL5xx based tuner-demodulators
+stb0899 STB0899 based
+stb6100 STB6100 based tuners
+stv090x STV0900/STV0903(A/B) based
+stv0910 STV0910 based
+stv6110x STV6110/(A) based tuners
+stv6111 STV6111 based tuners
+============== =========================================================
+
+SEC control devices for DVB-S
+=============================
+
+============== =========================================================
+Driver Name
+============== =========================================================
+a8293 Allegro A8293
+af9033 Afatech AF9033 DVB-T demodulator
+ascot2e Sony Ascot2E tuner
+atbm8830 AltoBeam ATBM8830/8831 DMB-TH demodulator
+drx39xyj Micronas DRX-J demodulator
+helene Sony HELENE Sat/Ter tuner (CXD2858ER)
+horus3a Sony Horus3A tuner
+isl6405 ISL6405 SEC controller
+isl6421 ISL6421 SEC controller
+isl6423 ISL6423 SEC controller
+ix2505v Sharp IX2505V silicon tuner
+lgs8gl5 Silicon Legend LGS-8GL5 demodulator (OFDM)
+lgs8gxx Legend Silicon LGS8913/LGS8GL5/LGS8GXX DMB-TH demodulator
+lnbh25 LNBH25 SEC controller
+lnbh29 LNBH29 SEC controller
+lnbp21 LNBP21/LNBH24 SEC controllers
+lnbp22 LNBP22 SEC controllers
+m88rs2000 M88RS2000 DVB-S demodulator and tuner
+tda665x TDA665x tuner
+============== =========================================================
+
+Tools to develop new frontends
+==============================
+
+============== =========================================================
+Driver Name
+============== =========================================================
+dvb_dummy_fe Dummy frontend driver
+============== =========================================================
diff --git a/Documentation/admin-guide/media/gspca-cardlist.rst b/Documentation/admin-guide/media/gspca-cardlist.rst
new file mode 100644
index 000000000000..e3404d1589da
--- /dev/null
+++ b/Documentation/admin-guide/media/gspca-cardlist.rst
@@ -0,0 +1,451 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+The gspca cards list
+====================
+
+The modules for the gspca webcam drivers are:
+
+- gspca_main: main driver
+- gspca\_\ *driver*: subdriver module with *driver* as follows
+
+========= ========= ===================================================================
+*driver* vend:prod Device
+========= ========= ===================================================================
+spca501 0000:0000 MystFromOri Unknown Camera
+spca508 0130:0130 Clone Digital Webcam 11043
+se401 03e8:0004 Endpoints/AoxSE401
+zc3xx 03f0:1b07 HP Premium Starter Cam
+m5602 0402:5602 ALi Video Camera Controller
+spca501 040a:0002 Kodak DVC-325
+spca500 040a:0300 Kodak EZ200
+zc3xx 041e:041e Creative WebCam Live!
+ov519 041e:4003 Video Blaster WebCam Go Plus
+stv0680 041e:4007 Go Mini
+spca500 041e:400a Creative PC-CAM 300
+sunplus 041e:400b Creative PC-CAM 600
+sunplus 041e:4012 PC-Cam350
+sunplus 041e:4013 Creative Pccam750
+zc3xx 041e:4017 Creative Webcam Mobile PD1090
+spca508 041e:4018 Creative Webcam Vista (PD1100)
+spca561 041e:401a Creative Webcam Vista (PD1100)
+zc3xx 041e:401c Creative NX
+spca505 041e:401d Creative Webcam NX ULTRA
+zc3xx 041e:401e Creative Nx Pro
+zc3xx 041e:401f Creative Webcam Notebook PD1171
+zc3xx 041e:4022 Webcam NX Pro
+pac207 041e:4028 Creative Webcam Vista Plus
+zc3xx 041e:4029 Creative WebCam Vista Pro
+zc3xx 041e:4034 Creative Instant P0620
+zc3xx 041e:4035 Creative Instant P0620D
+zc3xx 041e:4036 Creative Live !
+sq930x 041e:4038 Creative Joy-IT
+zc3xx 041e:403a Creative Nx Pro 2
+spca561 041e:403b Creative Webcam Vista (VF0010)
+sq930x 041e:403c Creative Live! Ultra
+sq930x 041e:403d Creative Live! Ultra for Notebooks
+sq930x 041e:4041 Creative Live! Motion
+zc3xx 041e:4051 Creative Live!Cam Notebook Pro (VF0250)
+ov519 041e:4052 Creative Live! VISTA IM
+zc3xx 041e:4053 Creative Live!Cam Video IM
+vc032x 041e:405b Creative Live! Cam Notebook Ultra (VC0130)
+ov519 041e:405f Creative Live! VISTA VF0330
+ov519 041e:4060 Creative Live! VISTA VF0350
+ov519 041e:4061 Creative Live! VISTA VF0400
+ov519 041e:4064 Creative Live! VISTA VF0420
+ov519 041e:4067 Creative Live! Cam Video IM (VF0350)
+ov519 041e:4068 Creative Live! VISTA VF0470
+sn9c2028 0458:7003 GeniusVideocam Live v2
+spca561 0458:7004 Genius VideoCAM Express V2
+sn9c2028 0458:7005 Genius Smart 300, version 2
+sunplus 0458:7006 Genius Dsc 1.3 Smart
+zc3xx 0458:7007 Genius VideoCam V2
+zc3xx 0458:700c Genius VideoCam V3
+zc3xx 0458:700f Genius VideoCam Web V2
+sonixj 0458:7025 Genius Eye 311Q
+sn9c20x 0458:7029 Genius Look 320s
+sonixj 0458:702e Genius Slim 310 NB
+sn9c20x 0458:7045 Genius Look 1320 V2
+sn9c20x 0458:704a Genius Slim 1320
+sn9c20x 0458:704c Genius i-Look 1321
+sn9c20x 045e:00f4 LifeCam VX-6000 (SN9C20x + OV9650)
+sonixj 045e:00f5 MicroSoft VX3000
+sonixj 045e:00f7 MicroSoft VX1000
+ov519 045e:028c Micro$oft xbox cam
+kinect 045e:02ae Xbox NUI Camera
+kinect 045e:02bf Kinect for Windows NUI Camera
+spca561 0461:0815 Micro Innovations IC200 Webcam
+sunplus 0461:0821 Fujifilm MV-1
+zc3xx 0461:0a00 MicroInnovation WebCam320
+stv06xx 046D:08F0 QuickCamMessenger
+stv06xx 046D:08F5 QuickCamCommunicate
+stv06xx 046D:08F6 QuickCamMessenger (new)
+stv06xx 046d:0840 QuickCamExpress
+stv06xx 046d:0850 LEGOcam / QuickCam Web
+stv06xx 046d:0870 DexxaWebCam USB
+spca500 046d:0890 Logitech QuickCam traveler
+vc032x 046d:0892 Logitech Orbicam
+vc032x 046d:0896 Logitech Orbicam
+vc032x 046d:0897 Logitech QuickCam for Dell notebooks
+zc3xx 046d:089d Logitech QuickCam E2500
+zc3xx 046d:08a0 Logitech QC IM
+zc3xx 046d:08a1 Logitech QC IM 0x08A1 +sound
+zc3xx 046d:08a2 Labtec Webcam Pro
+zc3xx 046d:08a3 Logitech QC Chat
+zc3xx 046d:08a6 Logitech QCim
+zc3xx 046d:08a7 Logitech QuickCam Image
+zc3xx 046d:08a9 Logitech Notebook Deluxe
+zc3xx 046d:08aa Labtec Webcam Notebook
+zc3xx 046d:08ac Logitech QuickCam Cool
+zc3xx 046d:08ad Logitech QCCommunicate STX
+zc3xx 046d:08ae Logitech QuickCam for Notebooks
+zc3xx 046d:08af Logitech QuickCam Cool
+zc3xx 046d:08b9 Logitech QuickCam Express
+zc3xx 046d:08d7 Logitech QCam STX
+zc3xx 046d:08d8 Logitech Notebook Deluxe
+zc3xx 046d:08d9 Logitech QuickCam IM/Connect
+zc3xx 046d:08da Logitech QuickCam Messenger
+zc3xx 046d:08dd Logitech QuickCam for Notebooks
+spca500 046d:0900 Logitech Inc. ClickSmart 310
+spca500 046d:0901 Logitech Inc. ClickSmart 510
+sunplus 046d:0905 Logitech ClickSmart 820
+tv8532 046d:0920 Logitech QuickCam Express
+tv8532 046d:0921 Labtec Webcam
+spca561 046d:0928 Logitech QC Express Etch2
+spca561 046d:0929 Labtec Webcam Elch2
+spca561 046d:092a Logitech QC for Notebook
+spca561 046d:092b Labtec Webcam Plus
+spca561 046d:092c Logitech QC chat Elch2
+spca561 046d:092d Logitech QC Elch2
+spca561 046d:092e Logitech QC Elch2
+spca561 046d:092f Logitech QuickCam Express Plus
+sunplus 046d:0960 Logitech ClickSmart 420
+nw80x 046d:d001 Logitech QuickCam Pro (dark focus ring)
+se401 0471:030b PhilipsPCVC665K
+sunplus 0471:0322 Philips DMVC1300K
+zc3xx 0471:0325 Philips SPC 200 NC
+zc3xx 0471:0326 Philips SPC 300 NC
+sonixj 0471:0327 Philips SPC 600 NC
+sonixj 0471:0328 Philips SPC 700 NC
+zc3xx 0471:032d Philips SPC 210 NC
+zc3xx 0471:032e Philips SPC 315 NC
+sonixj 0471:0330 Philips SPC 710 NC
+se401 047d:5001 Kensington67014
+se401 047d:5002 Kensington6701(5/7)
+se401 047d:5003 Kensington67016
+spca501 0497:c001 Smile International
+sunplus 04a5:3003 Benq DC 1300
+sunplus 04a5:3008 Benq DC 1500
+sunplus 04a5:300a Benq DC 3410
+spca500 04a5:300c Benq DC 1016
+benq 04a5:3035 Benq DC E300
+vicam 04c1:009d HomeConnect Webcam [vicam]
+konica 04c8:0720 IntelYC 76
+finepix 04cb:0104 Fujifilm FinePix 4800
+finepix 04cb:0109 Fujifilm FinePix A202
+finepix 04cb:010b Fujifilm FinePix A203
+finepix 04cb:010f Fujifilm FinePix A204
+finepix 04cb:0111 Fujifilm FinePix A205
+finepix 04cb:0113 Fujifilm FinePix A210
+finepix 04cb:0115 Fujifilm FinePix A303
+finepix 04cb:0117 Fujifilm FinePix A310
+finepix 04cb:0119 Fujifilm FinePix F401
+finepix 04cb:011b Fujifilm FinePix F402
+finepix 04cb:011d Fujifilm FinePix F410
+finepix 04cb:0121 Fujifilm FinePix F601
+finepix 04cb:0123 Fujifilm FinePix F700
+finepix 04cb:0125 Fujifilm FinePix M603
+finepix 04cb:0127 Fujifilm FinePix S300
+finepix 04cb:0129 Fujifilm FinePix S304
+finepix 04cb:012b Fujifilm FinePix S500
+finepix 04cb:012d Fujifilm FinePix S602
+finepix 04cb:012f Fujifilm FinePix S700
+finepix 04cb:0131 Fujifilm FinePix unknown model
+finepix 04cb:013b Fujifilm FinePix unknown model
+finepix 04cb:013d Fujifilm FinePix unknown model
+finepix 04cb:013f Fujifilm FinePix F420
+sunplus 04f1:1001 JVC GC A50
+spca561 04fc:0561 Flexcam 100
+spca1528 04fc:1528 Sunplus MD80 clone
+sunplus 04fc:500c Sunplus CA500C
+sunplus 04fc:504a Aiptek Mini PenCam 1.3
+sunplus 04fc:504b Maxell MaxPocket LE 1.3
+sunplus 04fc:5330 Digitrex 2110
+sunplus 04fc:5360 Sunplus Generic
+spca500 04fc:7333 PalmPixDC85
+sunplus 04fc:ffff Pure DigitalDakota
+nw80x 0502:d001 DVC V6
+spca501 0506:00df 3Com HomeConnect Lite
+sunplus 052b:1507 Megapixel 5 Pretec DC-1007
+sunplus 052b:1513 Megapix V4
+sunplus 052b:1803 MegaImage VI
+nw80x 052b:d001 EZCam Pro p35u
+tv8532 0545:808b Veo Stingray
+tv8532 0545:8333 Veo Stingray
+sunplus 0546:3155 Polaroid PDC3070
+sunplus 0546:3191 Polaroid Ion 80
+sunplus 0546:3273 Polaroid PDC2030
+touptek 0547:6801 TTUCMOS08000KPB, AS MU800
+dtcs033 0547:7303 Anchor Chips, Inc
+ov519 054c:0154 Sonny toy4
+ov519 054c:0155 Sonny toy5
+cpia1 0553:0002 CPIA CPiA (version1) based cameras
+stv0680 0553:0202 STV0680 Camera
+zc3xx 055f:c005 Mustek Wcam300A
+spca500 055f:c200 Mustek Gsmart 300
+sunplus 055f:c211 Kowa Bs888e Microcamera
+spca500 055f:c220 Gsmart Mini
+sunplus 055f:c230 Mustek Digicam 330K
+sunplus 055f:c232 Mustek MDC3500
+sunplus 055f:c360 Mustek DV4000 Mpeg4
+sunplus 055f:c420 Mustek gSmart Mini 2
+sunplus 055f:c430 Mustek Gsmart LCD 2
+sunplus 055f:c440 Mustek DV 3000
+sunplus 055f:c520 Mustek gSmart Mini 3
+sunplus 055f:c530 Mustek Gsmart LCD 3
+sunplus 055f:c540 Gsmart D30
+sunplus 055f:c630 Mustek MDC4000
+sunplus 055f:c650 Mustek MDC5500Z
+nw80x 055f:d001 Mustek Wcam 300 mini
+zc3xx 055f:d003 Mustek WCam300A
+zc3xx 055f:d004 Mustek WCam300 AN
+conex 0572:0041 Creative Notebook cx11646
+ov519 05a9:0511 Video Blaster WebCam 3/WebCam Plus, D-Link USB Digital Video Camera
+ov519 05a9:0518 Creative WebCam
+ov519 05a9:0519 OV519 Microphone
+ov519 05a9:0530 OmniVision
+ov534_9 05a9:1550 OmniVision VEHO Filmscanner
+ov519 05a9:2800 OmniVision SuperCAM
+ov519 05a9:4519 Webcam Classic
+ov534_9 05a9:8065 OmniVision test kit ov538+ov9712
+ov519 05a9:8519 OmniVision
+ov519 05a9:a511 D-Link USB Digital Video Camera
+ov519 05a9:a518 D-Link DSB-C310 Webcam
+sunplus 05da:1018 Digital Dream Enigma 1.3
+stk014 05e1:0893 Syntek DV4000
+gl860 05e3:0503 Genesys Logic PC Camera
+gl860 05e3:f191 Genesys Logic PC Camera
+vicam 0602:1001 ViCam Webcam
+spca561 060b:a001 Maxell Compact Pc PM3
+zc3xx 0698:2003 CTX M730V built in
+topro 06a2:0003 TP6800 PC Camera, CmoX CX0342 webcam
+topro 06a2:6810 Creative Qmax
+nw80x 06a5:0000 Typhoon Webcam 100 USB
+nw80x 06a5:d001 Divio based webcams
+nw80x 06a5:d800 Divio Chicony TwinkleCam, Trust SpaceCam
+spca500 06bd:0404 Agfa CL20
+spca500 06be:0800 Optimedia
+nw80x 06be:d001 EZCam Pro p35u
+sunplus 06d6:0031 Trust 610 LCD PowerC@m Zoom
+sunplus 06d6:0041 Aashima Technology B.V.
+spca506 06e1:a190 ADS Instant VCD
+ov534 06f8:3002 Hercules Blog Webcam
+ov534_9 06f8:3003 Hercules Dualpix HD Weblog
+sonixj 06f8:3004 Hercules Classic Silver
+sonixj 06f8:3008 Hercules Deluxe Optical Glass
+pac7302 06f8:3009 Hercules Classic Link
+pac7302 06f8:301b Hercules Link
+nw80x 0728:d001 AVerMedia Camguard
+spca508 0733:0110 ViewQuest VQ110
+spca501 0733:0401 Intel Create and Share
+spca501 0733:0402 ViewQuest M318B
+spca505 0733:0430 Intel PC Camera Pro
+sunplus 0733:1311 Digital Dream Epsilon 1.3
+sunplus 0733:1314 Mercury 2.1MEG Deluxe Classic Cam
+sunplus 0733:2211 Jenoptik jdc 21 LCD
+sunplus 0733:2221 Mercury Digital Pro 3.1p
+sunplus 0733:3261 Concord 3045 spca536a
+sunplus 0733:3281 Cyberpix S550V
+spca506 0734:043b 3DeMon USB Capture aka
+cpia1 0813:0001 QX3 camera
+ov519 0813:0002 Dual Mode USB Camera Plus
+spca500 084d:0003 D-Link DSC-350
+spca500 08ca:0103 Aiptek PocketDV
+sunplus 08ca:0104 Aiptek PocketDVII 1.3
+sunplus 08ca:0106 Aiptek Pocket DV3100+
+mr97310a 08ca:0110 Trust Spyc@m 100
+mr97310a 08ca:0111 Aiptek PenCam VGA+
+sunplus 08ca:2008 Aiptek Mini PenCam 2 M
+sunplus 08ca:2010 Aiptek PocketCam 3M
+sunplus 08ca:2016 Aiptek PocketCam 2 Mega
+sunplus 08ca:2018 Aiptek Pencam SD 2M
+sunplus 08ca:2020 Aiptek Slim 3000F
+sunplus 08ca:2022 Aiptek Slim 3200
+sunplus 08ca:2024 Aiptek DV3500 Mpeg4
+sunplus 08ca:2028 Aiptek PocketCam4M
+sunplus 08ca:2040 Aiptek PocketDV4100M
+sunplus 08ca:2042 Aiptek PocketDV5100
+sunplus 08ca:2050 Medion MD 41437
+sunplus 08ca:2060 Aiptek PocketDV5300
+tv8532 0923:010f ICM532 cams
+mr97310a 093a:010e All known CIF cams with this ID
+mr97310a 093a:010f All known VGA cams with this ID
+mars 093a:050f Mars-Semi Pc-Camera
+pac207 093a:2460 Qtec Webcam 100
+pac207 093a:2461 HP Webcam
+pac207 093a:2463 Philips SPC 220 NC
+pac207 093a:2464 Labtec Webcam 1200
+pac207 093a:2468 Webcam WB-1400T
+pac207 093a:2470 Genius GF112
+pac207 093a:2471 Genius VideoCam ge111
+pac207 093a:2472 Genius VideoCam ge110
+pac207 093a:2474 Genius iLook 111
+pac207 093a:2476 Genius e-Messenger 112
+pac7311 093a:2600 PAC7311 Typhoon
+pac7311 093a:2601 Philips SPC 610 NC
+pac7311 093a:2603 Philips SPC 500 NC
+pac7311 093a:2608 Trust WB-3300p
+pac7311 093a:260e Gigaware VGA PC Camera, Trust WB-3350p, SIGMA cam 2350
+pac7311 093a:260f SnakeCam
+pac7302 093a:2620 Apollo AC-905
+pac7302 093a:2621 PAC731x
+pac7302 093a:2622 Genius Eye 312
+pac7302 093a:2623 Pixart Imaging, Inc.
+pac7302 093a:2624 PAC7302
+pac7302 093a:2625 Genius iSlim 310
+pac7302 093a:2626 Labtec 2200
+pac7302 093a:2627 Genius FaceCam 300
+pac7302 093a:2628 Genius iLook 300
+pac7302 093a:2629 Genius iSlim 300
+pac7302 093a:262a Webcam 300k
+pac7302 093a:262c Philips SPC 230 NC
+jl2005bcd 0979:0227 Various brands, 19 known cameras supported
+jeilinj 0979:0270 Sakar 57379
+jeilinj 0979:0280 Sportscam DV15, Sakar 57379
+zc3xx 0ac8:0301 Web Camera
+zc3xx 0ac8:0302 Z-star Vimicro zc0302
+vc032x 0ac8:0321 Vimicro generic vc0321
+vc032x 0ac8:0323 Vimicro Vc0323
+vc032x 0ac8:0328 A4Tech PK-130MG
+zc3xx 0ac8:301b Z-Star zc301b
+zc3xx 0ac8:303b Vimicro 0x303b
+zc3xx 0ac8:305b Z-star Vimicro zc0305b
+zc3xx 0ac8:307b PC Camera (ZS0211)
+vc032x 0ac8:c001 Sony embedded vimicro
+vc032x 0ac8:c002 Sony embedded vimicro
+vc032x 0ac8:c301 Samsung Q1 Ultra Premium
+spca508 0af9:0010 Hama USB Sightcam 100
+spca508 0af9:0011 Hama USB Sightcam 100
+ov519 0b62:0059 iBOT2 Webcam
+sonixb 0c45:6001 Genius VideoCAM NB
+sonixb 0c45:6005 Microdia Sweex Mini Webcam
+sonixb 0c45:6007 Sonix sn9c101 + Tas5110D
+sonixb 0c45:6009 spcaCam@120
+sonixb 0c45:600d spcaCam@120
+sonixb 0c45:6011 Microdia PC Camera (SN9C102)
+sonixb 0c45:6019 Generic Sonix OV7630
+sonixb 0c45:6024 Generic Sonix Tas5130c
+sonixb 0c45:6025 Xcam Shanga
+sonixb 0c45:6027 GeniusEye 310
+sonixb 0c45:6028 Sonix Btc Pc380
+sonixb 0c45:6029 spcaCam@150
+sonixb 0c45:602a Meade ETX-105EC Camera
+sonixb 0c45:602c Generic Sonix OV7630
+sonixb 0c45:602d LIC-200 LG
+sonixb 0c45:602e Genius VideoCam Messenger
+sonixj 0c45:6040 Speed NVC 350K
+sonixj 0c45:607c Sonix sn9c102p Hv7131R
+sonixb 0c45:6083 VideoCAM Look
+sonixb 0c45:608c VideoCAM Look
+sonixb 0c45:608f PC Camera (SN9C103 + OV7630)
+sonixb 0c45:60a8 VideoCAM Look
+sonixb 0c45:60aa VideoCAM Look
+sonixb 0c45:60af VideoCAM Look
+sonixb 0c45:60b0 Genius VideoCam Look
+sonixj 0c45:60c0 Sangha Sn535
+sonixj 0c45:60ce USB-PC-Camera-168 (TALK-5067)
+sonixj 0c45:60ec SN9C105+MO4000
+sonixj 0c45:60fb Surfer NoName
+sonixj 0c45:60fc LG-LIC300
+sonixj 0c45:60fe Microdia Audio
+sonixj 0c45:6100 PC Camera (SN9C128)
+sonixj 0c45:6102 PC Camera (SN9C128)
+sonixj 0c45:610a PC Camera (SN9C128)
+sonixj 0c45:610b PC Camera (SN9C128)
+sonixj 0c45:610c PC Camera (SN9C128)
+sonixj 0c45:610e PC Camera (SN9C128)
+sonixj 0c45:6128 Microdia/Sonix SNP325
+sonixj 0c45:612a Avant Camera
+sonixj 0c45:612b Speed-Link REFLECT2
+sonixj 0c45:612c Typhoon Rasy Cam 1.3MPix
+sonixj 0c45:612e PC Camera (SN9C110)
+sonixj 0c45:6130 Sonix Pccam
+sonixj 0c45:6138 Sn9c120 Mo4000
+sonixj 0c45:613a Microdia Sonix PC Camera
+sonixj 0c45:613b Surfer SN-206
+sonixj 0c45:613c Sonix Pccam168
+sonixj 0c45:613e PC Camera (SN9C120)
+sonixj 0c45:6142 Hama PC-Webcam AC-150
+sonixj 0c45:6143 Sonix Pccam168
+sonixj 0c45:6148 Digitus DA-70811/ZSMC USB PC Camera ZS211/Microdia
+sonixj 0c45:614a Frontech E-Ccam (JIL-2225)
+sn9c20x 0c45:6240 PC Camera (SN9C201 + MT9M001)
+sn9c20x 0c45:6242 PC Camera (SN9C201 + MT9M111)
+sn9c20x 0c45:6248 PC Camera (SN9C201 + OV9655)
+sn9c20x 0c45:624c PC Camera (SN9C201 + MT9M112)
+sn9c20x 0c45:624e PC Camera (SN9C201 + SOI968)
+sn9c20x 0c45:624f PC Camera (SN9C201 + OV9650)
+sn9c20x 0c45:6251 PC Camera (SN9C201 + OV9650)
+sn9c20x 0c45:6253 PC Camera (SN9C201 + OV9650)
+sn9c20x 0c45:6260 PC Camera (SN9C201 + OV7670)
+sn9c20x 0c45:6270 PC Camera (SN9C201 + MT9V011/MT9V111/MT9V112)
+sn9c20x 0c45:627b PC Camera (SN9C201 + OV7660)
+sn9c20x 0c45:627c PC Camera (SN9C201 + HV7131R)
+sn9c20x 0c45:627f PC Camera (SN9C201 + OV9650)
+sn9c20x 0c45:6280 PC Camera (SN9C202 + MT9M001)
+sn9c20x 0c45:6282 PC Camera (SN9C202 + MT9M111)
+sn9c20x 0c45:6288 PC Camera (SN9C202 + OV9655)
+sn9c20x 0c45:628c PC Camera (SN9C201 + MT9M112)
+sn9c20x 0c45:628e PC Camera (SN9C202 + SOI968)
+sn9c20x 0c45:628f PC Camera (SN9C202 + OV9650)
+sn9c20x 0c45:62a0 PC Camera (SN9C202 + OV7670)
+sn9c20x 0c45:62b0 PC Camera (SN9C202 + MT9V011/MT9V111/MT9V112)
+sn9c20x 0c45:62b3 PC Camera (SN9C202 + OV9655)
+sn9c20x 0c45:62bb PC Camera (SN9C202 + OV7660)
+sn9c20x 0c45:62bc PC Camera (SN9C202 + HV7131R)
+sn9c2028 0c45:8001 Wild Planet Digital Spy Camera
+sn9c2028 0c45:8003 Sakar #11199, #6637x, #67480 keychain cams
+sn9c2028 0c45:8008 Mini-Shotz ms-350
+sn9c2028 0c45:800a Vivitar Vivicam 3350B
+sunplus 0d64:0303 Sunplus FashionCam DXG
+ov519 0e96:c001 TRUST 380 USB2 SPACEC@M
+etoms 102c:6151 Qcam Sangha CIF
+etoms 102c:6251 Qcam xxxxxx VGA
+ov519 1046:9967 W9967CF/W9968CF WebCam IC, Video Blaster WebCam Go
+zc3xx 10fd:0128 Typhoon Webshot II USB 300k 0x0128
+spca561 10fd:7e50 FlyCam Usb 100
+zc3xx 10fd:804d Typhoon Webshot II Webcam [zc0301]
+zc3xx 10fd:8050 Typhoon Webshot II USB 300k
+ov534 1415:2000 Sony HD Eye for PS3 (SLEH 00201)
+pac207 145f:013a Trust WB-1300N
+pac7302 145f:013c Trust
+sn9c20x 145f:013d Trust WB-3600R
+vc032x 15b8:6001 HP 2.0 Megapixel
+vc032x 15b8:6002 HP 2.0 Megapixel rz406aa
+stk1135 174f:6a31 ASUSlaptop, MT9M112 sensor
+spca501 1776:501c Arowana 300K CMOS Camera
+t613 17a1:0128 TASCORP JPEG Webcam, NGS Cyclops
+vc032x 17ef:4802 Lenovo Vc0323+MI1310_SOC
+pac7302 1ae7:2001 SpeedLinkSnappy Mic SL-6825-SBK
+pac207 2001:f115 D-Link DSB-C120
+sq905c 2770:9050 Disney pix micro (CIF)
+sq905c 2770:9051 Lego Bionicle
+sq905c 2770:9052 Disney pix micro 2 (VGA)
+sq905c 2770:905c All 11 known cameras with this ID
+sq905 2770:9120 All 24 known cameras with this ID
+sq905c 2770:913d All 4 known cameras with this ID
+sq930x 2770:930b Sweex Motion Tracking / I-Tec iCam Tracer
+sq930x 2770:930c Trust WB-3500T / NSG Robbie 2.0
+spca500 2899:012c Toptro Industrial
+ov519 8020:ef04 ov519
+spca508 8086:0110 Intel Easy PC Camera
+spca500 8086:0630 Intel Pocket PC Camera
+spca506 99fa:8988 Grandtec V.cap
+sn9c20x a168:0610 Dino-Lite Digital Microscope (SN9C201 + HV7131R)
+sn9c20x a168:0611 Dino-Lite Digital Microscope (SN9C201 + HV7131R)
+sn9c20x a168:0613 Dino-Lite Digital Microscope (SN9C201 + HV7131R)
+sn9c20x a168:0614 Dino-Lite Digital Microscope (SN9C201 + MT9M111)
+sn9c20x a168:0615 Dino-Lite Digital Microscope (SN9C201 + MT9M111)
+sn9c20x a168:0617 Dino-Lite Digital Microscope (SN9C201 + MT9M111)
+sn9c20x a168:0618 Dino-Lite Digital Microscope (SN9C201 + HV7131R)
+spca561 abcd:cdee Petcam
+========= ========= ===================================================================
diff --git a/Documentation/admin-guide/media/i2c-cardlist.rst b/Documentation/admin-guide/media/i2c-cardlist.rst
new file mode 100644
index 000000000000..1825a0bb47bd
--- /dev/null
+++ b/Documentation/admin-guide/media/i2c-cardlist.rst
@@ -0,0 +1,288 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+I²C drivers
+===========
+
+The I²C (Inter-Integrated Circuit) bus is a three-wires bus used internally
+at the media cards for communication between different chips. While the bus
+is not visible to the Linux Kernel, drivers need to send and receive
+commands via the bus. The Linux Kernel driver abstraction has support to
+implement different drivers for each component inside an I²C bus, as if
+the bus were visible to the main system board.
+
+One of the problems with I²C devices is that sometimes the same device may
+work with different I²C hardware. This is common, for example, on devices
+that comes with a tuner for North America market, and another one for
+Europe. Some drivers have a ``tuner=`` modprobe parameter to allow using a
+different tuner number in order to address such issue.
+
+The current supported of I²C drivers (not including staging drivers) are
+listed below.
+
+Audio decoders, processors and mixers
+-------------------------------------
+
+============ ==========================================================
+Driver Name
+============ ==========================================================
+cs3308 Cirrus Logic CS3308 audio ADC
+cs5345 Cirrus Logic CS5345 audio ADC
+cs53l32a Cirrus Logic CS53L32A audio ADC
+msp3400 Micronas MSP34xx audio decoders
+sony-btf-mpx Sony BTF's internal MPX
+tda1997x NXP TDA1997x HDMI receiver
+tda7432 Philips TDA7432 audio processor
+tda9840 Philips TDA9840 audio processor
+tea6415c Philips TEA6415C audio processor
+tea6420 Philips TEA6420 audio processor
+tlv320aic23b Texas Instruments TLV320AIC23B audio codec
+tvaudio Simple audio decoder chips
+uda1342 Philips UDA1342 audio codec
+vp27smpx Panasonic VP27's internal MPX
+wm8739 Wolfson Microelectronics WM8739 stereo audio ADC
+wm8775 Wolfson Microelectronics WM8775 audio ADC with input mixer
+============ ==========================================================
+
+Audio/Video compression chips
+-----------------------------
+
+============ ==========================================================
+Driver Name
+============ ==========================================================
+saa6752hs Philips SAA6752HS MPEG-2 Audio/Video Encoder
+============ ==========================================================
+
+Camera sensor devices
+---------------------
+
+============ ==========================================================
+Driver Name
+============ ==========================================================
+ccs MIPI CCS compliant camera sensors (also SMIA++ and SMIA)
+et8ek8 ET8EK8 camera sensor
+hi556 Hynix Hi-556 sensor
+hi846 Hynix Hi-846 sensor
+imx208 Sony IMX208 sensor
+imx214 Sony IMX214 sensor
+imx219 Sony IMX219 sensor
+imx258 Sony IMX258 sensor
+imx274 Sony IMX274 sensor
+imx290 Sony IMX290 sensor
+imx319 Sony IMX319 sensor
+imx334 Sony IMX334 sensor
+imx355 Sony IMX355 sensor
+imx412 Sony IMX412 sensor
+mt9m001 mt9m001
+mt9m111 mt9m111, mt9m112 and mt9m131
+mt9p031 Aptina MT9P031
+mt9t112 Aptina MT9T111/MT9T112
+mt9v011 Micron mt9v011 sensor
+mt9v032 Micron MT9V032 sensor
+mt9v111 Aptina MT9V111 sensor
+ov13858 OmniVision OV13858 sensor
+ov13b10 OmniVision OV13B10 sensor
+ov2640 OmniVision OV2640 sensor
+ov2659 OmniVision OV2659 sensor
+ov2680 OmniVision OV2680 sensor
+ov2685 OmniVision OV2685 sensor
+ov5640 OmniVision OV5640 sensor
+ov5645 OmniVision OV5645 sensor
+ov5647 OmniVision OV5647 sensor
+ov5670 OmniVision OV5670 sensor
+ov5675 OmniVision OV5675 sensor
+ov5695 OmniVision OV5695 sensor
+ov6650 OmniVision OV6650 sensor
+ov7251 OmniVision OV7251 sensor
+ov7640 OmniVision OV7640 sensor
+ov7670 OmniVision OV7670 sensor
+ov772x OmniVision OV772x sensor
+ov7740 OmniVision OV7740 sensor
+ov8856 OmniVision OV8856 sensor
+ov9640 OmniVision OV9640 sensor
+ov9650 OmniVision OV9650/OV9652 sensor
+rj54n1cb0c Sharp RJ54N1CB0C sensor
+s5c73m3 Samsung S5C73M3 sensor
+s5k4ecgx Samsung S5K4ECGX sensor
+s5k5baf Samsung S5K5BAF sensor
+s5k6a3 Samsung S5K6A3 sensor
+============ ==========================================================
+
+Flash devices
+-------------
+
+============ ==========================================================
+Driver Name
+============ ==========================================================
+adp1653 ADP1653 flash
+lm3560 LM3560 dual flash driver
+lm3646 LM3646 dual flash driver
+============ ==========================================================
+
+IR I2C driver
+-------------
+
+============ ==========================================================
+Driver Name
+============ ==========================================================
+ir-kbd-i2c I2C module for IR
+============ ==========================================================
+
+Lens drivers
+------------
+
+============ ==========================================================
+Driver Name
+============ ==========================================================
+ad5820 AD5820 lens voice coil
+ak7375 AK7375 lens voice coil
+dw9714 DW9714 lens voice coil
+dw9768 DW9768 lens voice coil
+dw9807-vcm DW9807 lens voice coil
+============ ==========================================================
+
+Miscellaneous helper chips
+--------------------------
+
+============ ==========================================================
+Driver Name
+============ ==========================================================
+video-i2c I2C transport video
+m52790 Mitsubishi M52790 A/V switch
+st-mipid02 STMicroelectronics MIPID02 CSI-2 to PARALLEL bridge
+ths7303 THS7303/53 Video Amplifier
+============ ==========================================================
+
+RDS decoders
+------------
+
+============ ==========================================================
+Driver Name
+============ ==========================================================
+saa6588 SAA6588 Radio Chip RDS decoder
+============ ==========================================================
+
+SDR tuner chips
+---------------
+
+============ ==========================================================
+Driver Name
+============ ==========================================================
+max2175 Maxim 2175 RF to Bits tuner
+============ ==========================================================
+
+Video and audio decoders
+------------------------
+
+============ ==========================================================
+Driver Name
+============ ==========================================================
+cx25840 Conexant CX2584x audio/video decoders
+saa717x Philips SAA7171/3/4 audio/video decoders
+============ ==========================================================
+
+Video decoders
+--------------
+
+============ ==========================================================
+Driver Name
+============ ==========================================================
+adv7180 Analog Devices ADV7180 decoder
+adv7183 Analog Devices ADV7183 decoder
+adv748x Analog Devices ADV748x decoder
+adv7604 Analog Devices ADV7604 decoder
+adv7842 Analog Devices ADV7842 decoder
+bt819 BT819A VideoStream decoder
+bt856 BT856 VideoStream decoder
+bt866 BT866 VideoStream decoder
+ks0127 KS0127 video decoder
+ml86v7667 OKI ML86V7667 video decoder
+saa7110 Philips SAA7110 video decoder
+saa7115 Philips SAA7111/3/4/5 video decoders
+tc358743 Toshiba TC358743 decoder
+tvp514x Texas Instruments TVP514x video decoder
+tvp5150 Texas Instruments TVP5150 video decoder
+tvp7002 Texas Instruments TVP7002 video decoder
+tw2804 Techwell TW2804 multiple video decoder
+tw9903 Techwell TW9903 video decoder
+tw9906 Techwell TW9906 video decoder
+tw9910 Techwell TW9910 video decoder
+vpx3220 vpx3220a, vpx3216b & vpx3214c video decoders
+============ ==========================================================
+
+Video encoders
+--------------
+
+============ ==========================================================
+Driver Name
+============ ==========================================================
+adv7170 Analog Devices ADV7170 video encoder
+adv7175 Analog Devices ADV7175 video encoder
+adv7343 ADV7343 video encoder
+adv7393 ADV7393 video encoder
+adv7511-v4l2 Analog Devices ADV7511 encoder
+ak881x AK8813/AK8814 video encoders
+saa7127 Philips SAA7127/9 digital video encoders
+saa7185 Philips SAA7185 video encoder
+ths8200 Texas Instruments THS8200 video encoder
+============ ==========================================================
+
+Video improvement chips
+-----------------------
+
+============ ==========================================================
+Driver Name
+============ ==========================================================
+upd64031a NEC Electronics uPD64031A Ghost Reduction
+upd64083 NEC Electronics uPD64083 3-Dimensional Y/C separation
+============ ==========================================================
+
+Tuner drivers
+-------------
+
+============ ==================================================
+Driver Name
+============ ==================================================
+e4000 Elonics E4000 silicon tuner
+fc0011 Fitipower FC0011 silicon tuner
+fc0012 Fitipower FC0012 silicon tuner
+fc0013 Fitipower FC0013 silicon tuner
+fc2580 FCI FC2580 silicon tuner
+it913x ITE Tech IT913x silicon tuner
+m88rs6000t Montage M88RS6000 internal tuner
+max2165 Maxim MAX2165 silicon tuner
+mc44s803 Freescale MC44S803 Low Power CMOS Broadband tuners
+msi001 Mirics MSi001
+mt2060 Microtune MT2060 silicon IF tuner
+mt2063 Microtune MT2063 silicon IF tuner
+mt20xx Microtune 2032 / 2050 tuners
+mt2131 Microtune MT2131 silicon tuner
+mt2266 Microtune MT2266 silicon tuner
+mxl301rf MaxLinear MxL301RF tuner
+mxl5005s MaxLinear MSL5005S silicon tuner
+mxl5007t MaxLinear MxL5007T silicon tuner
+qm1d1b0004 Sharp QM1D1B0004 tuner
+qm1d1c0042 Sharp QM1D1C0042 tuner
+qt1010 Quantek QT1010 silicon tuner
+r820t Rafael Micro R820T silicon tuner
+si2157 Silicon Labs Si2157 silicon tuner
+tuner-types Simple tuner support
+tda18212 NXP TDA18212 silicon tuner
+tda18218 NXP TDA18218 silicon tuner
+tda18250 NXP TDA18250 silicon tuner
+tda18271 NXP TDA18271 silicon tuner
+tda827x Philips TDA827X silicon tuner
+tda8290 TDA 8290/8295 + 8275(a)/18271 tuner combo
+tda9887 TDA 9885/6/7 analog IF demodulator
+tea5761 TEA 5761 radio tuner
+tea5767 TEA 5767 radio tuner
+tua9001 Infineon TUA9001 silicon tuner
+xc2028 XCeive xc2028/xc3028 tuners
+xc4000 Xceive XC4000 silicon tuner
+xc5000 Xceive XC5000 silicon tuner
+============ ==================================================
+
+.. toctree::
+ :maxdepth: 1
+
+ tuner-cardlist
+ frontend-cardlist
diff --git a/Documentation/admin-guide/media/imx.rst b/Documentation/admin-guide/media/imx.rst
new file mode 100644
index 000000000000..b8fa70f854fd
--- /dev/null
+++ b/Documentation/admin-guide/media/imx.rst
@@ -0,0 +1,714 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+i.MX Video Capture Driver
+=========================
+
+Introduction
+------------
+
+The Freescale i.MX5/6 contains an Image Processing Unit (IPU), which
+handles the flow of image frames to and from capture devices and
+display devices.
+
+For image capture, the IPU contains the following internal subunits:
+
+- Image DMA Controller (IDMAC)
+- Camera Serial Interface (CSI)
+- Image Converter (IC)
+- Sensor Multi-FIFO Controller (SMFC)
+- Image Rotator (IRT)
+- Video De-Interlacing or Combining Block (VDIC)
+
+The IDMAC is the DMA controller for transfer of image frames to and from
+memory. Various dedicated DMA channels exist for both video capture and
+display paths. During transfer, the IDMAC is also capable of vertical
+image flip, 8x8 block transfer (see IRT description), pixel component
+re-ordering (for example UYVY to YUYV) within the same colorspace, and
+packed <--> planar conversion. The IDMAC can also perform a simple
+de-interlacing by interweaving even and odd lines during transfer
+(without motion compensation which requires the VDIC).
+
+The CSI is the backend capture unit that interfaces directly with
+camera sensors over Parallel, BT.656/1120, and MIPI CSI-2 buses.
+
+The IC handles color-space conversion, resizing (downscaling and
+upscaling), horizontal flip, and 90/270 degree rotation operations.
+
+There are three independent "tasks" within the IC that can carry out
+conversions concurrently: pre-process encoding, pre-process viewfinder,
+and post-processing. Within each task, conversions are split into three
+sections: downsizing section, main section (upsizing, flip, colorspace
+conversion, and graphics plane combining), and rotation section.
+
+The IPU time-shares the IC task operations. The time-slice granularity
+is one burst of eight pixels in the downsizing section, one image line
+in the main processing section, one image frame in the rotation section.
+
+The SMFC is composed of four independent FIFOs that each can transfer
+captured frames from sensors directly to memory concurrently via four
+IDMAC channels.
+
+The IRT carries out 90 and 270 degree image rotation operations. The
+rotation operation is carried out on 8x8 pixel blocks at a time. This
+operation is supported by the IDMAC which handles the 8x8 block transfer
+along with block reordering, in coordination with vertical flip.
+
+The VDIC handles the conversion of interlaced video to progressive, with
+support for different motion compensation modes (low, medium, and high
+motion). The deinterlaced output frames from the VDIC can be sent to the
+IC pre-process viewfinder task for further conversions. The VDIC also
+contains a Combiner that combines two image planes, with alpha blending
+and color keying.
+
+In addition to the IPU internal subunits, there are also two units
+outside the IPU that are also involved in video capture on i.MX:
+
+- MIPI CSI-2 Receiver for camera sensors with the MIPI CSI-2 bus
+ interface. This is a Synopsys DesignWare core.
+- Two video multiplexers for selecting among multiple sensor inputs
+ to send to a CSI.
+
+For more info, refer to the latest versions of the i.MX5/6 reference
+manuals [#f1]_ and [#f2]_.
+
+
+Features
+--------
+
+Some of the features of this driver include:
+
+- Many different pipelines can be configured via media controller API,
+ that correspond to the hardware video capture pipelines supported in
+ the i.MX.
+
+- Supports parallel, BT.565, and MIPI CSI-2 interfaces.
+
+- Concurrent independent streams, by configuring pipelines to multiple
+ video capture interfaces using independent entities.
+
+- Scaling, color-space conversion, horizontal and vertical flip, and
+ image rotation via IC task subdevs.
+
+- Many pixel formats supported (RGB, packed and planar YUV, partial
+ planar YUV).
+
+- The VDIC subdev supports motion compensated de-interlacing, with three
+ motion compensation modes: low, medium, and high motion. Pipelines are
+ defined that allow sending frames to the VDIC subdev directly from the
+ CSI. There is also support in the future for sending frames to the
+ VDIC from memory buffers via a output/mem2mem devices.
+
+- Includes a Frame Interval Monitor (FIM) that can correct vertical sync
+ problems with the ADV718x video decoders.
+
+
+Topology
+--------
+
+The following shows the media topologies for the i.MX6Q SabreSD and
+i.MX6Q SabreAuto. Refer to these diagrams in the entity descriptions
+in the next section.
+
+The i.MX5/6 topologies can differ upstream from the IPUv3 CSI video
+multiplexers, but the internal IPUv3 topology downstream from there
+is common to all i.MX5/6 platforms. For example, the SabreSD, with the
+MIPI CSI-2 OV5640 sensor, requires the i.MX6 MIPI CSI-2 receiver. But
+the SabreAuto has only the ADV7180 decoder on a parallel bt.656 bus, and
+therefore does not require the MIPI CSI-2 receiver, so it is missing in
+its graph.
+
+.. _imx6q_topology_graph:
+
+.. kernel-figure:: imx6q-sabresd.dot
+ :alt: Diagram of the i.MX6Q SabreSD media pipeline topology
+ :align: center
+
+ Media pipeline graph on i.MX6Q SabreSD
+
+.. kernel-figure:: imx6q-sabreauto.dot
+ :alt: Diagram of the i.MX6Q SabreAuto media pipeline topology
+ :align: center
+
+ Media pipeline graph on i.MX6Q SabreAuto
+
+Entities
+--------
+
+imx6-mipi-csi2
+--------------
+
+This is the MIPI CSI-2 receiver entity. It has one sink pad to receive
+the MIPI CSI-2 stream (usually from a MIPI CSI-2 camera sensor). It has
+four source pads, corresponding to the four MIPI CSI-2 demuxed virtual
+channel outputs. Multiple source pads can be enabled to independently
+stream from multiple virtual channels.
+
+This entity actually consists of two sub-blocks. One is the MIPI CSI-2
+core. This is a Synopsys Designware MIPI CSI-2 core. The other sub-block
+is a "CSI-2 to IPU gasket". The gasket acts as a demultiplexer of the
+four virtual channels streams, providing four separate parallel buses
+containing each virtual channel that are routed to CSIs or video
+multiplexers as described below.
+
+On i.MX6 solo/dual-lite, all four virtual channel buses are routed to
+two video multiplexers. Both CSI0 and CSI1 can receive any virtual
+channel, as selected by the video multiplexers.
+
+On i.MX6 Quad, virtual channel 0 is routed to IPU1-CSI0 (after selected
+by a video mux), virtual channels 1 and 2 are hard-wired to IPU1-CSI1
+and IPU2-CSI0, respectively, and virtual channel 3 is routed to
+IPU2-CSI1 (again selected by a video mux).
+
+ipuX_csiY_mux
+-------------
+
+These are the video multiplexers. They have two or more sink pads to
+select from either camera sensors with a parallel interface, or from
+MIPI CSI-2 virtual channels from imx6-mipi-csi2 entity. They have a
+single source pad that routes to a CSI (ipuX_csiY entities).
+
+On i.MX6 solo/dual-lite, there are two video mux entities. One sits
+in front of IPU1-CSI0 to select between a parallel sensor and any of
+the four MIPI CSI-2 virtual channels (a total of five sink pads). The
+other mux sits in front of IPU1-CSI1, and again has five sink pads to
+select between a parallel sensor and any of the four MIPI CSI-2 virtual
+channels.
+
+On i.MX6 Quad, there are two video mux entities. One sits in front of
+IPU1-CSI0 to select between a parallel sensor and MIPI CSI-2 virtual
+channel 0 (two sink pads). The other mux sits in front of IPU2-CSI1 to
+select between a parallel sensor and MIPI CSI-2 virtual channel 3 (two
+sink pads).
+
+ipuX_csiY
+---------
+
+These are the CSI entities. They have a single sink pad receiving from
+either a video mux or from a MIPI CSI-2 virtual channel as described
+above.
+
+This entity has two source pads. The first source pad can link directly
+to the ipuX_vdic entity or the ipuX_ic_prp entity, using hardware links
+that require no IDMAC memory buffer transfer.
+
+When the direct source pad is routed to the ipuX_ic_prp entity, frames
+from the CSI can be processed by one or both of the IC pre-processing
+tasks.
+
+When the direct source pad is routed to the ipuX_vdic entity, the VDIC
+will carry out motion-compensated de-interlace using "high motion" mode
+(see description of ipuX_vdic entity).
+
+The second source pad sends video frames directly to memory buffers
+via the SMFC and an IDMAC channel, bypassing IC pre-processing. This
+source pad is routed to a capture device node, with a node name of the
+format "ipuX_csiY capture".
+
+Note that since the IDMAC source pad makes use of an IDMAC channel,
+pixel reordering within the same colorspace can be carried out by the
+IDMAC channel. For example, if the CSI sink pad is receiving in UYVY
+order, the capture device linked to the IDMAC source pad can capture
+in YUYV order. Also, if the CSI sink pad is receiving a packed YUV
+format, the capture device can capture a planar YUV format such as
+YUV420.
+
+The IDMAC channel at the IDMAC source pad also supports simple
+interweave without motion compensation, which is activated if the source
+pad's field type is sequential top-bottom or bottom-top, and the
+requested capture interface field type is set to interlaced (t-b, b-t,
+or unqualified interlaced). The capture interface will enforce the same
+field order as the source pad field order (interlaced-bt if source pad
+is seq-bt, interlaced-tb if source pad is seq-tb).
+
+For events produced by ipuX_csiY, see ref:`imx_api_ipuX_csiY`.
+
+Cropping in ipuX_csiY
+---------------------
+
+The CSI supports cropping the incoming raw sensor frames. This is
+implemented in the ipuX_csiY entities at the sink pad, using the
+crop selection subdev API.
+
+The CSI also supports fixed divide-by-two downscaling independently in
+width and height. This is implemented in the ipuX_csiY entities at
+the sink pad, using the compose selection subdev API.
+
+The output rectangle at the ipuX_csiY source pad is the same as
+the compose rectangle at the sink pad. So the source pad rectangle
+cannot be negotiated, it must be set using the compose selection
+API at sink pad (if /2 downscale is desired, otherwise source pad
+rectangle is equal to incoming rectangle).
+
+To give an example of crop and /2 downscale, this will crop a
+1280x960 input frame to 640x480, and then /2 downscale in both
+dimensions to 320x240 (assumes ipu1_csi0 is linked to ipu1_csi0_mux):
+
+.. code-block:: none
+
+ media-ctl -V "'ipu1_csi0_mux':2[fmt:UYVY2X8/1280x960]"
+ media-ctl -V "'ipu1_csi0':0[crop:(0,0)/640x480]"
+ media-ctl -V "'ipu1_csi0':0[compose:(0,0)/320x240]"
+
+Frame Skipping in ipuX_csiY
+---------------------------
+
+The CSI supports frame rate decimation, via frame skipping. Frame
+rate decimation is specified by setting the frame intervals at
+sink and source pads. The ipuX_csiY entity then applies the best
+frame skip setting to the CSI to achieve the desired frame rate
+at the source pad.
+
+The following example reduces an assumed incoming 60 Hz frame
+rate by half at the IDMAC output source pad:
+
+.. code-block:: none
+
+ media-ctl -V "'ipu1_csi0':0[fmt:UYVY2X8/640x480@1/60]"
+ media-ctl -V "'ipu1_csi0':2[fmt:UYVY2X8/640x480@1/30]"
+
+Frame Interval Monitor in ipuX_csiY
+-----------------------------------
+
+See ref:`imx_api_FIM`.
+
+ipuX_vdic
+---------
+
+The VDIC carries out motion compensated de-interlacing, with three
+motion compensation modes: low, medium, and high motion. The mode is
+specified with the menu control V4L2_CID_DEINTERLACING_MODE. The VDIC
+has two sink pads and a single source pad.
+
+The direct sink pad receives from an ipuX_csiY direct pad. With this
+link the VDIC can only operate in high motion mode.
+
+When the IDMAC sink pad is activated, it receives from an output
+or mem2mem device node. With this pipeline, the VDIC can also operate
+in low and medium modes, because these modes require receiving
+frames from memory buffers. Note that an output or mem2mem device
+is not implemented yet, so this sink pad currently has no links.
+
+The source pad routes to the IC pre-processing entity ipuX_ic_prp.
+
+ipuX_ic_prp
+-----------
+
+This is the IC pre-processing entity. It acts as a router, routing
+data from its sink pad to one or both of its source pads.
+
+This entity has a single sink pad. The sink pad can receive from the
+ipuX_csiY direct pad, or from ipuX_vdic.
+
+This entity has two source pads. One source pad routes to the
+pre-process encode task entity (ipuX_ic_prpenc), the other to the
+pre-process viewfinder task entity (ipuX_ic_prpvf). Both source pads
+can be activated at the same time if the sink pad is receiving from
+ipuX_csiY. Only the source pad to the pre-process viewfinder task entity
+can be activated if the sink pad is receiving from ipuX_vdic (frames
+from the VDIC can only be processed by the pre-process viewfinder task).
+
+ipuX_ic_prpenc
+--------------
+
+This is the IC pre-processing encode entity. It has a single sink
+pad from ipuX_ic_prp, and a single source pad. The source pad is
+routed to a capture device node, with a node name of the format
+"ipuX_ic_prpenc capture".
+
+This entity performs the IC pre-process encode task operations:
+color-space conversion, resizing (downscaling and upscaling),
+horizontal and vertical flip, and 90/270 degree rotation. Flip
+and rotation are provided via standard V4L2 controls.
+
+Like the ipuX_csiY IDMAC source, this entity also supports simple
+de-interlace without motion compensation, and pixel reordering.
+
+ipuX_ic_prpvf
+-------------
+
+This is the IC pre-processing viewfinder entity. It has a single sink
+pad from ipuX_ic_prp, and a single source pad. The source pad is routed
+to a capture device node, with a node name of the format
+"ipuX_ic_prpvf capture".
+
+This entity is identical in operation to ipuX_ic_prpenc, with the same
+resizing and CSC operations and flip/rotation controls. It will receive
+and process de-interlaced frames from the ipuX_vdic if ipuX_ic_prp is
+receiving from ipuX_vdic.
+
+Like the ipuX_csiY IDMAC source, this entity supports simple
+interweaving without motion compensation. However, note that if the
+ipuX_vdic is included in the pipeline (ipuX_ic_prp is receiving from
+ipuX_vdic), it's not possible to use interweave in ipuX_ic_prpvf,
+since the ipuX_vdic has already carried out de-interlacing (with
+motion compensation) and therefore the field type output from
+ipuX_vdic can only be none (progressive).
+
+Capture Pipelines
+-----------------
+
+The following describe the various use-cases supported by the pipelines.
+
+The links shown do not include the backend sensor, video mux, or mipi
+csi-2 receiver links. This depends on the type of sensor interface
+(parallel or mipi csi-2). So these pipelines begin with:
+
+sensor -> ipuX_csiY_mux -> ...
+
+for parallel sensors, or:
+
+sensor -> imx6-mipi-csi2 -> (ipuX_csiY_mux) -> ...
+
+for mipi csi-2 sensors. The imx6-mipi-csi2 receiver may need to route
+to the video mux (ipuX_csiY_mux) before sending to the CSI, depending
+on the mipi csi-2 virtual channel, hence ipuX_csiY_mux is shown in
+parenthesis.
+
+Unprocessed Video Capture:
+--------------------------
+
+Send frames directly from sensor to camera device interface node, with
+no conversions, via ipuX_csiY IDMAC source pad:
+
+-> ipuX_csiY:2 -> ipuX_csiY capture
+
+IC Direct Conversions:
+----------------------
+
+This pipeline uses the preprocess encode entity to route frames directly
+from the CSI to the IC, to carry out scaling up to 1024x1024 resolution,
+CSC, flipping, and image rotation:
+
+-> ipuX_csiY:1 -> 0:ipuX_ic_prp:1 -> 0:ipuX_ic_prpenc:1 -> ipuX_ic_prpenc capture
+
+Motion Compensated De-interlace:
+--------------------------------
+
+This pipeline routes frames from the CSI direct pad to the VDIC entity to
+support motion-compensated de-interlacing (high motion mode only),
+scaling up to 1024x1024, CSC, flip, and rotation:
+
+-> ipuX_csiY:1 -> 0:ipuX_vdic:2 -> 0:ipuX_ic_prp:2 -> 0:ipuX_ic_prpvf:1 -> ipuX_ic_prpvf capture
+
+
+Usage Notes
+-----------
+
+To aid in configuration and for backward compatibility with V4L2
+applications that access controls only from video device nodes, the
+capture device interfaces inherit controls from the active entities
+in the current pipeline, so controls can be accessed either directly
+from the subdev or from the active capture device interface. For
+example, the FIM controls are available either from the ipuX_csiY
+subdevs or from the active capture device.
+
+The following are specific usage notes for the Sabre* reference
+boards:
+
+
+i.MX6Q SabreLite with OV5642 and OV5640
+---------------------------------------
+
+This platform requires the OmniVision OV5642 module with a parallel
+camera interface, and the OV5640 module with a MIPI CSI-2
+interface. Both modules are available from Boundary Devices:
+
+- https://boundarydevices.com/product/nit6x_5mp
+- https://boundarydevices.com/product/nit6x_5mp_mipi
+
+Note that if only one camera module is available, the other sensor
+node can be disabled in the device tree.
+
+The OV5642 module is connected to the parallel bus input on the i.MX
+internal video mux to IPU1 CSI0. It's i2c bus connects to i2c bus 2.
+
+The MIPI CSI-2 OV5640 module is connected to the i.MX internal MIPI CSI-2
+receiver, and the four virtual channel outputs from the receiver are
+routed as follows: vc0 to the IPU1 CSI0 mux, vc1 directly to IPU1 CSI1,
+vc2 directly to IPU2 CSI0, and vc3 to the IPU2 CSI1 mux. The OV5640 is
+also connected to i2c bus 2 on the SabreLite, therefore the OV5642 and
+OV5640 must not share the same i2c slave address.
+
+The following basic example configures unprocessed video capture
+pipelines for both sensors. The OV5642 is routed to ipu1_csi0, and
+the OV5640, transmitting on MIPI CSI-2 virtual channel 1 (which is
+imx6-mipi-csi2 pad 2), is routed to ipu1_csi1. Both sensors are
+configured to output 640x480, and the OV5642 outputs YUYV2X8, the
+OV5640 UYVY2X8:
+
+.. code-block:: none
+
+ # Setup links for OV5642
+ media-ctl -l "'ov5642 1-0042':0 -> 'ipu1_csi0_mux':1[1]"
+ media-ctl -l "'ipu1_csi0_mux':2 -> 'ipu1_csi0':0[1]"
+ media-ctl -l "'ipu1_csi0':2 -> 'ipu1_csi0 capture':0[1]"
+ # Setup links for OV5640
+ media-ctl -l "'ov5640 1-0040':0 -> 'imx6-mipi-csi2':0[1]"
+ media-ctl -l "'imx6-mipi-csi2':2 -> 'ipu1_csi1':0[1]"
+ media-ctl -l "'ipu1_csi1':2 -> 'ipu1_csi1 capture':0[1]"
+ # Configure pads for OV5642 pipeline
+ media-ctl -V "'ov5642 1-0042':0 [fmt:YUYV2X8/640x480 field:none]"
+ media-ctl -V "'ipu1_csi0_mux':2 [fmt:YUYV2X8/640x480 field:none]"
+ media-ctl -V "'ipu1_csi0':2 [fmt:AYUV32/640x480 field:none]"
+ # Configure pads for OV5640 pipeline
+ media-ctl -V "'ov5640 1-0040':0 [fmt:UYVY2X8/640x480 field:none]"
+ media-ctl -V "'imx6-mipi-csi2':2 [fmt:UYVY2X8/640x480 field:none]"
+ media-ctl -V "'ipu1_csi1':2 [fmt:AYUV32/640x480 field:none]"
+
+Streaming can then begin independently on the capture device nodes
+"ipu1_csi0 capture" and "ipu1_csi1 capture". The v4l2-ctl tool can
+be used to select any supported YUV pixelformat on the capture device
+nodes, including planar.
+
+i.MX6Q SabreAuto with ADV7180 decoder
+-------------------------------------
+
+On the i.MX6Q SabreAuto, an on-board ADV7180 SD decoder is connected to the
+parallel bus input on the internal video mux to IPU1 CSI0.
+
+The following example configures a pipeline to capture from the ADV7180
+video decoder, assuming NTSC 720x480 input signals, using simple
+interweave (unconverted and without motion compensation). The adv7180
+must output sequential or alternating fields (field type 'seq-bt' for
+NTSC, or 'alternate'):
+
+.. code-block:: none
+
+ # Setup links
+ media-ctl -l "'adv7180 3-0021':0 -> 'ipu1_csi0_mux':1[1]"
+ media-ctl -l "'ipu1_csi0_mux':2 -> 'ipu1_csi0':0[1]"
+ media-ctl -l "'ipu1_csi0':2 -> 'ipu1_csi0 capture':0[1]"
+ # Configure pads
+ media-ctl -V "'adv7180 3-0021':0 [fmt:UYVY2X8/720x480 field:seq-bt]"
+ media-ctl -V "'ipu1_csi0_mux':2 [fmt:UYVY2X8/720x480]"
+ media-ctl -V "'ipu1_csi0':2 [fmt:AYUV32/720x480]"
+ # Configure "ipu1_csi0 capture" interface (assumed at /dev/video4)
+ v4l2-ctl -d4 --set-fmt-video=field=interlaced_bt
+
+Streaming can then begin on /dev/video4. The v4l2-ctl tool can also be
+used to select any supported YUV pixelformat on /dev/video4.
+
+This example configures a pipeline to capture from the ADV7180
+video decoder, assuming PAL 720x576 input signals, with Motion
+Compensated de-interlacing. The adv7180 must output sequential or
+alternating fields (field type 'seq-tb' for PAL, or 'alternate').
+
+.. code-block:: none
+
+ # Setup links
+ media-ctl -l "'adv7180 3-0021':0 -> 'ipu1_csi0_mux':1[1]"
+ media-ctl -l "'ipu1_csi0_mux':2 -> 'ipu1_csi0':0[1]"
+ media-ctl -l "'ipu1_csi0':1 -> 'ipu1_vdic':0[1]"
+ media-ctl -l "'ipu1_vdic':2 -> 'ipu1_ic_prp':0[1]"
+ media-ctl -l "'ipu1_ic_prp':2 -> 'ipu1_ic_prpvf':0[1]"
+ media-ctl -l "'ipu1_ic_prpvf':1 -> 'ipu1_ic_prpvf capture':0[1]"
+ # Configure pads
+ media-ctl -V "'adv7180 3-0021':0 [fmt:UYVY2X8/720x576 field:seq-tb]"
+ media-ctl -V "'ipu1_csi0_mux':2 [fmt:UYVY2X8/720x576]"
+ media-ctl -V "'ipu1_csi0':1 [fmt:AYUV32/720x576]"
+ media-ctl -V "'ipu1_vdic':2 [fmt:AYUV32/720x576 field:none]"
+ media-ctl -V "'ipu1_ic_prp':2 [fmt:AYUV32/720x576 field:none]"
+ media-ctl -V "'ipu1_ic_prpvf':1 [fmt:AYUV32/720x576 field:none]"
+ # Configure "ipu1_ic_prpvf capture" interface (assumed at /dev/video2)
+ v4l2-ctl -d2 --set-fmt-video=field=none
+
+Streaming can then begin on /dev/video2. The v4l2-ctl tool can also be
+used to select any supported YUV pixelformat on /dev/video2.
+
+This platform accepts Composite Video analog inputs to the ADV7180 on
+Ain1 (connector J42).
+
+i.MX6DL SabreAuto with ADV7180 decoder
+--------------------------------------
+
+On the i.MX6DL SabreAuto, an on-board ADV7180 SD decoder is connected to the
+parallel bus input on the internal video mux to IPU1 CSI0.
+
+The following example configures a pipeline to capture from the ADV7180
+video decoder, assuming NTSC 720x480 input signals, using simple
+interweave (unconverted and without motion compensation). The adv7180
+must output sequential or alternating fields (field type 'seq-bt' for
+NTSC, or 'alternate'):
+
+.. code-block:: none
+
+ # Setup links
+ media-ctl -l "'adv7180 4-0021':0 -> 'ipu1_csi0_mux':4[1]"
+ media-ctl -l "'ipu1_csi0_mux':5 -> 'ipu1_csi0':0[1]"
+ media-ctl -l "'ipu1_csi0':2 -> 'ipu1_csi0 capture':0[1]"
+ # Configure pads
+ media-ctl -V "'adv7180 4-0021':0 [fmt:UYVY2X8/720x480 field:seq-bt]"
+ media-ctl -V "'ipu1_csi0_mux':5 [fmt:UYVY2X8/720x480]"
+ media-ctl -V "'ipu1_csi0':2 [fmt:AYUV32/720x480]"
+ # Configure "ipu1_csi0 capture" interface (assumed at /dev/video0)
+ v4l2-ctl -d0 --set-fmt-video=field=interlaced_bt
+
+Streaming can then begin on /dev/video0. The v4l2-ctl tool can also be
+used to select any supported YUV pixelformat on /dev/video0.
+
+This example configures a pipeline to capture from the ADV7180
+video decoder, assuming PAL 720x576 input signals, with Motion
+Compensated de-interlacing. The adv7180 must output sequential or
+alternating fields (field type 'seq-tb' for PAL, or 'alternate').
+
+.. code-block:: none
+
+ # Setup links
+ media-ctl -l "'adv7180 4-0021':0 -> 'ipu1_csi0_mux':4[1]"
+ media-ctl -l "'ipu1_csi0_mux':5 -> 'ipu1_csi0':0[1]"
+ media-ctl -l "'ipu1_csi0':1 -> 'ipu1_vdic':0[1]"
+ media-ctl -l "'ipu1_vdic':2 -> 'ipu1_ic_prp':0[1]"
+ media-ctl -l "'ipu1_ic_prp':2 -> 'ipu1_ic_prpvf':0[1]"
+ media-ctl -l "'ipu1_ic_prpvf':1 -> 'ipu1_ic_prpvf capture':0[1]"
+ # Configure pads
+ media-ctl -V "'adv7180 4-0021':0 [fmt:UYVY2X8/720x576 field:seq-tb]"
+ media-ctl -V "'ipu1_csi0_mux':5 [fmt:UYVY2X8/720x576]"
+ media-ctl -V "'ipu1_csi0':1 [fmt:AYUV32/720x576]"
+ media-ctl -V "'ipu1_vdic':2 [fmt:AYUV32/720x576 field:none]"
+ media-ctl -V "'ipu1_ic_prp':2 [fmt:AYUV32/720x576 field:none]"
+ media-ctl -V "'ipu1_ic_prpvf':1 [fmt:AYUV32/720x576 field:none]"
+ # Configure "ipu1_ic_prpvf capture" interface (assumed at /dev/video2)
+ v4l2-ctl -d2 --set-fmt-video=field=none
+
+Streaming can then begin on /dev/video2. The v4l2-ctl tool can also be
+used to select any supported YUV pixelformat on /dev/video2.
+
+This platform accepts Composite Video analog inputs to the ADV7180 on
+Ain1 (connector J42).
+
+i.MX6Q SabreSD with MIPI CSI-2 OV5640
+-------------------------------------
+
+Similarly to i.MX6Q SabreLite, the i.MX6Q SabreSD supports a parallel
+interface OV5642 module on IPU1 CSI0, and a MIPI CSI-2 OV5640
+module. The OV5642 connects to i2c bus 1 and the OV5640 to i2c bus 2.
+
+The device tree for SabreSD includes OF graphs for both the parallel
+OV5642 and the MIPI CSI-2 OV5640, but as of this writing only the MIPI
+CSI-2 OV5640 has been tested, so the OV5642 node is currently disabled.
+The OV5640 module connects to MIPI connector J5. The NXP part number
+for the OV5640 module that connects to the SabreSD board is H120729.
+
+The following example configures unprocessed video capture pipeline to
+capture from the OV5640, transmitting on MIPI CSI-2 virtual channel 0:
+
+.. code-block:: none
+
+ # Setup links
+ media-ctl -l "'ov5640 1-003c':0 -> 'imx6-mipi-csi2':0[1]"
+ media-ctl -l "'imx6-mipi-csi2':1 -> 'ipu1_csi0_mux':0[1]"
+ media-ctl -l "'ipu1_csi0_mux':2 -> 'ipu1_csi0':0[1]"
+ media-ctl -l "'ipu1_csi0':2 -> 'ipu1_csi0 capture':0[1]"
+ # Configure pads
+ media-ctl -V "'ov5640 1-003c':0 [fmt:UYVY2X8/640x480]"
+ media-ctl -V "'imx6-mipi-csi2':1 [fmt:UYVY2X8/640x480]"
+ media-ctl -V "'ipu1_csi0_mux':0 [fmt:UYVY2X8/640x480]"
+ media-ctl -V "'ipu1_csi0':0 [fmt:AYUV32/640x480]"
+
+Streaming can then begin on "ipu1_csi0 capture" node. The v4l2-ctl
+tool can be used to select any supported pixelformat on the capture
+device node.
+
+To determine what is the /dev/video node correspondent to
+"ipu1_csi0 capture":
+
+.. code-block:: none
+
+ media-ctl -e "ipu1_csi0 capture"
+ /dev/video0
+
+/dev/video0 is the streaming element in this case.
+
+Starting the streaming via v4l2-ctl:
+
+.. code-block:: none
+
+ v4l2-ctl --stream-mmap -d /dev/video0
+
+Starting the streaming via Gstreamer and sending the content to the display:
+
+.. code-block:: none
+
+ gst-launch-1.0 v4l2src device=/dev/video0 ! kmssink
+
+The following example configures a direct conversion pipeline to capture
+from the OV5640, transmitting on MIPI CSI-2 virtual channel 0. It also
+shows colorspace conversion and scaling at IC output.
+
+.. code-block:: none
+
+ # Setup links
+ media-ctl -l "'ov5640 1-003c':0 -> 'imx6-mipi-csi2':0[1]"
+ media-ctl -l "'imx6-mipi-csi2':1 -> 'ipu1_csi0_mux':0[1]"
+ media-ctl -l "'ipu1_csi0_mux':2 -> 'ipu1_csi0':0[1]"
+ media-ctl -l "'ipu1_csi0':1 -> 'ipu1_ic_prp':0[1]"
+ media-ctl -l "'ipu1_ic_prp':1 -> 'ipu1_ic_prpenc':0[1]"
+ media-ctl -l "'ipu1_ic_prpenc':1 -> 'ipu1_ic_prpenc capture':0[1]"
+ # Configure pads
+ media-ctl -V "'ov5640 1-003c':0 [fmt:UYVY2X8/640x480]"
+ media-ctl -V "'imx6-mipi-csi2':1 [fmt:UYVY2X8/640x480]"
+ media-ctl -V "'ipu1_csi0_mux':2 [fmt:UYVY2X8/640x480]"
+ media-ctl -V "'ipu1_csi0':1 [fmt:AYUV32/640x480]"
+ media-ctl -V "'ipu1_ic_prp':1 [fmt:AYUV32/640x480]"
+ media-ctl -V "'ipu1_ic_prpenc':1 [fmt:ARGB8888_1X32/800x600]"
+ # Set a format at the capture interface
+ v4l2-ctl -d /dev/video1 --set-fmt-video=pixelformat=RGB3
+
+Streaming can then begin on "ipu1_ic_prpenc capture" node.
+
+To determine what is the /dev/video node correspondent to
+"ipu1_ic_prpenc capture":
+
+.. code-block:: none
+
+ media-ctl -e "ipu1_ic_prpenc capture"
+ /dev/video1
+
+
+/dev/video1 is the streaming element in this case.
+
+Starting the streaming via v4l2-ctl:
+
+.. code-block:: none
+
+ v4l2-ctl --stream-mmap -d /dev/video1
+
+Starting the streaming via Gstreamer and sending the content to the display:
+
+.. code-block:: none
+
+ gst-launch-1.0 v4l2src device=/dev/video1 ! kmssink
+
+Known Issues
+------------
+
+1. When using 90 or 270 degree rotation control at capture resolutions
+ near the IC resizer limit of 1024x1024, and combined with planar
+ pixel formats (YUV420, YUV422p), frame capture will often fail with
+ no end-of-frame interrupts from the IDMAC channel. To work around
+ this, use lower resolution and/or packed formats (YUYV, RGB3, etc.)
+ when 90 or 270 rotations are needed.
+
+
+File list
+---------
+
+drivers/staging/media/imx/
+include/media/imx.h
+include/linux/imx-media.h
+
+References
+----------
+
+.. [#f1] http://www.nxp.com/assets/documents/data/en/reference-manuals/IMX6DQRM.pdf
+.. [#f2] http://www.nxp.com/assets/documents/data/en/reference-manuals/IMX6SDLRM.pdf
+
+
+Authors
+-------
+
+- Steve Longerbeam <steve_longerbeam@mentor.com>
+- Philipp Zabel <kernel@pengutronix.de>
+- Russell King <linux@armlinux.org.uk>
+
+Copyright (C) 2012-2017 Mentor Graphics Inc.
diff --git a/Documentation/admin-guide/media/imx6q-sabreauto.dot b/Documentation/admin-guide/media/imx6q-sabreauto.dot
new file mode 100644
index 000000000000..bd6cf0b358c0
--- /dev/null
+++ b/Documentation/admin-guide/media/imx6q-sabreauto.dot
@@ -0,0 +1,51 @@
+digraph board {
+ rankdir=TB
+ n00000001 [label="{{<port0> 0} | ipu1_csi0\n/dev/v4l-subdev0 | {<port1> 1 | <port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000001:port2 -> n00000005 [style=dashed]
+ n00000001:port1 -> n0000000f:port0 [style=dashed]
+ n00000001:port1 -> n0000000b:port0 [style=dashed]
+ n00000005 [label="ipu1_csi0 capture\n/dev/video0", shape=box, style=filled, fillcolor=yellow]
+ n0000000b [label="{{<port0> 0 | <port1> 1} | ipu1_vdic\n/dev/v4l-subdev1 | {<port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000000b:port2 -> n0000000f:port0 [style=dashed]
+ n0000000f [label="{{<port0> 0} | ipu1_ic_prp\n/dev/v4l-subdev2 | {<port1> 1 | <port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000000f:port1 -> n00000013:port0 [style=dashed]
+ n0000000f:port2 -> n0000001c:port0 [style=dashed]
+ n00000013 [label="{{<port0> 0} | ipu1_ic_prpenc\n/dev/v4l-subdev3 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000013:port1 -> n00000016 [style=dashed]
+ n00000016 [label="ipu1_ic_prpenc capture\n/dev/video1", shape=box, style=filled, fillcolor=yellow]
+ n0000001c [label="{{<port0> 0} | ipu1_ic_prpvf\n/dev/v4l-subdev4 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000001c:port1 -> n0000001f [style=dashed]
+ n0000001f [label="ipu1_ic_prpvf capture\n/dev/video2", shape=box, style=filled, fillcolor=yellow]
+ n0000002f [label="{{<port0> 0} | ipu1_csi1\n/dev/v4l-subdev5 | {<port1> 1 | <port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000002f:port2 -> n00000033 [style=dashed]
+ n0000002f:port1 -> n0000000f:port0 [style=dashed]
+ n0000002f:port1 -> n0000000b:port0 [style=dashed]
+ n00000033 [label="ipu1_csi1 capture\n/dev/video3", shape=box, style=filled, fillcolor=yellow]
+ n0000003d [label="{{<port0> 0} | ipu2_csi0\n/dev/v4l-subdev6 | {<port1> 1 | <port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000003d:port2 -> n00000041 [style=dashed]
+ n0000003d:port1 -> n0000004b:port0 [style=dashed]
+ n0000003d:port1 -> n00000047:port0 [style=dashed]
+ n00000041 [label="ipu2_csi0 capture\n/dev/video4", shape=box, style=filled, fillcolor=yellow]
+ n00000047 [label="{{<port0> 0 | <port1> 1} | ipu2_vdic\n/dev/v4l-subdev7 | {<port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000047:port2 -> n0000004b:port0 [style=dashed]
+ n0000004b [label="{{<port0> 0} | ipu2_ic_prp\n/dev/v4l-subdev8 | {<port1> 1 | <port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000004b:port1 -> n0000004f:port0 [style=dashed]
+ n0000004b:port2 -> n00000058:port0 [style=dashed]
+ n0000004f [label="{{<port0> 0} | ipu2_ic_prpenc\n/dev/v4l-subdev9 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000004f:port1 -> n00000052 [style=dashed]
+ n00000052 [label="ipu2_ic_prpenc capture\n/dev/video5", shape=box, style=filled, fillcolor=yellow]
+ n00000058 [label="{{<port0> 0} | ipu2_ic_prpvf\n/dev/v4l-subdev10 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000058:port1 -> n0000005b [style=dashed]
+ n0000005b [label="ipu2_ic_prpvf capture\n/dev/video6", shape=box, style=filled, fillcolor=yellow]
+ n0000006b [label="{{<port0> 0} | ipu2_csi1\n/dev/v4l-subdev11 | {<port1> 1 | <port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000006b:port2 -> n0000006f [style=dashed]
+ n0000006b:port1 -> n0000004b:port0 [style=dashed]
+ n0000006b:port1 -> n00000047:port0 [style=dashed]
+ n0000006f [label="ipu2_csi1 capture\n/dev/video7", shape=box, style=filled, fillcolor=yellow]
+ n00000079 [label="{{<port0> 0 | <port1> 1} | ipu1_csi0_mux\n/dev/v4l-subdev12 | {<port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000079:port2 -> n00000001:port0 [style=dashed]
+ n0000007d [label="{{<port0> 0 | <port1> 1} | ipu2_csi1_mux\n/dev/v4l-subdev13 | {<port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000007d:port2 -> n0000006b:port0 [style=dashed]
+ n00000081 [label="{{} | adv7180 3-0021\n/dev/v4l-subdev14 | {<port0> 0}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000081:port0 -> n00000079:port1 [style=dashed]
+}
diff --git a/Documentation/admin-guide/media/imx6q-sabresd.dot b/Documentation/admin-guide/media/imx6q-sabresd.dot
new file mode 100644
index 000000000000..7d56cafa1944
--- /dev/null
+++ b/Documentation/admin-guide/media/imx6q-sabresd.dot
@@ -0,0 +1,56 @@
+digraph board {
+ rankdir=TB
+ n00000001 [label="{{<port0> 0} | ipu1_csi0\n/dev/v4l-subdev0 | {<port1> 1 | <port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000001:port2 -> n00000005 [style=dashed]
+ n00000001:port1 -> n0000000f:port0 [style=dashed]
+ n00000001:port1 -> n0000000b:port0 [style=dashed]
+ n00000005 [label="ipu1_csi0 capture\n/dev/video0", shape=box, style=filled, fillcolor=yellow]
+ n0000000b [label="{{<port0> 0 | <port1> 1} | ipu1_vdic\n/dev/v4l-subdev1 | {<port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000000b:port2 -> n0000000f:port0 [style=dashed]
+ n0000000f [label="{{<port0> 0} | ipu1_ic_prp\n/dev/v4l-subdev2 | {<port1> 1 | <port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000000f:port1 -> n00000013:port0 [style=dashed]
+ n0000000f:port2 -> n0000001c:port0 [style=dashed]
+ n00000013 [label="{{<port0> 0} | ipu1_ic_prpenc\n/dev/v4l-subdev3 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000013:port1 -> n00000016 [style=dashed]
+ n00000016 [label="ipu1_ic_prpenc capture\n/dev/video1", shape=box, style=filled, fillcolor=yellow]
+ n0000001c [label="{{<port0> 0} | ipu1_ic_prpvf\n/dev/v4l-subdev4 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000001c:port1 -> n0000001f [style=dashed]
+ n0000001f [label="ipu1_ic_prpvf capture\n/dev/video2", shape=box, style=filled, fillcolor=yellow]
+ n0000002f [label="{{<port0> 0} | ipu1_csi1\n/dev/v4l-subdev5 | {<port1> 1 | <port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000002f:port2 -> n00000033 [style=dashed]
+ n0000002f:port1 -> n0000000f:port0 [style=dashed]
+ n0000002f:port1 -> n0000000b:port0 [style=dashed]
+ n00000033 [label="ipu1_csi1 capture\n/dev/video3", shape=box, style=filled, fillcolor=yellow]
+ n0000003d [label="{{<port0> 0} | ipu2_csi0\n/dev/v4l-subdev6 | {<port1> 1 | <port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000003d:port2 -> n00000041 [style=dashed]
+ n0000003d:port1 -> n0000004b:port0 [style=dashed]
+ n0000003d:port1 -> n00000047:port0 [style=dashed]
+ n00000041 [label="ipu2_csi0 capture\n/dev/video4", shape=box, style=filled, fillcolor=yellow]
+ n00000047 [label="{{<port0> 0 | <port1> 1} | ipu2_vdic\n/dev/v4l-subdev7 | {<port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000047:port2 -> n0000004b:port0 [style=dashed]
+ n0000004b [label="{{<port0> 0} | ipu2_ic_prp\n/dev/v4l-subdev8 | {<port1> 1 | <port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000004b:port1 -> n0000004f:port0 [style=dashed]
+ n0000004b:port2 -> n00000058:port0 [style=dashed]
+ n0000004f [label="{{<port0> 0} | ipu2_ic_prpenc\n/dev/v4l-subdev9 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000004f:port1 -> n00000052 [style=dashed]
+ n00000052 [label="ipu2_ic_prpenc capture\n/dev/video5", shape=box, style=filled, fillcolor=yellow]
+ n00000058 [label="{{<port0> 0} | ipu2_ic_prpvf\n/dev/v4l-subdev10 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000058:port1 -> n0000005b [style=dashed]
+ n0000005b [label="ipu2_ic_prpvf capture\n/dev/video6", shape=box, style=filled, fillcolor=yellow]
+ n0000006b [label="{{<port0> 0} | ipu2_csi1\n/dev/v4l-subdev11 | {<port1> 1 | <port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000006b:port2 -> n0000006f [style=dashed]
+ n0000006b:port1 -> n0000004b:port0 [style=dashed]
+ n0000006b:port1 -> n00000047:port0 [style=dashed]
+ n0000006f [label="ipu2_csi1 capture\n/dev/video7", shape=box, style=filled, fillcolor=yellow]
+ n00000079 [label="{{<port0> 0} | imx6-mipi-csi2\n/dev/v4l-subdev12 | {<port1> 1 | <port2> 2 | <port3> 3 | <port4> 4}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000079:port2 -> n0000002f:port0 [style=dashed]
+ n00000079:port3 -> n0000003d:port0 [style=dashed]
+ n00000079:port1 -> n0000007f:port0 [style=dashed]
+ n00000079:port4 -> n00000083:port0 [style=dashed]
+ n0000007f [label="{{<port0> 0 | <port1> 1} | ipu1_csi0_mux\n/dev/v4l-subdev13 | {<port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000007f:port2 -> n00000001:port0 [style=dashed]
+ n00000083 [label="{{<port0> 0 | <port1> 1} | ipu2_csi1_mux\n/dev/v4l-subdev14 | {<port2> 2}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000083:port2 -> n0000006b:port0 [style=dashed]
+ n00000087 [label="{{} | ov5640 1-003c\n/dev/v4l-subdev15 | {<port0> 0}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000087:port0 -> n00000079:port0 [style=dashed]
+}
diff --git a/Documentation/admin-guide/media/imx7.rst b/Documentation/admin-guide/media/imx7.rst
new file mode 100644
index 000000000000..2fa27718f52a
--- /dev/null
+++ b/Documentation/admin-guide/media/imx7.rst
@@ -0,0 +1,221 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+i.MX7 Video Capture Driver
+==========================
+
+Introduction
+------------
+
+The i.MX7 contrary to the i.MX5/6 family does not contain an Image Processing
+Unit (IPU); because of that the capabilities to perform operations or
+manipulation of the capture frames are less feature rich.
+
+For image capture the i.MX7 has three units:
+- CMOS Sensor Interface (CSI)
+- Video Multiplexer
+- MIPI CSI-2 Receiver
+
+.. code-block:: none
+
+ MIPI Camera Input ---> MIPI CSI-2 --- > |\
+ | \
+ | \
+ | M |
+ | U | ------> CSI ---> Capture
+ | X |
+ | /
+ Parallel Camera Input ----------------> | /
+ |/
+
+For additional information, please refer to the latest versions of the i.MX7
+reference manual [#f1]_.
+
+Entities
+--------
+
+imx-mipi-csi2
+--------------
+
+This is the MIPI CSI-2 receiver entity. It has one sink pad to receive the pixel
+data from MIPI CSI-2 camera sensor. It has one source pad, corresponding to the
+virtual channel 0. This module is compliant to previous version of Samsung
+D-phy, and supports two D-PHY Rx Data lanes.
+
+csi-mux
+-------
+
+This is the video multiplexer. It has two sink pads to select from either camera
+sensor with a parallel interface or from MIPI CSI-2 virtual channel 0. It has
+a single source pad that routes to the CSI.
+
+csi
+---
+
+The CSI enables the chip to connect directly to external CMOS image sensor. CSI
+can interface directly with Parallel and MIPI CSI-2 buses. It has 256 x 64 FIFO
+to store received image pixel data and embedded DMA controllers to transfer data
+from the FIFO through AHB bus.
+
+This entity has one sink pad that receives from the csi-mux entity and a single
+source pad that routes video frames directly to memory buffers. This pad is
+routed to a capture device node.
+
+Usage Notes
+-----------
+
+To aid in configuration and for backward compatibility with V4L2 applications
+that access controls only from video device nodes, the capture device interfaces
+inherit controls from the active entities in the current pipeline, so controls
+can be accessed either directly from the subdev or from the active capture
+device interface. For example, the sensor controls are available either from the
+sensor subdevs or from the active capture device.
+
+Warp7 with OV2680
+-----------------
+
+On this platform an OV2680 MIPI CSI-2 module is connected to the internal MIPI
+CSI-2 receiver. The following example configures a video capture pipeline with
+an output of 800x600, and BGGR 10 bit bayer format:
+
+.. code-block:: none
+
+ # Setup links
+ media-ctl -l "'ov2680 1-0036':0 -> 'imx7-mipi-csis.0':0[1]"
+ media-ctl -l "'imx7-mipi-csis.0':1 -> 'csi-mux':1[1]"
+ media-ctl -l "'csi-mux':2 -> 'csi':0[1]"
+ media-ctl -l "'csi':1 -> 'csi capture':0[1]"
+
+ # Configure pads for pipeline
+ media-ctl -V "'ov2680 1-0036':0 [fmt:SBGGR10_1X10/800x600 field:none]"
+ media-ctl -V "'csi-mux':1 [fmt:SBGGR10_1X10/800x600 field:none]"
+ media-ctl -V "'csi-mux':2 [fmt:SBGGR10_1X10/800x600 field:none]"
+ media-ctl -V "'imx7-mipi-csis.0':0 [fmt:SBGGR10_1X10/800x600 field:none]"
+ media-ctl -V "'csi':0 [fmt:SBGGR10_1X10/800x600 field:none]"
+
+After this streaming can start. The v4l2-ctl tool can be used to select any of
+the resolutions supported by the sensor.
+
+.. code-block:: none
+
+ # media-ctl -p
+ Media controller API version 5.2.0
+
+ Media device information
+ ------------------------
+ driver imx7-csi
+ model imx-media
+ serial
+ bus info
+ hw revision 0x0
+ driver version 5.2.0
+
+ Device topology
+ - entity 1: csi (2 pads, 2 links)
+ type V4L2 subdev subtype Unknown flags 0
+ device node name /dev/v4l-subdev0
+ pad0: Sink
+ [fmt:SBGGR10_1X10/800x600 field:none colorspace:srgb xfer:srgb ycbcr:601 quantization:full-range]
+ <- "csi-mux":2 [ENABLED]
+ pad1: Source
+ [fmt:SBGGR10_1X10/800x600 field:none colorspace:srgb xfer:srgb ycbcr:601 quantization:full-range]
+ -> "csi capture":0 [ENABLED]
+
+ - entity 4: csi capture (1 pad, 1 link)
+ type Node subtype V4L flags 0
+ device node name /dev/video0
+ pad0: Sink
+ <- "csi":1 [ENABLED]
+
+ - entity 10: csi-mux (3 pads, 2 links)
+ type V4L2 subdev subtype Unknown flags 0
+ device node name /dev/v4l-subdev1
+ pad0: Sink
+ [fmt:Y8_1X8/1x1 field:none]
+ pad1: Sink
+ [fmt:SBGGR10_1X10/800x600 field:none]
+ <- "imx7-mipi-csis.0":1 [ENABLED]
+ pad2: Source
+ [fmt:SBGGR10_1X10/800x600 field:none]
+ -> "csi":0 [ENABLED]
+
+ - entity 14: imx7-mipi-csis.0 (2 pads, 2 links)
+ type V4L2 subdev subtype Unknown flags 0
+ device node name /dev/v4l-subdev2
+ pad0: Sink
+ [fmt:SBGGR10_1X10/800x600 field:none]
+ <- "ov2680 1-0036":0 [ENABLED]
+ pad1: Source
+ [fmt:SBGGR10_1X10/800x600 field:none]
+ -> "csi-mux":1 [ENABLED]
+
+ - entity 17: ov2680 1-0036 (1 pad, 1 link)
+ type V4L2 subdev subtype Sensor flags 0
+ device node name /dev/v4l-subdev3
+ pad0: Source
+ [fmt:SBGGR10_1X10/800x600@1/30 field:none colorspace:srgb]
+ -> "imx7-mipi-csis.0":0 [ENABLED]
+
+i.MX6ULL-EVK with OV5640
+------------------------
+
+On this platform a parallel OV5640 sensor is connected to the CSI port.
+The following example configures a video capture pipeline with an output
+of 640x480 and UYVY8_2X8 format:
+
+.. code-block:: none
+
+ # Setup links
+ media-ctl -l "'ov5640 1-003c':0 -> 'csi':0[1]"
+ media-ctl -l "'csi':1 -> 'csi capture':0[1]"
+
+ # Configure pads for pipeline
+ media-ctl -v -V "'ov5640 1-003c':0 [fmt:UYVY8_2X8/640x480 field:none]"
+
+After this streaming can start:
+
+.. code-block:: none
+
+ gst-launch-1.0 -v v4l2src device=/dev/video1 ! video/x-raw,format=UYVY,width=640,height=480 ! v4l2convert ! fbdevsink
+
+.. code-block:: none
+
+ # media-ctl -p
+ Media controller API version 5.14.0
+
+ Media device information
+ ------------------------
+ driver imx7-csi
+ model imx-media
+ serial
+ bus info
+ hw revision 0x0
+ driver version 5.14.0
+
+ Device topology
+ - entity 1: csi (2 pads, 2 links)
+ type V4L2 subdev subtype Unknown flags 0
+ device node name /dev/v4l-subdev0
+ pad0: Sink
+ [fmt:UYVY8_2X8/640x480 field:none colorspace:srgb xfer:srgb ycbcr:601 quantization:full-range]
+ <- "ov5640 1-003c":0 [ENABLED,IMMUTABLE]
+ pad1: Source
+ [fmt:UYVY8_2X8/640x480 field:none colorspace:srgb xfer:srgb ycbcr:601 quantization:full-range]
+ -> "csi capture":0 [ENABLED,IMMUTABLE]
+
+ - entity 4: csi capture (1 pad, 1 link)
+ type Node subtype V4L flags 0
+ device node name /dev/video1
+ pad0: Sink
+ <- "csi":1 [ENABLED,IMMUTABLE]
+
+ - entity 10: ov5640 1-003c (1 pad, 1 link)
+ type V4L2 subdev subtype Sensor flags 0
+ device node name /dev/v4l-subdev1
+ pad0: Source
+ [fmt:UYVY8_2X8/640x480@1/30 field:none colorspace:srgb xfer:srgb ycbcr:601 quantization:full-range]
+ -> "csi":0 [ENABLED,IMMUTABLE]
+
+References
+----------
+
+.. [#f1] https://www.nxp.com/docs/en/reference-manual/IMX7SRM.pdf
diff --git a/Documentation/admin-guide/media/index.rst b/Documentation/admin-guide/media/index.rst
new file mode 100644
index 000000000000..be7e0e4482ca
--- /dev/null
+++ b/Documentation/admin-guide/media/index.rst
@@ -0,0 +1,56 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: <isonum.txt>
+
+====================================
+Media subsystem admin and user guide
+====================================
+
+This section contains usage information about media subsystem and
+its supported drivers.
+
+Please see:
+
+Documentation/userspace-api/media/index.rst
+
+ - for the userspace APIs used on media devices.
+
+Documentation/driver-api/media/index.rst
+
+ - for driver development information and Kernel APIs used by
+ media devices;
+
+.. toctree::
+ :caption: Table of Contents
+ :maxdepth: 2
+ :numbered:
+
+ intro
+ building
+
+ remote-controller
+
+ cec
+
+ dvb
+
+ cardlist
+
+ v4l-drivers
+ dvb-drivers
+
+**Copyright** |copy| 1999-2020 : LinuxTV Developers
+
+::
+
+ This documentation is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by the Free
+ Software Foundation; either version 2 of the License, or (at your option) any
+ later version.
+
+ This program is distributed in the hope that it will be useful, but WITHOUT
+ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ more details.
+
+ For more details see the file COPYING in the source distribution of Linux.
diff --git a/Documentation/admin-guide/media/intro.rst b/Documentation/admin-guide/media/intro.rst
new file mode 100644
index 000000000000..fec8122f2412
--- /dev/null
+++ b/Documentation/admin-guide/media/intro.rst
@@ -0,0 +1,27 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============
+Introduction
+============
+
+The media subsystem consists on Linux support for several different types
+of devices:
+
+- Audio and video grabbers;
+- PC and Laptop Cameras;
+- Complex cameras found on Embedded hardware;
+- Analog and digital TV;
+- HDMI Customer Electronics Control (CEC);
+- Multi-touch input devices;
+- Remote Controllers;
+- Media encoders and decoders.
+
+Due to the diversity of devices, the subsystem provides several different
+APIs:
+
+- Remote Controller API;
+- HDMI CEC API;
+- Video4Linux API;
+- Media controller API;
+- Video4Linux Request API (experimental);
+- Digital TV API (also known as DVB API).
diff --git a/Documentation/admin-guide/media/ipu3.rst b/Documentation/admin-guide/media/ipu3.rst
new file mode 100644
index 000000000000..83b3cd03b35c
--- /dev/null
+++ b/Documentation/admin-guide/media/ipu3.rst
@@ -0,0 +1,600 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: <isonum.txt>
+
+===============================================================
+Intel Image Processing Unit 3 (IPU3) Imaging Unit (ImgU) driver
+===============================================================
+
+Copyright |copy| 2018 Intel Corporation
+
+Introduction
+============
+
+This file documents the Intel IPU3 (3rd generation Image Processing Unit)
+Imaging Unit drivers located under drivers/media/pci/intel/ipu3 (CIO2) as well
+as under drivers/staging/media/ipu3 (ImgU).
+
+The Intel IPU3 found in certain Kaby Lake (as well as certain Sky Lake)
+platforms (U/Y processor lines) is made up of two parts namely the Imaging Unit
+(ImgU) and the CIO2 device (MIPI CSI2 receiver).
+
+The CIO2 device receives the raw Bayer data from the sensors and outputs the
+frames in a format that is specific to the IPU3 (for consumption by the IPU3
+ImgU). The CIO2 driver is available as drivers/media/pci/intel/ipu3/ipu3-cio2*
+and is enabled through the CONFIG_VIDEO_IPU3_CIO2 config option.
+
+The Imaging Unit (ImgU) is responsible for processing images captured
+by the IPU3 CIO2 device. The ImgU driver sources can be found under
+drivers/staging/media/ipu3 directory. The driver is enabled through the
+CONFIG_VIDEO_IPU3_IMGU config option.
+
+The two driver modules are named ipu3_csi2 and ipu3_imgu, respectively.
+
+The drivers has been tested on Kaby Lake platforms (U/Y processor lines).
+
+Both of the drivers implement V4L2, Media Controller and V4L2 sub-device
+interfaces. The IPU3 CIO2 driver supports camera sensors connected to the CIO2
+MIPI CSI-2 interfaces through V4L2 sub-device sensor drivers.
+
+CIO2
+====
+
+The CIO2 is represented as a single V4L2 subdev, which provides a V4L2 subdev
+interface to the user space. There is a video node for each CSI-2 receiver,
+with a single media controller interface for the entire device.
+
+The CIO2 contains four independent capture channel, each with its own MIPI CSI-2
+receiver and DMA engine. Each channel is modelled as a V4L2 sub-device exposed
+to userspace as a V4L2 sub-device node and has two pads:
+
+.. tabularcolumns:: |p{0.8cm}|p{4.0cm}|p{4.0cm}|
+
+.. flat-table::
+ :header-rows: 1
+
+ * - Pad
+ - Direction
+ - Purpose
+
+ * - 0
+ - sink
+ - MIPI CSI-2 input, connected to the sensor subdev
+
+ * - 1
+ - source
+ - Raw video capture, connected to the V4L2 video interface
+
+The V4L2 video interfaces model the DMA engines. They are exposed to userspace
+as V4L2 video device nodes.
+
+Capturing frames in raw Bayer format
+------------------------------------
+
+CIO2 MIPI CSI2 receiver is used to capture frames (in packed raw Bayer format)
+from the raw sensors connected to the CSI2 ports. The captured frames are used
+as input to the ImgU driver.
+
+Image processing using IPU3 ImgU requires tools such as raw2pnm [#f1]_, and
+yavta [#f2]_ due to the following unique requirements and / or features specific
+to IPU3.
+
+-- The IPU3 CSI2 receiver outputs the captured frames from the sensor in packed
+raw Bayer format that is specific to IPU3.
+
+-- Multiple video nodes have to be operated simultaneously.
+
+Let us take the example of ov5670 sensor connected to CSI2 port 0, for a
+2592x1944 image capture.
+
+Using the media controller APIs, the ov5670 sensor is configured to send
+frames in packed raw Bayer format to IPU3 CSI2 receiver.
+
+.. code-block:: none
+
+ # This example assumes /dev/media0 as the CIO2 media device
+ export MDEV=/dev/media0
+
+ # and that ov5670 sensor is connected to i2c bus 10 with address 0x36
+ export SDEV=$(media-ctl -d $MDEV -e "ov5670 10-0036")
+
+ # Establish the link for the media devices using media-ctl [#f3]_
+ media-ctl -d $MDEV -l "ov5670:0 -> ipu3-csi2 0:0[1]"
+
+ # Set the format for the media devices
+ media-ctl -d $MDEV -V "ov5670:0 [fmt:SGRBG10/2592x1944]"
+ media-ctl -d $MDEV -V "ipu3-csi2 0:0 [fmt:SGRBG10/2592x1944]"
+ media-ctl -d $MDEV -V "ipu3-csi2 0:1 [fmt:SGRBG10/2592x1944]"
+
+Once the media pipeline is configured, desired sensor specific settings
+(such as exposure and gain settings) can be set, using the yavta tool.
+
+e.g
+
+.. code-block:: none
+
+ yavta -w 0x009e0903 444 $SDEV
+ yavta -w 0x009e0913 1024 $SDEV
+ yavta -w 0x009e0911 2046 $SDEV
+
+Once the desired sensor settings are set, frame captures can be done as below.
+
+e.g
+
+.. code-block:: none
+
+ yavta --data-prefix -u -c10 -n5 -I -s2592x1944 --file=/tmp/frame-#.bin \
+ -f IPU3_SGRBG10 $(media-ctl -d $MDEV -e "ipu3-cio2 0")
+
+With the above command, 10 frames are captured at 2592x1944 resolution, with
+sGRBG10 format and output as IPU3_SGRBG10 format.
+
+The captured frames are available as /tmp/frame-#.bin files.
+
+ImgU
+====
+
+The ImgU is represented as two V4L2 subdevs, each of which provides a V4L2
+subdev interface to the user space.
+
+Each V4L2 subdev represents a pipe, which can support a maximum of 2 streams.
+This helps to support advanced camera features like Continuous View Finder (CVF)
+and Snapshot During Video(SDV).
+
+The ImgU contains two independent pipes, each modelled as a V4L2 sub-device
+exposed to userspace as a V4L2 sub-device node.
+
+Each pipe has two sink pads and three source pads for the following purpose:
+
+.. tabularcolumns:: |p{0.8cm}|p{4.0cm}|p{4.0cm}|
+
+.. flat-table::
+ :header-rows: 1
+
+ * - Pad
+ - Direction
+ - Purpose
+
+ * - 0
+ - sink
+ - Input raw video stream
+
+ * - 1
+ - sink
+ - Processing parameters
+
+ * - 2
+ - source
+ - Output processed video stream
+
+ * - 3
+ - source
+ - Output viewfinder video stream
+
+ * - 4
+ - source
+ - 3A statistics
+
+Each pad is connected to a corresponding V4L2 video interface, exposed to
+userspace as a V4L2 video device node.
+
+Device operation
+----------------
+
+With ImgU, once the input video node ("ipu3-imgu 0/1":0, in
+<entity>:<pad-number> format) is queued with buffer (in packed raw Bayer
+format), ImgU starts processing the buffer and produces the video output in YUV
+format and statistics output on respective output nodes. The driver is expected
+to have buffers ready for all of parameter, output and statistics nodes, when
+input video node is queued with buffer.
+
+At a minimum, all of input, main output, 3A statistics and viewfinder
+video nodes should be enabled for IPU3 to start image processing.
+
+Each ImgU V4L2 subdev has the following set of video nodes.
+
+input, output and viewfinder video nodes
+----------------------------------------
+
+The frames (in packed raw Bayer format specific to the IPU3) received by the
+input video node is processed by the IPU3 Imaging Unit and are output to 2 video
+nodes, with each targeting a different purpose (main output and viewfinder
+output).
+
+Details onand the Bayer format specific to the IPU3 can be found in
+:ref:`v4l2-pix-fmt-ipu3-sbggr10`.
+
+The driver supports V4L2 Video Capture Interface as defined at :ref:`devices`.
+
+Only the multi-planar API is supported. More details can be found at
+:ref:`planar-apis`.
+
+Parameters video node
+---------------------
+
+The parameters video node receives the ImgU algorithm parameters that are used
+to configure how the ImgU algorithms process the image.
+
+Details on processing parameters specific to the IPU3 can be found in
+:ref:`v4l2-meta-fmt-params`.
+
+3A statistics video node
+------------------------
+
+3A statistics video node is used by the ImgU driver to output the 3A (auto
+focus, auto exposure and auto white balance) statistics for the frames that are
+being processed by the ImgU to user space applications. User space applications
+can use this statistics data to compute the desired algorithm parameters for
+the ImgU.
+
+Configuring the Intel IPU3
+==========================
+
+The IPU3 ImgU pipelines can be configured using the Media Controller, defined at
+:ref:`media_controller`.
+
+Running mode and firmware binary selection
+------------------------------------------
+
+ImgU works based on firmware, currently the ImgU firmware support run 2 pipes
+in time-sharing with single input frame data. Each pipe can run at certain mode
+- "VIDEO" or "STILL", "VIDEO" mode is commonly used for video frames capture,
+and "STILL" is used for still frame capture. However, you can also select
+"VIDEO" to capture still frames if you want to capture images with less system
+load and power. For "STILL" mode, ImgU will try to use smaller BDS factor and
+output larger bayer frame for further YUV processing than "VIDEO" mode to get
+high quality images. Besides, "STILL" mode need XNR3 to do noise reduction,
+hence "STILL" mode will need more power and memory bandwidth than "VIDEO" mode.
+TNR will be enabled in "VIDEO" mode and bypassed by "STILL" mode. ImgU is
+running at "VIDEO" mode by default, the user can use v4l2 control
+V4L2_CID_INTEL_IPU3_MODE (currently defined in
+drivers/staging/media/ipu3/include/uapi/intel-ipu3.h) to query and set the
+running mode. For user, there is no difference for buffer queueing between the
+"VIDEO" and "STILL" mode, mandatory input and main output node should be
+enabled and buffers need be queued, the statistics and the view-finder queues
+are optional.
+
+The firmware binary will be selected according to current running mode, such log
+"using binary if_to_osys_striped " or "using binary if_to_osys_primary_striped"
+could be observed if you enable the ImgU dynamic debug, the binary
+if_to_osys_striped is selected for "VIDEO" and the binary
+"if_to_osys_primary_striped" is selected for "STILL".
+
+
+Processing the image in raw Bayer format
+----------------------------------------
+
+Configuring ImgU V4L2 subdev for image processing
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ImgU V4L2 subdevs have to be configured with media controller APIs to have
+all the video nodes setup correctly.
+
+Let us take "ipu3-imgu 0" subdev as an example.
+
+.. code-block:: none
+
+ media-ctl -d $MDEV -r
+ media-ctl -d $MDEV -l "ipu3-imgu 0 input":0 -> "ipu3-imgu 0":0[1]
+ media-ctl -d $MDEV -l "ipu3-imgu 0":2 -> "ipu3-imgu 0 output":0[1]
+ media-ctl -d $MDEV -l "ipu3-imgu 0":3 -> "ipu3-imgu 0 viewfinder":0[1]
+ media-ctl -d $MDEV -l "ipu3-imgu 0":4 -> "ipu3-imgu 0 3a stat":0[1]
+
+Also the pipe mode of the corresponding V4L2 subdev should be set as desired
+(e.g 0 for video mode or 1 for still mode) through the control id 0x009819a1 as
+below.
+
+.. code-block:: none
+
+ yavta -w "0x009819A1 1" /dev/v4l-subdev7
+
+Certain hardware blocks in ImgU pipeline can change the frame resolution by
+cropping or scaling, these hardware blocks include Input Feeder(IF), Bayer Down
+Scaler (BDS) and Geometric Distortion Correction (GDC).
+There is also a block which can change the frame resolution - YUV Scaler, it is
+only applicable to the secondary output.
+
+RAW Bayer frames go through these ImgU pipeline hardware blocks and the final
+processed image output to the DDR memory.
+
+.. kernel-figure:: ipu3_rcb.svg
+ :alt: ipu3 resolution blocks image
+
+ IPU3 resolution change hardware blocks
+
+**Input Feeder**
+
+Input Feeder gets the Bayer frame data from the sensor, it can enable cropping
+of lines and columns from the frame and then store pixels into device's internal
+pixel buffer which are ready to readout by following blocks.
+
+**Bayer Down Scaler**
+
+Bayer Down Scaler is capable of performing image scaling in Bayer domain, the
+downscale factor can be configured from 1X to 1/4X in each axis with
+configuration steps of 0.03125 (1/32).
+
+**Geometric Distortion Correction**
+
+Geometric Distortion Correction is used to perform correction of distortions
+and image filtering. It needs some extra filter and envelope padding pixels to
+work, so the input resolution of GDC should be larger than the output
+resolution.
+
+**YUV Scaler**
+
+YUV Scaler which similar with BDS, but it is mainly do image down scaling in
+YUV domain, it can support up to 1/12X down scaling, but it can not be applied
+to the main output.
+
+The ImgU V4L2 subdev has to be configured with the supported resolutions in all
+the above hardware blocks, for a given input resolution.
+For a given supported resolution for an input frame, the Input Feeder, Bayer
+Down Scaler and GDC blocks should be configured with the supported resolutions
+as each hardware block has its own alignment requirement.
+
+You must configure the output resolution of the hardware blocks smartly to meet
+the hardware requirement along with keeping the maximum field of view. The
+intermediate resolutions can be generated by specific tool -
+
+https://github.com/intel/intel-ipu3-pipecfg
+
+This tool can be used to generate intermediate resolutions. More information can
+be obtained by looking at the following IPU3 ImgU configuration table.
+
+https://chromium.googlesource.com/chromiumos/overlays/board-overlays/+/master
+
+Under baseboard-poppy/media-libs/cros-camera-hal-configs-poppy/files/gcss
+directory, graph_settings_ov5670.xml can be used as an example.
+
+The following steps prepare the ImgU pipeline for the image processing.
+
+1. The ImgU V4L2 subdev data format should be set by using the
+VIDIOC_SUBDEV_S_FMT on pad 0, using the GDC width and height obtained above.
+
+2. The ImgU V4L2 subdev cropping should be set by using the
+VIDIOC_SUBDEV_S_SELECTION on pad 0, with V4L2_SEL_TGT_CROP as the target,
+using the input feeder height and width.
+
+3. The ImgU V4L2 subdev composing should be set by using the
+VIDIOC_SUBDEV_S_SELECTION on pad 0, with V4L2_SEL_TGT_COMPOSE as the target,
+using the BDS height and width.
+
+For the ov5670 example, for an input frame with a resolution of 2592x1944
+(which is input to the ImgU subdev pad 0), the corresponding resolutions
+for input feeder, BDS and GDC are 2592x1944, 2592x1944 and 2560x1920
+respectively.
+
+Once this is done, the received raw Bayer frames can be input to the ImgU
+V4L2 subdev as below, using the open source application v4l2n [#f1]_.
+
+For an image captured with 2592x1944 [#f4]_ resolution, with desired output
+resolution as 2560x1920 and viewfinder resolution as 2560x1920, the following
+v4l2n command can be used. This helps process the raw Bayer frames and produces
+the desired results for the main output image and the viewfinder output, in NV12
+format.
+
+.. code-block:: none
+
+ v4l2n --pipe=4 --load=/tmp/frame-#.bin --open=/dev/video4
+ --fmt=type:VIDEO_OUTPUT_MPLANE,width=2592,height=1944,pixelformat=0X47337069 \
+ --reqbufs=type:VIDEO_OUTPUT_MPLANE,count:1 --pipe=1 \
+ --output=/tmp/frames.out --open=/dev/video5 \
+ --fmt=type:VIDEO_CAPTURE_MPLANE,width=2560,height=1920,pixelformat=NV12 \
+ --reqbufs=type:VIDEO_CAPTURE_MPLANE,count:1 --pipe=2 \
+ --output=/tmp/frames.vf --open=/dev/video6 \
+ --fmt=type:VIDEO_CAPTURE_MPLANE,width=2560,height=1920,pixelformat=NV12 \
+ --reqbufs=type:VIDEO_CAPTURE_MPLANE,count:1 --pipe=3 --open=/dev/video7 \
+ --output=/tmp/frames.3A --fmt=type:META_CAPTURE,? \
+ --reqbufs=count:1,type:META_CAPTURE --pipe=1,2,3,4 --stream=5
+
+You can also use yavta [#f2]_ command to do same thing as above:
+
+.. code-block:: none
+
+ yavta --data-prefix -Bcapture-mplane -c10 -n5 -I -s2592x1944 \
+ --file=frame-#.out-f NV12 /dev/video5 & \
+ yavta --data-prefix -Bcapture-mplane -c10 -n5 -I -s2592x1944 \
+ --file=frame-#.vf -f NV12 /dev/video6 & \
+ yavta --data-prefix -Bmeta-capture -c10 -n5 -I \
+ --file=frame-#.3a /dev/video7 & \
+ yavta --data-prefix -Boutput-mplane -c10 -n5 -I -s2592x1944 \
+ --file=/tmp/frame-in.cio2 -f IPU3_SGRBG10 /dev/video4
+
+where /dev/video4, /dev/video5, /dev/video6 and /dev/video7 devices point to
+input, output, viewfinder and 3A statistics video nodes respectively.
+
+Converting the raw Bayer image into YUV domain
+----------------------------------------------
+
+The processed images after the above step, can be converted to YUV domain
+as below.
+
+Main output frames
+~~~~~~~~~~~~~~~~~~
+
+.. code-block:: none
+
+ raw2pnm -x2560 -y1920 -fNV12 /tmp/frames.out /tmp/frames.out.ppm
+
+where 2560x1920 is output resolution, NV12 is the video format, followed
+by input frame and output PNM file.
+
+Viewfinder output frames
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: none
+
+ raw2pnm -x2560 -y1920 -fNV12 /tmp/frames.vf /tmp/frames.vf.ppm
+
+where 2560x1920 is output resolution, NV12 is the video format, followed
+by input frame and output PNM file.
+
+Example user space code for IPU3
+================================
+
+User space code that configures and uses IPU3 is available here.
+
+https://chromium.googlesource.com/chromiumos/platform/arc-camera/+/master/
+
+The source can be located under hal/intel directory.
+
+Overview of IPU3 pipeline
+=========================
+
+IPU3 pipeline has a number of image processing stages, each of which takes a
+set of parameters as input. The major stages of pipelines are shown here:
+
+.. kernel-render:: DOT
+ :alt: IPU3 ImgU Pipeline
+ :caption: IPU3 ImgU Pipeline Diagram
+
+ digraph "IPU3 ImgU" {
+ node [shape=box]
+ splines="ortho"
+ rankdir="LR"
+
+ a [label="Raw pixels"]
+ b [label="Bayer Downscaling"]
+ c [label="Optical Black Correction"]
+ d [label="Linearization"]
+ e [label="Lens Shading Correction"]
+ f [label="White Balance / Exposure / Focus Apply"]
+ g [label="Bayer Noise Reduction"]
+ h [label="ANR"]
+ i [label="Demosaicing"]
+ j [label="Color Correction Matrix"]
+ k [label="Gamma correction"]
+ l [label="Color Space Conversion"]
+ m [label="Chroma Down Scaling"]
+ n [label="Chromatic Noise Reduction"]
+ o [label="Total Color Correction"]
+ p [label="XNR3"]
+ q [label="TNR"]
+ r [label="DDR", style=filled, fillcolor=yellow, shape=cylinder]
+ s [label="YUV Downscaling"]
+ t [label="DDR", style=filled, fillcolor=yellow, shape=cylinder]
+
+ { rank=same; a -> b -> c -> d -> e -> f -> g -> h -> i }
+ { rank=same; j -> k -> l -> m -> n -> o -> p -> q -> s -> t}
+
+ a -> j [style=invis, weight=10]
+ i -> j
+ q -> r
+ }
+
+The table below presents a description of the above algorithms.
+
+======================== =======================================================
+Name Description
+======================== =======================================================
+Optical Black Correction Optical Black Correction block subtracts a pre-defined
+ value from the respective pixel values to obtain better
+ image quality.
+ Defined in struct ipu3_uapi_obgrid_param.
+Linearization This algo block uses linearization parameters to
+ address non-linearity sensor effects. The Lookup table
+ table is defined in
+ struct ipu3_uapi_isp_lin_vmem_params.
+SHD Lens shading correction is used to correct spatial
+ non-uniformity of the pixel response due to optical
+ lens shading. This is done by applying a different gain
+ for each pixel. The gain, black level etc are
+ configured in struct ipu3_uapi_shd_config_static.
+BNR Bayer noise reduction block removes image noise by
+ applying a bilateral filter.
+ See struct ipu3_uapi_bnr_static_config for details.
+ANR Advanced Noise Reduction is a block based algorithm
+ that performs noise reduction in the Bayer domain. The
+ convolution matrix etc can be found in
+ struct ipu3_uapi_anr_config.
+DM Demosaicing converts raw sensor data in Bayer format
+ into RGB (Red, Green, Blue) presentation. Then add
+ outputs of estimation of Y channel for following stream
+ processing by Firmware. The struct is defined as
+ struct ipu3_uapi_dm_config.
+Color Correction Color Correction algo transforms sensor specific color
+ space to the standard "sRGB" color space. This is done
+ by applying 3x3 matrix defined in
+ struct ipu3_uapi_ccm_mat_config.
+Gamma correction Gamma correction struct ipu3_uapi_gamma_config is a
+ basic non-linear tone mapping correction that is
+ applied per pixel for each pixel component.
+CSC Color space conversion transforms each pixel from the
+ RGB primary presentation to YUV (Y: brightness,
+ UV: Luminance) presentation. This is done by applying
+ a 3x3 matrix defined in
+ struct ipu3_uapi_csc_mat_config
+CDS Chroma down sampling
+ After the CSC is performed, the Chroma Down Sampling
+ is applied for a UV plane down sampling by a factor
+ of 2 in each direction for YUV 4:2:0 using a 4x2
+ configurable filter struct ipu3_uapi_cds_params.
+CHNR Chroma noise reduction
+ This block processes only the chrominance pixels and
+ performs noise reduction by cleaning the high
+ frequency noise.
+ See struct struct ipu3_uapi_yuvp1_chnr_config.
+TCC Total color correction as defined in struct
+ struct ipu3_uapi_yuvp2_tcc_static_config.
+XNR3 eXtreme Noise Reduction V3 is the third revision of
+ noise reduction algorithm used to improve image
+ quality. This removes the low frequency noise in the
+ captured image. Two related structs are being defined,
+ struct ipu3_uapi_isp_xnr3_params for ISP data memory
+ and struct ipu3_uapi_isp_xnr3_vmem_params for vector
+ memory.
+TNR Temporal Noise Reduction block compares successive
+ frames in time to remove anomalies / noise in pixel
+ values. struct ipu3_uapi_isp_tnr3_vmem_params and
+ struct ipu3_uapi_isp_tnr3_params are defined for ISP
+ vector and data memory respectively.
+======================== =======================================================
+
+Other often encountered acronyms not listed in above table:
+
+ ACC
+ Accelerator cluster
+ AWB_FR
+ Auto white balance filter response statistics
+ BDS
+ Bayer downscaler parameters
+ CCM
+ Color correction matrix coefficients
+ IEFd
+ Image enhancement filter directed
+ Obgrid
+ Optical black level compensation
+ OSYS
+ Output system configuration
+ ROI
+ Region of interest
+ YDS
+ Y down sampling
+ YTM
+ Y-tone mapping
+
+A few stages of the pipeline will be executed by firmware running on the ISP
+processor, while many others will use a set of fixed hardware blocks also
+called accelerator cluster (ACC) to crunch pixel data and produce statistics.
+
+ACC parameters of individual algorithms, as defined by
+struct ipu3_uapi_acc_param, can be chosen to be applied by the user
+space through struct struct ipu3_uapi_flags embedded in
+struct ipu3_uapi_params structure. For parameters that are configured as
+not enabled by the user space, the corresponding structs are ignored by the
+driver, in which case the existing configuration of the algorithm will be
+preserved.
+
+References
+==========
+
+.. [#f5] drivers/staging/media/ipu3/include/uapi/intel-ipu3.h
+
+.. [#f1] https://github.com/intel/nvt
+
+.. [#f2] http://git.ideasonboard.org/yavta.git
+
+.. [#f3] http://git.ideasonboard.org/?p=media-ctl.git;a=summary
+
+.. [#f4] ImgU limitation requires an additional 16x16 for all input resolutions
diff --git a/Documentation/admin-guide/media/ipu3_rcb.svg b/Documentation/admin-guide/media/ipu3_rcb.svg
new file mode 100644
index 000000000000..d878421b42a0
--- /dev/null
+++ b/Documentation/admin-guide/media/ipu3_rcb.svg
@@ -0,0 +1,331 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="774pt" height="152pt" viewBox="0 0 774 152" version="1.1">
+<defs>
+<g>
+<symbol overflow="visible" id="glyph0-0">
+<path style="stroke:none;" d="M 1 0 L 1 -15 L 9 -15 L 9 0 Z M 8 -1 L 8 -14 L 2 -14 L 2 -1 Z M 8 -1 "/>
+</symbol>
+<symbol overflow="visible" id="glyph0-1">
+<path style="stroke:none;" d="M 4.6875 -1.15625 C 5.519531 -1.15625 6.15625 -1.316406 6.59375 -1.640625 C 7.039062 -1.960938 7.265625 -2.441406 7.265625 -3.078125 C 7.265625 -3.460938 7.179688 -3.789062 7.015625 -4.0625 C 6.859375 -4.34375 6.644531 -4.582031 6.375 -4.78125 C 6.113281 -4.988281 5.816406 -5.171875 5.484375 -5.328125 C 5.148438 -5.484375 4.804688 -5.628906 4.453125 -5.765625 C 4.054688 -5.921875 3.675781 -6.097656 3.3125 -6.296875 C 2.945312 -6.492188 2.617188 -6.726562 2.328125 -7 C 2.046875 -7.269531 1.820312 -7.582031 1.65625 -7.9375 C 1.488281 -8.300781 1.40625 -8.726562 1.40625 -9.21875 C 1.40625 -10.300781 1.742188 -11.144531 2.421875 -11.75 C 3.097656 -12.351562 4.046875 -12.65625 5.265625 -12.65625 C 5.597656 -12.65625 5.925781 -12.628906 6.25 -12.578125 C 6.570312 -12.535156 6.875 -12.476562 7.15625 -12.40625 C 7.4375 -12.34375 7.6875 -12.265625 7.90625 -12.171875 C 8.125 -12.085938 8.300781 -12 8.4375 -11.90625 L 7.921875 -10.515625 C 7.648438 -10.679688 7.28125 -10.84375 6.8125 -11 C 6.351562 -11.15625 5.835938 -11.234375 5.265625 -11.234375 C 4.660156 -11.234375 4.140625 -11.082031 3.703125 -10.78125 C 3.265625 -10.488281 3.046875 -10.039062 3.046875 -9.4375 C 3.046875 -9.09375 3.109375 -8.800781 3.234375 -8.5625 C 3.359375 -8.320312 3.53125 -8.109375 3.75 -7.921875 C 3.96875 -7.742188 4.222656 -7.582031 4.515625 -7.4375 C 4.804688 -7.289062 5.128906 -7.144531 5.484375 -7 C 5.984375 -6.789062 6.441406 -6.578125 6.859375 -6.359375 C 7.285156 -6.148438 7.648438 -5.894531 7.953125 -5.59375 C 8.253906 -5.300781 8.488281 -4.953125 8.65625 -4.546875 C 8.820312 -4.148438 8.90625 -3.664062 8.90625 -3.09375 C 8.90625 -2.019531 8.539062 -1.191406 7.8125 -0.609375 C 7.082031 -0.0234375 6.039062 0.265625 4.6875 0.265625 C 4.238281 0.265625 3.820312 0.234375 3.4375 0.171875 C 3.050781 0.109375 2.707031 0.03125 2.40625 -0.0625 C 2.101562 -0.15625 1.835938 -0.25 1.609375 -0.34375 C 1.390625 -0.4375 1.21875 -0.519531 1.09375 -0.59375 L 1.59375 -1.953125 C 1.863281 -1.804688 2.257812 -1.632812 2.78125 -1.4375 C 3.300781 -1.25 3.9375 -1.15625 4.6875 -1.15625 Z M 4.6875 -1.15625 "/>
+</symbol>
+<symbol overflow="visible" id="glyph0-2">
+<path style="stroke:none;" d="M 5.1875 -9.5 C 6.4375 -9.5 7.398438 -9.109375 8.078125 -8.328125 C 8.753906 -7.546875 9.09375 -6.363281 9.09375 -4.78125 L 9.09375 -4.203125 L 2.453125 -4.203125 C 2.523438 -3.242188 2.84375 -2.515625 3.40625 -2.015625 C 3.976562 -1.515625 4.773438 -1.265625 5.796875 -1.265625 C 6.390625 -1.265625 6.890625 -1.3125 7.296875 -1.40625 C 7.710938 -1.5 8.023438 -1.597656 8.234375 -1.703125 L 8.453125 -0.296875 C 8.253906 -0.191406 7.894531 -0.0820312 7.375 0.03125 C 6.851562 0.15625 6.269531 0.21875 5.625 0.21875 C 4.820312 0.21875 4.113281 0.0976562 3.5 -0.140625 C 2.894531 -0.390625 2.394531 -0.726562 2 -1.15625 C 1.601562 -1.582031 1.300781 -2.09375 1.09375 -2.6875 C 0.894531 -3.28125 0.796875 -3.925781 0.796875 -4.625 C 0.796875 -5.445312 0.921875 -6.164062 1.171875 -6.78125 C 1.429688 -7.394531 1.765625 -7.898438 2.171875 -8.296875 C 2.585938 -8.703125 3.054688 -9.003906 3.578125 -9.203125 C 4.097656 -9.398438 4.632812 -9.5 5.1875 -9.5 Z M 7.421875 -5.546875 C 7.421875 -6.328125 7.210938 -6.945312 6.796875 -7.40625 C 6.390625 -7.863281 5.84375 -8.09375 5.15625 -8.09375 C 4.769531 -8.09375 4.421875 -8.019531 4.109375 -7.875 C 3.796875 -7.726562 3.523438 -7.535156 3.296875 -7.296875 C 3.066406 -7.054688 2.882812 -6.78125 2.75 -6.46875 C 2.625 -6.164062 2.539062 -5.859375 2.5 -5.546875 Z M 7.421875 -5.546875 "/>
+</symbol>
+<symbol overflow="visible" id="glyph0-3">
+<path style="stroke:none;" d="M 1.421875 -9.015625 C 2.015625 -9.160156 2.609375 -9.273438 3.203125 -9.359375 C 3.796875 -9.441406 4.351562 -9.484375 4.875 -9.484375 C 6.113281 -9.484375 7.050781 -9.160156 7.6875 -8.515625 C 8.320312 -7.878906 8.640625 -6.851562 8.640625 -5.4375 L 8.640625 0 L 7 0 L 7 -5.140625 C 7 -5.742188 6.945312 -6.226562 6.84375 -6.59375 C 6.738281 -6.96875 6.585938 -7.257812 6.390625 -7.46875 C 6.191406 -7.675781 5.957031 -7.816406 5.6875 -7.890625 C 5.414062 -7.972656 5.117188 -8.015625 4.796875 -8.015625 C 4.535156 -8.015625 4.253906 -8 3.953125 -7.96875 C 3.648438 -7.9375 3.359375 -7.894531 3.078125 -7.84375 L 3.078125 0 L 1.421875 0 Z M 1.421875 -9.015625 "/>
+</symbol>
+<symbol overflow="visible" id="glyph0-4">
+<path style="stroke:none;" d="M 7.015625 -2.3125 C 7.015625 -2.644531 6.878906 -2.914062 6.609375 -3.125 C 6.335938 -3.34375 6 -3.53125 5.59375 -3.6875 C 5.1875 -3.851562 4.742188 -4.015625 4.265625 -4.171875 C 3.785156 -4.328125 3.335938 -4.515625 2.921875 -4.734375 C 2.515625 -4.960938 2.175781 -5.242188 1.90625 -5.578125 C 1.632812 -5.910156 1.5 -6.34375 1.5 -6.875 C 1.5 -7.625 1.800781 -8.25 2.40625 -8.75 C 3.007812 -9.25 3.960938 -9.5 5.265625 -9.5 C 5.765625 -9.5 6.285156 -9.460938 6.828125 -9.390625 C 7.367188 -9.316406 7.832031 -9.21875 8.21875 -9.09375 L 7.921875 -7.625 C 7.816406 -7.675781 7.671875 -7.726562 7.484375 -7.78125 C 7.296875 -7.84375 7.082031 -7.894531 6.84375 -7.9375 C 6.601562 -7.988281 6.34375 -8.023438 6.0625 -8.046875 C 5.789062 -8.078125 5.53125 -8.09375 5.28125 -8.09375 C 3.84375 -8.09375 3.125 -7.703125 3.125 -6.921875 C 3.125 -6.640625 3.257812 -6.398438 3.53125 -6.203125 C 3.800781 -6.015625 4.144531 -5.835938 4.5625 -5.671875 C 4.976562 -5.515625 5.425781 -5.351562 5.90625 -5.1875 C 6.382812 -5.019531 6.828125 -4.816406 7.234375 -4.578125 C 7.648438 -4.335938 7.992188 -4.046875 8.265625 -3.703125 C 8.546875 -3.367188 8.6875 -2.941406 8.6875 -2.421875 C 8.6875 -1.578125 8.359375 -0.925781 7.703125 -0.46875 C 7.046875 -0.0078125 6.007812 0.21875 4.59375 0.21875 C 3.957031 0.21875 3.375 0.164062 2.84375 0.0625 C 2.3125 -0.0390625 1.800781 -0.203125 1.3125 -0.421875 L 1.640625 -1.921875 C 2.109375 -1.703125 2.597656 -1.523438 3.109375 -1.390625 C 3.617188 -1.253906 4.171875 -1.1875 4.765625 -1.1875 C 6.265625 -1.1875 7.015625 -1.5625 7.015625 -2.3125 Z M 7.015625 -2.3125 "/>
+</symbol>
+<symbol overflow="visible" id="glyph0-5">
+<path style="stroke:none;" d="M 9.203125 -4.640625 C 9.203125 -3.910156 9.097656 -3.25 8.890625 -2.65625 C 8.679688 -2.0625 8.390625 -1.550781 8.015625 -1.125 C 7.640625 -0.695312 7.191406 -0.363281 6.671875 -0.125 C 6.160156 0.101562 5.597656 0.21875 4.984375 0.21875 C 4.378906 0.21875 3.820312 0.101562 3.3125 -0.125 C 2.800781 -0.363281 2.359375 -0.695312 1.984375 -1.125 C 1.609375 -1.550781 1.316406 -2.0625 1.109375 -2.65625 C 0.898438 -3.25 0.796875 -3.910156 0.796875 -4.640625 C 0.796875 -5.367188 0.898438 -6.035156 1.109375 -6.640625 C 1.316406 -7.242188 1.609375 -7.753906 1.984375 -8.171875 C 2.359375 -8.585938 2.800781 -8.910156 3.3125 -9.140625 C 3.820312 -9.378906 4.378906 -9.5 4.984375 -9.5 C 5.597656 -9.5 6.160156 -9.378906 6.671875 -9.140625 C 7.191406 -8.910156 7.640625 -8.585938 8.015625 -8.171875 C 8.390625 -7.753906 8.679688 -7.242188 8.890625 -6.640625 C 9.097656 -6.035156 9.203125 -5.367188 9.203125 -4.640625 Z M 7.5 -4.640625 C 7.5 -5.691406 7.269531 -6.519531 6.8125 -7.125 C 6.363281 -7.738281 5.753906 -8.046875 4.984375 -8.046875 C 4.222656 -8.046875 3.617188 -7.738281 3.171875 -7.125 C 2.722656 -6.519531 2.5 -5.691406 2.5 -4.640625 C 2.5 -3.597656 2.722656 -2.773438 3.171875 -2.171875 C 3.617188 -1.566406 4.222656 -1.265625 4.984375 -1.265625 C 5.753906 -1.265625 6.363281 -1.566406 6.8125 -2.171875 C 7.269531 -2.773438 7.5 -3.597656 7.5 -4.640625 Z M 7.5 -4.640625 "/>
+</symbol>
+<symbol overflow="visible" id="glyph0-6">
+<path style="stroke:none;" d="M 2.140625 0 L 2.140625 -8.78125 C 3.503906 -9.25 4.878906 -9.484375 6.265625 -9.484375 C 6.691406 -9.484375 7.097656 -9.460938 7.484375 -9.421875 C 7.867188 -9.390625 8.296875 -9.320312 8.765625 -9.21875 L 8.453125 -7.765625 C 8.023438 -7.878906 7.648438 -7.953125 7.328125 -7.984375 C 7.003906 -8.023438 6.648438 -8.046875 6.265625 -8.046875 C 5.453125 -8.046875 4.625 -7.929688 3.78125 -7.703125 L 3.78125 0 Z M 2.140625 0 "/>
+</symbol>
+<symbol overflow="visible" id="glyph0-7">
+<path style="stroke:none;" d="M 5.8125 -10.984375 L 5.8125 -1.40625 L 8.21875 -1.40625 L 8.21875 0 L 1.78125 0 L 1.78125 -1.40625 L 4.1875 -1.40625 L 4.1875 -10.984375 L 1.78125 -10.984375 L 1.78125 -12.375 L 8.21875 -12.375 L 8.21875 -10.984375 Z M 5.8125 -10.984375 "/>
+</symbol>
+<symbol overflow="visible" id="glyph0-8">
+<path style="stroke:none;" d="M 1.8125 0 L 1.8125 -12.375 L 8.84375 -12.375 L 8.84375 -10.984375 L 3.453125 -10.984375 L 3.453125 -7.125 L 8.203125 -7.125 L 8.203125 -5.734375 L 3.453125 -5.734375 L 3.453125 0 Z M 1.8125 0 "/>
+</symbol>
+<symbol overflow="visible" id="glyph0-9">
+<path style="stroke:none;" d="M 4.078125 0.09375 C 3.878906 0.09375 3.644531 0.0859375 3.375 0.078125 C 3.113281 0.0664062 2.847656 0.0507812 2.578125 0.03125 C 2.316406 0.0078125 2.050781 -0.0195312 1.78125 -0.0625 C 1.507812 -0.101562 1.273438 -0.148438 1.078125 -0.203125 L 1.078125 -12.203125 C 1.273438 -12.253906 1.503906 -12.300781 1.765625 -12.34375 C 2.023438 -12.382812 2.289062 -12.410156 2.5625 -12.421875 C 2.84375 -12.441406 3.113281 -12.457031 3.375 -12.46875 C 3.632812 -12.488281 3.867188 -12.5 4.078125 -12.5 C 4.691406 -12.5 5.265625 -12.445312 5.796875 -12.34375 C 6.328125 -12.238281 6.789062 -12.054688 7.1875 -11.796875 C 7.582031 -11.546875 7.890625 -11.210938 8.109375 -10.796875 C 8.328125 -10.390625 8.4375 -9.878906 8.4375 -9.265625 C 8.4375 -8.960938 8.390625 -8.675781 8.296875 -8.40625 C 8.203125 -8.132812 8.070312 -7.878906 7.90625 -7.640625 C 7.738281 -7.398438 7.546875 -7.1875 7.328125 -7 C 7.109375 -6.820312 6.875 -6.6875 6.625 -6.59375 C 7.300781 -6.40625 7.867188 -6.0625 8.328125 -5.5625 C 8.785156 -5.0625 9.015625 -4.414062 9.015625 -3.625 C 9.015625 -2.394531 8.617188 -1.46875 7.828125 -0.84375 C 7.046875 -0.21875 5.796875 0.09375 4.078125 0.09375 Z M 2.71875 -5.78125 L 2.71875 -1.359375 C 2.75 -1.347656 2.898438 -1.332031 3.171875 -1.3125 C 3.441406 -1.289062 3.785156 -1.28125 4.203125 -1.28125 C 4.609375 -1.28125 5 -1.3125 5.375 -1.375 C 5.757812 -1.445312 6.097656 -1.570312 6.390625 -1.75 C 6.691406 -1.925781 6.929688 -2.160156 7.109375 -2.453125 C 7.285156 -2.753906 7.375 -3.132812 7.375 -3.59375 C 7.375 -4.007812 7.289062 -4.359375 7.125 -4.640625 C 6.957031 -4.921875 6.738281 -5.144531 6.46875 -5.3125 C 6.195312 -5.476562 5.878906 -5.597656 5.515625 -5.671875 C 5.160156 -5.742188 4.789062 -5.78125 4.40625 -5.78125 Z M 2.71875 -7.140625 L 4.015625 -7.140625 C 4.347656 -7.140625 4.679688 -7.171875 5.015625 -7.234375 C 5.347656 -7.304688 5.644531 -7.414062 5.90625 -7.5625 C 6.175781 -7.707031 6.390625 -7.90625 6.546875 -8.15625 C 6.710938 -8.414062 6.796875 -8.738281 6.796875 -9.125 C 6.796875 -9.476562 6.722656 -9.78125 6.578125 -10.03125 C 6.429688 -10.289062 6.238281 -10.5 6 -10.65625 C 5.757812 -10.820312 5.484375 -10.9375 5.171875 -11 C 4.859375 -11.0625 4.53125 -11.09375 4.1875 -11.09375 C 3.832031 -11.09375 3.523438 -11.085938 3.265625 -11.078125 C 3.003906 -11.078125 2.820312 -11.066406 2.71875 -11.046875 Z M 2.71875 -7.140625 "/>
+</symbol>
+<symbol overflow="visible" id="glyph0-10">
+<path style="stroke:none;" d="M 9.203125 -6.203125 C 9.203125 -5.054688 9.054688 -4.082031 8.765625 -3.28125 C 8.484375 -2.476562 8.09375 -1.828125 7.59375 -1.328125 C 7.09375 -0.828125 6.5 -0.460938 5.8125 -0.234375 C 5.125 -0.015625 4.378906 0.09375 3.578125 0.09375 C 2.753906 0.09375 1.921875 -0.00390625 1.078125 -0.203125 L 1.078125 -12.203125 C 1.921875 -12.398438 2.753906 -12.5 3.578125 -12.5 C 4.378906 -12.5 5.125 -12.382812 5.8125 -12.15625 C 6.5 -11.925781 7.09375 -11.554688 7.59375 -11.046875 C 8.09375 -10.546875 8.484375 -9.894531 8.765625 -9.09375 C 9.054688 -8.300781 9.203125 -7.335938 9.203125 -6.203125 Z M 2.71875 -1.375 C 3.050781 -1.332031 3.390625 -1.3125 3.734375 -1.3125 C 4.335938 -1.3125 4.875 -1.398438 5.34375 -1.578125 C 5.8125 -1.765625 6.203125 -2.054688 6.515625 -2.453125 C 6.835938 -2.847656 7.082031 -3.351562 7.25 -3.96875 C 7.425781 -4.59375 7.515625 -5.335938 7.515625 -6.203125 C 7.515625 -7.878906 7.191406 -9.109375 6.546875 -9.890625 C 5.898438 -10.679688 4.945312 -11.078125 3.6875 -11.078125 C 3.507812 -11.078125 3.335938 -11.070312 3.171875 -11.0625 C 3.003906 -11.0625 2.851562 -11.046875 2.71875 -11.015625 Z M 2.71875 -1.375 "/>
+</symbol>
+<symbol overflow="visible" id="glyph0-11">
+<path style="stroke:none;" d="M 7.453125 -6.09375 L 9.09375 -6.09375 L 9.09375 -0.296875 C 8.84375 -0.203125 8.4375 -0.0859375 7.875 0.046875 C 7.320312 0.191406 6.664062 0.265625 5.90625 0.265625 C 5.15625 0.265625 4.472656 0.125 3.859375 -0.15625 C 3.242188 -0.445312 2.71875 -0.863281 2.28125 -1.40625 C 1.851562 -1.957031 1.519531 -2.632812 1.28125 -3.4375 C 1.039062 -4.25 0.921875 -5.171875 0.921875 -6.203125 C 0.921875 -7.242188 1.050781 -8.160156 1.3125 -8.953125 C 1.582031 -9.753906 1.945312 -10.425781 2.40625 -10.96875 C 2.863281 -11.519531 3.398438 -11.9375 4.015625 -12.21875 C 4.628906 -12.507812 5.289062 -12.65625 6 -12.65625 C 6.457031 -12.65625 6.859375 -12.617188 7.203125 -12.546875 C 7.546875 -12.484375 7.835938 -12.40625 8.078125 -12.3125 C 8.328125 -12.226562 8.53125 -12.132812 8.6875 -12.03125 C 8.851562 -11.925781 8.976562 -11.847656 9.0625 -11.796875 L 8.515625 -10.421875 C 8.210938 -10.660156 7.847656 -10.851562 7.421875 -11 C 7.003906 -11.15625 6.5625 -11.234375 6.09375 -11.234375 C 5.59375 -11.234375 5.125 -11.113281 4.6875 -10.875 C 4.257812 -10.632812 3.890625 -10.296875 3.578125 -9.859375 C 3.273438 -9.421875 3.035156 -8.890625 2.859375 -8.265625 C 2.679688 -7.648438 2.59375 -6.960938 2.59375 -6.203125 C 2.59375 -5.453125 2.671875 -4.769531 2.828125 -4.15625 C 2.984375 -3.539062 3.207031 -3.015625 3.5 -2.578125 C 3.789062 -2.140625 4.148438 -1.796875 4.578125 -1.546875 C 5.015625 -1.304688 5.515625 -1.1875 6.078125 -1.1875 C 6.460938 -1.1875 6.757812 -1.210938 6.96875 -1.265625 C 7.1875 -1.316406 7.347656 -1.367188 7.453125 -1.421875 Z M 7.453125 -6.09375 "/>
+</symbol>
+<symbol overflow="visible" id="glyph0-12">
+<path style="stroke:none;" d="M 9.203125 -0.515625 C 8.734375 -0.253906 8.234375 -0.0625 7.703125 0.0625 C 7.179688 0.195312 6.617188 0.265625 6.015625 0.265625 C 5.285156 0.265625 4.609375 0.132812 3.984375 -0.125 C 3.367188 -0.382812 2.832031 -0.773438 2.375 -1.296875 C 1.925781 -1.828125 1.570312 -2.5 1.3125 -3.3125 C 1.050781 -4.132812 0.921875 -5.097656 0.921875 -6.203125 C 0.921875 -7.253906 1.054688 -8.179688 1.328125 -8.984375 C 1.597656 -9.785156 1.96875 -10.457031 2.4375 -11 C 2.90625 -11.539062 3.453125 -11.953125 4.078125 -12.234375 C 4.703125 -12.515625 5.367188 -12.65625 6.078125 -12.65625 C 6.566406 -12.65625 7.066406 -12.585938 7.578125 -12.453125 C 8.097656 -12.328125 8.601562 -12.109375 9.09375 -11.796875 L 8.625 -10.4375 C 7.738281 -10.945312 6.910156 -11.203125 6.140625 -11.203125 C 5.585938 -11.203125 5.09375 -11.082031 4.65625 -10.84375 C 4.226562 -10.613281 3.859375 -10.28125 3.546875 -9.84375 C 3.242188 -9.40625 3.007812 -8.878906 2.84375 -8.265625 C 2.675781 -7.648438 2.59375 -6.960938 2.59375 -6.203125 C 2.59375 -5.347656 2.679688 -4.609375 2.859375 -3.984375 C 3.046875 -3.359375 3.296875 -2.835938 3.609375 -2.421875 C 3.929688 -2.003906 4.316406 -1.695312 4.765625 -1.5 C 5.210938 -1.300781 5.695312 -1.203125 6.21875 -1.203125 C 6.601562 -1.203125 7.007812 -1.25 7.4375 -1.34375 C 7.863281 -1.445312 8.304688 -1.625 8.765625 -1.875 Z M 9.203125 -0.515625 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-0">
+<path style="stroke:none;" d="M 0.59375 0 L 0.59375 -9 L 5.40625 -9 L 5.40625 0 Z M 4.796875 -0.59375 L 4.796875 -8.40625 L 1.203125 -8.40625 L 1.203125 -0.59375 Z M 4.796875 -0.59375 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-1">
+<path style="stroke:none;" d="M 2.515625 0 L 2.515625 -2.765625 C 2.023438 -3.554688 1.582031 -4.332031 1.1875 -5.09375 C 0.789062 -5.851562 0.445312 -6.628906 0.15625 -7.421875 L 1.265625 -7.421875 C 1.492188 -6.753906 1.757812 -6.113281 2.0625 -5.5 C 2.363281 -4.882812 2.6875 -4.253906 3.03125 -3.609375 C 3.394531 -4.285156 3.71875 -4.929688 4 -5.546875 C 4.28125 -6.160156 4.539062 -6.785156 4.78125 -7.421875 L 5.859375 -7.421875 C 5.554688 -6.640625 5.207031 -5.875 4.8125 -5.125 C 4.414062 -4.382812 3.976562 -3.601562 3.5 -2.78125 L 3.5 0 Z M 2.515625 0 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-2">
+<path style="stroke:none;" d="M 3 0.15625 C 2.5625 0.15625 2.1875 0.09375 1.875 -0.03125 C 1.570312 -0.164062 1.320312 -0.347656 1.125 -0.578125 C 0.9375 -0.804688 0.796875 -1.085938 0.703125 -1.421875 C 0.617188 -1.765625 0.578125 -2.144531 0.578125 -2.5625 L 0.578125 -7.421875 L 1.5625 -7.421875 L 1.5625 -2.65625 C 1.5625 -2.28125 1.59375 -1.96875 1.65625 -1.71875 C 1.726562 -1.46875 1.828125 -1.265625 1.953125 -1.109375 C 2.078125 -0.960938 2.222656 -0.859375 2.390625 -0.796875 C 2.566406 -0.734375 2.769531 -0.703125 3 -0.703125 C 3.226562 -0.703125 3.425781 -0.734375 3.59375 -0.796875 C 3.769531 -0.859375 3.921875 -0.960938 4.046875 -1.109375 C 4.171875 -1.265625 4.265625 -1.46875 4.328125 -1.71875 C 4.398438 -1.96875 4.4375 -2.28125 4.4375 -2.65625 L 4.4375 -7.421875 L 5.421875 -7.421875 L 5.421875 -2.5625 C 5.421875 -2.144531 5.375 -1.765625 5.28125 -1.421875 C 5.195312 -1.085938 5.054688 -0.804688 4.859375 -0.578125 C 4.671875 -0.347656 4.421875 -0.164062 4.109375 -0.03125 C 3.804688 0.09375 3.4375 0.15625 3 0.15625 Z M 3 0.15625 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-3">
+<path style="stroke:none;" d="M 1.21875 -7.421875 C 1.320312 -6.921875 1.445312 -6.375 1.59375 -5.78125 C 1.738281 -5.1875 1.890625 -4.585938 2.046875 -3.984375 C 2.210938 -3.390625 2.378906 -2.820312 2.546875 -2.28125 C 2.722656 -1.738281 2.882812 -1.265625 3.03125 -0.859375 C 3.15625 -1.265625 3.300781 -1.742188 3.46875 -2.296875 C 3.644531 -2.847656 3.816406 -3.421875 3.984375 -4.015625 C 4.148438 -4.609375 4.304688 -5.203125 4.453125 -5.796875 C 4.609375 -6.390625 4.734375 -6.929688 4.828125 -7.421875 L 5.859375 -7.421875 C 5.796875 -7.109375 5.691406 -6.679688 5.546875 -6.140625 C 5.398438 -5.597656 5.226562 -4.992188 5.03125 -4.328125 C 4.832031 -3.660156 4.609375 -2.953125 4.359375 -2.203125 C 4.117188 -1.453125 3.863281 -0.71875 3.59375 0 L 2.375 0 C 2.125 -0.71875 1.878906 -1.445312 1.640625 -2.1875 C 1.410156 -2.9375 1.195312 -3.644531 1 -4.3125 C 0.800781 -4.976562 0.628906 -5.582031 0.484375 -6.125 C 0.335938 -6.675781 0.226562 -7.109375 0.15625 -7.421875 Z M 1.21875 -7.421875 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-4">
+<path style="stroke:none;" d=""/>
+</symbol>
+<symbol overflow="visible" id="glyph1-5">
+<path style="stroke:none;" d="M 5.515625 -3.71875 C 5.515625 -3.03125 5.425781 -2.445312 5.25 -1.96875 C 5.082031 -1.488281 4.847656 -1.097656 4.546875 -0.796875 C 4.253906 -0.492188 3.898438 -0.273438 3.484375 -0.140625 C 3.078125 -0.00390625 2.628906 0.0625 2.140625 0.0625 C 1.648438 0.0625 1.148438 0 0.640625 -0.125 L 0.640625 -7.3125 C 1.148438 -7.4375 1.648438 -7.5 2.140625 -7.5 C 2.628906 -7.5 3.078125 -7.429688 3.484375 -7.296875 C 3.898438 -7.160156 4.253906 -6.941406 4.546875 -6.640625 C 4.847656 -6.335938 5.082031 -5.941406 5.25 -5.453125 C 5.425781 -4.972656 5.515625 -4.394531 5.515625 -3.71875 Z M 1.625 -0.828125 C 1.832031 -0.804688 2.039062 -0.796875 2.25 -0.796875 C 2.601562 -0.796875 2.921875 -0.847656 3.203125 -0.953125 C 3.484375 -1.054688 3.71875 -1.226562 3.90625 -1.46875 C 4.101562 -1.707031 4.253906 -2.007812 4.359375 -2.375 C 4.460938 -2.75 4.515625 -3.195312 4.515625 -3.71875 C 4.515625 -4.726562 4.316406 -5.46875 3.921875 -5.9375 C 3.535156 -6.40625 2.960938 -6.640625 2.203125 -6.640625 C 2.097656 -6.640625 1.992188 -6.640625 1.890625 -6.640625 C 1.796875 -6.640625 1.707031 -6.628906 1.625 -6.609375 Z M 1.625 -0.828125 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-6">
+<path style="stroke:none;" d="M 5.515625 -2.78125 C 5.515625 -2.34375 5.453125 -1.945312 5.328125 -1.59375 C 5.203125 -1.238281 5.023438 -0.929688 4.796875 -0.671875 C 4.578125 -0.410156 4.3125 -0.210938 4 -0.078125 C 3.695312 0.0546875 3.359375 0.125 2.984375 0.125 C 2.628906 0.125 2.296875 0.0546875 1.984375 -0.078125 C 1.679688 -0.210938 1.414062 -0.410156 1.1875 -0.671875 C 0.96875 -0.929688 0.796875 -1.238281 0.671875 -1.59375 C 0.546875 -1.945312 0.484375 -2.34375 0.484375 -2.78125 C 0.484375 -3.21875 0.546875 -3.617188 0.671875 -3.984375 C 0.796875 -4.347656 0.96875 -4.65625 1.1875 -4.90625 C 1.414062 -5.15625 1.679688 -5.347656 1.984375 -5.484375 C 2.296875 -5.628906 2.628906 -5.703125 2.984375 -5.703125 C 3.359375 -5.703125 3.695312 -5.628906 4 -5.484375 C 4.3125 -5.347656 4.578125 -5.15625 4.796875 -4.90625 C 5.023438 -4.65625 5.203125 -4.347656 5.328125 -3.984375 C 5.453125 -3.617188 5.515625 -3.21875 5.515625 -2.78125 Z M 4.5 -2.78125 C 4.5 -3.414062 4.363281 -3.914062 4.09375 -4.28125 C 3.820312 -4.644531 3.453125 -4.828125 2.984375 -4.828125 C 2.523438 -4.828125 2.160156 -4.644531 1.890625 -4.28125 C 1.628906 -3.914062 1.5 -3.414062 1.5 -2.78125 C 1.5 -2.15625 1.628906 -1.660156 1.890625 -1.296875 C 2.160156 -0.929688 2.523438 -0.75 2.984375 -0.75 C 3.453125 -0.75 3.820312 -0.929688 4.09375 -1.296875 C 4.363281 -1.660156 4.5 -2.15625 4.5 -2.78125 Z M 4.5 -2.78125 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-7">
+<path style="stroke:none;" d="M 4.109375 0 C 3.992188 -0.269531 3.890625 -0.515625 3.796875 -0.734375 C 3.710938 -0.960938 3.628906 -1.1875 3.546875 -1.40625 C 3.460938 -1.632812 3.378906 -1.867188 3.296875 -2.109375 C 3.210938 -2.359375 3.113281 -2.640625 3 -2.953125 C 2.882812 -2.640625 2.78125 -2.359375 2.6875 -2.109375 C 2.601562 -1.867188 2.519531 -1.632812 2.4375 -1.40625 C 2.351562 -1.1875 2.265625 -0.960938 2.171875 -0.734375 C 2.085938 -0.515625 1.984375 -0.269531 1.859375 0 L 1.109375 0 C 0.890625 -0.976562 0.707031 -1.953125 0.5625 -2.921875 C 0.414062 -3.890625 0.304688 -4.769531 0.234375 -5.5625 L 1.15625 -5.5625 C 1.1875 -5.25 1.210938 -4.941406 1.234375 -4.640625 C 1.265625 -4.347656 1.300781 -4.035156 1.34375 -3.703125 C 1.382812 -3.378906 1.429688 -3.023438 1.484375 -2.640625 C 1.535156 -2.253906 1.59375 -1.820312 1.65625 -1.34375 C 1.78125 -1.664062 1.882812 -1.945312 1.96875 -2.1875 C 2.0625 -2.425781 2.144531 -2.648438 2.21875 -2.859375 C 2.289062 -3.078125 2.359375 -3.296875 2.421875 -3.515625 C 2.492188 -3.742188 2.570312 -4 2.65625 -4.28125 L 3.390625 -4.28125 C 3.472656 -4 3.546875 -3.742188 3.609375 -3.515625 C 3.671875 -3.296875 3.738281 -3.078125 3.8125 -2.859375 C 3.882812 -2.648438 3.957031 -2.425781 4.03125 -2.1875 C 4.113281 -1.945312 4.21875 -1.671875 4.34375 -1.359375 C 4.414062 -1.796875 4.476562 -2.203125 4.53125 -2.578125 C 4.59375 -2.953125 4.640625 -3.304688 4.671875 -3.640625 C 4.710938 -3.972656 4.75 -4.296875 4.78125 -4.609375 C 4.820312 -4.921875 4.851562 -5.238281 4.875 -5.5625 L 5.765625 -5.5625 C 5.734375 -5.164062 5.6875 -4.738281 5.625 -4.28125 C 5.570312 -3.820312 5.503906 -3.351562 5.421875 -2.875 C 5.335938 -2.394531 5.25 -1.910156 5.15625 -1.421875 C 5.0625 -0.929688 4.960938 -0.457031 4.859375 0 Z M 4.109375 0 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-8">
+<path style="stroke:none;" d="M 0.859375 -5.40625 C 1.210938 -5.5 1.566406 -5.566406 1.921875 -5.609375 C 2.273438 -5.660156 2.609375 -5.6875 2.921875 -5.6875 C 3.671875 -5.6875 4.234375 -5.492188 4.609375 -5.109375 C 4.992188 -4.722656 5.1875 -4.109375 5.1875 -3.265625 L 5.1875 0 L 4.203125 0 L 4.203125 -3.078125 C 4.203125 -3.441406 4.171875 -3.734375 4.109375 -3.953125 C 4.046875 -4.179688 3.953125 -4.359375 3.828125 -4.484375 C 3.710938 -4.609375 3.570312 -4.691406 3.40625 -4.734375 C 3.25 -4.785156 3.070312 -4.8125 2.875 -4.8125 C 2.71875 -4.8125 2.546875 -4.800781 2.359375 -4.78125 C 2.179688 -4.757812 2.007812 -4.734375 1.84375 -4.703125 L 1.84375 0 L 0.859375 0 Z M 0.859375 -5.40625 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-9">
+<path style="stroke:none;" d="M 4.21875 -1.390625 C 4.21875 -1.585938 4.132812 -1.75 3.96875 -1.875 C 3.800781 -2.007812 3.59375 -2.125 3.34375 -2.21875 C 3.101562 -2.3125 2.835938 -2.40625 2.546875 -2.5 C 2.265625 -2.59375 2 -2.707031 1.75 -2.84375 C 1.507812 -2.976562 1.304688 -3.144531 1.140625 -3.34375 C 0.984375 -3.539062 0.90625 -3.800781 0.90625 -4.125 C 0.90625 -4.570312 1.082031 -4.945312 1.4375 -5.25 C 1.800781 -5.550781 2.375 -5.703125 3.15625 -5.703125 C 3.457031 -5.703125 3.769531 -5.675781 4.09375 -5.625 C 4.414062 -5.582031 4.695312 -5.523438 4.9375 -5.453125 L 4.75 -4.578125 C 4.6875 -4.609375 4.597656 -4.640625 4.484375 -4.671875 C 4.367188 -4.710938 4.238281 -4.742188 4.09375 -4.765625 C 3.957031 -4.796875 3.804688 -4.816406 3.640625 -4.828125 C 3.472656 -4.847656 3.316406 -4.859375 3.171875 -4.859375 C 2.304688 -4.859375 1.875 -4.625 1.875 -4.15625 C 1.875 -3.988281 1.953125 -3.84375 2.109375 -3.71875 C 2.273438 -3.601562 2.484375 -3.5 2.734375 -3.40625 C 2.984375 -3.3125 3.25 -3.210938 3.53125 -3.109375 C 3.820312 -3.015625 4.09375 -2.894531 4.34375 -2.75 C 4.59375 -2.601562 4.796875 -2.425781 4.953125 -2.21875 C 5.117188 -2.019531 5.203125 -1.765625 5.203125 -1.453125 C 5.203125 -0.953125 5.003906 -0.5625 4.609375 -0.28125 C 4.222656 -0.0078125 3.609375 0.125 2.765625 0.125 C 2.378906 0.125 2.023438 0.09375 1.703125 0.03125 C 1.378906 -0.03125 1.078125 -0.125 0.796875 -0.25 L 0.984375 -1.15625 C 1.265625 -1.019531 1.554688 -0.910156 1.859375 -0.828125 C 2.171875 -0.742188 2.503906 -0.703125 2.859375 -0.703125 C 3.765625 -0.703125 4.21875 -0.929688 4.21875 -1.390625 Z M 4.21875 -1.390625 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-10">
+<path style="stroke:none;" d="M 0.59375 -2.765625 C 0.59375 -3.273438 0.671875 -3.710938 0.828125 -4.078125 C 0.984375 -4.441406 1.203125 -4.742188 1.484375 -4.984375 C 1.765625 -5.234375 2.09375 -5.414062 2.46875 -5.53125 C 2.84375 -5.644531 3.238281 -5.703125 3.65625 -5.703125 C 3.925781 -5.703125 4.195312 -5.679688 4.46875 -5.640625 C 4.738281 -5.609375 5.023438 -5.546875 5.328125 -5.453125 L 5.09375 -4.59375 C 4.832031 -4.6875 4.59375 -4.75 4.375 -4.78125 C 4.15625 -4.8125 3.929688 -4.828125 3.703125 -4.828125 C 3.421875 -4.828125 3.148438 -4.785156 2.890625 -4.703125 C 2.640625 -4.628906 2.414062 -4.507812 2.21875 -4.34375 C 2.03125 -4.1875 1.878906 -3.976562 1.765625 -3.71875 C 1.660156 -3.457031 1.609375 -3.140625 1.609375 -2.765625 C 1.609375 -2.421875 1.660156 -2.117188 1.765625 -1.859375 C 1.867188 -1.609375 2.015625 -1.398438 2.203125 -1.234375 C 2.390625 -1.078125 2.613281 -0.957031 2.875 -0.875 C 3.144531 -0.789062 3.4375 -0.75 3.75 -0.75 C 4.007812 -0.75 4.253906 -0.765625 4.484375 -0.796875 C 4.722656 -0.828125 4.984375 -0.890625 5.265625 -0.984375 L 5.40625 -0.15625 C 5.125 -0.0507812 4.835938 0.0195312 4.546875 0.0625 C 4.265625 0.101562 3.957031 0.125 3.625 0.125 C 3.175781 0.125 2.765625 0.0664062 2.390625 -0.046875 C 2.023438 -0.171875 1.707031 -0.351562 1.4375 -0.59375 C 1.164062 -0.832031 0.957031 -1.132812 0.8125 -1.5 C 0.664062 -1.863281 0.59375 -2.285156 0.59375 -2.765625 Z M 0.59375 -2.765625 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-11">
+<path style="stroke:none;" d="M 3.0625 -0.703125 C 3.3125 -0.703125 3.53125 -0.707031 3.71875 -0.71875 C 3.914062 -0.738281 4.082031 -0.765625 4.21875 -0.796875 L 4.21875 -2.453125 C 4.082031 -2.492188 3.925781 -2.523438 3.75 -2.546875 C 3.570312 -2.566406 3.382812 -2.578125 3.1875 -2.578125 C 3 -2.578125 2.816406 -2.5625 2.640625 -2.53125 C 2.460938 -2.507812 2.304688 -2.460938 2.171875 -2.390625 C 2.035156 -2.316406 1.921875 -2.222656 1.828125 -2.109375 C 1.742188 -1.992188 1.703125 -1.847656 1.703125 -1.671875 C 1.703125 -1.304688 1.820312 -1.050781 2.0625 -0.90625 C 2.3125 -0.769531 2.644531 -0.703125 3.0625 -0.703125 Z M 2.96875 -5.703125 C 3.382812 -5.703125 3.734375 -5.648438 4.015625 -5.546875 C 4.296875 -5.441406 4.523438 -5.296875 4.703125 -5.109375 C 4.878906 -4.929688 5.003906 -4.707031 5.078125 -4.4375 C 5.148438 -4.175781 5.1875 -3.890625 5.1875 -3.578125 L 5.1875 -0.09375 C 4.957031 -0.0507812 4.648438 -0.00390625 4.265625 0.046875 C 3.890625 0.0976562 3.5 0.125 3.09375 0.125 C 2.789062 0.125 2.492188 0.0976562 2.203125 0.046875 C 1.921875 -0.00390625 1.664062 -0.09375 1.4375 -0.21875 C 1.21875 -0.351562 1.039062 -0.535156 0.90625 -0.765625 C 0.769531 -0.992188 0.703125 -1.289062 0.703125 -1.65625 C 0.703125 -1.976562 0.769531 -2.25 0.90625 -2.46875 C 1.039062 -2.6875 1.21875 -2.863281 1.4375 -3 C 1.664062 -3.132812 1.921875 -3.234375 2.203125 -3.296875 C 2.484375 -3.359375 2.769531 -3.390625 3.0625 -3.390625 C 3.445312 -3.390625 3.832031 -3.34375 4.21875 -3.25 L 4.21875 -3.53125 C 4.21875 -3.695312 4.195312 -3.859375 4.15625 -4.015625 C 4.125 -4.171875 4.054688 -4.3125 3.953125 -4.4375 C 3.847656 -4.5625 3.707031 -4.660156 3.53125 -4.734375 C 3.363281 -4.816406 3.144531 -4.859375 2.875 -4.859375 C 2.53125 -4.859375 2.226562 -4.832031 1.96875 -4.78125 C 1.71875 -4.738281 1.523438 -4.691406 1.390625 -4.640625 L 1.265625 -5.453125 C 1.398438 -5.523438 1.625 -5.582031 1.9375 -5.625 C 2.257812 -5.675781 2.601562 -5.703125 2.96875 -5.703125 Z M 2.96875 -5.703125 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-12">
+<path style="stroke:none;" d="M 4.0625 0.125 C 3.707031 0.125 3.410156 0.078125 3.171875 -0.015625 C 2.941406 -0.109375 2.757812 -0.25 2.625 -0.4375 C 2.488281 -0.632812 2.390625 -0.875 2.328125 -1.15625 C 2.273438 -1.4375 2.25 -1.765625 2.25 -2.140625 L 2.25 -7.421875 L 0.640625 -7.421875 L 0.640625 -8.25 L 3.234375 -8.25 L 3.234375 -2.140625 C 3.234375 -1.867188 3.25 -1.644531 3.28125 -1.46875 C 3.320312 -1.289062 3.378906 -1.144531 3.453125 -1.03125 C 3.535156 -0.925781 3.628906 -0.851562 3.734375 -0.8125 C 3.847656 -0.769531 3.984375 -0.75 4.140625 -0.75 C 4.367188 -0.75 4.582031 -0.773438 4.78125 -0.828125 C 4.988281 -0.890625 5.144531 -0.953125 5.25 -1.015625 L 5.40625 -0.1875 C 5.351562 -0.15625 5.28125 -0.117188 5.1875 -0.078125 C 5.101562 -0.046875 5 -0.015625 4.875 0.015625 C 4.757812 0.046875 4.628906 0.0703125 4.484375 0.09375 C 4.347656 0.113281 4.207031 0.125 4.0625 0.125 Z M 4.0625 0.125 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-13">
+<path style="stroke:none;" d="M 2.515625 -6.4375 C 2.304688 -6.4375 2.125 -6.503906 1.96875 -6.640625 C 1.8125 -6.785156 1.734375 -6.984375 1.734375 -7.234375 C 1.734375 -7.484375 1.8125 -7.679688 1.96875 -7.828125 C 2.125 -7.972656 2.304688 -8.046875 2.515625 -8.046875 C 2.722656 -8.046875 2.898438 -7.972656 3.046875 -7.828125 C 3.203125 -7.679688 3.28125 -7.484375 3.28125 -7.234375 C 3.28125 -6.984375 3.203125 -6.785156 3.046875 -6.640625 C 2.898438 -6.503906 2.722656 -6.4375 2.515625 -6.4375 Z M 2.25 -4.734375 L 0.640625 -4.734375 L 0.640625 -5.5625 L 3.234375 -5.5625 L 3.234375 -2.140625 C 3.234375 -1.585938 3.3125 -1.21875 3.46875 -1.03125 C 3.625 -0.84375 3.851562 -0.75 4.15625 -0.75 C 4.382812 -0.75 4.597656 -0.773438 4.796875 -0.828125 C 4.992188 -0.890625 5.144531 -0.953125 5.25 -1.015625 L 5.40625 -0.1875 C 5.351562 -0.15625 5.28125 -0.117188 5.1875 -0.078125 C 5.101562 -0.046875 5.003906 -0.015625 4.890625 0.015625 C 4.773438 0.046875 4.644531 0.0703125 4.5 0.09375 C 4.363281 0.113281 4.21875 0.125 4.0625 0.125 C 3.71875 0.125 3.425781 0.078125 3.1875 -0.015625 C 2.957031 -0.109375 2.769531 -0.25 2.625 -0.4375 C 2.488281 -0.632812 2.390625 -0.875 2.328125 -1.15625 C 2.273438 -1.4375 2.25 -1.765625 2.25 -2.140625 Z M 2.25 -4.734375 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-14">
+<path style="stroke:none;" d="M 4.15625 -0.515625 C 4.039062 -0.453125 3.863281 -0.382812 3.625 -0.3125 C 3.394531 -0.238281 3.128906 -0.203125 2.828125 -0.203125 C 2.503906 -0.203125 2.195312 -0.253906 1.90625 -0.359375 C 1.625 -0.472656 1.378906 -0.640625 1.171875 -0.859375 C 0.960938 -1.078125 0.796875 -1.351562 0.671875 -1.6875 C 0.546875 -2.03125 0.484375 -2.4375 0.484375 -2.90625 C 0.484375 -3.3125 0.539062 -3.679688 0.65625 -4.015625 C 0.769531 -4.359375 0.9375 -4.65625 1.15625 -4.90625 C 1.375 -5.15625 1.644531 -5.347656 1.96875 -5.484375 C 2.289062 -5.628906 2.65625 -5.703125 3.0625 -5.703125 C 3.539062 -5.703125 3.945312 -5.664062 4.28125 -5.59375 C 4.625 -5.53125 4.910156 -5.46875 5.140625 -5.40625 L 5.140625 -0.4375 C 5.140625 0.425781 4.921875 1.050781 4.484375 1.4375 C 4.054688 1.820312 3.398438 2.015625 2.515625 2.015625 C 2.160156 2.015625 1.835938 1.984375 1.546875 1.921875 C 1.253906 1.867188 0.992188 1.804688 0.765625 1.734375 L 0.953125 0.859375 C 1.160156 0.941406 1.394531 1.007812 1.65625 1.0625 C 1.925781 1.125 2.222656 1.15625 2.546875 1.15625 C 3.117188 1.15625 3.53125 1.035156 3.78125 0.796875 C 4.03125 0.566406 4.15625 0.191406 4.15625 -0.328125 Z M 4.15625 -4.6875 C 4.0625 -4.71875 3.925781 -4.75 3.75 -4.78125 C 3.582031 -4.8125 3.359375 -4.828125 3.078125 -4.828125 C 2.554688 -4.828125 2.160156 -4.648438 1.890625 -4.296875 C 1.628906 -3.941406 1.5 -3.472656 1.5 -2.890625 C 1.5 -2.566406 1.535156 -2.289062 1.609375 -2.0625 C 1.691406 -1.84375 1.796875 -1.65625 1.921875 -1.5 C 2.054688 -1.351562 2.207031 -1.242188 2.375 -1.171875 C 2.539062 -1.109375 2.722656 -1.078125 2.921875 -1.078125 C 3.160156 -1.078125 3.390625 -1.113281 3.609375 -1.1875 C 3.835938 -1.257812 4.019531 -1.34375 4.15625 -1.4375 Z M 4.15625 -4.6875 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-15">
+<path style="stroke:none;" d="M 2.8125 -0.703125 C 3.3125 -0.703125 3.691406 -0.796875 3.953125 -0.984375 C 4.222656 -1.171875 4.359375 -1.457031 4.359375 -1.84375 C 4.359375 -2.082031 4.304688 -2.28125 4.203125 -2.4375 C 4.109375 -2.601562 3.984375 -2.75 3.828125 -2.875 C 3.671875 -3 3.488281 -3.109375 3.28125 -3.203125 C 3.082031 -3.296875 2.878906 -3.378906 2.671875 -3.453125 C 2.429688 -3.546875 2.203125 -3.648438 1.984375 -3.765625 C 1.765625 -3.890625 1.566406 -4.03125 1.390625 -4.1875 C 1.222656 -4.351562 1.085938 -4.546875 0.984375 -4.765625 C 0.890625 -4.984375 0.84375 -5.238281 0.84375 -5.53125 C 0.84375 -6.175781 1.046875 -6.679688 1.453125 -7.046875 C 1.859375 -7.410156 2.425781 -7.59375 3.15625 -7.59375 C 3.351562 -7.59375 3.550781 -7.578125 3.75 -7.546875 C 3.945312 -7.523438 4.128906 -7.492188 4.296875 -7.453125 C 4.460938 -7.410156 4.609375 -7.359375 4.734375 -7.296875 C 4.867188 -7.242188 4.976562 -7.191406 5.0625 -7.140625 L 4.75 -6.3125 C 4.59375 -6.40625 4.375 -6.5 4.09375 -6.59375 C 3.8125 -6.695312 3.5 -6.75 3.15625 -6.75 C 2.789062 -6.75 2.476562 -6.65625 2.21875 -6.46875 C 1.957031 -6.289062 1.828125 -6.019531 1.828125 -5.65625 C 1.828125 -5.457031 1.863281 -5.285156 1.9375 -5.140625 C 2.007812 -4.992188 2.113281 -4.863281 2.25 -4.75 C 2.382812 -4.644531 2.535156 -4.546875 2.703125 -4.453125 C 2.878906 -4.367188 3.070312 -4.285156 3.28125 -4.203125 C 3.59375 -4.078125 3.875 -3.945312 4.125 -3.8125 C 4.375 -3.6875 4.585938 -3.535156 4.765625 -3.359375 C 4.953125 -3.179688 5.09375 -2.972656 5.1875 -2.734375 C 5.289062 -2.492188 5.34375 -2.203125 5.34375 -1.859375 C 5.34375 -1.210938 5.125 -0.710938 4.6875 -0.359375 C 4.25 -0.015625 3.625 0.15625 2.8125 0.15625 C 2.539062 0.15625 2.289062 0.132812 2.0625 0.09375 C 1.832031 0.0625 1.625 0.0195312 1.4375 -0.03125 C 1.257812 -0.09375 1.101562 -0.148438 0.96875 -0.203125 C 0.832031 -0.253906 0.726562 -0.304688 0.65625 -0.359375 L 0.953125 -1.171875 C 1.117188 -1.085938 1.359375 -0.988281 1.671875 -0.875 C 1.984375 -0.757812 2.363281 -0.703125 2.8125 -0.703125 Z M 2.8125 -0.703125 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-16">
+<path style="stroke:none;" d="M 3.109375 -5.703125 C 3.859375 -5.703125 4.4375 -5.46875 4.84375 -5 C 5.25 -4.53125 5.453125 -3.820312 5.453125 -2.875 L 5.453125 -2.515625 L 1.46875 -2.515625 C 1.507812 -1.941406 1.703125 -1.503906 2.046875 -1.203125 C 2.390625 -0.898438 2.867188 -0.75 3.484375 -0.75 C 3.835938 -0.75 4.132812 -0.773438 4.375 -0.828125 C 4.625 -0.890625 4.8125 -0.953125 4.9375 -1.015625 L 5.078125 -0.1875 C 4.953125 -0.113281 4.734375 -0.046875 4.421875 0.015625 C 4.109375 0.0859375 3.757812 0.125 3.375 0.125 C 2.894531 0.125 2.472656 0.0507812 2.109375 -0.09375 C 1.742188 -0.238281 1.441406 -0.4375 1.203125 -0.6875 C 0.960938 -0.945312 0.78125 -1.253906 0.65625 -1.609375 C 0.539062 -1.960938 0.484375 -2.347656 0.484375 -2.765625 C 0.484375 -3.265625 0.554688 -3.695312 0.703125 -4.0625 C 0.859375 -4.4375 1.0625 -4.742188 1.3125 -4.984375 C 1.5625 -5.222656 1.835938 -5.398438 2.140625 -5.515625 C 2.453125 -5.640625 2.773438 -5.703125 3.109375 -5.703125 Z M 4.453125 -3.328125 C 4.453125 -3.796875 4.328125 -4.164062 4.078125 -4.4375 C 3.828125 -4.71875 3.5 -4.859375 3.09375 -4.859375 C 2.863281 -4.859375 2.65625 -4.8125 2.46875 -4.71875 C 2.28125 -4.632812 2.117188 -4.519531 1.984375 -4.375 C 1.847656 -4.226562 1.738281 -4.0625 1.65625 -3.875 C 1.570312 -3.695312 1.519531 -3.515625 1.5 -3.328125 Z M 4.453125 -3.328125 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-17">
+<path style="stroke:none;" d="M 4.15625 -4.390625 C 4.039062 -4.492188 3.875 -4.59375 3.65625 -4.6875 C 3.445312 -4.78125 3.222656 -4.828125 2.984375 -4.828125 C 2.722656 -4.828125 2.5 -4.773438 2.3125 -4.671875 C 2.125 -4.566406 1.96875 -4.421875 1.84375 -4.234375 C 1.726562 -4.054688 1.640625 -3.84375 1.578125 -3.59375 C 1.523438 -3.34375 1.5 -3.070312 1.5 -2.78125 C 1.5 -2.132812 1.648438 -1.632812 1.953125 -1.28125 C 2.253906 -0.925781 2.648438 -0.75 3.140625 -0.75 C 3.390625 -0.75 3.597656 -0.757812 3.765625 -0.78125 C 3.941406 -0.8125 4.070312 -0.835938 4.15625 -0.859375 Z M 4.15625 -8.140625 L 5.140625 -8.3125 L 5.140625 -0.15625 C 4.929688 -0.09375 4.65625 -0.03125 4.3125 0.03125 C 3.976562 0.09375 3.585938 0.125 3.140625 0.125 C 2.742188 0.125 2.378906 0.0546875 2.046875 -0.078125 C 1.722656 -0.210938 1.441406 -0.40625 1.203125 -0.65625 C 0.972656 -0.90625 0.796875 -1.207031 0.671875 -1.5625 C 0.546875 -1.925781 0.484375 -2.332031 0.484375 -2.78125 C 0.484375 -3.21875 0.535156 -3.613281 0.640625 -3.96875 C 0.742188 -4.320312 0.898438 -4.625 1.109375 -4.875 C 1.316406 -5.132812 1.566406 -5.335938 1.859375 -5.484375 C 2.148438 -5.628906 2.488281 -5.703125 2.875 -5.703125 C 3.164062 -5.703125 3.421875 -5.664062 3.640625 -5.59375 C 3.867188 -5.519531 4.039062 -5.441406 4.15625 -5.359375 Z M 4.15625 -8.140625 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-18">
+<path style="stroke:none;" d="M 1.28125 0 L 1.28125 -5.265625 C 2.101562 -5.546875 2.925781 -5.6875 3.75 -5.6875 C 4.007812 -5.6875 4.253906 -5.675781 4.484375 -5.65625 C 4.722656 -5.632812 4.976562 -5.59375 5.25 -5.53125 L 5.078125 -4.65625 C 4.816406 -4.726562 4.585938 -4.773438 4.390625 -4.796875 C 4.203125 -4.816406 3.988281 -4.828125 3.75 -4.828125 C 3.269531 -4.828125 2.773438 -4.757812 2.265625 -4.625 L 2.265625 0 Z M 1.28125 0 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-19">
+<path style="stroke:none;" d="M 0.609375 1.046875 C 0.679688 1.085938 0.78125 1.117188 0.90625 1.140625 C 1.039062 1.160156 1.164062 1.171875 1.28125 1.171875 C 1.675781 1.171875 1.984375 1.082031 2.203125 0.90625 C 2.421875 0.738281 2.625 0.460938 2.8125 0.078125 C 2.363281 -0.773438 1.941406 -1.6875 1.546875 -2.65625 C 1.148438 -3.625 0.828125 -4.59375 0.578125 -5.5625 L 1.65625 -5.5625 C 1.738281 -5.25 1.832031 -4.90625 1.9375 -4.53125 C 2.039062 -4.15625 2.160156 -3.769531 2.296875 -3.375 C 2.441406 -2.988281 2.585938 -2.597656 2.734375 -2.203125 C 2.890625 -1.804688 3.054688 -1.425781 3.234375 -1.0625 C 3.367188 -1.4375 3.488281 -1.800781 3.59375 -2.15625 C 3.707031 -2.519531 3.8125 -2.882812 3.90625 -3.25 C 4.007812 -3.613281 4.109375 -3.984375 4.203125 -4.359375 C 4.296875 -4.742188 4.394531 -5.144531 4.5 -5.5625 L 5.53125 -5.5625 C 5.269531 -4.53125 4.984375 -3.523438 4.671875 -2.546875 C 4.359375 -1.566406 4.019531 -0.660156 3.65625 0.171875 C 3.519531 0.484375 3.375 0.753906 3.21875 0.984375 C 3.0625 1.222656 2.890625 1.414062 2.703125 1.5625 C 2.523438 1.71875 2.316406 1.832031 2.078125 1.90625 C 1.847656 1.976562 1.585938 2.015625 1.296875 2.015625 C 1.140625 2.015625 0.972656 1.992188 0.796875 1.953125 C 0.617188 1.910156 0.5 1.875 0.4375 1.84375 Z M 0.609375 1.046875 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-20">
+<path style="stroke:none;" d="M 0.34375 -3.71875 C 0.34375 -4.382812 0.40625 -4.960938 0.53125 -5.453125 C 0.664062 -5.941406 0.847656 -6.34375 1.078125 -6.65625 C 1.304688 -6.96875 1.582031 -7.203125 1.90625 -7.359375 C 2.238281 -7.515625 2.601562 -7.59375 3 -7.59375 C 3.394531 -7.59375 3.753906 -7.515625 4.078125 -7.359375 C 4.410156 -7.203125 4.691406 -6.96875 4.921875 -6.65625 C 5.148438 -6.34375 5.328125 -5.941406 5.453125 -5.453125 C 5.585938 -4.960938 5.65625 -4.382812 5.65625 -3.71875 C 5.65625 -3.050781 5.585938 -2.472656 5.453125 -1.984375 C 5.328125 -1.503906 5.148438 -1.101562 4.921875 -0.78125 C 4.691406 -0.457031 4.410156 -0.21875 4.078125 -0.0625 C 3.753906 0.0820312 3.394531 0.15625 3 0.15625 C 2.601562 0.15625 2.238281 0.0820312 1.90625 -0.0625 C 1.582031 -0.21875 1.304688 -0.457031 1.078125 -0.78125 C 0.847656 -1.101562 0.664062 -1.503906 0.53125 -1.984375 C 0.40625 -2.472656 0.34375 -3.050781 0.34375 -3.71875 Z M 1.359375 -3.71875 C 1.359375 -2.738281 1.488281 -1.988281 1.75 -1.46875 C 2.007812 -0.957031 2.414062 -0.703125 2.96875 -0.703125 C 3.53125 -0.703125 3.953125 -0.957031 4.234375 -1.46875 C 4.515625 -1.988281 4.65625 -2.738281 4.65625 -3.71875 C 4.65625 -4.695312 4.515625 -5.445312 4.234375 -5.96875 C 3.953125 -6.488281 3.53125 -6.75 2.96875 -6.75 C 2.414062 -6.75 2.007812 -6.488281 1.75 -5.96875 C 1.488281 -5.445312 1.359375 -4.695312 1.359375 -3.71875 Z M 1.359375 -3.71875 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-21">
+<path style="stroke:none;" d="M 5.140625 -0.15625 C 4.929688 -0.101562 4.644531 -0.046875 4.28125 0.015625 C 3.925781 0.0859375 3.507812 0.125 3.03125 0.125 C 2.613281 0.125 2.265625 0.0625 1.984375 -0.0625 C 1.703125 -0.1875 1.472656 -0.363281 1.296875 -0.59375 C 1.117188 -0.820312 0.992188 -1.09375 0.921875 -1.40625 C 0.847656 -1.71875 0.8125 -2.0625 0.8125 -2.4375 L 0.8125 -5.5625 L 1.796875 -5.5625 L 1.796875 -2.65625 C 1.796875 -1.96875 1.894531 -1.476562 2.09375 -1.1875 C 2.300781 -0.894531 2.644531 -0.75 3.125 -0.75 C 3.226562 -0.75 3.332031 -0.753906 3.4375 -0.765625 C 3.550781 -0.773438 3.65625 -0.785156 3.75 -0.796875 C 3.851562 -0.804688 3.9375 -0.816406 4 -0.828125 C 4.070312 -0.847656 4.125 -0.859375 4.15625 -0.859375 L 4.15625 -5.5625 L 5.140625 -5.5625 Z M 5.140625 -0.15625 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-22">
+<path style="stroke:none;" d="M 2.921875 -5.5625 L 5.265625 -5.5625 L 5.265625 -4.734375 L 2.921875 -4.734375 L 2.921875 -2.140625 C 2.921875 -1.867188 2.9375 -1.644531 2.96875 -1.46875 C 3.007812 -1.289062 3.078125 -1.144531 3.171875 -1.03125 C 3.265625 -0.925781 3.382812 -0.851562 3.53125 -0.8125 C 3.675781 -0.769531 3.851562 -0.75 4.0625 -0.75 C 4.34375 -0.75 4.570312 -0.773438 4.75 -0.828125 C 4.925781 -0.878906 5.09375 -0.941406 5.25 -1.015625 L 5.40625 -0.1875 C 5.289062 -0.132812 5.109375 -0.0703125 4.859375 0 C 4.617188 0.0820312 4.316406 0.125 3.953125 0.125 C 3.546875 0.125 3.207031 0.078125 2.9375 -0.015625 C 2.675781 -0.109375 2.46875 -0.25 2.3125 -0.4375 C 2.164062 -0.632812 2.066406 -0.875 2.015625 -1.15625 C 1.960938 -1.4375 1.9375 -1.765625 1.9375 -2.140625 L 1.9375 -4.734375 L 0.75 -4.734375 L 0.75 -5.5625 L 1.9375 -5.5625 L 1.9375 -7.125 L 2.921875 -7.296875 Z M 2.921875 -5.5625 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-23">
+<path style="stroke:none;" d="M 4.5 -2.765625 C 4.5 -3.421875 4.347656 -3.925781 4.046875 -4.28125 C 3.742188 -4.632812 3.347656 -4.8125 2.859375 -4.8125 C 2.585938 -4.8125 2.375 -4.796875 2.21875 -4.765625 C 2.0625 -4.742188 1.9375 -4.71875 1.84375 -4.6875 L 1.84375 -1.171875 C 1.957031 -1.066406 2.117188 -0.96875 2.328125 -0.875 C 2.546875 -0.789062 2.773438 -0.75 3.015625 -0.75 C 3.273438 -0.75 3.5 -0.800781 3.6875 -0.90625 C 3.875 -1.007812 4.023438 -1.148438 4.140625 -1.328125 C 4.265625 -1.515625 4.351562 -1.726562 4.40625 -1.96875 C 4.46875 -2.21875 4.5 -2.484375 4.5 -2.765625 Z M 5.515625 -2.765625 C 5.515625 -2.347656 5.460938 -1.957031 5.359375 -1.59375 C 5.253906 -1.238281 5.097656 -0.929688 4.890625 -0.671875 C 4.679688 -0.421875 4.429688 -0.222656 4.140625 -0.078125 C 3.847656 0.0546875 3.507812 0.125 3.125 0.125 C 2.832031 0.125 2.570312 0.0859375 2.34375 0.015625 C 2.125 -0.046875 1.957031 -0.125 1.84375 -0.21875 L 1.84375 1.984375 L 0.859375 1.984375 L 0.859375 -5.40625 C 1.066406 -5.46875 1.34375 -5.53125 1.6875 -5.59375 C 2.03125 -5.65625 2.421875 -5.6875 2.859375 -5.6875 C 3.253906 -5.6875 3.613281 -5.617188 3.9375 -5.484375 C 4.269531 -5.347656 4.550781 -5.15625 4.78125 -4.90625 C 5.019531 -4.65625 5.203125 -4.347656 5.328125 -3.984375 C 5.453125 -3.617188 5.515625 -3.210938 5.515625 -2.765625 Z M 5.515625 -2.765625 "/>
+</symbol>
+<symbol overflow="visible" id="glyph1-24">
+<path style="stroke:none;" d="M 3.015625 -3.734375 L 4.15625 -7.421875 L 5.09375 -7.421875 C 5.25 -6.253906 5.359375 -5.054688 5.421875 -3.828125 C 5.492188 -2.609375 5.554688 -1.332031 5.609375 0 L 4.65625 0 C 4.644531 -0.425781 4.632812 -0.890625 4.625 -1.390625 C 4.625 -1.898438 4.613281 -2.421875 4.59375 -2.953125 C 4.582031 -3.492188 4.570312 -4.039062 4.5625 -4.59375 C 4.550781 -5.144531 4.539062 -5.679688 4.53125 -6.203125 L 3.4375 -2.8125 L 2.578125 -2.8125 L 1.46875 -6.203125 C 1.46875 -5.679688 1.457031 -5.144531 1.4375 -4.59375 C 1.425781 -4.050781 1.414062 -3.507812 1.40625 -2.96875 C 1.394531 -2.425781 1.382812 -1.898438 1.375 -1.390625 C 1.363281 -0.890625 1.351562 -0.425781 1.34375 0 L 0.390625 0 C 0.410156 -0.601562 0.4375 -1.222656 0.46875 -1.859375 C 0.5 -2.503906 0.535156 -3.144531 0.578125 -3.78125 C 0.617188 -4.414062 0.671875 -5.039062 0.734375 -5.65625 C 0.796875 -6.269531 0.863281 -6.859375 0.9375 -7.421875 L 1.84375 -7.421875 Z M 3.015625 -3.734375 "/>
+</symbol>
+<symbol overflow="visible" id="glyph2-0">
+<path style="stroke:none;" d="M 0.640625 2.296875 L 0.640625 -9.171875 L 7.140625 -9.171875 L 7.140625 2.296875 Z M 1.375 1.578125 L 6.421875 1.578125 L 6.421875 -8.4375 L 1.375 -8.4375 Z M 1.375 1.578125 "/>
+</symbol>
+<symbol overflow="visible" id="glyph2-1">
+<path style="stroke:none;" d="M 6.34375 -6.84375 L 6.34375 -5.75 C 6.007812 -5.925781 5.675781 -6.0625 5.34375 -6.15625 C 5.007812 -6.25 4.675781 -6.296875 4.34375 -6.296875 C 3.582031 -6.296875 2.992188 -6.050781 2.578125 -5.5625 C 2.160156 -5.082031 1.953125 -4.410156 1.953125 -3.546875 C 1.953125 -2.679688 2.160156 -2.007812 2.578125 -1.53125 C 2.992188 -1.050781 3.582031 -0.8125 4.34375 -0.8125 C 4.675781 -0.8125 5.007812 -0.851562 5.34375 -0.9375 C 5.675781 -1.03125 6.007812 -1.171875 6.34375 -1.359375 L 6.34375 -0.265625 C 6.019531 -0.117188 5.679688 -0.0078125 5.328125 0.0625 C 4.984375 0.144531 4.613281 0.1875 4.21875 0.1875 C 3.144531 0.1875 2.289062 -0.144531 1.65625 -0.8125 C 1.03125 -1.488281 0.71875 -2.398438 0.71875 -3.546875 C 0.71875 -4.703125 1.035156 -5.613281 1.671875 -6.28125 C 2.304688 -6.945312 3.179688 -7.28125 4.296875 -7.28125 C 4.648438 -7.28125 5 -7.242188 5.34375 -7.171875 C 5.6875 -7.097656 6.019531 -6.988281 6.34375 -6.84375 Z M 6.34375 -6.84375 "/>
+</symbol>
+<symbol overflow="visible" id="glyph2-2">
+<path style="stroke:none;" d="M 5.34375 -6.015625 C 5.207031 -6.085938 5.0625 -6.140625 4.90625 -6.171875 C 4.757812 -6.210938 4.59375 -6.234375 4.40625 -6.234375 C 3.75 -6.234375 3.242188 -6.019531 2.890625 -5.59375 C 2.535156 -5.164062 2.359375 -4.550781 2.359375 -3.75 L 2.359375 0 L 1.1875 0 L 1.1875 -7.109375 L 2.359375 -7.109375 L 2.359375 -6 C 2.597656 -6.4375 2.914062 -6.757812 3.3125 -6.96875 C 3.707031 -7.175781 4.1875 -7.28125 4.75 -7.28125 C 4.832031 -7.28125 4.921875 -7.273438 5.015625 -7.265625 C 5.109375 -7.253906 5.21875 -7.238281 5.34375 -7.21875 Z M 5.34375 -6.015625 "/>
+</symbol>
+<symbol overflow="visible" id="glyph2-3">
+<path style="stroke:none;" d="M 3.984375 -6.296875 C 3.359375 -6.296875 2.863281 -6.050781 2.5 -5.5625 C 2.132812 -5.070312 1.953125 -4.398438 1.953125 -3.546875 C 1.953125 -2.691406 2.128906 -2.019531 2.484375 -1.53125 C 2.847656 -1.050781 3.347656 -0.8125 3.984375 -0.8125 C 4.597656 -0.8125 5.085938 -1.054688 5.453125 -1.546875 C 5.816406 -2.035156 6 -2.703125 6 -3.546875 C 6 -4.390625 5.816406 -5.054688 5.453125 -5.546875 C 5.085938 -6.046875 4.597656 -6.296875 3.984375 -6.296875 Z M 3.984375 -7.28125 C 4.992188 -7.28125 5.789062 -6.945312 6.375 -6.28125 C 6.957031 -5.625 7.25 -4.710938 7.25 -3.546875 C 7.25 -2.378906 6.957031 -1.460938 6.375 -0.796875 C 5.789062 -0.140625 4.992188 0.1875 3.984375 0.1875 C 2.960938 0.1875 2.160156 -0.140625 1.578125 -0.796875 C 1.003906 -1.460938 0.71875 -2.378906 0.71875 -3.546875 C 0.71875 -4.710938 1.003906 -5.625 1.578125 -6.28125 C 2.160156 -6.945312 2.960938 -7.28125 3.984375 -7.28125 Z M 3.984375 -7.28125 "/>
+</symbol>
+<symbol overflow="visible" id="glyph2-4">
+<path style="stroke:none;" d="M 2.359375 -1.0625 L 2.359375 2.703125 L 1.1875 2.703125 L 1.1875 -7.109375 L 2.359375 -7.109375 L 2.359375 -6.03125 C 2.597656 -6.457031 2.90625 -6.769531 3.28125 -6.96875 C 3.65625 -7.175781 4.101562 -7.28125 4.625 -7.28125 C 5.488281 -7.28125 6.191406 -6.9375 6.734375 -6.25 C 7.273438 -5.5625 7.546875 -4.660156 7.546875 -3.546875 C 7.546875 -2.429688 7.273438 -1.53125 6.734375 -0.84375 C 6.191406 -0.15625 5.488281 0.1875 4.625 0.1875 C 4.101562 0.1875 3.65625 0.0820312 3.28125 -0.125 C 2.90625 -0.332031 2.597656 -0.644531 2.359375 -1.0625 Z M 6.328125 -3.546875 C 6.328125 -4.410156 6.148438 -5.082031 5.796875 -5.5625 C 5.441406 -6.050781 4.957031 -6.296875 4.34375 -6.296875 C 3.726562 -6.296875 3.242188 -6.050781 2.890625 -5.5625 C 2.535156 -5.082031 2.359375 -4.410156 2.359375 -3.546875 C 2.359375 -2.691406 2.535156 -2.019531 2.890625 -1.53125 C 3.242188 -1.039062 3.726562 -0.796875 4.34375 -0.796875 C 4.957031 -0.796875 5.441406 -1.039062 5.796875 -1.53125 C 6.148438 -2.019531 6.328125 -2.691406 6.328125 -3.546875 Z M 6.328125 -3.546875 "/>
+</symbol>
+<symbol overflow="visible" id="glyph2-5">
+<path style="stroke:none;" d="M 5.75 -6.90625 L 5.75 -5.796875 C 5.425781 -5.960938 5.085938 -6.085938 4.734375 -6.171875 C 4.378906 -6.253906 4.007812 -6.296875 3.625 -6.296875 C 3.039062 -6.296875 2.601562 -6.207031 2.3125 -6.03125 C 2.03125 -5.851562 1.890625 -5.585938 1.890625 -5.234375 C 1.890625 -4.960938 1.988281 -4.75 2.1875 -4.59375 C 2.394531 -4.445312 2.816406 -4.300781 3.453125 -4.15625 L 3.84375 -4.0625 C 4.675781 -3.882812 5.265625 -3.632812 5.609375 -3.3125 C 5.960938 -2.988281 6.140625 -2.539062 6.140625 -1.96875 C 6.140625 -1.300781 5.878906 -0.773438 5.359375 -0.390625 C 4.835938 -0.00390625 4.117188 0.1875 3.203125 0.1875 C 2.816406 0.1875 2.414062 0.148438 2 0.078125 C 1.59375 0.00390625 1.160156 -0.109375 0.703125 -0.265625 L 0.703125 -1.46875 C 1.140625 -1.238281 1.566406 -1.066406 1.984375 -0.953125 C 2.398438 -0.847656 2.8125 -0.796875 3.21875 -0.796875 C 3.769531 -0.796875 4.191406 -0.890625 4.484375 -1.078125 C 4.785156 -1.265625 4.9375 -1.53125 4.9375 -1.875 C 4.9375 -2.1875 4.828125 -2.425781 4.609375 -2.59375 C 4.398438 -2.769531 3.9375 -2.9375 3.21875 -3.09375 L 2.8125 -3.1875 C 2.082031 -3.34375 1.554688 -3.578125 1.234375 -3.890625 C 0.910156 -4.203125 0.75 -4.632812 0.75 -5.1875 C 0.75 -5.851562 0.984375 -6.367188 1.453125 -6.734375 C 1.929688 -7.097656 2.609375 -7.28125 3.484375 -7.28125 C 3.910156 -7.28125 4.316406 -7.25 4.703125 -7.1875 C 5.085938 -7.125 5.4375 -7.03125 5.75 -6.90625 Z M 5.75 -6.90625 "/>
+</symbol>
+<symbol overflow="visible" id="glyph2-6">
+<path style="stroke:none;" d="M 4.453125 -3.578125 C 3.515625 -3.578125 2.863281 -3.46875 2.5 -3.25 C 2.132812 -3.03125 1.953125 -2.660156 1.953125 -2.140625 C 1.953125 -1.734375 2.085938 -1.40625 2.359375 -1.15625 C 2.628906 -0.914062 3 -0.796875 3.46875 -0.796875 C 4.113281 -0.796875 4.632812 -1.023438 5.03125 -1.484375 C 5.425781 -1.941406 5.625 -2.550781 5.625 -3.3125 L 5.625 -3.578125 Z M 6.78125 -4.0625 L 6.78125 0 L 5.625 0 L 5.625 -1.078125 C 5.351562 -0.648438 5.019531 -0.332031 4.625 -0.125 C 4.226562 0.0820312 3.738281 0.1875 3.15625 0.1875 C 2.425781 0.1875 1.847656 -0.015625 1.421875 -0.421875 C 0.992188 -0.835938 0.78125 -1.382812 0.78125 -2.0625 C 0.78125 -2.863281 1.046875 -3.46875 1.578125 -3.875 C 2.117188 -4.28125 2.921875 -4.484375 3.984375 -4.484375 L 5.625 -4.484375 L 5.625 -4.609375 C 5.625 -5.140625 5.445312 -5.550781 5.09375 -5.84375 C 4.738281 -6.144531 4.238281 -6.296875 3.59375 -6.296875 C 3.1875 -6.296875 2.789062 -6.242188 2.40625 -6.140625 C 2.019531 -6.046875 1.648438 -5.898438 1.296875 -5.703125 L 1.296875 -6.78125 C 1.722656 -6.945312 2.140625 -7.070312 2.546875 -7.15625 C 2.953125 -7.238281 3.34375 -7.28125 3.71875 -7.28125 C 4.75 -7.28125 5.515625 -7.015625 6.015625 -6.484375 C 6.523438 -5.953125 6.78125 -5.144531 6.78125 -4.0625 Z M 6.78125 -4.0625 "/>
+</symbol>
+<symbol overflow="visible" id="glyph2-7">
+<path style="stroke:none;" d="M 1.21875 -9.875 L 2.390625 -9.875 L 2.390625 0 L 1.21875 0 Z M 1.21875 -9.875 "/>
+</symbol>
+<symbol overflow="visible" id="glyph2-8">
+<path style="stroke:none;" d="M 7.3125 -3.84375 L 7.3125 -3.28125 L 1.9375 -3.28125 C 1.988281 -2.46875 2.226562 -1.851562 2.65625 -1.4375 C 3.09375 -1.019531 3.695312 -0.8125 4.46875 -0.8125 C 4.914062 -0.8125 5.347656 -0.863281 5.765625 -0.96875 C 6.191406 -1.082031 6.613281 -1.25 7.03125 -1.46875 L 7.03125 -0.359375 C 6.613281 -0.179688 6.179688 -0.046875 5.734375 0.046875 C 5.296875 0.140625 4.851562 0.1875 4.40625 0.1875 C 3.269531 0.1875 2.367188 -0.140625 1.703125 -0.796875 C 1.046875 -1.460938 0.71875 -2.359375 0.71875 -3.484375 C 0.71875 -4.648438 1.03125 -5.570312 1.65625 -6.25 C 2.289062 -6.9375 3.140625 -7.28125 4.203125 -7.28125 C 5.160156 -7.28125 5.914062 -6.972656 6.46875 -6.359375 C 7.03125 -5.742188 7.3125 -4.90625 7.3125 -3.84375 Z M 6.140625 -4.1875 C 6.128906 -4.820312 5.945312 -5.332031 5.59375 -5.71875 C 5.25 -6.101562 4.789062 -6.296875 4.21875 -6.296875 C 3.5625 -6.296875 3.035156 -6.109375 2.640625 -5.734375 C 2.253906 -5.367188 2.03125 -4.851562 1.96875 -4.1875 Z M 6.140625 -4.1875 "/>
+</symbol>
+<symbol overflow="visible" id="glyph2-9">
+<path style="stroke:none;" d="M 6.328125 -3.546875 C 6.328125 -4.410156 6.148438 -5.082031 5.796875 -5.5625 C 5.441406 -6.050781 4.957031 -6.296875 4.34375 -6.296875 C 3.726562 -6.296875 3.242188 -6.050781 2.890625 -5.5625 C 2.535156 -5.082031 2.359375 -4.410156 2.359375 -3.546875 C 2.359375 -2.691406 2.535156 -2.019531 2.890625 -1.53125 C 3.242188 -1.039062 3.726562 -0.796875 4.34375 -0.796875 C 4.957031 -0.796875 5.441406 -1.039062 5.796875 -1.53125 C 6.148438 -2.019531 6.328125 -2.691406 6.328125 -3.546875 Z M 2.359375 -6.03125 C 2.597656 -6.457031 2.90625 -6.769531 3.28125 -6.96875 C 3.65625 -7.175781 4.101562 -7.28125 4.625 -7.28125 C 5.488281 -7.28125 6.191406 -6.9375 6.734375 -6.25 C 7.273438 -5.5625 7.546875 -4.660156 7.546875 -3.546875 C 7.546875 -2.429688 7.273438 -1.53125 6.734375 -0.84375 C 6.191406 -0.15625 5.488281 0.1875 4.625 0.1875 C 4.101562 0.1875 3.65625 0.0820312 3.28125 -0.125 C 2.90625 -0.332031 2.597656 -0.644531 2.359375 -1.0625 L 2.359375 0 L 1.1875 0 L 1.1875 -9.875 L 2.359375 -9.875 Z M 2.359375 -6.03125 "/>
+</symbol>
+<symbol overflow="visible" id="glyph2-10">
+<path style="stroke:none;" d="M 1.21875 -7.109375 L 2.390625 -7.109375 L 2.390625 0 L 1.21875 0 Z M 1.21875 -9.875 L 2.390625 -9.875 L 2.390625 -8.390625 L 1.21875 -8.390625 Z M 1.21875 -9.875 "/>
+</symbol>
+<symbol overflow="visible" id="glyph2-11">
+<path style="stroke:none;" d="M 7.140625 -4.296875 L 7.140625 0 L 5.96875 0 L 5.96875 -4.25 C 5.96875 -4.925781 5.835938 -5.429688 5.578125 -5.765625 C 5.316406 -6.097656 4.921875 -6.265625 4.390625 -6.265625 C 3.765625 -6.265625 3.269531 -6.0625 2.90625 -5.65625 C 2.539062 -5.257812 2.359375 -4.710938 2.359375 -4.015625 L 2.359375 0 L 1.1875 0 L 1.1875 -7.109375 L 2.359375 -7.109375 L 2.359375 -6 C 2.640625 -6.425781 2.96875 -6.742188 3.34375 -6.953125 C 3.71875 -7.171875 4.15625 -7.28125 4.65625 -7.28125 C 5.46875 -7.28125 6.082031 -7.023438 6.5 -6.515625 C 6.925781 -6.015625 7.140625 -5.273438 7.140625 -4.296875 Z M 7.140625 -4.296875 "/>
+</symbol>
+<symbol overflow="visible" id="glyph2-12">
+<path style="stroke:none;" d="M 5.90625 -3.640625 C 5.90625 -4.484375 5.726562 -5.132812 5.375 -5.59375 C 5.03125 -6.0625 4.539062 -6.296875 3.90625 -6.296875 C 3.28125 -6.296875 2.789062 -6.0625 2.4375 -5.59375 C 2.09375 -5.132812 1.921875 -4.484375 1.921875 -3.640625 C 1.921875 -2.796875 2.09375 -2.140625 2.4375 -1.671875 C 2.789062 -1.210938 3.28125 -0.984375 3.90625 -0.984375 C 4.539062 -0.984375 5.03125 -1.210938 5.375 -1.671875 C 5.726562 -2.140625 5.90625 -2.796875 5.90625 -3.640625 Z M 7.078125 -0.875 C 7.078125 0.332031 6.804688 1.226562 6.265625 1.8125 C 5.722656 2.40625 4.898438 2.703125 3.796875 2.703125 C 3.390625 2.703125 3.003906 2.671875 2.640625 2.609375 C 2.273438 2.546875 1.921875 2.453125 1.578125 2.328125 L 1.578125 1.1875 C 1.921875 1.375 2.257812 1.507812 2.59375 1.59375 C 2.925781 1.6875 3.265625 1.734375 3.609375 1.734375 C 4.378906 1.734375 4.953125 1.535156 5.328125 1.140625 C 5.710938 0.742188 5.90625 0.140625 5.90625 -0.671875 L 5.90625 -1.25 C 5.664062 -0.832031 5.351562 -0.519531 4.96875 -0.3125 C 4.59375 -0.101562 4.144531 0 3.625 0 C 2.75 0 2.046875 -0.332031 1.515625 -1 C 0.984375 -1.664062 0.71875 -2.546875 0.71875 -3.640625 C 0.71875 -4.734375 0.984375 -5.613281 1.515625 -6.28125 C 2.046875 -6.945312 2.75 -7.28125 3.625 -7.28125 C 4.144531 -7.28125 4.59375 -7.175781 4.96875 -6.96875 C 5.351562 -6.757812 5.664062 -6.445312 5.90625 -6.03125 L 5.90625 -7.109375 L 7.078125 -7.109375 Z M 7.078125 -0.875 "/>
+</symbol>
+<symbol overflow="visible" id="glyph2-13">
+<path style="stroke:none;" d="M 5.125 -8.609375 C 4.1875 -8.609375 3.441406 -8.257812 2.890625 -7.5625 C 2.347656 -6.875 2.078125 -5.929688 2.078125 -4.734375 C 2.078125 -3.535156 2.347656 -2.585938 2.890625 -1.890625 C 3.441406 -1.203125 4.1875 -0.859375 5.125 -0.859375 C 6.050781 -0.859375 6.785156 -1.203125 7.328125 -1.890625 C 7.878906 -2.585938 8.15625 -3.535156 8.15625 -4.734375 C 8.15625 -5.929688 7.878906 -6.875 7.328125 -7.5625 C 6.785156 -8.257812 6.050781 -8.609375 5.125 -8.609375 Z M 5.125 -9.65625 C 6.445312 -9.65625 7.503906 -9.207031 8.296875 -8.3125 C 9.097656 -7.414062 9.5 -6.222656 9.5 -4.734375 C 9.5 -3.234375 9.097656 -2.035156 8.296875 -1.140625 C 7.503906 -0.253906 6.445312 0.1875 5.125 0.1875 C 3.789062 0.1875 2.722656 -0.253906 1.921875 -1.140625 C 1.128906 -2.035156 0.734375 -3.234375 0.734375 -4.734375 C 0.734375 -6.222656 1.128906 -7.414062 1.921875 -8.3125 C 2.722656 -9.207031 3.789062 -9.65625 5.125 -9.65625 Z M 5.125 -9.65625 "/>
+</symbol>
+<symbol overflow="visible" id="glyph2-14">
+<path style="stroke:none;" d="M 1.109375 -2.8125 L 1.109375 -7.109375 L 2.265625 -7.109375 L 2.265625 -2.84375 C 2.265625 -2.175781 2.394531 -1.671875 2.65625 -1.328125 C 2.925781 -0.992188 3.320312 -0.828125 3.84375 -0.828125 C 4.476562 -0.828125 4.976562 -1.023438 5.34375 -1.421875 C 5.707031 -1.828125 5.890625 -2.378906 5.890625 -3.078125 L 5.890625 -7.109375 L 7.0625 -7.109375 L 7.0625 0 L 5.890625 0 L 5.890625 -1.09375 C 5.609375 -0.65625 5.28125 -0.332031 4.90625 -0.125 C 4.53125 0.0820312 4.09375 0.1875 3.59375 0.1875 C 2.78125 0.1875 2.160156 -0.0664062 1.734375 -0.578125 C 1.316406 -1.085938 1.109375 -1.832031 1.109375 -2.8125 Z M 4.046875 -7.28125 Z M 4.046875 -7.28125 "/>
+</symbol>
+<symbol overflow="visible" id="glyph2-15">
+<path style="stroke:none;" d="M 2.375 -9.125 L 2.375 -7.109375 L 4.78125 -7.109375 L 4.78125 -6.203125 L 2.375 -6.203125 L 2.375 -2.34375 C 2.375 -1.757812 2.453125 -1.382812 2.609375 -1.21875 C 2.773438 -1.0625 3.101562 -0.984375 3.59375 -0.984375 L 4.78125 -0.984375 L 4.78125 0 L 3.59375 0 C 2.6875 0 2.0625 -0.164062 1.71875 -0.5 C 1.375 -0.84375 1.203125 -1.457031 1.203125 -2.34375 L 1.203125 -6.203125 L 0.34375 -6.203125 L 0.34375 -7.109375 L 1.203125 -7.109375 L 1.203125 -9.125 Z M 2.375 -9.125 "/>
+</symbol>
+<symbol overflow="visible" id="glyph2-16">
+<path style="stroke:none;" d=""/>
+</symbol>
+<symbol overflow="visible" id="glyph2-17">
+<path style="stroke:none;" d="M 1.28125 -9.484375 L 6.71875 -9.484375 L 6.71875 -8.390625 L 2.5625 -8.390625 L 2.5625 -5.609375 L 6.3125 -5.609375 L 6.3125 -4.53125 L 2.5625 -4.53125 L 2.5625 0 L 1.28125 0 Z M 1.28125 -9.484375 "/>
+</symbol>
+<symbol overflow="visible" id="glyph2-18">
+<path style="stroke:none;" d="M 6.765625 -5.75 C 7.054688 -6.269531 7.40625 -6.65625 7.8125 -6.90625 C 8.21875 -7.15625 8.695312 -7.28125 9.25 -7.28125 C 9.988281 -7.28125 10.554688 -7.019531 10.953125 -6.5 C 11.359375 -5.976562 11.5625 -5.242188 11.5625 -4.296875 L 11.5625 0 L 10.390625 0 L 10.390625 -4.25 C 10.390625 -4.9375 10.265625 -5.441406 10.015625 -5.765625 C 9.773438 -6.097656 9.410156 -6.265625 8.921875 -6.265625 C 8.316406 -6.265625 7.835938 -6.0625 7.484375 -5.65625 C 7.128906 -5.257812 6.953125 -4.710938 6.953125 -4.015625 L 6.953125 0 L 5.78125 0 L 5.78125 -4.25 C 5.78125 -4.9375 5.660156 -5.441406 5.421875 -5.765625 C 5.179688 -6.097656 4.804688 -6.265625 4.296875 -6.265625 C 3.703125 -6.265625 3.226562 -6.0625 2.875 -5.65625 C 2.53125 -5.25 2.359375 -4.703125 2.359375 -4.015625 L 2.359375 0 L 1.1875 0 L 1.1875 -7.109375 L 2.359375 -7.109375 L 2.359375 -6 C 2.617188 -6.4375 2.9375 -6.757812 3.3125 -6.96875 C 3.6875 -7.175781 4.128906 -7.28125 4.640625 -7.28125 C 5.160156 -7.28125 5.597656 -7.148438 5.953125 -6.890625 C 6.316406 -6.628906 6.585938 -6.25 6.765625 -5.75 Z M 6.765625 -5.75 "/>
+</symbol>
+<symbol overflow="visible" id="glyph2-19">
+<path style="stroke:none;" d="M 6.953125 -9.171875 L 6.953125 -7.921875 C 6.472656 -8.148438 6.015625 -8.320312 5.578125 -8.4375 C 5.148438 -8.550781 4.734375 -8.609375 4.328125 -8.609375 C 3.628906 -8.609375 3.085938 -8.472656 2.703125 -8.203125 C 2.328125 -7.929688 2.140625 -7.546875 2.140625 -7.046875 C 2.140625 -6.628906 2.265625 -6.3125 2.515625 -6.09375 C 2.773438 -5.882812 3.253906 -5.710938 3.953125 -5.578125 L 4.734375 -5.421875 C 5.679688 -5.234375 6.382812 -4.910156 6.84375 -4.453125 C 7.300781 -3.992188 7.53125 -3.378906 7.53125 -2.609375 C 7.53125 -1.691406 7.222656 -0.992188 6.609375 -0.515625 C 5.992188 -0.046875 5.085938 0.1875 3.890625 0.1875 C 3.441406 0.1875 2.960938 0.132812 2.453125 0.03125 C 1.953125 -0.0703125 1.429688 -0.222656 0.890625 -0.421875 L 0.890625 -1.734375 C 1.410156 -1.441406 1.921875 -1.222656 2.421875 -1.078125 C 2.921875 -0.929688 3.410156 -0.859375 3.890625 -0.859375 C 4.628906 -0.859375 5.195312 -1 5.59375 -1.28125 C 5.988281 -1.570312 6.1875 -1.984375 6.1875 -2.515625 C 6.1875 -2.984375 6.039062 -3.347656 5.75 -3.609375 C 5.46875 -3.867188 5.003906 -4.066406 4.359375 -4.203125 L 3.578125 -4.359375 C 2.617188 -4.546875 1.925781 -4.84375 1.5 -5.25 C 1.070312 -5.65625 0.859375 -6.21875 0.859375 -6.9375 C 0.859375 -7.78125 1.148438 -8.441406 1.734375 -8.921875 C 2.328125 -9.410156 3.144531 -9.65625 4.1875 -9.65625 C 4.625 -9.65625 5.070312 -9.613281 5.53125 -9.53125 C 6 -9.445312 6.472656 -9.328125 6.953125 -9.171875 Z M 6.953125 -9.171875 "/>
+</symbol>
+<symbol overflow="visible" id="glyph2-20">
+<path style="stroke:none;" d="M 4.1875 0.65625 C 3.851562 1.507812 3.53125 2.0625 3.21875 2.3125 C 2.90625 2.570312 2.488281 2.703125 1.96875 2.703125 L 1.03125 2.703125 L 1.03125 1.734375 L 1.71875 1.734375 C 2.039062 1.734375 2.289062 1.65625 2.46875 1.5 C 2.644531 1.34375 2.835938 0.984375 3.046875 0.421875 L 3.265625 -0.109375 L 0.390625 -7.109375 L 1.625 -7.109375 L 3.84375 -1.546875 L 6.0625 -7.109375 L 7.3125 -7.109375 Z M 4.1875 0.65625 "/>
+</symbol>
+</g>
+</defs>
+<g id="surface268880">
+<rect x="0" y="0" width="774" height="152" style="fill:rgb(100%,100%,100%);fill-opacity:1;stroke:none;"/>
+<path style="fill-rule:evenodd;fill:rgb(100%,100%,100%);fill-opacity:1;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 21.75297 10.408118 L 26.433829 10.408118 L 26.433829 12.281165 L 21.75297 12.281165 Z M 21.75297 10.408118 " transform="matrix(20,0,0,20,-434.059401,-172.47877)"/>
+<path style="fill-rule:evenodd;fill:rgb(100%,100%,100%);fill-opacity:1;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 29.079728 10.51222 L 32.829728 10.51222 L 32.829728 12.149915 L 29.079728 12.149915 Z M 29.079728 10.51222 " transform="matrix(20,0,0,20,-434.059401,-172.47877)"/>
+<g style="fill:rgb(0%,0%,0%);fill-opacity:1;">
+ <use xlink:href="#glyph0-1" x="20.171875" y="57.705621"/>
+ <use xlink:href="#glyph0-2" x="30.171875" y="57.705621"/>
+ <use xlink:href="#glyph0-3" x="40.171875" y="57.705621"/>
+ <use xlink:href="#glyph0-4" x="50.171875" y="57.705621"/>
+ <use xlink:href="#glyph0-5" x="60.171875" y="57.705621"/>
+ <use xlink:href="#glyph0-6" x="70.171875" y="57.705621"/>
+</g>
+<g style="fill:rgb(0%,0%,0%);fill-opacity:1;">
+ <use xlink:href="#glyph0-7" x="174.203125" y="60.053277"/>
+ <use xlink:href="#glyph0-8" x="184.203125" y="60.053277"/>
+</g>
+<path style="fill-rule:evenodd;fill:rgb(100%,100%,100%);fill-opacity:1;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 40.925236 10.544446 L 44.675236 10.544446 L 44.675236 12.090345 L 40.925236 12.090345 Z M 40.925236 10.544446 " transform="matrix(20,0,0,20,-434.059401,-172.47877)"/>
+<path style="fill-rule:evenodd;fill:rgb(100%,100%,100%);fill-opacity:1;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 34.883439 10.536634 L 38.633439 10.536634 L 38.633439 12.120032 L 34.883439 12.120032 Z M 34.883439 10.536634 " transform="matrix(20,0,0,20,-434.059401,-172.47877)"/>
+<path style="fill-rule:evenodd;fill:rgb(100%,100%,100%);fill-opacity:1;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 47.084806 10.484876 L 52.045743 10.484876 L 52.045743 12.130774 L 47.084806 12.130774 Z M 47.084806 10.484876 " transform="matrix(20,0,0,20,-434.059401,-172.47877)"/>
+<path style="fill-rule:evenodd;fill:rgb(100%,100%,100%);fill-opacity:1;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 53.980118 10.376868 L 59.866642 10.376868 L 59.866642 12.279603 L 53.980118 12.279603 Z M 53.980118 10.376868 " transform="matrix(20,0,0,20,-434.059401,-172.47877)"/>
+<path style="fill-rule:evenodd;fill:rgb(100%,100%,100%);fill-opacity:1;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 54.048478 13.825501 L 59.868009 13.825501 L 59.868009 15.490345 L 54.048478 15.490345 Z M 54.048478 13.825501 " transform="matrix(20,0,0,20,-434.059401,-172.47877)"/>
+<path style="fill:none;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 26.481876 11.338001 L 28.593009 11.332337 " transform="matrix(20,0,0,20,-434.059401,-172.47877)"/>
+<path style="fill-rule:evenodd;fill:rgb(0%,0%,0%);fill-opacity:1;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 28.968009 11.33136 L 28.468595 11.582728 L 28.593009 11.332337 L 28.467228 11.082728 Z M 28.968009 11.33136 " transform="matrix(20,0,0,20,-434.059401,-172.47877)"/>
+<path style="fill:none;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 32.876798 11.329798 L 34.396525 11.328626 " transform="matrix(20,0,0,20,-434.059401,-172.47877)"/>
+<path style="fill-rule:evenodd;fill:rgb(0%,0%,0%);fill-opacity:1;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 34.771525 11.328431 L 34.27172 11.578821 L 34.396525 11.328626 L 34.271329 11.078821 Z M 34.771525 11.328431 " transform="matrix(20,0,0,20,-434.059401,-172.47877)"/>
+<path style="fill:none;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 38.633439 11.328431 L 40.438517 11.319642 " transform="matrix(20,0,0,20,-434.059401,-172.47877)"/>
+<path style="fill-rule:evenodd;fill:rgb(0%,0%,0%);fill-opacity:1;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 40.813517 11.317884 L 40.314689 11.570228 L 40.438517 11.319642 L 40.312345 11.070423 Z M 40.813517 11.317884 " transform="matrix(20,0,0,20,-434.059401,-172.47877)"/>
+<path style="fill:none;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 44.675236 11.317298 L 46.597892 11.309876 " transform="matrix(20,0,0,20,-434.059401,-172.47877)"/>
+<path style="fill-rule:evenodd;fill:rgb(0%,0%,0%);fill-opacity:1;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 46.972892 11.308313 L 46.473868 11.560267 L 46.597892 11.309876 L 46.471915 11.060267 Z M 46.972892 11.308313 " transform="matrix(20,0,0,20,-434.059401,-172.47877)"/>
+<path style="fill:none;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 52.045743 11.307923 L 53.4934 11.323157 " transform="matrix(20,0,0,20,-434.059401,-172.47877)"/>
+<path style="fill-rule:evenodd;fill:rgb(0%,0%,0%);fill-opacity:1;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 53.8684 11.327063 L 53.365861 11.57179 L 53.4934 11.323157 L 53.370939 11.07179 Z M 53.8684 11.327063 " transform="matrix(20,0,0,20,-434.059401,-172.47877)"/>
+<g style="fill:rgb(0%,0%,0%);fill-opacity:1;">
+ <use xlink:href="#glyph0-9" x="286.757813" y="60.611871"/>
+ <use xlink:href="#glyph0-10" x="296.757813" y="60.611871"/>
+ <use xlink:href="#glyph0-1" x="306.757813" y="60.611871"/>
+</g>
+<g style="fill:rgb(0%,0%,0%);fill-opacity:1;">
+ <use xlink:href="#glyph0-11" x="405.660156" y="59.904839"/>
+ <use xlink:href="#glyph0-10" x="415.660156" y="59.904839"/>
+ <use xlink:href="#glyph0-12" x="425.660156" y="59.904839"/>
+</g>
+<g style="fill:rgb(0%,0%,0%);fill-opacity:1;">
+ <use xlink:href="#glyph1-1" x="511.308594" y="58.064616"/>
+ <use xlink:href="#glyph1-2" x="517.308757" y="58.064616"/>
+ <use xlink:href="#glyph1-3" x="523.308919" y="58.064616"/>
+ <use xlink:href="#glyph1-4" x="529.309082" y="58.064616"/>
+ <use xlink:href="#glyph1-5" x="535.309245" y="58.064616"/>
+ <use xlink:href="#glyph1-6" x="541.309408" y="58.064616"/>
+ <use xlink:href="#glyph1-7" x="547.30957" y="58.064616"/>
+ <use xlink:href="#glyph1-8" x="553.309733" y="58.064616"/>
+ <use xlink:href="#glyph1-9" x="559.309896" y="58.064616"/>
+ <use xlink:href="#glyph1-10" x="565.310059" y="58.064616"/>
+ <use xlink:href="#glyph1-11" x="571.310221" y="58.064616"/>
+ <use xlink:href="#glyph1-12" x="577.310384" y="58.064616"/>
+ <use xlink:href="#glyph1-13" x="583.310547" y="58.064616"/>
+ <use xlink:href="#glyph1-8" x="589.31071" y="58.064616"/>
+ <use xlink:href="#glyph1-14" x="595.310872" y="58.064616"/>
+</g>
+<path style="fill:none;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 45.671915 11.342298 L 45.655704 11.342298 L 45.655704 14.657923 L 53.561759 14.657923 " transform="matrix(20,0,0,20,-434.059401,-172.47877)"/>
+<path style="fill-rule:evenodd;fill:rgb(0%,0%,0%);fill-opacity:1;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 53.936759 14.657923 L 53.436759 14.907923 L 53.561759 14.657923 L 53.436759 14.407923 Z M 53.936759 14.657923 " transform="matrix(20,0,0,20,-434.059401,-172.47877)"/>
+<g style="fill:rgb(0%,0%,0%);fill-opacity:1;">
+ <use xlink:href="#glyph1-15" x="657.078125" y="57.724772"/>
+ <use xlink:href="#glyph1-16" x="663.078288" y="57.724772"/>
+ <use xlink:href="#glyph1-10" x="669.078451" y="57.724772"/>
+ <use xlink:href="#glyph1-6" x="675.078613" y="57.724772"/>
+ <use xlink:href="#glyph1-8" x="681.078776" y="57.724772"/>
+ <use xlink:href="#glyph1-17" x="687.078939" y="57.724772"/>
+ <use xlink:href="#glyph1-11" x="693.079102" y="57.724772"/>
+ <use xlink:href="#glyph1-18" x="699.079264" y="57.724772"/>
+ <use xlink:href="#glyph1-19" x="705.079427" y="57.724772"/>
+ <use xlink:href="#glyph1-4" x="711.07959" y="57.724772"/>
+ <use xlink:href="#glyph1-20" x="717.079753" y="57.724772"/>
+ <use xlink:href="#glyph1-21" x="723.079915" y="57.724772"/>
+ <use xlink:href="#glyph1-22" x="729.080078" y="57.724772"/>
+ <use xlink:href="#glyph1-23" x="735.080241" y="57.724772"/>
+ <use xlink:href="#glyph1-21" x="741.080404" y="57.724772"/>
+ <use xlink:href="#glyph1-22" x="747.080566" y="57.724772"/>
+</g>
+<g style="fill:rgb(0%,0%,0%);fill-opacity:1;">
+ <use xlink:href="#glyph1-24" x="673.335938" y="124.170085"/>
+ <use xlink:href="#glyph1-11" x="679.3361" y="124.170085"/>
+ <use xlink:href="#glyph1-13" x="685.336263" y="124.170085"/>
+ <use xlink:href="#glyph1-8" x="691.336426" y="124.170085"/>
+ <use xlink:href="#glyph1-4" x="697.336589" y="124.170085"/>
+ <use xlink:href="#glyph1-20" x="703.336751" y="124.170085"/>
+ <use xlink:href="#glyph1-21" x="709.336914" y="124.170085"/>
+ <use xlink:href="#glyph1-22" x="715.337077" y="124.170085"/>
+ <use xlink:href="#glyph1-23" x="721.33724" y="124.170085"/>
+ <use xlink:href="#glyph1-21" x="727.337402" y="124.170085"/>
+ <use xlink:href="#glyph1-22" x="733.337565" y="124.170085"/>
+</g>
+<g style="fill:rgb(0%,0%,0%);fill-opacity:1;">
+ <use xlink:href="#glyph2-1" x="168.71875" y="31.959093"/>
+ <use xlink:href="#glyph2-2" x="175.866102" y="31.959093"/>
+ <use xlink:href="#glyph2-3" x="180.92551" y="31.959093"/>
+ <use xlink:href="#glyph2-4" x="188.879069" y="31.959093"/>
+</g>
+<g style="fill:rgb(0%,0%,0%);fill-opacity:1;">
+ <use xlink:href="#glyph2-5" x="288.109375" y="31.681749"/>
+ <use xlink:href="#glyph2-1" x="294.882378" y="31.681749"/>
+ <use xlink:href="#glyph2-6" x="302.029731" y="31.681749"/>
+ <use xlink:href="#glyph2-7" x="309.996039" y="31.681749"/>
+ <use xlink:href="#glyph2-8" x="313.607964" y="31.681749"/>
+</g>
+<g style="fill:rgb(0%,0%,0%);fill-opacity:1;">
+ <use xlink:href="#glyph2-5" x="535.988281" y="33.365343"/>
+ <use xlink:href="#glyph2-1" x="542.761285" y="33.365343"/>
+ <use xlink:href="#glyph2-6" x="549.908637" y="33.365343"/>
+ <use xlink:href="#glyph2-7" x="557.874946" y="33.365343"/>
+ <use xlink:href="#glyph2-8" x="561.486871" y="33.365343"/>
+</g>
+<g style="fill:rgb(0%,0%,0%);fill-opacity:1;">
+ <use xlink:href="#glyph2-9" x="26.695313" y="32.365343"/>
+ <use xlink:href="#glyph2-10" x="34.947266" y="32.365343"/>
+ <use xlink:href="#glyph2-11" x="38.559191" y="32.365343"/>
+ <use xlink:href="#glyph2-11" x="46.798394" y="32.365343"/>
+ <use xlink:href="#glyph2-10" x="55.037598" y="32.365343"/>
+ <use xlink:href="#glyph2-11" x="58.649523" y="32.365343"/>
+ <use xlink:href="#glyph2-12" x="66.888726" y="32.365343"/>
+</g>
+<path style="fill:none;stroke-width:0.1;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-dasharray:0.14,0.14;stroke-miterlimit:10;" d="M 45.300431 9.486438 L 60.373478 9.486438 L 60.373478 16.175696 L 45.300431 16.175696 Z M 45.300431 9.486438 " transform="matrix(20,0,0,20,-434.059401,-172.47877)"/>
+<g style="fill:rgb(0%,0%,0%);fill-opacity:1;">
+ <use xlink:href="#glyph2-13" x="532.003906" y="11.904405"/>
+ <use xlink:href="#glyph2-14" x="542.236382" y="11.904405"/>
+ <use xlink:href="#glyph2-15" x="550.475586" y="11.904405"/>
+ <use xlink:href="#glyph2-4" x="555.5727" y="11.904405"/>
+ <use xlink:href="#glyph2-14" x="563.824653" y="11.904405"/>
+ <use xlink:href="#glyph2-15" x="572.063856" y="11.904405"/>
+ <use xlink:href="#glyph2-16" x="577.16097" y="11.904405"/>
+ <use xlink:href="#glyph2-17" x="581.293186" y="11.904405"/>
+ <use xlink:href="#glyph2-3" x="588.307346" y="11.904405"/>
+ <use xlink:href="#glyph2-2" x="596.260905" y="11.904405"/>
+ <use xlink:href="#glyph2-18" x="601.377279" y="11.904405"/>
+ <use xlink:href="#glyph2-6" x="614.040853" y="11.904405"/>
+ <use xlink:href="#glyph2-15" x="622.007161" y="11.904405"/>
+ <use xlink:href="#glyph2-15" x="627.104275" y="11.904405"/>
+ <use xlink:href="#glyph2-8" x="632.201389" y="11.904405"/>
+ <use xlink:href="#glyph2-2" x="640.199436" y="11.904405"/>
+ <use xlink:href="#glyph2-16" x="645.544217" y="11.904405"/>
+ <use xlink:href="#glyph2-19" x="649.676432" y="11.904405"/>
+ <use xlink:href="#glyph2-20" x="657.928385" y="11.904405"/>
+ <use xlink:href="#glyph2-5" x="665.621799" y="11.904405"/>
+ <use xlink:href="#glyph2-15" x="672.394803" y="11.904405"/>
+ <use xlink:href="#glyph2-8" x="677.491916" y="11.904405"/>
+ <use xlink:href="#glyph2-18" x="685.489963" y="11.904405"/>
+</g>
+</g>
+</svg>
diff --git a/Documentation/admin-guide/media/ivtv-cardlist.rst b/Documentation/admin-guide/media/ivtv-cardlist.rst
new file mode 100644
index 000000000000..0ffc3b71ae60
--- /dev/null
+++ b/Documentation/admin-guide/media/ivtv-cardlist.rst
@@ -0,0 +1,139 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+IVTV cards list
+===============
+
+.. tabularcolumns:: |p{1.4cm}|p{12.7cm}|p{3.4cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 2 19 18
+ :stub-columns: 0
+
+ * - Card number
+ - Card name
+ - PCI subsystem IDs
+
+ * - 0
+ - Hauppauge WinTV PVR-250
+ - IVTV16 104d:813d
+
+ * - 1
+ - Hauppauge WinTV PVR-350
+ - IVTV16 104d:813d
+
+ * - 2
+ - Hauppauge WinTV PVR-150
+ - IVTV16 104d:813d
+
+ * - 3
+ - AVerMedia M179
+ - IVTV15 1461:a3cf, IVTV15 1461:a3ce
+
+ * - 4
+ - Yuan MPG600, Kuroutoshikou ITVC16-STVLP
+ - IVTV16 12ab:fff3, IVTV16 12ab:ffff
+
+ * - 5
+ - YUAN MPG160, Kuroutoshikou ITVC15-STVLP, I/O Data GV-M2TV/PCI
+ - IVTV15 10fc:40a0
+
+ * - 6
+ - Yuan PG600, Diamond PVR-550
+ - IVTV16 ff92:0070, IVTV16 ffab:0600
+
+ * - 7
+ - Adaptec VideOh! AVC-2410
+ - IVTV16 9005:0093
+
+ * - 8
+ - Adaptec VideOh! AVC-2010
+ - IVTV16 9005:0092
+
+ * - 9
+ - Nagase Transgear 5000TV
+ - IVTV16 1461:bfff
+
+ * - 10
+ - AOpen VA2000MAX-SNT6
+ - IVTV16 0000:ff5f
+
+ * - 11
+ - Yuan MPG600GR, Kuroutoshikou CX23416GYC-STVLP
+ - IVTV16 12ab:0600, IVTV16 fbab:0600, IVTV16 1154:0523
+
+ * - 12
+ - I/O Data GV-MVP/RX, GV-MVP/RX2W (dual tuner)
+ - IVTV16 10fc:d01e, IVTV16 10fc:d038, IVTV16 10fc:d039
+
+ * - 13
+ - I/O Data GV-MVP/RX2E
+ - IVTV16 10fc:d025
+
+ * - 14
+ - GotView PCI DVD
+ - IVTV16 12ab:0600
+
+ * - 15
+ - GotView PCI DVD2 Deluxe
+ - IVTV16 ffac:0600
+
+ * - 16
+ - Yuan MPC622
+ - IVTV16 ff01:d998
+
+ * - 17
+ - Digital Cowboy DCT-MTVP1
+ - IVTV16 1461:bfff
+
+ * - 18
+ - Yuan PG600-2, GotView PCI DVD Lite
+ - IVTV16 ffab:0600, IVTV16 ffad:0600
+
+ * - 19
+ - Club3D ZAP-TV1x01
+ - IVTV16 ffab:0600
+
+ * - 20
+ - AVerTV MCE 116 Plus
+ - IVTV16 1461:c439
+
+ * - 21
+ - ASUS Falcon2
+ - IVTV16 1043:4b66, IVTV16 1043:462e, IVTV16 1043:4b2e
+
+ * - 22
+ - AVerMedia PVR-150 Plus / AVerTV M113 Partsnic (Daewoo) Tuner
+ - IVTV16 1461:c034, IVTV16 1461:c035
+
+ * - 23
+ - AVerMedia EZMaker PCI Deluxe
+ - IVTV16 1461:c03f
+
+ * - 24
+ - AVerMedia M104
+ - IVTV16 1461:c136
+
+ * - 25
+ - Buffalo PC-MV5L/PCI
+ - IVTV16 1154:052b
+
+ * - 26
+ - AVerMedia UltraTV 1500 MCE / AVerTV M113 Philips Tuner
+ - IVTV16 1461:c019, IVTV16 1461:c01b
+
+ * - 27
+ - Sony VAIO Giga Pocket (ENX Kikyou)
+ - IVTV16 104d:813d
+
+ * - 28
+ - Hauppauge WinTV PVR-350 (V1)
+ - IVTV16 104d:813d
+
+ * - 29
+ - Yuan MPG600GR, Kuroutoshikou CX23416GYC-STVLP (no GR)
+ - IVTV16 104d:813d
+
+ * - 30
+ - Yuan MPG600GR, Kuroutoshikou CX23416GYC-STVLP (no GR/YCS)
+ - IVTV16 104d:813d
diff --git a/Documentation/admin-guide/media/ivtv.rst b/Documentation/admin-guide/media/ivtv.rst
new file mode 100644
index 000000000000..101f16d0263e
--- /dev/null
+++ b/Documentation/admin-guide/media/ivtv.rst
@@ -0,0 +1,218 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+The ivtv driver
+===============
+
+Author: Hans Verkuil <hverkuil@xs4all.nl>
+
+This is a v4l2 device driver for the Conexant cx23415/6 MPEG encoder/decoder.
+The cx23415 can do both encoding and decoding, the cx23416 can only do MPEG
+encoding. Currently the only card featuring full decoding support is the
+Hauppauge PVR-350.
+
+.. note::
+
+ #) This driver requires the latest encoder firmware (version 2.06.039, size
+ 376836 bytes). Get the firmware from here:
+
+ https://linuxtv.org/downloads/firmware/#conexant
+
+ #) 'normal' TV applications do not work with this driver, you need
+ an application that can handle MPEG input such as mplayer, xine, MythTV,
+ etc.
+
+The primary goal of the IVTV project is to provide a "clean room" Linux
+Open Source driver implementation for video capture cards based on the
+iCompression iTVC15 or Conexant CX23415/CX23416 MPEG Codec.
+
+Features
+--------
+
+ * Hardware mpeg2 capture of broadcast video (and sound) via the tuner or
+ S-Video/Composite and audio line-in.
+ * Hardware mpeg2 capture of FM radio where hardware support exists
+ * Supports NTSC, PAL, SECAM with stereo sound
+ * Supports SAP and bilingual transmissions.
+ * Supports raw VBI (closed captions and teletext).
+ * Supports sliced VBI (closed captions and teletext) and is able to insert
+ this into the captured MPEG stream.
+ * Supports raw YUV and PCM input.
+
+Additional features for the PVR-350 (CX23415 based)
+---------------------------------------------------
+
+ * Provides hardware mpeg2 playback
+ * Provides comprehensive OSD (On Screen Display: ie. graphics overlaying the
+ video signal)
+ * Provides a framebuffer (allowing X applications to appear on the video
+ device)
+ * Supports raw YUV output.
+
+IMPORTANT: In case of problems first read this page:
+ https://help.ubuntu.com/community/Install_IVTV_Troubleshooting
+
+See also
+--------
+
+https://linuxtv.org
+
+IRC
+---
+
+irc://irc.freenode.net/#v4l
+
+----------------------------------------------------------
+
+Devices
+-------
+
+A maximum of 12 ivtv boards are allowed at the moment.
+
+Cards that don't have a video output capability (i.e. non PVR350 cards)
+lack the vbi8, vbi16, video16 and video48 devices. They also do not
+support the framebuffer device /dev/fbx for OSD.
+
+The radio0 device may or may not be present, depending on whether the
+card has a radio tuner or not.
+
+Here is a list of the base v4l devices:
+
+.. code-block:: none
+
+ crw-rw---- 1 root video 81, 0 Jun 19 22:22 /dev/video0
+ crw-rw---- 1 root video 81, 16 Jun 19 22:22 /dev/video16
+ crw-rw---- 1 root video 81, 24 Jun 19 22:22 /dev/video24
+ crw-rw---- 1 root video 81, 32 Jun 19 22:22 /dev/video32
+ crw-rw---- 1 root video 81, 48 Jun 19 22:22 /dev/video48
+ crw-rw---- 1 root video 81, 64 Jun 19 22:22 /dev/radio0
+ crw-rw---- 1 root video 81, 224 Jun 19 22:22 /dev/vbi0
+ crw-rw---- 1 root video 81, 228 Jun 19 22:22 /dev/vbi8
+ crw-rw---- 1 root video 81, 232 Jun 19 22:22 /dev/vbi16
+
+Base devices
+------------
+
+For every extra card you have the numbers increased by one. For example,
+/dev/video0 is listed as the 'base' encoding capture device so we have:
+
+- /dev/video0 is the encoding capture device for the first card (card 0)
+- /dev/video1 is the encoding capture device for the second card (card 1)
+- /dev/video2 is the encoding capture device for the third card (card 2)
+
+Note that if the first card doesn't have a feature (eg no decoder, so no
+video16, the second card will still use video17. The simple rule is 'add
+the card number to the base device number'. If you have other capture
+cards (e.g. WinTV PCI) that are detected first, then you have to tell
+the ivtv module about it so that it will start counting at 1 (or 2, or
+whatever). Otherwise the device numbers can get confusing. The ivtv
+'ivtv_first_minor' module option can be used for that.
+
+
+- /dev/video0
+
+ The encoding capture device(s).
+
+ Read-only.
+
+ Reading from this device gets you the MPEG1/2 program stream.
+ Example:
+
+ .. code-block:: none
+
+ cat /dev/video0 > my.mpg (you need to hit ctrl-c to exit)
+
+
+- /dev/video16
+
+ The decoder output device(s)
+
+ Write-only. Only present if the MPEG decoder (i.e. CX23415) exists.
+
+ An mpeg2 stream sent to this device will appear on the selected video
+ display, audio will appear on the line-out/audio out. It is only
+ available for cards that support video out. Example:
+
+ .. code-block:: none
+
+ cat my.mpg >/dev/video16
+
+
+- /dev/video24
+
+ The raw audio capture device(s).
+
+ Read-only
+
+ The raw audio PCM stereo stream from the currently selected
+ tuner or audio line-in. Reading from this device results in a raw
+ (signed 16 bit Little Endian, 48000 Hz, stereo pcm) capture.
+ This device only captures audio. This should be replaced by an ALSA
+ device in the future.
+ Note that there is no corresponding raw audio output device, this is
+ not supported in the decoder firmware.
+
+
+- /dev/video32
+
+ The raw video capture device(s)
+
+ Read-only
+
+ The raw YUV video output from the current video input. The YUV format
+ is a 16x16 linear tiled NV12 format (V4L2_PIX_FMT_NV12_16L16)
+
+ Note that the YUV and PCM streams are not synchronized, so they are of
+ limited use.
+
+
+- /dev/video48
+
+ The raw video display device(s)
+
+ Write-only. Only present if the MPEG decoder (i.e. CX23415) exists.
+
+ Writes a YUV stream to the decoder of the card.
+
+
+- /dev/radio0
+
+ The radio tuner device(s)
+
+ Cannot be read or written.
+
+ Used to enable the radio tuner and tune to a frequency. You cannot
+ read or write audio streams with this device. Once you use this
+ device to tune the radio, use /dev/video24 to read the raw pcm stream
+ or /dev/video0 to get an mpeg2 stream with black video.
+
+
+- /dev/vbi0
+
+ The 'vertical blank interval' (Teletext, CC, WSS etc) capture device(s)
+
+ Read-only
+
+ Captures the raw (or sliced) video data sent during the Vertical Blank
+ Interval. This data is used to encode teletext, closed captions, VPS,
+ widescreen signalling, electronic program guide information, and other
+ services.
+
+
+- /dev/vbi8
+
+ Processed vbi feedback device(s)
+
+ Read-only. Only present if the MPEG decoder (i.e. CX23415) exists.
+
+ The sliced VBI data embedded in an MPEG stream is reproduced on this
+ device. So while playing back a recording on /dev/video16, you can
+ read the embedded VBI data from /dev/vbi8.
+
+
+- /dev/vbi16
+
+ The vbi 'display' device(s)
+
+ Write-only. Only present if the MPEG decoder (i.e. CX23415) exists.
+
+ Can be used to send sliced VBI data to the video-out connector.
diff --git a/Documentation/admin-guide/media/lmedm04.rst b/Documentation/admin-guide/media/lmedm04.rst
new file mode 100644
index 000000000000..a6ee33413748
--- /dev/null
+++ b/Documentation/admin-guide/media/lmedm04.rst
@@ -0,0 +1,107 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Firmware files for lmedm04 cards
+================================
+
+To extract firmware for the DM04/QQBOX you need to copy the
+following file(s) to this directory.
+
+For DM04+/QQBOX LME2510C (Sharp 7395 Tuner)
+-------------------------------------------
+
+The Sharp 7395 driver can be found in windows/system32/drivers
+
+US2A0D.sys (dated 17 Mar 2009)
+
+
+and run:
+
+.. code-block:: none
+
+ scripts/get_dvb_firmware lme2510c_s7395
+
+will produce dvb-usb-lme2510c-s7395.fw
+
+An alternative but older firmware can be found on the driver
+disk DVB-S_EN_3.5A in BDADriver/driver
+
+LMEBDA_DVBS7395C.sys (dated 18 Jan 2008)
+
+and run:
+
+.. code-block:: none
+
+ ./get_dvb_firmware lme2510c_s7395_old
+
+will produce dvb-usb-lme2510c-s7395.fw
+
+The LG firmware can be found on the driver
+disk DM04+_5.1A[LG] in BDADriver/driver
+
+For DM04 LME2510 (LG Tuner)
+---------------------------
+
+LMEBDA_DVBS.sys (dated 13 Nov 2007)
+
+and run:
+
+
+.. code-block:: none
+
+ ./get_dvb_firmware lme2510_lg
+
+will produce dvb-usb-lme2510-lg.fw
+
+
+Other LG firmware can be extracted manually from US280D.sys
+only found in windows/system32/drivers
+
+dd if=US280D.sys ibs=1 skip=42360 count=3924 of=dvb-usb-lme2510-lg.fw
+
+For DM04 LME2510C (LG Tuner)
+----------------------------
+
+.. code-block:: none
+
+ dd if=US280D.sys ibs=1 skip=35200 count=3850 of=dvb-usb-lme2510c-lg.fw
+
+
+The Sharp 0194 tuner driver can be found in windows/system32/drivers
+
+US290D.sys (dated 09 Apr 2009)
+
+For LME2510
+-----------
+
+.. code-block:: none
+
+ dd if=US290D.sys ibs=1 skip=36856 count=3976 of=dvb-usb-lme2510-s0194.fw
+
+
+For LME2510C
+------------
+
+
+.. code-block:: none
+
+ dd if=US290D.sys ibs=1 skip=33152 count=3697 of=dvb-usb-lme2510c-s0194.fw
+
+
+The m88rs2000 tuner driver can be found in windows/system32/drivers
+
+US2B0D.sys (dated 29 Jun 2010)
+
+
+.. code-block:: none
+
+ dd if=US2B0D.sys ibs=1 skip=34432 count=3871 of=dvb-usb-lme2510c-rs2000.fw
+
+We need to modify id of rs2000 firmware or it will warm boot id 3344:1120.
+
+
+.. code-block:: none
+
+
+ echo -ne \\xF0\\x22 | dd conv=notrunc bs=1 count=2 seek=266 of=dvb-usb-lme2510c-rs2000.fw
+
+Copy the firmware file(s) to /lib/firmware
diff --git a/Documentation/admin-guide/media/mgb4.rst b/Documentation/admin-guide/media/mgb4.rst
new file mode 100644
index 000000000000..2977f74d7e26
--- /dev/null
+++ b/Documentation/admin-guide/media/mgb4.rst
@@ -0,0 +1,374 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+mgb4 sysfs interface
+====================
+
+The mgb4 driver provides a sysfs interface, that is used to configure video
+stream related parameters (some of them must be set properly before the v4l2
+device can be opened) and obtain the video device/stream status.
+
+There are two types of parameters - global / PCI card related, found under
+``/sys/class/video4linux/videoX/device`` and module specific found under
+``/sys/class/video4linux/videoX``.
+
+
+Global (PCI card) parameters
+============================
+
+**module_type** (R):
+ Module type.
+
+ | 0 - No module present
+ | 1 - FPDL3
+ | 2 - GMSL
+
+**module_version** (R):
+ Module version number. Zero in case of a missing module.
+
+**fw_type** (R):
+ Firmware type.
+
+ | 1 - FPDL3
+ | 2 - GMSL
+
+**fw_version** (R):
+ Firmware version number.
+
+**serial_number** (R):
+ Card serial number. The format is::
+
+ PRODUCT-REVISION-SERIES-SERIAL
+
+ where each component is a 8b number.
+
+
+Common FPDL3/GMSL input parameters
+==================================
+
+**input_id** (R):
+ Input number ID, zero based.
+
+**oldi_lane_width** (RW):
+ Number of deserializer output lanes.
+
+ | 0 - single
+ | 1 - dual (default)
+
+**color_mapping** (RW):
+ Mapping of the incoming bits in the signal to the colour bits of the pixels.
+
+ | 0 - OLDI/JEIDA
+ | 1 - SPWG/VESA (default)
+
+**link_status** (R):
+ Video link status. If the link is locked, chips are properly connected and
+ communicating at the same speed and protocol. The link can be locked without
+ an active video stream.
+
+ A value of 0 is equivalent to the V4L2_IN_ST_NO_SYNC flag of the V4L2
+ VIDIOC_ENUMINPUT status bits.
+
+ | 0 - unlocked
+ | 1 - locked
+
+**stream_status** (R):
+ Video stream status. A stream is detected if the link is locked, the input
+ pixel clock is running and the DE signal is moving.
+
+ A value of 0 is equivalent to the V4L2_IN_ST_NO_SIGNAL flag of the V4L2
+ VIDIOC_ENUMINPUT status bits.
+
+ | 0 - not detected
+ | 1 - detected
+
+**video_width** (R):
+ Video stream width. This is the actual width as detected by the HW.
+
+ The value is identical to what VIDIOC_QUERY_DV_TIMINGS returns in the width
+ field of the v4l2_bt_timings struct.
+
+**video_height** (R):
+ Video stream height. This is the actual height as detected by the HW.
+
+ The value is identical to what VIDIOC_QUERY_DV_TIMINGS returns in the height
+ field of the v4l2_bt_timings struct.
+
+**vsync_status** (R):
+ The type of VSYNC pulses as detected by the video format detector.
+
+ The value is equivalent to the flags returned by VIDIOC_QUERY_DV_TIMINGS in
+ the polarities field of the v4l2_bt_timings struct.
+
+ | 0 - active low
+ | 1 - active high
+ | 2 - not available
+
+**hsync_status** (R):
+ The type of HSYNC pulses as detected by the video format detector.
+
+ The value is equivalent to the flags returned by VIDIOC_QUERY_DV_TIMINGS in
+ the polarities field of the v4l2_bt_timings struct.
+
+ | 0 - active low
+ | 1 - active high
+ | 2 - not available
+
+**vsync_gap_length** (RW):
+ If the incoming video signal does not contain synchronization VSYNC and
+ HSYNC pulses, these must be generated internally in the FPGA to achieve
+ the correct frame ordering. This value indicates, how many "empty" pixels
+ (pixels with deasserted Data Enable signal) are necessary to generate the
+ internal VSYNC pulse.
+
+**hsync_gap_length** (RW):
+ If the incoming video signal does not contain synchronization VSYNC and
+ HSYNC pulses, these must be generated internally in the FPGA to achieve
+ the correct frame ordering. This value indicates, how many "empty" pixels
+ (pixels with deasserted Data Enable signal) are necessary to generate the
+ internal HSYNC pulse. The value must be greater than 1 and smaller than
+ vsync_gap_length.
+
+**pclk_frequency** (R):
+ Input pixel clock frequency in kHz.
+
+ The value is identical to what VIDIOC_QUERY_DV_TIMINGS returns in
+ the pixelclock field of the v4l2_bt_timings struct.
+
+ *Note: The frequency_range parameter must be set properly first to get
+ a valid frequency here.*
+
+**hsync_width** (R):
+ Width of the HSYNC signal in PCLK clock ticks.
+
+ The value is identical to what VIDIOC_QUERY_DV_TIMINGS returns in
+ the hsync field of the v4l2_bt_timings struct.
+
+**vsync_width** (R):
+ Width of the VSYNC signal in PCLK clock ticks.
+
+ The value is identical to what VIDIOC_QUERY_DV_TIMINGS returns in
+ the vsync field of the v4l2_bt_timings struct.
+
+**hback_porch** (R):
+ Number of PCLK pulses between deassertion of the HSYNC signal and the first
+ valid pixel in the video line (marked by DE=1).
+
+ The value is identical to what VIDIOC_QUERY_DV_TIMINGS returns in
+ the hbackporch field of the v4l2_bt_timings struct.
+
+**hfront_porch** (R):
+ Number of PCLK pulses between the end of the last valid pixel in the video
+ line (marked by DE=1) and assertion of the HSYNC signal.
+
+ The value is identical to what VIDIOC_QUERY_DV_TIMINGS returns in
+ the hfrontporch field of the v4l2_bt_timings struct.
+
+**vback_porch** (R):
+ Number of video lines between deassertion of the VSYNC signal and the video
+ line with the first valid pixel (marked by DE=1).
+
+ The value is identical to what VIDIOC_QUERY_DV_TIMINGS returns in
+ the vbackporch field of the v4l2_bt_timings struct.
+
+**vfront_porch** (R):
+ Number of video lines between the end of the last valid pixel line (marked
+ by DE=1) and assertion of the VSYNC signal.
+
+ The value is identical to what VIDIOC_QUERY_DV_TIMINGS returns in
+ the vfrontporch field of the v4l2_bt_timings struct.
+
+**frequency_range** (RW)
+ PLL frequency range of the OLDI input clock generator. The PLL frequency is
+ derived from the Pixel Clock Frequency (PCLK) and is equal to PCLK if
+ oldi_lane_width is set to "single" and PCLK/2 if oldi_lane_width is set to
+ "dual".
+
+ | 0 - PLL < 50MHz (default)
+ | 1 - PLL >= 50MHz
+
+ *Note: This parameter can not be changed while the input v4l2 device is
+ open.*
+
+
+Common FPDL3/GMSL output parameters
+===================================
+
+**output_id** (R):
+ Output number ID, zero based.
+
+**video_source** (RW):
+ Output video source. If set to 0 or 1, the source is the corresponding card
+ input and the v4l2 output devices are disabled. If set to 2 or 3, the source
+ is the corresponding v4l2 video output device. The default is
+ the corresponding v4l2 output, i.e. 2 for OUT1 and 3 for OUT2.
+
+ | 0 - input 0
+ | 1 - input 1
+ | 2 - v4l2 output 0
+ | 3 - v4l2 output 1
+
+ *Note: This parameter can not be changed while ANY of the input/output v4l2
+ devices is open.*
+
+**display_width** (RW):
+ Display width. There is no autodetection of the connected display, so the
+ proper value must be set before the start of streaming. The default width
+ is 1280.
+
+ *Note: This parameter can not be changed while the output v4l2 device is
+ open.*
+
+**display_height** (RW):
+ Display height. There is no autodetection of the connected display, so the
+ proper value must be set before the start of streaming. The default height
+ is 640.
+
+ *Note: This parameter can not be changed while the output v4l2 device is
+ open.*
+
+**frame_rate** (RW):
+ Output video frame rate in frames per second. The default frame rate is
+ 60Hz.
+
+**hsync_polarity** (RW):
+ HSYNC signal polarity.
+
+ | 0 - active low (default)
+ | 1 - active high
+
+**vsync_polarity** (RW):
+ VSYNC signal polarity.
+
+ | 0 - active low (default)
+ | 1 - active high
+
+**de_polarity** (RW):
+ DE signal polarity.
+
+ | 0 - active low
+ | 1 - active high (default)
+
+**pclk_frequency** (RW):
+ Output pixel clock frequency. Allowed values are between 25000-190000(kHz)
+ and there is a non-linear stepping between two consecutive allowed
+ frequencies. The driver finds the nearest allowed frequency to the given
+ value and sets it. When reading this property, you get the exact
+ frequency set by the driver. The default frequency is 70000kHz.
+
+ *Note: This parameter can not be changed while the output v4l2 device is
+ open.*
+
+**hsync_width** (RW):
+ Width of the HSYNC signal in pixels. The default value is 16.
+
+**vsync_width** (RW):
+ Width of the VSYNC signal in video lines. The default value is 2.
+
+**hback_porch** (RW):
+ Number of PCLK pulses between deassertion of the HSYNC signal and the first
+ valid pixel in the video line (marked by DE=1). The default value is 32.
+
+**hfront_porch** (RW):
+ Number of PCLK pulses between the end of the last valid pixel in the video
+ line (marked by DE=1) and assertion of the HSYNC signal. The default value
+ is 32.
+
+**vback_porch** (RW):
+ Number of video lines between deassertion of the VSYNC signal and the video
+ line with the first valid pixel (marked by DE=1). The default value is 2.
+
+**vfront_porch** (RW):
+ Number of video lines between the end of the last valid pixel line (marked
+ by DE=1) and assertion of the VSYNC signal. The default value is 2.
+
+
+FPDL3 specific input parameters
+===============================
+
+**fpdl3_input_width** (RW):
+ Number of deserializer input lines.
+
+ | 0 - auto (default)
+ | 1 - single
+ | 2 - dual
+
+FPDL3 specific output parameters
+================================
+
+**fpdl3_output_width** (RW):
+ Number of serializer output lines.
+
+ | 0 - auto (default)
+ | 1 - single
+ | 2 - dual
+
+GMSL specific input parameters
+==============================
+
+**gmsl_mode** (RW):
+ GMSL speed mode.
+
+ | 0 - 12Gb/s (default)
+ | 1 - 6Gb/s
+ | 2 - 3Gb/s
+ | 3 - 1.5Gb/s
+
+**gmsl_stream_id** (RW):
+ The GMSL multi-stream contains up to four video streams. This parameter
+ selects which stream is captured by the video input. The value is the
+ zero-based index of the stream. The default stream id is 0.
+
+ *Note: This parameter can not be changed while the input v4l2 device is
+ open.*
+
+**gmsl_fec** (RW):
+ GMSL Forward Error Correction (FEC).
+
+ | 0 - disabled
+ | 1 - enabled (default)
+
+
+====================
+mgb4 mtd partitions
+====================
+
+The mgb4 driver creates a MTD device with two partitions:
+ - mgb4-fw.X - FPGA firmware.
+ - mgb4-data.X - Factory settings, e.g. card serial number.
+
+The *mgb4-fw* partition is writable and is used for FW updates, *mgb4-data* is
+read-only. The *X* attached to the partition name represents the card number.
+Depending on the CONFIG_MTD_PARTITIONED_MASTER kernel configuration, you may
+also have a third partition named *mgb4-flash* available in the system. This
+partition represents the whole, unpartitioned, card's FLASH memory and one should
+not fiddle with it...
+
+====================
+mgb4 iio (triggers)
+====================
+
+The mgb4 driver creates an Industrial I/O (IIO) device that provides trigger and
+signal level status capability. The following scan elements are available:
+
+**activity**:
+ The trigger levels and pending status.
+
+ | bit 1 - trigger 1 pending
+ | bit 2 - trigger 2 pending
+ | bit 5 - trigger 1 level
+ | bit 6 - trigger 2 level
+
+**timestamp**:
+ The trigger event timestamp.
+
+The iio device can operate either in "raw" mode where you can fetch the signal
+levels (activity bits 5 and 6) using sysfs access or in triggered buffer mode.
+In the triggered buffer mode you can follow the signal level changes (activity
+bits 1 and 2) using the iio device in /dev. If you enable the timestamps, you
+will also get the exact trigger event time that can be matched to a video frame
+(every mgb4 video frame has a timestamp with the same clock source).
+
+*Note: although the activity sample always contains all the status bits, it makes
+no sense to get the pending bits in raw mode or the level bits in the triggered
+buffer mode - the values do not represent valid data in such case.*
diff --git a/Documentation/admin-guide/media/misc-cardlist.rst b/Documentation/admin-guide/media/misc-cardlist.rst
new file mode 100644
index 000000000000..4c26bcfccd61
--- /dev/null
+++ b/Documentation/admin-guide/media/misc-cardlist.rst
@@ -0,0 +1,28 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Firewire driver
+===============
+
+The media subsystem also provides a firewire driver for digital TV:
+
+======= =====================
+Driver Name
+======= =====================
+firedtv FireDTV and FloppyDTV
+======= =====================
+
+Test drivers
+============
+
+In order to test userspace applications, there's a number of virtual
+drivers, with provide test functionality, simulating real hardware
+devices:
+
+======= ======================================
+Driver Name
+======= ======================================
+vicodec Virtual Codec Driver
+vim2m Virtual Memory-to-Memory Driver
+vimc Virtual Media Controller Driver (VIMC)
+vivid Virtual Video Test Driver
+======= ======================================
diff --git a/Documentation/admin-guide/media/omap3isp.rst b/Documentation/admin-guide/media/omap3isp.rst
new file mode 100644
index 000000000000..f32e7375a1a2
--- /dev/null
+++ b/Documentation/admin-guide/media/omap3isp.rst
@@ -0,0 +1,92 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: <isonum.txt>
+
+OMAP 3 Image Signal Processor (ISP) driver
+==========================================
+
+Copyright |copy| 2010 Nokia Corporation
+
+Copyright |copy| 2009 Texas Instruments, Inc.
+
+Contacts: Laurent Pinchart <laurent.pinchart@ideasonboard.com>,
+Sakari Ailus <sakari.ailus@iki.fi>, David Cohen <dacohen@gmail.com>
+
+
+Introduction
+------------
+
+This file documents the Texas Instruments OMAP 3 Image Signal Processor (ISP)
+driver located under drivers/media/platform/ti/omap3isp. The original driver was
+written by Texas Instruments but since that it has been rewritten (twice) at
+Nokia.
+
+The driver has been successfully used on the following versions of OMAP 3:
+
+- 3430
+- 3530
+- 3630
+
+The driver implements V4L2, Media controller and v4l2_subdev interfaces.
+Sensor, lens and flash drivers using the v4l2_subdev interface in the kernel
+are supported.
+
+
+Split to subdevs
+----------------
+
+The OMAP 3 ISP is split into V4L2 subdevs, each of the blocks inside the ISP
+having one subdev to represent it. Each of the subdevs provide a V4L2 subdev
+interface to userspace.
+
+- OMAP3 ISP CCP2
+- OMAP3 ISP CSI2a
+- OMAP3 ISP CCDC
+- OMAP3 ISP preview
+- OMAP3 ISP resizer
+- OMAP3 ISP AEWB
+- OMAP3 ISP AF
+- OMAP3 ISP histogram
+
+Each possible link in the ISP is modelled by a link in the Media controller
+interface. For an example program see [#]_.
+
+
+Controlling the OMAP 3 ISP
+--------------------------
+
+In general, the settings given to the OMAP 3 ISP take effect at the beginning
+of the following frame. This is done when the module becomes idle during the
+vertical blanking period on the sensor. In memory-to-memory operation the pipe
+is run one frame at a time. Applying the settings is done between the frames.
+
+All the blocks in the ISP, excluding the CSI-2 and possibly the CCP2 receiver,
+insist on receiving complete frames. Sensors must thus never send the ISP
+partial frames.
+
+Autoidle does have issues with some ISP blocks on the 3430, at least.
+Autoidle is only enabled on 3630 when the omap3isp module parameter autoidle
+is non-zero.
+
+Technical reference manuals (TRMs) and other documentation
+----------------------------------------------------------
+
+OMAP 3430 TRM:
+<URL:http://focus.ti.com/pdfs/wtbu/OMAP34xx_ES3.1.x_PUBLIC_TRM_vZM.zip>
+Referenced 2011-03-05.
+
+OMAP 35xx TRM:
+<URL:http://www.ti.com/litv/pdf/spruf98o> Referenced 2011-03-05.
+
+OMAP 3630 TRM:
+<URL:http://focus.ti.com/pdfs/wtbu/OMAP36xx_ES1.x_PUBLIC_TRM_vQ.zip>
+Referenced 2011-03-05.
+
+DM 3730 TRM:
+<URL:http://www.ti.com/litv/pdf/sprugn4h> Referenced 2011-03-06.
+
+
+References
+----------
+
+.. [#] http://git.ideasonboard.org/?p=media-ctl.git;a=summary
diff --git a/Documentation/admin-guide/media/omap4_camera.rst b/Documentation/admin-guide/media/omap4_camera.rst
new file mode 100644
index 000000000000..2ada9b1e6897
--- /dev/null
+++ b/Documentation/admin-guide/media/omap4_camera.rst
@@ -0,0 +1,62 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+OMAP4 ISS Driver
+================
+
+Author: Sergio Aguirre <sergio.a.aguirre@gmail.com>
+
+Copyright (C) 2012, Texas Instruments
+
+Introduction
+------------
+
+The OMAP44XX family of chips contains the Imaging SubSystem (a.k.a. ISS),
+Which contains several components that can be categorized in 3 big groups:
+
+- Interfaces (2 Interfaces: CSI2-A & CSI2-B/CCP2)
+- ISP (Image Signal Processor)
+- SIMCOP (Still Image Coprocessor)
+
+For more information, please look in [#f1]_ for latest version of:
+"OMAP4430 Multimedia Device Silicon Revision 2.x"
+
+As of Revision AB, the ISS is described in detail in section 8.
+
+This driver is supporting **only** the CSI2-A/B interfaces for now.
+
+It makes use of the Media Controller framework [#f2]_, and inherited most of the
+code from OMAP3 ISP driver (found under drivers/media/platform/ti/omap3isp/\*),
+except that it doesn't need an IOMMU now for ISS buffers memory mapping.
+
+Supports usage of MMAP buffers only (for now).
+
+Tested platforms
+----------------
+
+- OMAP4430SDP, w/ ES2.1 GP & SEVM4430-CAM-V1-0 (Contains IMX060 & OV5640, in
+ which only the last one is supported, outputting YUV422 frames).
+
+- TI Blaze MDP, w/ OMAP4430 ES2.2 EMU (Contains 1 IMX060 & 2 OV5650 sensors, in
+ which only the OV5650 are supported, outputting RAW10 frames).
+
+- PandaBoard, Rev. A2, w/ OMAP4430 ES2.1 GP & OV adapter board, tested with
+ following sensors:
+ * OV5640
+ * OV5650
+
+- Tested on mainline kernel:
+
+ http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=summary
+
+ Tag: v3.3 (commit c16fa4f2ad19908a47c63d8fa436a1178438c7e7)
+
+File list
+---------
+drivers/staging/media/omap4iss/
+include/linux/platform_data/media/omap4iss.h
+
+References
+----------
+
+.. [#f1] http://focus.ti.com/general/docs/wtbu/wtbudocumentcenter.tsp?navigationId=12037&templateId=6123#62
+.. [#f2] http://lwn.net/Articles/420485/
diff --git a/Documentation/admin-guide/media/opera-firmware.rst b/Documentation/admin-guide/media/opera-firmware.rst
new file mode 100644
index 000000000000..fab3581551de
--- /dev/null
+++ b/Documentation/admin-guide/media/opera-firmware.rst
@@ -0,0 +1,33 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Opera firmware
+==============
+
+Author: Marco Gittler <g.marco@freenet.de>
+
+To extract the firmware for the Opera DVB-S1 USB-Box
+you need to copy the files:
+
+2830SCap2.sys
+2830SLoad2.sys
+
+from the windriver disk into this directory.
+
+Then run:
+
+.. code-block:: none
+
+ scripts/get_dvb_firmware opera1
+
+and after that you have 2 files:
+
+dvb-usb-opera-01.fw
+dvb-usb-opera1-fpga-01.fw
+
+in here.
+
+Copy them into /lib/firmware/ .
+
+After that the driver can load the firmware
+(if you have enabled firmware loading
+in kernel config and have hotplug running).
diff --git a/Documentation/admin-guide/media/other-usb-cardlist.rst b/Documentation/admin-guide/media/other-usb-cardlist.rst
new file mode 100644
index 000000000000..fb88db50e861
--- /dev/null
+++ b/Documentation/admin-guide/media/other-usb-cardlist.rst
@@ -0,0 +1,78 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Other USB cards list
+====================
+
+================ ====================================== =====================
+Driver Card name USB IDs
+================ ====================================== =====================
+airspy Airspy 1d50:60a1
+dvb-as102 Abilis Systems DVB-Titan 1BA6:0001
+dvb-as102 PCTV Systems picoStick (74e) 2013:0246
+dvb-as102 Elgato EyeTV DTT Deluxe 0fd9:002c
+dvb-as102 nBox DVB-T Dongle 0b89:0007
+dvb-as102 Sky IT Digital Key (green led) 2137:0001
+b2c2-flexcop-usb Technisat/B2C2 FlexCop II/IIb/III 0af7:0101
+ Digital TV
+go7007 WIS GO7007 MPEG encoder 1943:a250, 093b:a002,
+ 093b:a004, 0eb1:6666,
+ 0eb1:6668
+hackrf HackRF Software Decoder Radio 1d50:6089
+hdpvr Hauppauge HD PVR 2040:4900, 2040:4901,
+ 2040:4902, 2040:4982,
+ 2040:4903
+msi2500 Mirics MSi3101 SDR Dongle 1df7:2500, 2040:d300
+pvrusb2 Hauppauge WinTV-PVR USB2 2040:2900, 2040:2950,
+ 2040:2400, 1164:0622,
+ 1164:0602, 11ba:1003,
+ 11ba:1001, 2040:7300,
+ 2040:7500, 2040:7501,
+ 0ccd:0039, 2040:7502,
+ 2040:7510
+pwc Creative Webcam 5 041E:400C
+pwc Creative Webcam Pro Ex 041E:4011
+pwc Logitech QuickCam 3000 Pro 046D:08B0
+pwc Logitech QuickCam Notebook Pro 046D:08B1
+pwc Logitech QuickCam 4000 Pro 046D:08B2
+pwc Logitech QuickCam Zoom (old model) 046D:08B3
+pwc Logitech QuickCam Zoom (new model) 046D:08B4
+pwc Logitech QuickCam Orbit/Sphere 046D:08B5
+pwc Logitech/Cisco VT Camera 046D:08B6
+pwc Logitech ViewPort AV 100 046D:08B7
+pwc Logitech QuickCam 046D:08B8
+pwc Philips PCA645VC 0471:0302
+pwc Philips PCA646VC 0471:0303
+pwc Askey VC010 type 2 0471:0304
+pwc Philips PCVC675K (Vesta) 0471:0307
+pwc Philips PCVC680K (Vesta Pro) 0471:0308
+pwc Philips PCVC690K (Vesta Pro Scan) 0471:030C
+pwc Philips PCVC730K (ToUCam Fun), 0471:0310
+ PCVC830 (ToUCam II)
+pwc Philips PCVC740K (ToUCam Pro), 0471:0311
+ PCVC840 (ToUCam II)
+pwc Philips PCVC750K (ToUCam Pro Scan) 0471:0312
+pwc Philips PCVC720K/40 (ToUCam XS) 0471:0313
+pwc Philips SPC 900NC 0471:0329
+pwc Philips SPC 880NC 0471:032C
+pwc Sotec Afina Eye 04CC:8116
+pwc Samsung MPC-C10 055D:9000
+pwc Samsung MPC-C30 055D:9001
+pwc Samsung SNC-35E (Ver3.0) 055D:9002
+pwc Askey VC010 type 1 069A:0001
+pwc AME Co. Afina Eye 06BE:8116
+pwc Visionite VCS-UC300 0d81:1900
+pwc Visionite VCS-UM100 0d81:1910
+s2255drv Sensoray 2255 1943:2255, 1943:2257
+stk1160 STK1160 USB video capture dongle 05e1:0408
+dvb-ttusb-budget Technotrend/Hauppauge Nova-USB devices 0b48:1003, 0b48:1004,
+ 0b48:1005
+dvb-ttusb_dec Technotrend/Hauppauge MPEG decoder 0b48:1006
+ DEC3000-s
+dvb-ttusb_dec Technotrend/Hauppauge MPEG decoder 0b48:1007
+dvb-ttusb_dec Technotrend/Hauppauge MPEG decoder 0b48:1008
+ DEC2000-t
+dvb-ttusb_dec Technotrend/Hauppauge MPEG decoder
+ DEC2540-t 0b48:1009
+usbtv Fushicai USBTV007 Audio-Video Grabber 1b71:3002, 1f71:3301,
+ 1f71:3306
+================ ====================================== =====================
diff --git a/Documentation/admin-guide/media/pci-cardlist.rst b/Documentation/admin-guide/media/pci-cardlist.rst
new file mode 100644
index 000000000000..7d8e3c8987db
--- /dev/null
+++ b/Documentation/admin-guide/media/pci-cardlist.rst
@@ -0,0 +1,109 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+PCI drivers
+===========
+
+The PCI boards are identified by an identification called PCI ID. The PCI ID
+is actually composed by two parts:
+
+ - Vendor ID and device ID;
+ - Subsystem ID and Subsystem device ID;
+
+The ``lspci -nn`` command allows identifying the vendor/device PCI IDs:
+
+.. code-block:: none
+ :emphasize-lines: 3
+
+ $ lspci -nn
+ ...
+ 00:0a.0 Multimedia controller [0480]: Philips Semiconductors SAA7131/SAA7133/SAA7135 Video Broadcast Decoder [1131:7133] (rev d1)
+ 00:0b.0 Multimedia controller [0480]: Brooktree Corporation Bt878 Audio Capture [109e:0878] (rev 11)
+ 01:00.0 Multimedia video controller [0400]: Conexant Systems, Inc. CX23887/8 PCIe Broadcast Audio and Video Decoder with 3D Comb [14f1:8880] (rev 0f)
+ 02:01.0 Multimedia video controller [0400]: Internext Compression Inc iTVC15 (CX23415) Video Decoder [4444:0803] (rev 01)
+ 02:02.0 Multimedia video controller [0400]: Conexant Systems, Inc. CX23418 Single-Chip MPEG-2 Encoder with Integrated Analog Video/Broadcast Audio Decoder [14f1:5b7a]
+ 02:03.0 Multimedia video controller [0400]: Brooktree Corporation Bt878 Video Capture [109e:036e] (rev 11)
+ ...
+
+The subsystem IDs can be obtained using ``lspci -vn``
+
+.. code-block:: none
+ :emphasize-lines: 4
+
+ $ lspci -vn
+ ...
+ 00:0a.0 0480: 1131:7133 (rev d1)
+ Subsystem: 1461:f01d
+ Flags: bus master, medium devsel, latency 32, IRQ 209
+ Memory at e2002000 (32-bit, non-prefetchable) [size=2K]
+ Capabilities: [40] Power Management version 2
+ ...
+
+At the above example, the first card uses the ``saa7134`` driver, and
+has a vendor/device PCI ID equal to ``1131:7133`` and a PCI subsystem
+ID equal to ``1461:f01d`` (see :doc:`Saa7134 card list<saa7134-cardlist>`).
+
+Unfortunately, sometimes the same PCI subsystem ID is used by different
+products. So, several media drivers allow passing a ``card=`` parameter,
+in order to setup a card number that would match the correct settings for
+an specific board.
+
+The current supported PCI/PCIe cards (not including staging drivers) are
+listed below\ [#]_.
+
+.. [#] some of the drivers have sub-drivers, not shown at this table
+
+================ ========================================================
+Driver Name
+================ ========================================================
+altera-ci Altera FPGA based CI module
+b2c2-flexcop-pci Technisat/B2C2 Air/Sky/Cable2PC PCI
+bt878 DVB/ATSC Support for bt878 based TV cards
+bttv BT8x8 Video For Linux
+cobalt Cisco Cobalt
+cx18 Conexant cx23418 MPEG encoder
+cx23885 Conexant cx23885 (2388x successor)
+cx25821 Conexant cx25821
+cx88xx Conexant 2388x (bt878 successor)
+ddbridge Digital Devices bridge
+dm1105 SDMC DM1105 based PCI cards
+dt3155 DT3155 frame grabber
+dvb-ttpci AV7110 cards
+earth-pt1 PT1 cards
+earth-pt3 Earthsoft PT3 cards
+hexium_gemini Hexium Gemini frame grabber
+hexium_orion Hexium HV-PCI6 and Orion frame grabber
+hopper HOPPER based cards
+ipu3-cio2 Intel ipu3-cio2 driver
+ivtv Conexant cx23416/cx23415 MPEG encoder/decoder
+ivtvfb Conexant cx23415 framebuffer
+mantis MANTIS based cards
+mgb4 Digiteq Automotive MGB4 frame grabber
+mxb Siemens-Nixdorf 'Multimedia eXtension Board'
+netup-unidvb NetUP Universal DVB card
+ngene Micronas nGene
+pluto2 Pluto2 cards
+saa7134 Philips SAA7134
+saa7164 NXP SAA7164
+smipcie SMI PCIe DVBSky cards
+solo6x10 Bluecherry / Softlogic 6x10 capture cards (MPEG-4/H.264)
+sta2x11_vip STA2X11 VIP Video For Linux
+tw5864 Techwell TW5864 video/audio grabber and encoder
+tw686x Intersil/Techwell TW686x
+tw68 Techwell tw68x Video For Linux
+zoran Zoran-36057/36067 JPEG codec
+================ ========================================================
+
+Some of those drivers support multiple devices, as shown at the card
+lists below:
+
+.. toctree::
+ :maxdepth: 1
+
+ bttv-cardlist
+ cx18-cardlist
+ cx23885-cardlist
+ cx88-cardlist
+ ivtv-cardlist
+ saa7134-cardlist
+ saa7164-cardlist
+ zoran-cardlist
diff --git a/Documentation/admin-guide/media/philips.rst b/Documentation/admin-guide/media/philips.rst
new file mode 100644
index 000000000000..e2840be10d08
--- /dev/null
+++ b/Documentation/admin-guide/media/philips.rst
@@ -0,0 +1,247 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Philips webcams (pwc driver)
+============================
+
+This file contains some additional information for the Philips and OEM webcams.
+E-mail: webcam@smcc.demon.nl Last updated: 2004-01-19
+Site: http://www.smcc.demon.nl/webcam/
+
+As of this moment, the following cameras are supported:
+
+ * Philips PCA645
+ * Philips PCA646
+ * Philips PCVC675
+ * Philips PCVC680
+ * Philips PCVC690
+ * Philips PCVC720/40
+ * Philips PCVC730
+ * Philips PCVC740
+ * Philips PCVC750
+ * Askey VC010
+ * Creative Labs Webcam 5
+ * Creative Labs Webcam Pro Ex
+ * Logitech QuickCam 3000 Pro
+ * Logitech QuickCam 4000 Pro
+ * Logitech QuickCam Notebook Pro
+ * Logitech QuickCam Zoom
+ * Logitech QuickCam Orbit
+ * Logitech QuickCam Sphere
+ * Samsung MPC-C10
+ * Samsung MPC-C30
+ * Sotec Afina Eye
+ * AME CU-001
+ * Visionite VCS-UM100
+ * Visionite VCS-UC300
+
+The main webpage for the Philips driver is at the address above. It contains
+a lot of extra information, a FAQ, and the binary plugin 'PWCX'. This plugin
+contains decompression routines that allow you to use higher image sizes and
+framerates; in addition the webcam uses less bandwidth on the USB bus (handy
+if you want to run more than 1 camera simultaneously). These routines fall
+under a NDA, and may therefore not be distributed as source; however, its use
+is completely optional.
+
+You can build this code either into your kernel, or as a module. I recommend
+the latter, since it makes troubleshooting a lot easier. The built-in
+microphone is supported through the USB Audio class.
+
+When you load the module you can set some default settings for the
+camera; some programs depend on a particular image-size or -format and
+don't know how to set it properly in the driver. The options are:
+
+size
+ Can be one of 'sqcif', 'qsif', 'qcif', 'sif', 'cif' or
+ 'vga', for an image size of resp. 128x96, 160x120, 176x144,
+ 320x240, 352x288 and 640x480 (of course, only for those cameras that
+ support these resolutions).
+
+fps
+ Specifies the desired framerate. Is an integer in the range of 4-30.
+
+fbufs
+ This parameter specifies the number of internal buffers to use for storing
+ frames from the cam. This will help if the process that reads images from
+ the cam is a bit slow or momentarily busy. However, on slow machines it
+ only introduces lag, so choose carefully. The default is 3, which is
+ reasonable. You can set it between 2 and 5.
+
+mbufs
+ This is an integer between 1 and 10. It will tell the module the number of
+ buffers to reserve for mmap(), VIDIOCCGMBUF, VIDIOCMCAPTURE and friends.
+ The default is 2, which is adequate for most applications (double
+ buffering).
+
+ Should you experience a lot of 'Dumping frame...' messages during
+ grabbing with a tool that uses mmap(), you might want to increase if.
+ However, it doesn't really buffer images, it just gives you a bit more
+ slack when your program is behind. But you need a multi-threaded or
+ forked program to really take advantage of these buffers.
+
+ The absolute maximum is 10, but don't set it too high! Every buffer takes
+ up 460 KB of RAM, so unless you have a lot of memory setting this to
+ something more than 4 is an absolute waste. This memory is only
+ allocated during open(), so nothing is wasted when the camera is not in
+ use.
+
+power_save
+ When power_save is enabled (set to 1), the module will try to shut down
+ the cam on close() and re-activate on open(). This will save power and
+ turn off the LED. Not all cameras support this though (the 645 and 646
+ don't have power saving at all), and some models don't work either (they
+ will shut down, but never wake up). Consider this experimental. By
+ default this option is disabled.
+
+compression (only useful with the plugin)
+ With this option you can control the compression factor that the camera
+ uses to squeeze the image through the USB bus. You can set the
+ parameter between 0 and 3::
+
+ 0 = prefer uncompressed images; if the requested mode is not available
+ in an uncompressed format, the driver will silently switch to low
+ compression.
+ 1 = low compression.
+ 2 = medium compression.
+ 3 = high compression.
+
+ High compression takes less bandwidth of course, but it could also
+ introduce some unwanted artefacts. The default is 2, medium compression.
+ See the FAQ on the website for an overview of which modes require
+ compression.
+
+ The compression parameter does not apply to the 645 and 646 cameras
+ and OEM models derived from those (only a few). Most cams honour this
+ parameter.
+
+leds
+ This settings takes 2 integers, that define the on/off time for the LED
+ (in milliseconds). One of the interesting things that you can do with
+ this is let the LED blink while the camera is in use. This::
+
+ leds=500,500
+
+ will blink the LED once every second. But with::
+
+ leds=0,0
+
+ the LED never goes on, making it suitable for silent surveillance.
+
+ By default the camera's LED is on solid while in use, and turned off
+ when the camera is not used anymore.
+
+ This parameter works only with the ToUCam range of cameras (720, 730, 740,
+ 750) and OEMs. For other cameras this command is silently ignored, and
+ the LED cannot be controlled.
+
+ Finally: this parameters does not take effect UNTIL the first time you
+ open the camera device. Until then, the LED remains on.
+
+dev_hint
+ A long standing problem with USB devices is their dynamic nature: you
+ never know what device a camera gets assigned; it depends on module load
+ order, the hub configuration, the order in which devices are plugged in,
+ and the phase of the moon (i.e. it can be random). With this option you
+ can give the driver a hint as to what video device node (/dev/videoX) it
+ should use with a specific camera. This is also handy if you have two
+ cameras of the same model.
+
+ A camera is specified by its type (the number from the camera model,
+ like PCA645, PCVC750VC, etc) and optionally the serial number (visible
+ in /sys/kernel/debug/usb/devices). A hint consists of a string with the
+ following format::
+
+ [type[.serialnumber]:]node
+
+ The square brackets mean that both the type and the serialnumber are
+ optional, but a serialnumber cannot be specified without a type (which
+ would be rather pointless). The serialnumber is separated from the type
+ by a '.'; the node number by a ':'.
+
+ This somewhat cryptic syntax is best explained by a few examples::
+
+ dev_hint=3,5 The first detected cam gets assigned
+ /dev/video3, the second /dev/video5. Any
+ other cameras will get the first free
+ available slot (see below).
+
+ dev_hint=645:1,680:2 The PCA645 camera will get /dev/video1,
+ and a PCVC680 /dev/video2.
+
+ dev_hint=645.0123:3,645.4567:0 The PCA645 camera with serialnumber
+ 0123 goes to /dev/video3, the same
+ camera model with the 4567 serial
+ gets /dev/video0.
+
+ dev_hint=750:1,4,5,6 The PCVC750 camera will get /dev/video1, the
+ next 3 Philips cams will use /dev/video4
+ through /dev/video6.
+
+ Some points worth knowing:
+
+ - Serialnumbers are case sensitive and must be written full, including
+ leading zeroes (it's treated as a string).
+ - If a device node is already occupied, registration will fail and
+ the webcam is not available.
+ - You can have up to 64 video devices; be sure to make enough device
+ nodes in /dev if you want to spread the numbers.
+ After /dev/video9 comes /dev/video10 (not /dev/videoA).
+ - If a camera does not match any dev_hint, it will simply get assigned
+ the first available device node, just as it used to be.
+
+trace
+ In order to better detect problems, it is now possible to turn on a
+ 'trace' of some of the calls the module makes; it logs all items in your
+ kernel log at debug level.
+
+ The trace variable is a bitmask; each bit represents a certain feature.
+ If you want to trace something, look up the bit value(s) in the table
+ below, add the values together and supply that to the trace variable.
+
+ ====== ======= ================================================ =======
+ Value Value Description Default
+ (dec) (hex)
+ ====== ======= ================================================ =======
+ 1 0x1 Module initialization; this will log messages On
+ while loading and unloading the module
+
+ 2 0x2 probe() and disconnect() traces On
+
+ 4 0x4 Trace open() and close() calls Off
+
+ 8 0x8 read(), mmap() and associated ioctl() calls Off
+
+ 16 0x10 Memory allocation of buffers, etc. Off
+
+ 32 0x20 Showing underflow, overflow and Dumping frame On
+ messages
+
+ 64 0x40 Show viewport and image sizes Off
+
+ 128 0x80 PWCX debugging Off
+ ====== ======= ================================================ =======
+
+ For example, to trace the open() & read() functions, sum 8 + 4 = 12,
+ so you would supply trace=12 during insmod or modprobe. If
+ you want to turn the initialization and probing tracing off, set trace=0.
+ The default value for trace is 35 (0x23).
+
+
+
+Example::
+
+ # modprobe pwc size=cif fps=15 power_save=1
+
+The fbufs, mbufs and trace parameters are global and apply to all connected
+cameras. Each camera has its own set of buffers.
+
+size and fps only specify defaults when you open() the device; this is to
+accommodate some tools that don't set the size. You can change these
+settings after open() with the Video4Linux ioctl() calls. The default of
+defaults is QCIF size at 10 fps.
+
+The compression parameter is semiglobal; it sets the initial compression
+preference for all camera's, but this parameter can be set per camera with
+the VIDIOCPWCSCQUAL ioctl() call.
+
+All parameters are optional.
+
diff --git a/Documentation/admin-guide/media/platform-cardlist.rst b/Documentation/admin-guide/media/platform-cardlist.rst
new file mode 100644
index 000000000000..1230ae4037ad
--- /dev/null
+++ b/Documentation/admin-guide/media/platform-cardlist.rst
@@ -0,0 +1,89 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Platform drivers
+================
+
+There are several drivers that are focused on providing support for
+functionality that are already included at the main board, and don't
+use neither USB nor PCI bus. Those drivers are called platform
+drivers, and are very popular on embedded devices.
+
+The current supported of platform drivers (not including staging drivers) are
+listed below
+
+================= ============================================================
+Driver Name
+================= ============================================================
+am437x-vpfe TI AM437x VPFE
+aspeed-video Aspeed AST2400 and AST2500
+atmel-isc ATMEL Image Sensor Controller (ISC)
+atmel-isi ATMEL Image Sensor Interface (ISI)
+c8sectpfe SDR platform devices
+c8sectpfe SDR platform devices
+cafe_ccic Marvell 88ALP01 (Cafe) CMOS Camera Controller
+cdns-csi2rx Cadence MIPI-CSI2 RX Controller
+cdns-csi2tx Cadence MIPI-CSI2 TX Controller
+coda-vpu Chips&Media Coda multi-standard codec IP
+dm355_ccdc TI DM355 CCDC video capture
+dm644x_ccdc TI DM6446 CCDC video capture
+exynos-fimc-is EXYNOS4x12 FIMC-IS (Imaging Subsystem)
+exynos-fimc-lite EXYNOS FIMC-LITE camera interface
+exynos-gsc Samsung Exynos G-Scaler
+exy Samsung S5P/EXYNOS4 SoC series Camera Subsystem
+imx-pxp i.MX Pixel Pipeline (PXP)
+isdf TI DM365 ISIF video capture
+mmp_camera Marvell Armada 610 integrated camera controller
+mtk_jpeg Mediatek JPEG Codec
+mtk-mdp Mediatek MDP
+mtk-vcodec-dec Mediatek Video Codec
+mtk-vpu Mediatek Video Processor Unit
+mx2_emmaprp MX2 eMMa-PrP
+omap3-isp OMAP 3 Camera
+omap-vout OMAP2/OMAP3 V4L2-Display
+pxa_camera PXA27x Quick Capture Interface
+qcom-camss Qualcomm V4L2 Camera Subsystem
+rcar-csi2 R-Car MIPI CSI-2 Receiver
+rcar_drif Renesas Digital Radio Interface (DRIF)
+rcar-fcp Renesas Frame Compression Processor
+rcar_fdp1 Renesas Fine Display Processor
+rcar_jpu Renesas JPEG Processing Unit
+rcar-vin R-Car Video Input (VIN)
+renesas-ceu Renesas Capture Engine Unit (CEU)
+rockchip-rga Rockchip Raster 2d Graphic Acceleration Unit
+s3c-camif Samsung S3C24XX/S3C64XX SoC Camera Interface
+s5p-csis S5P/EXYNOS MIPI-CSI2 receiver (MIPI-CSIS)
+s5p-fimc S5P/EXYNOS4 FIMC/CAMIF camera interface
+s5p-g2d Samsung S5P and EXYNOS4 G2D 2d graphics accelerator
+s5p-jpeg Samsung S5P/Exynos3250/Exynos4 JPEG codec
+s5p-mfc Samsung S5P MFC Video Codec
+sh_veu SuperH VEU mem2mem video processing
+sh_vou SuperH VOU video output
+stm32-dcmi STM32 Digital Camera Memory Interface (DCMI)
+stm32-dma2d STM32 Chrom-Art Accelerator Unit
+sun4i-csi Allwinner A10 CMOS Sensor Interface Support
+sun6i-csi Allwinner V3s Camera Sensor Interface
+sun8i-di Allwinner Deinterlace
+sun8i-rotate Allwinner DE2 rotation
+ti-cal TI Memory-to-memory multimedia devices
+ti-csc TI DVB platform devices
+ti-vpe TI VPE (Video Processing Engine)
+venus-enc Qualcomm Venus V4L2 encoder/decoder
+via-camera VIAFB camera controller
+video-mux Video Multiplexer
+vpif_display TI DaVinci VPIF V4L2-Display
+vpif_capture TI DaVinci VPIF video capture
+vsp1 Renesas VSP1 Video Processing Engine
+xilinx-tpg Xilinx Video Test Pattern Generator
+xilinx-video Xilinx Video IP (EXPERIMENTAL)
+xilinx-vtc Xilinx Video Timing Controller
+================= ============================================================
+
+MMC/SDIO DVB adapters
+---------------------
+
+======= ===========================================
+Driver Name
+======= ===========================================
+smssdio Siano SMS1xxx based MDTV via SDIO interface
+======= ===========================================
+
diff --git a/Documentation/admin-guide/media/qcom_camss.rst b/Documentation/admin-guide/media/qcom_camss.rst
new file mode 100644
index 000000000000..8a8f3ff40105
--- /dev/null
+++ b/Documentation/admin-guide/media/qcom_camss.rst
@@ -0,0 +1,185 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: <isonum.txt>
+
+Qualcomm Camera Subsystem driver
+================================
+
+Introduction
+------------
+
+This file documents the Qualcomm Camera Subsystem driver located under
+drivers/media/platform/qcom/camss.
+
+The current version of the driver supports the Camera Subsystem found on
+Qualcomm MSM8916/APQ8016 and MSM8996/APQ8096 processors.
+
+The driver implements V4L2, Media controller and V4L2 subdev interfaces.
+Camera sensor using V4L2 subdev interface in the kernel is supported.
+
+The driver is implemented using as a reference the Qualcomm Camera Subsystem
+driver for Android as found in Code Linaro [#f1]_ [#f2]_.
+
+
+Qualcomm Camera Subsystem hardware
+----------------------------------
+
+The Camera Subsystem hardware found on 8x16 / 8x96 processors and supported by
+the driver consists of:
+
+- 2 / 3 CSIPHY modules. They handle the Physical layer of the CSI2 receivers.
+ A separate camera sensor can be connected to each of the CSIPHY module;
+- 2 / 4 CSID (CSI Decoder) modules. They handle the Protocol and Application
+ layer of the CSI2 receivers. A CSID can decode data stream from any of the
+ CSIPHY. Each CSID also contains a TG (Test Generator) block which can generate
+ artificial input data for test purposes;
+- ISPIF (ISP Interface) module. Handles the routing of the data streams from
+ the CSIDs to the inputs of the VFE;
+- 1 / 2 VFE (Video Front End) module(s). Contain a pipeline of image processing
+ hardware blocks. The VFE has different input interfaces. The PIX (Pixel) input
+ interface feeds the input data to the image processing pipeline. The image
+ processing pipeline contains also a scale and crop module at the end. Three
+ RDI (Raw Dump Interface) input interfaces bypass the image processing
+ pipeline. The VFE also contains the AXI bus interface which writes the output
+ data to memory.
+
+
+Supported functionality
+-----------------------
+
+The current version of the driver supports:
+
+- Input from camera sensor via CSIPHY;
+- Generation of test input data by the TG in CSID;
+- RDI interface of VFE
+
+ - Raw dump of the input data to memory.
+
+ Supported formats:
+
+ - YUYV/UYVY/YVYU/VYUY (packed YUV 4:2:2 - V4L2_PIX_FMT_YUYV /
+ V4L2_PIX_FMT_UYVY / V4L2_PIX_FMT_YVYU / V4L2_PIX_FMT_VYUY);
+ - MIPI RAW8 (8bit Bayer RAW - V4L2_PIX_FMT_SRGGB8 /
+ V4L2_PIX_FMT_SGRBG8 / V4L2_PIX_FMT_SGBRG8 / V4L2_PIX_FMT_SBGGR8);
+ - MIPI RAW10 (10bit packed Bayer RAW - V4L2_PIX_FMT_SBGGR10P /
+ V4L2_PIX_FMT_SGBRG10P / V4L2_PIX_FMT_SGRBG10P / V4L2_PIX_FMT_SRGGB10P /
+ V4L2_PIX_FMT_Y10P);
+ - MIPI RAW12 (12bit packed Bayer RAW - V4L2_PIX_FMT_SRGGB12P /
+ V4L2_PIX_FMT_SGBRG12P / V4L2_PIX_FMT_SGRBG12P / V4L2_PIX_FMT_SRGGB12P).
+ - (8x96 only) MIPI RAW14 (14bit packed Bayer RAW - V4L2_PIX_FMT_SRGGB14P /
+ V4L2_PIX_FMT_SGBRG14P / V4L2_PIX_FMT_SGRBG14P / V4L2_PIX_FMT_SRGGB14P).
+
+ - (8x96 only) Format conversion of the input data.
+
+ Supported input formats:
+
+ - MIPI RAW10 (10bit packed Bayer RAW - V4L2_PIX_FMT_SBGGR10P / V4L2_PIX_FMT_Y10P).
+
+ Supported output formats:
+
+ - Plain16 RAW10 (10bit unpacked Bayer RAW - V4L2_PIX_FMT_SBGGR10 / V4L2_PIX_FMT_Y10).
+
+- PIX interface of VFE
+
+ - Format conversion of the input data.
+
+ Supported input formats:
+
+ - YUYV/UYVY/YVYU/VYUY (packed YUV 4:2:2 - V4L2_PIX_FMT_YUYV /
+ V4L2_PIX_FMT_UYVY / V4L2_PIX_FMT_YVYU / V4L2_PIX_FMT_VYUY).
+
+ Supported output formats:
+
+ - NV12/NV21 (two plane YUV 4:2:0 - V4L2_PIX_FMT_NV12 / V4L2_PIX_FMT_NV21);
+ - NV16/NV61 (two plane YUV 4:2:2 - V4L2_PIX_FMT_NV16 / V4L2_PIX_FMT_NV61).
+ - (8x96 only) YUYV/UYVY/YVYU/VYUY (packed YUV 4:2:2 - V4L2_PIX_FMT_YUYV /
+ V4L2_PIX_FMT_UYVY / V4L2_PIX_FMT_YVYU / V4L2_PIX_FMT_VYUY).
+
+ - Scaling support. Configuration of the VFE Encoder Scale module
+ for downscalling with ratio up to 16x.
+
+ - Cropping support. Configuration of the VFE Encoder Crop module.
+
+- Concurrent and independent usage of two (8x96: three) data inputs -
+ could be camera sensors and/or TG.
+
+
+Driver Architecture and Design
+------------------------------
+
+The driver implements the V4L2 subdev interface. With the goal to model the
+hardware links between the modules and to expose a clean, logical and usable
+interface, the driver is split into V4L2 sub-devices as follows (8x16 / 8x96):
+
+- 2 / 3 CSIPHY sub-devices - each CSIPHY is represented by a single sub-device;
+- 2 / 4 CSID sub-devices - each CSID is represented by a single sub-device;
+- 2 / 4 ISPIF sub-devices - ISPIF is represented by a number of sub-devices
+ equal to the number of CSID sub-devices;
+- 4 / 8 VFE sub-devices - VFE is represented by a number of sub-devices equal to
+ the number of the input interfaces (3 RDI and 1 PIX for each VFE).
+
+The considerations to split the driver in this particular way are as follows:
+
+- representing CSIPHY and CSID modules by a separate sub-device for each module
+ allows to model the hardware links between these modules;
+- representing VFE by a separate sub-devices for each input interface allows
+ to use the input interfaces concurrently and independently as this is
+ supported by the hardware;
+- representing ISPIF by a number of sub-devices equal to the number of CSID
+ sub-devices allows to create linear media controller pipelines when using two
+ cameras simultaneously. This avoids branches in the pipelines which otherwise
+ will require a) userspace and b) media framework (e.g. power on/off
+ operations) to make assumptions about the data flow from a sink pad to a
+ source pad on a single media entity.
+
+Each VFE sub-device is linked to a separate video device node.
+
+The media controller pipeline graph is as follows (with connected two / three
+OV5645 camera sensors):
+
+.. _qcom_camss_graph:
+
+.. kernel-figure:: qcom_camss_graph.dot
+ :alt: qcom_camss_graph.dot
+ :align: center
+
+ Media pipeline graph 8x16
+
+.. kernel-figure:: qcom_camss_8x96_graph.dot
+ :alt: qcom_camss_8x96_graph.dot
+ :align: center
+
+ Media pipeline graph 8x96
+
+
+Implementation
+--------------
+
+Runtime configuration of the hardware (updating settings while streaming) is
+not required to implement the currently supported functionality. The complete
+configuration on each hardware module is applied on STREAMON ioctl based on
+the current active media links, formats and controls set.
+
+The output size of the scaler module in the VFE is configured with the actual
+compose selection rectangle on the sink pad of the 'msm_vfe0_pix' entity.
+
+The crop output area of the crop module in the VFE is configured with the actual
+crop selection rectangle on the source pad of the 'msm_vfe0_pix' entity.
+
+
+Documentation
+-------------
+
+APQ8016 Specification:
+https://developer.qualcomm.com/download/sd410/snapdragon-410-processor-device-specification.pdf
+Referenced 2016-11-24.
+
+APQ8096 Specification:
+https://developer.qualcomm.com/download/sd820e/qualcomm-snapdragon-820e-processor-apq8096sge-device-specification.pdf
+Referenced 2018-06-22.
+
+References
+----------
+
+.. [#f1] https://git.codelinaro.org/clo/la/kernel/msm-3.10/
+.. [#f2] https://git.codelinaro.org/clo/la/kernel/msm-3.18/
diff --git a/Documentation/admin-guide/media/qcom_camss_8x96_graph.dot b/Documentation/admin-guide/media/qcom_camss_8x96_graph.dot
new file mode 100644
index 000000000000..7ed243b41b67
--- /dev/null
+++ b/Documentation/admin-guide/media/qcom_camss_8x96_graph.dot
@@ -0,0 +1,106 @@
+# SPDX-License-Identifier: GPL-2.0
+
+digraph board {
+ rankdir=TB
+ n00000001 [label="{{<port0> 0} | msm_csiphy0\n/dev/v4l-subdev0 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000001:port1 -> n0000000a:port0 [style=dashed]
+ n00000001:port1 -> n0000000d:port0 [style=dashed]
+ n00000001:port1 -> n00000010:port0 [style=dashed]
+ n00000001:port1 -> n00000013:port0 [style=dashed]
+ n00000004 [label="{{<port0> 0} | msm_csiphy1\n/dev/v4l-subdev1 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000004:port1 -> n0000000a:port0 [style=dashed]
+ n00000004:port1 -> n0000000d:port0 [style=dashed]
+ n00000004:port1 -> n00000010:port0 [style=dashed]
+ n00000004:port1 -> n00000013:port0 [style=dashed]
+ n00000007 [label="{{<port0> 0} | msm_csiphy2\n/dev/v4l-subdev2 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000007:port1 -> n0000000a:port0 [style=dashed]
+ n00000007:port1 -> n0000000d:port0 [style=dashed]
+ n00000007:port1 -> n00000010:port0 [style=dashed]
+ n00000007:port1 -> n00000013:port0 [style=dashed]
+ n0000000a [label="{{<port0> 0} | msm_csid0\n/dev/v4l-subdev3 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000000a:port1 -> n00000016:port0 [style=dashed]
+ n0000000a:port1 -> n00000019:port0 [style=dashed]
+ n0000000a:port1 -> n0000001c:port0 [style=dashed]
+ n0000000a:port1 -> n0000001f:port0 [style=dashed]
+ n0000000d [label="{{<port0> 0} | msm_csid1\n/dev/v4l-subdev4 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000000d:port1 -> n00000016:port0 [style=dashed]
+ n0000000d:port1 -> n00000019:port0 [style=dashed]
+ n0000000d:port1 -> n0000001c:port0 [style=dashed]
+ n0000000d:port1 -> n0000001f:port0 [style=dashed]
+ n00000010 [label="{{<port0> 0} | msm_csid2\n/dev/v4l-subdev5 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000010:port1 -> n00000016:port0 [style=dashed]
+ n00000010:port1 -> n00000019:port0 [style=dashed]
+ n00000010:port1 -> n0000001c:port0 [style=dashed]
+ n00000010:port1 -> n0000001f:port0 [style=dashed]
+ n00000013 [label="{{<port0> 0} | msm_csid3\n/dev/v4l-subdev6 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000013:port1 -> n00000016:port0 [style=dashed]
+ n00000013:port1 -> n00000019:port0 [style=dashed]
+ n00000013:port1 -> n0000001c:port0 [style=dashed]
+ n00000013:port1 -> n0000001f:port0 [style=dashed]
+ n00000016 [label="{{<port0> 0} | msm_ispif0\n/dev/v4l-subdev7 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000016:port1 -> n00000022:port0 [style=dashed]
+ n00000016:port1 -> n0000002b:port0 [style=dashed]
+ n00000016:port1 -> n00000034:port0 [style=dashed]
+ n00000016:port1 -> n0000003d:port0 [style=dashed]
+ n00000016:port1 -> n00000046:port0 [style=dashed]
+ n00000016:port1 -> n0000004f:port0 [style=dashed]
+ n00000016:port1 -> n00000058:port0 [style=dashed]
+ n00000016:port1 -> n00000061:port0 [style=dashed]
+ n00000019 [label="{{<port0> 0} | msm_ispif1\n/dev/v4l-subdev8 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000019:port1 -> n00000022:port0 [style=dashed]
+ n00000019:port1 -> n0000002b:port0 [style=dashed]
+ n00000019:port1 -> n00000034:port0 [style=dashed]
+ n00000019:port1 -> n0000003d:port0 [style=dashed]
+ n00000019:port1 -> n00000046:port0 [style=dashed]
+ n00000019:port1 -> n0000004f:port0 [style=dashed]
+ n00000019:port1 -> n00000058:port0 [style=dashed]
+ n00000019:port1 -> n00000061:port0 [style=dashed]
+ n0000001c [label="{{<port0> 0} | msm_ispif2\n/dev/v4l-subdev9 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000001c:port1 -> n00000022:port0 [style=dashed]
+ n0000001c:port1 -> n0000002b:port0 [style=dashed]
+ n0000001c:port1 -> n00000034:port0 [style=dashed]
+ n0000001c:port1 -> n0000003d:port0 [style=dashed]
+ n0000001c:port1 -> n00000046:port0 [style=dashed]
+ n0000001c:port1 -> n0000004f:port0 [style=dashed]
+ n0000001c:port1 -> n00000058:port0 [style=dashed]
+ n0000001c:port1 -> n00000061:port0 [style=dashed]
+ n0000001f [label="{{<port0> 0} | msm_ispif3\n/dev/v4l-subdev10 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000001f:port1 -> n00000022:port0 [style=dashed]
+ n0000001f:port1 -> n0000002b:port0 [style=dashed]
+ n0000001f:port1 -> n00000034:port0 [style=dashed]
+ n0000001f:port1 -> n0000003d:port0 [style=dashed]
+ n0000001f:port1 -> n00000046:port0 [style=dashed]
+ n0000001f:port1 -> n0000004f:port0 [style=dashed]
+ n0000001f:port1 -> n00000058:port0 [style=dashed]
+ n0000001f:port1 -> n00000061:port0 [style=dashed]
+ n00000022 [label="{{<port0> 0} | msm_vfe0_rdi0\n/dev/v4l-subdev11 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000022:port1 -> n00000025 [style=bold]
+ n00000025 [label="msm_vfe0_video0\n/dev/video0", shape=box, style=filled, fillcolor=yellow]
+ n0000002b [label="{{<port0> 0} | msm_vfe0_rdi1\n/dev/v4l-subdev12 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000002b:port1 -> n0000002e [style=bold]
+ n0000002e [label="msm_vfe0_video1\n/dev/video1", shape=box, style=filled, fillcolor=yellow]
+ n00000034 [label="{{<port0> 0} | msm_vfe0_rdi2\n/dev/v4l-subdev13 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000034:port1 -> n00000037 [style=bold]
+ n00000037 [label="msm_vfe0_video2\n/dev/video2", shape=box, style=filled, fillcolor=yellow]
+ n0000003d [label="{{<port0> 0} | msm_vfe0_pix\n/dev/v4l-subdev14 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000003d:port1 -> n00000040 [style=bold]
+ n00000040 [label="msm_vfe0_video3\n/dev/video3", shape=box, style=filled, fillcolor=yellow]
+ n00000046 [label="{{<port0> 0} | msm_vfe1_rdi0\n/dev/v4l-subdev15 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000046:port1 -> n00000049 [style=bold]
+ n00000049 [label="msm_vfe1_video0\n/dev/video4", shape=box, style=filled, fillcolor=yellow]
+ n0000004f [label="{{<port0> 0} | msm_vfe1_rdi1\n/dev/v4l-subdev16 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000004f:port1 -> n00000052 [style=bold]
+ n00000052 [label="msm_vfe1_video1\n/dev/video5", shape=box, style=filled, fillcolor=yellow]
+ n00000058 [label="{{<port0> 0} | msm_vfe1_rdi2\n/dev/v4l-subdev17 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000058:port1 -> n0000005b [style=bold]
+ n0000005b [label="msm_vfe1_video2\n/dev/video6", shape=box, style=filled, fillcolor=yellow]
+ n00000061 [label="{{<port0> 0} | msm_vfe1_pix\n/dev/v4l-subdev18 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000061:port1 -> n00000064 [style=bold]
+ n00000064 [label="msm_vfe1_video3\n/dev/video7", shape=box, style=filled, fillcolor=yellow]
+ n000000e2 [label="{{} | ov5645 3-0039\n/dev/v4l-subdev19 | {<port0> 0}}", shape=Mrecord, style=filled, fillcolor=green]
+ n000000e2:port0 -> n00000004:port0 [style=bold]
+ n000000e4 [label="{{} | ov5645 3-003a\n/dev/v4l-subdev20 | {<port0> 0}}", shape=Mrecord, style=filled, fillcolor=green]
+ n000000e4:port0 -> n00000007:port0 [style=bold]
+ n000000e6 [label="{{} | ov5645 3-003b\n/dev/v4l-subdev21 | {<port0> 0}}", shape=Mrecord, style=filled, fillcolor=green]
+ n000000e6:port0 -> n00000001:port0 [style=bold]
+}
diff --git a/Documentation/admin-guide/media/qcom_camss_graph.dot b/Documentation/admin-guide/media/qcom_camss_graph.dot
new file mode 100644
index 000000000000..ef7dca92fd0b
--- /dev/null
+++ b/Documentation/admin-guide/media/qcom_camss_graph.dot
@@ -0,0 +1,43 @@
+# SPDX-License-Identifier: GPL-2.0
+
+digraph board {
+ rankdir=TB
+ n00000001 [label="{{<port0> 0} | msm_csiphy0\n/dev/v4l-subdev0 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000001:port1 -> n00000007:port0 [style=dashed]
+ n00000001:port1 -> n0000000a:port0 [style=dashed]
+ n00000004 [label="{{<port0> 0} | msm_csiphy1\n/dev/v4l-subdev1 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000004:port1 -> n00000007:port0 [style=dashed]
+ n00000004:port1 -> n0000000a:port0 [style=dashed]
+ n00000007 [label="{{<port0> 0} | msm_csid0\n/dev/v4l-subdev2 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000007:port1 -> n0000000d:port0 [style=dashed]
+ n00000007:port1 -> n00000010:port0 [style=dashed]
+ n0000000a [label="{{<port0> 0} | msm_csid1\n/dev/v4l-subdev3 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000000a:port1 -> n0000000d:port0 [style=dashed]
+ n0000000a:port1 -> n00000010:port0 [style=dashed]
+ n0000000d [label="{{<port0> 0} | msm_ispif0\n/dev/v4l-subdev4 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000000d:port1 -> n00000013:port0 [style=dashed]
+ n0000000d:port1 -> n0000001c:port0 [style=dashed]
+ n0000000d:port1 -> n00000025:port0 [style=dashed]
+ n0000000d:port1 -> n0000002e:port0 [style=dashed]
+ n00000010 [label="{{<port0> 0} | msm_ispif1\n/dev/v4l-subdev5 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000010:port1 -> n00000013:port0 [style=dashed]
+ n00000010:port1 -> n0000001c:port0 [style=dashed]
+ n00000010:port1 -> n00000025:port0 [style=dashed]
+ n00000010:port1 -> n0000002e:port0 [style=dashed]
+ n00000013 [label="{{<port0> 0} | msm_vfe0_rdi0\n/dev/v4l-subdev6 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000013:port1 -> n00000016 [style=bold]
+ n00000016 [label="msm_vfe0_video0\n/dev/video0", shape=box, style=filled, fillcolor=yellow]
+ n0000001c [label="{{<port0> 0} | msm_vfe0_rdi1\n/dev/v4l-subdev7 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000001c:port1 -> n0000001f [style=bold]
+ n0000001f [label="msm_vfe0_video1\n/dev/video1", shape=box, style=filled, fillcolor=yellow]
+ n00000025 [label="{{<port0> 0} | msm_vfe0_rdi2\n/dev/v4l-subdev8 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000025:port1 -> n00000028 [style=bold]
+ n00000028 [label="msm_vfe0_video2\n/dev/video2", shape=box, style=filled, fillcolor=yellow]
+ n0000002e [label="{{<port0> 0} | msm_vfe0_pix\n/dev/v4l-subdev9 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000002e:port1 -> n00000031 [style=bold]
+ n00000031 [label="msm_vfe0_video3\n/dev/video3", shape=box, style=filled, fillcolor=yellow]
+ n00000057 [label="{{} | ov5645 1-0076\n/dev/v4l-subdev10 | {<port0> 0}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000057:port0 -> n00000001:port0 [style=bold]
+ n00000059 [label="{{} | ov5645 1-0074\n/dev/v4l-subdev11 | {<port0> 0}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000059:port0 -> n00000004:port0 [style=bold]
+}
diff --git a/Documentation/admin-guide/media/radio-cardlist.rst b/Documentation/admin-guide/media/radio-cardlist.rst
new file mode 100644
index 000000000000..a82a146bf912
--- /dev/null
+++ b/Documentation/admin-guide/media/radio-cardlist.rst
@@ -0,0 +1,44 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Radio drivers
+=============
+
+There is also support for pure AM/FM radio, and even for some FM radio
+transmitters:
+
+===================== =========================================================
+Driver Name
+===================== =========================================================
+si4713 Silicon Labs Si4713 FM Radio Transmitter
+radio-aztech Aztech/Packard Bell Radio
+radio-cadet ADS Cadet AM/FM Tuner
+radio-gemtek GemTek Radio card (or compatible)
+radio-maxiradio Guillemot MAXI Radio FM 2000 radio
+radio-miropcm20 miroSOUND PCM20 radio
+radio-aimslab AIMSlab RadioTrack (aka RadioReveal)
+radio-rtrack2 AIMSlab RadioTrack II
+saa7706h SAA7706H Car Radio DSP
+radio-sf16fmi SF16-FMI/SF16-FMP/SF16-FMD Radio
+radio-sf16fmr2 SF16-FMR2/SF16-FMD2 Radio
+radio-shark Griffin radioSHARK USB radio receiver
+shark2 Griffin radioSHARK2 USB radio receiver
+radio-si470x-common Silicon Labs Si470x FM Radio Receiver
+radio-si476x Silicon Laboratories Si476x I2C FM Radio
+radio-tea5764 TEA5764 I2C FM radio
+tef6862 TEF6862 Car Radio Enhanced Selectivity Tuner
+radio-terratec TerraTec ActiveRadio ISA Standalone
+radio-timb Enable the Timberdale radio driver
+radio-trust Trust FM radio card
+radio-typhoon Typhoon Radio (a.k.a. EcoRadio)
+radio-wl1273 Texas Instruments WL1273 I2C FM Radio
+fm_drv ISA radio devices
+fm_drv ISA radio devices
+radio-zoltrix Zoltrix Radio
+dsbr100 D-Link/GemTek USB FM radio
+radio-keene Keene FM Transmitter USB
+radio-ma901 Masterkit MA901 USB FM radio
+radio-mr800 AverMedia MR 800 USB FM radio
+radio-raremono Thanko's Raremono AM/FM/SW radio
+radio-si470x-usb Silicon Labs Si470x FM Radio Receiver support with USB
+radio-usb-si4713 Silicon Labs Si4713 FM Radio Transmitter support with USB
+===================== =========================================================
diff --git a/Documentation/admin-guide/media/rcar-fdp1.rst b/Documentation/admin-guide/media/rcar-fdp1.rst
new file mode 100644
index 000000000000..88b0edcf9046
--- /dev/null
+++ b/Documentation/admin-guide/media/rcar-fdp1.rst
@@ -0,0 +1,39 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Renesas R-Car Fine Display Processor (FDP1) Driver
+==================================================
+
+The R-Car FDP1 driver implements driver-specific controls as follows.
+
+``V4L2_CID_DEINTERLACING_MODE (menu)``
+ The video deinterlacing mode (such as Bob, Weave, ...). The R-Car FDP1
+ driver implements the following modes.
+
+.. flat-table::
+ :header-rows: 0
+ :stub-columns: 0
+ :widths: 1 4
+
+ * - ``"Progressive" (0)``
+ - The input image video stream is progressive (not interlaced). No
+ deinterlacing is performed. Apart from (optional) format and encoding
+ conversion output frames are identical to the input frames.
+ * - ``"Adaptive 2D/3D" (1)``
+ - Motion adaptive version of 2D and 3D deinterlacing. Use 3D deinterlacing
+ in the presence of fast motion and 2D deinterlacing with diagonal
+ interpolation otherwise.
+ * - ``"Fixed 2D" (2)``
+ - The current field is scaled vertically by averaging adjacent lines to
+ recover missing lines. This method is also known as blending or Line
+ Averaging (LAV).
+ * - ``"Fixed 3D" (3)``
+ - The previous and next fields are averaged to recover lines missing from
+ the current field. This method is also known as Field Averaging (FAV).
+ * - ``"Previous field" (4)``
+ - The current field is weaved with the previous field, i.e. the previous
+ field is used to fill missing lines from the current field. This method
+ is also known as weave deinterlacing.
+ * - ``"Next field" (5)``
+ - The current field is weaved with the next field, i.e. the next field is
+ used to fill missing lines from the current field. This method is also
+ known as weave deinterlacing.
diff --git a/Documentation/admin-guide/media/remote-controller.rst b/Documentation/admin-guide/media/remote-controller.rst
new file mode 100644
index 000000000000..188944b00f4f
--- /dev/null
+++ b/Documentation/admin-guide/media/remote-controller.rst
@@ -0,0 +1,76 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================================
+Infrared remote control support in video4linux drivers
+======================================================
+
+Authors: Gerd Hoffmann, Mauro Carvalho Chehab
+
+Basics
+======
+
+Most analog and digital TV boards support remote controllers. Several of
+them have a microprocessor that receives the IR carriers, convert into
+pulse/space sequences and then to scan codes, returning such codes to
+userspace ("scancode mode"). Other boards return just the pulse/space
+sequences ("raw mode").
+
+The support for remote controller in scancode mode is provided by the
+standard Linux input layer. The support for raw mode is provided via LIRC.
+
+In order to check the support and test it, it is suggested to download
+the `v4l-utils <https://git.linuxtv.org/v4l-utils.git/>`_. It provides
+two tools to handle remote controllers:
+
+- ir-keytable: provides a way to query the remote controller, list the
+ protocols it supports, enable in-kernel support for IR decoder or
+ switch the protocol and to test the reception of scan codes;
+
+- ir-ctl: provide tools to handle remote controllers that support raw mode
+ via LIRC interface.
+
+Usually, the remote controller module is auto-loaded when the TV card is
+detected. However, for a few devices, you need to manually load the
+ir-kbd-i2c module.
+
+How it works
+============
+
+The modules register the remote as keyboard within the linux input
+layer, i.e. you'll see the keys of the remote as normal key strokes
+(if CONFIG_INPUT_KEYBOARD is enabled).
+
+Using the event devices (CONFIG_INPUT_EVDEV) it is possible for
+applications to access the remote via /dev/input/event<n> devices.
+The udev/systemd will automatically create the devices. If you install
+the `v4l-utils <https://git.linuxtv.org/v4l-utils.git/>`_, it may also
+automatically load a different keytable than the default one. Please see
+`v4l-utils <https://git.linuxtv.org/v4l-utils.git/>`_ ir-keytable.1
+man page for details.
+
+The ir-keytable tool is nice for trouble shooting, i.e. to check
+whenever the input device is really present, which of the devices it
+is, check whenever pressing keys on the remote actually generates
+events and the like. You can also use any other input utility that changes
+the keymaps, like the input kbd utility.
+
+
+Using with lircd
+----------------
+
+The latest versions of the lircd daemon supports reading events from the
+linux input layer (via event device). It also supports receiving IR codes
+in lirc mode.
+
+
+Using without lircd
+-------------------
+
+Xorg recognizes several IR keycodes that have its numerical value lower
+than 247. With the advent of Wayland, the input driver got updated too,
+and should now accept all keycodes. Yet, you may want to just reassign
+the keycodes to something that your favorite media application likes.
+
+This can be done by setting
+`v4l-utils <https://git.linuxtv.org/v4l-utils.git/>`_ to load your own
+keytable in runtime. Please read ir-keytable.1 man page for details.
diff --git a/Documentation/admin-guide/media/rkisp1.dot b/Documentation/admin-guide/media/rkisp1.dot
new file mode 100644
index 000000000000..54c1953a6130
--- /dev/null
+++ b/Documentation/admin-guide/media/rkisp1.dot
@@ -0,0 +1,18 @@
+digraph board {
+ rankdir=TB
+ n00000001 [label="{{<port0> 0 | <port1> 1} | rkisp1_isp\n/dev/v4l-subdev0 | {<port2> 2 | <port3> 3}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000001:port2 -> n00000006:port0
+ n00000001:port2 -> n00000009:port0
+ n00000001:port3 -> n00000014 [style=bold]
+ n00000006 [label="{{<port0> 0} | rkisp1_resizer_mainpath\n/dev/v4l-subdev1 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000006:port1 -> n0000000c [style=bold]
+ n00000009 [label="{{<port0> 0} | rkisp1_resizer_selfpath\n/dev/v4l-subdev2 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000009:port1 -> n00000010 [style=bold]
+ n0000000c [label="rkisp1_mainpath\n/dev/video0", shape=box, style=filled, fillcolor=yellow]
+ n00000010 [label="rkisp1_selfpath\n/dev/video1", shape=box, style=filled, fillcolor=yellow]
+ n00000014 [label="rkisp1_stats\n/dev/video2", shape=box, style=filled, fillcolor=yellow]
+ n00000018 [label="rkisp1_params\n/dev/video3", shape=box, style=filled, fillcolor=yellow]
+ n00000018 -> n00000001:port1 [style=bold]
+ n0000001c [label="{{} | imx219 4-0010\n/dev/v4l-subdev3 | {<port0> 0}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000001c:port0 -> n00000001:port0
+}
diff --git a/Documentation/admin-guide/media/rkisp1.rst b/Documentation/admin-guide/media/rkisp1.rst
new file mode 100644
index 000000000000..6f14d9561fa5
--- /dev/null
+++ b/Documentation/admin-guide/media/rkisp1.rst
@@ -0,0 +1,197 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: <isonum.txt>
+
+=========================================
+Rockchip Image Signal Processor (rkisp1)
+=========================================
+
+Introduction
+============
+
+This file documents the driver for the Rockchip ISP1 that is part of RK3288
+and RK3399 SoCs. The driver is located under drivers/media/platform/rockchip/
+rkisp1 and uses the Media-Controller API.
+
+Revisions
+=========
+
+There exist multiple smaller revisions to this ISP that got introduced in
+later SoCs. Revisions can be found in the enum :c:type:`rkisp1_cif_isp_version`
+in the UAPI and the revision of the ISP inside the running SoC can be read
+in the field hw_revision of struct media_device_info as returned by
+ioctl MEDIA_IOC_DEVICE_INFO.
+
+Versions in use are:
+
+- RKISP1_V10: used at least in rk3288 and rk3399
+- RKISP1_V11: declared in the original vendor code, but not used
+- RKISP1_V12: used at least in rk3326 and px30
+- RKISP1_V13: used at least in rk1808
+
+Topology
+========
+.. _rkisp1_topology_graph:
+
+.. kernel-figure:: rkisp1.dot
+ :alt: Diagram of the default media pipeline topology
+ :align: center
+
+
+The driver has 4 video devices:
+
+- rkisp1_mainpath: capture device for retrieving images, usually in higher
+ resolution.
+- rkisp1_selfpath: capture device for retrieving images.
+- rkisp1_stats: a metadata capture device that sends statistics.
+- rkisp1_params: a metadata output device that receives parameters
+ configurations from userspace.
+
+The driver has 3 subdevices:
+
+- rkisp1_resizer_mainpath: used to resize and downsample frames for the
+ mainpath capture device.
+- rkisp1_resizer_selfpath: used to resize and downsample frames for the
+ selfpath capture device.
+- rkisp1_isp: is connected to the sensor and is responsible for all the isp
+ operations.
+
+
+rkisp1_mainpath, rkisp1_selfpath - Frames Capture Video Nodes
+-------------------------------------------------------------
+Those are the `mainpath` and `selfpath` capture devices to capture frames.
+Those entities are the DMA engines that write the frames to memory.
+The selfpath video device can capture YUV/RGB formats. Its input is YUV encoded
+stream and it is able to convert it to RGB. The selfpath is not able to
+capture bayer formats.
+The mainpath can capture both bayer and YUV formats but it is not able to
+capture RGB formats.
+Both capture videos support
+the ``V4L2_CAP_IO_MC`` :ref:`capability <device-capabilities>`.
+
+
+rkisp1_resizer_mainpath, rkisp1_resizer_selfpath - Resizers Subdevices Nodes
+----------------------------------------------------------------------------
+Those are resizer entities for the mainpath and the selfpath. Those entities
+can scale the frames up and down and also change the YUV sampling (for example
+YUV4:2:2 -> YUV4:2:0). They also have cropping capability on the sink pad.
+The resizers entities can only operate on YUV:4:2:2 format
+(MEDIA_BUS_FMT_YUYV8_2X8).
+The mainpath capture device supports capturing video in bayer formats. In that
+case the resizer of the mainpath is set to 'bypass' mode - it just forward the
+frame without operating on it.
+
+rkisp1_isp - Image Signal Processing Subdevice Node
+---------------------------------------------------
+This is the isp entity. It is connected to the sensor on sink pad 0 and
+receives the frames using the CSI-2 protocol. It is responsible of configuring
+the CSI-2 protocol. It has a cropping capability on sink pad 0 that is
+connected to the sensor and on source pad 2 connected to the resizer entities.
+Cropping on sink pad 0 defines the image region from the sensor.
+Cropping on source pad 2 defines the region for the Image Stabilizer (IS).
+
+.. _rkisp1_stats:
+
+rkisp1_stats - Statistics Video Node
+------------------------------------
+The statistics video node outputs the 3A (auto focus, auto exposure and auto
+white balance) statistics, and also histogram statistics for the frames that
+are being processed by the rkisp1 to userspace applications.
+Using these data, applications can implement algorithms and re-parameterize
+the driver through the rkisp_params node to improve image quality during a
+video stream.
+The buffer format is defined by struct :c:type:`rkisp1_stat_buffer`, and
+userspace should set
+:ref:`V4L2_META_FMT_RK_ISP1_STAT_3A <v4l2-meta-fmt-rk-isp1-stat-3a>` as the
+dataformat.
+
+.. _rkisp1_params:
+
+rkisp1_params - Parameters Video Node
+-------------------------------------
+The rkisp1_params video node receives a set of parameters from userspace
+to be applied to the hardware during a video stream, allowing userspace
+to dynamically modify values such as black level, cross talk corrections
+and others.
+
+The buffer format is defined by struct :c:type:`rkisp1_params_cfg`, and
+userspace should set
+:ref:`V4L2_META_FMT_RK_ISP1_PARAMS <v4l2-meta-fmt-rk-isp1-params>` as the
+dataformat.
+
+
+Capturing Video Frames Example
+==============================
+
+In the following example, the sensor connected to pad 0 of 'rkisp1_isp' is
+imx219.
+
+The following commands can be used to capture video from the selfpath video
+node with dimension 900x800 planar format YUV 4:2:2. It uses all cropping
+capabilities possible, (see explanation right below)
+
+.. code-block:: bash
+
+ # set the links
+ "media-ctl" "-d" "platform:rkisp1" "-r"
+ "media-ctl" "-d" "platform:rkisp1" "-l" "'imx219 4-0010':0 -> 'rkisp1_isp':0 [1]"
+ "media-ctl" "-d" "platform:rkisp1" "-l" "'rkisp1_isp':2 -> 'rkisp1_resizer_selfpath':0 [1]"
+ "media-ctl" "-d" "platform:rkisp1" "-l" "'rkisp1_isp':2 -> 'rkisp1_resizer_mainpath':0 [0]"
+
+ # set format for imx219 4-0010:0
+ "media-ctl" "-d" "platform:rkisp1" "--set-v4l2" '"imx219 4-0010":0 [fmt:SRGGB10_1X10/1640x1232]'
+
+ # set format for rkisp1_isp pads:
+ "media-ctl" "-d" "platform:rkisp1" "--set-v4l2" '"rkisp1_isp":0 [fmt:SRGGB10_1X10/1640x1232 crop: (0,0)/1600x1200]'
+ "media-ctl" "-d" "platform:rkisp1" "--set-v4l2" '"rkisp1_isp":2 [fmt:YUYV8_2X8/1600x1200 crop: (0,0)/1500x1100]'
+
+ # set format for rkisp1_resizer_selfpath pads:
+ "media-ctl" "-d" "platform:rkisp1" "--set-v4l2" '"rkisp1_resizer_selfpath":0 [fmt:YUYV8_2X8/1500x1100 crop: (300,400)/1400x1000]'
+ "media-ctl" "-d" "platform:rkisp1" "--set-v4l2" '"rkisp1_resizer_selfpath":1 [fmt:YUYV8_2X8/900x800]'
+
+ # set format for rkisp1_selfpath:
+ "v4l2-ctl" "-z" "platform:rkisp1" "-d" "rkisp1_selfpath" "-v" "width=900,height=800,"
+ "v4l2-ctl" "-z" "platform:rkisp1" "-d" "rkisp1_selfpath" "-v" "pixelformat=422P"
+
+ # start streaming:
+ v4l2-ctl "-z" "platform:rkisp1" "-d" "rkisp1_selfpath" "--stream-mmap" "--stream-count" "10"
+
+
+In the above example the sensor is configured to bayer format:
+`SRGGB10_1X10/1640x1232`. The rkisp1_isp:0 pad should be configured to the
+same mbus format and dimensions as the sensor, otherwise streaming will fail
+with 'EPIPE' error. So it is also configured to `SRGGB10_1X10/1640x1232`.
+In addition, the rkisp1_isp:0 pad is configured to cropping `(0,0)/1600x1200`.
+
+The cropping dimensions are automatically propagated to be the format of the
+isp source pad `rkisp1_isp:2`. Another cropping operation is configured on
+the isp source pad: `(0,0)/1500x1100`.
+
+The resizer's sink pad `rkisp1_resizer_selfpath` should be configured to format
+`YUYV8_2X8/1500x1100` in order to match the format on the other side of the
+link. In addition a cropping `(300,400)/1400x1000` is configured on it.
+
+The source pad of the resizer, `rkisp1_resizer_selfpath:1` is configured to
+format `YUYV8_2X8/900x800`. That means that the resizer first crop a window
+of `(300,400)/1400x100` from the received frame and then scales this window
+to dimension `900x800`.
+
+Note that the above example does not uses the stats-params control loop.
+Therefore the capture frames will not go through the 3A algorithms and
+probably won't have a good quality, and can even look dark and greenish.
+
+Configuring Quantization
+========================
+
+The driver supports limited and full range quantization on YUV formats,
+where limited is the default.
+To switch between one or the other, userspace should use the Colorspace
+Conversion API (CSC) for subdevices on source pad 2 of the
+isp (`rkisp1_isp:2`). The quantization configured on this pad is the
+quantization of the captured video frames on the mainpath and selfpath
+video nodes.
+Note that the resizer and capture entities will always report
+``V4L2_QUANTIZATION_DEFAULT`` even if the quantization is configured to full
+range on `rkisp1_isp:2`. So in order to get the configured quantization,
+application should get it from pad `rkisp1_isp:2`.
+
diff --git a/Documentation/admin-guide/media/saa7134-cardlist.rst b/Documentation/admin-guide/media/saa7134-cardlist.rst
new file mode 100644
index 000000000000..3ef8fab6bcad
--- /dev/null
+++ b/Documentation/admin-guide/media/saa7134-cardlist.rst
@@ -0,0 +1,803 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+SAA7134 cards list
+==================
+
+.. tabularcolumns:: |p{1.4cm}|p{11.1cm}|p{4.2cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 2 19 18
+ :stub-columns: 0
+
+ * - Card number
+ - Card name
+ - PCI subsystem IDs
+
+ * - 0
+ - UNKNOWN/GENERIC
+ -
+
+ * - 1
+ - Proteus Pro [philips reference design]
+ - 1131:2001, 1131:2001
+
+ * - 2
+ - LifeView FlyVIDEO3000
+ - 5168:0138, 4e42:0138
+
+ * - 3
+ - LifeView/Typhoon FlyVIDEO2000
+ - 5168:0138, 4e42:0138
+
+ * - 4
+ - EMPRESS
+ - 1131:6752
+
+ * - 5
+ - SKNet Monster TV
+ - 1131:4e85
+
+ * - 6
+ - Tevion MD 9717
+ -
+
+ * - 7
+ - KNC One TV-Station RDS / Typhoon TV Tuner RDS
+ - 1131:fe01, 1894:fe01
+
+ * - 8
+ - Terratec Cinergy 400 TV
+ - 153b:1142
+
+ * - 9
+ - Medion 5044
+ -
+
+ * - 10
+ - Kworld/KuroutoShikou SAA7130-TVPCI
+ -
+
+ * - 11
+ - Terratec Cinergy 600 TV
+ - 153b:1143
+
+ * - 12
+ - Medion 7134
+ - 16be:0003, 16be:5000
+
+ * - 13
+ - Typhoon TV+Radio 90031
+ -
+
+ * - 14
+ - ELSA EX-VISION 300TV
+ - 1048:226b
+
+ * - 15
+ - ELSA EX-VISION 500TV
+ - 1048:226a
+
+ * - 16
+ - ASUS TV-FM 7134
+ - 1043:4842, 1043:4830, 1043:4840
+
+ * - 17
+ - AOPEN VA1000 POWER
+ - 1131:7133
+
+ * - 18
+ - BMK MPEX No Tuner
+ -
+
+ * - 19
+ - Compro VideoMate TV
+ - 185b:c100
+
+ * - 20
+ - Matrox CronosPlus
+ - 102B:48d0
+
+ * - 21
+ - 10MOONS PCI TV CAPTURE CARD
+ - 1131:2001
+
+ * - 22
+ - AverMedia M156 / Medion 2819
+ - 1461:a70b
+
+ * - 23
+ - BMK MPEX Tuner
+ -
+
+ * - 24
+ - KNC One TV-Station DVR
+ - 1894:a006
+
+ * - 25
+ - ASUS TV-FM 7133
+ - 1043:4843
+
+ * - 26
+ - Pinnacle PCTV Stereo (saa7134)
+ - 11bd:002b
+
+ * - 27
+ - Manli MuchTV M-TV002
+ -
+
+ * - 28
+ - Manli MuchTV M-TV001
+ -
+
+ * - 29
+ - Nagase Sangyo TransGear 3000TV
+ - 1461:050c
+
+ * - 30
+ - Elitegroup ECS TVP3XP FM1216 Tuner Card(PAL-BG,FM)
+ - 1019:4cb4
+
+ * - 31
+ - Elitegroup ECS TVP3XP FM1236 Tuner Card (NTSC,FM)
+ - 1019:4cb5
+
+ * - 32
+ - AVACS SmartTV
+ -
+
+ * - 33
+ - AVerMedia DVD EZMaker
+ - 1461:10ff
+
+ * - 34
+ - Noval Prime TV 7133
+ -
+
+ * - 35
+ - AverMedia AverTV Studio 305
+ - 1461:2115
+
+ * - 36
+ - UPMOST PURPLE TV
+ - 12ab:0800
+
+ * - 37
+ - Items MuchTV Plus / IT-005
+ -
+
+ * - 38
+ - Terratec Cinergy 200 TV
+ - 153b:1152
+
+ * - 39
+ - LifeView FlyTV Platinum Mini
+ - 5168:0212, 4e42:0212, 5169:1502
+
+ * - 40
+ - Compro VideoMate TV PVR/FM
+ - 185b:c100
+
+ * - 41
+ - Compro VideoMate TV Gold+
+ - 185b:c100
+
+ * - 42
+ - Sabrent SBT-TVFM (saa7130)
+ -
+
+ * - 43
+ - :Zolid Xpert TV7134
+ -
+
+ * - 44
+ - Empire PCI TV-Radio LE
+ -
+
+ * - 45
+ - Avermedia AVerTV Studio 307
+ - 1461:9715
+
+ * - 46
+ - AVerMedia Cardbus TV/Radio (E500)
+ - 1461:d6ee
+
+ * - 47
+ - Terratec Cinergy 400 mobile
+ - 153b:1162
+
+ * - 48
+ - Terratec Cinergy 600 TV MK3
+ - 153b:1158
+
+ * - 49
+ - Compro VideoMate Gold+ Pal
+ - 185b:c200
+
+ * - 50
+ - Pinnacle PCTV 300i DVB-T + PAL
+ - 11bd:002d
+
+ * - 51
+ - ProVideo PV952
+ - 1540:9524
+
+ * - 52
+ - AverMedia AverTV/305
+ - 1461:2108
+
+ * - 53
+ - ASUS TV-FM 7135
+ - 1043:4845
+
+ * - 54
+ - LifeView FlyTV Platinum FM / Gold
+ - 5168:0214, 5168:5214, 1489:0214, 5168:0304
+
+ * - 55
+ - LifeView FlyDVB-T DUO / MSI TV@nywhere Duo
+ - 5168:0306, 4E42:0306
+
+ * - 56
+ - Avermedia AVerTV 307
+ - 1461:a70a
+
+ * - 57
+ - Avermedia AVerTV GO 007 FM
+ - 1461:f31f
+
+ * - 58
+ - ADS Tech Instant TV (saa7135)
+ - 1421:0350, 1421:0351, 1421:0370, 1421:1370
+
+ * - 59
+ - Kworld/Tevion V-Stream Xpert TV PVR7134
+ -
+
+ * - 60
+ - LifeView/Typhoon/Genius FlyDVB-T Duo Cardbus
+ - 5168:0502, 4e42:0502, 1489:0502
+
+ * - 61
+ - Philips TOUGH DVB-T reference design
+ - 1131:2004
+
+ * - 62
+ - Compro VideoMate TV Gold+II
+ -
+
+ * - 63
+ - Kworld Xpert TV PVR7134
+ -
+
+ * - 64
+ - FlyTV mini Asus Digimatrix
+ - 1043:0210
+
+ * - 65
+ - V-Stream Studio TV Terminator
+ -
+
+ * - 66
+ - Yuan TUN-900 (saa7135)
+ -
+
+ * - 67
+ - Beholder BeholdTV 409 FM
+ - 0000:4091
+
+ * - 68
+ - GoTView 7135 PCI
+ - 5456:7135
+
+ * - 69
+ - Philips EUROPA V3 reference design
+ - 1131:2004
+
+ * - 70
+ - Compro Videomate DVB-T300
+ - 185b:c900
+
+ * - 71
+ - Compro Videomate DVB-T200
+ - 185b:c901
+
+ * - 72
+ - RTD Embedded Technologies VFG7350
+ - 1435:7350
+
+ * - 73
+ - RTD Embedded Technologies VFG7330
+ - 1435:7330
+
+ * - 74
+ - LifeView FlyTV Platinum Mini2
+ - 14c0:1212
+
+ * - 75
+ - AVerMedia AVerTVHD MCE A180
+ - 1461:1044
+
+ * - 76
+ - SKNet MonsterTV Mobile
+ - 1131:4ee9
+
+ * - 77
+ - Pinnacle PCTV 40i/50i/110i (saa7133)
+ - 11bd:002e
+
+ * - 78
+ - ASUSTeK P7131 Dual
+ - 1043:4862
+
+ * - 79
+ - Sedna/MuchTV PC TV Cardbus TV/Radio (ITO25 Rev:2B)
+ -
+
+ * - 80
+ - ASUS Digimatrix TV
+ - 1043:0210
+
+ * - 81
+ - Philips Tiger reference design
+ - 1131:2018
+
+ * - 82
+ - MSI TV@Anywhere plus
+ - 1462:6231, 1462:8624
+
+ * - 83
+ - Terratec Cinergy 250 PCI TV
+ - 153b:1160
+
+ * - 84
+ - LifeView FlyDVB Trio
+ - 5168:0319
+
+ * - 85
+ - AverTV DVB-T 777
+ - 1461:2c05, 1461:2c05
+
+ * - 86
+ - LifeView FlyDVB-T / Genius VideoWonder DVB-T
+ - 5168:0301, 1489:0301
+
+ * - 87
+ - ADS Instant TV Duo Cardbus PTV331
+ - 0331:1421
+
+ * - 88
+ - Tevion/KWorld DVB-T 220RF
+ - 17de:7201
+
+ * - 89
+ - ELSA EX-VISION 700TV
+ - 1048:226c
+
+ * - 90
+ - Kworld ATSC110/115
+ - 17de:7350, 17de:7352
+
+ * - 91
+ - AVerMedia A169 B
+ - 1461:7360
+
+ * - 92
+ - AVerMedia A169 B1
+ - 1461:6360
+
+ * - 93
+ - Medion 7134 Bridge #2
+ - 16be:0005
+
+ * - 94
+ - LifeView FlyDVB-T Hybrid Cardbus/MSI TV @nywhere A/D NB
+ - 5168:3306, 5168:3502, 5168:3307, 4e42:3502
+
+ * - 95
+ - LifeView FlyVIDEO3000 (NTSC)
+ - 5169:0138
+
+ * - 96
+ - Medion Md8800 Quadro
+ - 16be:0007, 16be:0008, 16be:000d
+
+ * - 97
+ - LifeView FlyDVB-S /Acorp TV134DS
+ - 5168:0300, 4e42:0300
+
+ * - 98
+ - Proteus Pro 2309
+ - 0919:2003
+
+ * - 99
+ - AVerMedia TV Hybrid A16AR
+ - 1461:2c00
+
+ * - 100
+ - Asus Europa2 OEM
+ - 1043:4860
+
+ * - 101
+ - Pinnacle PCTV 310i
+ - 11bd:002f
+
+ * - 102
+ - Avermedia AVerTV Studio 507
+ - 1461:9715
+
+ * - 103
+ - Compro Videomate DVB-T200A
+ -
+
+ * - 104
+ - Hauppauge WinTV-HVR1110 DVB-T/Hybrid
+ - 0070:6700, 0070:6701, 0070:6702, 0070:6703, 0070:6704, 0070:6705
+
+ * - 105
+ - Terratec Cinergy HT PCMCIA
+ - 153b:1172
+
+ * - 106
+ - Encore ENLTV
+ - 1131:2342, 1131:2341, 3016:2344
+
+ * - 107
+ - Encore ENLTV-FM
+ - 1131:230f
+
+ * - 108
+ - Terratec Cinergy HT PCI
+ - 153b:1175
+
+ * - 109
+ - Philips Tiger - S Reference design
+ -
+
+ * - 110
+ - Avermedia M102
+ - 1461:f31e
+
+ * - 111
+ - ASUS P7131 4871
+ - 1043:4871
+
+ * - 112
+ - ASUSTeK P7131 Hybrid
+ - 1043:4876
+
+ * - 113
+ - Elitegroup ECS TVP3XP FM1246 Tuner Card (PAL,FM)
+ - 1019:4cb6
+
+ * - 114
+ - KWorld DVB-T 210
+ - 17de:7250
+
+ * - 115
+ - Sabrent PCMCIA TV-PCB05
+ - 0919:2003
+
+ * - 116
+ - 10MOONS TM300 TV Card
+ - 1131:2304
+
+ * - 117
+ - Avermedia Super 007
+ - 1461:f01d
+
+ * - 118
+ - Beholder BeholdTV 401
+ - 0000:4016
+
+ * - 119
+ - Beholder BeholdTV 403
+ - 0000:4036
+
+ * - 120
+ - Beholder BeholdTV 403 FM
+ - 0000:4037
+
+ * - 121
+ - Beholder BeholdTV 405
+ - 0000:4050
+
+ * - 122
+ - Beholder BeholdTV 405 FM
+ - 0000:4051
+
+ * - 123
+ - Beholder BeholdTV 407
+ - 0000:4070
+
+ * - 124
+ - Beholder BeholdTV 407 FM
+ - 0000:4071
+
+ * - 125
+ - Beholder BeholdTV 409
+ - 0000:4090
+
+ * - 126
+ - Beholder BeholdTV 505 FM
+ - 5ace:5050
+
+ * - 127
+ - Beholder BeholdTV 507 FM / BeholdTV 509 FM
+ - 5ace:5070, 5ace:5090
+
+ * - 128
+ - Beholder BeholdTV Columbus TV/FM
+ - 0000:5201
+
+ * - 129
+ - Beholder BeholdTV 607 FM
+ - 5ace:6070
+
+ * - 130
+ - Beholder BeholdTV M6
+ - 5ace:6190
+
+ * - 131
+ - Twinhan Hybrid DTV-DVB 3056 PCI
+ - 1822:0022
+
+ * - 132
+ - Genius TVGO AM11MCE
+ -
+
+ * - 133
+ - NXP Snake DVB-S reference design
+ -
+
+ * - 134
+ - Medion/Creatix CTX953 Hybrid
+ - 16be:0010
+
+ * - 135
+ - MSI TV@nywhere A/D v1.1
+ - 1462:8625
+
+ * - 136
+ - AVerMedia Cardbus TV/Radio (E506R)
+ - 1461:f436
+
+ * - 137
+ - AVerMedia Hybrid TV/Radio (A16D)
+ - 1461:f936
+
+ * - 138
+ - Avermedia M115
+ - 1461:a836
+
+ * - 139
+ - Compro VideoMate T750
+ - 185b:c900
+
+ * - 140
+ - Avermedia DVB-S Pro A700
+ - 1461:a7a1
+
+ * - 141
+ - Avermedia DVB-S Hybrid+FM A700
+ - 1461:a7a2
+
+ * - 142
+ - Beholder BeholdTV H6
+ - 5ace:6290
+
+ * - 143
+ - Beholder BeholdTV M63
+ - 5ace:6191
+
+ * - 144
+ - Beholder BeholdTV M6 Extra
+ - 5ace:6193
+
+ * - 145
+ - AVerMedia MiniPCI DVB-T Hybrid M103
+ - 1461:f636, 1461:f736
+
+ * - 146
+ - ASUSTeK P7131 Analog
+ -
+
+ * - 147
+ - Asus Tiger 3in1
+ - 1043:4878
+
+ * - 148
+ - Encore ENLTV-FM v5.3
+ - 1a7f:2008
+
+ * - 149
+ - Avermedia PCI pure analog (M135A)
+ - 1461:f11d
+
+ * - 150
+ - Zogis Real Angel 220
+ -
+
+ * - 151
+ - ADS Tech Instant HDTV
+ - 1421:0380
+
+ * - 152
+ - Asus Tiger Rev:1.00
+ - 1043:4857
+
+ * - 153
+ - Kworld Plus TV Analog Lite PCI
+ - 17de:7128
+
+ * - 154
+ - Avermedia AVerTV GO 007 FM Plus
+ - 1461:f31d
+
+ * - 155
+ - Hauppauge WinTV-HVR1150 ATSC/QAM-Hybrid
+ - 0070:6706, 0070:6708
+
+ * - 156
+ - Hauppauge WinTV-HVR1120 DVB-T/Hybrid
+ - 0070:6707, 0070:6709, 0070:670a
+
+ * - 157
+ - Avermedia AVerTV Studio 507UA
+ - 1461:a11b
+
+ * - 158
+ - AVerMedia Cardbus TV/Radio (E501R)
+ - 1461:b7e9
+
+ * - 159
+ - Beholder BeholdTV 505 RDS
+ - 0000:505B
+
+ * - 160
+ - Beholder BeholdTV 507 RDS
+ - 0000:5071
+
+ * - 161
+ - Beholder BeholdTV 507 RDS
+ - 0000:507B
+
+ * - 162
+ - Beholder BeholdTV 607 FM
+ - 5ace:6071
+
+ * - 163
+ - Beholder BeholdTV 609 FM
+ - 5ace:6090
+
+ * - 164
+ - Beholder BeholdTV 609 FM
+ - 5ace:6091
+
+ * - 165
+ - Beholder BeholdTV 607 RDS
+ - 5ace:6072
+
+ * - 166
+ - Beholder BeholdTV 607 RDS
+ - 5ace:6073
+
+ * - 167
+ - Beholder BeholdTV 609 RDS
+ - 5ace:6092
+
+ * - 168
+ - Beholder BeholdTV 609 RDS
+ - 5ace:6093
+
+ * - 169
+ - Compro VideoMate S350/S300
+ - 185b:c900
+
+ * - 170
+ - AverMedia AverTV Studio 505
+ - 1461:a115
+
+ * - 171
+ - Beholder BeholdTV X7
+ - 5ace:7595
+
+ * - 172
+ - RoverMedia TV Link Pro FM
+ - 19d1:0138
+
+ * - 173
+ - Zolid Hybrid TV Tuner PCI
+ - 1131:2004
+
+ * - 174
+ - Asus Europa Hybrid OEM
+ - 1043:4847
+
+ * - 175
+ - Leadtek Winfast DTV1000S
+ - 107d:6655
+
+ * - 176
+ - Beholder BeholdTV 505 RDS
+ - 0000:5051
+
+ * - 177
+ - Hawell HW-404M7
+ -
+
+ * - 178
+ - Beholder BeholdTV H7
+ - 5ace:7190
+
+ * - 179
+ - Beholder BeholdTV A7
+ - 5ace:7090
+
+ * - 180
+ - Avermedia PCI M733A
+ - 1461:4155, 1461:4255
+
+ * - 181
+ - TechoTrend TT-budget T-3000
+ - 13c2:2804
+
+ * - 182
+ - Kworld PCI SBTVD/ISDB-T Full-Seg Hybrid
+ - 17de:b136
+
+ * - 183
+ - Compro VideoMate Vista M1F
+ - 185b:c900
+
+ * - 184
+ - Encore ENLTV-FM 3
+ - 1a7f:2108
+
+ * - 185
+ - MagicPro ProHDTV Pro2 DMB-TH/Hybrid
+ - 17de:d136
+
+ * - 186
+ - Beholder BeholdTV 501
+ - 5ace:5010
+
+ * - 187
+ - Beholder BeholdTV 503 FM
+ - 5ace:5030
+
+ * - 188
+ - Sensoray 811/911
+ - 6000:0811, 6000:0911
+
+ * - 189
+ - Kworld PC150-U
+ - 17de:a134
+
+ * - 190
+ - Asus My Cinema PS3-100
+ - 1043:48cd
+
+ * - 191
+ - Hawell HW-9004V1
+ -
+
+ * - 192
+ - AverMedia AverTV Satellite Hybrid+FM A706
+ - 1461:2055
+
+ * - 193
+ - WIS Voyager or compatible
+ - 1905:7007
+
+ * - 194
+ - AverMedia AverTV/505
+ - 1461:a10a
+
+ * - 195
+ - Leadtek Winfast TV2100 FM
+ - 107d:6f3a
+
+ * - 196
+ - SnaZio* TVPVR PRO
+ - 1779:13cf
diff --git a/Documentation/admin-guide/media/saa7134.rst b/Documentation/admin-guide/media/saa7134.rst
new file mode 100644
index 000000000000..51eae7eb5ab7
--- /dev/null
+++ b/Documentation/admin-guide/media/saa7134.rst
@@ -0,0 +1,89 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+The saa7134 driver
+==================
+
+Author Gerd Hoffmann
+
+
+This is a v4l2/oss device driver for saa7130/33/34/35 based capture / TV
+boards.
+
+
+Status
+------
+
+Almost everything is working. video, sound, tuner, radio, mpeg ts, ...
+
+As with bttv, card-specific tweaks are needed. Check CARDLIST for a
+list of known TV cards and saa7134-cards.c for the drivers card
+configuration info.
+
+
+Build
+-----
+
+Once you pick up a Kernel source, you should configure, build,
+install and boot the new kernel. You'll need at least
+these config options::
+
+ ./scripts/config -e PCI
+ ./scripts/config -e INPUT
+ ./scripts/config -m I2C
+ ./scripts/config -m MEDIA_SUPPORT
+ ./scripts/config -e MEDIA_PCI_SUPPORT
+ ./scripts/config -e MEDIA_ANALOG_TV_SUPPORT
+ ./scripts/config -e MEDIA_DIGITAL_TV_SUPPORT
+ ./scripts/config -e MEDIA_RADIO_SUPPORT
+ ./scripts/config -e RC_CORE
+ ./scripts/config -e MEDIA_SUBDRV_AUTOSELECT
+ ./scripts/config -m VIDEO_SAA7134
+ ./scripts/config -e SAA7134_ALSA
+ ./scripts/config -e VIDEO_SAA7134_RC
+ ./scripts/config -e VIDEO_SAA7134_DVB
+ ./scripts/config -e VIDEO_SAA7134_GO7007
+
+To build and install, you should run::
+
+ make && make modules_install && make install
+
+Once the new Kernel is booted, saa7134 driver should be loaded automatically.
+
+Depending on the card you might have to pass ``card=<nr>`` as insmod option.
+If so, please check Documentation/admin-guide/media/saa7134-cardlist.rst
+for valid choices.
+
+Once you have your card type number, you can pass a modules configuration
+via a file (usually, it is either ``/etc/modules.conf`` or some file at
+``/etc/modules-load.d/``, but the actual place depends on your
+distribution), with this content::
+
+ options saa7134 card=13 # Assuming that your card type is #13
+
+
+Changes / Fixes
+---------------
+
+Please mail to linux-media AT vger.kernel.org unified diffs against
+the linux media git tree:
+
+ https://git.linuxtv.org/media_tree.git/
+
+This is done by committing a patch at a clone of the git tree and
+submitting the patch using ``git send-email``. Don't forget to
+describe at the lots what it changes / which problem it fixes / whatever
+it is good for ...
+
+
+Known Problems
+--------------
+
+* The tuner for the flyvideos isn't detected automatically and the
+ default might not work for you depending on which version you have.
+ There is a ``tuner=`` insmod option to override the driver's default.
+
+Credits
+-------
+
+andrew.stevens@philips.com + werner.leeb@philips.com for providing
+saa7134 hardware specs and sample board.
diff --git a/Documentation/admin-guide/media/saa7164-cardlist.rst b/Documentation/admin-guide/media/saa7164-cardlist.rst
new file mode 100644
index 000000000000..7949c09aa900
--- /dev/null
+++ b/Documentation/admin-guide/media/saa7164-cardlist.rst
@@ -0,0 +1,71 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+SAA7164 cards list
+==================
+
+.. tabularcolumns:: |p{1.4cm}|p{11.1cm}|p{4.2cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 2 19 18
+ :stub-columns: 0
+
+ * - Card number
+ - Card name
+ - PCI subsystem IDs
+
+ * - 0
+ - Unknown
+ -
+
+ * - 1
+ - Generic Rev2
+ -
+
+ * - 2
+ - Generic Rev3
+ -
+
+ * - 3
+ - Hauppauge WinTV-HVR2250
+ - 0070:8880, 0070:8810
+
+ * - 4
+ - Hauppauge WinTV-HVR2200
+ - 0070:8980
+
+ * - 5
+ - Hauppauge WinTV-HVR2200
+ - 0070:8900
+
+ * - 6
+ - Hauppauge WinTV-HVR2200
+ - 0070:8901
+
+ * - 7
+ - Hauppauge WinTV-HVR2250
+ - 0070:8891, 0070:8851
+
+ * - 8
+ - Hauppauge WinTV-HVR2250
+ - 0070:88A1
+
+ * - 9
+ - Hauppauge WinTV-HVR2200
+ - 0070:8940
+
+ * - 10
+ - Hauppauge WinTV-HVR2200
+ - 0070:8953
+
+ * - 11
+ - Hauppauge WinTV-HVR2255(proto)
+ - 0070:f111
+
+ * - 12
+ - Hauppauge WinTV-HVR2255
+ - 0070:f111
+
+ * - 13
+ - Hauppauge WinTV-HVR2205
+ - 0070:f123, 0070:f120
diff --git a/Documentation/admin-guide/media/si470x.rst b/Documentation/admin-guide/media/si470x.rst
new file mode 100644
index 000000000000..d53bf5f95200
--- /dev/null
+++ b/Documentation/admin-guide/media/si470x.rst
@@ -0,0 +1,167 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: <isonum.txt>
+
+The Silicon Labs Si470x FM Radio Receivers driver
+=================================================
+
+Copyright |copy| 2009 Tobias Lorenz <tobias.lorenz@gmx.net>
+
+
+Information from Silicon Labs
+-----------------------------
+
+Silicon Laboratories is the manufacturer of the radio ICs, that nowadays are the
+most often used radio receivers in cell phones. Usually they are connected with
+I2C. But SiLabs also provides a reference design, which integrates this IC,
+together with a small microcontroller C8051F321, to form a USB radio.
+Part of this reference design is also a radio application in binary and source
+code. The software also contains an automatic firmware upgrade to the most
+current version. Information on these can be downloaded here:
+http://www.silabs.com/usbradio
+
+
+Supported ICs
+-------------
+
+The following ICs have a very similar register set, so that they are or will be
+supported somewhen by the driver:
+
+- Si4700: FM radio receiver
+- Si4701: FM radio receiver, RDS Support
+- Si4702: FM radio receiver
+- Si4703: FM radio receiver, RDS Support
+- Si4704: FM radio receiver, no external antenna required
+- Si4705: FM radio receiver, no external antenna required, RDS support, Dig I/O
+- Si4706: Enhanced FM RDS/TMC radio receiver, no external antenna required, RDS
+ Support
+- Si4707: Dedicated weather band radio receiver with SAME decoder, RDS Support
+- Si4708: Smallest FM receivers
+- Si4709: Smallest FM receivers, RDS Support
+
+More information on these can be downloaded here:
+http://www.silabs.com/products/mcu/Pages/USBFMRadioRD.aspx
+
+
+Supported USB devices
+---------------------
+
+Currently the following USB radios (vendor:product) with the Silicon Labs si470x
+chips are known to work:
+
+- 10c4:818a: Silicon Labs USB FM Radio Reference Design
+- 06e1:a155: ADS/Tech FM Radio Receiver (formerly Instant FM Music) (RDX-155-EF)
+- 1b80:d700: KWorld USB FM Radio SnapMusic Mobile 700 (FM700)
+- 10c5:819a: Sanei Electric, Inc. FM USB Radio (sold as DealExtreme.com PCear)
+
+
+Software
+--------
+
+Testing is usually done with most application under Debian/testing:
+
+- fmtools - Utility for managing FM tuner cards
+- gnomeradio - FM-radio tuner for the GNOME desktop
+- gradio - GTK FM radio tuner
+- kradio - Comfortable Radio Application for KDE
+- radio - ncurses-based radio application
+- mplayer - The Ultimate Movie Player For Linux
+- v4l2-ctl - Collection of command line video4linux utilities
+
+For example, you can use:
+
+.. code-block:: none
+
+ v4l2-ctl -d /dev/radio0 --set-ctrl=volume=10,mute=0 --set-freq=95.21 --all
+
+There is also a library libv4l, which can be used. It's going to have a function
+for frequency seeking, either by using hardware functionality as in radio-si470x
+or by implementing a function as we currently have in every of the mentioned
+programs. Somewhen the radio programs should make use of libv4l.
+
+For processing RDS information, there is a project ongoing at:
+http://rdsd.berlios.de/
+
+There is currently no project for making TMC sentences human readable.
+
+
+Audio Listing
+-------------
+
+USB Audio is provided by the ALSA snd_usb_audio module. It is recommended to
+also select SND_USB_AUDIO, as this is required to get sound from the radio. For
+listing you have to redirect the sound, for example using one of the following
+commands. Please adjust the audio devices to your needs (/dev/dsp* and hw:x,x).
+
+If you just want to test audio (very poor quality):
+
+.. code-block:: none
+
+ cat /dev/dsp1 > /dev/dsp
+
+If you use sox + OSS try:
+
+.. code-block:: none
+
+ sox -2 --endian little -r 96000 -t oss /dev/dsp1 -t oss /dev/dsp
+
+or using sox + alsa:
+
+.. code-block:: none
+
+ sox --endian little -c 2 -S -r 96000 -t alsa hw:1 -t alsa -r 96000 hw:0
+
+If you use arts try:
+
+.. code-block:: none
+
+ arecord -D hw:1,0 -r96000 -c2 -f S16_LE | artsdsp aplay -B -
+
+If you use mplayer try:
+
+.. code-block:: none
+
+ mplayer -radio adevice=hw=1.0:arate=96000 \
+ -rawaudio rate=96000 \
+ radio://<frequency>/capture
+
+Module Parameters
+-----------------
+
+After loading the module, you still have access to some of them in the sysfs
+mount under /sys/module/radio_si470x/parameters. The contents of read-only files
+(0444) are not updated, even if space, band and de are changed using private
+video controls. The others are runtime changeable.
+
+
+Errors
+------
+
+Increase tune_timeout, if you often get -EIO errors.
+
+When timed out or band limit is reached, hw_freq_seek returns -EAGAIN.
+
+If you get any errors from snd_usb_audio, please report them to the ALSA people.
+
+
+Open Issues
+-----------
+
+V4L minor device allocation and parameter setting is not perfect. A solution is
+currently under discussion.
+
+There is an USB interface for downloading/uploading new firmware images. Support
+for it can be implemented using the request_firmware interface.
+
+There is a RDS interrupt mode. The driver is already using the same interface
+for polling RDS information, but is currently not using the interrupt mode.
+
+There is a LED interface, which can be used to override the LED control
+programmed in the firmware. This can be made available using the LED support
+functions in the kernel.
+
+
+Other useful information and links
+----------------------------------
+
+http://www.silabs.com/usbradio
diff --git a/Documentation/admin-guide/media/si4713.rst b/Documentation/admin-guide/media/si4713.rst
new file mode 100644
index 000000000000..be8e6b49b7b4
--- /dev/null
+++ b/Documentation/admin-guide/media/si4713.rst
@@ -0,0 +1,192 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: <isonum.txt>
+
+The Silicon Labs Si4713 FM Radio Transmitter Driver
+===================================================
+
+Copyright |copy| 2009 Nokia Corporation
+
+Contact: Eduardo Valentin <eduardo.valentin@nokia.com>
+
+
+Information about the Device
+----------------------------
+
+This chip is a Silicon Labs product. It is a I2C device, currently on 0x63 address.
+Basically, it has transmission and signal noise level measurement features.
+
+The Si4713 integrates transmit functions for FM broadcast stereo transmission.
+The chip also allows integrated receive power scanning to identify low signal
+power FM channels.
+
+The chip is programmed using commands and responses. There are also several
+properties which can change the behavior of this chip.
+
+Users must comply with local regulations on radio frequency (RF) transmission.
+
+Device driver description
+-------------------------
+
+There are two modules to handle this device. One is a I2C device driver
+and the other is a platform driver.
+
+The I2C device driver exports a v4l2-subdev interface to the kernel.
+All properties can also be accessed by v4l2 extended controls interface, by
+using the v4l2-subdev calls (g_ext_ctrls, s_ext_ctrls).
+
+The platform device driver exports a v4l2 radio device interface to user land.
+So, it uses the I2C device driver as a sub device in order to send the user
+commands to the actual device. Basically it is a wrapper to the I2C device driver.
+
+Applications can use v4l2 radio API to specify frequency of operation, mute state,
+etc. But mostly of its properties will be present in the extended controls.
+
+When the v4l2 mute property is set to 1 (true), the driver will turn the chip off.
+
+Properties description
+----------------------
+
+The properties can be accessed using v4l2 extended controls.
+Here is an output from v4l2-ctl util:
+
+.. code-block:: none
+
+ / # v4l2-ctl -d /dev/radio0 --all -L
+ Driver Info:
+ Driver name : radio-si4713
+ Card type : Silicon Labs Si4713 Modulator
+ Bus info :
+ Driver version: 0
+ Capabilities : 0x00080800
+ RDS Output
+ Modulator
+ Audio output: 0 (FM Modulator Audio Out)
+ Frequency: 1408000 (88.000000 MHz)
+ Video Standard = 0x00000000
+ Modulator:
+ Name : FM Modulator
+ Capabilities : 62.5 Hz stereo rds
+ Frequency range : 76.0 MHz - 108.0 MHz
+ Subchannel modulation: stereo+rds
+
+ User Controls
+
+ mute (bool) : default=1 value=0
+
+ FM Radio Modulator Controls
+
+ rds_signal_deviation (int) : min=0 max=90000 step=10 default=200 value=200 flags=slider
+ rds_program_id (int) : min=0 max=65535 step=1 default=0 value=0
+ rds_program_type (int) : min=0 max=31 step=1 default=0 value=0
+ rds_ps_name (str) : min=0 max=96 step=8 value='si4713 '
+ rds_radio_text (str) : min=0 max=384 step=32 value=''
+ audio_limiter_feature_enabled (bool) : default=1 value=1
+ audio_limiter_release_time (int) : min=250 max=102390 step=50 default=5010 value=5010 flags=slider
+ audio_limiter_deviation (int) : min=0 max=90000 step=10 default=66250 value=66250 flags=slider
+ audio_compression_feature_enabl (bool) : default=1 value=1
+ audio_compression_gain (int) : min=0 max=20 step=1 default=15 value=15 flags=slider
+ audio_compression_threshold (int) : min=-40 max=0 step=1 default=-40 value=-40 flags=slider
+ audio_compression_attack_time (int) : min=0 max=5000 step=500 default=0 value=0 flags=slider
+ audio_compression_release_time (int) : min=100000 max=1000000 step=100000 default=1000000 value=1000000 flags=slider
+ pilot_tone_feature_enabled (bool) : default=1 value=1
+ pilot_tone_deviation (int) : min=0 max=90000 step=10 default=6750 value=6750 flags=slider
+ pilot_tone_frequency (int) : min=0 max=19000 step=1 default=19000 value=19000 flags=slider
+ pre_emphasis_settings (menu) : min=0 max=2 default=1 value=1
+ tune_power_level (int) : min=0 max=120 step=1 default=88 value=88 flags=slider
+ tune_antenna_capacitor (int) : min=0 max=191 step=1 default=0 value=110 flags=slider
+
+Here is a summary of them:
+
+* Pilot is an audible tone sent by the device.
+
+- pilot_frequency - Configures the frequency of the stereo pilot tone.
+- pilot_deviation - Configures pilot tone frequency deviation level.
+- pilot_enabled - Enables or disables the pilot tone feature.
+
+* The si4713 device is capable of applying audio compression to the
+ transmitted signal.
+
+- acomp_enabled - Enables or disables the audio dynamic range control feature.
+- acomp_gain - Sets the gain for audio dynamic range control.
+- acomp_threshold - Sets the threshold level for audio dynamic range control.
+- acomp_attack_time - Sets the attack time for audio dynamic range control.
+- acomp_release_time - Sets the release time for audio dynamic range control.
+
+* Limiter setups audio deviation limiter feature. Once a over deviation occurs,
+ it is possible to adjust the front-end gain of the audio input and always
+ prevent over deviation.
+
+- limiter_enabled - Enables or disables the limiter feature.
+- limiter_deviation - Configures audio frequency deviation level.
+- limiter_release_time - Sets the limiter release time.
+
+* Tuning power
+
+- power_level - Sets the output power level for signal transmission.
+ antenna_capacitor - This selects the value of antenna tuning capacitor
+ manually or automatically if set to zero.
+
+* RDS related
+
+- rds_ps_name - Sets the RDS ps name field for transmission.
+- rds_radio_text - Sets the RDS radio text for transmission.
+- rds_pi - Sets the RDS PI field for transmission.
+- rds_pty - Sets the RDS PTY field for transmission.
+
+* Region related
+
+- preemphasis - sets the preemphasis to be applied for transmission.
+
+RNL
+---
+
+This device also has an interface to measure received noise level. To do that, you should
+ioctl the device node. Here is an code of example:
+
+.. code-block:: none
+
+ int main (int argc, char *argv[])
+ {
+ struct si4713_rnl rnl;
+ int fd = open("/dev/radio0", O_RDWR);
+ int rval;
+
+ if (argc < 2)
+ return -EINVAL;
+
+ if (fd < 0)
+ return fd;
+
+ sscanf(argv[1], "%d", &rnl.frequency);
+
+ rval = ioctl(fd, SI4713_IOC_MEASURE_RNL, &rnl);
+ if (rval < 0)
+ return rval;
+
+ printf("received noise level: %d\n", rnl.rnl);
+
+ close(fd);
+ }
+
+The struct si4713_rnl and SI4713_IOC_MEASURE_RNL are defined under
+include/linux/platform_data/media/si4713.h.
+
+Stereo/Mono and RDS subchannels
+-------------------------------
+
+The device can also be configured using the available sub channels for
+transmission. To do that use S/G_MODULATOR ioctl and configure txsubchans properly.
+Refer to the V4L2 API specification for proper use of this ioctl.
+
+Testing
+-------
+Testing is usually done with v4l2-ctl utility for managing FM tuner cards.
+The tool can be found in v4l-dvb repository under v4l2-apps/util directory.
+
+Example for setting rds ps name:
+
+.. code-block:: none
+
+ # v4l2-ctl -d /dev/radio0 --set-ctrl=rds_ps_name="Dummy"
+
diff --git a/Documentation/admin-guide/media/si476x.rst b/Documentation/admin-guide/media/si476x.rst
new file mode 100644
index 000000000000..c8882ee9f208
--- /dev/null
+++ b/Documentation/admin-guide/media/si476x.rst
@@ -0,0 +1,160 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: <isonum.txt>
+
+
+The SI476x Driver
+=================
+
+Copyright |copy| 2013 Andrey Smirnov <andrew.smirnov@gmail.com>
+
+TODO for the driver
+-------------------
+
+- According to the SiLabs' datasheet it is possible to update the
+ firmware of the radio chip in the run-time, thus bringing it to the
+ most recent version. Unfortunately I couldn't find any mentioning of
+ the said firmware update for the old chips that I tested the driver
+ against, so for chips like that the driver only exposes the old
+ functionality.
+
+
+Parameters exposed over debugfs
+-------------------------------
+SI476x allow user to get multiple characteristics that can be very
+useful for EoL testing/RF performance estimation, parameters that have
+very little to do with V4L2 subsystem. Such parameters are exposed via
+debugfs and can be accessed via regular file I/O operations.
+
+The drivers exposes following files:
+
+* /sys/kernel/debug/<device-name>/acf
+ This file contains ACF(Automatically Controlled Features) status
+ information. The contents of the file is binary data of the
+ following layout:
+
+ .. tabularcolumns:: |p{7ex}|p{12ex}|L|
+
+ ============= ============== ====================================
+ Offset Name Description
+ ============= ============== ====================================
+ 0x00 blend_int Flag, set when stereo separation has
+ crossed below the blend threshold
+ 0x01 hblend_int Flag, set when HiBlend cutoff
+ frequency is lower than threshold
+ 0x02 hicut_int Flag, set when HiCut cutoff
+ frequency is lower than threshold
+ 0x03 chbw_int Flag, set when channel filter
+ bandwidth is less than threshold
+ 0x04 softmute_int Flag indicating that softmute
+ attenuation has increased above
+ softmute threshold
+ 0x05 smute 0 - Audio is not soft muted
+ 1 - Audio is soft muted
+ 0x06 smattn Soft mute attenuation level in dB
+ 0x07 chbw Channel filter bandwidth in kHz
+ 0x08 hicut HiCut cutoff frequency in units of
+ 100Hz
+ 0x09 hiblend HiBlend cutoff frequency in units
+ of 100 Hz
+ 0x10 pilot 0 - Stereo pilot is not present
+ 1 - Stereo pilot is present
+ 0x11 stblend Stereo blend in %
+ ============= ============== ====================================
+
+
+* /sys/kernel/debug/<device-name>/rds_blckcnt
+ This file contains statistics about RDS receptions. It's binary data
+ has the following layout:
+
+ .. tabularcolumns:: |p{7ex}|p{12ex}|L|
+
+ ============= ============== ====================================
+ Offset Name Description
+ ============= ============== ====================================
+ 0x00 expected Number of expected RDS blocks
+ 0x02 received Number of received RDS blocks
+ 0x04 uncorrectable Number of uncorrectable RDS blocks
+ ============= ============== ====================================
+
+* /sys/kernel/debug/<device-name>/agc
+ This file contains information about parameters pertaining to
+ AGC(Automatic Gain Control)
+
+ The layout is:
+
+ .. tabularcolumns:: |p{7ex}|p{12ex}|L|
+
+ ============= ============== ====================================
+ Offset Name Description
+ ============= ============== ====================================
+ 0x00 mxhi 0 - FM Mixer PD high threshold is
+ not tripped
+ 1 - FM Mixer PD high threshold is
+ tripped
+ 0x01 mxlo ditto for FM Mixer PD low
+ 0x02 lnahi ditto for FM LNA PD high
+ 0x03 lnalo ditto for FM LNA PD low
+ 0x04 fmagc1 FMAGC1 attenuator resistance
+ (see datasheet for more detail)
+ 0x05 fmagc2 ditto for FMAGC2
+ 0x06 pgagain PGA gain in dB
+ 0x07 fmwblang FM/WB LNA Gain in dB
+ ============= ============== ====================================
+
+* /sys/kernel/debug/<device-name>/rsq
+ This file contains information about parameters pertaining to
+ RSQ(Received Signal Quality)
+
+ The layout is:
+
+ .. tabularcolumns:: |p{7ex}|p{12ex}|p{60ex}|
+
+ ============= ============== ====================================
+ Offset Name Description
+ ============= ============== ====================================
+ 0x00 multhint 0 - multipath value has not crossed
+ the Multipath high threshold
+ 1 - multipath value has crossed
+ the Multipath high threshold
+ 0x01 multlint ditto for Multipath low threshold
+ 0x02 snrhint 0 - received signal's SNR has not
+ crossed high threshold
+ 1 - received signal's SNR has
+ crossed high threshold
+ 0x03 snrlint ditto for low threshold
+ 0x04 rssihint ditto for RSSI high threshold
+ 0x05 rssilint ditto for RSSI low threshold
+ 0x06 bltf Flag indicating if seek command
+ reached/wrapped seek band limit
+ 0x07 snr_ready Indicates that SNR metrics is ready
+ 0x08 rssiready ditto for RSSI metrics
+ 0x09 injside 0 - Low-side injection is being used
+ 1 - High-side injection is used
+ 0x10 afcrl Flag indicating if AFC rails
+ 0x11 valid Flag indicating if channel is valid
+ 0x12 readfreq Current tuned frequency
+ 0x14 freqoff Signed frequency offset in units of
+ 2ppm
+ 0x15 rssi Signed value of RSSI in dBuV
+ 0x16 snr Signed RF SNR in dB
+ 0x17 issi Signed Image Strength Signal
+ indicator
+ 0x18 lassi Signed Low side adjacent Channel
+ Strength indicator
+ 0x19 hassi ditto for High side
+ 0x20 mult Multipath indicator
+ 0x21 dev Frequency deviation
+ 0x24 assi Adjacent channel SSI
+ 0x25 usn Ultrasonic noise indicator
+ 0x26 pilotdev Pilot deviation in units of 100 Hz
+ 0x27 rdsdev ditto for RDS
+ 0x28 assidev ditto for ASSI
+ 0x29 strongdev Frequency deviation
+ 0x30 rdspi RDS PI code
+ ============= ============== ====================================
+
+* /sys/kernel/debug/<device-name>/rsq_primary
+ This file contains information about parameters pertaining to
+ RSQ(Received Signal Quality) for primary tuner only. Layout is as
+ the one above.
diff --git a/Documentation/admin-guide/media/siano-cardlist.rst b/Documentation/admin-guide/media/siano-cardlist.rst
new file mode 100644
index 000000000000..bb731a953878
--- /dev/null
+++ b/Documentation/admin-guide/media/siano-cardlist.rst
@@ -0,0 +1,56 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Siano cards list
+================
+
+.. tabularcolumns:: p{13.3cm}|p{4.2cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 17 16
+ :stub-columns: 0
+
+ * - Card name
+ - USB IDs
+ * - Hauppauge Catamount
+ - 2040:1700
+ * - Hauppauge Okemo-A
+ - 2040:1800
+ * - Hauppauge Okemo-B
+ - 2040:1801
+ * - Hauppauge WinTV MiniCard
+ - 2040:2000, 2040:200a, 2040:2010, 2040:2011, 2040:2019
+ * - Hauppauge WinTV MiniCard Rev 2
+ - 2040:2009
+ * - Hauppauge WinTV MiniStick
+ - 2040:5500, 2040:5510, 2040:5520, 2040:5530, 2040:5580, 2040:5590, 2040:b900, 2040:b910, 2040:b980, 2040:b990, 2040:c000, 2040:c010, 2040:c080, 2040:c090, 2040:c0a0, 2040:f5a0
+ * - Hauppauge microStick 77e
+ - 2013:0257
+ * - ONDA Data Card Digital Receiver
+ - 19D2:0078
+ * - Siano Denver (ATSC-M/H) Digital Receiver
+ - 187f:0800
+ * - Siano Denver (TDMB) Digital Receiver
+ - 187f:0700
+ * - Siano Ming Digital Receiver
+ - 187f:0310
+ * - Siano Nice Digital Receiver
+ - 187f:0202, 187f:0202
+ * - Siano Nova A Digital Receiver
+ - 187f:0200
+ * - Siano Nova B Digital Receiver
+ - 187f:0201
+ * - Siano Pele Digital Receiver
+ - 187f:0500
+ * - Siano Rio Digital Receiver
+ - 187f:0600, 3275:0080
+ * - Siano Stellar Digital Receiver
+ - 187f:0100
+ * - Siano Stellar Digital Receiver ROM
+ - 187f:0010
+ * - Siano Vega Digital Receiver
+ - 187f:0300
+ * - Siano Venice Digital Receiver
+ - 187f:0301, 187f:0301, 187f:0302
+ * - ZTE Data Card Digital Receiver
+ - 19D2:0086
diff --git a/Documentation/admin-guide/media/starfive_camss.rst b/Documentation/admin-guide/media/starfive_camss.rst
new file mode 100644
index 000000000000..ca42e9447c47
--- /dev/null
+++ b/Documentation/admin-guide/media/starfive_camss.rst
@@ -0,0 +1,72 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. include:: <isonum.txt>
+
+================================
+Starfive Camera Subsystem driver
+================================
+
+Introduction
+------------
+
+This file documents the driver for the Starfive Camera Subsystem found on
+Starfive JH7110 SoC. The driver is located under drivers/staging/media/starfive/
+camss.
+
+The driver implements V4L2, Media controller and v4l2_subdev interfaces. Camera
+sensor using V4L2 subdev interface in the kernel is supported.
+
+The driver has been successfully used on the Gstreamer 1.18.5 with v4l2src
+plugin.
+
+
+Starfive Camera Subsystem hardware
+----------------------------------
+
+The Starfive Camera Subsystem hardware consists of::
+
+ |\ +---------------+ +-----------+
+ +----------+ | \ | | | |
+ | | | | | | | |
+ | MIPI |----->| |----->| ISP |----->| |
+ | | | | | | | |
+ +----------+ | | | | | Memory |
+ |MUX| +---------------+ | Interface |
+ +----------+ | | | |
+ | | | |---------------------------->| |
+ | Parallel |----->| | | |
+ | | | | | |
+ +----------+ | / | |
+ |/ +-----------+
+
+- MIPI: The MIPI interface, receiving data from a MIPI CSI-2 camera sensor.
+
+- Parallel: The parallel interface, receiving data from a parallel sensor.
+
+- ISP: The ISP, processing raw Bayer data from an image sensor and producing
+ YUV frames.
+
+
+Topology
+--------
+
+The media controller pipeline graph is as follows:
+
+.. _starfive_camss_graph:
+
+.. kernel-figure:: starfive_camss_graph.dot
+ :alt: starfive_camss_graph.dot
+ :align: center
+
+The driver has 2 video devices:
+
+- capture_raw: The capture device, capturing image data directly from a sensor.
+- capture_yuv: The capture device, capturing YUV frame data processed by the
+ ISP module
+
+The driver has 3 subdevices:
+
+- stf_isp: is responsible for all the isp operations, outputs YUV frames.
+- cdns_csi2rx: a CSI-2 bridge supporting up to 4 CSI lanes in input, and 4
+ different pixel streams in output.
+- imx219: an image sensor, image data is sent through MIPI CSI-2.
diff --git a/Documentation/admin-guide/media/starfive_camss_graph.dot b/Documentation/admin-guide/media/starfive_camss_graph.dot
new file mode 100644
index 000000000000..8eff1f161ac7
--- /dev/null
+++ b/Documentation/admin-guide/media/starfive_camss_graph.dot
@@ -0,0 +1,12 @@
+digraph board {
+ rankdir=TB
+ n00000001 [label="{{<port0> 0} | stf_isp\n/dev/v4l-subdev0 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000001:port1 -> n00000008 [style=dashed]
+ n00000004 [label="capture_raw\n/dev/video0", shape=box, style=filled, fillcolor=yellow]
+ n00000008 [label="capture_yuv\n/dev/video1", shape=box, style=filled, fillcolor=yellow]
+ n0000000e [label="{{<port0> 0} | cdns_csi2rx.19800000.csi-bridge\n | {<port1> 1 | <port2> 2 | <port3> 3 | <port4> 4}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000000e:port1 -> n00000001:port0 [style=dashed]
+ n0000000e:port1 -> n00000004 [style=dashed]
+ n00000018 [label="{{} | imx219 6-0010\n/dev/v4l-subdev1 | {<port0> 0}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000018:port0 -> n0000000e:port0 [style=bold]
+}
diff --git a/Documentation/admin-guide/media/technisat.rst b/Documentation/admin-guide/media/technisat.rst
new file mode 100644
index 000000000000..9eaa12366bbf
--- /dev/null
+++ b/Documentation/admin-guide/media/technisat.rst
@@ -0,0 +1,100 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+How to set up the Technisat/B2C2 Flexcop devices
+================================================
+
+.. note::
+
+ This documentation is outdated.
+
+Author: Uwe Bugla <uwe.bugla@gmx.de> August 2009
+
+Find out what device you have
+-----------------------------
+
+Important Notice: The driver does NOT support Technisat USB 2 devices!
+
+First start your linux box with a shipped kernel:
+
+.. code-block:: none
+
+ lspci -vvv for a PCI device (lsusb -vvv for an USB device) will show you for example:
+ 02:0b.0 Network controller: Techsan Electronics Co Ltd B2C2 FlexCopII DVB chip /
+ Technisat SkyStar2 DVB card (rev 02)
+
+ dmesg | grep frontend may show you for example:
+ DVB: registering frontend 0 (Conexant CX24123/CX24109)...
+
+Kernel compilation:
+-------------------
+
+If the Flexcop / Technisat is the only DVB / TV / Radio device in your box
+get rid of unnecessary modules and check this one:
+
+``Multimedia support`` => ``Customise analog and hybrid tuner modules to build``
+
+In this directory uncheck every driver which is activated there
+(except ``Simple tuner support`` for ATSC 3rd generation only -> see case 9 please).
+
+Then please activate:
+
+- Main module part:
+
+ ``Multimedia support`` => ``DVB/ATSC adapters`` => ``Technisat/B2C2 FlexcopII(b) and FlexCopIII adapters``
+
+ #) => ``Technisat/B2C2 Air/Sky/Cable2PC PCI`` (PCI card) or
+ #) => ``Technisat/B2C2 Air/Sky/Cable2PC USB`` (USB 1.1 adapter)
+ and for troubleshooting purposes:
+ #) => ``Enable debug for the B2C2 FlexCop drivers``
+
+- Frontend / Tuner / Demodulator module part:
+
+ ``Multimedia support`` => ``DVB/ATSC adapters``
+ => ``Customise the frontend modules to build`` ``Customise DVB frontends`` =>
+
+ - SkyStar DVB-S Revision 2.3:
+
+ #) => ``Zarlink VP310/MT312/ZL10313 based``
+ #) => ``Generic I2C PLL based tuners``
+
+ - SkyStar DVB-S Revision 2.6:
+
+ #) => ``ST STV0299 based``
+ #) => ``Generic I2C PLL based tuners``
+
+ - SkyStar DVB-S Revision 2.7:
+
+ #) => ``Samsung S5H1420 based``
+ #) => ``Integrant ITD1000 Zero IF tuner for DVB-S/DSS``
+ #) => ``ISL6421 SEC controller``
+
+ - SkyStar DVB-S Revision 2.8:
+
+ #) => ``Conexant CX24123 based``
+ #) => ``Conexant CX24113/CX24128 tuner for DVB-S/DSS``
+ #) => ``ISL6421 SEC controller``
+
+ - AirStar DVB-T card:
+
+ #) => ``Zarlink MT352 based``
+ #) => ``Generic I2C PLL based tuners``
+
+ - CableStar DVB-C card:
+
+ #) => ``ST STV0297 based``
+ #) => ``Generic I2C PLL based tuners``
+
+ - AirStar ATSC card 1st generation:
+
+ #) => ``Broadcom BCM3510``
+
+ - AirStar ATSC card 2nd generation:
+
+ #) => ``NxtWave Communications NXT2002/NXT2004 based``
+ #) => ``Generic I2C PLL based tuners``
+
+ - AirStar ATSC card 3rd generation:
+
+ #) => ``LG Electronics LGDT3302/LGDT3303 based``
+ #) ``Multimedia support`` => ``Customise analog and hybrid tuner modules to build`` => ``Simple tuner support``
+
diff --git a/Documentation/admin-guide/media/ttusb-dec.rst b/Documentation/admin-guide/media/ttusb-dec.rst
new file mode 100644
index 000000000000..516bbab8a872
--- /dev/null
+++ b/Documentation/admin-guide/media/ttusb-dec.rst
@@ -0,0 +1,45 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+TechnoTrend/Hauppauge DEC USB Driver
+====================================
+
+Driver Status
+-------------
+
+Supported:
+
+ - DEC2000-t
+ - DEC2450-t
+ - DEC3000-s
+ - Video Streaming
+ - Audio Streaming
+ - Section Filters
+ - Channel Zapping
+ - Hotplug firmware loader
+
+To Do:
+
+ - Tuner status information
+ - DVB network interface
+ - Streaming video PC->DEC
+ - Conax support for 2450-t
+
+Getting the Firmware
+--------------------
+To download the firmware, use the following commands:
+
+.. code-block:: none
+
+ scripts/get_dvb_firmware dec2000t
+ scripts/get_dvb_firmware dec2540t
+ scripts/get_dvb_firmware dec3000s
+
+
+Hotplug Firmware Loading
+------------------------
+
+Since 2.6 kernels, the firmware is loaded at the point that the driver module
+is loaded.
+
+Copy the three files downloaded above into the /usr/lib/hotplug/firmware or
+/lib/firmware directory (depending on configuration of firmware hotplug).
diff --git a/Documentation/admin-guide/media/tuner-cardlist.rst b/Documentation/admin-guide/media/tuner-cardlist.rst
new file mode 100644
index 000000000000..362617c59c5d
--- /dev/null
+++ b/Documentation/admin-guide/media/tuner-cardlist.rst
@@ -0,0 +1,100 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Tuner cards list
+================
+
+============ =====================================================
+Tuner number Card name
+============ =====================================================
+0 Temic PAL (4002 FH5)
+1 Philips PAL_I (FI1246 and compatibles)
+2 Philips NTSC (FI1236,FM1236 and compatibles)
+3 Philips (SECAM+PAL_BG) (FI1216MF, FM1216MF, FR1216MF)
+4 NoTuner
+5 Philips PAL_BG (FI1216 and compatibles)
+6 Temic NTSC (4032 FY5)
+7 Temic PAL_I (4062 FY5)
+8 Temic NTSC (4036 FY5)
+9 Alps HSBH1
+10 Alps TSBE1
+11 Alps TSBB5
+12 Alps TSBE5
+13 Alps TSBC5
+14 Temic PAL_BG (4006FH5)
+15 Alps TSCH6
+16 Temic PAL_DK (4016 FY5)
+17 Philips NTSC_M (MK2)
+18 Temic PAL_I (4066 FY5)
+19 Temic PAL* auto (4006 FN5)
+20 Temic PAL_BG (4009 FR5) or PAL_I (4069 FR5)
+21 Temic NTSC (4039 FR5)
+22 Temic PAL/SECAM multi (4046 FM5)
+23 Philips PAL_DK (FI1256 and compatibles)
+24 Philips PAL/SECAM multi (FQ1216ME)
+25 LG PAL_I+FM (TAPC-I001D)
+26 LG PAL_I (TAPC-I701D)
+27 LG NTSC+FM (TPI8NSR01F)
+28 LG PAL_BG+FM (TPI8PSB01D)
+29 LG PAL_BG (TPI8PSB11D)
+30 Temic PAL* auto + FM (4009 FN5)
+31 SHARP NTSC_JP (2U5JF5540)
+32 Samsung PAL TCPM9091PD27
+33 MT20xx universal
+34 Temic PAL_BG (4106 FH5)
+35 Temic PAL_DK/SECAM_L (4012 FY5)
+36 Temic NTSC (4136 FY5)
+37 LG PAL (newer TAPC series)
+38 Philips PAL/SECAM multi (FM1216ME MK3)
+39 LG NTSC (newer TAPC series)
+40 HITACHI V7-J180AT
+41 Philips PAL_MK (FI1216 MK)
+42 Philips FCV1236D ATSC/NTSC dual in
+43 Philips NTSC MK3 (FM1236MK3 or FM1236/F)
+44 Philips 4 in 1 (ATI TV Wonder Pro/Conexant)
+45 Microtune 4049 FM5
+46 Panasonic VP27s/ENGE4324D
+47 LG NTSC (TAPE series)
+48 Tenna TNF 8831 BGFF)
+49 Microtune 4042 FI5 ATSC/NTSC dual in
+50 TCL 2002N
+51 Philips PAL/SECAM_D (FM 1256 I-H3)
+52 Thomson DTT 7610 (ATSC/NTSC)
+53 Philips FQ1286
+54 Philips/NXP TDA 8290/8295 + 8275/8275A/18271
+55 TCL 2002MB
+56 Philips PAL/SECAM multi (FQ1216AME MK4)
+57 Philips FQ1236A MK4
+58 Ymec TVision TVF-8531MF/8831MF/8731MF
+59 Ymec TVision TVF-5533MF
+60 Thomson DTT 761X (ATSC/NTSC)
+61 Tena TNF9533-D/IF/TNF9533-B/DF
+62 Philips TEA5767HN FM Radio
+63 Philips FMD1216ME MK3 Hybrid Tuner
+64 LG TDVS-H06xF
+65 Ymec TVF66T5-B/DFF
+66 LG TALN series
+67 Philips TD1316 Hybrid Tuner
+68 Philips TUV1236D ATSC/NTSC dual in
+69 Tena TNF 5335 and similar models
+70 Samsung TCPN 2121P30A
+71 Xceive xc2028/xc3028 tuner
+72 Thomson FE6600
+73 Samsung TCPG 6121P30A
+75 Philips TEA5761 FM Radio
+76 Xceive 5000 tuner
+77 TCL tuner MF02GIP-5N-E
+78 Philips FMD1216MEX MK3 Hybrid Tuner
+79 Philips PAL/SECAM multi (FM1216 MK5)
+80 Philips FQ1216LME MK3 PAL/SECAM w/active loopthrough
+81 Partsnic (Daewoo) PTI-5NF05
+82 Philips CU1216L
+83 NXP TDA18271
+84 Sony BTF-Pxn01Z
+85 Philips FQ1236 MK5
+86 Tena TNF5337 MFD
+87 Xceive 4000 tuner
+88 Xceive 5000C tuner
+89 Sony BTF-PG472Z PAL/SECAM
+90 Sony BTF-PK467Z NTSC-M-JP
+91 Sony BTF-PB463Z NTSC-M
+============ =====================================================
diff --git a/Documentation/admin-guide/media/usb-cardlist.rst b/Documentation/admin-guide/media/usb-cardlist.rst
new file mode 100644
index 000000000000..5f5ab0723e48
--- /dev/null
+++ b/Documentation/admin-guide/media/usb-cardlist.rst
@@ -0,0 +1,149 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+USB drivers
+===========
+
+The USB boards are identified by an identification called USB ID.
+
+The ``lsusb`` command allows identifying the USB IDs::
+
+ $ lsusb
+ ...
+ Bus 001 Device 015: ID 046d:082d Logitech, Inc. HD Pro Webcam C920
+ Bus 001 Device 074: ID 2040:b131 Hauppauge
+ Bus 001 Device 075: ID 2013:024f PCTV Systems nanoStick T2 290e
+ ...
+
+Newer camera devices use a standard way to expose themselves as such,
+via USB Video Class. Those cameras are automatically supported by the
+``uvc-driver``.
+
+Older cameras and TV USB devices uses USB Vendor Classes: each vendor
+defines its own way to access the device. This section contains
+card lists for such vendor-class devices.
+
+While this is not as common as on PCI, sometimes the same USB ID is used
+by different products. So, several media drivers allow passing a ``card=``
+parameter, in order to setup a card number that would match the correct
+settings for an specific product type.
+
+The current supported USB cards (not including staging drivers) are
+listed below\ [#]_.
+
+.. [#]
+
+ some of the drivers have sub-drivers, not shown at this table.
+ In particular, gspca driver has lots of sub-drivers,
+ for cameras not supported by the USB Video Class (UVC) driver,
+ as shown at :doc:`gspca card list <gspca-cardlist>`.
+
+====================== =========================================================
+Driver Name
+====================== =========================================================
+airspy AirSpy
+au0828 Auvitek AU0828
+b2c2-flexcop-usb Technisat/B2C2 Air/Sky/Cable2PC USB
+cx231xx Conexant cx231xx USB video capture
+dvb-as102 Abilis AS102 DVB receiver
+dvb-ttusb-budget Technotrend/Hauppauge Nova - USB devices
+dvb-usb-a800 AVerMedia AverTV DVB-T USB 2.0 (A800)
+dvb-usb-af9005 Afatech AF9005 DVB-T USB1.1
+dvb-usb-af9015 Afatech AF9015 DVB-T USB2.0
+dvb-usb-af9035 Afatech AF9035 DVB-T USB2.0
+dvb-usb-anysee Anysee DVB-T/C USB2.0
+dvb-usb-au6610 Alcor Micro AU6610 USB2.0
+dvb-usb-az6007 AzureWave 6007 and clones DVB-T/C USB2.0
+dvb-usb-az6027 Azurewave DVB-S/S2 USB2.0 AZ6027
+dvb-usb-ce6230 Intel CE6230 DVB-T USB2.0
+dvb-usb-cinergyT2 Terratec CinergyT2/qanu USB 2.0 DVB-T
+dvb-usb-cxusb Conexant USB2.0 hybrid
+dvb-usb-dib0700 DiBcom DiB0700
+dvb-usb-dibusb-common DiBcom DiB3000M-B
+dvb-usb-dibusb-mc DiBcom DiB3000M-C/P
+dvb-usb-digitv Nebula Electronics uDigiTV DVB-T USB2.0
+dvb-usb-dtt200u WideView WT-200U and WT-220U (pen) DVB-T
+dvb-usb-dtv5100 AME DTV-5100 USB2.0 DVB-T
+dvb-usb-dvbsky DVBSky USB
+dvb-usb-dw2102 DvbWorld & TeVii DVB-S/S2 USB2.0
+dvb-usb-ec168 E3C EC168 DVB-T USB2.0
+dvb-usb-gl861 Genesys Logic GL861 USB2.0
+dvb-usb-gp8psk GENPIX 8PSK->USB module
+dvb-usb-lmedm04 LME DM04/QQBOX DVB-S USB2.0
+dvb-usb-m920x Uli m920x DVB-T USB2.0
+dvb-usb-nova-t-usb2 Hauppauge WinTV-NOVA-T usb2 DVB-T USB2.0
+dvb-usb-opera Opera1 DVB-S USB2.0 receiver
+dvb-usb-pctv452e Pinnacle PCTV HDTV Pro USB device/TT Connect S2-3600
+dvb-usb-rtl28xxu Realtek RTL28xxU DVB USB
+dvb-usb-technisat-usb2 Technisat DVB-S/S2 USB2.0
+dvb-usb-ttusb2 Pinnacle 400e DVB-S USB2.0
+dvb-usb-umt-010 HanfTek UMT-010 DVB-T USB2.0
+dvb_usb_v2 Support for various USB DVB devices v2
+dvb-usb-vp702x TwinhanDTV StarBox and clones DVB-S USB2.0
+dvb-usb-vp7045 TwinhanDTV Alpha/MagicBoxII, DNTV tinyUSB2, Beetle USB2.0
+em28xx Empia EM28xx USB devices
+go7007 WIS GO7007 MPEG encoder
+gspca Drivers for several USB Cameras
+hackrf HackRF
+hdpvr Hauppauge HD PVR
+msi2500 Mirics MSi2500
+mxl111sf-tuner MxL111SF DTV USB2.0
+pvrusb2 Hauppauge WinTV-PVR USB2
+pwc USB Philips Cameras
+s2250 Sensoray 2250/2251
+s2255drv USB Sensoray 2255 video capture device
+smsusb Siano SMS1xxx based MDTV receiver
+ttusb_dec Technotrend/Hauppauge USB DEC devices
+usbtv USBTV007 video capture
+uvcvideo USB Video Class (UVC)
+zd1301 ZyDAS ZD1301
+====================== =========================================================
+
+.. toctree::
+ :maxdepth: 1
+
+ au0828-cardlist
+ cx231xx-cardlist
+ em28xx-cardlist
+ siano-cardlist
+
+ gspca-cardlist
+
+ dvb-usb-dib0700-cardlist
+ dvb-usb-dibusb-mb-cardlist
+ dvb-usb-dibusb-mc-cardlist
+
+ dvb-usb-a800-cardlist
+ dvb-usb-af9005-cardlist
+ dvb-usb-az6027-cardlist
+ dvb-usb-cinergyT2-cardlist
+ dvb-usb-cxusb-cardlist
+ dvb-usb-digitv-cardlist
+ dvb-usb-dtt200u-cardlist
+ dvb-usb-dtv5100-cardlist
+ dvb-usb-dw2102-cardlist
+ dvb-usb-gp8psk-cardlist
+ dvb-usb-m920x-cardlist
+ dvb-usb-nova-t-usb2-cardlist
+ dvb-usb-opera1-cardlist
+ dvb-usb-pctv452e-cardlist
+ dvb-usb-technisat-usb2-cardlist
+ dvb-usb-ttusb2-cardlist
+ dvb-usb-umt-010-cardlist
+ dvb-usb-vp702x-cardlist
+ dvb-usb-vp7045-cardlist
+
+ dvb-usb-af9015-cardlist
+ dvb-usb-af9035-cardlist
+ dvb-usb-anysee-cardlist
+ dvb-usb-au6610-cardlist
+ dvb-usb-az6007-cardlist
+ dvb-usb-ce6230-cardlist
+ dvb-usb-dvbsky-cardlist
+ dvb-usb-ec168-cardlist
+ dvb-usb-gl861-cardlist
+ dvb-usb-lmedm04-cardlist
+ dvb-usb-mxl111sf-cardlist
+ dvb-usb-rtl28xxu-cardlist
+ dvb-usb-zd1301-cardlist
+
+ other-usb-cardlist
diff --git a/Documentation/admin-guide/media/v4l-drivers.rst b/Documentation/admin-guide/media/v4l-drivers.rst
new file mode 100644
index 000000000000..f4bb2605f07e
--- /dev/null
+++ b/Documentation/admin-guide/media/v4l-drivers.rst
@@ -0,0 +1,34 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _uapi-v4l-drivers:
+
+===============================================
+Video4Linux (V4L) driver-specific documentation
+===============================================
+
+.. toctree::
+ :maxdepth: 2
+
+ bttv
+ cafe_ccic
+ cx88
+ fimc
+ imx
+ imx7
+ ipu3
+ ivtv
+ mgb4
+ omap3isp
+ omap4_camera
+ philips
+ qcom_camss
+ rcar-fdp1
+ rkisp1
+ saa7134
+ si470x
+ si4713
+ si476x
+ starfive_camss
+ vimc
+ visl
+ vivid
diff --git a/Documentation/admin-guide/media/vimc.dot b/Documentation/admin-guide/media/vimc.dot
new file mode 100644
index 000000000000..92a5bb631235
--- /dev/null
+++ b/Documentation/admin-guide/media/vimc.dot
@@ -0,0 +1,26 @@
+# SPDX-License-Identifier: GPL-2.0
+
+digraph board {
+ rankdir=TB
+ n00000001 [label="{{} | Sensor A\n/dev/v4l-subdev0 | {<port0> 0}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000001:port0 -> n00000005:port0 [style=bold]
+ n00000001:port0 -> n0000000b [style=bold]
+ n00000001 -> n00000002
+ n00000002 [label="{{} | Lens A\n/dev/v4l-subdev5 | {<port0>}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000003 [label="{{} | Sensor B\n/dev/v4l-subdev1 | {<port0> 0}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000003:port0 -> n00000008:port0 [style=bold]
+ n00000003:port0 -> n0000000f [style=bold]
+ n00000003 -> n00000004
+ n00000004 [label="{{} | Lens B\n/dev/v4l-subdev6 | {<port0>}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000005 [label="{{<port0> 0} | Debayer A\n/dev/v4l-subdev2 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000005:port1 -> n00000015:port0
+ n00000008 [label="{{<port0> 0} | Debayer B\n/dev/v4l-subdev3 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000008:port1 -> n00000015:port0 [style=dashed]
+ n0000000b [label="Raw Capture 0\n/dev/video0", shape=box, style=filled, fillcolor=yellow]
+ n0000000f [label="Raw Capture 1\n/dev/video1", shape=box, style=filled, fillcolor=yellow]
+ n00000013 [label="{{} | RGB/YUV Input\n/dev/v4l-subdev4 | {<port0> 0}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000013:port0 -> n00000015:port0 [style=dashed]
+ n00000015 [label="{{<port0> 0} | Scaler\n/dev/v4l-subdev5 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000015:port1 -> n00000018 [style=bold]
+ n00000018 [label="RGB/YUV Capture\n/dev/video2", shape=box, style=filled, fillcolor=yellow]
+}
diff --git a/Documentation/admin-guide/media/vimc.rst b/Documentation/admin-guide/media/vimc.rst
new file mode 100644
index 000000000000..29d843a8ddb1
--- /dev/null
+++ b/Documentation/admin-guide/media/vimc.rst
@@ -0,0 +1,110 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+The Virtual Media Controller Driver (vimc)
+==========================================
+
+The vimc driver emulates complex video hardware using the V4L2 API and the Media
+API. It has a capture device and three subdevices: sensor, debayer and scaler.
+
+Topology
+--------
+
+The topology is hardcoded, although you could modify it in vimc-core and
+recompile the driver to achieve your own topology. This is the default topology:
+
+.. _vimc_topology_graph:
+
+.. kernel-figure:: vimc.dot
+ :alt: Diagram of the default media pipeline topology
+ :align: center
+
+ Media pipeline graph on vimc
+
+Configuring the topology
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Each subdevice will come with its default configuration (pixelformat, height,
+width, ...). One needs to configure the topology in order to match the
+configuration on each linked subdevice to stream frames through the pipeline.
+If the configuration doesn't match, the stream will fail. The ``v4l-utils``
+package is a bundle of user-space applications, that comes with ``media-ctl`` and
+``v4l2-ctl`` that can be used to configure the vimc configuration. This sequence
+of commands fits for the default topology:
+
+.. code-block:: bash
+
+ media-ctl -d platform:vimc -V '"Sensor A":0[fmt:SBGGR8_1X8/640x480]'
+ media-ctl -d platform:vimc -V '"Debayer A":0[fmt:SBGGR8_1X8/640x480]'
+ media-ctl -d platform:vimc -V '"Scaler":0[fmt:RGB888_1X24/640x480]'
+ media-ctl -d platform:vimc -V '"Scaler":0[crop:(100,50)/400x150]'
+ media-ctl -d platform:vimc -V '"Scaler":1[fmt:RGB888_1X24/300x700]'
+ v4l2-ctl -z platform:vimc -d "RGB/YUV Capture" -v width=300,height=700
+ v4l2-ctl -z platform:vimc -d "Raw Capture 0" -v pixelformat=BA81
+
+Subdevices
+----------
+
+Subdevices define the behavior of an entity in the topology. Depending on the
+subdevice, the entity can have multiple pads of type source or sink.
+
+vimc-sensor:
+ Generates images in several formats using video test pattern generator.
+ Exposes:
+
+ * 1 Pad source
+
+vimc-lens:
+ Ancillary lens for a sensor. Supports auto focus control. Linked to
+ a vimc-sensor using an ancillary link. The lens supports FOCUS_ABSOLUTE
+ control.
+
+.. code-block:: bash
+
+ media-ctl -p
+ ...
+ - entity 28: Lens A (0 pad, 0 link)
+ type V4L2 subdev subtype Lens flags 0
+ device node name /dev/v4l-subdev6
+ - entity 29: Lens B (0 pad, 0 link)
+ type V4L2 subdev subtype Lens flags 0
+ device node name /dev/v4l-subdev7
+ v4l2-ctl -d /dev/v4l-subdev7 -C focus_absolute
+ focus_absolute: 0
+
+
+vimc-debayer:
+ Transforms images in bayer format into a non-bayer format.
+ Exposes:
+
+ * 1 Pad sink
+ * 1 Pad source
+
+vimc-scaler:
+ Re-size the image to meet the source pad resolution. E.g.: if the sync
+ pad is configured to 360x480 and the source to 1280x720, the image will
+ be stretched to fit the source resolution. Works for any resolution
+ within the vimc limitations (even shrinking the image if necessary).
+ Exposes:
+
+ * 1 Pad sink
+ * 1 Pad source
+
+vimc-capture:
+ Exposes node /dev/videoX to allow userspace to capture the stream.
+ Exposes:
+
+ * 1 Pad sink
+ * 1 Pad source
+
+Module options
+--------------
+
+Vimc has a module parameter to configure the driver.
+
+* ``allocator=<unsigned int>``
+
+ memory allocator selection, default is 0. It specifies the way buffers
+ will be allocated.
+
+ - 0: vmalloc
+ - 1: dma-contig
diff --git a/Documentation/admin-guide/media/visl.rst b/Documentation/admin-guide/media/visl.rst
new file mode 100644
index 000000000000..db1ef29438e1
--- /dev/null
+++ b/Documentation/admin-guide/media/visl.rst
@@ -0,0 +1,177 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+The Virtual Stateless Decoder Driver (visl)
+===========================================
+
+A virtual stateless decoder device for stateless uAPI development
+purposes.
+
+This tool's objective is to help the development and testing of
+userspace applications that use the V4L2 stateless API to decode media.
+
+A userspace implementation can use visl to run a decoding loop even when
+no hardware is available or when the kernel uAPI for the codec has not
+been upstreamed yet. This can reveal bugs at an early stage.
+
+This driver can also trace the contents of the V4L2 controls submitted
+to it. It can also dump the contents of the vb2 buffers through a
+debugfs interface. This is in many ways similar to the tracing
+infrastructure available for other popular encode/decode APIs out there
+and can help develop a userspace application by using another (working)
+one as a reference.
+
+.. note::
+
+ No actual decoding of video frames is performed by visl. The
+ V4L2 test pattern generator is used to write various debug information
+ to the capture buffers instead.
+
+Module parameters
+-----------------
+
+- visl_debug: Activates debug info, printing various debug messages through
+ dprintk. Also controls whether per-frame debug info is shown. Defaults to off.
+ Note that enabling this feature can result in slow performance through serial.
+
+- visl_transtime_ms: Simulated process time in milliseconds. Slowing down the
+ decoding speed can be useful for debugging.
+
+- visl_dprintk_frame_start, visl_dprintk_frame_nframes: Dictates a range of
+ frames where dprintk is activated. This only controls the dprintk tracing on a
+ per-frame basis. Note that printing a lot of data can be slow through serial.
+
+- keep_bitstream_buffers: Controls whether bitstream (i.e. OUTPUT) buffers are
+ kept after a decoding session. Defaults to false so as to reduce the amount of
+ clutter. keep_bitstream_buffers == false works well when live debugging the
+ client program with GDB.
+
+- bitstream_trace_frame_start, bitstream_trace_nframes: Similar to
+ visl_dprintk_frame_start, visl_dprintk_nframes, but controls the dumping of
+ buffer data through debugfs instead.
+
+What is the default use case for this driver?
+---------------------------------------------
+
+This driver can be used as a way to compare different userspace implementations.
+This assumes that a working client is run against visl and that the ftrace and
+OUTPUT buffer data is subsequently used to debug a work-in-progress
+implementation.
+
+Information on reference frames, their timestamps, the status of the OUTPUT and
+CAPTURE queues and more can be read directly from the CAPTURE buffers.
+
+Supported codecs
+----------------
+
+The following codecs are supported:
+
+- FWHT
+- MPEG2
+- VP8
+- VP9
+- H.264
+- HEVC
+- AV1
+
+visl trace events
+-----------------
+The trace events are defined on a per-codec basis, e.g.:
+
+.. code-block:: bash
+
+ $ ls /sys/kernel/tracing/events/ | grep visl
+ visl_av1_controls
+ visl_fwht_controls
+ visl_h264_controls
+ visl_hevc_controls
+ visl_mpeg2_controls
+ visl_vp8_controls
+ visl_vp9_controls
+
+For example, in order to dump HEVC SPS data:
+
+.. code-block:: bash
+
+ $ echo 1 > /sys/kernel/tracing/events/visl_hevc_controls/v4l2_ctrl_hevc_sps/enable
+
+The SPS data will be dumped to the trace buffer, i.e.:
+
+.. code-block:: bash
+
+ $ cat /sys/kernel/tracing/trace
+ video_parameter_set_id 0
+ seq_parameter_set_id 0
+ pic_width_in_luma_samples 1920
+ pic_height_in_luma_samples 1080
+ bit_depth_luma_minus8 0
+ bit_depth_chroma_minus8 0
+ log2_max_pic_order_cnt_lsb_minus4 4
+ sps_max_dec_pic_buffering_minus1 6
+ sps_max_num_reorder_pics 2
+ sps_max_latency_increase_plus1 0
+ log2_min_luma_coding_block_size_minus3 0
+ log2_diff_max_min_luma_coding_block_size 3
+ log2_min_luma_transform_block_size_minus2 0
+ log2_diff_max_min_luma_transform_block_size 3
+ max_transform_hierarchy_depth_inter 2
+ max_transform_hierarchy_depth_intra 2
+ pcm_sample_bit_depth_luma_minus1 0
+ pcm_sample_bit_depth_chroma_minus1 0
+ log2_min_pcm_luma_coding_block_size_minus3 0
+ log2_diff_max_min_pcm_luma_coding_block_size 0
+ num_short_term_ref_pic_sets 0
+ num_long_term_ref_pics_sps 0
+ chroma_format_idc 1
+ sps_max_sub_layers_minus1 0
+ flags AMP_ENABLED|SAMPLE_ADAPTIVE_OFFSET|TEMPORAL_MVP_ENABLED|STRONG_INTRA_SMOOTHING_ENABLED
+
+
+Dumping OUTPUT buffer data through debugfs
+------------------------------------------
+
+If the **VISL_DEBUGFS** Kconfig is enabled, visl will populate
+**/sys/kernel/debug/visl/bitstream** with OUTPUT buffer data according to the
+values of bitstream_trace_frame_start and bitstream_trace_nframes. This can
+highlight errors as broken clients may fail to fill the buffers properly.
+
+A single file is created for each processed OUTPUT buffer. Its name contains an
+integer that denotes the buffer sequence, i.e.:
+
+.. code-block:: c
+
+ snprintf(name, 32, "bitstream%d", run->src->sequence);
+
+Dumping the values is simply a matter of reading from the file, i.e.:
+
+For the buffer with sequence == 0:
+
+.. code-block:: bash
+
+ $ xxd /sys/kernel/debug/visl/bitstream/bitstream0
+ 00000000: 2601 af04 d088 bc25 a173 0e41 a4f2 3274 &......%.s.A..2t
+ 00000010: c668 cb28 e775 b4ac f53a ba60 f8fd 3aa1 .h.(.u...:.`..:.
+ 00000020: 46b4 bcfc 506c e227 2372 e5f5 d7ea 579f F...Pl.'#r....W.
+ 00000030: 6371 5eb5 0eb8 23b5 ca6a 5de5 983a 19e4 cq^...#..j]..:..
+ 00000040: e8c3 4320 b4ba a226 cbc1 4138 3a12 32d6 ..C ...&..A8:.2.
+ 00000050: fef3 247b 3523 4e90 9682 ac8e eb0c a389 ..${5#N.........
+ 00000060: ddd0 6cfc 0187 0e20 7aae b15b 1812 3d33 ..l.... z..[..=3
+ 00000070: e1c5 f425 a83a 00b7 4f18 8127 3c4c aefb ...%.:..O..'<L..
+
+For the buffer with sequence == 1:
+
+.. code-block:: bash
+
+ $ xxd /sys/kernel/debug/visl/bitstream/bitstream1
+ 00000000: 0201 d021 49e1 0c40 aa11 1449 14a6 01dc ...!I..@...I....
+ 00000010: 7023 889a c8cd 2cd0 13b4 dab0 e8ca 21fe p#....,.......!.
+ 00000020: c4c8 ab4c 486e 4e2f b0df 96cc c74e 8dde ...LHnN/.....N..
+ 00000030: 8ce7 ee36 d880 4095 4d64 30a0 ff4f 0c5e ...6..@.Md0..O.^
+ 00000040: f16b a6a1 d806 ca2a 0ece a673 7bea 1f37 .k.....*...s{..7
+ 00000050: 370f 5bb9 1dc4 ba21 6434 bc53 0173 cba0 7.[....!d4.S.s..
+ 00000060: dfe6 bc99 01ea b6e0 346b 92b5 c8de 9f5d ........4k.....]
+ 00000070: e7cc 3484 1769 fef2 a693 a945 2c8b 31da ..4..i.....E,.1.
+
+And so on.
+
+By default, the files are removed during STREAMOFF. This is to reduce the amount
+of clutter.
diff --git a/Documentation/admin-guide/media/vivid.rst b/Documentation/admin-guide/media/vivid.rst
new file mode 100644
index 000000000000..58ac25b2c385
--- /dev/null
+++ b/Documentation/admin-guide/media/vivid.rst
@@ -0,0 +1,1416 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+The Virtual Video Test Driver (vivid)
+=====================================
+
+This driver emulates video4linux hardware of various types: video capture, video
+output, vbi capture and output, metadata capture and output, radio receivers and
+transmitters, touch capture and a software defined radio receiver. In addition a
+simple framebuffer device is available for testing capture and output overlays.
+
+Up to 64 vivid instances can be created, each with up to 16 inputs and 16 outputs.
+
+Each input can be a webcam, TV capture device, S-Video capture device or an HDMI
+capture device. Each output can be an S-Video output device or an HDMI output
+device.
+
+These inputs and outputs act exactly as a real hardware device would behave. This
+allows you to use this driver as a test input for application development, since
+you can test the various features without requiring special hardware.
+
+This document describes the features implemented by this driver:
+
+- Support for read()/write(), MMAP, USERPTR and DMABUF streaming I/O.
+- A large list of test patterns and variations thereof
+- Working brightness, contrast, saturation and hue controls
+- Support for the alpha color component
+- Full colorspace support, including limited/full RGB range
+- All possible control types are present
+- Support for various pixel aspect ratios and video aspect ratios
+- Error injection to test what happens if errors occur
+- Supports crop/compose/scale in any combination for both input and output
+- Can emulate up to 4K resolutions
+- All Field settings are supported for testing interlaced capturing
+- Supports all standard YUV and RGB formats, including two multiplanar YUV formats
+- Raw and Sliced VBI capture and output support
+- Radio receiver and transmitter support, including RDS support
+- Software defined radio (SDR) support
+- Capture and output overlay support
+- Metadata capture and output support
+- Touch capture support
+
+These features will be described in more detail below.
+
+Configuring the driver
+----------------------
+
+By default the driver will create a single instance that has a video capture
+device with webcam, TV, S-Video and HDMI inputs, a video output device with
+S-Video and HDMI outputs, one vbi capture device, one vbi output device, one
+radio receiver device, one radio transmitter device and one SDR device.
+
+The number of instances, devices, video inputs and outputs and their types are
+all configurable using the following module options:
+
+- n_devs:
+
+ number of driver instances to create. By default set to 1. Up to 64
+ instances can be created.
+
+- node_types:
+
+ which devices should each driver instance create. An array of
+ hexadecimal values, one for each instance. The default is 0x1d3d.
+ Each value is a bitmask with the following meaning:
+
+ - bit 0: Video Capture node
+ - bit 2-3: VBI Capture node: 0 = none, 1 = raw vbi, 2 = sliced vbi, 3 = both
+ - bit 4: Radio Receiver node
+ - bit 5: Software Defined Radio Receiver node
+ - bit 8: Video Output node
+ - bit 10-11: VBI Output node: 0 = none, 1 = raw vbi, 2 = sliced vbi, 3 = both
+ - bit 12: Radio Transmitter node
+ - bit 16: Framebuffer for testing overlays
+ - bit 17: Metadata Capture node
+ - bit 18: Metadata Output node
+ - bit 19: Touch Capture node
+
+ So to create four instances, the first two with just one video capture
+ device, the second two with just one video output device you would pass
+ these module options to vivid:
+
+ .. code-block:: none
+
+ n_devs=4 node_types=0x1,0x1,0x100,0x100
+
+- num_inputs:
+
+ the number of inputs, one for each instance. By default 4 inputs
+ are created for each video capture device. At most 16 inputs can be created,
+ and there must be at least one.
+
+- input_types:
+
+ the input types for each instance, the default is 0xe4. This defines
+ what the type of each input is when the inputs are created for each driver
+ instance. This is a hexadecimal value with up to 16 pairs of bits, each
+ pair gives the type and bits 0-1 map to input 0, bits 2-3 map to input 1,
+ 30-31 map to input 15. Each pair of bits has the following meaning:
+
+ - 00: this is a webcam input
+ - 01: this is a TV tuner input
+ - 10: this is an S-Video input
+ - 11: this is an HDMI input
+
+ So to create a video capture device with 8 inputs where input 0 is a TV
+ tuner, inputs 1-3 are S-Video inputs and inputs 4-7 are HDMI inputs you
+ would use the following module options:
+
+ .. code-block:: none
+
+ num_inputs=8 input_types=0xffa9
+
+- num_outputs:
+
+ the number of outputs, one for each instance. By default 2 outputs
+ are created for each video output device. At most 16 outputs can be
+ created, and there must be at least one.
+
+- output_types:
+
+ the output types for each instance, the default is 0x02. This defines
+ what the type of each output is when the outputs are created for each
+ driver instance. This is a hexadecimal value with up to 16 bits, each bit
+ gives the type and bit 0 maps to output 0, bit 1 maps to output 1, bit
+ 15 maps to output 15. The meaning of each bit is as follows:
+
+ - 0: this is an S-Video output
+ - 1: this is an HDMI output
+
+ So to create a video output device with 8 outputs where outputs 0-3 are
+ S-Video outputs and outputs 4-7 are HDMI outputs you would use the
+ following module options:
+
+ .. code-block:: none
+
+ num_outputs=8 output_types=0xf0
+
+- vid_cap_nr:
+
+ give the desired videoX start number for each video capture device.
+ The default is -1 which will just take the first free number. This allows
+ you to map capture video nodes to specific videoX device nodes. Example:
+
+ .. code-block:: none
+
+ n_devs=4 vid_cap_nr=2,4,6,8
+
+ This will attempt to assign /dev/video2 for the video capture device of
+ the first vivid instance, video4 for the next up to video8 for the last
+ instance. If it can't succeed, then it will just take the next free
+ number.
+
+- vid_out_nr:
+
+ give the desired videoX start number for each video output device.
+ The default is -1 which will just take the first free number.
+
+- vbi_cap_nr:
+
+ give the desired vbiX start number for each vbi capture device.
+ The default is -1 which will just take the first free number.
+
+- vbi_out_nr:
+
+ give the desired vbiX start number for each vbi output device.
+ The default is -1 which will just take the first free number.
+
+- radio_rx_nr:
+
+ give the desired radioX start number for each radio receiver device.
+ The default is -1 which will just take the first free number.
+
+- radio_tx_nr:
+
+ give the desired radioX start number for each radio transmitter
+ device. The default is -1 which will just take the first free number.
+
+- sdr_cap_nr:
+
+ give the desired swradioX start number for each SDR capture device.
+ The default is -1 which will just take the first free number.
+
+- meta_cap_nr:
+
+ give the desired videoX start number for each metadata capture device.
+ The default is -1 which will just take the first free number.
+
+- meta_out_nr:
+
+ give the desired videoX start number for each metadata output device.
+ The default is -1 which will just take the first free number.
+
+- touch_cap_nr:
+
+ give the desired v4l-touchX start number for each touch capture device.
+ The default is -1 which will just take the first free number.
+
+- ccs_cap_mode:
+
+ specify the allowed video capture crop/compose/scaling combination
+ for each driver instance. Video capture devices can have any combination
+ of cropping, composing and scaling capabilities and this will tell the
+ vivid driver which of those is should emulate. By default the user can
+ select this through controls.
+
+ The value is either -1 (controlled by the user) or a set of three bits,
+ each enabling (1) or disabling (0) one of the features:
+
+ - bit 0:
+
+ Enable crop support. Cropping will take only part of the
+ incoming picture.
+ - bit 1:
+
+ Enable compose support. Composing will copy the incoming
+ picture into a larger buffer.
+
+ - bit 2:
+
+ Enable scaling support. Scaling can scale the incoming
+ picture. The scaler of the vivid driver can enlarge up
+ or down to four times the original size. The scaler is
+ very simple and low-quality. Simplicity and speed were
+ key, not quality.
+
+ Note that this value is ignored by webcam inputs: those enumerate
+ discrete framesizes and that is incompatible with cropping, composing
+ or scaling.
+
+- ccs_out_mode:
+
+ specify the allowed video output crop/compose/scaling combination
+ for each driver instance. Video output devices can have any combination
+ of cropping, composing and scaling capabilities and this will tell the
+ vivid driver which of those is should emulate. By default the user can
+ select this through controls.
+
+ The value is either -1 (controlled by the user) or a set of three bits,
+ each enabling (1) or disabling (0) one of the features:
+
+ - bit 0:
+
+ Enable crop support. Cropping will take only part of the
+ outgoing buffer.
+
+ - bit 1:
+
+ Enable compose support. Composing will copy the incoming
+ buffer into a larger picture frame.
+
+ - bit 2:
+
+ Enable scaling support. Scaling can scale the incoming
+ buffer. The scaler of the vivid driver can enlarge up
+ or down to four times the original size. The scaler is
+ very simple and low-quality. Simplicity and speed were
+ key, not quality.
+
+- multiplanar:
+
+ select whether each device instance supports multi-planar formats,
+ and thus the V4L2 multi-planar API. By default device instances are
+ single-planar.
+
+ This module option can override that for each instance. Values are:
+
+ - 1: this is a single-planar instance.
+ - 2: this is a multi-planar instance.
+
+- vivid_debug:
+
+ enable driver debugging info
+
+- no_error_inj:
+
+ if set disable the error injecting controls. This option is
+ needed in order to run a tool like v4l2-compliance. Tools like that
+ exercise all controls including a control like 'Disconnect' which
+ emulates a USB disconnect, making the device inaccessible and so
+ all tests that v4l2-compliance is doing will fail afterwards.
+
+ There may be other situations as well where you want to disable the
+ error injection support of vivid. When this option is set, then the
+ controls that select crop, compose and scale behavior are also
+ removed. Unless overridden by ccs_cap_mode and/or ccs_out_mode the
+ will default to enabling crop, compose and scaling.
+
+- allocators:
+
+ memory allocator selection, default is 0. It specifies the way buffers
+ will be allocated.
+
+ - 0: vmalloc
+ - 1: dma-contig
+
+- cache_hints:
+
+ specifies if the device should set queues' user-space cache and memory
+ consistency hint capability (V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS).
+ The hints are valid only when using MMAP streaming I/O. Default is 0.
+
+ - 0: forbid hints
+ - 1: allow hints
+
+Taken together, all these module options allow you to precisely customize
+the driver behavior and test your application with all sorts of permutations.
+It is also very suitable to emulate hardware that is not yet available, e.g.
+when developing software for a new upcoming device.
+
+
+Video Capture
+-------------
+
+This is probably the most frequently used feature. The video capture device
+can be configured by using the module options num_inputs, input_types and
+ccs_cap_mode (see section 1 for more detailed information), but by default
+four inputs are configured: a webcam, a TV tuner, an S-Video and an HDMI
+input, one input for each input type. Those are described in more detail
+below.
+
+Special attention has been given to the rate at which new frames become
+available. The jitter will be around 1 jiffie (that depends on the HZ
+configuration of your kernel, so usually 1/100, 1/250 or 1/1000 of a second),
+but the long-term behavior is exactly following the framerate. So a
+framerate of 59.94 Hz is really different from 60 Hz. If the framerate
+exceeds your kernel's HZ value, then you will get dropped frames, but the
+frame/field sequence counting will keep track of that so the sequence
+count will skip whenever frames are dropped.
+
+
+Webcam Input
+~~~~~~~~~~~~
+
+The webcam input supports three framesizes: 320x180, 640x360 and 1280x720. It
+supports frames per second settings of 10, 15, 25, 30, 50 and 60 fps. Which ones
+are available depends on the chosen framesize: the larger the framesize, the
+lower the maximum frames per second.
+
+The initially selected colorspace when you switch to the webcam input will be
+sRGB.
+
+
+TV and S-Video Inputs
+~~~~~~~~~~~~~~~~~~~~~
+
+The only difference between the TV and S-Video input is that the TV has a
+tuner. Otherwise they behave identically.
+
+These inputs support audio inputs as well: one TV and one Line-In. They
+both support all TV standards. If the standard is queried, then the Vivid
+controls 'Standard Signal Mode' and 'Standard' determine what
+the result will be.
+
+These inputs support all combinations of the field setting. Special care has
+been taken to faithfully reproduce how fields are handled for the different
+TV standards. This is particularly noticeable when generating a horizontally
+moving image so the temporal effect of using interlaced formats becomes clearly
+visible. For 50 Hz standards the top field is the oldest and the bottom field
+is the newest in time. For 60 Hz standards that is reversed: the bottom field
+is the oldest and the top field is the newest in time.
+
+When you start capturing in V4L2_FIELD_ALTERNATE mode the first buffer will
+contain the top field for 50 Hz standards and the bottom field for 60 Hz
+standards. This is what capture hardware does as well.
+
+Finally, for PAL/SECAM standards the first half of the top line contains noise.
+This simulates the Wide Screen Signal that is commonly placed there.
+
+The initially selected colorspace when you switch to the TV or S-Video input
+will be SMPTE-170M.
+
+The pixel aspect ratio will depend on the TV standard. The video aspect ratio
+can be selected through the 'Standard Aspect Ratio' Vivid control.
+Choices are '4x3', '16x9' which will give letterboxed widescreen video and
+'16x9 Anamorphic' which will give full screen squashed anamorphic widescreen
+video that will need to be scaled accordingly.
+
+The TV 'tuner' supports a frequency range of 44-958 MHz. Channels are available
+every 6 MHz, starting from 49.25 MHz. For each channel the generated image
+will be in color for the +/- 0.25 MHz around it, and in grayscale for
++/- 1 MHz around the channel. Beyond that it is just noise. The VIDIOC_G_TUNER
+ioctl will return 100% signal strength for +/- 0.25 MHz and 50% for +/- 1 MHz.
+It will also return correct afc values to show whether the frequency is too
+low or too high.
+
+The audio subchannels that are returned are MONO for the +/- 1 MHz range around
+a valid channel frequency. When the frequency is within +/- 0.25 MHz of the
+channel it will return either MONO, STEREO, either MONO | SAP (for NTSC) or
+LANG1 | LANG2 (for others), or STEREO | SAP.
+
+Which one is returned depends on the chosen channel, each next valid channel
+will cycle through the possible audio subchannel combinations. This allows
+you to test the various combinations by just switching channels..
+
+Finally, for these inputs the v4l2_timecode struct is filled in the
+dequeued v4l2_buffer struct.
+
+
+HDMI Input
+~~~~~~~~~~
+
+The HDMI inputs supports all CEA-861 and DMT timings, both progressive and
+interlaced, for pixelclock frequencies between 25 and 600 MHz. The field
+mode for interlaced formats is always V4L2_FIELD_ALTERNATE. For HDMI the
+field order is always top field first, and when you start capturing an
+interlaced format you will receive the top field first.
+
+The initially selected colorspace when you switch to the HDMI input or
+select an HDMI timing is based on the format resolution: for resolutions
+less than or equal to 720x576 the colorspace is set to SMPTE-170M, for
+others it is set to REC-709 (CEA-861 timings) or sRGB (VESA DMT timings).
+
+The pixel aspect ratio will depend on the HDMI timing: for 720x480 is it
+set as for the NTSC TV standard, for 720x576 it is set as for the PAL TV
+standard, and for all others a 1:1 pixel aspect ratio is returned.
+
+The video aspect ratio can be selected through the 'DV Timings Aspect Ratio'
+Vivid control. Choices are 'Source Width x Height' (just use the
+same ratio as the chosen format), '4x3' or '16x9', either of which can
+result in pillarboxed or letterboxed video.
+
+For HDMI inputs it is possible to set the EDID. By default a simple EDID
+is provided. You can only set the EDID for HDMI inputs. Internally, however,
+the EDID is shared between all HDMI inputs.
+
+No interpretation is done of the EDID data with the exception of the
+physical address. See the CEC section for more details.
+
+There is a maximum of 15 HDMI inputs (if there are more, then they will be
+reduced to 15) since that's the limitation of the EDID physical address.
+
+
+Video Output
+------------
+
+The video output device can be configured by using the module options
+num_outputs, output_types and ccs_out_mode (see section 1 for more detailed
+information), but by default two outputs are configured: an S-Video and an
+HDMI input, one output for each output type. Those are described in more detail
+below.
+
+Like with video capture the framerate is also exact in the long term.
+
+
+S-Video Output
+~~~~~~~~~~~~~~
+
+This output supports audio outputs as well: "Line-Out 1" and "Line-Out 2".
+The S-Video output supports all TV standards.
+
+This output supports all combinations of the field setting.
+
+The initially selected colorspace when you switch to the TV or S-Video input
+will be SMPTE-170M.
+
+
+HDMI Output
+~~~~~~~~~~~
+
+The HDMI output supports all CEA-861 and DMT timings, both progressive and
+interlaced, for pixelclock frequencies between 25 and 600 MHz. The field
+mode for interlaced formats is always V4L2_FIELD_ALTERNATE.
+
+The initially selected colorspace when you switch to the HDMI output or
+select an HDMI timing is based on the format resolution: for resolutions
+less than or equal to 720x576 the colorspace is set to SMPTE-170M, for
+others it is set to REC-709 (CEA-861 timings) or sRGB (VESA DMT timings).
+
+The pixel aspect ratio will depend on the HDMI timing: for 720x480 is it
+set as for the NTSC TV standard, for 720x576 it is set as for the PAL TV
+standard, and for all others a 1:1 pixel aspect ratio is returned.
+
+An HDMI output has a valid EDID which can be obtained through VIDIOC_G_EDID.
+
+There is a maximum of 15 HDMI outputs (if there are more, then they will be
+reduced to 15) since that's the limitation of the EDID physical address. See
+also the CEC section for more details.
+
+VBI Capture
+-----------
+
+There are three types of VBI capture devices: those that only support raw
+(undecoded) VBI, those that only support sliced (decoded) VBI and those that
+support both. This is determined by the node_types module option. In all
+cases the driver will generate valid VBI data: for 60 Hz standards it will
+generate Closed Caption and XDS data. The closed caption stream will
+alternate between "Hello world!" and "Closed captions test" every second.
+The XDS stream will give the current time once a minute. For 50 Hz standards
+it will generate the Wide Screen Signal which is based on the actual Video
+Aspect Ratio control setting and teletext pages 100-159, one page per frame.
+
+The VBI device will only work for the S-Video and TV inputs, it will give
+back an error if the current input is a webcam or HDMI.
+
+
+VBI Output
+----------
+
+There are three types of VBI output devices: those that only support raw
+(undecoded) VBI, those that only support sliced (decoded) VBI and those that
+support both. This is determined by the node_types module option.
+
+The sliced VBI output supports the Wide Screen Signal and the teletext signal
+for 50 Hz standards and Closed Captioning + XDS for 60 Hz standards.
+
+The VBI device will only work for the S-Video output, it will give
+back an error if the current output is HDMI.
+
+
+Radio Receiver
+--------------
+
+The radio receiver emulates an FM/AM/SW receiver. The FM band also supports RDS.
+The frequency ranges are:
+
+ - FM: 64 MHz - 108 MHz
+ - AM: 520 kHz - 1710 kHz
+ - SW: 2300 kHz - 26.1 MHz
+
+Valid channels are emulated every 1 MHz for FM and every 100 kHz for AM and SW.
+The signal strength decreases the further the frequency is from the valid
+frequency until it becomes 0% at +/- 50 kHz (FM) or 5 kHz (AM/SW) from the
+ideal frequency. The initial frequency when the driver is loaded is set to
+95 MHz.
+
+The FM receiver supports RDS as well, both using 'Block I/O' and 'Controls'
+modes. In the 'Controls' mode the RDS information is stored in read-only
+controls. These controls are updated every time the frequency is changed,
+or when the tuner status is requested. The Block I/O method uses the read()
+interface to pass the RDS blocks on to the application for decoding.
+
+The RDS signal is 'detected' for +/- 12.5 kHz around the channel frequency,
+and the further the frequency is away from the valid frequency the more RDS
+errors are randomly introduced into the block I/O stream, up to 50% of all
+blocks if you are +/- 12.5 kHz from the channel frequency. All four errors
+can occur in equal proportions: blocks marked 'CORRECTED', blocks marked
+'ERROR', blocks marked 'INVALID' and dropped blocks.
+
+The generated RDS stream contains all the standard fields contained in a
+0B group, and also radio text and the current time.
+
+The receiver supports HW frequency seek, either in Bounded mode, Wrap Around
+mode or both, which is configurable with the "Radio HW Seek Mode" control.
+
+
+Radio Transmitter
+-----------------
+
+The radio transmitter emulates an FM/AM/SW transmitter. The FM band also supports RDS.
+The frequency ranges are:
+
+ - FM: 64 MHz - 108 MHz
+ - AM: 520 kHz - 1710 kHz
+ - SW: 2300 kHz - 26.1 MHz
+
+The initial frequency when the driver is loaded is 95.5 MHz.
+
+The FM transmitter supports RDS as well, both using 'Block I/O' and 'Controls'
+modes. In the 'Controls' mode the transmitted RDS information is configured
+using controls, and in 'Block I/O' mode the blocks are passed to the driver
+using write().
+
+
+Software Defined Radio Receiver
+-------------------------------
+
+The SDR receiver has three frequency bands for the ADC tuner:
+
+ - 300 kHz
+ - 900 kHz - 2800 kHz
+ - 3200 kHz
+
+The RF tuner supports 50 MHz - 2000 MHz.
+
+The generated data contains the In-phase and Quadrature components of a
+1 kHz tone that has an amplitude of sqrt(2).
+
+
+Metadata Capture
+----------------
+
+The Metadata capture generates UVC format metadata. The PTS and SCR are
+transmitted based on the values set in vivid controls.
+
+The Metadata device will only work for the Webcam input, it will give
+back an error for all other inputs.
+
+
+Metadata Output
+---------------
+
+The Metadata output can be used to set brightness, contrast, saturation and hue.
+
+The Metadata device will only work for the Webcam output, it will give
+back an error for all other outputs.
+
+
+Touch Capture
+-------------
+
+The Touch capture generates touch patterns simulating single tap, double tap,
+triple tap, move from left to right, zoom in, zoom out, palm press (simulating
+a large area being pressed on a touchpad), and simulating 16 simultaneous
+touch points.
+
+Controls
+--------
+
+Different devices support different controls. The sections below will describe
+each control and which devices support them.
+
+
+User Controls - Test Controls
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The Button, Boolean, Integer 32 Bits, Integer 64 Bits, Menu, String, Bitmask and
+Integer Menu are controls that represent all possible control types. The Menu
+control and the Integer Menu control both have 'holes' in their menu list,
+meaning that one or more menu items return EINVAL when VIDIOC_QUERYMENU is called.
+Both menu controls also have a non-zero minimum control value. These features
+allow you to check if your application can handle such things correctly.
+These controls are supported for every device type.
+
+
+User Controls - Video Capture
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following controls are specific to video capture.
+
+The Brightness, Contrast, Saturation and Hue controls actually work and are
+standard. There is one special feature with the Brightness control: each
+video input has its own brightness value, so changing input will restore
+the brightness for that input. In addition, each video input uses a different
+brightness range (minimum and maximum control values). Switching inputs will
+cause a control event to be sent with the V4L2_EVENT_CTRL_CH_RANGE flag set.
+This allows you to test controls that can change their range.
+
+The 'Gain, Automatic' and Gain controls can be used to test volatile controls:
+if 'Gain, Automatic' is set, then the Gain control is volatile and changes
+constantly. If 'Gain, Automatic' is cleared, then the Gain control is a normal
+control.
+
+The 'Horizontal Flip' and 'Vertical Flip' controls can be used to flip the
+image. These combine with the 'Sensor Flipped Horizontally/Vertically' Vivid
+controls.
+
+The 'Alpha Component' control can be used to set the alpha component for
+formats containing an alpha channel.
+
+
+User Controls - Audio
+~~~~~~~~~~~~~~~~~~~~~
+
+The following controls are specific to video capture and output and radio
+receivers and transmitters.
+
+The 'Volume' and 'Mute' audio controls are typical for such devices to
+control the volume and mute the audio. They don't actually do anything in
+the vivid driver.
+
+
+Vivid Controls
+~~~~~~~~~~~~~~
+
+These vivid custom controls control the image generation, error injection, etc.
+
+
+Test Pattern Controls
+^^^^^^^^^^^^^^^^^^^^^
+
+The Test Pattern Controls are all specific to video capture.
+
+- Test Pattern:
+
+ selects which test pattern to use. Use the CSC Colorbar for
+ testing colorspace conversions: the colors used in that test pattern
+ map to valid colors in all colorspaces. The colorspace conversion
+ is disabled for the other test patterns.
+
+- OSD Text Mode:
+
+ selects whether the text superimposed on the
+ test pattern should be shown, and if so, whether only counters should
+ be displayed or the full text.
+
+- Horizontal Movement:
+
+ selects whether the test pattern should
+ move to the left or right and at what speed.
+
+- Vertical Movement:
+
+ does the same for the vertical direction.
+
+- Show Border:
+
+ show a two-pixel wide border at the edge of the actual image,
+ excluding letter or pillarboxing.
+
+- Show Square:
+
+ show a square in the middle of the image. If the image is
+ displayed with the correct pixel and image aspect ratio corrections,
+ then the width and height of the square on the monitor should be
+ the same.
+
+- Insert SAV Code in Image:
+
+ adds a SAV (Start of Active Video) code to the image.
+ This can be used to check if such codes in the image are inadvertently
+ interpreted instead of being ignored.
+
+- Insert EAV Code in Image:
+
+ does the same for the EAV (End of Active Video) code.
+
+- Insert Video Guard Band
+
+ adds 4 columns of pixels with the HDMI Video Guard Band code at the
+ left hand side of the image. This only works with 3 or 4 byte RGB pixel
+ formats. The RGB pixel value 0xab/0x55/0xab turns out to be equivalent
+ to the HDMI Video Guard Band code that precedes each active video line
+ (see section 5.2.2.1 in the HDMI 1.3 Specification). To test if a video
+ receiver has correct HDMI Video Guard Band processing, enable this
+ control and then move the image to the left hand side of the screen.
+ That will result in video lines that start with multiple pixels that
+ have the same value as the Video Guard Band that precedes them.
+ Receivers that will just keep skipping Video Guard Band values will
+ now fail and either loose sync or these video lines will shift.
+
+
+Capture Feature Selection Controls
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These controls are all specific to video capture.
+
+- Sensor Flipped Horizontally:
+
+ the image is flipped horizontally and the
+ V4L2_IN_ST_HFLIP input status flag is set. This emulates the case where
+ a sensor is for example mounted upside down.
+
+- Sensor Flipped Vertically:
+
+ the image is flipped vertically and the
+ V4L2_IN_ST_VFLIP input status flag is set. This emulates the case where
+ a sensor is for example mounted upside down.
+
+- Standard Aspect Ratio:
+
+ selects if the image aspect ratio as used for the TV or
+ S-Video input should be 4x3, 16x9 or anamorphic widescreen. This may
+ introduce letterboxing.
+
+- DV Timings Aspect Ratio:
+
+ selects if the image aspect ratio as used for the HDMI
+ input should be the same as the source width and height ratio, or if
+ it should be 4x3 or 16x9. This may introduce letter or pillarboxing.
+
+- Timestamp Source:
+
+ selects when the timestamp for each buffer is taken.
+
+- Colorspace:
+
+ selects which colorspace should be used when generating the image.
+ This only applies if the CSC Colorbar test pattern is selected,
+ otherwise the test pattern will go through unconverted.
+ This behavior is also what you want, since a 75% Colorbar
+ should really have 75% signal intensity and should not be affected
+ by colorspace conversions.
+
+ Changing the colorspace will result in the V4L2_EVENT_SOURCE_CHANGE
+ to be sent since it emulates a detected colorspace change.
+
+- Transfer Function:
+
+ selects which colorspace transfer function should be used when
+ generating an image. This only applies if the CSC Colorbar test pattern is
+ selected, otherwise the test pattern will go through unconverted.
+ This behavior is also what you want, since a 75% Colorbar
+ should really have 75% signal intensity and should not be affected
+ by colorspace conversions.
+
+ Changing the transfer function will result in the V4L2_EVENT_SOURCE_CHANGE
+ to be sent since it emulates a detected colorspace change.
+
+- Y'CbCr Encoding:
+
+ selects which Y'CbCr encoding should be used when generating
+ a Y'CbCr image. This only applies if the format is set to a Y'CbCr format
+ as opposed to an RGB format.
+
+ Changing the Y'CbCr encoding will result in the V4L2_EVENT_SOURCE_CHANGE
+ to be sent since it emulates a detected colorspace change.
+
+- Quantization:
+
+ selects which quantization should be used for the RGB or Y'CbCr
+ encoding when generating the test pattern.
+
+ Changing the quantization will result in the V4L2_EVENT_SOURCE_CHANGE
+ to be sent since it emulates a detected colorspace change.
+
+- Limited RGB Range (16-235):
+
+ selects if the RGB range of the HDMI source should
+ be limited or full range. This combines with the Digital Video 'Rx RGB
+ Quantization Range' control and can be used to test what happens if
+ a source provides you with the wrong quantization range information.
+ See the description of that control for more details.
+
+- Apply Alpha To Red Only:
+
+ apply the alpha channel as set by the 'Alpha Component'
+ user control to the red color of the test pattern only.
+
+- Enable Capture Cropping:
+
+ enables crop support. This control is only present if
+ the ccs_cap_mode module option is set to the default value of -1 and if
+ the no_error_inj module option is set to 0 (the default).
+
+- Enable Capture Composing:
+
+ enables composing support. This control is only
+ present if the ccs_cap_mode module option is set to the default value of
+ -1 and if the no_error_inj module option is set to 0 (the default).
+
+- Enable Capture Scaler:
+
+ enables support for a scaler (maximum 4 times upscaling
+ and downscaling). This control is only present if the ccs_cap_mode
+ module option is set to the default value of -1 and if the no_error_inj
+ module option is set to 0 (the default).
+
+- Maximum EDID Blocks:
+
+ determines how many EDID blocks the driver supports.
+ Note that the vivid driver does not actually interpret new EDID
+ data, it just stores it. It allows for up to 256 EDID blocks
+ which is the maximum supported by the standard.
+
+- Fill Percentage of Frame:
+
+ can be used to draw only the top X percent
+ of the image. Since each frame has to be drawn by the driver, this
+ demands a lot of the CPU. For large resolutions this becomes
+ problematic. By drawing only part of the image this CPU load can
+ be reduced.
+
+
+Output Feature Selection Controls
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These controls are all specific to video output.
+
+- Enable Output Cropping:
+
+ enables crop support. This control is only present if
+ the ccs_out_mode module option is set to the default value of -1 and if
+ the no_error_inj module option is set to 0 (the default).
+
+- Enable Output Composing:
+
+ enables composing support. This control is only
+ present if the ccs_out_mode module option is set to the default value of
+ -1 and if the no_error_inj module option is set to 0 (the default).
+
+- Enable Output Scaler:
+
+ enables support for a scaler (maximum 4 times upscaling
+ and downscaling). This control is only present if the ccs_out_mode
+ module option is set to the default value of -1 and if the no_error_inj
+ module option is set to 0 (the default).
+
+
+Error Injection Controls
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following two controls are only valid for video and vbi capture.
+
+- Standard Signal Mode:
+
+ selects the behavior of VIDIOC_QUERYSTD: what should it return?
+
+ Changing this control will result in the V4L2_EVENT_SOURCE_CHANGE
+ to be sent since it emulates a changed input condition (e.g. a cable
+ was plugged in or out).
+
+- Standard:
+
+ selects the standard that VIDIOC_QUERYSTD should return if the
+ previous control is set to "Selected Standard".
+
+ Changing this control will result in the V4L2_EVENT_SOURCE_CHANGE
+ to be sent since it emulates a changed input standard.
+
+
+The following two controls are only valid for video capture.
+
+- DV Timings Signal Mode:
+
+ selects the behavior of VIDIOC_QUERY_DV_TIMINGS: what
+ should it return?
+
+ Changing this control will result in the V4L2_EVENT_SOURCE_CHANGE
+ to be sent since it emulates a changed input condition (e.g. a cable
+ was plugged in or out).
+
+- DV Timings:
+
+ selects the timings the VIDIOC_QUERY_DV_TIMINGS should return
+ if the previous control is set to "Selected DV Timings".
+
+ Changing this control will result in the V4L2_EVENT_SOURCE_CHANGE
+ to be sent since it emulates changed input timings.
+
+
+The following controls are only present if the no_error_inj module option
+is set to 0 (the default). These controls are valid for video and vbi
+capture and output streams and for the SDR capture device except for the
+Disconnect control which is valid for all devices.
+
+- Wrap Sequence Number:
+
+ test what happens when you wrap the sequence number in
+ struct v4l2_buffer around.
+
+- Wrap Timestamp:
+
+ test what happens when you wrap the timestamp in struct
+ v4l2_buffer around.
+
+- Percentage of Dropped Buffers:
+
+ sets the percentage of buffers that
+ are never returned by the driver (i.e., they are dropped).
+
+- Disconnect:
+
+ emulates a USB disconnect. The device will act as if it has
+ been disconnected. Only after all open filehandles to the device
+ node have been closed will the device become 'connected' again.
+
+- Inject V4L2_BUF_FLAG_ERROR:
+
+ when pressed, the next frame returned by
+ the driver will have the error flag set (i.e. the frame is marked
+ corrupt).
+
+- Inject VIDIOC_REQBUFS Error:
+
+ when pressed, the next REQBUFS or CREATE_BUFS
+ ioctl call will fail with an error. To be precise: the videobuf2
+ queue_setup() op will return -EINVAL.
+
+- Inject VIDIOC_QBUF Error:
+
+ when pressed, the next VIDIOC_QBUF or
+ VIDIOC_PREPARE_BUFFER ioctl call will fail with an error. To be
+ precise: the videobuf2 buf_prepare() op will return -EINVAL.
+
+- Inject VIDIOC_STREAMON Error:
+
+ when pressed, the next VIDIOC_STREAMON ioctl
+ call will fail with an error. To be precise: the videobuf2
+ start_streaming() op will return -EINVAL.
+
+- Inject Fatal Streaming Error:
+
+ when pressed, the streaming core will be
+ marked as having suffered a fatal error, the only way to recover
+ from that is to stop streaming. To be precise: the videobuf2
+ vb2_queue_error() function is called.
+
+
+VBI Raw Capture Controls
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+- Interlaced VBI Format:
+
+ if set, then the raw VBI data will be interlaced instead
+ of providing it grouped by field.
+
+
+Digital Video Controls
+~~~~~~~~~~~~~~~~~~~~~~
+
+- Rx RGB Quantization Range:
+
+ sets the RGB quantization detection of the HDMI
+ input. This combines with the Vivid 'Limited RGB Range (16-235)'
+ control and can be used to test what happens if a source provides
+ you with the wrong quantization range information. This can be tested
+ by selecting an HDMI input, setting this control to Full or Limited
+ range and selecting the opposite in the 'Limited RGB Range (16-235)'
+ control. The effect is easy to see if the 'Gray Ramp' test pattern
+ is selected.
+
+- Tx RGB Quantization Range:
+
+ sets the RGB quantization detection of the HDMI
+ output. It is currently not used for anything in vivid, but most HDMI
+ transmitters would typically have this control.
+
+- Transmit Mode:
+
+ sets the transmit mode of the HDMI output to HDMI or DVI-D. This
+ affects the reported colorspace since DVI_D outputs will always use
+ sRGB.
+
+- Display Present:
+
+ sets the presence of a "display" on the HDMI output. This affects
+ the tx_edid_present, tx_hotplug and tx_rxsense controls.
+
+
+FM Radio Receiver Controls
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- RDS Reception:
+
+ set if the RDS receiver should be enabled.
+
+- RDS Program Type:
+
+
+- RDS PS Name:
+
+
+- RDS Radio Text:
+
+
+- RDS Traffic Announcement:
+
+
+- RDS Traffic Program:
+
+
+- RDS Music:
+
+ these are all read-only controls. If RDS Rx I/O Mode is set to
+ "Block I/O", then they are inactive as well. If RDS Rx I/O Mode is set
+ to "Controls", then these controls report the received RDS data.
+
+.. note::
+ The vivid implementation of this is pretty basic: they are only
+ updated when you set a new frequency or when you get the tuner status
+ (VIDIOC_G_TUNER).
+
+- Radio HW Seek Mode:
+
+ can be one of "Bounded", "Wrap Around" or "Both". This
+ determines if VIDIOC_S_HW_FREQ_SEEK will be bounded by the frequency
+ range or wrap-around or if it is selectable by the user.
+
+- Radio Programmable HW Seek:
+
+ if set, then the user can provide the lower and
+ upper bound of the HW Seek. Otherwise the frequency range boundaries
+ will be used.
+
+- Generate RBDS Instead of RDS:
+
+ if set, then generate RBDS (the US variant of
+ RDS) data instead of RDS (European-style RDS). This affects only the
+ PICODE and PTY codes.
+
+- RDS Rx I/O Mode:
+
+ this can be "Block I/O" where the RDS blocks have to be read()
+ by the application, or "Controls" where the RDS data is provided by
+ the RDS controls mentioned above.
+
+
+FM Radio Modulator Controls
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- RDS Program ID:
+
+
+- RDS Program Type:
+
+
+- RDS PS Name:
+
+
+- RDS Radio Text:
+
+
+- RDS Stereo:
+
+
+- RDS Artificial Head:
+
+
+- RDS Compressed:
+
+
+- RDS Dynamic PTY:
+
+
+- RDS Traffic Announcement:
+
+
+- RDS Traffic Program:
+
+
+- RDS Music:
+
+ these are all controls that set the RDS data that is transmitted by
+ the FM modulator.
+
+- RDS Tx I/O Mode:
+
+ this can be "Block I/O" where the application has to use write()
+ to pass the RDS blocks to the driver, or "Controls" where the RDS data
+ is Provided by the RDS controls mentioned above.
+
+Metadata Capture Controls
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- Generate PTS
+
+ if set, then the generated metadata stream contains Presentation timestamp.
+
+- Generate SCR
+
+ if set, then the generated metadata stream contains Source Clock information.
+
+Video, VBI and RDS Looping
+--------------------------
+
+The vivid driver supports looping of video output to video input, VBI output
+to VBI input and RDS output to RDS input. For video/VBI looping this emulates
+as if a cable was hooked up between the output and input connector. So video
+and VBI looping is only supported between S-Video and HDMI inputs and outputs.
+VBI is only valid for S-Video as it makes no sense for HDMI.
+
+Since radio is wireless this looping always happens if the radio receiver
+frequency is close to the radio transmitter frequency. In that case the radio
+transmitter will 'override' the emulated radio stations.
+
+Looping is currently supported only between devices created by the same
+vivid driver instance.
+
+
+Video and Sliced VBI looping
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The way to enable video/VBI looping is currently fairly crude. A 'Loop Video'
+control is available in the "Vivid" control class of the video
+capture and VBI capture devices. When checked the video looping will be enabled.
+Once enabled any video S-Video or HDMI input will show a static test pattern
+until the video output has started. At that time the video output will be
+looped to the video input provided that:
+
+- the input type matches the output type. So the HDMI input cannot receive
+ video from the S-Video output.
+
+- the video resolution of the video input must match that of the video output.
+ So it is not possible to loop a 50 Hz (720x576) S-Video output to a 60 Hz
+ (720x480) S-Video input, or a 720p60 HDMI output to a 1080p30 input.
+
+- the pixel formats must be identical on both sides. Otherwise the driver would
+ have to do pixel format conversion as well, and that's taking things too far.
+
+- the field settings must be identical on both sides. Same reason as above:
+ requiring the driver to convert from one field format to another complicated
+ matters too much. This also prohibits capturing with 'Field Top' or 'Field
+ Bottom' when the output video is set to 'Field Alternate'. This combination,
+ while legal, became too complicated to support. Both sides have to be 'Field
+ Alternate' for this to work. Also note that for this specific case the
+ sequence and field counting in struct v4l2_buffer on the capture side may not
+ be 100% accurate.
+
+- field settings V4L2_FIELD_SEQ_TB/BT are not supported. While it is possible to
+ implement this, it would mean a lot of work to get this right. Since these
+ field values are rarely used the decision was made not to implement this for
+ now.
+
+- on the input side the "Standard Signal Mode" for the S-Video input or the
+ "DV Timings Signal Mode" for the HDMI input should be configured so that a
+ valid signal is passed to the video input.
+
+The framerates do not have to match, although this might change in the future.
+
+By default you will see the OSD text superimposed on top of the looped video.
+This can be turned off by changing the "OSD Text Mode" control of the video
+capture device.
+
+For VBI looping to work all of the above must be valid and in addition the vbi
+output must be configured for sliced VBI. The VBI capture side can be configured
+for either raw or sliced VBI. Note that at the moment only CC/XDS (60 Hz formats)
+and WSS (50 Hz formats) VBI data is looped. Teletext VBI data is not looped.
+
+
+Radio & RDS Looping
+~~~~~~~~~~~~~~~~~~~
+
+As mentioned in section 6 the radio receiver emulates stations are regular
+frequency intervals. Depending on the frequency of the radio receiver a
+signal strength value is calculated (this is returned by VIDIOC_G_TUNER).
+However, it will also look at the frequency set by the radio transmitter and
+if that results in a higher signal strength than the settings of the radio
+transmitter will be used as if it was a valid station. This also includes
+the RDS data (if any) that the transmitter 'transmits'. This is received
+faithfully on the receiver side. Note that when the driver is loaded the
+frequencies of the radio receiver and transmitter are not identical, so
+initially no looping takes place.
+
+
+Cropping, Composing, Scaling
+----------------------------
+
+This driver supports cropping, composing and scaling in any combination. Normally
+which features are supported can be selected through the Vivid controls,
+but it is also possible to hardcode it when the module is loaded through the
+ccs_cap_mode and ccs_out_mode module options. See section 1 on the details of
+these module options.
+
+This allows you to test your application for all these variations.
+
+Note that the webcam input never supports cropping, composing or scaling. That
+only applies to the TV/S-Video/HDMI inputs and outputs. The reason is that
+webcams, including this virtual implementation, normally use
+VIDIOC_ENUM_FRAMESIZES to list a set of discrete framesizes that it supports.
+And that does not combine with cropping, composing or scaling. This is
+primarily a limitation of the V4L2 API which is carefully reproduced here.
+
+The minimum and maximum resolutions that the scaler can achieve are 16x16 and
+(4096 * 4) x (2160 x 4), but it can only scale up or down by a factor of 4 or
+less. So for a source resolution of 1280x720 the minimum the scaler can do is
+320x180 and the maximum is 5120x2880. You can play around with this using the
+qv4l2 test tool and you will see these dependencies.
+
+This driver also supports larger 'bytesperline' settings, something that
+VIDIOC_S_FMT allows but that few drivers implement.
+
+The scaler is a simple scaler that uses the Coarse Bresenham algorithm. It's
+designed for speed and simplicity, not quality.
+
+If the combination of crop, compose and scaling allows it, then it is possible
+to change crop and compose rectangles on the fly.
+
+
+Formats
+-------
+
+The driver supports all the regular packed and planar 4:4:4, 4:2:2 and 4:2:0
+YUYV formats, 8, 16, 24 and 32 RGB packed formats and various multiplanar
+formats.
+
+The alpha component can be set through the 'Alpha Component' User control
+for those formats that support it. If the 'Apply Alpha To Red Only' control
+is set, then the alpha component is only used for the color red and set to
+0 otherwise.
+
+The driver has to be configured to support the multiplanar formats. By default
+the driver instances are single-planar. This can be changed by setting the
+multiplanar module option, see section 1 for more details on that option.
+
+If the driver instance is using the multiplanar formats/API, then the first
+single planar format (YUYV) and the multiplanar NV16M and NV61M formats the
+will have a plane that has a non-zero data_offset of 128 bytes. It is rare for
+data_offset to be non-zero, so this is a useful feature for testing applications.
+
+Video output will also honor any data_offset that the application set.
+
+
+Capture Overlay
+---------------
+
+Note: capture overlay support is implemented primarily to test the existing
+V4L2 capture overlay API. In practice few if any GPUs support such overlays
+anymore, and neither are they generally needed anymore since modern hardware
+is so much more capable. By setting flag 0x10000 in the node_types module
+option the vivid driver will create a simple framebuffer device that can be
+used for testing this API. Whether this API should be used for new drivers is
+questionable.
+
+This driver has support for a destructive capture overlay with bitmap clipping
+and list clipping (up to 16 rectangles) capabilities. Overlays are not
+supported for multiplanar formats. It also honors the struct v4l2_window field
+setting: if it is set to FIELD_TOP or FIELD_BOTTOM and the capture setting is
+FIELD_ALTERNATE, then only the top or bottom fields will be copied to the overlay.
+
+The overlay only works if you are also capturing at that same time. This is a
+vivid limitation since it copies from a buffer to the overlay instead of
+filling the overlay directly. And if you are not capturing, then no buffers
+are available to fill.
+
+In addition, the pixelformat of the capture format and that of the framebuffer
+must be the same for the overlay to work. Otherwise VIDIOC_OVERLAY will return
+an error.
+
+In order to really see what it going on you will need to create two vivid
+instances: the first with a framebuffer enabled. You configure the capture
+overlay of the second instance to use the framebuffer of the first, then
+you start capturing in the second instance. For the first instance you setup
+the output overlay for the video output, turn on video looping and capture
+to see the blended framebuffer overlay that's being written to by the second
+instance. This setup would require the following commands:
+
+.. code-block:: none
+
+ $ sudo modprobe vivid n_devs=2 node_types=0x10101,0x1
+ $ v4l2-ctl -d1 --find-fb
+ /dev/fb1 is the framebuffer associated with base address 0x12800000
+ $ sudo v4l2-ctl -d2 --set-fbuf fb=1
+ $ v4l2-ctl -d1 --set-fbuf fb=1
+ $ v4l2-ctl -d0 --set-fmt-video=pixelformat='AR15'
+ $ v4l2-ctl -d1 --set-fmt-video-out=pixelformat='AR15'
+ $ v4l2-ctl -d2 --set-fmt-video=pixelformat='AR15'
+ $ v4l2-ctl -d0 -i2
+ $ v4l2-ctl -d2 -i2
+ $ v4l2-ctl -d2 -c horizontal_movement=4
+ $ v4l2-ctl -d1 --overlay=1
+ $ v4l2-ctl -d0 -c loop_video=1
+ $ v4l2-ctl -d2 --stream-mmap --overlay=1
+
+And from another console:
+
+.. code-block:: none
+
+ $ v4l2-ctl -d1 --stream-out-mmap
+
+And yet another console:
+
+.. code-block:: none
+
+ $ qv4l2
+
+and start streaming.
+
+As you can see, this is not for the faint of heart...
+
+
+Output Overlay
+--------------
+
+Note: output overlays are primarily implemented in order to test the existing
+V4L2 output overlay API. Whether this API should be used for new drivers is
+questionable.
+
+This driver has support for an output overlay and is capable of:
+
+ - bitmap clipping,
+ - list clipping (up to 16 rectangles)
+ - chromakey
+ - source chromakey
+ - global alpha
+ - local alpha
+ - local inverse alpha
+
+Output overlays are not supported for multiplanar formats. In addition, the
+pixelformat of the capture format and that of the framebuffer must be the
+same for the overlay to work. Otherwise VIDIOC_OVERLAY will return an error.
+
+Output overlays only work if the driver has been configured to create a
+framebuffer by setting flag 0x10000 in the node_types module option. The
+created framebuffer has a size of 720x576 and supports ARGB 1:5:5:5 and
+RGB 5:6:5.
+
+In order to see the effects of the various clipping, chromakeying or alpha
+processing capabilities you need to turn on video looping and see the results
+on the capture side. The use of the clipping, chromakeying or alpha processing
+capabilities will slow down the video loop considerably as a lot of checks have
+to be done per pixel.
+
+
+CEC (Consumer Electronics Control)
+----------------------------------
+
+If there are HDMI inputs then a CEC adapter will be created that has
+the same number of input ports. This is the equivalent of e.g. a TV that
+has that number of inputs. Each HDMI output will also create a
+CEC adapter that is hooked up to the corresponding input port, or (if there
+are more outputs than inputs) is not hooked up at all. In other words,
+this is the equivalent of hooking up each output device to an input port of
+the TV. Any remaining output devices remain unconnected.
+
+The EDID that each output reads reports a unique CEC physical address that is
+based on the physical address of the EDID of the input. So if the EDID of the
+receiver has physical address A.B.0.0, then each output will see an EDID
+containing physical address A.B.C.0 where C is 1 to the number of inputs. If
+there are more outputs than inputs then the remaining outputs have a CEC adapter
+that is disabled and reports an invalid physical address.
+
+
+Some Future Improvements
+------------------------
+
+Just as a reminder and in no particular order:
+
+- Add a virtual alsa driver to test audio
+- Add virtual sub-devices and media controller support
+- Some support for testing compressed video
+- Add support to loop raw VBI output to raw VBI input
+- Add support to loop teletext sliced VBI output to VBI input
+- Fix sequence/field numbering when looping of video with alternate fields
+- Add support for V4L2_CID_BG_COLOR for video outputs
+- Add ARGB888 overlay support: better testing of the alpha channel
+- Improve pixel aspect support in the tpg code by passing a real v4l2_fract
+- Use per-queue locks and/or per-device locks to improve throughput
+- Add support to loop from a specific output to a specific input across
+ vivid instances
+- The SDR radio should use the same 'frequencies' for stations as the normal
+ radio receiver, and give back noise if the frequency doesn't match up with
+ a station frequency
+- Make a thread for the RDS generation, that would help in particular for the
+ "Controls" RDS Rx I/O Mode as the read-only RDS controls could be updated
+ in real-time.
+- Changing the EDID should cause hotplug detect emulation to happen.
diff --git a/Documentation/admin-guide/media/zoran-cardlist.rst b/Documentation/admin-guide/media/zoran-cardlist.rst
new file mode 100644
index 000000000000..d7fc8bed62ff
--- /dev/null
+++ b/Documentation/admin-guide/media/zoran-cardlist.rst
@@ -0,0 +1,51 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Zoran cards list
+================
+
+.. tabularcolumns:: |p{1.4cm}|p{11.1cm}|p{4.2cm}|
+
+.. flat-table::
+ :header-rows: 1
+ :widths: 2 19 18
+ :stub-columns: 0
+
+ * - Card number
+ - Card name
+ - PCI subsystem IDs
+
+ * - 0
+ - DC10(old)
+ - <any>
+
+ * - 1
+ - DC10(new)
+ - <any>
+
+ * - 2
+ - DC10_PLUS
+ - 1031:7efe
+
+ * - 3
+ - DC30
+ - <any>
+
+ * - 4
+ - DC30_PLUS
+ - 1031:d801
+
+ * - 5
+ - LML33
+ - <any>
+
+ * - 6
+ - LML33R10
+ - 12f8:8a02
+
+ * - 7
+ - Buz
+ - 13ca:4231
+
+ * - 8
+ - 6-Eyes
+ - <any>
diff --git a/Documentation/admin-guide/mm/cma_debugfs.rst b/Documentation/admin-guide/mm/cma_debugfs.rst
index 4e06ffabd78a..7367e6294ef6 100644
--- a/Documentation/admin-guide/mm/cma_debugfs.rst
+++ b/Documentation/admin-guide/mm/cma_debugfs.rst
@@ -5,10 +5,10 @@ CMA Debugfs Interface
The CMA debugfs interface is useful to retrieve basic information out of the
different CMA areas and to test allocation/release in each of the areas.
-Each CMA zone represents a directory under <debugfs>/cma/, indexed by the
-kernel's CMA index. So the first CMA zone would be:
+Each CMA area represents a directory under <debugfs>/cma/, represented by
+its CMA name like below:
- <debugfs>/cma/cma-0
+ <debugfs>/cma/<cma_name>
The structure of the files created under that directory is as follows:
@@ -18,8 +18,8 @@ The structure of the files created under that directory is as follows:
- [RO] bitmap: The bitmap of page states in the zone.
- [WO] alloc: Allocate N pages from that CMA area. For example::
- echo 5 > <debugfs>/cma/cma-2/alloc
+ echo 5 > <debugfs>/cma/<cma_name>/alloc
-would try to allocate 5 pages from the cma-2 area.
+would try to allocate 5 pages from the 'cma_name' area.
- [WO] free: Free N pages from that CMA area, similar to the above.
diff --git a/Documentation/admin-guide/mm/concepts.rst b/Documentation/admin-guide/mm/concepts.rst
index c2531b14bf46..e796b0a7e4a5 100644
--- a/Documentation/admin-guide/mm/concepts.rst
+++ b/Documentation/admin-guide/mm/concepts.rst
@@ -1,5 +1,3 @@
-.. _mm_concepts:
-
=================
Concepts overview
=================
@@ -35,7 +33,7 @@ physical memory (demand paging) and provides a mechanism for the
protection and controlled sharing of data between processes.
With virtual memory, each and every memory access uses a virtual
-address. When the CPU decodes the an instruction that reads (or
+address. When the CPU decodes an instruction that reads (or
writes) from (or to) the system memory, it translates the `virtual`
address encoded in that instruction to a `physical` address that the
memory controller can understand.
@@ -86,16 +84,15 @@ memory with the huge pages. The first one is `HugeTLB filesystem`, or
hugetlbfs. It is a pseudo filesystem that uses RAM as its backing
store. For the files created in this filesystem the data resides in
the memory and mapped using huge pages. The hugetlbfs is described at
-:ref:`Documentation/admin-guide/mm/hugetlbpage.rst <hugetlbpage>`.
+Documentation/admin-guide/mm/hugetlbpage.rst.
Another, more recent, mechanism that enables use of the huge pages is
called `Transparent HugePages`, or THP. Unlike the hugetlbfs that
requires users and/or system administrators to configure what parts of
the system memory should and can be mapped by the huge pages, THP
manages such mappings transparently to the user and hence the
-name. See
-:ref:`Documentation/admin-guide/mm/transhuge.rst <admin_guide_transhuge>`
-for more details about THP.
+name. See Documentation/admin-guide/mm/transhuge.rst for more details
+about THP.
Zones
=====
@@ -125,8 +122,8 @@ processor. Each bank is referred to as a `node` and for each node Linux
constructs an independent memory management subsystem. A node has its
own set of zones, lists of free and used pages and various statistics
counters. You can find more details about NUMA in
-:ref:`Documentation/vm/numa.rst <numa>` and in
-:ref:`Documentation/admin-guide/mm/numa_memory_policy.rst <numa_memory_policy>`.
+Documentation/mm/numa.rst` and in
+Documentation/admin-guide/mm/numa_memory_policy.rst.
Page cache
==========
@@ -184,7 +181,7 @@ pages either asynchronously or synchronously, depending on the state
of the system. When the system is not loaded, most of the memory is free
and allocation requests will be satisfied immediately from the free
pages supply. As the load increases, the amount of the free pages goes
-down and when it reaches a certain threshold (high watermark), an
+down and when it reaches a certain threshold (low watermark), an
allocation request will awaken the ``kswapd`` daemon. It will
asynchronously scan memory pages and either just free them if the data
they contain is available elsewhere, or evict to the backing storage
diff --git a/Documentation/admin-guide/mm/damon/index.rst b/Documentation/admin-guide/mm/damon/index.rst
new file mode 100644
index 000000000000..33d37bb2fb4e
--- /dev/null
+++ b/Documentation/admin-guide/mm/damon/index.rst
@@ -0,0 +1,17 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
+DAMON: Data Access MONitor
+==========================
+
+:doc:`DAMON </mm/damon/index>` allows light-weight data access monitoring.
+Using DAMON, users can analyze the memory access patterns of their systems and
+optimize those.
+
+.. toctree::
+ :maxdepth: 2
+
+ start
+ usage
+ reclaim
+ lru_sort
diff --git a/Documentation/admin-guide/mm/damon/lru_sort.rst b/Documentation/admin-guide/mm/damon/lru_sort.rst
new file mode 100644
index 000000000000..7b0775d281b4
--- /dev/null
+++ b/Documentation/admin-guide/mm/damon/lru_sort.rst
@@ -0,0 +1,294 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============================
+DAMON-based LRU-lists Sorting
+=============================
+
+DAMON-based LRU-lists Sorting (DAMON_LRU_SORT) is a static kernel module that
+aimed to be used for proactive and lightweight data access pattern based
+(de)prioritization of pages on their LRU-lists for making LRU-lists a more
+trusworthy data access pattern source.
+
+Where Proactive LRU-lists Sorting is Required?
+==============================================
+
+As page-granularity access checking overhead could be significant on huge
+systems, LRU lists are normally not proactively sorted but partially and
+reactively sorted for special events including specific user requests, system
+calls and memory pressure. As a result, LRU lists are sometimes not so
+perfectly prepared to be used as a trustworthy access pattern source for some
+situations including reclamation target pages selection under sudden memory
+pressure.
+
+Because DAMON can identify access patterns of best-effort accuracy while
+inducing only user-specified range of overhead, proactively running
+DAMON_LRU_SORT could be helpful for making LRU lists more trustworthy access
+pattern source with low and controlled overhead.
+
+How It Works?
+=============
+
+DAMON_LRU_SORT finds hot pages (pages of memory regions that showing access
+rates that higher than a user-specified threshold) and cold pages (pages of
+memory regions that showing no access for a time that longer than a
+user-specified threshold) using DAMON, and prioritizes hot pages while
+deprioritizing cold pages on their LRU-lists. To avoid it consuming too much
+CPU for the prioritizations, a CPU time usage limit can be configured. Under
+the limit, it prioritizes and deprioritizes more hot and cold pages first,
+respectively. System administrators can also configure under what situation
+this scheme should automatically activated and deactivated with three memory
+pressure watermarks.
+
+Its default parameters for hotness/coldness thresholds and CPU quota limit are
+conservatively chosen. That is, the module under its default parameters could
+be widely used without harm for common situations while providing a level of
+benefits for systems having clear hot/cold access patterns under memory
+pressure while consuming only a limited small portion of CPU time.
+
+Interface: Module Parameters
+============================
+
+To use this feature, you should first ensure your system is running on a kernel
+that is built with ``CONFIG_DAMON_LRU_SORT=y``.
+
+To let sysadmins enable or disable it and tune for the given system,
+DAMON_LRU_SORT utilizes module parameters. That is, you can put
+``damon_lru_sort.<parameter>=<value>`` on the kernel boot command line or write
+proper values to ``/sys/module/damon_lru_sort/parameters/<parameter>`` files.
+
+Below are the description of each parameter.
+
+enabled
+-------
+
+Enable or disable DAMON_LRU_SORT.
+
+You can enable DAMON_LRU_SORT by setting the value of this parameter as ``Y``.
+Setting it as ``N`` disables DAMON_LRU_SORT. Note that DAMON_LRU_SORT could do
+no real monitoring and LRU-lists sorting due to the watermarks-based activation
+condition. Refer to below descriptions for the watermarks parameter for this.
+
+commit_inputs
+-------------
+
+Make DAMON_LRU_SORT reads the input parameters again, except ``enabled``.
+
+Input parameters that updated while DAMON_LRU_SORT is running are not applied
+by default. Once this parameter is set as ``Y``, DAMON_LRU_SORT reads values
+of parametrs except ``enabled`` again. Once the re-reading is done, this
+parameter is set as ``N``. If invalid parameters are found while the
+re-reading, DAMON_LRU_SORT will be disabled.
+
+hot_thres_access_freq
+---------------------
+
+Access frequency threshold for hot memory regions identification in permil.
+
+If a memory region is accessed in frequency of this or higher, DAMON_LRU_SORT
+identifies the region as hot, and mark it as accessed on the LRU list, so that
+it could not be reclaimed under memory pressure. 50% by default.
+
+cold_min_age
+------------
+
+Time threshold for cold memory regions identification in microseconds.
+
+If a memory region is not accessed for this or longer time, DAMON_LRU_SORT
+identifies the region as cold, and mark it as unaccessed on the LRU list, so
+that it could be reclaimed first under memory pressure. 120 seconds by
+default.
+
+quota_ms
+--------
+
+Limit of time for trying the LRU lists sorting in milliseconds.
+
+DAMON_LRU_SORT tries to use only up to this time within a time window
+(quota_reset_interval_ms) for trying LRU lists sorting. This can be used
+for limiting CPU consumption of DAMON_LRU_SORT. If the value is zero, the
+limit is disabled.
+
+10 ms by default.
+
+quota_reset_interval_ms
+-----------------------
+
+The time quota charge reset interval in milliseconds.
+
+The charge reset interval for the quota of time (quota_ms). That is,
+DAMON_LRU_SORT does not try LRU-lists sorting for more than quota_ms
+milliseconds or quota_sz bytes within quota_reset_interval_ms milliseconds.
+
+1 second by default.
+
+wmarks_interval
+---------------
+
+The watermarks check time interval in microseconds.
+
+Minimal time to wait before checking the watermarks, when DAMON_LRU_SORT is
+enabled but inactive due to its watermarks rule. 5 seconds by default.
+
+wmarks_high
+-----------
+
+Free memory rate (per thousand) for the high watermark.
+
+If free memory of the system in bytes per thousand bytes is higher than this,
+DAMON_LRU_SORT becomes inactive, so it does nothing but periodically checks the
+watermarks. 200 (20%) by default.
+
+wmarks_mid
+----------
+
+Free memory rate (per thousand) for the middle watermark.
+
+If free memory of the system in bytes per thousand bytes is between this and
+the low watermark, DAMON_LRU_SORT becomes active, so starts the monitoring and
+the LRU-lists sorting. 150 (15%) by default.
+
+wmarks_low
+----------
+
+Free memory rate (per thousand) for the low watermark.
+
+If free memory of the system in bytes per thousand bytes is lower than this,
+DAMON_LRU_SORT becomes inactive, so it does nothing but periodically checks the
+watermarks. 50 (5%) by default.
+
+sample_interval
+---------------
+
+Sampling interval for the monitoring in microseconds.
+
+The sampling interval of DAMON for the cold memory monitoring. Please refer to
+the DAMON documentation (:doc:`usage`) for more detail. 5ms by default.
+
+aggr_interval
+-------------
+
+Aggregation interval for the monitoring in microseconds.
+
+The aggregation interval of DAMON for the cold memory monitoring. Please
+refer to the DAMON documentation (:doc:`usage`) for more detail. 100ms by
+default.
+
+min_nr_regions
+--------------
+
+Minimum number of monitoring regions.
+
+The minimal number of monitoring regions of DAMON for the cold memory
+monitoring. This can be used to set lower-bound of the monitoring quality.
+But, setting this too high could result in increased monitoring overhead.
+Please refer to the DAMON documentation (:doc:`usage`) for more detail. 10 by
+default.
+
+max_nr_regions
+--------------
+
+Maximum number of monitoring regions.
+
+The maximum number of monitoring regions of DAMON for the cold memory
+monitoring. This can be used to set upper-bound of the monitoring overhead.
+However, setting this too low could result in bad monitoring quality. Please
+refer to the DAMON documentation (:doc:`usage`) for more detail. 1000 by
+defaults.
+
+monitor_region_start
+--------------------
+
+Start of target memory region in physical address.
+
+The start physical address of memory region that DAMON_LRU_SORT will do work
+against. By default, biggest System RAM is used as the region.
+
+monitor_region_end
+------------------
+
+End of target memory region in physical address.
+
+The end physical address of memory region that DAMON_LRU_SORT will do work
+against. By default, biggest System RAM is used as the region.
+
+kdamond_pid
+-----------
+
+PID of the DAMON thread.
+
+If DAMON_LRU_SORT is enabled, this becomes the PID of the worker thread. Else,
+-1.
+
+nr_lru_sort_tried_hot_regions
+-----------------------------
+
+Number of hot memory regions that tried to be LRU-sorted.
+
+bytes_lru_sort_tried_hot_regions
+--------------------------------
+
+Total bytes of hot memory regions that tried to be LRU-sorted.
+
+nr_lru_sorted_hot_regions
+-------------------------
+
+Number of hot memory regions that successfully be LRU-sorted.
+
+bytes_lru_sorted_hot_regions
+----------------------------
+
+Total bytes of hot memory regions that successfully be LRU-sorted.
+
+nr_hot_quota_exceeds
+--------------------
+
+Number of times that the time quota limit for hot regions have exceeded.
+
+nr_lru_sort_tried_cold_regions
+------------------------------
+
+Number of cold memory regions that tried to be LRU-sorted.
+
+bytes_lru_sort_tried_cold_regions
+---------------------------------
+
+Total bytes of cold memory regions that tried to be LRU-sorted.
+
+nr_lru_sorted_cold_regions
+--------------------------
+
+Number of cold memory regions that successfully be LRU-sorted.
+
+bytes_lru_sorted_cold_regions
+-----------------------------
+
+Total bytes of cold memory regions that successfully be LRU-sorted.
+
+nr_cold_quota_exceeds
+---------------------
+
+Number of times that the time quota limit for cold regions have exceeded.
+
+Example
+=======
+
+Below runtime example commands make DAMON_LRU_SORT to find memory regions
+having >=50% access frequency and LRU-prioritize while LRU-deprioritizing
+memory regions that not accessed for 120 seconds. The prioritization and
+deprioritization is limited to be done using only up to 1% CPU time to avoid
+DAMON_LRU_SORT consuming too much CPU time for the (de)prioritization. It also
+asks DAMON_LRU_SORT to do nothing if the system's free memory rate is more than
+50%, but start the real works if it becomes lower than 40%. If DAMON_RECLAIM
+doesn't make progress and therefore the free memory rate becomes lower than
+20%, it asks DAMON_LRU_SORT to do nothing again, so that we can fall back to
+the LRU-list based page granularity reclamation. ::
+
+ # cd /sys/module/damon_lru_sort/parameters
+ # echo 500 > hot_thres_access_freq
+ # echo 120000000 > cold_min_age
+ # echo 10 > quota_ms
+ # echo 1000 > quota_reset_interval_ms
+ # echo 500 > wmarks_high
+ # echo 400 > wmarks_mid
+ # echo 200 > wmarks_low
+ # echo Y > enabled
diff --git a/Documentation/admin-guide/mm/damon/reclaim.rst b/Documentation/admin-guide/mm/damon/reclaim.rst
new file mode 100644
index 000000000000..343e25b252f4
--- /dev/null
+++ b/Documentation/admin-guide/mm/damon/reclaim.rst
@@ -0,0 +1,274 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+DAMON-based Reclamation
+=======================
+
+DAMON-based Reclamation (DAMON_RECLAIM) is a static kernel module that aimed to
+be used for proactive and lightweight reclamation under light memory pressure.
+It doesn't aim to replace the LRU-list based page_granularity reclamation, but
+to be selectively used for different level of memory pressure and requirements.
+
+Where Proactive Reclamation is Required?
+========================================
+
+On general memory over-committed systems, proactively reclaiming cold pages
+helps saving memory and reducing latency spikes that incurred by the direct
+reclaim of the process or CPU consumption of kswapd, while incurring only
+minimal performance degradation [1]_ [2]_ .
+
+Free Pages Reporting [3]_ based memory over-commit virtualization systems are
+good example of the cases. In such systems, the guest VMs reports their free
+memory to host, and the host reallocates the reported memory to other guests.
+As a result, the memory of the systems are fully utilized. However, the
+guests could be not so memory-frugal, mainly because some kernel subsystems and
+user-space applications are designed to use as much memory as available. Then,
+guests could report only small amount of memory as free to host, results in
+memory utilization drop of the systems. Running the proactive reclamation in
+guests could mitigate this problem.
+
+How It Works?
+=============
+
+DAMON_RECLAIM finds memory regions that didn't accessed for specific time
+duration and page out. To avoid it consuming too much CPU for the paging out
+operation, a speed limit can be configured. Under the speed limit, it pages
+out memory regions that didn't accessed longer time first. System
+administrators can also configure under what situation this scheme should
+automatically activated and deactivated with three memory pressure watermarks.
+
+Interface: Module Parameters
+============================
+
+To use this feature, you should first ensure your system is running on a kernel
+that is built with ``CONFIG_DAMON_RECLAIM=y``.
+
+To let sysadmins enable or disable it and tune for the given system,
+DAMON_RECLAIM utilizes module parameters. That is, you can put
+``damon_reclaim.<parameter>=<value>`` on the kernel boot command line or write
+proper values to ``/sys/module/damon_reclaim/parameters/<parameter>`` files.
+
+Below are the description of each parameter.
+
+enabled
+-------
+
+Enable or disable DAMON_RECLAIM.
+
+You can enable DAMON_RCLAIM by setting the value of this parameter as ``Y``.
+Setting it as ``N`` disables DAMON_RECLAIM. Note that DAMON_RECLAIM could do
+no real monitoring and reclamation due to the watermarks-based activation
+condition. Refer to below descriptions for the watermarks parameter for this.
+
+commit_inputs
+-------------
+
+Make DAMON_RECLAIM reads the input parameters again, except ``enabled``.
+
+Input parameters that updated while DAMON_RECLAIM is running are not applied
+by default. Once this parameter is set as ``Y``, DAMON_RECLAIM reads values
+of parametrs except ``enabled`` again. Once the re-reading is done, this
+parameter is set as ``N``. If invalid parameters are found while the
+re-reading, DAMON_RECLAIM will be disabled.
+
+min_age
+-------
+
+Time threshold for cold memory regions identification in microseconds.
+
+If a memory region is not accessed for this or longer time, DAMON_RECLAIM
+identifies the region as cold, and reclaims it.
+
+120 seconds by default.
+
+quota_ms
+--------
+
+Limit of time for the reclamation in milliseconds.
+
+DAMON_RECLAIM tries to use only up to this time within a time window
+(quota_reset_interval_ms) for trying reclamation of cold pages. This can be
+used for limiting CPU consumption of DAMON_RECLAIM. If the value is zero, the
+limit is disabled.
+
+10 ms by default.
+
+quota_sz
+--------
+
+Limit of size of memory for the reclamation in bytes.
+
+DAMON_RECLAIM charges amount of memory which it tried to reclaim within a time
+window (quota_reset_interval_ms) and makes no more than this limit is tried.
+This can be used for limiting consumption of CPU and IO. If this value is
+zero, the limit is disabled.
+
+128 MiB by default.
+
+quota_reset_interval_ms
+-----------------------
+
+The time/size quota charge reset interval in milliseconds.
+
+The charget reset interval for the quota of time (quota_ms) and size
+(quota_sz). That is, DAMON_RECLAIM does not try reclamation for more than
+quota_ms milliseconds or quota_sz bytes within quota_reset_interval_ms
+milliseconds.
+
+1 second by default.
+
+wmarks_interval
+---------------
+
+Minimal time to wait before checking the watermarks, when DAMON_RECLAIM is
+enabled but inactive due to its watermarks rule.
+
+wmarks_high
+-----------
+
+Free memory rate (per thousand) for the high watermark.
+
+If free memory of the system in bytes per thousand bytes is higher than this,
+DAMON_RECLAIM becomes inactive, so it does nothing but only periodically checks
+the watermarks.
+
+wmarks_mid
+----------
+
+Free memory rate (per thousand) for the middle watermark.
+
+If free memory of the system in bytes per thousand bytes is between this and
+the low watermark, DAMON_RECLAIM becomes active, so starts the monitoring and
+the reclaiming.
+
+wmarks_low
+----------
+
+Free memory rate (per thousand) for the low watermark.
+
+If free memory of the system in bytes per thousand bytes is lower than this,
+DAMON_RECLAIM becomes inactive, so it does nothing but periodically checks the
+watermarks. In the case, the system falls back to the LRU-list based page
+granularity reclamation logic.
+
+sample_interval
+---------------
+
+Sampling interval for the monitoring in microseconds.
+
+The sampling interval of DAMON for the cold memory monitoring. Please refer to
+the DAMON documentation (:doc:`usage`) for more detail.
+
+aggr_interval
+-------------
+
+Aggregation interval for the monitoring in microseconds.
+
+The aggregation interval of DAMON for the cold memory monitoring. Please
+refer to the DAMON documentation (:doc:`usage`) for more detail.
+
+min_nr_regions
+--------------
+
+Minimum number of monitoring regions.
+
+The minimal number of monitoring regions of DAMON for the cold memory
+monitoring. This can be used to set lower-bound of the monitoring quality.
+But, setting this too high could result in increased monitoring overhead.
+Please refer to the DAMON documentation (:doc:`usage`) for more detail.
+
+max_nr_regions
+--------------
+
+Maximum number of monitoring regions.
+
+The maximum number of monitoring regions of DAMON for the cold memory
+monitoring. This can be used to set upper-bound of the monitoring overhead.
+However, setting this too low could result in bad monitoring quality. Please
+refer to the DAMON documentation (:doc:`usage`) for more detail.
+
+monitor_region_start
+--------------------
+
+Start of target memory region in physical address.
+
+The start physical address of memory region that DAMON_RECLAIM will do work
+against. That is, DAMON_RECLAIM will find cold memory regions in this region
+and reclaims. By default, biggest System RAM is used as the region.
+
+monitor_region_end
+------------------
+
+End of target memory region in physical address.
+
+The end physical address of memory region that DAMON_RECLAIM will do work
+against. That is, DAMON_RECLAIM will find cold memory regions in this region
+and reclaims. By default, biggest System RAM is used as the region.
+
+skip_anon
+---------
+
+Skip anonymous pages reclamation.
+
+If this parameter is set as ``Y``, DAMON_RECLAIM does not reclaim anonymous
+pages. By default, ``N``.
+
+
+kdamond_pid
+-----------
+
+PID of the DAMON thread.
+
+If DAMON_RECLAIM is enabled, this becomes the PID of the worker thread. Else,
+-1.
+
+nr_reclaim_tried_regions
+------------------------
+
+Number of memory regions that tried to be reclaimed by DAMON_RECLAIM.
+
+bytes_reclaim_tried_regions
+---------------------------
+
+Total bytes of memory regions that tried to be reclaimed by DAMON_RECLAIM.
+
+nr_reclaimed_regions
+--------------------
+
+Number of memory regions that successfully be reclaimed by DAMON_RECLAIM.
+
+bytes_reclaimed_regions
+-----------------------
+
+Total bytes of memory regions that successfully be reclaimed by DAMON_RECLAIM.
+
+nr_quota_exceeds
+----------------
+
+Number of times that the time/space quota limits have exceeded.
+
+Example
+=======
+
+Below runtime example commands make DAMON_RECLAIM to find memory regions that
+not accessed for 30 seconds or more and pages out. The reclamation is limited
+to be done only up to 1 GiB per second to avoid DAMON_RECLAIM consuming too
+much CPU time for the paging out operation. It also asks DAMON_RECLAIM to do
+nothing if the system's free memory rate is more than 50%, but start the real
+works if it becomes lower than 40%. If DAMON_RECLAIM doesn't make progress and
+therefore the free memory rate becomes lower than 20%, it asks DAMON_RECLAIM to
+do nothing again, so that we can fall back to the LRU-list based page
+granularity reclamation. ::
+
+ # cd /sys/module/damon_reclaim/parameters
+ # echo 30000000 > min_age
+ # echo $((1 * 1024 * 1024 * 1024)) > quota_sz
+ # echo 1000 > quota_reset_interval_ms
+ # echo 500 > wmarks_high
+ # echo 400 > wmarks_mid
+ # echo 200 > wmarks_low
+ # echo Y > enabled
+
+.. [1] https://research.google/pubs/pub48551/
+.. [2] https://lwn.net/Articles/787611/
+.. [3] https://www.kernel.org/doc/html/latest/mm/free_page_reporting.html
diff --git a/Documentation/admin-guide/mm/damon/start.rst b/Documentation/admin-guide/mm/damon/start.rst
new file mode 100644
index 000000000000..7aa0071ff1c3
--- /dev/null
+++ b/Documentation/admin-guide/mm/damon/start.rst
@@ -0,0 +1,127 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+Getting Started
+===============
+
+This document briefly describes how you can use DAMON by demonstrating its
+default user space tool. Please note that this document describes only a part
+of its features for brevity. Please refer to the usage `doc
+<https://github.com/awslabs/damo/blob/next/USAGE.md>`_ of the tool for more
+details.
+
+
+Prerequisites
+=============
+
+Kernel
+------
+
+You should first ensure your system is running on a kernel built with
+``CONFIG_DAMON_*=y``.
+
+
+User Space Tool
+---------------
+
+For the demonstration, we will use the default user space tool for DAMON,
+called DAMON Operator (DAMO). It is available at
+https://github.com/awslabs/damo. The examples below assume that ``damo`` is on
+your ``$PATH``. It's not mandatory, though.
+
+Because DAMO is using the sysfs interface (refer to :doc:`usage` for the
+detail) of DAMON, you should ensure :doc:`sysfs </filesystems/sysfs>` is
+mounted.
+
+
+Recording Data Access Patterns
+==============================
+
+The commands below record the memory access patterns of a program and save the
+monitoring results to a file. ::
+
+ $ git clone https://github.com/sjp38/masim
+ $ cd masim; make; ./masim ./configs/zigzag.cfg &
+ $ sudo damo record -o damon.data $(pidof masim)
+
+The first two lines of the commands download an artificial memory access
+generator program and run it in the background. The generator will repeatedly
+access two 100 MiB sized memory regions one by one. You can substitute this
+with your real workload. The last line asks ``damo`` to record the access
+pattern in the ``damon.data`` file.
+
+
+Visualizing Recorded Patterns
+=============================
+
+You can visualize the pattern in a heatmap, showing which memory region
+(x-axis) got accessed when (y-axis) and how frequently (number).::
+
+ $ sudo damo report heats --heatmap stdout
+ 22222222222222222222222222222222222222211111111111111111111111111111111111111100
+ 44444444444444444444444444444444444444434444444444444444444444444444444444443200
+ 44444444444444444444444444444444444444433444444444444444444444444444444444444200
+ 33333333333333333333333333333333333333344555555555555555555555555555555555555200
+ 33333333333333333333333333333333333344444444444444444444444444444444444444444200
+ 22222222222222222222222222222222222223355555555555555555555555555555555555555200
+ 00000000000000000000000000000000000000288888888888888888888888888888888888888400
+ 00000000000000000000000000000000000000288888888888888888888888888888888888888400
+ 33333333333333333333333333333333333333355555555555555555555555555555555555555200
+ 88888888888888888888888888888888888888600000000000000000000000000000000000000000
+ 88888888888888888888888888888888888888600000000000000000000000000000000000000000
+ 33333333333333333333333333333333333333444444444444444444444444444444444444443200
+ 00000000000000000000000000000000000000288888888888888888888888888888888888888400
+ [...]
+ # access_frequency: 0 1 2 3 4 5 6 7 8 9
+ # x-axis: space (139728247021568-139728453431248: 196.848 MiB)
+ # y-axis: time (15256597248362-15326899978162: 1 m 10.303 s)
+ # resolution: 80x40 (2.461 MiB and 1.758 s for each character)
+
+You can also visualize the distribution of the working set size, sorted by the
+size.::
+
+ $ sudo damo report wss --range 0 101 10
+ # <percentile> <wss>
+ # target_id 18446632103789443072
+ # avr: 107.708 MiB
+ 0 0 B | |
+ 10 95.328 MiB |**************************** |
+ 20 95.332 MiB |**************************** |
+ 30 95.340 MiB |**************************** |
+ 40 95.387 MiB |**************************** |
+ 50 95.387 MiB |**************************** |
+ 60 95.398 MiB |**************************** |
+ 70 95.398 MiB |**************************** |
+ 80 95.504 MiB |**************************** |
+ 90 190.703 MiB |********************************************************* |
+ 100 196.875 MiB |***********************************************************|
+
+Using ``--sortby`` option with the above command, you can show how the working
+set size has chronologically changed.::
+
+ $ sudo damo report wss --range 0 101 10 --sortby time
+ # <percentile> <wss>
+ # target_id 18446632103789443072
+ # avr: 107.708 MiB
+ 0 3.051 MiB | |
+ 10 190.703 MiB |***********************************************************|
+ 20 95.336 MiB |***************************** |
+ 30 95.328 MiB |***************************** |
+ 40 95.387 MiB |***************************** |
+ 50 95.332 MiB |***************************** |
+ 60 95.320 MiB |***************************** |
+ 70 95.398 MiB |***************************** |
+ 80 95.398 MiB |***************************** |
+ 90 95.340 MiB |***************************** |
+ 100 95.398 MiB |***************************** |
+
+
+Data Access Pattern Aware Memory Management
+===========================================
+
+Below command makes every memory region of size >=4K that has not accessed for
+>=60 seconds in your workload to be swapped out. ::
+
+ $ sudo damo schemes --damos_access_rate 0 0 --damos_sz_region 4K max \
+ --damos_age 60s max --damos_action pageout \
+ <pid of your workload>
diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst
new file mode 100644
index 000000000000..9d23144bf985
--- /dev/null
+++ b/Documentation/admin-guide/mm/damon/usage.rst
@@ -0,0 +1,911 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+Detailed Usages
+===============
+
+DAMON provides below interfaces for different users.
+
+- *DAMON user space tool.*
+ `This <https://github.com/awslabs/damo>`_ is for privileged people such as
+ system administrators who want a just-working human-friendly interface.
+ Using this, users can use the DAMON’s major features in a human-friendly way.
+ It may not be highly tuned for special cases, though. For more detail,
+ please refer to its `usage document
+ <https://github.com/awslabs/damo/blob/next/USAGE.md>`_.
+- *sysfs interface.*
+ :ref:`This <sysfs_interface>` is for privileged user space programmers who
+ want more optimized use of DAMON. Using this, users can use DAMON’s major
+ features by reading from and writing to special sysfs files. Therefore,
+ you can write and use your personalized DAMON sysfs wrapper programs that
+ reads/writes the sysfs files instead of you. The `DAMON user space tool
+ <https://github.com/awslabs/damo>`_ is one example of such programs.
+- *Kernel Space Programming Interface.*
+ :doc:`This </mm/damon/api>` is for kernel space programmers. Using this,
+ users can utilize every feature of DAMON most flexibly and efficiently by
+ writing kernel space DAMON application programs for you. You can even extend
+ DAMON for various address spaces. For detail, please refer to the interface
+ :doc:`document </mm/damon/api>`.
+- *debugfs interface. (DEPRECATED!)*
+ :ref:`This <debugfs_interface>` is almost identical to :ref:`sysfs interface
+ <sysfs_interface>`. This is deprecated, so users should move to the
+ :ref:`sysfs interface <sysfs_interface>`. If you depend on this and cannot
+ move, please report your usecase to damon@lists.linux.dev and
+ linux-mm@kvack.org.
+
+.. _sysfs_interface:
+
+sysfs Interface
+===============
+
+DAMON sysfs interface is built when ``CONFIG_DAMON_SYSFS`` is defined. It
+creates multiple directories and files under its sysfs directory,
+``<sysfs>/kernel/mm/damon/``. You can control DAMON by writing to and reading
+from the files under the directory.
+
+For a short example, users can monitor the virtual address space of a given
+workload as below. ::
+
+ # cd /sys/kernel/mm/damon/admin/
+ # echo 1 > kdamonds/nr_kdamonds && echo 1 > kdamonds/0/contexts/nr_contexts
+ # echo vaddr > kdamonds/0/contexts/0/operations
+ # echo 1 > kdamonds/0/contexts/0/targets/nr_targets
+ # echo $(pidof <workload>) > kdamonds/0/contexts/0/targets/0/pid_target
+ # echo on > kdamonds/0/state
+
+Files Hierarchy
+---------------
+
+The files hierarchy of DAMON sysfs interface is shown below. In the below
+figure, parents-children relations are represented with indentations, each
+directory is having ``/`` suffix, and files in each directory are separated by
+comma (",").
+
+.. parsed-literal::
+
+ :ref:`/sys/kernel/mm/damon <sysfs_root>`/admin
+ │ :ref:`kdamonds <sysfs_kdamonds>`/nr_kdamonds
+ │ │ :ref:`0 <sysfs_kdamond>`/state,pid
+ │ │ │ :ref:`contexts <sysfs_contexts>`/nr_contexts
+ │ │ │ │ :ref:`0 <sysfs_context>`/avail_operations,operations
+ │ │ │ │ │ :ref:`monitoring_attrs <sysfs_monitoring_attrs>`/
+ │ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
+ │ │ │ │ │ │ nr_regions/min,max
+ │ │ │ │ │ :ref:`targets <sysfs_targets>`/nr_targets
+ │ │ │ │ │ │ :ref:`0 <sysfs_target>`/pid_target
+ │ │ │ │ │ │ │ :ref:`regions <sysfs_regions>`/nr_regions
+ │ │ │ │ │ │ │ │ :ref:`0 <sysfs_region>`/start,end
+ │ │ │ │ │ │ │ │ ...
+ │ │ │ │ │ │ ...
+ │ │ │ │ │ :ref:`schemes <sysfs_schemes>`/nr_schemes
+ │ │ │ │ │ │ :ref:`0 <sysfs_scheme>`/action,apply_interval_us
+ │ │ │ │ │ │ │ :ref:`access_pattern <sysfs_access_pattern>`/
+ │ │ │ │ │ │ │ │ sz/min,max
+ │ │ │ │ │ │ │ │ nr_accesses/min,max
+ │ │ │ │ │ │ │ │ age/min,max
+ │ │ │ │ │ │ │ :ref:`quotas <sysfs_quotas>`/ms,bytes,reset_interval_ms
+ │ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
+ │ │ │ │ │ │ │ │ :ref:`goals <sysfs_schemes_quota_goals>`/nr_goals
+ │ │ │ │ │ │ │ │ │ 0/target_value,current_value
+ │ │ │ │ │ │ │ :ref:`watermarks <sysfs_watermarks>`/metric,interval_us,high,mid,low
+ │ │ │ │ │ │ │ :ref:`filters <sysfs_filters>`/nr_filters
+ │ │ │ │ │ │ │ │ 0/type,matching,memcg_id
+ │ │ │ │ │ │ │ :ref:`stats <sysfs_schemes_stats>`/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds
+ │ │ │ │ │ │ │ :ref:`tried_regions <sysfs_schemes_tried_regions>`/total_bytes
+ │ │ │ │ │ │ │ │ 0/start,end,nr_accesses,age
+ │ │ │ │ │ │ │ │ ...
+ │ │ │ │ │ │ ...
+ │ │ │ │ ...
+ │ │ ...
+
+.. _sysfs_root:
+
+Root
+----
+
+The root of the DAMON sysfs interface is ``<sysfs>/kernel/mm/damon/``, and it
+has one directory named ``admin``. The directory contains the files for
+privileged user space programs' control of DAMON. User space tools or daemons
+having the root permission could use this directory.
+
+.. _sysfs_kdamonds:
+
+kdamonds/
+---------
+
+Under the ``admin`` directory, one directory, ``kdamonds``, which has files for
+controlling the kdamonds (refer to
+:ref:`design <damon_design_execution_model_and_data_structures>` for more
+details) exists. In the beginning, this directory has only one file,
+``nr_kdamonds``. Writing a number (``N``) to the file creates the number of
+child directories named ``0`` to ``N-1``. Each directory represents each
+kdamond.
+
+.. _sysfs_kdamond:
+
+kdamonds/<N>/
+-------------
+
+In each kdamond directory, two files (``state`` and ``pid``) and one directory
+(``contexts``) exist.
+
+Reading ``state`` returns ``on`` if the kdamond is currently running, or
+``off`` if it is not running.
+
+Users can write below commands for the kdamond to the ``state`` file.
+
+- ``on``: Start running.
+- ``off``: Stop running.
+- ``commit``: Read the user inputs in the sysfs files except ``state`` file
+ again.
+- ``commit_schemes_quota_goals``: Read the DAMON-based operation schemes'
+ :ref:`quota goals <sysfs_schemes_quota_goals>`.
+- ``update_schemes_stats``: Update the contents of stats files for each
+ DAMON-based operation scheme of the kdamond. For details of the stats,
+ please refer to :ref:`stats section <sysfs_schemes_stats>`.
+- ``update_schemes_tried_regions``: Update the DAMON-based operation scheme
+ action tried regions directory for each DAMON-based operation scheme of the
+ kdamond. For details of the DAMON-based operation scheme action tried
+ regions directory, please refer to
+ :ref:`tried_regions section <sysfs_schemes_tried_regions>`.
+- ``update_schemes_tried_bytes``: Update only ``.../tried_regions/total_bytes``
+ files.
+- ``clear_schemes_tried_regions``: Clear the DAMON-based operating scheme
+ action tried regions directory for each DAMON-based operation scheme of the
+ kdamond.
+
+If the state is ``on``, reading ``pid`` shows the pid of the kdamond thread.
+
+``contexts`` directory contains files for controlling the monitoring contexts
+that this kdamond will execute.
+
+.. _sysfs_contexts:
+
+kdamonds/<N>/contexts/
+----------------------
+
+In the beginning, this directory has only one file, ``nr_contexts``. Writing a
+number (``N``) to the file creates the number of child directories named as
+``0`` to ``N-1``. Each directory represents each monitoring context (refer to
+:ref:`design <damon_design_execution_model_and_data_structures>` for more
+details). At the moment, only one context per kdamond is supported, so only
+``0`` or ``1`` can be written to the file.
+
+.. _sysfs_context:
+
+contexts/<N>/
+-------------
+
+In each context directory, two files (``avail_operations`` and ``operations``)
+and three directories (``monitoring_attrs``, ``targets``, and ``schemes``)
+exist.
+
+DAMON supports multiple types of monitoring operations, including those for
+virtual address space and the physical address space. You can get the list of
+available monitoring operations set on the currently running kernel by reading
+``avail_operations`` file. Based on the kernel configuration, the file will
+list some or all of below keywords.
+
+ - vaddr: Monitor virtual address spaces of specific processes
+ - fvaddr: Monitor fixed virtual address ranges
+ - paddr: Monitor the physical address space of the system
+
+Please refer to :ref:`regions sysfs directory <sysfs_regions>` for detailed
+differences between the operations sets in terms of the monitoring target
+regions.
+
+You can set and get what type of monitoring operations DAMON will use for the
+context by writing one of the keywords listed in ``avail_operations`` file and
+reading from the ``operations`` file.
+
+.. _sysfs_monitoring_attrs:
+
+contexts/<N>/monitoring_attrs/
+------------------------------
+
+Files for specifying attributes of the monitoring including required quality
+and efficiency of the monitoring are in ``monitoring_attrs`` directory.
+Specifically, two directories, ``intervals`` and ``nr_regions`` exist in this
+directory.
+
+Under ``intervals`` directory, three files for DAMON's sampling interval
+(``sample_us``), aggregation interval (``aggr_us``), and update interval
+(``update_us``) exist. You can set and get the values in micro-seconds by
+writing to and reading from the files.
+
+Under ``nr_regions`` directory, two files for the lower-bound and upper-bound
+of DAMON's monitoring regions (``min`` and ``max``, respectively), which
+controls the monitoring overhead, exist. You can set and get the values by
+writing to and rading from the files.
+
+For more details about the intervals and monitoring regions range, please refer
+to the Design document (:doc:`/mm/damon/design`).
+
+.. _sysfs_targets:
+
+contexts/<N>/targets/
+---------------------
+
+In the beginning, this directory has only one file, ``nr_targets``. Writing a
+number (``N``) to the file creates the number of child directories named ``0``
+to ``N-1``. Each directory represents each monitoring target.
+
+.. _sysfs_target:
+
+targets/<N>/
+------------
+
+In each target directory, one file (``pid_target``) and one directory
+(``regions``) exist.
+
+If you wrote ``vaddr`` to the ``contexts/<N>/operations``, each target should
+be a process. You can specify the process to DAMON by writing the pid of the
+process to the ``pid_target`` file.
+
+.. _sysfs_regions:
+
+targets/<N>/regions
+-------------------
+
+When ``vaddr`` monitoring operations set is being used (``vaddr`` is written to
+the ``contexts/<N>/operations`` file), DAMON automatically sets and updates the
+monitoring target regions so that entire memory mappings of target processes
+can be covered. However, users could want to set the initial monitoring region
+to specific address ranges.
+
+In contrast, DAMON do not automatically sets and updates the monitoring target
+regions when ``fvaddr`` or ``paddr`` monitoring operations sets are being used
+(``fvaddr`` or ``paddr`` have written to the ``contexts/<N>/operations``).
+Therefore, users should set the monitoring target regions by themselves in the
+cases.
+
+For such cases, users can explicitly set the initial monitoring target regions
+as they want, by writing proper values to the files under this directory.
+
+In the beginning, this directory has only one file, ``nr_regions``. Writing a
+number (``N``) to the file creates the number of child directories named ``0``
+to ``N-1``. Each directory represents each initial monitoring target region.
+
+.. _sysfs_region:
+
+regions/<N>/
+------------
+
+In each region directory, you will find two files (``start`` and ``end``). You
+can set and get the start and end addresses of the initial monitoring target
+region by writing to and reading from the files, respectively.
+
+Each region should not overlap with others. ``end`` of directory ``N`` should
+be equal or smaller than ``start`` of directory ``N+1``.
+
+.. _sysfs_schemes:
+
+contexts/<N>/schemes/
+---------------------
+
+The directory for DAMON-based Operation Schemes (:ref:`DAMOS
+<damon_design_damos>`). Users can get and set the schemes by reading from and
+writing to files under this directory.
+
+In the beginning, this directory has only one file, ``nr_schemes``. Writing a
+number (``N``) to the file creates the number of child directories named ``0``
+to ``N-1``. Each directory represents each DAMON-based operation scheme.
+
+.. _sysfs_scheme:
+
+schemes/<N>/
+------------
+
+In each scheme directory, five directories (``access_pattern``, ``quotas``,
+``watermarks``, ``filters``, ``stats``, and ``tried_regions``) and two files
+(``action`` and ``apply_interval``) exist.
+
+The ``action`` file is for setting and getting the scheme's :ref:`action
+<damon_design_damos_action>`. The keywords that can be written to and read
+from the file and their meaning are as below.
+
+Note that support of each action depends on the running DAMON operations set
+:ref:`implementation <sysfs_context>`.
+
+ - ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``.
+ Supported by ``vaddr`` and ``fvaddr`` operations set.
+ - ``cold``: Call ``madvise()`` for the region with ``MADV_COLD``.
+ Supported by ``vaddr`` and ``fvaddr`` operations set.
+ - ``pageout``: Call ``madvise()`` for the region with ``MADV_PAGEOUT``.
+ Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
+ - ``hugepage``: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``.
+ Supported by ``vaddr`` and ``fvaddr`` operations set.
+ - ``nohugepage``: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``.
+ Supported by ``vaddr`` and ``fvaddr`` operations set.
+ - ``lru_prio``: Prioritize the region on its LRU lists.
+ Supported by ``paddr`` operations set.
+ - ``lru_deprio``: Deprioritize the region on its LRU lists.
+ Supported by ``paddr`` operations set.
+ - ``stat``: Do nothing but count the statistics.
+ Supported by all operations sets.
+
+The ``apply_interval_us`` file is for setting and getting the scheme's
+:ref:`apply_interval <damon_design_damos>` in microseconds.
+
+.. _sysfs_access_pattern:
+
+schemes/<N>/access_pattern/
+---------------------------
+
+The directory for the target access :ref:`pattern
+<damon_design_damos_access_pattern>` of the given DAMON-based operation scheme.
+
+Under the ``access_pattern`` directory, three directories (``sz``,
+``nr_accesses``, and ``age``) each having two files (``min`` and ``max``)
+exist. You can set and get the access pattern for the given scheme by writing
+to and reading from the ``min`` and ``max`` files under ``sz``,
+``nr_accesses``, and ``age`` directories, respectively. Note that the ``min``
+and the ``max`` form a closed interval.
+
+.. _sysfs_quotas:
+
+schemes/<N>/quotas/
+-------------------
+
+The directory for the :ref:`quotas <damon_design_damos_quotas>` of the given
+DAMON-based operation scheme.
+
+Under ``quotas`` directory, three files (``ms``, ``bytes``,
+``reset_interval_ms``) and two directores (``weights`` and ``goals``) exist.
+
+You can set the ``time quota`` in milliseconds, ``size quota`` in bytes, and
+``reset interval`` in milliseconds by writing the values to the three files,
+respectively. Then, DAMON tries to use only up to ``time quota`` milliseconds
+for applying the ``action`` to memory regions of the ``access_pattern``, and to
+apply the action to only up to ``bytes`` bytes of memory regions within the
+``reset_interval_ms``. Setting both ``ms`` and ``bytes`` zero disables the
+quota limits.
+
+Under ``weights`` directory, three files (``sz_permil``,
+``nr_accesses_permil``, and ``age_permil``) exist.
+You can set the :ref:`prioritization weights
+<damon_design_damos_quotas_prioritization>` for size, access frequency, and age
+in per-thousand unit by writing the values to the three files under the
+``weights`` directory.
+
+.. _sysfs_schemes_quota_goals:
+
+schemes/<N>/quotas/goals/
+-------------------------
+
+The directory for the :ref:`automatic quota tuning goals
+<damon_design_damos_quotas_auto_tuning>` of the given DAMON-based operation
+scheme.
+
+In the beginning, this directory has only one file, ``nr_goals``. Writing a
+number (``N``) to the file creates the number of child directories named ``0``
+to ``N-1``. Each directory represents each goal and current achievement.
+Among the multiple feedback, the best one is used.
+
+Each goal directory contains two files, namely ``target_value`` and
+``current_value``. Users can set and get any number to those files to set the
+feedback. User space main workload's latency or throughput, system metrics
+like free memory ratio or memory pressure stall time (PSI) could be example
+metrics for the values. Note that users should write
+``commit_schemes_quota_goals`` to the ``state`` file of the :ref:`kdamond
+directory <sysfs_kdamond>` to pass the feedback to DAMON.
+
+.. _sysfs_watermarks:
+
+schemes/<N>/watermarks/
+-----------------------
+
+The directory for the :ref:`watermarks <damon_design_damos_watermarks>` of the
+given DAMON-based operation scheme.
+
+Under the watermarks directory, five files (``metric``, ``interval_us``,
+``high``, ``mid``, and ``low``) for setting the metric, the time interval
+between check of the metric, and the three watermarks exist. You can set and
+get the five values by writing to the files, respectively.
+
+Keywords and meanings of those that can be written to the ``metric`` file are
+as below.
+
+ - none: Ignore the watermarks
+ - free_mem_rate: System's free memory rate (per thousand)
+
+The ``interval`` should written in microseconds unit.
+
+.. _sysfs_filters:
+
+schemes/<N>/filters/
+--------------------
+
+The directory for the :ref:`filters <damon_design_damos_filters>` of the given
+DAMON-based operation scheme.
+
+In the beginning, this directory has only one file, ``nr_filters``. Writing a
+number (``N``) to the file creates the number of child directories named ``0``
+to ``N-1``. Each directory represents each filter. The filters are evaluated
+in the numeric order.
+
+Each filter directory contains six files, namely ``type``, ``matcing``,
+``memcg_path``, ``addr_start``, ``addr_end``, and ``target_idx``. To ``type``
+file, you can write one of four special keywords: ``anon`` for anonymous pages,
+``memcg`` for specific memory cgroup, ``addr`` for specific address range (an
+open-ended interval), or ``target`` for specific DAMON monitoring target
+filtering. In case of the memory cgroup filtering, you can specify the memory
+cgroup of the interest by writing the path of the memory cgroup from the
+cgroups mount point to ``memcg_path`` file. In case of the address range
+filtering, you can specify the start and end address of the range to
+``addr_start`` and ``addr_end`` files, respectively. For the DAMON monitoring
+target filtering, you can specify the index of the target between the list of
+the DAMON context's monitoring targets list to ``target_idx`` file. You can
+write ``Y`` or ``N`` to ``matching`` file to filter out pages that does or does
+not match to the type, respectively. Then, the scheme's action will not be
+applied to the pages that specified to be filtered out.
+
+For example, below restricts a DAMOS action to be applied to only non-anonymous
+pages of all memory cgroups except ``/having_care_already``.::
+
+ # echo 2 > nr_filters
+ # # filter out anonymous pages
+ echo anon > 0/type
+ echo Y > 0/matching
+ # # further filter out all cgroups except one at '/having_care_already'
+ echo memcg > 1/type
+ echo /having_care_already > 1/memcg_path
+ echo N > 1/matching
+
+Note that ``anon`` and ``memcg`` filters are currently supported only when
+``paddr`` :ref:`implementation <sysfs_context>` is being used.
+
+Also, memory regions that are filtered out by ``addr`` or ``target`` filters
+are not counted as the scheme has tried to those, while regions that filtered
+out by other type filters are counted as the scheme has tried to. The
+difference is applied to :ref:`stats <damos_stats>` and
+:ref:`tried regions <sysfs_schemes_tried_regions>`.
+
+.. _sysfs_schemes_stats:
+
+schemes/<N>/stats/
+------------------
+
+DAMON counts the total number and bytes of regions that each scheme is tried to
+be applied, the two numbers for the regions that each scheme is successfully
+applied, and the total number of the quota limit exceeds. This statistics can
+be used for online analysis or tuning of the schemes.
+
+The statistics can be retrieved by reading the files under ``stats`` directory
+(``nr_tried``, ``sz_tried``, ``nr_applied``, ``sz_applied``, and
+``qt_exceeds``), respectively. The files are not updated in real time, so you
+should ask DAMON sysfs interface to update the content of the files for the
+stats by writing a special keyword, ``update_schemes_stats`` to the relevant
+``kdamonds/<N>/state`` file.
+
+.. _sysfs_schemes_tried_regions:
+
+schemes/<N>/tried_regions/
+--------------------------
+
+This directory initially has one file, ``total_bytes``.
+
+When a special keyword, ``update_schemes_tried_regions``, is written to the
+relevant ``kdamonds/<N>/state`` file, DAMON updates the ``total_bytes`` file so
+that reading it returns the total size of the scheme tried regions, and creates
+directories named integer starting from ``0`` under this directory. Each
+directory contains files exposing detailed information about each of the memory
+region that the corresponding scheme's ``action`` has tried to be applied under
+this directory, during next :ref:`apply interval <damon_design_damos>` of the
+corresponding scheme. The information includes address range, ``nr_accesses``,
+and ``age`` of the region.
+
+Writing ``update_schemes_tried_bytes`` to the relevant ``kdamonds/<N>/state``
+file will only update the ``total_bytes`` file, and will not create the
+subdirectories.
+
+The directories will be removed when another special keyword,
+``clear_schemes_tried_regions``, is written to the relevant
+``kdamonds/<N>/state`` file.
+
+The expected usage of this directory is investigations of schemes' behaviors,
+and query-like efficient data access monitoring results retrievals. For the
+latter use case, in particular, users can set the ``action`` as ``stat`` and
+set the ``access pattern`` as their interested pattern that they want to query.
+
+.. _sysfs_schemes_tried_region:
+
+tried_regions/<N>/
+------------------
+
+In each region directory, you will find four files (``start``, ``end``,
+``nr_accesses``, and ``age``). Reading the files will show the start and end
+addresses, ``nr_accesses``, and ``age`` of the region that corresponding
+DAMON-based operation scheme ``action`` has tried to be applied.
+
+Example
+~~~~~~~
+
+Below commands applies a scheme saying "If a memory region of size in [4KiB,
+8KiB] is showing accesses per aggregate interval in [0, 5] for aggregate
+interval in [10, 20], page out the region. For the paging out, use only up to
+10ms per second, and also don't page out more than 1GiB per second. Under the
+limitation, page out memory regions having longer age first. Also, check the
+free memory rate of the system every 5 seconds, start the monitoring and paging
+out when the free memory rate becomes lower than 50%, but stop it if the free
+memory rate becomes larger than 60%, or lower than 30%". ::
+
+ # cd <sysfs>/kernel/mm/damon/admin
+ # # populate directories
+ # echo 1 > kdamonds/nr_kdamonds; echo 1 > kdamonds/0/contexts/nr_contexts;
+ # echo 1 > kdamonds/0/contexts/0/schemes/nr_schemes
+ # cd kdamonds/0/contexts/0/schemes/0
+ # # set the basic access pattern and the action
+ # echo 4096 > access_pattern/sz/min
+ # echo 8192 > access_pattern/sz/max
+ # echo 0 > access_pattern/nr_accesses/min
+ # echo 5 > access_pattern/nr_accesses/max
+ # echo 10 > access_pattern/age/min
+ # echo 20 > access_pattern/age/max
+ # echo pageout > action
+ # # set quotas
+ # echo 10 > quotas/ms
+ # echo $((1024*1024*1024)) > quotas/bytes
+ # echo 1000 > quotas/reset_interval_ms
+ # # set watermark
+ # echo free_mem_rate > watermarks/metric
+ # echo 5000000 > watermarks/interval_us
+ # echo 600 > watermarks/high
+ # echo 500 > watermarks/mid
+ # echo 300 > watermarks/low
+
+Please note that it's highly recommended to use user space tools like `damo
+<https://github.com/awslabs/damo>`_ rather than manually reading and writing
+the files as above. Above is only for an example.
+
+.. _tracepoint:
+
+Tracepoints for Monitoring Results
+==================================
+
+Users can get the monitoring results via the :ref:`tried_regions
+<sysfs_schemes_tried_regions>`. The interface is useful for getting a
+snapshot, but it could be inefficient for fully recording all the monitoring
+results. For the purpose, two trace points, namely ``damon:damon_aggregated``
+and ``damon:damos_before_apply``, are provided. ``damon:damon_aggregated``
+provides the whole monitoring results, while ``damon:damos_before_apply``
+provides the monitoring results for regions that each DAMON-based Operation
+Scheme (:ref:`DAMOS <damon_design_damos>`) is gonna be applied. Hence,
+``damon:damos_before_apply`` is more useful for recording internal behavior of
+DAMOS, or DAMOS target access
+:ref:`pattern <damon_design_damos_access_pattern>` based query-like efficient
+monitoring results recording.
+
+While the monitoring is turned on, you could record the tracepoint events and
+show results using tracepoint supporting tools like ``perf``. For example::
+
+ # echo on > monitor_on
+ # perf record -e damon:damon_aggregated &
+ # sleep 5
+ # kill 9 $(pidof perf)
+ # echo off > monitor_on
+ # perf script
+ kdamond.0 46568 [027] 79357.842179: damon:damon_aggregated: target_id=0 nr_regions=11 122509119488-135708762112: 0 864
+ [...]
+
+Each line of the perf script output represents each monitoring region. The
+first five fields are as usual other tracepoint outputs. The sixth field
+(``target_id=X``) shows the ide of the monitoring target of the region. The
+seventh field (``nr_regions=X``) shows the total number of monitoring regions
+for the target. The eighth field (``X-Y:``) shows the start (``X``) and end
+(``Y``) addresses of the region in bytes. The ninth field (``X``) shows the
+``nr_accesses`` of the region (refer to
+:ref:`design <damon_design_region_based_sampling>` for more details of the
+counter). Finally the tenth field (``X``) shows the ``age`` of the region
+(refer to :ref:`design <damon_design_age_tracking>` for more details of the
+counter).
+
+If the event was ``damon:damos_beofre_apply``, the ``perf script`` output would
+be somewhat like below::
+
+ kdamond.0 47293 [000] 80801.060214: damon:damos_before_apply: ctx_idx=0 scheme_idx=0 target_idx=0 nr_regions=11 121932607488-135128711168: 0 136
+ [...]
+
+Each line of the output represents each monitoring region that each DAMON-based
+Operation Scheme was about to be applied at the traced time. The first five
+fields are as usual. It shows the index of the DAMON context (``ctx_idx=X``)
+of the scheme in the list of the contexts of the context's kdamond, the index
+of the scheme (``scheme_idx=X``) in the list of the schemes of the context, in
+addition to the output of ``damon_aggregated`` tracepoint.
+
+
+.. _debugfs_interface:
+
+debugfs Interface (DEPRECATED!)
+===============================
+
+.. note::
+
+ THIS IS DEPRECATED!
+
+ DAMON debugfs interface is deprecated, so users should move to the
+ :ref:`sysfs interface <sysfs_interface>`. If you depend on this and cannot
+ move, please report your usecase to damon@lists.linux.dev and
+ linux-mm@kvack.org.
+
+DAMON exports eight files, ``attrs``, ``target_ids``, ``init_regions``,
+``schemes``, ``monitor_on``, ``kdamond_pid``, ``mk_contexts`` and
+``rm_contexts`` under its debugfs directory, ``<debugfs>/damon/``.
+
+
+Attributes
+----------
+
+Users can get and set the ``sampling interval``, ``aggregation interval``,
+``update interval``, and min/max number of monitoring target regions by
+reading from and writing to the ``attrs`` file. To know about the monitoring
+attributes in detail, please refer to the :doc:`/mm/damon/design`. For
+example, below commands set those values to 5 ms, 100 ms, 1,000 ms, 10 and
+1000, and then check it again::
+
+ # cd <debugfs>/damon
+ # echo 5000 100000 1000000 10 1000 > attrs
+ # cat attrs
+ 5000 100000 1000000 10 1000
+
+
+Target IDs
+----------
+
+Some types of address spaces supports multiple monitoring target. For example,
+the virtual memory address spaces monitoring can have multiple processes as the
+monitoring targets. Users can set the targets by writing relevant id values of
+the targets to, and get the ids of the current targets by reading from the
+``target_ids`` file. In case of the virtual address spaces monitoring, the
+values should be pids of the monitoring target processes. For example, below
+commands set processes having pids 42 and 4242 as the monitoring targets and
+check it again::
+
+ # cd <debugfs>/damon
+ # echo 42 4242 > target_ids
+ # cat target_ids
+ 42 4242
+
+Users can also monitor the physical memory address space of the system by
+writing a special keyword, "``paddr\n``" to the file. Because physical address
+space monitoring doesn't support multiple targets, reading the file will show a
+fake value, ``42``, as below::
+
+ # cd <debugfs>/damon
+ # echo paddr > target_ids
+ # cat target_ids
+ 42
+
+Note that setting the target ids doesn't start the monitoring.
+
+
+Initial Monitoring Target Regions
+---------------------------------
+
+In case of the virtual address space monitoring, DAMON automatically sets and
+updates the monitoring target regions so that entire memory mappings of target
+processes can be covered. However, users can want to limit the monitoring
+region to specific address ranges, such as the heap, the stack, or specific
+file-mapped area. Or, some users can know the initial access pattern of their
+workloads and therefore want to set optimal initial regions for the 'adaptive
+regions adjustment'.
+
+In contrast, DAMON do not automatically sets and updates the monitoring target
+regions in case of physical memory monitoring. Therefore, users should set the
+monitoring target regions by themselves.
+
+In such cases, users can explicitly set the initial monitoring target regions
+as they want, by writing proper values to the ``init_regions`` file. The input
+should be a sequence of three integers separated by white spaces that represent
+one region in below form.::
+
+ <target idx> <start address> <end address>
+
+The ``target idx`` should be the index of the target in ``target_ids`` file,
+starting from ``0``, and the regions should be passed in address order. For
+example, below commands will set a couple of address ranges, ``1-100`` and
+``100-200`` as the initial monitoring target region of pid 42, which is the
+first one (index ``0``) in ``target_ids``, and another couple of address
+ranges, ``20-40`` and ``50-100`` as that of pid 4242, which is the second one
+(index ``1``) in ``target_ids``.::
+
+ # cd <debugfs>/damon
+ # cat target_ids
+ 42 4242
+ # echo "0 1 100 \
+ 0 100 200 \
+ 1 20 40 \
+ 1 50 100" > init_regions
+
+Note that this sets the initial monitoring target regions only. In case of
+virtual memory monitoring, DAMON will automatically updates the boundary of the
+regions after one ``update interval``. Therefore, users should set the
+``update interval`` large enough in this case, if they don't want the
+update.
+
+
+Schemes
+-------
+
+Users can get and set the DAMON-based operation :ref:`schemes
+<damon_design_damos>` by reading from and writing to ``schemes`` debugfs file.
+Reading the file also shows the statistics of each scheme. To the file, each
+of the schemes should be represented in each line in below form::
+
+ <target access pattern> <action> <quota> <watermarks>
+
+You can disable schemes by simply writing an empty string to the file.
+
+Target Access Pattern
+~~~~~~~~~~~~~~~~~~~~~
+
+The target access :ref:`pattern <damon_design_damos_access_pattern>` of the
+scheme. The ``<target access pattern>`` is constructed with three ranges in
+below form::
+
+ min-size max-size min-acc max-acc min-age max-age
+
+Specifically, bytes for the size of regions (``min-size`` and ``max-size``),
+number of monitored accesses per aggregate interval for access frequency
+(``min-acc`` and ``max-acc``), number of aggregate intervals for the age of
+regions (``min-age`` and ``max-age``) are specified. Note that the ranges are
+closed interval.
+
+Action
+~~~~~~
+
+The ``<action>`` is a predefined integer for memory management :ref:`actions
+<damon_design_damos_action>`. The supported numbers and their meanings are as
+below.
+
+ - 0: Call ``madvise()`` for the region with ``MADV_WILLNEED``. Ignored if
+ ``target`` is ``paddr``.
+ - 1: Call ``madvise()`` for the region with ``MADV_COLD``. Ignored if
+ ``target`` is ``paddr``.
+ - 2: Call ``madvise()`` for the region with ``MADV_PAGEOUT``.
+ - 3: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``. Ignored if
+ ``target`` is ``paddr``.
+ - 4: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``. Ignored if
+ ``target`` is ``paddr``.
+ - 5: Do nothing but count the statistics
+
+Quota
+~~~~~
+
+Users can set the :ref:`quotas <damon_design_damos_quotas>` of the given scheme
+via the ``<quota>`` in below form::
+
+ <ms> <sz> <reset interval> <priority weights>
+
+This makes DAMON to try to use only up to ``<ms>`` milliseconds for applying
+the action to memory regions of the ``target access pattern`` within the
+``<reset interval>`` milliseconds, and to apply the action to only up to
+``<sz>`` bytes of memory regions within the ``<reset interval>``. Setting both
+``<ms>`` and ``<sz>`` zero disables the quota limits.
+
+For the :ref:`prioritization <damon_design_damos_quotas_prioritization>`, users
+can set the weights for the three properties in ``<priority weights>`` in below
+form::
+
+ <size weight> <access frequency weight> <age weight>
+
+Watermarks
+~~~~~~~~~~
+
+Users can specify :ref:`watermarks <damon_design_damos_watermarks>` of the
+given scheme via ``<watermarks>`` in below form::
+
+ <metric> <check interval> <high mark> <middle mark> <low mark>
+
+``<metric>`` is a predefined integer for the metric to be checked. The
+supported numbers and their meanings are as below.
+
+ - 0: Ignore the watermarks
+ - 1: System's free memory rate (per thousand)
+
+The value of the metric is checked every ``<check interval>`` microseconds.
+
+If the value is higher than ``<high mark>`` or lower than ``<low mark>``, the
+scheme is deactivated. If the value is lower than ``<mid mark>``, the scheme
+is activated.
+
+.. _damos_stats:
+
+Statistics
+~~~~~~~~~~
+
+It also counts the total number and bytes of regions that each scheme is tried
+to be applied, the two numbers for the regions that each scheme is successfully
+applied, and the total number of the quota limit exceeds. This statistics can
+be used for online analysis or tuning of the schemes.
+
+The statistics can be shown by reading the ``schemes`` file. Reading the file
+will show each scheme you entered in each line, and the five numbers for the
+statistics will be added at the end of each line.
+
+Example
+~~~~~~~
+
+Below commands applies a scheme saying "If a memory region of size in [4KiB,
+8KiB] is showing accesses per aggregate interval in [0, 5] for aggregate
+interval in [10, 20], page out the region. For the paging out, use only up to
+10ms per second, and also don't page out more than 1GiB per second. Under the
+limitation, page out memory regions having longer age first. Also, check the
+free memory rate of the system every 5 seconds, start the monitoring and paging
+out when the free memory rate becomes lower than 50%, but stop it if the free
+memory rate becomes larger than 60%, or lower than 30%".::
+
+ # cd <debugfs>/damon
+ # scheme="4096 8192 0 5 10 20 2" # target access pattern and action
+ # scheme+=" 10 $((1024*1024*1024)) 1000" # quotas
+ # scheme+=" 0 0 100" # prioritization weights
+ # scheme+=" 1 5000000 600 500 300" # watermarks
+ # echo "$scheme" > schemes
+
+
+Turning On/Off
+--------------
+
+Setting the files as described above doesn't incur effect unless you explicitly
+start the monitoring. You can start, stop, and check the current status of the
+monitoring by writing to and reading from the ``monitor_on`` file. Writing
+``on`` to the file starts the monitoring of the targets with the attributes.
+Writing ``off`` to the file stops those. DAMON also stops if every target
+process is terminated. Below example commands turn on, off, and check the
+status of DAMON::
+
+ # cd <debugfs>/damon
+ # echo on > monitor_on
+ # echo off > monitor_on
+ # cat monitor_on
+ off
+
+Please note that you cannot write to the above-mentioned debugfs files while
+the monitoring is turned on. If you write to the files while DAMON is running,
+an error code such as ``-EBUSY`` will be returned.
+
+
+Monitoring Thread PID
+---------------------
+
+DAMON does requested monitoring with a kernel thread called ``kdamond``. You
+can get the pid of the thread by reading the ``kdamond_pid`` file. When the
+monitoring is turned off, reading the file returns ``none``. ::
+
+ # cd <debugfs>/damon
+ # cat monitor_on
+ off
+ # cat kdamond_pid
+ none
+ # echo on > monitor_on
+ # cat kdamond_pid
+ 18594
+
+
+Using Multiple Monitoring Threads
+---------------------------------
+
+One ``kdamond`` thread is created for each monitoring context. You can create
+and remove monitoring contexts for multiple ``kdamond`` required use case using
+the ``mk_contexts`` and ``rm_contexts`` files.
+
+Writing the name of the new context to the ``mk_contexts`` file creates a
+directory of the name on the DAMON debugfs directory. The directory will have
+DAMON debugfs files for the context. ::
+
+ # cd <debugfs>/damon
+ # ls foo
+ # ls: cannot access 'foo': No such file or directory
+ # echo foo > mk_contexts
+ # ls foo
+ # attrs init_regions kdamond_pid schemes target_ids
+
+If the context is not needed anymore, you can remove it and the corresponding
+directory by putting the name of the context to the ``rm_contexts`` file. ::
+
+ # echo foo > rm_contexts
+ # ls foo
+ # ls: cannot access 'foo': No such file or directory
+
+Note that ``mk_contexts``, ``rm_contexts``, and ``monitor_on`` files are in the
+root directory only.
diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
index 1cc0bc78d10e..e4d4b4a8dc97 100644
--- a/Documentation/admin-guide/mm/hugetlbpage.rst
+++ b/Documentation/admin-guide/mm/hugetlbpage.rst
@@ -1,5 +1,3 @@
-.. _hugetlbpage:
-
=============
HugeTLB Pages
=============
@@ -60,8 +58,12 @@ HugePages_Surp
the pool above the value in ``/proc/sys/vm/nr_hugepages``. The
maximum number of surplus huge pages is controlled by
``/proc/sys/vm/nr_overcommit_hugepages``.
+ Note: When the feature of freeing unused vmemmap pages associated
+ with each hugetlb page is enabled, the number of surplus huge pages
+ may be temporarily larger than the maximum number of surplus huge
+ pages when the system is under memory pressure.
Hugepagesize
- is the default hugepage size (in Kb).
+ is the default hugepage size (in kB).
Hugetlb
is the total amount of memory (in kB), consumed by huge
pages of all sizes.
@@ -80,6 +82,10 @@ returned to the huge page pool when freed by a task. A user with root
privileges can dynamically allocate more or free some persistent huge pages
by increasing or decreasing the value of ``nr_hugepages``.
+Note: When the feature of freeing unused vmemmap pages associated with each
+hugetlb page is enabled, we can fail to free the huge pages triggered by
+the user when the system is under memory pressure. Please try again later.
+
Pages that are used as huge pages are reserved inside the kernel and cannot
be used for other purposes. Huge pages cannot be swapped out under
memory pressure.
@@ -100,6 +106,65 @@ with a huge page size selection parameter "hugepagesz=<size>". <size> must
be specified in bytes with optional scale suffix [kKmMgG]. The default huge
page size may be selected with the "default_hugepagesz=<size>" boot parameter.
+Hugetlb boot command line parameter semantics
+
+hugepagesz
+ Specify a huge page size. Used in conjunction with hugepages
+ parameter to preallocate a number of huge pages of the specified
+ size. Hence, hugepagesz and hugepages are typically specified in
+ pairs such as::
+
+ hugepagesz=2M hugepages=512
+
+ hugepagesz can only be specified once on the command line for a
+ specific huge page size. Valid huge page sizes are architecture
+ dependent.
+hugepages
+ Specify the number of huge pages to preallocate. This typically
+ follows a valid hugepagesz or default_hugepagesz parameter. However,
+ if hugepages is the first or only hugetlb command line parameter it
+ implicitly specifies the number of huge pages of default size to
+ allocate. If the number of huge pages of default size is implicitly
+ specified, it can not be overwritten by a hugepagesz,hugepages
+ parameter pair for the default size. This parameter also has a
+ node format. The node format specifies the number of huge pages
+ to allocate on specific nodes.
+
+ For example, on an architecture with 2M default huge page size::
+
+ hugepages=256 hugepagesz=2M hugepages=512
+
+ will result in 256 2M huge pages being allocated and a warning message
+ indicating that the hugepages=512 parameter is ignored. If a hugepages
+ parameter is preceded by an invalid hugepagesz parameter, it will
+ be ignored.
+
+ Node format example::
+
+ hugepagesz=2M hugepages=0:1,1:2
+
+ It will allocate 1 2M hugepage on node0 and 2 2M hugepages on node1.
+ If the node number is invalid, the parameter will be ignored.
+
+default_hugepagesz
+ Specify the default huge page size. This parameter can
+ only be specified once on the command line. default_hugepagesz can
+ optionally be followed by the hugepages parameter to preallocate a
+ specific number of huge pages of default size. The number of default
+ sized huge pages to preallocate can also be implicitly specified as
+ mentioned in the hugepages section above. Therefore, on an
+ architecture with 2M default huge page size::
+
+ hugepages=256
+ default_hugepagesz=2M hugepages=256
+ hugepages=256 default_hugepagesz=2M
+
+ will all result in 256 2M huge pages being allocated. Valid default
+ huge page size is architecture dependent.
+hugetlb_free_vmemmap
+ When CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP is set, this enables HugeTLB
+ Vmemmap Optimization (HVO).
+
When multiple huge page sizes are supported, ``/proc/sys/vm/nr_hugepages``
indicates the current number of pre-allocated huge pages of the default size.
Thus, one can use the following command to dynamically allocate/deallocate
@@ -177,8 +242,12 @@ will exist, of the form::
hugepages-${size}kB
-Inside each of these directories, the same set of files will exist::
+Inside each of these directories, the set of files contained in ``/proc``
+will exist. In addition, two additional interfaces for demoting huge
+pages may exist::
+ demote
+ demote_size
nr_hugepages
nr_hugepages_mempolicy
nr_overcommit_hugepages
@@ -186,7 +255,29 @@ Inside each of these directories, the same set of files will exist::
resv_hugepages
surplus_hugepages
-which function as described above for the default huge page-sized case.
+The demote interfaces provide the ability to split a huge page into
+smaller huge pages. For example, the x86 architecture supports both
+1GB and 2MB huge pages sizes. A 1GB huge page can be split into 512
+2MB huge pages. Demote interfaces are not available for the smallest
+huge page size. The demote interfaces are:
+
+demote_size
+ is the size of demoted pages. When a page is demoted a corresponding
+ number of huge pages of demote_size will be created. By default,
+ demote_size is set to the next smaller huge page size. If there are
+ multiple smaller huge page sizes, demote_size can be set to any of
+ these smaller sizes. Only huge page sizes less than the current huge
+ pages size are allowed.
+
+demote
+ is used to demote a number of huge pages. A user with root privileges
+ can write to this file. It may not be possible to demote the
+ requested number of huge pages. To determine how many pages were
+ actually demoted, compare the value of nr_hugepages before and after
+ writing to the demote interface. demote is a write only interface.
+
+The interfaces which are the same as in ``/proc`` (all except demote and
+demote_size) function as described above for the default huge page-sized case.
.. _mem_policy_and_hp_alloc:
@@ -220,7 +311,7 @@ memory policy mode--bind, preferred, local or interleave--may be used. The
resulting effect on persistent huge page allocation is as follows:
#. Regardless of mempolicy mode [see
- :ref:`Documentation/admin-guide/mm/numa_memory_policy.rst <numa_memory_policy>`],
+ Documentation/admin-guide/mm/numa_memory_policy.rst],
persistent huge pages will be distributed across the node or nodes
specified in the mempolicy as if "interleave" had been specified.
However, if a node in the policy does not contain sufficient contiguous
@@ -368,13 +459,13 @@ Examples
.. _map_hugetlb:
``map_hugetlb``
- see tools/testing/selftests/vm/map_hugetlb.c
+ see tools/testing/selftests/mm/map_hugetlb.c
``hugepage-shm``
- see tools/testing/selftests/vm/hugepage-shm.c
+ see tools/testing/selftests/mm/hugepage-shm.c
``hugepage-mmap``
- see tools/testing/selftests/vm/hugepage-mmap.c
+ see tools/testing/selftests/mm/hugepage-mmap.c
The `libhugetlbfs`_ library provides a wide range of userspace tools
to help with huge page usability, environment setup, and control.
diff --git a/Documentation/admin-guide/mm/idle_page_tracking.rst b/Documentation/admin-guide/mm/idle_page_tracking.rst
index df9394fb39c2..16fcf38dac56 100644
--- a/Documentation/admin-guide/mm/idle_page_tracking.rst
+++ b/Documentation/admin-guide/mm/idle_page_tracking.rst
@@ -1,5 +1,3 @@
-.. _idle_page_tracking:
-
==================
Idle Page Tracking
==================
@@ -65,14 +63,13 @@ workload one should:
are not reclaimable, he or she can filter them out using
``/proc/kpageflags``.
-The page-types tool in the tools/vm directory can be used to assist in this.
+The page-types tool in the tools/mm directory can be used to assist in this.
If the tool is run initially with the appropriate option, it will mark all the
queried pages as idle. Subsequent runs of the tool can then show which pages have
their idle flag cleared in the interim.
-See :ref:`Documentation/admin-guide/mm/pagemap.rst <pagemap>` for more
-information about ``/proc/pid/pagemap``, ``/proc/kpageflags``, and
-``/proc/kpagecgroup``.
+See Documentation/admin-guide/mm/pagemap.rst for more information about
+``/proc/pid/pagemap``, ``/proc/kpageflags``, and ``/proc/kpagecgroup``.
.. _impl_details:
diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
index 11db46448354..1f883abf3f00 100644
--- a/Documentation/admin-guide/mm/index.rst
+++ b/Documentation/admin-guide/mm/index.rst
@@ -3,9 +3,9 @@ Memory Management
=================
Linux memory management subsystem is responsible, as the name implies,
-for managing the memory in the system. This includes implemnetation of
+for managing the memory in the system. This includes implementation of
virtual memory and demand paging, memory allocation both for kernel
-internal structures and user space programms, mapping of files into
+internal structures and user space programs, mapping of files into
processes address space and many other cool things.
Linux memory management is a complex system with many configurable
@@ -16,8 +16,7 @@ are described in Documentation/admin-guide/sysctl/vm.rst and in `man 5 proc`_.
.. _man 5 proc: http://man7.org/linux/man-pages/man5/proc.5.html
Linux memory management has its own jargon and if you are not yet
-familiar with it, consider reading
-:ref:`Documentation/admin-guide/mm/concepts.rst <mm_concepts>`.
+familiar with it, consider reading Documentation/admin-guide/mm/concepts.rst.
Here we document in detail how to interact with various mechanisms in
the Linux memory management.
@@ -27,13 +26,19 @@ the Linux memory management.
concepts
cma_debugfs
+ damon/index
hugetlbpage
idle_page_tracking
ksm
memory-hotplug
+ multigen_lru
+ nommu-mmap
numa_memory_policy
numaperf
pagemap
+ shrinker_debugfs
soft-dirty
+ swap_numa
transhuge
userfaultfd
+ zswap
diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst
index 874eb0c77d34..a639cac12477 100644
--- a/Documentation/admin-guide/mm/ksm.rst
+++ b/Documentation/admin-guide/mm/ksm.rst
@@ -1,5 +1,3 @@
-.. _admin_guide_ksm:
-
=======================
Kernel Samepage Merging
=======================
@@ -9,7 +7,7 @@ Overview
KSM is a memory-saving de-duplication feature, enabled by CONFIG_KSM=y,
added to the Linux kernel in 2.6.32. See ``mm/ksm.c`` for its implementation,
-and http://lwn.net/Articles/306704/ and http://lwn.net/Articles/330589/
+and http://lwn.net/Articles/306704/ and https://lwn.net/Articles/330589/
KSM was originally developed for use with KVM (where it was known as
Kernel Shared Memory), to fit more virtual machines into physical memory,
@@ -22,7 +20,7 @@ content which can be replaced by a single write-protected page (which
is automatically copied if a process later wants to update its
content). The amount of pages that KSM daemon scans in a single pass
and the time between the passes are configured using :ref:`sysfs
-intraface <ksm_sysfs>`
+interface <ksm_sysfs>`
KSM only merges anonymous (private) pages, never pagecache (file) pages.
KSM's merged pages were originally locked into kernel memory, but can now
@@ -52,7 +50,7 @@ with EAGAIN, but more probably arousing the Out-Of-Memory killer.
If KSM is not configured into the running kernel, madvise MADV_MERGEABLE
and MADV_UNMERGEABLE simply fail with EINVAL. If the running kernel was
built with CONFIG_KSM=y, those calls will normally succeed: even if the
-the KSM daemon is not currently running, MADV_MERGEABLE still registers
+KSM daemon is not currently running, MADV_MERGEABLE still registers
the range for whenever the KSM daemon is started; even if the range
cannot contain any pages which KSM could actually merge; even if
MADV_UNMERGEABLE is applied to a range which was never MADV_MERGEABLE.
@@ -82,6 +80,9 @@ pages_to_scan
how many pages to scan before ksmd goes to sleep
e.g. ``echo 100 > /sys/kernel/mm/ksm/pages_to_scan``.
+ The pages_to_scan value cannot be changed if ``advisor_mode`` has
+ been set to scan-time.
+
Default: 100 (chosen for demonstration purposes)
sleep_millisecs
@@ -157,8 +158,44 @@ stable_node_chains_prune_millisecs
scan. It's a noop if not a single KSM page hit the
``max_page_sharing`` yet.
+smart_scan
+ Historically KSM checked every candidate page for each scan. It did
+ not take into account historic information. When smart scan is
+ enabled, pages that have previously not been de-duplicated get
+ skipped. How often these pages are skipped depends on how often
+ de-duplication has already been tried and failed. By default this
+ optimization is enabled. The ``pages_skipped`` metric shows how
+ effective the setting is.
+
+advisor_mode
+ The ``advisor_mode`` selects the current advisor. Two modes are
+ supported: none and scan-time. The default is none. By setting
+ ``advisor_mode`` to scan-time, the scan time advisor is enabled.
+ The section about ``advisor`` explains in detail how the scan time
+ advisor works.
+
+adivsor_max_cpu
+ specifies the upper limit of the cpu percent usage of the ksmd
+ background thread. The default is 70.
+
+advisor_target_scan_time
+ specifies the target scan time in seconds to scan all the candidate
+ pages. The default value is 200 seconds.
+
+advisor_min_pages_to_scan
+ specifies the lower limit of the ``pages_to_scan`` parameter of the
+ scan time advisor. The default is 500.
+
+adivsor_max_pages_to_scan
+ specifies the upper limit of the ``pages_to_scan`` parameter of the
+ scan time advisor. The default is 30000.
+
The effectiveness of KSM and MADV_MERGEABLE is shown in ``/sys/kernel/mm/ksm/``:
+general_profit
+ how effective is KSM. The calculation is explained below.
+pages_scanned
+ how many pages are being scanned for ksm
pages_shared
how many shared pages are being used
pages_sharing
@@ -167,12 +204,21 @@ pages_unshared
how many pages unique but repeatedly checked for merging
pages_volatile
how many pages changing too fast to be placed in a tree
+pages_skipped
+ how many pages did the "smart" page scanning algorithm skip
full_scans
how many times all mergeable areas have been scanned
stable_node_chains
the number of KSM pages that hit the ``max_page_sharing`` limit
stable_node_dups
number of duplicated KSM pages
+ksm_zero_pages
+ how many zero pages that are still mapped into processes were mapped by
+ KSM when deduplicating.
+
+When ``use_zero_pages`` is/was enabled, the sum of ``pages_sharing`` +
+``ksm_zero_pages`` represents the actual number of pages saved by KSM.
+if ``use_zero_pages`` has never been enabled, ``ksm_zero_pages`` is 0.
A high ratio of ``pages_sharing`` to ``pages_shared`` indicates good
sharing, but a high ratio of ``pages_unshared`` to ``pages_sharing``
@@ -184,6 +230,94 @@ The maximum possible ``pages_sharing/pages_shared`` ratio is limited by the
``max_page_sharing`` tunable. To increase the ratio ``max_page_sharing`` must
be increased accordingly.
+Monitoring KSM profit
+=====================
+
+KSM can save memory by merging identical pages, but also can consume
+additional memory, because it needs to generate a number of rmap_items to
+save each scanned page's brief rmap information. Some of these pages may
+be merged, but some may not be abled to be merged after being checked
+several times, which are unprofitable memory consumed.
+
+1) How to determine whether KSM save memory or consume memory in system-wide
+ range? Here is a simple approximate calculation for reference::
+
+ general_profit =~ ksm_saved_pages * sizeof(page) - (all_rmap_items) *
+ sizeof(rmap_item);
+
+ where ksm_saved_pages equals to the sum of ``pages_sharing`` +
+ ``ksm_zero_pages`` of the system, and all_rmap_items can be easily
+ obtained by summing ``pages_sharing``, ``pages_shared``, ``pages_unshared``
+ and ``pages_volatile``.
+
+2) The KSM profit inner a single process can be similarly obtained by the
+ following approximate calculation::
+
+ process_profit =~ ksm_saved_pages * sizeof(page) -
+ ksm_rmap_items * sizeof(rmap_item).
+
+ where ksm_saved_pages equals to the sum of ``ksm_merging_pages`` and
+ ``ksm_zero_pages``, both of which are shown under the directory
+ ``/proc/<pid>/ksm_stat``, and ksm_rmap_items is also shown in
+ ``/proc/<pid>/ksm_stat``. The process profit is also shown in
+ ``/proc/<pid>/ksm_stat`` as ksm_process_profit.
+
+From the perspective of application, a high ratio of ``ksm_rmap_items`` to
+``ksm_merging_pages`` means a bad madvise-applied policy, so developers or
+administrators have to rethink how to change madvise policy. Giving an example
+for reference, a page's size is usually 4K, and the rmap_item's size is
+separately 32B on 32-bit CPU architecture and 64B on 64-bit CPU architecture.
+so if the ``ksm_rmap_items/ksm_merging_pages`` ratio exceeds 64 on 64-bit CPU
+or exceeds 128 on 32-bit CPU, then the app's madvise policy should be dropped,
+because the ksm profit is approximately zero or negative.
+
+Monitoring KSM events
+=====================
+
+There are some counters in /proc/vmstat that may be used to monitor KSM events.
+KSM might help save memory, it's a tradeoff by may suffering delay on KSM COW
+or on swapping in copy. Those events could help users evaluate whether or how
+to use KSM. For example, if cow_ksm increases too fast, user may decrease the
+range of madvise(, , MADV_MERGEABLE).
+
+cow_ksm
+ is incremented every time a KSM page triggers copy on write (COW)
+ when users try to write to a KSM page, we have to make a copy.
+
+ksm_swpin_copy
+ is incremented every time a KSM page is copied when swapping in
+ note that KSM page might be copied when swapping in because do_swap_page()
+ cannot do all the locking needed to reconstitute a cross-anon_vma KSM page.
+
+Advisor
+=======
+
+The number of candidate pages for KSM is dynamic. It can be often observed
+that during the startup of an application more candidate pages need to be
+processed. Without an advisor the ``pages_to_scan`` parameter needs to be
+sized for the maximum number of candidate pages. The scan time advisor can
+changes the ``pages_to_scan`` parameter based on demand.
+
+The advisor can be enabled, so KSM can automatically adapt to changes in the
+number of candidate pages to scan. Two advisors are implemented: none and
+scan-time. With none, no advisor is enabled. The default is none.
+
+The scan time advisor changes the ``pages_to_scan`` parameter based on the
+observed scan times. The possible values for the ``pages_to_scan`` parameter is
+limited by the ``advisor_max_cpu`` parameter. In addition there is also the
+``advisor_target_scan_time`` parameter. This parameter sets the target time to
+scan all the KSM candidate pages. The parameter ``advisor_target_scan_time``
+decides how aggressive the scan time advisor scans candidate pages. Lower
+values make the scan time advisor to scan more aggresively. This is the most
+important parameter for the configuration of the scan time advisor.
+
+The initial value and the maximum value can be changed with
+``advisor_min_pages_to_scan`` and ``advisor_max_pages_to_scan``. The default
+values are sufficient for most workloads and use cases.
+
+The ``pages_to_scan`` parameter is re-calculated after a scan has been completed.
+
+
--
Izik Eidus,
Hugh Dickins, 17 Nov 2009
diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
index 5c4432c96c4b..098f14d83e99 100644
--- a/Documentation/admin-guide/mm/memory-hotplug.rst
+++ b/Documentation/admin-guide/mm/memory-hotplug.rst
@@ -1,444 +1,695 @@
-.. _admin_guide_memory_hotplug:
+==================
+Memory Hot(Un)Plug
+==================
-==============
-Memory Hotplug
-==============
+This document describes generic Linux support for memory hot(un)plug with
+a focus on System RAM, including ZONE_MOVABLE support.
-:Created: Jul 28 2007
-:Updated: Add some details about locking internals: Aug 20 2018
+.. contents:: :local:
-This document is about memory hotplug including how-to-use and current status.
-Because Memory Hotplug is still under development, contents of this text will
-be changed often.
+Introduction
+============
-.. contents:: :local:
+Memory hot(un)plug allows for increasing and decreasing the size of physical
+memory available to a machine at runtime. In the simplest case, it consists of
+physically plugging or unplugging a DIMM at runtime, coordinated with the
+operating system.
-.. note::
+Memory hot(un)plug is used for various purposes:
- (1) x86_64's has special implementation for memory hotplug.
- This text does not describe it.
- (2) This text assumes that sysfs is mounted at ``/sys``.
+- The physical memory available to a machine can be adjusted at runtime, up- or
+ downgrading the memory capacity. This dynamic memory resizing, sometimes
+ referred to as "capacity on demand", is frequently used with virtual machines
+ and logical partitions.
+- Replacing hardware, such as DIMMs or whole NUMA nodes, without downtime. One
+ example is replacing failing memory modules.
-Introduction
-============
+- Reducing energy consumption either by physically unplugging memory modules or
+ by logically unplugging (parts of) memory modules from Linux.
+
+Further, the basic memory hot(un)plug infrastructure in Linux is nowadays also
+used to expose persistent memory, other performance-differentiated memory and
+reserved memory regions as ordinary system RAM to Linux.
-Purpose of memory hotplug
--------------------------
+Linux only supports memory hot(un)plug on selected 64 bit architectures, such as
+x86_64, arm64, ppc64 and s390x.
-Memory Hotplug allows users to increase/decrease the amount of memory.
-Generally, there are two purposes.
+Memory Hot(Un)Plug Granularity
+------------------------------
-(A) For changing the amount of memory.
- This is to allow a feature like capacity on demand.
-(B) For installing/removing DIMMs or NUMA-nodes physically.
- This is to exchange DIMMs/NUMA-nodes, reduce power consumption, etc.
+Memory hot(un)plug in Linux uses the SPARSEMEM memory model, which divides the
+physical memory address space into chunks of the same size: memory sections. The
+size of a memory section is architecture dependent. For example, x86_64 uses
+128 MiB and ppc64 uses 16 MiB.
-(A) is required by highly virtualized environments and (B) is required by
-hardware which supports memory power management.
+Memory sections are combined into chunks referred to as "memory blocks". The
+size of a memory block is architecture dependent and corresponds to the smallest
+granularity that can be hot(un)plugged. The default size of a memory block is
+the same as memory section size, unless an architecture specifies otherwise.
-Linux memory hotplug is designed for both purpose.
+All memory blocks have the same size.
-Phases of memory hotplug
+Phases of Memory Hotplug
------------------------
-There are 2 phases in Memory Hotplug:
+Memory hotplug consists of two phases:
- 1) Physical Memory Hotplug phase
- 2) Logical Memory Hotplug phase.
+(1) Adding the memory to Linux
+(2) Onlining memory blocks
-The First phase is to communicate hardware/firmware and make/erase
-environment for hotplugged memory. Basically, this phase is necessary
-for the purpose (B), but this is good phase for communication between
-highly virtualized environments too.
+In the first phase, metadata, such as the memory map ("memmap") and page tables
+for the direct mapping, is allocated and initialized, and memory blocks are
+created; the latter also creates sysfs files for managing newly created memory
+blocks.
-When memory is hotplugged, the kernel recognizes new memory, makes new memory
-management tables, and makes sysfs files for new memory's operation.
+In the second phase, added memory is exposed to the page allocator. After this
+phase, the memory is visible in memory statistics, such as free and total
+memory, of the system.
-If firmware supports notification of connection of new memory to OS,
-this phase is triggered automatically. ACPI can notify this event. If not,
-"probe" operation by system administration is used instead.
-(see :ref:`memory_hotplug_physical_mem`).
+Phases of Memory Hotunplug
+--------------------------
-Logical Memory Hotplug phase is to change memory state into
-available/unavailable for users. Amount of memory from user's view is
-changed by this phase. The kernel makes all memory in it as free pages
-when a memory range is available.
+Memory hotunplug consists of two phases:
-In this document, this phase is described as online/offline.
+(1) Offlining memory blocks
+(2) Removing the memory from Linux
-Logical Memory Hotplug phase is triggered by write of sysfs file by system
-administrator. For the hot-add case, it must be executed after Physical Hotplug
-phase by hand.
-(However, if you writes udev's hotplug scripts for memory hotplug, these
-phases can be execute in seamless way.)
+In the first phase, memory is "hidden" from the page allocator again, for
+example, by migrating busy memory to other memory locations and removing all
+relevant free pages from the page allocator After this phase, the memory is no
+longer visible in memory statistics of the system.
-Unit of Memory online/offline operation
----------------------------------------
+In the second phase, the memory blocks are removed and metadata is freed.
-Memory hotplug uses SPARSEMEM memory model which allows memory to be divided
-into chunks of the same size. These chunks are called "sections". The size of
-a memory section is architecture dependent. For example, power uses 16MiB, ia64
-uses 1GiB.
+Memory Hotplug Notifications
+============================
-Memory sections are combined into chunks referred to as "memory blocks". The
-size of a memory block is architecture dependent and represents the logical
-unit upon which memory online/offline operations are to be performed. The
-default size of a memory block is the same as memory section size unless an
-architecture specifies otherwise. (see :ref:`memory_hotplug_sysfs_files`.)
+There are various ways how Linux is notified about memory hotplug events such
+that it can start adding hotplugged memory. This description is limited to
+systems that support ACPI; mechanisms specific to other firmware interfaces or
+virtual machines are not described.
-To determine the size (in bytes) of a memory block please read this file::
+ACPI Notifications
+------------------
- /sys/devices/system/memory/block_size_bytes
+Platforms that support ACPI, such as x86_64, can support memory hotplug
+notifications via ACPI.
-Kernel Configuration
-====================
+In general, a firmware supporting memory hotplug defines a memory class object
+HID "PNP0C80". When notified about hotplug of a new memory device, the ACPI
+driver will hotplug the memory to Linux.
-To use memory hotplug feature, kernel must be compiled with following
-config options.
+If the firmware supports hotplug of NUMA nodes, it defines an object _HID
+"ACPI0004", "PNP0A05", or "PNP0A06". When notified about an hotplug event, all
+assigned memory devices are added to Linux by the ACPI driver.
-- For all memory hotplug:
- - Memory model -> Sparse Memory (``CONFIG_SPARSEMEM``)
- - Allow for memory hot-add (``CONFIG_MEMORY_HOTPLUG``)
+Similarly, Linux can be notified about requests to hotunplug a memory device or
+a NUMA node via ACPI. The ACPI driver will try offlining all relevant memory
+blocks, and, if successful, hotunplug the memory from Linux.
-- To enable memory removal, the following are also necessary:
- - Allow for memory hot remove (``CONFIG_MEMORY_HOTREMOVE``)
- - Page Migration (``CONFIG_MIGRATION``)
+Manual Probing
+--------------
-- For ACPI memory hotplug, the following are also necessary:
- - Memory hotplug (under ACPI Support menu) (``CONFIG_ACPI_HOTPLUG_MEMORY``)
- - This option can be kernel module.
+On some architectures, the firmware may not be able to notify the operating
+system about a memory hotplug event. Instead, the memory has to be manually
+probed from user space.
-- As a related configuration, if your box has a feature of NUMA-node hotplug
- via ACPI, then this option is necessary too.
+The probe interface is located at::
- - ACPI0004,PNP0A05 and PNP0A06 Container Driver (under ACPI Support menu)
- (``CONFIG_ACPI_CONTAINER``).
+ /sys/devices/system/memory/probe
- This option can be kernel module too.
+Only complete memory blocks can be probed. Individual memory blocks are probed
+by providing the physical start address of the memory block::
+ % echo addr > /sys/devices/system/memory/probe
-.. _memory_hotplug_sysfs_files:
+Which results in a memory block for the range [addr, addr + memory_block_size)
+being created.
-sysfs files for memory hotplug
-==============================
+.. note::
-All memory blocks have their device information in sysfs. Each memory block
-is described under ``/sys/devices/system/memory`` as::
+ Using the probe interface is discouraged as it is easy to crash the kernel,
+ because Linux cannot validate user input; this interface might be removed in
+ the future.
- /sys/devices/system/memory/memoryXXX
+Onlining and Offlining Memory Blocks
+====================================
-where XXX is the memory block id.
+After a memory block has been created, Linux has to be instructed to actually
+make use of that memory: the memory block has to be "online".
-For the memory block covered by the sysfs directory. It is expected that all
-memory sections in this range are present and no memory holes exist in the
-range. Currently there is no way to determine if there is a memory hole, but
-the existence of one should not affect the hotplug capabilities of the memory
-block.
+Before a memory block can be removed, Linux has to stop using any memory part of
+the memory block: the memory block has to be "offlined".
-For example, assume 1GiB memory block size. A device for a memory starting at
-0x100000000 is ``/sys/device/system/memory/memory4``::
+The Linux kernel can be configured to automatically online added memory blocks
+and drivers automatically trigger offlining of memory blocks when trying
+hotunplug of memory. Memory blocks can only be removed once offlining succeeded
+and drivers may trigger offlining of memory blocks when attempting hotunplug of
+memory.
- (0x100000000 / 1Gib = 4)
+Onlining Memory Blocks Manually
+-------------------------------
-This device covers address range [0x100000000 ... 0x140000000)
+If auto-onlining of memory blocks isn't enabled, user-space has to manually
+trigger onlining of memory blocks. Often, udev rules are used to automate this
+task in user space.
-Under each memory block, you can see 5 files:
+Onlining of a memory block can be triggered via::
-- ``/sys/devices/system/memory/memoryXXX/phys_index``
-- ``/sys/devices/system/memory/memoryXXX/phys_device``
-- ``/sys/devices/system/memory/memoryXXX/state``
-- ``/sys/devices/system/memory/memoryXXX/removable``
-- ``/sys/devices/system/memory/memoryXXX/valid_zones``
+ % echo online > /sys/devices/system/memory/memoryXXX/state
-=================== ============================================================
-``phys_index`` read-only and contains memory block id, same as XXX.
-``state`` read-write
-
- - at read: contains online/offline state of memory.
- - at write: user can specify "online_kernel",
-
- "online_movable", "online", "offline" command
- which will be performed on all sections in the block.
-``phys_device`` read-only: designed to show the name of physical memory
- device. This is not well implemented now.
-``removable`` read-only: contains an integer value indicating
- whether the memory block is removable or not
- removable. A value of 1 indicates that the memory
- block is removable and a value of 0 indicates that
- it is not removable. A memory block is removable only if
- every section in the block is removable.
-``valid_zones`` read-only: designed to show which zones this memory block
- can be onlined to.
-
- The first column shows it`s default zone.
-
- "memory6/valid_zones: Normal Movable" shows this memoryblock
- can be onlined to ZONE_NORMAL by default and to ZONE_MOVABLE
- by online_movable.
-
- "memory7/valid_zones: Movable Normal" shows this memoryblock
- can be onlined to ZONE_MOVABLE by default and to ZONE_NORMAL
- by online_kernel.
-=================== ============================================================
+Or alternatively::
+
+ % echo 1 > /sys/devices/system/memory/memoryXXX/online
+
+The kernel will select the target zone automatically, depending on the
+configured ``online_policy``.
+
+One can explicitly request to associate an offline memory block with
+ZONE_MOVABLE by::
+
+ % echo online_movable > /sys/devices/system/memory/memoryXXX/state
+
+Or one can explicitly request a kernel zone (usually ZONE_NORMAL) by::
+
+ % echo online_kernel > /sys/devices/system/memory/memoryXXX/state
+
+In any case, if onlining succeeds, the state of the memory block is changed to
+be "online". If it fails, the state of the memory block will remain unchanged
+and the above commands will fail.
+
+Onlining Memory Blocks Automatically
+------------------------------------
+
+The kernel can be configured to try auto-onlining of newly added memory blocks.
+If this feature is disabled, the memory blocks will stay offline until
+explicitly onlined from user space.
+
+The configured auto-online behavior can be observed via::
+
+ % cat /sys/devices/system/memory/auto_online_blocks
+
+Auto-onlining can be enabled by writing ``online``, ``online_kernel`` or
+``online_movable`` to that file, like::
+
+ % echo online > /sys/devices/system/memory/auto_online_blocks
+
+Similarly to manual onlining, with ``online`` the kernel will select the
+target zone automatically, depending on the configured ``online_policy``.
+
+Modifying the auto-online behavior will only affect all subsequently added
+memory blocks only.
.. note::
- These directories/files appear after physical memory hotplug phase.
+ In corner cases, auto-onlining can fail. The kernel won't retry. Note that
+ auto-onlining is not expected to fail in default configurations.
-If CONFIG_NUMA is enabled the memoryXXX/ directories can also be accessed
-via symbolic links located in the ``/sys/devices/system/node/node*`` directories.
+.. note::
-For example::
+ DLPAR on ppc64 ignores the ``offline`` setting and will still online added
+ memory blocks; if onlining fails, memory blocks are removed again.
- /sys/devices/system/node/node0/memory9 -> ../../memory/memory9
+Offlining Memory Blocks
+-----------------------
-A backlink will also be created::
+In the current implementation, Linux's memory offlining will try migrating all
+movable pages off the affected memory block. As most kernel allocations, such as
+page tables, are unmovable, page migration can fail and, therefore, inhibit
+memory offlining from succeeding.
- /sys/devices/system/memory/memory9/node0 -> ../../node/node0
+Having the memory provided by memory block managed by ZONE_MOVABLE significantly
+increases memory offlining reliability; still, memory offlining can fail in
+some corner cases.
-.. _memory_hotplug_physical_mem:
+Further, memory offlining might retry for a long time (or even forever), until
+aborted by the user.
-Physical memory hot-add phase
-=============================
+Offlining of a memory block can be triggered via::
-Hardware(Firmware) Support
---------------------------
+ % echo offline > /sys/devices/system/memory/memoryXXX/state
-On x86_64/ia64 platform, memory hotplug by ACPI is supported.
+Or alternatively::
-In general, the firmware (ACPI) which supports memory hotplug defines
-memory class object of _HID "PNP0C80". When a notify is asserted to PNP0C80,
-Linux's ACPI handler does hot-add memory to the system and calls a hotplug udev
-script. This will be done automatically.
+ % echo 0 > /sys/devices/system/memory/memoryXXX/online
-But scripts for memory hotplug are not contained in generic udev package(now).
-You may have to write it by yourself or online/offline memory by hand.
-Please see :ref:`memory_hotplug_how_to_online_memory` and
-:ref:`memory_hotplug_how_to_offline_memory`.
+If offlining succeeds, the state of the memory block is changed to be "offline".
+If it fails, the state of the memory block will remain unchanged and the above
+commands will fail, for example, via::
-If firmware supports NUMA-node hotplug, and defines an object _HID "ACPI0004",
-"PNP0A05", or "PNP0A06", notification is asserted to it, and ACPI handler
-calls hotplug code for all of objects which are defined in it.
-If memory device is found, memory hotplug code will be called.
+ bash: echo: write error: Device or resource busy
-Notify memory hot-add event by hand
------------------------------------
+or via::
-On some architectures, the firmware may not notify the kernel of a memory
-hotplug event. Therefore, the memory "probe" interface is supported to
-explicitly notify the kernel. This interface depends on
-CONFIG_ARCH_MEMORY_PROBE and can be configured on powerpc, sh, and x86
-if hotplug is supported, although for x86 this should be handled by ACPI
-notification.
+ bash: echo: write error: Invalid argument
-Probe interface is located at::
+Observing the State of Memory Blocks
+------------------------------------
- /sys/devices/system/memory/probe
+The state (online/offline/going-offline) of a memory block can be observed
+either via::
-You can tell the physical address of new memory to the kernel by::
+ % cat /sys/devices/system/memory/memoryXXX/state
- % echo start_address_of_new_memory > /sys/devices/system/memory/probe
+Or alternatively (1/0) via::
-Then, [start_address_of_new_memory, start_address_of_new_memory +
-memory_block_size] memory range is hot-added. In this case, hotplug script is
-not called (in current implementation). You'll have to online memory by
-yourself. Please see :ref:`memory_hotplug_how_to_online_memory`.
+ % cat /sys/devices/system/memory/memoryXXX/online
-Logical Memory hot-add phase
-============================
+For an online memory block, the managing zone can be observed via::
-State of memory
----------------
+ % cat /sys/devices/system/memory/memoryXXX/valid_zones
-To see (online/offline) state of a memory block, read 'state' file::
+Configuring Memory Hot(Un)Plug
+==============================
- % cat /sys/device/system/memory/memoryXXX/state
+There are various ways how system administrators can configure memory
+hot(un)plug and interact with memory blocks, especially, to online them.
+Memory Hot(Un)Plug Configuration via Sysfs
+------------------------------------------
-- If the memory block is online, you'll read "online".
-- If the memory block is offline, you'll read "offline".
+Some memory hot(un)plug properties can be configured or inspected via sysfs in::
+ /sys/devices/system/memory/
-.. _memory_hotplug_how_to_online_memory:
+The following files are currently defined:
-How to online memory
---------------------
+====================== =========================================================
+``auto_online_blocks`` read-write: set or get the default state of new memory
+ blocks; configure auto-onlining.
-When the memory is hot-added, the kernel decides whether or not to "online"
-it according to the policy which can be read from "auto_online_blocks" file::
+ The default value depends on the
+ CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel configuration
+ option.
- % cat /sys/devices/system/memory/auto_online_blocks
+ See the ``state`` property of memory blocks for details.
+``block_size_bytes`` read-only: the size in bytes of a memory block.
+``probe`` write-only: add (probe) selected memory blocks manually
+ from user space by supplying the physical start address.
-The default depends on the CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel config
-option. If it is disabled the default is "offline" which means the newly added
-memory is not in a ready-to-use state and you have to "online" the newly added
-memory blocks manually. Automatic onlining can be requested by writing "online"
-to "auto_online_blocks" file::
+ Availability depends on the CONFIG_ARCH_MEMORY_PROBE
+ kernel configuration option.
+``uevent`` read-write: generic udev file for device subsystems.
+``crash_hotplug`` read-only: when changes to the system memory map
+ occur due to hot un/plug of memory, this file contains
+ '1' if the kernel updates the kdump capture kernel memory
+ map itself (via elfcorehdr), or '0' if userspace must update
+ the kdump capture kernel memory map.
- % echo online > /sys/devices/system/memory/auto_online_blocks
+ Availability depends on the CONFIG_MEMORY_HOTPLUG kernel
+ configuration option.
+====================== =========================================================
-This sets a global policy and impacts all memory blocks that will subsequently
-be hotplugged. Currently offline blocks keep their state. It is possible, under
-certain circumstances, that some memory blocks will be added but will fail to
-online. User space tools can check their "state" files
-(``/sys/devices/system/memory/memoryXXX/state``) and try to online them manually.
+.. note::
-If the automatic onlining wasn't requested, failed, or some memory block was
-offlined it is possible to change the individual block's state by writing to the
-"state" file::
+ When the CONFIG_MEMORY_FAILURE kernel configuration option is enabled, two
+ additional files ``hard_offline_page`` and ``soft_offline_page`` are available
+ to trigger hwpoisoning of pages, for example, for testing purposes. Note that
+ this functionality is not really related to memory hot(un)plug or actual
+ offlining of memory blocks.
- % echo online > /sys/devices/system/memory/memoryXXX/state
+Memory Block Configuration via Sysfs
+------------------------------------
-This onlining will not change the ZONE type of the target memory block,
-If the memory block doesn't belong to any zone an appropriate kernel zone
-(usually ZONE_NORMAL) will be used unless movable_node kernel command line
-option is specified when ZONE_MOVABLE will be used.
+Each memory block is represented as a memory block device that can be
+onlined or offlined. All memory blocks have their device information located in
+sysfs. Each present memory block is listed under
+``/sys/devices/system/memory`` as::
-You can explicitly request to associate it with ZONE_MOVABLE by::
+ /sys/devices/system/memory/memoryXXX
- % echo online_movable > /sys/devices/system/memory/memoryXXX/state
+where XXX is the memory block id; the number of digits is variable.
-.. note:: current limit: this memory block must be adjacent to ZONE_MOVABLE
+A present memory block indicates that some memory in the range is present;
+however, a memory block might span memory holes. A memory block spanning memory
+holes cannot be offlined.
-Or you can explicitly request a kernel zone (usually ZONE_NORMAL) by::
+For example, assume 1 GiB memory block size. A device for a memory starting at
+0x100000000 is ``/sys/devices/system/memory/memory4``::
- % echo online_kernel > /sys/devices/system/memory/memoryXXX/state
+ (0x100000000 / 1Gib = 4)
+
+This device covers address range [0x100000000 ... 0x140000000)
-.. note:: current limit: this memory block must be adjacent to ZONE_NORMAL
+The following files are currently defined:
-An explicit zone onlining can fail (e.g. when the range is already within
-and existing and incompatible zone already).
+=================== ============================================================
+``online`` read-write: simplified interface to trigger onlining /
+ offlining and to observe the state of a memory block.
+ When onlining, the zone is selected automatically.
+``phys_device`` read-only: legacy interface only ever used on s390x to
+ expose the covered storage increment.
+``phys_index`` read-only: the memory block id (XXX).
+``removable`` read-only: legacy interface that indicated whether a memory
+ block was likely to be offlineable or not. Nowadays, the
+ kernel return ``1`` if and only if it supports memory
+ offlining.
+``state`` read-write: advanced interface to trigger onlining /
+ offlining and to observe the state of a memory block.
+
+ When writing, ``online``, ``offline``, ``online_kernel`` and
+ ``online_movable`` are supported.
+
+ ``online_movable`` specifies onlining to ZONE_MOVABLE.
+ ``online_kernel`` specifies onlining to the default kernel
+ zone for the memory block, such as ZONE_NORMAL.
+ ``online`` let's the kernel select the zone automatically.
+
+ When reading, ``online``, ``offline`` and ``going-offline``
+ may be returned.
+``uevent`` read-write: generic uevent file for devices.
+``valid_zones`` read-only: when a block is online, shows the zone it
+ belongs to; when a block is offline, shows what zone will
+ manage it when the block will be onlined.
+
+ For online memory blocks, ``DMA``, ``DMA32``, ``Normal``,
+ ``Movable`` and ``none`` may be returned. ``none`` indicates
+ that memory provided by a memory block is managed by
+ multiple zones or spans multiple nodes; such memory blocks
+ cannot be offlined. ``Movable`` indicates ZONE_MOVABLE.
+ Other values indicate a kernel zone.
+
+ For offline memory blocks, the first column shows the
+ zone the kernel would select when onlining the memory block
+ right now without further specifying a zone.
+
+ Availability depends on the CONFIG_MEMORY_HOTREMOVE
+ kernel configuration option.
+=================== ============================================================
+
+.. note::
-After this, memory block XXX's state will be 'online' and the amount of
-available memory will be increased.
+ If the CONFIG_NUMA kernel configuration option is enabled, the memoryXXX/
+ directories can also be accessed via symbolic links located in the
+ ``/sys/devices/system/node/node*`` directories.
-This may be changed in future.
+ For example::
-Logical memory remove
-=====================
+ /sys/devices/system/node/node0/memory9 -> ../../memory/memory9
-Memory offline and ZONE_MOVABLE
--------------------------------
+ A backlink will also be created::
-Memory offlining is more complicated than memory online. Because memory offline
-has to make the whole memory block be unused, memory offline can fail if
-the memory block includes memory which cannot be freed.
+ /sys/devices/system/memory/memory9/node0 -> ../../node/node0
+
+Command Line Parameters
+-----------------------
+
+Some command line parameters affect memory hot(un)plug handling. The following
+command line parameters are relevant:
+
+======================== =======================================================
+``memhp_default_state`` configure auto-onlining by essentially setting
+ ``/sys/devices/system/memory/auto_online_blocks``.
+``movable_node`` configure automatic zone selection in the kernel when
+ using the ``contig-zones`` online policy. When
+ set, the kernel will default to ZONE_MOVABLE when
+ onlining a memory block, unless other zones can be kept
+ contiguous.
+======================== =======================================================
+
+See Documentation/admin-guide/kernel-parameters.txt for a more generic
+description of these command line parameters.
+
+Module Parameters
+------------------
+
+Instead of additional command line parameters or sysfs files, the
+``memory_hotplug`` subsystem now provides a dedicated namespace for module
+parameters. Module parameters can be set via the command line by predicating
+them with ``memory_hotplug.`` such as::
+
+ memory_hotplug.memmap_on_memory=1
+
+and they can be observed (and some even modified at runtime) via::
+
+ /sys/module/memory_hotplug/parameters/
+
+The following module parameters are currently defined:
+
+================================ ===============================================
+``memmap_on_memory`` read-write: Allocate memory for the memmap from
+ the added memory block itself. Even if enabled,
+ actual support depends on various other system
+ properties and should only be regarded as a
+ hint whether the behavior would be desired.
+
+ While allocating the memmap from the memory
+ block itself makes memory hotplug less likely
+ to fail and keeps the memmap on the same NUMA
+ node in any case, it can fragment physical
+ memory in a way that huge pages in bigger
+ granularity cannot be formed on hotplugged
+ memory.
+
+ With value "force" it could result in memory
+ wastage due to memmap size limitations. For
+ example, if the memmap for a memory block
+ requires 1 MiB, but the pageblock size is 2
+ MiB, 1 MiB of hotplugged memory will be wasted.
+ Note that there are still cases where the
+ feature cannot be enforced: for example, if the
+ memmap is smaller than a single page, or if the
+ architecture does not support the forced mode
+ in all configurations.
+
+``online_policy`` read-write: Set the basic policy used for
+ automatic zone selection when onlining memory
+ blocks without specifying a target zone.
+ ``contig-zones`` has been the kernel default
+ before this parameter was added. After an
+ online policy was configured and memory was
+ online, the policy should not be changed
+ anymore.
+
+ When set to ``contig-zones``, the kernel will
+ try keeping zones contiguous. If a memory block
+ intersects multiple zones or no zone, the
+ behavior depends on the ``movable_node`` kernel
+ command line parameter: default to ZONE_MOVABLE
+ if set, default to the applicable kernel zone
+ (usually ZONE_NORMAL) if not set.
+
+ When set to ``auto-movable``, the kernel will
+ try onlining memory blocks to ZONE_MOVABLE if
+ possible according to the configuration and
+ memory device details. With this policy, one
+ can avoid zone imbalances when eventually
+ hotplugging a lot of memory later and still
+ wanting to be able to hotunplug as much as
+ possible reliably, very desirable in
+ virtualized environments. This policy ignores
+ the ``movable_node`` kernel command line
+ parameter and isn't really applicable in
+ environments that require it (e.g., bare metal
+ with hotunpluggable nodes) where hotplugged
+ memory might be exposed via the
+ firmware-provided memory map early during boot
+ to the system instead of getting detected,
+ added and onlined later during boot (such as
+ done by virtio-mem or by some hypervisors
+ implementing emulated DIMMs). As one example, a
+ hotplugged DIMM will be onlined either
+ completely to ZONE_MOVABLE or completely to
+ ZONE_NORMAL, not a mixture.
+ As another example, as many memory blocks
+ belonging to a virtio-mem device will be
+ onlined to ZONE_MOVABLE as possible,
+ special-casing units of memory blocks that can
+ only get hotunplugged together. *This policy
+ does not protect from setups that are
+ problematic with ZONE_MOVABLE and does not
+ change the zone of memory blocks dynamically
+ after they were onlined.*
+``auto_movable_ratio`` read-write: Set the maximum MOVABLE:KERNEL
+ memory ratio in % for the ``auto-movable``
+ online policy. Whether the ratio applies only
+ for the system across all NUMA nodes or also
+ per NUMA nodes depends on the
+ ``auto_movable_numa_aware`` configuration.
+
+ All accounting is based on present memory pages
+ in the zones combined with accounting per
+ memory device. Memory dedicated to the CMA
+ allocator is accounted as MOVABLE, although
+ residing on one of the kernel zones. The
+ possible ratio depends on the actual workload.
+ The kernel default is "301" %, for example,
+ allowing for hotplugging 24 GiB to a 8 GiB VM
+ and automatically onlining all hotplugged
+ memory to ZONE_MOVABLE in many setups. The
+ additional 1% deals with some pages being not
+ present, for example, because of some firmware
+ allocations.
+
+ Note that ZONE_NORMAL memory provided by one
+ memory device does not allow for more
+ ZONE_MOVABLE memory for a different memory
+ device. As one example, onlining memory of a
+ hotplugged DIMM to ZONE_NORMAL will not allow
+ for another hotplugged DIMM to get onlined to
+ ZONE_MOVABLE automatically. In contrast, memory
+ hotplugged by a virtio-mem device that got
+ onlined to ZONE_NORMAL will allow for more
+ ZONE_MOVABLE memory within *the same*
+ virtio-mem device.
+``auto_movable_numa_aware`` read-write: Configure whether the
+ ``auto_movable_ratio`` in the ``auto-movable``
+ online policy also applies per NUMA
+ node in addition to the whole system across all
+ NUMA nodes. The kernel default is "Y".
+
+ Disabling NUMA awareness can be helpful when
+ dealing with NUMA nodes that should be
+ completely hotunpluggable, onlining the memory
+ completely to ZONE_MOVABLE automatically if
+ possible.
+
+ Parameter availability depends on CONFIG_NUMA.
+================================ ===============================================
+
+ZONE_MOVABLE
+============
-In general, memory offline can use 2 techniques.
+ZONE_MOVABLE is an important mechanism for more reliable memory offlining.
+Further, having system RAM managed by ZONE_MOVABLE instead of one of the
+kernel zones can increase the number of possible transparent huge pages and
+dynamically allocated huge pages.
-(1) reclaim and free all memory in the memory block.
-(2) migrate all pages in the memory block.
+Most kernel allocations are unmovable. Important examples include the memory
+map (usually 1/64ths of memory), page tables, and kmalloc(). Such allocations
+can only be served from the kernel zones.
-In the current implementation, Linux's memory offline uses method (2), freeing
-all pages in the memory block by page migration. But not all pages are
-migratable. Under current Linux, migratable pages are anonymous pages and
-page caches. For offlining a memory block by migration, the kernel has to
-guarantee that the memory block contains only migratable pages.
+Most user space pages, such as anonymous memory, and page cache pages are
+movable. Such allocations can be served from ZONE_MOVABLE and the kernel zones.
-Now, a boot option for making a memory block which consists of migratable pages
-is supported. By specifying "kernelcore=" or "movablecore=" boot option, you can
-create ZONE_MOVABLE...a zone which is just used for movable pages.
-(See also Documentation/admin-guide/kernel-parameters.rst)
+Only movable allocations are served from ZONE_MOVABLE, resulting in unmovable
+allocations being limited to the kernel zones. Without ZONE_MOVABLE, there is
+absolutely no guarantee whether a memory block can be offlined successfully.
-Assume the system has "TOTAL" amount of memory at boot time, this boot option
-creates ZONE_MOVABLE as following.
+Zone Imbalances
+---------------
+
+Having too much system RAM managed by ZONE_MOVABLE is called a zone imbalance,
+which can harm the system or degrade performance. As one example, the kernel
+might crash because it runs out of free memory for unmovable allocations,
+although there is still plenty of free memory left in ZONE_MOVABLE.
-1) When kernelcore=YYYY boot option is used,
- Size of memory not for movable pages (not for offline) is YYYY.
- Size of memory for movable pages (for offline) is TOTAL-YYYY.
+Usually, MOVABLE:KERNEL ratios of up to 3:1 or even 4:1 are fine. Ratios of 63:1
+are definitely impossible due to the overhead for the memory map.
-2) When movablecore=ZZZZ boot option is used,
- Size of memory not for movable pages (not for offline) is TOTAL - ZZZZ.
- Size of memory for movable pages (for offline) is ZZZZ.
+Actual safe zone ratios depend on the workload. Extreme cases, like excessive
+long-term pinning of pages, might not be able to deal with ZONE_MOVABLE at all.
.. note::
- Unfortunately, there is no information to show which memory block belongs
- to ZONE_MOVABLE. This is TBD.
+ CMA memory part of a kernel zone essentially behaves like memory in
+ ZONE_MOVABLE and similar considerations apply, especially when combining
+ CMA with ZONE_MOVABLE.
-.. _memory_hotplug_how_to_offline_memory:
+ZONE_MOVABLE Sizing Considerations
+----------------------------------
-How to offline memory
----------------------
+We usually expect that a large portion of available system RAM will actually
+be consumed by user space, either directly or indirectly via the page cache. In
+the normal case, ZONE_MOVABLE can be used when allocating such pages just fine.
-You can offline a memory block by using the same sysfs interface that was used
-in memory onlining::
+With that in mind, it makes sense that we can have a big portion of system RAM
+managed by ZONE_MOVABLE. However, there are some things to consider when using
+ZONE_MOVABLE, especially when fine-tuning zone ratios:
- % echo offline > /sys/devices/system/memory/memoryXXX/state
+- Having a lot of offline memory blocks. Even offline memory blocks consume
+ memory for metadata and page tables in the direct map; having a lot of offline
+ memory blocks is not a typical case, though.
+
+- Memory ballooning without balloon compaction is incompatible with
+ ZONE_MOVABLE. Only some implementations, such as virtio-balloon and
+ pseries CMM, fully support balloon compaction.
+
+ Further, the CONFIG_BALLOON_COMPACTION kernel configuration option might be
+ disabled. In that case, balloon inflation will only perform unmovable
+ allocations and silently create a zone imbalance, usually triggered by
+ inflation requests from the hypervisor.
+
+- Gigantic pages are unmovable, resulting in user space consuming a
+ lot of unmovable memory.
+
+- Huge pages are unmovable when an architectures does not support huge
+ page migration, resulting in a similar issue as with gigantic pages.
+
+- Page tables are unmovable. Excessive swapping, mapping extremely large
+ files or ZONE_DEVICE memory can be problematic, although only really relevant
+ in corner cases. When we manage a lot of user space memory that has been
+ swapped out or is served from a file/persistent memory/... we still need a lot
+ of page tables to manage that memory once user space accessed that memory.
+
+- In certain DAX configurations the memory map for the device memory will be
+ allocated from the kernel zones.
+
+- KASAN can have a significant memory overhead, for example, consuming 1/8th of
+ the total system memory size as (unmovable) tracking metadata.
+
+- Long-term pinning of pages. Techniques that rely on long-term pinnings
+ (especially, RDMA and vfio/mdev) are fundamentally problematic with
+ ZONE_MOVABLE, and therefore, memory offlining. Pinned pages cannot reside
+ on ZONE_MOVABLE as that would turn these pages unmovable. Therefore, they
+ have to be migrated off that zone while pinning. Pinning a page can fail
+ even if there is plenty of free memory in ZONE_MOVABLE.
+
+ In addition, using ZONE_MOVABLE might make page pinning more expensive,
+ because of the page migration overhead.
+
+By default, all the memory configured at boot time is managed by the kernel
+zones and ZONE_MOVABLE is not used.
+
+To enable ZONE_MOVABLE to include the memory present at boot and to control the
+ratio between movable and kernel zones there are two command line options:
+``kernelcore=`` and ``movablecore=``. See
+Documentation/admin-guide/kernel-parameters.rst for their description.
+
+Memory Offlining and ZONE_MOVABLE
+---------------------------------
+
+Even with ZONE_MOVABLE, there are some corner cases where offlining a memory
+block might fail:
+
+- Memory blocks with memory holes; this applies to memory blocks present during
+ boot and can apply to memory blocks hotplugged via the XEN balloon and the
+ Hyper-V balloon.
+
+- Mixed NUMA nodes and mixed zones within a single memory block prevent memory
+ offlining; this applies to memory blocks present during boot only.
+
+- Special memory blocks prevented by the system from getting offlined. Examples
+ include any memory available during boot on arm64 or memory blocks spanning
+ the crashkernel area on s390x; this usually applies to memory blocks present
+ during boot only.
+
+- Memory blocks overlapping with CMA areas cannot be offlined, this applies to
+ memory blocks present during boot only.
+
+- Concurrent activity that operates on the same physical memory area, such as
+ allocating gigantic pages, can result in temporary offlining failures.
+
+- Out of memory when dissolving huge pages, especially when HugeTLB Vmemmap
+ Optimization (HVO) is enabled.
+
+ Offlining code may be able to migrate huge page contents, but may not be able
+ to dissolve the source huge page because it fails allocating (unmovable) pages
+ for the vmemmap, because the system might not have free memory in the kernel
+ zones left.
+
+ Users that depend on memory offlining to succeed for movable zones should
+ carefully consider whether the memory savings gained from this feature are
+ worth the risk of possibly not being able to offline memory in certain
+ situations.
+
+Further, when running into out of memory situations while migrating pages, or
+when still encountering permanently unmovable pages within ZONE_MOVABLE
+(-> BUG), memory offlining will keep retrying until it eventually succeeds.
+
+When offlining is triggered from user space, the offlining context can be
+terminated by sending a signal. A timeout based offlining can easily be
+implemented via::
-If offline succeeds, the state of the memory block is changed to be "offline".
-If it fails, some error core (like -EBUSY) will be returned by the kernel.
-Even if a memory block does not belong to ZONE_MOVABLE, you can try to offline
-it. If it doesn't contain 'unmovable' memory, you'll get success.
-
-A memory block under ZONE_MOVABLE is considered to be able to be offlined
-easily. But under some busy state, it may return -EBUSY. Even if a memory
-block cannot be offlined due to -EBUSY, you can retry offlining it and may be
-able to offline it (or not). (For example, a page is referred to by some kernel
-internal call and released soon.)
-
-Consideration:
- Memory hotplug's design direction is to make the possibility of memory
- offlining higher and to guarantee unplugging memory under any situation. But
- it needs more work. Returning -EBUSY under some situation may be good because
- the user can decide to retry more or not by himself. Currently, memory
- offlining code does some amount of retry with 120 seconds timeout.
-
-Physical memory remove
-======================
-
-Need more implementation yet....
- - Notification completion of remove works by OS to firmware.
- - Guard from remove if not yet.
-
-
-Locking Internals
-=================
-
-When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
-the device_hotplug_lock should be held to:
-
-- synchronize against online/offline requests (e.g. via sysfs). This way, memory
- block devices can only be accessed (.online/.state attributes) by user
- space once memory has been fully added. And when removing memory, we
- know nobody is in critical sections.
-- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC)
-
-Especially, there is a possible lock inversion that is avoided using
-device_hotplug_lock when adding memory and user space tries to online that
-memory faster than expected:
-
-- device_online() will first take the device_lock(), followed by
- mem_hotplug_lock
-- add_memory_resource() will first take the mem_hotplug_lock, followed by
- the device_lock() (while creating the devices, during bus_add_device()).
-
-As the device is visible to user space before taking the device_lock(), this
-can result in a lock inversion.
-
-onlining/offlining of memory should be done via device_online()/
-device_offline() - to make sure it is properly synchronized to actions
-via sysfs. Holding device_hotplug_lock is advised (to e.g. protect online_type)
-
-When adding/removing/onlining/offlining memory or adding/removing
-heterogeneous/device memory, we should always hold the mem_hotplug_lock in
-write mode to serialise memory hotplug (e.g. access to global/zone
-variables).
-
-In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read
-mode allows for a quite efficient get_online_mems/put_online_mems
-implementation, so code accessing memory can protect from that memory
-vanishing.
-
-
-Future Work
-===========
-
- - allowing memory hot-add to ZONE_MOVABLE. maybe we need some switch like
- sysctl or new control file.
- - showing memory block and physical device relationship.
- - test and make it better memory offlining.
- - support HugeTLB page migration and offlining.
- - memmap removing at memory offline.
- - physical remove memory.
+ % timeout $TIMEOUT offline_block | failure_handling
diff --git a/Documentation/admin-guide/mm/multigen_lru.rst b/Documentation/admin-guide/mm/multigen_lru.rst
new file mode 100644
index 000000000000..33e068830497
--- /dev/null
+++ b/Documentation/admin-guide/mm/multigen_lru.rst
@@ -0,0 +1,162 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+Multi-Gen LRU
+=============
+The multi-gen LRU is an alternative LRU implementation that optimizes
+page reclaim and improves performance under memory pressure. Page
+reclaim decides the kernel's caching policy and ability to overcommit
+memory. It directly impacts the kswapd CPU usage and RAM efficiency.
+
+Quick start
+===========
+Build the kernel with the following configurations.
+
+* ``CONFIG_LRU_GEN=y``
+* ``CONFIG_LRU_GEN_ENABLED=y``
+
+All set!
+
+Runtime options
+===============
+``/sys/kernel/mm/lru_gen/`` contains stable ABIs described in the
+following subsections.
+
+Kill switch
+-----------
+``enabled`` accepts different values to enable or disable the
+following components. Its default value depends on
+``CONFIG_LRU_GEN_ENABLED``. All the components should be enabled
+unless some of them have unforeseen side effects. Writing to
+``enabled`` has no effect when a component is not supported by the
+hardware, and valid values will be accepted even when the main switch
+is off.
+
+====== ===============================================================
+Values Components
+====== ===============================================================
+0x0001 The main switch for the multi-gen LRU.
+0x0002 Clearing the accessed bit in leaf page table entries in large
+ batches, when MMU sets it (e.g., on x86). This behavior can
+ theoretically worsen lock contention (mmap_lock). If it is
+ disabled, the multi-gen LRU will suffer a minor performance
+ degradation for workloads that contiguously map hot pages,
+ whose accessed bits can be otherwise cleared by fewer larger
+ batches.
+0x0004 Clearing the accessed bit in non-leaf page table entries as
+ well, when MMU sets it (e.g., on x86). This behavior was not
+ verified on x86 varieties other than Intel and AMD. If it is
+ disabled, the multi-gen LRU will suffer a negligible
+ performance degradation.
+[yYnN] Apply to all the components above.
+====== ===============================================================
+
+E.g.,
+::
+
+ echo y >/sys/kernel/mm/lru_gen/enabled
+ cat /sys/kernel/mm/lru_gen/enabled
+ 0x0007
+ echo 5 >/sys/kernel/mm/lru_gen/enabled
+ cat /sys/kernel/mm/lru_gen/enabled
+ 0x0005
+
+Thrashing prevention
+--------------------
+Personal computers are more sensitive to thrashing because it can
+cause janks (lags when rendering UI) and negatively impact user
+experience. The multi-gen LRU offers thrashing prevention to the
+majority of laptop and desktop users who do not have ``oomd``.
+
+Users can write ``N`` to ``min_ttl_ms`` to prevent the working set of
+``N`` milliseconds from getting evicted. The OOM killer is triggered
+if this working set cannot be kept in memory. In other words, this
+option works as an adjustable pressure relief valve, and when open, it
+terminates applications that are hopefully not being used.
+
+Based on the average human detectable lag (~100ms), ``N=1000`` usually
+eliminates intolerable janks due to thrashing. Larger values like
+``N=3000`` make janks less noticeable at the risk of premature OOM
+kills.
+
+The default value ``0`` means disabled.
+
+Experimental features
+=====================
+``/sys/kernel/debug/lru_gen`` accepts commands described in the
+following subsections. Multiple command lines are supported, so does
+concatenation with delimiters ``,`` and ``;``.
+
+``/sys/kernel/debug/lru_gen_full`` provides additional stats for
+debugging. ``CONFIG_LRU_GEN_STATS=y`` keeps historical stats from
+evicted generations in this file.
+
+Working set estimation
+----------------------
+Working set estimation measures how much memory an application needs
+in a given time interval, and it is usually done with little impact on
+the performance of the application. E.g., data centers want to
+optimize job scheduling (bin packing) to improve memory utilizations.
+When a new job comes in, the job scheduler needs to find out whether
+each server it manages can allocate a certain amount of memory for
+this new job before it can pick a candidate. To do so, the job
+scheduler needs to estimate the working sets of the existing jobs.
+
+When it is read, ``lru_gen`` returns a histogram of numbers of pages
+accessed over different time intervals for each memcg and node.
+``MAX_NR_GENS`` decides the number of bins for each histogram. The
+histograms are noncumulative.
+::
+
+ memcg memcg_id memcg_path
+ node node_id
+ min_gen_nr age_in_ms nr_anon_pages nr_file_pages
+ ...
+ max_gen_nr age_in_ms nr_anon_pages nr_file_pages
+
+Each bin contains an estimated number of pages that have been accessed
+within ``age_in_ms``. E.g., ``min_gen_nr`` contains the coldest pages
+and ``max_gen_nr`` contains the hottest pages, since ``age_in_ms`` of
+the former is the largest and that of the latter is the smallest.
+
+Users can write the following command to ``lru_gen`` to create a new
+generation ``max_gen_nr+1``:
+
+ ``+ memcg_id node_id max_gen_nr [can_swap [force_scan]]``
+
+``can_swap`` defaults to the swap setting and, if it is set to ``1``,
+it forces the scan of anon pages when swap is off, and vice versa.
+``force_scan`` defaults to ``1`` and, if it is set to ``0``, it
+employs heuristics to reduce the overhead, which is likely to reduce
+the coverage as well.
+
+A typical use case is that a job scheduler runs this command at a
+certain time interval to create new generations, and it ranks the
+servers it manages based on the sizes of their cold pages defined by
+this time interval.
+
+Proactive reclaim
+-----------------
+Proactive reclaim induces page reclaim when there is no memory
+pressure. It usually targets cold pages only. E.g., when a new job
+comes in, the job scheduler wants to proactively reclaim cold pages on
+the server it selected, to improve the chance of successfully landing
+this new job.
+
+Users can write the following command to ``lru_gen`` to evict
+generations less than or equal to ``min_gen_nr``.
+
+ ``- memcg_id node_id min_gen_nr [swappiness [nr_to_reclaim]]``
+
+``min_gen_nr`` should be less than ``max_gen_nr-1``, since
+``max_gen_nr`` and ``max_gen_nr-1`` are not fully aged (equivalent to
+the active list) and therefore cannot be evicted. ``swappiness``
+overrides the default value in ``/proc/sys/vm/swappiness``.
+``nr_to_reclaim`` limits the number of pages to evict.
+
+A typical use case is that a job scheduler runs this command before it
+tries to land a new job on a server. If it fails to materialize enough
+cold pages because of the overestimation, it retries on the next
+server according to the ranking result obtained from the working set
+estimation step. This less forceful approach limits the impacts on the
+existing jobs.
diff --git a/Documentation/admin-guide/mm/nommu-mmap.rst b/Documentation/admin-guide/mm/nommu-mmap.rst
new file mode 100644
index 000000000000..530fed08de2c
--- /dev/null
+++ b/Documentation/admin-guide/mm/nommu-mmap.rst
@@ -0,0 +1,283 @@
+=============================
+No-MMU memory mapping support
+=============================
+
+The kernel has limited support for memory mapping under no-MMU conditions, such
+as are used in uClinux environments. From the userspace point of view, memory
+mapping is made use of in conjunction with the mmap() system call, the shmat()
+call and the execve() system call. From the kernel's point of view, execve()
+mapping is actually performed by the binfmt drivers, which call back into the
+mmap() routines to do the actual work.
+
+Memory mapping behaviour also involves the way fork(), vfork(), clone() and
+ptrace() work. Under uClinux there is no fork(), and clone() must be supplied
+the CLONE_VM flag.
+
+The behaviour is similar between the MMU and no-MMU cases, but not identical;
+and it's also much more restricted in the latter case:
+
+ (#) Anonymous mapping, MAP_PRIVATE
+
+ In the MMU case: VM regions backed by arbitrary pages; copy-on-write
+ across fork.
+
+ In the no-MMU case: VM regions backed by arbitrary contiguous runs of
+ pages.
+
+ (#) Anonymous mapping, MAP_SHARED
+
+ These behave very much like private mappings, except that they're
+ shared across fork() or clone() without CLONE_VM in the MMU case. Since
+ the no-MMU case doesn't support these, behaviour is identical to
+ MAP_PRIVATE there.
+
+ (#) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, !PROT_WRITE
+
+ In the MMU case: VM regions backed by pages read from file; changes to
+ the underlying file are reflected in the mapping; copied across fork.
+
+ In the no-MMU case:
+
+ - If one exists, the kernel will re-use an existing mapping to the
+ same segment of the same file if that has compatible permissions,
+ even if this was created by another process.
+
+ - If possible, the file mapping will be directly on the backing device
+ if the backing device has the NOMMU_MAP_DIRECT capability and
+ appropriate mapping protection capabilities. Ramfs, romfs, cramfs
+ and mtd might all permit this.
+
+ - If the backing device can't or won't permit direct sharing,
+ but does have the NOMMU_MAP_COPY capability, then a copy of the
+ appropriate bit of the file will be read into a contiguous bit of
+ memory and any extraneous space beyond the EOF will be cleared
+
+ - Writes to the file do not affect the mapping; writes to the mapping
+ are visible in other processes (no MMU protection), but should not
+ happen.
+
+ (#) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, PROT_WRITE
+
+ In the MMU case: like the non-PROT_WRITE case, except that the pages in
+ question get copied before the write actually happens. From that point
+ on writes to the file underneath that page no longer get reflected into
+ the mapping's backing pages. The page is then backed by swap instead.
+
+ In the no-MMU case: works much like the non-PROT_WRITE case, except
+ that a copy is always taken and never shared.
+
+ (#) Regular file / blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE
+
+ In the MMU case: VM regions backed by pages read from file; changes to
+ pages written back to file; writes to file reflected into pages backing
+ mapping; shared across fork.
+
+ In the no-MMU case: not supported.
+
+ (#) Memory backed regular file, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE
+
+ In the MMU case: As for ordinary regular files.
+
+ In the no-MMU case: The filesystem providing the memory-backed file
+ (such as ramfs or tmpfs) may choose to honour an open, truncate, mmap
+ sequence by providing a contiguous sequence of pages to map. In that
+ case, a shared-writable memory mapping will be possible. It will work
+ as for the MMU case. If the filesystem does not provide any such
+ support, then the mapping request will be denied.
+
+ (#) Memory backed blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE
+
+ In the MMU case: As for ordinary regular files.
+
+ In the no-MMU case: As for memory backed regular files, but the
+ blockdev must be able to provide a contiguous run of pages without
+ truncate being called. The ramdisk driver could do this if it allocated
+ all its memory as a contiguous array upfront.
+
+ (#) Memory backed chardev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE
+
+ In the MMU case: As for ordinary regular files.
+
+ In the no-MMU case: The character device driver may choose to honour
+ the mmap() by providing direct access to the underlying device if it
+ provides memory or quasi-memory that can be accessed directly. Examples
+ of such are frame buffers and flash devices. If the driver does not
+ provide any such support, then the mapping request will be denied.
+
+
+Further notes on no-MMU MMAP
+============================
+
+ (#) A request for a private mapping of a file may return a buffer that is not
+ page-aligned. This is because XIP may take place, and the data may not be
+ paged aligned in the backing store.
+
+ (#) A request for an anonymous mapping will always be page aligned. If
+ possible the size of the request should be a power of two otherwise some
+ of the space may be wasted as the kernel must allocate a power-of-2
+ granule but will only discard the excess if appropriately configured as
+ this has an effect on fragmentation.
+
+ (#) The memory allocated by a request for an anonymous mapping will normally
+ be cleared by the kernel before being returned in accordance with the
+ Linux man pages (ver 2.22 or later).
+
+ In the MMU case this can be achieved with reasonable performance as
+ regions are backed by virtual pages, with the contents only being mapped
+ to cleared physical pages when a write happens on that specific page
+ (prior to which, the pages are effectively mapped to the global zero page
+ from which reads can take place). This spreads out the time it takes to
+ initialize the contents of a page - depending on the write-usage of the
+ mapping.
+
+ In the no-MMU case, however, anonymous mappings are backed by physical
+ pages, and the entire map is cleared at allocation time. This can cause
+ significant delays during a userspace malloc() as the C library does an
+ anonymous mapping and the kernel then does a memset for the entire map.
+
+ However, for memory that isn't required to be precleared - such as that
+ returned by malloc() - mmap() can take a MAP_UNINITIALIZED flag to
+ indicate to the kernel that it shouldn't bother clearing the memory before
+ returning it. Note that CONFIG_MMAP_ALLOW_UNINITIALIZED must be enabled
+ to permit this, otherwise the flag will be ignored.
+
+ uClibc uses this to speed up malloc(), and the ELF-FDPIC binfmt uses this
+ to allocate the brk and stack region.
+
+ (#) A list of all the private copy and anonymous mappings on the system is
+ visible through /proc/maps in no-MMU mode.
+
+ (#) A list of all the mappings in use by a process is visible through
+ /proc/<pid>/maps in no-MMU mode.
+
+ (#) Supplying MAP_FIXED or a requesting a particular mapping address will
+ result in an error.
+
+ (#) Files mapped privately usually have to have a read method provided by the
+ driver or filesystem so that the contents can be read into the memory
+ allocated if mmap() chooses not to map the backing device directly. An
+ error will result if they don't. This is most likely to be encountered
+ with character device files, pipes, fifos and sockets.
+
+
+Interprocess shared memory
+==========================
+
+Both SYSV IPC SHM shared memory and POSIX shared memory is supported in NOMMU
+mode. The former through the usual mechanism, the latter through files created
+on ramfs or tmpfs mounts.
+
+
+Futexes
+=======
+
+Futexes are supported in NOMMU mode if the arch supports them. An error will
+be given if an address passed to the futex system call lies outside the
+mappings made by a process or if the mapping in which the address lies does not
+support futexes (such as an I/O chardev mapping).
+
+
+No-MMU mremap
+=============
+
+The mremap() function is partially supported. It may change the size of a
+mapping, and may move it [#]_ if MREMAP_MAYMOVE is specified and if the new size
+of the mapping exceeds the size of the slab object currently occupied by the
+memory to which the mapping refers, or if a smaller slab object could be used.
+
+MREMAP_FIXED is not supported, though it is ignored if there's no change of
+address and the object does not need to be moved.
+
+Shared mappings may not be moved. Shareable mappings may not be moved either,
+even if they are not currently shared.
+
+The mremap() function must be given an exact match for base address and size of
+a previously mapped object. It may not be used to create holes in existing
+mappings, move parts of existing mappings or resize parts of mappings. It must
+act on a complete mapping.
+
+.. [#] Not currently supported.
+
+
+Providing shareable character device support
+============================================
+
+To provide shareable character device support, a driver must provide a
+file->f_op->get_unmapped_area() operation. The mmap() routines will call this
+to get a proposed address for the mapping. This may return an error if it
+doesn't wish to honour the mapping because it's too long, at a weird offset,
+under some unsupported combination of flags or whatever.
+
+The driver should also provide backing device information with capabilities set
+to indicate the permitted types of mapping on such devices. The default is
+assumed to be readable and writable, not executable, and only shareable
+directly (can't be copied).
+
+The file->f_op->mmap() operation will be called to actually inaugurate the
+mapping. It can be rejected at that point. Returning the ENOSYS error will
+cause the mapping to be copied instead if NOMMU_MAP_COPY is specified.
+
+The vm_ops->close() routine will be invoked when the last mapping on a chardev
+is removed. An existing mapping will be shared, partially or not, if possible
+without notifying the driver.
+
+It is permitted also for the file->f_op->get_unmapped_area() operation to
+return -ENOSYS. This will be taken to mean that this operation just doesn't
+want to handle it, despite the fact it's got an operation. For instance, it
+might try directing the call to a secondary driver which turns out not to
+implement it. Such is the case for the framebuffer driver which attempts to
+direct the call to the device-specific driver. Under such circumstances, the
+mapping request will be rejected if NOMMU_MAP_COPY is not specified, and a
+copy mapped otherwise.
+
+.. important::
+
+ Some types of device may present a different appearance to anyone
+ looking at them in certain modes. Flash chips can be like this; for
+ instance if they're in programming or erase mode, you might see the
+ status reflected in the mapping, instead of the data.
+
+ In such a case, care must be taken lest userspace see a shared or a
+ private mapping showing such information when the driver is busy
+ controlling the device. Remember especially: private executable
+ mappings may still be mapped directly off the device under some
+ circumstances!
+
+
+Providing shareable memory-backed file support
+==============================================
+
+Provision of shared mappings on memory backed files is similar to the provision
+of support for shared mapped character devices. The main difference is that the
+filesystem providing the service will probably allocate a contiguous collection
+of pages and permit mappings to be made on that.
+
+It is recommended that a truncate operation applied to such a file that
+increases the file size, if that file is empty, be taken as a request to gather
+enough pages to honour a mapping. This is required to support POSIX shared
+memory.
+
+Memory backed devices are indicated by the mapping's backing device info having
+the memory_backed flag set.
+
+
+Providing shareable block device support
+========================================
+
+Provision of shared mappings on block device files is exactly the same as for
+character devices. If there isn't a real device underneath, then the driver
+should allocate sufficient contiguous memory to honour any supported mapping.
+
+
+Adjusting page trimming behaviour
+=================================
+
+NOMMU mmap automatically rounds up to the nearest power-of-2 number of pages
+when performing an allocation. This can have adverse effects on memory
+fragmentation, and as such, is left configurable. The default behaviour is to
+aggressively trim allocations and discard any excess pages back in to the page
+allocator. In order to retain finer-grained control over fragmentation, this
+behaviour can either be disabled completely, or bumped up to a higher page
+watermark where trimming begins.
+
+Page trimming behaviour is configurable via the sysctl ``vm.nr_trim_pages``.
diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index 8463f5538fda..eca38fa81e0f 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -1,5 +1,3 @@
-.. _numa_memory_policy:
-
==================
NUMA Memory Policy
==================
@@ -111,7 +109,7 @@ VMA Policy
* A task may install a new VMA policy on a sub-range of a
previously mmap()ed region. When this happens, Linux splits
the existing virtual memory area into 2 or 3 VMAs, each with
- it's own policy.
+ its own policy.
* By default, VMA policy applies only to pages allocated after
the policy is installed. Any pages already faulted into the
@@ -245,6 +243,13 @@ MPOL_INTERLEAVED
address range or file. During system boot up, the temporary
interleaved system default policy works in this mode.
+MPOL_PREFERRED_MANY
+ This mode specifies that the allocation should be preferably
+ satisfied from the nodemask specified in the policy. If there is
+ a memory pressure on all nodes in the nodemask, the allocation
+ can fall back to all existing numa nodes. This is effectively
+ MPOL_PREFERRED allowed for a mask rather than a single node.
+
NUMA memory policy supports the following optional mode flags:
MPOL_F_STATIC_NODES
@@ -253,10 +258,10 @@ MPOL_F_STATIC_NODES
nodes changes after the memory policy has been defined.
Without this flag, any time a mempolicy is rebound because of a
- change in the set of allowed nodes, the node (Preferred) or
- nodemask (Bind, Interleave) is remapped to the new set of
- allowed nodes. This may result in nodes being used that were
- previously undesired.
+ change in the set of allowed nodes, the preferred nodemask (Preferred
+ Many), preferred node (Preferred) or nodemask (Bind, Interleave) is
+ remapped to the new set of allowed nodes. This may result in nodes
+ being used that were previously undesired.
With this flag, if the user-specified nodes overlap with the
nodes allowed by the task's cpuset, then the memory policy is
@@ -353,7 +358,7 @@ and NUMA nodes. "Usage" here means one of the following:
2) examination of the policy to determine the policy mode and associated node
or node lists, if any, for page allocation. This is considered a "hot
path". Note that for MPOL_BIND, the "usage" extends across the entire
- allocation process, which may sleep during page reclaimation, because the
+ allocation process, which may sleep during page reclamation, because the
BIND policy nodemask is used, by reference, to filter ineligible nodes.
We can avoid taking an extra reference during the usages listed above as
@@ -364,19 +369,19 @@ follows:
2) for querying the policy, we do not need to take an extra reference on the
target task's task policy nor vma policies because we always acquire the
- task's mm's mmap_sem for read during the query. The set_mempolicy() and
- mbind() APIs [see below] always acquire the mmap_sem for write when
+ task's mm's mmap_lock for read during the query. The set_mempolicy() and
+ mbind() APIs [see below] always acquire the mmap_lock for write when
installing or replacing task or vma policies. Thus, there is no possibility
of a task or thread freeing a policy while another task or thread is
querying it.
3) Page allocation usage of task or vma policy occurs in the fault path where
- we hold them mmap_sem for read. Again, because replacing the task or vma
- policy requires that the mmap_sem be held for write, the policy can't be
+ we hold them mmap_lock for read. Again, because replacing the task or vma
+ policy requires that the mmap_lock be held for write, the policy can't be
freed out from under us while we're using it for page allocation.
4) Shared policies require special consideration. One task can replace a
- shared memory policy while another task, with a distinct mmap_sem, is
+ shared memory policy while another task, with a distinct mmap_lock, is
querying or allocating a page based on the policy. To resolve this
potential race, the shared policy infrastructure adds an extra reference
to the shared policy during lookup while holding a spin lock on the shared
@@ -401,7 +406,7 @@ follows:
Memory Policy APIs
==================
-Linux supports 3 system calls for controlling memory policy. These APIS
+Linux supports 4 system calls for controlling memory policy. These APIS
always affect only the calling task, the calling task's address space, or
some shared object mapped into the calling task's address space.
@@ -453,6 +458,20 @@ requested via the 'flags' argument.
See the mbind(2) man page for more details.
+Set home node for a Range of Task's Address Spacec::
+
+ long sys_set_mempolicy_home_node(unsigned long start, unsigned long len,
+ unsigned long home_node,
+ unsigned long flags);
+
+sys_set_mempolicy_home_node set the home node for a VMA policy present in the
+task's address range. The system call updates the home node only for the existing
+mempolicy range. Other address ranges are ignored. A home node is the NUMA node
+closest to which page allocation will come from. Specifying the home node override
+the default allocation policy to allocate memory close to the local node for an
+executing CPU.
+
+
Memory Policy Command Line Interface
====================================
diff --git a/Documentation/admin-guide/mm/numaperf.rst b/Documentation/admin-guide/mm/numaperf.rst
index a80c3c37226e..90a12b6a8bfc 100644
--- a/Documentation/admin-guide/mm/numaperf.rst
+++ b/Documentation/admin-guide/mm/numaperf.rst
@@ -1,6 +1,7 @@
-.. _numaperf:
+=======================
+NUMA Memory Performance
+=======================
-=============
NUMA Locality
=============
@@ -56,7 +57,11 @@ nodes' access characteristics share the same performance relative to other
linked initiator nodes. Each target within an initiator's access class,
though, do not necessarily perform the same as each other.
-================
+The access class "1" is used to allow differentiation between initiators
+that are CPUs and hence suitable for generic task scheduling, and
+IO initiators such as GPUs and NICs. Unlike access class 0, only
+nodes containing CPUs are considered.
+
NUMA Performance
================
@@ -69,7 +74,7 @@ memory node's access class 0 initiators as follows::
/sys/devices/system/node/nodeY/access0/initiators/
These attributes apply only when accessed from nodes that have the
-are linked under the this access's inititiators.
+are linked under the this access's initiators.
The performance characteristics the kernel provides for the local initiators
are exported are as follows::
@@ -88,7 +93,9 @@ The latency attributes are provided in nanoseconds.
The values reported here correspond to the rated latency and bandwidth
for the platform.
-==========
+Access class 1 takes the same form but only includes values for CPU to
+memory activity.
+
NUMA Cache
==========
@@ -129,7 +136,7 @@ will create the following directory::
/sys/devices/system/node/nodeX/memory_side_cache/
-If that directory is not present, the system either does not not provide
+If that directory is not present, the system either does not provide
a memory-side cache, or that information is not accessible to the kernel.
The attributes for each level of cache is provided under its cache
@@ -143,7 +150,7 @@ Each cache level's directory provides its attributes. For example, the
following shows a single cache level and the attributes available for
software to query::
- # tree sys/devices/system/node/node0/memory_side_cache/
+ # tree /sys/devices/system/node/node0/memory_side_cache/
/sys/devices/system/node/node0/memory_side_cache/
|-- index1
| |-- indexing
@@ -162,7 +169,6 @@ The "size" is the number of bytes provided by this cache level.
The "write_policy" will be 0 for write-back, and non-zero for
write-through caching.
-========
See Also
========
diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
index 340a5aee9b80..f5f065c67615 100644
--- a/Documentation/admin-guide/mm/pagemap.rst
+++ b/Documentation/admin-guide/mm/pagemap.rst
@@ -1,5 +1,3 @@
-.. _pagemap:
-
=============================
Examining Process Page Tables
=============================
@@ -19,9 +17,11 @@ There are four components to pagemap:
* Bits 0-4 swap type if swapped
* Bits 5-54 swap offset if swapped
* Bit 55 pte is soft-dirty (see
- :ref:`Documentation/admin-guide/mm/soft-dirty.rst <soft_dirty>`)
+ Documentation/admin-guide/mm/soft-dirty.rst)
* Bit 56 page exclusively mapped (since 4.2)
- * Bits 57-60 zero
+ * Bit 57 pte is uffd-wp write-protected (since 5.13) (see
+ Documentation/admin-guide/mm/userfaultfd.rst)
+ * Bits 58-60 zero
* Bit 61 page is file-page or shared-anon (since 3.5)
* Bit 62 page swapped
* Bit 63 page present
@@ -44,7 +44,7 @@ There are four components to pagemap:
* ``/proc/kpagecount``. This file contains a 64-bit count of the number of
times each page is mapped, indexed by PFN.
-The page-types tool in the tools/vm directory can be used to query the
+The page-types tool in the tools/mm directory can be used to query the
number of times a page is mapped.
* ``/proc/kpageflags``. This file contains a 64-bit set of flags for each
@@ -88,13 +88,14 @@ Short descriptions to the page flags
====================================
0 - LOCKED
- page is being locked for exclusive access, e.g. by undergoing read/write IO
+ The page is being locked for exclusive access, e.g. by undergoing read/write
+ IO.
7 - SLAB
- page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator
- When compound page is used, SLUB/SLQB will only set this flag on the head
- page; SLOB will not flag it at all.
+ The page is managed by the SLAB/SLUB kernel memory allocator.
+ When compound page is used, either will only set this flag on the head
+ page.
10 - BUDDY
- a free memory block managed by the buddy system allocator
+ A free memory block managed by the buddy system allocator.
The buddy system organizes free memory in blocks of various orders.
An order N block has 2^N physically contiguous pages, with the BUDDY flag
set for and _only_ for the first page.
@@ -102,75 +103,74 @@ Short descriptions to the page flags
A compound page with order N consists of 2^N physically contiguous pages.
A compound page with order 2 takes the form of "HTTT", where H donates its
head page and T donates its tail page(s). The major consumers of compound
- pages are hugeTLB pages
- (:ref:`Documentation/admin-guide/mm/hugetlbpage.rst <hugetlbpage>`),
+ pages are hugeTLB pages (Documentation/admin-guide/mm/hugetlbpage.rst),
the SLUB etc. memory allocators and various device drivers.
However in this interface, only huge/giga pages are made visible
to end users.
16 - COMPOUND_TAIL
A compound page tail (see description above).
17 - HUGE
- this is an integral part of a HugeTLB page
+ This is an integral part of a HugeTLB page.
19 - HWPOISON
- hardware detected memory corruption on this page: don't touch the data!
+ Hardware detected memory corruption on this page: don't touch the data!
20 - NOPAGE
- no page frame exists at the requested address
+ No page frame exists at the requested address.
21 - KSM
- identical memory pages dynamically shared between one or more processes
+ Identical memory pages dynamically shared between one or more processes.
22 - THP
- contiguous pages which construct transparent hugepages
+ Contiguous pages which construct transparent hugepages.
23 - OFFLINE
- page is logically offline
+ The page is logically offline.
24 - ZERO_PAGE
- zero page for pfn_zero or huge_zero page
+ Zero page for pfn_zero or huge_zero page.
25 - IDLE
- page has not been accessed since it was marked idle (see
- :ref:`Documentation/admin-guide/mm/idle_page_tracking.rst <idle_page_tracking>`).
+ The page has not been accessed since it was marked idle (see
+ Documentation/admin-guide/mm/idle_page_tracking.rst).
Note that this flag may be stale in case the page was accessed via
a PTE. To make sure the flag is up-to-date one has to read
``/sys/kernel/mm/page_idle/bitmap`` first.
26 - PGTABLE
- page is in use as a page table
+ The page is in use as a page table.
IO related page flags
---------------------
1 - ERROR
- IO error occurred
+ IO error occurred.
3 - UPTODATE
- page has up-to-date data
+ The page has up-to-date data.
ie. for file backed page: (in-memory data revision >= on-disk one)
4 - DIRTY
- page has been written to, hence contains new data
+ The page has been written to, hence contains new data.
i.e. for file backed page: (in-memory data revision > on-disk one)
8 - WRITEBACK
- page is being synced to disk
+ The page is being synced to disk.
LRU related page flags
----------------------
5 - LRU
- page is in one of the LRU lists
+ The page is in one of the LRU lists.
6 - ACTIVE
- page is in the active LRU list
+ The page is in the active LRU list.
18 - UNEVICTABLE
- page is in the unevictable (non-)LRU list It is somehow pinned and
+ The page is in the unevictable (non-)LRU list It is somehow pinned and
not a candidate for LRU page reclaims, e.g. ramfs pages,
- shmctl(SHM_LOCK) and mlock() memory segments
+ shmctl(SHM_LOCK) and mlock() memory segments.
2 - REFERENCED
- page has been referenced since last LRU list enqueue/requeue
+ The page has been referenced since last LRU list enqueue/requeue.
9 - RECLAIM
- page will be reclaimed soon after its pageout IO completed
+ The page will be reclaimed soon after its pageout IO completed.
11 - MMAP
- a memory mapped page
+ A memory mapped page.
12 - ANON
- a memory mapped page that is not part of a file
+ A memory mapped page that is not part of a file.
13 - SWAPCACHE
- page is mapped to swap space, i.e. has an associated swap entry
+ The page is mapped to swap space, i.e. has an associated swap entry.
14 - SWAPBACKED
- page is backed by swap/RAM
+ The page is backed by swap/RAM.
-The page-types tool in the tools/vm directory can be used to query the
+The page-types tool in the tools/mm directory can be used to query the
above flags.
Using pagemap to do something useful
@@ -194,6 +194,28 @@ you can go through every map in the process, find the PFNs, look those up
in kpagecount, and tally up the number of pages that are only referenced
once.
+Exceptions for Shared Memory
+============================
+
+Page table entries for shared pages are cleared when the pages are zapped or
+swapped out. This makes swapped out pages indistinguishable from never-allocated
+ones.
+
+In kernel space, the swap location can still be retrieved from the page cache.
+However, values stored only on the normal PTE get lost irretrievably when the
+page is swapped out (i.e. SOFT_DIRTY).
+
+In user space, whether the page is present, swapped or none can be deduced with
+the help of lseek and/or mincore system calls.
+
+lseek() can differentiate between accessed pages (present or swapped out) and
+holes (none/non-allocated) by specifying the SEEK_DATA flag on the file where
+the pages are backed. For anonymous shared pages, the file can be found in
+``/proc/pid/map_files/``.
+
+mincore() can differentiate between pages in memory (present, including swap
+cache) and out of memory (swapped out or none/non-allocated).
+
Other notes
===========
@@ -205,3 +227,93 @@ Before Linux 3.11 pagemap bits 55-60 were used for "page-shift" (which is
always 12 at most architectures). Since Linux 3.11 their meaning changes
after first clear of soft-dirty bits. Since Linux 4.2 they are used for
flags unconditionally.
+
+Pagemap Scan IOCTL
+==================
+
+The ``PAGEMAP_SCAN`` IOCTL on the pagemap file can be used to get or optionally
+clear the info about page table entries. The following operations are supported
+in this IOCTL:
+
+- Scan the address range and get the memory ranges matching the provided criteria.
+ This is performed when the output buffer is specified.
+- Write-protect the pages. The ``PM_SCAN_WP_MATCHING`` is used to write-protect
+ the pages of interest. The ``PM_SCAN_CHECK_WPASYNC`` aborts the operation if
+ non-Async Write Protected pages are found. The ``PM_SCAN_WP_MATCHING`` can be
+ used with or without ``PM_SCAN_CHECK_WPASYNC``.
+- Both of those operations can be combined into one atomic operation where we can
+ get and write protect the pages as well.
+
+Following flags about pages are currently supported:
+
+- ``PAGE_IS_WPALLOWED`` - Page has async-write-protection enabled
+- ``PAGE_IS_WRITTEN`` - Page has been written to from the time it was write protected
+- ``PAGE_IS_FILE`` - Page is file backed
+- ``PAGE_IS_PRESENT`` - Page is present in the memory
+- ``PAGE_IS_SWAPPED`` - Page is in swapped
+- ``PAGE_IS_PFNZERO`` - Page has zero PFN
+- ``PAGE_IS_HUGE`` - Page is THP or Hugetlb backed
+- ``PAGE_IS_SOFT_DIRTY`` - Page is soft-dirty
+
+The ``struct pm_scan_arg`` is used as the argument of the IOCTL.
+
+ 1. The size of the ``struct pm_scan_arg`` must be specified in the ``size``
+ field. This field will be helpful in recognizing the structure if extensions
+ are done later.
+ 2. The flags can be specified in the ``flags`` field. The ``PM_SCAN_WP_MATCHING``
+ and ``PM_SCAN_CHECK_WPASYNC`` are the only added flags at this time. The get
+ operation is optionally performed depending upon if the output buffer is
+ provided or not.
+ 3. The range is specified through ``start`` and ``end``.
+ 4. The walk can abort before visiting the complete range such as the user buffer
+ can get full etc. The walk ending address is specified in``end_walk``.
+ 5. The output buffer of ``struct page_region`` array and size is specified in
+ ``vec`` and ``vec_len``.
+ 6. The optional maximum requested pages are specified in the ``max_pages``.
+ 7. The masks are specified in ``category_mask``, ``category_anyof_mask``,
+ ``category_inverted`` and ``return_mask``.
+
+Find pages which have been written and WP them as well::
+
+ struct pm_scan_arg arg = {
+ .size = sizeof(arg),
+ .flags = PM_SCAN_CHECK_WPASYNC | PM_SCAN_CHECK_WPASYNC,
+ ..
+ .category_mask = PAGE_IS_WRITTEN,
+ .return_mask = PAGE_IS_WRITTEN,
+ };
+
+Find pages which have been written, are file backed, not swapped and either
+present or huge::
+
+ struct pm_scan_arg arg = {
+ .size = sizeof(arg),
+ .flags = 0,
+ ..
+ .category_mask = PAGE_IS_WRITTEN | PAGE_IS_SWAPPED,
+ .category_inverted = PAGE_IS_SWAPPED,
+ .category_anyof_mask = PAGE_IS_PRESENT | PAGE_IS_HUGE,
+ .return_mask = PAGE_IS_WRITTEN | PAGE_IS_SWAPPED |
+ PAGE_IS_PRESENT | PAGE_IS_HUGE,
+ };
+
+The ``PAGE_IS_WRITTEN`` flag can be considered as a better-performing alternative
+of soft-dirty flag. It doesn't get affected by VMA merging of the kernel and hence
+the user can find the true soft-dirty pages in case of normal pages. (There may
+still be extra dirty pages reported for THP or Hugetlb pages.)
+
+"PAGE_IS_WRITTEN" category is used with uffd write protect-enabled ranges to
+implement memory dirty tracking in userspace:
+
+ 1. The userfaultfd file descriptor is created with ``userfaultfd`` syscall.
+ 2. The ``UFFD_FEATURE_WP_UNPOPULATED`` and ``UFFD_FEATURE_WP_ASYNC`` features
+ are set by ``UFFDIO_API`` IOCTL.
+ 3. The memory range is registered with ``UFFDIO_REGISTER_MODE_WP`` mode
+ through ``UFFDIO_REGISTER`` IOCTL.
+ 4. Then any part of the registered memory or the whole memory region must
+ be write protected using ``PAGEMAP_SCAN`` IOCTL with flag ``PM_SCAN_WP_MATCHING``
+ or the ``UFFDIO_WRITEPROTECT`` IOCTL can be used. Both of these perform the
+ same operation. The former is better in terms of performance.
+ 5. Now the ``PAGEMAP_SCAN`` IOCTL can be used to either just find pages which
+ have been written to since they were last marked and/or optionally write protect
+ the pages as well.
diff --git a/Documentation/admin-guide/mm/shrinker_debugfs.rst b/Documentation/admin-guide/mm/shrinker_debugfs.rst
new file mode 100644
index 000000000000..c582033bd113
--- /dev/null
+++ b/Documentation/admin-guide/mm/shrinker_debugfs.rst
@@ -0,0 +1,133 @@
+==========================
+Shrinker Debugfs Interface
+==========================
+
+Shrinker debugfs interface provides a visibility into the kernel memory
+shrinkers subsystem and allows to get information about individual shrinkers
+and interact with them.
+
+For each shrinker registered in the system a directory in **<debugfs>/shrinker/**
+is created. The directory's name is composed from the shrinker's name and an
+unique id: e.g. *kfree_rcu-0* or *sb-xfs:vda1-36*.
+
+Each shrinker directory contains **count** and **scan** files, which allow to
+trigger *count_objects()* and *scan_objects()* callbacks for each memcg and
+numa node (if applicable).
+
+Usage:
+------
+
+1. *List registered shrinkers*
+
+ ::
+
+ $ cd /sys/kernel/debug/shrinker/
+ $ ls
+ dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42
+ mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43
+ mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44
+ rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49
+ sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13
+ sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36
+ sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19
+ sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10
+ sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9
+ sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37
+ sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38
+ sb-dax-11 sb-proc-45 sb-tmpfs-35
+ sb-debugfs-7 sb-proc-46 sb-tmpfs-40
+
+2. *Get information about a specific shrinker*
+
+ ::
+
+ $ cd sb-btrfs\:vda2-24/
+ $ ls
+ count scan
+
+3. *Count objects*
+
+ Each line in the output has the following format::
+
+ <cgroup inode id> <nr of objects on node 0> <nr of objects on node 1> ...
+ <cgroup inode id> <nr of objects on node 0> <nr of objects on node 1> ...
+ ...
+
+ If there are no objects on all numa nodes, a line is omitted. If there
+ are no objects at all, the output might be empty.
+
+ If the shrinker is not memcg-aware or CONFIG_MEMCG is off, 0 is printed
+ as cgroup inode id. If the shrinker is not numa-aware, 0's are printed
+ for all nodes except the first one.
+ ::
+
+ $ cat count
+ 1 224 2
+ 21 98 0
+ 55 818 10
+ 2367 2 0
+ 2401 30 0
+ 225 13 0
+ 599 35 0
+ 939 124 0
+ 1041 3 0
+ 1075 1 0
+ 1109 1 0
+ 1279 60 0
+ 1313 7 0
+ 1347 39 0
+ 1381 3 0
+ 1449 14 0
+ 1483 63 0
+ 1517 53 0
+ 1551 6 0
+ 1585 1 0
+ 1619 6 0
+ 1653 40 0
+ 1687 11 0
+ 1721 8 0
+ 1755 4 0
+ 1789 52 0
+ 1823 888 0
+ 1857 1 0
+ 1925 2 0
+ 1959 32 0
+ 2027 22 0
+ 2061 9 0
+ 2469 799 0
+ 2537 861 0
+ 2639 1 0
+ 2707 70 0
+ 2775 4 0
+ 2877 84 0
+ 293 1 0
+ 735 8 0
+
+4. *Scan objects*
+
+ The expected input format::
+
+ <cgroup inode id> <numa id> <number of objects to scan>
+
+ For a non-memcg-aware shrinker or on a system with no memory
+ cgrups **0** should be passed as cgroup id.
+ ::
+
+ $ cd /sys/kernel/debug/shrinker/
+ $ cd sb-btrfs\:vda2-24/
+
+ $ cat count | head -n 5
+ 1 212 0
+ 21 97 0
+ 55 802 5
+ 2367 2 0
+ 225 13 0
+
+ $ echo "55 0 200" > scan
+
+ $ cat count | head -n 5
+ 1 212 0
+ 21 96 0
+ 55 752 5
+ 2367 2 0
+ 225 13 0
diff --git a/Documentation/admin-guide/mm/soft-dirty.rst b/Documentation/admin-guide/mm/soft-dirty.rst
index cb0cfd6672fa..aeea936caa44 100644
--- a/Documentation/admin-guide/mm/soft-dirty.rst
+++ b/Documentation/admin-guide/mm/soft-dirty.rst
@@ -1,5 +1,3 @@
-.. _soft_dirty:
-
===============
Soft-Dirty PTEs
===============
diff --git a/Documentation/admin-guide/mm/swap_numa.rst b/Documentation/admin-guide/mm/swap_numa.rst
new file mode 100644
index 000000000000..2e630627bcee
--- /dev/null
+++ b/Documentation/admin-guide/mm/swap_numa.rst
@@ -0,0 +1,78 @@
+===========================================
+Automatically bind swap device to numa node
+===========================================
+
+If the system has more than one swap device and swap device has the node
+information, we can make use of this information to decide which swap
+device to use in get_swap_pages() to get better performance.
+
+
+How to use this feature
+=======================
+
+Swap device has priority and that decides the order of it to be used. To make
+use of automatically binding, there is no need to manipulate priority settings
+for swap devices. e.g. on a 2 node machine, assume 2 swap devices swapA and
+swapB, with swapA attached to node 0 and swapB attached to node 1, are going
+to be swapped on. Simply swapping them on by doing::
+
+ # swapon /dev/swapA
+ # swapon /dev/swapB
+
+Then node 0 will use the two swap devices in the order of swapA then swapB and
+node 1 will use the two swap devices in the order of swapB then swapA. Note
+that the order of them being swapped on doesn't matter.
+
+A more complex example on a 4 node machine. Assume 6 swap devices are going to
+be swapped on: swapA and swapB are attached to node 0, swapC is attached to
+node 1, swapD and swapE are attached to node 2 and swapF is attached to node3.
+The way to swap them on is the same as above::
+
+ # swapon /dev/swapA
+ # swapon /dev/swapB
+ # swapon /dev/swapC
+ # swapon /dev/swapD
+ # swapon /dev/swapE
+ # swapon /dev/swapF
+
+Then node 0 will use them in the order of::
+
+ swapA/swapB -> swapC -> swapD -> swapE -> swapF
+
+swapA and swapB will be used in a round robin mode before any other swap device.
+
+node 1 will use them in the order of::
+
+ swapC -> swapA -> swapB -> swapD -> swapE -> swapF
+
+node 2 will use them in the order of::
+
+ swapD/swapE -> swapA -> swapB -> swapC -> swapF
+
+Similaly, swapD and swapE will be used in a round robin mode before any
+other swap devices.
+
+node 3 will use them in the order of::
+
+ swapF -> swapA -> swapB -> swapC -> swapD -> swapE
+
+
+Implementation details
+======================
+
+The current code uses a priority based list, swap_avail_list, to decide
+which swap device to use and if multiple swap devices share the same
+priority, they are used round robin. This change here replaces the single
+global swap_avail_list with a per-numa-node list, i.e. for each numa node,
+it sees its own priority based list of available swap devices. Swap
+device's priority can be promoted on its matching node's swap_avail_list.
+
+The current swap device's priority is set as: user can set a >=0 value,
+or the system will pick one starting from -1 then downwards. The priority
+value in the swap_avail_list is the negated value of the swap device's
+due to plist being sorted from low to high. The new policy doesn't change
+the semantics for priority >=0 cases, the previous starting from -1 then
+downwards now becomes starting from -2 then downwards and -1 is reserved
+as the promoted value. So if multiple swap devices are attached to the same
+node, they will all be promoted to priority -1 on that node's plist and will
+be used round robin before any other swap devices.
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 2f31de8f7c74..04eb45a2f940 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -1,5 +1,3 @@
-.. _admin_guide_transhuge:
-
============================
Transparent Hugepage Support
============================
@@ -47,10 +45,25 @@ components:
the two is using hugepages just because of the fact the TLB miss is
going to run faster.
+Modern kernels support "multi-size THP" (mTHP), which introduces the
+ability to allocate memory in blocks that are bigger than a base page
+but smaller than traditional PMD-size (as described above), in
+increments of a power-of-2 number of pages. mTHP can back anonymous
+memory (for example 16K, 32K, 64K, etc). These THPs continue to be
+PTE-mapped, but in many cases can still provide similar benefits to
+those outlined above: Page faults are significantly reduced (by a
+factor of e.g. 4, 8, 16, etc), but latency spikes are much less
+prominent because the size of each page isn't as huge as the PMD-sized
+variant and there is less memory to clear in each page fault. Some
+architectures also employ TLB compression mechanisms to squeeze more
+entries in when a set of PTEs are virtually and physically contiguous
+and approporiately aligned. In this case, TLB misses will occur less
+often.
+
THP can be enabled system wide or restricted to certain tasks or even
memory ranges inside task's address space. Unless THP is completely
disabled, there is ``khugepaged`` daemon that scans memory and
-collapses sequences of basic pages into huge pages.
+collapses sequences of basic pages into PMD-sized huge pages.
The THP behaviour is controlled via :ref:`sysfs <thp_sysfs>`
interface and using madvise(2) and prctl(2) system calls.
@@ -97,12 +110,40 @@ Global THP controls
Transparent Hugepage Support for anonymous memory can be entirely disabled
(mostly for debugging purposes) or only enabled inside MADV_HUGEPAGE
regions (to avoid the risk of consuming more memory resources) or enabled
-system wide. This can be achieved with one of::
+system wide. This can be achieved per-supported-THP-size with one of::
+
+ echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
+ echo madvise >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
+ echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
+
+where <size> is the hugepage size being addressed, the available sizes
+for which vary by system.
+
+For example::
+
+ echo always >/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
+
+Alternatively it is possible to specify that a given hugepage size
+will inherit the top-level "enabled" value::
+
+ echo inherit >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
+
+For example::
+
+ echo inherit >/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
+
+The top-level setting (for use with "inherit") can be set by issuing
+one of the following commands::
echo always >/sys/kernel/mm/transparent_hugepage/enabled
echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
echo never >/sys/kernel/mm/transparent_hugepage/enabled
+By default, PMD-sized hugepages have enabled="inherit" and all other
+hugepage sizes have enabled="never". If enabling multiple hugepage
+sizes, the kernel will select the most appropriate enabled size for a
+given allocation.
+
It's also possible to limit defrag efforts in the VM to generate
anonymous hugepages in case they're not immediately free to madvise
regions or to never try to defrag memory and simply fallback to regular
@@ -148,25 +189,34 @@ madvise
never
should be self-explanatory.
-By default kernel tries to use huge zero page on read page fault to
-anonymous mapping. It's possible to disable huge zero page by writing 0
-or enable it back by writing 1::
+By default kernel tries to use huge, PMD-mappable zero page on read
+page fault to anonymous mapping. It's possible to disable huge zero
+page by writing 0 or enable it back by writing 1::
echo 0 >/sys/kernel/mm/transparent_hugepage/use_zero_page
echo 1 >/sys/kernel/mm/transparent_hugepage/use_zero_page
-Some userspace (such as a test program, or an optimized memory allocation
-library) may want to know the size (in bytes) of a transparent hugepage::
+Some userspace (such as a test program, or an optimized memory
+allocation library) may want to know the size (in bytes) of a
+PMD-mappable transparent hugepage::
cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
-khugepaged will be automatically started when
-transparent_hugepage/enabled is set to "always" or "madvise, and it'll
-be automatically shutdown if it's set to "never".
+khugepaged will be automatically started when one or more hugepage
+sizes are enabled (either by directly setting "always" or "madvise",
+or by setting "inherit" while the top-level enabled is set to "always"
+or "madvise"), and it'll be automatically shutdown when the last
+hugepage size is disabled (either by directly setting "never", or by
+setting "inherit" while the top-level enabled is set to "never").
Khugepaged controls
-------------------
+.. note::
+ khugepaged currently only searches for opportunities to collapse to
+ PMD-sized THP and no attempt is made to collapse to other THP
+ sizes.
+
khugepaged runs usually at low frequency so while one may not want to
invoke defrag algorithms synchronously during the page faults, it
should be worth invoking defrag at least in khugepaged. However it's
@@ -191,7 +241,14 @@ allocation failure to throttle the next allocation attempt::
/sys/kernel/mm/transparent_hugepage/khugepaged/alloc_sleep_millisecs
-The khugepaged progress can be seen in the number of pages collapsed::
+The khugepaged progress can be seen in the number of pages collapsed (note
+that this counter may not be an exact count of the number of pages
+collapsed, since "collapsed" could mean multiple things: (1) A PTE mapping
+being replaced by a PMD mapping, or (2) All 4K physical pages replaced by
+one 2M hugepage. Each may happen independently, or together, depending on
+the type of memory and the failures that occur. As such, this value should
+be interpreted roughly as a sign of progress, and counters in /proc/vmstat
+consulted for more accurate accounting)::
/sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed
@@ -220,6 +277,13 @@ memory. A lower value can prevent THPs from being
collapsed, resulting fewer pages being collapsed into
THPs, and lower memory access performance.
+``max_ptes_shared`` specifies how many pages can be shared across multiple
+processes. Exceeding the number would block the collapse::
+
+ /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared
+
+A higher value may increase memory footprint for some workloads.
+
Boot parameter
==============
@@ -270,19 +334,26 @@ force
Need of application restart
===========================
-The transparent_hugepage/enabled values and tmpfs mount option only affect
-future behavior. So to make them effective you need to restart any
-application that could have been using hugepages. This also applies to the
-regions registered in khugepaged.
+The transparent_hugepage/enabled and
+transparent_hugepage/hugepages-<size>kB/enabled values and tmpfs mount
+option only affect future behavior. So to make them effective you need
+to restart any application that could have been using hugepages. This
+also applies to the regions registered in khugepaged.
Monitoring usage
================
-The number of anonymous transparent huge pages currently used by the
+.. note::
+ Currently the below counters only record events relating to
+ PMD-sized THP. Events relating to other THP sizes are not included.
+
+The number of PMD-sized anonymous transparent huge pages currently used by the
system is available by reading the AnonHugePages field in ``/proc/meminfo``.
-To identify what applications are using anonymous transparent huge pages,
-it is necessary to read ``/proc/PID/smaps`` and count the AnonHugePages fields
-for each mapping.
+To identify what applications are using PMD-sized anonymous transparent huge
+pages, it is necessary to read ``/proc/PID/smaps`` and count the AnonHugePages
+fields for each mapping. (Note that AnonHugePages only applies to traditional
+PMD-sized THP for historical reasons and should have been called
+AnonHugePmdMapped).
The number of file transparent huge pages mapped to userspace is available
by reading ShmemPmdMapped and ShmemHugePages fields in ``/proc/meminfo``.
@@ -298,8 +369,7 @@ monitor how successfully the system is providing huge pages for use.
thp_fault_alloc
is incremented every time a huge page is successfully
- allocated to handle a page fault. This applies to both the
- first time a page is faulted and for COW faults.
+ allocated to handle a page fault.
thp_collapse_alloc
is incremented by khugepaged when it has found
@@ -360,10 +430,9 @@ thp_split_pmd
page table entry.
thp_zero_page_alloc
- is incremented every time a huge zero page is
- successfully allocated. It includes allocations which where
- dropped due race with other allocation. Note, it doesn't count
- every map of the huge zero page, only its allocation.
+ is incremented every time a huge zero page used for thp is
+ successfully allocated. Note, it doesn't count every map of
+ the huge zero page, only its allocation.
thp_zero_page_alloc_failed
is incremented if kernel fails to allocate
@@ -395,30 +464,15 @@ compact_fail
is incremented if the system tries to compact memory
but failed.
-compact_pages_moved
- is incremented each time a page is moved. If
- this value is increasing rapidly, it implies that the system
- is copying a lot of data to satisfy the huge page allocation.
- It is possible that the cost of copying exceeds any savings
- from reduced TLB misses.
-
-compact_pagemigrate_failed
- is incremented when the underlying mechanism
- for moving a page failed.
-
-compact_blocks_moved
- is incremented each time memory compaction examines
- a huge page aligned range of pages.
-
It is possible to establish how long the stalls were using the function
-tracer to record how long was spent in __alloc_pages_nodemask and
+tracer to record how long was spent in __alloc_pages() and
using the mm_page_alloc tracepoint to identify which allocations were
for huge pages.
Optimizing the applications
===========================
-To be guaranteed that the kernel will map a 2M page immediately in any
+To be guaranteed that the kernel will map a THP immediately in any
memory region, the mmap region has to be hugepage naturally
aligned. posix_memalign() can provide that guarantee.
diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index c30176e67900..e5cc8848dcb3 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -1,5 +1,3 @@
-.. _userfaultfd:
-
===========
Userfaultfd
===========
@@ -12,109 +10,173 @@ and more generally they allow userland to take control of various
memory page faults, something otherwise only the kernel code could do.
For example userfaults allows a proper and more optimal implementation
-of the PROT_NONE+SIGSEGV trick.
+of the ``PROT_NONE+SIGSEGV`` trick.
Design
======
-Userfaults are delivered and resolved through the userfaultfd syscall.
+Userspace creates a new userfaultfd, initializes it, and registers one or more
+regions of virtual memory with it. Then, any page faults which occur within the
+region(s) result in a message being delivered to the userfaultfd, notifying
+userspace of the fault.
-The userfaultfd (aside from registering and unregistering virtual
+The ``userfaultfd`` (aside from registering and unregistering virtual
memory ranges) provides two primary functionalities:
-1) read/POLLIN protocol to notify a userland thread of the faults
+1) ``read/POLLIN`` protocol to notify a userland thread of the faults
happening
-2) various UFFDIO_* ioctls that can manage the virtual memory regions
- registered in the userfaultfd that allows userland to efficiently
+2) various ``UFFDIO_*`` ioctls that can manage the virtual memory regions
+ registered in the ``userfaultfd`` that allows userland to efficiently
resolve the userfaults it receives via 1) or to manage the virtual
memory in the background
The real advantage of userfaults if compared to regular virtual memory
management of mremap/mprotect is that the userfaults in all their
operations never involve heavyweight structures like vmas (in fact the
-userfaultfd runtime load never takes the mmap_sem for writing).
-
+``userfaultfd`` runtime load never takes the mmap_lock for writing).
Vmas are not suitable for page- (or hugepage) granular fault tracking
when dealing with virtual address spaces that could span
Terabytes. Too many vmas would be needed for that.
-The userfaultfd once opened by invoking the syscall, can also be
+The ``userfaultfd``, once created, can also be
passed using unix domain sockets to a manager process, so the same
manager process could handle the userfaults of a multitude of
different processes without them being aware about what is going on
-(well of course unless they later try to use the userfaultfd
+(well of course unless they later try to use the ``userfaultfd``
themselves on the same region the manager is already tracking, which
-is a corner case that would currently return -EBUSY).
+is a corner case that would currently return ``-EBUSY``).
API
===
-When first opened the userfaultfd must be enabled invoking the
-UFFDIO_API ioctl specifying a uffdio_api.api value set to UFFD_API (or
-a later API version) which will specify the read/POLLIN protocol
-userland intends to speak on the UFFD and the uffdio_api.features
-userland requires. The UFFDIO_API ioctl if successful (i.e. if the
-requested uffdio_api.api is spoken also by the running kernel and the
+Creating a userfaultfd
+----------------------
+
+There are two ways to create a new userfaultfd, each of which provide ways to
+restrict access to this functionality (since historically userfaultfds which
+handle kernel page faults have been a useful tool for exploiting the kernel).
+
+The first way, supported since userfaultfd was introduced, is the
+userfaultfd(2) syscall. Access to this is controlled in several ways:
+
+- Any user can always create a userfaultfd which traps userspace page faults
+ only. Such a userfaultfd can be created using the userfaultfd(2) syscall
+ with the flag UFFD_USER_MODE_ONLY.
+
+- In order to also trap kernel page faults for the address space, either the
+ process needs the CAP_SYS_PTRACE capability, or the system must have
+ vm.unprivileged_userfaultfd set to 1. By default, vm.unprivileged_userfaultfd
+ is set to 0.
+
+The second way, added to the kernel more recently, is by opening
+/dev/userfaultfd and issuing a USERFAULTFD_IOC_NEW ioctl to it. This method
+yields equivalent userfaultfds to the userfaultfd(2) syscall.
+
+Unlike userfaultfd(2), access to /dev/userfaultfd is controlled via normal
+filesystem permissions (user/group/mode), which gives fine grained access to
+userfaultfd specifically, without also granting other unrelated privileges at
+the same time (as e.g. granting CAP_SYS_PTRACE would do). Users who have access
+to /dev/userfaultfd can always create userfaultfds that trap kernel page faults;
+vm.unprivileged_userfaultfd is not considered.
+
+Initializing a userfaultfd
+--------------------------
+
+When first opened the ``userfaultfd`` must be enabled invoking the
+``UFFDIO_API`` ioctl specifying a ``uffdio_api.api`` value set to ``UFFD_API`` (or
+a later API version) which will specify the ``read/POLLIN`` protocol
+userland intends to speak on the ``UFFD`` and the ``uffdio_api.features``
+userland requires. The ``UFFDIO_API`` ioctl if successful (i.e. if the
+requested ``uffdio_api.api`` is spoken also by the running kernel and the
requested features are going to be enabled) will return into
-uffdio_api.features and uffdio_api.ioctls two 64bit bitmasks of
+``uffdio_api.features`` and ``uffdio_api.ioctls`` two 64bit bitmasks of
respectively all the available features of the read(2) protocol and
the generic ioctl available.
-The uffdio_api.features bitmask returned by the UFFDIO_API ioctl
-defines what memory types are supported by the userfaultfd and what
-events, except page fault notifications, may be generated.
-
-If the kernel supports registering userfaultfd ranges on hugetlbfs
-virtual memory areas, UFFD_FEATURE_MISSING_HUGETLBFS will be set in
-uffdio_api.features. Similarly, UFFD_FEATURE_MISSING_SHMEM will be
-set if the kernel supports registering userfaultfd ranges on shared
-memory (covering all shmem APIs, i.e. tmpfs, IPCSHM, /dev/zero
-MAP_SHARED, memfd_create, etc).
-
-The userland application that wants to use userfaultfd with hugetlbfs
-or shared memory need to set the corresponding flag in
-uffdio_api.features to enable those features.
-
-If the userland desires to receive notifications for events other than
-page faults, it has to verify that uffdio_api.features has appropriate
-UFFD_FEATURE_EVENT_* bits set. These events are described in more
-detail below in "Non-cooperative userfaultfd" section.
-
-Once the userfaultfd has been enabled the UFFDIO_REGISTER ioctl should
-be invoked (if present in the returned uffdio_api.ioctls bitmask) to
-register a memory range in the userfaultfd by setting the
-uffdio_register structure accordingly. The uffdio_register.mode
+The ``uffdio_api.features`` bitmask returned by the ``UFFDIO_API`` ioctl
+defines what memory types are supported by the ``userfaultfd`` and what
+events, except page fault notifications, may be generated:
+
+- The ``UFFD_FEATURE_EVENT_*`` flags indicate that various other events
+ other than page faults are supported. These events are described in more
+ detail below in the `Non-cooperative userfaultfd`_ section.
+
+- ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM``
+ indicate that the kernel supports ``UFFDIO_REGISTER_MODE_MISSING``
+ registrations for hugetlbfs and shared memory (covering all shmem APIs,
+ i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, ``MAP_SHARED``, ``memfd_create``,
+ etc) virtual memory areas, respectively.
+
+- ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports
+ ``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs virtual memory
+ areas. ``UFFD_FEATURE_MINOR_SHMEM`` is the analogous feature indicating
+ support for shmem virtual memory areas.
+
+- ``UFFD_FEATURE_MOVE`` indicates that the kernel supports moving an
+ existing page contents from userspace.
+
+The userland application should set the feature flags it intends to use
+when invoking the ``UFFDIO_API`` ioctl, to request that those features be
+enabled if supported.
+
+Once the ``userfaultfd`` API has been enabled the ``UFFDIO_REGISTER``
+ioctl should be invoked (if present in the returned ``uffdio_api.ioctls``
+bitmask) to register a memory range in the ``userfaultfd`` by setting the
+uffdio_register structure accordingly. The ``uffdio_register.mode``
bitmask will specify to the kernel which kind of faults to track for
-the range (UFFDIO_REGISTER_MODE_MISSING would track missing
-pages). The UFFDIO_REGISTER ioctl will return the
-uffdio_register.ioctls bitmask of ioctls that are suitable to resolve
+the range. The ``UFFDIO_REGISTER`` ioctl will return the
+``uffdio_register.ioctls`` bitmask of ioctls that are suitable to resolve
userfaults on the range registered. Not all ioctls will necessarily be
-supported for all memory types depending on the underlying virtual
-memory backend (anonymous memory vs tmpfs vs real filebacked
-mappings).
+supported for all memory types (e.g. anonymous memory vs. shmem vs.
+hugetlbfs), or all types of intercepted faults.
-Userland can use the uffdio_register.ioctls to manage the virtual
+Userland can use the ``uffdio_register.ioctls`` to manage the virtual
address space in the background (to add or potentially also remove
-memory from the userfaultfd registered range). This means a userfault
+memory from the ``userfaultfd`` registered range). This means a userfault
could be triggering just before userland maps in the background the
user-faulted page.
-The primary ioctl to resolve userfaults is UFFDIO_COPY. That
-atomically copies a page into the userfault registered range and wakes
-up the blocked userfaults (unless uffdio_copy.mode &
-UFFDIO_COPY_MODE_DONTWAKE is set). Other ioctl works similarly to
-UFFDIO_COPY. They're atomic as in guaranteeing that nothing can see an
-half copied page since it'll keep userfaulting until the copy has
-finished.
+Resolving Userfaults
+--------------------
+
+There are three basic ways to resolve userfaults:
+
+- ``UFFDIO_COPY`` atomically copies some existing page contents from
+ userspace.
+
+- ``UFFDIO_ZEROPAGE`` atomically zeros the new page.
+
+- ``UFFDIO_CONTINUE`` maps an existing, previously-populated page.
+
+These operations are atomic in the sense that they guarantee nothing can
+see a half-populated page, since readers will keep userfaulting until the
+operation has finished.
+
+By default, these wake up userfaults blocked on the range in question.
+They support a ``UFFDIO_*_MODE_DONTWAKE`` ``mode`` flag, which indicates
+that waking will be done separately at some later time.
+
+Which ioctl to choose depends on the kind of page fault, and what we'd
+like to do to resolve it:
+
+- For ``UFFDIO_REGISTER_MODE_MISSING`` faults, the fault needs to be
+ resolved by either providing a new page (``UFFDIO_COPY``), or mapping
+ the zero page (``UFFDIO_ZEROPAGE``). By default, the kernel would map
+ the zero page for a missing fault. With userfaultfd, userspace can
+ decide what content to provide before the faulting thread continues.
+
+- For ``UFFDIO_REGISTER_MODE_MINOR`` faults, there is an existing page (in
+ the page cache). Userspace has the option of modifying the page's
+ contents before resolving the fault. Once the contents are correct
+ (modified or not), userspace asks the kernel to map the page and let the
+ faulting thread continue with ``UFFDIO_CONTINUE``.
Notes:
-- If you requested UFFDIO_REGISTER_MODE_MISSING when registering then
- you must provide some kind of page in your thread after reading from
- the uffd. You must provide either UFFDIO_COPY or UFFDIO_ZEROPAGE.
- The normal behavior of the OS automatically providing a zero page on
- an annonymous mmaping is not in place.
+- You can tell which kind of fault occurred by examining
+ ``pagefault.flags`` within the ``uffd_msg``, checking for the
+ ``UFFD_PAGEFAULT_FLAG_*`` flags.
- None of the page-delivering ioctls default to the range that you
registered with. You must fill in all fields for the appropriate
@@ -122,13 +184,13 @@ Notes:
- You get the address of the access that triggered the missing page
event out of a struct uffd_msg that you read in the thread from the
- uffd. You can supply as many pages as you want with UFFDIO_COPY or
- UFFDIO_ZEROPAGE. Keep in mind that unless you used DONTWAKE then
- the first of any of those IOCTLs wakes up the faulting thread.
+ uffd. You can supply as many pages as you want with these IOCTLs.
+ Keep in mind that unless you used DONTWAKE then the first of any of
+ those IOCTLs wakes up the faulting thread.
-- Be sure to test for all errors including (pollfd[0].revents &
- POLLERR). This can happen, e.g. when ranges supplied were
- incorrect.
+- Be sure to test for all errors including
+ (``pollfd[0].revents & POLLERR``). This can happen, e.g. when ranges
+ supplied were incorrect.
Write Protect Notifications
---------------------------
@@ -136,41 +198,117 @@ Write Protect Notifications
This is equivalent to (but faster than) using mprotect and a SIGSEGV
signal handler.
-Firstly you need to register a range with UFFDIO_REGISTER_MODE_WP.
-Instead of using mprotect(2) you use ioctl(uffd, UFFDIO_WRITEPROTECT,
-struct *uffdio_writeprotect) while mode = UFFDIO_WRITEPROTECT_MODE_WP
+Firstly you need to register a range with ``UFFDIO_REGISTER_MODE_WP``.
+Instead of using mprotect(2) you use
+``ioctl(uffd, UFFDIO_WRITEPROTECT, struct *uffdio_writeprotect)``
+while ``mode = UFFDIO_WRITEPROTECT_MODE_WP``
in the struct passed in. The range does not default to and does not
have to be identical to the range you registered with. You can write
protect as many ranges as you like (inside the registered range).
Then, in the thread reading from uffd the struct will have
-msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP set. Now you send
-ioctl(uffd, UFFDIO_WRITEPROTECT, struct *uffdio_writeprotect) again
-while pagefault.mode does not have UFFDIO_WRITEPROTECT_MODE_WP set.
-This wakes up the thread which will continue to run with writes. This
+``msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP`` set. Now you send
+``ioctl(uffd, UFFDIO_WRITEPROTECT, struct *uffdio_writeprotect)``
+again while ``pagefault.mode`` does not have ``UFFDIO_WRITEPROTECT_MODE_WP``
+set. This wakes up the thread which will continue to run with writes. This
allows you to do the bookkeeping about the write in the uffd reading
thread before the ioctl.
-If you registered with both UFFDIO_REGISTER_MODE_MISSING and
-UFFDIO_REGISTER_MODE_WP then you need to think about the sequence in
+If you registered with both ``UFFDIO_REGISTER_MODE_MISSING`` and
+``UFFDIO_REGISTER_MODE_WP`` then you need to think about the sequence in
which you supply a page and undo write protect. Note that there is a
difference between writes into a WP area and into a !WP area. The
-former will have UFFD_PAGEFAULT_FLAG_WP set, the latter
-UFFD_PAGEFAULT_FLAG_WRITE. The latter did not fail on protection but
-you still need to supply a page when UFFDIO_REGISTER_MODE_MISSING was
+former will have ``UFFD_PAGEFAULT_FLAG_WP`` set, the latter
+``UFFD_PAGEFAULT_FLAG_WRITE``. The latter did not fail on protection but
+you still need to supply a page when ``UFFDIO_REGISTER_MODE_MISSING`` was
used.
+Userfaultfd write-protect mode currently behave differently on none ptes
+(when e.g. page is missing) over different types of memories.
+
+For anonymous memory, ``ioctl(UFFDIO_WRITEPROTECT)`` will ignore none ptes
+(e.g. when pages are missing and not populated). For file-backed memories
+like shmem and hugetlbfs, none ptes will be write protected just like a
+present pte. In other words, there will be a userfaultfd write fault
+message generated when writing to a missing page on file typed memories,
+as long as the page range was write-protected before. Such a message will
+not be generated on anonymous memories by default.
+
+If the application wants to be able to write protect none ptes on anonymous
+memory, one can pre-populate the memory with e.g. MADV_POPULATE_READ. On
+newer kernels, one can also detect the feature UFFD_FEATURE_WP_UNPOPULATED
+and set the feature bit in advance to make sure none ptes will also be
+write protected even upon anonymous memory.
+
+When using ``UFFDIO_REGISTER_MODE_WP`` in combination with either
+``UFFDIO_REGISTER_MODE_MISSING`` or ``UFFDIO_REGISTER_MODE_MINOR``, when
+resolving missing / minor faults with ``UFFDIO_COPY`` or ``UFFDIO_CONTINUE``
+respectively, it may be desirable for the new page / mapping to be
+write-protected (so future writes will also result in a WP fault). These ioctls
+support a mode flag (``UFFDIO_COPY_MODE_WP`` or ``UFFDIO_CONTINUE_MODE_WP``
+respectively) to configure the mapping this way.
+
+If the userfaultfd context has ``UFFD_FEATURE_WP_ASYNC`` feature bit set,
+any vma registered with write-protection will work in async mode rather
+than the default sync mode.
+
+In async mode, there will be no message generated when a write operation
+happens, meanwhile the write-protection will be resolved automatically by
+the kernel. It can be seen as a more accurate version of soft-dirty
+tracking and it can be different in a few ways:
+
+ - The dirty result will not be affected by vma changes (e.g. vma
+ merging) because the dirty is only tracked by the pte.
+
+ - It supports range operations by default, so one can enable tracking on
+ any range of memory as long as page aligned.
+
+ - Dirty information will not get lost if the pte was zapped due to
+ various reasons (e.g. during split of a shmem transparent huge page).
+
+ - Due to a reverted meaning of soft-dirty (page clean when uffd-wp bit
+ set; dirty when uffd-wp bit cleared), it has different semantics on
+ some of the memory operations. For example: ``MADV_DONTNEED`` on
+ anonymous (or ``MADV_REMOVE`` on a file mapping) will be treated as
+ dirtying of memory by dropping uffd-wp bit during the procedure.
+
+The user app can collect the "written/dirty" status by looking up the
+uffd-wp bit for the pages being interested in /proc/pagemap.
+
+The page will not be under track of uffd-wp async mode until the page is
+explicitly write-protected by ``ioctl(UFFDIO_WRITEPROTECT)`` with the mode
+flag ``UFFDIO_WRITEPROTECT_MODE_WP`` set. Trying to resolve a page fault
+that was tracked by async mode userfaultfd-wp is invalid.
+
+When userfaultfd-wp async mode is used alone, it can be applied to all
+kinds of memory.
+
+Memory Poisioning Emulation
+---------------------------
+
+In response to a fault (either missing or minor), an action userspace can
+take to "resolve" it is to issue a ``UFFDIO_POISON``. This will cause any
+future faulters to either get a SIGBUS, or in KVM's case the guest will
+receive an MCE as if there were hardware memory poisoning.
+
+This is used to emulate hardware memory poisoning. Imagine a VM running on a
+machine which experiences a real hardware memory error. Later, we live migrate
+the VM to another physical machine. Since we want the migration to be
+transparent to the guest, we want that same address range to act as if it was
+still poisoned, even though it's on a new physical host which ostensibly
+doesn't have a memory error in the exact same spot.
+
QEMU/KVM
========
-QEMU/KVM is using the userfaultfd syscall to implement postcopy live
+QEMU/KVM is using the ``userfaultfd`` syscall to implement postcopy live
migration. Postcopy live migration is one form of memory
externalization consisting of a virtual machine running with part or
all of its memory residing on a different node in the cloud. The
-userfaultfd abstraction is generic enough that not a single line of
+``userfaultfd`` abstraction is generic enough that not a single line of
KVM kernel code had to be modified in order to add postcopy live
migration to QEMU.
-Guest async page faults, FOLL_NOWAIT and all other GUP features work
+Guest async page faults, ``FOLL_NOWAIT`` and all other ``GUP*`` features work
just fine in combination with userfaults. Userfaults trigger async
page faults in the guest scheduler so those guest processes that
aren't waiting for userfaults (i.e. network bound) can keep running in
@@ -183,19 +321,19 @@ generating userfaults for readonly guest regions.
The implementation of postcopy live migration currently uses one
single bidirectional socket but in the future two different sockets
will be used (to reduce the latency of the userfaults to the minimum
-possible without having to decrease /proc/sys/net/ipv4/tcp_wmem).
+possible without having to decrease ``/proc/sys/net/ipv4/tcp_wmem``).
The QEMU in the source node writes all pages that it knows are missing
in the destination node, into the socket, and the migration thread of
-the QEMU running in the destination node runs UFFDIO_COPY|ZEROPAGE
-ioctls on the userfaultfd in order to map the received pages into the
-guest (UFFDIO_ZEROCOPY is used if the source page was a zero page).
+the QEMU running in the destination node runs ``UFFDIO_COPY|ZEROPAGE``
+ioctls on the ``userfaultfd`` in order to map the received pages into the
+guest (``UFFDIO_ZEROCOPY`` is used if the source page was a zero page).
A different postcopy thread in the destination node listens with
-poll() to the userfaultfd in parallel. When a POLLIN event is
+poll() to the ``userfaultfd`` in parallel. When a ``POLLIN`` event is
generated after a userfault triggers, the postcopy thread read() from
-the userfaultfd and receives the fault address (or -EAGAIN in case the
-userfault was already resolved and waken by a UFFDIO_COPY|ZEROPAGE run
+the ``userfaultfd`` and receives the fault address (or ``-EAGAIN`` in case the
+userfault was already resolved and waken by a ``UFFDIO_COPY|ZEROPAGE`` run
by the parallel QEMU migration thread).
After the QEMU postcopy thread (running in the destination node) gets
@@ -206,7 +344,7 @@ remaining missing pages from that new page offset. Soon after that
(just the time to flush the tcp_wmem queue through the network) the
migration thread in the QEMU running in the destination node will
receive the page that triggered the userfault and it'll map it as
-usual with the UFFDIO_COPY|ZEROPAGE (without actually knowing if it
+usual with the ``UFFDIO_COPY|ZEROPAGE`` (without actually knowing if it
was spontaneously sent by the source or if it was an urgent page
requested through a userfault).
@@ -219,74 +357,74 @@ checked to find which missing pages to send in round robin and we seek
over it when receiving incoming userfaults. After sending each page of
course the bitmap is updated accordingly. It's also useful to avoid
sending the same page twice (in case the userfault is read by the
-postcopy thread just before UFFDIO_COPY|ZEROPAGE runs in the migration
+postcopy thread just before ``UFFDIO_COPY|ZEROPAGE`` runs in the migration
thread).
Non-cooperative userfaultfd
===========================
-When the userfaultfd is monitored by an external manager, the manager
+When the ``userfaultfd`` is monitored by an external manager, the manager
must be able to track changes in the process virtual memory
layout. Userfaultfd can notify the manager about such changes using
the same read(2) protocol as for the page fault notifications. The
manager has to explicitly enable these events by setting appropriate
-bits in uffdio_api.features passed to UFFDIO_API ioctl:
+bits in ``uffdio_api.features`` passed to ``UFFDIO_API`` ioctl:
-UFFD_FEATURE_EVENT_FORK
- enable userfaultfd hooks for fork(). When this feature is
- enabled, the userfaultfd context of the parent process is
+``UFFD_FEATURE_EVENT_FORK``
+ enable ``userfaultfd`` hooks for fork(). When this feature is
+ enabled, the ``userfaultfd`` context of the parent process is
duplicated into the newly created process. The manager
- receives UFFD_EVENT_FORK with file descriptor of the new
- userfaultfd context in the uffd_msg.fork.
+ receives ``UFFD_EVENT_FORK`` with file descriptor of the new
+ ``userfaultfd`` context in the ``uffd_msg.fork``.
-UFFD_FEATURE_EVENT_REMAP
+``UFFD_FEATURE_EVENT_REMAP``
enable notifications about mremap() calls. When the
non-cooperative process moves a virtual memory area to a
different location, the manager will receive
- UFFD_EVENT_REMAP. The uffd_msg.remap will contain the old and
+ ``UFFD_EVENT_REMAP``. The ``uffd_msg.remap`` will contain the old and
new addresses of the area and its original length.
-UFFD_FEATURE_EVENT_REMOVE
+``UFFD_FEATURE_EVENT_REMOVE``
enable notifications about madvise(MADV_REMOVE) and
- madvise(MADV_DONTNEED) calls. The event UFFD_EVENT_REMOVE will
- be generated upon these calls to madvise. The uffd_msg.remove
+ madvise(MADV_DONTNEED) calls. The event ``UFFD_EVENT_REMOVE`` will
+ be generated upon these calls to madvise(). The ``uffd_msg.remove``
will contain start and end addresses of the removed area.
-UFFD_FEATURE_EVENT_UNMAP
+``UFFD_FEATURE_EVENT_UNMAP``
enable notifications about memory unmapping. The manager will
- get UFFD_EVENT_UNMAP with uffd_msg.remove containing start and
+ get ``UFFD_EVENT_UNMAP`` with ``uffd_msg.remove`` containing start and
end addresses of the unmapped area.
-Although the UFFD_FEATURE_EVENT_REMOVE and UFFD_FEATURE_EVENT_UNMAP
+Although the ``UFFD_FEATURE_EVENT_REMOVE`` and ``UFFD_FEATURE_EVENT_UNMAP``
are pretty similar, they quite differ in the action expected from the
-userfaultfd manager. In the former case, the virtual memory is
+``userfaultfd`` manager. In the former case, the virtual memory is
removed, but the area is not, the area remains monitored by the
-userfaultfd, and if a page fault occurs in that area it will be
+``userfaultfd``, and if a page fault occurs in that area it will be
delivered to the manager. The proper resolution for such page fault is
to zeromap the faulting address. However, in the latter case, when an
area is unmapped, either explicitly (with munmap() system call), or
implicitly (e.g. during mremap()), the area is removed and in turn the
-userfaultfd context for such area disappears too and the manager will
+``userfaultfd`` context for such area disappears too and the manager will
not get further userland page faults from the removed area. Still, the
notification is required in order to prevent manager from using
-UFFDIO_COPY on the unmapped area.
+``UFFDIO_COPY`` on the unmapped area.
Unlike userland page faults which have to be synchronous and require
explicit or implicit wakeup, all the events are delivered
asynchronously and the non-cooperative process resumes execution as
-soon as manager executes read(). The userfaultfd manager should
-carefully synchronize calls to UFFDIO_COPY with the events
-processing. To aid the synchronization, the UFFDIO_COPY ioctl will
-return -ENOSPC when the monitored process exits at the time of
-UFFDIO_COPY, and -ENOENT, when the non-cooperative process has changed
-its virtual memory layout simultaneously with outstanding UFFDIO_COPY
+soon as manager executes read(). The ``userfaultfd`` manager should
+carefully synchronize calls to ``UFFDIO_COPY`` with the events
+processing. To aid the synchronization, the ``UFFDIO_COPY`` ioctl will
+return ``-ENOSPC`` when the monitored process exits at the time of
+``UFFDIO_COPY``, and ``-ENOENT``, when the non-cooperative process has changed
+its virtual memory layout simultaneously with outstanding ``UFFDIO_COPY``
operation.
The current asynchronous model of the event delivery is optimal for
-single threaded non-cooperative userfaultfd manager implementations. A
+single threaded non-cooperative ``userfaultfd`` manager implementations. A
synchronous event delivery model can be added later as a new
-userfaultfd feature to facilitate multithreading enhancements of the
-non cooperative manager, for example to allow UFFDIO_COPY ioctls to
+``userfaultfd`` feature to facilitate multithreading enhancements of the
+non cooperative manager, for example to allow ``UFFDIO_COPY`` ioctls to
run in parallel to the event reception. Single threaded
implementations should continue to use the current async event
delivery model instead.
diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst
new file mode 100644
index 000000000000..b42132969e31
--- /dev/null
+++ b/Documentation/admin-guide/mm/zswap.rst
@@ -0,0 +1,178 @@
+=====
+zswap
+=====
+
+Overview
+========
+
+Zswap is a lightweight compressed cache for swap pages. It takes pages that are
+in the process of being swapped out and attempts to compress them into a
+dynamically allocated RAM-based memory pool. zswap basically trades CPU cycles
+for potentially reduced swap I/O. This trade-off can also result in a
+significant performance improvement if reads from the compressed cache are
+faster than reads from a swap device.
+
+Some potential benefits:
+
+* Desktop/laptop users with limited RAM capacities can mitigate the
+ performance impact of swapping.
+* Overcommitted guests that share a common I/O resource can
+ dramatically reduce their swap I/O pressure, avoiding heavy handed I/O
+ throttling by the hypervisor. This allows more work to get done with less
+ impact to the guest workload and guests sharing the I/O subsystem
+* Users with SSDs as swap devices can extend the life of the device by
+ drastically reducing life-shortening writes.
+
+Zswap evicts pages from compressed cache on an LRU basis to the backing swap
+device when the compressed pool reaches its size limit. This requirement had
+been identified in prior community discussions.
+
+Whether Zswap is enabled at the boot time depends on whether
+the ``CONFIG_ZSWAP_DEFAULT_ON`` Kconfig option is enabled or not.
+This setting can then be overridden by providing the kernel command line
+``zswap.enabled=`` option, for example ``zswap.enabled=0``.
+Zswap can also be enabled and disabled at runtime using the sysfs interface.
+An example command to enable zswap at runtime, assuming sysfs is mounted
+at ``/sys``, is::
+
+ echo 1 > /sys/module/zswap/parameters/enabled
+
+When zswap is disabled at runtime it will stop storing pages that are
+being swapped out. However, it will _not_ immediately write out or fault
+back into memory all of the pages stored in the compressed pool. The
+pages stored in zswap will remain in the compressed pool until they are
+either invalidated or faulted back into memory. In order to force all
+pages out of the compressed pool, a swapoff on the swap device(s) will
+fault back into memory all swapped out pages, including those in the
+compressed pool.
+
+Design
+======
+
+Zswap receives pages for compression from the swap subsystem and is able to
+evict pages from its own compressed pool on an LRU basis and write them back to
+the backing swap device in the case that the compressed pool is full.
+
+Zswap makes use of zpool for the managing the compressed memory pool. Each
+allocation in zpool is not directly accessible by address. Rather, a handle is
+returned by the allocation routine and that handle must be mapped before being
+accessed. The compressed memory pool grows on demand and shrinks as compressed
+pages are freed. The pool is not preallocated. By default, a zpool
+of type selected in ``CONFIG_ZSWAP_ZPOOL_DEFAULT`` Kconfig option is created,
+but it can be overridden at boot time by setting the ``zpool`` attribute,
+e.g. ``zswap.zpool=zbud``. It can also be changed at runtime using the sysfs
+``zpool`` attribute, e.g.::
+
+ echo zbud > /sys/module/zswap/parameters/zpool
+
+The zbud type zpool allocates exactly 1 page to store 2 compressed pages, which
+means the compression ratio will always be 2:1 or worse (because of half-full
+zbud pages). The zsmalloc type zpool has a more complex compressed page
+storage method, and it can achieve greater storage densities.
+
+When a swap page is passed from swapout to zswap, zswap maintains a mapping
+of the swap entry, a combination of the swap type and swap offset, to the zpool
+handle that references that compressed swap page. This mapping is achieved
+with a red-black tree per swap type. The swap offset is the search key for the
+tree nodes.
+
+During a page fault on a PTE that is a swap entry, the swapin code calls the
+zswap load function to decompress the page into the page allocated by the page
+fault handler.
+
+Once there are no PTEs referencing a swap page stored in zswap (i.e. the count
+in the swap_map goes to 0) the swap code calls the zswap invalidate function
+to free the compressed entry.
+
+Zswap seeks to be simple in its policies. Sysfs attributes allow for one user
+controlled policy:
+
+* max_pool_percent - The maximum percentage of memory that the compressed
+ pool can occupy.
+
+The default compressor is selected in ``CONFIG_ZSWAP_COMPRESSOR_DEFAULT``
+Kconfig option, but it can be overridden at boot time by setting the
+``compressor`` attribute, e.g. ``zswap.compressor=lzo``.
+It can also be changed at runtime using the sysfs "compressor"
+attribute, e.g.::
+
+ echo lzo > /sys/module/zswap/parameters/compressor
+
+When the zpool and/or compressor parameter is changed at runtime, any existing
+compressed pages are not modified; they are left in their own zpool. When a
+request is made for a page in an old zpool, it is uncompressed using its
+original compressor. Once all pages are removed from an old zpool, the zpool
+and its compressor are freed.
+
+Some of the pages in zswap are same-value filled pages (i.e. contents of the
+page have same value or repetitive pattern). These pages include zero-filled
+pages and they are handled differently. During store operation, a page is
+checked if it is a same-value filled page before compressing it. If true, the
+compressed length of the page is set to zero and the pattern or same-filled
+value is stored.
+
+Same-value filled pages identification feature is enabled by default and can be
+disabled at boot time by setting the ``same_filled_pages_enabled`` attribute
+to 0, e.g. ``zswap.same_filled_pages_enabled=0``. It can also be enabled and
+disabled at runtime using the sysfs ``same_filled_pages_enabled``
+attribute, e.g.::
+
+ echo 1 > /sys/module/zswap/parameters/same_filled_pages_enabled
+
+When zswap same-filled page identification is disabled at runtime, it will stop
+checking for the same-value filled pages during store operation.
+In other words, every page will be then considered non-same-value filled.
+However, the existing pages which are marked as same-value filled pages remain
+stored unchanged in zswap until they are either loaded or invalidated.
+
+In some circumstances it might be advantageous to make use of just the zswap
+ability to efficiently store same-filled pages without enabling the whole
+compressed page storage.
+In this case the handling of non-same-value pages by zswap (enabled by default)
+can be disabled by setting the ``non_same_filled_pages_enabled`` attribute
+to 0, e.g. ``zswap.non_same_filled_pages_enabled=0``.
+It can also be enabled and disabled at runtime using the sysfs
+``non_same_filled_pages_enabled`` attribute, e.g.::
+
+ echo 1 > /sys/module/zswap/parameters/non_same_filled_pages_enabled
+
+Disabling both ``zswap.same_filled_pages_enabled`` and
+``zswap.non_same_filled_pages_enabled`` effectively disables accepting any new
+pages by zswap.
+
+To prevent zswap from shrinking pool when zswap is full and there's a high
+pressure on swap (this will result in flipping pages in and out zswap pool
+without any real benefit but with a performance drop for the system), a
+special parameter has been introduced to implement a sort of hysteresis to
+refuse taking pages into zswap pool until it has sufficient space if the limit
+has been hit. To set the threshold at which zswap would start accepting pages
+again after it became full, use the sysfs ``accept_threshold_percent``
+attribute, e. g.::
+
+ echo 80 > /sys/module/zswap/parameters/accept_threshold_percent
+
+Setting this parameter to 100 will disable the hysteresis.
+
+Some users cannot tolerate the swapping that comes with zswap store failures
+and zswap writebacks. Swapping can be disabled entirely (without disabling
+zswap itself) on a cgroup-basis as follows:
+
+ echo 0 > /sys/fs/cgroup/<cgroup-name>/memory.zswap.writeback
+
+Note that if the store failures are recurring (for e.g if the pages are
+incompressible), users can observe reclaim inefficiency after disabling
+writeback (because the same pages might be rejected again and again).
+
+When there is a sizable amount of cold memory residing in the zswap pool, it
+can be advantageous to proactively write these cold pages to swap and reclaim
+the memory for other use cases. By default, the zswap shrinker is disabled.
+User can enable it as follows:
+
+ echo Y > /sys/module/zswap/parameters/shrinker_enabled
+
+This can be enabled at the boot time if ``CONFIG_ZSWAP_SHRINKER_DEFAULT_ON`` is
+selected.
+
+A debugfs interface is provided for various statistic about pool size, number
+of pages stored, same-value filled pages and various counters for the reasons
+pages are rejected.
diff --git a/Documentation/admin-guide/module-signing.rst b/Documentation/admin-guide/module-signing.rst
index f8b584179cff..a8667a777490 100644
--- a/Documentation/admin-guide/module-signing.rst
+++ b/Documentation/admin-guide/module-signing.rst
@@ -28,10 +28,10 @@ trusted userspace bits.
This facility uses X.509 ITU-T standard certificates to encode the public keys
involved. The signatures are not themselves encoded in any industrial standard
-type. The facility currently only supports the RSA public key encryption
-standard (though it is pluggable and permits others to be used). The possible
-hash algorithms that can be used are SHA-1, SHA-224, SHA-256, SHA-384, and
-SHA-512 (the algorithm is selected by data in the signature).
+type. The built-in facility currently only supports the RSA & NIST P-384 ECDSA
+public key signing standard (though it is pluggable and permits others to be
+used). The possible hash algorithms that can be used are SHA-2 and SHA-3 of
+sizes 256, 384, and 512 (the algorithm is selected by data in the signature).
==========================
@@ -81,11 +81,12 @@ This has a number of options available:
sign the modules with:
=============================== ==========================================
- ``CONFIG_MODULE_SIG_SHA1`` :menuselection:`Sign modules with SHA-1`
- ``CONFIG_MODULE_SIG_SHA224`` :menuselection:`Sign modules with SHA-224`
``CONFIG_MODULE_SIG_SHA256`` :menuselection:`Sign modules with SHA-256`
``CONFIG_MODULE_SIG_SHA384`` :menuselection:`Sign modules with SHA-384`
``CONFIG_MODULE_SIG_SHA512`` :menuselection:`Sign modules with SHA-512`
+ ``CONFIG_MODULE_SIG_SHA3_256`` :menuselection:`Sign modules with SHA3-256`
+ ``CONFIG_MODULE_SIG_SHA3_384`` :menuselection:`Sign modules with SHA3-384`
+ ``CONFIG_MODULE_SIG_SHA3_512`` :menuselection:`Sign modules with SHA3-512`
=============================== ==========================================
The algorithm selected here will also be built into the kernel (rather
@@ -106,7 +107,7 @@ This has a number of options available:
certificate and a private key.
If the PEM file containing the private key is encrypted, or if the
- PKCS#11 token requries a PIN, this can be provided at build time by
+ PKCS#11 token requires a PIN, this can be provided at build time by
means of the ``KBUILD_SIGN_PIN`` variable.
@@ -145,6 +146,10 @@ into vmlinux) using parameters in the::
file (which is also generated if it does not already exist).
+One can select between RSA (``MODULE_SIG_KEY_TYPE_RSA``) and ECDSA
+(``MODULE_SIG_KEY_TYPE_ECDSA``) to generate either RSA 4k or NIST
+P-384 keypair.
+
It is strongly recommended that you provide your own x509.genkey file.
Most notably, in the x509.genkey file, the req_distinguished_name section
@@ -266,7 +271,7 @@ for which it has a public key. Otherwise, it will also load modules that are
unsigned. Any module for which the kernel has a key, but which proves to have
a signature mismatch will not be permitted to load.
-Any module that has an unparseable signature will be rejected.
+Any module that has an unparsable signature will be rejected.
=========================================
diff --git a/Documentation/admin-guide/mono.rst b/Documentation/admin-guide/mono.rst
index 59e6d59f0ed9..c6dab5680065 100644
--- a/Documentation/admin-guide/mono.rst
+++ b/Documentation/admin-guide/mono.rst
@@ -12,11 +12,11 @@ other program after you have done the following:
a binary package, a source tarball or by installing from Git. Binary
packages for several distributions can be found at:
- http://www.mono-project.com/download/
+ https://www.mono-project.com/download/
Instructions for compiling Mono can be found at:
- http://www.mono-project.com/docs/compiling-mono/linux/
+ https://www.mono-project.com/docs/compiling-mono/linux/
Once the Mono CLR support has been installed, just check that
``/usr/bin/mono`` (which could be located elsewhere, for example
diff --git a/Documentation/admin-guide/nfs/fault_injection.rst b/Documentation/admin-guide/nfs/fault_injection.rst
deleted file mode 100644
index eb029c0c15ce..000000000000
--- a/Documentation/admin-guide/nfs/fault_injection.rst
+++ /dev/null
@@ -1,70 +0,0 @@
-===================
-NFS Fault Injection
-===================
-
-Fault injection is a method for forcing errors that may not normally occur, or
-may be difficult to reproduce. Forcing these errors in a controlled environment
-can help the developer find and fix bugs before their code is shipped in a
-production system. Injecting an error on the Linux NFS server will allow us to
-observe how the client reacts and if it manages to recover its state correctly.
-
-NFSD_FAULT_INJECTION must be selected when configuring the kernel to use this
-feature.
-
-
-Using Fault Injection
-=====================
-On the client, mount the fault injection server through NFS v4.0+ and do some
-work over NFS (open files, take locks, ...).
-
-On the server, mount the debugfs filesystem to <debug_dir> and ls
-<debug_dir>/nfsd. This will show a list of files that will be used for
-injecting faults on the NFS server. As root, write a number n to the file
-corresponding to the action you want the server to take. The server will then
-process the first n items it finds. So if you want to forget 5 locks, echo '5'
-to <debug_dir>/nfsd/forget_locks. A value of 0 will tell the server to forget
-all corresponding items. A log message will be created containing the number
-of items forgotten (check dmesg).
-
-Go back to work on the client and check if the client recovered from the error
-correctly.
-
-
-Available Faults
-================
-forget_clients:
- The NFS server keeps a list of clients that have placed a mount call. If
- this list is cleared, the server will have no knowledge of who the client
- is, forcing the client to reauthenticate with the server.
-
-forget_openowners:
- The NFS server keeps a list of what files are currently opened and who
- they were opened by. Clearing this list will force the client to reopen
- its files.
-
-forget_locks:
- The NFS server keeps a list of what files are currently locked in the VFS.
- Clearing this list will force the client to reclaim its locks (files are
- unlocked through the VFS as they are cleared from this list).
-
-forget_delegations:
- A delegation is used to assure the client that a file, or part of a file,
- has not changed since the delegation was awarded. Clearing this list will
- force the client to reacquire its delegation before accessing the file
- again.
-
-recall_delegations:
- Delegations can be recalled by the server when another client attempts to
- access a file. This test will notify the client that its delegation has
- been revoked, forcing the client to reacquire the delegation before using
- the file again.
-
-
-tools/nfs/inject_faults.sh script
-=================================
-This script has been created to ease the fault injection process. This script
-will detect the mounted debugfs directory and write to the files located there
-based on the arguments passed by the user. For example, running
-`inject_faults.sh forget_locks 1` as root will instruct the server to forget
-one lock. Running `inject_faults forget_locks` will instruct the server to
-forgetall locks.
diff --git a/Documentation/admin-guide/nfs/index.rst b/Documentation/admin-guide/nfs/index.rst
index 6b5a3c90fac5..3601a708f333 100644
--- a/Documentation/admin-guide/nfs/index.rst
+++ b/Documentation/admin-guide/nfs/index.rst
@@ -12,4 +12,3 @@ NFS
nfs-idmapper
pnfs-block-server
pnfs-scsi-server
- fault_injection
diff --git a/Documentation/admin-guide/nfs/nfs-client.rst b/Documentation/admin-guide/nfs/nfs-client.rst
index c4b777c7584b..36760685dd34 100644
--- a/Documentation/admin-guide/nfs/nfs-client.rst
+++ b/Documentation/admin-guide/nfs/nfs-client.rst
@@ -36,10 +36,9 @@ administrative requirements that require particular behavior that does not
work well as part of an nfs_client_id4 string.
The nfs.nfs4_unique_id boot parameter specifies a unique string that can be
-used instead of a system's node name when an NFS client identifies itself to
-a server. Thus, if the system's node name is not unique, or it changes, its
-nfs.nfs4_unique_id stays the same, preventing collision with other clients
-or loss of state during NFS reboot recovery or transparent state migration.
+used together with a system's node name when an NFS client identifies itself to
+a server. Thus, if the system's node name is not unique, its
+nfs.nfs4_unique_id can help prevent collisions with other clients.
The nfs.nfs4_unique_id string is typically a UUID, though it can contain
anything that is believed to be unique across all NFS clients. An
@@ -53,8 +52,12 @@ outstanding NFSv4 state has expired, to prevent loss of NFSv4 state.
This string can be stored in an NFS client's grub.conf, or it can be provided
via a net boot facility such as PXE. It may also be specified as an nfs.ko
-module parameter. Specifying a uniquifier string is not support for NFS
-clients running in containers.
+module parameter.
+
+This uniquifier string will be the same for all NFS clients running in
+containers unless it is overridden by a value written to
+/sys/fs/nfs/net/nfs_client/identifier which will be local to the network
+namespace of the process which writes.
The DNS resolver
@@ -65,8 +68,8 @@ migrated onto another server by means of the special "fs_locations"
attribute. See `RFC3530 Section 6: Filesystem Migration and Replication`_ and
`Implementation Guide for Referrals in NFSv4`_.
-.. _RFC3530 Section 6\: Filesystem Migration and Replication: http://tools.ietf.org/html/rfc3530#section-6
-.. _Implementation Guide for Referrals in NFSv4: http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00
+.. _RFC3530 Section 6\: Filesystem Migration and Replication: https://tools.ietf.org/html/rfc3530#section-6
+.. _Implementation Guide for Referrals in NFSv4: https://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00
The fs_locations information can take the form of either an ip address and
a path, or a DNS hostname and a path. The latter requires the NFS client to
diff --git a/Documentation/admin-guide/nfs/nfs-rdma.rst b/Documentation/admin-guide/nfs/nfs-rdma.rst
index ef0f3678b1fb..f137485f8bde 100644
--- a/Documentation/admin-guide/nfs/nfs-rdma.rst
+++ b/Documentation/admin-guide/nfs/nfs-rdma.rst
@@ -65,7 +65,7 @@ use with NFS/RDMA.
If the version is less than 1.1.2 or the command does not exist,
you should install the latest version of nfs-utils.
- Download the latest package from: http://www.kernel.org/pub/linux/utils/nfs
+ Download the latest package from: https://www.kernel.org/pub/linux/utils/nfs
Uncompress the package and follow the installation instructions.
diff --git a/Documentation/admin-guide/nfs/nfsroot.rst b/Documentation/admin-guide/nfs/nfsroot.rst
index 82a4fda057f9..135218f33394 100644
--- a/Documentation/admin-guide/nfs/nfsroot.rst
+++ b/Documentation/admin-guide/nfs/nfsroot.rst
@@ -18,7 +18,7 @@ Mounting the root filesystem via NFS (nfsroot)
In order to use a diskless system, such as an X-terminal or printer server for
example, it is necessary for the root filesystem to be present on a non-disk
device. This may be an initramfs (see
-Documentation/filesystems/ramfs-rootfs-initramfs.txt), a ramdisk (see
+Documentation/filesystems/ramfs-rootfs-initramfs.rst), a ramdisk (see
Documentation/admin-guide/initrd.rst) or a filesystem mounted via NFS. The
following text describes on how to use NFS for the root filesystem. For the rest
of this text 'client' means the diskless system, and 'server' means the NFS
@@ -264,7 +264,7 @@ They depend on various facilities being available:
access to the floppy drive device, /dev/fd0
For more information on syslinux, including how to create bootdisks
- for prebuilt kernels, see http://syslinux.zytor.com/
+ for prebuilt kernels, see https://syslinux.zytor.com/
.. note::
Previously it was possible to write a kernel directly to
@@ -292,7 +292,7 @@ They depend on various facilities being available:
cdrecord dev=ATAPI:1,0,0 arch/x86/boot/image.iso
For more information on isolinux, including how to create bootdisks
- for prebuilt kernels, see http://syslinux.zytor.com/
+ for prebuilt kernels, see https://syslinux.zytor.com/
- Using LILO
@@ -346,7 +346,7 @@ They depend on various facilities being available:
see Documentation/admin-guide/serial-console.rst for more information.
For more information on isolinux, including how to create bootdisks
- for prebuilt kernels, see http://syslinux.zytor.com/
+ for prebuilt kernels, see https://syslinux.zytor.com/
diff --git a/Documentation/admin-guide/nfs/pnfs-block-server.rst b/Documentation/admin-guide/nfs/pnfs-block-server.rst
index b00a2e705cc4..20fe9f5117fe 100644
--- a/Documentation/admin-guide/nfs/pnfs-block-server.rst
+++ b/Documentation/admin-guide/nfs/pnfs-block-server.rst
@@ -8,7 +8,7 @@ to handling all the metadata access to the NFS export also hands out layouts
to the clients to directly access the underlying block devices that are
shared with the client.
-To use pNFS block layouts with with the Linux NFS server the exported file
+To use pNFS block layouts with the Linux NFS server the exported file
system needs to support the pNFS block layouts (currently just XFS), and the
file system must sit on shared storage (typically iSCSI) that is accessible
to the clients in addition to the MDS. As of now the file system needs to
diff --git a/Documentation/admin-guide/nfs/pnfs-scsi-server.rst b/Documentation/admin-guide/nfs/pnfs-scsi-server.rst
index d2f6ee558071..b2eec2288329 100644
--- a/Documentation/admin-guide/nfs/pnfs-scsi-server.rst
+++ b/Documentation/admin-guide/nfs/pnfs-scsi-server.rst
@@ -9,7 +9,7 @@ which in addition to handling all the metadata access to the NFS export,
also hands out layouts to the clients so that they can directly access the
underlying SCSI LUNs that are shared with the client.
-To use pNFS SCSI layouts with with the Linux NFS server, the exported file
+To use pNFS SCSI layouts with the Linux NFS server, the exported file
system needs to support the pNFS SCSI layouts (currently just XFS), and the
file system must sit on a SCSI LUN that is accessible to the clients in
addition to the MDS. As of now the file system needs to sit directly on the
diff --git a/Documentation/admin-guide/numastat.rst b/Documentation/admin-guide/numastat.rst
index aaf1667489f8..08ec2c2bdce3 100644
--- a/Documentation/admin-guide/numastat.rst
+++ b/Documentation/admin-guide/numastat.rst
@@ -6,6 +6,21 @@ Numa policy hit/miss statistics
All units are pages. Hugepages have separate counters.
+The numa_hit, numa_miss and numa_foreign counters reflect how well processes
+are able to allocate memory from nodes they prefer. If they succeed, numa_hit
+is incremented on the preferred node, otherwise numa_foreign is incremented on
+the preferred node and numa_miss on the node where allocation succeeded.
+
+Usually preferred node is the one local to the CPU where the process executes,
+but restrictions such as mempolicies can change that, so there are also two
+counters based on CPU local node. local_node is similar to numa_hit and is
+incremented on allocation from a node by CPU on the same node. other_node is
+similar to numa_miss and is incremented on the node where allocation succeeds
+from a CPU from a different node. Note there is no counter analogical to
+numa_foreign.
+
+In more detail:
+
=============== ============================================================
numa_hit A process wanted to allocate memory from this node,
and succeeded.
@@ -14,11 +29,13 @@ numa_miss A process wanted to allocate memory from another node,
but ended up with memory from this node.
numa_foreign A process wanted to allocate on this node,
- but ended up with memory from another one.
+ but ended up with memory from another node.
-local_node A process ran on this node and got memory from it.
+local_node A process ran on this node's CPU,
+ and got memory from this node.
-other_node A process ran on this node and got memory from another node.
+other_node A process ran on a different node's CPU
+ and got memory from this node.
interleave_hit Interleaving wanted to allocate from this node
and succeeded.
@@ -28,3 +45,11 @@ For easier reading you can use the numastat utility from the numactl package
(http://oss.sgi.com/projects/libnuma/). Note that it only works
well right now on machines with a small number of CPUs.
+Note that on systems with memoryless nodes (where a node has CPUs but no
+memory) the numa_hit, numa_miss and numa_foreign statistics can be skewed
+heavily. In the current kernel implementation, if a process prefers a
+memoryless node (i.e. because it is running on one of its local CPU), the
+implementation actually treats one of the nearest nodes with memory as the
+preferred node. As a result, such allocation will not increase the numa_foreign
+counter on the memoryless node, and will skew the numa_hit, numa_miss and
+numa_foreign statistics of the nearest node.
diff --git a/Documentation/admin-guide/perf-security.rst b/Documentation/admin-guide/perf-security.rst
index 72effa7c23b9..34aa334320ca 100644
--- a/Documentation/admin-guide/perf-security.rst
+++ b/Documentation/admin-guide/perf-security.rst
@@ -1,6 +1,6 @@
.. _perf_security:
-Perf Events and tool security
+Perf events and tool security
=============================
Overview
@@ -42,11 +42,11 @@ categories:
Data that belong to the fourth category can potentially contain
sensitive process data. If PMUs in some monitoring modes capture values
of execution context registers or data from process memory then access
-to such monitoring capabilities requires to be ordered and secured
-properly. So, perf_events/Perf performance monitoring is the subject for
-security access control management [5]_ .
+to such monitoring modes requires to be ordered and secured properly.
+So, perf_events performance monitoring and observability operations are
+the subject for security access control management [5]_ .
-perf_events/Perf access control
+perf_events access control
-------------------------------
To perform security checks, the Linux implementation splits processes
@@ -66,15 +66,32 @@ into distinct units, known as capabilities [6]_ , which can be
independently enabled and disabled on per-thread basis for processes and
files of unprivileged users.
-Unprivileged processes with enabled CAP_SYS_ADMIN capability are treated
+Unprivileged processes with enabled CAP_PERFMON capability are treated
as privileged processes with respect to perf_events performance
-monitoring and bypass *scope* permissions checks in the kernel.
-
-Unprivileged processes using perf_events system call API is also subject
-for PTRACE_MODE_READ_REALCREDS ptrace access mode check [7]_ , whose
-outcome determines whether monitoring is permitted. So unprivileged
-processes provided with CAP_SYS_PTRACE capability are effectively
-permitted to pass the check.
+monitoring and observability operations, thus, bypass *scope* permissions
+checks in the kernel. CAP_PERFMON implements the principle of least
+privilege [13]_ (POSIX 1003.1e: 2.2.2.39) for performance monitoring and
+observability operations in the kernel and provides a secure approach to
+performance monitoring and observability in the system.
+
+For backward compatibility reasons the access to perf_events monitoring and
+observability operations is also open for CAP_SYS_ADMIN privileged
+processes but CAP_SYS_ADMIN usage for secure monitoring and observability
+use cases is discouraged with respect to the CAP_PERFMON capability.
+If system audit records [14]_ for a process using perf_events system call
+API contain denial records of acquiring both CAP_PERFMON and CAP_SYS_ADMIN
+capabilities then providing the process with CAP_PERFMON capability singly
+is recommended as the preferred secure approach to resolve double access
+denial logging related to usage of performance monitoring and observability.
+
+Prior Linux v5.9 unprivileged processes using perf_events system call
+are also subject for PTRACE_MODE_READ_REALCREDS ptrace access mode check
+[7]_ , whose outcome determines whether monitoring is permitted.
+So unprivileged processes provided with CAP_SYS_PTRACE capability are
+effectively permitted to pass the check. Starting from Linux v5.9
+CAP_SYS_PTRACE capability is not required and CAP_PERFMON is enough to
+be provided for processes to make performance monitoring and observability
+operations.
Other capabilities being granted to unprivileged processes can
effectively enable capturing of additional data required for later
@@ -82,14 +99,14 @@ performance analysis of monitored processes or a system. For example,
CAP_SYSLOG capability permits reading kernel space memory addresses from
/proc/kallsyms file.
-perf_events/Perf privileged users
+Privileged Perf users groups
---------------------------------
-Mechanisms of capabilities, privileged capability-dumb files [6]_ and
-file system ACLs [10]_ can be used to create a dedicated group of
-perf_events/Perf privileged users who are permitted to execute
-performance monitoring without scope limits. The following steps can be
-taken to create such a group of privileged Perf users.
+Mechanisms of capabilities, privileged capability-dumb files [6]_,
+file system ACLs [10]_ and sudo [15]_ utility can be used to create
+dedicated groups of privileged Perf users who are permitted to execute
+performance monitoring and observability without limits. The following
+steps can be taken to create such groups of privileged Perf users.
1. Create perf_users group of privileged Perf users, assign perf_users
group to Perf tool executable and limit access to the executable for
@@ -108,30 +125,105 @@ taken to create such a group of privileged Perf users.
-rwxr-x--- 2 root perf_users 11M Oct 19 15:12 perf
2. Assign the required capabilities to the Perf tool executable file and
- enable members of perf_users group with performance monitoring
+ enable members of perf_users group with monitoring and observability
privileges [6]_ :
::
- # setcap "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf
- # setcap -v "cap_sys_admin,cap_sys_ptrace,cap_syslog=ep" perf
+ # setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" perf
+ # setcap -v "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" perf
perf: OK
# getcap perf
- perf = cap_sys_ptrace,cap_sys_admin,cap_syslog+ep
+ perf = cap_sys_ptrace,cap_syslog,cap_perfmon+ep
+
+If the libcap [16]_ installed doesn't yet support "cap_perfmon", use "38" instead,
+i.e.:
+
+::
+
+ # setcap "38,cap_ipc_lock,cap_sys_ptrace,cap_syslog=ep" perf
+
+Note that you may need to have 'cap_ipc_lock' in the mix for tools such as
+'perf top', alternatively use 'perf top -m N', to reduce the memory that
+it uses for the perf ring buffer, see the memory allocation section below.
+
+Using a libcap without support for CAP_PERFMON will make cap_get_flag(caps, 38,
+CAP_EFFECTIVE, &val) fail, which will lead the default event to be 'cycles:u',
+so as a workaround explicitly ask for the 'cycles' event, i.e.:
+
+::
+
+ # perf top -e cycles
+
+To get kernel and user samples with a perf binary with just CAP_PERFMON.
As a result, members of perf_users group are capable of conducting
-performance monitoring by using functionality of the configured Perf
-tool executable that, when executes, passes perf_events subsystem scope
-checks.
+performance monitoring and observability by using functionality of the
+configured Perf tool executable that, when executes, passes perf_events
+subsystem scope checks.
+
+In case Perf tool executable can't be assigned required capabilities (e.g.
+file system is mounted with nosuid option or extended attributes are
+not supported by the file system) then creation of the capabilities
+privileged environment, naturally shell, is possible. The shell provides
+inherent processes with CAP_PERFMON and other required capabilities so that
+performance monitoring and observability operations are available in the
+environment without limits. Access to the environment can be open via sudo
+utility for members of perf_users group only. In order to create such
+environment:
+
+1. Create shell script that uses capsh utility [16]_ to assign CAP_PERFMON
+ and other required capabilities into ambient capability set of the shell
+ process, lock the process security bits after enabling SECBIT_NO_SETUID_FIXUP,
+ SECBIT_NOROOT and SECBIT_NO_CAP_AMBIENT_RAISE bits and then change
+ the process identity to sudo caller of the script who should essentially
+ be a member of perf_users group:
+
+::
+
+ # ls -alh /usr/local/bin/perf.shell
+ -rwxr-xr-x. 1 root root 83 Oct 13 23:57 /usr/local/bin/perf.shell
+ # cat /usr/local/bin/perf.shell
+ exec /usr/sbin/capsh --iab=^cap_perfmon --secbits=239 --user=$SUDO_USER -- -l
+
+2. Extend sudo policy at /etc/sudoers file with a rule for perf_users group:
+
+::
+
+ # grep perf_users /etc/sudoers
+ %perf_users ALL=/usr/local/bin/perf.shell
+
+3. Check that members of perf_users group have access to the privileged
+ shell and have CAP_PERFMON and other required capabilities enabled
+ in permitted, effective and ambient capability sets of an inherent process:
+
+::
+
+ $ id
+ uid=1003(capsh_test) gid=1004(capsh_test) groups=1004(capsh_test),1000(perf_users) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
+ $ sudo perf.shell
+ [sudo] password for capsh_test:
+ $ grep Cap /proc/self/status
+ CapInh: 0000004000000000
+ CapPrm: 0000004000000000
+ CapEff: 0000004000000000
+ CapBnd: 000000ffffffffff
+ CapAmb: 0000004000000000
+ $ capsh --decode=0000004000000000
+ 0x0000004000000000=cap_perfmon
+
+As a result, members of perf_users group have access to the privileged
+environment where they can use tools employing performance monitoring APIs
+governed by CAP_PERFMON Linux capability.
This specific access control management is only available to superuser
or root running processes with CAP_SETPCAP, CAP_SETFCAP [6]_
capabilities.
-perf_events/Perf unprivileged users
+Unprivileged users
-----------------------------------
-perf_events/Perf *scope* and *access* control for unprivileged processes
+perf_events *scope* and *access* control for unprivileged processes
is governed by perf_event_paranoid [2]_ setting:
-1:
@@ -166,7 +258,7 @@ is governed by perf_event_paranoid [2]_ setting:
perf_event_mlock_kb locking limit is imposed but ignored for
unprivileged processes with CAP_IPC_LOCK capability.
-perf_events/Perf resource control
+Resource control
---------------------------------
Open file descriptors
@@ -227,4 +319,7 @@ Bibliography
.. [10] `<http://man7.org/linux/man-pages/man5/acl.5.html>`_
.. [11] `<http://man7.org/linux/man-pages/man2/getrlimit.2.html>`_
.. [12] `<http://man7.org/linux/man-pages/man5/limits.conf.5.html>`_
-
+.. [13] `<https://sites.google.com/site/fullycapable>`_
+.. [14] `<http://man7.org/linux/man-pages/man8/auditd.8.html>`_
+.. [15] `<https://man7.org/linux/man-pages/man8/sudo.8.html>`_
+.. [16] `<https://git.kernel.org/pub/scm/libs/libcap/libcap.git/>`_
diff --git a/Documentation/admin-guide/perf/alibaba_pmu.rst b/Documentation/admin-guide/perf/alibaba_pmu.rst
new file mode 100644
index 000000000000..7d840023903f
--- /dev/null
+++ b/Documentation/admin-guide/perf/alibaba_pmu.rst
@@ -0,0 +1,105 @@
+=============================================================
+Alibaba's T-Head SoC Uncore Performance Monitoring Unit (PMU)
+=============================================================
+
+The Yitian 710, custom-built by Alibaba Group's chip development business,
+T-Head, implements uncore PMU for performance and functional debugging to
+facilitate system maintenance.
+
+DDR Sub-System Driveway (DRW) PMU Driver
+=========================================
+
+Yitian 710 employs eight DDR5/4 channels, four on each die. Each DDR5 channel
+is independent of others to service system memory requests. And one DDR5
+channel is split into two independent sub-channels. The DDR Sub-System Driveway
+implements separate PMUs for each sub-channel to monitor various performance
+metrics.
+
+The Driveway PMU devices are named as ali_drw_<sys_base_addr> with perf.
+For example, ali_drw_21000 and ali_drw_21080 are two PMU devices for two
+sub-channels of the same channel in die 0. And the PMU device of die 1 is
+prefixed with ali_drw_400XXXXX, e.g. ali_drw_40021000.
+
+Each sub-channel has 36 PMU counters in total, which is classified into
+four groups:
+
+- Group 0: PMU Cycle Counter. This group has one pair of counters
+ pmu_cycle_cnt_low and pmu_cycle_cnt_high, that is used as the cycle count
+ based on DDRC core clock.
+
+- Group 1: PMU Bandwidth Counters. This group has 8 counters that are used
+ to count the total access number of either the eight bank groups in a
+ selected rank, or four ranks separately in the first 4 counters. The base
+ transfer unit is 64B.
+
+- Group 2: PMU Retry Counters. This group has 10 counters, that intend to
+ count the total retry number of each type of uncorrectable error.
+
+- Group 3: PMU Common Counters. This group has 16 counters, that are used
+ to count the common events.
+
+For now, the Driveway PMU driver only uses counters in group 0 and group 3.
+
+The DDR Controller (DDRCTL) and DDR PHY combine to create a complete solution
+for connecting an SoC application bus to DDR memory devices. The DDRCTL
+receives transactions Host Interface (HIF) which is custom-defined by Synopsys.
+These transactions are queued internally and scheduled for access while
+satisfying the SDRAM protocol timing requirements, transaction priorities, and
+dependencies between the transactions. The DDRCTL in turn issues commands on
+the DDR PHY Interface (DFI) to the PHY module, which launches and captures data
+to and from the SDRAM. The driveway PMUs have hardware logic to gather
+statistics and performance logging signals on HIF, DFI, etc.
+
+By counting the READ, WRITE and RMW commands sent to the DDRC through the HIF
+interface, we could calculate the bandwidth. Example usage of counting memory
+data bandwidth::
+
+ perf stat \
+ -e ali_drw_21000/hif_wr/ \
+ -e ali_drw_21000/hif_rd/ \
+ -e ali_drw_21000/hif_rmw/ \
+ -e ali_drw_21000/cycle/ \
+ -e ali_drw_21080/hif_wr/ \
+ -e ali_drw_21080/hif_rd/ \
+ -e ali_drw_21080/hif_rmw/ \
+ -e ali_drw_21080/cycle/ \
+ -e ali_drw_23000/hif_wr/ \
+ -e ali_drw_23000/hif_rd/ \
+ -e ali_drw_23000/hif_rmw/ \
+ -e ali_drw_23000/cycle/ \
+ -e ali_drw_23080/hif_wr/ \
+ -e ali_drw_23080/hif_rd/ \
+ -e ali_drw_23080/hif_rmw/ \
+ -e ali_drw_23080/cycle/ \
+ -e ali_drw_25000/hif_wr/ \
+ -e ali_drw_25000/hif_rd/ \
+ -e ali_drw_25000/hif_rmw/ \
+ -e ali_drw_25000/cycle/ \
+ -e ali_drw_25080/hif_wr/ \
+ -e ali_drw_25080/hif_rd/ \
+ -e ali_drw_25080/hif_rmw/ \
+ -e ali_drw_25080/cycle/ \
+ -e ali_drw_27000/hif_wr/ \
+ -e ali_drw_27000/hif_rd/ \
+ -e ali_drw_27000/hif_rmw/ \
+ -e ali_drw_27000/cycle/ \
+ -e ali_drw_27080/hif_wr/ \
+ -e ali_drw_27080/hif_rd/ \
+ -e ali_drw_27080/hif_rmw/ \
+ -e ali_drw_27080/cycle/ -- sleep 10
+
+Example usage of counting all memory read/write bandwidth by metric::
+
+ perf stat -M ddr_read_bandwidth.all -- sleep 10
+ perf stat -M ddr_write_bandwidth.all -- sleep 10
+
+The average DRAM bandwidth can be calculated as follows:
+
+- Read Bandwidth = perf_hif_rd * DDRC_WIDTH * DDRC_Freq / DDRC_Cycle
+- Write Bandwidth = (perf_hif_wr + perf_hif_rmw) * DDRC_WIDTH * DDRC_Freq / DDRC_Cycle
+
+Here, DDRC_WIDTH = 64 bytes.
+
+The current driver does not support sampling. So "perf record" is
+unsupported. Also attach to a task is unsupported as the events are all
+uncore.
diff --git a/Documentation/admin-guide/perf/ampere_cspmu.rst b/Documentation/admin-guide/perf/ampere_cspmu.rst
new file mode 100644
index 000000000000..94f93f5aee6c
--- /dev/null
+++ b/Documentation/admin-guide/perf/ampere_cspmu.rst
@@ -0,0 +1,29 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================================
+Ampere SoC Performance Monitoring Unit (PMU)
+============================================
+
+Ampere SoC PMU is a generic PMU IP that follows Arm CoreSight PMU architecture.
+Therefore, the driver is implemented as a submodule of arm_cspmu driver. At the
+first phase it's used for counting MCU events on AmpereOne.
+
+
+MCU PMU events
+--------------
+
+The PMU driver supports setting filters for "rank", "bank", and "threshold".
+Note, that the filters are per PMU instance rather than per event.
+
+
+Example for perf tool use::
+
+ / # perf list ampere
+
+ ampere_mcu_pmu_0/act_sent/ [Kernel PMU event]
+ <...>
+ ampere_mcu_pmu_1/rd_sent/ [Kernel PMU event]
+ <...>
+
+ / # perf stat -a -e ampere_mcu_pmu_0/act_sent,bank=5,rank=3,threshold=2/,ampere_mcu_pmu_1/rd_sent/ \
+ sleep 1
diff --git a/Documentation/admin-guide/perf/arm-ccn.rst b/Documentation/admin-guide/perf/arm-ccn.rst
index 832b0c64023a..f62f7fe50eba 100644
--- a/Documentation/admin-guide/perf/arm-ccn.rst
+++ b/Documentation/admin-guide/perf/arm-ccn.rst
@@ -27,7 +27,7 @@ Crosspoint PMU events require "xp" (index), "bus" (bus number)
and "vc" (virtual channel ID).
Crosspoint watchpoint-based events (special "event" value 0xfe)
-require "xp" and "vc" as as above plus "port" (device port index),
+require "xp" and "vc" as above plus "port" (device port index),
"dir" (transmit/receive direction), comparator values ("cmp_l"
and "cmp_h") and "mask", being index of the comparator mask.
diff --git a/Documentation/admin-guide/perf/arm-cmn.rst b/Documentation/admin-guide/perf/arm-cmn.rst
new file mode 100644
index 000000000000..796e25b7027b
--- /dev/null
+++ b/Documentation/admin-guide/perf/arm-cmn.rst
@@ -0,0 +1,65 @@
+=============================
+Arm Coherent Mesh Network PMU
+=============================
+
+CMN-600 is a configurable mesh interconnect consisting of a rectangular
+grid of crosspoints (XPs), with each crosspoint supporting up to two
+device ports to which various AMBA CHI agents are attached.
+
+CMN implements a distributed PMU design as part of its debug and trace
+functionality. This consists of a local monitor (DTM) at every XP, which
+counts up to 4 event signals from the connected device nodes and/or the
+XP itself. Overflow from these local counters is accumulated in up to 8
+global counters implemented by the main controller (DTC), which provides
+overall PMU control and interrupts for global counter overflow.
+
+PMU events
+----------
+
+The PMU driver registers a single PMU device for the whole interconnect,
+see /sys/bus/event_source/devices/arm_cmn_0. Multi-chip systems may link
+more than one CMN together via external CCIX links - in this situation,
+each mesh counts its own events entirely independently, and additional
+PMU devices will be named arm_cmn_{1..n}.
+
+Most events are specified in a format based directly on the TRM
+definitions - "type" selects the respective node type, and "eventid" the
+event number. Some events require an additional occupancy ID, which is
+specified by "occupid".
+
+* Since RN-D nodes do not have any distinct events from RN-I nodes, they
+ are treated as the same type (0xa), and the common event templates are
+ named "rnid_*".
+
+* The cycle counter is treated as a synthetic event belonging to the DTC
+ node ("type" == 0x3, "eventid" is ignored).
+
+* XP events also encode the port and channel in the "eventid" field, to
+ match the underlying pmu_event0_id encoding for the pmu_event_sel
+ register. The event templates are named with prefixes to cover all
+ permutations.
+
+By default each event provides an aggregate count over all nodes of the
+given type. To target a specific node, "bynodeid" must be set to 1 and
+"nodeid" to the appropriate value derived from the CMN configuration
+(as defined in the "Node ID Mapping" section of the TRM).
+
+Watchpoints
+-----------
+
+The PMU can also count watchpoint events to monitor specific flit
+traffic. Watchpoints are treated as a synthetic event type, and like PMU
+events can be global or targeted with a particular XP's "nodeid" value.
+Since the watchpoint direction is otherwise implicit in the underlying
+register selection, separate events are provided for flit uploads and
+downloads.
+
+The flit match value and mask are passed in config1 and config2 ("val"
+and "mask" respectively). "wp_dev_sel", "wp_chn_sel", "wp_grp" and
+"wp_exclusive" are specified per the TRM definitions for dtm_wp_config0.
+Where a watchpoint needs to match fields from both match groups on the
+REQ or SNP channel, it can be specified as two events - one for each
+group - with the same nonzero "combine" value. The count for such a
+pair of combined events will be attributed to the primary match.
+Watchpoint events with a "combine" value of 0 are considered independent
+and will count individually.
diff --git a/Documentation/admin-guide/perf/cxl.rst b/Documentation/admin-guide/perf/cxl.rst
new file mode 100644
index 000000000000..9233ea0d0b10
--- /dev/null
+++ b/Documentation/admin-guide/perf/cxl.rst
@@ -0,0 +1,68 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================
+CXL Performance Monitoring Unit (CPMU)
+======================================
+
+The CXL rev 3.0 specification provides a definition of CXL Performance
+Monitoring Unit in section 13.2: Performance Monitoring.
+
+CXL components (e.g. Root Port, Switch Upstream Port, End Point) may have
+any number of CPMU instances. CPMU capabilities are fully discoverable from
+the devices. The specification provides event definitions for all CXL protocol
+message types and a set of additional events for things commonly counted on
+CXL devices (e.g. DRAM events).
+
+CPMU driver
+===========
+
+The CPMU driver registers a perf PMU with the name pmu_mem<X>.<Y> on the CXL bus
+representing the Yth CPMU for memX.
+
+ /sys/bus/cxl/device/pmu_mem<X>.<Y>
+
+The associated PMU is registered as
+
+ /sys/bus/event_sources/devices/cxl_pmu_mem<X>.<Y>
+
+In common with other CXL bus devices, the id has no specific meaning and the
+relationship to specific CXL device should be established via the device parent
+of the device on the CXL bus.
+
+PMU driver provides description of available events and filter options in sysfs.
+
+The "format" directory describes all formats of the config (event vendor id,
+group id and mask) config1 (threshold, filter enables) and config2 (filter
+parameters) fields of the perf_event_attr structure. The "events" directory
+describes all documented events show in perf list.
+
+The events shown in perf list are the most fine grained events with a single
+bit of the event mask set. More general events may be enable by setting
+multiple mask bits in config. For example, all Device to Host Read Requests
+may be captured on a single counter by setting the bits for all of
+
+* d2h_req_rdcurr
+* d2h_req_rdown
+* d2h_req_rdshared
+* d2h_req_rdany
+* d2h_req_rdownnodata
+
+Example of usage::
+
+ $#perf list
+ cxl_pmu_mem0.0/clock_ticks/ [Kernel PMU event]
+ cxl_pmu_mem0.0/d2h_req_rdshared/ [Kernel PMU event]
+ cxl_pmu_mem0.0/h2d_req_snpcur/ [Kernel PMU event]
+ cxl_pmu_mem0.0/h2d_req_snpdata/ [Kernel PMU event]
+ cxl_pmu_mem0.0/h2d_req_snpinv/ [Kernel PMU event]
+ -----------------------------------------------------------
+
+ $# perf stat -a -e cxl_pmu_mem0.0/clock_ticks/ -e cxl_pmu_mem0.0/d2h_req_rdshared/
+
+Vendor specific events may also be available and if so can be used via
+
+ $# perf stat -a -e cxl_pmu_mem0.0/vid=VID,gid=GID,mask=MASK/
+
+The driver does not support sampling so "perf record" is unsupported.
+It only supports system-wide counting so attaching to a task is
+unsupported.
diff --git a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
new file mode 100644
index 000000000000..d47cd229d710
--- /dev/null
+++ b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
@@ -0,0 +1,94 @@
+======================================================================
+Synopsys DesignWare Cores (DWC) PCIe Performance Monitoring Unit (PMU)
+======================================================================
+
+DesignWare Cores (DWC) PCIe PMU
+===============================
+
+The PMU is a PCIe configuration space register block provided by each PCIe Root
+Port in a Vendor-Specific Extended Capability named RAS D.E.S (Debug, Error
+injection, and Statistics).
+
+As the name indicates, the RAS DES capability supports system level
+debugging, AER error injection, and collection of statistics. To facilitate
+collection of statistics, Synopsys DesignWare Cores PCIe controller
+provides the following two features:
+
+- one 64-bit counter for Time Based Analysis (RX/TX data throughput and
+ time spent in each low-power LTSSM state) and
+- one 32-bit counter for Event Counting (error and non-error events for
+ a specified lane)
+
+Note: There is no interrupt for counter overflow.
+
+Time Based Analysis
+-------------------
+
+Using this feature you can obtain information regarding RX/TX data
+throughput and time spent in each low-power LTSSM state by the controller.
+The PMU measures data in two categories:
+
+- Group#0: Percentage of time the controller stays in LTSSM states.
+- Group#1: Amount of data processed (Units of 16 bytes).
+
+Lane Event counters
+-------------------
+
+Using this feature you can obtain Error and Non-Error information in
+specific lane by the controller. The PMU event is selected by all of:
+
+- Group i
+- Event j within the Group i
+- Lane k
+
+Some of the events only exist for specific configurations.
+
+DesignWare Cores (DWC) PCIe PMU Driver
+=======================================
+
+This driver adds PMU devices for each PCIe Root Port named based on the BDF of
+the Root Port. For example,
+
+ 30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
+
+the PMU device name for this Root Port is dwc_rootport_3018.
+
+The DWC PCIe PMU driver registers a perf PMU driver, which provides
+description of available events and configuration options in sysfs, see
+/sys/bus/event_source/devices/dwc_rootport_{bdf}.
+
+The "format" directory describes format of the config fields of the
+perf_event_attr structure. The "events" directory provides configuration
+templates for all documented events. For example,
+"Rx_PCIe_TLP_Data_Payload" is an equivalent of "eventid=0x22,type=0x1".
+
+The "perf list" command shall list the available events from sysfs, e.g.::
+
+ $# perf list | grep dwc_rootport
+ <...>
+ dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/ [Kernel PMU event]
+ <...>
+ dwc_rootport_3018/rx_memory_read,lane=?/ [Kernel PMU event]
+
+Time Based Analysis Event Usage
+-------------------------------
+
+Example usage of counting PCIe RX TLP data payload (Units of bytes)::
+
+ $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/
+
+The average RX/TX bandwidth can be calculated using the following formula:
+
+ PCIe RX Bandwidth = Rx_PCIe_TLP_Data_Payload / Measure_Time_Window
+ PCIe TX Bandwidth = Tx_PCIe_TLP_Data_Payload / Measure_Time_Window
+
+Lane Event Usage
+-------------------------------
+
+Each lane has the same event set and to avoid generating a list of hundreds
+of events, the user need to specify the lane ID explicitly, e.g.::
+
+ $# perf stat -a -e dwc_rootport_3018/rx_memory_read,lane=4/
+
+The driver does not support sampling, therefore "perf record" will not
+work. Per-task (without "-a") perf sessions are not supported.
diff --git a/Documentation/admin-guide/perf/hisi-pcie-pmu.rst b/Documentation/admin-guide/perf/hisi-pcie-pmu.rst
new file mode 100644
index 000000000000..7e863662e2d4
--- /dev/null
+++ b/Documentation/admin-guide/perf/hisi-pcie-pmu.rst
@@ -0,0 +1,130 @@
+================================================
+HiSilicon PCIe Performance Monitoring Unit (PMU)
+================================================
+
+On Hip09, HiSilicon PCIe Performance Monitoring Unit (PMU) could monitor
+bandwidth, latency, bus utilization and buffer occupancy data of PCIe.
+
+Each PCIe Core has a PMU to monitor multi Root Ports of this PCIe Core and
+all Endpoints downstream these Root Ports.
+
+
+HiSilicon PCIe PMU driver
+=========================
+
+The PCIe PMU driver registers a perf PMU with the name of its sicl-id and PCIe
+Core id.::
+
+ /sys/bus/event_source/hisi_pcie<sicl>_core<core>
+
+PMU driver provides description of available events and filter options in sysfs,
+see /sys/bus/event_source/devices/hisi_pcie<sicl>_core<core>.
+
+The "format" directory describes all formats of the config (events) and config1
+(filter options) fields of the perf_event_attr structure. The "events" directory
+describes all documented events shown in perf list.
+
+The "identifier" sysfs file allows users to identify the version of the
+PMU hardware device.
+
+The "bus" sysfs file allows users to get the bus number of Root Ports
+monitored by PMU.
+
+Example usage of perf::
+
+ $# perf list
+ hisi_pcie0_core0/rx_mwr_latency/ [kernel PMU event]
+ hisi_pcie0_core0/rx_mwr_cnt/ [kernel PMU event]
+ ------------------------------------------
+
+ $# perf stat -e hisi_pcie0_core0/rx_mwr_latency/
+ $# perf stat -e hisi_pcie0_core0/rx_mwr_cnt/
+ $# perf stat -g -e hisi_pcie0_core0/rx_mwr_latency/ -e hisi_pcie0_core0/rx_mwr_cnt/
+
+The current driver does not support sampling. So "perf record" is unsupported.
+Also attach to a task is unsupported for PCIe PMU.
+
+Filter options
+--------------
+
+1. Target filter
+
+ PMU could only monitor the performance of traffic downstream target Root
+ Ports or downstream target Endpoint. PCIe PMU driver support "port" and
+ "bdf" interfaces for users, and these two interfaces aren't supported at the
+ same time.
+
+ - port
+
+ "port" filter can be used in all PCIe PMU events, target Root Port can be
+ selected by configuring the 16-bits-bitmap "port". Multi ports can be
+ selected for AP-layer-events, and only one port can be selected for
+ TL/DL-layer-events.
+
+ For example, if target Root Port is 0000:00:00.0 (x8 lanes), bit0 of
+ bitmap should be set, port=0x1; if target Root Port is 0000:00:04.0 (x4
+ lanes), bit8 is set, port=0x100; if these two Root Ports are both
+ monitored, port=0x101.
+
+ Example usage of perf::
+
+ $# perf stat -e hisi_pcie0_core0/rx_mwr_latency,port=0x1/ sleep 5
+
+ - bdf
+
+ "bdf" filter can only be used in bandwidth events, target Endpoint is
+ selected by configuring BDF to "bdf". Counter only counts the bandwidth of
+ message requested by target Endpoint.
+
+ For example, "bdf=0x3900" means BDF of target Endpoint is 0000:39:00.0.
+
+ Example usage of perf::
+
+ $# perf stat -e hisi_pcie0_core0/rx_mrd_flux,bdf=0x3900/ sleep 5
+
+2. Trigger filter
+
+ Event statistics start when the first time TLP length is greater/smaller
+ than trigger condition. You can set the trigger condition by writing
+ "trig_len", and set the trigger mode by writing "trig_mode". This filter can
+ only be used in bandwidth events.
+
+ For example, "trig_len=4" means trigger condition is 2^4 DW, "trig_mode=0"
+ means statistics start when TLP length > trigger condition, "trig_mode=1"
+ means start when TLP length < condition.
+
+ Example usage of perf::
+
+ $# perf stat -e hisi_pcie0_core0/rx_mrd_flux,trig_len=0x4,trig_mode=1/ sleep 5
+
+3. Threshold filter
+
+ Counter counts when TLP length within the specified range. You can set the
+ threshold by writing "thr_len", and set the threshold mode by writing
+ "thr_mode". This filter can only be used in bandwidth events.
+
+ For example, "thr_len=4" means threshold is 2^4 DW, "thr_mode=0" means
+ counter counts when TLP length >= threshold, and "thr_mode=1" means counts
+ when TLP length < threshold.
+
+ Example usage of perf::
+
+ $# perf stat -e hisi_pcie0_core0/rx_mrd_flux,thr_len=0x4,thr_mode=1/ sleep 5
+
+4. TLP Length filter
+
+ When counting bandwidth, the data can be composed of certain parts of TLP
+ packets. You can specify it through "len_mode":
+
+ - 2'b00: Reserved (Do not use this since the behaviour is undefined)
+ - 2'b01: Bandwidth of TLP payloads
+ - 2'b10: Bandwidth of TLP headers
+ - 2'b11: Bandwidth of both TLP payloads and headers
+
+ For example, "len_mode=2" means only counting the bandwidth of TLP headers
+ and "len_mode=3" means the final bandwidth data is composed of both TLP
+ headers and payloads. Default value if not specified is 2'b11.
+
+ Example usage of perf::
+
+ $# perf stat -e hisi_pcie0_core0/rx_mrd_flux,len_mode=0x1/ sleep 5
diff --git a/Documentation/admin-guide/perf/hisi-pmu.rst b/Documentation/admin-guide/perf/hisi-pmu.rst
index 404a5c3d9d00..e0174d20809a 100644
--- a/Documentation/admin-guide/perf/hisi-pmu.rst
+++ b/Documentation/admin-guide/perf/hisi-pmu.rst
@@ -53,6 +53,72 @@ Example usage of perf::
$# perf stat -a -e hisi_sccl3_l3c0/rd_hit_cpipe/ sleep 5
$# perf stat -a -e hisi_sccl3_l3c0/config=0x02/ sleep 5
+For HiSilicon uncore PMU v2 whose identifier is 0x30, the topology is the same
+as PMU v1, but some new functions are added to the hardware.
+
+1. L3C PMU supports filtering by core/thread within the cluster which can be
+specified as a bitmap::
+
+ $# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_core=0x3/ sleep 5
+
+This will only count the operations from core/thread 0 and 1 in this cluster.
+
+2. Tracetag allow the user to chose to count only read, write or atomic
+operations via the tt_req parameeter in perf. The default value counts all
+operations. tt_req is 3bits, 3'b100 represents read operations, 3'b101
+represents write operations, 3'b110 represents atomic store operations and
+3'b111 represents atomic non-store operations, other values are reserved::
+
+ $# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_req=0x4/ sleep 5
+
+This will only count the read operations in this cluster.
+
+3. Datasrc allows the user to check where the data comes from. It is 5 bits.
+Some important codes are as follows:
+
+- 5'b00001: comes from L3C in this die;
+- 5'b01000: comes from L3C in the cross-die;
+- 5'b01001: comes from L3C which is in another socket;
+- 5'b01110: comes from the local DDR;
+- 5'b01111: comes from the cross-die DDR;
+- 5'b10000: comes from cross-socket DDR;
+
+etc, it is mainly helpful to find that the data source is nearest from the CPU
+cores. If datasrc_cfg is used in the multi-chips, the datasrc_skt shall be
+configured in perf command::
+
+ $# perf stat -a -e hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xE/,
+ hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xF/ sleep 5
+
+4. Some HiSilicon SoCs encapsulate multiple CPU and IO dies. Each CPU die
+contains several Compute Clusters (CCLs). The I/O dies are called Super I/O
+clusters (SICL) containing multiple I/O clusters (ICLs). Each CCL/ICL in the
+SoC has a unique ID. Each ID is 11bits, include a 6-bit SCCL-ID and 5-bit
+CCL/ICL-ID. For I/O die, the ICL-ID is followed by:
+
+- 5'b00000: I/O_MGMT_ICL;
+- 5'b00001: Network_ICL;
+- 5'b00011: HAC_ICL;
+- 5'b10000: PCIe_ICL;
+
+5. uring_channel: UC PMU events 0x47~0x59 supports filtering by tx request
+uring channel. It is 2 bits. Some important codes are as follows:
+
+- 2'b11: count the events which sent to the uring_ext (MATA) channel;
+- 2'b01: is the same as 2'b11;
+- 2'b10: count the events which sent to the uring (non-MATA) channel;
+- 2'b00: default value, count the events which sent to the both uring and
+ uring_ext channel;
+
+Users could configure IDs to count data come from specific CCL/ICL, by setting
+srcid_cmd & srcid_msk, and data desitined for specific CCL/ICL by setting
+tgtid_cmd & tgtid_msk. A set bit in srcid_msk/tgtid_msk means the PMU will not
+check the bit when matching against the srcid_cmd/tgtid_cmd.
+
+If all of these options are disabled, it can works by the default value that
+doesn't distinguish the filter condition and ID information and will return
+the total counter values in the PMU counters.
+
The current driver does not support sampling. So "perf record" is unsupported.
Also attach to a task is unsupported as the events are all uncore.
diff --git a/Documentation/admin-guide/perf/hns3-pmu.rst b/Documentation/admin-guide/perf/hns3-pmu.rst
new file mode 100644
index 000000000000..75a40846d47f
--- /dev/null
+++ b/Documentation/admin-guide/perf/hns3-pmu.rst
@@ -0,0 +1,136 @@
+======================================
+HNS3 Performance Monitoring Unit (PMU)
+======================================
+
+HNS3(HiSilicon network system 3) Performance Monitoring Unit (PMU) is an
+End Point device to collect performance statistics of HiSilicon SoC NIC.
+On Hip09, each SICL(Super I/O cluster) has one PMU device.
+
+HNS3 PMU supports collection of performance statistics such as bandwidth,
+latency, packet rate and interrupt rate.
+
+Each HNS3 PMU supports 8 hardware events.
+
+HNS3 PMU driver
+===============
+
+The HNS3 PMU driver registers a perf PMU with the name of its sicl id.::
+
+ /sys/devices/hns3_pmu_sicl_<sicl_id>
+
+PMU driver provides description of available events, filter modes, format,
+identifier and cpumask in sysfs.
+
+The "events" directory describes the event code of all supported events
+shown in perf list.
+
+The "filtermode" directory describes the supported filter modes of each
+event.
+
+The "format" directory describes all formats of the config (events) and
+config1 (filter options) fields of the perf_event_attr structure.
+
+The "identifier" file shows version of PMU hardware device.
+
+The "bdf_min" and "bdf_max" files show the supported bdf range of each
+pmu device.
+
+The "hw_clk_freq" file shows the hardware clock frequency of each pmu
+device.
+
+Example usage of checking event code and subevent code::
+
+ $# cat /sys/devices/hns3_pmu_sicl_0/events/dly_tx_normal_to_mac_time
+ config=0x00204
+ $# cat /sys/devices/hns3_pmu_sicl_0/events/dly_tx_normal_to_mac_packet_num
+ config=0x10204
+
+Each performance statistic has a pair of events to get two values to
+calculate real performance data in userspace.
+
+The bits 0~15 of config (here 0x0204) are the true hardware event code. If
+two events have same value of bits 0~15 of config, that means they are
+event pair. And the bit 16 of config indicates getting counter 0 or
+counter 1 of hardware event.
+
+After getting two values of event pair in userspace, the formula of
+computation to calculate real performance data is:::
+
+ counter 0 / counter 1
+
+Example usage of checking supported filter mode::
+
+ $# cat /sys/devices/hns3_pmu_sicl_0/filtermode/bw_ssu_rpu_byte_num
+ filter mode supported: global/port/port-tc/func/func-queue/
+
+Example usage of perf::
+
+ $# perf list
+ hns3_pmu_sicl_0/bw_ssu_rpu_byte_num/ [kernel PMU event]
+ hns3_pmu_sicl_0/bw_ssu_rpu_time/ [kernel PMU event]
+ ------------------------------------------
+
+ $# perf stat -g -e hns3_pmu_sicl_0/bw_ssu_rpu_byte_num,global=1/ -e hns3_pmu_sicl_0/bw_ssu_rpu_time,global=1/ -I 1000
+ or
+ $# perf stat -g -e hns3_pmu_sicl_0/config=0x00002,global=1/ -e hns3_pmu_sicl_0/config=0x10002,global=1/ -I 1000
+
+
+Filter modes
+--------------
+
+1. global mode
+PMU collect performance statistics for all HNS3 PCIe functions of IO DIE.
+Set the "global" filter option to 1 will enable this mode.
+Example usage of perf::
+
+ $# perf stat -a -e hns3_pmu_sicl_0/config=0x1020F,global=1/ -I 1000
+
+2. port mode
+PMU collect performance statistic of one whole physical port. The port id
+is same as mac id. The "tc" filter option must be set to 0xF in this mode,
+here tc stands for traffic class.
+
+Example usage of perf::
+
+ $# perf stat -a -e hns3_pmu_sicl_0/config=0x1020F,port=0,tc=0xF/ -I 1000
+
+3. port-tc mode
+PMU collect performance statistic of one tc of physical port. The port id
+is same as mac id. The "tc" filter option must be set to 0 ~ 7 in this
+mode.
+Example usage of perf::
+
+ $# perf stat -a -e hns3_pmu_sicl_0/config=0x1020F,port=0,tc=0/ -I 1000
+
+4. func mode
+PMU collect performance statistic of one PF/VF. The function id is BDF of
+PF/VF, its conversion formula::
+
+ func = (bus << 8) + (device << 3) + (function)
+
+for example:
+ BDF func
+ 35:00.0 0x3500
+ 35:00.1 0x3501
+ 35:01.0 0x3508
+
+In this mode, the "queue" filter option must be set to 0xFFFF.
+Example usage of perf::
+
+ $# perf stat -a -e hns3_pmu_sicl_0/config=0x1020F,bdf=0x3500,queue=0xFFFF/ -I 1000
+
+5. func-queue mode
+PMU collect performance statistic of one queue of PF/VF. The function id
+is BDF of PF/VF, the "queue" filter option must be set to the exact queue
+id of function.
+Example usage of perf::
+
+ $# perf stat -a -e hns3_pmu_sicl_0/config=0x1020F,bdf=0x3500,queue=0/ -I 1000
+
+6. func-intr mode
+PMU collect performance statistic of one interrupt of PF/VF. The function
+id is BDF of PF/VF, the "intr" filter option must be set to the exact
+interrupt id of function.
+Example usage of perf::
+
+ $# perf stat -a -e hns3_pmu_sicl_0/config=0x00301,bdf=0x3500,intr=0/ -I 1000
diff --git a/Documentation/admin-guide/perf/imx-ddr.rst b/Documentation/admin-guide/perf/imx-ddr.rst
index f05f56c73b7d..77418ae5a290 100644
--- a/Documentation/admin-guide/perf/imx-ddr.rst
+++ b/Documentation/admin-guide/perf/imx-ddr.rst
@@ -4,7 +4,7 @@ Freescale i.MX8 DDR Performance Monitoring Unit (PMU)
There are no performance counters inside the DRAM controller, so performance
signals are brought out to the edge of the controller where a set of 4 x 32 bit
-counters is implemented. This is controlled by the CSV modes programed in counter
+counters is implemented. This is controlled by the CSV modes programmed in counter
control register which causes a large number of PERF signals to be generated.
Selection of the value for each counter is done via the config registers. There
@@ -13,8 +13,8 @@ is one register for each counter. Counter 0 is special in that it always counts
interrupt is raised. If any other counter overflows, it continues counting, and
no interrupt is raised.
-The "format" directory describes format of the config (event ID) and config1
-(AXI filtering) fields of the perf_event_attr structure, see /sys/bus/event_source/
+The "format" directory describes format of the config (event ID) and config1/2
+(AXI filter setting) fields of the perf_event_attr structure, see /sys/bus/event_source/
devices/imx8_ddr0/format/. The "events" directory describes the events types
hardware supported that can be used with perf tool, see /sys/bus/event_source/
devices/imx8_ddr0/events/. The "caps" directory describes filter features implemented
@@ -28,12 +28,11 @@ in DDR PMU, see /sys/bus/events_source/devices/imx8_ddr0/caps/.
AXI filtering is only used by CSV modes 0x41 (axid-read) and 0x42 (axid-write)
to count reading or writing matches filter setting. Filter setting is various
from different DRAM controller implementations, which is distinguished by quirks
-in the driver. You also can dump info from userspace, filter in "caps" directory
-indicates whether PMU supports AXI ID filter or not; enhanced_filter indicates
-whether PMU supports enhanced AXI ID filter or not. Value 0 for un-supported, and
-value 1 for supported.
+in the driver. You also can dump info from userspace, "caps" directory show the
+type of AXI filter (filter, enhanced_filter and super_filter). Value 0 for
+un-supported, and value 1 for supported.
-* With DDR_CAP_AXI_ID_FILTER quirk(filter: 1, enhanced_filter: 0).
+* With DDR_CAP_AXI_ID_FILTER quirk(filter: 1, enhanced_filter: 0, super_filter: 0).
Filter is defined with two configuration parts:
--AXI_ID defines AxID matching value.
--AXI_MASKING defines which bits of AxID are meaningful for the matching.
@@ -65,7 +64,37 @@ value 1 for supported.
perf stat -a -e imx8_ddr0/axid-read,axi_id=0x12/ cmd, which will monitor ARID=0x12
-* With DDR_CAP_AXI_ID_FILTER_ENHANCED quirk(filter: 1, enhanced_filter: 1).
+* With DDR_CAP_AXI_ID_FILTER_ENHANCED quirk(filter: 1, enhanced_filter: 1, super_filter: 0).
This is an extension to the DDR_CAP_AXI_ID_FILTER quirk which permits
counting the number of bytes (as opposed to the number of bursts) from DDR
read and write transactions concurrently with another set of data counters.
+
+* With DDR_CAP_AXI_ID_PORT_CHANNEL_FILTER quirk(filter: 0, enhanced_filter: 0, super_filter: 1).
+ There is a limitation in previous AXI filter, it cannot filter different IDs
+ at the same time as the filter is shared between counters. This quirk is the
+ extension of AXI ID filter. One improvement is that counter 1-3 has their own
+ filter, means that it supports concurrently filter various IDs. Another
+ improvement is that counter 1-3 supports AXI PORT and CHANNEL selection. Support
+ selecting address channel or data channel.
+
+ Filter is defined with 2 configuration registers per counter 1-3.
+ --Counter N MASK COMP register - including AXI_ID and AXI_MASKING.
+ --Counter N MUX CNTL register - including AXI CHANNEL and AXI PORT.
+
+ - 0: address channel
+ - 1: data channel
+
+ PMU in DDR subsystem, only one single port0 exists, so axi_port is reserved
+ which should be 0.
+
+ .. code-block:: bash
+
+ perf stat -a -e imx8_ddr0/axid-read,axi_mask=0xMMMM,axi_id=0xDDDD,axi_channel=0xH/ cmd
+ perf stat -a -e imx8_ddr0/axid-write,axi_mask=0xMMMM,axi_id=0xDDDD,axi_channel=0xH/ cmd
+
+ .. note::
+
+ axi_channel is inverted in userspace, and it will be reverted in driver
+ automatically. So that users do not need specify axi_channel if want to
+ monitor data channel from DDR transactions, since data channel is more
+ meaningful.
diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
index 47c99f40cc16..f4a4513c526f 100644
--- a/Documentation/admin-guide/perf/index.rst
+++ b/Documentation/admin-guide/perf/index.rst
@@ -8,10 +8,19 @@ Performance monitor support
:maxdepth: 1
hisi-pmu
+ hisi-pcie-pmu
+ hns3-pmu
imx-ddr
qcom_l2_pmu
qcom_l3_pmu
arm-ccn
+ arm-cmn
xgene-pmu
arm_dsu_pmu
thunderx2-pmu
+ alibaba_pmu
+ dwc_pcie_pmu
+ nvidia-pmu
+ meson-ddr-pmu
+ cxl
+ ampere_cspmu
diff --git a/Documentation/admin-guide/perf/meson-ddr-pmu.rst b/Documentation/admin-guide/perf/meson-ddr-pmu.rst
new file mode 100644
index 000000000000..8e71be1d6346
--- /dev/null
+++ b/Documentation/admin-guide/perf/meson-ddr-pmu.rst
@@ -0,0 +1,70 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================================================
+Amlogic SoC DDR Bandwidth Performance Monitoring Unit (PMU)
+===========================================================
+
+The Amlogic Meson G12 SoC contains a bandwidth monitor inside DRAM controller.
+The monitor includes 4 channels. Each channel can count the request accessing
+DRAM. The channel can count up to 3 AXI port simultaneously. It can be helpful
+to show if the performance bottleneck is on DDR bandwidth.
+
+Currently, this driver supports the following 5 perf events:
+
++ meson_ddr_bw/total_rw_bytes/
++ meson_ddr_bw/chan_1_rw_bytes/
++ meson_ddr_bw/chan_2_rw_bytes/
++ meson_ddr_bw/chan_3_rw_bytes/
++ meson_ddr_bw/chan_4_rw_bytes/
+
+meson_ddr_bw/chan_{1,2,3,4}_rw_bytes/ events are channel-specific events.
+Each channel support filtering, which can let the channel to monitor
+individual IP module in SoC.
+
+Below are DDR access request event filter keywords:
+
++ arm - from CPU
++ vpu_read1 - from OSD + VPP read
++ gpu - from 3D GPU
++ pcie - from PCIe controller
++ hdcp - from HDCP controller
++ hevc_front - from HEVC codec front end
++ usb3_0 - from USB3.0 controller
++ hevc_back - from HEVC codec back end
++ h265enc - from HEVC encoder
++ vpu_read2 - from DI read
++ vpu_write1 - from VDIN write
++ vpu_write2 - from di write
++ vdec - from legacy codec video decoder
++ hcodec - from H264 encoder
++ ge2d - from ge2d
++ spicc1 - from SPI controller 1
++ usb0 - from USB2.0 controller 0
++ dma - from system DMA controller 1
++ arb0 - from arb0
++ sd_emmc_b - from SD eMMC b controller
++ usb1 - from USB2.0 controller 1
++ audio - from Audio module
++ sd_emmc_c - from SD eMMC c controller
++ spicc2 - from SPI controller 2
++ ethernet - from Ethernet controller
+
+
+Examples:
+
+ + Show the total DDR bandwidth per seconds:
+
+ .. code-block:: bash
+
+ perf stat -a -e meson_ddr_bw/total_rw_bytes/ -I 1000 sleep 10
+
+
+ + Show individual DDR bandwidth from CPU and GPU respectively, as well as
+ sum of them:
+
+ .. code-block:: bash
+
+ perf stat -a -e meson_ddr_bw/chan_1_rw_bytes,arm=1/ -I 1000 sleep 10
+ perf stat -a -e meson_ddr_bw/chan_2_rw_bytes,gpu=1/ -I 1000 sleep 10
+ perf stat -a -e meson_ddr_bw/chan_3_rw_bytes,arm=1,gpu=1/ -I 1000 sleep 10
+
diff --git a/Documentation/admin-guide/perf/nvidia-pmu.rst b/Documentation/admin-guide/perf/nvidia-pmu.rst
new file mode 100644
index 000000000000..2e0d47cfe7ea
--- /dev/null
+++ b/Documentation/admin-guide/perf/nvidia-pmu.rst
@@ -0,0 +1,299 @@
+=========================================================
+NVIDIA Tegra SoC Uncore Performance Monitoring Unit (PMU)
+=========================================================
+
+The NVIDIA Tegra SoC includes various system PMUs to measure key performance
+metrics like memory bandwidth, latency, and utilization:
+
+* Scalable Coherency Fabric (SCF)
+* NVLink-C2C0
+* NVLink-C2C1
+* CNVLink
+* PCIE
+
+PMU Driver
+----------
+
+The PMUs in this document are based on ARM CoreSight PMU Architecture as
+described in document: ARM IHI 0091. Since this is a standard architecture, the
+PMUs are managed by a common driver "arm-cs-arch-pmu". This driver describes
+the available events and configuration of each PMU in sysfs. Please see the
+sections below to get the sysfs path of each PMU. Like other uncore PMU drivers,
+the driver provides "cpumask" sysfs attribute to show the CPU id used to handle
+the PMU event. There is also "associated_cpus" sysfs attribute, which contains a
+list of CPUs associated with the PMU instance.
+
+.. _SCF_PMU_Section:
+
+SCF PMU
+-------
+
+The SCF PMU monitors system level cache events, CPU traffic, and
+strongly-ordered (SO) PCIE write traffic to local/remote memory. Please see
+:ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section` for more info about the PMU
+traffic coverage.
+
+The events and configuration options of this PMU device are described in sysfs,
+see /sys/bus/event_sources/devices/nvidia_scf_pmu_<socket-id>.
+
+Example usage:
+
+* Count event id 0x0 in socket 0::
+
+ perf stat -a -e nvidia_scf_pmu_0/event=0x0/
+
+* Count event id 0x0 in socket 1::
+
+ perf stat -a -e nvidia_scf_pmu_1/event=0x0/
+
+NVLink-C2C0 PMU
+--------------------
+
+The NVLink-C2C0 PMU monitors incoming traffic from a GPU/CPU connected with
+NVLink-C2C (Chip-2-Chip) interconnect. The type of traffic captured by this PMU
+varies dependent on the chip configuration:
+
+* NVIDIA Grace Hopper Superchip: Hopper GPU is connected with Grace SoC.
+
+ In this config, the PMU captures GPU ATS translated or EGM traffic from the GPU.
+
+* NVIDIA Grace CPU Superchip: two Grace CPU SoCs are connected.
+
+ In this config, the PMU captures read and relaxed ordered (RO) writes from
+ PCIE device of the remote SoC.
+
+Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section` for more info about
+the PMU traffic coverage.
+
+The events and configuration options of this PMU device are described in sysfs,
+see /sys/bus/event_sources/devices/nvidia_nvlink_c2c0_pmu_<socket-id>.
+
+Example usage:
+
+* Count event id 0x0 from the GPU/CPU connected with socket 0::
+
+ perf stat -a -e nvidia_nvlink_c2c0_pmu_0/event=0x0/
+
+* Count event id 0x0 from the GPU/CPU connected with socket 1::
+
+ perf stat -a -e nvidia_nvlink_c2c0_pmu_1/event=0x0/
+
+* Count event id 0x0 from the GPU/CPU connected with socket 2::
+
+ perf stat -a -e nvidia_nvlink_c2c0_pmu_2/event=0x0/
+
+* Count event id 0x0 from the GPU/CPU connected with socket 3::
+
+ perf stat -a -e nvidia_nvlink_c2c0_pmu_3/event=0x0/
+
+NVLink-C2C1 PMU
+-------------------
+
+The NVLink-C2C1 PMU monitors incoming traffic from a GPU connected with
+NVLink-C2C (Chip-2-Chip) interconnect. This PMU captures untranslated GPU
+traffic, in contrast with NvLink-C2C0 PMU that captures ATS translated traffic.
+Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section` for more info about
+the PMU traffic coverage.
+
+The events and configuration options of this PMU device are described in sysfs,
+see /sys/bus/event_sources/devices/nvidia_nvlink_c2c1_pmu_<socket-id>.
+
+Example usage:
+
+* Count event id 0x0 from the GPU connected with socket 0::
+
+ perf stat -a -e nvidia_nvlink_c2c1_pmu_0/event=0x0/
+
+* Count event id 0x0 from the GPU connected with socket 1::
+
+ perf stat -a -e nvidia_nvlink_c2c1_pmu_1/event=0x0/
+
+* Count event id 0x0 from the GPU connected with socket 2::
+
+ perf stat -a -e nvidia_nvlink_c2c1_pmu_2/event=0x0/
+
+* Count event id 0x0 from the GPU connected with socket 3::
+
+ perf stat -a -e nvidia_nvlink_c2c1_pmu_3/event=0x0/
+
+CNVLink PMU
+---------------
+
+The CNVLink PMU monitors traffic from GPU and PCIE device on remote sockets
+to local memory. For PCIE traffic, this PMU captures read and relaxed ordered
+(RO) write traffic. Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section`
+for more info about the PMU traffic coverage.
+
+The events and configuration options of this PMU device are described in sysfs,
+see /sys/bus/event_sources/devices/nvidia_cnvlink_pmu_<socket-id>.
+
+Each SoC socket can be connected to one or more sockets via CNVLink. The user can
+use "rem_socket" bitmap parameter to select the remote socket(s) to monitor.
+Each bit represents the socket number, e.g. "rem_socket=0xE" corresponds to
+socket 1 to 3.
+/sys/bus/event_sources/devices/nvidia_cnvlink_pmu_<socket-id>/format/rem_socket
+shows the valid bits that can be set in the "rem_socket" parameter.
+
+The PMU can not distinguish the remote traffic initiator, therefore it does not
+provide filter to select the traffic source to monitor. It reports combined
+traffic from remote GPU and PCIE devices.
+
+Example usage:
+
+* Count event id 0x0 for the traffic from remote socket 1, 2, and 3 to socket 0::
+
+ perf stat -a -e nvidia_cnvlink_pmu_0/event=0x0,rem_socket=0xE/
+
+* Count event id 0x0 for the traffic from remote socket 0, 2, and 3 to socket 1::
+
+ perf stat -a -e nvidia_cnvlink_pmu_1/event=0x0,rem_socket=0xD/
+
+* Count event id 0x0 for the traffic from remote socket 0, 1, and 3 to socket 2::
+
+ perf stat -a -e nvidia_cnvlink_pmu_2/event=0x0,rem_socket=0xB/
+
+* Count event id 0x0 for the traffic from remote socket 0, 1, and 2 to socket 3::
+
+ perf stat -a -e nvidia_cnvlink_pmu_3/event=0x0,rem_socket=0x7/
+
+
+PCIE PMU
+------------
+
+The PCIE PMU monitors all read/write traffic from PCIE root ports to
+local/remote memory. Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section`
+for more info about the PMU traffic coverage.
+
+The events and configuration options of this PMU device are described in sysfs,
+see /sys/bus/event_sources/devices/nvidia_pcie_pmu_<socket-id>.
+
+Each SoC socket can support multiple root ports. The user can use
+"root_port" bitmap parameter to select the port(s) to monitor, i.e.
+"root_port=0xF" corresponds to root port 0 to 3.
+/sys/bus/event_sources/devices/nvidia_pcie_pmu_<socket-id>/format/root_port
+shows the valid bits that can be set in the "root_port" parameter.
+
+Example usage:
+
+* Count event id 0x0 from root port 0 and 1 of socket 0::
+
+ perf stat -a -e nvidia_pcie_pmu_0/event=0x0,root_port=0x3/
+
+* Count event id 0x0 from root port 0 and 1 of socket 1::
+
+ perf stat -a -e nvidia_pcie_pmu_1/event=0x0,root_port=0x3/
+
+.. _NVIDIA_Uncore_PMU_Traffic_Coverage_Section:
+
+Traffic Coverage
+----------------
+
+The PMU traffic coverage may vary dependent on the chip configuration:
+
+* **NVIDIA Grace Hopper Superchip**: Hopper GPU is connected with Grace SoC.
+
+ Example configuration with two Grace SoCs::
+
+ ********************************* *********************************
+ * SOCKET-A * * SOCKET-B *
+ * * * *
+ * :::::::: * * :::::::: *
+ * : PCIE : * * : PCIE : *
+ * :::::::: * * :::::::: *
+ * | * * | *
+ * | * * | *
+ * ::::::: ::::::::: * * ::::::::: ::::::: *
+ * : : : : * * : : : : *
+ * : GPU :<--NVLink-->: Grace :<---CNVLink--->: Grace :<--NVLink-->: GPU : *
+ * : : C2C : SoC : * * : SoC : C2C : : *
+ * ::::::: ::::::::: * * ::::::::: ::::::: *
+ * | | * * | | *
+ * | | * * | | *
+ * &&&&&&&& &&&&&&&& * * &&&&&&&& &&&&&&&& *
+ * & GMEM & & CMEM & * * & CMEM & & GMEM & *
+ * &&&&&&&& &&&&&&&& * * &&&&&&&& &&&&&&&& *
+ * * * *
+ ********************************* *********************************
+
+ GMEM = GPU Memory (e.g. HBM)
+ CMEM = CPU Memory (e.g. LPDDR5X)
+
+ |
+ | Following table contains traffic coverage of Grace SoC PMU in socket-A:
+
+ ::
+
+ +--------------+-------+-----------+-----------+-----+----------+----------+
+ | | Source |
+ + +-------+-----------+-----------+-----+----------+----------+
+ | Destination | |GPU ATS |GPU Not-ATS| | Socket-B | Socket-B |
+ | |PCI R/W|Translated,|Translated | CPU | CPU/PCIE1| GPU/PCIE2|
+ | | |EGM | | | | |
+ +==============+=======+===========+===========+=====+==========+==========+
+ | Local | PCIE |NVLink-C2C0|NVLink-C2C1| SCF | SCF PMU | CNVLink |
+ | SYSRAM/CMEM | PMU |PMU |PMU | PMU | | PMU |
+ +--------------+-------+-----------+-----------+-----+----------+----------+
+ | Local GMEM | PCIE | N/A |NVLink-C2C1| SCF | SCF PMU | CNVLink |
+ | | PMU | |PMU | PMU | | PMU |
+ +--------------+-------+-----------+-----------+-----+----------+----------+
+ | Remote | PCIE |NVLink-C2C0|NVLink-C2C1| SCF | | |
+ | SYSRAM/CMEM | PMU |PMU |PMU | PMU | N/A | N/A |
+ | over CNVLink | | | | | | |
+ +--------------+-------+-----------+-----------+-----+----------+----------+
+ | Remote GMEM | PCIE |NVLink-C2C0|NVLink-C2C1| SCF | | |
+ | over CNVLink | PMU |PMU |PMU | PMU | N/A | N/A |
+ +--------------+-------+-----------+-----------+-----+----------+----------+
+
+ PCIE1 traffic represents strongly ordered (SO) writes.
+ PCIE2 traffic represents reads and relaxed ordered (RO) writes.
+
+* **NVIDIA Grace CPU Superchip**: two Grace CPU SoCs are connected.
+
+ Example configuration with two Grace SoCs::
+
+ ******************* *******************
+ * SOCKET-A * * SOCKET-B *
+ * * * *
+ * :::::::: * * :::::::: *
+ * : PCIE : * * : PCIE : *
+ * :::::::: * * :::::::: *
+ * | * * | *
+ * | * * | *
+ * ::::::::: * * ::::::::: *
+ * : : * * : : *
+ * : Grace :<--------NVLink------->: Grace : *
+ * : SoC : * C2C * : SoC : *
+ * ::::::::: * * ::::::::: *
+ * | * * | *
+ * | * * | *
+ * &&&&&&&& * * &&&&&&&& *
+ * & CMEM & * * & CMEM & *
+ * &&&&&&&& * * &&&&&&&& *
+ * * * *
+ ******************* *******************
+
+ GMEM = GPU Memory (e.g. HBM)
+ CMEM = CPU Memory (e.g. LPDDR5X)
+
+ |
+ | Following table contains traffic coverage of Grace SoC PMU in socket-A:
+
+ ::
+
+ +-----------------+-----------+---------+----------+-------------+
+ | | Source |
+ + +-----------+---------+----------+-------------+
+ | Destination | | | Socket-B | Socket-B |
+ | | PCI R/W | CPU | CPU/PCIE1| PCIE2 |
+ | | | | | |
+ +=================+===========+=========+==========+=============+
+ | Local | PCIE PMU | SCF PMU | SCF PMU | NVLink-C2C0 |
+ | SYSRAM/CMEM | | | | PMU |
+ +-----------------+-----------+---------+----------+-------------+
+ | Remote | | | | |
+ | SYSRAM/CMEM | PCIE PMU | SCF PMU | N/A | N/A |
+ | over NVLink-C2C | | | | |
+ +-----------------+-----------+---------+----------+-------------+
+
+ PCIE1 traffic represents strongly ordered (SO) writes.
+ PCIE2 traffic represents reads and relaxed ordered (RO) writes.
diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst
new file mode 100644
index 000000000000..9eb26014d34b
--- /dev/null
+++ b/Documentation/admin-guide/pm/amd-pstate.rst
@@ -0,0 +1,720 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+===============================================
+``amd-pstate`` CPU Performance Scaling Driver
+===============================================
+
+:Copyright: |copy| 2021 Advanced Micro Devices, Inc.
+
+:Author: Huang Rui <ray.huang@amd.com>
+
+
+Introduction
+===================
+
+``amd-pstate`` is the AMD CPU performance scaling driver that introduces a
+new CPU frequency control mechanism on modern AMD APU and CPU series in
+Linux kernel. The new mechanism is based on Collaborative Processor
+Performance Control (CPPC) which provides finer grain frequency management
+than legacy ACPI hardware P-States. Current AMD CPU/APU platforms are using
+the ACPI P-states driver to manage CPU frequency and clocks with switching
+only in 3 P-states. CPPC replaces the ACPI P-states controls and allows a
+flexible, low-latency interface for the Linux kernel to directly
+communicate the performance hints to hardware.
+
+``amd-pstate`` leverages the Linux kernel governors such as ``schedutil``,
+``ondemand``, etc. to manage the performance hints which are provided by
+CPPC hardware functionality that internally follows the hardware
+specification (for details refer to AMD64 Architecture Programmer's Manual
+Volume 2: System Programming [1]_). Currently, ``amd-pstate`` supports basic
+frequency control function according to kernel governors on some of the
+Zen2 and Zen3 processors, and we will implement more AMD specific functions
+in future after we verify them on the hardware and SBIOS.
+
+
+AMD CPPC Overview
+=======================
+
+Collaborative Processor Performance Control (CPPC) interface enumerates a
+continuous, abstract, and unit-less performance value in a scale that is
+not tied to a specific performance state / frequency. This is an ACPI
+standard [2]_ which software can specify application performance goals and
+hints as a relative target to the infrastructure limits. AMD processors
+provide the low latency register model (MSR) instead of an AML code
+interpreter for performance adjustments. ``amd-pstate`` will initialize a
+``struct cpufreq_driver`` instance, ``amd_pstate_driver``, with the callbacks
+to manage each performance update behavior. ::
+
+ Highest Perf ------>+-----------------------+ +-----------------------+
+ | | | |
+ | | | |
+ | | Max Perf ---->| |
+ | | | |
+ | | | |
+ Nominal Perf ------>+-----------------------+ +-----------------------+
+ | | | |
+ | | | |
+ | | | |
+ | | | |
+ | | | |
+ | | | |
+ | | Desired Perf ---->| |
+ | | | |
+ | | | |
+ | | | |
+ | | | |
+ | | | |
+ | | | |
+ | | | |
+ | | | |
+ | | | |
+ Lowest non- | | | |
+ linear perf ------>+-----------------------+ +-----------------------+
+ | | | |
+ | | Lowest perf ---->| |
+ | | | |
+ Lowest perf ------>+-----------------------+ +-----------------------+
+ | | | |
+ | | | |
+ | | | |
+ 0 ------>+-----------------------+ +-----------------------+
+
+ AMD P-States Performance Scale
+
+
+.. _perf_cap:
+
+AMD CPPC Performance Capability
+--------------------------------
+
+Highest Performance (RO)
+.........................
+
+This is the absolute maximum performance an individual processor may reach,
+assuming ideal conditions. This performance level may not be sustainable
+for long durations and may only be achievable if other platform components
+are in a specific state; for example, it may require other processors to be in
+an idle state. This would be equivalent to the highest frequencies
+supported by the processor.
+
+Nominal (Guaranteed) Performance (RO)
+......................................
+
+This is the maximum sustained performance level of the processor, assuming
+ideal operating conditions. In the absence of an external constraint (power,
+thermal, etc.), this is the performance level the processor is expected to
+be able to maintain continuously. All cores/processors are expected to be
+able to sustain their nominal performance state simultaneously.
+
+Lowest non-linear Performance (RO)
+...................................
+
+This is the lowest performance level at which nonlinear power savings are
+achieved, for example, due to the combined effects of voltage and frequency
+scaling. Above this threshold, lower performance levels should be generally
+more energy efficient than higher performance levels. This register
+effectively conveys the most efficient performance level to ``amd-pstate``.
+
+Lowest Performance (RO)
+........................
+
+This is the absolute lowest performance level of the processor. Selecting a
+performance level lower than the lowest nonlinear performance level may
+cause an efficiency penalty but should reduce the instantaneous power
+consumption of the processor.
+
+AMD CPPC Performance Control
+------------------------------
+
+``amd-pstate`` passes performance goals through these registers. The
+register drives the behavior of the desired performance target.
+
+Minimum requested performance (RW)
+...................................
+
+``amd-pstate`` specifies the minimum allowed performance level.
+
+Maximum requested performance (RW)
+...................................
+
+``amd-pstate`` specifies a limit the maximum performance that is expected
+to be supplied by the hardware.
+
+Desired performance target (RW)
+...................................
+
+``amd-pstate`` specifies a desired target in the CPPC performance scale as
+a relative number. This can be expressed as percentage of nominal
+performance (infrastructure max). Below the nominal sustained performance
+level, desired performance expresses the average performance level of the
+processor subject to hardware. Above the nominal performance level,
+the processor must provide at least nominal performance requested and go higher
+if current operating conditions allow.
+
+Energy Performance Preference (EPP) (RW)
+.........................................
+
+This attribute provides a hint to the hardware if software wants to bias
+toward performance (0x0) or energy efficiency (0xff).
+
+
+Key Governors Support
+=======================
+
+``amd-pstate`` can be used with all the (generic) scaling governors listed
+by the ``scaling_available_governors`` policy attribute in ``sysfs``. Then,
+it is responsible for the configuration of policy objects corresponding to
+CPUs and provides the ``CPUFreq`` core (and the scaling governors attached
+to the policy objects) with accurate information on the maximum and minimum
+operating frequencies supported by the hardware. Users can check the
+``scaling_cur_freq`` information comes from the ``CPUFreq`` core.
+
+``amd-pstate`` mainly supports ``schedutil`` and ``ondemand`` for dynamic
+frequency control. It is to fine tune the processor configuration on
+``amd-pstate`` to the ``schedutil`` with CPU CFS scheduler. ``amd-pstate``
+registers the adjust_perf callback to implement performance update behavior
+similar to CPPC. It is initialized by ``sugov_start`` and then populates the
+CPU's update_util_data pointer to assign ``sugov_update_single_perf`` as the
+utilization update callback function in the CPU scheduler. The CPU scheduler
+will call ``cpufreq_update_util`` and assigns the target performance according
+to the ``struct sugov_cpu`` that the utilization update belongs to.
+Then, ``amd-pstate`` updates the desired performance according to the CPU
+scheduler assigned.
+
+.. _processor_support:
+
+Processor Support
+=======================
+
+The ``amd-pstate`` initialization will fail if the ``_CPC`` entry in the ACPI
+SBIOS does not exist in the detected processor. It uses ``acpi_cpc_valid``
+to check the existence of ``_CPC``. All Zen based processors support the legacy
+ACPI hardware P-States function, so when ``amd-pstate`` fails initialization,
+the kernel will fall back to initialize the ``acpi-cpufreq`` driver.
+
+There are two types of hardware implementations for ``amd-pstate``: one is
+`Full MSR Support <perf_cap_>`_ and another is `Shared Memory Support
+<perf_cap_>`_. It can use the :c:macro:`X86_FEATURE_CPPC` feature flag to
+indicate the different types. (For details, refer to the Processor Programming
+Reference (PPR) for AMD Family 19h Model 51h, Revision A1 Processors [3]_.)
+``amd-pstate`` is to register different ``static_call`` instances for different
+hardware implementations.
+
+Currently, some of the Zen2 and Zen3 processors support ``amd-pstate``. In the
+future, it will be supported on more and more AMD processors.
+
+Full MSR Support
+-----------------
+
+Some new Zen3 processors such as Cezanne provide the MSR registers directly
+while the :c:macro:`X86_FEATURE_CPPC` CPU feature flag is set.
+``amd-pstate`` can handle the MSR register to implement the fast switch
+function in ``CPUFreq`` that can reduce the latency of frequency control in
+interrupt context. The functions with a ``pstate_xxx`` prefix represent the
+operations on MSR registers.
+
+Shared Memory Support
+----------------------
+
+If the :c:macro:`X86_FEATURE_CPPC` CPU feature flag is not set, the
+processor supports the shared memory solution. In this case, ``amd-pstate``
+uses the ``cppc_acpi`` helper methods to implement the callback functions
+that are defined on ``static_call``. The functions with the ``cppc_xxx`` prefix
+represent the operations of ACPI CPPC helpers for the shared memory solution.
+
+
+AMD P-States and ACPI hardware P-States always can be supported in one
+processor. But AMD P-States has the higher priority and if it is enabled
+with :c:macro:`MSR_AMD_CPPC_ENABLE` or ``cppc_set_enable``, it will respond
+to the request from AMD P-States.
+
+
+User Space Interface in ``sysfs`` - Per-policy control
+======================================================
+
+``amd-pstate`` exposes several global attributes (files) in ``sysfs`` to
+control its functionality at the system level. They are located in the
+``/sys/devices/system/cpu/cpufreq/policyX/`` directory and affect all CPUs. ::
+
+ root@hr-test1:/home/ray# ls /sys/devices/system/cpu/cpufreq/policy0/*amd*
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_highest_perf
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_lowest_nonlinear_freq
+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_max_freq
+
+
+``amd_pstate_highest_perf / amd_pstate_max_freq``
+
+Maximum CPPC performance and CPU frequency that the driver is allowed to
+set, in percent of the maximum supported CPPC performance level (the highest
+performance supported in `AMD CPPC Performance Capability <perf_cap_>`_).
+In some ASICs, the highest CPPC performance is not the one in the ``_CPC``
+table, so we need to expose it to sysfs. If boost is not active, but
+still supported, this maximum frequency will be larger than the one in
+``cpuinfo``.
+This attribute is read-only.
+
+``amd_pstate_lowest_nonlinear_freq``
+
+The lowest non-linear CPPC CPU frequency that the driver is allowed to set,
+in percent of the maximum supported CPPC performance level. (Please see the
+lowest non-linear performance in `AMD CPPC Performance Capability
+<perf_cap_>`_.)
+This attribute is read-only.
+
+``energy_performance_available_preferences``
+
+A list of all the supported EPP preferences that could be used for
+``energy_performance_preference`` on this system.
+These profiles represent different hints that are provided
+to the low-level firmware about the user's desired energy vs efficiency
+tradeoff. ``default`` represents the epp value is set by platform
+firmware. This attribute is read-only.
+
+``energy_performance_preference``
+
+The current energy performance preference can be read from this attribute.
+and user can change current preference according to energy or performance needs
+Please get all support profiles list from
+``energy_performance_available_preferences`` attribute, all the profiles are
+integer values defined between 0 to 255 when EPP feature is enabled by platform
+firmware, if EPP feature is disabled, driver will ignore the written value
+This attribute is read-write.
+
+Other performance and frequency values can be read back from
+``/sys/devices/system/cpu/cpuX/acpi_cppc/``, see :ref:`cppc_sysfs`.
+
+
+``amd-pstate`` vs ``acpi-cpufreq``
+======================================
+
+On the majority of AMD platforms supported by ``acpi-cpufreq``, the ACPI tables
+provided by the platform firmware are used for CPU performance scaling, but
+only provide 3 P-states on AMD processors.
+However, on modern AMD APU and CPU series, hardware provides the Collaborative
+Processor Performance Control according to the ACPI protocol and customizes this
+for AMD platforms. That is, fine-grained and continuous frequency ranges
+instead of the legacy hardware P-states. ``amd-pstate`` is the kernel
+module which supports the new AMD P-States mechanism on most of the future AMD
+platforms. The AMD P-States mechanism is the more performance and energy
+efficiency frequency management method on AMD processors.
+
+
+AMD Pstate Driver Operation Modes
+=================================
+
+``amd_pstate`` CPPC has 3 operation modes: autonomous (active) mode,
+non-autonomous (passive) mode and guided autonomous (guided) mode.
+Active/passive/guided mode can be chosen by different kernel parameters.
+
+- In autonomous mode, platform ignores the desired performance level request
+ and takes into account only the values set to the minimum, maximum and energy
+ performance preference registers.
+- In non-autonomous mode, platform gets desired performance level
+ from OS directly through Desired Performance Register.
+- In guided-autonomous mode, platform sets operating performance level
+ autonomously according to the current workload and within the limits set by
+ OS through min and max performance registers.
+
+Active Mode
+------------
+
+``amd_pstate=active``
+
+This is the low-level firmware control mode which is implemented by ``amd_pstate_epp``
+driver with ``amd_pstate=active`` passed to the kernel in the command line.
+In this mode, ``amd_pstate_epp`` driver provides a hint to the hardware if software
+wants to bias toward performance (0x0) or energy efficiency (0xff) to the CPPC firmware.
+then CPPC power algorithm will calculate the runtime workload and adjust the realtime
+cores frequency according to the power supply and thermal, core voltage and some other
+hardware conditions.
+
+Passive Mode
+------------
+
+``amd_pstate=passive``
+
+It will be enabled if the ``amd_pstate=passive`` is passed to the kernel in the command line.
+In this mode, ``amd_pstate`` driver software specifies a desired QoS target in the CPPC
+performance scale as a relative number. This can be expressed as percentage of nominal
+performance (infrastructure max). Below the nominal sustained performance level,
+desired performance expresses the average performance level of the processor subject
+to the Performance Reduction Tolerance register. Above the nominal performance level,
+processor must provide at least nominal performance requested and go higher if current
+operating conditions allow.
+
+Guided Mode
+-----------
+
+``amd_pstate=guided``
+
+If ``amd_pstate=guided`` is passed to kernel command line option then this mode
+is activated. In this mode, driver requests minimum and maximum performance
+level and the platform autonomously selects a performance level in this range
+and appropriate to the current workload.
+
+User Space Interface in ``sysfs`` - General
+===========================================
+
+Global Attributes
+-----------------
+
+``amd-pstate`` exposes several global attributes (files) in ``sysfs`` to
+control its functionality at the system level. They are located in the
+``/sys/devices/system/cpu/amd_pstate/`` directory and affect all CPUs.
+
+``status``
+ Operation mode of the driver: "active", "passive" or "disable".
+
+ "active"
+ The driver is functional and in the ``active mode``
+
+ "passive"
+ The driver is functional and in the ``passive mode``
+
+ "guided"
+ The driver is functional and in the ``guided mode``
+
+ "disable"
+ The driver is unregistered and not functional now.
+
+ This attribute can be written to in order to change the driver's
+ operation mode or to unregister it. The string written to it must be
+ one of the possible values of it and, if successful, writing one of
+ these values to the sysfs file will cause the driver to switch over
+ to the operation mode represented by that string - or to be
+ unregistered in the "disable" case.
+
+``cpupower`` tool support for ``amd-pstate``
+===============================================
+
+``amd-pstate`` is supported by the ``cpupower`` tool, which can be used to dump
+frequency information. Development is in progress to support more and more
+operations for the new ``amd-pstate`` module with this tool. ::
+
+ root@hr-test1:/home/ray# cpupower frequency-info
+ analyzing CPU 0:
+ driver: amd-pstate
+ CPUs which run at the same hardware frequency: 0
+ CPUs which need to have their frequency coordinated by software: 0
+ maximum transition latency: 131 us
+ hardware limits: 400 MHz - 4.68 GHz
+ available cpufreq governors: ondemand conservative powersave userspace performance schedutil
+ current policy: frequency should be within 400 MHz and 4.68 GHz.
+ The governor "schedutil" may decide which speed to use
+ within this range.
+ current CPU frequency: Unable to call hardware
+ current CPU frequency: 4.02 GHz (asserted by call to kernel)
+ boost state support:
+ Supported: yes
+ Active: yes
+ AMD PSTATE Highest Performance: 166. Maximum Frequency: 4.68 GHz.
+ AMD PSTATE Nominal Performance: 117. Nominal Frequency: 3.30 GHz.
+ AMD PSTATE Lowest Non-linear Performance: 39. Lowest Non-linear Frequency: 1.10 GHz.
+ AMD PSTATE Lowest Performance: 15. Lowest Frequency: 400 MHz.
+
+
+Diagnostics and Tuning
+=======================
+
+Trace Events
+--------------
+
+There are two static trace events that can be used for ``amd-pstate``
+diagnostics. One of them is the ``cpu_frequency`` trace event generally used
+by ``CPUFreq``, and the other one is the ``amd_pstate_perf`` trace event
+specific to ``amd-pstate``. The following sequence of shell commands can
+be used to enable them and see their output (if the kernel is
+configured to support event tracing). ::
+
+ root@hr-test1:/home/ray# cd /sys/kernel/tracing/
+ root@hr-test1:/sys/kernel/tracing# echo 1 > events/amd_cpu/enable
+ root@hr-test1:/sys/kernel/tracing# cat trace
+ # tracer: nop
+ #
+ # entries-in-buffer/entries-written: 47827/42233061 #P:2
+ #
+ # _-----=> irqs-off
+ # / _----=> need-resched
+ # | / _---=> hardirq/softirq
+ # || / _--=> preempt-depth
+ # ||| / delay
+ # TASK-PID CPU# |||| TIMESTAMP FUNCTION
+ # | | | |||| | |
+ <idle>-0 [015] dN... 4995.979886: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=15 changed=false fast_switch=true
+ <idle>-0 [007] d.h.. 4995.979893: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true
+ cat-2161 [000] d.... 4995.980841: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=0 changed=false fast_switch=true
+ sshd-2125 [004] d.s.. 4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=4 changed=false fast_switch=true
+ <idle>-0 [007] d.s.. 4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true
+ <idle>-0 [003] d.s.. 4995.980971: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=3 changed=false fast_switch=true
+ <idle>-0 [011] d.s.. 4995.980996: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=11 changed=false fast_switch=true
+
+The ``cpu_frequency`` trace event will be triggered either by the ``schedutil`` scaling
+governor (for the policies it is attached to), or by the ``CPUFreq`` core (for the
+policies with other scaling governors).
+
+
+Tracer Tool
+-------------
+
+``amd_pstate_tracer.py`` can record and parse ``amd-pstate`` trace log, then
+generate performance plots. This utility can be used to debug and tune the
+performance of ``amd-pstate`` driver. The tracer tool needs to import intel
+pstate tracer.
+
+Tracer tool located in ``linux/tools/power/x86/amd_pstate_tracer``. It can be
+used in two ways. If trace file is available, then directly parse the file
+with command ::
+
+ ./amd_pstate_trace.py [-c cpus] -t <trace_file> -n <test_name>
+
+Or generate trace file with root privilege, then parse and plot with command ::
+
+ sudo ./amd_pstate_trace.py [-c cpus] -n <test_name> -i <interval> [-m kbytes]
+
+The test result can be found in ``results/test_name``. Following is the example
+about part of the output. ::
+
+ common_cpu common_secs common_usecs min_perf des_perf max_perf freq mperf apef tsc load duration_ms sample_num elapsed_time common_comm
+ CPU_005 712 116384 39 49 166 0.7565 9645075 2214891 38431470 25.1 11.646 469 2.496 kworker/5:0-40
+ CPU_006 712 116408 39 49 166 0.6769 8950227 1839034 37192089 24.06 11.272 470 2.496 kworker/6:0-1264
+
+Unit Tests for amd-pstate
+-------------------------
+
+``amd-pstate-ut`` is a test module for testing the ``amd-pstate`` driver.
+
+ * It can help all users to verify their processor support (SBIOS/Firmware or Hardware).
+
+ * Kernel can have a basic function test to avoid the kernel regression during the update.
+
+ * We can introduce more functional or performance tests to align the result together, it will benefit power and performance scale optimization.
+
+1. Test case descriptions
+
+ 1). Basic tests
+
+ Test prerequisite and basic functions for the ``amd-pstate`` driver.
+
+ +---------+--------------------------------+------------------------------------------------------------------------------------+
+ | Index | Functions | Description |
+ +=========+================================+====================================================================================+
+ | 1 | amd_pstate_ut_acpi_cpc_valid || Check whether the _CPC object is present in SBIOS. |
+ | | || |
+ | | || The detail refer to `Processor Support <processor_support_>`_. |
+ +---------+--------------------------------+------------------------------------------------------------------------------------+
+ | 2 | amd_pstate_ut_check_enabled || Check whether AMD P-State is enabled. |
+ | | || |
+ | | || AMD P-States and ACPI hardware P-States always can be supported in one processor. |
+ | | | But AMD P-States has the higher priority and if it is enabled with |
+ | | | :c:macro:`MSR_AMD_CPPC_ENABLE` or ``cppc_set_enable``, it will respond to the |
+ | | | request from AMD P-States. |
+ +---------+--------------------------------+------------------------------------------------------------------------------------+
+ | 3 | amd_pstate_ut_check_perf || Check if the each performance values are reasonable. |
+ | | || highest_perf >= nominal_perf > lowest_nonlinear_perf > lowest_perf > 0. |
+ +---------+--------------------------------+------------------------------------------------------------------------------------+
+ | 4 | amd_pstate_ut_check_freq || Check if the each frequency values and max freq when set support boost mode |
+ | | | are reasonable. |
+ | | || max_freq >= nominal_freq > lowest_nonlinear_freq > min_freq > 0 |
+ | | || If boost is not active but supported, this maximum frequency will be larger than |
+ | | | the one in ``cpuinfo``. |
+ +---------+--------------------------------+------------------------------------------------------------------------------------+
+
+ 2). Tbench test
+
+ Test and monitor the cpu changes when running tbench benchmark under the specified governor.
+ These changes include desire performance, frequency, load, performance, energy etc.
+ The specified governor is ondemand or schedutil.
+ Tbench can also be tested on the ``acpi-cpufreq`` kernel driver for comparison.
+
+ 3). Gitsource test
+
+ Test and monitor the cpu changes when running gitsource benchmark under the specified governor.
+ These changes include desire performance, frequency, load, time, energy etc.
+ The specified governor is ondemand or schedutil.
+ Gitsource can also be tested on the ``acpi-cpufreq`` kernel driver for comparison.
+
+#. How to execute the tests
+
+ We use test module in the kselftest frameworks to implement it.
+ We create ``amd-pstate-ut`` module and tie it into kselftest.(for
+ details refer to Linux Kernel Selftests [4]_).
+
+ 1). Build
+
+ + open the :c:macro:`CONFIG_X86_AMD_PSTATE` configuration option.
+ + set the :c:macro:`CONFIG_X86_AMD_PSTATE_UT` configuration option to M.
+ + make project
+ + make selftest ::
+
+ $ cd linux
+ $ make -C tools/testing/selftests
+
+ + make perf ::
+
+ $ cd tools/perf/
+ $ make
+
+
+ 2). Installation & Steps ::
+
+ $ make -C tools/testing/selftests install INSTALL_PATH=~/kselftest
+ $ cp tools/perf/perf /usr/bin/perf
+ $ sudo ./kselftest/run_kselftest.sh -c amd-pstate
+
+ 3). Specified test case ::
+
+ $ cd ~/kselftest/amd-pstate
+ $ sudo ./run.sh -t basic
+ $ sudo ./run.sh -t tbench
+ $ sudo ./run.sh -t tbench -m acpi-cpufreq
+ $ sudo ./run.sh -t gitsource
+ $ sudo ./run.sh -t gitsource -m acpi-cpufreq
+ $ ./run.sh --help
+ ./run.sh: illegal option -- -
+ Usage: ./run.sh [OPTION...]
+ [-h <help>]
+ [-o <output-file-for-dump>]
+ [-c <all: All testing,
+ basic: Basic testing,
+ tbench: Tbench testing,
+ gitsource: Gitsource testing.>]
+ [-t <tbench time limit>]
+ [-p <tbench process number>]
+ [-l <loop times for tbench>]
+ [-i <amd tracer interval>]
+ [-m <comparative test: acpi-cpufreq>]
+
+
+ 4). Results
+
+ + basic
+
+ When you finish test, you will get the following log info ::
+
+ $ dmesg | grep "amd_pstate_ut" | tee log.txt
+ [12977.570663] amd_pstate_ut: 1 amd_pstate_ut_acpi_cpc_valid success!
+ [12977.570673] amd_pstate_ut: 2 amd_pstate_ut_check_enabled success!
+ [12977.571207] amd_pstate_ut: 3 amd_pstate_ut_check_perf success!
+ [12977.571212] amd_pstate_ut: 4 amd_pstate_ut_check_freq success!
+
+ + tbench
+
+ When you finish test, you will get selftest.tbench.csv and png images.
+ The selftest.tbench.csv file contains the raw data and the drop of the comparative test.
+ The png images shows the performance, energy and performan per watt of each test.
+ Open selftest.tbench.csv :
+
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+ + Governor | Round | Des-perf | Freq | Load | Performance | Energy | Performance Per Watt |
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+ + Unit | | | GHz | | MB/s | J | MB/J |
+ +=================================================+==============+==========+=========+==========+=============+=========+======================+
+ + amd-pstate-ondemand | 1 | | | | 2504.05 | 1563.67 | 158.5378 |
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+ + amd-pstate-ondemand | 2 | | | | 2243.64 | 1430.32 | 155.2941 |
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+ + amd-pstate-ondemand | 3 | | | | 2183.88 | 1401.32 | 154.2860 |
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+ + amd-pstate-ondemand | Average | | | | 2310.52 | 1465.1 | 156.1268 |
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+ + amd-pstate-schedutil | 1 | 165.329 | 1.62257 | 99.798 | 2136.54 | 1395.26 | 151.5971 |
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+ + amd-pstate-schedutil | 2 | 166 | 1.49761 | 99.9993 | 2100.56 | 1380.5 | 150.6377 |
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+ + amd-pstate-schedutil | 3 | 166 | 1.47806 | 99.9993 | 2084.12 | 1375.76 | 149.9737 |
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+ + amd-pstate-schedutil | Average | 165.776 | 1.53275 | 99.9322 | 2107.07 | 1383.84 | 150.7399 |
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-ondemand | 1 | | | | 2529.9 | 1564.4 | 160.0997 |
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-ondemand | 2 | | | | 2249.76 | 1432.97 | 155.4297 |
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-ondemand | 3 | | | | 2181.46 | 1406.88 | 153.5060 |
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-ondemand | Average | | | | 2320.37 | 1468.08 | 156.4741 |
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-schedutil | 1 | | | | 2137.64 | 1385.24 | 152.7723 |
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-schedutil | 2 | | | | 2107.05 | 1372.23 | 152.0138 |
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-schedutil | 3 | | | | 2085.86 | 1365.35 | 151.2433 |
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-schedutil | Average | | | | 2110.18 | 1374.27 | 152.0136 |
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-ondemand VS acpi-cpufreq-schedutil | Comprison(%) | | | | -9.0584 | -6.3899 | -2.8506 |
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+ + amd-pstate-ondemand VS amd-pstate-schedutil | Comprison(%) | | | | 8.8053 | -5.5463 | -3.4503 |
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-ondemand VS amd-pstate-ondemand | Comprison(%) | | | | -0.4245 | -0.2029 | -0.2219 |
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-schedutil VS amd-pstate-schedutil | Comprison(%) | | | | -0.1473 | 0.6963 | -0.8378 |
+ +-------------------------------------------------+--------------+----------+---------+----------+-------------+---------+----------------------+
+
+ + gitsource
+
+ When you finish test, you will get selftest.gitsource.csv and png images.
+ The selftest.gitsource.csv file contains the raw data and the drop of the comparative test.
+ The png images shows the performance, energy and performan per watt of each test.
+ Open selftest.gitsource.csv :
+
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+ + Governor | Round | Des-perf | Freq | Load | Time | Energy | Performance Per Watt |
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+ + Unit | | | GHz | | s | J | 1/J |
+ +=================================================+==============+==========+==========+==========+=============+=========+======================+
+ + amd-pstate-ondemand | 1 | 50.119 | 2.10509 | 23.3076 | 475.69 | 865.78 | 0.001155027 |
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+ + amd-pstate-ondemand | 2 | 94.8006 | 1.98771 | 56.6533 | 467.1 | 839.67 | 0.001190944 |
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+ + amd-pstate-ondemand | 3 | 76.6091 | 2.53251 | 43.7791 | 467.69 | 855.85 | 0.001168429 |
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+ + amd-pstate-ondemand | Average | 73.8429 | 2.20844 | 41.2467 | 470.16 | 853.767 | 0.001171279 |
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+ + amd-pstate-schedutil | 1 | 165.919 | 1.62319 | 98.3868 | 464.17 | 866.8 | 0.001153668 |
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+ + amd-pstate-schedutil | 2 | 165.97 | 1.31309 | 99.5712 | 480.15 | 880.4 | 0.001135847 |
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+ + amd-pstate-schedutil | 3 | 165.973 | 1.28448 | 99.9252 | 481.79 | 867.02 | 0.001153375 |
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+ + amd-pstate-schedutil | Average | 165.954 | 1.40692 | 99.2944 | 475.37 | 871.407 | 0.001147569 |
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-ondemand | 1 | | | | 2379.62 | 742.96 | 0.001345967 |
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-ondemand | 2 | | | | 441.74 | 817.49 | 0.001223256 |
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-ondemand | 3 | | | | 455.48 | 820.01 | 0.001219497 |
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-ondemand | Average | | | | 425.613 | 793.487 | 0.001260260 |
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-schedutil | 1 | | | | 459.69 | 838.54 | 0.001192548 |
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-schedutil | 2 | | | | 466.55 | 830.89 | 0.001203528 |
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-schedutil | 3 | | | | 470.38 | 837.32 | 0.001194286 |
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-schedutil | Average | | | | 465.54 | 835.583 | 0.001196769 |
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-ondemand VS acpi-cpufreq-schedutil | Comprison(%) | | | | 9.3810 | 5.3051 | -5.0379 |
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+ + amd-pstate-ondemand VS amd-pstate-schedutil | Comprison(%) | 124.7392 | -36.2934 | 140.7329 | 1.1081 | 2.0661 | -2.0242 |
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-ondemand VS amd-pstate-ondemand | Comprison(%) | | | | 10.4665 | 7.5968 | -7.0605 |
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+ + acpi-cpufreq-schedutil VS amd-pstate-schedutil | Comprison(%) | | | | 2.1115 | 4.2873 | -4.1110 |
+ +-------------------------------------------------+--------------+----------+----------+----------+-------------+---------+----------------------+
+
+Reference
+===========
+
+.. [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming,
+ https://www.amd.com/system/files/TechDocs/24593.pdf
+
+.. [2] Advanced Configuration and Power Interface Specification,
+ https://uefi.org/sites/default/files/resources/ACPI_Spec_6_4_Jan22.pdf
+
+.. [3] Processor Programming Reference (PPR) for AMD Family 19h Model 51h, Revision A1 Processors
+ https://www.amd.com/system/files/TechDocs/56569-A1-PUB.zip
+
+.. [4] Linux Kernel Selftests,
+ https://www.kernel.org/doc/html/latest/dev-tools/kselftest.html
diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
index 0c74a7784964..6adb7988e0eb 100644
--- a/Documentation/admin-guide/pm/cpufreq.rst
+++ b/Documentation/admin-guide/pm/cpufreq.rst
@@ -1,7 +1,6 @@
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
-.. |struct cpufreq_policy| replace:: :c:type:`struct cpufreq_policy <cpufreq_policy>`
.. |intel_pstate| replace:: :doc:`intel_pstate <intel_pstate>`
=======================
@@ -92,16 +91,16 @@ control the P-state of multiple CPUs at the same time and writing to it affects
all of those CPUs simultaneously.
Sets of CPUs sharing hardware P-state control interfaces are represented by
-``CPUFreq`` as |struct cpufreq_policy| objects. For consistency,
-|struct cpufreq_policy| is also used when there is only one CPU in the given
+``CPUFreq`` as struct cpufreq_policy objects. For consistency,
+struct cpufreq_policy is also used when there is only one CPU in the given
set.
-The ``CPUFreq`` core maintains a pointer to a |struct cpufreq_policy| object for
+The ``CPUFreq`` core maintains a pointer to a struct cpufreq_policy object for
every CPU in the system, including CPUs that are currently offline. If multiple
CPUs share the same hardware P-state control interface, all of the pointers
-corresponding to them point to the same |struct cpufreq_policy| object.
+corresponding to them point to the same struct cpufreq_policy object.
-``CPUFreq`` uses |struct cpufreq_policy| as its basic data type and the design
+``CPUFreq`` uses struct cpufreq_policy as its basic data type and the design
of its user space interface is based on the policy concept.
@@ -147,9 +146,9 @@ CPUs in it.
The next major initialization step for a new policy object is to attach a
scaling governor to it (to begin with, that is the default scaling governor
-determined by the kernel configuration, but it may be changed later
-via ``sysfs``). First, a pointer to the new policy object is passed to the
-governor's ``->init()`` callback which is expected to initialize all of the
+determined by the kernel command line or configuration, but it may be changed
+later via ``sysfs``). First, a pointer to the new policy object is passed to
+the governor's ``->init()`` callback which is expected to initialize all of the
data structures necessary to handle the given policy and, possibly, to add
a governor ``sysfs`` interface to it. Next, the governor is started by
invoking its ``->start()`` callback.
diff --git a/Documentation/admin-guide/pm/cpuidle.rst b/Documentation/admin-guide/pm/cpuidle.rst
index 5605cc6f9560..19754beb5a4e 100644
--- a/Documentation/admin-guide/pm/cpuidle.rst
+++ b/Documentation/admin-guide/pm/cpuidle.rst
@@ -159,17 +159,15 @@ governor uses that information depends on what algorithm is implemented by it
and that is the primary reason for having more than one governor in the
``CPUIdle`` subsystem.
-There are three ``CPUIdle`` governors available, ``menu``, `TEO <teo-gov_>`_
-and ``ladder``. Which of them is used by default depends on the configuration
-of the kernel and in particular on whether or not the scheduler tick can be
-`stopped by the idle loop <idle-cpus-and-tick_>`_. It is possible to change the
-governor at run time if the ``cpuidle_sysfs_switch`` command line parameter has
-been passed to the kernel, but that is not safe in general, so it should not be
-done on production systems (that may change in the future, though). The name of
-the ``CPUIdle`` governor currently used by the kernel can be read from the
-:file:`current_governor_ro` (or :file:`current_governor` if
-``cpuidle_sysfs_switch`` is present in the kernel command line) file under
-:file:`/sys/devices/system/cpu/cpuidle/` in ``sysfs``.
+There are four ``CPUIdle`` governors available, ``menu``, `TEO <teo-gov_>`_,
+``ladder`` and ``haltpoll``. Which of them is used by default depends on the
+configuration of the kernel and in particular on whether or not the scheduler
+tick can be `stopped by the idle loop <idle-cpus-and-tick_>`_. Available
+governors can be read from the :file:`available_governors`, and the governor
+can be changed at runtime. The name of the ``CPUIdle`` governor currently
+used by the kernel can be read from the :file:`current_governor_ro` or
+:file:`current_governor` file under :file:`/sys/devices/system/cpu/cpuidle/`
+in ``sysfs``.
Which ``CPUIdle`` driver is used, on the other hand, usually depends on the
platform the kernel is running on, but there are platforms with more than one
@@ -349,81 +347,8 @@ for tickless systems. It follows the same basic strategy as the ``menu`` `one
<menu-gov_>`_: it always tries to find the deepest idle state suitable for the
given conditions. However, it applies a different approach to that problem.
-First, it does not use sleep length correction factors, but instead it attempts
-to correlate the observed idle duration values with the available idle states
-and use that information to pick up the idle state that is most likely to
-"match" the upcoming CPU idle interval. Second, it does not take the tasks
-that were running on the given CPU in the past and are waiting on some I/O
-operations to complete now at all (there is no guarantee that they will run on
-the same CPU when they become runnable again) and the pattern detection code in
-it avoids taking timer wakeups into account. It also only uses idle duration
-values less than the current time till the closest timer (with the scheduler
-tick excluded) for that purpose.
-
-Like in the ``menu`` governor `case <menu-gov_>`_, the first step is to obtain
-the *sleep length*, which is the time until the closest timer event with the
-assumption that the scheduler tick will be stopped (that also is the upper bound
-on the time until the next CPU wakeup). That value is then used to preselect an
-idle state on the basis of three metrics maintained for each idle state provided
-by the ``CPUIdle`` driver: ``hits``, ``misses`` and ``early_hits``.
-
-The ``hits`` and ``misses`` metrics measure the likelihood that a given idle
-state will "match" the observed (post-wakeup) idle duration if it "matches" the
-sleep length. They both are subject to decay (after a CPU wakeup) every time
-the target residency of the idle state corresponding to them is less than or
-equal to the sleep length and the target residency of the next idle state is
-greater than the sleep length (that is, when the idle state corresponding to
-them "matches" the sleep length). The ``hits`` metric is increased if the
-former condition is satisfied and the target residency of the given idle state
-is less than or equal to the observed idle duration and the target residency of
-the next idle state is greater than the observed idle duration at the same time
-(that is, it is increased when the given idle state "matches" both the sleep
-length and the observed idle duration). In turn, the ``misses`` metric is
-increased when the given idle state "matches" the sleep length only and the
-observed idle duration is too short for its target residency.
-
-The ``early_hits`` metric measures the likelihood that a given idle state will
-"match" the observed (post-wakeup) idle duration if it does not "match" the
-sleep length. It is subject to decay on every CPU wakeup and it is increased
-when the idle state corresponding to it "matches" the observed (post-wakeup)
-idle duration and the target residency of the next idle state is less than or
-equal to the sleep length (i.e. the idle state "matching" the sleep length is
-deeper than the given one).
-
-The governor walks the list of idle states provided by the ``CPUIdle`` driver
-and finds the last (deepest) one with the target residency less than or equal
-to the sleep length. Then, the ``hits`` and ``misses`` metrics of that idle
-state are compared with each other and it is preselected if the ``hits`` one is
-greater (which means that that idle state is likely to "match" the observed idle
-duration after CPU wakeup). If the ``misses`` one is greater, the governor
-preselects the shallower idle state with the maximum ``early_hits`` metric
-(or if there are multiple shallower idle states with equal ``early_hits``
-metric which also is the maximum, the shallowest of them will be preselected).
-[If there is a wakeup latency constraint coming from the `PM QoS framework
-<cpu-pm-qos_>`_ which is hit before reaching the deepest idle state with the
-target residency within the sleep length, the deepest idle state with the exit
-latency within the constraint is preselected without consulting the ``hits``,
-``misses`` and ``early_hits`` metrics.]
-
-Next, the governor takes several idle duration values observed most recently
-into consideration and if at least a half of them are greater than or equal to
-the target residency of the preselected idle state, that idle state becomes the
-final candidate to ask for. Otherwise, the average of the most recent idle
-duration values below the target residency of the preselected idle state is
-computed and the governor walks the idle states shallower than the preselected
-one and finds the deepest of them with the target residency within that average.
-That idle state is then taken as the final candidate to ask for.
-
-Still, at this point the governor may need to refine the idle state selection if
-it has not decided to `stop the scheduler tick <idle-cpus-and-tick_>`_. That
-generally happens if the target residency of the idle state selected so far is
-less than the tick period and the tick has not been stopped already (in a
-previous iteration of the idle loop). Then, like in the ``menu`` governor
-`case <menu-gov_>`_, the sleep length used in the previous computations may not
-reflect the real time until the closest timer event and if it really is greater
-than that time, a shallower state with a suitable target residency may need to
-be selected.
-
+.. kernel-doc:: drivers/cpuidle/governors/teo.c
+ :doc: teo-description
.. _idle-states-representation:
@@ -480,7 +405,7 @@ order to ask the hardware to enter that state. Also, for each
statistics of the given idle state. That information is exposed by the kernel
via ``sysfs``.
-For each CPU in the system, there is a :file:`/sys/devices/system/cpu<N>/cpuidle/`
+For each CPU in the system, there is a :file:`/sys/devices/system/cpu/cpu<N>/cpuidle/`
directory in ``sysfs``, where the number ``<N>`` is assigned to the given
CPU at the initialization time. That directory contains a set of subdirectories
called :file:`state0`, :file:`state1` and so on, up to the number of idle state
@@ -496,7 +421,7 @@ object corresponding to it, as follows:
residency.
``below``
- Total number of times this idle state had been asked for, but cerainly
+ Total number of times this idle state had been asked for, but certainly
a deeper idle state would have been a better match for the observed idle
duration.
@@ -530,6 +455,10 @@ object corresponding to it, as follows:
Total number of times the hardware has been asked by the given CPU to
enter this idle state.
+``rejected``
+ Total number of times a request to enter this idle state on the given
+ CPU was rejected.
+
The :file:`desc` and :file:`name` files both contain strings. The difference
between them is that the name is expected to be more concise, while the
description may be longer and it may contain white space or special characters.
@@ -574,6 +503,11 @@ particular case. For these reasons, the only reliable way to find out how
much time has been spent by the hardware in different idle states supported by
it is to use idle state residency counters in the hardware, if available.
+Generally, an interrupt received when trying to enter an idle state causes the
+idle state entry request to be rejected, in which case the ``CPUIdle`` driver
+may return an error code to indicate that this was the case. The :file:`usage`
+and :file:`rejected` files report the number of times the given idle state
+was entered successfully or rejected, respectively.
.. _cpu-pm-qos:
@@ -678,8 +612,8 @@ the ``menu`` governor to be used on the systems that use the ``ladder`` governor
by default this way, for example.
The other kernel command line parameters controlling CPU idle time management
-described below are only relevant for the *x86* architecture and some of
-them affect Intel processors only.
+described below are only relevant for the *x86* architecture and references
+to ``intel_idle`` affect Intel processors only.
The *x86* architecture support code recognizes three kernel command line
options related to CPU idle time management: ``idle=poll``, ``idle=halt``,
@@ -692,7 +626,7 @@ which of the two parameters is added to the kernel command line. In the
instruction of the CPUs (which, as a rule, suspends the execution of the program
and causes the hardware to attempt to enter the shallowest available idle state)
for this purpose, and if ``idle=poll`` is used, idle CPUs will execute a
-more or less ``lightweight'' sequence of instructions in a tight loop. [Note
+more or less "lightweight" sequence of instructions in a tight loop. [Note
that using ``idle=poll`` is somewhat drastic in many cases, as preventing idle
CPUs from saving almost any energy at all may not be the only effect of it.
For example, on Intel hardware it effectively prevents CPUs from using
@@ -701,10 +635,13 @@ idle, so it very well may hurt single-thread computations performance as well as
energy-efficiency. Thus using it for performance reasons may not be a good idea
at all.]
-The ``idle=nomwait`` option disables the ``intel_idle`` driver and causes
-``acpi_idle`` to be used (as long as all of the information needed by it is
-there in the system's ACPI tables), but it is not allowed to use the
-``MWAIT`` instruction of the CPUs to ask the hardware to enter idle states.
+The ``idle=nomwait`` option prevents the use of ``MWAIT`` instruction of
+the CPU to enter idle states. When this option is used, the ``acpi_idle``
+driver will use the ``HLT`` instruction instead of ``MWAIT``. On systems
+running Intel processors, this option disables the ``intel_idle`` driver
+and forces the use of the ``acpi_idle`` driver instead. Note that in either
+case, ``acpi_idle`` driver will function only if all the information needed
+by it is in the system's ACPI tables.
In addition to the architecture-level kernel command line options affecting CPU
idle time management, there are parameters affecting individual ``CPUIdle``
diff --git a/Documentation/admin-guide/pm/intel-speed-select.rst b/Documentation/admin-guide/pm/intel-speed-select.rst
new file mode 100644
index 000000000000..a2bfb971654f
--- /dev/null
+++ b/Documentation/admin-guide/pm/intel-speed-select.rst
@@ -0,0 +1,939 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================================================
+Intel(R) Speed Select Technology User Guide
+============================================================
+
+The Intel(R) Speed Select Technology (Intel(R) SST) provides a powerful new
+collection of features that give more granular control over CPU performance.
+With Intel(R) SST, one server can be configured for power and performance for a
+variety of diverse workload requirements.
+
+Refer to the links below for an overview of the technology:
+
+- https://www.intel.com/content/www/us/en/architecture-and-technology/speed-select-technology-article.html
+- https://builders.intel.com/docs/networkbuilders/intel-speed-select-technology-base-frequency-enhancing-performance.pdf
+
+These capabilities are further enhanced in some of the newer generations of
+server platforms where these features can be enumerated and controlled
+dynamically without pre-configuring via BIOS setup options. This dynamic
+configuration is done via mailbox commands to the hardware. One way to enumerate
+and configure these features is by using the Intel Speed Select utility.
+
+This document explains how to use the Intel Speed Select tool to enumerate and
+control Intel(R) SST features. This document gives example commands and explains
+how these commands change the power and performance profile of the system under
+test. Using this tool as an example, customers can replicate the messaging
+implemented in the tool in their production software.
+
+intel-speed-select configuration tool
+======================================
+
+Most Linux distribution packages may include the "intel-speed-select" tool. If not,
+it can be built by downloading the Linux kernel tree from kernel.org. Once
+downloaded, the tool can be built without building the full kernel.
+
+From the kernel tree, run the following commands::
+
+# cd tools/power/x86/intel-speed-select/
+# make
+# make install
+
+Getting Help
+------------
+
+To get help with the tool, execute the command below::
+
+# intel-speed-select --help
+
+The top-level help describes arguments and features. Notice that there is a
+multi-level help structure in the tool. For example, to get help for the feature "perf-profile"::
+
+# intel-speed-select perf-profile --help
+
+To get help on a command, another level of help is provided. For example for the command info "info"::
+
+# intel-speed-select perf-profile info --help
+
+Summary of platform capability
+------------------------------
+To check the current platform and driver capabilities, execute::
+
+#intel-speed-select --info
+
+For example on a test system::
+
+ # intel-speed-select --info
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ Platform: API version : 1
+ Platform: Driver version : 1
+ Platform: mbox supported : 1
+ Platform: mmio supported : 1
+ Intel(R) SST-PP (feature perf-profile) is supported
+ TDP level change control is unlocked, max level: 4
+ Intel(R) SST-TF (feature turbo-freq) is supported
+ Intel(R) SST-BF (feature base-freq) is not supported
+ Intel(R) SST-CP (feature core-power) is supported
+
+Intel(R) Speed Select Technology - Performance Profile (Intel(R) SST-PP)
+------------------------------------------------------------------------
+
+This feature allows configuration of a server dynamically based on workload
+performance requirements. This helps users during deployment as they do not have
+to choose a specific server configuration statically. This Intel(R) Speed Select
+Technology - Performance Profile (Intel(R) SST-PP) feature introduces a mechanism
+that allows multiple optimized performance profiles per system. Each profile
+defines a set of CPUs that need to be online and rest offline to sustain a
+guaranteed base frequency. Once the user issues a command to use a specific
+performance profile and meet CPU online/offline requirement, the user can expect
+a change in the base frequency dynamically. This feature is called
+"perf-profile" when using the Intel Speed Select tool.
+
+Number or performance levels
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There can be multiple performance profiles on a system. To get the number of
+profiles, execute the command below::
+
+ # intel-speed-select perf-profile get-config-levels
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ get-config-levels:4
+ package-1
+ die-0
+ cpu-14
+ get-config-levels:4
+
+On this system under test, there are 4 performance profiles in addition to the
+base performance profile (which is performance level 0).
+
+Lock/Unlock status
+~~~~~~~~~~~~~~~~~~
+
+Even if there are multiple performance profiles, it is possible that they
+are locked. If they are locked, users cannot issue a command to change the
+performance state. It is possible that there is a BIOS setup to unlock or check
+with your system vendor.
+
+To check if the system is locked, execute the following command::
+
+ # intel-speed-select perf-profile get-lock-status
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ get-lock-status:0
+ package-1
+ die-0
+ cpu-14
+ get-lock-status:0
+
+In this case, lock status is 0, which means that the system is unlocked.
+
+Properties of a performance level
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To get properties of a specific performance level (For example for the level 0, below), execute the command below::
+
+ # intel-speed-select perf-profile info -l 0
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ perf-profile-level-0
+ cpu-count:28
+ enable-cpu-mask:000003ff,f0003fff
+ enable-cpu-list:0,1,2,3,4,5,6,7,8,9,10,11,12,13,28,29,30,31,32,33,34,35,36,37,38,39,40,41
+ thermal-design-power-ratio:26
+ base-frequency(MHz):2600
+ speed-select-turbo-freq:disabled
+ speed-select-base-freq:disabled
+ ...
+ ...
+
+Here -l option is used to specify a performance level.
+
+If the option -l is omitted, then this command will print information about all
+the performance levels. The above command is printing properties of the
+performance level 0.
+
+For this performance profile, the list of CPUs displayed by the
+"enable-cpu-mask/enable-cpu-list" at the max can be "online." When that
+condition is met, then base frequency of 2600 MHz can be maintained. To
+understand more, execute "intel-speed-select perf-profile info" for performance
+level 4::
+
+ # intel-speed-select perf-profile info -l 4
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ perf-profile-level-4
+ cpu-count:28
+ enable-cpu-mask:000000fa,f0000faf
+ enable-cpu-list:0,1,2,3,5,7,8,9,10,11,28,29,30,31,33,35,36,37,38,39
+ thermal-design-power-ratio:28
+ base-frequency(MHz):2800
+ speed-select-turbo-freq:disabled
+ speed-select-base-freq:unsupported
+ ...
+ ...
+
+There are fewer CPUs in the "enable-cpu-mask/enable-cpu-list". Consequently, if
+the user only keeps these CPUs online and the rest "offline," then the base
+frequency is increased to 2.8 GHz compared to 2.6 GHz at performance level 0.
+
+Get current performance level
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To get the current performance level, execute::
+
+ # intel-speed-select perf-profile get-config-current-level
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ get-config-current_level:0
+
+First verify that the base_frequency displayed by the cpufreq sysfs is correct::
+
+ # cat /sys/devices/system/cpu/cpu0/cpufreq/base_frequency
+ 2600000
+
+This matches the base-frequency (MHz) field value displayed from the
+"perf-profile info" command for performance level 0(cpufreq frequency is in
+KHz).
+
+To check if the average frequency is equal to the base frequency for a 100% busy
+workload, disable turbo::
+
+# echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
+
+Then runs a busy workload on all CPUs, for example::
+
+#stress -c 64
+
+To verify the base frequency, run turbostat::
+
+ #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
+
+ Package Core CPU Bzy_MHz
+ - - 2600
+ 0 0 0 2600
+ 0 1 1 2600
+ 0 2 2 2600
+ 0 3 3 2600
+ 0 4 4 2600
+ . . . .
+
+
+Changing performance level
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To the change the performance level to 4, execute::
+
+ # intel-speed-select -d perf-profile set-config-level -l 4 -o
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ perf-profile
+ set_tdp_level:success
+
+In the command above, "-o" is optional. If it is specified, then it will also
+offline CPUs which are not present in the enable_cpu_mask for this performance
+level.
+
+Now if the base_frequency is checked::
+
+ #cat /sys/devices/system/cpu/cpu0/cpufreq/base_frequency
+ 2800000
+
+Which shows that the base frequency now increased from 2600 MHz at performance
+level 0 to 2800 MHz at performance level 4. As a result, any workload, which can
+use fewer CPUs, can see a boost of 200 MHz compared to performance level 0.
+
+Changing performance level via BMC Interface
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It is possible to change SST-PP level using out of band (OOB) agent (Via some
+remote management console, through BMC "Baseboard Management Controller"
+interface). This mode is supported from the Sapphire Rapids processor
+generation. The kernel and tool change to support this mode is added to Linux
+kernel version 5.18. To enable this feature, kernel config
+"CONFIG_INTEL_HFI_THERMAL" is required. The minimum version of the tool
+is "v1.12" to support this feature, which is part of Linux kernel version 5.18.
+
+To support such configuration, this tool can be used as a daemon. Add
+a command line option --oob::
+
+ # intel-speed-select --oob
+ Intel(R) Speed Select Technology
+ Executing on CPU model:143[0x8f]
+ OOB mode is enabled and will run as daemon
+
+In this mode the tool will online/offline CPUs based on the new performance
+level.
+
+Check presence of other Intel(R) SST features
+---------------------------------------------
+
+Each of the performance profiles also specifies weather there is support of
+other two Intel(R) SST features (Intel(R) Speed Select Technology - Base Frequency
+(Intel(R) SST-BF) and Intel(R) Speed Select Technology - Turbo Frequency (Intel
+SST-TF)).
+
+For example, from the output of "perf-profile info" above, for level 0 and level
+4:
+
+For level 0::
+ speed-select-turbo-freq:disabled
+ speed-select-base-freq:disabled
+
+For level 4::
+ speed-select-turbo-freq:disabled
+ speed-select-base-freq:unsupported
+
+Given these results, the "speed-select-base-freq" (Intel(R) SST-BF) in level 4
+changed from "disabled" to "unsupported" compared to performance level 0.
+
+This means that at performance level 4, the "speed-select-base-freq" feature is
+not supported. However, at performance level 0, this feature is "supported", but
+currently "disabled", meaning the user has not activated this feature. Whereas
+"speed-select-turbo-freq" (Intel(R) SST-TF) is supported at both performance
+levels, but currently not activated by the user.
+
+The Intel(R) SST-BF and the Intel(R) SST-TF features are built on a foundation
+technology called Intel(R) Speed Select Technology - Core Power (Intel(R) SST-CP).
+The platform firmware enables this feature when Intel(R) SST-BF or Intel(R) SST-TF
+is supported on a platform.
+
+Intel(R) Speed Select Technology Core Power (Intel(R) SST-CP)
+---------------------------------------------------------------
+
+Intel(R) Speed Select Technology Core Power (Intel(R) SST-CP) is an interface that
+allows users to define per core priority. This defines a mechanism to distribute
+power among cores when there is a power constrained scenario. This defines a
+class of service (CLOS) configuration.
+
+The user can configure up to 4 class of service configurations. Each CLOS group
+configuration allows definitions of parameters, which affects how the frequency
+can be limited and power is distributed. Each CPU core can be tied to a class of
+service and hence an associated priority. The granularity is at core level not
+at per CPU level.
+
+Enable CLOS based prioritization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To use CLOS based prioritization feature, firmware must be informed to enable
+and use a priority type. There is a default per platform priority type, which
+can be changed with optional command line parameter.
+
+To enable and check the options, execute::
+
+ # intel-speed-select core-power enable --help
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ Enable core-power for a package/die
+ Clos Enable: Specify priority type with [--priority|-p]
+ 0: Proportional, 1: Ordered
+
+There are two types of priority types:
+
+- Ordered
+
+Priority for ordered throttling is defined based on the index of the assigned
+CLOS group. Where CLOS0 gets highest priority (throttled last).
+
+Priority order is:
+CLOS0 > CLOS1 > CLOS2 > CLOS3.
+
+- Proportional
+
+When proportional priority is used, there is an additional parameter called
+frequency_weight, which can be specified per CLOS group. The goal of
+proportional priority is to provide each core with the requested min., then
+distribute all remaining (excess/deficit) budgets in proportion to a defined
+weight. This proportional priority can be configured using "core-power config"
+command.
+
+To enable with the platform default priority type, execute::
+
+ # intel-speed-select core-power enable
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ core-power
+ enable:success
+ package-1
+ die-0
+ cpu-6
+ core-power
+ enable:success
+
+The scope of this enable is per package or die scoped when a package contains
+multiple dies. To check if CLOS is enabled and get priority type, "core-power
+info" command can be used. For example to check the status of core-power feature
+on CPU 0, execute::
+
+ # intel-speed-select -c 0 core-power info
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ core-power
+ support-status:supported
+ enable-status:enabled
+ clos-enable-status:enabled
+ priority-type:proportional
+ package-1
+ die-0
+ cpu-24
+ core-power
+ support-status:supported
+ enable-status:enabled
+ clos-enable-status:enabled
+ priority-type:proportional
+
+Configuring CLOS groups
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Each CLOS group has its own attributes including min, max, freq_weight and
+desired. These parameters can be configured with "core-power config" command.
+Defaults will be used if user skips setting a parameter except clos id, which is
+mandatory. To check core-power config options, execute::
+
+ # intel-speed-select core-power config --help
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ Set core-power configuration for one of the four clos ids
+ Specify targeted clos id with [--clos|-c]
+ Specify clos Proportional Priority [--weight|-w]
+ Specify clos min in MHz with [--min|-n]
+ Specify clos max in MHz with [--max|-m]
+
+For example::
+
+ # intel-speed-select core-power config -c 0
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ clos epp is not specified, default: 0
+ clos frequency weight is not specified, default: 0
+ clos min is not specified, default: 0 MHz
+ clos max is not specified, default: 25500 MHz
+ clos desired is not specified, default: 0
+ package-0
+ die-0
+ cpu-0
+ core-power
+ config:success
+ package-1
+ die-0
+ cpu-6
+ core-power
+ config:success
+
+The user has the option to change defaults. For example, the user can change the
+"min" and set the base frequency to always get guaranteed base frequency.
+
+Get the current CLOS configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To check the current configuration, "core-power get-config" can be used. For
+example, to get the configuration of CLOS 0::
+
+ # intel-speed-select core-power get-config -c 0
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ core-power
+ clos:0
+ epp:0
+ clos-proportional-priority:0
+ clos-min:0 MHz
+ clos-max:Max Turbo frequency
+ clos-desired:0 MHz
+ package-1
+ die-0
+ cpu-24
+ core-power
+ clos:0
+ epp:0
+ clos-proportional-priority:0
+ clos-min:0 MHz
+ clos-max:Max Turbo frequency
+ clos-desired:0 MHz
+
+Associating a CPU with a CLOS group
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To associate a CPU to a CLOS group "core-power assoc" command can be used::
+
+ # intel-speed-select core-power assoc --help
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ Associate a clos id to a CPU
+ Specify targeted clos id with [--clos|-c]
+
+
+For example to associate CPU 10 to CLOS group 3, execute::
+
+ # intel-speed-select -c 10 core-power assoc -c 3
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-10
+ core-power
+ assoc:success
+
+Once a CPU is associated, its sibling CPUs are also associated to a CLOS group.
+Once associated, avoid changing Linux "cpufreq" subsystem scaling frequency
+limits.
+
+To check the existing association for a CPU, "core-power get-assoc" command can
+be used. For example, to get association of CPU 10, execute::
+
+ # intel-speed-select -c 10 core-power get-assoc
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-1
+ die-0
+ cpu-10
+ get-assoc
+ clos:3
+
+This shows that CPU 10 is part of a CLOS group 3.
+
+
+Disable CLOS based prioritization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To disable, execute::
+
+# intel-speed-select core-power disable
+
+Some features like Intel(R) SST-TF can only be enabled when CLOS based prioritization
+is enabled. For this reason, disabling while Intel(R) SST-TF is enabled can cause
+Intel(R) SST-TF to fail. This will cause the "disable" command to display an error
+if Intel(R) SST-TF is already enabled. In turn, to disable, the Intel(R) SST-TF
+feature must be disabled first.
+
+Intel(R) Speed Select Technology - Base Frequency (Intel(R) SST-BF)
+-------------------------------------------------------------------
+
+The Intel(R) Speed Select Technology - Base Frequency (Intel(R) SST-BF) feature lets
+the user control base frequency. If some critical workload threads demand
+constant high guaranteed performance, then this feature can be used to execute
+the thread at higher base frequency on specific sets of CPUs (high priority
+CPUs) at the cost of lower base frequency (low priority CPUs) on other CPUs.
+This feature does not require offline of the low priority CPUs.
+
+The support of Intel(R) SST-BF depends on the Intel(R) Speed Select Technology -
+Performance Profile (Intel(R) SST-PP) performance level configuration. It is
+possible that only certain performance levels support Intel(R) SST-BF. It is also
+possible that only base performance level (level = 0) has support of Intel
+SST-BF. Consequently, first select the desired performance level to enable this
+feature.
+
+In the system under test here, Intel(R) SST-BF is supported at the base
+performance level 0, but currently disabled. For example for the level 0::
+
+ # intel-speed-select -c 0 perf-profile info -l 0
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ perf-profile-level-0
+ ...
+
+ speed-select-base-freq:disabled
+ ...
+
+Before enabling Intel(R) SST-BF and measuring its impact on a workload
+performance, execute some workload and measure performance and get a baseline
+performance to compare against.
+
+Here the user wants more guaranteed performance. For this reason, it is likely
+that turbo is disabled. To disable turbo, execute::
+
+#echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
+
+Based on the output of the "intel-speed-select perf-profile info -l 0" base
+frequency of guaranteed frequency 2600 MHz.
+
+
+Measure baseline performance for comparison
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To compare, pick a multi-threaded workload where each thread can be scheduled on
+separate CPUs. "Hackbench pipe" test is a good example on how to improve
+performance using Intel(R) SST-BF.
+
+Below, the workload is measuring average scheduler wakeup latency, so a lower
+number means better performance::
+
+ # taskset -c 3,4 perf bench -r 100 sched pipe
+ # Running 'sched/pipe' benchmark:
+ # Executed 1000000 pipe operations between two processes
+ Total time: 6.102 [sec]
+ 6.102445 usecs/op
+ 163868 ops/sec
+
+While running the above test, if we take turbostat output, it will show us that
+2 of the CPUs are busy and reaching max. frequency (which would be the base
+frequency as the turbo is disabled). The turbostat output::
+
+ #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
+ Package Core CPU Bzy_MHz
+ 0 0 0 1000
+ 0 1 1 1005
+ 0 2 2 1000
+ 0 3 3 2600
+ 0 4 4 2600
+ 0 5 5 1000
+ 0 6 6 1000
+ 0 7 7 1005
+ 0 8 8 1005
+ 0 9 9 1000
+ 0 10 10 1000
+ 0 11 11 995
+ 0 12 12 1000
+ 0 13 13 1000
+
+From the above turbostat output, both CPU 3 and 4 are very busy and reaching
+full guaranteed frequency of 2600 MHz.
+
+Intel(R) SST-BF Capabilities
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To get capabilities of Intel(R) SST-BF for the current performance level 0,
+execute::
+
+ # intel-speed-select base-freq info -l 0
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ speed-select-base-freq
+ high-priority-base-frequency(MHz):3000
+ high-priority-cpu-mask:00000216,00002160
+ high-priority-cpu-list:5,6,8,13,33,34,36,41
+ low-priority-base-frequency(MHz):2400
+ tjunction-temperature(C):125
+ thermal-design-power(W):205
+
+The above capabilities show that there are some CPUs on this system that can
+offer base frequency of 3000 MHz compared to the standard base frequency at this
+performance levels. Nevertheless, these CPUs are fixed, and they are presented
+via high-priority-cpu-list/high-priority-cpu-mask. But if this Intel(R) SST-BF
+feature is selected, the low priorities CPUs (which are not in
+high-priority-cpu-list) can only offer up to 2400 MHz. As a result, if this
+clipping of low priority CPUs is acceptable, then the user can enable Intel
+SST-BF feature particularly for the above "sched pipe" workload since only two
+CPUs are used, they can be scheduled on high priority CPUs and can get boost of
+400 MHz.
+
+Enable Intel(R) SST-BF
+~~~~~~~~~~~~~~~~~~~~~~
+
+To enable Intel(R) SST-BF feature, execute::
+
+ # intel-speed-select base-freq enable -a
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ base-freq
+ enable:success
+ package-1
+ die-0
+ cpu-14
+ base-freq
+ enable:success
+
+In this case, -a option is optional. This not only enables Intel(R) SST-BF, but it
+also adjusts the priority of cores using Intel(R) Speed Select Technology Core
+Power (Intel(R) SST-CP) features. This option sets the minimum performance of each
+Intel(R) Speed Select Technology - Performance Profile (Intel(R) SST-PP) class to
+maximum performance so that the hardware will give maximum performance possible
+for each CPU.
+
+If -a option is not used, then the following steps are required before enabling
+Intel(R) SST-BF:
+
+- Discover Intel(R) SST-BF and note low and high priority base frequency
+- Note the high priority CPU list
+- Enable CLOS using core-power feature set
+- Configure CLOS parameters. Use CLOS.min to set to minimum performance
+- Subscribe desired CPUs to CLOS groups
+
+With this configuration, if the same workload is executed by pinning the
+workload to high priority CPUs (CPU 5 and 6 in this case)::
+
+ #taskset -c 5,6 perf bench -r 100 sched pipe
+ # Running 'sched/pipe' benchmark:
+ # Executed 1000000 pipe operations between two processes
+ Total time: 5.627 [sec]
+ 5.627922 usecs/op
+ 177685 ops/sec
+
+This way, by enabling Intel(R) SST-BF, the performance of this benchmark is
+improved (latency reduced) by 7.79%. From the turbostat output, it can be
+observed that the high priority CPUs reached 3000 MHz compared to 2600 MHz.
+The turbostat output::
+
+ #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
+ Package Core CPU Bzy_MHz
+ 0 0 0 2151
+ 0 1 1 2166
+ 0 2 2 2175
+ 0 3 3 2175
+ 0 4 4 2175
+ 0 5 5 3000
+ 0 6 6 3000
+ 0 7 7 2180
+ 0 8 8 2662
+ 0 9 9 2176
+ 0 10 10 2175
+ 0 11 11 2176
+ 0 12 12 2176
+ 0 13 13 2661
+
+Disable Intel(R) SST-BF
+~~~~~~~~~~~~~~~~~~~~~~~
+
+To disable the Intel(R) SST-BF feature, execute::
+
+# intel-speed-select base-freq disable -a
+
+
+Intel(R) Speed Select Technology - Turbo Frequency (Intel(R) SST-TF)
+--------------------------------------------------------------------
+
+This feature enables the ability to set different "All core turbo ratio limits"
+to cores based on the priority. By using this feature, some cores can be
+configured to get higher turbo frequency by designating them as high priority at
+the cost of lower or no turbo frequency on the low priority cores.
+
+For this reason, this feature is only useful when system is busy utilizing all
+CPUs, but the user wants some configurable option to get high performance on
+some CPUs.
+
+The support of Intel(R) Speed Select Technology - Turbo Frequency (Intel(R) SST-TF)
+depends on the Intel(R) Speed Select Technology - Performance Profile (Intel
+SST-PP) performance level configuration. It is possible that only a certain
+performance level supports Intel(R) SST-TF. It is also possible that only the base
+performance level (level = 0) has the support of Intel(R) SST-TF. Hence, first
+select the desired performance level to enable this feature.
+
+In the system under test here, Intel(R) SST-TF is supported at the base
+performance level 0, but currently disabled::
+
+ # intel-speed-select -c 0 perf-profile info -l 0
+ Intel(R) Speed Select Technology
+ package-0
+ die-0
+ cpu-0
+ perf-profile-level-0
+ ...
+ ...
+ speed-select-turbo-freq:disabled
+ ...
+ ...
+
+
+To check if performance can be improved using Intel(R) SST-TF feature, get the turbo
+frequency properties with Intel(R) SST-TF enabled and compare to the base turbo
+capability of this system.
+
+Get Base turbo capability
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To get the base turbo capability of performance level 0, execute::
+
+ # intel-speed-select perf-profile info -l 0
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ perf-profile-level-0
+ ...
+ ...
+ turbo-ratio-limits-sse
+ bucket-0
+ core-count:2
+ max-turbo-frequency(MHz):3200
+ bucket-1
+ core-count:4
+ max-turbo-frequency(MHz):3100
+ bucket-2
+ core-count:6
+ max-turbo-frequency(MHz):3100
+ bucket-3
+ core-count:8
+ max-turbo-frequency(MHz):3100
+ bucket-4
+ core-count:10
+ max-turbo-frequency(MHz):3100
+ bucket-5
+ core-count:12
+ max-turbo-frequency(MHz):3100
+ bucket-6
+ core-count:14
+ max-turbo-frequency(MHz):3100
+ bucket-7
+ core-count:16
+ max-turbo-frequency(MHz):3100
+
+Based on the data above, when all the CPUS are busy, the max. frequency of 3100
+MHz can be achieved. If there is some busy workload on cpu 0 - 11 (e.g. stress)
+and on CPU 12 and 13, execute "hackbench pipe" workload::
+
+ # taskset -c 12,13 perf bench -r 100 sched pipe
+ # Running 'sched/pipe' benchmark:
+ # Executed 1000000 pipe operations between two processes
+ Total time: 5.705 [sec]
+ 5.705488 usecs/op
+ 175269 ops/sec
+
+The turbostat output::
+
+ #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
+ Package Core CPU Bzy_MHz
+ 0 0 0 3000
+ 0 1 1 3000
+ 0 2 2 3000
+ 0 3 3 3000
+ 0 4 4 3000
+ 0 5 5 3100
+ 0 6 6 3100
+ 0 7 7 3000
+ 0 8 8 3100
+ 0 9 9 3000
+ 0 10 10 3000
+ 0 11 11 3000
+ 0 12 12 3100
+ 0 13 13 3100
+
+Based on turbostat output, the performance is limited by frequency cap of 3100
+MHz. To check if the hackbench performance can be improved for CPU 12 and CPU
+13, first check the capability of the Intel(R) SST-TF feature for this performance
+level.
+
+Get Intel(R) SST-TF Capability
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To get the capability, the "turbo-freq info" command can be used::
+
+ # intel-speed-select turbo-freq info -l 0
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ speed-select-turbo-freq
+ bucket-0
+ high-priority-cores-count:2
+ high-priority-max-frequency(MHz):3200
+ high-priority-max-avx2-frequency(MHz):3200
+ high-priority-max-avx512-frequency(MHz):3100
+ bucket-1
+ high-priority-cores-count:4
+ high-priority-max-frequency(MHz):3100
+ high-priority-max-avx2-frequency(MHz):3000
+ high-priority-max-avx512-frequency(MHz):2900
+ bucket-2
+ high-priority-cores-count:6
+ high-priority-max-frequency(MHz):3100
+ high-priority-max-avx2-frequency(MHz):3000
+ high-priority-max-avx512-frequency(MHz):2900
+ speed-select-turbo-freq-clip-frequencies
+ low-priority-max-frequency(MHz):2600
+ low-priority-max-avx2-frequency(MHz):2400
+ low-priority-max-avx512-frequency(MHz):2100
+
+Based on the output above, there is an Intel(R) SST-TF bucket for which there are
+two high priority cores. If only two high priority cores are set, then max.
+turbo frequency on those cores can be increased to 3200 MHz. This is 100 MHz
+more than the base turbo capability for all cores.
+
+In turn, for the hackbench workload, two CPUs can be set as high priority and
+rest as low priority. One side effect is that once enabled, the low priority
+cores will be clipped to a lower frequency of 2600 MHz.
+
+Enable Intel(R) SST-TF
+~~~~~~~~~~~~~~~~~~~~~~
+
+To enable Intel(R) SST-TF, execute::
+
+ # intel-speed-select -c 12,13 turbo-freq enable -a
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-12
+ turbo-freq
+ enable:success
+ package-0
+ die-0
+ cpu-13
+ turbo-freq
+ enable:success
+ package--1
+ die-0
+ cpu-63
+ turbo-freq --auto
+ enable:success
+
+In this case, the option "-a" is optional. If set, it enables Intel(R) SST-TF
+feature and also sets the CPUs to high and low priority using Intel Speed
+Select Technology Core Power (Intel(R) SST-CP) features. The CPU numbers passed
+with "-c" arguments are marked as high priority, including its siblings.
+
+If -a option is not used, then the following steps are required before enabling
+Intel(R) SST-TF:
+
+- Discover Intel(R) SST-TF and note buckets of high priority cores and maximum frequency
+
+- Enable CLOS using core-power feature set - Configure CLOS parameters
+
+- Subscribe desired CPUs to CLOS groups making sure that high priority cores are set to the maximum frequency
+
+If the same hackbench workload is executed, schedule hackbench threads on high
+priority CPUs::
+
+ #taskset -c 12,13 perf bench -r 100 sched pipe
+ # Running 'sched/pipe' benchmark:
+ # Executed 1000000 pipe operations between two processes
+ Total time: 5.510 [sec]
+ 5.510165 usecs/op
+ 180826 ops/sec
+
+This improved performance by around 3.3% improvement on a busy system. Here the
+turbostat output will show that the CPU 12 and CPU 13 are getting 100 MHz boost.
+The turbostat output::
+
+ #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
+ Package Core CPU Bzy_MHz
+ ...
+ 0 12 12 3200
+ 0 13 13 3200
diff --git a/Documentation/admin-guide/pm/intel_idle.rst b/Documentation/admin-guide/pm/intel_idle.rst
index 89309e1b0e48..39bd6ecce7de 100644
--- a/Documentation/admin-guide/pm/intel_idle.rst
+++ b/Documentation/admin-guide/pm/intel_idle.rst
@@ -20,8 +20,8 @@ Nehalem and later generations of Intel processors, but the level of support for
a particular processor model in it depends on whether or not it recognizes that
processor model and may also depend on information coming from the platform
firmware. [To understand ``intel_idle`` it is necessary to know how ``CPUIdle``
-works in general, so this is the time to get familiar with :doc:`cpuidle` if you
-have not done that yet.]
+works in general, so this is the time to get familiar with
+Documentation/admin-guide/pm/cpuidle.rst if you have not done that yet.]
``intel_idle`` uses the ``MWAIT`` instruction to inform the processor that the
logical CPU executing it is idle and so it may be possible to put some of the
@@ -53,7 +53,8 @@ processor) corresponding to them depends on the processor model and it may also
depend on the configuration of the platform.
In order to create a list of available idle states required by the ``CPUIdle``
-subsystem (see :ref:`idle-states-representation` in :doc:`cpuidle`),
+subsystem (see :ref:`idle-states-representation` in
+Documentation/admin-guide/pm/cpuidle.rst),
``intel_idle`` can use two sources of information: static tables of idle states
for different processor models included in the driver itself and the ACPI tables
of the system. The former are always used if the processor model at hand is
@@ -98,7 +99,8 @@ states may not be enabled by default if there are no matching entries in the
preliminary list of idle states coming from the ACPI tables. In that case user
space still can enable them later (on a per-CPU basis) with the help of
the ``disable`` idle state attribute in ``sysfs`` (see
-:ref:`idle-states-representation` in :doc:`cpuidle`). This basically means that
+:ref:`idle-states-representation` in
+Documentation/admin-guide/pm/cpuidle.rst). This basically means that
the idle states "known" to the driver may not be enabled by default if they have
not been exposed by the platform firmware (through the ACPI tables).
@@ -168,7 +170,7 @@ and ``idle=nomwait``. If any of them is present in the kernel command line, the
``MWAIT`` instruction is not allowed to be used, so the initialization of
``intel_idle`` will fail.
-Apart from that there are four module parameters recognized by ``intel_idle``
+Apart from that there are five module parameters recognized by ``intel_idle``
itself that can be set via the kernel command line (they cannot be updated via
sysfs, so that is the only way to change their values).
@@ -186,7 +188,8 @@ be desirable. In practice, it is only really necessary to do that if the idle
states in question cannot be enabled during system startup, because in the
working state of the system the CPU power management quality of service (PM
QoS) feature can be used to prevent ``CPUIdle`` from touching those idle states
-even if they have been enumerated (see :ref:`cpu-pm-qos` in :doc:`cpuidle`).
+even if they have been enumerated (see :ref:`cpu-pm-qos` in
+Documentation/admin-guide/pm/cpuidle.rst).
Setting ``max_cstate`` to 0 causes the ``intel_idle`` initialization to fail.
The ``no_acpi`` and ``use_acpi`` module parameters (recognized by ``intel_idle``
@@ -202,7 +205,8 @@ Namely, the positions of the bits that are set in the ``states_off`` value are
the indices of idle states to be disabled by default (as reflected by the names
of the corresponding idle state directories in ``sysfs``, :file:`state0`,
:file:`state1` ... :file:`state<i>` ..., where ``<i>`` is the index of the given
-idle state; see :ref:`idle-states-representation` in :doc:`cpuidle`).
+idle state; see :ref:`idle-states-representation` in
+Documentation/admin-guide/pm/cpuidle.rst).
For example, if ``states_off`` is equal to 3, the driver will disable idle
states 0 and 1 by default, and if it is equal to 8, idle state 3 will be
@@ -212,6 +216,21 @@ are ignored).
The idle states disabled this way can be enabled (on a per-CPU basis) from user
space via ``sysfs``.
+The ``ibrs_off`` module parameter is a boolean flag (defaults to
+false). If set, it is used to control if IBRS (Indirect Branch Restricted
+Speculation) should be turned off when the CPU enters an idle state.
+This flag does not affect CPUs that use Enhanced IBRS which can remain
+on with little performance impact.
+
+For some CPUs, IBRS will be selected as mitigation for Spectre v2 and Retbleed
+security vulnerabilities by default. Leaving the IBRS mode on while idling may
+have a performance impact on its sibling CPU. The IBRS mode will be turned off
+by default when the CPU enters into a deep idle state, but not in some
+shallower ones. Setting the ``ibrs_off`` module parameter will force the IBRS
+mode to off when the CPU is in any one of the available idle states. This may
+help performance of a sibling CPU at the expense of a slightly higher wakeup
+latency for the idle CPU.
+
.. _intel-idle-core-and-package-idle-states:
diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst
index ad392f3aee06..bf13ad25a32f 100644
--- a/Documentation/admin-guide/pm/intel_pstate.rst
+++ b/Documentation/admin-guide/pm/intel_pstate.rst
@@ -18,8 +18,8 @@ General Information
(``CPUFreq``). It is a scaling driver for the Sandy Bridge and later
generations of Intel processors. Note, however, that some of those processors
may not be supported. [To understand ``intel_pstate`` it is necessary to know
-how ``CPUFreq`` works in general, so this is the time to read :doc:`cpufreq` if
-you have not done that yet.]
+how ``CPUFreq`` works in general, so this is the time to read
+Documentation/admin-guide/pm/cpufreq.rst if you have not done that yet.]
For the processors supported by ``intel_pstate``, the P-state concept is broader
than just an operating frequency or an operating performance point (see the
@@ -54,17 +54,21 @@ registered (see `below <status_attr_>`_).
Operation Modes
===============
-``intel_pstate`` can operate in three different modes: in the active mode with
-or without hardware-managed P-states support and in the passive mode. Which of
-them will be in effect depends on what kernel command line options are used and
-on the capabilities of the processor.
+``intel_pstate`` can operate in two different modes, active or passive. In the
+active mode, it uses its own internal performance scaling governor algorithm or
+allows the hardware to do performance scaling by itself, while in the passive
+mode it responds to requests made by a generic ``CPUFreq`` governor implementing
+a certain performance scaling algorithm. Which of them will be in effect
+depends on what kernel command line options are used and on the capabilities of
+the processor.
Active Mode
-----------
-This is the default operation mode of ``intel_pstate``. If it works in this
-mode, the ``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq``
-policies contains the string "intel_pstate".
+This is the default operation mode of ``intel_pstate`` for processors with
+hardware-managed P-states (HWP) support. If it works in this mode, the
+``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq`` policies
+contains the string "intel_pstate".
In this mode the driver bypasses the scaling governors layer of ``CPUFreq`` and
provides its own scaling algorithms for P-state selection. Those algorithms
@@ -119,7 +123,9 @@ Energy-Performance Bias (EPB) knob (otherwise), which means that the processor's
internal P-state selection logic is expected to focus entirely on performance.
This will override the EPP/EPB setting coming from the ``sysfs`` interface
-(see `Energy vs Performance Hints`_ below).
+(see `Energy vs Performance Hints`_ below). Moreover, any attempts to change
+the EPP/EPB to a value different from 0 ("performance") via ``sysfs`` in this
+configuration will be rejected.
Also, in this configuration the range of P-states available to the processor's
internal P-state selection logic is always restricted to the upper boundary
@@ -138,12 +144,13 @@ internal P-state selection logic to be less performance-focused.
Active Mode Without HWP
~~~~~~~~~~~~~~~~~~~~~~~
-This is the default operation mode for processors that do not support the HWP
-feature. It also is used by default with the ``intel_pstate=no_hwp`` argument
-in the kernel command line. However, in this mode ``intel_pstate`` may refuse
-to work with the given processor if it does not recognize it. [Note that
-``intel_pstate`` will never refuse to work with any processor with the HWP
-feature enabled.]
+This operation mode is optional for processors that do not support the HWP
+feature or when the ``intel_pstate=no_hwp`` argument is passed to the kernel in
+the command line. The active mode is used in those cases if the
+``intel_pstate=active`` argument is passed to the kernel in the command line.
+In this mode ``intel_pstate`` may refuse to work with processors that are not
+recognized by it. [Note that ``intel_pstate`` will never refuse to work with
+any processor with the HWP feature enabled.]
In this mode ``intel_pstate`` registers utilization update callbacks with the
CPU scheduler in order to run a P-state selection algorithm, either
@@ -188,10 +195,15 @@ is not set.
Passive Mode
------------
-This mode is used if the ``intel_pstate=passive`` argument is passed to the
-kernel in the command line (it implies the ``intel_pstate=no_hwp`` setting too).
-Like in the active mode without HWP support, in this mode ``intel_pstate`` may
-refuse to work with the given processor if it does not recognize it.
+This is the default operation mode of ``intel_pstate`` for processors without
+hardware-managed P-states (HWP) support. It is always used if the
+``intel_pstate=passive`` argument is passed to the kernel in the command line
+regardless of whether or not the given processor supports HWP. [Note that the
+``intel_pstate=no_hwp`` setting causes the driver to start in the passive mode
+if it is not combined with ``intel_pstate=active``.] Like in the active mode
+without HWP support, in this mode ``intel_pstate`` may refuse to work with
+processors that are not recognized by it if HWP is prevented from being enabled
+through the kernel command line.
If the driver works in this mode, the ``scaling_driver`` policy attribute in
``sysfs`` for all ``CPUFreq`` policies contains the string "intel_cpufreq".
@@ -312,10 +324,9 @@ manuals need to be consulted to get to it too.
For this reason, there is a list of supported processors in ``intel_pstate`` and
the driver initialization will fail if the detected processor is not in that
-list, unless it supports the `HWP feature <Active Mode_>`_. [The interface to
-obtain all of the information listed above is the same for all of the processors
-supporting the HWP feature, which is why they all are supported by
-``intel_pstate``.]
+list, unless it supports the HWP feature. [The interface to obtain all of the
+information listed above is the same for all of the processors supporting the
+HWP feature, which is why ``intel_pstate`` works with all of them.]
User Space Interface in ``sysfs``
@@ -354,6 +365,9 @@ argument is passed to the kernel in the command line.
inclusive) including both turbo and non-turbo P-states (see
`Turbo P-states Support`_).
+ This attribute is present only if the value exposed by it is the same
+ for all of the CPUs in the system.
+
The value of this attribute is not affected by the ``no_turbo``
setting described `below <no_turbo_attr_>`_.
@@ -363,19 +377,22 @@ argument is passed to the kernel in the command line.
Ratio of the `turbo range <turbo_>`_ size to the size of the entire
range of supported P-states, in percent.
+ This attribute is present only if the value exposed by it is the same
+ for all of the CPUs in the system.
+
This attribute is read-only.
.. _no_turbo_attr:
``no_turbo``
If set (equal to 1), the driver is not allowed to set any turbo P-states
- (see `Turbo P-states Support`_). If unset (equalt to 0, which is the
+ (see `Turbo P-states Support`_). If unset (equal to 0, which is the
default), turbo P-states can be set by the driver.
[Note that ``intel_pstate`` does not support the general ``boost``
attribute (supported by some other scaling drivers) which is replaced
by this one.]
- This attrubute does not affect the maximum supported frequency value
+ This attribute does not affect the maximum supported frequency value
supplied to the ``CPUFreq`` core and exposed via the policy interface,
but it affects the maximum possible value of per-policy P-state limits
(see `Interpretation of Policy Attributes`_ below for details).
@@ -419,18 +436,24 @@ argument is passed to the kernel in the command line.
as well as the per-policy ones) are then reset to their default
values, possibly depending on the target operation mode.]
- That only is supported in some configurations, though (for example, if
- the `HWP feature is enabled in the processor <Active Mode With HWP_>`_,
- the operation mode of the driver cannot be changed), and if it is not
- supported in the current configuration, writes to this attribute will
- fail with an appropriate error.
+``energy_efficiency``
+ This attribute is only present on platforms with CPUs matching the Kaby
+ Lake or Coffee Lake desktop CPU model. By default, energy-efficiency
+ optimizations are disabled on these CPU models if HWP is enabled.
+ Enabling energy-efficiency optimizations may limit maximum operating
+ frequency with or without the HWP feature. With HWP enabled, the
+ optimizations are done only in the turbo frequency range. Without it,
+ they are done in the entire available frequency range. Setting this
+ attribute to "1" enables the energy-efficiency optimizations and setting
+ to "0" disables them.
Interpretation of Policy Attributes
-----------------------------------
The interpretation of some ``CPUFreq`` policy attributes described in
-:doc:`cpufreq` is special with ``intel_pstate`` as the current scaling driver
-and it generally depends on the driver's `operation mode <Operation Modes_>`_.
+Documentation/admin-guide/pm/cpufreq.rst is special with ``intel_pstate``
+as the current scaling driver and it generally depends on the driver's
+`operation mode <Operation Modes_>`_.
First of all, the values of the ``cpuinfo_max_freq``, ``cpuinfo_min_freq`` and
``scaling_cur_freq`` attributes are produced by applying a processor-specific
@@ -467,8 +490,8 @@ Next, the following policy attributes have special meaning if
policy for the time interval between the last two invocations of the
driver's utilization update callback by the CPU scheduler for that CPU.
-One more policy attribute is present if the `HWP feature is enabled in the
-processor <Active Mode With HWP_>`_:
+One more policy attribute is present if the HWP feature is enabled in the
+processor:
``base_frequency``
Shows the base frequency of the CPU. Any frequency above this will be
@@ -509,11 +532,11 @@ on the following rules, regardless of the current operation mode of the driver:
3. The global and per-policy limits can be set independently.
-If the `HWP feature is enabled in the processor <Active Mode With HWP_>`_, the
-resulting effective values are written into its registers whenever the limits
-change in order to request its internal P-state selection logic to always set
-P-states within these limits. Otherwise, the limits are taken into account by
-scaling governors (in the `passive mode <Passive Mode_>`_) and by the driver
+In the `active mode with the HWP feature enabled <Active Mode With HWP_>`_, the
+resulting effective values are written into hardware registers whenever the
+limits change in order to request its internal P-state selection logic to always
+set P-states within these limits. Otherwise, the limits are taken into account
+by scaling governors (in the `passive mode <Passive Mode_>`_) and by the driver
every time before setting a new P-state for a CPU.
Additionally, if the ``intel_pstate=per_cpu_perf_limits`` command line argument
@@ -524,12 +547,11 @@ at all and the only way to set the limits is by using the policy attributes.
Energy vs Performance Hints
---------------------------
-If ``intel_pstate`` works in the `active mode with the HWP feature enabled
-<Active Mode With HWP_>`_ in the processor, additional attributes are present
-in every ``CPUFreq`` policy directory in ``sysfs``. They are intended to allow
-user space to help ``intel_pstate`` to adjust the processor's internal P-state
-selection logic by focusing it on performance or on energy-efficiency, or
-somewhere between the two extremes:
+If the hardware-managed P-states (HWP) is enabled in the processor, additional
+attributes, intended to allow user space to help ``intel_pstate`` to adjust the
+processor's internal P-state selection logic by focusing it on performance or on
+energy-efficiency, or somewhere between the two extremes, are present in every
+``CPUFreq`` policy directory in ``sysfs``. They are :
``energy_performance_preference``
Current value of the energy vs performance hint for the given policy
@@ -548,7 +570,11 @@ somewhere between the two extremes:
Strings written to the ``energy_performance_preference`` attribute are
internally translated to integer values written to the processor's
Energy-Performance Preference (EPP) knob (if supported) or its
-Energy-Performance Bias (EPB) knob.
+Energy-Performance Bias (EPB) knob. It is also possible to write a positive
+integer value between 0 to 255, if the EPP feature is present. If the EPP
+feature is not present, writing integer value to this attribute is not
+supported. In this case, user can use the
+"/sys/devices/system/cpu/cpu*/power/energy_perf_bias" interface.
[Note that tasks may by migrated from one CPU to another by the scheduler's
load-balancing algorithm and if different energy vs performance hints are
@@ -629,12 +655,14 @@ of them have to be prepended with the ``intel_pstate=`` prefix.
Do not register ``intel_pstate`` as the scaling driver even if the
processor is supported by it.
+``active``
+ Register ``intel_pstate`` in the `active mode <Active Mode_>`_ to start
+ with.
+
``passive``
Register ``intel_pstate`` in the `passive mode <Passive Mode_>`_ to
start with.
- This option implies the ``no_hwp`` one described below.
-
``force``
Register ``intel_pstate`` as the scaling driver instead of
``acpi-cpufreq`` even if the latter is preferred on the given system.
@@ -649,13 +677,12 @@ of them have to be prepended with the ``intel_pstate=`` prefix.
driver is used instead of ``acpi-cpufreq``.
``no_hwp``
- Do not enable the `hardware-managed P-states (HWP) feature
- <Active Mode With HWP_>`_ even if it is supported by the processor.
+ Do not enable the hardware-managed P-states (HWP) feature even if it is
+ supported by the processor.
``hwp_only``
Register ``intel_pstate`` as the scaling driver only if the
- `hardware-managed P-states (HWP) feature <Active Mode With HWP_>`_ is
- supported by the processor.
+ hardware-managed P-states (HWP) feature is supported by the processor.
``support_acpi_ppc``
Take ACPI ``_PPC`` performance limits into account.
@@ -685,7 +712,7 @@ it works in the `active mode <Active Mode_>`_.
The following sequence of shell commands can be used to enable them and see
their output (if the kernel is generally configured to support event tracing)::
- # cd /sys/kernel/debug/tracing/
+ # cd /sys/kernel/tracing/
# echo 1 > events/power/pstate_sample/enable
# echo 1 > events/power/cpu_frequency/enable
# cat trace
@@ -702,10 +729,10 @@ core (for the policies with other scaling governors).
The ``ftrace`` interface can be used for low-level diagnostics of
``intel_pstate``. For example, to check how often the function to set a
-P-state is called, the ``ftrace`` filter can be set to to
+P-state is called, the ``ftrace`` filter can be set to
:c:func:`intel_pstate_set_pstate`::
- # cd /sys/kernel/debug/tracing/
+ # cd /sys/kernel/tracing/
# cat available_filter_functions | grep -i pstate
intel_pstate_set_pstate
intel_pstate_cpu_init
diff --git a/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst b/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst
new file mode 100644
index 000000000000..5ab3440e6cee
--- /dev/null
+++ b/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst
@@ -0,0 +1,115 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+==============================
+Intel Uncore Frequency Scaling
+==============================
+
+:Copyright: |copy| 2022-2023 Intel Corporation
+
+:Author: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
+
+Introduction
+------------
+
+The uncore can consume significant amount of power in Intel's Xeon servers based
+on the workload characteristics. To optimize the total power and improve overall
+performance, SoCs have internal algorithms for scaling uncore frequency. These
+algorithms monitor workload usage of uncore and set a desirable frequency.
+
+It is possible that users have different expectations of uncore performance and
+want to have control over it. The objective is similar to allowing users to set
+the scaling min/max frequencies via cpufreq sysfs to improve CPU performance.
+Users may have some latency sensitive workloads where they do not want any
+change to uncore frequency. Also, users may have workloads which require
+different core and uncore performance at distinct phases and they may want to
+use both cpufreq and the uncore scaling interface to distribute power and
+improve overall performance.
+
+Sysfs Interface
+---------------
+
+To control uncore frequency, a sysfs interface is provided in the directory:
+`/sys/devices/system/cpu/intel_uncore_frequency/`.
+
+There is one directory for each package and die combination as the scope of
+uncore scaling control is per die in multiple die/package SoCs or per
+package for single die per package SoCs. The name represents the
+scope of control. For example: 'package_00_die_00' is for package id 0 and
+die 0.
+
+Each package_*_die_* contains the following attributes:
+
+``initial_max_freq_khz``
+ Out of reset, this attribute represent the maximum possible frequency.
+ This is a read-only attribute. If users adjust max_freq_khz,
+ they can always go back to maximum using the value from this attribute.
+
+``initial_min_freq_khz``
+ Out of reset, this attribute represent the minimum possible frequency.
+ This is a read-only attribute. If users adjust min_freq_khz,
+ they can always go back to minimum using the value from this attribute.
+
+``max_freq_khz``
+ This attribute is used to set the maximum uncore frequency.
+
+``min_freq_khz``
+ This attribute is used to set the minimum uncore frequency.
+
+``current_freq_khz``
+ This attribute is used to get the current uncore frequency.
+
+SoCs with TPMI (Topology Aware Register and PM Capsule Interface)
+-----------------------------------------------------------------
+
+An SoC can contain multiple power domains with individual or collection
+of mesh partitions. This partition is called fabric cluster.
+
+Certain type of meshes will need to run at the same frequency, they will
+be placed in the same fabric cluster. Benefit of fabric cluster is that it
+offers a scalable mechanism to deal with partitioned fabrics in a SoC.
+
+The current sysfs interface supports controls at package and die level.
+This interface is not enough to support more granular control at
+fabric cluster level.
+
+SoCs with the support of TPMI (Topology Aware Register and PM Capsule
+Interface), can have multiple power domains. Each power domain can
+contain one or more fabric clusters.
+
+To represent controls at fabric cluster level in addition to the
+controls at package and die level (like systems without TPMI
+support), sysfs is enhanced. This granular interface is presented in the
+sysfs with directories names prefixed with "uncore". For example:
+uncore00, uncore01 etc.
+
+The scope of control is specified by attributes "package_id", "domain_id"
+and "fabric_cluster_id" in the directory.
+
+Attributes in each directory:
+
+``domain_id``
+ This attribute is used to get the power domain id of this instance.
+
+``fabric_cluster_id``
+ This attribute is used to get the fabric cluster id of this instance.
+
+``package_id``
+ This attribute is used to get the package id of this instance.
+
+The other attributes are same as presented at package_*_die_* level.
+
+In most of current use cases, the "max_freq_khz" and "min_freq_khz"
+is updated at "package_*_die_*" level. This model will be still supported
+with the following approach:
+
+When user uses controls at "package_*_die_*" level, then every fabric
+cluster is affected in that package and die. For example: user changes
+"max_freq_khz" in the package_00_die_00, then "max_freq_khz" for uncore*
+directory with the same package id will be updated. In this case user can
+still update "max_freq_khz" at each uncore* level, which is more restrictive.
+Similarly, user can update "min_freq_khz" at "package_*_die_*" level
+to apply at each uncore* level.
+
+Support for "current_freq_khz" is available only at each fabric cluster
+level (i.e., in uncore* directory).
diff --git a/Documentation/admin-guide/pm/working-state.rst b/Documentation/admin-guide/pm/working-state.rst
index 0a38cdf39df1..ee45887811ff 100644
--- a/Documentation/admin-guide/pm/working-state.rst
+++ b/Documentation/admin-guide/pm/working-state.rst
@@ -11,5 +11,8 @@ Working-State Power Management
intel_idle
cpufreq
intel_pstate
+ amd-pstate
cpufreq_drivers
intel_epb
+ intel-speed-select
+ intel_uncore_frequency_scaling
diff --git a/Documentation/admin-guide/pmf.rst b/Documentation/admin-guide/pmf.rst
new file mode 100644
index 000000000000..9ee729ffc19b
--- /dev/null
+++ b/Documentation/admin-guide/pmf.rst
@@ -0,0 +1,24 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Set udev rules for PMF Smart PC Builder
+---------------------------------------
+
+AMD PMF(Platform Management Framework) Smart PC Solution builder has to set the system states
+like S0i3, Screen lock, hibernate etc, based on the output actions provided by the PMF
+TA (Trusted Application).
+
+In order for this to work the PMF driver generates a uevent for userspace to react to. Below are
+sample udev rules that can facilitate this experience when a machine has PMF Smart PC solution builder
+enabled.
+
+Please add the following line(s) to
+``/etc/udev/rules.d/99-local.rules``::
+
+ DRIVERS=="amd-pmf", ACTION=="change", ENV{EVENT_ID}=="0", RUN+="/usr/bin/systemctl suspend"
+ DRIVERS=="amd-pmf", ACTION=="change", ENV{EVENT_ID}=="1", RUN+="/usr/bin/systemctl hibernate"
+ DRIVERS=="amd-pmf", ACTION=="change", ENV{EVENT_ID}=="2", RUN+="/bin/loginctl lock-sessions"
+
+EVENT_ID values:
+0= Put the system to S0i3/S2Idle
+1= Put the system to hibernate
+2= Lock the screen
diff --git a/Documentation/admin-guide/pnp.rst b/Documentation/admin-guide/pnp.rst
index bab2d10631f0..3eda08191d13 100644
--- a/Documentation/admin-guide/pnp.rst
+++ b/Documentation/admin-guide/pnp.rst
@@ -281,10 +281,6 @@ ISAPNP drivers. They should serve as a temporary solution only.
They are as follows::
- struct pnp_card *pnp_find_card(unsigned short vendor,
- unsigned short device,
- struct pnp_card *from)
-
struct pnp_dev *pnp_find_dev(struct pnp_card *card,
unsigned short vendor,
unsigned short function,
diff --git a/Documentation/admin-guide/pstore-blk.rst b/Documentation/admin-guide/pstore-blk.rst
new file mode 100644
index 000000000000..1bb2a1c292aa
--- /dev/null
+++ b/Documentation/admin-guide/pstore-blk.rst
@@ -0,0 +1,234 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+pstore block oops/panic logger
+==============================
+
+Introduction
+------------
+
+pstore block (pstore/blk) is an oops/panic logger that writes its logs to a
+block device and non-block device before the system crashes. You can get
+these log files by mounting pstore filesystem like::
+
+ mount -t pstore pstore /sys/fs/pstore
+
+
+pstore block concepts
+---------------------
+
+pstore/blk provides efficient configuration method for pstore/blk, which
+divides all configurations into two parts, configurations for user and
+configurations for driver.
+
+Configurations for user determine how pstore/blk works, such as pmsg_size,
+kmsg_size and so on. All of them support both Kconfig and module parameters,
+but module parameters have priority over Kconfig.
+
+Configurations for driver are all about block device and non-block device,
+such as total_size of block device and read/write operations.
+
+Configurations for user
+-----------------------
+
+All of these configurations support both Kconfig and module parameters, but
+module parameters have priority over Kconfig.
+
+Here is an example for module parameters::
+
+ pstore_blk.blkdev=/dev/mmcblk0p7 pstore_blk.kmsg_size=64 best_effort=y
+
+The detail of each configurations may be of interest to you.
+
+blkdev
+~~~~~~
+
+The block device to use. Most of the time, it is a partition of block device.
+It's required for pstore/blk. It is also used for MTD device.
+
+When pstore/blk is built as a module, "blkdev" accepts the following variants:
+
+1. /dev/<disk_name> represents the device number of disk
+#. /dev/<disk_name><decimal> represents the device number of partition - device
+ number of disk plus the partition number
+#. /dev/<disk_name>p<decimal> - same as the above; this form is used when disk
+ name of partitioned disk ends with a digit.
+
+When pstore/blk is built into the kernel, "blkdev" accepts the following variants:
+
+#. <hex_major><hex_minor> device number in hexadecimal representation,
+ with no leading 0x, for example b302.
+#. PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF represents the unique id of
+ a partition if the partition table provides it. The UUID may be either an
+ EFI/GPT UUID, or refer to an MSDOS partition using the format SSSSSSSS-PP,
+ where SSSSSSSS is a zero-filled hex representation of the 32-bit
+ "NT disk signature", and PP is a zero-filled hex representation of the
+ 1-based partition number.
+#. PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to a
+ partition with a known unique id.
+#. <major>:<minor> major and minor number of the device separated by a colon.
+
+It accepts the following variants for MTD device:
+
+1. <device name> MTD device name. "pstore" is recommended.
+#. <device number> MTD device number.
+
+kmsg_size
+~~~~~~~~~
+
+The chunk size in KB for oops/panic front-end. It **MUST** be a multiple of 4.
+It's optional if you do not care about the oops/panic log.
+
+There are multiple chunks for oops/panic front-end depending on the remaining
+space except other pstore front-ends.
+
+pstore/blk will log to oops/panic chunks one by one, and always overwrite the
+oldest chunk if there is no more free chunk.
+
+pmsg_size
+~~~~~~~~~
+
+The chunk size in KB for pmsg front-end. It **MUST** be a multiple of 4.
+It's optional if you do not care about the pmsg log.
+
+Unlike oops/panic front-end, there is only one chunk for pmsg front-end.
+
+Pmsg is a user space accessible pstore object. Writes to */dev/pmsg0* are
+appended to the chunk. On reboot the contents are available in
+*/sys/fs/pstore/pmsg-pstore-blk-0*.
+
+console_size
+~~~~~~~~~~~~
+
+The chunk size in KB for console front-end. It **MUST** be a multiple of 4.
+It's optional if you do not care about the console log.
+
+Similar to pmsg front-end, there is only one chunk for console front-end.
+
+All log of console will be appended to the chunk. On reboot the contents are
+available in */sys/fs/pstore/console-pstore-blk-0*.
+
+ftrace_size
+~~~~~~~~~~~
+
+The chunk size in KB for ftrace front-end. It **MUST** be a multiple of 4.
+It's optional if you do not care about the ftrace log.
+
+Similar to oops front-end, there are multiple chunks for ftrace front-end
+depending on the count of cpu processors. Each chunk size is equal to
+ftrace_size / processors_count.
+
+All log of ftrace will be appended to the chunk. On reboot the contents are
+combined and available in */sys/fs/pstore/ftrace-pstore-blk-0*.
+
+Persistent function tracing might be useful for debugging software or hardware
+related hangs. Here is an example of usage::
+
+ # mount -t pstore pstore /sys/fs/pstore
+ # mount -t debugfs debugfs /sys/kernel/debug/
+ # echo 1 > /sys/kernel/debug/pstore/record_ftrace
+ # reboot -f
+ [...]
+ # mount -t pstore pstore /sys/fs/pstore
+ # tail /sys/fs/pstore/ftrace-pstore-blk-0
+ CPU:0 ts:5914676 c0063828 c0063b94 call_cpuidle <- cpu_startup_entry+0x1b8/0x1e0
+ CPU:0 ts:5914678 c039ecdc c006385c cpuidle_enter_state <- call_cpuidle+0x44/0x48
+ CPU:0 ts:5914680 c039e9a0 c039ecf0 cpuidle_enter_freeze <- cpuidle_enter_state+0x304/0x314
+ CPU:0 ts:5914681 c0063870 c039ea30 sched_idle_set_state <- cpuidle_enter_state+0x44/0x314
+ CPU:1 ts:5916720 c0160f59 c015ee04 kernfs_unmap_bin_file <- __kernfs_remove+0x140/0x204
+ CPU:1 ts:5916721 c05ca625 c015ee0c __mutex_lock_slowpath <- __kernfs_remove+0x148/0x204
+ CPU:1 ts:5916723 c05c813d c05ca630 yield_to <- __mutex_lock_slowpath+0x314/0x358
+ CPU:1 ts:5916724 c05ca2d1 c05ca638 __ww_mutex_lock <- __mutex_lock_slowpath+0x31c/0x358
+
+max_reason
+~~~~~~~~~~
+
+Limiting which kinds of kmsg dumps are stored can be controlled via
+the ``max_reason`` value, as defined in include/linux/kmsg_dump.h's
+``enum kmsg_dump_reason``. For example, to store both Oopses and Panics,
+``max_reason`` should be set to 2 (KMSG_DUMP_OOPS), to store only Panics
+``max_reason`` should be set to 1 (KMSG_DUMP_PANIC). Setting this to 0
+(KMSG_DUMP_UNDEF), means the reason filtering will be controlled by the
+``printk.always_kmsg_dump`` boot param: if unset, it'll be KMSG_DUMP_OOPS,
+otherwise KMSG_DUMP_MAX.
+
+Configurations for driver
+-------------------------
+
+A device driver uses ``register_pstore_device`` with
+``struct pstore_device_info`` to register to pstore/blk.
+
+.. kernel-doc:: fs/pstore/blk.c
+ :export:
+
+Compression and header
+----------------------
+
+Block device is large enough for uncompressed oops data. Actually we do not
+recommend data compression because pstore/blk will insert some information into
+the first line of oops/panic data. For example::
+
+ Panic: Total 16 times
+
+It means that it's OOPS|Panic for the 16th time since the first booting.
+Sometimes the number of occurrences of oops|panic since the first booting is
+important to judge whether the system is stable.
+
+The following line is inserted by pstore filesystem. For example::
+
+ Oops#2 Part1
+
+It means that it's OOPS for the 2nd time on the last boot.
+
+Reading the data
+----------------
+
+The dump data can be read from the pstore filesystem. The format for these
+files is ``dmesg-pstore-blk-[N]`` for oops/panic front-end,
+``pmsg-pstore-blk-0`` for pmsg front-end and so on. The timestamp of the
+dump file records the trigger time. To delete a stored record from block
+device, simply unlink the respective pstore file.
+
+Attentions in panic read/write APIs
+-----------------------------------
+
+If on panic, the kernel is not going to run for much longer, the tasks will not
+be scheduled and most kernel resources will be out of service. It
+looks like a single-threaded program running on a single-core computer.
+
+The following points require special attention for panic read/write APIs:
+
+1. Can **NOT** allocate any memory.
+ If you need memory, just allocate while the block driver is initializing
+ rather than waiting until the panic.
+#. Must be polled, **NOT** interrupt driven.
+ No task schedule any more. The block driver should delay to ensure the write
+ succeeds, but NOT sleep.
+#. Can **NOT** take any lock.
+ There is no other task, nor any shared resource; you are safe to break all
+ locks.
+#. Just use CPU to transfer.
+ Do not use DMA to transfer unless you are sure that DMA will not keep lock.
+#. Control registers directly.
+ Please control registers directly rather than use Linux kernel resources.
+ Do I/O map while initializing rather than wait until a panic occurs.
+#. Reset your block device and controller if necessary.
+ If you are not sure of the state of your block device and controller when
+ a panic occurs, you are safe to stop and reset them.
+
+pstore/blk supports psblk_blkdev_info(), which is defined in
+*linux/pstore_blk.h*, to get information of using block device, such as the
+device number, sector count and start sector of the whole disk.
+
+pstore block internals
+----------------------
+
+For developer reference, here are all the important structures and APIs:
+
+.. kernel-doc:: fs/pstore/zone.c
+ :internal:
+
+.. kernel-doc:: include/linux/pstore_zone.h
+ :internal:
+
+.. kernel-doc:: include/linux/pstore_blk.h
+ :internal:
diff --git a/Documentation/admin-guide/quickly-build-trimmed-linux.rst b/Documentation/admin-guide/quickly-build-trimmed-linux.rst
new file mode 100644
index 000000000000..f08149bc53f8
--- /dev/null
+++ b/Documentation/admin-guide/quickly-build-trimmed-linux.rst
@@ -0,0 +1,1097 @@
+.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0)
+.. [see the bottom of this file for redistribution information]
+
+===========================================
+How to quickly build a trimmed Linux kernel
+===========================================
+
+This guide explains how to swiftly build Linux kernels that are ideal for
+testing purposes, but perfectly fine for day-to-day use, too.
+
+The essence of the process (aka 'TL;DR')
+========================================
+
+*[If you are new to compiling Linux, ignore this TLDR and head over to the next
+section below: it contains a step-by-step guide, which is more detailed, but
+still brief and easy to follow; that guide and its accompanying reference
+section also mention alternatives, pitfalls, and additional aspects, all of
+which might be relevant for you.]*
+
+If your system uses techniques like Secure Boot, prepare it to permit starting
+self-compiled Linux kernels; install compilers and everything else needed for
+building Linux; make sure to have 12 Gigabyte free space in your home directory.
+Now run the following commands to download fresh Linux mainline sources, which
+you then use to configure, build and install your own kernel::
+
+ git clone --depth 1 -b master \
+ https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git ~/linux/
+ cd ~/linux/
+ # Hint: if you want to apply patches, do it at this point. See below for details.
+ # Hint: it's recommended to tag your build at this point. See below for details.
+ yes "" | make localmodconfig
+ # Hint: at this point you might want to adjust the build configuration; you'll
+ # have to, if you are running Debian. See below for details.
+ make -j $(nproc --all)
+ # Note: on many commodity distributions the next command suffices, but on Arch
+ # Linux, its derivatives, and some others it does not. See below for details.
+ command -v installkernel && sudo make modules_install install
+ reboot
+
+If you later want to build a newer mainline snapshot, use these commands::
+
+ cd ~/linux/
+ git fetch --depth 1 origin
+ # Note: the next command will discard any changes you did to the code:
+ git checkout --force --detach origin/master
+ # Reminder: if you want to (re)apply patches, do it at this point.
+ # Reminder: you might want to add or modify a build tag at this point.
+ make olddefconfig
+ make -j $(nproc --all)
+ # Reminder: the next command on some distributions does not suffice.
+ command -v installkernel && sudo make modules_install install
+ reboot
+
+Step-by-step guide
+==================
+
+Compiling your own Linux kernel is easy in principle. There are various ways to
+do it. Which of them actually work and is the best depends on the circumstances.
+
+This guide describes a way perfectly suited for those who want to quickly
+install Linux from sources without being bothered by complicated details; the
+goal is to cover everything typically needed on mainstream Linux distributions
+running on commodity PC or server hardware.
+
+The described approach is great for testing purposes, for example to try a
+proposed fix or to check if a problem was already fixed in the latest codebase.
+Nonetheless, kernels built this way are also totally fine for day-to-day use
+while at the same time being easy to keep up to date.
+
+The following steps describe the important aspects of the process; a
+comprehensive reference section later explains each of them in more detail. It
+sometimes also describes alternative approaches, pitfalls, as well as errors
+that might occur at a particular point -- and how to then get things rolling
+again.
+
+..
+ Note: if you see this note, you are reading the text's source file. You
+ might want to switch to a rendered version, as it makes it a lot easier to
+ quickly look something up in the reference section and afterwards jump back
+ to where you left off. Find a the latest rendered version here:
+ https://docs.kernel.org/admin-guide/quickly-build-trimmed-linux.html
+
+.. _backup_sbs:
+
+ * Create a fresh backup and put system repair and restore tools at hand, just
+ to be prepared for the unlikely case of something going sideways.
+
+ [:ref:`details<backup>`]
+
+.. _secureboot_sbs:
+
+ * On platforms with 'Secure Boot' or similar techniques, prepare everything to
+ ensure the system will permit your self-compiled kernel to boot later. The
+ quickest and easiest way to achieve this on commodity x86 systems is to
+ disable such techniques in the BIOS setup utility; alternatively, remove
+ their restrictions through a process initiated by
+ ``mokutil --disable-validation``.
+
+ [:ref:`details<secureboot>`]
+
+.. _buildrequires_sbs:
+
+ * Install all software required to build a Linux kernel. Often you will need:
+ 'bc', 'binutils' ('ld' et al.), 'bison', 'flex', 'gcc', 'git', 'openssl',
+ 'pahole', 'perl', and the development headers for 'libelf' and 'openssl'. The
+ reference section shows how to quickly install those on various popular Linux
+ distributions.
+
+ [:ref:`details<buildrequires>`]
+
+.. _diskspace_sbs:
+
+ * Ensure to have enough free space for building and installing Linux. For the
+ latter 150 Megabyte in /lib/ and 100 in /boot/ are a safe bet. For storing
+ sources and build artifacts 12 Gigabyte in your home directory should
+ typically suffice. If you have less available, be sure to check the reference
+ section for the step that explains adjusting your kernels build
+ configuration: it mentions a trick that reduce the amount of required space
+ in /home/ to around 4 Gigabyte.
+
+ [:ref:`details<diskspace>`]
+
+.. _sources_sbs:
+
+ * Retrieve the sources of the Linux version you intend to build; then change
+ into the directory holding them, as all further commands in this guide are
+ meant to be executed from there.
+
+ *[Note: the following paragraphs describe how to retrieve the sources by
+ partially cloning the Linux stable git repository. This is called a shallow
+ clone. The reference section explains two alternatives:* :ref:`packaged
+ archives<sources_archive>` *and* :ref:`a full git clone<sources_full>` *;
+ prefer the latter, if downloading a lot of data does not bother you, as that
+ will avoid some* :ref:`peculiar characteristics of shallow clones the
+ reference section explains<sources_shallow>` *.]*
+
+ First, execute the following command to retrieve a fresh mainline codebase::
+
+ git clone --no-checkout --depth 1 -b master \
+ https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git ~/linux/
+ cd ~/linux/
+
+ If you want to access recent mainline releases and pre-releases, deepen you
+ clone's history to the oldest mainline version you are interested in::
+
+ git fetch --shallow-exclude=v6.0 origin
+
+ In case you want to access a stable/longterm release (say v6.1.5), simply add
+ the branch holding that series; afterwards fetch the history at least up to
+ the mainline version that started the series (v6.1)::
+
+ git remote set-branches --add origin linux-6.1.y
+ git fetch --shallow-exclude=v6.0 origin
+
+ Now checkout the code you are interested in. If you just performed the
+ initial clone, you will be able to check out a fresh mainline codebase, which
+ is ideal for checking whether developers already fixed an issue::
+
+ git checkout --detach origin/master
+
+ If you deepened your clone, you instead of ``origin/master`` can specify the
+ version you deepened to (``v6.0`` above); later releases like ``v6.1`` and
+ pre-release like ``v6.2-rc1`` will work, too. Stable or longterm versions
+ like ``v6.1.5`` work just the same, if you added the appropriate
+ stable/longterm branch as described.
+
+ [:ref:`details<sources>`]
+
+.. _patching_sbs:
+
+ * In case you want to apply a kernel patch, do so now. Often a command like
+ this will do the trick::
+
+ patch -p1 < ../proposed-fix.patch
+
+ If the ``-p1`` is actually needed, depends on how the patch was created; in
+ case it does not apply thus try without it.
+
+ If you cloned the sources with git and anything goes sideways, run ``git
+ reset --hard`` to undo any changes to the sources.
+
+ [:ref:`details<patching>`]
+
+.. _tagging_sbs:
+
+ * If you patched your kernel or have one of the same version installed already,
+ better add a unique tag to the one you are about to build::
+
+ echo "-proposed_fix" > localversion
+
+ Running ``uname -r`` under your kernel later will then print something like
+ '6.1-rc4-proposed_fix'.
+
+ [:ref:`details<tagging>`]
+
+ .. _configuration_sbs:
+
+ * Create the build configuration for your kernel based on an existing
+ configuration.
+
+ If you already prepared such a '.config' file yourself, copy it to
+ ~/linux/ and run ``make olddefconfig``.
+
+ Use the same command, if your distribution or somebody else already tailored
+ your running kernel to your or your hardware's needs: the make target
+ 'olddefconfig' will then try to use that kernel's .config as base.
+
+ Using this make target is fine for everybody else, too -- but you often can
+ save a lot of time by using this command instead::
+
+ yes "" | make localmodconfig
+
+ This will try to pick your distribution's kernel as base, but then disable
+ modules for any features apparently superfluous for your setup. This will
+ reduce the compile time enormously, especially if you are running an
+ universal kernel from a commodity Linux distribution.
+
+ There is a catch: 'localmodconfig' is likely to disable kernel features you
+ did not use since you booted your Linux -- like drivers for currently
+ disconnected peripherals or a virtualization software not haven't used yet.
+ You can reduce or nearly eliminate that risk with tricks the reference
+ section outlines; but when building a kernel just for quick testing purposes
+ it is often negligible if such features are missing. But you should keep that
+ aspect in mind when using a kernel built with this make target, as it might
+ be the reason why something you only use occasionally stopped working.
+
+ [:ref:`details<configuration>`]
+
+.. _configmods_sbs:
+
+ * Check if you might want to or have to adjust some kernel configuration
+ options:
+
+ * Evaluate how you want to handle debug symbols. Enable them, if you later
+ might need to decode a stack trace found for example in a 'panic', 'Oops',
+ 'warning', or 'BUG'; on the other hand disable them, if you are short on
+ storage space or prefer a smaller kernel binary. See the reference section
+ for details on how to do either. If neither applies, it will likely be fine
+ to simply not bother with this. [:ref:`details<configmods_debugsymbols>`]
+
+ * Are you running Debian? Then to avoid known problems by performing
+ additional adjustments explained in the reference section.
+ [:ref:`details<configmods_distros>`].
+
+ * If you want to influence the other aspects of the configuration, do so now
+ by using make targets like 'menuconfig' or 'xconfig'.
+ [:ref:`details<configmods_individual>`].
+
+.. _build_sbs:
+
+ * Build the image and the modules of your kernel::
+
+ make -j $(nproc --all)
+
+ If you want your kernel packaged up as deb, rpm, or tar file, see the
+ reference section for alternatives.
+
+ [:ref:`details<build>`]
+
+.. _install_sbs:
+
+ * Now install your kernel::
+
+ command -v installkernel && sudo make modules_install install
+
+ Often all left for you to do afterwards is a ``reboot``, as many commodity
+ Linux distributions will then create an initramfs (also known as initrd) and
+ an entry for your kernel in your bootloader's configuration; but on some
+ distributions you have to take care of these two steps manually for reasons
+ the reference section explains.
+
+ On a few distributions like Arch Linux and its derivatives the above command
+ does nothing at all; in that case you have to manually install your kernel,
+ as outlined in the reference section.
+
+ If you are running a immutable Linux distribution, check its documentation
+ and the web to find out how to install your own kernel there.
+
+ [:ref:`details<install>`]
+
+.. _another_sbs:
+
+ * To later build another kernel you need similar steps, but sometimes slightly
+ different commands.
+
+ First, switch back into the sources tree::
+
+ cd ~/linux/
+
+ In case you want to build a version from a stable/longterm series you have
+ not used yet (say 6.2.y), tell git to track it::
+
+ git remote set-branches --add origin linux-6.2.y
+
+ Now fetch the latest upstream changes; you again need to specify the earliest
+ version you care about, as git otherwise might retrieve the entire commit
+ history::
+
+ git fetch --shallow-exclude=v6.0 origin
+
+ Now switch to the version you are interested in -- but be aware the command
+ used here will discard any modifications you performed, as they would
+ conflict with the sources you want to checkout::
+
+ git checkout --force --detach origin/master
+
+ At this point you might want to patch the sources again or set/modify a build
+ tag, as explained earlier. Afterwards adjust the build configuration to the
+ new codebase using olddefconfig, which will now adjust the configuration file
+ you prepared earlier using localmodconfig (~/linux/.config) for your next
+ kernel::
+
+ # reminder: if you want to apply patches, do it at this point
+ # reminder: you might want to update your build tag at this point
+ make olddefconfig
+
+ Now build your kernel::
+
+ make -j $(nproc --all)
+
+ Afterwards install the kernel as outlined above::
+
+ command -v installkernel && sudo make modules_install install
+
+ [:ref:`details<another>`]
+
+.. _uninstall_sbs:
+
+ * Your kernel is easy to remove later, as its parts are only stored in two
+ places and clearly identifiable by the kernel's release name. Just ensure to
+ not delete the kernel you are running, as that might render your system
+ unbootable.
+
+ Start by deleting the directory holding your kernel's modules, which is named
+ after its release name -- '6.0.1-foobar' in the following example::
+
+ sudo rm -rf /lib/modules/6.0.1-foobar
+
+ Now try the following command, which on some distributions will delete all
+ other kernel files installed while also removing the kernel's entry from the
+ bootloader configuration::
+
+ command -v kernel-install && sudo kernel-install -v remove 6.0.1-foobar
+
+ If that command does not output anything or fails, see the reference section;
+ do the same if any files named '*6.0.1-foobar*' remain in /boot/.
+
+ [:ref:`details<uninstall>`]
+
+.. _submit_improvements:
+
+Did you run into trouble following any of the above steps that is not cleared up
+by the reference section below? Or do you have ideas how to improve the text?
+Then please take a moment of your time and let the maintainer of this document
+know by email (Thorsten Leemhuis <linux@leemhuis.info>), ideally while CCing the
+Linux docs mailing list (linux-doc@vger.kernel.org). Such feedback is vital to
+improve this document further, which is in everybody's interest, as it will
+enable more people to master the task described here.
+
+Reference section for the step-by-step guide
+============================================
+
+This section holds additional information for each of the steps in the above
+guide.
+
+.. _backup:
+
+Prepare for emergencies
+-----------------------
+
+ *Create a fresh backup and put system repair and restore tools at hand*
+ [:ref:`... <backup_sbs>`]
+
+Remember, you are dealing with computers, which sometimes do unexpected things
+-- especially if you fiddle with crucial parts like the kernel of an operating
+system. That's what you are about to do in this process. Hence, better prepare
+for something going sideways, even if that should not happen.
+
+[:ref:`back to step-by-step guide <backup_sbs>`]
+
+.. _secureboot:
+
+Dealing with techniques like Secure Boot
+----------------------------------------
+
+ *On platforms with 'Secure Boot' or similar techniques, prepare everything to
+ ensure the system will permit your self-compiled kernel to boot later.*
+ [:ref:`... <secureboot_sbs>`]
+
+Many modern systems allow only certain operating systems to start; they thus by
+default will reject booting self-compiled kernels.
+
+You ideally deal with this by making your platform trust your self-built kernels
+with the help of a certificate and signing. How to do that is not described
+here, as it requires various steps that would take the text too far away from
+its purpose; 'Documentation/admin-guide/module-signing.rst' and various web
+sides already explain this in more detail.
+
+Temporarily disabling solutions like Secure Boot is another way to make your own
+Linux boot. On commodity x86 systems it is possible to do this in the BIOS Setup
+utility; the steps to do so are not described here, as they greatly vary between
+machines.
+
+On mainstream x86 Linux distributions there is a third and universal option:
+disable all Secure Boot restrictions for your Linux environment. You can
+initiate this process by running ``mokutil --disable-validation``; this will
+tell you to create a one-time password, which is safe to write down. Now
+restart; right after your BIOS performed all self-tests the bootloader Shim will
+show a blue box with a message 'Press any key to perform MOK management'. Hit
+some key before the countdown exposes. This will open a menu and choose 'Change
+Secure Boot state' there. Shim's 'MokManager' will now ask you to enter three
+randomly chosen characters from the one-time password specified earlier. Once
+you provided them, confirm that you really want to disable the validation.
+Afterwards, permit MokManager to reboot the machine.
+
+[:ref:`back to step-by-step guide <secureboot_sbs>`]
+
+.. _buildrequires:
+
+Install build requirements
+--------------------------
+
+ *Install all software required to build a Linux kernel.*
+ [:ref:`...<buildrequires_sbs>`]
+
+The kernel is pretty stand-alone, but besides tools like the compiler you will
+sometimes need a few libraries to build one. How to install everything needed
+depends on your Linux distribution and the configuration of the kernel you are
+about to build.
+
+Here are a few examples what you typically need on some mainstream
+distributions:
+
+ * Debian, Ubuntu, and derivatives::
+
+ sudo apt install bc binutils bison dwarves flex gcc git make openssl \
+ pahole perl-base libssl-dev libelf-dev
+
+ * Fedora and derivatives::
+
+ sudo dnf install binutils /usr/include/{libelf.h,openssl/pkcs7.h} \
+ /usr/bin/{bc,bison,flex,gcc,git,openssl,make,perl,pahole}
+
+ * openSUSE and derivatives::
+
+ sudo zypper install bc binutils bison dwarves flex gcc git make perl-base \
+ openssl openssl-devel libelf-dev
+
+In case you wonder why these lists include openssl and its development headers:
+they are needed for the Secure Boot support, which many distributions enable in
+their kernel configuration for x86 machines.
+
+Sometimes you will need tools for compression formats like bzip2, gzip, lz4,
+lzma, lzo, xz, or zstd as well.
+
+You might need additional libraries and their development headers in case you
+perform tasks not covered in this guide. For example, zlib will be needed when
+building kernel tools from the tools/ directory; adjusting the build
+configuration with make targets like 'menuconfig' or 'xconfig' will require
+development headers for ncurses or Qt5.
+
+[:ref:`back to step-by-step guide <buildrequires_sbs>`]
+
+.. _diskspace:
+
+Space requirements
+------------------
+
+ *Ensure to have enough free space for building and installing Linux.*
+ [:ref:`... <diskspace_sbs>`]
+
+The numbers mentioned are rough estimates with a big extra charge to be on the
+safe side, so often you will need less.
+
+If you have space constraints, remember to read the reference section when you
+reach the :ref:`section about configuration adjustments' <configmods>`, as
+ensuring debug symbols are disabled will reduce the consumed disk space by quite
+a few gigabytes.
+
+[:ref:`back to step-by-step guide <diskspace_sbs>`]
+
+
+.. _sources:
+
+Download the sources
+--------------------
+
+ *Retrieve the sources of the Linux version you intend to build.*
+ [:ref:`...<sources_sbs>`]
+
+The step-by-step guide outlines how to retrieve Linux' sources using a shallow
+git clone. There is :ref:`more to tell about this method<sources_shallow>` and
+two alternate ways worth describing: :ref:`packaged archives<sources_archive>`
+and :ref:`a full git clone<sources_full>`. And the aspects ':ref:`wouldn't it
+be wiser to use a proper pre-release than the latest mainline code
+<sources_snapshot>`' and ':ref:`how to get an even fresher mainline codebase
+<sources_fresher>`' need elaboration, too.
+
+Note, to keep things simple the commands used in this guide store the build
+artifacts in the source tree. If you prefer to separate them, simply add
+something like ``O=~/linux-builddir/`` to all make calls; also adjust the path
+in all commands that add files or modify any generated (like your '.config').
+
+[:ref:`back to step-by-step guide <sources_sbs>`]
+
+.. _sources_shallow:
+
+Noteworthy characteristics of shallow clones
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The step-by-step guide uses a shallow clone, as it is the best solution for most
+of this document's target audience. There are a few aspects of this approach
+worth mentioning:
+
+ * This document in most places uses ``git fetch`` with ``--shallow-exclude=``
+ to specify the earliest version you care about (or to be precise: its git
+ tag). You alternatively can use the parameter ``--shallow-since=`` to specify
+ an absolute (say ``'2023-07-15'``) or relative (``'12 months'``) date to
+ define the depth of the history you want to download. As a second
+ alternative, you can also specify a certain depth explicitly with a parameter
+ like ``--depth=1``, unless you add branches for stable/longterm kernels.
+
+ * When running ``git fetch``, remember to always specify the oldest version,
+ the time you care about, or an explicit depth as shown in the step-by-step
+ guide. Otherwise you will risk downloading nearly the entire git history,
+ which will consume quite a bit of time and bandwidth while also stressing the
+ servers.
+
+ Note, you do not have to use the same version or date all the time. But when
+ you change it over time, git will deepen or flatten the history to the
+ specified point. That allows you to retrieve versions you initially thought
+ you did not need -- or it will discard the sources of older versions, for
+ example in case you want to free up some disk space. The latter will happen
+ automatically when using ``--shallow-since=`` or
+ ``--depth=``.
+
+ * Be warned, when deepening your clone you might encounter an error like
+ 'fatal: error in object: unshallow cafecaca0c0dacafecaca0c0dacafecaca0c0da'.
+ In that case run ``git repack -d`` and try again``
+
+ * In case you want to revert changes from a certain version (say Linux 6.3) or
+ perform a bisection (v6.2..v6.3), better tell ``git fetch`` to retrieve
+ objects up to three versions earlier (e.g. 6.0): ``git describe`` will then
+ be able to describe most commits just like it would in a full git clone.
+
+[:ref:`back to step-by-step guide <sources_sbs>`] [:ref:`back to section intro <sources>`]
+
+.. _sources_archive:
+
+Downloading the sources using a packages archive
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+People new to compiling Linux often assume downloading an archive via the
+front-page of https://kernel.org is the best approach to retrieve Linux'
+sources. It actually can be, if you are certain to build just one particular
+kernel version without changing any code. Thing is: you might be sure this will
+be the case, but in practice it often will turn out to be a wrong assumption.
+
+That's because when reporting or debugging an issue developers will often ask to
+give another version a try. They also might suggest temporarily undoing a commit
+with ``git revert`` or might provide various patches to try. Sometimes reporters
+will also be asked to use ``git bisect`` to find the change causing a problem.
+These things rely on git or are a lot easier and quicker to handle with it.
+
+A shallow clone also does not add any significant overhead. For example, when
+you use ``git clone --depth=1`` to create a shallow clone of the latest mainline
+codebase git will only retrieve a little more data than downloading the latest
+mainline pre-release (aka 'rc') via the front-page of kernel.org would.
+
+A shallow clone therefore is often the better choice. If you nevertheless want
+to use a packaged source archive, download one via kernel.org; afterwards
+extract its content to some directory and change to the subdirectory created
+during extraction. The rest of the step-by-step guide will work just fine, apart
+from things that rely on git -- but this mainly concerns the section on
+successive builds of other versions.
+
+[:ref:`back to step-by-step guide <sources_sbs>`] [:ref:`back to section intro <sources>`]
+
+.. _sources_full:
+
+Downloading the sources using a full git clone
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If downloading and storing a lot of data (~4,4 Gigabyte as of early 2023) is
+nothing that bothers you, instead of a shallow clone perform a full git clone
+instead. You then will avoid the specialties mentioned above and will have all
+versions and individual commits at hand at any time::
+
+ curl -L \
+ https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/clone.bundle \
+ -o linux-stable.git.bundle
+ git clone linux-stable.git.bundle ~/linux/
+ rm linux-stable.git.bundle
+ cd ~/linux/
+ git remote set-url origin \
+ https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
+ git fetch origin
+ git checkout --detach origin/master
+
+[:ref:`back to step-by-step guide <sources_sbs>`] [:ref:`back to section intro <sources>`]
+
+.. _sources_snapshot:
+
+Proper pre-releases (RCs) vs. latest mainline
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When cloning the sources using git and checking out origin/master, you often
+will retrieve a codebase that is somewhere between the latest and the next
+release or pre-release. This almost always is the code you want when giving
+mainline a shot: pre-releases like v6.1-rc5 are in no way special, as they do
+not get any significant extra testing before being published.
+
+There is one exception: you might want to stick to the latest mainline release
+(say v6.1) before its successor's first pre-release (v6.2-rc1) is out. That is
+because compiler errors and other problems are more likely to occur during this
+time, as mainline then is in its 'merge window': a usually two week long phase,
+in which the bulk of the changes for the next release is merged.
+
+[:ref:`back to step-by-step guide <sources_sbs>`] [:ref:`back to section intro <sources>`]
+
+.. _sources_fresher:
+
+Avoiding the mainline lag
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The explanations for both the shallow clone and the full clone both retrieve the
+code from the Linux stable git repository. That makes things simpler for this
+document's audience, as it allows easy access to both mainline and
+stable/longterm releases. This approach has just one downside:
+
+Changes merged into the mainline repository are only synced to the master branch
+of the Linux stable repository every few hours. This lag most of the time is
+not something to worry about; but in case you really need the latest code, just
+add the mainline repo as additional remote and checkout the code from there::
+
+ git remote add mainline \
+ https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
+ git fetch mainline
+ git checkout --detach mainline/master
+
+When doing this with a shallow clone, remember to call ``git fetch`` with one
+of the parameters described earlier to limit the depth.
+
+[:ref:`back to step-by-step guide <sources_sbs>`] [:ref:`back to section intro <sources>`]
+
+.. _patching:
+
+Patch the sources (optional)
+----------------------------
+
+ *In case you want to apply a kernel patch, do so now.*
+ [:ref:`...<patching_sbs>`]
+
+This is the point where you might want to patch your kernel -- for example when
+a developer proposed a fix and asked you to check if it helps. The step-by-step
+guide already explains everything crucial here.
+
+[:ref:`back to step-by-step guide <patching_sbs>`]
+
+.. _tagging:
+
+Tagging this kernel build (optional, often wise)
+------------------------------------------------
+
+ *If you patched your kernel or already have that kernel version installed,
+ better tag your kernel by extending its release name:*
+ [:ref:`...<tagging_sbs>`]
+
+Tagging your kernel will help avoid confusion later, especially when you patched
+your kernel. Adding an individual tag will also ensure the kernel's image and
+its modules are installed in parallel to any existing kernels.
+
+There are various ways to add such a tag. The step-by-step guide realizes one by
+creating a 'localversion' file in your build directory from which the kernel
+build scripts will automatically pick up the tag. You can later change that file
+to use a different tag in subsequent builds or simply remove that file to dump
+the tag.
+
+[:ref:`back to step-by-step guide <tagging_sbs>`]
+
+.. _configuration:
+
+Define the build configuration for your kernel
+----------------------------------------------
+
+ *Create the build configuration for your kernel based on an existing
+ configuration.* [:ref:`... <configuration_sbs>`]
+
+There are various aspects for this steps that require a more careful
+explanation:
+
+Pitfalls when using another configuration file as base
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Make targets like localmodconfig and olddefconfig share a few common snares you
+want to be aware of:
+
+ * These targets will reuse a kernel build configuration in your build directory
+ (e.g. '~/linux/.config'), if one exists. In case you want to start from
+ scratch you thus need to delete it.
+
+ * The make targets try to find the configuration for your running kernel
+ automatically, but might choose poorly. A line like '# using defaults found
+ in /boot/config-6.0.7-250.fc36.x86_64' or 'using config:
+ '/boot/config-6.0.7-250.fc36.x86_64' tells you which file they picked. If
+ that is not the intended one, simply store it as '~/linux/.config'
+ before using these make targets.
+
+ * Unexpected things might happen if you try to use a config file prepared for
+ one kernel (say v6.0) on an older generation (say v5.15). In that case you
+ might want to use a configuration as base which your distribution utilized
+ when they used that or an slightly older kernel version.
+
+Influencing the configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The make target olddefconfig and the ``yes "" |`` used when utilizing
+localmodconfig will set any undefined build options to their default value. This
+among others will disable many kernel features that were introduced after your
+base kernel was released.
+
+If you want to set these configurations options manually, use ``oldconfig``
+instead of ``olddefconfig`` or omit the ``yes "" |`` when utilizing
+localmodconfig. Then for each undefined configuration option you will be asked
+how to proceed. In case you are unsure what to answer, simply hit 'enter' to
+apply the default value.
+
+Big pitfall when using localmodconfig
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As explained briefly in the step-by-step guide already: with localmodconfig it
+can easily happen that your self-built kernel will lack modules for tasks you
+did not perform before utilizing this make target. That's because those tasks
+require kernel modules that are normally autoloaded when you perform that task
+for the first time; if you didn't perform that task at least once before using
+localmodonfig, the latter will thus assume these modules are superfluous and
+disable them.
+
+You can try to avoid this by performing typical tasks that often will autoload
+additional kernel modules: start a VM, establish VPN connections, loop-mount a
+CD/DVD ISO, mount network shares (CIFS, NFS, ...), and connect all external
+devices (2FA keys, headsets, webcams, ...) as well as storage devices with file
+systems you otherwise do not utilize (btrfs, ext4, FAT, NTFS, XFS, ...). But it
+is hard to think of everything that might be needed -- even kernel developers
+often forget one thing or another at this point.
+
+Do not let that risk bother you, especially when compiling a kernel only for
+testing purposes: everything typically crucial will be there. And if you forget
+something important you can turn on a missing feature later and quickly run the
+commands to compile and install a better kernel.
+
+But if you plan to build and use self-built kernels regularly, you might want to
+reduce the risk by recording which modules your system loads over the course of
+a few weeks. You can automate this with `modprobed-db
+<https://github.com/graysky2/modprobed-db>`_. Afterwards use ``LSMOD=<path>`` to
+point localmodconfig to the list of modules modprobed-db noticed being used::
+
+ yes "" | make LSMOD="${HOME}"/.config/modprobed.db localmodconfig
+
+Remote building with localmodconfig
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If you want to use localmodconfig to build a kernel for another machine, run
+``lsmod > lsmod_foo-machine`` on it and transfer that file to your build host.
+Now point the build scripts to the file like this: ``yes "" | make
+LSMOD=~/lsmod_foo-machine localmodconfig``. Note, in this case
+you likely want to copy a base kernel configuration from the other machine over
+as well and place it as .config in your build directory.
+
+[:ref:`back to step-by-step guide <configuration_sbs>`]
+
+.. _configmods:
+
+Adjust build configuration
+--------------------------
+
+ *Check if you might want to or have to adjust some kernel configuration
+ options:*
+
+Depending on your needs you at this point might want or have to adjust some
+kernel configuration options.
+
+.. _configmods_debugsymbols:
+
+Debug symbols
+~~~~~~~~~~~~~
+
+ *Evaluate how you want to handle debug symbols.*
+ [:ref:`...<configmods_sbs>`]
+
+Most users do not need to care about this, it's often fine to leave everything
+as it is; but you should take a closer look at this, if you might need to decode
+a stack trace or want to reduce space consumption.
+
+Having debug symbols available can be important when your kernel throws a
+'panic', 'Oops', 'warning', or 'BUG' later when running, as then you will be
+able to find the exact place where the problem occurred in the code. But
+collecting and embedding the needed debug information takes time and consumes
+quite a bit of space: in late 2022 the build artifacts for a typical x86 kernel
+configured with localmodconfig consumed around 5 Gigabyte of space with debug
+symbols, but less than 1 when they were disabled. The resulting kernel image and
+the modules are bigger as well, which increases load times.
+
+Hence, if you want a small kernel and are unlikely to decode a stack trace
+later, you might want to disable debug symbols to avoid above downsides::
+
+ ./scripts/config --file .config -d DEBUG_INFO \
+ -d DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT -d DEBUG_INFO_DWARF4 \
+ -d DEBUG_INFO_DWARF5 -e CONFIG_DEBUG_INFO_NONE
+ make olddefconfig
+
+You on the other hand definitely want to enable them, if there is a decent
+chance that you need to decode a stack trace later (as explained by 'Decode
+failure messages' in Documentation/admin-guide/tainted-kernels.rst in more
+detail)::
+
+ ./scripts/config --file .config -d DEBUG_INFO_NONE -e DEBUG_KERNEL
+ -e DEBUG_INFO -e DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT -e KALLSYMS -e KALLSYMS_ALL
+ make olddefconfig
+
+Note, many mainstream distributions enable debug symbols in their kernel
+configurations -- make targets like localmodconfig and olddefconfig thus will
+often pick that setting up.
+
+[:ref:`back to step-by-step guide <configmods_sbs>`]
+
+.. _configmods_distros:
+
+Distro specific adjustments
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ *Are you running* [:ref:`... <configmods_sbs>`]
+
+The following sections help you to avoid build problems that are known to occur
+when following this guide on a few commodity distributions.
+
+**Debian:**
+
+ * Remove a stale reference to a certificate file that would cause your build to
+ fail::
+
+ ./scripts/config --file .config --set-str SYSTEM_TRUSTED_KEYS ''
+
+ Alternatively, download the needed certificate and make that configuration
+ option point to it, as `the Debian handbook explains in more detail
+ <https://debian-handbook.info/browse/stable/sect.kernel-compilation.html>`_
+ -- or generate your own, as explained in
+ Documentation/admin-guide/module-signing.rst.
+
+[:ref:`back to step-by-step guide <configmods_sbs>`]
+
+.. _configmods_individual:
+
+Individual adjustments
+~~~~~~~~~~~~~~~~~~~~~~
+
+ *If you want to influence the other aspects of the configuration, do so
+ now* [:ref:`... <configmods_sbs>`]
+
+You at this point can use a command like ``make menuconfig`` to enable or
+disable certain features using a text-based user interface; to use a graphical
+configuration utilize, use the make target ``xconfig`` or ``gconfig`` instead.
+All of them require development libraries from toolkits they are based on
+(ncurses, Qt5, Gtk2); an error message will tell you if something required is
+missing.
+
+[:ref:`back to step-by-step guide <configmods_sbs>`]
+
+.. _build:
+
+Build your kernel
+-----------------
+
+ *Build the image and the modules of your kernel* [:ref:`... <build_sbs>`]
+
+A lot can go wrong at this stage, but the instructions below will help you help
+yourself. Another subsection explains how to directly package your kernel up as
+deb, rpm or tar file.
+
+Dealing with build errors
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When a build error occurs, it might be caused by some aspect of your machine's
+setup that often can be fixed quickly; other times though the problem lies in
+the code and can only be fixed by a developer. A close examination of the
+failure messages coupled with some research on the internet will often tell you
+which of the two it is. To perform such a investigation, restart the build
+process like this::
+
+ make V=1
+
+The ``V=1`` activates verbose output, which might be needed to see the actual
+error. To make it easier to spot, this command also omits the ``-j $(nproc
+--all)`` used earlier to utilize every CPU core in the system for the job -- but
+this parallelism also results in some clutter when failures occur.
+
+After a few seconds the build process should run into the error again. Now try
+to find the most crucial line describing the problem. Then search the internet
+for the most important and non-generic section of that line (say 4 to 8 words);
+avoid or remove anything that looks remotely system-specific, like your username
+or local path names like ``/home/username/linux/``. First try your regular
+internet search engine with that string, afterwards search Linux kernel mailing
+lists via `lore.kernel.org/all/ <https://lore.kernel.org/all/>`_.
+
+This most of the time will find something that will explain what is wrong; quite
+often one of the hits will provide a solution for your problem, too. If you
+do not find anything that matches your problem, try again from a different angle
+by modifying your search terms or using another line from the error messages.
+
+In the end, most trouble you are to run into has likely been encountered and
+reported by others already. That includes issues where the cause is not your
+system, but lies the code. If you run into one of those, you might thus find a
+solution (e.g. a patch) or workaround for your problem, too.
+
+Package your kernel up
+~~~~~~~~~~~~~~~~~~~~~~
+
+The step-by-step guide uses the default make targets (e.g. 'bzImage' and
+'modules' on x86) to build the image and the modules of your kernel, which later
+steps of the guide then install. You instead can also directly build everything
+and directly package it up by using one of the following targets:
+
+ * ``make -j $(nproc --all) bindeb-pkg`` to generate a deb package
+
+ * ``make -j $(nproc --all) binrpm-pkg`` to generate a rpm package
+
+ * ``make -j $(nproc --all) tarbz2-pkg`` to generate a bz2 compressed tarball
+
+This is just a selection of available make targets for this purpose, see
+``make help`` for others. You can also use these targets after running
+``make -j $(nproc --all)``, as they will pick up everything already built.
+
+If you employ the targets to generate deb or rpm packages, ignore the
+step-by-step guide's instructions on installing and removing your kernel;
+instead install and remove the packages using the package utility for the format
+(e.g. dpkg and rpm) or a package management utility build on top of them (apt,
+aptitude, dnf/yum, zypper, ...). Be aware that the packages generated using
+these two make targets are designed to work on various distributions utilizing
+those formats, they thus will sometimes behave differently than your
+distribution's kernel packages.
+
+[:ref:`back to step-by-step guide <build_sbs>`]
+
+.. _install:
+
+Install your kernel
+-------------------
+
+ *Now install your kernel* [:ref:`... <install_sbs>`]
+
+What you need to do after executing the command in the step-by-step guide
+depends on the existence and the implementation of an ``installkernel``
+executable. Many commodity Linux distributions ship such a kernel installer in
+``/sbin/`` that does everything needed, hence there is nothing left for you
+except rebooting. But some distributions contain an installkernel that does
+only part of the job -- and a few lack it completely and leave all the work to
+you.
+
+If ``installkernel`` is found, the kernel's build system will delegate the
+actual installation of your kernel's image and related files to this executable.
+On almost all Linux distributions it will store the image as '/boot/vmlinuz-
+<your kernel's release name>' and put a 'System.map-<your kernel's release
+name>' alongside it. Your kernel will thus be installed in parallel to any
+existing ones, unless you already have one with exactly the same release name.
+
+Installkernel on many distributions will afterwards generate an 'initramfs'
+(often also called 'initrd'), which commodity distributions rely on for booting;
+hence be sure to keep the order of the two make targets used in the step-by-step
+guide, as things will go sideways if you install your kernel's image before its
+modules. Often installkernel will then add your kernel to the bootloader
+configuration, too. You have to take care of one or both of these tasks
+yourself, if your distributions installkernel doesn't handle them.
+
+A few distributions like Arch Linux and its derivatives totally lack an
+installkernel executable. On those just install the modules using the kernel's
+build system and then install the image and the System.map file manually::
+
+ sudo make modules_install
+ sudo install -m 0600 $(make -s image_name) /boot/vmlinuz-$(make -s kernelrelease)
+ sudo install -m 0600 System.map /boot/System.map-$(make -s kernelrelease)
+
+If your distribution boots with the help of an initramfs, now generate one for
+your kernel using the tools your distribution provides for this process.
+Afterwards add your kernel to your bootloader configuration and reboot.
+
+[:ref:`back to step-by-step guide <install_sbs>`]
+
+.. _another:
+
+Another round later
+-------------------
+
+ *To later build another kernel you need similar, but sometimes slightly
+ different commands* [:ref:`... <another_sbs>`]
+
+The process to build later kernels is similar, but at some points slightly
+different. You for example do not want to use 'localmodconfig' for succeeding
+kernel builds, as you already created a trimmed down configuration you want to
+use from now on. Hence instead just use ``oldconfig`` or ``olddefconfig`` to
+adjust your build configurations to the needs of the kernel version you are
+about to build.
+
+If you created a shallow-clone with git, remember what the :ref:`section that
+explained the setup described in more detail <sources>`: you need to use a
+slightly different ``git fetch`` command and when switching to another series
+need to add an additional remote branch.
+
+[:ref:`back to step-by-step guide <another_sbs>`]
+
+.. _uninstall:
+
+Uninstall the kernel later
+--------------------------
+
+ *All parts of your installed kernel are identifiable by its release name and
+ thus easy to remove later.* [:ref:`... <uninstall_sbs>`]
+
+Do not worry installing your kernel manually and thus bypassing your
+distribution's packaging system will totally mess up your machine: all parts of
+your kernel are easy to remove later, as files are stored in two places only and
+normally identifiable by the kernel's release name.
+
+One of the two places is a directory in /lib/modules/, which holds the modules
+for each installed kernel. This directory is named after the kernel's release
+name; hence, to remove all modules for one of your kernels, simply remove its
+modules directory in /lib/modules/.
+
+The other place is /boot/, where typically one to five files will be placed
+during installation of a kernel. All of them usually contain the release name in
+their file name, but how many files and their name depends somewhat on your
+distribution's installkernel executable (:ref:`see above <install>`) and its
+initramfs generator. On some distributions the ``kernel-install`` command
+mentioned in the step-by-step guide will remove all of these files for you --
+and the entry for your kernel in the bootloader configuration at the same time,
+too. On others you have to take care of these steps yourself. The following
+command should interactively remove the two main files of a kernel with the
+release name '6.0.1-foobar'::
+
+ rm -i /boot/{System.map,vmlinuz}-6.0.1-foobar
+
+Now remove the belonging initramfs, which often will be called something like
+``/boot/initramfs-6.0.1-foobar.img`` or ``/boot/initrd.img-6.0.1-foobar``.
+Afterwards check for other files in /boot/ that have '6.0.1-foobar' in their
+name and delete them as well. Now remove the kernel from your bootloader's
+configuration.
+
+Note, be very careful with wildcards like '*' when deleting files or directories
+for kernels manually: you might accidentally remove files of a 6.0.11 kernel
+when all you want is to remove 6.0 or 6.0.1.
+
+[:ref:`back to step-by-step guide <uninstall_sbs>`]
+
+.. _faq:
+
+FAQ
+===
+
+Why does this 'how-to' not work on my system?
+---------------------------------------------
+
+As initially stated, this guide is 'designed to cover everything typically
+needed [to build a kernel] on mainstream Linux distributions running on
+commodity PC or server hardware'. The outlined approach despite this should work
+on many other setups as well. But trying to cover every possible use-case in one
+guide would defeat its purpose, as without such a focus you would need dozens or
+hundreds of constructs along the lines of 'in case you are having <insert
+machine or distro>, you at this point have to do <this and that>
+<instead|additionally>'. Each of which would make the text longer, more
+complicated, and harder to follow.
+
+That being said: this of course is a balancing act. Hence, if you think an
+additional use-case is worth describing, suggest it to the maintainers of this
+document, as :ref:`described above <submit_improvements>`.
+
+
+..
+ end-of-content
+..
+ This document is maintained by Thorsten Leemhuis <linux@leemhuis.info>. If
+ you spot a typo or small mistake, feel free to let him know directly and
+ he'll fix it. You are free to do the same in a mostly informal way if you
+ want to contribute changes to the text -- but for copyright reasons please CC
+ linux-doc@vger.kernel.org and 'sign-off' your contribution as
+ Documentation/process/submitting-patches.rst explains in the section 'Sign
+ your work - the Developer's Certificate of Origin'.
+..
+ This text is available under GPL-2.0+ or CC-BY-4.0, as stated at the top
+ of the file. If you want to distribute this text under CC-BY-4.0 only,
+ please use 'The Linux kernel development community' for author attribution
+ and link this as source:
+ https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/quickly-build-trimmed-linux.rst
+..
+ Note: Only the content of this RST file as found in the Linux kernel sources
+ is available under CC-BY-4.0, as versions of this text that were processed
+ (for example by the kernel's build system) might contain content taken from
+ files which use a more restrictive license.
+
diff --git a/Documentation/admin-guide/ramoops.rst b/Documentation/admin-guide/ramoops.rst
index 6dbcc5481000..e9f85142182d 100644
--- a/Documentation/admin-guide/ramoops.rst
+++ b/Documentation/admin-guide/ramoops.rst
@@ -3,7 +3,7 @@ Ramoops oops/panic logger
Sergiu Iordache <sergiu@chromium.org>
-Updated: 17 November 2011
+Updated: 10 Feb 2021
Introduction
------------
@@ -22,7 +22,7 @@ and type of the memory area are set using three variables:
* ``mem_address`` for the start
* ``mem_size`` for the size. The memory size will be rounded down to a
power of two.
- * ``mem_type`` to specifiy if the memory type (default is pgprot_writecombine).
+ * ``mem_type`` to specify if the memory type (default is pgprot_writecombine).
Typically the default value of ``mem_type=0`` should be used as that sets the pstore
mapping to pgprot_writecombine. Setting ``mem_type=1`` attempts to use
@@ -30,13 +30,21 @@ mapping to pgprot_writecombine. Setting ``mem_type=1`` attempts to use
depends on atomic operations. At least on ARM, pgprot_noncached causes the
memory to be mapped strongly ordered, and atomic operations on strongly ordered
memory are implementation defined, and won't work on many ARMs such as omaps.
+Setting ``mem_type=2`` attempts to treat the memory region as normal memory,
+which enables full cache on it. This can improve the performance.
The memory area is divided into ``record_size`` chunks (also rounded down to
-power of two) and each oops/panic writes a ``record_size`` chunk of
+power of two) and each kmesg dump writes a ``record_size`` chunk of
information.
-Dumping both oopses and panics can be done by setting 1 in the ``dump_oops``
-variable while setting 0 in that variable dumps only the panics.
+Limiting which kinds of kmsg dumps are stored can be controlled via
+the ``max_reason`` value, as defined in include/linux/kmsg_dump.h's
+``enum kmsg_dump_reason``. For example, to store both Oopses and Panics,
+``max_reason`` should be set to 2 (KMSG_DUMP_OOPS), to store only Panics
+``max_reason`` should be set to 1 (KMSG_DUMP_PANIC). Setting this to 0
+(KMSG_DUMP_UNDEF), means the reason filtering will be controlled by the
+``printk.always_kmsg_dump`` boot param: if unset, it'll be KMSG_DUMP_OOPS,
+otherwise KMSG_DUMP_MAX.
The module uses a counter to record multiple dumps but the counter gets reset
on restart (i.e. new dumps after the restart will overwrite old ones).
@@ -61,7 +69,7 @@ Setting the ramoops parameters can be done in several different manners:
mem=128M ramoops.mem_address=0x8000000 ramoops.ecc=1
B. Use Device Tree bindings, as described in
- ``Documentation/devicetree/bindings/reserved-memory/ramoops.txt``.
+ ``Documentation/devicetree/bindings/reserved-memory/ramoops.yaml``.
For example::
reserved-memory {
@@ -90,7 +98,7 @@ Setting the ramoops parameters can be done in several different manners:
.mem_address = <...>,
.mem_type = <...>,
.record_size = <...>,
- .dump_oops = <...>,
+ .max_reason = <...>,
.ecc = <...>,
};
diff --git a/Documentation/admin-guide/ras.rst b/Documentation/admin-guide/ras.rst
index 0310db624964..8e03751d126d 100644
--- a/Documentation/admin-guide/ras.rst
+++ b/Documentation/admin-guide/ras.rst
@@ -156,11 +156,11 @@ the labels provided by the BIOS won't match the real ones.
ECC memory
----------
-As mentioned on the previous section, ECC memory has extra bits to be
-used for error correction. So, on 64 bit systems, a memory module
-has 64 bits of *data width*, and 74 bits of *total width*. So, there are
-8 bits extra bits to be used for the error detection and correction
-mechanisms. Those extra bits are called *syndrome*\ [#f1]_\ [#f2]_.
+As mentioned in the previous section, ECC memory has extra bits to be
+used for error correction. In the above example, a memory module has
+64 bits of *data width*, and 72 bits of *total width*. The extra 8
+bits which are used for the error detection and correction mechanisms
+are referred to as the *syndrome*\ [#f1]_\ [#f2]_.
So, when the cpu requests the memory controller to write a word with
*data width*, the memory controller calculates the *syndrome* in real time,
@@ -199,7 +199,7 @@ Architecture (MCA)\ [#f3]_.
mode).
.. [#f3] For more details about the Machine Check Architecture (MCA),
- please read Documentation/x86/x86_64/machinecheck.rst at the Kernel tree.
+ please read Documentation/arch/x86/x86_64/machinecheck.rst at the Kernel tree.
EDAC - Error Detection And Correction
*************************************
@@ -212,7 +212,7 @@ EDAC - Error Detection And Correction
purposes.
When the subsystem was pushed upstream for the first time, on
- Kernel 2.6.16, for the first time, it was renamed to ``EDAC``.
+ Kernel 2.6.16, it was renamed to ``EDAC``.
Purpose
-------
@@ -351,15 +351,17 @@ controllers. The following example will assume 2 channels:
+------------+-----------+-----------+
| | ``ch0`` | ``ch1`` |
+============+===========+===========+
- | ``csrow0`` | DIMM_A0 | DIMM_B0 |
- | | rank0 | rank0 |
- +------------+ - | - |
+ | |**DIMM_A0**|**DIMM_B0**|
+ +------------+-----------+-----------+
+ | ``csrow0`` | rank0 | rank0 |
+ +------------+-----------+-----------+
| ``csrow1`` | rank1 | rank1 |
+------------+-----------+-----------+
- | ``csrow2`` | DIMM_A1 | DIMM_B1 |
- | | rank0 | rank0 |
- +------------+ - | - |
- | ``csrow3`` | rank1 | rank1 |
+ | |**DIMM_A1**|**DIMM_B1**|
+ +------------+-----------+-----------+
+ | ``csrow2`` | rank0 | rank0 |
+ +------------+-----------+-----------+
+ | ``csrow3`` | rank1 | rank1 |
+------------+-----------+-----------+
In the above example, there are 4 physical slots on the motherboard
diff --git a/Documentation/admin-guide/reporting-bugs.rst b/Documentation/admin-guide/reporting-bugs.rst
deleted file mode 100644
index 49ac8dc3594d..000000000000
--- a/Documentation/admin-guide/reporting-bugs.rst
+++ /dev/null
@@ -1,182 +0,0 @@
-.. _reportingbugs:
-
-Reporting bugs
-++++++++++++++
-
-Background
-==========
-
-The upstream Linux kernel maintainers only fix bugs for specific kernel
-versions. Those versions include the current "release candidate" (or -rc)
-kernel, any "stable" kernel versions, and any "long term" kernels.
-
-Please see https://www.kernel.org/ for a list of supported kernels. Any
-kernel marked with [EOL] is "end of life" and will not have any fixes
-backported to it.
-
-If you've found a bug on a kernel version that isn't listed on kernel.org,
-contact your Linux distribution or embedded vendor for support.
-Alternatively, you can attempt to run one of the supported stable or -rc
-kernels, and see if you can reproduce the bug on that. It's preferable
-to reproduce the bug on the latest -rc kernel.
-
-
-How to report Linux kernel bugs
-===============================
-
-
-Identify the problematic subsystem
-----------------------------------
-
-Identifying which part of the Linux kernel might be causing your issue
-increases your chances of getting your bug fixed. Simply posting to the
-generic linux-kernel mailing list (LKML) may cause your bug report to be
-lost in the noise of a mailing list that gets 1000+ emails a day.
-
-Instead, try to figure out which kernel subsystem is causing the issue,
-and email that subsystem's maintainer and mailing list. If the subsystem
-maintainer doesn't answer, then expand your scope to mailing lists like
-LKML.
-
-
-Identify who to notify
-----------------------
-
-Once you know the subsystem that is causing the issue, you should send a
-bug report. Some maintainers prefer bugs to be reported via bugzilla
-(https://bugzilla.kernel.org), while others prefer that bugs be reported
-via the subsystem mailing list.
-
-To find out where to send an emailed bug report, find your subsystem or
-device driver in the MAINTAINERS file. Search in the file for relevant
-entries, and send your bug report to the person(s) listed in the "M:"
-lines, making sure to Cc the mailing list(s) in the "L:" lines. When the
-maintainer replies to you, make sure to 'Reply-all' in order to keep the
-public mailing list(s) in the email thread.
-
-If you know which driver is causing issues, you can pass one of the driver
-files to the get_maintainer.pl script::
-
- perl scripts/get_maintainer.pl -f <filename>
-
-If it is a security bug, please copy the Security Contact listed in the
-MAINTAINERS file. They can help coordinate bugfix and disclosure. See
-:ref:`Documentation/admin-guide/security-bugs.rst <securitybugs>` for more information.
-
-If you can't figure out which subsystem caused the issue, you should file
-a bug in kernel.org bugzilla and send email to
-linux-kernel@vger.kernel.org, referencing the bugzilla URL. (For more
-information on the linux-kernel mailing list see
-http://vger.kernel.org/lkml/).
-
-
-Tips for reporting bugs
------------------------
-
-If you haven't reported a bug before, please read:
-
- http://www.chiark.greenend.org.uk/~sgtatham/bugs.html
-
- http://www.catb.org/esr/faqs/smart-questions.html
-
-It's REALLY important to report bugs that seem unrelated as separate email
-threads or separate bugzilla entries. If you report several unrelated
-bugs at once, it's difficult for maintainers to tease apart the relevant
-data.
-
-
-Gather information
-------------------
-
-The most important information in a bug report is how to reproduce the
-bug. This includes system information, and (most importantly)
-step-by-step instructions for how a user can trigger the bug.
-
-If the failure includes an "OOPS:", take a picture of the screen, capture
-a netconsole trace, or type the message from your screen into the bug
-report. Please read "Documentation/admin-guide/bug-hunting.rst" before posting your
-bug report. This explains what you should do with the "Oops" information
-to make it useful to the recipient.
-
-This is a suggested format for a bug report sent via email or bugzilla.
-Having a standardized bug report form makes it easier for you not to
-overlook things, and easier for the developers to find the pieces of
-information they're really interested in. If some information is not
-relevant to your bug, feel free to exclude it.
-
-First run the ver_linux script included as scripts/ver_linux, which
-reports the version of some important subsystems. Run this script with
-the command ``awk -f scripts/ver_linux``.
-
-Use that information to fill in all fields of the bug report form, and
-post it to the mailing list with a subject of "PROBLEM: <one line
-summary from [1.]>" for easy identification by the developers::
-
- [1.] One line summary of the problem:
- [2.] Full description of the problem/report:
- [3.] Keywords (i.e., modules, networking, kernel):
- [4.] Kernel information
- [4.1.] Kernel version (from /proc/version):
- [4.2.] Kernel .config file:
- [5.] Most recent kernel version which did not have the bug:
- [6.] Output of Oops.. message (if applicable) with symbolic information
- resolved (see Documentation/admin-guide/bug-hunting.rst)
- [7.] A small shell script or example program which triggers the
- problem (if possible)
- [8.] Environment
- [8.1.] Software (add the output of the ver_linux script here)
- [8.2.] Processor information (from /proc/cpuinfo):
- [8.3.] Module information (from /proc/modules):
- [8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)
- [8.5.] PCI information ('lspci -vvv' as root)
- [8.6.] SCSI information (from /proc/scsi/scsi)
- [8.7.] Other information that might be relevant to the problem
- (please look in /proc and include all information that you
- think to be relevant):
- [X.] Other notes, patches, fixes, workarounds:
-
-
-Follow up
-=========
-
-Expectations for bug reporters
-------------------------------
-
-Linux kernel maintainers expect bug reporters to be able to follow up on
-bug reports. That may include running new tests, applying patches,
-recompiling your kernel, and/or re-triggering your bug. The most
-frustrating thing for maintainers is for someone to report a bug, and then
-never follow up on a request to try out a fix.
-
-That said, it's still useful for a kernel maintainer to know a bug exists
-on a supported kernel, even if you can't follow up with retests. Follow
-up reports, such as replying to the email thread with "I tried the latest
-kernel and I can't reproduce my bug anymore" are also helpful, because
-maintainers have to assume silence means things are still broken.
-
-Expectations for kernel maintainers
------------------------------------
-
-Linux kernel maintainers are busy, overworked human beings. Some times
-they may not be able to address your bug in a day, a week, or two weeks.
-If they don't answer your email, they may be on vacation, or at a Linux
-conference. Check the conference schedule at https://LWN.net for more info:
-
- https://lwn.net/Calendar/
-
-In general, kernel maintainers take 1 to 5 business days to respond to
-bugs. The majority of kernel maintainers are employed to work on the
-kernel, and they may not work on the weekends. Maintainers are scattered
-around the world, and they may not work in your time zone. Unless you
-have a high priority bug, please wait at least a week after the first bug
-report before sending the maintainer a reminder email.
-
-The exceptions to this rule are regressions, kernel crashes, security holes,
-or userspace breakage caused by new kernel behavior. Those bugs should be
-addressed by the maintainers ASAP. If you suspect a maintainer is not
-responding to these types of bugs in a timely manner (especially during a
-merge window), escalate the bug to LKML and Linus Torvalds.
-
-Thank you!
-
-[Some of this is taken from Frohwalt Egerer's original linux-kernel FAQ]
diff --git a/Documentation/admin-guide/reporting-issues.rst b/Documentation/admin-guide/reporting-issues.rst
new file mode 100644
index 000000000000..2fd5a030235a
--- /dev/null
+++ b/Documentation/admin-guide/reporting-issues.rst
@@ -0,0 +1,1764 @@
+.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0)
+.. See the bottom of this file for additional redistribution information.
+
+Reporting issues
+++++++++++++++++
+
+
+The short guide (aka TL;DR)
+===========================
+
+Are you facing a regression with vanilla kernels from the same stable or
+longterm series? One still supported? Then search the `LKML
+<https://lore.kernel.org/lkml/>`_ and the `Linux stable mailing list
+<https://lore.kernel.org/stable/>`_ archives for matching reports to join. If
+you don't find any, install `the latest release from that series
+<https://kernel.org/>`_. If it still shows the issue, report it to the stable
+mailing list (stable@vger.kernel.org) and CC the regressions list
+(regressions@lists.linux.dev); ideally also CC the maintainer and the mailing
+list for the subsystem in question.
+
+In all other cases try your best guess which kernel part might be causing the
+issue. Check the :ref:`MAINTAINERS <maintainers>` file for how its developers
+expect to be told about problems, which most of the time will be by email with a
+mailing list in CC. Check the destination's archives for matching reports;
+search the `LKML <https://lore.kernel.org/lkml/>`_ and the web, too. If you
+don't find any to join, install `the latest mainline kernel
+<https://kernel.org/>`_. If the issue is present there, send a report.
+
+The issue was fixed there, but you would like to see it resolved in a still
+supported stable or longterm series as well? Then install its latest release.
+If it shows the problem, search for the change that fixed it in mainline and
+check if backporting is in the works or was discarded; if it's neither, ask
+those who handled the change for it.
+
+**General remarks**: When installing and testing a kernel as outlined above,
+ensure it's vanilla (IOW: not patched and not using add-on modules). Also make
+sure it's built and running in a healthy environment and not already tainted
+before the issue occurs.
+
+If you are facing multiple issues with the Linux kernel at once, report each
+separately. While writing your report, include all information relevant to the
+issue, like the kernel and the distro used. In case of a regression, CC the
+regressions mailing list (regressions@lists.linux.dev) to your report. Also try
+to pin-point the culprit with a bisection; if you succeed, include its
+commit-id and CC everyone in the sign-off-by chain.
+
+Once the report is out, answer any questions that come up and help where you
+can. That includes keeping the ball rolling by occasionally retesting with newer
+releases and sending a status update afterwards.
+
+Step-by-step guide how to report issues to the kernel maintainers
+=================================================================
+
+The above TL;DR outlines roughly how to report issues to the Linux kernel
+developers. It might be all that's needed for people already familiar with
+reporting issues to Free/Libre & Open Source Software (FLOSS) projects. For
+everyone else there is this section. It is more detailed and uses a
+step-by-step approach. It still tries to be brief for readability and leaves
+out a lot of details; those are described below the step-by-step guide in a
+reference section, which explains each of the steps in more detail.
+
+Note: this section covers a few more aspects than the TL;DR and does things in
+a slightly different order. That's in your interest, to make sure you notice
+early if an issue that looks like a Linux kernel problem is actually caused by
+something else. These steps thus help to ensure the time you invest in this
+process won't feel wasted in the end:
+
+ * Are you facing an issue with a Linux kernel a hardware or software vendor
+ provided? Then in almost all cases you are better off to stop reading this
+ document and reporting the issue to your vendor instead, unless you are
+ willing to install the latest Linux version yourself. Be aware the latter
+ will often be needed anyway to hunt down and fix issues.
+
+ * Perform a rough search for existing reports with your favorite internet
+ search engine; additionally, check the archives of the `Linux Kernel Mailing
+ List (LKML) <https://lore.kernel.org/lkml/>`_. If you find matching reports,
+ join the discussion instead of sending a new one.
+
+ * See if the issue you are dealing with qualifies as regression, security
+ issue, or a really severe problem: those are 'issues of high priority' that
+ need special handling in some steps that are about to follow.
+
+ * Make sure it's not the kernel's surroundings that are causing the issue
+ you face.
+
+ * Create a fresh backup and put system repair and restore tools at hand.
+
+ * Ensure your system does not enhance its kernels by building additional
+ kernel modules on-the-fly, which solutions like DKMS might be doing locally
+ without your knowledge.
+
+ * Check if your kernel was 'tainted' when the issue occurred, as the event
+ that made the kernel set this flag might be causing the issue you face.
+
+ * Write down coarsely how to reproduce the issue. If you deal with multiple
+ issues at once, create separate notes for each of them and make sure they
+ work independently on a freshly booted system. That's needed, as each issue
+ needs to get reported to the kernel developers separately, unless they are
+ strongly entangled.
+
+ * If you are facing a regression within a stable or longterm version line
+ (say something broke when updating from 5.10.4 to 5.10.5), scroll down to
+ 'Dealing with regressions within a stable and longterm kernel line'.
+
+ * Locate the driver or kernel subsystem that seems to be causing the issue.
+ Find out how and where its developers expect reports. Note: most of the
+ time this won't be bugzilla.kernel.org, as issues typically need to be sent
+ by mail to a maintainer and a public mailing list.
+
+ * Search the archives of the bug tracker or mailing list in question
+ thoroughly for reports that might match your issue. If you find anything,
+ join the discussion instead of sending a new report.
+
+After these preparations you'll now enter the main part:
+
+ * Unless you are already running the latest 'mainline' Linux kernel, better
+ go and install it for the reporting process. Testing and reporting with
+ the latest 'stable' Linux can be an acceptable alternative in some
+ situations; during the merge window that actually might be even the best
+ approach, but in that development phase it can be an even better idea to
+ suspend your efforts for a few days anyway. Whatever version you choose,
+ ideally use a 'vanilla' build. Ignoring these advices will dramatically
+ increase the risk your report will be rejected or ignored.
+
+ * Ensure the kernel you just installed does not 'taint' itself when
+ running.
+
+ * Reproduce the issue with the kernel you just installed. If it doesn't show
+ up there, scroll down to the instructions for issues only happening with
+ stable and longterm kernels.
+
+ * Optimize your notes: try to find and write the most straightforward way to
+ reproduce your issue. Make sure the end result has all the important
+ details, and at the same time is easy to read and understand for others
+ that hear about it for the first time. And if you learned something in this
+ process, consider searching again for existing reports about the issue.
+
+ * If your failure involves a 'panic', 'Oops', 'warning', or 'BUG', consider
+ decoding the kernel log to find the line of code that triggered the error.
+
+ * If your problem is a regression, try to narrow down when the issue was
+ introduced as much as possible.
+
+ * Start to compile the report by writing a detailed description about the
+ issue. Always mention a few things: the latest kernel version you installed
+ for reproducing, the Linux Distribution used, and your notes on how to
+ reproduce the issue. Ideally, make the kernel's build configuration
+ (.config) and the output from ``dmesg`` available somewhere on the net and
+ link to it. Include or upload all other information that might be relevant,
+ like the output/screenshot of an Oops or the output from ``lspci``. Once
+ you wrote this main part, insert a normal length paragraph on top of it
+ outlining the issue and the impact quickly. On top of this add one sentence
+ that briefly describes the problem and gets people to read on. Now give the
+ thing a descriptive title or subject that yet again is shorter. Then you're
+ ready to send or file the report like the MAINTAINERS file told you, unless
+ you are dealing with one of those 'issues of high priority': they need
+ special care which is explained in 'Special handling for high priority
+ issues' below.
+
+ * Wait for reactions and keep the thing rolling until you can accept the
+ outcome in one way or the other. Thus react publicly and in a timely manner
+ to any inquiries. Test proposed fixes. Do proactive testing: retest with at
+ least every first release candidate (RC) of a new mainline version and
+ report your results. Send friendly reminders if things stall. And try to
+ help yourself, if you don't get any help or if it's unsatisfying.
+
+
+Reporting regressions within a stable and longterm kernel line
+--------------------------------------------------------------
+
+This subsection is for you, if you followed above process and got sent here at
+the point about regression within a stable or longterm kernel version line. You
+face one of those if something breaks when updating from 5.10.4 to 5.10.5 (a
+switch from 5.9.15 to 5.10.5 does not qualify). The developers want to fix such
+regressions as quickly as possible, hence there is a streamlined process to
+report them:
+
+ * Check if the kernel developers still maintain the Linux kernel version
+ line you care about: go to the `front page of kernel.org
+ <https://kernel.org/>`_ and make sure it mentions
+ the latest release of the particular version line without an '[EOL]' tag.
+
+ * Check the archives of the `Linux stable mailing list
+ <https://lore.kernel.org/stable/>`_ for existing reports.
+
+ * Install the latest release from the particular version line as a vanilla
+ kernel. Ensure this kernel is not tainted and still shows the problem, as
+ the issue might have already been fixed there. If you first noticed the
+ problem with a vendor kernel, check a vanilla build of the last version
+ known to work performs fine as well.
+
+ * Send a short problem report to the Linux stable mailing list
+ (stable@vger.kernel.org) and CC the Linux regressions mailing list
+ (regressions@lists.linux.dev); if you suspect the cause in a particular
+ subsystem, CC its maintainer and its mailing list. Roughly describe the
+ issue and ideally explain how to reproduce it. Mention the first version
+ that shows the problem and the last version that's working fine. Then
+ wait for further instructions.
+
+The reference section below explains each of these steps in more detail.
+
+
+Reporting issues only occurring in older kernel version lines
+-------------------------------------------------------------
+
+This subsection is for you, if you tried the latest mainline kernel as outlined
+above, but failed to reproduce your issue there; at the same time you want to
+see the issue fixed in a still supported stable or longterm series or vendor
+kernels regularly rebased on those. If that the case, follow these steps:
+
+ * Prepare yourself for the possibility that going through the next few steps
+ might not get the issue solved in older releases: the fix might be too big
+ or risky to get backported there.
+
+ * Perform the first three steps in the section "Dealing with regressions
+ within a stable and longterm kernel line" above.
+
+ * Search the Linux kernel version control system for the change that fixed
+ the issue in mainline, as its commit message might tell you if the fix is
+ scheduled for backporting already. If you don't find anything that way,
+ search the appropriate mailing lists for posts that discuss such an issue
+ or peer-review possible fixes; then check the discussions if the fix was
+ deemed unsuitable for backporting. If backporting was not considered at
+ all, join the newest discussion, asking if it's in the cards.
+
+ * One of the former steps should lead to a solution. If that doesn't work
+ out, ask the maintainers for the subsystem that seems to be causing the
+ issue for advice; CC the mailing list for the particular subsystem as well
+ as the stable mailing list.
+
+The reference section below explains each of these steps in more detail.
+
+
+Reference section: Reporting issues to the kernel maintainers
+=============================================================
+
+The detailed guides above outline all the major steps in brief fashion, which
+should be enough for most people. But sometimes there are situations where even
+experienced users might wonder how to actually do one of those steps. That's
+what this section is for, as it will provide a lot more details on each of the
+above steps. Consider this as reference documentation: it's possible to read it
+from top to bottom. But it's mainly meant to skim over and a place to look up
+details how to actually perform those steps.
+
+A few words of general advice before digging into the details:
+
+ * The Linux kernel developers are well aware this process is complicated and
+ demands more than other FLOSS projects. We'd love to make it simpler. But
+ that would require work in various places as well as some infrastructure,
+ which would need constant maintenance; nobody has stepped up to do that
+ work, so that's just how things are for now.
+
+ * A warranty or support contract with some vendor doesn't entitle you to
+ request fixes from developers in the upstream Linux kernel community: such
+ contracts are completely outside the scope of the Linux kernel, its
+ development community, and this document. That's why you can't demand
+ anything such a contract guarantees in this context, not even if the
+ developer handling the issue works for the vendor in question. If you want
+ to claim your rights, use the vendor's support channel instead. When doing
+ so, you might want to mention you'd like to see the issue fixed in the
+ upstream Linux kernel; motivate them by saying it's the only way to ensure
+ the fix in the end will get incorporated in all Linux distributions.
+
+ * If you never reported an issue to a FLOSS project before you should consider
+ reading `How to Report Bugs Effectively
+ <https://www.chiark.greenend.org.uk/~sgtatham/bugs.html>`_, `How To Ask
+ Questions The Smart Way
+ <http://www.catb.org/esr/faqs/smart-questions.html>`_, and `How to ask good
+ questions <https://jvns.ca/blog/good-questions/>`_.
+
+With that off the table, find below the details on how to properly report
+issues to the Linux kernel developers.
+
+
+Make sure you're using the upstream Linux kernel
+------------------------------------------------
+
+ *Are you facing an issue with a Linux kernel a hardware or software vendor
+ provided? Then in almost all cases you are better off to stop reading this
+ document and reporting the issue to your vendor instead, unless you are
+ willing to install the latest Linux version yourself. Be aware the latter
+ will often be needed anyway to hunt down and fix issues.*
+
+Like most programmers, Linux kernel developers don't like to spend time dealing
+with reports for issues that don't even happen with their current code. It's
+just a waste everybody's time, especially yours. Unfortunately such situations
+easily happen when it comes to the kernel and often leads to frustration on both
+sides. That's because almost all Linux-based kernels pre-installed on devices
+(Computers, Laptops, Smartphones, Routers, …) and most shipped by Linux
+distributors are quite distant from the official Linux kernel as distributed by
+kernel.org: these kernels from these vendors are often ancient from the point of
+Linux development or heavily modified, often both.
+
+Most of these vendor kernels are quite unsuitable for reporting issues to the
+Linux kernel developers: an issue you face with one of them might have been
+fixed by the Linux kernel developers months or years ago already; additionally,
+the modifications and enhancements by the vendor might be causing the issue you
+face, even if they look small or totally unrelated. That's why you should report
+issues with these kernels to the vendor. Its developers should look into the
+report and, in case it turns out to be an upstream issue, fix it directly
+upstream or forward the report there. In practice that often does not work out
+or might not what you want. You thus might want to consider circumventing the
+vendor by installing the very latest Linux kernel core yourself. If that's an
+option for you move ahead in this process, as a later step in this guide will
+explain how to do that once it rules out other potential causes for your issue.
+
+Note, the previous paragraph is starting with the word 'most', as sometimes
+developers in fact are willing to handle reports about issues occurring with
+vendor kernels. If they do in the end highly depends on the developers and the
+issue in question. Your chances are quite good if the distributor applied only
+small modifications to a kernel based on a recent Linux version; that for
+example often holds true for the mainline kernels shipped by Debian GNU/Linux
+Sid or Fedora Rawhide. Some developers will also accept reports about issues
+with kernels from distributions shipping the latest stable kernel, as long as
+its only slightly modified; that for example is often the case for Arch Linux,
+regular Fedora releases, and openSUSE Tumbleweed. But keep in mind, you better
+want to use a mainline Linux and avoid using a stable kernel for this
+process, as outlined in the section 'Install a fresh kernel for testing' in more
+detail.
+
+Obviously you are free to ignore all this advice and report problems with an old
+or heavily modified vendor kernel to the upstream Linux developers. But note,
+those often get rejected or ignored, so consider yourself warned. But it's still
+better than not reporting the issue at all: sometimes such reports directly or
+indirectly will help to get the issue fixed over time.
+
+
+Search for existing reports, first run
+--------------------------------------
+
+ *Perform a rough search for existing reports with your favorite internet
+ search engine; additionally, check the archives of the Linux Kernel Mailing
+ List (LKML). If you find matching reports, join the discussion instead of
+ sending a new one.*
+
+Reporting an issue that someone else already brought forward is often a waste of
+time for everyone involved, especially you as the reporter. So it's in your own
+interest to thoroughly check if somebody reported the issue already. At this
+step of the process it's okay to just perform a rough search: a later step will
+tell you to perform a more detailed search once you know where your issue needs
+to be reported to. Nevertheless, do not hurry with this step of the reporting
+process, it can save you time and trouble.
+
+Simply search the internet with your favorite search engine first. Afterwards,
+search the `Linux Kernel Mailing List (LKML) archives
+<https://lore.kernel.org/lkml/>`_.
+
+If you get flooded with results consider telling your search engine to limit
+search timeframe to the past month or year. And wherever you search, make sure
+to use good search terms; vary them a few times, too. While doing so try to
+look at the issue from the perspective of someone else: that will help you to
+come up with other words to use as search terms. Also make sure not to use too
+many search terms at once. Remember to search with and without information like
+the name of the kernel driver or the name of the affected hardware component.
+But its exact brand name (say 'ASUS Red Devil Radeon RX 5700 XT Gaming OC')
+often is not much helpful, as it is too specific. Instead try search terms like
+the model line (Radeon 5700 or Radeon 5000) and the code name of the main chip
+('Navi' or 'Navi10') with and without its manufacturer ('AMD').
+
+In case you find an existing report about your issue, join the discussion, as
+you might be able to provide valuable additional information. That can be
+important even when a fix is prepared or in its final stages already, as
+developers might look for people that can provide additional information or
+test a proposed fix. Jump to the section 'Duties after the report went out' for
+details on how to get properly involved.
+
+Note, searching `bugzilla.kernel.org <https://bugzilla.kernel.org/>`_ might also
+be a good idea, as that might provide valuable insights or turn up matching
+reports. If you find the latter, just keep in mind: most subsystems expect
+reports in different places, as described below in the section "Check where you
+need to report your issue". The developers that should take care of the issue
+thus might not even be aware of the bugzilla ticket. Hence, check the ticket if
+the issue already got reported as outlined in this document and if not consider
+doing so.
+
+
+Issue of high priority?
+-----------------------
+
+ *See if the issue you are dealing with qualifies as regression, security
+ issue, or a really severe problem: those are 'issues of high priority' that
+ need special handling in some steps that are about to follow.*
+
+Linus Torvalds and the leading Linux kernel developers want to see some issues
+fixed as soon as possible, hence there are 'issues of high priority' that get
+handled slightly differently in the reporting process. Three type of cases
+qualify: regressions, security issues, and really severe problems.
+
+You deal with a regression if some application or practical use case running
+fine with one Linux kernel works worse or not at all with a newer version
+compiled using a similar configuration. The document
+Documentation/admin-guide/reporting-regressions.rst explains this in more
+detail. It also provides a good deal of other information about regressions you
+might want to be aware of; it for example explains how to add your issue to the
+list of tracked regressions, to ensure it won't fall through the cracks.
+
+What qualifies as security issue is left to your judgment. Consider reading
+Documentation/process/security-bugs.rst before proceeding, as it
+provides additional details how to best handle security issues.
+
+An issue is a 'really severe problem' when something totally unacceptably bad
+happens. That's for example the case when a Linux kernel corrupts the data it's
+handling or damages hardware it's running on. You're also dealing with a severe
+issue when the kernel suddenly stops working with an error message ('kernel
+panic') or without any farewell note at all. Note: do not confuse a 'panic' (a
+fatal error where the kernel stop itself) with a 'Oops' (a recoverable error),
+as the kernel remains running after the latter.
+
+
+Ensure a healthy environment
+----------------------------
+
+ *Make sure it's not the kernel's surroundings that are causing the issue
+ you face.*
+
+Problems that look a lot like a kernel issue are sometimes caused by build or
+runtime environment. It's hard to rule out that problem completely, but you
+should minimize it:
+
+ * Use proven tools when building your kernel, as bugs in the compiler or the
+ binutils can cause the resulting kernel to misbehave.
+
+ * Ensure your computer components run within their design specifications;
+ that's especially important for the main processor, the main memory, and the
+ motherboard. Therefore, stop undervolting or overclocking when facing a
+ potential kernel issue.
+
+ * Try to make sure it's not faulty hardware that is causing your issue. Bad
+ main memory for example can result in a multitude of issues that will
+ manifest itself in problems looking like kernel issues.
+
+ * If you're dealing with a filesystem issue, you might want to check the file
+ system in question with ``fsck``, as it might be damaged in a way that leads
+ to unexpected kernel behavior.
+
+ * When dealing with a regression, make sure it's not something else that
+ changed in parallel to updating the kernel. The problem for example might be
+ caused by other software that was updated at the same time. It can also
+ happen that a hardware component coincidentally just broke when you rebooted
+ into a new kernel for the first time. Updating the systems BIOS or changing
+ something in the BIOS Setup can also lead to problems that on look a lot
+ like a kernel regression.
+
+
+Prepare for emergencies
+-----------------------
+
+ *Create a fresh backup and put system repair and restore tools at hand.*
+
+Reminder, you are dealing with computers, which sometimes do unexpected things,
+especially if you fiddle with crucial parts like the kernel of its operating
+system. That's what you are about to do in this process. Thus, make sure to
+create a fresh backup; also ensure you have all tools at hand to repair or
+reinstall the operating system as well as everything you need to restore the
+backup.
+
+
+Make sure your kernel doesn't get enhanced
+------------------------------------------
+
+ *Ensure your system does not enhance its kernels by building additional
+ kernel modules on-the-fly, which solutions like DKMS might be doing locally
+ without your knowledge.*
+
+The risk your issue report gets ignored or rejected dramatically increases if
+your kernel gets enhanced in any way. That's why you should remove or disable
+mechanisms like akmods and DKMS: those build add-on kernel modules
+automatically, for example when you install a new Linux kernel or boot it for
+the first time. Also remove any modules they might have installed. Then reboot
+before proceeding.
+
+Note, you might not be aware that your system is using one of these solutions:
+they often get set up silently when you install Nvidia's proprietary graphics
+driver, VirtualBox, or other software that requires a some support from a
+module not part of the Linux kernel. That why your might need to uninstall the
+packages with such software to get rid of any 3rd party kernel module.
+
+
+Check 'taint' flag
+------------------
+
+ *Check if your kernel was 'tainted' when the issue occurred, as the event
+ that made the kernel set this flag might be causing the issue you face.*
+
+The kernel marks itself with a 'taint' flag when something happens that might
+lead to follow-up errors that look totally unrelated. The issue you face might
+be such an error if your kernel is tainted. That's why it's in your interest to
+rule this out early before investing more time into this process. This is the
+only reason why this step is here, as this process later will tell you to
+install the latest mainline kernel; you will need to check the taint flag again
+then, as that's when it matters because it's the kernel the report will focus
+on.
+
+On a running system is easy to check if the kernel tainted itself: if ``cat
+/proc/sys/kernel/tainted`` returns '0' then the kernel is not tainted and
+everything is fine. Checking that file is impossible in some situations; that's
+why the kernel also mentions the taint status when it reports an internal
+problem (a 'kernel bug'), a recoverable error (a 'kernel Oops') or a
+non-recoverable error before halting operation (a 'kernel panic'). Look near
+the top of the error messages printed when one of these occurs and search for a
+line starting with 'CPU:'. It should end with 'Not tainted' if the kernel was
+not tainted when it noticed the problem; it was tainted if you see 'Tainted:'
+followed by a few spaces and some letters.
+
+If your kernel is tainted, study Documentation/admin-guide/tainted-kernels.rst
+to find out why. Try to eliminate the reason. Often it's caused by one these
+three things:
+
+ 1. A recoverable error (a 'kernel Oops') occurred and the kernel tainted
+ itself, as the kernel knows it might misbehave in strange ways after that
+ point. In that case check your kernel or system log and look for a section
+ that starts with this::
+
+ Oops: 0000 [#1] SMP
+
+ That's the first Oops since boot-up, as the '#1' between the brackets shows.
+ Every Oops and any other problem that happens after that point might be a
+ follow-up problem to that first Oops, even if both look totally unrelated.
+ Rule this out by getting rid of the cause for the first Oops and reproducing
+ the issue afterwards. Sometimes simply restarting will be enough, sometimes
+ a change to the configuration followed by a reboot can eliminate the Oops.
+ But don't invest too much time into this at this point of the process, as
+ the cause for the Oops might already be fixed in the newer Linux kernel
+ version you are going to install later in this process.
+
+ 2. Your system uses a software that installs its own kernel modules, for
+ example Nvidia's proprietary graphics driver or VirtualBox. The kernel
+ taints itself when it loads such module from external sources (even if
+ they are Open Source): they sometimes cause errors in unrelated kernel
+ areas and thus might be causing the issue you face. You therefore have to
+ prevent those modules from loading when you want to report an issue to the
+ Linux kernel developers. Most of the time the easiest way to do that is:
+ temporarily uninstall such software including any modules they might have
+ installed. Afterwards reboot.
+
+ 3. The kernel also taints itself when it's loading a module that resides in
+ the staging tree of the Linux kernel source. That's a special area for
+ code (mostly drivers) that does not yet fulfill the normal Linux kernel
+ quality standards. When you report an issue with such a module it's
+ obviously okay if the kernel is tainted; just make sure the module in
+ question is the only reason for the taint. If the issue happens in an
+ unrelated area reboot and temporarily block the module from being loaded
+ by specifying ``foo.blacklist=1`` as kernel parameter (replace 'foo' with
+ the name of the module in question).
+
+
+Document how to reproduce issue
+-------------------------------
+
+ *Write down coarsely how to reproduce the issue. If you deal with multiple
+ issues at once, create separate notes for each of them and make sure they
+ work independently on a freshly booted system. That's needed, as each issue
+ needs to get reported to the kernel developers separately, unless they are
+ strongly entangled.*
+
+If you deal with multiple issues at once, you'll have to report each of them
+separately, as they might be handled by different developers. Describing
+various issues in one report also makes it quite difficult for others to tear
+it apart. Hence, only combine issues in one report if they are very strongly
+entangled.
+
+Additionally, during the reporting process you will have to test if the issue
+happens with other kernel versions. Therefore, it will make your work easier if
+you know exactly how to reproduce an issue quickly on a freshly booted system.
+
+Note: it's often fruitless to report issues that only happened once, as they
+might be caused by a bit flip due to cosmic radiation. That's why you should
+try to rule that out by reproducing the issue before going further. Feel free
+to ignore this advice if you are experienced enough to tell a one-time error
+due to faulty hardware apart from a kernel issue that rarely happens and thus
+is hard to reproduce.
+
+
+Regression in stable or longterm kernel?
+----------------------------------------
+
+ *If you are facing a regression within a stable or longterm version line
+ (say something broke when updating from 5.10.4 to 5.10.5), scroll down to
+ 'Dealing with regressions within a stable and longterm kernel line'.*
+
+Regression within a stable and longterm kernel version line are something the
+Linux developers want to fix badly, as such issues are even more unwanted than
+regression in the main development branch, as they can quickly affect a lot of
+people. The developers thus want to learn about such issues as quickly as
+possible, hence there is a streamlined process to report them. Note,
+regressions with newer kernel version line (say something broke when switching
+from 5.9.15 to 5.10.5) do not qualify.
+
+
+Check where you need to report your issue
+-----------------------------------------
+
+ *Locate the driver or kernel subsystem that seems to be causing the issue.
+ Find out how and where its developers expect reports. Note: most of the
+ time this won't be bugzilla.kernel.org, as issues typically need to be sent
+ by mail to a maintainer and a public mailing list.*
+
+It's crucial to send your report to the right people, as the Linux kernel is a
+big project and most of its developers are only familiar with a small subset of
+it. Quite a few programmers for example only care for just one driver, for
+example one for a WiFi chip; its developer likely will only have small or no
+knowledge about the internals of remote or unrelated "subsystems", like the TCP
+stack, the PCIe/PCI subsystem, memory management or file systems.
+
+Problem is: the Linux kernel lacks a central bug tracker where you can simply
+file your issue and make it reach the developers that need to know about it.
+That's why you have to find the right place and way to report issues yourself.
+You can do that with the help of a script (see below), but it mainly targets
+kernel developers and experts. For everybody else the MAINTAINERS file is the
+better place.
+
+How to read the MAINTAINERS file
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+To illustrate how to use the :ref:`MAINTAINERS <maintainers>` file, lets assume
+the WiFi in your Laptop suddenly misbehaves after updating the kernel. In that
+case it's likely an issue in the WiFi driver. Obviously it could also be some
+code it builds upon, but unless you suspect something like that stick to the
+driver. If it's really something else, the driver's developers will get the
+right people involved.
+
+Sadly, there is no way to check which code is driving a particular hardware
+component that is both universal and easy.
+
+In case of a problem with the WiFi driver you for example might want to look at
+the output of ``lspci -k``, as it lists devices on the PCI/PCIe bus and the
+kernel module driving it::
+
+ [user@something ~]$ lspci -k
+ [...]
+ 3a:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)
+ Subsystem: Bigfoot Networks, Inc. Device 1535
+ Kernel driver in use: ath10k_pci
+ Kernel modules: ath10k_pci
+ [...]
+
+But this approach won't work if your WiFi chip is connected over USB or some
+other internal bus. In those cases you might want to check your WiFi manager or
+the output of ``ip link``. Look for the name of the problematic network
+interface, which might be something like 'wlp58s0'. This name can be used like
+this to find the module driving it::
+
+ [user@something ~]$ realpath --relative-to=/sys/module/ /sys/class/net/wlp58s0/device/driver/module
+ ath10k_pci
+
+In case tricks like these don't bring you any further, try to search the
+internet on how to narrow down the driver or subsystem in question. And if you
+are unsure which it is: just try your best guess, somebody will help you if you
+guessed poorly.
+
+Once you know the driver or subsystem, you want to search for it in the
+MAINTAINERS file. In the case of 'ath10k_pci' you won't find anything, as the
+name is too specific. Sometimes you will need to search on the net for help;
+but before doing so, try a somewhat shorted or modified name when searching the
+MAINTAINERS file, as then you might find something like this::
+
+ QUALCOMM ATHEROS ATH10K WIRELESS DRIVER
+ Mail: A. Some Human <shuman@example.com>
+ Mailing list: ath10k@lists.infradead.org
+ Status: Supported
+ Web-page: https://wireless.wiki.kernel.org/en/users/Drivers/ath10k
+ SCM: git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git
+ Files: drivers/net/wireless/ath/ath10k/
+
+Note: the line description will be abbreviations, if you read the plain
+MAINTAINERS file found in the root of the Linux source tree. 'Mail:' for
+example will be 'M:', 'Mailing list:' will be 'L', and 'Status:' will be 'S:'.
+A section near the top of the file explains these and other abbreviations.
+
+First look at the line 'Status'. Ideally it should be 'Supported' or
+'Maintained'. If it states 'Obsolete' then you are using some outdated approach
+that was replaced by a newer solution you need to switch to. Sometimes the code
+only has someone who provides 'Odd Fixes' when feeling motivated. And with
+'Orphan' you are totally out of luck, as nobody takes care of the code anymore.
+That only leaves these options: arrange yourself to live with the issue, fix it
+yourself, or find a programmer somewhere willing to fix it.
+
+After checking the status, look for a line starting with 'bugs:': it will tell
+you where to find a subsystem specific bug tracker to file your issue. The
+example above does not have such a line. That is the case for most sections, as
+Linux kernel development is completely driven by mail. Very few subsystems use
+a bug tracker, and only some of those rely on bugzilla.kernel.org.
+
+In this and many other cases you thus have to look for lines starting with
+'Mail:' instead. Those mention the name and the email addresses for the
+maintainers of the particular code. Also look for a line starting with 'Mailing
+list:', which tells you the public mailing list where the code is developed.
+Your report later needs to go by mail to those addresses. Additionally, for all
+issue reports sent by email, make sure to add the Linux Kernel Mailing List
+(LKML) <linux-kernel@vger.kernel.org> to CC. Don't omit either of the mailing
+lists when sending your issue report by mail later! Maintainers are busy people
+and might leave some work for other developers on the subsystem specific list;
+and LKML is important to have one place where all issue reports can be found.
+
+
+Finding the maintainers with the help of a script
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For people that have the Linux sources at hand there is a second option to find
+the proper place to report: the script 'scripts/get_maintainer.pl' which tries
+to find all people to contact. It queries the MAINTAINERS file and needs to be
+called with a path to the source code in question. For drivers compiled as
+module if often can be found with a command like this::
+
+ $ modinfo ath10k_pci | grep filename | sed 's!/lib/modules/.*/kernel/!!; s!filename:!!; s!\.ko\(\|\.xz\)!!'
+ drivers/net/wireless/ath/ath10k/ath10k_pci.ko
+
+Pass parts of this to the script::
+
+ $ ./scripts/get_maintainer.pl -f drivers/net/wireless/ath/ath10k*
+ Some Human <shuman@example.com> (supporter:QUALCOMM ATHEROS ATH10K WIRELESS DRIVER)
+ Another S. Human <asomehuman@example.com> (maintainer:NETWORKING DRIVERS)
+ ath10k@lists.infradead.org (open list:QUALCOMM ATHEROS ATH10K WIRELESS DRIVER)
+ linux-wireless@vger.kernel.org (open list:NETWORKING DRIVERS (WIRELESS))
+ netdev@vger.kernel.org (open list:NETWORKING DRIVERS)
+ linux-kernel@vger.kernel.org (open list)
+
+Don't sent your report to all of them. Send it to the maintainers, which the
+script calls "supporter:"; additionally CC the most specific mailing list for
+the code as well as the Linux Kernel Mailing List (LKML). In this case you thus
+would need to send the report to 'Some Human <shuman@example.com>' with
+'ath10k@lists.infradead.org' and 'linux-kernel@vger.kernel.org' in CC.
+
+Note: in case you cloned the Linux sources with git you might want to call
+``get_maintainer.pl`` a second time with ``--git``. The script then will look
+at the commit history to find which people recently worked on the code in
+question, as they might be able to help. But use these results with care, as it
+can easily send you in a wrong direction. That for example happens quickly in
+areas rarely changed (like old or unmaintained drivers): sometimes such code is
+modified during tree-wide cleanups by developers that do not care about the
+particular driver at all.
+
+
+Search for existing reports, second run
+---------------------------------------
+
+ *Search the archives of the bug tracker or mailing list in question
+ thoroughly for reports that might match your issue. If you find anything,
+ join the discussion instead of sending a new report.*
+
+As mentioned earlier already: reporting an issue that someone else already
+brought forward is often a waste of time for everyone involved, especially you
+as the reporter. That's why you should search for existing report again, now
+that you know where they need to be reported to. If it's mailing list, you will
+often find its archives on `lore.kernel.org <https://lore.kernel.org/>`_.
+
+But some list are hosted in different places. That for example is the case for
+the ath10k WiFi driver used as example in the previous step. But you'll often
+find the archives for these lists easily on the net. Searching for 'archive
+ath10k@lists.infradead.org' for example will lead you to the `Info page for the
+ath10k mailing list <https://lists.infradead.org/mailman/listinfo/ath10k>`_,
+which at the top links to its
+`list archives <https://lists.infradead.org/pipermail/ath10k/>`_. Sadly this and
+quite a few other lists miss a way to search the archives. In those cases use a
+regular internet search engine and add something like
+'site:lists.infradead.org/pipermail/ath10k/' to your search terms, which limits
+the results to the archives at that URL.
+
+It's also wise to check the internet, LKML and maybe bugzilla.kernel.org again
+at this point. If your report needs to be filed in a bug tracker, you may want
+to check the mailing list archives for the subsystem as well, as someone might
+have reported it only there.
+
+For details how to search and what to do if you find matching reports see
+"Search for existing reports, first run" above.
+
+Do not hurry with this step of the reporting process: spending 30 to 60 minutes
+or even more time can save you and others quite a lot of time and trouble.
+
+
+Install a fresh kernel for testing
+----------------------------------
+
+ *Unless you are already running the latest 'mainline' Linux kernel, better
+ go and install it for the reporting process. Testing and reporting with
+ the latest 'stable' Linux can be an acceptable alternative in some
+ situations; during the merge window that actually might be even the best
+ approach, but in that development phase it can be an even better idea to
+ suspend your efforts for a few days anyway. Whatever version you choose,
+ ideally use a 'vanilla' built. Ignoring these advices will dramatically
+ increase the risk your report will be rejected or ignored.*
+
+As mentioned in the detailed explanation for the first step already: Like most
+programmers, Linux kernel developers don't like to spend time dealing with
+reports for issues that don't even happen with the current code. It's just a
+waste everybody's time, especially yours. That's why it's in everybody's
+interest that you confirm the issue still exists with the latest upstream code
+before reporting it. You are free to ignore this advice, but as outlined
+earlier: doing so dramatically increases the risk that your issue report might
+get rejected or simply ignored.
+
+In the scope of the kernel "latest upstream" normally means:
+
+ * Install a mainline kernel; the latest stable kernel can be an option, but
+ most of the time is better avoided. Longterm kernels (sometimes called 'LTS
+ kernels') are unsuitable at this point of the process. The next subsection
+ explains all of this in more detail.
+
+ * The over next subsection describes way to obtain and install such a kernel.
+ It also outlines that using a pre-compiled kernel are fine, but better are
+ vanilla, which means: it was built using Linux sources taken straight `from
+ kernel.org <https://kernel.org/>`_ and not modified or enhanced in any way.
+
+Choosing the right version for testing
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Head over to `kernel.org <https://kernel.org/>`_ to find out which version you
+want to use for testing. Ignore the big yellow button that says 'Latest release'
+and look a little lower at the table. At its top you'll see a line starting with
+mainline, which most of the time will point to a pre-release with a version
+number like '5.8-rc2'. If that's the case, you'll want to use this mainline
+kernel for testing, as that where all fixes have to be applied first. Do not let
+that 'rc' scare you, these 'development kernels' are pretty reliable — and you
+made a backup, as you were instructed above, didn't you?
+
+In about two out of every nine to ten weeks, mainline might point you to a
+proper release with a version number like '5.7'. If that happens, consider
+suspending the reporting process until the first pre-release of the next
+version (5.8-rc1) shows up on kernel.org. That's because the Linux development
+cycle then is in its two-week long 'merge window'. The bulk of the changes and
+all intrusive ones get merged for the next release during this time. It's a bit
+more risky to use mainline during this period. Kernel developers are also often
+quite busy then and might have no spare time to deal with issue reports. It's
+also quite possible that one of the many changes applied during the merge
+window fixes the issue you face; that's why you soon would have to retest with
+a newer kernel version anyway, as outlined below in the section 'Duties after
+the report went out'.
+
+That's why it might make sense to wait till the merge window is over. But don't
+to that if you're dealing with something that shouldn't wait. In that case
+consider obtaining the latest mainline kernel via git (see below) or use the
+latest stable version offered on kernel.org. Using that is also acceptable in
+case mainline for some reason does currently not work for you. An in general:
+using it for reproducing the issue is also better than not reporting it issue
+at all.
+
+Better avoid using the latest stable kernel outside merge windows, as all fixes
+must be applied to mainline first. That's why checking the latest mainline
+kernel is so important: any issue you want to see fixed in older version lines
+needs to be fixed in mainline first before it can get backported, which can
+take a few days or weeks. Another reason: the fix you hope for might be too
+hard or risky for backporting; reporting the issue again hence is unlikely to
+change anything.
+
+These aspects are also why longterm kernels (sometimes called "LTS kernels")
+are unsuitable for this part of the reporting process: they are to distant from
+the current code. Hence go and test mainline first and follow the process
+further: if the issue doesn't occur with mainline it will guide you how to get
+it fixed in older version lines, if that's in the cards for the fix in question.
+
+How to obtain a fresh Linux kernel
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Using a pre-compiled kernel**: This is often the quickest, easiest, and safest
+way for testing — especially is you are unfamiliar with the Linux kernel. The
+problem: most of those shipped by distributors or add-on repositories are build
+from modified Linux sources. They are thus not vanilla and therefore often
+unsuitable for testing and issue reporting: the changes might cause the issue
+you face or influence it somehow.
+
+But you are in luck if you are using a popular Linux distribution: for quite a
+few of them you'll find repositories on the net that contain packages with the
+latest mainline or stable Linux built as vanilla kernel. It's totally okay to
+use these, just make sure from the repository's description they are vanilla or
+at least close to it. Additionally ensure the packages contain the latest
+versions as offered on kernel.org. The packages are likely unsuitable if they
+are older than a week, as new mainline and stable kernels typically get released
+at least once a week.
+
+Please note that you might need to build your own kernel manually later: that's
+sometimes needed for debugging or testing fixes, as described later in this
+document. Also be aware that pre-compiled kernels might lack debug symbols that
+are needed to decode messages the kernel prints when a panic, Oops, warning, or
+BUG occurs; if you plan to decode those, you might be better off compiling a
+kernel yourself (see the end of this subsection and the section titled 'Decode
+failure messages' for details).
+
+**Using git**: Developers and experienced Linux users familiar with git are
+often best served by obtaining the latest Linux kernel sources straight from the
+`official development repository on kernel.org
+<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/>`_.
+Those are likely a bit ahead of the latest mainline pre-release. Don't worry
+about it: they are as reliable as a proper pre-release, unless the kernel's
+development cycle is currently in the middle of a merge window. But even then
+they are quite reliable.
+
+**Conventional**: People unfamiliar with git are often best served by
+downloading the sources as tarball from `kernel.org <https://kernel.org/>`_.
+
+How to actually build a kernel is not described here, as many websites explain
+the necessary steps already. If you are new to it, consider following one of
+those how-to's that suggest to use ``make localmodconfig``, as that tries to
+pick up the configuration of your current kernel and then tries to adjust it
+somewhat for your system. That does not make the resulting kernel any better,
+but quicker to compile.
+
+Note: If you are dealing with a panic, Oops, warning, or BUG from the kernel,
+please try to enable CONFIG_KALLSYMS when configuring your kernel.
+Additionally, enable CONFIG_DEBUG_KERNEL and CONFIG_DEBUG_INFO, too; the
+latter is the relevant one of those two, but can only be reached if you enable
+the former. Be aware CONFIG_DEBUG_INFO increases the storage space required to
+build a kernel by quite a bit. But that's worth it, as these options will allow
+you later to pinpoint the exact line of code that triggers your issue. The
+section 'Decode failure messages' below explains this in more detail.
+
+But keep in mind: Always keep a record of the issue encountered in case it is
+hard to reproduce. Sending an undecoded report is better than not reporting
+the issue at all.
+
+
+Check 'taint' flag
+------------------
+
+ *Ensure the kernel you just installed does not 'taint' itself when
+ running.*
+
+As outlined above in more detail already: the kernel sets a 'taint' flag when
+something happens that can lead to follow-up errors that look totally
+unrelated. That's why you need to check if the kernel you just installed does
+not set this flag. And if it does, you in almost all the cases needs to
+eliminate the reason for it before you reporting issues that occur with it. See
+the section above for details how to do that.
+
+
+Reproduce issue with the fresh kernel
+-------------------------------------
+
+ *Reproduce the issue with the kernel you just installed. If it doesn't show
+ up there, scroll down to the instructions for issues only happening with
+ stable and longterm kernels.*
+
+Check if the issue occurs with the fresh Linux kernel version you just
+installed. If it was fixed there already, consider sticking with this version
+line and abandoning your plan to report the issue. But keep in mind that other
+users might still be plagued by it, as long as it's not fixed in either stable
+and longterm version from kernel.org (and thus vendor kernels derived from
+those). If you prefer to use one of those or just want to help their users,
+head over to the section "Details about reporting issues only occurring in
+older kernel version lines" below.
+
+
+Optimize description to reproduce issue
+---------------------------------------
+
+ *Optimize your notes: try to find and write the most straightforward way to
+ reproduce your issue. Make sure the end result has all the important
+ details, and at the same time is easy to read and understand for others
+ that hear about it for the first time. And if you learned something in this
+ process, consider searching again for existing reports about the issue.*
+
+An unnecessarily complex report will make it hard for others to understand your
+report. Thus try to find a reproducer that's straight forward to describe and
+thus easy to understand in written form. Include all important details, but at
+the same time try to keep it as short as possible.
+
+In this in the previous steps you likely have learned a thing or two about the
+issue you face. Use this knowledge and search again for existing reports
+instead you can join.
+
+
+Decode failure messages
+-----------------------
+
+ *If your failure involves a 'panic', 'Oops', 'warning', or 'BUG', consider
+ decoding the kernel log to find the line of code that triggered the error.*
+
+When the kernel detects an internal problem, it will log some information about
+the executed code. This makes it possible to pinpoint the exact line in the
+source code that triggered the issue and shows how it was called. But that only
+works if you enabled CONFIG_DEBUG_INFO and CONFIG_KALLSYMS when configuring
+your kernel. If you did so, consider to decode the information from the
+kernel's log. That will make it a lot easier to understand what lead to the
+'panic', 'Oops', 'warning', or 'BUG', which increases the chances that someone
+can provide a fix.
+
+Decoding can be done with a script you find in the Linux source tree. If you
+are running a kernel you compiled yourself earlier, call it like this::
+
+ [user@something ~]$ sudo dmesg | ./linux-5.10.5/scripts/decode_stacktrace.sh ./linux-5.10.5/vmlinux
+
+If you are running a packaged vanilla kernel, you will likely have to install
+the corresponding packages with debug symbols. Then call the script (which you
+might need to get from the Linux sources if your distro does not package it)
+like this::
+
+ [user@something ~]$ sudo dmesg | ./linux-5.10.5/scripts/decode_stacktrace.sh \
+ /usr/lib/debug/lib/modules/5.10.10-4.1.x86_64/vmlinux /usr/src/kernels/5.10.10-4.1.x86_64/
+
+The script will work on log lines like the following, which show the address of
+the code the kernel was executing when the error occurred::
+
+ [ 68.387301] RIP: 0010:test_module_init+0x5/0xffa [test_module]
+
+Once decoded, these lines will look like this::
+
+ [ 68.387301] RIP: 0010:test_module_init (/home/username/linux-5.10.5/test-module/test-module.c:16) test_module
+
+In this case the executed code was built from the file
+'~/linux-5.10.5/test-module/test-module.c' and the error occurred by the
+instructions found in line '16'.
+
+The script will similarly decode the addresses mentioned in the section
+starting with 'Call trace', which show the path to the function where the
+problem occurred. Additionally, the script will show the assembler output for
+the code section the kernel was executing.
+
+Note, if you can't get this to work, simply skip this step and mention the
+reason for it in the report. If you're lucky, it might not be needed. And if it
+is, someone might help you to get things going. Also be aware this is just one
+of several ways to decode kernel stack traces. Sometimes different steps will
+be required to retrieve the relevant details. Don't worry about that, if that's
+needed in your case, developers will tell you what to do.
+
+
+Special care for regressions
+----------------------------
+
+ *If your problem is a regression, try to narrow down when the issue was
+ introduced as much as possible.*
+
+Linux lead developer Linus Torvalds insists that the Linux kernel never
+worsens, that's why he deems regressions as unacceptable and wants to see them
+fixed quickly. That's why changes that introduced a regression are often
+promptly reverted if the issue they cause can't get solved quickly any other
+way. Reporting a regression is thus a bit like playing a kind of trump card to
+get something quickly fixed. But for that to happen the change that's causing
+the regression needs to be known. Normally it's up to the reporter to track
+down the culprit, as maintainers often won't have the time or setup at hand to
+reproduce it themselves.
+
+To find the change there is a process called 'bisection' which the document
+Documentation/admin-guide/bug-bisect.rst describes in detail. That process
+will often require you to build about ten to twenty kernel images, trying to
+reproduce the issue with each of them before building the next. Yes, that takes
+some time, but don't worry, it works a lot quicker than most people assume.
+Thanks to a 'binary search' this will lead you to the one commit in the source
+code management system that's causing the regression. Once you find it, search
+the net for the subject of the change, its commit id and the shortened commit id
+(the first 12 characters of the commit id). This will lead you to existing
+reports about it, if there are any.
+
+Note, a bisection needs a bit of know-how, which not everyone has, and quite a
+bit of effort, which not everyone is willing to invest. Nevertheless, it's
+highly recommended performing a bisection yourself. If you really can't or
+don't want to go down that route at least find out which mainline kernel
+introduced the regression. If something for example breaks when switching from
+5.5.15 to 5.8.4, then try at least all the mainline releases in that area (5.6,
+5.7 and 5.8) to check when it first showed up. Unless you're trying to find a
+regression in a stable or longterm kernel, avoid testing versions which number
+has three sections (5.6.12, 5.7.8), as that makes the outcome hard to
+interpret, which might render your testing useless. Once you found the major
+version which introduced the regression, feel free to move on in the reporting
+process. But keep in mind: it depends on the issue at hand if the developers
+will be able to help without knowing the culprit. Sometimes they might
+recognize from the report want went wrong and can fix it; other times they will
+be unable to help unless you perform a bisection.
+
+When dealing with regressions make sure the issue you face is really caused by
+the kernel and not by something else, as outlined above already.
+
+In the whole process keep in mind: an issue only qualifies as regression if the
+older and the newer kernel got built with a similar configuration. This can be
+achieved by using ``make olddefconfig``, as explained in more detail by
+Documentation/admin-guide/reporting-regressions.rst; that document also
+provides a good deal of other information about regressions you might want to be
+aware of.
+
+
+Write and send the report
+-------------------------
+
+ *Start to compile the report by writing a detailed description about the
+ issue. Always mention a few things: the latest kernel version you installed
+ for reproducing, the Linux Distribution used, and your notes on how to
+ reproduce the issue. Ideally, make the kernel's build configuration
+ (.config) and the output from ``dmesg`` available somewhere on the net and
+ link to it. Include or upload all other information that might be relevant,
+ like the output/screenshot of an Oops or the output from ``lspci``. Once
+ you wrote this main part, insert a normal length paragraph on top of it
+ outlining the issue and the impact quickly. On top of this add one sentence
+ that briefly describes the problem and gets people to read on. Now give the
+ thing a descriptive title or subject that yet again is shorter. Then you're
+ ready to send or file the report like the MAINTAINERS file told you, unless
+ you are dealing with one of those 'issues of high priority': they need
+ special care which is explained in 'Special handling for high priority
+ issues' below.*
+
+Now that you have prepared everything it's time to write your report. How to do
+that is partly explained by the three documents linked to in the preface above.
+That's why this text will only mention a few of the essentials as well as
+things specific to the Linux kernel.
+
+There is one thing that fits both categories: the most crucial parts of your
+report are the title/subject, the first sentence, and the first paragraph.
+Developers often get quite a lot of mail. They thus often just take a few
+seconds to skim a mail before deciding to move on or look closer. Thus: the
+better the top section of your report, the higher are the chances that someone
+will look into it and help you. And that is why you should ignore them for now
+and write the detailed report first. ;-)
+
+Things each report should mention
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Describe in detail how your issue happens with the fresh vanilla kernel you
+installed. Try to include the step-by-step instructions you wrote and optimized
+earlier that outline how you and ideally others can reproduce the issue; in
+those rare cases where that's impossible try to describe what you did to
+trigger it.
+
+Also include all the relevant information others might need to understand the
+issue and its environment. What's actually needed depends a lot on the issue,
+but there are some things you should include always:
+
+ * the output from ``cat /proc/version``, which contains the Linux kernel
+ version number and the compiler it was built with.
+
+ * the Linux distribution the machine is running (``hostnamectl | grep
+ "Operating System"``)
+
+ * the architecture of the CPU and the operating system (``uname -mi``)
+
+ * if you are dealing with a regression and performed a bisection, mention the
+ subject and the commit-id of the change that is causing it.
+
+In a lot of cases it's also wise to make two more things available to those
+that read your report:
+
+ * the configuration used for building your Linux kernel (the '.config' file)
+
+ * the kernel's messages that you get from ``dmesg`` written to a file. Make
+ sure that it starts with a line like 'Linux version 5.8-1
+ (foobar@example.com) (gcc (GCC) 10.2.1, GNU ld version 2.34) #1 SMP Mon Aug
+ 3 14:54:37 UTC 2020' If it's missing, then important messages from the first
+ boot phase already got discarded. In this case instead consider using
+ ``journalctl -b 0 -k``; alternatively you can also reboot, reproduce the
+ issue and call ``dmesg`` right afterwards.
+
+These two files are big, that's why it's a bad idea to put them directly into
+your report. If you are filing the issue in a bug tracker then attach them to
+the ticket. If you report the issue by mail do not attach them, as that makes
+the mail too large; instead do one of these things:
+
+ * Upload the files somewhere public (your website, a public file paste
+ service, a ticket created just for this purpose on `bugzilla.kernel.org
+ <https://bugzilla.kernel.org/>`_, ...) and include a link to them in your
+ report. Ideally use something where the files stay available for years, as
+ they could be useful to someone many years from now; this for example can
+ happen if five or ten years from now a developer works on some code that was
+ changed just to fix your issue.
+
+ * Put the files aside and mention you will send them later in individual
+ replies to your own mail. Just remember to actually do that once the report
+ went out. ;-)
+
+Things that might be wise to provide
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Depending on the issue you might need to add more background data. Here are a
+few suggestions what often is good to provide:
+
+ * If you are dealing with a 'warning', an 'OOPS' or a 'panic' from the kernel,
+ include it. If you can't copy'n'paste it, try to capture a netconsole trace
+ or at least take a picture of the screen.
+
+ * If the issue might be related to your computer hardware, mention what kind
+ of system you use. If you for example have problems with your graphics card,
+ mention its manufacturer, the card's model, and what chip is uses. If it's a
+ laptop mention its name, but try to make sure it's meaningful. 'Dell XPS 13'
+ for example is not, because it might be the one from 2012; that one looks
+ not that different from the one sold today, but apart from that the two have
+ nothing in common. Hence, in such cases add the exact model number, which
+ for example are '9380' or '7390' for XPS 13 models introduced during 2019.
+ Names like 'Lenovo Thinkpad T590' are also somewhat ambiguous: there are
+ variants of this laptop with and without a dedicated graphics chip, so try
+ to find the exact model name or specify the main components.
+
+ * Mention the relevant software in use. If you have problems with loading
+ modules, you want to mention the versions of kmod, systemd, and udev in use.
+ If one of the DRM drivers misbehaves, you want to state the versions of
+ libdrm and Mesa; also specify your Wayland compositor or the X-Server and
+ its driver. If you have a filesystem issue, mention the version of
+ corresponding filesystem utilities (e2fsprogs, btrfs-progs, xfsprogs, ...).
+
+ * Gather additional information from the kernel that might be of interest. The
+ output from ``lspci -nn`` will for example help others to identify what
+ hardware you use. If you have a problem with hardware you even might want to
+ make the output from ``sudo lspci -vvv`` available, as that provides
+ insights how the components were configured. For some issues it might be
+ good to include the contents of files like ``/proc/cpuinfo``,
+ ``/proc/ioports``, ``/proc/iomem``, ``/proc/modules``, or
+ ``/proc/scsi/scsi``. Some subsystem also offer tools to collect relevant
+ information. One such tool is ``alsa-info.sh`` `which the audio/sound
+ subsystem developers provide <https://www.alsa-project.org/wiki/AlsaInfo>`_.
+
+Those examples should give your some ideas of what data might be wise to
+attach, but you have to think yourself what will be helpful for others to know.
+Don't worry too much about forgetting something, as developers will ask for
+additional details they need. But making everything important available from
+the start increases the chance someone will take a closer look.
+
+
+The important part: the head of your report
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Now that you have the detailed part of the report prepared let's get to the
+most important section: the first few sentences. Thus go to the top, add
+something like 'The detailed description:' before the part you just wrote and
+insert two newlines at the top. Now write one normal length paragraph that
+describes the issue roughly. Leave out all boring details and focus on the
+crucial parts readers need to know to understand what this is all about; if you
+think this bug affects a lot of users, mention this to get people interested.
+
+Once you did that insert two more lines at the top and write a one sentence
+summary that explains quickly what the report is about. After that you have to
+get even more abstract and write an even shorter subject/title for the report.
+
+Now that you have written this part take some time to optimize it, as it is the
+most important parts of your report: a lot of people will only read this before
+they decide if reading the rest is time well spent.
+
+Now send or file the report like the :ref:`MAINTAINERS <maintainers>` file told
+you, unless it's one of those 'issues of high priority' outlined earlier: in
+that case please read the next subsection first before sending the report on
+its way.
+
+Special handling for high priority issues
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Reports for high priority issues need special handling.
+
+**Severe issues**: make sure the subject or ticket title as well as the first
+paragraph makes the severeness obvious.
+
+**Regressions**: make the report's subject start with '[REGRESSION]'.
+
+In case you performed a successful bisection, use the title of the change that
+introduced the regression as the second part of your subject. Make the report
+also mention the commit id of the culprit. In case of an unsuccessful bisection,
+make your report mention the latest tested version that's working fine (say 5.7)
+and the oldest where the issue occurs (say 5.8-rc1).
+
+When sending the report by mail, CC the Linux regressions mailing list
+(regressions@lists.linux.dev). In case the report needs to be filed to some web
+tracker, proceed to do so. Once filed, forward the report by mail to the
+regressions list; CC the maintainer and the mailing list for the subsystem in
+question. Make sure to inline the forwarded report, hence do not attach it.
+Also add a short note at the top where you mention the URL to the ticket.
+
+When mailing or forwarding the report, in case of a successful bisection add the
+author of the culprit to the recipients; also CC everyone in the signed-off-by
+chain, which you find at the end of its commit message.
+
+**Security issues**: for these issues your will have to evaluate if a
+short-term risk to other users would arise if details were publicly disclosed.
+If that's not the case simply proceed with reporting the issue as described.
+For issues that bear such a risk you will need to adjust the reporting process
+slightly:
+
+ * If the MAINTAINERS file instructed you to report the issue by mail, do not
+ CC any public mailing lists.
+
+ * If you were supposed to file the issue in a bug tracker make sure to mark
+ the ticket as 'private' or 'security issue'. If the bug tracker does not
+ offer a way to keep reports private, forget about it and send your report as
+ a private mail to the maintainers instead.
+
+In both cases make sure to also mail your report to the addresses the
+MAINTAINERS file lists in the section 'security contact'. Ideally directly CC
+them when sending the report by mail. If you filed it in a bug tracker, forward
+the report's text to these addresses; but on top of it put a small note where
+you mention that you filed it with a link to the ticket.
+
+See Documentation/process/security-bugs.rst for more information.
+
+
+Duties after the report went out
+--------------------------------
+
+ *Wait for reactions and keep the thing rolling until you can accept the
+ outcome in one way or the other. Thus react publicly and in a timely manner
+ to any inquiries. Test proposed fixes. Do proactive testing: retest with at
+ least every first release candidate (RC) of a new mainline version and
+ report your results. Send friendly reminders if things stall. And try to
+ help yourself, if you don't get any help or if it's unsatisfying.*
+
+If your report was good and you are really lucky then one of the developers
+might immediately spot what's causing the issue; they then might write a patch
+to fix it, test it, and send it straight for integration in mainline while
+tagging it for later backport to stable and longterm kernels that need it. Then
+all you need to do is reply with a 'Thank you very much' and switch to a version
+with the fix once it gets released.
+
+But this ideal scenario rarely happens. That's why the job is only starting
+once you got the report out. What you'll have to do depends on the situations,
+but often it will be the things listed below. But before digging into the
+details, here are a few important things you need to keep in mind for this part
+of the process.
+
+
+General advice for further interactions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+**Always reply in public**: When you filed the issue in a bug tracker, always
+reply there and do not contact any of the developers privately about it. For
+mailed reports always use the 'Reply-all' function when replying to any mails
+you receive. That includes mails with any additional data you might want to add
+to your report: go to your mail applications 'Sent' folder and use 'reply-all'
+on your mail with the report. This approach will make sure the public mailing
+list(s) and everyone else that gets involved over time stays in the loop; it
+also keeps the mail thread intact, which among others is really important for
+mailing lists to group all related mails together.
+
+There are just two situations where a comment in a bug tracker or a 'Reply-all'
+is unsuitable:
+
+ * Someone tells you to send something privately.
+
+ * You were told to send something, but noticed it contains sensitive
+ information that needs to be kept private. In that case it's okay to send it
+ in private to the developer that asked for it. But note in the ticket or a
+ mail that you did that, so everyone else knows you honored the request.
+
+**Do research before asking for clarifications or help**: In this part of the
+process someone might tell you to do something that requires a skill you might
+not have mastered yet. For example, you might be asked to use some test tools
+you never have heard of yet; or you might be asked to apply a patch to the
+Linux kernel sources to test if it helps. In some cases it will be fine sending
+a reply asking for instructions how to do that. But before going that route try
+to find the answer own your own by searching the internet; alternatively
+consider asking in other places for advice. For example ask a friend or post
+about it to a chatroom or forum you normally hang out.
+
+**Be patient**: If you are really lucky you might get a reply to your report
+within a few hours. But most of the time it will take longer, as maintainers
+are scattered around the globe and thus might be in a different time zone – one
+where they already enjoy their night away from keyboard.
+
+In general, kernel developers will take one to five business days to respond to
+reports. Sometimes it will take longer, as they might be busy with the merge
+windows, other work, visiting developer conferences, or simply enjoying a long
+summer holiday.
+
+The 'issues of high priority' (see above for an explanation) are an exception
+here: maintainers should address them as soon as possible; that's why you
+should wait a week at maximum (or just two days if it's something urgent)
+before sending a friendly reminder.
+
+Sometimes the maintainer might not be responding in a timely manner; other
+times there might be disagreements, for example if an issue qualifies as
+regression or not. In such cases raise your concerns on the mailing list and
+ask others for public or private replies how to move on. If that fails, it
+might be appropriate to get a higher authority involved. In case of a WiFi
+driver that would be the wireless maintainers; if there are no higher level
+maintainers or all else fails, it might be one of those rare situations where
+it's okay to get Linus Torvalds involved.
+
+**Proactive testing**: Every time the first pre-release (the 'rc1') of a new
+mainline kernel version gets released, go and check if the issue is fixed there
+or if anything of importance changed. Mention the outcome in the ticket or in a
+mail you sent as reply to your report (make sure it has all those in the CC
+that up to that point participated in the discussion). This will show your
+commitment and that you are willing to help. It also tells developers if the
+issue persists and makes sure they do not forget about it. A few other
+occasional retests (for example with rc3, rc5 and the final) are also a good
+idea, but only report your results if something relevant changed or if you are
+writing something anyway.
+
+With all these general things off the table let's get into the details of how
+to help to get issues resolved once they were reported.
+
+Inquires and testing request
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Here are your duties in case you got replies to your report:
+
+**Check who you deal with**: Most of the time it will be the maintainer or a
+developer of the particular code area that will respond to your report. But as
+issues are normally reported in public it could be anyone that's replying —
+including people that want to help, but in the end might guide you totally off
+track with their questions or requests. That rarely happens, but it's one of
+many reasons why it's wise to quickly run an internet search to see who you're
+interacting with. By doing this you also get aware if your report was heard by
+the right people, as a reminder to the maintainer (see below) might be in order
+later if discussion fades out without leading to a satisfying solution for the
+issue.
+
+**Inquiries for data**: Often you will be asked to test something or provide
+additional details. Try to provide the requested information soon, as you have
+the attention of someone that might help and risk losing it the longer you
+wait; that outcome is even likely if you do not provide the information within
+a few business days.
+
+**Requests for testing**: When you are asked to test a diagnostic patch or a
+possible fix, try to test it in timely manner, too. But do it properly and make
+sure to not rush it: mixing things up can happen easily and can lead to a lot
+of confusion for everyone involved. A common mistake for example is thinking a
+proposed patch with a fix was applied, but in fact wasn't. Things like that
+happen even to experienced testers occasionally, but they most of the time will
+notice when the kernel with the fix behaves just as one without it.
+
+What to do when nothing of substance happens
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Some reports will not get any reaction from the responsible Linux kernel
+developers; or a discussion around the issue evolved, but faded out with
+nothing of substance coming out of it.
+
+In these cases wait two (better: three) weeks before sending a friendly
+reminder: maybe the maintainer was just away from keyboard for a while when
+your report arrived or had something more important to take care of. When
+writing the reminder, kindly ask if anything else from your side is needed to
+get the ball running somehow. If the report got out by mail, do that in the
+first lines of a mail that is a reply to your initial mail (see above) which
+includes a full quote of the original report below: that's on of those few
+situations where such a 'TOFU' (Text Over, Fullquote Under) is the right
+approach, as then all the recipients will have the details at hand immediately
+in the proper order.
+
+After the reminder wait three more weeks for replies. If you still don't get a
+proper reaction, you first should reconsider your approach. Did you maybe try
+to reach out to the wrong people? Was the report maybe offensive or so
+confusing that people decided to completely stay away from it? The best way to
+rule out such factors: show the report to one or two people familiar with FLOSS
+issue reporting and ask for their opinion. Also ask them for their advice how
+to move forward. That might mean: prepare a better report and make those people
+review it before you send it out. Such an approach is totally fine; just
+mention that this is the second and improved report on the issue and include a
+link to the first report.
+
+If the report was proper you can send a second reminder; in it ask for advice
+why the report did not get any replies. A good moment for this second reminder
+mail is shortly after the first pre-release (the 'rc1') of a new Linux kernel
+version got published, as you should retest and provide a status update at that
+point anyway (see above).
+
+If the second reminder again results in no reaction within a week, try to
+contact a higher-level maintainer asking for advice: even busy maintainers by
+then should at least have sent some kind of acknowledgment.
+
+Remember to prepare yourself for a disappointment: maintainers ideally should
+react somehow to every issue report, but they are only obliged to fix those
+'issues of high priority' outlined earlier. So don't be too devastating if you
+get a reply along the lines of 'thanks for the report, I have more important
+issues to deal with currently and won't have time to look into this for the
+foreseeable future'.
+
+It's also possible that after some discussion in the bug tracker or on a list
+nothing happens anymore and reminders don't help to motivate anyone to work out
+a fix. Such situations can be devastating, but is within the cards when it
+comes to Linux kernel development. This and several other reasons for not
+getting help are explained in 'Why some issues won't get any reaction or remain
+unfixed after being reported' near the end of this document.
+
+Don't get devastated if you don't find any help or if the issue in the end does
+not get solved: the Linux kernel is FLOSS and thus you can still help yourself.
+You for example could try to find others that are affected and team up with
+them to get the issue resolved. Such a team could prepare a fresh report
+together that mentions how many you are and why this is something that in your
+option should get fixed. Maybe together you can also narrow down the root cause
+or the change that introduced a regression, which often makes developing a fix
+easier. And with a bit of luck there might be someone in the team that knows a
+bit about programming and might be able to write a fix.
+
+
+Reference for "Reporting regressions within a stable and longterm kernel line"
+------------------------------------------------------------------------------
+
+This subsection provides details for the steps you need to perform if you face
+a regression within a stable and longterm kernel line.
+
+Make sure the particular version line still gets support
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ *Check if the kernel developers still maintain the Linux kernel version
+ line you care about: go to the front page of kernel.org and make sure it
+ mentions the latest release of the particular version line without an
+ '[EOL]' tag.*
+
+Most kernel version lines only get supported for about three months, as
+maintaining them longer is quite a lot of work. Hence, only one per year is
+chosen and gets supported for at least two years (often six). That's why you
+need to check if the kernel developers still support the version line you care
+for.
+
+Note, if kernel.org lists two stable version lines on the front page, you
+should consider switching to the newer one and forget about the older one:
+support for it is likely to be abandoned soon. Then it will get a "end-of-life"
+(EOL) stamp. Version lines that reached that point still get mentioned on the
+kernel.org front page for a week or two, but are unsuitable for testing and
+reporting.
+
+Search stable mailing list
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ *Check the archives of the Linux stable mailing list for existing reports.*
+
+Maybe the issue you face is already known and was fixed or is about to. Hence,
+`search the archives of the Linux stable mailing list
+<https://lore.kernel.org/stable/>`_ for reports about an issue like yours. If
+you find any matches, consider joining the discussion, unless the fix is
+already finished and scheduled to get applied soon.
+
+Reproduce issue with the newest release
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ *Install the latest release from the particular version line as a vanilla
+ kernel. Ensure this kernel is not tainted and still shows the problem, as
+ the issue might have already been fixed there. If you first noticed the
+ problem with a vendor kernel, check a vanilla build of the last version
+ known to work performs fine as well.*
+
+Before investing any more time in this process you want to check if the issue
+was already fixed in the latest release of version line you're interested in.
+This kernel needs to be vanilla and shouldn't be tainted before the issue
+happens, as detailed outlined already above in the section "Install a fresh
+kernel for testing".
+
+Did you first notice the regression with a vendor kernel? Then changes the
+vendor applied might be interfering. You need to rule that out by performing
+a recheck. Say something broke when you updated from 5.10.4-vendor.42 to
+5.10.5-vendor.43. Then after testing the latest 5.10 release as outlined in
+the previous paragraph check if a vanilla build of Linux 5.10.4 works fine as
+well. If things are broken there, the issue does not qualify as upstream
+regression and you need switch back to the main step-by-step guide to report
+the issue.
+
+Report the regression
+~~~~~~~~~~~~~~~~~~~~~
+
+ *Send a short problem report to the Linux stable mailing list
+ (stable@vger.kernel.org) and CC the Linux regressions mailing list
+ (regressions@lists.linux.dev); if you suspect the cause in a particular
+ subsystem, CC its maintainer and its mailing list. Roughly describe the
+ issue and ideally explain how to reproduce it. Mention the first version
+ that shows the problem and the last version that's working fine. Then
+ wait for further instructions.*
+
+When reporting a regression that happens within a stable or longterm kernel
+line (say when updating from 5.10.4 to 5.10.5) a brief report is enough for
+the start to get the issue reported quickly. Hence a rough description to the
+stable and regressions mailing list is all it takes; but in case you suspect
+the cause in a particular subsystem, CC its maintainers and its mailing list
+as well, because that will speed things up.
+
+And note, it helps developers a great deal if you can specify the exact version
+that introduced the problem. Hence if possible within a reasonable time frame,
+try to find that version using vanilla kernels. Lets assume something broke when
+your distributor released a update from Linux kernel 5.10.5 to 5.10.8. Then as
+instructed above go and check the latest kernel from that version line, say
+5.10.9. If it shows the problem, try a vanilla 5.10.5 to ensure that no patches
+the distributor applied interfere. If the issue doesn't manifest itself there,
+try 5.10.7 and then (depending on the outcome) 5.10.8 or 5.10.6 to find the
+first version where things broke. Mention it in the report and state that 5.10.9
+is still broken.
+
+What the previous paragraph outlines is basically a rough manual 'bisection'.
+Once your report is out your might get asked to do a proper one, as it allows to
+pinpoint the exact change that causes the issue (which then can easily get
+reverted to fix the issue quickly). Hence consider to do a proper bisection
+right away if time permits. See the section 'Special care for regressions' and
+the document Documentation/admin-guide/bug-bisect.rst for details how to
+perform one. In case of a successful bisection add the author of the culprit to
+the recipients; also CC everyone in the signed-off-by chain, which you find at
+the end of its commit message.
+
+
+Reference for "Reporting issues only occurring in older kernel version lines"
+-----------------------------------------------------------------------------
+
+This section provides details for the steps you need to take if you could not
+reproduce your issue with a mainline kernel, but want to see it fixed in older
+version lines (aka stable and longterm kernels).
+
+Some fixes are too complex
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ *Prepare yourself for the possibility that going through the next few steps
+ might not get the issue solved in older releases: the fix might be too big
+ or risky to get backported there.*
+
+Even small and seemingly obvious code-changes sometimes introduce new and
+totally unexpected problems. The maintainers of the stable and longterm kernels
+are very aware of that and thus only apply changes to these kernels that are
+within rules outlined in Documentation/process/stable-kernel-rules.rst.
+
+Complex or risky changes for example do not qualify and thus only get applied
+to mainline. Other fixes are easy to get backported to the newest stable and
+longterm kernels, but too risky to integrate into older ones. So be aware the
+fix you are hoping for might be one of those that won't be backported to the
+version line your care about. In that case you'll have no other choice then to
+live with the issue or switch to a newer Linux version, unless you want to
+patch the fix into your kernels yourself.
+
+Common preparations
+~~~~~~~~~~~~~~~~~~~
+
+ *Perform the first three steps in the section "Reporting issues only
+ occurring in older kernel version lines" above.*
+
+You need to carry out a few steps already described in another section of this
+guide. Those steps will let you:
+
+ * Check if the kernel developers still maintain the Linux kernel version line
+ you care about.
+
+ * Search the Linux stable mailing list for exiting reports.
+
+ * Check with the latest release.
+
+
+Check code history and search for existing discussions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ *Search the Linux kernel version control system for the change that fixed
+ the issue in mainline, as its commit message might tell you if the fix is
+ scheduled for backporting already. If you don't find anything that way,
+ search the appropriate mailing lists for posts that discuss such an issue
+ or peer-review possible fixes; then check the discussions if the fix was
+ deemed unsuitable for backporting. If backporting was not considered at
+ all, join the newest discussion, asking if it's in the cards.*
+
+In a lot of cases the issue you deal with will have happened with mainline, but
+got fixed there. The commit that fixed it would need to get backported as well
+to get the issue solved. That's why you want to search for it or any
+discussions abound it.
+
+ * First try to find the fix in the Git repository that holds the Linux kernel
+ sources. You can do this with the web interfaces `on kernel.org
+ <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/>`_
+ or its mirror `on GitHub <https://github.com/torvalds/linux>`_; if you have
+ a local clone you alternatively can search on the command line with ``git
+ log --grep=<pattern>``.
+
+ If you find the fix, look if the commit message near the end contains a
+ 'stable tag' that looks like this:
+
+ Cc: <stable@vger.kernel.org> # 5.4+
+
+ If that's case the developer marked the fix safe for backporting to version
+ line 5.4 and later. Most of the time it's getting applied there within two
+ weeks, but sometimes it takes a bit longer.
+
+ * If the commit doesn't tell you anything or if you can't find the fix, look
+ again for discussions about the issue. Search the net with your favorite
+ internet search engine as well as the archives for the `Linux kernel
+ developers mailing list <https://lore.kernel.org/lkml/>`_. Also read the
+ section `Locate kernel area that causes the issue` above and follow the
+ instructions to find the subsystem in question: its bug tracker or mailing
+ list archive might have the answer you are looking for.
+
+ * If you see a proposed fix, search for it in the version control system as
+ outlined above, as the commit might tell you if a backport can be expected.
+
+ * Check the discussions for any indicators the fix might be too risky to get
+ backported to the version line you care about. If that's the case you have
+ to live with the issue or switch to the kernel version line where the fix
+ got applied.
+
+ * If the fix doesn't contain a stable tag and backporting was not discussed,
+ join the discussion: mention the version where you face the issue and that
+ you would like to see it fixed, if suitable.
+
+
+Ask for advice
+~~~~~~~~~~~~~~
+
+ *One of the former steps should lead to a solution. If that doesn't work
+ out, ask the maintainers for the subsystem that seems to be causing the
+ issue for advice; CC the mailing list for the particular subsystem as well
+ as the stable mailing list.*
+
+If the previous three steps didn't get you closer to a solution there is only
+one option left: ask for advice. Do that in a mail you sent to the maintainers
+for the subsystem where the issue seems to have its roots; CC the mailing list
+for the subsystem as well as the stable mailing list (stable@vger.kernel.org).
+
+
+Why some issues won't get any reaction or remain unfixed after being reported
+=============================================================================
+
+When reporting a problem to the Linux developers, be aware only 'issues of high
+priority' (regressions, security issues, severe problems) are definitely going
+to get resolved. The maintainers or if all else fails Linus Torvalds himself
+will make sure of that. They and the other kernel developers will fix a lot of
+other issues as well. But be aware that sometimes they can't or won't help; and
+sometimes there isn't even anyone to send a report to.
+
+This is best explained with kernel developers that contribute to the Linux
+kernel in their spare time. Quite a few of the drivers in the kernel were
+written by such programmers, often because they simply wanted to make their
+hardware usable on their favorite operating system.
+
+These programmers most of the time will happily fix problems other people
+report. But nobody can force them to do, as they are contributing voluntarily.
+
+Then there are situations where such developers really want to fix an issue,
+but can't: sometimes they lack hardware programming documentation to do so.
+This often happens when the publicly available docs are superficial or the
+driver was written with the help of reverse engineering.
+
+Sooner or later spare time developers will also stop caring for the driver.
+Maybe their test hardware broke, got replaced by something more fancy, or is so
+old that it's something you don't find much outside of computer museums
+anymore. Sometimes developer stops caring for their code and Linux at all, as
+something different in their life became way more important. In some cases
+nobody is willing to take over the job as maintainer – and nobody can be forced
+to, as contributing to the Linux kernel is done on a voluntary basis. Abandoned
+drivers nevertheless remain in the kernel: they are still useful for people and
+removing would be a regression.
+
+The situation is not that different with developers that are paid for their
+work on the Linux kernel. Those contribute most changes these days. But their
+employers sooner or later also stop caring for their code or make its
+programmer focus on other things. Hardware vendors for example earn their money
+mainly by selling new hardware; quite a few of them hence are not investing
+much time and energy in maintaining a Linux kernel driver for something they
+stopped selling years ago. Enterprise Linux distributors often care for a
+longer time period, but in new versions often leave support for old and rare
+hardware aside to limit the scope. Often spare time contributors take over once
+a company orphans some code, but as mentioned above: sooner or later they will
+leave the code behind, too.
+
+Priorities are another reason why some issues are not fixed, as maintainers
+quite often are forced to set those, as time to work on Linux is limited.
+That's true for spare time or the time employers grant their developers to
+spend on maintenance work on the upstream kernel. Sometimes maintainers also
+get overwhelmed with reports, even if a driver is working nearly perfectly. To
+not get completely stuck, the programmer thus might have no other choice than
+to prioritize issue reports and reject some of them.
+
+But don't worry too much about all of this, a lot of drivers have active
+maintainers who are quite interested in fixing as many issues as possible.
+
+
+Closing words
+=============
+
+Compared with other Free/Libre & Open Source Software it's hard to report
+issues to the Linux kernel developers: the length and complexity of this
+document and the implications between the lines illustrate that. But that's how
+it is for now. The main author of this text hopes documenting the state of the
+art will lay some groundwork to improve the situation over time.
+
+
+..
+ end-of-content
+..
+ This document is maintained by Thorsten Leemhuis <linux@leemhuis.info>. If
+ you spot a typo or small mistake, feel free to let him know directly and
+ he'll fix it. You are free to do the same in a mostly informal way if you
+ want to contribute changes to the text, but for copyright reasons please CC
+ linux-doc@vger.kernel.org and "sign-off" your contribution as
+ Documentation/process/submitting-patches.rst outlines in the section "Sign
+ your work - the Developer's Certificate of Origin".
+..
+ This text is available under GPL-2.0+ or CC-BY-4.0, as stated at the top
+ of the file. If you want to distribute this text under CC-BY-4.0 only,
+ please use "The Linux kernel developers" for author attribution and link
+ this as source:
+ https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/reporting-issues.rst
+..
+ Note: Only the content of this RST file as found in the Linux kernel sources
+ is available under CC-BY-4.0, as versions of this text that were processed
+ (for example by the kernel's build system) might contain content taken from
+ files which use a more restrictive license.
diff --git a/Documentation/admin-guide/reporting-regressions.rst b/Documentation/admin-guide/reporting-regressions.rst
new file mode 100644
index 000000000000..d8adccdae23f
--- /dev/null
+++ b/Documentation/admin-guide/reporting-regressions.rst
@@ -0,0 +1,451 @@
+.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0)
+.. [see the bottom of this file for redistribution information]
+
+Reporting regressions
++++++++++++++++++++++
+
+"*We don't cause regressions*" is the first rule of Linux kernel development;
+Linux founder and lead developer Linus Torvalds established it himself and
+ensures it's obeyed.
+
+This document describes what the rule means for users and how the Linux kernel's
+development model ensures to address all reported regressions; aspects relevant
+for kernel developers are left to Documentation/process/handling-regressions.rst.
+
+
+The important bits (aka "TL;DR")
+================================
+
+#. It's a regression if something running fine with one Linux kernel works worse
+ or not at all with a newer version. Note, the newer kernel has to be compiled
+ using a similar configuration; the detailed explanations below describes this
+ and other fine print in more detail.
+
+#. Report your issue as outlined in Documentation/admin-guide/reporting-issues.rst,
+ it already covers all aspects important for regressions and repeated
+ below for convenience. Two of them are important: start your report's subject
+ with "[REGRESSION]" and CC or forward it to `the regression mailing list
+ <https://lore.kernel.org/regressions/>`_ (regressions@lists.linux.dev).
+
+#. Optional, but recommended: when sending or forwarding your report, make the
+ Linux kernel regression tracking bot "regzbot" track the issue by specifying
+ when the regression started like this::
+
+ #regzbot introduced v5.13..v5.14-rc1
+
+
+All the details on Linux kernel regressions relevant for users
+==============================================================
+
+
+The important basics
+--------------------
+
+
+What is a "regression" and what is the "no regressions rule"?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It's a regression if some application or practical use case running fine with
+one Linux kernel works worse or not at all with a newer version compiled using a
+similar configuration. The "no regressions rule" forbids this to take place; if
+it happens by accident, developers that caused it are expected to quickly fix
+the issue.
+
+It thus is a regression when a WiFi driver from Linux 5.13 works fine, but with
+5.14 doesn't work at all, works significantly slower, or misbehaves somehow.
+It's also a regression if a perfectly working application suddenly shows erratic
+behavior with a newer kernel version; such issues can be caused by changes in
+procfs, sysfs, or one of the many other interfaces Linux provides to userland
+software. But keep in mind, as mentioned earlier: 5.14 in this example needs to
+be built from a configuration similar to the one from 5.13. This can be achieved
+using ``make olddefconfig``, as explained in more detail below.
+
+Note the "practical use case" in the first sentence of this section: developers
+despite the "no regressions" rule are free to change any aspect of the kernel
+and even APIs or ABIs to userland, as long as no existing application or use
+case breaks.
+
+Also be aware the "no regressions" rule covers only interfaces the kernel
+provides to the userland. It thus does not apply to kernel-internal interfaces
+like the module API, which some externally developed drivers use to hook into
+the kernel.
+
+How do I report a regression?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Just report the issue as outlined in
+Documentation/admin-guide/reporting-issues.rst, it already describes the
+important points. The following aspects outlined there are especially relevant
+for regressions:
+
+ * When checking for existing reports to join, also search the `archives of the
+ Linux regressions mailing list <https://lore.kernel.org/regressions/>`_ and
+ `regzbot's web-interface <https://linux-regtracking.leemhuis.info/regzbot/>`_.
+
+ * Start your report's subject with "[REGRESSION]".
+
+ * In your report, clearly mention the last kernel version that worked fine and
+ the first broken one. Ideally try to find the exact change causing the
+ regression using a bisection, as explained below in more detail.
+
+ * Remember to let the Linux regressions mailing list
+ (regressions@lists.linux.dev) know about your report:
+
+ * If you report the regression by mail, CC the regressions list.
+
+ * If you report your regression to some bug tracker, forward the submitted
+ report by mail to the regressions list while CCing the maintainer and the
+ mailing list for the subsystem in question.
+
+ If it's a regression within a stable or longterm series (e.g.
+ v5.15.3..v5.15.5), remember to CC the `Linux stable mailing list
+ <https://lore.kernel.org/stable/>`_ (stable@vger.kernel.org).
+
+ In case you performed a successful bisection, add everyone to the CC the
+ culprit's commit message mentions in lines starting with "Signed-off-by:".
+
+When CCing for forwarding your report to the list, consider directly telling the
+aforementioned Linux kernel regression tracking bot about your report. To do
+that, include a paragraph like this in your mail::
+
+ #regzbot introduced: v5.13..v5.14-rc1
+
+Regzbot will then consider your mail a report for a regression introduced in the
+specified version range. In above case Linux v5.13 still worked fine and Linux
+v5.14-rc1 was the first version where you encountered the issue. If you
+performed a bisection to find the commit that caused the regression, specify the
+culprit's commit-id instead::
+
+ #regzbot introduced: 1f2e3d4c5d
+
+Placing such a "regzbot command" is in your interest, as it will ensure the
+report won't fall through the cracks unnoticed. If you omit this, the Linux
+kernel's regressions tracker will take care of telling regzbot about your
+regression, as long as you send a copy to the regressions mailing lists. But the
+regression tracker is just one human which sometimes has to rest or occasionally
+might even enjoy some time away from computers (as crazy as that might sound).
+Relying on this person thus will result in an unnecessary delay before the
+regressions becomes mentioned `on the list of tracked and unresolved Linux
+kernel regressions <https://linux-regtracking.leemhuis.info/regzbot/>`_ and the
+weekly regression reports sent by regzbot. Such delays can result in Linus
+Torvalds being unaware of important regressions when deciding between "continue
+development or call this finished and release the final?".
+
+Are really all regressions fixed?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Nearly all of them are, as long as the change causing the regression (the
+"culprit commit") is reliably identified. Some regressions can be fixed without
+this, but often it's required.
+
+Who needs to find the root cause of a regression?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Developers of the affected code area should try to locate the culprit on their
+own. But for them that's often impossible to do with reasonable effort, as quite
+a lot of issues only occur in a particular environment outside the developer's
+reach -- for example, a specific hardware platform, firmware, Linux distro,
+system's configuration, or application. That's why in the end it's often up to
+the reporter to locate the culprit commit; sometimes users might even need to
+run additional tests afterwards to pinpoint the exact root cause. Developers
+should offer advice and reasonably help where they can, to make this process
+relatively easy and achievable for typical users.
+
+How can I find the culprit?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Perform a bisection, as roughly outlined in
+Documentation/admin-guide/reporting-issues.rst and described in more detail by
+Documentation/admin-guide/bug-bisect.rst. It might sound like a lot of work, but
+in many cases finds the culprit relatively quickly. If it's hard or
+time-consuming to reliably reproduce the issue, consider teaming up with other
+affected users to narrow down the search range together.
+
+Who can I ask for advice when it comes to regressions?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Send a mail to the regressions mailing list (regressions@lists.linux.dev) while
+CCing the Linux kernel's regression tracker (regressions@leemhuis.info); if the
+issue might better be dealt with in private, feel free to omit the list.
+
+
+Additional details about regressions
+------------------------------------
+
+
+What is the goal of the "no regressions rule"?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Users should feel safe when updating kernel versions and not have to worry
+something might break. This is in the interest of the kernel developers to make
+updating attractive: they don't want users to stay on stable or longterm Linux
+series that are either abandoned or more than one and a half years old. That's
+in everybody's interest, as `those series might have known bugs, security
+issues, or other problematic aspects already fixed in later versions
+<http://www.kroah.com/log/blog/2018/08/24/what-stable-kernel-should-i-use/>`_.
+Additionally, the kernel developers want to make it simple and appealing for
+users to test the latest pre-release or regular release. That's also in
+everybody's interest, as it's a lot easier to track down and fix problems, if
+they are reported shortly after being introduced.
+
+Is the "no regressions" rule really adhered in practice?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It's taken really seriously, as can be seen by many mailing list posts from
+Linux creator and lead developer Linus Torvalds, some of which are quoted in
+Documentation/process/handling-regressions.rst.
+
+Exceptions to this rule are extremely rare; in the past developers almost always
+turned out to be wrong when they assumed a particular situation was warranting
+an exception.
+
+Who ensures the "no regressions" is actually followed?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The subsystem maintainers should take care of that, which are watched and
+supported by the tree maintainers -- e.g. Linus Torvalds for mainline and
+Greg Kroah-Hartman et al. for various stable/longterm series.
+
+All of them are helped by people trying to ensure no regression report falls
+through the cracks. One of them is Thorsten Leemhuis, who's currently acting as
+the Linux kernel's "regressions tracker"; to facilitate this work he relies on
+regzbot, the Linux kernel regression tracking bot. That's why you want to bring
+your report on the radar of these people by CCing or forwarding each report to
+the regressions mailing list, ideally with a "regzbot command" in your mail to
+get it tracked immediately.
+
+How quickly are regressions normally fixed?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Developers should fix any reported regression as quickly as possible, to provide
+affected users with a solution in a timely manner and prevent more users from
+running into the issue; nevertheless developers need to take enough time and
+care to ensure regression fixes do not cause additional damage.
+
+The answer thus depends on various factors like the impact of a regression, its
+age, or the Linux series in which it occurs. In the end though, most regressions
+should be fixed within two weeks.
+
+Is it a regression, if the issue can be avoided by updating some software?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Almost always: yes. If a developer tells you otherwise, ask the regression
+tracker for advice as outlined above.
+
+Is it a regression, if a newer kernel works slower or consumes more energy?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Yes, but the difference has to be significant. A five percent slow-down in a
+micro-benchmark thus is unlikely to qualify as regression, unless it also
+influences the results of a broad benchmark by more than one percent. If in
+doubt, ask for advice.
+
+Is it a regression, if an external kernel module breaks when updating Linux?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+No, as the "no regression" rule is about interfaces and services the Linux
+kernel provides to the userland. It thus does not cover building or running
+externally developed kernel modules, as they run in kernel-space and hook into
+the kernel using internal interfaces occasionally changed.
+
+How are regressions handled that are caused by security fixes?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In extremely rare situations security issues can't be fixed without causing
+regressions; those fixes are given way, as they are the lesser evil in the end.
+Luckily this middling almost always can be avoided, as key developers for the
+affected area and often Linus Torvalds himself try very hard to fix security
+issues without causing regressions.
+
+If you nevertheless face such a case, check the mailing list archives if people
+tried their best to avoid the regression. If not, report it; if in doubt, ask
+for advice as outlined above.
+
+What happens if fixing a regression is impossible without causing another?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Sadly these things happen, but luckily not very often; if they occur, expert
+developers of the affected code area should look into the issue to find a fix
+that avoids regressions or at least their impact. If you run into such a
+situation, do what was outlined already for regressions caused by security
+fixes: check earlier discussions if people already tried their best and ask for
+advice if in doubt.
+
+A quick note while at it: these situations could be avoided, if people would
+regularly give mainline pre-releases (say v5.15-rc1 or -rc3) from each
+development cycle a test run. This is best explained by imagining a change
+integrated between Linux v5.14 and v5.15-rc1 which causes a regression, but at
+the same time is a hard requirement for some other improvement applied for
+5.15-rc1. All these changes often can simply be reverted and the regression thus
+solved, if someone finds and reports it before 5.15 is released. A few days or
+weeks later this solution can become impossible, as some software might have
+started to rely on aspects introduced by one of the follow-up changes: reverting
+all changes would then cause a regression for users of said software and thus is
+out of the question.
+
+Is it a regression, if some feature I relied on was removed months ago?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It is, but often it's hard to fix such regressions due to the aspects outlined
+in the previous section. It hence needs to be dealt with on a case-by-case
+basis. This is another reason why it's in everybody's interest to regularly test
+mainline pre-releases.
+
+Does the "no regression" rule apply if I seem to be the only affected person?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It does, but only for practical usage: the Linux developers want to be free to
+remove support for hardware only to be found in attics and museums anymore.
+
+Note, sometimes regressions can't be avoided to make progress -- and the latter
+is needed to prevent Linux from stagnation. Hence, if only very few users seem
+to be affected by a regression, it for the greater good might be in their and
+everyone else's interest to lettings things pass. Especially if there is an
+easy way to circumvent the regression somehow, for example by updating some
+software or using a kernel parameter created just for this purpose.
+
+Does the regression rule apply for code in the staging tree as well?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Not according to the `help text for the configuration option covering all
+staging code <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/Kconfig>`_,
+which since its early days states::
+
+ Please note that these drivers are under heavy development, may or
+ may not work, and may contain userspace interfaces that most likely
+ will be changed in the near future.
+
+The staging developers nevertheless often adhere to the "no regressions" rule,
+but sometimes bend it to make progress. That's for example why some users had to
+deal with (often negligible) regressions when a WiFi driver from the staging
+tree was replaced by a totally different one written from scratch.
+
+Why do later versions have to be "compiled with a similar configuration"?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Because the Linux kernel developers sometimes integrate changes known to cause
+regressions, but make them optional and disable them in the kernel's default
+configuration. This trick allows progress, as the "no regressions" rule
+otherwise would lead to stagnation.
+
+Consider for example a new security feature blocking access to some kernel
+interfaces often abused by malware, which at the same time are required to run a
+few rarely used applications. The outlined approach makes both camps happy:
+people using these applications can leave the new security feature off, while
+everyone else can enable it without running into trouble.
+
+How to create a configuration similar to the one of an older kernel?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Start your machine with a known-good kernel and configure the newer Linux
+version with ``make olddefconfig``. This makes the kernel's build scripts pick
+up the configuration file (the ".config" file) from the running kernel as base
+for the new one you are about to compile; afterwards they set all new
+configuration options to their default value, which should disable new features
+that might cause regressions.
+
+Can I report a regression I found with pre-compiled vanilla kernels?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You need to ensure the newer kernel was compiled with a similar configuration
+file as the older one (see above), as those that built them might have enabled
+some known-to-be incompatible feature for the newer kernel. If in doubt, report
+the matter to the kernel's provider and ask for advice.
+
+
+More about regression tracking with "regzbot"
+---------------------------------------------
+
+What is regression tracking and why should I care about it?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Rules like "no regressions" need someone to ensure they are followed, otherwise
+they are broken either accidentally or on purpose. History has shown this to be
+true for Linux kernel development as well. That's why Thorsten Leemhuis, the
+Linux Kernel's regression tracker, and some people try to ensure all regression
+are fixed by keeping an eye on them until they are resolved. Neither of them are
+paid for this, that's why the work is done on a best effort basis.
+
+Why and how are Linux kernel regressions tracked using a bot?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Tracking regressions completely manually has proven to be quite hard due to the
+distributed and loosely structured nature of Linux kernel development process.
+That's why the Linux kernel's regression tracker developed regzbot to facilitate
+the work, with the long term goal to automate regression tracking as much as
+possible for everyone involved.
+
+Regzbot works by watching for replies to reports of tracked regressions.
+Additionally, it's looking out for posted or committed patches referencing such
+reports with "Link:" tags; replies to such patch postings are tracked as well.
+Combined this data provides good insights into the current state of the fixing
+process.
+
+How to see which regressions regzbot tracks currently?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Check out `regzbot's web-interface <https://linux-regtracking.leemhuis.info/regzbot/>`_.
+
+What kind of issues are supposed to be tracked by regzbot?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The bot is meant to track regressions, hence please don't involve regzbot for
+regular issues. But it's okay for the Linux kernel's regression tracker if you
+involve regzbot to track severe issues, like reports about hangs, corrupted
+data, or internal errors (Panic, Oops, BUG(), warning, ...).
+
+How to change aspects of a tracked regression?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+By using a 'regzbot command' in a direct or indirect reply to the mail with the
+report. The easiest way to do that: find the report in your "Sent" folder or the
+mailing list archive and reply to it using your mailer's "Reply-all" function.
+In that mail, use one of the following commands in a stand-alone paragraph (IOW:
+use blank lines to separate one or multiple of these commands from the rest of
+the mail's text).
+
+ * Update when the regression started to happen, for example after performing a
+ bisection::
+
+ #regzbot introduced: 1f2e3d4c5d
+
+ * Set or update the title::
+
+ #regzbot title: foo
+
+ * Monitor a discussion or bugzilla.kernel.org ticket where additions aspects of
+ the issue or a fix are discussed:::
+
+ #regzbot monitor: https://lore.kernel.org/r/30th.anniversary.repost@klaava.Helsinki.FI/
+ #regzbot monitor: https://bugzilla.kernel.org/show_bug.cgi?id=123456789
+
+ * Point to a place with further details of interest, like a mailing list post
+ or a ticket in a bug tracker that are slightly related, but about a different
+ topic::
+
+ #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=123456789
+
+ * Mark a regression as invalid::
+
+ #regzbot invalid: wasn't a regression, problem has always existed
+
+Regzbot supports a few other commands primarily used by developers or people
+tracking regressions. They and more details about the aforementioned regzbot
+commands can be found in the `getting started guide
+<https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md>`_ and
+the `reference documentation <https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md>`_
+for regzbot.
+
+..
+ end-of-content
+..
+ This text is available under GPL-2.0+ or CC-BY-4.0, as stated at the top
+ of the file. If you want to distribute this text under CC-BY-4.0 only,
+ please use "The Linux kernel developers" for author attribution and link
+ this as source:
+ https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/reporting-regressions.rst
+..
+ Note: Only the content of this RST file as found in the Linux kernel sources
+ is available under CC-BY-4.0, as versions of this text that were processed
+ (for example by the kernel's build system) might contain content taken from
+ files which use a more restrictive license.
diff --git a/Documentation/admin-guide/security-bugs.rst b/Documentation/admin-guide/security-bugs.rst
deleted file mode 100644
index dcd6c93c7aac..000000000000
--- a/Documentation/admin-guide/security-bugs.rst
+++ /dev/null
@@ -1,89 +0,0 @@
-.. _securitybugs:
-
-Security bugs
-=============
-
-Linux kernel developers take security very seriously. As such, we'd
-like to know when a security bug is found so that it can be fixed and
-disclosed as quickly as possible. Please report security bugs to the
-Linux kernel security team.
-
-Contact
--------
-
-The Linux kernel security team can be contacted by email at
-<security@kernel.org>. This is a private list of security officers
-who will help verify the bug report and develop and release a fix.
-If you already have a fix, please include it with your report, as
-that can speed up the process considerably. It is possible that the
-security team will bring in extra help from area maintainers to
-understand and fix the security vulnerability.
-
-As it is with any bug, the more information provided the easier it
-will be to diagnose and fix. Please review the procedure outlined in
-admin-guide/reporting-bugs.rst if you are unclear about what
-information is helpful. Any exploit code is very helpful and will not
-be released without consent from the reporter unless it has already been
-made public.
-
-Disclosure and embargoed information
-------------------------------------
-
-The security list is not a disclosure channel. For that, see Coordination
-below.
-
-Once a robust fix has been developed, the release process starts. Fixes
-for publicly known bugs are released immediately.
-
-Although our preference is to release fixes for publicly undisclosed bugs
-as soon as they become available, this may be postponed at the request of
-the reporter or an affected party for up to 7 calendar days from the start
-of the release process, with an exceptional extension to 14 calendar days
-if it is agreed that the criticality of the bug requires more time. The
-only valid reason for deferring the publication of a fix is to accommodate
-the logistics of QA and large scale rollouts which require release
-coordination.
-
-While embargoed information may be shared with trusted individuals in
-order to develop a fix, such information will not be published alongside
-the fix or on any other disclosure channel without the permission of the
-reporter. This includes but is not limited to the original bug report
-and followup discussions (if any), exploits, CVE information or the
-identity of the reporter.
-
-In other words our only interest is in getting bugs fixed. All other
-information submitted to the security list and any followup discussions
-of the report are treated confidentially even after the embargo has been
-lifted, in perpetuity.
-
-Coordination
-------------
-
-Fixes for sensitive bugs, such as those that might lead to privilege
-escalations, may need to be coordinated with the private
-<linux-distros@vs.openwall.org> mailing list so that distribution vendors
-are well prepared to issue a fixed kernel upon public disclosure of the
-upstream fix. Distros will need some time to test the proposed patch and
-will generally request at least a few days of embargo, and vendor update
-publication prefers to happen Tuesday through Thursday. When appropriate,
-the security team can assist with this coordination, or the reporter can
-include linux-distros from the start. In this case, remember to prefix
-the email Subject line with "[vs]" as described in the linux-distros wiki:
-<http://oss-security.openwall.org/wiki/mailing-lists/distros#how-to-use-the-lists>
-
-CVE assignment
---------------
-
-The security team does not normally assign CVEs, nor do we require them
-for reports or fixes, as this can needlessly complicate the process and
-may delay the bug handling. If a reporter wishes to have a CVE identifier
-assigned ahead of public disclosure, they will need to contact the private
-linux-distros list, described above. When such a CVE identifier is known
-before a patch is provided, it is desirable to mention it in the commit
-message if the reporter agrees.
-
-Non-disclosure agreements
--------------------------
-
-The Linux kernel security team is not a formal body and therefore unable
-to enter any non-disclosure agreements.
diff --git a/Documentation/admin-guide/serial-console.rst b/Documentation/admin-guide/serial-console.rst
index a8d1e36b627a..a3dfc2c66e01 100644
--- a/Documentation/admin-guide/serial-console.rst
+++ b/Documentation/admin-guide/serial-console.rst
@@ -33,8 +33,11 @@ The format of this option is::
9600n8. The maximum baudrate is 115200.
You can specify multiple console= options on the kernel command line.
-Output will appear on all of them. The last device will be used when
-you open ``/dev/console``. So, for example::
+
+The behavior is well defined when each device type is mentioned only once.
+In this case, the output will appear on all requested consoles. And
+the last device will be used when you open ``/dev/console``.
+So, for example::
console=ttyS1,9600 console=tty0
@@ -42,7 +45,34 @@ defines that opening ``/dev/console`` will get you the current foreground
virtual console, and kernel messages will appear on both the VGA
console and the 2nd serial port (ttyS1 or COM2) at 9600 baud.
-Note that you can only define one console per device type (serial, video).
+The behavior is more complicated when the same device type is defined more
+times. In this case, there are the following two rules:
+
+1. The output will appear only on the first device of each defined type.
+
+2. ``/dev/console`` will be associated with the first registered device.
+ Where the registration order depends on how kernel initializes various
+ subsystems.
+
+ This rule is used also when the last console= parameter is not used
+ for other reasons. For example, because of a typo or because
+ the hardware is not available.
+
+The result might be surprising. For example, the following two command
+lines have the same result::
+
+ console=ttyS1,9600 console=tty0 console=tty1
+ console=tty0 console=ttyS1,9600 console=tty1
+
+The kernel messages are printed only on ``tty0`` and ``ttyS1``. And
+``/dev/console`` gets associated with ``tty0``. It is because kernel
+tries to register graphical consoles before serial ones. It does it
+because of the default behavior when no console device is specified,
+see below.
+
+Note that the last ``console=tty1`` parameter still makes a difference.
+The kernel command line is used also by systemd. It would use the last
+defined ``tty1`` as the login console.
If no console device is specified, the first device found capable of
acting as a system console will be used. At this time, the system
@@ -54,7 +84,7 @@ You will need to create a new device to use ``/dev/console``. The official
``/dev/console`` is now character device 5,1.
(You can also use a network device as a console. See
-``Documentation/networking/netconsole.txt`` for information on that.)
+``Documentation/networking/netconsole.rst`` for information on that.)
Here's an example that will use ``/dev/ttyS1`` (COM2) as the console.
Replace the sample values as needed.
diff --git a/Documentation/admin-guide/spkguide.txt b/Documentation/admin-guide/spkguide.txt
new file mode 100644
index 000000000000..0d5965138f8f
--- /dev/null
+++ b/Documentation/admin-guide/spkguide.txt
@@ -0,0 +1,1625 @@
+
+The Speakup User's Guide
+For Speakup 3.1.2 and Later
+By Gene Collins
+Updated by others
+Last modified on Mon Sep 27 14:26:31 2010
+Document version 1.3
+
+Copyright (c) 2005 Gene Collins
+Copyright (c) 2008, 2023 Samuel Thibault
+Copyright (c) 2009, 2010 the Speakup Team
+
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with no
+Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
+copy of the license is included in the section entitled "GNU Free
+Documentation License".
+
+Preface
+
+The purpose of this document is to familiarize users with the user
+interface to Speakup, a Linux Screen Reader. If you need instructions
+for installing or obtaining Speakup, visit the web site at
+http://linux-speakup.org/. Speakup is a set of patches to the standard
+Linux kernel source tree. It can be built as a series of modules, or as
+a part of a monolithic kernel. These details are beyond the scope of
+this manual, but the user may need to be aware of the module
+capabilities, depending on how your system administrator has installed
+Speakup. If Speakup is built as a part of a monolithic kernel, and the
+user is using a hardware synthesizer, then Speakup will be able to
+provide speech access from the time the kernel is loaded, until the time
+the system is shutdown. This means that if you have obtained Linux
+installation media for a distribution which includes Speakup as a part
+of its kernel, you will be able, as a blind person, to install Linux
+with speech access unaided by a sighted person. Again, these details
+are beyond the scope of this manual, but the user should be aware of
+them. See the web site mentioned above for further details.
+
+1. Starting Speakup
+
+If your system administrator has installed Speakup to work with your
+specific synthesizer by default, then all you need to do to use Speakup
+is to boot your system, and Speakup should come up talking. This
+assumes of course that your synthesizer is a supported hardware
+synthesizer, and that it is either installed in or connected to your
+system, and is if necessary powered on.
+
+It is possible, however, that Speakup may have been compiled into the
+kernel with no default synthesizer. It is even possible that your
+kernel has been compiled with support for some of the supported
+synthesizers and not others. If you find that this is the case, and
+your synthesizer is supported but not available, complain to the person
+who compiled and installed your kernel. Or better yet, go to the web
+site, and learn how to patch Speakup into your own kernel source, and
+build and install your own kernel.
+
+If your kernel has been compiled with Speakup, and has no default
+synthesizer set, or you would like to use a different synthesizer than
+the default one, then you may issue the following command at the boot
+prompt of your boot loader.
+
+linux speakup.synth=ltlk
+
+This command would tell Speakup to look for and use a LiteTalk or
+DoubleTalk LT at boot up. You may replace the ltlk synthesizer keyword
+with the keyword for whatever synthesizer you wish to use. The
+speakup.synth parameter will accept the following keywords, provided
+that support for the related synthesizers has been built into the
+kernel.
+
+acntsa -- Accent SA
+acntpc -- Accent PC
+apollo -- Apollo
+audptr -- Audapter
+bns -- Braille 'n Speak
+dectlk -- DecTalk Express (old and new, db9 serial only)
+decext -- DecTalk (old) External
+dtlk -- DoubleTalk PC
+keypc -- Keynote Gold PC
+ltlk -- DoubleTalk LT, LiteTalk, or external Tripletalk (db9 serial only)
+spkout -- Speak Out
+txprt -- Transport
+dummy -- Plain text terminal
+
+Note: Speakup does * NOT * support the internal Tripletalk!
+
+Speakup does support two other synthesizers, but because they work in
+conjunction with other software, they must be loaded as modules after
+their related software is loaded, and so are not available at boot up.
+These are as follows:
+
+decpc -- DecTalk PC (not available at boot up)
+soft -- One of several software synthesizers (not available at boot up)
+
+By default speakup looks for the synthesizer on the ttyS0 serial port. This can
+be changed with the device parameter of the modules, for instance for
+DoubleTalk LT:
+
+speakup_ltlk.dev=ttyUSB0
+
+See the sections on loading modules and software synthesizers later in
+this manual for further details. It should be noted here that the
+speakup.synth boot parameter will have no effect if Speakup has been
+compiled as modules. In order for Speakup modules to be loaded during
+the boot process, such action must be configured by your system
+administrator. This will mean that you will hear some, but not all, of
+the bootup messages.
+
+2. Basic operation
+
+Once you have booted the system, and if necessary, have supplied the
+proper bootup parameter for your synthesizer, Speakup will begin
+talking as soon as the kernel is loaded. In fact, it will talk a lot!
+It will speak all the boot up messages that the kernel prints on the
+screen during the boot process. This is because Speakup is not a
+separate screen reader, but is actually built into the operating
+system. Since almost all console applications must print text on the
+screen using the kernel, and must get their keyboard input through the
+kernel, they are automatically handled properly by Speakup. There are a
+few exceptions, but we'll come to those later.
+
+Note: In this guide I will refer to the numeric keypad as the keypad.
+This is done because the speakupmap.map file referred to later in this
+manual uses the term keypad instead of numeric keypad. Also I'm lazy
+and would rather only type one word. So keypad it is. Got it? Good.
+
+Most of the Speakup review keys are located on the keypad at the far
+right of the keyboard. The numlock key should be off, in order for these
+to work. If you toggle the numlock on, the keypad will produce numbers,
+which is exactly what you want for spreadsheets and such. For the
+purposes of this guide, you should have the numlock turned off, which is
+its default state at bootup.
+
+You probably won't want to listen to all the bootup messages every time
+you start your system, though it's a good idea to listen to them at
+least once, just so you'll know what kind of information is available to
+you during the boot process. You can always review these messages after
+bootup with the command:
+
+dmesg | more
+
+In order to speed the boot process, and to silence the speaking of the
+bootup messages, just press the keypad enter key. This key is located
+in the bottom right corner of the keypad. Speakup will shut up and stay
+that way, until you press another key.
+
+You can check to see if the boot process has completed by pressing the 8
+key on the keypad, which reads the current line. This also has the
+effect of starting Speakup talking again, so you can press keypad enter
+to silence it again if the boot process has not completed.
+
+When the boot process is complete, you will arrive at a "login" prompt.
+At this point, you'll need to type in your user id and password, as
+provided by your system administrator. You will hear Speakup speak the
+letters of your user id as you type it, but not the password. This is
+because the password is not displayed on the screen for security
+reasons. This has nothing to do with Speakup, it's a Linux security
+feature.
+
+Once you've logged in, you can run any Linux command or program which is
+allowed by your user id. Normal users will not be able to run programs
+which require root privileges.
+
+When you are running a program or command, Speakup will automatically
+speak new text as it arrives on the screen. You can at any time silence
+the speech with keypad enter, or use any of the Speakup review keys.
+
+Here are some basic Speakup review keys, and a short description of what
+they do.
+
+keypad 1 -- read previous character
+keypad 2 -- read current character (pressing keypad 2 twice rapidly will speak
+ the current character phonetically)
+keypad 3 -- read next character
+keypad 4 -- read previous word
+keypad 5 -- read current word (press twice rapidly to spell the current word)
+keypad 6 -- read next word
+keypad 7 -- read previous line
+keypad 8 -- read current line (press twice rapidly to hear how much the
+ text on the current line is indented)
+keypad 9 -- read next line
+keypad period -- speak current cursor position and announce current
+ virtual console
+
+It's also worth noting that the insert key on the keypad is mapped
+as the speakup key. Instead of pressing and releasing this key, as you
+do under DOS or Windows, you hold it like a shift key, and press other
+keys in combination with it. For example, repeatedly holding keypad
+insert, from now on called speakup, and keypad enter will toggle the
+speaking of new text on the screen on and off. This is not the same as
+just pressing keypad enter by itself, which just silences the speech
+until you hit another key. When you hit speakup plus keypad enter,
+Speakup will say, "You turned me off.", or "Hey, that's better." When
+Speakup is turned off, no new text on the screen will be spoken. You
+can still use the reading controls to review the screen however.
+
+3. Using the Speakup Help System
+
+In order to enter the Speakup help system, press and hold the speakup
+key (remember that this is the keypad insert key), and press the f1 key.
+You will hear the message:
+
+"Press space to leave help, cursor up or down to scroll, or a letter to
+go to commands in list."
+
+When you press the spacebar to leave the help system, you will hear:
+
+"Leaving help."
+
+While you are in the Speakup help system, you can scroll up or down
+through the list of available commands using the cursor keys. The list
+of commands is arranged in alphabetical order. If you wish to jump to
+commands in a specific part of the alphabet, you may press the letter of
+the alphabet you wish to jump to.
+
+You can also just explore by typing keyboard keys. Pressing keys will
+cause Speakup to speak the command associated with that key. For
+example, if you press the keypad 8 key, you will hear:
+
+"Keypad 8 is line, say current."
+
+You'll notice that some commands do not have keys assigned to them.
+This is because they are very infrequently used commands, and are also
+accessible through the sys system. We'll discuss the sys system later
+in this manual.
+
+You'll also notice that some commands have two keys assigned to them.
+This is because Speakup has a built in set of alternative key bindings
+for laptop users. The alternate speakup key is the caps lock key. You
+can press and hold the caps lock key, while pressing an alternate
+speakup command key to activate the command. On most laptops, the
+numeric keypad is defined as the keys in the j k l area of the keyboard.
+
+There is usually a function key which turns this keypad function on and
+off, and some other key which controls the numlock state. Toggling the
+keypad functionality on and off can become a royal pain. So, Speakup
+gives you a simple way to get at an alternative set of key mappings for
+your laptop. These are also available by default on desktop systems,
+because Speakup does not know whether it is running on a desktop or
+laptop. So you may choose which set of Speakup keys to use. Some
+system administrators may have chosen to compile Speakup for a desktop
+system without this set of alternate key bindings, but these details are
+beyond the scope of this manual. To use the caps lock for its normal
+purpose, hold the shift key while toggling the caps lock on and off. We
+should note here, that holding the caps lock key and pressing the z key
+will toggle the alternate j k l keypad on and off.
+
+4. Keys and Their Assigned Commands
+
+In this section, we'll go through a list of all the speakup keys and
+commands. You can also get a list of commands and assigned keys from
+the help system.
+
+The following list was taken from the speakupmap.map file. Key
+assignments are on the left of the equal sign, and the associated
+Speakup commands are on the right. The designation "spk" means to press
+and hold the speakup key, a.k.a. keypad insert, a.k.a. caps lock, while
+pressing the other specified key.
+
+spk key_f9 = punc_level_dec
+spk key_f10 = punc_level_inc
+spk key_f11 = reading_punc_dec
+spk key_f12 = reading_punc_inc
+spk key_1 = vol_dec
+spk key_2 = vol_inc
+spk key_3 = pitch_dec
+spk key_4 = pitch_inc
+spk key_5 = rate_dec
+spk key_6 = rate_inc
+key_kpasterisk = toggle_cursoring
+spk key_kpasterisk = speakup_goto
+spk key_f1 = speakup_help
+spk key_f2 = set_win
+spk key_f3 = clear_win
+spk key_f4 = enable_win
+spk key_f5 = edit_some
+spk key_f6 = edit_most
+spk key_f7 = edit_delim
+spk key_f8 = edit_repeat
+shift spk key_f9 = edit_exnum
+ key_kp7 = say_prev_line
+spk key_kp7 = left_edge
+ key_kp8 = say_line
+double key_kp8 = say_line_indent
+spk key_kp8 = say_from_top
+ key_kp9 = say_next_line
+spk key_kp9 = top_edge
+ key_kpminus = speakup_parked
+spk key_kpminus = say_char_num
+ key_kp4 = say_prev_word
+spk key_kp4 = say_from_left
+ key_kp5 = say_word
+double key_kp5 = spell_word
+spk key_kp5 = spell_phonetic
+ key_kp6 = say_next_word
+spk key_kp6 = say_to_right
+ key_kpplus = say_screen
+spk key_kpplus = say_win
+ key_kp1 = say_prev_char
+spk key_kp1 = right_edge
+ key_kp2 = say_char
+spk key_kp2 = say_to_bottom
+double key_kp2 = say_phonetic_char
+ key_kp3 = say_next_char
+spk key_kp3 = bottom_edge
+ key_kp0 = spk_key
+ key_kpdot = say_position
+spk key_kpdot = say_attributes
+key_kpenter = speakup_quiet
+spk key_kpenter = speakup_off
+key_sysrq = speech_kill
+ key_kpslash = speakup_cut
+spk key_kpslash = speakup_paste
+spk key_pageup = say_first_char
+spk key_pagedown = say_last_char
+key_capslock = spk_key
+ spk key_z = spk_lock
+key_leftmeta = spk_key
+ctrl spk key_0 = speakup_goto
+spk key_u = say_prev_line
+spk key_i = say_line
+double spk key_i = say_line_indent
+spk key_o = say_next_line
+spk key_minus = speakup_parked
+shift spk key_minus = say_char_num
+spk key_j = say_prev_word
+spk key_k = say_word
+double spk key_k = spell_word
+spk key_l = say_next_word
+spk key_m = say_prev_char
+spk key_comma = say_char
+double spk key_comma = say_phonetic_char
+spk key_dot = say_next_char
+spk key_n = say_position
+ ctrl spk key_m = left_edge
+ ctrl spk key_y = top_edge
+ ctrl spk key_dot = right_edge
+ctrl spk key_p = bottom_edge
+spk key_apostrophe = say_screen
+spk key_h = say_from_left
+spk key_y = say_from_top
+spk key_semicolon = say_to_right
+spk key_p = say_to_bottom
+spk key_slash = say_attributes
+ spk key_enter = speakup_quiet
+ ctrl spk key_enter = speakup_off
+ spk key_9 = speakup_cut
+spk key_8 = speakup_paste
+shift spk key_m = say_first_char
+ ctrl spk key_semicolon = say_last_char
+spk key_r = read_all_doc
+
+5. The Speakup Sys System
+
+The Speakup screen reader also creates a speakup subdirectory as a part
+of the sys system.
+
+As a convenience, run as root
+
+ln -s /sys/accessibility/speakup /speakup
+
+to directly access speakup parameters from /speakup.
+You can see these entries by typing the command:
+
+ls -1 /speakup/*
+
+If you issue the above ls command, you will get back something like
+this:
+
+/speakup/attrib_bleep
+/speakup/bell_pos
+/speakup/bleep_time
+/speakup/bleeps
+/speakup/cursor_time
+/speakup/delimiters
+/speakup/ex_num
+/speakup/key_echo
+/speakup/keymap
+/speakup/no_interrupt
+/speakup/punc_all
+/speakup/punc_level
+/speakup/punc_most
+/speakup/punc_some
+/speakup/reading_punc
+/speakup/repeats
+/speakup/say_control
+/speakup/say_word_ctl
+/speakup/silent
+/speakup/spell_delay
+/speakup/synth
+/speakup/synth_direct
+/speakup/version
+
+/speakup/i18n:
+announcements
+characters
+chartab
+colors
+ctl_keys
+formatted
+function_names
+key_names
+states
+
+/speakup/soft:
+caps_start
+caps_stop
+delay_time
+direct
+freq
+full_time
+jiffy_delta
+pitch
+inflection
+punct
+rate
+tone
+trigger_time
+voice
+vol
+
+Notice the two subdirectories of /speakup: /speakup/i18n and
+/speakup/soft.
+The i18n subdirectory is described in a later section.
+The files under /speakup/soft represent settings that are specific to the
+driver for the software synthesizer. If you use the LiteTalk, your
+synthesizer-specific settings would be found in /speakup/ltlk. In other words,
+a subdirectory named /speakup/KWD is created to hold parameters specific
+to the device whose keyword is KWD.
+These parameters include volume, rate, pitch, and others.
+
+In addition to using the Speakup hot keys to change such things as
+volume, pitch, and rate, you can also echo values to the appropriate
+entry in the /speakup directory. This is very useful, since it
+lets you control Speakup parameters from within a script. How you
+would write such scripts is somewhat beyond the scope of this manual,
+but I will include a couple of simple examples here to give you a
+general idea of what such scripts can do.
+
+Suppose for example, that you wanted to control both the punctuation
+level and the reading punctuation level at the same time. For
+simplicity, we'll call them punc0, punc1, punc2, and punc3. The scripts
+might look something like this:
+
+#!/bin/bash
+# punc0
+# set punc and reading punc levels to 0
+echo 0 >/speakup/punc_level
+echo 0 >/speakup/reading_punc
+echo Punctuation level set to 0.
+
+#!/bin/bash
+# punc1
+# set punc and reading punc levels to 1
+echo 1 >/speakup/punc_level
+echo 1 >/speakup/reading_punc
+echo Punctuation level set to 1.
+
+#!/bin/bash
+# punc2
+# set punc and reading punc levels to 2
+echo 2 >/speakup/punc_level
+echo 2 >/speakup/reading_punc
+echo Punctuation level set to 2.
+
+#!/bin/bash
+# punc3
+# set punc and reading punc levels to 3
+echo 3 >/speakup/punc_level
+echo 3 >/speakup/reading_punc
+echo Punctuation level set to 3.
+
+If you were to store these four small scripts in a directory in your
+path, perhaps /usr/local/bin, and set the permissions to 755 with the
+chmod command, then you could change the default reading punc and
+punctuation levels at the same time by issuing just one command. For
+example, if you were to execute the punc3 command at your shell prompt,
+then the reading punc and punc level would both get set to 3.
+
+I should note that the above scripts were written to work with bash, but
+regardless of which shell you use, you should be able to do something
+similar.
+
+The Speakup sys system also has another interesting use. You can echo
+Speakup parameters into the sys system in a script during system
+startup, and speakup will return to your preferred parameters every time
+the system is rebooted.
+
+Most of the Speakup sys parameters can be manipulated by a regular user
+on the system. However, there are a few parameters that are dangerous
+enough that they should only be manipulated by the root user on your
+system. There are even some parameters that are read only, and cannot
+be written to at all. For example, the version entry in the Speakup
+sys system is read only. This is because there is no reason for a user
+to tamper with the version number which is reported by Speakup. Doing
+an ls -l on /speakup/version will return this:
+
+-r--r--r-- 1 root root 0 Mar 21 13:46 /speakup/version
+
+As you can see, the version entry in the Speakup sys system is read
+only, is owned by root, and belongs to the root group. Doing a cat of
+/speakup/version will display the Speakup version number, like
+this:
+
+cat /speakup/version
+Speakup v-2.00 CVS: Thu Oct 21 10:38:21 EDT 2004
+synth dtlk version 1.1
+
+The display shows the Speakup version number, along with the version
+number of the driver for the current synthesizer.
+
+Looking at entries in the Speakup sys system can be useful in many
+ways. For example, you might wish to know what level your volume is set
+at. You could type:
+
+cat /speakup/KWD/vol
+# Replace KWD with the keyword for your synthesizer, E.G., ltlk for LiteTalk.
+5
+
+The number five which comes back is the level at which the synthesizer
+volume is set at.
+
+All the entries in the Speakup sys system are readable, some are
+writable by root only, and some are writable by everyone. Unless you
+know what you are doing, you should probably leave the ones that are
+writable by root only alone. Most of the names are self explanatory.
+Vol for controlling volume, pitch for pitch, inflection for pitch range, rate
+for controlling speaking rate, etc. If you find one you aren't sure about, you
+can post a query on the Speakup list.
+
+6. Changing Synthesizers
+
+It is possible to change to a different synthesizer while speakup is
+running. In other words, it is not necessary to reboot the system
+in order to use a different synthesizer. You can simply echo the
+synthesizer keyword to the /speakup/synth sys entry.
+Depending on your situation, you may wish to echo none to the synth
+sys entry, to disable speech while one synthesizer is disconnected and
+a second one is connected in its place. Then echo the keyword for the
+new synthesizer into the synth sys entry in order to start speech
+with the newly connected synthesizer. See the list of synthesizer
+keywords in section 1 to find the keyword which matches your synth.
+
+7. Loading modules
+
+As mentioned earlier, Speakup can either be completely compiled into the
+kernel, with the exception of the help module, or it can be compiled as
+a series of modules. When compiled as modules, Speakup will only be
+able to speak some of the bootup messages if your system administrator
+has configured the system to load the modules at boot time. The modules
+can be loaded after the file systems have been checked and mounted, or
+from an initrd. There is a third possibility. Speakup can be compiled
+with some components built into the kernel, and others as modules. As
+we'll see in the next section, this is particularly useful when you are
+working with software synthesizers.
+
+If Speakup is completely compiled as modules, then you must use the
+modprobe command to load Speakup. You do this by loading the module for
+the synthesizer driver you wish to use. The driver modules are all
+named speakup_<keyword>, where <keyword> is the keyword for the
+synthesizer you want. So, in order to load the driver for the DecTalk
+Express, you would type the following command:
+
+modprobe speakup_dectlk
+
+Issuing this command would load the DecTalk Express driver and all other
+related Speakup modules necessary to get Speakup up and running.
+
+To completely unload Speakup, again presuming that it is entirely built
+as modules, you would give the command:
+
+modprobe -r speakup_dectlk
+
+The above command assumes you were running a DecTalk Express. If you
+were using a different synth, then you would substitute its keyword in
+place of dectlk.
+
+If you have multiple drivers loaded, you need to unload all of them, in
+order to completely unload Speakup.
+For example, if you have loaded both the dectlk and ltlk drivers, use the
+command:
+modprobe -r speakup_dectlk speakup_ltlk
+
+You cannot unload the driver for software synthesizers when a user-space
+daemon is using /dev/softsynth. First, kill the daemon. Next, remove
+the driver with the command:
+modprobe -r speakup_soft
+
+Now, suppose we have a situation where the main Speakup component
+is built into the kernel, and some or all of the drivers are built as
+modules. Since the main part of Speakup is compiled into the kernel, a
+partial Speakup sys system has been created which we can take advantage
+of by simply echoing the synthesizer keyword into the
+/speakup/synth sys entry. This will cause the kernel to
+automatically load the appropriate driver module, and start Speakup
+talking. To switch to another synth, just echo a new keyword to the
+synth sys entry. For example, to load the DoubleTalk LT driver,
+you would type:
+
+echo ltlk >/speakup/synth
+
+You can use the modprobe -r command to unload driver modules, regardless
+of whether the main part of Speakup has been built into the kernel or
+not.
+
+8. Using Software Synthesizers
+
+Using a software synthesizer requires that some other software be
+installed and running on your system. For this reason, software
+synthesizers are not available for use at bootup, or during a system
+installation process.
+There are two freely-available solutions for software speech: Espeakup and
+Speech Dispatcher.
+These are described in subsections 8.1 and 8.2, respectively.
+
+During the rest of this section, we assume that speakup_soft is either
+built in to your kernel, or loaded as a module.
+
+If your system does not have udev installed , before you can use a
+software synthesizer, you must have created the /dev/softsynth device.
+If you have not already done so, issue the following commands as root:
+
+cd /dev
+mknod softsynth c 10 26
+
+While we are at it, we might just as well create the /dev/synth device,
+which can be used to let user space programs send information to your
+synthesizer. To create /dev/synth, change to the /dev directory, and
+issue the following command as root:
+
+mknod synth c 10 25
+
+of both.
+
+8.1. Espeakup
+
+Espeakup is a connector between Speakup and the eSpeak software synthesizer.
+Espeakup may already be available as a package for your distribution
+of Linux. If it is not packaged, you need to install it manually.
+You can find it in the contrib/ subdirectory of the Speakup sources.
+The filename is espeakup-$VERSION.tar.bz2, where $VERSION
+depends on the current release of Espeakup. The Speakup 3.1.2 source
+ships with version 0.71 of Espeakup.
+The README file included with the Espeakup sources describes the process
+of manual installation.
+
+Assuming that Espeakup is installed, either by the user or by the distributor,
+follow these steps to use it.
+
+Tell Speakup to use the "soft driver:
+echo soft > /speakup/synth
+
+Finally, start the espeakup program. There are two ways to do it.
+Both require root privileges.
+
+If Espeakup was installed as a package for your Linux distribution,
+you probably have a distribution-specific script that controls the operation
+of the daemon. Look for a file named espeakup under /etc/init.d or
+/etc/rc.d. Execute the following command with root privileges:
+/etc/init.d/espeakup start
+Replace init.d with rc.d, if your distribution uses scripts located under
+/etc/rc.d.
+Your distribution will also have a procedure for starting daemons at
+boot-time, so it is possible to have software speech as soon as user-space
+daemons are started by the bootup scripts.
+These procedures are not described in this document.
+
+If you built Espeakup manually, the "make install" step placed the binary
+under /usr/bin.
+Run the following command as root:
+/usr/bin/espeakup
+Espeakup should start speaking.
+
+8.2. Speech Dispatcher
+
+For this option, you must have a package called
+Speech Dispatcher running on your system, and it must be configured to
+work with one of its supported software synthesizers.
+
+Two open source synthesizers you might use are Flite and Festival. You
+might also choose to purchase the Software DecTalk from Fonix Sales Inc.
+If you run a google search for Fonix, you'll find their web site.
+
+You can obtain a copy of Speech Dispatcher from free(b)soft at
+http://www.freebsoft.org/. Follow the installation instructions that
+come with Speech Dispatcher in order to install and configure Speech
+Dispatcher. You can check out the web site for your Linux distribution
+in order to get a copy of either Flite or Festival. Your Linux
+distribution may also have a precompiled Speech Dispatcher package.
+
+Once you've installed, configured, and tested Speech Dispatcher with your
+chosen software synthesizer, you still need one more piece of software
+in order to make things work. You need a package called speechd-up.
+You get it from the free(b)soft web site mentioned above. After you've
+compiled and installed speechd-up, you are almost ready to begin using
+your software synthesizer.
+
+Now you can begin using your software synthesizer. In order to do so,
+echo the soft keyword to the synth sys entry like this:
+
+echo soft >/speakup/synth
+
+Next run the speechd_up command like this:
+
+speechd_up &
+
+Your synth should now start talking, and you should be able to adjust
+the pitch, rate, etc.
+
+9. Using The DecTalk PC Card
+
+The DecTalk PC card is an ISA card that is inserted into one of the ISA
+slots in your computer. It requires that the DecTalk PC software be
+installed on your computer, and that the software be loaded onto the
+Dectalk PC card before it can be used.
+
+You can get the dec_pc.tgz file from the linux-speakup.org site. The
+dec_pc.tgz file is in the ~ftp/pub/linux/speakup directory.
+
+After you have downloaded the dec_pc.tgz file, untar it in your home
+directory, and read the Readme file in the newly created dec_pc
+directory.
+
+The easiest way to get the software working is to copy the entire dec_pc
+directory into /user/local/lib. To do this, su to root in your home
+directory, and issue the command:
+
+cp dec_pc /usr/local/lib
+
+You will need to copy the dtload command from the dec_pc directory to a
+directory in your path. Either /usr/bin or /usr/local/bin is a good
+choice.
+
+You can now run the dtload command in order to load the DecTalk PC
+software onto the card. After you have done this, echo the decpc
+keyword to the synth entry in the sys system like this:
+
+echo decpc >/speakup/synth
+
+Your DecTalk PC should start talking, and then you can adjust the pitch,
+rate, volume, voice, etc. The voice entry in the Speakup sys system
+will accept a number from 0 through 7 for the DecTalk PC synthesizer,
+which will give you access to some of the DecTalk voices.
+
+10. Using Cursor Tracking
+
+In Speakup version 2.0 and later, cursor tracking is turned on by
+default. This means that when you are using an editor, Speakup will
+automatically speak characters as you move left and right with the
+cursor keys, and lines as you move up and down with the cursor keys.
+This is the traditional sort of cursor tracking.
+Recent versions of Speakup provide two additional ways to control the
+text that is spoken when the cursor is moved:
+"highlight tracking" and "read window."
+They are described later in this section.
+Sometimes, these modes get in your way, so you can disable cursor tracking
+altogether.
+
+You may select among the various forms of cursor tracking using the keypad
+asterisk key.
+Each time you press this key, a new mode is selected, and Speakup speaks
+the name of the new mode. The names for the four possible states of cursor
+tracking are: "cursoring on", "highlight tracking", "read window",
+and "cursoring off." The keypad asterisk key moves through the list of
+modes in a circular fashion.
+
+If highlight tracking is enabled, Speakup tracks highlighted text,
+rather than the cursor itself. When you move the cursor with the arrow keys,
+Speakup speaks the currently highlighted information.
+This is useful when moving through various menus and dialog boxes.
+If cursor tracking isn't helping you while navigating a menu,
+try highlight tracking.
+
+With the "read window" variety of cursor tracking, you can limit the text
+that Speakup speaks by specifying a window of interest on the screen.
+See section 15 for a description of the process of defining windows.
+When you move the cursor via the arrow keys, Speakup only speaks
+the contents of the window. This is especially helpful when you are hearing
+superfluous speech. Consider the following example.
+
+Suppose that you are at a shell prompt. You use bash, and you want to
+explore your command history using the up and down arrow keys. If you
+have enabled cursor tracking, you will hear two pieces of information.
+Speakup speaks both your shell prompt and the current entry from the
+command history. You may not want to hear the prompt repeated
+each time you move, so you can silence it by specifying a window. Find
+the last line of text on the screen. Clear the current window by pressing
+the key combination speakup f3. Use the review cursor to find the first
+character that follows your shell prompt. Press speakup + f2 twice, to
+define a one-line window. The boundaries of the window are the
+character following the shell prompt and the end of the line. Now, cycle
+through the cursor tracking modes using keypad asterisk, until Speakup
+says "read window." Move through your history using your arrow keys.
+You will notice that Speakup no longer speaks the redundant prompt.
+
+Some folks like to turn cursor tracking off while they are using the
+lynx web browser. You definitely want to turn cursor tracking off when
+you are using the alsamixer application. Otherwise, you won't be able
+to hear your mixer settings while you are using the arrow keys.
+
+11. Cut and Paste
+
+One of Speakup's more useful features is the ability to cut and paste
+text on the screen. This means that you can capture information from a
+program, and paste that captured text into a different place in the
+program, or into an entirely different program, which may even be
+running on a different console.
+
+For example, in this manual, we have made references to several web
+sites. It would be nice if you could cut and paste these urls into your
+web browser. Speakup does this quite nicely. Suppose you wanted to
+past the following url into your browser:
+
+http://linux-speakup.org/
+
+Use the speakup review keys to position the reading cursor on the first
+character of the above url. When the reading cursor is in position,
+press the keypad slash key once. Speakup will say, "mark". Next,
+position the reading cursor on the rightmost character of the above
+url. Press the keypad slash key once again to actually cut the text
+from the screen. Speakup will say, "cut". Although we call this
+cutting, Speakup does not actually delete the cut text from the screen.
+It makes a copy of the text in a special buffer for later pasting.
+
+Now that you have the url cut from the screen, you can paste it into
+your browser, or even paste the url on a command line as an argument to
+your browser.
+
+Suppose you want to start lynx and go to the Speakup site.
+
+You can switch to a different console with the alt left and right
+arrows, or you can switch to a specific console by typing alt and a
+function key. These are not Speakup commands, just standard Linux
+console capabilities.
+
+Once you've changed to an appropriate console, and are at a shell prompt,
+type the word lynx, followed by a space. Now press and hold the speakup
+key, while you type the keypad slash character. The url will be pasted
+onto the command line, just as though you had typed it in. Press the
+enter key to execute the command.
+
+The paste buffer will continue to hold the cut information, until a new
+mark and cut operation is carried out. This means you can paste the cut
+information as many times as you like before doing another cut
+operation.
+
+You are not limited to cutting and pasting only one line on the screen.
+You can also cut and paste rectangular regions of the screen. Just
+position the reading cursor at the top left corner of the text to be
+cut, mark it with the keypad slash key, then position the reading cursor
+at the bottom right corner of the region to be cut, and cut it with the
+keypad slash key.
+
+12. Changing the Pronunciation of Characters
+
+Through the /speakup/i18n/characters sys entry, Speakup gives you the
+ability to change how Speakup pronounces a given character. You could,
+for example, change how some punctuation characters are spoken. You can
+even change how Speakup will pronounce certain letters.
+
+You may, for example, wish to change how Speakup pronounces the z
+character. The author of Speakup, Kirk Reiser, is Canadian, and thus
+believes that the z should be pronounced zed. If you are an American,
+you might wish to use the zee pronunciation instead of zed. You can
+change the pronunciation of both the upper and lower case z with the
+following two commands:
+
+echo 90 zee >/speakup/characters
+echo 122 zee >/speakup/characters
+
+Let's examine the parts of the two previous commands. They are issued
+at the shell prompt, and could be placed in a startup script.
+
+The word echo tells the shell that you want to have it display the
+string of characters that follow the word echo. If you were to just
+type:
+
+echo hello.
+
+You would get the word hello printed on your screen as soon as you
+pressed the enter key. In this case, we are echoing strings that we
+want to be redirected into the sys system.
+
+The numbers 90 and 122 in the above echo commands are the ascii numeric
+values for the upper and lower case z, the characters we wish to change.
+
+The string zee is the pronunciation that we want Speakup to use for the
+upper and lower case z.
+
+The > symbol redirects the output of the echo command to a file, just
+like in DOS, or at the Windows command prompt.
+
+And finally, /speakup/i18n/characters is the file entry in the sys system
+where we want the output to be directed. Speakup looks at the numeric
+value of the character we want to change, and inserts the pronunciation
+string into an internal table.
+
+You can look at the whole table with the following command:
+
+cat /speakup/i18n/characters
+
+Speakup will then print out the entire character pronunciation table. I
+won't display it here, but leave you to look at it at your convenience.
+
+13. Mapping Keys
+
+Speakup has the capability of allowing you to assign or "map" keys to
+internal Speakup commands. This section necessarily assumes you have a
+Linux kernel source tree installed, and that it has been patched and
+configured with Speakup. How you do this is beyond the scope of this
+manual. For this information, visit the Speakup web site at
+http://linux-speakup.org/. The reason you'll need the kernel source
+tree patched with Speakup is that the genmap utility you'll need for
+processing keymaps is in the
+/usr/src/linux-<version_number>/drivers/char/speakup directory. The
+<version_number> in the above directory path is the version number of
+the Linux source tree you are working with.
+
+So ok, you've gone off and gotten your kernel source tree, and patched
+and configured it. Now you can start manipulating keymaps.
+
+You can either use the
+/usr/src/linux-<version_number>/drivers/char/speakup/speakupmap.map file
+included with the Speakup source, or you can cut and paste the copy in
+section 4 into a separate file. If you use the one in the Speakup
+source tree, make sure you make a backup of it before you start making
+changes. You have been warned!
+
+Suppose that you want to swap the key assignments for the Speakup
+say_last_char and the Speakup say_first_char commands. The
+speakupmap.map lists the key mappings for these two commands as follows:
+
+spk key_pageup = say_first_char
+spk key_pagedown = say_last_char
+
+You can edit your copy of the speakupmap.map file and swap the command
+names on the right side of the = (equals) sign. You did make a backup,
+right? The new keymap lines would look like this:
+
+spk key_pageup = say_last_char
+spk key_pagedown = say_first_char
+
+After you edit your copy of the speakupmap.map file, save it under a new
+file name, perhaps newmap.map. Then exit your editor and return to the
+shell prompt.
+
+You are now ready to load your keymap with your swapped key assignments.
+ Assuming that you saved your new keymap as the file newmap.map, you
+would load your keymap into the sys system like this:
+
+/usr/src/linux-<version_number>/drivers/char/speakup/genmap newmap.map
+>/speakup/keymap
+
+Remember to substitute your kernel version number for the
+<version_number> in the above command. Also note that although the
+above command wrapped onto two lines in this document, you should type
+it all on one line.
+
+Your say first and say last characters should now be swapped. Pressing
+speakup pagedown should read you the first non-whitespace character on
+the line your reading cursor is in, and pressing speakup pageup should
+read you the last character on the line your reading cursor is in.
+
+You should note that these new mappings will only stay in effect until
+you reboot, or until you load another keymap.
+
+One final warning. If you try to load a partial map, you will quickly
+find that all the mappings you didn't include in your file got deleted
+from the working map. Be extremely careful, and always make a backup!
+You have been warned!
+
+14. Internationalizing Speakup
+
+Speakup indicates various conditions to the user by speaking messages.
+For instance, when you move to the left edge of the screen with the
+review keys, Speakup says, "left."
+Prior to version 3.1.0 of Speakup, all of these messages were in English,
+and they could not be changed. If you used a non-English synthesizer,
+you still heard English messages, such as "left" and "cursoring on."
+In version 3.1.0 or higher, one may load translations for the various
+messages via the /sys filesystem.
+
+The directory /speakup/i18n contains several collections of messages.
+Each group of messages is stored in its own file.
+The following section lists all of these files, along with a brief description
+of each.
+
+14.1. Files Under the i18n Subdirectory
+
+* announcements:
+This file contains various general announcements, most of which cannot
+be categorized. You will find messages such as "You killed Speakup",
+"I'm alive", "leaving help", "parked", "unparked", and others.
+You will also find the names of the screen edges and cursor tracking modes
+here.
+
+* characters:
+See section 12 for a description of this file.
+
+* chartab:
+See section 12. Unlike the rest of the files in the i18n subdirectory,
+this one does not contain messages to be spoken.
+
+* colors:
+When you use the "say attributes" function, Speakup says the name of the
+foreground and background colors. These names come from the i18n/colors
+file.
+
+* ctl_keys:
+Here, you will find names of control keys. These are used with Speakup's
+say_control feature.
+
+* formatted:
+This group of messages contains embedded formatting codes, to specify
+the type and width of displayed data. If you change these, you must
+preserve all of the formatting codes, and they must appear in the order
+used by the default messages.
+
+* function_names:
+Here, you will find a list of names for Speakup functions. These are used
+by the help system. For example, suppose that you have activated help mode,
+and you pressed keypad 3. Speakup says:
+"keypad 3 is character, say next."
+The message "character, say next" names a Speakup function, and it
+comes from this function_names file.
+
+* key_names:
+Again, key_names is used by Speakup's help system. In the previous
+example, Speakup said that you pressed "keypad 3."
+This name came from the key_names file.
+
+* states:
+This file contains names for key states.
+Again, these are part of the help system. For instance, if you had pressed
+speakup + keypad 3, you would hear:
+"speakup keypad 3 is go to bottom edge."
+The speakup key is depressed, so the name of the key state is speakup.
+This part of the message comes from the states collection.
+
+14.2. Changing language
+
+14.2.1. Loading Your Own Messages
+
+The files under the i18n subdirectory all follow the same format.
+They consist of lines, with one message per line.
+Each message is represented by a number, followed by the text of the message.
+The number is the position of the message in the given collection.
+For example, if you view the file /speakup/i18n/colors, you will see the
+following list:
+
+0 black
+1 blue
+2 green
+3 cyan
+4 red
+5 magenta
+6 yellow
+7 white
+8 grey
+
+You can change one message, or you can change a whole group.
+To load a whole collection of messages from a new source, simply use
+the cp command:
+cp ~/my_colors /speakup/i18n/colors
+You can change an individual message with the echo command,
+as shown in the following example.
+
+The Spanish name for the color blue is azul.
+Looking at the colors file, we see that the name "blue" is at position 1
+within the colors group. Let's change blue to azul:
+echo '1 azul' > /speakup/i18n/colors
+The next time that Speakup says message 1 from the colors group, it will
+say "azul", rather than "blue."
+
+14.2.2. Choose a language
+
+In the future, translations into various languages will be made available,
+and most users will just load the files necessary for their language. So far,
+only French language is available beyond native Canadian English language.
+
+French is only available after you are logged in.
+
+Canadian English is the default language. To toggle another language,
+download the source of Speakup and untar it in your home directory. The
+following command should let you do this:
+
+tar xvjf speakup-<version>.tar.bz2
+
+where <version> is the version number of the application.
+
+Next, change to the newly created directory, then into the tools/ directory, and
+run the script speakup_setlocale. You are asked the language that you want to
+use. Type the number associated to your language (e.g. fr for French) then press
+Enter. Needed files are copied in the i18n directory.
+
+Note: the speakupconf must be installed on your system so that settings are saved.
+Otherwise, you will have an error: your language will be loaded but you will
+have to run the script again every time Speakup restarts.
+See section 16.1. for information about speakupconf.
+
+You will have to repeat these steps for any change of locale, i.e. if you wish
+change the speakup's language or charset (iso-8859-15 ou UTF-8).
+
+If you wish store the settings, note that at your next login, you will need to
+do:
+
+speakup load
+
+Alternatively, you can add the above line to your file
+~/.bashrc or ~/.bash_profile.
+
+If your system administrator himself ran the script, all the users will be able
+to change from English to the language chosen by root and do directly
+speakupconf load (or add this to the ~/.bashrc or
+~/.bash_profile file). If there are several languages to handle, the
+administrator (or every user) will have to run the first steps until speakupconf
+save, choosing the appropriate language, in every user's home directory. Every
+user will then be able to do speakupconf load, Speakup will load his own settings.
+
+14.3. No Support for Non-Western-European Languages
+
+As of the current release, Speakup only supports Western European languages.
+Support for the extended characters used by languages outside of the Western
+European family of languages is a work in progress.
+
+15. Using Speakup's Windowing Capability
+
+Speakup has the capability of defining and manipulating windows on the
+screen. Speakup uses the term "Window", to mean a user defined area of
+the screen. The key strokes for defining and manipulating Speakup
+windows are as follows:
+
+speakup + f2 -- Set the bounds of the window.
+Speakup + f3 -- clear the current window definition.
+speakup + f4 -- Toggle window silence on and off.
+speakup + keypad plus -- Say the currently defined window.
+
+These capabilities are useful for tracking a certain part of the screen
+without rereading the whole screen, or for silencing a part of the
+screen that is constantly changing, such as a clock or status line.
+
+There is no way to save these window settings, and you can only have one
+window defined for each virtual console. There is also no way to have
+windows automatically defined for specific applications.
+
+In order to define a window, use the review keys to move your reading
+cursor to the beginning of the area you want to define. Then press
+speakup + f2. Speakup will tell you that the window starts at the
+indicated row and column position. Then move the reading cursor to the
+end of the area to be defined as a window, and press speakup + f2 again.
+ If there is more than one line in the window, Speakup will tell you
+that the window ends at the indicated row and column position. If there
+is only one line in the window, then Speakup will tell you that the
+window is the specified line on the screen. If you are only defining a
+one line window, you can just press speakup + f2 twice after placing the
+reading cursor on the line you want to define as a window. It is not
+necessary to position the reading cursor at the end of the line in order
+to define the whole line as a window.
+
+16. Tools for Controlling Speakup
+
+The speakup distribution includes extra tools (in the tools directory)
+which were written to make speakup easier to use. This section will
+briefly describe the use of these tools.
+
+16.1. Speakupconf
+
+speakupconf began life as a contribution from Steve Holmes, a member of
+the speakup community. We would like to thank him for his work on the
+early versions of this project.
+
+This script may be installed as part of your linux distribution, but if
+it isn't, the recommended places to put it are /usr/local/bin or
+/usr/bin. This script can be run by any user, so it does not require
+root privileges.
+
+Speakupconf allows you to save and load your Speakup settings. It works
+by reading and writing the /sys files described above.
+
+The directory that speakupconf uses to store your settings depends on
+whether it is run from the root account. If you execute speakupconf as
+root, it uses the directory /etc/speakup. Otherwise, it uses the directory
+~/.speakup, where ~ is your home directory.
+Anyone who needs to use Speakup from your console can load his own custom
+settings with this script.
+
+speakupconf takes one required argument: load or save.
+Use the command
+speakupconf save
+to save your Speakup settings, and
+speakupconf load
+to load them into Speakup.
+A second argument may be specified to use an alternate directory to
+load or save the speakup parameters.
+
+16.2. Talkwith
+
+Charles Hallenbeck, another member of the speakup community, wrote the
+initial versions of this script, and we would also like to thank him for
+his work on it.
+
+This script needs root privileges to run, so if it is not installed as
+part of your linux distribution, the recommended places to install it
+are /usr/local/sbin or /usr/sbin.
+
+Talkwith allows you to switch synthesizers on the fly. It takes a synthesizer
+name as an argument. For instance,
+talkwith dectlk
+causes Speakup to use the DecTalk Express. If you wish to switch to a
+software synthesizer, you must also indicate which daemon you wish to
+use. There are two possible choices:
+spd and espeakup. spd is an abbreviation for speechd-up.
+If you wish to use espeakup for software synthesis, give the command
+talkwith soft espeakup
+To use speechd-up, type:
+talkwith soft spd
+Any arguments that follow the name of the daemon are passed to the daemon
+when it is invoked. For instance:
+talkwith espeakup --default-voice=fr
+causes espeakup to use the French voice.
+Note that talkwith must always be executed with root privileges.
+
+Talkwith does not attempt to load your settings after the new
+synthesizer is activated. You can use speakupconf to load your settings
+if desired.
+
+ GNU Free Documentation License
+ Version 1.2, November 2002
+
+
+ Copyright (C) 2000,2001,2002 Free Software Foundation, Inc.
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+
+0. PREAMBLE
+
+The purpose of this License is to make a manual, textbook, or other
+functional and useful document "free" in the sense of freedom: to
+assure everyone the effective freedom to copy and redistribute it,
+with or without modifying it, either commercially or noncommercially.
+Secondarily, this License preserves for the author and publisher a way
+to get credit for their work, while not being considered responsible
+for modifications made by others.
+
+This License is a kind of "copyleft", which means that derivative
+works of the document must themselves be free in the same sense. It
+complements the GNU General Public License, which is a copyleft
+license designed for free software.
+
+We have designed this License in order to use it for manuals for free
+software, because free software needs free documentation: a free
+program should come with manuals providing the same freedoms that the
+software does. But this License is not limited to software manuals;
+it can be used for any textual work, regardless of subject matter or
+whether it is published as a printed book. We recommend this License
+principally for works whose purpose is instruction or reference.
+
+
+1. APPLICABILITY AND DEFINITIONS
+
+This License applies to any manual or other work, in any medium, that
+contains a notice placed by the copyright holder saying it can be
+distributed under the terms of this License. Such a notice grants a
+world-wide, royalty-free license, unlimited in duration, to use that
+work under the conditions stated herein. The "Document", below,
+refers to any such manual or work. Any member of the public is a
+licensee, and is addressed as "you". You accept the license if you
+copy, modify or distribute the work in a way requiring permission
+under copyright law.
+
+A "Modified Version" of the Document means any work containing the
+Document or a portion of it, either copied verbatim, or with
+modifications and/or translated into another language.
+
+A "Secondary Section" is a named appendix or a front-matter section of
+the Document that deals exclusively with the relationship of the
+publishers or authors of the Document to the Document's overall subject
+(or to related matters) and contains nothing that could fall directly
+within that overall subject. (Thus, if the Document is in part a
+textbook of mathematics, a Secondary Section may not explain any
+mathematics.) The relationship could be a matter of historical
+connection with the subject or with related matters, or of legal,
+commercial, philosophical, ethical or political position regarding
+them.
+
+The "Invariant Sections" are certain Secondary Sections whose titles
+are designated, as being those of Invariant Sections, in the notice
+that says that the Document is released under this License. If a
+section does not fit the above definition of Secondary then it is not
+allowed to be designated as Invariant. The Document may contain zero
+Invariant Sections. If the Document does not identify any Invariant
+Sections then there are none.
+
+The "Cover Texts" are certain short passages of text that are listed,
+as Front-Cover Texts or Back-Cover Texts, in the notice that says that
+the Document is released under this License. A Front-Cover Text may
+be at most 5 words, and a Back-Cover Text may be at most 25 words.
+
+A "Transparent" copy of the Document means a machine-readable copy,
+represented in a format whose specification is available to the
+general public, that is suitable for revising the document
+straightforwardly with generic text editors or (for images composed of
+pixels) generic paint programs or (for drawings) some widely available
+drawing editor, and that is suitable for input to text formatters or
+for automatic translation to a variety of formats suitable for input
+to text formatters. A copy made in an otherwise Transparent file
+format whose markup, or absence of markup, has been arranged to thwart
+or discourage subsequent modification by readers is not Transparent.
+An image format is not Transparent if used for any substantial amount
+of text. A copy that is not "Transparent" is called "Opaque".
+
+Examples of suitable formats for Transparent copies include plain
+ASCII without markup, Texinfo input format, LaTeX input format, SGML
+or XML using a publicly available DTD, and standard-conforming simple
+HTML, PostScript or PDF designed for human modification. Examples of
+transparent image formats include PNG, XCF and JPG. Opaque formats
+include proprietary formats that can be read and edited only by
+proprietary word processors, SGML or XML for which the DTD and/or
+processing tools are not generally available, and the
+machine-generated HTML, PostScript or PDF produced by some word
+processors for output purposes only.
+
+The "Title Page" means, for a printed book, the title page itself,
+plus such following pages as are needed to hold, legibly, the material
+this License requires to appear in the title page. For works in
+formats which do not have any title page as such, "Title Page" means
+the text near the most prominent appearance of the work's title,
+preceding the beginning of the body of the text.
+
+A section "Entitled XYZ" means a named subunit of the Document whose
+title either is precisely XYZ or contains XYZ in parentheses following
+text that translates XYZ in another language. (Here XYZ stands for a
+specific section name mentioned below, such as "Acknowledgements",
+"Dedications", "Endorsements", or "History".) To "Preserve the Title"
+of such a section when you modify the Document means that it remains a
+section "Entitled XYZ" according to this definition.
+
+The Document may include Warranty Disclaimers next to the notice which
+states that this License applies to the Document. These Warranty
+Disclaimers are considered to be included by reference in this
+License, but only as regards disclaiming warranties: any other
+implication that these Warranty Disclaimers may have is void and has
+no effect on the meaning of this License.
+
+
+2. VERBATIM COPYING
+
+You may copy and distribute the Document in any medium, either
+commercially or noncommercially, provided that this License, the
+copyright notices, and the license notice saying this License applies
+to the Document are reproduced in all copies, and that you add no other
+conditions whatsoever to those of this License. You may not use
+technical measures to obstruct or control the reading or further
+copying of the copies you make or distribute. However, you may accept
+compensation in exchange for copies. If you distribute a large enough
+number of copies you must also follow the conditions in section 3.
+
+You may also lend copies, under the same conditions stated above, and
+you may publicly display copies.
+
+
+3. COPYING IN QUANTITY
+
+If you publish printed copies (or copies in media that commonly have
+printed covers) of the Document, numbering more than 100, and the
+Document's license notice requires Cover Texts, you must enclose the
+copies in covers that carry, clearly and legibly, all these Cover
+Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
+the back cover. Both covers must also clearly and legibly identify
+you as the publisher of these copies. The front cover must present
+the full title with all words of the title equally prominent and
+visible. You may add other material on the covers in addition.
+Copying with changes limited to the covers, as long as they preserve
+the title of the Document and satisfy these conditions, can be treated
+as verbatim copying in other respects.
+
+If the required texts for either cover are too voluminous to fit
+legibly, you should put the first ones listed (as many as fit
+reasonably) on the actual cover, and continue the rest onto adjacent
+pages.
+
+If you publish or distribute Opaque copies of the Document numbering
+more than 100, you must either include a machine-readable Transparent
+copy along with each Opaque copy, or state in or with each Opaque copy
+a computer-network location from which the general network-using
+public has access to download using public-standard network protocols
+a complete Transparent copy of the Document, free of added material.
+If you use the latter option, you must take reasonably prudent steps,
+when you begin distribution of Opaque copies in quantity, to ensure
+that this Transparent copy will remain thus accessible at the stated
+location until at least one year after the last time you distribute an
+Opaque copy (directly or through your agents or retailers) of that
+edition to the public.
+
+It is requested, but not required, that you contact the authors of the
+Document well before redistributing any large number of copies, to give
+them a chance to provide you with an updated version of the Document.
+
+
+4. MODIFICATIONS
+
+You may copy and distribute a Modified Version of the Document under
+the conditions of sections 2 and 3 above, provided that you release
+the Modified Version under precisely this License, with the Modified
+Version filling the role of the Document, thus licensing distribution
+and modification of the Modified Version to whoever possesses a copy
+of it. In addition, you must do these things in the Modified Version:
+
+A. Use in the Title Page (and on the covers, if any) a title distinct
+ from that of the Document, and from those of previous versions
+ (which should, if there were any, be listed in the History section
+ of the Document). You may use the same title as a previous version
+ if the original publisher of that version gives permission.
+B. List on the Title Page, as authors, one or more persons or entities
+ responsible for authorship of the modifications in the Modified
+ Version, together with at least five of the principal authors of the
+ Document (all of its principal authors, if it has fewer than five),
+ unless they release you from this requirement.
+C. State on the Title page the name of the publisher of the
+ Modified Version, as the publisher.
+D. Preserve all the copyright notices of the Document.
+E. Add an appropriate copyright notice for your modifications
+ adjacent to the other copyright notices.
+F. Include, immediately after the copyright notices, a license notice
+ giving the public permission to use the Modified Version under the
+ terms of this License, in the form shown in the Addendum below.
+G. Preserve in that license notice the full lists of Invariant Sections
+ and required Cover Texts given in the Document's license notice.
+H. Include an unaltered copy of this License.
+I. Preserve the section Entitled "History", Preserve its Title, and add
+ to it an item stating at least the title, year, new authors, and
+ publisher of the Modified Version as given on the Title Page. If
+ there is no section Entitled "History" in the Document, create one
+ stating the title, year, authors, and publisher of the Document as
+ given on its Title Page, then add an item describing the Modified
+ Version as stated in the previous sentence.
+J. Preserve the network location, if any, given in the Document for
+ public access to a Transparent copy of the Document, and likewise
+ the network locations given in the Document for previous versions
+ it was based on. These may be placed in the "History" section.
+ You may omit a network location for a work that was published at
+ least four years before the Document itself, or if the original
+ publisher of the version it refers to gives permission.
+K. For any section Entitled "Acknowledgements" or "Dedications",
+ Preserve the Title of the section, and preserve in the section all
+ the substance and tone of each of the contributor acknowledgements
+ and/or dedications given therein.
+L. Preserve all the Invariant Sections of the Document,
+ unaltered in their text and in their titles. Section numbers
+ or the equivalent are not considered part of the section titles.
+M. Delete any section Entitled "Endorsements". Such a section
+ may not be included in the Modified Version.
+N. Do not retitle any existing section to be Entitled "Endorsements"
+ or to conflict in title with any Invariant Section.
+O. Preserve any Warranty Disclaimers.
+
+If the Modified Version includes new front-matter sections or
+appendices that qualify as Secondary Sections and contain no material
+copied from the Document, you may at your option designate some or all
+of these sections as invariant. To do this, add their titles to the
+list of Invariant Sections in the Modified Version's license notice.
+These titles must be distinct from any other section titles.
+
+You may add a section Entitled "Endorsements", provided it contains
+nothing but endorsements of your Modified Version by various
+parties--for example, statements of peer review or that the text has
+been approved by an organization as the authoritative definition of a
+standard.
+
+You may add a passage of up to five words as a Front-Cover Text, and a
+passage of up to 25 words as a Back-Cover Text, to the end of the list
+of Cover Texts in the Modified Version. Only one passage of
+Front-Cover Text and one of Back-Cover Text may be added by (or
+through arrangements made by) any one entity. If the Document already
+includes a cover text for the same cover, previously added by you or
+by arrangement made by the same entity you are acting on behalf of,
+you may not add another; but you may replace the old one, on explicit
+permission from the previous publisher that added the old one.
+
+The author(s) and publisher(s) of the Document do not by this License
+give permission to use their names for publicity for or to assert or
+imply endorsement of any Modified Version.
+
+
+5. COMBINING DOCUMENTS
+
+You may combine the Document with other documents released under this
+License, under the terms defined in section 4 above for modified
+versions, provided that you include in the combination all of the
+Invariant Sections of all of the original documents, unmodified, and
+list them all as Invariant Sections of your combined work in its
+license notice, and that you preserve all their Warranty Disclaimers.
+
+The combined work need only contain one copy of this License, and
+multiple identical Invariant Sections may be replaced with a single
+copy. If there are multiple Invariant Sections with the same name but
+different contents, make the title of each such section unique by
+adding at the end of it, in parentheses, the name of the original
+author or publisher of that section if known, or else a unique number.
+Make the same adjustment to the section titles in the list of
+Invariant Sections in the license notice of the combined work.
+
+In the combination, you must combine any sections Entitled "History"
+in the various original documents, forming one section Entitled
+"History"; likewise combine any sections Entitled "Acknowledgements",
+and any sections Entitled "Dedications". You must delete all sections
+Entitled "Endorsements".
+
+
+6. COLLECTIONS OF DOCUMENTS
+
+You may make a collection consisting of the Document and other documents
+released under this License, and replace the individual copies of this
+License in the various documents with a single copy that is included in
+the collection, provided that you follow the rules of this License for
+verbatim copying of each of the documents in all other respects.
+
+You may extract a single document from such a collection, and distribute
+it individually under this License, provided you insert a copy of this
+License into the extracted document, and follow this License in all
+other respects regarding verbatim copying of that document.
+
+
+7. AGGREGATION WITH INDEPENDENT WORKS
+
+A compilation of the Document or its derivatives with other separate
+and independent documents or works, in or on a volume of a storage or
+distribution medium, is called an "aggregate" if the copyright
+resulting from the compilation is not used to limit the legal rights
+of the compilation's users beyond what the individual works permit.
+When the Document is included in an aggregate, this License does not
+apply to the other works in the aggregate which are not themselves
+derivative works of the Document.
+
+If the Cover Text requirement of section 3 is applicable to these
+copies of the Document, then if the Document is less than one half of
+the entire aggregate, the Document's Cover Texts may be placed on
+covers that bracket the Document within the aggregate, or the
+electronic equivalent of covers if the Document is in electronic form.
+Otherwise they must appear on printed covers that bracket the whole
+aggregate.
+
+
+8. TRANSLATION
+
+Translation is considered a kind of modification, so you may
+distribute translations of the Document under the terms of section 4.
+Replacing Invariant Sections with translations requires special
+permission from their copyright holders, but you may include
+translations of some or all Invariant Sections in addition to the
+original versions of these Invariant Sections. You may include a
+translation of this License, and all the license notices in the
+Document, and any Warranty Disclaimers, provided that you also include
+the original English version of this License and the original versions
+of those notices and disclaimers. In case of a disagreement between
+the translation and the original version of this License or a notice
+or disclaimer, the original version will prevail.
+
+If a section in the Document is Entitled "Acknowledgements",
+"Dedications", or "History", the requirement (section 4) to Preserve
+its Title (section 1) will typically require changing the actual
+title.
+
+
+9. TERMINATION
+
+You may not copy, modify, sublicense, or distribute the Document except
+as expressly provided for under this License. Any other attempt to
+copy, modify, sublicense or distribute the Document is void, and will
+automatically terminate your rights under this License. However,
+parties who have received copies, or rights, from you under this
+License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+
+10. FUTURE REVISIONS OF THIS LICENSE
+
+The Free Software Foundation may publish new, revised versions
+of the GNU Free Documentation License from time to time. Such new
+versions will be similar in spirit to the present version, but may
+differ in detail to address new problems or concerns. See
+https://www.gnu.org/copyleft/.
+
+Each version of the License is given a distinguishing version number.
+If the Document specifies that a particular numbered version of this
+License "or any later version" applies to it, you have the option of
+following the terms and conditions either of that specified version or
+of any later version that has been published (not as a draft) by the
+Free Software Foundation. If the Document does not specify a version
+number of this License, you may choose any version ever published (not
+as a draft) by the Free Software Foundation.
+
+
+ADDENDUM: How to use this License for your documents
+
+To use this License in a document you have written, include a copy of
+the License in the document and put the following copyright and
+license notices just after the title page:
+
+ Copyright (c) YEAR YOUR NAME.
+ Permission is granted to copy, distribute and/or modify this document
+ under the terms of the GNU Free Documentation License, Version 1.2
+ or any later version published by the Free Software Foundation;
+ with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
+ A copy of the license is included in the section entitled "GNU
+ Free Documentation License".
+
+If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts,
+replace the "with...Texts." line with this:
+
+ with the Invariant Sections being LIST THEIR TITLES, with the
+ Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.
+
+If you have Invariant Sections without Cover Texts, or some other
+combination of the three, merge those two alternatives to suit the
+situation.
+
+If your document contains nontrivial examples of program code, we
+recommend releasing these examples in parallel under your choice of
+free software license, such as the GNU General Public License,
+to permit their use in free software.
+
+The End.
diff --git a/Documentation/admin-guide/svga.rst b/Documentation/admin-guide/svga.rst
index b6c2f9acca92..9eb1e0738e84 100644
--- a/Documentation/admin-guide/svga.rst
+++ b/Documentation/admin-guide/svga.rst
@@ -12,7 +12,8 @@ Intro
This small document describes the "Video Mode Selection" feature which
allows the use of various special video modes supported by the video BIOS. Due
to usage of the BIOS, the selection is limited to boot time (before the
-kernel decompression starts) and works only on 80X86 machines.
+kernel decompression starts) and works only on 80X86 machines that are
+booted through BIOS firmware (as opposed to through UEFI, kexec, etc.).
.. note::
@@ -23,7 +24,7 @@ kernel decompression starts) and works only on 80X86 machines.
The video mode to be used is selected by a kernel parameter which can be
specified in the kernel Makefile (the SVGA_MODE=... line) or by the "vga=..."
-option of LILO (or some other boot loader you use) or by the "vidmode" utility
+option of LILO (or some other boot loader you use) or by the "xrandr" utility
(present in standard Linux utility packages). You can use the following values
of this parameter::
@@ -41,7 +42,7 @@ of this parameter::
better to use absolute mode numbers instead.
0x.... - Hexadecimal video mode ID (also displayed on the menu, see below
- for exact meaning of the ID). Warning: rdev and LILO don't support
+ for exact meaning of the ID). Warning: LILO doesn't support
hexadecimal numbers -- you have to convert it to decimal manually.
Menu
diff --git a/Documentation/admin-guide/syscall-user-dispatch.rst b/Documentation/admin-guide/syscall-user-dispatch.rst
new file mode 100644
index 000000000000..e3cfffef5a63
--- /dev/null
+++ b/Documentation/admin-guide/syscall-user-dispatch.rst
@@ -0,0 +1,94 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Syscall User Dispatch
+=====================
+
+Background
+----------
+
+Compatibility layers like Wine need a way to efficiently emulate system
+calls of only a part of their process - the part that has the
+incompatible code - while being able to execute native syscalls without
+a high performance penalty on the native part of the process. Seccomp
+falls short on this task, since it has limited support to efficiently
+filter syscalls based on memory regions, and it doesn't support removing
+filters. Therefore a new mechanism is necessary.
+
+Syscall User Dispatch brings the filtering of the syscall dispatcher
+address back to userspace. The application is in control of a flip
+switch, indicating the current personality of the process. A
+multiple-personality application can then flip the switch without
+invoking the kernel, when crossing the compatibility layer API
+boundaries, to enable/disable the syscall redirection and execute
+syscalls directly (disabled) or send them to be emulated in userspace
+through a SIGSYS.
+
+The goal of this design is to provide very quick compatibility layer
+boundary crosses, which is achieved by not executing a syscall to change
+personality every time the compatibility layer executes. Instead, a
+userspace memory region exposed to the kernel indicates the current
+personality, and the application simply modifies that variable to
+configure the mechanism.
+
+There is a relatively high cost associated with handling signals on most
+architectures, like x86, but at least for Wine, syscalls issued by
+native Windows code are currently not known to be a performance problem,
+since they are quite rare, at least for modern gaming applications.
+
+Since this mechanism is designed to capture syscalls issued by
+non-native applications, it must function on syscalls whose invocation
+ABI is completely unexpected to Linux. Syscall User Dispatch, therefore
+doesn't rely on any of the syscall ABI to make the filtering. It uses
+only the syscall dispatcher address and the userspace key.
+
+As the ABI of these intercepted syscalls is unknown to Linux, these
+syscalls are not instrumentable via ptrace or the syscall tracepoints.
+
+Interface
+---------
+
+A thread can setup this mechanism on supported kernels by executing the
+following prctl:
+
+ prctl(PR_SET_SYSCALL_USER_DISPATCH, <op>, <offset>, <length>, [selector])
+
+<op> is either PR_SYS_DISPATCH_ON or PR_SYS_DISPATCH_OFF, to enable and
+disable the mechanism globally for that thread. When
+PR_SYS_DISPATCH_OFF is used, the other fields must be zero.
+
+[<offset>, <offset>+<length>) delimit a memory region interval
+from which syscalls are always executed directly, regardless of the
+userspace selector. This provides a fast path for the C library, which
+includes the most common syscall dispatchers in the native code
+applications, and also provides a way for the signal handler to return
+without triggering a nested SIGSYS on (rt\_)sigreturn. Users of this
+interface should make sure that at least the signal trampoline code is
+included in this region. In addition, for syscalls that implement the
+trampoline code on the vDSO, that trampoline is never intercepted.
+
+[selector] is a pointer to a char-sized region in the process memory
+region, that provides a quick way to enable disable syscall redirection
+thread-wide, without the need to invoke the kernel directly. selector
+can be set to SYSCALL_DISPATCH_FILTER_ALLOW or SYSCALL_DISPATCH_FILTER_BLOCK.
+Any other value should terminate the program with a SIGSYS.
+
+Additionally, a tasks syscall user dispatch configuration can be peeked
+and poked via the PTRACE_(GET|SET)_SYSCALL_USER_DISPATCH_CONFIG ptrace
+requests. This is useful for checkpoint/restart software.
+
+Security Notes
+--------------
+
+Syscall User Dispatch provides functionality for compatibility layers to
+quickly capture system calls issued by a non-native part of the
+application, while not impacting the Linux native regions of the
+process. It is not a mechanism for sandboxing system calls, and it
+should not be seen as a security mechanism, since it is trivial for a
+malicious application to subvert the mechanism by jumping to an allowed
+dispatcher region prior to executing the syscall, or to discover the
+address and modify the selector value. If the use case requires any
+kind of security sandboxing, Seccomp should be used instead.
+
+Any fork or exec of the existing process resets the mechanism to
+PR_SYS_DISPATCH_OFF.
diff --git a/Documentation/admin-guide/sysctl/abi.rst b/Documentation/admin-guide/sysctl/abi.rst
index 599bcde7f0b7..4e6db0a2a4c0 100644
--- a/Documentation/admin-guide/sysctl/abi.rst
+++ b/Documentation/admin-guide/sysctl/abi.rst
@@ -1,67 +1,34 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
================================
Documentation for /proc/sys/abi/
================================
-kernel version 2.6.0.test2
+.. See scripts/check-sysctl-docs to keep this up to date:
+.. scripts/check-sysctl-docs -vtable="abi" \
+.. Documentation/admin-guide/sysctl/abi.rst \
+.. $(git grep -l register_sysctl_)
-Copyright (c) 2003, Fabian Frederick <ffrederick@users.sourceforge.net>
+Copyright (c) 2020, Stephen Kitt
-For general info: index.rst.
+For general info, see Documentation/admin-guide/sysctl/index.rst.
------------------------------------------------------------------------------
-This path is binary emulation relevant aka personality types aka abi.
-When a process is executed, it's linked to an exec_domain whose
-personality is defined using values available from /proc/sys/abi.
-You can find further details about abi in include/linux/personality.h.
-
-Here are the files featuring in 2.6 kernel:
-
-- defhandler_coff
-- defhandler_elf
-- defhandler_lcall7
-- defhandler_libcso
-- fake_utsname
-- trace
-
-defhandler_coff
----------------
-
-defined value:
- PER_SCOSVR3::
-
- 0x0003 | STICKY_TIMEOUTS | WHOLE_SECONDS | SHORT_INODE
-
-defhandler_elf
---------------
-
-defined value:
- PER_LINUX::
-
- 0
-
-defhandler_lcall7
------------------
-
-defined value :
- PER_SVR4::
-
- 0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
-
-defhandler_libsco
------------------
-
-defined value:
- PER_SVR4::
+The files in ``/proc/sys/abi`` can be used to see and modify
+ABI-related settings.
- 0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
+Currently, these files might (depending on your configuration)
+show up in ``/proc/sys/kernel``:
-fake_utsname
-------------
+.. contents:: :local:
-Unused
+vsyscall32 (x86)
+================
-trace
------
+Determines whether the kernels maps a vDSO page into 32-bit processes;
+can be set to 1 to enable, or 0 to disable. Defaults to enabled if
+``CONFIG_COMPAT_VDSO`` is set, disabled otherwise.
-Unused
+This controls the same setting as the ``vdso32`` kernel boot
+parameter.
diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
index 2a45119e3331..47499a1742bd 100644
--- a/Documentation/admin-guide/sysctl/fs.rst
+++ b/Documentation/admin-guide/sysctl/fs.rst
@@ -2,8 +2,6 @@
Documentation for /proc/sys/fs/
===============================
-kernel version 2.2.10
-
Copyright (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>
Copyright (c) 2009, Shen Feng<shen@cn.fujitsu.com>
@@ -12,159 +10,130 @@ For general info and legal blurb, please look in intro.rst.
------------------------------------------------------------------------------
-This file contains documentation for the sysctl files in
-/proc/sys/fs/ and is valid for Linux kernel version 2.2.
+This file contains documentation for the sysctl files and directories
+in ``/proc/sys/fs/``.
The files in this directory can be used to tune and monitor
miscellaneous and general things in the operation of the Linux
-kernel. Since some of the files _can_ be used to screw up your
+kernel. Since some of the files *can* be used to screw up your
system, it is advisable to read both documentation and source
before actually making adjustments.
1. /proc/sys/fs
===============
-Currently, these files are in /proc/sys/fs:
-
-- aio-max-nr
-- aio-nr
-- dentry-state
-- dquot-max
-- dquot-nr
-- file-max
-- file-nr
-- inode-max
-- inode-nr
-- inode-state
-- nr_open
-- overflowuid
-- overflowgid
-- pipe-user-pages-hard
-- pipe-user-pages-soft
-- protected_fifos
-- protected_hardlinks
-- protected_regular
-- protected_symlinks
-- suid_dumpable
-- super-max
-- super-nr
+Currently, these files might (depending on your configuration)
+show up in ``/proc/sys/fs``:
+
+.. contents:: :local:
aio-nr & aio-max-nr
-------------------
-aio-nr is the running total of the number of events specified on the
-io_setup system call for all currently active aio contexts. If aio-nr
-reaches aio-max-nr then io_setup will fail with EAGAIN. Note that
-raising aio-max-nr does not result in the pre-allocation or re-sizing
-of any kernel data structures.
+``aio-nr`` shows the current system-wide number of asynchronous io
+requests. ``aio-max-nr`` allows you to change the maximum value
+``aio-nr`` can grow to. If ``aio-nr`` reaches ``aio-nr-max`` then
+``io_setup`` will fail with ``EAGAIN``. Note that raising
+``aio-max-nr`` does not result in the
+pre-allocation or re-sizing of any kernel data structures.
dentry-state
------------
-From linux/include/linux/dcache.h::
+This file shows the values in ``struct dentry_stat_t``, as defined in
+``fs/dcache.c``::
struct dentry_stat_t dentry_stat {
- int nr_dentry;
- int nr_unused;
- int age_limit; /* age in seconds */
- int want_pages; /* pages requested by system */
- int nr_negative; /* # of unused negative dentries */
- int dummy; /* Reserved for future use */
+ long nr_dentry;
+ long nr_unused;
+ long age_limit; /* age in seconds */
+ long want_pages; /* pages requested by system */
+ long nr_negative; /* # of unused negative dentries */
+ long dummy; /* Reserved for future use */
};
Dentries are dynamically allocated and deallocated.
-nr_dentry shows the total number of dentries allocated (active
-+ unused). nr_unused shows the number of dentries that are not
+``nr_dentry`` shows the total number of dentries allocated (active
++ unused). ``nr_unused shows`` the number of dentries that are not
actively used, but are saved in the LRU list for future reuse.
-Age_limit is the age in seconds after which dcache entries
-can be reclaimed when memory is short and want_pages is
-nonzero when shrink_dcache_pages() has been called and the
+``age_limit`` is the age in seconds after which dcache entries
+can be reclaimed when memory is short and ``want_pages`` is
+nonzero when ``shrink_dcache_pages()`` has been called and the
dcache isn't pruned yet.
-nr_negative shows the number of unused dentries that are also
+``nr_negative`` shows the number of unused dentries that are also
negative dentries which do not map to any files. Instead,
they help speeding up rejection of non-existing files provided
by the users.
-dquot-max & dquot-nr
---------------------
-
-The file dquot-max shows the maximum number of cached disk
-quota entries.
-
-The file dquot-nr shows the number of allocated disk quota
-entries and the number of free disk quota entries.
-
-If the number of free cached disk quotas is very low and
-you have some awesome number of simultaneous system users,
-you might want to raise the limit.
-
-
file-max & file-nr
------------------
-The value in file-max denotes the maximum number of file-
+The value in ``file-max`` denotes the maximum number of file-
handles that the Linux kernel will allocate. When you get lots
of error messages about running out of file handles, you might
want to increase this limit.
Historically,the kernel was able to allocate file handles
dynamically, but not to free them again. The three values in
-file-nr denote the number of allocated file handles, the number
+``file-nr`` denote the number of allocated file handles, the number
of allocated but unused file handles, and the maximum number of
-file handles. Linux 2.6 always reports 0 as the number of free
+file handles. Linux 2.6 and later always reports 0 as the number of free
file handles -- this is not an error, it just means that the
number of allocated file handles exactly matches the number of
used file handles.
-Attempts to allocate more file descriptors than file-max are
-reported with printk, look for "VFS: file-max limit <number>
-reached".
+Attempts to allocate more file descriptors than ``file-max`` are
+reported with ``printk``, look for::
+ VFS: file-max limit <number> reached
-nr_open
--------
-
-This denotes the maximum number of file-handles a process can
-allocate. Default value is 1024*1024 (1048576) which should be
-enough for most machines. Actual limit depends on RLIMIT_NOFILE
-resource limit.
+in the kernel logs.
-inode-max, inode-nr & inode-state
----------------------------------
+inode-nr & inode-state
+----------------------
As with file handles, the kernel allocates the inode structures
dynamically, but can't free them yet.
-The value in inode-max denotes the maximum number of inode
-handlers. This value should be 3-4 times larger than the value
-in file-max, since stdin, stdout and network sockets also
-need an inode struct to handle them. When you regularly run
-out of inodes, you need to increase this value.
-
-The file inode-nr contains the first two items from
-inode-state, so we'll skip to that file...
+The file ``inode-nr`` contains the first two items from
+``inode-state``, so we'll skip to that file...
-Inode-state contains three actual numbers and four dummies.
-The actual numbers are, in order of appearance, nr_inodes,
-nr_free_inodes and preshrink.
+``inode-state`` contains three actual numbers and four dummies.
+The actual numbers are, in order of appearance, ``nr_inodes``,
+``nr_free_inodes`` and ``preshrink``.
-Nr_inodes stands for the number of inodes the system has
-allocated, this can be slightly more than inode-max because
-Linux allocates them one pageful at a time.
+``nr_inodes`` stands for the number of inodes the system has
+allocated.
-Nr_free_inodes represents the number of free inodes (?) and
-preshrink is nonzero when the nr_inodes > inode-max and the
+``nr_free_inodes`` represents the number of free inodes (?) and
+preshrink is nonzero when the
system needs to prune the inode list instead of allocating
more.
+mount-max
+---------
+
+This denotes the maximum number of mounts that may exist
+in a mount namespace.
+
+
+nr_open
+-------
+
+This denotes the maximum number of file-handles a process can
+allocate. Default value is 1024*1024 (1048576) which should be
+enough for most machines. Actual limit depends on ``RLIMIT_NOFILE``
+resource limit.
+
+
overflowgid & overflowuid
-------------------------
@@ -192,7 +161,7 @@ pipe-user-pages-soft
Maximum total number of pages a non-privileged user may allocate for pipes
before the pipe size gets limited to a single page. Once this limit is reached,
new pipes will be limited to a single page in size for this user in order to
-limit total memory usage, and trying to increase them using fcntl() will be
+limit total memory usage, and trying to increase them using ``fcntl()`` will be
denied until usage goes below the limit again. The default value allows to
allocate up to 1024 pipes at their default size. When set to 0, no limit is
applied.
@@ -207,7 +176,7 @@ file.
When set to "0", writing to FIFOs is unrestricted.
-When set to "1" don't allow O_CREAT open on FIFOs that we don't own
+When set to "1" don't allow ``O_CREAT`` open on FIFOs that we don't own
in world writable sticky directories, unless they are owned by the
owner of the directory.
@@ -221,7 +190,7 @@ protected_hardlinks
A long-standing class of security issues is the hardlink-based
time-of-check-time-of-use race, most commonly seen in world-writable
-directories like /tmp. The common method of exploitation of this flaw
+directories like ``/tmp``. The common method of exploitation of this flaw
is to cross privilege boundaries when following a given hardlink (i.e. a
root process follows a hardlink created by another user). Additionally,
on systems without separated partitions, this stops unauthorized users
@@ -239,13 +208,13 @@ This protection is based on the restrictions in Openwall and grsecurity.
protected_regular
-----------------
-This protection is similar to protected_fifos, but it
+This protection is similar to `protected_fifos`_, but it
avoids writes to an attacker-controlled regular file, where a program
expected to create one.
When set to "0", writing to regular files is unrestricted.
-When set to "1" don't allow O_CREAT open on regular files that we
+When set to "1" don't allow ``O_CREAT`` open on regular files that we
don't own in world writable sticky directories, unless they are
owned by the owner of the directory.
@@ -257,11 +226,11 @@ protected_symlinks
A long-standing class of security issues is the symlink-based
time-of-check-time-of-use race, most commonly seen in world-writable
-directories like /tmp. The common method of exploitation of this flaw
+directories like ``/tmp``. The common method of exploitation of this flaw
is to cross privilege boundaries when following a given symlink (i.e. a
root process follows a symlink belonging to another user). For a likely
incomplete list of hundreds of examples across the years, please see:
-http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp
+https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp
When set to "0", symlink following behavior is unrestricted.
@@ -272,23 +241,25 @@ follower match, or when the directory owner matches the symlink's owner.
This protection is based on the restrictions in Openwall and grsecurity.
-suid_dumpable:
---------------
+suid_dumpable
+-------------
This value can be used to query and set the core dump mode for setuid
or otherwise protected/tainted binaries. The modes are
= ========== ===============================================================
-0 (default) traditional behaviour. Any process which has changed
+0 (default) Traditional behaviour. Any process which has changed
privilege levels or is execute only will not be dumped.
-1 (debug) all processes dump core when possible. The core dump is
+1 (debug) All processes dump core when possible. The core dump is
owned by the current user and no security is applied. This is
intended for system debugging situations only.
Ptrace is unchecked.
This is insecure as it allows regular users to examine the
memory contents of privileged processes.
-2 (suidsafe) any binary which normally would not be dumped is dumped
- anyway, but only if the "core_pattern" kernel sysctl is set to
+2 (suidsafe) Any binary which normally would not be dumped is dumped
+ anyway, but only if the ``core_pattern`` kernel sysctl (see
+ :ref:`Documentation/admin-guide/sysctl/kernel.rst <core_pattern>`)
+ is set to
either a pipe handler or a fully qualified path. (For more
details on this limitation, see CVE-2006-2451.) This mode is
appropriate when administrators are attempting to debug
@@ -301,36 +272,11 @@ or otherwise protected/tainted binaries. The modes are
= ========== ===============================================================
-super-max & super-nr
---------------------
-
-These numbers control the maximum number of superblocks, and
-thus the maximum number of mounted filesystems the kernel
-can have. You only need to increase super-max if you need to
-mount more filesystems than the current value in super-max
-allows you to.
-
-
-aio-nr & aio-max-nr
--------------------
-
-aio-nr shows the current system-wide number of asynchronous io
-requests. aio-max-nr allows you to change the maximum value
-aio-nr can grow to.
-
-
-mount-max
----------
-
-This denotes the maximum number of mounts that may exist
-in a mount namespace.
-
-
2. /proc/sys/fs/binfmt_misc
===========================
-Documentation for the files in /proc/sys/fs/binfmt_misc is
+Documentation for the files in ``/proc/sys/fs/binfmt_misc`` is
in Documentation/admin-guide/binfmt-misc.rst.
@@ -343,28 +289,32 @@ creation of a user space library that implements the POSIX message queues
API (as noted by the MSG tag in the POSIX 1003.1-2001 version of the System
Interfaces specification.)
-The "mqueue" filesystem contains values for determining/setting the amount of
-resources used by the file system.
+The "mqueue" filesystem contains values for determining/setting the
+amount of resources used by the file system.
-/proc/sys/fs/mqueue/queues_max is a read/write file for setting/getting the
-maximum number of message queues allowed on the system.
+``/proc/sys/fs/mqueue/queues_max`` is a read/write file for
+setting/getting the maximum number of message queues allowed on the
+system.
-/proc/sys/fs/mqueue/msg_max is a read/write file for setting/getting the
-maximum number of messages in a queue value. In fact it is the limiting value
-for another (user) limit which is set in mq_open invocation. This attribute of
-a queue must be less or equal then msg_max.
+``/proc/sys/fs/mqueue/msg_max`` is a read/write file for
+setting/getting the maximum number of messages in a queue value. In
+fact it is the limiting value for another (user) limit which is set in
+``mq_open`` invocation. This attribute of a queue must be less than
+or equal to ``msg_max``.
-/proc/sys/fs/mqueue/msgsize_max is a read/write file for setting/getting the
-maximum message size value (it is every message queue's attribute set during
-its creation).
+``/proc/sys/fs/mqueue/msgsize_max`` is a read/write file for
+setting/getting the maximum message size value (it is an attribute of
+every message queue, set during its creation).
-/proc/sys/fs/mqueue/msg_default is a read/write file for setting/getting the
-default number of messages in a queue value if attr parameter of mq_open(2) is
-NULL. If it exceed msg_max, the default value is initialized msg_max.
+``/proc/sys/fs/mqueue/msg_default`` is a read/write file for
+setting/getting the default number of messages in a queue value if the
+``attr`` parameter of ``mq_open(2)`` is ``NULL``. If it exceeds
+``msg_max``, the default value is initialized to ``msg_max``.
-/proc/sys/fs/mqueue/msgsize_default is a read/write file for setting/getting
-the default message size value if attr parameter of mq_open(2) is NULL. If it
-exceed msgsize_max, the default value is initialized msgsize_max.
+``/proc/sys/fs/mqueue/msgsize_default`` is a read/write file for
+setting/getting the default message size value if the ``attr``
+parameter of ``mq_open(2)`` is ``NULL``. If it exceeds
+``msgsize_max``, the default value is initialized to ``msgsize_max``.
4. /proc/sys/fs/epoll - Configuration options for the epoll interface
=====================================================================
@@ -378,7 +328,7 @@ Every epoll file descriptor can store a number of files to be monitored
for event readiness. Each one of these monitored files constitutes a "watch".
This configuration option sets the maximum number of "watches" that are
allowed for each user.
-Each "watch" costs roughly 90 bytes on a 32bit kernel, and roughly 160 bytes
-on a 64bit one.
-The current default value for max_user_watches is the 1/32 of the available
-low memory, divided for the "watch" cost in bytes.
+Each "watch" costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes
+on a 64-bit one.
+The current default value for ``max_user_watches`` is 4% of the
+available low memory, divided by the "watch" cost in bytes.
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 0d427fd10941..6584a1f9bfe3 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -9,12 +9,13 @@ Copyright (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>
Copyright (c) 2009, Shen Feng<shen@cn.fujitsu.com>
-For general info and legal blurb, please look in :doc:`index`.
+For general info and legal blurb, please look in
+Documentation/admin-guide/sysctl/index.rst.
------------------------------------------------------------------------------
This file contains documentation for the sysctl files in
-``/proc/sys/kernel/`` and is valid for Linux kernel version 2.2.
+``/proc/sys/kernel/``.
The files in this directory can be used to tune and monitor
miscellaneous and general things in the operation of the Linux
@@ -37,8 +38,8 @@ acct
If BSD-style process accounting is enabled these values control
its behaviour. If free space on filesystem where the log lives
-goes below ``lowwater``% accounting suspends. If free space gets
-above ``highwater``% accounting resumes. ``frequency`` determines
+goes below ``lowwater``\ % accounting suspends. If free space gets
+above ``highwater``\ % accounting resumes. ``frequency`` determines
how often do we check the amount of free space (value is in
seconds). Default:
@@ -54,7 +55,7 @@ free space valid for 30 seconds.
acpi_video_flags
================
-See :doc:`/power/video`. This allows the video resume mode to be set,
+See Documentation/power/video.rst. This allows the video resume mode to be set,
in a similar fashion to the ``acpi_sleep`` kernel parameter, by
combining the following values:
@@ -64,6 +65,11 @@ combining the following values:
4 s3_beep
= =======
+arch
+====
+
+The machine hardware name, the same output as ``uname -m``
+(e.g. ``x86_64`` or ``aarch64``).
auto_msgmni
===========
@@ -89,7 +95,7 @@ is 0x15 and the full version number is 0x234, this file will contain
the value 340 = 0x154.
See the ``type_of_loader`` and ``ext_loader_type`` fields in
-:doc:`/x86/boot` for additional information.
+Documentation/arch/x86/boot.rst for additional information.
bootloader_version (x86 only)
@@ -99,7 +105,31 @@ The complete bootloader version number. In the example above, this
file will contain the value 564 = 0x234.
See the ``type_of_loader`` and ``ext_loader_ver`` fields in
-:doc:`/x86/boot` for additional information.
+Documentation/arch/x86/boot.rst for additional information.
+
+
+bpf_stats_enabled
+=================
+
+Controls whether the kernel should collect statistics on BPF programs
+(total time spent running, number of times run...). Enabling
+statistics causes a slight reduction in performance on each program
+run. The statistics can be seen using ``bpftool``.
+
+= ===================================
+0 Don't collect statistics (default).
+1 Collect statistics.
+= ===================================
+
+
+cad_pid
+=======
+
+This is the pid which will be signalled on reboot (notably, by
+Ctrl-Alt-Delete). Writing a value to this file which doesn't
+correspond to a running process will result in ``-ESRCH``.
+
+See also `ctrl-alt-del`_.
cap_last_cap
@@ -109,6 +139,8 @@ Highest valid capability of the running kernel. Exports
``CAP_LAST_CAP`` from the kernel.
+.. _core_pattern:
+
core_pattern
============
@@ -140,9 +172,11 @@ core_pattern
%s signal number
%t UNIX time of dump
%h hostname
- %e executable filename (may be shortened)
+ %e executable filename (may be shortened, could be changed by prctl etc)
+ %f executable filename
%E executable path
%c maximum size of core file by resource limit RLIMIT_CORE
+ %C CPU the task ran on
%<OTHER> both are dropped
======== ==========================================
@@ -211,7 +245,7 @@ This toggle indicates whether unprivileged users are prevented
from using ``dmesg(8)`` to view messages from the kernel's log
buffer.
When ``dmesg_restrict`` is set to 0 there are no restrictions.
-When ``dmesg_restrict`` is set set to 1, users must have
+When ``dmesg_restrict`` is set to 1, users must have
``CAP_SYSLOG`` to use ``dmesg(8)``.
The kernel config option ``CONFIG_SECURITY_DMESG_RESTRICT`` sets the
@@ -241,6 +275,40 @@ domain names are in general different. For a detailed discussion
see the ``hostname(1)`` man page.
+firmware_config
+===============
+
+See Documentation/driver-api/firmware/fallback-mechanisms.rst.
+
+The entries in this directory allow the firmware loader helper
+fallback to be controlled:
+
+* ``force_sysfs_fallback``, when set to 1, forces the use of the
+ fallback;
+* ``ignore_sysfs_fallback``, when set to 1, ignores any fallback.
+
+
+ftrace_dump_on_oops
+===================
+
+Determines whether ``ftrace_dump()`` should be called on an oops (or
+kernel panic). This will output the contents of the ftrace buffers to
+the console. This is very useful for capturing traces that lead to
+crashes and outputting them to a serial console.
+
+= ===================================================
+0 Disabled (default).
+1 Dump buffers of all CPUs.
+2 Dump the buffer of the CPU that triggered the oops.
+= ===================================================
+
+
+ftrace_enabled, stack_tracer_enabled
+====================================
+
+See Documentation/trace/ftrace.rst.
+
+
hardlockup_all_cpu_backtrace
============================
@@ -266,7 +334,7 @@ when a hard lockup is detected.
1 Panic on hard lockup.
= ===========================
-See :doc:`/admin-guide/lockup-watchdogs` for more information.
+See Documentation/admin-guide/lockup-watchdogs.rst for more information.
This can also be set using the nmi_watchdog kernel parameter.
@@ -274,7 +342,26 @@ hotplug
=======
Path for the hotplug policy agent.
-Default value is "``/sbin/hotplug``".
+Default value is ``CONFIG_UEVENT_HELPER_PATH``, which in turn defaults
+to the empty string.
+
+This file only exists when ``CONFIG_UEVENT_HELPER`` is enabled. Most
+modern systems rely exclusively on the netlink-based uevent source and
+don't need this.
+
+
+hung_task_all_cpu_backtrace
+===========================
+
+If this option is set, the kernel will send an NMI to all CPUs to dump
+their backtraces when a hung task is detected. This file shows up if
+CONFIG_DETECT_HUNG_TASK and CONFIG_SMP are enabled.
+
+0: Won't show all CPUs backtraces when a hung task is detected.
+This is the default behavior.
+
+1: Will non-maskably interrupt all CPUs and dump their backtraces when
+a hung task is detected.
hung_task_panic
@@ -344,12 +431,58 @@ Controls whether the panic kmsg data should be reported to Hyper-V.
= =========================================================
+ignore-unaligned-usertrap
+=========================
+
+On architectures where unaligned accesses cause traps, and where this
+feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN``;
+currently, ``arc`` and ``loongarch``), controls whether all
+unaligned traps are logged.
+
+= =============================================================
+0 Log all unaligned accesses.
+1 Only warn the first time a process traps. This is the default
+ setting.
+= =============================================================
+
+See also `unaligned-trap`_.
+
+io_uring_disabled
+=================
+
+Prevents all processes from creating new io_uring instances. Enabling this
+shrinks the kernel's attack surface.
+
+= ======================================================================
+0 All processes can create io_uring instances as normal. This is the
+ default setting.
+1 io_uring creation is disabled (io_uring_setup() will fail with
+ -EPERM) for unprivileged processes not in the io_uring_group group.
+ Existing io_uring instances can still be used. See the
+ documentation for io_uring_group for more information.
+2 io_uring creation is disabled for all processes. io_uring_setup()
+ always fails with -EPERM. Existing io_uring instances can still be
+ used.
+= ======================================================================
+
+
+io_uring_group
+==============
+
+When io_uring_disabled is set to 1, a process must either be
+privileged (CAP_SYS_ADMIN) or be in the io_uring_group group in order
+to create an io_uring instance. If io_uring_group is set to -1 (the
+default), only processes with the CAP_SYS_ADMIN capability may create
+io_uring instances.
+
+
kexec_load_disabled
===================
-A toggle indicating if the ``kexec_load`` syscall has been disabled.
-This value defaults to 0 (false: ``kexec_load`` enabled), but can be
-set to 1 (true: ``kexec_load`` disabled).
+A toggle indicating if the syscalls ``kexec_load`` and
+``kexec_file_load`` have been disabled.
+This value defaults to 0 (false: ``kexec_*load`` enabled), but can be
+set to 1 (true: ``kexec_*load`` disabled).
Once true, kexec can no longer be used, and the toggle cannot be set
back to false.
This allows a kexec image to be loaded before disabling the syscall,
@@ -357,6 +490,24 @@ allowing a system to set up (and later use) an image without it being
altered.
Generally used together with the `modules_disabled`_ sysctl.
+kexec_load_limit_panic
+======================
+
+This parameter specifies a limit to the number of times the syscalls
+``kexec_load`` and ``kexec_file_load`` can be called with a crash
+image. It can only be set with a more restrictive value than the
+current one.
+
+== ======================================================
+-1 Unlimited calls to kexec. This is the default setting.
+N Number of calls left.
+== ======================================================
+
+kexec_load_limit_reboot
+=======================
+
+Similar functionality as ``kexec_load_limit_panic``, but for a normal
+image.
kptr_restrict
=============
@@ -391,10 +542,11 @@ modprobe
========
The full path to the usermode helper for autoloading kernel modules,
-by default "/sbin/modprobe". This binary is executed when the kernel
-requests a module. For example, if userspace passes an unknown
-filesystem type to mount(), then the kernel will automatically request
-the corresponding filesystem module by executing this usermode helper.
+by default ``CONFIG_MODPROBE_PATH``, which in turn defaults to
+"/sbin/modprobe". This binary is executed when the kernel requests a
+module. For example, if userspace passes an unknown filesystem type
+to mount(), then the kernel will automatically request the
+corresponding filesystem module by executing this usermode helper.
This usermode helper should insert the needed module into the kernel.
This sysctl only affects module autoloading. It has no effect on the
@@ -459,6 +611,15 @@ Notes:
successful IPC object allocation. If an IPC object allocation syscall
fails, it is undefined if the value remains unmodified or is reset to -1.
+
+ngroups_max
+===========
+
+Maximum number of supplementary groups, _i.e._ the maximum size which
+``setgroups`` will accept. Exports ``NGROUPS_MAX`` from the kernel.
+
+
+
nmi_watchdog
============
@@ -480,70 +641,87 @@ in a KVM virtual machine. This default can be overridden by adding::
nmi_watchdog=1
-to the guest kernel command line (see :doc:`/admin-guide/kernel-parameters`).
+to the guest kernel command line (see
+Documentation/admin-guide/kernel-parameters.rst).
+
+
+nmi_wd_lpm_factor (PPC only)
+============================
+
+Factor to apply to the NMI watchdog timeout (only when ``nmi_watchdog`` is
+set to 1). This factor represents the percentage added to
+``watchdog_thresh`` when calculating the NMI watchdog timeout during an
+LPM. The soft lockup timeout is not impacted.
+
+A value of 0 means no change. The default value is 200 meaning the NMI
+watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10).
numa_balancing
==============
-Enables/disables automatic page fault based NUMA memory
-balancing. Memory is moved automatically to nodes
-that access it often.
+Enables/disables and configures automatic page fault based NUMA memory
+balancing. Memory is moved automatically to nodes that access it often.
+The value to set can be the result of ORing the following:
-Enables/disables automatic NUMA memory balancing. On NUMA machines, there
-is a performance penalty if remote memory is accessed by a CPU. When this
-feature is enabled the kernel samples what task thread is accessing memory
-by periodically unmapping pages and later trapping a page fault. At the
-time of the page fault, it is determined if the data being accessed should
-be migrated to a local memory node.
+= =================================
+0 NUMA_BALANCING_DISABLED
+1 NUMA_BALANCING_NORMAL
+2 NUMA_BALANCING_MEMORY_TIERING
+= =================================
+
+Or NUMA_BALANCING_NORMAL to optimize page placement among different
+NUMA nodes to reduce remote accessing. On NUMA machines, there is a
+performance penalty if remote memory is accessed by a CPU. When this
+feature is enabled the kernel samples what task thread is accessing
+memory by periodically unmapping pages and later trapping a page
+fault. At the time of the page fault, it is determined if the data
+being accessed should be migrated to a local memory node.
The unmapping of pages and trapping faults incur additional overhead that
ideally is offset by improved memory locality but there is no universal
guarantee. If the target workload is already bound to NUMA nodes then this
-feature should be disabled. Otherwise, if the system overhead from the
-feature is too high then the rate the kernel samples for NUMA hinting
-faults may be controlled by the `numa_balancing_scan_period_min_ms,
-numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
-numa_balancing_scan_size_mb`_, and numa_balancing_settle_count sysctls.
+feature should be disabled.
+Or NUMA_BALANCING_MEMORY_TIERING to optimize page placement among
+different types of memory (represented as different NUMA nodes) to
+place the hot pages in the fast memory. This is implemented based on
+unmapping and page fault too.
-numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
-===============================================================================================================================
+numa_balancing_promote_rate_limit_MBps
+======================================
+Too high promotion/demotion throughput between different memory types
+may hurt application latency. This can be used to rate limit the
+promotion throughput. The per-node max promotion throughput in MB/s
+will be limited to be no more than the set value.
-Automatic NUMA balancing scans tasks address space and unmaps pages to
-detect if pages are properly placed or if the data should be migrated to a
-memory node local to where the task is running. Every "scan delay" the task
-scans the next "scan size" number of pages in its address space. When the
-end of the address space is reached the scanner restarts from the beginning.
+A rule of thumb is to set this to less than 1/10 of the PMEM node
+write bandwidth.
-In combination, the "scan delay" and "scan size" determine the scan rate.
-When "scan delay" decreases, the scan rate increases. The scan delay and
-hence the scan rate of every task is adaptive and depends on historical
-behaviour. If pages are properly placed then the scan delay increases,
-otherwise the scan delay decreases. The "scan size" is not adaptive but
-the higher the "scan size", the higher the scan rate.
+oops_all_cpu_backtrace
+======================
-Higher scan rates incur higher system overhead as page faults must be
-trapped and potentially data must be migrated. However, the higher the scan
-rate, the more quickly a tasks memory is migrated to a local node if the
-workload pattern changes and minimises performance impact due to remote
-memory accesses. These sysctls control the thresholds for scan delays and
-the number of pages scanned.
+If this option is set, the kernel will send an NMI to all CPUs to dump
+their backtraces when an oops event occurs. It should be used as a last
+resort in case a panic cannot be triggered (to protect VMs running, for
+example) or kdump can't be collected. This file shows up if CONFIG_SMP
+is enabled.
-``numa_balancing_scan_period_min_ms`` is the minimum time in milliseconds to
-scan a tasks virtual memory. It effectively controls the maximum scanning
-rate for each task.
+0: Won't show all CPUs backtraces when an oops is detected.
+This is the default behavior.
-``numa_balancing_scan_delay_ms`` is the starting "scan delay" used for a task
-when it initially forks.
+1: Will non-maskably interrupt all CPUs and dump their backtraces when
+an oops event is detected.
-``numa_balancing_scan_period_max_ms`` is the maximum time in milliseconds to
-scan a tasks virtual memory. It effectively controls the minimum scanning
-rate for each task.
-``numa_balancing_scan_size_mb`` is how many megabytes worth of pages are
-scanned for a given scan.
+oops_limit
+==========
+
+Number of kernel oopses after which the kernel should panic when
+``panic_on_oops`` is not set. Setting this to 0 disables checking
+the count. Setting this to 1 has the same effect as setting
+``panic_on_oops=1``. The default value is 10000.
osrelease, ostype & version
@@ -670,6 +848,8 @@ bit 1 print system memory info
bit 2 print timer info
bit 3 print locks info if ``CONFIG_LOCKDEP`` is on
bit 4 print ftrace buffer
+bit 5 print all printk messages in buffer
+bit 6 print all CPUs backtrace (if available in the arch)
===== ============================================
So for example to print tasks and memory info on panic, user can::
@@ -688,6 +868,13 @@ is useful to define the root cause of RCU stalls using a vmcore.
1 panic() after printing RCU stall messages.
= ============================================================
+max_rcu_stall_to_panic
+======================
+
+When ``panic_on_rcu_stall`` is set to 1, this value determines the
+number of times that RCU can stall before panic() is called.
+
+When ``panic_on_rcu_stall`` is set to 0, this value is has no effect.
perf_cpu_time_max_percent
=========================
@@ -721,7 +908,13 @@ perf_event_paranoid
===================
Controls use of the performance events system by unprivileged
-users (without CAP_SYS_ADMIN). The default value is 2.
+users (without CAP_PERFMON). The default value is 2.
+
+For backward compatibility reasons access to system performance
+monitoring and observability remains open for CAP_SYS_ADMIN
+privileged processes but CAP_SYS_ADMIN usage for secure system
+performance monitoring and observability operations is discouraged
+with respect to CAP_PERFMON use cases.
=== ==================================================================
-1 Allow use of (almost) all events by all users.
@@ -730,13 +923,13 @@ users (without CAP_SYS_ADMIN). The default value is 2.
``CAP_IPC_LOCK``.
>=0 Disallow ftrace function tracepoint by users without
- ``CAP_SYS_ADMIN``.
+ ``CAP_PERFMON``.
- Disallow raw tracepoint access by users without ``CAP_SYS_ADMIN``.
+ Disallow raw tracepoint access by users without ``CAP_PERFMON``.
->=1 Disallow CPU event access by users without ``CAP_SYS_ADMIN``.
+>=1 Disallow CPU event access by users without ``CAP_PERFMON``.
->=2 Disallow kernel profiling by users without ``CAP_SYS_ADMIN``.
+>=2 Disallow kernel profiling by users without ``CAP_PERFMON``.
=== ==================================================================
@@ -756,7 +949,7 @@ The default value is 127.
perf_event_mlock_kb
===================
-Control size of per-cpu ring buffer not counted agains mlock limit.
+Control size of per-cpu ring buffer not counted against mlock limit.
The default value is 512 + 1 page
@@ -774,6 +967,36 @@ enabled, otherwise writing to this file will return ``-EBUSY``.
The default value is 8.
+perf_user_access (arm64 and riscv only)
+=======================================
+
+Controls user space access for reading perf event counters.
+
+arm64
+=====
+
+The default value is 0 (access disabled).
+
+When set to 1, user space can read performance monitor counter registers
+directly.
+
+See Documentation/arch/arm64/perf.rst for more information.
+
+riscv
+=====
+
+When set to 0, user space access is disabled.
+
+The default value is 1, user space can read performance monitor counter
+registers through perf, any direct access without perf intervention will trigger
+an illegal instruction.
+
+When set to 2, which enables legacy mode (user space has direct access to cycle
+and insret CSRs only). Note that this legacy value is deprecated and will be
+removed once all user space applications are fixed.
+
+Note that the time CSR is always directly accessible to all modes.
+
pid_max
=======
@@ -871,7 +1094,33 @@ this sysctl interface anymore.
pty
===
-See Documentation/filesystems/devpts.txt.
+See Documentation/filesystems/devpts.rst.
+
+
+random
+======
+
+This is a directory, with the following entries:
+
+* ``boot_id``: a UUID generated the first time this is retrieved, and
+ unvarying after that;
+
+* ``uuid``: a UUID generated every time this is retrieved (this can
+ thus be used to generate UUIDs at will);
+
+* ``entropy_avail``: the pool's entropy count, in bits;
+
+* ``poolsize``: the entropy pool size, in bits;
+
+* ``urandom_min_reseed_secs``: obsolete (used to determine the minimum
+ number of seconds between urandom pool reseeding). This file is
+ writable for compatibility purposes, but writing to it has no effect
+ on any RNG behavior;
+
+* ``write_wakeup_threshold``: when the entropy count drops below this
+ (as a number of bits), processes waiting to write to ``/dev/random``
+ are woken up. This file is writable for compatibility purposes, but
+ writing to it has no effect on any RNG behavior.
randomize_va_space
@@ -911,7 +1160,7 @@ that support this feature.
real-root-dev
=============
-See :doc:`/admin-guide/initrd`.
+See Documentation/admin-guide/initrd.rst.
reboot-cmd (SPARC only)
@@ -930,8 +1179,16 @@ automatically on platforms where it can run (that is,
platforms with asymmetric CPU topologies and having an Energy
Model available). If your platform happens to meet the
requirements for EAS but you do not want to use it, change
-this value to 0.
+this value to 0. On Non-EAS platforms, write operation fails and
+read doesn't return anything.
+task_delayacct
+===============
+
+Enables/disables task delay accounting (see
+Documentation/accounting/delay-accounting.rst. Enabling this feature incurs
+a small amount of overhead in the scheduler but is useful for debugging
+and performance tuning. It is required by some tools such as iotop.
sched_schedstats
================
@@ -940,11 +1197,65 @@ Enables/disables scheduler statistics. Enabling this feature
incurs a small amount of overhead in the scheduler but is
useful for debugging and performance tuning.
+sched_util_clamp_min
+====================
+
+Max allowed *minimum* utilization.
+
+Default value is 1024, which is the maximum possible value.
+
+It means that any requested uclamp.min value cannot be greater than
+sched_util_clamp_min, i.e., it is restricted to the range
+[0:sched_util_clamp_min].
+
+sched_util_clamp_max
+====================
+
+Max allowed *maximum* utilization.
+
+Default value is 1024, which is the maximum possible value.
+
+It means that any requested uclamp.max value cannot be greater than
+sched_util_clamp_max, i.e., it is restricted to the range
+[0:sched_util_clamp_max].
+
+sched_util_clamp_min_rt_default
+===============================
+
+By default Linux is tuned for performance. Which means that RT tasks always run
+at the highest frequency and most capable (highest capacity) CPU (in
+heterogeneous systems).
+
+Uclamp achieves this by setting the requested uclamp.min of all RT tasks to
+1024 by default, which effectively boosts the tasks to run at the highest
+frequency and biases them to run on the biggest CPU.
+
+This knob allows admins to change the default behavior when uclamp is being
+used. In battery powered devices particularly, running at the maximum
+capacity and frequency will increase energy consumption and shorten the battery
+life.
+
+This knob is only effective for RT tasks which the user hasn't modified their
+requested uclamp.min value via sched_setattr() syscall.
+
+This knob will not escape the range constraint imposed by sched_util_clamp_min
+defined above.
+
+For example if
+
+ sched_util_clamp_min_rt_default = 800
+ sched_util_clamp_min = 600
+
+Then the boost will be clamped to 600 because 800 is outside of the permissible
+range of [0:600]. This could happen for instance if a powersave mode will
+restrict all boosts temporarily by modifying sched_util_clamp_min. As soon as
+this restriction is lifted, the requested sched_util_clamp_min_rt_default
+will take effect.
seccomp
=======
-See :doc:`/userspace-api/seccomp_filter`.
+See Documentation/userspace-api/seccomp_filter.rst.
sg-big-buff
@@ -1073,11 +1384,34 @@ This parameter can be used to control the soft lockup detector.
= =================================
The soft lockup detector monitors CPUs for threads that are hogging the CPUs
-without rescheduling voluntarily, and thus prevent the 'watchdog/N' threads
-from running. The mechanism depends on the CPUs ability to respond to timer
-interrupts which are needed for the 'watchdog/N' threads to be woken up by
-the watchdog timer function, otherwise the NMI watchdog — if enabled — can
-detect a hard lockup condition.
+without rescheduling voluntarily, and thus prevent the 'migration/N' threads
+from running, causing the watchdog work fail to execute. The mechanism depends
+on the CPUs ability to respond to timer interrupts which are needed for the
+watchdog work to be queued by the watchdog timer function, otherwise the NMI
+watchdog — if enabled — can detect a hard lockup condition.
+
+
+split_lock_mitigate (x86 only)
+==============================
+
+On x86, each "split lock" imposes a system-wide performance penalty. On larger
+systems, large numbers of split locks from unprivileged users can result in
+denials of service to well-behaved and potentially more important users.
+
+The kernel mitigates these bad users by detecting split locks and imposing
+penalties: forcing them to wait and only allowing one core to execute split
+locks at a time.
+
+These mitigations can make those bad applications unbearably slow. Setting
+split_lock_mitigate=0 may restore some application performance, but will also
+increase system exposure to denial of service attacks from split lock users.
+
+= ===================================================================
+0 Disable the mitigation mode - just warns the split lock on kernel log
+ and exposes the system to denials of service from the split lockers.
+1 Enable the mitigation mode (this is the default) - penalizes the split
+ lockers with intentional performance degradation.
+= ===================================================================
stack_erasing
@@ -1115,7 +1449,7 @@ the boot PROM.
sysrq
=====
-See :doc:`/admin-guide/sysrq`.
+See Documentation/admin-guide/sysrq.rst.
tainted
@@ -1127,7 +1461,7 @@ ORed together. The letters are seen in "Tainted" line of Oops reports.
====== ===== ==============================================================
1 `(P)` proprietary module was loaded
2 `(F)` module was force loaded
- 4 `(S)` SMP kernel oops on an officially SMP incapable processor
+ 4 `(S)` kernel running on an out of specification system
8 `(R)` module was force unloaded
16 `(M)` processor reported a Machine Check Exception (MCE)
32 `(B)` bad page referenced or some unexpected page flags
@@ -1145,8 +1479,16 @@ ORed together. The letters are seen in "Tainted" line of Oops reports.
131072 `(T)` The kernel was built with the struct randomization plugin
====== ===== ==============================================================
-See :doc:`/admin-guide/tainted-kernels` for more information.
+See Documentation/admin-guide/tainted-kernels.rst for more information.
+Note:
+ writes to this sysctl interface will fail with ``EINVAL`` if the kernel is
+ booted with the command line option ``panic_on_taint=<bitmask>,nousertaint``
+ and any of the ORed together values being written to ``tainted`` match with
+ the bitmask declared on panic_on_taint.
+ See Documentation/admin-guide/kernel-parameters.rst for more details on
+ that particular kernel command line option and its optional
+ ``nousertaint`` switch.
threads-max
===========
@@ -1167,6 +1509,49 @@ If a value outside of this range is written to ``threads-max`` an
``EINVAL`` error occurs.
+traceoff_on_warning
+===================
+
+When set, disables tracing (see Documentation/trace/ftrace.rst) when a
+``WARN()`` is hit.
+
+
+tracepoint_printk
+=================
+
+When tracepoints are sent to printk() (enabled by the ``tp_printk``
+boot parameter), this entry provides runtime control::
+
+ echo 0 > /proc/sys/kernel/tracepoint_printk
+
+will stop tracepoints from being sent to printk(), and::
+
+ echo 1 > /proc/sys/kernel/tracepoint_printk
+
+will send them to printk() again.
+
+This only works if the kernel was booted with ``tp_printk`` enabled.
+
+See Documentation/admin-guide/kernel-parameters.rst and
+Documentation/trace/boottime-trace.rst.
+
+
+unaligned-trap
+==============
+
+On architectures where unaligned accesses cause traps, and where this
+feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_ALLOW``; currently,
+``arc``, ``parisc`` and ``loongarch``), controls whether unaligned traps
+are caught and emulated (instead of failing).
+
+= ========================================================
+0 Do not emulate unaligned accesses.
+1 Emulate unaligned accesses. This is the default setting.
+= ========================================================
+
+See also `ignore-unaligned-usertrap`_.
+
+
unknown_nmi_panic
=================
@@ -1178,6 +1563,37 @@ NMI switch that most IA32 servers have fires unknown NMI up, for
example. If a system hangs up, try pressing the NMI switch.
+unprivileged_bpf_disabled
+=========================
+
+Writing 1 to this entry will disable unprivileged calls to ``bpf()``;
+once disabled, calling ``bpf()`` without ``CAP_SYS_ADMIN`` or ``CAP_BPF``
+will return ``-EPERM``. Once set to 1, this can't be cleared from the
+running kernel anymore.
+
+Writing 2 to this entry will also disable unprivileged calls to ``bpf()``,
+however, an admin can still change this setting later on, if needed, by
+writing 0 or 1 to this entry.
+
+If ``BPF_UNPRIV_DEFAULT_OFF`` is enabled in the kernel config, then this
+entry will default to 2 instead of 0.
+
+= =============================================================
+0 Unprivileged calls to ``bpf()`` are enabled
+1 Unprivileged calls to ``bpf()`` are disabled without recovery
+2 Unprivileged calls to ``bpf()`` are disabled
+= =============================================================
+
+
+warn_limit
+==========
+
+Number of kernel warnings after which the kernel should panic when
+``panic_on_warn`` is not set. Setting this to 0 disables checking
+the warning count. Setting this to 1 has the same effect as setting
+``panic_on_warn=1``. The default value is 0.
+
+
watchdog
========
diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
index e043c9213388..396091651955 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -31,17 +31,18 @@ see only some of them, depending on your kernel's configuration.
Table : Subdirectories in /proc/sys/net
- ========= =================== = ========== ==================
+ ========= =================== = ========== ===================
Directory Content Directory Content
- ========= =================== = ========== ==================
- core General parameter appletalk Appletalk protocol
- unix Unix domain sockets netrom NET/ROM
- 802 E802 protocol ax25 AX25
- ethernet Ethernet protocol rose X.25 PLP layer
+ ========= =================== = ========== ===================
+ 802 E802 protocol mptcp Multipath TCP
+ appletalk Appletalk protocol netfilter Network Filter
+ ax25 AX25 netrom NET/ROM
+ bridge Bridging rose X.25 PLP layer
+ core General parameter tipc TIPC
+ ethernet Ethernet protocol unix Unix domain sockets
ipv4 IP version 4 x25 X.25 protocol
- bridge Bridging decnet DEC net
- ipv6 IP version 6 tipc TIPC
- ========= =================== = ========== ==================
+ ipv6 IP version 6
+ ========= =================== = ========== ===================
1. /proc/sys/net/core - Network core options
============================================
@@ -64,16 +65,17 @@ two flavors of JITs, the newer eBPF JIT currently supported on:
- arm64
- arm32
- ppc64
+ - ppc32
- sparc64
- mips64
- s390x
- riscv64
- riscv32
+ - loongarch64
And the older cBPF JIT supported on the following archs:
- mips
- - ppc
- sparc
eBPF JITs are a superset of cBPF JITs, meaning the kernel will
@@ -101,6 +103,9 @@ Values:
- 1 - enable JIT hardening for unprivileged users only
- 2 - enable JIT hardening for all users
+where "privileged user" in this context means a process having
+CAP_BPF or CAP_SYS_ADMIN in the root user name space.
+
bpf_jit_kallsyms
----------------
@@ -211,6 +216,12 @@ rmem_max
The maximum receive socket buffer size in bytes.
+rps_default_mask
+----------------
+
+The default RPS CPU mask used on newly created network devices. An empty
+mask means RPS disabled by default.
+
tstamp_allow_data
-----------------
Allow processes to receive tx timestamps looped together with the original
@@ -271,7 +282,7 @@ poll cycle or the number of packets processed reaches netdev_budget.
netdev_max_backlog
------------------
-Maximum number of packets, queued on the INPUT side, when the interface
+Maximum number of packets, queued on the INPUT side, when the interface
receives packets faster than kernel can process them.
netdev_rss_key
@@ -311,21 +322,52 @@ permit to distribute the load on several cpus.
If set to 1 (default), timestamps are sampled as soon as possible, before
queueing.
+netdev_unregister_timeout_secs
+------------------------------
+
+Unregister network device timeout in seconds.
+This option controls the timeout (in seconds) used to issue a warning while
+waiting for a network device refcount to drop to 0 during device
+unregistration. A lower value may be useful during bisection to detect
+a leaked reference faster. A larger value may be useful to prevent false
+warnings on slow/loaded systems.
+Default value is 10, minimum 1, maximum 3600.
+
+skb_defer_max
+-------------
+
+Max size (in skbs) of the per-cpu list of skbs being freed
+by the cpu which allocated them. Used by TCP stack so far.
+
+Default: 64
+
optmem_max
----------
Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence
-of struct cmsghdr structures with appended data.
+of struct cmsghdr structures with appended data. TCP tx zerocopy also uses
+optmem_max as a limit for its internal structures.
+
+Default : 128 KB
fb_tunnels_only_for_init_net
----------------------------
Controls if fallback tunnels (like tunl0, gre0, gretap0, erspan0,
-sit0, ip6tnl0, ip6gre0) are automatically created when a new
-network namespace is created, if corresponding tunnel is present
-in initial network namespace.
-If set to 1, these devices are not automatically created, and
-user space is responsible for creating them if needed.
+sit0, ip6tnl0, ip6gre0) are automatically created. There are 3 possibilities
+(a) value = 0; respective fallback tunnels are created when module is
+loaded in every net namespaces (backward compatible behavior).
+(b) value = 1; [kcmd value: initns] respective fallback tunnels are
+created only in init net namespace and every other net namespace will
+not have them.
+(c) value = 2; [kcmd value: none] fallback tunnels are not created
+when a module is loaded in any of the net namespace. Setting value to
+"2" is pointless after boot if these modules are built-in, so there is
+a kernel command-line option that can change this default. Please refer to
+Documentation/admin-guide/kernel-parameters.txt for additional details.
+
+Not creating fallback tunnels gives control to userspace to create
+whatever is needed only and avoid creating devices which are redundant.
Default : 0 (for compatibility reasons)
@@ -339,10 +381,42 @@ settings from init_net and for IPv6 we reset all settings to default.
If set to 1, both IPv4 and IPv6 settings are forced to inherit from
current ones in init_net. If set to 2, both IPv4 and IPv6 settings are
-forced to reset to their default values.
+forced to reset to their default values. If set to 3, both IPv4 and IPv6
+settings are forced to inherit from current ones in the netns where this
+new netns has been created.
Default : 0 (for compatibility reasons)
+txrehash
+--------
+
+Controls default hash rethink behaviour on socket when SO_TXREHASH option is set
+to SOCK_TXREHASH_DEFAULT (i. e. not overridden by setsockopt).
+
+If set to 1 (default), hash rethink is performed on listening socket.
+If set to 0, hash rethink is not performed.
+
+gro_normal_batch
+----------------
+
+Maximum number of the segments to batch up on output of GRO. When a packet
+exits GRO, either as a coalesced superframe or as an original packet which
+GRO has decided not to coalesce, it is placed on a per-NAPI list. This
+list is then passed to the stack when the number of segments reaches the
+gro_normal_batch limit.
+
+high_order_alloc_disable
+------------------------
+
+By default the allocator for page frags tries to use high order pages (order-3
+on x86). While the default behavior gives good results in most cases, some users
+might have hit a contention in page allocations/freeing. This was especially
+true on older kernels (< 5.14) when high-order pages were not stored on per-cpu
+lists. This allows to opt-in for order-0 allocation instead but is now mostly of
+historical importance.
+
+Default: 0
+
2. /proc/sys/net/unix - Parameters for Unix domain sockets
----------------------------------------------------------
@@ -353,8 +427,8 @@ socket's buffer. It will not take effect unless PF_UNIX flag is specified.
3. /proc/sys/net/ipv4 - IPV4 settings
-------------------------------------
-Please see: Documentation/networking/ip-sysctl.txt and ipvs-sysctl.txt for
-descriptions of these entries.
+Please see: Documentation/networking/ip-sysctl.rst and
+Documentation/admin-guide/sysctl/net.rst for descriptions of these entries.
4. Appletalk
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 0329a4d3fa9e..c59889de122b 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -25,8 +25,8 @@ files can be found in mm/swap.c.
Currently, these files are in /proc/sys/vm:
- admin_reserve_kbytes
-- block_dump
- compact_memory
+- compaction_proactiveness
- compact_unevictable_allowed
- dirty_background_bytes
- dirty_background_ratio
@@ -37,6 +37,7 @@ Currently, these files are in /proc/sys/vm:
- dirty_writeback_centisecs
- drop_caches
- extfrag_threshold
+- highmem_is_dirtyable
- hugetlb_shm_group
- laptop_mode
- legacy_va_layout
@@ -61,8 +62,9 @@ Currently, these files are in /proc/sys/vm:
- overcommit_memory
- overcommit_ratio
- page-cluster
+- page_lock_unfairness
- panic_on_oom
-- percpu_pagelist_fraction
+- percpu_pagelist_high_fraction
- stat_interval
- stat_refresh
- numa_stat
@@ -104,13 +106,6 @@ On x86_64 this is about 128MB.
Changing this takes effect whenever an application requests memory.
-block_dump
-==========
-
-block_dump enables block I/O debugging when set to a nonzero value. More
-information on block I/O debugging is in Documentation/admin-guide/laptops/laptop-mode.rst.
-
-
compact_memory
==============
@@ -119,6 +114,22 @@ all zones are compacted such that free memory is available in contiguous
blocks where possible. This can be important for example in the allocation of
huge pages although processes will also directly compact memory as required.
+compaction_proactiveness
+========================
+
+This tunable takes a value in the range [0, 100] with a default value of
+20. This tunable determines how aggressively compaction is done in the
+background. Write of a non zero value to this tunable will immediately
+trigger the proactive compaction. Setting it to 0 disables proactive compaction.
+
+Note that compaction has a non-trivial system-wide impact as pages
+belonging to different processes are moved around, which could also lead
+to latency spikes in unsuspecting applications. The kernel employs
+various heuristics to avoid wasting CPU cycles if it detects that
+proactive compaction is not being effective.
+
+Be careful when setting it to extreme values like 100, as that may
+cause excessive background compaction activity.
compact_unevictable_allowed
===========================
@@ -129,7 +140,7 @@ This should be used on systems where stalls for minor page faults are an
acceptable trade for large contiguous free memory. Set to 0 to prevent
compaction from moving pages that are unevictable. Default value is 1.
On CONFIG_PREEMPT_RT the default value is 0 in order to avoid a page fault, due
-to compaction, which would block the task from becomming active until the fault
+to compaction, which would block the task from becoming active until the fault
is resolved.
@@ -345,7 +356,7 @@ The lowmem_reserve_ratio is an array. You can see them by reading this file::
But, these values are not used directly. The kernel calculates # of protection
pages for each zones from them. These are shown as array of protection pages
-in /proc/zoneinfo like followings. (This is an example of x86-64 box).
+in /proc/zoneinfo like the following. (This is an example of x86-64 box).
Each zone has an array of protection pages like this::
Node 0, zone DMA
@@ -411,7 +422,7 @@ While most applications need less than a thousand maps, certain
programs, particularly malloc debuggers, may consume lots of them,
e.g., up to one or two maps per allocation.
-The default value is 65536.
+The default value is 65530.
memory_failure_early_kill:
@@ -422,7 +433,7 @@ a 2bit error in a memory module) is detected in the background by hardware
that cannot be handled by the kernel. In some cases (like the page
still having a valid copy on disk) the kernel will handle the failure
transparently without affecting any applications. But if there is
-no other uptodate copy of the data it will kill to prevent any data
+no other up-to-date copy of the data it will kill to prevent any data
corruptions from propagating.
1: Kill all processes that have the corrupted and not reloadable page mapped
@@ -551,6 +562,43 @@ Change the minimum size of the hugepage pool.
See Documentation/admin-guide/mm/hugetlbpage.rst
+hugetlb_optimize_vmemmap
+========================
+
+This knob is not available when the size of 'struct page' (a structure defined
+in include/linux/mm_types.h) is not power of two (an unusual system config could
+result in this).
+
+Enable (set to 1) or disable (set to 0) HugeTLB Vmemmap Optimization (HVO).
+
+Once enabled, the vmemmap pages of subsequent allocation of HugeTLB pages from
+buddy allocator will be optimized (7 pages per 2MB HugeTLB page and 4095 pages
+per 1GB HugeTLB page), whereas already allocated HugeTLB pages will not be
+optimized. When those optimized HugeTLB pages are freed from the HugeTLB pool
+to the buddy allocator, the vmemmap pages representing that range needs to be
+remapped again and the vmemmap pages discarded earlier need to be rellocated
+again. If your use case is that HugeTLB pages are allocated 'on the fly' (e.g.
+never explicitly allocating HugeTLB pages with 'nr_hugepages' but only set
+'nr_overcommit_hugepages', those overcommitted HugeTLB pages are allocated 'on
+the fly') instead of being pulled from the HugeTLB pool, you should weigh the
+benefits of memory savings against the more overhead (~2x slower than before)
+of allocation or freeing HugeTLB pages between the HugeTLB pool and the buddy
+allocator. Another behavior to note is that if the system is under heavy memory
+pressure, it could prevent the user from freeing HugeTLB pages from the HugeTLB
+pool to the buddy allocator since the allocation of vmemmap pages could be
+failed, you have to retry later if your system encounter this situation.
+
+Once disabled, the vmemmap pages of subsequent allocation of HugeTLB pages from
+buddy allocator will not be optimized meaning the extra overhead at allocation
+time from buddy allocator disappears, whereas already optimized HugeTLB pages
+will not be affected. If you want to make sure there are no optimized HugeTLB
+pages, you can set "nr_hugepages" to 0 first and then disable this. Note that
+writing 0 to nr_hugepages will make any "in use" HugeTLB pages become surplus
+pages. So, those surplus pages are still optimized until they are no longer
+in use. You would need to wait for those surplus pages to be released before
+there are no optimized pages in the system.
+
+
nr_hugepages_mempolicy
======================
@@ -583,7 +631,7 @@ trimming of allocations is initiated.
The default value is 1.
-See Documentation/nommu-mmap.txt for more information.
+See Documentation/admin-guide/mm/nommu-mmap.rst for more information.
numa_zonelist_order
@@ -694,8 +742,8 @@ overcommit_memory
This value contains a flag that enables memory overcommitment.
-When this flag is 0, the kernel attempts to estimate the amount
-of free memory left when userspace requests more memory.
+When this flag is 0, the kernel compares the userspace memory request
+size against total memory plus swap and rejects obvious overcommits.
When this flag is 1, the kernel pretends there is always enough
memory until it actually runs out.
@@ -710,7 +758,7 @@ and don't use much of it.
The default value is 0.
-See Documentation/vm/overcommit-accounting.rst and
+See Documentation/mm/overcommit-accounting.rst and
mm/util.c::__vm_enough_memory() for more information.
@@ -744,6 +792,14 @@ extra faults and I/O delays for following faults if they would have been part of
that consecutive pages readahead would have brought in.
+page_lock_unfairness
+====================
+
+This value determines the number of times that the page lock can be
+stolen from under a waiter. After the lock is stolen the number of times
+specified in this file (default is 5), the "fair lock handoff" semantics
+will apply, and the waiter will only be awakened if the lock can be taken.
+
panic_on_oom
============
@@ -773,22 +829,24 @@ panic_on_oom=2+kdump gives you very strong tool to investigate
why oom happens. You can get snapshot.
-percpu_pagelist_fraction
-========================
+percpu_pagelist_high_fraction
+=============================
-This is the fraction of pages at most (high mark pcp->high) in each zone that
-are allocated for each per cpu page list. The min value for this is 8. It
-means that we don't allow more than 1/8th of pages in each zone to be
-allocated in any single per_cpu_pagelist. This entry only changes the value
-of hot per cpu pagelists. User can specify a number like 100 to allocate
-1/100th of each zone to each per cpu page list.
+This is the fraction of pages in each zone that are can be stored to
+per-cpu page lists. It is an upper boundary that is divided depending
+on the number of online CPUs. The min value for this is 8 which means
+that we do not allow more than 1/8th of pages in each zone to be stored
+on per-cpu page lists. This entry only changes the value of hot per-cpu
+page lists. A user can specify a number like 100 to allocate 1/100th of
+each zone between per-cpu lists.
-The batch value of each per cpu pagelist is also updated as a result. It is
-set to pcp->high/4. The upper limit of batch is (PAGE_SHIFT * 8)
+The batch value of each per-cpu page list remains the same regardless of
+the value of the high fraction so allocation latencies are unaffected.
-The initial value is zero. Kernel does not use this value at boot time to set
-the high water marks for each per cpu page list. If the user writes '0' to this
-sysctl, it will revert to this default behavior.
+The initial value is zero. Kernel uses this value to set the high pcp->high
+mark based on the low watermark for the zone and the number of local
+online CPUs. If the user writes '0' to this sysctl, it will revert to
+this default behavior.
stat_interval
@@ -831,25 +889,46 @@ tooling to work, you can do::
swappiness
==========
-This control is used to define how aggressive the kernel will swap
-memory pages. Higher values will increase aggressiveness, lower values
-decrease the amount of swap. A value of 0 instructs the kernel not to
-initiate swap until the amount of free and file-backed pages is less
-than the high water mark in a zone.
+This control is used to define the rough relative IO cost of swapping
+and filesystem paging, as a value between 0 and 200. At 100, the VM
+assumes equal IO cost and will thus apply memory pressure to the page
+cache and swap-backed pages equally; lower values signify more
+expensive swap IO, higher values indicates cheaper.
+
+Keep in mind that filesystem IO patterns under memory pressure tend to
+be more efficient than swap's random IO. An optimal value will require
+experimentation and will also be workload-dependent.
The default value is 60.
+For in-memory swap, like zram or zswap, as well as hybrid setups that
+have swap on faster devices than the filesystem, values beyond 100 can
+be considered. For example, if the random IO against the swap device
+is on average 2x faster than IO from the filesystem, swappiness should
+be 133 (x + 2x = 200, 2x = 133.33).
+
+At 0, the kernel will not initiate swap until the amount of free and
+file-backed pages is less than the high watermark in a zone.
+
unprivileged_userfaultfd
========================
-This flag controls whether unprivileged users can use the userfaultfd
-system calls. Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with SYS_CAP_PTRACE capability).
+This flag controls the mode in which unprivileged users can use the
+userfaultfd system calls. Set this to 0 to restrict unprivileged users
+to handle page faults in user mode only. In this case, users without
+SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
+succeed. Prohibiting use of userfaultfd for handling faults from kernel
+mode may make certain vulnerabilities more difficult to exploit.
-The default value is 1.
+Set this to 1 to allow unprivileged users to use the userfaultfd system
+calls without any restrictions.
+
+The default value is 0.
+Another way to control permissions for userfaultfd is to use
+/dev/userfaultfd instead of userfaultfd(2). See
+Documentation/admin-guide/mm/userfaultfd.rst.
user_reserve_kbytes
===================
@@ -901,12 +980,12 @@ allocations, THP and hugetlbfs pages.
To make it sensible with respect to the watermark_scale_factor
parameter, the unit is in fractions of 10,000. The default value of
-15,000 on !DISCONTIGMEM configurations means that up to 150% of the high
-watermark will be reclaimed in the event of a pageblock being mixed due
-to fragmentation. The level of reclaim is determined by the number of
-fragmentation events that occurred in the recent past. If this value is
-smaller than a pageblock then a pageblocks worth of pages will be reclaimed
-(e.g. 2MB on 64-bit x86). A boost factor of 0 will disable the feature.
+15,000 means that up to 150% of the high watermark will be reclaimed in the
+event of a pageblock being mixed due to fragmentation. The level of reclaim
+is determined by the number of fragmentation events that occurred in the
+recent past. If this value is smaller than a pageblock then a pageblocks
+worth of pages will be reclaimed (e.g. 2MB on 64-bit x86). A boost factor
+of 0 will disable the feature.
watermark_scale_factor
@@ -918,7 +997,7 @@ how much memory needs to be free before kswapd goes back to sleep.
The unit is in fractions of 10,000. The default value of 10 means the
distances between watermarks are 0.1% of the available memory in the
-node/system. The maximum value is 1000, or 10% of memory.
+node/system. The maximum value is 3000, or 30% of memory.
A high rate of threads entering direct reclaim (allocstall) or kswapd
going to sleep prematurely (kswapd_low_wmark_hit_quickly) can indicate
@@ -948,11 +1027,11 @@ that benefit from having their data cached, zone_reclaim_mode should be
left disabled as the caching effect is likely to be more important than
data locality.
-zone_reclaim may be enabled if it's known that the workload is partitioned
-such that each partition fits within a NUMA node and that accessing remote
-memory would cause a measurable performance reduction. The page allocator
-will then reclaim easily reusable pages (those page cache pages that are
-currently not used) before allocating off node pages.
+Consider enabling one or more zone_reclaim mode bits if it's known that the
+workload is partitioned such that each partition fits within a NUMA node
+and that accessing remote memory would cause a measurable performance
+reduction. The page allocator will take additional actions before
+allocating off node pages.
Allowing zone reclaim to write out pages stops processes that are
writing large amounts of data from dirtying pages on other nodes. Zone
diff --git a/Documentation/admin-guide/sysrq.rst b/Documentation/admin-guide/sysrq.rst
index a46209f4636c..2f2e5bd440f9 100644
--- a/Documentation/admin-guide/sysrq.rst
+++ b/Documentation/admin-guide/sysrq.rst
@@ -72,13 +72,24 @@ On PowerPC
On other
If you know of the key combos for other architectures, please
- let me know so I can add them to this section.
+ submit a patch to be included in this section.
On all
- Write a character to /proc/sysrq-trigger. e.g.::
+ Write a single character to /proc/sysrq-trigger.
+ Only the first character is processed, the rest of the string is
+ ignored. However, it is not recommended to write any extra characters
+ as the behavior is undefined and might change in the future versions.
+ E.g.::
echo t > /proc/sysrq-trigger
+ Alternatively, write multiple characters prepended by underscore.
+ This way, all characters will be processed. E.g.::
+
+ echo _reisub > /proc/sysrq-trigger
+
+The :kbd:`<command key>` is case sensitive.
+
What are the 'command' keys?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -88,8 +99,8 @@ Command Function
``b`` Will immediately reboot the system without syncing or unmounting
your disks.
-``c`` Will perform a system crash by a NULL pointer dereference.
- A crashdump will be taken if configured.
+``c`` Will perform a system crash and a crashdump will be taken
+ if configured.
``d`` Shows all locks that are held.
@@ -136,7 +147,7 @@ Command Function
``v`` Forcefully restores framebuffer console
``v`` Causes ETM buffer dump [ARM-specific]
-``w`` Dumps tasks that are in uninterruptable (blocked) state.
+``w`` Dumps tasks that are in uninterruptible (blocked) state.
``x`` Used by xmon interface on ppc/powerpc platforms.
Show global PMU Registers on sparc64.
@@ -203,10 +214,12 @@ frozen (probably root) filesystem via the FIFREEZE ioctl.
Sometimes SysRq seems to get 'stuck' after using it, what can I do?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-That happens to me, also. I've found that tapping shift, alt, and control
-on both sides of the keyboard, and hitting an invalid sysrq sequence again
-will fix the problem. (i.e., something like :kbd:`alt-sysrq-z`). Switching to
-another virtual console (:kbd:`ALT+Fn`) and then back again should also help.
+When this happens, try tapping shift, alt and control on both sides of the
+keyboard, and hitting an invalid sysrq sequence again. (i.e., something like
+:kbd:`alt-sysrq-z`).
+
+Switching to another virtual console (:kbd:`ALT+Fn`) and then back again
+should also help.
I hit SysRq, but nothing seems to happen, what's wrong?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -231,13 +244,13 @@ prints help, and C) an action_msg string, that will print right before your
handler is called. Your handler must conform to the prototype in 'sysrq.h'.
After the ``sysrq_key_op`` is created, you can call the kernel function
-``register_sysrq_key(int key, struct sysrq_key_op *op_p);`` this will
+``register_sysrq_key(int key, const struct sysrq_key_op *op_p);`` this will
register the operation pointed to by ``op_p`` at table key 'key',
if that slot in the table is blank. At module unload time, you must call
-the function ``unregister_sysrq_key(int key, struct sysrq_key_op *op_p)``, which
-will remove the key op pointed to by 'op_p' from the key 'key', if and only if
-it is currently registered in that slot. This is in case the slot has been
-overwritten since you registered it.
+the function ``unregister_sysrq_key(int key, const struct sysrq_key_op *op_p)``,
+which will remove the key op pointed to by 'op_p' from the key 'key', if and
+only if it is currently registered in that slot. This is in case the slot has
+been overwritten since you registered it.
The Magic SysRQ system works by registering key operations against a key op
lookup table, which is defined in 'drivers/tty/sysrq.c'. This key table has
diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
index 71e9184a9079..92a8a07f5c43 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -38,7 +38,7 @@ either letters or blanks. In above example it looks like this::
Tainted: P W O
-The meaning of those characters is explained in the table below. In tis case
+The meaning of those characters is explained in the table below. In this case
the kernel got tainted earlier because a proprietary Module (``P``) was loaded,
a warning occurred (``W``), and an externally-built module was loaded (``O``).
To decode other letters use the table below.
@@ -61,7 +61,7 @@ this on the machine that had the statements in the logs that were quoted earlier
* Proprietary module was loaded (#0)
* Kernel issued warning (#9)
* Externally-built ('out-of-tree') module was loaded (#12)
- See Documentation/admin-guide/tainted-kernels.rst in the the Linux kernel or
+ See Documentation/admin-guide/tainted-kernels.rst in the Linux kernel or
https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html for
a more details explanation of the various taint flags.
Raw taint value as int/string: 4609/'P W O '
@@ -84,7 +84,7 @@ Bit Log Number Reason that got the kernel tainted
=== === ====== ========================================================
0 G/P 1 proprietary module was loaded
1 _/F 2 module was force loaded
- 2 _/S 4 SMP kernel oops on an officially SMP incapable processor
+ 2 _/S 4 kernel running on an out of specification system
3 _/R 8 module was force unloaded
4 _/M 16 processor reported a Machine Check Exception (MCE)
5 _/B 32 bad page referenced or some unexpected page flags
@@ -100,6 +100,7 @@ Bit Log Number Reason that got the kernel tainted
15 _/K 32768 kernel has been live patched
16 _/X 65536 auxiliary taint, defined for and used by distros
17 _/T 131072 kernel was built with the struct randomization plugin
+ 18 _/N 262144 an in-kernel test has been run
=== === ====== ========================================================
Note: The character ``_`` is representing a blank in this table to make reading
@@ -116,10 +117,29 @@ More detailed explanation for tainting
1) ``F`` if any module was force loaded by ``insmod -f``, ``' '`` if all
modules were loaded normally.
- 2) ``S`` if the oops occurred on an SMP kernel running on hardware that
- hasn't been certified as safe to run multiprocessor.
- Currently this occurs only on various Athlons that are not
- SMP capable.
+ 2) ``S`` if the kernel is running on a processor or system that is out of
+ specification: hardware has been put into an unsupported configuration,
+ therefore proper execution cannot be guaranteed.
+ Kernel will be tainted if, for example:
+
+ - on x86: PAE is forced through forcepae on intel CPUs (such as Pentium M)
+ which do not report PAE but may have a functional implementation, an SMP
+ kernel is running on non officially capable SMP Athlon CPUs, MSRs are
+ being poked at from userspace.
+ - on arm: kernel running on certain CPUs (such as Keystone 2) without
+ having certain kernel features enabled.
+ - on arm64: there are mismatched hardware features between CPUs, the
+ bootloader has booted CPUs in different modes.
+ - certain drivers are being used on non supported architectures (such as
+ scsi/snic on something else than x86_64, scsi/ips on non
+ x86/x86_64/itanium, have broken firmware settings for the
+ irqchip/irq-gic on arm64 ...).
+ - x86/x86_64: Microcode late loading is dangerous and will result in
+ tainting the kernel. It requires that all CPUs rendezvous to make sure
+ the update happens when the system is as quiescent as possible. However,
+ a higher priority MCE/SMI/NMI can move control flow away from that
+ rendezvous and interrupt the update, which can be detrimental to the
+ machine.
3) ``R`` if a module was force unloaded by ``rmmod -f``, ``' '`` if all
modules were unloaded normally.
@@ -130,7 +150,7 @@ More detailed explanation for tainting
5) ``B`` If a page-release function has found a bad page reference or some
unexpected page flags. This indicates a hardware problem or a kernel bug;
there should be other information in the log indicating why this tainting
- occured.
+ occurred.
6) ``U`` if a user or user application specifically requested that the
Tainted flag be set, ``' '`` otherwise.
diff --git a/Documentation/admin-guide/thermal/index.rst b/Documentation/admin-guide/thermal/index.rst
new file mode 100644
index 000000000000..193b7b01a87d
--- /dev/null
+++ b/Documentation/admin-guide/thermal/index.rst
@@ -0,0 +1,8 @@
+=================
+Thermal Subsystem
+=================
+
+.. toctree::
+ :maxdepth: 1
+
+ intel_powerclamp
diff --git a/Documentation/admin-guide/thermal/intel_powerclamp.rst b/Documentation/admin-guide/thermal/intel_powerclamp.rst
new file mode 100644
index 000000000000..08509b978af4
--- /dev/null
+++ b/Documentation/admin-guide/thermal/intel_powerclamp.rst
@@ -0,0 +1,345 @@
+=======================
+Intel Powerclamp Driver
+=======================
+
+By:
+ - Arjan van de Ven <arjan@linux.intel.com>
+ - Jacob Pan <jacob.jun.pan@linux.intel.com>
+
+.. Contents:
+
+ (*) Introduction
+ - Goals and Objectives
+
+ (*) Theory of Operation
+ - Idle Injection
+ - Calibration
+
+ (*) Performance Analysis
+ - Effectiveness and Limitations
+ - Power vs Performance
+ - Scalability
+ - Calibration
+ - Comparison with Alternative Techniques
+
+ (*) Usage and Interfaces
+ - Generic Thermal Layer (sysfs)
+ - Kernel APIs (TBD)
+
+ (*) Module Parameters
+
+INTRODUCTION
+============
+
+Consider the situation where a system’s power consumption must be
+reduced at runtime, due to power budget, thermal constraint, or noise
+level, and where active cooling is not preferred. Software managed
+passive power reduction must be performed to prevent the hardware
+actions that are designed for catastrophic scenarios.
+
+Currently, P-states, T-states (clock modulation), and CPU offlining
+are used for CPU throttling.
+
+On Intel CPUs, C-states provide effective power reduction, but so far
+they’re only used opportunistically, based on workload. With the
+development of intel_powerclamp driver, the method of synchronizing
+idle injection across all online CPU threads was introduced. The goal
+is to achieve forced and controllable C-state residency.
+
+Test/Analysis has been made in the areas of power, performance,
+scalability, and user experience. In many cases, clear advantage is
+shown over taking the CPU offline or modulating the CPU clock.
+
+
+THEORY OF OPERATION
+===================
+
+Idle Injection
+--------------
+
+On modern Intel processors (Nehalem or later), package level C-state
+residency is available in MSRs, thus also available to the kernel.
+
+These MSRs are::
+
+ #define MSR_PKG_C2_RESIDENCY 0x60D
+ #define MSR_PKG_C3_RESIDENCY 0x3F8
+ #define MSR_PKG_C6_RESIDENCY 0x3F9
+ #define MSR_PKG_C7_RESIDENCY 0x3FA
+
+If the kernel can also inject idle time to the system, then a
+closed-loop control system can be established that manages package
+level C-state. The intel_powerclamp driver is conceived as such a
+control system, where the target set point is a user-selected idle
+ratio (based on power reduction), and the error is the difference
+between the actual package level C-state residency ratio and the target idle
+ratio.
+
+Injection is controlled by high priority kernel threads, spawned for
+each online CPU.
+
+These kernel threads, with SCHED_FIFO class, are created to perform
+clamping actions of controlled duty ratio and duration. Each per-CPU
+thread synchronizes its idle time and duration, based on the rounding
+of jiffies, so accumulated errors can be prevented to avoid a jittery
+effect. Threads are also bound to the CPU such that they cannot be
+migrated, unless the CPU is taken offline. In this case, threads
+belong to the offlined CPUs will be terminated immediately.
+
+Running as SCHED_FIFO and relatively high priority, also allows such
+scheme to work for both preemptible and non-preemptible kernels.
+Alignment of idle time around jiffies ensures scalability for HZ
+values. This effect can be better visualized using a Perf timechart.
+The following diagram shows the behavior of kernel thread
+kidle_inject/cpu. During idle injection, it runs monitor/mwait idle
+for a given "duration", then relinquishes the CPU to other tasks,
+until the next time interval.
+
+The NOHZ schedule tick is disabled during idle time, but interrupts
+are not masked. Tests show that the extra wakeups from scheduler tick
+have a dramatic impact on the effectiveness of the powerclamp driver
+on large scale systems (Westmere system with 80 processors).
+
+::
+
+ CPU0
+ ____________ ____________
+ kidle_inject/0 | sleep | mwait | sleep |
+ _________| |________| |_______
+ duration
+ CPU1
+ ____________ ____________
+ kidle_inject/1 | sleep | mwait | sleep |
+ _________| |________| |_______
+ ^
+ |
+ |
+ roundup(jiffies, interval)
+
+Only one CPU is allowed to collect statistics and update global
+control parameters. This CPU is referred to as the controlling CPU in
+this document. The controlling CPU is elected at runtime, with a
+policy that favors BSP, taking into account the possibility of a CPU
+hot-plug.
+
+In terms of dynamics of the idle control system, package level idle
+time is considered largely as a non-causal system where its behavior
+cannot be based on the past or current input. Therefore, the
+intel_powerclamp driver attempts to enforce the desired idle time
+instantly as given input (target idle ratio). After injection,
+powerclamp monitors the actual idle for a given time window and adjust
+the next injection accordingly to avoid over/under correction.
+
+When used in a causal control system, such as a temperature control,
+it is up to the user of this driver to implement algorithms where
+past samples and outputs are included in the feedback. For example, a
+PID-based thermal controller can use the powerclamp driver to
+maintain a desired target temperature, based on integral and
+derivative gains of the past samples.
+
+
+
+Calibration
+-----------
+During scalability testing, it is observed that synchronized actions
+among CPUs become challenging as the number of cores grows. This is
+also true for the ability of a system to enter package level C-states.
+
+To make sure the intel_powerclamp driver scales well, online
+calibration is implemented. The goals for doing such a calibration
+are:
+
+a) determine the effective range of idle injection ratio
+b) determine the amount of compensation needed at each target ratio
+
+Compensation to each target ratio consists of two parts:
+
+ a) steady state error compensation
+
+ This is to offset the error occurring when the system can
+ enter idle without extra wakeups (such as external interrupts).
+
+ b) dynamic error compensation
+
+ When an excessive amount of wakeups occurs during idle, an
+ additional idle ratio can be added to quiet interrupts, by
+ slowing down CPU activities.
+
+A debugfs file is provided for the user to examine compensation
+progress and results, such as on a Westmere system::
+
+ [jacob@nex01 ~]$ cat
+ /sys/kernel/debug/intel_powerclamp/powerclamp_calib
+ controlling cpu: 0
+ pct confidence steady dynamic (compensation)
+ 0 0 0 0
+ 1 1 0 0
+ 2 1 1 0
+ 3 3 1 0
+ 4 3 1 0
+ 5 3 1 0
+ 6 3 1 0
+ 7 3 1 0
+ 8 3 1 0
+ ...
+ 30 3 2 0
+ 31 3 2 0
+ 32 3 1 0
+ 33 3 2 0
+ 34 3 1 0
+ 35 3 2 0
+ 36 3 1 0
+ 37 3 2 0
+ 38 3 1 0
+ 39 3 2 0
+ 40 3 3 0
+ 41 3 1 0
+ 42 3 2 0
+ 43 3 1 0
+ 44 3 1 0
+ 45 3 2 0
+ 46 3 3 0
+ 47 3 0 0
+ 48 3 2 0
+ 49 3 3 0
+
+Calibration occurs during runtime. No offline method is available.
+Steady state compensation is used only when confidence levels of all
+adjacent ratios have reached satisfactory level. A confidence level
+is accumulated based on clean data collected at runtime. Data
+collected during a period without extra interrupts is considered
+clean.
+
+To compensate for excessive amounts of wakeup during idle, additional
+idle time is injected when such a condition is detected. Currently,
+we have a simple algorithm to double the injection ratio. A possible
+enhancement might be to throttle the offending IRQ, such as delaying
+EOI for level triggered interrupts. But it is a challenge to be
+non-intrusive to the scheduler or the IRQ core code.
+
+
+CPU Online/Offline
+------------------
+Per-CPU kernel threads are started/stopped upon receiving
+notifications of CPU hotplug activities. The intel_powerclamp driver
+keeps track of clamping kernel threads, even after they are migrated
+to other CPUs, after a CPU offline event.
+
+
+Performance Analysis
+====================
+This section describes the general performance data collected on
+multiple systems, including Westmere (80P) and Ivy Bridge (4P, 8P).
+
+Effectiveness and Limitations
+-----------------------------
+The maximum range that idle injection is allowed is capped at 50
+percent. As mentioned earlier, since interrupts are allowed during
+forced idle time, excessive interrupts could result in less
+effectiveness. The extreme case would be doing a ping -f to generated
+flooded network interrupts without much CPU acknowledgement. In this
+case, little can be done from the idle injection threads. In most
+normal cases, such as scp a large file, applications can be throttled
+by the powerclamp driver, since slowing down the CPU also slows down
+network protocol processing, which in turn reduces interrupts.
+
+When control parameters change at runtime by the controlling CPU, it
+may take an additional period for the rest of the CPUs to catch up
+with the changes. During this time, idle injection is out of sync,
+thus not able to enter package C- states at the expected ratio. But
+this effect is minor, in that in most cases change to the target
+ratio is updated much less frequently than the idle injection
+frequency.
+
+Scalability
+-----------
+Tests also show a minor, but measurable, difference between the 4P/8P
+Ivy Bridge system and the 80P Westmere server under 50% idle ratio.
+More compensation is needed on Westmere for the same amount of
+target idle ratio. The compensation also increases as the idle ratio
+gets larger. The above reason constitutes the need for the
+calibration code.
+
+On the IVB 8P system, compared to an offline CPU, powerclamp can
+achieve up to 40% better performance per watt. (measured by a spin
+counter summed over per CPU counting threads spawned for all running
+CPUs).
+
+Usage and Interfaces
+====================
+The powerclamp driver is registered to the generic thermal layer as a
+cooling device. Currently, it’s not bound to any thermal zones::
+
+ jacob@chromoly:/sys/class/thermal/cooling_device14$ grep . *
+ cur_state:0
+ max_state:50
+ type:intel_powerclamp
+
+cur_state allows user to set the desired idle percentage. Writing 0 to
+cur_state will stop idle injection. Writing a value between 1 and
+max_state will start the idle injection. Reading cur_state returns the
+actual and current idle percentage. This may not be the same value
+set by the user in that current idle percentage depends on workload
+and includes natural idle. When idle injection is disabled, reading
+cur_state returns value -1 instead of 0 which is to avoid confusing
+100% busy state with the disabled state.
+
+Example usage:
+
+- To inject 25% idle time::
+
+ $ sudo sh -c "echo 25 > /sys/class/thermal/cooling_device80/cur_state
+
+If the system is not busy and has more than 25% idle time already,
+then the powerclamp driver will not start idle injection. Using Top
+will not show idle injection kernel threads.
+
+If the system is busy (spin test below) and has less than 25% natural
+idle time, powerclamp kernel threads will do idle injection. Forced
+idle time is accounted as normal idle in that common code path is
+taken as the idle task.
+
+In this example, 24.1% idle is shown. This helps the system admin or
+user determine the cause of slowdown, when a powerclamp driver is in action::
+
+
+ Tasks: 197 total, 1 running, 196 sleeping, 0 stopped, 0 zombie
+ Cpu(s): 71.2%us, 4.7%sy, 0.0%ni, 24.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
+ Mem: 3943228k total, 1689632k used, 2253596k free, 74960k buffers
+ Swap: 4087804k total, 0k used, 4087804k free, 945336k cached
+
+ PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
+ 3352 jacob 20 0 262m 644 428 S 286 0.0 0:17.16 spin
+ 3341 root -51 0 0 0 0 D 25 0.0 0:01.62 kidle_inject/0
+ 3344 root -51 0 0 0 0 D 25 0.0 0:01.60 kidle_inject/3
+ 3342 root -51 0 0 0 0 D 25 0.0 0:01.61 kidle_inject/1
+ 3343 root -51 0 0 0 0 D 25 0.0 0:01.60 kidle_inject/2
+ 2935 jacob 20 0 696m 125m 35m S 5 3.3 0:31.11 firefox
+ 1546 root 20 0 158m 20m 6640 S 3 0.5 0:26.97 Xorg
+ 2100 jacob 20 0 1223m 88m 30m S 3 2.3 0:23.68 compiz
+
+Tests have shown that by using the powerclamp driver as a cooling
+device, a PID based userspace thermal controller can manage to
+control CPU temperature effectively, when no other thermal influence
+is added. For example, a UltraBook user can compile the kernel under
+certain temperature (below most active trip points).
+
+Module Parameters
+=================
+
+``cpumask`` (RW)
+ A bit mask of CPUs to inject idle. The format of the bitmask is same as
+ used in other subsystems like in /proc/irq/\*/smp_affinity. The mask is
+ comma separated 32 bit groups. Each CPU is one bit. For example for a 256
+ CPU system the full mask is:
+ ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
+
+ The rightmost mask is for CPU 0-32.
+
+``max_idle`` (RW)
+ Maximum injected idle time to the total CPU time ratio in percent range
+ from 1 to 100. Even if the cooling device max_state is always 100 (100%),
+ this parameter allows to add a max idle percent limit. The default is 50,
+ to match the current implementation of powerclamp driver. Also doesn't
+ allow value more than 75, if the cpumask includes every CPU present in
+ the system.
diff --git a/Documentation/admin-guide/thunderbolt.rst b/Documentation/admin-guide/thunderbolt.rst
index 10c4f0ce2ad0..2ed79f41a411 100644
--- a/Documentation/admin-guide/thunderbolt.rst
+++ b/Documentation/admin-guide/thunderbolt.rst
@@ -47,6 +47,9 @@ be DMA masters and thus read contents of the host memory without CPU and OS
knowing about it. There are ways to prevent this by setting up an IOMMU but
it is not always available for various reasons.
+Some USB4 systems have a BIOS setting to disable PCIe tunneling. This is
+treated as another security level (nopcie).
+
The security levels are as follows:
none
@@ -77,6 +80,10 @@ The security levels are as follows:
Display Port in a dock. All PCIe links downstream of the dock are
removed.
+ nopcie
+ PCIe tunneling is disabled/forbidden from the BIOS. Available in some
+ USB4 systems.
+
The current security level can be read from
``/sys/bus/thunderbolt/devices/domainX/security`` where ``domainX`` is
the Thunderbolt domain the host controller manages. There is typically
@@ -153,6 +160,22 @@ If the user still wants to connect the device they can either approve
the device without a key or write a new key and write 1 to the
``authorized`` file to get the new key stored on the device NVM.
+De-authorizing devices
+----------------------
+It is possible to de-authorize devices by writing ``0`` to their
+``authorized`` attribute. This requires support from the connection
+manager implementation and can be checked by reading domain
+``deauthorization`` attribute. If it reads ``1`` then the feature is
+supported.
+
+When a device is de-authorized the PCIe tunnel from the parent device
+PCIe downstream (or root) port to the device PCIe upstream port is torn
+down. This is essentially the same thing as PCIe hot-remove and the PCIe
+toplogy in question will not be accessible anymore until the device is
+authorized again. If there is storage such as NVMe or similar involved,
+there is a risk for data loss if the filesystem on that storage is not
+properly shut down. You have been warned!
+
DMA protection utilizing IOMMU
------------------------------
Recent systems from 2018 and forward with Thunderbolt ports may natively
@@ -173,8 +196,8 @@ following ``udev`` rule::
ACTION=="add", SUBSYSTEM=="thunderbolt", ATTRS{iommu_dma_protection}=="1", ATTR{authorized}=="0", ATTR{authorized}="1"
-Upgrading NVM on Thunderbolt device or host
--------------------------------------------
+Upgrading NVM on Thunderbolt device, host or retimer
+----------------------------------------------------
Since most of the functionality is handled in firmware running on a
host controller or a device, it is important that the firmware can be
upgraded to the latest where possible bugs in it have been fixed.
@@ -185,9 +208,10 @@ for some machines:
`Thunderbolt Updates <https://thunderbolttechnology.net/updates>`_
-Before you upgrade firmware on a device or host, please make sure it is a
-suitable upgrade. Failing to do that may render the device (or host) in a
-state where it cannot be used properly anymore without special tools!
+Before you upgrade firmware on a device, host or retimer, please make
+sure it is a suitable upgrade. Failing to do that may render the device
+in a state where it cannot be used properly anymore without special
+tools!
Host NVM upgrade on Apple Macs is not supported.
@@ -232,6 +256,35 @@ Note names of the NVMem devices ``nvm_activeN`` and ``nvm_non_activeN``
depend on the order they are registered in the NVMem subsystem. N in
the name is the identifier added by the NVMem subsystem.
+Upgrading on-board retimer NVM when there is no cable connected
+---------------------------------------------------------------
+If the platform supports, it may be possible to upgrade the retimer NVM
+firmware even when there is nothing connected to the USB4
+ports. When this is the case the ``usb4_portX`` devices have two special
+attributes: ``offline`` and ``rescan``. The way to upgrade the firmware
+is to first put the USB4 port into offline mode::
+
+ # echo 1 > /sys/bus/thunderbolt/devices/0-0/usb4_port1/offline
+
+This step makes sure the port does not respond to any hotplug events,
+and also ensures the retimers are powered on. The next step is to scan
+for the retimers::
+
+ # echo 1 > /sys/bus/thunderbolt/devices/0-0/usb4_port1/rescan
+
+This enumerates and adds the on-board retimers. Now retimer NVM can be
+upgraded in the same way than with cable connected (see previous
+section). However, the retimer is not disconnected as we are offline
+mode) so after writing ``1`` to ``nvm_authenticate`` one should wait for
+5 or more seconds before running rescan again::
+
+ # echo 1 > /sys/bus/thunderbolt/devices/0-0/usb4_port1/rescan
+
+This point if everything went fine, the port can be put back to
+functional state again::
+
+ # echo 0 > /sys/bus/thunderbolt/devices/0-0/usb4_port1/offline
+
Upgrading NVM when host controller is in safe mode
--------------------------------------------------
If the existing NVM is not properly authenticated (or is missing) the
diff --git a/Documentation/admin-guide/unicode.rst b/Documentation/admin-guide/unicode.rst
index 7425a3351321..cba7e5017d36 100644
--- a/Documentation/admin-guide/unicode.rst
+++ b/Documentation/admin-guide/unicode.rst
@@ -3,11 +3,10 @@ Unicode support
Last update: 2005-01-17, version 1.4
-This file is maintained by H. Peter Anvin <unicode@lanana.org> as part
-of the Linux Assigned Names And Numbers Authority (LANANA) project.
-The current version can be found at:
-
- http://www.lanana.org/docs/unicode/admin-guide/unicode.rst
+Note: The original version of this document, which was maintained at
+lanana.org as part of the Linux Assigned Names And Numbers Authority
+(LANANA) project, is no longer existent. So, this version in the
+mainline Linux kernel is now the maintained main document.
Introduction
------------
@@ -114,7 +113,7 @@ Unicode practice.
This range is now officially managed by the ConScript Unicode
Registry. The normative reference is at:
- http://www.evertype.com/standards/csur/klingon.html
+ https://www.evertype.com/standards/csur/klingon.html
Klingon has an alphabet of 26 characters, a positional numeric writing
system with 10 digits, and is written left-to-right, top-to-bottom.
@@ -178,7 +177,7 @@ fictional and artificial scripts has been established by John Cowan
<jcowan@reutershealth.com> and Michael Everson <everson@evertype.com>.
The ConScript Unicode Registry is accessible at:
- http://www.evertype.com/standards/csur/
+ https://www.evertype.com/standards/csur/
The ranges used fall at the low end of the End User Zone and can hence
not be normatively assigned, but it is recommended that people who
diff --git a/Documentation/admin-guide/wimax/i2400m.rst b/Documentation/admin-guide/wimax/i2400m.rst
deleted file mode 100644
index 194388c0c351..000000000000
--- a/Documentation/admin-guide/wimax/i2400m.rst
+++ /dev/null
@@ -1,283 +0,0 @@
-.. include:: <isonum.txt>
-
-====================================================
-Driver for the Intel Wireless Wimax Connection 2400m
-====================================================
-
-:Copyright: |copy| 2008 Intel Corporation < linux-wimax@intel.com >
-
- This provides a driver for the Intel Wireless WiMAX Connection 2400m
- and a basic Linux kernel WiMAX stack.
-
-1. Requirements
-===============
-
- * Linux installation with Linux kernel 2.6.22 or newer (if building
- from a separate tree)
- * Intel i2400m Echo Peak or Baxter Peak; this includes the Intel
- Wireless WiMAX/WiFi Link 5x50 series.
- * build tools:
-
- + Linux kernel development package for the target kernel; to
- build against your currently running kernel, you need to have
- the kernel development package corresponding to the running
- image installed (usually if your kernel is named
- linux-VERSION, the development package is called
- linux-dev-VERSION or linux-headers-VERSION).
- + GNU C Compiler, make
-
-2. Compilation and installation
-===============================
-
-2.1. Compilation of the drivers included in the kernel
-------------------------------------------------------
-
- Configure the kernel; to enable the WiMAX drivers select Drivers >
- Networking Drivers > WiMAX device support. Enable all of them as
- modules (easier).
-
- If USB or SDIO are not enabled in the kernel configuration, the options
- to build the i2400m USB or SDIO drivers will not show. Enable said
- subsystems and go back to the WiMAX menu to enable the drivers.
-
- Compile and install your kernel as usual.
-
-2.2. Compilation of the drivers distributed as an standalone module
--------------------------------------------------------------------
-
- To compile::
-
- $ cd source/directory
- $ make
-
- Once built you can load and unload using the provided load.sh script;
- load.sh will load the modules, load.sh u will unload them.
-
- To install in the default kernel directories (and enable auto loading
- when the device is plugged)::
-
- $ make install
- $ depmod -a
-
- If your kernel development files are located in a non standard
- directory or if you want to build for a kernel that is not the
- currently running one, set KDIR to the right location::
-
- $ make KDIR=/path/to/kernel/dev/tree
-
- For more information, please contact linux-wimax@intel.com.
-
-3. Installing the firmware
---------------------------
-
- The firmware can be obtained from http://linuxwimax.org or might have
- been supplied with your hardware.
-
- It has to be installed in the target system::
-
- $ cp FIRMWAREFILE.sbcf /lib/firmware/i2400m-fw-BUSTYPE-1.3.sbcf
-
- * NOTE: if your firmware came in an .rpm or .deb file, just install
- it as normal, with the rpm (rpm -i FIRMWARE.rpm) or dpkg
- (dpkg -i FIRMWARE.deb) commands. No further action is needed.
- * BUSTYPE will be usb or sdio, depending on the hardware you have.
- Each hardware type comes with its own firmware and will not work
- with other types.
-
-4. Design
-=========
-
- This package contains two major parts: a WiMAX kernel stack and a
- driver for the Intel i2400m.
-
- The WiMAX stack is designed to provide for common WiMAX control
- services to current and future WiMAX devices from any vendor; please
- see README.wimax for details.
-
- The i2400m kernel driver is broken up in two main parts: the bus
- generic driver and the bus-specific drivers. The bus generic driver
- forms the drivercore and contain no knowledge of the actual method we
- use to connect to the device. The bus specific drivers are just the
- glue to connect the bus-generic driver and the device. Currently only
- USB and SDIO are supported. See drivers/net/wimax/i2400m/i2400m.h for
- more information.
-
- The bus generic driver is logically broken up in two parts: OS-glue and
- hardware-glue. The OS-glue interfaces with Linux. The hardware-glue
- interfaces with the device on using an interface provided by the
- bus-specific driver. The reason for this breakup is to be able to
- easily reuse the hardware-glue to write drivers for other OSes; note
- the hardware glue part is written as a native Linux driver; no
- abstraction layers are used, so to port to another OS, the Linux kernel
- API calls should be replaced with the target OS's.
-
-5. Usage
-========
-
- To load the driver, follow the instructions in the install section;
- once the driver is loaded, plug in the device (unless it is permanently
- plugged in). The driver will enumerate the device, upload the firmware
- and output messages in the kernel log (dmesg, /var/log/messages or
- /var/log/kern.log) such as::
-
- ...
- i2400m_usb 5-4:1.0: firmware interface version 8.0.0
- i2400m_usb 5-4:1.0: WiMAX interface wmx0 (00:1d:e1:01:94:2c) ready
-
- At this point the device is ready to work.
-
- Current versions require the Intel WiMAX Network Service in userspace
- to make things work. See the network service's README for instructions
- on how to scan, connect and disconnect.
-
-5.1. Module parameters
-----------------------
-
- Module parameters can be set at kernel or module load time or by
- echoing values::
-
- $ echo VALUE > /sys/module/MODULENAME/parameters/PARAMETERNAME
-
- To make changes permanent, for example, for the i2400m module, you can
- also create a file named /etc/modprobe.d/i2400m containing::
-
- options i2400m idle_mode_disabled=1
-
- To find which parameters are supported by a module, run::
-
- $ modinfo path/to/module.ko
-
- During kernel bootup (if the driver is linked in the kernel), specify
- the following to the kernel command line::
-
- i2400m.PARAMETER=VALUE
-
-5.1.1. i2400m: idle_mode_disabled
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
- The i2400m module supports a parameter to disable idle mode. This
- parameter, once set, will take effect only when the device is
- reinitialized by the driver (eg: following a reset or a reconnect).
-
-5.2. Debug operations: debugfs entries
---------------------------------------
-
- The driver will register debugfs entries that allow the user to tweak
- debug settings. There are three main container directories where
- entries are placed, which correspond to the three blocks a i2400m WiMAX
- driver has:
-
- * /sys/kernel/debug/wimax:DEVNAME/ for the generic WiMAX stack
- controls
- * /sys/kernel/debug/wimax:DEVNAME/i2400m for the i2400m generic
- driver controls
- * /sys/kernel/debug/wimax:DEVNAME/i2400m-usb (or -sdio) for the
- bus-specific i2400m-usb or i2400m-sdio controls).
-
- Of course, if debugfs is mounted in a directory other than
- /sys/kernel/debug, those paths will change.
-
-5.2.1. Increasing debug output
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
- The files named *dl_* indicate knobs for controlling the debug output
- of different submodules::
-
- # find /sys/kernel/debug/wimax\:wmx0 -name \*dl_\*
- /sys/kernel/debug/wimax:wmx0/i2400m-usb/dl_tx
- /sys/kernel/debug/wimax:wmx0/i2400m-usb/dl_rx
- /sys/kernel/debug/wimax:wmx0/i2400m-usb/dl_notif
- /sys/kernel/debug/wimax:wmx0/i2400m-usb/dl_fw
- /sys/kernel/debug/wimax:wmx0/i2400m-usb/dl_usb
- /sys/kernel/debug/wimax:wmx0/i2400m/dl_tx
- /sys/kernel/debug/wimax:wmx0/i2400m/dl_rx
- /sys/kernel/debug/wimax:wmx0/i2400m/dl_rfkill
- /sys/kernel/debug/wimax:wmx0/i2400m/dl_netdev
- /sys/kernel/debug/wimax:wmx0/i2400m/dl_fw
- /sys/kernel/debug/wimax:wmx0/i2400m/dl_debugfs
- /sys/kernel/debug/wimax:wmx0/i2400m/dl_driver
- /sys/kernel/debug/wimax:wmx0/i2400m/dl_control
- /sys/kernel/debug/wimax:wmx0/wimax_dl_stack
- /sys/kernel/debug/wimax:wmx0/wimax_dl_op_rfkill
- /sys/kernel/debug/wimax:wmx0/wimax_dl_op_reset
- /sys/kernel/debug/wimax:wmx0/wimax_dl_op_msg
- /sys/kernel/debug/wimax:wmx0/wimax_dl_id_table
- /sys/kernel/debug/wimax:wmx0/wimax_dl_debugfs
-
- By reading the file you can obtain the current value of said debug
- level; by writing to it, you can set it.
-
- To increase the debug level of, for example, the i2400m's generic TX
- engine, just write::
-
- $ echo 3 > /sys/kernel/debug/wimax:wmx0/i2400m/dl_tx
-
- Increasing numbers yield increasing debug information; for details of
- what is printed and the available levels, check the source. The code
- uses 0 for disabled and increasing values until 8.
-
-5.2.2. RX and TX statistics
-^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
- The i2400m/rx_stats and i2400m/tx_stats provide statistics about the
- data reception/delivery from the device::
-
- $ cat /sys/kernel/debug/wimax:wmx0/i2400m/rx_stats
- 45 1 3 34 3104 48 480
-
- The numbers reported are:
-
- * packets/RX-buffer: total, min, max
- * RX-buffers: total RX buffers received, accumulated RX buffer size
- in bytes, min size received, max size received
-
- Thus, to find the average buffer size received, divide accumulated
- RX-buffer / total RX-buffers.
-
- To clear the statistics back to 0, write anything to the rx_stats file::
-
- $ echo 1 > /sys/kernel/debug/wimax:wmx0/i2400m_rx_stats
-
- Likewise for TX.
-
- Note the packets this debug file refers to are not network packet, but
- packets in the sense of the device-specific protocol for communication
- to the host. See drivers/net/wimax/i2400m/tx.c.
-
-5.2.3. Tracing messages received from user space
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
- To echo messages received from user space into the trace pipe that the
- i2400m driver creates, set the debug file i2400m/trace_msg_from_user to
- 1::
-
- $ echo 1 > /sys/kernel/debug/wimax:wmx0/i2400m/trace_msg_from_user
-
-5.2.4. Performing a device reset
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
- By writing a 0, a 1 or a 2 to the file
- /sys/kernel/debug/wimax:wmx0/reset, the driver performs a warm (without
- disconnecting from the bus), cold (disconnecting from the bus) or bus
- (bus specific) reset on the device.
-
-5.2.5. Asking the device to enter power saving mode
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
- By writing any value to the /sys/kernel/debug/wimax:wmx0 file, the
- device will attempt to enter power saving mode.
-
-6. Troubleshooting
-==================
-
-6.1. Driver complains about ``i2400m-fw-usb-1.2.sbcf: request failed``
-----------------------------------------------------------------------
-
- If upon connecting the device, the following is output in the kernel
- log::
-
- i2400m_usb 5-4:1.0: fw i2400m-fw-usb-1.3.sbcf: request failed: -2
-
- This means that the driver cannot locate the firmware file named
- /lib/firmware/i2400m-fw-usb-1.2.sbcf. Check that the file is present in
- the right location.
diff --git a/Documentation/admin-guide/wimax/index.rst b/Documentation/admin-guide/wimax/index.rst
deleted file mode 100644
index fdf7c1f99ff5..000000000000
--- a/Documentation/admin-guide/wimax/index.rst
+++ /dev/null
@@ -1,19 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-===============
-WiMAX subsystem
-===============
-
-.. toctree::
- :maxdepth: 2
-
- wimax
-
- i2400m
-
-.. only:: subproject and html
-
- Indices
- =======
-
- * :ref:`genindex`
diff --git a/Documentation/admin-guide/wimax/wimax.rst b/Documentation/admin-guide/wimax/wimax.rst
deleted file mode 100644
index 817ee8ba2732..000000000000
--- a/Documentation/admin-guide/wimax/wimax.rst
+++ /dev/null
@@ -1,89 +0,0 @@
-.. include:: <isonum.txt>
-
-========================
-Linux kernel WiMAX stack
-========================
-
-:Copyright: |copy| 2008 Intel Corporation < linux-wimax@intel.com >
-
- This provides a basic Linux kernel WiMAX stack to provide a common
- control API for WiMAX devices, usable from kernel and user space.
-
-1. Design
-=========
-
- The WiMAX stack is designed to provide for common WiMAX control
- services to current and future WiMAX devices from any vendor.
-
- Because currently there is only one and we don't know what would be the
- common services, the APIs it currently provides are very minimal.
- However, it is done in such a way that it is easily extensible to
- accommodate future requirements.
-
- The stack works by embedding a struct wimax_dev in your device's
- control structures. This provides a set of callbacks that the WiMAX
- stack will call in order to implement control operations requested by
- the user. As well, the stack provides API functions that the driver
- calls to notify about changes of state in the device.
-
- The stack exports the API calls needed to control the device to user
- space using generic netlink as a marshalling mechanism. You can access
- them using your own code or use the wrappers provided for your
- convenience in libwimax (in the wimax-tools package).
-
- For detailed information on the stack, please see
- include/linux/wimax.h.
-
-2. Usage
-========
-
- For usage in a driver (registration, API, etc) please refer to the
- instructions in the header file include/linux/wimax.h.
-
- When a device is registered with the WiMAX stack, a set of debugfs
- files will appear in /sys/kernel/debug/wimax:wmxX can tweak for
- control.
-
-2.1. Obtaining debug information: debugfs entries
--------------------------------------------------
-
- The WiMAX stack is compiled, by default, with debug messages that can
- be used to diagnose issues. By default, said messages are disabled.
-
- The drivers will register debugfs entries that allow the user to tweak
- debug settings.
-
- Each driver, when registering with the stack, will cause a debugfs
- directory named wimax:DEVICENAME to be created; optionally, it might
- create more subentries below it.
-
-2.1.1. Increasing debug output
-------------------------------
-
- The files named *dl_* indicate knobs for controlling the debug output
- of different submodules of the WiMAX stack::
-
- # find /sys/kernel/debug/wimax\:wmx0 -name \*dl_\*
- /sys/kernel/debug/wimax:wmx0/wimax_dl_stack
- /sys/kernel/debug/wimax:wmx0/wimax_dl_op_rfkill
- /sys/kernel/debug/wimax:wmx0/wimax_dl_op_reset
- /sys/kernel/debug/wimax:wmx0/wimax_dl_op_msg
- /sys/kernel/debug/wimax:wmx0/wimax_dl_id_table
- /sys/kernel/debug/wimax:wmx0/wimax_dl_debugfs
- /sys/kernel/debug/wimax:wmx0/.... # other driver specific files
-
- NOTE:
- Of course, if debugfs is mounted in a directory other than
- /sys/kernel/debug, those paths will change.
-
- By reading the file you can obtain the current value of said debug
- level; by writing to it, you can set it.
-
- To increase the debug level of, for example, the id-table submodule,
- just write:
-
- $ echo 3 > /sys/kernel/debug/wimax:wmx0/wimax_dl_id_table
-
- Increasing numbers yield increasing debug information; for details of
- what is printed and the available levels, check the source. The code
- uses 0 for disabled and increasing values until 8.
diff --git a/Documentation/admin-guide/workload-tracing.rst b/Documentation/admin-guide/workload-tracing.rst
new file mode 100644
index 000000000000..b2e254ec8ee8
--- /dev/null
+++ b/Documentation/admin-guide/workload-tracing.rst
@@ -0,0 +1,606 @@
+.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0)
+
+======================================================
+Discovering Linux kernel subsystems used by a workload
+======================================================
+
+:Authors: - Shuah Khan <skhan@linuxfoundation.org>
+ - Shefali Sharma <sshefali021@gmail.com>
+:maintained-by: Shuah Khan <skhan@linuxfoundation.org>
+
+Key Points
+==========
+
+ * Understanding system resources necessary to build and run a workload
+ is important.
+ * Linux tracing and strace can be used to discover the system resources
+ in use by a workload. The completeness of the system usage information
+ depends on the completeness of coverage of a workload.
+ * Performance and security of the operating system can be analyzed with
+ the help of tools such as:
+ `perf <https://man7.org/linux/man-pages/man1/perf.1.html>`_,
+ `stress-ng <https://www.mankier.com/1/stress-ng>`_,
+ `paxtest <https://github.com/opntr/paxtest-freebsd>`_.
+ * Once we discover and understand the workload needs, we can focus on them
+ to avoid regressions and use it to evaluate safety considerations.
+
+Methodology
+===========
+
+`strace <https://man7.org/linux/man-pages/man1/strace.1.html>`_ is a
+diagnostic, instructional, and debugging tool and can be used to discover
+the system resources in use by a workload. Once we discover and understand
+the workload needs, we can focus on them to avoid regressions and use it
+to evaluate safety considerations. We use strace tool to trace workloads.
+
+This method of tracing using strace tells us the system calls invoked by
+the workload and doesn't include all the system calls that can be invoked
+by it. In addition, this tracing method tells us just the code paths within
+these system calls that are invoked. As an example, if a workload opens a
+file and reads from it successfully, then the success path is the one that
+is traced. Any error paths in that system call will not be traced. If there
+is a workload that provides full coverage of a workload then the method
+outlined here will trace and find all possible code paths. The completeness
+of the system usage information depends on the completeness of coverage of a
+workload.
+
+The goal is tracing a workload on a system running a default kernel without
+requiring custom kernel installs.
+
+How do we gather fine-grained system information?
+=================================================
+
+strace tool can be used to trace system calls made by a process and signals
+it receives. System calls are the fundamental interface between an
+application and the operating system kernel. They enable a program to
+request services from the kernel. For instance, the open() system call in
+Linux is used to provide access to a file in the file system. strace enables
+us to track all the system calls made by an application. It lists all the
+system calls made by a process and their resulting output.
+
+You can generate profiling data combining strace and perf record tools to
+record the events and information associated with a process. This provides
+insight into the process. "perf annotate" tool generates the statistics of
+each instruction of the program. This document goes over the details of how
+to gather fine-grained information on a workload's usage of system resources.
+
+We used strace to trace the perf, stress-ng, paxtest workloads to illustrate
+our methodology to discover resources used by a workload. This process can
+be applied to trace other workloads.
+
+Getting the system ready for tracing
+====================================
+
+Before we can get started we will show you how to get your system ready.
+We assume that you have a Linux distribution running on a physical system
+or a virtual machine. Most distributions will include strace command. Let’s
+install other tools that aren’t usually included to build Linux kernel.
+Please note that the following works on Debian based distributions. You
+might have to find equivalent packages on other Linux distributions.
+
+Install tools to build Linux kernel and tools in kernel repository.
+scripts/ver_linux is a good way to check if your system already has
+the necessary tools::
+
+ sudo apt-get build-essentials flex bison yacc
+ sudo apt install libelf-dev systemtap-sdt-dev libaudit-dev libslang2-dev libperl-dev libdw-dev
+
+cscope is a good tool to browse kernel sources. Let's install it now::
+
+ sudo apt-get install cscope
+
+Install stress-ng and paxtest::
+
+ apt-get install stress-ng
+ apt-get install paxtest
+
+Workload overview
+=================
+
+As mentioned earlier, we used strace to trace perf bench, stress-ng and
+paxtest workloads to show how to analyze a workload and identify Linux
+subsystems used by these workloads. Let's start with an overview of these
+three workloads to get a better understanding of what they do and how to
+use them.
+
+perf bench (all) workload
+-------------------------
+
+The perf bench command contains multiple multi-threaded microkernel
+benchmarks for executing different subsystems in the Linux kernel and
+system calls. This allows us to easily measure the impact of changes,
+which can help mitigate performance regressions. It also acts as a common
+benchmarking framework, enabling developers to easily create test cases,
+integrate transparently, and use performance-rich tooling subsystems.
+
+Stress-ng netdev stressor workload
+----------------------------------
+
+stress-ng is used for performing stress testing on the kernel. It allows
+you to exercise various physical subsystems of the computer, as well as
+interfaces of the OS kernel, using "stressor-s". They are available for
+CPU, CPU cache, devices, I/O, interrupts, file system, memory, network,
+operating system, pipelines, schedulers, and virtual machines. Please refer
+to the `stress-ng man-page <https://www.mankier.com/1/stress-ng>`_ to
+find the description of all the available stressor-s. The netdev stressor
+starts specified number (N) of workers that exercise various netdevice
+ioctl commands across all the available network devices.
+
+paxtest kiddie workload
+-----------------------
+
+paxtest is a program that tests buffer overflows in the kernel. It tests
+kernel enforcements over memory usage. Generally, execution in some memory
+segments makes buffer overflows possible. It runs a set of programs that
+attempt to subvert memory usage. It is used as a regression test suite for
+PaX, but might be useful to test other memory protection patches for the
+kernel. We used paxtest kiddie mode which looks for simple vulnerabilities.
+
+What is strace and how do we use it?
+====================================
+
+As mentioned earlier, strace which is a useful diagnostic, instructional,
+and debugging tool and can be used to discover the system resources in use
+by a workload. It can be used:
+
+ * To see how a process interacts with the kernel.
+ * To see why a process is failing or hanging.
+ * For reverse engineering a process.
+ * To find the files on which a program depends.
+ * For analyzing the performance of an application.
+ * For troubleshooting various problems related to the operating system.
+
+In addition, strace can generate run-time statistics on times, calls, and
+errors for each system call and report a summary when program exits,
+suppressing the regular output. This attempts to show system time (CPU time
+spent running in the kernel) independent of wall clock time. We plan to use
+these features to get information on workload system usage.
+
+strace command supports basic, verbose, and stats modes. strace command when
+run in verbose mode gives more detailed information about the system calls
+invoked by a process.
+
+Running strace -c generates a report of the percentage of time spent in each
+system call, the total time in seconds, the microseconds per call, the total
+number of calls, the count of each system call that has failed with an error
+and the type of system call made.
+
+ * Usage: strace <command we want to trace>
+ * Verbose mode usage: strace -v <command>
+ * Gather statistics: strace -c <command>
+
+We used the “-c” option to gather fine-grained run-time statistics in use
+by three workloads we have chose for this analysis.
+
+ * perf
+ * stress-ng
+ * paxtest
+
+What is cscope and how do we use it?
+====================================
+
+Now let’s look at `cscope <https://cscope.sourceforge.net/>`_, a command
+line tool for browsing C, C++ or Java code-bases. We can use it to find
+all the references to a symbol, global definitions, functions called by a
+function, functions calling a function, text strings, regular expression
+patterns, files including a file.
+
+We can use cscope to find which system call belongs to which subsystem.
+This way we can find the kernel subsystems used by a process when it is
+executed.
+
+Let’s checkout the latest Linux repository and build cscope database::
+
+ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux
+ cd linux
+ cscope -R -p10 # builds cscope.out database before starting browse session
+ cscope -d -p10 # starts browse session on cscope.out database
+
+Note: Run "cscope -R -p10" to build the database and c"scope -d -p10" to
+enter into the browsing session. cscope by default cscope.out database.
+To get out of this mode press ctrl+d. -p option is used to specify the
+number of file path components to display. -p10 is optimal for browsing
+kernel sources.
+
+What is perf and how do we use it?
+==================================
+
+Perf is an analysis tool based on Linux 2.6+ systems, which abstracts the
+CPU hardware difference in performance measurement in Linux, and provides
+a simple command line interface. Perf is based on the perf_events interface
+exported by the kernel. It is very useful for profiling the system and
+finding performance bottlenecks in an application.
+
+If you haven't already checked out the Linux mainline repository, you can do
+so and then build kernel and perf tool::
+
+ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux
+ cd linux
+ make -j3 all
+ cd tools/perf
+ make
+
+Note: The perf command can be built without building the kernel in the
+repository and can be run on older kernels. However matching the kernel
+and perf revisions gives more accurate information on the subsystem usage.
+
+We used "perf stat" and "perf bench" options. For a detailed information on
+the perf tool, run "perf -h".
+
+perf stat
+---------
+The perf stat command generates a report of various hardware and software
+events. It does so with the help of hardware counter registers found in
+modern CPUs that keep the count of these activities. "perf stat cal" shows
+stats for cal command.
+
+Perf bench
+----------
+The perf bench command contains multiple multi-threaded microkernel
+benchmarks for executing different subsystems in the Linux kernel and
+system calls. This allows us to easily measure the impact of changes,
+which can help mitigate performance regressions. It also acts as a common
+benchmarking framework, enabling developers to easily create test cases,
+integrate transparently, and use performance-rich tooling.
+
+"perf bench all" command runs the following benchmarks:
+
+ * sched/messaging
+ * sched/pipe
+ * syscall/basic
+ * mem/memcpy
+ * mem/memset
+
+What is stress-ng and how do we use it?
+=======================================
+
+As mentioned earlier, stress-ng is used for performing stress testing on
+the kernel. It allows you to exercise various physical subsystems of the
+computer, as well as interfaces of the OS kernel, using stressor-s. They
+are available for CPU, CPU cache, devices, I/O, interrupts, file system,
+memory, network, operating system, pipelines, schedulers, and virtual
+machines.
+
+The netdev stressor starts N workers that exercise various netdevice ioctl
+commands across all the available network devices. The following ioctls are
+exercised:
+
+ * SIOCGIFCONF, SIOCGIFINDEX, SIOCGIFNAME, SIOCGIFFLAGS
+ * SIOCGIFADDR, SIOCGIFNETMASK, SIOCGIFMETRIC, SIOCGIFMTU
+ * SIOCGIFHWADDR, SIOCGIFMAP, SIOCGIFTXQLEN
+
+The following command runs the stressor::
+
+ stress-ng --netdev 1 -t 60 --metrics command.
+
+We can use the perf record command to record the events and information
+associated with a process. This command records the profiling data in the
+perf.data file in the same directory.
+
+Using the following commands you can record the events associated with the
+netdev stressor, view the generated report perf.data and annotate the to
+view the statistics of each instruction of the program::
+
+ perf record stress-ng --netdev 1 -t 60 --metrics command.
+ perf report
+ perf annotate
+
+What is paxtest and how do we use it?
+=====================================
+
+paxtest is a program that tests buffer overflows in the kernel. It tests
+kernel enforcements over memory usage. Generally, execution in some memory
+segments makes buffer overflows possible. It runs a set of programs that
+attempt to subvert memory usage. It is used as a regression test suite for
+PaX, and will be useful to test other memory protection patches for the
+kernel.
+
+paxtest provides kiddie and blackhat modes. The paxtest kiddie mode runs
+in normal mode, whereas the blackhat mode tries to get around the protection
+of the kernel testing for vulnerabilities. We focus on the kiddie mode here
+and combine "paxtest kiddie" run with "perf record" to collect CPU stack
+traces for the paxtest kiddie run to see which function is calling other
+functions in the performance profile. Then the "dwarf" (DWARF's Call Frame
+Information) mode can be used to unwind the stack.
+
+The following command can be used to view resulting report in call-graph
+format::
+
+ perf record --call-graph dwarf paxtest kiddie
+ perf report --stdio
+
+Tracing workloads
+=================
+
+Now that we understand the workloads, let's start tracing them.
+
+Tracing perf bench all workload
+-------------------------------
+
+Run the following command to trace perf bench all workload::
+
+ strace -c perf bench all
+
+**System Calls made by the workload**
+
+The below table shows the system calls invoked by the workload, number of
+times each system call is invoked, and the corresponding Linux subsystem.
+
++-------------------+-----------+-----------------+-------------------------+
+| System Call | # calls | Linux Subsystem | System Call (API) |
++===================+===========+=================+=========================+
+| getppid | 10000001 | Process Mgmt | sys_getpid() |
++-------------------+-----------+-----------------+-------------------------+
+| clone | 1077 | Process Mgmt. | sys_clone() |
++-------------------+-----------+-----------------+-------------------------+
+| prctl | 23 | Process Mgmt. | sys_prctl() |
++-------------------+-----------+-----------------+-------------------------+
+| prlimit64 | 7 | Process Mgmt. | sys_prlimit64() |
++-------------------+-----------+-----------------+-------------------------+
+| getpid | 10 | Process Mgmt. | sys_getpid() |
++-------------------+-----------+-----------------+-------------------------+
+| uname | 3 | Process Mgmt. | sys_uname() |
++-------------------+-----------+-----------------+-------------------------+
+| sysinfo | 1 | Process Mgmt. | sys_sysinfo() |
++-------------------+-----------+-----------------+-------------------------+
+| getuid | 1 | Process Mgmt. | sys_getuid() |
++-------------------+-----------+-----------------+-------------------------+
+| getgid | 1 | Process Mgmt. | sys_getgid() |
++-------------------+-----------+-----------------+-------------------------+
+| geteuid | 1 | Process Mgmt. | sys_geteuid() |
++-------------------+-----------+-----------------+-------------------------+
+| getegid | 1 | Process Mgmt. | sys_getegid |
++-------------------+-----------+-----------------+-------------------------+
+| close | 49951 | Filesystem | sys_close() |
++-------------------+-----------+-----------------+-------------------------+
+| pipe | 604 | Filesystem | sys_pipe() |
++-------------------+-----------+-----------------+-------------------------+
+| openat | 48560 | Filesystem | sys_opennat() |
++-------------------+-----------+-----------------+-------------------------+
+| fstat | 8338 | Filesystem | sys_fstat() |
++-------------------+-----------+-----------------+-------------------------+
+| stat | 1573 | Filesystem | sys_stat() |
++-------------------+-----------+-----------------+-------------------------+
+| pread64 | 9646 | Filesystem | sys_pread64() |
++-------------------+-----------+-----------------+-------------------------+
+| getdents64 | 1873 | Filesystem | sys_getdents64() |
++-------------------+-----------+-----------------+-------------------------+
+| access | 3 | Filesystem | sys_access() |
++-------------------+-----------+-----------------+-------------------------+
+| lstat | 1880 | Filesystem | sys_lstat() |
++-------------------+-----------+-----------------+-------------------------+
+| lseek | 6 | Filesystem | sys_lseek() |
++-------------------+-----------+-----------------+-------------------------+
+| ioctl | 3 | Filesystem | sys_ioctl() |
++-------------------+-----------+-----------------+-------------------------+
+| dup2 | 1 | Filesystem | sys_dup2() |
++-------------------+-----------+-----------------+-------------------------+
+| execve | 2 | Filesystem | sys_execve() |
++-------------------+-----------+-----------------+-------------------------+
+| fcntl | 8779 | Filesystem | sys_fcntl() |
++-------------------+-----------+-----------------+-------------------------+
+| statfs | 1 | Filesystem | sys_statfs() |
++-------------------+-----------+-----------------+-------------------------+
+| epoll_create | 2 | Filesystem | sys_epoll_create() |
++-------------------+-----------+-----------------+-------------------------+
+| epoll_ctl | 64 | Filesystem | sys_epoll_ctl() |
++-------------------+-----------+-----------------+-------------------------+
+| newfstatat | 8318 | Filesystem | sys_newfstatat() |
++-------------------+-----------+-----------------+-------------------------+
+| eventfd2 | 192 | Filesystem | sys_eventfd2() |
++-------------------+-----------+-----------------+-------------------------+
+| mmap | 243 | Memory Mgmt. | sys_mmap() |
++-------------------+-----------+-----------------+-------------------------+
+| mprotect | 32 | Memory Mgmt. | sys_mprotect() |
++-------------------+-----------+-----------------+-------------------------+
+| brk | 21 | Memory Mgmt. | sys_brk() |
++-------------------+-----------+-----------------+-------------------------+
+| munmap | 128 | Memory Mgmt. | sys_munmap() |
++-------------------+-----------+-----------------+-------------------------+
+| set_mempolicy | 156 | Memory Mgmt. | sys_set_mempolicy() |
++-------------------+-----------+-----------------+-------------------------+
+| set_tid_address | 1 | Process Mgmt. | sys_set_tid_address() |
++-------------------+-----------+-----------------+-------------------------+
+| set_robust_list | 1 | Futex | sys_set_robust_list() |
++-------------------+-----------+-----------------+-------------------------+
+| futex | 341 | Futex | sys_futex() |
++-------------------+-----------+-----------------+-------------------------+
+| sched_getaffinity | 79 | Scheduler | sys_sched_getaffinity() |
++-------------------+-----------+-----------------+-------------------------+
+| sched_setaffinity | 223 | Scheduler | sys_sched_setaffinity() |
++-------------------+-----------+-----------------+-------------------------+
+| socketpair | 202 | Network | sys_socketpair() |
++-------------------+-----------+-----------------+-------------------------+
+| rt_sigprocmask | 21 | Signal | sys_rt_sigprocmask() |
++-------------------+-----------+-----------------+-------------------------+
+| rt_sigaction | 36 | Signal | sys_rt_sigaction() |
++-------------------+-----------+-----------------+-------------------------+
+| rt_sigreturn | 2 | Signal | sys_rt_sigreturn() |
++-------------------+-----------+-----------------+-------------------------+
+| wait4 | 889 | Time | sys_wait4() |
++-------------------+-----------+-----------------+-------------------------+
+| clock_nanosleep | 37 | Time | sys_clock_nanosleep() |
++-------------------+-----------+-----------------+-------------------------+
+| capget | 4 | Capability | sys_capget() |
++-------------------+-----------+-----------------+-------------------------+
+
+Tracing stress-ng netdev stressor workload
+------------------------------------------
+
+Run the following command to trace stress-ng netdev stressor workload::
+
+ strace -c stress-ng --netdev 1 -t 60 --metrics
+
+**System Calls made by the workload**
+
+The below table shows the system calls invoked by the workload, number of
+times each system call is invoked, and the corresponding Linux subsystem.
+
++-------------------+-----------+-----------------+-------------------------+
+| System Call | # calls | Linux Subsystem | System Call (API) |
++===================+===========+=================+=========================+
+| openat | 74 | Filesystem | sys_openat() |
++-------------------+-----------+-----------------+-------------------------+
+| close | 75 | Filesystem | sys_close() |
++-------------------+-----------+-----------------+-------------------------+
+| read | 58 | Filesystem | sys_read() |
++-------------------+-----------+-----------------+-------------------------+
+| fstat | 20 | Filesystem | sys_fstat() |
++-------------------+-----------+-----------------+-------------------------+
+| flock | 10 | Filesystem | sys_flock() |
++-------------------+-----------+-----------------+-------------------------+
+| write | 7 | Filesystem | sys_write() |
++-------------------+-----------+-----------------+-------------------------+
+| getdents64 | 8 | Filesystem | sys_getdents64() |
++-------------------+-----------+-----------------+-------------------------+
+| pread64 | 8 | Filesystem | sys_pread64() |
++-------------------+-----------+-----------------+-------------------------+
+| lseek | 1 | Filesystem | sys_lseek() |
++-------------------+-----------+-----------------+-------------------------+
+| access | 2 | Filesystem | sys_access() |
++-------------------+-----------+-----------------+-------------------------+
+| getcwd | 1 | Filesystem | sys_getcwd() |
++-------------------+-----------+-----------------+-------------------------+
+| execve | 1 | Filesystem | sys_execve() |
++-------------------+-----------+-----------------+-------------------------+
+| mmap | 61 | Memory Mgmt. | sys_mmap() |
++-------------------+-----------+-----------------+-------------------------+
+| munmap | 3 | Memory Mgmt. | sys_munmap() |
++-------------------+-----------+-----------------+-------------------------+
+| mprotect | 20 | Memory Mgmt. | sys_mprotect() |
++-------------------+-----------+-----------------+-------------------------+
+| mlock | 2 | Memory Mgmt. | sys_mlock() |
++-------------------+-----------+-----------------+-------------------------+
+| brk | 3 | Memory Mgmt. | sys_brk() |
++-------------------+-----------+-----------------+-------------------------+
+| rt_sigaction | 21 | Signal | sys_rt_sigaction() |
++-------------------+-----------+-----------------+-------------------------+
+| rt_sigprocmask | 1 | Signal | sys_rt_sigprocmask() |
++-------------------+-----------+-----------------+-------------------------+
+| sigaltstack | 1 | Signal | sys_sigaltstack() |
++-------------------+-----------+-----------------+-------------------------+
+| rt_sigreturn | 1 | Signal | sys_rt_sigreturn() |
++-------------------+-----------+-----------------+-------------------------+
+| getpid | 8 | Process Mgmt. | sys_getpid() |
++-------------------+-----------+-----------------+-------------------------+
+| prlimit64 | 5 | Process Mgmt. | sys_prlimit64() |
++-------------------+-----------+-----------------+-------------------------+
+| arch_prctl | 2 | Process Mgmt. | sys_arch_prctl() |
++-------------------+-----------+-----------------+-------------------------+
+| sysinfo | 2 | Process Mgmt. | sys_sysinfo() |
++-------------------+-----------+-----------------+-------------------------+
+| getuid | 2 | Process Mgmt. | sys_getuid() |
++-------------------+-----------+-----------------+-------------------------+
+| uname | 1 | Process Mgmt. | sys_uname() |
++-------------------+-----------+-----------------+-------------------------+
+| setpgid | 1 | Process Mgmt. | sys_setpgid() |
++-------------------+-----------+-----------------+-------------------------+
+| getrusage | 1 | Process Mgmt. | sys_getrusage() |
++-------------------+-----------+-----------------+-------------------------+
+| geteuid | 1 | Process Mgmt. | sys_geteuid() |
++-------------------+-----------+-----------------+-------------------------+
+| getppid | 1 | Process Mgmt. | sys_getppid() |
++-------------------+-----------+-----------------+-------------------------+
+| sendto | 3 | Network | sys_sendto() |
++-------------------+-----------+-----------------+-------------------------+
+| connect | 1 | Network | sys_connect() |
++-------------------+-----------+-----------------+-------------------------+
+| socket | 1 | Network | sys_socket() |
++-------------------+-----------+-----------------+-------------------------+
+| clone | 1 | Process Mgmt. | sys_clone() |
++-------------------+-----------+-----------------+-------------------------+
+| set_tid_address | 1 | Process Mgmt. | sys_set_tid_address() |
++-------------------+-----------+-----------------+-------------------------+
+| wait4 | 2 | Time | sys_wait4() |
++-------------------+-----------+-----------------+-------------------------+
+| alarm | 1 | Time | sys_alarm() |
++-------------------+-----------+-----------------+-------------------------+
+| set_robust_list | 1 | Futex | sys_set_robust_list() |
++-------------------+-----------+-----------------+-------------------------+
+
+Tracing paxtest kiddie workload
+-------------------------------
+
+Run the following command to trace paxtest kiddie workload::
+
+ strace -c paxtest kiddie
+
+**System Calls made by the workload**
+
+The below table shows the system calls invoked by the workload, number of
+times each system call is invoked, and the corresponding Linux subsystem.
+
++-------------------+-----------+-----------------+----------------------+
+| System Call | # calls | Linux Subsystem | System Call (API) |
++===================+===========+=================+======================+
+| read | 3 | Filesystem | sys_read() |
++-------------------+-----------+-----------------+----------------------+
+| write | 11 | Filesystem | sys_write() |
++-------------------+-----------+-----------------+----------------------+
+| close | 41 | Filesystem | sys_close() |
++-------------------+-----------+-----------------+----------------------+
+| stat | 24 | Filesystem | sys_stat() |
++-------------------+-----------+-----------------+----------------------+
+| fstat | 2 | Filesystem | sys_fstat() |
++-------------------+-----------+-----------------+----------------------+
+| pread64 | 6 | Filesystem | sys_pread64() |
++-------------------+-----------+-----------------+----------------------+
+| access | 1 | Filesystem | sys_access() |
++-------------------+-----------+-----------------+----------------------+
+| pipe | 1 | Filesystem | sys_pipe() |
++-------------------+-----------+-----------------+----------------------+
+| dup2 | 24 | Filesystem | sys_dup2() |
++-------------------+-----------+-----------------+----------------------+
+| execve | 1 | Filesystem | sys_execve() |
++-------------------+-----------+-----------------+----------------------+
+| fcntl | 26 | Filesystem | sys_fcntl() |
++-------------------+-----------+-----------------+----------------------+
+| openat | 14 | Filesystem | sys_openat() |
++-------------------+-----------+-----------------+----------------------+
+| rt_sigaction | 7 | Signal | sys_rt_sigaction() |
++-------------------+-----------+-----------------+----------------------+
+| rt_sigreturn | 38 | Signal | sys_rt_sigreturn() |
++-------------------+-----------+-----------------+----------------------+
+| clone | 38 | Process Mgmt. | sys_clone() |
++-------------------+-----------+-----------------+----------------------+
+| wait4 | 44 | Time | sys_wait4() |
++-------------------+-----------+-----------------+----------------------+
+| mmap | 7 | Memory Mgmt. | sys_mmap() |
++-------------------+-----------+-----------------+----------------------+
+| mprotect | 3 | Memory Mgmt. | sys_mprotect() |
++-------------------+-----------+-----------------+----------------------+
+| munmap | 1 | Memory Mgmt. | sys_munmap() |
++-------------------+-----------+-----------------+----------------------+
+| brk | 3 | Memory Mgmt. | sys_brk() |
++-------------------+-----------+-----------------+----------------------+
+| getpid | 1 | Process Mgmt. | sys_getpid() |
++-------------------+-----------+-----------------+----------------------+
+| getuid | 1 | Process Mgmt. | sys_getuid() |
++-------------------+-----------+-----------------+----------------------+
+| getgid | 1 | Process Mgmt. | sys_getgid() |
++-------------------+-----------+-----------------+----------------------+
+| geteuid | 2 | Process Mgmt. | sys_geteuid() |
++-------------------+-----------+-----------------+----------------------+
+| getegid | 1 | Process Mgmt. | sys_getegid() |
++-------------------+-----------+-----------------+----------------------+
+| getppid | 1 | Process Mgmt. | sys_getppid() |
++-------------------+-----------+-----------------+----------------------+
+| arch_prctl | 2 | Process Mgmt. | sys_arch_prctl() |
++-------------------+-----------+-----------------+----------------------+
+
+Conclusion
+==========
+
+This document is intended to be used as a guide on how to gather fine-grained
+information on the resources in use by workloads using strace.
+
+References
+==========
+
+ * `Discovery Linux Kernel Subsystems used by OpenAPS <https://elisa.tech/blog/2022/02/02/discovery-linux-kernel-subsystems-used-by-openaps>`_
+ * `ELISA-White-Papers-Discovering Linux kernel subsystems used by a workload <https://github.com/elisa-tech/ELISA-White-Papers/blob/master/Processes/Discovering_Linux_kernel_subsystems_used_by_a_workload.md>`_
+ * `strace <https://man7.org/linux/man-pages/man1/strace.1.html>`_
+ * `perf <https://man7.org/linux/man-pages/man1/perf.1.html>`_
+ * `paxtest README <https://github.com/opntr/paxtest-freebsd/blob/hardenedbsd/0.9.14-hbsd/README>`_
+ * `stress-ng <https://www.mankier.com/1/stress-ng>`_
+ * `Monitoring and managing system status and performance <https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/monitoring_and_managing_system_status_and_performance/index>`_
diff --git a/Documentation/admin-guide/xfs.rst b/Documentation/admin-guide/xfs.rst
index ad911be5b5e9..b67772cf36d6 100644
--- a/Documentation/admin-guide/xfs.rst
+++ b/Documentation/admin-guide/xfs.rst
@@ -133,7 +133,7 @@ When mounting an XFS filesystem, the following options are accepted.
logbsize must be an integer multiple of the log
stripe unit configured at **mkfs(8)** time.
- The default value for for version 1 logs is 32768, while the
+ The default value for version 1 logs is 32768, while the
default value for version 2 logs is MAX(32768, log_sunit).
logdev=device and rtdev=device
@@ -192,7 +192,7 @@ When mounting an XFS filesystem, the following options are accepted.
are any integer multiple of a valid ``sunit`` value.
Typically the only time these mount options are necessary if
- after an underlying RAID device has had it's geometry
+ after an underlying RAID device has had its geometry
modified, such as adding a new disk to a RAID5 lun and
reshaping it.
@@ -210,14 +210,40 @@ When mounting an XFS filesystem, the following options are accepted.
inconsistent namespace presentation during or after a
failover event.
+Deprecation of V4 Format
+========================
+
+The V4 filesystem format lacks certain features that are supported by
+the V5 format, such as metadata checksumming, strengthened metadata
+verification, and the ability to store timestamps past the year 2038.
+Because of this, the V4 format is deprecated. All users should upgrade
+by backing up their files, reformatting, and restoring from the backup.
+
+Administrators and users can detect a V4 filesystem by running xfs_info
+against a filesystem mountpoint and checking for a string containing
+"crc=". If no such string is found, please upgrade xfsprogs to the
+latest version and try again.
+
+The deprecation will take place in two parts. Support for mounting V4
+filesystems can now be disabled at kernel build time via Kconfig option.
+The option will default to yes until September 2025, at which time it
+will be changed to default to no. In September 2030, support will be
+removed from the codebase entirely.
+
+Note: Distributors may choose to withdraw V4 format support earlier than
+the dates listed above.
Deprecated Mount Options
========================
-=========================== ================
+============================ ================
Name Removal Schedule
-=========================== ================
-=========================== ================
+============================ ================
+Mounting with V4 filesystem September 2030
+Mounting ascii-ci filesystem September 2030
+ikeep/noikeep September 2025
+attr2/noattr2 September 2025
+============================ ================
Removed Mount Options
@@ -259,6 +285,9 @@ The following sysctls are available for the XFS filesystem:
removes unused preallocation from clean inodes and releases
the unused space back to the free pool.
+ fs.xfs.speculative_cow_prealloc_lifetime
+ This is an alias for speculative_prealloc_lifetime.
+
fs.xfs.error_level (Min: 0 Default: 3 Max: 11)
A volume knob for error reporting when internal errors occur.
This will generate detailed messages & backtraces for filesystem
@@ -268,7 +297,7 @@ The following sysctls are available for the XFS filesystem:
XFS_ERRLEVEL_LOW: 1
XFS_ERRLEVEL_HIGH: 5
- fs.xfs.panic_mask (Min: 0 Default: 0 Max: 256)
+ fs.xfs.panic_mask (Min: 0 Default: 0 Max: 511)
Causes certain error conditions to call BUG(). Value is a bitmask;
OR together the tags which represent errors which should cause panics:
@@ -331,7 +360,13 @@ The following sysctls are available for the XFS filesystem:
Deprecated Sysctls
==================
-None at present.
+=========================================== ================
+ Name Removal Schedule
+=========================================== ================
+fs.xfs.irix_sgid_inherit September 2025
+fs.xfs.irix_symlink_mode September 2025
+fs.xfs.speculative_cow_prealloc_lifetime September 2025
+=========================================== ================
Removed Sysctls
@@ -465,3 +500,45 @@ the class and error context. For example, the default values for
"metadata/ENODEV" are "0" rather than "-1" so that this error handler defaults
to "fail immediately" behaviour. This is done because ENODEV is a fatal,
unrecoverable error no matter how many times the metadata IO is retried.
+
+Workqueue Concurrency
+=====================
+
+XFS uses kernel workqueues to parallelize metadata update processes. This
+enables it to take advantage of storage hardware that can service many IO
+operations simultaneously. This interface exposes internal implementation
+details of XFS, and as such is explicitly not part of any userspace API/ABI
+guarantee the kernel may give userspace. These are undocumented features of
+the generic workqueue implementation XFS uses for concurrency, and they are
+provided here purely for diagnostic and tuning purposes and may change at any
+time in the future.
+
+The control knobs for a filesystem's workqueues are organized by task at hand
+and the short name of the data device. They all can be found in:
+
+ /sys/bus/workqueue/devices/${task}!${device}
+
+================ ===========
+ Task Description
+================ ===========
+ xfs_iwalk-$pid Inode scans of the entire filesystem. Currently limited to
+ mount time quotacheck.
+ xfs-gc Background garbage collection of disk space that have been
+ speculatively allocated beyond EOF or for staging copy on
+ write operations.
+================ ===========
+
+For example, the knobs for the quotacheck workqueue for /dev/nvme0n1 would be
+found in /sys/bus/workqueue/devices/xfs_iwalk-1111!nvme0n1/.
+
+The interesting knobs for XFS workqueues are as follows:
+
+============ ===========
+ Knob Description
+============ ===========
+ max_active Maximum number of background threads that can be started to
+ run the work.
+ cpumask CPUs upon which the threads are allowed to run.
+ nice Relative priority of scheduling the threads. These are the
+ same nice levels that can be applied to userspace processes.
+============ ===========