1 file changed, 389 insertions, 277 deletions
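The updated loadavg entry in this patch spells out the five-field layout of /proc/loadavg, for example "0.61 0.61 0.55 3/828 22084". The sketch below is a minimal illustration of that layout, not kernel code. It assumes exactly the format the patch describes: three averages, a runnable/total pair joined by a slash, and the last PID created. The names LoadAvg and parse_loadavg are hypothetical.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    """Fields of /proc/loadavg, in the order the patch documents them."""
    avg1: float    # load average over the last 1 minute
    avg5: float    # load average over the last 5 minutes
    avg15: float   # load average over the last 15 minutes
    runnable: int  # processes currently runnable (running or on ready queue)
    total: int     # total number of processes in system
    last_pid: int  # last pid created


def parse_loadavg(line: str) -> LoadAvg:
    """Parse one line in the documented /proc/loadavg format.

    Fields are separated by whitespace, except the runnable and total
    counts, which are joined by a slash (e.g. "3/828").
    """
    # split() with no argument also drops the trailing newline.
    avg1, avg5, avg15, procs, last_pid = line.split()
    runnable, total = procs.split("/")
    return LoadAvg(float(avg1), float(avg5), float(avg15),
                   int(runnable), int(total), int(last_pid))


if __name__ == "__main__":
    # Parse the example line quoted in the documentation text.
    print(parse_loadavg("0.61 0.61 0.55 3/828 22084"))
    # On a live Linux system the real file could be read instead:
    # with open("/proc/loadavg") as f:
    #     print(parse_loadavg(f.read()))
```

A line that does not have exactly five whitespace-separated fields makes the tuple unpacking raise ValueError, which is a reasonable failure mode if the format ever changes.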
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 38b606991065..104c6d047d9b 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -47,10 +47,13 @@ fixes/update part 1.1  Stefani Seibold <stefani@seibold.net>    June 9 2009
   3.10  /proc/<pid>/timerslack_ns - Task timerslack value
   3.11	/proc/<pid>/patch_state - Livepatch patch operation state
   3.12	/proc/<pid>/arch_status - Task architecture specific information
+  3.13  /proc/<pid>/fd - List of symlinks to open files
 
   4	Configuring procfs
   4.1	Mount options
 
+  5	Filesystem behavior
+
 Preface
 =======
 
@@ -82,7 +85,7 @@ contact Bodo  Bauer  at  bb@ricochet.net.  We'll  be happy to add them to this
 document.
 
 The   latest   version    of   this   document   is    available   online   at
-http://tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html
+https://www.kernel.org/doc/html/latest/filesystems/proc.html
 
 If  the above  direction does  not works  for you,  you could  try the  kernel
 mailing  list  at  linux-kernel@vger.kernel.org  and/or try  to  reach  me  at
@@ -121,10 +124,10 @@ show you how you can use /proc/sys to change settings.
 The directory  /proc  contains  (among other things) one subdirectory for each
 process running on the system, which is named after the process ID (PID).
 
-The link  self  points  to  the  process reading the file system. Each process
+The link  'self'  points to  the process reading the file system. Each process
 subdirectory has the entries listed in Table 1-1.
 
-Note that an open a file descriptor to /proc/<pid> or to any of its
+Note that an open file descriptor to /proc/<pid> or to any of its
 contained files or subdirectories does not prevent <pid> being reused
 for some other process in the event that <pid> exits. Operations on
 open /proc/<pid> file descriptors corresponding to dead processes
@@ -176,6 +179,7 @@ read the file /proc/PID/status::
   Gid:    100     100     100     100
   FDSize: 256
   Groups: 100 14 16
+  Kthread:    0
   VmPeak:     5004 kB
   VmSize:     5004 kB
   VmLck:         0 kB
@@ -208,6 +212,7 @@ read the file /proc/PID/status::
   NoNewPrivs:     0
   Seccomp:        0
   Speculation_Store_Bypass:       thread vulnerable
+  SpeculationIndirectBranch:      conditional enabled
   voluntary_ctxt_switches:        0
   nonvoluntary_ctxt_switches:     1
 
@@ -218,7 +223,7 @@ file /proc/PID/status. It fields are described in table 1-2.
 
 The  statm  file  contains  more  detailed  information about the process
 memory usage. Its seven fields are explained in Table 1-3.  The stat file
-contains details information about the process itself.  Its fields are
+contains detailed information about the process itself.  Its fields are
 explained in Table 1-4.
 
 (for SMP CONFIG users)
@@ -228,7 +233,7 @@ asynchronous manner and the value may not be very precise. To see a precise
 snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table.
 It's slow but very precise.
 
-.. table:: Table 1-2: Contents of the status files (as of 4.19)
+.. table:: Table 1-2: Contents of the status fields (as of 4.19)
 
  ==========================  ===================================================
  Field                       Content
@@ -242,7 +247,8 @@ It's slow but very precise.
  Ngid                        NUMA group ID (0 if none)
  Pid                         process id
  PPid                        process id of the parent process
- TracerPid                   PID of process tracing this process (0 if not)
+ TracerPid                   PID of process tracing this process (0 if not, or
+                             the tracer is outside of the current pid namespace)
  Uid                         Real, effective, saved set, and  file system UIDs
  Gid                         Real, effective, saved set, and  file system GIDs
  FDSize                      number of file descriptor slots currently allocated
@@ -251,6 +257,7 @@ It's slow but very precise.
  NSpid                       descendant namespace process ID hierarchy
  NSpgid                      descendant namespace process group ID hierarchy
  NSsid                       descendant namespace session ID hierarchy
+ Kthread                     kernel thread flag, 1 is yes, 0 is no
  VmPeak                      peak virtual memory size
  VmSize                      total program size
  VmLck                       locked memory size
@@ -290,6 +297,7 @@ It's slow but very precise.
  NoNewPrivs                  no_new_privs, like prctl(PR_GET_NO_NEW_PRIV, ...)
  Seccomp                     seccomp mode, like prctl(PR_GET_SECCOMP, ...)
  Speculation_Store_Bypass    speculative store bypass mitigation status
+ SpeculationIndirectBranch   indirect branch speculation mode
  Cpus_allowed                mask of CPUs on which this process may run
  Cpus_allowed_list           Same as previous, but in "list format"
  Mems_allowed                mask of memory nodes allowed to this process
@@ -299,7 +307,7 @@ It's slow but very precise.
  ==========================  ===================================================
 
 
-.. table:: Table 1-3: Contents of the statm files (as of 2.6.8-rc3)
+.. table:: Table 1-3: Contents of the statm fields (as of 2.6.8-rc3)
 
  ======== ===============================	==============================
  Field    Content
@@ -317,7 +325,7 @@ It's slow but very precise.
  ======== ===============================	==============================
 
 
-.. table:: Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
+.. table:: Table 1-4: Contents of the stat fields (as of 2.6.30-rc7)
 
   ============= ===============================================================
   Field         Content
@@ -422,12 +430,16 @@ with the memory region, as the case would be with BSS (uninitialized data).
 The "pathname" shows the name associated file for this mapping.  If the mapping
 is not associated with a file:
 
- =======                    ====================================
+ ===================        ===========================================
  [heap]                     the heap of the program
  [stack]                    the stack of the main process
  [vdso]                     the "virtual dynamic shared object",
                             the kernel system call handler
- =======                    ====================================
+ [anon:<name>]              a private anonymous mapping that has been
+                            named by userspace
+ [anon_shmem:<name>]        an anonymous shared memory mapping that has
+                            been named by userspace
+ ===================        ===========================================
 
  or if empty, the mapping is anonymous.
 
@@ -442,12 +454,14 @@ Memory Area, or VMA) there is a series of lines such as the following::
     MMUPageSize:           4 kB
     Rss:                 892 kB
     Pss:                 374 kB
+    Pss_Dirty:             0 kB
     Shared_Clean:        892 kB
     Shared_Dirty:          0 kB
     Private_Clean:         0 kB
     Private_Dirty:         0 kB
     Referenced:          892 kB
     Anonymous:             0 kB
+    KSM:                   0 kB
     LazyFree:              0 kB
     AnonHugePages:         0 kB
     ShmemPmdMapped:        0 kB
@@ -473,7 +487,9 @@ dirty shared and private pages in the mapping.
 The "proportional set size" (PSS) of a process is the count of pages it has
 in memory, where each page is divided by the number of processes sharing it.
 So if a process has 1000 pages all to itself, and 1000 shared with one other
-process, its PSS will be 1500.
+process, its PSS will be 1500.  "Pss_Dirty" is the portion of PSS which
+consists of dirty pages.  ("Pss_Clean" is not included, but it can be
+calculated by subtracting "Pss_Dirty" from "Pss".)
 
 Note that even a page which is part of a MAP_SHARED mapping, but has only
 a single pte mapped, i.e.  is currently used by only one process, is accounted
@@ -486,18 +502,21 @@ accessed.
 a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
 and a page is modified, the file page is replaced by a private anonymous copy.
 
+"KSM" reports how many of the pages are KSM pages. Note that KSM-placed zeropages
+are not included, only actual KSM pages.
+
 "LazyFree" shows the amount of memory which is marked by madvise(MADV_FREE).
 The memory isn't freed immediately with madvise(). It's freed in memory
 pressure if the memory is clean. Please note that the printed value might
 be lower than the real value due to optimizations used in the current
 implementation. If this is not desirable please file a bug report.
 
-"AnonHugePages" shows the ammount of memory backed by transparent hugepage.
+"AnonHugePages" shows the amount of memory backed by transparent hugepage.
 
-"ShmemPmdMapped" shows the ammount of shared (shmem/tmpfs) memory backed by
+"ShmemPmdMapped" shows the amount of shared (shmem/tmpfs) memory backed by
 huge pages.
 
-"Shared_Hugetlb" and "Private_Hugetlb" show the ammounts of memory backed by
+"Shared_Hugetlb" and "Private_Hugetlb" show the amounts of memory backed by
 hugetlbfs page which is *not* counted in "RSS" or "PSS" field for historical
 reasons. And these are not included in {Shared,Private}_{Clean,Dirty} field.
 
@@ -508,8 +527,10 @@ replaced by copy-on-write) part of the underlying shmem object out on swap.
 "SwapPss" shows proportional swap share of this mapping. Unlike "Swap", this
 does not take into account swapped out page of underlying shmem objects.
 "Locked" indicates whether the mapping is locked in memory or not.
-"THPeligible" indicates whether the mapping is eligible for allocating THP
-pages - 1 if true, 0 otherwise. It just shows the current status.
+
+"THPeligible" indicates whether the mapping is eligible for allocating
+naturally aligned THP pages of any currently enabled size. 1 if true, 0
+otherwise.
 
 "VmFlags" field deserves a separate description. This member represents the
 kernel flags associated with the particular virtual memory area in two letter
@@ -536,13 +557,20 @@ encoded manner. The codes are the following:
     ac    area is accountable
     nr    swap space is not reserved for the area
     ht    area uses huge tlb pages
+    sf    synchronous page fault
     ar    architecture specific flag
+    wf    wipe on fork
     dd    do not include area into core dump
     sd    soft dirty flag
     mm    mixed map area
     hg    huge page advise flag
     nh    no huge page advise flag
-    mg    mergable advise flag
+    mg    mergeable advise flag
+    bt    arm64 BTI guarded page
+    mt    arm64 MTE allocation tags are enabled
+    um    userfaultfd missing tracking
+    uw    userfaultfd wr-protect tracking
+    ss    shadow stack page
     ==    =======================================
 
 Note that there is no guarantee that every flag and associated mnemonic will
@@ -661,9 +689,15 @@ files are there, and which are missing.
  File         Content
  ============ ===============================================================
  apm          Advanced power management info
+ bootconfig   Kernel command line obtained from boot config,
+ 	      and, if there were kernel parameters from the
+	      boot loader, a "# Parameters from bootloader:"
+	      line followed by a line containing those
+	      parameters prefixed by "# ".			(5.5)
  buddyinfo    Kernel memory allocator information (see text)	(2.5)
  bus          Directory containing bus specific information
- cmdline      Kernel command line
+ cmdline      Kernel command line, both from bootloader and embedded
+              in the kernel image
  cpuinfo      Info about the CPU
  devices      Available devices (block and character)
  dma          Used DMS channels
@@ -681,7 +715,14 @@ files are there, and which are missing.
  kcore        Kernel core image (can be ELF or A.OUT(deprecated in 2.4))
  kmsg         Kernel messages
  ksyms        Kernel symbol table
- loadavg      Load average of last 1, 5 & 15 minutes
+ loadavg      Load average of last 1, 5 & 15 minutes;
+                number of processes currently runnable (running or on ready queue);
+                total number of processes in system;
+                last pid created.
+                All fields are separated by one space except "number of
+                processes currently runnable" and "total number of processes
+                in system", which are separated by a slash ('/'). Example:
+                0.61 0.61 0.55 3/828 22084
  locks        Kernel locks
  meminfo      Memory info
  misc         Miscellaneous
@@ -779,7 +820,7 @@ SPU
   For this case the APIC will generate the interrupt with a IRQ vector
   of 0xff. This might also be generated by chipset bugs.
 
-RES, CAL, TLB]
+RES, CAL, TLB
   rescheduling, call and TLB flush interrupts are
   sent from one CPU to another per the needs of the OS.  Typically,
   their statistics are used by kernel developers and interested users to
@@ -791,7 +832,7 @@ suppressed when the system is a uniprocessor.  As of this writing, only
 i386 and x86_64 platforms support the new IRQ vector displays.
 
 Of some interest is the introduction of the /proc/irq directory to 2.4.
-It could be used to set IRQ to CPU affinity, this means that you can "hook" an
+It could be used to set IRQ to CPU affinity. This means that you can "hook" an
 IRQ to only one CPU, or to exclude a CPU of handling IRQs. The contents of the
 irq subdir is one subdir for each IRQ, and two files; default_smp_affinity and
 prof_cpu_mask.
@@ -805,7 +846,7 @@ For example::
   smp_affinity
 
 smp_affinity is a bitmask, in which you can specify which CPUs can handle the
-IRQ, you can set it by doing::
+IRQ. You can set it by doing::
 
   > echo 1 > /proc/irq/10/smp_affinity
 
@@ -818,7 +859,7 @@ The contents of each smp_affinity file is the same by default::
   ffffffff
 
 There is an alternate interface, smp_affinity_list which allows specifying
-a cpu range instead of a bitmask::
+a CPU range instead of a bitmask::
 
   > cat /proc/irq/0/smp_affinity_list
   1024-1031
@@ -832,7 +873,7 @@ reports itself as being attached. This hardware locality information does not
 include information about any possible driver locality preference.
 
 prof_cpu_mask specifies which CPUs are to be profiled by the system wide
-profiler. Default value is ffffffff (all cpus if there are only 32 of them).
+profiler. Default value is ffffffff (all CPUs if there are only 32 of them).
 
 The way IRQs are routed is handled by the IO-APIC, and it's Round Robin
 between all the CPUs which are allowed to handle it. As usual the kernel has
@@ -894,7 +935,7 @@ pagetypeinfo::
 
 Fragmentation avoidance in the kernel works by grouping pages of different
 migrate types into the same contiguous regions of memory called page blocks.
-A page block is typically the size of the default hugepage size e.g. 2MB on
+A page block is typically the size of the default hugepage size, e.g. 2MB on
 X86-64. By keeping pages grouped based on their ability to move, the kernel
 can reclaim pages within a page block to satisfy a high-order allocation.
 
@@ -916,56 +957,82 @@ meminfo
 ~~~~~~~
 
 Provides information about distribution and utilization of memory.  This
-varies by architecture and compile options.  The following is from a
-16GB PIII, which has highmem enabled.  You may not have all of these fields.
+varies by architecture and compile options.  Some of the counters reported
+here overlap.  The memory reported by the non-overlapping counters may not
+add up to the overall memory usage and the difference for some workloads
+can be substantial.  In many cases there are other means to find out
+about additional memory usage via subsystem-specific interfaces, for
+instance /proc/net/sockstat for TCP memory allocations.
+
+Example output. You may not have all of these fields.
 
 ::
 
     > cat /proc/meminfo
 
-    MemTotal:     16344972 kB
-    MemFree:      13634064 kB
-    MemAvailable: 14836172 kB
-    Buffers:          3656 kB
-    Cached:        1195708 kB
-    SwapCached:          0 kB
-    Active:         891636 kB
-    Inactive:      1077224 kB
-    HighTotal:    15597528 kB
-    HighFree:     13629632 kB
-    LowTotal:       747444 kB
-    LowFree:          4432 kB
-    SwapTotal:           0 kB
-    SwapFree:            0 kB
-    Dirty:             968 kB
-    Writeback:           0 kB
-    AnonPages:      861800 kB
-    Mapped:         280372 kB
-    Shmem:             644 kB
-    KReclaimable:   168048 kB
-    Slab:           284364 kB
-    SReclaimable:   159856 kB
-    SUnreclaim:     124508 kB
-    PageTables:      24448 kB
-    NFS_Unstable:        0 kB
-    Bounce:              0 kB
-    WritebackTmp:        0 kB
-    CommitLimit:   7669796 kB
-    Committed_AS:   100056 kB
-    VmallocTotal:   112216 kB
-    VmallocUsed:       428 kB
-    VmallocChunk:   111088 kB
-    Percpu:          62080 kB
-    HardwareCorrupted:   0 kB
-    AnonHugePages:   49152 kB
-    ShmemHugePages:      0 kB
-    ShmemPmdMapped:      0 kB
+    MemTotal:       32858820 kB
+    MemFree:        21001236 kB
+    MemAvailable:   27214312 kB
+    Buffers:          581092 kB
+    Cached:          5587612 kB
+    SwapCached:            0 kB
+    Active:          3237152 kB
+    Inactive:        7586256 kB
+    Active(anon):      94064 kB
+    Inactive(anon):  4570616 kB
+    Active(file):    3143088 kB
+    Inactive(file):  3015640 kB
+    Unevictable:           0 kB
+    Mlocked:               0 kB
+    SwapTotal:             0 kB
+    SwapFree:              0 kB
+    Zswap:              1904 kB
+    Zswapped:           7792 kB
+    Dirty:                12 kB
+    Writeback:             0 kB
+    AnonPages:       4654780 kB
+    Mapped:           266244 kB
+    Shmem:              9976 kB
+    KReclaimable:     517708 kB
+    Slab:             660044 kB
+    SReclaimable:     517708 kB
+    SUnreclaim:       142336 kB
+    KernelStack:       11168 kB
+    PageTables:        20540 kB
+    SecPageTables:         0 kB
+    NFS_Unstable:          0 kB
+    Bounce:                0 kB
+    WritebackTmp:          0 kB
+    CommitLimit:    16429408 kB
+    Committed_AS:    7715148 kB
+    VmallocTotal:   34359738367 kB
+    VmallocUsed:       40444 kB
+    VmallocChunk:          0 kB
+    Percpu:            29312 kB
+    EarlyMemtestBad:       0 kB
+    HardwareCorrupted:     0 kB
+    AnonHugePages:   4149248 kB
+    ShmemHugePages:        0 kB
+    ShmemPmdMapped:        0 kB
+    FileHugePages:         0 kB
+    FilePmdMapped:         0 kB
+    CmaTotal:              0 kB
+    CmaFree:               0 kB
+    HugePages_Total:       0
+    HugePages_Free:        0
+    HugePages_Rsvd:        0
+    HugePages_Surp:        0
+    Hugepagesize:       2048 kB
+    Hugetlb:               0 kB
+    DirectMap4k:      401152 kB
+    DirectMap2M:    10008576 kB
+    DirectMap1G:    24117248 kB
 
 MemTotal
-              Total usable ram (i.e. physical ram minus a few reserved
+              Total usable RAM (i.e. physical RAM minus a few reserved
               bits and the kernel binary code)
 MemFree
-              The sum of LowFree+HighFree
+              Total free RAM. On highmem systems, the sum of LowFree+HighFree
 MemAvailable
               An estimate of how much memory is available for starting new
               applications, without swapping. Calculated from MemFree,
@@ -979,8 +1046,9 @@ Buffers
               Relatively temporary storage for raw disk blocks
               shouldn't get tremendously large (20MB or so)
 Cached
-              in-memory cache for files read from the disk (the
-              pagecache).  Doesn't include SwapCached
+              In-memory cache for files read from the disk (the
+              pagecache) as well as tmpfs & shmem.
+              Doesn't include SwapCached.
 SwapCached
               Memory that once was swapped out, is swapped back in but
               still also is in the swapfile (if memory is needed it
@@ -992,8 +1060,13 @@ Active
 Inactive
               Memory which has been less recently used.  It is more
               eligible to be reclaimed for other purposes
+Unevictable
+              Memory allocated for userspace which cannot be reclaimed, such
+              as mlocked pages, ramfs backing pages, secret memfd pages etc.
+Mlocked
+              Memory locked with mlock().
 HighTotal, HighFree
-              Highmem is all memory above ~860MB of physical memory
+              Highmem is all memory above ~860MB of physical memory.
               Highmem areas are for use by userspace programs, or
               for the pagecache.  The kernel must use tricks to access
               this memory, making it slower to access than lowmem.
@@ -1008,26 +1081,20 @@ SwapTotal
 SwapFree
               Memory which has been evicted from RAM, and is temporarily
               on the disk
+Zswap
+              Memory consumed by the zswap backend (compressed size)
+Zswapped
+              Amount of anonymous memory stored in zswap (original size)
 Dirty
               Memory which is waiting to get written back to the disk
 Writeback
               Memory which is actively being written back to the disk
 AnonPages
               Non-file backed pages mapped into userspace page tables
-HardwareCorrupted
-              The amount of RAM/memory in KB, the kernel identifies as
-	      corrupted.
-AnonHugePages
-              Non-file backed huge pages mapped into userspace page tables
 Mapped
-              files which have been mmaped, such as libraries
+              files which have been mmapped, such as libraries
 Shmem
               Total memory used by shared memory (shmem) and tmpfs
-ShmemHugePages
-              Memory used by shared memory (shmem) and tmpfs allocated
-              with huge pages
-ShmemPmdMapped
-              Shared memory mapped into userspace with huge pages
 KReclaimable
               Kernel allocations that the kernel will attempt to reclaim
               under memory pressure. Includes SReclaimable (below), and other
@@ -1038,12 +1105,16 @@ SReclaimable
               Part of Slab, that might be reclaimed, such as caches
 SUnreclaim
               Part of Slab, that cannot be reclaimed on memory pressure
+KernelStack
+              Memory consumed by the kernel stacks of all tasks
 PageTables
-              amount of memory dedicated to the lowest level of page
-              tables.
+              Memory consumed by userspace page tables
+SecPageTables
+              Memory consumed by secondary page tables. This currently
+              includes KVM MMU allocations on x86 and arm64.
 NFS_Unstable
-              NFS pages sent to the server, but not yet committed to stable
-	      storage
+              Always zero. Previously counted pages which had been written to
+              the server, but had not been committed to stable storage.
 Bounce
               Memory used for block device "bounce buffers"
 WritebackTmp
@@ -1065,23 +1136,23 @@ CommitLimit
               yield a CommitLimit of 7.3G.
 
               For more details, see the memory overcommit documentation
-              in vm/overcommit-accounting.
+              in mm/overcommit-accounting.
 Committed_AS
               The amount of memory presently allocated on the system.
               The committed memory is a sum of all of the memory which
               has been allocated by processes, even if it has not been
               "used" by them as of yet. A process which malloc()'s 1G
               of memory, but only touches 300M of it will show up as
-	      using 1G. This 1G is memory which has been "committed" to
+              using 1G. This 1G is memory which has been "committed" to
               by the VM and can be used at any time by the allocating
               application. With strict overcommit enabled on the system
-              (mode 2 in 'vm.overcommit_memory'),allocations which would
+              (mode 2 in 'vm.overcommit_memory'), allocations which would
               exceed the CommitLimit (detailed above) will not be permitted.
               This is useful if one needs to guarantee that processes will
               not fail due to lack of memory once that memory has been
               successfully allocated.
 VmallocTotal
-              total size of vmalloc memory area
+              total size of vmalloc virtual address space
 VmallocUsed
               amount of vmalloc area which is used
 VmallocChunk
@@ -1089,6 +1160,37 @@ VmallocChunk
 Percpu
               Memory allocated to the percpu allocator used to back percpu
               allocations. This stat excludes the cost of metadata.
+EarlyMemtestBad
+              The amount of RAM/memory in kB that was identified as corrupted
+              by early memtest. If memtest was not run, this field will not
+              be displayed at all. Size is never rounded down to 0 kB.
+              That means if 0 kB is reported, you can safely assume
+              there was at least one pass of memtest and none of the passes
+              found a single faulty byte of RAM.
+HardwareCorrupted
+              The amount of RAM/memory in kB that the kernel identifies as
+              corrupted.
+AnonHugePages
+              Non-file backed huge pages mapped into userspace page tables
+ShmemHugePages
+              Memory used by shared memory (shmem) and tmpfs allocated
+              with huge pages
+ShmemPmdMapped
+              Shared memory mapped into userspace with huge pages
+FileHugePages
+              Memory used for filesystem data (page cache) allocated
+              with huge pages
+FilePmdMapped
+              Page cache mapped into userspace with huge pages
+CmaTotal
+              Memory reserved for the Contiguous Memory Allocator (CMA)
+CmaFree
+              Free remaining memory in the CMA reserves
+HugePages_Total, HugePages_Free, HugePages_Rsvd, HugePages_Surp, Hugepagesize, Hugetlb
+              See Documentation/admin-guide/mm/hugetlbpage.rst.
+DirectMap4k, DirectMap2M, DirectMap1G
+              Breakdown of page table sizes used in the kernel's
+              identity mapping of RAM
 
 vmallocinfo
 ~~~~~~~~~~~
@@ -1096,7 +1198,7 @@ vmallocinfo
 Provides information about vmalloced/vmaped areas. One line per area,
 containing the virtual address range of the area, size in bytes,
 caller information of the creator, and optional information depending
-on the kind of area :
+on the kind of area:
 
  ==========  ===================================================
  pages=nr    number of pages
@@ -1141,101 +1243,23 @@ on the kind of area :
 softirqs
 ~~~~~~~~
 
-Provides counts of softirq handlers serviced since boot time, for each cpu.
+Provides counts of softirq handlers serviced since boot time, for each CPU.
 
 ::
 
     > cat /proc/softirqs
-		    CPU0       CPU1       CPU2       CPU3
+		  CPU0       CPU1       CPU2       CPU3
 	HI:          0          0          0          0
-    TIMER:      27166      27120      27097      27034
+    TIMER:       27166      27120      27097      27034
     NET_TX:          0          0          0         17
     NET_RX:         42          0          0         39
-    BLOCK:          0          0        107       1121
-    TASKLET:          0          0          0        290
-    SCHED:      27035      26983      26971      26746
-    HRTIMER:          0          0          0          0
-	RCU:       1678       1769       2178       2250
-
-
-1.3 IDE devices in /proc/ide
-----------------------------
-
-The subdirectory /proc/ide contains information about all IDE devices of which
-the kernel  is  aware.  There is one subdirectory for each IDE controller, the
-file drivers  and a link for each IDE device, pointing to the device directory
-in the controller specific subtree.
-
-The file  drivers  contains general information about the drivers used for the
-IDE devices::
-
-  > cat /proc/ide/drivers
-  ide-cdrom version 4.53
-  ide-disk version 1.08
-
-More detailed  information  can  be  found  in  the  controller  specific
-subdirectories. These  are  named  ide0,  ide1  and  so  on.  Each  of  these
-directories contains the files shown in table 1-6.
-
-
-.. table:: Table 1-6: IDE controller info in  /proc/ide/ide?
-
- ======= =======================================
- File    Content
- ======= =======================================
- channel IDE channel (0 or 1)
- config  Configuration (only for PCI/IDE bridge)
- mate    Mate name
- model   Type/Chipset of IDE controller
- ======= =======================================
-
-Each device  connected  to  a  controller  has  a separate subdirectory in the
-controllers directory.  The  files  listed in table 1-7 are contained in these
-directories.
-
-
-.. table:: Table 1-7: IDE device information
-
- ================ ==========================================
- File             Content
- ================ ==========================================
- cache            The cache
- capacity         Capacity of the medium (in 512Byte blocks)
- driver           driver and version
- geometry         physical and logical geometry
- identify         device identify block
- media            media type
- model            device identifier
- settings         device setup
- smart_thresholds IDE disk management thresholds
- smart_values     IDE disk management values
- ================ ==========================================
-
-The most  interesting  file is ``settings``. This file contains a nice
-overview of the drive parameters::
-
-  # cat /proc/ide/ide0/hda/settings
-  name                    value           min             max             mode
-  ----                    -----           ---             ---             ----
-  bios_cyl                526             0               65535           rw
-  bios_head               255             0               255             rw
-  bios_sect               63              0               63              rw
-  breada_readahead        4               0               127             rw
-  bswap                   0               0               1               r
-  file_readahead          72              0               2097151         rw
-  io_32bit                0               0               3               rw
-  keepsettings            0               0               1               rw
-  max_kb_per_request      122             1               127             rw
-  multcount               0               0               8               rw
-  nice1                   1               0               1               rw
-  nowerr                  0               0               1               rw
-  pio_mode                write-only      0               255             w
-  slow                    0               0               1               rw
-  unmaskirq               0               0               1               rw
-  using_dma               0               0               1               rw
-
-
-1.4 Networking info in /proc/net
+    BLOCK:           0          0        107       1121
+    TASKLET:         0          0          0        290
+    SCHED:       27035      26983      26971      26746
+    HRTIMER:         0          0          0          0
+	RCU:      1678       1769       2178       2250
+
+1.3 Networking info in /proc/net
 --------------------------------
 
 The subdirectory  /proc/net  follows  the  usual  pattern. Table 1-8 shows the
@@ -1281,6 +1305,7 @@ support this. Table 1-9 lists the files and their meaning.
  rt_cache      Routing cache
  snmp          SNMP data
  sockstat      Socket statistics
+ softnet_stat  Per-CPU incoming packet queue statistics of online CPUs
  tcp           TCP  sockets
  udp           UDP sockets
  unix          UNIX domain sockets
@@ -1314,12 +1339,12 @@ It will contain information that is specific to that bond, such as the
 current slaves of the bond, the link status of the slaves, and how
 many times the slaves link has failed.
 
-1.5 SCSI info
+1.4 SCSI info
 -------------
 
-If you  have  a  SCSI  host adapter in your system, you'll find a subdirectory
-named after  the driver for this adapter in /proc/scsi. You'll also see a list
-of all recognized SCSI devices in /proc/scsi::
+If you have a SCSI or ATA host adapter in your system, you'll find a
+subdirectory named after the driver for this adapter in /proc/scsi.
+You'll also see a list of all recognized SCSI devices in /proc/scsi::
 
   >cat /proc/scsi/scsi
   Attached devices:
@@ -1377,7 +1402,7 @@ AHA-2940 SCSI adapter::
     Total transfers 0 (0 reads and 0 writes)
 
 
-1.6 Parallel port info in /proc/parport
+1.5 Parallel port info in /proc/parport
 ---------------------------------------
 
 The directory  /proc/parport  contains information about the parallel ports of
@@ -1402,11 +1427,11 @@ These directories contain the four files shown in Table 1-10.
            number or none).
  ========= ====================================================================
 
-1.7 TTY info in /proc/tty
+1.6 TTY info in /proc/tty
 -------------------------
 
 Information about  the  available  and actually used tty's can be found in the
-directory /proc/tty.You'll  find  entries  for drivers and line disciplines in
+directory /proc/tty. You'll find  entries  for drivers and line disciplines in
 this directory, as shown in Table 1-11.
 
 
@@ -1437,7 +1462,7 @@ To see  which  tty's  are  currently in use, you can simply look into the file
   unknown              /dev/tty        4    1-63 console
 
 
-1.8 Miscellaneous kernel statistics in /proc/stat
+1.7 Miscellaneous kernel statistics in /proc/stat
 -------------------------------------------------
 
 Various pieces   of  information about  kernel activity  are  available in the
@@ -1445,16 +1470,18 @@ Various pieces   of  information about  kernel activity  are  available in the
 since the system first booted.  For a quick look, simply cat the file::
 
   > cat /proc/stat
-  cpu  2255 34 2290 22625563 6290 127 456 0 0 0
-  cpu0 1132 34 1441 11311718 3675 127 438 0 0 0
-  cpu1 1123 0 849 11313845 2614 0 18 0 0 0
-  intr 114930548 113199788 3 0 5 263 0 4 [... lots more numbers ...]
-  ctxt 1990473
-  btime 1062191376
-  processes 2915
-  procs_running 1
+  cpu  237902850 368826709 106375398 1873517540 1135548 0 14507935 0 0 0
+  cpu0 60045249 91891769 26331539 468411416 495718 0 5739640 0 0 0
+  cpu1 59746288 91759249 26609887 468860630 312281 0 4384817 0 0 0
+  cpu2 59489247 92985423 26904446 467808813 171668 0 2268998 0 0 0
+  cpu3 58622065 92190267 26529524 468436680 155879 0 2114478 0 0 0
+  intr 8688370575 8 3373 0 0 0 0 0 0 1 40791 0 0 353317 0 0 0 0 224789828 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 190974333 41958554 123983334 43 0 224593 0 0 0 <more 0's deleted>
+  ctxt 22848221062
+  btime 1605316999
+  processes 746787147
+  procs_running 2
   procs_blocked 0
-  softirq 183433 0 21755 12 39 1137 231 21459 2263
+  softirq 12121874454 100099120 3938138295 127375644 2795979 187870761 0 173808342 3072582055 52608 224184354
 
 The very first  "cpu" line aggregates the  numbers in all  of the other "cpuN"
 lines.  These numbers identify the amount of time the CPU has spent performing
@@ -1468,9 +1495,9 @@ second).  The meanings of the columns are as follows, from left to right:
 - iowait: In a word, iowait stands for waiting for I/O to complete. But there
   are several problems:
 
-  1. Cpu will not wait for I/O to complete, iowait is the time that a task is
-     waiting for I/O to complete. When cpu goes into idle state for
-     outstanding task io, another task will be scheduled on this CPU.
+  1. The CPU does not wait for I/O to complete; iowait is the time that a task
+     is waiting for I/O to complete. When a CPU goes idle while a task has
+     outstanding I/O, another task will be scheduled on this CPU.
   2. In a multi-core CPU, the task waiting for I/O to complete is not running
      on any CPU, so the iowait of each CPU is difficult to calculate.
   3. The value of iowait field in /proc/stat will decrease in certain
@@ -1510,14 +1537,14 @@ softirqs serviced; each subsequent column is the total for that particular
 softirq.
 
 
-1.9 Ext4 file system parameters
+1.8 Ext4 file system parameters
 -------------------------------
 
 Information about mounted ext4 file systems can be found in
 /proc/fs/ext4.  Each mounted filesystem will have a directory in
-/proc/fs/ext4 based on its device name (i.e., /proc/fs/ext4/hdc or
-/proc/fs/ext4/dm-0).   The files in each per-device directory are shown
-in Table 1-12, below.
+/proc/fs/ext4 based on its device name (e.g., /proc/fs/ext4/hdc or
+/proc/fs/ext4/sda9 or /proc/fs/ext4/dm-0).   The files in each per-device
+directory are shown in Table 1-12, below.
 
 .. table:: Table 1-12: Files in /proc/fs/ext4/<devname>
 
@@ -1526,8 +1553,8 @@ in Table 1-12, below.
  mb_groups       details of multiblock allocator buddy cache of free blocks
  ==============  ==========================================================
 
-2.0 /proc/consoles
-------------------
+1.9 /proc/consoles
+-------------------
 Shows registered system console lines.
 
 To see which character device lines are currently used for the system console
@@ -1587,10 +1614,9 @@ production system.  Set  up  a  development machine and test to make sure that
 everything works  the  way  you want it to. You may have no alternative but to
 reboot the machine once an error has been made.
 
-To change  a  value,  simply  echo  the new value into the file. An example is
-given below  in the section on the file system data. You need to be root to do
-this. You  can  create  your  own  boot script to perform this every time your
-system boots.
+To change  a  value,  simply  echo  the new value into the file.
+You need to be root to do this. You  can  create  your  own  boot script
+to perform this every time your system boots.
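The same echo-into-the-file approach can be sketched in Python. The mapping from a dotted name such as kernel.pid_max to /proc/sys/kernel/pid_max is the convention sysctl(8) uses; the helper names below are hypothetical, names whose components themselves contain dots are not handled, and writing still requires root.

```python
from pathlib import Path


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name such as 'kernel.pid_max' to its file."""
    # Simplification: a component that itself contains a dot (e.g. some
    # interface names under net.ipv4.conf) cannot be expressed this way.
    return Path(root, *name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value as text, like 'cat /proc/sys/...'."""
    return sysctl_path(name, root).read_text().strip()


def write_sysctl(name, value, root="/proc/sys"):
    """Equivalent of 'echo VALUE > /proc/sys/...'; normally needs root."""
    sysctl_path(name, root).write_text(f"{value}\n")
```

For example, read_sysctl("kernel.pid_max") returns the same string that cat /proc/sys/kernel/pid_max prints.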
 
 The files  in /proc/sys can be used to fine tune and monitor miscellaneous and
 general things  in  the operation of the Linux kernel. Since some of the files
@@ -1598,12 +1624,12 @@ can inadvertently  disrupt  your  system,  it  is  advisable  to  read  both
 documentation and  source  before actually making adjustments. In any case, be
 very careful  when  writing  to  any  of these files. The entries in /proc may
 change slightly between the 2.1.* and the 2.2 kernel, so if there is any doubt
-review the kernel documentation in the directory /usr/src/linux/Documentation.
+review the kernel documentation in the directory linux/Documentation.
 This chapter  is  heavily  based  on the documentation included in the pre 2.2
 kernels, and became part of it in version 2.2.1 of the Linux kernel.
 
-Please see: Documentation/admin-guide/sysctl/ directory for descriptions of these
-entries.
+Please see the Documentation/admin-guide/sysctl/ directory for descriptions of
+these entries.
 
 Summary
 -------
@@ -1621,8 +1647,8 @@ Chapter 3: Per-process Parameters
 3.1 /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj- Adjust the oom-killer score
 --------------------------------------------------------------------------------
 
-These file can be used to adjust the badness heuristic used to select which
-process gets killed in out of memory conditions.
+These files can be used to adjust the badness heuristic used to select which
+process gets killed in out of memory (oom) conditions.
 
 The badness heuristic assigns a value to each candidate task ranging from 0
 (never kill) to 1000 (always kill) to determine which process is targeted.  The
@@ -1631,9 +1657,6 @@ may allocate from based on an estimation of its current memory and swap use.
 For example, if a task is using all allowed memory, its badness score will be
 1000.  If it is using half of its allowed memory, its score will be 500.
 
-There is an additional factor included in the badness score: the current memory
-and swap usage is discounted by 3% for root processes.
-
 The amount of "allowed" memory depends on the context in which the oom killer
 was called.  If it is due to the memory assigned to the allocating task's cpuset
 being exhausted, the allowed memory represents the set of mems assigned to that
@@ -1669,24 +1692,22 @@ The value of /proc/<pid>/oom_score_adj may be reduced no lower than the last
 value set by a CAP_SYS_RESOURCE process. To reduce the value any lower
 requires CAP_SYS_RESOURCE.
 
-Caveat: when a parent task is selected, the oom killer will sacrifice any first
-generation children with separate address spaces instead, if possible.  This
-avoids servers and important system daemons from being killed and loses the
-minimal amount of work.
-
 
 3.2 /proc/<pid>/oom_score - Display current oom-killer score
 -------------------------------------------------------------
 
-This file can be used to check the current score used by the oom-killer is for
+This file can be used to check the current score used by the oom-killer for
 any given <pid>. Use it together with /proc/<pid>/oom_score_adj to tune which
 process should be killed in an out-of-memory situation.
 
+Please note that the exported value includes oom_score_adj, so it is
+effectively in the range [0, 2000].
+
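When tuning, it can help to read both files together before and after a change. A minimal Python sketch; the helper name is hypothetical and pid defaults to "self", i.e. the calling process.

```python
def read_oom(pid="self"):
    """Return (oom_score, oom_score_adj) for a process."""
    with open(f"/proc/{pid}/oom_score") as f:
        score = int(f.read())   # already includes oom_score_adj: 0..2000
    with open(f"/proc/{pid}/oom_score_adj") as f:
        adj = int(f.read())     # user-supplied adjustment: -1000..1000
    return score, adj
```

For example, after writing a positive value to the oom_score_adj of a test process, the oom_score read back for it is expected to rise accordingly.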
 
 3.3  /proc/<pid>/io - Display the IO accounting fields
 -------------------------------------------------------
 
-This file contains IO statistics for each running process
+This file contains IO statistics for each running process.
 
 Example
 ~~~~~~~
@@ -1717,7 +1738,7 @@ The number of bytes which this task has caused to be read from storage. This
 is simply the sum of bytes which this process passed to read() and pread().
 It includes things like tty IO and it is unaffected by whether or not actual
 physical disk IO was required (the read might have been satisfied from
-pagecache)
+pagecache).
 
 
 wchar
@@ -1839,19 +1860,19 @@ For example::
 This file contains lines of the form::
 
     36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
-    (1)(2)(3)   (4)   (5)      (6)      (7)   (8) (9)   (10)         (11)
-
-    (1) mount ID:  unique identifier of the mount (may be reused after umount)
-    (2) parent ID:  ID of parent (or of self for the top of the mount tree)
-    (3) major:minor:  value of st_dev for files on filesystem
-    (4) root:  root of the mount within the filesystem
-    (5) mount point:  mount point relative to the process's root
-    (6) mount options:  per mount options
-    (7) optional fields:  zero or more fields of the form "tag[:value]"
-    (8) separator:  marks the end of the optional fields
-    (9) filesystem type:  name of filesystem of the form "type[.subtype]"
-    (10) mount source:  filesystem specific information or "none"
-    (11) super options:  per super block options
+    (1)(2)(3)   (4)   (5)      (6)     (7…m) (m+1)(m+2) (m+3)         (m+4)
+
+    (1)   mount ID:        unique identifier of the mount (may be reused after umount)
+    (2)   parent ID:       ID of parent (or of self for the top of the mount tree)
+    (3)   major:minor:     value of st_dev for files on filesystem
+    (4)   root:            root of the mount within the filesystem
+    (5)   mount point:     mount point relative to the process's root
+    (6)   mount options:   per mount options
+    (7…m) optional fields: zero or more fields of the form "tag[:value]"
+    (m+1) separator:       marks the end of the optional fields
+    (m+2) filesystem type: name of filesystem of the form "type[.subtype]"
+    (m+3) mount source:    filesystem specific information or "none"
+    (m+4) super options:   per super block options
 
 Parsers should ignore all unrecognised optional fields.  Currently the
 possible optional fields are:
@@ -1870,12 +1891,12 @@ unbindable        mount is unbindable
 
 For more information on mount propagation see:
 
-  Documentation/filesystems/sharedsubtree.txt
+  Documentation/filesystems/sharedsubtree.rst
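A minimal Python sketch of a mountinfo parser that follows the rules above: the first six fields are fixed, a variable number of optional fields follows, and a lone "-" separates them from the last three fields. It assumes the kernel's octal escaping of whitespace and backslashes in paths (for example "\040" for a space); the function names are hypothetical.

```python
import re

# Whitespace and backslashes in paths appear as three-digit octal escapes.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def _unescape(field):
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def parse_mountinfo_line(line):
    """Split one /proc/<pid>/mountinfo line into its documented fields."""
    fields = line.split()
    sep = fields.index("-", 6)   # optional fields sit between 6 and sep
    major, minor = fields[2].split(":")
    return {
        "mount_id": int(fields[0]),
        "parent_id": int(fields[1]),
        "major": int(major),
        "minor": int(minor),
        "root": _unescape(fields[3]),
        "mount_point": _unescape(fields[4]),
        "mount_options": fields[5],
        # Parsers should ignore tags they do not recognise.
        "optional_fields": fields[6:sep],
        "fstype": fields[sep + 1],
        "source": _unescape(fields[sep + 2]),
        "super_options": fields[sep + 3],
    }
```

Locating the separator with index() rather than assuming a fixed position is what keeps the parser working when the number of optional fields changes.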
 
 
 3.6	/proc/<pid>/comm  & /proc/<pid>/task/<tid>/comm
 --------------------------------------------------------
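-These files provide a method to access a tasks comm value. It also allows for
-a task to set its own or one of its thread siblings comm value. The comm value
-is limited in size compared to the cmdline value, so writing anything longer
-then the kernel's TASK_COMM_LEN (currently 16 chars) will result in a truncated
+These files provide a method to access a task's comm value. It also allows for
+a task to set its own or one of its thread siblings' comm value. The comm value
+is limited in size compared to the cmdline value, so writing anything longer
+than the kernel's TASK_COMM_LEN (currently 16 chars) will result in a truncated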
@@ -1888,32 +1909,34 @@ This file provides a fast way to retrieve first level children pids
-of a task pointed by <pid>/<tid> pair. The format is a space separated
+of a task pointed to by the <pid>/<tid> pair. The format is a space-separated
 stream of pids.
 
-Note the "first level" here -- if a child has own children they will
-not be listed here, one needs to read /proc/<children-pid>/task/<tid>/children
+Note the "first level" here -- if a child has its own children they will
+not be listed here; one needs to read /proc/<children-pid>/task/<tid>/children
 to obtain the descendants.
 
 Since this interface is intended to be fast and cheap it doesn't
 guarantee to provide precise results and some children might be
 skipped, especially if they've exited right after we printed their
-pids, so one need to either stop or freeze processes being inspected
+pids, so one needs to either stop or freeze processes being inspected
 if precise results are needed.
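To collect all descendants, a tool has to repeat the read for every child, and, since each thread has its own children file, for every thread of that child. The Python sketch below does this iteratively. It assumes a kernel built with CONFIG_PROC_CHILDREN (otherwise the file does not exist), the function names are hypothetical, and the reader functions are injectable so the walk can be exercised without a live process tree. As noted above, the result is only a snapshot unless the processes are stopped or frozen.

```python
import os


def _proc_tids(pid):
    return os.listdir(f"/proc/{pid}/task")


def _proc_children(pid, tid):
    try:
        with open(f"/proc/{pid}/task/{tid}/children") as f:
            return [int(p) for p in f.read().split()]
    except (FileNotFoundError, ProcessLookupError):
        return []  # the task exited between listing and reading


def descendants(pid, list_tids=_proc_tids, read_children=_proc_children):
    """Walk the process tree below pid, one 'first level' at a time."""
    found, stack = [], [pid]
    while stack:
        current = stack.pop()
        try:
            tids = list_tids(current)
        except FileNotFoundError:
            continue  # the process exited while we were walking
        for tid in tids:
            for child in read_children(current, tid):
                found.append(child)
                stack.append(child)
    return found
```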
 
 
 3.8	/proc/<pid>/fdinfo/<fd> - Information about opened file
 ---------------------------------------------------------------
 This file provides information associated with an opened file. The regular
-files have at least three fields -- 'pos', 'flags' and mnt_id. The 'pos'
-represents the current offset of the opened file in decimal form [see lseek(2)
-for details], 'flags' denotes the octal O_xxx mask the file has been
-created with [see open(2) for details] and 'mnt_id' represents mount ID of
-the file system containing the opened file [see 3.5 /proc/<pid>/mountinfo
-for details].
+files have at least four fields -- 'pos', 'flags', 'mnt_id' and 'ino'.
+The 'pos' represents the current offset of the opened file in decimal
+form [see lseek(2) for details], 'flags' denotes the octal O_xxx mask the
+file has been created with [see open(2) for details] and 'mnt_id' represents
+the mount ID of the file system containing the opened file [see 3.5
+/proc/<pid>/mountinfo for details]. 'ino' represents the inode number of
+the file.
 
 A typical output is::
 
 	pos:	0
 	flags:	0100002
 	mnt_id:	19
+	ino:	63107
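For tools that read these files, a minimal Python sketch of decoding the common header fields, with 'pos', 'mnt_id' and 'ino' parsed as decimal and 'flags' as octal, as described above. Any other line (type-specific fields such as 'eventfd-count', 'tfd' or lock entries) is returned unparsed; the function name is hypothetical.

```python
# Header fields common to all fdinfo files, with the base each is printed in.
HEADER_BASES = {"pos": 10, "flags": 8, "mnt_id": 10, "ino": 10}


def parse_fdinfo(text):
    """Return (header, extra): decoded common fields and untouched other lines."""
    header, extra = {}, []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in HEADER_BASES and key not in header:
            header[key] = int(value.strip(), HEADER_BASES[key])
        else:
            # Type-specific lines: eventfd-count, tfd, inotify, lock, ...
            extra.append(line)
    return header, extra
```

For a live descriptor, the input would be the contents of /proc/self/fdinfo/<fd>.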
 
 All locks associated with a file descriptor are shown in its fdinfo too::
 
@@ -1930,6 +1953,7 @@ Eventfd files
 	pos:	0
 	flags:	04002
 	mnt_id:	9
+	ino:	63107
 	eventfd-count:	5a
 
-where 'eventfd-count' is hex value of a counter.
+where 'eventfd-count' is the hex value of a counter.
@@ -1942,6 +1966,7 @@ Signalfd files
 	pos:	0
 	flags:	04002
 	mnt_id:	9
+	ino:	63107
 	sigmask:	0000000000000200
 
-where 'sigmask' is hex value of the signal mask associated
+where 'sigmask' is the hex value of the signal mask associated
@@ -1955,6 +1980,7 @@ Epoll files
 	pos:	0
 	flags:	02
 	mnt_id:	9
+	ino:	63107
 	tfd:        5 events:       1d data: ffffffffffffffff pos:0 ino:61af sdev:7
 
 where 'tfd' is a target file descriptor number in decimal form,
@@ -1971,9 +1997,11 @@ For inotify files the format is the following::
 
 	pos:	0
 	flags:	02000000
+	mnt_id:	9
+	ino:	63107
 	inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
 
-where 'wd' is a watch descriptor in decimal form, ie a target file
+where 'wd' is a watch descriptor in decimal form, i.e. a target file
 descriptor number, 'ino' and 'sdev' are inode and device where the
 target file resides and the 'mask' is the mask of events, all in hex
 form [see inotify(7) for more details].
@@ -1993,6 +2021,7 @@ For fanotify files the format is::
 	pos:	0
 	flags:	02
 	mnt_id:	9
+	ino:	63107
 	fanotify flags:10 event-flags:0
 	fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
 	fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4
@@ -2000,10 +2029,10 @@ For fanotify files the format is::
 where fanotify 'flags' and 'event-flags' are values used in fanotify_init
 call, 'mnt_id' is the mount point identifier, 'mflags' is the value of
 flags associated with mark which are tracked separately from events
-mask. 'ino', 'sdev' are target inode and device, 'mask' is the events
+mask. 'ino' and 'sdev' are the target inode and device, 'mask' is the events
 mask and 'ignored_mask' is the mask of events which are to be ignored.
-All in hex format. Incorporation of 'mflags', 'mask' and 'ignored_mask'
-does provide information about flags and mask used in fanotify_mark
+All are in hex format. Together, 'mflags', 'mask' and 'ignored_mask'
+provide information about the flags and mask used in the fanotify_mark
 call [see fsnotify manpage for details].
 
 While the first three lines are mandatory and always printed, the rest is
@@ -2017,6 +2046,7 @@ Timerfd files
 	pos:	0
 	flags:	02
 	mnt_id:	9
+	ino:	63107
 	clockid: 0
 	ticks: 0
 	settime flags: 01
@@ -2026,11 +2056,27 @@ Timerfd files
 where 'clockid' is the clock type and 'ticks' is the number of the timer expirations
 that have occurred [see timerfd_create(2) for details]. 'settime flags' are
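-flags in octal form been used to setup the timer [see timerfd_settime(2) for
-details]. 'it_value' is remaining time until the timer exiration.
-'it_interval' is the interval for the timer. Note the timer might be set up
-with TIMER_ABSTIME option which will be shown in 'settime flags', but 'it_value'
-still exhibits timer's remaining time.
+flags in octal form that were used to set up the timer [see timerfd_settime(2)
+for details]. 'it_value' is the remaining time until the timer expiration.
+'it_interval' is the interval for the timer. Note the timer might be set up
+with the TIMER_ABSTIME option, which will be shown in 'settime flags', but
+'it_value' still exhibits the timer's remaining time.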
 
+DMA Buffer files
+~~~~~~~~~~~~~~~~
+
+::
+
+	pos:	0
+	flags:	04002
+	mnt_id:	9
+	ino:	63107
+	size:   32768
+	count:  2
+	exp_name:  system-heap
+
+where 'size' is the size of the DMA buffer in bytes. 'count' is the file count of
+the DMA buffer file. 'exp_name' is the name of the DMA buffer exporter.
+
 3.9	/proc/<pid>/map_files - Information about memory mapped files
 ---------------------------------------------------------------------
 This directory contains symbolic links which represent memory mapped files
@@ -2056,13 +2102,13 @@ are actually shared.
 3.10	/proc/<pid>/timerslack_ns - Task timerslack value
 ---------------------------------------------------------
 This file provides the value of the task's timerslack value in nanoseconds.
-This value specifies a amount of time that normal timers may be deferred
+This value specifies an amount of time that normal timers may be deferred
 in order to coalesce timers and avoid unnecessary wakeups.
 
-This allows a task's interactivity vs power consumption trade off to be
+This allows a task's interactivity vs power consumption tradeoff to be
 adjusted.
 
-Writing 0 to the file will set the tasks timerslack to the default value.
+Writing 0 to the file will set the task's timerslack to the default value.
 
-Valid values are from 0 - ULLONG_MAX
+Valid values are from 0 to ULLONG_MAX.
 
@@ -2102,10 +2148,10 @@ Example
 Description
 ~~~~~~~~~~~
 
-x86 specific entries:
+x86 specific entries
 ~~~~~~~~~~~~~~~~~~~~~
 
-AVX512_elapsed_ms:
+AVX512_elapsed_ms
 ^^^^^^^^^^^^^^^^^^
 
   If AVX512 is supported on the machine, this entry shows the milliseconds
@@ -2131,8 +2177,24 @@ AVX512_elapsed_ms:
   the task is unlikely an AVX512 user, but depends on the workload and the
   scheduling scenario, it also could be a false negative mentioned above.
 
-Configuring procfs
-------------------
+3.13 /proc/<pid>/fd - List of symlinks to open files
+-------------------------------------------------------
+This directory contains symbolic links which represent open files
+the process is maintaining.  Example output::
+
+  lr-x------ 1 root root 64 Sep 20 17:53 0 -> /dev/null
+  l-wx------ 1 root root 64 Sep 20 17:53 1 -> /dev/null
+  lrwx------ 1 root root 64 Sep 20 17:53 10 -> 'socket:[12539]'
+  lrwx------ 1 root root 64 Sep 20 17:53 11 -> 'socket:[12540]'
+  lrwx------ 1 root root 64 Sep 20 17:53 12 -> 'socket:[12542]'
+
+The number of open files for the process is stored in the 'size' member
+of the stat() output for /proc/<pid>/fd for fast access.
+
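A Python sketch of both access patterns: resolving every symlink with readlink(), and the cheaper stat() call that takes the count from 'size'. The helper names are hypothetical. Descriptors can be closed while the directory is being read (listing /proc/self/fd itself briefly opens one), so the sketch skips entries that disappear; on kernels that do not report the count in 'size', the fast path is expected to return 0.

```python
import os


def open_fds(pid="self"):
    """Map each open descriptor of a process to its symlink target."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except FileNotFoundError:
            # Closed between listdir() and readlink(), e.g. the descriptor
            # that listdir() itself used to read the directory.
            continue
    return targets


def fd_count_fast(pid="self"):
    """Number of open files, taken from the 'size' field of stat()."""
    return os.stat(f"/proc/{pid}/fd").st_size
```

Targets that are not paths come back in the bracketed form shown above, such as 'socket:[12539]' or 'pipe:[...]'.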
+
+Chapter 4: Configuring procfs
+=============================
 
 4.1	Mount options
 ---------------------
@@ -2142,28 +2204,78 @@ The following mount options are supported:
 	=========	========================================================
 	hidepid=	Set /proc/<pid>/ access mode.
 	gid=		Set the group authorized to learn processes information.
+	subset=		Show only the specified subset of procfs.
 	=========	========================================================
 
-hidepid=0 means classic mode - everybody may access all /proc/<pid>/ directories
-(default).
-
-hidepid=1 means users may not access any /proc/<pid>/ directories but their
-own.  Sensitive files like cmdline, sched*, status are now protected against
-other users.  This makes it impossible to learn whether any user runs
-specific program (given the program doesn't reveal itself by its behaviour).
-As an additional bonus, as /proc/<pid>/cmdline is unaccessible for other users,
-poorly written programs passing sensitive information via program arguments are
-now protected against local eavesdroppers.
-
-hidepid=2 means hidepid=1 plus all /proc/<pid>/ will be fully invisible to other
-users.  It doesn't mean that it hides a fact whether a process with a specific
-pid value exists (it can be learned by other means, e.g. by "kill -0 $PID"),
-but it hides process' uid and gid, which may be learned by stat()'ing
-/proc/<pid>/ otherwise.  It greatly complicates an intruder's task of gathering
-information about running processes, whether some daemon runs with elevated
-privileges, whether other user runs some sensitive program, whether other users
-run any program at all, etc.
+hidepid=off or hidepid=0 means classic mode - everybody may access all
+/proc/<pid>/ directories (default).
+
+hidepid=noaccess or hidepid=1 means users may not access any /proc/<pid>/
+directories but their own.  Sensitive files like cmdline, sched*, status are now
+protected against other users.  This makes it impossible to learn whether any
+user runs a specific program (given the program doesn't reveal itself by its
+behaviour).  As an additional bonus, as /proc/<pid>/cmdline is inaccessible to
+other users, poorly written programs passing sensitive information via program
+arguments are now protected against local eavesdroppers.
+
+hidepid=invisible or hidepid=2 means hidepid=1 plus all /proc/<pid>/ will be
+fully invisible to other users.  It doesn't hide the fact that a process with a
+specific pid value exists (it can be learned by other means, e.g. by
+"kill -0 $PID"), but it hides the process's uid and gid, which may otherwise be
+learned by stat()'ing /proc/<pid>/.  It greatly complicates an intruder's task
+of gathering information about running processes, whether some daemon runs with
+elevated privileges, whether another user runs some sensitive program, whether
+other users run any program at all, etc.
+
+hidepid=ptraceable or hidepid=4 means that procfs should only contain
+/proc/<pid>/ directories that the caller can ptrace.
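Since both spellings are accepted, a tool that compares or generates these options may want to normalise them. A minimal Python sketch using only the value pairs listed above; the helper names are hypothetical, and older kernels may accept only the numeric forms.

```python
# Value pairs listed above; both spellings select the same mode.
HIDEPID_NAMES = {"0": "off", "1": "noaccess", "2": "invisible", "4": "ptraceable"}


def normalize_hidepid(value):
    """Return the named form of a hidepid= value, e.g. '2' -> 'invisible'."""
    value = str(value)
    if value in HIDEPID_NAMES.values():
        return value
    if value in HIDEPID_NAMES:
        return HIDEPID_NAMES[value]
    raise ValueError(f"unknown hidepid value: {value!r}")


def proc_mount_options(hidepid=None, gid=None, subset=None):
    """Build an -o string for 'mount -t proc', e.g. 'hidepid=invisible,gid=42'."""
    opts = []
    if hidepid is not None:
        opts.append(f"hidepid={normalize_hidepid(hidepid)}")
    if gid is not None:
        opts.append(f"gid={int(gid)}")
    if subset is not None:
        opts.append(f"subset={subset}")
    return ",".join(opts)
```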
 
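-gid= defines a group authorized to learn processes information otherwise
-prohibited by hidepid=.  If you use some daemon like identd which needs to learn
-information about processes information, just add identd to this group.
+gid= defines a group authorized to learn process information otherwise
+prohibited by hidepid=.  If you use some daemon like identd which needs to learn
+information about processes, just add identd to this group.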
+
+subset=pid hides all top level files and directories in the procfs that
+are not related to tasks.
+
+Chapter 5: Filesystem behavior
+==============================
+
+Originally, before the advent of pid namespaces, procfs was a global file
+system. That meant there was only one procfs instance in the system.
+
+When pid namespaces were added, a separate procfs instance was mounted in
+each pid namespace. As a result, procfs mount options were global among all
+mountpoints within the same namespace::
+
+	# grep ^proc /proc/mounts
+	proc /proc proc rw,relatime,hidepid=2 0 0
+
+	# strace -e mount mount -o hidepid=1 -t proc proc /tmp/proc
+	mount("proc", "/tmp/proc", "proc", 0, "hidepid=1") = 0
+	+++ exited with 0 +++
+
+	# grep ^proc /proc/mounts
+	proc /proc proc rw,relatime,hidepid=2 0 0
+	proc /tmp/proc proc rw,relatime,hidepid=2 0 0
+
+and only after remounting do the procfs mount options change at all
+mountpoints::
+
+	# mount -o remount,hidepid=1 -t proc proc /tmp/proc
+
+	# grep ^proc /proc/mounts
+	proc /proc proc rw,relatime,hidepid=1 0 0
+	proc /tmp/proc proc rw,relatime,hidepid=1 0 0
+
+This behavior is different from the behavior of other filesystems.
+
+The new procfs behavior is more like other filesystems. Each procfs mount
+creates a new procfs instance. Mount options affect only their own procfs
+instance. This makes it possible to have several procfs instances
+displaying tasks with different filtering options in one pid namespace::
+
+	# mount -o hidepid=invisible -t proc proc /proc
+	# mount -o hidepid=noaccess -t proc proc /tmp/proc
+	# grep ^proc /proc/mounts
+	proc /proc proc rw,relatime,hidepid=invisible 0 0
+	proc /tmp/proc proc rw,relatime,hidepid=noaccess 0 0
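Because each mount can now carry its own options, checking what a given procfs mount exposes means reading the options of that particular "proc" line, as the grep output above does. A minimal Python sketch over the /proc/mounts format (source, mount point, type, options, then two numeric fields); the function name is hypothetical, and escaped characters in mount points (such as "\040") are left as-is.

```python
def proc_instances(mounts_text):
    """Return {mount_point: {option: value}} for each procfs line in /proc/mounts."""
    instances = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: source, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] != "proc":
            continue
        options = {}
        for opt in fields[3].split(","):
            key, _, value = opt.partition("=")
            options[key] = value if value else True  # flag options like 'rw'
        instances[fields[1]] = options
    return instances
```

On a live system the input would be the contents of /proc/mounts; with the older behavior described above, every proc line in one pid namespace is expected to report the same options.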