aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/PCI
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/PCI')
-rw-r--r--Documentation/PCI/acpi-info.rst18
-rw-r--r--Documentation/PCI/boot-interrupts.rst36
-rw-r--r--Documentation/PCI/endpoint/function/binding/pci-ntb.rst38
-rw-r--r--Documentation/PCI/endpoint/function/binding/pci-test.rst26
-rw-r--r--Documentation/PCI/endpoint/function/binding/pci-test.txt19
-rw-r--r--Documentation/PCI/endpoint/index.rst7
-rw-r--r--Documentation/PCI/endpoint/pci-endpoint-cfs.rst26
-rw-r--r--Documentation/PCI/endpoint/pci-endpoint.rst18
-rw-r--r--Documentation/PCI/endpoint/pci-ntb-function.rst348
-rw-r--r--Documentation/PCI/endpoint/pci-ntb-howto.rst158
-rw-r--r--Documentation/PCI/endpoint/pci-vntb-function.rst129
-rw-r--r--Documentation/PCI/endpoint/pci-vntb-howto.rst164
-rw-r--r--Documentation/PCI/index.rst7
-rw-r--r--Documentation/PCI/msi-howto.rst12
-rw-r--r--Documentation/PCI/pci-error-recovery.rst36
-rw-r--r--Documentation/PCI/pci-iov-howto.rst7
-rw-r--r--Documentation/PCI/pci.rst34
-rw-r--r--Documentation/PCI/pcieaer-howto.rst183
-rw-r--r--Documentation/PCI/pciebus-howto.rst14
-rw-r--r--Documentation/PCI/sysfs-pci.rst138
20 files changed, 1199 insertions, 219 deletions
diff --git a/Documentation/PCI/acpi-info.rst b/Documentation/PCI/acpi-info.rst
index 060217081c79..34c64a5a66ec 100644
--- a/Documentation/PCI/acpi-info.rst
+++ b/Documentation/PCI/acpi-info.rst
@@ -22,9 +22,9 @@ or if the device has INTx interrupts connected by platform interrupt
controllers and a _PRT is needed to describe those connections.
ACPI resource description is done via _CRS objects of devices in the ACPI
-namespace [2].   The _CRS is like a generalized PCI BAR: the OS can read
+namespace [2]. The _CRS is like a generalized PCI BAR: the OS can read
_CRS and figure out what resource is being consumed even if it doesn't have
-a driver for the device [3].  That's important because it means an old OS
+a driver for the device [3]. That's important because it means an old OS
can work correctly even on a system with new devices unknown to the OS.
The new devices might not do anything, but the OS can at least make sure no
resources conflict with them.
@@ -41,15 +41,15 @@ ACPI, that device will have a specific _HID/_CID that tells the OS what
driver to bind to it, and the _CRS tells the OS and the driver where the
device's registers are.
-PCI host bridges are PNP0A03 or PNP0A08 devices.  Their _CRS should
-describe all the address space they consume.  This includes all the windows
+PCI host bridges are PNP0A03 or PNP0A08 devices. Their _CRS should
+describe all the address space they consume. This includes all the windows
they forward down to the PCI bus, as well as registers of the host bridge
-itself that are not forwarded to PCI.  The host bridge registers include
+itself that are not forwarded to PCI. The host bridge registers include
things like secondary/subordinate bus registers that determine the bus
range below the bridge, window registers that describe the apertures, etc.
These are all device-specific, non-architected things, so the only way a
PNP0A03/PNP0A08 driver can manage them is via _PRS/_CRS/_SRS, which contain
-the device-specific details.  The host bridge registers also include ECAM
+the device-specific details. The host bridge registers also include ECAM
space, since it is consumed by the host bridge.
ACPI defines a Consumer/Producer bit to distinguish the bridge registers
@@ -66,7 +66,7 @@ the PNP0A03/PNP0A08 device itself. The workaround was to describe the
bridge registers (including ECAM space) in PNP0C02 catch-all devices [6].
With the exception of ECAM, the bridge register space is device-specific
anyway, so the generic PNP0A03/PNP0A08 driver (pci_root.c) has no need to
-know about it.  
+know about it.
New architectures should be able to use "Consumer" Extended Address Space
descriptors in the PNP0A03 device for bridge registers, including ECAM,
@@ -75,9 +75,9 @@ ia64 kernels assume all address space descriptors, including "Consumer"
Extended Address Space ones, are windows, so it would not be safe to
describe bridge registers this way on those architectures.
-PNP0C02 "motherboard" devices are basically a catch-all.  There's no
+PNP0C02 "motherboard" devices are basically a catch-all. There's no
programming model for them other than "don't use these resources for
-anything else."  So a PNP0C02 _CRS should claim any address space that is
+anything else." So a PNP0C02 _CRS should claim any address space that is
(1) not claimed by _CRS under any other device object in the ACPI namespace
and (2) should not be assigned by the OS to something else.
diff --git a/Documentation/PCI/boot-interrupts.rst b/Documentation/PCI/boot-interrupts.rst
index d078ef3eb192..931077bb0953 100644
--- a/Documentation/PCI/boot-interrupts.rst
+++ b/Documentation/PCI/boot-interrupts.rst
@@ -32,12 +32,13 @@ interrupt goes unhandled over time, they are tracked by the Linux kernel as
Spurious Interrupts. The IRQ will be disabled by the Linux kernel after it
reaches a specific count with the error "nobody cared". This disabled IRQ
now prevents valid usage by an existing interrupt which may happen to share
-the IRQ line.
+the IRQ line::
irq 19: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 2988 Comm: irq/34-nipalk Tainted: 4.14.87-rt49-02410-g4a640ec-dirty #1
Hardware name: National Instruments NI PXIe-8880/NI PXIe-8880, BIOS 2.1.5f1 01/09/2020
Call Trace:
+
<IRQ>
? dump_stack+0x46/0x5e
? __report_bad_irq+0x2e/0xb0
@@ -60,7 +61,7 @@ Conditions
==========
The use of threaded interrupts is the most likely condition to trigger
-this problem today. Threaded interrupts may not be reenabled after the IRQ
+this problem today. Threaded interrupts may not be re-enabled after the IRQ
handler wakes. These "one shot" conditions mean that the threaded interrupt
needs to keep the interrupt line masked until the threaded handler has run.
Especially when dealing with high data rate interrupts, the thread needs to
@@ -85,15 +86,18 @@ Mitigations
The mitigations take the form of PCI quirks. The preference has been to
first identify and make use of a means to disable the routing to the PCH.
In such a case a quirk to disable boot interrupt generation can be
-added.[1]
+added. [1]_
- Intel® 6300ESB I/O Controller Hub
+Intel® 6300ESB I/O Controller Hub
Alternate Base Address Register:
BIE: Boot Interrupt Enable
- 0 = Boot interrupt is enabled.
- 1 = Boot interrupt is disabled.
- Intel® Sandy Bridge through Sky Lake based Xeon servers:
+ == ===========================
+ 0 Boot interrupt is enabled.
+ 1 Boot interrupt is disabled.
+ == ===========================
+
+Intel® Sandy Bridge through Sky Lake based Xeon servers:
Coherent Interface Protocol Interrupt Control
dis_intx_route2pch/dis_intx_route2ich/dis_intx_route2dmi2:
When this bit is set. Local INTx messages received from the
@@ -109,12 +113,12 @@ line by default. Therefore, on chipsets where this INTx routing cannot be
disabled, the Linux kernel will reroute the valid interrupt to its legacy
interrupt. This redirection of the handler will prevent the occurrence of
the spurious interrupt detection which would ordinarily disable the IRQ
-line due to excessive unhandled counts.[2]
+line due to excessive unhandled counts. [2]_
The config option X86_REROUTE_FOR_BROKEN_BOOT_IRQS exists to enable (or
disable) the redirection of the interrupt handler to the PCH interrupt
line. The option can be overridden by either pci=ioapicreroute or
-pci=noioapicreroute.[3]
+pci=noioapicreroute. [3]_
More Documentation
@@ -127,19 +131,19 @@ into the evolution of its handling with chipsets.
Example of disabling of the boot interrupt
------------------------------------------
-Intel® 6300ESB I/O Controller Hub (Document # 300641-004US)
+ - Intel® 6300ESB I/O Controller Hub (Document # 300641-004US)
5.7.3 Boot Interrupt
https://www.intel.com/content/dam/doc/datasheet/6300esb-io-controller-hub-datasheet.pdf
-Intel® Xeon® Processor E5-1600/2400/2600/4600 v3 Product Families
-Datasheet - Volume 2: Registers (Document # 330784-003)
+ - Intel® Xeon® Processor E5-1600/2400/2600/4600 v3 Product Families
+ Datasheet - Volume 2: Registers (Document # 330784-003)
6.6.41 cipintrc Coherent Interface Protocol Interrupt Control
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf
Example of handler rerouting
----------------------------
-Intel® 6700PXH 64-bit PCI Hub (Document # 302628)
+ - Intel® 6700PXH 64-bit PCI Hub (Document # 302628)
2.15.2 PCI Express Legacy INTx Support and Boot Interrupt
https://www.intel.com/content/dam/doc/datasheet/6700pxh-64-bit-pci-hub-datasheet.pdf
@@ -150,6 +154,6 @@ Cheers,
Sean V Kelley
sean.v.kelley@linux.intel.com
-[1] https://lore.kernel.org/r/12131949181903-git-send-email-sassmann@suse.de/
-[2] https://lore.kernel.org/r/12131949182094-git-send-email-sassmann@suse.de/
-[3] https://lore.kernel.org/r/487C8EA7.6020205@suse.de/
+.. [1] https://lore.kernel.org/r/12131949181903-git-send-email-sassmann@suse.de/
+.. [2] https://lore.kernel.org/r/12131949182094-git-send-email-sassmann@suse.de/
+.. [3] https://lore.kernel.org/r/487C8EA7.6020205@suse.de/
diff --git a/Documentation/PCI/endpoint/function/binding/pci-ntb.rst b/Documentation/PCI/endpoint/function/binding/pci-ntb.rst
new file mode 100644
index 000000000000..40253d3d5163
--- /dev/null
+++ b/Documentation/PCI/endpoint/function/binding/pci-ntb.rst
@@ -0,0 +1,38 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
+PCI NTB Endpoint Function
+==========================
+
+1) Create a subdirectory to pci_epf_ntb directory in configfs.
+
+Standard EPF Configurable Fields:
+
+================ ===========================================================
+vendorid should be 0x104c
+deviceid should be 0xb00d for TI's J721E SoC
+revid don't care
+progif_code don't care
+subclass_code should be 0x00
+baseclass_code should be 0x5
+cache_line_size don't care
+subsys_vendor_id don't care
+subsys_id don't care
+interrupt_pin don't care
+msi_interrupts don't care
+msix_interrupts don't care
+================ ===========================================================
+
+2) Create a subdirectory to directory created in 1
+
+NTB EPF specific configurable fields:
+
+================ ===========================================================
+db_count Number of doorbells; default = 4
+mw1 size of memory window1
+mw2 size of memory window2
+mw3 size of memory window3
+mw4 size of memory window4
+num_mws Number of memory windows; max = 4
+spad_count Number of scratchpad registers; default = 64
+================ ===========================================================
diff --git a/Documentation/PCI/endpoint/function/binding/pci-test.rst b/Documentation/PCI/endpoint/function/binding/pci-test.rst
new file mode 100644
index 000000000000..57ee866fb165
--- /dev/null
+++ b/Documentation/PCI/endpoint/function/binding/pci-test.rst
@@ -0,0 +1,26 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
+PCI Test Endpoint Function
+==========================
+
+name: Should be "pci_epf_test" to bind to the pci_epf_test driver.
+
+Configurable Fields:
+
+================ ===========================================================
+vendorid should be 0x104c
+deviceid should be 0xb500 for DRA74x and 0xb501 for DRA72x
+revid don't care
+progif_code don't care
+subclass_code don't care
+baseclass_code should be 0xff
+cache_line_size don't care
+subsys_vendor_id don't care
+subsys_id don't care
+interrupt_pin Should be 1 - INTA, 2 - INTB, 3 - INTC, 4 -INTD
+msi_interrupts Should be 1 to 32 depending on the number of MSI interrupts
+ to test
+msix_interrupts Should be 1 to 2048 depending on the number of MSI-X
+ interrupts to test
+================ ===========================================================
diff --git a/Documentation/PCI/endpoint/function/binding/pci-test.txt b/Documentation/PCI/endpoint/function/binding/pci-test.txt
deleted file mode 100644
index cd76ba47394b..000000000000
--- a/Documentation/PCI/endpoint/function/binding/pci-test.txt
+++ /dev/null
@@ -1,19 +0,0 @@
-PCI TEST ENDPOINT FUNCTION
-
-name: Should be "pci_epf_test" to bind to the pci_epf_test driver.
-
-Configurable Fields:
-vendorid : should be 0x104c
-deviceid : should be 0xb500 for DRA74x and 0xb501 for DRA72x
-revid : don't care
-progif_code : don't care
-subclass_code : don't care
-baseclass_code : should be 0xff
-cache_line_size : don't care
-subsys_vendor_id : don't care
-subsys_id : don't care
-interrupt_pin : Should be 1 - INTA, 2 - INTB, 3 - INTC, 4 -INTD
-msi_interrupts : Should be 1 to 32 depending on the number of MSI interrupts
- to test
-msix_interrupts : Should be 1 to 2048 depending on the number of MSI-X
- interrupts to test
diff --git a/Documentation/PCI/endpoint/index.rst b/Documentation/PCI/endpoint/index.rst
index d114ea74b444..4d2333e7ae06 100644
--- a/Documentation/PCI/endpoint/index.rst
+++ b/Documentation/PCI/endpoint/index.rst
@@ -11,3 +11,10 @@ PCI Endpoint Framework
pci-endpoint-cfs
pci-test-function
pci-test-howto
+ pci-ntb-function
+ pci-ntb-howto
+ pci-vntb-function
+ pci-vntb-howto
+
+ function/binding/pci-test
+ function/binding/pci-ntb
diff --git a/Documentation/PCI/endpoint/pci-endpoint-cfs.rst b/Documentation/PCI/endpoint/pci-endpoint-cfs.rst
index b6d39cdec56e..fb73345cfb8a 100644
--- a/Documentation/PCI/endpoint/pci-endpoint-cfs.rst
+++ b/Documentation/PCI/endpoint/pci-endpoint-cfs.rst
@@ -24,7 +24,7 @@ Directory Structure
The pci_ep configfs has two directories at its root: controllers and
functions. Every EPC device present in the system will have an entry in
-the *controllers* directory and and every EPF driver present in the system
+the *controllers* directory and every EPF driver present in the system
will have an entry in the *functions* directory.
::
@@ -43,6 +43,7 @@ entries corresponding to EPF driver will be created by the EPF core.
.. <EPF Driver1>/
... <EPF Device 11>/
... <EPF Device 21>/
+ ... <EPF Device 31>/
.. <EPF Driver2>/
... <EPF Device 12>/
... <EPF Device 22>/
@@ -68,6 +69,24 @@ created)
... subsys_vendor_id
... subsys_id
... interrupt_pin
+ ... <Symlink EPF Device 31>/
+ ... primary/
+ ... <Symlink EPC Device1>/
+ ... secondary/
+ ... <Symlink EPC Device2>/
+
+If an EPF device has to be associated with 2 EPCs (like in the case of
+Non-transparent bridge), symlink of endpoint controller connected to primary
+interface should be added in 'primary' directory and symlink of endpoint
+controller connected to secondary interface should be added in 'secondary'
+directory.
+
+The <EPF Device> directory can have a list of symbolic links
+(<Symlink EPF Device 31>) to other <EPF Device>. These symbolic links should
+be created by the user to represent the virtual functions that are bound to
+the physical function. In the above directory structure <EPF Device 11> is a
+physical function and <EPF Device 31> is a virtual function. An EPF device once
+it's linked to another EPF device, cannot be linked to a EPC device.
EPC Device
==========
@@ -88,7 +107,8 @@ entries corresponding to EPC device will be created by the EPC core.
The <EPC Device> directory will have a list of symbolic links to
<EPF Device>. These symbolic links should be created by the user to
-represent the functions present in the endpoint device.
+represent the functions present in the endpoint device. Only <EPF Device>
+that represents a physical function can be linked to a EPC device.
The <EPC Device> directory will also have a *start* field. Once
"1" is written to this field, the endpoint device will be ready to
@@ -115,4 +135,4 @@ all the EPF devices are created and linked with the EPC device.
| interrupt_pin
| function
-[1] :doc:`pci-endpoint`
+[1] Documentation/PCI/endpoint/pci-endpoint.rst
diff --git a/Documentation/PCI/endpoint/pci-endpoint.rst b/Documentation/PCI/endpoint/pci-endpoint.rst
index 0e2311b5617b..4f5622a65555 100644
--- a/Documentation/PCI/endpoint/pci-endpoint.rst
+++ b/Documentation/PCI/endpoint/pci-endpoint.rst
@@ -78,8 +78,8 @@ by the PCI controller driver.
Cleanup the pci_epc_mem structure allocated during pci_epc_mem_init().
-APIs for the PCI Endpoint Function Driver
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+EPC APIs for the PCI Endpoint Function Driver
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section lists the APIs that the PCI Endpoint core provides to be used
by the PCI endpoint function driver.
@@ -117,8 +117,8 @@ by the PCI endpoint function driver.
The PCI endpoint function driver should use pci_epc_mem_free_addr() to
free the memory space allocated using pci_epc_mem_alloc_addr().
-Other APIs
-~~~~~~~~~~
+Other EPC APIs
+~~~~~~~~~~~~~~
There are other APIs provided by the EPC library. These are used for binding
the EPF device with EPC device. pci-ep-cfs.c can be used as reference for
@@ -160,8 +160,8 @@ PCI Endpoint Function(EPF) Library
The EPF library provides APIs to be used by the function driver and the EPC
library to provide endpoint mode functionality.
-APIs for the PCI Endpoint Function Driver
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+EPF APIs for the PCI Endpoint Function Driver
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section lists the APIs that the PCI Endpoint core provides to be used
by the PCI endpoint function driver.
@@ -204,8 +204,8 @@ by the PCI endpoint controller library.
The PCI endpoint controller library invokes pci_epf_linkup() when the
EPC device has established the connection to the host.
-Other APIs
-~~~~~~~~~~
+Other EPF APIs
+~~~~~~~~~~~~~~
There are other APIs provided by the EPF library. These are used to notify
the function driver when the EPF device is bound to the EPC device.
@@ -214,7 +214,7 @@ pci-ep-cfs.c can be used as reference for using these APIs.
* pci_epf_create()
Create a new PCI EPF device by passing the name of the PCI EPF device.
- This name will be used to bind the the EPF device to a EPF driver.
+ This name will be used to bind the EPF device to a EPF driver.
* pci_epf_destroy()
diff --git a/Documentation/PCI/endpoint/pci-ntb-function.rst b/Documentation/PCI/endpoint/pci-ntb-function.rst
new file mode 100644
index 000000000000..3b9d836a4924
--- /dev/null
+++ b/Documentation/PCI/endpoint/pci-ntb-function.rst
@@ -0,0 +1,348 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================
+PCI NTB Function
+=================
+
+:Author: Kishon Vijay Abraham I <kishon@ti.com>
+
+PCI Non-Transparent Bridges (NTB) allow two host systems to communicate
+with each other by exposing each host as a device to the other host.
+NTBs typically support the ability to generate interrupts on the remote
+machine, expose memory ranges as BARs, and perform DMA. They also support
+scratchpads, which are areas of memory within the NTB that are accessible
+from both machines.
+
+PCI NTB Function allows two different systems (or hosts) to communicate
+with each other by configuring the endpoint instances in such a way that
+transactions from one system are routed to the other system.
+
+In the below diagram, PCI NTB function configures the SoC with multiple
+PCI Endpoint (EP) instances in such a way that transactions from one EP
+controller are routed to the other EP controller. Once PCI NTB function
+configures the SoC with multiple EP instances, HOST1 and HOST2 can
+communicate with each other using SoC as a bridge.
+
+.. code-block:: text
+
+ +-------------+ +-------------+
+ | | | |
+ | HOST1 | | HOST2 |
+ | | | |
+ +------^------+ +------^------+
+ | |
+ | |
+ +---------|-------------------------------------------------|---------+
+ | +------v------+ +------v------+ |
+ | | | | | |
+ | | EP | | EP | |
+ | | CONTROLLER1 | | CONTROLLER2 | |
+ | | <-----------------------------------> | |
+ | | | | | |
+ | | | | | |
+ | | | SoC With Multiple EP Instances | | |
+ | | | (Configured using NTB Function) | | |
+ | +-------------+ +-------------+ |
+ +---------------------------------------------------------------------+
+
+Constructs used for Implementing NTB
+====================================
+
+ 1) Config Region
+ 2) Self Scratchpad Registers
+ 3) Peer Scratchpad Registers
+ 4) Doorbell (DB) Registers
+ 5) Memory Window (MW)
+
+
+Config Region:
+--------------
+
+Config Region is a construct that is specific to NTB implemented using NTB
+Endpoint Function Driver. The host and endpoint side NTB function driver will
+exchange information with each other using this region. Config Region has
+Control/Status Registers for configuring the Endpoint Controller. Host can
+write into this region for configuring the outbound Address Translation Unit
+(ATU) and to indicate the link status. Endpoint can indicate the status of
+commands issued by host in this region. Endpoint can also indicate the
+scratchpad offset and number of memory windows to the host using this region.
+
+The format of Config Region is given below. All the fields here are 32 bits.
+
+.. code-block:: text
+
+ +------------------------+
+ | COMMAND |
+ +------------------------+
+ | ARGUMENT |
+ +------------------------+
+ | STATUS |
+ +------------------------+
+ | TOPOLOGY |
+ +------------------------+
+ | ADDRESS (LOWER 32) |
+ +------------------------+
+ | ADDRESS (UPPER 32) |
+ +------------------------+
+ | SIZE |
+ +------------------------+
+ | NO OF MEMORY WINDOW |
+ +------------------------+
+ | MEMORY WINDOW1 OFFSET |
+ +------------------------+
+ | SPAD OFFSET |
+ +------------------------+
+ | SPAD COUNT |
+ +------------------------+
+ | DB ENTRY SIZE |
+ +------------------------+
+ | DB DATA |
+ +------------------------+
+ | : |
+ +------------------------+
+ | : |
+ +------------------------+
+ | DB DATA |
+ +------------------------+
+
+
+ COMMAND:
+
+ NTB function supports three commands:
+
+ CMD_CONFIGURE_DOORBELL (0x1): Command to configure doorbell. Before
+ invoking this command, the host should allocate and initialize
+ MSI/MSI-X vectors (i.e., initialize the MSI/MSI-X Capability in the
+ Endpoint). The endpoint on receiving this command will configure
+ the outbound ATU such that transactions to Doorbell BAR will be routed
+ to the MSI/MSI-X address programmed by the host. The ARGUMENT
+ register should be populated with number of DBs to configure (in the
+ lower 16 bits) and if MSI or MSI-X should be configured (BIT 16).
+
+ CMD_CONFIGURE_MW (0x2): Command to configure memory window (MW). The
+ host invokes this command after allocating a buffer that can be
+ accessed by remote host. The allocated address should be programmed
+ in the ADDRESS register (64 bit), the size should be programmed in
+ the SIZE register and the memory window index should be programmed
+ in the ARGUMENT register. The endpoint on receiving this command
+ will configure the outbound ATU such that transactions to MW BAR
+ are routed to the address provided by the host.
+
+ CMD_LINK_UP (0x3): Command to indicate an NTB application is
+ bound to the EP device on the host side. Once the endpoint
+ receives this command from both the hosts, the endpoint will
+ raise a LINK_UP event to both the hosts to indicate the host
+ NTB applications can start communicating with each other.
+
+ ARGUMENT:
+
+ The value of this register is based on the commands issued in
+ command register. See COMMAND section for more information.
+
+ TOPOLOGY:
+
+ Set to NTB_TOPO_B2B_USD for Primary interface
+ Set to NTB_TOPO_B2B_DSD for Secondary interface
+
+ ADDRESS/SIZE:
+
+ Address and Size to be used while configuring the memory window.
+ See "CMD_CONFIGURE_MW" for more info.
+
+ MEMORY WINDOW1 OFFSET:
+
+ Memory Window 1 and Doorbell registers are packed together in the
+ same BAR. The initial portion of the region will have doorbell
+ registers and the latter portion of the region is for memory window 1.
+ This register will specify the offset of the memory window 1.
+
+ NO OF MEMORY WINDOW:
+
+ Specifies the number of memory windows supported by the NTB device.
+
+ SPAD OFFSET:
+
+ Self scratchpad region and config region are packed together in the
+ same BAR. The initial portion of the region will have config region
+ and the latter portion of the region is for self scratchpad. This
+ register will specify the offset of the self scratchpad registers.
+
+ SPAD COUNT:
+
+ Specifies the number of scratchpad registers supported by the NTB
+ device.
+
+ DB ENTRY SIZE:
+
+ Used to determine the offset within the DB BAR that should be written
+ in order to raise doorbell. EPF NTB can use either MSI or MSI-X to
+ ring doorbell (MSI-X support will be added later). MSI uses same
+ address for all the interrupts and MSI-X can provide different
+ addresses for different interrupts. The MSI/MSI-X address is provided
+ by the host and the address it gives is based on the MSI/MSI-X
+ implementation supported by the host. For instance, ARM platform
+ using GIC ITS will have the same MSI-X address for all the interrupts.
+ In order to support all the combinations and use the same mechanism
+ for both MSI and MSI-X, EPF NTB allocates a separate region in the
+ Outbound Address Space for each of the interrupts. This region will
+ be mapped to the MSI/MSI-X address provided by the host. If a host
+ provides the same address for all the interrupts, all the regions
+ will be translated to the same address. If a host provides different
+ addresses, the regions will be translated to different addresses. This
+ will ensure there is no difference while raising the doorbell.
+
+ DB DATA:
+
+ EPF NTB supports 32 interrupts, so there are 32 DB DATA registers.
+ This holds the MSI/MSI-X data that has to be written to MSI address
+ for raising doorbell interrupt. This will be populated by EPF NTB
+ while invoking CMD_CONFIGURE_DOORBELL.
+
+Scratchpad Registers:
+---------------------
+
+ Each host has its own register space allocated in the memory of NTB endpoint
+ controller. They are both readable and writable from both sides of the bridge.
+ They are used by applications built over NTB and can be used to pass control
+ and status information between both sides of a device.
+
+ Scratchpad registers has 2 parts
+ 1) Self Scratchpad: Host's own register space
+ 2) Peer Scratchpad: Remote host's register space.
+
+Doorbell Registers:
+-------------------
+
+ Doorbell Registers are used by the hosts to interrupt each other.
+
+Memory Window:
+--------------
+
+ Actual transfer of data between the two hosts will happen using the
+ memory window.
+
+Modeling Constructs:
+====================
+
+There are 5 or more distinct regions (config, self scratchpad, peer
+scratchpad, doorbell, one or more memory windows) to be modeled to achieve
+NTB functionality. At least one memory window is required while more than
+one is permitted. All these regions should be mapped to BARs for hosts to
+access these regions.
+
+If one 32-bit BAR is allocated for each of these regions, the scheme would
+look like this:
+
+====== ===============
+BAR NO CONSTRUCTS USED
+====== ===============
+BAR0 Config Region
+BAR1 Self Scratchpad
+BAR2 Peer Scratchpad
+BAR3 Doorbell
+BAR4 Memory Window 1
+BAR5 Memory Window 2
+====== ===============
+
+However if we allocate a separate BAR for each of the regions, there would not
+be enough BARs for all the regions in a platform that supports only 64-bit
+BARs.
+
+In order to be supported by most of the platforms, the regions should be
+packed and mapped to BARs in a way that provides NTB functionality and
+also makes sure the host doesn't access any region that it is not supposed
+to.
+
+The following scheme is used in EPF NTB Function:
+
+====== ===============================
+BAR NO CONSTRUCTS USED
+====== ===============================
+BAR0 Config Region + Self Scratchpad
+BAR1 Peer Scratchpad
+BAR2 Doorbell + Memory Window 1
+BAR3 Memory Window 2
+BAR4 Memory Window 3
+BAR5 Memory Window 4
+====== ===============================
+
+With this scheme, for the basic NTB functionality 3 BARs should be sufficient.
+
+Modeling Config/Scratchpad Region:
+----------------------------------
+
+.. code-block:: text
+
+ +-----------------+------->+------------------+ +-----------------+
+ | BAR0 | | CONFIG REGION | | BAR0 |
+ +-----------------+----+ +------------------+<-------+-----------------+
+ | BAR1 | | |SCRATCHPAD REGION | | BAR1 |
+ +-----------------+ +-->+------------------+<-------+-----------------+
+ | BAR2 | Local Memory | BAR2 |
+ +-----------------+ +-----------------+
+ | BAR3 | | BAR3 |
+ +-----------------+ +-----------------+
+ | BAR4 | | BAR4 |
+ +-----------------+ +-----------------+
+ | BAR5 | | BAR5 |
+ +-----------------+ +-----------------+
+ EP CONTROLLER 1 EP CONTROLLER 2
+
+Above diagram shows Config region + Scratchpad region for HOST1 (connected to
+EP controller 1) allocated in local memory. The HOST1 can access the config
+region and scratchpad region (self scratchpad) using BAR0 of EP controller 1.
+The peer host (HOST2 connected to EP controller 2) can also access this
+scratchpad region (peer scratchpad) using BAR1 of EP controller 2. This
+diagram shows the case where Config region and Scratchpad regions are allocated
+for HOST1, however the same is applicable for HOST2.
+
+Modeling Doorbell/Memory Window 1:
+----------------------------------
+
+.. code-block:: text
+
+ +-----------------+ +----->+----------------+-----------+-----------------+
+ | BAR0 | | | Doorbell 1 +-----------> MSI-X ADDRESS 1 |
+ +-----------------+ | +----------------+ +-----------------+
+ | BAR1 | | | Doorbell 2 +---------+ | |
+ +-----------------+----+ +----------------+ | | |
+ | BAR2 | | Doorbell 3 +-------+ | +-----------------+
+ +-----------------+----+ +----------------+ | +-> MSI-X ADDRESS 2 |
+ | BAR3 | | | Doorbell 4 +-----+ | +-----------------+
+ +-----------------+ | |----------------+ | | | |
+ | BAR4 | | | | | | +-----------------+
+ +-----------------+ | | MW1 +---+ | +-->+ MSI-X ADDRESS 3||
+ | BAR5 | | | | | | +-----------------+
+ +-----------------+ +----->-----------------+ | | | |
+ EP CONTROLLER 1 | | | | +-----------------+
+ | | | +---->+ MSI-X ADDRESS 4 |
+ +----------------+ | +-----------------+
+ EP CONTROLLER 2 | | |
+ (OB SPACE) | | |
+ +-------> MW1 |
+ | |
+ | |
+ +-----------------+
+ | |
+ | |
+ | |
+ | |
+ | |
+ +-----------------+
+ PCI Address Space
+ (Managed by HOST2)
+
+Above diagram shows how the doorbell and memory window 1 is mapped so that
+HOST1 can raise doorbell interrupt on HOST2 and also how HOST1 can access
+buffers exposed by HOST2 using memory window1 (MW1). Here doorbell and
+memory window 1 regions are allocated in EP controller 2 outbound (OB) address
+space. Allocating and configuring BARs for doorbell and memory window1
+is done during the initialization phase of NTB endpoint function driver.
+Mapping from EP controller 2 OB space to PCI address space is done when HOST2
+sends CMD_CONFIGURE_MW/CMD_CONFIGURE_DOORBELL.
+
+Modeling Optional Memory Windows:
+---------------------------------
+
+This is modeled the same was as MW1 but each of the additional memory windows
+is mapped to separate BARs.
diff --git a/Documentation/PCI/endpoint/pci-ntb-howto.rst b/Documentation/PCI/endpoint/pci-ntb-howto.rst
new file mode 100644
index 000000000000..4261e7157ef1
--- /dev/null
+++ b/Documentation/PCI/endpoint/pci-ntb-howto.rst
@@ -0,0 +1,158 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================================================
+PCI Non-Transparent Bridge (NTB) Endpoint Function (EPF) User Guide
+===================================================================
+
+:Author: Kishon Vijay Abraham I <kishon@ti.com>
+
+This document is a guide to help users use pci-epf-ntb function driver
+and ntb_hw_epf host driver for NTB functionality. The list of steps to
+be followed in the host side and EP side is given below. For the hardware
+configuration and internals of NTB using configurable endpoints see
+Documentation/PCI/endpoint/pci-ntb-function.rst
+
+Endpoint Device
+===============
+
+Endpoint Controller Devices
+---------------------------
+
+For implementing NTB functionality at least two endpoint controller devices
+are required.
+
+To find the list of endpoint controller devices in the system::
+
+ # ls /sys/class/pci_epc/
+ 2900000.pcie-ep 2910000.pcie-ep
+
+If PCI_ENDPOINT_CONFIGFS is enabled::
+
+ # ls /sys/kernel/config/pci_ep/controllers
+ 2900000.pcie-ep 2910000.pcie-ep
+
+
+Endpoint Function Drivers
+-------------------------
+
+To find the list of endpoint function drivers in the system::
+
+ # ls /sys/bus/pci-epf/drivers
+ pci_epf_ntb pci_epf_ntb
+
+If PCI_ENDPOINT_CONFIGFS is enabled::
+
+ # ls /sys/kernel/config/pci_ep/functions
+ pci_epf_ntb pci_epf_ntb
+
+
+Creating pci-epf-ntb Device
+----------------------------
+
+PCI endpoint function device can be created using the configfs. To create
+pci-epf-ntb device, the following commands can be used::
+
+ # mount -t configfs none /sys/kernel/config
+ # cd /sys/kernel/config/pci_ep/
+ # mkdir functions/pci_epf_ntb/func1
+
+The "mkdir func1" above creates the pci-epf-ntb function device that will
+be probed by pci_epf_ntb driver.
+
+The PCI endpoint framework populates the directory with the following
+configurable fields::
+
+ # ls functions/pci_epf_ntb/func1
+ baseclass_code deviceid msi_interrupts pci-epf-ntb.0
+ progif_code secondary subsys_id vendorid
+ cache_line_size interrupt_pin msix_interrupts primary
+ revid subclass_code subsys_vendor_id
+
+The PCI endpoint function driver populates these entries with default values
+when the device is bound to the driver. The pci-epf-ntb driver populates
+vendorid with 0xffff and interrupt_pin with 0x0001::
+
+ # cat functions/pci_epf_ntb/func1/vendorid
+ 0xffff
+ # cat functions/pci_epf_ntb/func1/interrupt_pin
+ 0x0001
+
+
+Configuring pci-epf-ntb Device
+-------------------------------
+
+The user can configure the pci-epf-ntb device using its configfs entry. In order
+to change the vendorid and the deviceid, the following
+commands can be used::
+
+ # echo 0x104c > functions/pci_epf_ntb/func1/vendorid
+ # echo 0xb00d > functions/pci_epf_ntb/func1/deviceid
+
+The PCI endpoint framework also automatically creates a sub-directory in the
+function attribute directory. This sub-directory has the same name as the name
+of the function device and is populated with the following NTB specific
+attributes that can be configured by the user::
+
+ # ls functions/pci_epf_ntb/func1/pci_epf_ntb.0/
+ db_count mw1 mw2 mw3 mw4 num_mws
+ spad_count
+
+A sample configuration for NTB function is given below::
+
+ # echo 4 > functions/pci_epf_ntb/func1/pci_epf_ntb.0/db_count
+ # echo 128 > functions/pci_epf_ntb/func1/pci_epf_ntb.0/spad_count
+ # echo 2 > functions/pci_epf_ntb/func1/pci_epf_ntb.0/num_mws
+ # echo 0x100000 > functions/pci_epf_ntb/func1/pci_epf_ntb.0/mw1
+ # echo 0x100000 > functions/pci_epf_ntb/func1/pci_epf_ntb.0/mw2
+
+Binding pci-epf-ntb Device to EP Controller
+--------------------------------------------
+
+NTB function device should be attached to two PCI endpoint controllers
+connected to the two hosts. Use the 'primary' and 'secondary' entries
+inside NTB function device to attach one PCI endpoint controller to
+primary interface and the other PCI endpoint controller to the secondary
+interface::
+
+ # ln -s controllers/2900000.pcie-ep/ functions/pci-epf-ntb/func1/primary
+ # ln -s controllers/2910000.pcie-ep/ functions/pci-epf-ntb/func1/secondary
+
+Once the above step is completed, both the PCI endpoint controllers are ready to
+establish a link with the host.
+
+
+Start the Link
+--------------
+
+In order for the endpoint device to establish a link with the host, the _start_
+field should be populated with '1'. For NTB, both the PCI endpoint controllers
+should establish link with the host::
+
+ # echo 1 > controllers/2900000.pcie-ep/start
+ # echo 1 > controllers/2910000.pcie-ep/start
+
+
+RootComplex Device
+==================
+
+lspci Output
+------------
+
+Note that the devices listed here correspond to the values populated in
+"Creating pci-epf-ntb Device" section above::
+
+ # lspci
+ 0000:00:00.0 PCI bridge: Texas Instruments Device b00d
+ 0000:01:00.0 RAM memory: Texas Instruments Device b00d
+
+
+Using ntb_hw_epf Device
+-----------------------
+
+The host side software follows the standard NTB software architecture in Linux.
+All the existing client side NTB utilities like NTB Transport Client and NTB
+Netdev, NTB Ping Pong Test Client and NTB Tool Test Client can be used with NTB
+function device.
+
+For more information on NTB see
+:doc:`Non-Transparent Bridge <../../driver-api/ntb>`
diff --git a/Documentation/PCI/endpoint/pci-vntb-function.rst b/Documentation/PCI/endpoint/pci-vntb-function.rst
new file mode 100644
index 000000000000..0c51f53ab972
--- /dev/null
+++ b/Documentation/PCI/endpoint/pci-vntb-function.rst
@@ -0,0 +1,129 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================
+PCI vNTB Function
+=================
+
+:Author: Frank Li <Frank.Li@nxp.com>
+
+The difference between PCI NTB function and PCI vNTB function is
+
+PCI NTB function need at two endpoint instances and connect HOST1
+and HOST2.
+
+PCI vNTB function only use one host and one endpoint(EP), use NTB
+connect EP and PCI host
+
+.. code-block:: text
+
+
+ +------------+ +---------------------------------------+
+ | | | |
+ +------------+ | +--------------+
+ | NTB | | | NTB |
+ | NetDev | | | NetDev |
+ +------------+ | +--------------+
+ | NTB | | | NTB |
+ | Transfer | | | Transfer |
+ +------------+ | +--------------+
+ | | | | |
+ | PCI NTB | | | |
+ | EPF | | | |
+ | Driver | | | PCI Virtual |
+ | | +---------------+ | NTB Driver |
+ | | | PCI EP NTB |<------>| |
+ | | | FN Driver | | |
+ +------------+ +---------------+ +--------------+
+ | | | | | |
+ | PCI BUS | <-----> | PCI EP BUS | | Virtual PCI |
+ | | PCI | | | BUS |
+ +------------+ +---------------+--------+--------------+
+ PCI RC PCI EP
+
+Constructs used for Implementing vNTB
+=====================================
+
+ 1) Config Region
+ 2) Self Scratchpad Registers
+ 3) Peer Scratchpad Registers
+ 4) Doorbell (DB) Registers
+ 5) Memory Window (MW)
+
+
+Config Region:
+--------------
+
+It is same as PCI NTB Function driver
+
+Scratchpad Registers:
+---------------------
+
+It is appended after Config region.
+
+.. code-block:: text
+
+
+ +--------------------------------------------------+ Base
+ | |
+ | |
+ | |
+ | Common Config Register |
+ | |
+ | |
+ | |
+ +-----------------------+--------------------------+ Base + span_offset
+ | | |
+ | Peer Span Space | Span Space |
+ | | |
+ | | |
+ +-----------------------+--------------------------+ Base + span_offset
+ | | | + span_count * 4
+ | | |
+ | Span Space | Peer Span Space |
+ | | |
+ +-----------------------+--------------------------+
+ Virtual PCI Pcie Endpoint
+ NTB Driver NTB Driver
+
+
+Doorbell Registers:
+-------------------
+
+ Doorbell Registers are used by the hosts to interrupt each other.
+
+Memory Window:
+--------------
+
+ Actual transfer of data between the two hosts will happen using the
+ memory window.
+
+Modeling Constructs:
+====================
+
+32-bit BARs.
+
+====== ===============
+BAR NO CONSTRUCTS USED
+====== ===============
+BAR0 Config Region
+BAR1 Doorbell
+BAR2 Memory Window 1
+BAR3 Memory Window 2
+BAR4 Memory Window 3
+BAR5 Memory Window 4
+====== ===============
+
+64-bit BARs.
+
+====== ===============================
+BAR NO CONSTRUCTS USED
+====== ===============================
+BAR0 Config Region + Scratchpad
+BAR1
+BAR2 Doorbell
+BAR3
+BAR4 Memory Window 1
+BAR5
+====== ===============================
+
+
diff --git a/Documentation/PCI/endpoint/pci-vntb-howto.rst b/Documentation/PCI/endpoint/pci-vntb-howto.rst
new file mode 100644
index 000000000000..70d3bc90893f
--- /dev/null
+++ b/Documentation/PCI/endpoint/pci-vntb-howto.rst
@@ -0,0 +1,164 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================================================
+PCI Non-Transparent Bridge (NTB) Endpoint Function (EPF) User Guide
+===================================================================
+
+:Author: Frank Li <Frank.Li@nxp.com>
+
+This document is a guide to help users use pci-epf-vntb function driver
+and ntb_hw_epf host driver for NTB functionality. The list of steps to
+be followed in the host side and EP side is given below. For the hardware
+configuration and internals of NTB using configurable endpoints see
+Documentation/PCI/endpoint/pci-vntb-function.rst
+
+Endpoint Device
+===============
+
+Endpoint Controller Devices
+---------------------------
+
+To find the list of endpoint controller devices in the system::
+
+ # ls /sys/class/pci_epc/
+ 5f010000.pcie_ep
+
+If PCI_ENDPOINT_CONFIGFS is enabled::
+
+ # ls /sys/kernel/config/pci_ep/controllers
+ 5f010000.pcie_ep
+
+Endpoint Function Drivers
+-------------------------
+
+To find the list of endpoint function drivers in the system::
+
+ # ls /sys/bus/pci-epf/drivers
+ pci_epf_ntb pci_epf_test pci_epf_vntb
+
+If PCI_ENDPOINT_CONFIGFS is enabled::
+
+ # ls /sys/kernel/config/pci_ep/functions
+ pci_epf_ntb pci_epf_test pci_epf_vntb
+
+
+Creating pci-epf-vntb Device
+----------------------------
+
+PCI endpoint function device can be created using the configfs. To create
+pci-epf-vntb device, the following commands can be used::
+
+ # mount -t configfs none /sys/kernel/config
+ # cd /sys/kernel/config/pci_ep/
+ # mkdir functions/pci_epf_vntb/func1
+
+The "mkdir func1" above creates the pci-epf-ntb function device that will
+be probed by pci_epf_vntb driver.
+
+The PCI endpoint framework populates the directory with the following
+configurable fields::
+
+ # ls functions/pci_epf_ntb/func1
+ baseclass_code deviceid msi_interrupts pci-epf-ntb.0
+ progif_code secondary subsys_id vendorid
+ cache_line_size interrupt_pin msix_interrupts primary
+ revid subclass_code subsys_vendor_id
+
+The PCI endpoint function driver populates these entries with default values
+when the device is bound to the driver. The pci-epf-vntb driver populates
+vendorid with 0xffff and interrupt_pin with 0x0001::
+
+ # cat functions/pci_epf_vntb/func1/vendorid
+ 0xffff
+ # cat functions/pci_epf_vntb/func1/interrupt_pin
+ 0x0001
+
+
+Configuring pci-epf-vntb Device
+-------------------------------
+
+The user can configure the pci-epf-vntb device using its configfs entry. In order
+to change the vendorid and the deviceid, the following
+commands can be used::
+
+ # echo 0x1957 > functions/pci_epf_vntb/func1/vendorid
+ # echo 0x0809 > functions/pci_epf_vntb/func1/deviceid
+
+The PCI endpoint framework also automatically creates a sub-directory in the
+function attribute directory. This sub-directory has the same name as the name
+of the function device and is populated with the following NTB specific
+attributes that can be configured by the user::
+
+ # ls functions/pci_epf_vntb/func1/pci_epf_vntb.0/
+ db_count mw1 mw2 mw3 mw4 num_mws
+ spad_count
+
+A sample configuration for NTB function is given below::
+
+ # echo 4 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/db_count
+ # echo 128 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/spad_count
+ # echo 1 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/num_mws
+ # echo 0x100000 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/mw1
+
+A sample configuration for virtual NTB driver for virtual PCI bus::
+
+ # echo 0x1957 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_vid
+ # echo 0x080A > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vntb_pid
+ # echo 0x10 > functions/pci_epf_vntb/func1/pci_epf_vntb.0/vbus_number
+
+Binding pci-epf-ntb Device to EP Controller
+--------------------------------------------
+
+NTB function device should be attached to PCI endpoint controllers
+connected to the host.
+
+ # ln -s controllers/5f010000.pcie_ep functions/pci-epf-ntb/func1/primary
+
+Once the above step is completed, the PCI endpoint controllers are ready to
+establish a link with the host.
+
+
+Start the Link
+--------------
+
+In order for the endpoint device to establish a link with the host, the _start_
+field should be populated with '1'. For NTB, both the PCI endpoint controllers
+should establish link with the host (imx8 don't need this steps)::
+
+ # echo 1 > controllers/5f010000.pcie_ep/start
+
+RootComplex Device
+==================
+
+lspci Output at Host side
+-------------------------
+
+Note that the devices listed here correspond to the values populated in
+"Creating pci-epf-ntb Device" section above::
+
+ # lspci
+ 00:00.0 PCI bridge: Freescale Semiconductor Inc Device 0000 (rev 01)
+ 01:00.0 RAM memory: Freescale Semiconductor Inc Device 0809
+
+Endpoint Device / Virtual PCI bus
+=================================
+
+lspci Output at EP Side / Virtual PCI bus
+-----------------------------------------
+
+Note that the devices listed here correspond to the values populated in
+"Creating pci-epf-ntb Device" section above::
+
+ # lspci
+ 10:00.0 Unassigned class [ffff]: Dawicontrol Computersysteme GmbH Device 1234 (rev ff)
+
+Using ntb_hw_epf Device
+-----------------------
+
+The host side software follows the standard NTB software architecture in Linux.
+All the existing client side NTB utilities like NTB Transport Client and NTB
+Netdev, NTB Ping Pong Test Client and NTB Tool Test Client can be used with NTB
+function device.
+
+For more information on NTB see
+:doc:`Non-Transparent Bridge <../../driver-api/ntb>`
diff --git a/Documentation/PCI/index.rst b/Documentation/PCI/index.rst
index 8f66feaafd4f..e73f84aebde3 100644
--- a/Documentation/PCI/index.rst
+++ b/Documentation/PCI/index.rst
@@ -1,8 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0
-=======================
-Linux PCI Bus Subsystem
-=======================
+=================
+PCI Bus Subsystem
+=================
.. toctree::
:maxdepth: 2
@@ -12,6 +12,7 @@ Linux PCI Bus Subsystem
pciebus-howto
pci-iov-howto
msi-howto
+ sysfs-pci
acpi-info
pci-error-recovery
pcieaer-howto
diff --git a/Documentation/PCI/msi-howto.rst b/Documentation/PCI/msi-howto.rst
index aa2046af69f7..783d30b7bb42 100644
--- a/Documentation/PCI/msi-howto.rst
+++ b/Documentation/PCI/msi-howto.rst
@@ -236,7 +236,7 @@ including a full 'lspci -v' so we can add the quirks to the kernel.
Disabling MSIs below a bridge
-----------------------------
-Some PCI bridges are not able to route MSIs between busses properly.
+Some PCI bridges are not able to route MSIs between buses properly.
In this case, MSIs must be disabled on all devices behind the bridge.
Some bridges allow you to enable MSIs by changing some bits in their
@@ -285,3 +285,13 @@ to bridges between the PCI root and the device, MSIs are disabled.
It is also worth checking the device driver to see whether it supports MSIs.
For example, it may contain calls to pci_alloc_irq_vectors() with the
PCI_IRQ_MSI or PCI_IRQ_MSIX flags.
+
+
+List of device drivers MSI(-X) APIs
+===================================
+
+The PCI/MSI subsystem has a dedicated C file for its exported device driver
+APIs — `drivers/pci/msi/api.c`. The following functions are exported:
+
+.. kernel-doc:: drivers/pci/msi/api.c
+ :export:
diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst
index 13beee23cb04..42e1e78353f3 100644
--- a/Documentation/PCI/pci-error-recovery.rst
+++ b/Documentation/PCI/pci-error-recovery.rst
@@ -17,7 +17,7 @@ chipsets are able to deal with these errors; these include PCI-E chipsets,
and the PCI-host bridges found on IBM Power4, Power5 and Power6-based
pSeries boxes. A typical action taken is to disconnect the affected device,
halting all I/O to it. The goal of a disconnection is to avoid system
-corruption; for example, to halt system memory corruption due to DMA's
+corruption; for example, to halt system memory corruption due to DMAs
to "wild" addresses. Typically, a reconnection mechanism is also
offered, so that the affected PCI device(s) are reset and put back
into working condition. The reset phase requires coordination
@@ -79,19 +79,20 @@ This structure has the form::
struct pci_error_handlers
{
- int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
+ int (*error_detected)(struct pci_dev *dev, pci_channel_state_t);
int (*mmio_enabled)(struct pci_dev *dev);
int (*slot_reset)(struct pci_dev *dev);
void (*resume)(struct pci_dev *dev);
+ void (*cor_error_detected)(struct pci_dev *dev);
};
The possible channel states are::
- enum pci_channel_state {
+ typedef enum {
pci_channel_io_normal, /* I/O channel is in normal state */
pci_channel_io_frozen, /* I/O to channel is blocked */
pci_channel_io_perm_failure, /* PCI card is dead */
- };
+ } pci_channel_state_t;
Possible return values are::
@@ -177,9 +178,9 @@ is STEP 6 (Permanent Failure).
complex and not worth implementing.
The current powerpc implementation doesn't much care if the device
- attempts I/O at this point, or not. I/O's will fail, returning
+ attempts I/O at this point, or not. I/Os will fail, returning
a value of 0xff on read, and writes will be dropped. If more than
- EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
+ EEH_MAX_FAILS I/Os are attempted to a frozen adapter, EEH
assumes that the device driver has gone into an infinite loop
and prints an error to syslog. A reboot is then required to
get the device working again.
@@ -203,7 +204,7 @@ instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset)
.. note::
The following is proposed; no platform implements this yet:
- Proposal: All I/O's should be done _synchronously_ from within
+ Proposal: All I/Os should be done _synchronously_ from within
this callback, errors triggered by them will be returned via
the normal pci_check_whatever() API, no new error_detected()
callback will be issued due to an error happening here. However,
@@ -248,7 +249,7 @@ STEP 4: Slot Reset
------------------
In response to a return value of PCI_ERS_RESULT_NEED_RESET, the
-the platform will perform a slot reset on the requesting PCI device(s).
+platform will perform a slot reset on the requesting PCI device(s).
The actual steps taken by a platform to perform a slot reset
will be platform-dependent. Upon completion of slot reset, the
platform will call the device slot_reset() callback.
@@ -257,7 +258,7 @@ Powerpc platforms implement two levels of slot reset:
soft reset(default) and fundamental(optional) reset.
Powerpc soft reset consists of asserting the adapter #RST line and then
-restoring the PCI BAR's and PCI configuration header to a state
+restoring the PCI BARs and PCI configuration header to a state
that is equivalent to what it would be after a fresh system
power-on followed by power-on BIOS/system firmware initialization.
Soft reset is also known as hot-reset.
@@ -295,7 +296,7 @@ and let the driver restart normal I/O processing.
A driver can still return a critical failure for this function if
it can't get the device operational after reset. If the platform
previously tried a soft reset, it might now try a hard reset (power
-cycle) and then call slot_reset() again. It the device still can't
+cycle) and then call slot_reset() again. If the device still can't
be recovered, there is nothing more that can be done; the platform
will typically report a "permanent failure" in such a case. The
device will be considered "dead" in this case.
@@ -348,7 +349,7 @@ STEP 6: Permanent Failure
-------------------------
A "permanent failure" has occurred, and the platform cannot recover
the device. The platform will call error_detected() with a
-pci_channel_state value of pci_channel_io_perm_failure.
+pci_channel_state_t value of pci_channel_io_perm_failure.
The device driver should, at this point, assume the worst. It should
cancel all pending I/O, refuse all new I/O, returning -EIO to
@@ -361,9 +362,9 @@ permanent failure in some way. If the device is hotplug-capable,
the operator will probably want to remove and replace the device.
Note, however, not all failures are truly "permanent". Some are
caused by over-heating, some by a poorly seated card. Many
-PCI error events are caused by software bugs, e.g. DMA's to
+PCI error events are caused by software bugs, e.g. DMAs to
wild addresses or bogus split transactions due to programming
-errors. See the discussion in powerpc/eeh-pci-error-recovery.txt
+errors. See the discussion in Documentation/arch/powerpc/eeh-pci-error-recovery.rst
for additional detail on real-life experience of the causes of
software errors.
@@ -403,7 +404,7 @@ That is, the recovery API only requires that:
.. note::
Implementation details for the powerpc platform are discussed in
- the file Documentation/powerpc/eeh-pci-error-recovery.rst
+ the file Documentation/arch/powerpc/eeh-pci-error-recovery.rst
As of this writing, there is a growing list of device drivers with
patches implementing error recovery. Not all of these patches are in
@@ -417,10 +418,15 @@ That is, the recovery API only requires that:
- drivers/next/e100.c
- drivers/net/e1000
- drivers/net/e1000e
- - drivers/net/ixgb
- drivers/net/ixgbe
- drivers/net/cxgb3
- drivers/net/s2io.c
+ The cor_error_detected() callback is invoked in handle_error_source() when
+ the error severity is "correctable". The callback is optional and allows
+ additional logging to be done if desired. See example:
+
+ - drivers/cxl/pci.c
+
The End
-------
diff --git a/Documentation/PCI/pci-iov-howto.rst b/Documentation/PCI/pci-iov-howto.rst
index b9fd003206f1..27d35933cea2 100644
--- a/Documentation/PCI/pci-iov-howto.rst
+++ b/Documentation/PCI/pci-iov-howto.rst
@@ -125,14 +125,14 @@ Following piece of code illustrates the usage of the SR-IOV API.
...
}
- static int dev_suspend(struct pci_dev *dev, pm_message_t state)
+ static int dev_suspend(struct device *dev)
{
...
return 0;
}
- static int dev_resume(struct pci_dev *dev)
+ static int dev_resume(struct device *dev)
{
...
@@ -165,8 +165,7 @@ Following piece of code illustrates the usage of the SR-IOV API.
.id_table = dev_id_table,
.probe = dev_probe,
.remove = dev_remove,
- .suspend = dev_suspend,
- .resume = dev_resume,
+ .driver.pm = &dev_pm_ops,
.shutdown = dev_shutdown,
.sriov_configure = dev_sriov_configure,
};
diff --git a/Documentation/PCI/pci.rst b/Documentation/PCI/pci.rst
index 8c016d8c9862..cced568d78e9 100644
--- a/Documentation/PCI/pci.rst
+++ b/Documentation/PCI/pci.rst
@@ -17,7 +17,7 @@ PCI device drivers.
A more complete resource is the third edition of "Linux Device Drivers"
by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman.
LDD3 is available for free (under Creative Commons License) from:
-http://lwn.net/Kernel/LDD3/.
+https://lwn.net/Kernel/LDD3/.
However, keep in mind that all documents are subject to "bit rot".
Refer to the source code if things are not working as described here.
@@ -103,6 +103,7 @@ need pass only as many optional fields as necessary:
- subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
- class and classmask fields default to 0
- driver_data defaults to 0UL.
+ - override_only field defaults to 0.
Note that driver_data must match the value used by any of the pci_device_id
entries defined in the driver. This makes the driver_data field mandatory
@@ -209,12 +210,12 @@ the PCI device by calling pci_enable_device(). This will:
OS BUG: we don't check resource allocations before enabling those
resources. The sequence would make more sense if we called
pci_request_resources() before calling pci_enable_device().
- Currently, the device drivers can't detect the bug when when two
+ Currently, the device drivers can't detect the bug when two
devices have been allocated the same range. This is not a common
problem and unlikely to get fixed soon.
This has been discussed before but not changed as of 2.6.19:
- http://lkml.org/lkml/2006/3/2/194
+ https://lore.kernel.org/r/20060302180025.GC28895@flint.arm.linux.org.uk/
pci_set_master() will enable DMA by setting the bus master bit
@@ -265,33 +266,33 @@ Set the DMA mask size
---------------------
.. note::
If anything below doesn't make sense, please refer to
- Documentation/DMA-API.txt. This section is just a reminder that
+ Documentation/core-api/dma-api.rst. This section is just a reminder that
drivers need to indicate DMA capabilities of the device and is not
an authoritative source for DMA interfaces.
While all drivers should explicitly indicate the DMA capability
(e.g. 32 or 64 bit) of the PCI bus master, devices with more than
32-bit bus master capability for streaming data need the driver
-to "register" this capability by calling pci_set_dma_mask() with
+to "register" this capability by calling dma_set_mask() with
appropriate parameters. In general this allows more efficient DMA
on systems where System RAM exists above 4G _physical_ address.
Drivers for all PCI-X and PCIe compliant devices must call
-pci_set_dma_mask() as they are 64-bit DMA devices.
+dma_set_mask() as they are 64-bit DMA devices.
Similarly, drivers must also "register" this capability if the device
-can directly address "consistent memory" in System RAM above 4G physical
-address by calling pci_set_consistent_dma_mask().
+can directly address "coherent memory" in System RAM above 4G physical
+address by calling dma_set_coherent_mask().
Again, this includes drivers for all PCI-X and PCIe compliant devices.
Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are
64-bit DMA capable for payload ("streaming") data but not control
-("consistent") data.
+("coherent") data.
Setup shared control data
-------------------------
-Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. shared)
-memory. See Documentation/DMA-API.txt for a full description of
+Once the DMA masks are set, the driver can allocate "coherent" (a.k.a. shared)
+memory. See Documentation/core-api/dma-api.rst for a full description of
the DMA APIs. This section is just a reminder that it needs to be done
before enabling DMA on the device.
@@ -366,7 +367,7 @@ steps need to be performed:
- Disable the device from generating IRQs
- Release the IRQ (free_irq())
- Stop all DMA activity
- - Release DMA buffers (both streaming and consistent)
+ - Release DMA buffers (both streaming and coherent)
- Unregister from other subsystems (e.g. scsi or netdev)
- Disable device from responding to MMIO/IO Port addresses
- Release MMIO/IO Port resource(s)
@@ -419,9 +420,9 @@ Once DMA is stopped, clean up streaming DMA first.
I.e. unmap data buffers and return buffers to "upstream"
owners if there is one.
-Then clean up "consistent" buffers which contain the control data.
+Then clean up "coherent" buffers which contain the control data.
-See Documentation/DMA-API.txt for details on unmapping interfaces.
+See Documentation/core-api/dma-api.rst for details on unmapping interfaces.
Unregister from other subsystems
@@ -514,9 +515,8 @@ your driver if they're helpful, or just use plain hex constants.
The device IDs are arbitrary hex numbers (vendor controlled) and normally used
only in a single location, the pci_device_id table.
-Please DO submit new vendor/device IDs to http://pci-ids.ucw.cz/.
-There are mirrors of the pci.ids file at http://pciids.sourceforge.net/
-and https://github.com/pciutils/pciids.
+Please DO submit new vendor/device IDs to https://pci-ids.ucw.cz/.
+There's a mirror of the pci.ids file at https://github.com/pciutils/pciids.
Obsolete functions
diff --git a/Documentation/PCI/pcieaer-howto.rst b/Documentation/PCI/pcieaer-howto.rst
index 0b36b9ebfa4b..e00d63971695 100644
--- a/Documentation/PCI/pcieaer-howto.rst
+++ b/Documentation/PCI/pcieaer-howto.rst
@@ -16,62 +16,61 @@ Overview
About this guide
----------------
-This guide describes the basics of the PCI Express Advanced Error
+This guide describes the basics of the PCI Express (PCIe) Advanced Error
Reporting (AER) driver and provides information on how to use it, as
-well as how to enable the drivers of endpoint devices to conform with
-PCI Express AER driver.
+well as how to enable the drivers of Endpoint devices to conform with
+the PCIe AER driver.
-What is the PCI Express AER Driver?
------------------------------------
+What is the PCIe AER Driver?
+----------------------------
-PCI Express error signaling can occur on the PCI Express link itself
-or on behalf of transactions initiated on the link. PCI Express
+PCIe error signaling can occur on the PCIe link itself
+or on behalf of transactions initiated on the link. PCIe
defines two error reporting paradigms: the baseline capability and
the Advanced Error Reporting capability. The baseline capability is
-required of all PCI Express components providing a minimum defined
+required of all PCIe components providing a minimum defined
set of error reporting requirements. Advanced Error Reporting
-capability is implemented with a PCI Express advanced error reporting
+capability is implemented with a PCIe Advanced Error Reporting
extended capability structure providing more robust error reporting.
-The PCI Express AER driver provides the infrastructure to support PCI
-Express Advanced Error Reporting capability. The PCI Express AER
-driver provides three basic functions:
+The PCIe AER driver provides the infrastructure to support PCIe Advanced
+Error Reporting capability. The PCIe AER driver provides three basic
+functions:
- Gathers the comprehensive error information if errors occurred.
- Reports error to the users.
- Performs error recovery actions.
-AER driver only attaches root ports which support PCI-Express AER
-capability.
+The AER driver only attaches to Root Ports and RCECs that support the PCIe
+AER capability.
User Guide
==========
-Include the PCI Express AER Root Driver into the Linux Kernel
--------------------------------------------------------------
+Include the PCIe AER Root Driver into the Linux Kernel
+------------------------------------------------------
-The PCI Express AER Root driver is a Root Port service driver attached
-to the PCI Express Port Bus driver. If a user wants to use it, the driver
-has to be compiled. Option CONFIG_PCIEAER supports this capability. It
-depends on CONFIG_PCIEPORTBUS, so pls. set CONFIG_PCIEPORTBUS=y and
-CONFIG_PCIEAER = y.
+The PCIe AER driver is a Root Port service driver attached
+via the PCIe Port Bus driver. If a user wants to use it, the driver
+must be compiled. It is enabled with CONFIG_PCIEAER, which
+depends on CONFIG_PCIEPORTBUS.
-Load PCI Express AER Root Driver
---------------------------------
+Load PCIe AER Root Driver
+-------------------------
Some systems have AER support in firmware. Enabling Linux AER support at
-the same time the firmware handles AER may result in unpredictable
+the same time the firmware handles AER would result in unpredictable
behavior. Therefore, Linux does not handle AER events unless the firmware
-grants AER control to the OS via the ACPI _OSC method. See the PCI FW 3.0
+grants AER control to the OS via the ACPI _OSC method. See the PCI Firmware
Specification for details regarding _OSC usage.
AER error output
----------------
When a PCIe AER error is captured, an error message will be output to
-console. If it's a correctable error, it is output as a warning.
+console. If it's a correctable error, it is output as an info message.
Otherwise, it is printed as an error. So users could choose different
log level to filter out correctable error messages.
@@ -82,9 +81,9 @@ Below shows an example::
0000:50:00.0: [20] Unsupported Request (First)
0000:50:00.0: TLP Header: 04000001 00200a03 05010000 00050100
-In the example, 'Requester ID' means the ID of the device who sends
-the error message to root port. Pls. refer to pci express specs for
-other fields.
+In the example, 'Requester ID' means the ID of the device that sent
+the error message to the Root Port. Please refer to PCIe specs for other
+fields.
AER Statistics / Counters
-------------------------
@@ -96,65 +95,56 @@ Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
Developer Guide
===============
-To enable AER aware support requires a software driver to configure
-the AER capability structure within its device and to provide callbacks.
+To enable error recovery, a software driver must provide callbacks.
-To support AER better, developers need understand how AER does work
-firstly.
+To support AER better, developers need to understand how AER works.
-PCI Express errors are classified into two types: correctable errors
-and uncorrectable errors. This classification is based on the impacts
+PCIe errors are classified into two types: correctable errors
+and uncorrectable errors. This classification is based on the impact
of those errors, which may result in degraded performance or function
failure.
Correctable errors pose no impacts on the functionality of the
-interface. The PCI Express protocol can recover without any software
+interface. The PCIe protocol can recover without any software
intervention or any loss of data. These errors are detected and
-corrected by hardware. Unlike correctable errors, uncorrectable
+corrected by hardware.
+
+Unlike correctable errors, uncorrectable
errors impact functionality of the interface. Uncorrectable errors
-can cause a particular transaction or a particular PCI Express link
+can cause a particular transaction or a particular PCIe link
to be unreliable. Depending on those error conditions, uncorrectable
errors are further classified into non-fatal errors and fatal errors.
Non-fatal errors cause the particular transaction to be unreliable,
-but the PCI Express link itself is fully functional. Fatal errors, on
+but the PCIe link itself is fully functional. Fatal errors, on
the other hand, cause the link to be unreliable.
-When AER is enabled, a PCI Express device will automatically send an
-error message to the PCIe root port above it when the device captures
+When PCIe error reporting is enabled, a device will automatically send an
+error message to the Root Port above it when it captures
an error. The Root Port, upon receiving an error reporting message,
-internally processes and logs the error message in its PCI Express
-capability structure. Error information being logged includes storing
+internally processes and logs the error message in its AER
+Capability structure. Error information being logged includes storing
the error reporting agent's requestor ID into the Error Source
Identification Registers and setting the error bits of the Root Error
-Status Register accordingly. If AER error reporting is enabled in Root
-Error Command Register, the Root Port generates an interrupt if an
+Status Register accordingly. If AER error reporting is enabled in the Root
+Error Command Register, the Root Port generates an interrupt when an
error is detected.
-Note that the errors as described above are related to the PCI Express
+Note that the errors as described above are related to the PCIe
hierarchy and links. These errors do not include any device specific
errors because device specific errors will still get sent directly to
the device driver.
-Configure the AER capability structure
---------------------------------------
-
-AER aware drivers of PCI Express component need change the device
-control registers to enable AER. They also could change AER registers,
-including mask and severity registers. Helper function
-pci_enable_pcie_error_reporting could be used to enable AER. See
-section 3.3.
-
Provide callbacks
-----------------
-callback reset_link to reset pci express link
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+callback reset_link to reset PCIe link
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-This callback is used to reset the pci express physical link when a
-fatal error happens. The root port aer service driver provides a
-default reset_link function, but different upstream ports might
-have different specifications to reset pci express link, so all
-upstream ports should provide their own reset_link functions.
+This callback is used to reset the PCIe physical link when a
+fatal error happens. The Root Port AER service driver provides a
+default reset_link function, but different Upstream Ports might
+have different specifications to reset the PCIe link, so
+Upstream Port drivers may provide their own reset_link functions.
Section 3.2.2.2 provides more detailed info on when to call
reset_link.
@@ -162,24 +152,24 @@ reset_link.
PCI error-recovery callbacks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The PCI Express AER Root driver uses error callbacks to coordinate
+The PCIe AER Root driver uses error callbacks to coordinate
with downstream device drivers associated with a hierarchy in question
when performing error recovery actions.
Data struct pci_driver has a pointer, err_handler, to point to
pci_error_handlers who consists of a couple of callback function
-pointers. AER driver follows the rules defined in
-pci-error-recovery.txt except pci express specific parts (e.g.
-reset_link). Pls. refer to pci-error-recovery.txt for detailed
+pointers. The AER driver follows the rules defined in
+pci-error-recovery.rst except PCIe-specific parts (e.g.
+reset_link). Please refer to pci-error-recovery.rst for detailed
definitions of the callbacks.
-Below sections specify when to call the error callback functions.
+The sections below specify when to call the error callback functions.
Correctable errors
~~~~~~~~~~~~~~~~~~
Correctable errors pose no impacts on the functionality of
-the interface. The PCI Express protocol can recover without any
+the interface. The PCIe protocol can recover without any
software intervention or any loss of data. These errors do not
require any recovery actions. The AER driver clears the device's
correctable error status register accordingly and logs these errors.
@@ -190,12 +180,12 @@ Non-correctable (non-fatal and fatal) errors
If an error message indicates a non-fatal error, performing link reset
at upstream is not required. The AER driver calls error_detected(dev,
pci_channel_io_normal) to all drivers associated within a hierarchy in
-question. for example::
+question. For example::
- EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort
+ Endpoint <==> Downstream Port B <==> Upstream Port A <==> Root Port
-If Upstream port A captures an AER error, the hierarchy consists of
-Downstream port B and EndPoint.
+If Upstream Port A captures an AER error, the hierarchy consists of
+Downstream Port B and Endpoint.
A driver may return PCI_ERS_RESULT_CAN_RECOVER,
PCI_ERS_RESULT_DISCONNECT, or PCI_ERS_RESULT_NEED_RESET, depending on
@@ -212,36 +202,11 @@ to reset the link. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER
and reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes
to mmio_enabled.
-helper functions
-----------------
-::
-
- int pci_enable_pcie_error_reporting(struct pci_dev *dev);
-
-pci_enable_pcie_error_reporting enables the device to send error
-messages to root port when an error is detected. Note that devices
-don't enable the error reporting by default, so device drivers need
-call this function to enable it.
-
-::
-
- int pci_disable_pcie_error_reporting(struct pci_dev *dev);
-
-pci_disable_pcie_error_reporting disables the device to send error
-messages to root port when an error is detected.
-
-::
-
- int pci_aer_clear_nonfatal_status(struct pci_dev *dev);`
-
-pci_aer_clear_nonfatal_status clears non-fatal errors in the uncorrectable
-error status register.
-
Frequent Asked Questions
------------------------
Q:
- What happens if a PCI Express device driver does not provide an
+ What happens if a PCIe device driver does not provide an
error recovery handler (pci_driver->err_handler is equal to NULL)?
A:
@@ -257,24 +222,6 @@ A:
Fatal error recovery will fail if the errors are reported by the
upstream ports who are attached by the service driver.
-Q:
- How does this infrastructure deal with driver that is not PCI
- Express aware?
-
-A:
- This infrastructure calls the error callback functions of the
- driver when an error happens. But if the driver is not aware of
- PCI Express, the device might not report its own errors to root
- port.
-
-Q:
- What modifications will that driver need to make it compatible
- with the PCI Express AER Root driver?
-
-A:
- It could call the helper functions to enable AER in devices and
- cleanup uncorrectable status register. Pls. refer to section 3.3.
-
Software error injection
========================
@@ -296,5 +243,5 @@ from:
https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/
-More information about aer-inject can be found in the document comes
-with its source code.
+More information about aer-inject can be found in the document in
+its source code.
diff --git a/Documentation/PCI/pciebus-howto.rst b/Documentation/PCI/pciebus-howto.rst
index f882ff62c51f..a0027e8fb0d0 100644
--- a/Documentation/PCI/pciebus-howto.rst
+++ b/Documentation/PCI/pciebus-howto.rst
@@ -213,8 +213,12 @@ PCI Config Registers
--------------------
Each service driver runs its PCI config operations on its own
-capability structure except the PCI Express capability structure, in
-which Root Control register and Device Control register are shared
-between PME and AER. This patch assumes that all service drivers
-will be well behaved and not overwrite other service driver's
-configuration settings.
+capability structure except the PCI Express capability structure,
+that is shared between many drivers including the service drivers.
+RMW Capability accessors (pcie_capability_clear_and_set_word(),
+pcie_capability_set_word(), and pcie_capability_clear_word()) protect
+a selected set of PCI Express Capability Registers (Link Control
+Register and Root Control Register). Any change to those registers
+should be performed using RMW accessors to avoid problems due to
+concurrent updates. For the up-to-date list of protected registers,
+see pcie_capability_clear_and_set_word().
diff --git a/Documentation/PCI/sysfs-pci.rst b/Documentation/PCI/sysfs-pci.rst
new file mode 100644
index 000000000000..f495185aa88a
--- /dev/null
+++ b/Documentation/PCI/sysfs-pci.rst
@@ -0,0 +1,138 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================================
+Accessing PCI device resources through sysfs
+============================================
+
+sysfs, usually mounted at /sys, provides access to PCI resources on platforms
+that support it. For example, a given bus might look like this::
+
+ /sys/devices/pci0000:17
+ |-- 0000:17:00.0
+ | |-- class
+ | |-- config
+ | |-- device
+ | |-- enable
+ | |-- irq
+ | |-- local_cpus
+ | |-- remove
+ | |-- resource
+ | |-- resource0
+ | |-- resource1
+ | |-- resource2
+ | |-- revision
+ | |-- rom
+ | |-- subsystem_device
+ | |-- subsystem_vendor
+ | `-- vendor
+ `-- ...
+
+The topmost element describes the PCI domain and bus number. In this case,
+the domain number is 0000 and the bus number is 17 (both values are in hex).
+This bus contains a single function device in slot 0. The domain and bus
+numbers are reproduced for convenience. Under the device directory are several
+files, each with their own function.
+
+ =================== =====================================================
+ file function
+ =================== =====================================================
+ class PCI class (ascii, ro)
+ config PCI config space (binary, rw)
+ device PCI device (ascii, ro)
+ enable Whether the device is enabled (ascii, rw)
+ irq IRQ number (ascii, ro)
+ local_cpus nearby CPU mask (cpumask, ro)
+ remove remove device from kernel's list (ascii, wo)
+ resource PCI resource host addresses (ascii, ro)
+ resource0..N PCI resource N, if present (binary, mmap, rw\ [1]_)
+ resource0_wc..N_wc PCI WC map resource N, if prefetchable (binary, mmap)
+ revision PCI revision (ascii, ro)
+ rom PCI ROM resource, if present (binary, ro)
+ subsystem_device PCI subsystem device (ascii, ro)
+ subsystem_vendor PCI subsystem vendor (ascii, ro)
+ vendor PCI vendor (ascii, ro)
+ =================== =====================================================
+
+::
+
+ ro - read only file
+ rw - file is readable and writable
+ wo - write only file
+ mmap - file is mmapable
+ ascii - file contains ascii text
+ binary - file contains binary data
+ cpumask - file contains a cpumask type
+
+.. [1] rw for IORESOURCE_IO (I/O port) regions only
+
+The read only files are informational, writes to them will be ignored, with
+the exception of the 'rom' file. Writable files can be used to perform
+actions on the device (e.g. changing config space, detaching a device).
+mmapable files are available via an mmap of the file at offset 0 and can be
+used to do actual device programming from userspace. Note that some platforms
+don't support mmapping of certain resources, so be sure to check the return
+value from any attempted mmap. The most notable of these are I/O port
+resources, which also provide read/write access.
+
+The 'enable' file provides a counter that indicates how many times the device
+has been enabled. If the 'enable' file currently returns '4', and a '1' is
+echoed into it, it will then return '5'. Echoing a '0' into it will decrease
+the count. Even when it returns to 0, though, some of the initialisation
+may not be reversed.
+
+The 'rom' file is special in that it provides read-only access to the device's
+ROM file, if available. It's disabled by default, however, so applications
+should write the string "1" to the file to enable it before attempting a read
+call, and disable it following the access by writing "0" to the file. Note
+that the device must be enabled for a rom read to return data successfully.
+In the event a driver is not bound to the device, it can be enabled using the
+'enable' file, documented above.
+
+The 'remove' file is used to remove the PCI device, by writing a non-zero
+integer to the file. This does not involve any kind of hot-plug functionality,
+e.g. powering off the device. The device is removed from the kernel's list of
+PCI devices, the sysfs directory for it is removed, and the device will be
+removed from any drivers attached to it. Removal of PCI root buses is
+disallowed.
+
+Accessing legacy resources through sysfs
+----------------------------------------
+
+Legacy I/O port and ISA memory resources are also provided in sysfs if the
+underlying platform supports them. They're located in the PCI class hierarchy,
+e.g.::
+
+ /sys/class/pci_bus/0000:17/
+ |-- bridge -> ../../../devices/pci0000:17
+ |-- cpuaffinity
+ |-- legacy_io
+ `-- legacy_mem
+
+The legacy_io file is a read/write file that can be used by applications to
+do legacy port I/O. The application should open the file, seek to the desired
+port (e.g. 0x3e8) and do a read or a write of 1, 2 or 4 bytes. The legacy_mem
+file should be mmapped with an offset corresponding to the memory offset
+desired, e.g. 0xa0000 for the VGA frame buffer. The application can then
+simply dereference the returned pointer (after checking for errors of course)
+to access legacy memory space.
+
+Supporting PCI access on new platforms
+--------------------------------------
+
+In order to support PCI resource mapping as described above, Linux platform
+code should ideally define ARCH_GENERIC_PCI_MMAP_RESOURCE and use the generic
+implementation of that functionality. To support the historical interface of
+mmap() through files in /proc/bus/pci, platforms may also set HAVE_PCI_MMAP.
+
+Alternatively, platforms which set HAVE_PCI_MMAP may provide their own
+implementation of pci_mmap_resource_range() instead of defining
+ARCH_GENERIC_PCI_MMAP_RESOURCE.
+
+Platforms which support write-combining maps of PCI resources must define
+arch_can_pci_mmap_wc() which shall evaluate to non-zero at runtime when
+write-combining is permitted. Platforms which support maps of I/O resources
+define arch_can_pci_mmap_io() similarly.
+
+Legacy resources are protected by the HAVE_PCI_LEGACY define. Platforms
+wishing to support legacy functionality should define it and provide
+pci_legacy_read, pci_legacy_write and pci_mmap_legacy_page_range functions.