aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc/platforms/powernv
AgeCommit message (Collapse)Author
2016-08-09powerpc/pnv/pci: Fix incorrect PE reservation attempt on some 64-bit BARsBenjamin Herrenschmidt
The generic allocation code may sometimes decide to assign a prefetchable 64-bit BAR to the M32 window. In fact it may also decide to allocate a 64-bit non-prefetchable BAR to the M64 one ! So using the resource flags as a test to decide which window was used for PE allocation is just wrong and leads to insane PE numbers. Instead, compare the addresses to figure it out. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Rename the function as agreed by Ben & Gavin] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09powerpc/book3s: Fix MCE console messages for unrecoverable MCE.Mahesh Salgaonkar
When machine check occurs with MSR(RI=0), it means MC interrupt is unrecoverable and kernel goes down to panic path. But the console message still shows it as recovered. This patch fixes the MCE console messages. Fixes: 36df96f8acaf ("powerpc/book3s: Decode and save machine check event.") Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09powerpc/powernv/ioda: Fix TCE invalidate to work in real mode againAlexey Kardashevskiy
Commit fd141d1a99a3 ("powerpc/powernv/pci: Rework accessing the TCE invalidate register") broke TCE invalidation on IODA2/PHB3 for real mode. This makes invalidate work again. Fixes: fd141d1a99a3 ("powerpc/powernv/pci: Rework accessing the TCE invalidate register") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09powerpc/xics: Properly set Edge/Level type and enable resendBenjamin Herrenschmidt
This sets the type of the interrupt appropriately. We set it as follow: - If not mapped from the device-tree, we use edge. This is the case of the virtual interrupts and PCI MSIs for example. - If mapped from the device-tree and #interrupt-cells is 2 (PAPR compliant), we use the second cell to set the appropriate type - If mapped from the device-tree and #interrupt-cells is 1 (current OPAL on P8 does that), we assume level sensitive since those are typically going to be the PSI LSIs which are level sensitive. Additionally, we mark the interrupts requested via the opal_interrupts property all level. This is a bit fishy but the best we can do until we fix OPAL to properly expose them with a complete descriptor. It is also correct for the current HW anyway as OPAL interrupts are currently PCI error and PSI interrupts which are level. Finally now that edge interrupts are properly identified, we can enable CONFIG_HARDIRQS_SW_RESEND which will make the core re-send them if they occur while masked, which some drivers rely upon. This fixes issues with lost interrupts on some Mellanox adapters. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-04dma-mapping: use unsigned long for dma_attrsKrzysztof Kozlowski
The dma-mapping core and the implementations do not change the DMA attributes passed by pointer. Thus the pointer can point to const data. However the attributes do not have to be a bitfield. Instead unsigned long will do fine: 1. This is just simpler. Both in terms of reading the code and setting attributes. Instead of initializing local attributes on the stack and passing pointer to it to dma_set_attr(), just set the bits. 2. It brings safeness and checking for const correctness because the attributes are passed by value. Semantic patches for this change (at least most of them): virtual patch virtual context @r@ identifier f, attrs; @@ f(..., - struct dma_attrs *attrs + unsigned long attrs , ...) { ... } @@ identifier r.f; @@ f(..., - NULL + 0 ) and // Options: --all-includes virtual patch virtual context @r@ identifier f, attrs; type t; @@ t f(..., struct dma_attrs *attrs); @@ identifier r.f; @@ f(..., - NULL + 0 ) Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> Acked-by: Mark Salter <msalter@redhat.com> [c6x] Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris] Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm] Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Joerg Roedel <jroedel@suse.de> [iommu] Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp] Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core] Acked-by: David Vrabel <david.vrabel@citrix.com> [xen] Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb] Acked-by: Joerg Roedel <jroedel@suse.de> [iommu] Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon] Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390] Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32] Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc] Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: - ARM: GICv3 ITS emulation and various fixes. Removal of the old VGIC implementation. - s390: support for trapping software breakpoints, nested virtualization (vSIE), the STHYI opcode, initial extensions for CPU model support. - MIPS: support for MIPS64 hosts (32-bit guests only) and lots of cleanups, preliminary to this and the upcoming support for hardware virtualization extensions. - x86: support for execute-only mappings in nested EPT; reduced vmexit latency for TSC deadline timer (by about 30%) on Intel hosts; support for more than 255 vCPUs. - PPC: bugfixes. * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits) KVM: PPC: Introduce KVM_CAP_PPC_HTM MIPS: Select HAVE_KVM for MIPS64_R{2,6} MIPS: KVM: Reset CP0_PageMask during host TLB flush MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX() MIPS: KVM: Sign extend MFC0/RDHWR results MIPS: KVM: Fix 64-bit big endian dynamic translation MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase MIPS: KVM: Use 64-bit CP0_EBase when appropriate MIPS: KVM: Set CP0_Status.KX on MIPS64 MIPS: KVM: Make entry code MIPS64 friendly MIPS: KVM: Use kmap instead of CKSEG0ADDR() MIPS: KVM: Use virt_to_phys() to get commpage PFN MIPS: Fix definition of KSEGX() for 64-bit KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD kvm: x86: nVMX: maintain internal copy of current VMCS KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures KVM: arm64: vgic-its: Simplify MAPI error handling KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers KVM: arm64: vgic-its: Turn device_id validation into generic ID validation ...
2016-07-21powerpc/powernv/ioda: Fix endianness when reading TCEsAlexey Kardashevskiy
The iommu_table_ops::exchange() callback writes new TCE to the table and returns old value and permission mask. The old TCE value is correctly converted from BE to CPU endian; however permission mask was calculated from BE value and therefore always returned DMA_NONE which could cause memory leak on LE systems using VFIO SPAPR TCE IOMMU v1 driver. This fixes pnv_tce_xchg() to have @oldtce a CPU endian. Fixes: 05c6cfb9dce0 ("powerpc/iommu/powernv: Release replaced TCE") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21powerpc: Get rid of ppc_md.init_early()Benjamin Herrenschmidt
It is now called right after platform probe, so the probe function can just do the job. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21powerpc/64: Move 64-bit probe_machine() to later in the boot processBenjamin Herrenschmidt
We no long need the machine type that early, so we can move probe_machine() to after the device-tree has been expanded. This will allow further consolidation. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21powerpc/64: Move MMU backend selection out of platform codeBenjamin Herrenschmidt
We move it into early_mmu_init() based on firmware features. For PS3, we have to move the setting of these into early_init_devtree(). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21powerpc: Put exception configuration in a common placeBenjamin Herrenschmidt
The various calls to establish exception endianness and AIL are now done from a single point using already established CPU and FW feature bits to decide what to do. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-19powerpc/powernv: Fix pci-cxl.c build when CONFIG_MODULES=nIan Munsie
pnv_cxl_enable_phb_kernel_api() grabs a reference to the cxl module to prevent it from being unloaded after the PHB has been switched to CX4 mode. This breaks the build when CONFIG_MODULES=n as module_mutex doesn't exist. However, if we don't have modules, we don't need to protect against the case of the cxl module being unloaded. As such, split the relevant code out into a function surrounded with #if IS_MODULE(CXL) so we don't try to compile it if cxl isn't being compiled as a module. Fixes: 5918dbc9b4ec ("powerpc/powernv: Add support for the cxl kernel api on the real phb") Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/powernv/pci: Check status of a PHB before using itBenjamin Herrenschmidt
If the firmware encounters an error (internal or HW) during initialization of a PHB, it might leave the device-node in the tree but mark it disabled using the "status" property. We should check it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/powernv/pci: Use the device-tree to get available range of M64'sBenjamin Herrenschmidt
M64's are the configurable 64-bit windows that cover the 64-bit MMIO space. We used to hard code 16 windows. Newer chips might have a variable number and might need to reserve some as well (for example on PHB4/POWER9, M32 and M64 are actually unified and we use M64#0 to map the 32-bit space). So newer OPALs will provide a property we can use to know what range of windows is available. The property is named so that it can eventually support multiple ranges but we only use the first one for now. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/powernv/pci: Fallback to OPAL for TCE invalidationsBenjamin Herrenschmidt
If we don't find registers for the PHB or don't know the model specific invalidation method, use OPAL calls instead. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/powernv/pci: Rework accessing the TCE invalidate registerBenjamin Herrenschmidt
It's architected, always in a known place, so there is no need to keep a separate pointer to it, we use the existing "regs", and we complement it with a real mode variant. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> # Conflicts: # arch/powerpc/platforms/powernv/pci-ioda.c # arch/powerpc/platforms/powernv/pci.h Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/powernv/pci: Remove SWINV constants and obsolete TCE codeBenjamin Herrenschmidt
We have some obsolete code in pnv_pci_p7ioc_tce_invalidate() to handle some internal lab tools that have stopped being useful a long time ago. Remove that along with the definition and test for the TCE_PCI_SWINV_* flags whose value is basically always the same. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/powernv/pci: Rename TCE invalidation callsBenjamin Herrenschmidt
The TCE invalidation functions are fairly implementation specific, and while the IODA specs more/less describe the register, in practice various implementation workarounds may be required. So name the functions after the target PHB. Note today and for the foreseeable future, there's a 1:1 relationship between an IODA version and a PHB implementation. There exist another variant of IODA1 (Torrent) but we never supported in with OPAL and never will. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/opal: Add real mode call wrappersBenjamin Herrenschmidt
Replace the old generic opal_call_realmode() with proper per-call wrappers similar to the normal ones and convert callers. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/powernv: Discover IODA3 PHBsBenjamin Herrenschmidt
We instanciate them as IODA2. We also change the MSI EOI hack to only kick on PHB3 since it will not be needed on any new implementation. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: Add XICS emulation APIsBenjamin Herrenschmidt
OPAL provides an emulated XICS interrupt controller to use as a fallback on newer processors that don't have a XICS. It's meant as a way to provide backward compatibility with future processors. Add the corresponding interfaces. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: Use deepest stop state when cpu is offlinedShreyas B. Prabhu
If hardware supports stop state, use the deepest stop state when the cpu is offlined. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: Add platform support for stop instructionShreyas B. Prabhu
POWER ISA v3 defines a new idle processor core mechanism. In summary, a) new instruction named stop is added. This instruction replaces instructions like nap, sleep, rvwinkle. b) new per thread SPR named Processor Stop Status and Control Register (PSSCR) is added which controls the behavior of stop instruction. PSSCR layout: ---------------------------------------------------------- | PLS | /// | SD | ESL | EC | PSLL | /// | TR | MTL | RL | ---------------------------------------------------------- 0 4 41 42 43 44 48 54 56 60 PSSCR key fields: Bits 0:3 - Power-Saving Level Status. This field indicates the lowest power-saving state the thread entered since stop instruction was last executed. Bit 42 - Enable State Loss 0 - No state is lost irrespective of other fields 1 - Allows state loss Bits 44:47 - Power-Saving Level Limit This limits the power-saving level that can be entered into. Bits 60:63 - Requested Level Used to specify which power-saving level must be entered on executing stop instruction This patch adds support for stop instruction and PSSCR handling. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14cxl: Add support for interrupts on the Mellanox CX4Ian Munsie
The Mellanox CX4 in cxl mode uses a hybrid interrupt model, where interrupts are routed from the networking hardware to the XSL using the MSIX table, and from there will be transformed back into an MSIX interrupt using the cxl style interrupts (i.e. using IVTE entries and ranges to map a PE and AFU interrupt number to an MSIX address). We want to hide the implementation details of cxl interrupts as much as possible. To this end, we use a special version of the MSI setup & teardown routines in the PHB while in cxl mode to allocate the cxl interrupts and configure the IVTE entries in the process element. This function does not configure the MSIX table - the CX4 card uses a custom format in that table and it would not be appropriate to fill that out in generic code. The rest of the functionality is similar to the "Full MSI-X mode" described in the CAIA, and this could be easily extended to support other adapters that use that mode in the future. The interrupts will be associated with the default context. If the maximum number of interrupts per context has been limited (e.g. by the mlx5 driver), it will automatically allocate additional kernel contexts to associate extra interrupts as required. These contexts will be started using the same WED that was used to start the default context. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14powerpc/powernv: Add support for the cxl kernel api on the real phbIan Munsie
This adds support for the peer model of the cxl kernel api to the PowerNV PHB, in which physical function 0 represents the cxl function on the card (an XSL in the case of the CX4), which other physical functions will use for memory access and interrupt services. It is referred to as the peer model as these functions are peers of one another, as opposed to the Virtual PHB model which forms a hierarchy. This patch exports APIs to enable the peer mode, check if a PCI device is attached to a PHB in this mode, and to set and get the peer AFU for this mode. The cxl driver will enable this mode for supported cards by calling pnv_cxl_enable_phb_kernel_api(). This will set a flag in the PHB to note that this mode is enabled, and switch out it's controller_ops for the cxl version. The cxl version of the controller_ops struct implements it's own versions of the enable_device_hook and release_device to handle refcounting on the peer AFU and to allocate a default context for the device. Once enabled, the cxl kernel API may not be disabled on a PHB. Currently there is no safe way to disable cxl mode short of a reboot, so until that changes there is no reason to support the disable path. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14powerpc/powernv: Split cxl code out into a separate fileIan Munsie
The support for using the Mellanox CX4 in cxl mode will require additions to the PHB code. In preparation for this, move the existing cxl code out of pci-ioda.c into a separate pci-cxl.c file to keep things more organised. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-08powerpc/opal: Wake up kopald polling thread before waiting for eventsBenjamin Herrenschmidt
On some environments (prototype machines, some simulators, etc...) there is no functional interrupt source to signal completion, so we rely on the fairly slow OPAL heartbeat. In a number of cases, the calls complete very quickly or even immediately. We've observed that it helps a lot to wakeup the OPAL heartbeat thread before waiting for event in those cases, it will call OPAL immediately to collect completions for anything that finished fast enough. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-By: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-29powerpc/powernv: Add driver for operator panel on FSP machinesSuraj Jitindar Singh
Implement new character device driver to allow access from user space to the operator panel display present on IBM Power Systems machines with FSPs. This will allow status information to be presented on the display which is visible to a user. The driver implements a character buffer which a user can read/write by accessing the device (/dev/op_panel). This buffer is then displayed on the operator panel display. Any attempt to write past the last character position will have no effect and attempts to write more characters than the size of the display will be truncated. The device may only be accessed by a single process at a time. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-29powerpc/opal: Add inline function to get rc from an ASYNC_COMP opal_msgSuraj Jitindar Singh
An opal_msg of type OPAL_MSG_ASYNC_COMP contains the return code in the params[1] struct member. However this isn't intuitive or obvious when reading the code and requires that a user look at the skiboot documentation or opal-api.h to verify this. Add an inline function to get the return code from an opal_msg and update call sites accordingly. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-28powerpc/powernv: Fix spelling mistake "Retrived" -> "Retrieved"Colin Ian King
Trivial fix to spelling mistake in pr_debug() message. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-23powerpc/powernv: set power_save func after the idle states are initializedShreyas B. Prabhu
pnv_init_idle_states() discovers supported idle states from the device tree and does the required initialization. Set power_save function pointer only after this initialization is done Otherwise on machines which don't support nap, eg. Power9, the kernel will crash when it tries to nap. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21powerpc/powernv: Print correct PHB type namesGavin Shan
We're initializing "IODA1" and "IODA2" PHBs though they are IODA2 and NPU PHBs as below kernel log indicates. Initializing IODA1 OPAL PHB /pciex@3fffe40700000 Initializing IODA2 OPAL PHB /pciex@3fff000400000 This fixes the PHB names. After it's applied, we get: Initializing IODA2 PHB (/pciex@3fffe40700000) Initializing NPU PHB (/pciex@3fff000400000) Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21powerpc/powernv: Functions to get/set PCI slot stateGavin Shan
This exports 4 functions, which base on the corresponding OPAL APIs to get/set PCI slot status. Those functions are going to be used by PowerNV PCI hotplug driver: pnv_pci_get_device_tree() opal_get_device_tree() pnv_pci_get_presence_state() opal_pci_get_presence_state() pnv_pci_get_power_state() opal_pci_get_power_state() pnv_pci_set_power_state() opal_pci_set_power_state() Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21powerpc/powernv: Introduce pnv_pci_get_slot_id()Gavin Shan
This introduces pnv_pci_get_slot_id() to get the hotpluggable PCI slot ID from the corresponding device node. It will be used by hotplug driver. Requested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21powerpc/powernv: Use PCI slot reset infrastructureGavin Shan
The (OPAL) firmware might provide the PCI slot reset capability which is identified by property "ibm,reset-by-firmware" on the PCI slot associated device node. This routes the reset request to firmware if "ibm,reset-by-firmware" exists in the PCI slot device node. Otherwise, the reset is done inside kernel as before. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21powerpc/powernv: Support PCI slot IDGavin Shan
The reset and poll functionality from (OPAL) firmware supports PHB and PCI slot at same time. They are identified by ID. This supports PCI slot ID by: * Rename the argument name for opal_pci_reset() and opal_pci_poll() accordingly * Rename pnv_eeh_phb_poll() to pnv_eeh_poll() and adjust its argument name. * One macro is added to produce PCI slot ID. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21powerpc/pci: Delay populating pdnGavin Shan
The pdn (struct pci_dn) instances are allocated from memblock or bootmem when creating PCI controller (hoses) in setup_arch(). PCI hotplug, which will be supported by proceeding patches, releases PCI device nodes and their corresponding pdn on unplugging event. The memory chunks for pdn instances allocated from memblock or bootmem are hard to reused after being released. This delays creating pdn by pci_devs_phb_init() from setup_arch() to core_initcall() so that they are allocated from slab. The memory consumed by pdn can be released to system without problem during PCI unplugging time. It indicates that pci_dn is unavailable in setup_arch() and the the fixup on pdn (like AGP's) can't be carried out that time. We have to do that in pcibios_root_bridge_prepare() on maple/pasemi/powermac platforms where/when the pdn is available. pcibios_root_bridge_prepare is called from subsys_initcall() which is executed after core_initcall() so the code flow does not change. At the mean while, the EEH device is created when pdn is populated, meaning pdn and EEH device have same life cycle. In turn, we needn't call eeh_dev_init() to create EEH device explicitly. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21powerpc/powernv: Dynamically release PEGavin Shan
This supports releasing PEs dynamically. A reference count is introduced to PE representing number of PCI devices associated with the PE. The reference count is increased when PCI device joins the PE and decreased when PCI device leaves the PE in pnv_pci_release_device(). When the count becomes zero, the PE and its consumed resources are released. Note that the count is accessed concurrently. So a counter with "int" type is enough here. In order to release the sources consumed by the PE, couple of helper functions are introduced as below: * pnv_pci_ioda1_unset_window() - Unset IODA1 DMA32 window * pnv_pci_ioda1_release_dma_pe() - Release IODA1 DMA32 segments * pnv_pci_ioda2_release_dma_pe() - Release IODA2 DMA resource * pnv_ioda_release_pe_seg() - Unmap IO/M32/M64 segments Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21powerpc/powernv: Make pnv_ioda_deconfigure_pe() visibleGavin Shan
pnv_ioda_deconfigure_pe() is visible only when CONFIG_PCI_IOV is enabled. The function will be used to tear down PE's associated mapping in PCI hotplug path that doesn't depend on CONFIG_PCI_IOV. This makes pnv_ioda_deconfigure_pe() visible and not depend on CONFIG_PCI_IOV. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21powerpc/powernv: Extend PCI bridge resourcesGavin Shan
The PCI slots are associated with root port or downstream ports of the PCIe switch connected to root port. When adapter is hot added to the PCI slot, it usually requests more IO or memory resource from the directly connected parent bridge (port) and update the bridge's windows accordingly. The resource windows of upstream bridges can't be updated automatically. It possibly leads to unbalanced resource across the bridges: The window of downstream bridge is overruning that of upstream bridge. The IO or MMIO path won't work. This resolves the above issue by extending bridge windows of root port and upstream port of the PCIe switch connected to the root port to PHB's windows. The windows of root port and bridge behind that are extended to the PHB's windows to accomodate the PCI hotplug happening in future. The PHB's 64KB 32-bits MSI region is included in bridge's M32 windows (in hardware) though it's excluded in the corresponding resource, as the bridge's M32 windows have 1MB as their minimal alignment. We observed EEH error during system boot when the MSI region is included in bridge's M32 window. This excludes top 1MB (including 64KB 32-bits MSI region) region from bridge's M32 windows when extending them. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21powerpc/powernv: Setup PE for root busGavin Shan
There is no parent bridge for root bus, meaning pcibios_setup_bridge() isn't invoked for root bus. The PE for root bus is the ancestor of other PEs in PELTV. It means we need PE for root bus populated before all others. This populates the PE for root bus in pcibios_setup_bridge() path if it's not populated yet. The PE number next to the reserved one is used as the PE# to avoid holes in continuous M64 space. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21powerpc/powernv: Create PEs in pcibios_setup_bridge()Gavin Shan
Currently, the PEs and their associated resources are assigned in ppc_md.pcibios_fixup() except those used by SRIOV VFs. The function is called for once after PCI probing and resources assignment is completed. So it's obviously not hotplug friendly. This creates PEs dynamically in pcibios_setup_bridge() that is called for the event during system bootup and PCI hotplug: updating PCI bridge's windows after resource assignment/reassignment are done. In partial hotplug case, not all PCI devices included to one particular PE are unplugged and plugged again, we just need unbinding/binding the hot added PCI devices with the corresponding PE without creating new one. The change is applied to IODA1 and IODA2 PHBs only. The behaviour on NPU PHBs aren't changed. There are no PCI bridges on NPU PHBs, meaning pcibios_setup_bridge() won't be invoked there. We have to use old path (pnv_pci_ioda_fixup()) to setup PEs on NPU PHBs. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21powerpc/powernv: Allocate PE# in reverse orderGavin Shan
PE number for one particular PE can be allocated dynamically or reserved according to the consumed M64 (64-bits prefetchable) segments of the PE. The M64 segment can't be remapped to arbitrary PE, meaning the PE number is determined according to the index of the consumed M64 segment. As below figure shows, M64 resource grows from low to high end, meaning the PE (number) reserved according to M64 segment grows from low to high end as well, so does the dynamically allocated PE number. It will lead to conflict: PE number (M64 segment) reserved by dynamic allocation is required by hot added PCI adapter at later point. It fails the PCI hotplug because of the PE number can't be reserved based on the index of the consumed M64 segment. +---+---+---+---+---+--------------------------------+-----+ | 0 | 1 | 2 | 3 | 4 | ....... | 255 | +---+---+---+---+---+--------------------------------+-----+ PE number for dynamic allocation -----------------> PE number reserved for M64 segment -----------------> To resolve above conflicts, this forces the PE number to be allocated dynamically in reverse order. With this patch applied, the PE numbers are reserved in ascending order, but allocated dynamically in reverse order. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21powerpc/powernv: Increase PE# capacityGavin Shan
Each PHB maintains an array helping to translate 2-bytes Request ID (RID) to PE# with the assumption that PE# takes one byte, meaning that we can't have more than 256 PEs. However, pci_dn->pe_number already had 4-bytes for the PE#. This extends the PE# capacity for every PHB. After that, the PE number is represented by 4-bytes value. Then we can reuse IODA_INVALID_PE to check the PE# in phb->pe_rmap[] is valid or not. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21powerpc/powernv: Move pnv_pci_ioda_setup_opal_tce_kill() aroundGavin Shan
pnv_pci_ioda_setup_opal_tce_kill() called by pnv_ioda_setup_dma() to remap the TCE kill regiter. What's done in pnv_ioda_setup_dma() will be covered in pcibios_setup_bridge() which is invoked on each PCI bridge. It means we will possibly remap the TCE kill register for multiple times and it's unnecessary. This moves pnv_pci_ioda_setup_opal_tce_kill() to where the PHB is initialized (pnv_pci_init_ioda_phb()) to avoid above issue. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-21powerpc/powernv: Remove PCI_RESET_DELAY_USGavin Shan
The macro defined in arch/powerpc/platforms/powernv/pci.c isn't used by anyone. Just remove it. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-20powerpc/powernv: Remove the usage of PACAR1 from opal wrappersMahesh Salgaonkar
OPAL_CALL wrapper code sticks the r1 (stack pointer) into PACAR1 purely for debugging purpose only. The power7_wakeup* functions relies on stack pointer saved in PACAR1. Any opal call made using opal wrapper (directly or in-directly) before we fall through power7_wakeup*, then it ends up replacing r1 in PACAR1(r13) leading to kernel panic. So far we don't see any issues because we have never made any opal calls using OPAL wrapper before power7_wakeup*. But the subsequent HMI patch would need to invoke C calls during cpu wakeup/idle path that in-directly makes opal call using opal wrapper. This patch facilitates the subsequent HMI patch by removing usage of PACAR1 from opal call wrapper. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-06-16cxl: Add support for CAPP DMA modeIan Munsie
This adds support for using CAPP DMA mode, which is required for XSL based cards such as the Mellanox CX4 to function. This is currently an RFC as it depends on the corresponding support to be merged into skiboot first, which was submitted here: http://patchwork.ozlabs.org/patch/625582/ In the event that the skiboot on the system does not have the above support, it will indicate as such in the kernel log and abort the init process. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-16powerpc: Introduce asm-prototypes.hDaniel Axtens
Sparse picked up a number of functions that are implemented in C and then only referred to in asm code. This introduces asm-prototypes.h, which provides a place for prototypes of these functions. This silences some sparse warnings. Signed-off-by: Daniel Axtens <dja@axtens.net> [mpe: Add include guards, clean up copyright & GPL text] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-16powerpc/sparse: make some things staticDaniel Axtens
This is just a smattering of things picked up by sparse that should be made static. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>