summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kernel/eeh_driver.c
AgeCommit message (Collapse)Author
2015-05-13powerpc/eeh: fix comment for wait_state()Wei Yang
To retrieve the PCI slot state, EEH driver would set a timeout for that. While current comment is not aligned to what the code does. This patch fixes those comments according to the code. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-24powerpc/eeh: Remove device_node dependencyGavin Shan
The patch removes struct eeh_dev::dn and the corresponding helper functions: eeh_dev_to_of_node() and of_node_to_eeh_dev(). Instead, eeh_dev_to_pdn() and pdn_to_eeh_dev() should be used to get the pdn, which might contain device_node on PowerNV platform. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-01-23powerpc/eeh: Allow to set maximal frozen timesGavin Shan
When PE's frozen count hits maximal allowed frozen times, which is 5 currently, it will be forced to be offline permanently. Once the PE is removed permanently, rebooting machine is required to bring the PE back. It's not convienent when testing EEH functionality. The patch exports the maximal allowed frozen times through debugfs entry (/sys/kernel/debug/powerpc/eeh_max_freezes). Requested-by: Ryan Grimm <grimm@linux.vnet.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23powerpc/eeh: Introduce flag EEH_PE_REMOVEDGavin Shan
The conditions that one specific PE's frozen count exceeds the maximal allowed times (EEH_MAX_ALLOWED_FREEZES) and it's in isolated or recovery state indicate the PE was removed permanently implicitly. The patch introduces flag EEH_PE_REMOVED to indicate that explicitly so that we don't depend on the fixed maximal allowed times, which can be varied as we do in subsequent patch. Flag EEH_PE_REMOVED is expected to be marked for the PE whose frozen count exceeds the maximal allowed times, or just failed from recovery. Requested-by: Ryan Grimm <grimm@linux.vnet.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-02powerpc/eeh: Set EEH_PE_RESET on PE resetGavin Shan
The patch introduces additional flag EEH_PE_RESET to indicate the corresponding PE is under reset. In turn, the PE retrieval bakcend on PowerNV platform can return unfrozen state for the EEH core to moving forward. Flag EEH_PE_CFG_BLOCKED isn't the correct one for the purpose. In PCI passthrou case, the problem is more worse: Guest doesn't recover 6th EEH error. The PE is left in isolated (frozen) and config blocked state on Broadcom adapters. We can't retrieve the PE's state correctly any more, even from the host side via sysfs /sys/bus/pci/devices/xxx/eeh_pe_state. Reported-by: Rajeshkumar Subramanian <rajeshkumars@in.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-10-15powerpc/eeh: Rename flag EEH_PE_RESET to EEH_PE_CFG_BLOCKEDGavin Shan
The flag EEH_PE_RESET indicates blocking config space of the PE during reset time. We potentially need block PE's config space other than reset time. So it's reasonable to replace it with EEH_PE_CFG_BLOCKED to indicate its usage. There are no substantial code or logic changes in this patch. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30powerpc/eeh: Emulate EEH recovery for VFIO devicesGavin Shan
When enabling EEH functionality on passed through devices (PE) with VFIO, the devices in the PE would be removed permanently from guest side. In that case, the PE remains frozen state. When returning PE to host, or restarting the guest again, we had mechanism unfreezing the PE by clearing PESTA/B frozen bits. However, that's not enough for some adapters, which are indicated as following "lspci" shows. Those adapters require hot reset on the parent bus to bring their firmware back to workable state. Otherwise, those adaptrs won't be operative and the host (for returning case) or the guest will fail to load the drivers for those adapters without exception. 0000:01:00.0 Ethernet controller: Emulex Corporation OneConnect \ 10Gb NIC (be3) (rev 02) 0000:01:00.0 0200: 19a2:0710 (rev 02) 0001:03:00.0 Ethernet controller: Emulex Corporation OneConnect \ NIC (Lancer) (rev 10) 0001:03:00.0 0200: 10df:e220 (rev 10) The patch adds mechanism to emulate EEH recovery (for hot reset on parent PCI bus) on 3 gates to fix the issue: open/release one adapter of the PE, enable EEH functionality on one adapter of the PE. Reported-by: Murilo Fossa Vicentini <muvic@br.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30powerpc/eeh: Use eeh_unfreeze_pe()Gavin Shan
The patch uses eeh_unfreeze_pe() to replace the logic clearing frozen IO and DMA, in order to simplify the code. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-08-05powerpc/eeh: Replace pr_warning() with pr_warn()Gavin Shan
pr_warn() is equal to pr_warning(), but the former is a bit more formal according to commit fc62f2f ("kernel.h: add pr_warn for symmetry to dev_warn, netdev_warn"). The patch replaces pr_warning() with pr_warn(). Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11powerpc/powernv: Fix killed EEH eventGavin Shan
On PowerNV platform, EEH errors are reported by IO accessors or poller driven by interrupt. After the PE is isolated, we won't produce EEH event for the PE. The current implementation has possibility of EEH event lost in this way: The interrupt handler queues one "special" event, which drives the poller. EEH thread doesn't pick the special event yet. IO accessors kicks in, the frozen PE is marked as "isolated" and EEH event is queued to the list. EEH thread runs because of special event and purge all existing EEH events. However, we never produce an other EEH event for the frozen PE. Eventually, the PE is marked as "isolated" and we don't have EEH event to recover it. The patch fixes the issue to keep EEH events for PEs that have been marked as "isolated" with the help of additional "force" help to eeh_remove_event(). Reported-by: Rolf Brudeseth <rolfb@us.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-06-11powerpc/eeh: Clear frozen state for child PEGavin Shan
Since commit cb523e09 ("powerpc/eeh: Avoid I/O access during PE reset"), the PE is kept as frozen state on hardware level until the PE reset is done completely. After that, we explicitly clear the frozen state of the affected PE. However, there might have frozen child PEs of the affected PE and we also need clear their frozen state as well. Otherwise, the recovery is going to fail. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28powerpc/eeh: Can't recover from non-PE-reset caseGavin Shan
When PCI_ERS_RESULT_CAN_RECOVER returned from device drivers, the EEH core should enable I/O and DMA for the affected PE. However, it was missed to have DMA enabled in eeh_handle_normal_event(). Besides, the frozen state of the affected PE should be cleared after successful recovery, but we didn't. The patch fixes both of the issues as above. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28powerpc/eeh: No hotplug on permanently removed devGavin Shan
The issue was detected in a bit complicated test case where we have multiple hierarchical PEs shown as following figure: +-----------------+ | PE#3 p2p#0 | | p2p#1 | +-----------------+ | +-----------------+ | PE#4 pdev#0 | | pdev#1 | +-----------------+ PE#4 (have 2 PCI devices) is the child of PE#3, which has 2 p2p bridges. We accidentally had less-known scenario: PE#4 was removed permanently from the system because of permanent failure (e.g. exceeding the max allowd failure times in last hour), then we detects EEH errors on PE#3 and tried to recover it. However, eeh_dev instances for pdev#0/1 were not detached from PE#4, which was still connected to PE#3. All of that was because of the fact that we rely on count-based pcibios_release_device(), which isn't reliable enough. When doing recovery for PE#3, we still apply hotplug on PE#4 and pdev#0/1, which are not valid any more. Eventually, we run into kernel crash. The patch fixes above issue from two aspects. For unplug, we simply skip those permanently removed PE, whose state is (EEH_PE_STATE_ISOLATED && !EEH_PE_STATE_RECOVERING) and its frozen count should be greater than EEH_MAX_ALLOWED_FREEZES. For plug, we marked all permanently removed EEH devices with EEH_DEV_REMOVED and return 0xFF's on read its PCI config so that PCI core will omit them. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28powerpc/eeh: Avoid I/O access during PE resetGavin Shan
We have suffered recrusive frozen PE a lot, which was caused by IO accesses during the PE reset. Ben came up with the good idea to keep frozen PE until recovery (BAR restore) gets done. With that, IO accesses during PE reset are dropped by hardware and wouldn't incur the recrusive frozen PE any more. The patch implements the idea. We don't clear the frozen state until PE reset is done completely. During the period, the EEH core expects unfrozen state from backend to keep going. So we have to reuse EEH_PE_RESET flag, which has been set during PE reset, to return normal state from backend. The side effect is we have to clear frozen state for towice (PE reset and clear it explicitly), but that's harmless. We have some limitations on pHyp. pHyp doesn't allow to enable IO or DMA for unfrozen PE. So we don't enable them on unfrozen PE in eeh_pci_enable(). We have to enable IO before grabbing logs on pHyp. Otherwise, 0xFF's is always returned from PCI config space. Also, we had wrong return value from eeh_pci_enable() for EEH_OPT_THAW_DMA case. The patch fixes it too. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28powerpc/eeh: Block PCI-CFG access during PE resetGavin Shan
We've observed multiple PE reset failures because of PCI-CFG access during that period. Potentially, some device drivers can't support EEH very well and they can't put the device to motionless state before PE reset. So those device drivers might produce PCI-CFG accesses during PE reset. Also, we could have PCI-CFG access from user space (e.g. "lspci"). Since access to frozen PE should return 0xFF's, we can block PCI-CFG access during the period of PE reset so that we won't get recrusive EEH errors. The patch adds flag EEH_PE_RESET, which is kept during PE reset. The PowerNV/pSeries PCI-CFG accessors reuse the flag to block PCI-CFG accordingly. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28powerpc/eeh: Remove EEH_PE_PHB_DEADGavin Shan
The PE state (for eeh_pe instance) EEH_PE_PHB_DEAD is duplicate to EEH_PE_ISOLATED. Originally, those PHBs (PHB PE) with EEH_PE_PHB_DEAD would be removed from the system. However, it's safe to replace that with EEH_PE_ISOLATED. The patch also clear EEH_PE_RECOVERING after fenced PHB has been handled, either failure or success. It makes the PHB PE state consistent with: PHB functions normally NONE PHB has been removed EEH_PE_ISOLATED PHB fenced, recovery in progress EEH_PE_ISOLATED | RECOVERING Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-03-05powerpc: eeh: Fixup the brown paperbag fallout of the "cleanup"Thomas Gleixner
Commit b8a9a11b9 (powerpc: eeh: Kill another abuse of irq_desc) is missing some brackets ..... It's not a good idea to write patches in grumpy mode and then forget to at least compile test them or rely on the few eyeballs discussing that patch to spot it..... Reported-by: fengguang.wu@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Gavin Shan <shangw@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: ppc <linuxppc-dev@lists.ozlabs.org>
2014-03-04powerpc: Eeh: Kill another abuse of irq_descThomas Gleixner
commit 91150af3a (powerpc/eeh: Fix unbalanced enable for IRQ) is another brilliant example of trainwreck engineering. The patch "fixes" the issue of an unbalanced call to irq_enable() which causes a prominent warning by checking the disabled state of the interrupt line and call conditionally into the core code. This is wrong in two aspects: 1) The warning is there to tell users, that they need to fix their asymetric enable/disable patterns by finding the root cause and solving it there. It's definitely not meant to work around it by conditionally calling into the core code depending on the random state of the irq line. Asymetric irq_disable/enable calls are a clear sign of wrong usage of the interfaces which have to be cured at the root and not by somehow hacking around it. 2) The abuse of core internal data structure instead of using the proper interfaces for retrieving the information for the 'hack around' irq_desc is core internal and it's clear enough stated. Replace at least the irq_desc abuse with the proper functions and add a big fat comment why this is absurd and completely wrong. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Gavin Shan <shangw@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: ppc <linuxppc-dev@lists.ozlabs.org> Link: http://lkml.kernel.org/r/20140223212736.562906212@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-11powerpc/eeh: Drop taken reference to driver on eeh_rmv_deviceThadeu Lima de Souza Cascardo
Commit f5c57710dd62dd06f176934a8b4b8accbf00f9f8 ("powerpc/eeh: Use partial hotplug for EEH unaware drivers") introduces eeh_rmv_device, which may grab a reference to a driver, but not release it. That prevents a driver from being removed after it has gone through EEH recovery. This patch drops the reference if it was taken. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-27Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "So here's my next branch for powerpc. A bit late as I was on vacation last week. It's mostly the same stuff that was in next already, I just added two patches today which are the wiring up of lockref for powerpc, which for some reason fell through the cracks last time and is trivial. The highlights are, in addition to a bunch of bug fixes: - Reworked Machine Check handling on kernels running without a hypervisor (or acting as a hypervisor). Provides hooks to handle some errors in real mode such as TLB errors, handle SLB errors, etc... - Support for retrieving memory error information from the service processor on IBM servers running without a hypervisor and routing them to the memory poison infrastructure. - _PAGE_NUMA support on server processors - 32-bit BookE relocatable kernel support - FSL e6500 hardware tablewalk support - A bunch of new/revived board support - FSL e6500 deeper idle states and altivec powerdown support You'll notice a generic mm change here, it has been acked by the relevant authorities and is a pre-req for our _PAGE_NUMA support" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (121 commits) powerpc: Implement arch_spin_is_locked() using arch_spin_value_unlocked() powerpc: Add support for the optimised lockref implementation powerpc/powernv: Call OPAL sync before kexec'ing powerpc/eeh: Escalate error on non-existing PE powerpc/eeh: Handle multiple EEH errors powerpc: Fix transactional FP/VMX/VSX unavailable handlers powerpc: Don't corrupt transactional state when using FP/VMX in kernel powerpc: Reclaim two unused thread_info flag bits powerpc: Fix races with irq_work Move precessing of MCE queued event out from syscall exit path. pseries/cpuidle: Remove redundant call to ppc64_runlatch_off() in cpu idle routines powerpc: Make add_system_ram_resources() __init powerpc: add SATA_MV to ppc64_defconfig powerpc/powernv: Increase candidate fw image size powerpc: Add debug checks to catch invalid cpu-to-node mappings powerpc: Fix the setup of CPU-to-Node mappings during CPU online powerpc/iommu: Don't detach device without IOMMU group powerpc/eeh: Hotplug improvement powerpc/eeh: Call opal_pci_reinit() on powernv for restoring config space powerpc/eeh: Add restore_config operation ...
2014-01-15powerpc/eeh: Use global PCI rescan-remove lockingRafael J. Wysocki
Race conditions are theoretically possible between the PCI device addition and removal in the PPC64 PCI error recovery driver and the generic PCI bus rescan and device removal that can be triggered via sysfs. To avoid those race conditions make PPC64 PCI error recovery driver use global PCI rescan-remove locking around PCI device addition and removal. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-01-15powerpc/eeh: Handle multiple EEH errorsGavin Shan
For one PCI error relevant OPAL event, we possibly have multiple EEH errors for that. For example, multiple frozen PEs detected on different PHBs. Unfortunately, we didn't cover the case. The patch enumarates the return value from eeh_ops::next_error() and change eeh_handle_special_event() and eeh_ops::next_error() to handle all existing EEH errors. As Ben pointed out, we needn't list_for_each_entry_safe() since we are not deleting any PHB from the hose_list and the EEH serialized lock should be held while purging EEH events. The patch covers those suggestions as well. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15powerpc/eeh: Hotplug improvementGavin Shan
When EEH error comes to one specific PCI device before its driver is loaded, we will apply hotplug to recover the error. During the plug time, the PCI device will be probed and its driver is loaded. Then we wrongly calls to the error handlers if the driver supports EEH explicitly. The patch intends to fix by introducing flag EEH_DEV_NO_HANDLER and set it before we remove the PCI device. In turn, we can avoid wrongly calls the error handlers of the PCI device after its driver loaded. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-05powerpc: Increase EEH recovery timeout for SR-IOVBrian King
In order to support concurrent adapter firmware download to SR-IOV adapters on pSeries, each VF will see an EEH event where the slot will remain in the unavailable state for the duration of the adapter firmware update, which can take as long as 5 minutes. Extend the EEH recovery timeout to account for this. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-24powerpc/eeh: Fix unbalanced enable for IRQGavin Shan
The patch fixes following issue: Unbalanced enable for IRQ 23 ------------[ cut here ]------------ WARNING: at kernel/irq/manage.c:437 : NIP [c00000000016de8c] .__enable_irq+0x11c/0x140 LR [c00000000016de88] .__enable_irq+0x118/0x140 Call Trace: [c000003ea1f23880] [c00000000016de88] .__enable_irq+0x118/0x140 (unreliable) [c000003ea1f23910] [c00000000016df08] .enable_irq+0x58/0xa0 [c000003ea1f239a0] [c0000000000388b4] .eeh_enable_irq+0xc4/0xe0 [c000003ea1f23a30] [c000000000038a28] .eeh_report_reset+0x78/0x130 [c000003ea1f23ac0] [c000000000037508] .eeh_pe_dev_traverse+0x98/0x170 [c000003ea1f23b60] [c0000000000391ac] .eeh_handle_normal_event+0x2fc/0x3d0 [c000003ea1f23bf0] [c000000000039538] .eeh_handle_event+0x2b8/0x2c0 [c000003ea1f23c90] [c000000000039600] .eeh_event_handler+0xc0/0x170 [c000003ea1f23d30] [c0000000000da9a0] .kthread+0xf0/0x100 [c000003ea1f23e30] [c00000000000a1dc] .ret_from_kernel_thread+0x5c/0x80 Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-24powerpc/eeh: Use partial hotplug for EEH unaware driversGavin Shan
When EEH error happens to one specific PE, some devices with drivers supporting EEH won't except hotplug on the device. However, there might have other deivces without driver, or with driver without EEH support. For the case, we need do partial hotplug in order to make sure that the PE becomes absolutely quite during reset. Otherise, the PE reset might fail and leads to failure of error recovery. The current code doesn't handle that 'mixed' case properly, it either uses the error callbacks to the drivers, or tries hotplug, but doesn't handle a PE (EEH domain) composed of a combination of the two. The patch intends to support so-called "partial" hotplug for EEH: Before we do reset, we stop and remove those PCI devices without EEH sensitive driver. The corresponding EEH devices are not detached from its PE, but with special flag. After the reset is done, those EEH devices with the special flag will be scanned one by one. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-24powerpc/eeh: Keep PE during hotplugGavin Shan
When we do normal hotplug, the PE (shadow EEH structure) shouldn't be kept around. However, we need to keep it if the hotplug an artifial one caused by EEH errors recovery. Since we remove EEH device through the PCI hook pcibios_release_device(), the flag "purge_pe" passed to various functions is meaningless. So the patch removes the meaningless flag and introduce new flag "EEH_PE_KEEP" to save the PE while doing hotplug during EEH error recovery. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-01powerpc/eeh: Refactor the output messageGavin Shan
We needn't the the whole backtrace other than one-line message in the error reporting interrupt handler. For errors triggered by access PCI config space or MMIO, we replace "WARN(1, ...)" with pr_err() and dump_stack(). The patch also adds more output messages to indicate what EEH core is doing. Besides, some printk() are replaced with pr_warning(). Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20powerpc/eeh: EEH core to handle special eventGavin Shan
On PowerNV platform, the EEH event caused by interrupt won't have binding PE. The patch enables EEH core to handle the special event. To avoid the current logic we have, The eeh_handle_event() is renamed to eeh_handle_normal_event(), and the eeh_handle_special_event() is introduced. The function eeh_handle_event() dispatches to above two functions according to the input parameter. Besides, new backend "next_error" added to eeh_ops and it's expected to have following return values: 4 - Dead IOC 3 - Dead PHB 2 - Fenced PHB 1 - Frozen PE 0 - No error found Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20powerpc/eeh: Trace time on first error for PEGavin Shan
We're not expecting that one specific PE got frozen for over 5 times in last hour. Otherwise, the PE will be removed from the system upon newly coming EEH errors. The patch introduces time stamp to trace the first error on specific PE in last hour and function to update that accordingly. Besides, the time stamp is recovered during PE hotplug path as we did for frozen count. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20powerpc/eeh: Move common part to kernel directoryGavin Shan
The patch moves the common part of EEH core into arch/powerpc/kernel directory so that we needn't PPC_PSERIES while compiling POWERNV platform: * Move the EEH common part into arch/powerpc/kernel * Move the functions for PCI hotplug from pSeries platform to arch/powerpc/kernel/pci-hotplug.c * Move CONFIG_EEH from arch/powerpc/platforms/pseries/Kconfig to arch/powerpc/platforms/Kconfig * Adjust makefile accordingly Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>