aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/edac
AgeCommit message (Collapse)Author
2024-02-20Merge branch 'v5.4/standard/base' into v5.4/standard/intel-x86Bruce Ashfield
2024-01-25EDAC/thunderx: Fix possible out-of-bounds string accessArnd Bergmann
[ Upstream commit 475c58e1a471e9b873e3e39958c64a2d278275c8 ] Enabling -Wstringop-overflow globally exposes a warning for a common bug in the usage of strncat(): drivers/edac/thunderx_edac.c: In function 'thunderx_ocx_com_threaded_isr': drivers/edac/thunderx_edac.c:1136:17: error: 'strncat' specified bound 1024 equals destination size [-Werror=stringop-overflow=] 1136 | strncat(msg, other, OCX_MESSAGE_SIZE); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ... 1145 | strncat(msg, other, OCX_MESSAGE_SIZE); ... 1150 | strncat(msg, other, OCX_MESSAGE_SIZE); ... Apparently the author of this driver expected strncat() to behave the way that strlcat() does, which uses the size of the destination buffer as its third argument rather than the length of the source buffer. The result is that there is no check on the size of the allocated buffer. Change it to strlcat(). [ bp: Trim compiler output, fixup commit message. ] Fixes: 41003396f932 ("EDAC, thunderx: Add Cavium ThunderX EDAC driver") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20231122222007.3199885-1-arnd@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-19Merge branch 'v5.4/standard/base' into v5.4/standard/intel-x86Bruce Ashfield
2023-05-17EDAC/skx: Fix overflows on the DRAM row address mapping arraysQiuxu Zhuo
[ Upstream commit 71b1e3ba3fed5a34c5fac6d3a15c2634b04c1eb7 ] The current DRAM row address mapping arrays skx_{open,close}_row[] only support ranks with sizes up to 16G. Decoding a rank address to a DRAM row address for a 32G rank by using either one of the above arrays by the skx_edac driver, will result in an overflow on the array. For a 32G rank, the most significant DRAM row address bit (the bit17) is mapped from the bit34 of the rank address. Add this new mapping item to both arrays to fix the overflow issue. Fixes: 4ec656bdf43a ("EDAC, skx_edac: Add EDAC driver for Skylake") Reported-by: Feng Xu <feng.f.xu@intel.com> Tested-by: Feng Xu <feng.f.xu@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/all/20230211011728.71764-1-qiuxu.zhuo@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-17Merge branch 'v5.4/standard/base' into v5.4/standard/intel-x86Bruce Ashfield
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com> # Conflicts: # drivers/dma/dmaengine.c
2023-02-06EDAC/qcom: Do not pass llcc_driv_data as edac_device_ctl_info's pvt_infoManivannan Sadhasivam
commit 977c6ba624f24ae20cf0faee871257a39348d4a9 upstream. The memory for llcc_driv_data is allocated by the LLCC driver. But when it is passed as the private driver info to the EDAC core, it will get freed during the qcom_edac driver release. So when the qcom_edac driver gets probed again, it will try to use the freed data leading to the use-after-free bug. Hence, do not pass llcc_driv_data as pvt_info but rather reference it using the platform_data pointer in the qcom_edac driver. Fixes: 27450653f1db ("drivers: edac: Add EDAC driver support for QCOM SoCs") Reported-by: Steev Klimaszewski <steev@kali.org> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride Cc: <stable@vger.kernel.org> # 4.20 Link: https://lore.kernel.org/r/20230118150904.26913-4-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-06EDAC/device: Respect any driver-supplied workqueue polling valueManivannan Sadhasivam
commit cec669ff716cc83505c77b242aecf6f7baad869d upstream. The EDAC drivers may optionally pass the poll_msec value. Use that value if available, else fall back to 1000ms. [ bp: Touchups. ] Fixes: e27e3dac6517 ("drivers/edac: add edac_device class") Reported-by: Luca Weiss <luca.weiss@fairphone.com> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Steev Klimaszewski <steev@kali.org> # Thinkpad X13s Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride Cc: <stable@vger.kernel.org> # 4.9 Link: https://lore.kernel.org/r/COZYL8MWN97H.MROQ391BGA09@otso Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-06EDAC/highbank: Fix memory leak in highbank_mc_probe()Miaoqian Lin
[ Upstream commit e7a293658c20a7945014570e1921bf7d25d68a36 ] When devres_open_group() fails, it returns -ENOMEM without freeing memory allocated by edac_mc_alloc(). Call edac_mc_free() on the error handling path to avoid a memory leak. [ bp: Massage commit message. ] Fixes: a1b01edb2745 ("edac: add support for Calxeda highbank memory controller") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Link: https://lore.kernel.org/r/20221229054825.1361993-1-linmq006@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-25Merge branch 'v5.4/standard/base' into v5.4/standard/intel-x86Bruce Ashfield
2023-01-18EDAC/device: Fix period calculation in edac_device_reset_delay_period()Eliav Farber
commit e84077437902ec99eba0a6b516df772653f142c7 upstream. Fix period calculation in case user sets a value of 1000. The input of round_jiffies_relative() should be in jiffies and not in milli-seconds. [ bp: Use the same code pattern as in edac_device_workq_setup() for clarity. ] Fixes: c4cf3b454eca ("EDAC: Rework workqueue handling") Signed-off-by: Eliav Farber <farbere@amazon.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/20221020124458.22153-1-farbere@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-18EDAC/i10nm: fix refcount leak in pci_get_dev_wrapper()Yang Yingliang
[ Upstream commit 9c8921555907f4d723f01ed2d859b66f2d14f08e ] As the comment of pci_get_domain_bus_and_slot() says, it returns a PCI device with refcount incremented, so it doesn't need to call an extra pci_dev_get() in pci_get_dev_wrapper(), and the PCI device needs to be put in the error path. Fixes: d4dc89d069aa ("EDAC, i10nm: Add a driver for Intel 10nm server processors") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20221128065512.3572550-1-yangyingliang@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-09Merge branch 'v5.4/standard/base' into v5.4/standard/intel-x86Bruce Ashfield
2022-04-27EDAC/synopsys: Read the error count from the correct registerShubhrajyoti Datta
commit e2932d1f6f055b2af2114c7e64a26dc1b5593d0c upstream. Currently, the error count is read wrongly from the status register. Read the count from the proper error count register (ERRCNT). [ bp: Massage. ] Fixes: b500b4a029d5 ("EDAC, synopsys: Add ECC support for ZynqMP DDR controller") Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Michal Simek <michal.simek@xilinx.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220414102813.4468-1-shubhrajyoti.datta@xilinx.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-02Merge branch 'v5.4/standard/base' into v5.4/standard/intel-x86Bruce Ashfield
2022-02-23EDAC: Fix calculation of returned address and next offset in edac_align_ptr()Eliav Farber
commit f8efca92ae509c25e0a4bd5d0a86decea4f0c41e upstream. Do alignment logic properly and use the "ptr" local variable for calculating the remainder of the alignment. This became an issue because struct edac_mc_layer has a size that is not zero modulo eight, and the next offset that was prepared for the private data was unaligned, causing an alignment exception. The patch in Fixes: which broke this actually wanted to "what we actually care about is the alignment of the actual pointer that's about to be returned." But it didn't check that alignment. Use the correct variable "ptr" for that. [ bp: Massage commit message. ] Fixes: 8447c4d15e35 ("edac: Do alignment logic properly in edac_align_ptr()") Signed-off-by: Eliav Farber <farbere@amazon.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220113100622.12783-2-farbere@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-09Merge branch 'v5.4/standard/base' into v5.4/standard/intel-x86Bruce Ashfield
2022-02-08EDAC/xgene: Fix deferred probingSergey Shtylyov
commit dfd0dfb9a7cc04acf93435b440dd34c2ca7b4424 upstream. The driver overrides error codes returned by platform_get_irq_optional() to -EINVAL for some strange reason, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the proper error codes to platform driver code upwards. [ bp: Massage commit message. ] Fixes: 0d4429301c4a ("EDAC: Add APM X-Gene SoC EDAC driver") Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220124185503.6720-3-s.shtylyov@omp.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-08EDAC/altera: Fix deferred probingSergey Shtylyov
commit 279eb8575fdaa92c314a54c0d583c65e26229107 upstream. The driver overrides the error codes returned by platform_get_irq() to -ENODEV for some strange reason, so if it returns -EPROBE_DEFER, the driver will fail the probe permanently instead of the deferred probing. Switch to propagating the proper error codes to platform driver code upwards. [ bp: Massage commit message. ] Fixes: 71bcada88b0f ("edac: altera: Add Altera SDRAM EDAC support") Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Dinh Nguyen <dinguyen@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220124185503.6720-2-s.shtylyov@omp.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-03Merge branch 'v5.4/standard/base' into v5.4/standard/intel-x86Bruce Ashfield
2022-01-27EDAC/synopsys: Use the quirk for version instead of ddr versionDinh Nguyen
[ Upstream commit bd1d6da17c296bd005bfa656952710d256e77dd3 ] Version 2.40a supports DDR_ECC_INTR_SUPPORT for a quirk, so use that quirk to determine a call to setup_address_map(). Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Michal Simek <michal.simek@xilinx.com> Link: https://lkml.kernel.org/r/20211012190709.1504152-1-dinguyen@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-29Merge branch 'v5.4/standard/base' into v5.4/standard/intel-x86Bruce Ashfield
2021-11-17EDAC/amd64: Handle three rank interleaving modeYazen Ghannam
[ Upstream commit 9f4873fb6af7966de8fcbd95c36b61351c1c4b1f ] AMD Rome systems and later support interleaving between three identical ranks within a channel. Check for this mode by counting the number of enabled chip selects and comparing their masks. If there are exactly three enabled chip selects and their masks are identical, then three rank interleaving is enabled. The size of a rank is determined from its mask value. However, three rank interleaving doesn't follow the method of swapping an interleave bit with the most significant bit. Rather, the interleave bit is flipped and the most significant bit remains the same. There is only a single interleave bit in this case. Account for this when determining the chip select size by keeping the most significant bit at its original value and ignoring any zero bits. This will return a full bitmask in [MSB:1]. Fixes: e53a3b267fb0 ("EDAC/amd64: Find Chip Select memory size using Address Mask") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211005154419.2060504-1-yazen.ghannam@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-17EDAC/sb_edac: Fix top-of-high-memory value for Broadwell/HaswellEric Badger
commit 537bddd069c743759addf422d0b8f028ff0f8dbc upstream. The computation of TOHM is off by one bit. This missed bit results in too low a value for TOHM, which can cause errors in regular memory to incorrectly report: EDAC MC0: 1 CE Error at MMIOH area, on addr 0x000000207fffa680 on any memory Fixes: 50d1bb93672f ("sb_edac: add support for Haswell based systems") Cc: stable@vger.kernel.org Reported-by: Meeta Saggi <msaggi@purestorage.com> Signed-off-by: Eric Badger <ebadger@purestorage.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20211010170127.848113-1-ebadger@purestorage.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-28Merge branch 'v5.4/standard/base' into v5.4/standard/intel-x86Bruce Ashfield
2021-10-20EDAC/armada-xp: Fix output of uncorrectable error counterHans Potsch
commit d9b7748ffc45250b4d7bcf22404383229bc495f5 upstream. The number of correctable errors is displayed as uncorrectable errors because the "SBE" error count is passed to both calls of edac_mc_handle_error(). Pass the correct uncorrectable error count to the second edac_mc_handle_error() call when logging uncorrectable errors. [ bp: Massage commit message. ] Fixes: 7f6998a41257 ("ARM: 8888/1: EDAC: Add driver for the Marvell Armada XP SDRAM and L2 cache ECC") Signed-off-by: Hans Potsch <hans.potsch@nokia.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20211006121332.58788-1-hans.potsch@nokia.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-03Merge branch 'v5.4/standard/base' into v5.4/standard/intel-x86Bruce Ashfield
2021-09-30EDAC/synopsys: Fix wrong value type assignment for edac_modeSai Krishna Potthuri
commit 5297cfa6bdf93e3889f78f9b482e2a595a376083 upstream. dimm->edac_mode contains values of type enum edac_type - not the corresponding capability flags. Fix that. Issue caught by Coverity check "enumerated type mixed with another type." [ bp: Rewrite commit message, add tags. ] Fixes: ae9b56e3996d ("EDAC, synps: Add EDAC support for zynq ddr ecc controller") Signed-off-by: Sai Krishna Potthuri <lakshmi.sai.krishna.potthuri@xilinx.com> Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20210818072315.15149-1-shubhrajyoti.datta@xilinx.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-27Merge branch 'v5.4/standard/base' into v5.4/standard/intel-x86Bruce Ashfield
2021-09-15EDAC/i10nm: Fix NVDIMM detectionQiuxu Zhuo
[ Upstream commit 2294a7299f5e51667b841f63c6d69474491753fb ] MCDDRCFG is a per-channel register and uses bit{0,1} to indicate the NVDIMM presence on DIMM slot{0,1}. Current i10nm_edac driver wrongly uses MCDDRCFG as per-DIMM register and fails to detect the NVDIMM. Fix it by reading MCDDRCFG as per-channel register and using its bit{0,1} to check whether the NVDIMM is populated on DIMM slot{0,1}. Fixes: d4dc89d069aa ("EDAC, i10nm: Add a driver for Intel 10nm server processors") Reported-by: Fan Du <fan.du@intel.com> Tested-by: Wen Jin <wen.jin@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210818175701.1611513-2-tony.luck@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14Merge branch 'v5.4/base' into v5.4/standard/intel-x86Bruce Ashfield
2021-07-14EDAC/Intel: Do not load EDAC driver when running as a guestLuck, Tony
[ Upstream commit f0a029fff4a50eb01648810a77ba1873e829fdd4 ] There's little to no point in loading an EDAC driver running in a guest: 1) The CPU model reported by CPUID may not represent actual h/w 2) The hypervisor likely does not pass in access to memory controller devices 3) Hypervisors generally do not pass corrected error details to guests Add a check in each of the Intel EDAC drivers for X86_FEATURE_HYPERVISOR and simply return -ENODEV in the init routine. Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210615174419.GA1087688@agluck-desk2.amr.corp.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-14EDAC/ti: Add missing MODULE_DEVICE_TABLEBixuan Cui
[ Upstream commit 0a37f32ba5272b2d4ec8c8d0f6b212b81b578f7e ] The module misses MODULE_DEVICE_TABLE() for of_device_id tables and thus never autoloads on ID matches. Add the missing declaration. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Bixuan Cui <cuibixuan@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Tero Kristo <kristo@kernel.org> Link: https://lkml.kernel.org/r/20210512033727.26701-1-cuibixuan@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-20EDAC/igen6: ecclog_llist can be statickernel test robot
commit 77429eebd9b1af516bf1b6898e63b098ed748374 upstream. Fixes: 10590a9d4f23 ("EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECC") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/20201123031850.GA20416@aef56166e5fc Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Liwei Song <liwei.song@windriver.com>
2021-01-20EDAC/igen6: Add debugfs interface for Intel client SoC EDAC driverQiuxu Zhuo
commit 2223d8c781a0c1a8cf26b1d8f13aff84557ecbfc upstream. Add debugfs support to fake memory correctable errors to test the error reporting path and the error address decoding logic in the igen6_edac driver. Please note that the fake errors are also reported to EDAC core and then the CE counter in EDAC sysfs is also increased. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Liwei Song <liwei.song@windriver.com>
2021-01-20EDAC/igen6: Add EDAC driver for Intel client SoCs using IBECCQiuxu Zhuo
commit 10590a9d4f23e0a519730d79d39331df60ad2079 upstream. This driver supports Intel client SoC with integrated memory controller using In-Band ECC(IBECC). The memory correctable and uncorrectable errors are reported via NMIs. The driver handles the NMIs and decodes the memory error address to platform specific address. The first IBECC-supported SoC is Elkhart Lake. [Tony: s/#include <linux/nmi.h>/#include <asm/nmi.h>/ to fix randconfig build] Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Liwei Song <liwei.song@windriver.com>
2021-01-20EDAC: Add three new memory typesQiuxu Zhuo
commit 3b20369313a486246582c8ef6ff5d1d0b9c34613 upstream. There are {Low-Power DDR3/4, WIO2} types of memory. Add new entries to 'enum mem_type' and new strings to 'edac_mem_types[]' for the new types. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Liwei Song <liwei.song@windriver.com>
2021-01-07Merge branch 'v5.4/standard/base' into v5.4/standard/intel-x86Bruce Ashfield
2020-12-30EDAC/amd64: Fix PCI component registrationBorislav Petkov
commit 706657b1febf446a9ba37dc51b89f46604f57ee9 upstream. In order to setup its PCI component, the driver needs any node private instance in order to get a reference to the PCI device and hand that into edac_pci_create_generic_ctl(). For convenience, it uses the 0th memory controller descriptor under the assumption that if any, the 0th will be always present. However, this assumption goes wrong when the 0th node doesn't have memory and the driver doesn't initialize an instance for it: EDAC amd64: F17h detected (node 0). ... EDAC amd64: Node 0: No DIMMs detected. But looking up node instances is not really needed - all one needs is the pointer to the proper device which gets discovered during instance init. So stash that pointer into a variable and use it when setting up the EDAC PCI component. Clear that variable when the driver needs to unwind due to some instances failing init to avoid any registration imbalance. Cc: <stable@vger.kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201122150815.13808-1-bp@alien8.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30EDAC/i10nm: Use readl() to access MMIO registersQiuxu Zhuo
commit 83ff51c4e3fecf6b8587ce4d46f6eac59f5d7c5a upstream. Instead of raw access, use readl() to access MMIO registers of memory controller to avoid possible compiler re-ordering. Fixes: d4dc89d069aa ("EDAC, i10nm: Add a driver for Intel 10nm server processors") Cc: <stable@vger.kernel.org> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30EDAC/mce_amd: Use struct cpuinfo_x86.cpu_die_id for AMD NodeIdYazen Ghannam
[ Upstream commit 8de0c9917cc1297bc5543b61992d5bdee4ce621a ] The edac_mce_amd module calls decode_dram_ecc() on AMD Family17h and later systems. This function is used in amd64_edac_mod to do system-specific decoding for DRAM ECC errors. The function takes a "NodeId" as a parameter. In AMD documentation, NodeId is used to identify a physical die in a system. This can be used to identify a node in the AMD_NB code and also it is used with umc_normaddr_to_sysaddr(). However, the input used for decode_dram_ecc() is currently the NUMA node of a logical CPU. In the default configuration, the NUMA node and physical die will be equivalent, so this doesn't have an impact. But the NUMA node configuration can be adjusted with optional memory interleaving modes. This will cause the NUMA node enumeration to not match the physical die enumeration. The mismatch will cause the address translation function to fail or report incorrect results. Use struct cpuinfo_x86.cpu_die_id for the node_id parameter to ensure the physical ID is used. Fixes: fbe63acf62f5 ("EDAC, mce_amd: Use cpu_to_node() to find the node ID") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201109210659.754018-4-Yazen.Ghannam@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-29x86/mce/amd, edac: Remove report_gart_errorsBorislav Petkov
commit 3e0fdec858d82c829774f271e88b5ceb17051551 upstream. ... because no one should be interested in spurious MCEs anyway. Make the filtering unconditional and move it to amd_filter_mce(). Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20200407163414.18058-2-bp@alien8.de Signed-off-by: Liwei Song <liwei.song@windriver.com>
2020-11-02Merge branch 'v5.4/standard/base' into v5.4/standard/intel-x86Bruce Ashfield
2020-10-29EDAC/ti: Fix handling of platform_get_irq() errorKrzysztof Kozlowski
[ Upstream commit 66077adb70a2a9e92540155b2ace33ec98299c90 ] platform_get_irq() returns a negative error number on error. In such a case, comparison to 0 would pass the check therefore check the return value properly, whether it is negative. [ bp: Massage commit message. ] Fixes: 86a18ee21e5e ("EDAC, ti: Add support for TI keystone and DRA7xx EDAC") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tero Kristo <t-kristo@ti.com> Link: https://lkml.kernel.org/r/20200827070743.26628-2-krzk@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29EDAC/aspeed: Fix handling of platform_get_irq() errorKrzysztof Kozlowski
[ Upstream commit afce6996943be265fa39240b67025cfcb1bcdfb1 ] platform_get_irq() returns a negative error number on error. In such a case, comparison to 0 would pass the check therefore check the return value properly, whether it is negative. [ bp: Massage commit message. ] Fixes: 9b7e6242ee4e ("EDAC, aspeed: Add an Aspeed AST2500 EDAC driver") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Stefan Schaeckeler <schaecsn@gmx.net> Link: https://lkml.kernel.org/r/20200827070743.26628-1-krzk@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-29EDAC/i5100: Fix error handling order in i5100_init_one()Dinghao Liu
[ Upstream commit 857a3139bd8be4f702c030c8ca06f3fd69c1741a ] When pci_get_device_func() fails, the driver doesn't need to execute pci_dev_put(). mci should still be freed, though, to prevent a memory leak. When pci_enable_device() fails, the error injection PCI device "einj" doesn't need to be disabled either. [ bp: Massage commit message, rename label to "bail_mc_free". ] Fixes: 52608ba205461 ("i5100_edac: probe for device 19 function 0") Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200826121437.31606-1-dinghao.liu@zju.edu.cn Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-05Merge branch 'v5.4/standard/base' into v5.4/standard/intel-x86Bruce Ashfield
Signed-off-by: Bruce Ashfield <bruce.ashfield@gmail.com>
2020-10-01EDAC/ghes: Check whether the driver is on the safe list correctlyBorislav Petkov
[ Upstream commit 251c54ea26fa6029b01a76161a37a12fde5124e4 ] With CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, a system would try to probe, unregister and probe again a driver. When ghes_edac is attempted to be loaded on a system which is not on the safe platforms list, ghes_edac_register() would return early. The unregister counterpart ghes_edac_unregister() would still attempt to unregister and exit early at the refcount test, leading to the refcount underflow below. In order to not do *anything* on the unregister path too, reuse the force_load parameter and check it on that path too, before fumbling with the refcount. ghes_edac: ghes_edac_register: entry ghes_edac: ghes_edac_register: return -ENODEV ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 10 PID: 1 at lib/refcount.c:28 refcount_warn_saturate+0xb9/0x100 Modules linked in: CPU: 10 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4+ #12 Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018 RIP: 0010:refcount_warn_saturate+0xb9/0x100 Code: 82 e8 fb 8f 4d 00 90 0f 0b 90 90 c3 80 3d 55 4c f5 00 00 75 88 c6 05 4c 4c f5 00 01 90 48 c7 c7 d0 8a 10 82 e8 d8 8f 4d 00 90 <0f> 0b 90 90 c3 80 3d 30 4c f5 00 00 0f 85 61 ff ff ff c6 05 23 4c RSP: 0018:ffffc90000037d58 EFLAGS: 00010292 RAX: 0000000000000026 RBX: ffff88840b8da000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffffffff8216b24f RDI: 00000000ffffffff RBP: ffff88840c662e00 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000046 R12: 0000000000000000 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88840ee80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000800002211000 CR4: 00000000003506e0 Call Trace: ghes_edac_unregister ghes_remove platform_drv_remove really_probe driver_probe_device device_driver_attach __driver_attach ? device_driver_attach ? device_driver_attach bus_for_each_dev bus_add_driver driver_register ? bert_init ghes_init do_one_initcall ? rcu_read_lock_sched_held kernel_init_freeable ? rest_init kernel_init ret_from_fork ... ghes_edac: ghes_edac_unregister: FALSE, refcount: -1073741824 Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200911164950.GB19320@zn.tnic Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-09Merge branch 'v5.4/standard/base' into v5.4/standard/intel-x86Bruce Ashfield
2020-09-03EDAC/{i7core,sb,pnd2,skx}: Fix error event severityTony Luck
[ Upstream commit 45bc6098a3e279d8e391d22428396687562797e2 ] IA32_MCG_STATUS.RIPV indicates whether the return RIP value pushed onto the stack as part of machine check delivery is valid or not. Various drivers copied a code fragment that uses the RIPV bit to determine the severity of the error as either HW_EVENT_ERR_UNCORRECTED or HW_EVENT_ERR_FATAL, but this check is reversed (marking errors where RIPV is set as "FATAL"). Reverse the tests so that the error is marked fatal when RIPV is not set. Reported-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200707194324.14884-1-tony.luck@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03EDAC: skx_common: get rid of unused type varMauro Carvalho Chehab
[ Upstream commit f05390d30e20cccd8f8de981dee42bcdd8d2d137 ] drivers/edac/skx_common.c: In function ‘skx_mce_output_error’: drivers/edac/skx_common.c:478:8: warning: variable ‘type’ set but not used [-Wunused-but-set-variable] 478 | char *type, *optype; | ^~~~ Acked-by: Borislav Petkov <bp@alien8.de> Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>