summaryrefslogtreecommitdiffstats
path: root/drivers/edac
AgeCommit message (Collapse)Author
2015-07-03sb_edac: Fix erroneous bytes->gigabytes conversionJim Snow
commit 8c009100295597f23978c224aec5751a365bc965 upstream. Signed-off-by: Jim Snow <jim.snow@intel.com> Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> [ vlee: Backported to 3.14. Adjusted context. ] Signed-off-by: Vinson Lee <vlee@twitter.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-29sb_edac: avoid INTERNAL ERROR message in EDAC with unspecified channelSeth Jennings
commit 351fc4a99d49fde63fe5ab7412beb35c40d27269 upstream. Intel IA32 SDM Table 15-14 defines channel 0xf as 'not specified', but EDAC doesn't know about this and returns and INTERNAL ERROR when the channel is greater than NUM_CHANNELS: kernel: [ 1538.886456] CPU 0: Machine Check Exception: 0 Bank 1: 940000000000009f kernel: [ 1538.886669] TSC 2bc68b22e7e812 ADDR 46dae7000 MISC 0 PROCESSOR 0:306e4 TIME 1390414572 SOCKET 0 APIC 0 kernel: [ 1538.971948] EDAC MC1: INTERNAL ERROR: channel value is out of range (15 >= 4) kernel: [ 1538.972203] EDAC MC1: 0 CE memory read error on unknown memory (slot:0 page:0x46dae7 offset:0x0 grain:0 syndrome:0x0 - area:DRAM err_code:0000:009f socket:1 channel_mask:1 rank:0) This commit changes sb_edac to forward a channel of -1 to EDAC if the channel is not specified. edac_mc_handle_error() sets the channel to -1 internally after the error message anyway, so this commit should have no effect other than avoiding the INTERNAL ERROR message when the channel is not specified. Signed-off-by: Seth Jennings <sjenning@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Vinson Lee <vlee@twopensource.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-06EDAC, amd64_edac: Prevent OOPS with >16 memory controllersDaniel J Blueman
commit 0c510cc83bdbaac8406f4f7caef34f4da0ba35ea upstream. When DRAM errors occur on memory controllers after EDAC_MAX_MCS (16), the kernel fatally dereferences unallocated structures, see splat below; this occurs on at least NumaConnect systems. Fix by checking if a memory controller info structure was found. BUG: unable to handle kernel NULL pointer dereference at 0000000000000320 IP: [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0 PGD 2f8b5a3067 PUD 2f8b5a2067 PMD 0 Oops: 0000 [#2] SMP Modules linked in: CPU: 224 PID: 11930 Comm: stream_c.exe.gn Tainted: G D 3.19.0 #1 Hardware name: Supermicro H8QGL/H8QGL, BIOS 3.5b 01/28/2015 task: ffff8807dbfb8c00 ti: ffff8807dd16c000 task.ti: ffff8807dd16c000 RIP: 0010:[<ffffffff819f714f>] [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0 RSP: 0000:ffff8907dfc03c48 EFLAGS: 00010297 RAX: 0000000000000001 RBX: 9c67400010080a13 RCX: 0000000000001dc6 RDX: 000000001dc61dc6 RSI: ffff8907dfc03df0 RDI: 000000000000001c RBP: ffff8907dfc03ce8 R08: 0000000000000000 R09: 0000000000000022 R10: ffff891fffa30380 R11: 00000000001cfc90 R12: 0000000000000008 R13: 0000000000000000 R14: 000000000000001c R15: 00009c6740001000 FS: 00007fa97ee18700(0000) GS:ffff8907dfc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000320 CR3: 0000003f889b8000 CR4: 00000000000407e0 Stack: 0000000000000000 ffff8907dfc03df0 0000000000000008 9c67400010080a13 000000000000001c 00009c6740001000 ffff8907dfc03c88 ffffffff810e4f9a ffff8907dfc03ce8 ffffffff81b375b9 0000000000000000 0000000000000010 Call Trace: <IRQ> ? vprintk_default ? printk amd_decode_mce notifier_call_chain atomic_notifier_call_chain mce_log machine_check_poll mce_timer_fn ? mce_cpu_restart call_timer_fn.isra.29 run_timer_softirq __do_softirq irq_exit smp_apic_timer_interrupt apic_timer_interrupt <EOI> ? down_read_trylock __do_page_fault ? __schedule do_page_fault page_fault Signed-off-by: Daniel J Blueman <daniel@numascale.com> Link: http://lkml.kernel.org/r/1424144078-24589-1-git-send-email-daniel@numascale.com [ Boris: massage commit message ] Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14cpc925_edac: Report UE events properlyJason Baron
commit fa19ac4b92bc2b5024af3e868f41f81fa738567a upstream. Fix UE event being reported as HW_EVENT_ERR_CORRECTED. Signed-off-by: Jason Baron <jbaron@akamai.com> Link: http://lkml.kernel.org/r/8beb13803500076fef827eab33d523e355d83759.1413405053.git.jbaron@akamai.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14e7xxx_edac: Report CE events properlyJason Baron
commit 8030122a9ccf939186f8db96c318dbb99b5463f6 upstream. Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED. Signed-off-by: Jason Baron <jbaron@akamai.com> Link: http://lkml.kernel.org/r/e6dd616f2cd51583a7e77af6f639b86313c74144.1413405053.git.jbaron@akamai.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14i3200_edac: Report CE events properlyJason Baron
commit 8a3f075d6c9b3612b4a5fb2af8db82b38b20caf0 upstream. Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED. Signed-off-by: Jason Baron <jbaron@akamai.com> Link: http://lkml.kernel.org/r/d02465b4f30314b390c12c061502eda5e9d29c52.1413405053.git.jbaron@akamai.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-14i82860_edac: Report CE events properlyJason Baron
commit ab0543de6ff0877474f57a5aafbb51a61e88676f upstream. Fix CE event being reported as HW_EVENT_ERR_UNCORRECTED. Signed-off-by: Jason Baron <jbaron@akamai.com> Link: http://lkml.kernel.org/r/7aee8e244a32ff86b399a8f966c4aae70296aae0.1413405053.git.jbaron@akamai.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-25i7300_edac: Fix device reference countJean Delvare
pci_get_device() decrements the reference count of "from" (last argument) so when we break off the loop successfully we have only one device reference - and we don't know which device we have. If we want a reference to each device, we must take them explicitly and let the pci_get_device() walk complete to avoid duplicate references. This is serious, as over-putting device references will cause the device to eventually disappear. Without this fix, the kernel crashes after a few insmod/rmmod cycles. Tested on an Intel S7000FC4UR system with a 7300 chipset. Signed-off-by: Jean Delvare <jdelvare@suse.de> Link: http://lkml.kernel.org/r/20140224111656.09bbb7ed@endymion.delvare Cc: Mauro Carvalho Chehab <m.chehab@samsung.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: stable@vger.kernel.org Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-25i7core_edac: Fix PCI device reference countJean Delvare
The reference count changes done by pci_get_device can be a little misleading when the usage diverges from the most common scheme. The reference count of the device passed as the last parameter is always decreased, even if the function returns no new device. So if we are going to try alternative device IDs, we must manually increment the device reference count before each retry. If we don't, we end up decreasing the reference count, and after a few modprobe/rmmod cycles the PCI devices will vanish. In other words and as Alan put it: without this fix the EDAC code corrupts the PCI device list. This fixes kernel bug #50491: https://bugzilla.kernel.org/show_bug.cgi?id=50491 Signed-off-by: Jean Delvare <jdelvare@suse.de> Link: http://lkml.kernel.org/r/20140224093927.7659dd9d@endymion.delvare Reviewed-by: Alan Cox <alan@linux.intel.com> Cc: Mauro Carvalho Chehab <m.chehab@samsung.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: stable@vger.kernel.org Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-14EDAC: Correct workqueue setup pathBorislav Petkov
We're using edac_mc_workq_setup() both on the init path, when we load an edac driver and when we change the polling period (edac_mc_reset_delay_period) through /sys/.../edac_mc_poll_msec. On that second path we don't need to init the workqueue which has been initialized already. Thanks to Tejun for workqueue insights. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com Cc: <stable@vger.kernel.org>
2014-02-14EDAC: Poll timeout cannot be zero, p2Borislav Petkov
Sanitize code even more to accept unsigned longs only and to not allow polling intervals below 1 second as this is unnecessary and doesn't make much sense anyway for polling errors. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com Cc: Doug Thompson <dougthompson@xmission.com> Cc: <stable@vger.kernel.org>
2014-02-10drivers/edac/edac_mc_sysfs.c: poll timeout cannot be zeroPrarit Bhargava
If you do echo 0 > /sys/module/edac_core/parameters/edac_mc_poll_msec the following stack trace is output because the edac module is not designed to poll with a timeout of zero. WARNING: CPU: 12 PID: 0 at lib/list_debug.c:33 __list_add+0xac/0xc0() list_add corruption. prev->next should be next (ffff8808291dd1b8), but was (null). (prev=ffff8808286fe3f8). Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 12 PID: 0 Comm: swapper/12 Not tainted 3.13.0+ #1 Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 Call Trace: <IRQ> __list_add+0xac/0xc0 __internal_add_timer+0xab/0x130 internal_add_timer+0x17/0x40 mod_timer_pinned+0xca/0x170 intel_pstate_timer_func+0x28a/0x380 call_timer_fn+0x36/0x100 run_timer_softirq+0x1ff/0x2f0 __do_softirq+0xf5/0x2e0 irq_exit+0x10d/0x120 smp_apic_timer_interrupt+0x45/0x60 apic_timer_interrupt+0x6d/0x80 <EOI> cpuidle_idle_call+0xb9/0x1f0 arch_cpu_idle+0xe/0x30 cpu_startup_entry+0x9e/0x240 start_secondary+0x1e4/0x290 kernel BUG at kernel/timer.c:1084! invalid opcode: 0000 [#1] SMP Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 12 PID: 0 Comm: swapper/12 Tainted: G W 3.13.0+ #1 Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 Call Trace: <IRQ> run_timer_softirq+0x245/0x2f0 __do_softirq+0xf5/0x2e0 irq_exit+0x10d/0x120 smp_apic_timer_interrupt+0x45/0x60 apic_timer_interrupt+0x6d/0x80 <EOI> cpuidle_idle_call+0xb9/0x1f0 arch_cpu_idle+0xe/0x30 cpu_startup_entry+0x9e/0x240 start_secondary+0x1e4/0x290 RIP cascade+0x93/0xa0 WARNING: CPU: 36 PID: 1154 at kernel/workqueue.c:1461 __queue_delayed_work+0xed/0x1a0() Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 36 PID: 1154 Comm: kworker/u481:3 Tainted: G W 3.13.0+ #1 Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 Workqueue: edac-poller edac_mc_workq_function [edac_core] Call Trace: dump_stack+0x45/0x56 warn_slowpath_common+0x7d/0xa0 warn_slowpath_null+0x1a/0x20 __queue_delayed_work+0xed/0x1a0 queue_delayed_work_on+0x27/0x50 edac_mc_workq_function+0x72/0xa0 [edac_core] process_one_work+0x17b/0x460 worker_thread+0x11b/0x400 kthread+0xd2/0xf0 ret_from_fork+0x7c/0xb0 This patch adds a range check in the edac_mc_poll_msec code to check for 0. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-20Merge branch 'x86-ras-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS changes from Ingo Molnar: - SCI reporting for other error types not only correctable ones - GHES cleanups - Add the functionality to override error reporting agents as some machines are sporting a new extended error logging capability which, if done properly in the BIOS, makes a corresponding EDAC module redundant - PCIe AER tracepoint severity levels fix - Error path correction for the mce device init - MCE timer fix - Add more flexibility to the error injection (EINJ) debugfs interface * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, mce: Fix mce_start_timer semantics ACPI, APEI, GHES: Cleanup ghes memory error handling ACPI, APEI: Cleanup alignment-aware accesses ACPI, APEI, GHES: Do not report only correctable errors with SCI ACPI, APEI, EINJ: Changes to the ACPI/APEI/EINJ debugfs interface ACPI, eMCA: Combine eMCA/EDAC event reporting priority EDAC, sb_edac: Modify H/W event reporting policy EDAC: Add an edac_report parameter to EDAC PCI, AER: Fix severity usage in aer trace event x86, mce: Call put_device on device_register failure
2014-01-20Merge tag 'edac_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bpLinus Torvalds
Pull EDAC updates from Borislav Petkov: - mpc85xx PCIe error interrupt support - misc small enhancements/fixes all over the place. * tag 'edac_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC: Don't try to cancel workqueue when it's never setup e752x_edac: Fix pci_dev usage count sb_edac: Mark get_mci_for_node_id as static EDAC: Mark edac_create_debug_nodes as static amd64_edac: Remove "amd64" prefix from static functions amd64_edac: Simplify code around decode_bus_error amd64_edac: Mark amd64_decode_bus_error as static EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro amd64_edac: Fix condition to verify max channels allowed for F15 M30h edac/85xx: Add PCIe error interrupt edac support
2014-01-10EDAC: Don't try to cancel workqueue when it's never setupStephen Boyd
We only setup a workqueue for edac devices that use the polling method. We still try to cancel the workqueue if an edac_device uses the irq method though. This causes a warning from debug objects when we remove an edac device: WARNING: CPU: 0 PID: 56 at lib/debugobjects.c:260 debug_print_object+0x98/0xc0() ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x28 Modules linked in: krait_edac(-) CPU: 0 PID: 56 Comm: rmmod Not tainted 3.12.0-rc2-00035-g15292a0 #69 (unwind_backtrace+0x0/0x144) (show_stack+0x20/0x24) (dump_stack+0x74/0xb4) (warn_slowpath_common+0x78/0x9c) (warn_slowpath_fmt+0x40/0x48) (debug_print_object+0x98/0xc0) (debug_object_assert_init+0xdc/0xec) (del_timer+0x24/0x7c) (try_to_grab_pending+0xc0/0x1b0) (cancel_delayed_work+0x2c/0xa0) (edac_device_workq_teardown+0x1c/0x38) (edac_device_del_device+0xb8/0xe4) (krait_edac_remove+0x50/0x70 [krait_edac]) (platform_drv_remove+0x24/0x28) (__device_release_driver+0x68/0xc0) (driver_detach+0xc4/0xc8) (bus_remove_driver+0xac/0x114) (driver_unregister+0x38/0x58) (platform_driver_unregister+0x1c/0x20) (krait_edac_driver_exit+0x14/0x1c [krait_edac]) (SyS_delete_module+0x178/0x2b4) Fix it by skipping the workqueue teardown for such devices. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Link: http://lkml.kernel.org/r/1388434457-4194-2-git-send-email-sboyd@codeaurora.org Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-21e752x_edac: Fix pci_dev usage countAristeu Rozanski
In case the device 0, function 1 is not found using pci_get_device(), pci_scan_single_device() will be used but, differently than pci_get_device(), it allocates a pci_dev but doesn't does bump the usage count on the pci_dev and after few module removals and loads the pci_dev will be freed. Signed-off-by: Aristeu Rozanski <aris@redhat.com> Reviewed-by: mark gross <mark.gross@intel.com> Link: http://lkml.kernel.org/r/20131205153755.GL4545@redhat.com Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-16Merge tag 'ras_for_3.14' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras Pull RAS updates from Borislav Petkov: * Add the functionality to override error reporting agents as some machines are sporting a new extended error logging capability which, if done properly in the BIOS, makes a corresponding EDAC module redundant, from Gong Chen. * PCIe AER tracepoint severity levels fix, from Rui Wang. * Error path correction for the mce device init, from Levente Kurusa. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-15sb_edac: Mark get_mci_for_node_id as staticRashika Kheria
This patch marks the function get_mci_for_node_id() as static because it is not used outside of sb_edac.c. Thus, it also eliminates the following warning: drivers/edac/sb_edac.c:918:22: warning: no previous prototype for ‘get_mci_for_node_id’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Link: http://lkml.kernel.org/r/0441f508186fc4eeabc8e9c3e4dde013d99405d4.1387029387.git.rashika.kheria@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15EDAC: Mark edac_create_debug_nodes as staticRashika Kheria
This patch marks the function edac_create_debug_nodes() as static because it is not used outside of edac_mc_sysfs.c. Thus, it also eliminates the following warning: drivers/edac/edac_mc_sysfs.c:917:5: warning: no previous prototype for ‘edac_create_debug_nodes’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Link: http://lkml.kernel.org/r/a1c863b08c0d6f67d03280cf908c771bf26a3239.1387029387.git.rashika.kheria@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15amd64_edac: Remove "amd64" prefix from static functionsBorislav Petkov
No need for the namespace tagging there. Cleanup setup_pci_device while at it. Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15amd64_edac: Simplify code around decode_bus_errorBorislav Petkov
Drop wrapper function and prefixes. Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-15amd64_edac: Mark amd64_decode_bus_error as staticRashika Kheria
This patch marks the function amd64_decode_bus_error() as static because it is not used outside of amd64_edac.c. It also eliminates the following warning: drivers/edac/amd64_edac.c:2038:6: warning: no previous prototype for ‘amd64_decode_bus_error’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Link: http://lkml.kernel.org/r/7cddbd4c69ed493f183383e98853181aaf75b26b.1387029387.git.rashika.kheria@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-11EDAC, sb_edac: Modify H/W event reporting policyChen, Gong
Newer Intel platforms support more than one method to report H/W event. On this kind of platform, H/W event report can adopt new method and traditional EDAC method should be disabled. Moreover, if EDAC event report method is set to *force*, it means event must be reported via EDAC interface. IOW, it overrides the default event report policy. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1386310630-12529-3-git-send-email-gong.chen@linux.intel.com [ Boris: massage commit and error messages ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-11EDAC: Add an edac_report parameter to EDACChen, Gong
This new parameter is used to control how to report HW error reporting, especially for newer Intel platform, like Ivybridge-EX, which contains an enhanced error decoding functionality in the firmware, i.e. eMCA. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1386310630-12529-2-git-send-email-gong.chen@linux.intel.com [ Boris: massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-06EDAC: Remove DEFINE_PCI_DEVICE_TABLE macroJingoo Han
Currently, there is no other bus that has something like this macro for their device ids. Thus, DEFINE_PCI_DEVICE_TABLE macro should be removed. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Link: http://lkml.kernel.org/r/001c01ceefb3$5724d860$056e8920$%han@samsung.com [ Boris: swap commit message with better one. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-06amd64_edac: Fix condition to verify max channels allowed for F15 M30hAravind Gopalakrishnan
The value returned from 'f15_m30h_determine_channel' will always be 0x3 max. The condition (channel > 4 || channel < 0) works as hardware never returns a value of 4, but it leads to static checker analysis errors like http://marc.info/?l=linux-edac&m=138607615131951&w=2. Fix that. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/20131203130857.GA32170@elgon.mountain [ Boris: massage commit message a bit. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-11-30sb_edac: Shut up compiler warning when EDAC_DEBUG is enabledAristeu Rozanski
Fix this: In file included from drivers/edac/sb_edac.c:27:0: drivers/edac/sb_edac.c: In function ‘sbridge_mce_output_error’: drivers/edac/edac_core.h:50:8: warning: ‘limit’ may be used uninitialized in this function [-Wmaybe-uninitialized] printk(level "EDAC " prefix ": " fmt, ##arg) ^ drivers/edac/sb_edac.c:948:25: note: ‘limit’ was declared here u64 ch_addr, offset, limit, prv = 0; Limit can be initialized to 0. The only way limit wouldn't be initialized is if there are no DIMMs present (which would be a bug of course) and it'd fail on the next test. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Link: http://lkml.kernel.org/r/20131121122021.GD26009@pd.tnic Signed-off-by: Borislav Petkov <bp@suse.de>
2013-11-25edac/85xx: Add PCIe error interrupt edac supportChunhe Lan
Add pcie error interrupt edac support for mpc85xx, p3041, p4080, and p5020. The mpc85xx uses the legacy interrupt report mechanism - the error interrupts are reported directly to mpic. While the p3041/ p4080/p5020 attaches the most of error interrupts to interrupt zero. And report error interrupts to mpic via interrupt 0. This patch can handle both of them. Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com> Link: http://lkml.kernel.org/r/1384712714-8826-3-git-send-email-morbidrsa@gmail.com Cc: Doug Thompson <dougthompson@xmission.com> Cc: Dave Jiang <dave.jiang@gmail.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@men.de> Signed-off-by: Borislav Petkov <bp@suse.de>
2013-11-18Merge branch 'linux_next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull EDAC driver updates from Mauro Carvalho Chehab: - sb_edac: add support for Ivy Bridge support - cell_edac: add a missing of_node_put() call * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: cell_edac: fix missing of_node_put sb_edac: add support for Ivy Bridge sb_edac: avoid decoding the same error multiple times sb_edac: rename mci_bind_devs() sb_edac: enable multiple PCI id tables to be used sb_edac: rework sad_pkg sb_edac: allow different interleave lists sb_edac: allow different dram_rule arrays sb_edac: isolate TOHM retrieval sb_edac: rename pci_br sb_edac: isolate TOLM retrieval sb_edac: make RANK_CFG_A value part of sbridge_info
2013-11-18Merge tag 'edac_for_3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bpLinus Torvalds
Pull EDAC updates from Borislav Petkov: "Following up on last week's discussion, here's my part of the EDAC pile, highlights in the signed tag. The last two patches have a date from just now because I've just applied them to the tree after Johannes sent them to me earlier. I decided to forward them now because they're trivial. There's a third one for MPC85xx which adds PCIe error interrupt support but since it is not so trivial and hasn't seen any linux-next time, I'm deferring it to 3.14 EDAC update highlights: - Support for Calxeda ECX-2000 memory controller, from Robert Richter - Misc Calxeda Highbank drivers and EDAC core cleanups, from Rob Herring and Robert Richter - New maintainer for Freescale's MPC85xx EDAC driver: Johannes Thumshirn" * tag 'edac_for_3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: edac/85xx: Remove mpc85xx_pci_err_remove EDAC: Add edac-mpc85xx driver to MAINTAINERS edac, highbank: Moving error injection to sysfs for edac edac, highbank: Add MAINTAINERS entry edac: Unify reporting of device info for device, mc and pci edac, highbank: Improve and unify naming edac, highbank: Add Calxeda ECX-2000 support ARM: dts: calxeda: move memory-controller node out of ecx-common.dtsi edac, highbank: Fix interrupt setup of mem and l2 controller
2013-11-17edac/85xx: Remove mpc85xx_pci_err_removeJohannes Thumshirn
Remove mpc85xx_pci_err_remove(...) which is obsolete, this removes the compiler warning which can be seen when building the driver either statically or as a module. Signed-off-by: Johannes Thumshirn <morbidrsa@gmail.com> Link: https://lkml.kernel.org/r/20131112161901.GA15637@jtlinux Signed-off-by: Johannes Thumshirn <johannes.thumshirn@men.de> Signed-off-by: Borislav Petkov <bp@suse.de>
2013-11-14cell_edac: fix missing of_node_putLibo Chen
Decrease device_node refcount np after task completion. Signed-off-by: Libo Chen <libo.chen@huawei.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: add support for Ivy BridgeAristeu Rozanski
Since Ivy Bridge memory controller is very similar to Sandy Bridge, it's wiser to modify sb_edac to support both instead of creating another driver. [m.chehab@samsung.com: Fix CodingStyle] Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: avoid decoding the same error multiple timesAristeu Rozanski
Whenever the extended error reporting is active, multiple MCEs will be generated for the same event, which will lead to multiple repeated errors to be reported. So check ADDRV and only decode the error if the MCE address is valid. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: rename mci_bind_devs()Aristeu Rozanski
This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: enable multiple PCI id tables to be usedAristeu Rozanski
This is needed to allow separated PCI id tables for Sandy Bridge and Ivy Bridge. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: rework sad_pkgAristeu Rozanski
This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: allow different interleave listsAristeu Rozanski
This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: allow different dram_rule arraysAristeu Rozanski
This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: isolate TOHM retrievalAristeu Rozanski
This is preparation of Ivy Bridge support. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: rename pci_brAristeu Rozanski
Ivy Bridge has more than one, so rename pci_br to pci_br0 Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: isolate TOLM retrievalAristeu Rozanski
This is in preparation for the Ivy Bridge support. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: make RANK_CFG_A value part of sbridge_infoAristeu Rozanski
This is in preparation of Ivy Bridge support. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-12Merge tag 'devicetree-for-3.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree updates from Rob Herring: "DeviceTree updates for 3.13. This is a bit larger pull request than usual for this cycle with lots of clean-up. - Cross arch clean-up and consolidation of early DT scanning code. - Clean-up and removal of arch prom.h headers. Makes arch specific prom.h optional on all but Sparc. - Addition of interrupts-extended property for devices connected to multiple interrupt controllers. - Refactoring of DT interrupt parsing code in preparation for deferred probe of interrupts. - ARM cpu and cpu topology bindings documentation. - Various DT vendor binding documentation updates" * tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (82 commits) powerpc: add missing explicit OF includes for ppc dt/irq: add empty of_irq_count for !OF_IRQ dt: disable self-tests for !OF_IRQ of: irq: Fix interrupt-map entry matching MIPS: Netlogic: replace early_init_devtree() call of: Add Panasonic Corporation vendor prefix of: Add Chunghwa Picture Tubes Ltd. vendor prefix of: Add AU Optronics Corporation vendor prefix of/irq: Fix potential buffer overflow of/irq: Fix bug in interrupt parsing refactor. of: set dma_mask to point to coherent_dma_mask of: add vendor prefix for PHYTEC Messtechnik GmbH DT: sort vendor-prefixes.txt of: Add vendor prefix for Cadence of: Add empty for_each_available_child_of_node() macro definition arm/versatile: Fix versatile irq specifications. of/irq: create interrupts-extended property microblaze/pci: Drop PowerPC-ism from irq parsing of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code. of/irq: Use irq_of_parse_and_map() ...
2013-11-04edac, highbank: Moving error injection to sysfs for edacRobert Richter
Always have the error injection i/f available, even if there is no debugfs or EDAC_DEBUG enabled. We need this for testing production kernels and environments. Thus, the entry moves from: /sys/kernel/debug/edac/mc0/inject_ctrl to: /sys/devices/system/edac/mc/mc0/inject_ctrl No other changes of the interface. Signed-off-by: Robert Richter <robert.richter@linaro.org> Signed-off-by: Robert Richter <rric@kernel.org>
2013-11-04edac: Unify reporting of device info for device, mc and pciRobert Richter
Log messages slightly differ between edac subsystems. Unifying it. Signed-off-by: Robert Richter <robert.richter@linaro.org> Acked-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rric@kernel.org>
2013-11-04edac, highbank: Improve and unify namingRobert Richter
Assinging correct names of the 'hb_mc_edac' and 'hb_l2_edac' edac modules for module, controller and device. Reported values for Highbank in dmesg are now: EDAC MC0: Giving out device to module hb_mc_edac controller calxeda,hb-ddr-ctrl: DEV fff00000.memory-controller (INTERRUPT) EDAC DEVICE0: Giving out device to module hb_l2_edac controller calxeda,hb-sregs-l2-ecc: DEV fff3c200.sregs (INTERRUPT) Signed-off-by: Robert Richter <robert.richter@linaro.org> Acked-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Robert Richter <rric@kernel.org>
2013-11-04edac, highbank: Add Calxeda ECX-2000 supportRobert Richter
Implement edac support for Calxeda ECX-2000. The ECX-2000 memory controller is similar to Highbank but has different register bases for error and interrupt registers. There is an own device tree name "calxeda,ecx-2000-ddr-ctrl" for identification and initialization of the ECX-2000 and its base addresses. Signed-off-by: Robert Richter <robert.richter@linaro.org> Acked-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Robert Richter <rric@kernel.org>
2013-11-04edac, highbank: Fix interrupt setup of mem and l2 controllerRobert Richter
Register and enable interrupts after the edac registration. Otherwise incomming ecc error interrupts lead to crashes during device setup. Fixing this in drivers for mc and l2. Signed-off-by: Robert Richter <robert.richter@linaro.org> Acked-by: Rob Herring <rob.herring@calxeda.com> Cc: stable <stable@vger.kernel.org> # 3.6+ Signed-off-by: Robert Richter <rric@kernel.org>
2013-10-23EDAC, GHES: Update ghes error record infoChen, Gong
In latest UEFI spec(by now it's 2.4) there are some new fields for memory error reporting. Add these new fields for ghes_edac interface. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Cc: Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: Tony Luck <tony.luck@intel.com>