aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/edac/sb_edac.c
AgeCommit message (Collapse)Author
2015-02-09sb_edac: Fix detection on SNB machinesBorislav Petkov
d0585cd815fa ("sb_edac: Claim a different PCI device") changed the probing of sb_edac to look for PCI device 0x3ca0: 3f:0e.0 System peripheral: Intel Corporation Xeon E5/Core i7 Processor Home Agent (rev 07) 00: 86 80 a0 3c 00 00 00 00 07 00 80 08 00 00 80 00 ... but we're matching for 0x3ca8, i.e. PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA in sbridge_probe() therefore the probing fails. Changing it to probe for 0x3ca0 (PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0), .i.e., the 14.0 device, fixes the issue and driver loads successfully again: [ 2449.013120] EDAC DEBUG: sbridge_init: [ 2449.017029] EDAC sbridge: Seeking for: PCI ID 8086:3ca0 [ 2449.022368] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca0 [ 2449.028498] EDAC sbridge: Seeking for: PCI ID 8086:3ca0 [ 2449.033768] EDAC sbridge: Seeking for: PCI ID 8086:3ca8 [ 2449.039028] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca8 [ 2449.045155] EDAC sbridge: Seeking for: PCI ID 8086:3ca8 ... Add a debug printk while at it to be able to catch the failure in the future and dump driver version on successful load. Fixes: d0585cd815fa ("sb_edac: Claim a different PCI device") Cc: stable@vger.kernel.org # 3.18 Acked-by: Aristeu Rozanski <aris@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2014-12-02sb_edac: Fix typo computing number of banksTony Luck
Code will always think there are 16 banks because of a typo Reported-by: Misha Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-12-02sb_edac: Add support for Broadwell-DE processorTony Luck
Broadwell-DE is the microserver version of next generation Xeon processors. A whole bunch of new PCIe device ids, but otherwise pretty much the same as Haswell. Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-12-02sb_edac: Fix discovery of top-of-low-memory for HaswellTony Luck
Haswell moved the TOLM/TOHM registers to a different device and offset. The sb_edac driver accounted for the change of device, but not for the new offset. There was also a typo in the constant to fill in the low 26 bits (was 0x1ffffff, should be 0x3ffffff). This resulted in a bogus value for the top of low memory: EDAC DEBUG: get_memory_layout: TOLM: 0.032 GB (0x0000000001ffffff) which would result in EDAC refusing to translate addresses for errors above the bogus value and below 4GB: sbridge MC3: HANDLING MCE MEMORY ERROR sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090 sbridge MC3: TSC 0 sbridge MC3: ADDR 2000000 sbridge MC3: MISC 523eac86 sbridge MC3: PROCESSOR 0:306f3 TIME 1414600951 SOCKET 0 APIC 0 MC3: 1 CE Error at TOLM area, on addr 0x02000000 on any memory ( page:0x0 offset:0x0 grain:32 syndrome:0x0) With the fix we see the correct TOLM value: DEBUG: get_memory_layout: TOLM: 2.048 GB (0x000000007fffffff) and we decode address 2000000 correctly: sbridge MC3: HANDLING MCE MEMORY ERROR sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090 sbridge MC3: TSC 0 sbridge MC3: ADDR 2000000 sbridge MC3: MISC 523e1086 sbridge MC3: PROCESSOR 0:306f3 TIME 1414601319 SOCKET 0 APIC 0 DEBUG: get_memory_error_data: SAD interleave package: 0 = CPU socket 0, HA 0, shiftup: 0 DEBUG: get_memory_error_data: TAD#0: address 0x0000000002000000 < 0x000000007fffffff, socket interleave 1, channel interleave 4 (offset 0x00000000), index 0, base ch: 0, ch mask: 0x01 DEBUG: get_memory_error_data: RIR#0, limit: 4.095 GB (0x00000000ffffffff), way: 1 DEBUG: get_memory_error_data: RIR#0: channel address 0x00200000 < 0xffffffff, RIR interleave 0, index 0 DEBUG: sbridge_mce_output_error: area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0 MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x2000 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0) Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-12-02sb_edac: Fix erroneous bytes->gigabytes conversionJim Snow
Signed-off-by: Jim Snow <jim.snow@intel.com> Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-10-08sb_edac: Claim a different PCI deviceAndy Lutomirski
sb_edac controls a large number of different PCI functions. Rather than registering as a normal PCI driver for all of them, it registers for just one so that it gets probed and, at probe time, it looks for all the others. Coincidentally, the device it registers for also contains the SMBUS registers, so the PCI core will refuse to probe both sb_edac and a future iMC SMBUS driver. The drivers don't actually conflict, so just change sb_edac's device table to probe a different device. An alternative fix would be to merge the two drivers, but sb_edac will also refuse to load on non-ECC systems, whereas i2c_imc would still be useful without ECC. The only user-visible change should be that sb_edac appears to bind a different device. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Rui Wang <ruiv.wang@gmail.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-10-08Move Intel SNB device ids from sb_edac to pci_ids.hAndy Lutomirski
The i2c_imc driver will use two of them, and moving only part of the list seems messier. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-10-08sb_edac: avoid INTERNAL ERROR message in EDAC with unspecified channelSeth Jennings
Intel IA32 SDM Table 15-14 defines channel 0xf as 'not specified', but EDAC doesn't know about this and returns and INTERNAL ERROR when the channel is greater than NUM_CHANNELS: kernel: [ 1538.886456] CPU 0: Machine Check Exception: 0 Bank 1: 940000000000009f kernel: [ 1538.886669] TSC 2bc68b22e7e812 ADDR 46dae7000 MISC 0 PROCESSOR 0:306e4 TIME 1390414572 SOCKET 0 APIC 0 kernel: [ 1538.971948] EDAC MC1: INTERNAL ERROR: channel value is out of range (15 >= 4) kernel: [ 1538.972203] EDAC MC1: 0 CE memory read error on unknown memory (slot:0 page:0x46dae7 offset:0x0 grain:0 syndrome:0x0 - area:DRAM err_code:0000:009f socket:1 channel_mask:1 rank:0) This commit changes sb_edac to forward a channel of -1 to EDAC if the channel is not specified. edac_mc_handle_error() sets the channel to -1 internally after the error message anyway, so this commit should have no effect other than avoiding the INTERNAL ERROR message when the channel is not specified. Signed-off-by: Seth Jennings <sjenning@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-06-26sb_edac: add support for Haswell based systemsAristeu Rozanski
Haswell memory controllers are very similar to Ivy Bridge and Sandy Bridge ones. This patch adds support to Haswell based systems. [m.chehab@samsung.com: Fix CodingStyle issues] Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-26sb_edac: Fix mix tab/spaces alignmentsMauro Carvalho Chehab
We should not have spaces before ^I on alignments. Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-26sb_edac: remove bogus assumption on mc orderingAristeu Rozanski
When a MC is handled, the correct sbridge_dev is searched based on the node, checking again later with the assumption the first memory controller found is the first socket's memory controller is a bogus assumption. Get rid of it. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-26sb_edac: make minimal use of channel_maskAristeu Rozanski
channel_mask will be used in the future to determine which group of memory modules is causing the errors since when mirroring, lockstep and close page are enabled you can't. While that doesn't happen, use the channel_mask to determine the channel instead of relying on the MC event/exception. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-26sb_edac: fix socket detection on Ivy Bridge controllersAristeu Rozanski
This patch fixes the obvious bug while handling the socket/HA bitmask used in Ivy Bridge memory controllers. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-26sb_edac: search devices using product idAristeu Rozanski
This patch changes the way devices are searched by using product id instead of device/function numbers. Tested in a Sandy Bridge and a Ivy Bridge machine to make sure everything works properly. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-26sb_edac: make RIR limit retrieval per modelAristeu Rozanski
Haswell has a different way to retrieve RIR limits, make this procedure per model. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-26sb_edac: make node id retrieval per modelAristeu Rozanski
Haswell has a different way to retrieve the node id, make so this procedure can be reimplemented. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-06-26sb_edac: make memory type detection per memory controllerAristeu Rozanski
Haswell has different register, offset to determine memory type and supports DDR4 in some models. This patch makes it easier to have a different method depending on the memory controller type. Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-04-04Merge branch 'v4l_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: "The main set of series of patches for media subsystem, including: - document RC sysfs class - added an API to setup scancode to allow waking up systems using the Remote Controller - add API for SDR devices. Drivers are still on staging - some API improvements for getting EDID data from media inputs/outputs - new DVB frontend driver for drx-j (ATSC) - one driver (it913x/it9137) got removed, in favor of an improvement on another driver (af9035) - added a skeleton V4L2 PCI driver at documentation - added a dual flash driver (lm3646) - added a new IR driver (img-ir) - added an IR scancode decoder for the Sharp protocol - some improvements at the usbtv driver, to allow its core to be reused. - added a new SDR driver (rtl2832u_sdr) - added a new tuner driver (msi001) - several improvements at em28xx driver to fix PM support, device removal and to split the V4L2 specific bits into a separate sub-driver - one driver got converted to videobuf2 (s2255drv) - the e4000 tuner driver now follows an improved binding model - some fixes at V4L2 compat32 code - several fixes and enhancements at videobuf2 code - some cleanups at V4L2 API documentation - usual driver enhancements, new board additions and misc fixups" [ NOTE! This merge effective drops commit 4329b93b283c ("of: Reduce indentation in of_graph_get_next_endpoint"). The of_graph_get_next_endpoint() function was moved and renamed by commit fd9fdb78a9bf ("[media] of: move graph helpers from drivers/media/v4l2-core to drivers/of"). It was originally called v4l2_of_get_next_endpoint() and lived in the file drivers/media/v4l2-core/v4l2-of.c. In that original location, it was then fixed to support empty port nodes by commit b9db140c1e46 ("[media] v4l: of: Support empty port nodes"), and that commit clashes badly with the dropped "Reduce intendation" commit. I had to choose one or the other, and decided that the "Support empty port nodes" commit was more important ] * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (426 commits) [media] em28xx-dvb: fix PCTV 461e tuner I2C binding Revert "[media] em28xx-dvb: fix PCTV 461e tuner I2C binding" [media] em28xx: fix PCTV 290e LNA oops [media] em28xx-dvb: fix PCTV 461e tuner I2C binding [media] m88ds3103: fix bug on .set_tone() [media] saa7134: fix WARN_ON during resume [media] v4l2-dv-timings: add module name, description, license [media] videodev2.h: add parenthesis around macro arguments [media] saa6752hs: depends on CRC32 [media] si4713: fix Kconfig dependencies [media] Sensoray 2255 uses videobuf2 [media] adv7180: free an interrupt on failure paths in init_device() [media] e4000: make VIDEO_V4L2 dependency optional [media] af9033: Don't export functions for the hardware filter [media] af9035: use af9033 PID filters [media] af9033: implement PID filter [media] rtl2832_sdr: do not use dynamic stack allocation [media] e4000: fix 32-bit build error [media] em28xx-audio: make sure audio is unmuted on open() [media] DocBook media: v4l2_format_sdr was renamed to v4l2_sdr_format ...
2014-04-03Merge branch 'linux_next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull sb_edac patches from Mauro Carvalho Chehab: "A couple sb_edac driver improvements, cleaning a little bit the amount of data sent to dmesg, and fixing one error message" * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: sb_edac: mark MCE messages as KERN_DEBUG sb_edac: use "event" instead of "exception" when MC wasnt signaled
2014-03-13sb_edac: mark MCE messages as KERN_DEBUGAristeu Rozanski
Since the driver is decoding the MCE, it's useless to have these messages printed unless you're debugging a problem in the driver. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-03-13sb_edac: use "event" instead of "exception" when MC wasnt signaledAristeu Rozanski
Corrected Errors are MC events, not exceptions and reporting as the later might confuse users. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-02-20sb_edac: Degrade log level for device registrationJiang Liu
On a system with four Intel processors, it generates too many messages "EDAC sbridge: Seeking for: dev 1d.3 PCI ID xxxx". And it doesn't give many useful information for normal users, so change log level from INFO to DEBUG. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Link: http://lkml.kernel.org/r/1392613824-11230-1-git-send-email-jiang.liu@linux.intel.com Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-07[media, edac] Change my email addressMauro Carvalho Chehab
There are several left overs with my old email address. Remove their occurrences and add myself at CREDITS, to allow people to be able to reach me on my new addresses. Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-01-20Merge branch 'x86-ras-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS changes from Ingo Molnar: - SCI reporting for other error types not only correctable ones - GHES cleanups - Add the functionality to override error reporting agents as some machines are sporting a new extended error logging capability which, if done properly in the BIOS, makes a corresponding EDAC module redundant - PCIe AER tracepoint severity levels fix - Error path correction for the mce device init - MCE timer fix - Add more flexibility to the error injection (EINJ) debugfs interface * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, mce: Fix mce_start_timer semantics ACPI, APEI, GHES: Cleanup ghes memory error handling ACPI, APEI: Cleanup alignment-aware accesses ACPI, APEI, GHES: Do not report only correctable errors with SCI ACPI, APEI, EINJ: Changes to the ACPI/APEI/EINJ debugfs interface ACPI, eMCA: Combine eMCA/EDAC event reporting priority EDAC, sb_edac: Modify H/W event reporting policy EDAC: Add an edac_report parameter to EDAC PCI, AER: Fix severity usage in aer trace event x86, mce: Call put_device on device_register failure
2014-01-20Merge tag 'edac_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bpLinus Torvalds
Pull EDAC updates from Borislav Petkov: - mpc85xx PCIe error interrupt support - misc small enhancements/fixes all over the place. * tag 'edac_for_3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC: Don't try to cancel workqueue when it's never setup e752x_edac: Fix pci_dev usage count sb_edac: Mark get_mci_for_node_id as static EDAC: Mark edac_create_debug_nodes as static amd64_edac: Remove "amd64" prefix from static functions amd64_edac: Simplify code around decode_bus_error amd64_edac: Mark amd64_decode_bus_error as static EDAC: Remove DEFINE_PCI_DEVICE_TABLE macro amd64_edac: Fix condition to verify max channels allowed for F15 M30h edac/85xx: Add PCIe error interrupt edac support
2013-12-16Merge tag 'ras_for_3.14' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras Pull RAS updates from Borislav Petkov: * Add the functionality to override error reporting agents as some machines are sporting a new extended error logging capability which, if done properly in the BIOS, makes a corresponding EDAC module redundant, from Gong Chen. * PCIe AER tracepoint severity levels fix, from Rui Wang. * Error path correction for the mce device init, from Levente Kurusa. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-15sb_edac: Mark get_mci_for_node_id as staticRashika Kheria
This patch marks the function get_mci_for_node_id() as static because it is not used outside of sb_edac.c. Thus, it also eliminates the following warning: drivers/edac/sb_edac.c:918:22: warning: no previous prototype for ‘get_mci_for_node_id’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Link: http://lkml.kernel.org/r/0441f508186fc4eeabc8e9c3e4dde013d99405d4.1387029387.git.rashika.kheria@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-11EDAC, sb_edac: Modify H/W event reporting policyChen, Gong
Newer Intel platforms support more than one method to report H/W event. On this kind of platform, H/W event report can adopt new method and traditional EDAC method should be disabled. Moreover, if EDAC event report method is set to *force*, it means event must be reported via EDAC interface. IOW, it overrides the default event report policy. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1386310630-12529-3-git-send-email-gong.chen@linux.intel.com [ Boris: massage commit and error messages ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-12-06EDAC: Remove DEFINE_PCI_DEVICE_TABLE macroJingoo Han
Currently, there is no other bus that has something like this macro for their device ids. Thus, DEFINE_PCI_DEVICE_TABLE macro should be removed. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Link: http://lkml.kernel.org/r/001c01ceefb3$5724d860$056e8920$%han@samsung.com [ Boris: swap commit message with better one. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-11-30sb_edac: Shut up compiler warning when EDAC_DEBUG is enabledAristeu Rozanski
Fix this: In file included from drivers/edac/sb_edac.c:27:0: drivers/edac/sb_edac.c: In function ‘sbridge_mce_output_error’: drivers/edac/edac_core.h:50:8: warning: ‘limit’ may be used uninitialized in this function [-Wmaybe-uninitialized] printk(level "EDAC " prefix ": " fmt, ##arg) ^ drivers/edac/sb_edac.c:948:25: note: ‘limit’ was declared here u64 ch_addr, offset, limit, prv = 0; Limit can be initialized to 0. The only way limit wouldn't be initialized is if there are no DIMMs present (which would be a bug of course) and it'd fail on the next test. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Link: http://lkml.kernel.org/r/20131121122021.GD26009@pd.tnic Signed-off-by: Borislav Petkov <bp@suse.de>
2013-11-18Merge branch 'linux_next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull EDAC driver updates from Mauro Carvalho Chehab: - sb_edac: add support for Ivy Bridge support - cell_edac: add a missing of_node_put() call * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: cell_edac: fix missing of_node_put sb_edac: add support for Ivy Bridge sb_edac: avoid decoding the same error multiple times sb_edac: rename mci_bind_devs() sb_edac: enable multiple PCI id tables to be used sb_edac: rework sad_pkg sb_edac: allow different interleave lists sb_edac: allow different dram_rule arrays sb_edac: isolate TOHM retrieval sb_edac: rename pci_br sb_edac: isolate TOLM retrieval sb_edac: make RANK_CFG_A value part of sbridge_info
2013-11-14sb_edac: add support for Ivy BridgeAristeu Rozanski
Since Ivy Bridge memory controller is very similar to Sandy Bridge, it's wiser to modify sb_edac to support both instead of creating another driver. [m.chehab@samsung.com: Fix CodingStyle] Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: avoid decoding the same error multiple timesAristeu Rozanski
Whenever the extended error reporting is active, multiple MCEs will be generated for the same event, which will lead to multiple repeated errors to be reported. So check ADDRV and only decode the error if the MCE address is valid. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: rename mci_bind_devs()Aristeu Rozanski
This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: enable multiple PCI id tables to be usedAristeu Rozanski
This is needed to allow separated PCI id tables for Sandy Bridge and Ivy Bridge. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: rework sad_pkgAristeu Rozanski
This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: allow different interleave listsAristeu Rozanski
This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: allow different dram_rule arraysAristeu Rozanski
This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: isolate TOHM retrievalAristeu Rozanski
This is preparation of Ivy Bridge support. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: rename pci_brAristeu Rozanski
Ivy Bridge has more than one, so rename pci_br to pci_br0 Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: isolate TOLM retrievalAristeu Rozanski
This is in preparation for the Ivy Bridge support. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-11-14sb_edac: make RANK_CFG_A value part of sbridge_infoAristeu Rozanski
This is in preparation of Ivy Bridge support. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-10-21bitops: Introduce a more generic BITMASK macroChen, Gong
GENMASK is used to create a contiguous bitmask([hi:lo]). It is implemented twice in current kernel. One is in EDAC driver, the other is in SiS/XGI FB driver. Move it to a more generic place for other usage. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Winischhofer <thomas@winischhofer.net> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-04-29edac: sb_edac.c should not require prescence of IMC_DDRIO deviceLuck, Tony
The Sandy Bridge EDAC driver uses a register in the IMC_DDRIO CSR space to determine the type of DIMMs (registered or unregistered). But this device does not exist on some single socket Sandy Bridge servers. While the type of DIMMs is nice to know, it is not essential for this driver's other functions. So it seems harsh to have it refuse to load at all when it cannot find this device. Make the check for this device be optional. If it isn't present just report the memory type as "MEM_UNKNOWN". Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-20Merge tag 'v3.8-rc7' into nextMauro Carvalho Chehab
Linux 3.8-rc7 * tag 'v3.8-rc7': (12052 commits) Linux 3.8-rc7 net: sctp: sctp_endpoint_free: zero out secret key data net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree atm/iphase: rename fregt_t -> ffreg_t ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned ARM: DMA mapping: fix bad atomic test ARM: realview: ensure that we have sufficient IRQs available ARM: GIC: fix GIC cpumask initialization net: usb: fix regression from FLAG_NOARP code l2tp: dont play with skb->truesize net: sctp: sctp_auth_key_put: use kzfree instead of kfree netback: correct netbk_tx_err to handle wrap around. xen/netback: free already allocated memory on failure in xen_netbk_get_requests xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop. xen/netback: shutdown the ring if it contains garbage. drm/ttm: fix fence locking in ttm_buffer_object_transfer, 2nd try virtio_console: Don't access uninitialized data. net: qmi_wwan: add more Huawei devices, including E320 net: cdc_ncm: add another Huawei vendor specific device ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit() ...
2013-01-03Drivers: edac: remove __dev* attributes.Greg Kroah-Hartman
CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit, __devexit_p, and __devexit from these drivers. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Daney <david.daney@cavium.com> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Olof Johansson <olof@lixom.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-12-21sb_edac: add a missing /n on a debug messageMauro Carvalho Chehab
[ 17.024963] EDAC DEBUG: get_memory_layout: TOHM: 132.160 GB (0x0000002043ffffff)<7>[ 17.024971] EDAC DEBUG: get_memory_layout: SAD#0 DRAM up to 33.792 GB (0x0000000840000000) Interleave: 8:6 reg=0x000083c3 Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-09-25sb_edac: Avoid overflow errors at memory size calculationMauro Carvalho Chehab
Sandy bridge EDAC is calculating the memory size with overflow. Basically, the size field and the integer calculation is using 32 bits. More bits are needed, when the DIMM memories have high density. The net result is that memories are improperly reported there, when high-density DIMMs are used: EDAC DEBUG: in drivers/edac/sb_edac.c, line at 591: mc#0: channel 0, dimm 0, -16384 Mb (-4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 EDAC DEBUG: in drivers/edac/sb_edac.c, line at 591: mc#0: channel 1, dimm 0, -16384 Mb (-4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 As the number of pages value is handled at the EDAC core as unsigned ints, the driver shows the 16 GB memories at sysfs interface as 16760832 MB! The fix is simple: calculate the number of pages as unsigned 64-bits integer. After the patch, the memory size (16 GB) is properly detected: EDAC DEBUG: in drivers/edac/sb_edac.c, line at 592: mc#0: channel 0, dimm 0, 16384 Mb (4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 EDAC DEBUG: in drivers/edac/sb_edac.c, line at 592: mc#0: channel 1, dimm 0, 16384 Mb (4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 Cc: stable@kernel.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-07-29Merge branch 'devel'Mauro Carvalho Chehab
* devel: (33 commits) edac i5000, i5400: fix pointer math in i5000_get_mc_regs() edac: allow specifying the error count with fake_inject edac: add support for Calxeda highbank L2 cache ecc edac: add support for Calxeda highbank memory controller edac: create top-level debugfs directory sb_edac: properly handle error count i7core_edac: properly handle error count edac: edac_mc_handle_error(): add an error_count parameter edac: remove arch-specific parameter for the error handler amd64_edac: Don't pass driver name as an error parameter edac_mc: check for allocation failure in edac_mc_alloc() edac: Increase version to 3.0.0 edac_mc: Cleanup per-dimm_info debug messages edac: Convert debugfX to edac_dbg(X, edac: Use more normal debugging macro style edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs Edac: Add ABI Documentation for the new device nodes edac: move documentation ABI to ABI/testing/sysfs-devices-edac i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy edac: change the mem allocation scheme to make Documentation/kobject.txt happy ...
2012-06-12sb_edac: properly handle error countMauro Carvalho Chehab
Instead of reporting the error count via driver-specific details, use the new way provided by edac_mc_handle_error. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>