summaryrefslogtreecommitdiffstats
path: root/drivers/edac/edac_mc_sysfs.c
AgeCommit message (Collapse)AuthorFilesLines
2017-01-19EDAC: Expose per-DIMM error counts in sysfsAaron Miller1-0/+38
The old csrowX sysfs directories have per-csrow error counters, but the new dimmX directories do not currently expose error counts. EDAC already keeps these counts, add them to sysfs so per-DIMM counts are still available when CONFIG_EDAC_LEGACY_SYSFS=n. Signed-off-by: Aaron Miller <aaronmiller@fb.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20161103220153.3997328-1-aaronmiller@fb.com Signed-off-by: Borislav Petkov <bp@suse.de>
2017-01-06EDAC: Make dev_attr_sdram_scrub_rate staticBen Dooks1-1/+1
The dev_attr_sdram_scrub_rate is not declared in a header or used anywhere else, so make it static to fix the following warning: drivers/edac/edac_mc_sysfs.c:816:1: warning: symbol 'dev_attr_sdram_scrub_rate' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1465407356-7357-1-git-send-email-ben.dooks@codethink.co.uk Signed-off-by: Borislav Petkov <bp@suse.de>
2016-12-15edac: rename edac_core.h to edac_mc.hMauro Carvalho Chehab1-1/+1
Now, all left at edac_core.h are at drivers/edac/edac_mc.c, so rename it to edac_mc.h. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-06-16EDAC: Correct channel count limitBorislav Petkov1-1/+19
c44696fff04f ("EDAC: Remove arbitrary limit on number of channels") lifted the arbitrary limit on memory controller channels in EDAC. However, the dynamic channel attributes dynamic_csrow_dimm_attr and dynamic_csrow_ce_count_attr remained 6. This wasn't a problem except channels 6 and 7 weren't visible in sysfs on machines with more than 6 channels after the conversion to static attr groups with 2c1946b6d629 ("EDAC: Use static attribute groups for managing sysfs entries") [ without that, we're exploding in edac_create_sysfs_mci_device() because we're dereferencing out of the bounds of the dynamic_csrow_dimm_attr array. ] Add attributes for channels 6 and 7 along with a guard for the future, should more channels be required and/or to sanity check for misconfigured machines. We still need to check against the number of channels present on the MC first, as Thor reported. Signed-off-by: Borislav Petkov <bp@suse.de> Reported-by: Hironobu Ishii <ishii.hironobu@jp.fujitsu.com> Tested-by: Thor Thayer <tthayer@opensource.altera.com> Cc: <stable@vger.kernel.org> # 4.2
2016-04-23EDAC: Fix used after kfree() error in edac_unregister_sysfs()Tony Luck1-1/+2
Code flow looks like this: device_unregister(&mci->dev); -> kobject_put+0x25/0x50 -> kobject_cleanup+0x77/0x190 -> device_release+0x32/0xa0 -> mci_attr_release+0x36/0x70 -> kfree(mci); bus_unregister(mci->bus); Fix is to grab a local copy of "mci->bus" and use that when we call bus_unregister(). Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/21d595b0ab3d718d9cb206647f4ec91c05e62ec4.1461261078.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11EDAC: Remove edac_get_sysfs_subsys() error handlingBorislav Petkov1-10/+1
It cannot fail now. We either load EDAC core after having successfully initialized edac_subsys or we don't. Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11EDAC: Rip out the edac_subsys reference countingBorislav Petkov1-4/+1
This was really dumb - reference counting for the main EDAC sysfs object. While we could've simply registered it as the first thing in the module init path and then hand it around to what needs it. Do that and rip out all the code around it, thus simplifying the whole handling significantly. Move the edac_subsys node back to edac_module.c. Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11EDAC, mc_sysfs: Fix freeing bus' nameBorislav Petkov1-7/+14
I get the splat below when modprobing/rmmoding EDAC drivers. It happens because bus->name is invalid after bus_unregister() has run. The Code: section below corresponds to: .loc 1 1108 0 movq 672(%rbx), %rax # mci_1(D)->bus, mci_1(D)->bus .loc 1 1109 0 popq %rbx # .loc 1 1108 0 movq (%rax), %rdi # _7->name, jmp kfree # and %rax has some funky stuff 2030203020312030 which looks a lot like something walked over it. Fix that by saving the name ptr before doing stuff to string it points to. general protection fault: 0000 [#1] SMP Modules linked in: ... CPU: 4 PID: 10318 Comm: modprobe Tainted: G I EN 3.12.51-11-default+ #48 Hardware name: HP ProLiant DL380 G7, BIOS P67 05/05/2011 task: ffff880311320280 ti: ffff88030da3e000 task.ti: ffff88030da3e000 RIP: 0010:[<ffffffffa019da92>] [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core] RSP: 0018:ffff88030da3fe28 EFLAGS: 00010292 RAX: 2030203020312030 RBX: ffff880311b4e000 RCX: 000000000000095c RDX: 0000000000000001 RSI: ffff880327bb9600 RDI: 0000000000000286 RBP: ffff880311b4e750 R08: 0000000000000000 R09: ffffffff81296110 R10: 0000000000000400 R11: 0000000000000000 R12: ffff88030ba1ac68 R13: 0000000000000001 R14: 00000000011b02f0 R15: 0000000000000000 FS: 00007fc9bf8f5700(0000) GS:ffff8801a7c40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000403c90 CR3: 000000019ebdf000 CR4: 00000000000007e0 Stack: Call Trace: i7core_unregister_mci.isra.9 i7core_remove pci_device_remove __device_release_driver driver_detach bus_remove_driver pci_unregister_driver i7core_exit SyS_delete_module system_call_fastpath 0x7fc9bf426536 Code: 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 53 48 89 fb e8 52 2a 1f e1 48 8b bb a0 02 00 00 e8 46 59 1f e1 48 8b 83 a0 02 00 00 5b <48> 8b 38 e9 26 9a fe e0 66 0f 1f 44 00 00 66 66 66 66 90 48 8b RIP [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core] RSP <ffff88030da3fe28> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: <stable@vger.kernel.org> # v3.6.. Fixes: 7a623c039075 ("edac: rewrite the sysfs code to use struct device")
2015-10-14EDAC: Use edac_debugfs_remove_recursive()Tan Xiaojun1-1/+1
debugfs_remove() is used to remove a file or a directory from the debugfs filesystem, but mci->debugfs might not empty. This can be triggered by the following sequence: 1) Enable CONFIG_EDAC_DEBUG 2) insmod an EDAC module (like i3000_edac or similar) 3) rmmod this module 4) we can see files remaining under <debugfs_mountpoint>/edac/ like "fake_inject", for example. Removing edac_core then, causes a NULL pointer dereference. Reported-by: Yun Wu (Abel) <wuyun.wu@huawei.com> Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1444787364-104353-1-git-send-email-tanxiaojun@huawei.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-28EDAC: Don't allow empty DIMM labelsToshi Kani1-2/+2
Updating dimm_label to an empty string does not make much sense. Change the sysfs dimm_label store operation to fail a request when an input string is empty. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: elliott@hpe.com Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1443124767.25474.172.camel@hpe.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-25EDAC: Fix sysfs dimm_label store operationToshi Kani1-10/+24
Sysfs "dimm_label" and "chX_dimm_label" nodes have the following issues in their store operation: 1) A newline-terminated input string causes redundant newlines: # echo "test" > /sys/bus/mc0/devices/dimm0/dimm_label # cat /sys/bus/mc0/devices/dimm0/dimm_label test # od -bc /sys/bus/mc0/devices/dimm0/dimm_label 0000000 164 145 163 164 012 012 t e s t \n \n 0000006 2) The original label string (31 characters) cannot be stored due to an improper size check: # echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0" > /sys/bus/mc0/devices/dimm0/dimm_label # cat /sys/bus/mc0/devices/dimm0/dimm_label # od -bc /sys/bus/mc0/devices/dimm0/dimm_label 0000000 012 012 \n \n 0000002 3) An input string longer than the buffer size results a wrong label info as it allows a retry with the remaining string: # echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0_TEST" > /sys/bus/mc0/devices/dimm0/dimm_label # cat /sys/bus/mc0/devices/dimm0/dimm_label _TEST Fix these issues by making the following changes: 1) Replace a newline character at the end by setting a null. It also assures that the string is null-terminated in the label buffer. 2) Check the label buffer size with 'sizeof(dimm->label)'. 3) Fail a request if its string exceeds the label buffer size. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Robert Elliott <elliott@hpe.com> Link: http://lkml.kernel.org/r/1443121564.25474.160.camel@hpe.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-25EDAC: Fix sysfs dimm_label show operationToshi Kani1-2/+2
After 7d375bffa524 ("sb_edac: Fix support for systems with two home agents per socket") sysfs "dimm_label" and "chX_dimm_label" show their label string without a newline "\n" at the end. [root@orange ~]# cat /sys/bus/mc0/devices/dimm0/dimm_label CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]# [root@orange ~]# cat /sys/devices/system/edac/mc/mc0/csrow0/ch0_dimm_label CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]# The label strings now have 31 characters, which are the same as EDAC_MC_LABEL_LEN. Since the snprintf()s in channel_dimm_label_show() and dimmdev_label_show() limit the whole length by EDAC_MC_LABEL_LEN, the newline in the format "%s\n" is ignored. [root@orange ~]# od -bc /sys/bus/mc0/devices/dimm0/dimm_label 0000000 103 120 125 137 123 162 143 111 104 043 060 137 110 141 043 060 C P U _ S r c I D # 0 _ H a # 0 0000020 137 103 150 141 156 043 060 137 104 111 115 115 043 060 000 _ C h a n # 0 _ D I M M # 0 \0 0000037 Fix it by using 'sizeof(dimm->label) + 1' as the whole length in the snprintf()s in channel_dimm_label_show() and dimmdev_label_show(). Reported-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Link: http://lkml.kernel.org/r/1442933883-21587-2-git-send-email-toshi.kani@hpe.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-09-22EDAC: Carve out debugfs functionalityBorislav Petkov1-109/+1
... into a separate compilation unit and drop a couple of CONFIG_EDAC_DEBUG ifdefferies. Rename edac_create_debug_nodes() to edac_create_debugfs_nodes(), while at it. No functionality change. Cc: <linux-edac@vger.kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-03EDAC: Remove arbitrary limit on number of channelsTony Luck1-5/+0
Currently set to "6", but the reset of the code will dynamically allocate as needed. We need to go to "8" today, but drop the check completely to save doing this again when we need even larger numbers. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2015-02-23EDAC: Properly unwind on failure path in edac_init()Alexey Khoroshilov1-2/+2
edac_init() does not deallocate already allocated resources on failure path. Found by Linux Driver Verification project (linuxtesting.org). [ Boris: The unwind path functions have __exit annotation but are being used in an __init function, leading to section mismatches. Drop the section annotation and make them normal functions. ] Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Link: http://lkml.kernel.org/r/1423203162-26368-1-git-send-email-khoroshilov@ispras.ru Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23EDAC: Allow to pass driver-specific attribute groupsTakashi Iwai1-1/+3
Add edac_mc_add_mc_with_groups() for initializing the mem_ctl_info object with the optional attribute groups. This allows drivers to pass additional sysfs entries without manual (and racy) device_create_file() and co calls. edac_mc_add_mc() is kept as is, just calling edac_mc_add_with_groups() with NULL groups. Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/1423046938-18111-3-git-send-email-tiwai@suse.de Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23EDAC: Use static attribute groups for managing sysfs entriesTakashi Iwai1-88/+71
Instead of manual calls of device_create_file() and device_remove_file(), use static attribute groups with proper is_visible callbacks for managing the sysfs entries. This simplifies the code a lot and avoids the possible races. Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/1423046938-18111-2-git-send-email-tiwai@suse.de Signed-off-by: Borislav Petkov <bp@suse.de>
2015-01-30EDAC: edac_mc_sysfs: Make stuff staticBorislav Petkov1-11/+11
Fix sparse warnings. Signed-off-by: Borislav Petkov <bp@suse.de>
2015-01-30EDAC: Fix the leak of mci->bus->name when bus_register failsJunjie Mao1-13/+16
Also use goto labels for all failure paths in edac_create_sysfs_mci_device and update meaningless labels. Signed-off-by: Junjie Mao <junjie.mao@hotmail.com> Link: http://lkml.kernel.org/r/BLU436-SMTP25291B6B612942A212AEBFE95300@phx.gbl [ Boris: Use ! for 0 checks and add newlines for less crammed code. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2014-12-02sb_edac: Fix off-by-one error in number of channelsJim Snow1-1/+1
This prevented edac sysfs code from properly handling 6 channels per memory controller. Signed-off-by: Jim Snow <jim.snow@intel.com> Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-06-26edac: add DDR4 and RDDR4Aristeu Rozanski1-1/+3
Haswell memory controller can make use of DDR4 and Registered DDR4 Cc: tony.luck@intel.com Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2014-03-11Merge tag 'v3.14-rc5' into patchworkMauro Carvalho Chehab1-4/+6
Linux 3.14-rc5 * tag 'v3.14-rc5': (1117 commits) Linux 3.14-rc5 drm/vmwgfx: avoid null pointer dereference at failure paths drm/vmwgfx: Make sure backing mobs are cleared when allocated. Update driver date. drm/vmwgfx: Remove some unused surface formats MAINTAINERS: add maintainer entry for Armada DRM driver arm64: Fix !CONFIG_SMP kernel build arm64: mm: Add double logical invert to pte accessors dm cache: fix truncation bug when mapping I/O to >2TB fast device perf tools: Fix strict alias issue for find_first_bit powerpc/powernv: Fix indirect XSCOM unmangling powerpc/powernv: Fix opal_xscom_{read,write} prototype powerpc/powernv: Refactor PHB diag-data dump powerpc/powernv: Dump PHB diag-data immediately powerpc: Increase stack redzone for 64-bit userspace to 512 bytes powerpc/ftrace: bugfix for test_24bit_addr powerpc/crashdump : Fix page frame number check in copy_oldmem_page powerpc/le: Ensure that the 'stop-self' RTAS token is handled correctly kvm, vmx: Really fix lazy FPU on nested guest perf tools: fix BFD detection on opensuse drm/radeon: enable speaker allocation setup on dce3.2 ...
2014-02-14EDAC: Poll timeout cannot be zero, p2Borislav Petkov1-4/+6
Sanitize code even more to accept unsigned longs only and to not allow polling intervals below 1 second as this is unnecessary and doesn't make much sense anyway for polling errors. Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com Cc: Doug Thompson <dougthompson@xmission.com> Cc: <stable@vger.kernel.org>
2014-02-10drivers/edac/edac_mc_sysfs.c: poll timeout cannot be zeroPrarit Bhargava1-1/+1
If you do echo 0 > /sys/module/edac_core/parameters/edac_mc_poll_msec the following stack trace is output because the edac module is not designed to poll with a timeout of zero. WARNING: CPU: 12 PID: 0 at lib/list_debug.c:33 __list_add+0xac/0xc0() list_add corruption. prev->next should be next (ffff8808291dd1b8), but was (null). (prev=ffff8808286fe3f8). Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 12 PID: 0 Comm: swapper/12 Not tainted 3.13.0+ #1 Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 Call Trace: <IRQ> __list_add+0xac/0xc0 __internal_add_timer+0xab/0x130 internal_add_timer+0x17/0x40 mod_timer_pinned+0xca/0x170 intel_pstate_timer_func+0x28a/0x380 call_timer_fn+0x36/0x100 run_timer_softirq+0x1ff/0x2f0 __do_softirq+0xf5/0x2e0 irq_exit+0x10d/0x120 smp_apic_timer_interrupt+0x45/0x60 apic_timer_interrupt+0x6d/0x80 <EOI> cpuidle_idle_call+0xb9/0x1f0 arch_cpu_idle+0xe/0x30 cpu_startup_entry+0x9e/0x240 start_secondary+0x1e4/0x290 kernel BUG at kernel/timer.c:1084! invalid opcode: 0000 [#1] SMP Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 12 PID: 0 Comm: swapper/12 Tainted: G W 3.13.0+ #1 Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 Call Trace: <IRQ> run_timer_softirq+0x245/0x2f0 __do_softirq+0xf5/0x2e0 irq_exit+0x10d/0x120 smp_apic_timer_interrupt+0x45/0x60 apic_timer_interrupt+0x6d/0x80 <EOI> cpuidle_idle_call+0xb9/0x1f0 arch_cpu_idle+0xe/0x30 cpu_startup_entry+0x9e/0x240 start_secondary+0x1e4/0x290 RIP cascade+0x93/0xa0 WARNING: CPU: 36 PID: 1154 at kernel/workqueue.c:1461 __queue_delayed_work+0xed/0x1a0() Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU: 36 PID: 1154 Comm: kworker/u481:3 Tainted: G W 3.13.0+ #1 Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 Workqueue: edac-poller edac_mc_workq_function [edac_core] Call Trace: dump_stack+0x45/0x56 warn_slowpath_common+0x7d/0xa0 warn_slowpath_null+0x1a/0x20 __queue_delayed_work+0xed/0x1a0 queue_delayed_work_on+0x27/0x50 edac_mc_workq_function+0x72/0xa0 [edac_core] process_one_work+0x17b/0x460 worker_thread+0x11b/0x400 kthread+0xd2/0xf0 ret_from_fork+0x7c/0xb0 This patch adds a range check in the edac_mc_poll_msec code to check for 0. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-07[media, edac] Change my email addressMauro Carvalho Chehab1-1/+1
There are several left overs with my old email address. Remove their occurrences and add myself at CREDITS, to allow people to be able to reach me on my new addresses. Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
2013-12-15EDAC: Mark edac_create_debug_nodes as staticRashika Kheria1-1/+1
This patch marks the function edac_create_debug_nodes() as static because it is not used outside of edac_mc_sysfs.c. Thus, it also eliminates the following warning: drivers/edac/edac_mc_sysfs.c:917:5: warning: no previous prototype for ‘edac_create_debug_nodes’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Link: http://lkml.kernel.org/r/a1c863b08c0d6f67d03280cf908c771bf26a3239.1387029387.git.rashika.kheria@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-15Merge tag 'edac_for_3.12' of ↵Ingo Molnar1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras Pull RAS/EDAC updates from Boris Petkov: "An amd64_edac fix for single channel configurations + trivial cleanups courtesy of Jingoo Han." Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-24EDAC: Replace strict_strtol() with kstrtol()Jingoo Han1-2/+4
The usage of strict_strtol() is not preferred, because strict_strtol() is obsolete. Thus, kstrtol() should be used. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2013-07-23EDAC: Fix lockdep splatBorislav Petkov1-13/+15
Fix the following: BUG: key ffff88043bdd0330 not in .data! ------------[ cut here ]------------ WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0() DEBUG_LOCKS_WARN_ON(1) Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1 Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013 0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958 ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000 ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc Call Trace: dump_stack warn_slowpath_common warn_slowpath_fmt lockdep_init_map ? trace_hardirqs_on_caller ? trace_hardirqs_on debug_mutex_init __mutex_init bus_register edac_create_sysfs_mci_device edac_mc_add_mc sbridge_probe pci_device_probe driver_probe_device __driver_attach ? driver_probe_device bus_for_each_dev driver_attach bus_add_driver driver_register __pci_register_driver ? 0xffffffffa0010fff sbridge_init ? 0xffffffffa0010fff do_one_initcall load_module ? unset_module_init_ro_nx SyS_init_module tracesys ---[ end trace d24a70b0d3ddf733 ]--- EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0 EDAC sbridge: Driver loaded. What happens is that bus_register needs a statically allocated lock_key because the last is handed in to lockdep. However, struct mem_ctl_info embeds struct bus_type (the whole struct, not a pointer to it) and the whole thing gets dynamically allocated. Fix this by using a statically allocated struct bus_type for the MC bus. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: stable@kernel.org # v3.10 Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-08EDAC: Replace strict_strtoul() with kstrtoul()Jingoo Han1-1/+1
The usage of strict_strtoul() is not preferred, because strict_strtoul() is obsolete. Thus, kstrtoul() should be used. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2013-05-09EDAC: Don't give write permission to read-only filesSrivatsa S. Bhat1-6/+6
I get the following warning on boot: ------------[ cut here ]------------ WARNING: at drivers/base/core.c:575 device_create_file+0x9a/0xa0() Hardware name: -[8737R2A]- Write permission without 'store' ... </snip> Drilling down, this is related to dynamic channel ce_count attribute files sporting a S_IWUSR mode without a ->store() function. Looking around, it appears that they aren't supposed to have a ->store() function. So remove the bogus write permission to get rid of the warning. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: <stable@vger.kernel.org> # 3.[89] [ shorten commit message ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-25EDAC, mc_sysfs.c: Fix string array pointer typesBorislav Petkov1-3/+3
Those should be const ptr to a const string, fix them. Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-16EDAC: Merge mci.mem_is_per_rank with mci.csbasedMauro Carvalho Chehab1-1/+1
Both mci.mem_is_per_rank and mci.csbased denote the same thing: the memory controller is csrows based. Merge both fields into one. There's no need for the driver to actually fill it, as the core detects it by checking if one of the layers has the csrows type as part of the memory hierarchy: if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT) per_rank = true; Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-16amd64_edac: Correct DIMM sizesMauro Carvalho Chehab1-10/+3
We were filling the csrow size with a wrong value. 16a528ee3975 ("EDAC: Fix csrow size reported in sysfs") tried to address the issue. It fixed the report with the old API but not with the new one. Correct it for the new API too. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> [ make it a per-csrow accounting regardless of ->channel_count ] Signed-off-by: Borislav Petkov <bp@suse.de>
2013-03-05EDAC: Make sysfs functions staticStephen Hemminger1-1/+1
Fixes lots of sparse warnings here. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2013-02-21edac: better report error conditions in debug modeMauro Carvalho Chehab1-1/+6
It is hard to find what's wrong without a proper error report. Improve it, in debug mode. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-02-21edac: only create sdram_scrub_rate where supportedMauro Carvalho Chehab1-10/+19
Currently, sdram_scrub_rate sysfs node is created even if the device doesn't support get/set the scub rate. Change the logic to only create this device node when the operation is supported. Reported-by: Felipe Balbi <balbi@ti.com> Acked-by: Borislav Petkov <bp@suse.de> Reviewed-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2013-01-07EDAC: Cleanup device deregistering pathLans Zhang1-12/+6
Use device_unregister to replace put_device + device_del for cleanup, and fix the potential use after free. Signed-off-by: Lans Zhang <jia.zhang@windriver.com> Signed-off-by: Borislav Petkov <bp@alien8.de>
2013-01-07EDAC: Fix kernel panic on module unloadingKonstantin Khlebnikov1-2/+1
This patch fixes use-after-free and double-free bugs in edac_mc_sysfs_exit(). mci_pdev has single reference and put_device() calls mc_attr_release() which calls kfree(). The following device_del() works with already released memory. An another kfree() in edac_mc_sysfs_exit() releses the same memory again. Great. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: stable@vger.kernel.org # 3.[67] Cc: Denis Kirjanov <kirjanov@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Link: http://lkml.kernel.org/r/20121214110310.11019.21098.stgit@zurg Signed-off-by: Borislav Petkov <bp@alien8.de>
2012-11-28EDAC: Handle error path in edac_mc_sysfs_init() properlyDenis Kirjanov1-2/+15
Make sure proper deregistration happens on all error paths in edac_mc_sysfs_init. Signed-off-by: Denis Kirjanov <kirjanov@gmail.com> [ Boris: cleanup and concretize commit message ] Signed-off-by: Borislav Petkov <bp@alien8.de>
2012-11-28EDAC: Convert to use simple_open()Wei Yongjun1-7/+1
This removes an open coded simple_open() function and replaces file operations references to the function with simple_open() instead. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC: Fix mc size reported in sysfsJosh Hunt1-4/+8
This is the complement to previous commit "EDAC: Fix csrow size reported in sysfs". This fixes the memory controller size reporting on csrow-based memory controllers. The csrow size is already combined for both channels. Without this patch memory size is reported doubled. Signed-off-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC: Fix csrow size reported in sysfsBorislav Petkov1-0/+3
On csrow-based memory controllers, we combine the csrow size from both channels and there's no need to do that again in csrow_size_show which leads to double the size of a csrow. Fix it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-11-28EDAC: Pass mci parentBorislav Petkov1-0/+1
Initialize the mem_ctl_info descriptor of a csrow properly. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-06-27edac: allow specifying the error count with fake_injectMauro Carvalho Chehab1-2/+13
In order to test if the error counters are properly incremented, add a way to specify how many errors were generated by a trace. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-06-12edac: create top-level debugfs directoryRob Herring1-1/+22
Create a single, top-level "edac" directory for debugfs. An "mc[0-N]" directory is then created for each memory controller. Individual drivers can create additional entries such as h/w error injection control. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-06-12edac: edac_mc_handle_error(): add an error_count parameterMauro Carvalho Chehab1-1/+1
In order to avoid loosing error events, it is desirable to group error events together and generate a single trace for several identical errors. The trace API already allows reporting multiple errors. Change the handle_error function to also allow that. The changes at the drivers were made by this small script: $file .=$_ while (<>); $file =~ s/(edac_mc_handle_error)\s*\(([^\,]+)\,([^\,]+)\,/$1($2,$3, 1,/g; print $file; Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-06-11edac: remove arch-specific parameter for the error handlerMauro Carvalho Chehab1-1/+1
Remove the arch-dependent parameter, as it were not used, as the MCE tracepoint weren't implemented. It probably doesn't make sense to have an MCE-specific tracepoint, as this will cost more bytes at the tracepoint, and tracepoint is not free. The changes at the EDAC drivers were done by this small perl script: $file .=$_ while (<>); $file =~ s/(edac_mc_handle_error)\s*\(([^\;]+)\,([^\,\)]+)\s*\)/$1($2)/g; print $file; Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-06-11edac_mc: Cleanup per-dimm_info debug messagesMauro Carvalho Chehab1-10/+1
The edac_mc_alloc() routine allocates one dimm_info device for all possible memories, including the non-filled ones. The debug messages there are somewhat confusing. So, cleans them, by moving the code that prints the memory location to edac_mc, and using it on both edac_mc_sysfs and edac_mc. Also, only dumps information when DIMM/ranks are actually filled. After this patch, a dimm-based memory controller will print the debug info as: [ 1011.380027] EDAC DEBUG: edac_mc_dump_csrow: csrow->csrow_idx = 0 [ 1011.380029] EDAC DEBUG: edac_mc_dump_csrow: csrow = ffff8801169be000 [ 1011.380031] EDAC DEBUG: edac_mc_dump_csrow: csrow->first_page = 0x0 [ 1011.380032] EDAC DEBUG: edac_mc_dump_csrow: csrow->last_page = 0x0 [ 1011.380034] EDAC DEBUG: edac_mc_dump_csrow: csrow->page_mask = 0x0 [ 1011.380035] EDAC DEBUG: edac_mc_dump_csrow: csrow->nr_channels = 3 [ 1011.380037] EDAC DEBUG: edac_mc_dump_csrow: csrow->channels = ffff8801149c2840 [ 1011.380039] EDAC DEBUG: edac_mc_dump_csrow: csrow->mci = ffff880117426000 [ 1011.380041] EDAC DEBUG: edac_mc_dump_channel: channel->chan_idx = 0 [ 1011.380042] EDAC DEBUG: edac_mc_dump_channel: channel = ffff8801149c2860 [ 1011.380044] EDAC DEBUG: edac_mc_dump_channel: channel->csrow = ffff8801169be000 [ 1011.380046] EDAC DEBUG: edac_mc_dump_channel: channel->dimm = ffff88010fe90400 ... [ 1011.380095] EDAC DEBUG: edac_mc_dump_dimm: dimm0: channel 0 slot 0 mapped as virtual row 0, chan 0 [ 1011.380097] EDAC DEBUG: edac_mc_dump_dimm: dimm = ffff88010fe90400 [ 1011.380099] EDAC DEBUG: edac_mc_dump_dimm: dimm->label = 'CPU#0Channel#0_DIMM#0' [ 1011.380101] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000 [ 1011.380103] EDAC DEBUG: edac_mc_dump_dimm: dimm->grain = 8 [ 1011.380104] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000 ... (a rank-based memory controller would print, instead of "dimm?", "rank?" on the above debug info) Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-06-11edac: Convert debugfX to edac_dbg(X,Joe Perches1-19/+18
Use a more common debugging style. Remove __FILE__ uses, add missing newlines, coalesce formats and align arguments. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>