Age | Commit message (Collapse) | Author | Files | Lines |
|
The old csrowX sysfs directories have per-csrow error counters, but the
new dimmX directories do not currently expose error counts.
EDAC already keeps these counts, add them to sysfs so per-DIMM counts
are still available when CONFIG_EDAC_LEGACY_SYSFS=n.
Signed-off-by: Aaron Miller <aaronmiller@fb.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20161103220153.3997328-1-aaronmiller@fb.com
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
The dev_attr_sdram_scrub_rate is not declared in a header or used
anywhere else, so make it static to fix the following warning:
drivers/edac/edac_mc_sysfs.c:816:1: warning: symbol
'dev_attr_sdram_scrub_rate' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1465407356-7357-1-git-send-email-ben.dooks@codethink.co.uk
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Now, all left at edac_core.h are at drivers/edac/edac_mc.c,
so rename it to edac_mc.h.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
|
c44696fff04f ("EDAC: Remove arbitrary limit on number of channels")
lifted the arbitrary limit on memory controller channels in EDAC.
However, the dynamic channel attributes dynamic_csrow_dimm_attr and
dynamic_csrow_ce_count_attr remained 6.
This wasn't a problem except channels 6 and 7 weren't visible in sysfs
on machines with more than 6 channels after the conversion to static
attr groups with
2c1946b6d629 ("EDAC: Use static attribute groups for managing sysfs entries")
[ without that, we're exploding in edac_create_sysfs_mci_device()
because we're dereferencing out of the bounds of the
dynamic_csrow_dimm_attr array. ]
Add attributes for channels 6 and 7 along with a guard for the
future, should more channels be required and/or to sanity check for
misconfigured machines.
We still need to check against the number of channels present on the MC
first, as Thor reported.
Signed-off-by: Borislav Petkov <bp@suse.de>
Reported-by: Hironobu Ishii <ishii.hironobu@jp.fujitsu.com>
Tested-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: <stable@vger.kernel.org> # 4.2
|
|
Code flow looks like this:
device_unregister(&mci->dev);
-> kobject_put+0x25/0x50
-> kobject_cleanup+0x77/0x190
-> device_release+0x32/0xa0
-> mci_attr_release+0x36/0x70
-> kfree(mci);
bus_unregister(mci->bus);
Fix is to grab a local copy of "mci->bus" and use that when we call
bus_unregister().
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/21d595b0ab3d718d9cb206647f4ec91c05e62ec4.1461261078.git.tony.luck@intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
It cannot fail now. We either load EDAC core after having successfully
initialized edac_subsys or we don't.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
This was really dumb - reference counting for the main EDAC sysfs
object. While we could've simply registered it as the first thing in the
module init path and then hand it around to what needs it.
Do that and rip out all the code around it, thus simplifying the whole
handling significantly.
Move the edac_subsys node back to edac_module.c.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
I get the splat below when modprobing/rmmoding EDAC drivers. It happens
because bus->name is invalid after bus_unregister() has run. The Code: section
below corresponds to:
.loc 1 1108 0
movq 672(%rbx), %rax # mci_1(D)->bus, mci_1(D)->bus
.loc 1 1109 0
popq %rbx #
.loc 1 1108 0
movq (%rax), %rdi # _7->name,
jmp kfree #
and %rax has some funky stuff 2030203020312030 which looks a lot like
something walked over it.
Fix that by saving the name ptr before doing stuff to string it points to.
general protection fault: 0000 [#1] SMP
Modules linked in: ...
CPU: 4 PID: 10318 Comm: modprobe Tainted: G I EN 3.12.51-11-default+ #48
Hardware name: HP ProLiant DL380 G7, BIOS P67 05/05/2011
task: ffff880311320280 ti: ffff88030da3e000 task.ti: ffff88030da3e000
RIP: 0010:[<ffffffffa019da92>] [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core]
RSP: 0018:ffff88030da3fe28 EFLAGS: 00010292
RAX: 2030203020312030 RBX: ffff880311b4e000 RCX: 000000000000095c
RDX: 0000000000000001 RSI: ffff880327bb9600 RDI: 0000000000000286
RBP: ffff880311b4e750 R08: 0000000000000000 R09: ffffffff81296110
R10: 0000000000000400 R11: 0000000000000000 R12: ffff88030ba1ac68
R13: 0000000000000001 R14: 00000000011b02f0 R15: 0000000000000000
FS: 00007fc9bf8f5700(0000) GS:ffff8801a7c40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000403c90 CR3: 000000019ebdf000 CR4: 00000000000007e0
Stack:
Call Trace:
i7core_unregister_mci.isra.9
i7core_remove
pci_device_remove
__device_release_driver
driver_detach
bus_remove_driver
pci_unregister_driver
i7core_exit
SyS_delete_module
system_call_fastpath
0x7fc9bf426536
Code: 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 53 48 89 fb e8 52 2a 1f e1 48 8b bb a0 02 00 00 e8 46 59 1f e1 48 8b 83 a0 02 00 00 5b <48> 8b 38 e9 26 9a fe e0 66 0f 1f 44 00 00 66 66 66 66 90 48 8b
RIP [<ffffffffa019da92>] edac_unregister_sysfs+0x22/0x30 [edac_core]
RSP <ffff88030da3fe28>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: <stable@vger.kernel.org> # v3.6..
Fixes: 7a623c039075 ("edac: rewrite the sysfs code to use struct device")
|
|
debugfs_remove() is used to remove a file or a directory from the
debugfs filesystem, but mci->debugfs might not empty.
This can be triggered by the following sequence:
1) Enable CONFIG_EDAC_DEBUG
2) insmod an EDAC module (like i3000_edac or similar)
3) rmmod this module
4) we can see files remaining under <debugfs_mountpoint>/edac/ like
"fake_inject", for example.
Removing edac_core then, causes a NULL pointer dereference.
Reported-by: Yun Wu (Abel) <wuyun.wu@huawei.com>
Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1444787364-104353-1-git-send-email-tanxiaojun@huawei.com
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Updating dimm_label to an empty string does not make much sense. Change
the sysfs dimm_label store operation to fail a request when an input
string is empty.
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: elliott@hpe.com
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1443124767.25474.172.camel@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Sysfs "dimm_label" and "chX_dimm_label" nodes have the following issues
in their store operation:
1) A newline-terminated input string causes redundant newlines:
# echo "test" > /sys/bus/mc0/devices/dimm0/dimm_label
# cat /sys/bus/mc0/devices/dimm0/dimm_label
test
# od -bc /sys/bus/mc0/devices/dimm0/dimm_label
0000000 164 145 163 164 012 012
t e s t \n \n
0000006
2) The original label string (31 characters) cannot be stored due to
an improper size check:
# echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0" > /sys/bus/mc0/devices/dimm0/dimm_label
# cat /sys/bus/mc0/devices/dimm0/dimm_label
# od -bc /sys/bus/mc0/devices/dimm0/dimm_label
0000000 012 012
\n \n
0000002
3) An input string longer than the buffer size results a wrong label
info as it allows a retry with the remaining string:
# echo "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0_TEST" > /sys/bus/mc0/devices/dimm0/dimm_label
# cat /sys/bus/mc0/devices/dimm0/dimm_label
_TEST
Fix these issues by making the following changes:
1) Replace a newline character at the end by setting a null. It also
assures that the string is null-terminated in the label buffer.
2) Check the label buffer size with 'sizeof(dimm->label)'.
3) Fail a request if its string exceeds the label buffer size.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Robert Elliott <elliott@hpe.com>
Link: http://lkml.kernel.org/r/1443121564.25474.160.camel@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
After
7d375bffa524 ("sb_edac: Fix support for systems with two home agents per socket")
sysfs "dimm_label" and "chX_dimm_label" show their label string without a
newline "\n" at the end.
[root@orange ~]# cat /sys/bus/mc0/devices/dimm0/dimm_label
CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]#
[root@orange ~]# cat /sys/devices/system/edac/mc/mc0/csrow0/ch0_dimm_label
CPU_SrcID#0_Ha#0_Chan#0_DIMM#0[root@orange ~]#
The label strings now have 31 characters, which are the same as
EDAC_MC_LABEL_LEN. Since the snprintf()s in channel_dimm_label_show()
and dimmdev_label_show() limit the whole length by EDAC_MC_LABEL_LEN,
the newline in the format "%s\n" is ignored.
[root@orange ~]# od -bc /sys/bus/mc0/devices/dimm0/dimm_label
0000000 103 120 125 137 123 162 143 111 104 043 060 137 110 141 043 060
C P U _ S r c I D # 0 _ H a # 0
0000020 137 103 150 141 156 043 060 137 104 111 115 115 043 060 000
_ C h a n # 0 _ D I M M # 0 \0
0000037
Fix it by using 'sizeof(dimm->label) + 1' as the whole length in the
snprintf()s in channel_dimm_label_show() and dimmdev_label_show().
Reported-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Link: http://lkml.kernel.org/r/1442933883-21587-2-git-send-email-toshi.kani@hpe.com
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
... into a separate compilation unit and drop a couple of
CONFIG_EDAC_DEBUG ifdefferies. Rename edac_create_debug_nodes() to
edac_create_debugfs_nodes(), while at it.
No functionality change.
Cc: <linux-edac@vger.kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Currently set to "6", but the reset of the code will dynamically
allocate as needed. We need to go to "8" today, but drop the check
completely to save doing this again when we need even larger numbers.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
|
|
edac_init() does not deallocate already allocated resources on failure
path.
Found by Linux Driver Verification project (linuxtesting.org).
[ Boris: The unwind path functions have __exit annotation but are being
used in an __init function, leading to section mismatches. Drop the
section annotation and make them normal functions. ]
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Link: http://lkml.kernel.org/r/1423203162-26368-1-git-send-email-khoroshilov@ispras.ru
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Add edac_mc_add_mc_with_groups() for initializing the mem_ctl_info
object with the optional attribute groups. This allows drivers to
pass additional sysfs entries without manual (and racy)
device_create_file() and co calls.
edac_mc_add_mc() is kept as is, just calling edac_mc_add_with_groups()
with NULL groups.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-3-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Instead of manual calls of device_create_file() and
device_remove_file(), use static attribute groups with proper
is_visible callbacks for managing the sysfs entries.
This simplifies the code a lot and avoids the possible races.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: http://lkml.kernel.org/r/1423046938-18111-2-git-send-email-tiwai@suse.de
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Fix sparse warnings.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Also use goto labels for all failure paths in
edac_create_sysfs_mci_device and update meaningless labels.
Signed-off-by: Junjie Mao <junjie.mao@hotmail.com>
Link: http://lkml.kernel.org/r/BLU436-SMTP25291B6B612942A212AEBFE95300@phx.gbl
[ Boris: Use ! for 0 checks and add newlines for less crammed code. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
This prevented edac sysfs code from properly handling 6 channels
per memory controller.
Signed-off-by: Jim Snow <jim.snow@intel.com>
Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
|
|
Haswell memory controller can make use of DDR4 and Registered DDR4
Cc: tony.luck@intel.com
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
|
|
Linux 3.14-rc5
* tag 'v3.14-rc5': (1117 commits)
Linux 3.14-rc5
drm/vmwgfx: avoid null pointer dereference at failure paths
drm/vmwgfx: Make sure backing mobs are cleared when allocated. Update driver date.
drm/vmwgfx: Remove some unused surface formats
MAINTAINERS: add maintainer entry for Armada DRM driver
arm64: Fix !CONFIG_SMP kernel build
arm64: mm: Add double logical invert to pte accessors
dm cache: fix truncation bug when mapping I/O to >2TB fast device
perf tools: Fix strict alias issue for find_first_bit
powerpc/powernv: Fix indirect XSCOM unmangling
powerpc/powernv: Fix opal_xscom_{read,write} prototype
powerpc/powernv: Refactor PHB diag-data dump
powerpc/powernv: Dump PHB diag-data immediately
powerpc: Increase stack redzone for 64-bit userspace to 512 bytes
powerpc/ftrace: bugfix for test_24bit_addr
powerpc/crashdump : Fix page frame number check in copy_oldmem_page
powerpc/le: Ensure that the 'stop-self' RTAS token is handled correctly
kvm, vmx: Really fix lazy FPU on nested guest
perf tools: fix BFD detection on opensuse
drm/radeon: enable speaker allocation setup on dce3.2
...
|
|
Sanitize code even more to accept unsigned longs only and to not allow
polling intervals below 1 second as this is unnecessary and doesn't make
much sense anyway for polling errors.
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: <stable@vger.kernel.org>
|
|
If you do
echo 0 > /sys/module/edac_core/parameters/edac_mc_poll_msec
the following stack trace is output because the edac module is not
designed to poll with a timeout of zero.
WARNING: CPU: 12 PID: 0 at lib/list_debug.c:33 __list_add+0xac/0xc0()
list_add corruption. prev->next should be next (ffff8808291dd1b8), but was (null). (prev=ffff8808286fe3f8).
Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 12 PID: 0 Comm: swapper/12 Not tainted 3.13.0+ #1
Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
Call Trace:
<IRQ>
__list_add+0xac/0xc0
__internal_add_timer+0xab/0x130
internal_add_timer+0x17/0x40
mod_timer_pinned+0xca/0x170
intel_pstate_timer_func+0x28a/0x380
call_timer_fn+0x36/0x100
run_timer_softirq+0x1ff/0x2f0
__do_softirq+0xf5/0x2e0
irq_exit+0x10d/0x120
smp_apic_timer_interrupt+0x45/0x60
apic_timer_interrupt+0x6d/0x80
<EOI>
cpuidle_idle_call+0xb9/0x1f0
arch_cpu_idle+0xe/0x30
cpu_startup_entry+0x9e/0x240
start_secondary+0x1e4/0x290
kernel BUG at kernel/timer.c:1084!
invalid opcode: 0000 [#1] SMP
Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 12 PID: 0 Comm: swapper/12 Tainted: G W 3.13.0+ #1
Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
Call Trace:
<IRQ>
run_timer_softirq+0x245/0x2f0
__do_softirq+0xf5/0x2e0
irq_exit+0x10d/0x120
smp_apic_timer_interrupt+0x45/0x60
apic_timer_interrupt+0x6d/0x80
<EOI>
cpuidle_idle_call+0xb9/0x1f0
arch_cpu_idle+0xe/0x30
cpu_startup_entry+0x9e/0x240
start_secondary+0x1e4/0x290
RIP cascade+0x93/0xa0
WARNING: CPU: 36 PID: 1154 at kernel/workqueue.c:1461 __queue_delayed_work+0xed/0x1a0()
Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache cfg80211 rfkill x86_pkg_temp_thermal coretemp kvm_intel kvm ixgbe e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt ptp sb_edac iTCO_vendor_support pps_core mdio ipmi_devintf edac_core ioatdma microcode shpchp lpc_ich pcspkr i2c_i801 dca mfd_core ipmi_si wmi ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod sr_mod cdrom crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt isci i2c_algo_bit drm_kms_helper ttm drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod
CPU: 36 PID: 1154 Comm: kworker/u481:3 Tainted: G W 3.13.0+ #1
Hardware name: Intel Corporation LH Pass ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
Workqueue: edac-poller edac_mc_workq_function [edac_core]
Call Trace:
dump_stack+0x45/0x56
warn_slowpath_common+0x7d/0xa0
warn_slowpath_null+0x1a/0x20
__queue_delayed_work+0xed/0x1a0
queue_delayed_work_on+0x27/0x50
edac_mc_workq_function+0x72/0xa0 [edac_core]
process_one_work+0x17b/0x460
worker_thread+0x11b/0x400
kthread+0xd2/0xf0
ret_from_fork+0x7c/0xb0
This patch adds a range check in the edac_mc_poll_msec code to check for 0.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There are several left overs with my old email address.
Remove their occurrences and add myself at CREDITS, to
allow people to be able to reach me on my new addresses.
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
|
|
This patch marks the function edac_create_debug_nodes() as static
because it is not used outside of edac_mc_sysfs.c.
Thus, it also eliminates the following warning:
drivers/edac/edac_mc_sysfs.c:917:5: warning: no previous prototype for ‘edac_create_debug_nodes’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/a1c863b08c0d6f67d03280cf908c771bf26a3239.1387029387.git.rashika.kheria@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras
Pull RAS/EDAC updates from Boris Petkov:
"An amd64_edac fix for single channel configurations + trivial cleanups
courtesy of Jingoo Han."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The usage of strict_strtol() is not preferred, because strict_strtol()
is obsolete. Thus, kstrtol() should be used.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Fix the following:
BUG: key ffff88043bdd0330 not in .data!
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
Call Trace:
dump_stack
warn_slowpath_common
warn_slowpath_fmt
lockdep_init_map
? trace_hardirqs_on_caller
? trace_hardirqs_on
debug_mutex_init
__mutex_init
bus_register
edac_create_sysfs_mci_device
edac_mc_add_mc
sbridge_probe
pci_device_probe
driver_probe_device
__driver_attach
? driver_probe_device
bus_for_each_dev
driver_attach
bus_add_driver
driver_register
__pci_register_driver
? 0xffffffffa0010fff
sbridge_init
? 0xffffffffa0010fff
do_one_initcall
load_module
? unset_module_init_ro_nx
SyS_init_module
tracesys
---[ end trace d24a70b0d3ddf733 ]---
EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
EDAC sbridge: Driver loaded.
What happens is that bus_register needs a statically allocated lock_key
because the last is handed in to lockdep. However, struct mem_ctl_info
embeds struct bus_type (the whole struct, not a pointer to it) and the
whole thing gets dynamically allocated.
Fix this by using a statically allocated struct bus_type for the MC bus.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: stable@kernel.org # v3.10
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
The usage of strict_strtoul() is not preferred, because strict_strtoul()
is obsolete. Thus, kstrtoul() should be used.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
I get the following warning on boot:
------------[ cut here ]------------
WARNING: at drivers/base/core.c:575 device_create_file+0x9a/0xa0()
Hardware name: -[8737R2A]-
Write permission without 'store'
...
</snip>
Drilling down, this is related to dynamic channel ce_count attribute
files sporting a S_IWUSR mode without a ->store() function. Looking
around, it appears that they aren't supposed to have a ->store()
function. So remove the bogus write permission to get rid of the
warning.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: <stable@vger.kernel.org> # 3.[89]
[ shorten commit message ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Those should be const ptr to a const string, fix them.
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Both mci.mem_is_per_rank and mci.csbased denote the same thing: the
memory controller is csrows based. Merge both fields into one.
There's no need for the driver to actually fill it, as the core detects
it by checking if one of the layers has the csrows type as part of the
memory hierarchy:
if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
per_rank = true;
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
We were filling the csrow size with a wrong value. 16a528ee3975 ("EDAC:
Fix csrow size reported in sysfs") tried to address the issue. It fixed
the report with the old API but not with the new one. Correct it for the
new API too.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
[ make it a per-csrow accounting regardless of ->channel_count ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
Fixes lots of sparse warnings here.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|
It is hard to find what's wrong without a proper error
report. Improve it, in debug mode.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Currently, sdram_scrub_rate sysfs node is created even if the device
doesn't support get/set the scub rate. Change the logic to only
create this device node when the operation is supported.
Reported-by: Felipe Balbi <balbi@ti.com>
Acked-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Use device_unregister to replace put_device + device_del for
cleanup, and fix the potential use after free.
Signed-off-by: Lans Zhang <jia.zhang@windriver.com>
Signed-off-by: Borislav Petkov <bp@alien8.de>
|
|
This patch fixes use-after-free and double-free bugs in
edac_mc_sysfs_exit(). mci_pdev has single reference and put_device()
calls mc_attr_release() which calls kfree(). The following
device_del() works with already released memory. An another kfree() in
edac_mc_sysfs_exit() releses the same memory again. Great.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: stable@vger.kernel.org # 3.[67]
Cc: Denis Kirjanov <kirjanov@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Link: http://lkml.kernel.org/r/20121214110310.11019.21098.stgit@zurg
Signed-off-by: Borislav Petkov <bp@alien8.de>
|
|
Make sure proper deregistration happens on all error paths in
edac_mc_sysfs_init.
Signed-off-by: Denis Kirjanov <kirjanov@gmail.com>
[ Boris: cleanup and concretize commit message ]
Signed-off-by: Borislav Petkov <bp@alien8.de>
|
|
This removes an open coded simple_open() function and replaces file
operations references to the function with simple_open() instead.
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
This is the complement to previous commit "EDAC: Fix csrow size
reported in sysfs". This fixes the memory controller size reporting on
csrow-based memory controllers. The csrow size is already combined for
both channels. Without this patch memory size is reported doubled.
Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
On csrow-based memory controllers, we combine the csrow size from both
channels and there's no need to do that again in csrow_size_show which
leads to double the size of a csrow.
Fix it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
Initialize the mem_ctl_info descriptor of a csrow properly.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
|
In order to test if the error counters are properly incremented,
add a way to specify how many errors were generated by a trace.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Create a single, top-level "edac" directory for debugfs. An "mc[0-N]"
directory is then created for each memory controller. Individual drivers
can create additional entries such as h/w error injection control.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
In order to avoid loosing error events, it is desirable to group
error events together and generate a single trace for several identical
errors.
The trace API already allows reporting multiple errors. Change the
handle_error function to also allow that.
The changes at the drivers were made by this small script:
$file .=$_ while (<>);
$file =~ s/(edac_mc_handle_error)\s*\(([^\,]+)\,([^\,]+)\,/$1($2,$3, 1,/g;
print $file;
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Remove the arch-dependent parameter, as it were not used,
as the MCE tracepoint weren't implemented. It probably doesn't
make sense to have an MCE-specific tracepoint, as this will
cost more bytes at the tracepoint, and tracepoint is not free.
The changes at the EDAC drivers were done by this small perl script:
$file .=$_ while (<>);
$file =~ s/(edac_mc_handle_error)\s*\(([^\;]+)\,([^\,\)]+)\s*\)/$1($2)/g;
print $file;
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
The edac_mc_alloc() routine allocates one dimm_info device for all
possible memories, including the non-filled ones. The debug messages
there are somewhat confusing. So, cleans them, by moving the code
that prints the memory location to edac_mc, and using it on both
edac_mc_sysfs and edac_mc.
Also, only dumps information when DIMM/ranks are actually
filled.
After this patch, a dimm-based memory controller will print the debug
info as:
[ 1011.380027] EDAC DEBUG: edac_mc_dump_csrow: csrow->csrow_idx = 0
[ 1011.380029] EDAC DEBUG: edac_mc_dump_csrow: csrow = ffff8801169be000
[ 1011.380031] EDAC DEBUG: edac_mc_dump_csrow: csrow->first_page = 0x0
[ 1011.380032] EDAC DEBUG: edac_mc_dump_csrow: csrow->last_page = 0x0
[ 1011.380034] EDAC DEBUG: edac_mc_dump_csrow: csrow->page_mask = 0x0
[ 1011.380035] EDAC DEBUG: edac_mc_dump_csrow: csrow->nr_channels = 3
[ 1011.380037] EDAC DEBUG: edac_mc_dump_csrow: csrow->channels = ffff8801149c2840
[ 1011.380039] EDAC DEBUG: edac_mc_dump_csrow: csrow->mci = ffff880117426000
[ 1011.380041] EDAC DEBUG: edac_mc_dump_channel: channel->chan_idx = 0
[ 1011.380042] EDAC DEBUG: edac_mc_dump_channel: channel = ffff8801149c2860
[ 1011.380044] EDAC DEBUG: edac_mc_dump_channel: channel->csrow = ffff8801169be000
[ 1011.380046] EDAC DEBUG: edac_mc_dump_channel: channel->dimm = ffff88010fe90400
...
[ 1011.380095] EDAC DEBUG: edac_mc_dump_dimm: dimm0: channel 0 slot 0 mapped as virtual row 0, chan 0
[ 1011.380097] EDAC DEBUG: edac_mc_dump_dimm: dimm = ffff88010fe90400
[ 1011.380099] EDAC DEBUG: edac_mc_dump_dimm: dimm->label = 'CPU#0Channel#0_DIMM#0'
[ 1011.380101] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000
[ 1011.380103] EDAC DEBUG: edac_mc_dump_dimm: dimm->grain = 8
[ 1011.380104] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000
...
(a rank-based memory controller would print, instead of "dimm?", "rank?"
on the above debug info)
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|
|
Use a more common debugging style.
Remove __FILE__ uses, add missing newlines,
coalesce formats and align arguments.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
|