<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/drivers/edac/edac_device.c, branch v2.6.39-rc2</title>
<subtitle>Linux Kernel (branches are rebased on master from time to time)</subtitle>
<id>https://sre.ring0.de/linux/atom?h=v2.6.39-rc2</id>
<link rel='self' href='https://sre.ring0.de/linux/atom?h=v2.6.39-rc2'/>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/'/>
<updated>2009-09-24T14:21:05Z</updated>
<entry>
<title>edac: core: remove completion-wait for complete with rcu_barrier</title>
<updated>2009-09-24T14:21:05Z</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>hawk@comx.dk</email>
</author>
<published>2009-09-23T22:57:29Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=458e5ff13e1bed050990d97e9aa55bcdafc951a7'/>
<id>urn:sha1:458e5ff13e1bed050990d97e9aa55bcdafc951a7</id>
<content type='text'>
Module edac_core.ko uses call_rcu() callbacks in edac_device.c, edac_mc.c
and edac_pci.c.

They all use a wait_for_completion() scheme, but this scheme it not 100%
safe on multiple CPUs.  See the _rcu_barrier() implementation which
explains why extra precausion is needed.

The patch adds a comment about rcu_barrier() and as a precausion calls
rcu_barrier().  A maintainer needs to look at removing the
wait_for_completion code.

[dougthompson@xmission.com: remove the wait_for_completion code]
Signed-off-by Jesper Dangaard Brouer &lt;hawk@comx.dk&gt;
Signed-off-by: Doug Thompson &lt;dougthompson@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>edac: add edac_device_alloc_index()</title>
<updated>2009-06-18T20:03:56Z</updated>
<author>
<name>Harry Ciao</name>
<email>qingtao.cao@windriver.com</email>
</author>
<published>2009-06-17T23:27:59Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=1dc9b70d7d48abd8a5c6f83021f38992f3b5a77f'/>
<id>urn:sha1:1dc9b70d7d48abd8a5c6f83021f38992f3b5a77f</id>
<content type='text'>
Add edac_device_alloc_index(), because for MAPLE platform there may
exist several EDAC driver modules that could make use of
edac_device_ctl_info structure at the same time. The index allocation
for these structures should be taken care of by EDAC core.

[akpm@linux-foundation.org: cleanups]
Signed-off-by: Harry Ciao &lt;qingtao.cao@windriver.com&gt;
Cc: Doug Thompson &lt;norsk5@yahoo.com&gt;
Cc: Michael Ellerman &lt;michael@ellerman.id.au&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: Kumar Gala &lt;galak@gate.crashing.org&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>edac: use to_delayed_work()</title>
<updated>2009-04-13T22:04:34Z</updated>
<author>
<name>Jean Delvare</name>
<email>khali@linux-fr.org</email>
</author>
<published>2009-04-13T21:40:21Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=fbeb4384748abb78531bbe1e80d627412a0abcfa'/>
<id>urn:sha1:fbeb4384748abb78531bbe1e80d627412a0abcfa</id>
<content type='text'>
The edac-core driver includes code which assumes that the work_struct
which is included in every delayed_work is the first member of that
structure.  This is currently the case but might change in the future, so
use to_delayed_work() instead, which doesn't make such an assumption.

linux-2.6.30-rc1 has the to_delayed_work() function that will allow this
patch to work

Signed-off-by: Jean Delvare &lt;khali@linux-fr.org&gt;
Signed-off-by: Doug Thompson &lt;dougthompson@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>edac: struct device: replace bus_id with dev_name(), dev_set_name()</title>
<updated>2009-01-06T23:59:30Z</updated>
<author>
<name>Kay Sievers</name>
<email>kay.sievers@vrfy.org</email>
</author>
<published>2009-01-06T22:42:57Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=281efb17d88a91dc3b879bb1d49e3a66daf48797'/>
<id>urn:sha1:281efb17d88a91dc3b879bb1d49e3a66daf48797</id>
<content type='text'>
This patch is part of a larger patch series which will remove the "char
bus_id[20]" name string from struct device.  The device name is managed in
the kobject anyway, and without any size limitation, and just needlessly
copied into "struct device".

[akpm@linux-foundation.org: coding-style fixes]
Acked-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;
Acked-by: Doug Thompson &lt;dougthompson@xmission.com&gt;
Signed-off-by: Kay Sievers &lt;kay.sievers@vrfy.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>edac: fix edac core deadlock when removing a device</title>
<updated>2008-12-23T23:58:21Z</updated>
<author>
<name>Harry Ciao</name>
<email>qingtao.cao@windriver.com</email>
</author>
<published>2008-12-23T21:57:16Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=d519c8d9ccb7956e61a55ce3a0fd6a25f42cbb33'/>
<id>urn:sha1:d519c8d9ccb7956e61a55ce3a0fd6a25f42cbb33</id>
<content type='text'>
When deleting an edac device, we have to wait for its edac_dev.work to be
completed before deleting the whole edac_dev structure.  Since we have no
idea which work in current edac_poller's workqueue is the work we are
conerned about, we wait for all work in the edac_poller's workqueue to be
proceseed.  This is done via flush_cpu_workqueue() which inserts a
wq_barrier into the tail of the workqueue and then sleeping on the
completion of this wq_barrier.  The edac_poller will wake up sleepers when
it is found.

EDAC core creates only one kernel worker thread, edac_poller, to run the
works of all current edac devices.  They share the same callback function
of edac_device_workq_function(), which would grab the mutex of
device_ctls_mutex first before it checks the device.  This is exactly
where edac_poller and rmmod would have a great chance to deadlock.

In below call trace of rmmod &gt; ... &gt;
edac_device_del_device &gt;
edac_device_workq_teardown &gt; flush_workqueue &gt; flush_cpu_workqueue,

device_ctls_mutex would have already been grabbed by
edac_device_del_device().  So, on one hand rmmod would sleep on the
completion of a wq_barrier, holding device_ctls_mutex; on the other hand
edac_poller would be blocked on the same mutex when it's running any one
of works of existing edac evices(Note, this edac_dev.work is likely to be
totally irrelevant to the one that is being removed right now)and never
would have a chance to run the work of above wq_barrier to wake rmmod up.

edac_device_workq_teardown() should not be called within the critical
region of device_ctls_mutex.  Just like is done in edac_pci_del_device()
and edac_mc_del_mc(), where edac_pci_workq_teardown() and
edac_mc_workq_teardown() are called after related mutex are released.

Moreover, an edac_dev.work should check first if it is being removed.  If
this is the case, then it should bail out immediately.  Since not all of
existing edac devices are to be removed, this "shutting flag" should be
contained to edac device being removed.  The current edac_dev.op_state can
be used to serve this purpose.

The original deadlock problem and the solution have been witnessed and
tested on actual hardware.  Without the solution, rmmod an edac driver
would result in below deadlock:

root@localhost:/root&gt; rmmod mv64x60_edac
EDAC DEBUG: mv64x60_dma_err_remove()
EDAC DEBUG: edac_device_del_device()
EDAC DEBUG: find_edac_device_by_dev()

(hang for a moment)

INFO: task edac-poller:2030 blocked for more than 120 seconds.
"echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs" disables this message.
edac-poller   D 00000000     0  2030      2
Call Trace:
[df159dc0] [c0071e3c] free_hot_cold_page+0x17c/0x304 (unreliable)
[df159e80] [c000a024] __switch_to+0x6c/0xa0
[df159ea0] [c03587d8] schedule+0x2f4/0x4d8
[df159f00] [c03598a8] __mutex_lock_slowpath+0xa0/0x174
[df159f40] [e1030434] edac_device_workq_function+0x28/0xd8 [edac_core]
[df159f60] [c003beb4] run_workqueue+0x114/0x218
[df159f90] [c003c674] worker_thread+0x5c/0xc8
[df159fd0] [c004106c] kthread+0x5c/0xa0
[df159ff0] [c0013538] original_kernel_thread+0x44/0x60
INFO: task rmmod:2062 blocked for more than 120 seconds.
"echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rmmod         D 0ff2c9fc     0  2062   1839
Call Trace:
[df119c00] [c0437a74] 0xc0437a74 (unreliable)
[df119cc0] [c000a024] __switch_to+0x6c/0xa0
[df119ce0] [c03587d8] schedule+0x2f4/0x4d8
[df119d40] [c03591dc] schedule_timeout+0xb0/0xf4

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>dev_name introduction fall out fix</title>
<updated>2008-05-05T22:08:38Z</updated>
<author>
<name>Stephen Rothwell</name>
<email>sfr@canb.auug.org.au</email>
</author>
<published>2008-05-05T03:54:19Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=17aa7e034416e3080bc57a786d09ba0a4a044561'/>
<id>urn:sha1:17aa7e034416e3080bc57a786d09ba0a4a044561</id>
<content type='text'>
Commit 06916639e2fed9ee475efef2747a1b7429f8fe76 ("driver-core: add
dev_name() to help transition away from using bus_id") added a static
inline dev_name() and used it in dev_printk.

Unfortunately, drivers/edac/edac_core.h defines a macro called
dev_name().  Rename the latter.

Diagnosis by Tony Breeds and Michael Ellerman.

Signed-off-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Acked-by: Doug Thompson &lt;dougthompson@xmission.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>edac: remove unneeded functions and add static accessor</title>
<updated>2008-04-29T15:06:26Z</updated>
<author>
<name>Adrian Bunk</name>
<email>bunk@kernel.org</email>
</author>
<published>2008-04-29T08:03:18Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=1a45027d1afd7e85254b5ef8535e93ce3d588cf4'/>
<id>urn:sha1:1a45027d1afd7e85254b5ef8535e93ce3d588cf4</id>
<content type='text'>
Collection of patches, merged into one, from Adrian that do the following:

1) This patch makes the following needlessly global functions static:
- edac_pci_get_log_pe()
- edac_pci_get_log_npe()
- edac_pci_get_panic_on_pe()
- edac_pci_unregister_sysfs_instance_kobj()
- edac_pci_main_kobj_setup()

2) Remove unneeded function edac_device_find()

3) Added #if 0 around function  edac_pci_find()

4) make the needlessly global edac_pci_generic_check() static

5) Removed function edac_check_mc_devices()

Doug Thompson modified Adrian's patches, to bettern represent
the direction of EDAC, and make them one patch.

Cc: Alan Cox &lt;alan@lxorguk.ukuu.org.uk&gt;
Signed-off-by: Adrian Bunk &lt;bunk@kernel.org&gt;
Signed-off-by: Doug Thompson &lt;dougthompson@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>edac: use the shorter LIST_HEAD for brevity</title>
<updated>2008-04-29T15:06:26Z</updated>
<author>
<name>Robert P. J. Day</name>
<email>rpjday@crashcourse.ca</email>
</author>
<published>2008-04-29T08:03:17Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=ff6ac2a616c85d1215899ffda815e29b699cbd3a'/>
<id>urn:sha1:ff6ac2a616c85d1215899ffda815e29b699cbd3a</id>
<content type='text'>
Signed-off-by: Robert P. J. Day &lt;rpjday@crashcourse.ca&gt;
Acked-by: Doug Thompson &lt;norsk5@yahoo.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>drivers-edac: use round_jiffies_relative</title>
<updated>2008-02-07T16:42:23Z</updated>
<author>
<name>Anton Blanchard</name>
<email>anton@samba.org</email>
</author>
<published>2008-02-07T08:14:51Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=c2ae24cfd1969a28e76641807026a3bbc11c5f31'/>
<id>urn:sha1:c2ae24cfd1969a28e76641807026a3bbc11c5f31</id>
<content type='text'>
When rounding a relative timeout we need to use round_jiffies_relative().

Signed-off-by: Anton Blanchard &lt;anton@samba.org&gt;
Acked-by: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
Cc: Alan Cox &lt;alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson &lt;dougthompson@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>drivers-edac: turn on edac device error logging</title>
<updated>2008-02-07T16:42:23Z</updated>
<author>
<name>Doug Thompson</name>
<email>dougthompson@xmission.com</email>
</author>
<published>2008-02-07T08:14:51Z</published>
<link rel='alternate' type='text/html' href='https://sre.ring0.de/linux/commit/?id=56e61a9c5fe7b799504b125c278b56cc2c42670f'/>
<id>urn:sha1:56e61a9c5fe7b799504b125c278b56cc2c42670f</id>
<content type='text'>
ENABLE the 'logging' of CE and UE events for the EDAC_DEVICE class of error
harvester in EDAC

Cc: Alan Cox &lt;alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson &lt;dougthompson@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
