summaryrefslogtreecommitdiffstats
path: root/drivers/md/raid1.c
AgeCommit message (Collapse)AuthorFilesLines
2012-07-19md/raid1: close some possible races on write errors during resyncNeilBrown1-2/+8
commit 4367af556133723d0f443e14ca8170d9447317cb md/raid1: clear bad-block record when write succeeds. Added a 'reschedule_retry' call possibility at the end of end_sync_write, but didn't add matching code at the end of sync_request_write. So if the writes complete very quickly, or scheduling makes it seem that way, then we can miss rescheduling the request and the resync could hang. Also commit 73d5c38a9536142e062c35997b044e89166e063b md: avoid races when stopping resync. Fix a race condition in this same code in end_sync_write but didn't make the change in sync_request_write. This patch updates sync_request_write to fix both of those. Patch is suitable for 3.1 and later kernels. Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com> Original-version-by: Alexander Lyakas <alex.bolshoy@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-09md/raid1: fix use-after-free bug in RAID1 data-check code.NeilBrown1-1/+2
This bug has been present ever since data-check was introduce in 2.6.16. However it would only fire if a data-check were done on a degraded array, which was only possible if the array has 3 or more devices. This is certainly possible, but is quite uncommon. Since hot-replace was added in 3.3 it can happen more often as the same condition can arise if not all possible replacements are present. The problem is that as soon as we submit the last read request, the 'r1_bio' structure could be freed at any time, so we really should stop looking at it. If the last device is being read from we will stop looking at it. However if the last device is not due to be read from, we will still check the bio pointer in the r1_bio, but the r1_bio might already be free. So use the read_targets counter to make sure we stop looking for bios to submit as soon as we have submitted them all. This fix is suitable for any -stable kernel since 2.6.16. Cc: stable@vger.kernel.org Reported-by: Arnold Schulz <arnysch@gmx.net> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md: fix up plugging (again).NeilBrown1-5/+2
The value returned by "mddev_check_plug" is only valid until the next 'schedule' as that will unplug things. This could happen at any call to mempool_alloc. So just calling mddev_check_plug at the start doesn't really make sense. So call it just before, or just after, queuing things for the thread. As the action that happens at unplug is to wake the thread, this makes lots of sense. If we cannot add a plug (which requires a small GFP_ATOMIC alloc) we wake thread immediately. RAID5 is a bit different. Requests are queued for the thread and the thread is woken by release_stripe. So we don't need to wake the thread on failure. However the thread doesn't perform certain actions when there is any active plug, so it is important to install a plug before waking the thread. So for RAID5 we install the plug *before* queuing the request and waking the thread. Without this patch it is possible for raid1 or raid10 to queue a request without then waking the thread, resulting in the array locking up. Also change raid10 to only flush_pending_write when there are not active plugs, just like raid1. This patch is suitable for 3.0 or later. I plan to submit it to -stable, but I'll like to let it spend a few weeks in mainline first to be sure it is completely safe. Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md/raid1: fix bug in read_balance introduced by hot-replaceNeilBrown1-2/+2
When we added hot_replace we doubled the number of devices that could be in a RAID1 array. So we doubled how far read_balance would search. Unfortunately we didn't double the point at which it looped back to the beginning - so it effectively loops over all non-replacement disks twice. This doesn't cause bad behaviour, but it pointless and means we never read from replacement devices. Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md: make 'name' arg to md_register_thread non-optional.NeilBrown1-1/+1
Having the 'name' arg optional and defaulting to the current personality name is no necessary and leads to errors, as when changing the level of an array we can end up using the name of the old level instead of the new one. So make it non-optional and always explicitly pass the name of the level that the array will be. Reported-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-05-31md: raid1/raid10: fix problem with merge_bvec_fnNeilBrown1-0/+4
The new merge_bvec_fn which calls the corresponding function in subsidiary devices requires that mddev->merge_check_needed be set if any child has a merge_bvec_fn. However were were only setting that when a device was hot-added, not when a device was present from the start. This bug was introduced in 3.4 so patch is suitable for 3.4.y kernels. However that are conflicts in raid10.c so a separate patch will be needed for 3.4.y. Cc: stable@vger.kernel.org Reported-by: Sebastian Riemer <sebastian.riemer@profitbricks.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-05-22MD RAID1: Further conditionalize 'fullsync'Jonathan Brassow1-1/+2
A RAID1 device does not necessarily need a fullsync if the bitmap can be used instead. Similar to commit d6b212f4b19da5301e6b6eca562e5c7a2a6e8c8d in raid5.c, if a raid1 device can be brought back (i.e. from a transient failure) it shouldn't need a complete resync. Provided the bitmap is not to old, it will have recorded the areas of the disk that need recovery. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-05-22md: allow array to be resized while bitmap is present.NeilBrown1-2/+9
Now that bitmaps can be resized, we can allow an array to be resized while the bitmap is present. This only covers resizing that involves changing the effective size of member devices, not resizing that changes the number of devices. Signed-off-by: NeilBrown <neilb@suse.de>
2012-05-22md/raid1: allow fix_read_error to read from recovering device.majianpeng1-1/+3
When attempting to fix a read error, it is acceptable to read from a device that is recovering, provided the recovery has got past the place we are reading from. This makes the test for "can we read from here" the same as the test in read_balance. Signed-off-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-05-21md: add possibility to change data-offset for devices.NeilBrown1-2/+2
When reshaping we can avoid costly intermediate backup by changing the 'start' address of the array on the device (if there is enough room). So as a first step, allow such a change to be requested through sysfs, and recorded in v1.x metadata. (As we didn't previous check that all 'pad' fields were zero, we need a new FEATURE flag for this. A (belatedly) check that all remaining 'pad' fields are zero to avoid a repeat of this) The new data offset must be requested separately for each device. This allows each to have a different change in the data offset. This is not likely to be used often but as data_offset can be set per-device, new_data_offset should be too. This patch also removes the 'acknowledged' arg to rdev_set_badblocks as it is never used and never will be. At the same time we add a new arg ('in_new') which is currently always zero but will be used more soon. When a reshape finishes we will need to update the data_offset and rdev->sectors. So provide an exported function to do that. Signed-off-by: NeilBrown <neilb@suse.de>
2012-04-12md/raid1,raid10: Fix calculation of 'vcnt' when processing error recovery.majianpeng1-1/+2
If r1bio->sectors % 8 != 0,then the memcmp and a later memcpy will omit the last bio_vec. This is suitable for any stable kernel since 3.1 when bad-block management was introduced. Cc: stable@vger.kernel.org Signed-off-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-04-03md/raid1,raid10: don't compare excess byte during consistency check.NeilBrown1-1/+1
When comparing two pages read from different legs of a mirror, only compare the bytes that were read, not the whole page. In most cases we read a whole page, but in some cases with bad blocks or odd sizes devices we might read fewer than that. This bug has been present "forever" but at worst it might cause a report of two many mismatches and generate a little bit extra resync IO, so there is no need to back-port to -stable kernels. Reported-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-04-03md/raid1:Remove unnecessary rcu_dereference(conf->mirrors[i].rdev).majianpeng1-2/+1
Because rde->nr_pending > 0,so can not remove this disk. And in any case, we aren't holding rcu_read_lock() Signed-off-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-04-02md/raid1: If md_integrity_register() failed,run() must free the memmajianpeng1-1/+7
Signed-off-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-19md/raid1: handle merge_bvec_fn in member devices.NeilBrown1-21/+56
Currently we don't honour merge_bvec_fn in member devices so if there is one, we force all requests to be single-page at most. This is not ideal. So create a raid1 merge_bvec_fn to check that function in children as well. This introduces a small problem. There is no locking around calls the ->merge_bvec_fn and subsequent calls to ->make_request. So a device added between these could end up getting a request which violates its merge_bvec_fn. Currently the best we can do is synchronize_sched(). This will work providing no preemption happens. If there is is preemption, we just have to hope that new devices are largely consistent with old devices. Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-19md: tidy up rdev_for_each usage.NeilBrown1-2/+2
md.h has an 'rdev_for_each()' macro for iterating the rdevs in an mddev. However it uses the 'safe' version of list_for_each_entry, and so requires the extra variable, but doesn't include 'safe' in the name, which is useful documentation. Consequently some places use this safe version without needing it, and many use an explicity list_for_each entry. So: - rename rdev_for_each to rdev_for_each_safe - create a new rdev_for_each which uses the plain list_for_each_entry, - use the 'safe' version only where needed, and convert all other list_for_each_entry calls to use rdev_for_each. Signed-off-by: NeilBrown <neilb@suse.de>
2012-03-19md/raid1,raid10: avoid deadlock during resync/recovery.NeilBrown1-2/+15
If RAID1 or RAID10 is used under LVM or some other stacking block device, it is possible to enter a deadlock during resync or recovery. This can happen if the upper level block device creates two requests to the RAID1 or RAID10. The first request gets processed, blocks recovery and queue requests for underlying requests in current->bio_list. A resync request then starts which will wait for those requests and block new IO. But then the second request to the RAID1/10 will be attempted and it cannot progress until the resync request completes, which cannot progress until the underlying device requests complete, which are on a queue behind that second request. So allow that second request to proceed even though there is a resync request about to start. This is suitable for any -stable kernel. Cc: stable@vger.kernel.org Reported-by: Ray Morris <support@bettercgi.com> Tested-by: Ray Morris <support@bettercgi.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-02-13md/raid1: fix buglet in md_raid1_contested.NeilBrown1-1/+1
Since we added 'replacement' capability, RAID1 can have twice as many devices as ->raid_disks indicates. So md_raid1_congested needs to check that many possible devices, not just ->raid_disks many. Signed-off-by: NeilBrown <neilb@suse.de>
2012-01-11md/raid1: perform bad-block tests for WriteMostly devices too.NeilBrown1-1/+10
We normally try to avoid reading from write-mostly devices, but when we do we really have to check for bad blocks and be sure not to try reading them. With the current code, best_good_sectors might not get set and that causes zero-length read requests to be send down which is very confusing. This bug was introduced in commit d2eb35acfdccbe2 and so the patch is suitable for 3.1.x and 3.2.x Reported-and-tested-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Reported-and-tested-by: Art -kwaak- van Breemen <ard@telegraafnet.nl> Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@vger.kernel.org
2011-12-23md/raid1: Mark device want_replacement when we see a write error.NeilBrown1-1/+15
Now that WantReplacement drives are replaced cleanly, mark a drive as want_replacement when we see a write error. It might get failed soon so the WantReplacement flag is irrelevant, but if the write error is recorded in the bad block log, we still want to activate any spare that might be available. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23md/raid1: If there is a spare and a want_replacement device, start replacement.NeilBrown1-2/+15
When attempting to add a spare to a RAID1 array, also consider adding it as a replacement for a want_replacement device. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23md/raid1: recognise replacements when assembling arrays.NeilBrown1-2/+23
If a Replacement is seen, file it as such. If we see two replacements (or two normal devices) for the one slot, abort. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23md/raid1: handle activation of replacement device when recovery completes.NeilBrown1-3/+33
When recovery completes ->spare_active is called. This checks if the replacement is ready and if so it fails the original. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23md/raid1: Allow a failed replacement device to be removed.NeilBrown1-0/+6
Replacement devices are stored at a different offset, so look there too. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23md/raid1: Allocate spare to store replacement devices and their bios.NeilBrown1-30/+34
In RAID1, a replacement is much like a normal device, so we just double the size of the relevant arrays and look at all possible devices for reads and writes. This means that the array looks like it is now double the size in some way - we need to be careful about that. In particular, we checking if the array is still degraded while creating a recovery request we need to only consider the first 'half' - i.e. the real (non-replacement) devices. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23md/raid1: Replace use of mddev->raid_disks with conf->raid_disks.NeilBrown1-3/+4
In general mddev->raid_disks can change unexpectedly while conf->raid_disks will only change in a very controlled way. So change some uses of one to the other. The use of mddev->raid_disks will not cause actually problems but this way is more consistent and safer in the long term. Signed-off-by: NeilBrown <neilb@suse.de>
2011-12-23md: change hot_remove_disk to take an rdev rather than a number.NeilBrown1-4/+3
Soon an array will be able to have multiple devices with the same raid_disk number (an original and a replacement). So removing a device based on the number won't work. So pass the actual device handle instead. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-11-06Merge branch 'modsplit-Oct31_2011' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h
2011-11-04Merge branch 'for-3.2/core' of git://git.kernel.dk/linux-blockLinus Torvalds1-6/+3
* 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits) block: don't call blk_drain_queue() if elevator is not up blk-throttle: use queue_is_locked() instead of lockdep_is_held() blk-throttle: Take blkcg->lock while traversing blkcg->policy_list blk-throttle: Free up policy node associated with deleted rule block: warn if tag is greater than real_max_depth. block: make gendisk hold a reference to its queue blk-flush: move the queue kick into blk-flush: fix invalid BUG_ON in blk_insert_flush block: Remove the control of complete cpu from bio. block: fix a typo in the blk-cgroup.h file block: initialize the bounce pool if high memory may be added later block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown block: drop @tsk from attempt_plug_merge() and explain sync rules block: make get_request[_wait]() fail if queue is dead block: reorganize throtl_get_tg() and blk_throtl_bio() block: reorganize queue draining block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg() block: pass around REQ_* flags instead of broken down booleans during request alloc/free block: move blk_throtl prototypes to block/blk.h block: fix genhd refcounting in blkio_policy_parse_and_set() ... Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion and making the request functions be of type "void" instead of "int" in - drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c} - drivers/staging/zram/zram_drv.c
2011-10-31md: Add module.h to all files using it implicitlyPaul Gortmaker1-0/+1
A pending cleanup will mean that module.h won't be implicitly everywhere anymore. Make sure the modular drivers in md dir are actually calling out for <module.h> explicitly in advance. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-26md: Fix some bugs in recovery_disabled handling.NeilBrown1-1/+3
In 3.0 we changed the way recovery_disabled was handle so that instead of testing against zero, we test an mddev-> value against a conf-> value. Two problems: 1/ one place in raid1 was missed and still sets to '1'. 2/ We didn't explicitly set the conf-> value at array creation time. It defaulted to '0' just like the mddev value does so they could appear equal and thus disable recovery. This did not affect normal 'md' as it calls bind_rdev_to_array which changes the mddev value. However the dmraid interface doesn't call this and so doesn't change ->recovery_disabled; so at array start all recovery is incorrectly disabled. So initialise the 'conf' value to one less that the mddev value, so the will only be the same when explicitly set that way. Reported-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-24block: Remove the control of complete cpu from bio.Tao Ma1-1/+0
bio originally has the functionality to set the complete cpu, but it is broken. Chirstoph said that "This code is unused, and from the all the discussions lately pretty obviously broken. The only thing keeping it serves is creating more confusion and possibly more bugs." And Jens replied with "We can kill bio_set_completion_cpu(). I'm fine with leaving cpu control to the request based drivers, they are the only ones that can toggle the setting anyway". So this patch tries to remove all the work of controling complete cpu from a bio. Cc: Shaohua Li <shaohua.li@intel.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-19Merge branch 'v3.1-rc10' into for-3.2/coreJens Axboe1-7/+10
Conflicts: block/blk-core.c include/linux/blkdev.h Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-11md: add proper write-congestion reporting to RAID1 and RAID10.NeilBrown1-0/+20
RAID1 and RAID10 handle write requests by queuing them for handling by a separate thread. This is because when a write-intent-bitmap is active we might need to update the bitmap first, so it is good to queue a lot of writes, then do one big bitmap update for them all. However writeback request devices to appear to be congested after a while so it can make some guesstimate of throughput. The infinite queue defeats that (note that RAID5 has already has a finite queue so it doesn't suffer from this problem). So impose a limit on the number of pending write requests. By default it is 1024 which seems to be generally suitable. Make it configurable via module option just in case someone finds a regression. Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11md: rename "mdk_personality" to "md_personality"NeilBrown1-1/+1
"mdk" doesn't mean anything any more. Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11md/raid1: typedef removal: conf_t -> struct r1confNeilBrown1-47/+47
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11md: remove typedefs: mirror_info_t -> struct mirror_infoNeilBrown1-5/+5
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11md: remove typedefs: r10bio_t -> struct r10bio and r1bio_t -> struct r1bioNeilBrown1-30/+30
Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11md: remove typedefs: mddev_t -> struct mddevNeilBrown1-26/+26
Having mddev_t and 'struct mddev_s' is ugly and not preferred Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-11md: removing typedefs: mdk_rdev_t -> struct md_rdevNeilBrown1-23/+23
The typedefs are just annoying. 'mdk' probably refers to 'md_k.h' which used to be an include file that defined this thing. Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07md: remove PRINTK and dprintk debugging and use pr_debugNeilBrown1-13/+11
Being able to dynamically enable these make them much more useful. Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07md/raid1/ avoid bio search in end_sync_read()NeilBrown1-2/+1
We know which device we just read from so we don't need to search the bios to find out. Just use ->read_disk. Signed-off-by: NeilBrown <neilb@suse.de>
2011-10-07md/raid1: factor out common bio handling codeNamhyung Kim1-20/+24
When normal-write and sync-read/write bio completes, we should find out the disk number the bio belongs to. Factor those common code out to a separate function. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2011-09-21md: Avoid waking up a thread after it has been freed.NeilBrown1-2/+1
Two related problems: 1/ some error paths call "md_unregister_thread(mddev->thread)" without subsequently clearing ->thread. A subsequent call to mddev_unlock will try to wake the thread, and crash. 2/ Most calls to md_wakeup_thread are protected against the thread disappeared either by: - holding the ->mutex - having an active request, so something else must be keeping the array active. However mddev_unlock calls md_wakeup_thread after dropping the mutex and without any certainty of an active request, so the ->thread could theoretically disappear. So we need a spinlock to provide some protections. So change md_unregister_thread to take a pointer to the thread pointer, and ensure that it always does the required locking, and clears the pointer properly. Reported-by: "Moshe Melnikov" <moshe@zadarastorage.com> Signed-off-by: NeilBrown <neilb@suse.de> cc: stable@kernel.org
2011-09-12block: remove support for bio remapping from ->make_requestChristoph Hellwig1-5/+3
There is very little benefit in allowing to let a ->make_request instance update the bios device and sector and loop around it in __generic_make_request when we can archive the same through calling generic_make_request from the driver and letting the loop in generic_make_request handle it. Note that various drivers got the return value from ->make_request and returned non-zero values for errors. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-09-10md/raid1,10: Remove use-after-free bug in make_request.NeilBrown1-5/+9
A single request to RAID1 or RAID10 might result in multiple requests if there are known bad blocks that need to be avoided. To detect if we need to submit another write request we test: if (sectors_handled < (bio->bi_size >> 9)) { However this is after we call **_write_done() so the 'bio' no longer belongs to us - the writes could have completed and the bio freed. So move the **_write_done call until after the test against bio->bi_size. This addresses https://bugzilla.kernel.org/show_bug.cgi?id=41862 Reported-by: Bruno Wolff III <bruno@wolff.to> Tested-by: Bruno Wolff III <bruno@wolff.to> Signed-off-by: NeilBrown <neilb@suse.de>
2011-07-28md/raid1: factor several functions out or raid1d()NeilBrown1-159/+151
raid1d is too big with several deep branches. So separate them out into their own functions. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-28md/raid1: improve handling of read failure during recovery.NeilBrown1-7/+34
If we cannot read a block from anywhere during recovery, there is now a better approach than just giving up. We can record a bad block on each device and keep going - being careful not to clear the bad block when a write succeeds as it might - it will be a write of incorrect data. We have now reached the state where - for raid1 - we only call md_error if md_set_badblocks has failed. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-28md/raid1: record badblocks found during resync etc.NeilBrown1-30/+51
If we find a bad block while writing as part of resync/recovery we need to report that back to raid1d which must record the bad block, or fail the device. Similarly when fixing a read error, a further error should just record a bad block if possible rather than failing the device. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com>
2011-07-28md/raid1: Handle write errors by updating badblock log.NeilBrown1-23/+145
When we get a write error (in the data area, not in metadata), update the badblock log rather than failing the whole device. As the write may well be many blocks, we trying writing each block individually and only log the ones which fail. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com>