diff options
author | Jan Kara <jack@suse.cz> | 2013-11-12 15:07:51 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2013-11-13 12:09:07 +0900 |
commit | c4a391b53a72d2df4ee97f96f78c1d5971b47489 (patch) | |
tree | edaedbc54345e62211d36273caea8fa9f60d8916 /include | |
parent | 46c77e2bb07eba3b38edfec76873f12942c49dd3 (diff) | |
download | linux-c4a391b53a72d2df4ee97f96f78c1d5971b47489.tar.bz2 |
writeback: do not sync data dirtied after sync start
When there are processes heavily creating small files while sync(2) is
running, it can easily happen that quite some new files are created
between WB_SYNC_NONE and WB_SYNC_ALL pass of sync(2). That can happen
especially if there are several busy filesystems (remember that sync
traverses filesystems sequentially and waits in WB_SYNC_ALL phase on one
fs before starting it on another fs). Because WB_SYNC_ALL pass is slow
(e.g. causes a transaction commit and cache flush for each inode in
ext3), resulting sync(2) times are rather large.
The following script reproduces the problem:
function run_writers
{
for (( i = 0; i < 10; i++ )); do
mkdir $1/dir$i
for (( j = 0; j < 40000; j++ )); do
dd if=/dev/zero of=$1/dir$i/$j bs=4k count=4 &>/dev/null
done &
done
}
for dir in "$@"; do
run_writers $dir
done
sleep 40
time sync
Fix the problem by disregarding inodes dirtied after sync(2) was called
in the WB_SYNC_ALL pass. To allow for this, sync_inodes_sb() now takes
a time stamp when sync has started which is used for setting up work for
flusher threads.
To give some numbers, when above script is run on two ext4 filesystems
on simple SATA drive, the average sync time from 10 runs is 267.549
seconds with standard deviation 104.799426. With the patched kernel,
the average sync time from 10 runs is 2.995 seconds with standard
deviation 0.096.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'include')
-rw-r--r-- | include/linux/writeback.h | 2 | ||||
-rw-r--r-- | include/trace/events/writeback.h | 6 |
2 files changed, 4 insertions, 4 deletions
diff --git a/include/linux/writeback.h b/include/linux/writeback.h index 021b8a319b9e..fc0e4320aa6d 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -97,7 +97,7 @@ void writeback_inodes_sb_nr(struct super_block *, unsigned long nr, int try_to_writeback_inodes_sb(struct super_block *, enum wb_reason reason); int try_to_writeback_inodes_sb_nr(struct super_block *, unsigned long nr, enum wb_reason reason); -void sync_inodes_sb(struct super_block *); +void sync_inodes_sb(struct super_block *sb, unsigned long older_than_this); void wakeup_flusher_threads(long nr_pages, enum wb_reason reason); void inode_wait_for_writeback(struct inode *inode); diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h index 464ea82e10db..c7bbbe794e65 100644 --- a/include/trace/events/writeback.h +++ b/include/trace/events/writeback.h @@ -287,11 +287,11 @@ TRACE_EVENT(writeback_queue_io, __field(int, reason) ), TP_fast_assign( - unsigned long *older_than_this = work->older_than_this; + unsigned long older_than_this = work->older_than_this; strncpy(__entry->name, dev_name(wb->bdi->dev), 32); - __entry->older = older_than_this ? *older_than_this : 0; + __entry->older = older_than_this; __entry->age = older_than_this ? - (jiffies - *older_than_this) * 1000 / HZ : -1; + (jiffies - older_than_this) * 1000 / HZ : -1; __entry->moved = moved; __entry->reason = work->reason; ), |