summaryrefslogtreecommitdiffstats
path: root/fs/f2fs
AgeCommit message (Collapse)AuthorFilesLines
2017-05-03f2fs: release cp and dnode lock before IPUHou Pengyang4-16/+27
We don't need to rewrite the page under cp_rwsem and dnode locks. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02f2fs: shrink size of struct discard_cmdChao Yu1-1/+1
In order to shrink size of struct discard_cmd, change variable type of @state in struct discard_cmd from int to unsigned char. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02f2fs: don't hold cmd_lock during waiting discard commandChao Yu2-5/+21
Previously, with protection of cmd_lock, we will wait for end io of discard command which potentially may lead long latency, making worse concurrency. So, in this patch, we try to add reference into discard entry to prevent the entry being released by other thread, then we can avoid holding global cmd_lock during waiting discard to finish. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02f2fs: nullify fio->encrypted_page for each writesJaegeuk Kim1-1/+1
This makes sure each write request has nullified encrypted_page pointer. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02f2fs: sanity check segment countJin Qian1-0/+7
F2FS uses 4 bytes to represent block address. As a result, supported size of disk is 16 TB and it equals to 16 * 1024 * 1024 / 2 segments. Signed-off-by: Jin Qian <jinqian@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02f2fs: introduce valid_ipu_blkaddr to clean upJaegeuk Kim1-5/+12
This patch introduces valid_ipu_blkaddr to clean up checking block address for inplace-update. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02f2fs: lookup extent cache first under IPU scenarioHou Pengyang3-2/+16
If a page is cold, NOT atomit written and need_ipu now, there is a high probability that IPU should be adapted. For IPU, we try to check extent tree to get the block index first, instead of reading the dnode page, where may lead to an useless dnode IO, since no need to update the dnode index for IPU. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02f2fs: reconstruct code to write a data pageHou Pengyang3-37/+54
This patch introduces encrypt_one_page which encrypts one data page before submit_bio, and change the use of need_inplace_update. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02f2fs: introduce __wait_discard_cmdChao Yu1-22/+18
Just cleanup, no logic change. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-05-02f2fs: introduce __issue_discard_cmdChao Yu1-33/+30
Just cleanup, no logic change. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-25f2fs: enable small discard by defaultChao Yu2-3/+3
This patch start to enable 4K granularity small discard by default when realtime discard is on, so, in seriously fragmented space, small size discard can be issued in time to avoid useless storage space occupying of invalid filesystem's data, then performance of flash storage can be recovered. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-25f2fs: delay awaking discard threadChao Yu1-1/+2
It's better to delay awaking discard thread while queuing discard commands in checkpoint, it will help to give more chances for merging big and small discard. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-25f2fs: seperate read nat page from nat_tree_lockYunlei He1-3/+8
This patch seperate nat page read io from nat_tree_lock. -lock_page -get_node_info() -current_nat_addr ...... -> write_checkpoint -get_meta_page Because we lock node page, we can make sure no other threads modify this nid concurrently. So we just obtain current_nat_addr under nat_tree_lock, node info is always same in both nat pack. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-25f2fs: fix multiple f2fs_add_link() having same name for inline dentrySheng Yong1-7/+6
Commit 88c5c13a5027 (f2fs: fix multiple f2fs_add_link() calls having same name) does not cover the scenario where inline dentry is enabled. In that case, F2FS_I(dir)->task will be NULL, and __f2fs_add_link will lookup dentries one more time. This patch fixes it by moving the assigment of current task to a upper level to cover both normal and inline dentry. Cc: <stable@vger.kernel.org> Fixes: 88c5c13a5027 (f2fs: fix multiple f2fs_add_link() calls having same name) Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-24f2fs: skip encrypted inode in ASYNC IPU policyHou Pengyang1-1/+2
Async request may be throttled in block layer, so page for async may keep WRITE_BACK for a long time. For encrytped inode, we need wait on page writeback no matter if the device supports BDI_CAP_STABLE_WRITES. This may result in a higher waiting page writeback time for async encrypted inode page. This patch skips IPU for encrypted inode's updating write. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-24f2fs: fix out-of free segmentsJaegeuk Kim3-7/+25
This patch also reverts d0db7703ac1 ("f2fs: do SSR in higher priority"). This patch fixes out of free segments caused by many small file creation by 1) mkfs -s 1 2G 2) mount 3) untar - preoduce 60000 small files burstly 4) sync - flush node pages - flush imeta Here, when we do f2fs_balance_fs, we missed # of imeta blocks, resulting in skipping to check has_not_enough_free_secs. Another test is done by 1) mkfs -s 12 2G 2) mount 3) untar - preoduce 60000 small files burstly 4) sync - flush node pages - flush imeta In this case, this patch also fixes wrong block allocation under large section size. Reported-by: William Brana <wbrana@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-24f2fs: improve definition of statistic macrosArnd Bergmann1-29/+29
With a recent addition of f2fs_lookup_extent_tree(), we get a warning about the use of empty macros: fs/f2fs/extent_cache.c: In function 'f2fs_lookup_extent_tree': fs/f2fs/extent_cache.c:358:32: error: suggest braces around empty body in an 'else' statement [-Werror=empty-body] stat_inc_rbtree_node_hit(sbi); A good way to avoid the warning and make the code more robust is to define all no-op macros as 'do { } while (0)'. Fixes: 54c2258cd63a ("f2fs: extract rb-tree operation infrastructure") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reivewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-24f2fs: assign allocation hint for warm/cold dataJaegeuk Kim1-0/+5
This patch gives slower device region to warm/cold data area more eagerly. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-24f2fs: fix _IOW usageJaegeuk Kim1-2/+3
This patch fixes wrong _IOW usage. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-24f2fs: add ioctl to flush data from faster device to cold areaJaegeuk Kim5-23/+120
This patch adds an ioctl to flush data in faster device to cold area. User can give device number and number of segments to move. It doesn't move it if there is only one device. The parameter looks like: struct f2fs_flush_device { u32 dev_num; /* device number to flush */ u32 segments; /* # of segments to flush */ }; Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-19f2fs: introduce async IPU policyHou Pengyang3-3/+13
This patch introduces an ASYNC IPU policy. Under senario of large # of async updating(e.g. log writing in Android), disk would be seriously fragmented, and higher frequent gc would be triggered. This patch uses IPU to rewrite the async update writting, since async is NOT sensitive to io latency. Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
2017-04-19f2fs: add undiscard blocks statChao Yu3-2/+14
This patch adds to account undiscard blocks. Signed-off-by: Chao Yu <yuchao0@huawei.com>
2017-04-19f2fs: unlock cp_rwsem early for IPU writesChao Yu2-1/+6
For IPU writes, there won't be any udpates in dnode page since we will reuse old block address instead of allocating new one, so we don't need to lock cp_rwsem during IPU IO submitting. Signed-off-by: Chao Yu <yuchao0@huawei.com>
2017-04-19f2fs: introduce __check_rb_tree_consistenceChao Yu3-2/+47
Introduce __check_rb_tree_consistence to check consistence of rb-tree based discard cache in runtime. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-19f2fs: trace __submit_discard_cmdChao Yu1-1/+3
Add an even class f2fs_discard for introducing f2fs_queue_discard, then use f2fs_{queue,issue}_discard to trace __{queue,submit}_discard_cmd. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-19f2fs: in prior to issue big discardChao Yu2-16/+45
Keep issuing big size discard in prior instead of the one with random size, so that we expect that it will help to: - be quick to recycle unused large space in flash storage device. - give a chance for a) wait to merge small piece discards into bigger one, or b) avoid issuing discards while they have being reallocated by SSR. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-19f2fs: clean up discard_cmd_control structureChao Yu2-16/+16
Avoid long variable name in discard_cmd_control structure, no logic change. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-19f2fs: use rb-tree to track pending discard commandsChao Yu3-50/+236
Introduce rb-tree based discard cache infrastructure to speed up lookup and merge operation of discard entry. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: initialize dc to avoid build warning] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-18f2fs: avoid dirty node pages in check_only recoveryJaegeuk Kim1-3/+5
In the check_only mode, we should not make any dirty node pages. Otherwise, we can get this panic: F2FS-fs (nvme0n1p1): Need to recover fsync data ------------[ cut here ]------------ kernel BUG at fs/f2fs/node.c:2204! CPU: 7 PID: 19923 Comm: mount Tainted: G OE 4.9.8 #2 RIP: 0010:[<ffffffffc0979c0b>] [<ffffffffc0979c0b>] flush_nat_entries+0x43b/0x7d0 [f2fs] Call Trace: [<ffffffffc096ddaa>] ? __f2fs_submit_merged_bio+0x5a/0xd0 [f2fs] [<ffffffffc096ddaa>] ? __f2fs_submit_merged_bio+0x5a/0xd0 [f2fs] [<ffffffffc096dddb>] ? __f2fs_submit_merged_bio+0x8b/0xd0 [f2fs] [<ffffffff860e450f>] ? up_write+0x1f/0x40 [<ffffffffc096dddb>] ? __f2fs_submit_merged_bio+0x8b/0xd0 [f2fs] [<ffffffffc0969f04>] write_checkpoint+0x2f4/0xf20 [f2fs] [<ffffffff860e938d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffffc0960bc9>] ? f2fs_sync_fs+0x79/0x190 [f2fs] [<ffffffffc0960bc9>] ? f2fs_sync_fs+0x79/0x190 [f2fs] [<ffffffffc0960bd5>] f2fs_sync_fs+0x85/0x190 [f2fs] [<ffffffffc097b6de>] f2fs_balance_fs_bg+0x7e/0x1c0 [f2fs] [<ffffffffc0977b64>] f2fs_write_node_pages+0x34/0x350 [f2fs] [<ffffffff860e5f42>] ? __lock_is_held+0x52/0x70 [<ffffffff861d9b31>] do_writepages+0x21/0x30 [<ffffffff86298ce1>] __writeback_single_inode+0x61/0x760 [<ffffffff86909127>] ? _raw_spin_unlock+0x27/0x40 [<ffffffff8629a735>] writeback_single_inode+0xd5/0x190 [<ffffffff8629a889>] write_inode_now+0x99/0xc0 [<ffffffff86283876>] iput+0x1f6/0x2c0 [<ffffffffc0964b52>] f2fs_fill_super+0xc32/0x10c0 [f2fs] [<ffffffff86266462>] mount_bdev+0x182/0x1b0 [<ffffffffc0963f20>] ? f2fs_commit_super+0x100/0x100 [f2fs] [<ffffffffc0960da5>] f2fs_mount+0x15/0x20 [f2fs] [<ffffffff86266e08>] mount_fs+0x38/0x170 [<ffffffff86288bab>] vfs_kern_mount+0x6b/0x160 [<ffffffff8628bcfe>] do_mount+0x1be/0xd60 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-12f2fs: fix not to set fsync/dentry markJaegeuk Kim1-0/+3
Otherwise, we can see stale fsync/dentry mark given by previous calls, resulting in giving up roll-forward recovery due to wrong dentry mark. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-12f2fs: allocate hot_data for atomic writesJaegeuk Kim1-0/+1
We'd better allocate atomic writes to hot_data zone. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-12f2fs: give time to flush dirty pages for checkpointJaegeuk Kim1-0/+3
If all the threads are waiting for checkpoint, we have no chance to flush required dirty pages. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-12f2fs: fix fs corruption due to zero inode pageJaegeuk Kim2-11/+11
This patch fixes the following scenario. - f2fs_create/f2fs_mkdir - write_checkpoint - f2fs_mark_inode_dirty_sync - block_operations - f2fs_lock_all - f2fs_sync_inode_meta - f2fs_unlock_all - sync_inode_metadata - f2fs_lock_op - f2fs_write_inode - update_inode_page - get_node_page return -ENOENT - new_inode_page - fill_node_footer - f2fs_mark_inode_dirty_sync - ... - f2fs_unlock_op - f2fs_inode_synced - f2fs_lock_all - do_checkpoint In this checkpoint, we can get an inode page which contains zeros having valid node footer only. Cc: <stable@vger.kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-12f2fs: shrink blk plug regionChao Yu1-4/+2
Don't use blk plug covering area where there won't be any IOs being issued. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-11f2fs: extract rb-tree operation infrastructureChao Yu2-132/+179
rb-tree lookup/update functions are deeply coupled into extent cache codes, it's very hard to reuse these basic functions, this patch extracts common rb-tree operation infrastructure for latter reusing. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-11f2fs: avoid frequent checkpoint during f2fs_gcJaegeuk Kim1-3/+5
Now we're doing SSR aggressively more than ever before, so once we reach to the reserved_segment, f2fs_balance_fs will call f2fs_gc, which triggers checkpoint everytime. We actually must avoid that. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-10f2fs: clean up some macros in terms of GET_SEGNOJaegeuk Kim6-42/+45
This patch cleans several macros by introducing: - BLKS_PER_SEC - GET_SEC_FROM_SEG - GET_SEG_FROM_SEC - GET_ZONE_FROM_SEC - GET_ZONE_FROM_SEG Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-10f2fs: clean up get_valid_blocks with consistent parameterJaegeuk Kim5-15/+15
This patch cleans up get_valid_blocks, which has no functional change. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-10f2fs: use segment number for get_valid_blocksJaegeuk Kim1-2/+4
This patch fixes to submit a segment number for get_valid_blocks. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-10f2fs: guard macro variables with bracesTomohiro Kusumi5-70/+70
Add braces around variables used within macros for those make sense to do it. Many of the macros in f2fs already do this. What this commit doesn't do is anything that changes line# as a result of adding braces, which usually affects the binary via __LINE__. Confirmed no diff in fs/f2fs/f2fs.ko before/after this commit on x86_64, to make sure this has no functional change as well as there's been no unexpected side effect due to callers' arithmetics within the existing code. Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-10f2fs: fix comment on f2fs_flush_merged_bios() after 86531d6bTomohiro Kusumi1-1/+1
Callers are to unlock the page on failure after 86531d6b. Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-10f2fs: prevent waiter encountering incorrect discard statesChao Yu1-4/+5
In f2fs_submit_discard_endio, we will wake up waiter before setting discard command states, so waiter may use incorrect states. Change the order between complete() and states setting to fix this issue. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-10f2fs: introduce f2fs_wait_discard_biosChao Yu3-17/+24
Split f2fs_wait_discard_bios from f2fs_wait_discard_bio, just for cleanup, no logic change. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-10f2fs: split discard_cmd_listChao Yu2-18/+32
Split discard_cmd_list to discard_{pend,wait}_list, so while sending/waiting discard command, we can avoid traversing unneeded entries in original list. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-10Revert "f2fs: put allocate_segment after refresh_sit_entry"Jaegeuk Kim1-5/+4
This reverts commit 3436c4bdb30de421d46f58c9174669fbcfd40ce0. This makes a leak to register dirty segments. I reproduced the issue by modified postmark which injects a lot of file create/delete/update and finally triggers huge number of SSR allocations. Cc: <stable@vger.kernel.org> # v4.10+ [Jaegeuk Kim: Change missing incorrect comment] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-05f2fs: split make_dentry_ptr() into block and inline versionsTomohiro Kusumi3-26/+24
Since callers statically know which type to use, make_dentry_ptr() can simply be splitted into two inline functions. This way, the code has less inlined, fewer arguments, and no cast. Signed-off-by: Tomohiro Kusumi <tkusumi@tuxera.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-05f2fs: submit bio of in-place-update pagesJaegeuk Kim3-4/+7
This patch tries to split in-place-update bios from sequential bios. Suggested-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-05f2fs: remove the redundant variable definitionKaixu Xia1-1/+0
The variable 'i' has been defined before, so here we can use it directly. Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-05f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONEJaegeuk Kim3-2/+18
If two threads try to flush dirty pages in different inodes respectively, f2fs_write_data_pages() will produce WRITE and WRITE_SYNC one at a time, resulting in a lot of 4KB seperated IOs. So, this patch gives higher priority to WB_SYNC_ALL IOs and gathers write IOs with a big WRITE_SYNC'ed bio. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-04-05f2fs: write small sized IO to hot logJaegeuk Kim6-7/+21
It would better split small and large IOs separately in order to get more consecutive big writes. The default threshold is set to 64KB, but configurable by sysfs/min_hot_blocks. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>