summaryrefslogtreecommitdiffstats
path: root/fs/f2fs/file.c
AgeCommit message (Collapse)AuthorFilesLines
2019-08-23f2fs: remove duplicate code in f2fs_file_write_iterLihong Kou1-5/+0
We will do the same check in generic_write_checks. if (iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT) return -EINVAL; just remove the same check in f2fs_file_write_iter. Signed-off-by: Lihong Kou <koulihong@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-08-23f2fs: fix to migrate blocks correctly during defragmentChao Yu1-4/+6
During defragment, we missed to trigger fragmented blocks migration for below condition: In defragment region: - total number of valid blocks is smaller than 512; - the tail part of the region are all holes; In addtion, return zero to user via range->len if there is no fragmented blocks. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-08-23f2fs: fix to use more generic EOPNOTSUPPChao Yu1-1/+1
EOPNOTSUPP is widely used as error number indicating operation is not supported in syscall, and ENOTSUPP was defined and only used for NFSv3 protocol, so use EOPNOTSUPP instead. Fixes: 0a2aa8fbb969 ("f2fs: refactor __exchange_data_block for speed up") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-08-23f2fs: Support case-insensitive file name lookupsDaniel Rosenberg1-2/+12
Modeled after commit b886ee3e778e ("ext4: Support case-insensitive file name lookups") """ This patch implements the actual support for case-insensitive file name lookups in f2fs, based on the feature bit and the encoding stored in the superblock. A filesystem that has the casefold feature set is able to configure directories with the +F (F2FS_CASEFOLD_FL) attribute, enabling lookups to succeed in that directory in a case-insensitive fashion, i.e: match a directory entry even if the name used by userspace is not a byte per byte match with the disk name, but is an equivalent case-insensitive version of the Unicode string. This operation is called a case-insensitive file name lookup. The feature is configured as an inode attribute applied to directories and inherited by its children. This attribute can only be enabled on empty directories for filesystems that support the encoding feature, thus preventing collision of file names that only differ by case. * dcache handling: For a +F directory, F2Fs only stores the first equivalent name dentry used in the dcache. This is done to prevent unintentional duplication of dentries in the dcache, while also allowing the VFS code to quickly find the right entry in the cache despite which equivalent string was used in a previous lookup, without having to resort to ->lookup(). d_hash() of casefolded directories is implemented as the hash of the casefolded string, such that we always have a well-known bucket for all the equivalencies of the same string. d_compare() uses the utf8_strncasecmp() infrastructure, which handles the comparison of equivalent, same case, names as well. For now, negative lookups are not inserted in the dcache, since they would need to be invalidated anyway, because we can't trust missing file dentries. This is bad for performance but requires some leveraging of the vfs layer to fix. We can live without that for now, and so does everyone else. * on-disk data: Despite using a specific version of the name as the internal representation within the dcache, the name stored and fetched from the disk is a byte-per-byte match with what the user requested, making this implementation 'name-preserving'. i.e. no actual information is lost when writing to storage. DX is supported by modifying the hashes used in +F directories to make them case/encoding-aware. The new disk hashes are calculated as the hash of the full casefolded string, instead of the string directly. This allows us to efficiently search for file names in the htree without requiring the user to provide an exact name. * Dealing with invalid sequences: By default, when a invalid UTF-8 sequence is identified, ext4 will treat it as an opaque byte sequence, ignoring the encoding and reverting to the old behavior for that unique file. This means that case-insensitive file name lookup will not work only for that file. An optional bit can be set in the superblock telling the filesystem code and userspace tools to enforce the encoding. When that optional bit is set, any attempt to create a file name using an invalid UTF-8 sequence will fail and return an error to userspace. * Normalization algorithm: The UTF-8 algorithms used to compare strings in f2fs is implemented in fs/unicode, and is based on a previous version developed by SGI. It implements the Canonical decomposition (NFD) algorithm described by the Unicode specification 12.1, or higher, combined with the elimination of ignorable code points (NFDi) and full case-folding (CF) as documented in fs/unicode/utf8_norm.c. NFD seems to be the best normalization method for F2FS because: - It has a lower cost than NFC/NFKC (which requires decomposing to NFD as an intermediary step) - It doesn't eliminate important semantic meaning like compatibility decompositions. Although: - This implementation is not completely linguistic accurate, because different languages have conflicting rules, which would require the specialization of the filesystem to a given locale, which brings all sorts of problems for removable media and for users who use more than one language. """ Signed-off-by: Daniel Rosenberg <drosen@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-08-23f2fs: disallow direct IO in atomic writeChao Yu1-0/+3
Atomic write needs page cache to cache data of transaction, direct IO should never be allowed in atomic write, detect and deny it when open atomic write file. Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-08-23f2fs: fix to spread f2fs_is_checkpoint_ready()Chao Yu1-0/+11
We missed to call f2fs_is_checkpoint_ready() in several places, it may allow space allocation even when free space was exhausted during checkpoint is disabled, fix to add them. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-12f2fs: remove redundant check from f2fs_setflags_common()Eric Biggers1-8/+1
Now that f2fs_ioc_setflags() and f2fs_ioc_fssetxattr() call the VFS helper functions which check for permission to change the immutable and append-only flags, it's no longer needed to do this check in f2fs_setflags_common() too. So remove it. This is based on a patch from Darrick Wong, but reworked to apply after commit 360985573b55 ("f2fs: separate f2fs i_flags from fs_flags and ext4 i_flags"). Originally-from: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-12f2fs: use generic checking function for FS_IOC_FSSETXATTREric Biggers1-31/+14
Make the f2fs implementation of FS_IOC_FSSETXATTR use the new VFS helper function vfs_ioc_fssetxattr_check(), and remove the project quota check since it's now done by the helper function. This is based on a patch from Darrick Wong, but reworked to apply after commit 360985573b55 ("f2fs: separate f2fs i_flags from fs_flags and ext4 i_flags"). Originally-from: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-12f2fs: use generic checking and prep function for FS_IOC_SETFLAGSEric Biggers1-1/+8
Make the f2fs implementation of FS_IOC_SETFLAGS use the new VFS helper function vfs_ioc_setflags_prepare(). This is based on a patch from Darrick Wong, but reworked to apply after commit 360985573b55 ("f2fs: separate f2fs i_flags from fs_flags and ext4 i_flags"). Originally-from: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-10f2fs: allow all the users to pin a fileJaegeuk Kim1-3/+0
This patch allows users to pin files. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02f2fs: allocate blocks for pinned fileJaegeuk Kim1-1/+6
This patch allows fallocate to allocate physical blocks for pinned file. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02f2fs: use generic EFSBADCRC/EFSCORRUPTEDChao Yu1-1/+1
f2fs uses EFAULT as error number to indicate filesystem is corrupted all the time, but generic filesystems use EUCLEAN for such condition, we need to change to follow others. This patch adds two new macros as below to wrap more generic error code macros, and spread them in code. EFSBADCRC EBADMSG /* Bad CRC detected */ EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */ Reported-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Chao Yu <yuchao0@huawei.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02f2fs: Use DIV_ROUND_UP() instead of open-codingGeert Uytterhoeven1-3/+3
Replace the open-coded divisions with round-up by calls to the DIV_ROUND_UP() helper macro. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02f2fs: introduce f2fs_<level> macros to wrap f2fs_printk()Joe Perches1-13/+8
- Add and use f2fs_<level> macros - Convert f2fs_msg to f2fs_printk - Remove level from f2fs_printk and embed the level in the format - Coalesce formats and align multi-line arguments - Remove unnecessary duplicate extern f2fs_msg f2fs.h Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02f2fs: ioctl for removing a range from F2FSQiuyang Sun1-0/+24
This ioctl shrinks a given length (aligned to sections) from end of the main area. Any cursegs and valid blocks will be moved out before invalidating the range. This feature can be used for adjusting partition sizes online. History of the patch: Sahitya Tummala: - Add this ioctl for f2fs_compat_ioctl() as well. - Fix debugfs status to reflect the online resize changes. - Fix potential race between online resize path and allocate new data block path or gc path. Others: - Rename some identifiers. - Add some error handling branches. - Clear sbi->next_victim_seg[BG_GC/FG_GC] in shrinking range. - Implement this interface as ext4's, and change the parameter from shrunk bytes to new block count of F2FS. - During resizing, force to empty sit_journal and forbid adding new entries to it, in order to avoid invalid segno in journal after resize. - Reduce sbi->user_block_count before resize starts. - Commit the updated superblock first, and then update in-memory metadata only when the former succeeds. - Target block count must align to sections. - Write checkpoint before and after committing the new superblock, w/o CP_FSCK_FLAG respectively, so that the FS can be fixed by fsck even if resize fails after the new superblock is committed. - In free_segment_range(), reduce granularity of gc_mutex. - Add protection on curseg migration. - Add freeze_bdev() and thaw_bdev() for resize fs. - Remove CUR_MAIN_SECS and use MAIN_SECS directly for allocation. - Recover super_block and FS metadata when resize fails. - No need to clear CP_FSCK_FLAG in update_ckpt_flags(). - Clean up the sb and fs metadata update functions for resize_fs. Geert Uytterhoeven: - Use div_u64*() for 64-bit divisions Arnd Bergmann: - Not all architectures support get_user() with a 64-bit argument: ERROR: "__get_user_bad" [fs/f2fs/f2fs.ko] undefined! Use copy_from_user() here, this will always work. Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-06-21f2fs: separate f2fs i_flags from fs_flags and ext4 i_flagsEric Biggers1-80/+159
f2fs copied all the on-disk i_flags from ext4, and along with it the assumption that the on-disk i_flags are the same as the bits used by FS_IOC_GETFLAGS and FS_IOC_SETFLAGS. This is problematic because reserving an on-disk inode flag in either filesystem's i_flags or in these ioctls effectively reserves it in all the other places too. In fact, most of the "f2fs i_flags" are not used by f2fs at all. Fix this by separating f2fs's i_flags from the ioctl bits and ext4's i_flags. In the process, un-reserve all "f2fs i_flags" that aren't actually supported by f2fs. This included various flags that were not settable at all, as well as various flags that were settable by FS_IOC_SETFLAGS but didn't actually do anything. There's a slight chance we'll need to add some flag(s) back to FS_IOC_SETFLAGS in order to avoid breaking users who expect f2fs to accept some random flag(s). But hopefully such users don't exist. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: add tracepoint for f2fs_filemap_fault()Chao Yu1-0/+2
This patch adds tracepoint for f2fs_filemap_fault(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: introduce DATA_GENERIC_ENHANCEChao Yu1-3/+12
Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check whether @blkaddr locates in main area or not. That check is weak, since the block address in range of main area can point to the address which is not valid in segment info table, and we can not detect such condition, we may suffer worse corruption as system continues running. So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check which trigger SIT bitmap check rather than only range check. This patch did below changes as wel: - set SBI_NEED_FSCK in f2fs_is_valid_blkaddr(). - get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid. - introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check. - spread blkaddr check in: * f2fs_get_node_info() * __read_out_blkaddrs() * f2fs_submit_page_read() * ra_data_block() * do_recover_data() This patch can fix bug reported from bugzilla below: https://bugzilla.kernel.org/show_bug.cgi?id=203215 https://bugzilla.kernel.org/show_bug.cgi?id=203223 https://bugzilla.kernel.org/show_bug.cgi?id=203231 https://bugzilla.kernel.org/show_bug.cgi?id=203235 https://bugzilla.kernel.org/show_bug.cgi?id=203241 = Update by Jaegeuk Kim = DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths. But, xfstest/generic/446 compalins some generated kernel messages saying invalid bitmap was detected when reading a block. The reaons is, when we get the block addresses from extent_cache, there is no lock to synchronize it from truncating the blocks in parallel. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: remove redundant check in f2fs_file_write_iter()Chengguang Xu1-13/+9
We have already checked flag IOCB_DIRECT in the sanity check of flag IOCB_NOWAIT, so don't have to check it again here. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: allow address pointer number of dnode aligning to specified sizeChao Yu1-3/+4
This patch expands scalability of dnode layout, it allows address pointer number of dnode aligning to specified size (now, the size is one byte by default), and later the number can align to compress cluster size (1 << n bytes, n=[2,..)), it can avoid cluster acrossing two dnode, making design of compress meta layout simple. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-05-08f2fs: remove new blank line of f2fs kernel messageChao Yu1-2/+2
Just removing '\n' in f2fs_msg(, "\n") to avoid redundant new blank line. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-04-16f2fs: add tracepoint for f2fs_file_write_iter()Chao Yu1-9/+19
This patch adds tracepoint for f2fs_file_write_iter(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-04-05f2fs: Fix use of number of devicesDamien Le Moal1-1/+1
For a single device mount using a zoned block device, the zone information for the device is stored in the sbi->devs single entry array and sbi->s_ndevs is set to 1. This differs from a single device mount using a regular block device which does not allocate sbi->devs and sets sbi->s_ndevs to 0. However, sbi->s_devs == 0 condition is used throughout the code to differentiate a single device mount from a multi-device mount where sbi->s_ndevs is always larger than 1. This results in problems with single zoned block device volumes as these are treated as multi-device mounts but do not have the start_blk and end_blk information set. One of the problem observed is skipping of zone discard issuing resulting in write commands being issued to full zones or unaligned to a zone write pointer. Fix this problem by simply treating the cases sbi->s_ndevs == 0 (single regular block device mount) and sbi->s_ndevs == 1 (single zoned block device mount) in the same manner. This is done by introducing the helper function f2fs_is_multi_device() and using this helper in place of direct tests of sbi->s_ndevs value, improving code readability. Fixes: 7bb3a371d199 ("f2fs: Fix zoned block device support") Cc: <stable@vger.kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-03-15Merge tag 'f2fs-for-5.1' of ↵Linus Torvalds1-22/+24
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "We've continued mainly to fix bugs in this round, as f2fs has been shipped in more devices. Especially, we've focused on stabilizing checkpoint=disable feature, and provided some interfaces for QA. Enhancements: - expose FS_NOCOW_FL for pin_file - run discard jobs at unmount time with timeout - tune discarding thread to avoid idling which consumes power - some checking codes to address vulnerabilities - give random value to i_generation - shutdown with more flags for QA Bug fixes: - clean up stale objects when mount is failed along with checkpoint=disable - fix system being stuck due to wrong count by atomic writes - handle some corrupted disk cases - fix a deadlock in f2fs_read_inline_dir We've also added some minor build error fixes and clean-up patches" * tag 'f2fs-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (53 commits) f2fs: set pin_file under CAP_SYS_ADMIN f2fs: fix to avoid deadlock in f2fs_read_inline_dir() f2fs: fix to adapt small inline xattr space in __find_inline_xattr() f2fs: fix to do sanity check with inode.i_inline_xattr_size f2fs: give some messages for inline_xattr_size f2fs: don't trigger read IO for beyond EOF page f2fs: fix to add refcount once page is tagged PG_private f2fs: remove wrong comment in f2fs_invalidate_page() f2fs: fix to use kvfree instead of kzfree f2fs: print more parameters in trace_f2fs_map_blocks f2fs: trace f2fs_ioc_shutdown f2fs: fix to avoid deadlock of atomic file operations f2fs: fix to dirty inode for i_mode recovery f2fs: give random value to i_generation f2fs: no need to take page lock in readdir f2fs: fix to update iostat correctly in IPU path f2fs: fix encrypted page memory leak f2fs: make fault injection covering __submit_flush_wait() f2fs: fix to retry fill_super only if recovery failed f2fs: silence VM_WARN_ON_ONCE in mempool_alloc ...
2019-03-14f2fs: set pin_file under CAP_SYS_ADMINJaegeuk Kim1-2/+2
Android uses pin_file for uncrypt during OTA, and that should be managed by CAP_SYS_ADMIN only. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-03-12f2fs: trace f2fs_ioc_shutdownChao Yu1-0/+3
This patch supports to trace f2fs_ioc_shutdown. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-03-12f2fs: fix to dirty inode for i_mode recoveryChao Yu1-4/+1
As Seulbae Kim reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202637 We didn't recover permission field correctly after sudden power-cut, the reason is in setattr we didn't add inode into global dirty list once i_mode is changed, so latter checkpoint triggered by fsync will not flush last i_mode into disk, result in this problem, fix it. Reported-by: Seulbae Kim <seulbae@gatech.edu> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-03-05f2fs: fix potential data inconsistence of checkpointChao Yu1-8/+6
Previously, we changed lock from cp_rwsem to node_change, it solved the deadlock issue which was caused by below race condition: Thread A Thread B - f2fs_setattr - f2fs_lock_op -- read_lock - dquot_transfer - __dquot_transfer - dquot_acquire - commit_dqblk - f2fs_quota_write - f2fs_write_begin - f2fs_write_failed - write_checkpoint - block_operations - f2fs_lock_all -- write_lock - f2fs_truncate_blocks - f2fs_lock_op -- read_lock But it breaks the sematics of cp_rwsem, in other callers like: - f2fs_file_write_iter -> f2fs_write_begin -> f2fs_write_failed - f2fs_direct_IO -> f2fs_write_failed We allow to truncate dnode w/o cp_rwsem held, result in incorrect sit bitmap update, which can cause further data corruption. So this patch reverts previous fix implementation, and try to fix deadlock by skipping calling f2fs_truncate_blocks() in f2fs_write_failed() only for quota file, and keep the preallocated data/node in the tail of quota file, we can expecte that the preallocated space can be used to store quota info latter soon. Fixes: af033b2aa8a8 ("f2fs: guarantee journalled quota data by checkpoint") Signed-off-by: Gao Xiang <gaoxiang25@huawei.com> Signed-off-by: Sheng Yong <shengyong1@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-15f2fs: add quick mode of checkpoint=disable for QAJaegeuk Kim1-3/+3
This mode returns mount() quickly with EAGAIN. We can trigger this by shutdown(F2FS_GOING_DOWN_NEED_FSCK). Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-23f2fs: use IS_ENCRYPTED() to check encryption statusChandan Rajendra1-5/+5
This commit removes the f2fs specific f2fs_encrypted_inode() and makes use of the generic IS_ENCRYPTED() macro to check for the encryption status of an inode. Acked-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-01-08f2fs: export FS_NOCOW_FL flag to userJaegeuk Kim1-0/+2
This exports pin_file status to user. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-08f2fs: wait on atomic writes to count F2FS_CP_WB_DATAJaegeuk Kim1-5/+7
Otherwise, we can get wrong counts incurring checkpoint hang. IO_W (CP: -24, Data: 24, Flush: ( 0 0 1), Discard: ( 0 0)) Thread A Thread B - f2fs_write_data_pages - __write_data_page - f2fs_submit_page_write - inc_page_count(F2FS_WB_DATA) type is F2FS_WB_DATA due to file is non-atomic one - f2fs_ioc_start_atomic_write - set_inode_flag(FI_ATOMIC_FILE) - f2fs_write_end_io - dec_page_count(F2FS_WB_CP_DATA) type is F2FS_WB_DATA due to file becomes atomic one Cc: <stable@vger.kernel.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26f2fs: check PageWriteback flag for ordered caseChao Yu1-3/+3
For all ordered cases in f2fs_wait_on_page_writeback(), we need to check PageWriteback status, so let's clean up to relocate the check into f2fs_wait_on_page_writeback(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-26f2fs: fix to dirty inode synchronouslyChao Yu1-1/+1
If user change inode's i_flags via ioctl, let's add it into global dirty list, so that checkpoint can guarantee its persistence before fsync, it can make checkpoint keeping strong consistency. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-12-14f2fs: add an ioctl() to explicitly trigger fsck laterJaegeuk Kim1-0/+7
This adds an option in ioctl(F2FS_IOC_SHUTDOWN) in order to trigger fsck by setting a NEED_FSCK flag. Generally, shutdown is used for the test to validate filesystem consistency, and setting NEED_FSCK flag can be used for Android to trigger fsck.f2fs at boot time explicitly so that we could measure the elapsed time as well as force filesystem check. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: fix m_may_create to make OPU DIO write correctlyJia Zhu1-1/+3
Previously, we added a parameter @map.m_may_create to trigger OPU allocation and call f2fs_balance_fs() correctly. But in get_more_blocks(), @create has been overwritten by below code. So the function f2fs_map_blocks() will not allocate new block address but directly go out. Meanwile,there are several functions calling f2fs_map_blocks() directly and @map.m_may_create not initialized. CODE: create = dio->op == REQ_OP_WRITE; if (dio->flags & DIO_SKIP_HOLES) { if (fs_startblk <= ((i_size_read(dio->inode) - 1) >> i_blkbits)) create = 0; } This patch fixes it. Signed-off-by: Jia Zhu <zhujia13@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: fix out-place-update DIO writeChao Yu1-1/+2
In get_more_blocks(), we may override @create as below code: create = dio->op == REQ_OP_WRITE; if (dio->flags & DIO_SKIP_HOLES) { if (fs_startblk <= ((i_size_read(dio->inode) - 1) >> i_blkbits)) create = 0; } But in f2fs_map_blocks(), we only trigger f2fs_balance_fs() if @create is 1, so in LFS mode, dio overwrite under LFS mode can easily run out of free segments, result in below panic. Call Trace: allocate_segment_by_default+0xa8/0x270 [f2fs] f2fs_allocate_data_block+0x1ea/0x5c0 [f2fs] __allocate_data_block+0x306/0x480 [f2fs] f2fs_map_blocks+0x6f6/0x920 [f2fs] __get_data_block+0x4f/0xb0 [f2fs] get_data_block_dio_write+0x50/0x60 [f2fs] do_blockdev_direct_IO+0xcd5/0x21e0 __blockdev_direct_IO+0x3a/0x3c f2fs_direct_IO+0x1ff/0x4a0 [f2fs] generic_file_direct_write+0xd9/0x160 __generic_file_write_iter+0xbb/0x1e0 f2fs_file_write_iter+0xaf/0x220 [f2fs] __vfs_write+0xd0/0x130 vfs_write+0xb2/0x1b0 SyS_pwrite64+0x69/0xa0 ? vtime_user_exit+0x29/0x70 do_syscall_64+0x6e/0x160 entry_SYSCALL64_slow_path+0x25/0x25 RIP: new_curseg+0x36f/0x380 [f2fs] RSP: ffffac570393f7a8 So this patch introduces a parameter map.m_may_create to indicate that f2fs_map_blocks() is called from write or read path, which can give the right hint to let f2fs_map_blocks() trigger OPU allocation and call f2fs_balanc_fs() correctly. BTW, it disables physical address preallocation for direct IO in f2fs_preallocate_blocks, which is redundant to OPU allocation of f2fs_map_blocks. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: move dir data flush to write checkpoint processYunlei He1-0/+3
This patch move dir data flush to write checkpoint process, by doing this, it may reduce some time for dir fsync. pre: -f2fs_do_sync_file enter -file_write_and_wait_range <- flush & wait -write_checkpoint -do_checkpoint <- wait all -f2fs_do_sync_file exit now: -f2fs_do_sync_file enter -write_checkpoint -block_operations <- flush dir & no wait -do_checkpoint <- wait all -f2fs_do_sync_file exit Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: change segment to section in f2fs_ioc_gc_rangeYunlong Song1-1/+1
f2fs_ioc_gc_range skips blocks_per_seg each time, however, f2fs_gc moves blocks of section each time, so fix it from segment to section. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: introduce __is_large_section() for cleanupChao Yu1-1/+1
Introduce a wrapper __is_large_section() to clean up codes. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-11-26f2fs: clean up f2fs_sb_has_##feature_nameChao Yu1-7/+6
In F2FS_HAS_FEATURE(), we will use F2FS_SB(sb) to get sbi pointer to access .raw_super field, to avoid unneeded pointer conversion, this patch changes to F2FS_HAS_FEATURE() accept sbi parameter directly. Just do cleanup, no logic change. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22f2fs: fix to keep project quota consistentChao Yu1-9/+28
This patch does below changes to keep consistence of project quota data in sudden power-cut case: - update inode.i_projid and project quota atomically under lock_op() in f2fs_ioc_setproject() - recover inode.i_projid and project quota in recover_inode() Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22f2fs: guarantee journalled quota data by checkpointChao Yu1-7/+24
For journalled quota mode, let checkpoint to flush dquot dirty data and quota file data to guarntee persistence of all quota sysfile in last checkpoint, by this way, we can avoid corrupting quota sysfile when encountering SPO. The implementation is as below: 1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is cached dquot metadata changes in quota subsystem, and later checkpoint should: a) flush dquot metadata into quota file. b) flush quota file to storage to keep file usage be consistent. 2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota operation failed due to -EIO or -ENOSPC, so later, a) checkpoint will skip syncing dquot metadata. b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a hint for fsck repairing. 3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota data updating is very heavy, it may cause hungtask in block_operation(). To avoid this, if our retry time exceed threshold, let's just skip flushing and retry in next checkpoint(). Signed-off-by: Weichao Guo <guoweichao@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: avoid warnings and set fsck flag] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16f2fs: do not update REQ_TIME in case of error conditionsSahitya Tummala1-1/+1
The REQ_TIME should be updated only in case of success cases as followed at all other places in the file system. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16f2fs: checkpoint disablingDaniel Rosenberg1-1/+11
Note that, it requires "f2fs: return correct errno in f2fs_gc". This adds a lightweight non-persistent snapshotting scheme to f2fs. To use, mount with the option checkpoint=disable, and to return to normal operation, remount with checkpoint=enable. If the filesystem is shut down before remounting with checkpoint=enable, it will revert back to its apparent state when it was first mounted with checkpoint=disable. This is useful for situations where you wish to be able to roll back the state of the disk in case of some critical failure. Signed-off-by: Daniel Rosenberg <drosen@google.com> [Jaegeuk Kim: use SB_RDONLY instead of MS_RDONLY] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-30f2fs: allow out-place-update for direct IO in LFS modeChao Yu1-1/+2
Normally, DIO uses in-pllace-update, but in LFS mode, f2fs doesn't allow triggering any in-place-update writes, so we fallback direct write to buffered write, result in bad performance of large size write. This patch adds to support triggering out-place-update for direct IO to enhance its performance. Note that it needs to exclude direct read IO during direct write, since new data writing to new block address will no be valid until write finished. storage: zram time xfs_io -f -d /mnt/f2fs/file -c "pwrite 0 1073741824" -c "fsync" Before: real 0m13.061s user 0m0.327s sys 0m12.486s After: real 0m6.448s user 0m0.228s sys 0m6.212s Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-30f2fs: refactor ->page_mkwrite() flowChao Yu1-23/+22
Thread A Thread B - f2fs_vm_page_mkwrite - f2fs_setattr - down_write(i_mmap_sem) - truncate_setsize - f2fs_truncate - up_write(i_mmap_sem) - f2fs_reserve_block reserve NEW_ADDR - skip dirty page due to truncation 1. we don't need to rserve new block address for a truncated page. 2. dn.data_blkaddr is used out of node page lock coverage. Refactor ->page_mkwrite() flow to fix above issues: - use __do_map_lock() to avoid racing checkpoint() - lock data page in prior to dnode page - cover f2fs_reserve_block with i_mmap_sem lock - wait page writeback before zeroing page Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-12f2fs: add SPDX license identifiersChao Yu1-4/+1
Remove the verbose license text from f2fs files and replace them with SPDX tags. This does not change the license of any of the code. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-11f2fs: fix setattr project check upon fssetxattr ioctlWang Shilong1-23/+37
Currently, project quota could be changed by fssetxattr ioctl, and existed permission check inode_owner_or_capable() is obviously not enough, just think that common users could change project id of file, that could make users to break project quota easily. This patch try to follow same regular of xfs project quota: "Project Quota ID state is only allowed to change from within the init namespace. Enforce that restriction only if we are trying to change the quota ID state. Everything else is allowed in user namespaces." Besides that, check and set project id'state should be an atomic operation, protect whole operation with inode lock. Signed-off-by: Wang Shilong <wshilong@ddn.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-09-05f2fs: avoid wrong decrypted data from diskJaegeuk Kim1-2/+1
1. Create a file in an encrypted directory 2. Do GC & drop caches 3. Read stale data before its bio for metapage was not issued yet Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>