summaryrefslogtreecommitdiffstats
path: root/fs/f2fs/file.c
AgeCommit message (Collapse)AuthorFilesLines
2023-01-03f2fs: initialize extent_cache parameterJaegeuk Kim1-1/+1
This can avoid confusing tracepoint values. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-14Merge tag 'f2fs-for-6.2-rc1' of ↵Linus Torvalds1-14/+32
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've added two features: F2FS_IOC_START_ATOMIC_REPLACE and a per-block age-based extent cache. F2FS_IOC_START_ATOMIC_REPLACE is a variant of the previous atomic write feature which guarantees a per-file atomicity. It would be more efficient than AtomicFile implementation in Android framework. The per-block age-based extent cache implements another type of extent cache in memory which keeps the per-block age in a file, so that block allocator could split the hot and cold data blocks more accurately. Enhancements: - introduce F2FS_IOC_START_ATOMIC_REPLACE - refactor extent_cache to add a new per-block-age-based extent cache support - introduce discard_urgent_util, gc_mode, max_ordered_discard sysfs knobs - add proc entry to show discard_plist info - optimize iteration over sparse directories - add barrier mount option Bug fixes: - avoid victim selection from previous victim section - fix to enable compress for newly created file if extension matches - set zstd compress level correctly - initialize locks early in f2fs_fill_super() to fix bugs reported by syzbot - correct i_size change for atomic writes - allow to read node block after shutdown - allow to set compression for inlined file - fix gc mode when gc_urgent_high_remaining is 1 - should put a page when checking the summary info Minor fixes and various clean-ups in GC, discard, debugfs, sysfs, and doc" * tag 'f2fs-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (63 commits) f2fs: reset wait_ms to default if any of the victims have been selected f2fs: fix some format WARNING in debug.c and sysfs.c f2fs: don't call f2fs_issue_discard_timeout() when discard_cmd_cnt is 0 in f2fs_put_super() f2fs: fix iostat parameter for discard f2fs: Fix spelling mistake in label: free_bio_enrty_cache -> free_bio_entry_cache f2fs: add block_age-based extent cache f2fs: allocate the extent_cache by default f2fs: refactor extent_cache to support for read and more f2fs: remove unnecessary __init_extent_tree f2fs: move internal functions into extent_cache.c f2fs: specify extent cache for read explicitly f2fs: introduce f2fs_is_readonly() for readability f2fs: remove F2FS_SET_FEATURE() and F2FS_CLEAR_FEATURE() macro f2fs: do some cleanup for f2fs module init MAINTAINERS: Add f2fs bug tracker link f2fs: remove the unused flush argument to change_curseg f2fs: open code allocate_segment_by_default f2fs: remove struct segment_allocation default_salloc_ops f2fs: introduce discard_urgent_util sysfs node f2fs: define MIN_DISCARD_GRANULARITY macro ...
2022-12-12f2fs: add block_age-based extent cacheJaegeuk Kim1-0/+1
This patch introduces a runtime hot/cold data separation method for f2fs, in order to improve the accuracy for data temperature classification, reduce the garbage collection overhead after long-term data updates. Enhanced hot/cold data separation can record data block update frequency as "age" of the extent per inode, and take use of the age info to indicate better temperature type for data block allocation: - It records total data blocks allocated since mount; - When file extent has been updated, it calculate the count of data blocks allocated since last update as the age of the extent; - Before the data block allocated, it searches for the age info and chooses the suitable segment for allocation. Test and result: - Prepare: create about 30000 files * 3% for cold files (with cold file extension like .apk, from 3M to 10M) * 50% for warm files (with random file extension like .FcDxq, from 1K to 4M) * 47% for hot files (with hot file extension like .db, from 1K to 256K) - create(5%)/random update(90%)/delete(5%) the files * total write amount is about 70G * fsync will be called for .db files, and buffered write will be used for other files The storage of test device is large enough(128G) so that it will not switch to SSR mode during the test. Benefit: dirty segment count increment reduce about 14% - before: Dirty +21110 - after: Dirty +18286 Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com> Signed-off-by: xiongping1 <xiongping1@xiaomi.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-12f2fs: refactor extent_cache to support for read and moreJaegeuk Kim1-4/+4
This patch prepares extent_cache to be ready for addition. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-28f2fs: introduce F2FS_IOC_START_ATOMIC_REPLACEDaeho Jeong1-6/+15
introduce a new ioctl to replace the whole content of a file atomically, which means it induces truncate and content update at the same time. We can start it with F2FS_IOC_START_ATOMIC_REPLACE and complete it with F2FS_IOC_COMMIT_ATOMIC_WRITE. Or abort it with F2FS_IOC_ABORT_ATOMIC_WRITE. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-11f2fs: correct i_size change for atomic writesDaeho Jeong1-7/+11
We need to make sure i_size doesn't change until atomic write commit is successful and restore it when commit is failed. Signed-off-by: Daeho Jeong <daehojeong@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-01f2fs: Fix typo in commentsKeoseong Park1-1/+1
Change "truncateion" to "truncation". Signed-off-by: Keoseong Park <keosung.park@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-01f2fs: allow to set compression for inlined fileJaegeuk Kim1-0/+4
The below commit disallows to set compression on empty created file which has a inline_data. Let's fix it. Fixes: 7165841d578e ("f2fs: fix to check inline_data during compressed inode conversion") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-10-20fs: rename current get acl methodChristian Brauner1-1/+1
The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. The current inode operation for getting posix acls takes an inode argument but various filesystems (e.g., 9p, cifs, overlayfs) need access to the dentry. In contrast to the ->set_acl() inode operation we cannot simply extend ->get_acl() to take a dentry argument. The ->get_acl() inode operation is called from: acl_permission_check() -> check_acl() -> get_acl() which is part of generic_permission() which in turn is part of inode_permission(). Both generic_permission() and inode_permission() are called in the ->permission() handler of various filesystems (e.g., overlayfs). So simply passing a dentry argument to ->get_acl() would amount to also having to pass a dentry argument to ->permission(). We should avoid this unnecessary change. So instead of extending the existing inode operation rename it from ->get_acl() to ->get_inode_acl() and add a ->get_acl() method later that passes a dentry argument and which filesystems that need access to the dentry can implement instead of ->get_inode_acl(). Filesystems like cifs which allow setting and getting posix acls but not using them for permission checking during lookup can simply not implement ->get_inode_acl(). This is intended to be a non-functional change. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Suggested-by/Inspired-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-19fs: pass dentry to set acl methodChristian Brauner1-1/+1
The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. Since some filesystem rely on the dentry being available to them when setting posix acls (e.g., 9p and cifs) they cannot rely on set acl inode operation. But since ->set_acl() is required in order to use the generic posix acl xattr handlers filesystems that do not implement this inode operation cannot use the handler and need to implement their own dedicated posix acl handlers. Update the ->set_acl() inode method to take a dentry argument. This allows all filesystems to rely on ->set_acl(). As far as I can tell all codepaths can be switched to rely on the dentry instead of just the inode. Note that the original motivation for passing the dentry separate from the inode instead of just the dentry in the xattr handlers was because of security modules that call security_d_instantiate(). This hook is called during d_instantiate_new(), d_add(), __d_instantiate_anon(), and d_splice_alias() to initialize the inode's security context and possibly to set security.* xattrs. Since this only affects security.* xattrs this is completely irrelevant for posix acls. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-10Merge tag 'f2fs-for-6.1-rc1' of ↵Linus Torvalds1-21/+36
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This round looks fairly small comparing to the previous updates and includes mostly minor bug fixes. Nevertheless, as we've still interested in improving the stability, Chao added some debugging methods to diagnoze subtle runtime inconsistency problem. Enhancements: - store all the corruption or failure reasons in superblock - detect meta inode, summary info, and block address inconsistency - increase the limit for reserve_root for low-end devices - add the number of compressed IO in iostat Bug fixes: - DIO write fix for zoned devices - do out-of-place writes for cold files - fix some stat updates (FS_CP_DATA_IO, dirty page count) - fix race condition on setting FI_NO_EXTENT flag - fix data races when freezing super - fix wrong continue condition check in GC - do not allow ATGC for LFS mode In addition, there're some code enhancement and clean-ups as usual" * tag 'f2fs-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (32 commits) f2fs: change to use atomic_t type form sbi.atomic_files f2fs: account swapfile inodes f2fs: allow direct read for zoned device f2fs: support recording errors into superblock f2fs: support recording stop_checkpoint reason into super_block f2fs: remove the unnecessary check in f2fs_xattr_fiemap f2fs: introduce cp_status sysfs entry f2fs: fix to detect corrupted meta ino f2fs: fix to account FS_CP_DATA_IO correctly f2fs: code clean and fix a type error f2fs: add "c_len" into trace_f2fs_update_extent_tree_range for compressed file f2fs: fix to do sanity check on summary info f2fs: port to vfs{g,u}id_t and associated helpers f2fs: fix to do sanity check on destination blkaddr during recovery f2fs: let FI_OPU_WRITE override FADVISE_COLD_BIT f2fs: fix race condition on setting FI_NO_EXTENT flag f2fs: remove redundant check in f2fs_sanity_check_cluster f2fs: add static init_idisk_time function to reduce the code f2fs: fix typo f2fs: fix wrong dirty page count when race between mmap and fallocate. ...
2022-10-07f2fs: change to use atomic_t type form sbi.atomic_filesChao Yu1-3/+1
inode_lock[ATOMIC_FILE] was used for protecting sbi->atomic_files, update atomic_files variable's type to atomic_t instead of unsigned int, then inode_lock[ATOMIC_FILE] can be obsoleted. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-10-04f2fs: support recording errors into superblockChao Yu1-2/+10
This patch supports to record detail reason of FSCORRUPTED error into f2fs_super_block.s_errors[]. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-10-04f2fs: support recording stop_checkpoint reason into super_blockChao Yu1-5/+6
This patch supports to record stop_checkpoint error into f2fs_super_block.s_stop_reason[]. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-10-04f2fs: code clean and fix a type errorZhang Qilong1-1/+1
ERROR: code indent should use tabs where possible ERROR: spaces required around that ':' ERROR: incorrect tab Found serveral code type errors when review the code and fix it. There is no function change. Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-10-04f2fs: port to vfs{g,u}id_t and associated helpersChristian Brauner1-2/+3
A while ago we introduced a dedicated vfs{g,u}id_t type in commit 1e5267cd0895 ("mnt_idmapping: add vfs{g,u}id_t"). We already switched over a good part of the VFS. Ultimately we will remove all legacy idmapped mount helpers that operate only on k{g,u}id_t in favor of the new type safe helpers that operate on vfs{g,u}id_t. Cc: Seth Forshee (Digital Ocean) <sforshee@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: linux-f2fs-devel@lists.sourceforge.net Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-09-11f2fs: support STATX_DIOALIGNEric Biggers1-0/+18
Add support for STATX_DIOALIGN to f2fs, so that direct I/O alignment restrictions are exposed to userspace in a generic way. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Link: https://lore.kernel.org/r/20220827065851.135710-8-ebiggers@kernel.org
2022-09-11f2fs: simplify f2fs_force_buffered_io()Eric Biggers1-22/+5
f2fs only allows direct I/O that is aligned to the filesystem block size. Given that fact, simplify f2fs_force_buffered_io() by removing the redundant call to block_unaligned_IO(). This makes it easier to reuse this code for STATX_DIOALIGN. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Link: https://lore.kernel.org/r/20220827065851.135710-7-ebiggers@kernel.org
2022-09-11f2fs: move f2fs_force_buffered_io() into file.cEric Biggers1-0/+40
f2fs_force_buffered_io() is only used in file.c, so move it into there. No behavior change. This makes it easier to review later patches. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Link: https://lore.kernel.org/r/20220827065851.135710-6-ebiggers@kernel.org
2022-08-29f2fs: iostat: support accounting compressed IOChao Yu1-7/+9
Previously, we supported to account FS_CDATA_READ_IO type IO only, in this patch, it adds to account more type IO for compressed file: - APP_BUFFERED_CDATA_IO - APP_MAPPED_CDATA_IO - FS_CDATA_IO - APP_BUFFERED_CDATA_READ_IO - APP_MAPPED_CDATA_READ_IO Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-08-08Merge tag 'f2fs-for-6.0' of ↵Linus Torvalds1-26/+53
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this cycle, we mainly fixed some corner cases that manipulate a per-file compression flag inappropriately. And, we found f2fs counted valid blocks in a section incorrectly when zone capacity is set, and thus, fixed it with additional sysfs entry to check it easily. Lastly, this series includes several patches with respect to the new atomic write support such as a couple of bug fixes and re-adding atomic_write_abort support that we removed by mistake in the previous release. Enhancements: - add sysfs entries to understand atomic write operations and zone capacity - introduce memory mode to get a hint for low-memory devices - adjust the waiting time of foreground GC - decompress clusters under softirq to avoid non-deterministic latency - do not skip updating inode when retrying to flush node page - enforce single zone capacity Bug fixes: - set the compression/no-compression flags correctly - revive F2FS_IOC_ABORT_VOLATILE_WRITE - check inline_data during compressed inode conversion - understand zone capacity when calculating valid block count As usual, the series includes several minor clean-ups and sanity checks" * tag 'f2fs-for-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (29 commits) f2fs: use onstack pages instead of pvec f2fs: intorduce f2fs_all_cluster_page_ready f2fs: clean up f2fs_abort_atomic_write() f2fs: handle decompress only post processing in softirq f2fs: do not allow to decompress files have FI_COMPRESS_RELEASED f2fs: do not set compression bit if kernel doesn't support f2fs: remove device type check for direct IO f2fs: fix null-ptr-deref in f2fs_get_dnode_of_data f2fs: revive F2FS_IOC_ABORT_VOLATILE_WRITE f2fs: fix to do sanity check on segment type in build_sit_entries() f2fs: obsolete unused MAX_DISCARD_BLOCKS f2fs: fix to avoid use f2fs_bug_on() in f2fs_new_node_page() f2fs: fix to remove F2FS_COMPR_FL and tag F2FS_NOCOMP_FL at the same time f2fs: introduce sysfs atomic write statistics f2fs: don't bother wait_ms by foreground gc f2fs: invalidate meta pages only for post_read required inode f2fs: allow compression of files without blocks f2fs: fix to check inline_data during compressed inode conversion f2fs: Delete f2fs_copy_page() and replace with memcpy_page() f2fs: fix to invalidate META_MAPPING before DIO write ...
2022-08-05f2fs: clean up f2fs_abort_atomic_write()Chao Yu1-6/+3
f2fs_abort_atomic_write() has checked whether current inode is atomic_write one or not, it's redundant to check in its caller, remove it for cleanup. Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-08-05f2fs: do not allow to decompress files have FI_COMPRESS_RELEASEDJaewook Kim1-0/+10
If a file has FI_COMPRESS_RELEASED, all writes for it should not be allowed. However, as of now, in case of compress_mode=user, writes triggered by IOCTLs like F2FS_IOC_DE/COMPRESS_FILE are allowed unexpectly, which could crash that file. To fix it, let's do not allow F2FS_IOC_DE/COMPRESS_IOCTL if a file already has FI_COMPRESS_RELEASED flag. This is the reproduction process: 1. $ touch ./file 2. $ chattr +c ./file 3. $ dd if=/dev/random of=./file bs=4096 count=30 conv=notrunc 4. $ dd if=/dev/zero of=./file bs=4096 count=34 seek=30 conv=notrunc 5. $ sync 6. $ do_compress ./file ; call F2FS_IOC_COMPRESS_FILE 7. $ get_compr_blocks ./file ; call F2FS_IOC_GET_COMPRESS_BLOCKS 8. $ release ./file ; call F2FS_IOC_RELEASE_COMPRESS_BLOCKS 9. $ do_compress ./file ; call F2FS_IOC_COMPRESS_FILE again 10. $ get_compr_blocks ./file ; call F2FS_IOC_GET_COMPRESS_BLOCKS again This reproduction process is tested in 128kb cluster size. You can find compr_blocks has a negative value. Fixes: 5fdb322ff2c2b ("f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE") Signed-off-by: Junbeom Yeom <junbeom.yeom@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com> Signed-off-by: Jaewook Kim <jw5454.kim@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-08-05f2fs: do not set compression bit if kernel doesn't supportJaegeuk Kim1-2/+2
If kernel doesn't have CONFIG_F2FS_FS_COMPRESSION, a file having FS_COMPR_FL via ioctl(FS_IOC_SETFLAGS) is unaccessible due to f2fs_is_compress_backend_ready(). Let's avoid it. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-08-05f2fs: fix null-ptr-deref in f2fs_get_dnode_of_dataYe Bin1-1/+1
There is issue as follows when test f2fs atomic write: F2FS-fs (loop0): Can't find valid F2FS filesystem in 2th superblock F2FS-fs (loop0): invalid crc_offset: 0 F2FS-fs (loop0): f2fs_check_nid_range: out-of-range nid=1, run fsck to fix. F2FS-fs (loop0): f2fs_check_nid_range: out-of-range nid=2, run fsck to fix. ================================================================== BUG: KASAN: null-ptr-deref in f2fs_get_dnode_of_data+0xac/0x16d0 Read of size 8 at addr 0000000000000028 by task rep/1990 CPU: 4 PID: 1990 Comm: rep Not tainted 5.19.0-rc6-next-20220715 #266 Call Trace: <TASK> dump_stack_lvl+0x6e/0x91 print_report.cold+0x49a/0x6bb kasan_report+0xa8/0x130 f2fs_get_dnode_of_data+0xac/0x16d0 f2fs_do_write_data_page+0x2a5/0x1030 move_data_page+0x3c5/0xdf0 do_garbage_collect+0x2015/0x36c0 f2fs_gc+0x554/0x1d30 f2fs_balance_fs+0x7f5/0xda0 f2fs_write_single_data_page+0xb66/0xdc0 f2fs_write_cache_pages+0x716/0x1420 f2fs_write_data_pages+0x84f/0x9a0 do_writepages+0x130/0x3a0 filemap_fdatawrite_wbc+0x87/0xa0 file_write_and_wait_range+0x157/0x1c0 f2fs_do_sync_file+0x206/0x12d0 f2fs_sync_file+0x99/0xc0 vfs_fsync_range+0x75/0x140 f2fs_file_write_iter+0xd7b/0x1850 vfs_write+0x645/0x780 ksys_write+0xf1/0x1e0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd As 3db1de0e582c commit changed atomic write way which new a cow_inode for atomic write file, and also mark cow_inode as FI_ATOMIC_FILE. When f2fs_do_write_data_page write cow_inode will use cow_inode's cow_inode which is NULL. Then will trigger null-ptr-deref. To solve above issue, introduce FI_COW_FILE flag for COW inode. Fiexes: 3db1de0e582c("f2fs: change the current atomic write way") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-08-05f2fs: revive F2FS_IOC_ABORT_VOLATILE_WRITEDaeho Jeong1-2/+28
F2FS_IOC_ABORT_VOLATILE_WRITE was used to abort a atomic write before. However it was removed accidentally. So revive it by changing the name, since volatile write had gone. Signed-off-by: Daeho Jeong <daehojeong@google.com> Fiexes: 7bc155fec5b3("f2fs: kill volatile write support") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-07-30f2fs: fix to remove F2FS_COMPR_FL and tag F2FS_NOCOMP_FL at the same timeChao Liu1-8/+1
If the inode has the compress flag, it will fail to use 'chattr -c +m' to remove its compress flag and tag no compress flag. However, the same command will be successful when executed again, as shown below: $ touch foo.txt $ chattr +c foo.txt $ chattr -c +m foo.txt chattr: Invalid argument while setting flags on foo.txt $ chattr -c +m foo.txt $ f2fs_io getflags foo.txt get a flag on foo.txt ret=0, flags=nocompression,inline_data Fix this by removing some checks in f2fs_setflags_common() that do not affect the original logic. I go through all the possible scenarios, and the results are as follows. Bold is the only thing that has changed. +---------------+-----------+-----------+----------+ | | file flags | + command +-----------+-----------+----------+ | | no flag | compr | nocompr | +---------------+-----------+-----------+----------+ | chattr +c | compr | compr | -EINVAL | | chattr -c | no flag | no flag | nocompr | | chattr +m | nocompr | -EINVAL | nocompr | | chattr -m | no flag | compr | no flag | | chattr +c +m | -EINVAL | -EINVAL | -EINVAL | | chattr +c -m | compr | compr | compr | | chattr -c +m | nocompr | *nocompr* | nocompr | | chattr -c -m | no flag | no flag | no flag | +---------------+-----------+-----------+----------+ Link: https://lore.kernel.org/linux-f2fs-devel/20220621064833.1079383-1-chaoliu719@gmail.com/ Fixes: 4c8ff7095bef ("f2fs: support data compression") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Chao Liu <liuchao@coolpad.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-07-30f2fs: introduce sysfs atomic write statisticsDaeho Jeong1-0/+1
introduce the below 4 new sysfs node for atomic write statistics. - current_atomic_write: the total current atomic write block count, which is not committed yet. - peak_atomic_write: the peak value of total current atomic write block count after boot. - committed_atomic_block: the accumulated total committed atomic write block count after boot. - revoked_atomic_block: the accumulated total revoked atomic write block count after boot. Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-07-30f2fs: allow compression of files without blocksChao Liu1-1/+1
Files created by truncate(1) have a size but no blocks, so they can be allowed to enable compression. Signed-off-by: Chao Liu <liuchao@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-07-30f2fs: Delete f2fs_copy_page() and replace with memcpy_page()Fabio M. De Francesco1-1/+1
f2fs_copy_page() is a wrapper around two kmap() + one memcpy() from/to the mapped pages. It unnecessarily duplicates a kernel API and it makes use of kmap(), which is being deprecated in favor of kmap_local_page(). Two main problems with kmap(): (1) It comes with an overhead as mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap’s pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Therefore, its use in __clone_blkaddrs() is safe and should be preferred. Delete f2fs_copy_page() and use a plain memcpy_page() in the only one site calling the removed function. memcpy_page() avoids open coding two kmap_local_page() + one memcpy() between the two kernel virtual addresses. Suggested-by: Christoph Hellwig <hch@infradead.org> Suggested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-07-30f2fs: adjust zone capacity when considering valid block countJaegeuk Kim1-3/+3
This patch fixes counting unusable blocks set by zone capacity when checking the valid block count in a section. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-06-28f2fs: optimize error handling in redirty_blocksJack Qiu1-4/+4
Current error handling is at risk of page leaks. However, we dot't seek any failure scenarios, just use f2fs_bug_on. Signed-off-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-06-26attr: port attribute changes to new typesChristian Brauner1-8/+8
Now that we introduced new infrastructure to increase the type safety for filesystems supporting idmapped mounts port the first part of the vfs over to them. This ports the attribute changes codepaths to rely on the new better helpers using a dedicated type. Before this change we used to take a shortcut and place the actual values that would be written to inode->i_{g,u}id into struct iattr. This had the advantage that we moved idmappings mostly out of the picture early on but it made reasoning about changes more difficult than it should be. The filesystem was never explicitly told that it dealt with an idmapped mount. The transition to the value that needed to be stored in inode->i_{g,u}id appeared way too early and increased the probability of bugs in various codepaths. We know place the same value in struct iattr no matter if this is an idmapped mount or not. The vfs will only deal with type safe vfs{g,u}id_t. This makes it massively safer to perform permission checks as the type will tell us what checks we need to perform and what helpers we need to use. Fileystems raising FS_ALLOW_IDMAP can't simply write ia_vfs{g,u}id to inode->i_{g,u}id since they are different types. Instead they need to use the dedicated vfs{g,u}id_to_k{g,u}id() helpers that map the vfs{g,u}id into the filesystem. The other nice effect is that filesystems like overlayfs don't need to care about idmappings explicitly anymore and can simply set up struct iattr accordingly directly. Link: https://lore.kernel.org/lkml/CAHk-=win6+ahs1EwLkcq8apqLi_1wXFWbrPf340zYEhObpz4jA@mail.gmail.com [1] Link: https://lore.kernel.org/r/20220621141454.2914719-9-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-06-26quota: port quota helpers mount idsChristian Brauner1-2/+2
Port the is_quota_modification() and dqout_transfer() helper to type safe vfs{g,u}id_t. Since these helpers are only called by a few filesystems don't introduce a new helper but simply extend the existing helpers to pass down the mount's idmapping. Note, that this is a non-functional change, i.e. nothing will have happened here or at the end of this series to how quota are done! This a change necessary because we will at the end of this series make ownership changes easier to reason about by keeping the original value in struct iattr for both non-idmapped and idmapped mounts. For now we always pass the initial idmapping which makes the idmapping functions these helpers call nops. This is done because we currently always pass the actual value to be written to i_{g,u}id via struct iattr. While this allowed us to treat the {g,u}id values in struct iattr as values that can be directly written to inode->i_{g,u}id it also increases the potential for confusion for filesystems. Now that we are have dedicated types to prevent this confusion we will ultimately only map the value from the idmapped mount into a filesystem value that can be written to inode->i_{g,u}id when the filesystem actually updates the inode. So pass down the initial idmapping until we finished that conversion at which point we pass down the mount's idmapping. Since struct iattr uses an anonymous union with overlapping types as supported by the C standard, filesystems that haven't converted to ia_vfs{g,u}id won't see any difference and things will continue to work as before. In other words, no functional changes intended with this change. Link: https://lore.kernel.org/r/20220621141454.2914719-7-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-06-26fs: port to iattr ownership update helpersChristian Brauner1-12/+6
Earlier we introduced new helpers to abstract ownership update and remove code duplication. This converts all filesystems supporting idmapped mounts to make use of these new helpers. For now we always pass the initial idmapping which makes the idmapping functions these helpers call nops. This is done because we currently always pass the actual value to be written to i_{g,u}id via struct iattr. While this allowed us to treat the {g,u}id values in struct iattr as values that can be directly written to inode->i_{g,u}id it also increases the potential for confusion for filesystems. Now that we are have dedicated types to prevent this confusion we will ultimately only map the value from the idmapped mount into a filesystem value that can be written to inode->i_{g,u}id when the filesystem actually updates the inode. So pass down the initial idmapping until we finished that conversion at which point we pass down the mount's idmapping. No functional changes intended. Link: https://lore.kernel.org/r/20220621141454.2914719-6-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-05-31Merge tag 'f2fs-for-5.19-rc1' of ↵Linus Torvalds1-170/+137
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've refactored the existing atomic write support implemented by in-memory operations to have storing data in disk temporarily, which can give us a benefit to accept more atomic writes. At the same time, we removed the existing volatile write support. We've also revisited the file pinning and GC flows and found some corner cases which contributeed abnormal system behaviours. As usual, there're several minor code refactoring for readability, sanity check, and clean ups. Enhancements: - allow compression for mmap files in compress_mode=user - kill volatile write support - change the current atomic write way - give priority to select unpinned section for foreground GC - introduce data read/write showing path info - remove unnecessary f2fs_lock_op in f2fs_new_inode Bug fixes: - fix the file pinning flow during checkpoint=disable and GCs - fix foreground and background GCs to select the right victims and get free sections on time - fix GC flags on defragmenting pages - avoid an infinite loop to flush node pages - fix fallocate to use file_modified to update permissions consistently" * tag 'f2fs-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits) f2fs: fix to tag gcing flag on page during file defragment f2fs: replace F2FS_I(inode) and sbi by the local variable f2fs: add f2fs_init_write_merge_io function f2fs: avoid unneeded error handling for revoke_entry_slab allocation f2fs: allow compression for mmap files in compress_mode=user f2fs: fix typo in comment f2fs: make f2fs_read_inline_data() more readable f2fs: fix to do sanity check for inline inode f2fs: fix fallocate to use file_modified to update permissions consistently f2fs: don't use casefolded comparison for "." and ".." f2fs: do not stop GC when requiring a free section f2fs: keep wait_ms if EAGAIN happens f2fs: introduce f2fs_gc_control to consolidate f2fs_gc parameters f2fs: reject test_dummy_encryption when !CONFIG_FS_ENCRYPTION f2fs: kill volatile write support f2fs: change the current atomic write way f2fs: don't need inode lock for system hidden quota f2fs: stop allocating pinned sections if EAGAIN happens f2fs: skip GC if possible when checkpoint disabling f2fs: give priority to select unpinned section for foreground GC ...
2022-05-27f2fs: fix to tag gcing flag on page during file defragmentChao Yu1-0/+1
In order to garantee migrated data be persisted during checkpoint, otherwise out-of-order persistency between data and node may cause data corruption after SPOR. Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-27f2fs: replace F2FS_I(inode) and sbi by the local variableYufen Yu1-11/+11
We have define 'fi' at the begin of the functions, just use it, rather than use F2FS_I(inode) again. Signed-off-by: Yufen Yu <yuyufen@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> [Jaegeuk Kim: replace sbi] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-24f2fs: allow compression for mmap files in compress_mode=userSungjong Seo1-10/+0
Since commit e3c548323d32 ("f2fs: let's allow compression for mmap files"), it has been allowed to compress mmap files. However, in compress_mode=user, it is not allowed yet. To keep the same concept in both compress_modes, f2fs_ioc_(de)compress_file() should also allow it. Let's remove checking mmap files in f2fs_ioc_(de)compress_file() so that the compression for mmap files is also allowed in compress_mode=user. Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-24Merge tag 'for-5.19-tag' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "Features: - subpage: - support for PAGE_SIZE > 4K (previously only 64K) - make it work with raid56 - repair super block num_devices automatically if it does not match the number of device items - defrag can convert inline extents to regular extents, up to now inline files were skipped but the setting of mount option max_inline could affect the decision logic - zoned: - minimal accepted zone size is explicitly set to 4MiB - make zone reclaim less aggressive and don't reclaim if there are enough free zones - add per-profile sysfs tunable of the reclaim threshold - allow automatic block group reclaim for non-zoned filesystems, with sysfs tunables - tree-checker: new check, compare extent buffer owner against owner rootid Performance: - avoid blocking on space reservation when doing nowait direct io writes (+7% throughput for reads and writes) - NOCOW write throughput improvement due to refined locking (+3%) - send: reduce pressure to page cache by dropping extent pages right after they're processed Core: - convert all radix trees to xarray - add iterators for b-tree node items - support printk message index - user bulk page allocation for extent buffers - switch to bio_alloc API, use on-stack bios where convenient, other bio cleanups - use rw lock for block groups to favor concurrent reads - simplify workques, don't allocate high priority threads for all normal queues as we need only one - refactor scrub, process chunks based on their constraints and similarity - allocate direct io structures on stack and pass around only pointers, avoids allocation and reduces potential error handling Fixes: - fix count of reserved transaction items for various inode operations - fix deadlock between concurrent dio writes when low on free data space - fix a few cases when zones need to be finished VFS, iomap: - add helper to check if sb write has started (usable for assertions) - new helper iomap_dio_alloc_bio, export iomap_dio_bio_end_io" * tag 'for-5.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (173 commits) btrfs: zoned: introduce a minimal zone size 4M and reject mount btrfs: allow defrag to convert inline extents to regular extents btrfs: add "0x" prefix for unsupported optional features btrfs: do not account twice for inode ref when reserving metadata units btrfs: zoned: fix comparison of alloc_offset vs meta_write_pointer btrfs: send: avoid trashing the page cache btrfs: send: keep the current inode open while processing it btrfs: allocate the btrfs_dio_private as part of the iomap dio bio btrfs: move struct btrfs_dio_private to inode.c btrfs: remove the disk_bytenr in struct btrfs_dio_private btrfs: allocate dio_data on stack iomap: add per-iomap_iter private data iomap: allow the file system to provide a bio_set for direct I/O btrfs: add a btrfs_dio_rw wrapper btrfs: zoned: zone finish unused block group btrfs: zoned: properly finish block group on metadata write btrfs: zoned: finish block group when there are no more allocatable bytes left btrfs: zoned: consolidate zone finish functions btrfs: zoned: introduce btrfs_zoned_bg_is_full btrfs: improve error reporting in lookup_inline_extent_backref ...
2022-05-18f2fs: fix fallocate to use file_modified to update permissions consistentlyChao Yu1-0/+4
This patch tries to fix permission consistency issue as all other mainline filesystems. Since the initial introduction of (posix) fallocate back at the turn of the century, it has been possible to use this syscall to change the user-visible contents of files. This can happen by extending the file size during a preallocation, or through any of the newer modes (punch, zero, collapse, insert range). Because the call can be used to change file contents, we should treat it like we do any other modification to a file -- update the mtime, and drop set[ug]id privileges/capabilities. The VFS function file_modified() does all this for us if pass it a locked inode, so let's make fallocate drop permissions correctly. Cc: stable@kernel.org Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-17f2fs: do not stop GC when requiring a free sectionJaegeuk Kim1-4/+8
The f2fs_gc uses a bitmap to indicate pinned sections, but when disabling chckpoint, we call f2fs_gc() with NULL_SEGNO which selects the same dirty segment as a victim all the time, resulting in checkpoint=disable failure, for example. Let's pick another one, if we fail to collect it. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-16iomap: add per-iomap_iter private dataChristoph Hellwig1-2/+2
Allow the file system to keep state for all iterations. For now only wire it up for direct I/O as there is an immediate need for it there. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-12f2fs: introduce f2fs_gc_control to consolidate f2fs_gc parametersJaegeuk Kim1-5/+25
No functional change. - remove checkpoint=disable check for f2fs_write_checkpoint - get sec_freed all the time Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-12f2fs: kill volatile write supportJaegeuk Kim1-113/+3
There's no user, since all can use atomic writes simply. Let's kill it. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-12f2fs: change the current atomic write wayDaeho Jeong1-22/+27
Current atomic write has three major issues like below. - keeps the updates in non-reclaimable memory space and they are even hard to be migrated, which is not good for contiguous memory allocation. - disk spaces used for atomic files cannot be garbage collected, so this makes it difficult for the filesystem to be defragmented. - If atomic write operations hit the threshold of either memory usage or garbage collection failure count, All the atomic write operations will fail immediately. To resolve the issues, I will keep a COW inode internally for all the updates to be flushed from memory, when we need to flush them out in a situation like high memory pressure. These COW inodes will be tagged as orphan inodes to be reclaimed in case of sudden power-cut or system failure during atomic writes. Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-09f2fs: stop allocating pinned sections if EAGAIN happensJaegeuk Kim1-1/+1
EAGAIN doesn't guarantee to have a free section. Let's report it. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-06f2fs: fix to do sanity check on block address in f2fs_do_zero_range()Chao Yu1-4/+12
As Yanming reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215894 I have encountered a bug in F2FS file system in kernel v5.17. I have uploaded the system call sequence as case.c, and a fuzzed image can be found in google net disk The kernel should enable CONFIG_KASAN=y and CONFIG_KASAN_INLINE=y. You can reproduce the bug by running the following commands: kernel BUG at fs/f2fs/segment.c:2291! Call Trace: f2fs_invalidate_blocks+0x193/0x2d0 f2fs_fallocate+0x2593/0x4a70 vfs_fallocate+0x2a5/0xac0 ksys_fallocate+0x35/0x70 __x64_sys_fallocate+0x8e/0xf0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae The root cause is, after image was fuzzed, block mapping info in inode will be inconsistent with SIT table, so in f2fs_fallocate(), it will cause panic when updating SIT with invalid blkaddr. Let's fix the issue by adding sanity check on block address before updating SIT table with it. Cc: stable@vger.kernel.org Reported-by: Ming Yan <yanming@tju.edu.cn> Signed-off-by: Chao Yu <chao.yu@oppo.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-06f2fs: use flush command instead of FUA for zoned deviceJaegeuk Kim1-1/+2
The block layer for zoned disk can reorder the FUA'ed IOs. Let's use flush command to keep the write order. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-04-25f2fs: introduce data read/write showing path infoJaegeuk Kim1-7/+51
This was used in Android for a long time. Let's upstream it. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>