summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2022-01-10ext4: allow to change s_last_trim_minblks via sysfsLukas Czerner1-0/+2
Ext4 has an optimization mechanism for batched disacrd (FITRIM) that should help speed up subsequent calls of FITRIM ioctl by skipping the groups that were previously trimmed. However because the FITRIM allows to set the minimum size of an extent to trim, ext4 stores the last minimum extent size and only avoids trimming the group if it was previously trimmed with minimum extent size equal to, or smaller than the current call. There is currently no way to bypass the optimization without umount/mount cycle. This becomes a problem when the file system is live migrated to a different storage, because the optimization will prevent possibly useful discard calls to the storage. Fix it by exporting the s_last_trim_minblks via sysfs interface which will allow us to set the minimum size to the number of blocks larger than subsequent FITRIM call, effectively bypassing the optimization. By setting the s_last_trim_minblks to ULONG_MAX the optimization will be effectively cleared regardless of the previous state, or file system configuration. For example: getconf ULONG_MAX > /sys/fs/ext4/dm-1/last_trim_minblks Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reported-by: Laurent GUERBY <laurent@guerby.net> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20211103145122.17338-2-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10ext4: change s_last_trim_minblks type to unsigned longLukas Czerner2-3/+3
There is no good reason for the s_last_trim_minblks to be atomic. There is no data integrity needed and there is no real danger in setting and reading it in a racy manner. Change it to be unsigned long, the same type as s_clusters_per_group which is the maximum that's allowed. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Suggested-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20211103145122.17338-1-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10ext4: implement support for get/set fs labelLukas Czerner5-7/+357
Implement support for FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls for online reading and setting of file system label. ext4_ioctl_getlabel() is simple, just get the label from the primary superblock. This might not be the first sb on the file system if 'sb=' mount option is used. In ext4_ioctl_setlabel() we update what ext4 currently views as a primary superblock and then proceed to update backup superblocks. There are two caveats: - the primary superblock might not be the first superblock and so it might not be the one used by userspace tools if read directly off the disk. - because the primary superblock might not be the first superblock we potentialy have to update it as part of backup superblock update. However the first sb location is a bit more complicated than the rest so we have to account for that. The superblock modification is created generic enough so the infrastructure can be used for other potential superblock modification operations, such as chaning UUID. Tested with generic/492 with various configurations. I also checked the behavior with 'sb=' mount options, including very large file systems with and without sparse_super/sparse_super2. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Link: https://lore.kernel.org/r/20211213135618.43303-1-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10ext4: only set EXT4_MOUNT_QUOTA when journalled quota file is specifiedLukas Czerner1-1/+2
Only set EXT4_MOUNT_QUOTA when journalled quota file is specified, otherwise simply disabling specific quota type (usrjquota=) will also set the EXT4_MOUNT_QUOTA super block option. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Fixes: e6e268cb6822 ("ext4: move quota configuration out of handle_mount_opt()") Link: https://lore.kernel.org/r/20220104143518.134465-2-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10ext4: don't use kfree() on rcu protected pointer sbi->s_qf_namesLukas Czerner1-3/+5
During ext4 mount api rework the commit e6e268cb6822 ("ext4: move quota configuration out of handle_mount_opt()") introduced a bug where we would kfree(sbi->s_qf_names[i]) before assigning the new quota name in ext4_apply_quota_options(). This is wrong because we're using kfree() on rcu prointer that could be simultaneously accessed from ext4_show_quota_options() during remount. Fix it by using rcu_replace_pointer() to replace the old qname with the new one and then kfree_rcu() the old quota name. Also use get_qf_name() instead of sbi->s_qf_names in strcmp() to silence the sparse warning. Fixes: e6e268cb6822 ("ext4: move quota configuration out of handle_mount_opt()") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Lukas Czerner <lczerner@redhat.com> Link: https://lore.kernel.org/r/20220104143518.134465-1-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10ext4: avoid trim error on fs with small groupsJan Kara2-2/+8
A user reported FITRIM ioctl failing for him on ext4 on some devices without apparent reason. After some debugging we've found out that these devices (being LVM volumes) report rather large discard granularity of 42MB and the filesystem had 1k blocksize and thus group size of 8MB. Because ext4 FITRIM implementation puts discard granularity into minlen, ext4_trim_fs() declared the trim request as invalid. However just silently doing nothing seems to be a more appropriate reaction to such combination of parameters since user did not specify anything wrong. CC: Lukas Czerner <lczerner@redhat.com> Fixes: 5c2ed62fd447 ("ext4: Adjust minlen with discard_granularity in the FITRIM ioctl") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211112152202.26614-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10ext4: fix an use-after-free issue about data=journal writeback modeZhang Yi1-27/+10
Our syzkaller report an use-after-free issue that accessing the freed buffer_head on the writeback page in __ext4_journalled_writepage(). The problem is that if there was a truncate racing with the data=journalled writeback procedure, the writeback length could become zero and bget_one() refuse to get buffer_head's refcount, then the truncate procedure release buffer once we drop page lock, finally, the last ext4_walk_page_buffers() trigger the use-after-free problem. sync truncate ext4_sync_file() file_write_and_wait_range() ext4_setattr(0) inode->i_size = 0 ext4_writepage() len = 0 __ext4_journalled_writepage() page_bufs = page_buffers(page) ext4_walk_page_buffers(bget_one) <- does not get refcount do_invalidatepage() free_buffer_head() ext4_walk_page_buffers(page_bufs) <- trigger use-after-free After commit bdf96838aea6 ("ext4: fix race between truncate and __ext4_journalled_writepage()"), we have already handled the racing case, so the bget_one() and bput_one() are not needed. So this patch simply remove these hunk, and recheck the i_size to make it safe. Fixes: bdf96838aea6 ("ext4: fix race between truncate and __ext4_journalled_writepage()") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211225090937.712867-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10ext4: fix null-ptr-deref in '__ext4_journal_ensure_credits'Ye Bin1-0/+2
We got issue as follows when run syzkaller test: [ 1901.130043] EXT4-fs error (device vda): ext4_remount:5624: comm syz-executor.5: Abort forced by user [ 1901.130901] Aborting journal on device vda-8. [ 1901.131437] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.16: Detected aborted journal [ 1901.131566] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.11: Detected aborted journal [ 1901.132586] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.18: Detected aborted journal [ 1901.132751] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.9: Detected aborted journal [ 1901.136149] EXT4-fs error (device vda) in ext4_reserve_inode_write:6035: Journal has aborted [ 1901.136837] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-fuzzer: Detected aborted journal [ 1901.136915] ================================================================== [ 1901.138175] BUG: KASAN: null-ptr-deref in __ext4_journal_ensure_credits+0x74/0x140 [ext4] [ 1901.138343] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.13: Detected aborted journal [ 1901.138398] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.1: Detected aborted journal [ 1901.138808] Read of size 8 at addr 0000000000000000 by task syz-executor.17/968 [ 1901.138817] [ 1901.138852] EXT4-fs error (device vda): ext4_journal_check_start:61: comm syz-executor.30: Detected aborted journal [ 1901.144779] CPU: 1 PID: 968 Comm: syz-executor.17 Not tainted 4.19.90-vhulk2111.1.0.h893.eulerosv2r10.aarch64+ #1 [ 1901.146479] Hardware name: linux,dummy-virt (DT) [ 1901.147317] Call trace: [ 1901.147552] dump_backtrace+0x0/0x2d8 [ 1901.147898] show_stack+0x28/0x38 [ 1901.148215] dump_stack+0xec/0x15c [ 1901.148746] kasan_report+0x108/0x338 [ 1901.149207] __asan_load8+0x58/0xb0 [ 1901.149753] __ext4_journal_ensure_credits+0x74/0x140 [ext4] [ 1901.150579] ext4_xattr_delete_inode+0xe4/0x700 [ext4] [ 1901.151316] ext4_evict_inode+0x524/0xba8 [ext4] [ 1901.151985] evict+0x1a4/0x378 [ 1901.152353] iput+0x310/0x428 [ 1901.152733] do_unlinkat+0x260/0x428 [ 1901.153056] __arm64_sys_unlinkat+0x6c/0xc0 [ 1901.153455] el0_svc_common+0xc8/0x320 [ 1901.153799] el0_svc_handler+0xf8/0x160 [ 1901.154265] el0_svc+0x10/0x218 [ 1901.154682] ================================================================== This issue may happens like this: Process1 Process2 ext4_evict_inode ext4_journal_start ext4_truncate ext4_ind_truncate ext4_free_branches ext4_ind_truncate_ensure_credits ext4_journal_ensure_credits_fn ext4_journal_restart handle->h_transaction = NULL; mount -o remount,abort /mnt -> trigger JBD abort start_this_handle -> will return failed ext4_xattr_delete_inode ext4_journal_ensure_credits ext4_journal_ensure_credits_fn __ext4_journal_ensure_credits jbd2_handle_buffer_credits journal = handle->h_transaction->t_journal; ->null-ptr-deref Now, indirect truncate process didn't handle error. To solve this issue maybe simply add check handle is abort in '__ext4_journal_ensure_credits' is enough, and i also think this is necessary. Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/r/20211224100341.3299128-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10ext4: initialize err_blk before calling __ext4_get_inode_locHarshad Shirwadkar1-2/+2
It is not guaranteed that __ext4_get_inode_loc will definitely set err_blk pointer when it returns EIO. To avoid using uninitialized variables, let's first set err_blk to 0. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211201163421.2631661-1-harshads@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2022-01-10ext4: fix a possible ABBA deadlock due to busy PAChunguang Xu1-22/+18
We found on older kernel (3.10) that in the scenario of insufficient disk space, system may trigger an ABBA deadlock problem, it seems that this problem still exists in latest kernel, try to fix it here. The main process triggered by this problem is that task A occupies the PA and waits for the jbd2 transaction finish, the jbd2 transaction waits for the completion of task B's IO (plug_list), but task B waits for the release of PA by task A to finish discard, which indirectly forms an ABBA deadlock. The related calltrace is as follows: Task A vfs_write ext4_mb_new_blocks() ext4_mb_mark_diskspace_used() JBD2 jbd2_journal_get_write_access() -> jbd2_journal_commit_transaction() ->schedule() filemap_fdatawait() | | | Task B | | do_unlinkat() | | ext4_evict_inode() | | jbd2_journal_begin_ordered_truncate() | | filemap_fdatawrite_range() | | ext4_mb_new_blocks() | -ext4_mb_discard_group_preallocations() <----- Here, try to cancel ext4_mb_discard_group_preallocations() internal retry due to PA busy, and do a limited number of retries inside ext4_mb_discard_preallocations(), which can circumvent the above problems, but also has some advantages: 1. Since the PA is in a busy state, if other groups have free PAs, keeping the current PA may help to reduce fragmentation. 2. Continue to traverse forward instead of waiting for the current group PA to be released. In most scenarios, the PA discard time can be reduced. However, in the case of smaller free space, if only a few groups have space, then due to multiple traversals of the group, it may increase CPU overhead. But in contrast, I feel that the overall benefit is better than the cost. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/1637630277-23496-1-git-send-email-brookxu.cn@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2022-01-10ext4: replace snprintf in show functions with sysfs_emitQing Wang1-17/+17
coccicheck complains about the use of snprintf() in sysfs show functions. Fix the coccicheck warning: WARNING: use scnprintf or sprintf. Use sysfs_emit instead of scnprintf or sprintf makes more sense. Signed-off-by: Qing Wang <wangqing@vivo.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/1634095731-4528-1-git-send-email-wangqing@vivo.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10ext4: make sure to reset inode lockdep class when quota enabling failsJan Kara1-1/+12
When we succeed in enabling some quota type but fail to enable another one with quota feature, we correctly disable all enabled quota types. However we forget to reset i_data_sem lockdep class. When the inode gets freed and reused, it will inherit this lockdep class (i_data_sem is initialized only when a slab is created) and thus eventually lockdep barfs about possible deadlocks. Reported-and-tested-by: syzbot+3b6f9218b1301ddda3e2@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20211007155336.12493-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10ext4: make sure quota gets properly shutdown on errorJan Kara1-4/+6
When we hit an error when enabling quotas and setting inode flags, we do not properly shutdown quota subsystem despite returning error from Q_QUOTAON quotactl. This can lead to some odd situations like kernel using quota file while it is still writeable for userspace. Make sure we properly cleanup the quota subsystem in case of error. Signed-off-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20211007155336.12493-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-01-10ext4: Fix BUG_ON in ext4_bread when write quota dataYe Bin1-1/+1
We got issue as follows when run syzkaller: [ 167.936972] EXT4-fs error (device loop0): __ext4_remount:6314: comm rep: Abort forced by user [ 167.938306] EXT4-fs (loop0): Remounting filesystem read-only [ 167.981637] Assertion failure in ext4_getblk() at fs/ext4/inode.c:847: '(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) || handle != NULL || create == 0' [ 167.983601] ------------[ cut here ]------------ [ 167.984245] kernel BUG at fs/ext4/inode.c:847! [ 167.984882] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI [ 167.985624] CPU: 7 PID: 2290 Comm: rep Tainted: G B 5.16.0-rc5-next-20211217+ #123 [ 167.986823] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 [ 167.988590] RIP: 0010:ext4_getblk+0x17e/0x504 [ 167.989189] Code: c6 01 74 28 49 c7 c0 a0 a3 5c 9b b9 4f 03 00 00 48 c7 c2 80 9c 5c 9b 48 c7 c6 40 b6 5c 9b 48 c7 c7 20 a4 5c 9b e8 77 e3 fd ff <0f> 0b 8b 04 244 [ 167.991679] RSP: 0018:ffff8881736f7398 EFLAGS: 00010282 [ 167.992385] RAX: 0000000000000094 RBX: 1ffff1102e6dee75 RCX: 0000000000000000 [ 167.993337] RDX: 0000000000000001 RSI: ffffffff9b6e29e0 RDI: ffffed102e6dee66 [ 167.994292] RBP: ffff88816a076210 R08: 0000000000000094 R09: ffffed107363fa09 [ 167.995252] R10: ffff88839b1fd047 R11: ffffed107363fa08 R12: ffff88816a0761e8 [ 167.996205] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000001 [ 167.997158] FS: 00007f6a1428c740(0000) GS:ffff88839b000000(0000) knlGS:0000000000000000 [ 167.998238] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 167.999025] CR2: 00007f6a140716c8 CR3: 0000000133216000 CR4: 00000000000006e0 [ 167.999987] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 168.000944] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 168.001899] Call Trace: [ 168.002235] <TASK> [ 168.007167] ext4_bread+0xd/0x53 [ 168.007612] ext4_quota_write+0x20c/0x5c0 [ 168.010457] write_blk+0x100/0x220 [ 168.010944] remove_free_dqentry+0x1c6/0x440 [ 168.011525] free_dqentry.isra.0+0x565/0x830 [ 168.012133] remove_tree+0x318/0x6d0 [ 168.014744] remove_tree+0x1eb/0x6d0 [ 168.017346] remove_tree+0x1eb/0x6d0 [ 168.019969] remove_tree+0x1eb/0x6d0 [ 168.022128] qtree_release_dquot+0x291/0x340 [ 168.023297] v2_release_dquot+0xce/0x120 [ 168.023847] dquot_release+0x197/0x3e0 [ 168.024358] ext4_release_dquot+0x22a/0x2d0 [ 168.024932] dqput.part.0+0x1c9/0x900 [ 168.025430] __dquot_drop+0x120/0x190 [ 168.025942] ext4_clear_inode+0x86/0x220 [ 168.026472] ext4_evict_inode+0x9e8/0xa22 [ 168.028200] evict+0x29e/0x4f0 [ 168.028625] dispose_list+0x102/0x1f0 [ 168.029148] evict_inodes+0x2c1/0x3e0 [ 168.030188] generic_shutdown_super+0xa4/0x3b0 [ 168.030817] kill_block_super+0x95/0xd0 [ 168.031360] deactivate_locked_super+0x85/0xd0 [ 168.031977] cleanup_mnt+0x2bc/0x480 [ 168.033062] task_work_run+0xd1/0x170 [ 168.033565] do_exit+0xa4f/0x2b50 [ 168.037155] do_group_exit+0xef/0x2d0 [ 168.037666] __x64_sys_exit_group+0x3a/0x50 [ 168.038237] do_syscall_64+0x3b/0x90 [ 168.038751] entry_SYSCALL_64_after_hwframe+0x44/0xae In order to reproduce this problem, the following conditions need to be met: 1. Ext4 filesystem with no journal; 2. Filesystem image with incorrect quota data; 3. Abort filesystem forced by user; 4. umount filesystem; As in ext4_quota_write: ... if (EXT4_SB(sb)->s_journal && !handle) { ext4_msg(sb, KERN_WARNING, "Quota write (off=%llu, len=%llu)" " cancelled because transaction is not started", (unsigned long long)off, (unsigned long long)len); return -EIO; } ... We only check handle if NULL when filesystem has journal. There is need check handle if NULL even when filesystem has no journal. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211223015506.297766-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2022-01-10ext4: destroy ext4_fc_dentry_cachep kmemcache on module removalSebastian Andrzej Siewior3-0/+8
The kmemcache for ext4_fc_dentry_cachep remains registered after module removal. Destroy ext4_fc_dentry_cachep kmemcache on module removal. Fixes: aa75f4d3daaeb ("ext4: main fast-commit commit path") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211110134640.lyku5vklvdndw6uk@linutronix.de Link: https://lore.kernel.org/r/YbiK3JetFFl08bd7@linutronix.de Link: https://lore.kernel.org/r/20211223164436.2628390-1-bigeasy@linutronix.de Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2022-01-10ext4: fast commit may miss tracking unwritten range during ftruncateXin Yin1-2/+1
If use FALLOC_FL_KEEP_SIZE to alloc unwritten range at bottom, the inode->i_size will not include the unwritten range. When call ftruncate with fast commit enabled, it will miss to track the unwritten range. Change to trace the full range during ftruncate. Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211223032337.5198-3-yinxin.x@bytedance.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2022-01-10ext4: use ext4_ext_remove_space() for fast commit replay delete rangeXin Yin1-5/+8
For now ,we use ext4_punch_hole() during fast commit replay delete range procedure. But it will be affected by inode->i_size, which may not correct during fast commit replay procedure. The following test will failed. -create & write foo (len 1000K) -falloc FALLOC_FL_ZERO_RANGE foo (range 400K - 600K) -create & fsync bar -falloc FALLOC_FL_PUNCH_HOLE foo (range 300K-500K) -fsync foo -crash before a full commit After the fast_commit reply procedure, the range 400K-500K will not be removed. Because in this case, when calling ext4_punch_hole() the inode->i_size is 0, and it just retruns with doing nothing. Change to use ext4_ext_remove_space() instead of ext4_punch_hole() to remove blocks of inode directly. Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211223032337.5198-2-yinxin.x@bytedance.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2022-01-10ext4: fix fast commit may miss tracking range for FALLOC_FL_ZERO_RANGEXin Yin2-5/+4
when call falloc with FALLOC_FL_ZERO_RANGE, to set an range to unwritten, which has been already initialized. If the range is align to blocksize, fast commit will not track range for this change. Also track range for unwritten range in ext4_map_blocks(). Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211221022839.374606-1-yinxin.x@bytedance.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2021-12-23ext4: update fast commit TODOsHarshad Shirwadkar1-8/+6
This series takes care of a couple of TODOs and adds new ones. Update the TODOs section to reflect current state and future work that needs to happen. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211223202140.2061101-5-harshads@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-23ext4: simplify updating of fast commit statsHarshad Shirwadkar3-59/+68
Move fast commit stats updating logic to a separate function from ext4_fc_commit(). This significantly improves readability of ext4_fc_commit(). Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211223202140.2061101-4-harshads@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-23ext4: drop ineligible txn start stop APIsHarshad Shirwadkar5-75/+20
This patch drops ext4_fc_start_ineligible() and ext4_fc_stop_ineligible() APIs. Fast commit ineligible transactions should simply call ext4_fc_mark_ineligible() after starting the trasaction. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211223202140.2061101-3-harshads@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-23ext4: use ext4_journal_start/stop for fast commit transactionsHarshad Shirwadkar6-24/+4
This patch drops all calls to ext4_fc_start_update() and ext4_fc_stop_update(). To ensure that there are no ongoing journal updates during fast commit, we also make jbd2_fc_begin_commit() lock journal for updates. This way we don't have to maintain two different transaction start stop APIs for fast commit and full commit. This patch doesn't remove the functions altogether since in future we want to have inode level locking for fast commits. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211223202140.2061101-2-harshads@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-23ext4: fix i_version handling on remountLukas Czerner1-25/+18
i_version mount option is getting lost on remount. This is because the 'i_version' mount option differs from the util-linux mount option 'iversion', but it has exactly the same functionality. We have to specifically notify the vfs that this is what we want by setting appropriate flag in fc->sb_flags. Fix it and as a result we can remove *flags argument from __ext4_remount(); do the same for __ext4_fill_super(). In addition set out to deprecate ext4 specific 'i_version' mount option in favor or 'iversion' by kernel version 5.20. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Fixes: cebe85d570cf ("ext4: switch to the new mount api") Link: https://lore.kernel.org/r/20211222104517.11187-2-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-23ext4: remove lazytime/nolazytime mount options handled by MS_LAZYTIMELukas Czerner1-11/+2
The lazytime and nolazytime mount options were added temporarily back in 2015 with commit a26f49926da9 ("ext4: add optimization for the lazytime mount option"). It think it has been enough time for the util-linux with lazytime support to get widely used. Remove the mount options. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Link: https://lore.kernel.org/r/20211222104517.11187-1-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-23ext4: don't fail remount if journalling mode didn't changeLukas Czerner1-5/+9
Switching to the new mount api introduced inconsistency in how the journalling mode mount option (data=) is handled during a remount. Ext4 always prevented changing the journalling mode during the remount, however the new code always fails the remount when the journalling mode is specified, even if it remains unchanged. Fix it. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reported-by: Heiner Kallweit <hkallweit1@gmail.com> Fixes: cebe85d570cf ("ext4: switch to the new mount api") Link: https://lore.kernel.org/r/20211220152657.101599-1-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: Remove unused match_table_t tokensLukas Czerner1-243/+131
Remove unused match_table_t, slim down mount_opts structure by removing unnecessary definitions, remove redundant MOPT_ flags and clean up ext4_parse_param() by converting the most of the if/else branching to switch except for the MOPT_SET/MOPT_CEAR handling. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-14-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: switch to the new mount apiLukas Czerner1-109/+86
Add the necessary functions for the fs_context_operations. Convert and rename ext4_remount() and ext4_fill_super() to ext4_get_tree() and ext4_reconfigure() respectively and switch the ext4 to use the new api. One user facing change is the fact that we no longer have access to the entire string of mount options provided by mount(2) since the mount api does not store it anywhere. As a result we can't print the options to the log as we did in the past after the successful mount. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-13-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: change token2str() to use ext4_param_specsLukas Czerner1-4/+4
Change token2str() to use ext4_param_specs instead of tokens so that we can get rid of tokens entirely. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-12-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: clean up return values in handle_mount_opt()Lukas Czerner1-11/+17
Clean up return values in handle_mount_opt() and rename the function to ext4_parse_param() Now we can use it in fs_context_operations as .parse_param. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-11-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: Completely separate options parsing and sb setupLukas Czerner1-135/+264
The new mount api separates option parsing and super block setup into two distinct steps and so we need to separate the options parsing out of the ext4_fill_super() and ext4_remount(). In order to achieve this we have to create new ext4_fill_super() and ext4_remount() functions which will serve its purpose only until we actually do convert to the new api (as such they are only temporary for this patch series) and move the option parsing out of the old function which will now be renamed to __ext4_fill_super() and __ext4_remount(). There is a small complication in the fact that while the mount option parsing is going to happen before we get to __ext4_fill_super(), the mount options stored in the super block itself needs to be applied first, before the user specified mount options. So with this patch we're going through the following sequence: - parse user provided options (including sb block) - initialize sbi and store s_sb_block if provided - in __ext4_fill_super() - read the super block - parse and apply options specified in s_mount_opts - check and apply user provided options stored in ctx - continue with the regular ext4_fill_super operation It's not exactly the most elegant solution, but if we still want to support s_mount_opts we have to do it in this order. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-10-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: get rid of super block and sbi from handle_mount_ops()Lukas Czerner1-173/+368
At the parsing phase of mount in the new mount api sb will not be available. We've already removed some uses of sb and sbi, but now we need to get rid of the rest of it. Use ext4_fs_context to store all of the configuration specification so that it can be later applied to the super block and sbi. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-9-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: check ext2/3 compatibility outside handle_mount_opt()Lukas Czerner1-16/+25
At the parsing phase of mount in the new mount api sb will not be available so move ext2/3 compatibility check outside handle_mount_opt(). Unfortunately we will lose the ability to show exactly which option is not compatible. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-8-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: move quota configuration out of handle_mount_opt()Lukas Czerner1-93/+165
At the parsing phase of mount in the new mount api sb will not be available so move quota confiquration out of handle_mount_opt() by noting the quota file names in the ext4_fs_context structure to be able to apply it later. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-7-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: Allow sb to be NULL in ext4_msg()Lukas Czerner1-66/+78
At the parsing phase of mount in the new mount api sb will not be available so allow sb to be NULL in ext4_msg and use that in handle_mount_opt(). Also change return value to appropriate -EINVAL where needed. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-6-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: Change handle_mount_opt() to use fs_parameterLukas Czerner1-107/+143
Use the new mount option specifications to parse the options in handle_mount_opt(). However we're still using the old API to get the options string. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-5-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: move option validation to a separate functionLukas Czerner1-2/+9
Move option validation out of parse_options() into a separate function ext4_validate_options(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-4-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: Add fs parameter specifications for mount optionsLukas Czerner1-0/+151
Create an array of fs_parameter_spec called ext4_param_specs to hold the mount option specifications we're going to be using with the new mount api. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-3-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09fs_parse: allow parameter value to be emptyLukas Czerner2-9/+24
Allow parameter value to be empty by specifying fs_param_can_be_empty flag. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-2-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-05Linux 5.16-rc4v5.16-rc4Linus Torvalds1-1/+1
2021-12-05Merge tag 'for-5.16/parisc-6' of ↵Linus Torvalds5-26/+30
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: "Some bug and warning fixes: - Fix "make install" to use debians "installkernel" script which is now in /usr/sbin - Fix the bindeb-pkg make target by giving the correct KBUILD_IMAGE file name - Fix compiler warnings by annotating parisc agp init functions with __init - Fix timekeeping on SMP machines with dual-core CPUs - Enable some more config options in the 64-bit defconfig" * tag 'for-5.16/parisc-6' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Mark cr16 CPU clocksource unstable on all SMP machines parisc: Fix "make install" on newer debian releases parisc/agp: Annotate parisc agp init functions with __init parisc: Enable sata sil, audit and usb support on 64-bit defconfig parisc: Fix KBUILD_IMAGE for self-extracting kernel
2021-12-05Merge tag 'usb-5.16-rc4' of ↵Linus Torvalds4-27/+21
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some small USB fixes for a few reported issues. Included in here are: - xhci fix for a _much_ reported regression. I don't think there's a community distro that has not reported this problem yet :( - new USB quirk addition - cdns3 minor fixes - typec regression fix. All of these have been in linux-next with no reported problems, and the xhci fix has been reported by many to resolve their reported problem" * tag 'usb-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: cdnsp: Fix a NULL pointer dereference in cdnsp_endpoint_init() usb: cdns3: gadget: fix new urb never complete if ep cancel previous requests usb: typec: tcpm: Wait in SNK_DEBOUNCED until disconnect USB: NO_LPM quirk Lenovo Powered USB-C Travel Hub xhci: Fix commad ring abort, write all 64 bits to CRCR register.
2021-12-05Merge tag 'tty-5.16-rc4' of ↵Linus Torvalds12-33/+95
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial fixes from Greg KH: "Here are some small TTY and Serial driver fixes for 5.16-rc4 to resolve a number of reported problems. They include: - liteuart serial driver fixes - 8250_pci serial driver fixes for pericom devices - 8250 RTS line control fix while in RS-485 mode - tegra serial driver fix - msm_serial driver fix - pl011 serial driver new id - fsl_lpuart revert of broken change - 8250_bcm7271 serial driver fix - MAINTAINERS file update for rpmsg tty driver that came in 5.16-rc1 - vgacon fix for reported problem All of these, except for the 8250_bcm7271 fix have been in linux-next with no reported problem. The 8250_bcm7271 fix was added to the tree on Friday so no chance to be linux-next yet. But it should be fine as the affected developers submitted it" * tag 'tty-5.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: 8250_bcm7271: UART errors after resuming from S2 serial: 8250_pci: rewrite pericom_do_set_divisor() serial: 8250_pci: Fix ACCES entries in pci_serial_quirks array serial: 8250: Fix RTS modem control while in rs485 mode Revert "tty: serial: fsl_lpuart: drop earlycon entry for i.MX8QXP" serial: tegra: Change lower tolerance baud rate limit for tegra20 and tegra30 serial: liteuart: relax compile-test dependencies serial: liteuart: fix minor-number leak on probe errors serial: liteuart: fix use-after-free and memleak on unbind serial: liteuart: Fix NULL pointer dereference in ->remove() vgacon: Propagate console boot parameters before calling `vc_resize' tty: serial: msm_serial: Deactivate RX DMA for polling support serial: pl011: Add ACPI SBSA UART match id serial: core: fix transmit-buffer reset and memleak MAINTAINERS: Add rpmsg tty driver maintainer
2021-12-05Merge tag 'timers_urgent_for_v5.16_rc4' of ↵Linus Torvalds2-1/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Borislav Petkov: - Prevent a tick storm when a dedicated timekeeper CPU in nohz_full mode runs for prolonged periods with interrupts disabled and ends up programming the next tick in the past, leading to that storm * tag 'timers_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers/nohz: Last resort update jiffies on nohz_full IRQ entry
2021-12-05Merge tag 'sched_urgent_for_v5.16_rc4' of ↵Linus Torvalds3-8/+15
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: - Properly init uclamp_flags of a runqueue, on first enqueuing - Fix preempt= callback return values - Correct utime/stime resource usage reporting on nohz_full to return the proper times instead of shorter ones * tag 'sched_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/uclamp: Fix rq->uclamp_max not set on first enqueue preempt/dynamic: Fix setup_preempt_mode() return value sched/cputime: Fix getrusage(RUSAGE_THREAD) with nohz_full
2021-12-05Merge tag 'x86_urgent_for_v5.16_rc4' of ↵Linus Torvalds10-43/+159
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Fix a couple of SWAPGS fencing issues in the x86 entry code - Use the proper operand types in __{get,put}_user() to prevent truncation in SEV-ES string io - Make sure the kernel mappings are present in trampoline_pgd in order to prevent any potential accesses to unmapped memory after switching to it - Fix a trivial list corruption in objtool's pv_ops validation - Disable the clocksource watchdog for TSC on platforms which claim that the TSC is constant, doesn't stop in sleep states, CPU has TSC adjust and the number of sockets of the platform are max 2, to prevent erroneous markings of the TSC as unstable. - Make sure TSC adjust is always checked not only when going idle - Prevent a stack leak by initializing struct _fpx_sw_bytes properly in the FPU code - Fix INTEL_FAM6_RAPTORLAKE define naming to adhere to the convention * tag 'x86_urgent_for_v5.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/xen: Add xenpv_restore_regs_and_return_to_usermode() x86/entry: Use the correct fence macro after swapgs in kernel CR3 x86/entry: Add a fence for kernel entry SWAPGS in paranoid_entry() x86/sev: Fix SEV-ES INS/OUTS instructions for word, dword, and qword x86/64/mm: Map all kernel memory into trampoline_pgd objtool: Fix pv_ops noinstr validation x86/tsc: Disable clocksource watchdog for TSC on qualified platorms x86/tsc: Add a timer to make sure TSC_adjust is always checked x86/fpu/signal: Initialize sw_bytes in save_xstate_epilog() x86/cpu: Drop spurious underscore from RAPTOR_LAKE #define
2021-12-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds10-51/+106
Pull more kvm fixes from Paolo Bonzini: - Static analysis fix - New SEV-ES protocol for communicating invalid VMGEXIT requests - Ensure APICv is considered inactive if there is no APIC - Fix reserved bits for AMD PerfEvtSeln register * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure KVM: SEV: Fall back to vmalloc for SEV-ES scratch area if necessary KVM: SEV: Return appropriate error codes if SEV-ES scratch setup fails KVM: x86/mmu: Retry page fault if root is invalidated by memslot update KVM: VMX: Set failure code in prepare_vmcs02() KVM: ensure APICv is considered inactive if there is no APIC KVM: x86/pmu: Fix reserved bits for AMD PerfEvtSeln register
2021-12-05KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failureTom Lendacky2-46/+71
Currently, an SEV-ES guest is terminated if the validation of the VMGEXIT exit code or exit parameters fails. The VMGEXIT instruction can be issued from userspace, even though userspace (likely) can't update the GHCB. To prevent userspace from being able to kill the guest, return an error through the GHCB when validation fails rather than terminating the guest. For cases where the GHCB can't be updated (e.g. the GHCB can't be mapped, etc.), just return back to the guest. The new error codes are documented in the lasest update to the GHCB specification. Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <b57280b5562893e2616257ac9c2d4525a9aeeb42.1638471124.git.thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-05KVM: SEV: Fall back to vmalloc for SEV-ES scratch area if necessarySean Christopherson1-4/+4
Use kvzalloc() to allocate KVM's buffer for SEV-ES's GHCB scratch area so that KVM falls back to __vmalloc() if physically contiguous memory isn't available. The buffer is purely a KVM software construct, i.e. there's no need for it to be physically contiguous. Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211109222350.2266045-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-05KVM: SEV: Return appropriate error codes if SEV-ES scratch setup failsSean Christopherson1-13/+17
Return appropriate error codes if setting up the GHCB scratch area for an SEV-ES guest fails. In particular, returning -EINVAL instead of -ENOMEM when allocating the kernel buffer could be confusing as userspace would likely suspect a guest issue. Fixes: 8f423a80d299 ("KVM: SVM: Support MMIO for an SEV-ES guest") Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211109222350.2266045-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-04Merge tag 'xfs-5.16-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds1-1/+0
Pull xfs fix from Darrick Wong: "Remove an unnecessary (and backwards) rename flags check that duplicates a VFS level check" * tag 'xfs-5.16-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: remove incorrect ASSERT in xfs_rename