summaryrefslogtreecommitdiffstats
path: root/fs/ext4/inode.c
AgeCommit message (Collapse)AuthorFilesLines
2017-02-14ext4: don't BUG when truncating encrypted inodes on the orphan listTheodore Ts'o1-0/+4
Fix a BUG when the kernel tries to mount a file system constructed as follows: echo foo > foo.txt mke2fs -Fq -t ext4 -O encrypt foo.img 100 debugfs -w foo.img << EOF write foo.txt a set_inode_field a i_flags 0x80800 set_super_value s_last_orphan 12 quit EOF root@kvm-xfstests:~# mount -o loop foo.img /mnt [ 160.238770] ------------[ cut here ]------------ [ 160.240106] kernel BUG at /usr/projects/linux/ext4/fs/ext4/inode.c:3874! [ 160.240106] invalid opcode: 0000 [#1] SMP [ 160.240106] Modules linked in: [ 160.240106] CPU: 0 PID: 2547 Comm: mount Tainted: G W 4.10.0-rc3-00034-gcdd33b941b67 #227 [ 160.240106] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014 [ 160.240106] task: f4518000 task.stack: f47b6000 [ 160.240106] EIP: ext4_block_zero_page_range+0x1a7/0x2b4 [ 160.240106] EFLAGS: 00010246 CPU: 0 [ 160.240106] EAX: 00000001 EBX: f7be4b50 ECX: f47b7dc0 EDX: 00000007 [ 160.240106] ESI: f43b05a8 EDI: f43babec EBP: f47b7dd0 ESP: f47b7dac [ 160.240106] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 [ 160.240106] CR0: 80050033 CR2: bfd85b08 CR3: 34a00680 CR4: 000006f0 [ 160.240106] Call Trace: [ 160.240106] ext4_truncate+0x1e9/0x3e5 [ 160.240106] ext4_fill_super+0x286f/0x2b1e [ 160.240106] ? set_blocksize+0x2e/0x7e [ 160.240106] mount_bdev+0x114/0x15f [ 160.240106] ext4_mount+0x15/0x17 [ 160.240106] ? ext4_calculate_overhead+0x39d/0x39d [ 160.240106] mount_fs+0x58/0x115 [ 160.240106] vfs_kern_mount+0x4b/0xae [ 160.240106] do_mount+0x671/0x8c3 [ 160.240106] ? _copy_from_user+0x70/0x83 [ 160.240106] ? strndup_user+0x31/0x46 [ 160.240106] SyS_mount+0x57/0x7b [ 160.240106] do_int80_syscall_32+0x4f/0x61 [ 160.240106] entry_INT80_32+0x2f/0x2f [ 160.240106] EIP: 0xb76b919e [ 160.240106] EFLAGS: 00000246 CPU: 0 [ 160.240106] EAX: ffffffda EBX: 08053838 ECX: 08052188 EDX: 080537e8 [ 160.240106] ESI: c0ed0000 EDI: 00000000 EBP: 080537e8 ESP: bfa13660 [ 160.240106] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b [ 160.240106] Code: 59 8b 00 a8 01 0f 84 09 01 00 00 8b 07 66 25 00 f0 66 3d 00 80 75 61 89 f8 e8 3e e2 ff ff 84 c0 74 56 83 bf 48 02 00 00 00 75 02 <0f> 0b 81 7d e8 00 10 00 00 74 02 0f 0b 8b 43 04 8b 53 08 31 c9 [ 160.240106] EIP: ext4_block_zero_page_range+0x1a7/0x2b4 SS:ESP: 0068:f47b7dac [ 160.317241] ---[ end trace d6a773a375c810a5 ]--- The problem is that when the kernel tries to truncate an inode in ext4_truncate(), it tries to clear any on-disk data beyond i_size. Without the encryption key, it can't do that, and so it triggers a BUG. E2fsck does *not* provide this service, and in practice most file systems have their orphan list processed by e2fsck, so to avoid crashing, this patch skips this step if we don't have access to the encryption key (which is the case when processing the orphan list; in all other cases, we will have the encryption key, or the kernel wouldn't have allowed the file to be opened). An open question is whether the fact that e2fsck isn't clearing the bytes beyond i_size causing problems --- and if we've lived with it not doing it for so long, can we drop this from the kernel replay of the orphan list in all cases (not just when we don't have the key for encrypted inodes). Addresses-Google-Bug: #35209576 Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-05ext4: add shutdown bit and check for itTheodore Ts'o1-2/+28
Add a shutdown bit that will cause ext4 processing to fail immediately with EIO. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-02-04ext4: fix inline data error pathsTheodore Ts'o1-5/+15
The write_end() function must always unlock the page and drop its ref count, even on an error. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2017-01-27ext4: fix data corruption in data=journal modeJan Kara1-10/+13
ext4_journalled_write_end() did not propely handle all the cases when generic_perform_write() did not copy all the data into the target page and could mark buffers with uninitialized contents as uptodate and dirty leading to possible data corruption (which would be quickly fixed by generic_perform_write() retrying the write but still). Fix the problem by carefully handling the case when the page that is written to is not uptodate. CC: stable@vger.kernel.org Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-01-22ext4: propagate error values from ext4_inline_data_truncate()Theodore Ts'o1-1/+3
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-01-11ext4: avoid calling ext4_mark_inode_dirty() under unneeded semaphoresTheodore Ts'o1-1/+1
There is no need to call ext4_mark_inode_dirty while holding xattr_sem or i_data_sem, so where it's easy to avoid it, move it out from the critical region. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-14Merge branch 'for-4.10/fs-unmap' of git://git.kernel.dk/linux-blockLinus Torvalds1-13/+5
Pull fs meta data unmap optimization from Jens Axboe: "A series from Jan Kara, providing a more efficient way for unmapping meta data from in the buffer cache than doing it block-by-block. Provide a general helper that existing callers can use" * 'for-4.10/fs-unmap' of git://git.kernel.dk/linux-block: fs: Remove unmap_underlying_metadata fs: Add helper to clean bdev aliases under a bh and use it ext2: Use clean_bdev_aliases() instead of iteration ext4: Use clean_bdev_aliases() instead of iteration direct-io: Use clean_bdev_aliases() instead of handmade iteration fs: Provide function to unmap metadata for a range of blocks
2016-12-12Merge branch 'fscrypt' into devTheodore Ts'o1-1/+0
2016-12-11fscrypt: Cleanup page locking requirements for fscrypt_{decrypt,encrypt}_page()David Gstir1-1/+0
Rename the FS_CFLG_INPLACE_ENCRYPTION flag to FS_CFLG_OWN_PAGES which, when set, indicates that the fs uses pages under its own control as opposed to writeback pages which require locking and a bounce buffer for encryption. Signed-off-by: David Gstir <david@sigma-star.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-10ext4: reject inodes with negative sizeDarrick J. Wong1-0/+6
Don't load an inode with a negative size; this causes integer overflow problems in the VFS. [ Added EXT4_ERROR_INODE() to mark file system as corrupted. -TYT] Fixes: a48380f769df (ext4: rename i_dir_acl to i_size_high) Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2016-12-02ext4: fix reading new encrypted symlinks on no-journal file systemsTheodore Ts'o1-1/+2
On a filesystem with no journal, a symlink longer than about 32 characters (exact length depending on padding for encryption) could not be followed or read immediately after being created in an encrypted directory. This happened because when the symlink data went through the delayed allocation path instead of the journaling path, the symlink was incorrectly detected as a "fast" symlink rather than a "slow" symlink until its data was written out. To fix this, disable delayed allocation for symlinks, since there is no benefit for delayed allocation anyway. Reported-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-01ext4: don't read out of bounds when checking for in-inode xattrsEric Biggers1-1/+3
With i_extra_isize equal to or close to the available space, it was possible for us to read past the end of the inode when trying to detect or validate in-inode xattrs. Fix this by checking for the needed extra space first. This patch shouldn't have any noticeable effect on non-corrupted/non-malicious filesystems. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2016-12-01ext4: forbid i_extra_isize not divisible by 4Eric Biggers1-4/+7
i_extra_isize not divisible by 4 is problematic for several reasons: - It causes the in-inode xattr space to be misaligned, but the xattr header and entries are not declared __packed to express this possibility. This may cause poor performance or incorrect code generation on some platforms. - When validating the xattr entries we can read past the end of the inode if the size available for xattrs is not a multiple of 4. - It allows the nonsensical i_extra_isize=1, which doesn't even leave enough room for i_extra_isize itself. Therefore, update ext4_iget() to consider i_extra_isize not divisible by 4 to be an error, like the case where i_extra_isize is too large. This also matches the rule recently added to e2fsck for determining whether an inode has valid i_extra_isize. This patch shouldn't have any noticeable effect on non-corrupted/non-malicious filesystems, since the size of ext4_inode has always been a multiple of 4. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2016-12-01ext4: fix inode checksum calculation problem if i_extra_size is smallDaeho Jeong1-3/+2
We've fixed the race condition problem in calculating ext4 checksum value in commit b47820edd163 ("ext4: avoid modifying checksum fields directly during checksum veficationon"). However, by this change, when calculating the checksum value of inode whose i_extra_size is less than 4, we couldn't calculate the checksum value in a proper way. This problem was found and reported by Nix, Thank you. Reported-by: Nix <nix@esperi.org.uk> Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-01ext4: warn when page is dirtied without buffersJan Kara1-0/+9
Warn when a page is dirtied without buffers (as that will likely lead to a crash in ext4_writepages()) or when it gets newly dirtied without the page being locked (as there is nothing that prevents buffers to get stripped just before calling set_page_dirty() under memory pressure). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-20ext4: rip out DAX handling from direct IO pathJan Kara1-82/+15
Reads and writes for DAX inodes should no longer end up in direct IO code. Rip out the support and add a warning. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-20ext4: convert DAX faults to iomap infrastructureJan Kara1-5/+9
Convert DAX faults to use iomap infrastructure. We would not have to start transaction in ext4_dax_fault() anymore since ext4_iomap_begin takes care of that but so far we do that to avoid lock inversion of transaction start with DAX entry lock which gets acquired in dax_iomap_fault() before calling ->iomap_begin handler. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-20ext4: avoid split extents for DAX writesJan Kara1-17/+0
Currently mapping of blocks for DAX writes happen with EXT4_GET_BLOCKS_PRE_IO flag set. That has a result that each ext4_map_blocks() call creates a separate written extent, although it could be merged to the neighboring extents in the extent tree. The reason for using this flag is that in case the extent is unwritten, we need to convert it to written one and zero it out. However this "convert mapped range to written" operation is already implemented by ext4_map_blocks() for the case of data writes into unwritten extent. So just use flags for that mode of operation, simplify the code, and avoid unnecessary split extents. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-20ext4: DAX iomap write supportJan Kara1-6/+120
Implement DAX writes using the new iomap infrastructure instead of overloading the direct IO path. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-20ext4: use iomap for zeroing blocks in DAX modeJan Kara1-2/+4
Use iomap infrastructure for zeroing blocks when in DAX mode. ext4_iomap_begin() handles read requests just fine and that's all that is needed for iomap_zero_range(). Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-20ext4: convert DAX reads to iomap infrastructureJan Kara1-0/+54
Implement basic iomap_begin function that handles reading and use it for DAX reads. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-20ext4: only set S_DAX if DAX is really supportedJan Kara1-1/+8
Currently we have S_DAX set inode->i_flags for a regular file whenever ext4 is mounted with dax mount option. However in some cases we cannot really do DAX - e.g. when inode is marked to use data journalling, when inode data is being encrypted, or when inode is stored inline. Make sure S_DAX flag is appropriately set/cleared in these cases. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-14ext4: allow inode expansion for nojournal file systemsEric Whitney1-5/+7
Runs of xfstest ext4/022 on nojournal file systems result in failures because the inodes of some of its test files do not expand as expected. The cause is a conditional in ext4_mark_inode_dirty() that prevents inode expansion unless the test file system has a journal. Remove this unnecessary restriction. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-14ext4: use current_time() for inode timestampsDeepa Dinamani1-3/+3
CURRENT_TIME_SEC and CURRENT_TIME are not y2038 safe. current_time() will be transitioned to be y2038 safe along with vfs. current_time() returns timestamps according to the granularities set in the super_block. The granularity check in ext4_current_time() to call current_time() or CURRENT_TIME_SEC is not required. Use current_time() directly to obtain timestamps unconditionally, and remove ext4_current_time(). Quota files are assumed to be on the same filesystem. Hence, use current_time() for these files as well. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Arnd Bergmann <arnd@arndb.de>
2016-11-13ext4: allow ext4_ext_truncate() to return an errorTheodore Ts'o1-1/+3
Return errors to the caller instead of declaring the file system corrupted. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-11-13ext4: allow ext4_truncate() to return an errorTheodore Ts'o1-15/+26
This allows us to properly propagate errors back up to ext4_truncate()'s callers. This also means we no longer have to silently ignore some errors (e.g., when trying to add the inode to the orphan inode list). Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-11-13Merge branch 'fscrypt' into originTheodore Ts'o1-2/+5
2016-11-13fscrypt: Let fs select encryption index/tweakDavid Gstir1-2/+2
Avoid re-use of page index as tweak for AES-XTS when multiple parts of same page are encrypted. This will happen on multiple (partial) calls of fscrypt_encrypt_page on same page. page->index is only valid for writeback pages. Signed-off-by: David Gstir <david@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13fscrypt: Enable partial page encryptionDavid Gstir1-2/+4
Not all filesystems work on full pages, thus we should allow them to hand partial pages to fscrypt for en/decryption. Signed-off-by: David Gstir <david@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13fscrypt: Allow fscrypt_decrypt_page() to function with non-writeback pagesDavid Gstir1-2/+3
Some filesystem might pass pages which do not have page->mapping->host set to the encrypted inode. We want the caller to explicitly pass the corresponding inode. Signed-off-by: David Gstir <david@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-08ext4: tell DAX the size of allocation holesRoss Zwisler1-0/+3
When DAX calls _ext4_get_block() and the file offset points to a hole we currently don't set bh->b_size. This is current worked around via buffer_size_valid() in fs/dax.c. _ext4_get_block() has the hole size information from ext4_map_blocks(), so populate bh->b_size so we can remove buffer_size_valid() in a later patch. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-04fs: Add helper to clean bdev aliases under a bh and use itJan Kara1-2/+1
Add a helper function that clears buffer heads from a block device aliasing passed bh. Use this helper function from filesystems instead of the original unmap_underlying_metadata() to save some boiler plate code and also have a better name for the functionalily since it is not unmapping anything for a *long* time. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-04ext4: Use clean_bdev_aliases() instead of iterationJan Kara1-11/+4
Use clean_bdev_aliases() instead of iterating through blocks one by one. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-10Merge branch 'work.misc' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted misc bits and pieces. There are several single-topic branches left after this (rename2 series from Miklos, current_time series from Deepa Dinamani, xattr series from Andreas, uaccess stuff from from me) and I'd prefer to send those separately" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits) proc: switch auxv to use of __mem_open() hpfs: support FIEMAP cifs: get rid of unused arguments of CIFSSMBWrite() posix_acl: uapi header split posix_acl: xattr representation cleanups fs/aio.c: eliminate redundant loads in put_aio_ring_file fs/internal.h: add const to ns_dentry_operations declaration compat: remove compat_printk() fs/buffer.c: make __getblk_slow() static proc: unsigned file descriptors fs/file: more unsigned file descriptors fs: compat: remove redundant check of nr_segs cachefiles: Fix attempt to read i_blocks after deleting file [ver #2] cifs: don't use memcpy() to copy struct iov_iter get rid of separate multipage fault-in primitives fs: Avoid premature clearing of capabilities fs: Give dentry to inode_change_ok() instead of inode fuse: Propagate dentry down to inode_change_ok() ceph: Propagate dentry down to inode_change_ok() xfs: Propagate dentry down to inode_change_ok() ...
2016-10-08Merge remote-tracking branch 'jk/vfs' into work.miscAl Viro1-1/+1
2016-09-30ext4: unmap metadata when zeroing blocksJan Kara1-1/+9
When zeroing blocks for DAX allocations, we also have to unmap aliases in the block device mappings. Otherwise writeback can overwrite zeros with stale data from block device page cache. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2016-09-30ext4: Allow parallel DIO readsJan Kara1-22/+18
We can easily support parallel direct IO reads. We only have to make sure we cannot expose uninitialized data by reading allocated block to which data was not written yet, or which was already truncated. That is easily achieved by holding inode_lock in shared mode - that excludes all writes, truncates, hole punches. We also have to guard against page writeback allocating blocks for delay-allocated pages - that race is handled by the fact that we writeback all the pages in the affected range and the lock protects us from new pages being created there. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-09-22ext4: allow DAX writeback for hole punchRoss Zwisler1-2/+2
Currently when doing a DAX hole punch with ext4 we fail to do a writeback. This is because the logic around filemap_write_and_wait_range() in ext4_punch_hole() only looks for dirty page cache pages in the radix tree, not for dirty DAX exceptional entries. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-09-22fs: Give dentry to inode_change_ok() instead of inodeJan Kara1-1/+1
inode_change_ok() will be resposible for clearing capabilities and IMA extended attributes and as such will need dentry. Give it as an argument to inode_change_ok() instead of an inode. Also rename inode_change_ok() to setattr_prepare() to better relect that it does also some modifications in addition to checks. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2016-09-15ext4: bugfix for mmaped pages in mpage_release_unused_pages()wangguang1-0/+2
Pages clear buffers after ext4 delayed block allocation failed, However, it does not clean its pte_dirty flag. if the pages unmap ,in cording to the pte_dirty , unmap_page_range may try to call __set_page_dirty, which may lead to the bugon at mpage_prepare_extent_to_map:head = page_buffers(page);. This patch just call clear_page_dirty_for_io to clean pte_dirty at mpage_release_unused_pages for pages mmaped. Steps to reproduce the bug: (1) mmap a file in ext4 addr = (char *)mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); memset(addr, 'i', 4096); (2) return EIO at ext4_writepages->mpage_map_and_submit_extent->mpage_map_one_extent which causes this log message to be print: ext4_msg(sb, KERN_CRIT, "Delayed block allocation failed for " "inode %lu at logical offset %llu with" " max blocks %u with error %d", inode->i_ino, (unsigned long long)map->m_lblk, (unsigned)map->m_len, -err); (3)Unmap the addr cause warning at __set_page_dirty:WARN_ON_ONCE(warn && !PageUptodate(page)); (4) wait for a minute,then bugon happen. Cc: stable@vger.kernel.org Signed-off-by: wangguang <wangguang03@zte.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-09-05ext4: remove old feature helpersKaho Ng1-4/+3
Use the ext4_{has,set,clear}_feature_* helpers to replace the old feature helpers. Signed-off-by: Kaho Ng <ngkaho1234@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2016-09-05ext4: reinforce check of i_dtime when clearing high fields of uid and gidDaeho Jeong1-4/+4
Now, ext4_do_update_inode() clears high 16-bit fields of uid/gid of deleted and evicted inode to fix up interoperability with old kernels. However, it checks only i_dtime of an inode to determine whether the inode was deleted and evicted, and this is very risky, because i_dtime can be used for the pointer maintaining orphan inode list, too. We need to further check whether the i_dtime is being used for the orphan inode list even if the i_dtime is not NULL. We found that high 16-bit fields of uid/gid of inode are unintentionally and permanently cleared when the inode truncation is just triggered, but not finished, and the inode metadata, whose high uid/gid bits are cleared, is written on disk, and the sudden power-off follows that in order. Cc: stable@vger.kernel.org Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Hobin Woo <hobin.woo@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-08-29Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Fix bugs that could cause kernel deadlocks or file system corruption while moving xattrs to expand the extended inode. Also add some sanity checks to the block group descriptors to make sure we don't end up overwriting the superblock" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: avoid deadlock when expanding inode size ext4: properly align shifted xattrs when expanding inodes ext4: fix xattr shifting when expanding inodes part 2 ext4: fix xattr shifting when expanding inodes ext4: validate that metadata blocks do not overlap superblock ext4: reserve xattr index for the Hurd
2016-08-11ext4: avoid deadlock when expanding inode sizeJan Kara1-2/+0
When we need to move xattrs into external xattr block, we call ext4_xattr_block_set() from ext4_expand_extra_isize_ea(). That may end up calling ext4_mark_inode_dirty() again which will recurse back into the inode expansion code leading to deadlocks. Protect from recursion using EXT4_STATE_NO_EXPAND inode flag and move its management into ext4_expand_extra_isize_ea() since its manipulation is safe there (due to xattr_sem) from possible races with ext4_xattr_set_handle() which plays with it as well. CC: stable@vger.kernel.org # 4.4.x Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-07-26Merge tag 'ext4_for_linus' of ↵Linus Torvalds1-26/+55
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "The major change this cycle is deleting ext4's copy of the file system encryption code and switching things over to using the copies in fs/crypto. I've updated the MAINTAINERS file to add an entry for fs/crypto listing Jaeguk Kim and myself as the maintainers. There are also a number of bug fixes, most notably for some problems found by American Fuzzy Lop (AFL) courtesy of Vegard Nossum. Also fixed is a writeback deadlock detected by generic/130, and some potential races in the metadata checksum code" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (21 commits) ext4: verify extent header depth ext4: short-cut orphan cleanup on error ext4: fix reference counting bug on block allocation error MAINTAINRES: fs-crypto maintainers update ext4 crypto: migrate into vfs's crypto engine ext2: fix filesystem deadlock while reading corrupted xattr block ext4: fix project quota accounting without quota limits enabled ext4: validate s_reserved_gdt_blocks on mount ext4: remove unused page_idx ext4: don't call ext4_should_journal_data() on the journal inode ext4: Fix WARN_ON_ONCE in ext4_commit_super() ext4: fix deadlock during page writeback ext4: correct error value of function verifying dx checksum ext4: avoid modifying checksum fields directly during checksum verification ext4: check for extents that wrap around jbd2: make journal y2038 safe jbd2: track more dependencies on transaction commit jbd2: move lockdep tracking to journal_s jbd2: move lockdep instrumentation for jbd2 handles ext4: respect the nobarrier mount option in nojournal mode ...
2016-07-10ext4 crypto: migrate into vfs's crypto engineJaegeuk Kim1-4/+4
This patch removes the most parts of internal crypto codes. And then, it modifies and adds some ext4-specific crypt codes to use the generic facility. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-07-04ext4: don't call ext4_should_journal_data() on the journal inodeVegard Nossum1-3/+3
If ext4_fill_super() fails early, it's possible for ext4_evict_inode() to call ext4_should_journal_data() before superblock options and flags are fully set up. In that case, the iput() on the journal inode can end up causing a BUG(). Work around this problem by reordering the tests so we only call ext4_should_journal_data() after we know it's not the journal inode. Fixes: 2d859db3e4 ("ext4: fix data corruption in inodes with journalled data") Fixes: 2b405bfa84 ("ext4: fix data=journal fast mount/umount hang") Cc: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-07-04ext4: fix deadlock during page writebackJan Kara1-3/+26
Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a deadlock in ext4_writepages() which was previously much harder to hit. After this commit xfstest generic/130 reproduces the deadlock on small filesystems. The problem happens when ext4_do_update_inode() sets LARGE_FILE feature and marks current inode handle as synchronous. That subsequently results in ext4_journal_stop() called from ext4_writepages() to block waiting for transaction commit while still holding page locks, reference to io_end, and some prepared bio in mpd structure each of which can possibly block transaction commit from completing and thus results in deadlock. Fix the problem by releasing page locks, io_end reference, and submitting prepared bio before calling ext4_journal_stop(). [ Changed to defer the call to ext4_journal_stop() only if the handle is synchronous. --tytso ] Reported-and-tested-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2016-07-03ext4: avoid modifying checksum fields directly during checksum verificationDaeho Jeong1-16/+22
We temporally change checksum fields in buffers of some types of metadata into '0' for verifying the checksum values. By doing this without locking the buffer, some metadata's checksums, which are being committed or written back to the storage, could be damaged. In our test, several metadata blocks were found with damaged metadata checksum value during recovery process. When we only verify the checksum value, we have to avoid modifying checksum fields directly. Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2016-06-07fs: have ll_rw_block users pass in op and flags separatelyMike Christie1-3/+3
This has ll_rw_block users pass in the operation and flags separately, so ll_rw_block can setup the bio op and bi_rw flags on the bio that is submitted. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>