summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2014-03-13fs: push sync_filesystem() down to the file system's remount_fs()Theodore Ts'o44-2/+53
Previously, the no-op "mount -o mount /dev/xxx" operation when the file system is already mounted read-write causes an implied, unconditional syncfs(). This seems pretty stupid, and it's certainly documented or guaraunteed to do this, nor is it particularly useful, except in the case where the file system was mounted rw and is getting remounted read-only. However, it's possible that there might be some file systems that are actually depending on this behavior. In most file systems, it's probably fine to only call sync_filesystem() when transitioning from read-write to read-only, and there are some file systems where this is not needed at all (for example, for a pseudo-filesystem or something like romfs). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-fsdevel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Evgeniy Dushistov <dushistov@mail.ru> Cc: Jan Kara <jack@suse.cz> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Anders Larsen <al@alarsen.net> Cc: Phillip Lougher <phillip@squashfs.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: xfs@oss.sgi.com Cc: linux-btrfs@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: codalist@coda.cs.cmu.edu Cc: linux-ext4@vger.kernel.org Cc: linux-f2fs-devel@lists.sourceforge.net Cc: fuse-devel@lists.sourceforge.net Cc: cluster-devel@redhat.com Cc: linux-mtd@lists.infradead.org Cc: jfs-discussion@lists.sourceforge.net Cc: linux-nfs@vger.kernel.org Cc: linux-nilfs@vger.kernel.org Cc: linux-ntfs-dev@lists.sourceforge.net Cc: ocfs2-devel@oss.oracle.com Cc: reiserfs-devel@vger.kernel.org
2014-03-12jbd2: improve error messages for inconsistent journal headsTheodore Ts'o2-19/+24
Fix up error messages printed when the transaction pointers in a journal head are inconsistent. This improves the error messages which are printed when running xfstests generic/068 in data=journal mode. See the bug report at: https://bugzilla.kernel.org/show_bug.cgi?id=60786 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-03-09jbd2: minimize region locked by j_list_lock in jbd2_journal_forget()Theodore Ts'o1-2/+4
It's not needed until we start trying to modifying fields in the journal_head which are protected by j_list_lock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-03-09jbd2: minimize region locked by j_list_lock in journal_get_create_access()Theodore Ts'o1-1/+2
It's not needed until we start trying to modifying fields in the journal_head which are protected by j_list_lock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-03-09jbd2: check jh->b_transaction without taking j_list_lockTheodore Ts'o1-2/+2
jh->b_transaction is adequately protected for reading by the jbd_lock_bh_state(bh), so we don't need to take j_list_lock in __journal_try_to_free_buffer(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-03-08jbd2: add transaction to checkpoint list earlierTheodore Ts'o1-19/+20
We don't otherwise need j_list_lock during the rest of commit phase #7, so add the transaction to the checkpoint list at the very end of commit phase #6. This allows us to drop j_list_lock earlier, which is a good thing since it is a super hot lock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-03-08jbd2: calculate statistics without holding j_state_lock and j_list_lockTheodore Ts'o1-18/+18
The two hottest locks, and thus the biggest scalability bottlenecks, in the jbd2 layer, are the j_list_lock and j_state_lock. This has inspired some people to do some truly unnatural things[1]. [1] https://www.usenix.org/system/files/conference/fast14/fast14-paper_kang.pdf We don't need to be holding both j_state_lock and j_list_lock while calculating the journal statistics, so move those calculations to the very end of jbd2_journal_commit_transaction. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-03-08jbd2: don't hold j_state_lock while calling wake_up()Theodore Ts'o1-2/+2
The j_state_lock is one of the hottest locks in the jbd2 layer and thus one of its scalability bottlenecks. We don't need to be holding the j_state_lock while we are calling wake_up(&journal->j_wait_commit), so release the lock a little bit earlier. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-03-08jbd2: don't unplug after writing revoke recordsTheodore Ts'o1-2/+0
During commit process, keep the block device plugged after we are done writing the revoke records, until we are finished writing the rest of the commit records in the journal. This will allow most of the journal blocks to be written in a single I/O operation, instead of separating the the revoke blocks from the rest of the journal blocks. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-03-04ext4: Speedup WB_SYNC_ALL pass called from sync(2)Jan Kara1-2/+11
When doing filesystem wide sync, there's no need to force transaction commit (or synchronously write inode buffer) separately for each inode because ext4_sync_fs() takes care of forcing commit at the end (VFS takes care of flushing buffer cache, respectively). Most of the time this slowness doesn't manifest because previous WB_SYNC_NONE writeback doesn't leave much to write but when there are processes aggressively creating new files and several filesystems to sync, the sync slowness can be noticeable. In the following test script sync(1) takes around 6 minutes when there are two ext4 filesystems mounted on a standard SATA drive. After this patch sync takes a couple of seconds so we have about two orders of magnitude improvement. function run_writers { for (( i = 0; i < 10; i++ )); do mkdir $1/dir$i for (( j = 0; j < 40000; j++ )); do dd if=/dev/zero of=$1/dir$i/$j bs=4k count=4 &>/dev/null done & done } for dir in "$@"; do run_writers $dir done sleep 40 time sync Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-02-23ext4: Add support FALLOC_FL_COLLAPSE_RANGE for fallocateNamjae Jeon4-3/+342
This patch implements fallocate's FALLOC_FL_COLLAPSE_RANGE for Ext4. The semantics of this flag are following: 1) It collapses the range lying between offset and length by removing any data blocks which are present in this range and than updates all the logical offsets of extents beyond "offset + len" to nullify the hole created by removing blocks. In short, it does not leave a hole. 2) It should be used exclusively. No other fallocate flag in combination. 3) Offset and length supplied to fallocate should be fs block size aligned in case of xfs and ext4. 4) Collaspe range does not work beyond i_size. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Tested-by: Dongsu Park <dongsu.park@profitbricks.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-02-22ext4: translate fallocate mode bits to stringsLukas Czerner3-3/+8
Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-02-20ext4: merge uninitialized extentsDarrick J. Wong1-4/+17
Allow for merging uninitialized extents. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-02-20ext4: avoid exposure of stale data in ext4_punch_hole()Maxim Patlasov1-0/+6
While handling punch-hole fallocate, it's useless to truncate page cache before removing the range from extent tree (or block map in indirect case) because page cache can be re-populated (by read-ahead or read(2) or mmap-ed read) immediately after truncating page cache, but before updating extent tree (or block map). In that case the user will see stale data even after fallocate is completed. Until the problem of data corruption resulting from pages backed by already freed blocks is fully resolved, the simple thing we can do now is to add another truncation of pagecache after punch hole is done. Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2014-02-20ext4: silence warnings in extent status tree debugging codeEric Whitney1-7/+7
Adjust the conversion specifications in a few optionally compiled debug messages to match the return type of ext4_es_status(). Also, make a couple of minor grammatical message edits while we're at it. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-02-20ext4: remove unused ac_ex_scannedEric Sandeen2-4/+1
When looking at a bug report with: > kernel: EXT4-fs: 0 scanned, 0 found I thought wow, 0 scanned, that's odd? But it's not odd; it's printing a variable that is initialized to 0 and never touched again. It's never been used since the original merge, so I don't really even know what the original intent was, either. If anyone knows how to hook it up, speak now via patch, otherwise just yank it so it's not making a confusing situation more confusing in kernel logs. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-02-20ext4: avoid possible overflow in ext4_map_blocks()Theodore Ts'o1-0/+6
The ext4_map_blocks() function returns the number of blocks which satisfying the caller's request. This number of blocks requested by the caller is specified by an unsigned integer, but the return value of ext4_map_blocks() is a signed integer (to accomodate error codes per the kernel's standard error signalling convention). Historically, overflows could never happen since mballoc() will refuse to allocate more than 2048 blocks at a time (which is something we should fix), and if the blocks were already allocated, the fact that there would be some number of intervening metadata blocks pretty much guaranteed that there could never be a contiguous region of data blocks that was greater than 2**31 blocks. However, this is now possible if there is a file system which is a bit bigger than 8TB, and is created using the new mke2fs hugeblock feature, which can create a perfectly contiguous file. In that case, if a userspace program attempted to call fallocate() on this already fully allocated file, it's possible that ext4_map_blocks() could return a number large enough that it would overflow a signed integer, resulting in a ext4 thinking that the ext4_map_blocks() call had failed with some strange error code. Since ext4_map_blocks() is always free to return a smaller number of blocks than what was requested by the caller, fix this by capping the number of blocks that ext4_map_blocks() will ever try to map to 2**31 - 1. In practice this should never get hit, except by someone deliberately trying to provke the above-described bug. Thanks to the PaX team for asking whethre this could possibly happen in some off-line discussions about using some static code checking technology they are developing to find bugs in kernel code. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-02-20ext4: make sure ex.fe_logical is initializedTheodore Ts'o1-1/+3
The lowest levels of mballoc set all of the fields of struct ext4_free_extent except for fe_logical, since they are just trying to find the requested free set of blocks, and the logical block hasn't been set yet. This makes some static code checkers sad. Set it to various different debug values, which would be useful when debugging mballoc if these values were to ever show up due to the parts of mballoc triyng to use ac->ac_b_ex.fe_logical before it is properly upper layers of mballoc failing to properly set, usually by ext4_mb_use_best_found(). Addresses-Coverity-Id: #139697 Addresses-Coverity-Id: #139698 Addresses-Coverity-Id: #139699 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-02-19ext4: don't calculate total xattr header size unless neededTheodore Ts'o1-4/+4
The function ext4_expand_extra_isize_ea() doesn't need the size of all of the extended attribute headers. So if we don't calculate it when it is unneeded, it we can skip some undeeded memory references, and as a bonus, we eliminate some kvetching by static code analysis tools. Addresses-Coverity-Id: #741291 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-02-19ext4: add ext4_es_store_pblock_status()Theodore Ts'o2-8/+15
Avoid false positives by static code analysis tools such as sparse and coverity caused by the fact that we set the physical block, and then the status in the extent_status structure. It is also more efficient to set both of these values at once. Addresses-Coverity-Id: #989077 Addresses-Coverity-Id: #989078 Addresses-Coverity-Id: #1080722 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
2014-02-19ext4: fix error return from ext4_ext_handle_uninitialized_extents()Eric Whitney1-2/+6
Commit 3779473246 breaks the return of error codes from ext4_ext_handle_uninitialized_extents() in ext4_ext_map_blocks(). A portion of the patch assigns that function's signed integer return value to an unsigned int. Consequently, negatively valued error codes are lost and can be treated as a bogus allocated block count. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-02-17ext4: address a benign compiler warningPatrick Palka1-1/+1
When !defined(CONFIG_EXT4_DEBUG), mb_debug() should be defined as a no_printk() statement instead of an empty statement in order to suppress the following compiler warning: fs/ext4/mballoc.c: In function ‘ext4_mb_cleanup_pa’: fs/ext4/mballoc.c:2659:47: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] mb_debug(1, "mballoc: %u PAs left\n", count); Signed-off-by: Patrick Palka <patrick@parcs.ath.cx> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-02-17jbd2: mark file-local functions as staticRashika Kheria1-3/+3
Mark functions as static in jbd2/journal.c because they are not used outside this file. This eliminates the following warning in jbd2/journal.c: fs/jbd2/journal.c:125:5: warning: no previous prototype for ‘jbd2_verify_csum_type’ [-Wmissing-prototypes] fs/jbd2/journal.c:146:5: warning: no previous prototype for ‘jbd2_superblock_csum_verify’ [-Wmissing-prototypes] fs/jbd2/journal.c:154:6: warning: no previous prototype for ‘jbd2_superblock_csum_set’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2014-02-17ext4: remove an unneeded check in mext_page_mkuptodate()Dan Carpenter1-2/+1
"err" is zero here, there is no need to check again. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-02-17ext4: clean up error handling in swap_inode_boot_loader()Theodore Ts'o1-18/+6
Tighten up the code to make the code easier to read and maintain. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-02-17ext4: Add __init marking to init_inodecacheFabian Frederick1-1/+1
init_inodecache is only called by __init init_ext4_fs. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2014-02-17jbd2: fix use after free in jbd2_journal_start_reserved()Dan Carpenter1-2/+4
If start_this_handle() fails then it leads to a use after free of "handle". Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-02-16ext4: don't leave i_crtime.tv_sec uninitializedTheodore Ts'o1-0/+2
If the i_crtime field is not present in the inode, don't leave the field uninitialized. Fixes: ef7f38359 ("ext4: Add nanosecond timestamps") Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Tested-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-02-15ext4: fix online resize with a non-standard blocks per group settingTheodore Ts'o1-1/+1
The set_flexbg_block_bitmap() function assumed that the number of blocks in a blockgroup was sb->blocksize * 8, which is normally true, but not always! Use EXT4_BLOCKS_PER_GROUP(sb) instead, to fix block bitmap corruption after: mke2fs -t ext4 -g 3072 -i 4096 /dev/vdd 1G mount -t ext4 /dev/vdd /vdd resize2fs /dev/vdd 8G Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: Jon Bernard <jbernard@tuxion.com> Cc: stable@vger.kernel.org
2014-02-15ext4: fix online resize with very large inode tablesTheodore Ts'o1-12/+20
If a file system has a large number of inodes per block group, all of the metadata blocks in a flex_bg may be larger than what can fit in a single block group. Unfortunately, ext4_alloc_group_tables() in resize.c was never tested to see if it would handle this case correctly, and there were a large number of bugs which caused the following sequence to result in a BUG_ON: kernel bug at fs/ext4/resize.c:409! ... call trace: [<ffffffff81256768>] ext4_flex_group_add+0x1448/0x1830 [<ffffffff81257de2>] ext4_resize_fs+0x7b2/0xe80 [<ffffffff8123ac50>] ext4_ioctl+0xbf0/0xf00 [<ffffffff811c111d>] do_vfs_ioctl+0x2dd/0x4b0 [<ffffffff811b9df2>] ? final_putname+0x22/0x50 [<ffffffff811c1371>] sys_ioctl+0x81/0xa0 [<ffffffff81676aa9>] system_call_fastpath+0x16/0x1b code: c8 4c 89 df e8 41 96 f8 ff 44 89 e8 49 01 c4 44 29 6d d4 0 rip [<ffffffff81254fa1>] set_flexbg_block_bitmap+0x171/0x180 This can be reproduced with the following command sequence: mke2fs -t ext4 -i 4096 /dev/vdd 1G mount -t ext4 /dev/vdd /vdd resize2fs /dev/vdd 8G To fix this, we need to make sure the right thing happens when a block group's inode table straddles two block groups, which means the following bugs had to be fixed: 1) Not clearing the BLOCK_UNINIT flag in the second block group in ext4_alloc_group_tables --- the was proximate cause of the BUG_ON. 2) Incorrectly determining how many block groups contained contiguous free blocks in ext4_alloc_group_tables(). 3) Incorrectly setting the start of the next block range to be marked in use after a discontinuity in setup_new_flex_group_blocks(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-02-12ext4: don't try to modify s_flags if the the file system is read-onlyTheodore Ts'o1-7/+13
If an ext4 file system is created by some tool other than mke2fs (perhaps by someone who has a pathalogical fear of the GPL) that doesn't set one or the other of the EXT2_FLAGS_{UN}SIGNED_HASH flags, and that file system is then mounted read-only, don't try to modify the s_flags field. Otherwise, if dm_verity is in use, the superblock will change, causing an dm_verity failure. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2014-02-12ext4: fix error paths in swap_inode_boot_loader() Zheng Liu1-1/+2
In swap_inode_boot_loader() we forgot to release ->i_mutex and resume unlocked dio for inode and inode_bl if there is an error starting the journal handle. This commit fixes this issue. Reported-by: Ahmed Tamrawi <ahmedtamrawi@gmail.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Dr. Tilmann Bubeck <t.bubeck@reinform.de> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org # v3.10+
2014-02-12ext4: fix xfstest generic/299 block validity failuresEric Whitney1-0/+1
Commit a115f749c1 (ext4: remove wait for unwritten extent conversion from ext4_truncate) exposed a bug in ext4_ext_handle_uninitialized_extents(). It can be triggered by xfstest generic/299 when run on a test file system created without a journal. This test continuously fallocates and truncates files to which random dio/aio writes are simultaneously performed by a separate process. The test completes successfully, but if the test filesystem is mounted with the block_validity option, a warning message stating that a logical block has been mapped to an illegal physical block is posted in the kernel log. The bug occurs when an extent is being converted to the written state by ext4_end_io_dio() and ext4_ext_handle_uninitialized_extents() discovers a mapping for an existing uninitialized extent. Although it sets EXT4_MAP_MAPPED in map->m_flags, it fails to set map->m_pblk to the discovered physical block number. Because map->m_pblk is not otherwise initialized or set by this function or its callers, its uninitialized value is returned to ext4_map_blocks(), where it is stored as a bogus mapping in the extent status tree. Since map->m_pblk can accidentally contain illegal values that are larger than the physical size of the file system, calls to check_block_validity() in ext4_map_blocks() that are enabled if the block_validity mount option is used can fail, resulting in the logged warning message. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org # 3.11+
2014-02-09Linux 3.14-rc2v3.14-rc2Linus Torvalds1-1/+1
2014-02-09Merge branch 'for-linus' of ↵Linus Torvalds2-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull SELinux fixes from James Morris. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: SELinux: Fix kernel BUG on empty security contexts. selinux: add SOCK_DIAG_BY_FAMILY to the list of netlink message types
2014-02-09Merge branch 'for-linus' of ↵Linus Torvalds8-31/+17
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "A couple of fixes, both -stable fodder. The O_SYNC bug is fairly old..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix a kmap leak in virtio_console fix O_SYNC|O_APPEND syncing the wrong range on write()
2014-02-10Merge branch 'stable-3.14' of git://git.infradead.org/users/pcmoore/selinux ↵James Morris2-0/+6
into for-linus
2014-02-09fix a kmap leak in virtio_consoleAl Viro1-6/+3
While we are at it, don't do kmap() under kmap_atomic(), *especially* for a page we'd allocated with GFP_KERNEL. It's spelled "page_address", and had that been more than that, we'd have a real trouble - kmap_high() can block, and doing that while holding kmap_atomic() is a Bad Idea(tm). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-02-09fix O_SYNC|O_APPEND syncing the wrong range on write()Al Viro7-25/+14
It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support) when sync_page_range() had been introduced; generic_file_write{,v}() correctly synced pos_after_write - written .. pos_after_write - 1 but generic_file_aio_write() synced pos_before_write .. pos_before_write + written - 1 instead. Which is not the same thing with O_APPEND, obviously. A couple of years later correct variant had been killed off when everything switched to use of generic_file_aio_write(). All users of generic_file_aio_write() are affected, and the same bug has been copied into other instances of ->aio_write(). The fix is trivial; the only subtle point is that generic_write_sync() ought to be inlined to avoid calculations useless for the majority of calls. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-02-09Merge branch 'for-linus' of ↵Linus Torvalds4-8/+9
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "This is a small collection of fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix data corruption when reading/updating compressed extents Btrfs: don't loop forever if we can't run because of the tree mod log btrfs: reserve no transaction units in btrfs_ioctl_set_features btrfs: commit transaction after setting label and features Btrfs: fix assert screwup for the pending move stuff
2014-02-09Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds14-63/+162
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Tooling fixes, mostly related to the KASLR fallout, but also other fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf buildid-cache: Check relocation when checking for existing kcore perf tools: Adjust kallsyms for relocated kernel perf tests: No need to set up ref_reloc_sym perf symbols: Prevent the use of kcore if the kernel has moved perf record: Get ref_reloc_sym from kernel map perf machine: Set up ref_reloc_sym in machine__create_kernel_maps() perf machine: Add machine__get_kallsyms_filename() perf tools: Add kallsyms__get_function_start() perf symbols: Fix symbol annotation for relocated kernel perf tools: Fix include for non x86 architectures perf tools: Fix AAAAARGH64 memory barriers perf tools: Demangle kernel and kernel module symbols too perf/doc: Remove mention of non-existent set_perf_event_pending() from design.txt
2014-02-08Btrfs: fix data corruption when reading/updating compressed extentsFilipe David Borba Manana1-0/+2
When using a mix of compressed file extents and prealloc extents, it is possible to fill a page of a file with random, garbage data from some unrelated previous use of the page, instead of a sequence of zeroes. A simple sequence of steps to get into such case, taken from the test case I made for xfstests, is: _scratch_mkfs _scratch_mount "-o compress-force=lzo" $XFS_IO_PROG -f -c "pwrite -S 0x06 -b 18670 266978 18670" $SCRATCH_MNT/foobar $XFS_IO_PROG -c "falloc 26450 665194" $SCRATCH_MNT/foobar $XFS_IO_PROG -c "truncate 542872" $SCRATCH_MNT/foobar $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar This results in the following file items in the fs tree: item 4 key (257 INODE_ITEM 0) itemoff 15879 itemsize 160 inode generation 6 transid 6 size 542872 block group 0 mode 100600 item 5 key (257 INODE_REF 256) itemoff 15863 itemsize 16 inode ref index 2 namelen 6 name: foobar item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53 extent data disk byte 0 nr 0 gen 6 extent data offset 0 nr 24576 ram 266240 extent compression 0 item 7 key (257 EXTENT_DATA 24576) itemoff 15757 itemsize 53 prealloc data disk byte 12849152 nr 241664 gen 6 prealloc data offset 0 nr 241664 item 8 key (257 EXTENT_DATA 266240) itemoff 15704 itemsize 53 extent data disk byte 12845056 nr 4096 gen 6 extent data offset 0 nr 20480 ram 20480 extent compression 2 item 9 key (257 EXTENT_DATA 286720) itemoff 15651 itemsize 53 prealloc data disk byte 13090816 nr 405504 gen 6 prealloc data offset 0 nr 258048 The on disk extent at offset 266240 (which corresponds to 1 single disk block), contains 5 compressed chunks of file data. Each of the first 4 compress 4096 bytes of file data, while the last one only compresses 3024 bytes of file data. Therefore a read into the file region [285648 ; 286720[ (length = 4096 - 3024 = 1072 bytes) should always return zeroes (our next extent is a prealloc one). The solution here is the compression code path to zero the remaining (untouched) bytes of the last page it uncompressed data into, as the information about how much space the file data consumes in the last page is not known in the upper layer fs/btrfs/extent_io.c:__do_readpage(). In __do_readpage we were correctly zeroing the remainder of the page but only if it corresponds to the last page of the inode and if the inode's size is not a multiple of the page size. This would cause not only returning random data on reads, but also permanently storing random data when updating parts of the region that should be zeroed. For the example above, it means updating a single byte in the region [285648 ; 286720[ would store that byte correctly but also store random data on disk. A test case for xfstests follows soon. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-02-08Btrfs: don't loop forever if we can't run because of the tree mod logJosef Bacik1-0/+1
A user reported a 100% cpu hang with my new delayed ref code. Turns out I forgot to increase the count check when we can't run a delayed ref because of the tree mod log. If we can't run any delayed refs during this there is no point in continuing to look, and we need to break out. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-02-08btrfs: reserve no transaction units in btrfs_ioctl_set_featuresDavid Sterba1-1/+1
Added in patch "btrfs: add ioctls to query/change feature bits online" modifications to superblock don't need to reserve metadata blocks when starting a transaction. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2014-02-08btrfs: commit transaction after setting label and featuresJeff Mahoney1-2/+2
The set_fslabel ioctl uses btrfs_end_transaction, which means it's possible that the change will be lost if the system crashes, same for the newly set features. Let's use btrfs_commit_transaction instead. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2014-02-08Btrfs: fix assert screwup for the pending move stuffJosef Bacik1-5/+3
Wang noticed that he was failing btrfs/030 even though me and Filipe couldn't reproduce. Turns out this is because Wang didn't have CONFIG_BTRFS_ASSERT set, which meant that a key part of Filipe's original patch was not being built in. This appears to be a mess up with merging Filipe's patch as it does not exist in his original patch. Fix this by changing how we make sure del_waiting_dir_move asserts that it did not error and take the function out of the ifdef check. This makes btrfs/030 pass with the assert on or off. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Filipe Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-02-08Merge tag 'pinctrl-v3.14-2' of ↵Linus Torvalds6-15/+32
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pinctrl fixes from Linus Walleij: "First round of pin control fixes for v3.14: - Protect pinctrl_list_add() with the proper mutex. This was identified by RedHat. Caused nasty locking warnings was rootcased by Stanislaw Gruszka. - Avoid adding dangerous debugfs files when either half of the subsystem is unused: pinmux or pinconf. - Various fixes to various drivers: locking, hardware particulars, DT parsing, error codes" * tag 'pinctrl-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: tegra: return correct error type pinctrl: do not init debugfs entries for unimplemented functionalities pinctrl: protect pinctrl_list add pinctrl: sirf: correct the pin index of ac97_pins group pinctrl: imx27: fix offset calculation in imx_read_2bit pinctrl: vt8500: Change devicetree data parsing pinctrl: imx27: fix wrong offset to ICONFB pinctrl: at91: use locked variant of irq_set_handler
2014-02-08Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "Add a missing Kconfig dependency" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Generic irq chip requires IRQ_DOMAIN
2014-02-08Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds18-84/+138
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "Quite a varied little collection of fixes. Most of them are relatively small or isolated; the biggest one is Mel Gorman's fixes for TLB range flushing. A couple of AMD-related fixes (including not crashing when given an invalid microcode image) and fix a crash when compiled with gcov" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode, AMD: Unify valid container checks x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y x86/efi: Allow mapping BGRT on x86-32 x86: Fix the initialization of physnode_map x86, cpu hotplug: Fix stack frame warning in check_irq_vectors_for_cpu_disable() x86/intel/mid: Fix X86_INTEL_MID dependencies arch/x86/mm/srat: Skip NUMA_NO_NODE while parsing SLIT mm, x86: Revisit tlb_flushall_shift tuning for page flushes except on IvyBridge x86: mm: change tlb_flushall_shift for IvyBridge x86/mm: Eliminate redundant page table walk during TLB range flushing x86/mm: Clean up inconsistencies when flushing TLB ranges mm, x86: Account for TLB flushes only when debugging x86/AMD/NB: Fix amd_set_subcaches() parameter type x86/quirks: Add workaround for AMD F16h Erratum792 x86, doc, kconfig: Fix dud URL for Microcode data
2014-02-08Merge tag 'jfs-3.14-rc2' of git://github.com/kleikamp/linux-shaggyLinus Torvalds1-7/+7
Pull jfs fix from David Kleikamp: "Fix regression" * tag 'jfs-3.14-rc2' of git://github.com/kleikamp/linux-shaggy: jfs: fix generic posix ACL regression