summaryrefslogtreecommitdiffstats
path: root/fs/gfs2
AgeCommit message (Collapse)AuthorFilesLines
2017-10-31gfs2: Fix xattr fsyncAndreas Gruenbacher1-32/+8
Make sure that changing xattrs marks the corresponding inode dirty so that a subsequent fsync will sync those changes to disk. We set I_DIRTY_SYNC as well as I_DIRTY_DATASYNC so that both fsync and fdatasync will sync xattr changes: xattrs can contain information critical to how the data can be accessed, so we don't want fdatasync to skip them. Fixes xfstest generic/066. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Andrew Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31GFS2: Take inode off order_write list when setting jdata flagBob Peterson1-1/+3
This patch fixes a deadlock caused when the jdata flag is set for inodes that are already on the ordered write list. Since it is on the ordered write list, log_flush calls gfs2_ordered_write which calls filemap_fdatawrite. But since the inode had the jdata flag set, that calls gfs2_jdata_writepages, which tries to start a new transaction. A new transaction cannot be started because it tries to acquire the log_flush rwsem which is already locked by the log flush operation. The bottom line is: We cannot switch an inode from ordered to jdata until we eliminate any ordered data pages (via log flush) or any log_flush operation afterward will create the circular dependency above. So we need to flush the log before setting the diskflags to switch the file mode, then we need to remove the inode from the ordered writes list. Before this patch, the log flush was done for jdata->ordered, but that's wrong. If we're going from jdata to ordered, we don't need to call gfs2_log_flush because the call to filemap_fdatawrite will do it for us: filemap_fdatawrite() -> __filemap_fdatawrite_range() __filemap_fdatawrite_range() -> do_writepages() do_writepages() -> gfs2_jdata_writepages() gfs2_jdata_writepages() -> gfs2_log_flush() This patch modifies function do_gfs2_set_flags so that if a file has its jdata flag set, and it's already on the ordered write list, the log will be flushed and it will be removed from the list before setting the flag. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Abhijith Das <adas@redhat.com>
2017-10-31GFS2: flush the log and all pages for jdata as we do for WB_SYNC_ALLBob Peterson1-2/+3
In function gfs2_write_inode, starting with patch a9185b41a4f84, we only flush the log and call filemap_fdatawait if we're passed in a wbc sync_mode of WB_SYNC_ALL. We also need to do these things if we're evicting a jdata inode, because we might have jdata pages still attached to bufdata descriptors that need to be revoked, but by the time it gets to evict() it's too late to start a new transaction. This patch changes it to treat jdata inodes as if WB_SYNC_ALL had been specified. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Abhijith Das <adas@redhat.com>
2017-10-31gfs2: Implement SEEK_HOLE / SEEK_DATA via iomapAndreas Gruenbacher3-3/+54
So far, lseek on gfs2 did not report holes. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-10-31GFS2: Switch fiemap implementation to use iomapBob Peterson2-25/+10
This patch switches GFS2's implementation of fiemap from the old block_map code to the new iomap interface. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-10-31GFS2: Implement iomap for block_mapBob Peterson3-68/+274
This patch implements iomap for block mapping, and switches the block_map function to use it under the covers. The additional IOMAP_F_BOUNDARY iomap flag indicates when iomap has reached a "metadata boundary" and fetching the next mapping is likely to incur an additional I/O. This flag is used for setting the bh buffer boundary flag. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-10-31GFS2: Make height info part of metapathBob Peterson1-47/+62
This patch eliminates height parameters from function gfs2_bmap_alloc. Function find_metapath determines the metapath's "find height", also known as the desired height. Function lookup_metapath determines the metapath's "actual height", previously known as starting height or sheight. Function gfs2_bmap_alloc now gets both height values from the metapath. This simplification was done as a step toward switching the block_map functions to using iomap. The bh_map responsibilities are also removed from function gfs2_bmap_alloc for the same reason. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-10-16Merge branch 'master' of ↵Andreas Gruenbacher6-8/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
2017-09-25Merge tag 'gfs2-for-linus-4.14-rc3' of ↵Linus Torvalds1-9/+5
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fix from Bob Peterson: "GFS2: Fix an old regression in GFS2's debugfs interface This fixes a regression introduced by commit 88ffbf3e037e ("GFS2: Use resizable hash table for glocks"). The regression caused the glock dump in debugfs to not report all the glocks, which makes debugging extremely difficult" * tag 'gfs2-for-linus-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix debugfs glocks dump
2017-09-25gfs2: Always update inode ctime in set_aclAndreas Gruenbacher1-0/+1
Three-entry POSIX ACLs can be stored in the file mode permission bits, with no need to store them in extended attributes. When a process sets such a minimal ACL, the kernel updates the file mode like chmod does, and removes any existing extended attributes for that ACL. Make sure the ctime is always updated in that case. Fixes xfstest generic/307. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-09-25gfs2: Support negative atimesAndreas Gruenbacher1-1/+2
When inodes are read from disk, GFS2 will only update in-memory atimes older than the on-disk atimes; this prevents atimes from going backwards. The atimes of newly allocated inodes are initialized to 0. This means that when an atime is explicitly set to a negative value, this value will not persist. Fix by setting the atime of newly allocated inodes to the lowest possible value instead of 0. Fixes xfstest generic/258. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-09-25gfs2: Update ctime in setflags ioctlAndreas Gruenbacher1-0/+1
The FS_IOC_SETFLAGS ioctl is supposed to update the inode ctime. Fixes xfstests generic/277. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-09-25gfs2: Clarify gfs2_block_mapAndreas Gruenbacher1-4/+10
Add a comment about the logical block size for directories. Rename "bsize" in gfs2_block_map to "factor". Fix a typo in the description of metaptr1. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-09-25gfs2: Fix debugfs glocks dumpAndreas Gruenbacher1-9/+5
The switch to rhashtables (commit 88ffbf3e03) broke the debugfs glock dump (/sys/kernel/debug/gfs2/<device>/glocks) for dumps bigger than a single buffer: the right function for restarting an rhashtable iteration from the beginning of the hash table is rhashtable_walk_enter; rhashtable_walk_stop + rhashtable_walk_start will just resume from the current position. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Cc: stable@vger.kernel.org # v4.3+
2017-09-14Merge branch 'work.mount' of ↵Linus Torvalds6-8/+8
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull mount flag updates from Al Viro: "Another chunk of fmount preparations from dhowells; only trivial conflicts for that part. It separates MS_... bits (very grotty mount(2) ABI) from the struct super_block ->s_flags (kernel-internal, only a small subset of MS_... stuff). This does *not* convert the filesystems to new constants; only the infrastructure is done here. The next step in that series is where the conflicts would be; that's the conversion of filesystems. It's purely mechanical and it's better done after the merge, so if you could run something like list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$') sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \ -e 's/\<MS_NOSUID\>/SB_NOSUID/g' \ -e 's/\<MS_NODEV\>/SB_NODEV/g' \ -e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \ -e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \ -e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \ -e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \ -e 's/\<MS_NOATIME\>/SB_NOATIME/g' \ -e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \ -e 's/\<MS_SILENT\>/SB_SILENT/g' \ -e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \ -e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \ -e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \ -e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \ $list and commit it with something along the lines of 'convert filesystems away from use of MS_... constants' as commit message, it would save a quite a bit of headache next cycle" * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: VFS: Differentiate mount flags (MS_*) from internal superblock flags VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags
2017-09-07Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-blockLinus Torvalds3-3/+3
Pull block layer updates from Jens Axboe: "This is the first pull request for 4.14, containing most of the code changes. It's a quiet series this round, which I think we needed after the churn of the last few series. This contains: - Fix for a registration race in loop, from Anton Volkov. - Overflow complaint fix from Arnd for DAC960. - Series of drbd changes from the usual suspects. - Conversion of the stec/skd driver to blk-mq. From Bart. - A few BFQ improvements/fixes from Paolo. - CFQ improvement from Ritesh, allowing idling for group idle. - A few fixes found by Dan's smatch, courtesy of Dan. - A warning fixup for a race between changing the IO scheduler and device remova. From David Jeffery. - A few nbd fixes from Josef. - Support for cgroup info in blktrace, from Shaohua. - Also from Shaohua, new features in the null_blk driver to allow it to actually hold data, among other things. - Various corner cases and error handling fixes from Weiping Zhang. - Improvements to the IO stats tracking for blk-mq from me. Can drastically improve performance for fast devices and/or big machines. - Series from Christoph removing bi_bdev as being needed for IO submission, in preparation for nvme multipathing code. - Series from Bart, including various cleanups and fixes for switch fall through case complaints" * 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits) kernfs: checking for IS_ERR() instead of NULL drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set drbd: Fix allyesconfig build, fix recent commit drbd: switch from kmalloc() to kmalloc_array() drbd: abort drbd_start_resync if there is no connection drbd: move global variables to drbd namespace and make some static drbd: rename "usermode_helper" to "drbd_usermode_helper" drbd: fix race between handshake and admin disconnect/down drbd: fix potential deadlock when trying to detach during handshake drbd: A single dot should be put into a sequence. drbd: fix rmmod cleanup, remove _all_ debugfs entries drbd: Use setup_timer() instead of init_timer() to simplify the code. drbd: fix potential get_ldev/put_ldev refcount imbalance during attach drbd: new disk-option disable-write-same drbd: Fix resource role for newly created resources in events2 drbd: mark symbols static where possible drbd: Send P_NEG_ACK upon write error in protocol != C drbd: add explicit plugging when submitting batches drbd: change list_for_each_safe to while(list_first_entry_or_null) drbd: introduce drbd_recv_header_maybe_unplug ...
2017-09-06Merge tag 'wberr-v4.14-1' of ↵Linus Torvalds1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull writeback error handling updates from Jeff Layton: "This pile continues the work from last cycle on better tracking writeback errors. In v4.13 we added some basic errseq_t infrastructure and converted a few filesystems to use it. This set continues refining that infrastructure, adds documentation, and converts most of the other filesystems to use it. The main exception at this point is the NFS client" * tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: ecryptfs: convert to file_write_and_wait in ->fsync mm: remove optimizations based on i_size in mapping writeback waits fs: convert a pile of fsync routines to errseq_t based reporting gfs2: convert to errseq_t based writeback error reporting for fsync fs: convert sync_file_range to use errseq_t based error-tracking mm: add file_fdatawait_range and file_write_and_wait fuse: convert to errseq_t based error tracking for fsync mm: consolidate dax / non-dax checks for writeback Documentation: add some docs for errseq_t errseq: rename __errseq_set to errseq_set
2017-09-06Merge tag 'gfs2-4.14.fixes' of ↵Linus Torvalds20-110/+321
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull GFS2 updates from Bob Peterson: "We've got a whopping 29 GFS2 patches for this merge window, mainly because we held some back from the previous merge window until we could get them perfected and well tested. We have a couple patch sets, including my patch set for protecting glock gl_object and Andreas Gruenbacher's patch set to fix the long-standing shrink- slab hang, plus a bunch of assorted bugs and cleanups. Summary: - I fixed a bug whereby an IO error would lead to a double-brelse. - Andreas Gruenbacher made a minor cleanup to call his relatively new function, gfs2_holder_initialized, rather than doing it manually. This was just missed by a previous patch set. - Jan Kara fixed a bug whereby the SGID was being cleared when inheriting ACLs. - Andreas found a bug and fixed it in his previous patch, "Get rid of flush_delayed_work in gfs2_evict_inode". A call to flush_delayed_work was deleted from *gfs2_inode_lookup and added to gfs2_create_inode. - Wang Xibo found and fixed a list_add call in inode_go_lock that specified the parameters in the wrong order. - Coly Li submitted a patch to add the REQ_PRIO to some of GFS2's metadata reads that were accidentally missing them. - I submitted a 4-patch set to protect the glock gl_object field. GFS2 was setting and checking gl_object with no locking mechanism, so the value was occasionally stomped on, which caused file system corruption. - I submitted a small cleanup to function gfs2_clear_rgrpd. It was needlessly adding rgrp glocks to the lru list, then pulling them back off immediately. The rgrp glocks don't use the lru list anyway, so doing so was just a waste of time. - I submitted a patch that checks the GLOF_LRU flag on a glock before trying to remove it from the lru_list. This avoids a lot of unnecessary spin_lock contention. - I submitted a patch to delete GFS2's debugfs files only after we evict all the glocks. Before this patch, GFS2 would delete the debugfs files, and if unmount hung waiting for a glock, there was no way to debug the problem. Now, if a hang occurs during umount, we can examine the debugfs files to figure out why it's hung. - Andreas Gruenbacher submitted a patch to fix some trivial typos. - Andreas also submitted a five-part patch set to fix the longstanding hang involving the slab shrinker: dlm requires memory, calls the inode shrinker, which calls gfs2's evict, which calls back into DLM before it can evict an inode. - Abhi Das submitted a patch to forcibly flush the active items list to relieve memory pressure. This fixes a long-standing bug whereby GFS2 was getting hung permanently in balance_dirty_pages. - Thomas Tai submitted a patch to fix a slab corruption problem due to a residual pointer left in the lock_dlm lockstruct. - I submitted a patch to withdraw the file system if IO errors are encountered while writing to the journals or statfs system file which were previously not being sent back up. Before, some IO errors were sometimes not be detected for several hours, and at recovery time, the journal errors made journal replay impossible. - Andreas has a patch to fix an annoying format-truncation compiler warning so GFS2 compiles cleanly. - I have a patch that fixes a handful of sparse compiler warnings. - Andreas fixed up an useless gl_object warning caused by an earlier patch. - Arvind Yadav added a patch to properly constify our rhashtable params declare. - I added a patch to fix a regression caused by the non-recursive delete and truncate patch that caused file system blocks to not be properly freed. - Ernesto A. Fernández added a patch to fix a place where GFS2 would send back the wrong return code setting extended attributes. - Ernesto also added a patch to fix a case in which GFS2 was improperly setting an inode's i_mode, potentially granting access to the wrong users" * tag 'gfs2-4.14.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (29 commits) gfs2: preserve i_mode if __gfs2_set_acl() fails gfs2: don't return ENODATA in __gfs2_xattr_set unless replacing GFS2: Fix non-recursive truncate bug gfs2: constify rhashtable_params GFS2: Fix gl_object warnings GFS2: Fix up some sparse warnings gfs2: Silence gcc format-truncation warning GFS2: Withdraw for IO errors writing to the journal or statfs gfs2: fix slab corruption during mounting and umounting gfs file system gfs2: forcibly flush ail to relieve memory pressure gfs2: Clean up waiting on glocks gfs2: Defer deleting inodes under memory pressure gfs2: gfs2_evict_inode: Put glocks asynchronously gfs2: Get rid of gfs2_set_nlink gfs2: gfs2_glock_get: Wait on freeing glocks gfs2: Fix trivial typos GFS2: Delete debugfs files only after we evict the glocks GFS2: Don't waste time locking lru_lock for non-lru glocks GFS2: Don't bother trying to add rgrps to the lru list GFS2: Clear gl_object when deleting an inode in gfs2_delete_inode ...
2017-08-31gfs2: preserve i_mode if __gfs2_set_acl() failsErnesto A. Fernández1-5/+8
When changing a file's acl mask, __gfs2_set_acl() will first set the group bits of i_mode to the value of the mask, and only then set the actual extended attribute representing the new acl. If the second part fails (due to lack of space, for example) and the file had no acl attribute to begin with, the system will from now on assume that the mask permission bits are actual group permission bits, potentially granting access to the wrong users. Prevent this by only changing the inode mode after the acl has been set. Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-31gfs2: don't return ENODATA in __gfs2_xattr_set unless replacingErnesto A. Fernández1-2/+6
The function __gfs2_xattr_set() will return -ENODATA when called to remove a xattr that does not exist. The result is that setfacl will show an exit status of 1 when called to set only a file's mode bits (on a file with no ACLs), despite succeeding. A "No data available" error will be printed as well. To fix this return 0 instead, except when the XATTR_REPLACE flag is set, in which case -ENODATA is appropriate. This is consistent with how most other xattr setting functions work, in other filesystems. Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-30GFS2: Fix non-recursive truncate bugBob Peterson1-3/+16
Before this patch if you truncated a file to a smaller size it wasn't freeing all the blocks properly. There are two reasons. First, the metapath comparison was not comparing previous heights. I added a function, mp_eq_to_hgt, which checks the metapath at all heights prior to the target height. Second, in function find_nonnull_ptr, it needed to zero out all pointers for heights following the target height. Translated into decimal integer terms, this way a number like 299, when incremented, becomes 300, not 399. The 2 gets incremented to 3, and the following digits need to be reset. These two things allow the truncate state machine to properly find the blocks it needs to delete. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-30gfs2: constify rhashtable_paramsArvind Yadav1-1/+1
rhashtable_params are not supposed to change at runtime. All Functions rhashtable_* working with const rhashtable_params provided by <linux/rhashtable.h>. So mark the non-const structs as const. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-30GFS2: Fix gl_object warningsAndreas Gruenbacher1-1/+1
The following cleanup is needed to avoid spilling the syslog with false warnings. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-25GFS2: Fix up some sparse warningsBob Peterson4-6/+9
This patch cleans up various pieces of GFS2 to avoid sparse errors. This doesn't fix them all, but it fixes several. The first error, in function glock_hash_walk was a genuine bug where the rhashtable could be started and not stopped. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-25gfs2: Silence gcc format-truncation warningAndreas Gruenbacher2-4/+4
Enlarge sd_fsname to be big enough for the longest long lock table name and an arbitrary journal number. This silences two -Wformat-truncation warnings with gcc 7.1.1. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-25GFS2: Withdraw for IO errors writing to the journal or statfsBob Peterson5-5/+21
Before this patch, if GFS2 encountered IO errors while writing to the journal, it would not report the problem, so they would go unnoticed, sometimes for many hours. Sometimes this would only be noticed later, when recovery tried to do journal replay and failed due to invalid metadata at the blocks that resulted in IO errors. This patch makes GFS2's log daemon check for IO errors. If it encounters one, it withdraws from the file system and reports why in dmesg. A similar action is taken when IO errors occur when writing to the system statfs file. These errors are also reported back to any callers of fsync, since that requires the journal to be flushed. Therefore, any IO errors that would previously go unnoticed are now noticed and the file system is withdrawn as early as possible, thus preventing further file system damage. Also note that this reintroduces superblock variable sd_log_error, which Christoph removed with commit f729b66fca. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-23block: replace bi_bdev with a gendisk pointer and partitions indexChristoph Hellwig3-3/+3
This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-15gfs2: fix slab corruption during mounting and umounting gfs file systemThomas Tai1-0/+1
When using cman-3.0.12.1 and gfs2-utils-3.0.12.1, mounting and unmounting GFS2 file system would cause kernel to hang. The slab allocator suggests that it is likely a double free memory corruption. The issue is traced back to v3.9-rc6 where a patch is submitted to use kzalloc() for storing a bitmap instead of using a local variable. The intention is to allocate memory during mount and to free memory during unmount. The original patch misses a code path which has already freed the memory and caused memory corruption. This patch sets the memory pointer to NULL after the memory is freed, so that double free memory corruption will not happen. gdlm_mount() '-- set_recover_size() which use kzalloc() '-- if dlm does not support ops callbacks then '--- free_recover_size() which use kfree() gldm_unmount() '-- free_recover_size() which use kfree() Previous patch which introduced the double free issue is commit 57c7310b8eb9 ("GFS2: use kmalloc for lvb bitmap") Signed-off-by: Thomas Tai <thomas.tai@oracle.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
2017-08-10gfs2: forcibly flush ail to relieve memory pressureAbhi Das3-1/+18
On systems with low memory, it is possible for gfs2 to infinitely loop in balance_dirty_pages() under heavy IO (creating sparse files). balance_dirty_pages() attempts to write out the dirty pages via gfs2_writepages() but none are found because these dirty pages are being used by the journaling code in the ail. Normally, the journal has an upper threshold which when hit triggers an automatic flush of the ail. But this threshold can be higher than the number of allowable dirty pages and result in the ail never being flushed. This patch forces an ail flush when gfs2_writepages() fails to write anything. This is a good indication that the ail might be holding some dirty pages. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10gfs2: Clean up waiting on glocksAndreas Gruenbacher1-20/+7
The prepare_to_wait_on_glock and finish_wait_on_glock functions introduced in commit 56a365be "gfs2: gfs2_glock_get: Wait on freeing glocks" are better removed, resulting in cleaner code. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10gfs2: Defer deleting inodes under memory pressureAndreas Gruenbacher1-0/+21
When under memory pressure and an inode's link count has dropped to zero, defer deleting the inode to the delete workqueue. This avoids calling into DLM under memory pressure, which can deadlock. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10gfs2: gfs2_evict_inode: Put glocks asynchronouslyAndreas Gruenbacher3-3/+39
gfs2_evict_inode is called to free inodes under memory pressure. The function calls into DLM when an inode's last cluster-wide reference goes away (remote unlink) and to release the glock and associated DLM lock before finally destroying the inode. However, if DLM is blocked on memory to become available, calling into DLM again will deadlock. Avoid that by decoupling releasing glocks from destroying inodes in that case: with gfs2_glock_queue_put, glocks will be dequeued asynchronously in work queue context, when the associated inodes have likely already been destroyed. With this change, inodes can end up being unlinked, remote-unlink can be triggered, and then the inode can be reallocated before all remote-unlink callbacks are processed. To detect that, revalidate the link count in gfs2_evict_inode to make sure we're not deleting an allocated, referenced inode. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10gfs2: Get rid of gfs2_set_nlinkAndreas Gruenbacher1-27/+1
Remove gfs2_set_nlink which prevents the link count of an inode from becoming non-zero once it has reached zero. The next commit reduces the amount of waiting on glocks when an inode is evicted from memory. With that, an inode can become reallocated before all the remote-unlink callbacks from a previous delete are processed, which causes the link count to change from zero to non-zero. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10gfs2: gfs2_glock_get: Wait on freeing glocksAndreas Gruenbacher1-22/+104
Keep glocks in their hash table until they are freed instead of removing them when their last reference is dropped. This allows to wait for any previous instances of a glock to go away in gfs2_glock_get before creating a new glocks. Special thanks to Andy Price for finding and fixing a problem which also required us to delete the rcu_read_unlock from the error case in function gfs2_glock_get. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-09gfs2: Fix trivial typosAndreas Gruenbacher2-2/+2
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-09GFS2: Delete debugfs files only after we evict the glocksBob Peterson2-1/+1
This patch moves the call to gfs2_delete_debugfs_file so that it comes after the glock hash table has been cleared. This way we can query the debugfs files if umount hangs. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-09GFS2: Don't waste time locking lru_lock for non-lru glocksBob Peterson1-0/+3
Before this patch, glock_dq would call gfs2_glock_remove_from_lru. For glocks that are never put on the LRU, such as the transaction glock, this just takes the spin_lock, determines there's nothing to be done because the list is empty, then unlocks again. This was causing unnecessary lock contention on the lru_lock spin_lock. This patch adds a check for GLOF_LRU in the glops before taking the spin_lock. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-09GFS2: Don't bother trying to add rgrps to the lru listBob Peterson1-1/+0
This patch removes a call to gfs2_glock_add_to_lru from function gfs2_clear_rgrpd. The call is just a waste of time because as soon as it adds it to the lru_list, the call to gfs2_glock_put takes it back off again. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-09GFS2: Clear gl_object when deleting an inode in gfs2_delete_inodeBob Peterson1-1/+10
This patch adds some calls to clear gl_object in function gfs2_delete_inode. Since we are deleting the inode, and the glock typically outlives the inode in core, we must clear gl_object so subsequent use of the glock (e.g. for a new inode in its place) will not have the old pointer sitting there. In error cases we need to tidy up after ourselves. In non-error cases, we need to clear gl_object before we set the block free in the bitmap so residules aren't left for potential inode creators. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-08-09GFS2: Clear gl_object if gfs2_create_inode failsBob Peterson1-1/+4
If function gfs2_create_inode fails after the inode has been created (for example, if the inode_refresh fails for some reason) the function was setting gl_object but never clearing it again. The glocks are left pointing to a freed inode. This patch adds the calls to clear gl_object in the appropriate error paths. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-08-01gfs2: convert to errseq_t based writeback error reporting for fsyncJeff Layton1-2/+4
Also, fix a place where a writeback error might get dropped in the gfs2_is_jdata case. Signed-off-by: Jeff Layton <jlayton@redhat.com>
2017-07-21GFS2: Set gl_object in inode lookup only after block type checkBob Peterson1-2/+2
Before this patch, the inode glock's gl_object was set after a reference was acquired, but before the block type was verified. In cases where the block was unlinked, then freed and reused on another node, a residule delete callback (delete_work) would try to look up the inode, eventually failing the block check, but only after it overwrites gl_object with a pointer to the wrong inode. This patch moves the assignment of gl_object after the block check so it won't be improperly overwritten. Likewise, at the end of the function, gfs2_inode_lookup was clearing gl_object after it unlocked the glock, which meant another process might free the glock in the meantime. This patch guards against that case. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-07-21GFS2: Introduce helper for clearing gl_objectBob Peterson3-4/+38
This patch introduces a new helper function in glock.h that clears gl_object, with an added integrity check. An additional integrity check has been added to glock_set_object, plus comments. This is step 1 in a series to ensure gl_object integrity. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-07-21gfs2: add flag REQ_PRIO for metadata I/OColy Li4-6/+11
When gfs2 does metadata I/O, only REQ_META is used as a metadata hint of the bio. But flag REQ_META is just a hint for block trace, not for block layer code to handle a bio as metadata request. For some of metadata I/Os of gfs2, A REQ_PRIO flag on the metadata bio would be very informative to block layer code. For example, if bcache is used as a I/O cache for gfs2, it will be possible for bcache code to get the hint and cache the pre-fetched metadata blocks on cache device. This behavior may be helpful to improve metadata I/O performance if the following requests hit the cache. Here are the locations in gfs2 code where a REQ_PRIO flag should be added, - All places where REQ_READAHEAD is used, gfs2 code uses this flag for metadata read ahead. - In gfs2_meta_rq() where the first metadata block is read in. - In gfs2_write_buf_to_page(), read in quota metadata blocks to have them up to date. These metadata blocks are probably to be accessed again in future, adding a REQ_PRIO flag may have bcache to keep such metadata in fast cache device. For system without a cache layer, REQ_PRIO can still provide hint to block layer to handle metadata requests more properly. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-21GFS2: fix code parameter error in inode_go_lockWang Xibo1-1/+1
In inode_go_lock() function, the parameter order of list_add() is error. According to the define of list_add(), the first parameter is new entry and the second is the list head, so ip->i_trunc_list should be the first parameter and the sdp->sd_trunc_list should be second. Signed-off-by: Wang Xibo<wang.xibo@zte.com.cn> Signed-off-by: Xiao Likun<xiao.likun@zte.com.cn> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-19gfs2: Fixup to "Get rid of flush_delayed_work in gfs2_evict_inode"Andreas Gruenbacher1-2/+2
When commit 4fd1a57952 moved the call to flush_delayed_work from gfs2_evict_inode to gfs2_inode_lookup to avoid calling into DLM during evict, a similar call should have been added to gfs2_create_inode: that's another code path in which glocks of previous inodes may be reused. The flush of the iopen glock work queue added by 4fd1a57952, on the other hand, is unnecessary and can be removed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-19gfs2: Don't clear SGID when inheriting ACLsJan Kara1-13/+14
When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit set, DIR1 is expected to have SGID bit set (and owning group equal to the owning group of 'DIR0'). However when 'DIR0' also has some default ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on 'DIR1' to get cleared if user is not member of the owning group. Fix the problem by moving posix_acl_update_mode() out of __gfs2_set_acl() into gfs2_set_acl(). That way the function will not be called when inheriting ACLs which is what we want as it prevents SGID bit clearing and the mode has been properly set by posix_acl_create() anyway. Fixes: 073931017b49d9458aa351605b43a7e34598caef Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-17gfs2: Lock holder cleanup (fixup)Andreas Gruenbacher1-2/+1
Function gfs2_holder_initialized should be used in do_flock as well. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-17GFS2: Prevent double brelse in gfs2_meta_indirect_bufferBob Peterson1-1/+2
Before this patch, problems reading in indirect buffers would send an IO error back to the caller, and release the buffer_head with brelse() in function gfs2_meta_indirect_buffer, however, it would still return the address of the buffer_head it released. After the error was discovered, function gfs2_block_map would call function release_metapath to free all buffers. That checked: if (mp->mp_bh[i] == NULL) but since the value was set after the error, it was non-zero, so brelse was called a second time. This resulted in the following error: kernel: WARNING: at fs/buffer.c:1224 __brelse+0x3a/0x40() (Tainted: G W -- ------------ ) kernel: Hardware name: RHEV Hypervisor kernel: VFS: brelse: Trying to free free buffer This patch changes gfs2_meta_indirect_buffer so it only sets the buffer_head pointer in cases where it isn't released. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2017-07-17VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)David Howells6-8/+8
Firstly by applying the following with coccinelle's spatch: @@ expression SB; @@ -SB->s_flags & MS_RDONLY +sb_rdonly(SB) to effect the conversion to sb_rdonly(sb), then by applying: @@ expression A, SB; @@ ( -(!sb_rdonly(SB)) && A +!sb_rdonly(SB) && A | -A != (sb_rdonly(SB)) +A != sb_rdonly(SB) | -A == (sb_rdonly(SB)) +A == sb_rdonly(SB) | -!(sb_rdonly(SB)) +!sb_rdonly(SB) | -A && (sb_rdonly(SB)) +A && sb_rdonly(SB) | -A || (sb_rdonly(SB)) +A || sb_rdonly(SB) | -(sb_rdonly(SB)) != A +sb_rdonly(SB) != A | -(sb_rdonly(SB)) == A +sb_rdonly(SB) == A | -(sb_rdonly(SB)) && A +sb_rdonly(SB) && A | -(sb_rdonly(SB)) || A +sb_rdonly(SB) || A ) @@ expression A, B, SB; @@ ( -(sb_rdonly(SB)) ? 1 : 0 +sb_rdonly(SB) | -(sb_rdonly(SB)) ? A : B +sb_rdonly(SB) ? A : B ) to remove left over excess bracketage and finally by applying: @@ expression A, SB; @@ ( -(A & MS_RDONLY) != sb_rdonly(SB) +(bool)(A & MS_RDONLY) != sb_rdonly(SB) | -(A & MS_RDONLY) == sb_rdonly(SB) +(bool)(A & MS_RDONLY) == sb_rdonly(SB) ) to make comparisons against the result of sb_rdonly() (which is a bool) work correctly. Signed-off-by: David Howells <dhowells@redhat.com>