summaryrefslogtreecommitdiffstats
path: root/fs/gfs2/log.c
AgeCommit message (Collapse)AuthorFilesLines
2019-06-27gfs2: eliminate tr_num_revoke_rmBob Peterson1-2/+1
For its journal processing, gfs2 kept track of the number of buffers added and removed on a per-transaction basis. These values are used to calculate space needed in the journal. But while these calculations make sense for the number of buffers, they make no sense for revokes. Revokes are managed in their own list, linked from the superblock. So it's entirely unnecessary to keep separate per-transaction counts for revokes added and removed. A single count will do the same job. Therefore, this patch combines the transaction revokes into a single count. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-08Merge tag 'spdx-5.2-rc4' of ↵Linus Torvalds1-4/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull yet more SPDX updates from Greg KH: "Another round of SPDX header file fixes for 5.2-rc4 These are all more "GPL-2.0-or-later" or "GPL-2.0-only" tags being added, based on the text in the files. We are slowly chipping away at the 700+ different ways people tried to write the license text. All of these were reviewed on the spdx mailing list by a number of different people. We now have over 60% of the kernel files covered with SPDX tags: $ ./scripts/spdxcheck.py -v 2>&1 | grep Files Files checked: 64533 Files with SPDX: 40392 Files with errors: 0 I think the majority of the "easy" fixups are now done, it's now the start of the longer-tail of crazy variants to wade through" * tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (159 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 450 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 449 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 448 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 446 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 445 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 444 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 443 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 442 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 440 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 438 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 436 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 435 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 434 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 433 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 432 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 431 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 430 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 429 ...
2019-06-06Revert "gfs2: Replace gl_revokes with a GLF flag"Bob Peterson1-3/+1
Commit 73118ca8baf7 introduced a glock reference counting bug in gfs2_trans_remove_revoke. Given that, replacing gl_revokes with a GLF flag is no longer useful, so revert that commit. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398Thomas Gleixner1-4/+1
Based on 1 normalized pattern(s): this copyrighted material is made available to anyone wishing to use modify copy or redistribute it subject to the terms and conditions of the gnu general public license version 2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 44 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-07gfs2: read journal in large chunksAbhi Das1-2/+2
Use bios to read in the journal into the address space of the journal inode (jd_inode), sequentially and in large chunks. This is faster for locating the journal head that the previous binary search approach. When performing recovery, we keep the journal in the address space until recovery is done, which further speeds up things. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07gfs2: Rename sd_log_le_{revoke,ordered}Andreas Gruenbacher1-7/+7
Rename sd_log_le_revoke to sd_log_revokes and sd_log_le_ordered to sd_log_ordered: not sure what le stands for here, but it doesn't add clarity, and if it stands for list entry, it's actually confusing as those are both list heads but not list entries. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07gfs2: Replace gl_revokes with a GLF flagBob Peterson1-1/+3
The gl_revokes value determines how many outstanding revokes a glock has on the superblock revokes list; this is used to avoid unnecessary log flushes. However, gl_revokes is only ever tested for being zero, and it's only decremented in revoke_lo_after_commit, which removes all revokes from the list, so we know that the gl_revoke values of all the glocks on the list will reach zero. Therefore, we can replace gl_revokes with a bit flag. This saves an atomic counter in struct gfs2_glock. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07gfs2: Fix occasional glock use-after-freeAndreas Gruenbacher1-1/+2
This patch has to do with the life cycle of glocks and buffers. When gfs2 metadata or journaled data is queued to be written, a gfs2_bufdata object is assigned to track the buffer, and that is queued to various lists, including the glock's gl_ail_list to indicate it's on the active items list. Once the page associated with the buffer has been written, it is removed from the ail list, but its life isn't over until a revoke has been successfully written. So after the block is written, its bufdata object is moved from the glock's gl_ail_list to a file-system-wide list of pending revokes, sd_log_le_revoke. At that point the glock still needs to track how many revokes it contributed to that list (in gl_revokes) so that things like glock go_sync can ensure all the metadata has been not only written, but also revoked before the glock is granted to a different node. This is to guarantee journal replay doesn't replay the block once the glock has been granted to another node. Ross Lagerwall recently discovered a race in which an inode could be evicted, and its glock freed after its ail list had been synced, but while it still had unwritten revokes on the sd_log_le_revoke list. The evict decremented the glock reference count to zero, which allowed the glock to be freed. After the revoke was written, function revoke_lo_after_commit tried to adjust the glock's gl_revokes counter and clear its GLF_LFLUSH flag, at which time it referenced the freed glock. This patch fixes the problem by incrementing the glock reference count in gfs2_add_revoke when the glock's first bufdata object is moved from the glock to the global revokes list. Later, when the glock's last such bufdata object is freed, the reference count is decremented. This guarantees that whichever process finishes last (the revoke writing or the evict) will properly free the glock, and neither will reference the glock after it has been freed. Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2019-05-07gfs2: clean_journal improperly set sd_log_flush_headBob Peterson1-8/+16
This patch fixes regressions in 588bff95c94efc05f9e1a0b19015c9408ed7c0ef. Due to that patch, function clean_journal was setting the value of sd_log_flush_head, but that's only valid if it is replaying the node's own journal. If it's replaying another node's journal, that's completely wrong and will lead to multiple problems. This patch tries to clean up the mess by passing the value of the logical journal block number into gfs2_write_log_header so the function can treat non-owned journals generically. For the local journal, the journal extent map is used for best performance. For other nodes from other journals, new function gfs2_lblk_to_dblk is called to figure it out using gfs2_iomap_get. This patch also tries to establish more consistency when passing journal block parameters by changing several unsigned int types to a consistent u32. Fixes: 588bff95c94e ("GFS2: Reduce code redundancy writing log headers") Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-02-14Revert "gfs2: read journal in large chunks to locate the head"Bob Peterson1-2/+2
This reverts commit 2a5f14f279f59143139bcd1606903f2f80a34241. This patch causes xfstests generic/311 to fail. Reverting this for now until we have a proper fix. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-11gfs2: Remove vestigial bd_opsBob Peterson1-1/+0
Field bd_ops was set but never used, so I removed it, and all code supporting it. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-12-11gfs2: read journal in large chunks to locate the headAbhi Das1-2/+2
Use bio(s) to read in the journal sequentially in large chunks and locate the head of the journal. This version addresses the issues Christoph pointed out w.r.t error handling and using deprecated API. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Cc: Christoph Hellwig <hch@infradead.org>
2018-12-11gfs2: changes to gfs2_log_XXX_bioAbhi Das1-2/+2
Change gfs2_log_XXX_bio family of functions so they can be used with different bios, not just sdp->sd_log_bio. This patch also contains some clean up suggested by Andreas. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-10-15gfs2: write revokes should traverse sd_ail1_list in reverseBob Peterson1-2/+2
All the other functions that deal with the sd_ail_list run the list from the tail back to the head, iow, in reverse. We should do the same while writing revokes, otherwise we might miss removing entries properly from the list when we hit the limit of how many revokes we can write at one time (based on block size, which determines how many block pointers will fit in the revoke block). Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-10-05gfs2: slow the deluge of io error messagesBob Peterson1-2/+5
When an io error is hit, it calls gfs2_io_error_bh_i for every journal buffer it can't write. Since we changed gfs2_io_error_bh_i recently to withdraw later in the cycle, it sends a flood of errors to the console. This patch checks for the file system already being withdrawn, and if so, doesn't send more messages. It doesn't stop the flood of messages, but it slows it down and keeps it more reasonable. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-06-21gfs2: call ktime_get_coarse_real_ts64() directlyArnd Bergmann1-1/+1
current_kernel_time64() is now just a deprecated wrapper around ktime_get_coarse_real_ts64(), so let's just call that directly. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-06-21gfs2: Don't withdraw under a spin lockAndreas Gruenbacher1-7/+19
In two places, the gfs2_io_error_bh macro is called while holding the sd_ail_lock spin lock. This isn't allowed because gfs2_io_error_bh withdraws the filesystem, which can sleep because it issues a uevent. To fix that, add a gfs2_io_error_bh_wd macro that does withdraw the filesystem and change gfs2_io_error_bh to not withdraw the filesystem. In those places where the new gfs2_io_error_bh is used, withdraw the filesystem after releasing sd_ail_lock. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andrew Price <anprice@redhat.com>
2018-03-08GFS2: Make function gfs2_remove_from_ail staticBob Peterson1-1/+1
Function gfs2_remove_from_ail is only ever used from log.c, so there is no reason to declare it extern. This patch removes the extern and declares it static. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-23GFS2: Log the reason for log flushes in every log headerBob Peterson1-6/+8
This patch just adds the capability for GFS2 to track which function called gfs2_log_flush. This should make it easier to diagnose problems based on the sequence of events found in the journals. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-01-23GFS2: Introduce new gfs2_log_header_v2Bob Peterson1-19/+56
This patch adds a new structure called gfs2_log_header_v2 which is used to store expanded fields into previously unused areas of the log headers (i.e., this change is backwards compatible). Some of these are used for debug purposes so we can backtrack when problems occur. Others are reserved for future expansion. This patch is based on a prototype from Steve Whitehouse. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-01-22gfs2: Get rid of gfs2_log_header_inAndreas Gruenbacher1-1/+1
Get rid of gfs2_log_header_in by integrating it into get_log_header. Clean up the crc32 computations and use the same functions for encoding and decoding to make things less confusing. Eliminate lh_hash from gfs2_log_header_host which is completely useless. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-12-22gfs2: Trim the ordered write list in gfs2_ordered_write()Abhi Das1-2/+5
We iterate through the entire ordered writes list in gfs2_ordered_write() to write out inodes. It's a good place to try and shrink the list by throwing out inodes that don't have any pages. Signed-off-by: Abhi Das <adas@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-12-22GFS2: Reduce code redundancy writing log headersBob Peterson1-14/+32
Before this patch, there was a lot of code redundancy between functions log_write_header (which uses bio) and clean_journal (which uses buffer_head). This patch reduces the redundancy to simplify the code and make log header writing more consistent. We want more consistency and reduced redundancy because we plan to add a bunch of new fields to improve performance (by eliminating the local statfs and quota files) improve metadata integrity (by adding new crcs and such) and for better debugging (by adding new fields to track when and where metadata was pushed through the journals.) We don't want to duplicate setting these new fields, nor allow for human error in the process. This reduction in code redundancy is accomplished by introducing a new helper function, gfs2_write_log_header which uses bio rather than bh. That simplifies recovery function clean_journal() to use the new helper function and iomap rather than redundancy and block_map (and eventually we can maybe remove block_map). It also reduces our dependency on buffer_heads. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-25GFS2: Withdraw for IO errors writing to the journal or statfsBob Peterson1-0/+9
Before this patch, if GFS2 encountered IO errors while writing to the journal, it would not report the problem, so they would go unnoticed, sometimes for many hours. Sometimes this would only be noticed later, when recovery tried to do journal replay and failed due to invalid metadata at the blocks that resulted in IO errors. This patch makes GFS2's log daemon check for IO errors. If it encounters one, it withdraws from the file system and reports why in dmesg. A similar action is taken when IO errors occur when writing to the system statfs file. These errors are also reported back to any callers of fsync, since that requires the journal to be flushed. Therefore, any IO errors that would previously go unnoticed are now noticed and the file system is withdrawn as early as possible, thus preventing further file system damage. Also note that this reintroduces superblock variable sd_log_error, which Christoph removed with commit f729b66fca. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-10gfs2: forcibly flush ail to relieve memory pressureAbhi Das1-0/+4
On systems with low memory, it is possible for gfs2 to infinitely loop in balance_dirty_pages() under heavy IO (creating sparse files). balance_dirty_pages() attempts to write out the dirty pages via gfs2_writepages() but none are found because these dirty pages are being used by the journaling code in the ail. Normally, the journal has an upper threshold which when hit triggers an automatic flush of the ail. But this threshold can be higher than the number of allowable dirty pages and result in the ail never being flushed. This patch forces an ail flush when gfs2_writepages() fails to write anything. This is a good indication that the ail might be holding some dirty pages. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-05Merge tag 'gfs2-4.13.fixes' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull GFS2 updates from Bob Peterson: "We've got eight GFS2 patches for this merge window: - Andreas Gruenbacher has four patches related to cleaning up the GFS2 inode evict process. This is about half of his patches designed to fix a long-standing GFS2 hang related to the inode shrinker: Shrinker calls gfs2 evict, evict calls DLM, DLM requires memory and blocks on the shrinker. These four patches have been well tested. His second set of patches are still being tested, so I plan to hold them until the next merge window, after we have more weeks of testing. The first patch eliminates the flush_delayed_work, which can block. - Andreas's second patch protects setting of gl_object for rgrps with a spin_lock to prevent proven races. - His third patch introduces a centralized mechanism for queueing glock work with better reference counting, to prevent more races. -His fourth patch retains a reference to inode glocks when an error occurs while creating an inode. This keeps the subsequent evict from needing to reacquire the glock, which might call into DLM and block in low memory conditions. - Arvind Yadav has a patch to add const to attribute_group structures. - I have a patch to detect directory entry inconsistencies and withdraw the file system if any are found. Better that than silent corruption. - I have a patch to remove a vestigial variable from glock structures, saving some slab space. - I have another patch to remove a vestigial variable from the GFS2 in-core superblock structure" * tag 'gfs2-4.13.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: GFS2: constify attribute_group structures. gfs2: gfs2_create_inode: Keep glock across iput gfs2: Clean up glock work enqueuing gfs2: Protect gl->gl_object by spin lock gfs2: Get rid of flush_delayed_work in gfs2_evict_inode GFS2: Eliminate vestigial sd_log_flush_wrapped GFS2: Remove gl_list from glock structure GFS2: Withdraw when directory entry inconsistencies are detected
2017-06-20GFS2: Eliminate vestigial sd_log_flush_wrappedBob Peterson1-3/+0
Superblock variable sd_log_flush_wrapped is set, but never referenced, so this patch eliminates it. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-05-24gfs2: Make flush bios explicitely syncJan Kara1-1/+1
Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...} definitions. generic_make_request_checks() however strips REQ_FUA and REQ_PREFLUSH flags from a bio when the storage doesn't report volatile write cache and thus write effectively becomes asynchronous which can lead to performance regressions Fix the problem by making sure all bios which are synchronous are properly marked with REQ_SYNC. Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3 CC: Steven Whitehouse <swhiteho@redhat.com> CC: cluster-devel@redhat.com CC: stable@vger.kernel.org Acked-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
2017-02-21Merge tag 'gfs2-4.11.fixes' of ↵Linus Torvalds1-6/+15
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull GFS2 updates from Robert Peterson: "We've got eight GFS2 patches for this merge window: - Andy Price submitted a patch to make gfs2_write_full_page a static function. - Dan Carpenter submitted a patch to fix a ERR_PTR thinko. Three patches fix bugs related to deleting very large files, which cause GFS2 to run out of journal space: - The first one prevents GFS2 delete operation from requesting too much journal space. - The second one fixes a problem whereby GFS2 can hang because it wasn't taking journal space demand into its calculations. - The third one wakes up IO waiters when a flush is done to restart processes stuck waiting for journal space to become available. The final three patches are a performance improvement related to spin_lock contention between multiple writers: - The "tr_touched" variable was switched to a flag to be more atomic and eliminate the possibility of some races. - Function meta_lo_add was moved inline with its only caller to make the code more readable and efficient. - Contention on the gfs2_log_lock spinlock was greatly reduced by avoiding the lock altogether in cases where we don't really need it: buffers that already appear in the appropriate metadata list for the journal. Many thanks to Steve Whitehouse for the ideas and principles behind these patches" * tag 'gfs2-4.11.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Make gfs2_write_full_page static GFS2: Reduce contention on gfs2_log_lock GFS2: Inline function meta_lo_add GFS2: Switch tr_touched to flag in transaction GFS2: Wake up io waiters whenever a flush is done GFS2: Made logd daemon take into account log demand GFS2: Limit number of transaction blocks requested for truncates GFS2: Fix reference to ERR_PTR in gfs2_glock_iter_next
2017-01-27GFS2: Switch tr_touched to flag in transactionBob Peterson1-3/+3
This patch eliminates the int variable tr_touched in favor of a new flag in the transaction. This is a step toward reducing contention on the gfs2_log_lock spin_lock. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-01-06GFS2: Wake up io waiters whenever a flush is doneBob Peterson1-1/+5
Before this patch, if a process called function gfs2_log_reserve to reserve some journal blocks, but the journal not enough blocks were free, it would call io_schedule. However, in the log flush daemon, it woke up the waiters only if an gfs2_ail_flush was no longer required. This resulted in situations where processes would wait forever because the number of blocks required was so high that it pushed the journal into a perpetual state of flush being required. This patch changes the logd daemon so that it wakes up io waiters every time the log is actually flushed. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-01-05GFS2: Made logd daemon take into account log demandBob Peterson1-2/+7
Before this patch, the logd daemon only tried to flush things when the log blocks pinned exceeded a certain threshold. But when we're deleting very large files, it may require a huge number of journal blocks, and that, in turn, may exceed the threshold. This patch factors that into account. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-11-01block,fs: use REQ_* flags directlyChristoph Hellwig1-2/+2
Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07gfs2: use bio op accessorsMike Christie1-4/+4
Separate the op from the rq_flag_bits and have gfs2 set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-14gfs2: clear journal live bit in gfs2_log_flushBenjamin Marzinski1-0/+3
When gfs2 was unmounting filesystems or changing them to read-only it was clearing the SDF_JOURNAL_LIVE bit before the final log flush. This caused a race. If an inode glock got demoted in the gap between clearing the bit and the shutdown flush, it would be unable to reserve log space to clear out the active items list in inode_go_sync, causing an error in inode_go_inval because the glock was still dirty. To solve this, the SDF_JOURNAL_LIVE bit is now cleared inside the shutdown log flush. This means that, because of the locking on the log blocks, either inode_go_sync will be able to reserve space to clean the glock before the shutdown flush, or the shutdown flush will clean the glock itself, before inode_go_sync fails to reserve the space. Either way, the glock will be clean before inode_go_inval. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2014-11-17GFS2: update freeze code to use freeze/thaw_super on all nodesBenjamin Marzinski1-21/+21
The current gfs2 freezing code is considerably more complicated than it should be because it doesn't use the vfs freezing code on any node except the one that begins the freeze. This is because it needs to acquire a cluster glock before calling the vfs code to prevent a deadlock, and without the new freeze_super and thaw_super hooks, that was impossible. To deal with the issue, gfs2 had to do some hacky locking tricks to make sure that a frozen node couldn't be holding on a lock it needed to do the unfreeze ioctl. This patch makes use of the new hooks to simply the gfs2 locking code. Now, all the nodes in the cluster freeze and thaw in exactly the same way. Every node in the cluster caches the freeze glock in the shared state. The new freeze_super hook allows the freezing node to grab this freeze glock in the exclusive state without first calling the vfs freeze_super function. All the nodes in the cluster see this lock change, and call the vfs freeze_super function. The vfs locking code guarantees that the nodes can't get stuck holding the glocks necessary to unfreeze the system. To unfreeze, the freezing node uses the new thaw_super hook to drop the freeze glock. Again, all the nodes notice this, reacquire the glock in shared mode and call the vfs thaw_super function. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-05-14GFS2: remove transaction glockBenjamin Marzinski1-30/+63
GFS2 has a transaction glock, which must be grabbed for every transaction, whose purpose is to deal with freezing the filesystem. Aside from this involving a large amount of locking, it is very easy to make the current fsfreeze code hang on unfreezing. This patch rewrites how gfs2 handles freezing the filesystem. The transaction glock is removed. In it's place is a freeze glock, which is cached (but not held) in a shared state by every node in the cluster when the filesystem is mounted. This lock only needs to be grabbed on freezing, and actions which need to be safe from freezing, like recovery. When a node wants to freeze the filesystem, it grabs this glock exclusively. When the freeze glock state changes on the nodes (either from shared to unlocked, or shared to exclusive), the filesystem does a special log flush. gfs2_log_flush() does all the work for flushing out the and shutting down the incore log, and then it tries to grab the freeze glock in a shared state again. Since the filesystem is stuck in gfs2_log_flush, no new transaction can start, and nothing can be written to disk. Unfreezing the filesytem simply involes dropping the freeze glock, allowing gfs2_log_flush() to grab and then release the shared lock, so it is cached for next time. However, in order for the unfreezing ioctl to occur, gfs2 needs to get a shared lock on the filesystem root directory inode to check permissions. If that glock has already been grabbed exclusively, fsfreeze will be unable to get the shared lock and unfreeze the filesystem. In order to allow the unfreeze, this patch makes gfs2 grab a shared lock on the filesystem root directory during the freeze, and hold it until it unfreezes the filesystem. The functions which need to grab a shared lock in order to allow the unfreeze ioctl to be issued now use the lock grabbed by the freeze code instead. The freeze and unfreeze code take care to make sure that this shared lock will not be dropped while another process is using it. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-03-12GFS2: Re-add a call to log_flush_wait when flushing the journalBob Peterson1-0/+1
Upstream commit 34cc178 changed a line of code from calling function log_flush_commit to calling log_write_header. This had the effect of eliminating a call to function log_flush_wait. That causes the journal to skip over log headers, which results in multiple wrap points, which itself leads to infinite loops in journal replay, both in the kernel code and fsck.gfs2 code. This patch re-adds that call. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-02-25GFS2: Remove extra "if" in gfs2_log_flush()Steven Whitehouse1-5/+3
By reordering some of the assignments in gfs2_log_flush() it is possible to remove one of the "if" statements as it can be merged with one higher up the function. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-02-24GFS2: Move log buffer accounting to transactionSteven Whitehouse1-43/+24
Now we have a master transaction into which other transactions are merged, the accounting can be done using this master transaction. We no longer require the superblock fields which were being used for this function. In addition, this allows for a clean up in calc_reserved() making it rather easier understand. Also, by reducing the number of variables used to track the buffers being added and removed from the journal, a number of error checks are now no longer required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-02-24GFS2: Move log buffer lists into transactionSteven Whitehouse1-3/+25
Over time, we hope to be able to improve the concurrency available in the log code. This is one small step towards that, by moving the buffer lists from the super block, and into the transaction structure, so that each transaction builds its own buffer lists. At transaction commit time, the buffer lists are merged into the currently accumulating transaction. That transaction then is passed into the before and after commit functions at journal flush time. Thus there should be no change in overall behaviour yet. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-02-03GFS2: Plug on AIL flushSteven Whitehouse1-0/+4
When we do a flush of the AIL list, we are writing out what is likely to be a lot of small I/Os, which are possibly in an order which is not ideal performance-wise. Since this is done by calling filemap_fdatatwrite for each individual inode's address space there is no overall plugging going on. In addition to that, we do not always wait for AIL i/o when we flush it, so that it is possible for things to get left behind on the queue. By adding explicit plugging here, we reduce the chances of this being an issues. A quick test using the AIL flush tracepoint shows a small, but measurable improvement. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-12-13GFS2: Fix use-after-free race when calling gfs2_remove_from_ailBob Peterson1-2/+2
Function gfs2_remove_from_ail drops the reference on the bh via brelse. This patch fixes a race condition whereby bh is deferenced after the brelse when setting bd->bd_blkno = bh->b_blocknr; Under certain rare circumstances, bh might be gone or reused, and bd->bd_blkno is set to whatever that memory happens to be, which is often 0. Later, in gfs2_trans_add_unrevoke, that bd fails the test "bd->bd_blkno >= blkno" which causes it to never be freed. The end result is that the bd is never freed from the bufdata cache, which results in this error: slab error in kmem_cache_destroy(): cache `gfs2_bufdata': Can't free all objects Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-19GFS2: aggressively issue revokes in gfs2_log_flushBenjamin Marzinski1-4/+74
This patch looks at all the outstanding blocks in all the transactions on the log, and moves the completed ones to the ail2 list. Then it issues revokes for these blocks. This will hopefully speed things up in situations where there is a lot of contention for glocks, especially if they are acquired serially. revoke_lo_before_commit will issue at most one log block's full of these preemptive revokes. The amount of reserved log space that gfs2_log_reserve() ignores has been incremented to allow for this extra block. This patch also consolidates the common revoke instructions into one function, gfs2_add_revoke(). Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-08GFS2: replace gfs2_ail structure with gfs2_transBenjamin Marzinski1-47/+57
In order to allow transactions and log flushes to happen at the same time, gfs2 needs to move the transaction accounting and active items list code into the gfs2_trans structure. As a first step toward this, this patch removes the gfs2_ail structure, and handles the active items list in the gfs_trans structure. This keeps gfs2 from allocating an ail structure on log flushes, and gives us a struture that can later be used to store the transaction accounting outside of the gfs2 superblock structure. With this patch, at the end of a transaction, gfs2 will add the gfs2_trans structure to the superblock if there is not one already. This structure now has the active items fields that were previously in gfs2_ail. This is not necessary in the case where the transaction was simply used to add revokes, since these are never written outside of the journal, and thus, don't need an active items list. Also, in order to make sure that the transaction structure is not removed while it's still in use by gfs2_trans_end, unlocking the sd_log_flush_lock has to happen slightly later in ending the transaction. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29GFS2: Use ->writepages for ordered writesSteven Whitehouse1-40/+36
Instead of using a list of buffers to write ahead of the journal flush, this now uses a list of inodes and calls ->writepages via filemap_fdatawrite() in order to achieve the same thing. For most use cases this results in a shorter ordered write list, as well as much larger i/os being issued. The ordered write list is sorted by inode number before writing in order to retain the disk block ordering between inodes as per the previous code. The previous ordered write code used to conflict in its assumptions about how to write out the disk blocks with mpage_writepages() so that with this updated version we can also use mpage_writepages() for GFS2's ordered write, writepages implementation. So we will also send larger i/os from writeback too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-05-02GFS2: eliminate log elements and simplifyBob Peterson1-6/+6
This patch eliminates the gfs2_log_element data structure and rolls its two components into the gfs2_bufdata. This makes the code easier to understand and makes it easier to migrate to a rbtree to keep the list sorted. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-04-24GFS2: Log code fixesSteven Whitehouse1-4/+0
This patch removes a log lock from around atomic operation where it is not needed, removes an unused variable, and also changes a void pointer used incorrectly to a struct page pointer. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-04-24GFS2: Remove bd_list_trSteven Whitehouse1-17/+0
This is another clean up in the logging code. This per-transaction list was largely unused. Its main function was to ensure that the number of buffers in a transaction was correct, however that counter was only used to check the number of buffers in the bd_list_tr, plus an assert at the end of each transaction. With the assert now changed to use the calculated buffer counts, we can remove both bd_list_tr and its associated counter. This should make the code easier to understand as well as shrinking a couple of structures. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-04-24GFS2: Clean up log write code pathSteven Whitehouse1-46/+11
Prior to this patch, we have two ways of sending i/o to the log. One of those is used when we need to allocate both the data to be written itself and also a buffer head to submit it. This is done via sb_getblk and friends. This is used mostly for writing log headers. The other method is used when writing blocks which have some in-place counterpart. This is the case for all the metadata blocks which are journalled, and when journaled data is in use, for unescaped journalled data blocks. This patch replaces both of those two methods, and about half a dozen separate i/o submission points with a single i/o submission function. We also go direct to bio rather than using buffer heads, since this allows us to build i/o requests of the maximum size for the block device in question. It also reduces the memory required for flushing the log, which can be very useful in low memory situations. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>