summaryrefslogtreecommitdiffstats
path: root/fs/gfs2
AgeCommit message (Collapse)AuthorFilesLines
2013-07-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmwLinus Torvalds16-124/+282
Pull GFS2 updates from Steven Whitehouse: "There are a few bug fixes for various, mostly very minor corner cases, plus some interesting new features. The new features include atomic_open whose main benefit will be the reduction in locking overhead in case of combined lookup/create and open operations, sorting the log buffer lists by block number to improve the efficiency of AIL writeback, and aggressively issuing revokes in gfs2_log_flush to reduce overhead when dropping glocks." * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: GFS2: Reserve journal space for quota change in do_grow GFS2: Fix fstrim boundary conditions GFS2: fix warning message GFS2: aggressively issue revokes in gfs2_log_flush GFS2: fix regression in dir_double_exhash GFS2: Add atomic_open support GFS2: Only do one directory search on create GFS2: fix error propagation in init_threads() GFS2: Remove no-op wrapper function GFS2: Cocci spatch "ptr_ret.spatch" GFS2: Eliminate gfs2_rg_lops GFS2: Sort buffer lists by inplace block number
2013-07-02Merge tag 'ext4_for_linus' of ↵Linus Torvalds1-5/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 update from Ted Ts'o: "Lots of bug fixes, cleanups and optimizations. In the bug fixes category, of note is a fix for on-line resizing file systems where the block size is smaller than the page size (i.e., file systems 1k blocks on x86, or more interestingly file systems with 4k blocks on Power or ia64 systems.) In the cleanup category, the ext4's punch hole implementation was significantly improved by Lukas Czerner, and now supports bigalloc file systems. In addition, Jan Kara significantly cleaned up the write submission code path. We also improved error checking and added a few sanity checks. In the optimizations category, two major optimizations deserve mention. The first is that ext4_writepages() is now used for nodelalloc and ext3 compatibility mode. This allows writes to be submitted much more efficiently as a single bio request, instead of being sent as individual 4k writes into the block layer (which then relied on the elevator code to coalesce the requests in the block queue). Secondly, the extent cache shrink mechanism, which was introduce in 3.9, no longer has a scalability bottleneck caused by the i_es_lru spinlock. Other optimizations include some changes to reduce CPU usage and to avoid issuing empty commits unnecessarily." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (86 commits) ext4: optimize starting extent in ext4_ext_rm_leaf() jbd2: invalidate handle if jbd2_journal_restart() fails ext4: translate flag bits to strings in tracepoints ext4: fix up error handling for mpage_map_and_submit_extent() jbd2: fix theoretical race in jbd2__journal_restart ext4: only zero partial blocks in ext4_zero_partial_blocks() ext4: check error return from ext4_write_inline_data_end() ext4: delete unnecessary C statements ext3,ext4: don't mess with dir_file->f_pos in htree_dirblock_to_tree() jbd2: move superblock checksum calculation to jbd2_write_superblock() ext4: pass inode pointer instead of file pointer to punch hole ext4: improve free space calculation for inline_data ext4: reduce object size when !CONFIG_PRINTK ext4: improve extent cache shrink mechanism to avoid to burn CPU time ext4: implement error handling of ext4_mb_new_preallocation() ext4: fix corruption when online resizing a fs with 1K block size ext4: delete unused variables ext4: return FIEMAP_EXTENT_UNKNOWN for delalloc extents jbd2: remove debug dependency on debug_fs and update Kconfig help text jbd2: use a single printk for jbd_debug() ...
2013-06-29[readdir] constify ->actorAl Viro1-4/+4
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-29[readdir] convert gfs2Al Viro4-51/+38
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-27GFS2: Reserve journal space for quota change in do_growBob Peterson1-1/+3
If a GFS2 file system is mounted with quotas and a file is grown in such a way that its free blocks for the allocation are represented in a secondary bitmap, GFS2 ran out of blocks in the transaction. That resulted in "fatal: assertion "tr->tr_num_buf <= tr->tr_blocks". This patch reserves extra blocks for the quota change so the transaction has enough space. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-19GFS2: Fix fstrim boundary conditionsAbhijith Das1-6/+8
This patch correctly distinguishes two boundary conditions: 1. When the given range is entire within the unaccounted space between two rgrps, and 2. The range begins beyond the end of the filesystem Also fix the unit of the returned value r.len (total trimming) to be in bytes instead of the (incorrect) 512 byte blocks With this patch, GFS2 passes multiple iterations of all the relevant xfstests (251, 260, 288) with different fs block sizes. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-19GFS2: fix warning messageBenjamin Marzinski1-1/+0
This patch fixes a warning message introduced in the recent "GFS2: aggressively issue revokes in gfs2_log_flush" patch. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-19GFS2: aggressively issue revokes in gfs2_log_flushBenjamin Marzinski6-23/+78
This patch looks at all the outstanding blocks in all the transactions on the log, and moves the completed ones to the ail2 list. Then it issues revokes for these blocks. This will hopefully speed things up in situations where there is a lot of contention for glocks, especially if they are acquired serially. revoke_lo_before_commit will issue at most one log block's full of these preemptive revokes. The amount of reserved log space that gfs2_log_reserve() ignores has been incremented to allow for this extra block. This patch also consolidates the common revoke instructions into one function, gfs2_add_revoke(). Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-14GFS2: fix regression in dir_double_exhashBob Peterson1-1/+2
Recent commit e8830d8 introduced a bug in function dir_double_exhash; it was failing to set h in the fall-back case. This patch corrects it. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-14GFS2: Add atomic_open supportSteven Whitehouse3-49/+146
I've restricted atomic_open to only operate on regular files, although I still don't understand why atomic_open should not be possible also for directories on GFS2. That can always be added in later though, if it makes sense. The ->atomic_open function can be passed negative dentries, which in most cases means either ENOENT (->lookup) or a call to d_instantiate (->create). In the GFS2 case though, we need to actually perform the look up, since we do not know whether there has been a new inode created on another node. The look up calls d_splice_alias which then tries to rehash the dentry - so the solution here is to simply check for that in d_splice_alias. The same issue is likely to affect any other cluster filesystem implementing ->atomic_open Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields fieldses org> Cc: Jeff Layton <jlayton@redhat.com>
2013-06-11GFS2: Only do one directory search on createSteven Whitehouse3-28/+25
Creation of a new inode requires a directory search in order to ensure that we are not trying to create an inode with the same name as an existing one. This was hidden away inside the create_ok() function. In the case that there was an existing inode, and a lookup can be substituted for a create (which is the case with regular files when the O_EXCL flag is not in use) then we were doing a second lookup in order to return the inode. This patch merges these two lookups into one. This can be done by passing a flag to gfs2_dir_search() to tell it to just return -EEXIST in the cases where we don't actually want to look up the inode. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-06GFS2: fix error propagation in init_threads()Alexey Khoroshilov1-4/+4
If kthread_run() fails, init_threads() returns IS_ERR(p) instead of PTR_ERR(p). Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-05GFS2: Remove no-op wrapper functionSteven Whitehouse1-6/+1
This wrapper function is no longer required, so get rid of it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-05GFS2: Cocci spatch "ptr_ret.spatch"Thomas Meyer1-1/+1
Use PTR_RET in place of open coding this function. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-05GFS2: Eliminate gfs2_rg_lopsBob Peterson2-6/+0
With recent changes to the transactions, it appears that we are no longer using the "log ops" for resource groups. Since the log commit code processes the array of log ops, eliminating this should be marginally better for performance. Therefore this patch eliminates it. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-05GFS2: Sort buffer lists by inplace block numberBenjamin Marzinski1-0/+16
This patch simply sort the data and metadata buffer lists by their inplace block number. This makes gfs2_log_flush issue the inplace IO in sequential order, which will hopefully speed up writing the IO out to disk. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-03GFS2: Don't cache iopen glocksBob Peterson2-1/+6
This patch makes GFS2 immediately reclaim/delete all iopen glocks as soon as they're dequeued. This allows deleters to get an EXclusive lock on iopen so files are deleted properly instead of being set as unlinked. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-03GFS2: Fall back to vmalloc if kmalloc fails for dir hash tablesBob Peterson1-10/+33
This version has one more correction: the vmalloc calls are replaced by __vmalloc calls to preserve the GFP_NOFS flag. When GFS2's directory management code allocates buffers for a directory hash table, if it can't get the memory it needs, it currently gives a bad return code. Rather than giving an error, this patch allows it to use virtual memory rather than kernel memory for the hash table. This should make it possible for directories to function properly, even when kernel memory becomes very fragmented. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-03GFS2: Increase i_writecount during gfs2_setattr_sizeBob Peterson3-11/+29
This patch calls get_write_access in a few functions. This merely increases inode->i_writecount for the duration of the function. That will ensure that any file closes won't delete the inode's multi-block reservation while the function is running. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-03GFS2: Set log descriptor type for jdata blocksBob Peterson1-1/+3
This patch sets the log descriptor type according to whether the journal commit is for (journaled) data or metadata. This was recently broken when the functions to process data and metadata log ops were combined. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-05-24GFS2: Fix typo in gfs2_log_end_write loopSteven Whitehouse1-1/+1
There was a missing _all in this loop iterator Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-05-24GFS2: fix DLM depends to fix build errorsRandy Dunlap1-1/+1
Fix build errors by correcting DLM dependencies in GFS2. Build errors happen when CONFIG_GFS2_FS_LOCKING_DLM=y and CONFIG_DLM=m: fs/built-in.o: In function `gfs2_lock': file.c:(.text+0xc7abd): undefined reference to `dlm_posix_get' file.c:(.text+0xc7ad0): undefined reference to `dlm_posix_unlock' file.c:(.text+0xc7ad9): undefined reference to `dlm_posix_lock' fs/built-in.o: In function `gdlm_unmount': lock_dlm.c:(.text+0xd6e5b): undefined reference to `dlm_release_lockspace' fs/built-in.o: In function `sync_unlock': lock_dlm.c:(.text+0xd6e9e): undefined reference to `dlm_unlock' fs/built-in.o: In function `sync_lock': lock_dlm.c:(.text+0xd6fb6): undefined reference to `dlm_lock' fs/built-in.o: In function `gdlm_put_lock': lock_dlm.c:(.text+0xd7238): undefined reference to `dlm_unlock' fs/built-in.o: In function `gdlm_mount': lock_dlm.c:(.text+0xd753e): undefined reference to `dlm_new_lockspace' lock_dlm.c:(.text+0xd79d3): undefined reference to `dlm_release_lockspace' fs/built-in.o: In function `gdlm_lock': lock_dlm.c:(.text+0xd8179): undefined reference to `dlm_lock' fs/built-in.o: In function `gdlm_cancel': lock_dlm.c:(.text+0xd6b22): undefined reference to `dlm_unlock' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-05-24GFS2: Use single-block reservations for directoriesBob Peterson1-2/+7
This patch changes the multi-block allocation code, such that directory inodes only get a single block reserved in the bitmap. That way, the bitmaps are more tightly packed together, and there are fewer spans of free blocks for in-use block reservations. This means it takes less time to find a free span of blocks in the bitmap, which speeds things up. This increases the performance of some workloads by almost 2X. In Nate's mockup.py script (which does (1) create dir, (2) create dir in dir, (3) create file in that dir) the test executes in 23 steps rather than 43 steps, a 47% performance improvement. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-05-24GFS2: two minor quota fixupsBob Peterson1-2/+2
This patch fixes two regression problems that Abhi found in the GFS2 quota code. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-05-21gfs2: use ->invalidatepage() length argumentLukas Czerner1-2/+7
->invalidatepage() aop now accepts range to invalidate so we can make use of it in gfs2_invalidatepage(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Cc: cluster-devel@redhat.com
2013-05-21mm: change invalidatepage prototype to accept lengthLukas Czerner1-3/+5
Currently there is no way to truncate partial page where the end truncate point is not at the end of the page. This is because it was not needed and the functionality was enough for file system truncate operation to work properly. However more file systems now support punch hole feature and it can benefit from mm supporting truncating page just up to the certain point. Specifically, with this functionality truncate_inode_pages_range() can be changed so it supports truncating partial page at the end of the range (currently it will BUG_ON() if 'end' is not at the end of the page). This commit changes the invalidatepage() address space operation prototype to accept range to be invalidated and update all the instances for it. We also change the block_invalidatepage() in the same way and actually make a use of the new length argument implementing range invalidation. Actual file system implementations will follow except the file systems where the changes are really simple and should not change the behaviour in any way .Implementation for truncate_page_range() which will be able to accept page unaligned ranges will follow as well. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com>
2013-05-08Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull block core updates from Jens Axboe: - Major bit is Kents prep work for immutable bio vecs. - Stable candidate fix for a scheduling-while-atomic in the queue bypass operation. - Fix for the hang on exceeded rq->datalen 32-bit unsigned when merging discard bios. - Tejuns changes to convert the writeback thread pool to the generic workqueue mechanism. - Runtime PM framework, SCSI patches exists on top of these in James' tree. - A few random fixes. * 'for-3.10/core' of git://git.kernel.dk/linux-block: (40 commits) relay: move remove_buf_file inside relay_close_buf partitions/efi.c: replace useless kzalloc's by kmalloc's fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read() block: fix max discard sectors limit blkcg: fix "scheduling while atomic" in blk_queue_bypass_start Documentation: cfq-iosched: update documentation help for cfq tunables writeback: expose the bdi_wq workqueue writeback: replace custom worker pool implementation with unbound workqueue writeback: remove unused bdi_pending_list aoe: Fix unitialized var usage bio-integrity: Add explicit field for owner of bip_buf block: Add an explicit bio flag for bios that own their bvec block: Add bio_alloc_pages() block: Convert some code to bio_for_each_segment_all() block: Add bio_for_each_segment_all() bounce: Refactor __blk_queue_bounce to not use bi_io_vec raid1: use bio_copy_data() pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage pktcdvd: use bio_copy_data() block: Add bio_copy_data() ...
2013-05-07aio: don't include aio.h in sched.hKent Overstreet2-0/+2
Faster kernel compiles by way of fewer unnecessary includes. [akpm@linux-foundation.org: fix fallout] [akpm@linux-foundation.org: fix build] Signed-off-by: Kent Overstreet <koverstreet@google.com> Cc: Zach Brown <zab@redhat.com> Cc: Felipe Balbi <balbi@ti.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmwLinus Torvalds16-242/+188
Pull GFS2 updates from Steven Whitehouse: "There is not a whole lot of change this time - there are some further changes which are in the works, but those will be held over until next time. Here there are some clean ups to inode creation, the addition of an origin (local or remote) indicator to glock demote requests, removal of one of the remaining GFP_NOFAIL allocations during log flushes, one minor clean up, and a one liner bug fix." * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: GFS2: Flush work queue before clearing glock hash tables GFS2: Add origin indicator to glock demote tracing GFS2: Add origin indicator to glock callbacks GFS2: replace gfs2_ail structure with gfs2_trans GFS2: Remove vestigial parameter ip from function rs_deltree GFS2: Use gfs2_dinode_out() in the inode create path GFS2: Remove gfs2_refresh_inode from inode creation path GFS2: Clean up inode creation path
2013-04-30Merge branch 'for-linus' of ↵Linus Torvalds2-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual stuff, mostly comment fixes, typo fixes, printk fixes and small code cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (45 commits) mm: Convert print_symbol to %pSR gfs2: Convert print_symbol to %pSR m32r: Convert print_symbol to %pSR iostats.txt: add easy-to-find description for field 6 x86 cmpxchg.h: fix wrong comment treewide: Fix typo in printk and comments doc: devicetree: Fix various typos docbook: fix 8250 naming in device-drivers pata_pdc2027x: Fix compiler warning treewide: Fix typo in printks mei: Fix comments in drivers/misc/mei treewide: Fix typos in kernel messages pm44xx: Fix comment for "CONFIG_CPU_IDLE" doc: Fix typo "CONFIG_CGROUP_CGROUP_MEMCG_SWAP" mmzone: correct "pags" to "pages" in comment. kernel-parameters: remove outdated 'noresidual' parameter Remove spurious _H suffixes from ifdef comments sound: Remove stray pluses from Kconfig file radio-shark: Fix printk "CONFIG_LED_CLASS" doc: put proper reference to CONFIG_MODULE_SIG_ENFORCE ...
2013-04-29gfs2: Convert print_symbol to %pSRJoe Perches2-3/+4
Use the new vsprintf extension to avoid any possible message interleaving. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-04-26GFS2: Flush work queue before clearing glock hash tablesBob Peterson1-0/+1
There was a timing window when a GFS2 file system was unmounted that caused GFS2 to call BUG() and panic the kernel. The call to BUG() is meant to ensure that the glock reference count, gl_ref, never gets down to zero and bounce back up again. What was happening during umount is that function gfs2_put_super was dequeing its glocks for well-known files. In particular, we saw it on the journal glock, sd_jinode_gh. The dequeue caused delayed work to be queued for the glock state machine, to transition the lock to an "unlocked" state. While the work was still queued, gfs2_put_super called gfs2_gl_hash_clear to clear out the glock hash tables. If the timing was just so, the glock work function would drop the reference count at the time when it was being checked for zero, and that caused BUG() to be called. This patch calls flush_workqueue before clearing the glock hash tables, thereby ensuring that the delayed work is executed before the hash tables are cleared, and therefore the reference count never goes to zero until the glock is cleared. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-10GFS2: Add origin indicator to glock demote tracingSteven Whitehouse2-5/+8
This adds the origin indicator to the trace point for glock demotion, so that it is possible to see where demote requests have come from. Note that requests generated from the demote_rq sysfs interface will show as remote, since they are intended to replicate exactly the effect of a demote reuqest from a remote node. It is still possible to tell these apart by looking at the process which initiated the demote request. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-10GFS2: Add origin indicator to glock callbacksSteven Whitehouse3-9/+9
This patch adds a bool indicating whether the demote request was originated locally or remotely. This is then used by the iopen ->go_callback() to make 100% sure that it will only respond to remote callbacks. Since ->evict_inode() uses GL_NOCACHE when it attempts to get an exclusive lock on the iopen lock, this may result in extra scheduling of the workqueue in case that the exclusive promotion request failed. This patch prevents that from happening. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-08GFS2: replace gfs2_ail structure with gfs2_transBenjamin Marzinski7-72/+94
In order to allow transactions and log flushes to happen at the same time, gfs2 needs to move the transaction accounting and active items list code into the gfs2_trans structure. As a first step toward this, this patch removes the gfs2_ail structure, and handles the active items list in the gfs_trans structure. This keeps gfs2 from allocating an ail structure on log flushes, and gives us a struture that can later be used to store the transaction accounting outside of the gfs2 superblock structure. With this patch, at the end of a transaction, gfs2 will add the gfs2_trans structure to the superblock if there is not one already. This structure now has the active items fields that were previously in gfs2_ail. This is not necessary in the case where the transaction was simply used to add revokes, since these are never written outside of the journal, and thus, don't need an active items list. Also, in order to make sure that the transaction structure is not removed while it's still in use by gfs2_trans_end, unlocking the sd_log_flush_lock has to happen slightly later in ending the transaction. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-08GFS2: Remove vestigial parameter ip from function rs_deltreeBob Peterson4-11/+11
The functions that delete block reservations from the rgrp block reservations rbtree no longer use the ip parameter. This patch eliminates the parameter. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-08GFS2: Use gfs2_dinode_out() in the inode create pathSteven Whitehouse1-35/+9
Over the previous two patches relating to inode creation, the content of init_dinode() has been looking more and more like gfs2_dinode_out(). This is not an accident! This patch replaces the parts of init_dinode() which are duplicated in gfs2_dinode_out() with a call to that function. Mostly that is straightforward, but there is one issue which needed to be resolved relating to the link count. The link count has to be set to zero in a certain error handling code path, which lands up calling iput(). This is now done specifically in that code path allowing the link count to be set earlier and written into the on disk inode by gfs2_dinode_put() in the normal way. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-08GFS2: Remove gfs2_refresh_inode from inode creation pathSteven Whitehouse3-58/+35
The original method for creating inodes used in GFS2 was to fill out a buffer, with all the information, and then to read that buffer into the in-core inode, using gfs2_refresh_inode() The problem with this approach is that all the inode's fields need to be calculated ahead of time, and were stored in various variables making the code rather complicated. The new approach is simply to allocate the in-core inode earlier and fill in as many fields as possible ahead of time. These can then be used to initilise the on disk representation. The code has been working towards the point where it is possible to remove gfs2_refresh_inode() because all the fields are correctly initialised ahead of time. We've now reached that milestone, and have reversed the order of setting up the in core and on disk inodes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-08GFS2: Clean up inode creation pathSteven Whitehouse2-69/+38
This patch cleans up the inode creation code path in GFS2. After the Orlov allocator was merged, a number of potential improvements are now possible, and this is a first set of these. The quota handling is now updated so that it matches the point in the code where the allocation takes place. This means that the one exception in gfs2_alloc_blocks relating to quota is now no longer required, and we can use the generic code everywhere. In addition the call to figure out whether we need to allocate any extra blocks in order to add a directory entry is moved higher up gfs2_create_inode. This means that if it returns an error, we can deal with that at a stage where it is easier to handle that case. The returned status cannot change during the function since we hold an exclusive lock on the directory. Two calls to gfs2_rindex_update have been changed to one, again at the top of gfs2_create_inode to simplify error handling. The time stamps are also now initialised earlier in the creation process, this is gradually moving towards being able to remove the call to gfs2_refresh_inode in gfs2_inode_create once we have all the fields covered. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-05GFS2: Issue discards in 512b sectorsBob Peterson1-17/+13
This patch changes GFS2's discard issuing code so that it calls function sb_issue_discard rather than blkdev_issue_discard. The code was calling blkdev_issue_discard and specifying the correct sector offset and sector size, but blkdev_issue_discard expects these values to be in terms of 512 byte sectors, even if the native sector size for the device is different. Calling sb_issue_discard with the BLOCK size instead ensures the correct block-to-512b-sector translation. I verified that "minlen" is specified in blocks, so comparing it to a number of blocks is correct. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-04GFS2: Fix unlock of fcntl locks during withdrawn stateSteven Whitehouse1-1/+4
When withdraw occurs, we need to continue to allow unlocks of fcntl locks to occur, however these will only be local, since the node has withdrawn from the cluster. This prevents triggering a VFS level bug trap due to locks remaining when a file is closed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-04GFS2: return error if malloc failed in gfs2_rs_alloc()Wei Yongjun1-1/+1
The error code in gfs2_rs_alloc() is set to ENOMEM when error but never be used, instead, gfs2_rs_alloc() always return 0. Fix to return 'error'. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-04GFS2: use memchr_invAkinobu Mita1-6/+2
Use memchr_inv to verify that the specified memory range is cleared. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: cluster-devel@redhat.com Cc: Christine Caulfield <ccaulfie@redhat.com> Cc: David Teigland <teigland@redhat.com>
2013-04-04GFS2: use kmalloc for lvb bitmapDavid Teigland2-13/+19
The temp lvb bitmap was on the stack, which could be an alignment problem for __set_bit_le. Use kmalloc for it instead. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-03-23block: Add bio_end_sector()Kent Overstreet1-1/+1
Just a little convenience macro - main reason to add it now is preparing for immutable bio vecs, it'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx into a struct bvec_iter. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: Lars Ellenberg <drbd-dev@lists.linbit.com> CC: Jiri Kosina <jkosina@suse.cz> CC: Alasdair Kergon <agk@redhat.com> CC: dm-devel@redhat.com CC: Neil Brown <neilb@suse.de> CC: Martin Schwidefsky <schwidefsky@de.ibm.com> CC: Heiko Carstens <heiko.carstens@de.ibm.com> CC: linux-s390@vger.kernel.org CC: Chris Mason <chris.mason@fusionio.com> CC: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2013-03-03fs: Limit sys_mount to only request filesystem modules.Eric W. Biederman1-1/+3
Modify the request_module to prefix the file system type with "fs-" and add aliases to all of the filesystems that can be built as modules to match. A common practice is to build all of the kernel code and leave code that is not commonly needed as modules, with the result that many users are exposed to any bug anywhere in the kernel. Looking for filesystems with a fs- prefix limits the pool of possible modules that can be loaded by mount to just filesystems trivially making things safer with no real cost. Using aliases means user space can control the policy of which filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf with blacklist and alias directives. Allowing simple, safe, well understood work-arounds to known problematic software. This also addresses a rare but unfortunate problem where the filesystem name is not the same as it's module name and module auto-loading would not work. While writing this patch I saw a handful of such cases. The most significant being autofs that lives in the module autofs4. This is relevant to user namespaces because we can reach the request module in get_fs_type() without having any special permissions, and people get uncomfortable when a user specified string (in this case the filesystem type) goes all of the way to request_module. After having looked at this issue I don't think there is any particular reason to perform any filtering or permission checks beyond making it clear in the module request that we want a filesystem module. The common pattern in the kernel is to call request_module() without regards to the users permissions. In general all a filesystem module does once loaded is call register_filesystem() and go to sleep. Which means there is not much attack surface exposed by loading a filesytem module unless the filesystem is mounted. In a user namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT, which most filesystems do not set today. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Reported-by: Kees Cook <keescook@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-26Merge branch 'for-linus' of ↵Linus Torvalds4-21/+20
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle *and* ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...
2013-02-26fs: change return values from -EACCES to -EPERMZhao Hongjiang1-9/+9
According to SUSv3: [EACCES] Permission denied. An attempt was made to access a file in a way forbidden by its file access permissions. [EPERM] Operation not permitted. An attempt was made to perform an operation limited to processes with appropriate privileges or to the owner of a file or other resource. So -EPERM should be returned if capability checks fails. Strictly speaking this is an API change since the error code user sees is altered. Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26fs: encode_fh: return FILEID_INVALID if invalid fid_typeNamjae Jeon1-2/+2
This patch is a follow up on below patch: [PATCH] exportfs: add FILEID_INVALID to indicate invalid fid_type commit: 216b6cbdcbd86b1db0754d58886b466ae31f5a63 Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Vivek Trivedi <t.vivek@samsung.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Sage Weil <sage@inktank.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-25Merge branch 'for-linus' of ↵Linus Torvalds11-118/+104
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace and namespace infrastructure changes from Eric W Biederman: "This set of changes starts with a few small enhnacements to the user namespace. reboot support, allowing more arbitrary mappings, and support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the user namespace root. I do my best to document that if you care about limiting your unprivileged users that when you have the user namespace support enabled you will need to enable memory control groups. There is a minor bug fix to prevent overflowing the stack if someone creates way too many user namespaces. The bulk of the changes are a continuation of the kuid/kgid push down work through the filesystems. These changes make using uids and gids typesafe which ensures that these filesystems are safe to use when multiple user namespaces are in use. The filesystems converted for 3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The changes for these filesystems were a little more involved so I split the changes into smaller hopefully obviously correct changes. XFS is the only filesystem that remains. I was hoping I could get that in this release so that user namespace support would be enabled with an allyesconfig or an allmodconfig but it looks like the xfs changes need another couple of days before it they are ready." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits) cifs: Enable building with user namespaces enabled. cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t cifs: Convert struct cifs_sb_info to use kuids and kgids cifs: Modify struct smb_vol to use kuids and kgids cifs: Convert struct cifsFileInfo to use a kuid cifs: Convert struct cifs_fattr to use kuid and kgids cifs: Convert struct tcon_link to use a kuid. cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t cifs: Convert from a kuid before printing current_fsuid cifs: Use kuids and kgids SID to uid/gid mapping cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc cifs: Use BUILD_BUG_ON to validate uids and gids are the same size cifs: Override unmappable incoming uids and gids nfsd: Enable building with user namespaces enabled. nfsd: Properly compare and initialize kuids and kgids nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids nfsd: Modify nfsd4_cb_sec to use kuids and kgids nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion nfsd: Convert nfsxdr to use kuids and kgids nfsd: Convert nfs3xdr to use kuids and kgids ...