summaryrefslogtreecommitdiffstats
path: root/fs/ceph/super.c
AgeCommit message (Collapse)AuthorFilesLines
2010-10-29convert cephAl Viro1-23/+27
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-20ceph: factor out libceph from Ceph file systemYehuda Sadeh1-682/+472
This factors out protocol and low-level storage parts of ceph into a separate libceph module living in net/ceph and include/linux/ceph. This is mostly a matter of moving files around. However, a few key pieces of the interface change as well: - ceph_client becomes ceph_fs_client and ceph_client, where the latter captures the mon and osd clients, and the fs_client gets the mds client and file system specific pieces. - Mount option parsing and debugfs setup is correspondingly broken into two pieces. - The mon client gets a generic handler callback for otherwise unknown messages (mds map, in this case). - The basic supported/required feature bits can be expanded (and are by ceph_fs_client). No functional change, aside from some subtle error handling cases that got cleaned up in the refactoring process. Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-03ceph: do not ignore osd_idle_ttl mount optionSage Weil1-0/+3
Actually apply the mount option to the mount_args struct. Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01ceph: make ->sync_fs not wait if wait==0Sage Weil1-4/+13
The ->sync_fs() super op only needs to wait if wait is true. Otherwise, just get some dirty cap writeback started. Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01ceph: use %pU to print uuid (fsid)Sage Weil1-6/+6
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01ceph: clean up fsid mount optionSage Weil1-13/+39
Specify the fsid mount option in hex, not via the major/minor u64 hackery we had before. Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01ceph: remove unused 'monport' mount optionSage Weil1-2/+0
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01ceph: do caps accounting per mds_clientYehuda Sadeh1-6/+0
Caps related accounting is now being done per mds client instead of just being global. This prepares ground work for a later revision of the caps preallocated reservation list. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-10ceph: fix atomic64_t initialization on ia64Jeff Mahoney1-1/+1
bdi_seq is an atomic_long_t but we're using ATOMIC_INIT, which causes build failures on ia64. This patch fixes it to use ATOMIC_LONG_INIT. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-01ceph: fix f_namelen reported by statfsSage Weil1-1/+1
We were setting f_namelen in kstatfs to PATH_MAX instead of NAME_MAX. That disagrees with ceph_lookup behavior (which checks against NAME_MAX), and also makes the pjd posix test suite spit out ugly errors because with can't clean up its temporary files. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-30Merge branch 'for-linus' of ↵Linus Torvalds1-2/+10
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: clean up on forwarded aborted mds request ceph: fix leak of osd authorizer ceph: close out mds, osd connections before stopping auth ceph: make lease code DN specific fs/ceph: Use ERR_CAST ceph: renew auth tickets before they expire ceph: do not resend mon requests on auth ticket renewal ceph: removed duplicated #includes ceph: avoid possible null dereference ceph: make mds requests killable, not interruptible sched: add wait_for_completion_killable_timeout
2010-05-29ceph: close out mds, osd connections before stopping authSage Weil1-1/+9
The auth module (part of the mon_client) is needed to free any ceph_authorizer(s) used by the mds and osd connections. Flush the msgr workqueue before stopping monc to ensure that the destroy_authorizer auth op is available when those connections are closed out. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29fs/ceph: Use ERR_CASTJulia Lawall1-1/+1
Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes more clear what is the purpose of the operation, which otherwise looks like a no-op. In the case of fs/ceph/inode.c, ERR_CAST is not needed, because the type of the returned value is the same as the type of the enclosing function. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ type T; T x; identifier f; @@ T f (...) { <+... - ERR_PTR(PTR_ERR(x)) + x ...+> } @@ expression x; @@ - ERR_PTR(PTR_ERR(x)) + ERR_CAST(x) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-24Merge branch 'for-linus' of ↵Linus Torvalds1-44/+81
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (59 commits) ceph: reuse mon subscribe message instead of allocated anew ceph: avoid resending queued message to monitor ceph: Storage class should be before const qualifier ceph: all allocation functions should get gfp_mask ceph: specify max_bytes on readdir replies ceph: cleanup pool op strings ceph: Use kzalloc ceph: use common helper for aborted dir request invalidation ceph: cope with out of order (unsafe after safe) mds reply ceph: save peer feature bits in connection structure ceph: resync headers with userland ceph: use ceph. prefix for virtual xattrs ceph: throw out dirty caps metadata, data on session teardown ceph: attempt mds reconnect if mds closes our session ceph: clean up send_mds_reconnect interface ceph: wait for mds OPEN reply to indicate reconnect success ceph: only send cap releases when mds is OPEN|HUNG ceph: dicard cap releases on mds restart ceph: make mon client statfs handling more generic ceph: drop src address(es) from message header [new protocol feature] ...
2010-05-21ceph: should use deactivate_locked_super() on failure exitsAl Viro1-2/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-17ceph: specify max_bytes on readdir repliesSage Weil1-0/+8
Specify max bytes in request to bound size of reply. Add associated mount option with default value of 512 KB. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: name bdi ceph-%d instead of major:minorSage Weil1-1/+4
The bdi_setup_and_register() helper doesn't help us since we bdi_init() in create_client() and bdi_register() only when sget() succeeds. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: clean up mount options, ->show_options()Sage Weil1-31/+59
Ensure all options are included in /proc/mounts. Some cleanup. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: remove unused #includesHuang Weiyi1-3/+0
Remove unused #include's in fs/ceph/super.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: wait for both monmap and osdmap when opening sessionSage Weil1-5/+6
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2010-05-17ceph: use ceph_sb_to_client instead of ceph_clientCheng Renquan1-4/+4
ceph_sb_to_client and ceph_client are really identical, we need to dump one; while function ceph_client is confusing with "struct ceph_client", ceph_sb_to_client's definition is more clear; so we'd better switch all call to ceph_sb_to_client. -static inline struct ceph_client *ceph_client(struct super_block *sb) -{ - return sb->s_fs_info; -} Signed-off-by: Cheng Renquan <crquan@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-04ceph: unregister bdi before kill_anon_super releases device nameSage Weil1-7/+16
Unregister and destroy the bdi in put_super, after mount is r/o, but before put_anon_super releases the device name. For symmetry, bdi_destroy in destroy_client (we bdi_init in create_client). Only set s_bdi if bdi_register succeeds, since we use it to decide whether to bdi_unregister. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-03ceph: print more useful version info on module loadSage Weil1-3/+4
Decouple the client version from the server side. Print relevant protocol and map version info instead. Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo1-0/+1
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-04ceph: reset osd after relevant messages timed outYehuda Sadeh1-1/+7
This simplifies the process of timing out messages. We keep lru of current messages that are in flight. If a timeout has passed, we reset the osd connection, so that messages will be retransmitted. This is a failsafe in case we hit some sort of problem sending out message to the OSD. Normally, we'll get notification via an updated osdmap if there are problems. If a request is older than the keepalive timeout, send a keepalive to ensure we detect any breaks in the TCP connection. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-17ceph: clean up readdir caps reservationSage Weil1-0/+5
Use a global counter for the minimum number of allocated caps instead of hard coding a check against readdir_max. This takes into account multiple client instances, and avoids examining the superblock mount options when a cap is dropped. Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-11ceph: put unused osd connections on lruYehuda Sadeh1-0/+3
Instead of removing osd connection immediately when the requests list is empty, put the osd connection on an lru. Only if that osd has not been used for more than a specified time, will it be removed. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-10ceph: allow renewal of auth credentialsSage Weil1-6/+6
Add infrastructure to allow the mon_client to periodically renew its auth credentials. Also add a messenger callback that will force such a renewal if a peer rejects our authenticator. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23ceph: only unregister registered bdiSage Weil1-1/+2
Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-21ceph: writeback congestion controlYehuda Sadeh1-0/+36
Set bdi congestion bit when amount of write data in flight exceeds adjustable threshold. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-21ceph: hex dump corrupt server data to KERN_DEBUGSage Weil1-7/+2
Also, print fsid using standard format, NOT hex dump. Signed-off-by: Sage Weil <sage@newdream.net>
2009-11-20ceph: mount fails immediately on errorYehuda Sadeh1-1/+5
Signed-off-by: Yehuda Sadeh <yehuda@newdream.net>
2009-11-20ceph: fix debugfs entry, simplify fsid checksSage Weil1-5/+27
We may first learn our fsid from any of the mon, osd, or mds maps (whichever the monitor sends first). Consolidate checks in a single helper. Initialize the client debugfs entry then, since we need the fsid (and global_id) for the directory name. Also remove dead mount code. Signed-off-by: Sage Weil <sage@newdream.net>
2009-11-18ceph: move mempool creation to ceph_create_clientSage Weil1-11/+12
Signed-off-by: Sage Weil <sage@newdream.net>
2009-11-18ceph: negotiate authentication protocol; implement AUTH_NONE protocolSage Weil1-13/+19
When we open a monitor session, we send an initial AUTH message listing the auth protocols we support, our entity name, and (possibly) a previously assigned global_id. The monitor chooses a protocol and responds with an initial message. Initially implement AUTH_NONE, a dummy protocol that provides no security, but works within the new framework. It generates 'authorizers' that are used when connecting to (mds, osd) services that simply state our entity name and global_id. This is a wire protocol change. Signed-off-by: Sage Weil <sage@newdream.net>
2009-11-18ceph: handle errors during osd client initSage Weil1-1/+5
Unwind initializing if we get ENOMEM during client initialization. Signed-off-by: Sage Weil <sage@newdream.net>
2009-11-04ceph: fix sparse endian warningSage Weil1-1/+1
Use the __le macro, even though for -1 it doesn't matter. Signed-off-by: Sage Weil <sage@newdream.net>
2009-11-02ceph: init/destroy bdi in client create/destroy helpersSage Weil1-6/+9
This keeps bdi setup/teardown in line with client life cycle. Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-27ceph: allocate and parse mount args before client instanceSage Weil1-58/+48
This simplifies much of the error handling during mount. It also means that we have the mount args before client creation, and we can initialize based on those options. Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-27ceph: fix, clean up string mount arg parsingSage Weil1-2/+9
Clearly demark int and string argument options, and do not try to convert string arguments to ints. Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-26ceph: silence uninitialized variable warningSage Weil1-1/+1
Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-26ceph: reduce parse_mount_args stack usageSage Weil1-8/+17
Since we've increased the max mon count, we shouldn't put the addr array on the parse_mount_args stack. Put it on the heap instead. Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-22ceph: remove small mon addr limit; use CEPH_MAX_MON where appropriateSage Weil1-2/+2
Get rid of separate max mon limit; use the system limit instead. This allows mounts when there are lots of mon addrs provided by mount.ceph (as with a host with lots of A/AAAA records). Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-16ceph: enable readaheadSage Weil1-0/+1
Initialized bdi->ra_pages to enable readahead. Use 512KB default. Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-14ceph: initialize sb->s_bdi, bdi_unregister after kill_anon_superSage Weil1-1/+3
Writeback doesn't work without the bdi set, and writeback on umount doesn't work if we unregister the bdi too early. Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-12ceph: remove unused CEPH_MSG_{OSD,MDS}_GETMAPSage Weil1-2/+0
Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-07ceph: show meaningful version on module loadSage Weil1-2/+3
Kill the old git revision; print the ceph version and protocol versions instead. Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-06ceph: super.cSage Weil1-0/+936
Mount option parsing, client setup and teardown, and a few odds and ends (e.g., statfs). Signed-off-by: Sage Weil <sage@newdream.net>