summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2011-05-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tileLinus Torvalds7-28/+347
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: arch/tile: more /proc and /sys file support
2011-05-29Merge branch 'x86-platform-next' into x86-platformMatthew Garrett35-1192/+1393
2011-05-29Merge branch 'for-2.6.40' of git://linux-nfs.org/~bfields/linuxLinus Torvalds12-199/+323
* 'for-2.6.40' of git://linux-nfs.org/~bfields/linux: (22 commits) nfsd: make local functions static NFSD: Remove unused variable from nfsd4_decode_bind_conn_to_session() NFSD: Check status from nfsd4_map_bcts_dir() NFSD: Remove setting unused variable in nfsd_vfs_read() nfsd41: error out on repeated RECLAIM_COMPLETE nfsd41: compare request's opcnt with session's maxops at nfsd4_sequence nfsd v4.1 lOCKT clientid field must be ignored nfsd41: add flag checking for create_session nfsd41: make sure nfs server process OPEN with EXCLUSIVE4_1 correctly nfsd4: fix wrongsec handling for PUTFH + op cases nfsd4: make fh_verify responsibility of nfsd_lookup_dentry caller nfsd4: introduce OPDESC helper nfsd4: allow fh_verify caller to skip pseudoflavor checks nfsd: distinguish functions of NFSD_MAY_* flags svcrpc: complete svsk processing on cb receive failure svcrpc: take advantage of tcp autotuning SUNRPC: Don't wait for full record to receive tcp data svcrpc: copy cb reply instead of pages svcrpc: close connection if client sends short packet svcrpc: note network-order types in svc_process_calldir ...
2011-05-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dmLinus Torvalds11-124/+144
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: dm kcopyd: return client directly and not through a pointer dm kcopyd: reserve fewer pages dm io: use fixed initial mempool size dm kcopyd: alloc pages from the main page allocator dm kcopyd: add gfp parm to alloc_pl dm kcopyd: remove superfluous page allocation spinlock dm kcopyd: preallocate sub jobs to avoid deadlock dm kcopyd: avoid pointless job splitting dm mpath: do not fail paths after integrity errors dm table: reject devices without request fns dm table: allow targets to support discards internally
2011-05-29Merge branch 'nfs-for-2.6.40' of ↵Linus Torvalds12-66/+542
git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.40' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: SUNRPC: Support for RPC over AF_LOCAL transports SUNRPC: Remove obsolete comment SUNRPC: Use AF_LOCAL for rpcbind upcalls SUNRPC: Clean up use of curly braces in switch cases NFS: Revert NFSROOT default mount options SUNRPC: Rename xs_encode_tcp_fragment_header() nfs,rcu: convert call_rcu(nfs_free_delegation_callback) to kfree_rcu() nfs41: Correct offset for LAYOUTCOMMIT NFS: nfs_update_inode: print current and new inode size in debug output NFSv4.1: Fix the handling of NFS4ERR_SEQ_MISORDERED errors NFSv4: Handle expired stateids when the lease is still valid SUNRPC: Deal with the lack of a SYN_SENT sk->sk_state_change callback...
2011-05-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linusLinus Torvalds4-6/+6
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus: Squashfs: Fix sanity check patches on big-endian systems
2011-05-29Merge branch 'release' of ↵Linus Torvalds36-498/+729
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: ACPI EC: remove redundant code ACPI: Add D3 cold state ACPI: processor: fix processor_physically_present in UP kernel ACPI: Split out custom_method functionality into an own driver ACPI: Cleanup custom_method debug stuff ACPI EC: enable MSI workaround for Quanta laptops ACPICA: Update to version 20110413 ACPICA: Execute an orphan _REG method under the EC device ACPICA: Move ACPI_NUM_PREDEFINED_REGIONS to a more appropriate place ACPICA: Update internal address SpaceID for DataTable regions ACPICA: Add more methods eligible for NULL package element removal ACPICA: Split all internal Global Lock functions to new file - evglock ACPI: EC: add another DMI check for ASUS hardware ACPI EC: remove dead code ACPICA: Fix code divergence of global lock handling ACPICA: Use acpi_os_create_lock interface ACPI: osl, add acpi_os_create_lock interface ACPI:Fix goto flows in thermal-sys
2011-05-29Merge branch 'idle-release' of ↵Linus Torvalds14-40/+102
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6 * 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6: x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param x86 idle: deprecate "no-hlt" cmdline param x86 idle APM: deprecate CONFIG_APM_CPU_IDLE x86 idle floppy: deprecate disable_hlt() x86 idle: EXPORT_SYMBOL(default_idle, pm_idle) only when APM demands it x86 idle: clarify AMD erratum 400 workaround idle governor: Avoid lock acquisition to read pm_qos before entering idle cpuidle: menu: fixed wrapping timers at 4.294 seconds
2011-05-29cifs/ubifs: Fix shrinker API change falloutAl Viro3-3/+5
Commit 1495f230fa77 ("vmscan: change shrinker API by passing shrink_control struct") changed the API of ->shrink(), but missed ubifs and cifs instances. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-29pnfs-obj: pg_test check for max_io_sizeBoaz Harrosh1-1/+18
Implement pg_test vector to test for max IO sizes. We calculate a max_io_size member only once, and cache it in lseg so to not do so on every page insert. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> [simplify logic] Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29NFSv4.1: define nfs_generic_pg_testBoaz Harrosh1-24/+20
By default, unless pnfs is used coalesce pages until pg_bsize (rsize or wsize) is reached. pnfs layout drivers define their own pg_test methods that use pnfs_generic_pg_test and need to define their own I/O size limits (e.g. based on the file stripe size). [Move a check from nfs_pageio_do_add_request to nfs_generic_pg_test] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29NFSv4.1: use pnfs_generic_pg_test directly by layout driverBenny Halevy4-19/+17
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29NFSv4.1: change pg_test return type to boolBenny Halevy6-23/+22
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29NFSv4.1: unify pnfs_pageio_init functionsBenny Halevy5-54/+27
Use common code for pnfs_pageio_init_{read,write} and use a common generic pg_test function. Note that this function always assumes the the layout driver's pg_test method is implemented. [Fix BUG] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29pnfs-obj: objlayout_encode_layoutcommit implementationBoaz Harrosh3-0/+61
* Define API for io-engines to report delta_space_used in IOs * Encode the osd-layout specific information of the layoutcommit XDR buffer. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29pnfs: encode_layoutcommitBenny Halevy2-3/+17
Add a layout driver method to encode the layout type specific opaque part of layout commit in-line in the xdr stream. Currently, the pnfs-objects layout driver uses it to encode metadata hints to the MDS and the blocks layout driver to commit provisionally allocated extents to the file. Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29pnfs-obj: report errors and .encode_layoutreturn Implementation.Boaz Harrosh3-2/+297
An io_state pre-allocates an error information structure for each possible osd-device that might error during IO. When IO is done if all was well the io_state is freed. (as today). If the I/O has ended with an error, the io_state is queued on a per-layout err_list. When eventually encode_layoutreturn() is called, each error is properly encoded on the XDR buffer and only then the io_state is removed from err_list and de-allocated. It is up to the io_engine to fill in the segment that fault and the type of osd_error that occurred. By calling objlayout_io_set_result() for each failing device. In objio_osd: * Allocate io-error descriptors space as part of io_state * Use generic objlayout error reporting at end of io. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29pnfs: encode_layoutreturnAndy Adamson2-2/+11
Add a layout driver method to encode the layout type specific opaque part of layout return in-line in the xdr stream. Currently the pnfs-objects layout driver uses it to encode i/o error information on LAYOUTRETURN. Signed-off-by: Andy Adamson <andros@netapp.com> [fixup layout header pointer for encode_layoutreturn] Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29pnfs: layoutret_on_setattrBenny Halevy3-0/+26
With the objects layout security model, we have object capabilities that are associated with the layout and we anticipate that the server will issue a cb_layoutrecall for any setattr that changes security related attributes (user/group/mode/acl) or truncates the file. Therefore, the layout is returned before issuing the setattr to avoid the anticipated cb_layoutrecall. Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29pnfs: layoutreturnBenny Halevy7-7/+274
NFSv4.1 LAYOUTRETURN implementation Currently, does not support layout-type payload encoding. Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn> [call pnfs_return_layout right before pnfs_destroy_layout] [remove assert_spin_locked from pnfs_clear_lseg_list] [remove wait parameter from the layoutreturn path.] [remove return_type field from nfs4_layoutreturn_args] [remove range from nfs4_layoutreturn_args] [no need to send layoutcommit from _pnfs_return_layout] [don't wait on sync layoutreturn] [fix layout stateid in layoutreturn args] [fixed NULL deref in _pnfs_return_layout] [removed recaim member of nfs4_layoutreturn_args] Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29pnfs-obj: osd raid engine read/write implementationBoaz Harrosh3-0/+901
With the use of the in-kernel osd library. Implement read/write of data from/to osd-objects according to information specified in the objects-layout. Support for stripping over mirrors with a received stripe_unit. There are however a few constrains which are not supported: 1. Stripe Unit must be a multiple of PAGE_SIZE 2. stripe length (stripe_unit * number_of_stripes) can not be bigger then 32bit. Also support raid-groups and partial-layout. Partial-layout is when not all the groups are received on the line, addressing only a partial range of the file. TODO: Only raid0! raid 4/5/6 support will come at later stage A none supported layout will send IO through the MDS [Important fallout from the last rebase] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> [gfp_flags] Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29pnfs: support for non-rpc layout driversBenny Halevy5-4/+62
Non-rpc layout driver such as for objects and blocks implement their own I/O path and error handling logic. Therefore bypass NFS-based error handling for these layout drivers. [fix lseg ref-count bugs, and null de-refs] [Fall out from: non-rpc layout drivers] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> [get rid of PNFS_USE_RPC_CODE] [get rid of __nfs4_write_done_cb] [revert useless change in nfs4_write_done_cb] Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29pnfs-obj: define per-inode private structureBenny Halevy3-0/+46
allocate and deallocate per-inode private pnfs_layout_hdr in preparation for I/O implementation. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29pnfs: alloc and free layout_hdr layoutdriver methodsBenny Halevy2-3/+22
[gfp_flags] Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29pnfs-obj: objio_osd device information retrieval and cachingBoaz Harrosh3-0/+233
When a new layout is received in objio_alloc_lseg all device_ids referenced are retrieved. The device information is queried for from MDS and then the osd_device is looked-up from the osd-initiator library. The devices are cached in a per-mount-point list, for later use. At unmount all devices are "put" back to the library. objlayout_get_deviceinfo(), objlayout_put_deviceinfo() middleware API for retrieving device information given a device_id. TODO: The device cache can get big. Cap its size. Keep an LRU and start to return devices which were not used, when list gets to big, or when new entries allocation fail. [pnfs-obj: Bugs in new global-device-cache code] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> [gfp_flags] [use global device cache] [use layout driver in global device cache] Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29pnfs-obj: decode layout, alloc/free lsegBoaz Harrosh4-2/+341
objlayout_alloc_lseg prepares an xdr_stream and calls the raid engins objio_alloc_lseg() to allocate a private pnfs_layout_segment. objio_osd.c::objio_alloc_lseg() uses passed xdr_stream to decode and store the layout_segment information in an objio_segment struct, using the pnfs_osd_xdr.h API for the actual parsing the layout xdr. objlayout_free_lseg calls objio_free_lseg() to free the allocated space. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> [gfp_flags] [removed "extern" from function definitions] Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29pnfs-obj: pnfs_osd XDR client implementationBoaz Harrosh2-1/+413
* Add the fs/nfs/objlayout/pnfs_osd_xdr_cli.c file, which will include the XDR encode/decode implementations for the pNFS client objlayout driver. [Wrong type in comments] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29pnfs-obj: pnfs_osd XDR definitionsBenny Halevy1-0/+345
* Add the pnfs_osd_xdr.h header * defintions the pnfs_osd_layout structure including all it's sub-types and constants. * Declare the pnfs_osd_xdr_decode_layout API + all needed inline helpers. * Define the pnfs_osd_deviceaddr structure and all its subtypes and constants. * Declare API for decoding of a pnfs_osd_deviceaddr from XDR stream. * Define the pnfs_osd_ioerr structure, its substructures and constants. * Declare API for encoding of a pnfs_osd_ioerr into XDR stream. * Define the pnfs_osd_layoutupdate structure and its substructures. * Declare API for encoding of a pnfs_osd_layoutupdate into XDR stream. [Remove server definitions] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29pnfs-obj: objlayoutdriver module skeletonBenny Halevy4-0/+93
* Define the PNFS_OBJLAYOUT Kconfig option in the nfs master Kconfig file. * Add the objlayout driver to the Kernel's Kbuild system. * Add the fs/nfs/objlayout/Kbuild file for building the objlayoutdriver.ko driver * Define fs/nfs/objlayout/objio_osd.c, register the driver on module initialization and unregister on exit. [pnfs-obj: remove of CONFIG_PNFS fallout] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> [added "unsure" clause] [depend on NFS_V4_1] Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29pnfs: client statsJ. Bruce Fields1-0/+25
A pNFS client auto-negotiates a lot of features (minorversion level, pNFS layout type, etc.). This is convenient, but makes certain kinds of failures hard for a user to detect. For example, if the client falls back on 4.0, or falls back to MDS IO because the user didn't connect to the right iscsi disks before mounting, the only symptoms may be reduced performance, which may not be noticed till long after the actual failure, and may be difficult for a user to diagnose. However, such "failures" may also be perfectly normal in some cases, so we don't want to spam the system logs with them. One approach would be to put some more information into /proc/self/mountstats. Signed-off-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [pnfs: add commit client stats] [fixup data types for "ret" variables in pnfs_try_to* inline funcs.] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [fix definition of show_pnfs for !CONFIG_PNFS] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: Fix show_sessions in the not CONFIG_NFS_V4_1 case] There is a build error when CONFIG_NFS_V4 is set but CONFIG_NFS_V4_1 is *not* set. show_sessions() prototype was unbalanced between the two cases. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> [pnfs: super.c remove CONFIG_PNFS] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29pnfs: Use byte-range for cb_layoutrecallBenny Halevy3-9/+12
Use recalled range to invalidate particular layout segments in the layout cache. Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29pnfs: align layoutget requests on page boundariesBenny Halevy1-0/+8
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29pnfs: Use byte-range for layoutgetBenny Halevy4-45/+142
Add offset and count parameters to pnfs_update_layout and use them to get the layout in the pageio path. Order cache layout segments in the following order: * offset (ascending) * length (descending) * iomode (RW before READ) Test byte range against the layout segment in use in pnfs_{read,write}_pg_test so not to coalesce pages not using the same layout segment. [fix lseg ordering] [clean up pnfs_find_lseg lseg arg] [remove unnecessary FIXME] [fix ordering in pnfs_insert_layout] [clean up pnfs_insert_layout] Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29SUNRPC: introduce xdr_init_decode_pagesBenny Halevy5-21/+27
Initialize xdr_stream and xdr_buf using an array of page pointers and length of buffer. Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29NFSv4.1: use layout driver in global device cacheBenny Halevy4-16/+23
pnfs deviceids are unique per server, per layout type. struct nfs_client is currently used to distinguish deviceids from different nfs servers, yet these may clash between different layout types on the same server. Therefore, use the layout driver associated with each deviceid at insertion time to look it up, unhash, or delete it. Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29pnfs: CB_NOTIFY_DEVICEIDMarc Eshel5-15/+241
Note: This functionlaity is incomplete as all layout segments referring to the 'to be removed device id' need to be reaped, and all in flight I/O drained. [use be32 res in nfs4_callback_devicenotify] [use nfs_client to qualify deviceid for cb_notify_deviceid] [use global deviceid cache for CB_NOTIFY_DEVICEID] [refactor device cache _lookup_deviceid] [refactor device cache _find_get_deviceid] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [Bug in new global-device-cache code] [layout_driver MUST set free_deviceid_node if using dev-cache] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29eCryptfs: Cleanup inode initialization codeTyler Hicks2-68/+69
The eCryptfs inode get, initialization, and dentry interposition code has two separate paths. One is for when dentry interposition is needed after doing things like a mkdir in the lower filesystem and the other is needed after a lookup. Unlocking new inodes and doing a d_add() needs to happen at different times, depending on which type of dentry interposing is being done. This patch cleans up the inode get and initialization code paths and splits them up so that the locking and d_add() differences mentioned above can be handled appropriately in a later patch. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Tested-by: David <david@unsolicited.net>
2011-05-29NFSv4.1: purge deviceid cache on nfs_free_clientBenny Halevy6-7/+70
Use the pnfs_layoutdriver_type both as a qualifier for the deviceid, distinguishing deviceid from different layout types on the server, and for freeing the layout-driver allocated structure containing the nfs4_deviceid_node. [BUG in _deviceid_purge_client] [layout_driver MUST set free_deviceid_node if using dev-cache] [let ver < 4.1 compile] Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> [removed EXPORT_SYMBOL_GPL(nfs4_deviceid_purge_client)] Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2011-05-29eCryptfs: Consolidate inode functions into inode.cTyler Hicks4-107/+91
These functions should live in inode.c since their focus is on inodes and they're primarily used by functions in inode.c. Also does a simple cleanup of ecryptfs_inode_test() and rolls ecryptfs_init_inode() into ecryptfs_inode_set(). Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Tested-by: David <david@unsolicited.net>
2011-05-29mm, rmap: Add yet more comments to page_get_anon_vma/page_lock_anon_vmaPeter Zijlstra1-1/+6
Inspired by an analysis from Hugh on why again all this doesn't explode in our face. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-29dm kcopyd: return client directly and not through a pointerMikulas Patocka4-10/+12
Return client directly from dm_kcopyd_client_create, not through a parameter, making it consistent with dm_io_client_create. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29dm kcopyd: reserve fewer pagesMikulas Patocka4-13/+6
Reserve just the minimum of pages needed to process one job. Because we allocate pages from page allocator, we don't need to reserve a large number of pages. The maximum job size is SUB_JOB_SIZE and we calculate the number of reserved pages based on this. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29dm io: use fixed initial mempool sizeMikulas Patocka6-41/+10
Replace the arbitrary calculation of an initial io struct mempool size with a constant. The code calculated the number of reserved structures based on the request size and used a "magic" multiplication constant of 4. This patch changes it to reserve a fixed number - itself still chosen quite arbitrarily. Further testing might show if there is a better number to choose. Note that if there is no memory pressure, we can still allocate an arbitrary number of "struct io" structures. One structure is enough to process the whole request. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29dm kcopyd: alloc pages from the main page allocatorMikulas Patocka1-31/+60
This patch changes dm-kcopyd so that it allocates pages from the main page allocator with __GFP_NOWARN | __GFP_NORETRY flags (so that it can fail in case of memory pressure). If the allocation fails, dm-kcopyd allocates pages from its own reserve. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29dm kcopyd: add gfp parm to alloc_plMikulas Patocka1-4/+4
Introduce a parameter for gfp flags to alloc_pl() for use in following patches. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29dm kcopyd: remove superfluous page allocation spinlockMikulas Patocka1-10/+1
Remove the spinlock protecting the pages allocation. The spinlock is only taken on initialization or from single-threaded workqueue. Therefore, the spinlock is useless. The spinlock is taken in kcopyd_get_pages and kcopyd_put_pages. kcopyd_get_pages is only called from run_pages_job, which is only called from process_jobs called from do_work. kcopyd_put_pages is called from client_alloc_pages (which is initialization function) or from run_complete_job. run_complete_job is only called from process_jobs called from do_work. Another spinlock, kc->job_lock is taken each time someone pushes or pops some work for the worker thread. Once we take kc->job_lock, we guarantee that any written memory is visible to the other CPUs. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29dm kcopyd: preallocate sub jobs to avoid deadlockMikulas Patocka1-20/+29
There's a possible theoretical deadlock in dm-kcopyd because multiple allocations from the same mempool are required to finish a request. Avoid this by preallocating sub jobs. There is a mempool of 512 entries. Each request requires up to 9 entries from the mempool. If we have at least 57 concurrent requests running, the mempool may overflow and mempool allocations may start blocking until another entry is freed to the mempool. Because the same thread is used to free entries to the mempool and allocate entries from the mempool, this may result in a deadlock. This patch changes it so that one mempool entry contains all 9 "struct kcopyd_job" required to fulfill the whole request. The allocation is done only once in dm_kcopyd_copy and no further mempool allocations are done during request processing. If dm_kcopyd_copy is not run in the completion thread, this implementation is deadlock-free. MIN_JOBS needs reducing accordingly and we've chosen to reduce it further to 8. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29dm kcopyd: avoid pointless job splittingMikulas Patocka1-1/+1
Don't split SUB_JOB_SIZE jobs If the job size equals SUB_JOB_SIZE, there is no point in splitting it. Splitting it just unnecessarily wastes time, because the split job size is SUB_JOB_SIZE too. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29dm mpath: do not fail paths after integrity errorsMartin K. Petersen1-1/+1
Integrity errors need to be passed to the owner of the integrity metadata for processing. Consequently EILSEQ should be passed up the stack. Cc: stable@kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-05-29dm table: reject devices without request fnsMilan Broz1-0/+17
This patch adds a check that a block device has a request function defined before it is used. Otherwise, misconfiguration can cause an oops. Because we are allowing devices with zero size e.g. an offline multipath device as in commit 2cd54d9bedb79a97f014e86c0da393416b264eb3 ("dm: allow offline devices") there needs to be an additional check to ensure devices are initialised. Some block devices, like a loop device without a backing file, exist but have no request function. Reproducer is trivial: dm-mirror on unbound loop device (no backing file on loop devices) dmsetup create x --table "0 8 mirror core 2 8 sync 2 /dev/loop0 0 /dev/loop1 0" and mirror resync will immediatelly cause OOps. BUG: unable to handle kernel NULL pointer dereference at (null) ? generic_make_request+0x2bd/0x590 ? kmem_cache_alloc+0xad/0x190 submit_bio+0x53/0xe0 ? bio_add_page+0x3b/0x50 dispatch_io+0x1ca/0x210 [dm_mod] ? read_callback+0x0/0xd0 [dm_mirror] dm_io+0xbb/0x290 [dm_mod] do_mirror+0x1e0/0x748 [dm_mirror] Signed-off-by: Milan Broz <mbroz@redhat.com> Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>