summaryrefslogtreecommitdiffstats
path: root/drivers/block/xen-blkback/blkback.c
AgeCommit message (Collapse)AuthorFilesLines
2012-11-26xen-blkback: move free persistent grants codeRoger Pau Monne1-31/+37
Move the code that frees persistent grants from the red-black tree to a function. This will make it easier for other consumers to move this to a common place. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-11-04xen/blkback: persistent-grants fixesRoger Pau Monne1-3/+6
This patch contains fixes for persistent grants implementation v2: * handle == 0 is a valid handle, so initialize grants in blkback setting the handle to BLKBACK_INVALID_HANDLE instead of 0. Reported by Konrad Rzeszutek Wilk. * new_map is a boolean, use "true" or "false" instead of 1 and 0. Reported by Konrad Rzeszutek Wilk. * blkfront announces the persistent-grants feature as feature-persistent-grants, use feature-persistent instead which is consistent with blkback and the public Xen headers. * Add a consistency check in blkfront to make sure we don't try to access segments that have not been set. Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Roger Pau Monne <roger.pau@citrix.com> [v1: The new_map int->bool had already been changed] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-10-30xen/blkback: Persistent grant maps for xen blk driversRoger Pau Monne1-25/+267
This patch implements persistent grants for the xen-blk{front,back} mechanism. The effect of this change is to reduce the number of unmap operations performed, since they cause a (costly) TLB shootdown. This allows the I/O performance to scale better when a large number of VMs are performing I/O. Previously, the blkfront driver was supplied a bvec[] from the request queue. This was granted to dom0; dom0 performed the I/O and wrote directly into the grant-mapped memory and unmapped it; blkfront then removed foreign access for that grant. The cost of unmapping scales badly with the number of CPUs in Dom0. An experiment showed that when Dom0 has 24 VCPUs, and guests are performing parallel I/O to a ramdisk, the IPIs from performing unmap's is a bottleneck at 5 guests (at which point 650,000 IOPS are being performed in total). If more than 5 guests are used, the performance declines. By 10 guests, only 400,000 IOPS are being performed. This patch improves performance by only unmapping when the connection between blkfront and back is broken. On startup blkfront notifies blkback that it is using persistent grants, and blkback will do the same. If blkback is not capable of persistent mapping, blkfront will still use the same grants, since it is compatible with the previous protocol, and simplifies the code complexity in blkfront. To perform a read, in persistent mode, blkfront uses a separate pool of pages that it maps to dom0. When a request comes in, blkfront transmutes the request so that blkback will write into one of these free pages. Blkback keeps note of which grefs it has already mapped. When a new ring request comes to blkback, it looks to see if it has already mapped that page. If so, it will not map it again. If the page hasn't been previously mapped, it is mapped now, and a record is kept of this mapping. Blkback proceeds as usual. When blkfront is notified that blkback has completed a request, it memcpy's from the shared memory, into the bvec supplied. A record that the {gref, page} tuple is mapped, and not inflight is kept. Writes are similar, except that the memcpy is peformed from the supplied bvecs, into the shared pages, before the request is put onto the ring. Blkback stores a mapping of grefs=>{page mapped to by gref} in a red-black tree. As the grefs are not known apriori, and provide no guarantees on their ordering, we have to perform a search through this tree to find the page, for every gref we receive. This operation takes O(log n) time in the worst case. In blkfront grants are stored using a single linked list. The maximum number of grants that blkback will persistenly map is currently set to RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST, to prevent a malicios guest from attempting a DoS, by supplying fresh grefs, causing the Dom0 kernel to map excessively. If a guest is using persistent grants and exceeds the maximum number of grants to map persistenly the newly passed grefs will be mapped and unmaped. Using this approach, we can have requests that mix persistent and non-persistent grants, and we need to handle them correctly. This allows us to set the maximum number of persistent grants to a lower value than RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST, although setting it will lead to unpredictable performance. In writing this patch, the question arrises as to if the additional cost of performing memcpys in the guest (to/from the pool of granted pages) outweigh the gains of not performing TLB shootdowns. The answer to that question is `no'. There appears to be very little, if any additional cost to the guest of using persistent grants. There is perhaps a small saving, from the reduced number of hypercalls performed in granting, and ending foreign access. Signed-off-by: Oliver Chick <oliver.chick@citrix.com> Signed-off-by: Roger Pau Monne <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v1: Fixed up the misuse of bool as int]
2012-10-07Merge tag 'stable/for-linus-3.7-arm-tag' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull ADM Xen support from Konrad Rzeszutek Wilk: Features: * Allow a Linux guest to boot as initial domain and as normal guests on Xen on ARM (specifically ARMv7 with virtualized extensions). PV console, block and network frontend/backends are working. Bug-fixes: * Fix compile linux-next fallout. * Fix PVHVM bootup crashing. The Xen-unstable hypervisor (so will be 4.3 in a ~6 months), supports ARMv7 platforms. The goal in implementing this architecture is to exploit the hardware as much as possible. That means use as little as possible of PV operations (so no PV MMU) - and use existing PV drivers for I/Os (network, block, console, etc). This is similar to how PVHVM guests operate in X86 platform nowadays - except that on ARM there is no need for QEMU. The end result is that we share a lot of the generic Xen drivers and infrastructure. Details on how to compile/boot/etc are available at this Wiki: http://wiki.xen.org/wiki/Xen_ARMv7_with_Virtualization_Extensions and this blog has links to a technical discussion/presentations on the overall architecture: http://blog.xen.org/index.php/2012/09/21/xensummit-sessions-new-pvh-virtualisation-mode-for-arm-cortex-a15arm-servers-and-x86/ * tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (21 commits) xen/xen_initial_domain: check that xen_start_info is initialized xen: mark xen_init_IRQ __init xen/Makefile: fix dom-y build arm: introduce a DTS for Xen unprivileged virtual machines MAINTAINERS: add myself as Xen ARM maintainer xen/arm: compile netback xen/arm: compile blkfront and blkback xen/arm: implement alloc/free_xenballooned_pages with alloc_pages/kfree xen/arm: receive Xen events on ARM xen/arm: initialize grant_table on ARM xen/arm: get privilege status xen/arm: introduce CONFIG_XEN on ARM xen: do not compile manage, balloon, pci, acpi, pcpu and cpu_hotplug on ARM xen/arm: Introduce xen_ulong_t for unsigned long xen/arm: Xen detection and shared_info page mapping docs: Xen ARM DT bindings xen/arm: empty implementation of grant_table arch specific functions xen/arm: sync_bitops xen/arm: page.h definitions xen/arm: hypercalls ...
2012-09-12xen/m2p: do not reuse kmap_op->dev_bus_addrStefano Stabellini1-1/+1
If the caller passes a valid kmap_op to m2p_add_override, we use kmap_op->dev_bus_addr to store the original mfn, but dev_bus_addr is part of the interface with Xen and if we are batching the hypercalls it might not have been written by the hypervisor yet. That means that later on Xen will write to it and we'll think that the original mfn is actually what Xen has written to it. Rather than "stealing" struct members from kmap_op, keep using page->index to store the original mfn and add another parameter to m2p_remove_override to get the corresponding kmap_op instead. It is now responsibility of the caller to keep track of which kmap_op corresponds to a particular page in the m2p_override (gntdev, the only user of this interface that passes a valid kmap_op, is already doing that). CC: stable@kernel.org Reported-and-Tested-By: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-08-08xen/arm: compile blkfront and blkbackStefano Stabellini1-0/+1
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-03-24xen/blkback: Squash the discard support for 'file' and 'phy' type.Konrad Rzeszutek Wilk1-11/+8
The only reason for the distinction was for the special case of 'file' (which is assumed to be loopback device), was to reach inside the loopback device, find the underlaying file, and call fallocate on it. Fortunately "xen-blkback: convert hole punching to discard request on loop devices" removes that use-case and we now based the discard support based on blk_queue_discard(q) and extract all appropriate parameters from the 'struct request_queue'. CC: Li Dongyang <lidongyang@novell.com> Acked-by: Jan Beulich <JBeulich@suse.com> [v1: Dropping pointless initializer and keeping blank line] [v2: Remove the kfree as it is not used anymore] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-03-20xen/blkback: Enable blkback on HVM guestsDaniel De Graaf1-1/+1
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-03-20xen/blkback: use grant-table.c hypercall wrappersDaniel De Graaf1-25/+4
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-18xen-blkback: convert hole punching to discard request on loop devicesLi Dongyang1-17/+2
As of dfaa2ef68e80c378e610e3c8c536f1c239e8d3ef, loop devices support discard request now. We could just issue a discard request, and the loop driver will punch the hole for us, so we don't need to touch the internals of loop device and punch the hole ourselves, Thanks. V0->V1: rebased on devel/for-jens-3.3 Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-18xen/blkback: Move processing of BLKIF_OP_DISCARD from dispatch_rw_block_ioKonrad Rzeszutek Wilk1-30/+24
.. and move it to its own function that will deal with the discard operation. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-18xen/blk[front|back]: Enhance discard support with secure erasing support.Konrad Rzeszutek Wilk1-6/+12
Part of the blkdev_issue_discard(xx) operation is that it can also issue a secure discard operation that will permanantly remove the sectors in question. We advertise that we can support that via the 'discard-secure' attribute and on the request, if the 'secure' bit is set, we will attempt to pass in REQ_DISCARD | REQ_SECURE. CC: Li Dongyang <lidongyang@novell.com> [v1: Used 'flag' instead of 'secure:1' bit] [v2: Use 'reserved' uint8_t instead of adding a new value] [v3: Check for nseg when mapping instead of operation] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-18xen/blk[front|back]: Squash blkif_request_rw and blkif_request_discard togetherKonrad Rzeszutek Wilk1-6/+7
In a union type structure to deal with the overlapping attributes in a easier manner. Suggested-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-11-04Merge branch 'for-3.2/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds1-21/+109
* 'for-3.2/drivers' of git://git.kernel.dk/linux-block: (30 commits) virtio-blk: use ida to allocate disk index hpsa: add small delay when using PCI Power Management to reset for kump cciss: add small delay when using PCI Power Management to reset for kump xen/blkback: Fix two races in the handling of barrier requests. xen/blkback: Check for proper operation. xen/blkback: Fix the inhibition to map pages when discarding sector ranges. xen/blkback: Report VBD_WSECT (wr_sect) properly. xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests. xen-blkfront: plug device number leak in xlblk_init() error path xen-blkfront: If no barrier or flush is supported, use invalid operation. xen-blkback: use kzalloc() in favor of kmalloc()+memset() xen-blkback: fixed indentation and comments xen-blkfront: fix a deadlock while handling discard response xen-blkfront: Handle discard requests. xen-blkback: Implement discard requests ('feature-discard') xen-blkfront: add BLKIF_OP_DISCARD and discard request struct drivers/block/loop.c: remove unnecessary bdev argument from loop_clr_fd() drivers/block/loop.c: emit uevent on auto release drivers/block/cpqarray.c: use pci_dev->revision loop: always allow userspace partitions and optionally support automatic scanning ... Fic up trivial header file includsion conflict in drivers/block/loop.c
2011-10-17xen/blkback: Fix two races in the handling of barrier requests.Konrad Rzeszutek Wilk1-5/+5
There are two windows of opportunity to cause a race when processing a barrier request. This patch fixes this. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-14xen/blkback: Check for proper operation.Konrad Rzeszutek Wilk1-1/+1
The patch titled: "xen/blkback: Fix the inhibition to map pages when discarding sector ranges." had the right idea except that it used the wrong comparison operator. It had == instead of !=. This fixes the bug where all (except discard) operations would have been ignored. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-13xen/blkback: Fix the inhibition to map pages when discarding sector ranges.Konrad Rzeszutek Wilk1-2/+1
The 'operation' parameters are the ones provided to the bio layer while the req->operation are the ones passed in between the backend and frontend. We used the wrong 'operation' value to squash the call to map pages when processing the discard operation resulting in an hypercall that did nothing. Lets guard against going in the mapping function by checking for the proper operation type. CC: Li Dongyang <lidongyang@novell.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-13xen/blkback: Report VBD_WSECT (wr_sect) properly.Konrad Rzeszutek Wilk1-1/+1
We did not increment the amount of sectors written to disk b/c we tested for the == WRITE which is incorrect - as the operations are more of WRITE_FLUSH, WRITE_ODIRECT. This patch fixes it by doing a & WRITE check. CC: stable@kernel.org Reported-by: Andy Burns <xen.lists@burns.me.uk> Suggested-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-13xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.Konrad Rzeszutek Wilk1-2/+35
We emulate the barrier requests by draining the outstanding bio's and then sending the WRITE_FLUSH command. To drain the I/Os we use the refcnt that is used during disconnect to wait for all the I/Os before disconnecting from the frontend. We latch on its value and if it reaches either the threshold for disconnect or when there are no more outstanding I/Os, then we have drained all I/Os. Suggested-by: Christopher Hellwig <hch@infradead.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-13xen-blkback: use kzalloc() in favor of kmalloc()+memset()Jan Beulich1-4/+2
This fixes the problem of three of those four memset()-s having improper size arguments passed: Sizeof a pointer-typed expression returns the size of the pointer, not that of the pointed to data. It also reverts using kmalloc() instead of kzalloc() for the allocation of the pending grant handles array, as that array gets fully initialized in a subsequent loop. Reported-by: Julia Lawall <julia@diku.dk> Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-13xen-blkback: Implement discard requests ('feature-discard')Li Dongyang1-14/+72
..aka ATA TRIM/SCSI UNMAP command to be passed through the frontend and used as appropiately by the backend. We also advertise certain granulity parameters to the frontend so it can plug them in. If the backend is a realy device - we just end up using 'blkdev_issue_discard' while for loopback devices - we just punch a hole in the image file. Signed-off-by: Li Dongyang <lidongyang@novell.com> [v1: Fixed up pr_debug and commit description] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-09-29xen: modify kernel mappings corresponding to granted pagesStefano Stabellini1-1/+1
If we want to use granted pages for AIO, changing the mappings of a user vma and the corresponding p2m is not enough, we also need to update the kernel mappings accordingly. Currently this is only needed for pages that are created for user usages through /dev/xen/gntdev. As in, pages that have been in use by the kernel and use the P2M will not need this special mapping. However there are no guarantees that in the future the kernel won't start accessing pages through the 1:1 even for internal usage. In order to avoid the complexity of dealing with highmem, we allocated the pages lowmem. We issue a HYPERVISOR_grant_table_op right away in m2p_add_override and we remove the mappings using another HYPERVISOR_grant_table_op in m2p_remove_override. Considering that m2p_add_override and m2p_remove_override are called once per page we use multicalls and hypercall batching. Use the kmap_op pointer directly as argument to do the mapping as it is guaranteed to be present up until the unmapping is done. Before issuing any unmapping multicalls, we need to make sure that the mapping has already being done, because we need the kmap->handle to be set correctly. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> [v1: Removed GRANT_FRAME_BIT usage] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-06-30xen/blkback: Add module alias for autoloadingBastian Blank1-0/+1
Add xen-backend:vbd module alias to the xen-blkback module. This allows automatic loading of the module. Signed-off-by: Bastian Blank <waldi@debian.org> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-06-30xen/blkback: Don't let in-flight requests defer pending ones.Daniel Stodden1-17/+19
Running RING_FINAL_CHECK_FOR_REQUESTS from make_response is a bad idea. It means that in-flight I/O is essentially blocking continued batches. This essentially kills throughput on frontends which unplug (or even just notify) early and rightfully assume addtional requests will be picked up on time, not synchronously. Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com> [v1: Rebased and fixed compile problems] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-06-01xen/blkback: potential null dereference in error handlingDan Carpenter1-4/+6
blkbk->pending_pages can be NULL here so I added a check for it. Signed-off-by: Dan Carpenter <error27@gmail.com> [v1: Redid the loop a bit] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-18xen/blkback: don't fail empty barrier requestsJan Beulich1-7/+8
The sector number on empty barrier requests may (will?) be -1, which, given that it's being treated as unsigned 64-bit quantity, will almost always exceed the actual (virtual) disk's size. Inspired by Konrad's "When writting barriers set the sector number to zero...". While at it also add overflow checking to the math in vbd_translate(). Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-13xen/blkback: fix xenbus_transaction_start() hang caused by double ↵Laszlo Ersek1-0/+1
xenbus_transaction_end() vbd_resize() up_read()'s xs_state.suspend_mutex twice in a row via double xenbus_transaction_end() calls. The next down_read() in xenbus_transaction_start() (at eg. the next resize attempt) hangs. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=618317 Acked-by: Jan Beulich <jbeulich@novell.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12xen/blkback: if log_stats is enabled print out the data.Konrad Rzeszutek Wilk1-1/+1
And not depend on the driver being built with -DDEBUG flag. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12xen/blkback: Prefix 'vbd' with 'xen' in structs and functions.Konrad Rzeszutek Wilk1-10/+10
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12xen/blkback: Change structure name blkif_st to xen_blkif.Konrad Rzeszutek Wilk1-14/+14
No need for that '_st' and xen_blkif is more apt. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12xen/blkback: Fixing some more of the cleanpatch.pl warnings.Konrad Rzeszutek Wilk1-1/+1
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12xen/blkback: Checkpatch.pl recommend against multiple assigments.Konrad Rzeszutek Wilk1-3/+6
CHECK: multiple assignments should be avoided Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12xen/blkback: Removing the debug_lvl option.Konrad Rzeszutek Wilk1-7/+0
It is not really used for anything. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12xen/blkback: Use the DRV_PFX in the pr_.. macros.Konrad Rzeszutek Wilk1-18/+18
To make it easier to read. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-12xen/blkback: Change printk/DPRINTK to pr_.. type variant.Konrad Rzeszutek Wilk1-29/+29
And also make them uniform and prefix the message with 'xen-blkback'. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-11xen/blkback: Fixed up comments and converted spaces to tabs.Konrad Rzeszutek Wilk1-28/+42
Suggested-by: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-05xen/blkback: Fix up some of the comments.Konrad Rzeszutek Wilk1-3/+3
They had the wrong data or were in the wrong spot. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-05xen/blkback: Squash the checking for operation into dispatch_rw_block_ioKonrad Rzeszutek Wilk1-32/+13
We do a check for the operations right before calling dispatch_rw_block_io. And then we do the same check in dispatch_rw_block_io. This patch squashes those checks into the 'dispatch_rw_block_io' function. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-05-05xen/blkback: Add support for BLKIF_OP_FLUSH_DISKCACHE and drop ↵Konrad Rzeszutek Wilk1-15/+18
BLKIF_OP_WRITE_BARRIER. We drop the support for 'feature-barrier' and add in the support for the 'feature-flush-cache' if the real backend storage supports flushing. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-27Revert "xen/blkback: Move the plugging/unplugging to a higher level."Konrad Rzeszutek Wilk1-6/+7
This reverts commit 97961ef46b9b5a6a7c918a38b898a7b3e49869f4 b/c we lose about 15% performance if we do the unplugging and the end of the reading the ring buffer.
2011-04-26xen/blkback: Stick REQ_SYNC on WRITEs to deal with CFQ I/O scheduler.Konrad Rzeszutek Wilk1-1/+1
If one runs a simple fio request with random read/write with a 20%/80% ratio, the numbers are incredibly bad when using the CFQ scheduler. IOmeter | | | | 64K, randrw | NOOP | CFQ | deadline | randrwmix=80 | | | | --------------+-------+------+----------+ blkback |103/27 |32/10 | 102/27 | --------------+-------+------+----------+ QEMU qdisk |103/27 |102/27| 102/27 | The problem as explained by Vivek Goyal was: ".. that difference is that sync vs async requests. In the case of a kernel thread submitting IO, [..] all the WRITES might be being considered as async and will go in a different queue. If you mix those with some READS, they are always sync and will go in differnet queue. In presence of sync queue, CFQ will idle and choke up WRITES in an attempt to improve latencies of READs. In case of AIO [note: this is what QEMU qdisk is doing] , [..] it is direct IO and both READS and WRITES will be considered SYNC and will go in a single queue and no choking of WRITES will take place." The solution is quite simple, tack on REQ_SYNC (which is what the WRITE_ODIRECT macro points to) and the numbers go back up. Suggested-by: Vivek Goyal <vgoyal@redhat.com Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-26xen/blkback: Move the plugging/unplugging to a higher level.Konrad Rzeszutek Wilk1-7/+6
We used to the plug/unplug on the submit_bio. But that means if within a stream of WRITE, WRITE, WRITE,...,WRITE we have one READ, it could stall the pipeline (as the 'submio_bio' could trigger the unplug_fnc to be called and stall/sync when doing the READ). Instead we want to move the unplugging when the whole (or as a much as possible) ring buffer has been processed. This also eliminates us doing plug/unplug for each request. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-20xen/blkback: Prefix exposed functions with xen_Konrad Rzeszutek Wilk1-18/+18
And also shorten the name if it has blkback to blkbk. This results in the symbol table (if compiled in the kernel) to be much shorter, prettier, and also easier to search for. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-20xen-blkback: Inline some of the functions that were moved from vbd/interface.cKonrad Rzeszutek Wilk1-72/+6
Shuffling code around. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-20xen/blkback: Squash vbd.c,interface.c in blkback.c and xenbus.c respectivly.Konrad Rzeszutek Wilk1-0/+135
Daniel Stodden suggested to eliminate vbd.c and interface.c, inlining the critical bits where they belong, respectively. Leaving only blkback.c for the data- and xenbus.c for the control path. Suggested-by: Daniel Stodden <daniel.stodden@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-04-18xen/blkback: Move it from drivers/xen to drivers/blockKonrad Rzeszutek Wilk1-0/+759
.. and modify the Makefile and Kconfig files appropriately. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>