summaryrefslogtreecommitdiffstats
path: root/block/ll_rw_blk.c
AgeCommit message (Collapse)AuthorFilesLines
2006-09-30[PATCH] blk_queue_start_tag() shared map race fixJens Axboe1-4/+9
If we share the tag map between two or more queues, then we cannot use __set_bit() to set the bit. In fact we need to make sure we atomically acquire this tag, so loop using test_and_set_bit() to protect from that. Noticed by Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30[PATCH] exit_io_context: don't disable irqsOleg Nesterov1-5/+2
We don't need to disable irqs to clear current->io_context, it is protected by ->alloc_lock. Even IF it was possible to submit I/O from IRQ on behalf of current this irq_disable() can't help: current_io_context() will re-instantiate ->io_context after irq_enable(). We don't need task_lock() or local_irq_disable() to clear ioc->task. This can't prevent other CPUs from playing with our io_context anyway. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30[PATCH] Allow file systems to differentiate between data and meta readsJens Axboe1-0/+2
We can use this information for making more intelligent priority decisions, and it will also be useful for blktrace. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30[PATCH] ll_rw_blk: allow more flexibility for read_ahead_kb storeJens Axboe1-3/+0
It can make sense to set read-ahead larger than a single request. We should not be enforcing such policy on the user. Additionally, using the BLKRASET ioctl doesn't impose such a restriction. So additionally we now expose identical behaviour through the two. Issue also reported by Anton <cbou@mail.ru> Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30[PATCH] Add blk_start_queueing() helperJens Axboe1-5/+20
CFQ implements this on its own now, but it's really block layer knowledge. Tells a device queue to start dispatching requests to the driver, taking care to unplug if needed. Also fixes the issue where as/cfq will invoke a stopped queue, which we really don't want. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30[PATCH] Make sure all block/io scheduler setups are node awareJens Axboe1-6/+7
Some were kmalloc_node(), some were still kmalloc(). Change them all to kmalloc_node(). Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30[PATCH] Audit block layer inlinesJens Axboe1-4/+4
Kill a few inlines that bring in too much code to more than one location Shrinks kernel text by about 300 bytes on 32-bit x86. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30[PATCH] cfq-iosched: kill cfq_exit_lockJens Axboe1-1/+1
cfq_exit_lock is protecting two things now: - The per-ioc rbtree of cfq_io_contexts - The per-cfqd linked list of cfq_io_contexts The per-cfqd linked list can be protected by the queue lock, as it is (by definition) per cfqd as the queue lock is. The per-ioc rbtree is mainly used and updated by the process itself only. The only outside use is the io priority changing. If we move the priority changing to not browsing the rbtree, we can remove any locking from the rbtree updates and lookup completely. Let the sys_ioprio syscall just mark processes as having the iopriority changed and lazily update the private cfq io contexts the next time io is queued, and we can remove this locking as well. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30[PATCH] ll_rw_blk: cleanup __make_request()Jens Axboe1-15/+7
- Don't assign variables that are only used once. - Kill spin_lock() prefetching, it's opportunistic at best. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30[PATCH] Drop useless bio passing in may_queue/set_request APIJens Axboe1-5/+4
It's not needed for anything, so kill the bio passing. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30[PATCH] Remove ->rq_status from struct requestJens Axboe1-3/+0
After Christophs SCSI change, the only usage left is RQ_ACTIVE and RQ_INACTIVE. The block layer sets RQ_INACTIVE right before freeing the request, so any check for RQ_INACTIVE in a driver is a bug and indicates use-after-free. So kill/clean the remaining users, straight forward. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30[PATCH] Remove struct request_list from struct requestJens Axboe1-8/+2
It is always identical to &q->rq, and we only use it for detecting whether this request came out of our mempool or not. So replace it with an additional ->flags bit flag. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30[PATCH] Remove ->waiting member from struct requestJens Axboe1-7/+6
As the comments indicates in blkdev.h, we can fold it into ->end_io_data usage as that is really what ->waiting is. Fixup the users of blk_end_sync_rq(). Signed-off-by: Jens Axboe <axboe@kernel.dk>
2006-09-30[PATCH] elevator: abstract out the rbtree sort handlingJens Axboe1-3/+4
The rbtree sort/lookup/reposition logic is mostly duplicated in cfq/deadline/as, so move it to the elevator core. The io schedulers still provide the actual rb root, as we don't want to impose any sort of specific handling on the schedulers. Introduce the helpers and rb_node in struct request to help migrate the IO schedulers. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30[PATCH] elevator: move the backmerging logic into the elevator coreJens Axboe1-0/+2
Right now, every IO scheduler implements its own backmerging (except for noop, which does no merging). That results in duplicated code for essentially the same operation, which is never a good thing. This patch moves the backmerging out of the io schedulers and into the elevator core. We save 1.6kb of text and as a bonus get backmerging for noop as well. Win-win! Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-30[PATCH] Split struct request ->flags into two partsJens Axboe1-66/+35
Right now ->flags is a bit of a mess: some are request types, and others are just modifiers. Clean this up by splitting it into ->cmd_type and ->cmd_flags. This allows introduction of generic Linux block message types, useful for sending generic Linux commands to block devices. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-09-29[PATCH] ifdef blktrace debugging fieldsAlexey Dobriyan1-2/+1
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-23Merge mulgrave-w:git/linux-2.6James Bottomley1-0/+12
Conflicts: include/linux/blkdev.h Trivial merge to incorporate tag prototypes.
2006-09-22Add a real API for dealing with blk_congestion_wait()Trond Myklebust1-0/+12
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-08-31[SCSI] block: add support for shared tag mapsJames Bottomley1-21/+88
The current block queue implementation already contains most of the machinery for shared tag maps. The only remaining pieces are a way to allocate and destroy a tag map independently of the queues (so that the maps can be managed on the life cycle of the overseeing entity) Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2006-08-21[PATCH] Fix current_io_context() vs set_task_ioprio() raceOleg Nesterov1-0/+2
I know nothing about io scheduler, but I suspect set_task_ioprio() is not safe. current_io_context() initializes "struct io_context", then sets ->io_context. set_task_ioprio() running on another cpu may see the changes out of order, so ->set_ioprio(ioc) may use io_context which was not initialized properly. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jens Axboe <axboe@suse.de>
2006-07-06[PATCH] Only the first two bits in bio->bi_rw and rq->flags matchJens Axboe1-2/+2
Not three, as assumed. This causes the barrier bit to be needlessly set for some IO. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-07-03[PATCH] lockdep: annotate on-stack completionsIngo Molnar1-1/+1
lockdep needs to have the waitqueue lock initialized for on-stack waitqueues implicitly initialized by DECLARE_COMPLETION(). Annotate on-stack completions accordingly. Has no effect on non-lockdep kernels. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivialLinus Torvalds1-1/+0
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: Remove obsolete #include <linux/config.h> remove obsolete swsusp_encrypt arch/arm26/Kconfig typos Documentation/IPMI typos Kconfig: Typos in net/sched/Kconfig v9fs: do not include linux/version.h Documentation/DocBook/mtdnand.tmpl: typo fixes typo fixes: specfic -> specific typo fixes in Documentation/networking/pktgen.txt typo fixes: occuring -> occurring typo fixes: infomation -> information typo fixes: disadvantadge -> disadvantage typo fixes: aquire -> acquire typo fixes: mecanism -> mechanism typo fixes: bandwith -> bandwidth fix a typo in the RTC_CLASS help text smb is no longer maintained Manually merged trivial conflict in arch/um/kernel/vmlinux.lds.S
2006-06-30[PATCH] Light weight event countersChristoph Lameter1-2/+2
The remaining counters in page_state after the zoned VM counter patches have been applied are all just for show in /proc/vmstat. They have no essential function for the VM. We use a simple increment of per cpu variables. In order to avoid the most severe races we disable preempt. Preempt does not prevent the race between an increment and an interrupt handler incrementing the same statistics counter. However, that race is exceedingly rare, we may only loose one increment or so and there is no requirement (at least not in kernel) that the vm event counters have to be accurate. In the non preempt case this results in a simple increment for each counter. For many architectures this will be reduced by the compiler to a single instruction. This single instruction is atomic for i386 and x86_64. And therefore even the rare race condition in an interrupt is avoided for both architectures in most cases. The patchset also adds an off switch for embedded systems that allows a building of linux kernels without these counters. The implementation of these counters is through inline code that hopefully results in only a single instruction increment instruction being emitted (i386, x86_64) or in the increment being hidden though instruction concurrency (EPIC architectures such as ia64 can get that done). Benefits: - VM event counter operations usually reduce to a single inline instruction on i386 and x86_64. - No interrupt disable, only preempt disable for the preempt case. Preempt disable can also be avoided by moving the counter into a spinlock. - Handling is similar to zoned VM counters. - Simple and easily extendable. - Can be omitted to reduce memory use for embedded use. References: RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2 RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2 local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2 V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2 V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2 V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2 Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-30Remove obsolete #include <linux/config.h>Jörn Engel1-1/+0
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-27[PATCH] cpu hotplug: use hotplug version of cpu notifier in appropriate placesChandra Seetharaman1-3/+1
Make use the of newly defined hotplug version of cpu_notifier functionality wherever appropriate. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Cc: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] cpu hotplug: revert initdata patch submitted for 2.6.17Chandra Seetharaman1-1/+1
This patch reverts notifier_block changes made in 2.6.17 Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Cc: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26spelling fixesAndreas Mohr1-2/+2
acquired (aquired) contiguous (contigious) successful (succesful, succesfull) surprise (suprise) whether (weather) some other misspellings Signed-off-by: Andreas Mohr <andi@lisas.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-06-23[BLOCK] Fix bounce limit address checkAndi Kleen1-1/+1
Do a safer check for when to enable DMA. Currently we enable ISA DMA for cases that do not need it, resulting in OOM conditions when ZONE_DMA runs out of space. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23[PATCH] Kill PF_SYNCWRITE flagJens Axboe1-0/+3
A process flag to indicate whether we are doing sync io is incredibly ugly. It also causes performance problems when one does a lot of async io and then proceeds to sync it. Part of the io will go out as async, and the other part as sync. This causes a disconnect between the previously submitted io and the synced io. For io schedulers such as CFQ, this will cause us lost merges and suboptimal behaviour in scheduling. Remove PF_SYNCWRITE completely from the fsync/msync paths, and let the O_DIRECT path just directly indicate that the writes are sync by using WRITE_SYNC instead. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23[PATCH] blk_start_queue() must be called with irq disabled - add warningPaolo 'Blaisorblade' Giarrusso1-1/+4
The queue lock can be taken from interrupts so it must always be taken with irq disabling primitives. Some primitives already verify this. blk_start_queue() is called under this lock, so interrupts must be disabled. Also document this requirement clearly in blk_init_queue(), where the queue spinlock is set. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jens Axboe <axboe@suse.de>
2006-06-23[PATCH] list: use list_replace_init() instead of list_splice_init()Oleg Nesterov1-3/+2
list_splice_init(list, head) does unneeded job if it is known that list_empty(head) == 1. We can use list_replace_init() instead. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-23[PATCH] blk: fix gendisk->in_flight accounting during barrier sequenceJens Axboe1-1/+6
While executing barrrier sequence, the bar_rq which carries actual write was accounted as normal IO on completion, while it wasn't on queueing. This caused gendisk->in_flight to be decremented by 1 after each barrier thus messed up statistics. This patch makes bar_rq not accounted as normal IO. As the containing barrier request as a whole is accounted, part of it shouldn't be. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-11[BLOCK] limit request_fn recursionJens Axboe1-2/+15
Don't recurse back into the driver even if the unplug threshold is met, when the driver asks for a requeue. This is both silly from a logical point of view (requeues typically happen due to driver/hardware shortage), and also dangerous since we could hit an endless request_fn -> requeue -> unplug -> request_fn loop and crash on stack overrun. Also limit blk_run_queue() to one level of recursion, similar to how blk_start_queue() works. This patch fixed a real problem with SLES10 and lpfc, and it could hit any SCSI lld that returns non-zero from it's ->queuecommand() handler. Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-26[PATCH] Remove __devinitdata from notifier block definitionsChandra Seetharaman1-1/+1
Few of the notifier_chain_register() callers use __devinitdata in the definition of notifier_block data structure. It is incorrect as the data structure should be available after the initializations (they do not unregister them during initializations). This was leading to an oops when notifier_chain_register() call is invoked for those callback chains after initialization. This patch fixes all such usages to _not_ have the notifier_block data structure in the init data section. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-20[patch] cleanup: use blk_queue_stoppedCoywolf Qi Hunt1-2/+2
This cleanup the source to use blk_queue_stopped. Signed-off-by: Coywolf Qi Hunt <qiyong@freeforge.net> Signed-off-by: Jens Axboe <axboe@suse.de>
2006-04-02Documentation: fix minor kernel-doc warningsMartin Waitz1-1/+1
This patch updates the comments to match the actual code. Signed-off-by: Martin Waitz <tali@admingilde.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-03-28Merge branch 'cfq-merge' of git://brick.kernel.dk/data/git/linux-2.6-blockLinus Torvalds1-6/+15
* 'cfq-merge' of git://brick.kernel.dk/data/git/linux-2.6-block: [BLOCK] cfq-iosched: seek and async performance fixes [PATCH] ll_rw_blk: fix 80-col offender in put_io_context() [PATCH] cfq-iosched: small cfq_choose_req() optimization [PATCH] [BLOCK] cfq-iosched: change cfq io context linking from list to tree
2006-03-28[PATCH] for_each_possible_cpu: fixes for generic partKAMEZAWA Hiroyuki1-1/+1
replaces for_each_cpu with for_each_possible_cpu(). Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-28[PATCH] ll_rw_blk: fix 80-col offender in put_io_context()Jens Axboe1-1/+3
This makes akpm more happy. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-28[PATCH] [BLOCK] cfq-iosched: change cfq io context linking from list to treeJens Axboe1-6/+13
On setups with many disks, we spend a considerable amount of time looking up the process-disk mapping on each queue of io. Testing with a NULL based block driver, this costs 40-50% reduction in throughput for 1000 disks. Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-27Merge branch 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-blockLinus Torvalds1-7/+7
* 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block: [PATCH] Don't make debugfs depend on DEBUG_KERNEL [PATCH] Fix blktrace compile with sysfs not defined [PATCH] unused label in drivers/block/cciss. [BLOCK] increase size of disk stat counters [PATCH] blk_execute_rq_nowait-speedup [PATCH] ide-cd: quiet down GPCMD_READ_CDVD_CAPACITY failure [BLOCK] ll_rw_blk: kmalloc -> kzalloc conversion [PATCH] kzalloc() conversion in drivers/block [PATCH] update max_sectors documentation
2006-03-27[PATCH] md: Make sure QUEUE_FLAG_CLUSTER is set properly for md.NeilBrown1-0/+2
This flag should be set for a virtual device iff it is set for all underlying devices. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27[PATCH] blk_execute_rq_nowait-speedupAndrew Morton1-3/+5
Both elv_add_request() and generic_unplug_device() grab the queue lock and disable interrupts, do that locally and use the __ variants. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-27[BLOCK] ll_rw_blk: kmalloc -> kzalloc conversionJens Axboe1-4/+2
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-23[PATCH] Block queue IO tracing support (blktrace) as of 2006-03-23Jens Axboe1-2/+42
Signed-off-by: Jens Axboe <axboe@suse.de>
2006-03-18[PATCH] fix sysfs interaction and lifetime rules handling for queuesAl Viro1-25/+58
2006-03-18[PATCH] deal with rmmod/put_io_context() racesAl Viro1-0/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2006-03-18[PATCH] fix locking in queue_requests_store()Al Viro1-3/+7
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>