summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2017-02-02blktrace: make do_blk_trace_setup() staticOmar Sandoval1-4/+0
This isn't used outside of blktrace.c anymore. Fixes: 62c2a7d969f3 ("block: push BKL into blktrace ioctls") Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02block: fix debugfs config conditional in struct request_queueOmar Sandoval1-1/+1
The debugfs dentries are only used for CONFIG_BLK_DEBUG_FS, so make them conditional on that instead of CONFIG_DEBUG_FS. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02debugfs: add debugfs_lookup()Omar Sandoval1-0/+8
We don't always have easy access to the dentry of a file or directory we created in debugfs. Add a helper which allows us to get a dentry we previously created. The motivation for this change is a problem with blktrace and the blk-mq debugfs entries introduced in 07e4fead45e6 ("blk-mq: create debugfs directory tree"). Namely, in some cases, the directory that blktrace needs to create may already exist, but in other cases, it may not. We _could_ rely on a bunch of implied knowledge to decide whether to create the directory or not, but it's much cleaner on our end to just look it up. Signed-off-by: Omar Sandoval <osandov@fb.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02scsi, block: fix duplicate bdi name registration crashesDan Williams2-0/+9
Warnings of the following form occur because scsi reuses a devt number while the block layer still has it referenced as the name of the bdi [1]: WARNING: CPU: 1 PID: 93 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80 sysfs: cannot create duplicate filename '/devices/virtual/bdi/8:192' [..] Call Trace: dump_stack+0x86/0xc3 __warn+0xcb/0xf0 warn_slowpath_fmt+0x5f/0x80 ? kernfs_path_from_node+0x4f/0x60 sysfs_warn_dup+0x62/0x80 sysfs_create_dir_ns+0x77/0x90 kobject_add_internal+0xb2/0x350 kobject_add+0x75/0xd0 device_add+0x15a/0x650 device_create_groups_vargs+0xe0/0xf0 device_create_vargs+0x1c/0x20 bdi_register+0x90/0x240 ? lockdep_init_map+0x57/0x200 bdi_register_owner+0x36/0x60 device_add_disk+0x1bb/0x4e0 ? __pm_runtime_use_autosuspend+0x5c/0x70 sd_probe_async+0x10d/0x1c0 async_run_entry_fn+0x39/0x170 This is a brute-force fix to pass the devt release information from sd_probe() to the locations where we register the bdi, device_add_disk(), and unregister the bdi, blk_cleanup_queue(). Thanks to Omar for the quick reproducer script [2]. This patch survives where an unmodified kernel fails in a few seconds. [1]: https://marc.info/?l=linux-scsi&m=147116857810716&w=4 [2]: http://marc.info/?l=linux-block&m=148554717109098&w=2 Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Bart Van Assche <bart.vanassche@sandisk.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Jan Kara <jack@suse.cz> Reported-by: Omar Sandoval <osandov@osandov.com> Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02block: Get rid of blk_get_backing_dev_info()Jan Kara2-2/+1
blk_get_backing_dev_info() is now a simple dereference. Remove that function and simplify some code around that. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02block: Make blk_get_backing_dev_info() safe without open bdevJan Kara1-0/+1
Currenly blk_get_backing_dev_info() is not safe to be called when the block device is not open as bdev->bd_disk is NULL in that case. However inode_to_bdi() uses this function and may be call called from flusher worker or other writeback related functions without bdev being open which leads to crashes such as: [113031.075540] Unable to handle kernel paging request for data at address 0x00000000 [113031.075614] Faulting instruction address: 0xc0000000003692e0 0:mon> t [c0000000fb65f900] c00000000036cb6c writeback_sb_inodes+0x30c/0x590 [c0000000fb65fa10] c00000000036ced4 __writeback_inodes_wb+0xe4/0x150 [c0000000fb65fa70] c00000000036d33c wb_writeback+0x30c/0x450 [c0000000fb65fb40] c00000000036e198 wb_workfn+0x268/0x580 [c0000000fb65fc50] c0000000000f3470 process_one_work+0x1e0/0x590 [c0000000fb65fce0] c0000000000f38c8 worker_thread+0xa8/0x660 [c0000000fb65fd80] c0000000000fc4b0 kthread+0x110/0x130 [c0000000fb65fe30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02block: Dynamically allocate and refcount backing_dev_infoJan Kara3-2/+11
Instead of storing backing_dev_info inside struct request_queue, allocate it dynamically, reference count it, and free it when the last reference is dropped. Currently only request_queue holds the reference but in the following patch we add other users referencing backing_dev_info. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02block: Use pointer to backing_dev_info from request_queueJan Kara1-1/+2
We will want to have struct backing_dev_info allocated separately from struct request_queue. As the first step add pointer to backing_dev_info to request_queue and convert all users touching it. No functional changes in this patch. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02block: Unhash block device inodes on gendisk destructionJan Kara1-0/+1
Currently, block device inodes stay around after corresponding gendisk hash died until memory reclaim finds them and frees them. Since we will make block device inode pin the bdi, we want to free the block device inode as soon as the device goes away so that bdi does not stay around unnecessarily. Furthermore we need to avoid issues when new device with the same major,minor pair gets created since reusing the bdi structure would be rather difficult in this case. Unhashing block device inode on gendisk destruction nicely deals with these problems. Once last block device inode reference is dropped (which may be directly in del_gendisk()), the inode gets evicted. Furthermore if the major,minor pair gets reallocated, we are guaranteed to get new block device inode even if old block device inode is not yet evicted and thus we avoid issues with possible reuse of bdi. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31block: move internal_tag to same cache line as tagJens Axboe1-2/+3
Since we removed cmd_type, we now have a hole in the struct. Move the internal_tag member to the same cacheline as tag, since we use them at the same time. This doesn't fix the hole, just moves it elsewhere. Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31block: fold cmd_type into the REQ_OP_ spaceChristoph Hellwig3-20/+23
Instead of keeping two levels of indirection for requests types, fold it all into the operations. The little caveat here is that previously cmd_type only applied to struct request, while the request and bio op fields were set to plain REQ_OP_READ/WRITE even for passthrough operations. Instead this patch adds new REQ_OP_* for SCSI passthrough and driver private requests, althought it has to add two for each so that we can communicate the data in/out nature of the request. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31ide: don't abuse cmd_typeChristoph Hellwig1-11/+45
Currently the legacy ide driver defines several request types of it's own, which is in the way of removing that field entirely. Instead add a type field to struct ide_request and use that to distinguish the different types of IDE-internal requests. It's a bit of a mess, but so is the surrounding code.. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31block: introduce blk_rq_is_passthroughChristoph Hellwig3-8/+14
This can be used to check for fs vs non-fs requests and basically removes all knowledge of BLOCK_PC specific from the block layer, as well as preparing for removing the cmd_type field in struct request. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27block: split scsi_request out of struct requestChristoph Hellwig4-14/+39
And require all drivers that want to support BLOCK_PC to allocate it as the first thing of their private data. To support this the legacy IDE and BSG code is switched to set cmd_size on their queues to let the block layer allocate the additional space. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27block/bsg: move queue creation into bsg_setup_queueChristoph Hellwig1-3/+2
Simply the boilerplate code needed for bsg nodes a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27scsi: allocate scsi_cmnd structures as part of struct requestChristoph Hellwig1-3/+0
Rely on the new block layer functionality to allocate additional driver specific data behind struct request instead of implementing it in SCSI itѕelf. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27scsi: remove __scsi_alloc_queueChristoph Hellwig2-2/+2
Instead do an internal export of __scsi_init_queue for the transport classes that export BSG nodes. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27dm: always defer request allocation to the owner of the request_queueChristoph Hellwig1-3/+0
DM already calls blk_mq_alloc_request on the request_queue of the underlying device if it is a blk-mq device. But now that we allow drivers to allocate additional data and initialize it ahead of time we need to do the same for all drivers. Doing so and using the new cmd_size infrastructure in the block layer greatly simplifies the dm-rq and mpath code, and should also make arbitrary combinations of SQ and MQ devices with SQ or MQ device mapper tables easily possible as a further step. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27block: cleanup tracingChristoph Hellwig2-23/+18
A couple tweaks to the tracing code: - trace the request size for all requests - trace request sector and nr_sectors only for fs requests, enforced by helpers - drop SCSI CDB tracing - we have SCSI tracing for this and are going to me the CDB out of the generic struct request soon. With this the tracing code stops to know about BLOCK_PC requests entirely, it's just FS vs passthrough requests now, where the latter includes any driver-private requests. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27block: allow specifying size for extra command dataChristoph Hellwig1-0/+7
This mirrors the blk-mq capabilities to allocate extra drivers-specific data behind struct request by setting a cmd_size field, as well as having a constructor / destructor for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27block: simplify blk_init_allocated_queueChristoph Hellwig1-2/+1
Return an errno value instead of the passed in queue so that the callers don't have to keep track of two queues, and move the assignment of the request_fn and lock to the caller as passing them as argument doesn't simplify anything. While we're at it also remove two pointless NULL assignments, given that the request structure is zeroed on allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27Merge branch 'for-4.11/block' into for-4.11/rq-refactorJens Axboe5-12/+128
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27block: add a op_is_flush helperChristoph Hellwig1-0/+9
This centralizes the checks for bios that needs to be go into the flush state machine. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27blk-mq-sched: change ->dispatch_requests() to ->dispatch_request()Jens Axboe1-1/+1
When we invoke dispatch_requests(), the scheduler empties everything into the passed in list. This isn't always a good thing, since it means that we remove items that we could have potentially merged with. Change the function to dispatch single requests at the time. If we do that, we can backoff exactly at the point where the device can't consume more IO, and leave the rest with the scheduler for better merging and future dispatch decision making. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Tested-by: Hannes Reinecke <hare@suse.com>
2017-01-27blk-mq-sched: fix starvation for multiple hardware queues and shared tagsJens Axboe1-0/+1
If we have both multiple hardware queues and shared tag map between devices, we need to ensure that we propagate the hardware queue restart bit higher up. This is because we can get into a situation where we don't have any IO pending on a hardware queue, yet we fail getting a tag to start new IO. If that happens, it's not enough to mark the hardware queue as needing a restart, we need to bubble that up to the higher level queue as well. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Tested-by: Hannes Reinecke <hare@suse.com>
2017-01-27sbitmap: add helpers for dumping to a seq_fileOmar Sandoval1-0/+30
This is useful debugging information that will be used in the blk-mq debugfs directory. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Changed 'weight' to 'busy'. Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-27blk-mq: create debugfs directory treeOmar Sandoval1-0/+5
In preparation for putting blk-mq debugging information in debugfs, create a directory tree mirroring the one in sysfs: # tree -d /sys/kernel/debug/block /sys/kernel/debug/block |-- nvme0n1 | `-- mq | |-- 0 | | `-- cpu0 | |-- 1 | | `-- cpu1 | |-- 2 | | `-- cpu2 | `-- 3 | `-- cpu3 `-- vda `-- mq `-- 0 |-- cpu0 |-- cpu1 |-- cpu2 `-- cpu3 Also add the scaffolding for the actual files that will go in here, either under the hardware queue or software queue directories. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+1
Pull KVM fixes from Radim Krčmář: "ARM: - Fix for timer setup on VHE machines - Drop spurious warning when the timer races against the vcpu running again - Prevent a vgic deadlock when the initialization fails (for stable) s390: - Fix a kernel memory exposure (for stable) x86: - Fix exception injection when hypercall instruction cannot be patched" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: s390: do not expose random data via facility bitmap KVM: x86: fix fixing of hypercalls KVM: arm/arm64: vgic: Fix deadlock on error handling KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systems KVM: arm/arm64: Fix occasional warning from the timer work function
2017-01-20Merge tag 'scsi-fixes' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is a set of 12 fixes including the mpt3sas one that was causing hangs on ATA passthrough. The others are a couple of zoned block device fixes, a SAS device detection bug which lead to SATA drives not being matched to bays, two qla2xxx MSI fixes, a qla2xxx req for rsp confusion caused by cut and paste, and a few other minor fixes" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: mpt3sas: fix hang on ata passthrough commands scsi: lpfc: Set elsiocb contexts to NULL after freeing it scsi: sd: Ignore zoned field for host-managed devices scsi: sd: Fix wrong DPOFUA disable in sd_read_cache_type scsi: bfa: fix wrongly initialized variable in bfad_im_bsg_els_ct_request() scsi: ses: Fix SAS device detection in enclosure scsi: libfc: Fix variable name in fc_set_wwpn scsi: lpfc: avoid double free of resource identifiers scsi: qla2xxx: remove irq_affinity_notifier scsi: qla2xxx: fix MSI-X vector affinity scsi: qla2xxx: Fix apparent cut-n-paste error. scsi: qla2xxx: Get mutex lock before checking optrom_state
2017-01-18Merge branch 'smp-urgent-for-linus' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP hotplug update from Thomas Gleixner: "This contains a trivial typo fix and an extension to the core code for dynamically allocating states in the prepare stage. The extension is necessary right now because we need a proper way to unbreak LTTNG, which iscurrently non functional due to the removal of the notifiers. Surely it's out of tree, but it's widely used by distros. The simple solution would have been to reserve a state for LTTNG, but I'm not fond about unused crap in the kernel and the dynamic range, which we admittedly should have done right away, allows us to remove quite some of the hardcoded states, i.e. those which have no ordering requirements. So doing the right thing now is better than having an smaller intermediate solution which needs to be reworked anyway" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Provide dynamic range for prepare stage perf/x86/amd/ibs: Fix typo after cleanup state names in cpu/hotplug
2017-01-18Merge branch 'rcu-urgent-for-linus' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU fixes from Ingo Molnar: "This fixes sporadic ACPI related hangs in synchronize_rcu() that were caused by the ACPI code mistakenly relying on an aspect of RCU that was neither promised to work nor reliable but which happened to work - until in v4.9 we changed the RCU implementation, which made the hangs more prominent. Since the mis-use of the RCU facility wasn't properly detected and prevented either, these fixes make the RCU side work reliably instead of working around the problem in the ACPI code. Hence the slightly larger diffstat that goes beyond the normal scope of RCU fixes in -rc kernels" * 'rcu-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Narrow early boot window of illegal synchronous grace periods rcu: Remove cond_resched() from Tiny synchronize_sched()
2017-01-17Merge tag 'modules-for-v4.10-rc5' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull modules fix from Jessica Yu: - fix out-of-tree module breakage when it supplies its own definitions of true and false * tag 'modules-for-v4.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: taint/module: Fix problems when out-of-kernel driver defines true or false
2017-01-17Merge remote-tracking branch 'mkp-scsi/fixes' into fixesJames Bottomley1-3/+3
2017-01-17taint/module: Fix problems when out-of-kernel driver defines true or falseLarry Finger1-2/+2
Commit 7fd8329ba502 ("taint/module: Clean up global and module taint flags handling") used the key words true and false as character members of a new struct. These names cause problems when out-of-kernel modules such as VirtualBox include their own definitions of true and false. Fixes: 7fd8329ba502 ("taint/module: Clean up global and module taint flags handling") Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Petr Mladek <pmladek@suse.com> Cc: Jessica Yu <jeyu@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jessica Yu <jeyu@redhat.com>
2017-01-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds6-7/+16
Pull networking fixes from David Miller: 1) Handle multicast packets properly in fast-RX path of mac80211, from Johannes Berg. 2) Because of a logic bug, the user can't actually force SW checksumming on r8152 devices. This makes diagnosis of hw checksumming bugs really annoying. Fix from Hayes Wang. 3) VXLAN route lookup does not take the source and destination ports into account, which means IPSEC policies cannot be matched properly. Fix from Martynas Pumputis. 4) Do proper RCU locking in netvsc callbacks, from Stephen Hemminger. 5) Fix SKB leaks in mlxsw driver, from Arkadi Sharshevsky. 6) If lwtunnel_fill_encap() fails, we do not abort the netlink message construction properly in fib_dump_info(), from David Ahern. 7) Do not use kernel stack for DMA buffers in atusb driver, from Stefan Schmidt. 8) Openvswitch conntack actions need to maintain a correct checksum, fix from Lance Richardson. 9) ax25_disconnect() is missing a check for ax25->sk being NULL, in fact it already checks this, but not in all of the necessary spots. Fix from Basil Gunn. 10) Action GET operations in the packet scheduler can erroneously bump the reference count of the entry, making it unreleasable. Fix from Jamal Hadi Salim. Jamal gives a great set of example command lines that trigger this in the commit message. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits) net sched actions: fix refcnt when GETing of action after bind net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions net/mlx4_core: Fix racy CQ (Completion Queue) free net: stmmac: don't use netdev_[dbg, info, ..] before net_device is registered net/mlx5e: Fix a -Wmaybe-uninitialized warning ax25: Fix segfault after sock connection timeout bpf: rework prog_digest into prog_tag tipc: allocate user memory with GFP_KERNEL flag net: phy: dp83867: allow RGMII_TXID/RGMII_RXID interface types ip6_tunnel: Account for tunnel header in tunnel MTU mld: do not remove mld souce list info when set link down be2net: fix MAC addr setting on privileged BE3 VFs be2net: don't delete MAC on close on unprivileged BE3 VFs be2net: fix status check in be_cmd_pmac_add() cpmac: remove hopeless #warning ravb: do not use zero-length alignment DMA descriptor mlx4: do not call napi_schedule() without care openvswitch: maintain correct checksum state in conntrack actions tcp: fix tcp_fastopen unaligned access complaints on sparc ...
2017-01-17blk-mq-sched: allow setting of default IO schedulerJens Axboe1-0/+1
Add Kconfig entries to manage what devices get assigned an MQ scheduler, and add a blk-mq flag for drivers to opt out of scheduling. The latter is useful for admin type queues that still allocate a blk-mq queue and tag set, but aren't use for normal IO. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com>
2017-01-17blk-mq-sched: add framework for MQ capable IO schedulersJens Axboe3-2/+39
This adds a set of hooks that intercepts the blk-mq path of allocating/inserting/issuing/completing requests, allowing us to develop a scheduler within that framework. We reuse the existing elevator scheduler API on the registration side, but augment that with the scheduler flagging support for the blk-mq interfce, and with a separate set of ops hooks for MQ devices. We split driver and scheduler tags, so we can run the scheduling independently of device queue depth. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com>
2017-01-17blk-mq: add support for carrying internal tag information in blk_qc_tJens Axboe1-5/+17
No functional change in this patch, just in preparation for having two types of tags available to the block layer for a single request. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com>
2017-01-17blk-mq: un-export blk_mq_free_hctx_request()Jens Axboe1-1/+0
It's only used in blk-mq, kill it from the main exported header and kill the symbol export as well. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Omar Sandoval <osandov@fb.com>
2017-01-17block: move existing elevator ops to unionJens Axboe1-1/+3
Prep patch for adding MQ ops as well, since doing anon unions with named initializers doesn't work on older compilers. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com>
2017-01-17Merge tag 'kvm-arm-for-4.10-rc4' of ↵Radim Krčmář1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM updates for 4.10-rc4 - Fix for timer setup on VHE machines - Drop spurious warning when the timer races against the vcpu running again - Prevent a vgic deadlock when the initialization fails
2017-01-16scsi: libfc: Fix variable name in fc_set_wwpnFam Zheng1-3/+3
The parameter name should be wwpn instead of wwnn. Signed-off-by: Fam Zheng <famz@redhat.com> Acked-by: Johannes Thumshirn <jth@kernel.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-01-16bpf: rework prog_digest into prog_tagDaniel Borkmann4-5/+7
Commit 7bd509e311f4 ("bpf: add prog_digest and expose it via fdinfo/netlink") was recently discussed, partially due to admittedly suboptimal name of "prog_digest" in combination with sha1 hash usage, thus inevitably and rightfully concerns about its security in terms of collision resistance were raised with regards to use-cases. The intended use cases are for debugging resp. introspection only for providing a stable "tag" over the instruction sequence that both kernel and user space can calculate independently. It's not usable at all for making a security relevant decision. So collisions where two different instruction sequences generate the same tag can happen, but ideally at a rather low rate. The "tag" will be dumped in hex and is short enough to introspect in tracepoints or kallsyms output along with other data such as stack trace, etc. Thus, this patch performs a rename into prog_tag and truncates the tag to a short output (64 bits) to make it obvious it's not collision-free. Should in future a hash or facility be needed with a security relevant focus, then we can think about requirements, constraints, etc that would fit to that situation. For now, rework the exposed parts for the current use cases as long as nothing has been released yet. Tested on x86_64 and s390x. Fixes: 7bd509e311f4 ("bpf: add prog_digest and expose it via fdinfo/netlink") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-16Merge tag 'nfsd-4.10-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds1-0/+1
Pull nfsd fixes from Bruce Fields: "Miscellaneous nfsd bugfixes, one for a 4.10 regression, three for older bugs" * tag 'nfsd-4.10-1' of git://linux-nfs.org/~bfields/linux: svcrdma: avoid duplicate dma unmapping during error recovery sunrpc: don't call sleeping functions from the notifier block callbacks svcrpc: don't leak contexts on PROC_DESTROY nfsd: fix supported attributes for acl & labels
2017-01-16cpu/hotplug: Provide dynamic range for prepare stageThomas Gleixner1-0/+2
Mathieu reported that the LTTNG modules are broken as of 4.10-rc1 due to the removal of the cpu hotplug notifiers. Usually I don't care much about out of tree modules, but LTTNG is widely used in distros. There are two ways to solve that: 1) Reserve a hotplug state for LTTNG 2) Add a dynamic range for the prepare states. While #1 is the simplest solution, #2 is the proper one as we can convert in tree users, which do not care about ordering, to the dynamic range as well. Add a dynamic range which allows LTTNG to request states in the prepare stage. Reported-and-tested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Sewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1701101353010.3401@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-16Merge branch 'rcu/urgent' of ↵Ingo Molnar1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into rcu/urgent Pull an urgent RCU fix from Paul E. McKenney: "This series contains a pair of commits that permit RCU synchronous grace periods (synchronize_rcu() and friends) to work correctly throughout boot. This eliminates the current "dead time" starting when the scheduler spawns its first taks and ending when the last of RCU's kthreads is spawned (this last happens during early_initcall() time). Although RCU's synchronous grace periods have long been documented as not working during this time, prior to 4.9, the expedited grace periods worked by accident, and some ACPI code came to rely on this unintentional behavior. (Note that this unintentional behavior was -not- reliable. For example, failures from ACPI could occur on !SMP systems and on systems booting with the rcu_normal kernel boot parameter.) Either way, there is a bug that needs fixing, and the 4.9 switch of RCU's expedited grace periods to workqueues could be considered to have caused a regression. This series therefore makes RCU's expedited grace periods operate correctly throughout the boot process. This has been demonstrated to fix the problems ACPI was encountering, and has the added longer-term benefit of simplifying RCU's behavior." Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-15Merge tag 'mac80211-for-davem-2017-01-13' of ↵David S. Miller1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== We have a number of fixes, in part because I was late in actually sending them out - will try to do better in the future: * handle VHT opmode properly when hostapd is controlling full station state * two fixes for minimum channel width in mac80211 * don't leave SMPS set to junk in HT capabilities * fix headroom when forwarding mesh packets, recently broken by another fix that failed to take into account frame encryption * fix the TID in null-data packets indicating EOSP (end of service period) in U-APSD * prevent attempting to use (and then failing which results in crashes) TXQs on stations that aren't added to the driver yet ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-15Merge branch 'i2c/for-current' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Bugfixes for I2C. Mostly core this time which is a bit unusual but nothing really scary in there" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: piix4: Avoid race conditions with IMC i2c: fix spelling mistake: "insufficent" -> "insufficient" i2c: print correct device invalid address i2c: do not enable fall back to Host Notify by default i2c: fix kernel memory disclosure in dev interface
2017-01-15Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc race fixes uncovered by fuzzing efforts, a Sparse fix, two PMU driver fixes, plus miscellanous tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Reject non sampling events with precise_ip perf/x86/intel: Account interrupts for PEBS errors perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race perf/core: Fix sys_perf_event_open() vs. hotplug perf/x86/intel: Use ULL constant to prevent undefined shift behaviour perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code perf/x86: Set pmu->module in Intel PMU modules perf probe: Fix to probe on gcc generated symbols for offline kernel perf probe: Fix --funcs to show correct symbols for offline module perf symbols: Robustify reading of build-id from sysfs perf tools: Install tools/lib/traceevent plugins with install-bin tools lib traceevent: Fix prev/next_prio for deadline tasks perf record: Fix --switch-output documentation and comment perf record: Make __record_options static tools lib subcmd: Add OPT_STRING_OPTARG_SET option perf probe: Fix to get correct modname from elf header samples/bpf trace_output_user: Remove duplicate sys/ioctl.h include samples/bpf sock_example: Avoid getting ethhdr from two includes perf sched timehist: Show total scheduling time
2017-01-15Merge branch 'efi-urgent-for-linus' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Ingo Molnar: "A number of regression fixes: - Fix a boot hang on machines that have somewhat unusual memory map entries of phys_addr=0x0 num_pages=0, which broke due to a recent commit. This commit got cherry-picked from the v4.11 queue because the bug is affecting real machines. - Fix a boot hang also reported by KASAN, caused by incorrect init ordering introduced by a recent optimization. - Fix a recent robustification fix to allocate_new_fdt_and_exit_boot() that introduced an invalid assumption. Neither bugs were seen in the wild AFAIK" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/x86: Prune invalid memory map entries and fix boot regression x86/efi: Don't allocate memmap through memblock after mm_init() efi/libstub/arm*: Pass latest memory map to the kernel