summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2015-10-14mm: add architecture primitives for software dirty bit clearingMartin Schwidefsky1-0/+10
There are primitives to create and query the software dirty bits in a pte or pmd. But the clearing of the software dirty bits is done in common code with x86 specific page table functions. Add the missing architecture primitives to clear the software dirty bits to allow the feature to be used on non-x86 systems, e.g. the s390 architecture. Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-11Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "Three trivial commits: - Fix a kerneldoc regression - Export handle_bad_irq to unbreak a driver in next - Add an accessor for the of_node field so refactoring in next does not depend on merge ordering" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqdomain: Add an accessor for the of_node field genirq: Fix handle_bad_irq kerneldoc comment genirq: Export handle_bad_irq
2015-10-10Merge tag 'usb-4.3-rc5' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some small USB and PHY fixes and quirk updates for 4.3-rc5. Nothing major here, full details in the shortlog, and all of these have been in linux-next for a while" * tag 'usb-4.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: Add device quirk for Logitech PTZ cameras USB: chaoskey read offset bug USB: Add reset-resume quirk for two Plantronics usb headphones. usb: renesas_usbhs: Add support for R-Car H3 usb: renesas_usbhs: fix build warning if 64-bit architecture usb: gadget: bdc: fix memory leak phy: berlin-sata: Fix module autoload for OF platform driver phy: rockchip-usb: power down phy when rockchip phy probe phy: qcom-ufs: fix build error when the component is built as a module
2015-10-09irqdomain: Add an accessor for the of_node fieldMarc Zyngier1-0/+5
As we're about to remove the of_node field from the irqdomain structure, introduce an accessor for it. Subsequent patches will take care of the actual repainting. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jason Cooper <jason@lakedaemon.net> Link: http://lkml.kernel.org/r/1444402211-1141-1-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-06Merge tag 'for-linus-4.3b-rc4-tag' of ↵Linus Torvalds1-0/+8
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen bug fixes from David Vrabel: - Fix VM save performance regression with x86 PV guests - Make kexec work in x86 PVHVM guests (if Xen has the soft-reset ABI) - Other minor fixes. * tag 'for-linus-4.3b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen/p2m: hint at the last populated P2M entry x86/xen: Do not clip xen_e820_map to xen_e820_map_entries when sanitizing map x86/xen: Support kexec/kdump in HVM guests by doing a soft reset xen/x86: Don't try to write syscall-related MSRs for PV guests xen: use correct type for HYPERVISOR_memory_op()
2015-10-04Merge branch 'strscpy' of ↵Linus Torvalds2-8/+75
git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile Pull strscpy string copy function implementation from Chris Metcalf. Chris sent this during the merge window, but I waffled back and forth on the pull request, which is why it's going in only now. The new "strscpy()" function is definitely easier to use and more secure than either strncpy() or strlcpy(), both of which are horrible nasty interfaces that have serious and irredeemable problems. strncpy() has a useless return value, and doesn't NUL-terminate an overlong result. To make matters worse, it pads a short result with zeroes, which is a performance disaster if you have big buffers. strlcpy(), by contrast, is a mis-designed "fix" for strlcpy(), lacking the insane NUL padding, but having a differently broken return value which returns the original length of the source string. Which means that it will read characters past the count from the source buffer, and you have to trust the source to be properly terminated. It also makes error handling fragile, since the test for overflow is unnecessarily subtle. strscpy() avoids both these problems, guaranteeing the NUL termination (but not excessive padding) if the destination size wasn't zero, and making the overflow condition very obvious by returning -E2BIG. It also doesn't read past the size of the source, and can thus be used for untrusted source data too. So why did I waffle about this for so long? Every time we introduce a new-and-improved interface, people start doing these interminable series of trivial conversion patches. And every time that happens, somebody does some silly mistake, and the conversion patch to the improved interface actually makes things worse. Because the patch is mindnumbing and trivial, nobody has the attention span to look at it carefully, and it's usually done over large swatches of source code which means that not every conversion gets tested. So I'm pulling the strscpy() support because it *is* a better interface. But I will refuse to pull mindless conversion patches. Use this in places where it makes sense, but don't do trivial patches to fix things that aren't actually known to be broken. * 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: tile: use global strscpy() rather than private copy string: provide strscpy() Make asm/word-at-a-time.h available on all architectures
2015-10-03Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds3-0/+6
Pull drm fixes from Dave Airlie: "Bunch of fixes all over the place, all pretty small: amdgpu, i915, exynos, one qxl and one vmwgfx. There is also a bunch of mst fixes, I left some cleanups in the series as I didn't think it was worth splitting up the tested series" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (37 commits) drm/dp/mst: add some defines for logical/physical ports drm/dp/mst: drop cancel work sync in the mstb destroy path (v2) drm/dp/mst: split connector registration into two parts (v2) drm/dp/mst: update the link_address_sent before sending the link address (v3) drm/dp/mst: fixup handling hotplug on port removal. drm/dp/mst: don't pass port into the path builder function drm/radeon: drop radeon_fb_helper_set_par drm: handle cursor_set2 in restore_fbdev_mode drm/exynos: Staticize local function in exynos_drm_gem.c drm/exynos: fimd: actually disable dp clock drm/exynos: dp: remove suspend/resume functions drm/qxl: recreate the primary surface when the bo is not primary drm/amdgpu: only print meaningful VM faults drm/amdgpu/cgs: remove import_gpu_mem drm/i915: Call non-locking version of drm_kms_helper_poll_enable(), v2 drm: Add a non-locking version of drm_kms_helper_poll_enable(), v2 drm/vmwgfx: Fix a command submission hang regression drm/exynos: remove unused mode_fixup() code drm/exynos: remove decon_mode_fixup() drm/exynos: remove fimd_mode_fixup() ...
2015-10-02Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2-4/+3
Pull block fixes from Jens Axboe: "Another week, another round of fixes. These have been brewing for a bit and in various iterations, but I feel pretty comfortable about the quality of them. They fix real issues. The pull request is mostly blk-mq related, and the only one not fixing a real bug, is the tag iterator abstraction from Christoph. But it's pretty trivial, and we'll need it for another fix soon. Apart from the blk-mq fixes, there's an NVMe affinity fix from Keith, and a single fix for xen-blkback from Roger fixing failure to free requests on disconnect" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq: factor out a helper to iterate all tags for a request_queue blk-mq: fix racy updates of rq->errors blk-mq: fix deadlock when reading cpu_list blk-mq: avoid inserting requests before establishing new mapping blk-mq: fix q->mq_usage_counter access race blk-mq: Fix use after of free q->mq_map blk-mq: fix sysfs registration/unregistration race blk-mq: avoid setting hctx->tags->cpumask before allocation NVMe: Set affinity after allocating request queues xen/blkback: free requests on disconnection
2015-10-02Merge git://git.infradead.org/intel-iommuLinus Torvalds1-2/+2
Pull IOVA fixes from David Woodhouse: "The main fix here is the first one, fixing the over-allocation of size-aligned requests. The other patches simply make the existing IOVA code available to users other than the Intel VT-d driver, with no functional change. I concede the latter really *should* have been submitted during the merge window, but since it's basically risk-free and people are waiting to build on top of it and it's my fault I didn't get it in, I (and they) would be grateful if you'd take it" * git://git.infradead.org/intel-iommu: iommu: Make the iova library a module iommu: iova: Export symbols iommu: iova: Move iova cache management to the iova library iommu/iova: Avoid over-allocating when size-aligned
2015-10-02drm/dp/mst: add some defines for logical/physical portsDave Airlie1-0/+4
This just removes the magic number. Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-02drm/dp/mst: split connector registration into two parts (v2)Dave Airlie1-0/+1
In order to cache the EDID properly for tiled displays, we need to retrieve it before we register the connector with userspace, otherwise userspace can call get resources and try and get the edid before we've even cached it. This fixes some problems when hotplugging mst monitors, with X/mutter running. As mutter seems to get 0 modes for one of the monitors in the tile. v2: fix warning in radeon handle tile setting in cached path rather than get edid path. Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-01Merge branch 'akpm' (patches from Andrew)Linus Torvalds3-3/+21
Merge misc fixes from Andrew Morton: "12 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: dmapool: fix overflow condition in pool_find_page() thermal: avoid division by zero in power allocator memcg: remove pcp_counter_lock kprobes: use _do_fork() in samples to make them work again drivers/input/joystick/Kconfig: zhenhua.c needs BITREVERSE memcg: make mem_cgroup_read_stat() unsigned memcg: fix dirty page migration dax: fix NULL pointer in __dax_pmd_fault() mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault mm/slab: fix unexpected index mapping result of kmalloc_size(INDEX_NODE+1) userfaultfd: remove kernel header include from uapi header arch/x86/include/asm/efi.h: fix build failure
2015-10-01Merge tag 'pm+acpi-4.3-rc4' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI fixes from Rafael Wysocki: "These are fixes mostly, for a few changes made in this cycle (the intel_idle driver, the OPP library, the ACPI EC driver, turbostat) and for some issues that have just been discovered (ACPI PCI IRQ management, PCI power management documentation, turbostat), with a couple of cleanups on top of them. Specifics: - intel_idle driver fixup for the recently added Skylake chips support (Len Brown). - Operating Performance Points (OPP) library fix related to the recently added support for new DT bindings and a fix for a typo in a comment (Viresh Kumar, Stephen Boyd). - ACPI EC driver fix for a recently introduced memory leak in an error code path (Lv Zheng). - ACPI PCI IRQ management fix for the issue where an ISA IRQ is shared with a PCI device which requires it to be configured in a different way and may cause an interrupt storm to happen as a result with an extra ACPI SCI IRQ handling simplification on top of it (Jiang Liu). - Update of the PCI power management documentation that became outdated and started to actively confuse the readers to make it actually reflect the code (Rafael J Wysocki). - turbostat fixes including an IVB Xeon regression fix (related to the --debug command line option), Skylake adjustment for the TSC running at a frequency that doesn't match the base one exactly, and a Knights Landing quirk to account for the fact that it only updates APERF and MPERF every 1024 clock cycles plus bumping up the turbostat version number (Len Brown, Hubert Chrzaniuk)" * tag 'pm+acpi-4.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: tools/power turbosat: update version number tools/power turbostat: SKL: Adjust for TSC difference from base frequency tools/power turbostat: KNL workaround for %Busy and Avg_MHz tools/power turbostat: IVB Xeon: fix --debug regression ACPI / PCI: Remove duplicated penalty on SCI IRQ ACPI, PCI, irq: Do not share PCI IRQ with ISA IRQ ACPI / EC: Fix a memory leak issue in acpi_ec_query() PM / OPP: Fix typo modifcation -> modification PCI / PM: Update runtime PM documentation for PCI devices PM / OPP: of_property_count_u32_elems() can return errors intel_idle: Skylake Client Support - updated
2015-10-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2-2/+6
Pull networking fixes from David Miller: 1) Fix regression in SKB partial checksum handling, from Pravin B Shalar. 2) Fix VLAN inside of VXLAN handling in i40e driver, from Jesse Brandeburg. 3) Cure softlockups during accept() in SCTP, from Karl Heiss. 4) MSG_PEEK should return multiple SKBs worth of data in AF_UNIX, from Aaron Conole. 5) IPV6 erroneously ignores output interface specifier in lookup key for route lookups, fix from David Ahern. 6) In Marvell DSA driver, forward unknown frames to CPU port, from Andrew Lunn. 7) Mission flow flag initializations in some code paths, from David Ahern. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: Initialize flow flags in input path net: dsa: fix preparation of a port STP update testptp: Silence compiler warnings on ppc64 net/mlx4: Handle return codes in mlx4_qp_attach_common dsa: mv88e6xxx: Enable forwarding for unknown to the CPU port skbuff: Fix skb checksum partial check. net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set net sysfs: Print link speed as signed integer bna: fix error handling af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag af_unix: Convert the unix_sk macro to an inline function for type safety net: sctp: Don't use 64 kilobyte lookup table for four elements l2tp: protect tunnel->del_work by ref_count net/ibm/emac: bump version numbers for correct work with ethtool sctp: Prevent soft lockup when sctp_accept() is called during a timeout event sctp: Whitespace fix i40e/i40evf: check for stopped admin queue i40e: fix VLAN inside VXLAN r8169: fix handling rtl_readphy result net: hisilicon: fix handling platform_get_irq result
2015-10-01memcg: remove pcp_counter_lockGreg Thelen1-1/+0
Commit 733a572e66d2 ("memcg: make mem_cgroup_read_{stat|event}() iterate possible cpus instead of online") removed the last use of the per memcg pcp_counter_lock but forgot to remove the variable. Kill the vestigial variable. Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-01memcg: fix dirty page migrationGreg Thelen1-0/+21
The problem starts with a file backed dirty page which is charged to a memcg. Then page migration is used to move oldpage to newpage. Migration: - copies the oldpage's data to newpage - clears oldpage.PG_dirty - sets newpage.PG_dirty - uncharges oldpage from memcg - charges newpage to memcg Clearing oldpage.PG_dirty decrements the charged memcg's dirty page count. However, because newpage is not yet charged, setting newpage.PG_dirty does not increment the memcg's dirty page count. After migration completes newpage.PG_dirty is eventually cleared, often in account_page_cleaned(). At this time newpage is charged to a memcg so the memcg's dirty page count is decremented which causes underflow because the count was not previously incremented by migration. This underflow causes balance_dirty_pages() to see a very large unsigned number of dirty memcg pages which leads to aggressive throttling of buffered writes by processes in non root memcg. This issue: - can harm performance of non root memcg buffered writes. - can report too small (even negative) values in memory.stat[(total_)dirty] counters of all memcg, including the root. To avoid polluting migrate.c with #ifdef CONFIG_MEMCG checks, introduce page_memcg() and set_page_memcg() helpers. Test: 0) setup and enter limited memcg mkdir /sys/fs/cgroup/test echo 1G > /sys/fs/cgroup/test/memory.limit_in_bytes echo $$ > /sys/fs/cgroup/test/cgroup.procs 1) buffered writes baseline dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k sync grep ^dirty /sys/fs/cgroup/test/memory.stat 2) buffered writes with compaction antagonist to induce migration yes 1 > /proc/sys/vm/compact_memory & rm -rf /data/tmp/foo dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k kill % sync grep ^dirty /sys/fs/cgroup/test/memory.stat 3) buffered writes without antagonist, should match baseline rm -rf /data/tmp/foo dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k sync grep ^dirty /sys/fs/cgroup/test/memory.stat (speed, dirty residue) unpatched patched 1) 841 MB/s 0 dirty pages 886 MB/s 0 dirty pages 2) 611 MB/s -33427456 dirty pages 793 MB/s 0 dirty pages 3) 114 MB/s -33427456 dirty pages 891 MB/s 0 dirty pages Notice that unpatched baseline performance (1) fell after migration (3): 841 -> 114 MB/s. In the patched kernel, post migration performance matches baseline. Fixes: c4843a7593a9 ("memcg: add per cgroup dirty page accounting") Signed-off-by: Greg Thelen <gthelen@google.com> Reported-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> [4.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-01userfaultfd: remove kernel header include from uapi headerAndre Przywara1-2/+0
As include/uapi/linux/userfaultfd.h is a user visible header file, it should not include kernel-exclusive header files. So trying to build the userfaultfd test program from the selftests directory fails, since it contains a reference to linux/compiler.h. As it turns out, that header is not really needed there, so we can simply remove it to fix that issue. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-01Merge tag 'for-linus' of ↵Linus Torvalds2-12/+0
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fixes from Doug Ledford: - Fixes for mlx5 related issues - Fixes for ipoib multicast handling * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/ipoib: increase the max mcast backlog queue IB/ipoib: Make sendonly multicast joins create the mcast group IB/ipoib: Expire sendonly multicast joins IB/mlx5: Remove pa_lkey usages IB/mlx5: Remove support for IB_DEVICE_LOCAL_DMA_LKEY IB/iser: Add module parameter for always register memory xprtrdma: Replace global lkey with lkey local to PD
2015-10-01Merge branches 'pm-pci' and 'acpi-pci'Rafael J. Wysocki1-0/+1
* pm-pci: PCI / PM: Update runtime PM documentation for PCI devices * acpi-pci: ACPI / PCI: Remove duplicated penalty on SCI IRQ ACPI, PCI, irq: Do not share PCI IRQ with ISA IRQ
2015-10-01blk-mq: factor out a helper to iterate all tags for a request_queueChristoph Hellwig1-2/+0
And replace the blk_mq_tag_busy_iter with it - the driver use has been replaced with a new helper a while ago, and internal to the block we only need the new version. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-10-01blk-mq: fix racy updates of rq->errorsChristoph Hellwig1-1/+1
blk_mq_complete_request may be a no-op if the request has already been completed by others means (e.g. a timeout or cancellation), but currently drivers have to set rq->errors before calling blk_mq_complete_request, which might leave us with the wrong error value. Add an error parameter to blk_mq_complete_request so that we can defer setting rq->errors until we known we won the race to complete the request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-09-30usb: renesas_usbhs: fix build warning if 64-bit architectureYoshihiro Shimoda1-1/+1
This patch fixes the following warning if 64-bit architecture environment: ./drivers/usb/renesas_usbhs/common.c:496:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] dparam->type = of_id ? (u32)of_id->data : 0; Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Felipe Balbi <balbi@ti.com>
2015-09-30drm: Add a non-locking version of drm_kms_helper_poll_enable(), v2Egbert Eich1-0/+1
drm_kms_helper_poll_enable() was converted to lock the mode_config mutex in commit 8c4ccc4ab6f64e859d4ff8d7c02c2ed2e956e07f ("drm/probe-helper: Grab mode_config.mutex in poll_init/enable"). This disregarded the cases where this function is called from a context where this mutex is already locked. Add a non-locking version as well. Changes since v1: - use function name suffix '_locked' for the function that is to be called from a locked context. Signed-off-by: Egbert Eich <eich@suse.de> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2015-09-29skbuff: Fix skb checksum partial check.Pravin B Shelar1-1/+1
Earlier patch 6ae459bda tried to detect void ckecksum partial skb by comparing pull length to checksum offset. But it does not work for all cases since checksum-offset depends on updates to skb->data. Following patch fixes it by validating checksum start offset after skb-data pointer is updated. Negative value of checksum offset start means there is no need to checksum. Fixes: 6ae459bda ("skbuff: Fix skb checksum flag on skb pull") Reported-by: Andrew Vagin <avagin@odin.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29af_unix: Convert the unix_sk macro to an inline function for type safetyAaron Conole1-1/+5
As suggested by Eric Dumazet this change replaces the #define with a static inline function to enjoy complaints by the compiler when misusing the API. Signed-off-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-29blk-mq: fix sysfs registration/unregistration raceAkinobu Mita2-1/+2
There is a race between cpu hotplug handling and adding/deleting gendisk for blk-mq, where both are trying to register and unregister the same sysfs entries. null_add_dev --> blk_mq_init_queue --> blk_mq_init_allocated_queue --> add to 'all_q_list' (*) --> add_disk --> blk_register_queue --> blk_mq_register_disk (++) null_del_dev --> del_gendisk --> blk_unregister_queue --> blk_mq_unregister_disk (--) --> blk_cleanup_queue --> blk_mq_free_queue --> del from 'all_q_list' (*) blk_mq_queue_reinit --> blk_mq_sysfs_unregister (-) --> blk_mq_sysfs_register (+) While the request queue is added to 'all_q_list' (*), blk_mq_queue_reinit() can be called for the queue anytime by CPU hotplug callback. But blk_mq_sysfs_unregister (-) and blk_mq_sysfs_register (+) in blk_mq_queue_reinit must not be called before blk_mq_register_disk (++) and after blk_mq_unregister_disk (--) is finished. Because '/sys/block/*/mq/' is not exists. There has already been BLK_MQ_F_SYSFS_UP flag in hctx->flags which can be used to track these sysfs stuff, but it is only fixing this issue partially. In order to fix it completely, we just need per-queue flag instead of per-hctx flag with appropriate locking. So this introduces q->mq_sysfs_init_done which is properly protected with all_q_mutex. Also, we need to ensure that blk_mq_map_swqueue() is called with all_q_mutex is held. Since hctx->nr_ctx is reset temporarily and updated in blk_mq_map_swqueue(), so we should avoid blk_mq_register_hctx() seeing the temporary hctx->nr_ctx value in CPU hotplug handling or adding/deleting gendisk . Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Cc: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-09-28x86/xen: Support kexec/kdump in HVM guests by doing a soft resetVitaly Kuznetsov1-0/+8
Currently there is a number of issues preventing PVHVM Xen guests from doing successful kexec/kdump: - Bound event channels. - Registered vcpu_info. - PIRQ/emuirq mappings. - shared_info frame after XENMAPSPACE_shared_info operation. - Active grant mappings. Basically, newly booted kernel stumbles upon already set up Xen interfaces and there is no way to reestablish them. In Xen-4.7 a new feature called 'soft reset' is coming. A guest performing kexec/kdump operation is supposed to call SCHEDOP_shutdown hypercall with SHUTDOWN_soft_reset reason before jumping to new kernel. Hypervisor (with some help from toolstack) will do full domain cleanup (but keeping its memory and vCPU contexts intact) returning the guest to the state it had when it was first booted and thus allowing it to start over. Doing SHUTDOWN_soft_reset on Xen hypervisors which don't support it is probably OK as by default all unknown shutdown reasons cause domain destroy with a message in toolstack log: 'Unknown shutdown reason code 5. Destroying domain.' which gives a clue to what the problem is and eliminates false expectations. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-09-28Merge branch 'for-mingo' of ↵Ingo Molnar1-6/+5
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent Pull RCU fixes from Paul E. McKenney, for two regressions introduced in this merge window: - Fix bug with recent GCCs. - Fix false positive lockdep splat. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pendingLinus Torvalds1-0/+1
Pull SCSI target fixes from Nicholas Bellinger: "This includes a iser-target series from Jenny + Sagi @ Mellanox that addresses the few remaining active I/O shutdown bugs, along with a patch to support zero-copy for immediate data payloads that gives a nice performance improvement for small block WRITEs. Also included are some recent >= v4.2 regression bug-fixes. The most notable is a RCU conversion regression for SPC-3 PR registrations, and recent removal of obsolete RFC-3720 markers that introduced a login regression bug with MSFT iSCSI initiators. Thanks to everyone who has been testing + reporting bugs for v4.x" * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: iscsi-target: Avoid OFMarker + IFMarker negotiation target: Make TCM_WRITE_PROTECT failure honor D_SENSE bit target: Fix target_sense_desc_format NULL pointer dereference target: Propigate backend read-only to core_tpg_add_lun target: Fix PR registration + APTPL RCU conversion regression iser-target: Skip data copy if all the command data comes as immediate iser-target: Change the recv buffers posting logic iser-target: Fix pending connections handling in target stack shutdown sequnce iser-target: Remove np_ prefix from isert_np members iser-target: Remove unused variables iser-target: Put the reference on commands waiting for unsol data iser-target: remove command with state ISTATE_REMOVE
2015-09-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds11-27/+62
Pull networking fixes from David Miller: 1) When we run a tap on netlink sockets, we have to copy mmap'd SKBs instead of cloning them. From Daniel Borkmann. 2) When converting classical BPF into eBPF, fix the setting of the source reg to BPF_REG_X. From Tycho Andersen. 3) Fix igmpv3/mldv2 report parsing in the bridge multicast code, from Linus Lussing. 4) Fix dst refcounting for ipv6 tunnels, from Martin KaFai Lau. 5) Set NLM_F_REPLACE flag properly when replacing ipv6 routes, from Roopa Prabhu. 6) Add some new cxgb4 PCI device IDs, from Hariprasad Shenai. 7) Fix headroom tests and SKB leaks in ipv6 fragmentation code, from Florian Westphal. 8) Check DMA mapping errors in bna driver, from Ivan Vecera. 9) Several 8139cp bug fixes (dev_kfree_skb_any in interrupt context, misclearing of interrupt status in TX timeout handler, etc.) from David Woodhouse. 10) In tipc, reset SKB header pointer after skb_linearize(), from Erik Hugne. 11) Fix autobind races et al. in netlink code, from Herbert Xu with help from Tejun Heo and others. 12) Missing SET_NETDEV_DEV in sunvnet driver, from Sowmini Varadhan. 13) Fix various races in timewait timer and reqsk_queue_hadh_req, from Eric Dumazet. 14) Fix array overruns in mac80211, from Johannes Berg and Dan Carpenter. 15) Fix data race in rhashtable_rehash_one(), from Dmitriy Vyukov. 16) Fix race between poll_one_napi and napi_disable, from Neil Horman. 17) Fix byte order in geneve tunnel port config, from John W Linville. 18) Fix handling of ARP replies over lightweight tunnels, from Jiri Benc. 19) We can loop when fib rule dumps cross multiple SKBs, fix from Wilson Kok and Roopa Prabhu. 20) Several reference count handling bug fixes in the PHY/MDIO layer from Russel King. 21) Fix lockdep splat in ppp_dev_uninit(), from Guillaume Nault. 22) Fix crash in icmp_route_lookup(), from David Ahern. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits) net: Fix panic in icmp_route_lookup net: update docbook comment for __mdiobus_register() ppp: fix lockdep splat in ppp_dev_uninit() net: via/Kconfig: GENERIC_PCI_IOMAP required if PCI not selected phy: marvell: add link partner advertised modes net: fix net_device refcounting phy: add phy_device_remove() phy: fixed-phy: properly validate phy in fixed_phy_update_state() net: fix phy refcounting in a bunch of drivers of_mdio: fix MDIO phy device refcounting phy: add proper phy struct device refcounting phy: fix mdiobus module safety net: dsa: fix of_mdio_find_bus() device refcount leak phy: fix of_mdio_find_bus() device refcount leak ip6_tunnel: Reduce log level in ip6_tnl_err() to debug ip6_gre: Reduce log level in ip6gre_err() to debug fib_rules: fix fib rule dumps across multiple skbs bnx2x: byte swap rss_key to comply to Toeplitz specs net: revert "net_sched: move tp->root allocation into fw_init()" lwtunnel: remove source and destination UDP port config option ...
2015-09-26ACPI, PCI, irq: Do not share PCI IRQ with ISA IRQJiang Liu1-0/+1
Avoid IRQs occupied by ISA IRQs when allocating IRQs for PCI link devices, otherwise it may cause interrupt storm due to incompatible pin attributes. This issue was triggered on a KVM virtual machine, which 1) uses IRQ9 for SCI in high level mode. 2) defines an PCI interrupt link device (LNKS) with IRQ9 as the only possible irq. 3) has an PCI device referring to link device LNKS. So it causes interrupt storm when enabling the PCI device because PCI IRQ works in low level mode. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-09-25Merge branch 'for-4.3-fixes' of ↵Linus Torvalds1-2/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull another cgroup fix from Tejun Heo: "The cgroup writeback support got inadvertently enabled for traditional hierarchies revealing two regressions which are currently being worked on. It shouldn't have been enabled on traditional hierarchies, so disable it on them. This is enough to make the regressions go away for people who aren't experimenting with cgroup" * 'for-4.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup, writeback: don't enable cgroup writeback on traditional hierarchies
2015-09-25Merge tag 'nfs-for-4.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-0/+3
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable patches: - fix v4.2 SEEK on files over 2 gigs - Fix a layout segment reference leak when pNFS I/O falls back to inband I/O. - Fix recovery of recalled read delegations Bugfixes: - Fix a case where NFSv4 fails to send CLOSE after a server reboot - Fix sunrpc to wait for connections to complete before retrying - Fix sunrpc races between transport connect/disconnect and shutdown - Fix an infinite loop when layoutget fail with BAD_STATEID - nfs/filelayout: Fix NULL reference caused by double freeing of fh_array - Fix a bogus WARN_ON_ONCE() in O_DIRECT when layout commit_through_mds is set - Fix layoutreturn/close ordering issues" * tag 'nfs-for-4.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS41: make close wait for layoutreturn NFS: Skip checking ds_cinfo.buckets when lseg's commit_through_mds is set NFSv4.x/pnfs: Don't try to recover stateids twice in layoutget NFSv4: Recovery of recalled read delegations is broken NFS: Fix an infinite loop when layoutget fail with BAD_STATEID NFS: Do cleanup before resetting pageio read/write to mds SUNRPC: xs_sock_mark_closed() does not need to trigger socket autoclose SUNRPC: Lock the transport layer on shutdown nfs/filelayout: Fix NULL reference caused by double freeing of fh_array SUNRPC: Ensure that we wait for connections to complete before retrying SUNRPC: drop null test before destroy functions nfs: fix v4.2 SEEK on files over 2 gigs SUNRPC: Fix races between socket connection and destroy code nfs: fix pg_test page count calculation Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount
2015-09-25IB/mlx5: Remove support for IB_DEVICE_LOCAL_DMA_LKEYSagi Grimberg2-12/+0
Commit 96249d70dd70 ("IB/core: Guarantee that a local_dma_lkey is available") allows ULPs that make use of the local dma key to keep working as before by allocating a DMA MR with local permissions and converted these consumers to use the MR associated with the PD rather then device->local_dma_lkey. ConnectIB has some known issues with memory registration using the local_dma_lkey (SEND, RDMA, RECV seems to work ok). Thus don't expose support for it (remove device->local_dma_lkey setting), and take advantage of the above commit such that no regression is introduced to working systems. The local_dma_lkey support will be restored in CX4 depending on FW capability query. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-24target: Propigate backend read-only to core_tpg_add_lunNicholas Bellinger1-0/+1
This patch adds a DF_READ_ONLY flag that is used by IBLOCK to signal when a backend has been set to read-only mode, in order to propigate read-only status up to core_tpg_add_lun() for all future LUN fabric exports. With this is place, existing emulation for reporting read-only in spc_emulate_modesense() and normal transport_lookup_cmd_lun() TCM_WRITE_PROTECTED status checking just works as expected. Reported-by: Joeue Deng <joeue404@gmail.com> Reported-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-09-24phy: add phy_device_remove()Russell King1-0/+1
Add a phy_device_remove() function to complement phy_device_register(), which undoes the effects of phy_device_register() by removing the phy device from visibility, but not freeing it. This allows these details to be moved out of the mdio bus code into the phy code where this action belongs. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24phy: fix mdiobus module safetyRussell King1-1/+4
Re-implement the mdiobus module refcounting to ensure that we actually ensure that the mdiobus module code does not go away while we might call into it. The old scheme using bus->dev.driver was buggy, because bus->dev is a class device which never has a struct device_driver associated with it, and hence the associated code trying to obtain a refcount did nothing useful. Instead, take the approach that other subsystems do: pass the module when calling mdiobus_register(), and record that in the mii_bus struct. When we need to increment the module use count in the phy code, use this stored pointer. When the phy is deteched, drop the module refcount, remembering that the phy device might go away at that point. This doesn't stop the mii_bus going away while there are in-use phys - it merely stops the underlying code vanishing. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24Merge branch 'next' of ↵Linus Torvalds1-1/+7
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal management fixes from Zhang Rui: - Power allocator governor changes to allow binding on thermal zones with missing power estimates information. From Javi Merino. - Add compile test flags on thermal drivers that allow it without producing compilation errors. From Eduardo Valentin. - Fixes around memory allocation on cpu_cooling. From Javi Merino. - Fix on db8500 cpufreq code to allow autoload. From Luis de Bethencourt. - Maintainer entries for cpu cooling device * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: thermal: power_allocator: exit early if there are no cooling devices thermal: power_allocator: don't require tzp to be present for the thermal zone thermal: power_allocator: relax the requirement of two passive trip points thermal: power_allocator: relax the requirement of a sustainable_power in tzp thermal: Add a function to get the minimum power thermal: cpu_cooling: free power table on error or when unregistering thermal: cpu_cooling: don't call kcalloc() under rcu_read_lock thermal: db8500_cpufreq_cooling: Fix module autoload for OF platform driver thermal: cpu_cooling: Add MAINTAINERS entry thermal: ti-soc: Kconfig fix to avoid menu showing wrongly thermal: ti-soc: allow compile test thermal: qcom_spmi: allow compile test thermal: exynos: allow compile test thermal: armada: allow compile test thermal: dove: allow compile test thermal: kirkwood: allow compile test thermal: rockchip: allow compile test thermal: spear: allow compile test thermal: hisi: allow compile test thermal: Fix thermal_zone_of_sensor_register to match documentation
2015-09-24Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-6/+7
Merge misc fixes from Andrew Morton: "15 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: ocfs2/dlm: fix deadlock when dispatch assert master membarrier: clean up selftest vmscan: fix sane_reclaim helper for legacy memcg lib/iommu-common.c: do not try to deref a null iommu->lazy_flush() pointer when n < pool->hint x86, efi, kasan: #undef memset/memcpy/memmove per arch mm: migrate: hugetlb: putback destination hugepage to active list mm, dax: VMA with vm_ops->pfn_mkwrite wants to be write-notified userfaultfd: register uapi generic syscall (aarch64) userfaultfd: selftest: don't error out if pthread_mutex_t isn't identical userfaultfd: selftest: return an error if BOUNCE_VERIFY fails userfaultfd: selftest: avoid my_bcmp false positives with powerpc userfaultfd: selftest: only warn if __NR_userfaultfd is undefined userfaultfd: selftest: headers fixup userfaultfd: selftests: vm: pick up sanitized kernel headers userfaultfd: revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key"
2015-09-24lwtunnel: remove source and destination UDP port config optionJiri Benc1-4/+0
The UDP tunnel config is asymmetric wrt. to the ports used. The source and destination ports from one direction of the tunnel are not related to the ports of the other direction. We need to be able to respond to ARP requests using the correct ports without involving routing. As the consequence, UDP ports need to be fixed property of the tunnel interface and cannot be set per route. Remove the ability to set ports per route. This is still okay to do, as no kernel has been released with these attributes yet. Note that the ability to specify source and destination ports is preserved for other users of the lwtunnel API which don't use routes for tunnel key specification (like openvswitch). If in the future we rework ARP handling to allow port specification, the attributes can be added back. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24ipv4: send arp replies to the correct tunnelJiri Benc1-0/+2
When using ip lwtunnels, the additional data for xmit (basically, the actual tunnel to use) are carried in ip_tunnel_info either in dst->lwtstate or in metadata dst. When replying to ARP requests, we need to send the reply to the same tunnel the request came from. This means we need to construct proper metadata dst for ARP replies. We could perform another route lookup to get a dst entry with the correct lwtstate. However, this won't always ensure that the outgoing tunnel is the same as the incoming one, and it won't work anyway for IPv4 duplicate address detection. The only thing to do is to "reverse" the ip_tunnel_info. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24skbuff: Fix skb checksum flag on skb pullPravin B Shelar1-0/+3
VXLAN device can receive skb with checksum partial. But the checksum offset could be in outer header which is pulled on receive. This results in negative checksum offset for the skb. Such skb can cause the assert failure in skb_checksum_help(). Following patch fixes the bug by setting checksum-none while pulling outer header. Following is the kernel panic msg from old kernel hitting the bug. ------------[ cut here ]------------ kernel BUG at net/core/dev.c:1906! RIP: 0010:[<ffffffff81518034>] skb_checksum_help+0x144/0x150 Call Trace: <IRQ> [<ffffffffa0164c28>] queue_userspace_packet+0x408/0x470 [openvswitch] [<ffffffffa016614d>] ovs_dp_upcall+0x5d/0x60 [openvswitch] [<ffffffffa0166236>] ovs_dp_process_packet_with_key+0xe6/0x100 [openvswitch] [<ffffffffa016629b>] ovs_dp_process_received_packet+0x4b/0x80 [openvswitch] [<ffffffffa016c51a>] ovs_vport_receive+0x2a/0x30 [openvswitch] [<ffffffffa0171383>] vxlan_rcv+0x53/0x60 [openvswitch] [<ffffffffa01734cb>] vxlan_udp_encap_recv+0x8b/0xf0 [openvswitch] [<ffffffff8157addc>] udp_queue_rcv_skb+0x2dc/0x3b0 [<ffffffff8157b56f>] __udp4_lib_rcv+0x1cf/0x6c0 [<ffffffff8157ba7a>] udp_rcv+0x1a/0x20 [<ffffffff8154fdbd>] ip_local_deliver_finish+0xdd/0x280 [<ffffffff81550128>] ip_local_deliver+0x88/0x90 [<ffffffff8154fa7d>] ip_rcv_finish+0x10d/0x370 [<ffffffff81550365>] ip_rcv+0x235/0x300 [<ffffffff8151ba1d>] __netif_receive_skb+0x55d/0x620 [<ffffffff8151c360>] netif_receive_skb+0x80/0x90 [<ffffffff81459935>] virtnet_poll+0x555/0x6f0 [<ffffffff8151cd04>] net_rx_action+0x134/0x290 [<ffffffff810683d8>] __do_softirq+0xa8/0x210 [<ffffffff8162fe6c>] call_softirq+0x1c/0x30 [<ffffffff810161a5>] do_softirq+0x65/0xa0 [<ffffffff810687be>] irq_exit+0x8e/0xb0 [<ffffffff81630733>] do_IRQ+0x63/0xe0 [<ffffffff81625f2e>] common_interrupt+0x6e/0x6e Reported-by: Anupam Chanda <achanda@vmware.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24cgroup, writeback: don't enable cgroup writeback on traditional hierarchiesTejun Heo1-2/+9
inode_cgwb_enabled() gates cgroup writeback support. If it returns true, each inode is attached to the corresponding memory domain which gets mapped to io domain. It currently only tests whether the filesystem and bdi support cgroup writeback; however, cgroup writeback support doesn't work on traditional hierarchies and thus it should also test whether memcg and iocg are on the default hierarchy. This caused traditional hierarchy setups to hit the cgroup writeback path inadvertently and ended up creating separate writeback domains for each memcg and mapping them all to the root iocg uncovering a couple issues in the cgroup writeback path. cgroup writeback was never meant to be enabled on traditional hierarchies. Make inode_cgwb_enabled() test whether both memcg and iocg are on the default hierarchy. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Artem Bityutskiy <dedekind1@gmail.com> Reported-by: Dexuan Cui <decui@microsoft.com> Link: http://lkml.kernel.org/g/1443012552.19983.209.camel@gmail.com Link: http://lkml.kernel.org/g/f30d4a6aa8a546ff88f73021d026a453@SIXPR30MB031.064d.mgd.msft.net
2015-09-24Merge tag 'spi-fix-v4.3-rc2' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "A disappointingly large collection of fixes for SPI issues, though almost all in drivers (and there mainly the newly added Mediatek driver) and the core fixes are documentation and error handling. The driver fixes are all of the usual 'important if you see them' variety" * tag 'spi-fix-v4.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: xtensa-xtfpga: fix register endianness spi: meson: Fix module autoload for OF platform driver spi: mediatek: fix wrong error return value on probe spi: fix kernel-doc warnings in spi.h spi: spidev: fix possible NULL dereference spi: atmel: remove warning when !CONFIG_PM_SLEEP spi: bcm2835: BUG: fix wrong use of PAGE_MASK spi: mediatek: fix spi cs polarity error spi: Fix documentation of spi_alloc_master() spi: spi-pxa2xx: Check status register to determine if SSSR_TINT is disabled spi: Mediatek: Document devicetree bindings update for spi bus spi: mediatek: fix spi clock usage error spi: mediatek: remove clk_disable_unprepare()
2015-09-23netpoll: Close race condition between poll_one_napi and napi_disableNeil Horman1-0/+1
Drivers might call napi_disable while not holding the napi instance poll_lock. In those instances, its possible for a race condition to exist between poll_one_napi and napi_disable. That is to say, poll_one_napi only tests the NAPI_STATE_SCHED bit to see if there is work to do during a poll, and as such the following may happen: CPU0 CPU1 ndo_tx_timeout napi_poll_dev napi_disable poll_one_napi test_and_set_bit (ret 0) test_bit (ret 1) reset adapter napi_poll_routine If the adapter gets a tx timeout without a napi instance scheduled, its possible for the adapter to think it has exclusive access to the hardware (as the napi instance is now scheduled via the napi_disable call), while the netpoll code thinks there is simply work to do. The result is parallel hardware access leading to corrupt data structures in the driver, and a crash. Additionaly, there is another, more critical race between netpoll and napi_disable. The disabled napi state is actually identical to the scheduled state for a given napi instance. The implication being that, if a napi instance is disabled, a netconsole instance would see the napi state of the device as having been scheduled, and poll it, likely while the driver was dong something requiring exclusive access. In the case above, its fairly clear that not having the rings in a state ready to be polled will cause any number of crashes. The fix should be pretty easy. netpoll uses its own bit to indicate that that the napi instance is in a state of being serviced by netpoll (NAPI_STATE_NPSVC). We can just gate disabling on that bit as well as the sched bit. That should prevent netpoll from conducting a napi poll if we convert its set bit to a test_and_set_bit operation to provide mutual exclusion Change notes: V2) Remove a trailing whtiespace Resubmit with proper subject prefix V3) Clean up spacing nits Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> CC: jmaxwell@redhat.com Tested-by: jmaxwell@redhat.com Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-22userfaultfd: register uapi generic syscall (aarch64)Dr. David Alan Gilbert1-3/+5
Add the userfaultfd syscalls to uapi asm-generic, it was tested with postcopy live migration on aarch64 with both 4k and 64k pagesize kernels. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Thierry Reding <treding@nvidia.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-22userfaultfd: revert "userfaultfd: waitqueue: add nr wake parameter to ↵Andrea Arcangeli1-3/+2
__wake_up_locked_key" This reverts commit 51360155eccb907ff8635bd10fc7de876408c2e0 and adapts fs/userfaultfd.c to use the old version of that function. It didn't look robust to call __wake_up_common with "nr == 1" when we absolutely require wakeall semantics, but we've full control of what we insert in the two waitqueue heads of the blocked userfaults. No exclusive waitqueue risks to be inserted into those two waitqueue heads so we can as well stick to "nr == 1" of the old code and we can rely purely on the fact no waitqueue inserted in one of the two waitqueue heads we must enforce as wakeall, has wait->flags WQ_FLAG_EXCLUSIVE set. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Thierry Reding <treding@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-22Merge remote-tracking branches 'spi/fix/atmel', 'spi/fix/bcm2835', ↵Mark Brown1-1/+1
'spi/fix/doc', 'spi/fix/mediatek', 'spi/fix/meson', 'spi/fix/mtk' and 'spi/fix/pxa2xx' into spi-linus
2015-09-21Merge branch 'for-4.3-fixes' of ↵Linus Torvalds3-25/+22
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "The threadgroup locking changes which went in during 4.2 devel cycle added write locking of a percpu_rwsem in cgroup task migration path; unfortunately, that involved expedited rcu syncing which turned out to be too slow and heavy for certain workloads. The patchset which is dependent on this one didn't get committed during that devel cycle, so these two patches can be reverted safely. Oleg reworked percpu_rwsem for 4.4 so that the writer path is a lot lighter. The reported issue goes away with Oleg's reworked percpu_rwsem and I'll reapply these patches on the for-4.4 branch so that they can land together with Oleg's changes" * 'for-4.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: Revert "sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem" Revert "cgroup: simplify threadgroup locking"
2015-09-21tcp/dccp: fix timewait races in timer handlingEric Dumazet1-1/+13
When creating a timewait socket, we need to arm the timer before allowing other cpus to find it. The signal allowing cpus to find the socket is setting tw_refcnt to non zero value. As we set tw_refcnt in __inet_twsk_hashdance(), we therefore need to call inet_twsk_schedule() first. This also means we need to remove tw_refcnt changes from inet_twsk_schedule() and let the caller handle it. Note that because we use mod_timer_pinned(), we have the guarantee the timer wont expire before we set tw_refcnt as we run in BH context. To make things more readable I introduced inet_twsk_reschedule() helper. When rearming the timer, we can use mod_timer_pending() to make sure we do not rearm a canceled timer. Note: This bug can possibly trigger if packets of a flow can hit multiple cpus. This does not normally happen, unless flow steering is broken somehow. This explains this bug was spotted ~5 months after its introduction. A similar fix is needed for SYN_RECV sockets in reqsk_queue_hash_req(), but will be provided in a separate patch for proper tracking. Fixes: 789f558cfb36 ("tcp/dccp: get rid of central timewait timer") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Ying Cai <ycai@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>