linux - Linux Kernel (branches are rebased on master from time to time)

Age	Commit message (Collapse)	Author	Files	Lines
2017-03-11	x86/hyperv: Implement hv_get_tsc_page()	Vitaly Kuznetsov	3	-2/+18
	To use Hyper-V TSC page clocksource from vDSO we need to make tsc_pg available. Implement hv_get_tsc_page() and add CONFIG_HYPERV_TSCPAGE to make #ifdef-s simple. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: devel@linuxdriverproject.org Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: virtualization@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20170303132142.25595-2-vkuznets@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-11	Merge tag 'tty-4.11-rc2' of ↵	Linus Torvalds	2	-65/+73
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial fixes frpm Greg KH: "Here are two bugfixes for tty stuff for 4.11-rc2. One of them resolves the pretty bad bug in the n_hdlc code that Alexander Popov found and fixed and has been reported everywhere. The other just fixes a samsung serial driver issue when DMA fails on some systems. Both have been in linux-next with no reported issues" * tag 'tty-4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: samsung: Continue to work if DMA request fails tty: n_hdlc: get rid of racy n_hdlc.tbuf
2017-03-11	Merge tag 'staging-4.11-rc2' of ↵	Linus Torvalds	2	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver fixes from Greg KH: "Here are two small build warning fixes for some staging drivers that Arnd has found on his valiant quest to get the kernel to build properly with no warnings. Both of these have been in linux-next this week and resolve the reported issues" * tag 'staging-4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: octeon: remove unused variable staging/vc04_services: add CONFIG_OF dependency
2017-03-11	Merge tag 'usb-4.11-rc2' of ↵	Linus Torvalds	26	-144/+213
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here is a number of different USB fixes for 4.11-rc2. Seems like there were a lot of unresolved issues that people have been finding for this subsystem, and a bunch of good security auditing happening as well from Johan Hovold. There's the usual batch of gadget driver fixes and xhci issues resolved as well. All of these have been in linux-next with no reported issues" * tag 'usb-4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (35 commits) usb: host: xhci-plat: Fix timeout on removal of hot pluggable xhci controllers usb: host: xhci-dbg: HCIVERSION should be a binary number usb: xhci: remove dummy extra_priv_size for size of xhci_hcd struct usb: xhci-mtk: check hcc_params after adding primary hcd USB: serial: digi_acceleport: fix OOB-event processing MAINTAINERS: usb251xb: remove reference inexistent file doc: dt-bindings: usb251xb: mark reg as required usb: usb251xb: dt: add unit suffix to oc-delay and power-on-time usb: usb251xb: remove max_{power,current}_{sp,bp} properties usb-storage: Add ignore-residue quirk for Initio INIC-3619 USB: iowarrior: fix NULL-deref in write USB: iowarrior: fix NULL-deref at probe usb: phy: isp1301: Add OF device ID table usb: ohci-at91: Do not drop unhandled USB suspend control requests USB: serial: safe_serial: fix information leak in completion handler USB: serial: io_ti: fix information leak in completion handler USB: serial: omninet: drop open callback USB: serial: omninet: fix reference leaks at open USB: serial: io_ti: fix NULL-deref in interrupt callback usb: dwc3: gadget: make to increment req->remaining in all cases ...
2017-03-11	Merge tag 'pinctrl-v4.11-2' of ↵	Linus Torvalds	2	-6/+21
	git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pinctrl fixes from Linus Walleij: "Two smaller pin control fixes for the v4.11 series: - Add a get_direction() function to the qcom driver - Fix two pin names in the uniphier driver" * tag 'pinctrl-v4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: uniphier: change pin names of aio/xirq for LD11 pinctrl: qcom: add get_direction function
2017-03-10	Merge tag 'ceph-for-4.11-rc2' of git://github.com/ceph/ceph-client	Linus Torvalds	6	-8/+66
	Pull ceph fixes from Ilya Dryomov: - a fix for the recently discovered misdirected requests bug present in jewel and later on the server side and all stable kernels - a fixup for -rc1 CRUSH changes - two usability enhancements: osd_request_timeout option and supported_features bus attribute. * tag 'ceph-for-4.11-rc2' of git://github.com/ceph/ceph-client: libceph: osd_request_timeout option rbd: supported_features bus attribute libceph: don't set weight to IN when OSD is destroyed libceph: fix crush_decode() for older maps
2017-03-10	Merge branch 'i2c/for-current' of ↵	Linus Torvalds	8	-22/+56
	git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Here are some driver bugfixes from I2C. Unusual this time are the two reverts. One because I accidently picked a patch from the list which I should have pulled from my co-maintainer instead ("missing of_node_put"). And one which I wrongly assumed to be an easy fix but it turned out already that it needs more iterations ("copy device properties")" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: Revert "i2c: copy device properties when using i2c_register_board_info()" Revert "i2c: add missing of_node_put in i2c_mux_del_adapters" i2c: exynos5: Avoid transaction timeouts due TRANSFER_DONE_AUTO not set i2c: designware: add reset interface i2c: meson: fix wrong variable usage in meson_i2c_put_data i2c: copy device properties when using i2c_register_board_info() i2c: m65xx: drop superfluous quirk structure i2c: brcmstb: Fix START and STOP conditions i2c: add missing of_node_put in i2c_mux_del_adapters i2c: riic: fix restart condition i2c: add missing of_node_put in i2c_mux_del_adapters
2017-03-10	Merge tag 'drm-fixes-for-4.11-rc2' of ↵	Linus Torvalds	22	-264/+768
	git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "Intel, amd and mxsfb fixes. These are the drm fixes I've collected for rc2. Mostly i915 GVT only fixes, along with a single EDID fix, some mxsfb fixes and a few minor amd fixes" * tag 'drm-fixes-for-4.11-rc2' of git://people.freedesktop.org/~airlied/linux: (38 commits) drm: mxsfb: Implement drm_panel handling drm: mxsfb_crtc: Fix the framebuffer misplacement drm: mxsfb: Fix crash when provided invalid DT bindings drm: mxsfb: fix pixel clock polarity drm: mxsfb: use bus_format to determine LCD bus width drm/amdgpu: bump driver version for some new features drm/amdgpu: validate paramaters in the gem ioctl drm/amd/amdgpu: fix console deadlock if late init failed drm/i915/gvt: change some gvt_err to gvt_dbg_cmd drm/i915/gvt: protect RO and Rsvd bits of virtual vgpu configuration space drm/i915/gvt: handle workload lifecycle properly drm/edid: Add EDID_QUIRK_FORCE_8BPC quirk for Rotel RSX-1058 drm/i915/gvt: fix an error for F_RO flag drm/i915/gvt: use pfn_valid for better checking drm/i915/gvt: set SFUSE_STRAP properly for vitual monitor detection drm/i915/gvt: fix an error for one register drm/i915/gvt: add more registers into handlers list drm/i915/gvt: have more registers with F_CMD_ACCESS flags set drm/i915/gvt: add some new MMIOs to cmd_access white list drm/i915/gvt: fix pcode mailbox write emulation of BDW ...
2017-03-10	Merge branch 'prep-for-5level'	Linus Torvalds	64	-147/+868
	Merge 5-level page table prep from Kirill Shutemov: "Here's relatively low-risk part of 5-level paging patchset. Merging it now will make x86 5-level paging enabling in v4.12 easier. The first patch is actually x86-specific: detect 5-level paging support. It boils down to single define. The rest of patchset converts Linux MMU abstraction from 4- to 5-level paging. Enabling of new abstraction in most cases requires adding single line of code in arch-specific code. The rest is taken care by asm-generic/. Changes to mm/ code are mostly mechanical: add support for new page table level -- p4d_t -- where we deal with pud_t now. v2: - fix build on microblaze (Michal); - comment for __ARCH_HAS_5LEVEL_HACK in kasan_populate_zero_shadow(); - acks from Michal" * emailed patches from Kirill A Shutemov <kirill.shutemov@linux.intel.com>: mm: introduce __p4d_alloc() mm: convert generic code to 5-level paging asm-generic: introduce <asm-generic/pgtable-nop4d.h> arch, mm: convert all architectures to use 5level-fixup.h asm-generic: introduce __ARCH_USE_5LEVEL_HACK asm-generic: introduce 5level-fixup.h x86/cpufeature: Add 5-level paging detection
2017-03-10	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds	54	-185/+277
	Merge fixes from Andrew Morton: "26 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (26 commits) userfaultfd: remove wrong comment from userfaultfd_ctx_get() fat: fix using uninitialized fields of fat_inode/fsinfo_inode sh: cayman: IDE support fix kasan: fix races in quarantine_remove_cache() kasan: resched in quarantine_remove_cache() mm: do not call mem_cgroup_free() from within mem_cgroup_alloc() thp: fix another corner case of munlock() vs. THPs rmap: fix NULL-pointer dereference on THP munlocking mm/memblock.c: fix memblock_next_valid_pfn() userfaultfd: selftest: vm: allow to build in vm/ directory userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED userfaultfd: non-cooperative: fix fork fctx->new memleak mm/cgroup: avoid panic when init with low memory drivers/md/bcache/util.h: remove duplicate inclusion of blkdev.h mm/vmstats: add thp_split_pud event for clarity include/linux/fs.h: fix unsigned enum warning with gcc-4.2 userfaultfd: non-cooperative: release all ctx in dup_userfaultfd_complete userfaultfd: non-cooperative: robustness check userfaultfd: non-cooperative: rollback userfaultfd_exit x86, mm: unify exit paths in gup_pte_range() ...
2017-03-09	Merge tag 'xfs-4.11-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux	Linus Torvalds	14	-100/+103
	Pull xfs fixes from Darrick Wong: "Here are some bug fixes for -rc2 to clean up the copy on write handling and to remove a cause of hangs. - Fix various iomap bugs - Fix overly aggressive CoW preallocation garbage collection - Fixes to CoW endio error handling - Fix some incorrect geometry calculations - Remove a potential system hang in bulkstat - Try to allocate blocks more aggressively to reduce ENOSPC errors" * tag 'xfs-4.11-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: try any AG when allocating the first btree block when reflinking xfs: use iomap new flag for newly allocated delalloc blocks xfs: remove kmem_zalloc_greedy xfs: Use xfs_icluster_size_fsb() to calculate inode alignment mask xfs: fix and streamline error handling in xfs_end_io xfs: only reclaim unwritten COW extents periodically iomap: invalidate page caches should be after iomap_dio_complete() in direct write
2017-03-09	Merge tag 'gcc-plugins-v4.11-rc2' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull gcc-plugins fix from Kees Cook: "Fixes a typo in sancov plugin, exposed in earlier compiler versions" * tag 'gcc-plugins-v4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: gcc-plugins: fix sancov_plugin for gcc-5
2017-03-10	drm: mxsfb: Implement drm_panel handling	Fabio Estevam	1	-0/+4
	Currently when the 'power-supply' regulator is passed via device tree it does not actually work since drm_panel_prepare()/drm_panel_enable() are never called. Quoting Thierry Reding: "It should really call drm_panel_prepare() and drm_panel_enable() while switching on the display pipeline and drm_panel_disable(), followed by drm_panel_unprepare() while switching off the display pipeline." So do as suggested, so that the 'power-supply' regulator can be functional. Reported-by: Breno Lima <breno.lima@nxp.com> Suggested-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Tested-by: Marek Vasut <marex@denx.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-10	drm: mxsfb_crtc: Fix the framebuffer misplacement	Fabio Estevam	1	-2/+2
	Currently the framebuffer content is displayed with incorrect offsets in both the vertical and horizontal directions. The fbdev version of the driver does not show this problem. Breno Lima dumped the eLCDIF controller registers on both the drm and fbdev drivers and noticed that the VDCTRL3 register is configured incorrectly in the drm driver. The fbdev driver calculates the vertical and horizontal wait counts of the VDCTRL3 register by doing: back porch + sync length. Looking at the horizontal and vertical timing diagram from include/drm/drm_modes.h this value corresponds to: crtc_[hv]total - crtc_[hv]sync_start So fix the VDCTRL3 register setting accordingly so that the eLCDIF controller can properly show the framebuffer content in the correct position. Reported-by: Breno Lima <breno.lima@nxp.com> Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Tested-by: Breno Lima <breno.lima@nxp.com> Tested-by: Marek Vasut <marex@denx.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-10	drm: mxsfb: Fix crash when provided invalid DT bindings	Marek Vasut	1	-0/+4
	The mxsfb driver will crash if the mxsfb DT node has a subnode, but the content of the subnode is not of-graph binding with an endpoint linking to panel. The crash was triggered by providing old-style panel bindings to the mxsfb driver instead of the new of-graph ones. The problem happens in mxsfb_create_output(), which is invoked from mxsfb_load(). The mxsfb_create_output() iterates over all mxsfb DT subnode endpoints and tries to bind a panel on each endpoint. If there is any problem binding the panel, that is, mxsfb->panel == NULL, this function will return an error code, otherwise success 0 is returned. If the subnodes do not specify of-graph binding with an endpoint, the iteration over endpoints in mxsfb_create_output() will have zero cycles and the function will immediatelly return 0, but the mxsfb->panel will remain NULL. This is propagated back into the mxsfb_load(), which does not detect any problem and expects that the mxsfb->panel is valid, thus calls mxsfb_panel_attach(). But since mxsfb->panel == NULL, mxsfb_panel_attach() is called with first argument NULL and this crashes the kernel. This patch fixes the problem by explicitly checking for valid mxsfb->panel at the end of the iteration in mxsfb_create_output(). Signed-off-by: Marek Vasut <marex@denx.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Stefan Agner <stefan@agner.ch> Cc: Breno Matheus Lima <brenomatheus@gmail.com> Tested-by: Breno Lima <breno.lima@nxp.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-10	drm: mxsfb: fix pixel clock polarity	Stefan Agner	1	-2/+9
	The DRM subsystem specifies the pixel clock polarity from a controllers perspective: DRM_BUS_FLAG_PIXDATA_NEGEDGE means the controller drives the data on pixel clocks falling edge. That is the controllers DOTCLK_POL=0 (Default is data launched at negative edge). Also change the data enable logic to be high active by default and only change if explicitly requested via bus_flags. With that defaults are: - Data enable: high active - Pixel clock polarity: controller drives data on negative edge Signed-off-by: Stefan Agner <stefan@agner.ch> Acked-by: Marek Vasut <marex@denx.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-10	drm: mxsfb: use bus_format to determine LCD bus width	Stefan Agner	2	-2/+33
	The LCD bus width does not need to align with the pixel format. The LCDIF controller automatically converts between pixel formats and bus width by padding or dropping LSBs. The DRM subsystem has the notion of bus_format which allows to determine what bus_formats are supported by the display. Choose the first available or fallback to 24 bit if none are available. Signed-off-by: Stefan Agner <stefan@agner.ch> Acked-by: Marek Vasut <marex@denx.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-10	Merge branch 'drm-fixes-4.11' of git://people.freedesktop.org/~agd5f/linux ↵	Dave Airlie	3	-2/+27
	into drm-fixes * 'drm-fixes-4.11' of git://people.freedesktop.org/~agd5f/linux: drm/amdgpu: bump driver version for some new features drm/amdgpu: validate paramaters in the gem ioctl drm/amd/amdgpu: fix console deadlock if late init failed
2017-03-10	Merge tag 'drm-intel-fixes-2017-03-09' of ↵	Dave Airlie	14	-256/+686
	git://anongit.freedesktop.org/git/drm-intel into drm-fixes flushing out gvt-g fixes * tag 'drm-intel-fixes-2017-03-09' of git://anongit.freedesktop.org/git/drm-intel: (29 commits) drm/i915/gvt: change some gvt_err to gvt_dbg_cmd drm/i915/gvt: protect RO and Rsvd bits of virtual vgpu configuration space drm/i915/gvt: handle workload lifecycle properly drm/i915/gvt: fix an error for F_RO flag drm/i915/gvt: use pfn_valid for better checking drm/i915/gvt: set SFUSE_STRAP properly for vitual monitor detection drm/i915/gvt: fix an error for one register drm/i915/gvt: add more registers into handlers list drm/i915/gvt: have more registers with F_CMD_ACCESS flags set drm/i915/gvt: add some new MMIOs to cmd_access white list drm/i915/gvt: fix pcode mailbox write emulation of BDW drm/i915/gvt: add resolution definition for vGPU type drm/i915/gvt: Add more edid definition support drm/i915/gvt: adjust to fixed vGPU types drm/i915/gvt: remove unnecessary error msg from gtt write drm/i915/gvt: refine pcode write emulation drm/i915/gvt: clear the vGPU reset logic drm/i915/gvt: decrease priority of output msg for untracked mmio drm/i915/gvt: set default value to 0 for unhandled mmio regs drm/i915/gvt: add cmd_access to GEN7_HALF_SLICE_CHICKEN1 ...
2017-03-10	Merge tag 'drm-misc-fixes-2017-03-06' of ↵	Dave Airlie	1	-0/+3
	git://anongit.freedesktop.org/git/drm-misc into drm-fixes Just 1 8bpc quirk from Ville, cc: stable * tag 'drm-misc-fixes-2017-03-06' of git://anongit.freedesktop.org/git/drm-misc: drm/edid: Add EDID_QUIRK_FORCE_8BPC quirk for Rotel RSX-1058
2017-03-09	userfaultfd: remove wrong comment from userfaultfd_ctx_get()	David Hildenbrand	1	-2/+0
	It's a void function, so there is no return value; Link: http://lkml.kernel.org/r/20170309150817.7510-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	fat: fix using uninitialized fields of fat_inode/fsinfo_inode	OGAWA Hirofumi	1	-1/+12
	Recently fallocate patch was merged and it uses MSDOS_I(inode)->mmu_private at fat_evict_inode(). However, fat_inode/fsinfo_inode that was introduced in past didn't initialize MSDOS_I(inode) properly. With those combinations, it became the cause of accessing random entry in FAT area. Link: http://lkml.kernel.org/r/87pohrj4i8.fsf@mail.parknet.co.jp Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Reported-by: Moreno Bartalucci <moreno.bartalucci@tecnorama.it> Tested-by: Moreno Bartalucci <moreno.bartalucci@tecnorama.it> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	sh: cayman: IDE support fix	Bartlomiej Zolnierkiewicz	1	-2/+0
	Remove incorrect CONFIG_IDE ifdef (CONFIG_IDE config option is for internal drivers/ide/ use) and make IDE hardware interface always initialized (not only when IDE subsystem is built-in). This patch allows Cayman board to work with modular IDE subsystem support and removes the requirement of having the whole core IDE subsystem built-in when using libata PATA support. Link: http://lkml.kernel.org/r/1990884.yFoE6lSB9G@amdc3058 Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	kasan: fix races in quarantine_remove_cache()	Dmitry Vyukov	1	-6/+36
	quarantine_remove_cache() frees all pending objects that belong to the cache, before we destroy the cache itself. However there are currently two possibilities how it can fail to do so. First, another thread can hold some of the objects from the cache in temp list in quarantine_put(). quarantine_put() has a windows of enabled interrupts, and on_each_cpu() in quarantine_remove_cache() can finish right in that window. These objects will be later freed into the destroyed cache. Then, quarantine_reduce() has the same problem. It grabs a batch of objects from the global quarantine, then unlocks quarantine_lock and then frees the batch. quarantine_remove_cache() can finish while some objects from the cache are still in the local to_free list in quarantine_reduce(). Fix the race with quarantine_put() by disabling interrupts for the whole duration of quarantine_put(). In combination with on_each_cpu() in quarantine_remove_cache() it ensures that quarantine_remove_cache() either sees the objects in the per-cpu list or in the global list. Fix the race with quarantine_reduce() by protecting quarantine_reduce() with srcu critical section and then doing synchronize_srcu() at the end of quarantine_remove_cache(). I've done some assessment of how good synchronize_srcu() works in this case. And on a 4 CPU VM I see that it blocks waiting for pending read critical sections in about 2-3% of cases. Which looks good to me. I suspect that these races are the root cause of some GPFs that I episodically hit. Previously I did not have any explanation for them. BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8 IP: qlist_free_all+0x2e/0xc0 mm/kasan/quarantine.c:155 PGD 6aeea067 PUD 60ed7067 PMD 0 Oops: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 13667 Comm: syz-executor2 Not tainted 4.10.0+ #60 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff88005f948040 task.stack: ffff880069818000 RIP: 0010:qlist_free_all+0x2e/0xc0 mm/kasan/quarantine.c:155 RSP: 0018:ffff88006981f298 EFLAGS: 00010246 RAX: ffffea0000ffff00 RBX: 0000000000000000 RCX: ffffea0000ffff1f RDX: 0000000000000000 RSI: ffff88003fffc3e0 RDI: 0000000000000000 RBP: ffff88006981f2c0 R08: ffff88002fed7bd8 R09: 00000001001f000d R10: 00000000001f000d R11: ffff88006981f000 R12: ffff88003fffc3e0 R13: ffff88006981f2d0 R14: ffffffff81877fae R15: 0000000080000000 FS: 00007fb911a2d700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000c8 CR3: 0000000060ed6000 CR4: 00000000000006f0 Call Trace: quarantine_reduce+0x10e/0x120 mm/kasan/quarantine.c:239 kasan_kmalloc+0xca/0xe0 mm/kasan/kasan.c:590 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544 slab_post_alloc_hook mm/slab.h:456 [inline] slab_alloc_node mm/slub.c:2718 [inline] kmem_cache_alloc_node+0x1d3/0x280 mm/slub.c:2754 __alloc_skb+0x10f/0x770 net/core/skbuff.c:219 alloc_skb include/linux/skbuff.h:932 [inline] _sctp_make_chunk+0x3b/0x260 net/sctp/sm_make_chunk.c:1388 sctp_make_data net/sctp/sm_make_chunk.c:1420 [inline] sctp_make_datafrag_empty+0x208/0x360 net/sctp/sm_make_chunk.c:746 sctp_datamsg_from_user+0x7e8/0x11d0 net/sctp/chunk.c:266 sctp_sendmsg+0x2611/0x3970 net/sctp/socket.c:1962 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761 sock_sendmsg_nosec net/socket.c:633 [inline] sock_sendmsg+0xca/0x110 net/socket.c:643 SYSC_sendto+0x660/0x810 net/socket.c:1685 SyS_sendto+0x40/0x50 net/socket.c:1653 I am not sure about backporting. The bug is quite hard to trigger, I've seen it few times during our massive continuous testing (however, it could be cause of some other episodic stray crashes as it leads to memory corruption...). If it is triggered, the consequences are very bad -- almost definite bad memory corruption. The fix is non trivial and has chances of introducing new bugs. I am also not sure how actively people use KASAN on older releases. [dvyukov@google.com: - sorted includes[ Link: http://lkml.kernel.org/r/20170309094028.51088-1-dvyukov@google.com Link: http://lkml.kernel.org/r/20170308151532.5070-1-dvyukov@google.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Greg Thelen <gthelen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	kasan: resched in quarantine_remove_cache()	Dmitry Vyukov	1	-1/+8
	We see reported stalls/lockups in quarantine_remove_cache() on machines with large amounts of RAM. quarantine_remove_cache() needs to scan whole quarantine in order to take out all objects belonging to the cache. Quarantine is currently 1/32-th of RAM, e.g. on a machine with 256GB of memory that will be 8GB. Moreover quarantine scanning is a walk over uncached linked list, which is slow. Add cond_resched() after scanning of each non-empty batch of objects. Batches are specifically kept of reasonable size for quarantine_put(). On a machine with 256GB of RAM we should have ~512 non-empty batches, each with 16MB of objects. Link: http://lkml.kernel.org/r/20170308154239.25440-1-dvyukov@google.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Greg Thelen <gthelen@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	mm: do not call mem_cgroup_free() from within mem_cgroup_alloc()	Tahsin Erdogan	1	-3/+8
	mem_cgroup_free() indirectly calls wb_domain_exit() which is not prepared to deal with a struct wb_domain object that hasn't executed wb_domain_init(). For instance, the following warning message is printed by lockdep if alloc_percpu() fails in mem_cgroup_alloc(): INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 1 PID: 1950 Comm: mkdir Not tainted 4.10.0+ #151 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: dump_stack+0x67/0x99 register_lock_class+0x36d/0x540 __lock_acquire+0x7f/0x1a30 lock_acquire+0xcc/0x200 del_timer_sync+0x3c/0xc0 wb_domain_exit+0x14/0x20 mem_cgroup_free+0x14/0x40 mem_cgroup_css_alloc+0x3f9/0x620 cgroup_apply_control_enable+0x190/0x390 cgroup_mkdir+0x290/0x3d0 kernfs_iop_mkdir+0x58/0x80 vfs_mkdir+0x10e/0x1a0 SyS_mkdirat+0xa8/0xd0 SyS_mkdir+0x14/0x20 entry_SYSCALL_64_fastpath+0x18/0xad Add __mem_cgroup_free() which skips wb_domain_exit(). This is used by both mem_cgroup_free() and mem_cgroup_alloc() clean up. Fixes: 0b8f73e104285 ("mm: memcontrol: clean up alloc, online, offline, free functions") Link: http://lkml.kernel.org/r/20170306192122.24262-1-tahsin@google.com Signed-off-by: Tahsin Erdogan <tahsin@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	thp: fix another corner case of munlock() vs. THPs	Kirill A. Shutemov	1	-5/+4
	The following test case triggers BUG() in munlock_vma_pages_range(): int main(int argc, char *argv[]) { int fd; system("mount -t tmpfs -o huge=always none /mnt"); fd = open("/mnt/test", O_CREAT \| O_RDWR); ftruncate(fd, 4UL << 20); mmap(NULL, 4UL << 20, PROT_READ \| PROT_WRITE, MAP_SHARED \| MAP_FIXED \| MAP_LOCKED, fd, 0); mmap(NULL, 4096, PROT_READ \| PROT_WRITE, MAP_SHARED \| MAP_LOCKED, fd, 0); munlockall(); return 0; } The second mmap() create PTE-mapping of the first huge page in file. It makes kernel munlock the page as we never keep PTE-mapped page mlocked. On munlockall() when we handle vma created by the first mmap(), munlock_vma_page() returns page_mask == 0, as the page is not mlocked anymore. On next iteration follow_page_mask() return tail page, but page_mask is HPAGE_NR_PAGES - 1. It makes us skip to the first tail page of the next huge page and step on VM_BUG_ON_PAGE(PageMlocked(page)). The fix is not use the page_mask from follow_page_mask() at all. It has no use for us. Link: http://lkml.kernel.org/r/20170302150252.34120-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> [4.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	rmap: fix NULL-pointer dereference on THP munlocking	Kirill A. Shutemov	1	-6/+7
	The following test case triggers NULL-pointer derefernce in try_to_unmap_one(): #include <fcntl.h> #include <stdlib.h> #include <unistd.h> #include <sys/mman.h> int main(int argc, char *argv[]) { int fd; system("mount -t tmpfs -o huge=always none /mnt"); fd = open("/mnt/test", O_CREAT \| O_RDWR); ftruncate(fd, 2UL << 20); mmap(NULL, 2UL << 20, PROT_READ \| PROT_WRITE, MAP_SHARED \| MAP_FIXED \| MAP_LOCKED, fd, 0); mmap(NULL, 2UL << 20, PROT_READ \| PROT_WRITE, MAP_SHARED \| MAP_LOCKED, fd, 0); munlockall(); return 0; } Apparently, there's a case when we call try_to_unmap() on huge PMDs: it's TTU_MUNLOCK. Let's handle this case correctly. Fixes: c7ab0d2fdc84 ("mm: convert try_to_unmap_one() to use page_vma_mapped_walk()") Link: http://lkml.kernel.org/r/20170302151159.30592-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	mm/memblock.c: fix memblock_next_valid_pfn()	AKASHI Takahiro	1	-1/+4
	Obviously, we should not access memblock.memory.regions[right] if 'right' is outside of [0..memblock.memory.cnt>. Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible") Link: http://lkml.kernel.org/r/20170303023745.9104-1-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Paul Burton <paul.burton@imgtec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	userfaultfd: selftest: vm: allow to build in vm/ directory	Andrea Arcangeli	1	-0/+4
	linux/tools/testing/selftests/vm $ make gcc -Wall -I ../../../../usr/include compaction_test.c -lrt -o /compaction_test /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.4/../../../../x86_64-pc-linux-gnu/bin/ld: cannot open output file /compaction_test: Permission denied collect2: error: ld returned 1 exit status make: *** [../lib.mk:54: /compaction_test] Error 1 Since commit a8ba798bc8ec ("selftests: enable O and KBUILD_OUTPUT") selftests/vm build fails if run from the "selftests/vm" directory, but it works in the selftests/ directory. It's quicker to be able to do a local vm-only build after a tree wipe and this patch allows for it again. Link: http://lkml.kernel.org/r/20170302173738.18994-4-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED	Andrea Arcangeli	3	-13/+47
	userfaultfd_remove() has to be execute before zapping the pagetables or UFFDIO_COPY could keep filling pages after zap_page_range returned, which would result in non zero data after a MADV_DONTNEED. However userfaultfd_remove() may have to release the mmap_sem. This was handled correctly in MADV_REMOVE, but MADV_DONTNEED accessed a potentially stale vma (the very vma passed to zap_page_range(vma, ...)). The fix consists in revalidating the vma in case userfaultfd_remove() had to release the mmap_sem. This also optimizes away an unnecessary down_read/up_read in the MADV_REMOVE case if UFFD_EVENT_FORK had to be delivered. It all remains zero runtime cost in case CONFIG_USERFAULTFD=n as userfaultfd_remove() will be defined as "true" at build time. Link: http://lkml.kernel.org/r/20170302173738.18994-3-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	userfaultfd: non-cooperative: fix fork fctx->new memleak	Mike Rapoport	1	-0/+9
	We have a memleak in the ->new ctx if the uffd of the parent is closed before the fork event is read, nothing frees the new context. Link: http://lkml.kernel.org/r/20170302173738.18994-2-aarcange@redhat.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Andrea Arcangeli <aarcange@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	mm/cgroup: avoid panic when init with low memory	Laurent Dufour	1	-2/+5
	The system may panic when initialisation is done when almost all the memory is assigned to the huge pages using the kernel command line parameter hugepage=xxxx. Panic may occur like this: Unable to handle kernel paging request for data at address 0x00000000 Faulting instruction address: 0xc000000000302b88 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2048 [ 0.082424] NUMA pSeries Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.0-15-generic #16-Ubuntu task: c00000021ed01600 task.stack: c00000010d108000 NIP: c000000000302b88 LR: c000000000270e04 CTR: c00000000016cfd0 REGS: c00000010d10b2c0 TRAP: 0300 Not tainted (4.9.0-15-generic) MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>[ 0.082770] CR: 28424422 XER: 00000000 CFAR: c0000000003d28b8 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1 GPR00: c000000000270e04 c00000010d10b540 c00000000141a300 c00000010fff6300 GPR04: 0000000000000000 00000000026012c0 c00000010d10b630 0000000487ab0000 GPR08: 000000010ee90000 c000000001454fd8 0000000000000000 0000000000000000 GPR12: 0000000000004400 c00000000fb80000 00000000026012c0 00000000026012c0 GPR16: 00000000026012c0 0000000000000000 0000000000000000 0000000000000002 GPR20: 000000000000000c 0000000000000000 0000000000000000 00000000024200c0 GPR24: c0000000016eef48 0000000000000000 c00000010fff7d00 00000000026012c0 GPR28: 0000000000000000 c00000010fff7d00 c00000010fff6300 c00000010d10b6d0 NIP mem_cgroup_soft_limit_reclaim+0xf8/0x4f0 LR do_try_to_free_pages+0x1b4/0x450 Call Trace: do_try_to_free_pages+0x1b4/0x450 try_to_free_pages+0xf8/0x270 __alloc_pages_nodemask+0x7a8/0xff0 new_slab+0x104/0x8e0 ___slab_alloc+0x620/0x700 __slab_alloc+0x34/0x60 kmem_cache_alloc_node_trace+0xdc/0x310 mem_cgroup_init+0x158/0x1c8 do_one_initcall+0x68/0x1d0 kernel_init_freeable+0x278/0x360 kernel_init+0x24/0x170 ret_from_kernel_thread+0x5c/0x74 Instruction dump: eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 4e800020 3d230001 e9499a42 3d220004 3929acd8 794a1f24 7d295214 eac90100 <e9360000> 2fa90000 419eff74 3b200000 ---[ end trace 342f5208b00d01b6 ]--- This is a chicken and egg issue where the kernel try to get free memory when allocating per node data in mem_cgroup_init(), but in that path mem_cgroup_soft_limit_reclaim() is called which assumes that these data are allocated. As mem_cgroup_soft_limit_reclaim() is best effort, it should return when these data are not yet allocated. This patch also fixes potential null pointer access in mem_cgroup_remove_from_trees() and mem_cgroup_update_tree(). Link: http://lkml.kernel.org/r/1487856999-16581-2-git-send-email-ldufour@linux.vnet.ibm.com Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Balbir Singh <bsingharora@gmail.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	drivers/md/bcache/util.h: remove duplicate inclusion of blkdev.h	Masanari Iida	1	-1/+0
	Link: http://lkml.kernel.org/r/20170226060230.11555-1-standby24x7@gmail.com Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Coly Li <colyli@suse.de> Cc: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	mm/vmstats: add thp_split_pud event for clarity	Yisheng Xie	3	-1/+7
	We added support for PUD-sized transparent hugepages, however we count the event "thp split pud" into thp_split_pmd event. To separate the event count of thp split pud from pmd, add a new event named thp_split_pud. Link: http://lkml.kernel.org/r/1488282380-5076-1-git-send-email-xieyisheng1@huawei.com Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	include/linux/fs.h: fix unsigned enum warning with gcc-4.2	Arnd Bergmann	1	-1/+1
	With arm-linux-gcc-4.2, almost every file we build in the kernel ends up with this warning: include/linux/fs.h:2648: warning: comparison of unsigned expression < 0 is always false Later versions don't have this problem, but it's easy enough to work around. Link: http://lkml.kernel.org/r/20161216105634.235457-12-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	userfaultfd: non-cooperative: release all ctx in dup_userfaultfd_complete	Andrea Arcangeli	1	-13/+5
	Don't stop running dup_fctx() even if userfaultfd_event_wait_completion fails as it has to run userfaultfd_ctx_put on all ctx to pair against the userfaultfd_ctx_get that was run on all fctx->orig in dup_userfaultfd. Link: http://lkml.kernel.org/r/20170224181957.19736-4-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	userfaultfd: non-cooperative: robustness check	Andrea Arcangeli	1	-2/+7
	Similar to the handle_userfault() case, also make sure to never attempt to send any event past the PF_EXITING point of no return. This is purely a robustness check. Link: http://lkml.kernel.org/r/20170224181957.19736-3-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	userfaultfd: non-cooperative: rollback userfaultfd_exit	Andrea Arcangeli	5	-43/+1
	Patch series "userfaultfd non-cooperative further update for 4.11 merge window". Unfortunately I noticed one relevant bug in userfaultfd_exit while doing more testing. I've been doing testing before and this was also tested by kbuild bot and exercised by the selftest, but this bug never reproduced before. I dropped userfaultfd_exit as result. I dropped it because of implementation difficulty in receiving signals in __mmput and because I think -ENOSPC as result from the background UFFDIO_COPY should be enough already. Before I decided to remove userfaultfd_exit, I noticed userfaultfd_exit wasn't exercised by the selftest and when I tried to exercise it, after moving it to a more correct place in __mmput where it would make more sense and where the vma list is stable, it resulted in the event_wait_completion in D state. So then I added the second patch to be sure even if we call userfaultfd_event_wait_completion too late during task exit(), we won't risk to generate tasks in D state. The same check exists in handle_userfault() for the same reason, except it makes a difference there, while here is just a robustness check and it's run under WARN_ON_ONCE. While looking at the userfaultfd_event_wait_completion() function I looked back at its callers too while at it and I think it's not ok to stop executing dup_fctx on the fcs list because we relay on userfaultfd_event_wait_completion to execute userfaultfd_ctx_put(fctx->orig) which is paired against userfaultfd_ctx_get(fctx->orig) in dup_userfault just before list_add(fcs). This change only takes care of fctx->orig but this area also needs further review looking for similar problems in fctx->new. The only patch that is urgent is the first because it's an use after free during a SMP race condition that affects all processes if CONFIG_USERFAULTFD=y. Very hard to reproduce though and probably impossible without SLUB poisoning enabled. This patch (of 3): I once reproduced this oops with the userfaultfd selftest, it's not easily reproducible and it requires SLUB poisoning to reproduce. general protection fault: 0000 [#1] SMP Modules linked in: CPU: 2 PID: 18421 Comm: userfaultfd Tainted: G ------------ T 3.10.0+ #15 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014 task: ffff8801f83b9440 ti: ffff8801f833c000 task.ti: ffff8801f833c000 RIP: 0010:[<ffffffff81451299>] [<ffffffff81451299>] userfaultfd_exit+0x29/0xa0 RSP: 0018:ffff8801f833fe80 EFLAGS: 00010202 RAX: ffff8801f833ffd8 RBX: 6b6b6b6b6b6b6b6b RCX: ffff8801f83b9440 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800baf18600 RBP: ffff8801f833fee8 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: ffffffff8127ceb3 R12: 0000000000000000 R13: ffff8800baf186b0 R14: ffff8801f83b99f8 R15: 00007faed746c700 FS: 0000000000000000(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007faf0966f028 CR3: 0000000001bc6000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: do_exit+0x297/0xd10 SyS_exit+0x17/0x20 tracesys+0xdd/0xe2 Code: 00 00 66 66 66 66 90 55 48 89 e5 41 54 53 48 83 ec 58 48 8b 1f 48 85 db 75 11 eb 73 66 0f 1f 44 00 00 48 8b 5b 10 48 85 db 74 64 <4c> 8b a3 b8 00 00 00 4d 85 e4 74 eb 41 f6 84 24 2c 01 00 00 80 RIP [<ffffffff81451299>] userfaultfd_exit+0x29/0xa0 RSP <ffff8801f833fe80> ---[ end trace 9fecd6dcb442846a ]--- In the debugger I located the "mm" pointer in the stack and walking mm->mmap->vm_next through the end shows the vma->vm_next list is fully consistent and it is null terminated list as expected. So this has to be an SMP race condition where userfaultfd_exit was running while the vma list was being modified by another CPU. When userfaultfd_exit() run one of the ->vm_next pointers pointed to SLAB_POISON (RBX is the vma pointer and is 0x6b6b..). The reason is that it's not running in __mmput but while there are still other threads running and it's not holding the mmap_sem (it can't as it has to wait the even to be received by the manager). So this is an use after free that was happening for all processes. One more implementation problem aside from the race condition: userfaultfd_exit has really to check a flag in mm->flags before walking the vma or it's going to slowdown the exit() path for regular tasks. One more implementation problem: at that point signals can't be delivered so it would also create a task in D state if the manager doesn't read the event. The major design issue: it overall looks superfluous as the manager can check for -ENOSPC in the background transfer: if (mmget_not_zero(ctx->mm)) { [..] } else { return -ENOSPC; } It's safer to roll it back and re-introduce it later if at all. [rppt@linux.vnet.ibm.com: documentation fixup after removal of UFFD_EVENT_EXIT] Link: http://lkml.kernel.org/r/1488345437-4364-1-git-send-email-rppt@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/20170224181957.19736-2-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	x86, mm: unify exit paths in gup_pte_range()	Dan Williams	1	-19/+20
	All exit paths from gup_pte_range() require pte_unmap() of the original pte page before returning. Refactor the code to have a single exit point to do the unmap. This mirrors the flow of the generic gup_pte_range() in mm/gup.c. Link: http://lkml.kernel.org/r/148804251828.36605.14910389618497006945.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	x86, mm: fix gup_pte_range() vs DAX mappings	Dan Williams	1	-2/+6
	gup_pte_range() fails to check pte_allows_gup() before translating a DAX pte entry, pte_devmap(), to a page. This allows writes to read-only mappings, and bypasses the DAX cacheline dirty tracking due to missed 'mkwrite' faults. The gup_huge_pmd() path and the gup_huge_pud() path correctly check pte_allows_gup() before checking for _devmap() entries. Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings") Link: http://lkml.kernel.org/r/148804251312.36605.12665024794196605053.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Dave Hansen <dave.hansen@linux.intel.com> Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Xiong Zhou <xzhou@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	power/mm: update pte_write and pte_wrprotect to handle savedwrite	Aneesh Kumar K.V	3	-7/+21
	We use pte_write() to check whethwer the pte entry is writable. This is mostly used to later mark the pte read only if it is writable. The other use of pte_write() is to check whether the pte_entry is writable so that hardware page table entry can be marked accordingly. This is used in kvm where we look at qemu page table entry and update hardware hash page table for the guest with correct write enable bit. With the above, for the first usage we should also check the savedwrite bit so that we can correctly clear the savedwite bit. For the later, we add a new variant __pte_write(). With this we can revert write_protect_page part of 595cd8f256d2 ("mm/ksm: handle protnone saved writes when making page write protect"). But I left it as it is as an example code for savedwrite check. Fixes: c137a2757b886 ("powerpc/mm/autonuma: switch ppc64 to its own implementation of saved write") Link: http://lkml.kernel.org/r/1488203787-17849-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Rik van Riel <riel@surriel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	powerpc/mm: handle protnone ptes on fork	Aneesh Kumar K.V	1	-31/+42
	We need to mark pages of parent process read only on fork. Numa fault pte needs a protnone ptes variant with saved write flag set. On fork we need to make sure we remove the saved write bit. Instead of adding the protnone check in the caller update ptep_set_wrprotect variants to clear savedwrite bit. Without this we see random segfaults in application on fork. Fixes: c137a2757b886 ("powerpc/mm/autonuma: switch ppc64 to its own implementation of saved write") Link: http://lkml.kernel.org/r/1488203787-17849-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Rik van Riel <riel@surriel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	scripts/spelling.txt: add "overide" pattern and fix typo instances	Masahiro Yamada	16	-19/+18
	Fix typos and add the following to the scripts/spelling.txt: overide\|\|override While we are here, fix the doubled "address" in the touched line Documentation/devicetree/bindings/regulator/ti-abb-regulator.txt. Also, fix the comment block style in the touched hunks in drivers/media/dvb-frontends/drx39xyj/drx_driver.h. Link: http://lkml.kernel.org/r/1481573103-11329-21-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	scripts/spelling.txt: add "disble(d)" pattern and fix typo instances	Masahiro Yamada	17	-18/+19
	Fix typos and add the following to the scripts/spelling.txt: disble\|\|disable disbled\|\|disabled I kept the TSL2563_INT_DISBLED in /drivers/iio/light/tsl2563.c untouched. The macro is not referenced at all, but this commit is touching only comment blocks just in case. Link: http://lkml.kernel.org/r/1481573103-11329-20-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	userfaultfd: shmem: __do_fault requires VM_FAULT_NOPAGE	Andrea Arcangeli	1	-1/+1
	__do_fault assumes vmf->page has been initialized and is valid if VM_FAULT_NOPAGE is not returned by vma->vm_ops->fault(vma, vmf). handle_userfault() in turn should return VM_FAULT_NOPAGE if it doesn't return VM_FAULT_SIGBUS or VM_FAULT_RETRY (the other two possibilities). This VM_FAULT_NOPAGE case is only invoked when signal are pending and it didn't matter for anonymous memory before. It only started to matter since shmem was introduced. hugetlbfs also takes a different path and doesn't exercise __do_fault. Link: http://lkml.kernel.org/r/20170228154201.GH5816@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	Merge tag 'pm-4.11-rc2' of ↵	Linus Torvalds	4	-46/+44
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix several issues in the intel_pstate driver and one issue in the schedutil cpufreq governor, clean up that governor a bit and hook up existing code for disabling cpufreq to a new kernel command line option. Specifics: - Three fixes for intel_pstate problems related to the passive mode (in which it acts as a regular cpufreq scaling driver), two for the handling of global P-state limits and one for the handling of the cpu_frequency tracepoint in that mode (Rafael Wysocki). - Three fixes for the handling of P-state limits in intel_pstate in the active mode (Rafael Wysocki). - Introduction of a new cpufreq.off=1 kernel command line argument that will disable cpufreq entirely if passed to the kernel and is simply hooked up to the existing code used by Xen (Len Brown). - Fix for the schedutil cpufreq governor to prevent it from using stale raw frequency values in configurations with mutiple CPUs sharing one policy object and a cleanup for it reducing its overhead slightly (Viresh Kumar)" * tag 'pm-4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy cpufreq: intel_pstate: Fix intel_pstate_verify_policy() cpufreq: intel_pstate: Fix global settings in active mode cpufreq: Add the "cpufreq.off=1" cmdline option cpufreq: schedutil: Pass sg_policy to get_next_freq() cpufreq: schedutil: move cached_raw_freq to struct sugov_policy cpufreq: intel_pstate: Avoid triggering cpu_frequency tracepoint unnecessarily cpufreq: intel_pstate: Fix intel_cpufreq_verify_policy() cpufreq: intel_pstate: Do not use performance_limits in passive mode
2017-03-09	Merge tag 'pci-v4.11-fixes-2' of ↵	Linus Torvalds	3	-7/+7
	git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "PCI fixes: - fix NULL pointer dereference in Exynos driver - fix NULL pointer dereference in ASPM with pre-1.1 PCIe devices - blacklist QLogic ISP2722 to prevent panics while reading VPD" * tag 'pci-v4.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI/ASPM: Always set link->downstream to avoid NULL dereference on remove PCI: Prevent VPD access for QLogic ISP2722 PCI: exynos: Initialize elbi_base even when using PHY framework
2017-03-09	Merge branch 'for-linus' of git://git.kernel.dk/linux-block	Linus Torvalds	12	-132/+112
	Pull block fixes from Jens Axboe: "Sending this a bit sooner than I otherwise would have, as a fix in the merge window had some unfortunate issues and side effects for some folks. This contains: - Fixes from Jan for the bdi registration/unregistration. These have been tested by the various parties reporting issues, and should be solid at this point. - Also from Jan, fix for axonram gendisk registration. - A stable fix for zram from Johannes. - A small series from Ming, fixing up some long standing issues with blk-mq hardware queue kobject initialization and registration. - A fix for sed opal from Jon, fixing a nonsensical range check and some set-but-not-used variables. - A fix from Neil for a long standing deadlock issue for stacking device drivers. With this in place, dm/md don't have to work around the issue anymore, and can be properly fixed up" * 'for-linus' of git://git.kernel.dk/linux-block: axonram: Fix gendisk handling blk: improve order of bio handling in generic_make_request() Revert "scsi, block: fix duplicate bdi name registration crashes" block: Make del_gendisk() safer for disks without queues bdi: Fix use-after-free in wb_congested_put() block: Allow bdi re-registration block/sed: Fix opal user range check and unused variables zram: set physical queue limits to avoid array out of bounds accesses blk-mq: free hctx->cpumask in release handler of hctx's kobject blk-mq: make lifetime consistent between hctx and its kobject blk-mq: make lifetime consitent between q/ctx and its kobject blk-mq: initialize mq kobjects in blk_mq_init_allocated_queue()
2017-03-09	Merge tag 'media/v4.11-2' of ↵	Linus Torvalds	8	-196/+260
	git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "Media regression fixes: - serial_ir: fix a Kernel crash during boot on Kernel 4.11-rc1, due to an IRQ code called too early - other IR regression fixes at lirc and at the raw IR decoding - a deadlock fix at the RC nuvoton driver - fix another issue with DMA on stack at dw2102 driver There's an extra patch there that change a driver interface for the SoC VSP1 driver, with is shared between the DRM and V4L2 driver. The patch itself is trivial, and was acked by David Arlie" * tag 'media/v4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: [media] v4l: vsp1: Adapt vsp1_du_setup_lif() interface to use a structure [media] dw2102: don't do DMA on stack [media] rc: protocol is not set on register for raw IR devices [media] rc: raw decoder for keymap protocol is not loaded on register [media] rc: nuvoton: fix deadlock in nvt_write_wakeup_codes [media] lirc: fix dead lock between open and wakeup_filter [media] serial_ir: ensure we're ready to receive interrupts