summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2017-05-04Merge tag 'platform-drivers-x86-v4.12-1' of ↵Linus Torvalds34-1294/+1618
git://git.infradead.org/linux-platform-drivers-x86 Pull x86 platform-drivers update from Darren Hart: "This represents a significantly larger and more complex set of changes than those of prior merge windows. In particular, we had several changes with dependencies on other subsystems which we felt were best managed through merges of immutable branches, including one each from input, i2c, and leds. Two patches for the watchdog subsystem are included after discussion with Wim and Guenter following a collision in linux-next (this should be resolved and you should only see these two appear in this pull request). These are called out in the "External" section below. Summary of changes: - significant further cleanup of fujitsu-laptop and hp-wmi - new model support for ideapad, asus, silead, and xiaomi - new hotkeys for thinkpad and models using intel-vbtn - dell keyboard backlight improvements - build and dependency improvements - intel * ipc fixes, cleanups, and api updates - single isolated fixes noted below External: - watchdog: iTCO_wdt: Add PMC specific noreboot update api - watchdog: iTCO_wdt: cleanup set/unset no_reboot_bit functions - Merge branch 'ib/4.10-sparse-keymap-managed' - Merge branch 'i2c/for-INT33FE' - Merge branch 'linux-leds/dell-laptop-changes-for-4.12' platform/x86: - Add Intel Cherry Trail ACPI INT33FE device driver - remove sparse_keymap_free() calls - Make SILEAD_DMI depend on TOUCHSCREEN_SILEAD asus-wmi: - try to set als by default - fix cpufv sysfs file permission acer-wmi: - setup accelerometer when ACPI device was found ideapad-laptop: - Add IdeaPad V310-15ISK to no_hw_rfkill - Add IdeaPad 310-15IKB to no_hw_rfkill intel_pmc_ipc: - use gcr mem base for S0ix counter read - Fix iTCO_wdt GCS memory mapping failure - Add pmc gcr read/write/update api's - fix gcr offset dell-laptop: - Add keyboard backlight timeout AC settings - Handle return error form dell_get_intensity. - Protect kbd_state against races - Refactor kbd_led_triggers_store() hp-wireless: - reuse module_acpi_driver - add Xiaomi's hardware id to the supported list intel-vbtn: - add volume up and down INT33FE: - add i2c dependency hp-wmi: - Cleanup exit paths - Do not shadow errors in sysfs show functions - Use DEVICE_ATTR_(RO|RW) helper macros - Refactor dock and tablet state fetchers - Cleanup wireless get_(hw|sw)state functions - Refactor redundant HPWMI_READ functions - Standardize enum usage for constants - Cleanup local variable declarations - Do not shadow error values - Fix detection for dock and tablet mode - Fix error value for hp_wmi_tablet_state fujitsu-laptop: - simplify error handling in acpi_fujitsu_laptop_add() - do not log LED registration failures - switch to managed LED class devices - reorganize LED-related code - refactor LED registration - select LEDS_CLASS - remove redundant fields from struct fujitsu_bl - account for backlight power when determining brightness - do not log set_lcd_level() failures in bl_update_status() - ignore errors when setting backlight power - make disable_brightness_adjust a boolean - clean up use_alt_lcd_levels handling - sync brightness in set_lcd_level() - simplify set_lcd_level() - merge set_lcd_level_alt() into set_lcd_level() - switch to a managed backlight device - only handle backlight when appropriate - update debug message logged by call_fext_func() - rename call_fext_func() arguments - simplify call_fext_func() - clean up local variables in call_fext_func() - remove keycode fields from struct fujitsu_bl - model-dependent sparse keymap overrides - use a sparse keymap for hotkey event generation - switch to a managed hotkey input device - refactor hotkey input device setup - use a sparse keymap for brightness key events - switch to a managed backlight input device - refactor backlight input device setup - remove pf_device field from struct fujitsu_bl - only register platform device if FUJ02E3 is present - add and remove platform device in separate functions - simplify platform device attribute definitions - remove backlight-related attributes from the platform device - cleanup error labels in fujitsu_init() - only register backlight device if FUJ02B1 is present - sync backlight power status in acpi_fujitsu_laptop_add() - register backlight device in a separate function - simplify brightness key event generation logic - decrease indentation in acpi_fujitsu_bl_notify() intel-hid: - Add missing ->thaw callback - do not set parents of input devices explicitly - remove redundant set_bit() call - use devm_input_allocate_device() for HID events input device - make intel_hid_set_enable() take a boolean argument - simplify enabling/disabling HID events silead_dmi: - Add touchscreen info for Surftab Wintron 7.0 - Abort early if DMI does not match - Do not treat all devices as i2c_clients - Add entry for Insyde 7W tablets - Constify properties arrays intel_scu_ipc: - Introduce intel_scu_ipc_raw_command() - Introduce SCU_DEVICE() macro - Remove redundant subarch check - Rearrange init sequence - Platform data is mandatory asus-nb-wmi: - Add wapf4 quirk for the X302UA dell-*: - Call new led hw_changed API on kbd brightness change - Add a generic dell-laptop notifier chain eeepc-laptop: - Skip unknown key messages 0x50 0x51 thinkpad_acpi: - add mapping for new hotkeys - guard generic hotkey case" * tag 'platform-drivers-x86-v4.12-1' of git://git.infradead.org/linux-platform-drivers-x86: (108 commits) platform/x86: Make SILEAD_DMI depend on TOUCHSCREEN_SILEAD platform/x86: asus-wmi: try to set als by default platform/x86: asus-wmi: fix cpufv sysfs file permission platform/x86: acer-wmi: setup accelerometer when ACPI device was found platform/x86: ideapad-laptop: Add IdeaPad V310-15ISK to no_hw_rfkill platform/x86: intel_pmc_ipc: use gcr mem base for S0ix counter read platform/x86: intel_pmc_ipc: Fix iTCO_wdt GCS memory mapping failure watchdog: iTCO_wdt: Add PMC specific noreboot update api watchdog: iTCO_wdt: cleanup set/unset no_reboot_bit functions platform/x86: intel_pmc_ipc: Add pmc gcr read/write/update api's platform/x86: intel_pmc_ipc: fix gcr offset platform/x86: dell-laptop: Add keyboard backlight timeout AC settings platform/x86: dell-laptop: Handle return error form dell_get_intensity. platform/x86: hp-wireless: reuse module_acpi_driver platform/x86: intel-vbtn: add volume up and down platform/x86: INT33FE: add i2c dependency platform/x86: hp-wmi: Cleanup exit paths platform/x86: hp-wmi: Do not shadow errors in sysfs show functions platform/x86: hp-wmi: Use DEVICE_ATTR_(RO|RW) helper macros platform/x86: hp-wmi: Refactor dock and tablet state fetchers ...
2017-05-04Merge tag 'vfio-v4.12-rc1' of git://github.com/awilliam/linux-vfioLinus Torvalds2-86/+77
Pull VFIO updates from Alex Williamson: - Updates for SPAPR IOMMU backend including compatibility test and memory allocation check (Alexey Kardashevskiy) - Updates for type1 IOMMU backend to remove asynchronous locked page accounting and remove redundancy (Alex Williamson) * tag 'vfio-v4.12-rc1' of git://github.com/awilliam/linux-vfio: vfio/type1: Reduce repetitive calls in vfio_pin_pages_remote() vfio/type1: Prune vfio_pin_page_external() vfio/type1: Remove locked page accounting workqueue vfio/spapr_tce: Check kzalloc() return when preregistering memory vfio/powerpc/spapr_tce: Enforce IOMMU type compatibility check
2017-05-04Merge tag 'for-linus-4.12b-rc0b-tag' of ↵Linus Torvalds53-5301/+8410
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: "Xen fixes and featrues for 4.12. The main changes are: - enable building the kernel with Xen support but without enabling paravirtualized mode (Vitaly Kuznetsov) - add a new 9pfs xen frontend driver (Stefano Stabellini) - simplify Xen's cpuid handling by making use of cpu capabilities (Juergen Gross) - add/modify some headers for new Xen paravirtualized devices (Oleksandr Andrushchenko) - EFI reset_system support under Xen (Julien Grall) - and the usual cleanups and corrections" * tag 'for-linus-4.12b-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (57 commits) xen: Move xen_have_vector_callback definition to enlighten.c xen: Implement EFI reset_system callback arm/xen: Consolidate calls to shutdown hypercall in a single helper xen: Export xen_reboot xen/x86: Call xen_smp_intr_init_pv() on BSP xen: Revert commits da72ff5bfcb0 and 72a9b186292d xen/pvh: Do not fill kernel's e820 map in init_pvh_bootparams() xen/scsifront: use offset_in_page() macro xen/arm,arm64: rename __generic_dma_ops to xen_get_dma_ops xen/arm,arm64: fix xen_dma_ops after 815dd18 "Consolidate get_dma_ops..." xen/9pfs: select CONFIG_XEN_XENBUS_FRONTEND x86/cpu: remove hypervisor specific set_cpu_features vmware: set cpu capabilities during platform initialization x86/xen: use capabilities instead of fake cpuid values for xsave x86/xen: use capabilities instead of fake cpuid values for x2apic x86/xen: use capabilities instead of fake cpuid values for mwait x86/xen: use capabilities instead of fake cpuid values for acpi x86/xen: use capabilities instead of fake cpuid values for acc x86/xen: use capabilities instead of fake cpuid values for mtrr x86/xen: use capabilities instead of fake cpuid values for aperf ...
2017-05-03Merge tag 'modules-for-v4.12' of ↵Linus Torvalds3-6/+5
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull modules updates from Jessica Yu: - Minor code cleanups - Fix section alignment for .init_array * tag 'modules-for-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: kallsyms: Use bounded strnchr() when parsing string module: Unify the return value type of try_module_get module: set .init_array alignment to 8
2017-05-03Merge tag 'trace-v4.12' of ↵Linus Torvalds40-656/+1902
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "New features for this release: - Pretty much a full rewrite of the processing of function plugins. i.e. echo do_IRQ:stacktrace > set_ftrace_filter - The rewrite was needed to add plugins to be unique to tracing instances. i.e. mkdir instance/foo; cd instances/foo; echo do_IRQ:stacktrace > set_ftrace_filter The old way was written very hacky. This removes a lot of those hacks. - New "function-fork" tracing option. When set, pids in the set_ftrace_pid will have their children added when the processes with their pids listed in the set_ftrace_pid file forks. - Exposure of "maxactive" for kretprobe in kprobe_events - Allow for builtin init functions to be traced by the function tracer (via the kernel command line). Module init function tracing will come in the next release. - Added more selftests, and have selftests also test in an instance" * tag 'trace-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (60 commits) ring-buffer: Return reader page back into existing ring buffer selftests: ftrace: Allow some event trigger tests to run in an instance selftests: ftrace: Have some basic tests run in a tracing instance too selftests: ftrace: Have event tests also run in an tracing instance selftests: ftrace: Make func_event_triggers and func_traceonoff_triggers tests do instances selftests: ftrace: Allow some tests to be run in a tracing instance tracing/ftrace: Allow for instances to trigger their own stacktrace probes tracing/ftrace: Allow for the traceonoff probe be unique to instances tracing/ftrace: Enable snapshot function trigger to work with instances tracing/ftrace: Allow instances to have their own function probes tracing/ftrace: Add a better way to pass data via the probe functions ftrace: Dynamically create the probe ftrace_ops for the trace_array tracing: Pass the trace_array into ftrace_probe_ops functions tracing: Have the trace_array hold the list of registered func probes ftrace: If the hash for a probe fails to update then free what was initialized ftrace: Have the function probes call their own function ftrace: Have each function probe use its own ftrace_ops ftrace: Have unregister_ftrace_function_probe_func() return a value ftrace: Add helper function ftrace_hash_move_and_update_ops() ftrace: Remove data field from ftrace_func_probe structure ...
2017-05-03Merge branch 'for-linus' of ↵Linus Torvalds1-25/+56
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - There is a situation when early console is not deregistered because the preferred one matches a wrong entry. It caused messages to appear twice. This is the 2nd attempt to fix it. The first one was wrong, see the commit c6c7d83b9c9e ('Revert "console: don't prefer first registered if DT specifies stdout-path"'). The fix is coupled with some small code clean up. Well, the console registration code would deserve a big one. We need to think about it. - Do not lose information about the preemtive context when the console semaphore is re-taken. - Do not block CPU hotplug when someone else is already pushing messages to the console. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: printk: fix double printing with earlycon printk: rename selected_console -> preferred_console printk: fix name/type/scope of preferred_console var printk: Correctly handle preemption in console_unlock() printk: use console_trylock() in console_cpu_notify()
2017-05-03Merge branch 'akpm' (patches from Andrew)Linus Torvalds79-1484/+2141
Merge misc updates from Andrew Morton: - a few misc things - most of MM - KASAN updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits) kasan: separate report parts by empty lines kasan: improve double-free report format kasan: print page description after stacks kasan: improve slab object description kasan: change report header kasan: simplify address description logic kasan: change allocation and freeing stack traces headers kasan: unify report headers kasan: introduce helper functions for determining bug type mm: hwpoison: call shake_page() after try_to_unmap() for mlocked page mm: hwpoison: call shake_page() unconditionally mm/swapfile.c: fix swap space leak in error path of swap_free_entries() mm/gup.c: fix access_ok() argument type mm/truncate: avoid pointless cleancache_invalidate_inode() calls. mm/truncate: bail out early from invalidate_inode_pages2_range() if mapping is empty fs/block_dev: always invalidate cleancache in invalidate_bdev() fs: fix data invalidation in the cleancache during direct IO zram: reduce load operation in page_same_filled zram: use zram_free_page instead of open-coded zram: introduce zram data accessor ...
2017-05-03kasan: separate report parts by empty linesAndrey Konovalov1-0/+7
Makes the report easier to read. Link: http://lkml.kernel.org/r/20170302134851.101218-10-andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03kasan: improve double-free report formatAndrey Konovalov3-18/+17
Changes double-free report header from BUG: Double free or freeing an invalid pointer Unexpected shadow byte: 0xFB to BUG: KASAN: double-free or invalid-free in kmalloc_oob_left+0xe5/0xef This makes a bug uniquely identifiable by the first report line. To account for removing of the unexpected shadow value, print shadow bytes at the end of the report as in reports for other kinds of bugs. Link: http://lkml.kernel.org/r/20170302134851.101218-9-andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03kasan: print page description after stacksAndrey Konovalov1-6/+8
Moves page description after the stacks since it's less important. Link: http://lkml.kernel.org/r/20170302134851.101218-8-andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03kasan: improve slab object descriptionAndrey Konovalov1-11/+42
Changes slab object description from: Object at ffff880068388540, in cache kmalloc-128 size: 128 to: The buggy address belongs to the object at ffff880068388540 which belongs to the cache kmalloc-128 of size 128 The buggy address is located 123 bytes inside of 128-byte region [ffff880068388540, ffff8800683885c0) Makes it more explanatory and adds information about relative offset of the accessed address to the start of the object. Link: http://lkml.kernel.org/r/20170302134851.101218-7-andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03kasan: change report headerAndrey Konovalov1-4/+4
Change report header format from: BUG: KASAN: use-after-free in unwind_get_return_address+0x28a/0x2c0 at addr ffff880069437950 Read of size 8 by task insmod/3925 to: BUG: KASAN: use-after-free in unwind_get_return_address+0x28a/0x2c0 Read of size 8 at addr ffff880069437950 by task insmod/3925 The exact access address is not usually important, so move it to the second line. This also makes the header look visually balanced. Link: http://lkml.kernel.org/r/20170302134851.101218-6-andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03kasan: simplify address description logicAndrey Konovalov1-16/+21
Simplify logic for describing a memory address. Add addr_to_page() helper function. Makes the code easier to follow. Link: http://lkml.kernel.org/r/20170302134851.101218-5-andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03kasan: change allocation and freeing stack traces headersAndrey Konovalov1-6/+4
Change stack traces headers from: Allocated: PID = 42 to: Allocated by task 42: Makes the report one line shorter and look better. Link: http://lkml.kernel.org/r/20170302134851.101218-4-andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03kasan: unify report headersAndrey Konovalov1-13/+13
Unify KASAN report header format for different kinds of bad memory accesses. Makes the code simpler. Link: http://lkml.kernel.org/r/20170302134851.101218-3-andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03kasan: introduce helper functions for determining bug typeAndrey Konovalov1-10/+30
Patch series "kasan: improve error reports", v2. This patchset improves KASAN reports by making them easier to read and a little more detailed. Also improves mm/kasan/report.c readability. Effectively changes a use-after-free report to: ================================================================== BUG: KASAN: use-after-free in kmalloc_uaf+0xaa/0xb6 [test_kasan] Write of size 1 at addr ffff88006aa59da8 by task insmod/3951 CPU: 1 PID: 3951 Comm: insmod Tainted: G B 4.10.0+ #84 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: dump_stack+0x292/0x398 print_address_description+0x73/0x280 kasan_report.part.2+0x207/0x2f0 __asan_report_store1_noabort+0x2c/0x30 kmalloc_uaf+0xaa/0xb6 [test_kasan] kmalloc_tests_init+0x4f/0xa48 [test_kasan] do_one_initcall+0xf3/0x390 do_init_module+0x215/0x5d0 load_module+0x54de/0x82b0 SYSC_init_module+0x3be/0x430 SyS_init_module+0x9/0x10 entry_SYSCALL_64_fastpath+0x1f/0xc2 RIP: 0033:0x7f22cfd0b9da RSP: 002b:00007ffe69118a78 EFLAGS: 00000206 ORIG_RAX: 00000000000000af RAX: ffffffffffffffda RBX: 0000555671242090 RCX: 00007f22cfd0b9da RDX: 00007f22cffcaf88 RSI: 000000000004df7e RDI: 00007f22d0399000 RBP: 00007f22cffcaf88 R08: 0000000000000003 R09: 0000000000000000 R10: 00007f22cfd07d0a R11: 0000000000000206 R12: 0000555671243190 R13: 000000000001fe81 R14: 0000000000000000 R15: 0000000000000004 Allocated by task 3951: save_stack_trace+0x16/0x20 save_stack+0x43/0xd0 kasan_kmalloc+0xad/0xe0 kmem_cache_alloc_trace+0x82/0x270 kmalloc_uaf+0x56/0xb6 [test_kasan] kmalloc_tests_init+0x4f/0xa48 [test_kasan] do_one_initcall+0xf3/0x390 do_init_module+0x215/0x5d0 load_module+0x54de/0x82b0 SYSC_init_module+0x3be/0x430 SyS_init_module+0x9/0x10 entry_SYSCALL_64_fastpath+0x1f/0xc2 Freed by task 3951: save_stack_trace+0x16/0x20 save_stack+0x43/0xd0 kasan_slab_free+0x72/0xc0 kfree+0xe8/0x2b0 kmalloc_uaf+0x85/0xb6 [test_kasan] kmalloc_tests_init+0x4f/0xa48 [test_kasan] do_one_initcall+0xf3/0x390 do_init_module+0x215/0x5d0 load_module+0x54de/0x82b0 SYSC_init_module+0x3be/0x430 SyS_init_module+0x9/0x10 entry_SYSCALL_64_fastpath+0x1f/0xc The buggy address belongs to the object at ffff88006aa59da0 which belongs to the cache kmalloc-16 of size 16 The buggy address is located 8 bytes inside of 16-byte region [ffff88006aa59da0, ffff88006aa59db0) The buggy address belongs to the page: page:ffffea0001aa9640 count:1 mapcount:0 mapping: (null) index:0x0 flags: 0x100000000000100(slab) raw: 0100000000000100 0000000000000000 0000000000000000 0000000180800080 raw: ffffea0001abe380 0000000700000007 ffff88006c401b40 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88006aa59c80: 00 00 fc fc 00 00 fc fc 00 00 fc fc 00 00 fc fc ffff88006aa59d00: 00 00 fc fc 00 00 fc fc 00 00 fc fc 00 00 fc fc >ffff88006aa59d80: fb fb fc fc fb fb fc fc fb fb fc fc fb fb fc fc ^ ffff88006aa59e00: fb fb fc fc fb fb fc fc fb fb fc fc fb fb fc fc ffff88006aa59e80: fb fb fc fc 00 00 fc fc 00 00 fc fc 00 00 fc fc ================================================================== from: ================================================================== BUG: KASAN: use-after-free in kmalloc_uaf+0xaa/0xb6 [test_kasan] at addr ffff88006c4dcb28 Write of size 1 by task insmod/3984 CPU: 1 PID: 3984 Comm: insmod Tainted: G B 4.10.0+ #83 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: dump_stack+0x292/0x398 kasan_object_err+0x1c/0x70 kasan_report.part.1+0x20e/0x4e0 __asan_report_store1_noabort+0x2c/0x30 kmalloc_uaf+0xaa/0xb6 [test_kasan] kmalloc_tests_init+0x4f/0xa48 [test_kasan] do_one_initcall+0xf3/0x390 do_init_module+0x215/0x5d0 load_module+0x54de/0x82b0 SYSC_init_module+0x3be/0x430 SyS_init_module+0x9/0x10 entry_SYSCALL_64_fastpath+0x1f/0xc2 RIP: 0033:0x7feca0f779da RSP: 002b:00007ffdfeae5218 EFLAGS: 00000206 ORIG_RAX: 00000000000000af RAX: ffffffffffffffda RBX: 000055a064c13090 RCX: 00007feca0f779da RDX: 00007feca1236f88 RSI: 000000000004df7e RDI: 00007feca1605000 RBP: 00007feca1236f88 R08: 0000000000000003 R09: 0000000000000000 R10: 00007feca0f73d0a R11: 0000000000000206 R12: 000055a064c14190 R13: 000000000001fe81 R14: 0000000000000000 R15: 0000000000000004 Object at ffff88006c4dcb20, in cache kmalloc-16 size: 16 Allocated: PID = 3984 save_stack_trace+0x16/0x20 save_stack+0x43/0xd0 kasan_kmalloc+0xad/0xe0 kmem_cache_alloc_trace+0x82/0x270 kmalloc_uaf+0x56/0xb6 [test_kasan] kmalloc_tests_init+0x4f/0xa48 [test_kasan] do_one_initcall+0xf3/0x390 do_init_module+0x215/0x5d0 load_module+0x54de/0x82b0 SYSC_init_module+0x3be/0x430 SyS_init_module+0x9/0x10 entry_SYSCALL_64_fastpath+0x1f/0xc2 Freed: PID = 3984 save_stack_trace+0x16/0x20 save_stack+0x43/0xd0 kasan_slab_free+0x73/0xc0 kfree+0xe8/0x2b0 kmalloc_uaf+0x85/0xb6 [test_kasan] kmalloc_tests_init+0x4f/0xa48 [test_kasan] do_one_initcall+0xf3/0x390 do_init_module+0x215/0x5d0 load_module+0x54de/0x82b0 SYSC_init_module+0x3be/0x430 SyS_init_module+0x9/0x10 entry_SYSCALL_64_fastpath+0x1f/0xc2 Memory state around the buggy address: ffff88006c4dca00: fb fb fc fc fb fb fc fc fb fb fc fc fb fb fc fc ffff88006c4dca80: fb fb fc fc fb fb fc fc fb fb fc fc fb fb fc fc >ffff88006c4dcb00: fb fb fc fc fb fb fc fc fb fb fc fc fb fb fc fc ^ ffff88006c4dcb80: fb fb fc fc 00 00 fc fc fb fb fc fc fb fb fc fc ffff88006c4dcc00: fb fb fc fc fb fb fc fc fb fb fc fc fb fb fc fc ================================================================== This patch (of 9): Introduce get_shadow_bug_type() function, which determines bug type based on the shadow value for a particular kernel address. Introduce get_wild_bug_type() function, which determines bug type for addresses which don't have a corresponding shadow value. Link: http://lkml.kernel.org/r/20170302134851.101218-2-andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm: hwpoison: call shake_page() after try_to_unmap() for mlocked pageNaoya Horiguchi1-0/+8
Memory error handler calls try_to_unmap() for error pages in various states. If the error page is a mlocked page, error handling could fail with "still referenced by 1 users" message. This is because the page is linked to and stays in lru cache after the following call chain. try_to_unmap_one page_remove_rmap clear_page_mlock putback_lru_page lru_cache_add memory_failure() calls shake_page() to hanlde the similar issue, but current code doesn't cover because shake_page() is called only before try_to_unmap(). So this patches adds shake_page(). Fixes: 23a003bfd23ea9ea0b7756b920e51f64b284b468 ("mm/madvise: pass return code of memory_failure() to userspace") Link: http://lkml.kernel.org/r/20170417055948.GM31394@yexl-desktop Link: http://lkml.kernel.org/r/1493197841-23986-3-git-send-email-n-horiguchi@ah.jp.nec.com Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by: kernel test robot <lkp@intel.com> Cc: Xiaolong Ye <xiaolong.ye@intel.com> Cc: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm: hwpoison: call shake_page() unconditionallyNaoya Horiguchi2-18/+12
shake_page() is called before going into core error handling code in order to ensure that the error page is flushed from lru_cache lists where pages stay during transferring among LRU lists. But currently it's not fully functional because when the page is linked to lru_cache by calling activate_page(), its PageLRU flag is set and shake_page() is skipped. The result is to fail error handling with "still referenced by 1 users" message. When the page is linked to lru_cache by isolate_lru_page(), its PageLRU is clear, so that's fine. This patch makes shake_page() unconditionally called to avoild the failure. Fixes: 23a003bfd23ea9ea0b7756b920e51f64b284b468 ("mm/madvise: pass return code of memory_failure() to userspace") Link: http://lkml.kernel.org/r/20170417055948.GM31394@yexl-desktop Link: http://lkml.kernel.org/r/1493197841-23986-2-git-send-email-n-horiguchi@ah.jp.nec.com Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by: kernel test robot <lkp@intel.com> Cc: Xiaolong Ye <xiaolong.ye@intel.com> Cc: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm/swapfile.c: fix swap space leak in error path of swap_free_entries()Huang Ying1-2/+0
In swapcache_free_entries(), if swap_info_get_cont() returns NULL, something wrong occurs for the swap entry. But we should still continue to free the following swap entries in the array instead of skip them to avoid swap space leak. This is just problem in error path, where system may be in an inconsistent state, but it is still good to fix it. Link: http://lkml.kernel.org/r/20170421124739.24534-1-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Shaohua Li <shli@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm/gup.c: fix access_ok() argument typeArnd Bergmann1-1/+1
MIPS just got changed to only accept a pointer argument for access_ok(), causing one warning in drivers/scsi/pmcraid.c. I tried changing x86 the same way and found the same warning in __get_user_pages_fast() and nowhere else in the kernel during randconfig testing: mm/gup.c: In function '__get_user_pages_fast': mm/gup.c:1578:6: error: passing argument 1 of '__chk_range_not_ok' makes pointer from integer without a cast [-Werror=int-conversion] It would probably be a good idea to enforce type-safety in general, so let's change this file to not cause a warning if we do that. I don't know why the warning did not appear on MIPS. Fixes: 2667f50e8b81 ("mm: introduce a general RCU get_user_pages_fast()") Link: http://lkml.kernel.org/r/20170421162659.3314521-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm/truncate: avoid pointless cleancache_invalidate_inode() calls.Andrey Ryabinin1-5/+7
cleancache_invalidate_inode() called truncate_inode_pages_range() and invalidate_inode_pages2_range() twice - on entry and on exit. It's stupid and waste of time. It's enough to call it once at exit. Link: http://lkml.kernel.org/r/20170424164135.22350-5-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Alexey Kuznetsov <kuznet@virtuozzo.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Nikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm/truncate: bail out early from invalidate_inode_pages2_range() if mapping ↵Andrey Ryabinin1-0/+3
is empty If mapping is empty (both ->nrpages and ->nrexceptional is zero) we can avoid pointless lookups in empty radix tree and bail out immediately after cleancache invalidation. Link: http://lkml.kernel.org/r/20170424164135.22350-4-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Alexey Kuznetsov <kuznet@virtuozzo.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Nikolay Borisov <n.borisov.lkml@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03fs/block_dev: always invalidate cleancache in invalidate_bdev()Andrey Ryabinin1-6/+5
invalidate_bdev() calls cleancache_invalidate_inode() iff ->nrpages != 0 which doen't make any sense. Make sure that invalidate_bdev() always calls cleancache_invalidate_inode() regardless of mapping->nrpages value. Fixes: c515e1fd361c ("mm/fs: add hooks to support cleancache") Link: http://lkml.kernel.org/r/20170424164135.22350-3-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Alexey Kuznetsov <kuznet@virtuozzo.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Nikolay Borisov <n.borisov.lkml@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03fs: fix data invalidation in the cleancache during direct IOAndrey Ryabinin2-25/+19
Patch series "Properly invalidate data in the cleancache", v2. We've noticed that after direct IO write, buffered read sometimes gets stale data which is coming from the cleancache. The reason for this is that some direct write hooks call call invalidate_inode_pages2[_range]() conditionally iff mapping->nrpages is not zero, so we may not invalidate data in the cleancache. Another odd thing is that we check only for ->nrpages and don't check for ->nrexceptional, but invalidate_inode_pages2[_range] also invalidates exceptional entries as well. So we invalidate exceptional entries only if ->nrpages != 0? This doesn't feel right. - Patch 1 fixes direct IO writes by removing ->nrpages check. - Patch 2 fixes similar case in invalidate_bdev(). Note: I only fixed conditional cleancache_invalidate_inode() here. Do we also need to add ->nrexceptional check in into invalidate_bdev()? - Patches 3-4: some optimizations. This patch (of 4): Some direct IO write fs hooks call invalidate_inode_pages2[_range]() conditionally iff mapping->nrpages is not zero. This can't be right, because invalidate_inode_pages2[_range]() also invalidate data in the cleancache via cleancache_invalidate_inode() call. So if page cache is empty but there is some data in the cleancache, buffered read after direct IO write would get stale data from the cleancache. Also it doesn't feel right to check only for ->nrpages because invalidate_inode_pages2[_range] invalidates exceptional entries as well. Fix this by calling invalidate_inode_pages2[_range]() regardless of nrpages state. Note: nfs,cifs,9p doesn't need similar fix because the never call cleancache_get_page() (nor directly, nor via mpage_readpage[s]()), so they are not affected by this bug. Fixes: c515e1fd361c ("mm/fs: add hooks to support cleancache") Link: http://lkml.kernel.org/r/20170424164135.22350-2-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Alexey Kuznetsov <kuznet@virtuozzo.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Nikolay Borisov <n.borisov.lkml@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03zram: reduce load operation in page_same_filledSangwoo Park1-3/+5
In page_same_filled function, all elements in the page is compared with next index value. The current comparison routine compares the (i)th and (i+1)th values of the page. In this case, two load operaions occur for each comparison. But if we store first value of the page stores at 'val' variable and using it to compare with others, the load opearation is reduced. It reduce load operation per page by up to 64times. Link: http://lkml.kernel.org/r/1488428104-7257-1-git-send-email-sangwoo2.park@lge.com Signed-off-by: Sangwoo Park <sangwoo2.park@lge.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03zram: use zram_free_page instead of open-codedMinchan Kim1-14/+3
The zram_free_page already handles NULL handle case and same page so use it to reduce error probability. (Acutaully, I made a mistake when I handled same page feature) Link: http://lkml.kernel.org/r/1492052365-16169-7-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03zram: introduce zram data accessorMinchan Kim1-11/+22
With element, sometime I got confused handle and element access. It might be my bad but I think it's time to introduce accessor to prevent future idiot like me. This patch is just clean-up patch so it shouldn't change any behavior. Link: http://lkml.kernel.org/r/1492052365-16169-6-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03zram: remove zram_meta structureMinchan Kim2-117/+78
It's redundant now. Instead, remove it and use zram structure directly. Link: http://lkml.kernel.org/r/1492052365-16169-5-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03zram: use zram_slot_lock instead of raw bit_spin_lock opMinchan Kim1-14/+27
With this clean-up phase, I want to use zram's wrapper function to lock table access which is more consistent with other zram's functions. Link: http://lkml.kernel.org/r/1492052365-16169-4-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03zram: partial IO refactoringMinchan Kim1-153/+184
For architecture(PAGE_SIZE > 4K), zram have supported partial IO. However, the mixed code for handling normal/partial IO is too mess, error-prone to modify IO handler functions with upcoming feature so this patch aims for cleaning up zram's IO handling functions. Link: http://lkml.kernel.org/r/1492052365-16169-3-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03zram: handle multiple pages attached bio's bvecMinchan Kim1-29/+11
Patch series "zram clean up", v2. This patchset aims to clean up zram . [1] clean up multiple pages's bvec handling. [2] clean up partial IO handling [3-6] clean up zram via using accessor and removing pointless structure. With [2-6] applied, we can get a few hundred bytes as well as huge readibility enhance. x86: 708 byte save add/remove: 1/1 grow/shrink: 0/11 up/down: 478/-1186 (-708) function old new delta zram_special_page_read - 478 +478 zram_reset_device 317 314 -3 mem_used_max_store 131 128 -3 compact_store 96 93 -3 mm_stat_show 203 197 -6 zram_add 719 712 -7 zram_slot_free_notify 229 214 -15 zram_make_request 819 803 -16 zram_meta_free 128 111 -17 zram_free_page 180 151 -29 disksize_store 432 361 -71 zram_decompress_page.isra 504 - -504 zram_bvec_rw 2592 2080 -512 Total: Before=25350773, After=25350065, chg -0.00% ppc64: 231 byte save add/remove: 2/0 grow/shrink: 1/9 up/down: 681/-912 (-231) function old new delta zram_special_page_read - 480 +480 zram_slot_lock - 200 +200 vermagic 39 40 +1 mm_stat_show 256 248 -8 zram_meta_free 200 184 -16 zram_add 944 912 -32 zram_free_page 348 308 -40 disksize_store 572 492 -80 zram_decompress_page 664 564 -100 zram_slot_free_notify 292 160 -132 zram_make_request 1132 1000 -132 zram_bvec_rw 2768 2396 -372 Total: Before=17565825, After=17565594, chg -0.00% This patch (of 6): Johannes Thumshirn reported system goes the panic when using NVMe over Fabrics loopback target with zram. The reason is zram expects each bvec in bio contains a single page but nvme can attach a huge bulk of pages attached to the bio's bvec so that zram's index arithmetic could be wrong so that out-of-bound access makes system panic. [1] in mainline solved solved the problem by limiting max_sectors with SECTORS_PER_PAGE but it makes zram slow because bio should split with each pages so this patch makes zram aware of multiple pages in a bvec so it could solve without any regression(ie, bio split). [1] 0bc315381fe9, zram: set physical queue limits to avoid array out of bounds accesses Link: http://lkml.kernel.org/r/20170413134057.GA27499@bbox Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Johannes Thumshirn <jthumshirn@suse.de> Tested-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm, page_alloc: remove debug_guardpage_minorder() test in warn_alloc()Tetsuo Handa1-2/+1
Commit c0a32fc5a2e4 ("mm: more intensive memory corruption debugging") changed to check debug_guardpage_minorder() > 0 when reporting allocation failures. The reasoning was When we use guard page to debug memory corruption, it shrinks available pages to 1/2, 1/4, 1/8 and so on, depending on parameter value. In such case memory allocation failures can be common and printing errors can flood dmesg. If somebody debug corruption, allocation failures are not the things he/she is interested about. but this is misguided. Allocation requests with __GFP_NOWARN flag by definition do not cause flooding of allocation failure messages. Allocation requests with __GFP_NORETRY flag likely also have __GFP_NOWARN flag. Costly allocation requests likely also have __GFP_NOWARN flag. Allocation requests without __GFP_DIRECT_RECLAIM flag likely also have __GFP_NOWARN flag or __GFP_HIGH flag. Non-costly allocation requests with __GFP_DIRECT_RECLAIM flag basically retry forever due to the "too small to fail" memory-allocation rule. Therefore, as a whole, shrinking available pages by debug_guardpage_minorder= kernel boot parameter might cause flooding of OOM killer messages but unlikely causes flooding of allocation failure messages. Let's remove debug_guardpage_minorder() > 0 check which would likely be pointless. Link: http://lkml.kernel.org/r/1491910035-4231-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Rafael J . Wysocki" <rafael.j.wysocki@intel.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm/memory-failure.c: add page flag description in error pathsAnshuman Khandual1-8/+8
It helps to provide page flag description along with the raw value in error paths during soft offline process. From sample experiments Before the patch: soft offline: 0x6100: migration failed 1, type 3ffff800008018 soft offline: 0x7400: migration failed 1, type 3ffff800008018 After the patch: soft offline: 0x5900: migration failed 1, type 3ffff800008018 (uptodate|dirty|head) soft offline: 0x6c00: migration failed 1, type 3ffff800008018 (uptodate|dirty|head) Link: http://lkml.kernel.org/r/20170409023829.10788-1-khandual@linux.vnet.ibm.com Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm/madvise: move up the behavior parameter validationAnshuman Khandual1-4/+9
madvise_behavior_valid() should be called before acting upon the behavior parameter. Hence move up the function. This also includes MADV_SOFT_OFFLINE and MADV_HWPOISON options as valid behavior parameter for the system call madvise(). Link: http://lkml.kernel.org/r/20170418052844.24891-1-khandual@linux.vnet.ibm.com Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm/madvise.c: clean up MADV_SOFT_OFFLINE and MADV_HWPOISONAnshuman Khandual1-14/+20
This cleans up handling MADV_SOFT_OFFLINE and MADV_HWPOISON called through madvise() system call. * madvise_memory_failure() was misleading to accommodate handling of both memory_failure() as well as soft_offline_page() functions. Basically it handles memory error injection from user space which can go either way as memory failure or soft offline. Renamed as madvise_inject_error() instead. * Renamed struct page pointer 'p' to 'page'. * pr_info() was essentially printing PFN value but it said 'page' which was misleading. Made the process virtual address explicit. Before the patch: Soft offlining page 0x15e3e at 0x3fff8c230000 Soft offlining page 0x1f3 at 0x3fffa0da0000 Soft offlining page 0x744 at 0x3fff7d200000 Soft offlining page 0x1634d at 0x3fff95e20000 Soft offlining page 0x16349 at 0x3fff95e30000 Soft offlining page 0x1d6 at 0x3fff9e8b0000 Soft offlining page 0x5f3 at 0x3fff91bd0000 Injecting memory failure for page 0x15c8b at 0x3fff83280000 Injecting memory failure for page 0x16190 at 0x3fff83290000 Injecting memory failure for page 0x740 at 0x3fff9a2e0000 Injecting memory failure for page 0x741 at 0x3fff9a2f0000 After the patch: Soft offlining pfn 0x1484e at process virtual address 0x3fff883c0000 Soft offlining pfn 0x1484f at process virtual address 0x3fff883d0000 Soft offlining pfn 0x14850 at process virtual address 0x3fff883e0000 Soft offlining pfn 0x14851 at process virtual address 0x3fff883f0000 Soft offlining pfn 0x14852 at process virtual address 0x3fff88400000 Soft offlining pfn 0x14853 at process virtual address 0x3fff88410000 Soft offlining pfn 0x14854 at process virtual address 0x3fff88420000 Soft offlining pfn 0x1521c at process virtual address 0x3fff6bc70000 Injecting memory failure for pfn 0x10fcf at process virtual address 0x3fff86310000 Injecting memory failure for pfn 0x10fd0 at process virtual address 0x3fff86320000 Injecting memory failure for pfn 0x10fd1 at process virtual address 0x3fff86330000 Injecting memory failure for pfn 0x10fd2 at process virtual address 0x3fff86340000 Injecting memory failure for pfn 0x10fd3 at process virtual address 0x3fff86350000 Injecting memory failure for pfn 0x10fd4 at process virtual address 0x3fff86360000 Injecting memory failure for pfn 0x10fd5 at process virtual address 0x3fff86370000 Link: http://lkml.kernel.org/r/20170410084701.11248-1-khandual@linux.vnet.ibm.com Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03Documentation: vm, add hugetlbfs reservation overviewMike Kravetz2-0/+531
Adding a brief overview of hugetlbfs reservation design and implementation as an aid to those making code modifications in this area. Link: http://lkml.kernel.org/r/1491586995-13085-1-git-send-email-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm, swap: remove unused function prototypeHuang Ying1-3/+0
This is a code cleanup patch, no functionality changes. There are 2 unused function prototype in swap.h, they are removed. Link: http://lkml.kernel.org/r/20170405071017.23677-1-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm: memcontrol: use node page state naming scheme for memcgJohannes Weiner6-68/+68
The memory controllers stat function names are awkwardly long and arbitrarily different from the zone and node stat functions. The current interface is named: mem_cgroup_read_stat() mem_cgroup_update_stat() mem_cgroup_inc_stat() mem_cgroup_dec_stat() mem_cgroup_update_page_stat() mem_cgroup_inc_page_stat() mem_cgroup_dec_page_stat() This patch renames it to match the corresponding node stat functions: memcg_page_state() [node_page_state()] mod_memcg_state() [mod_node_state()] inc_memcg_state() [inc_node_state()] dec_memcg_state() [dec_node_state()] mod_memcg_page_state() [mod_node_page_state()] inc_memcg_page_state() [inc_node_page_state()] dec_memcg_page_state() [dec_node_page_state()] Link: http://lkml.kernel.org/r/20170404220148.28338-4-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm: memcontrol: re-use node VM page state enumJohannes Weiner6-138/+123
The current duplication is a high-maintenance mess, and it's painful to add new items or query memcg state from the rest of the VM. This increases the size of the stat array marginally, but we should aim to track all these stats on a per-cgroup level anyway. Link: http://lkml.kernel.org/r/20170404220148.28338-3-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm: memcontrol: re-use global VM event enumJohannes Weiner2-56/+42
The current duplication is a high-maintenance mess, and it's painful to add new items. This increases the size of the event array, but we'll eventually want most of the VM events tracked on a per-cgroup basis anyway. Link: http://lkml.kernel.org/r/20170404220148.28338-2-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm: memcontrol: clean up memory.events counting functionJohannes Weiner3-18/+10
We only ever count single events, drop the @nr parameter. Rename the function accordingly. Remove low-information kerneldoc. Link: http://lkml.kernel.org/r/20170404220148.28338-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm: vmscan: fix IO/refault regression in cache workingset transitionJohannes Weiner5-41/+150
Since commit 59dc76b0d4df ("mm: vmscan: reduce size of inactive file list") we noticed bigger IO spikes during changes in cache access patterns. The patch in question shrunk the inactive list size to leave more room for the current workingset in the presence of streaming IO. However, workingset transitions that previously happened on the inactive list are now pushed out of memory and incur more refaults to complete. This patch disables active list protection when refaults are being observed. This accelerates workingset transitions, and allows more of the new set to establish itself from memory, without eating into the ability to protect the established workingset during stable periods. The workloads that were measurably affected for us were hit pretty bad by it, with refault/majfault rates doubling and tripling during cache transitions, and the machines sustaining half-hour periods of 100% IO utilization, where they'd previously have sub-minute peaks at 60-90%. Stateful services that handle user data tend to be more conservative with kernel upgrades. As a result we hit most page cache issues with some delay, as was the case here. The severity seemed to warrant a stable tag. Fixes: 59dc76b0d4df ("mm: vmscan: reduce size of inactive file list") Link: http://lkml.kernel.org/r/20170404220052.27593-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> [4.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm/mmap: replace SHM_HUGE_MASK with MAP_HUGE_MASK inside mmap_pgoffAnshuman Khandual1-1/+1
Commit 091d0d55b286 ("shm: fix null pointer deref when userspace specifies invalid hugepage size") had replaced MAP_HUGE_MASK with SHM_HUGE_MASK. Though both of them contain the same numeric value of 0x3f, MAP_HUGE_MASK flag sounds more appropriate than the other one in the context. Hence change it back. Link: http://lkml.kernel.org/r/20170404045635.616-1-khandual@linux.vnet.ibm.com Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03oom: improve oom disable handlingMichal Hocko2-1/+3
Tetsuo has reported that sysrq triggered OOM killer will print a misleading information when no tasks are selected: sysrq: SysRq : Manual OOM execution Out of memory: Kill process 4468 ((agetty)) score 0 or sacrifice child Killed process 4468 ((agetty)) total-vm:43704kB, anon-rss:1760kB, file-rss:0kB, shmem-rss:0kB sysrq: SysRq : Manual OOM execution Out of memory: Kill process 4469 (systemd-cgroups) score 0 or sacrifice child Killed process 4469 (systemd-cgroups) total-vm:10704kB, anon-rss:120kB, file-rss:0kB, shmem-rss:0kB sysrq: SysRq : Manual OOM execution sysrq: OOM request ignored because killer is disabled sysrq: SysRq : Manual OOM execution sysrq: OOM request ignored because killer is disabled sysrq: SysRq : Manual OOM execution sysrq: OOM request ignored because killer is disabled The real reason is that there are no eligible tasks for the OOM killer to select but since commit 7c5f64f84483 ("mm: oom: deduplicate victim selection code for memcg and global oom") the semantic of out_of_memory has changed without updating moom_callback. This patch updates moom_callback to tell that no task was eligible which is the case for both oom killer disabled and no eligible tasks. In order to help distinguish first case from the second add printk to both oom_killer_{enable,disable}. This information is useful on its own because it might help debugging potential memory allocation failures. Fixes: 7c5f64f84483 ("mm: oom: deduplicate victim selection code for memcg and global oom") Link: http://lkml.kernel.org/r/20170404134705.6361-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03userfaultfd: selftest: combine all cases into a single executableMike Rapoport3-108/+116
Currently, selftest for userfaultfd is compiled three times: for anonymous, shared and hugetlb memory. Let's combine all the cases into a single executable which will have a command line option for selection of the test type. Link: http://lkml.kernel.org/r/1490869741-5913-1-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm: fix spelling errorHao Lee1-2/+2
Fix variable name error in comments. No code changes. Link: http://lkml.kernel.org/r/20170403161655.5081-1-haolee.swjtu@gmail.com Signed-off-by: Hao Lee <haolee.swjtu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm/swap_slots.c: add warning if swap slots cache failed to initializeTim Chen1-1/+3
Add a warning diagnostics to user if we failed to allocate swap slots cache and use it. [akpm@linux-foundation.org: use WARN_ONCE return value, fix grammar in message] Link: http://lkml.kernel.org/r/20170328234827.GA10107@linux.intel.com Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03include/linux/migrate.h: add arg names to prototypePushkar Jambhlekar1-2/+3
It is preferred, and the rest of migrate.h gets it right. Link: http://lkml.kernel.org/r/1490336009-8024-1-git-send-email-pushkar.iit@gmail.com Signed-off-by: Pushkar Jambhlekar <pushkar.iit@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm: enable page poisoning early at bootVinayak Menon5-88/+17
On SPARSEMEM systems page poisoning is enabled after buddy is up, because of the dependency on page extension init. This causes the pages released by free_all_bootmem not to be poisoned. This either delays or misses the identification of some issues because the pages have to undergo another cycle of alloc-free-alloc for any corruption to be detected. Enable page poisoning early by getting rid of the PAGE_EXT_DEBUG_POISON flag. Since all the free pages will now be poisoned, the flag need not be verified before checking the poison during an alloc. [vinmenon@codeaurora.org: fix Kconfig] Link: http://lkml.kernel.org/r/1490878002-14423-1-git-send-email-vinmenon@codeaurora.org Link: http://lkml.kernel.org/r/1490358246-11001-1-git-send-email-vinmenon@codeaurora.org Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Acked-by: Laura Abbott <labbott@redhat.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03mm, swap: avoid lock swap_avail_lock when held cluster lockHuang Ying1-3/+3
Cluster lock is used to protect the swap_cluster_info and corresponding elements in swap_info_struct->swap_map[]. But it is found that now in scan_swap_map_slots(), swap_avail_lock may be acquired when cluster lock is held. This does no good except making the locking more complex and improving the potential locking contention, because the swap_info_struct->lock is used to protect the data structure operated in the code already. Fix this via moving the corresponding operations in scan_swap_map_slots() out of cluster lock. Link: http://lkml.kernel.org/r/20170317064635.12792-3-ying.huang@intel.com Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Tim Chen <tim.c.chen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>