Age | Commit message | Author | Files | Lines
2017-05-08 | ipc/shm: some shmat cleanups | Davidlohr Bueso | 1 | -9/+7
Clean up the early flag and address some minutiae. Link: http://lkml.kernel.org/r/1486673582-6979-3-git-send-email-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | initramfs: use vfs_stat/lstat directly | Arnd Bergmann | 2 | -24/+10
sys_newlstat is a system call implementation that is meant for user space, and that copies the kernel-internal data structures to the user format, which is not needed for in-kernel users. Further, as we rearrange the system call implementation so we can extend it with 64-bit time_t, the prototype for sys_newlstat changes. This changes the initramfs code to use vfs_lstat directly, to get it out of the way of the time_t changes and to make it slightly more efficient in the process. Along the same lines we also replace sys_stat and sys_stat64 with vfs_stat. Link: http://lkml.kernel.org/r/20170314214932.4052842-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
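As an illustration, the conversion described amounts to something like the following sketch of the clean_path() helper in init/initramfs.c (simplified; it assumes the early-boot setup under which these VFS calls accept the in-kernel path pointer):

  static void __init clean_path(char *path, umode_t fmode)
  {
          struct kstat st;

          /* vfs_lstat() fills the kernel-native struct kstat directly,
           * where sys_newlstat() would convert to the userspace layout */
          if (!vfs_lstat(path, &st) && (st.mode ^ fmode) & S_IFMT) {
                  if (S_ISDIR(st.mode))
                          sys_rmdir(path);
                  else
                          sys_unlink(path);
          }
  }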
2017-05-08 | initramfs: provide a way to ignore image provided by bootloader | Daniel Thompson | 2 | -1/+11
Many "embedded" architectures provide CMDLINE_FORCE to allow the kernel to override the command line provided by an inflexible bootloader. However there is currrently no way for the kernel to override the initramfs image provided by the bootloader meaning there are still ways for bootloaders to make things difficult for us. Fix this by introducing INITRAMFS_FORCE which can prevent the kernel from loading the bootloader supplied image. We use CMDLINE_FORCE (and its friend CMDLINE_EXTEND) to imply that the system has an inflexible bootloader. This allow us to avoid presenting this config option to users of systems where inflexible bootloaders aren't usually a problem. Link: http://lkml.kernel.org/r/20170217121940.30126-1-daniel.thompson@linaro.org Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | lib/zlib_inflate/inftrees.c: fix potential buffer overflow | Guenter Roeck | 1 | -1/+1
smatch says:

  WARNING: please, no spaces at the start of a line
  #30: FILE: lib/zlib_inflate/inftrees.c:112:
  +       for (min = 1; min < MAXBITS; min++)$

  total: 0 errors, 1 warnings, 8 lines checked

  NOTE: For some of the reported defects, checkpatch may be able to
  mechanically convert to the typical style using --fix or --fix-inplace.

  ./patches/zlib-inflate-fix-potential-buffer-overflow.patch has style
  problems, please review.

  NOTE: If any of the errors are false positives, please report them to
  the maintainer, see CHECKPATCH in MAINTAINERS.

Please run checkpatch prior to sending patches

Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | lib/fault-inject.c: use correct check for interrupts | Dmitry Vyukov | 1 | -1/+1
in_interrupt() also returns true when bh is disabled in task context. That's not what fail_task() wants to check. Use the new in_task() predicate that does the right thing. Link: http://lkml.kernel.org/r/20170321091805.140676-1-dvyukov@google.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
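The fix reduces to flipping the predicate in fail_task() (sketch): in_interrupt() is also true in task context with BHs disabled, while in_task() is false only in genuine hard/soft interrupt or NMI context.

  static bool fail_task(struct fault_attr *attr, struct task_struct *task)
  {
          /* was: return !in_interrupt() && task->make_it_fail; */
          return in_task() && task->make_it_fail;
  }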
2017-05-08 | kcov: simplify interrupt check | Dmitry Vyukov | 1 | -8/+1
in_interrupt() semantics are confusing and wrong for most users, as it also returns true when bh is disabled. Thus we open-coded a proper check for interrupts in __sanitizer_cov_trace_pc() with a lengthy explanatory comment. Use the new in_task() predicate instead. Link: http://lkml.kernel.org/r/20170321091026.139655-1-dvyukov@google.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: James Morse <james.morse@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | taskstats: add e/u/stime for TGID command | Zhang Xiao | 1 | -0/+14
The elapsed time, user CPU time and system CPU time for the thread group status request are presently left at zero. Fill these in. [akpm@linux-foundation.org: run ktime_get_ns() a single time] [akpm@linux-foundation.org: include linux/sched/cputime.h for task_cputime()] Link: http://lkml.kernel.org/r/1488508424-12322-1-git-send-email-xiao.zhang@windriver.com Signed-off-by: Zhang Xiao <xiao.zhang@windriver.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
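A hedged sketch of the per-thread accumulation this adds (field names from struct taskstats; locking and the surrounding thread loop elided):

  u64 utime, stime;
  u64 delta = ktime_get_ns() - tsk->start_time;

  stats->ac_etime = max(stats->ac_etime, div_u64(delta, NSEC_PER_USEC));
  task_cputime(tsk, &utime, &stime);
  stats->ac_utime += div_u64(utime, NSEC_PER_USEC);
  stats->ac_stime += div_u64(stime, NSEC_PER_USEC);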
2017-05-08 | pidns: expose task pid_ns_for_children to userspace | Kirill Tkhai | 3 | -0/+36
The pid_ns_for_children set by a task is known only to the task itself, and it's impossible to identify it from outside. That is a big problem for checkpoint/restore software like CRIU, because it can't correctly handle tasks that do setns(CLONE_NEWPID) in the course of their work. This patch solves the problem by exposing pid_ns_for_children in the /proc/[pid]/ns directory in the standard way, under the name "pid_for_children":

  ~# ls /proc/5531/ns -l | grep pid
  lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -> pid:[4026531836]
  lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -> pid:[4026532286]

Link: http://lkml.kernel.org/r/149201123914.6007.2187327078064239572.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Andrei Vagin <avagin@virtuozzo.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | ns: allow ns_entries to have custom symlink content | Kirill Tkhai | 2 | -1/+4
Patch series "Expose task pid_ns_for_children to userspace". pid_ns_for_children set by a task is known only to the task itself, and it's impossible to identify it from outside. It's a big problem for checkpoint/restore software like CRIU, because it can't correctly handle tasks, that do setns(CLONE_NEWPID) in proccess of their work. If they have a custom pid_ns_for_children before dump, they must have the same ns after restore. Otherwise, restored task bumped into enviroment it does not expect. This patchset solves the problem. It exposes pid_ns_for_children to ns directory in standard way with the name "pid_for_children": ~# ls /proc/5531/ns -l | grep pid lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -> pid:[4026531836] lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -> pid:[4026532286] This patch (of 2): Make possible to have link content prefix yyy different from the link name xxx: $ readlink /proc/[pid]/ns/xxx yyy:[4026531838] This will be used in next patch. Link: http://lkml.kernel.org/r/149201120318.6007.7362655181033883000.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | pidns: disable pid allocation if pid_ns_prepare_proc() is failed in alloc_pid() | Kirill Tkhai | 1 | -1/+3
alloc_pidmap() advances pid_namespace::last_pid. When the first pid allocation fails, the next created process will have pid 2 and pid_ns_prepare_proc() won't be called. So pid_namespace::proc_mnt will never be initialized (not to mention that there won't be a child reaper). I saw a crash stack of such a case on kernel 3.10:

  BUG: unable to handle kernel NULL pointer dereference at (null)
  IP: proc_flush_task+0x8f/0x1b0
  Call Trace:
    release_task+0x3f/0x490
    wait_consider_task.part.10+0x7ff/0xb00
    do_wait+0x11f/0x280
    SyS_wait4+0x7d/0x110

We could fix this by restoring last_pid to 0, or by prohibiting further allocations. Since a similar issue was fixed via prohibiting allocation in Oleg Nesterov's commit 314a8ad0f18a ("pidns: fix free_pid() to handle the first fork failure"), let's follow that approach and do the same. Link: http://lkml.kernel.org/r/149201021004.4863.6762095011554287922.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Andrei Vagin <avagin@virtuozzo.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
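The prohibition can be expressed in alloc_pid() roughly as follows (a sketch; disable_pid_allocation() is the existing helper that clears the flag permitting new pids to be hashed in the namespace):

  if (unlikely(is_child_reaper(pid))) {
          if (pid_ns_prepare_proc(ns)) {
                  disable_pid_allocation(ns);
                  goto out_free;
          }
  }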
2017-05-08 | powerpc/fadump: update documentation about crashkernel parameter reuse | Hari Bathini | 1 | -8/+15
As we are reusing crashkernel parameter instead of fadump_reserve_mem parameter to specify the memory to reserve for fadump's crash kernel, update the documentation accordingly. Link: http://lkml.kernel.org/r/149035347559.6881.14224829694291758581.stgit@hbathini.in.ibm.com Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | powerpc/fadump: reuse crashkernel parameter for fadump memory reservation | Hari Bathini | 1 | -13/+10
fadump supports specifying memory to reserve for fadump's crash kernel with the fadump_reserve_mem kernel parameter. This parameter currently supports passing a fixed memory size only, as fadump_reserve_mem=<size>. This patch adds support for other syntaxes, like the range-based memory size <range1>:<size1>[,<range2>:<size2>,<range3>:<size3>,...], which allows using the same parameter to boot the kernel with different system RAM sizes. As the crashkernel parameter already supports the above mentioned syntaxes, this patch deprecates the fadump_reserve_mem parameter and reuses the crashkernel parameter instead to specify memory for fadump's crash kernel memory reservation as well. If any offset is provided in the crashkernel parameter, it will be ignored in the case of fadump, as fadump reserves memory at the end of RAM. The advantages of using the crashkernel parameter instead of fadump_reserve_mem are one less kernel parameter overall, code reuse, and support for multiple syntaxes to specify memory. Suggested-by: Dave Young <dyoung@redhat.com> Link: http://lkml.kernel.org/r/149035346749.6881.911095631212975718.stgit@hbathini.in.ibm.com Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
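For example, a range-based specification of the kind the crashkernel parameter already understands, and which fadump can now reuse, reserves 64M when total RAM is between 512M and 2G and 128M when it is 2G or more:

  crashkernel=512M-2G:64M,2G-:128M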
2017-05-08 | powerpc/fadump: remove dependency with CONFIG_KEXEC | Hari Bathini | 5 | -37/+16
Now that crashkernel parameter parsing and vmcoreinfo-related code is moved under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE, remove the dependency on CONFIG_KEXEC for CONFIG_FA_DUMP. While here, get rid of the definitions of the fadump_append_elf_note() & fadump_final_note() functions to reuse similar functions compiled under CONFIG_CRASH_CORE. Link: http://lkml.kernel.org/r/149035343956.6881.1536459326017709354.stgit@hbathini.in.ibm.com Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | ia64: reuse append_elf_note() and final_note() functions | Hari Bathini | 5 | -70/+20
Get rid of the multiple definitions of the append_elf_note() & final_note() functions. Reuse these functions compiled under CONFIG_CRASH_CORE. Also, define Elf_Word and use it instead of the generic u32 or the more specific Elf64_Word. Link: http://lkml.kernel.org/r/149035342324.6881.11667840929850361402.stgit@hbathini.in.ibm.com Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | crash: move crashkernel parsing and vmcore related code under CONFIG_CRASH_CORE | Hari Bathini | 9 | -462/+531
Patch series "kexec/fadump: remove dependency with CONFIG_KEXEC and reuse crashkernel parameter for fadump", v4. Traditionally, kdump is used to save vmcore in case of a crash. Some architectures like powerpc can save vmcore using architecture specific support instead of kexec/kdump mechanism. Such architecture specific support also needs to reserve memory, to be used by dump capture kernel. crashkernel parameter can be a reused, for memory reservation, by such architecture specific infrastructure. This patchset removes dependency with CONFIG_KEXEC for crashkernel parameter and vmcoreinfo related code as it can be reused without kexec support. Also, crashkernel parameter is reused instead of fadump_reserve_mem to reserve memory for fadump. The first patch moves crashkernel parameter parsing and vmcoreinfo related code under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE. The second patch reuses the definitions of append_elf_note() & final_note() functions under CONFIG_CRASH_CORE in IA64 arch code. The third patch removes dependency on CONFIG_KEXEC for firmware-assisted dump (fadump) in powerpc. The next patch reuses crashkernel parameter for reserving memory for fadump, instead of the fadump_reserve_mem parameter. This has the advantage of using all syntaxes crashkernel parameter supports, for fadump as well. The last patch updates fadump kernel documentation about use of crashkernel parameter. This patch (of 5): Traditionally, kdump is used to save vmcore in case of a crash. Some architectures like powerpc can save vmcore using architecture specific support instead of kexec/kdump mechanism. Such architecture specific support also needs to reserve memory, to be used by dump capture kernel. crashkernel parameter can be a reused, for memory reservation, by such architecture specific infrastructure. But currently, code related to vmcoreinfo and parsing of crashkernel parameter is built under CONFIG_KEXEC_CORE. This patch introduces CONFIG_CRASH_CORE and moves the above mentioned code under this config, allowing code reuse without dependency on CONFIG_KEXEC. There is no functional change with this patch. Link: http://lkml.kernel.org/r/149035338104.6881.4550894432615189948.stgit@hbathini.in.ibm.com Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | cpumask: make "nr_cpumask_bits" unsigned | Alexey Dobriyan | 3 | -4/+4
Bit searching functions accept "unsigned long" indices but "nr_cpumask_bits" is "int", which is signed, so inevitable sign extensions occur on x86_64. Those MOVSX are the #1 source of MOVSX bloat by number of uses across the whole kernel. Change "nr_cpumask_bits" to unsigned; this number can't be negative after all. It allows implicit zero-extension on x86_64 without MOVSX. Change signed comparisons into unsigned comparisons where necessary. Other uses look fine because it is either an argument passed to a function or the comparison is already unsigned. Net win on an allyesconfig type of kernel: ~2.8 KB (!)

  add/remove: 0/0 grow/shrink: 8/725 up/down: 93/-2926 (-2833)
  function                       old     new   delta
  xen_exit_mmap                  691     735     +44
  qstat_read                     426     440     +14
  __cpufreq_cooling_register    1678    1687      +9
  trace_rb_cpu_prepare           447     455      +8
  vermagic                        54      60      +6
  nfp_driver_version              54      60      +6
  rcu_torture_stats_print       1147    1151      +4
  find_next_push_cpu             267     269      +2
  xen_irq_resume                 961     960      -1
  ...
  init_vp_index                  946     906     -40
  od_set_powersave_bias          328     281     -47
  power_cpu_exit                 193     139     -54
  arch_show_interrupts          3538    3484     -54
  select_idle_sibling           1558    1471     -87
  Total: Before=158358910, After=158356077, chg -0.00%

The same arguments apply to "nr_cpu_ids", but I haven't yet found enough courage to delve into this issue (and a proper fix may require a new type "cpu_t", which is a whole separate story). Link: http://lkml.kernel.org/r/20170309205322.GA1728@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
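Illustratively, the declaration side of the change is tiny (a sketch; depending on CONFIG_CPUMASK_OFFSTACK, nr_cpumask_bits is either a variable or a #define). The win comes from bit-search helpers such as find_first_bit() taking an unsigned long size, so a signed int count forces a MOVSX at each x86_64 call site, while an unsigned count zero-extends for free:

  extern unsigned int nr_cpumask_bits;    /* was effectively a signed int */

  static inline unsigned int cpumask_first(const struct cpumask *srcp)
  {
          return find_first_bit(cpumask_bits(srcp), nr_cpumask_bits);
  }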
2017-05-08 | fork: free vmapped stacks in cache when cpus are offline | Hoeun Ryu | 1 | -0/+23
With virtually mapped stacks, kernel stacks are allocated via vmalloc(). In the current implementation, two stacks per cpu can be cached when tasks are freed, and the cached stacks are used again in task duplication. But the cached stacks may remain unfreed even when the cpu is offline. Adding a cpu hotplug callback to free the cached stacks when a cpu goes offline ensures the pages of the cached stacks are not wasted. Link: http://lkml.kernel.org/r/1487076043-17802-1-git-send-email-hoeun.ryu@gmail.com Signed-off-by: Hoeun Ryu <hoeun.ryu@gmail.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Mateusz Guzik <mguzik@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
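The shape of the hotplug callback described, as a sketch close to the patch (NR_CACHED_STACKS being the two-entry per-cpu cache mentioned above):

  static int free_vm_stack_cache(unsigned int cpu)
  {
          struct vm_struct **cached_vm_stacks =
                  per_cpu_ptr(cached_stacks, cpu);
          int i;

          for (i = 0; i < NR_CACHED_STACKS; i++) {
                  struct vm_struct *vm_stack = cached_vm_stacks[i];

                  if (!vm_stack)
                          continue;

                  vfree(vm_stack->addr);
                  cached_vm_stacks[i] = NULL;
          }
          return 0;
  }

  /* registered at boot, e.g.:
   * cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "fork:vm_stack_cache",
   *                   NULL, free_vm_stack_cache);
   */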
2017-05-08 | reiserfs: use designated initializers | Kees Cook | 1 | -12/+12
Prepare to mark sensitive kernel structures for randomization by making sure they're using designated initializers. These were identified during allyesconfig builds of x86, arm, and arm64, with most initializer fixes extracted from grsecurity. Link: http://lkml.kernel.org/r/20170329210419.GA40066@beast Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
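The conversion pattern in miniature (a sketch using reiserfs's item_operations; the exact members touched by the patch vary):

  /* before (fragile under layout randomization - relies on member order):
   *   static struct item_operations ops = {
   *           direntry_bytes_number,
   *           direntry_decrement_key,
   *   };
   * after (robust - names each member explicitly): */
  static struct item_operations ops = {
          .bytes_number  = direntry_bytes_number,
          .decrement_key = direntry_decrement_key,
  };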
2017-05-08 | checkpatch: improve the SUSPECT_CODE_INDENT test | Joe Perches | 1 | -1/+3
The current SUSPECT_CODE_INDENT test does not recognize several code-style defects where code following a logical test is inappropriately indented. Before this patch, for code like:

  if (foo)
  bar();

checkpatch would not emit a warning. Improve the test to warn when code after a logical test has the same indentation as the logical test. Perform the same indentation test for "else" blocks too. Link: http://lkml.kernel.org/r/df2374b68c4a68af2b7ef08afe486584811f610a.1493683942.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | checkpatch: improve the embedded function name test for patch contexts | Joe Perches | 1 | -9/+8
The current test works only for a single patch context, as it is done in the foreach ($rawlines) loop that precedes the loop where the actual $context_function variable is used. Move the setting of $context_function into the foreach (@lines) loop, where it is useful for each patch context. Link: http://lkml.kernel.org/r/6c675a31c74fbfad4fc45b9f462303d60ca2a283.1493486091.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | checkpatch: add --typedefsfile | Jerome Forissier | 1 | -17/+35
When using checkpatch on out-of-tree code, it may occur that some project-specific types are used, which will cause spurious warnings. Add the --typedefsfile option as a way to extend the known types and deal with this issue. This was developed for OP-TEE [1]. We run a Travis job on all pull requests [2], and checkpatch is part of that. The typical false warning we get on a regular basis is with some pointers to functions returning TEE_Result [3], which is a typedef from the GlobalPlatform APIs. We consider it is acceptable to use GP types in the OP-TEE core implementation, that's why this patch would be helpful for us. [1] https://github.com/OP-TEE/optee_os [2] https://travis-ci.org/OP-TEE/optee_os/builds [3] https://travis-ci.org/OP-TEE/optee_os/builds/193355335#L1733 Link: http://lkml.kernel.org/r/ba1124d6dfa599bb0dd1d8919dd45dd09ce541a4.1492702192.git.jerome.forissier@linaro.org Signed-off-by: Jerome Forissier <jerome.forissier@linaro.org> Cc: Joe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
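Usage is along these lines (hypothetical file name; the file lists one extra type per line, e.g. the TEE_Result typedef cited above):

  $ cat optee-types.txt
  TEE_Result
  $ ./scripts/checkpatch.pl --typedefsfile=optee-types.txt my.patch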
2017-05-08 | checkpatch: improve k.alloc with multiplication and sizeof test | Joe Perches | 1 | -3/+10
Find multi-line uses of k.alloc by using the $stat variable and not the $line variable. This can still --fix only the single line variant though. Link: http://lkml.kernel.org/r/3f4b23d37cd4c7d8628eefc25afe83ba8fb3ab55.1493167076.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
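The class of code this now catches spans statements across lines, e.g. (a sketch); the overflow-checked form remains kmalloc_array()/kcalloc():

  /* now detected even when the statement is split across lines: */
  ptr = kmalloc(count *
                sizeof(struct foo), GFP_KERNEL);

  /* preferred - checks count * size for overflow: */
  ptr = kmalloc_array(count, sizeof(struct foo), GFP_KERNEL);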
2017-05-08 | checkpatch: special audit for revert commit line | Wei Wang | 1 | -0/+1
Currently checkpatch.pl does not recognize git's default commit revert message and will complain about the hash format. Add a special audit for the revert commit message line to fix it. Link: http://lkml.kernel.org/r/20170411191532.74381-1-wvw@google.com Signed-off-by: Wei Wang <wvw@google.com> Acked-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | checkpatch: clarify the EMBEDDED_FUNCTION_NAME message | Joe Perches | 1 | -5/+7
Try to make the conversion of embedded function names to "%s: ", __func__ a bit clearer. Add a bit more information to the comment describing the test too. Link: http://lkml.kernel.org/r/38f5d32f0aec1cd98cb9ceeedd6a736cc9a802db.1491759835.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
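The conversion the message describes, in miniature:

  /* embedded function name - goes stale if the function is renamed */
  pr_err("my_func: allocation failed\n");

  /* preferred */
  pr_err("%s: allocation failed\n", __func__);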
2017-05-08 | checkpatch: improve MULTISTATEMENT_MACRO_USE_DO_WHILE test | Joe Perches | 1 | -2/+4
The logic currently misses macros that start with an if statement, e.g.:

  #define foo(bar) if (bar) baz;

Add a test for macro content that starts with "if". Link: http://lkml.kernel.org/r/a9d41aafe1673889caf1a9850208fb7fd74107a0.1491783914.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Reported-by: Andreas Mohr <andi@lisas.de> Original-patch-by: Alfonso Lima <alfonsolimaastor@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
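Such a macro expands unsafely next to a caller's else branch; the usual fix is the do/while(0) form (a sketch):

  /* unsafe: an 'else' after foo(x); binds to the macro's hidden 'if' */
  #define foo(bar) if (bar) baz;

  /* safe multi-statement form */
  #define foo(bar) do { if (bar) baz; } while (0)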
2017-05-08 | checkpatch: avoid suggesting struct definitions should be const | Joe Perches | 1 | -3/+3
Many structs are generally used const, and there is a known list of these structs. But struct definitions should not generally be declared const. Add a test for the lack of an open brace immediately after the struct, to avoid matching definitions. This avoids the false positive "struct foo should normally be const" message, though only when the open brace is on the same line as the definition. Link: http://lkml.kernel.org/r/0dce709150d712e66f1b90b03827634b53b28085.1491845946.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Cc: Arthur Brainville <ybalrid@ybalrid.info> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | checkpatch: allow space leading blank lines in email headers | Joe Perches | 1 | -2/+2
Allow a leading space on an otherwise blank line in the email headers, as it can be a line-wrapped Spamassassin multiple-line string or any other valid rfc 2822/5322 email header. The line with a space causes checkpatch to erroneously think that it's in the content body, as opposed to the headers, and thus flag a mail header as an unwrapped long comment line. Link: http://lkml.kernel.org/r/d75a9f0b78b3488078429f4037d9fff3bdfa3b78.1490247180.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Reported-by: Darren Hart (VMware) <dvhart@infradead.org> Tested-by: Darren Hart (VMware) <dvhart@infradead.org> Reviewed-by: Darren Hart (VMware) <dvhart@vmware.com> Original-patch-by: John 'Warthog9' Hawley (VMware) <warthog9@eaglescrag.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | checkpatch: improve EMBEDDED_FUNCTION_NAME test | Joe Perches | 1 | -0/+11
The existing behavior relies on patch context to identify function declarations. Add the ability to find function declarations when there is an open brace in column 1. This finds function declarations only in specific single-line forms where the function name is on a single line, like:

  int foo(args...)
  {

and

  int foo(args...) {

It does not recognize function declarations like:

  int foo(int bar,
          int baz)
  {

Link: http://lkml.kernel.org/r/738d74bbbe1a06b80f11ed504818107c68903095.1488155636.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | checkpatch: add ability to find bad uses of vsprintf %p<foo> extensions | Joe Perches | 2 | -0/+29
%pK was at least once misused as %pk in an out-of-tree module. This led to some security concerns. Add the ability to track single and multiple line statements for misuses of %p<foo>. [akpm@linux-foundation.org: add helpful comment into lib/vsprintf.c] [akpm@linux-foundation.org: text tweak] Link: http://lkml.kernel.org/r/163a690510e636a23187c0dc9caa09ddac6d4cde.1488228427.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: William Roberts <william.c.roberts@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
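The defect class in miniature (a sketch; %pK is the real vsprintf extension that restricts kernel pointer output for unprivileged readers, while a lowercase %pk is simply not a recognized extension):

  pr_info("ptr=%pk\n", ptr);   /* typo: falls back to a plain pointer
                                  print, defeating the restriction */
  pr_info("ptr=%pK\n", ptr);   /* intended: restricted pointer output */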
2017-05-08 | checkpatch: remove obsolete CONFIG_EXPERIMENTAL checks | Ruslan Bilovol | 1 | -13/+0
Config EXPERIMENTAL was removed from the kernel in 2013 (see commit 3d374d09f16f: "final removal of CONFIG_EXPERIMENTAL"), so there is no reason to do these checks now. Link: http://lkml.kernel.org/r/1488234097-20119-1-git-send-email-ruslan.bilovol@gmail.com Signed-off-by: Ruslan Bilovol <ruslan.bilovol@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | firmware/Makefile: force recompilation if makefile changes | Luis R. Rodriguez | 1 | -1/+2
If you modify the target asm, we currently do not force recompilation of the firmware files. The target asm is in firmware/Makefile; peg this file as a dependency to require re-compilation of firmware targets when the asm changes. Link: http://lkml.kernel.org/r/20170123150727.4883-1-mcgrof@kernel.org Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michal Marek <mmarek@suse.com> Cc: Ming Lei <ming.lei@canonical.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Tom Gundersen <teg@jklm.no> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | lib: add module support to linked list sorting tests | Geert Uytterhoeven | 4 | -152/+155
Extract the linked-list sorting test code into its own source file, so it can be compiled either as a loadable module or built into the kernel. Link: http://lkml.kernel.org/r/1488287219-15832-4-git-send-email-geert@linux-m68k.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | lib: add module support to array-based sort tests | Geert Uytterhoeven | 1 | -3/+4
Allow the array-based sort test code to be compiled either as a loadable module or built into the kernel. Link: http://lkml.kernel.org/r/1488287219-15832-3-git-send-email-geert@linux-m68k.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | Revert "lib/test_sort.c: make it explicitly non-modular" | Geert Uytterhoeven | 1 | -6/+5
Patch series "lib: add module support to sort tests". This patch series allows to compile the array-based and linked list sort test code either to loadable modules, or builtin into the kernel. It's very valuable to have modular tests, so you can run them just by insmodding the test modules, instead of needing a separate kernel that runs them at boot. This patch (of 3): This reverts commit 8893f519330bb073a49c5b4676fce4be6f1be15d. It's very valuable to have modular tests, so you can run them just by insmodding the test modules, instead of needing a separate kernel that runs them at boot. Link: http://lkml.kernel.org/r/1488287219-15832-2-git-send-email-geert@linux-m68k.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | drivers/misc/c2port/c2port-duramar2150.c: checking for NULL instead of IS_ERR() | Dan Carpenter | 1 | -2/+2
c2port_device_register() never returns NULL, it uses error pointers. Link: http://lkml.kernel.org/r/20170412083321.GC3250@mwanda Fixes: 65131cd52b9e ("c2port: add c2port support for Eurotech Duramar 2150") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Rodolfo Giometti <giometti@linux.it> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
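The corrected pattern, per the commit description (variable and label names assumed from the driver, not verified against the exact patch):

  duramar2150_c2port_dev = c2port_device_register("duramar2150",
                                          &duramar2150_c2port_ops, NULL);
  /* c2port_device_register() returns ERR_PTR() values, never NULL */
  if (IS_ERR(duramar2150_c2port_dev)) {
          ret = PTR_ERR(duramar2150_c2port_dev);
          goto free_gpio;
  }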
2017-05-08 | drivers/misc/vmw_vmci/vmci_queue_pair.c: fix a couple integer overflow tests | Dan Carpenter | 1 | -2/+8
The "DIV_ROUND_UP(size, PAGE_SIZE)" operation can overflow if "size" is more than ULLONG_MAX - PAGE_SIZE. Link: http://lkml.kernel.org/r/20170322111950.GA11279@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Jorgen Hansen <jhansen@vmware.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | kernel/hung_task.c: defer showing held locks | Tetsuo Handa | 1 | -1/+7
When I was running my testcase, which may block hundreds of threads on fs locks, I got a lockup due to output from debug_show_all_locks() added by commit b2d4c2edb2e4 ("locking/hung_task: Show all locks"). For example, if 1000 threads were blocked in TASK_UNINTERRUPTIBLE state and 500 out of 1000 threads hold some lock, debug_show_all_locks() from the for_each_process_thread() loop will report locks held by 500 threads 1000 times. This is too much noise. In order to make sure rcu_lock_break() is called frequently, we should avoid calling debug_show_all_locks() from the for_each_process_thread() loop, because debug_show_all_locks() effectively calls for_each_process_thread() itself. Let's defer calling debug_show_all_locks() until just before panic() or before leaving the for_each_process_thread() loop. Link: http://lkml.kernel.org/r/1489296834-60436-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | make help: add tools help target | Randy Dunlap | 1 | -2/+6
Add a top-level Makefile help target for Userspace tools. Also make each help "heading" end with a colon ':'. Link: http://lkml.kernel.org/r/55c986ff-3966-3e47-2984-7349da2cce51@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp | Matthias Kaehlcke | 1 | -8/+3
jiffies_64 is defined in kernel/time/timer.c with ____cacheline_aligned_in_smp, however this macro is not part of the declaration of jiffies and jiffies_64 in jiffies.h. As a result clang generates the following warning:

  kernel/time/timer.c:57:26: error: section does not match previous declaration [-Werror,-Wsection]
  __visible u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES;
                           ^
  include/linux/cache.h:39:36: note: expanded from macro '__cacheline_aligned_in_smp'
  include/linux/cache.h:34:4: note: expanded from macro '__cacheline_aligned'
  __section__(".data..cacheline_aligned")))
  ^
  include/linux/jiffies.h:77:12: note: previous attribute is here
  extern u64 __jiffy_data jiffies_64;
             ^
  include/linux/jiffies.h:70:38: note: expanded from macro '__jiffy_data'

Link: http://lkml.kernel.org/r/20170403190200.70273-1-mka@chromium.org Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Cc: "Jason A . Donenfeld" <Jason@zx2c4.com> Cc: Grant Grundler <grundler@chromium.org> Cc: Michael Davidson <md@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
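Per the subject line, the fix is to carry the alignment attribute on the declaration too, so definition and declaration agree (a sketch; the jiffies declaration is handled similarly):

  /* include/linux/jiffies.h */
  extern u64 ____cacheline_aligned_in_smp jiffies_64;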
2017-05-08 | drivers/virt/fsl_hypervisor.c: use get_user_pages_unlocked() | Lorenzo Stoakes | 1 | -5/+2
Moving from get_user_pages() to get_user_pages_unlocked() simplifies the code and takes advantage of VM_FAULT_RETRY functionality when faulting in pages. Link: http://lkml.kernel.org/r/20161101194332.23961-1-lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Mihai Caraman <mihai.caraman@freescale.com> Cc: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
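Schematically (vaddr/nr_pages are placeholders; note the unlocked variant takes gup_flags as its last argument in this kernel):

  /* before: the caller manages mmap_sem around get_user_pages() */
  down_read(&current->mm->mmap_sem);
  ret = get_user_pages(vaddr, nr_pages, FOLL_WRITE, pages, NULL);
  up_read(&current->mm->mmap_sem);

  /* after: the helper takes the lock itself and can benefit from
   * VM_FAULT_RETRY while faulting pages in */
  ret = get_user_pages_unlocked(vaddr, nr_pages, pages, FOLL_WRITE);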
2017-05-08 | proc/sysctl: fix the int overflow for jiffies conversion | Gao Feng | 1 | -1/+1
do_proc_dointvec_jiffies_conv() uses LONG_MAX/HZ as the max value to avoid overflow. But the *valp is actually of int type, so it still causes overflow. For example:

  echo 2147483647 > ./sys/net/ipv4/tcp_keepalive_time

Then:

  cat ./sys/net/ipv4/tcp_keepalive_time

The output is "-1", which is not expected. Now use INT_MAX/HZ as the max value instead of LONG_MAX/HZ to fix it. Link: http://lkml.kernel.org/r/1490109532-9228-1-git-send-email-fgao@ikuai8.com Signed-off-by: Gao Feng <fgao@ikuai8.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
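In context, the one-constant fix looks like this (a sketch of do_proc_dointvec_jiffies_conv(); *valp is an int, so the write-side bound must be INT_MAX/HZ):

  if (write) {
          if (*lvalp > INT_MAX / HZ)      /* was: LONG_MAX / HZ */
                  return 1;
          *valp = *negp ? -(*lvalp * HZ) : (*lvalp * HZ);
  }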
2017-05-08 | fs/proc/inode.c: remove cast from memory allocation | Tobin C. Harding | 1 | -1/+1
Coccinelle emits this warning: WARNING: casting value returned by memory allocation function to (struct proc_inode *) is useless. Remove unnecessary cast. Link: http://lkml.kernel.org/r/1487745720-16967-1-git-send-email-me@tobin.cc Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | mm, compaction: finish whole pageblock to reduce fragmentation | Vlastimil Babka | 2 | -2/+35
The main goal of direct compaction is to form a high-order page for allocation, but it should also help against long-term fragmentation when possible. Most lower-than-pageblock-order compactions are for non-movable allocations, which means that if we compact in a movable pageblock and terminate as soon as we create the high-order page, it's unlikely that the fallback heuristics will claim the whole block. Instead there might be a single unmovable page in a pageblock full of movable pages, and the next unmovable allocation might pick another pageblock and increase long-term fragmentation. To help against such scenarios, this patch changes the termination criteria for compaction so that the current pageblock is finished even though the high-order page already exists. Note that it might be possible that the high-order page formed elsewhere in the zone due to parallel activity, but this patch doesn't try to detect that. This is only done with sync compaction, because async compaction is limited to pageblocks of the same migratetype, where it cannot result in a migratetype fallback. (Async compaction also eagerly skips order-aligned blocks where isolation fails, which is against the goal of migrating away as much of the pageblock as possible.) As a result of this patch, long-term memory fragmentation should be reduced. In testing based on the 4.9 kernel with stress-highalloc from mmtests configured for order-4 GFP_KERNEL allocations, this patch has reduced the number of unmovable allocations falling back to movable pageblocks by 20%. Link: http://lkml.kernel.org/r/20170307131545.28577-9-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | mm, compaction: restrict async compaction to pageblocks of same migratetype | Vlastimil Babka | 2 | -9/+22
The migrate scanner in async compaction is currently limited to MIGRATE_MOVABLE pageblocks. This is a heuristic intended to reduce latency, based on the assumption that non-MOVABLE pageblocks are unlikely to contain movable pages. However, with the exception of THPs, most high-order allocations are not movable. Should the async compaction succeed, this increases the chance that the non-MOVABLE allocations will fallback to a MOVABLE pageblock, making the long-term fragmentation worse. This patch attempts to help the situation by changing async direct compaction so that the migrate scanner only scans the pageblocks of the requested migratetype. If it's a non-MOVABLE type and there are such pageblocks that do contain movable pages, chances are that the allocation can succeed within one of such pageblocks, removing the need for a fallback. If that fails, the subsequent sync attempt will ignore this restriction. In testing based on the 4.9 kernel with stress-highalloc from mmtests configured for order-4 GFP_KERNEL allocations, this patch has reduced the number of unmovable allocations falling back to movable pageblocks by 30%. The number of movable allocations falling back is reduced by 12%. Link: http://lkml.kernel.org/r/20170307131545.28577-8-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | mm, compaction: add migratetype to compact_control | Vlastimil Babka | 2 | -8/+8
Preparation patch. We are going to need migratetype at lower layers than compact_zone() and compact_finished(). Link: http://lkml.kernel.org/r/20170307131545.28577-7-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | mm, compaction: change migrate_async_suitable() to suitable_migration_source() | Vlastimil Babka | 2 | -8/+16
Preparation for making the decisions more complex and depending on compact_control flags. No functional change. Link: http://lkml.kernel.org/r/20170307131545.28577-6-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | mm, page_alloc: count movable pages when stealing from pageblock | Vlastimil Babka | 3 | -21/+63
When stealing pages from pageblock of a different migratetype, we count how many free pages were stolen, and change the pageblock's migratetype if more than half of the pageblock was free. This might be too conservative, as there might be other pages that are not free, but were allocated with the same migratetype as our allocation requested. While we cannot determine the migratetype of allocated pages precisely (at least without the page_owner functionality enabled), we can count pages that compaction would try to isolate for migration - those are either on LRU or __PageMovable(). The rest can be assumed to be MIGRATE_RECLAIMABLE or MIGRATE_UNMOVABLE, which we cannot easily distinguish. This counting can be done as part of free page stealing with little additional overhead. The page stealing code is changed so that it considers free pages plus pages of the "good" migratetype for the decision whether to change pageblock's migratetype. The result should be more accurate migratetype of pageblocks wrt the actual pages in the pageblocks, when stealing from semi-occupied pageblocks. This should help the efficiency of page grouping by mobility. In testing based on 4.9 kernel with stress-highalloc from mmtests configured for order-4 GFP_KERNEL allocations, this patch has reduced the number of unmovable allocations falling back to movable pageblocks by 47%. The number of movable allocations falling back to other pageblocks are increased by 55%, but these events don't cause permanent fragmentation, so the tradeoff should be positive. Later patches also offset the movable fallback increase to some extent. [akpm@linux-foundation.org: merge fix] Link: http://lkml.kernel.org/r/20170307131545.28577-5-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | mm, page_alloc: split smallest stolen page in fallback | Vlastimil Babka | 1 | -25/+37
The __rmqueue_fallback() function is called when there's no free page of requested migratetype, and we need to steal from a different one. There are various heuristics to make this event infrequent and reduce permanent fragmentation. The main one is to try stealing from a pageblock that has the most free pages, and possibly steal them all at once and convert the whole pageblock. Precise searching for such pageblock would be expensive, so instead the heuristics walks the free lists from MAX_ORDER down to requested order and assumes that the block with highest-order free page is likely to also have the most free pages in total. Chances are that together with the highest-order page, we steal also pages of lower orders from the same block. But then we still split the highest order page. This is wasteful and can contribute to fragmentation instead of avoiding it. This patch thus changes __rmqueue_fallback() to just steal the page(s) and put them on the freelist of the requested migratetype, and only report whether it was successful. Then we pick (and eventually split) the smallest page with __rmqueue_smallest(). This all happens under zone lock, so nobody can steal it from us in the process. This should reduce fragmentation due to fallbacks. At worst we are only stealing a single highest-order page and waste some cycles by moving it between lists and then removing it, but fallback is not exactly hot path so that should not be a concern. As a side benefit the patch removes some duplicate code by reusing __rmqueue_smallest(). [vbabka@suse.cz: fix endless loop in the modified __rmqueue()] Link: http://lkml.kernel.org/r/59d71b35-d556-4fc9-ee2e-1574259282fd@suse.cz Link: http://lkml.kernel.org/r/20170307131545.28577-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | mm, compaction: remove redundant watermark check in compact_finished() | Vlastimil Babka | 1 | -8/+0
When detecting whether compaction has succeeded in forming a high-order page, __compact_finished() employs a watermark check, followed by its own search for a suitable page in the freelists. This is not ideal for two reasons:

- The watermark check also searches high-order freelists, but has a less strict criteria wrt fallback. It's therefore redundant and a waste of cycles. This was different in the past, when the high-order watermark check attempted to apply reserves to high-order pages.

- The watermark check might actually fail due to lack of order-0 pages. Compaction can't help with that, so there's no point in continuing because of that. It's possible that the high-order page still exists, and then compaction terminates.

This patch therefore removes the watermark check. This should save some cycles and terminate compaction sooner in some cases. Link: http://lkml.kernel.org/r/20170307131545.28577-3-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 | mm, compaction: reorder fields in struct compact_control | Vlastimil Babka | 1 | -5/+5
Patch series "try to reduce fragmenting fallbacks", v3. Last year, Johannes Weiner has reported a regression in page mobility grouping [1] and while the exact cause was not found, I've come up with some ways to improve it by reducing the number of allocations falling back to different migratetype and causing permanent fragmentation. The series was tested with mmtests stress-highalloc modified to do GFP_KERNEL order-4 allocations, on 4.9 with "mm, vmscan: fix zone balance check in prepare_kswapd_sleep" (without that, kcompactd indeed wasn't woken up) on UMA machine with 4GB memory. There were 5 repeats of each run, as the extfrag stats are quite volatile (note the stats below are sums, not averages, as it was less perl hacking for me). Success rate are the same, already high due to the low allocation order used, so I'm not including them. Compaction stats: (the patches are stacked, and I haven't measured the non-functional-changes patches separately) patch 1 patch 2 patch 3 patch 4 patch 7 patch 8 Compaction stalls 22449 24680 24846 19765 22059 17480 Compaction success 12971 14836 14608 10475 11632 8757 Compaction failures 9477 9843 10238 9290 10426 8722 Page migrate success 3109022 3370438 3312164 1695105 1608435 2111379 Page migrate failure 911588 1149065 1028264 1112675 1077251 1026367 Compaction pages isolated 7242983 8015530 7782467 4629063 4402787 5377665 Compaction migrate scanned 980838938 987367943 957690188 917647238 947155598 1018922197 Compaction free scanned 557926893 598946443 602236894 594024490 541169699 763651731 Compaction cost 10243 10578 10304 8286 8398 9440 Compaction stats are mostly within noise until patch 4, which decreases the number of compactions, and migrations. Part of that could be due to more pageblocks marked as unmovable, and async compaction skipping those. This changes a bit with patch 7, but not so much. Patch 8 increases free scanner stats and migrations, which comes from the changed termination criteria. Interestingly number of compactions decreases - probably the fully compacted pageblock satisfies multiple subsequent allocations, so it amortizes. Next comes the extfrag tracepoint, where "fragmenting" means that an allocation had to fallback to a pageblock of another migratetype which wasn't fully free (which is almost all of the fallbacks). I have locally added another tracepoint for "Page steal" into steal_suitable_fallback() which triggers in situations where we are allowed to do move_freepages_block(). If we decide to also do set_pageblock_migratetype(), it's "Pages steal with pageblock" with break down for which allocation migratetype we are stealing and from which fallback migratetype. The last part "due to counting" comes from patch 4 and counts the events where the counting of movable pages allowed us to change pageblock's migratetype, while the number of free pages alone wouldn't be enough to cross the threshold. patch 1 patch 2 patch 3 patch 4 patch 7 patch 8 Page alloc extfrag event 10155066 8522968 10164959 15622080 13727068 13140319 Extfrag fragmenting 10149231 8517025 10159040 15616925 13721391 13134792 Extfrag fragmenting for unmovable 159504 168500 184177 97835 70625 56948 Extfrag fragmenting unmovable placed with movable 153613 163549 172693 91740 64099 50917 Extfrag fragmenting unmovable placed with reclaim. 5891 4951 11484 6095 6526 6031 Extfrag fragmenting for reclaimable 4738 4829 6345 4822 5640 5378 Extfrag fragmenting reclaimable placed with movable 1836 1902 1851 1579 1739 1760 Extfrag fragmenting reclaimable placed with unmov. 
2902 2927 4494 3243 3901 3618 Extfrag fragmenting for movable 9984989 8343696 9968518 15514268 13645126 13072466 Pages steal 179954 192291 210880 123254 94545 81486 Pages steal with pageblock 22153 18943 20154 33562 29969 33444 Pages steal with pageblock for unmovable 14350 12858 13256 20660 19003 20852 Pages steal with pageblock for unmovable from mov. 12812 11402 11683 19072 17467 19298 Pages steal with pageblock for unmovable from recl. 1538 1456 1573 1588 1536 1554 Pages steal with pageblock for movable 7114 5489 5965 11787 10012 11493 Pages steal with pageblock for movable from unmov. 6885 5291 5541 11179 9525 10885 Pages steal with pageblock for movable from recl. 229 198 424 608 487 608 Pages steal with pageblock for reclaimable 689 596 933 1115 954 1099 Pages steal with pageblock for reclaimable from unmov. 273 219 537 658 547 667 Pages steal with pageblock for reclaimable from mov. 416 377 396 457 407 432 Pages steal with pageblock due to counting 11834 10075 7530 ... for unmovable 8993 7381 4616 ... for movable 2792 2653 2851 ... for reclaimable 49 41 63 What we can see is that "Extfrag fragmenting for unmovable" and "... placed with movable" drops with almost each patch, which is good as we are polluting less movable pageblocks with unmovable pages. The most significant change is patch 4 with movable page counting. On the other hand it increases "Extfrag fragmenting for movable" by 50%. "Pages steal" drops though, so these movable allocation fallbacks find only small free pages and are not allowed to steal whole pageblocks back. "Pages steal with pageblock" raises, because the patch increases the chances of pageblock migratetype changes to happen. This affects all migratetypes. The summary is that patch 4 is not a clear win wrt these stats, but I believe that the tradeoff it makes is a good one. There's less pollution of movable pageblocks by unmovable allocations. There's less stealing between pageblock, and those that remain have higher chance of changing migratetype also the pageblock itself, so it should more faithfully reflect the migratetype of the pages within the pageblock. The increase of movable allocations falling back to unmovable pageblock might look dramatic, but those allocations can be migrated by compaction when needed, and other patches in the series (7-9) improve that aspect. Patches 7 and 8 continue the trend of reduced unmovable fallbacks and also reduce the impact on movable fallbacks from patch 4. [1] https://www.spinics.net/lists/linux-mm/msg114237.html This patch (of 8): While currently there are (mostly by accident) no holes in struct compact_control (on x86_64), but we are going to add more bool flags, so place them all together to the end of the structure. While at it, just order all fields from largest to smallest. Link: http://lkml.kernel.org/r/20170307131545.28577-2-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>