summaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2017-05-08kcov: simplify interrupt checkDmitry Vyukov1-8/+1
in_interrupt() semantics are confusing and wrong for most users as it also returns true when bh is disabled. Thus we open coded a proper check for interrupts in __sanitizer_cov_trace_pc() with a lengthy explanatory comment. Use the new in_task() predicate instead. Link: http://lkml.kernel.org/r/20170321091026.139655-1-dvyukov@google.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: James Morse <james.morse@arm.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08taskstats: add e/u/stime for TGID commandZhang Xiao1-0/+14
The elapsed time, user CPU time and system CPU time for the thread group status request are presently left at zero. Fill these in. [akpm@linux-foundation.org: run ktime_get_ns() a single time] [akpm@linux-foundation.org: include linux/sched/cputime.h for task_cputime()] Link: http://lkml.kernel.org/r/1488508424-12322-1-git-send-email-xiao.zhang@windriver.com Signed-off-by: Zhang Xiao <xiao.zhang@windriver.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08pidns: expose task pid_ns_for_children to userspaceKirill Tkhai1-0/+34
pid_ns_for_children set by a task is known only to the task itself, and it's impossible to identify it from outside. It's a big problem for checkpoint/restore software like CRIU, because it can't correctly handle tasks, that do setns(CLONE_NEWPID) in proccess of their work. This patch solves the problem, and it exposes pid_ns_for_children to ns directory in standard way with the name "pid_for_children": ~# ls /proc/5531/ns -l | grep pid lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -> pid:[4026531836] lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -> pid:[4026532286] Link: http://lkml.kernel.org/r/149201123914.6007.2187327078064239572.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Andrei Vagin <avagin@virtuozzo.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08pidns: disable pid allocation if pid_ns_prepare_proc() is failed in alloc_pid()Kirill Tkhai1-1/+3
alloc_pidmap() advances pid_namespace::last_pid. When first pid allocation fails, then next created process will have pid 2 and pid_ns_prepare_proc() won't be called. So, pid_namespace::proc_mnt will never be initialized (not to mention that there won't be a child reaper). I saw crash stack of such case on kernel 3.10: BUG: unable to handle kernel NULL pointer dereference at (null) IP: proc_flush_task+0x8f/0x1b0 Call Trace: release_task+0x3f/0x490 wait_consider_task.part.10+0x7ff/0xb00 do_wait+0x11f/0x280 SyS_wait4+0x7d/0x110 We may fix this by restore of last_pid in 0 or by prohibiting of futher allocations. Since there was a similar issue in Oleg Nesterov's commit 314a8ad0f18a ("pidns: fix free_pid() to handle the first fork failure"). and it was fixed via prohibiting allocation, let's follow this way, and do the same. Link: http://lkml.kernel.org/r/149201021004.4863.6762095011554287922.stgit@localhost.localdomain Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Andrei Vagin <avagin@virtuozzo.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: Serge Hallyn <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08ia64: reuse append_elf_note() and final_note() functionsHari Bathini2-48/+14
Get rid of multiple definitions of append_elf_note() & final_note() functions. Reuse these functions compiled under CONFIG_CRASH_CORE Also, define Elf_Word and use it instead of generic u32 or the more specific Elf64_Word. Link: http://lkml.kernel.org/r/149035342324.6881.11667840929850361402.stgit@hbathini.in.ibm.com Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08crash: move crashkernel parsing and vmcore related code under CONFIG_CRASH_COREHari Bathini5-407/+456
Patch series "kexec/fadump: remove dependency with CONFIG_KEXEC and reuse crashkernel parameter for fadump", v4. Traditionally, kdump is used to save vmcore in case of a crash. Some architectures like powerpc can save vmcore using architecture specific support instead of kexec/kdump mechanism. Such architecture specific support also needs to reserve memory, to be used by dump capture kernel. crashkernel parameter can be a reused, for memory reservation, by such architecture specific infrastructure. This patchset removes dependency with CONFIG_KEXEC for crashkernel parameter and vmcoreinfo related code as it can be reused without kexec support. Also, crashkernel parameter is reused instead of fadump_reserve_mem to reserve memory for fadump. The first patch moves crashkernel parameter parsing and vmcoreinfo related code under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE. The second patch reuses the definitions of append_elf_note() & final_note() functions under CONFIG_CRASH_CORE in IA64 arch code. The third patch removes dependency on CONFIG_KEXEC for firmware-assisted dump (fadump) in powerpc. The next patch reuses crashkernel parameter for reserving memory for fadump, instead of the fadump_reserve_mem parameter. This has the advantage of using all syntaxes crashkernel parameter supports, for fadump as well. The last patch updates fadump kernel documentation about use of crashkernel parameter. This patch (of 5): Traditionally, kdump is used to save vmcore in case of a crash. Some architectures like powerpc can save vmcore using architecture specific support instead of kexec/kdump mechanism. Such architecture specific support also needs to reserve memory, to be used by dump capture kernel. crashkernel parameter can be a reused, for memory reservation, by such architecture specific infrastructure. But currently, code related to vmcoreinfo and parsing of crashkernel parameter is built under CONFIG_KEXEC_CORE. This patch introduces CONFIG_CRASH_CORE and moves the above mentioned code under this config, allowing code reuse without dependency on CONFIG_KEXEC. There is no functional change with this patch. Link: http://lkml.kernel.org/r/149035338104.6881.4550894432615189948.stgit@hbathini.in.ibm.com Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08fork: free vmapped stacks in cache when cpus are offlineHoeun Ryu1-0/+23
Using virtually mapped stack, kernel stacks are allocated via vmalloc. In the current implementation, two stacks per cpu can be cached when tasks are freed and the cached stacks are used again in task duplications. But the cached stacks may remain unfreed even when cpu are offline. By adding a cpu hotplug callback to free the cached stacks when a cpu goes offline, the pages of the cached stacks are not wasted. Link: http://lkml.kernel.org/r/1487076043-17802-1-git-send-email-hoeun.ryu@gmail.com Signed-off-by: Hoeun Ryu <hoeun.ryu@gmail.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Mateusz Guzik <mguzik@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08kernel/hung_task.c: defer showing held locksTetsuo Handa1-1/+7
When I was running my testcase which may block hundreds of threads on fs locks, I got lockup due to output from debug_show_all_locks() added by commit b2d4c2edb2e4 ("locking/hung_task: Show all locks"). For example, if 1000 threads were blocked in TASK_UNINTERRUPTIBLE state and 500 out of 1000 threads hold some lock, debug_show_all_locks() from for_each_process_thread() loop will report locks held by 500 threads for 1000 times. This is a too much noise. In order to make sure rcu_lock_break() is called frequently, we should avoid calling debug_show_all_locks() from for_each_process_thread() loop because debug_show_all_locks() effectively calls for_each_process_thread() loop. Let's defer calling debug_show_all_locks() till before panic() or leaving for_each_process_thread() loop. Link: http://lkml.kernel.org/r/1489296834-60436-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08proc/sysctl: fix the int overflow for jiffies conversionGao Feng1-1/+1
do_proc_dointvec_jiffies_conv() uses LONG_MAX/HZ as the max value to avoid overflow. But actually the *valp is int type, so it still causes overflow. For example, echo 2147483647 > ./sys/net/ipv4/tcp_keepalive_time Then, cat ./sys/net/ipv4/tcp_keepalive_time The output is "-1", it is not expected. Now use INT_MAX/HZ as the max value instead LONG_MAX/HZ to fix it. Link: http://lkml.kernel.org/r/1490109532-9228-1-git-send-email-fgao@ikuai8.com Signed-off-by: Gao Feng <fgao@ikuai8.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-05Merge tag 'powerpc-4.12-1' of ↵Linus Torvalds1-14/+18
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights include: - Larger virtual address space on 64-bit server CPUs. By default we use a 128TB virtual address space, but a process can request access to the full 512TB by passing a hint to mmap(). - Support for the new Power9 "XIVE" interrupt controller. - TLB flushing optimisations for the radix MMU on Power9. - Support for CAPI cards on Power9, using the "Coherent Accelerator Interface Architecture 2.0". - The ability to configure the mmap randomisation limits at build and runtime. - Several small fixes and cleanups to the kprobes code, as well as support for KPROBES_ON_FTRACE. - Major improvements to handling of system reset interrupts, correctly treating them as NMIs, giving them a dedicated stack and using a new hypervisor call to trigger them, all of which should aid debugging and robustness. - Many fixes and other minor enhancements. Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anshuman Khandual, Anton Blanchard, Balbir Singh, Ben Hutchings, Benjamin Herrenschmidt, Bhupesh Sharma, Chris Packham, Christian Zigotzky, Christophe Leroy, Christophe Lombard, Daniel Axtens, David Gibson, Gautham R. Shenoy, Gavin Shan, Geert Uytterhoeven, Guilherme G. Piccoli, Hamish Martin, Hari Bathini, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mahesh J Salgaonkar, Mahesh Salgaonkar, Masami Hiramatsu, Matt Brown, Matthew R. Ochs, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pan Xinhui, Paul Mackerras, Rashmica Gupta, Russell Currey, Sukadev Bhattiprolu, Thadeu Lima de Souza Cascardo, Tobin C. Harding, Tyrel Datwyler, Uma Krishnan, Vaibhav Jain, Vipin K Parashar, Yang Shi" * tag 'powerpc-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits) powerpc/64s: Power9 has no LPCR[VRMASD] field so don't set it powerpc/powernv: Fix TCE kill on NVLink2 powerpc/mm/radix: Drop support for CPUs without lockless tlbie powerpc/book3s/mce: Move add_taint() later in virtual mode powerpc/sysfs: Move #ifdef CONFIG_HOTPLUG_CPU out of the function body powerpc/smp: Document irq enable/disable after migrating IRQs powerpc/mpc52xx: Don't select user-visible RTAS_PROC powerpc/powernv: Document cxl dependency on special case in pnv_eeh_reset() powerpc/eeh: Clean up and document event handling functions powerpc/eeh: Avoid use after free in eeh_handle_special_event() cxl: Mask slice error interrupts after first occurrence cxl: Route eeh events to all drivers in cxl_pci_error_detected() cxl: Force context lock during EEH flow powerpc/64: Allow CONFIG_RELOCATABLE if COMPILE_TEST powerpc/xmon: Teach xmon oops about radix vectors powerpc/mm/hash: Fix off-by-one in comment about kernel contexts ids powerpc/pseries: Enable VFIO powerpc/powernv: Fix iommu table size calculation hook for small tables powerpc/powernv: Check kzalloc() return value in pnv_pci_table_alloc powerpc: Add arch/powerpc/tools directory ...
2017-05-05Merge branch 'for-linus' of ↵Linus Torvalds3-4/+3
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace updates from Eric Biederman: "This is a set of small fixes that were mostly stumbled over during more significant development. This proc fix and the fix to posix-timers are the most significant of the lot. There is a lot of good development going on but unfortunately it didn't quite make the merge window" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: proc: Fix unbalanced hard link numbers signal: Make kill_proc_info static rlimit: Properly call security_task_setrlimit signal: Remove unused definition of sig_user_definied ia64: Remove unused IA64_TASK_SIGHAND_OFFSET and IA64_SIGHAND_SIGLOCK_OFFSET ipc: Remove unused declaration of recompute_msgmni posix-timers: Correct sanity check in posix_cpu_nsleep sysctl: Remove dead register_sysctl_root
2017-05-03Merge tag 'modules-for-v4.12' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull modules updates from Jessica Yu: - Minor code cleanups - Fix section alignment for .init_array * tag 'modules-for-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: kallsyms: Use bounded strnchr() when parsing string module: Unify the return value type of try_module_get module: set .init_array alignment to 8
2017-05-03Merge tag 'trace-v4.12' of ↵Linus Torvalds12-585/+1303
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "New features for this release: - Pretty much a full rewrite of the processing of function plugins. i.e. echo do_IRQ:stacktrace > set_ftrace_filter - The rewrite was needed to add plugins to be unique to tracing instances. i.e. mkdir instance/foo; cd instances/foo; echo do_IRQ:stacktrace > set_ftrace_filter The old way was written very hacky. This removes a lot of those hacks. - New "function-fork" tracing option. When set, pids in the set_ftrace_pid will have their children added when the processes with their pids listed in the set_ftrace_pid file forks. - Exposure of "maxactive" for kretprobe in kprobe_events - Allow for builtin init functions to be traced by the function tracer (via the kernel command line). Module init function tracing will come in the next release. - Added more selftests, and have selftests also test in an instance" * tag 'trace-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (60 commits) ring-buffer: Return reader page back into existing ring buffer selftests: ftrace: Allow some event trigger tests to run in an instance selftests: ftrace: Have some basic tests run in a tracing instance too selftests: ftrace: Have event tests also run in an tracing instance selftests: ftrace: Make func_event_triggers and func_traceonoff_triggers tests do instances selftests: ftrace: Allow some tests to be run in a tracing instance tracing/ftrace: Allow for instances to trigger their own stacktrace probes tracing/ftrace: Allow for the traceonoff probe be unique to instances tracing/ftrace: Enable snapshot function trigger to work with instances tracing/ftrace: Allow instances to have their own function probes tracing/ftrace: Add a better way to pass data via the probe functions ftrace: Dynamically create the probe ftrace_ops for the trace_array tracing: Pass the trace_array into ftrace_probe_ops functions tracing: Have the trace_array hold the list of registered func probes ftrace: If the hash for a probe fails to update then free what was initialized ftrace: Have the function probes call their own function ftrace: Have each function probe use its own ftrace_ops ftrace: Have unregister_ftrace_function_probe_func() return a value ftrace: Add helper function ftrace_hash_move_and_update_ops() ftrace: Remove data field from ftrace_func_probe structure ...
2017-05-03Merge branch 'for-linus' of ↵Linus Torvalds1-25/+56
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - There is a situation when early console is not deregistered because the preferred one matches a wrong entry. It caused messages to appear twice. This is the 2nd attempt to fix it. The first one was wrong, see the commit c6c7d83b9c9e ('Revert "console: don't prefer first registered if DT specifies stdout-path"'). The fix is coupled with some small code clean up. Well, the console registration code would deserve a big one. We need to think about it. - Do not lose information about the preemtive context when the console semaphore is re-taken. - Do not block CPU hotplug when someone else is already pushing messages to the console. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: printk: fix double printing with earlycon printk: rename selected_console -> preferred_console printk: fix name/type/scope of preferred_console var printk: Correctly handle preemption in console_unlock() printk: use console_trylock() in console_cpu_notify()
2017-05-03Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-2/+9
Merge misc updates from Andrew Morton: - a few misc things - most of MM - KASAN updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits) kasan: separate report parts by empty lines kasan: improve double-free report format kasan: print page description after stacks kasan: improve slab object description kasan: change report header kasan: simplify address description logic kasan: change allocation and freeing stack traces headers kasan: unify report headers kasan: introduce helper functions for determining bug type mm: hwpoison: call shake_page() after try_to_unmap() for mlocked page mm: hwpoison: call shake_page() unconditionally mm/swapfile.c: fix swap space leak in error path of swap_free_entries() mm/gup.c: fix access_ok() argument type mm/truncate: avoid pointless cleancache_invalidate_inode() calls. mm/truncate: bail out early from invalidate_inode_pages2_range() if mapping is empty fs/block_dev: always invalidate cleancache in invalidate_bdev() fs: fix data invalidation in the cleancache during direct IO zram: reduce load operation in page_same_filled zram: use zram_free_page instead of open-coded zram: introduce zram data accessor ...
2017-05-03mm: introduce memalloc_nofs_{save,restore} APIMichal Hocko1-3/+3
GFP_NOFS context is used for the following 5 reasons currently: - to prevent from deadlocks when the lock held by the allocation context would be needed during the memory reclaim - to prevent from stack overflows during the reclaim because the allocation is performed from a deep context already - to prevent lockups when the allocation context depends on other reclaimers to make a forward progress indirectly - just in case because this would be safe from the fs POV - silence lockdep false positives Unfortunately overuse of this allocation context brings some problems to the MM. Memory reclaim is much weaker (especially during heavy FS metadata workloads), OOM killer cannot be invoked because the MM layer doesn't have enough information about how much memory is freeable by the FS layer. In many cases it is far from clear why the weaker context is even used and so it might be used unnecessarily. We would like to get rid of those as much as possible. One way to do that is to use the flag in scopes rather than isolated cases. Such a scope is declared when really necessary, tracked per task and all the allocation requests from within the context will simply inherit the GFP_NOFS semantic. Not only this is easier to understand and maintain because there are much less problematic contexts than specific allocation requests, this also helps code paths where FS layer interacts with other layers (e.g. crypto, security modules, MM etc...) and there is no easy way to convey the allocation context between the layers. Introduce memalloc_nofs_{save,restore} API to control the scope of GFP_NOFS allocation context. This is basically copying memalloc_noio_{save,restore} API we have for other restricted allocation context GFP_NOIO. The PF_MEMALLOC_NOFS flag already exists and it is just an alias for PF_FSTRANS which has been xfs specific until recently. There are no more PF_FSTRANS users anymore so let's just drop it. PF_MEMALLOC_NOFS is now checked in the MM layer and drops __GFP_FS implicitly same as PF_MEMALLOC_NOIO drops __GFP_IO. memalloc_noio_flags is renamed to current_gfp_context because it now cares about both PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO contexts. Xfs code paths preserve their semantic. kmem_flags_convert() doesn't need to evaluate the flag anymore. This patch shouldn't introduce any functional changes. Let's hope that filesystems will drop direct GFP_NOFS (resp. ~__GFP_FS) usage as much as possible and only use a properly documented memalloc_nofs_{save,restore} checkpoints where they are appropriate. [akpm@linux-foundation.org: fix comment typo, reflow comment] Link: http://lkml.kernel.org/r/20170306131408.9828-5-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Brian Foster <bfoster@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Nikolay Borisov <nborisov@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03lockdep: allow to disable reclaim lockup detectionMichal Hocko1-0/+4
The current implementation of the reclaim lockup detection can lead to false positives and those even happen and usually lead to tweak the code to silence the lockdep by using GFP_NOFS even though the context can use __GFP_FS just fine. See http://lkml.kernel.org/r/20160512080321.GA18496@dastard as an example. ================================= [ INFO: inconsistent lock state ] 4.5.0-rc2+ #4 Tainted: G O --------------------------------- inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage. kswapd0/543 [HC0[0]:SC0[0]:HE1:SE1] takes: (&xfs_nondir_ilock_class){++++-+}, at: xfs_ilock+0x177/0x200 [xfs] {RECLAIM_FS-ON-R} state was registered at: mark_held_locks+0x79/0xa0 lockdep_trace_alloc+0xb3/0x100 kmem_cache_alloc+0x33/0x230 kmem_zone_alloc+0x81/0x120 [xfs] xfs_refcountbt_init_cursor+0x3e/0xa0 [xfs] __xfs_refcount_find_shared+0x75/0x580 [xfs] xfs_refcount_find_shared+0x84/0xb0 [xfs] xfs_getbmap+0x608/0x8c0 [xfs] xfs_vn_fiemap+0xab/0xc0 [xfs] do_vfs_ioctl+0x498/0x670 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x12/0x6f CPU0 ---- lock(&xfs_nondir_ilock_class); <Interrupt> lock(&xfs_nondir_ilock_class); *** DEADLOCK *** 3 locks held by kswapd0/543: stack backtrace: CPU: 0 PID: 543 Comm: kswapd0 Tainted: G O 4.5.0-rc2+ #4 Call Trace: lock_acquire+0xd8/0x1e0 down_write_nested+0x5e/0xc0 xfs_ilock+0x177/0x200 [xfs] xfs_reflink_cancel_cow_range+0x150/0x300 [xfs] xfs_fs_evict_inode+0xdc/0x1e0 [xfs] evict+0xc5/0x190 dispose_list+0x39/0x60 prune_icache_sb+0x4b/0x60 super_cache_scan+0x14f/0x1a0 shrink_slab.part.63.constprop.79+0x1e9/0x4e0 shrink_zone+0x15e/0x170 kswapd+0x4f1/0xa80 kthread+0xf2/0x110 ret_from_fork+0x3f/0x70 To quote Dave: "Ignoring whether reflink should be doing anything or not, that's a "xfs_refcountbt_init_cursor() gets called both outside and inside transactions" lockdep false positive case. The problem here is lockdep has seen this allocation from within a transaction, hence a GFP_NOFS allocation, and now it's seeing it in a GFP_KERNEL context. Also note that we have an active reference to this inode. So, because the reclaim annotations overload the interrupt level detections and it's seen the inode ilock been taken in reclaim ("interrupt") context, this triggers a reclaim context warning where it thinks it is unsafe to do this allocation in GFP_KERNEL context holding the inode ilock..." This sounds like a fundamental problem of the reclaim lock detection. It is really impossible to annotate such a special usecase IMHO unless the reclaim lockup detection is reworked completely. Until then it is much better to provide a way to add "I know what I am doing flag" and mark problematic places. This would prevent from abusing GFP_NOFS flag which has a runtime effect even on configurations which have lockdep disabled. Introduce __GFP_NOLOCKDEP flag which tells the lockdep gfp tracking to skip the current allocation request. While we are at it also make sure that the radix tree doesn't accidentaly override tags stored in the upper part of the gfp_mask. Link: http://lkml.kernel.org/r/20170306131408.9828-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Brian Foster <bfoster@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03lockdep: teach lockdep about memalloc_noio_saveNikolay Borisov1-1/+4
Patch series "scope GFP_NOFS api", v5. This patch (of 7): Commit 21caf2fc1931 ("mm: teach mm by current context info to not do I/O during memory allocation") added the memalloc_noio_(save|restore) functions to enable people to modify the MM behavior by disabling I/O during memory allocation. This was further extended in commit 934f3072c17c ("mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set"). memalloc_noio_* functions prevent allocation paths recursing back into the filesystem without explicitly changing the flags for every allocation site. However, lockdep hasn't been keeping up with the changes and it entirely misses handling the memalloc_noio adjustments. Instead, it is left to the callers of __lockdep_trace_alloc to call the function after they have shaven the respective GFP flags which can lead to false positives: ================================= [ INFO: inconsistent lock state ] 4.10.0-nbor #134 Not tainted --------------------------------- inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage. fsstress/3365 [HC0[0]:SC0[0]:HE1:SE1] takes: (&xfs_nondir_ilock_class){++++?.}, at: xfs_ilock+0x141/0x230 {IN-RECLAIM_FS-W} state was registered at: __lock_acquire+0x62a/0x17c0 lock_acquire+0xc5/0x220 down_write_nested+0x4f/0x90 xfs_ilock+0x141/0x230 xfs_reclaim_inode+0x12a/0x320 xfs_reclaim_inodes_ag+0x2c8/0x4e0 xfs_reclaim_inodes_nr+0x33/0x40 xfs_fs_free_cached_objects+0x19/0x20 super_cache_scan+0x191/0x1a0 shrink_slab+0x26f/0x5f0 shrink_node+0xf9/0x2f0 kswapd+0x356/0x920 kthread+0x10c/0x140 ret_from_fork+0x31/0x40 irq event stamp: 173777 hardirqs last enabled at (173777): __local_bh_enable_ip+0x70/0xc0 hardirqs last disabled at (173775): __local_bh_enable_ip+0x37/0xc0 softirqs last enabled at (173776): _xfs_buf_find+0x67a/0xb70 softirqs last disabled at (173774): _xfs_buf_find+0x5db/0xb70 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&xfs_nondir_ilock_class); <Interrupt> lock(&xfs_nondir_ilock_class); *** DEADLOCK *** 4 locks held by fsstress/3365: #0: (sb_writers#10){++++++}, at: mnt_want_write+0x24/0x50 #1: (&sb->s_type->i_mutex_key#12){++++++}, at: vfs_setxattr+0x6f/0xb0 #2: (sb_internal#2){++++++}, at: xfs_trans_alloc+0xfc/0x140 #3: (&xfs_nondir_ilock_class){++++?.}, at: xfs_ilock+0x141/0x230 stack backtrace: CPU: 0 PID: 3365 Comm: fsstress Not tainted 4.10.0-nbor #134 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Call Trace: kmem_cache_alloc_node_trace+0x3a/0x2c0 vm_map_ram+0x2a1/0x510 _xfs_buf_map_pages+0x77/0x140 xfs_buf_get_map+0x185/0x2a0 xfs_attr_rmtval_set+0x233/0x430 xfs_attr_leaf_addname+0x2d2/0x500 xfs_attr_set+0x214/0x420 xfs_xattr_set+0x59/0xb0 __vfs_setxattr+0x76/0xa0 __vfs_setxattr_noperm+0x5e/0xf0 vfs_setxattr+0xae/0xb0 setxattr+0x15e/0x1a0 path_setxattr+0x8f/0xc0 SyS_lsetxattr+0x11/0x20 entry_SYSCALL_64_fastpath+0x23/0xc6 Let's fix this by making lockdep explicitly do the shaving of respective GFP flags. Fixes: 934f3072c17c ("mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set") Link: http://lkml.kernel.org/r/20170306131408.9828-2-mhocko@kernel.org Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Brian Foster <bfoster@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-03Merge tag 'drm-for-v4.12' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds1-0/+2
Pull drm u pdates from Dave Airlie: "This is the main drm pull request for v4.12. Apart from two fixes pulls, everything should have been in drm-next for at least 2 weeks. The biggest thing in here is AMD released the public headers for their upcoming VEGA GPUs. These as always are quite a sizeable chunk of header files. They've also added initial non-display support for those GPUs, though they aren't available in production yet. Otherwise it's pretty much normal. New bridge drivers: - megachips-stdpxxxx-ge-b850v3-fw LVDS->DP++ - generic LVDS bridge support. Core: - Displayport link train failure reporting to userspace - debugfs interface cleaned up - subsystem TODO in kerneldoc now - Extended fbdev support (flipping and vblank wait) - drm_platform removed - EDP CRC support in helper - HF-VSDB SCDC support in EDID parser - Lots of code cleanups and header extraction - Thunderbolt external GPU awareness - Atomic helper improvements - Documentation improvements panel: - Sitronix and Samsung new panel support amdgpu: - Preliminary vega10 support - Multi-level page table support - GPU sensor support for userspace - PRT support for sparse buffers - SR-IOV improvements - Non-contig VRAM CPU mapping i915: - Atomic modesetting enabled by default on Gen5+ - LSPCON improvements - Atomic state handling for cdclk - GPU reset improvements - In-kernel unit tests - Geminilake improvements and color manager support - Designware i2c fixes - vblank evasion improvements - Hotplug safe connector iterators - GVT scheduler QoS support - GVT Kabylake support nouveau: - Acceleration support for Pascal (GP10x). - Rearchitecture of code handling proprietary signed firmware - Fix GTX 970 with odd MMU configuration - GP10B support - GP107 acceleration support vmwgfx: - Atomic modesetting support for vmwgfx omapdrm: - Support for render nodes - Refactor omapdss code - Fix some probe ordering issues - Fix too dark RGB565 rendering sunxi: - prelim rework for multiple pipes. mali-dp: - Color management support - Plane scaling - Power management improvements imx-drm: - Prefetch Resolve Engine/Gasket on i.MX6QP - Deferred plane disabling - Separate alpha support mediatek: - Mediatek SoC MT2701 support rcar-du: - Gen3 HDMI support msm: - 4k support for newer chips - OPP bindings for gpu - prep work for per-process pagetables vc4: - HDMI audio support - fixes qxl: - minor fixes. dw-hdmi: - PHY improvements - CSC fixes - Amlogic GX SoC support" * tag 'drm-for-v4.12' of git://people.freedesktop.org/~airlied/linux: (1778 commits) drm/nouveau/fb/gf100-: Fix 32 bit wraparound in new ram detection drm/nouveau/secboot/gm20b: fix the error return code in gm20b_secboot_tegra_read_wpr() drm/nouveau/kms: Increase max retries in scanout position queries. drm/nouveau/bios/bitP: check that table is long enough for optional pointers drm/nouveau/fifo/nv40: no ctxsw for pre-nv44 mpeg engine drm: mali-dp: use div_u64 for expensive 64-bit divisions drm/i915: Confirm the request is still active before adding it to the await drm/i915: Avoid busy-spinning on VLV_GLTC_PW_STATUS mmio drm/i915/selftests: Allocate inode/file dynamically drm/i915: Fix system hang with EI UP masked on Haswell drm/i915: checking for NULL instead of IS_ERR() in mock selftests drm/i915: Perform link quality check unconditionally during long pulse drm/i915: Fix use after free in lpe_audio_platdev_destroy() drm/i915: Use the right mapping_gfp_mask for final shmem allocation drm/i915: Make legacy cursor updates more unsynced drm/i915: Apply a cond_resched() to the saturated signaler drm/i915: Park the signaler before sleeping drm: mali-dp: Check the mclk rate and allow up/down scaling drm: mali-dp: Enable image enhancement when scaling drm: mali-dp: Add plane upscaling support ...
2017-05-03Merge branch 'fsnotify' of ↵Linus Torvalds4-32/+71
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "The branch contains mainly a rework of fsnotify infrastructure fixing a shortcoming that we have waited for response to fanotify permission events with SRCU read lock held and when the process consuming events was slow to respond the kernel has stalled. It also contains several cleanups of unnecessary indirections in fsnotify framework and a bugfix from Amir fixing leakage of kernel internal errno to userspace" * 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (37 commits) fanotify: don't expose EOPENSTALE to userspace fsnotify: remove a stray unlock fsnotify: Move ->free_mark callback to fsnotify_ops fsnotify: Add group pointer in fsnotify_init_mark() fsnotify: Drop inode_mark.c fsnotify: Remove fsnotify_find_{inode|vfsmount}_mark() fsnotify: Remove fsnotify_detach_group_marks() fsnotify: Rename fsnotify_clear_marks_by_group_flags() fsnotify: Inline fsnotify_clear_{inode|vfsmount}_mark_group() fsnotify: Remove fsnotify_recalc_{inode|vfsmount}_mask() fsnotify: Remove fsnotify_set_mark_{,ignored_}mask_locked() fanotify: Release SRCU lock when waiting for userspace response fsnotify: Pass fsnotify_iter_info into handle_event handler fsnotify: Provide framework for dropping SRCU lock in ->handle_event fsnotify: Remove special handling of mark destruction on group shutdown fsnotify: Detach mark from object list when last reference is dropped fsnotify: Move queueing of mark for destruction into fsnotify_put_mark() inotify: Do not drop mark reference under idr_lock fsnotify: Free fsnotify_mark_connector when there is no mark attached fsnotify: Lock object list with connector lock ...
2017-05-03Merge branch 'stable-4.12' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds8-173/+201
Pull audit updates from Paul Moore: "Fourteen audit patches for v4.12 that span the full range of fixes, new features, and internal cleanups. We have a patches to move to 64-bit timestamps, convert refcounts from atomic_t to refcount_t, track PIDs using the pid struct instead of pid_t, convert our own private audit buffer cache to a standard kmem_cache, log kernel module names when they are unloaded, and normalize the NETFILTER_PKT to make the userspace folks happier. From a fixes perspective, the most important is likely the auditd connection tracking RCU fix; it was a rather brain dead bug that I'll take the blame for, but thankfully it didn't seem to affect many people (only one report). I think the patch subject lines and commit descriptions do a pretty good job of explaining the details and why the changes are important so I'll point you there instead of duplicating it here; as usual, if you have any questions you know where to find us. We also manage to take out more code than we put in this time, that always makes me happy :)" * 'stable-4.12' of git://git.infradead.org/users/pcmoore/audit: audit: fix the RCU locking for the auditd_connection structure audit: use kmem_cache to manage the audit_buffer cache audit: Use timespec64 to represent audit timestamps audit: store the auditd PID as a pid struct instead of pid_t audit: kernel generated netlink traffic should have a portid of 0 audit: combine audit_receive() and audit_receive_skb() audit: convert audit_watch.count from atomic_t to refcount_t audit: convert audit_tree.count from atomic_t to refcount_t audit: normalize NETFILTER_PKT netfilter: use consistent ipv4 network offset in xt_AUDIT audit: log module name on delete_module audit: remove unnecessary semicolon in audit_watch_handle_event() audit: remove unnecessary semicolon in audit_mark_handle_event() audit: remove unnecessary semicolon in audit_field_valid()
2017-05-03Merge branch 'next' of ↵Linus Torvalds2-13/+24
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "Highlights: IMA: - provide ">" and "<" operators for fowner/uid/euid rules KEYS: - add a system blacklist keyring - add KEYCTL_RESTRICT_KEYRING, exposes keyring link restriction functionality to userland via keyctl() LSM: - harden LSM API with __ro_after_init - add prlmit security hook, implement for SELinux - revive security_task_alloc hook TPM: - implement contextual TPM command 'spaces'" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (98 commits) tpm: Fix reference count to main device tpm_tis: convert to using locality callbacks tpm: fix handling of the TPM 2.0 event logs tpm_crb: remove a cruft constant keys: select CONFIG_CRYPTO when selecting DH / KDF apparmor: Make path_max parameter readonly apparmor: fix parameters so that the permission test is bypassed at boot apparmor: fix invalid reference to index variable of iterator line 836 apparmor: use SHASH_DESC_ON_STACK security/apparmor/lsm.c: set debug messages apparmor: fix boolreturn.cocci warnings Smack: Use GFP_KERNEL for smk_netlbl_mls(). smack: fix double free in smack_parse_opts_str() KEYS: add SP800-56A KDF support for DH KEYS: Keyring asymmetric key restrict method with chaining KEYS: Restrict asymmetric key linkage using a specific keychain KEYS: Add a lookup_restriction function for the asymmetric key type KEYS: Add KEYCTL_RESTRICT_KEYRING KEYS: Consistent ordering for __key_link_begin and restrict check KEYS: Add an optional lookup_restriction hook to key_type ...
2017-05-02Merge branch 'for-linus' of ↵Linus Torvalds3-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: tty: fix comment for __tty_alloc_driver() init/main: properly align the multi-line comment init/main: Fix double "the" in comment Fix dead URLs to ftp.kernel.org drivers: Clean up duplicated email address treewide: Fix typo in xml/driver-api/basics.xml tools/testing/selftests/powerpc: remove redundant CFLAGS in Makefile: "-Wall -O2 -Wall" -> "-O2 -Wall" selftests/timers: Spelling s/privledges/privileges/ HID: picoLCD: Spelling s/REPORT_WRTIE_MEMORY/REPORT_WRITE_MEMORY/ net: phy: dp83848: Fix Typo UBI: Fix typos Documentation: ftrace.txt: Correct nice value of 120 priority net: fec: Fix typo in error msg and comment treewide: Fix typos in printk
2017-05-02Merge branch 'for-linus' of ↵Linus Torvalds10-281/+1068
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatch updates from Jiri Kosina: - a per-task consistency model is being added for architectures that support reliable stack dumping (extending this, currently rather trivial set, is currently in the works). This extends the nature of the types of patches that can be applied by live patching infrastructure. The code stems from the design proposal made [1] back in November 2014. It's a hybrid of SUSE's kGraft and RH's kpatch, combining advantages of both: it uses kGraft's per-task consistency and syscall barrier switching combined with kpatch's stack trace switching. There are also a number of fallback options which make it quite flexible. Most of the heavy lifting done by Josh Poimboeuf with help from Miroslav Benes and Petr Mladek [1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.cz - module load time patch optimization from Zhou Chengming - a few assorted small fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: add missing printk newlines livepatch: Cancel transition a safe way for immediate patches livepatch: Reduce the time of finding module symbols livepatch: make klp_mutex proper part of API livepatch: allow removal of a disabled patch livepatch: add /proc/<pid>/patch_state livepatch: change to a per-task consistency model livepatch: store function sizes livepatch: use kstrtobool() in enabled_store() livepatch: move patching functions into patch.c livepatch: remove unnecessary object loaded check livepatch: separate enabled and patched states livepatch/s390: add TIF_PATCH_PENDING thread flag livepatch/s390: reorganize TIF thread flag bits livepatch/powerpc: add TIF_PATCH_PENDING thread flag livepatch/x86: add TIF_PATCH_PENDING thread flag livepatch: create temporary klp_update_patch_state() stub x86/entry: define _TIF_ALLWORK_MASK flags explicitly stacktrace/x86: add function for detecting reliable stack traces
2017-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds14-282/+625
Pull networking updates from David Millar: "Here are some highlights from the 2065 networking commits that happened this development cycle: 1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri) 2) Add a generic XDP driver, so that anyone can test XDP even if they lack a networking device whose driver has explicit XDP support (me). 3) Sparc64 now has an eBPF JIT too (me) 4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei Starovoitov) 5) Make netfitler network namespace teardown less expensive (Florian Westphal) 6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana) 7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger) 8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky) 9) Multiqueue support in stmmac driver (Joao Pinto) 10) Remove TCP timewait recycling, it never really could possibly work well in the real world and timestamp randomization really zaps any hint of usability this feature had (Soheil Hassas Yeganeh) 11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay Aleksandrov) 12) Add socket busy poll support to epoll (Sridhar Samudrala) 13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso, and several others) 14) IPSEC hw offload infrastructure (Steffen Klassert)" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits) tipc: refactor function tipc_sk_recv_stream() tipc: refactor function tipc_sk_recvmsg() net: thunderx: Optimize page recycling for XDP net: thunderx: Support for XDP header adjustment net: thunderx: Add support for XDP_TX net: thunderx: Add support for XDP_DROP net: thunderx: Add basic XDP support net: thunderx: Cleanup receive buffer allocation net: thunderx: Optimize CQE_TX handling net: thunderx: Optimize RBDR descriptor handling net: thunderx: Support for page recycling ipx: call ipxitf_put() in ioctl error path net: sched: add helpers to handle extended actions qed*: Fix issues in the ptp filter config implementation. qede: Fix concurrency issue in PTP Tx path processing. stmmac: Add support for SIMATIC IOT2000 platform net: hns: fix ethtool_get_strings overflow in hns driver tcp: fix wraparound issue in tcp_lp bpf, arm64: fix jit branch offset related to ldimm64 bpf, arm64: implement jiting of BPF_XADD ...
2017-05-02Merge branch 'linus' of ↵Linus Torvalds1-10/+5
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "Here is the crypto update for 4.12: API: - Add batch registration for acomp/scomp - Change acomp testing to non-unique compressed result - Extend algorithm name limit to 128 bytes - Require setkey before accept(2) in algif_aead Algorithms: - Add support for deflate rfc1950 (zlib) Drivers: - Add accelerated crct10dif for powerpc - Add crc32 in stm32 - Add sha384/sha512 in ccp - Add 3des/gcm(aes) for v5 devices in ccp - Add Queue Interface (QI) backend support in caam - Add new Exynos RNG driver - Add ThunderX ZIP driver - Add driver for hardware random generator on MT7623 SoC" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (101 commits) crypto: stm32 - Fix OF module alias information crypto: algif_aead - Require setkey before accept(2) crypto: scomp - add support for deflate rfc1950 (zlib) crypto: scomp - allow registration of multiple scomps crypto: ccp - Change ISR handler method for a v5 CCP crypto: ccp - Change ISR handler method for a v3 CCP crypto: crypto4xx - rename ce_ring_contol to ce_ring_control crypto: testmgr - Allow ecb(cipher_null) in FIPS mode Revert "crypto: arm64/sha - Add constant operand modifier to ASM_EXPORT" crypto: ccp - Disable interrupts early on unload crypto: ccp - Use only the relevant interrupt bits hwrng: mtk - Add driver for hardware random generator on MT7623 SoC dt-bindings: hwrng: Add Mediatek hardware random generator bindings crypto: crct10dif-vpmsum - Fix missing preempt_disable() crypto: testmgr - replace compression known answer test crypto: acomp - allow registration of multiple acomps hwrng: n2 - Use devm_kcalloc() in n2rng_probe() crypto: chcr - Fix error handling related to 'chcr_alloc_shash' padata: get_next is never NULL crypto: exynos - Add new Exynos RNG driver ...
2017-05-02Merge branch 'work.splice' of ↵Linus Torvalds2-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull splice updates from Al Viro: "These actually missed the last cycle; the branch itself is from last December" * 'work.splice' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: make nr_pages calculation in default_file_splice_read() a bit less ugly splice/tee/vmsplice: validate flags splice_pipe_desc: kill ->flags remove spd_release_page()
2017-05-02audit: fix the RCU locking for the auditd_connection structurePaul Moore1-57/+100
Cong Wang correctly pointed out that the RCU read locking of the auditd_connection struct was wrong, this patch correct this by adopting a more traditional, and correct RCU locking model. This patch is heavily based on an earlier prototype by Cong Wang. Cc: <stable@vger.kernel.org> # 4.11.x- Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-02audit: use kmem_cache to manage the audit_buffer cachePaul Moore1-49/+17
The audit subsystem implemented its own buffer cache mechanism which is a bit silly these days when we could use the kmem_cache construct. Some credit is due to Florian Westphal for originally proposing that we remove the audit cache implementation in favor of simple kmalloc()/kfree() calls, but I would rather have a dedicated slab cache to ease debugging and future stats/performance work. Cc: Florian Westphal <fw@strlen.de> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-02audit: Use timespec64 to represent audit timestampsDeepa Dinamani3-9/+9
struct timespec is not y2038 safe. Audit timestamps are recorded in string format into an audit buffer for a given context. These mark the entry timestamps for the syscalls. Use y2038 safe struct timespec64 to represent the times. The log strings can handle this transition as strings can hold upto 1024 characters. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-02audit: store the auditd PID as a pid struct instead of pid_tPaul Moore2-28/+58
This is arguably the right thing to do, and will make it easier when we start supporting multiple audit daemons in different namespaces. Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-02audit: kernel generated netlink traffic should have a portid of 0Paul Moore3-27/+13
We were setting the portid incorrectly in the netlink message headers, fix that to always be 0 (nlmsg_pid = 0). Signed-off-by: Paul Moore <paul@paul-moore.com> Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
2017-05-02audit: combine audit_receive() and audit_receive_skb()Paul Moore1-11/+8
There is no reason to have both of these functions, combine the two. Signed-off-by: Paul Moore <paul@paul-moore.com> Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
2017-05-02audit: convert audit_watch.count from atomic_t to refcount_tElena Reshetova1-4/+5
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> [PM: fix subject line, add #include] Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-02audit: convert audit_tree.count from atomic_t to refcount_tElena Reshetova1-4/+5
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> [PM: fix subject line, add #include] Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-02audit: log module name on delete_moduleRichard Guy Briggs1-0/+2
When a sysadmin wishes to monitor module unloading with a syscall rule such as: -a always,exit -F arch=x86_64 -S delete_module -F key=mod-unload the SYSCALL record doesn't tell us what module was requested for unloading. Use the new KERN_MODULE auxiliary record to record it. The SYSCALL record result code will list the return code. See: https://github.com/linux-audit/audit-kernel/issues/37 https://github.com/linux-audit/audit-kernel/issues/7 https://github.com/linux-audit/audit-kernel/wiki/RFE-Module-Load-Record-Format Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Acked-by: Jessica Yu <jeyu@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-02audit: remove unnecessary semicolon in audit_watch_handle_event()Nicholas Mc Guire1-1/+1
The excess ; after the closing parenthesis is just code-noise it has no and can be removed. Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> [PM: tweaked subject line] Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-02audit: remove unnecessary semicolon in audit_mark_handle_event()Nicholas Mc Guire1-1/+1
The excess ; after the closing parenthesis is just code-noise it has no and can be removed. Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> [PM: tweaked subject line] Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-02audit: remove unnecessary semicolon in audit_field_valid()Nicholas Mc Guire1-2/+2
The excess ; after the closing parenthesis is just code-noise it has no and can be removed. Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> [PM: tweak subject line] Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-05-01Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds1-13/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "The main x86 MM changes in this cycle were: - continued native kernel PCID support preparation patches to the TLB flushing code (Andy Lutomirski) - various fixes related to 32-bit compat syscall returning address over 4Gb in applications, launched from 64-bit binaries - motivated by C/R frameworks such as Virtuozzo. (Dmitry Safonov) - continued Intel 5-level paging enablement: in particular the conversion of x86 GUP to the generic GUP code. (Kirill A. Shutemov) - x86/mpx ABI corner case fixes/enhancements (Joerg Roedel) - ... plus misc updates, fixes and cleanups" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits) mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash x86/mm: Fix flush_tlb_page() on Xen x86/mm: Make flush_tlb_mm_range() more predictable x86/mm: Remove flush_tlb() and flush_tlb_current_task() x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly() x86/mm/64: Fix crash in remove_pagetable() Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation" x86/boot/e820: Remove a redundant self assignment x86/mm: Fix dump pagetables for 4 levels of page tables x86/mpx, selftests: Only check bounds-vs-shadow when we keep shadow x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space Revert "x86/mm/numa: Remove numa_nodemask_from_meminfo()" x86/espfix: Add support for 5-level paging x86/kasan: Extend KASAN to support 5-level paging x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=y x86/paravirt: Add 5-level support to the paravirt code x86/mm: Define virtual memory map for 5-level paging x86/asm: Remove __VIRTUAL_MASK_SHIFT==47 assert x86/boot: Detect 5-level paging support x86/mm/numa: Remove numa_nodemask_from_meminfo() ...
2017-05-01Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds1-52/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot updates from Ingo Molnar: "The biggest changes in this cycle were: - reworking of the e820 code: separate in-kernel and boot-ABI data structures and apply a whole range of cleanups to the kernel side. No change in functionality. - enable KASLR by default: it's used by all major distros and it's out of the experimental stage as well. - ... misc fixes and cleanups" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits) x86/KASLR: Fix kexec kernel boot crash when KASLR randomization fails x86/reboot: Turn off KVM when halting a CPU x86/boot: Fix BSS corruption/overwrite bug in early x86 kernel startup x86: Enable KASLR by default boot/param: Move next_arg() function to lib/cmdline.c for later reuse x86/boot: Fix Sparse warning by including required header file x86/boot/64: Rename start_cpu() x86/xen: Update e820 table handling to the new core x86 E820 code x86/boot: Fix pr_debug() API braindamage xen, x86/headers: Add <linux/device.h> dependency to <asm/xen/page.h> x86/boot/e820: Simplify e820__update_table() x86/boot/e820: Separate the E820 ABI structures from the in-kernel structures x86/boot/e820: Fix and clean up e820_type switch() statements x86/boot/e820: Rename the remaining E820 APIs to the e820__*() prefix x86/boot/e820: Remove unnecessary #include's x86/boot/e820: Rename e820_mark_nosave_regions() to e820__register_nosave_regions() x86/boot/e820: Rename e820_reserve_resources*() to e820__reserve_resources*() x86/boot/e820: Use bool in query APIs x86/boot/e820: Document e820__reserve_setup_data() x86/boot/e820: Clean up __e820__update_table() et al ...
2017-05-01Merge branch 'perf-core-for-linus' of ↵Linus Torvalds8-28/+208
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main changes in this cycle were: Kernel side changes: - Kprobes and uprobes changes: - Make their trampolines read-only while they are used - Make UPROBES_EVENTS default-y which is the distro practice - Apply misc fixes and robustization to probe point insertion. - add support for AMD IOMMU events - extend hw events on Intel Goldmont CPUs - ... plus misc fixes and updates. Tooling side changes: - support s390 jump instructions in perf annotate (Christian Borntraeger) - vendor hardware events updates (Andi Kleen) - add argument support for SDT events in powerpc (Ravi Bangoria) - beautify the statx syscall arguments in 'perf trace' (Arnaldo Carvalho de Melo) - handle inline functions in callchains (Jin Yao) - enable sorting by srcline as key (Milian Wolff) - add 'brstackinsn' field in 'perf script' to reuse the x86 instruction decoder used in the Intel PT code to study hot paths to samples (Andi Kleen) - add PERF_RECORD_NAMESPACES so that the kernel can record information required to associate samples to namespaces, helping in container problem characterization. (Hari Bathini) - allow sorting by symbol_size in 'perf report' and 'perf top' (Charles Baylis) - in perf stat, make system wide (-a) the default option if no target was specified and one of following conditions is met: - no workload specified (current behaviour) - a workload is specified but all requested events are system wide ones, like uncore ones. (Jiri Olsa) - ... plus lots of other updates, enhancements, cleanups and fixes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (235 commits) perf tools: Fix the code to strip command name tools arch x86: Sync cpufeatures.h tools arch: Sync arch/x86/lib/memcpy_64.S with the kernel tools: Update asm-generic/mman-common.h copy from the kernel perf tools: Use just forward declarations for struct thread where possible perf tools: Add the right header to obtain PERF_ALIGN() perf tools: Remove poll.h and wait.h from util.h perf tools: Remove string.h, unistd.h and sys/stat.h from util.h perf tools: Remove stale prototypes from builtin.h perf tools: Remove string.h from util.h perf tools: Remove sys/ioctl.h from util.h perf tools: Remove a few more needless includes from util.h perf tools: Include sys/param.h where needed perf callchain: Move callchain specific routines from util.[ch] perf tools: Add compress.h for the *_decompress_to_file() headers perf mem: Fix display of data source snoop indication perf debug: Move dump_stack() and sighandler_dump_stack() to debug.h perf kvm: Make function only used by 'perf kvm' static perf tools: Move timestamp routines from util.h to time-utils.h perf tools: Move units conversion/formatting routines to separate object ...
2017-05-01Merge branch 'locking-core-for-linus' of ↵Linus Torvalds13-485/+852
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in this cycle were: - a big round of FUTEX_UNLOCK_PI improvements, fixes, cleanups and general restructuring - lockdep updates such as new checks for lock_downgrade() - introduce the new atomic_try_cmpxchg() locking API and use it to optimize refcount code generation - ... plus misc fixes, updates and cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) MAINTAINERS: Add FUTEX SUBSYSTEM futex: Clarify mark_wake_futex memory barrier usage futex: Fix small (and harmless looking) inconsistencies futex: Avoid freeing an active timer rtmutex: Plug preempt count leak in rt_mutex_futex_unlock() rtmutex: Fix more prio comparisons rtmutex: Fix PI chain order integrity sched,tracing: Update trace_sched_pi_setprio() sched/rtmutex: Refactor rt_mutex_setprio() rtmutex: Clean up sched/deadline/rtmutex: Dont miss the dl_runtime/dl_period update sched/rtmutex/deadline: Fix a PI crash for deadline tasks rtmutex: Deboost before waking up the top waiter locking/ww-mutex: Limit stress test to 2 seconds locking/atomic: Fix atomic_try_cmpxchg() semantics lockdep: Fix per-cpu static objects futex: Drop hb->lock before enqueueing on the rtmutex futex: Futex_unlock_pi() determinism futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock() futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock() ...
2017-05-01Merge branch 'sched-core-for-linus' of ↵Linus Torvalds8-292/+518
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - another round of rq-clock handling debugging, robustization and fixes - PELT accounting improvements - CPU hotplug related ->cpus_allowed affinity handling fixes all around the tree - ... plus misc fixes, cleanups and updates" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits) sched/x86: Update reschedule warning text crypto: N2 - Replace racy task affinity logic cpufreq/sparc-us2e: Replace racy task affinity logic cpufreq/sparc-us3: Replace racy task affinity logic cpufreq/sh: Replace racy task affinity logic cpufreq/ia64: Replace racy task affinity logic ACPI/processor: Replace racy task affinity logic ACPI/processor: Fix error handling in __acpi_processor_start() sparc/sysfs: Replace racy task affinity logic powerpc/smp: Replace open coded task affinity logic ia64/sn/hwperf: Replace racy task affinity logic ia64/salinfo: Replace racy task affinity logic workqueue: Provide work_on_cpu_safe() ia64/topology: Remove cpus_allowed manipulation sched/fair: Move the PELT constants into a generated header sched/fair: Increase PELT accuracy for small tasks sched/fair: Fix comments sched/Documentation: Add 'sched-pelt' tool sched/fair: Fix corner case in __accumulate_sum() sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags() ...
2017-05-01Merge branch 'timers-core-for-linus' of ↵Linus Torvalds14-117/+161
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The timer departement delivers: - more year 2038 rework - a massive rework of the arm achitected timer - preparatory patches to allow NTP correction of clock event devices to avoid early expiry - the usual pile of fixes and enhancements all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (91 commits) timer/sysclt: Restrict timer migration sysctl values to 0 and 1 arm64/arch_timer: Mark errata handlers as __maybe_unused Clocksource/mips-gic: Remove redundant non devicetree init MIPS/Malta: Probe gic-timer via devicetree clocksource: Use GENMASK_ULL in definition of CLOCKSOURCE_MASK acpi/arm64: Add SBSA Generic Watchdog support in GTDT driver clocksource: arm_arch_timer: add GTDT support for memory-mapped timer acpi/arm64: Add memory-mapped timer support in GTDT driver clocksource: arm_arch_timer: simplify ACPI support code. acpi/arm64: Add GTDT table parse driver clocksource: arm_arch_timer: split MMIO timer probing. clocksource: arm_arch_timer: add structs to describe MMIO timer clocksource: arm_arch_timer: move arch_timer_needs_of_probing into DT init call clocksource: arm_arch_timer: refactor arch_timer_needs_probing clocksource: arm_arch_timer: split dt-only rate handling x86/uv/time: Set ->min_delta_ticks and ->max_delta_ticks unicore32/time: Set ->min_delta_ticks and ->max_delta_ticks um/time: Set ->min_delta_ticks and ->max_delta_ticks tile/time: Set ->min_delta_ticks and ->max_delta_ticks score/time: Set ->min_delta_ticks and ->max_delta_ticks ...
2017-05-01Merge branch 'irq-core-for-linus' of ↵Linus Torvalds2-3/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Nothing exciting from the irq side for this merge window: - a new driver for a Mediatek SoC - ACPI support for ARM GICV3 - support for shared nested interrupts - the usual pile of fixes and updates all over te place" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits) irqchip/mbigen: Fix return value check in mbigen_device_probe() irqchip/mips-gic: Replace static map with dynamic irqchip/mips-gic: Remove device IRQ domain irqchip/mips-gic: Separate IPI reservation & usage tracking genirq: Use irqd_get_trigger_type to compare the trigger type for shared IRQs genirq: Use cpumask_available() for check of cpumask variable cpumask: Add helper cpumask_available() irqchip/irq-imx-gpcv2: Clear OF_POPULATED flag irqchip/atmel-aic5: Handle suspend to RAM irqchip: Add Mediatek mtk-cirq driver dt-bindings: mtk-cirq: Add binding document irqchip/gic-v3-its: Add IORT hook for platform MSI support irqchip/mbigen: Add ACPI support irqchip/mbigen: Introduce mbigen_of_create_domain() irqchip/mbigen: Drop module owner platform-msi: Make platform_msi_create_device_domain() ACPI aware irqchip/gicv3-its: platform-msi: Scan MADT to create platform msi domain irqchip/gicv3-its: platform-msi: Refactor its_pmsi_init() to prepare for ACPI irqchip/gicv3-its: platform-msi: Refactor its_pmsi_prepare() irqchip/gic-v3-its: Keep the include header files in alphabetic order ...
2017-05-01Merge branch 'work.uaccess' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess unification updates from Al Viro: "This is the uaccess unification pile. It's _not_ the end of uaccess work, but the next batch of that will go into the next cycle. This one mostly takes copy_from_user() and friends out of arch/* and gets the zero-padding behaviour in sync for all architectures. Dealing with the nocache/writethrough mess is for the next cycle; fortunately, that's x86-only. Same for cleanups in iov_iter.c (I am sold on access_ok() in there, BTW; just not in this pile), same for reducing __copy_... callsites, strn*... stuff, etc. - there will be a pile about as large as this one in the next merge window. This one sat in -next for weeks. -3KLoC" * 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (96 commits) HAVE_ARCH_HARDENED_USERCOPY is unconditional now CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now m32r: switch to RAW_COPY_USER hexagon: switch to RAW_COPY_USER microblaze: switch to RAW_COPY_USER get rid of padding, switch to RAW_COPY_USER ia64: get rid of copy_in_user() ia64: sanitize __access_ok() ia64: get rid of 'segment' argument of __do_{get,put}_user() ia64: get rid of 'segment' argument of __{get,put}_user_check() ia64: add extable.h powerpc: get rid of zeroing, switch to RAW_COPY_USER esas2r: don't open-code memdup_user() alpha: fix stack smashing in old_adjtimex(2) don't open-code kernel_setsockopt() mips: switch to RAW_COPY_USER mips: get rid of tail-zeroing in primitives mips: make copy_from_user() zero tail explicitly mips: clean and reorder the forest of macros... mips: consolidate __invoke_... wrappers ...
2017-05-01Merge tag 'pm-4.12-rc1' of ↵Linus Torvalds2-28/+66
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "This time the majority of changes go to the cpufreq subsystem (and to the intel_pstate driver in particular) and there are some updates in the generic power domains framework, cpuidle, tools and a couple of other places. One thing worth mentioning is that the intel_pstate's sysfs interface has been reworked to be more consistent with the general expectations of the cpufreq core and less confusing, hopefully for the better. Also, we have a new cpufreq driver for Tegra186 and new hardware support in intel_pstata and the Mediatek cpufreq driver. Apart from that, the AnalyzeSuspend utility for system suspend profiling gets a companion called AnalyzeBoot for the analogous profiling of system boot and they both go into one place under tools/power/pm-graph/. The rest is mostly fixes, cleanups and code reorganization. Specifics: - Rework the intel_pstate driver's sysfs interface to make it more straightforward and more intuitive (Rafael Wysocki). - Make intel_pstate support all processors which advertise HWP (hardware-managed P-states) to the kernel in all operation modes and make it use the load-based P-state selection algorithm on a wider range of systems in the active mode (Rafael Wysocki). - Add cpufreq driver for Tegra186 (Mikko Perttunen). - Add support for Gemini Lake SoCs to intel_pstate (David Box). - Add support for MT8176 and MT817x to the Mediatek cpufreq driver and clean up that driver a bit (Daniel Kurtz). - Clean up intel_pstate and optimize it slightly (Rafael Wysocki). - Update the schedutil cpufreq governor, mostly to fix a couple of issues with it related to specific workloads, and rework its sysfs tunable and initialization a bit (Rafael Wysocki, Viresh Kumar). - Fix minor issues in the imx6q, dbx500 and qoriq cpufreq drivers (Christophe Jaillet, Irina Tirdea, Leonard Crestez, Viresh Kumar, YuanTian Tang). - Add file patterns for cpufreq DT bindings to MAINTAINERS (Geert Uytterhoeven). - Add support for "always on" power domains to the genpd (generic power domains) framework and clean up that code somewhat (Ulf Hansson, Lina Iyer, Viresh Kumar). - Fix minor issues in the powernv cpuidle driver and clean it up (Anton Blanchard, Gautham Shenoy). - Move the AnalyzeSuspend utility under tools/power/pm-graph/ and add an analogous boot-profiling utility called AnalyzeBoot to it (Todd Brandt). - Add rk3328 support to the rockchip-io AVS (Adaptive Voltage Scaling) driver (David Wu). - Fix minor issues in the cpuidle core, the intel_pstate_tracer utility, the devfreq framework and the PM core documentation (Chanwoo Choi, Doug Smythies, Johan Hovold, Marcin Nowakowski)" * tag 'pm-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (56 commits) PM / runtime: Document autosuspend-helper side effects PM / runtime: Fix autosuspend documentation tools: power: pm-graph: Package makefile and man pages tools: power: pm-graph: AnalyzeBoot v2.0 tools: power: pm-graph: AnalyzeSuspend v4.6 cpufreq: Add Tegra186 cpufreq driver cpufreq: imx6q: Fix error handling code cpufreq: imx6q: Set max suspend_freq to avoid changes during suspend cpufreq: imx6q: Fix handling EPROBE_DEFER from regulator cpuidle: powernv: Avoid a branch in the core snooze_loop() loop cpuidle: powernv: Don't continually set thread priority in snooze_loop() cpuidle: powernv: Don't bounce between low and very low thread priority cpuidle: cpuidle-cps: remove unused variable tools/power/x86/intel_pstate_tracer: Adjust directory ownership cpufreq: schedutil: Use policy-dependent transition delays cpufreq: schedutil: Reduce frequencies slower PM / devfreq: Move struct devfreq_governor to devfreq directory PM / Domains: Ignore domain-idle-states that are not compatible cpufreq: intel_pstate: Add support for Gemini Lake powernv-cpuidle: Validate DT property array size ...
2017-05-01Merge branch 'for-4.12' of ↵Linus Torvalds6-31/+53
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "Nothing major. Two notable fixes are Li's second stab at fixing the long-standing race condition in the mount path and suppression of spurious warning from cgroup_get(). All other changes are trivial" * 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: mark cgroup_get() with __maybe_unused cgroup: avoid attaching a cgroup root to two different superblocks, take 2 cgroup: fix spurious warnings on cgroup_is_dead() from cgroup_sk_alloc() cgroup: move cgroup_subsys_state parent field for cache locality cpuset: Remove cpuset_update_active_cpus()'s parameter. cgroup: switch to BUG_ON() cgroup: drop duplicate header nsproxy.h kernel: convert css_set.refcount from atomic_t to refcount_t kernel: convert cgroup_namespace.count from atomic_t to refcount_t
2017-05-01Merge branch 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds1-3/+2
Pull workqueue update from Tejun Heo: "One trivial patch to use setup_deferrable_timer() instead of open-coding the initialization" * 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: use setup_deferrable_timer