summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2016-03-17radix-tree: add an explicit include of bitops.hMatthew Wilcox1-0/+1
The radix-tree header uses the __ffs() function, which is defined in bitops.h. The current kernel headers implicitly include bitops.h, but the userspace test harness does not. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17include/linux/list_bl.h: use bool instead of int for boolean functionsChen Gang1-2/+2
hlist_bl_unhashed() and hlist_bl_empty() are all boolean functions, so return bool instead of int. Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17include/uapi/linux/elf-em.h: remove v850Rob Landley1-3/+0
The v850 port was removed by commits f606ddf42fd4 and 07a887d399b8 in 2008. These #defines are not used in the current kernel. Signed-off-by: Rob Landley <rob@landley.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17fix Christoph's email addressesChristoph Lameter1-1/+1
There are various email addresses for me throughout the kernel. Use the one that will always be valid. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17bug: set warn variable before calling WARN()Steven Rostedt1-9/+12
This has hit me a couple of times already. I would be debugging code and the system would simply hang and then reboot. Finally, I found that the problem was caused by WARN_ON_ONCE() and friends. The macro WARN_ON_ONCE(condition) is defined as: static bool __section(.data.unlikely) __warned; int __ret_warn_once = !!(condition); if (unlikely(__ret_warn_once)) if (WARN_ON(!__warned)) __warned = true; unlikely(__ret_warn_once); Which looks great and all. But what I have hit, is an issue when WARN_ON() itself hits the same WARN_ON_ONCE() code. Because, the variable __warned is not yet set. Then it too calls WARN_ON() and that triggers the warning again. It keeps doing this until the stack is overflowed and the system crashes. By setting __warned first before calling WARN_ON() makes the original WARN_ON_ONCE() really only warn once, and not an infinite amount of times if the WARN_ON() also triggers the warning. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17timer: convert timer_slack_ns from unsigned long to u64John Stultz4-9/+11
This patchset introduces a /proc/<pid>/timerslack_ns interface which would allow controlling processes to be able to set the timerslack value on other processes in order to save power by avoiding wakeups (Something Android currently does via out-of-tree patches). The first patch tries to fix the internal timer_slack_ns usage which was defined as a long, which limits the slack range to ~4 seconds on 32bit systems. It converts it to a u64, which provides the same basically unlimited slack (500 years) on both 32bit and 64bit machines. The second patch introduces the /proc/<pid>/timerslack_ns interface which allows the full 64bit slack range for a task to be read or set on both 32bit and 64bit machines. With these two patches, on a 32bit machine, after setting the slack on bash to 10 seconds: $ time sleep 1 real 0m10.747s user 0m0.001s sys 0m0.005s The first patch is a little ugly, since I had to chase the slack delta arguments through a number of functions converting them to u64s. Let me know if it makes sense to break that up more or not. Other than that things are fairly straightforward. This patch (of 2): The timer_slack_ns value in the task struct is currently a unsigned long. This means that on 32bit applications, the maximum slack is just over 4 seconds. However, on 64bit machines, its much much larger (~500 years). This disparity could make application development a little (as well as the default_slack) to a u64. This means both 32bit and 64bit systems have the same effective internal slack range. Now the existing ABI via PR_GET_TIMERSLACK and PR_SET_TIMERSLACK specify the interface as a unsigned long, so we preserve that limitation on 32bit systems, where SET_TIMERSLACK can only set the slack to a unsigned long value, and GET_TIMERSLACK will return ULONG_MAX if the slack is actually larger then what can be stored by an unsigned long. This patch also modifies hrtimer functions which specified the slack delta as a unsigned long. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Oren Laadan <orenl@cellrox.com> Cc: Ruchi Kandoi <kandoiruchi@google.com> Cc: Rom Lemarchand <romlem@android.com> Cc: Kees Cook <keescook@chromium.org> Cc: Android Kernel Team <kernel-team@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkersKirill A. Shutemov1-4/+6
freeze_page() and unfreeze_page() helpers evolved in rather complex beasts. It would be nice to cut complexity of this code. This patch rewrites freeze_page() using standard try_to_unmap(). unfreeze_page() is rewritten with remove_migration_ptes(). The result is much simpler. But the new variant is somewhat slower for PTE-mapped THPs. Current helpers iterates over VMAs the compound page is mapped to, and then over ptes within this VMA. New helpers iterates over small page, then over VMA the small page mapped to, and only then find relevant pte. We have short cut for PMD-mapped THP: we directly install migration entries on PMD split. I don't think the slowdown is critical, considering how much simpler result is and that split_huge_page() is quite rare nowadays. It only happens due memory pressure or migration. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17mm: make remove_migration_ptes() beyond mm/migration.cKirill A. Shutemov1-0/+2
Make remove_migration_ptes() available to be used in split_huge_page(). New parameter 'locked' added: as with try_to_umap() we need a way to indicate that caller holds rmap lock. We also shouldn't try to mlock() pte-mapped huge pages: pte-mapeed THP pages are never mlocked. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17rmap: extend try_to_unmap() to be usable by split_huge_page()Kirill A. Shutemov2-0/+10
Add support for two ttu_flags: - TTU_SPLIT_HUGE_PMD would split PMD if it's there, before trying to unmap page; - TTU_RMAP_LOCKED indicates that caller holds relevant rmap lock; Also, change rwc->done to !page_mapcount() instead of !page_mapped(). try_to_unmap() works on pte level, so we are really interested in the mappedness of this small page rather than of the compound page it's a part of. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17rmap: introduce rmap_walk_locked()Kirill A. Shutemov1-0/+1
This patchset rewrites freeze_page() and unfreeze_page() using try_to_unmap() and remove_migration_ptes(). Result is much simpler, but somewhat slower. Migration 8GiB worth of PMD-mapped THP: Baseline 20.21 +/- 0.393 Patched 20.73 +/- 0.082 Slowdown 1.03x It's 3% slower, comparing to 14% in v1. I don't it should be a stopper. Splitting of PTE-mapped pages slowed more. But this is not a common case. Migration 8GiB worth of PMD-mapped THP: Baseline 20.39 +/- 0.225 Patched 22.43 +/- 0.496 Slowdown 1.10x rmap_walk_locked() is the same as rmap_walk(), but the caller takes care of the relevant rmap lock. This is preparation for switching THP splitting from custom rmap walk in freeze_page()/unfreeze_page() to the generic one. There is no support for KSM pages for now: not clear which lock is implied. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17mm: remove VM_FAULT_MINORJan Kara1-2/+0
The define has a comment from Nick Piggin from 2007: /* For backwards compat. Remove me quickly. */ I guess 9 years should not be too hurried sense of 'quickly' even for kernel measures. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17mm: exclude ZONE_DEVICE from GFP_ZONE_TABLEDan Williams2-13/+22
ZONE_DEVICE (merged in 4.3) and ZONE_CMA (proposed) are examples of new mm zones that are bumping up against the current maximum limit of 4 zones, i.e. 2 bits in page->flags for the GFP_ZONE_TABLE. The GFP_ZONE_TABLE poses an interesting constraint since include/linux/gfp.h gets included by the 32-bit portion of a 64-bit build. We need to be careful to only build the table for zones that have a corresponding gfp_t flag. GFP_ZONES_SHIFT is introduced for this purpose. This patch does not attempt to solve the problem of adding a new zone that also has a corresponding GFP_ flag. Vlastimil points out that ZONE_DEVICE, by depending on x86_64 and SPARSEMEM_VMEMMAP implies that SECTIONS_WIDTH is zero. In other words even though ZONE_DEVICE does not fit in GFP_ZONE_TABLE it is free to consume another bit in page->flags (expand ZONES_WIDTH) with room to spare. Link: https://bugzilla.kernel.org/show_bug.cgi?id=110931 Fixes: 033fbae988fc ("mm: ZONE_DEVICE for "device memory"") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: Mark <markk@clara.co.uk> Reported-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17mm/page_ref: add tracepoint to track down page reference manipulationJoonsoo Kim2-5/+227
CMA allocation should be guaranteed to succeed by definition, but, unfortunately, it would be failed sometimes. It is hard to track down the problem, because it is related to page reference manipulation and we don't have any facility to analyze it. This patch adds tracepoints to track down page reference manipulation. With it, we can find exact reason of failure and can fix the problem. Following is an example of tracepoint output. (note: this example is stale version that printing flags as the number. Recent version will print it as human readable string.) <...>-9018 [004] 92.678375: page_ref_set: pfn=0x17ac9 flags=0x0 count=1 mapcount=0 mapping=(nil) mt=4 val=1 <...>-9018 [004] 92.678378: kernel_stack: => get_page_from_freelist (ffffffff81176659) => __alloc_pages_nodemask (ffffffff81176d22) => alloc_pages_vma (ffffffff811bf675) => handle_mm_fault (ffffffff8119e693) => __do_page_fault (ffffffff810631ea) => trace_do_page_fault (ffffffff81063543) => do_async_page_fault (ffffffff8105c40a) => async_page_fault (ffffffff817581d8) [snip] <...>-9018 [004] 92.678379: page_ref_mod: pfn=0x17ac9 flags=0x40048 count=2 mapcount=1 mapping=0xffff880015a78dc1 mt=4 val=1 [snip] ... ... <...>-9131 [001] 93.174468: test_pages_isolated: start_pfn=0x17800 end_pfn=0x17c00 fin_pfn=0x17ac9 ret=fail [snip] <...>-9018 [004] 93.174843: page_ref_mod_and_test: pfn=0x17ac9 flags=0x40068 count=0 mapcount=0 mapping=0xffff880015a78dc1 mt=4 val=-1 ret=1 => release_pages (ffffffff8117c9e4) => free_pages_and_swap_cache (ffffffff811b0697) => tlb_flush_mmu_free (ffffffff81199616) => tlb_finish_mmu (ffffffff8119a62c) => exit_mmap (ffffffff811a53f7) => mmput (ffffffff81073f47) => do_exit (ffffffff810794e9) => do_group_exit (ffffffff81079def) => SyS_exit_group (ffffffff81079e74) => entry_SYSCALL_64_fastpath (ffffffff817560b6) This output shows that problem comes from exit path. In exit path, to improve performance, pages are not freed immediately. They are gathered and processed by batch. During this process, migration cannot be possible and CMA allocation is failed. This problem is hard to find without this page reference tracepoint facility. Enabling this feature bloat kernel text 30 KB in my configuration. text data bss dec hex filename 12127327 2243616 1507328 15878271 f2487f vmlinux_disabled 12157208 2258880 1507328 15923416 f2f8d8 vmlinux_enabled Note that, due to header file dependency problem between mm.h and tracepoint.h, this feature has to open code the static key functions for tracepoints. Proposed by Steven Rostedt in following link. https://lkml.org/lkml/2015/12/9/699 [arnd@arndb.de: crypto/async_pq: use __free_page() instead of put_page()] [iamjoonsoo.kim@lge.com: fix build failure for xtensa] [akpm@linux-foundation.org: tweak Kconfig text, per Vlastimil] Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17mm: introduce page reference manipulation functionsJoonsoo Kim3-35/+94
The success of CMA allocation largely depends on the success of migration and key factor of it is page reference count. Until now, page reference is manipulated by direct calling atomic functions so we cannot follow up who and where manipulate it. Then, it is hard to find actual reason of CMA allocation failure. CMA allocation should be guaranteed to succeed so finding offending place is really important. In this patch, call sites where page reference is manipulated are converted to introduced wrapper function. This is preparation step to add tracepoint to each page reference manipulation function. With this facility, we can easily find reason of CMA allocation failure. There is no functional change in this patch. In addition, this patch also converts reference read sites. It will help a second step that renames page._count to something else and prevents later attempt to direct access to it (Suggested by Andrew). Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17mm: thp: set THP defrag by default to madvise and add a stall-free defrag optionMel Gorman2-8/+3
THP defrag is enabled by default to direct reclaim/compact but not wake kswapd in the event of a THP allocation failure. The problem is that THP allocation requests potentially enter reclaim/compaction. This potentially incurs a severe stall that is not guaranteed to be offset by reduced TLB misses. While there has been considerable effort to reduce the impact of reclaim/compaction, it is still a high cost and workloads that should fit in memory fail to do so. Specifically, a simple anon/file streaming workload will enter direct reclaim on NUMA at least even though the working set size is 80% of RAM. It's been years and it's time to throw in the towel. First, this patch defines THP defrag as follows; madvise: A failed allocation will direct reclaim/compact if the application requests it never: Neither reclaim/compact nor wake kswapd defer: A failed allocation will wake kswapd/kcompactd always: A failed allocation will direct reclaim/compact (historical behaviour) khugepaged defrag will enter direct/reclaim but not wake kswapd. Next it sets the default defrag option to be "madvise" to only enter direct reclaim/compaction for applications that specifically requested it. Lastly, it removes a check from the page allocator slowpath that is related to __GFP_THISNODE to allow "defer" to work. The callers that really cares are slub/slab and they are updated accordingly. The slab one may be surprising because it also corrects a comment as kswapd was never woken up by that path. This means that a THP fault will no longer stall for most applications by default and the ideal for most users that get THP if they are immediately available. There are still options for users that prefer a stall at startup of a new application by either restoring historical behaviour with "always" or pick a half-way point with "defer" where kswapd does some of the work in the background and wakes kcompactd if necessary. THP defrag for khugepaged remains enabled and will enter direct/reclaim but no wakeup kswapd or kcompactd. After this patch a THP allocation failure will quickly fallback and rely on khugepaged to recover the situation at some time in the future. In some cases, this will reduce THP usage but the benefit of THP is hard to measure and not a universal win where as a stall to reclaim/compaction is definitely measurable and can be painful. The first test for this is using "usemem" to read a large file and write a large anonymous mapping (to avoid the zero page) multiple times. The total size of the mappings is 80% of RAM and the benchmark simply measures how long it takes to complete. It uses multiple threads to see if that is a factor. On UMA, the performance is almost identical so is not reported but on NUMA, we see this usemem 4.4.0 4.4.0 kcompactd-v1r1 nodefrag-v1r3 Amean System-1 102.86 ( 0.00%) 46.81 ( 54.50%) Amean System-4 37.85 ( 0.00%) 34.02 ( 10.12%) Amean System-7 48.12 ( 0.00%) 46.89 ( 2.56%) Amean System-12 51.98 ( 0.00%) 56.96 ( -9.57%) Amean System-21 80.16 ( 0.00%) 79.05 ( 1.39%) Amean System-30 110.71 ( 0.00%) 107.17 ( 3.20%) Amean System-48 127.98 ( 0.00%) 124.83 ( 2.46%) Amean Elapsd-1 185.84 ( 0.00%) 105.51 ( 43.23%) Amean Elapsd-4 26.19 ( 0.00%) 25.58 ( 2.33%) Amean Elapsd-7 21.65 ( 0.00%) 21.62 ( 0.16%) Amean Elapsd-12 18.58 ( 0.00%) 17.94 ( 3.43%) Amean Elapsd-21 17.53 ( 0.00%) 16.60 ( 5.33%) Amean Elapsd-30 17.45 ( 0.00%) 17.13 ( 1.84%) Amean Elapsd-48 15.40 ( 0.00%) 15.27 ( 0.82%) For a single thread, the benchmark completes 43.23% faster with this patch applied with smaller benefits as the thread increases. Similar, notice the large reduction in most cases in system CPU usage. The overall CPU time is 4.4.0 4.4.0 kcompactd-v1r1 nodefrag-v1r3 User 10357.65 10438.33 System 3988.88 3543.94 Elapsed 2203.01 1634.41 Which is substantial. Now, the reclaim figures 4.4.0 4.4.0 kcompactd-v1r1nodefrag-v1r3 Minor Faults 128458477 278352931 Major Faults 2174976 225 Swap Ins 16904701 0 Swap Outs 17359627 0 Allocation stalls 43611 0 DMA allocs 0 0 DMA32 allocs 19832646 19448017 Normal allocs 614488453 580941839 Movable allocs 0 0 Direct pages scanned 24163800 0 Kswapd pages scanned 0 0 Kswapd pages reclaimed 0 0 Direct pages reclaimed 20691346 0 Compaction stalls 42263 0 Compaction success 938 0 Compaction failures 41325 0 This patch eliminates almost all swapping and direct reclaim activity. There is still overhead but it's from NUMA balancing which does not identify that it's pointless trying to do anything with this workload. I also tried the thpscale benchmark which forces a corner case where compaction can be used heavily and measures the latency of whether base or huge pages were used thpscale Fault Latencies 4.4.0 4.4.0 kcompactd-v1r1 nodefrag-v1r3 Amean fault-base-1 5288.84 ( 0.00%) 2817.12 ( 46.73%) Amean fault-base-3 6365.53 ( 0.00%) 3499.11 ( 45.03%) Amean fault-base-5 6526.19 ( 0.00%) 4363.06 ( 33.15%) Amean fault-base-7 7142.25 ( 0.00%) 4858.08 ( 31.98%) Amean fault-base-12 13827.64 ( 0.00%) 10292.11 ( 25.57%) Amean fault-base-18 18235.07 ( 0.00%) 13788.84 ( 24.38%) Amean fault-base-24 21597.80 ( 0.00%) 24388.03 (-12.92%) Amean fault-base-30 26754.15 ( 0.00%) 19700.55 ( 26.36%) Amean fault-base-32 26784.94 ( 0.00%) 19513.57 ( 27.15%) Amean fault-huge-1 4223.96 ( 0.00%) 2178.57 ( 48.42%) Amean fault-huge-3 2194.77 ( 0.00%) 2149.74 ( 2.05%) Amean fault-huge-5 2569.60 ( 0.00%) 2346.95 ( 8.66%) Amean fault-huge-7 3612.69 ( 0.00%) 2997.70 ( 17.02%) Amean fault-huge-12 3301.75 ( 0.00%) 6727.02 (-103.74%) Amean fault-huge-18 6696.47 ( 0.00%) 6685.72 ( 0.16%) Amean fault-huge-24 8000.72 ( 0.00%) 9311.43 (-16.38%) Amean fault-huge-30 13305.55 ( 0.00%) 9750.45 ( 26.72%) Amean fault-huge-32 9981.71 ( 0.00%) 10316.06 ( -3.35%) The average time to fault pages is substantially reduced in the majority of caseds but with the obvious caveat that fewer THPs are actually used in this adverse workload 4.4.0 4.4.0 kcompactd-v1r1 nodefrag-v1r3 Percentage huge-1 0.71 ( 0.00%) 14.04 (1865.22%) Percentage huge-3 10.77 ( 0.00%) 33.05 (206.85%) Percentage huge-5 60.39 ( 0.00%) 38.51 (-36.23%) Percentage huge-7 45.97 ( 0.00%) 34.57 (-24.79%) Percentage huge-12 68.12 ( 0.00%) 40.07 (-41.17%) Percentage huge-18 64.93 ( 0.00%) 47.82 (-26.35%) Percentage huge-24 62.69 ( 0.00%) 44.23 (-29.44%) Percentage huge-30 43.49 ( 0.00%) 55.38 ( 27.34%) Percentage huge-32 50.72 ( 0.00%) 51.90 ( 2.35%) 4.4.0 4.4.0 kcompactd-v1r1nodefrag-v1r3 Minor Faults 37429143 47564000 Major Faults 1916 1558 Swap Ins 1466 1079 Swap Outs 2936863 149626 Allocation stalls 62510 3 DMA allocs 0 0 DMA32 allocs 6566458 6401314 Normal allocs 216361697 216538171 Movable allocs 0 0 Direct pages scanned 25977580 17998 Kswapd pages scanned 0 3638931 Kswapd pages reclaimed 0 207236 Direct pages reclaimed 8833714 88 Compaction stalls 103349 5 Compaction success 270 4 Compaction failures 103079 1 Note again that while this does swap as it's an aggressive workload, the direct relcim activity and allocation stalls is substantially reduced. There is some kswapd activity but ftrace showed that the kswapd activity was due to normal wakeups from 4K pages being allocated. Compaction-related stalls and activity are almost eliminated. I also tried the stutter benchmark. For this, I do not have figures for NUMA but it's something that does impact UMA so I'll report what is available stutter 4.4.0 4.4.0 kcompactd-v1r1 nodefrag-v1r3 Min mmap 7.3571 ( 0.00%) 7.3438 ( 0.18%) 1st-qrtle mmap 7.5278 ( 0.00%) 17.9200 (-138.05%) 2nd-qrtle mmap 7.6818 ( 0.00%) 21.6055 (-181.25%) 3rd-qrtle mmap 11.0889 ( 0.00%) 21.8881 (-97.39%) Max-90% mmap 27.8978 ( 0.00%) 22.1632 ( 20.56%) Max-93% mmap 28.3202 ( 0.00%) 22.3044 ( 21.24%) Max-95% mmap 28.5600 ( 0.00%) 22.4580 ( 21.37%) Max-99% mmap 29.6032 ( 0.00%) 25.5216 ( 13.79%) Max mmap 4109.7289 ( 0.00%) 4813.9832 (-17.14%) Mean mmap 12.4474 ( 0.00%) 19.3027 (-55.07%) This benchmark is trying to fault an anonymous mapping while there is a heavy IO load -- a scenario that desktop users used to complain about frequently. This shows a mix because the ideal case of mapping with THP is not hit as often. However, note that 99% of the mappings complete 13.79% faster. The CPU usage here is particularly interesting 4.4.0 4.4.0 kcompactd-v1r1nodefrag-v1r3 User 67.50 0.99 System 1327.88 91.30 Elapsed 2079.00 2128.98 And once again we look at the reclaim figures 4.4.0 4.4.0 kcompactd-v1r1nodefrag-v1r3 Minor Faults 335241922 1314582827 Major Faults 715 819 Swap Ins 0 0 Swap Outs 0 0 Allocation stalls 532723 0 DMA allocs 0 0 DMA32 allocs 1822364341 1177950222 Normal allocs 1815640808 1517844854 Movable allocs 0 0 Direct pages scanned 21892772 0 Kswapd pages scanned 20015890 41879484 Kswapd pages reclaimed 19961986 41822072 Direct pages reclaimed 21892741 0 Compaction stalls 1065755 0 Compaction success 514 0 Compaction failures 1065241 0 Allocation stalls and all direct reclaim activity is eliminated as well as compaction-related stalls. THP gives impressive gains in some cases but only if they are quickly available. We're not going to reach the point where they are completely free so lets take the costs out of the fast paths finally and defer the cost to kswapd, kcompactd and khugepaged where it belongs. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17mm: remove unnecessary description about a non-exist gfp flagSatoru Takeuchi1-2/+0
Since __GFP_NOACCOUNT was removed by commit 20b5c3039863 ("Revert 'gfp: add __GFP_NOACCOUNT'"), its description is not necessary. Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17mm: scale kswapd watermarks in proportion to memoryJohannes Weiner2-0/+3
In machines with 140G of memory and enterprise flash storage, we have seen read and write bursts routinely exceed the kswapd watermarks and cause thundering herds in direct reclaim. Unfortunately, the only way to tune kswapd aggressiveness is through adjusting min_free_kbytes - the system's emergency reserves - which is entirely unrelated to the system's latency requirements. In order to get kswapd to maintain a 250M buffer of free memory, the emergency reserves need to be set to 1G. That is a lot of memory wasted for no good reason. On the other hand, it's reasonable to assume that allocation bursts and overall allocation concurrency scale with memory capacity, so it makes sense to make kswapd aggressiveness a function of that as well. Change the kswapd watermark scale factor from the currently fixed 25% of the tunable emergency reserve to a tunable 0.1% of memory. Beyond 1G of memory, this will produce bigger watermark steps than the current formula in default settings. Ensure that the new formula never chooses steps smaller than that, i.e. 25% of the emergency reserve. On a 140G machine, this raises the default watermark steps - the distance between min and low, and low and high - from 16M to 143M. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17mm: cleanup *pte_alloc* interfacesKirill A. Shutemov1-9/+8
There are few things about *pte_alloc*() helpers worth cleaning up: - 'vma' argument is unused, let's drop it; - most __pte_alloc() callers do speculative check for pmd_none(), before taking ptl: let's introduce pte_alloc() macro which does the check. The only direct user of __pte_alloc left is userfaultfd, which has different expectation about atomicity wrt pmd. - pte_alloc_map() and pte_alloc_map_lock() are redefined using pte_alloc(). [sudeep.holla@arm.com: fix build for arm64 hugetlbpage] [sfr@canb.auug.org.au: fix arch/arm/mm/mmu.c some more] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17virtio_balloon: export 'available' memory to balloon statisticsIgor Redko1-1/+2
Add a new field, VIRTIO_BALLOON_S_AVAIL, to virtio_balloon memory statistics protocol, corresponding to 'Available' in /proc/meminfo. It indicates to the hypervisor how big the balloon can be inflated without pushing the guest system to swap. Signed-off-by: Igor Redko <redkoi@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17mm/page_alloc.c: calculate 'available' memory in a separate functionIgor Redko1-0/+1
Add a new field, VIRTIO_BALLOON_S_AVAIL, to virtio_balloon memory statistics protocol, corresponding to 'Available' in /proc/meminfo. It indicates to the hypervisor how big the balloon can be inflated without pushing the guest system to swap. This metric would be very useful in VM orchestration software to improve memory management of different VMs under overcommit. This patch (of 2): Factor out calculation of the available memory counter into a separate exportable function, in order to be able to use it in other parts of the kernel. In particular, it appears a relevant metric to report to the hypervisor via virtio-balloon statistics interface (in a followup patch). Signed-off-by: Igor Redko <redkoi@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17mm/thp/migration: switch from flush_tlb_range to flush_pmd_tlb_rangeAneesh Kumar K.V1-0/+17
We remove one instace of flush_tlb_range here. That was added by commit f714f4f20e59 ("mm: numa: call MMU notifiers on THP migration"). But the pmdp_huge_clear_flush_notify should have done the require flush for us. Hence remove the extra flush. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17mm, tracing: refresh __def_vmaflag_namesKirill A. Shutemov2-7/+17
Get list of VMA flags up-to-date and sort it to match VM_* definition order. [vbabka@suse.cz: add a note above vmaflag definitions to update the names when changing] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17mm: move max_map_count bits into mm.hAndrey Ryabinin2-21/+21
max_map_count sysctl unrelated to scheduler. Move its bits from include/linux/sched/sysctl.h to include/linux/mm.h. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17thp, vmstats: count deferred split eventsKirill A. Shutemov1-0/+1
Count how many times we put a THP in split queue. Currently, it happens on partial unmap of a THP. Rapidly growing value can indicate that an application behaves unfriendly wrt THP: often fault in huge page and then unmap part of it. This leads to unnecessary memory fragmentation and the application may require tuning. The event also can help with debugging kernel [mis-]behaviour. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17mm: workingset: make shadow node shrinker memcg awareVladimir Davydov1-0/+10
Workingset code was recently made memcg aware, but shadow node shrinker is still global. As a result, one small cgroup can consume all memory available for shadow nodes, possibly hurting other cgroups by reclaiming their shadow nodes, even though reclaim distances stored in its shadow nodes have no effect. To avoid this, we need to make shadow node shrinker memcg aware. Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17mm: memcontrol: zap memcg_kmem_online helperVladimir Davydov1-10/+0
As kmem accounting is now either enabled for all cgroups or disabled system-wide, there's no point in having memcg_kmem_online() helper - instead one can use memcg_kmem_enabled() and mem_cgroup_online(), as shrink_slab() now does. There are only two places left where this helper is used - __memcg_kmem_charge() and memcg_create_kmem_cache(). The former can only be called if memcg_kmem_enabled() returned true. Since the cgroup it operates on is online, mem_cgroup_is_root() check will be enough. memcg_create_kmem_cache() can't use mem_cgroup_online() helper instead of memcg_kmem_online(), because it relies on the fact that in memcg_offline_kmem() memcg->kmem_state is changed before memcg_deactivate_kmem_caches() is called, but there we can just open-code the check. Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17include/linux/page-flags.h: force inlining of selected page flag modificationsDenys Vlasenko1-15/+15
Sometimes gcc mysteriously doesn't inline very small functions we expect to be inlined. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122 With this .config: http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os, the following functions get deinlined many times. Examples of disassembly: <SetPageUptodate> (43 copies, 141 calls): 55 push %rbp 48 89 e5 mov %rsp,%rbp f0 80 0f 08 lock orb $0x8,(%rdi) 5d pop %rbp c3 retq <PagePrivate> (10 copies, 134 calls): 48 8b 07 mov (%rdi),%rax 55 push %rbp 48 89 e5 mov %rsp,%rbp 48 c1 e8 0b shr $0xb,%rax 83 e0 01 and $0x1,%eax 5d pop %rbp c3 retq This patch fixes this via s/inline/__always_inline/. Code size decrease after the patch is ~7k: text data bss dec hex filename 92125002 20826048 36417536 149368586 8e72f0a vmlinux 92118087 20826112 36417536 149361735 8e71447 vmlinux7_pageops_after Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Graf <tgraf@suug.ch> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17bufferhead: force inlining of buffer head flag operationsDenys Vlasenko1-5/+5
With both gcc 4.7.2 and 4.9.2, sometimes gcc mysteriously doesn't inline very small functions we expect to be inlined. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122 With this .config: http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os, set_buffer_foo(), clear_buffer_foo() and similar functions get deinlined about 60 times. Examples of disassembly: <set_buffer_mapped> (14 copies, 43 calls): 55 push %rbp 48 89 e5 mov %rsp,%rbp f0 80 0f 20 lock orb $0x20,(%rdi) 5d pop %rbp c3 retq <buffer_mapped> (3 copies, 34 calls): 48 8b 07 mov (%rdi),%rax 55 push %rbp 48 89 e5 mov %rsp,%rbp 48 c1 e8 05 shr $0x5,%rax 83 e0 01 and $0x1,%eax 5d pop %rbp c3 retq <set_buffer_new> (5 copies, 13 calls): 55 push %rbp 48 89 e5 mov %rsp,%rbp f0 80 0f 40 lock orb $0x40,(%rdi) 5d pop %rbp c3 retq This patch fixes this via s/inline/__always_inline/. This decreases vmlinux by about 3 kbytes. text data bss dec hex filename 88200439 19905208 36421632 144527279 89d4faf vmlinux2 88197239 19905240 36421632 144524111 89d434f vmlinux Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Graf <tgraf@suug.ch> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17mm, compaction: introduce kcompactdVlastimil Babka4-0/+78
Memory compaction can be currently performed in several contexts: - kswapd balancing a zone after a high-order allocation failure - direct compaction to satisfy a high-order allocation, including THP page fault attemps - khugepaged trying to collapse a hugepage - manually from /proc The purpose of compaction is two-fold. The obvious purpose is to satisfy a (pending or future) high-order allocation, and is easy to evaluate. The other purpose is to keep overal memory fragmentation low and help the anti-fragmentation mechanism. The success wrt the latter purpose is more The current situation wrt the purposes has a few drawbacks: - compaction is invoked only when a high-order page or hugepage is not available (or manually). This might be too late for the purposes of keeping memory fragmentation low. - direct compaction increases latency of allocations. Again, it would be better if compaction was performed asynchronously to keep fragmentation low, before the allocation itself comes. - (a special case of the previous) the cost of compaction during THP page faults can easily offset the benefits of THP. - kswapd compaction appears to be complex, fragile and not working in some scenarios. It could also end up compacting for a high-order allocation request when it should be reclaiming memory for a later order-0 request. To improve the situation, we should be able to benefit from an equivalent of kswapd, but for compaction - i.e. a background thread which responds to fragmentation and the need for high-order allocations (including hugepages) somewhat proactively. One possibility is to extend the responsibilities of kswapd, which could however complicate its design too much. It should be better to let kswapd handle reclaim, as order-0 allocations are often more critical than high-order ones. Another possibility is to extend khugepaged, but this kthread is a single instance and tied to THP configs. This patch goes with the option of a new set of per-node kthreads called kcompactd, and lays the foundations, without introducing any new tunables. The lifecycle mimics kswapd kthreads, including the memory hotplug hooks. For compaction, kcompactd uses the standard compaction_suitable() and ompact_finished() criteria and the deferred compaction functionality. Unlike direct compaction, it uses only sync compaction, as there's no allocation latency to minimize. This patch doesn't yet add a call to wakeup_kcompactd. The kswapd compact/reclaim loop for high-order pages will be replaced by waking up kcompactd in the next patch with the description of what's wrong with the old approach. Waking up of the kcompactd threads is also tied to kswapd activity and follows these rules: - we don't want to affect any fastpaths, so wake up kcompactd only from the slowpath, as it's done for kswapd - if kswapd is doing reclaim, it's more important than compaction, so don't invoke kcompactd until kswapd goes to sleep - the target order used for kswapd is passed to kcompactd Future possible future uses for kcompactd include the ability to wake up kcompactd on demand in special situations, such as when hugepages are not available (currently not done due to __GFP_NO_KSWAPD) or when a fragmentation event (i.e. __rmqueue_fallback()) occurs. It's also possible to perform periodic compaction with kcompactd. [arnd@arndb.de: fix build errors with kcompactd] [paul.gortmaker@windriver.com: don't use modular references for non modular code] Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17/proc/kpageflags: return KPF_BUDDY for "tail" buddy pagesNaoya Horiguchi1-0/+2
Currently /proc/kpageflags returns nothing for "tail" buddy pages, which is inconvenient when grasping how free pages are distributed. This patch sets KPF_BUDDY for such pages. With this patch: $ grep MemFree /proc/meminfo ; tools/vm/page-types -b buddy MemFree: 3134992 kB flags page-count MB symbolic-flags long-symbolic-flags 0x0000000000000400 779272 3044 __________B_______________________________ buddy 0x0000000000000c00 4385 17 __________BM______________________________ buddy,mmap total 783657 3061 783657 pages is 3134628 kB (roughly consistent with the global counter,) so it's OK. [akpm@linux-foundation.org: update comment, per Naoya] Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17mm: memcontrol: report kernel stack usage in cgroup2 memory.statVladimir Davydov1-1/+2
Show how much memory is allocated to kernel stacks. Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17mm: memcontrol: report slab usage in cgroup2 memory.statVladimir Davydov1-0/+21
Show how much memory is used for storing reclaimable and unreclaimable in-kernel data structures allocated from slab caches. Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-16Merge tag 'fbdev-4.6' of ↵Linus Torvalds1-3/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux Pull fbdev updates from Tomi Valkeinen: - Miscallaneous small fixes to various fbdev drivers - Remove fb_rotate, which was never used - pmag fb improvements * tag 'fbdev-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux: (21 commits) xen kconfig: don't "select INPUT_XEN_KBDDEV_FRONTEND" video: fbdev: sis: remove unused variable drivers/video: make fbdev/sunxvr2500.c explicitly non-modular drivers/video: make fbdev/sunxvr1000.c explicitly non-modular drivers/video: make fbdev/sunxvr500.c explicitly non-modular video: exynos: fix modular build fbdev: da8xx-fb: fix videomodes of lcd panels fbdev: kill fb_rotate video: fbdev: bt431: Correct cursor format control macro video: fbdev: pmag-ba-fb: Optimize Bt455 colormap addressing video: fbdev: pmag-ba-fb: Fix and rework Bt455 colormap handling video: fbdev: bt455: Remove unneeded colormap helpers for cursor support video: fbdev: pmag-aa-fb: Report video timings video: fbdev: pmag-aa-fb: Enable building as a module video: fbdev: pmag-aa-fb: Adapt to current APIs video: fbdev: pmag-ba-fb: Fix the lower margin size fbdev: sh_mobile_lcdc: Use ARCH_RENESAS fbdev: n411: check return value fbdev: exynos: fix IS_ERR_VALUE usage video: Use bool instead int pointer for get_opt_bool() argument ...
2016-03-16Merge tag 'media/v4.6-1' of ↵Linus Torvalds15-46/+548
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - Added support for some new video formats - mn88473 DVB frontend driver got promoted from staging - several improvements at the VSP1 driver - several cleanups and improvements at the Media Controller - added Media Controller support to snd-usb-audio. Currently, enabled only for au0828-based V4L2/DVB boards - Several improvements at nuvoton-cir: it now supports wake up codes - Add media controller support to em28xx and saa7134 drivers - coda driver now accepts NXP distributed firmware files - Some legacy SoC camera drivers will be moving to staging, as they're outdated and nobody so far is willing to fix and convert them to use the current media framework - As usual, lots of cleanups, improvements and new board additions. * tag 'media/v4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (381 commits) media: au0828 disable tuner to demod link in au0828_media_device_register() [media] touptek: cast char types on %x printk [media] touptek: don't DMA at the stack [media] mceusb: use %*ph for small buffer dumps [media] v4l: exynos4-is: Drop unneeded check when setting up fimc-lite links [media] v4l: vsp1: Check if an entity is a subdev with the right function [media] hide unused functions for !MEDIA_CONTROLLER [media] em28xx: fix Terratec Grabby AC97 codec detection [media] media: add prefixes to interface types [media] media: rc: nuvoton: switch attribute wakeup_data to text [media] v4l2-ioctl: fix YUV422P pixel format description [media] media: fix null pointer dereference in v4l_vb2q_enable_media_source() [media] v4l2-mc.h: fix yet more compiler errors [media] staging/media: add missing TODO files [media] media.h: always start with 1 for the audio entities [media] sound/usb: Use meaninful names for goto labels [media] v4l2-mc.h: fix compiler warnings [media] media: au0828 audio mixer isn't connected to decoder [media] sound/usb: Use Media Controller API to share media resources [media] dw2102: add support for TeVii S662 ...
2016-03-16Merge tag 'libnvdimm-for-4.6' of ↵Linus Torvalds5-1/+44
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: - Asynchronous address range scrub: Given the capacities of next generation persistent memory devices a scrub operation to find all poison may take 10s of seconds. We want this scrub work to be done asynchronously with the rest of system initialization, so we move it out of line from the NFIT probing, i.e. acpi_nfit_add(). - Clear poison: ACPI 6.1 introduces the ability to send "clear error" commands to the ACPI0012:00 device representing the root of an "nvdimm bus". Similar to relocating a bad block on a disk, this support clears media errors in response to a write. - Persistent memory resource tracking: A persistent memory range may be designated as simply "reserved" by platform firmware in the efi/e820 memory map. Later when the NFIT driver loads it discovers that the range is "Persistent Memory". The NFIT bus driver inserts a resource to advertise that "persistent" attribute in the system resource tree for /proc/iomem and kernel-internal usages. - Miscellaneous cleanups and fixes: Workaround section misaligned pmem ranges when allocating a struct page memmap, fix handling of the read-only case in the ioctl path, and clean up block device major number allocation. * tag 'libnvdimm-for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (26 commits) libnvdimm, pmem: clear poison on write libnvdimm, pmem: fix kmap_atomic() leak in error path nvdimm/btt: don't allocate unused major device number nvdimm/blk: don't allocate unused major device number pmem: don't allocate unused major device number ACPI: Change NFIT driver to insert new resource resource: Export insert_resource and remove_resource resource: Add remove_resource interface resource: Change __request_region to inherit from immediate parent libnvdimm, pmem: fix ia64 build, use PHYS_PFN nfit, libnvdimm: clear poison command support libnvdimm, pfn: 'resource'-address and 'size' attributes for pfn devices libnvdimm, pmem: adjust for section collisions with 'System RAM' libnvdimm, pmem: fix 'pfn' support for section-misaligned namespaces libnvdimm: Fix security issue with DSM IOCTL. libnvdimm: Clean-up access mode check. tools/testing/nvdimm: expand ars unit testing nfit: disable userspace initiated ars during scrub nfit: scrub and register regions in a workqueue nfit, libnvdimm: async region scrub workqueue ...
2016-03-16Merge tag 'dm-4.6-changes' of ↵Linus Torvalds1-3/+12
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - Most attention this cycle went to optimizing blk-mq request-based DM (dm-mq) that is used exclussively by DM multipath: - A stable fix for dm-mq that eliminates excessive context switching offers the biggest performance improvement (for both IOPs and throughput). - But more work is needed, during the next cycle, to reduce spinlock contention in DM multipath on large NUMA systems. - A stable fix for a NULL pointer seen when DM stats is enabled on a DM multipath device that must requeue an IO due to path failure. - A stable fix for DM snapshot to disallow the COW and origin devices from being identical. This amounts to graceful failure in the face of userspace error because these devices shouldn't ever be identical. - Stable fixes for DM cache and DM thin provisioning to address crashes seen if/when their respective metadata device experiences failures that cause the transition to 'fail_io' mode. - The DM cache 'mq' policy is now an alias for the 'smq' policy. The 'smq' policy proved to be consistently better than 'mq'. As such 'mq', with all its complex user-facing tunables, has been eliminated. - Improve DM thin provisioning to consistently return -ENOSPC once the thin-pool's data volume is out of space. - Improve DM core to properly handle error propagation if bio_integrity_clone() fails in clone_bio(). - Other small cleanups and improvements to DM core. * tag 'dm-4.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (41 commits) dm: fix rq_end_stats() NULL pointer in dm_requeue_original_request() dm thin: consistently return -ENOSPC if pool has run out of data space dm cache: bump the target version dm cache: make sure every metadata function checks fail_io dm: add missing newline between DM_DEBUG_BLOCK_STACK_TRACING and DM_BUFIO dm cache policy smq: clarify that mq registration failure was for 'mq' dm: return error if bio_integrity_clone() fails in clone_bio() dm thin metadata: don't issue prefetches if a transaction abort has failed dm snapshot: disallow the COW and origin devices from being identical dm cache: make the 'mq' policy an alias for 'smq' dm: drop unnecessary assignment of md->queue dm: reorder 'struct mapped_device' members to fix alignment and holes dm: remove dummy definition of 'struct dm_table' dm: add 'dm_numa_node' module parameter dm thin metadata: remove needless newline from subtree_dec() DMERR message dm mpath: cleanup reinstate_path() et al based on code review dm mpath: remove __pgpath_busy forward declaration, rename to pgpath_busy dm mpath: switch from 'unsigned' to 'bool' for flags where appropriate dm round robin: use percpu 'repeat_count' and 'current_path' dm path selector: remove 'repeat_count' return from .select_path hook ...
2016-03-16Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds5-0/+20
Pull SCSI updates from James Bottomley: "This pull includes driver updates from the usual suspects (stex, hpsa, ncr5380, scsi_dh, qla2xxx, be2iscsi, hisi_sas, cxlflash, aacraid, mp3sas, megaraid_sas, ibmvscsi, ufs) plus an assortment of miscellaneous fixes. The major user visible change of this pull is that we've moved from monotonically increasing host number to an ida allocated one (meaning the numbers get re-used) because someone managed to wrap the count in an iscsi system. We don't believe there will be any adverse consequences of this" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (230 commits) MAINTAINERS: use new email address for James Bottomley mpt3sas: Remove unnecessary synchronize_irq() before free_irq() sg: fix dxferp in from_to case cxlflash: Increase cmd_per_lun for better throughput cxlflash: Fix to avoid unnecessary scan with internal LUNs cxlflash: Reorder user context initialization cxlflash: Simplify attach path error cleanup cxlflash: Split out context initialization cxlflash: Unmap problem state area before detaching master context cxlflash: Simplify PCI registration scsi: storvsc: fix SRB_STATUS_ABORTED handling be2iscsi: set the boot_kset pointer to NULL in case of failure sd: Fix discard granularity when LBPRZ=1 be2iscsi: Remove unnecessary synchronize_irq() before free_irq() scsi_sysfs: call 'device_add' after attaching device handler scsi_dh_emc: update 'access_state' field scsi_dh_rdac: update 'access_state' field scsi_dh_alua: update 'access_state' field scsi_dh_alua: use common definitions for ALUA state scsi: Add 'access_state' and 'preferred_path' attribute ...
2016-03-16Merge branch 'stable/for-linus-4.6' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft Pull iscsi_ibft update from Konrad Rzeszutek Wilk: "A simple patch that had been rattling around in SuSE repo" * 'stable/for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft: iscsi_ibft: Add prefix-len attr and display netmask
2016-03-16Merge tag 'pci-v4.6-changes' of ↵Linus Torvalds5-97/+94
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI changes for v4.6: Enumeration: - Disable IO/MEM decoding for devices with non-compliant BARs (Bjorn Helgaas) - Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs (Bjorn Helgaas Resource management: - Mark shadow copy of VGA ROM as IORESOURCE_PCI_FIXED (Bjorn Helgaas) - Don't assign or reassign immutable resources (Bjorn Helgaas) - Don't enable/disable ROM BAR if we're using a RAM shadow copy (Bjorn Helgaas) - Set ROM shadow location in arch code, not in PCI core (Bjorn Helgaas) - Remove arch-specific IORESOURCE_ROM_SHADOW size from sysfs (Bjorn Helgaas) - ia64: Use ioremap() instead of open-coded equivalent (Bjorn Helgaas) - ia64: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas) - MIPS: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas) - Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY (Bjorn Helgaas) - Don't leak memory if sysfs_create_bin_file() fails (Bjorn Helgaas) - rcar: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi) - designware: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi) Virtualization: - Wait for up to 1000ms after FLR reset (Alex Williamson) - Support SR-IOV on any function type (Kelly Zytaruk) - Add ACS quirk for all Cavium devices (Manish Jaggi) AER: - Rename pci_ops_aer to aer_inj_pci_ops (Bjorn Helgaas) - Restore pci_ops pointer while calling original pci_ops (David Daney) - Fix aer_inject error codes (Jean Delvare) - Use dev_warn() in aer_inject (Jean Delvare) - Log actual error causes in aer_inject (Jean Delvare) - Log aer_inject error injections (Jean Delvare) VPD: - Prevent VPD access for buggy devices (Babu Moger) - Move pci_read_vpd() and pci_write_vpd() close to other VPD code (Bjorn Helgaas) - Move pci_vpd_release() from header file to pci/access.c (Bjorn Helgaas) - Remove struct pci_vpd_ops.release function pointer (Bjorn Helgaas) - Rename VPD symbols to remove unnecessary "pci22" (Bjorn Helgaas) - Fold struct pci_vpd_pci22 into struct pci_vpd (Bjorn Helgaas) - Sleep rather than busy-wait for VPD access completion (Bjorn Helgaas) - Update VPD definitions (Hannes Reinecke) - Allow access to VPD attributes with size 0 (Hannes Reinecke) - Determine actual VPD size on first access (Hannes Reinecke) Generic host bridge driver: - Move structure definitions to separate header file (David Daney) - Add pci_host_common_probe(), based on gen_pci_probe() (David Daney) - Expose pci_host_common_probe() for use by other drivers (David Daney) Altera host bridge driver: - Fix altera_pcie_link_is_up() (Ley Foon Tan) Cavium ThunderX host bridge driver: - Add PCIe host driver for ThunderX processors (David Daney) - Add driver for ThunderX-pass{1,2} on-chip devices (David Daney) Freescale i.MX6 host bridge driver: - Add DT bindings to configure PHY Tx driver settings (Justin Waters) - Move imx6_pcie_reset_phy() near other PHY handling functions (Lucas Stach) - Move PHY reset into imx6_pcie_establish_link() (Lucas Stach) - Remove broken Gen2 workaround (Lucas Stach) - Move link up check into imx6_pcie_wait_for_link() (Lucas Stach) Freescale Layerscape host bridge driver: - Add "fsl,ls2085a-pcie" compatible ID (Yang Shi) Intel VMD host bridge driver: - Attach VMD resources to parent domain's resource tree (Jon Derrick) - Set bus resource start to 0 (Keith Busch) Microsoft Hyper-V host bridge driver: - Add fwnode_handle to x86 pci_sysdata (Jake Oshins) - Look up IRQ domain by fwnode_handle (Jake Oshins) - Add paravirtual PCI front-end for Microsoft Hyper-V VMs (Jake Oshins) NVIDIA Tegra host bridge driver: - Add pci_ops.{add,remove}_bus() callbacks (Thierry Reding) - Implement ->{add,remove}_bus() callbacks (Thierry Reding) - Remove unused struct tegra_pcie.num_ports field (Thierry Reding) - Track bus -> CPU mapping (Thierry Reding) - Remove misleading PHYS_OFFSET (Thierry Reding) Renesas R-Car host bridge driver: - Depend on ARCH_RENESAS, not ARCH_SHMOBILE (Simon Horman) Synopsys DesignWare host bridge driver: - ARC: Add PCI support (Joao Pinto) - Add generic dw_pcie_wait_for_link() (Joao Pinto) - Add default link up check if sub-driver doesn't override (Joao Pinto) - Add driver for prototyping kits based on ARC SDP (Joao Pinto) TI Keystone host bridge driver: - Defer probing if devm_phy_get() returns -EPROBE_DEFER (Shawn Lin) Xilinx AXI host bridge driver: - Use of_pci_get_host_bridge_resources() to parse DT (Bharat Kumar Gogada) - Remove dependency on ARM-specific struct hw_pci (Bharat Kumar Gogada) - Don't call pci_fixup_irqs() on Microblaze (Bharat Kumar Gogada) - Update Zynq binding with Microblaze node (Bharat Kumar Gogada) - microblaze: Support generic Xilinx AXI PCIe Host Bridge IP driver (Bharat Kumar Gogada) Xilinx NWL host bridge driver: - Add support for Xilinx NWL PCIe Host Controller (Bharat Kumar Gogada) Miscellaneous: - Check device_attach() return value always (Bjorn Helgaas) - Move pci_set_flags() from asm-generic/pci-bridge.h to linux/pci.h (Bjorn Helgaas) - Remove includes of empty asm-generic/pci-bridge.h (Bjorn Helgaas) - ARM64: Remove generated include of asm-generic/pci-bridge.h (Bjorn Helgaas) - Remove empty asm-generic/pci-bridge.h (Bjorn Helgaas) - Remove includes of asm/pci-bridge.h (Bjorn Helgaas) - Consolidate PCI DMA constants and interfaces in linux/pci-dma-compat.h (Bjorn Helgaas) - unicore32: Remove unused HAVE_ARCH_PCI_SET_DMA_MASK definition (Bjorn Helgaas) - Cleanup pci/pcie/Kconfig whitespace (Andreas Ziegler) - Include pci/hotplug Kconfig directly from pci/Kconfig (Bjorn Helgaas) - Include pci/pcie/Kconfig directly from pci/Kconfig (Bogicevic Sasa) - frv: Remove stray pci_{alloc,free}_consistent() declaration (Christoph Hellwig) - Move pci_dma_* helpers to common code (Christoph Hellwig) - Add PCI_CLASS_SERIAL_USB_DEVICE definition (Heikki Krogerus) - Add QEMU top-level IDs for (sub)vendor & device (Robin H. Johnson) - Fix broken URL for Dell biosdevname (Naga Venkata Sai Indubhaskar Jupudi)" * tag 'pci-v4.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (94 commits) PCI: Add PCI_CLASS_SERIAL_USB_DEVICE definition PCI: designware: Add driver for prototyping kits based on ARC SDP PCI: designware: Add default link up check if sub-driver doesn't override PCI: designware: Add generic dw_pcie_wait_for_link() PCI: Cleanup pci/pcie/Kconfig whitespace PCI: Simplify pci_create_attr() control flow PCI: Don't leak memory if sysfs_create_bin_file() fails PCI: Simplify sysfs ROM cleanup PCI: Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY MIPS: Loongson 3: Keep CPU physical (not virtual) addresses in shadow ROM resource MIPS: Loongson 3: Use temporary struct resource * to avoid repetition ia64/PCI: Keep CPU physical (not virtual) addresses in shadow ROM resource ia64/PCI: Use ioremap() instead of open-coded equivalent ia64/PCI: Use temporary struct resource * to avoid repetition PCI: Clean up pci_map_rom() whitespace PCI: Remove arch-specific IORESOURCE_ROM_SHADOW size from sysfs PCI: thunder: Add driver for ThunderX-pass{1,2} on-chip devices PCI: thunder: Add PCIe host driver for ThunderX processors PCI: generic: Expose pci_host_common_probe() for use by other drivers PCI: generic: Add pci_host_common_probe(), based on gen_pci_probe() ...
2016-03-16Merge tag 'pm+acpi-4.6-rc1-1' of ↵Linus Torvalds9-55/+90
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI updates from Rafael Wysocki: "This time the majority of changes go into cpufreq and they are significant. First off, the way CPU frequency updates are triggered is different now. Instead of having to set up and manage a deferrable timer for each CPU in the system to evaluate and possibly change its frequency periodically, cpufreq governors set up callbacks to be invoked by the scheduler on a regular basis (basically on utilization updates). The "old" governors, "ondemand" and "conservative", still do all of their work in process context (although that is triggered by the scheduler now), but intel_pstate does it all in the callback invoked by the scheduler with no need for any additional asynchronous processing. Of course, this eliminates the overhead related to the management of all those timers, but also it allows the cpufreq governor code to be simplified quite a bit. On top of that, the common code and data structures used by the "ondemand" and "conservative" governors are cleaned up and made more straightforward and some long-standing and quite annoying problems are addressed. In particular, the handling of governor sysfs attributes is modified and the related locking becomes more fine grained which allows some concurrency problems to be avoided (particularly deadlocks with the core cpufreq code). In principle, the new mechanism for triggering frequency updates allows utilization information to be passed from the scheduler to cpufreq. Although the current code doesn't make use of it, in the works is a new cpufreq governor that will make decisions based on the scheduler's utilization data. That should allow the scheduler and cpufreq to work more closely together in the long run. In addition to the core and governor changes, cpufreq drivers are updated too. Fixes and optimizations go into intel_pstate, the cpufreq-dt driver is updated on top of some modification in the Operating Performance Points (OPP) framework and there are fixes and other updates in the powernv cpufreq driver. Apart from the cpufreq updates there is some new ACPICA material, including a fix for a problem introduced by previous ACPICA updates, and some less significant changes in the ACPI code, like CPPC code optimizations, ACPI processor driver cleanups and support for loading ACPI tables from initrd. Also updated are the generic power domains framework, the Intel RAPL power capping driver and the turbostat utility and we have a bunch of traditional assorted fixes and cleanups. Specifics: - Redesign of cpufreq governors and the intel_pstate driver to make them use callbacks invoked by the scheduler to trigger CPU frequency evaluation instead of using per-CPU deferrable timers for that purpose (Rafael Wysocki). - Reorganization and cleanup of cpufreq governor code to make it more straightforward and fix some concurrency problems in it (Rafael Wysocki, Viresh Kumar). - Cleanup and improvements of locking in the cpufreq core (Viresh Kumar). - Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh Kumar, Eric Biggers). - intel_pstate driver updates including fixes, optimizations and a modification to make it enable enable hardware-coordinated P-state selection (HWP) by default if supported by the processor (Philippe Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe Franciosi). - Operating Performance Points (OPP) framework updates to improve its handling of voltage regulators and device clocks and updates of the cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter). - Updates of the powernv cpufreq driver to fix initialization and cleanup problems in it and correct its worker thread handling with respect to CPU offline, new powernv_throttle tracepoint (Shilpasri Bhat). - ACPI cpufreq driver optimization and cleanup (Rafael Wysocki). - ACPICA updates including one fix for a regression introduced by previos changes in the ACPICA code (Bob Moore, Lv Zheng, David Box, Colin Ian King). - Support for installing ACPI tables from initrd (Lv Zheng). - Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin Chaugule). - Support for _HID(ACPI0010) devices (ACPI processor containers) and ACPI processor driver cleanups (Sudeep Holla). - Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory, Aleksey Makarov). - Modification of the ACPI PCI IRQ management code to make it treat 255 in the Interrupt Line register as "not connected" on x86 (as per the specification) and avoid attempts to use that value as a valid interrupt vector (Chen Fan). - ACPI APEI fixes related to resource leaks (Josh Hunt). - Removal of modularity from a few ACPI drivers (BGRT, GHES, intel_pmic_crc) that cannot be built as modules in practice (Paul Gortmaker). - PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS as a valid resource type (Harb Abdulhamid). - New device ID (future AMD I2C controller) in the ACPI driver for AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu). - Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin). - cpuidle menu governor optimization to avoid a square root computation in it (Rasmus Villemoes). - Fix for potential use-after-free in the generic device properties framework (Heikki Krogerus). - Updates of the generic power domains (genpd) framework including support for multiple power states of a domain, fixes and debugfs output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart, Geert Uytterhoeven). - Intel RAPL power capping driver updates to reduce IPI overhead in it (Jacob Pan). - System suspend/hibernation code cleanups (Eric Biggers, Saurabh Sengar). - Year 2038 fix for the process freezer (Abhilash Jindal). - turbostat utility updates including new features (decoding of more registers and CPUID fields, sub-second intervals support, GFX MHz and RC6 printout, --out command line option), fixes (syscall jitter detection and workaround, reductioin of the number of syscalls made, fixes related to Xeon x200 processors, compiler warning fixes) and cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu)" * tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (182 commits) tools/power turbostat: bugfix: TDP MSRs print bits fixing tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump tools/power turbostat: call __cpuid() instead of __get_cpuid() tools/power turbostat: indicate SMX and SGX support tools/power turbostat: detect and work around syscall jitter tools/power turbostat: show GFX%rc6 tools/power turbostat: show GFXMHz tools/power turbostat: show IRQs per CPU tools/power turbostat: make fewer systems calls tools/power turbostat: fix compiler warnings tools/power turbostat: add --out option for saving output in a file tools/power turbostat: re-name "%Busy" field to "Busy%" tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding tools/power turbostat: Intel Xeon x200: fix erroneous bclk value tools/power turbostat: allow sub-sec intervals ACPI / APEI: ERST: Fixed leaked resources in erst_init ACPI / APEI: Fix leaked resources intel_pstate: Do not skip samples partially intel_pstate: Remove freq calculation from intel_pstate_calc_busy() intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance() ...
2016-03-16Merge branch 'akpm' (patches from Andrew)Linus Torvalds29-178/+386
Merge first patch-bomb from Andrew Morton: - some misc things - ofs2 updates - about half of MM - checkpatch updates - autofs4 update * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits) autofs4: fix string.h include in auto_dev-ioctl.h autofs4: use pr_xxx() macros directly for logging autofs4: change log print macros to not insert newline autofs4: make autofs log prints consistent autofs4: fix some white space errors autofs4: fix invalid ioctl return in autofs4_root_ioctl_unlocked() autofs4: fix coding style line length in autofs4_wait() autofs4: fix coding style problem in autofs4_get_set_timeout() autofs4: coding style fixes autofs: show pipe inode in mount options kallsyms: add support for relative offsets in kallsyms address table kallsyms: don't overload absolute symbol type for percpu symbols x86: kallsyms: disable absolute percpu symbols on !SMP checkpatch: fix another left brace warning checkpatch: improve UNSPECIFIED_INT test for bare signed/unsigned uses checkpatch: warn on bare unsigned or signed declarations without int checkpatch: exclude asm volatile from complex macro check mm: memcontrol: drop unnecessary lru locking from mem_cgroup_migrate() mm: migrate: consolidate mem_cgroup_migrate() calls mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous ...
2016-03-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds5-11/+140
Pull KVM updates from Paolo Bonzini: "One of the largest releases for KVM... Hardly any generic changes, but lots of architecture-specific updates. ARM: - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems - PMU support for guests - 32bit world switch rewritten in C - various optimizations to the vgic save/restore code. PPC: - enabled KVM-VFIO integration ("VFIO device") - optimizations to speed up IPIs between vcpus - in-kernel handling of IOMMU hypercalls - support for dynamic DMA windows (DDW). s390: - provide the floating point registers via sync regs; - separated instruction vs. data accesses - dirty log improvements for huge guests - bugfixes and documentation improvements. x86: - Hyper-V VMBus hypercall userspace exit - alternative implementation of lowest-priority interrupts using vector hashing (for better VT-d posted interrupt support) - fixed guest debugging with nested virtualizations - improved interrupt tracking in the in-kernel IOAPIC - generic infrastructure for tracking writes to guest memory - currently its only use is to speedup the legacy shadow paging (pre-EPT) case, but in the future it will be used for virtual GPUs as well - much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (217 commits) KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch KVM: x86: disable MPX if host did not enable MPX XSAVE features arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit arm64: KVM: vgic-v3: Reset LRs at boot time arm64: KVM: vgic-v3: Do not save an LR known to be empty arm64: KVM: vgic-v3: Save maintenance interrupt state only if required arm64: KVM: vgic-v3: Avoid accessing ICH registers KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit KVM: arm/arm64: vgic-v2: Reset LRs at boot time KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers KVM: s390: allocate only one DMA page per VM KVM: s390: enable STFLE interpretation only if enabled for the guest KVM: s390: wake up when the VCPU cpu timer expires KVM: s390: step the VCPU timer while in enabled wait KVM: s390: protect VCPU cpu timer with a seqcount KVM: s390: step VCPU cpu timer during kvm_run ioctl ...
2016-03-15Merge tag 'leds_for_4.6' of ↵Linus Torvalds1-2/+6
git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds Pull LED updates from Jacek Anaszewski: "LED core improvements: - Fix misleading comment after workqueue removal from drivers - Avoid error message when a USB LED device is unplugged - Add helpers for calling brightness_set(_blocking) LED triggers: - Simplify led_trigger_store by using sysfs_streq() LED class drivers improvements: - Improve wording and formatting in a comment: lp3944 - Fix return value check in create_gpio_led(): leds-gpio - Use GPIOF_OUT_INIT_LOW instead of hardcoded zero: leds-gpio - Use devm_led_classdev_register(): leds-lm3533, leds-lm3533, leds-lp8788, leds-wm831x-status, leds-s3c24xx, leds-s3c24xx, leds-max8997. New LED class driver: - Add driver for the ISSI IS31FL32xx family of LED controllers. Device Tree documentation: - of: Add vendor prefixes for Integrated Silicon Solutions Inc. (issi) and Si-En Technology (si-en). - DT: Add common bindings for Si-En Technology SN3216/18 and IS31FL32xx family of LED controllers, since they seem to be the same hardware, just rebranded" * tag 'leds_for_4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: leds: triggers: simplify led_trigger_store leds: max8997: Use devm_led_classdev_register leds: da903x: Use devm_led_classdev_register leds: s3c24xx: Use devm_led_classdev_register leds: wm831x-status: Use devm_led_classdev_register leds: lp8788: Use devm_led_classdev_register leds: 88pm860x: Use devm_led_classdev_register leds: Add SN3218 and SN3216 support to the IS31FL32XX driver of: Add vendor prefix for Si-En Technology leds: Add driver for the ISSI IS31FL32xx family of LED controllers DT: leds: Add binding for the ISSI IS31FL32xx family of LED controllers DT: Add vendor prefix for Integrated Silicon Solutions Inc. leds: lm3533: Use devm_led_classdev_register leds: gpio: Use GPIOF_OUT_INIT_LOW instead of hardcoded zero leds: core: add helpers for calling brightness_set(_blocking) leds: leds-gpio: Fix return value check in create_gpio_led() leds: lp3944: improve wording and formatting in a comment leds: core: avoid error message when a USB LED device is unplugged leds: core: fix misleading comment after workqueue removal from drivers
2016-03-15Merge tag 'rtc-4.6' of ↵Linus Torvalds2-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC updates from Alexandre Belloni: "Core: - New sysfs interface to set and read clock offset - Drivers can now be both I2C and SPI (see pcf2127 and ds3232) New drivers: - Alphascale ASM9260 - Epson RX6110SA - Maxim max20024 and max77620 (in max77686) - Microchip PIC32 - NXP pcf2129 (in pcf2127) Subsystem wide cleanups: - remove IRQF_EARLY_RESUME when unecessary Drivers: - ds1307: clock output, temperature sensor and wakeup-source support - ds1685: actually spin forever in poweroff error path - ds3232: many cleanups - ds3234: merged in ds3232 - hym8563: fix invalid year calculation - max77686: many cleanups - max77802 merged in max77686 - pcf2123: cleanups and offset support - pcf85063: cleanups - pcf8523: propely handle oscillator stop bit - rv3029: many cleanups, trickle charger and temperature sensor support - rv8803: convert spin_lock to mutex_lock - rx8025: many fixes - vr41xx: restore alarm_irq_enable" * tag 'rtc-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (86 commits) rtc: pcf2127: add pcf2129 device id rtc: pcf2127: add support for spi interface rtc: pcf2127: convert to use regmap rtc: rv3029: Add thermometer hwmon support rtc: rv3029: Add update_bits helper for eeprom access rtc: ds1685: actually spin forever in poweroff error path rtc: hym8563: fix invalid year calculation rtc: ds3232: use rtc->ops_lock to protect alarm operations rtc: ds3232: fix issue when irq is shared several devices rtc: ds3232: remove unused UIE code rtc: ds3232: add register access error checks rtc: ds3232: fix read on /dev/rtc after RTC_AIE_ON rtc: merge ds3232 and ds3234 rtc: ds3232: convert to use regmap rtc: pxa: fix Kconfig indentation rtc: rv3029: Add device tree property for trickle charger rtc: rv3029: Add functions for EEPROM access rtc: rv3029: Add i2c register update-bits helper rtc: rv3029: Add missing register definitions rtc: rv3029: Add "rv3029" I2C device id ...
2016-03-15Merge tag 'hwmon-for-linus-v4.6' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon updates from Guenter Roeck: - New drivers for NSA320 and LTC2990 - Added support for ADM1278 to adm1275 driver - Added support for ncpXXxh103 to ntc_thermistor driver - Renamed vexpress hwmon implementation - Minor cleanups and improvements * tag 'hwmon-for-linus-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: Create an NSA320 hardware monitoring driver hwmon: Define binding for the nsa320-hwmon driver hwmon: (adm1275) Add support for ADM1278 hwmon: (ntc_thermistor) Add support for ncpXXxh103 Doc: hwmon: Fix typo "montoring" in hwmon ARM: dts: vfxxx: Add iio_hwmon node for ADC temperature channel ARM: dts: Change iio_hwmon nodes to use hypen in node names hwmon: (iio_hwmon) Allow the driver to accept hypen in device tree node names hwmon: Add LTC2990 sensor driver hwmon: (vexpress) rename vexpress hwmon implementation
2016-03-15Merge tag 'regulator-v4.6' of ↵Linus Torvalds4-2/+36
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator updates from Mark Brown: "This has been an extremely quiet release for the regulator API, aside from bugfixes and small enhancements the only thing that really stands out are the new drivers for Action Semiconductors ACT8945A, HiSilicon HI665x, and the Maxim MAX20024 and MAX77620" * tag 'regulator-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (46 commits) regulator: pwm: Add support to have multiple instance of pwm regulator regulator: pwm: Fix calculation of voltage-to-duty cycle regulator: of: Use of_property_read_u32() for reading min/max regulator: pv88060: fix incorrect clear of event register regulator: pv88090: fix incorrect clear of event register regulator: max77620: Add support to configure active-discharge regulator: core: Add support for active-discharge configuration regulator: helper: Add helper to configure active-discharge using regmap regulator: core: Add support for active-discharge configuration regulator: DT: Add DT property for active-discharge configuration regulator: act8865: Specify fixed voltage of 3.3V for ACT8600's REG9 regulator: act8865: Rename platform_data field to init_data regulator: act8865: Remove "static" from local variable ASoC: cs4271: add regulator consumer support regulator: max77620: Remove duplicate module alias regulator: max77620: Eliminate duplicate code regulator: max77620: Remove unused fields regulator: core: fix crash in error path of regulator_register regulator: core: Request GPIO before creating sysfs entries regulator: gpio: don't print error on EPROBE_DEFER ...
2016-03-15Merge tag 'regmap-v4.6' of ↵Linus Torvalds1-49/+58
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap updates from Mark Brown: "This has been a very busy release for regmap, not just in cleaning up the mess we got ourselves into with the endianness handling but also in other areas too: - Fixes for the endianness handling so that we now explicitly default to little endian (the code used to do this by accident). This fixes handling of explictly specified endianness on big endian systems. - Optimisation of the implementation of register striding. - A refectoring of the _update_bits() code to reduce duplication. - Fixes and enhancements for the interrupt implementation which make it easier to use in a wider range of systems" * tag 'regmap-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: (28 commits) regmap: irq: add devm apis for regmap_{add,del}_irq_chip regmap: replace regmap_write_bits() regmap: irq: Enable irq retriggering for nested irqs regmap: add regmap_fields_force_xxx() macros regmap: add regmap_field_force_xxx() macros regmap: merge regmap_fields_update_bits() into macro regmap: merge regmap_fields_write() into macro regmap: add regmap_fields_update_bits_base() regmap: merge regmap_field_update_bits() into macro regmap: merge regmap_field_write() into macro regmap: add regmap_field_update_bits_base() regmap: merge regmap_update_bits_check_async() into macro regmap: merge regmap_update_bits_check() into macro regmap: merge regmap_update_bits_async() into macro regmap: merge regmap_update_bits() into macro regmap: add regmap_update_bits_base() regcache: flat: Introduce register strider order regcache: Introduce the index parsing API by stride order regmap: core: Introduce register stride order regmap: irq: add devm apis for regmap_{add,del}_irq_chip ...
2016-03-15Merge tag 'spi-v4.6' of ↵Linus Torvalds2-0/+146
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi updates from Mark Brown: "Not the biggest set of changes for SPI but a bit of a pickup in activity on the core: - Support for memory mapped read from flash devices via a SPI controller. - The beginnings of a message rewriting framework in the core which should in time allow us to support transforming messages to work around the limits of controllers or optimise the performance for controllers transparently to calling drivers. - Updates to the PXA2xx, the main functional change being to improve the ACPI support. - A new driver for the Analog Devices AXI SPI engine" * tag 'spi-v4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (66 commits) spi: Add gfp parameter to kernel-doc to fix build warning spi: Fix htmldocs build error due struct spi_replaced_transfers spi: rockchip: covert rsd_nsecs to u32 type spi: rockchip: header file cleanup spi: xilinx: Add devicetree binding for spi-xilinx spi: respect the maximum segment size of DMA device spi: rockchip: check requesting dma channel with EPROBE_DEFER spi: rockchip: migrate to dmaengine_terminate_async spi: rockchip: check return value of dmaengine_prep_slave_sg spi: core: Fix deadlock when sending messages spi/rockchip: fix endian mode for 16-bit transfers spi/rockchip: Make sure spi clk is on in rockchip_spi_set_cs spi: pxa2xx: Use newer more explicit DMAengine terminate API spi: pxa2xx: Add support for Intel Broxton B-Step spi: lp-8841: return correct error code from probe spi: imx: drop bogus tests for rx/tx bufs in DMA transfer spi: imx: set MX51_ECSPI_CTRL_SMC bit in setup function spi: imx: make some register defines simpler spi: imx: remove unnecessary bit clearing in mx51_ecspi_config spi: imx: add support for all SPI word width for DMA ...
2016-03-15Merge tag 'pinctrl-v4.6-1' of ↵Linus Torvalds1-0/+520
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control updates from Linus Walleij: "An almost purely driver related set of changes with no major changes to the framework, only one patch adding an unlocked version of the pinctrl_find_gpio_range_from_pin() library call. New drivers: - ST Microelectronics STM32 MCU support: this is a non-MMU low-end platform for IoT things (etc). - Microchip PIC32 MCU support: same story as for STM32. New subdrivers: - Allwinner SunXi H3 R_PIO controller support. - Qualcomm IPQ4019 support. - MediaTek MT2701 and MT7623. - Allwinner A64 Non-critical fixes: - gpio_disable_free() for the Vybrid. - pinctrl single: use a separate lockdep class. Misc: - Substantial cleanups and rewrites for the Super-H PFC driver and subdrivers. - Various fixes and cleanups, especially Paul Gortmakers work to make nonmodular drivers nonmodular" * tag 'pinctrl-v4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (75 commits) pinctrl: single: Use a separate lockdep class drivers: pinctrl: add driver for Allwinner A64 SoC pinctrl: Broadcom Northstar2 pinctrl device tree bindings pinctrl: amlogic: Make driver independent from two-domain configuration pinctrl: amlogic: Separate some pin functions for Meson8 / Meson8b pinctrl: at91: use __maybe_unused to hide pm functions pinctrl: sh-pfc: core: don't open code of_device_get_match_data() pinctrl: uniphier: rename CONFIG options and file names pinctrl: sunxi: make A80 explicitly non-modular pinctrl: stm32: make explicitly non-modular pinctrl: sh-pfc: make explicitly non-modular pinctrl: meson: make explicitly non-modular pinctrl: pinctrl-mt6397 driver explicitly non-modular pinctrl: sunxi: does not need module.h pinctrl: pxa2xx: export symbols pinctrl: sunxi: Change mux setting on PI irq pins pinctrl: sunxi: Remove non existing irq's pinctrl: imx: attach iomuxc device to gpr syscon pinctrl-bcm2835: Fix cut-and-paste error in "pull" parsing pinctrl: lpc1850-scu: document nxp,gpio-pin-interrupt ...
2016-03-15autofs4: fix string.h include in auto_dev-ioctl.hIan Kent1-5/+0
Since including linux/string.h will now do the right thing remove the conditional check. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>