path: root/kernel
Age  Commit message  Author  Files  Lines
2012-05-31kernel/resource.c: correct the comment of allocate_resource()Wei Yang1-2/+2
In the comment of allocate_resource(), the explanation of parameter max and min is not correct. Actually, these two parameters are used to specify the range of the resource that will be allocated, not the min/max size that will be allocated. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
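To illustrate the distinction the corrected comment draws, here is a minimal user-space sketch in which min and max bound where a region may be placed, while size is a separate argument. The helper find_range() and all of its parameters are hypothetical; this is not the kernel's allocate_resource() and it only considers an otherwise empty range.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): pick a start address for a
 * region of `size` bytes that lies entirely inside [min, max].
 * `align` must be a power of two. Returns 0 on success, -1 if the
 * region does not fit.
 */
static int find_range(uint64_t min, uint64_t max, uint64_t size,
                      uint64_t align, uint64_t *start)
{
    uint64_t s;

    assert(align != 0 && (align & (align - 1)) == 0);
    if (size == 0 || min > max)
        return -1;

    /* Round the lower bound up to the requested alignment. */
    s = (min + align - 1) & ~(align - 1);
    if (s < min)                /* rounding wrapped around */
        return -1;

    /* The last byte, s + size - 1, must not exceed max. */
    if (s > max || size - 1 > max - s)
        return -1;

    *start = s;
    return 0;
}
```

Note that a request can fail even when size is small, simply because the aligned start falls outside the [min, max] window.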
2012-05-31perf: Remove duplicate invocation on perf_event_for_eachNamhyung Kim1-1/+0
The @func callback was invoked twice for the group leader when perf_event_for_each() was called. It seems that commit 75f937f24bd9 ("perf_counter: Fix ctx->mutex vs counter ->mutex inversion") introduced the mistake. Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1338443506-25009-1-git-send-email-namhyung.kim@lge.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-30Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar. * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) perf ui browser: Stop using 'self' perf annotate browser: Read perf config file for settings perf config: Allow '_' in config file variable names perf annotate browser: Make feature toggles global perf annotate browser: The idx_asm field should be used in asm only view perf tools: Convert critical messages to ui__error() perf ui: Make --stdio default when TUI is not supported tools lib traceevent: Silence compiler warning on 32bit build perf record: Fix branch_stack type in perf_record_opts perf tools: Reconstruct event with modifiers from perf_event_attr perf top: Fix counter name fixup when fallbacking to cpu-clock perf tools: fix thread_map__new_by_pid_str() memory leak in error path perf tools: Do not use _FORTIFY_SOURCE when DEBUG=1 is specified tools lib traceevent: Fix signature of create_arg_item() tools lib traceevent: Use proper function parameter type tools lib traceevent: Fix freeing arg on process_dynamic_array() tools lib traceevent: Fix a possibly wrong memory dereference tools lib traceevent: Fix a possible memory leak tools lib traceevent: Allow expressions in __print_symbolic() fields perf evlist: Explicititely initialize input_name ...
2012-05-30Merge branch 'for-3.5/core' of git://git.kernel.dk/linux-blockLinus Torvalds1-3/+2
Merge block/IO core bits from Jens Axboe: "This is a bit bigger on the core side than usual, but that is purely because we decided to hold off on parts of Tejun's submission on 3.4 to give it a bit more time to simmer. As a consequence, it's seen a long cycle in for-next. It contains: - Bug fix from Dan, wrong locking type. - Relax splice gifting restriction from Eric. - A ton of updates from Tejun, primarily for blkcg. This improves the code a lot, making the API nicer and cleaner, and also includes fixes for how we handle and tie policies and re-activate on switches. The changes also include generic bug fixes. - A simple fix from Vivek, along with a fix for doing proper delayed allocation of the blkcg stats." Fix up annoying conflict just due to different merge resolution in Documentation/feature-removal-schedule.txt * 'for-3.5/core' of git://git.kernel.dk/linux-block: (92 commits) blkcg: tg_stats_alloc_lock is an irq lock vmsplice: relax alignement requirements for SPLICE_F_GIFT blkcg: use radix tree to index blkgs from blkcg blkcg: fix blkcg->css ref leak in __blkg_lookup_create() block: fix elvpriv allocation failure handling block: collapse blk_alloc_request() into get_request() blkcg: collapse blkcg_policy_ops into blkcg_policy blkcg: embed struct blkg_policy_data in policy specific data blkcg: mass rename of blkcg API blkcg: style cleanups for blk-cgroup.h blkcg: remove blkio_group->path[] blkcg: blkg_rwstat_read() was missing inline blkcg: shoot down blkgs if all policies are deactivated blkcg: drop stuff unused after per-queue policy activation update blkcg: implement per-queue policy activation blkcg: add request_queue->root_blkg blkcg: make request_queue bypassing on allocation blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing blkcg: make blkg_conf_prep() take @pol and return with queue lock held blkcg: remove static policy ID enums ...
2012-05-30sched: Remove NULL assignment of dattr_curKamalesh Babulal1-1/+0
Remove explicit NULL assignment of static pointer dattr_cur from init_sched_domains(). Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120523091411.GG5005@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30sched: Remove the last NULL entry from sched_feat_namesHiroshi Shimamoto1-1/+0
No need to have the last NULL entry. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4FBF29E7.5020805@ct.jp.nec.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30sched: Make sched_feat_names constHiroshi Shimamoto1-1/+1
The strings sched_feat_names are never changed. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4FBF29B2.9030904@ct.jp.nec.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30sched/rt: Fix SCHED_RR across cgroupsColin Cross1-5/+10
task_tick_rt() has an optimization to only reschedule SCHED_RR tasks if they were the only element on their rq. However, with cgroups a SCHED_RR task could be the only element on its per-cgroup rq but still be competing with other SCHED_RR tasks in its parent's cgroup. In this case, the SCHED_RR task in the child cgroup would never yield at the end of its timeslice. If the child cgroup rt_runtime_us was the same as the parent cgroup rt_runtime_us, the task in the parent cgroup would starve completely. Modify task_tick_rt() to check that the task is the only task on its rq, and that each of the scheduling entities of its ancestors is also the only entity on its rq. Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1337229266-15798-1-git-send-email-ccross@android.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
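The check described above, walking from the task's scheduling entity up through its ancestors, can be modelled in user space roughly as follows. This is a simplified sketch: struct rr_entity, its fields and rr_has_competitor() are hypothetical names, not the kernel's data structures or the code of task_tick_rt().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model: one scheduling entity per cgroup level. */
struct rr_entity {
    struct rr_entity *parent;   /* entity one level up, NULL at the top */
    unsigned int nr_queued;     /* entities on this entity's rq, itself included */
};

/*
 * Return true if the entity, or any of its ancestors, shares its run
 * queue with another entity. In that case the task must yield when its
 * timeslice expires, even if it is alone on its own per-cgroup rq.
 */
static bool rr_has_competitor(const struct rr_entity *se)
{
    for (; se != NULL; se = se->parent) {
        assert(se->nr_queued >= 1);   /* the entity itself is queued */
        if (se->nr_queued > 1)
            return true;
    }
    return false;
}
```

Checking only the first level reproduces the bug: a task alone in its child cgroup would look uncontended although its parent's rq holds a second entity.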
2012-05-30sched: Move nr_cpus_allowed out of 'struct sched_rt_entity'Peter Zijlstra3-17/+23
Since nr_cpus_allowed is used outside of sched/rt.c and wants to be used outside of there more, move it to a more natural site. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-kr61f02y9brwzkh6x53pdptm@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30sched: Make sure to not re-read variables after validationPeter Zijlstra1-4/+11
We could re-read rq->rt_avg after we validated it was smaller than total, invalidating the check and resulting in an unintended negative. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/1337688268.9698.29.camel@twins Signed-off-by: Ingo Molnar <mingo@kernel.org>
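The general pattern behind this fix is to read a concurrently updated value once into a local variable and then validate and use only that local. A minimal user-space sketch follows; available_time() and its parameters are hypothetical, and the volatile read merely stands in for whatever single-read primitive the kernel code actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: *shared_avg may be updated by another context.
 * Take one snapshot, then validate and use the snapshot only, so the
 * value checked is guaranteed to be the value subtracted.
 */
static uint64_t available_time(const volatile uint64_t *shared_avg,
                               uint64_t total)
{
    uint64_t avg;

    assert(shared_avg != NULL);
    avg = *shared_avg;          /* the only read of the shared value */

    if (avg >= total)
        return 0;               /* would otherwise wrap "negative" */
    return total - avg;
}
```

Had the function compared *shared_avg against total and then subtracted *shared_avg again, a concurrent increase between the two reads could make the unsigned subtraction wrap.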
2012-05-30sched: Fix SD_OVERLAPPeter Zijlstra2-7/+25
SD_OVERLAP exists to allow overlapping groups, overlapping groups appear in NUMA topologies that aren't fully connected. The typical result of not fully connected NUMA is that each cpu (or rather node) will have different spans for a particular distance. However due to how sched domains are traversed -- only the first cpu in the mask goes one level up -- the next level only cares about the spans of the cpus that went up. Due to this two things were observed to be broken: - build_overlap_sched_groups() -- since it's possible the cpu we're building the groups for exists in multiple (or all) groups, the selection criteria of the first group didn't ensure there was a cpu for which it was true that cpumask_first(span) == cpu. Thus load-balancing would terminate. - update_group_power() -- assumed that the cpu span of the first group of the domain was covered by all groups of the child domain. The above explains why this isn't true, so deal with it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/1337788843.9783.14.camel@laptop Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30sched: Don't try allocating memory from offline nodesPeter Zijlstra1-1/+1
Allocators don't appreciate it when you try and allocate memory from offline nodes. Reported-and-tested-by: Tony Luck <tony.luck@intel.com> Reported-and-tested-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-epfc1io9whb7o22bcujf31vn@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30sched/nohz: Fix rq->cpu_load calculations some morePeter Zijlstra2-10/+44
Follow up on commit 556061b00 ("sched/nohz: Fix rq->cpu_load[] calculations") since while that fixed the busy case it regressed the mostly idle case. Add a callback from the nohz exit to also age the rq->cpu_load[] array. This closes the hole where either there was no nohz load balance pass during the nohz, or there was a 'significant' amount of idle time between the last nohz balance and the nohz exit. So we'll update unconditionally from the tick to not insert any accidental 0 load periods while busy, and we try and catch up from nohz idle balance and nohz exit. Both these are still prone to missing a jiffy, but that has always been the case. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: pjt@google.com Cc: Venkatesh Pallipadi <venki@google.com> Link: http://lkml.kernel.org/n/tip-kt0trz0apodbf84ucjfdbr1a@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30Merge branch 'linus' into perf/urgentIngo Molnar21-940/+3723
Merge back Linus's latest branch so that we pick up the uprobes changes. ( I tested this branch locally and while it's one from the middle of the merge window it's a good one to base further work off. ) Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-29brlocks/lglocks: turn into functionsAndi Kleen2-1/+90
lglocks and brlocks are currently generated with some complicated macros in lglock.h. But there's no reason to not just use common utility functions and put all the data into a common data structure. Since there are at least two users it makes sense to share this code in a library. This is also easier to maintain than a macro forest. This will also make it later possible to dynamically allocate lglocks and also use them in modules (this would both still need some additional, but now straightforward, code). [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29vsprintf: fix %ps on non symbols when using kallsymsStephen Boyd1-8/+24
Using %ps in a printk format will sometimes fail silently and print the empty string if the address passed in does not match a symbol that kallsyms knows about. But using %pS will fall back to printing the full address if kallsyms can't find the symbol. Make %ps act the same as %pS by falling back to printing the address. While we're here also make %ps print the module that a symbol comes from so that it matches what %pS already does.

Take this simple function for example (in a module):

    static void test_printk(void)
    {
            int test;
            pr_info("with pS: %pS\n", &test);
            pr_info("with ps: %ps\n", &test);
    }

Before this patch:

    with pS: 0xdff7df44
    with ps:

After this patch:

    with pS: 0xdff7df44
    with ps: 0xdff7df44

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
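The fallback behaviour described in this commit can be modelled in user space as below. This is only a sketch of the idea: format_symbol_or_addr() is a hypothetical helper, and the `name` argument stands in for the result of a symbol lookup (NULL when no symbol is known); it is not the kernel's vsprintf code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a symbol name if the lookup found one,
 * otherwise fall back to the raw address instead of an empty string.
 * Returns the value of snprintf(), i.e. the length that was needed.
 */
static int format_symbol_or_addr(char *buf, size_t size,
                                 unsigned long addr, const char *name)
{
    assert(buf != NULL && size > 0);

    if (name != NULL)
        return snprintf(buf, size, "%s", name);

    /* No symbol known for this address: print the address itself. */
    return snprintf(buf, size, "0x%lx", addr);
}
```

With the address from the example above and no symbol, the buffer holds "0xdff7df44" rather than nothing.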
2012-05-29rescounters: add res_counter_uncharge_until()Frederic Weisbecker1-2/+8
When killing a res_counter which is a child of other counter, we need to do

    res_counter_uncharge(child, xxx)
    res_counter_charge(parent, xxx)

This is not atomic and wastes CPU. This patch adds res_counter_uncharge_until(). This function's uncharge propagates to ancestors until specified res_counter.

    res_counter_uncharge_until(child, parent, xxx)

Now the operation is atomic and efficient.

Signed-off-by: Frederic Weisbecker <fweisbec@redhat.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ying Han <yinghan@google.com>
Cc: Glauber Costa <glommer@parallels.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
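A bounded upward walk of this kind might look like the following user-space sketch. struct counter and uncharge_until() are hypothetical stand-ins, locking is omitted entirely, and the sketch assumes the walk stops before touching `top`, so that `top` and its ancestors keep the charge.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, lock-free model of a hierarchical usage counter. */
struct counter {
    struct counter *parent;     /* NULL for the root counter */
    unsigned long usage;
};

/*
 * Subtract `val` from `c` and each ancestor below `top`. The counter
 * `top` itself and everything above it are left unchanged. Passing
 * top == NULL uncharges the whole chain up to the root.
 */
static void uncharge_until(struct counter *c, const struct counter *top,
                           unsigned long val)
{
    for (; c != NULL && c != top; c = c->parent) {
        assert(c->usage >= val);    /* never uncharge more than charged */
        c->usage -= val;
    }
}
```

Compared with uncharging the full chain and then re-charging the parent, the parent's usage is never transiently lowered, which is the point of doing it in one walk.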
2012-05-29kernel: cgroup: push rcu read locking from css_is_ancestor() to callsiteJohannes Weiner1-10/+10
Library functions should not grab locks when the callsites can do it, even if the lock nests like the rcu read-side lock does. Push the rcu_read_lock() from css_is_ancestor() to its single user, mem_cgroup_same_or_subtree() in preparation for another user that may already hold the rcu read-side lock. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29mm/fork: fix overflow in vma length when copying mmap on cloneSiddhesh Poyarekar1-1/+2
The vma length in dup_mmap is calculated and stored in an unsigned int, which is insufficient and hence overflows for very large maps (beyond 16TB). The following program demonstrates this:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/wait.h>

    #define GIG 1024 * 1024 * 1024L
    #define EXTENT 16393

    int main(void)
    {
            int i, r;
            void *m;
            char buf[1024];

            for (i = 0; i < EXTENT; i++) {
                    m = mmap(NULL, (size_t) 1 * 1024 * 1024 * 1024L,
                             PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);

                    if (m == (void *)-1)
                            printf("MMAP Failed: %p\n", m);
                    else
                            printf("%d : MMAP returned %p\n", i, m);

                    r = fork();

                    if (r == 0) {
                            printf("%d: successed\n", i);
                            return 0;
                    } else if (r < 0)
                            printf("FORK Failed: %d\n", r);
                    else if (r > 0)
                            wait(NULL);
            }
            return 0;
    }

Increase the storage size of the result to unsigned long, which is sufficient for storing the difference between addresses.

Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
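The arithmetic behind the 16TB limit can be checked with a small sketch. It assumes 4 KiB pages and a page count derived as (end - start) >> PAGE_SHIFT; the function names and SKETCH_PAGE_SHIFT are hypothetical, and uint32_t/uint64_t stand in for a 32-bit unsigned int and a 64-bit unsigned long.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12    /* assumption: 4 KiB pages */

/* Page count stored in a 32-bit type, as before the fix. */
static uint32_t vma_pages_truncated(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return (uint32_t)((end - start) >> SKETCH_PAGE_SHIFT);
}

/* Page count stored in a 64-bit type, as after the fix. */
static uint64_t vma_pages_full(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return (end - start) >> SKETCH_PAGE_SHIFT;
}
```

A 16 TiB mapping spans 2^44 bytes, which is 2^32 pages of 4 KiB: one more than the largest value a 32-bit counter can hold, so the truncated count wraps to 0.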
2012-05-29mm: remove swap token codeRik van Riel1-9/+0
The swap token code no longer fits in with the current VM model. It does not play well with cgroups or the better NUMA placement code in development, since we have only one swap token globally. It also has the potential to mess with scalability of the system, by increasing the number of non-reclaimable pages on the active and inactive anon LRU lists. Last but not least, the swap token code has been broken for a year without complaints, as reported by Konstantin Khlebnikov. This suggests we no longer have much use for it. The days of sub-1G memory systems with heavy use of swap are over. If we ever need thrashing reducing code in the future, we will have to implement something that does scale. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Hugh Dickins <hughd@google.com> Acked-by: Bob Picco <bpicco@meloft.net> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-27cgroup: superblock can't be released with active dentriesTejun Heo1-3/+14
48ddbe1946 "cgroup: make css->refcnt clearing on cgroup removal optional" allowed a css to linger after the associated cgroup is removed. As a css holds a reference on the cgroup's dentry, it means that cgroup dentries may linger for a while. cgroup_create() does grab an active reference on the superblock to prevent it from going away while there are !root cgroups; however, the reference is put from cgroup_diput() which is invoked on cgroup removal, so cgroup dentries which are removed but persisting due to lingering csses already have released their superblock active refs allowing superblock to be killed while those dentries are around. Given the right condition, this makes cgroup_kill_sb() call kill_litter_super() with dentries with non-zero d_count leading to BUG() in shrink_dcache_for_umount_subtree(). Fix it by adding cgroup_dops->d_release() operation and moving deactivate_super() to it. cgroup_diput() now marks dentry->d_fsdata with itself if superblock should be deactivated and cgroup_d_release() deactivates the superblock on dentry release. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Sasha Levin <levinsasha928@gmail.com> Tested-by: Sasha Levin <levinsasha928@gmail.com> LKML-Reference: <CA+1xoqe5hMuxzCRhMy7J0XchDk2ZnuxOHJKikROk1-ReAzcT6g@mail.gmail.com> Acked-by: Li Zefan <lizefan@huawei.com>
2012-05-25tick: Move skew_tick option into the HIGH_RES_TIMER sectionThomas Gleixner1-8/+8
commit 5307c95 (tick: Add tick skew boot option) broke the !CONFIG_HIGH_RES_TIMERS build. Move the boot option parsing into the CONFIG_HIGH_RES_TIMERS section. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Mike Galbraith <mgalbraith@suse.de>
2012-05-25clockevents: Make clockevents_config() a global symbolMagnus Damm1-2/+1
Make clockevents_config() into a global symbol to allow it to be used by compiled-in clockevent drivers. This is needed by drivers that want to update the timer frequency after registration time. Signed-off-by: Magnus Damm <damm@opensource.se> Tested-by: Simon Horman <horms@verge.net.au> Cc: arnd@arndb.de Cc: johnstul@us.ibm.com Cc: rjw@sisk.pl Cc: lethal@linux-sh.org Cc: gregkh@linuxfoundation.org Cc: olof@lixom.net Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20120509143934.27521.46553.sendpatchset@w520 Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-05-25tick: Add tick skew boot optionMike Galbraith1-0/+18
Let the user decide whether power consumption or jitter is the more important consideration for their machines.

Quoting removal commit af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867:

"Historically, Linux has tried to make the regular timer tick on the various CPUs not happen at the same time, to avoid contention on xtime_lock. Nowadays, with the tickless kernel, this contention no longer happens since time keeping and updating are done differently. In addition, this skew is actually hurting power consumption in a measurable way on many-core systems."

Problems:

- Contrary to the above, systems do encounter contention on both xtime_lock and RCU structure locks when the tick is synchronized.
- Moderate sized RT systems suffer intolerable jitter due to the tick being synchronized.
- SGI reports the same for their large systems.
- Fully utilized systems reap no power saving benefit from skew removal, but do suffer from resulting induced lock contention.
- 0209f649 rcu: limit rcu_node leaf-level fanout
  This patch was born to combat lock contention which testing showed to have been _induced by_ skew removal. Skew the tick, contention disappeared virtually completely.

Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Link: http://lkml.kernel.org/r/1336472458.21924.78.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-05-24Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+12
Pull KVM changes from Avi Kivity: "Changes include additional instruction emulation, page-crossing MMIO, faster dirty logging, preventing the watchdog from killing a stopped guest, module autoload, a new MSI ABI, and some minor optimizations and fixes. Outside x86 we have a small s390 and a very large ppc update. Regarding the new (for kvm) rebaseless workflow, some of the patches that were merged before we switch trees had to be rebased, while others are true pulls. In either case the signoffs should be correct now." Fix up trivial conflicts in Documentation/feature-removal-schedule.txt arch/powerpc/kvm/book3s_segment.S and arch/x86/include/asm/kvm_para.h. I suspect the kvm_para.h resolution ends up doing the "do I have cpuid" check effectively twice (it was done differently in two different commits), but better safe than sorry ;) * 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (125 commits) KVM: make asm-generic/kvm_para.h have an ifdef __KERNEL__ block KVM: s390: onereg for timer related registers KVM: s390: epoch difference and TOD programmable field KVM: s390: KVM_GET/SET_ONEREG for s390 KVM: s390: add capability indicating COW support KVM: Fix mmu_reload() clash with nested vmx event injection KVM: MMU: Don't use RCU for lockless shadow walking KVM: VMX: Optimize %ds, %es reload KVM: VMX: Fix %ds/%es clobber KVM: x86 emulator: convert bsf/bsr instructions to emulate_2op_SrcV_nobyte() KVM: VMX: unlike vmcs on fail path KVM: PPC: Emulator: clean up SPR reads and writes KVM: PPC: Emulator: clean up instruction parsing kvm/powerpc: Add new ioctl to retreive server MMU infos kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM KVM: PPC: bookehv: Fix r8/r13 storing in level exception handler KVM: PPC: Book3S: Enable IRQs during exit handling KVM: PPC: Fix PR KVM on POWER7 bare metal KVM: PPC: Fix stbux emulation KVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields ...
2012-05-24smpboot, idle: Fix comment mismatch over idle_threads_init()Srivatsa S. Bhat1-4/+7
The comment over idle_threads_init() really talks about the functionality of idle_init(). Move that comment to idle_init(), and add a suitable comment over idle_threads_init(). Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: suresh.b.siddha@intel.com Cc: venki@google.com Cc: nikunj@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/20120524151100.2549.66501.stgit@srivatsabhat.in.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-05-24smpboot, idle: Optimize calls to smp_processor_id() in idle_threads_init()Srivatsa S. Bhat1-2/+4
While trying to initialize idle threads for all cpus, idle_threads_init() calls smp_processor_id() in a loop, which is unnecessary. The intent is to initialize idle threads for all non-boot cpus. So just use a variable to note the boot cpu and use it in the loop. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: suresh.b.siddha@intel.com Cc: venki@google.com Cc: nikunj@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/20120524151055.2549.64309.stgit@srivatsabhat.in.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-05-24Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6Linus Torvalds1-18/+88
Pull irqdomain changes from Grant Likely: "Minor changes and fixups for irqdomain infrastructure. The most important change adds the ability to remove a registered irqdomain." * tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6: irqdomain: Document size parameter of irq_domain_add_linear() irqdomain: trivial pr_fmt conversion. irqdomain: Kill off duplicate definitions. irqdomain: Make irq_domain_simple_map() static. irqdomain: Export remaining public API symbols. irqdomain: Support removal of IRQ domains.
2012-05-24genirq: Introduce irq_do_set_affinity() to reduce duplicated codeJiang Liu3-28/+27
All invocations of chip->irq_set_affinity() are doing the same return value checks. Let them all use a common function. [ tglx: removed the silly likely while at it ] Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Keping Chen <chenkeping@huawei.com> Link: http://lkml.kernel.org/r/1333120296-13563-3-git-send-email-jiang.liu@huawei.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-05-24Merge branch 'timers-core-for-linus' of ↵Linus Torvalds3-15/+55
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner. Various trivial conflict fixups in arch Kconfig due to addition of unrelated entries nearby. And one slightly more subtle one for sparc32 (new user of GENERIC_CLOCKEVENTS), fixed up as per Thomas. * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) timekeeping: Fix a few minor newline issues. time: remove obsolete declaration ntp: Fix a stale comment and a few stray newlines. ntp: Correct TAI offset during leap second timers: Fixup the Kconfig consolidation fallout x86: Use generic time config unicore32: Use generic time config um: Use generic time config tile: Use generic time config sparc: Use: generic time config sh: Use generic time config score: Use generic time config s390: Use generic time config openrisc: Use generic time config powerpc: Use generic time config mn10300: Use generic time config mips: Use generic time config microblaze: Use generic time config m68k: Use generic time config m32r: Use generic time config ...
2012-05-24genirq: Add IRQS_PENDING for nested and simple irqNing Jiang1-2/+6
Every interrupt which is an active wakeup source needs the ability to abort suspend if there is a pending irq. Right now only edge and level irqs can do that.

                 |
            +---------+
            |  INTC   |
            +---------+
                 | GPIO_IRQ
           +------------+
           |  gpio-exp  |
           +------------+
             |        |
        GPIO0_IRQ  GPIO1_IRQ

In the above diagram, gpio expander has irq number GPIO_IRQ, it is connected with two sub GPIO pins, GPIO0 and GPIO1. During suspend, we set IRQF_NO_SUSPEND for GPIO_IRQ so that gpio expander driver can handle the sub irq GPIO0_IRQ and GPIO1_IRQ, and these two irqs themselves can further be handled by simple or nested irq in some drivers (typically gpio and mfd driver). If they are used as wakeup sources during suspend, we want them to be able to abort suspend too.

Setting IRQS_PENDING flag in handle_nested_irq() and handle_simple_irq() when the irq is disabled allows check_wakeup_irqs() to identify such irqs as source for aborting suspend.

Signed-off-by: Ning Jiang <ning.n.jiang@gmail.com>
Cc: rjw@sisk.pl
Link: http://lkml.kernel.org/r/CAH3Oq6T905%2B3fkF43NAMMFvJvq7dsk_so6T2vQ8ZJrA5xiU3YA@mail.gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
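The mechanism in the last paragraph, recording a pending flag when a disabled interrupt fires so that a later scan can abort suspend, can be modelled as below. This is a user-space sketch only: struct irq_state, handle_irq_sketch() and wakeup_pending() are hypothetical names, not the genirq handlers or check_wakeup_irqs().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-interrupt state for the sketch. */
struct irq_state {
    bool disabled;      /* interrupt is currently disabled (e.g. suspended) */
    bool wakeup;        /* interrupt is configured as a wakeup source */
    bool pending;       /* fired while disabled */
    unsigned int handled;
};

/* Flow handler: remember the event instead of dropping it when disabled. */
static void handle_irq_sketch(struct irq_state *irq)
{
    assert(irq != NULL);
    if (irq->disabled) {
        irq->pending = true;
        return;
    }
    irq->handled++;
}

/* Suspend-time scan: any pending wakeup source should abort suspend. */
static bool wakeup_pending(const struct irq_state *irqs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (irqs[i].wakeup && irqs[i].pending)
            return true;
    }
    return false;
}
```

Without the pending flag, the scan has nothing to look at for simple or nested interrupts, which is the gap this commit describes.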
2012-05-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-1/+2
Pull more networking updates from David Miller: "Ok, everything from here on out will be bug fixes."

1) One final sync of wireless and bluetooth stuff from John Linville. These changes have all been in his tree for more than a week, and therefore have had the necessary -next exposure. John was just away on a trip and didn't have a chance to send the pull request until a day or two ago.

2) Put back some defines in user exposed header file areas that were removed during the tokenring purge. From Stephen Hemminger and Paul Gortmaker.

3) A bug fix for UDP hash table allocation got lost in the pile due to one of those "you got it.. no I've got it.." situations. :-) From Tim Bird.

4) SKB coalescing in TCP needs to have stricter checks, otherwise we'll try to coalesce overlapping frags and crash. Fix from Eric Dumazet.

5) RCU routing table lookups can race with free_fib_info(), causing crashes when we deref the device pointers in the route. Fix by releasing the net device in the RCU callback. From Yanmin Zhang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (293 commits) tcp: take care of overlaps in tcp_try_coalesce() ipv4: fix the rcu race between free_fib_info and ip_route_output_slow mm: add a low limit to alloc_large_system_hash ipx: restore token ring define to include/linux/ipx.h if: restore token ring ARP type to header xen: do not disable netfront in dom0 phy/micrel: Fix ID of KSZ9021 mISDN: Add X-Tensions USB ISDN TA XC-525 gianfar:don't add FCB length to hard_header_len Bluetooth: Report proper error number in disconnection Bluetooth: Create flags for bt_sk() Bluetooth: report the right security level in getsockopt Bluetooth: Lock the L2CAP channel when sending Bluetooth: Restore locking semantics when looking up L2CAP channels Bluetooth: Fix a redundant and problematic incoming MTU check Bluetooth: Add support for Foxconn/Hon Hai AR5BBU22 0489:E03C Bluetooth: Fix EIR data generation for mgmt_device_found Bluetooth: Fix Inquiry with RSSI event mask Bluetooth: improve readability of l2cap_seq_list code Bluetooth: Fix skb length calculation ...
2012-05-24Merge branch 'perf-uprobes-for-linus' of ↵Linus Torvalds11-876/+3521
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull user-space probe instrumentation from Ingo Molnar:

"The uprobes code originates from SystemTap and has been used for years in Fedora and RHEL kernels. This version is much rewritten, reviews from PeterZ, Oleg and myself shaped the end result.

This tree includes uprobes support in 'perf probe' - but SystemTap (and other tools) can take advantage of user probe points as well.

Sample usage of uprobes via perf, for example to profile malloc() calls without modifying user-space binaries.

First boot a new kernel with CONFIG_UPROBE_EVENT=y enabled.

If you don't know which function you want to probe you can pick one from 'perf top' or can get a list of all functions that can be probed within libc (binaries can be specified as well):

    $ perf probe -F -x /lib/libc.so.6

To probe libc's malloc():

    $ perf probe -x /lib64/libc.so.6 malloc
    Added new event:
      probe_libc:malloc    (on 0x7eac0)

You can now use it in all perf tools, such as:

    perf record -e probe_libc:malloc -aR sleep 1

Make use of it to create a call graph (as the flat profile is going to look very boring):

    $ perf record -e probe_libc:malloc -gR make
    [ perf record: Woken up 173 times to write data ]
    [ perf record: Captured and wrote 44.190 MB perf.data (~1930712

    $ perf report | less

      32.03%            git  libc-2.15.so   [.] malloc
                        |
                        --- malloc

      29.49%            cc1  libc-2.15.so   [.] malloc
                        |
                        --- malloc
                           |
                           |--0.95%-- 0x208eb1000000000
                           |
                           |--0.63%-- htab_traverse_noresize

      11.04%             as  libc-2.15.so   [.] malloc
                         |
                         --- malloc
                            |

       7.15%             ld  libc-2.15.so   [.] malloc
                         |
                         --- malloc
                            |

       5.07%             sh  libc-2.15.so   [.] malloc
                         |
                         --- malloc
                            |

       4.99%  python-config  libc-2.15.so   [.] malloc
              |
              --- malloc
                 |

       4.54%           make  libc-2.15.so   [.] malloc
                       |
                       --- malloc
                          |
                          |--7.34%-- glob
                          |          |
                          |          |--93.18%-- 0x41588f
                          |          |
                          |           --6.82%-- glob
                          |                     0x41588f

       ...

Or:

    $ perf report -g flat | less

    # Overhead        Command  Shared Object  Symbol
    # ........  .............  .............  ..........
    #
      32.03%            git  libc-2.15.so   [.] malloc
              27.19%
                  malloc

      29.49%            cc1  libc-2.15.so   [.] malloc
              24.77%
                  malloc

      11.04%             as  libc-2.15.so   [.] malloc
              11.02%
                  malloc

       7.15%             ld  libc-2.15.so   [.] malloc
              6.57%
                  malloc

       ...

The core uprobes design is fairly straightforward: uprobes probe points register themselves at (inode:offset) addresses of libraries/binaries, after which all existing (or new) vmas that map that address will have a software breakpoint injected at that address. vmas are COW-ed to preserve original content. The probe points are kept in an rbtree.

If user-space executes the probed inode:offset instruction address then an event is generated which can be recovered from the regular perf event channels and mmap-ed ring-buffer.

Multiple probes at the same address are supported, they create a dynamic callback list of event consumers.

The basic model is further complicated by the XOL speedup: the original instruction that is probed is copied (in an architecture specific fashion) and executed out of line when the probe triggers. The XOL area is a single vma per process, with a fixed number of entries (which limits probe execution parallelism).

The API: uprobes are installed/removed via /sys/kernel/debug/tracing/uprobe_events, the API is integrated to align with the kprobes interface as much as possible, but is separate from it.

Injecting a probe point is a privileged operation, which can be relaxed by setting perf_paranoid to -1.

You can use multiple probes as well and mix them with kprobes and regular PMU events or tracepoints, when instrumenting a task."

Fix up trivial conflicts in mm/memory.c due to previous cleanup of unmap_single_vma().

* 'perf-uprobes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) perf probe: Detect probe target when m/x options are absent perf probe: Provide perf interface for uprobes tracing: Fix kconfig warning due to a typo tracing: Provide trace events interface for uprobes tracing: Extract out common code for kprobes/uprobes trace events tracing: Modify is_delete, is_return from int to bool uprobes/core: Decrement uprobe count before the pages are unmapped uprobes/core: Make background page replacement logic account for rss_stat counters uprobes/core: Optimize probe hits with the help of a counter uprobes/core: Allocate XOL slots for uprobes use uprobes/core: Handle breakpoint and singlestep exceptions uprobes/core: Rename bkpt to swbp uprobes/core: Make order of function parameters consistent across functions uprobes/core: Make macro names consistent uprobes: Update copyright notices uprobes/core: Move insn to arch specific structure uprobes/core: Remove uprobe_opcode_sz uprobes/core: Make instruction tables volatile uprobes: Move to kernel/events/ uprobes/core: Clean up, refactor and improve the code ...
2012-05-24Merge branch 'v4l_for_linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - some V4L2 API updates needed by embedded devices - DVB API extensions for ATSC-MH delivery system, used in US for mobile TV - new tuners for fc0011/0012/0013 and tua9001 - a new dvb driver for af9033/9035 - a new ATSC-MH frontend (lg2160) - new remote controller keymaps - Removal of a few legacy webcam drivers that were replaced by gspca several kernel versions ago - a new driver for Exynos 4/5 webcams (s5pp fimc-lite) - a new webcam sensor driver (smiapp) - a new video input driver for embedded (sta2x1xx) - several improvements, fixes, cleanups, etc inside the drivers. Manually fix up conflicts due to err() -> dev_err() conversion in drivers/staging/media/easycap/easycap_main.c * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (484 commits) [media] saa7134-cards: Remove a PCI entry added by mistake [media] radio-sf16fmi: add support for SF16-FMD [media] rc-loopback: remove duplicate line [media] patch for Asus My Cinema PS3-100 (1043:48cd) [media] au0828: Move the Kconfig knob under V4L_USB_DRIVERS [media] em28xx: simple comment fix [media] [resend] radio-sf16fmr2: add PnP support for SF16-FMD2 [media] smiapp: Use v4l2_ctrl_new_int_menu() instead of v4l2_ctrl_new_custom() [media] smiapp: Add support for 8-bit uncompressed formats [media] smiapp: Allow generic quirk registers [media] smiapp: Use non-binning limits if the binning limit is zero [media] smiapp: Initialise rval in smiapp_read_nvm() [media] smiapp: Round minimum pre_pll up rather than down in ip_clk_freq check [media] smiapp: Use 8-bit reads only before identifying the sensor [media] smiapp: Quirk for sensors that only do 8-bit reads [media] smiapp: Pass struct sensor to register writing commands instead of i2c_client [media] smiapp: Allow using external clock from the clock framework [media] zl10353: change .read_snr() to report SNR as a 0.1 dB [media] 
media: add support to gspca/pac7302.c for 093a:2627 (Genius FaceCam 300) [media] m88rs2000 - only flip bit 2 on reg 0x70 on 16th try ...
2012-05-24Merge branch 'tip/perf/urgent' of ↵Ingo Molnar1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/urgent Pull an ftrace ring-buffer fix from Steve Rostedt: * fix kernel crash when changing the size of the ring-buffer on boxes where possible_cpus != online_cpus. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-24mm: add a low limit to alloc_large_system_hashTim Bird1-1/+2
The UDP stack needs a minimum hash size for proper operation and also uses alloc_large_system_hash() for proper NUMA distribution of its hash tables and automatic sizing depending on available system memory. In some low-memory situations, udp_table_init() must ignore the alloc_large_system_hash() result and reallocate a bigger memory area. As we cannot easily free the old hash table, we leak it and kmemleak can issue a warning. This patch adds a low limit parameter to alloc_large_system_hash() to solve this problem. We then specify UDP_HTABLE_SIZE_MIN for UDP/UDPLite hash table allocation. Reported-by: Mark Asselstine <mark.asselstine@windriver.com> Reported-by: Tim Bird <tim.bird@am.sony.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
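The effect of a low limit on automatic hash sizing can be sketched in user space as follows. This is only an illustrative model, not the kernel's alloc_large_system_hash(): the names roundup_pow2() and pick_hash_entries() and the limit values used in the usage note are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Round n (>= 1) up to the next power of two. Hypothetical helper. */
static unsigned long roundup_pow2(unsigned long n)
{
    unsigned long p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Hypothetical model of sizing a hash table: start from the size derived
 * from available memory ("wanted"), round it to a power of two, then clamp
 * it between a caller-supplied low limit and high limit. With a low limit
 * the caller never has to discard the result and allocate a second table.
 */
static unsigned long pick_hash_entries(unsigned long wanted,
                                       unsigned long low_limit,
                                       unsigned long high_limit)
{
    unsigned long n;

    assert(low_limit <= high_limit);
    assert(wanted <= ULONG_MAX / 2 + 1);  /* keep roundup_pow2() from wrapping */

    n = roundup_pow2(wanted ? wanted : 1);
    if (n < low_limit)
        n = low_limit;
    if (n > high_limit)
        n = high_limit;
    return n;
}
```

For instance, with an assumed low limit of 256 and high limit of 65536, a memory-derived request of 100 entries is raised to 256 instead of being returned as 128 and thrown away by the caller.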
2012-05-23keys: kill task_struct->replacement_session_keyringOleg Nesterov1-9/+0
Kill the no longer used task_struct->replacement_session_keyring, update copy_creds() and exit_creds(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexander Gordeev <agordeev@redhat.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Smith <dsmith@redhat.com> Cc: "Frank Ch. Eigler" <fche@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-23genirq: reimplement exit_irq_thread() hook via task_work_add()Oleg Nesterov2-37/+33
exit_irq_thread() and task->irq_thread are needed to handle the unexpected (and unlikely) exit of irq-thread. We can use task_work instead and make this all private to kernel/irq/manage.c, cleanup plus micro-optimization. 1. rename exit_irq_thread() to irq_thread_dtor(), make it static, and move it up before irq_thread(). 2. change irq_thread() to do task_work_add(irq_thread_dtor) at the start and task_work_cancel() before return. tracehook_notify_resume() can never play with kthreads, only do_exit()->exit_task_work() can call the callback and this is what we want. 3. remove task_struct->irq_thread and the special hook in do_exit(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Howells <dhowells@redhat.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexander Gordeev <agordeev@redhat.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Smith <dsmith@redhat.com> Cc: "Frank Ch. Eigler" <fche@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-23task_work_add: generic process-context callbacksOleg Nesterov4-2/+90
Provide a simple mechanism that allows running code in the (nonatomic) context of an arbitrary task. The caller does task_work_add(task, task_work) and this task executes task_work->func() either from do_notify_resume() or from do_exit(). The callback can rely on PF_EXITING to detect the latter case. "struct task_work" can be embedded in another struct, still it has "void *data" to handle the most common/simple case. This allows us to kill the ->replacement_session_keyring hack, and potentially this can have more users. Performance-wise, this adds 2 "unlikely(!hlist_empty())" checks into tracehook_notify_resume() and do_exit(). But at the same time we can remove the "replacement_session_keyring != NULL" checks from arch/*/signal.c and exit_creds(). Note: task_work_add/task_work_run abuses ->pi_lock. This is only because this lock is already used by lookup_pi_state() to synchronize with do_exit() setting PF_EXITING. Fortunately the scope of this lock in task_work.c is really tiny, and the code is unlikely anyway. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexander Gordeev <agordeev@redhat.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Smith <dsmith@redhat.com> Cc: "Frank Ch. Eigler" <fche@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
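The queue-then-run pattern described in this commit can be modelled in a few lines of user-space C. This is a toy sketch under stated assumptions, not the code in task_work.c: every name here (toy_task, toy_work, toy_work_add, toy_work_run, toy_count) is hypothetical, there is no locking, and the "task" is just a struct holding a list head.

```c
#include <assert.h>
#include <stddef.h>

/* A queued callback; it can be embedded in a larger struct or use "data". */
struct toy_work {
    struct toy_work *next;
    void (*func)(struct toy_work *work);
    void *data;
};

/* Stand-in for a task: only the list of pending callbacks is modelled. */
struct toy_task {
    struct toy_work *works;   /* pending callbacks, newest first */
};

/* Queue a callback to be run later in the context of "task". */
static int toy_work_add(struct toy_task *task, struct toy_work *work)
{
    assert(work->func != NULL);
    work->next = task->works;
    task->works = work;
    return 0;
}

/* Called by the task itself at a safe point: run and drop everything queued. */
static void toy_work_run(struct toy_task *task)
{
    struct toy_work *work = task->works;

    task->works = NULL;       /* detach first so callbacks may queue new work */
    while (work) {
        struct toy_work *next = work->next;

        work->func(work);
        work = next;
    }
}

/* Example callback: increments the int that "data" points to. */
static void toy_count(struct toy_work *work)
{
    ++*(int *)work->data;
}
```

The design point the sketch tries to show is that the caller only links a node into the target's list; the target decides when it is safe to run the callbacks, which is why they execute in ordinary (non-atomic) context.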
2012-05-23Merge branch 'for-linus' of ↵Linus Torvalds2-18/+17
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull first series of signal handling cleanups from Al Viro: "This is just the first part of the queue (about a half of it); assorted fixes all over the place in signal handling. This one ends with all sigsuspend() implementations switched to generic one (->saved_sigmask-based). With this, a bunch of assorted old buglets are fixed and most of the missing bits of NOTIFY_RESUME hookup are in place. Two more fixes sit in arm and um trees respectively, and there's a couple of broken ones that need obvious fixes - parisc and avr32 check TIF_NOTIFY_RESUME only on one of two codepaths; fixes for that will happen in the next series" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (55 commits) unicore32: if there's no handler we need to restore sigmask, syscall or no syscall xtensa: add handling of TIF_NOTIFY_RESUME microblaze: drop 'oldset' argument of do_notify_resume() microblaze: handle TIF_NOTIFY_RESUME score: add handling of NOTIFY_RESUME to do_notify_resume() m68k: add TIF_NOTIFY_RESUME and handle it. sparc: kill ancient comment in sparc_sigaction() h8300: missing checks of __get_user()/__put_user() return values frv: missing checks of __get_user()/__put_user() return values cris: missing checks of __get_user()/__put_user() return values powerpc: missing checks of __get_user()/__put_user() return values sh: missing checks of __get_user()/__put_user() return values sparc: missing checks of __get_user()/__put_user() return values avr32: struct old_sigaction is never used m32r: struct old_sigaction is never used xtensa: xtensa_sigaction doesn't exist alpha: tidy signal delivery up score: don't open-code force_sigsegv() cris: don't open-code force_sigsegv() blackfin: don't open-code force_sigsegv() ...
2012-05-23Merge branch 'for-linus' of ↵Linus Torvalds14-287/+883
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace enhancements from Eric Biederman: "This is a course correction for the user namespace, so that we can reach an inexpensive, maintainable, and reasonably complete implementation. Highlights: - Config guards make it impossible to enable the user namespace and code that has not been converted to be user namespace safe. - Use of the new kuid_t type ensures that if you somehow get past the config guards the kernel will encounter type errors if you enable user namespaces and attempt to compile in code whose permission checks have not been updated to be user namespace safe. - All uids from child user namespaces are mapped into the initial user namespace before they are processed. Removing the need to add an additional check to see if the user namespace of the compared uids remains the same. - With the user namespaces compiled out the performance is as good or better than it is today. - For most operations absolutely nothing changes performance or operationally with the user namespace enabled. - The worst case performance I could come up with was timing 1 billion cache cold stat operations with the user namespace code enabled. This went from 156s to 164s on my laptop (or 156ns to 164ns per stat operation). - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value. Most uid/gid setting system calls treat these values specially anyway so attempting to use -1 as a uid would likely cause entertaining failures in userspace. - If setuid is called with a uid that can not be mapped setuid fails. I have looked at sendmail, login, ssh and every other program I could think of that would call setuid and they all check for and handle the case where setuid fails. - If stat or a similar system call is called from a context in which we can not map a uid we lie and return overflowuid. 
The LFS experience suggests not lying and returning an error code might be better, but the historical precedent with uids is different and I can not think of anything that would break by lying about a uid we can't map. - Capabilities are localized to the current user namespace making it safe to give the initial user in a user namespace all capabilities. My git tree covers all of the modifications needed to convert the core kernel and enough changes to make a system bootable to runlevel 1." Fix up trivial conflicts due to nearby independent changes in fs/stat.c * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits) userns: Silence silly gcc warning. cred: use correct cred accessor with regards to rcu read lock userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq userns: Convert cgroup permission checks to use uid_eq userns: Convert tmpfs to use kuid and kgid where appropriate userns: Convert sysfs to use kgid/kuid where appropriate userns: Convert sysctl permission checks to use kuid and kgids. userns: Convert proc to use kuid/kgid where appropriate userns: Convert ext4 to user kuid/kgid where appropriate userns: Convert ext3 to use kuid/kgid where appropriate userns: Convert ext2 to use kuid/kgid where appropriate. userns: Convert devpts to use kuid/kgid where appropriate userns: Convert binary formats to use kuid/kgid where appropriate userns: Add negative depends on entries to avoid building code that is userns unsafe userns: signal remove unnecessary map_cred_ns userns: Teach inode_capable to understand inodes whose uids map to other namespaces. userns: Fail exec for suid and sgid binaries with ids outside our user namespace. userns: Convert stat to return values mapped from kuids and kgids userns: Convert user specfied uids and gids in chown into kuids and kgid userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs ...
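The type-safety idea behind kuid_t, and the mapping of namespace-local uids into one global space before comparison, can be sketched as below. This is a simplified user-space illustration, not the kernel's definitions: the toy_ names, the single-extent map, and the numeric ranges in the usage note are assumptions made for the example, and overflow of lower_first plus the offset is not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrapping the id in a struct makes "kuid == 1000" a compile-time type error. */
typedef struct { uint32_t val; } toy_kuid_t;

#define TOY_INVALID_UID UINT32_MAX   /* models (uid_t)-1 as the error value */

/* One contiguous mapping: namespace ids [first, first+count) map to
 * global ids starting at lower_first. */
struct toy_uid_extent {
    uint32_t first;
    uint32_t lower_first;
    uint32_t count;
};

/* Map a namespace-local uid into the global space; invalid if unmapped. */
static toy_kuid_t toy_make_kuid(const struct toy_uid_extent *e, uint32_t uid)
{
    toy_kuid_t k = { TOY_INVALID_UID };

    assert(e->count > 0);
    if (uid >= e->first && uid - e->first < e->count)
        k.val = e->lower_first + (uid - e->first);
    return k;
}

static bool toy_uid_valid(toy_kuid_t k)
{
    return k.val != TOY_INVALID_UID;
}

/* Comparisons happen only on already-mapped ids, so no namespace check is needed. */
static bool toy_uid_eq(toy_kuid_t a, toy_kuid_t b)
{
    return a.val == b.val;
}
```

With an assumed extent of { 0, 100000, 1000 }, namespace uid 0 becomes global id 100000, while namespace uid 1000 falls outside the mapping and yields the invalid value, which is the case where a call such as setuid would fail.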
2012-05-23Merge tag 'module-for-linus' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus Pull module patches from Rusty Russell, who really sells them: "Three trivial patches of no real utility. Modules are boring." But to make things slightly more exciting, he adds: "Fortunately David Howells is looking to change this, with his module signing patchset. But that's for next merge window... Cheers, Rusty." * tag 'module-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: Guard check in module loader against integer overflow modpost: use proper kernel style for autogenerated files modpost: Stop grab_file() from leaking filedescriptors if fstat() fails
2012-05-23Merge tag 'pm-for-3.5' of ↵Linus Torvalds8-42/+635
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: - Implementation of opportunistic suspend (autosleep) and user space interface for manipulating wakeup sources. - Hibernate updates from Bojan Smojver and Minho Ban. - Updates of the runtime PM core and generic PM domains framework related to PM QoS. - Assorted fixes. * tag 'pm-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (25 commits) epoll: Fix user space breakage related to EPOLLWAKEUP PM / Domains: Make it possible to add devices to inactive domains PM / Hibernate: Use get_gendisk to verify partition if resume_file is integer format PM / Domains: Fix computation of maximum domain off time PM / Domains: Fix link checking when add subdomain PM / Sleep: User space wakeup sources garbage collector Kconfig option PM / Sleep: Make the limit of user space wakeup sources configurable PM / Documentation: suspend-and-cpuhotplug.txt: Fix typo PM / Domains: Cache device stop and domain power off governor results, v3 PM / Domains: Make device removal more straightforward PM / Sleep: Fix a mistake in a conditional in autosleep_store() epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready PM / QoS: Create device constraints objects on notifier registration PM / Runtime: Remove device fields related to suspend time, v2 PM / Domains: Rework default domain power off governor function, v2 PM / Domains: Rework default device stop governor function, v2 PM / Sleep: Add user space interface for manipulating wakeup sources, v3 PM / Sleep: Add "prevent autosleep time" statistics to wakeup sources PM / Sleep: Implement opportunistic sleep, v2 PM / Sleep: Add wakeup_source_activate and wakeup_source_deactivate tracepoints ...
2012-05-23ring-buffer: Check for valid buffer before changing sizeSteven Rostedt1-0/+5
On some machines the number of possible CPUS is not the same as the number of CPUs that is on the machine. Ftrace uses possible_cpus to update the tracing structures but the ring buffer only allocates per cpu buffers for online CPUs when they come up. When the wakeup tracer was enabled in such a case, the ftrace code enabled all possible cpu buffers, but the code in ring_buffer_resize() did not check to see if the buffer in question was allocated. Since boot up CPUs did not match possible CPUs it caused the following crash: BUG: unable to handle kernel NULL pointer dereference at 00000020 IP: [<c1097851>] ring_buffer_resize+0x16a/0x28d *pde = 00000000 Oops: 0000 [#1] PREEMPT SMP Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: [last unloaded: scsi_wait_scan] Pid: 1387, comm: bash Not tainted 3.4.0-test+ #13 /DG965MQ EIP: 0060:[<c1097851>] EFLAGS: 00010217 CPU: 0 EIP is at ring_buffer_resize+0x16a/0x28d EAX: f5a14340 EBX: f6026b80 ECX: 00000ff4 EDX: 00000ff3 ESI: 00000000 EDI: 00000002 EBP: f4275ecc ESP: f4275eb0 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 CR0: 80050033 CR2: 00000020 CR3: 34396000 CR4: 000007d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 Process bash (pid: 1387, ti=f4274000 task=f4380cb0 task.ti=f4274000) Stack: c109cf9a f6026b98 00000162 00160f68 00000006 00160f68 00000002 f4275ef0 c109d013 f4275ee8 c123b72a c1c0bf00 c1cc81dc 00000005 f4275f98 00000007 f4275f70 c109d0c7 7700000e 75656b61 00000070 f5e90900 f5c4e198 00000301 Call Trace: [<c109cf9a>] ? tracing_set_tracer+0x115/0x1e9 [<c109d013>] tracing_set_tracer+0x18e/0x1e9 [<c123b72a>] ? _copy_from_user+0x30/0x46 [<c109d0c7>] tracing_set_trace_write+0x59/0x7f [<c10ec01e>] ? fput+0x18/0x1c6 [<c11f8732>] ? security_file_permission+0x27/0x2b [<c10eaacd>] ? rw_verify_area+0xcf/0xf2 [<c10ec01e>] ? fput+0x18/0x1c6 [<c109d06e>] ? tracing_set_tracer+0x1e9/0x1e9 [<c10ead77>] vfs_write+0x8b/0xe3 [<c10ebead>] ? 
fget_light+0x30/0x81 [<c10eaf54>] sys_write+0x42/0x63 [<c1834fbf>] sysenter_do_call+0x12/0x28 This happens with the latency tracer as the ftrace code updates the saved max buffer via its cpumask and not with a global setting. Adding a check in ring_buffer_resize() to make sure the buffer being resized exists, fixes the problem. Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
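The shape of the fix, skipping per-CPU buffers that were never allocated, can be illustrated with a small user-space model. It is a sketch under assumptions rather than the actual ring_buffer_resize() change: the toy_ types, TOY_NR_CPUS, and the return value counting resized buffers are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS 4   /* "possible" CPUs in this model */

struct toy_cpu_buffer {
    unsigned long nr_pages;
};

struct toy_ring_buffer {
    /* One slot per possible CPU; NULL until that CPU has come online. */
    struct toy_cpu_buffer *buffers[TOY_NR_CPUS];
};

/* Resize every allocated per-CPU buffer; return how many were resized. */
static int toy_resize(struct toy_ring_buffer *rb, unsigned long nr_pages)
{
    int resized = 0;

    assert(nr_pages > 0);
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
        struct toy_cpu_buffer *cb = rb->buffers[cpu];

        if (!cb)        /* never allocated: skip instead of dereferencing NULL */
            continue;
        cb->nr_pages = nr_pages;
        resized++;
    }
    return resized;
}
```

Without the NULL check, the loop would dereference the empty slot of a CPU that is possible but not online, which corresponds to the NULL pointer dereference shown in the oops above.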
2012-05-23Merge branches 'perf-urgent-for-linus' and 'perf-core-for-linus' of ↵Linus Torvalds2-11/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: - Leftover AMD PMU driver fix from the end of the v3.4 stabilization cycle. - Late tools/perf/ changes that missed the first round: * endianness fixes * event parsing improvements * libtraceevent fixes factored out from trace-cmd * perl scripting engine fixes related to libtraceevent * testcase improvements * perf inject / pipe mode fixes * plus a kernel side fix * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Update event scheduling constraints for AMD family 15h models * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "sched, perf: Use a single callback into the scheduler" perf evlist: Show event attribute details perf tools: Bump default sample freq to 4 kHz perf buildid-list: Work better with pipe mode perf tools: Fix piped mode read code perf inject: Fix broken perf inject -b perf tools: rename HEADER_TRACE_INFO to HEADER_TRACING_DATA perf tools: Add union u64_swap type for swapping u64 data perf tools: Carry perf_event_attr bitfield throught different endians perf record: Fix documentation for branch stack sampling perf target: Add cpu flag to sample_type if target has cpu perf tools: Always try to build libtraceevent perf tools: Rename libparsevent to libtraceevent in Makefile perf script: Rename struct event to struct event_format in perl engine perf script: Explicitly handle known default print arg type perf tools: Add hardcoded name term for pmu events perf tools: Separate 'mem:' event scanner bits perf tools: Use allocated list for each parsed event perf tools: Add support for displaying event parser debug info perf test: Move parse event automated tests to separated object
2012-05-23Merge branch 'x86-fpu-for-linus' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull fpu state cleanups from Ingo Molnar: "This tree streamlines further aspects of FPU handling by eliminating the prepare_to_copy() complication and moving that logic to arch_dup_task_struct(). It also fixes the FPU dumps in threaded core dumps, removes an old (and now invalid) assumption plus micro-optimizes the exit path by avoiding an FPU save for dead tasks." Fixed up trivial add-add conflict in arch/sh/kernel/process.c that came in because we now do the FPU handling in arch_dup_task_struct() rather than the legacy (and now gone) prepare_to_copy(). * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, fpu: drop the fpu state during thread exit x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state() coredump: ensure the fpu state is flushed for proper multi-threaded core dump fork: move the real prepare_to_copy() users to arch_dup_task_struct()
2012-05-23Merge branch 'x86-extable-for-linus' of ↵Linus Torvalds1-1/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull exception table generation updates from Ingo Molnar: "The biggest change here is to allow the build-time sorting of the exception table, to speed up booting. This is achieved by the architecture enabling BUILDTIME_EXTABLE_SORT. This option is enabled for x86 and MIPS currently. On x86 a number of fixes and changes were needed to allow build-time sorting of the exception table, in particular a relocation invariant exception table format was needed. This required the abstracting out of exception table protocol and the removal of 20 years of accumulated assumptions about the x86 exception table format. While at it, this tree also cleans up various other aspects of exception handling, such as early(er) exception handling for rdmsr_safe() et al. All in one, as the result of these changes the x86 exception code is now pretty nice and modern. As an added bonus any regressions in this code will be early and violent crashes, so if you see any of those, you'll know whom to blame!" Fix up trivial conflicts in arch/{mips,x86}/Kconfig files due to nearby modifications of other core architecture options. 
* 'x86-extable-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits) Revert "x86, extable: Disable presorted exception table for now" scripts/sortextable: Handle relative entries, and other cleanups x86, extable: Switch to relative exception table entries x86, extable: Disable presorted exception table for now x86, extable: Add _ASM_EXTABLE_EX() macro x86, extable: Remove open-coded exception table entries in arch/x86/ia32/ia32entry.S x86, extable: Remove open-coded exception table entries in arch/x86/include/asm/xsave.h x86, extable: Remove open-coded exception table entries in arch/x86/include/asm/kvm_host.h x86, extable: Remove the now-unused __ASM_EX_SEC macros x86, extable: Remove open-coded exception table entries in arch/x86/xen/xen-asm_32.S x86, extable: Remove open-coded exception table entries in arch/x86/um/checksum_32.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/usercopy_32.c x86, extable: Remove open-coded exception table entries in arch/x86/lib/putuser.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/getuser.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/csum-copy_64.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/copy_user_nocache_64.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/copy_user_64.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/checksum_32.S x86, extable: Remove open-coded exception table entries in arch/x86/kernel/test_rodata.c x86, extable: Remove open-coded exception table entries in arch/x86/kernel/entry_64.S ...
2012-05-23Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/umlLinus Torvalds1-6/+0
Pull UML updates from Richard Weinberger: "Most changes are bug fixes and cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: missing checks of __put_user()/__get_user() return values um: stub_rt_sigsuspend isn't needed these days anymore um/x86: merge (and trim) 32- and 64-bit variants of ptrace.h irq: Remove irq_chip->release() um: Remove CONFIG_IRQ_RELEASE_METHOD um: Remove usage of irq_chip->release() um: Implement um_free_irq() um: Fix __swp_type() um: Implement a custom pte_same() function um: Add BUG() to do_ops()'s error path um: Remove unused variables um: bury unused _TIF_RESTORE_SIGMASK um: wrong sigmask saved in case of multiple sigframes um: add TIF_NOTIFY_RESUME um: ->restart_block.fn needs to be reset on sigreturn
2012-05-23Revert "sched, perf: Use a single callback into the scheduler"Jiri Olsa2-11/+12
This reverts commit cb04ff9ac424 ("sched, perf: Use a single callback into the scheduler"). Before this change was introduced, the process switch worked like this (with respect to perf event scheduling): schedule (prev, next) - schedule out all perf events for prev - switch to next - schedule in all perf events for current (next) After the commit, the process switch looks like: schedule (prev, next) - schedule out all perf events for prev - schedule in all perf events for (next) - switch to next The problem is that after we schedule perf events in, the pmu is enabled and we can receive events even before we make the switch to next, so "current" is still the prev process (event SAMPLE data are filled based on the value of the "current" process). That's exactly what we see for the test__PERF_RECORD test. We receive SAMPLES with the PID of the process that our tracee is scheduled from. Discussed with Peter Zijlstra: > Bah!, yeah I guess reverting is the right thing for now. Sad > though. > > So by having the two hooks we have a black-spot between them > where we receive no events at all, this black-spot covers the > hand-over of current and we thus don't receive the 'wrong' > events. > > I rather liked we could do away with both that black-spot and > clean up the code a little, but apparently people rely on it. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: acme@redhat.com Cc: paulus@samba.org Cc: cjashfor@linux.vnet.ibm.com Cc: fweisbec@gmail.com Cc: eranian@google.com Link: http://lkml.kernel.org/r/20120523111302.GC1638@m.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-23Guard check in module loader against integer overflowDavid Howells1-1/+2
The check: if (len < hdr->e_shoff + hdr->e_shnum * sizeof(Elf_Shdr)) may not work if there's an overflow in the right-hand side of the condition. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
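One common way to write such a bounds check so that it cannot overflow is to rearrange it into a subtraction and a division. The sketch below is an illustration of that technique and not necessarily the form the patch uses; table_fits() and its parameter names are hypothetical, with len, off, count and entry_size standing in for the file length, e_shoff, e_shnum and sizeof(Elf_Shdr).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when a table of "count" entries of
 * "entry_size" bytes, starting at byte offset "off", lies entirely within
 * a buffer of "len" bytes. The naive form
 *     len >= off + count * entry_size
 * can wrap around in both the multiplication and the addition; this form
 * only subtracts after checking off <= len and divides instead of
 * multiplying, so no intermediate value can overflow.
 */
static bool table_fits(uint64_t len, uint64_t off,
                       uint64_t count, uint64_t entry_size)
{
    assert(entry_size != 0);

    if (off > len)
        return false;
    return count <= (len - off) / entry_size;
}
```

For example, an offset close to UINT64_MAX makes the naive sum wrap to a small number and pass the check, whereas table_fits() rejects it at the off > len test.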