summaryrefslogtreecommitdiffstats
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2012-05-31c/r: prctl: add ability to set new mm_struct::exe_fileCyrill Gorcunov1-0/+56
When we do restore we would like to have a way to setup a former mm_struct::exe_file so that /proc/pid/exe would point to the original executable file a process had at checkpoint time. For this the PR_SET_MM_EXE_FILE code is introduced. This option takes a file descriptor which will be set as a source for new /proc/$pid/exe symlink. Note it allows to change /proc/$pid/exe if there are no VM_EXECUTABLE vmas present for current process, simply because this feature is a special to C/R and mm::num_exe_file_vmas become meaningless after that. To minimize the amount of transition the /proc/pid/exe symlink might have, this feature is implemented in one-shot manner. Thus once changed the symlink can't be changed again. This should help sysadmins to monitor the symlinks over all process running in a system. In particular one could make a snapshot of processes and ring alarm if there unexpected changes of /proc/pid/exe's in a system. Note -- this feature is available iif CONFIG_CHECKPOINT_RESTORE is set and the caller must have CAP_SYS_RESOURCE capability granted, otherwise the request to change symlink will be rejected. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Matt Helsley <matthltc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31c/r: prctl: extend PR_SET_MM to set up more mm_struct entriesCyrill Gorcunov1-51/+83
During checkpoint we dump whole process memory to a file and the dump includes process stack memory. But among stack data itself, the stack carries additional parameters such as command line arguments, environment data and auxiliary vector. So when we do restore procedure and once we've restored stack data itself we need to setup mm_struct::arg_start/end, env_start/end, so restored process would be able to find command line arguments and environment data it had at checkpoint time. The same applies to auxiliary vector. For this reason additional PR_SET_MM_(ARG_START | ARG_END | ENV_START | ENV_END | AUXV) codes are introduced. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Tejun Heo <tj@kernel.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31syscalls, x86: add __NR_kcmp syscallCyrill Gorcunov3-0/+202
While doing the checkpoint-restore in the user space one need to determine whether various kernel objects (like mm_struct-s of file_struct-s) are shared between tasks and restore this state. The 2nd step can be solved by using appropriate CLONE_ flags and the unshare syscall, while there's currently no ways for solving the 1st one. One of the ways for checking whether two tasks share e.g. mm_struct is to provide some mm_struct ID of a task to its proc file, but showing such info considered to be not that good for security reasons. Thus after some debates we end up in conclusion that using that named 'comparison' syscall might be the best candidate. So here is it -- __NR_kcmp. It takes up to 5 arguments - the pids of the two tasks (which characteristics should be compared), the comparison type and (in case of comparison of files) two file descriptors. Lookups for pids are done in the caller's PID namespace only. At moment only x86 is supported and tested. [akpm@linux-foundation.org: fix up selftests, warnings] [akpm@linux-foundation.org: include errno.h] [akpm@linux-foundation.org: tweak comment text] Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Glauber Costa <glommer@parallels.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Tejun Heo <tj@kernel.org> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Valdis.Kletnieks@vt.edu Cc: Michal Marek <mmarek@suse.cz> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31sysctl: make kernel.ns_last_pid control dependent on CHECKPOINT_RESTORECyrill Gorcunov1-1/+5
For those who doesn't need C/R functionality there is no need to control last pid, ie the pid for the next fork() call. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31pidns: make killed children autoreapEric W. Biederman1-1/+6
Force SIGCHLD handling to SIG_IGN so that signals are not generated and so that the children autoreap. This increases the parallelize and in general the speed of network namespace shutdown. Note self reaping childrean can exist past zap_pid_ns_processess but they will all be reaped before we allow the pid namespace init task with pid == 1 to be reaped. [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Louis Rilling <louis.rilling@kerlabs.com> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31pidns: use task_active_pid_ns in do_notify_parentEric W. Biederman1-6/+5
Using task_active_pid_ns is more robust because it works even after we have called exit_namespaces. This change allows us to have parent processes that are zombies. Normally a zombie parent processes is crazy and the last thing you would want to have but in the case of not letting the init process of a pid namespace be reaped until all of it's children are dead and reaped a zombie parent process is exactly what we want. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Louis Rilling <louis.rilling@kerlabs.com> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31kernel/cpu.c: document clear_tasks_mm_cpumask()Anton Vorontsov1-0/+18
Add more comments on clear_tasks_mm_cpumask, plus adds a runtime check: the function is only suitable for offlined CPUs, and if called inappropriately, the kernel should scream aloud. [akpm@linux-foundation.org: tweak comment: s/walks up/walks/, use 80 cols] Suggested-by: Andrew Morton <akpm@linux-foundation.org> Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31cpu: introduce clear_tasks_mm_cpumask() helperAnton Vorontsov1-0/+26
Many architectures clear tasks' mm_cpumask like this: read_lock(&tasklist_lock); for_each_process(p) { if (p->mm) cpumask_clear_cpu(cpu, mm_cpumask(p->mm)); } read_unlock(&tasklist_lock); Depending on the context, the code above may have several problems, such as: 1. Working with task->mm w/o getting mm or grabing the task lock is dangerous as ->mm might disappear (exit_mm() assigns NULL under task_lock(), so tasklist lock is not enough). 2. Checking for process->mm is not enough because process' main thread may exit or detach its mm via use_mm(), but other threads may still have a valid mm. This patch implements a small helper function that does things correctly, i.e.: 1. We take the task's lock while whe handle its mm (we can't use get_task_mm()/mmput() pair as mmput() might sleep); 2. To catch exited main thread case, we use find_lock_task_mm(), which walks up all threads and returns an appropriate task (with task lock held). Also, Per Peter Zijlstra's idea, now we don't grab tasklist_lock in the new helper, instead we take the rcu read lock. We can do this because the function is called after the cpu is taken down and marked offline, so no new tasks will get this cpu set in their mm mask. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Cc: Richard Weinberger <richard@nod.at> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31fork: call complete_vfork_done() after clearing child_tid and flushing ↵Konstantin Khlebnikov1-3/+7
rss-counters Child should wake up the parent from vfork() only after finishing all operations with shared mm. There is no sense in using CLONE_CHILD_CLEARTID together with CLONE_VFORK, but it looks more accurate now. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31stack usage: add pid to warning printk in check_stack_usageTim Bird1-3/+3
In embedded systems, sometimes the same program (busybox) is the cause of multiple warnings. Outputting the pid with the program name in the warning printk helps distinguish which instances of a program are using the stack most. This is a small patch, but useful. Signed-off-by: Tim Bird <tim.bird@am.sony.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31cred: remove task_is_dead() from __task_cred() validationOleg Nesterov1-1/+1
Commit 8f92054e7ca1 ("CRED: Fix __task_cred()'s lockdep check and banner comment"): add the following validation condition: task->exit_state >= 0 to permit the access if the target task is dead and therefore unable to change its own credentials. OK, but afaics currently this can only help wait_task_zombie() which calls __task_cred() without rcu lock. Remove this validation and change wait_task_zombie() to use task_uid() instead. This means we do rcu_read_lock() only to shut up the lockdep, but we already do the same in, say, wait_task_stopped(). task_is_dead() should die, task->exit_state != 0 means that this task has passed exit_notify(), only do_wait-like code paths should use this. Unfortunately, we can't kill task_is_dead() right now, it has already acquired buggy users in drivers/staging. The fix already exists. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31kmod.c: fix kernel-doc warningRandy Dunlap1-1/+1
Warning(kernel/kmod.c:419): No description found for parameter 'depth' Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31kmod: move call_usermodehelper_fns() to .c file and unexport all it's helpersBoaz Harrosh1-3/+22
If we move call_usermodehelper_fns() to kmod.c file and EXPORT_SYMBOL it we can avoid exporting all it's helper functions: call_usermodehelper_setup call_usermodehelper_setfns call_usermodehelper_exec And make all of them static to kmod.c Since the optimizer will see all these as a single call site it will inline them inside call_usermodehelper_fns(). So we loose the call to _fns but gain 3 calls to the helpers. (Not that it matters) Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31kmod: convert two call sites to call_usermodehelper_fns()Boaz Harrosh1-11/+8
Both kernel/sys.c && security/keys/request_key.c where inlining the exact same code as call_usermodehelper_fns(); So simply convert these sites to directly use call_usermodehelper_fns(). Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31kmod: unexport call_usermodehelper_freeinfo()Boaz Harrosh1-2/+1
call_usermodehelper_freeinfo() is not used outside of kmod.c. So unexport it, and make it static to kmod.c Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31kernel/cpu_pm.c: fix various typosNicolas Pitre1-8/+8
Signed-off-by: Nicolas Pitre <nico@linaro.org> Acked-by: Colin Cross <ccross@android.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31kernel/irq/manage.c: use the pr_foo() infrastructure to prefix printksAndrew Morton1-6/+8
Use the module-wide pr_fmt() mechanism rather than open-coding "genirq: " everywhere. Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31sethostname/setdomainname: notify userspace when there is a change in ↵Sasikantha babu1-2/+2
uts_kern_table sethostname() and setdomainname() notify userspace on failure (without modifying uts_kern_table). Change things so that we only notify userspace on success, when uts_kern_table was actually modified. Signed-off-by: Sasikantha babu <sasikanth.v19@gmail.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: WANG Cong <amwang@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31kernel/resource.c: correct the comment of allocate_resource()Wei Yang1-2/+2
In the comment of allocate_resource(), the explanation of parameter max and min is not correct. Actually, these two parameters are used to specify the range of the resource that will be allocated, not the min/max size that will be allocated. Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31perf: Remove duplicate invocation on perf_event_for_eachNamhyung Kim1-1/+0
The @func callback was invoked twice for group leader when perf_event_for_each() called. It seems the commit 75f937f24bd9 ("perf_counter: Fix ctx->mutex vs counter ->mutex inversion") made the mistake during the change. Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1338443506-25009-1-git-send-email-namhyung.kim@lge.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-30Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-0/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar. * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) perf ui browser: Stop using 'self' perf annotate browser: Read perf config file for settings perf config: Allow '_' in config file variable names perf annotate browser: Make feature toggles global perf annotate browser: The idx_asm field should be used in asm only view perf tools: Convert critical messages to ui__error() perf ui: Make --stdio default when TUI is not supported tools lib traceevent: Silence compiler warning on 32bit build perf record: Fix branch_stack type in perf_record_opts perf tools: Reconstruct event with modifiers from perf_event_attr perf top: Fix counter name fixup when fallbacking to cpu-clock perf tools: fix thread_map__new_by_pid_str() memory leak in error path perf tools: Do not use _FORTIFY_SOURCE when DEBUG=1 is specified tools lib traceevent: Fix signature of create_arg_item() tools lib traceevent: Use proper function parameter type tools lib traceevent: Fix freeing arg on process_dynamic_array() tools lib traceevent: Fix a possibly wrong memory dereference tools lib traceevent: Fix a possible memory leak tools lib traceevent: Allow expressions in __print_symbolic() fields perf evlist: Explicititely initialize input_name ...
2012-05-30Merge branch 'for-3.5/core' of git://git.kernel.dk/linux-blockLinus Torvalds1-3/+2
Merge block/IO core bits from Jens Axboe: "This is a bit bigger on the core side than usual, but that is purely because we decided to hold off on parts of Tejun's submission on 3.4 to give it a bit more time to simmer. As a consequence, it's seen a long cycle in for-next. It contains: - Bug fix from Dan, wrong locking type. - Relax splice gifting restriction from Eric. - A ton of updates from Tejun, primarily for blkcg. This improves the code a lot, making the API nicer and cleaner, and also includes fixes for how we handle and tie policies and re-activate on switches. The changes also include generic bug fixes. - A simple fix from Vivek, along with a fix for doing proper delayed allocation of the blkcg stats." Fix up annoying conflict just due to different merge resolution in Documentation/feature-removal-schedule.txt * 'for-3.5/core' of git://git.kernel.dk/linux-block: (92 commits) blkcg: tg_stats_alloc_lock is an irq lock vmsplice: relax alignement requirements for SPLICE_F_GIFT blkcg: use radix tree to index blkgs from blkcg blkcg: fix blkcg->css ref leak in __blkg_lookup_create() block: fix elvpriv allocation failure handling block: collapse blk_alloc_request() into get_request() blkcg: collapse blkcg_policy_ops into blkcg_policy blkcg: embed struct blkg_policy_data in policy specific data blkcg: mass rename of blkcg API blkcg: style cleanups for blk-cgroup.h blkcg: remove blkio_group->path[] blkcg: blkg_rwstat_read() was missing inline blkcg: shoot down blkgs if all policies are deactivated blkcg: drop stuff unused after per-queue policy activation update blkcg: implement per-queue policy activation blkcg: add request_queue->root_blkg blkcg: make request_queue bypassing on allocation blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing blkcg: make blkg_conf_prep() take @pol and return with queue lock held blkcg: remove static policy ID enums ...
2012-05-30sched: Remove NULL assignment of dattr_curKamalesh Babulal1-1/+0
Remove explicit NULL assignment of static pointer dattr_cur from init_sched_domains(). Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120523091411.GG5005@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30sched: Remove the last NULL entry from sched_feat_namesHiroshi Shimamoto1-1/+0
No need to have the last NULL entry. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4FBF29E7.5020805@ct.jp.nec.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30sched: Make sched_feat_names constHiroshi Shimamoto1-1/+1
The strings sched_feat_names are never changed. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4FBF29B2.9030904@ct.jp.nec.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30sched/rt: Fix SCHED_RR across cgroupsColin Cross1-5/+10
task_tick_rt() has an optimization to only reschedule SCHED_RR tasks if they were the only element on their rq. However, with cgroups a SCHED_RR task could be the only element on its per-cgroup rq but still be competing with other SCHED_RR tasks in its parent's cgroup. In this case, the SCHED_RR task in the child cgroup would never yield at the end of its timeslice. If the child cgroup rt_runtime_us was the same as the parent cgroup rt_runtime_us, the task in the parent cgroup would starve completely. Modify task_tick_rt() to check that the task is the only task on its rq, and that the each of the scheduling entities of its ancestors is also the only entity on its rq. Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1337229266-15798-1-git-send-email-ccross@android.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30sched: Move nr_cpus_allowed out of 'struct sched_rt_entity'Peter Zijlstra3-17/+23
Since nr_cpus_allowed is used outside of sched/rt.c and wants to be used outside of there more, move it to a more natural site. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-kr61f02y9brwzkh6x53pdptm@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30sched: Make sure to not re-read variables after validationPeter Zijlstra1-4/+11
We could re-read rq->rt_avg after we validated it was smaller than total, invalidating the check and resulting in an unintended negative. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/1337688268.9698.29.camel@twins Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30sched: Fix SD_OVERLAPPeter Zijlstra2-7/+25
SD_OVERLAP exists to allow overlapping groups, overlapping groups appear in NUMA topologies that aren't fully connected. The typical result of not fully connected NUMA is that each cpu (or rather node) will have different spans for a particular distance. However due to how sched domains are traversed -- only the first cpu in the mask goes one level up -- the next level only cares about the spans of the cpus that went up. Due to this two things were observed to be broken: - build_overlap_sched_groups() -- since its possible the cpu we're building the groups for exists in multiple (or all) groups, the selection criteria of the first group didn't ensure there was a cpu for which is was true that cpumask_first(span) == cpu. Thus load- balancing would terminate. - update_group_power() -- assumed that the cpu span of the first group of the domain was covered by all groups of the child domain. The above explains why this isn't true, so deal with it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/1337788843.9783.14.camel@laptop Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30sched: Don't try allocating memory from offline nodesPeter Zijlstra1-1/+1
Allocators don't appreciate it when you try and allocate memory from offline nodes. Reported-and-tested-by: Tony Luck <tony.luck@intel.com> Reported-and-tested-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-epfc1io9whb7o22bcujf31vn@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30sched/nohz: Fix rq->cpu_load calculations some morePeter Zijlstra2-10/+44
Follow up on commit 556061b00 ("sched/nohz: Fix rq->cpu_load[] calculations") since while that fixed the busy case it regressed the mostly idle case. Add a callback from the nohz exit to also age the rq->cpu_load[] array. This closes the hole where either there was no nohz load balance pass during the nohz, or there was a 'significant' amount of idle time between the last nohz balance and the nohz exit. So we'll update unconditionally from the tick to not insert any accidental 0 load periods while busy, and we try and catch up from nohz idle balance and nohz exit. Both these are still prone to missing a jiffy, but that has always been the case. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: pjt@google.com Cc: Venkatesh Pallipadi <venki@google.com> Link: http://lkml.kernel.org/n/tip-kt0trz0apodbf84ucjfdbr1a@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30Merge branch 'linus' into perf/urgentIngo Molnar21-940/+3723
Merge back Linus's latest branch so that we pick up the uprobes changes. ( I tested this branch locally and while it's one from the middle of the merge window it's a good one to base further work off. ) Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-29brlocks/lglocks: turn into functionsAndi Kleen2-1/+90
lglocks and brlocks are currently generated with some complicated macros in lglock.h. But there's no reason to not just use common utility functions and put all the data into a common data structure. Since there are at least two users it makes sense to share this code in a library. This is also easier maintainable than a macro forest. This will also make it later possible to dynamically allocate lglocks and also use them in modules (this would both still need some additional, but now straightforward, code) [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29vsprintf: fix %ps on non symbols when using kallsymsStephen Boyd1-8/+24
Using %ps in a printk format will sometimes fail silently and print the empty string if the address passed in does not match a symbol that kallsyms knows about. But using %pS will fall back to printing the full address if kallsyms can't find the symbol. Make %ps act the same as %pS by falling back to printing the address. While we're here also make %ps print the module that a symbol comes from so that it matches what %pS already does. Take this simple function for example (in a module): static void test_printk(void) { int test; pr_info("with pS: %pS\n", &test); pr_info("with ps: %ps\n", &test); } Before this patch: with pS: 0xdff7df44 with ps: After this patch: with pS: 0xdff7df44 with ps: 0xdff7df44 Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29rescounters: add res_counter_uncharge_until()Frederic Weisbecker1-2/+8
When killing a res_counter which is a child of other counter, we need to do res_counter_uncharge(child, xxx) res_counter_charge(parent, xxx) This is not atomic and wastes CPU. This patch adds res_counter_uncharge_until(). This function's uncharge propagates to ancestors until specified res_counter. res_counter_uncharge_until(child, parent, xxx) Now the operation is atomic and efficient. Signed-off-by: Frederic Weisbecker <fweisbec@redhat.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Ying Han <yinghan@google.com> Cc: Glauber Costa <glommer@parallels.com> Reviewed-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29kernel: cgroup: push rcu read locking from css_is_ancestor() to callsiteJohannes Weiner1-10/+10
Library functions should not grab locks when the callsites can do it, even if the lock nests like the rcu read-side lock does. Push the rcu_read_lock() from css_is_ancestor() to its single user, mem_cgroup_same_or_subtree() in preparation for another user that may already hold the rcu read-side lock. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29mm/fork: fix overflow in vma length when copying mmap on cloneSiddhesh Poyarekar1-1/+2
The vma length in dup_mmap is calculated and stored in a unsigned int, which is insufficient and hence overflows for very large maps (beyond 16TB). The following program demonstrates this: #include <stdio.h> #include <unistd.h> #include <sys/mman.h> #define GIG 1024 * 1024 * 1024L #define EXTENT 16393 int main(void) { int i, r; void *m; char buf[1024]; for (i = 0; i < EXTENT; i++) { m = mmap(NULL, (size_t) 1 * 1024 * 1024 * 1024L, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, 0, 0); if (m == (void *)-1) printf("MMAP Failed: %d\n", m); else printf("%d : MMAP returned %p\n", i, m); r = fork(); if (r == 0) { printf("%d: successed\n", i); return 0; } else if (r < 0) printf("FORK Failed: %d\n", r); else if (r > 0) wait(NULL); } return 0; } Increase the storage size of the result to unsigned long, which is sufficient for storing the difference between addresses. Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29mm: remove swap token codeRik van Riel1-9/+0
The swap token code no longer fits in with the current VM model. It does not play well with cgroups or the better NUMA placement code in development, since we have only one swap token globally. It also has the potential to mess with scalability of the system, by increasing the number of non-reclaimable pages on the active and inactive anon LRU lists. Last but not least, the swap token code has been broken for a year without complaints, as reported by Konstantin Khlebnikov. This suggests we no longer have much use for it. The days of sub-1G memory systems with heavy use of swap are over. If we ever need thrashing reducing code in the future, we will have to implement something that does scale. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Hugh Dickins <hughd@google.com> Acked-by: Bob Picco <bpicco@meloft.net> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-27cgroup: superblock can't be released with active dentriesTejun Heo1-3/+14
48ddbe1946 "cgroup: make css->refcnt clearing on cgroup removal optional" allowed a css to linger after the associated cgroup is removed. As a css holds a reference on the cgroup's dentry, it means that cgroup dentries may linger for a while. cgroup_create() does grab an active reference on the superblock to prevent it from going away while there are !root cgroups; however, the reference is put from cgroup_diput() which is invoked on cgroup removal, so cgroup dentries which are removed but persisting due to lingering csses already have released their superblock active refs allowing superblock to be killed while those dentries are around. Given the right condition, this makes cgroup_kill_sb() call kill_litter_super() with dentries with non-zero d_count leading to BUG() in shrink_dcache_for_umount_subtree(). Fix it by adding cgroup_dops->d_release() operation and moving deactivate_super() to it. cgroup_diput() now marks dentry->d_fsdata with itself if superblock should be deactivated and cgroup_d_release() deactivates the superblock on dentry release. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Sasha Levin <levinsasha928@gmail.com> Tested-by: Sasha Levin <levinsasha928@gmail.com> LKML-Reference: <CA+1xoqe5hMuxzCRhMy7J0XchDk2ZnuxOHJKikROk1-ReAzcT6g@mail.gmail.com> Acked-by: Li Zefan <lizefan@huawei.com>
2012-05-25tick: Move skew_tick option into the HIGH_RES_TIMER sectionThomas Gleixner1-8/+8
commit 5307c95 (tick: Add tick skew boot option) broke the !CONFIG_HIGH_RES_TIMERS build. Move the boot option parsing into the CONFIG_HIGH_RES_TIMERS section. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Mike Galbraith <mgalbraith@suse.de>
2012-05-25clockevents: Make clockevents_config() a global symbolMagnus Damm1-2/+1
Make clockevents_config() into a global symbol to allow it to be used by compiled-in clockevent drivers. This is needed by drivers that want to update the timer frequency after registration time. Signed-off-by: Magnus Damm <damm@opensource.se> Tested-by: Simon Horman <horms@verge.net.au> Cc: arnd@arndb.de Cc: johnstul@us.ibm.com Cc: rjw@sisk.pl Cc: lethal@linux-sh.org Cc: gregkh@linuxfoundation.org Cc: olof@lixom.net Cc: Magnus Damm <magnus.damm@gmail.com> Link: http://lkml.kernel.org/r/20120509143934.27521.46553.sendpatchset@w520 Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-05-25tick: Add tick skew boot optionMike Galbraith1-0/+18
Let the user decide whether power consumption or jitter is the more important consideration for their machines. Quoting removal commit af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867: "Historically, Linux has tried to make the regular timer tick on the various CPUs not happen at the same time, to avoid contention on xtime_lock. Nowadays, with the tickless kernel, this contention no longer happens since time keeping and updating are done differently. In addition, this skew is actually hurting power consumption in a measurable way on many-core systems." Problems: - Contrary to the above, systems do encounter contention on both xtime_lock and RCU structure locks when the tick is synchronized. - Moderate sized RT systems suffer intolerable jitter due to the tick being synchronized. - SGI reports the same for their large systems. - Fully utilized systems reap no power saving benefit from skew removal, but do suffer from resulting induced lock contention. - 0209f649 rcu: limit rcu_node leaf-level fanout This patch was born to combat lock contention which testing showed to have been _induced by_ skew removal. Skew the tick, contention disappeared virtually completely. Signed-off-by: Mike Galbraith <mgalbraith@suse.de> Link: http://lkml.kernel.org/r/1336472458.21924.78.camel@marge.simpson.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-05-24Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+12
Pull KVM changes from Avi Kivity: "Changes include additional instruction emulation, page-crossing MMIO, faster dirty logging, preventing the watchdog from killing a stopped guest, module autoload, a new MSI ABI, and some minor optimizations and fixes. Outside x86 we have a small s390 and a very large ppc update. Regarding the new (for kvm) rebaseless workflow, some of the patches that were merged before we switch trees had to be rebased, while others are true pulls. In either case the signoffs should be correct now." Fix up trivial conflicts in Documentation/feature-removal-schedule.txt arch/powerpc/kvm/book3s_segment.S and arch/x86/include/asm/kvm_para.h. I suspect the kvm_para.h resolution ends up doing the "do I have cpuid" check effectively twice (it was done differently in two different commits), but better safe than sorry ;) * 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (125 commits) KVM: make asm-generic/kvm_para.h have an ifdef __KERNEL__ block KVM: s390: onereg for timer related registers KVM: s390: epoch difference and TOD programmable field KVM: s390: KVM_GET/SET_ONEREG for s390 KVM: s390: add capability indicating COW support KVM: Fix mmu_reload() clash with nested vmx event injection KVM: MMU: Don't use RCU for lockless shadow walking KVM: VMX: Optimize %ds, %es reload KVM: VMX: Fix %ds/%es clobber KVM: x86 emulator: convert bsf/bsr instructions to emulate_2op_SrcV_nobyte() KVM: VMX: unlike vmcs on fail path KVM: PPC: Emulator: clean up SPR reads and writes KVM: PPC: Emulator: clean up instruction parsing kvm/powerpc: Add new ioctl to retreive server MMU infos kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM KVM: PPC: bookehv: Fix r8/r13 storing in level exception handler KVM: PPC: Book3S: Enable IRQs during exit handling KVM: PPC: Fix PR KVM on POWER7 bare metal KVM: PPC: Fix stbux emulation KVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields ...
2012-05-24smpboot, idle: Fix comment mismatch over idle_threads_init()Srivatsa S. Bhat1-4/+7
The comment over idle_threads_init() really talks about the functionality of idle_init(). Move that comment to idle_init(), and add a suitable comment over idle_threads_init(). Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: suresh.b.siddha@intel.com Cc: venki@google.com Cc: nikunj@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/20120524151100.2549.66501.stgit@srivatsabhat.in.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-05-24smpboot, idle: Optimize calls to smp_processor_id() in idle_threads_init()Srivatsa S. Bhat1-2/+4
While trying to initialize idle threads for all cpus, idle_threads_init() calls smp_processor_id() in a loop, which is unnecessary. The intent is to initialize idle threads for all non-boot cpus. So just use a variable to note the boot cpu and use it in the loop. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: suresh.b.siddha@intel.com Cc: venki@google.com Cc: nikunj@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/20120524151055.2549.64309.stgit@srivatsabhat.in.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-05-24Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6Linus Torvalds1-18/+88
Pull irqdomain changes from Grant Likely: "Minor changes and fixups for irqdomain infrastructure. The most important change adds the ability to remove a registered irqdomain." * tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6: irqdomain: Document size parameter of irq_domain_add_linear() irqdomain: trivial pr_fmt conversion. irqdomain: Kill off duplicate definitions. irqdomain: Make irq_domain_simple_map() static. irqdomain: Export remaining public API symbols. irqdomain: Support removal of IRQ domains.
2012-05-24genirq: Introduce irq_do_set_affinity() to reduce duplicated codeJiang Liu3-28/+27
All invocations of chip->irq_set_affinity() are doing the same return value checks. Let them all use a common function. [ tglx: removed the silly likely while at it ] Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Keping Chen <chenkeping@huawei.com> Link: http://lkml.kernel.org/r/1333120296-13563-3-git-send-email-jiang.liu@huawei.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-05-24Merge branch 'timers-core-for-linus' of ↵Linus Torvalds3-15/+55
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner. Various trivial conflict fixups in arch Kconfig due to addition of unrelated entries nearby. And one slightly more subtle one for sparc32 (new user of GENERIC_CLOCKEVENTS), fixed up as per Thomas. * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) timekeeping: Fix a few minor newline issues. time: remove obsolete declaration ntp: Fix a stale comment and a few stray newlines. ntp: Correct TAI offset during leap second timers: Fixup the Kconfig consolidation fallout x86: Use generic time config unicore32: Use generic time config um: Use generic time config tile: Use generic time config sparc: Use: generic time config sh: Use generic time config score: Use generic time config s390: Use generic time config openrisc: Use generic time config powerpc: Use generic time config mn10300: Use generic time config mips: Use generic time config microblaze: Use generic time config m68k: Use generic time config m32r: Use generic time config ...
2012-05-24genirq: Add IRQS_PENDING for nested and simple irqNing Jiang1-2/+6
Every interrupt which is an active wakeup source needs the ability to abort suspend if there is a pending irq. Right now only edge and level irqs can do that. | +---------+ | INTC | +---------+ | GPIO_IRQ +------------+ | gpio-exp | +------------+ | | GPIO0_IRQ GPIO1_IRQ In the above diagram, gpio expander has irq number GPIO_IRQ, it is connected with two sub GPIO pins, GPIO0 and GPIO1. During suspend, we set IRQF_NO_SUSPEND for GPIO_IRQ so that gpio expander driver can handle the sub irq GPIO0_IRQ and GPIO1_IRQ, and these two irqs themselves can further be handled by simple or nested irq in some drivers(typically gpio and mfd driver). If they are used as wakeup sources during suspend, we want them to be able to abort suspend too. Setting IRQS_PENDING flag in handle_nested_irq() and handle_simple_irq() when the irq is disabled allows check_wakeup_irqs() to identify such irqs as source for aborting suspend. Signed-off-by: Ning Jiang <ning.n.jiang@gmail.com> Cc: rjw@sisk.pl Link: http://lkml.kernel.org/r/CAH3Oq6T905%2B3fkF43NAMMFvJvq7dsk_so6T2vQ8ZJrA5xiU3YA@mail.gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-05-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-1/+2
Pull more networking updates from David Miller: "Ok, everything from here on out will be bug fixes." 1) One final sync of wireless and bluetooth stuff from John Linville. These changes have all been in his tree for more than a week, and therefore have had the necessary -next exposure. John was just away on a trip and didn't have a change to send the pull request until a day or two ago. 2) Put back some defines in user exposed header file areas that were removed during the tokenring purge. From Stephen Hemminger and Paul Gortmaker. 3) A bug fix for UDP hash table allocation got lost in the pile due to one of those "you got it.. no I've got it.." situations. :-) From Tim Bird. 4) SKB coalescing in TCP needs to have stricter checks, otherwise we'll try to coalesce overlapping frags and crash. Fix from Eric Dumazet. 5) RCU routing table lookups can race with free_fib_info(), causing crashes when we deref the device pointers in the route. Fix by releasing the net device in the RCU callback. From Yanmin Zhang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (293 commits) tcp: take care of overlaps in tcp_try_coalesce() ipv4: fix the rcu race between free_fib_info and ip_route_output_slow mm: add a low limit to alloc_large_system_hash ipx: restore token ring define to include/linux/ipx.h if: restore token ring ARP type to header xen: do not disable netfront in dom0 phy/micrel: Fix ID of KSZ9021 mISDN: Add X-Tensions USB ISDN TA XC-525 gianfar:don't add FCB length to hard_header_len Bluetooth: Report proper error number in disconnection Bluetooth: Create flags for bt_sk() Bluetooth: report the right security level in getsockopt Bluetooth: Lock the L2CAP channel when sending Bluetooth: Restore locking semantics when looking up L2CAP channels Bluetooth: Fix a redundant and problematic incoming MTU check Bluetooth: Add support for Foxconn/Hon Hai AR5BBU22 0489:E03C Bluetooth: Fix EIR data generation for mgmt_device_found Bluetooth: Fix Inquiry with RSSI event mask Bluetooth: improve readability of l2cap_seq_list code Bluetooth: Fix skb length calculation ...