summaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel/uprobes.c
AgeCommit message (Collapse)AuthorFilesLines
2014-04-17uprobes/x86: Emulate unconditional relative jmp'sOleg Nesterov1-0/+38
Currently we always execute all insns out-of-line, including relative jmp's and call's. This assumes that even if regs->ip points to nowhere after the single-step, default_post_xol_op(UPROBE_FIX_IP) logic will update it correctly. However, this doesn't work if this regs->ip == xol_vaddr + insn_offset is not canonical. In this case CPU generates #GP and general_protection() kills the task which tries to execute this insn out-of-line. Now that we have uprobe_xol_ops we can teach uprobes to emulate these insns and solve the problem. This patch adds branch_xol_ops which has a single branch_emulate_op() hook, so far it can only handle rel8/32 relative jmp's. TODO: move ->fixup into the union along with rip_rela_target_address. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Jonathan Lebon <jlebon@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and ↵Oleg Nesterov1-22/+15
arch_uretprobe_hijack_return_addr() 1. Add the trivial sizeof_long() helper and change other callers of is_ia32_task() to use it. TODO: is_ia32_task() is not what we actually want, TS_COMPAT does not necessarily mean 32bit. Fortunately syscall-like insns can't be probed so it actually works, but it would be better to rename and use is_ia32_frame(). 2. As Jim pointed out "ncopied" in arch_uretprobe_hijack_return_addr() and adjust_ret_addr() should be named "nleft". And in fact only the last copy_to_user() in arch_uretprobe_hijack_return_addr() actually needs to inspect the non-zero error code. TODO: adjust_ret_addr() should die. We can always calculate the value we need to write into *regs->sp, just UPROBE_FIX_CALL should record insn->length. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17uprobes/x86: Teach arch_uprobe_post_xol() to restart if possibleOleg Nesterov1-4/+16
SIGILL after the failed arch_uprobe_post_xol() should only be used as a last resort, we should try to restart the probed insn if possible. Currently only adjust_ret_addr() can fail, and this can only happen if another thread unmapped our stack after we executed "call" out-of-line. Most probably the application if buggy, but even in this case it can have a handler for SIGSEGV/etc. And in theory it can be even correct and do something non-trivial with its memory. Of course we can't restart unconditionally, so arch_uprobe_post_xol() does this only if ->post_xol() returns -ERESTART even if currently this is the only possible error. default_post_xol_op(UPROBE_FIX_CALL) can always restart, but as Jim pointed out it should not forget to pop off the return address pushed by this insn executed out-of-line. Note: this is not "perfect", we do not want the extra handler_chain() after restart, but I think this is the best solution we can realistically do without too much uglifications. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17uprobes/x86: Send SIGILL if arch_uprobe_post_xol() failsOleg Nesterov1-4/+12
Currently the error from arch_uprobe_post_xol() is silently ignored. This doesn't look good and this can lead to the hard-to-debug problems. 1. Change handle_singlestep() to loudly complain and send SIGILL. Note: this only affects x86, ppc/arm can't fail. 2. Change arch_uprobe_post_xol() to call arch_uprobe_abort_xol() and avoid TF games if it is going to return an error. This can help to to analyze the problem, if nothing else we should not report ->ip = xol_slot in the core-file. Note: this means that handle_riprel_post_xol() can be called twice, but this is fine because it is idempotent. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17uprobes/x86: Conditionalize the usage of handle_riprel_insn()Oleg Nesterov1-4/+2
arch_uprobe_analyze_insn() calls handle_riprel_insn() at the start, but only "0xff" and "default" cases need the UPROBE_FIX_RIP_ logic. Move the callsite into "default" case and change the "0xff" case to fall-through. We are going to add the various hooks to handle the rip-relative jmp/call instructions (and more), we need this change to enforce the fact that the new code can not conflict with is_riprel_insn() logic which, after this change, can only be used by default_xol_ops. Note: arch_uprobe_abort_xol() still calls handle_riprel_post_xol() directly. This is fine unless another _xol_ops we may add later will need to reuse "UPROBE_FIX_RIP_AX|UPROBE_FIX_RIP_CX" bits in ->fixup. In this case we can add uprobe_xol_ops->abort() hook, which (perhaps) we will need anyway in the long term. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
2014-04-17uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->opsOleg Nesterov1-39/+68
Introduce arch_uprobe->ops pointing to the "struct uprobe_xol_ops", move the current UPROBE_FIX_{RIP*,IP,CALL} code into the default set of methods and change arch_uprobe_pre/post_xol() accordingly. This way we can add the new uprobe_xol_ops's to handle the insns which need the special processing (rip-relative jmp/call at least). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
2014-04-17uprobes/x86: move the UPROBE_FIX_{RIP,IP,CALL} code at the end of pre/post hooksOleg Nesterov1-16/+14
No functional changes. Preparation to simplify the review of the next change. Just reorder the code in arch_uprobe_pre/post_xol() functions so that UPROBE_FIX_{RIP_*,IP,CALL} logic goes to the end. Also change arch_uprobe_pre_xol() to use utask instead of autask, to make the code more symmetrical with arch_uprobe_post_xol(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-17uprobes/x86: Gather "riprel" functions togetherOleg Nesterov1-65/+53
Cosmetic. Move pre_xol_rip_insn() and handle_riprel_post_xol() up to the closely related handle_riprel_insn(). This way it is simpler to read and understand this code, and this lessens the number of ifdef's. While at it, update the comment in handle_riprel_post_xol() as Jim suggested. TODO: rename them somehow to make the naming consistent. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
2014-04-17uprobes/x86: Kill the "ia32_compat" check in handle_riprel_insn(), remove ↵Oleg Nesterov1-7/+3
"mm" arg Kill the "mm->context.ia32_compat" check in handle_riprel_insn(), if it is true insn_rip_relative() must return false. validate_insn_bits() passed "ia32_compat" as !x86_64 to insn_init(), and insn_rip_relative() checks insn->x86_64. Also, remove the no longer needed "struct mm_struct *mm" argument and the unnecessary "return" at the end. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2014-04-17uprobes/x86: Fold prepare_fixups() into arch_uprobe_analyze_insn()Oleg Nesterov1-63/+47
No functional changes, preparation. Shift the code from prepare_fixups() to arch_uprobe_analyze_insn() with the following modifications: - Do not call insn_get_opcode() again, it was already called by validate_insn_bits(). - Move "case 0xea" up. This way "case 0xff" can fall through to default case. - change "case 0xff" to use the nested "switch (MODRM_REG)", this way the code looks a bit simpler. - Make the comments look consistent. While at it, kill the initialization of rip_rela_target_address and ->fixups, we can rely on kzalloc(). We will add the new members into arch_uprobe, it would be better to assume that everything is zero by default. TODO: cleanup/fix the mess in validate_insn_bits() paths: - validate_insn_64bits() and validate_insn_32bits() should be unified. - "ifdef" is not used consistently; if good_insns_64 depends on CONFIG_X86_64, then probably good_insns_32 should depend on CONFIG_X86_32/EMULATION - the usage of mm->context.ia32_compat looks wrong if the task is TIF_X32. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2013-04-13uretprobes/x86: Hijack return addressAnton Arapov1-0/+29
Hijack the return address and replace it with a trampoline address. Signed-off-by: Anton Arapov <anton@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2013-02-08uprobes: Change handle_swbp() to expose bp_vaddr to handler_chain()Oleg Nesterov1-1/+0
Change handle_swbp() to set regs->ip = bp_vaddr in advance, this is what consumer->handler() needs but uprobe_get_swbp_addr() is not exported. This also simplifies the code and makes it more consistent across the supported architectures. handle_swbp() becomes the only caller of uprobe_get_swbp_addr(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
2013-02-08uprobes/x86: Change __skip_sstep() to actually skip the whole insnOleg Nesterov1-1/+4
__skip_sstep() doesn't update regs->ip. Currently this is correct but only "by accident" and it doesn't skip the whole insn. Change it to advance ->ip by the length of the detected 0x66*0x90 sequence. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-11-03uprobes/x86: Cleanup the single-stepping codeOleg Nesterov1-35/+19
No functional changes. Now that default arch_uprobe_enable/disable_step() helpers do nothing, x86 has no reason to reimplement them. Change arch_uprobe_*_xol() hooks to do the necessary work and remove the x86-specific hooks. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-10-07uprobes/x86: Only rep+nop can be emulated correctlyOleg Nesterov1-14/+2
__skip_sstep() correctly detects the "nontrivial" nop insns, but since it doesn't update regs->ip we can not really skip "0x0f 0x1f | 0x0f 0x19 | 0x87 0xc0", the probed application is killed by SIGILL'ed handle_swbp(). Remove these additional checks. If we want to implement this correctly we need to know the full insn length to update ->ip. rep* + nop is fine even without updating ->ip. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15uprobes/x86: Fix arch_uprobe_disable_step() && UTASK_SSTEP_TRAPPED interactionOleg Nesterov1-4/+10
arch_uprobe_disable_step() should also take UTASK_SSTEP_TRAPPED into account. In this case the probed insn was not executed, we need to clear X86_EFLAGS_TF if it was set by us and that is all. Again, this code will look more clean when we move it into arch_uprobe_post_xol() and arch_uprobe_abort_xol(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15uprobes/x86: Xol should send SIGTRAP if X86_EFLAGS_TF was setOleg Nesterov1-6/+13
arch_uprobe_disable_step() correctly preserves X86_EFLAGS_TF and returns to user-mode. But this means the application gets SIGTRAP only after the next insn. This means that UPROBE_CLEAR_TF logic is not really right. _enable should only record the state of X86_EFLAGS_TF, and _disable should check it separately from UPROBE_FIX_SETF. Remove arch_uprobe_task->restore_flags, add ->saved_tf instead, and change enable/disable accordingly. This assumes that the probed insn was not trapped, see the next patch. arch_uprobe_skip_sstep() logic has the same problem, change it to check X86_EFLAGS_TF and send SIGTRAP as well. We will cleanup this all after we fold enable/disable_step into pre/post_hol hooks. Note: send_sig(SIGTRAP) is not actually right, we need send_sigtrap(). But this needs more changes, handle_swbp() does the same and this is equally wrong. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15uprobes/x86: Do not (ab)use TIF_SINGLESTEP/user_*_single_step() for ↵Oleg Nesterov1-14/+18
single-stepping user_enable/disable_single_step() was designed for ptrace, it assumes a single user and does unnecessary and wrong things for uprobes. For example: - arch_uprobe_enable_step() can't trust TIF_SINGLESTEP, an application itself can set X86_EFLAGS_TF which must be preserved after arch_uprobe_disable_step(). - we do not want to set TIF_SINGLESTEP/TIF_FORCED_TF in arch_uprobe_enable_step(), this only makes sense for ptrace. - otoh we leak TIF_SINGLESTEP if arch_uprobe_disable_step() doesn't do user_disable_single_step(), the application will be killed after the next syscall. - arch_uprobe_enable_step() does access_process_vm() we do not need/want. Change arch_uprobe_enable/disable_step() to set/clear X86_EFLAGS_TF directly, this is much simpler and more correct. However, we need to clear TIF_BLOCKSTEP/DEBUGCTLMSR_BTF before executing the probed insn, add set_task_blockstep(false). Note: with or without this patch, there is another (hopefully minor) problem. A probed "pushf" insn can see the wrong X86_EFLAGS_TF set by uprobes. Perhaps we should change _disable to update the stack, or teach arch_uprobe_skip_sstep() to emulate this insn. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15uprobes/x86: Implement x86 specific arch_uprobe_*_stepSebastian Andrzej Siewior1-0/+33
The arch specific implementation behaves like user_enable_single_step() except that it does not disable single stepping if it was already enabled by ptrace. This allows the debugger to single step over an uprobe. The state of block stepping is not restored. It makes only sense together with TF and if that was enabled then the debugger is notified. Note: this is still not correct. For example, TIF_SINGLESTEP check is not right, the application itself can set X86_EFLAGS_TF. And otoh we leak TIF_SINGLESTEP (set by enable) if the probed insn is "popf". See the next patches, we need the changes in arch/x86/kernel/step.c first. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-06-08uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()Ananth N Mavinakayanahalli1-1/+2
On RISC architectures like powerpc, instructions are fixed size. Instruction analysis on such platforms is just a matter of (insn % 4). Pass the vaddr at which the uprobe is to be inserted so that arch_uprobe_analyze_insn() can flag misaligned registration requests. Signed-off-by: Ananth N Mavinakaynahalli <ananth@in.ibm.com> Cc: michael@ellerman.id.au Cc: antonb@thinktux.localdomain Cc: Paul Mackerras <paulus@samba.org> Cc: benh@kernel.crashing.org Cc: peterz@infradead.org Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: oleg@redhat.com Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20120608093257.GG13409@in.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-03-14uprobes/core: Handle breakpoint and singlestep exceptionsSrikar Dronamraju1-7/+258
Uprobes uses exception notifiers to get to know if a thread hit a breakpoint or a singlestep exception. When a thread hits a uprobe or is singlestepping post a uprobe hit, the uprobe exception notifier sets its TIF_UPROBE bit, which will then be checked on its return to userspace path (do_notify_resume() ->uprobe_notify_resume()), where the consumers handlers are run (in task context) based on the defined filters. Uprobe hits are thread specific and hence we need to maintain information about if a task hit a uprobe, what uprobe was hit, the slot where the original instruction was copied for xol so that it can be singlestepped with appropriate fixups. In some cases, special care is needed for instructions that are executed out of line (xol). These are architecture specific artefacts, such as handling RIP relative instructions on x86_64. Since the instruction at which the uprobe was inserted is executed out of line, architecture specific fixups are added so that the thread continues normal execution in the presence of a uprobe. Postpone the signals until we execute the probed insn. post_xol() path does a recalc_sigpending() before return to user-mode, this ensures the signal can't be lost. Uprobes relies on DIE_DEBUG notification to notify if a singlestep is complete. Adds x86 specific uprobe exception notifiers and appropriate hooks needed to determine a uprobe hit and subsequent post processing. Add requisite x86 fixups for xol for uprobes. Specific cases needing fixups include relative jumps (x86_64), calls, etc. Where possible, we check and skip singlestepping the breakpointed instructions. For now we skip single byte as well as few multibyte nop instructions. However this can be extended to other instructions too. Credits to Oleg Nesterov for suggestions/patches related to signal, breakpoint, singlestep handling code. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com> Cc: Linux-mm <linux-mm@kvack.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120313180011.29771.89027.sendpatchset@srdronam.in.ibm.com [ Performed various cleanliness edits ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-13uprobes/core: Make order of function parameters consistent across functionsSrikar Dronamraju1-7/+8
If a function takes struct uprobe or struct arch_uprobe, then it is passed as the first parameter. This is pure cleanup, no functional change intended. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com> Cc: Linux-mm <linux-mm@kvack.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120312092530.5379.18394.sendpatchset@srdronam.in.ibm.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-13uprobes/core: Make macro names consistentSrikar Dronamraju1-9/+9
Rename macros that refer to individual uprobe to start with UPROBE_ instead of UPROBES_. This is pure cleanup, no functional change intended. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com> Cc: Linux-mm <linux-mm@kvack.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120312092514.5379.36595.sendpatchset@srdronam.in.ibm.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-22uprobes/core: Move insn to arch specific structureSrikar Dronamraju1-30/+30
Few cleanups suggested by Ingo Molnar. - Rename struct uprobe_arch_info to struct arch_uprobe. - Move insn from struct uprobe to struct arch_uprobe. - Make arch specific uprobe functions to accept struct arch_uprobe instead of struct uprobe. - Move struct uprobe to kernel/uprobes.c from include/linux/uprobes.h Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Anton Arapov <anton@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Josh Stone <jistone@redhat.com> Link: http://lkml.kernel.org/r/20120222091602.15880.40249.sendpatchset@srdronam.in.ibm.com [ Made various small improvements ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-22uprobes/core: Make instruction tables volatileSrikar Dronamraju1-29/+32
Some versions of gcc spits a warning about the asm operand for test_bit and also causes the first long of the instruction table to be output. Fix is similar to 7115e3fc on arch/x86/kernel/kprobes.c Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Anton Arapov <anton@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Josh Stone <jistone@redhat.com> Link: http://lkml.kernel.org/r/20120222091535.15880.12502.sendpatchset@srdronam.in.ibm.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-17uprobes/core: Clean up, refactor and improve the codeIngo Molnar1-61/+68
Make the uprobes code readable to me: - improve the Kconfig text so that a mere mortal gets some idea what CONFIG_UPROBES=y is really about - do trivial renames to standardize around the uprobes_*() namespace - clean up and simplify various code flow details - separate basic blocks of functionality - line break artifact and white space related removal - use standard local varible definition blocks - use vertical spacing to make things more readable - remove unnecessary volatile - restructure comment blocks to make them more uniform and more readable in general Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Anton Arapov <anton@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Link: http://lkml.kernel.org/n/tip-ewbwhb8o6navvllsauu7k07p@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-17uprobes, mm, x86: Add the ability to install and remove uprobes breakpointsSrikar Dronamraju1-0/+412
Add uprobes support to the core kernel, with x86 support. This commit adds the kernel facilities, the actual uprobes user-space ABI and perf probe support comes in later commits. General design: Uprobes are maintained in an rb-tree indexed by inode and offset (the offset here is from the start of the mapping). For a unique (inode, offset) tuple, there can be at most one uprobe in the rb-tree. Since the (inode, offset) tuple identifies a unique uprobe, more than one user may be interested in the same uprobe. This provides the ability to connect multiple 'consumers' to the same uprobe. Each consumer defines a handler and a filter (optional). The 'handler' is run every time the uprobe is hit, if it matches the 'filter' criteria. The first consumer of a uprobe causes the breakpoint to be inserted at the specified address and subsequent consumers are appended to this list. On subsequent probes, the consumer gets appended to the existing list of consumers. The breakpoint is removed when the last consumer unregisters. For all other unregisterations, the consumer is removed from the list of consumers. Given a inode, we get a list of the mms that have mapped the inode. Do the actual registration if mm maps the page where a probe needs to be inserted/removed. We use a temporary list to walk through the vmas that map the inode. - The number of maps that map the inode, is not known before we walk the rmap and keeps changing. - extending vm_area_struct wasn't recommended, it's a size-critical data structure. - There can be more than one maps of the inode in the same mm. We add callbacks to the mmap methods to keep an eye on text vmas that are of interest to uprobes. When a vma of interest is mapped, we insert the breakpoint at the right address. Uprobe works by replacing the instruction at the address defined by (inode, offset) with the arch specific breakpoint instruction. We save a copy of the original instruction at the uprobed address. This is needed for: a. executing the instruction out-of-line (xol). b. instruction analysis for any subsequent fixups. c. restoring the instruction back when the uprobe is unregistered. We insert or delete a breakpoint instruction, and this breakpoint instruction is assumed to be the smallest instruction available on the platform. For fixed size instruction platforms this is trivially true, for variable size instruction platforms the breakpoint instruction is typically the smallest (often a single byte). Writing the instruction is done by COWing the page and changing the instruction during the copy, this even though most platforms allow atomic writes of the breakpoint instruction. This also mirrors the behaviour of a ptrace() memory write to a PRIVATE file map. The core worker is derived from KSM's replace_page() logic. In essence, similar to KSM: a. allocate a new page and copy over contents of the page that has the uprobed vaddr b. modify the copy and insert the breakpoint at the required address c. switch the original page with the copy containing the breakpoint d. flush page tables. replace_page() is being replicated here because of some minor changes in the type of pages and also because Hugh Dickins had plans to improve replace_page() for KSM specific work. Instruction analysis on x86 is based on instruction decoder and determines if an instruction can be probed and determines the necessary fixups after singlestep. Instruction analysis is done at probe insertion time so that we avoid having to repeat the same analysis every time a probe is hit. A lot of code here is due to the improvement/suggestions/inputs from Peter Zijlstra. Changelog: (v10): - Add code to clear REX.B prefix as suggested by Denys Vlasenko and Masami Hiramatsu. (v9): - Use insn_offset_modrm as suggested by Masami Hiramatsu. (v7): Handle comments from Peter Zijlstra: - Dont take reference to inode. (expect inode to uprobe_register to be sane). - Use PTR_ERR to set the return value. - No need to take reference to inode. - use PTR_ERR to return error value. - register and uprobe_unregister share code. (v5): - Modified del_consumer as per comments from Peter. - Drop reference to inode before dropping reference to uprobe. - Use i_size_read(inode) instead of inode->i_size. - Ensure uprobe->consumers is NULL, before __uprobe_unregister() is called. - Includes errno.h as recommended by Stephen Rothwell to fix a build issue on sparc defconfig - Remove restrictions while unregistering. - Earlier code leaked inode references under some conditions while registering/unregistering. - Continue the vma-rmap walk even if the intermediate vma doesnt meet the requirements. - Validate the vma found by find_vma before inserting/removing the breakpoint - Call del_consumer under mutex_lock. - Use hash locks. - Handle mremap. - Introduce find_least_offset_node() instead of close match logic in find_uprobe - Uprobes no more depends on MM_OWNER; No reference to task_structs while inserting/removing a probe. - Uses read_mapping_page instead of grab_cache_page so that the pages have valid content. - pass NULL to get_user_pages for the task parameter. - call SetPageUptodate on the new page allocated in write_opcode. - fix leaking a reference to the new page under certain conditions. - Include Instruction Decoder if Uprobes gets defined. - Remove const attributes for instruction prefix arrays. - Uses mm_context to know if the application is 32 bit. Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Also-written-by: Jim Keniston <jkenisto@us.ibm.com> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Roland McGrath <roland@hack.frob.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Anton Arapov <anton@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linux-mm <linux-mm@kvack.org> Link: http://lkml.kernel.org/r/20120209092642.GE16600@linux.vnet.ibm.com [ Made various small edits to the commit log ] Signed-off-by: Ingo Molnar <mingo@elte.hu>