summaryrefslogtreecommitdiffstats
path: root/Documentation/kernel-hacking
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/kernel-hacking')
-rw-r--r--Documentation/kernel-hacking/hacking.rst179
1 files changed, 98 insertions, 81 deletions
diff --git a/Documentation/kernel-hacking/hacking.rst b/Documentation/kernel-hacking/hacking.rst
index 75597e627791..1a456b60a7cf 100644
--- a/Documentation/kernel-hacking/hacking.rst
+++ b/Documentation/kernel-hacking/hacking.rst
@@ -56,7 +56,7 @@ interrupts. You can sleep, by calling :c:func:`schedule()`.
In user context, the ``current`` pointer (indicating the task we are
currently executing) is valid, and :c:func:`in_interrupt()`
-(``include/linux/interrupt.h``) is false.
+(``include/linux/preempt.h``) is false.
.. warning::
@@ -114,12 +114,12 @@ time, although different tasklets can run simultaneously.
Kuznetsov had at the time.
You can tell you are in a softirq (or tasklet) using the
-:c:func:`in_softirq()` macro (``include/linux/interrupt.h``).
+:c:func:`in_softirq()` macro (``include/linux/preempt.h``).
.. warning::
- Beware that this will return a false positive if a bh lock (see
- below) is held.
+ Beware that this will return a false positive if a
+ :ref:`botton half lock <local_bh_disable>` is held.
Some Basic Rules
================
@@ -154,9 +154,7 @@ The Linux kernel is portable
ioctls: Not writing a new system call
=====================================
-A system call generally looks like this
-
-::
+A system call generally looks like this::
asmlinkage long sys_mycall(int arg)
{
@@ -175,7 +173,9 @@ If all your routine does is read or write some parameter, consider
implementing a :c:func:`sysfs()` interface instead.
Inside the ioctl you're in user context to a process. When a error
-occurs you return a negated errno (see ``include/linux/errno.h``),
+occurs you return a negated errno (see
+``include/uapi/asm-generic/errno-base.h``,
+``include/uapi/asm-generic/errno.h`` and ``include/linux/errno.h``),
otherwise you return 0.
After you slept you should check if a signal occurred: the Unix/Linux
@@ -195,9 +195,7 @@ some data structure.
If you're doing longer computations: first think userspace. If you
**really** want to do it in kernel you should regularly check if you need
to give up the CPU (remember there is cooperative multitasking per CPU).
-Idiom:
-
-::
+Idiom::
cond_resched(); /* Will sleep */
@@ -231,26 +229,24 @@ Really.
Common Routines
===============
-:c:func:`printk()` ``include/linux/kernel.h``
----------------------------------------------
+:c:func:`printk()`
+------------------
+
+Defined in ``include/linux/printk.h``
:c:func:`printk()` feeds kernel messages to the console, dmesg, and
the syslog daemon. It is useful for debugging and reporting errors, and
can be used inside interrupt context, but use with caution: a machine
which has its console flooded with printk messages is unusable. It uses
a format string mostly compatible with ANSI C printf, and C string
-concatenation to give it a first "priority" argument:
-
-::
+concatenation to give it a first "priority" argument::
printk(KERN_INFO "i = %u\n", i);
-See ``include/linux/kernel.h``; for other ``KERN_`` values; these are
+See ``include/linux/kern_levels.h``; for other ``KERN_`` values; these are
interpreted by syslog as the level. Special case: for printing an IP
-address use
-
-::
+address use::
__be32 ipaddress;
printk(KERN_INFO "my ip: %pI4\n", &ipaddress);
@@ -270,8 +266,10 @@ overruns. Make sure that will be enough.
on top of its printf function: "Printf should not be used for
chit-chat". You should follow that advice.
-:c:func:`copy_[to/from]_user()` / :c:func:`get_user()` / :c:func:`put_user()` ``include/linux/uaccess.h``
----------------------------------------------------------------------------------------------------------
+:c:func:`copy_to_user()` / :c:func:`copy_from_user()` / :c:func:`get_user()` / :c:func:`put_user()`
+---------------------------------------------------------------------------------------------------
+
+Defined in ``include/linux/uaccess.h`` / ``asm/uaccess.h``
**[SLEEPS]**
@@ -297,8 +295,10 @@ The functions may sleep implicitly. This should never be called outside
user context (it makes no sense), with interrupts disabled, or a
spinlock held.
-:c:func:`kmalloc()`/:c:func:`kfree()` ``include/linux/slab.h``
---------------------------------------------------------------
+:c:func:`kmalloc()`/:c:func:`kfree()`
+-------------------------------------
+
+Defined in ``include/linux/slab.h``
**[MAY SLEEP: SEE BELOW]**
@@ -324,9 +324,9 @@ message, then maybe you called a sleeping allocation function from
interrupt context without ``GFP_ATOMIC``. You should really fix that.
Run, don't walk.
-If you are allocating at least ``PAGE_SIZE`` (``include/asm/page.h``)
-bytes, consider using :c:func:`__get_free_pages()`
-(``include/linux/mm.h``). It takes an order argument (0 for page sized,
+If you are allocating at least ``PAGE_SIZE`` (``asm/page.h`` or
+``asm/page_types.h``) bytes, consider using :c:func:`__get_free_pages()`
+(``include/linux/gfp.h``). It takes an order argument (0 for page sized,
1 for double page, 2 for four pages etc.) and the same memory priority
flag word as above.
@@ -344,24 +344,30 @@ routine.
Before inventing your own cache of often-used objects consider using a
slab cache in ``include/linux/slab.h``
-:c:func:`current()` ``include/asm/current.h``
----------------------------------------------
+:c:func:`current()`
+-------------------
+
+Defined in ``include/asm/current.h``
This global variable (really a macro) contains a pointer to the current
task structure, so is only valid in user context. For example, when a
process makes a system call, this will point to the task structure of
the calling process. It is **not NULL** in interrupt context.
-:c:func:`mdelay()`/:c:func:`udelay()` ``include/asm/delay.h`` ``include/linux/delay.h``
----------------------------------------------------------------------------------------
+:c:func:`mdelay()`/:c:func:`udelay()`
+-------------------------------------
+
+Defined in ``include/asm/delay.h`` / ``include/linux/delay.h``
The :c:func:`udelay()` and :c:func:`ndelay()` functions can be
used for small pauses. Do not use large values with them as you risk
overflow - the helper function :c:func:`mdelay()` is useful here, or
consider :c:func:`msleep()`.
-:c:func:`cpu_to_be32()`/:c:func:`be32_to_cpu()`/:c:func:`cpu_to_le32()`/:c:func:`le32_to_cpu()` ``include/asm/byteorder.h``
----------------------------------------------------------------------------------------------------------------------------
+:c:func:`cpu_to_be32()`/:c:func:`be32_to_cpu()`/:c:func:`cpu_to_le32()`/:c:func:`le32_to_cpu()`
+-----------------------------------------------------------------------------------------------
+
+Defined in ``include/asm/byteorder.h``
The :c:func:`cpu_to_be32()` family (where the "32" can be replaced
by 64 or 16, and the "be" can be replaced by "le") are the general way
@@ -375,8 +381,10 @@ to the given type, and return the converted value. The other variation
is the "in-situ" family, such as :c:func:`cpu_to_be32s()`, which
convert value referred to by the pointer, and return void.
-:c:func:`local_irq_save()`/:c:func:`local_irq_restore()` ``include/linux/irqflags.h``
--------------------------------------------------------------------------------------
+:c:func:`local_irq_save()`/:c:func:`local_irq_restore()`
+--------------------------------------------------------
+
+Defined in ``include/linux/irqflags.h``
These routines disable hard interrupts on the local CPU, and restore
them. They are reentrant; saving the previous state in their one
@@ -384,16 +392,23 @@ them. They are reentrant; saving the previous state in their one
enabled, you can simply use :c:func:`local_irq_disable()` and
:c:func:`local_irq_enable()`.
-:c:func:`local_bh_disable()`/:c:func:`local_bh_enable()` ``include/linux/interrupt.h``
---------------------------------------------------------------------------------------
+.. _local_bh_disable:
+
+:c:func:`local_bh_disable()`/:c:func:`local_bh_enable()`
+--------------------------------------------------------
+
+Defined in ``include/linux/bottom_half.h``
+
These routines disable soft interrupts on the local CPU, and restore
them. They are reentrant; if soft interrupts were disabled before, they
will still be disabled after this pair of functions has been called.
They prevent softirqs and tasklets from running on the current CPU.
-:c:func:`smp_processor_id()`() ``include/asm/smp.h``
-----------------------------------------------------
+:c:func:`smp_processor_id()`
+----------------------------
+
+Defined in ``include/linux/smp.h``
:c:func:`get_cpu()` disables preemption (so you won't suddenly get
moved to another CPU) and returns the current processor number, between
@@ -405,8 +420,10 @@ If you know you cannot be preempted by another task (ie. you are in
interrupt context, or have preemption disabled) you can use
smp_processor_id().
-``__init``/``__exit``/``__initdata`` ``include/linux/init.h``
--------------------------------------------------------------
+``__init``/``__exit``/``__initdata``
+------------------------------------
+
+Defined in ``include/linux/init.h``
After boot, the kernel frees up a special section; functions marked with
``__init`` and data structures marked with ``__initdata`` are dropped
@@ -415,10 +432,13 @@ initialization. ``__exit`` is used to declare a function which is only
required on exit: the function will be dropped if this file is not
compiled as a module. See the header file for use. Note that it makes no
sense for a function marked with ``__init`` to be exported to modules
-with :c:func:`EXPORT_SYMBOL()` - this will break.
+with :c:func:`EXPORT_SYMBOL()` or :c:func:`EXPORT_SYMBOL_GPL()`- this
+will break.
+
+:c:func:`__initcall()`/:c:func:`module_init()`
+----------------------------------------------
-:c:func:`__initcall()`/:c:func:`module_init()` ``include/linux/init.h``
------------------------------------------------------------------------
+Defined in ``include/linux/init.h`` / ``include/linux/module.h``
Many parts of the kernel are well served as a module
(dynamically-loadable parts of the kernel). Using the
@@ -438,8 +458,11 @@ to fail (unfortunately, this has no effect if the module is compiled
into the kernel). This function is called in user context with
interrupts enabled, so it can sleep.
-:c:func:`module_exit()` ``include/linux/init.h``
-------------------------------------------------
+:c:func:`module_exit()`
+-----------------------
+
+
+Defined in ``include/linux/module.h``
This macro defines the function to be called at module removal time (or
never, in the case of the file compiled into the kernel). It will only
@@ -450,8 +473,10 @@ it returns.
Note that this macro is optional: if it is not present, your module will
not be removable (except for 'rmmod -f').
-:c:func:`try_module_get()`/:c:func:`module_put()` ``include/linux/module.h``
-----------------------------------------------------------------------------
+:c:func:`try_module_get()`/:c:func:`module_put()`
+-------------------------------------------------
+
+Defined in ``include/linux/module.h``
These manipulate the module usage count, to protect against removal (a
module also can't be removed if another module uses one of its exported
@@ -472,8 +497,8 @@ Wait Queues ``include/linux/wait.h``
A wait queue is used to wait for someone to wake you up when a certain
condition is true. They must be used carefully to ensure there is no
-race condition. You declare a ``wait_queue_head_t``, and then processes
-which want to wait for that condition declare a ``wait_queue_t``
+race condition. You declare a :c:type:`wait_queue_head_t`, and then processes
+which want to wait for that condition declare a :c:type:`wait_queue_t`
referring to themselves, and place that in the queue.
Declaring
@@ -490,15 +515,15 @@ Queuing
Placing yourself in the waitqueue is fairly complex, because you must
put yourself in the queue before checking the condition. There is a
macro to do this: :c:func:`wait_event_interruptible()`
-``include/linux/wait.h`` The first argument is the wait queue head, and
+(``include/linux/wait.h``) The first argument is the wait queue head, and
the second is an expression which is evaluated; the macro returns 0 when
-this expression is true, or -ERESTARTSYS if a signal is received. The
+this expression is true, or ``-ERESTARTSYS`` if a signal is received. The
:c:func:`wait_event()` version ignores signals.
Waking Up Queued Tasks
----------------------
-Call :c:func:`wake_up()` ``include/linux/wait.h``;, which will wake
+Call :c:func:`wake_up()` (``include/linux/wait.h``);, which will wake
up every process in the queue. The exception is if one has
``TASK_EXCLUSIVE`` set, in which case the remainder of the queue will
not be woken. There are other variants of this basic function available
@@ -508,9 +533,9 @@ Atomic Operations
=================
Certain operations are guaranteed atomic on all platforms. The first
-class of operations work on ``atomic_t`` ``include/asm/atomic.h``; this
-contains a signed integer (at least 32 bits long), and you must use
-these functions to manipulate or read atomic_t variables.
+class of operations work on :c:type:`atomic_t` (``include/asm/atomic.h``);
+this contains a signed integer (at least 32 bits long), and you must use
+these functions to manipulate or read :c:type:`atomic_t` variables.
:c:func:`atomic_read()` and :c:func:`atomic_set()` get and set
the counter, :c:func:`atomic_add()`, :c:func:`atomic_sub()`,
:c:func:`atomic_inc()`, :c:func:`atomic_dec()`, and
@@ -534,7 +559,7 @@ true if the bit was previously set; these are particularly useful for
atomically setting flags.
It is possible to call these operations with bit indices greater than
-BITS_PER_LONG. The resulting behavior is strange on big-endian
+``BITS_PER_LONG``. The resulting behavior is strange on big-endian
platforms though so it is a good idea not to do this.
Symbols
@@ -546,14 +571,18 @@ be used anywhere in the kernel). However, for modules, a special
exported symbol table is kept which limits the entry points to the
kernel proper. Modules can also export symbols.
-:c:func:`EXPORT_SYMBOL()` ``include/linux/export.h``
-----------------------------------------------------
+:c:func:`EXPORT_SYMBOL()`
+-------------------------
+
+Defined in ``include/linux/export.h``
This is the classic method of exporting a symbol: dynamically loaded
modules will be able to use the symbol as normal.
-:c:func:`EXPORT_SYMBOL_GPL()` ``include/linux/export.h``
---------------------------------------------------------
+:c:func:`EXPORT_SYMBOL_GPL()`
+-----------------------------
+
+Defined in ``include/linux/export.h``
Similar to :c:func:`EXPORT_SYMBOL()` except that the symbols
exported by :c:func:`EXPORT_SYMBOL_GPL()` can only be seen by
@@ -579,11 +608,11 @@ Return Conventions
------------------
For code called in user context, it's very common to defy C convention,
-and return 0 for success, and a negative error number (eg. -EFAULT) for
+and return 0 for success, and a negative error number (eg. ``-EFAULT``) for
failure. This can be unintuitive at first, but it's fairly widespread in
the kernel.
-Using :c:func:`ERR_PTR()` ``include/linux/err.h``; to encode a
+Using :c:func:`ERR_PTR()` (``include/linux/err.h``) to encode a
negative error number into a pointer, and :c:func:`IS_ERR()` and
:c:func:`PTR_ERR()` to get it back out again: avoids a separate
pointer parameter for the error number. Icky, but in a good way.
@@ -603,9 +632,7 @@ Initializing structure members
------------------------------
The preferred method of initializing structures is to use designated
-initialisers, as defined by ISO C99, eg:
-
-::
+initialisers, as defined by ISO C99, eg::
static struct block_device_operations opt_fops = {
.open = opt_open,
@@ -716,18 +743,14 @@ Kernel Cantrips
Some favorites from browsing the source. Feel free to add to this list.
-``arch/x86/include/asm/delay.h:``
-
-::
+``arch/x86/include/asm/delay.h``::
#define ndelay(n) (__builtin_constant_p(n) ? \
((n) > 20000 ? __bad_ndelay() : __const_udelay((n) * 5ul)) : \
__ndelay(n))
-``include/linux/fs.h``:
-
-::
+``include/linux/fs.h``::
/*
* Kernel pointers have redundant information, so we can use a
@@ -741,9 +764,7 @@ Some favorites from browsing the source. Feel free to add to this list.
#define PTR_ERR(ptr) ((long)(ptr))
#define IS_ERR(ptr) ((unsigned long)(ptr) > (unsigned long)(-1000))
-``arch/x86/include/asm/uaccess_32.h:``
-
-::
+``arch/x86/include/asm/uaccess_32.h:``::
#define copy_to_user(to,from,n) \
(__builtin_constant_p(n) ? \
@@ -751,9 +772,7 @@ Some favorites from browsing the source. Feel free to add to this list.
__generic_copy_to_user((to),(from),(n)))
-``arch/sparc/kernel/head.S:``
-
-::
+``arch/sparc/kernel/head.S:``::
/*
* Sun people can't spell worth damn. "compatability" indeed.
@@ -772,9 +791,7 @@ Some favorites from browsing the source. Feel free to add to this list.
.asciz "compatible"
-``arch/sparc/lib/checksum.S:``
-
-::
+``arch/sparc/lib/checksum.S:``::
/* Sun, you just can't beat me, you just can't. Stop trying,
* give up. I'm serious, I am going to kick the living shit