diff options
author | David S. Miller <davem@davemloft.net> | 2019-03-04 10:14:31 -0800 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2019-03-04 10:14:31 -0800 |
commit | f7fb7c1a1c8f86005d34f28278524213c521f761 (patch) | |
tree | 05a3b21c5e0b1667b106153fc0f0eb88cd980ab2 /Documentation | |
parent | 8c4238df4d0cc3420c5ee14b54d200d74267cfe5 (diff) | |
parent | 87dab7c3d54ce0f1ff6b54840bf7279d0944bc6a (diff) | |
download | linux-f7fb7c1a1c8f86005d34f28278524213c521f761.tar.bz2 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-03-04
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add AF_XDP support to libbpf. Rationale is to facilitate writing
AF_XDP applications by offering higher-level APIs that hide many
of the details of the AF_XDP uapi. Sample programs are converted
over to this new interface as well, from Magnus.
2) Introduce a new cant_sleep() macro for annotation of functions
that cannot sleep and use it in BPF_PROG_RUN() to assert that
BPF programs run under preemption disabled context, from Peter.
3) Introduce per BPF prog stats in order to monitor the usage
of BPF; this is controlled by kernel.bpf_stats_enabled sysctl
knob where monitoring tools can make use of this to efficiently
determine the average cost of programs, from Alexei.
4) Split up BPF selftest's test_progs similarly as we already
did with test_verifier. This allows to further reduce merge
conflicts in future and to get more structure into our
quickly growing BPF selftest suite, from Stanislav.
5) Fix a bug in BTF's dedup algorithm which can cause an infinite
loop in some circumstances; also various BPF doc fixes and
improvements, from Andrii.
6) Various BPF sample cleanups and migration to libbpf in order
to further isolate the old sample loader code (so we can get
rid of it at some point), from Jakub.
7) Add a new BPF helper for BPF cgroup skb progs that allows
to set ECN CE code point and a Host Bandwidth Manager (HBM)
sample program for limiting the bandwidth used by v2 cgroups,
from Lawrence.
8) Enable write access to skb->queue_mapping from tc BPF egress
programs in order to let BPF pick TX queue, from Jesper.
9) Fix a bug in BPF spinlock handling for map-in-map which did
not propagate spin_lock_off to the meta map, from Yonghong.
10) Fix a bug in the new per-CPU BPF prog counters to properly
initialize stats for each CPU, from Eric.
11) Add various BPF helper prototypes to selftest's bpf_helpers.h,
from Willem.
12) Fix various BPF samples bugs in XDP and tracing progs,
from Toke, Daniel and Yonghong.
13) Silence preemption splat in test_bpf after BPF_PROG_RUN()
enforces it now everywhere, from Anders.
14) Fix a signedness bug in libbpf's btf_dedup_ref_type() to
get error handling working, from Dan.
15) Fix bpftool documentation and auto-completion with regards
to stream_{verdict,parser} attach types, from Alban.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/bpf/bpf_design_QA.rst | 24 | ||||
-rw-r--r-- | Documentation/bpf/btf.rst | 316 | ||||
-rw-r--r-- | Documentation/networking/af_xdp.rst | 36 | ||||
-rw-r--r-- | Documentation/networking/filter.txt | 2 |
4 files changed, 195 insertions, 183 deletions
diff --git a/Documentation/bpf/bpf_design_QA.rst b/Documentation/bpf/bpf_design_QA.rst index 7cc9e368c1e9..10453c627135 100644 --- a/Documentation/bpf/bpf_design_QA.rst +++ b/Documentation/bpf/bpf_design_QA.rst @@ -36,27 +36,27 @@ consideration important quirks of other architectures) and defines calling convention that is compatible with C calling convention of the linux kernel on those architectures. -Q: can multiple return values be supported in the future? +Q: Can multiple return values be supported in the future? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A: NO. BPF allows only register R0 to be used as return value. -Q: can more than 5 function arguments be supported in the future? +Q: Can more than 5 function arguments be supported in the future? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A: NO. BPF calling convention only allows registers R1-R5 to be used as arguments. BPF is not a standalone instruction set. (unlike x64 ISA that allows msft, cdecl and other conventions) -Q: can BPF programs access instruction pointer or return address? +Q: Can BPF programs access instruction pointer or return address? ----------------------------------------------------------------- A: NO. -Q: can BPF programs access stack pointer ? +Q: Can BPF programs access stack pointer ? ------------------------------------------ A: NO. Only frame pointer (register R10) is accessible. From compiler point of view it's necessary to have stack pointer. -For example LLVM defines register R11 as stack pointer in its +For example, LLVM defines register R11 as stack pointer in its BPF backend, but it makes sure that generated code never uses it. Q: Does C-calling convention diminishes possible use cases? @@ -66,8 +66,8 @@ A: YES. BPF design forces addition of major functionality in the form of kernel helper functions and kernel objects like BPF maps with seamless interoperability between them. It lets kernel call into -BPF programs and programs call kernel helpers with zero overhead. -As all of them were native C code. That is particularly the case +BPF programs and programs call kernel helpers with zero overhead, +as all of them were native C code. That is particularly the case for JITed BPF programs that are indistinguishable from native kernel C code. @@ -75,9 +75,9 @@ Q: Does it mean that 'innovative' extensions to BPF code are disallowed? ------------------------------------------------------------------------ A: Soft yes. -At least for now until BPF core has support for +At least for now, until BPF core has support for bpf-to-bpf calls, indirect calls, loops, global variables, -jump tables, read only sections and all other normal constructs +jump tables, read-only sections, and all other normal constructs that C code can produce. Q: Can loops be supported in a safe way? @@ -109,16 +109,16 @@ For example why BPF_JNE and other compare and jumps are not cpu-like? A: This was necessary to avoid introducing flags into ISA which are impossible to make generic and efficient across CPU architectures. -Q: why BPF_DIV instruction doesn't map to x64 div? +Q: Why BPF_DIV instruction doesn't map to x64 div? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A: Because if we picked one-to-one relationship to x64 it would have made it more complicated to support on arm64 and other archs. Also it needs div-by-zero runtime check. -Q: why there is no BPF_SDIV for signed divide operation? +Q: Why there is no BPF_SDIV for signed divide operation? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A: Because it would be rarely used. llvm errors in such case and -prints a suggestion to use unsigned divide instead +prints a suggestion to use unsigned divide instead. Q: Why BPF has implicit prologue and epilogue? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/Documentation/bpf/btf.rst b/Documentation/bpf/btf.rst index 1d434c3a268d..9a60a5d60e38 100644 --- a/Documentation/bpf/btf.rst +++ b/Documentation/bpf/btf.rst @@ -5,43 +5,35 @@ BPF Type Format (BTF) 1. Introduction *************** -BTF (BPF Type Format) is the meta data format which -encodes the debug info related to BPF program/map. -The name BTF was used initially to describe -data types. The BTF was later extended to include -function info for defined subroutines, and line info -for source/line information. - -The debug info is used for map pretty print, function -signature, etc. The function signature enables better -bpf program/function kernel symbol. -The line info helps generate -source annotated translated byte code, jited code -and verifier log. +BTF (BPF Type Format) is the metadata format which encodes the debug info +related to BPF program/map. The name BTF was used initially to describe data +types. The BTF was later extended to include function info for defined +subroutines, and line info for source/line information. + +The debug info is used for map pretty print, function signature, etc. The +function signature enables better bpf program/function kernel symbol. The line +info helps generate source annotated translated byte code, jited code and +verifier log. The BTF specification contains two parts, * BTF kernel API * BTF ELF file format -The kernel API is the contract between -user space and kernel. The kernel verifies -the BTF info before using it. -The ELF file format is a user space contract -between ELF file and libbpf loader. +The kernel API is the contract between user space and kernel. The kernel +verifies the BTF info before using it. The ELF file format is a user space +contract between ELF file and libbpf loader. -The type and string sections are part of the -BTF kernel API, describing the debug info -(mostly types related) referenced by the bpf program. -These two sections are discussed in -details in :ref:`BTF_Type_String`. +The type and string sections are part of the BTF kernel API, describing the +debug info (mostly types related) referenced by the bpf program. These two +sections are discussed in details in :ref:`BTF_Type_String`. .. _BTF_Type_String: 2. BTF Type and String Encoding ******************************* -The file ``include/uapi/linux/btf.h`` provides high -level definition on how types/strings are encoded. +The file ``include/uapi/linux/btf.h`` provides high-level definition of how +types/strings are encoded. The beginning of data blob must be:: @@ -59,25 +51,23 @@ The beginning of data blob must be:: }; The magic is ``0xeB9F``, which has different encoding for big and little -endian system, and can be used to test whether BTF is generated for -big or little endian target. -The btf_header is designed to be extensible with hdr_len equal to -``sizeof(struct btf_header)`` when the data blob is generated. +endian systems, and can be used to test whether BTF is generated for big- or +little-endian target. The ``btf_header`` is designed to be extensible with +``hdr_len`` equal to ``sizeof(struct btf_header)`` when a data blob is +generated. 2.1 String Encoding =================== -The first string in the string section must be a null string. -The rest of string table is a concatenation of other null-treminated -strings. +The first string in the string section must be a null string. The rest of +string table is a concatenation of other null-terminated strings. 2.2 Type Encoding ================= -The type id ``0`` is reserved for ``void`` type. -The type section is parsed sequentially and the type id is assigned to -each recognized type starting from id ``1``. -Currently, the following types are supported:: +The type id ``0`` is reserved for ``void`` type. The type section is parsed +sequentially and type id is assigned to each recognized type starting from id +``1``. Currently, the following types are supported:: #define BTF_KIND_INT 1 /* Integer */ #define BTF_KIND_PTR 2 /* Pointer */ @@ -122,9 +112,9 @@ Each type contains the following common data:: }; }; -For certain kinds, the common data are followed by kind specific data. -The ``name_off`` in ``struct btf_type`` specifies the offset in the string table. -The following details encoding of each kind. +For certain kinds, the common data are followed by kind-specific data. The +``name_off`` in ``struct btf_type`` specifies the offset in the string table. +The following sections detail encoding of each kind. 2.2.1 BTF_KIND_INT ~~~~~~~~~~~~~~~~~~ @@ -136,7 +126,7 @@ The following details encoding of each kind. * ``info.vlen``: 0 * ``size``: the size of the int type in bytes. -``btf_type`` is followed by a ``u32`` with following bits arrangement:: +``btf_type`` is followed by a ``u32`` with the following bits arrangement:: #define BTF_INT_ENCODING(VAL) (((VAL) & 0x0f000000) >> 24) #define BTF_INT_OFFSET(VAL) (((VAL & 0x00ff0000)) >> 16) @@ -148,39 +138,33 @@ The ``BTF_INT_ENCODING`` has the following attributes:: #define BTF_INT_CHAR (1 << 1) #define BTF_INT_BOOL (1 << 2) -The ``BTF_INT_ENCODING()`` provides extra information, signness, -char, or bool, for the int type. The char and bool encoding -are mostly useful for pretty print. At most one encoding can -be specified for the int type. - -The ``BTF_INT_BITS()`` specifies the number of actual bits held by -this int type. For example, a 4-bit bitfield encodes -``BTF_INT_BITS()`` equals to 4. The ``btf_type.size * 8`` -must be equal to or greater than ``BTF_INT_BITS()`` for the type. -The maximum value of ``BTF_INT_BITS()`` is 128. +The ``BTF_INT_ENCODING()`` provides extra information: signedness, char, or +bool, for the int type. The char and bool encoding are mostly useful for +pretty print. At most one encoding can be specified for the int type. -The ``BTF_INT_OFFSET()`` specifies the starting bit offset to -calculate values for this int. For example, a bitfield struct -member has +The ``BTF_INT_BITS()`` specifies the number of actual bits held by this int +type. For example, a 4-bit bitfield encodes ``BTF_INT_BITS()`` equals to 4. +The ``btf_type.size * 8`` must be equal to or greater than ``BTF_INT_BITS()`` +for the type. The maximum value of ``BTF_INT_BITS()`` is 128. - * btf member bit offset 100 from the start of the structure, - * btf member pointing to an int type, - * the int type has ``BTF_INT_OFFSET() = 2`` and ``BTF_INT_BITS() = 4`` +The ``BTF_INT_OFFSET()`` specifies the starting bit offset to calculate values +for this int. For example, a bitfield struct member has: * btf member bit +offset 100 from the start of the structure, * btf member pointing to an int +type, * the int type has ``BTF_INT_OFFSET() = 2`` and ``BTF_INT_BITS() = 4`` -Then in the struct memory layout, this member will occupy -``4`` bits starting from bits ``100 + 2 = 102``. +Then in the struct memory layout, this member will occupy ``4`` bits starting +from bits ``100 + 2 = 102``. -Alternatively, the bitfield struct member can be the following to -access the same bits as the above: +Alternatively, the bitfield struct member can be the following to access the +same bits as the above: * btf member bit offset 102, * btf member pointing to an int type, * the int type has ``BTF_INT_OFFSET() = 0`` and ``BTF_INT_BITS() = 4`` -The original intention of ``BTF_INT_OFFSET()`` is to provide -flexibility of bitfield encoding. -Currently, both llvm and pahole generates ``BTF_INT_OFFSET() = 0`` -for all int types. +The original intention of ``BTF_INT_OFFSET()`` is to provide flexibility of +bitfield encoding. Currently, both llvm and pahole generate +``BTF_INT_OFFSET() = 0`` for all int types. 2.2.2 BTF_KIND_PTR ~~~~~~~~~~~~~~~~~~ @@ -204,7 +188,7 @@ No additional type data follow ``btf_type``. * ``info.vlen``: 0 * ``size/type``: 0, not used -btf_type is followed by one "struct btf_array":: +``btf_type`` is followed by one ``struct btf_array``:: struct btf_array { __u32 type; @@ -217,27 +201,25 @@ The ``struct btf_array`` encoding: * ``index_type``: the index type * ``nelems``: the number of elements for this array (``0`` is also allowed). -The ``index_type`` can be any regular int types -(u8, u16, u32, u64, unsigned __int128). -The original design of including ``index_type`` follows dwarf -which has a ``index_type`` for its array type. +The ``index_type`` can be any regular int type (``u8``, ``u16``, ``u32``, +``u64``, ``unsigned __int128``). The original design of including +``index_type`` follows DWARF, which has an ``index_type`` for its array type. Currently in BTF, beyond type verification, the ``index_type`` is not used. The ``struct btf_array`` allows chaining through element type to represent -multiple dimensional arrays. For example, ``int a[5][6]``, the following -type system illustrates the chaining: +multidimensional arrays. For example, for ``int a[5][6]``, the following type +information illustrates the chaining: * [1]: int * [2]: array, ``btf_array.type = [1]``, ``btf_array.nelems = 6`` * [3]: array, ``btf_array.type = [2]``, ``btf_array.nelems = 5`` -Currently, both pahole and llvm collapse multiple dimensional array -into one dimensional array, e.g., ``a[5][6]``, the btf_array.nelems -equal to ``30``. This is because the original use case is map pretty -print where the whole array is dumped out so one dimensional array -is enough. As more BTF usage is explored, pahole and llvm can be -changed to generate proper chained representation for -multiple dimensional arrays. +Currently, both pahole and llvm collapse multidimensional array into +one-dimensional array, e.g., for ``a[5][6]``, the ``btf_array.nelems`` is +equal to ``30``. This is because the original use case is map pretty print +where the whole array is dumped out so one-dimensional array is enough. As +more BTF usage is explored, pahole and llvm can be changed to generate proper +chained representation for multidimensional arrays. 2.2.4 BTF_KIND_STRUCT ~~~~~~~~~~~~~~~~~~~~~ @@ -264,28 +246,26 @@ multiple dimensional arrays. * ``type``: the member type * ``offset``: <see below> -If the type info ``kind_flag`` is not set, the offset contains -only bit offset of the member. Note that the base type of the -bitfield can only be int or enum type. If the bitfield size -is 32, the base type can be either int or enum type. -If the bitfield size is not 32, the base type must be int, -and int type ``BTF_INT_BITS()`` encodes the bitfield size. +If the type info ``kind_flag`` is not set, the offset contains only bit offset +of the member. Note that the base type of the bitfield can only be int or enum +type. If the bitfield size is 32, the base type can be either int or enum +type. If the bitfield size is not 32, the base type must be int, and int type +``BTF_INT_BITS()`` encodes the bitfield size. -If the ``kind_flag`` is set, the ``btf_member.offset`` -contains both member bitfield size and bit offset. The -bitfield size and bit offset are calculated as below.:: +If the ``kind_flag`` is set, the ``btf_member.offset`` contains both member +bitfield size and bit offset. The bitfield size and bit offset are calculated +as below.:: #define BTF_MEMBER_BITFIELD_SIZE(val) ((val) >> 24) #define BTF_MEMBER_BIT_OFFSET(val) ((val) & 0xffffff) -In this case, if the base type is an int type, it must -be a regular int type: +In this case, if the base type is an int type, it must be a regular int type: * ``BTF_INT_OFFSET()`` must be 0. * ``BTF_INT_BITS()`` must be equal to ``{1,2,4,8,16} * 8``. -The following kernel patch introduced ``kind_flag`` and -explained why both modes exist: +The following kernel patch introduced ``kind_flag`` and explained why both +modes exist: https://github.com/torvalds/linux/commit/9d5f9f701b1891466fb3dbb1806ad97716f95cc3#diff-fa650a64fdd3968396883d2fe8215ff3 @@ -382,11 +362,11 @@ No additional type data follow ``btf_type``. No additional type data follow ``btf_type``. -A BTF_KIND_FUNC defines, not a type, but a subprogram (function) whose -signature is defined by ``type``. The subprogram is thus an instance of -that type. The BTF_KIND_FUNC may in turn be referenced by a func_info in -the :ref:`BTF_Ext_Section` (ELF) or in the arguments to -:ref:`BPF_Prog_Load` (ABI). +A BTF_KIND_FUNC defines not a type, but a subprogram (function) whose +signature is defined by ``type``. The subprogram is thus an instance of that +type. The BTF_KIND_FUNC may in turn be referenced by a func_info in the +:ref:`BTF_Ext_Section` (ELF) or in the arguments to :ref:`BPF_Prog_Load` +(ABI). 2.2.13 BTF_KIND_FUNC_PROTO ~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -405,13 +385,13 @@ the :ref:`BTF_Ext_Section` (ELF) or in the arguments to __u32 type; }; -If a BTF_KIND_FUNC_PROTO type is referred by a BTF_KIND_FUNC type, -then ``btf_param.name_off`` must point to a valid C identifier -except for the possible last argument representing the variable -argument. The btf_param.type refers to parameter type. +If a BTF_KIND_FUNC_PROTO type is referred by a BTF_KIND_FUNC type, then +``btf_param.name_off`` must point to a valid C identifier except for the +possible last argument representing the variable argument. The btf_param.type +refers to parameter type. -If the function has variable arguments, the last parameter -is encoded with ``name_off = 0`` and ``type = 0``. +If the function has variable arguments, the last parameter is encoded with +``name_off = 0`` and ``type = 0``. 3. BTF Kernel API ***************** @@ -459,10 +439,9 @@ The workflow typically looks like: 3.1 BPF_BTF_LOAD ================ -Load a blob of BTF data into kernel. A blob of data -described in :ref:`BTF_Type_String` -can be directly loaded into the kernel. -A ``btf_fd`` returns to userspace. +Load a blob of BTF data into kernel. A blob of data, described in +:ref:`BTF_Type_String`, can be directly loaded into the kernel. A ``btf_fd`` +is returned to a userspace. 3.2 BPF_MAP_CREATE ================== @@ -484,18 +463,18 @@ In libbpf, the map can be defined with extra annotation like below: }; BPF_ANNOTATE_KV_PAIR(btf_map, int, struct ipv_counts); -Here, the parameters for macro BPF_ANNOTATE_KV_PAIR are map name, -key and value types for the map. -During ELF parsing, libbpf is able to extract key/value type_id's -and assigned them to BPF_MAP_CREATE attributes automatically. +Here, the parameters for macro BPF_ANNOTATE_KV_PAIR are map name, key and +value types for the map. During ELF parsing, libbpf is able to extract +key/value type_id's and assign them to BPF_MAP_CREATE attributes +automatically. .. _BPF_Prog_Load: 3.3 BPF_PROG_LOAD ================= -During prog_load, func_info and line_info can be passed to kernel with -proper values for the following attributes: +During prog_load, func_info and line_info can be passed to kernel with proper +values for the following attributes: :: __u32 insn_cnt; @@ -522,9 +501,9 @@ The func_info and line_info are an array of below, respectively.:: __u32 line_col; /* line number and column number */ }; -func_info_rec_size is the size of each func_info record, and line_info_rec_size -is the size of each line_info record. Passing the record size to kernel make -it possible to extend the record itself in the future. +func_info_rec_size is the size of each func_info record, and +line_info_rec_size is the size of each line_info record. Passing the record +size to kernel make it possible to extend the record itself in the future. Below are requirements for func_info: * func_info[0].insn_off must be 0. @@ -532,7 +511,7 @@ Below are requirements for func_info: bpf func boundaries. Below are requirements for line_info: - * the first insn in each func must points to a line_info record. + * the first insn in each func must have a line_info record pointing to it. * the line_info insn_off is in strictly increasing order. For line_info, the line number and column number are defined as below: @@ -543,40 +522,38 @@ For line_info, the line number and column number are defined as below: 3.4 BPF_{PROG,MAP}_GET_NEXT_ID -In kernel, every loaded program, map or btf has a unique id. -The id won't change during the life time of the program, map or btf. +In kernel, every loaded program, map or btf has a unique id. The id won't +change during the lifetime of a program, map, or btf. -The bpf syscall command BPF_{PROG,MAP}_GET_NEXT_ID -returns all id's, one for each command, to user space, for bpf -program or maps, -so the inspection tool can inspect all programs and maps. +The bpf syscall command BPF_{PROG,MAP}_GET_NEXT_ID returns all id's, one for +each command, to user space, for bpf program or maps, respectively, so an +inspection tool can inspect all programs and maps. 3.5 BPF_{PROG,MAP}_GET_FD_BY_ID -The introspection tool cannot use id to get details about program or maps. -A file descriptor needs to be obtained first for reference counting purpose. +An introspection tool cannot use id to get details about program or maps. +A file descriptor needs to be obtained first for reference-counting purpose. 3.6 BPF_OBJ_GET_INFO_BY_FD ========================== -Once a program/map fd is acquired, the introspection tool can -get the detailed information from kernel about this fd, -some of which is btf related. For example, -``bpf_map_info`` returns ``btf_id``, key/value type id. -``bpf_prog_info`` returns ``btf_id``, func_info and line info -for translated bpf byte codes, and jited_line_info. +Once a program/map fd is acquired, an introspection tool can get the detailed +information from kernel about this fd, some of which are BTF-related. For +example, ``bpf_map_info`` returns ``btf_id`` and key/value type ids. +``bpf_prog_info`` returns ``btf_id``, func_info, and line info for translated +bpf byte codes, and jited_line_info. 3.7 BPF_BTF_GET_FD_BY_ID ======================== -With ``btf_id`` obtained in ``bpf_map_info`` and ``bpf_prog_info``, -bpf syscall command BPF_BTF_GET_FD_BY_ID can retrieve a btf fd. -Then, with command BPF_OBJ_GET_INFO_BY_FD, the btf blob, originally -loaded into the kernel with BPF_BTF_LOAD, can be retrieved. +With ``btf_id`` obtained in ``bpf_map_info`` and ``bpf_prog_info``, bpf +syscall command BPF_BTF_GET_FD_BY_ID can retrieve a btf fd. Then, with +command BPF_OBJ_GET_INFO_BY_FD, the btf blob, originally loaded into the +kernel with BPF_BTF_LOAD, can be retrieved. -With the btf blob, ``bpf_map_info`` and ``bpf_prog_info``, the introspection -tool has full btf knowledge and is able to pretty print map key/values, -dump func signatures, dump line info along with byte/jit codes. +With the btf blob, ``bpf_map_info``, and ``bpf_prog_info``, an introspection +tool has full btf knowledge and is able to pretty print map key/values, dump +func signatures and line info, along with byte/jit codes. 4. ELF File Format Interface **************************** @@ -584,19 +561,19 @@ dump func signatures, dump line info along with byte/jit codes. 4.1 .BTF section ================ -The .BTF section contains type and string data. The format of this section -is same as the one describe in :ref:`BTF_Type_String`. +The .BTF section contains type and string data. The format of this section is +same as the one describe in :ref:`BTF_Type_String`. .. _BTF_Ext_Section: 4.2 .BTF.ext section ==================== -The .BTF.ext section encodes func_info and line_info which -needs loader manipulation before loading into the kernel. +The .BTF.ext section encodes func_info and line_info which needs loader +manipulation before loading into the kernel. -The specification for .BTF.ext section is defined at -``tools/lib/bpf/btf.h`` and ``tools/lib/bpf/btf.c``. +The specification for .BTF.ext section is defined at ``tools/lib/bpf/btf.h`` +and ``tools/lib/bpf/btf.c``. The current header of .BTF.ext section:: @@ -613,9 +590,9 @@ The current header of .BTF.ext section:: __u32 line_info_len; }; -It is very similar to .BTF section. Instead of type/string section, -it contains func_info and line_info section. See :ref:`BPF_Prog_Load` -for details about func_info and line_info record format. +It is very similar to .BTF section. Instead of type/string section, it +contains func_info and line_info section. See :ref:`BPF_Prog_Load` for details +about func_info and line_info record format. The func_info is organized as below.:: @@ -624,9 +601,9 @@ The func_info is organized as below.:: btf_ext_info_sec for section #2 /* func_info for section #2 */ ... -``func_info_rec_size`` specifies the size of ``bpf_func_info`` structure -when .BTF.ext is generated. btf_ext_info_sec, defined below, is -the func_info for each specific ELF section.:: +``func_info_rec_size`` specifies the size of ``bpf_func_info`` structure when +.BTF.ext is generated. ``btf_ext_info_sec``, defined below, is a collection of +func_info for each specific ELF section.:: struct btf_ext_info_sec { __u32 sec_name_off; /* offset to section name */ @@ -644,14 +621,14 @@ The line_info is organized as below.:: btf_ext_info_sec for section #2 /* line_info for section #2 */ ... -``line_info_rec_size`` specifies the size of ``bpf_line_info`` structure -when .BTF.ext is generated. +``line_info_rec_size`` specifies the size of ``bpf_line_info`` structure when +.BTF.ext is generated. The interpretation of ``bpf_func_info->insn_off`` and -``bpf_line_info->insn_off`` is different between kernel API and ELF API. -For kernel API, the ``insn_off`` is the instruction offset in the unit -of ``struct bpf_insn``. For ELF API, the ``insn_off`` is the byte offset -from the beginning of section (``btf_ext_info_sec->sec_name_off``). +``bpf_line_info->insn_off`` is different between kernel API and ELF API. For +kernel API, the ``insn_off`` is the instruction offset in the unit of ``struct +bpf_insn``. For ELF API, the ``insn_off`` is the byte offset from the +beginning of section (``btf_ext_info_sec->sec_name_off``). 5. Using BTF ************ @@ -659,10 +636,9 @@ from the beginning of section (``btf_ext_info_sec->sec_name_off``). 5.1 bpftool map pretty print ============================ -With BTF, the map key/value can be printed based on fields rather than -simply raw bytes. This is especially -valuable for large structure or if you data structure -has bitfields. For example, for the following map,:: +With BTF, the map key/value can be printed based on fields rather than simply +raw bytes. This is especially valuable for large structure or if your data +structure has bitfields. For example, for the following map,:: enum A { A1, A2, A3, A4, A5 }; typedef enum A ___A; @@ -702,9 +678,9 @@ bpftool is able to pretty print like below: 5.2 bpftool prog dump ===================== -The following is an example to show func_info and line_info -can help prog dump with better kernel symbol name, function prototype -and line information.:: +The following is an example showing how func_info and line_info can help prog +dump with better kernel symbol names, function prototypes and line +information.:: $ bpftool prog dump jited pinned /sys/fs/bpf/test_btf_haskv [...] @@ -733,10 +709,11 @@ and line information.:: ; counts = bpf_map_lookup_elem(&btf_map, &key); [...] -5.3 verifier log +5.3 Verifier Log ================ -The following is an example how line_info can help verifier failure debug.:: +The following is an example of how line_info can help debugging verification +failure.:: /* The code at tools/testing/selftests/bpf/test_xdp_noinline.c * is modified as below. @@ -765,8 +742,8 @@ You need latest pahole https://git.kernel.org/pub/scm/devel/pahole/pahole.git/ -or llvm (8.0 or later). The pahole acts as a dwarf2btf converter. It doesn't support .BTF.ext -and btf BTF_KIND_FUNC type yet. For example,:: +or llvm (8.0 or later). The pahole acts as a dwarf2btf converter. It doesn't +support .BTF.ext and btf BTF_KIND_FUNC type yet. For example,:: -bash-4.4$ cat t.c struct t { @@ -783,8 +760,9 @@ and btf BTF_KIND_FUNC type yet. For example,:: c type_id=2 bitfield_size=2 bits_offset=5 [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED -The llvm is able to generate .BTF and .BTF.ext directly with -g for bpf target only. -The assembly code (-S) is able to show the BTF encoding in assembly format.:: +The llvm is able to generate .BTF and .BTF.ext directly with -g for bpf target +only. The assembly code (-S) is able to show the BTF encoding in assembly +format.:: -bash-4.4$ cat t2.c typedef int __int32; @@ -867,4 +845,4 @@ The assembly code (-S) is able to show the BTF encoding in assembly format.:: 7. Testing ********** -Kernel bpf selftest `test_btf.c` provides extensive set of BTF related tests. +Kernel bpf selftest `test_btf.c` provides extensive set of BTF-related tests. diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst index 4ae4f9d8f8fe..e14d7d40fc75 100644 --- a/Documentation/networking/af_xdp.rst +++ b/Documentation/networking/af_xdp.rst @@ -295,6 +295,41 @@ using:: For XDP_SKB mode, use the switch "-S" instead of "-N" and all options can be displayed with "-h", as usual. +FAQ +======= + +Q: I am not seeing any traffic on the socket. What am I doing wrong? + +A: When a netdev of a physical NIC is initialized, Linux usually + allocates one Rx and Tx queue pair per core. So on a 8 core system, + queue ids 0 to 7 will be allocated, one per core. In the AF_XDP + bind call or the xsk_socket__create libbpf function call, you + specify a specific queue id to bind to and it is only the traffic + towards that queue you are going to get on you socket. So in the + example above, if you bind to queue 0, you are NOT going to get any + traffic that is distributed to queues 1 through 7. If you are + lucky, you will see the traffic, but usually it will end up on one + of the queues you have not bound to. + + There are a number of ways to solve the problem of getting the + traffic you want to the queue id you bound to. If you want to see + all the traffic, you can force the netdev to only have 1 queue, queue + id 0, and then bind to queue 0. You can use ethtool to do this:: + + sudo ethtool -L <interface> combined 1 + + If you want to only see part of the traffic, you can program the + NIC through ethtool to filter out your traffic to a single queue id + that you can bind your XDP socket to. Here is one example in which + UDP traffic to and from port 4242 are sent to queue 2:: + + sudo ethtool -N <interface> rx-flow-hash udp4 fn + sudo ethtool -N <interface> flow-type udp4 src-port 4242 dst-port \ + 4242 action 2 + + A number of other ways are possible all up to the capabilitites of + the NIC you have. + Credits ======= @@ -309,4 +344,3 @@ Credits - Michael S. Tsirkin - Qi Z Zhang - Willem de Bruijn - diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt index b5e060edfc38..319e5e041f38 100644 --- a/Documentation/networking/filter.txt +++ b/Documentation/networking/filter.txt @@ -829,7 +829,7 @@ tracing filters may do to maintain counters of events, for example. Register R9 is not used by socket filters either, but more complex filters may be running out of registers and would have to resort to spill/fill to stack. -Internal BPF can used as generic assembler for last step performance +Internal BPF can be used as a generic assembler for last step performance optimizations, socket filters and seccomp are using it as assembler. Tracing filters may use it as assembler to generate code from kernel. In kernel usage may not be bounded by security considerations, since generated internal BPF code |