diff options
Diffstat (limited to 'Documentation')
45 files changed, 2011 insertions, 491 deletions
diff --git a/Documentation/ABI/testing/sysfs-timecard b/Documentation/ABI/testing/sysfs-timecard index 97f6773794a5..220478156297 100644 --- a/Documentation/ABI/testing/sysfs-timecard +++ b/Documentation/ABI/testing/sysfs-timecard @@ -37,8 +37,15 @@ Description: (RO) Set of available destinations (sinks) for a SMA PPS2 signal is sent to the PPS2 selector TS1 signal is sent to timestamper 1 TS2 signal is sent to timestamper 2 + TS3 signal is sent to timestamper 3 + TS4 signal is sent to timestamper 4 IRIG signal is sent to the IRIG-B module DCF signal is sent to the DCF module + FREQ1 signal is sent to frequency counter 1 + FREQ2 signal is sent to frequency counter 2 + FREQ3 signal is sent to frequency counter 3 + FREQ4 signal is sent to frequency counter 4 + None signal input is disabled ===== ================================================ What: /sys/class/timecard/ocpN/available_sma_outputs @@ -50,10 +57,16 @@ Description: (RO) Set of available sources for a SMA output signal. 10Mhz output is from the 10Mhz reference clock PHC output PPS is from the PHC clock MAC output PPS is from the Miniature Atomic Clock - GNSS output PPS is from the GNSS module + GNSS1 output PPS is from the first GNSS module GNSS2 output PPS is from the second GNSS module IRIG output is from the PHC, in IRIG-B format DCF output is from the PHC, in DCF format + GEN1 output is from frequency generator 1 + GEN2 output is from frequency generator 2 + GEN3 output is from frequency generator 3 + GEN4 output is from frequency generator 4 + GND output is GND + VCC output is VCC ===== ================================================ What: /sys/class/timecard/ocpN/clock_source @@ -63,6 +76,97 @@ Description: (RW) Contains the current synchronization source used by the PHC. May be changed by writing one of the listed values from the available_clock_sources attribute set. +What: /sys/class/timecard/ocpN/clock_status_drift +Date: March 2022 +Contact: Jonathan Lemon <jonathan.lemon@gmail.com> +Description: (RO) Contains the current drift value used by the firmware + for internal disciplining of the atomic clock. + +What: /sys/class/timecard/ocpN/clock_status_offset +Date: March 2022 +Contact: Jonathan Lemon <jonathan.lemon@gmail.com> +Description: (RO) Contains the current offset value used by the firmware + for internal disciplining of the atomic clock. + +What: /sys/class/timecard/ocpN/freqX +Date: March 2022 +Contact: Jonathan Lemon <jonathan.lemon@gmail.com> +Description: (RO) Optional directory containing the sysfs nodes for + frequency counter <X>. + +What: /sys/class/timecard/ocpN/freqX/frequency +Date: March 2022 +Contact: Jonathan Lemon <jonathan.lemon@gmail.com> +Description: (RO) Contains the measured frequency over the specified + measurement period. + +What: /sys/class/timecard/ocpN/freqX/seconds +Date: March 2022 +Contact: Jonathan Lemon <jonathan.lemon@gmail.com> +Description: (RW) Specifies the number of seconds from 0-255 that the + frequency should be measured over. Write 0 to disable. + +What: /sys/class/timecard/ocpN/genX +Date: March 2022 +Contact: Jonathan Lemon <jonathan.lemon@gmail.com> +Description: (RO) Optional directory containing the sysfs nodes for + frequency generator <X>. + +What: /sys/class/timecard/ocpN/genX/duty +Date: March 2022 +Contact: Jonathan Lemon <jonathan.lemon@gmail.com> +Description: (RO) Specifies the signal duty cycle as a percentage from 1-99. + +What: /sys/class/timecard/ocpN/genX/period +Date: March 2022 +Contact: Jonathan Lemon <jonathan.lemon@gmail.com> +Description: (RO) Specifies the signal period in nanoseconds. + +What: /sys/class/timecard/ocpN/genX/phase +Date: March 2022 +Contact: Jonathan Lemon <jonathan.lemon@gmail.com> +Description: (RO) Specifies the signal phase offset in nanoseconds. + +What: /sys/class/timecard/ocpN/genX/polarity +Date: March 2022 +Contact: Jonathan Lemon <jonathan.lemon@gmail.com> +Description: (RO) Specifies the signal polarity, either 1 or 0. + +What: /sys/class/timecard/ocpN/genX/running +Date: March 2022 +Contact: Jonathan Lemon <jonathan.lemon@gmail.com> +Description: (RO) Either 0 or 1, showing if the signal generator is running. + +What: /sys/class/timecard/ocpN/genX/start +Date: March 2022 +Contact: Jonathan Lemon <jonathan.lemon@gmail.com> +Description: (RO) Shows the time in <sec>.<nsec> that the signal generator + started running. + +What: /sys/class/timecard/ocpN/genX/signal +Date: March 2022 +Contact: Jonathan Lemon <jonathan.lemon@gmail.com> +Description: (RW) Used to start the signal generator, and summarize + the current status. + + The signal generator may be started by writing the signal + period, followed by the optional signal values. If the + optional values are not provided, they default to the current + settings, which may be obtained from the other sysfs nodes. + + period [duty [phase [polarity]]] + + echo 500000000 > signal # 1/2 second period + echo 1000000 40 100 > signal + echo 0 > signal # turn off generator + + Period and phase are specified in nanoseconds. Duty cycle is + a percentage from 1-99. Polarity is 1 or 0. + + Reading this node will return: + + period duty phase polarity start_time + What: /sys/class/timecard/ocpN/gnss_sync Date: September 2021 Contact: Jonathan Lemon <jonathan.lemon@gmail.com> @@ -126,6 +230,16 @@ Description: (RW) These attributes specify the direction of the signal The 10Mhz reference clock input is currently only valid on SMA1 and may not be combined with other destination sinks. +What: /sys/class/timecard/ocpN/tod_correction +Date: March 2022 +Contact: Jonathan Lemon <jonathan.lemon@gmail.com> +Description: (RW) The incoming GNSS signal is in UTC time, and the NMEA + format messages do not provide a TAI offset. This sets the + correction value for the incoming time. + + If UBX_LS is enabled, this should be 0, and the offset is + taken from the UBX-NAV-TIMELS message. + What: /sys/class/timecard/ocpN/ts_window_adjust Date: September 2021 Contact: Jonathan Lemon <jonathan.lemon@gmail.com> diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst index 4150f74c521a..f86b5e1623c6 100644 --- a/Documentation/admin-guide/sysctl/net.rst +++ b/Documentation/admin-guide/sysctl/net.rst @@ -365,6 +365,15 @@ new netns has been created. Default : 0 (for compatibility reasons) +txrehash +-------- + +Controls default hash rethink behaviour on listening socket when SO_TXREHASH +option is set to SOCK_TXREHASH_DEFAULT (i. e. not overridden by setsockopt). + +If set to 1 (default), hash rethink is performed on listening socket. +If set to 0, hash rethink is not performed. + 2. /proc/sys/net/unix - Parameters for Unix domain sockets ---------------------------------------------------------- diff --git a/Documentation/bpf/bpf_prog_run.rst b/Documentation/bpf/bpf_prog_run.rst new file mode 100644 index 000000000000..4868c909df5c --- /dev/null +++ b/Documentation/bpf/bpf_prog_run.rst @@ -0,0 +1,117 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=================================== +Running BPF programs from userspace +=================================== + +This document describes the ``BPF_PROG_RUN`` facility for running BPF programs +from userspace. + +.. contents:: + :local: + :depth: 2 + + +Overview +-------- + +The ``BPF_PROG_RUN`` command can be used through the ``bpf()`` syscall to +execute a BPF program in the kernel and return the results to userspace. This +can be used to unit test BPF programs against user-supplied context objects, and +as way to explicitly execute programs in the kernel for their side effects. The +command was previously named ``BPF_PROG_TEST_RUN``, and both constants continue +to be defined in the UAPI header, aliased to the same value. + +The ``BPF_PROG_RUN`` command can be used to execute BPF programs of the +following types: + +- ``BPF_PROG_TYPE_SOCKET_FILTER`` +- ``BPF_PROG_TYPE_SCHED_CLS`` +- ``BPF_PROG_TYPE_SCHED_ACT`` +- ``BPF_PROG_TYPE_XDP`` +- ``BPF_PROG_TYPE_SK_LOOKUP`` +- ``BPF_PROG_TYPE_CGROUP_SKB`` +- ``BPF_PROG_TYPE_LWT_IN`` +- ``BPF_PROG_TYPE_LWT_OUT`` +- ``BPF_PROG_TYPE_LWT_XMIT`` +- ``BPF_PROG_TYPE_LWT_SEG6LOCAL`` +- ``BPF_PROG_TYPE_FLOW_DISSECTOR`` +- ``BPF_PROG_TYPE_STRUCT_OPS`` +- ``BPF_PROG_TYPE_RAW_TRACEPOINT`` +- ``BPF_PROG_TYPE_SYSCALL`` + +When using the ``BPF_PROG_RUN`` command, userspace supplies an input context +object and (for program types operating on network packets) a buffer containing +the packet data that the BPF program will operate on. The kernel will then +execute the program and return the results to userspace. Note that programs will +not have any side effects while being run in this mode; in particular, packets +will not actually be redirected or dropped, the program return code will just be +returned to userspace. A separate mode for live execution of XDP programs is +provided, documented separately below. + +Running XDP programs in "live frame mode" +----------------------------------------- + +The ``BPF_PROG_RUN`` command has a separate mode for running live XDP programs, +which can be used to execute XDP programs in a way where packets will actually +be processed by the kernel after the execution of the XDP program as if they +arrived on a physical interface. This mode is activated by setting the +``BPF_F_TEST_XDP_LIVE_FRAMES`` flag when supplying an XDP program to +``BPF_PROG_RUN``. + +The live packet mode is optimised for high performance execution of the supplied +XDP program many times (suitable for, e.g., running as a traffic generator), +which means the semantics are not quite as straight-forward as the regular test +run mode. Specifically: + +- When executing an XDP program in live frame mode, the result of the execution + will not be returned to userspace; instead, the kernel will perform the + operation indicated by the program's return code (drop the packet, redirect + it, etc). For this reason, setting the ``data_out`` or ``ctx_out`` attributes + in the syscall parameters when running in this mode will be rejected. In + addition, not all failures will be reported back to userspace directly; + specifically, only fatal errors in setup or during execution (like memory + allocation errors) will halt execution and return an error. If an error occurs + in packet processing, like a failure to redirect to a given interface, + execution will continue with the next repetition; these errors can be detected + via the same trace points as for regular XDP programs. + +- Userspace can supply an ifindex as part of the context object, just like in + the regular (non-live) mode. The XDP program will be executed as though the + packet arrived on this interface; i.e., the ``ingress_ifindex`` of the context + object will point to that interface. Furthermore, if the XDP program returns + ``XDP_PASS``, the packet will be injected into the kernel networking stack as + though it arrived on that ifindex, and if it returns ``XDP_TX``, the packet + will be transmitted *out* of that same interface. Do note, though, that + because the program execution is not happening in driver context, an + ``XDP_TX`` is actually turned into the same action as an ``XDP_REDIRECT`` to + that same interface (i.e., it will only work if the driver has support for the + ``ndo_xdp_xmit`` driver op). + +- When running the program with multiple repetitions, the execution will happen + in batches. The batch size defaults to 64 packets (which is same as the + maximum NAPI receive batch size), but can be specified by userspace through + the ``batch_size`` parameter, up to a maximum of 256 packets. For each batch, + the kernel executes the XDP program repeatedly, each invocation getting a + separate copy of the packet data. For each repetition, if the program drops + the packet, the data page is immediately recycled (see below). Otherwise, the + packet is buffered until the end of the batch, at which point all packets + buffered this way during the batch are transmitted at once. + +- When setting up the test run, the kernel will initialise a pool of memory + pages of the same size as the batch size. Each memory page will be initialised + with the initial packet data supplied by userspace at ``BPF_PROG_RUN`` + invocation. When possible, the pages will be recycled on future program + invocations, to improve performance. Pages will generally be recycled a full + batch at a time, except when a packet is dropped (by return code or because + of, say, a redirection error), in which case that page will be recycled + immediately. If a packet ends up being passed to the regular networking stack + (because the XDP program returns ``XDP_PASS``, or because it ends up being + redirected to an interface that injects it into the stack), the page will be + released and a new one will be allocated when the pool is empty. + + When recycling, the page content is not rewritten; only the packet boundary + pointers (``data``, ``data_end`` and ``data_meta``) in the context object will + be reset to the original values. This means that if a program rewrites the + packet contents, it has to be prepared to see either the original content or + the modified version on subsequent invocations. diff --git a/Documentation/bpf/btf.rst b/Documentation/bpf/btf.rst index 1ebf4c5c7ddc..7940da9bc6c1 100644 --- a/Documentation/bpf/btf.rst +++ b/Documentation/bpf/btf.rst @@ -503,6 +503,19 @@ valid index (starting from 0) pointing to a member or an argument. * ``info.vlen``: 0 * ``type``: the type with ``btf_type_tag`` attribute +Currently, ``BTF_KIND_TYPE_TAG`` is only emitted for pointer types. +It has the following btf type chain: +:: + + ptr -> [type_tag]* + -> [const | volatile | restrict | typedef]* + -> base_type + +Basically, a pointer type points to zero or more +type_tag, then zero or more const/volatile/restrict/typedef +and finally the base type. The base type is one of +int, ptr, array, struct, union, enum, func_proto and float types. + 3. BTF Kernel API ================= @@ -565,18 +578,15 @@ A map can be created with ``btf_fd`` and specified key/value type id.:: In libbpf, the map can be defined with extra annotation like below: :: - struct bpf_map_def SEC("maps") btf_map = { - .type = BPF_MAP_TYPE_ARRAY, - .key_size = sizeof(int), - .value_size = sizeof(struct ipv_counts), - .max_entries = 4, - }; - BPF_ANNOTATE_KV_PAIR(btf_map, int, struct ipv_counts); + struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __type(key, int); + __type(value, struct ipv_counts); + __uint(max_entries, 4); + } btf_map SEC(".maps"); -Here, the parameters for macro BPF_ANNOTATE_KV_PAIR are map name, key and -value types for the map. During ELF parsing, libbpf is able to extract -key/value type_id's and assign them to BPF_MAP_CREATE attributes -automatically. +During ELF parsing, libbpf is able to extract key/value type_id's and assign +them to BPF_MAP_CREATE attributes automatically. .. _BPF_Prog_Load: @@ -824,13 +834,12 @@ structure has bitfields. For example, for the following map,:: ___A b1:4; enum A b2:4; }; - struct bpf_map_def SEC("maps") tmpmap = { - .type = BPF_MAP_TYPE_ARRAY, - .key_size = sizeof(__u32), - .value_size = sizeof(struct tmp_t), - .max_entries = 1, - }; - BPF_ANNOTATE_KV_PAIR(tmpmap, int, struct tmp_t); + struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __type(key, int); + __type(value, struct tmp_t); + __uint(max_entries, 1); + } tmpmap SEC(".maps"); bpftool is able to pretty print like below: :: diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst index ef5c996547ec..96056a7447c7 100644 --- a/Documentation/bpf/index.rst +++ b/Documentation/bpf/index.rst @@ -21,6 +21,7 @@ that goes into great technical depth about the BPF Architecture. helpers programs maps + bpf_prog_run classic_vs_extended.rst bpf_licensing test_debug diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst index 3704836fe6df..5300837ac2c9 100644 --- a/Documentation/bpf/instruction-set.rst +++ b/Documentation/bpf/instruction-set.rst @@ -22,7 +22,13 @@ necessary across calls. Instruction encoding ==================== -eBPF uses 64-bit instructions with the following encoding: +eBPF has two instruction encodings: + + * the basic instruction encoding, which uses 64 bits to encode an instruction + * the wide instruction encoding, which appends a second 64-bit immediate value + (imm64) after the basic instruction for a total of 128 bits. + +The basic instruction encoding looks as follows: ============= ======= =============== ==================== ============ 32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB) @@ -82,9 +88,9 @@ BPF_ALU uses 32-bit wide operands while BPF_ALU64 uses 64-bit wide operands for otherwise identical operations. The code field encodes the operation as below: - ======== ===== ========================== + ======== ===== ================================================= code value description - ======== ===== ========================== + ======== ===== ================================================= BPF_ADD 0x00 dst += src BPF_SUB 0x10 dst -= src BPF_MUL 0x20 dst \*= src @@ -98,8 +104,8 @@ The code field encodes the operation as below: BPF_XOR 0xa0 dst ^= src BPF_MOV 0xb0 dst = src BPF_ARSH 0xc0 sign extending shift right - BPF_END 0xd0 endianness conversion - ======== ===== ========================== + BPF_END 0xd0 byte swap operations (see separate section below) + ======== ===== ================================================= BPF_ADD | BPF_X | BPF_ALU means:: @@ -118,6 +124,42 @@ BPF_XOR | BPF_K | BPF_ALU64 means:: src_reg = src_reg ^ imm32 +Byte swap instructions +---------------------- + +The byte swap instructions use an instruction class of ``BFP_ALU`` and a 4-bit +code field of ``BPF_END``. + +The byte swap instructions instructions operate on the destination register +only and do not use a separate source register or immediate value. + +The 1-bit source operand field in the opcode is used to to select what byte +order the operation convert from or to: + + ========= ===== ================================================= + source value description + ========= ===== ================================================= + BPF_TO_LE 0x00 convert between host byte order and little endian + BPF_TO_BE 0x08 convert between host byte order and big endian + ========= ===== ================================================= + +The imm field encodes the width of the swap operations. The following widths +are supported: 16, 32 and 64. + +Examples: + +``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means:: + + dst_reg = htole16(dst_reg) + +``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means:: + + dst_reg = htobe64(dst_reg) + +``BPF_FROM_LE`` and ``BPF_FROM_BE`` exist as aliases for ``BPF_TO_LE`` and +``BPF_TO_LE`` respetively. + + Jump instructions ----------------- @@ -176,63 +218,96 @@ The mode modifier is one of: ============= ===== ==================================== mode modifier value description ============= ===== ==================================== - BPF_IMM 0x00 used for 64-bit mov - BPF_ABS 0x20 legacy BPF packet access - BPF_IND 0x40 legacy BPF packet access - BPF_MEM 0x60 all normal load and store operations + BPF_IMM 0x00 64-bit immediate instructions + BPF_ABS 0x20 legacy BPF packet access (absolute) + BPF_IND 0x40 legacy BPF packet access (indirect) + BPF_MEM 0x60 regular load and store operations BPF_ATOMIC 0xc0 atomic operations ============= ===== ==================================== -BPF_MEM | <size> | BPF_STX means:: + +Regular load and store operations +--------------------------------- + +The ``BPF_MEM`` mode modifier is used to encode regular load and store +instructions that transfer data between a register and memory. + +``BPF_MEM | <size> | BPF_STX`` means:: *(size *) (dst_reg + off) = src_reg -BPF_MEM | <size> | BPF_ST means:: +``BPF_MEM | <size> | BPF_ST`` means:: *(size *) (dst_reg + off) = imm32 -BPF_MEM | <size> | BPF_LDX means:: +``BPF_MEM | <size> | BPF_LDX`` means:: dst_reg = *(size *) (src_reg + off) -Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW. +Where size is one of: ``BPF_B``, ``BPF_H``, ``BPF_W``, or ``BPF_DW``. Atomic operations ----------------- -eBPF includes atomic operations, which use the immediate field for extra -encoding:: +Atomic operations are operations that operate on memory and can not be +interrupted or corrupted by other access to the same memory region +by other eBPF programs or means outside of this specification. + +All atomic operations supported by eBPF are encoded as store operations +that use the ``BPF_ATOMIC`` mode modifier as follows: + + * ``BPF_ATOMIC | BPF_W | BPF_STX`` for 32-bit operations + * ``BPF_ATOMIC | BPF_DW | BPF_STX`` for 64-bit operations + * 8-bit and 16-bit wide atomic operations are not supported. - .imm = BPF_ADD, .code = BPF_ATOMIC | BPF_W | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg - .imm = BPF_ADD, .code = BPF_ATOMIC | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg +The imm field is used to encode the actual atomic operation. +Simple atomic operation use a subset of the values defined to encode +arithmetic operations in the imm field to encode the atomic operation: -The basic atomic operations supported are:: + ======== ===== =========== + imm value description + ======== ===== =========== + BPF_ADD 0x00 atomic add + BPF_OR 0x40 atomic or + BPF_AND 0x50 atomic and + BPF_XOR 0xa0 atomic xor + ======== ===== =========== - BPF_ADD - BPF_AND - BPF_OR - BPF_XOR -Each having equivalent semantics with the ``BPF_ADD`` example, that is: the -memory location addresed by ``dst_reg + off`` is atomically modified, with -``src_reg`` as the other operand. If the ``BPF_FETCH`` flag is set in the -immediate, then these operations also overwrite ``src_reg`` with the -value that was in memory before it was modified. +``BPF_ATOMIC | BPF_W | BPF_STX`` with imm = BPF_ADD means:: -The more special operations are:: + *(u32 *)(dst_reg + off16) += src_reg - BPF_XCHG +``BPF_ATOMIC | BPF_DW | BPF_STX`` with imm = BPF ADD means:: -This atomically exchanges ``src_reg`` with the value addressed by ``dst_reg + -off``. :: + *(u64 *)(dst_reg + off16) += src_reg - BPF_CMPXCHG +``BPF_XADD`` is a deprecated name for ``BPF_ATOMIC | BPF_ADD``. -This atomically compares the value addressed by ``dst_reg + off`` with -``R0``. If they match it is replaced with ``src_reg``. In either case, the -value that was there before is zero-extended and loaded back to ``R0``. +In addition to the simple atomic operations, there also is a modifier and +two complex atomic operations: -Note that 1 and 2 byte atomic operations are not supported. + =========== ================ =========================== + imm value description + =========== ================ =========================== + BPF_FETCH 0x01 modifier: return old value + BPF_XCHG 0xe0 | BPF_FETCH atomic exchange + BPF_CMPXCHG 0xf0 | BPF_FETCH atomic compare and exchange + =========== ================ =========================== + +The ``BPF_FETCH`` modifier is optional for simple atomic operations, and +always set for the complex atomic operations. If the ``BPF_FETCH`` flag +is set, then the operation also overwrites ``src_reg`` with the value that +was in memory before it was modified. + +The ``BPF_XCHG`` operation atomically exchanges ``src_reg`` with the value +addressed by ``dst_reg + off``. + +The ``BPF_CMPXCHG`` operation atomically compares the value addressed by +``dst_reg + off`` with ``R0``. If they match, the value addressed by +``dst_reg + off`` is replaced with ``src_reg``. In either case, the +value that was at ``dst_reg + off`` before the operation is zero-extended +and loaded back to ``R0``. Clang can generate atomic instructions by default when ``-mcpu=v3`` is enabled. If a lower version for ``-mcpu`` is set, the only atomic instruction @@ -240,40 +315,52 @@ Clang can generate is ``BPF_ADD`` *without* ``BPF_FETCH``. If you need to enable the atomics features, while keeping a lower ``-mcpu`` version, you can use ``-Xclang -target-feature -Xclang +alu32``. -You may encounter ``BPF_XADD`` - this is a legacy name for ``BPF_ATOMIC``, -referring to the exclusive-add operation encoded when the immediate field is -zero. +64-bit immediate instructions +----------------------------- -16-byte instructions --------------------- +Instructions with the ``BPF_IMM`` mode modifier use the wide instruction +encoding for an extra imm64 value. -eBPF has one 16-byte instruction: ``BPF_LD | BPF_DW | BPF_IMM`` which consists -of two consecutive ``struct bpf_insn`` 8-byte blocks and interpreted as single -instruction that loads 64-bit immediate value into a dst_reg. +There is currently only one such instruction. -Packet access instructions --------------------------- +``BPF_LD | BPF_DW | BPF_IMM`` means:: -eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and -(BPF_IND | <size> | BPF_LD) which are used to access packet data. + dst_reg = imm64 -They had to be carried over from classic BPF to have strong performance of -socket filters running in eBPF interpreter. These instructions can only -be used when interpreter context is a pointer to ``struct sk_buff`` and -have seven implicit operands. Register R6 is an implicit input that must -contain pointer to sk_buff. Register R0 is an implicit output which contains -the data fetched from the packet. Registers R1-R5 are scratch registers -and must not be used to store the data across BPF_ABS | BPF_LD or -BPF_IND | BPF_LD instructions. -These instructions have implicit program exit condition as well. When -eBPF program is trying to access the data beyond the packet boundary, -the interpreter will abort the execution of the program. JIT compilers -therefore must preserve this property. src_reg and imm32 fields are -explicit inputs to these instructions. +Legacy BPF Packet access instructions +------------------------------------- -For example, BPF_IND | BPF_W | BPF_LD means:: +eBPF has special instructions for access to packet data that have been +carried over from classic BPF to retain the performance of legacy socket +filters running in the eBPF interpreter. - R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32)) +The instructions come in two forms: ``BPF_ABS | <size> | BPF_LD`` and +``BPF_IND | <size> | BPF_LD``. -and R1 - R5 are clobbered. +These instructions are used to access packet data and can only be used when +the program context is a pointer to networking packet. ``BPF_ABS`` +accesses packet data at an absolute offset specified by the immediate data +and ``BPF_IND`` access packet data at an offset that includes the value of +a register in addition to the immediate data. + +These instructions have seven implicit operands: + + * Register R6 is an implicit input that must contain pointer to a + struct sk_buff. + * Register R0 is an implicit output which contains the data fetched from + the packet. + * Registers R1-R5 are scratch registers that are clobbered after a call to + ``BPF_ABS | BPF_LD`` or ``BPF_IND`` | BPF_LD instructions. + +These instructions have an implicit program exit condition as well. When an +eBPF program is trying to access the data beyond the packet boundary, the +program execution will be aborted. + +``BPF_ABS | BPF_W | BPF_LD`` means:: + + R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + imm32)) + +``BPF_IND | BPF_W | BPF_LD`` means:: + + R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32)) diff --git a/Documentation/bpf/verifier.rst b/Documentation/bpf/verifier.rst index fae5f6273bac..d4326caf01f9 100644 --- a/Documentation/bpf/verifier.rst +++ b/Documentation/bpf/verifier.rst @@ -329,7 +329,7 @@ Program with unreachable instructions:: BPF_EXIT_INSN(), }; -Error: +Error:: unreachable insn 1 diff --git a/Documentation/devicetree/bindings/i2c/i2c.txt b/Documentation/devicetree/bindings/i2c/i2c.txt index b864916e087f..fc3dd7ec0445 100644 --- a/Documentation/devicetree/bindings/i2c/i2c.txt +++ b/Documentation/devicetree/bindings/i2c/i2c.txt @@ -95,6 +95,10 @@ wants to support one of the below features, it should adapt these bindings. - smbus-alert states that the optional SMBus-Alert feature apply to this bus. +- mctp-controller + indicates that the system is accessible via this bus as an endpoint for + MCTP over I2C transport. + Required properties (per child device) -------------------------------------- diff --git a/Documentation/devicetree/bindings/net/can/allwinner,sun4i-a10-can.yaml b/Documentation/devicetree/bindings/net/can/allwinner,sun4i-a10-can.yaml index c93fe9d3ea82..3c51b2d02957 100644 --- a/Documentation/devicetree/bindings/net/can/allwinner,sun4i-a10-can.yaml +++ b/Documentation/devicetree/bindings/net/can/allwinner,sun4i-a10-can.yaml @@ -10,6 +10,9 @@ maintainers: - Chen-Yu Tsai <wens@csie.org> - Maxime Ripard <mripard@kernel.org> +allOf: + - $ref: can-controller.yaml# + properties: compatible: oneOf: diff --git a/Documentation/devicetree/bindings/net/can/bosch,m_can.yaml b/Documentation/devicetree/bindings/net/can/bosch,m_can.yaml index 401ab7cdb379..b7f9803c1c6d 100644 --- a/Documentation/devicetree/bindings/net/can/bosch,m_can.yaml +++ b/Documentation/devicetree/bindings/net/can/bosch,m_can.yaml @@ -9,7 +9,10 @@ title: Bosch MCAN controller Bindings description: Bosch MCAN controller for CAN bus maintainers: - - Sriram Dash <sriram.dash@samsung.com> + - Chandrasekar Ramakrishnan <rcsekar@samsung.com> + +allOf: + - $ref: can-controller.yaml# properties: compatible: @@ -66,8 +69,8 @@ properties: M_CAN includes the following elements according to user manual: 11-bit Filter 0-128 elements / 0-128 words 29-bit Filter 0-64 elements / 0-128 words - Rx FIFO 0 0-64 elements / 0-1152 words - Rx FIFO 1 0-64 elements / 0-1152 words + Rx FIFO 0 0-64 elements / 0-1152 words + Rx FIFO 1 0-64 elements / 0-1152 words Rx Buffers 0-64 elements / 0-1152 words Tx Event FIFO 0-32 elements / 0-64 words Tx Buffers 0-32 elements / 0-576 words diff --git a/Documentation/devicetree/bindings/net/can/microchip,mcp251xfd.yaml b/Documentation/devicetree/bindings/net/can/microchip,mcp251xfd.yaml index 2a884c1fe0e0..b3826af6bd6e 100644 --- a/Documentation/devicetree/bindings/net/can/microchip,mcp251xfd.yaml +++ b/Documentation/devicetree/bindings/net/can/microchip,mcp251xfd.yaml @@ -11,6 +11,9 @@ title: maintainers: - Marc Kleine-Budde <mkl@pengutronix.de> +allOf: + - $ref: can-controller.yaml# + properties: compatible: oneOf: diff --git a/Documentation/devicetree/bindings/net/can/renesas,rcar-canfd.yaml b/Documentation/devicetree/bindings/net/can/renesas,rcar-canfd.yaml index 546c6e6d2fb0..91a3554ca950 100644 --- a/Documentation/devicetree/bindings/net/can/renesas,rcar-canfd.yaml +++ b/Documentation/devicetree/bindings/net/can/renesas,rcar-canfd.yaml @@ -35,6 +35,8 @@ properties: - renesas,r9a07g044-canfd # RZ/G2{L,LC} - const: renesas,rzg2l-canfd # RZ/G2L family + - const: renesas,r8a779a0-canfd # R-Car V3U + reg: maxItems: 1 diff --git a/Documentation/devicetree/bindings/net/can/xilinx,can.yaml b/Documentation/devicetree/bindings/net/can/xilinx,can.yaml new file mode 100644 index 000000000000..65af8183cb9c --- /dev/null +++ b/Documentation/devicetree/bindings/net/can/xilinx,can.yaml @@ -0,0 +1,161 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/can/xilinx,can.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: + Xilinx Axi CAN/Zynq CANPS controller + +maintainers: + - Appana Durga Kedareswara rao <appana.durga.rao@xilinx.com> + +properties: + compatible: + enum: + - xlnx,zynq-can-1.0 + - xlnx,axi-can-1.00.a + - xlnx,canfd-1.0 + - xlnx,canfd-2.0 + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + clocks: + minItems: 1 + maxItems: 2 + + clock-names: + maxItems: 2 + + power-domains: + maxItems: 1 + + tx-fifo-depth: + $ref: "/schemas/types.yaml#/definitions/uint32" + description: CAN Tx fifo depth (Zynq, Axi CAN). + + rx-fifo-depth: + $ref: "/schemas/types.yaml#/definitions/uint32" + description: CAN Rx fifo depth (Zynq, Axi CAN, CAN FD in sequential Rx mode) + + tx-mailbox-count: + $ref: "/schemas/types.yaml#/definitions/uint32" + description: CAN Tx mailbox buffer count (CAN FD) + +required: + - compatible + - reg + - interrupts + - clocks + - clock-names + +unevaluatedProperties: false + +allOf: + - $ref: can-controller.yaml# + - if: + properties: + compatible: + contains: + enum: + - xlnx,zynq-can-1.0 + + then: + properties: + clock-names: + items: + - const: can_clk + - const: pclk + required: + - tx-fifo-depth + - rx-fifo-depth + + - if: + properties: + compatible: + contains: + enum: + - xlnx,axi-can-1.00.a + + then: + properties: + clock-names: + items: + - const: can_clk + - const: s_axi_aclk + required: + - tx-fifo-depth + - rx-fifo-depth + + - if: + properties: + compatible: + contains: + enum: + - xlnx,canfd-1.0 + - xlnx,canfd-2.0 + + then: + properties: + clock-names: + items: + - const: can_clk + - const: s_axi_aclk + required: + - tx-mailbox-count + - rx-fifo-depth + +examples: + - | + #include <dt-bindings/interrupt-controller/arm-gic.h> + + can@e0008000 { + compatible = "xlnx,zynq-can-1.0"; + reg = <0xe0008000 0x1000>; + clocks = <&clkc 19>, <&clkc 36>; + clock-names = "can_clk", "pclk"; + interrupts = <GIC_SPI 28 IRQ_TYPE_LEVEL_HIGH>; + interrupt-parent = <&intc>; + tx-fifo-depth = <0x40>; + rx-fifo-depth = <0x40>; + }; + + - | + can@40000000 { + compatible = "xlnx,axi-can-1.00.a"; + reg = <0x40000000 0x10000>; + clocks = <&clkc 0>, <&clkc 1>; + clock-names = "can_clk", "s_axi_aclk"; + interrupt-parent = <&intc>; + interrupts = <GIC_SPI 59 IRQ_TYPE_EDGE_RISING>; + tx-fifo-depth = <0x40>; + rx-fifo-depth = <0x40>; + }; + + - | + can@40000000 { + compatible = "xlnx,canfd-1.0"; + reg = <0x40000000 0x2000>; + clocks = <&clkc 0>, <&clkc 1>; + clock-names = "can_clk", "s_axi_aclk"; + interrupt-parent = <&intc>; + interrupts = <GIC_SPI 59 IRQ_TYPE_EDGE_RISING>; + tx-mailbox-count = <0x20>; + rx-fifo-depth = <0x20>; + }; + + - | + can@ff060000 { + compatible = "xlnx,canfd-2.0"; + reg = <0xff060000 0x6000>; + clocks = <&clkc 0>, <&clkc 1>; + clock-names = "can_clk", "s_axi_aclk"; + interrupt-parent = <&intc>; + interrupts = <GIC_SPI 59 IRQ_TYPE_EDGE_RISING>; + tx-mailbox-count = <0x20>; + rx-fifo-depth = <0x40>; + }; diff --git a/Documentation/devicetree/bindings/net/can/xilinx_can.txt b/Documentation/devicetree/bindings/net/can/xilinx_can.txt deleted file mode 100644 index 100cc40b8510..000000000000 --- a/Documentation/devicetree/bindings/net/can/xilinx_can.txt +++ /dev/null @@ -1,61 +0,0 @@ -Xilinx Axi CAN/Zynq CANPS controller Device Tree Bindings ---------------------------------------------------------- - -Required properties: -- compatible : Should be: - - "xlnx,zynq-can-1.0" for Zynq CAN controllers - - "xlnx,axi-can-1.00.a" for Axi CAN controllers - - "xlnx,canfd-1.0" for CAN FD controllers - - "xlnx,canfd-2.0" for CAN FD 2.0 controllers -- reg : Physical base address and size of the controller - registers map. -- interrupts : Property with a value describing the interrupt - number. -- clock-names : List of input clock names - - "can_clk", "pclk" (For CANPS), - - "can_clk", "s_axi_aclk" (For AXI CAN and CAN FD). - (See clock bindings for details). -- clocks : Clock phandles (see clock bindings for details). -- tx-fifo-depth : Can Tx fifo depth (Zynq, Axi CAN). -- rx-fifo-depth : Can Rx fifo depth (Zynq, Axi CAN, CAN FD in - sequential Rx mode). -- tx-mailbox-count : Can Tx mailbox buffer count (CAN FD). -- rx-mailbox-count : Can Rx mailbox buffer count (CAN FD in mailbox Rx - mode). - - -Example: - -For Zynq CANPS Dts file: - zynq_can_0: can@e0008000 { - compatible = "xlnx,zynq-can-1.0"; - clocks = <&clkc 19>, <&clkc 36>; - clock-names = "can_clk", "pclk"; - reg = <0xe0008000 0x1000>; - interrupts = <0 28 4>; - interrupt-parent = <&intc>; - tx-fifo-depth = <0x40>; - rx-fifo-depth = <0x40>; - }; -For Axi CAN Dts file: - axi_can_0: axi-can@40000000 { - compatible = "xlnx,axi-can-1.00.a"; - clocks = <&clkc 0>, <&clkc 1>; - clock-names = "can_clk","s_axi_aclk" ; - reg = <0x40000000 0x10000>; - interrupt-parent = <&intc>; - interrupts = <0 59 1>; - tx-fifo-depth = <0x40>; - rx-fifo-depth = <0x40>; - }; -For CAN FD Dts file: - canfd_0: canfd@40000000 { - compatible = "xlnx,canfd-1.0"; - clocks = <&clkc 0>, <&clkc 1>; - clock-names = "can_clk", "s_axi_aclk"; - reg = <0x40000000 0x2000>; - interrupt-parent = <&intc>; - interrupts = <0 59 1>; - tx-mailbox-count = <0x20>; - rx-fifo-depth = <0x20>; - }; diff --git a/Documentation/devicetree/bindings/net/cdns,macb.yaml b/Documentation/devicetree/bindings/net/cdns,macb.yaml index 8dd06db34169..6cd3d853dcba 100644 --- a/Documentation/devicetree/bindings/net/cdns,macb.yaml +++ b/Documentation/devicetree/bindings/net/cdns,macb.yaml @@ -81,6 +81,25 @@ properties: phy-handle: true + phys: + maxItems: 1 + + phy-names: + const: sgmii-phy + description: + Required with ZynqMP SoC when in SGMII mode. + Should reference PS-GTR generic PHY device for this controller + instance. See ZynqMP example. + + resets: + maxItems: 1 + description: + Recommended with ZynqMP, specify reset control for this + controller instance with zynqmp-reset driver. + + reset-names: + maxItems: 1 + fixed-link: true iommus: @@ -157,3 +176,40 @@ examples: reset-gpios = <&pioE 6 1>; }; }; + + - | + #include <dt-bindings/clock/xlnx-zynqmp-clk.h> + #include <dt-bindings/power/xlnx-zynqmp-power.h> + #include <dt-bindings/reset/xlnx-zynqmp-resets.h> + #include <dt-bindings/phy/phy.h> + + bus { + #address-cells = <2>; + #size-cells = <2>; + gem1: ethernet@ff0c0000 { + compatible = "cdns,zynqmp-gem", "cdns,gem"; + interrupt-parent = <&gic>; + interrupts = <0 59 4>, <0 59 4>; + reg = <0x0 0xff0c0000 0x0 0x1000>; + clocks = <&zynqmp_clk LPD_LSBUS>, <&zynqmp_clk GEM1_REF>, + <&zynqmp_clk GEM1_TX>, <&zynqmp_clk GEM1_RX>, + <&zynqmp_clk GEM_TSU>; + clock-names = "pclk", "hclk", "tx_clk", "rx_clk", "tsu_clk"; + #address-cells = <1>; + #size-cells = <0>; + #stream-id-cells = <1>; + iommus = <&smmu 0x875>; + power-domains = <&zynqmp_firmware PD_ETH_1>; + resets = <&zynqmp_reset ZYNQMP_RESET_GEM1>; + reset-names = "gem1_rst"; + status = "okay"; + phy-mode = "sgmii"; + phy-names = "sgmii-phy"; + phys = <&psgtr 1 PHY_TYPE_SGMII 1 1>; + fixed-link { + speed = <1000>; + full-duplex; + pause; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/net/davicom,dm9051.yaml b/Documentation/devicetree/bindings/net/davicom,dm9051.yaml new file mode 100644 index 000000000000..52e852fef753 --- /dev/null +++ b/Documentation/devicetree/bindings/net/davicom,dm9051.yaml @@ -0,0 +1,62 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/davicom,dm9051.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Davicom DM9051 SPI Ethernet Controller + +maintainers: + - Joseph CHANG <josright123@gmail.com> + +description: | + The DM9051 is a fully integrated and cost-effective low pin count single + chip Fast Ethernet controller with a Serial Peripheral Interface (SPI). + +allOf: + - $ref: ethernet-controller.yaml# + +properties: + compatible: + const: davicom,dm9051 + + reg: + maxItems: 1 + + spi-max-frequency: + maximum: 45000000 + + interrupts: + maxItems: 1 + + local-mac-address: true + + mac-address: true + +required: + - compatible + - reg + - spi-max-frequency + - interrupts + +additionalProperties: false + +examples: + # Raspberry Pi platform + - | + /* for Raspberry Pi with pin control stuff for GPIO irq */ + #include <dt-bindings/interrupt-controller/irq.h> + #include <dt-bindings/gpio/gpio.h> + spi { + #address-cells = <1>; + #size-cells = <0>; + + ethernet@0 { + compatible = "davicom,dm9051"; + reg = <0>; /* spi chip select */ + local-mac-address = [00 00 00 00 00 00]; + interrupt-parent = <&gpio>; + interrupts = <26 IRQ_TYPE_LEVEL_LOW>; + spi-max-frequency = <31200000>; + }; + }; diff --git a/Documentation/devicetree/bindings/net/dsa/dsa-port.yaml b/Documentation/devicetree/bindings/net/dsa/dsa-port.yaml index 702df848a71d..e60867c7c571 100644 --- a/Documentation/devicetree/bindings/net/dsa/dsa-port.yaml +++ b/Documentation/devicetree/bindings/net/dsa/dsa-port.yaml @@ -51,6 +51,8 @@ properties: - edsa - ocelot - ocelot-8021q + - rtl8_4 + - rtl8_4t - seville phy-handle: true diff --git a/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml b/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml index 84985f53bffd..184152087b60 100644 --- a/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml +++ b/Documentation/devicetree/bindings/net/dsa/microchip,ksz.yaml @@ -42,6 +42,12 @@ properties: description: Set if the output SYNCLKO frequency should be set to 125MHz instead of 25MHz. + microchip,synclko-disable: + $ref: /schemas/types.yaml#/definitions/flag + description: + Set if the output SYNCLKO clock should be disabled. Do not mix with + microchip,synclko-125. + required: - compatible - reg diff --git a/Documentation/devicetree/bindings/net/dsa/realtek-smi.txt b/Documentation/devicetree/bindings/net/dsa/realtek-smi.txt deleted file mode 100644 index 7959ec237983..000000000000 --- a/Documentation/devicetree/bindings/net/dsa/realtek-smi.txt +++ /dev/null @@ -1,240 +0,0 @@ -Realtek SMI-based Switches -========================== - -The SMI "Simple Management Interface" is a two-wire protocol using -bit-banged GPIO that while it reuses the MDIO lines MCK and MDIO does -not use the MDIO protocol. This binding defines how to specify the -SMI-based Realtek devices. - -Required properties: - -- compatible: must be exactly one of: - "realtek,rtl8365mb" (4+1 ports) - "realtek,rtl8366" - "realtek,rtl8366rb" (4+1 ports) - "realtek,rtl8366s" (4+1 ports) - "realtek,rtl8367" - "realtek,rtl8367b" - "realtek,rtl8368s" (8 port) - "realtek,rtl8369" - "realtek,rtl8370" (8 port) - -Required properties: -- mdc-gpios: GPIO line for the MDC clock line. -- mdio-gpios: GPIO line for the MDIO data line. -- reset-gpios: GPIO line for the reset signal. - -Optional properties: -- realtek,disable-leds: if the LED drivers are not used in the - hardware design this will disable them so they are not turned on - and wasting power. - -Required subnodes: - -- interrupt-controller - - This defines an interrupt controller with an IRQ line (typically - a GPIO) that will demultiplex and handle the interrupt from the single - interrupt line coming out of one of the SMI-based chips. It most - importantly provides link up/down interrupts to the PHY blocks inside - the ASIC. - -Required properties of interrupt-controller: - -- interrupt: parent interrupt, see interrupt-controller/interrupts.txt -- interrupt-controller: see interrupt-controller/interrupts.txt -- #address-cells: should be <0> -- #interrupt-cells: should be <1> - -- mdio - - This defines the internal MDIO bus of the SMI device, mostly for the - purpose of being able to hook the interrupts to the right PHY and - the right PHY to the corresponding port. - -Required properties of mdio: - -- compatible: should be set to "realtek,smi-mdio" for all SMI devices - -See net/mdio.txt for additional MDIO bus properties. - -See net/dsa/dsa.txt for a list of additional required and optional properties -and subnodes of DSA switches. - -Examples: - -An example for the RTL8366RB: - -switch { - compatible = "realtek,rtl8366rb"; - /* 22 = MDIO (has input reads), 21 = MDC (clock, output only) */ - mdc-gpios = <&gpio0 21 GPIO_ACTIVE_HIGH>; - mdio-gpios = <&gpio0 22 GPIO_ACTIVE_HIGH>; - reset-gpios = <&gpio0 14 GPIO_ACTIVE_LOW>; - - switch_intc: interrupt-controller { - /* GPIO 15 provides the interrupt */ - interrupt-parent = <&gpio0>; - interrupts = <15 IRQ_TYPE_LEVEL_LOW>; - interrupt-controller; - #address-cells = <0>; - #interrupt-cells = <1>; - }; - - ports { - #address-cells = <1>; - #size-cells = <0>; - reg = <0>; - port@0 { - reg = <0>; - label = "lan0"; - phy-handle = <&phy0>; - }; - port@1 { - reg = <1>; - label = "lan1"; - phy-handle = <&phy1>; - }; - port@2 { - reg = <2>; - label = "lan2"; - phy-handle = <&phy2>; - }; - port@3 { - reg = <3>; - label = "lan3"; - phy-handle = <&phy3>; - }; - port@4 { - reg = <4>; - label = "wan"; - phy-handle = <&phy4>; - }; - port@5 { - reg = <5>; - label = "cpu"; - ethernet = <&gmac0>; - phy-mode = "rgmii"; - fixed-link { - speed = <1000>; - full-duplex; - }; - }; - }; - - mdio { - compatible = "realtek,smi-mdio", "dsa-mdio"; - #address-cells = <1>; - #size-cells = <0>; - - phy0: phy@0 { - reg = <0>; - interrupt-parent = <&switch_intc>; - interrupts = <0>; - }; - phy1: phy@1 { - reg = <1>; - interrupt-parent = <&switch_intc>; - interrupts = <1>; - }; - phy2: phy@2 { - reg = <2>; - interrupt-parent = <&switch_intc>; - interrupts = <2>; - }; - phy3: phy@3 { - reg = <3>; - interrupt-parent = <&switch_intc>; - interrupts = <3>; - }; - phy4: phy@4 { - reg = <4>; - interrupt-parent = <&switch_intc>; - interrupts = <12>; - }; - }; -}; - -An example for the RTL8365MB-VC: - -switch { - compatible = "realtek,rtl8365mb"; - mdc-gpios = <&gpio1 16 GPIO_ACTIVE_HIGH>; - mdio-gpios = <&gpio1 17 GPIO_ACTIVE_HIGH>; - reset-gpios = <&gpio5 0 GPIO_ACTIVE_LOW>; - - switch_intc: interrupt-controller { - interrupt-parent = <&gpio5>; - interrupts = <1 IRQ_TYPE_LEVEL_LOW>; - interrupt-controller; - #address-cells = <0>; - #interrupt-cells = <1>; - }; - - ports { - #address-cells = <1>; - #size-cells = <0>; - reg = <0>; - port@0 { - reg = <0>; - label = "swp0"; - phy-handle = <ðphy0>; - }; - port@1 { - reg = <1>; - label = "swp1"; - phy-handle = <ðphy1>; - }; - port@2 { - reg = <2>; - label = "swp2"; - phy-handle = <ðphy2>; - }; - port@3 { - reg = <3>; - label = "swp3"; - phy-handle = <ðphy3>; - }; - port@6 { - reg = <6>; - label = "cpu"; - ethernet = <&fec1>; - phy-mode = "rgmii"; - tx-internal-delay-ps = <2000>; - rx-internal-delay-ps = <2000>; - - fixed-link { - speed = <1000>; - full-duplex; - pause; - }; - }; - }; - - mdio { - compatible = "realtek,smi-mdio"; - #address-cells = <1>; - #size-cells = <0>; - - ethphy0: phy@0 { - reg = <0>; - interrupt-parent = <&switch_intc>; - interrupts = <0>; - }; - ethphy1: phy@1 { - reg = <1>; - interrupt-parent = <&switch_intc>; - interrupts = <1>; - }; - ethphy2: phy@2 { - reg = <2>; - interrupt-parent = <&switch_intc>; - interrupts = <2>; - }; - ethphy3: phy@3 { - reg = <3>; - interrupt-parent = <&switch_intc>; - interrupts = <3>; - }; - }; -}; diff --git a/Documentation/devicetree/bindings/net/dsa/realtek.yaml b/Documentation/devicetree/bindings/net/dsa/realtek.yaml new file mode 100644 index 000000000000..8756060895a8 --- /dev/null +++ b/Documentation/devicetree/bindings/net/dsa/realtek.yaml @@ -0,0 +1,394 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/dsa/realtek.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Realtek switches for unmanaged switches + +allOf: + - $ref: dsa.yaml# + +maintainers: + - Linus Walleij <linus.walleij@linaro.org> + +description: + Realtek advertises these chips as fast/gigabit switches or unmanaged + switches. They can be controlled using different interfaces, like SMI, + MDIO or SPI. + + The SMI "Simple Management Interface" is a two-wire protocol using + bit-banged GPIO that while it reuses the MDIO lines MCK and MDIO does + not use the MDIO protocol. This binding defines how to specify the + SMI-based Realtek devices. The realtek-smi driver is a platform driver + and it must be inserted inside a platform node. + + The MDIO-connected switches use MDIO protocol to access their registers. + The realtek-mdio driver is an MDIO driver and it must be inserted inside + an MDIO node. + +properties: + compatible: + enum: + - realtek,rtl8365mb + - realtek,rtl8366 + - realtek,rtl8366rb + - realtek,rtl8366s + - realtek,rtl8367 + - realtek,rtl8367b + - realtek,rtl8367rb + - realtek,rtl8367s + - realtek,rtl8368s + - realtek,rtl8369 + - realtek,rtl8370 + description: | + realtek,rtl8365mb: 4+1 ports + realtek,rtl8366: 5+1 ports + realtek,rtl8366rb: 5+1 ports + realtek,rtl8366s: 5+1 ports + realtek,rtl8367: + realtek,rtl8367b: + realtek,rtl8367rb: 5+2 ports + realtek,rtl8367s: 5+2 ports + realtek,rtl8368s: 8 ports + realtek,rtl8369: 8+1 ports + realtek,rtl8370: 8+2 ports + + mdc-gpios: + description: GPIO line for the MDC clock line. + maxItems: 1 + + mdio-gpios: + description: GPIO line for the MDIO data line. + maxItems: 1 + + reset-gpios: + description: GPIO to be used to reset the whole device + maxItems: 1 + + realtek,disable-leds: + type: boolean + description: | + if the LED drivers are not used in the hardware design, + this will disable them so they are not turned on + and wasting power. + + interrupt-controller: + type: object + description: | + This defines an interrupt controller with an IRQ line (typically + a GPIO) that will demultiplex and handle the interrupt from the single + interrupt line coming out of one of the Realtek switch chips. It most + importantly provides link up/down interrupts to the PHY blocks inside + the ASIC. + + properties: + + interrupt-controller: true + + interrupts: + maxItems: 1 + description: + A single IRQ line from the switch, either active LOW or HIGH + + '#address-cells': + const: 0 + + '#interrupt-cells': + const: 1 + + required: + - interrupt-controller + - '#address-cells' + - '#interrupt-cells' + + mdio: + $ref: /schemas/net/mdio.yaml# + unevaluatedProperties: false + + properties: + compatible: + const: realtek,smi-mdio + +if: + required: + - reg + +then: + not: + required: + - mdc-gpios + - mdio-gpios + - mdio + + properties: + mdc-gpios: false + mdio-gpios: false + mdio: false + +else: + required: + - mdc-gpios + - mdio-gpios + - mdio + - reset-gpios + +required: + - compatible + + # - mdc-gpios + # - mdio-gpios + # - reset-gpios + # - mdio + +unevaluatedProperties: false + +examples: + - | + #include <dt-bindings/gpio/gpio.h> + #include <dt-bindings/interrupt-controller/irq.h> + + platform { + switch { + compatible = "realtek,rtl8366rb"; + /* 22 = MDIO (has input reads), 21 = MDC (clock, output only) */ + mdc-gpios = <&gpio0 21 GPIO_ACTIVE_HIGH>; + mdio-gpios = <&gpio0 22 GPIO_ACTIVE_HIGH>; + reset-gpios = <&gpio0 14 GPIO_ACTIVE_LOW>; + + switch_intc1: interrupt-controller { + /* GPIO 15 provides the interrupt */ + interrupt-parent = <&gpio0>; + interrupts = <15 IRQ_TYPE_LEVEL_LOW>; + interrupt-controller; + #address-cells = <0>; + #interrupt-cells = <1>; + }; + + ports { + #address-cells = <1>; + #size-cells = <0>; + port@0 { + reg = <0>; + label = "lan0"; + phy-handle = <&phy0>; + }; + port@1 { + reg = <1>; + label = "lan1"; + phy-handle = <&phy1>; + }; + port@2 { + reg = <2>; + label = "lan2"; + phy-handle = <&phy2>; + }; + port@3 { + reg = <3>; + label = "lan3"; + phy-handle = <&phy3>; + }; + port@4 { + reg = <4>; + label = "wan"; + phy-handle = <&phy4>; + }; + port@5 { + reg = <5>; + label = "cpu"; + ethernet = <&gmac0>; + phy-mode = "rgmii"; + fixed-link { + speed = <1000>; + full-duplex; + }; + }; + }; + + mdio { + compatible = "realtek,smi-mdio"; + #address-cells = <1>; + #size-cells = <0>; + + phy0: ethernet-phy@0 { + reg = <0>; + interrupt-parent = <&switch_intc1>; + interrupts = <0>; + }; + phy1: ethernet-phy@1 { + reg = <1>; + interrupt-parent = <&switch_intc1>; + interrupts = <1>; + }; + phy2: ethernet-phy@2 { + reg = <2>; + interrupt-parent = <&switch_intc1>; + interrupts = <2>; + }; + phy3: ethernet-phy@3 { + reg = <3>; + interrupt-parent = <&switch_intc1>; + interrupts = <3>; + }; + phy4: ethernet-phy@4 { + reg = <4>; + interrupt-parent = <&switch_intc1>; + interrupts = <12>; + }; + }; + }; + }; + + - | + #include <dt-bindings/gpio/gpio.h> + #include <dt-bindings/interrupt-controller/irq.h> + + platform { + switch { + compatible = "realtek,rtl8365mb"; + mdc-gpios = <&gpio1 16 GPIO_ACTIVE_HIGH>; + mdio-gpios = <&gpio1 17 GPIO_ACTIVE_HIGH>; + reset-gpios = <&gpio5 0 GPIO_ACTIVE_LOW>; + + switch_intc2: interrupt-controller { + interrupt-parent = <&gpio5>; + interrupts = <1 IRQ_TYPE_LEVEL_LOW>; + interrupt-controller; + #address-cells = <0>; + #interrupt-cells = <1>; + }; + + ports { + #address-cells = <1>; + #size-cells = <0>; + port@0 { + reg = <0>; + label = "swp0"; + phy-handle = <ðphy0>; + }; + port@1 { + reg = <1>; + label = "swp1"; + phy-handle = <ðphy1>; + }; + port@2 { + reg = <2>; + label = "swp2"; + phy-handle = <ðphy2>; + }; + port@3 { + reg = <3>; + label = "swp3"; + phy-handle = <ðphy3>; + }; + port@6 { + reg = <6>; + label = "cpu"; + ethernet = <&fec1>; + phy-mode = "rgmii"; + tx-internal-delay-ps = <2000>; + rx-internal-delay-ps = <2000>; + + fixed-link { + speed = <1000>; + full-duplex; + pause; + }; + }; + }; + + mdio { + compatible = "realtek,smi-mdio"; + #address-cells = <1>; + #size-cells = <0>; + + ethphy0: ethernet-phy@0 { + reg = <0>; + interrupt-parent = <&switch_intc2>; + interrupts = <0>; + }; + ethphy1: ethernet-phy@1 { + reg = <1>; + interrupt-parent = <&switch_intc2>; + interrupts = <1>; + }; + ethphy2: ethernet-phy@2 { + reg = <2>; + interrupt-parent = <&switch_intc2>; + interrupts = <2>; + }; + ethphy3: ethernet-phy@3 { + reg = <3>; + interrupt-parent = <&switch_intc2>; + interrupts = <3>; + }; + }; + }; + }; + + - | + #include <dt-bindings/gpio/gpio.h> + #include <dt-bindings/interrupt-controller/irq.h> + + mdio { + #address-cells = <1>; + #size-cells = <0>; + + switch@29 { + compatible = "realtek,rtl8367s"; + reg = <29>; + + reset-gpios = <&gpio2 20 GPIO_ACTIVE_LOW>; + + switch_intc3: interrupt-controller { + interrupt-parent = <&gpio0>; + interrupts = <11 IRQ_TYPE_EDGE_FALLING>; + interrupt-controller; + #address-cells = <0>; + #interrupt-cells = <1>; + }; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + port@0 { + reg = <0>; + label = "lan4"; + }; + + port@1 { + reg = <1>; + label = "lan3"; + }; + + port@2 { + reg = <2>; + label = "lan2"; + }; + + port@3 { + reg = <3>; + label = "lan1"; + }; + + port@4 { + reg = <4>; + label = "wan"; + }; + + port@7 { + reg = <7>; + ethernet = <ðernet>; + phy-mode = "rgmii"; + tx-internal-delay-ps = <2000>; + rx-internal-delay-ps = <0>; + + fixed-link { + speed = <1000>; + full-duplex; + }; + }; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/net/fsl-fman.txt b/Documentation/devicetree/bindings/net/fsl-fman.txt index 020337f3c05f..801efc7d6818 100644 --- a/Documentation/devicetree/bindings/net/fsl-fman.txt +++ b/Documentation/devicetree/bindings/net/fsl-fman.txt @@ -388,14 +388,24 @@ PROPERTIES Value type: <prop-encoded-array> Definition: A standard property. -- bus-frequency +- clocks + Usage: optional + Value type: <phandle> + Definition: A reference to the input clock of the controller + from which the MDC frequency is derived. + +- clock-frequency Usage: optional Value type: <u32> - Definition: Specifies the external MDIO bus clock speed to - be used, if different from the standard 2.5 MHz. - This may be due to the standard speed being unsupported (e.g. - due to a hardware problem), or to advertise that all relevant - components in the system support a faster speed. + Definition: Specifies the external MDC frequency, in Hertz, to + be used. Requires that the input clock is specified in the + "clocks" property. See also: mdio.yaml. + +- suppress-preamble + Usage: optional + Value type: <boolean> + Definition: Disable generation of preamble bits. See also: + mdio.yaml. - interrupts Usage: required for external MDIO diff --git a/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt b/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt index 691f886cfc4a..2bf31572b08d 100644 --- a/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt +++ b/Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt @@ -5,6 +5,7 @@ Required properties: "marvell,armada-370-neta" "marvell,armada-xp-neta" "marvell,armada-3700-neta" + "marvell,armada-ac5-neta" - reg: address and length of the register set for the device. - interrupts: interrupt for the device - phy: See ethernet.txt file in the same directory. diff --git a/Documentation/devicetree/bindings/net/mctp-i2c-controller.yaml b/Documentation/devicetree/bindings/net/mctp-i2c-controller.yaml new file mode 100644 index 000000000000..afd11c9422fa --- /dev/null +++ b/Documentation/devicetree/bindings/net/mctp-i2c-controller.yaml @@ -0,0 +1,92 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/mctp-i2c-controller.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: MCTP I2C transport binding + +maintainers: + - Matt Johnston <matt@codeconstruct.com.au> + +description: | + An mctp-i2c-controller defines a local MCTP endpoint on an I2C controller. + MCTP I2C is specified by DMTF DSP0237. + + An mctp-i2c-controller must be attached to an I2C adapter which supports + slave functionality. I2C busses (either directly or as subordinate mux + busses) are attached to the mctp-i2c-controller with a 'mctp-controller' + property on each used bus. Each mctp-controller I2C bus will be presented + to the host system as a separate MCTP I2C instance. + +properties: + compatible: + const: mctp-i2c-controller + + reg: + minimum: 0x40000000 + maximum: 0x4000007f + description: | + 7 bit I2C address of the local endpoint. + I2C_OWN_SLAVE_ADDRESS (1<<30) flag must be set. + +additionalProperties: false + +required: + - compatible + - reg + +examples: + - | + // Basic case of a single I2C bus + #include <dt-bindings/i2c/i2c.h> + + i2c { + #address-cells = <1>; + #size-cells = <0>; + mctp-controller; + + mctp@30 { + compatible = "mctp-i2c-controller"; + reg = <(0x30 | I2C_OWN_SLAVE_ADDRESS)>; + }; + }; + + - | + // Mux topology with multiple MCTP-handling busses under + // a single mctp-i2c-controller. + // i2c1 and i2c6 can have MCTP devices, i2c5 does not. + #include <dt-bindings/i2c/i2c.h> + + i2c1: i2c { + #address-cells = <1>; + #size-cells = <0>; + mctp-controller; + + mctp@50 { + compatible = "mctp-i2c-controller"; + reg = <(0x50 | I2C_OWN_SLAVE_ADDRESS)>; + }; + }; + + i2c-mux { + #address-cells = <1>; + #size-cells = <0>; + i2c-parent = <&i2c1>; + + i2c5: i2c@0 { + #address-cells = <1>; + #size-cells = <0>; + reg = <0>; + eeprom@33 { + reg = <0x33>; + }; + }; + + i2c6: i2c@1 { + #address-cells = <1>; + #size-cells = <0>; + reg = <1>; + mctp-controller; + }; + }; diff --git a/Documentation/devicetree/bindings/net/mediatek-dwmac.txt b/Documentation/devicetree/bindings/net/mediatek-dwmac.txt deleted file mode 100644 index afbcaebf062e..000000000000 --- a/Documentation/devicetree/bindings/net/mediatek-dwmac.txt +++ /dev/null @@ -1,91 +0,0 @@ -MediaTek DWMAC glue layer controller - -This file documents platform glue layer for stmmac. -Please see stmmac.txt for the other unchanged properties. - -The device node has following properties. - -Required properties: -- compatible: Should be "mediatek,mt2712-gmac" for MT2712 SoC -- reg: Address and length of the register set for the device -- interrupts: Should contain the MAC interrupts -- interrupt-names: Should contain a list of interrupt names corresponding to - the interrupts in the interrupts property, if available. - Should be "macirq" for the main MAC IRQ -- clocks: Must contain a phandle for each entry in clock-names. -- clock-names: The name of the clock listed in the clocks property. These are - "axi", "apb", "mac_main", "ptp_ref", "rmii_internal" for MT2712 SoC. -- mac-address: See ethernet.txt in the same directory -- phy-mode: See ethernet.txt in the same directory -- mediatek,pericfg: A phandle to the syscon node that control ethernet - interface and timing delay. - -Optional properties: -- mediatek,tx-delay-ps: TX clock delay macro value. Default is 0. - It should be defined for RGMII/MII interface. - It should be defined for RMII interface when the reference clock is from MT2712 SoC. -- mediatek,rx-delay-ps: RX clock delay macro value. Default is 0. - It should be defined for RGMII/MII interface. - It should be defined for RMII interface. -Both delay properties need to be a multiple of 170 for RGMII interface, -or will round down. Range 0~31*170. -Both delay properties need to be a multiple of 550 for MII/RMII interface, -or will round down. Range 0~31*550. - -- mediatek,rmii-rxc: boolean property, if present indicates that the RMII - reference clock, which is from external PHYs, is connected to RXC pin - on MT2712 SoC. - Otherwise, is connected to TXC pin. -- mediatek,rmii-clk-from-mac: boolean property, if present indicates that - MT2712 SoC provides the RMII reference clock, which outputs to TXC pin only. -- mediatek,txc-inverse: boolean property, if present indicates that - 1. tx clock will be inversed in MII/RGMII case, - 2. tx clock inside MAC will be inversed relative to reference clock - which is from external PHYs in RMII case, and it rarely happen. - 3. the reference clock, which outputs to TXC pin will be inversed in RMII case - when the reference clock is from MT2712 SoC. -- mediatek,rxc-inverse: boolean property, if present indicates that - 1. rx clock will be inversed in MII/RGMII case. - 2. reference clock will be inversed when arrived at MAC in RMII case, when - the reference clock is from external PHYs. - 3. the inside clock, which be sent to MAC, will be inversed in RMII case when - the reference clock is from MT2712 SoC. -- assigned-clocks: mac_main and ptp_ref clocks -- assigned-clock-parents: parent clocks of the assigned clocks - -Example: - eth: ethernet@1101c000 { - compatible = "mediatek,mt2712-gmac"; - reg = <0 0x1101c000 0 0x1300>; - interrupts = <GIC_SPI 237 IRQ_TYPE_LEVEL_LOW>; - interrupt-names = "macirq"; - phy-mode ="rgmii-rxid"; - mac-address = [00 55 7b b5 7d f7]; - clock-names = "axi", - "apb", - "mac_main", - "ptp_ref", - "rmii_internal"; - clocks = <&pericfg CLK_PERI_GMAC>, - <&pericfg CLK_PERI_GMAC_PCLK>, - <&topckgen CLK_TOP_ETHER_125M_SEL>, - <&topckgen CLK_TOP_ETHER_50M_SEL>, - <&topckgen CLK_TOP_ETHER_50M_RMII_SEL>; - assigned-clocks = <&topckgen CLK_TOP_ETHER_125M_SEL>, - <&topckgen CLK_TOP_ETHER_50M_SEL>, - <&topckgen CLK_TOP_ETHER_50M_RMII_SEL>; - assigned-clock-parents = <&topckgen CLK_TOP_ETHERPLL_125M>, - <&topckgen CLK_TOP_APLL1_D3>, - <&topckgen CLK_TOP_ETHERPLL_50M>; - power-domains = <&scpsys MT2712_POWER_DOMAIN_AUDIO>; - mediatek,pericfg = <&pericfg>; - mediatek,tx-delay-ps = <1530>; - mediatek,rx-delay-ps = <1530>; - mediatek,rmii-rxc; - mediatek,txc-inverse; - mediatek,rxc-inverse; - snps,txpbl = <1>; - snps,rxpbl = <1>; - snps,reset-gpio = <&pio 87 GPIO_ACTIVE_LOW>; - snps,reset-active-low; - }; diff --git a/Documentation/devicetree/bindings/net/mediatek-dwmac.yaml b/Documentation/devicetree/bindings/net/mediatek-dwmac.yaml new file mode 100644 index 000000000000..901944683322 --- /dev/null +++ b/Documentation/devicetree/bindings/net/mediatek-dwmac.yaml @@ -0,0 +1,175 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/mediatek-dwmac.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: MediaTek DWMAC glue layer controller + +maintainers: + - Biao Huang <biao.huang@mediatek.com> + +description: + This file documents platform glue layer for stmmac. + +# We need a select here so we don't match all nodes with 'snps,dwmac' +select: + properties: + compatible: + contains: + enum: + - mediatek,mt2712-gmac + - mediatek,mt8195-gmac + required: + - compatible + +allOf: + - $ref: "snps,dwmac.yaml#" + +properties: + compatible: + oneOf: + - items: + - enum: + - mediatek,mt2712-gmac + - const: snps,dwmac-4.20a + - items: + - enum: + - mediatek,mt8195-gmac + - const: snps,dwmac-5.10a + + clocks: + minItems: 5 + items: + - description: AXI clock + - description: APB clock + - description: MAC Main clock + - description: PTP clock + - description: RMII reference clock provided by MAC + - description: MAC clock gate + + clock-names: + minItems: 5 + items: + - const: axi + - const: apb + - const: mac_main + - const: ptp_ref + - const: rmii_internal + - const: mac_cg + + mediatek,pericfg: + $ref: /schemas/types.yaml#/definitions/phandle + description: + The phandle to the syscon node that control ethernet + interface and timing delay. + + mediatek,tx-delay-ps: + description: + The internal TX clock delay (provided by this driver) in nanoseconds. + For MT2712 RGMII interface, Allowed value need to be a multiple of 170, + or will round down. Range 0~31*170. + For MT2712 RMII/MII interface, Allowed value need to be a multiple of 550, + or will round down. Range 0~31*550. + For MT8195 RGMII/RMII/MII interface, Allowed value need to be a multiple of 290, + or will round down. Range 0~31*290. + + mediatek,rx-delay-ps: + description: + The internal RX clock delay (provided by this driver) in nanoseconds. + For MT2712 RGMII interface, Allowed value need to be a multiple of 170, + or will round down. Range 0~31*170. + For MT2712 RMII/MII interface, Allowed value need to be a multiple of 550, + or will round down. Range 0~31*550. + For MT8195 RGMII/RMII/MII interface, Allowed value need to be a multiple + of 290, or will round down. Range 0~31*290. + + mediatek,rmii-rxc: + type: boolean + description: + If present, indicates that the RMII reference clock, which is from external + PHYs, is connected to RXC pin. Otherwise, is connected to TXC pin. + + mediatek,rmii-clk-from-mac: + type: boolean + description: + If present, indicates that MAC provides the RMII reference clock, which + outputs to TXC pin only. + + mediatek,txc-inverse: + type: boolean + description: + If present, indicates that + 1. tx clock will be inversed in MII/RGMII case, + 2. tx clock inside MAC will be inversed relative to reference clock + which is from external PHYs in RMII case, and it rarely happen. + 3. the reference clock, which outputs to TXC pin will be inversed in RMII case + when the reference clock is from MAC. + + mediatek,rxc-inverse: + type: boolean + description: + If present, indicates that + 1. rx clock will be inversed in MII/RGMII case. + 2. reference clock will be inversed when arrived at MAC in RMII case, when + the reference clock is from external PHYs. + 3. the inside clock, which be sent to MAC, will be inversed in RMII case when + the reference clock is from MAC. + + mediatek,mac-wol: + type: boolean + description: + If present, indicates that MAC supports WOL(Wake-On-LAN), and MAC WOL will be enabled. + Otherwise, PHY WOL is perferred. + +required: + - compatible + - reg + - interrupts + - interrupt-names + - clocks + - clock-names + - phy-mode + - mediatek,pericfg + +unevaluatedProperties: false + +examples: + - | + #include <dt-bindings/clock/mt2712-clk.h> + #include <dt-bindings/gpio/gpio.h> + #include <dt-bindings/interrupt-controller/arm-gic.h> + #include <dt-bindings/interrupt-controller/irq.h> + #include <dt-bindings/power/mt2712-power.h> + + eth: ethernet@1101c000 { + compatible = "mediatek,mt2712-gmac", "snps,dwmac-4.20a"; + reg = <0x1101c000 0x1300>; + interrupts = <GIC_SPI 237 IRQ_TYPE_LEVEL_LOW>; + interrupt-names = "macirq"; + phy-mode ="rgmii-rxid"; + mac-address = [00 55 7b b5 7d f7]; + clock-names = "axi", + "apb", + "mac_main", + "ptp_ref", + "rmii_internal"; + clocks = <&pericfg CLK_PERI_GMAC>, + <&pericfg CLK_PERI_GMAC_PCLK>, + <&topckgen CLK_TOP_ETHER_125M_SEL>, + <&topckgen CLK_TOP_ETHER_50M_SEL>, + <&topckgen CLK_TOP_ETHER_50M_RMII_SEL>; + assigned-clocks = <&topckgen CLK_TOP_ETHER_125M_SEL>, + <&topckgen CLK_TOP_ETHER_50M_SEL>, + <&topckgen CLK_TOP_ETHER_50M_RMII_SEL>; + assigned-clock-parents = <&topckgen CLK_TOP_ETHERPLL_125M>, + <&topckgen CLK_TOP_APLL1_D3>, + <&topckgen CLK_TOP_ETHERPLL_50M>; + power-domains = <&scpsys MT2712_POWER_DOMAIN_AUDIO>; + mediatek,pericfg = <&pericfg>; + mediatek,tx-delay-ps = <1530>; + snps,txpbl = <1>; + snps,rxpbl = <1>; + snps,reset-gpio = <&pio 87 GPIO_ACTIVE_LOW>; + snps,reset-delays-us = <0 10000 10000>; + }; diff --git a/Documentation/devicetree/bindings/net/micrel.txt b/Documentation/devicetree/bindings/net/micrel.txt index 8d157f0295a5..c5ab62c39133 100644 --- a/Documentation/devicetree/bindings/net/micrel.txt +++ b/Documentation/devicetree/bindings/net/micrel.txt @@ -45,3 +45,20 @@ Optional properties: In fiber mode, auto-negotiation is disabled and the PHY can only work in 100base-fx (full and half duplex) modes. + + - lan8814,ignore-ts: If present the PHY will not support timestamping. + + This option acts as check whether Timestamping is supported by + hardware or not. LAN8814 phy support hardware tmestamping. + + - lan8814,latency_rx_10: Configures Latency value of phy in ingress at 10 Mbps. + + - lan8814,latency_tx_10: Configures Latency value of phy in egress at 10 Mbps. + + - lan8814,latency_rx_100: Configures Latency value of phy in ingress at 100 Mbps. + + - lan8814,latency_tx_100: Configures Latency value of phy in egress at 100 Mbps. + + - lan8814,latency_rx_1000: Configures Latency value of phy in ingress at 1000 Mbps. + + - lan8814,latency_tx_1000: Configures Latency value of phy in egress at 1000 Mbps. diff --git a/Documentation/devicetree/bindings/net/microchip,lan966x-switch.yaml b/Documentation/devicetree/bindings/net/microchip,lan966x-switch.yaml index e79e4e166ad8..13812768b923 100644 --- a/Documentation/devicetree/bindings/net/microchip,lan966x-switch.yaml +++ b/Documentation/devicetree/bindings/net/microchip,lan966x-switch.yaml @@ -38,6 +38,7 @@ properties: - description: register based extraction - description: frame dma based extraction - description: analyzer interrupt + - description: ptp interrupt interrupt-names: minItems: 1 @@ -45,6 +46,7 @@ properties: - const: xtr - const: fdma - const: ana + - const: ptp resets: items: diff --git a/Documentation/devicetree/bindings/net/microchip,sparx5-switch.yaml b/Documentation/devicetree/bindings/net/microchip,sparx5-switch.yaml index 347b912a46bb..6c86d3d85e99 100644 --- a/Documentation/devicetree/bindings/net/microchip,sparx5-switch.yaml +++ b/Documentation/devicetree/bindings/net/microchip,sparx5-switch.yaml @@ -53,12 +53,14 @@ properties: items: - description: register based extraction - description: frame dma based extraction + - description: ptp interrupt interrupt-names: minItems: 1 items: - const: xtr - const: fdma + - const: ptp resets: items: diff --git a/Documentation/devicetree/bindings/net/mscc-miim.txt b/Documentation/devicetree/bindings/net/mscc-miim.txt index 7104679cf59d..70e0cb1ee485 100644 --- a/Documentation/devicetree/bindings/net/mscc-miim.txt +++ b/Documentation/devicetree/bindings/net/mscc-miim.txt @@ -2,7 +2,7 @@ Microsemi MII Management Controller (MIIM) / MDIO ================================================= Properties: -- compatible: must be "mscc,ocelot-miim" +- compatible: must be "mscc,ocelot-miim" or "microchip,lan966x-miim" - reg: The base address of the MDIO bus controller register bank. Optionally, a second register bank can be defined if there is an associated reset register for internal PHYs diff --git a/Documentation/devicetree/bindings/net/renesas,etheravb.yaml b/Documentation/devicetree/bindings/net/renesas,etheravb.yaml index bda821065a2b..ee2ccacc39ff 100644 --- a/Documentation/devicetree/bindings/net/renesas,etheravb.yaml +++ b/Documentation/devicetree/bindings/net/renesas,etheravb.yaml @@ -45,8 +45,10 @@ properties: - items: - enum: + - renesas,r9a07g043-gbeth # RZ/G2UL - renesas,r9a07g044-gbeth # RZ/G2{L,LC} - - const: renesas,rzg2l-gbeth # RZ/G2L + - renesas,r9a07g054-gbeth # RZ/V2L + - const: renesas,rzg2l-gbeth # RZ/{G2L,G2UL,V2L} family reg: true diff --git a/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.yaml b/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.yaml index 269cd63fb544..3216c999f0e1 100644 --- a/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.yaml +++ b/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.yaml @@ -18,7 +18,7 @@ description: | wireless device. The node is expected to be specified as a child node of the PCI controller to which the wireless chip is connected. Alternatively, it can specify the wireless part of the MT7628/MT7688 - or MT7622 SoC. + or MT7622/MT7986 SoC. allOf: - $ref: ieee80211.yaml# @@ -29,9 +29,13 @@ properties: - mediatek,mt76 - mediatek,mt7628-wmac - mediatek,mt7622-wmac + - mediatek,mt7986-wmac reg: - maxItems: 1 + minItems: 1 + maxItems: 3 + description: + MT7986 should contain 3 regions consys, dcm, and sku, in this order. interrupts: maxItems: 1 @@ -39,6 +43,17 @@ properties: power-domains: maxItems: 1 + memory-region: + maxItems: 1 + + resets: + maxItems: 1 + description: + Specify the consys reset for mt7986. + + reset-name: + const: consys + mediatek,infracfg: $ref: /schemas/types.yaml#/definitions/phandle description: @@ -69,6 +84,15 @@ properties: calibration data is generic and specific calibration data should be pulled from the OTP ROM + mediatek,disable-radar-background: + type: boolean + description: + Disable/enable radar/CAC detection running on a dedicated offchannel + chain available on some hw. + Background radar/CAC detection allows to avoid the CAC downtime + switching on a different channel during CAC detection on the selected + radar channel. + led: type: object $ref: /schemas/leds/common.yaml# @@ -165,7 +189,7 @@ required: - compatible - reg -additionalProperties: false +unevaluatedProperties: false examples: - | @@ -231,3 +255,15 @@ examples: power-domains = <&scpsys 3>; }; + + - | + wifi@18000000 { + compatible = "mediatek,mt7986-wmac"; + resets = <&watchdog 23>; + reset-names = "consys"; + reg = <0x18000000 0x1000000>, + <0x10003000 0x1000>, + <0x11d10000 0x1000>; + interrupts = <GIC_SPI 213 IRQ_TYPE_LEVEL_HIGH>; + memory-region = <&wmcpu_emi>; + }; diff --git a/Documentation/devicetree/bindings/phy/fsl,lynx-28g.yaml b/Documentation/devicetree/bindings/phy/fsl,lynx-28g.yaml new file mode 100644 index 000000000000..4d91e2f4f247 --- /dev/null +++ b/Documentation/devicetree/bindings/phy/fsl,lynx-28g.yaml @@ -0,0 +1,40 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/phy/fsl,lynx-28g.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Freescale Lynx 28G SerDes PHY binding + +maintainers: + - Ioana Ciornei <ioana.ciornei@nxp.com> + +properties: + compatible: + enum: + - fsl,lynx-28g + + reg: + maxItems: 1 + + "#phy-cells": + const: 1 + +required: + - compatible + - reg + - "#phy-cells" + +additionalProperties: false + +examples: + - | + soc { + #address-cells = <2>; + #size-cells = <2>; + serdes_1: phy@1ea0000 { + compatible = "fsl,lynx-28g"; + reg = <0x0 0x1ea0000 0x0 0x1e30>; + #phy-cells = <1>; + }; + }; diff --git a/Documentation/devicetree/bindings/phy/transmit-amplitude.yaml b/Documentation/devicetree/bindings/phy/transmit-amplitude.yaml new file mode 100644 index 000000000000..51492fe738ec --- /dev/null +++ b/Documentation/devicetree/bindings/phy/transmit-amplitude.yaml @@ -0,0 +1,103 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/phy/transmit-amplitude.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Common PHY and network PCS transmit amplitude property binding + +description: + Binding describing the peak-to-peak transmit amplitude for common PHYs + and network PCSes. + +maintainers: + - Marek BehĂșn <kabel@kernel.org> + +properties: + tx-p2p-microvolt: + description: + Transmit amplitude voltages in microvolts, peak-to-peak. If this property + contains multiple values for various PHY modes, the + 'tx-p2p-microvolt-names' property must be provided and contain + corresponding mode names. + + tx-p2p-microvolt-names: + description: | + Names of the modes corresponding to voltages in the 'tx-p2p-microvolt' + property. Required only if multiple voltages are provided. + + If a value of 'default' is provided, the system should use it for any PHY + mode that is otherwise not defined here. If 'default' is not provided, the + system should use manufacturer default value. + minItems: 1 + maxItems: 16 + items: + enum: + - default + + # ethernet modes + - sgmii + - qsgmii + - xgmii + - 1000base-x + - 2500base-x + - 5gbase-r + - rxaui + - xaui + - 10gbase-kr + - usxgmii + - 10gbase-r + - 25gbase-r + + # PCIe modes + - pcie + - pcie1 + - pcie2 + - pcie3 + - pcie4 + - pcie5 + - pcie6 + + # USB modes + - usb + - usb-ls + - usb-fs + - usb-hs + - usb-ss + - usb-ss+ + - usb-4 + + # storage modes + - sata + - ufs-hs + - ufs-hs-a + - ufs-hs-b + + # display modes + - lvds + - dp + - dp-rbr + - dp-hbr + - dp-hbr2 + - dp-hbr3 + - dp-uhbr-10 + - dp-uhbr-13.5 + - dp-uhbr-20 + + # camera modes + - mipi-dphy + - mipi-dphy-univ + - mipi-dphy-v2.5-univ + +dependencies: + tx-p2p-microvolt-names: [ tx-p2p-microvolt ] + +additionalProperties: true + +examples: + - | + phy: phy { + #phy-cells = <1>; + tx-p2p-microvolt = <915000>, <1100000>, <1200000>; + tx-p2p-microvolt-names = "2500base-x", "usb-hs", "usb-ss"; + }; diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst index ab98373535ea..525e6842dd33 100644 --- a/Documentation/networking/bonding.rst +++ b/Documentation/networking/bonding.rst @@ -313,6 +313,17 @@ arp_ip_target maximum number of targets that can be specified is 16. The default value is no IP addresses. +ns_ip6_target + + Specifies the IPv6 addresses to use as IPv6 monitoring peers when + arp_interval is > 0. These are the targets of the NS request + sent to determine the health of the link to the targets. + Specify these values in ffff:ffff::ffff:ffff format. Multiple IPv6 + addresses must be separated by a comma. At least one IPv6 + address must be given for NS/NA monitoring to function. The + maximum number of targets that can be specified is 16. The + default value is no IPv6 addresses. + arp_validate Specifies whether or not ARP probes and replies should be diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst index 443123772f44..c17cdb079611 100644 --- a/Documentation/networking/devlink/index.rst +++ b/Documentation/networking/devlink/index.rst @@ -4,6 +4,22 @@ Linux Devlink Documentation devlink is an API to expose device information and resources not directly related to any device class, such as chip-wide/switch-ASIC-wide configuration. +Locking +------- + +Driver facing APIs are currently transitioning to allow more explicit +locking. Drivers can use the existing ``devlink_*`` set of APIs, or +new APIs prefixed by ``devl_*``. The older APIs handle all the locking +in devlink core, but don't allow registration of most sub-objects once +the main devlink object is itself registered. The newer ``devl_*`` APIs assume +the devlink instance lock is already held. Drivers can take the instance +lock by calling ``devl_lock()``. It is also held in most of the callbacks. +Eventually all callbacks will be invoked under the devlink instance lock, +refer to the use of the ``DEVLINK_NL_FLAG_NO_LOCK`` flag in devlink core +to find out which callbacks are not converted, yet. + +Drivers are encouraged to use the devlink instance lock for their own needs. + Interface documentation ----------------------- diff --git a/Documentation/networking/dsa/sja1105.rst b/Documentation/networking/dsa/sja1105.rst index 29b1bae0cf00..e0219c1452ab 100644 --- a/Documentation/networking/dsa/sja1105.rst +++ b/Documentation/networking/dsa/sja1105.rst @@ -293,6 +293,33 @@ of dropped frames, which is a sum of frames dropped due to timing violations, lack of destination ports and MTU enforcement checks). Byte-level counters are not available. +Limitations +=========== + +The SJA1105 switch family always performs VLAN processing. When configured as +VLAN-unaware, frames carry a different VLAN tag internally, depending on +whether the port is standalone or under a VLAN-unaware bridge. + +The virtual link keys are always fixed at {MAC DA, VLAN ID, VLAN PCP}, but the +driver asks for the VLAN ID and VLAN PCP when the port is under a VLAN-aware +bridge. Otherwise, it fills in the VLAN ID and PCP automatically, based on +whether the port is standalone or in a VLAN-unaware bridge, and accepts only +"VLAN-unaware" tc-flower keys (MAC DA). + +The existing tc-flower keys that are offloaded using virtual links are no +longer operational after one of the following happens: + +- port was standalone and joins a bridge (VLAN-aware or VLAN-unaware) +- port is part of a bridge whose VLAN awareness state changes +- port was part of a bridge and becomes standalone +- port was standalone, but another port joins a VLAN-aware bridge and this + changes the global VLAN awareness state of the bridge + +The driver cannot veto all these operations, and it cannot update/remove the +existing tc-flower filters either. So for proper operation, the tc-flower +filters should be installed only after the forwarding configuration of the port +has been made, and removed by user space before making any changes to it. + Device Tree bindings and board design ===================================== diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst index 9d98e0511249..24d9be69065d 100644 --- a/Documentation/networking/ethtool-netlink.rst +++ b/Documentation/networking/ethtool-netlink.rst @@ -860,8 +860,17 @@ Kernel response contents: ``ETHTOOL_A_RINGS_RX_JUMBO`` u32 size of RX jumbo ring ``ETHTOOL_A_RINGS_TX`` u32 size of TX ring ``ETHTOOL_A_RINGS_RX_BUF_LEN`` u32 size of buffers on the ring + ``ETHTOOL_A_RINGS_TCP_DATA_SPLIT`` u8 TCP header / data split + ``ETHTOOL_A_RINGS_CQE_SIZE`` u32 Size of TX/RX CQE ==================================== ====== =========================== +``ETHTOOL_A_RINGS_TCP_DATA_SPLIT`` indicates whether the device is usable with +page-flipping TCP zero-copy receive (``getsockopt(TCP_ZEROCOPY_RECEIVE)``). +If enabled the device is configured to place frame headers and data into +separate buffers. The device configuration must make it possible to receive +full memory pages of data, for example because MTU is high enough or through +HW-GRO. + RINGS_SET ========= @@ -877,6 +886,7 @@ Request contents: ``ETHTOOL_A_RINGS_RX_JUMBO`` u32 size of RX jumbo ring ``ETHTOOL_A_RINGS_TX`` u32 size of TX ring ``ETHTOOL_A_RINGS_RX_BUF_LEN`` u32 size of buffers on the ring + ``ETHTOOL_A_RINGS_CQE_SIZE`` u32 Size of TX/RX CQE ==================================== ====== =========================== Kernel checks that requested ring sizes do not exceed limits reported by @@ -884,6 +894,15 @@ driver. Driver may impose additional constraints and may not suspport all attributes. +``ETHTOOL_A_RINGS_CQE_SIZE`` specifies the completion queue event size. +Completion queue events(CQE) are the events posted by NIC to indicate the +completion status of a packet when the packet is sent(like send success or +error) or received(like pointers to packet fragments). The CQE size parameter +enables to modify the CQE size other than default size if NIC supports it. +A bigger CQE can have more receive buffer pointers inturn NIC can transfer +a bigger frame from wire. Based on the NIC hardware, the overall completion +queue size can be adjusted in the driver if CQE size is modified. + CHANNELS_GET ============ diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index 58bc8cd367c6..ce017136ab05 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -96,6 +96,7 @@ Contents: sctp secid seg6-sysctl + smc-sysctl statistics strparser switchdev diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index 2572eecc3e86..b0024aa7b051 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -878,6 +878,29 @@ tcp_min_tso_segs - INTEGER Default: 2 +tcp_tso_rtt_log - INTEGER + Adjustment of TSO packet sizes based on min_rtt + + Starting from linux-5.18, TCP autosizing can be tweaked + for flows having small RTT. + + Old autosizing was splitting the pacing budget to send 1024 TSO + per second. + + tso_packet_size = sk->sk_pacing_rate / 1024; + + With the new mechanism, we increase this TSO sizing using: + + distance = min_rtt_usec / (2^tcp_tso_rtt_log) + tso_packet_size += gso_max_size >> distance; + + This means that flows between very close hosts can use bigger + TSO packets, reducing their cpu costs. + + If you want to use the old autosizing, set this sysctl to 0. + + Default: 9 (2^9 = 512 usec) + tcp_pacing_ss_ratio - INTEGER sk->sk_pacing_rate is set by TCP stack using a ratio applied to current rate. (current_rate = cwnd * mss / srtt) diff --git a/Documentation/networking/mctp.rst b/Documentation/networking/mctp.rst index 46f74bffce0f..c628cb5406d2 100644 --- a/Documentation/networking/mctp.rst +++ b/Documentation/networking/mctp.rst @@ -212,6 +212,54 @@ remote address is already known, or the message does not require a reply. Like the send calls, sockets will only receive responses to requests they have sent (TO=1) and may only respond (TO=0) to requests they have received. +``ioctl(SIOCMCTPALLOCTAG)`` and ``ioctl(SIOCMCTPDROPTAG)`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +These tags give applications more control over MCTP message tags, by allocating +(and dropping) tag values explicitly, rather than the kernel automatically +allocating a per-message tag at ``sendmsg()`` time. + +In general, you will only need to use these ioctls if your MCTP protocol does +not fit the usual request/response model. For example, if you need to persist +tags across multiple requests, or a request may generate more than one response. +In these cases, the ioctls allow you to decouple the tag allocation (and +release) from individual message send and receive operations. + +Both ioctls are passed a pointer to a ``struct mctp_ioc_tag_ctl``: + +.. code-block:: C + + struct mctp_ioc_tag_ctl { + mctp_eid_t peer_addr; + __u8 tag; + __u16 flags; + }; + +``SIOCMCTPALLOCTAG`` allocates a tag for a specific peer, which an application +can use in future ``sendmsg()`` calls. The application populates the +``peer_addr`` member with the remote EID. Other fields must be zero. + +On return, the ``tag`` member will be populated with the allocated tag value. +The allocated tag will have the following tag bits set: + + - ``MCTP_TAG_OWNER``: it only makes sense to allocate tags if you're the tag + owner + + - ``MCTP_TAG_PREALLOC``: to indicate to ``sendmsg()`` that this is a + preallocated tag. + + - ... and the actual tag value, within the least-significant three bits + (``MCTP_TAG_MASK``). Note that zero is a valid tag value. + +The tag value should be used as-is for the ``smctp_tag`` member of ``struct +sockaddr_mctp``. + +``SIOCMCTPDROPTAG`` releases a tag that has been previously allocated by a +``SIOCMCTPALLOCTAG`` ioctl. The ``peer_addr`` must be the same as used for the +allocation, and the ``tag`` value must match exactly the tag returned from the +allocation (including the ``MCTP_TAG_OWNER`` and ``MCTP_TAG_PREALLOC`` bits). +The ``flags`` field must be zero. + Kernel internals ================ diff --git a/Documentation/networking/page_pool.rst b/Documentation/networking/page_pool.rst index a147591ce203..5db8c263b0c6 100644 --- a/Documentation/networking/page_pool.rst +++ b/Documentation/networking/page_pool.rst @@ -105,6 +105,47 @@ a page will cause no race conditions is enough. Please note the caller must not use data area after running page_pool_put_page_bulk(), as this function overwrites it. +* page_pool_get_stats(): Retrieve statistics about the page_pool. This API + is only available if the kernel has been configured with + ``CONFIG_PAGE_POOL_STATS=y``. A pointer to a caller allocated ``struct + page_pool_stats`` structure is passed to this API which is filled in. The + caller can then report those stats to the user (perhaps via ethtool, + debugfs, etc.). See below for an example usage of this API. + +Stats API and structures +------------------------ +If the kernel is configured with ``CONFIG_PAGE_POOL_STATS=y``, the API +``page_pool_get_stats()`` and structures described below are available. It +takes a pointer to a ``struct page_pool`` and a pointer to a ``struct +page_pool_stats`` allocated by the caller. + +The API will fill in the provided ``struct page_pool_stats`` with +statistics about the page_pool. + +The stats structure has the following fields:: + + struct page_pool_stats { + struct page_pool_alloc_stats alloc_stats; + struct page_pool_recycle_stats recycle_stats; + }; + + +The ``struct page_pool_alloc_stats`` has the following fields: + * ``fast``: successful fast path allocations + * ``slow``: slow path order-0 allocations + * ``slow_high_order``: slow path high order allocations + * ``empty``: ptr ring is empty, so a slow path allocation was forced. + * ``refill``: an allocation which triggered a refill of the cache + * ``waive``: pages obtained from the ptr ring that cannot be added to + the cache due to a NUMA mismatch. + +The ``struct page_pool_recycle_stats`` has the following fields: + * ``cached``: recycling placed page in the page pool cache + * ``cache_full``: page pool cache was full + * ``ring``: page placed into the ptr ring + * ``ring_full``: page released from page pool because the ptr ring was full + * ``released_refcnt``: page released (and not recycled) because refcnt > 1 + Coding examples =============== @@ -157,6 +198,21 @@ NAPI poller } } +Stats +----- + +.. code-block:: c + + #ifdef CONFIG_PAGE_POOL_STATS + /* retrieve stats */ + struct page_pool_stats stats = { 0 }; + if (page_pool_get_stats(page_pool, &stats)) { + /* perhaps the driver reports statistics with ethool */ + ethtool_print_allocation_stats(&stats.alloc_stats); + ethtool_print_recycle_stats(&stats.recycle_stats); + } + #endif + Driver unload ------------- diff --git a/Documentation/networking/smc-sysctl.rst b/Documentation/networking/smc-sysctl.rst new file mode 100644 index 000000000000..0987fd1bc220 --- /dev/null +++ b/Documentation/networking/smc-sysctl.rst @@ -0,0 +1,23 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========== +SMC Sysctl +========== + +/proc/sys/net/smc/* Variables +============================= + +autocorking_size - INTEGER + Setting SMC auto corking size: + SMC auto corking is like TCP auto corking from the application's + perspective of view. When applications do consecutive small + write()/sendmsg() system calls, we try to coalesce these small writes + as much as possible, to lower total amount of CDC and RDMA Write been + sent. + autocorking_size limits the maximum corked bytes that can be sent to + the under device in 1 single sending. If set to 0, the SMC auto corking + is disabled. + Applications can still use TCP_CORK for optimal behavior when they + know how/when to uncork their sockets. + + Default: 64K diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst index f5809206eb93..be4eb1242057 100644 --- a/Documentation/networking/timestamping.rst +++ b/Documentation/networking/timestamping.rst @@ -668,7 +668,7 @@ timestamping: (through another RX timestamping FIFO). Deferral on RX is typically necessary when retrieving the timestamp needs a sleepable context. In that case, it is the responsibility of the DSA driver to call - ``netif_rx_ni()`` on the freshly timestamped skb. + ``netif_rx()`` on the freshly timestamped skb. 3.2.2 Ethernet PHYs ^^^^^^^^^^^^^^^^^^^ diff --git a/Documentation/trace/fprobe.rst b/Documentation/trace/fprobe.rst new file mode 100644 index 000000000000..b64bec1ce144 --- /dev/null +++ b/Documentation/trace/fprobe.rst @@ -0,0 +1,174 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================================== +Fprobe - Function entry/exit probe +================================== + +.. Author: Masami Hiramatsu <mhiramat@kernel.org> + +Introduction +============ + +Fprobe is a function entry/exit probe mechanism based on ftrace. +Instead of using ftrace full feature, if you only want to attach callbacks +on function entry and exit, similar to the kprobes and kretprobes, you can +use fprobe. Compared with kprobes and kretprobes, fprobe gives faster +instrumentation for multiple functions with single handler. This document +describes how to use fprobe. + +The usage of fprobe +=================== + +The fprobe is a wrapper of ftrace (+ kretprobe-like return callback) to +attach callbacks to multiple function entry and exit. User needs to set up +the `struct fprobe` and pass it to `register_fprobe()`. + +Typically, `fprobe` data structure is initialized with the `entry_handler` +and/or `exit_handler` as below. + +.. code-block:: c + + struct fprobe fp = { + .entry_handler = my_entry_callback, + .exit_handler = my_exit_callback, + }; + +To enable the fprobe, call one of register_fprobe(), register_fprobe_ips(), and +register_fprobe_syms(). These functions register the fprobe with different types +of parameters. + +The register_fprobe() enables a fprobe by function-name filters. +E.g. this enables @fp on "func*()" function except "func2()".:: + + register_fprobe(&fp, "func*", "func2"); + +The register_fprobe_ips() enables a fprobe by ftrace-location addresses. +E.g. + +.. code-block:: c + + unsigned long ips[] = { 0x.... }; + + register_fprobe_ips(&fp, ips, ARRAY_SIZE(ips)); + +And the register_fprobe_syms() enables a fprobe by symbol names. +E.g. + +.. code-block:: c + + char syms[] = {"func1", "func2", "func3"}; + + register_fprobe_syms(&fp, syms, ARRAY_SIZE(syms)); + +To disable (remove from functions) this fprobe, call:: + + unregister_fprobe(&fp); + +You can temporally (soft) disable the fprobe by:: + + disable_fprobe(&fp); + +and resume by:: + + enable_fprobe(&fp); + +The above is defined by including the header:: + + #include <linux/fprobe.h> + +Same as ftrace, the registered callbacks will start being called some time +after the register_fprobe() is called and before it returns. See +:file:`Documentation/trace/ftrace.rst`. + +Also, the unregister_fprobe() will guarantee that the both enter and exit +handlers are no longer being called by functions after unregister_fprobe() +returns as same as unregister_ftrace_function(). + +The fprobe entry/exit handler +============================= + +The prototype of the entry/exit callback function is as follows: + +.. code-block:: c + + void callback_func(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs); + +Note that both entry and exit callbacks have same ptototype. The @entry_ip is +saved at function entry and passed to exit handler. + +@fp + This is the address of `fprobe` data structure related to this handler. + You can embed the `fprobe` to your data structure and get it by + container_of() macro from @fp. The @fp must not be NULL. + +@entry_ip + This is the ftrace address of the traced function (both entry and exit). + Note that this may not be the actual entry address of the function but + the address where the ftrace is instrumented. + +@regs + This is the `pt_regs` data structure at the entry and exit. Note that + the instruction pointer of @regs may be different from the @entry_ip + in the entry_handler. If you need traced instruction pointer, you need + to use @entry_ip. On the other hand, in the exit_handler, the instruction + pointer of @regs is set to the currect return address. + +Share the callbacks with kprobes +================================ + +Since the recursion safeness of the fprobe (and ftrace) is a bit different +from the kprobes, this may cause an issue if user wants to run the same +code from the fprobe and the kprobes. + +Kprobes has per-cpu 'current_kprobe' variable which protects the kprobe +handler from recursion in all cases. On the other hand, fprobe uses +only ftrace_test_recursion_trylock(). This allows interrupt context to +call another (or same) fprobe while the fprobe user handler is running. + +This is not a matter if the common callback code has its own recursion +detection, or it can handle the recursion in the different contexts +(normal/interrupt/NMI.) +But if it relies on the 'current_kprobe' recursion lock, it has to check +kprobe_running() and use kprobe_busy_*() APIs. + +Fprobe has FPROBE_FL_KPROBE_SHARED flag to do this. If your common callback +code will be shared with kprobes, please set FPROBE_FL_KPROBE_SHARED +*before* registering the fprobe, like: + +.. code-block:: c + + fprobe.flags = FPROBE_FL_KPROBE_SHARED; + + register_fprobe(&fprobe, "func*", NULL); + +This will protect your common callback from the nested call. + +The missed counter +================== + +The `fprobe` data structure has `fprobe::nmissed` counter field as same as +kprobes. +This counter counts up when; + + - fprobe fails to take ftrace_recursion lock. This usually means that a function + which is traced by other ftrace users is called from the entry_handler. + + - fprobe fails to setup the function exit because of the shortage of rethook + (the shadow stack for hooking the function return.) + +The `fprobe::nmissed` field counts up in both cases. Therefore, the former +skips both of entry and exit callback and the latter skips the exit +callback, but in both case the counter will increase by 1. + +Note that if you set the FTRACE_OPS_FL_RECURSION and/or FTRACE_OPS_FL_RCU to +`fprobe::ops::flags` (ftrace_ops::flags) when registering the fprobe, this +counter may not work correctly, because ftrace skips the fprobe function which +increase the counter. + + +Functions and structures +======================== + +.. kernel-doc:: include/linux/fprobe.h +.. kernel-doc:: kernel/trace/fprobe.c + diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst index 3a47aa8341c6..f9b7bcb5a630 100644 --- a/Documentation/trace/index.rst +++ b/Documentation/trace/index.rst @@ -9,6 +9,7 @@ Linux Tracing Technologies tracepoint-analysis ftrace ftrace-uses + fprobe kprobes kprobetrace uprobetracer |