diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2020-03-31 17:29:33 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2020-03-31 17:29:33 -0700 |
commit | 29d9f30d4ce6c7a38745a54a8cddface10013490 (patch) | |
tree | 85649ba6a7b39203584d8db9365e03f64e62c136 /Documentation | |
parent | 56a451b780676bc1cdac011735fe2869fa2e9abf (diff) | |
parent | 7f80ccfe996871ca69648efee74a60ae7ad0dcd9 (diff) | |
download | linux-29d9f30d4ce6c7a38745a54a8cddface10013490.tar.bz2 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:
"Highlights:
1) Fix the iwlwifi regression, from Johannes Berg.
2) Support BSS coloring and 802.11 encapsulation offloading in
hardware, from John Crispin.
3) Fix some potential Spectre issues in qtnfmac, from Sergey
Matyukevich.
4) Add TTL decrement action to openvswitch, from Matteo Croce.
5) Allow paralleization through flow_action setup by not taking the
RTNL mutex, from Vlad Buslov.
6) A lot of zero-length array to flexible-array conversions, from
Gustavo A. R. Silva.
7) Align XDP statistics names across several drivers for consistency,
from Lorenzo Bianconi.
8) Add various pieces of infrastructure for offloading conntrack, and
make use of it in mlx5 driver, from Paul Blakey.
9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki.
10) Lots of parallelization improvements during configuration changes
in mlxsw driver, from Ido Schimmel.
11) Add support to devlink for generic packet traps, which report
packets dropped during ACL processing. And use them in mlxsw
driver. From Jiri Pirko.
12) Support bcmgenet on ACPI, from Jeremy Linton.
13) Make BPF compatible with RT, from Thomas Gleixnet, Alexei
Starovoitov, and your's truly.
14) Support XDP meta-data in virtio_net, from Yuya Kusakabe.
15) Fix sysfs permissions when network devices change namespaces, from
Christian Brauner.
16) Add a flags element to ethtool_ops so that drivers can more simply
indicate which coalescing parameters they actually support, and
therefore the generic layer can validate the user's ethtool
request. Use this in all drivers, from Jakub Kicinski.
17) Offload FIFO qdisc in mlxsw, from Petr Machata.
18) Support UDP sockets in sockmap, from Lorenz Bauer.
19) Fix stretch ACK bugs in several TCP congestion control modules,
from Pengcheng Yang.
20) Support virtual functiosn in octeontx2 driver, from Tomasz
Duszynski.
21) Add region operations for devlink and use it in ice driver to dump
NVM contents, from Jacob Keller.
22) Add support for hw offload of MACSEC, from Antoine Tenart.
23) Add support for BPF programs that can be attached to LSM hooks,
from KP Singh.
24) Support for multiple paths, path managers, and counters in MPTCP.
From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti,
and others.
25) More progress on adding the netlink interface to ethtool, from
Michal Kubecek"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits)
net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline
cxgb4/chcr: nic-tls stats in ethtool
net: dsa: fix oops while probing Marvell DSA switches
net/bpfilter: remove superfluous testing message
net: macb: Fix handling of fixed-link node
net: dsa: ksz: Select KSZ protocol tag
netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write
net: stmmac: add EHL 2.5Gbps PCI info and PCI ID
net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID
net: stmmac: create dwmac-intel.c to contain all Intel platform
net: dsa: bcm_sf2: Support specifying VLAN tag egress rule
net: dsa: bcm_sf2: Add support for matching VLAN TCI
net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions
net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT
net: dsa: bcm_sf2: Disable learning for ASP port
net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge
net: dsa: b53: Prevent tagged VLAN on port 7 for 7278
net: dsa: b53: Restore VLAN entries upon (re)configuration
net: dsa: bcm_sf2: Fix overflow checks
hv_netvsc: Remove unnecessary round_up for recv_completion_cnt
...
Diffstat (limited to 'Documentation')
34 files changed, 2235 insertions, 115 deletions
diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst index 287b98708a40..e043c9213388 100644 --- a/Documentation/admin-guide/sysctl/net.rst +++ b/Documentation/admin-guide/sysctl/net.rst @@ -67,7 +67,8 @@ two flavors of JITs, the newer eBPF JIT currently supported on: - sparc64 - mips64 - s390x - - riscv + - riscv64 + - riscv32 And the older cBPF JIT supported on the following archs: diff --git a/Documentation/bpf/bpf_devel_QA.rst b/Documentation/bpf/bpf_devel_QA.rst index c9856b927055..38c15c6fcb14 100644 --- a/Documentation/bpf/bpf_devel_QA.rst +++ b/Documentation/bpf/bpf_devel_QA.rst @@ -20,11 +20,11 @@ Reporting bugs Q: How do I report bugs for BPF kernel code? -------------------------------------------- A: Since all BPF kernel development as well as bpftool and iproute2 BPF -loader development happens through the netdev kernel mailing list, +loader development happens through the bpf kernel mailing list, please report any found issues around BPF to the following mailing list: - netdev@vger.kernel.org + bpf@vger.kernel.org This may also include issues related to XDP, BPF tracing, etc. @@ -46,17 +46,12 @@ Submitting patches Q: To which mailing list do I need to submit my BPF patches? ------------------------------------------------------------ -A: Please submit your BPF patches to the netdev kernel mailing list: +A: Please submit your BPF patches to the bpf kernel mailing list: - netdev@vger.kernel.org - -Historically, BPF came out of networking and has always been maintained -by the kernel networking community. Although these days BPF touches -many other subsystems as well, the patches are still routed mainly -through the networking community. + bpf@vger.kernel.org In case your patch has changes in various different subsystems (e.g. -tracing, security, etc), make sure to Cc the related kernel mailing +networking, tracing, security, etc), make sure to Cc the related kernel mailing lists and maintainers from there as well, so they are able to review the changes and provide their Acked-by's to the patches. @@ -168,7 +163,7 @@ a BPF point of view. Be aware that this is not a final verdict that the patch will automatically get accepted into net or net-next trees eventually: -On the netdev kernel mailing list reviews can come in at any point +On the bpf kernel mailing list reviews can come in at any point in time. If discussions around a patch conclude that they cannot get included as-is, we will either apply a follow-up fix or drop them from the trees entirely. Therefore, we also reserve to rebase @@ -494,15 +489,15 @@ A: You need cmake and gcc-c++ as build requisites for LLVM. Once you have that set up, proceed with building the latest LLVM and clang version from the git repositories:: - $ git clone http://llvm.org/git/llvm.git - $ cd llvm/tools - $ git clone --depth 1 http://llvm.org/git/clang.git - $ cd ..; mkdir build; cd build - $ cmake .. -DLLVM_TARGETS_TO_BUILD="BPF;X86" \ + $ git clone https://github.com/llvm/llvm-project.git + $ mkdir -p llvm-project/llvm/build/install + $ cd llvm-project/llvm/build + $ cmake .. -G "Ninja" -DLLVM_TARGETS_TO_BUILD="BPF;X86" \ + -DLLVM_ENABLE_PROJECTS="clang" \ -DBUILD_SHARED_LIBS=OFF \ -DCMAKE_BUILD_TYPE=Release \ -DLLVM_BUILD_RUNTIME=OFF - $ make -j $(getconf _NPROCESSORS_ONLN) + $ ninja The built binaries can then be found in the build/bin/ directory, where you can point the PATH variable to. diff --git a/Documentation/bpf/bpf_lsm.rst b/Documentation/bpf/bpf_lsm.rst new file mode 100644 index 000000000000..1c0a75a51d79 --- /dev/null +++ b/Documentation/bpf/bpf_lsm.rst @@ -0,0 +1,142 @@ +.. SPDX-License-Identifier: GPL-2.0+ +.. Copyright (C) 2020 Google LLC. + +================ +LSM BPF Programs +================ + +These BPF programs allow runtime instrumentation of the LSM hooks by privileged +users to implement system-wide MAC (Mandatory Access Control) and Audit +policies using eBPF. + +Structure +--------- + +The example shows an eBPF program that can be attached to the ``file_mprotect`` +LSM hook: + +.. c:function:: int file_mprotect(struct vm_area_struct *vma, unsigned long reqprot, unsigned long prot); + +Other LSM hooks which can be instrumented can be found in +``include/linux/lsm_hooks.h``. + +eBPF programs that use :doc:`/bpf/btf` do not need to include kernel headers +for accessing information from the attached eBPF program's context. They can +simply declare the structures in the eBPF program and only specify the fields +that need to be accessed. + +.. code-block:: c + + struct mm_struct { + unsigned long start_brk, brk, start_stack; + } __attribute__((preserve_access_index)); + + struct vm_area_struct { + unsigned long start_brk, brk, start_stack; + unsigned long vm_start, vm_end; + struct mm_struct *vm_mm; + } __attribute__((preserve_access_index)); + + +.. note:: The order of the fields is irrelevant. + +This can be further simplified (if one has access to the BTF information at +build time) by generating the ``vmlinux.h`` with: + +.. code-block:: console + + # bpftool btf dump file <path-to-btf-vmlinux> format c > vmlinux.h + +.. note:: ``path-to-btf-vmlinux`` can be ``/sys/kernel/btf/vmlinux`` if the + build environment matches the environment the BPF programs are + deployed in. + +The ``vmlinux.h`` can then simply be included in the BPF programs without +requiring the definition of the types. + +The eBPF programs can be declared using the``BPF_PROG`` +macros defined in `tools/lib/bpf/bpf_tracing.h`_. In this +example: + + * ``"lsm/file_mprotect"`` indicates the LSM hook that the program must + be attached to + * ``mprotect_audit`` is the name of the eBPF program + +.. code-block:: c + + SEC("lsm/file_mprotect") + int BPF_PROG(mprotect_audit, struct vm_area_struct *vma, + unsigned long reqprot, unsigned long prot, int ret) + { + /* ret is the return value from the previous BPF program + * or 0 if it's the first hook. + */ + if (ret != 0) + return ret; + + int is_heap; + + is_heap = (vma->vm_start >= vma->vm_mm->start_brk && + vma->vm_end <= vma->vm_mm->brk); + + /* Return an -EPERM or write information to the perf events buffer + * for auditing + */ + if (is_heap) + return -EPERM; + } + +The ``__attribute__((preserve_access_index))`` is a clang feature that allows +the BPF verifier to update the offsets for the access at runtime using the +:doc:`/bpf/btf` information. Since the BPF verifier is aware of the types, it +also validates all the accesses made to the various types in the eBPF program. + +Loading +------- + +eBPF programs can be loaded with the :manpage:`bpf(2)` syscall's +``BPF_PROG_LOAD`` operation: + +.. code-block:: c + + struct bpf_object *obj; + + obj = bpf_object__open("./my_prog.o"); + bpf_object__load(obj); + +This can be simplified by using a skeleton header generated by ``bpftool``: + +.. code-block:: console + + # bpftool gen skeleton my_prog.o > my_prog.skel.h + +and the program can be loaded by including ``my_prog.skel.h`` and using +the generated helper, ``my_prog__open_and_load``. + +Attachment to LSM Hooks +----------------------- + +The LSM allows attachment of eBPF programs as LSM hooks using :manpage:`bpf(2)` +syscall's ``BPF_RAW_TRACEPOINT_OPEN`` operation or more simply by +using the libbpf helper ``bpf_program__attach_lsm``. + +The program can be detached from the LSM hook by *destroying* the ``link`` +link returned by ``bpf_program__attach_lsm`` using ``bpf_link__destroy``. + +One can also use the helpers generated in ``my_prog.skel.h`` i.e. +``my_prog__attach`` for attachment and ``my_prog__destroy`` for cleaning up. + +Examples +-------- + +An example eBPF program can be found in +`tools/testing/selftests/bpf/progs/lsm.c`_ and the corresponding +userspace code in `tools/testing/selftests/bpf/prog_tests/test_lsm.c`_ + +.. Links +.. _tools/lib/bpf/bpf_tracing.h: + https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/lib/bpf/bpf_tracing.h +.. _tools/testing/selftests/bpf/progs/lsm.c: + https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/testing/selftests/bpf/progs/lsm.c +.. _tools/testing/selftests/bpf/prog_tests/test_lsm.c: + https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/testing/selftests/bpf/prog_tests/test_lsm.c diff --git a/Documentation/bpf/drgn.rst b/Documentation/bpf/drgn.rst new file mode 100644 index 000000000000..41f223c3161e --- /dev/null +++ b/Documentation/bpf/drgn.rst @@ -0,0 +1,213 @@ +.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) + +============== +BPF drgn tools +============== + +drgn scripts is a convenient and easy to use mechanism to retrieve arbitrary +kernel data structures. drgn is not relying on kernel UAPI to read the data. +Instead it's reading directly from ``/proc/kcore`` or vmcore and pretty prints +the data based on DWARF debug information from vmlinux. + +This document describes BPF related drgn tools. + +See `drgn/tools`_ for all tools available at the moment and `drgn/doc`_ for +more details on drgn itself. + +bpf_inspect.py +-------------- + +Description +=========== + +`bpf_inspect.py`_ is a tool intended to inspect BPF programs and maps. It can +iterate over all programs and maps in the system and print basic information +about these objects, including id, type and name. + +The main use-case `bpf_inspect.py`_ covers is to show BPF programs of types +``BPF_PROG_TYPE_EXT`` and ``BPF_PROG_TYPE_TRACING`` attached to other BPF +programs via ``freplace``/``fentry``/``fexit`` mechanisms, since there is no +user-space API to get this information. + +Getting started +=============== + +List BPF programs (full names are obtained from BTF):: + + % sudo bpf_inspect.py prog + 27: BPF_PROG_TYPE_TRACEPOINT tracepoint__tcp__tcp_send_reset + 4632: BPF_PROG_TYPE_CGROUP_SOCK_ADDR tw_ipt_bind + 49464: BPF_PROG_TYPE_RAW_TRACEPOINT raw_tracepoint__sched_process_exit + +List BPF maps:: + + % sudo bpf_inspect.py map + 2577: BPF_MAP_TYPE_HASH tw_ipt_vips + 4050: BPF_MAP_TYPE_STACK_TRACE stack_traces + 4069: BPF_MAP_TYPE_PERCPU_ARRAY ned_dctcp_cntr + +Find BPF programs attached to BPF program ``test_pkt_access``:: + + % sudo bpf_inspect.py p | grep test_pkt_access + 650: BPF_PROG_TYPE_SCHED_CLS test_pkt_access + 654: BPF_PROG_TYPE_TRACING test_main linked:[650->25: BPF_TRAMP_FEXIT test_pkt_access->test_pkt_access()] + 655: BPF_PROG_TYPE_TRACING test_subprog1 linked:[650->29: BPF_TRAMP_FEXIT test_pkt_access->test_pkt_access_subprog1()] + 656: BPF_PROG_TYPE_TRACING test_subprog2 linked:[650->31: BPF_TRAMP_FEXIT test_pkt_access->test_pkt_access_subprog2()] + 657: BPF_PROG_TYPE_TRACING test_subprog3 linked:[650->21: BPF_TRAMP_FEXIT test_pkt_access->test_pkt_access_subprog3()] + 658: BPF_PROG_TYPE_EXT new_get_skb_len linked:[650->16: BPF_TRAMP_REPLACE test_pkt_access->get_skb_len()] + 659: BPF_PROG_TYPE_EXT new_get_skb_ifindex linked:[650->23: BPF_TRAMP_REPLACE test_pkt_access->get_skb_ifindex()] + 660: BPF_PROG_TYPE_EXT new_get_constant linked:[650->19: BPF_TRAMP_REPLACE test_pkt_access->get_constant()] + +It can be seen that there is a program ``test_pkt_access``, id 650 and there +are multiple other tracing and ext programs attached to functions in +``test_pkt_access``. + +For example the line:: + + 658: BPF_PROG_TYPE_EXT new_get_skb_len linked:[650->16: BPF_TRAMP_REPLACE test_pkt_access->get_skb_len()] + +, means that BPF program id 658, type ``BPF_PROG_TYPE_EXT``, name +``new_get_skb_len`` replaces (``BPF_TRAMP_REPLACE``) function ``get_skb_len()`` +that has BTF id 16 in BPF program id 650, name ``test_pkt_access``. + +Getting help: + +.. code-block:: none + + % sudo bpf_inspect.py + usage: bpf_inspect.py [-h] {prog,p,map,m} ... + + drgn script to list BPF programs or maps and their properties + unavailable via kernel API. + + See https://github.com/osandov/drgn/ for more details on drgn. + + optional arguments: + -h, --help show this help message and exit + + subcommands: + {prog,p,map,m} + prog (p) list BPF programs + map (m) list BPF maps + +Customization +============= + +The script is intended to be customized by developers to print relevant +information about BPF programs, maps and other objects. + +For example, to print ``struct bpf_prog_aux`` for BPF program id 53077: + +.. code-block:: none + + % git diff + diff --git a/tools/bpf_inspect.py b/tools/bpf_inspect.py + index 650e228..aea2357 100755 + --- a/tools/bpf_inspect.py + +++ b/tools/bpf_inspect.py + @@ -112,7 +112,9 @@ def list_bpf_progs(args): + if linked: + linked = f" linked:[{linked}]" + + - print(f"{id_:>6}: {type_:32} {name:32} {linked}") + + if id_ == 53077: + + print(f"{id_:>6}: {type_:32} {name:32}") + + print(f"{bpf_prog.aux}") + + + def list_bpf_maps(args): + +It produces the output:: + + % sudo bpf_inspect.py p + 53077: BPF_PROG_TYPE_XDP tw_xdp_policer + *(struct bpf_prog_aux *)0xffff8893fad4b400 = { + .refcnt = (atomic64_t){ + .counter = (long)58, + }, + .used_map_cnt = (u32)1, + .max_ctx_offset = (u32)8, + .max_pkt_offset = (u32)15, + .max_tp_access = (u32)0, + .stack_depth = (u32)8, + .id = (u32)53077, + .func_cnt = (u32)0, + .func_idx = (u32)0, + .attach_btf_id = (u32)0, + .linked_prog = (struct bpf_prog *)0x0, + .verifier_zext = (bool)0, + .offload_requested = (bool)0, + .attach_btf_trace = (bool)0, + .func_proto_unreliable = (bool)0, + .trampoline_prog_type = (enum bpf_tramp_prog_type)BPF_TRAMP_FENTRY, + .trampoline = (struct bpf_trampoline *)0x0, + .tramp_hlist = (struct hlist_node){ + .next = (struct hlist_node *)0x0, + .pprev = (struct hlist_node **)0x0, + }, + .attach_func_proto = (const struct btf_type *)0x0, + .attach_func_name = (const char *)0x0, + .func = (struct bpf_prog **)0x0, + .jit_data = (void *)0x0, + .poke_tab = (struct bpf_jit_poke_descriptor *)0x0, + .size_poke_tab = (u32)0, + .ksym_tnode = (struct latch_tree_node){ + .node = (struct rb_node [2]){ + { + .__rb_parent_color = (unsigned long)18446612956263126665, + .rb_right = (struct rb_node *)0x0, + .rb_left = (struct rb_node *)0xffff88a0be3d0088, + }, + { + .__rb_parent_color = (unsigned long)18446612956263126689, + .rb_right = (struct rb_node *)0x0, + .rb_left = (struct rb_node *)0xffff88a0be3d00a0, + }, + }, + }, + .ksym_lnode = (struct list_head){ + .next = (struct list_head *)0xffff88bf481830b8, + .prev = (struct list_head *)0xffff888309f536b8, + }, + .ops = (const struct bpf_prog_ops *)xdp_prog_ops+0x0 = 0xffffffff820fa350, + .used_maps = (struct bpf_map **)0xffff889ff795de98, + .prog = (struct bpf_prog *)0xffffc9000cf2d000, + .user = (struct user_struct *)root_user+0x0 = 0xffffffff82444820, + .load_time = (u64)2408348759285319, + .cgroup_storage = (struct bpf_map *[2]){}, + .name = (char [16])"tw_xdp_policer", + .security = (void *)0xffff889ff795d548, + .offload = (struct bpf_prog_offload *)0x0, + .btf = (struct btf *)0xffff8890ce6d0580, + .func_info = (struct bpf_func_info *)0xffff889ff795d240, + .func_info_aux = (struct bpf_func_info_aux *)0xffff889ff795de20, + .linfo = (struct bpf_line_info *)0xffff888a707afc00, + .jited_linfo = (void **)0xffff8893fad48600, + .func_info_cnt = (u32)1, + .nr_linfo = (u32)37, + .linfo_idx = (u32)0, + .num_exentries = (u32)0, + .extable = (struct exception_table_entry *)0xffffffffa032d950, + .stats = (struct bpf_prog_stats *)0x603fe3a1f6d0, + .work = (struct work_struct){ + .data = (atomic_long_t){ + .counter = (long)0, + }, + .entry = (struct list_head){ + .next = (struct list_head *)0x0, + .prev = (struct list_head *)0x0, + }, + .func = (work_func_t)0x0, + }, + .rcu = (struct callback_head){ + .next = (struct callback_head *)0x0, + .func = (void (*)(struct callback_head *))0x0, + }, + } + + +.. Links +.. _drgn/doc: https://drgn.readthedocs.io/en/latest/ +.. _drgn/tools: https://github.com/osandov/drgn/tree/master/tools +.. _bpf_inspect.py: + https://github.com/osandov/drgn/blob/master/tools/bpf_inspect.py diff --git a/Documentation/bpf/index.rst b/Documentation/bpf/index.rst index 4f5410b61441..f99677f3572f 100644 --- a/Documentation/bpf/index.rst +++ b/Documentation/bpf/index.rst @@ -45,14 +45,16 @@ Program types prog_cgroup_sockopt prog_cgroup_sysctl prog_flow_dissector + bpf_lsm -Testing BPF -=========== +Testing and debugging BPF +========================= .. toctree:: :maxdepth: 1 + drgn s390 diff --git a/Documentation/devicetree/bindings/net/dsa/ocelot.txt b/Documentation/devicetree/bindings/net/dsa/ocelot.txt new file mode 100644 index 000000000000..66a129fea705 --- /dev/null +++ b/Documentation/devicetree/bindings/net/dsa/ocelot.txt @@ -0,0 +1,116 @@ +Microchip Ocelot switch driver family +===================================== + +Felix +----- + +The VSC9959 core is currently the only switch supported by the driver, and is +found in the NXP LS1028A. It is a PCI device, part of the larger ENETC root +complex. As a result, the ethernet-switch node is a sub-node of the PCIe root +complex node and its "reg" property conforms to the parent node bindings: + +* reg: Specifies PCIe Device Number and Function Number of the endpoint device, + in this case for the Ethernet L2Switch it is PF5 (of device 0, bus 0). + +It does not require a "compatible" string. + +The interrupt line is used to signal availability of PTP TX timestamps and for +TSN frame preemption. + +For the external switch ports, depending on board configuration, "phy-mode" and +"phy-handle" are populated by board specific device tree instances. Ports 4 and +5 are fixed as internal ports in the NXP LS1028A instantiation. + +The CPU port property ("ethernet") configures the feature called "NPI port" in +the Ocelot hardware core. The CPU port in Ocelot is a set of queues, which are +connected, in the Node Processor Interface (NPI) mode, to an Ethernet port. +By default, in fsl-ls1028a.dtsi, the NPI port is assigned to the internal +2.5Gbps port@4, but can be moved to the 1Gbps port@5, depending on the specific +use case. Moving the NPI port to an external switch port is hardware possible, +but there is no platform support for the Linux system on the LS1028A chip to +operate as an entire slave DSA chip. NPI functionality (and therefore DSA +tagging) is supported on a single port at a time. + +Any port can be disabled (and in fsl-ls1028a.dtsi, they are indeed all disabled +by default, and should be enabled on a per-board basis). But if any external +switch port is enabled at all, the ENETC PF2 (enetc_port2) should be enabled as +well, regardless of whether it is configured as the DSA master or not. This is +because the Felix PHYLINK implementation accesses the MAC PCS registers, which +in hardware truly belong to the ENETC port #2 and not to Felix. + +Supported PHY interface types (appropriate SerDes protocol setting changes are +needed in the RCW binary): + +* phy_mode = "internal": on ports 4 and 5 +* phy_mode = "sgmii": on ports 0, 1, 2, 3 +* phy_mode = "qsgmii": on ports 0, 1, 2, 3 +* phy_mode = "usxgmii": on ports 0, 1, 2, 3 +* phy_mode = "2500base-x": on ports 0, 1, 2, 3 + +For the rest of the device tree binding definitions, which are standard DSA and +PCI, refer to the following documents: + +Documentation/devicetree/bindings/net/dsa/dsa.txt +Documentation/devicetree/bindings/pci/pci.txt + +Example: + +&soc { + pcie@1f0000000 { /* Integrated Endpoint Root Complex */ + ethernet-switch@0,5 { + reg = <0x000500 0 0 0 0>; + /* IEP INT_B */ + interrupts = <GIC_SPI 95 IRQ_TYPE_LEVEL_HIGH>; + + ports { + #address-cells = <1>; + #size-cells = <0>; + + /* External ports */ + port@0 { + reg = <0>; + label = "swp0"; + }; + + port@1 { + reg = <1>; + label = "swp1"; + }; + + port@2 { + reg = <2>; + label = "swp2"; + }; + + port@3 { + reg = <3>; + label = "swp3"; + }; + + /* Tagging CPU port */ + port@4 { + reg = <4>; + ethernet = <&enetc_port2>; + phy-mode = "internal"; + + fixed-link { + speed = <2500>; + full-duplex; + }; + }; + + /* Non-tagging CPU port */ + port@5 { + reg = <5>; + phy-mode = "internal"; + status = "disabled"; + + fixed-link { + speed = <1000>; + full-duplex; + }; + }; + }; + }; + }; +}; diff --git a/Documentation/devicetree/bindings/net/marvell,mvusb.yaml b/Documentation/devicetree/bindings/net/marvell,mvusb.yaml new file mode 100644 index 000000000000..9458f6659be1 --- /dev/null +++ b/Documentation/devicetree/bindings/net/marvell,mvusb.yaml @@ -0,0 +1,65 @@ +# SPDX-License-Identifier: GPL-2.0 +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/marvell,mvusb.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Marvell USB to MDIO Controller + +maintainers: + - Tobias Waldekranz <tobias@waldekranz.com> + +description: |+ + This controller is mounted on development boards for Marvell's Link Street + family of Ethernet switches. It allows you to configure the switch's registers + using the standard MDIO interface. + + Since the device is connected over USB, there is no strict requirement of + having a device tree representation of the device. But in order to use it with + the mv88e6xxx driver, you need a device tree node in which to place the switch + definition. + +allOf: + - $ref: "mdio.yaml#" + +properties: + compatible: + const: usb1286,1fa4 + reg: + maxItems: 1 + description: The USB port number on the host controller + +required: + - compatible + - reg + - "#address-cells" + - "#size-cells" + +examples: + - | + /* USB host controller */ + &usb1 { + mvusb: mdio@1 { + compatible = "usb1286,1fa4"; + reg = <1>; + #address-cells = <1>; + #size-cells = <0>; + }; + }; + + /* MV88E6390X devboard */ + &mvusb { + switch@0 { + compatible = "marvell,mv88e6190"; + status = "ok"; + reg = <0x0>; + + ports { + /* Port definitions */ + }; + + mdio { + /* PHY definitions */ + }; + }; + }; diff --git a/Documentation/devicetree/bindings/net/qcom,ipa.yaml b/Documentation/devicetree/bindings/net/qcom,ipa.yaml new file mode 100644 index 000000000000..140f15245654 --- /dev/null +++ b/Documentation/devicetree/bindings/net/qcom,ipa.yaml @@ -0,0 +1,198 @@ +# SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/qcom,ipa.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Qualcomm IP Accelerator (IPA) + +maintainers: + - Alex Elder <elder@kernel.org> + +description: + This binding describes the Qualcomm IPA. The IPA is capable of offloading + certain network processing tasks (e.g. filtering, routing, and NAT) from + the main processor. + + The IPA sits between multiple independent "execution environments," + including the Application Processor (AP) and the modem. The IPA presents + a Generic Software Interface (GSI) to each execution environment. + The GSI is an integral part of the IPA, but it is logically isolated + and has a distinct interrupt and a separately-defined address space. + + See also soc/qcom/qcom,smp2p.txt and interconnect/interconnect.txt. + + - | + -------- --------- + | | | | + | AP +<---. .----+ Modem | + | +--. | | .->+ | + | | | | | | | | + -------- | | | | --------- + v | v | + --+-+---+-+-- + | GSI | + |-----------| + | | + | IPA | + | | + ------------- + +properties: + compatible: + const: "qcom,sdm845-ipa" + + reg: + items: + - description: IPA registers + - description: IPA shared memory + - description: GSI registers + + reg-names: + items: + - const: ipa-reg + - const: ipa-shared + - const: gsi + + clocks: + maxItems: 1 + + clock-names: + const: core + + interrupts: + items: + - description: IPA interrupt (hardware IRQ) + - description: GSI interrupt (hardware IRQ) + - description: Modem clock query interrupt (smp2p interrupt) + - description: Modem setup ready interrupt (smp2p interrupt) + + interrupt-names: + items: + - const: ipa + - const: gsi + - const: ipa-clock-query + - const: ipa-setup-ready + + interconnects: + items: + - description: Interconnect path between IPA and main memory + - description: Interconnect path between IPA and internal memory + - description: Interconnect path between IPA and the AP subsystem + + interconnect-names: + items: + - const: memory + - const: imem + - const: config + + qcom,smem-states: + allOf: + - $ref: /schemas/types.yaml#/definitions/phandle-array + description: State bits used in by the AP to signal the modem. + items: + - description: Whether the "ipa-clock-enabled" state bit is valid + - description: Whether the IPA clock is enabled (if valid) + + qcom,smem-state-names: + allOf: + - $ref: /schemas/types.yaml#/definitions/string-array + description: The names of the state bits used for SMP2P output + items: + - const: ipa-clock-enabled-valid + - const: ipa-clock-enabled + + modem-init: + type: boolean + description: + If present, it indicates that the modem is responsible for + performing early IPA initialization, including loading and + validating firwmare used by the GSI. + + modem-remoteproc: + $ref: /schemas/types.yaml#definitions/phandle + description: + This defines the phandle to the remoteproc node representing + the modem subsystem. This is requied so the IPA driver can + receive and act on notifications of modem up/down events. + + memory-region: + $ref: /schemas/types.yaml#/definitions/phandle-array + maxItems: 1 + description: + If present, a phandle for a reserved memory area that holds + the firmware passed to Trust Zone for authentication. Required + when Trust Zone (not the modem) performs early initialization. + +required: + - compatible + - reg + - clocks + - interrupts + - interconnects + - qcom,smem-states + - modem-remoteproc + +oneOf: + - required: + - modem-init + - required: + - memory-region + +examples: + - | + #include <dt-bindings/interrupt-controller/irq.h> + #include <dt-bindings/clock/qcom,rpmh.h> + #include <dt-bindings/interconnect/qcom,sdm845.h> + + smp2p-mpss { + compatible = "qcom,smp2p"; + ipa_smp2p_out: ipa-ap-to-modem { + qcom,entry-name = "ipa"; + #qcom,smem-state-cells = <1>; + }; + + ipa_smp2p_in: ipa-modem-to-ap { + qcom,entry-name = "ipa"; + interrupt-controller; + #interrupt-cells = <2>; + }; + }; + ipa@1e40000 { + compatible = "qcom,sdm845-ipa"; + + modem-init; + modem-remoteproc = <&mss_pil>; + + reg = <0 0x1e40000 0 0x7000>, + <0 0x1e47000 0 0x2000>, + <0 0x1e04000 0 0x2c000>; + reg-names = "ipa-reg", + "ipa-shared", + "gsi"; + + interrupts-extended = <&intc 0 311 IRQ_TYPE_EDGE_RISING>, + <&intc 0 432 IRQ_TYPE_LEVEL_HIGH>, + <&ipa_smp2p_in 0 IRQ_TYPE_EDGE_RISING>, + <&ipa_smp2p_in 1 IRQ_TYPE_EDGE_RISING>; + interrupt-names = "ipa", + "gsi", + "ipa-clock-query", + "ipa-setup-ready"; + + clocks = <&rpmhcc RPMH_IPA_CLK>; + clock-names = "core"; + + interconnects = + <&rsc_hlos MASTER_IPA &rsc_hlos SLAVE_EBI1>, + <&rsc_hlos MASTER_IPA &rsc_hlos SLAVE_IMEM>, + <&rsc_hlos MASTER_APPSS_PROC &rsc_hlos SLAVE_IPA_CFG>; + interconnect-names = "memory", + "imem", + "config"; + + qcom,smem-states = <&ipa_smp2p_out 0>, + <&ipa_smp2p_out 1>; + qcom,smem-state-names = "ipa-clock-enabled-valid", + "ipa-clock-enabled"; + }; diff --git a/Documentation/devicetree/bindings/net/qcom,ipq8064-mdio.yaml b/Documentation/devicetree/bindings/net/qcom,ipq8064-mdio.yaml new file mode 100644 index 000000000000..b9f90081046f --- /dev/null +++ b/Documentation/devicetree/bindings/net/qcom,ipq8064-mdio.yaml @@ -0,0 +1,53 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/qcom,ipq8064-mdio.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Qualcomm ipq806x MDIO bus controller + +maintainers: + - Ansuel Smith <ansuelsmth@gmail.com> + +description: + The ipq806x soc have a MDIO dedicated controller that is + used to communicate with the gmac phy connected. + +allOf: + - $ref: "mdio.yaml#" + +properties: + compatible: + const: qcom,ipq8064-mdio + + reg: + maxItems: 1 + + clocks: + maxItems: 1 + +required: + - compatible + - reg + - clocks + - "#address-cells" + - "#size-cells" + +examples: + - | + #include <dt-bindings/clock/qcom,gcc-ipq806x.h> + + mdio0: mdio@37000000 { + #address-cells = <1>; + #size-cells = <0>; + + compatible = "qcom,ipq8064-mdio"; + reg = <0x37000000 0x200000>; + + clocks = <&gcc GMAC_CORE1_CLK>; + + switch@10 { + compatible = "qca,qca8337"; + /* ... */ + }; + }; diff --git a/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt b/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt index 999aceadb985..beca6466d59a 100644 --- a/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt +++ b/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt @@ -31,6 +31,7 @@ Optional properties for compatible string qcom,wcn399x-bt: - max-speed: see Documentation/devicetree/bindings/serial/slave-device.txt - firmware-name: specify the name of nvm firmware to load + - clocks: clock provided to the controller Examples: @@ -57,5 +58,6 @@ serial@898000 { vddch0-supply = <&vreg_l25a_3p3>; max-speed = <3200000>; firmware-name = "crnv21.bin"; + clocks = <&rpmhcc RPMH_RF_CLK2>; }; }; diff --git a/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml b/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml new file mode 100644 index 000000000000..78bf511e2892 --- /dev/null +++ b/Documentation/devicetree/bindings/net/ti,k3-am654-cpsw-nuss.yaml @@ -0,0 +1,225 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/net/ti,k3-am654-cpsw-nuss.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: The TI AM654x/J721E SoC Gigabit Ethernet MAC (Media Access Controller) Device Tree Bindings + +maintainers: + - Grygorii Strashko <grygorii.strashko@ti.com> + - Sekhar Nori <nsekhar@ti.com> + +description: + The TI AM654x/J721E SoC Gigabit Ethernet MAC (CPSW2G NUSS) has two ports + (one external) and provides Ethernet packet communication for the device. + CPSW2G NUSS features - the Reduced Gigabit Media Independent Interface (RGMII), + Reduced Media Independent Interface (RMII), the Management Data + Input/Output (MDIO) interface for physical layer device (PHY) management, + new version of Common Platform Time Sync (CPTS), updated Address Lookup + Engine (ALE). + One external Ethernet port (port 1) with selectable RGMII/RMII interfaces and + an internal Communications Port Programming Interface (CPPI5) (Host port 0). + Host Port 0 CPPI Packet Streaming Interface interface supports 8 TX channels + and one RX channels and operating by TI AM654x/J721E NAVSS Unified DMA + Peripheral Root Complex (UDMA-P) controller. + The CPSW2G NUSS is integrated into device MCU domain named MCU_CPSW0. + + Additional features + priority level Quality Of Service (QOS) support (802.1p) + Support for Audio/Video Bridging (P802.1Qav/D6.0) + Support for IEEE 1588 Clock Synchronization (2008 Annex D, Annex E and Annex F) + Flow Control (802.3x) Support + Time Sensitive Network Support + IEEE P902.3br/D2.0 Interspersing Express Traffic + IEEE 802.1Qbv/D2.2 Enhancements for Scheduled Traffic + Configurable number of addresses plus VLANs + Configurable number of classifier/policers + VLAN support, 802.1Q compliant, Auto add port VLAN for untagged frames on + ingress, Auto VLAN removal on egress and auto pad to minimum frame size. + RX/TX csum offload + + Specifications can be found at + http://www.ti.com/lit/ug/spruid7e/spruid7e.pdf + http://www.ti.com/lit/ug/spruil1a/spruil1a.pdf + +properties: + "#address-cells": true + "#size-cells": true + + compatible: + oneOf: + - const: ti,am654-cpsw-nuss + - const: ti,j721e-cpsw-nuss + + reg: + maxItems: 1 + description: + The physical base address and size of full the CPSW2G NUSS IO range + + reg-names: + items: + - const: cpsw_nuss + + ranges: true + + dma-coherent: true + + clocks: + description: CPSW2G NUSS functional clock + + clock-names: + items: + - const: fck + + power-domains: + maxItems: 1 + + dmas: + maxItems: 9 + + dma-names: + items: + - const: tx0 + - const: tx1 + - const: tx2 + - const: tx3 + - const: tx4 + - const: tx5 + - const: tx6 + - const: tx7 + - const: rx + + ethernet-ports: + type: object + properties: + '#address-cells': + const: 1 + '#size-cells': + const: 0 + + patternProperties: + port@1: + type: object + description: CPSW2G NUSS external ports + + allOf: + - $ref: ethernet-controller.yaml# + + properties: + reg: + items: + - const: 1 + description: CPSW port number + + phys: + maxItems: 1 + description: phandle on phy-gmii-sel PHY + + label: + description: label associated with this port + + ti,mac-only: + $ref: /schemas/types.yaml#definitions/flag + description: + Specifies the port works in mac-only mode. + + ti,syscon-efuse: + $ref: /schemas/types.yaml#definitions/phandle-array + description: + Phandle to the system control device node which provides access + to efuse IO range with MAC addresses + + required: + - reg + - phys + + additionalProperties: false + +patternProperties: + "^mdio@[0-9a-f]+$": + type: object + allOf: + - $ref: "ti,davinci-mdio.yaml#" + description: + CPSW MDIO bus. + +required: + - compatible + - reg + - reg-names + - ranges + - clocks + - clock-names + - power-domains + - dmas + - dma-names + - '#address-cells' + - '#size-cells' + +additionalProperties: false + +examples: + - | + #include <dt-bindings/pinctrl/k3.h> + #include <dt-bindings/soc/ti,sci_pm_domain.h> + #include <dt-bindings/net/ti-dp83867.h> + + mcu_cpsw: ethernet@46000000 { + compatible = "ti,am654-cpsw-nuss"; + #address-cells = <2>; + #size-cells = <2>; + reg = <0x0 0x46000000 0x0 0x200000>; + reg-names = "cpsw_nuss"; + ranges = <0x0 0x0 0x46000000 0x0 0x200000>; + dma-coherent; + clocks = <&k3_clks 5 10>; + clock-names = "fck"; + power-domains = <&k3_pds 5 TI_SCI_PD_EXCLUSIVE>; + pinctrl-names = "default"; + pinctrl-0 = <&mcu_cpsw_pins_default &mcu_mdio_pins_default>; + + dmas = <&mcu_udmap 0xf000>, + <&mcu_udmap 0xf001>, + <&mcu_udmap 0xf002>, + <&mcu_udmap 0xf003>, + <&mcu_udmap 0xf004>, + <&mcu_udmap 0xf005>, + <&mcu_udmap 0xf006>, + <&mcu_udmap 0xf007>, + <&mcu_udmap 0x7000>; + dma-names = "tx0", "tx1", "tx2", "tx3", "tx4", "tx5", "tx6", "tx7", + "rx"; + + ethernet-ports { + #address-cells = <1>; + #size-cells = <0>; + + cpsw_port1: port@1 { + reg = <1>; + ti,mac-only; + label = "port1"; + ti,syscon-efuse = <&mcu_conf 0x200>; + phys = <&phy_gmii_sel 1>; + + phy-mode = "rgmii-rxid"; + phy-handle = <&phy0>; + }; + }; + + davinci_mdio: mdio@f00 { + compatible = "ti,cpsw-mdio","ti,davinci_mdio"; + reg = <0x0 0xf00 0x0 0x100>; + #address-cells = <1>; + #size-cells = <0>; + clocks = <&k3_clks 5 10>; + clock-names = "fck"; + bus_freq = <1000000>; + + phy0: ethernet-phy@0 { + reg = <0>; + ti,rx-internal-delay = <DP83867_RGMIIDCTL_2_00_NS>; + ti,fifo-depth = <DP83867_PHYCR_FIFO_DEPTH_4_B_NIB>; + }; + }; + }; diff --git a/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.txt b/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.txt index 7e675dafc256..3a76d8faaaed 100644 --- a/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.txt +++ b/Documentation/devicetree/bindings/net/wireless/mediatek,mt76.txt @@ -4,17 +4,27 @@ This node provides properties for configuring the MediaTek mt76xx wireless device. The node is expected to be specified as a child node of the PCI controller to which the wireless chip is connected. -Alternatively, it can specify the wireless part of the MT7628/MT7688 SoC. -For SoC, use the compatible string "mediatek,mt7628-wmac" and the following -properties: +Alternatively, it can specify the wireless part of the MT7628/MT7688 or +MT7622 SoC. For SoC, use the following compatible strings: + +compatible: +- "mediatek,mt7628-wmac" for MT7628/MT7688 +- "mediatek,mt7622-wmac" for MT7622 +properties: - reg: Address and length of the register set for the device. - interrupts: Main device interrupt +MT7622 specific properties: +- power-domains: phandle to the power domain that the WMAC is part of +- mediatek,infracfg: phandle to the infrastructure bus fabric syscon node + Optional properties: - ieee80211-freq-limit: See ieee80211.txt - mediatek,mtd-eeprom: Specify a MTD partition + offset containing EEPROM data +- big-endian: if the radio eeprom partition is written in big-endian, specify + this property The MAC address can as well be set with corresponding optional properties defined in net/ethernet.txt. @@ -31,6 +41,7 @@ Optional nodes: reg = <0x0000 0 0 0 0>; ieee80211-freq-limit = <5000000 6000000>; mediatek,mtd-eeprom = <&factory 0x8000>; + big-endian; led { led-sources = <2>; @@ -50,3 +61,15 @@ wmac: wmac@10300000 { mediatek,mtd-eeprom = <&factory 0x0000>; }; + +MT7622 example: + +wmac: wmac@18000000 { + compatible = "mediatek,mt7622-wmac"; + reg = <0 0x18000000 0 0x100000>; + interrupts = <GIC_SPI 211 IRQ_TYPE_LEVEL_LOW>; + + mediatek,infracfg = <&infracfg>; + + power-domains = <&scpsys MT7622_POWER_DOMAIN_WB>; +}; diff --git a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt index 616c87746d6f..71bf91f97386 100644 --- a/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt +++ b/Documentation/devicetree/bindings/net/wireless/qcom,ath10k.txt @@ -91,6 +91,11 @@ Optional properties: - qcom,msa-fixed-perm: Boolean context flag to disable SCM call for statically mapped msa region. +- qcom,coexist-support : should contain eithr "0" or "1" to indicate coex + support by the hardware. +- qcom,coexist-gpio-pin : gpio pin number information to support coex + which will be used by wifi firmware. + Example (to supply PCI based wifi block details): In this example, the node is defined as child node of the PCI controller. @@ -159,6 +164,8 @@ wifi0: wifi@a000000 { qcom,msi_addr = <0x0b006040>; qcom,msi_base = <0x40>; qcom,ath10k-pre-calibration-data = [ 01 02 03 ... ]; + qcom,coexist-support = <1>; + qcom,coexist-gpio-pin = <0x33>; }; Example (to supply wcn3990 SoC wifi block details): diff --git a/Documentation/devicetree/bindings/net/wireless/ti,wl1251.txt b/Documentation/devicetree/bindings/net/wireless/ti,wl1251.txt index f38950560982..88fd28d15eac 100644 --- a/Documentation/devicetree/bindings/net/wireless/ti,wl1251.txt +++ b/Documentation/devicetree/bindings/net/wireless/ti,wl1251.txt @@ -9,11 +9,12 @@ Required properties: - spi-max-frequency : Maximum SPI clocking speed of device in Hz - interrupts : Should contain interrupt line - vio-supply : phandle to regulator providing VIO -- ti,power-gpio : GPIO connected to chip's PMEN pin Optional properties: - ti,wl1251-has-eeprom : boolean, the wl1251 has an eeprom connected, which provides configuration data (calibration, MAC, ...) +- ti,power-gpio : GPIO connected to chip's PMEN pin if operated in + SPI mode - Please consult Documentation/devicetree/bindings/spi/spi-bus.txt for optional SPI connection related properties, diff --git a/Documentation/devicetree/bindings/ptp/ptp-idt82p33.yaml b/Documentation/devicetree/bindings/ptp/ptp-idt82p33.yaml new file mode 100644 index 000000000000..9bc664f414a1 --- /dev/null +++ b/Documentation/devicetree/bindings/ptp/ptp-idt82p33.yaml @@ -0,0 +1,45 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/ptp/ptp-idt82p33.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: IDT 82P33 PTP Clock Device Tree Bindings + +description: | + IDT 82P33XXX Synchronization Management Unit (SMU) based PTP clock + +maintainers: + - Min Li <min.li.xe@renesas.com> + +properties: + compatible: + enum: + - idt,82p33810 + - idt,82p33813 + - idt,82p33814 + - idt,82p33831 + - idt,82p33910 + - idt,82p33913 + - idt,82p33914 + - idt,82p33931 + + reg: + maxItems: 1 + +required: + - compatible + - reg + +additionalProperties: false + +examples: + - | + i2c { + #address-cells = <1>; + #size-cells = <0>; + phc@51 { + compatible = "idt,82p33810"; + reg = <0x51>; + }; + }; diff --git a/Documentation/networking/6lowpan.txt b/Documentation/networking/6lowpan.rst index 2e5a939d7e6f..e70a6520cc33 100644 --- a/Documentation/networking/6lowpan.txt +++ b/Documentation/networking/6lowpan.rst @@ -1,37 +1,40 @@ +.. SPDX-License-Identifier: GPL-2.0 -Netdev private dataroom for 6lowpan interfaces: +============================================== +Netdev private dataroom for 6lowpan interfaces +============================================== All 6lowpan able net devices, means all interfaces with ARPHRD_6LOWPAN, must have "struct lowpan_priv" placed at beginning of netdev_priv. -The priv_size of each interface should be calculate by: +The priv_size of each interface should be calculate by:: dev->priv_size = LOWPAN_PRIV_SIZE(LL_6LOWPAN_PRIV_DATA); Where LL_PRIV_6LOWPAN_DATA is sizeof linklayer 6lowpan private data struct. -To access the LL_PRIV_6LOWPAN_DATA structure you can cast: +To access the LL_PRIV_6LOWPAN_DATA structure you can cast:: lowpan_priv(dev)-priv; to your LL_6LOWPAN_PRIV_DATA structure. -Before registering the lowpan netdev interface you must run: +Before registering the lowpan netdev interface you must run:: lowpan_netdev_setup(dev, LOWPAN_LLTYPE_FOOBAR); wheres LOWPAN_LLTYPE_FOOBAR is a define for your 6LoWPAN linklayer type of enum lowpan_lltypes. -Example to evaluate the private usually you can do: +Example to evaluate the private usually you can do:: -static inline struct lowpan_priv_foobar * -lowpan_foobar_priv(struct net_device *dev) -{ + static inline struct lowpan_priv_foobar * + lowpan_foobar_priv(struct net_device *dev) + { return (struct lowpan_priv_foobar *)lowpan_priv(dev)->priv; -} + } -switch (dev->type) { -case ARPHRD_6LOWPAN: + switch (dev->type) { + case ARPHRD_6LOWPAN: lowpan_priv = lowpan_priv(dev); /* do great stuff which is ARPHRD_6LOWPAN related */ switch (lowpan_priv->lltype) { @@ -42,8 +45,8 @@ case ARPHRD_6LOWPAN: ... } break; -... -} + ... + } In case of generic 6lowpan branch ("net/6lowpan") you can remove the check on ARPHRD_6LOWPAN, because you can be sure that these function are called diff --git a/Documentation/networking/bareudp.rst b/Documentation/networking/bareudp.rst new file mode 100644 index 000000000000..465a8b251bfe --- /dev/null +++ b/Documentation/networking/bareudp.rst @@ -0,0 +1,52 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================================== +Bare UDP Tunnelling Module Documentation +======================================== + +There are various L3 encapsulation standards using UDP being discussed to +leverage the UDP based load balancing capability of different networks. +MPLSoUDP (__ https://tools.ietf.org/html/rfc7510) is one among them. + +The Bareudp tunnel module provides a generic L3 encapsulation tunnelling +support for tunnelling different L3 protocols like MPLS, IP, NSH etc. inside +a UDP tunnel. + +Special Handling +---------------- +The bareudp device supports special handling for MPLS & IP as they can have +multiple ethertypes. +MPLS procotcol can have ethertypes ETH_P_MPLS_UC (unicast) & ETH_P_MPLS_MC (multicast). +IP protocol can have ethertypes ETH_P_IP (v4) & ETH_P_IPV6 (v6). +This special handling can be enabled only for ethertypes ETH_P_IP & ETH_P_MPLS_UC +with a flag called multiproto mode. + +Usage +------ + +1) Device creation & deletion + + a) ip link add dev bareudp0 type bareudp dstport 6635 ethertype 0x8847. + + This creates a bareudp tunnel device which tunnels L3 traffic with ethertype + 0x8847 (MPLS traffic). The destination port of the UDP header will be set to + 6635.The device will listen on UDP port 6635 to receive traffic. + + b) ip link delete bareudp0 + +2) Device creation with multiple proto mode enabled + +There are two ways to create a bareudp device for MPLS & IP with multiproto mode +enabled. + + a) ip link add dev bareudp0 type bareudp dstport 6635 ethertype 0x8847 multiproto + + b) ip link add dev bareudp0 type bareudp dstport 6635 ethertype mpls + +3) Device Usage + +The bareudp device could be used along with OVS or flower filter in TC. +The OVS or TC flower layer must set the tunnel information in SKB dst field before +sending packet buffer to the bareudp device for transmission. On reception the +bareudp device extracts and stores the tunnel information in SKB dst field before +passing the packet buffer to the network stack. diff --git a/Documentation/networking/device_drivers/mellanox/mlx5.rst b/Documentation/networking/device_drivers/mellanox/mlx5.rst index f575a49790e8..e9b65035cd47 100644 --- a/Documentation/networking/device_drivers/mellanox/mlx5.rst +++ b/Documentation/networking/device_drivers/mellanox/mlx5.rst @@ -101,7 +101,7 @@ Enabling the driver and kconfig options **External options** ( Choose if the corresponding mlx5 feature is required ) - CONFIG_PTP_1588_CLOCK: When chosen, mlx5 ptp support will be enabled -- CONFIG_VXLAN: When chosen, mlx5 vxaln support will be enabled. +- CONFIG_VXLAN: When chosen, mlx5 vxlan support will be enabled. - CONFIG_MLXFW: When chosen, mlx5 firmware flashing support will be enabled (via devlink and ethtool). Devlink info diff --git a/Documentation/networking/device_drivers/stmicro/stmmac.rst b/Documentation/networking/device_drivers/stmicro/stmmac.rst index c34bab3d2df0..5d46e5036129 100644 --- a/Documentation/networking/device_drivers/stmicro/stmmac.rst +++ b/Documentation/networking/device_drivers/stmicro/stmmac.rst @@ -32,7 +32,8 @@ is also supported. DesignWare(R) Cores Ethernet MAC 10/100/1000 Universal version 3.70a (and older) and DesignWare(R) Cores Ethernet Quality-of-Service version 4.0 (and upper) have been used for developing this driver as well as -DesignWare(R) Cores XGMAC - 10G Ethernet MAC. +DesignWare(R) Cores XGMAC - 10G Ethernet MAC and DesignWare(R) Cores +Enterprise MAC - 100G Ethernet MAC. This driver supports both the platform bus and PCI. @@ -48,6 +49,8 @@ Cores Ethernet Controllers and corresponding minimum and maximum versions: +-------------------------------+--------------+--------------+--------------+ | XGMAC - 10G Ethernet MAC | 2.10a | N/A | XGMAC2+ | +-------------------------------+--------------+--------------+--------------+ +| XLGMAC - 100G Ethernet MAC | 2.00a | N/A | XLGMAC2+ | ++-------------------------------+--------------+--------------+--------------+ For questions related to hardware requirements, refer to the documentation supplied with your Ethernet adapter. All hardware requirements listed apply @@ -57,7 +60,7 @@ Feature List ============ The following features are available in this driver: - - GMII/MII/RGMII/SGMII/RMII/XGMII Interface + - GMII/MII/RGMII/SGMII/RMII/XGMII/XLGMII Interface - Half-Duplex / Full-Duplex Operation - Energy Efficient Ethernet (EEE) - IEEE 802.3x PAUSE Packets (Flow Control) diff --git a/Documentation/networking/devlink/bnxt.rst b/Documentation/networking/devlink/bnxt.rst index 82ef9ec46707..3dfd84ccb1c7 100644 --- a/Documentation/networking/devlink/bnxt.rst +++ b/Documentation/networking/devlink/bnxt.rst @@ -51,6 +51,9 @@ The ``bnxt_en`` driver reports the following versions * - Name - Type - Description + * - ``board.id`` + - fixed + - Part number identifying the board design * - ``asic.id`` - fixed - ASIC design identifier @@ -63,12 +66,15 @@ The ``bnxt_en`` driver reports the following versions * - ``fw`` - stored, running - Overall board firmware version - * - ``fw.app`` - - stored, running - - Data path firmware version * - ``fw.mgmt`` - stored, running - - Management firmware version + - NIC hardware resource management firmware version + * - ``fw.mgmt.api`` + - running + - Minimum firmware interface spec version supported between driver and firmware + * - ``fw.nsci`` + - stored, running + - General platform management firmware version * - ``fw.roce`` - stored, running - RoCE management firmware version diff --git a/Documentation/networking/devlink/devlink-flash.rst b/Documentation/networking/devlink/devlink-flash.rst new file mode 100644 index 000000000000..40a87c0222cb --- /dev/null +++ b/Documentation/networking/devlink/devlink-flash.rst @@ -0,0 +1,93 @@ +.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) + +.. _devlink_flash: + +============= +Devlink Flash +============= + +The ``devlink-flash`` API allows updating device firmware. It replaces the +older ``ethtool-flash`` mechanism, and doesn't require taking any +networking locks in the kernel to perform the flash update. Example use:: + + $ devlink dev flash pci/0000:05:00.0 file flash-boot.bin + +Note that the file name is a path relative to the firmware loading path +(usually ``/lib/firmware/``). Drivers may send status updates to inform +user space about the progress of the update operation. + +Firmware Loading +================ + +Devices which require firmware to operate usually store it in non-volatile +memory on the board, e.g. flash. Some devices store only basic firmware on +the board, and the driver loads the rest from disk during probing. +``devlink-info`` allows users to query firmware information (loaded +components and versions). + +In other cases the device can both store the image on the board, load from +disk, or automatically flash a new image from disk. The ``fw_load_policy`` +devlink parameter can be used to control this behavior +(:ref:`Documentation/networking/devlink/devlink-params.rst <devlink_params_generic>`). + +On-disk firmware files are usually stored in ``/lib/firmware/``. + +Firmware Version Management +=========================== + +Drivers are expected to implement ``devlink-flash`` and ``devlink-info`` +functionality, which together allow for implementing vendor-independent +automated firmware update facilities. + +``devlink-info`` exposes the ``driver`` name and three version groups +(``fixed``, ``running``, ``stored``). + +The ``driver`` attribute and ``fixed`` group identify the specific device +design, e.g. for looking up applicable firmware updates. This is why +``serial_number`` is not part of the ``fixed`` versions (even though it +is fixed) - ``fixed`` versions should identify the design, not a single +device. + +``running`` and ``stored`` firmware versions identify the firmware running +on the device, and firmware which will be activated after reboot or device +reset. + +The firmware update agent is supposed to be able to follow this simple +algorithm to update firmware contents, regardless of the device vendor: + +.. code-block:: sh + + # Get unique HW design identifier + $hw_id = devlink-dev-info['fixed'] + + # Find out which FW flash we want to use for this NIC + $want_flash_vers = some-db-backed.lookup($hw_id, 'flash') + + # Update flash if necessary + if $want_flash_vers != devlink-dev-info['stored']: + $file = some-db-backed.download($hw_id, 'flash') + devlink-dev-flash($file) + + # Find out the expected overall firmware versions + $want_fw_vers = some-db-backed.lookup($hw_id, 'all') + + # Update on-disk file if necessary + if $want_fw_vers != devlink-dev-info['running']: + $file = some-db-backed.download($hw_id, 'disk') + write($file, '/lib/firmware/') + + # Try device reset, if available + if $want_fw_vers != devlink-dev-info['running']: + devlink-reset() + + # Reboot, if reset wasn't enough + if $want_fw_vers != devlink-dev-info['running']: + reboot() + +Note that each reference to ``devlink-dev-info`` in this pseudo-code +is expected to fetch up-to-date information from the kernel. + +For the convenience of identifying firmware files some vendors add +``bundle_id`` information to the firmware versions. This meta-version covers +multiple per-component versions and can be used e.g. in firmware file names +(all component versions could get rather long.) diff --git a/Documentation/networking/devlink/devlink-info.rst b/Documentation/networking/devlink/devlink-info.rst index 70981dd1b981..3fe11401b838 100644 --- a/Documentation/networking/devlink/devlink-info.rst +++ b/Documentation/networking/devlink/devlink-info.rst @@ -5,34 +5,119 @@ Devlink Info ============ The ``devlink-info`` mechanism enables device drivers to report device -information in a generic fashion. It is extensible, and enables exporting -even device or driver specific information. +(hardware and firmware) information in a standard, extensible fashion. -devlink supports representing the following types of versions +The original motivation for the ``devlink-info`` API was twofold: -.. list-table:: List of version types + - making it possible to automate device and firmware management in a fleet + of machines in a vendor-independent fashion (see also + :ref:`Documentation/networking/devlink/devlink-flash.rst <devlink_flash>`); + - name the per component FW versions (as opposed to the crowded ethtool + version string). + +``devlink-info`` supports reporting multiple types of objects. Reporting driver +versions is generally discouraged - here, and via any other Linux API. + +.. list-table:: List of top level info objects :widths: 5 95 - * - Type + * - Name - Description + * - ``driver`` + - Name of the currently used device driver, also available through sysfs. + + * - ``serial_number`` + - Serial number of the device. + + This is usually the serial number of the ASIC, also often available + in PCI config space of the device in the *Device Serial Number* + capability. + + The serial number should be unique per physical device. + Sometimes the serial number of the device is only 48 bits long (the + length of the Ethernet MAC address), and since PCI DSN is 64 bits long + devices pad or encode additional information into the serial number. + One example is adding port ID or PCI interface ID in the extra two bytes. + Drivers should make sure to strip or normalize any such padding + or interface ID, and report only the part of the serial number + which uniquely identifies the hardware. In other words serial number + reported for two ports of the same device or on two hosts of + a multi-host device should be identical. + + .. note:: ``devlink-info`` API should be extended with a new field + if devices want to report board/product serial number (often + reported in PCI *Vital Product Data* capability). + * - ``fixed`` - - Represents fixed versions, which cannot change. For example, + - Group for hardware identifiers, and versions of components + which are not field-updatable. + + Versions in this section identify the device design. For example, component identifiers or the board version reported in the PCI VPD. + Data in ``devlink-info`` should be broken into the smallest logical + components, e.g. PCI VPD may concatenate various information + to form the Part Number string, while in ``devlink-info`` all parts + should be reported as separate items. + + This group must not contain any frequently changing identifiers, + such as serial numbers. See + :ref:`Documentation/networking/devlink/devlink-flash.rst <devlink_flash>` + to understand why. + * - ``running`` - - Represents the version of the currently running component. For - example the running version of firmware. These versions generally - only update after a reboot. + - Group for information about currently running software/firmware. + These versions often only update after a reboot, sometimes device reset. + * - ``stored`` - - Represents the version of a component as stored, such as after a - flash update. Stored values should update to reflect changes in the - flash even if a reboot has not yet occurred. + - Group for software/firmware versions in device flash. + + Stored values must update to reflect changes in the flash even + if reboot has not yet occurred. If device is not capable of updating + ``stored`` versions when new software is flashed, it must not report + them. + +Each version can be reported at most once in each version group. Firmware +components stored on the flash should feature in both the ``running`` and +``stored`` sections, if device is capable of reporting ``stored`` versions +(see :ref:`Documentation/networking/devlink/devlink-flash.rst <devlink_flash>`). +In case software/firmware components are loaded from the disk (e.g. +``/lib/firmware``) only the running version should be reported via +the kernel API. Generic Versions ================ It is expected that drivers use the following generic names for exporting -version information. Other information may be exposed using driver-specific -names, but these should be documented in the driver-specific file. +version information. If a generic name for a given component doesn't exist yet, +driver authors should consult existing driver-specific versions and attempt +reuse. As last resort, if a component is truly unique, using driver-specific +names is allowed, but these should be documented in the driver-specific file. + +All versions should try to use the following terminology: + +.. list-table:: List of common version suffixes + :widths: 10 90 + + * - Name + - Description + * - ``id``, ``revision`` + - Identifiers of designs and revision, mostly used for hardware versions. + + * - ``api`` + - Version of API between components. API items are usually of limited + value to the user, and can be inferred from other versions by the vendor, + so adding API versions is generally discouraged as noise. + + * - ``bundle_id`` + - Identifier of a distribution package which was flashed onto the device. + This is an attribute of a firmware package which covers multiple versions + for ease of managing firmware images (see + :ref:`Documentation/networking/devlink/devlink-flash.rst <devlink_flash>`). + + ``bundle_id`` can appear in both ``running`` and ``stored`` versions, + but it must not be reported if any of the components covered by the + ``bundle_id`` was changed and no longer matches the version from + the bundle. board.id -------- @@ -52,7 +137,7 @@ ASIC design identifier. asic.rev -------- -ASIC design revision. +ASIC design revision/stepping. board.manufacture ----------------- @@ -72,6 +157,12 @@ Control unit firmware version. This firmware is responsible for house keeping tasks, PHY control etc. but not the packet-by-packet data path operation. +fw.mgmt.api +----------- + +Firmware interface specification version of the software interfaces between +driver and firmware. + fw.app ------ @@ -91,10 +182,31 @@ Network Controller Sideband Interface. fw.psid ------- -Unique identifier of the firmware parameter set. +Unique identifier of the firmware parameter set. These are usually +parameters of a particular board, defined at manufacturing time. fw.roce ------- RoCE firmware version which is responsible for handling roce management. + +fw.bundle_id +------------ + +Unique identifier of the entire firmware bundle. + +Future work +=========== + +The following extensions could be useful: + + - product serial number - NIC boards often get labeled with a board serial + number rather than ASIC serial number; it'd be useful to add board serial + numbers to the API if they can be retrieved from the device; + + - on-disk firmware file names - drivers list the file names of firmware they + may need to load onto devices via the ``MODULE_FIRMWARE()`` macro. These, + however, are per module, rather than per device. It'd be useful to list + the names of firmware files the driver will try to load for a given device, + in order of priority. diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst index da2f85c0fa21..d075fd090b3d 100644 --- a/Documentation/networking/devlink/devlink-params.rst +++ b/Documentation/networking/devlink/devlink-params.rst @@ -41,6 +41,8 @@ In order for ``driverinit`` parameters to take effect, the driver must support reloading via the ``devlink-reload`` command. This command will request a reload of the device driver. +.. _devlink_params_generic: + Generic configuration parameters ================================ The following is a list of generic configuration parameters that drivers may diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst index 8b46e8591fe0..04e04d1ff627 100644 --- a/Documentation/networking/devlink/devlink-region.rst +++ b/Documentation/networking/devlink/devlink-region.rst @@ -20,6 +20,11 @@ address regions that are otherwise inaccessible to the user. Regions may also be used to provide an additional way to debug complex error states, but see also :doc:`devlink-health` +Regions may optionally support capturing a snapshot on demand via the +``DEVLINK_CMD_REGION_NEW`` netlink message. A driver wishing to allow +requested snapshots must implement the ``.snapshot`` callback for the region +in its ``devlink_region_ops`` structure. + example usage ------------- @@ -29,8 +34,7 @@ example usage $ devlink region show [ DEV/REGION ] $ devlink region del DEV/REGION snapshot SNAPSHOT_ID $ devlink region dump DEV/REGION [ snapshot SNAPSHOT_ID ] - $ devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ] - address ADDRESS length length + $ devlink region read DEV/REGION [ snapshot SNAPSHOT_ID ] address ADDRESS length length # Show all of the exposed regions with region sizes: $ devlink region show @@ -40,6 +44,9 @@ example usage # Delete a snapshot using: $ devlink region del pci/0000:00:05.0/cr-space snapshot 1 + # Request an immediate snapshot, if supported by the region + $ devlink region new pci/0000:00:05.0/cr-space snapshot 5 + # Dump a snapshot: $ devlink region dump pci/0000:00:05.0/fw-health snapshot 1 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30 @@ -48,8 +55,7 @@ example usage 0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5 # Read a specific part of a snapshot: - $ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0 - length 16 + $ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0 length 16 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30 As regions are likely very device or driver specific, no generic regions are diff --git a/Documentation/networking/devlink/devlink-trap.rst b/Documentation/networking/devlink/devlink-trap.rst index 47a429bb8658..a09971c2115c 100644 --- a/Documentation/networking/devlink/devlink-trap.rst +++ b/Documentation/networking/devlink/devlink-trap.rst @@ -238,6 +238,12 @@ be added to the following table: - ``drop`` - Traps NVE packets that the device decided to drop because their overlay source MAC is multicast + * - ``ingress_flow_action_drop`` + - ``drop`` + - Traps packets dropped during processing of ingress flow action drop + * - ``egress_flow_action_drop`` + - ``drop`` + - Traps packets dropped during processing of egress flow action drop Driver-specific Packet Traps ============================ @@ -277,6 +283,35 @@ narrow. The description of these groups must be added to the following table: * - ``tunnel_drops`` - Contains packet traps for packets that were dropped by the device during tunnel encapsulation / decapsulation + * - ``acl_drops`` + - Contains packet traps for packets that were dropped by the device during + ACL processing + +Packet Trap Policers +==================== + +As previously explained, the underlying device can trap certain packets to the +CPU for processing. In most cases, the underlying device is capable of handling +packet rates that are several orders of magnitude higher compared to those that +can be handled by the CPU. + +Therefore, in order to prevent the underlying device from overwhelming the CPU, +devices usually include packet trap policers that are able to police the +trapped packets to rates that can be handled by the CPU. + +The ``devlink-trap`` mechanism allows capable device drivers to register their +supported packet trap policers with ``devlink``. The device driver can choose +to associate these policers with supported packet trap groups (see +:ref:`Generic-Packet-Trap-Groups`) during its initialization, thereby exposing +its default control plane policy to user space. + +Device drivers should allow user space to change the parameters of the policers +(e.g., rate, burst size) as well as the association between the policers and +trap groups by implementing the relevant callbacks. + +If possible, device drivers should implement a callback that allows user space +to retrieve the number of packets that were dropped by the policer because its +configured policy was violated. Testing ======= diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst new file mode 100644 index 000000000000..5b58fc4e1268 --- /dev/null +++ b/Documentation/networking/devlink/ice.rst @@ -0,0 +1,96 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=================== +ice devlink support +=================== + +This document describes the devlink features implemented by the ``ice`` +device driver. + +Info versions +============= + +The ``ice`` driver reports the following versions + +.. list-table:: devlink info versions implemented + :widths: 5 5 5 90 + + * - Name + - Type + - Example + - Description + * - ``board.id`` + - fixed + - K65390-000 + - The Product Board Assembly (PBA) identifier of the board. + * - ``fw.mgmt`` + - running + - 2.1.7 + - 3-digit version number of the management firmware that controls the + PHY, link, etc. + * - ``fw.mgmt.api`` + - running + - 1.5 + - 2-digit version number of the API exported over the AdminQ by the + management firmware. Used by the driver to identify what commands + are supported. + * - ``fw.mgmt.build`` + - running + - 0x305d955f + - Unique identifier of the source for the management firmware. + * - ``fw.undi`` + - running + - 1.2581.0 + - Version of the Option ROM containing the UEFI driver. The version is + reported in ``major.minor.patch`` format. The major version is + incremented whenever a major breaking change occurs, or when the + minor version would overflow. The minor version is incremented for + non-breaking changes and reset to 1 when the major version is + incremented. The patch version is normally 0 but is incremented when + a fix is delivered as a patch against an older base Option ROM. + * - ``fw.psid.api`` + - running + - 0.80 + - Version defining the format of the flash contents. + * - ``fw.bundle_id`` + - running + - 0x80002ec0 + - Unique identifier of the firmware image file that was loaded onto + the device. Also referred to as the EETRACK identifier of the NVM. + * - ``fw.app.name`` + - running + - ICE OS Default Package + - The name of the DDP package that is active in the device. The DDP + package is loaded by the driver during initialization. Each varation + of DDP package shall have a unique name. + * - ``fw.app`` + - running + - 1.3.1.0 + - The version of the DDP package that is active in the device. Note + that both the name (as reported by ``fw.app.name``) and version are + required to uniquely identify the package. + +Regions +======= + +The ``ice`` driver enables access to the contents of the Non Volatile Memory +flash chip via the ``nvm-flash`` region. + +Users can request an immediate capture of a snapshot via the +``DEVLINK_CMD_REGION_NEW`` + +.. code:: shell + + $ devlink region new pci/0000:01:00.0/nvm-flash snapshot 1 + $ devlink region dump pci/0000:01:00.0/nvm-flash snapshot 1 + + $ devlink region dump pci/0000:01:00.0/nvm-flash snapshot 1 + 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30 + 0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8 + 0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc + 0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5 + + $ devlink region read pci/0000:01:00.0/nvm-flash snapshot 1 address 0 length 16 + 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30 + + $ devlink region delete pci/0000:01:00.0/nvm-flash snapshot 1 diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst index 087ff54d53fc..c536db2cc0f9 100644 --- a/Documentation/networking/devlink/index.rst +++ b/Documentation/networking/devlink/index.rst @@ -16,6 +16,7 @@ general. devlink-dpipe devlink-health devlink-info + devlink-flash devlink-params devlink-region devlink-resource @@ -32,6 +33,7 @@ parameters, info versions, and other features it supports. bnxt ionic + ice mlx4 mlx5 mlxsw diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst index 629a6e69c036..4e4b97f7971a 100644 --- a/Documentation/networking/devlink/mlx5.rst +++ b/Documentation/networking/devlink/mlx5.rst @@ -37,6 +37,12 @@ parameters. * ``smfs`` Software managed flow steering. In SMFS mode, the HW steering entities are created and manage through the driver without firmware intervention. + * - ``fdb_large_groups`` + - u32 + - driverinit + - Control the number of large groups (size > 1) in the FDB table. + + * The default value is 15, and the range is between 1 and 1024. The ``mlx5`` driver supports reloading via ``DEVLINK_CMD_RELOAD`` diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst index f1f868479ceb..567326491f80 100644 --- a/Documentation/networking/ethtool-netlink.rst +++ b/Documentation/networking/ethtool-netlink.rst @@ -189,6 +189,21 @@ Userspace to kernel: ``ETHTOOL_MSG_DEBUG_SET`` set debugging settings ``ETHTOOL_MSG_WOL_GET`` get wake-on-lan settings ``ETHTOOL_MSG_WOL_SET`` set wake-on-lan settings + ``ETHTOOL_MSG_FEATURES_GET`` get device features + ``ETHTOOL_MSG_FEATURES_SET`` set device features + ``ETHTOOL_MSG_PRIVFLAGS_GET`` get private flags + ``ETHTOOL_MSG_PRIVFLAGS_SET`` set private flags + ``ETHTOOL_MSG_RINGS_GET`` get ring sizes + ``ETHTOOL_MSG_RINGS_SET`` set ring sizes + ``ETHTOOL_MSG_CHANNELS_GET`` get channel counts + ``ETHTOOL_MSG_CHANNELS_SET`` set channel counts + ``ETHTOOL_MSG_COALESCE_GET`` get coalescing parameters + ``ETHTOOL_MSG_COALESCE_SET`` set coalescing parameters + ``ETHTOOL_MSG_PAUSE_GET`` get pause parameters + ``ETHTOOL_MSG_PAUSE_SET`` set pause parameters + ``ETHTOOL_MSG_EEE_GET`` get EEE settings + ``ETHTOOL_MSG_EEE_SET`` set EEE settings + ``ETHTOOL_MSG_TSINFO_GET`` get timestamping info ===================================== ================================ Kernel to userspace: @@ -204,6 +219,22 @@ Kernel to userspace: ``ETHTOOL_MSG_DEBUG_NTF`` debugging settings notification ``ETHTOOL_MSG_WOL_GET_REPLY`` wake-on-lan settings ``ETHTOOL_MSG_WOL_NTF`` wake-on-lan settings notification + ``ETHTOOL_MSG_FEATURES_GET_REPLY`` device features + ``ETHTOOL_MSG_FEATURES_SET_REPLY`` optional reply to FEATURES_SET + ``ETHTOOL_MSG_FEATURES_NTF`` netdev features notification + ``ETHTOOL_MSG_PRIVFLAGS_GET_REPLY`` private flags + ``ETHTOOL_MSG_PRIVFLAGS_NTF`` private flags + ``ETHTOOL_MSG_RINGS_GET_REPLY`` ring sizes + ``ETHTOOL_MSG_RINGS_NTF`` ring sizes + ``ETHTOOL_MSG_CHANNELS_GET_REPLY`` channel counts + ``ETHTOOL_MSG_CHANNELS_NTF`` channel counts + ``ETHTOOL_MSG_COALESCE_GET_REPLY`` coalescing parameters + ``ETHTOOL_MSG_COALESCE_NTF`` coalescing parameters + ``ETHTOOL_MSG_PAUSE_GET_REPLY`` pause parameters + ``ETHTOOL_MSG_PAUSE_NTF`` pause parameters + ``ETHTOOL_MSG_EEE_GET_REPLY`` EEE settings + ``ETHTOOL_MSG_EEE_NTF`` EEE settings + ``ETHTOOL_MSG_TSINFO_GET_REPLY`` timestamping info ===================================== ================================= ``GET`` requests are sent by userspace applications to retrieve device @@ -521,6 +552,410 @@ Request contents: ``WAKE_MAGICSECURE`` mode. +FEATURES_GET +============ + +Gets netdev features like ``ETHTOOL_GFEATURES`` ioctl request. + +Request contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_FEATURES_HEADER`` nested request header + ==================================== ====== ========================== + +Kernel response contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_FEATURES_HEADER`` nested reply header + ``ETHTOOL_A_FEATURES_HW`` bitset dev->hw_features + ``ETHTOOL_A_FEATURES_WANTED`` bitset dev->wanted_features + ``ETHTOOL_A_FEATURES_ACTIVE`` bitset dev->features + ``ETHTOOL_A_FEATURES_NOCHANGE`` bitset NETIF_F_NEVER_CHANGE + ==================================== ====== ========================== + +Bitmaps in kernel response have the same meaning as bitmaps used in ioctl +interference but attribute names are different (they are based on +corresponding members of struct net_device). Legacy "flags" are not provided, +if userspace needs them (most likely only ethtool for backward compatibility), +it can calculate their values from related feature bits itself. +ETHA_FEATURES_HW uses mask consisting of all features recognized by kernel (to +provide all names when using verbose bitmap format), the other three use no +mask (simple bit lists). + + +FEATURES_SET +============ + +Request to set netdev features like ``ETHTOOL_SFEATURES`` ioctl request. + +Request contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_FEATURES_HEADER`` nested request header + ``ETHTOOL_A_FEATURES_WANTED`` bitset requested features + ==================================== ====== ========================== + +Kernel response contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_FEATURES_HEADER`` nested reply header + ``ETHTOOL_A_FEATURES_WANTED`` bitset diff wanted vs. result + ``ETHTOOL_A_FEATURES_ACTIVE`` bitset diff old vs. new active + ==================================== ====== ========================== + +Request constains only one bitset which can be either value/mask pair (request +to change specific feature bits and leave the rest) or only a value (request +to set all features to specified set). + +As request is subject to netdev_change_features() sanity checks, optional +kernel reply (can be suppressed by ``ETHTOOL_FLAG_OMIT_REPLY`` flag in request +header) informs client about the actual result. ``ETHTOOL_A_FEATURES_WANTED`` +reports the difference between client request and actual result: mask consists +of bits which differ between requested features and result (dev->features +after the operation), value consists of values of these bits in the request +(i.e. negated values from resulting features). ``ETHTOOL_A_FEATURES_ACTIVE`` +reports the difference between old and new dev->features: mask consists of +bits which have changed, values are their values in new dev->features (after +the operation). + +``ETHTOOL_MSG_FEATURES_NTF`` notification is sent not only if device features +are modified using ``ETHTOOL_MSG_FEATURES_SET`` request or on of ethtool ioctl +request but also each time features are modified with netdev_update_features() +or netdev_change_features(). + + +PRIVFLAGS_GET +============= + +Gets private flags like ``ETHTOOL_GPFLAGS`` ioctl request. + +Request contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_PRIVFLAGS_HEADER`` nested request header + ==================================== ====== ========================== + +Kernel response contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_PRIVFLAGS_HEADER`` nested reply header + ``ETHTOOL_A_PRIVFLAGS_FLAGS`` bitset private flags + ==================================== ====== ========================== + +``ETHTOOL_A_PRIVFLAGS_FLAGS`` is a bitset with values of device private flags. +These flags are defined by driver, their number and names (and also meaning) +are device dependent. For compact bitset format, names can be retrieved as +``ETH_SS_PRIV_FLAGS`` string set. If verbose bitset format is requested, +response uses all private flags supported by the device as mask so that client +gets the full information without having to fetch the string set with names. + + +PRIVFLAGS_SET +============= + +Sets or modifies values of device private flags like ``ETHTOOL_SPFLAGS`` +ioctl request. + +Request contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_PRIVFLAGS_HEADER`` nested request header + ``ETHTOOL_A_PRIVFLAGS_FLAGS`` bitset private flags + ==================================== ====== ========================== + +``ETHTOOL_A_PRIVFLAGS_FLAGS`` can either set the whole set of private flags or +modify only values of some of them. + + +RINGS_GET +========= + +Gets ring sizes like ``ETHTOOL_GRINGPARAM`` ioctl request. + +Request contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_RINGS_HEADER`` nested request header + ==================================== ====== ========================== + +Kernel response contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_RINGS_HEADER`` nested reply header + ``ETHTOOL_A_RINGS_RX_MAX`` u32 max size of RX ring + ``ETHTOOL_A_RINGS_RX_MINI_MAX`` u32 max size of RX mini ring + ``ETHTOOL_A_RINGS_RX_JUMBO_MAX`` u32 max size of RX jumbo ring + ``ETHTOOL_A_RINGS_TX_MAX`` u32 max size of TX ring + ``ETHTOOL_A_RINGS_RX`` u32 size of RX ring + ``ETHTOOL_A_RINGS_RX_MINI`` u32 size of RX mini ring + ``ETHTOOL_A_RINGS_RX_JUMBO`` u32 size of RX jumbo ring + ``ETHTOOL_A_RINGS_TX`` u32 size of TX ring + ==================================== ====== ========================== + + +RINGS_SET +========= + +Sets ring sizes like ``ETHTOOL_SRINGPARAM`` ioctl request. + +Request contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_RINGS_HEADER`` nested reply header + ``ETHTOOL_A_RINGS_RX`` u32 size of RX ring + ``ETHTOOL_A_RINGS_RX_MINI`` u32 size of RX mini ring + ``ETHTOOL_A_RINGS_RX_JUMBO`` u32 size of RX jumbo ring + ``ETHTOOL_A_RINGS_TX`` u32 size of TX ring + ==================================== ====== ========================== + +Kernel checks that requested ring sizes do not exceed limits reported by +driver. Driver may impose additional constraints and may not suspport all +attributes. + + +CHANNELS_GET +============ + +Gets channel counts like ``ETHTOOL_GCHANNELS`` ioctl request. + +Request contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_CHANNELS_HEADER`` nested request header + ==================================== ====== ========================== + +Kernel response contents: + + ===================================== ====== ========================== + ``ETHTOOL_A_CHANNELS_HEADER`` nested reply header + ``ETHTOOL_A_CHANNELS_RX_MAX`` u32 max receive channels + ``ETHTOOL_A_CHANNELS_TX_MAX`` u32 max transmit channels + ``ETHTOOL_A_CHANNELS_OTHER_MAX`` u32 max other channels + ``ETHTOOL_A_CHANNELS_COMBINED_MAX`` u32 max combined channels + ``ETHTOOL_A_CHANNELS_RX_COUNT`` u32 receive channel count + ``ETHTOOL_A_CHANNELS_TX_COUNT`` u32 transmit channel count + ``ETHTOOL_A_CHANNELS_OTHER_COUNT`` u32 other channel count + ``ETHTOOL_A_CHANNELS_COMBINED_COUNT`` u32 combined channel count + ===================================== ====== ========================== + + +CHANNELS_SET +============ + +Sets channel counts like ``ETHTOOL_SCHANNELS`` ioctl request. + +Request contents: + + ===================================== ====== ========================== + ``ETHTOOL_A_CHANNELS_HEADER`` nested request header + ``ETHTOOL_A_CHANNELS_RX_COUNT`` u32 receive channel count + ``ETHTOOL_A_CHANNELS_TX_COUNT`` u32 transmit channel count + ``ETHTOOL_A_CHANNELS_OTHER_COUNT`` u32 other channel count + ``ETHTOOL_A_CHANNELS_COMBINED_COUNT`` u32 combined channel count + ===================================== ====== ========================== + +Kernel checks that requested channel counts do not exceed limits reported by +driver. Driver may impose additional constraints and may not suspport all +attributes. + + +COALESCE_GET +============ + +Gets coalescing parameters like ``ETHTOOL_GCOALESCE`` ioctl request. + +Request contents: + + ==================================== ====== ========================== + ``ETHTOOL_A_COALESCE_HEADER`` nested request header + ==================================== ====== ========================== + +Kernel response contents: + + =========================================== ====== ======================= + ``ETHTOOL_A_COALESCE_HEADER`` nested reply header + ``ETHTOOL_A_COALESCE_RX_USECS`` u32 delay (us), normal Rx + ``ETHTOOL_A_COALESCE_RX_MAX_FRAMES`` u32 max packets, normal Rx + ``ETHTOOL_A_COALESCE_RX_USECS_IRQ`` u32 delay (us), Rx in IRQ + ``ETHTOOL_A_COALESCE_RX_MAX_FRAMES_IRQ`` u32 max packets, Rx in IRQ + ``ETHTOOL_A_COALESCE_TX_USECS`` u32 delay (us), normal Tx + ``ETHTOOL_A_COALESCE_TX_MAX_FRAMES`` u32 max packets, normal Tx + ``ETHTOOL_A_COALESCE_TX_USECS_IRQ`` u32 delay (us), Tx in IRQ + ``ETHTOOL_A_COALESCE_TX_MAX_FRAMES_IRQ`` u32 IRQ packets, Tx in IRQ + ``ETHTOOL_A_COALESCE_STATS_BLOCK_USECS`` u32 delay of stats update + ``ETHTOOL_A_COALESCE_USE_ADAPTIVE_RX`` bool adaptive Rx coalesce + ``ETHTOOL_A_COALESCE_USE_ADAPTIVE_TX`` bool adaptive Tx coalesce + ``ETHTOOL_A_COALESCE_PKT_RATE_LOW`` u32 threshold for low rate + ``ETHTOOL_A_COALESCE_RX_USECS_LOW`` u32 delay (us), low Rx + ``ETHTOOL_A_COALESCE_RX_MAX_FRAMES_LOW`` u32 max packets, low Rx + ``ETHTOOL_A_COALESCE_TX_USECS_LOW`` u32 delay (us), low Tx + ``ETHTOOL_A_COALESCE_TX_MAX_FRAMES_LOW`` u32 max packets, low Tx + ``ETHTOOL_A_COALESCE_PKT_RATE_HIGH`` u32 threshold for high rate + ``ETHTOOL_A_COALESCE_RX_USECS_HIGH`` u32 delay (us), high Rx + ``ETHTOOL_A_COALESCE_RX_MAX_FRAMES_HIGH`` u32 max packets, high Rx + ``ETHTOOL_A_COALESCE_TX_USECS_HIGH`` u32 delay (us), high Tx + ``ETHTOOL_A_COALESCE_TX_MAX_FRAMES_HIGH`` u32 max packets, high Tx + ``ETHTOOL_A_COALESCE_RATE_SAMPLE_INTERVAL`` u32 rate sampling interval + =========================================== ====== ======================= + +Attributes are only included in reply if their value is not zero or the +corresponding bit in ``ethtool_ops::supported_coalesce_params`` is set (i.e. +they are declared as supported by driver). + + +COALESCE_SET +============ + +Sets coalescing parameters like ``ETHTOOL_SCOALESCE`` ioctl request. + +Request contents: + + =========================================== ====== ======================= + ``ETHTOOL_A_COALESCE_HEADER`` nested request header + ``ETHTOOL_A_COALESCE_RX_USECS`` u32 delay (us), normal Rx + ``ETHTOOL_A_COALESCE_RX_MAX_FRAMES`` u32 max packets, normal Rx + ``ETHTOOL_A_COALESCE_RX_USECS_IRQ`` u32 delay (us), Rx in IRQ + ``ETHTOOL_A_COALESCE_RX_MAX_FRAMES_IRQ`` u32 max packets, Rx in IRQ + ``ETHTOOL_A_COALESCE_TX_USECS`` u32 delay (us), normal Tx + ``ETHTOOL_A_COALESCE_TX_MAX_FRAMES`` u32 max packets, normal Tx + ``ETHTOOL_A_COALESCE_TX_USECS_IRQ`` u32 delay (us), Tx in IRQ + ``ETHTOOL_A_COALESCE_TX_MAX_FRAMES_IRQ`` u32 IRQ packets, Tx in IRQ + ``ETHTOOL_A_COALESCE_STATS_BLOCK_USECS`` u32 delay of stats update + ``ETHTOOL_A_COALESCE_USE_ADAPTIVE_RX`` bool adaptive Rx coalesce + ``ETHTOOL_A_COALESCE_USE_ADAPTIVE_TX`` bool adaptive Tx coalesce + ``ETHTOOL_A_COALESCE_PKT_RATE_LOW`` u32 threshold for low rate + ``ETHTOOL_A_COALESCE_RX_USECS_LOW`` u32 delay (us), low Rx + ``ETHTOOL_A_COALESCE_RX_MAX_FRAMES_LOW`` u32 max packets, low Rx + ``ETHTOOL_A_COALESCE_TX_USECS_LOW`` u32 delay (us), low Tx + ``ETHTOOL_A_COALESCE_TX_MAX_FRAMES_LOW`` u32 max packets, low Tx + ``ETHTOOL_A_COALESCE_PKT_RATE_HIGH`` u32 threshold for high rate + ``ETHTOOL_A_COALESCE_RX_USECS_HIGH`` u32 delay (us), high Rx + ``ETHTOOL_A_COALESCE_RX_MAX_FRAMES_HIGH`` u32 max packets, high Rx + ``ETHTOOL_A_COALESCE_TX_USECS_HIGH`` u32 delay (us), high Tx + ``ETHTOOL_A_COALESCE_TX_MAX_FRAMES_HIGH`` u32 max packets, high Tx + ``ETHTOOL_A_COALESCE_RATE_SAMPLE_INTERVAL`` u32 rate sampling interval + =========================================== ====== ======================= + +Request is rejected if it attributes declared as unsupported by driver (i.e. +such that the corresponding bit in ``ethtool_ops::supported_coalesce_params`` +is not set), regardless of their values. Driver may impose additional +constraints on coalescing parameters and their values. + + +PAUSE_GET +============ + +Gets channel counts like ``ETHTOOL_GPAUSE`` ioctl request. + +Request contents: + + ===================================== ====== ========================== + ``ETHTOOL_A_PAUSE_HEADER`` nested request header + ===================================== ====== ========================== + +Kernel response contents: + + ===================================== ====== ========================== + ``ETHTOOL_A_PAUSE_HEADER`` nested request header + ``ETHTOOL_A_PAUSE_AUTONEG`` bool pause autonegotiation + ``ETHTOOL_A_PAUSE_RX`` bool receive pause frames + ``ETHTOOL_A_PAUSE_TX`` bool transmit pause frames + ===================================== ====== ========================== + + +PAUSE_SET +============ + +Sets pause parameters like ``ETHTOOL_GPAUSEPARAM`` ioctl request. + +Request contents: + + ===================================== ====== ========================== + ``ETHTOOL_A_PAUSE_HEADER`` nested request header + ``ETHTOOL_A_PAUSE_AUTONEG`` bool pause autonegotiation + ``ETHTOOL_A_PAUSE_RX`` bool receive pause frames + ``ETHTOOL_A_PAUSE_TX`` bool transmit pause frames + ===================================== ====== ========================== + + +EEE_GET +======= + +Gets channel counts like ``ETHTOOL_GEEE`` ioctl request. + +Request contents: + + ===================================== ====== ========================== + ``ETHTOOL_A_EEE_HEADER`` nested request header + ===================================== ====== ========================== + +Kernel response contents: + + ===================================== ====== ========================== + ``ETHTOOL_A_EEE_HEADER`` nested request header + ``ETHTOOL_A_EEE_MODES_OURS`` bool supported/advertised modes + ``ETHTOOL_A_EEE_MODES_PEER`` bool peer advertised link modes + ``ETHTOOL_A_EEE_ACTIVE`` bool EEE is actively used + ``ETHTOOL_A_EEE_ENABLED`` bool EEE is enabled + ``ETHTOOL_A_EEE_TX_LPI_ENABLED`` bool Tx lpi enabled + ``ETHTOOL_A_EEE_TX_LPI_TIMER`` u32 Tx lpi timeout (in us) + ===================================== ====== ========================== + +In ``ETHTOOL_A_EEE_MODES_OURS``, mask consists of link modes for which EEE is +enabled, value of link modes for which EEE is advertised. Link modes for which +peer advertises EEE are listed in ``ETHTOOL_A_EEE_MODES_PEER`` (no mask). The +netlink interface allows reporting EEE status for all link modes but only +first 32 are provided by the ``ethtool_ops`` callback. + + +EEE_SET +======= + +Sets pause parameters like ``ETHTOOL_GEEEPARAM`` ioctl request. + +Request contents: + + ===================================== ====== ========================== + ``ETHTOOL_A_EEE_HEADER`` nested request header + ``ETHTOOL_A_EEE_MODES_OURS`` bool advertised modes + ``ETHTOOL_A_EEE_ENABLED`` bool EEE is enabled + ``ETHTOOL_A_EEE_TX_LPI_ENABLED`` bool Tx lpi enabled + ``ETHTOOL_A_EEE_TX_LPI_TIMER`` u32 Tx lpi timeout (in us) + ===================================== ====== ========================== + +``ETHTOOL_A_EEE_MODES_OURS`` is used to either list link modes to advertise +EEE for (if there is no mask) or specify changes to the list (if there is +a mask). The netlink interface allows reporting EEE status for all link modes +but only first 32 can be set at the moment as that is what the ``ethtool_ops`` +callback supports. + + +TSINFO_GET +========== + +Gets timestamping information like ``ETHTOOL_GET_TS_INFO`` ioctl request. + +Request contents: + + ===================================== ====== ========================== + ``ETHTOOL_A_TSINFO_HEADER`` nested request header + ===================================== ====== ========================== + +Kernel response contents: + + ===================================== ====== ========================== + ``ETHTOOL_A_TSINFO_HEADER`` nested request header + ``ETHTOOL_A_TSINFO_TIMESTAMPING`` bitset SO_TIMESTAMPING flags + ``ETHTOOL_A_TSINFO_TX_TYPES`` bitset supported Tx types + ``ETHTOOL_A_TSINFO_RX_FILTERS`` bitset supported Rx filters + ``ETHTOOL_A_TSINFO_PHC_INDEX`` u32 PTP hw clock index + ===================================== ====== ========================== + +``ETHTOOL_A_TSINFO_PHC_INDEX`` is absent if there is no associated PHC (there +is no special value for this case). The bitset attributes are omitted if they +would be empty (no bit set). + + Request translation =================== @@ -545,37 +980,37 @@ have their netlink replacement yet. ``ETHTOOL_GLINK`` ``ETHTOOL_MSG_LINKSTATE_GET`` ``ETHTOOL_GEEPROM`` n/a ``ETHTOOL_SEEPROM`` n/a - ``ETHTOOL_GCOALESCE`` n/a - ``ETHTOOL_SCOALESCE`` n/a - ``ETHTOOL_GRINGPARAM`` n/a - ``ETHTOOL_SRINGPARAM`` n/a - ``ETHTOOL_GPAUSEPARAM`` n/a - ``ETHTOOL_SPAUSEPARAM`` n/a - ``ETHTOOL_GRXCSUM`` n/a - ``ETHTOOL_SRXCSUM`` n/a - ``ETHTOOL_GTXCSUM`` n/a - ``ETHTOOL_STXCSUM`` n/a - ``ETHTOOL_GSG`` n/a - ``ETHTOOL_SSG`` n/a + ``ETHTOOL_GCOALESCE`` ``ETHTOOL_MSG_COALESCE_GET`` + ``ETHTOOL_SCOALESCE`` ``ETHTOOL_MSG_COALESCE_SET`` + ``ETHTOOL_GRINGPARAM`` ``ETHTOOL_MSG_RINGS_GET`` + ``ETHTOOL_SRINGPARAM`` ``ETHTOOL_MSG_RINGS_SET`` + ``ETHTOOL_GPAUSEPARAM`` ``ETHTOOL_MSG_PAUSE_GET`` + ``ETHTOOL_SPAUSEPARAM`` ``ETHTOOL_MSG_PAUSE_SET`` + ``ETHTOOL_GRXCSUM`` ``ETHTOOL_MSG_FEATURES_GET`` + ``ETHTOOL_SRXCSUM`` ``ETHTOOL_MSG_FEATURES_SET`` + ``ETHTOOL_GTXCSUM`` ``ETHTOOL_MSG_FEATURES_GET`` + ``ETHTOOL_STXCSUM`` ``ETHTOOL_MSG_FEATURES_SET`` + ``ETHTOOL_GSG`` ``ETHTOOL_MSG_FEATURES_GET`` + ``ETHTOOL_SSG`` ``ETHTOOL_MSG_FEATURES_SET`` ``ETHTOOL_TEST`` n/a ``ETHTOOL_GSTRINGS`` ``ETHTOOL_MSG_STRSET_GET`` ``ETHTOOL_PHYS_ID`` n/a ``ETHTOOL_GSTATS`` n/a - ``ETHTOOL_GTSO`` n/a - ``ETHTOOL_STSO`` n/a + ``ETHTOOL_GTSO`` ``ETHTOOL_MSG_FEATURES_GET`` + ``ETHTOOL_STSO`` ``ETHTOOL_MSG_FEATURES_SET`` ``ETHTOOL_GPERMADDR`` rtnetlink ``RTM_GETLINK`` - ``ETHTOOL_GUFO`` n/a - ``ETHTOOL_SUFO`` n/a - ``ETHTOOL_GGSO`` n/a - ``ETHTOOL_SGSO`` n/a - ``ETHTOOL_GFLAGS`` n/a - ``ETHTOOL_SFLAGS`` n/a - ``ETHTOOL_GPFLAGS`` n/a - ``ETHTOOL_SPFLAGS`` n/a + ``ETHTOOL_GUFO`` ``ETHTOOL_MSG_FEATURES_GET`` + ``ETHTOOL_SUFO`` ``ETHTOOL_MSG_FEATURES_SET`` + ``ETHTOOL_GGSO`` ``ETHTOOL_MSG_FEATURES_GET`` + ``ETHTOOL_SGSO`` ``ETHTOOL_MSG_FEATURES_SET`` + ``ETHTOOL_GFLAGS`` ``ETHTOOL_MSG_FEATURES_GET`` + ``ETHTOOL_SFLAGS`` ``ETHTOOL_MSG_FEATURES_SET`` + ``ETHTOOL_GPFLAGS`` ``ETHTOOL_MSG_PRIVFLAGS_GET`` + ``ETHTOOL_SPFLAGS`` ``ETHTOOL_MSG_PRIVFLAGS_SET`` ``ETHTOOL_GRXFH`` n/a ``ETHTOOL_SRXFH`` n/a - ``ETHTOOL_GGRO`` n/a - ``ETHTOOL_SGRO`` n/a + ``ETHTOOL_GGRO`` ``ETHTOOL_MSG_FEATURES_GET`` + ``ETHTOOL_SGRO`` ``ETHTOOL_MSG_FEATURES_SET`` ``ETHTOOL_GRXRINGS`` n/a ``ETHTOOL_GRXCLSRLCNT`` n/a ``ETHTOOL_GRXCLSRULE`` n/a @@ -589,18 +1024,18 @@ have their netlink replacement yet. ``ETHTOOL_GSSET_INFO`` ``ETHTOOL_MSG_STRSET_GET`` ``ETHTOOL_GRXFHINDIR`` n/a ``ETHTOOL_SRXFHINDIR`` n/a - ``ETHTOOL_GFEATURES`` n/a - ``ETHTOOL_SFEATURES`` n/a - ``ETHTOOL_GCHANNELS`` n/a - ``ETHTOOL_SCHANNELS`` n/a + ``ETHTOOL_GFEATURES`` ``ETHTOOL_MSG_FEATURES_GET`` + ``ETHTOOL_SFEATURES`` ``ETHTOOL_MSG_FEATURES_SET`` + ``ETHTOOL_GCHANNELS`` ``ETHTOOL_MSG_CHANNELS_GET`` + ``ETHTOOL_SCHANNELS`` ``ETHTOOL_MSG_CHANNELS_SET`` ``ETHTOOL_SET_DUMP`` n/a ``ETHTOOL_GET_DUMP_FLAG`` n/a ``ETHTOOL_GET_DUMP_DATA`` n/a - ``ETHTOOL_GET_TS_INFO`` n/a + ``ETHTOOL_GET_TS_INFO`` ``ETHTOOL_MSG_TSINFO_GET`` ``ETHTOOL_GMODULEINFO`` n/a ``ETHTOOL_GMODULEEEPROM`` n/a - ``ETHTOOL_GEEE`` n/a - ``ETHTOOL_SEEE`` n/a + ``ETHTOOL_GEEE`` ``ETHTOOL_MSG_EEE_GET`` + ``ETHTOOL_SEEE`` ``ETHTOOL_MSG_EEE_SET`` ``ETHTOOL_GRSSH`` n/a ``ETHTOOL_SRSSH`` n/a ``ETHTOOL_GTUNABLE`` n/a diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt index c4a328f2d57a..2f0f8b17dade 100644 --- a/Documentation/networking/filter.txt +++ b/Documentation/networking/filter.txt @@ -606,7 +606,7 @@ before a conversion to the new layout is being done behind the scenes! Currently, the classic BPF format is being used for JITing on most 32-bit architectures, whereas x86-64, aarch64, s390x, powerpc64, -sparc64, arm32, riscv (RV64G) perform JIT compilation from eBPF +sparc64, arm32, riscv64, riscv32 perform JIT compilation from eBPF instruction set. Some core changes of the new internal format: diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index d07d9855dcd3..50133d9761c9 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -8,6 +8,7 @@ Contents: netdev-FAQ af_xdp + bareudp batman-adv can can_ucan_protocol @@ -33,6 +34,7 @@ Contents: tls tls-offload nfc + 6lowpan .. only:: subproject and html diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 5f53faff4e25..ee961d322d93 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -958,6 +958,15 @@ ip_nonlocal_bind - BOOLEAN which can be quite useful - but may break some applications. Default: 0 +ip_autobind_reuse - BOOLEAN + By default, bind() does not select the ports automatically even if + the new socket and all sockets bound to the port have SO_REUSEADDR. + ip_autobind_reuse allows bind() to reuse the port and this is useful + when you use bind()+connect(), but may break some applications. + The preferred solution is to use IP_BIND_ADDRESS_NO_PORT and this + option should only be set by experts. + Default: 0 + ip_dynaddr - BOOLEAN If set non-zero, enables support for dynamic addresses. If set to a non-zero value larger than 1, a kernel log diff --git a/Documentation/networking/page_pool.rst b/Documentation/networking/page_pool.rst new file mode 100644 index 000000000000..43088ddf95e4 --- /dev/null +++ b/Documentation/networking/page_pool.rst @@ -0,0 +1,159 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============= +Page Pool API +============= + +The page_pool allocator is optimized for the XDP mode that uses one frame +per-page, but it can fallback on the regular page allocator APIs. + +Basic use involves replacing alloc_pages() calls with the +page_pool_alloc_pages() call. Drivers should use page_pool_dev_alloc_pages() +replacing dev_alloc_pages(). + +API keeps track of inflight pages, in order to let API user know +when it is safe to free a page_pool object. Thus, API users +must run page_pool_release_page() when a page is leaving the page_pool or +call page_pool_put_page() where appropriate in order to maintain correct +accounting. + +API user must call page_pool_put_page() once on a page, as it +will either recycle the page, or in case of refcnt > 1, it will +release the DMA mapping and inflight state accounting. + +Architecture overview +===================== + +.. code-block:: none + + +------------------+ + | Driver | + +------------------+ + ^ + | + | + | + v + +--------------------------------------------+ + | request memory | + +--------------------------------------------+ + ^ ^ + | | + | Pool empty | Pool has entries + | | + v v + +-----------------------+ +------------------------+ + | alloc (and map) pages | | get page from cache | + +-----------------------+ +------------------------+ + ^ ^ + | | + | cache available | No entries, refill + | | from ptr-ring + | | + v v + +-----------------+ +------------------+ + | Fast cache | | ptr-ring cache | + +-----------------+ +------------------+ + +API interface +============= +The number of pools created **must** match the number of hardware queues +unless hardware restrictions make that impossible. This would otherwise beat the +purpose of page pool, which is allocate pages fast from cache without locking. +This lockless guarantee naturally comes from running under a NAPI softirq. +The protection doesn't strictly have to be NAPI, any guarantee that allocating +a page will cause no race conditions is enough. + +* page_pool_create(): Create a pool. + * flags: PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV + * order: 2^order pages on allocation + * pool_size: size of the ptr_ring + * nid: preferred NUMA node for allocation + * dev: struct device. Used on DMA operations + * dma_dir: DMA direction + * max_len: max DMA sync memory size + * offset: DMA address offset + +* page_pool_put_page(): The outcome of this depends on the page refcnt. If the + driver bumps the refcnt > 1 this will unmap the page. If the page refcnt is 1 + the allocator owns the page and will try to recycle it in one of the pool + caches. If PP_FLAG_DMA_SYNC_DEV is set, the page will be synced for_device + using dma_sync_single_range_for_device(). + +* page_pool_put_full_page(): Similar to page_pool_put_page(), but will DMA sync + for the entire memory area configured in area pool->max_len. + +* page_pool_recycle_direct(): Similar to page_pool_put_full_page() but caller + must guarantee safe context (e.g NAPI), since it will recycle the page + directly into the pool fast cache. + +* page_pool_release_page(): Unmap the page (if mapped) and account for it on + inflight counters. + +* page_pool_dev_alloc_pages(): Get a page from the page allocator or page_pool + caches. + +* page_pool_get_dma_addr(): Retrieve the stored DMA address. + +* page_pool_get_dma_dir(): Retrieve the stored DMA direction. + +Coding examples +=============== + +Registration +------------ + +.. code-block:: c + + /* Page pool registration */ + struct page_pool_params pp_params = { 0 }; + struct xdp_rxq_info xdp_rxq; + int err; + + pp_params.order = 0; + /* internal DMA mapping in page_pool */ + pp_params.flags = PP_FLAG_DMA_MAP; + pp_params.pool_size = DESC_NUM; + pp_params.nid = NUMA_NO_NODE; + pp_params.dev = priv->dev; + pp_params.dma_dir = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE; + page_pool = page_pool_create(&pp_params); + + err = xdp_rxq_info_reg(&xdp_rxq, ndev, 0); + if (err) + goto err_out; + + err = xdp_rxq_info_reg_mem_model(&xdp_rxq, MEM_TYPE_PAGE_POOL, page_pool); + if (err) + goto err_out; + +NAPI poller +----------- + + +.. code-block:: c + + /* NAPI Rx poller */ + enum dma_data_direction dma_dir; + + dma_dir = page_pool_get_dma_dir(dring->page_pool); + while (done < budget) { + if (some error) + page_pool_recycle_direct(page_pool, page); + if (packet_is_xdp) { + if XDP_DROP: + page_pool_recycle_direct(page_pool, page); + } else (packet_is_skb) { + page_pool_release_page(page_pool, page); + new_page = page_pool_dev_alloc_pages(page_pool); + } + } + +Driver unload +------------- + +.. code-block:: c + + /* Driver unload */ + page_pool_put_full_page(page_pool, page, false); + xdp_rxq_info_unreg(&xdp_rxq); diff --git a/Documentation/networking/sfp-phylink.rst b/Documentation/networking/sfp-phylink.rst index d753a309f9d1..5aec7c8857d0 100644 --- a/Documentation/networking/sfp-phylink.rst +++ b/Documentation/networking/sfp-phylink.rst @@ -74,10 +74,13 @@ phylib to the sfp/phylink support. Please send patches to improve this documentation. 1. Optionally split the network driver's phylib update function into - three parts dealing with link-down, link-up and reconfiguring the - MAC settings. This can be done as a separate preparation commit. + two parts dealing with link-down and link-up. This can be done as + a separate preparation commit. - An example of this preparation can be found in git commit fc548b991fb0. + An older example of this preparation can be found in git commit + fc548b991fb0, although this was splitting into three parts; the + link-up part now includes configuring the MAC for the link settings. + Please see :c:func:`mac_link_up` for more information on this. 2. Replace:: @@ -135,27 +138,27 @@ this documentation. .. code-block:: c - static int foo_ethtool_set_link_ksettings(struct net_device *dev, - const struct ethtool_link_ksettings *cmd) - { - struct foo_priv *priv = netdev_priv(dev); - - return phylink_ethtool_ksettings_set(priv->phylink, cmd); - } - - static int foo_ethtool_get_link_ksettings(struct net_device *dev, - struct ethtool_link_ksettings *cmd) - { - struct foo_priv *priv = netdev_priv(dev); + static int foo_ethtool_set_link_ksettings(struct net_device *dev, + const struct ethtool_link_ksettings *cmd) + { + struct foo_priv *priv = netdev_priv(dev); + + return phylink_ethtool_ksettings_set(priv->phylink, cmd); + } - return phylink_ethtool_ksettings_get(priv->phylink, cmd); - } + static int foo_ethtool_get_link_ksettings(struct net_device *dev, + struct ethtool_link_ksettings *cmd) + { + struct foo_priv *priv = netdev_priv(dev); + + return phylink_ethtool_ksettings_get(priv->phylink, cmd); + } -7. Replace the call to: +7. Replace the call to:: phy_dev = of_phy_connect(dev, node, link_func, flags, phy_interface); - and associated code with a call to: + and associated code with a call to:: err = phylink_of_phy_connect(priv->phylink, node, flags); @@ -207,6 +210,14 @@ this documentation. using. This is particularly important for in-band negotiation methods such as 1000base-X and SGMII. + The :c:func:`mac_link_up` method is used to inform the MAC that the + link has come up. The call includes the negotiation mode and interface + for reference only. The finalised link parameters are also supplied + (speed, duplex and flow control/pause enablement settings) which + should be used to configure the MAC when the MAC and PCS are not + tightly integrated, or when the settings are not coming from in-band + negotiation. + The :c:func:`mac_config` method is used to update the MAC with the requested state, and must avoid unnecessarily taking the link down when making changes to the MAC configuration. This means the |