summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2018-10-20bpf: sk_msg program helper bpf_msg_push_dataJohn Fastabend2-1/+24
This allows user to push data into a msg using sk_msg program types. The format is as follows, bpf_msg_push_data(msg, offset, len, flags) this will insert 'len' bytes at offset 'offset'. For example to prepend 10 bytes at the front of the message the user can, bpf_msg_push_data(msg, 0, 10, 0); This will invalidate data bounds so BPF user will have to then recheck data bounds after calling this. After this the msg size will have been updated and the user is free to write into the added bytes. We allow any offset/len as long as it is within the (data, data_end) range. However, a copy will be required if the ring is full and its possible for the helper to fail with ENOMEM or EINVAL errors which need to be handled by the BPF program. This can be used similar to XDP metadata to pass data between sk_msg layer and lower layers. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-20bpf: skmsg, fix psock create on existing kcm/tls portJohn Fastabend1-5/+20
Before using the psock returned by sk_psock_get() when adding it to a sockmap we need to ensure it is actually a sockmap based psock. Previously we were only checking this after incrementing the reference counter which was an error. This resulted in a slab-out-of-bounds error when the psock was not actually a sockmap type. This moves the check up so the reference counter is only used if it is a sockmap psock. Eric reported the following KASAN BUG, BUG: KASAN: slab-out-of-bounds in atomic_read include/asm-generic/atomic-instrumented.h:21 [inline] BUG: KASAN: slab-out-of-bounds in refcount_inc_not_zero_checked+0x97/0x2f0 lib/refcount.c:120 Read of size 4 at addr ffff88019548be58 by task syz-executor4/22387 CPU: 1 PID: 22387 Comm: syz-executor4 Not tainted 4.19.0-rc7+ #264 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113 print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267 kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272 atomic_read include/asm-generic/atomic-instrumented.h:21 [inline] refcount_inc_not_zero_checked+0x97/0x2f0 lib/refcount.c:120 sk_psock_get include/linux/skmsg.h:379 [inline] sock_map_link.isra.6+0x41f/0xe30 net/core/sock_map.c:178 sock_hash_update_common+0x19b/0x11e0 net/core/sock_map.c:669 sock_hash_update_elem+0x306/0x470 net/core/sock_map.c:738 map_update_elem+0x819/0xdf0 kernel/bpf/syscall.c:818 Signed-off-by: John Fastabend <john.fastabend@gmail.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-19bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKBSong Liu1-0/+21
BPF programs of BPF_PROG_TYPE_CGROUP_SKB need to access headers in the skb. This patch enables direct access of skb for these programs. Two helper functions bpf_compute_and_save_data_end() and bpf_restore_data_end() are introduced. There are used in __cgroup_bpf_run_filter_skb(), to compute proper data_end for the BPF program, and restore original data afterwards. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19bpf: add MAP_LOOKUP_AND_DELETE_ELEM syscallMauricio Vasquez B1-0/+1
The previous patch implemented a bpf queue/stack maps that provided the peek/pop/push functions. There is not a direct relationship between those functions and the current maps syscalls, hence a new MAP_LOOKUP_AND_DELETE_ELEM syscall is added, this is mapped to the pop operation in the queue/stack maps and it is still to implement in other kind of maps. Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19bpf: add queue and stack mapsMauricio Vasquez B3-1/+36
Queue/stack maps implement a FIFO/LIFO data storage for ebpf programs. These maps support peek, pop and push operations that are exposed to eBPF programs through the new bpf_map[peek/pop/push] helpers. Those operations are exposed to userspace applications through the already existing syscalls in the following way: BPF_MAP_LOOKUP_ELEM -> peek BPF_MAP_LOOKUP_AND_DELETE_ELEM -> pop BPF_MAP_UPDATE_ELEM -> push Queue/stack maps are implemented using a buffer, tail and head indexes, hence BPF_F_NO_PREALLOC is not supported. As opposite to other maps, queue and stack do not use RCU for protecting maps values, the bpf_map[peek/pop] have a ARG_PTR_TO_UNINIT_MAP_VALUE argument that is a pointer to a memory zone where to save the value of a map. Basically the same as ARG_PTR_TO_UNINIT_MEM, but the size has not be passed as an extra argument. Our main motivation for implementing queue/stack maps was to keep track of a pool of elements, like network ports in a SNAT, however we forsee other use cases, like for exampling saving last N kernel events in a map and then analysing from userspace. Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19bpf/verifier: add ARG_PTR_TO_UNINIT_MAP_VALUEMauricio Vasquez B1-0/+1
ARG_PTR_TO_UNINIT_MAP_VALUE argument is a pointer to a memory zone used to save the value of a map. Basically the same as ARG_PTR_TO_UNINIT_MEM, but the size has not be passed as an extra argument. This will be used in the following patch that implements some new helpers that receive a pointer to be filled with a map value. Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19bpf: rename stack trace map operationsMauricio Vasquez B1-1/+1
In the following patches queue and stack maps (FIFO and LIFO datastructures) will be implemented. In order to avoid confusion and a possible name clash rename stack_map_ops to stack_trace_map_ops Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-17bpf: fix doc of bpf_skb_adjust_room() in uapiNicolas Dichtel1-1/+1
len_diff is signed. Fixes: fa15601ab31e ("bpf: add documentation for eBPF helpers (33-41)") CC: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-17bpf: sockmap, support for msg_peek in sk_msg with redirect ingressJohn Fastabend1-1/+1
This adds support for the MSG_PEEK flag when doing redirect to ingress and receiving on the sk_msg psock queue. Previously the flag was being ignored which could confuse applications if they expected the flag to work as normal. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-17bpf: skmsg, improve sk_msg_used_element to work in cork contextJohn Fastabend1-5/+8
Currently sk_msg_used_element is only called in zerocopy context where cork is not possible and if this case happens we fallback to copy mode. However the helper is more useful if it works in all contexts. This patch resolved the case where if end == head indicating a full or empty ring the helper always reports an empty ring. To fix this add a test for the full ring case to avoid reporting a full ring has 0 elements. This additional functionality will be used in the next patches from recvmsg context where end = head with a full ring is a valid case. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-17bpf: sockmap, fix skmsg recvmsg handler to track size correctlyJohn Fastabend1-0/+1
When converting sockmap to new skmsg generic data structures we missed that the recvmsg handler did not correctly use sg.size and instead was using individual elements length. The result is if a sock is closed with outstanding data we omit the call to sk_mem_uncharge() and can get the warning below. [ 66.728282] WARNING: CPU: 6 PID: 5783 at net/core/stream.c:206 sk_stream_kill_queues+0x1fa/0x210 To fix this correct the redirect handler to xfer the size along with the scatterlist and also decrement the size from the recvmsg handler. Now when a sock is closed the remaining 'size' will be decremented with sk_mem_uncharge(). Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-16net: Enable kernel side filtering of route dumpsDavid Ahern1-1/+1
Update parsing of route dump request to enable kernel side filtering. Allow filtering results by protocol (e.g., which routing daemon installed the route), route type (e.g., unicast), table id and nexthop device. These amount to the low hanging fruit, yet a huge improvement, for dumping routes. ip_valid_fib_dump_req is called with RTNL held, so __dev_get_by_index can be used to look up the device index without taking a reference. From there filter->dev is only used during dump loops with the lock still held. Set NLM_F_DUMP_FILTERED in the answer_flags so the user knows the results have been filtered should no entries be returned. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net: Plumb support for filtering ipv4 and ipv6 multicast route dumpsDavid Ahern1-3/+4
Implement kernel side filtering of routes by egress device index and table id. If the table id is given in the filter, lookup table and call mr_table_dump directly for it. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16ipmr: Refactor mr_rtm_dumprouteDavid Ahern1-0/+6
Move per-table loops from mr_rtm_dumproute to mr_table_dump and export mr_table_dump for dumps by specific table id. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net/ipv4: Plumb support for filtering route dumpsDavid Ahern1-1/+1
Implement kernel side filtering of routes by table id, egress device index, protocol and route type. If the table id is given in the filter, lookup the table and call fib_table_dump directly for it. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net: Add struct for fib dump filterDavid Ahern2-1/+13
Add struct fib_dump_filter for options on limiting which routes are returned in a dump request. The current list is table id, protocol, route type, rtm_flags and nexthop device index. struct net is needed to lookup the net_device from the index. Declare the filter for each route dump handler and plumb the new arguments from dump handlers to ip_valid_fib_dump_req. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16netlink: Add answer_flags to netlink_callbackDavid Ahern1-0/+1
With dump filtering we need a way to ensure the NLM_F_DUMP_FILTERED flag is set on a message back to the user if the data returned is influenced by some input attributes. Normally this can be done as messages are added to the skb, but if the filter results in no data being returned, the user could be confused as to why. This patch adds answer_flags to the netlink_callback allowing dump handlers to set the NLM_F_DUMP_FILTERED at a minimum in the NLMSG_DONE message ensuring the flag gets back to the user. The netlink_callback space is initialized to 0 via a memset in __netlink_dump_start, so init of the new answer_flags is covered. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller8-62/+465
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-10-16 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Convert BPF sockmap and kTLS to both use a new sk_msg API and enable sk_msg BPF integration for the latter, from Daniel and John. 2) Enable BPF syscall side to indicate for maps that they do not support a map lookup operation as opposed to just missing key, from Prashant. 3) Add bpftool map create command which after map creation pins the map into bpf fs for further processing, from Jakub. 4) Add bpftool support for attaching programs to maps allowing sock_map and sock_hash to be used from bpftool, from John. 5) Improve syscall BPF map update/delete path for map-in-map types to wait a RCU grace period for pending references to complete, from Daniel. 6) Couple of follow-up fixes for the BPF socket lookup to get it enabled also when IPv6 is compiled as a module, from Joe. 7) Fix a generic-XDP bug to handle the case when the Ethernet header was mangled and thus update skb's protocol and data, from Jesper. 8) Add a missing BTF header length check between header copies from user space, from Wenwen. 9) Minor fixups in libbpf to use __u32 instead u32 types and include proper perf_event.h uapi header instead of perf internal one, from Yonghong. 10) Allow to pass user-defined flags through EXTRA_CFLAGS and EXTRA_LDFLAGS to bpftool's build, from Jiri. 11) BPF kselftest tweaks to add LWTUNNEL to config fragment and to install with_addr.sh script from flow dissector selftest, from Anders. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15net: extend sk_pacing_rate to unsigned longEric Dumazet1-2/+2
sk_pacing_rate has beed introduced as a u32 field in 2013, effectively limiting per flow pacing to 34Gbit. We believe it is time to allow TCP to pace high speed flows on 64bit hosts, as we now can reach 100Gbit on one TCP flow. This patch adds no cost for 32bit kernels. The tcpi_pacing_rate and tcpi_max_pacing_rate were already exported as 64bit, so iproute2/ss command require no changes. Unfortunately the SO_MAX_PACING_RATE socket option will stay 32bit and we will need to add a new option to let applications control high pacing rates. State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 1787144 10.246.9.76:49992 10.246.9.77:36741 timer:(on,003ms,0) ino:91863 sk:2 <-> skmem:(r0,rb540000,t66440,tb2363904,f605944,w1822984,o0,bl0,d0) ts sack bbr wscale:8,8 rto:201 rtt:0.057/0.006 mss:1448 rcvmss:536 advmss:1448 cwnd:138 ssthresh:178 bytes_acked:256699822585 segs_out:177279177 segs_in:3916318 data_segs_out:177279175 bbr:(bw:31276.8Mbps,mrtt:0,pacing_gain:1.25,cwnd_gain:2) send 28045.5Mbps lastrcv:73333 pacing_rate 38705.0Mbps delivery_rate 22997.6Mbps busy:73333ms unacked:135 retrans:0/157 rcv_space:14480 notsent:2085120 minrtt:0.013 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15tcp: do not change tcp_wstamp_ns in tcp_mstamp_refreshEric Dumazet1-0/+1
In EDT design, I made the mistake of using tcp_wstamp_ns to store the last tcp_clock_ns() sample and to store the pacing virtual timer. This causes major regressions at high speed flows. Introduce tcp_clock_cache to store last tcp_clock_ns(). This is needed because some arches have slow high-resolution kernel time service. tcp_wstamp_ns is only updated when a packet is sent. Note that we can remove tcp_mstamp in the future since tcp_mstamp is essentially tcp_clock_cache/1000, so the apparent socket size increase is temporary. Fixes: 9799ccb0e984 ("tcp: add tcp_wstamp_ns socket field") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15net/ncsi: Extend NC-SI Netlink interface to allow user space to send NC-SI ↵Justin.Lee1@Dell.com1-0/+6
command The new command (NCSI_CMD_SEND_CMD) is added to allow user space application to send NC-SI command to the network card. Also, add a new attribute (NCSI_ATTR_DATA) for transferring request and response. The work flow is as below. Request: User space application -> Netlink interface (msg) -> new Netlink handler - ncsi_send_cmd_nl() -> ncsi_xmit_cmd() Response: Response received - ncsi_rcv_rsp() -> internal response handler - ncsi_rsp_handler_xxx() -> ncsi_rsp_handler_netlink() -> ncsi_send_netlink_rsp () -> Netlink interface (msg) -> user space application Command timeout - ncsi_request_timeout() -> ncsi_send_netlink_timeout () -> Netlink interface (msg with zero data length) -> user space application Error: Error detected -> ncsi_send_netlink_err () -> Netlink interface (err msg) -> user space application Signed-off-by: Justin Lee <justin.lee1@dell.com> Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15Merge tag 'mlx5e-updates-2018-10-10' of ↵David S. Miller2-12/+32
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5e-updates-2018-10-10 IPoIB netlink support and mlx5e pre-allocated netdevice initialization IP link was broken due to the changes in IPoIB for the rdma_netdev support after commit cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks"). This patchset fixes IPoIB pkey creation and removal using rtnetlink by adding support in both IPoIB ULP layer and mlx5 layer: From Jason and Denis: 1) Introduces changes in the RDMA netdev code in order to allow allocation of the netdev to be done by the rtnl netdev code. 2) Reworks IPoIB initialization to use the two step rdma_netdev creation. From Feras and Saeed, mlx5e netdev layer refactoring to allow accepting pre-allocated netdevs: 3) Adds support to initialize/cleanup netdevs that are not created by mlx5 driver. 4) Change mlx5e netdevice layer to accept the pre-allocated netdevice queue number. 5) Initialize mlx5e generic structures in one place to be used for all netdevs types NIC/representors/IPoIB (both mlx5 allocated and pre-allocted). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15FDDI: defza: Support capturing outgoing SMT trafficMaciej W. Rozycki1-0/+1
DEC FDDIcontroller 700 (DEFZA) uses a Tx/Rx queue pair to communicate SMT frames with adapter's firmware. Any SMT frame received from the RMC via the Rx queue is queued back by the driver to the SMT Rx queue for the firmware to process. Similarly the firmware uses the SMT Tx queue to supply the driver with SMT frames which are queued back to the Tx queue for the RMC to send to the ring. When a network tap is attached to an FDDI interface handled by `defza' any incoming SMT frames captured are queued to our usual processing of network data received, which in turn delivers them to any listening taps. However the outgoing SMT frames produced by the firmware bypass our network protocol stack and are therefore not delivered to taps. This in turn means that taps are missing a part of network traffic sent by the adapter, which may make it more difficult to track down network problems or do general traffic analysis. Call `dev_queue_xmit_nit' then in the SMT Tx path, having checked that a network tap is attached, with a newly-created `dev_nit_active' helper wrapping the usual condition used in the transmit path. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15FDDI: defza: Add support for DEC FDDIcontroller 700 TURBOchannel adapterMaciej W. Rozycki1-3/+18
Add support for the DEC FDDIcontroller 700 (DEFZA), Digital Equipment Corporation's first-generation FDDI network interface adapter, made for TURBOchannel and based on a discrete version of what eventually became Motorola's widely used CAMEL chipset. The CAMEL chipset is present for example in the DEC FDDIcontroller TURBOchannel, EISA and PCI adapters (DEFTA/DEFEA/DEFPA) that we support with the `defxx' driver, however the host bus interface logic and the firmware API are different in the DEFZA and hence a separate driver is required. There isn't much to say about the driver except that it works, but there is one peculiarity to mention. The adapter implements two Tx/Rx queue pairs. Of these one pair is the usual network Tx/Rx queue pair, in this case used by the adapter to exchange frames with the ring, via the RMC (Ring Memory Controller) chip. The Tx queue is handled directly by the RMC chip and resides in onboard packet memory. The Rx queue is maintained via DMA in host memory by adapter's firmware copying received data stored by the RMC in onboard packet memory. The other pair is used to communicate SMT frames with adapter's firmware. Any SMT frame received from the RMC via the Rx queue must be queued back by the driver to the SMT Rx queue for the firmware to process. Similarly the firmware uses the SMT Tx queue to supply the driver with SMT frames that must be queued back to the Tx queue for the RMC to send to the ring. This solution was chosen because the designers ran out of PCB space and could not squeeze in more logic onto the board that would be required to handle this SMT frame traffic without the need to involve the driver, as with the later DEFTA/DEFEA/DEFPA adapters. Finally the driver does some Frame Control byte decoding, so to avoid magic numbers some macros are added to <linux/if_fddi.h>. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15bpf: Allow sk_lookup with IPv6 moduleJoe Stringer1-0/+5
This is a more complete fix than d71019b54bff ("net: core: Fix build with CONFIG_IPV6=m"), so that IPv6 sockets may be looked up if the IPv6 module is loaded (not just if it's compiled in). Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15tls: add bpf support to sk_msg handlingJohn Fastabend1-2/+39
This work adds BPF sk_msg verdict program support to kTLS allowing BPF and kTLS to be combined together. Previously kTLS and sk_msg verdict programs were mutually exclusive in the ULP layer which created challenges for the orchestrator when trying to apply TCP based policy, for example. To resolve this, leveraging the work from previous patches that consolidates the use of sk_msg, we can finally enable BPF sk_msg verdict programs so they continue to run after the kTLS socket is created. No change in behavior when kTLS is not used in combination with BPF, the kselftest suite for kTLS also runs successfully. Joint work with Daniel. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15tls: replace poll implementation with read hookJohn Fastabend1-4/+2
Instead of re-implementing poll routine use the poll callback to trigger read from kTLS, we reuse the stream_memory_read callback which is simpler and achieves the same. This helps to align sockmap and kTLS so we can more easily embed BPF in kTLS. Joint work with Daniel. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15tls: convert to generic sk_msg interfaceDaniel Borkmann3-13/+11
Convert kTLS over to make use of sk_msg interface for plaintext and encrypted scattergather data, so it reuses all the sk_msg helpers and data structure which later on in a second step enables to glue this to BPF. This also allows to remove quite a bit of open coded helpers which are covered by the sk_msg API. Recent changes in kTLs 80ece6a03aaf ("tls: Remove redundant vars from tls record structure") and 4e6d47206c32 ("tls: Add support for inplace records encryption") changed the data path handling a bit; while we've kept the latter optimization intact, we had to undo the former change to better fit the sk_msg model, hence the sg_aead_in and sg_aead_out have been brought back and are linked into the sk_msg sgs. Now the kTLS record contains a msg_plaintext and msg_encrypted sk_msg each. In the original code, the zerocopy_from_iter() has been used out of TX but also RX path. For the strparser skb-based RX path, we've left the zerocopy_from_iter() in decrypt_internal() mostly untouched, meaning it has been moved into tls_setup_from_iter() with charging logic removed (as not used from RX). Given RX path is not based on sk_msg objects, we haven't pursued setting up a dummy sk_msg to call into sk_msg_zerocopy_from_iter(), but it could be an option to prusue in a later step. Joint work with John. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15bpf, sockmap: convert to generic sk_msg interfaceDaniel Borkmann5-44/+410
Add a generic sk_msg layer, and convert current sockmap and later kTLS over to make use of it. While sk_buff handles network packet representation from netdevice up to socket, sk_msg handles data representation from application to socket layer. This means that sk_msg framework spans across ULP users in the kernel, and enables features such as introspection or filtering of data with the help of BPF programs that operate on this data structure. Latter becomes in particular useful for kTLS where data encryption is deferred into the kernel, and as such enabling the kernel to perform L7 introspection and policy based on BPF for TLS connections where the record is being encrypted after BPF has run and came to a verdict. In order to get there, first step is to transform open coding of scatter-gather list handling into a common core framework that subsystems can use. The code itself has been split and refactored into three bigger pieces: i) the generic sk_msg API which deals with managing the scatter gather ring, providing helpers for walking and mangling, transferring application data from user space into it, and preparing it for BPF pre/post-processing, ii) the plain sock map itself where sockets can be attached to or detached from; these bits are independent of i) which can now be used also without sock map, and iii) the integration with plain TCP as one protocol to be used for processing L7 application data (later this could e.g. also be extended to other protocols like UDP). The semantics are the same with the old sock map code and therefore no change of user facing behavior or APIs. While pursuing this work it also helped finding a number of bugs in the old sockmap code that we've fixed already in earlier commits. The test_sockmap kselftest suite passes through fine as well. Joint work with John. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-15tcp, ulp: remove ulp bits from sockmapDaniel Borkmann1-1/+0
In order to prepare sockmap logic to be used in combination with kTLS we need to detangle it from ULP, and further split it in later commits into a generic API. Joint work with John. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-14Merge ath-next from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.gitKalle Valo1-1/+3
ath.git patches for 4.20. Major changes: ath10k * support NET_DETECT WoWLAN feature * wcn3990 basic functionality now working after we got QMI support
2018-10-13firmware: qcom: scm: Add WLAN VMID for Qualcomm SCM interfaceGovind Singh1-1/+3
Add WLAN related VMID's to support wlan driver to set up the remote's permissions call via TrustZone. Signed-off-by: Govind Singh <govinds@codeaurora.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Niklas Cassel <niklas.cassel@linaro.org> Reviewed-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2018-10-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller15-26/+78
Conflicts were easy to resolve using immediate context mostly, except the cls_u32.c one where I simply too the entire HEAD chunk. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12netlink: replace __NLA_ENSURE implementationJohannes Berg1-1/+1
We already have BUILD_BUG_ON_ZERO() which I just hadn't found before, so we should use it here instead of open-coding another implementation thereof. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12Merge tag 'mac80211-next-for-davem-2018-10-12' of ↵David S. Miller4-3/+201
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Highlights: * merge net-next, so I can finish the hwsim workqueue removal * fix TXQ NULL pointer issue that was reported multiple times * minstrel cleanups from Felix * simplify lib80211 code by not using skcipher, note that this will conflict with the crypto tree (and this new code here should be used) * use new netlink policy validation in nl80211 * fix up SAE (part of WPA3) in client-mode * FTM responder support in the stack ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12net: bridge: add support for per-port vlan statsNikolay Aleksandrov1-0/+1
This patch adds an option to have per-port vlan stats instead of the default global stats. The option can be set only when there are no port vlans in the bridge since we need to allocate the stats if it is set when vlans are being added to ports (and respectively free them when being deleted). Also bump RTNL_MAX_TYPE as the bridge is the largest user of options. The current stats design allows us to add these without any changes to the fast-path, it all comes down to the per-vlan stats pointer which, if this option is enabled, will be allocated for each port vlan instead of using the global bridge-wide one. CC: bridge@lists.linux-foundation.org CC: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12net: Evict neighbor entries on carrier downDavid Ahern1-0/+1
When a link's carrier goes down it could be a sign of the port changing networks. If the new network has overlapping addresses with the old one, then the kernel will continue trying to use neighbor entries established based on the old network until the entries finally age out - meaning a potentially long delay with communications not working. This patch evicts neighbor entries on carrier down with the exception of those marked permanent. Permanent entries are managed by userspace (either an admin or a routing daemon such as FRR). Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12net/ipv6: Add knob to skip DELROUTE message on device downDavid Ahern2-0/+4
Another difference between IPv4 and IPv6 is the generation of RTM_DELROUTE notifications when a device is taken down (admin down) or deleted. IPv4 does not generate a message for routes evicted by the down or delete; IPv6 does. A NOS at scale really needs to avoid these messages and have IPv4 and IPv6 behave similarly, relying on userspace to handle link notifications and evict the routes. At this point existing user behavior needs to be preserved. Since notifications are a global action (not per app) the only way to preserve existing behavior and allow the messages to be skipped is to add a new sysctl (net/ipv6/route/skip_notify_on_dev_down) which can be set to disable the notificatioons. IPv6 route code already supports the option to skip the message (it is used for multipath routes for example). Besides the new sysctl we need to pass the skip_notify setting through the generic fib6_clean and fib6_walk functions to fib6_clean_node and to set skip_notify on calls to __ip_del_rt for the addrconf_ifdown path. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12Merge tag 'armsoc-fixes-4.19' of ↵Greg Kroah-Hartman2-0/+16
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Arnd writes: "ARM: SoC fixes for 4.19 Two last minute bugfixes, both for NXP platforms: * The Layerscape 'qbman' infrastructure suffers from probe ordering bugs in some configurations, a two-patch series adds a hotfix for this. 4.20 will have a longer set of patches to rework it. * The old imx53-qsb board regressed in 4.19 after the addition of cpufreq support, adding a set of explicit operating points fixes this." * tag 'armsoc-fixes-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: soc: fsl: qman_portals: defer probe after qman's probe soc: fsl: qbman: add APIs to retrieve the probing status ARM: dts: imx53-qsb: disable 1.2GHz OPP
2018-10-12mac80211: implement ieee80211_tx_rate_update to update rateAnilkumar Kolli1-0/+15
Current mac80211 has provision to update tx status through ieee80211_tx_status() and ieee80211_tx_status_ext(). But drivers like ath10k updates the tx status from the skb except txrate, txrate will be updated from a different path, peer stats. Using ieee80211_tx_status_ext() in two different paths (one for the stats, one for the tx rate) would duplicate the stats instead. To avoid this stats duplication, ieee80211_tx_rate_update() is implemented. Signed-off-by: Anilkumar Kolli <akolli@codeaurora.org> [minor commit message editing, use initializers in code] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-12nl80211: Add per peer statistics to compute FCS error rateAnkita Bajaj2-0/+15
Add support for drivers to report the total number of MPDUs received and the number of MPDUs received with an FCS error from a specific peer. These counters will be incremented only when the TA of the frame matches the MAC address of the peer irrespective of FCS error. It should be noted that the TA field in the frame might be corrupted when there is an FCS error and TA matching logic would fail in such cases. Hence, FCS error counter might not be fully accurate, but it can provide help in detecting bad RX links in significant number of cases. This FCS error counter without full accuracy can be used, e.g., to trigger a kick-out of a connected client with a bad link in AP mode to force such a client to roam to another AP. Signed-off-by: Ankita Bajaj <bankita@codeaurora.org> Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-12Merge tag 'gpio-v4.19-4' of ↵Greg Kroah-Hartman1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Linus writes: "GPIO fix for the v4.19 series: - Fix up the interrupt parent for the irqdomains." * tag 'gpio-v4.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: Assign gpio_irq_chip::parents to non-stack pointer
2018-10-12mac80211: support FTM responder configuration/statisticsPradeep Kumar Chitrapu1-0/+28
New bss param ftm_responder is used to notify the driver to enable fine timing request (FTM) responder role in AP mode. Plumb the new cfg80211 API for FTM responder statistics through to the driver API in mac80211. Signed-off-by: David Spinadel <david.spinadel@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Pradeep Kumar Chitrapu <pradeepc@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-12Merge branch 'for-linus' of ↵Greg Kroah-Hartman1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Dmitry writes: "Input updates for v4.19-rc7 - we added a few scheduling points into various input interfaces to ensure that large writes will not cause RCU stalls - fixed configuring PS/2 keyboards as wakeup devices on newer platforms - added a new Xbox gamepad ID." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: uinput - add a schedule point in uinput_inject_events() Input: evdev - add a schedule point in evdev_write() Input: mousedev - add a schedule point in mousedev_write() Input: i8042 - enable keyboard wakeups by default when s2idle is used Input: xpad - add support for Xbox1 PDP Camo series gamepad
2018-10-12Merge tag 'next-fixes-20181012' of ↵Greg Kroah-Hartman1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/sfr/next-fixes Stephen writes: "A couple of warning fixes: Two fixes from Peter Oberparleiter <oberpar@linux.ibm.com>: Commit 6b7dca401cb1 ("tracing: Allow gcov profiling on only ftrace subsystem") uncovered linker problems when using gcov kernel profiling on some architectures. These problems were likely introduced earlier, and are possibly related to compiler changes." * tag 'next-fixes-20181012' of git://git.kernel.org/pub/scm/linux/kernel/git/sfr/next-fixes: vmlinux.lds.h: Fix linker warnings about orphan .LPBX sections vmlinux.lds.h: Fix incomplete .text.exit discards
2018-10-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netGreg Kroah-Hartman6-13/+34
David writes: "Networking 1) RXRPC receive path fixes from David Howells. 2) Re-export __skb_recv_udp(), from Jiri Kosina. 3) Fix refcounting in u32 classificer, from Al Viro. 4) Userspace netlink ABI fixes from Eugene Syromiatnikov. 5) Don't double iounmap on rmmod in ena driver, from Arthur Kiyanovski. 6) Fix devlink string attribute handling, we must pull a copy into a kernel buffer if the lifetime extends past the netlink request. From Moshe Shemesh. 7) Fix hangs in RDS, from Ka-Cheong Poon. 8) Fix recursive locking lockdep warnings in tipc, from Ying Xue. 9) Clear RX irq correctly in socionext, from Ilias Apalodimas. 10) bcm_sf2 fixes from Florian Fainelli." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits) net: dsa: bcm_sf2: Call setup during switch resume net: dsa: bcm_sf2: Fix unbind ordering net: phy: sfp: remove sfp_mutex's definition r8169: set RX_MULTI_EN bit in RxConfig for 8168F-family chips net: socionext: clear rx irq correctly net/mlx4_core: Fix warnings during boot on driverinit param set failures tipc: eliminate possible recursive locking detected by LOCKDEP selftests: udpgso_bench.sh explicitly requires bash selftests: rtnetlink.sh explicitly requires bash. qmi_wwan: Added support for Gemalto's Cinterion ALASxx WWAN interface tipc: queue socket protocol error messages into socket receive buffer tipc: set link tolerance correctly in broadcast link net: ipv4: don't let PMTU updates increase route MTU net: ipv4: update fnhe_pmtu when first hop's MTU changes net/ipv6: stop leaking percpu memory in fib6 info rds: RDS (tcp) hangs on sendto() to unresponding address net: make skb_partial_csum_set() more robust against overflows devlink: Add helper function for safely copy string param devlink: Fix param cmode driverinit for string type devlink: Fix param set handling for string type ...
2018-10-12vmlinux.lds.h: Fix linker warnings about orphan .LPBX sectionsPeter Oberparleiter1-1/+1
Enabling both CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y and CONFIG_GCOV_PROFILE_ALL=y results in linker warnings: warning: orphan section `.data..LPBX1' being placed in section `.data..LPBX1'. LD_DEAD_CODE_DATA_ELIMINATION adds compiler flag -fdata-sections. This option causes GCC to create separate data sections for data objects, including those generated by GCC internally for gcov profiling. The names of these objects start with a dot (.LPBX0, .LPBX1), resulting in section names starting with 'data..'. As section names starting with 'data..' are used for specific purposes in the Linux kernel, the linker script does not automatically include them in the output data section, resulting in the "orphan section" linker warnings. Fix this by specifically including sections named "data..LPBX*" in the data section. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2018-10-12vmlinux.lds.h: Fix incomplete .text.exit discardsPeter Oberparleiter1-2/+2
Enabling CONFIG_GCOV_PROFILE_ALL=y causes linker errors on ARM: `.text.exit' referenced in section `.ARM.exidx.text.exit': defined in discarded section `.text.exit' `.text.exit' referenced in section `.fini_array.00100': defined in discarded section `.text.exit' And related errors on NDS32: `.text.exit' referenced in section `.dtors.65435': defined in discarded section `.text.exit' The gcov compiler flags cause certain compiler versions to generate additional destructor-related sections that are not yet handled by the linker script, resulting in references between discarded and non-discarded sections. Since destructors are not used in the Linux kernel, fix this by discarding these additional sections. Reported-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Greentime Hu <green.hu@gmail.com> Tested-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2018-10-11Merge branch 'for-4.19-fixes' of ↵Greg Kroah-Hartman1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Tejun writes: "cgroup fixes for v4.19-rc7 One cgroup2 threaded mode fix for v4.19-rc7. While threaded mode isn't used widely (yet) and the bug requires somewhat convoluted sequence of operations, it causes a userland visible malfunction - EINVAL on a valid attempt to enable threaded mode. This pull request contains the fix" * 'for-4.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Fix dom_cgrp propagation when enabling threaded mode
2018-10-11mac80211: minstrel: Enable STBC and LDPC for VHT RatesChaitanya T K1-0/+1
If peer support reception of STBC and LDPC, enable them for better performance. Signed-off-by: Chaitanya TK <chaitanya.mgit@gmail.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>