summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2018-10-16netfilter: xt_osf: simplify xt_osf_match_packet()Pablo Neira Ayuso1-7/+1
info area in match is always available, and remove unneeded variables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-10-16netfilter: nft_xfrm: use state family, not hook oneFlorian Westphal1-5/+6
Eyal says: doesn't the use of nft_pf(pkt) in this context limit the matching of encapsulated packets to the same family? IIUC when an e.g. IPv6-in-IPv4 packet is matched, the nft_pf(pkt) will be the decapsulated packet family - IPv6 - whereas the state may be IPv4. So this check would not allow matching the 'underlay' address in such cases. I know this was a limitation in xt_policy. but is this intentional in this matcher? or is it possible to use state->props.family when validating the match instead of nft_pf(pkt)? Userspace already tells us which address family it expects to match, so we can just use the real state family rather than the hook family. so change it as suggested above. Reported-by: Eyal Birger <eyal.birger@gmail.com> Suggested-by: Eyal Birger <eyal.birger@gmail.com> Fixes: 6c47260250fc6 ("netfilter: nf_tables: add xfrm expression") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-10-16netfilter: nft_osf: Add ttl option supportFernando Fernandez Mancera2-26/+35
Add ttl option support to the nftables "osf" expression. Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-10-16netfilter: cttimeout: remove set but not used variable 'l3num'YueHaibing1-2/+0
Fixes gcc '-Wunused-but-set-variable' warning: net/netfilter/nfnetlink_cttimeout.c: In function 'cttimeout_default_set': net/netfilter/nfnetlink_cttimeout.c:353:8: warning: variable 'l3num' set but not used [-Wunused-but-set-variable] It not used any more after commit dd2934a95701 ("netfilter: conntrack: remove l3->l4 mapping information") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-10-16netfilter: Replace spin_is_locked() with lockdepLance Roy1-1/+1
lockdep_assert_held() is better suited to checking locking requirements, since it won't get confused when someone else holds the lock. This is also a step towards possibly removing spin_is_locked(). Signed-off-by: Lance Roy <ldr709@gmail.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: Florian Westphal <fw@strlen.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: <netfilter-devel@vger.kernel.org> Cc: <coreteam@netfilter.org> Cc: <netdev@vger.kernel.org> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-10-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller6-72/+278
Alexei Starovoitov says: ==================== pull-request: bpf-next 2018-10-08 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) sk_lookup_[tcp|udp] and sk_release helpers from Joe Stringer which allow BPF programs to perform lookups for sockets in a network namespace. This would allow programs to determine early on in processing whether the stack is expecting to receive the packet, and perform some action (eg drop, forward somewhere) based on this information. 2) per-cpu cgroup local storage from Roman Gushchin. Per-cpu cgroup local storage is very similar to simple cgroup storage except all the data is per-cpu. The main goal of per-cpu variant is to implement super fast counters (e.g. packet counters), which don't require neither lookups, neither atomic operations in a fast path. The example of these hybrid counters is in selftests/bpf/netcnt_prog.c 3) allow HW offload of programs with BPF-to-BPF function calls from Quentin Monnet 4) support more than 64-byte key/value in HW offloaded BPF maps from Jakub Kicinski 5) rename of libbpf interfaces from Andrey Ignatov. libbpf is maturing as a library and should follow good practices in library design and implementation to play well with other libraries. This patch set brings consistent naming convention to global symbols. 6) relicense libbpf as LGPL-2.1 OR BSD-2-Clause from Alexei Starovoitov to let Apache2 projects use libbpf 7) various AF_XDP fixes from Björn and Magnus ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller43-1003/+1452
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree: 1) Support for matching on ipsec policy already set in the route, from Florian Westphal. 2) Split set destruction into deactivate and destroy phase to make it fit better into the transaction infrastructure, also from Florian. This includes a patch to warn on imbalance when setting the new activate and deactivate interfaces. 3) Release transaction list from the workqueue to remove expensive synchronize_rcu() from configuration plane path. This speeds up configuration plane quite a bit. From Florian Westphal. 4) Add new xfrm/ipsec extension, this new extension allows you to match for ipsec tunnel keys such as source and destination address, spi and reqid. From Máté Eckl and Florian Westphal. 5) Add secmark support, this includes connsecmark too, patches from Christian Gottsche. 6) Allow to specify remaining bytes in xt_quota, from Chenbo Feng. One follow up patch to calm a clang warning for this one, from Nathan Chancellor. 7) Flush conntrack entries based on layer 3 family, from Kristian Evensen. 8) New revision for cgroups2 to shrink the path field. 9) Get rid of obsolete need_conntrack(), as a result from recent demodularization works. 10) Use WARN_ON instead of BUG_ON, from Florian Westphal. 11) Unused exported symbol in nf_nat_ipv4_fn(), from Florian. 12) Remove superfluous check for timeout netlink parser and dump functions in layer 4 conntrack helpers. 13) Unnecessary redundant rcu read side locks in NAT redirect, from Taehee Yoo. 14) Pass nf_hook_state structure to error handlers, patch from Florian Westphal. 15) Remove ->new() interface from layer 4 protocol trackers. Place them in the ->packet() interface. From Florian. 16) Place conntrack ->error() handling in the ->packet() interface. Patches from Florian Westphal. 17) Remove unused parameter in the pernet initialization path, also from Florian. 18) Remove additional parameter to specify layer 3 protocol when looking up for protocol tracker. From Florian. 19) Shrink array of layer 4 protocol trackers, from Florian. 20) Check for linear skb only once from the ALG NAT mangling codebase, from Taehee Yoo. 21) Use rhashtable_walk_enter() instead of deprecated rhashtable_walk_init(), also from Taehee. 22) No need to flush all conntracks when only one single address is gone, from Tan Hu. 23) Remove redundant check for NAT flags in flowtable code, from Taehee Yoo. 24) Use rhashtable_lookup() instead of rhashtable_lookup_fast() from netfilter codebase, since rcu read lock side is already assumed in this path. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-09bpf: fix building without CONFIG_INETArnd Bergmann1-2/+8
The newly added TCP and UDP handling fails to link when CONFIG_INET is disabled: net/core/filter.o: In function `sk_lookup': filter.c:(.text+0x7ff8): undefined reference to `tcp_hashinfo' filter.c:(.text+0x7ffc): undefined reference to `tcp_hashinfo' filter.c:(.text+0x8020): undefined reference to `__inet_lookup_established' filter.c:(.text+0x8058): undefined reference to `__inet_lookup_listener' filter.c:(.text+0x8068): undefined reference to `udp_table' filter.c:(.text+0x8070): undefined reference to `udp_table' filter.c:(.text+0x808c): undefined reference to `__udp4_lib_lookup' net/core/filter.o: In function `bpf_sk_release': filter.c:(.text+0x82e8): undefined reference to `sock_gen_put' Wrap the related sections of code in #ifdefs for the config option. Furthermore, sk_lookup() should always have been marked 'static', this also avoids a warning about a missing prototype when building with 'make W=1'. Fixes: 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-09netfilter: xt_quota: Don't use aligned attribute in sizeofNathan Chancellor1-1/+1
Clang warns: net/netfilter/xt_quota.c:47:44: warning: 'aligned' attribute ignored when parsing type [-Wignored-attributes] BUILD_BUG_ON(sizeof(atomic64_t) != sizeof(__aligned_u64)); ^~~~~~~~~~~~~ Use 'sizeof(__u64)' instead, as the alignment doesn't affect the size of the type. Fixes: e9837e55b020 ("netfilter: xt_quota: fix the behavior of xt_quota module") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-10-08rtnetlink: Update rtnl_fdb_dump for strict data checkingDavid Ahern1-2/+60
Update rtnl_fdb_dump for strict data checking. If the flag is set, the dump request is expected to have an ndmsg struct as the header potentially followed by one or more attributes. Any data passed in the header or as an attribute is taken as a request to influence the data returned. Only values supported by the dump handler are allowed to be non-0 or set in the request. At the moment only the NDA_IFINDEX and NDA_MASTER attributes are supported. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08rtnetlink: Move input checking for rtnl_fdb_dump to helperDavid Ahern1-20/+33
Move the existing input checking for rtnl_fdb_dump into a helper, valid_fdb_dump_legacy. This function will retain the current logic that works around the 2 headers that userspace has been allowed to send up to this point. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net/bridge: Update br_mdb_dump for strict data checkingDavid Ahern1-0/+30
Update br_mdb_dump for strict data checking. If the flag is set, the dump request is expected to have a br_port_msg struct as the header. All elements of the struct are expected to be 0 and no attributes can be appended. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: Update netconf dump handlers for strict data checkingDavid Ahern3-7/+55
Update inet_netconf_dump_devconf, inet6_netconf_dump_devconf, and mpls_netconf_dump_devconf for strict data checking. If the flag is set, the dump request is expected to have an netconfmsg struct as the header. The struct only has the family member and no attributes can be appended. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net/ipv6: Update ip6addrlbl_dump for strict data checkingDavid Ahern1-1/+33
Update ip6addrlbl_dump for strict data checking. If the flag is set, the dump request is expected to have an ifaddrlblmsg struct as the header. All elements of the struct are expected to be 0 and no attributes can be appended. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net/fib_rules: Update fib_nl_dumprule for strict data checkingDavid Ahern1-1/+35
Update fib_nl_dumprule for strict data checking. If the flag is set, the dump request is expected to have fib_rule_hdr struct as the header. All elements of the struct are expected to be 0 and no attributes can be appended. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net/namespace: Update rtnl_net_dumpid for strict data checkingDavid Ahern1-0/+6
Update rtnl_net_dumpid for strict data checking. If the flag is set, the dump request is expected to have an rtgenmsg struct as the header which has the family as the only element. No data may be appended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net/neighbor: Update neightbl_dump_info for strict data checkingDavid Ahern1-3/+35
Update neightbl_dump_info for strict data checking. If the flag is set, the dump request is expected to have an ndtmsg struct as the header. All elements of the struct are expected to be 0 and no attributes can be appended. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net/neighbor: Update neigh_dump_info for strict data checkingDavid Ahern1-15/+67
Update neigh_dump_info for strict data checking. If the flag is set, the dump request is expected to have an ndmsg struct as the header potentially followed by one or more attributes. Any data passed in the header or as an attribute is taken as a request to influence the data returned. Only values supported by the dump handler are allowed to be non-0 or set in the request. At the moment only the NDA_IFINDEX and NDA_MASTER attributes are supported. Existing code does not fail the dump if nlmsg_parse fails. That behavior is kept for non-strict checking. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08rtnetlink: Update fib dumps for strict data checkingDavid Ahern5-2/+72
Add helper to check netlink message for route dumps. If the strict flag is set the dump request is expected to have an rtmsg struct as the header. All elements of the struct are expected to be 0 with the exception of rtm_flags (which is used by both ipv4 and ipv6 dumps) and no attributes can be appended. rtm_flags can only have RTM_F_CLONED and RTM_F_PREFIX set. Update inet_dump_fib, inet6_dump_fib, mpls_dump_routes, ipmr_rtm_dumproute, and ip6mr_rtm_dumproute to call this helper if strict data checking is enabled. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08rtnetlink: Update ipmr_rtm_dumplink for strict data checkingDavid Ahern1-0/+32
Update ipmr_rtm_dumplink for strict data checking. If the flag is set, the dump request is expected to have an ifinfomsg struct as the header. All elements of the struct are expected to be 0 and no attributes can be appended. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08rtnetlink: Update inet6_dump_ifinfo for strict data checkingDavid Ahern1-0/+35
Update inet6_dump_ifinfo for strict data checking. If the flag is set, the dump request is expected to have an ifinfomsg struct as the header. All elements of the struct are expected to be 0 and no attributes can be appended. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08rtnetlink: Update rtnl_stats_dump for strict data checkingDavid Ahern1-2/+22
Update rtnl_stats_dump for strict data checking. If the flag is set, the dump request is expected to have an if_stats_msg struct as the header. All elements of the struct are expected to be 0 except filter_mask which must be non-0 (legacy behavior). No attributes are supported. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08rtnetlink: Update rtnl_bridge_getlink for strict data checkingDavid Ahern1-13/+57
Update rtnl_bridge_getlink for strict data checking. If the flag is set, the dump request is expected to have an ifinfomsg struct as the header potentially followed by one or more attributes. Any data passed in the header or as an attribute is taken as a request to influence the data returned. Only values supported by the dump handler are allowed to be non-0 or set in the request. At the moment only the IFLA_EXT_MASK attribute is supported. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08rtnetlink: Update rtnl_dump_ifinfo for strict data checkingDavid Ahern1-30/+83
Update rtnl_dump_ifinfo for strict data checking. If the flag is set, the dump request is expected to have an ifinfomsg struct as the header potentially followed by one or more attributes. Any data passed in the header or as an attribute is taken as a request to influence the data returned. Only values supported by the dump handler are allowed to be non-0 or set in the request. At the moment only the IFA_TARGET_NETNSID, IFLA_EXT_MASK, IFLA_MASTER, and IFLA_LINKINFO attributes are supported. Existing code does not fail the dump if nlmsg_parse fails. That behavior is kept for non-strict checking. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net/ipv6: Update inet6_dump_addr for strict data checkingDavid Ahern1-10/+59
Update inet6_dump_addr for strict data checking. If the flag is set, the dump request is expected to have an ifaddrmsg struct as the header potentially followed by one or more attributes. Any data passed in the header or as an attribute is taken as a request to influence the data returned. Only values suppored by the dump handler are allowed to be non-0 or set in the request. At the moment only the IFA_TARGET_NETNSID attribute is supported. Follow on patches can add support for other fields (e.g., honor ifa_index and only return data for the given device index). Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net/ipv4: Update inet_dump_ifaddr for strict data checkingDavid Ahern1-11/+61
Update inet_dump_ifaddr for strict data checking. If the flag is set, the dump request is expected to have an ifaddrmsg struct as the header potentially followed by one or more attributes. Any data passed in the header or as an attribute is taken as a request to influence the data returned. Only values supported by the dump handler are allowed to be non-0 or set in the request. At the moment only the IFA_TARGET_NETNSID attribute is supported. Follow on patches can support for other fields (e.g., honor ifa_index and only return data for the given device index). Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08netlink: Add new socket option to enable strict checking on dumpsDavid Ahern2-1/+21
Add a new socket option, NETLINK_DUMP_STRICT_CHK, that userspace can use via setsockopt to request strict checking of headers and attributes on dump requests. To get dump features such as kernel side filtering based on data in the header or attributes appended to the dump request, userspace must call setsockopt() for NETLINK_DUMP_STRICT_CHK and a non-zero value. Since the netlink sock and its flags are private to the af_netlink code, the strict checking flag is passed to dump handlers via a flag in the netlink_callback struct. For old userspace on new kernel there is no impact as all of the data checks in later patches are wrapped in a check on the new strict flag. For new userspace on old kernel, the setsockopt will fail and even if new userspace sets data in the headers and appended attributes the kernel will silently ignore it. Moving forward when the setsockopt succeeds, the new userspace on old kernel means the dump request can pass an attribute the kernel does not understand. The dump will then fail as the older kernel does not understand it. New userspace on new kernel setting the socket option gets the benefit of the improved data dump. Kernel side the NETLINK_DUMP_STRICT_CHK uapi is converted to a generic NETLINK_F_STRICT_CHK flag which can potentially be leveraged for tighter checking on the NEW, DEL, and SET commands. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net/ipv6: Refactor address dump to push inet6_fill_args to in6_dump_addrsDavid Ahern1-27/+30
Pull the inet6_fill_args arg up to in6_dump_addrs and move netnsid into it. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: Add extack to nlmsg_parseDavid Ahern12-17/+21
Make sure extack is passed to nlmsg_parse where easy to do so. Most of these are dump handlers and leveraging the extack in the netlink_callback. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08netlink: Pass extack to dump handlersDavid Ahern1-1/+11
Declare extack in netlink_dump and pass to dump handlers via netlink_callback. Add any extack message after the dump_done_errno allowing error messages to be returned. This will be useful when strict checking is done on dump requests, returning why the dump fails EINVAL. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: sched: cls_u32: simplify the hell out u32_delete() emptiness checkAl Viro1-47/+1
Now that we have the knode count, we can instantly check if any hnodes are non-empty. And that kills the check for extra references to root hnode - those could happen only if there was a knode to carry such a link. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: sched: cls_u32: keep track of knodes count in tc_u_commonAl Viro1-0/+6
allows to simplify u32_delete() considerably Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: sched: cls_u32: get rid of tp_cAl Viro1-7/+4
Both hnode ->tp_c and tp_c argument of u32_set_parms() the latter is redundant, the former - never read... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: sched: cls_u32: the tp_c argument of u32_set_parms() is always tp->dataAl Viro1-3/+2
It must be tc_u_common associated with that tp (i.e. tp->data). Proof: * both ->ht_up and ->tp_c are assign-once * ->tp_c of anything inserted into tp_c->hlist is tp_c * hnodes never get reinserted into the lists or moved between those, so anything found by u32_lookup_ht(tp->data, ...) will have ->tp_c equal to tp->data. * tp->root->tp_c == tp->data. * ->ht_up of anything inserted into hnode->ht[...] is equal to hnode. * knodes never get reinserted into hash chains or moved between those, so anything returned by u32_lookup_key(ht, ...) will have ->ht_up equal to ht. * any knode returned by u32_get(tp, ...) will have ->ht_up->tp_c point to tp->data Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: sched: cls_u32: pass tc_u_common to u32_set_parms() instead of tc_u_hnodeAl Viro1-4/+4
the only thing we used ht for was ht->tp_c and callers can get that without going through ->tp_c at all; start with lifting that into the callers, next commits will massage those, eventually removing ->tp_c altogether. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: sched: cls_u32: clean tc_u_common hashtableAl Viro1-15/+9
* calculate key *once*, not for each hash chain element * let tc_u_hash() return the pointer to chain head rather than index - callers are cleaner that way. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: sched: cls_u32: get rid of tc_u_common ->rcuAl Viro1-1/+0
unused Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: sched: cls_u32: get rid of tc_u_knode ->tpAl Viro1-3/+0
not used anymore Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: sched: cls_u32: get rid of unused argument of u32_destroy_key()Al Viro1-7/+6
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: sched: cls_u32: make sure that divisor is a power of 2Al Viro1-1/+5
Tested by modifying iproute2 to allow sending a divisor > 255 Tested-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: sched: cls_u32: disallow linking to root hnodeAl Viro1-0/+4
Operation makes no sense. Nothing will actually break if we do so (depth limit in u32_classify() will prevent infinite loops), but according to maintainers it's best prohibited outright. NOTE: doing so guarantees that u32_destroy() will trigger the call of u32_destroy_hnode(); we might want to make that unconditional. Test: tc qdisc add dev eth0 ingress tc filter add dev eth0 parent ffff: protocol ip prio 100 u32 \ link 800: offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff should fail with Error: cls_u32: Not linking to root node Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: sched: cls_u32: mark root hnode explicitlyAl Viro1-1/+3
... and produce consistent error on attempt to delete such. Existing check in u32_delete() is inconsistent - after tc qdisc add dev eth0 ingress tc filter add dev eth0 parent ffff: protocol ip prio 100 handle 1: u32 \ divisor 1 tc filter add dev eth0 parent ffff: protocol ip prio 200 handle 2: u32 \ divisor 1 both tc filter delete dev eth0 parent ffff: protocol ip prio 100 handle 801: u32 and tc filter delete dev eth0 parent ffff: protocol ip prio 100 handle 800: u32 will fail (at least with refcounting fixes), but the former will complain about an attempt to remove a busy table, while the latter will recognize it as root and yield "Not allowed to delete root node" instead. The problem with the existing check is that several tcf_proto instances might share the same tp->data and handle-to-hnode lookup will be the same for all of them. So comparing an hnode to be deleted with tp->root won't catch the case when one tp is used to try deleting the root of another. Solution is trivial - mark the root hnodes explicitly upon allocation and check for that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08xsk: proper AF_XDP socket teardown orderingBjörn Töpel2-13/+11
The AF_XDP socket struct can exist in three different, implicit states: setup, bound and released. Setup is prior the socket has been bound to a device. Bound is when the socket is active for receive and send. Released is when the process/userspace side of the socket is released, but the sock object is still lingering, e.g. when there is a reference to the socket in an XSKMAP after process termination. The Rx fast-path code uses the "dev" member of struct xdp_sock to check whether a socket is bound or relased, and the Tx code uses the struct xdp_umem "xsk_list" member in conjunction with "dev" to determine the state of a socket. However, the transition from bound to released did not tear the socket down in correct order. On the Rx side "dev" was cleared after synchronize_net() making the synchronization useless. On the Tx side, the internal queues were destroyed prior removing them from the "xsk_list". This commit corrects the cleanup order, and by doing so xdp_del_sk_umem() can be simplified and one synchronize_net() can be removed. Fixes: 965a99098443 ("xsk: add support for bind for Rx") Fixes: ac98d8aab61b ("xsk: wire upp Tx zero-copy functions") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-07net: sched: pie: fix coding style issuesLeslie Monis1-18/+18
Fix 5 warnings and 14 checks issued by checkpatch.pl: CHECK: Logical continuations should be on the previous line + if ((q->vars.qdelay < q->params.target / 2) + && (q->vars.prob < MAX_PROB / 5)) WARNING: line over 80 characters + q->params.tupdate = usecs_to_jiffies(nla_get_u32(tb[TCA_PIE_TUPDATE])); CHECK: Blank lines aren't necessary after an open brace '{' +{ + CHECK: braces {} should be used on all arms of this statement + if (qlen < QUEUE_THRESHOLD) [...] + else { [...] CHECK: Unbalanced braces around else statement + else { CHECK: No space is necessary after a cast + if (delta > (s32) (MAX_PROB / (100 / 2)) && CHECK: Unnecessary parentheses around 'qdelay == 0' + if ((qdelay == 0) && (qdelay_old == 0) && update_prob) CHECK: Unnecessary parentheses around 'qdelay_old == 0' + if ((qdelay == 0) && (qdelay_old == 0) && update_prob) CHECK: Unnecessary parentheses around 'q->vars.prob == 0' + if ((q->vars.qdelay < q->params.target / 2) && + (q->vars.qdelay_old < q->params.target / 2) && + (q->vars.prob == 0) && + (q->vars.avg_dq_rate > 0)) CHECK: Unnecessary parentheses around 'q->vars.avg_dq_rate > 0' + if ((q->vars.qdelay < q->params.target / 2) && + (q->vars.qdelay_old < q->params.target / 2) && + (q->vars.prob == 0) && + (q->vars.avg_dq_rate > 0)) CHECK: Blank lines aren't necessary before a close brace '}' + +} CHECK: Comparison to NULL could be written "!opts" + if (opts == NULL) CHECK: No space is necessary after a cast + ((u32) PSCHED_TICKS2NS(q->params.target)) / WARNING: line over 80 characters + nla_put_u32(skb, TCA_PIE_TUPDATE, jiffies_to_usecs(q->params.tupdate)) || CHECK: Blank lines aren't necessary before a close brace '}' + +} CHECK: No space is necessary after a cast + .delay = ((u32) PSCHED_TICKS2NS(q->vars.qdelay)) / WARNING: Missing a blank line after declarations + struct sk_buff *skb; + skb = qdisc_dequeue_head(sch); WARNING: Missing a blank line after declarations + struct pie_sched_data *q = qdisc_priv(sch); + qdisc_reset_queue(sch); WARNING: Missing a blank line after declarations + struct pie_sched_data *q = qdisc_priv(sch); + q->params.tupdate = 0; Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller9-36/+88
2018-10-05net/ncsi: Add NCSI OEM command supportVijay Khemka4-4/+88
This patch adds OEM commands and response handling. It also defines OEM command and response structure as per NCSI specification along with its handlers. ncsi_cmd_handler_oem: This is a generic command request handler for OEM commands ncsi_rsp_handler_oem: This is a generic response handler for OEM commands Signed-off-by: Vijay Khemka <vijaykhemka@fb.com> Reviewed-by: Justin Lee <justin.lee1@dell.com> Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-05ipv6: take rcu lock in rawv6_send_hdrinc()Wei Wang1-9/+20
In rawv6_send_hdrinc(), in order to avoid an extra dst_hold(), we directly assign the dst to skb and set passed in dst to NULL to avoid double free. However, in error case, we free skb and then do stats update with the dst pointer passed in. This causes use-after-free on the dst. Fix it by taking rcu read lock right before dst could get released to make sure dst does not get freed until the stats update is done. Note: we don't have this issue in ipv4 cause dst is not used for stats update in v4. Syzkaller reported following crash: BUG: KASAN: use-after-free in rawv6_send_hdrinc net/ipv6/raw.c:692 [inline] BUG: KASAN: use-after-free in rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921 Read of size 8 at addr ffff8801d95ba730 by task syz-executor0/32088 CPU: 1 PID: 32088 Comm: syz-executor0 Not tainted 4.19.0-rc2+ #93 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113 print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 rawv6_send_hdrinc net/ipv6/raw.c:692 [inline] rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798 sock_sendmsg_nosec net/socket.c:621 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:631 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114 __sys_sendmsg+0x11d/0x280 net/socket.c:2152 __do_sys_sendmsg net/socket.c:2161 [inline] __se_sys_sendmsg net/socket.c:2159 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457099 Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f83756edc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f83756ee6d4 RCX: 0000000000457099 RDX: 0000000000000000 RSI: 0000000020003840 RDI: 0000000000000004 RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000004d4b30 R14: 00000000004c90b1 R15: 0000000000000000 Allocated by task 32088: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490 kmem_cache_alloc+0x12e/0x730 mm/slab.c:3554 dst_alloc+0xbb/0x1d0 net/core/dst.c:105 ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:353 ip6_rt_cache_alloc+0x247/0x7b0 net/ipv6/route.c:1186 ip6_pol_route+0x8f8/0xd90 net/ipv6/route.c:1895 ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2093 fib6_rule_lookup+0x277/0x860 net/ipv6/fib6_rules.c:122 ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2121 ip6_route_output include/net/ip6_route.h:88 [inline] ip6_dst_lookup_tail+0xe27/0x1d60 net/ipv6/ip6_output.c:951 ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1079 rawv6_sendmsg+0x12d9/0x4630 net/ipv6/raw.c:905 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798 sock_sendmsg_nosec net/socket.c:621 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:631 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114 __sys_sendmsg+0x11d/0x280 net/socket.c:2152 __do_sys_sendmsg net/socket.c:2161 [inline] __se_sys_sendmsg net/socket.c:2159 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 5356: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kmem_cache_free+0x83/0x290 mm/slab.c:3756 dst_destroy+0x267/0x3c0 net/core/dst.c:141 dst_destroy_rcu+0x16/0x19 net/core/dst.c:154 __rcu_reclaim kernel/rcu/rcu.h:236 [inline] rcu_do_batch kernel/rcu/tree.c:2576 [inline] invoke_rcu_callbacks kernel/rcu/tree.c:2880 [inline] __rcu_process_callbacks kernel/rcu/tree.c:2847 [inline] rcu_process_callbacks+0xf23/0x2670 kernel/rcu/tree.c:2864 __do_softirq+0x30b/0xad8 kernel/softirq.c:292 Fixes: 1789a640f556 ("raw: avoid two atomics in xmit") Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-05socket: Tighten no-error check in bind()Jakub Sitnicki1-1/+1
move_addr_to_kernel() returns only negative values on error, or zero on success. Rewrite the error check to an idiomatic form to avoid confusing the reader. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-05net: sched: Add policy validation for tc attributesDavid Ahern1-4/+20
A number of TC attributes are processed without proper validation (e.g., length checks). Add a tca policy for all input attributes and use when invoking nlmsg_parse. The 2 Fixes tags below cover the latest additions. The other attributes are a string (KIND), nested attribute (OPTIONS which does seem to have validation in most cases), for dumps only or a flag. Fixes: 5bc1701881e39 ("net: sched: introduce multichain support for filters") Fixes: d47a6b0e7c492 ("net: sched: introduce ingress/egress block index attributes for qdisc") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-05rtnetlink: fix rtnl_fdb_dump() for ndmsg headerMauricio Faria de Oliveira1-9/+20
Currently, rtnl_fdb_dump() assumes the family header is 'struct ifinfomsg', which is not always true -- 'struct ndmsg' is used by iproute2 ('ip neigh'). The problem is, the function bails out early if nlmsg_parse() fails, which does occur for iproute2 usage of 'struct ndmsg' because the payload length is shorter than the family header alone (as 'struct ifinfomsg' is assumed). This breaks backward compatibility with userspace -- nothing is sent back. Some examples with iproute2 and netlink library for go [1]: 1) $ bridge fdb show 33:33:00:00:00:01 dev ens3 self permanent 01:00:5e:00:00:01 dev ens3 self permanent 33:33:ff:15:98:30 dev ens3 self permanent This one works, as it uses 'struct ifinfomsg'. fdb_show() @ iproute2/bridge/fdb.c """ .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)), ... if (rtnl_dump_request(&rth, RTM_GETNEIGH, [...] """ 2) $ ip --family bridge neigh RTNETLINK answers: Invalid argument Dump terminated This one fails, as it uses 'struct ndmsg'. do_show_or_flush() @ iproute2/ip/ipneigh.c """ .n.nlmsg_type = RTM_GETNEIGH, .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)), """ 3) $ ./neighlist < no output > This one fails, as it uses 'struct ndmsg'-based. neighList() @ netlink/neigh_linux.go """ req := h.newNetlinkRequest(unix.RTM_GETNEIGH, [...] msg := Ndmsg{ """ The actual breakage was introduced by commit 0ff50e83b512 ("net: rtnetlink: bail out from rtnl_fdb_dump() on parse error"), because nlmsg_parse() fails if the payload length (with the _actual_ family header) is less than the family header length alone (which is assumed, in parameter 'hdrlen'). This is true in the examples above with struct ndmsg, with size and payload length shorter than struct ifinfomsg. However, that commit just intends to fix something under the assumption the family header is indeed an 'struct ifinfomsg' - by preventing access to the payload as such (via 'ifm' pointer) if the payload length is not sufficient to actually contain it. The assumption was introduced by commit 5e6d24358799 ("bridge: netlink dump interface at par with brctl"), to support iproute2's 'bridge fdb' command (not 'ip neigh') which indeed uses 'struct ifinfomsg', thus is not broken. So, in order to unbreak the 'struct ndmsg' family headers and still allow 'struct ifinfomsg' to continue to work, check for the known message sizes used with 'struct ndmsg' in iproute2 (with zero or one attribute which is not used in this function anyway) then do not parse the data as ifinfomsg. Same examples with this patch applied (or revert/before the original fix): $ bridge fdb show 33:33:00:00:00:01 dev ens3 self permanent 01:00:5e:00:00:01 dev ens3 self permanent 33:33:ff:15:98:30 dev ens3 self permanent $ ip --family bridge neigh dev ens3 lladdr 33:33:00:00:00:01 PERMANENT dev ens3 lladdr 01:00:5e:00:00:01 PERMANENT dev ens3 lladdr 33:33:ff:15:98:30 PERMANENT $ ./neighlist netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0x0, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0} netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x1, 0x0, 0x5e, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0} netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0xff, 0x15, 0x98, 0x30}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0} Tested on mainline (v4.19-rc6) and net-next (3bd09b05b068). References: [1] netlink library for go (test-case) https://github.com/vishvananda/netlink $ cat ~/go/src/neighlist/main.go package main import ("fmt"; "syscall"; "github.com/vishvananda/netlink") func main() { neighs, _ := netlink.NeighList(0, syscall.AF_BRIDGE) for _, neigh := range neighs { fmt.Printf("%#v\n", neigh) } } $ export GOPATH=~/go $ go get github.com/vishvananda/netlink $ go build neighlist $ ~/go/src/neighlist/neighlist Thanks to David Ahern for suggestions to improve this patch. Fixes: 0ff50e83b512 ("net: rtnetlink: bail out from rtnl_fdb_dump() on parse error") Fixes: 5e6d24358799 ("bridge: netlink dump interface at par with brctl") Reported-by: Aidan Obley <aobley@pivotal.io> Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>