summaryrefslogtreecommitdiffstats
path: root/net/netfilter/ipvs/ip_vs_ctl.c
AgeCommit message (Collapse)AuthorFilesLines
2019-04-27genetlink: optionally validate strictly/dumpsJohannes Berg1-0/+16
Add options to strictly validate messages and dump messages, sometimes perhaps validating dump messages non-strictly may be required, so add an option for that as well. Since none of this can really be applied to existing commands, set the options everwhere using the following spatch: @@ identifier ops; expression X; @@ struct genl_ops ops[] = { ..., { .cmd = X, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, ... }, ... }; For new commands one should just not copy the .validate 'opt-out' flags and thus get strict validation. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27netlink: make validation more configurable for future strictnessJohannes Berg1-9/+4
We currently have two levels of strict validation: 1) liberal (default) - undefined (type >= max) & NLA_UNSPEC attributes accepted - attribute length >= expected accepted - garbage at end of message accepted 2) strict (opt-in) - NLA_UNSPEC attributes accepted - attribute length >= expected accepted Split out parsing strictness into four different options: * TRAILING - check that there's no trailing data after parsing attributes (in message or nested) * MAXTYPE - reject attrs > max known type * UNSPEC - reject attributes with NLA_UNSPEC policy entries * STRICT_ATTRS - strictly validate attribute size The default for future things should be *everything*. The current *_strict() is a combination of TRAILING and MAXTYPE, and is renamed to _deprecated_strict(). The current regular parsing has none of this, and is renamed to *_parse_deprecated(). Additionally it allows us to selectively set one of the new flags even on old policies. Notably, the UNSPEC flag could be useful in this case, since it can be arranged (by filling in the policy) to not be an incompatible userspace ABI change, but would then going forward prevent forgetting attribute entries. Similar can apply to the POLICY flag. We end up with the following renames: * nla_parse -> nla_parse_deprecated * nla_parse_strict -> nla_parse_deprecated_strict * nlmsg_parse -> nlmsg_parse_deprecated * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict * nla_parse_nested -> nla_parse_nested_deprecated * nla_validate_nested -> nla_validate_nested_deprecated Using spatch, of course: @@ expression TB, MAX, HEAD, LEN, POL, EXT; @@ -nla_parse(TB, MAX, HEAD, LEN, POL, EXT) +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression TB, MAX, NLA, POL, EXT; @@ -nla_parse_nested(TB, MAX, NLA, POL, EXT) +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT) @@ expression START, MAX, POL, EXT; @@ -nla_validate_nested(START, MAX, POL, EXT) +nla_validate_nested_deprecated(START, MAX, POL, EXT) @@ expression NLH, HDRLEN, MAX, POL, EXT; @@ -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT) +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT) For this patch, don't actually add the strict, non-renamed versions yet so that it breaks compile if I get it wrong. Also, while at it, make nla_validate and nla_parse go down to a common __nla_validate_parse() function to avoid code duplication. Ultimately, this allows us to have very strict validation for every new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the next patch, while existing things will continue to work as is. In effect then, this adds fully strict validation for any new command. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27netlink: make nla_nest_start() add NLA_F_NESTED flagMichal Kubecek1-5/+5
Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most netlink based interfaces (including recently added ones) are still not setting it in kernel generated messages. Without the flag, message parsers not aware of attribute semantics (e.g. wireshark dissector or libmnl's mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display the structure of their contents. Unfortunately we cannot just add the flag everywhere as there may be userspace applications which check nlattr::nla_type directly rather than through a helper masking out the flags. Therefore the patch renames nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start() as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually are rewritten to use nla_nest_start(). Except for changes in include/net/netlink.h, the patch was generated using this semantic patch: @@ expression E1, E2; @@ -nla_nest_start(E1, E2) +nla_nest_start_noflag(E1, E2) @@ expression E1, E2; @@ -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED) +nla_nest_start(E1, E2) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-08ipvs: allow tunneling with gue encapsulationJacky Hu1-1/+34
ipip packets are blocked in some public cloud environments, this patch allows gue encapsulation with the tunneling method, which would make tunneling working in those environments. Signed-off-by: Jacky Hu <hengqing.hu@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-22genetlink: make policy common to familyJohannes Berg1-12/+1
Since maxattr is common, the policy can't really differ sanely, so make it common as well. The only user that did in fact manage to make a non-common policy is taskstats, which has to be really careful about it (since it's still using a common maxattr!). This is no longer supported, but we can fake it using pre_doit. This reduces the size of e.g. nl80211.o (which has lots of commands): text data bss dec hex filename 398745 14323 2240 415308 6564c net/wireless/nl80211.o (before) 397913 14331 2240 414484 65314 net/wireless/nl80211.o (after) -------------------------------- -832 +8 0 -824 Which is obviously just 8 bytes for each command, and an added 8 bytes for the new policy pointer. I'm not sure why the ops list is counted as .text though. Most of the code transformations were done using the following spatch: @ops@ identifier OPS; expression POLICY; @@ struct genl_ops OPS[] = { ..., { - .policy = POLICY, }, ... }; @@ identifier ops.OPS; expression ops.POLICY; identifier fam; expression M; @@ struct genl_family fam = { .ops = OPS, .maxattr = M, + .policy = POLICY, ... }; This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing the cb->data as ops, which we want to change in a later genl patch. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01ipvs: change some data types from int to boolAndrea Claudi1-6/+6
Change the data type of the following variables from int to bool across ipvs code: - found - loop - need_full_dest - need_full_svc - payload_csum Also change the following functions to use bool full_entry param instead of int: - ip_vs_genl_parse_dest() - ip_vs_genl_parse_service() This patch does not change any functionality but makes the source code slightly easier to read. Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+2
Two easily resolvable overlapping change conflicts, one in TCP and one in the eBPF verifier. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller1-4/+2
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for you net-next tree: 1) Missing NFTA_RULE_POSITION_ID netlink attribute validation, from Phil Sutter. 2) Restrict matching on tunnel metadata to rx/tx path, from wenxu. 3) Avoid indirect calls for IPV6=y, from Florian Westphal. 4) Add two indirections to prepare merger of IPV4 and IPV6 nat modules, from Florian Westphal. 5) Broken indentation in ctnetlink, from Colin Ian King. 6) Patches to use struct_size() from netfilter and IPVS, from Gustavo A. R. Silva. 7) Display kernel splat only once in case of racing to confirm conntrack from bridge plus nfqueue setups, from Chieh-Min Wang. 8) Skip checksum validation for layer 4 protocols that don't need it, patch from Alin Nastac. 9) Sparse warning due to symbol that should be static in CLUSTERIP, from Wei Yongjun. 10) Add new toggle to disable SDP payload translation when media endpoint is reachable though the same interface as the signalling peer, from Alin Nastac. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-16ipvs: fix warning on unused variableAndrea Claudi1-1/+2
When CONFIG_IP_VS_IPV6 is not defined, build produced this warning: net/netfilter/ipvs/ip_vs_ctl.c:899:6: warning: unused variable ‘ret’ [-Wunused-variable] int ret = 0; ^~~ Fix this by moving the declaration of 'ret' in the CONFIG_IP_VS_IPV6 section in the same function. While at it, drop its unneeded initialisation. Fixes: 098e13f5b21d ("ipvs: fix dependency on nf_defrag_ipv6") Reported-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-12ipvs: fix dependency on nf_defrag_ipv6Andrea Claudi1-0/+10
ipvs relies on nf_defrag_ipv6 module to manage IPv6 fragmentation, but lacks proper Kconfig dependencies and does not explicitly request defrag features. As a result, if netfilter hooks are not loaded, when IPv6 fragmented packet are handled by ipvs only the first fragment makes through. Fix it properly declaring the dependency on Kconfig and registering netfilter hooks on ip_vs_add_service() and ip_vs_new_dest(). Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-12ipvs: Use struct_size() helperGustavo A. R. Silva1-4/+2
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; size = sizeof(struct foo) + count * sizeof(struct boo); instance = alloc(size, GFP_KERNEL) Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: size = struct_size(instance, entry, count); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-24ipvs: Fix signed integer overflow when setsockopt timeoutZhangXiaoxu1-0/+12
There is a UBSAN bug report as below: UBSAN: Undefined behaviour in net/netfilter/ipvs/ip_vs_ctl.c:2227:21 signed integer overflow: -2147483647 * 1000 cannot be represented in type 'int' Reproduce program: #include <stdio.h> #include <sys/types.h> #include <sys/socket.h> #define IPPROTO_IP 0 #define IPPROTO_RAW 255 #define IP_VS_BASE_CTL (64+1024+64) #define IP_VS_SO_SET_TIMEOUT (IP_VS_BASE_CTL+10) /* The argument to IP_VS_SO_GET_TIMEOUT */ struct ipvs_timeout_t { int tcp_timeout; int tcp_fin_timeout; int udp_timeout; }; int main() { int ret = -1; int sockfd = -1; struct ipvs_timeout_t to; sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW); if (sockfd == -1) { printf("socket init error\n"); return -1; } to.tcp_timeout = -2147483647; to.tcp_fin_timeout = -2147483647; to.udp_timeout = -2147483647; ret = setsockopt(sockfd, IPPROTO_IP, IP_VS_SO_SET_TIMEOUT, (char *)(&to), sizeof(to)); printf("setsockopt return %d\n", ret); return ret; } Return -EINVAL if the timeout value is negative or max than 'INT_MAX / HZ'. Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-26ipvs: call ip_vs_dst_notifier earlier than ipv6_dev_notfXin Long1-0/+3
ip_vs_dst_event is supposed to clean up all dst used in ipvs' destinations when a net dev is going down. But it works only when the dst's dev is the same as the dev from the event. Now with the same priority but late registration, ip_vs_dst_notifier is always called later than ipv6_dev_notf where the dst's dev is set to lo for NETDEV_DOWN event. As the dst's dev lo is not the same as the dev from the event in ip_vs_dst_event, ip_vs_dst_notifier doesn't actually work. Also as these dst have to wait for dest_trash_timer to clean them up. It would cause some non-permanent kernel warnings: unregister_netdevice: waiting for br0 to become free. Usage count = 3 To fix it, call ip_vs_dst_notifier earlier than ipv6_dev_notf by increasing its priority to ADDRCONF_NOTIFY_PRIORITY + 5. Note that for ipv4 route fib_netdev_notifier doesn't set dst's dev to lo in NETDEV_DOWN event, so this fix is only needed when IP_VS_IPV6 is defined. Fixes: 7a4f0761fce3 ("IPVS: init and cleanup restructuring") Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-10-08net: Add extack to nlmsg_parseDavid Ahern1-1/+1
Make sure extack is passed to nlmsg_parse where easy to do so. Most of these are dump handlers and leveraging the extack in the netlink_callback. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-04net: Remove some unneeded semicolonzhong jiang1-1/+1
These semicolons are not needed. Just remove them. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-0/+4
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree: 1) Reject non-null terminated helper names from xt_CT, from Gao Feng. 2) Fix KASAN splat due to out-of-bound access from commit phase, from Alexey Kodanev. 3) Missing conntrack hook registration on IPVS FTP helper, from Julian Anastasov. 4) Incorrect skbuff allocation size in bridge nft_reject, from Taehee Yoo. 5) Fix inverted check on packet xmit to non-local addresses, also from Julian. 6) Fix ebtables alignment compat problems, from Alin Nastac. 7) Hook mask checks are not correct in xt_set, from Serhey Popovych. 8) Fix timeout listing of element in ipsets, from Jozsef. 9) Cap maximum timeout value in ipset, also from Jozsef. 10) Don't allow family option for hash:mac sets, from Florent Fourcot. 11) Restrict ebtables to work with NFPROTO_BRIDGE targets only, this Florian. 12) Another bug reported by KASAN in the rbtree set backend, from Taehee Yoo. 13) Missing __IPS_MAX_BIT update doesn't include IPS_OFFLOAD_BIT. From Gao Feng. 14) Missing initialization of match/target in ebtables, from Florian Westphal. 15) Remove useless nft_dup.h file in include path, from C. Labbe. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-0/+4
Pull networking updates from David Miller: 1) Add Maglev hashing scheduler to IPVS, from Inju Song. 2) Lots of new TC subsystem tests from Roman Mashak. 3) Add TCP zero copy receive and fix delayed acks and autotuning with SO_RCVLOWAT, from Eric Dumazet. 4) Add XDP_REDIRECT support to mlx5 driver, from Jesper Dangaard Brouer. 5) Add ttl inherit support to vxlan, from Hangbin Liu. 6) Properly separate ipv6 routes into their logically independant components. fib6_info for the routing table, and fib6_nh for sets of nexthops, which thus can be shared. From David Ahern. 7) Add bpf_xdp_adjust_tail helper, which can be used to generate ICMP messages from XDP programs. From Nikita V. Shirokov. 8) Lots of long overdue cleanups to the r8169 driver, from Heiner Kallweit. 9) Add BTF ("BPF Type Format"), from Martin KaFai Lau. 10) Add traffic condition monitoring to iwlwifi, from Luca Coelho. 11) Plumb extack down into fib_rules, from Roopa Prabhu. 12) Add Flower classifier offload support to igb, from Vinicius Costa Gomes. 13) Add UDP GSO support, from Willem de Bruijn. 14) Add documentation for eBPF helpers, from Quentin Monnet. 15) Add TLS tx offload to mlx5, from Ilya Lesokhin. 16) Allow applications to be given the number of bytes available to read on a socket via a control message returned from recvmsg(), from Soheil Hassas Yeganeh. 17) Add x86_32 eBPF JIT compiler, from Wang YanQing. 18) Add AF_XDP sockets, with zerocopy support infrastructure as well. From Björn Töpel. 19) Remove indirect load support from all of the BPF JITs and handle these operations in the verifier by translating them into native BPF instead. From Daniel Borkmann. 20) Add GRO support to ipv6 gre tunnels, from Eran Ben Elisha. 21) Allow XDP programs to do lookups in the main kernel routing tables for forwarding. From David Ahern. 22) Allow drivers to store hardware state into an ELF section of kernel dump vmcore files, and use it in cxgb4. From Rahul Lakkireddy. 23) Various RACK and loss detection improvements in TCP, from Yuchung Cheng. 24) Add TCP SACK compression, from Eric Dumazet. 25) Add User Mode Helper support and basic bpfilter infrastructure, from Alexei Starovoitov. 26) Support ports and protocol values in RTM_GETROUTE, from Roopa Prabhu. 27) Support bulking in ->ndo_xdp_xmit() API, from Jesper Dangaard Brouer. 28) Add lots of forwarding selftests, from Petr Machata. 29) Add generic network device failover driver, from Sridhar Samudrala. * ra.kernel.org:/pub/scm/linux/kernel/git/davem/net-next: (1959 commits) strparser: Add __strp_unpause and use it in ktls. rxrpc: Fix terminal retransmission connection ID to include the channel net: hns3: Optimize PF CMDQ interrupt switching process net: hns3: Fix for VF mailbox receiving unknown message net: hns3: Fix for VF mailbox cannot receiving PF response bnx2x: use the right constant Revert "net: sched: cls: Fix offloading when ingress dev is vxlan" net: dsa: b53: Fix for brcm tag issue in Cygnus SoC enic: fix UDP rss bits netdev-FAQ: clarify DaveM's position for stable backports rtnetlink: validate attributes in do_setlink() mlxsw: Add extack messages for port_{un, }split failures netdevsim: Add extack error message for devlink reload devlink: Add extack to reload and port_{un, }split operations net: metrics: add proper netlink validation ipmr: fix error path when ipmr_new_table fails ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds net: hns3: remove unused hclgevf_cfg_func_mta_filter netfilter: provide udp*_lib_lookup for nf_tproxy qed*: Utilize FW 8.37.2.0 ...
2018-06-04Merge branch 'hch.procfs' of ↵Linus Torvalds1-41/+6
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull procfs updates from Al Viro: "Christoph's proc_create_... cleanups series" * 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (44 commits) xfs, proc: hide unused xfs procfs helpers isdn/gigaset: add back gigaset_procinfo assignment proc: update SIZEOF_PDE_INLINE_NAME for the new pde fields tty: replace ->proc_fops with ->proc_show ide: replace ->proc_fops with ->proc_show ide: remove ide_driver_proc_write isdn: replace ->proc_fops with ->proc_show atm: switch to proc_create_seq_private atm: simplify procfs code bluetooth: switch to proc_create_seq_data netfilter/x_tables: switch to proc_create_seq_private netfilter/xt_hashlimit: switch to proc_create_{seq,single}_data neigh: switch to proc_create_seq_data hostap: switch to proc_create_{seq,single}_data bonding: switch to proc_create_seq_data rtc/proc: switch to proc_create_single_data drbd: switch to proc_create_single resource: switch to proc_create_seq_data staging/rtl8192u: simplify procfs code jfs: simplify procfs code ...
2018-06-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-6/+15
Filling in the padding slot in the bpf structure as a bug fix in 'ne' overlapped with actually using that padding area for something in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-02ipvs: register conntrack hooks for ftpJulian Anastasov1-0/+4
ip_vs_ftp requires conntrack modules for mangling of FTP command responses in passive mode. Make sure the conntrack hooks are registered when real servers use NAT method in FTP virtual service. The hooks will be registered while the service is present. Fixes: 0c66dc1ea3f0 ("netfilter: conntrack: register hooks in netns when needed by ruleset") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-23ipvs: fix buffer overflow with sync daemon and serviceJulian Anastasov1-6/+15
syzkaller reports for buffer overflow for interface name when starting sync daemons [1] What we do is that we copy user structure into larger stack buffer but later we search NUL past the stack buffer. The same happens for sched_name when adding/editing virtual server. We are restricted by IP_VS_SCHEDNAME_MAXLEN and IP_VS_IFNAME_MAXLEN being used as size in include/uapi/linux/ip_vs.h, so they include the space for NUL. As using strlcpy is wrong for unsafe source, replace it with strscpy and add checks to return EINVAL if source string is not NUL-terminated. The incomplete strlcpy fix comes from 2.6.13. For the netlink interface reduce the len parameter for IPVS_DAEMON_ATTR_MCAST_IFN and IPVS_SVC_ATTR_SCHED_NAME, so that we get proper EINVAL. [1] kernel BUG at lib/string.c:1052! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 373 Comm: syz-executor936 Not tainted 4.17.0-rc4+ #45 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:fortify_panic+0x13/0x20 lib/string.c:1051 RSP: 0018:ffff8801c976f800 EFLAGS: 00010282 RAX: 0000000000000022 RBX: 0000000000000040 RCX: 0000000000000000 RDX: 0000000000000022 RSI: ffffffff8160f6f1 RDI: ffffed00392edef6 RBP: ffff8801c976f800 R08: ffff8801cf4c62c0 R09: ffffed003b5e4fb0 R10: ffffed003b5e4fb0 R11: ffff8801daf27d87 R12: ffff8801c976fa20 R13: ffff8801c976fae4 R14: ffff8801c976fae0 R15: 000000000000048b FS: 00007fd99f75e700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000200001c0 CR3: 00000001d6843000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: strlen include/linux/string.h:270 [inline] strlcpy include/linux/string.h:293 [inline] do_ip_vs_set_ctl+0x31c/0x1d00 net/netfilter/ipvs/ip_vs_ctl.c:2388 nf_sockopt net/netfilter/nf_sockopt.c:106 [inline] nf_setsockopt+0x7d/0xd0 net/netfilter/nf_sockopt.c:115 ip_setsockopt+0xd8/0xf0 net/ipv4/ip_sockglue.c:1253 udp_setsockopt+0x62/0xa0 net/ipv4/udp.c:2487 ipv6_setsockopt+0x149/0x170 net/ipv6/ipv6_sockglue.c:917 tcp_setsockopt+0x93/0xe0 net/ipv4/tcp.c:3057 sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3046 __sys_setsockopt+0x1bd/0x390 net/socket.c:1903 __do_sys_setsockopt net/socket.c:1914 [inline] __se_sys_setsockopt net/socket.c:1911 [inline] __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x447369 RSP: 002b:00007fd99f75dda8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 00000000006e39e4 RCX: 0000000000447369 RDX: 000000000000048b RSI: 0000000000000000 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000018 R09: 0000000000000000 R10: 00000000200001c0 R11: 0000000000000246 R12: 00000000006e39e0 R13: 75a1ff93f0896195 R14: 6f745f3168746576 R15: 0000000000000001 Code: 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b 48 89 df e8 d2 8f 48 fa eb de 55 48 89 fe 48 c7 c7 60 65 64 88 48 89 e5 e8 91 dd f3 f9 <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 41 57 41 56 RIP: fortify_panic+0x13/0x20 lib/string.c:1051 RSP: ffff8801c976f800 Reported-and-tested-by: syzbot+aac887f77319868646df@syzkaller.appspotmail.com Fixes: e4ff67513096 ("ipvs: add sync_maxlen parameter for the sync daemon") Fixes: 4da62fc70d7c ("[IPVS]: Fix for overflows") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-16proc: introduce proc_create_net_singleChristoph Hellwig1-27/+4
Variant of proc_create_data that directly take a seq_file show callback and deals with network namespaces in ->open and ->release. All callers of proc_create + single_open_net converted over, and single_{open,release}_net are removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-16proc: introduce proc_create_net{,_data}Christoph Hellwig1-14/+2
Variants of proc_create{,_data} that directly take a struct seq_operations and deal with network namespaces in ->open and ->release. All callers of proc_create + seq_open_net converted over, and seq_{open,release}_net are removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller1-0/+4
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree, more relevant updates in this batch are: 1) Add Maglev support to IPVS. Moreover, store lastest server weight in IPVS since this is needed by maglev, patches from from Inju Song. 2) Preparation works to add iptables flowtable support, patches from Felix Fietkau. 3) Hand over flows back to conntrack slow path in case of TCP RST/FIN packet is seen via new teardown state, also from Felix. 4) Add support for extended netlink error reporting for nf_tables. 5) Support for larger timeouts that 23 days in nf_tables, patch from Florian Westphal. 6) Always set an upper limit to dynamic sets, also from Florian. 7) Allow number generator to make map lookups, from Laura Garcia. 8) Use hash_32() instead of opencode hashing in IPVS, from Vicent Bernat. 9) Extend ip6tables SRH match to support previous, next and last SID, from Ahmed Abdelsalam. 10) Move Passive OS fingerprint nf_osf.c, from Fernando Fernandez. 11) Expose nf_conntrack_max through ctnetlink, from Florent Fourcot. 12) Several housekeeping patches for xt_NFLOG, x_tables and ebtables, from Taehee Yoo. 13) Unify meta bridge with core nft_meta, then make nft_meta built-in. Make rt and exthdr built-in too, again from Florian. 14) Missing initialization of tbl->entries in IPVS, from Cong Wang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-09ipvs: fix rtnl_lock lockups caused by start_sync_threadJulian Anastasov1-8/+0
syzkaller reports for wrong rtnl_lock usage in sync code [1] and [2] We have 2 problems in start_sync_thread if error path is taken, eg. on memory allocation error or failure to configure sockets for mcast group or addr/port binding: 1. recursive locking: holding rtnl_lock while calling sock_release which in turn calls again rtnl_lock in ip_mc_drop_socket to leave the mcast group, as noticed by Florian Westphal. Additionally, sock_release can not be called while holding sync_mutex (ABBA deadlock). 2. task hung: holding rtnl_lock while calling kthread_stop to stop the running kthreads. As the kthreads do the same to leave the mcast group (sock_release -> ip_mc_drop_socket -> rtnl_lock) they hang. Fix the problems by calling rtnl_unlock early in the error path, now sock_release is called after unlocking both mutexes. Problem 3 (task hung reported by syzkaller [2]) is variant of problem 2: use _trylock to prevent one user to call rtnl_lock and then while waiting for sync_mutex to block kthreads that execute sock_release when they are stopped by stop_sync_thread. [1] IPVS: stopping backup sync thread 4500 ... WARNING: possible recursive locking detected 4.16.0-rc7+ #3 Not tainted -------------------------------------------- syzkaller688027/4497 is trying to acquire lock: (rtnl_mutex){+.+.}, at: [<00000000bb14d7fb>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 but task is already holding lock: IPVS: stopping backup sync thread 4495 ... (rtnl_mutex){+.+.}, at: [<00000000bb14d7fb>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(rtnl_mutex); lock(rtnl_mutex); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by syzkaller688027/4497: #0: (rtnl_mutex){+.+.}, at: [<00000000bb14d7fb>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 #1: (ipvs->sync_mutex){+.+.}, at: [<00000000703f78e3>] do_ip_vs_set_ctl+0x10f8/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2388 stack backtrace: CPU: 1 PID: 4497 Comm: syzkaller688027 Not tainted 4.16.0-rc7+ #3 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x24d lib/dump_stack.c:53 print_deadlock_bug kernel/locking/lockdep.c:1761 [inline] check_deadlock kernel/locking/lockdep.c:1805 [inline] validate_chain kernel/locking/lockdep.c:2401 [inline] __lock_acquire+0xe8f/0x3e00 kernel/locking/lockdep.c:3431 lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920 __mutex_lock_common kernel/locking/mutex.c:756 [inline] __mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908 rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 ip_mc_drop_socket+0x88/0x230 net/ipv4/igmp.c:2643 inet_release+0x4e/0x1c0 net/ipv4/af_inet.c:413 sock_release+0x8d/0x1e0 net/socket.c:595 start_sync_thread+0x2213/0x2b70 net/netfilter/ipvs/ip_vs_sync.c:1924 do_ip_vs_set_ctl+0x1139/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2389 nf_sockopt net/netfilter/nf_sockopt.c:106 [inline] nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115 ip_setsockopt+0x97/0xa0 net/ipv4/ip_sockglue.c:1261 udp_setsockopt+0x45/0x80 net/ipv4/udp.c:2406 sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2975 SYSC_setsockopt net/socket.c:1849 [inline] SyS_setsockopt+0x189/0x360 net/socket.c:1828 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x446a69 RSP: 002b:00007fa1c3a64da8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000446a69 RDX: 000000000000048b RSI: 0000000000000000 RDI: 0000000000000003 RBP: 00000000006e29fc R08: 0000000000000018 R09: 0000000000000000 R10: 00000000200000c0 R11: 0000000000000246 R12: 00000000006e29f8 R13: 00676e697279656b R14: 00007fa1c3a659c0 R15: 00000000006e2b60 [2] IPVS: sync thread started: state = BACKUP, mcast_ifn = syz_tun, syncid = 4, id = 0 IPVS: stopping backup sync thread 25415 ... INFO: task syz-executor7:25421 blocked for more than 120 seconds. Not tainted 4.16.0-rc6+ #284 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. syz-executor7 D23688 25421 4408 0x00000004 Call Trace: context_switch kernel/sched/core.c:2862 [inline] __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440 schedule+0xf5/0x430 kernel/sched/core.c:3499 schedule_timeout+0x1a3/0x230 kernel/time/timer.c:1777 do_wait_for_common kernel/sched/completion.c:86 [inline] __wait_for_common kernel/sched/completion.c:107 [inline] wait_for_common kernel/sched/completion.c:118 [inline] wait_for_completion+0x415/0x770 kernel/sched/completion.c:139 kthread_stop+0x14a/0x7a0 kernel/kthread.c:530 stop_sync_thread+0x3d9/0x740 net/netfilter/ipvs/ip_vs_sync.c:1996 do_ip_vs_set_ctl+0x2b1/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2394 nf_sockopt net/netfilter/nf_sockopt.c:106 [inline] nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115 ip_setsockopt+0x97/0xa0 net/ipv4/ip_sockglue.c:1253 sctp_setsockopt+0x2ca/0x63e0 net/sctp/socket.c:4154 sock_common_setsockopt+0x95/0xd0 net/core/sock.c:3039 SYSC_setsockopt net/socket.c:1850 [inline] SyS_setsockopt+0x189/0x360 net/socket.c:1829 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x454889 RSP: 002b:00007fc927626c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 00007fc9276276d4 RCX: 0000000000454889 RDX: 000000000000048c RSI: 0000000000000000 RDI: 0000000000000017 RBP: 000000000072bf58 R08: 0000000000000018 R09: 0000000000000000 R10: 0000000020000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 000000000000051c R14: 00000000006f9b40 R15: 0000000000000001 Showing all locks held in the system: 2 locks held by khungtaskd/868: #0: (rcu_read_lock){....}, at: [<00000000a1a8f002>] check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline] #0: (rcu_read_lock){....}, at: [<00000000a1a8f002>] watchdog+0x1c5/0xd60 kernel/hung_task.c:249 #1: (tasklist_lock){.+.+}, at: [<0000000037c2f8f9>] debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470 1 lock held by rsyslogd/4247: #0: (&f->f_pos_lock){+.+.}, at: [<000000000d8d6983>] __fdget_pos+0x12b/0x190 fs/file.c:765 2 locks held by getty/4338: #0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131 2 locks held by getty/4339: #0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131 2 locks held by getty/4340: #0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131 2 locks held by getty/4341: #0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131 2 locks held by getty/4342: #0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131 2 locks held by getty/4343: #0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131 2 locks held by getty/4344: #0: (&tty->ldisc_sem){++++}, at: [<00000000bee98654>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365 #1: (&ldata->atomic_read_lock){+.+.}, at: [<00000000c1d180aa>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131 3 locks held by kworker/0:5/6494: #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000a062b18e>] work_static include/linux/workqueue.h:198 [inline] #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000a062b18e>] set_work_data kernel/workqueue.c:619 [inline] #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000a062b18e>] set_work_pool_and_clear_pending kernel/workqueue.c:646 [inline] #0: ((wq_completion)"%s"("ipv6_addrconf")){+.+.}, at: [<00000000a062b18e>] process_one_work+0xb12/0x1bb0 kernel/workqueue.c:2084 #1: ((addr_chk_work).work){+.+.}, at: [<00000000278427d5>] process_one_work+0xb89/0x1bb0 kernel/workqueue.c:2088 #2: (rtnl_mutex){+.+.}, at: [<00000000066e35ac>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 1 lock held by syz-executor7/25421: #0: (ipvs->sync_mutex){+.+.}, at: [<00000000d414a689>] do_ip_vs_set_ctl+0x277/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2393 2 locks held by syz-executor7/25427: #0: (rtnl_mutex){+.+.}, at: [<00000000066e35ac>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 #1: (ipvs->sync_mutex){+.+.}, at: [<00000000e6d48489>] do_ip_vs_set_ctl+0x10f8/0x1cc0 net/netfilter/ipvs/ip_vs_ctl.c:2388 1 lock held by syz-executor7/25435: #0: (rtnl_mutex){+.+.}, at: [<00000000066e35ac>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 1 lock held by ipvs-b:2:0/25415: #0: (rtnl_mutex){+.+.}, at: [<00000000066e35ac>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 Reported-and-tested-by: syzbot+a46d6abf9d56b1365a72@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+5fe074c01b2032ce9618@syzkaller.appspotmail.com Fixes: e0b26cc997d5 ("ipvs: call rtnl_lock early") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-04-09netfilter: ipvs: Keep latest weight of destinationInju Song1-0/+4
The hashing table in scheduler such as source hash or maglev hash should ignore the changed weight to 0 and allow changing the weight from/to non-0 values. So, struct ip_vs_dest needs to keep weight with latest non-0 weight. Signed-off-by: Inju Song <inju.song@navercorp.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2018-01-19netfilter: delete /proc THIS_MODULE referencesAlexey Dobriyan1-3/+0
/proc has been ignoring struct file_operations::owner field for 10 years. Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where inode->i_fop is initialized with proxy struct file_operations for regular files: - if (de->proc_fops) - inode->i_fop = de->proc_fops; + if (de->proc_fops) { + if (S_ISREG(inode->i_mode)) + inode->i_fop = &proc_reg_file_ops; + else + inode->i_fop = de->proc_fops; + } VFS stopped pinning module at this point. # ipvs Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-2/+6
Pull networking updates from David Miller: "Highlights: 1) Maintain the TCP retransmit queue using an rbtree, with 1GB windows at 100Gb this really has become necessary. From Eric Dumazet. 2) Multi-program support for cgroup+bpf, from Alexei Starovoitov. 3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew Lunn. 4) Add meter action support to openvswitch, from Andy Zhou. 5) Add a data meta pointer for BPF accessible packets, from Daniel Borkmann. 6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet. 7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli. 8) More work to move the RTNL mutex down, from Florian Westphal. 9) Add 'bpftool' utility, to help with bpf program introspection. From Jakub Kicinski. 10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper Dangaard Brouer. 11) Support 'blocks' of transformations in the packet scheduler which can span multiple network devices, from Jiri Pirko. 12) TC flower offload support in cxgb4, from Kumar Sanghvi. 13) Priority based stream scheduler for SCTP, from Marcelo Ricardo Leitner. 14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg. 15) Add RED qdisc offloadability, and use it in mlxsw driver. From Nogah Frankel. 16) eBPF based device controller for cgroup v2, from Roman Gushchin. 17) Add some fundamental tracepoints for TCP, from Song Liu. 18) Remove garbage collection from ipv6 route layer, this is a significant accomplishment. From Wei Wang. 19) Add multicast route offload support to mlxsw, from Yotam Gigi" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits) tcp: highest_sack fix geneve: fix fill_info when link down bpf: fix lockdep splat net: cdc_ncm: GetNtbFormat endian fix openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start netem: remove unnecessary 64 bit modulus netem: use 64 bit divide by rate tcp: Namespace-ify sysctl_tcp_default_congestion_control net: Protect iterations over net::fib_notifier_ops in fib_seq_sum() ipv6: set all.accept_dad to 0 by default uapi: fix linux/tls.h userspace compilation error usbnet: ipheth: prevent TX queue timeouts when device not ready vhost_net: conditionally enable tx polling uapi: fix linux/rxrpc.h userspace compilation errors net: stmmac: fix LPI transitioning for dwmac4 atm: horizon: Fix irq release error net-sysfs: trigger netlink notification on ifalias change via sysfs openvswitch: Using kfree_rcu() to simplify the code openvswitch: Make local function ovs_nsh_key_attr_size() static openvswitch: Fix return value check in ovs_meter_cmd_features() ...
2017-11-08netfilter: ipvs: Convert timers to use timer_setup()Kees Cook1-4/+3
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Wensong Zhang <wensong@linux-vs.org> Cc: Simon Horman <horms@verge.net.au> Cc: Julian Anastasov <ja@ssi.bg> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: Florian Westphal <fw@strlen.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: lvs-devel@vger.kernel.org Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au>
2017-11-06netfilter: ipvs: Fix inappropriate output of procfsKUWAZAWA Takuya1-0/+4
Information about ipvs in different network namespace can be seen via procfs. How to reproduce: # ip netns add ns01 # ip netns add ns02 # ip netns exec ns01 ip a add dev lo 127.0.0.1/8 # ip netns exec ns02 ip a add dev lo 127.0.0.1/8 # ip netns exec ns01 ipvsadm -A -t 10.1.1.1:80 # ip netns exec ns02 ipvsadm -A -t 10.1.1.2:80 The ipvsadm displays information about its own network namespace only. # ip netns exec ns01 ipvsadm -Ln IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 10.1.1.1:80 wlc # ip netns exec ns02 ipvsadm -Ln IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 10.1.1.2:80 wlc But I can see information about other network namespace via procfs. # ip netns exec ns01 cat /proc/net/ip_vs IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 0A010101:0050 wlc TCP 0A010102:0050 wlc # ip netns exec ns02 cat /proc/net/ip_vs IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 0A010102:0050 wlc Signed-off-by: KUWAZAWA Takuya <albatross0@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-11-06netfilter: ipvs: Use %pS printk format for direct addressesHelge Deller1-2/+2
The debug and error printk functions in ipvs uses wrongly the %pF instead of the %pS printk format specifier for printing symbols for the address returned by _builtin_return_address(0). Fix it for the ia64, ppc64 and parisc64 architectures. Signed-off-by: Helge Deller <deller@gmx.de> Cc: Wensong Zhang <wensong@linux-vs.org> Cc: netdev@vger.kernel.org Cc: lvs-devel@vger.kernel.org Cc: netfilter-devel@vger.kernel.org Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-07-24netfilter: Remove duplicated rcu_read_lock.Taehee Yoo1-3/+0
This patch removes duplicate rcu_read_lock(). 1. IPVS part: According to Julian Anastasov's mention, contexts of ipvs are described at: http://marc.info/?l=netfilter-devel&m=149562884514072&w=2, in summary: - packet RX/TX: does not need locks because packets come from hooks. - sync msg RX: backup server uses RCU locks while registering new connections. - ip_vs_ctl.c: configuration get/set, RCU locks needed. - xt_ipvs.c: It is a netfilter match, running from hook context. As result, rcu_read_lock and rcu_read_unlock can be removed from: - ip_vs_core.c: all - ip_vs_ctl.c: - only from ip_vs_has_real_service - ip_vs_ftp.c: all - ip_vs_proto_sctp.c: all - ip_vs_proto_tcp.c: all - ip_vs_proto_udp.c: all - ip_vs_xmit.c: all (contains only packet processing) 2. Netfilter part: There are three types of functions that are guaranteed the rcu_read_lock(). First, as result, functions are only called by nf_hook(): - nf_conntrack_broadcast_help(), pptp_expectfn(), set_expected_rtp_rtcp(). - tcpmss_reverse_mtu(), tproxy_laddr4(), tproxy_laddr6(). - match_lookup_rt6(), check_hlist(), hashlimit_mt_common(). - xt_osf_match_packet(). Second, functions that caller already held the rcu_read_lock(). - destroy_conntrack(), ctnetlink_conntrack_event(). - ctnl_timeout_find_get(), nfqnl_nf_hook_drop(). Third, functions that are mixed with type1 and type2. These functions are called by nf_hook() also these are called by ordinary functions that already held the rcu_read_lock(): - __ctnetlink_glue_build(), ctnetlink_expect_event(). - ctnetlink_proto_size(). Applied files are below: - nf_conntrack_broadcast.c, nf_conntrack_core.c, nf_conntrack_netlink.c. - nf_conntrack_pptp.c, nf_conntrack_sip.c, nfnetlink_cttimeout.c. - nfnetlink_queue.c, xt_TCPMSS.c, xt_TPROXY.c, xt_addrtype.c. - xt_connlimit.c, xt_hashlimit.c, xt_osf.c Detailed calltrace can be found at: http://marc.info/?l=netfilter-devel&m=149667610710350&w=2 Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-5/+17
Pablo Neira Ayuso says: ==================== Netfilter/IPVS/OVS fixes for net The following patchset contains a rather large batch of Netfilter, IPVS and OVS fixes for your net tree. This includes fixes for ctnetlink, the userspace conntrack helper infrastructure, conntrack OVS support, ebtables DNAT target, several leaks in error path among other. More specifically, they are: 1) Fix reference count leak in the CT target error path, from Gao Feng. 2) Remove conntrack entry clashing with a matching expectation, patch from Jarno Rajahalme. 3) Fix bogus EEXIST when registering two different userspace helpers, from Liping Zhang. 4) Don't leak dummy elements in the new bitmap set type in nf_tables, from Liping Zhang. 5) Get rid of module autoload from conntrack update path in ctnetlink, we don't need autoload at this late stage and it is happening with rcu read lock held which is not good. From Liping Zhang. 6) Fix deadlock due to double-acquire of the expect_lock from conntrack update path, this fixes a bug that was introduced when the central spinlock got removed. Again from Liping Zhang. 7) Safe ct->status update from ctnetlink path, from Liping. The expect_lock protection that was selected when the central spinlock was removed was not really protecting anything at all. 8) Protect sequence adjustment under ct->lock. 9) Missing socket match with IPv6, from Peter Tirsek. 10) Adjust skb->pkt_type of DNAT'ed frames from ebtables, from Linus Luessing. 11) Don't give up on evaluating the expression on new entries added via dynset expression in nf_tables, from Liping Zhang. 12) Use skb_checksum() when mangling icmpv6 in IPv6 NAT as this deals with non-linear skbuffs. 13) Don't allow IPv6 service in IPVS if no IPv6 support is available, from Paolo Abeni. 14) Missing mutex release in error path of xt_find_table_lock(), from Dan Carpenter. 15) Update maintainers files, Netfilter section. Add Florian to the file, refer to nftables.org and change project status from Supported to Maintained. 16) Bail out on mismatching extensions in element updates in nf_tables. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller1-6/+6
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter updates for your net-next tree. A large bunch of code cleanups, simplify the conntrack extension codebase, get rid of the fake conntrack object, speed up netns by selective synchronize_net() calls. More specifically, they are: 1) Check for ct->status bit instead of using nfct_nat() from IPVS and Netfilter codebase, patch from Florian Westphal. 2) Use kcalloc() wherever possible in the IPVS code, from Varsha Rao. 3) Simplify FTP IPVS helper module registration path, from Arushi Singhal. 4) Introduce nft_is_base_chain() helper function. 5) Enforce expectation limit from userspace conntrack helper, from Gao Feng. 6) Add nf_ct_remove_expect() helper function, from Gao Feng. 7) NAT mangle helper function return boolean, from Gao Feng. 8) ctnetlink_alloc_expect() should only work for conntrack with helpers, from Gao Feng. 9) Add nfnl_msg_type() helper function to nfnetlink to build the netlink message type. 10) Get rid of unnecessary cast on void, from simran singhal. 11) Use seq_puts()/seq_putc() instead of seq_printf() where possible, also from simran singhal. 12) Use list_prev_entry() from nf_tables, from simran signhal. 13) Remove unnecessary & on pointer function in the Netfilter and IPVS code. 14) Remove obsolete comment on set of rules per CPU in ip6_tables, no longer true. From Arushi Singhal. 15) Remove duplicated nf_conntrack_l4proto_udplite4, from Gao Feng. 16) Remove unnecessary nested rcu_read_lock() in __nf_nat_decode_session(). Code running from hooks are already guaranteed to run under RCU read side. 17) Remove deadcode in nf_tables_getobj(), from Aaron Conole. 18) Remove double assignment in nf_ct_l4proto_pernet_unregister_one(), also from Aaron. 19) Get rid of unsed __ip_set_get_netlink(), from Aaron Conole. 20) Don't propagate NF_DROP error to userspace via ctnetlink in __nf_nat_alloc_null_binding() function, from Gao Feng. 21) Revisit nf_ct_deliver_cached_events() to remove unnecessary checks, from Gao Feng. 22) Kill the fake untracked conntrack objects, use ctinfo instead to annotate a conntrack object is untracked, from Florian Westphal. 23) Remove nf_ct_is_untracked(), now obsolete since we have no conntrack template anymore, from Florian. 24) Add event mask support to nft_ct, also from Florian. 25) Move nf_conn_help structure to include/net/netfilter/nf_conntrack_helper.h. 26) Add a fixed 32 bytes scratchpad area for conntrack helpers. Thus, we don't deal with variable conntrack extensions anymore. Make sure userspace conntrack helper doesn't go over that size. Remove variable size ct extension infrastructure now this code got no more clients. From Florian Westphal. 27) Restore offset and length of nf_ct_ext structure to 8 bytes now that wraparound is not possible any longer, also from Florian. 28) Allow to get rid of unassured flows under stress in conntrack, this applies to DCCP, SCTP and TCP protocols, from Florian. 29) Shrink size of nf_conntrack_ecache structure, from Florian. 30) Use TCP_MAX_WSCALE instead of hardcoded 14 in TCP tracker, from Gao Feng. 31) Register SYNPROXY hooks on demand, from Florian Westphal. 32) Use pernet hook whenever possible, instead of global hook registration, from Florian Westphal. 33) Pass hook structure to ebt_register_table() to consolidate some infrastructure code, from Florian Westphal. 34) Use consume_skb() and return NF_STOLEN, instead of NF_DROP in the SYNPROXY code, to make sure device stats are not fooled, patch from Gao Feng. 35) Remove NF_CT_EXT_F_PREALLOC this kills quite some code that we don't need anymore if we just select a fixed size instead of expensive runtime time calculation of this. From Florian. 36) Constify nf_ct_extend_register() and nf_ct_extend_unregister(), from Florian. 37) Simplify nf_ct_ext_add(), this kills nf_ct_ext_create(), from Florian. 38) Attach NAT extension on-demand from masquerade and pptp helper path, from Florian. 39) Get rid of useless ip_vs_set_state_timeout(), from Aaron Conole. 40) Speed up netns by selective calls of synchronize_net(), from Florian Westphal. 41) Silence stack size warning gcc in 32-bit arch in snmp helper, from Florian. 42) Inconditionally call nf_ct_ext_destroy(), even if we have no extensions, to deal with the NF_NAT_MANIP_SRC case. Patch from Liping Zhang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-28ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabledPaolo Abeni1-5/+17
When creating a new ipvs service, ipv6 addresses are always accepted if CONFIG_IP_VS_IPV6 is enabled. On dest creation the address family is not explicitly checked. This allows the user-space to configure ipvs services even if the system is booted with ipv6.disable=1. On specific configuration, ipvs can try to call ipv6 routing code at setup time, causing the kernel to oops due to fib6_rules_ops being NULL. This change addresses the issue adding a check for the ipv6 module being enabled while validating ipv6 service operations and adding the same validation for dest operations. According to git history, this issue is apparently present since the introduction of ipv6 support, and the oops can be triggered since commit 09571c7ae30865ad ("IPVS: Add function to determine if IPv6 address is local") Fixes: 09571c7ae30865ad ("IPVS: Add function to determine if IPv6 address is local") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2017-04-13netlink: pass extended ACK struct where availableJohannes Berg1-1/+1
This is an add-on to the previous patch that passes the extended ACK structure where it's already available by existing genl_info or extack function arguments. This was done with this spatch (with some manual adjustment of indentation): @@ expression A, B, C, D, E; identifier fn, info; @@ fn(..., struct genl_info *info, ...) { ... -nlmsg_parse(A, B, C, D, E, NULL) +nlmsg_parse(A, B, C, D, E, info->extack) ... } @@ expression A, B, C, D, E; identifier fn, info; @@ fn(..., struct genl_info *info, ...) { <... -nla_parse_nested(A, B, C, D, NULL) +nla_parse_nested(A, B, C, D, info->extack) ...> } @@ expression A, B, C, D, E; identifier fn, extack; @@ fn(..., struct netlink_ext_ack *extack, ...) { <... -nlmsg_parse(A, B, C, D, E, NULL) +nlmsg_parse(A, B, C, D, E, extack) ...> } @@ expression A, B, C, D, E; identifier fn, extack; @@ fn(..., struct netlink_ext_ack *extack, ...) { <... -nla_parse(A, B, C, D, E, NULL) +nla_parse(A, B, C, D, E, extack) ...> } @@ expression A, B, C, D, E; identifier fn, extack; @@ fn(..., struct netlink_ext_ack *extack, ...) { ... -nlmsg_parse(A, B, C, D, E, NULL) +nlmsg_parse(A, B, C, D, E, extack) ... } @@ expression A, B, C, D; identifier fn, extack; @@ fn(..., struct netlink_ext_ack *extack, ...) { <... -nla_parse_nested(A, B, C, D, NULL) +nla_parse_nested(A, B, C, D, extack) ...> } @@ expression A, B, C, D; identifier fn, extack; @@ fn(..., struct netlink_ext_ack *extack, ...) { <... -nlmsg_validate(A, B, C, D, NULL) +nlmsg_validate(A, B, C, D, extack) ...> } @@ expression A, B, C, D; identifier fn, extack; @@ fn(..., struct netlink_ext_ack *extack, ...) { <... -nla_validate(A, B, C, D, NULL) +nla_validate(A, B, C, D, extack) ...> } @@ expression A, B, C; identifier fn, extack; @@ fn(..., struct netlink_ext_ack *extack, ...) { <... -nla_validate_nested(A, B, C, NULL) +nla_validate_nested(A, B, C, extack) ...> } Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-13netlink: pass extended ACK struct to parsing functionsJohannes Berg1-5/+7
Pass the new extended ACK reporting struct to all of the generic netlink parsing functions. For now, pass NULL in almost all callers (except for some in the core.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-07netfilter: Remove exceptional & on function nameArushi Singhal1-2/+2
Remove & from function pointers to conform to the style found elsewhere in the file. Done using the following semantic patch // <smpl> @r@ identifier f; @@ f(...) { ... } @@ identifier r.f; @@ - &f + f // </smpl> Signed-off-by: Arushi Singhal <arushisinghal19971997@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-04-07netfilter: Use seq_puts()/seq_putc() where possiblesimran singhal1-4/+4
For string without format specifiers, use seq_puts(). For seq_printf("\n"), use seq_putc('\n'). Signed-off-by: simran singhal <singhalsimran0@gmail.com> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-17netfilter: refcounter conversionsReshetova, Elena1-6/+6
refcount_t type and corresponding API (see include/linux/refcount.h) should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller1-5/+3
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree, they are: 1) Stash ctinfo 3-bit field into pointer to nf_conntrack object from sk_buff so we only access one single cacheline in the conntrack hotpath. Patchset from Florian Westphal. 2) Don't leak pointer to internal structures when exporting x_tables ruleset back to userspace, from Willem DeBruijn. This includes new helper functions to copy data to userspace such as xt_data_to_user() as well as conversions of our ip_tables, ip6_tables and arp_tables clients to use it. Not surprinsingly, ebtables requires an ad-hoc update. There is also a new field in x_tables extensions to indicate the amount of bytes that we copy to userspace. 3) Add nf_log_all_netns sysctl: This new knob allows you to enable logging via nf_log infrastructure for all existing netnamespaces. Given the effort to provide pernet syslog has been discontinued, let's provide a way to restore logging using netfilter kernel logging facilities in trusted environments. Patch from Michal Kubecek. 4) Validate SCTP checksum from conntrack helper, from Davide Caratti. 5) Merge UDPlite conntrack and NAT helpers into UDP, this was mostly a copy&paste from the original helper, from Florian Westphal. 6) Reset netfilter state when duplicating packets, also from Florian. 7) Remove unnecessary check for broadcast in IPv6 in pkttype match and nft_meta, from Liping Zhang. 8) Add missing code to deal with loopback packets from nft_meta when used by the netdev family, also from Liping. 9) Several cleanups on nf_tables, one to remove unnecessary check from the netlink control plane path to add table, set and stateful objects and code consolidation when unregister chain hooks, from Gao Feng. 10) Fix harmless reference counter underflow in IPVS that, however, results in problems with the introduction of the new refcount_t type, from David Windsor. 11) Enable LIBCRC32C from nf_ct_sctp instead of nf_nat_sctp, from Davide Caratti. 12) Missing documentation on nf_tables uapi header, from Liping Zhang. 13) Use rb_entry() helper in xt_connlimit, from Geliang Tang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02ipvs: free ip_vs_dest structs when refcnt=0David Windsor1-5/+3
Currently, the ip_vs_dest cache frees ip_vs_dest objects when their reference count becomes < 0. Aside from not being semantically sound, this is problematic for the new type refcount_t, which will be introduced shortly in a separate patch. refcount_t is the new kernel type for holding reference counts, and provides overflow protection and a constrained interface relative to atomic_t (the type currently being used for kernel reference counts). Per Julian Anastasov: "The problem is that dest_trash currently holds deleted dests (unlinked from RCU lists) with refcnt=0." Changing dest_trash to hold dest with refcnt=1 will allow us to free ip_vs_dest structs when their refcnt=0, in ip_vs_dest_put_and_free(). Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-24Introduce a sysctl that modifies the value of PROT_SOCK.Krister Johansen1-4/+3
Add net.ipv4.ip_unprivileged_port_start, which is a per namespace sysctl that denotes the first unprivileged inet port in the namespace. To disable all privileged ports set this to zero. It also checks for overlap with the local port range. The privileged and local range may not overlap. The use case for this change is to allow containerized processes to bind to priviliged ports, but prevent them from ever being allowed to modify their container's network configuration. The latter is accomplished by ensuring that the network namespace is not a child of the user namespace. This modification was needed to allow the container manager to disable a namespace's priviliged port restrictions without exposing control of the network namespace to processes in the user namespace. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-24Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds1-1/+1
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-04Merge tag 'ipvs-for-v4.10' of ↵Pablo Neira Ayuso1-1/+1
https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next Simon Horman says: ==================== IPVS Updates for v4.10 please consider these enhancements to the IPVS for v4.10. * Decrement the IP ttl in all the modes in order to prevent infinite route loops. Thanks to Dwip Banerjee. * Use IS_ERR_OR_NULL macro. Clean-up from Gao Feng. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-11-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+1
Several cases of bug fixes in 'net' overlapping other changes in 'net-next-. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15ipvs: Use IS_ERR_OR_NULL(svc) instead of IS_ERR(svc) || svc == NULLGao Feng1-1/+1
This minor refactoring does not change the logic of function ip_vs_genl_dump_dests. Signed-off-by: Gao Feng <fgao@ikuai8.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2016-11-08ipvs: use IPVS_CMD_ATTR_MAX for family.maxattrWANG Cong1-1/+1
family.maxattr is the max index for policy[], the size of ops[] is determined with ARRAY_SIZE(). Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-10-27genetlink: mark families as __ro_after_initJohannes Berg1-1/+1
Now genl_register_family() is the only thing (other than the users themselves, perhaps, but I didn't find any doing that) writing to the family struct. In all families that I found, genl_register_family() is only called from __init functions (some indirectly, in which case I've add __init annotations to clarifly things), so all can actually be marked __ro_after_init. This protects the data structure from accidental corruption. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-27genetlink: statically initialize familiesJohannes Berg1-9/+13
Instead of providing macros/inline functions to initialize the families, make all users initialize them statically and get rid of the macros. This reduces the kernel code size by about 1.6k on x86-64 (with allyesconfig). Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>