summaryrefslogtreecommitdiffstats
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2018-01-19caif: reduce stack size with KASANArnd Bergmann1-28/+22
When CONFIG_KASAN is set, we can use relatively large amounts of kernel stack space: net/caif/cfctrl.c:555:1: warning: the frame size of 1600 bytes is larger than 1280 bytes [-Wframe-larger-than=] This adds convenience wrappers around cfpkt_extr_head(), which is responsible for most of the stack growth. With those wrapper functions, gcc apparently starts reusing the stack slots for each instance, thus avoiding the problem. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-18net/sched/sch_prio.c: work around gcc-4.4.4 union initializer issuesAndrew Morton1-3/+7
gcc-4.4.4 has problems witn anon union initializers. Work around this. net/sched/sch_prio.c: In function 'prio_dump_offload': net/sched/sch_prio.c:260: error: unknown field 'stats' specified in initializer net/sched/sch_prio.c:260: warning: initialization makes integer from pointer without a cast net/sched/sch_prio.c:261: error: unknown field 'stats' specified in initializer net/sched/sch_prio.c:261: warning: initialization makes integer from pointer without a cast Fixes: 7fdb61b44c0c95 ("net: sch: prio: Add offload ability to PRIO qdisc") Cc: Nogah Frankel <nogahf@mellanox.com> Cc: Yuval Mintz <yuvalm@mellanox.com> Cc: Jiri Pirko <jiri@mellanox.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-18flow_netlink: Remove unneeded semicolonsChristopher Díaz Riveros1-2/+2
Trivial fix removes unneeded semicolons after if blocks. This issue was detected by using the Coccinelle software. Signed-off-by: Christopher Díaz Riveros <chrisadr@gentoo.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-18net: sched: silence uninitialized parent variable warning in tc_dump_tfilterJiri Pirko1-0/+7
When tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK, parent is still passed down but the value is never used. Compiler does not recognize it and issues a warning. Silence it down initializing parent to 0. Fixes: 7960d1daf278 ("net: sched: use block index as a handle instead of qdisc when block is shared") Reported-by: David Miller <davem@davemloft.net> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17net: Remove spinlock from get_net_ns_by_id()Kirill Tkhai1-2/+0
idr_find() is safe under rcu_read_lock() and maybe_get_net() guarantees that net is alive. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17net: Fix possible race in peernet2id_alloc()Kirill Tkhai1-2/+11
peernet2id_alloc() is racy without rtnl_lock() as refcount_read(&peer->count) under net->nsid_lock does not guarantee, peer is alive: rcu_read_lock() peernet2id_alloc() .. spin_lock_bh(&net->nsid_lock) .. refcount_read(&peer->count) (!= 0) .. .. put_net() .. cleanup_net() .. for_each_net(tmp) .. spin_lock_bh(&tmp->nsid_lock) .. __peernet2id(tmp, net) == -1 .. .. .. .. __peernet2id_alloc(alloc == true) .. .. .. rcu_read_unlock() .. .. synchronize_rcu() .. kmem_cache_free(net) After the above situation, net::netns_id contains id pointing to freed memory, and any other dereferencing by the id will operate with this freed memory. Currently, peernet2id_alloc() is used under rtnl_lock() everywhere except ovs_vport_cmd_fill_info(), and this race can't occur. But peernet2id_alloc() is generic interface, and better we fix it before someone really starts use it in wrong context. v2: Don't place refcount_read(&net->count) under net->nsid_lock as suggested by Eric W. Biederman <ebiederm@xmission.com> v3: Rebase on top of net-next Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17net: sched: allow ingress and clsact qdiscs to share filter blocksJiri Pirko1-16/+64
Benefit from the previously introduced shared filter blocks infrastructure and allow ingress and clsact qdisc instances to share filter blocks. The block index is coming from userspace as qdisc option. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17net: sched: introduce ingress/egress block index attributes for qdiscJiri Pirko1-0/+60
Introduce two new attributes to be used for qdisc creation and dumping. One for ingress block, one for egress block. Introduce a set of ops that qdisc which supports block sharing would implement. Passing block indexes in qdisc change is not supported yet and it is checked and forbidded. In future, these attributes are to be reused for specifying block indexes for classes as well. As of this moment however, it is not supported so a check is in place to forbid it. Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17net: sched: use block index as a handle instead of qdisc when block is sharedJiri Pirko1-84/+118
As the tcm_ifindex with value TCM_IFINDEX_MAGIC_BLOCK is invalid ifindex, use it to indicate that we work with block, instead of qdisc. So if tcm_ifindex is set to TCM_IFINDEX_MAGIC_BLOCK, tcm_parent is used to carry block_index. If the block is set to be shared between at least 2 qdiscs, it is forbidden to use the qdisc handle to add/delete filters. In that case, userspace has to pass block_index. Also, for dump of the filters, in case the block is shared in between at least 2 qdiscs, the each filter is dumped with tcm_ifindex value TCM_IFINDEX_MAGIC_BLOCK and tcm_parent set to block_index. That gives the user clear indication, that the filter belongs to a shared block and not only to one qdisc under which it is dumped. Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17net: sched: keep track of offloaded filters and check tc offload featureJiri Pirko5-23/+81
During block bind, we need to check tc offload feature. If it is disabled yet still the block contains offloaded filters, forbid the bind. Also forbid to register callback for a block that already contains offloaded filters, as the play back is not supported now. For keeping track of offloaded filters there is a new counter introduced, alongside with couple of helpers called from cls_* code. These helpers set and clear TCA_CLS_FLAGS_IN_HW flag. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17net: sched: remove classid and q fields from tcf_protoJiri Pirko1-5/+2
Both are no longer used, so remove them. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17net: sched: introduce block mechanism to handle netif_keep_dst callsJiri Pirko4-4/+73
Couple of classifiers call netif_keep_dst directly on q->dev. That is not possible to do directly for shared blocke where multiple qdiscs are owning the block. So introduce a infrastructure to keep track of the block owners in list and use this list to implement block variant of netif_keep_dst. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17net: sched: avoid usage of tp->q in tcf_classifyJiri Pirko1-2/+3
Use block index in the messages instead. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17net: sched: introduce shared filter blocks infrastructureJiri Pirko1-24/+143
Allow qdiscs to share filter blocks among them. Each qdisc type has to use block get/put extended modifications that enable sharing. Shared blocks are tracked within each net namespace and identified by u32 index. This index is passed from user during the qdisc creation. If user passes index that is not used by any other qdisc, new block is created. If user passes index that is already used, the existing block will be re-used. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17net: sched: introduce support for multiple filter chain pointers registrationJiri Pirko1-8/+69
So far, there was possible only to register a single filter chain pointer to block->chain[0]. However, when the blocks will get shareable, we need to allow multiple filter chain pointers registration. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17net: sched: red: don't reset the backlog on every stat dumpJakub Kicinski1-1/+1
Commit 0dfb33a0d7e2 ("sch_red: report backlog information") copied child's backlog into RED's backlog. Back then RED did not maintain its own backlog counts. This has changed after commit 2ccccf5fb43f ("net_sched: update hierarchical backlog too") and commit d7f4f332f082 ("sch_red: update backlog as well"). Copying is no longer necessary. Tested: $ tc -s qdisc show dev veth0 qdisc red 1: root refcnt 2 limit 400000b min 30000b max 30000b ecn Sent 20942 bytes 221 pkt (dropped 0, overlimits 0 requeues 0) backlog 1260b 14p requeues 14 marked 0 early 0 pdrop 0 other 0 qdisc tbf 2: parent 1: rate 1Kbit burst 15000b lat 3585.0s Sent 20942 bytes 221 pkt (dropped 0, overlimits 138 requeues 0) backlog 1260b 14p requeues 14 Recently RED offload was added. We need to make sure drivers don't depend on resetting the stats. This means backlog should be treated like any other statistic: total_stat = new_hw_stat - prev_hw_stat; Adjust mlxsw. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Nogah Frankel <nogahf@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller28-153/+130
Overlapping changes all over. The mini-qdisc bits were a little bit tricky, however. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-4/+3
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-01-17 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add initial BPF map offloading for nfp driver. Currently only programs were supported so far w/o being able to access maps. Offloaded programs are right now only allowed to perform map lookups, and control path is responsible for populating the maps. BPF core infrastructure along with nfp implementation is provided, from Jakub. 2) Various follow-ups to Josef's BPF error injections. More specifically that includes: properly check whether the error injectable event is on function entry or not, remove the percpu bpf_kprobe_override and rather compare instruction pointer with original one, separate error-injection from kprobes since it's not limited to it, add injectable error types in order to specify what is the expected type of failure, and last but not least also support the kernel's fault injection framework, all from Masami. 3) Various misc improvements and cleanups to the libbpf Makefile. That is, fix permissions when installing BPF header files, remove unused variables and functions, and also install the libbpf.h header, from Jesper. 4) When offloading to nfp JIT and the BPF insn is unsupported in the JIT, then reject right at verification time. Also fix libbpf with regards to ELF section name matching by properly treating the program type as prefix. Both from Quentin. 5) Add -DPACKAGE to bpftool when including bfd.h for the disassembler. This is needed, for example, when building libfd from source as bpftool doesn't supply a config.h for bfd.h. Fix from Jiong. 6) xdp_convert_ctx_access() is simplified since it doesn't need to set target size during verification, from Jesper. 7) Let bpftool properly recognize BPF_PROG_TYPE_CGROUP_DEVICE program types, from Roman. 8) Various functions in BPF cpumap were not declared static, from Wei. 9) Fix a double semicolon in BPF samples, from Luis. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds27-143/+135
Pull networking fixes from David Miller: 1) Two read past end of buffer fixes in AF_KEY, from Eric Biggers. 2) Memory leak in key_notify_policy(), from Steffen Klassert. 3) Fix overflow with bpf arrays, from Daniel Borkmann. 4) Fix RDMA regression with mlx5 due to mlx5 no longer using pci_irq_get_affinity(), from Saeed Mahameed. 5) Missing RCU read locking in nl80211_send_iface() when it calls ieee80211_bss_get_ie(), from Dominik Brodowski. 6) cfg80211 should check dev_set_name()'s return value, from Johannes Berg. 7) Missing module license tag in 9p protocol, from Stephen Hemminger. 8) Fix crash due to too small MTU in udp ipv6 sendmsg, from Mike Maloney. 9) Fix endless loop in netlink extack code, from David Ahern. 10) TLS socket layer sets inverted error codes, resulting in an endless loop. From Robert Hering. 11) Revert openvswitch erspan tunnel support, it's mis-designed and we need to kill it before it goes into a real release. From William Tu. 12) Fix lan78xx failures in full speed USB mode, from Yuiko Oshino. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits) net, sched: fix panic when updating miniq {b,q}stats qed: Fix potential use-after-free in qed_spq_post() nfp: use the correct index for link speed table lan78xx: Fix failure in USB Full Speed sctp: do not allow the v4 socket to bind a v4mapped v6 address sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf sctp: reinit stream if stream outcnt has been change by sinit in sendmsg ibmvnic: Fix pending MAC address changes netlink: extack: avoid parenthesized string constant warning ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY net: Allow neigh contructor functions ability to modify the primary_key sh_eth: fix dumping ARSTR Revert "openvswitch: Add erspan tunnel support." net/tls: Fix inverted error codes to avoid endless loop ipv6: ip6_make_skb() needs to clear cork.base.dst sctp: avoid compiler warning on implicit fallthru net: ipv4: Make "ip route get" match iif lo rules again. netlink: extack needs to be reset each time through loop tipc: fix a memory leak in tipc_nl_node_get_link() ipv6: fix udpv6 sendmsg crash caused by too small MTU ...
2018-01-16net, sched: fix panic when updating miniq {b,q}statsDaniel Borkmann3-30/+22
While working on fixing another bug, I ran into the following panic on arm64 by simply attaching clsact qdisc, adding a filter and running traffic on ingress to it: [...] [ 178.188591] Unable to handle kernel read from unreadable memory at virtual address 810fb501f000 [ 178.197314] Mem abort info: [ 178.200121] ESR = 0x96000004 [ 178.203168] Exception class = DABT (current EL), IL = 32 bits [ 178.209095] SET = 0, FnV = 0 [ 178.212157] EA = 0, S1PTW = 0 [ 178.215288] Data abort info: [ 178.218175] ISV = 0, ISS = 0x00000004 [ 178.222019] CM = 0, WnR = 0 [ 178.224997] user pgtable: 4k pages, 48-bit VAs, pgd = 0000000023cb3f33 [ 178.231531] [0000810fb501f000] *pgd=0000000000000000 [ 178.236508] Internal error: Oops: 96000004 [#1] SMP [...] [ 178.311855] CPU: 73 PID: 2497 Comm: ping Tainted: G W 4.15.0-rc7+ #5 [ 178.319413] Hardware name: FOXCONN R2-1221R-A4/C2U4N_MB, BIOS G31FB18A 03/31/2017 [ 178.326887] pstate: 60400005 (nZCv daif +PAN -UAO) [ 178.331685] pc : __netif_receive_skb_core+0x49c/0xac8 [ 178.336728] lr : __netif_receive_skb+0x28/0x78 [ 178.341161] sp : ffff00002344b750 [ 178.344465] x29: ffff00002344b750 x28: ffff810fbdfd0580 [ 178.349769] x27: 0000000000000000 x26: ffff000009378000 [...] [ 178.418715] x1 : 0000000000000054 x0 : 0000000000000000 [ 178.424020] Process ping (pid: 2497, stack limit = 0x000000009f0a3ff4) [ 178.430537] Call trace: [ 178.432976] __netif_receive_skb_core+0x49c/0xac8 [ 178.437670] __netif_receive_skb+0x28/0x78 [ 178.441757] process_backlog+0x9c/0x160 [ 178.445584] net_rx_action+0x2f8/0x3f0 [...] Reason is that sch_ingress and sch_clsact are doing mini_qdisc_pair_init() which sets up miniq pointers to cpu_{b,q}stats from the underlying qdisc. Problem is that this cannot work since they are actually set up right after the qdisc ->init() callback in qdisc_create(), so first packet going into sch_handle_ingress() tries to call mini_qdisc_bstats_cpu_update() and we therefore panic. In order to fix this, allocation of {b,q}stats needs to happen before we call into ->init(). In net-next, there's already such option through commit d59f5ffa59d8 ("net: sched: a dflt qdisc may be used with per cpu stats"). However, the bug needs to be fixed in net still for 4.15. Thus, include these bits to reduce any merge churn and reuse the static_flags field to set TCQ_F_CPUSTATS, and remove the allocation from qdisc_create() since there is no other user left. Prashant Bhole ran into the same issue but for net-next, thus adding him below as well as co-author. Same issue was also reported by Sandipan Das when using bcc. Fixes: 46209401f8f6 ("net: core: introduce mini_Qdisc and eliminate usage of tp->q for clsact fastpath") Reference: https://lists.iovisor.org/pipermail/iovisor-dev/2018-January/001190.html Reported-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Co-authored-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Co-authored-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16net: delete /proc THIS_MODULE referencesAlexey Dobriyan61-96/+0
/proc has been ignoring struct file_operations::owner field for 10 years. Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where inode->i_fop is initialized with proxy struct file_operations for regular files: - if (de->proc_fops) - inode->i_fop = de->proc_fops; + if (de->proc_fops) { + if (S_ISREG(inode->i_mode)) + inode->i_fop = &proc_reg_file_ops; + else + inode->i_fop = de->proc_fops; + } VFS stopped pinning module at this point. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16tipc: fix race condition at topology server receiveJon Maloy3-46/+51
We have identified a race condition during reception of socket events and messages in the topology server. - The function tipc_close_conn() is releasing the corresponding struct tipc_subscriber instance without considering that there may still be items in the receive work queue. When those are scheduled, in the function tipc_receive_from_work(), they are using the subscriber pointer stored in struct tipc_conn, without first checking if this is valid or not. This will sometimes lead to crashes, as the next call of tipc_conn_recvmsg() will access the now deleted item. We fix this by making the usage of this pointer conditional on whether the connection is active or not. I.e., we check the condition test_bit(CF_CONNECTED) before making the call tipc_conn_recvmsg(). - Since the two functions may be running on different cores, the condition test described above is not enough. tipc_close_conn() may come in between and delete the subscriber item after the condition test is done, but before tipc_conn_recv_msg() is finished. This happens less frequently than the problem described above, but leads to the same symptoms. We fix this by using the existing sk_callback_lock for mutual exclusion in the two functions. In addition, we have to move a call to tipc_conn_terminate() outside the mentioned lock to avoid deadlock. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16Merge tag 'mac80211-for-davem-2018-01-15' of ↵David S. Miller4-9/+15
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== More fixes: * hwsim: - properly flush deletion works at module unload - validate # of channels passed from userspace * cfg80211: - fix RCU locking regression - initialize on-stack channel data for nl80211 event - check dev_set_name() return value ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16sctp: do not allow the v4 socket to bind a v4mapped v6 addressXin Long1-8/+6
The check in sctp_sockaddr_af is not robust enough to forbid binding a v4mapped v6 addr on a v4 socket. The worse thing is that v4 socket's bind_verify would not convert this v4mapped v6 addr to a v4 addr. syzbot even reported a crash as the v4 socket bound a v6 addr. This patch is to fix it by doing the common sa.sa_family check first, then AF_INET check for v4mapped v6 addrs. Fixes: 7dab83de50c7 ("sctp: Support ipv6only AF_INET6 sockets.") Reported-by: syzbot+7b7b518b1228d2743963@syzkaller.appspotmail.com Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbufXin Long1-10/+6
After commit cea0cc80a677 ("sctp: use the right sk after waking up from wait_buf sleep"), it may change to lock another sk if the asoc has been peeled off in sctp_wait_for_sndbuf. However, the asoc's new sk could be already closed elsewhere, as it's in the sendmsg context of the old sk that can't avoid the new sk's closing. If the sk's last one refcnt is held by this asoc, later on after putting this asoc, the new sk will be freed, while under it's own lock. This patch is to revert that commit, but fix the old issue by returning error under the old sk's lock. Fixes: cea0cc80a677 ("sctp: use the right sk after waking up from wait_buf sleep") Reported-by: syzbot+ac6ea7baa4432811eb50@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16sctp: reinit stream if stream outcnt has been change by sinit in sendmsgXin Long1-2/+8
After introducing sctp_stream structure, sctp uses stream->outcnt as the out stream nums instead of c.sinit_num_ostreams. However when users use sinit in cmsg, it only updates c.sinit_num_ostreams in sctp_sendmsg. At that moment, stream->outcnt is still using previous value. If it's value is not updated, the sinit_num_ostreams of sinit could not really work. This patch is to fix it by updating stream->outcnt and reiniting stream if stream outcnt has been change by sinit in sendmsg. Fixes: a83863174a61 ("sctp: prepare asoc stream for stream reconf") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16devlink: Add relation between dpipe and resourceArkadi Sharshevsky1-0/+37
The hardware processes which are modeled via dpipe commonly use some internal hardware resources. Such relation can improve the understanding of hardware limitations. The number of resource's unit consumed per table's entry are also provided for each table. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16devlink: Add support for reloadArkadi Sharshevsky1-0/+47
Add support for performing driver hot reload. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16devlink: Add support for resource abstractionArkadi Sharshevsky1-0/+374
Add support for hardware resource abstraction over devlink. Each resource is identified via id, furthermore it contains information regarding its size and its related sub resources. Each resource can also provide its current occupancy. In some cases the sizes of some resources can be changed, yet for those changes to take place a hot driver reload may be needed. The reload capability will be introduced in the next patch. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16devlink: Add per devlink instance lockArkadi Sharshevsky1-59/+77
This is a preparation before introducing resources and hot reload support. Currently there are two global lock where one protects all devlink access, and the second one protects devlink port access. This patch adds per devlink instance lock which protects the internal members which are the sb/dpipe/ resource/ports. By introducing this lock the global devlink port lock can be discarded. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15Merge tag 'linux-can-next-for-4.16-20180105' of ↵David S. Miller4-15/+17
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2017-12-01,Re: pull-request: can-next this is a pull request of 7 patches for net-next/master. All patches are by me. Patch 6 is for the "can_raw" protocol and add error checking to the bind() function. All other patches clean up the coding style and remove unused parameters in various CAN drivers and infrastructure. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANYJim Westfall1-1/+6
Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices to avoid making an entry for every remote ip the device needs to talk to. This used the be the old behavior but became broken in a263b3093641f (ipv4: Make neigh lookups directly in output packet path) and later removed in 0bb4087cbec0 (ipv4: Fix neigh lookup keying over loopback/point-to-point devices) because it was broken. Signed-off-by: Jim Westfall <jwestfall@surrealistic.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15net: Allow neigh contructor functions ability to modify the primary_keyJim Westfall1-2/+2
Use n->primary_key instead of pkey to account for the possibility that a neigh constructor function may have modified the primary_key value. Signed-off-by: Jim Westfall <jwestfall@surrealistic.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15Revert "openvswitch: Add erspan tunnel support."William Tu1-50/+1
This reverts commit ceaa001a170e43608854d5290a48064f57b565ed. The OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS attr should be designed as a nested attribute to support all ERSPAN v1 and v2's fields. The current attr is a be32 supporting only one field. Thus, this patch reverts it and later patch will redo it using nested attr. Signed-off-by: William Tu <u9012063@gmail.com> Cc: Jiri Benc <jbenc@redhat.com> Cc: Pravin Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15ipv6: Fix build with gcc-4.4.5Ido Schimmel1-2/+6
Emil reported the following compiler errors: net/ipv6/route.c: In function `rt6_sync_up`: net/ipv6/route.c:3586: error: unknown field `nh_flags` specified in initializer net/ipv6/route.c:3586: warning: missing braces around initializer net/ipv6/route.c:3586: warning: (near initialization for `arg.<anonymous>`) net/ipv6/route.c: In function `rt6_sync_down_dev`: net/ipv6/route.c:3695: error: unknown field `event` specified in initializer net/ipv6/route.c:3695: warning: missing braces around initializer net/ipv6/route.c:3695: warning: (near initialization for `arg.<anonymous>`) Problem is with the named initializers for the anonymous union members. Fix this by adding curly braces around the initialization. Fixes: 4c981e28d373 ("ipv6: Prepare to handle multiple netdev events") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Emil S Tantilov <emils.tantilov@gmail.com> Tested-by: Emil S Tantilov <emils.tantilov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15tipc: fix bug during lookup of multicast destination nodesJon Maloy3-8/+4
In commit 232d07b74a33 ("tipc: improve groupcast scope handling") we inadvertently broke non-group multicast transmission when changing the parameter 'domain' to 'scope' in the function tipc_nametbl_lookup_dst_nodes(). We missed to make the corresponding change in the calling function, with the result that the lookup always fails. A closer anaysis reveals that this parameter is not needed at all. Non-group multicast is hard coded to use CLUSTER_SCOPE, and in the current implementation this will be delivered to all matching destinations except those which are published with NODE_SCOPE on other nodes. Since such publications never will be visible on the sending node anyway, it makes no sense to discriminate by scope at all. We now remove this parameter altogether. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15net: Convert atomic_t net::count to refcount_tKirill Tkhai4-10/+10
Since net could be obtained from RCU lists, and there is a race with net destruction, the patch converts net::count to refcount_t. This provides sanity checks for the cases of incrementing counter of already dead net, when maybe_get_net() has to used instead of get_net(). Drivers: allyesconfig and allmodconfig are OK. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15net/tls: Fix inverted error codes to avoid endless loopr.hering@avm.de1-2/+2
sendfile() calls can hang endless with using Kernel TLS if a socket error occurs. Socket error codes must be inverted by Kernel TLS before returning because they are stored with positive sign. If returned non-inverted they are interpreted as number of bytes sent, causing endless looping of the splice mechanic behind sendfile(). Signed-off-by: Robert Hering <r.hering@avm.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15ipv6: ip6_make_skb() needs to clear cork.base.dstEric Dumazet1-0/+1
In my last patch, I missed fact that cork.base.dst was not initialized in ip6_make_skb() : If ip6_setup_cork() returns an error, we might attempt a dst_release() on some random pointer. Fixes: 862c03ee1deb ("ipv6: fix possible mem leaks in ipv6_make_skb()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15sctp: removed unused var from sctp_make_authMarcelo Ricardo Leitner1-2/+1
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15sctp: avoid compiler warning on implicit fallthruMarcelo Ricardo Leitner2-2/+3
These fall-through are expected. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15net: ipv4: Make "ip route get" match iif lo rules again.Lorenzo Colitti1-0/+1
Commit 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup") broke "ip route get" in the presence of rules that specify iif lo. Host-originated traffic always has iif lo, because ip_route_output_key_hash and ip6_route_output_flags set the flow iif to LOOPBACK_IFINDEX. Thus, putting "iif lo" in an ip rule is a convenient way to select only originated traffic and not forwarded traffic. inet_rtm_getroute used to match these rules correctly because even though it sets the flow iif to 0, it called ip_route_output_key which overwrites iif with LOOPBACK_IFINDEX. But now that it calls ip_route_output_key_hash_rcu, the ifindex will remain 0 and not match the iif lo in the rule. As a result, "ip route get" will return ENETUNREACH. Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup") Tested: https://android.googlesource.com/kernel/tests/+/master/net/test/multinetwork_test.py passes again Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15netlink: extack needs to be reset each time through loopDavid Ahern1-1/+2
syzbot triggered the WARN_ON in netlink_ack testing the bad_attr value. The problem is that netlink_rcv_skb loops over the skb repeatedly invoking the callback and without resetting the extack leaving potentially stale data. Initializing each time through avoids the WARN_ON. Fixes: 2d4bc93368f5a ("netlink: extended ACK reporting") Reported-by: syzbot+315fa6766d0f7c359327@syzkaller.appspotmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15tipc: fix a memory leak in tipc_nl_node_get_link()Cong Wang1-12/+14
When tipc_node_find_by_name() fails, the nlmsg is not freed. While on it, switch to a goto label to properly free it. Fixes: be9c086715c ("tipc: narrow down exposure of struct tipc_node") Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Ying Xue <ying.xue@windriver.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15tipc: fix a potental access after delete in tipc_sk_join()Jon Maloy1-0/+1
In commit d12d2e12cec2 "tipc: send out join messages as soon as new member is discovered") we added a call to the function tipc_group_join() without considering the case that the preceding tipc_sk_publish() might have failed, and the group item already deleted. We fix this by returning from tipc_sk_join() directly after the failed tipc_sk_publish. Reported-by: syzbot+e3eeae78ea88b8d6d858@syzkaller.appspotmail.com Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15ipv6: fix udpv6 sendmsg crash caused by too small MTUMike Maloney1-2/+4
The logic in __ip6_append_data() assumes that the MTU is at least large enough for the headers. A device's MTU may be adjusted after being added while sendmsg() is processing data, resulting in __ip6_append_data() seeing any MTU. For an mtu smaller than the size of the fragmentation header, the math results in a negative 'maxfraglen', which causes problems when refragmenting any previous skb in the skb_write_queue, leaving it possibly malformed. Instead sendmsg returns EINVAL when the mtu is calculated to be less than IPV6_MIN_MTU. Found by syzkaller: kernel BUG at ./include/linux/skbuff.h:2064! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 14216 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #2 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff8801d0b68580 task.stack: ffff8801ac6b8000 RIP: 0010:__skb_pull include/linux/skbuff.h:2064 [inline] RIP: 0010:__ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP: 0018:ffff8801ac6bf570 EFLAGS: 00010216 RAX: 0000000000010000 RBX: 0000000000000028 RCX: ffffc90003cce000 RDX: 00000000000001b8 RSI: ffffffff839df06f RDI: ffff8801d9478ca0 RBP: ffff8801ac6bf780 R08: ffff8801cc3f1dbc R09: 0000000000000000 R10: ffff8801ac6bf7a0 R11: 43cb4b7b1948a9e7 R12: ffff8801cc3f1dc8 R13: ffff8801cc3f1d40 R14: 0000000000001036 R15: dffffc0000000000 FS: 00007f43d740c700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7834984000 CR3: 00000001d79b9000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ip6_finish_skb include/net/ipv6.h:911 [inline] udp_v6_push_pending_frames+0x255/0x390 net/ipv6/udp.c:1093 udpv6_sendmsg+0x280d/0x31a0 net/ipv6/udp.c:1363 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762 sock_sendmsg_nosec net/socket.c:633 [inline] sock_sendmsg+0xca/0x110 net/socket.c:643 SYSC_sendto+0x352/0x5a0 net/socket.c:1750 SyS_sendto+0x40/0x50 net/socket.c:1718 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x4512e9 RSP: 002b:00007f43d740bc08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00000000007180a8 RCX: 00000000004512e9 RDX: 000000000000002e RSI: 0000000020d08000 RDI: 0000000000000005 RBP: 0000000000000086 R08: 00000000209c1000 R09: 000000000000001c R10: 0000000000040800 R11: 0000000000000216 R12: 00000000004b9c69 R13: 00000000ffffffff R14: 0000000000000005 R15: 00000000202c2000 Code: 9e 01 fe e9 c5 e8 ff ff e8 7f 9e 01 fe e9 4a ea ff ff 48 89 f7 e8 52 9e 01 fe e9 aa eb ff ff e8 a8 b6 cf fd 0f 0b e8 a1 b6 cf fd <0f> 0b 49 8d 45 78 4d 8d 45 7c 48 89 85 78 fe ff ff 49 8d 85 ba RIP: __skb_pull include/linux/skbuff.h:2064 [inline] RSP: ffff8801ac6bf570 RIP: __ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP: ffff8801ac6bf570 Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Mike Maloney <maloney@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-159p: add missing module license for xen transportStephen Hemminger1-0/+4
The 9P of Xen module is missing required license and module information. See https://bugzilla.kernel.org/show_bug.cgi?id=198109 Reported-by: Alan Bartlett <ajb@elrepo.org> Fixes: 868eb122739a ("xen/9pfs: introduce Xen 9pfs transport driver") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15cfg80211: check dev_set_name() return valueJohannes Berg1-1/+7
syzbot reported a warning from rfkill_alloc(), and after a while I think that the reason is that it was doing fault injection and the dev_set_name() failed, leaving the name NULL, and we didn't check the return value and got to rfkill_alloc() with a NULL name. Since we really don't want a NULL name, we ought to check the return value. Fixes: fb28ad35906a ("net: struct device - replace bus_id with dev_name(), dev_set_name()") Reported-by: syzbot+1ddfb3357e1d7bb5b5d3@syzkaller.appspotmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-01-15mac80211_hwsim: validate number of different channelsJohannes Berg1-2/+0
When creating a new radio on the fly, hwsim allows this to be done with an arbitrary number of channels, but cfg80211 only supports a limited number of simultaneous channels, leading to a warning. Fix this by validating the number - this requires moving the define for the maximum out to a visible header file. Reported-by: syzbot+8dd9051ff19940290931@syzkaller.appspotmail.com Fixes: b59ec8dd4394 ("mac80211_hwsim: fix number of channels in interface combinations") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-01-15nl80211: take RCU read lock when calling ieee80211_bss_get_ie()Dominik Brodowski1-4/+7
As ieee80211_bss_get_ie() derefences an RCU to return ssid_ie, both the call to this function and any operation on this variable need protection by the RCU read lock. Fixes: 44905265bc15 ("nl80211: don't expose wdev->ssid for most interfaces") Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Johannes Berg <johannes.berg@intel.com>