summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2017-07-03Merge https://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller4-5/+17
Some overlapping changes in the mlx5 driver. A merge conflict resolution posted by Stephen Rothwell was used as a guide. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03net: make sk_ehashfn() staticEric Dumazet2-2/+1
sk_ehashfn() is only used from a single file. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03net: avoid one splat in fib_nl_delrule()Eric Dumazet1-1/+1
We need to use refcount_set() on a newly created rule to avoid following error : [ 64.601749] ------------[ cut here ]------------ [ 64.601757] WARNING: CPU: 0 PID: 6476 at lib/refcount.c:184 refcount_sub_and_test+0x75/0xa0 [ 64.601758] Modules linked in: w1_therm wire cdc_acm ehci_pci ehci_hcd mlx4_en ib_uverbs mlx4_ib ib_core mlx4_core [ 64.601769] CPU: 0 PID: 6476 Comm: ip Tainted: G W 4.12.0-smp-DEV #274 [ 64.601771] task: ffff8837bf482040 task.stack: ffff8837bdc08000 [ 64.601773] RIP: 0010:refcount_sub_and_test+0x75/0xa0 [ 64.601774] RSP: 0018:ffff8837bdc0f5c0 EFLAGS: 00010286 [ 64.601776] RAX: 0000000000000026 RBX: 0000000000000001 RCX: 0000000000000000 [ 64.601777] RDX: 0000000000000026 RSI: 0000000000000096 RDI: ffffed06f7b81eae [ 64.601778] RBP: ffff8837bdc0f5d0 R08: 0000000000000004 R09: fffffbfff4a54c25 [ 64.601779] R10: 00000000cbc500e5 R11: ffffffffa52a6128 R12: ffff881febcf6f24 [ 64.601779] R13: ffff881fbf4eaf00 R14: ffff881febcf6f80 R15: ffff8837d7a4ed00 [ 64.601781] FS: 00007ff5a2f6b700(0000) GS:ffff881fff800000(0000) knlGS:0000000000000000 [ 64.601782] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 64.601783] CR2: 00007ffcdc70d000 CR3: 0000001f9c91e000 CR4: 00000000001406f0 [ 64.601783] Call Trace: [ 64.601786] refcount_dec_and_test+0x11/0x20 [ 64.601790] fib_nl_delrule+0xc39/0x1630 [ 64.601793] ? is_bpf_text_address+0xe/0x20 [ 64.601795] ? fib_nl_newrule+0x25e0/0x25e0 [ 64.601798] ? depot_save_stack+0x133/0x470 [ 64.601801] ? ns_capable+0x13/0x20 [ 64.601803] ? __netlink_ns_capable+0xcc/0x100 [ 64.601806] rtnetlink_rcv_msg+0x23a/0x6a0 [ 64.601808] ? rtnl_newlink+0x1630/0x1630 [ 64.601811] ? memset+0x31/0x40 [ 64.601813] netlink_rcv_skb+0x2d7/0x440 [ 64.601815] ? rtnl_newlink+0x1630/0x1630 [ 64.601816] ? netlink_ack+0xaf0/0xaf0 [ 64.601818] ? kasan_unpoison_shadow+0x35/0x50 [ 64.601820] ? __kmalloc_node_track_caller+0x4c/0x70 [ 64.601821] rtnetlink_rcv+0x28/0x30 [ 64.601823] netlink_unicast+0x422/0x610 [ 64.601824] ? netlink_attachskb+0x650/0x650 [ 64.601826] netlink_sendmsg+0x7b7/0xb60 [ 64.601828] ? netlink_unicast+0x610/0x610 [ 64.601830] ? netlink_unicast+0x610/0x610 [ 64.601832] sock_sendmsg+0xba/0xf0 [ 64.601834] ___sys_sendmsg+0x6a9/0x8c0 [ 64.601835] ? copy_msghdr_from_user+0x520/0x520 [ 64.601837] ? __alloc_pages_nodemask+0x160/0x520 [ 64.601839] ? memcg_write_event_control+0xd60/0xd60 [ 64.601841] ? __alloc_pages_slowpath+0x1d50/0x1d50 [ 64.601843] ? kasan_slab_free+0x71/0xc0 [ 64.601845] ? mem_cgroup_commit_charge+0xb2/0x11d0 [ 64.601847] ? lru_cache_add_active_or_unevictable+0x7d/0x1a0 [ 64.601849] ? __handle_mm_fault+0x1af8/0x2810 [ 64.601851] ? may_open_dev+0xc0/0xc0 [ 64.601852] ? __pmd_alloc+0x2c0/0x2c0 [ 64.601853] ? __fdget+0x13/0x20 [ 64.601855] __sys_sendmsg+0xc6/0x150 [ 64.601856] ? __sys_sendmsg+0xc6/0x150 [ 64.601857] ? SyS_shutdown+0x170/0x170 [ 64.601859] ? handle_mm_fault+0x28a/0x650 [ 64.601861] SyS_sendmsg+0x12/0x20 [ 64.601863] entry_SYSCALL_64_fastpath+0x13/0x94 Fixes: 717d1e993ad8 ("net: convert fib_rule.refcnt from atomic_t to refcount_t") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03mlx4_en: make mlx4_log_num_mgm_entry_size staticZhu Yanjun2-2/+1
The variable mlx4_log_num_mgm_entry_size is only called in main.c. CC: Joe Jin <joe.jin@oracle.com> CC: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03net: core: Fix slab-out-of-bounds in netdev_stats_to_stats64Alban Browaeys1-1/+1
commit 9256645af098 ("net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64") made an attempt to read beyond the size of the source a possibility. Fix to only copy src size to dest. As dest might be bigger than src. ================================================================== BUG: KASAN: slab-out-of-bounds in netdev_stats_to_stats64+0xe/0x30 at addr ffff8801be248b20 Read of size 192 by task VBoxNetAdpCtl/6734 CPU: 1 PID: 6734 Comm: VBoxNetAdpCtl Tainted: G O 4.11.4prahal+intel+ #118 Hardware name: LENOVO 20CDCTO1WW/20CDCTO1WW, BIOS GQET52WW (1.32 ) 05/04/2017 Call Trace: dump_stack+0x63/0x86 kasan_object_err+0x1c/0x70 kasan_report+0x270/0x520 ? netdev_stats_to_stats64+0xe/0x30 ? sched_clock_cpu+0x1b/0x190 ? __module_address+0x3e/0x3b0 ? unwind_next_frame+0x1ea/0xb00 check_memory_region+0x13c/0x1a0 memcpy+0x23/0x50 netdev_stats_to_stats64+0xe/0x30 dev_get_stats+0x1b9/0x230 rtnl_fill_stats+0x44/0xc00 ? nla_put+0xc6/0x130 rtnl_fill_ifinfo+0xe9e/0x3700 ? rtnl_fill_vfinfo+0xde0/0xde0 ? sched_clock+0x9/0x10 ? sched_clock+0x9/0x10 ? sched_clock_local+0x120/0x130 ? __module_address+0x3e/0x3b0 ? unwind_next_frame+0x1ea/0xb00 ? sched_clock+0x9/0x10 ? sched_clock+0x9/0x10 ? sched_clock_cpu+0x1b/0x190 ? VBoxNetAdpLinuxIOCtlUnlocked+0x14b/0x280 [vboxnetadp] ? depot_save_stack+0x1d8/0x4a0 ? depot_save_stack+0x34f/0x4a0 ? depot_save_stack+0x34f/0x4a0 ? save_stack+0xb1/0xd0 ? save_stack_trace+0x16/0x20 ? save_stack+0x46/0xd0 ? kasan_slab_alloc+0x12/0x20 ? __kmalloc_node_track_caller+0x10d/0x350 ? __kmalloc_reserve.isra.36+0x2c/0xc0 ? __alloc_skb+0xd0/0x560 ? rtmsg_ifinfo_build_skb+0x61/0x120 ? rtmsg_ifinfo.part.25+0x16/0xb0 ? rtmsg_ifinfo+0x47/0x70 ? register_netdev+0x15/0x30 ? vboxNetAdpOsCreate+0xc0/0x1c0 [vboxnetadp] ? vboxNetAdpCreate+0x210/0x400 [vboxnetadp] ? VBoxNetAdpLinuxIOCtlUnlocked+0x14b/0x280 [vboxnetadp] ? do_vfs_ioctl+0x17f/0xff0 ? SyS_ioctl+0x74/0x80 ? do_syscall_64+0x182/0x390 ? __alloc_skb+0xd0/0x560 ? __alloc_skb+0xd0/0x560 ? save_stack_trace+0x16/0x20 ? init_object+0x64/0xa0 ? ___slab_alloc+0x1ae/0x5c0 ? ___slab_alloc+0x1ae/0x5c0 ? __alloc_skb+0xd0/0x560 ? sched_clock+0x9/0x10 ? kasan_unpoison_shadow+0x35/0x50 ? kasan_kmalloc+0xad/0xe0 ? __kmalloc_node_track_caller+0x246/0x350 ? __alloc_skb+0xd0/0x560 ? kasan_unpoison_shadow+0x35/0x50 ? memset+0x31/0x40 ? __alloc_skb+0x31f/0x560 ? napi_consume_skb+0x320/0x320 ? br_get_link_af_size_filtered+0xb7/0x120 [bridge] ? if_nlmsg_size+0x440/0x630 rtmsg_ifinfo_build_skb+0x83/0x120 rtmsg_ifinfo.part.25+0x16/0xb0 rtmsg_ifinfo+0x47/0x70 register_netdevice+0xa2b/0xe50 ? __kmalloc+0x171/0x2d0 ? netdev_change_features+0x80/0x80 register_netdev+0x15/0x30 vboxNetAdpOsCreate+0xc0/0x1c0 [vboxnetadp] vboxNetAdpCreate+0x210/0x400 [vboxnetadp] ? vboxNetAdpComposeMACAddress+0x1d0/0x1d0 [vboxnetadp] ? kasan_check_write+0x14/0x20 VBoxNetAdpLinuxIOCtlUnlocked+0x14b/0x280 [vboxnetadp] ? VBoxNetAdpLinuxOpen+0x20/0x20 [vboxnetadp] ? lock_acquire+0x11c/0x270 ? __audit_syscall_entry+0x2fb/0x660 do_vfs_ioctl+0x17f/0xff0 ? __audit_syscall_entry+0x2fb/0x660 ? ioctl_preallocate+0x1d0/0x1d0 ? __audit_syscall_entry+0x2fb/0x660 ? kmem_cache_free+0xb2/0x250 ? syscall_trace_enter+0x537/0xd00 ? exit_to_usermode_loop+0x100/0x100 SyS_ioctl+0x74/0x80 ? do_sys_open+0x350/0x350 ? do_vfs_ioctl+0xff0/0xff0 do_syscall_64+0x182/0x390 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f7e39a1ae07 RSP: 002b:00007ffc6f04c6d8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffc6f04c730 RCX: 00007f7e39a1ae07 RDX: 00007ffc6f04c730 RSI: 00000000c0207601 RDI: 0000000000000007 RBP: 00007ffc6f04c700 R08: 00007ffc6f04c780 R09: 0000000000000008 R10: 0000000000000541 R11: 0000000000000206 R12: 0000000000000007 R13: 00000000c0207601 R14: 00007ffc6f04c730 R15: 0000000000000012 Object at ffff8801be248008, in cache kmalloc-4096 size: 4096 Allocated: PID = 6734 save_stack_trace+0x16/0x20 save_stack+0x46/0xd0 kasan_kmalloc+0xad/0xe0 __kmalloc+0x171/0x2d0 alloc_netdev_mqs+0x8a7/0xbe0 vboxNetAdpOsCreate+0x65/0x1c0 [vboxnetadp] vboxNetAdpCreate+0x210/0x400 [vboxnetadp] VBoxNetAdpLinuxIOCtlUnlocked+0x14b/0x280 [vboxnetadp] do_vfs_ioctl+0x17f/0xff0 SyS_ioctl+0x74/0x80 do_syscall_64+0x182/0x390 return_from_SYSCALL_64+0x0/0x6a Freed: PID = 5600 save_stack_trace+0x16/0x20 save_stack+0x46/0xd0 kasan_slab_free+0x73/0xc0 kfree+0xe4/0x220 kvfree+0x25/0x30 single_release+0x74/0xb0 __fput+0x265/0x6b0 ____fput+0x9/0x10 task_work_run+0xd5/0x150 exit_to_usermode_loop+0xe2/0x100 do_syscall_64+0x26c/0x390 return_from_SYSCALL_64+0x0/0x6a Memory state around the buggy address: ffff8801be248a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8801be248b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8801be248b80: 00 00 00 00 00 00 00 00 00 00 00 07 fc fc fc fc ^ ffff8801be248c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8801be248c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Signed-off-by: Alban Browaeys <alban.browaeys@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03netxen_nic: Remove unused pointer hdr in netxen_setup_minidump()Christos Gkekas1-3/+0
Pointer hdr in netxen_setup_minidump() is set but never used, thus should be removed. Signed-off-by: Christos Gkekas <chris.gekas@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03Merge branch 'vxlan-geneve-fix-hlist-corruption'David S. Miller3-28/+62
Jiri Benc says: ==================== vxlan, geneve: fix hlist corruption Fix memory corruption introduced with the support of both IPv4 and IPv6 sockets in a single device. The same bug is present in VXLAN and Geneve. ==================== Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03geneve: fix hlist corruptionJiri Benc1-16/+32
It's not a good idea to add the same hlist_node to two different hash lists. This leads to various hard to debug memory corruptions. Fixes: 8ed66f0e8235 ("geneve: implement support for IPv6-based tunnels") Cc: John W. Linville <linville@tuxdriver.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03vxlan: fix hlist corruptionJiri Benc2-12/+30
It's not a good idea to add the same hlist_node to two different hash lists. This leads to various hard to debug memory corruptions. Fixes: b1be00a6c39f ("vxlan: support both IPv4 and IPv6 sockets in a single vxlan device") Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03net/mlxfw: Properly handle dependancy with non-loadable mlx5Or Gerlitz1-1/+1
If mlx5 is set to be built-in and mlxfw as a module, we get a link error: drivers/built-in.o: In function `mlx5_firmware_flash': (.text+0x5aed72): undefined reference to `mlxfw_firmware_flash' Since we don't want to mandate selecting mlxfw for mlx5 users, we use the IS_REACHABLE macro to make sure that a stub is exposed to the caller. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Jakub Kicinski <kubakici@wp.pl> Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03iucv: Convert sk_wmem_alloc accesses to refcount_t.David S. Miller1-1/+1
Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03ctcm_fsms: Convert skb->user accesses to refcount_tDavid S. Miller1-6/+6
Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03Merge branch 'bpf-misc-helper-verifier-improvements'David S. Miller10-208/+604
Daniel Borkmann says: ==================== Misc BPF helper/verifier improvements Miscellanous improvements I still had in my queue, it adds a new bpf_skb_adjust_room() helper for cls_bpf, exports to fdinfo whether tail call array owner is JITed, so iproute2 error reporting can be improved on that regard, a small cleanup and extension to trace printk, two verifier patches, one to make the code around narrower ctx access a bit more straight forward and one to allow for imm += x operations, that we've seen LLVM generating and the verifier currently rejecting. We've included the patch 6 given it's rather small and we ran into it from LLVM side, it would be great if it could be queued for stable as well after the merge window. Last but not least, test cases are added also related to imm alu improvement. Thanks a lot! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03bpf: add various test cases for verifier selftestDaniel Borkmann1-0/+165
Add couple of verifier test cases for x|imm += pkt_ptr, including the imm += x extension. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03bpf, verifier: add additional patterns to evaluate_reg_imm_aluJohn Fastabend1-0/+62
Currently the verifier does not track imm across alu operations when the source register is of unknown type. This adds additional pattern matching to catch this and track imm. We've seen LLVM generating this pattern while working on cilium. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03bpf: extend bpf_trace_printk to support %iJohn Fastabend1-3/+4
Currently, bpf_trace_printk does not support common formatting symbol '%i' however vsprintf does and is what eventually gets called by bpf helper. If users are used to '%i' and currently make use of it, then bpf_trace_printk will just return with error without dumping anything to the trace pipe, so just add support for '%i' to the helper. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03bpf: export whether tail call has jited ownerDaniel Borkmann1-1/+6
We do export through fdinfo already whether a prog is JITed or not, given a program load can fail in case of either prog or tail call map has JITed property, but neither both are JITed or not JITed, we can facilitate error reporting in loaders like iproute2 through exporting owner_jited of tail call map. We already do export owner_prog_type through this facility, so parser can pick up both for comparison. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03bpf: simplify narrower ctx accessDaniel Borkmann5-195/+209
This work tries to make the semantics and code around the narrower ctx access a bit easier to follow. Right now everything is done inside the .is_valid_access(). Offset matching is done differently for read/write types, meaning writes don't support narrower access and thus matching only on offsetof(struct foo, bar) is enough whereas for read case that supports narrower access we must check for offsetof(struct foo, bar) + offsetof(struct foo, bar) + sizeof(<bar>) - 1 for each of the cases. For read cases of individual members that don't support narrower access (like packet pointers or skb->cb[] case which has its own narrow access logic), we check as usual only offsetof(struct foo, bar) like in write case. Then, for the case where narrower access is allowed, we also need to set the aux info for the access. Meaning, ctx_field_size and converted_op_size have to be set. First is the original field size e.g. sizeof(<bar>) as in above example from the user facing ctx, and latter one is the target size after actual rewrite happened, thus for the kernel facing ctx. Also here we need the range match and we need to keep track changing convert_ctx_access() and converted_op_size from is_valid_access() as both are not at the same location. We can simplify the code a bit: check_ctx_access() becomes simpler in that we only store ctx_field_size as a meta data and later in convert_ctx_accesses() we fetch the target_size right from the location where we do convert. Should the verifier be misconfigured we do reject for BPF_WRITE cases or target_size that are not provided. For the subsystems, we always work on ranges in is_valid_access() and add small helpers for ranges and narrow access, convert_ctx_accesses() sets target_size for the relevant instruction. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Cc: Yonghong Song <yhs@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03bpf: add bpf_skb_adjust_room helperDaniel Borkmann3-7/+151
This work adds a helper that can be used to adjust net room of an skb. The helper is generic and can be further extended in future. Main use case is for having a programmatic way to add/remove room to v4/v6 header options along with cls_bpf on egress and ingress hook of the data path. It reuses most of the infrastructure that we added for the bpf_skb_change_type() helper which can be used in nat64 translations. Similarly, the helper only takes care of adjusting the room so that related data is populated and csum adapted out of the BPF program using it. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03bpf, net: add skb_mac_header_len helperDaniel Borkmann2-2/+7
Add a small skb_mac_header_len() helper similarly as the skb_network_header_len() we have and replace open coded places in BPF's bpf_skb_change_proto() helper. Will also be used in upcoming work. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03net: cdc_mbim: apply "NDP to end" quirk to HP lt4132Tore Anderson1-0/+7
The HP lt4132 LTE/HSPA+ 4G Module (03f0:a31d) is a rebranded Huawei ME906s-158 device. It, like the ME906s-158, requires the "NDP to end" quirk for correct operation. Signed-off-by: Tore Anderson <tore@fud.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03Documentation: fix wrong example commandMatteo Croce1-2/+2
In the IPVLAN documentation there is an example command line where the master and slave interface names are inverted. Fix the command line and also add the optional `name' keyword to better describe what the command is doing. v2: added commit message Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03vxlan: correctly set vxlan->net when creating the device in a netnsSabrina Dubroca1-3/+6
Commit a985343ba906 ("vxlan: refactor verification and application of configuration") modified vxlan device creation, and replaced the assignment of vxlan->net to src_net with dev_net(netdev) in ->setup(). But dev_net(netdev) is not the same as src_net. At the time ->setup() is called, dev_net hasn't been set yet, so we end up creating the socket for the vxlan device in init_net. Fix this by bringing back the assignment of vxlan->net during device creation. Fixes: a985343ba906 ("vxlan: refactor verification and application of configuration") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03Merge branch 'hns-phy-loopback'David S. Miller5-71/+92
Lin Yun Sheng says: ==================== Add loopback support in phy_driver and hns ethtool fix This Patch Set add set_loopback in phy_driver and use it to setup loopback when doing ethtool phy self_test. Patch V8: Respin the Patch based on net-next Patch V7: 1. Add comment why resume the phy in hns_nic_config_phy_loopback. 2. Fix a typo error in patch description. Patch V6: Fix Or'ing error code in __lb_setup. Patch V5: Removing non loopback related code change. Patch V4: 1. Remove c45 checking 2. Add -ENOTSUPP when function pointer is null, take mutex in phy_loopback. Patch V3: Calling phy_loopback enable and disable in pair in hns mac driver. Patch V2: 1. Add phy_loopback in phy_device.c. 2. Do error checking and do the read and write once in genphy_loopback. 3. Remove gen10g_loopback in phy_device.c. Patch V1: Initial Submit ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03net: hns: Use phy_driver to setup Phy loopbackLin Yun Sheng2-71/+35
Use function set_loopback in phy_driver to setup phy loopback when doing ethtool self test. Signed-off-by: Lin Yun Sheng <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03net: phy: Add phy loopback support in net phy frameworkLin Yun Sheng3-0/+57
This patch add set_loopback in phy_driver, which is used by MAC driver to enable or disable phy loopback. it also add a generic genphy_loopback function, which use BMCR loopback bit to enable or disable loopback. Signed-off-by: Lin Yun Sheng <linyunsheng@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03net/mlx5: fix memcpy limit?Stephen Rothwell1-1/+1
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03ipv6: dad: don't remove dynamic addresses if link is downSabrina Dubroca1-9/+9
Currently, when the link for $DEV is down, this command succeeds but the address is removed immediately by DAD (1): ip addr add 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800 In the same situation, this will succeed and not remove the address (2): ip addr add 1111::12/64 dev $DEV ip addr change 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800 The comment in addrconf_dad_begin() when !IF_READY makes it look like this is the intended behavior, but doesn't explain why: * If the device is not ready: * - keep it tentative if it is a permanent address. * - otherwise, kill it. We clearly cannot prevent userspace from doing (2), but we can make (1) work consistently with (2). addrconf_dad_stop() is only called in two cases: if DAD failed, or to skip DAD when the link is down. In that second case, the fix is to avoid deleting the address, like we already do for permanent addresses. Fixes: 3c21edbd1137 ("[IPV6]: Defer IPv6 device initialization until the link becomes ready.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03net: cdc_ncm: Reduce memory use when kernel memory lowJim Baxter2-12/+45
The CDC-NCM driver can require large amounts of memory to create skb's and this can be a problem when the memory becomes fragmented. This especially affects embedded systems that have constrained resources but wish to maximise the throughput of CDC-NCM with 16KiB NTB's. The issue is after running for a while the kernel memory can become fragmented and it needs compacting. If the NTB allocation is needed before the memory has been compacted the atomic allocation can fail which can cause increased latency, large re-transmissions or disconnections depending upon the data being transmitted at the time. This situation occurs for less than a second until the kernel has compacted the memory but the failed devices can take a lot longer to recover from the failed TX packets. To ease this temporary situation I modified the CDC-NCM TX path to temporarily switch into a reduced memory mode which allocates an NTB that will fit into a USB_CDC_NCM_NTB_MIN_OUT_SIZE (default 2048 Bytes) sized memory block and only transmit NTB's with a single network frame until the memory situation is resolved. Each time this issue occurs we wait for an increasing number of reduced size allocations before requesting a full size one to not put additional pressure on a low memory system. Once the memory is compacted the CDC-NCM data can resume transmitting at the normal tx_max rate once again. Signed-off-by: Jim Baxter <jim_baxter@mentor.com> Reviewed-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03Merge branch 'qed-Add-iWARP-support-for-QL4xxxx'David S. Miller18-100/+3008
Michal Kalderon says: ==================== qed: Add iWARP support for QL4xxxx This patch series adds iWARP support to our QL4xxxx networking adapters. The code changes span across qed and qedr drivers, but this series contains changes to qed only. Once the series is accepted, the qedr series will be submitted to the rdma tree. There is one additional qed patch which enables the iWARP, this patch is delayed until the qedr series will be accepted. The patches were previously sent as an RFC, and these are the first 12 patches in the RFC series: https://www.spinics.net/lists/linux-rdma/msg51416.html This series was tested and built against net-next. MAINTAINERS file is not updated in this PATCH as there is a pending patch for qedr driver update https://patchwork.kernel.org/patch/9752761. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03qed: Add iWARP support for physical queue allocationKalderon, Michal1-0/+4
iWARP has different physical queue requirements than RoCE Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03qed: Add iWARP protocol support in context allocationKalderon, Michal1-2/+11
When computing how much memory is required for the different hw clients iWARP protocol should be taken into account Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03qed: iWARP CM add error handlingKalderon, Michal2-2/+190
This patch introduces error handling for errors that occurred during connection establishment. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03qed: iWARP implement disconnect flowsKalderon, Michal2-1/+91
This patch takes care of active/passive disconnect flows. Disconnect flows can be initiated remotely, in which case a async event will arrive from peer and indicated to qedr driver. These are referred to as exceptions. When a QP is destroyed, it needs to check that it's associated ep has been closed. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03qed: iWARP CM add active side connectKalderon, Michal4-12/+265
This patch implements the active side connect. Offload a connection, process MPA reply and send RTR. In some of the common passive/active functions, the active side will work in blocking mode. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03qed: iWARP CM add passive side connectKalderon, Michal9-20/+1039
This patch implements the passive side connect. It addresses pre-allocating resources, creating a connection element upon valid SYN packet received. Calling upper layer and implementation of the accept/reject calls. Error handling is not part of this patch. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03qed: iWARP CM add listener functions and initial SYN processingKalderon, Michal4-3/+343
This patch adds the ability to add and remove listeners and identify whether the SYN packet received is intended for iWARP or not. If a listener is not found the SYN packet is posted back to the chip. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03qed: iWARP CM - setup a ll2 connection for handling SYN packetsKalderon, Michal2-3/+220
iWARP handles incoming SYN packets using the ll2 interface. This patch implements ll2 setup and teardown. Additional ll2 connections will be used in the future which are not part of this patch series. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03qed: Add iWARP support in ll2 connectionsKalderon, Michal2-2/+12
Add a new connection type for iWARP ll2 connections for setting correct ll2 filters and connection type to FW. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03qed: Rename some ll2 related definesKalderon, Michal3-17/+16
Make some names more generic as they will be used by iWARP too. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03qed: Implement iWARP initialization, teardown and qp operationsKalderon, Michal11-40/+803
This patch adds iWARP support for flows that have common code between RoCE and iWARP, such as initialization, teardown and qp setup verbs: create, destroy, modify, query. It introduces the iWARP specific files qed_iwarp.[ch] and iwarp_common.h Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03qed: Introduce iWARP personalityKalderon, Michal7-27/+43
iWARP personality introduced the need for differentiating in several places in the code whether we are RoCE, iWARP or either. This leads to introducing new macros for querying the personality. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-02bpf: fix to bpf_setsockopsLawrence Brakmo1-2/+1
Fixed build error due to misplaced "#ifdef CONFIG_INET" (moved 1 statement up). Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01Merge branch 'bpf-Add-support-for-sock_ops'David S. Miller25-26/+1218
Lawrence Brakmo says: ==================== bpf: Add support for sock_ops Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding struct that allows BPF programs of this type to access some of the socket's fields (such as IP addresses, ports, etc.) and setting connection parameters such as buffer sizes, initial window, SYN/SYN-ACK RTOs, etc. Unlike current BPF program types that expect to be called at a particular place in the network stack code, SOCK_OPS program can be called at different places and use an "op" field to indicate the context. There are currently two types of operations, those whose effect is through their return value and those whose effect is through the new bpf_setsocketop BPF helper function. Example operands of the first type are: BPF_SOCK_OPS_TIMEOUT_INIT BPF_SOCK_OPS_RWND_INIT BPF_SOCK_OPS_NEEDS_ECN Example operands of the secont type are: BPF_SOCK_OPS_TCP_CONNECT_CB BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB Current operands are only called during connection establishment so there should not be any BPF overheads after connection establishment. The main idea is to use connection information form both hosts, such as IP addresses and ports to allow setting of per connection parameters to optimize the connection's peformance. Alghough there are already 3 mechanisms to set parameters (sysctls, route metrics and setsockopts), this new mechanism provides some disticnt advantages. Unlike sysctls, it can set parameters per connection. In contrast to route metrics, it can also use port numbers and information provided by a user level program. In addition, it could set parameters probabilistically for evaluation purposes (i.e. do something different on 10% of the flows and compare results with the other 90% of the flows). Also, in cases where IPv6 addresses contain geographic information, the rules to make changes based on the distance (or RTT) between the hosts are much easier than route metric rules and can be global. Finally, unlike setsockopt, it does not require application changes and it can be updated easily at any time. It uses the existing bpf cgroups infrastructure so the programs can be attached per cgroup with full inheritance support. Although the bpf cgroup framework already contains a sock related program type (BPF_PROG_TYPE_CGROUP_SOCK), I created the new type (BPF_PROG_TYPE_SOCK_OPS) beccause the existing type expects to be called only once during the connections's lifetime. In contrast, the new program type will be called multiple times from different places in the network stack code. For example, before sending SYN and SYN-ACKs to set an appropriate timeout, when the connection is established to set congestion control, etc. As a result it has "op" field to specify the type of operation requested. This patch set also includes sample BPF programs to demostrate the differnet features. v2: Formatting changes, rebased to latest net-next v3: Fixed build issues, changed socket_ops to sock_ops throught, fixed formatting issues, removed the syscall to load sock_ops program and added functionality to use existing bpf attach and bpf detach system calls, removed reader/writer locks in sock_bpfops.c (used when saving sock_ops global program) and fixed missing module refcount increment. v4: Removed global sock_ops program and instead used existing cgroup bpf infrastructure to support a new BPF_CGROUP_ATTCH type. v5: fixed kbuild warning happening in bpf-cgroup.h removed automatic converstion to host byte order from some sock_ops fields (ipv4 and ipv6 addresses, remote port) Added conversion to host byte order in some of the sample programs Added to sample BPF program comments about using load_sock_ops to load Removed is_req_sock field from bpf_sock_ops_kern and related places, using sk_fullsock() instead. v6: fixes to BPF helper function setsockopt (possible NULL deferencing, etc.) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01bpf: update tools/include/uapi/linux/bpf.hLawrence Brakmo1-1/+65
Update tools/include/uapi/linux/bpf.h to include changes related to new bpf sock_ops program type. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01bpf: Sample bpf program to set sndcwnd clampLawrence Brakmo2-0/+103
Sample BPF program, tcp_clamp_kern.c, to demostrate the use of setting the sndcwnd clamp. This program assumes that if the first 5.5 bytes of the host's IPv6 addresses are the same, then the hosts are in the same datacenter and sets sndcwnd clamp to 100 packets, SYN and SYN-ACK RTOs to 10ms and send/receive buffer sizes to 150KB. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01bpf: Adds support for setting sndcwnd clampLawrence Brakmo2-0/+8
Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_SNDCWND_CLAMP, which sets the initial congestion window. It is useful to limit the sndcwnd when the host are close to each other (small RTT). Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01bpf: Sample BPF program to set initial cwndLawrence Brakmo2-0/+89
Sample BPF program that assumes hosts are far away (i.e. large RTTs) and sets initial cwnd and initial receive window to 40 packets, send and receive buffers to 1.5MB. In practice there would be a test to insure the hosts are actually far enough away. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01bpf: Adds support for setting initial cwndLawrence Brakmo2-1/+19
Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_IW, which sets the initial congestion window. This can be used when the hosts are far apart (large RTTs) and it is safe to start with a large inital cwnd. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01bpf: Sample BPF program to set congestion controlLawrence Brakmo2-0/+84
Sample BPF program that sets congestion control to dctcp when both hosts are within the same datacenter. In this example that is assumed to be when they have the first 5.5 bytes of their IPv6 address are the same. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>