Age | Commit message | Author | Files | Lines
2017-08-28 | net/ncsi: Configure VLAN tag filter | Samuel Mendoza-Jonas | 4 | -4/+326
Make use of the ndo_vlan_rx_{add,kill}_vid callbacks to have the NCSI stack process new VLAN tags and configure the channel VLAN filter appropriately. Several VLAN tags can be set and a "Set VLAN Filter" packet must be sent for each one, meaning the ncsi_dev_state_config_svf state must be repeated. An internal list of VLAN tags is maintained, and compared against the current channel's ncsi_channel_filter in order to keep track within the state. VLAN filters are removed in a similar manner, with the introduction of the ncsi_dev_state_config_clear_vids state. The maximum number of VLAN tag filters is determined by the "Get Capabilities" response from the channel. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
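The per-tag loop described above (repeat the "Set VLAN Filter" state until the channel filter matches the requested list) can be sketched as follows. This is a minimal userspace model, not the NCSI driver code: struct vlan_filter_sketch, config_svf_step(), the fixed table size, and the use of VID 0 as an empty-slot marker are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical upper bound; in the driver the usable count comes from
 * the channel's "Get Capabilities" response. */
#define SKETCH_MAX_FILTERS 8

struct vlan_filter_sketch {
	uint16_t vids[SKETCH_MAX_FILTERS]; /* 0 marks an unused slot */
	size_t n_filters;                  /* slots advertised by the channel */
};

/* Return 1 if vid is already programmed into the channel filter. */
static int filter_has_vid(const struct vlan_filter_sketch *f, uint16_t vid)
{
	for (size_t i = 0; i < f->n_filters; i++)
		if (f->vids[i] == vid)
			return 1;
	return 0;
}

/* One pass of a hypothetical "config_svf" state: program the first
 * requested VID the filter does not hold yet. Returns 1 if a filter
 * packet would be sent (so the state should run again), 0 if the
 * filter already matches the list, -1 if no free slot is left. */
static int config_svf_step(struct vlan_filter_sketch *f,
			   const uint16_t *wanted, size_t n_wanted)
{
	assert(f->n_filters <= SKETCH_MAX_FILTERS);
	for (size_t i = 0; i < n_wanted; i++) {
		if (filter_has_vid(f, wanted[i]))
			continue;
		for (size_t slot = 0; slot < f->n_filters; slot++) {
			if (f->vids[slot] == 0) {
				f->vids[slot] = wanted[i]; /* stands in for sending SVF */
				return 1;
			}
		}
		return -1;
	}
	return 0;
}
```

Calling config_svf_step() until it returns 0 (or -1) mirrors repeating the state once per tag.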
2017-08-28 | net/ncsi: Fix several packet definitions | Samuel Mendoza-Jonas | 3 | -7/+8
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue | David S. Miller | 15 | -122/+285
Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2017-08-27 This series contains updates to i40e and i40evf only. Sudheer updates code comments and state variable so that adminq_subtask will have accurate information whenever it gets scheduled. Mariusz stores information about FEC modes, to be used for printing link state information, so that we do not need to call the admin queue when reporting link status. Adds VF support for controlling VLAN tag stripping via ethtool. Jake provides the majority of changes in this series, starting with increasing the size of the prefix buffer so that it can hold enough characters for every possible input, which prevents snprintf truncation. Fixed other string truncation errors/warnings produced by GCC 7.x. Removed an unnecessary workaround for resetting XPS. Fixed an issue where there is a mismatched affinity mask value, so initialize the value to cpu_possible_mask and invert the logic for checking incorrect CPU vs IRQ affinity so that the exceptional case is handled at the check. Removed ULTRA latency mode due to several issues found; a better solution for small packet workloads will be investigated. Akeem fixes an issue where the incorrect flag was being used to set promiscuous mode for unicast, which was enabling promiscuous mode only for multicast instead of unicast. Carolyn fixes an issue where an error return value is set, but this value can be overwritten before we actually do exit the function. So remove the error code assignment and add code comments for better understanding of why we do not need to set and return the error. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | net-next/hinic: fix comparison of a uint16_t type with -1 | Aviad Krawczyk | 2 | -36/+22
Remove the search for index of constant buffer size Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com> Signed-off-by: Zhao Chen <zhaochen6@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | net-next/hinic: Fix MTU limitation | Aviad Krawczyk | 1 | -0/+1
Fix the hw MTU limitation by setting max_mtu Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com> Signed-off-by: Zhao Chen <zhaochen6@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | Merge branch 'irda-move-to-staging' | David S. Miller | 138 | -7/+16
Greg Kroah-Hartman says: ==================== irda: move it to drivers/staging so we can delete it The IRDA code has long been obsolete and broken. So, to keep people from trying to use it, and to prevent people from having to maintain it, let's move it to drivers/staging/ so that we can delete it entirely from the kernel in a few releases. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | staging: irda: add a TODO file. | Greg Kroah-Hartman | 1 | -0/+4
The irda code will be deleted in a future kernel release, so no need to have anyone do any new work on it. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | irda: move include/net/irda into staging subdirectory | Greg Kroah-Hartman | 36 | -0/+4
And finally, move the irda include files into drivers/staging/irda/include/net/irda. Yes, it's a long path, but it makes it easy for us to just add a Makefile directory path addition and all of the net and drivers code "just works". Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | irda: move drivers/net/irda to drivers/staging/irda/drivers | Greg Kroah-Hartman | 50 | -2/+2
Move the irda drivers from drivers/net/irda/ to drivers/staging/irda/drivers as they will be deleted in a future kernel release. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | irda: move net/irda/ to drivers/staging/irda/net/ | Greg Kroah-Hartman | 55 | -5/+6
It's time to get rid of IRDA. It's long been broken, and no one seems to use it anymore. So move it to staging and after a while, we can delete it from there. To start, move the network irda core from net/irda to drivers/staging/irda/net/ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | Merge branch 'dpaa_eth-rss' | David S. Miller | 12 | -94/+1235
Madalin Bucur says: ==================== Add RSS to DPAA 1.x Ethernet driver This patch set introduces Receive Side Scaling for the DPAA Ethernet driver. Documentation is updated with details related to the new feature and limitations that apply. Also added a small fix. v2: removed a C++ style comment v3: move struct fman to header file to avoid exporting a function v4: addressed compilation issues introduced in v3 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | dpaa_eth: check allocation result | Madalin Bucur | 1 | -0/+3
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | Documentation: networking: add RSS information | Madalin Bucur | 1 | -1/+67
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | dpaa_eth: add NETIF_F_RXHASH | Madalin Bucur | 5 | -5/+41
Set the skb hash when the FMan Keygen hash result is available. Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | dpaa_eth: enable Rx hashing control | Madalin Bucur | 1 | -0/+113
Allow ethtool control of the Rx flow hashing. RSS is enabled by default; this allows turning it off by bypassing the FMan Keygen block and sending all traffic on the default Rx frame queue. Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | dpaa_eth: use multiple Rx frame queues | Madalin Bucur | 3 | -7/+47
Add a block of 128 Rx frame queues per port. The FMan hardware will send traffic on one of these queues based on the FMan port Parse Classify Distribute setup. The hash computed by the FMan Keygen block will select the Rx FQ. Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | fsl/fman: enable FMan Keygen | Iordache Florinel-R70177 | 7 | -2/+884
Add support for the FMan Keygen with a hardcoded scheme to spread incoming traffic on a FQ range based on source and destination IPs and ports. Signed-off-by: Iordache Florinel <florinel.iordache@nxp.com> Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | fsl/fman: move struct fman to header file | Madalin Bucur | 3 | -81/+82
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | net: ethernet: broadcom: Remove null check before kfree | Himanshu Jha | 1 | -8/+4
kfree() on a NULL pointer is a no-op, so the check is redundant. Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
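A userspace illustration of why the guard is redundant: C's free() is specified to do nothing when passed a NULL pointer, and kfree() documents the same behavior. struct priv_sketch and priv_teardown() are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical driver-private state with an optional buffer. */
struct priv_sketch {
	char *rx_buf; /* may be NULL if it was never allocated */
};

static void priv_teardown(struct priv_sketch *p)
{
	assert(p != NULL);
	/* No "if (p->rx_buf)" guard needed: free(NULL), like kfree(NULL),
	 * performs no operation. */
	free(p->rx_buf);
	p->rx_buf = NULL;
}
```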
2017-08-28 | sched: sfq: drop packets after root qdisc lock is released | Gao Feng | 1 | -7/+13
Commit 520ac30f4551 ("net_sched: drop packets after root qdisc lock is released") made a big change to tc for performance, but a few drop points in the SFQ enqueue operation were not converted: 1. failing to find the SFQ hash slot; 2. the queue being full. Now use qdisc_drop instead of freeing the skb directly. Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
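The deferred-drop idea can be sketched as follows, assuming a simplified packet list in place of struct sk_buff. In the kernel, qdisc_drop() takes a to_free list so that the actual freeing can happen after the root qdisc lock is released; here pkt_sketch, queue_sketch, enqueue_sketch() and free_deferred() are hypothetical stand-ins, and the lock itself is only marked by comments.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical packet; "next" links packets on a to_free list. */
struct pkt_sketch {
	struct pkt_sketch *next;
	int len;
};

/* Hypothetical bounded queue holding at most 4 packets. */
struct queue_sketch {
	struct pkt_sketch *slots[4];
	int len;
	int limit; /* must be <= 4 */
};

/* Instead of freeing while the lock is held, push onto the caller's list. */
static void defer_drop(struct pkt_sketch *p, struct pkt_sketch **to_free)
{
	p->next = *to_free;
	*to_free = p;
}

/* Caller holds the (hypothetical) root lock while this runs. */
static int enqueue_sketch(struct queue_sketch *q, struct pkt_sketch *p,
			  struct pkt_sketch **to_free)
{
	assert(q->limit <= 4);
	if (q->len >= q->limit) {
		defer_drop(p, to_free); /* queue full: drop later */
		return -1;
	}
	q->slots[q->len++] = p;
	return 0;
}

/* Called after the lock is released; returns how many packets were freed. */
static int free_deferred(struct pkt_sketch *list)
{
	int n = 0;

	while (list) {
		struct pkt_sketch *next = list->next;

		free(list);
		list = next;
		n++;
	}
	return n;
}
```

Keeping free() out of the locked section shortens the time the lock is held, which is the performance point of the referenced commit.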
2017-08-28 | Merge branch 'mlxsw-dpipe-fixes' | David S. Miller | 3 | -1/+20
Jiri Pirko says: ==================== mlxsw: spectrum: Fix couple of dpipe ipv4 host table bugs Arkadi Sharshevsky (1): mlxsw: spectrum_dpipe: Fix host table dump Jiri Pirko (1): mlxsw: spectrum: compile-in dpipe support only if devlink is enabled ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | mlxsw: spectrum_dpipe: Fix host table dump | Arkadi Sharshevsky | 1 | -0/+3
During the neighbor traversal the neighbors from different families should be ignored. Fixes: c58035a74aba ("mlxsw: spectrum_dpipe: Add support for IPv4 host table dump") Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | mlxsw: spectrum: compile-in dpipe support only if devlink is enabled | Jiri Pirko | 2 | -1/+17
It makes no sense to have dpipe compiled in when devlink is not enabled, because the devlink dpipe registration is a no-op function. So don't compile it in. This also fixes missing extern structs errors. Reported-by: kbuild test robot <fengguang.wu@intel.com> Fixes: a86f030915f2 ("mlxsw: spectrum_dpipe: Add support for IPv4 host table dump") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK) | Dexuan Cui | 4 | -0/+920
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It uses VMBus ringbuffer as the transportation layer. With hv_sock, applications between the host (Windows 10, Windows Server 2016 or newer) and the guest can talk with each other using the traditional socket APIs. More info about Hyper-V Sockets is available here: "Make your own integration services": https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/make-integration-service The patch implements the necessary support in Linux guest by introducing a new vsock transport for AF_VSOCK. Signed-off-by: Dexuan Cui <decui@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Andy King <acking@vmware.com> Cc: Dmitry Torokhov <dtor@vmware.com> Cc: George Zhang <georgezhang@vmware.com> Cc: Jorgen Hansen <jhansen@vmware.com> Cc: Reilly Grant <grantr@vmware.com> Cc: Asias He <asias@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Cathy Avery <cavery@redhat.com> Cc: Rolf Neugebauer <rolf.neugebauer@docker.com> Cc: Marcelo Cerri <marcelo.cerri@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | selftests/bpf: check the instruction dumps are populated | Jakub Kicinski | 1 | -4/+12
Add a basic test for checking whether kernel is populating the jited and xlated BPF images. It was used to confirm the behaviour change from commit d777b2ddbecf ("bpf: don't zero out the info struct in bpf_obj_get_info_by_fd()"), which made bpf_obj_get_info_by_fd() usable for retrieving the image dumps. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | bpf: fix oops on allocation failure | Dan Carpenter | 1 | -0/+1
"err" is set to zero if bpf_map_area_alloc() fails so it means we return ERR_PTR(0) which is NULL. The caller, find_and_alloc_map(), is not expecting NULL returns and will oops. Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
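The failure mode comes from how error pointers encode errors: a negative errno is cast into the top of the address space, so ERR_PTR(0) is simply NULL and IS_ERR() does not flag it. The sketch below re-creates that encoding in userspace under hypothetical lowercase names (err_ptr(), is_err(), ptr_err()), assuming the kernel's MAX_ERRNO value of 4095; map_alloc_sketch() is a made-up constructor showing the bug and the one-line fix.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-creation of the ERR_PTR()/IS_ERR()/PTR_ERR() idea. */
#define SKETCH_MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
	assert(error <= 0 && error >= -SKETCH_MAX_ERRNO);
	return (void *)error; /* err_ptr(0) is just NULL */
}

static inline long ptr_err(const void *ptr)
{
	return (long)ptr;
}

/* Only the last SKETCH_MAX_ERRNO addresses count as errors; NULL does not. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct map_sketch { int dummy; };

/* Hypothetical constructor: with set_err == 0 it reproduces the bug
 * (returns NULL, which is_err() misses); with set_err == 1 it applies
 * the fix and returns a proper error pointer. */
static struct map_sketch *map_alloc_sketch(int simulate_failure, int set_err)
{
	long err = 0;
	struct map_sketch *m = simulate_failure ? NULL : malloc(sizeof(*m));

	if (!m) {
		if (set_err)
			err = -ENOMEM;
		return err_ptr(err);
	}
	return m;
}
```

A caller that only checks is_err() and then dereferences the result would crash on the buggy NULL path, which is the oops the commit prevents.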
2017-08-28 | net: Add comment that early_demux can change via sysctl | David Ahern | 3 | -0/+12
Twice patches trying to constify inet{6}_protocol have been reverted: 39294c3df2a8 ("Revert "ipv6: constify inet6_protocol structures"") to revert 3a3a4e3054137 and then 03157937fe0b5 ("Revert "ipv4: make net_protocol const"") to revert aa8db499ea67. Add a comment that the structures can not be const because the early_demux field can change based on a sysctl. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
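Why these structures cannot be const, as a userspace sketch: a run-time knob rewrites a function pointer in the descriptor, which requires the object to be writable. Modifying a const object is undefined behavior in C, and in the kernel const data is placed in read-only memory, so such a store faults. struct proto_sketch and configure_early_demux() are hypothetical stand-ins for the real inet protocol structures and sysctl handler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical protocol descriptor with a field toggled at run time. */
struct proto_sketch {
	int (*handler)(int);
	void (*early_demux)(int); /* rewritten by a sysctl, so not const */
};

static int rx_handler(int x) { return x; }
static void demux_handler(int x) { (void)x; }

/* Deliberately NOT const: configure_early_demux() writes into it. */
static struct proto_sketch tcp_proto_sketch = {
	.handler = rx_handler,
	.early_demux = demux_handler,
};

/* Hypothetical sysctl callback: enable or disable early demux. */
static void configure_early_demux(struct proto_sketch *p, int enabled)
{
	assert(p != NULL);
	p->early_demux = enabled ? demux_handler : NULL;
}
```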
2017-08-28 | xen-netback: update ubuf_info initialization to anonymous union | Willem de Bruijn | 1 | -2/+2
The xen driver initializes struct ubuf_info fields using designated initializers. I recently moved these fields inside a nested anonymous struct inside an anonymous union. I had missed this use case. This breaks compilation of xen-netback with older compilers. From kbuild bot with gcc-4.4.7: drivers/net//xen-netback/interface.c: In function 'xenvif_init_queue': >> drivers/net//xen-netback/interface.c:554: error: unknown field 'ctx' specified in initializer >> drivers/net//xen-netback/interface.c:554: warning: missing braces around initializer drivers/net//xen-netback/interface.c:554: warning: (near initialization for '(anonymous).<anonymous>') >> drivers/net//xen-netback/interface.c:554: warning: initialization makes integer from pointer without a cast >> drivers/net//xen-netback/interface.c:555: error: unknown field 'desc' specified in initializer Add double braces around the designated initializers to match their nested position in the struct. After this, compilation succeeds again. Fixes: 4ab6c99d99bb ("sock: MSG_ZEROCOPY notification coalescing") Reported-by: kbuild bot <lpk@intel.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
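A reduced example of the initializer change, assuming a simplified struct that only mirrors the shape described above (the real struct ubuf_info has more fields). The double-brace form works whether or not the compiler accepts designators that name anonymous members directly; struct ubuf_sketch, make_ubuf() and done_cb() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: fields live in an anonymous struct inside an
 * anonymous union (anonymous members need C11 or the GNU extension). */
struct ubuf_sketch {
	void (*callback)(struct ubuf_sketch *);
	union {
		struct {
			unsigned long desc;
			void *ctx;
		};
		struct {
			unsigned int id;
			unsigned short len;
		};
	};
};

static void done_cb(struct ubuf_sketch *u) { (void)u; }

static struct ubuf_sketch make_ubuf(unsigned long i)
{
	/* Outer braces open the union (the member after .callback), inner
	 * braces open its first member, the anonymous struct. */
	return (struct ubuf_sketch){ .callback = done_cb,
				     { { .ctx = NULL, .desc = i } } };
}
```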
2017-08-28 | Merge branch 'gre-add-collect_md-mode-for-ERSPAN-tunnel' | David S. Miller | 4 | -21/+232
William Tu says: ==================== gre: add collect_md mode for ERSPAN tunnel This patch series provides collect_md mode for ERSPAN tunnel. The first patch refactors the existing gre_fb_xmit function by extracting the route cache portion into a new function called prepare_fb_xmit. The second patch introduces the collect_md mode for ERSPAN tunnel, by calling the prepare_fb_xmit function and adding ERSPAN specific logic. The final patch adds the test case using bpf_skb_{set,get}_tunnel_{key,opt}. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | samples/bpf: extend test_tunnel_bpf.sh with ERSPAN | William Tu | 2 | -1/+91
Extend existing tests for vxlan, gre, geneve, ipip to include ERSPAN tunnel. Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | gre: add collect_md mode to ERSPAN tunnel | William Tu | 2 | -5/+101
Similar to gre, vxlan, geneve, ipip tunnels, allow ERSPAN tunnels to operate in 'collect metadata' mode. bpf_skb_[gs]et_tunnel_key() helpers can make use of it right away. OVS can use it as well in the future. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | gre: refactor the gre_fb_xmit | William Tu | 1 | -15/+40
The patch refactors the gre_fb_xmit function, by creating prepare_fb_xmit function for later ERSPAN collect_md mode patch. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | Revert "ipv4: make net_protocol const" | David Ahern | 1 | -2/+2
This reverts commit aa8db499ea67cff1f5f049033810ffede2fe5ae4. Early demux structs can not be made const. Doing so results in: [ 84.967355] BUG: unable to handle kernel paging request at ffffffff81684b10 [ 84.969272] IP: proc_configure_early_demux+0x1e/0x3d [ 84.970544] PGD 1a0a067 [ 84.970546] P4D 1a0a067 [ 84.971212] PUD 1a0b063 [ 84.971733] PMD 80000000016001e1 [ 84.972669] Oops: 0003 [#1] SMP [ 84.973065] Modules linked in: ip6table_filter ip6_tables veth vrf [ 84.973833] CPU: 0 PID: 955 Comm: sysctl Not tainted 4.13.0-rc6+ #22 [ 84.974612] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 84.975855] task: ffff88003854ce00 task.stack: ffffc900005a4000 [ 84.976580] RIP: 0010:proc_configure_early_demux+0x1e/0x3d [ 84.977253] RSP: 0018:ffffc900005a7dd0 EFLAGS: 00010246 [ 84.977891] RAX: ffffffff81684b10 RBX: 0000000000000001 RCX: 0000000000000000 [ 84.978759] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000000 [ 84.979628] RBP: ffffc900005a7dd0 R08: 0000000000000000 R09: 0000000000000000 [ 84.980501] R10: 0000000000000001 R11: 0000000000000008 R12: 0000000000000001 [ 84.981373] R13: ffffffffffffffea R14: ffffffff81a9b4c0 R15: 0000000000000002 [ 84.982249] FS: 00007feb237b7700(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000 [ 84.983231] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 84.983941] CR2: ffffffff81684b10 CR3: 0000000038492000 CR4: 00000000000406f0 [ 84.984817] Call Trace: [ 84.985133] proc_tcp_early_demux+0x29/0x30 I think this is the second time such a patch has been reverted. Cc: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | RDS: make rhashtable_params const | Bhumika Goyal | 1 | -1/+1
Make this const as it is either used during a copy operation or passed to a const argument of the function rhltable_init Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | ipv4: make net_protocol const | Bhumika Goyal | 1 | -2/+2
Make these const as they are only passed to a const argument of the function inet_add_protocol. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | bridge: make ebt_table const | Bhumika Goyal | 1 | -1/+1
Make this const as it is only passed to a const argument of the function ebt_register_table. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | Merge branch 'sockmap-uapi-updates-and-fixes' | David S. Miller | 15 | -296/+544
John Fastabend says: ==================== sockmap UAPI updates and fixes This series updates sockmap UAPI, adds additional test cases and provides a couple fixes. First the UAPI changes. The original API added two sockmap specific API artifacts (a) a new map_flags field with a sockmap specific update command and (b) a new sockmap specific attach field in the attach data structure. After this series instead of attaching programs with a single command now two commands are used to attach programs to maps individually. This allows us to add new programs easily in the future and avoids any specific sockmap data structure additions. The map_flags field is also removed and instead we allow socks to be added to multiple maps that may or may not have programs attached. This allows users to decide if a sock should run a SK_SKB program type on receive based on the map it is attached to. This is a nice improvement. See patches for specific details. More test cases were added to test above changes and also stress test the interface. Finally two fixes/improvements were made. First a missing rcu section was added. Second now sockmap can build without KCM being used to trigger 'y' on CONFIG_STREAM_PARSER by selecting a new BPF config option. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | bpf: test_maps add sockmap stress test | John Fastabend | 1 | -1/+28
Sockmap is a bit different than normal stress tests that can run in parallel as is. We need to reuse the same socket pool and map pool to get good stress test cases. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | bpf: sockmap requires STREAM_PARSER add Kconfig entry | John Fastabend | 1 | -0/+12
SOCKMAP uses strparser code (compiled with Kconfig option CONFIG_STREAM_PARSER) to run the parser BPF program. Without this config option set, sockmap won't be compiled. However, at the moment the only way to pull in the strparser code is to enable KCM. To resolve this, create a BPF specific config option to pull in only the strparser piece that sockmap needs. This also allows folks who want to use BPF/syscall/maps but don't need sockmap to easily opt out. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | bpf: sockmap indicate sock events to listeners | John Fastabend | 1 | -0/+6
After userspace pushes sockets into a sockmap it may not be receiving data (assuming stream_{parser|verdict} programs are attached). But, it may still want to manage the socks. A common pattern is to poll/select for a POLLRDHUP event so we can close the sock. This patch adds the logic to wake up these listeners. Also add TCP_SYN_SENT to the list of events to handle. We don't want to break the connection just because we happen to be in this state. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
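From userspace, the listener pattern mentioned above looks roughly like this: poll for POLLRDHUP and close the socket once the peer has gone away. This is a generic Linux sketch (POLLRDHUP requires _GNU_SOURCE); wait_for_rdhup() is a hypothetical helper, and for a sockmap user the fd would be a TCP socket rather than the AF_UNIX pair that is convenient for trying it out.

```c
#define _GNU_SOURCE /* POLLRDHUP is a Linux extension */
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return 1 if fd reports POLLRDHUP (peer closed or shut down its write
 * side) within timeout_ms, 0 on timeout, -1 on poll() error. */
static int wait_for_rdhup(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLRDHUP };
	int n = poll(&pfd, 1, timeout_ms);

	assert(fd >= 0);
	if (n < 0)
		return -1;
	return (n > 0 && (pfd.revents & POLLRDHUP)) ? 1 : 0;
}
```

For example, after socketpair(AF_UNIX, SOCK_STREAM, 0, sv), closing sv[1] should make wait_for_rdhup(sv[0], ...) return 1, at which point the application would close sv[0].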
2017-08-28 | bpf: harden sockmap program attach to ensure correct map type | John Fastabend | 3 | -4/+33
When attaching a program to sockmap we need to check map type is correct. Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | bpf: more SK_SKB selftests | John Fastabend | 1 | -0/+98
Tests packet read/writes and additional skb fields. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | bpf: additional sockmap self tests | John Fastabend | 3 | -46/+96
Add some more sockmap tests to cover, - forwarding to NULL entries - more than two maps to test list ops - forwarding to different map Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | bpf: sockmap add missing rcu_read_(un)lock in smap_data_ready | John Fastabend | 1 | -3/+6
References to psock must be done inside RCU critical section. Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 | bpf: sockmap, remove STRPARSER map_flags and add multi-map support | John Fastabend | 2 | -107/+165
The addition of map_flags BPF_SOCKMAP_STRPARSER flags was to handle a specific use case where we want to have BPF parse program disabled on an entry in a sockmap. However, Alexei found the API a bit cumbersome and I agreed. Let's remove the STRPARSER flag and support the use case by allowing socks to be in multiple maps. This allows users to create two maps, one with programs attached and one without. When socks are added to maps they now inherit any programs attached to the map. This is a nice generalization and IMO improves the API. The API rules are less ambiguous and do not need a flag: - When a sock is added to a sockmap we have two cases, i. The sock map does not have any attached programs so we can add sock to map without inheriting bpf programs. The sock may exist in 0 or more other maps. ii. The sock map has an attached BPF program. To avoid duplicate bpf programs we only add the sock entry if it does not have an existing strparser/verdict attached, returning -EBUSY if a program is already attached. Otherwise attach the program and inherit strparser/verdict programs from the sock map. This allows for socks to be in multiple maps for redirects and inherit a BPF program from a single map. Also this patch simplifies the logic around BPF_{EXIST|NOEXIST|ANY} flags. In the original patch I tried to be extra clever and only update map entries when necessary. Now I've decided the complexity is not worth it. If users constantly update an entry with the same sock for no reason (i.e. update an entry without actually changing any parameters on map or sock) we still do an alloc/release. Using this and allowing multiple entries of a sock to exist in a map the logic becomes much simpler. Note: Now that multiple maps are supported the "maps" pointer called when a socket is closed becomes a list of maps to remove the sock from. To keep the map up to date when a sock is added to the sockmap we must add the map/elem in the list.
Likewise when it is removed we must remove it from the list. This results in searching the per psock list on delete operation. On TCP_CLOSE events we walk the list and remove the psock from all map/entry locations. I don't see any perf implications in this because at most I have a psock in two maps. If a psock were to be in many maps it's possible this might be noticeable on delete but I can't think of a reason to dup a psock in many maps. The sk_callback_lock is used to protect read/writes to the list. This was convenient because in all locations we were taking the lock anyway just after working on the list. Also the lock is per sock so in normal cases we shouldn't see any contention. Suggested-by: Alexei Starovoitov <ast@kernel.org> Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
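The two add cases can be modeled as a small decision function. This is a hypothetical sketch of the rules as stated in the message, not the kernel's sockmap implementation: the structs, field names and sockmap_add_sketch() are invented, and a NULL program pointer stands for "no program attached".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct prog_sketch { int id; };

/* Hypothetical map: programs attached to the map, if any. */
struct sockmap_sketch {
	const struct prog_sketch *parser;
	const struct prog_sketch *verdict;
};

/* Hypothetical sock: programs it inherited, plus a map reference count. */
struct sock_sketch {
	const struct prog_sketch *parser;
	const struct prog_sketch *verdict;
	int n_maps;
};

static int sockmap_add_sketch(struct sockmap_sketch *map, struct sock_sketch *sk)
{
	assert(map != NULL && sk != NULL);

	/* Case i: the map has no programs, so add without inheriting. */
	if (!map->parser && !map->verdict) {
		sk->n_maps++;
		return 0;
	}
	/* Case ii: refuse to stack a second parser/verdict on the sock. */
	if (sk->parser || sk->verdict)
		return -EBUSY;
	sk->parser = map->parser;
	sk->verdict = map->verdict;
	sk->n_maps++;
	return 0;
}
```

With this model a sock can sit in any number of program-less maps for redirects, but inherits programs from at most one map.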
2017-08-28 | bpf: convert sockmap field attach_bpf_fd2 to type | John Fastabend | 13 | -151/+116
In the initial sockmap API we provided strparser and verdict programs using a single attach command by extending the attach API with the attach_bpf_fd2 field. However, if we add other programs in the future we will be adding a field for every new possible type, attach_bpf_fd(3,4,..). This seems a bit clumsy for an API. So let's push the programs using two new type fields: BPF_SK_SKB_STREAM_PARSER and BPF_SK_SKB_STREAM_VERDICT. This has the advantage of having a readable name and can easily be extended in the future. Updates to samples and sockmap included here also generalize tests slightly to support upcoming patch for multiple map support. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support") Suggested-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-27 | ARM: dts: rk3228-evb: Fix the compiling error | David Wu | 1 | -2/+2
This patch solves the following error: arch/arm/boot/dts/rk3228-evb.dtb: ERROR (phandle_references): Reference to non-existent node or label "phy0" Fixes: db40f15b53e4 ("ARM: dts: rk3228-evb: Enable the integrated PHY for gmac") Signed-off-by: David Wu <david.wu@rock-chips.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-27 | i40e/i40evf: avoid dynamic ITR updates when polling or low packet rate | Jacob Keller | 4 | -10/+36
The dynamic ITR algorithm depends on a calculation of usecs which assumes that the interrupts have been firing constantly at the interrupt throttle rate. This is not guaranteed because we could have a low packet rate, or have been polling in software. We'll estimate whether this is the case by using jiffies to determine if we've been too long. If the time difference of jiffies is larger we are guaranteed to have an incorrect calculation. If the time difference of jiffies is smaller we might have been polling some but the difference shouldn't affect the calculation too much. This ensures that we don't get stuck in BULK latency during certain rare situations where we receive bursts of packets that force us into NAPI polling. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 | i40e/i40evf: remove ULTRA latency mode | Jacob Keller | 4 | -36/+0
Since commit c56625d59726 ("i40e/i40evf: change dynamic interrupt thresholds") a new higher latency ITR setting called I40E_ULTRA_LATENCY was added with a cryptic comment about how it was meant for adjusting Rx more aggressively when streaming small packets. This mode was attempting to calculate packets per second and then kick in when we have a huge number of small packets. Unfortunately, the ULTRA setting was kicking in for workloads it wasn't intended for including single-thread UDP_STREAM workloads. This wasn't caught for a variety of reasons. First, the ip_defrag routines were improved somewhat which makes the UDP_STREAM test still reasonable at 10GbE, even when dropped down to 8k interrupts a second. Additionally, some other obvious workloads appear to work fine, such as TCP_STREAM. The number 40k doesn't make sense for a number of reasons. First, we absolutely can do more than 40k packets per second. Second, we calculate the value inline in an integer, which sometimes can overflow resulting in using incorrect values. If we fix this overflow it makes it even more likely that we'll enter ULTRA mode which is the opposite of what we want. The ULTRA mode was added originally as a way to reduce CPU utilization during a small packet workload where we weren't keeping up anyways. It should never have been kicking in during these other workloads. Given the issues outlined above, let's remove the ULTRA latency mode. If necessary, a better solution to the CPU utilization issue for small packet workloads will be added in a future patch. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 | i40e: invert logic for checking incorrect cpu vs irq affinity | Jacob Keller | 2 | -31/+30
In commit 96db776a3682 ("i40e/vf: fix interrupt affinity bug") we added some code to force exit of polling in case we did not have the correct CPU. This is important since it was possible for the IRQ affinity to be changed while the CPU is pegged at 100%. This can result in the polling routine being stuck on the wrong CPU until traffic finally stops. Unfortunately, the implementation, "if the CPU is correct, exit as normal, otherwise, fall-through to the end-polling exit" is incredibly confusing to reason about. In this case, the normal flow looks like the exception, while the exception actually occurs far away from the if statement and comment. We recently discovered and fixed a bug in this code because we were incorrectly initializing the affinity mask. Re-write the code so that the exceptional case is handled at the check, rather than having the logic be spread through the regular exit flow. This does end up with minor code duplication, but the resulting code is much easier to reason about. The new logic is identical, but inverted. If we are running on a CPU not in our affinity mask, we'll exit polling. However, the code flow is much easier to understand. Note that we don't actually have to check for MSI-X, because in the MSI case we'll only have one q_vector, but its default affinity mask should be correct as it includes all CPUs when it's initialized. Further, we could at some point add code to set up the notifier for the non-MSI-X case and enable this workaround for that case too, if desired, though there isn't much gain since it's unlikely to be the common case. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>