summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2017-05-01bpf: Move endianness BPF helpers out of bpf_util.hDavid S. Miller4-21/+25
We do not want to include things like stdio.h and friends into eBPF program builds. bpf_util.h is for host compiled programs, so eBPF C-code helpers don't really belong there. Add a new bpf_endian.h as a quick fix for this for now. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01ipv6: Need to export ipv6_push_frag_opts for tunneling now.David S. Miller1-1/+1
Since that change also made the nfrag function not necessary for exports, remove it. Fixes: 89a23c8b528b ("ip6_tunnel: Fix missing tunnel encapsulation limit option") Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01Merge branch 'dsa-mv88e6xxx-802.1s-and-88E6390-VTU'David S. Miller5-435/+690
Vivien Didelot says: ==================== net: dsa: mv88e6xxx: 802.1s and 88E6390 VTU This patch series adds support for the VLAN Table Unit (a.k.a. the VTU) to the 88E6390 family of Marvell Ethernet switch chips. The plumbing for the per VLAN Spanning Tree support is added as a side effect of the necessary refactoring. The patchset is split up so that no duplication of code is introduced. With this patchset applied, the mv88e6xxx driver has 2 new function pointers for the VTU GetNext and VTU Load/Purge operations (with 3 implementations), both handling programmation of 802.1q and 802.1s. On a ZII Rev C board (featuring 2 88E6390X chips) with all ports bridged together, we obtain the following hardware VLAN configuration: # cat /sys/class/net/br0/bridge/vlan_filtering 1 # cat /sys/class/net/br0/bridge/default_pvid 42 # bridge vlan add dev lan3 vid 666 # bridge vlan show port vlan ids lan1 42 PVID Egress Untagged lan1 42 PVID Egress Untagged lan2 42 PVID Egress Untagged lan2 42 PVID Egress Untagged lan3 42 PVID Egress Untagged 666 lan3 42 PVID Egress Untagged 666 lan4 42 PVID Egress Untagged lan4 42 PVID Egress Untagged lan5 42 PVID Egress Untagged lan5 42 PVID Egress Untagged lan6 42 PVID Egress Untagged lan6 42 PVID Egress Untagged lan7 42 PVID Egress Untagged lan7 42 PVID Egress Untagged lan8 42 PVID Egress Untagged lan8 42 PVID Egress Untagged br0 42 PVID Egress Untagged Below are the technical details for the different implementations. All switch families have up to 3 dedicated VTU Data registers used to program 802.1q and 802.1s, both using 2-bit values. On 88E6185 and 88E6352 families, port membership and state are adjacent, while the 88E6390 family share the same bits: Bits 88E6185/88E6352 88E6390 ----- ----------------- -------------------------- 0-1 Port 0 membership Port 0 membership or state 2-3 Port 0 state Port 1 membership or state 4-5 Port 1 membership Port 2 membership or state 6-7 Port 1 state Port 3 membership or state 8-9 Port 2 membership Port 4 membership or state 10-11 Port 2 state Port 5 membership or state ... ... ... The 88E6185 family programs all ports membership and state in a single VTU GetNext or Load/Purge operation. The 88E6352 family introduced an indirect Spanning Tree Unit table (a.k.a. STU) which requires additional STU GetNext and Load/Purge operations to read and write the ports state bits. The 88E6390 family also has an STU and requires data bits to be accessed before and after every single VTU or STU operation. Finally, the 88E6390 family introduced a 13th bit for the VLAN ID, which must be taken care of regardless the VTU operating mode. This means that iterating over the VTU now starts or ends with value 8191, not 4095. Patch 1 adds a max_vid field to the chip info structure. Patch 2 adds 802.1q and 802.1s data to the generic VTU entry structure. Patches 3 to 10 move helpers to a dedicated file (later made static). Patches 11 and 12 abstract handling of the STU behind VTU operations. Patches 13 and 14 add the new function pointers for VTU operations. Patches 15 and 18 polish the VTU code and add VTU support for 88E6390. Changes in v2: - add Reviewed-by tags - fix comments in 8/18 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01net: dsa: mv88e6xxx: add VTU support for 88E6390Vivien Didelot3-0/+146
The 6390 family of chips use only 2 of the 3 VTU Data registers to pack the MemberTag and PortState VLAN data. This means that they must be written or read before or after each VTU/STU operations. Implement this variant to add support for VTU with such chips. These chips have a 13th bit for the VID thus set their max_vid to 8191. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01net: dsa: mv88e6xxx: support the VTU Page bitVivien Didelot2-0/+8
Newer chips such as the 88E6390 have a VTU Page bit in the VTU VID register to specify a 13th bit for the VID. This can be used to support 8K VLANs. When dumping the whole VTU, all VID bits must be set to one, including this VTU Page bit. Add support for VID greater than 4095. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01net: dsa: mv88e6xxx: simplify VTU entry getterVivien Didelot1-38/+24
Make the code which fetches or initializes a new VTU entry more concise. This allows us the get rid of the old underscore prefix naming. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01net: dsa: mv88e6xxx: make VTU helpers staticVivien Didelot2-48/+24
Now that we have chip operations for VTU accesses, mark all helpers from global1_vtu.c as static. Only the various implementations of the GetNext, LoadPurge and Flush operations need to be exposed. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01net: dsa: mv88e6xxx: add VTU Load/Purge operationVivien Didelot4-57/+103
Add a new vtu_loadpurge operation to the chip info structure to differ the various implementations of the VTU accesses. Now that the STU handling is abstracted behind VTU operations, kill the obsolete MV88E6XXX_FLAG_STU flag. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01net: dsa: mv88e6xxx: add VTU GetNext operationVivien Didelot4-48/+99
Add a new vtu_getnext operation to the chip info structure to differ the various implementations of the VTU accesses. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01net: dsa: mv88e6xxx: load STU entry with VTU entryVivien Didelot1-104/+4
Now that the code writes both VTU and STU data when loading a VTU entry, load the corresponding STU entry at the same time. This allows us to get rid of the STU management in the _mv88e6xxx_vtu_new helper and thus remove the separate implementations of STU Load/Purge and STU GetNext, as well as the unused family checks. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01net: dsa: mv88e6xxx: get STU entry on VTU GetNextVivien Didelot3-1/+25
Now that the code reads both VTU and STU data on VTU GetNext operation, fetch the STU entry data of a VTU entry at the same time. The STU data bits are masked with the VTU data bits and they are now all read at the same time a VTU GetNext operation is issued. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01net: dsa: mv88e6xxx: move STU GetNext operationVivien Didelot3-13/+23
Extract the generic portion of code to issue an STU GetNext operation, which will be used in other implementations. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01net: dsa: mv88e6xxx: move VTU Data accessorsVivien Didelot3-81/+72
The code to access the VTU Data registers currently only supports the 88E6185 family and alike: 2-bit membership adjacent to 2-bit port state. Even though the 88E6352 family introduced an indirect table to program the VLAN Spanning Tree states, the usage of the VTU Data registers remains the same regardless the VTU or STU operation. Now that the mv88e6xxx_vtu_entry structure contains both port membership and states data, factorize the code to access them in global1_vtu.c. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01net: dsa: mv88e6xxx: move generic VTU GetNextVivien Didelot3-29/+33
Even though every switch model has a different way to access the VTU Data bits, the base implementation of the VTU GetNext operation remains the same: wait, write the first VID to iterate from, start the operation, and read the next VID. Move this generic implementation into global1_vtu.c and abstract the handling of the start VID (similarly to the ATU GetNext implementation), before introducing a new chip operation for specific chips. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01net: dsa: mv88e6xxx: move VTU VID accessorsVivien Didelot3-34/+56
Add helpers to access the VTU VID register in the global1_vtu.c file. At the same time, move mv88e6xxx_g1_vtu_vid_write at the beginning of _mv88e6xxx_vtu_loadpurge, which adds no functional changes but makes future patches simpler. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01net: dsa: mv88e6xxx: move VTU SID accessorsVivien Didelot3-13/+37
Add helpers to access the VTU SID register in the global1_vtu.c file. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01net: dsa: mv88e6xxx: move VTU FID accessorsVivien Didelot3-5/+31
Add helpers to access the VTU FID register in the global1_vtu.c file. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01net: dsa: mv88e6xxx: move VTU flushVivien Didelot3-16/+26
Move the VTU flush operation to global1_vtu.c and call it from a mv88e6xxx_vtu_setup helper, similarly to the ATU and PVT setup. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01net: dsa: mv88e6xxx: move VTU Operation accessorsVivien Didelot4-29/+47
Move the helper functions to access the Global 1 VTU Operation register to a new global1_vtu.c file, and get rid of the old underscore prefix naming convention. This file will be extended will all VTU/STU related code. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01net: dsa: mv88e6xxx: split VTU entry data memberVivien Didelot2-11/+13
VLAN aware Marvell chips can program 802.1Q VLAN membership as well as 802.1s per VLAN Spanning Tree state using the same 3 VTU Data registers. Some chips such as 88E6185 use different Data registers offsets for ports state and membership, and program them in a single operation. Other chips such as 88E6352 use the same register layout but program them in distinct operations (an indirect table is used for 802.1s.) Newer chips such as 88E6390 use the same offsets for both state and membership in distinct operations, thus require multiple data accesses. To correctly abstract this, split the "data" structure member of mv88e6xxx_vtu_entry in two "state" and "member" members, before adding VTU support for newer chips. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01net: dsa: mv88e6xxx: add max VID to infoVivien Didelot2-20/+31
Some chips don't have a VLAN Table Unit, most of them do have a 4K table, some others as the 88E6390 family has a 13th bit for the VID. Add a new max_vid member to the info structure, used to check the presence of a VTU as well as the value used to iterate from in VTU GetNext operations. This makes the MV88E6XXX_FLAG_VTU obsolete, thus remove it. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01xfrm: Indicate xfrm_state offload errorsIlan Tayari1-3/+6
Current code silently ignores driver errors when configuring IPSec offload xfrm_state, and falls back to host-based crypto. Fail the xfrm_state creation if the driver has an error, because the NIC offloading was explicitly requested by the user program. This will communicate back to the user that there was an error. Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01net/esp4: Fix invalid esph pointer crashIlan Tayari1-0/+1
Both esp_output and esp_xmit take a pointer to the ESP header and place it in esp_info struct prior to calling esp_output_head. Inside esp_output_head, the call to esp_output_udp_encap makes sure to update the pointer if it gets invalid. However, if esp_output_head itself calls skb_cow_data, the pointer is not updated and stays invalid, causing a crash after esp_output_head returns. Update the pointer if it becomes invalid in esp_output_head Fixes: fca11ebde3f0 ("esp4: Reorganize esp_output") Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01ip6_tunnel: Fix missing tunnel encapsulation limit optionCraig Gallek1-2/+2
The IPv6 tunneling code tries to insert IPV6_TLV_TNL_ENCAP_LIMIT and IPV6_TLV_PADN options when an encapsulation limit is defined (the default is a limit of 4). An MTU adjustment is done to account for these options as well. However, the options are never present in the generated packets. The issue appears to be a subtlety between IPV6_DSTOPTS and IPV6_RTHDRDSTOPTS defined in RFC 3542. When the IPIP tunnel driver was written, the encap limit options were included as IPV6_RTHDRDSTOPTS in dst0opt of struct ipv6_txoptions. Later, ipv6_push_nfrags_opts was (correctly) updated to require IPV6_RTHDR options when IPV6_RTHDRDSTOPTS are to be used. This caused the options to no longer be included in v6 encapsulated packets. The fix is to use IPV6_DSTOPTS (in dst1opt of struct ipv6_txoptions) instead. IPV6_DSTOPTS do not have the additional IPV6_RTHDR requirement. Fixes: 1df64a8569c7: ("[IPV6]: Add ip6ip6 tunnel driver.") Fixes: 333fad5364d6: ("[IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542)") Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01Merge branch 'for-upstream' of ↵David S. Miller11-887/+342
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2017-04-30 Here's one last batch of Bluetooth patches in the bluetooth-next tree targeting the 4.12 kernel. - Remove custom ECDH implementation and use new KPP API instead - Add protocol checks to hci_ldisc - Add module license to HCI UART Nokia H4+ driver - Minor fix for 32bit user space - 64 bit kernel combination Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01switchdev: documentation: fix whitespace issuesLiam Beguin1-35/+35
Figure 1 is full of whitespaces; fix it Signed-off-by: Liam Beguin <lbeguin@tycoint.com> Signed-off-by: Sylvain Lemieux <slemieux@tycoint.com> Acked-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01mlxsw: spectrum_router: Simplify VRF enslavementIdo Schimmel3-131/+63
When a netdev is enslaved to a VRF master, its router interface (RIF) needs to be destroyed (if exists) and a new one created using the corresponding virtual router (VR). >From the driver's perspective, the above is equivalent to an inetaddr event sent for this netdev. Therefore, when a port netdev (or its uppers) are enslaved to a VRF master, call the same function that would've been called had a NETDEV_UP was sent for this netdev in the inetaddr notification chain. This patch also fixes a bug when a LAG netdev with an existing RIF is enslaved to a VRF. Before this patch, each LAG port would drop the reference on the RIF, but would re-join the same one (in the wrong VR) soon after. With this patch, the corresponding RIF is first destroyed and a new one is created using the correct VR. Fixes: 7179eb5acd59 ("mlxsw: spectrum_router: Add support for VRFs") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01Merge tag 'mlx5-updates-2017-04-30' of ↵David S. Miller14-262/+1074
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux mlx5-updates-2017-04-30 Or says: ================ mlx5 neigh update This series (whose code name is 'neigh update') from Hadar, enhances the mlx5 TC IP tunnel offloads to deal with changes to tunnel destination neighbours used in offloaded flows which involved encapsulation. In order to keep track on the validity state of such neighbours, we register a netevent notifier callback and act on NEIGH_UPDATE events: if a neighbour becomes valid, offload the related flows to HW (the other way around when neigh becomes invalid) and similarly when a neigh mac addresses changes. Since this traffic is offloaded from the host OS, the neighbour for the IP tunnel destination can mistakenly become STALE and deleted by the kernel since its 'used' value wasn't changed. To address that, we proactively update the neighbour 'used' value every DELAY_PROBE_TIME seconds, using time stamps generated by the existing driver code for HW flow counters. We use the DELAY_PROBE_TIME_UPDATE event to adjust the frequency of the updates. Prior to the core of the series, there's a patch from Saeed that introduces an extendable vport representor implementation scheme. It provides a separation between the eswitch to the netdev related aspects of the representors. We would like to thank Ido Schimmel and Ilya Lesokhin for their coaching && advice through the long design and review cycles while we struggled to understand and (hopefully correctly) implement the locking around the different driver flows(..) . - Or. ================= Misc Updates: From Tariq: Some small performance and trivial code optimization for mlx5 netdev driver - Optimize poll ICOSQ completion queue - Use prefetchw when a write is to follow - Use u8 as ownership type in mlx5e_get_cqe() From Eran: - Disable LRO by default on specific setups From Eli: - Small cleanup for E-Switch to avoid redundant allocation Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01qed: Prevent warning without CONFIG_RFS_ACCELMintz, Yuval1-0/+2
After removing the PTP related initialization from slowpath start, the remaining PTT entry is required only in case CONFIG_RFS_ACCEL is set. Otherwise, it leads to a warning due to it being unused. Fixes: d179bd1699fc ("qed: Acquire/release ptt_ptp lock when enabling/disabling PTP") Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01Merge branch 'qed-RoCE-fixes'David S. Miller5-37/+101
Yuval Mintz says: ==================== qed: RoCE related pseudo-fixes This series contains multiple small corrections to the RoCE logic in qed plus some debug information and inter-module parameter meant to prevent issues further along. - #1, #6 Share information with protocol driver [either new or filling missing bits in existing API]. - #2, #3 correct error flows in qed. - #4 add debug related information. - #5 fixes a minor issue in the HW configuration. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01qed: output the DPM status and WID countRam Amrani4-1/+10
Output to the RDMA driver whether DPM mode is enabled or disabled in the HW and if so what is the number of WIDs it supports Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01qed: align DPI configuration to HW requirementsRam Amrani2-7/+6
When calculating doorbell BAR partitioning round up the number of CPUs to the nearest power of 2 so the size of the DPI (per user section) configured in the hardware will be stored properly and not truncated. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01qed: verify RoCE resource bitmaps are releasedRam Amrani2-27/+80
Add mechanism to verify RoCE resources are released prior to freeing the bitmaps. If this is not the case, print what resources were not released. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01qed: add error handling flow to TID deregistratin posting failureRam Amrani1-0/+2
If the posting of the ramrod for the purpose of TID deregistration fails, abort the deregistration operation without using the FW's return code. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01qed: remove unused SQ error stateRam Amrani1-2/+1
The internal RoCE SQE QP state isn't being used. Instead we mark the QP as in regular error state. Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01qed: configure the RoCE max message sizeRam Amrani1-0/+2
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01bpf: enhance verifier to understand stack pointer arithmeticYonghong Song2-6/+23
llvm 4.0 and above generates the code like below: .... 440: (b7) r1 = 15 441: (05) goto pc+73 515: (79) r6 = *(u64 *)(r10 -152) 516: (bf) r7 = r10 517: (07) r7 += -112 518: (bf) r2 = r7 519: (0f) r2 += r1 520: (71) r1 = *(u8 *)(r8 +0) 521: (73) *(u8 *)(r2 +45) = r1 .... and the verifier complains "R2 invalid mem access 'inv'" for insn #521. This is because verifier marks register r2 as unknown value after #519 where r2 is a stack pointer and r1 holds a constant value. Teach verifier to recognize "stack_ptr + imm" and "stack_ptr + reg with const val" as valid stack_ptr with new offset. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01benet: Use time_before_eq for time comparisonKarim Eshapa1-3/+3
Use time_before_eq for time comparison more safe and dealing with timer wrapping to be future-proof. Signed-off-by: Karim Eshapa <karim.eshapa@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01flower: check unused bits in MPLS fieldsBenjamin LaHaise1-10/+22
Since several of the the netlink attributes used to configure the flower classifier's MPLS TC, BOS and Label fields have additional bits which are unused, check those bits to ensure that they are actually 0 as suggested by Jamal. Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com> Cc: David Miller <davem@davemloft.net> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Simon Horman <simon.horman@netronome.com> Cc: Jakub Kicinski <kubakici@wp.pl> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller113-836/+853
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter updates for your net-next tree. A large bunch of code cleanups, simplify the conntrack extension codebase, get rid of the fake conntrack object, speed up netns by selective synchronize_net() calls. More specifically, they are: 1) Check for ct->status bit instead of using nfct_nat() from IPVS and Netfilter codebase, patch from Florian Westphal. 2) Use kcalloc() wherever possible in the IPVS code, from Varsha Rao. 3) Simplify FTP IPVS helper module registration path, from Arushi Singhal. 4) Introduce nft_is_base_chain() helper function. 5) Enforce expectation limit from userspace conntrack helper, from Gao Feng. 6) Add nf_ct_remove_expect() helper function, from Gao Feng. 7) NAT mangle helper function return boolean, from Gao Feng. 8) ctnetlink_alloc_expect() should only work for conntrack with helpers, from Gao Feng. 9) Add nfnl_msg_type() helper function to nfnetlink to build the netlink message type. 10) Get rid of unnecessary cast on void, from simran singhal. 11) Use seq_puts()/seq_putc() instead of seq_printf() where possible, also from simran singhal. 12) Use list_prev_entry() from nf_tables, from simran signhal. 13) Remove unnecessary & on pointer function in the Netfilter and IPVS code. 14) Remove obsolete comment on set of rules per CPU in ip6_tables, no longer true. From Arushi Singhal. 15) Remove duplicated nf_conntrack_l4proto_udplite4, from Gao Feng. 16) Remove unnecessary nested rcu_read_lock() in __nf_nat_decode_session(). Code running from hooks are already guaranteed to run under RCU read side. 17) Remove deadcode in nf_tables_getobj(), from Aaron Conole. 18) Remove double assignment in nf_ct_l4proto_pernet_unregister_one(), also from Aaron. 19) Get rid of unsed __ip_set_get_netlink(), from Aaron Conole. 20) Don't propagate NF_DROP error to userspace via ctnetlink in __nf_nat_alloc_null_binding() function, from Gao Feng. 21) Revisit nf_ct_deliver_cached_events() to remove unnecessary checks, from Gao Feng. 22) Kill the fake untracked conntrack objects, use ctinfo instead to annotate a conntrack object is untracked, from Florian Westphal. 23) Remove nf_ct_is_untracked(), now obsolete since we have no conntrack template anymore, from Florian. 24) Add event mask support to nft_ct, also from Florian. 25) Move nf_conn_help structure to include/net/netfilter/nf_conntrack_helper.h. 26) Add a fixed 32 bytes scratchpad area for conntrack helpers. Thus, we don't deal with variable conntrack extensions anymore. Make sure userspace conntrack helper doesn't go over that size. Remove variable size ct extension infrastructure now this code got no more clients. From Florian Westphal. 27) Restore offset and length of nf_ct_ext structure to 8 bytes now that wraparound is not possible any longer, also from Florian. 28) Allow to get rid of unassured flows under stress in conntrack, this applies to DCCP, SCTP and TCP protocols, from Florian. 29) Shrink size of nf_conntrack_ecache structure, from Florian. 30) Use TCP_MAX_WSCALE instead of hardcoded 14 in TCP tracker, from Gao Feng. 31) Register SYNPROXY hooks on demand, from Florian Westphal. 32) Use pernet hook whenever possible, instead of global hook registration, from Florian Westphal. 33) Pass hook structure to ebt_register_table() to consolidate some infrastructure code, from Florian Westphal. 34) Use consume_skb() and return NF_STOLEN, instead of NF_DROP in the SYNPROXY code, to make sure device stats are not fooled, patch from Gao Feng. 35) Remove NF_CT_EXT_F_PREALLOC this kills quite some code that we don't need anymore if we just select a fixed size instead of expensive runtime time calculation of this. From Florian. 36) Constify nf_ct_extend_register() and nf_ct_extend_unregister(), from Florian. 37) Simplify nf_ct_ext_add(), this kills nf_ct_ext_create(), from Florian. 38) Attach NAT extension on-demand from masquerade and pptp helper path, from Florian. 39) Get rid of useless ip_vs_set_state_timeout(), from Aaron Conole. 40) Speed up netns by selective calls of synchronize_net(), from Florian Westphal. 41) Silence stack size warning gcc in 32-bit arch in snmp helper, from Florian. 42) Inconditionally call nf_ct_ext_destroy(), even if we have no extensions, to deal with the NF_NAT_MANIP_SRC case. Patch from Liping Zhang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01Merge branch 'bpf-samples-skb_mode-bug-fixes'David S. Miller4-11/+12
Jesper Dangaard Brouer says: ==================== samples/bpf: two bug fixes to XDP_FLAGS_SKB_MODE attaching Two small bugfixes for: commit 3993f2cb983b ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel") ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01samples/bpf: fix XDP_FLAGS_SKB_MODE detach for xdp_tx_iptunnelJesper Dangaard Brouer1-2/+2
The xdp_tx_iptunnel program can be terminated in two ways, after N-seconds or via Ctrl-C SIGINT. The SIGINT code path does not handle detatching the correct XDP program, in-case the program was attached with XDP_FLAGS_SKB_MODE. Fix this by storing the XDP flags as a global variable, which is available for the SIGINT handler function. Fixes: 3993f2cb983b ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01samples/bpf: fix SKB_MODE flag to be a 32-bit unsigned intJesper Dangaard Brouer4-10/+11
The kernel side of XDP_FLAGS_SKB_MODE is unsigned, and the rtnetlink IFLA_XDP_FLAGS is defined as NLA_U32. Thus, userspace programs under samples/bpf/ should use the correct type. Fixes: 3993f2cb983b ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01Merge branch 'xdp-netlink-ext-ack'David S. Miller8-24/+52
Jakub Kicinski says: ==================== xdp: use netlink extended ACK reporting This series is an attempt to make XDP more user friendly by enabling exploiting the recently added netlink extended ACK reporting to carry messages to user space. David Ahern's iproute2 ext ack patches for ip link are sufficient to show the errors like this: Error: nfp: MTU too large w/ XDP enabled Where the message is coming directly from the driver. There could still be a bit of a leap for a complete novice from the message above to the right settings, but it's a big improvement over the standard "Invalid argument" message. v1/non-rfc: - add a separate macro in patch 1; - add KBUILD_MODNAME as part of the message (Daniel); - don't print the error to logs in patch 1. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01virtio_net: make use of extended ack message reportingJakub Kicinski1-4/+7
Try to carry error messages to the user via the netlink extended ack message attribute. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01nfp: make use of extended ack message reportingJakub Kicinski3-12/+17
Try to carry error messages to the user via the netlink extended ack message attribute. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01xdp: propagate extended ack to XDP setupJakub Kicinski3-8/+20
Drivers usually have a number of restrictions for running XDP - most common being buffer sizes, LRO and number of rings. Even though some drivers try to be helpful and print error messages experience shows that users don't often consult kernel logs on netlink errors. Try to use the new extended ack mechanism to carry the message back to user space. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01netlink: add NULL-friendly helper for setting extended ACK messageJakub Kicinski1-0/+8
As we propagate extended ack reporting throughout various paths in the kernel it may be that the same function is called with the extended ack parameter passed as NULL. One place where that happens is in drivers which have a centralized reconfiguration function called both from ndos and from ethtool_ops. Add a new helper for setting the error message in such conditions. Existing helper is left as is to encourage propagating the ext act fully wherever possible. It also makes it clear in the code which messages may be lost due to ext ack being NULL. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01netfilter: nf_ct_ext: invoke destroy even when ext is not attachedLiping Zhang2-12/+3
For NF_NAT_MANIP_SRC, we will insert the ct to the nat_bysource_table, then remove it from the nat_bysource_table via nat_extend->destroy. But now, the nat extension is attached on demand, so if the nat extension is not attached, we will not be notified when the ct is destroyed, i.e. we may fail to remove ct from the nat_bysource_table. So just keep it simple, even if the extension is not attached, we will still invoke the related ext->destroy. And this will also preserve the flexibility for the future extension. Fixes: 9a08ecfe74d7 ("netfilter: don't attach a nat extension by default") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-05-01Merge tag 'ipvs3-for-v4.12' of ↵Pablo Neira Ayuso3-25/+1
http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next Simon Horman says: ==================== Third Round of IPVS Updates for v4.12 please consider these enhancements to IPVS for v4.12. If it is too late for v4.12 then please consider them for v4.13. * Remove unused function * Correct comparison of unsigned value ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>