summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2019-05-03ether: Add dedicated Ethertype for pseudo-802.1Q DSA taggingVladimir Oltean1-0/+1
There are two possible utilizations so far: - Switch devices that don't support a native insertion/extraction header on the CPU port may still enjoy the benefits of port isolation with a custom VLAN tag. For this, they need to have a customizable TPID in hardware and a new Ethertype to distinguish between real 802.1Q traffic and the private tags used for port separation. - Switches that don't support the deactivation of VLAN awareness, but still want to have a mode in which they accept all traffic, including frames that are tagged with a VLAN not configured on their ports, may use this as a fake to trick the hardware into thinking that the TPID for VLAN is something other than 0x8100. What follows after the ETH_P_DSA_8021Q EtherType is a regular VLAN header (TCI), however there is no other EtherType that can be used for this purpose and doesn't already have a well-defined meaning. ETH_P_8021AD, ETH_P_QINQ1, ETH_P_QINQ2 and ETH_P_QINQ3 expect that another follow-up VLAN tag is present, which is not the case here. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Suggested-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-03net: dsa: sja1105: Error out if RGMII delays are requested in DTVladimir Oltean3-1/+41
Documentation/devicetree/bindings/net/ethernet.txt is confusing because it says what the MAC should not do, but not what it *should* do: * "rgmii-rxid" (RGMII with internal RX delay provided by the PHY, the MAC should not add an RX delay in this case) The gap in semantics is threefold: 1. Is it illegal for the MAC to apply the Rx internal delay by itself, and simplify the phy_mode (mask off "rgmii-rxid" into "rgmii") before passing it to of_phy_connect? The documentation would suggest yes. 1. For "rgmii-rxid", while the situation with the Rx clock skew is more or less clear (needs to be added by the PHY), what should the MAC driver do about the Tx delays? Is it an implicit wild card for the MAC to apply delays in the Tx direction if it can? What if those were already added as serpentine PCB traces, how could that be made more obvious through DT bindings so that the MAC doesn't attempt to add them twice and again potentially break the link? 3. If the interface is a fixed-link and therefore the PHY object is fixed (a purely software entity that obviously cannot add clock skew), what is the meaning of the above property? So an interpretation of the RGMII bindings was chosen that hopefully does not contradict their intention but also makes them more applied. The SJA1105 driver understands to act upon "rgmii-*id" phy-mode bindings if the port is in the PHY role (either explicitly, or if it is a fixed-link). Otherwise it always passes the duty of setting up delays to the PHY driver. The error behavior that this patch adds is required on SJA1105E/T where the MAC really cannot apply internal delays. If the other end of the fixed-link cannot apply RGMII delays either (this would be specified through its own DT bindings), then the situation requires PCB delays. For SJA1105P/Q/R/S, this is however hardware supported and the error is thus only temporary. I created a stub function pointer for configuring delays per-port on RXC and TXC, and will implement it when I have access to a board with this hardware setup. Meanwhile do not allow the user to select an invalid configuration. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-03net: dsa: sja1105: Add support for FDB and MDB managementVladimir Oltean3-0/+239
Currently only the (more difficult) first generation E/T series is supported. Here the TCAM is only 4-way associative, and to know where the hardware will search for a FDB entry, we need to perform the same hash algorithm in order to install the entry in the correct bin. On P/Q/R/S, the TCAM should be fully associative. However the SPI command interface is different, and because I don't have access to a new-generation device at the moment, support for it is TODO. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-03net: dsa: Introduce driver for NXP SJA1105 5-port L2 switchVladimir Oltean14-0/+4000
At this moment the following is supported: * Link state management through phylib * Autonomous L2 forwarding managed through iproute2 bridge commands. IP termination must be done currently through the master netdevice, since the switch is unmanaged at this point and using DSA_TAG_PROTO_NONE. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Georg Waibel <georg.waibel@sensor-technik.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-03lib: Add support for generic packing operationsVladimir Oltean6-0/+437
This provides an unified API for accessing register bit fields regardless of memory layout. The basic unit of data for these API functions is the u64. The process of transforming an u64 from native CPU encoding into the peripheral's encoding is called 'pack', and transforming it from peripheral to native CPU encoding is 'unpack'. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-03net: macb: shrink macb_platform_data structureNicolas Ferre2-57/+11
This structure was used intensively for machine specific values when DT was not used. Since the removal of AVR32 from the kernel, this structure is only used for passing clocks from PCI macb wrapper, all other fields being 0. All other known platforms use DT. Remove the leftovers but make sure that PCI macb still works as expected by using default values: - phydev->irq is set to PHY_POLL by mdiobus_alloc() - mii_bus->phy_mask is cleared while allocating it - bp->phy_interface is set to PHY_INTERFACE_MODE_MII if mode not found in DT. This simplifies driver probe path and particularly phy handling. Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-03net: macb: remove redundant struct phy_device declarationNicolas Ferre1-2/+0
While moving the chunk of code during 739de9a1563a ("net: macb: Reorganize macb_mii bringup"), the declaration of struct phy_device declaration was kept. It's not useful in this function as we alrady have a phydev pointer. Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller167-725/+1373
Three trivial overlapping conflicts. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds67-289/+544
Pull networking fixes from David Miller: 1) Out of bounds access in xfrm IPSEC policy unlink, from Yue Haibing. 2) Missing length check for esp4 UDP encap, from Sabrina Dubroca. 3) Fix byte order of RX STBC access in mac80211, from Johannes Berg. 4) Inifnite loop in bpftool map create, from Alban Crequy. 5) Register mark fix in ebpf verifier after pkt/null checks, from Paul Chaignon. 6) Properly use rcu_dereference_sk_user_data in L2TP code, from Eric Dumazet. 7) Buffer overrun in marvell phy driver, from Andrew Lunn. 8) Several crash and statistics handling fixes to bnxt_en driver, from Michael Chan and Vasundhara Volam. 9) Several fixes to the TLS layer from Jakub Kicinski (copying negative amounts of data in reencrypt, reencrypt frag copying, blind nskb->sk NULL deref, etc). 10) Several UDP GRO fixes, from Paolo Abeni and Eric Dumazet. 11) PID/UID checks on ipv6 flow labels are inverted, from Willem de Bruijn. 12) Use after free in l2tp, from Eric Dumazet. 13) IPV6 route destroy races, also from Eric Dumazet. 14) SCTP state machine can erroneously run recursively, fix from Xin Long. 15) Adjust AF_PACKET msg_name length checks, add padding bytes if necessary. From Willem de Bruijn. 16) Preserve skb_iif, so that forwarded packets have consistent values even if fragmentation is involved. From Shmulik Ladkani. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits) udp: fix GRO packet of death ipv6: A few fixes on dereferencing rt->from rds: ib: force endiannes annotation selftests: fib_rule_tests: print the result and return 1 if any tests failed ipv4: ip_do_fragment: Preserve skb_iif during fragmentation net/tls: avoid NULL pointer deref on nskb->sk in fallback selftests: fib_rule_tests: Fix icmp proto with ipv6 packet: validate msg_namelen in send directly packet: in recvmsg msg_name return at least sizeof sockaddr_ll sctp: avoid running the sctp state machine recursively stmmac: pci: Fix typo in IOT2000 comment Documentation: fix netdev-FAQ.rst markup warning ipv6: fix races in ip6_dst_destroy() l2ip: fix possible use-after-free appletalk: Set error code if register_snap_client failed net: dsa: bcm_sf2: fix buffer overflow doing set_rxnfc rxrpc: Fix net namespace cleanup ipv6/flowlabel: wait rcu grace period before put_pid() vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach tcp: add sanity tests in tcp_add_backlog() ...
2019-05-02Merge tag 'for-linus-20190502' of git://git.kernel.dk/linux-blockLinus Torvalds3-93/+161
Pull io_uring fixes from Jens Axboe: "This is mostly io_uring fixes/tweaks. Most of these were actually done in time for the last -rc, but I wanted to ensure that everything tested out great before including them. The code delta looks larger than it really is, as it's mostly just comment additions/changes. Outside of the comment additions/changes, this is mostly removal of unnecessary barriers. In all, this pull request contains: - Tweak to how we handle errors at submission time. We now post a completion event if the error occurs on behalf of an sqe, instead of returning it through the system call. If the error happens outside of a specific sqe, we return the error through the system call. This makes it nicer to use and makes the "normal" use case behave the same as the offload cases. (me) - Fix for a missing req reference drop from async context (me) - If an sqe is submitted with RWF_NOWAIT, don't punt it to async context. Return -EAGAIN directly, instead of using it as a hint to do async punt. (Stefan) - Fix notes on barriers (Stefan) - Remove unnecessary barriers (Stefan) - Fix potential double free of memory in setup error (Mark) - Further improve sq poll CPU validation (Mark) - Fix page allocation warning and leak on buffer registration error (Mark) - Fix iov_iter_type() for new no-ref flag (Ming) - Fix a case where dio doesn't honor bio no-page-ref (Ming)" * tag 'for-linus-20190502' of git://git.kernel.dk/linux-block: io_uring: avoid page allocation warnings iov_iter: fix iov_iter_type block: fix handling for BIO_NO_PAGE_REF io_uring: drop req submit reference always in async punt io_uring: free allocated io_memory once io_uring: fix SQPOLL cpu validation io_uring: have submission side sqe errors post a cqe io_uring: remove unnecessary barrier after unsetting IORING_SQ_NEED_WAKEUP io_uring: remove unnecessary barrier after incrementing dropped counter io_uring: remove unnecessary barrier before reading SQ tail io_uring: remove unnecessary barrier after updating SQ head io_uring: remove unnecessary barrier before reading cq head io_uring: remove unnecessary barrier before wq_has_sleeper io_uring: fix notes on barriers io_uring: fix handling SQEs requesting NOWAIT
2019-05-02Merge tag 'pci-v5.1-fixes-3' of ↵Linus Torvalds5-4/+32
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "I apologize for sending these so late in the cycle. We went back and forth about how to deal with the unexpected logging of intentional link state changes and finally decided to just config them off by default. PCI fixes: - Stop ignoring "pci=disable_acs_redir" parameter (Logan Gunthorpe) - Use shared MSI/MSI-X vector for Link Bandwidth Management (Alex Williamson) - Add Kconfig option for Link Bandwidth notification messages (Keith Busch)" * tag 'pci-v5.1-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI/LINK: Add Kconfig option (default off) PCI/portdrv: Use shared MSI/MSI-X vector for Bandwidth Management PCI: Fix issue with "pci=disable_acs_redir" parameter being ignored
2019-05-02Merge tag 'mtd/fixes-for-5.1-rc6' of ↵Linus Torvalds1-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull MTD fix from Richard Weinberger: "A single regression fix for the marvell nand driver" * tag 'mtd/fixes-for-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: mtd: rawnand: marvell: Clean the controller state before each operation
2019-05-02PCI/LINK: Add Kconfig option (default off)Keith Busch3-1/+13
e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") added dmesg logging whenever a link changes speed or width to a state that is considered degraded. Unfortunately, it cannot differentiate signal integrity-related link changes from those intentionally initiated by an endpoint driver, including drivers that may live in userspace or VMs when making use of vfio-pci. Some GPU drivers actively manage the link state to save power, which generates a stream of messages like this: vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link) Since we can't distinguish the intentional changes from the signal integrity issues, leave the reporting turned off by default. Add a Kconfig option to turn it on if desired. Fixes: e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth notification") Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2019-05-02net: ll_temac: Fix typo bug for 32-bitEsben Haabendal1-1/+1
Fixes: d84aec42151b ("net: ll_temac: Fix support for 64-bit platforms") Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-02ice: Use dev_err when ice_cfg_vsi_lan failsBrett Creeley1-3/+6
dev_err makes more sense than dev_info when this call fails. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-02ice: Refactor link event flowBrett Creeley2-48/+65
Currently the link event flow works, but can be much better. Refactor the link event flow to make it cleaner and more clear on what is going on. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-02ice: Add missing PHY type to link settingsTony Nguyen1-0/+1
The PHY type ICE_PHY_TYPE_LOW_25G_AUI_C2C is missing from ice_get_settings_link_up() which is causing a warning message for unrecognized PHY. Add the PHY type to correctly set the settings and avoid the warning message. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-02ice: Add reg_idx variable in ice_q_vector structureBrett Creeley4-26/+76
Every time we want to re-enable interrupts and/or write to a register that requires an interrupt vector's hardware index we do the following: vsi->hw_base_vector + q_vector->v_idx This is a wasteful operation, especially in the hot path. Fix this by adding a u16 reg_idx member to the ice_q_vector structure and make the necessary changes to make this work. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-02ice: Remove runtime change of PFINT_OICR_ENA registerMd Fahad Iqbal Polash2-23/+3
Runtime change of PFINT_OICR_ENA register is unnecessary. The handlers should always clear the atomic bit for each task as they start, because it will make sure that any late interrupt will either 1) re-set the bit, or 2) be handled directly in the "already running" task handler. Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-02ice: Fix issue when adding more than allowed VLANsAkeem G Abodunrin2-7/+21
This patch fixes issue with non trusted VFs being able to add more than permitted number of VLANs by adding a check in ice_vc_process_vlan_msg. Also don't return an error in this case as the VF does not need to know that it is not trusted. Also rework ice_vsi_kill_vlan to use the right types. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-02ice: Remove unnecessary wait when disabling/enabling Rx queuesBrett Creeley1-8/+2
In ice_vsi_ctrl_rx_rings() we are unnecessarily waiting for QRX_CTRL_QENA_REQ and QRX_CTRL_QENA_STAT to be the same value prior to disabling each Rx queue. There is no reason to do this so remove this wait loop as we already have a wait loop after disabling/enabling the Rx queue through the QRX_CTRL register to make sure it gets successfully disabled/enabled. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-02ice: Add ability to update rx-usecs-highBrett Creeley4-2/+33
Currently the driver allows rx-usecs-high values to be set, but when querying the device for rx-usecs-high the value does not stick. This is because it was not yet implemented. Add code to allow the user to change rx-usecs-high and use this to set the q_vector's intrl value. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-02ice: Add 52 byte RSS hash key supportPaul Greenwalt2-7/+8
Add support to set 52 byte RSS hash key. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-02ice: Use ice_for_each_q_vector macro where possibleBrett Creeley2-8/+8
There are many places in the code where we do the following: for (i = 0; i < vsi->num_q_vectors; i++) Instead use the macro mentioned in the commit title: ice_for_each_q_vector(vsi, i) Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-02ice: Validate ring existence and its q_vector per VSIMaciej Fijalkowski1-1/+2
When stopping Tx rings, we use 'i' as an ring array index for looking up whether the ice_ring exists and have assigned a q_vector. This checks rings only within a given TC and we need to go through every ring in VSI. Use 'q_idx' instead. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-02ice: Reduce scope of variable in ice_vsi_cfg_rxqsBrett Creeley1-7/+11
Reduce scope of the variable 'err' to inside the for loop instead of using it as a second looping conditional. Also while here, improve the debug message if we fail to configure a Rx queue. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-02ice: Resolve static analysis reported issueBruce Allan1-10/+2
Static analysis points out the default case in the switch statement in ice_get_itr_intrl_gran() is an infeasible condition causing the default case statement to be unreachable. Remove it and since the function no longer returns anything but success, change it to just return void and update the only call to it accordingly. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-02ice: Return configuration error without queue to disableAkeem G Abodunrin1-7/+10
If there is no queue to disable, return appropriate configuration error earlier without acquiring the lock. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-02ice: Create framework for VSI queue contextAnirudh Venkataramanan7-56/+205
This patch introduces a framework to store queue specific information in VSI queue contexts. Currently VSI queue context (represented by struct ice_q_ctx) only has q_handle as a member. In future patches, this structure will be updated to hold queue specific information. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-01udp: fix GRO packet of deathEric Dumazet1-3/+10
syzbot was able to crash host by sending UDP packets with a 0 payload. TCP does not have this issue since we do not aggregate packets without payload. Since dev_gro_receive() sets gso_size based on skb_gro_len(skb) it seems not worth trying to cope with padded packets. BUG: KASAN: slab-out-of-bounds in skb_gro_receive+0xf5f/0x10e0 net/core/skbuff.c:3826 Read of size 16 at addr ffff88808893fff0 by task syz-executor612/7889 CPU: 0 PID: 7889 Comm: syz-executor612 Not tainted 5.1.0-rc7+ #96 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187 kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 __asan_report_load16_noabort+0x14/0x20 mm/kasan/generic_report.c:133 skb_gro_receive+0xf5f/0x10e0 net/core/skbuff.c:3826 udp_gro_receive_segment net/ipv4/udp_offload.c:382 [inline] call_gro_receive include/linux/netdevice.h:2349 [inline] udp_gro_receive+0xb61/0xfd0 net/ipv4/udp_offload.c:414 udp4_gro_receive+0x763/0xeb0 net/ipv4/udp_offload.c:478 inet_gro_receive+0xe72/0x1110 net/ipv4/af_inet.c:1510 dev_gro_receive+0x1cd0/0x23c0 net/core/dev.c:5581 napi_gro_frags+0x36b/0xd10 net/core/dev.c:5843 tun_get_user+0x2f24/0x3fb0 drivers/net/tun.c:1981 tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2027 call_write_iter include/linux/fs.h:1866 [inline] do_iter_readv_writev+0x5e1/0x8e0 fs/read_write.c:681 do_iter_write fs/read_write.c:957 [inline] do_iter_write+0x184/0x610 fs/read_write.c:938 vfs_writev+0x1b3/0x2f0 fs/read_write.c:1002 do_writev+0x15e/0x370 fs/read_write.c:1037 __do_sys_writev fs/read_write.c:1110 [inline] __se_sys_writev fs/read_write.c:1107 [inline] __x64_sys_writev+0x75/0xb0 fs/read_write.c:1107 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x441cc0 Code: 05 48 3d 01 f0 ff ff 0f 83 9d 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 83 3d 51 93 29 00 00 75 14 b8 14 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 74 09 fc ff c3 48 83 ec 08 e8 ba 2b 00 00 RSP: 002b:00007ffe8c716118 EFLAGS: 00000246 ORIG_RAX: 0000000000000014 RAX: ffffffffffffffda RBX: 00007ffe8c716150 RCX: 0000000000441cc0 RDX: 0000000000000001 RSI: 00007ffe8c716170 RDI: 00000000000000f0 RBP: 0000000000000000 R08: 000000000000ffff R09: 0000000000a64668 R10: 0000000020000040 R11: 0000000000000246 R12: 000000000000c2d9 R13: 0000000000402b50 R14: 0000000000000000 R15: 0000000000000000 Allocated by task 5143: save_stack+0x45/0xd0 mm/kasan/common.c:75 set_track mm/kasan/common.c:87 [inline] __kasan_kmalloc mm/kasan/common.c:497 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470 kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:505 slab_post_alloc_hook mm/slab.h:437 [inline] slab_alloc mm/slab.c:3393 [inline] kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3555 mm_alloc+0x1d/0xd0 kernel/fork.c:1030 bprm_mm_init fs/exec.c:363 [inline] __do_execve_file.isra.0+0xaa3/0x23f0 fs/exec.c:1791 do_execveat_common fs/exec.c:1865 [inline] do_execve fs/exec.c:1882 [inline] __do_sys_execve fs/exec.c:1958 [inline] __se_sys_execve fs/exec.c:1953 [inline] __x64_sys_execve+0x8f/0xc0 fs/exec.c:1953 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 5351: save_stack+0x45/0xd0 mm/kasan/common.c:75 set_track mm/kasan/common.c:87 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:459 kasan_slab_free+0xe/0x10 mm/kasan/common.c:467 __cache_free mm/slab.c:3499 [inline] kmem_cache_free+0x86/0x260 mm/slab.c:3765 __mmdrop+0x238/0x320 kernel/fork.c:677 mmdrop include/linux/sched/mm.h:49 [inline] finish_task_switch+0x47b/0x780 kernel/sched/core.c:2746 context_switch kernel/sched/core.c:2880 [inline] __schedule+0x81b/0x1cc0 kernel/sched/core.c:3518 preempt_schedule_irq+0xb5/0x140 kernel/sched/core.c:3745 retint_kernel+0x1b/0x2d arch_local_irq_restore arch/x86/include/asm/paravirt.h:767 [inline] kmem_cache_free+0xab/0x260 mm/slab.c:3766 anon_vma_chain_free mm/rmap.c:134 [inline] unlink_anon_vmas+0x2ba/0x870 mm/rmap.c:401 free_pgtables+0x1af/0x2f0 mm/memory.c:394 exit_mmap+0x2d1/0x530 mm/mmap.c:3144 __mmput kernel/fork.c:1046 [inline] mmput+0x15f/0x4c0 kernel/fork.c:1067 exec_mmap fs/exec.c:1046 [inline] flush_old_exec+0x8d9/0x1c20 fs/exec.c:1279 load_elf_binary+0x9bc/0x53f0 fs/binfmt_elf.c:864 search_binary_handler fs/exec.c:1656 [inline] search_binary_handler+0x17f/0x570 fs/exec.c:1634 exec_binprm fs/exec.c:1698 [inline] __do_execve_file.isra.0+0x1394/0x23f0 fs/exec.c:1818 do_execveat_common fs/exec.c:1865 [inline] do_execve fs/exec.c:1882 [inline] __do_sys_execve fs/exec.c:1958 [inline] __se_sys_execve fs/exec.c:1953 [inline] __x64_sys_execve+0x8f/0xc0 fs/exec.c:1953 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff88808893f7c0 which belongs to the cache mm_struct of size 1496 The buggy address is located 600 bytes to the right of 1496-byte region [ffff88808893f7c0, ffff88808893fd98) The buggy address belongs to the page: page:ffffea0002224f80 count:1 mapcount:0 mapping:ffff88821bc40ac0 index:0xffff88808893f7c0 compound_mapcount: 0 flags: 0x1fffc0000010200(slab|head) raw: 01fffc0000010200 ffffea00025b4f08 ffffea00027b9d08 ffff88821bc40ac0 raw: ffff88808893f7c0 ffff88808893e440 0000000100000001 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88808893fe80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88808893ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88808893ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff888088940000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff888088940080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-01Merge tag 'for-v5.1-rc' of ↵Linus Torvalds2-6/+3
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply Pull power supply fixes from Sebastian Reichel: "Two more fixes for the 5.1 cycle. One division by zero fix in a specific driver and one core workaround for bad userspace behaviour from systemd regarding uevents. IMHO this can be considered to be a userspace bug, but the debug messages are useless anyways - cpcap-battery: fix a division by zero - core: fix systemd issue due to log messages produced by uevent" * tag 'for-v5.1-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: power: supply: sysfs: prevent endless uevent loop with CONFIG_POWER_SUPPLY_DEBUG power: supply: cpcap-battery: Fix division by zero
2019-05-01net/mlx5: E-Switch, Use atomic rep state to serialize state changeBodong Wang2-20/+18
When the state of rep was introduced, it was also designed to prevent duplicate unloading of the same rep. Considering the following two flows when an eswitch manager is at switchdev mode with n VF reps loaded. +--------------------------------------+--------------------------------+ | cpu-0 | cpu-1 | | -------- | -------- | | mlx5_ib_remove | mlx5_eswitch_disable_sriov | | mlx5_ib_unregister_vport_reps | esw_offloads_cleanup | | mlx5_eswitch_unregister_vport_reps | esw_offloads_unload_all_reps | | __unload_reps_all_vport | __unload_reps_all_vport | +--------------------------------------+--------------------------------+ These two flows will try to unload the same rep. Per original design, once one flow unloads the rep, the state moves to REGISTERED. The 2nd flow will no longer needs to do the unload and bails out. However, as read and write of the state is not atomic, when 1st flow is doing the unload, the state is still LOADED, 2nd flow is able to do the same unload action. Kernel crash will happen. To solve this, driver should do atomic test-and-set for the state. So that only one flow can change the rep state from LOADED to REGISTERED, and proceed to do the actual unloading. Since the state is changing to atomic type, all other read/write should be atomic action as well. Fixes: f121e0ea9586 (net/mlx5: E-Switch, Add state to eswitch vport representors) Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01net/mlx5: E-Switch, Fix the check of legal vportBodong Wang2-45/+46
The check of legal vport is to ensure the vport number falls between 0 and total number of vports. Along with the introduction of uplink rep, enabled vports are not consecutive any more. Therefore, rely on the eswitch vport getter function to check if it's a valid vport. As the getter function relies on eswitch, add the check of vport group manager and validation the presence of eswitch structure. Remove the redundant check in the function calls. Since the vport array will be allocated once eswitch is initialized and will be kept alive if eswitch presents, no need to protect it with the state lock. Fixes: 5ae5162066d8 ("net/mlx5: E-Switch, Assign a different position for uplink rep and vport") Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01net/mlx5: E-Switch, Use getter to access all vport arrayBodong Wang2-9/+11
Some functions issue vport commands and access vport array using vport_index/vport_num interchangeably which is OK for VFs vports. However, this creates potential bug if those vports are not VFs (E.g, uplink, sf) where their vport_index don't equal to vport_num. Prepare code to access mlx5_vport structure using a getter function. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01net/mlx5: Use available mlx5_vport structParav Pandit1-58/+51
Several functions need to access mlx5_vport and vport_num. When these functions are called, caller already has mlx5_vport* available. Hence pass such mlx5_vport pointer. This is preparation patch to add error checks to mlx5_eswitch_get_vport() and to return error status. By doing so, reduce places where error check of mlx5_eswitch_get_vport() can be avoided. While doing such change, mlx5_eswitch_query_vport_drop_stats() gets corrected to work on vport, instead of vport_idx. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01net/mlx5: Reuse mlx5_esw_for_each_vf_vport macro in two filesParav Pandit3-52/+58
Currently mlx5_esw_for_each_vf_vport iterates over mlx5_vport entries in eswitch.c Same macro in eswitch_offloads.c iterates over vport number in eswitch_offloads.c Instead of duplicate macro names, to avoid confusion and to reuse the same macro in both files, move it to eswitch.h. To iterate over vport numbers where there is no need to iterate over mlx5_vport, but only a vport number is needed, rename those macros in eswitch_offloads.c to mlx5_esw_for_each_vf_num_vport*. While at it, keep all vport and vport rep iterators together. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01net/mlx5: Remove unused mlx5_query_nic_vport_vlansBodong Wang2-65/+0
mlx5_query_nic_vport_vlans() is not used anymore. Hence remove it. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01net/mlx5e: remove meaningless CFLAGS_tracepoint.oMasahiro Yamada1-2/+0
CFLAGS_tracepoint.o specifies CFLAGS for compiling tracepoint.c but it does not exist under drivers/net/ethernet/mellanox/mlx5/core/. CFLAGS_tracepoint.o is unused. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01net/mlx5e: Put the common XDP code into a functionMaxim Mikityanskiy1-36/+27
The same code that returns XDP frames and releases pages is used both in mlx5e_poll_xdpsq_cq and mlx5e_free_xdpsq_descs. Create a function that cleans up an MPWQE. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01net/mlx5e: ethtool, Add support for EEPROM high pages queryErez Alfasi3-10/+39
Add the support to read additional EEPROM information from high pages. Information for modules such as SFF-8436 and SFF-8636: 1) Application select table 2) User writable EEPROM 3) Thresholds and alarms Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01ethtool: Add SFF-8436 and SFF-8636 max EEPROM length definitionsErez Alfasi1-0/+3
Added max EEPROM length defines for ethtool usage: #define ETH_MODULE_SFF_8636_MAX_LEN 640 #define ETH_MODULE_SFF_8436_MAX_LEN 640 These definitions are used to determine the EEPROM data length when reading high eeprom pages. For example, SFF-8636 EEPROM data from page 03h needs to be stored at data[512] - data[639]. Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01net/mlx5e: Return error when trying to insert existing flower filterVlad Buslov1-0/+1
With unlocked TC it is possible to have spurious deletes and inserts of same filter. TC layer needs drivers to always return error when flow insertion failed in order to correctly calculate "in_hw_count" for each filter. Fix mlx5e_configure_flower() to return -EEXIST when TC tries to insert a filter that is already provisioned to the driver. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01net/mlx5e: Replace TC VLAN pop with VLAN 0 rewrite in prio tag modeEli Britstein1-0/+36
Current ConnectX HW is unable to perform VLAN pop in TX path and VLAN push on RX path. To workaround that limitation untagged packets are tagged with VLAN ID 0x000 (priority tag) and pop/push actions are replaced by VLAN re-write actions (which are supported by the HW). Replace TC VLAN pop action with a VLAN priority tag header rewrite. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01net/mlx5e: ACLs for priority tag modeEli Britstein3-12/+193
Current ConnectX HW is unable to perform VLAN pop in TX path and VLAN push on RX path. As a workaround, untagged packets are tagged with VID 0x000 allowing pop/push actions to be exchanged with VLAN rewrite actions. Use the ingress ACL table, preceding the FDB, to push VLAN 0x000 ID tag for untagged packets and the egress ACL table, succeeding the FDB, to pop VLAN 0x000 ID tag. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01net/mlx5e: Turn on HW tunnel offload in all TIRsTariq Toukan4-1/+7
Hardware requires that all TIRs that steer traffic to the same RQ should share identical tunneled_offload_en value. For that, the tunneled_offload_en bit should be set/unset (according to the HW capability) for all TIRs', not only the ones dedicated for tunneled (inner) traffic. Fixes: 1b223dd39162 ("net/mlx5e: Fix checksum handling for non-stripped vlan packets") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01net/mlx5e: Take common TIR context settings into a functionTariq Toukan1-28/+21
Many TIR context settings are common to different TIR types, take them into a common function. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-05-01ipv6: A few fixes on dereferencing rt->fromMartin KaFai Lau1-20/+18
It is a followup after the fix in commit 9c69a1320515 ("route: Avoid crash from dereferencing NULL rt->from") rt6_do_redirect(): 1. NULL checking is needed on rt->from because a parallel fib6_info delete could happen that sets rt->from to NULL. (e.g. rt6_remove_exception() and fib6_drop_pcpu_from()). 2. fib6_info_hold() is not enough. Same reason as (1). Meaning, holding dst->__refcnt cannot ensure rt->from is not NULL or rt->from->fib6_ref is not 0. Instead of using fib6_info_hold_safe() which ip6_rt_cache_alloc() is already doing, this patch chooses to extend the rcu section to keep "from" dereference-able after checking for NULL. inet6_rtm_getroute(): 1. NULL checking is also needed on rt->from for a similar reason. Note that inet6_rtm_getroute() is using RTNL_FLAG_DOIT_UNLOCKED. Fixes: a68886a69180 ("net/ipv6: Make from in rt6_info rcu protected") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Wei Wang <weiwan@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-01rds: ib: force endiannes annotationNicholas Mc Guire1-5/+3
While the endiannes is being handled correctly as indicated by the comment above the offending line - sparse was unhappy with the missing annotation as be64_to_cpu() expects a __be64 argument. To mitigate this annotation all involved variables are changed to a consistent __le64 and the conversion to uint64_t delayed to the call to rds_cong_map_updated(). Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-01Merge branch 'net-mvpp2-cls-Add-classification'David S. Miller4-84/+545
Maxime Chevallier says: ==================== net: mvpp2: cls: Add classification This series is a rework of the previously standalone patch adding classification support for mvpp2 : https://lore.kernel.org/netdev/20190423075031.26074-1-maxime.chevallier@bootlin.com/ This patch has been reworked according to Saeed's review, to make sure that the location of the rule is always respected and serves as a way to prioritize rules between each other. This the 3rd iteration of this submission, but since it's now a series, I reset the revision numbering. This series implements that in a limited configuration for now, since we limit the total number of rules per port to 4. The main factors for this limitation are that : - We share the classification tables between all ports (4 max, although one is only used for internal loopback), hence we have to perform a logical separation between rules, which is done today by dedicated ranges for each port in each table - The "Flow table", which dictates which lookups operations are performed for an ingress packet, in subdivided into 22 "sub flows", each corresponding to a traffic type based on the L3 proto, L4 proto, the presence or not of a VLAN tag and the L3 fragmentation. This makes so that when adding a rule, it has to be added into each of these subflows, introducing duplications of entries and limiting our max number of entries. These limitations can be overcomed in several ways, but for readability sake, I'd rather submit basic classification offload support for now, and improve it gradually. This series also adds a small cosmetic cleanup patch (1), and also adds support for the "Drop" action compared to the first submission of this feature. It is simple enough to be added with this basic support. Compared to the first submissions, the NETIF_F_NTUPLE flag was also removed, following Saeed's comment. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-01net: mvpp2: cls: Allow dropping packets with classification offloadMaxime Chevallier3-9/+32
This commit introduces support for the "Drop" action in classification offload. This corresponds to the "-1" action with ethtool -N. This is achieved using the color marking actions available in the C2 engine, which associate a color to a packet. These colors can be either Green, Yellow or Red, Red meaning that the packet should be dropped. Green and Yellow colors are interpreted by the Policer, which isn't supported yet. This method of dropping using the Classifier is different than the already existing early-drop features, such as VLAN filtering and MAC UC/MC filtering, which are performed during the Parsing step, and therefore take precedence over classification actions. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>