summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2019-05-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds87-680/+1765
Pull networking fixes from David Miller: 1) Fix OOPS during nf_tables rule dump, from Florian Westphal. 2) Use after free in ip_vs_in, from Yue Haibing. 3) Fix various kTLS bugs (NULL deref during device removal resync, netdev notification ignoring, etc.) From Jakub Kicinski. 4) Fix ipv6 redirects with VRF, from David Ahern. 5) Memory leak fix in igmpv3_del_delrec(), from Eric Dumazet. 6) Missing memory allocation failure check in ip6_ra_control(), from Gen Zhang. And likewise fix ip_ra_control(). 7) TX clean budget logic error in aquantia, from Igor Russkikh. 8) SKB leak in llc_build_and_send_ui_pkt(), from Eric Dumazet. 9) Double frees in mlx5, from Parav Pandit. 10) Fix lost MAC address in r8169 during PCI D3, from Heiner Kallweit. 11) Fix botched register access in mvpp2, from Antoine Tenart. 12) Use after free in napi_gro_frags(), from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (89 commits) net: correct zerocopy refcnt with udp MSG_MORE ethtool: Check for vlan etype or vlan tci when parsing flow_rule net: don't clear sock->sk early to avoid trouble in strparser net-gro: fix use-after-free read in napi_gro_frags() net: dsa: tag_8021q: Create a stable binary format net: dsa: tag_8021q: Change order of rx_vid setup net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value ipv4: tcp_input: fix stack out of bounds when parsing TCP options. mlxsw: spectrum: Prevent force of 56G mlxsw: spectrum_acl: Avoid warning after identical rules insertion net: dsa: mv88e6xxx: fix handling of upper half of STATS_TYPE_PORT r8169: fix MAC address being lost in PCI D3 net: core: support XDP generic on stacked devices. netvsc: unshare skb in VF rx handler udp: Avoid post-GRO UDP checksum recalculation net: phy: dp83867: Set up RGMII TX delay net: phy: dp83867: do not call config_init twice net: phy: dp83867: increase SGMII autoneg timer duration net: phy: dp83867: fix speed 10 in sgmii mode net: phy: marvell10g: report if the PHY fails to boot firmware ...
2019-05-30Merge tag 'arm64-fixes' of ↵Linus Torvalds6-30/+56
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "The fixes are still trickling in for arm64, but the only really significant one here is actually fixing a regression in the botched module relocation range checking merged for -rc2. Hopefully we've nailed it this time. - Fix implementation of our set_personality() system call, which wasn't being wrapped properly - Fix system call function types to keep CFI happy - Fix siginfo layout when delivering SIGKILL after a kernel fault - Really fix module relocation range checking" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: use the correct function type for __arm64_sys_ni_syscall arm64: use the correct function type in SYSCALL_DEFINE0 arm64: fix syscall_fn_t type signal/arm64: Use force_sig not force_sig_fault for SIGKILL arm64/module: revert to unsigned interpretation of ABS16/32 relocations arm64: Fix the arm64_personality() syscall wrapper redirection
2019-05-30Merge tag 'for-5.2-rc2-tag' of ↵Linus Torvalds7-48/+128
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more fixes for bugs reported by users, fuzzing tools and regressions: - fix crashes in relocation: + resuming interrupted balance operation does not properly clean up orphan trees + with enabled qgroups, resuming needs to be more careful about block groups due to limited context when updating qgroups - fsync and logging fixes found by fuzzing - incremental send fixes for no-holes and clone - fix spin lock type used in timer function for zstd" * tag 'for-5.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix race updating log root item during fsync Btrfs: fix wrong ctime and mtime of a directory after log replay Btrfs: fix fsync not persisting changed attributes of a directory btrfs: qgroup: Check bg while resuming relocation to avoid NULL pointer dereference btrfs: reloc: Also queue orphan reloc tree for cleanup to avoid BUG_ON() Btrfs: incremental send, fix emission of invalid clone operations Btrfs: incremental send, fix file corruption when no-holes feature is enabled btrfs: correct zstd workspace manager lock to use spin_lock_bh() btrfs: Ensure replaced device doesn't have pending chunk allocation
2019-05-30Merge tag 'configfs-for-5.2-2' of git://git.infradead.org/users/hch/configfsLinus Torvalds1-8/+6
Pull configs fix from Christoph Hellwig: - fix a use after free in configfs_d_iput (Sahitya Tummala) * tag 'configfs-for-5.2-2' of git://git.infradead.org/users/hch/configfs: configfs: Fix use-after-free when accessing sd->s_dentry
2019-05-30Merge tag 'sound-5.2-rc3' of ↵Linus Torvalds6-33/+51
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "No big surprises here, just a few device-specific fixes. HD-audio received several fixes for Acer, Dell, Huawei and other laptops as well as the workaround for the new Intel chipset. One significant one-liner fix is the disablement of the node-power saving on Realtek codecs, which may potentially cover annoying bugs like the background noises or click noises on many devices. Other than that, a fix for FireWire bit definitions, and another fix for LINE6 USB audio bug that was discovered by syzkaller" * tag 'sound-5.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: fireface: Use ULL suffixes for 64-bit constants ALSA: hda/realtek - Improve the headset mic for Acer Aspire laptops ALSA: line6: Assure canceling delayed work at disconnection ALSA: hda - Force polling mode on CNL for fixing codec communication ALSA: hda/realtek - Enable micmute LED for Huawei laptops ALSA: hda/realtek - Set default power save node to 0 ALSA: hda/realtek - Check headset type by unplug and resume
2019-05-30Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds3-10/+11
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk driver fixes from Stephen Boyd: - Don't expose the SiFive clk driver on non-RISCV architectures - Fix some bits describing clks in the imx8mm driver - Always call clk domain code in the TI driver so non-legacy platforms work * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: ti: clkctrl: Fix clkdm_clk handling clk: imx: imx8mm: fix int pll clk gate clk: sifive: restrict Kconfig scope for the FU540 PRCI driver
2019-05-30net: correct zerocopy refcnt with udp MSG_MOREWillem de Bruijn3-5/+9
TCP zerocopy takes a uarg reference for every skb, plus one for the tcp_sendmsg_locked datapath temporarily, to avoid reaching refcnt zero as it builds, sends and frees skbs inside its inner loop. UDP and RAW zerocopy do not send inside the inner loop so do not need the extra sock_zerocopy_get + sock_zerocopy_put pair. Commit 52900d22288ed ("udp: elide zerocopy operation in hot path") introduced extra_uref to pass the initial reference taken in sock_zerocopy_alloc to the first generated skb. But, sock_zerocopy_realloc takes this extra reference at the start of every call. With MSG_MORE, no new skb may be generated to attach the extra_uref to, so refcnt is incorrectly 2 with only one skb. Do not take the extra ref if uarg && !tcp, which implies MSG_MORE. Update extra_uref accordingly. This conditional assignment triggers a false positive may be used uninitialized warning, so have to initialize extra_uref at define. Changes v1->v2: fix typo in Fixes SHA1 Fixes: 52900d22288e7 ("udp: elide zerocopy operation in hot path") Reported-by: syzbot <syzkaller@googlegroups.com> Diagnosed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30Merge branch '100GbE' of ↵David S. Miller16-218/+754
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 100GbE Intel Wired LAN Driver Updates 2019-05-30 This series contains updates to ice driver only. Brett continues his work with interrupt handling by fixing an issue where were writing to the incorrect register to disable all VF interrupts. Tony consolidates the unicast and multicast MAC filters into a single new function. Anirudh adds support for virtual channel vector mapping to receive and transmit queues. This uses a bitmap to associate indicated queues with the specified vector. Makes several cosmetic code cleanups, as well as update the driver to align with the current specification for managing MAC operation codes (opcodes). Paul adds support for Forward Error Correction (FEC) and also adds the ethtool get and set handlers to modify FEC parameters. Bruce cleans up the driver code to fix a number of issues, such as, reducing the scope of some local variables, reduce the number of de-references by changing a local variable and reorder the code to remove unnecessary "goto's". Dave adds switch rules to be able to handle LLDP packets and in the process, fix a couple of issues found, like stop treating DCBx state of "not started" as an error and stop hard coding the filter information flag to transmit. Jacob updates the driver to allow for more granular debugging by developers by using a distinct separate bit for dumping firmware logs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: sched: act_ctinfo: minor size optimisationKevin 'ldir' Darbyshire-Bryant1-4/+0
Since the new parameter block is initialised to 0 by kzmalloc we don't need to mask & clear unused operational mode bits, they are already unset. Drop the pointless code. Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30ethtool: Check for vlan etype or vlan tci when parsing flow_ruleMaxime Chevallier1-2/+6
When parsing an ethtool flow spec to build a flow_rule, the code checks if both the vlan etype and the vlan tci are specified by the user to add a FLOW_DISSECTOR_KEY_VLAN match. However, when the user only specified a vlan etype or a vlan tci, this check silently ignores these parameters. For example, the following rule : ethtool -N eth0 flow-type udp4 vlan 0x0010 action -1 loc 0 will result in no error being issued, but the equivalent rule will be created and passed to the NIC driver : ethtool -N eth0 flow-type udp4 action -1 loc 0 In the end, neither the NIC driver using the rule nor the end user have a way to know that these keys were dropped along the way, or that incorrect parameters were entered. This kind of check should be left to either the driver, or the ethtool flow spec layer. This commit makes so that ethtool parameters are forwarded as-is to the NIC driver. Since none of the users of ethtool_rx_flow_rule_create are using the VLAN dissector, I don't think this qualifies as a regression. Fixes: eca4205f9ec3 ("ethtool: add ethtool_rx_flow_spec to flow_rule structure translator") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Acked-by: Pablo Neira Ayuso <pablo@gnumonks.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30Merge branch 'complex-c45-phys'David S. Miller3-21/+40
Heiner Kallweit says: ==================== net: phy: improve handling of more complex C45 PHY's This series tries to address few problematic aspects raised by Russell. Concrete example is the Marvell 88x3310, the changes should be helpful for other complex C45 PHY's too. v2: - added patch enabling interrupts also if phylib state machine isn't started - removed patch dealing with the double link status read This one needs little bit more thinking and will go separately. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: phy: export phy_queue_state_machineHeiner Kallweit2-4/+6
We face the issue that link change interrupt and link status may be reported by different PHY layers. As a result the link change interrupt may occur before the link status changes. Export phy_queue_state_machine to allow PHY drivers to specify a delay between link status change interrupt and link status check. v2: - change jiffies parameter type to unsigned long Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Suggested-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: phy: add callback for custom interrupt handler to struct phy_driverHeiner Kallweit2-2/+10
The phylib interrupt handler handles link change events only currently. However PHY drivers may want to use other interrupt sources too, e.g. to report temperature monitoring events. Therefore add a callback to struct phy_driver allowing PHY drivers to implement a custom interrupt handler. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Suggested-by: Russell King - ARM Linux admin <linux@armlinux.org.uk> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: phy: enable interrupts when PHY is attached alreadyHeiner Kallweit3-15/+24
This patch is a step towards allowing PHY drivers to handle more interrupt sources than just link change. E.g. several PHY's have built-in temperature monitoring and can raise an interrupt if a temperature threshold is exceeded. We may be interested in such interrupts also if the phylib state machine isn't started. Therefore move enabling interrupts to phy_request_interrupt(). v2: - patch added to series Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30qed: Fix static checker warningMichal Kalderon1-12/+12
In some cases abs_ppfid could be printed without being initialized. Fixes: 79284adeb99e ("qed: Add llh ppfid interface and 100g support for offload protocols") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: dsa: Add error path handling in dsa_tree_setup()Ioana Ciornei1-23/+66
In case a call to dsa_tree_setup() fails, an attempt to cleanup is made by calling dsa_tree_remove_switch(), which should take care of removing/unregistering any resources previously allocated. This does not happen because it is conditioned by dst->setup being true, which is set only after _all_ setup steps were performed successfully. This is especially interesting when the internal MDIO bus is registered but afterwards, a port setup fails and the mdiobus_unregister() is never called. This leads to a BUG_ON() complaining about the fact that it's trying to free an MDIO bus that's still registered. Add proper error handling in all functions branching from dsa_tree_setup(). Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reported-by: kernel test robot <rong.a.chen@intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: don't clear sock->sk early to avoid trouble in strparserJakub Kicinski1-1/+1
af_inet sets sock->sk to NULL which trips strparser over: BUG: kernel NULL pointer dereference, address: 0000000000000012 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.2.0-rc1-00139-g14629453a6d3 #21 RIP: 0010:tcp_peek_len+0x10/0x60 RSP: 0018:ffffc02e41c54b98 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff9cf924c4e030 RCX: 0000000000000051 RDX: 0000000000000000 RSI: 000000000000000c RDI: ffff9cf97128f480 RBP: ffff9cf9365e0300 R08: ffff9cf94fe7d2c0 R09: 0000000000000000 R10: 000000000000036b R11: ffff9cf939735e00 R12: ffff9cf91ad9ae40 R13: ffff9cf924c4e000 R14: ffff9cf9a8fcbaae R15: 0000000000000020 FS: 0000000000000000(0000) GS:ffff9cf9af7c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000012 CR3: 000000013920a003 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> strp_data_ready+0x48/0x90 tls_data_ready+0x22/0xd0 [tls] tcp_rcv_established+0x569/0x620 tcp_v4_do_rcv+0x127/0x1e0 tcp_v4_rcv+0xad7/0xbf0 ip_protocol_deliver_rcu+0x2c/0x1c0 ip_local_deliver_finish+0x41/0x50 ip_local_deliver+0x6b/0xe0 ? ip_protocol_deliver_rcu+0x1c0/0x1c0 ip_rcv+0x52/0xd0 ? ip_rcv_finish_core.isra.20+0x380/0x380 __netif_receive_skb_one_core+0x7e/0x90 netif_receive_skb_internal+0x42/0xf0 napi_gro_receive+0xed/0x150 nfp_net_poll+0x7a2/0xd30 [nfp] ? kmem_cache_free_bulk+0x286/0x310 net_rx_action+0x149/0x3b0 __do_softirq+0xe3/0x30a ? handle_irq_event_percpu+0x6a/0x80 irq_exit+0xe8/0xf0 do_IRQ+0x85/0xd0 common_interrupt+0xf/0xf </IRQ> RIP: 0010:cpuidle_enter_state+0xbc/0x450 To avoid this issue set sock->sk after sk_prot->close. My grepping and testing did not discover any code which would depend on the current behaviour. Fixes: c46234ebb4d1 ("tls: RX path for ktls") Reported-by: David Beckett <david.beckett@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net-gro: fix use-after-free read in napi_gro_frags()Eric Dumazet1-1/+1
If a network driver provides to napi_gro_frags() an skb with a page fragment of exactly 14 bytes, the call to gro_pull_from_frag0() will 'consume' the fragment by calling skb_frag_unref(skb, 0), and the page might be freed and reused. Reading eth->h_proto at the end of napi_frags_skb() might read mangled data, or crash under specific debugging features. BUG: KASAN: use-after-free in napi_frags_skb net/core/dev.c:5833 [inline] BUG: KASAN: use-after-free in napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841 Read of size 2 at addr ffff88809366840c by task syz-executor599/8957 CPU: 1 PID: 8957 Comm: syz-executor599 Not tainted 5.2.0-rc1+ #32 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 kasan_report+0x12/0x20 mm/kasan/common.c:614 __asan_report_load_n_noabort+0xf/0x20 mm/kasan/generic_report.c:142 napi_frags_skb net/core/dev.c:5833 [inline] napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841 tun_get_user+0x2f3c/0x3ff0 drivers/net/tun.c:1991 tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2037 call_write_iter include/linux/fs.h:1872 [inline] do_iter_readv_writev+0x5f8/0x8f0 fs/read_write.c:693 do_iter_write fs/read_write.c:970 [inline] do_iter_write+0x184/0x610 fs/read_write.c:951 vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015 do_writev+0x15b/0x330 fs/read_write.c:1058 Fixes: a50e233c50db ("net-gro: restore frag0 optimization") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30Merge branch 'Fixes-for-DSA-tagging-using-802-1Q'David S. Miller1-14/+65
Vladimir Oltean says: ==================== Fixes for DSA tagging using 802.1Q During the prototyping for the "Decoupling PHYLINK from struct net_device" patchset, the CPU port of the sja1105 driver was moved to a different spot. This uncovered an issue in the tag_8021q DSA code, which used to work by mistake - the CPU port was the last hardware port numerically, and this was masking an ordering issue which is very likely to be seen in other drivers that make use of 802.1Q tags. A question was also raised whether the VID numbers bear any meaning, and the conclusion was that they don't, at least not in an absolute sense. The second patch defines bit fields inside the DSA 802.1Q VID so that tcpdump can decode it unambiguously (although the meaning is now clear even by visual inspection). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: dsa: tag_8021q: Create a stable binary formatVladimir Oltean1-10/+50
Tools like tcpdump need to be able to decode the significance of fake VLAN headers that DSA uses to separate switch ports. But currently these have no global significance - they are simply an ordered list of DSA_MAX_SWITCHES x DSA_MAX_PORTS numbers ending at 4095. The reason why this is submitted as a fix is that the existing mapping of VIDs should not enter into a stable kernel, so we can pretend that only the new format exists. This way tcpdump won't need to try to make something out of the VLAN tags on 5.2 kernels. Fixes: f9bbe4477c30 ("net: dsa: Optional VLAN-based port separation for switches without tagging") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: dsa: tag_8021q: Change order of rx_vid setupIoana Ciornei1-4/+15
The 802.1Q tagging performs an unbalanced setup in terms of RX VIDs on the CPU port. For the ingress path of a 802.1Q switch to work, the RX VID of a port needs to be seen as tagged egress on the CPU port. While configuring the other front-panel ports to be part of this VID, for bridge scenarios, the untagged flag is applied even on the CPU port in dsa_switch_vlan_add. This happens because DSA applies the same flags on the CPU port as on the (bridge-controlled) slave ports, and the effect in this case is that the CPU port tagged settings get deleted. Instead of fixing DSA by introducing a way to control VLAN flags on the CPU port (and hence stop inheriting from the slave ports) - a hard, perhaps intractable problem - avoid this situation by moving the setup part of the RX VID on the CPU port after all the other front-panel ports have been added to the VID. Fixes: f9bbe4477c30 ("net: dsa: Optional VLAN-based port separation for switches without tagging") Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30Merge branch 'r8169-fw'David S. Miller1-34/+37
Heiner Kallweit says: ==================== r8169: decouple firmware handling code from actual driver code These two patches are a step towards eventually factoring out firmware handling code to a separate source file. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30r8169: decouple rtl_phy_write_fw from actual driver codeHeiner Kallweit1-20/+28
This patch is a further step towards decoupling firmware handling from the actual driver code. Firmware can be for PHY and/or MAC, and two pairs of read/write functions are needed for handling PHY firmware and MAC firmware respectively. Pass these functions via struct rtl_fw and avoid the ugly switching of mdio_ops behind the back of rtl_writephy(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30r8169: improve rtl_fw_format_okHeiner Kallweit1-14/+9
Simplify the function a little bit and use strscpy() where appropriate. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30r8169: enable WoL speed down on more chip versionsHeiner Kallweit1-24/+2
Call the pll power down function also for chip versions 02..06 and 13..15. The MAC can't be powered down on these chip versions, but at least they benefit from the speed-down power-saving if WoL is enabled. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30sctp: deduplicate identical skb_checksum_opsMatteo Croce2-11/+8
The same skb_checksum_ops struct is defined twice in two different places, leading to code duplication. Declare it as a global variable into a common header instead of allocating it on the stack on each function call. bloat-o-meter reports a slight code shrink. add/remove: 1/1 grow/shrink: 0/10 up/down: 128/-1282 (-1154) Function old new delta sctp_csum_ops - 128 +128 crc32c_csum_ops 16 - -16 sctp_rcv 6616 6583 -33 sctp_packet_pack 4542 4504 -38 nf_conntrack_sctp_packet 4980 4926 -54 execute_masked_set_action 6453 6389 -64 tcf_csum_sctp 575 428 -147 sctp_gso_segment 1292 1126 -166 sctp_csum_check 579 412 -167 sctp_snat_handler 957 772 -185 sctp_dnat_handler 1321 1132 -189 l4proto_manip_pkt 2536 2313 -223 Total: Before=359297613, After=359296459, chg -0.00% Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Matteo Croce <mcroce@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: avoid indirect calls in L4 checksum calculationMatteo Croce1-4/+11
Commit 283c16a2dfd3 ("indirect call wrappers: helpers to speed-up indirect calls of builtin") introduces some macros to avoid doing indirect calls. Use these helpers to remove two indirect calls in the L4 checksum calculation for devices which don't have hardware support for it. As a test I generate packets with pktgen out to a dummy interface with HW checksumming disabled, to have the checksum calculated in every sent packet. The packet rate measured with an i7-6700K CPU and a single pktgen thread raised from 6143 to 6608 Kpps, an increase by 7.5% Suggested-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: dsa: sja1105: Make static_config_check_memory_size staticYueHaibing1-1/+1
Fix sparse warning: drivers/net/dsa/sja1105/sja1105_static_config.c:446:1: warning: symbol 'static_config_check_memory_size' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue valueAntoine Tenart1-6/+4
MVPP2_TXQ_SCHED_TOKEN_CNTR_REG() expects the logical queue id but the current code is passing the global tx queue offset, so it ends up writing to unknown registers (between 0x8280 and 0x82fc, which seemed to be unused by the hardware). This fixes the issue by using the logical queue id instead. Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30Merge branch 'connection-tracking-support-for-bridge'David S. Miller15-276/+1206
Pablo Neira Ayuso says: ==================== connection tracking support for bridge This patchset adds native connection tracking support for the bridge. Patch #1 and #2 extract code from IPv4/IPv6 fragmentation core and introduce the fraglist splitter. That splits a skbuff fraglist into independent fragments. Patch #3 and #4 also extract code from IPv4/IPv6 fragmentation core and introduce the skbuff into fragments transformer. This can be used by linearized skbuffs (eg. coming from nfqueue and ct helpers) as well as cloned skbuffs (that are either seen either with taps or with bridge port flooding). Patch #5 moves the specific IPCB() code from these new fragment splitter/transformer APIs into the IPv4 stack. The bridge has a different control buffer layout and it starts using this new APIs in this patchset. Patch #6 adds basic infrastructure that allows to register bridge conntrack support. Patch #7 adds bridge conntrack support (only for IPv4 in this patch). Patch #8 adds IPv6 support for the bridge conntrack support. Patch #9 registers the IPv4/IPv6 conntrack hooks in case the bridge conntrack is used to deal with local traffic, ie. prerouting -> input bridge hook path. This cover the bridge interface has a IP address scenario. Before this patchset, only chance for people to do stateful filtering is to use the `br_netfilter` emulation layer, that turns bridge frame into IPv4/IPv6 packets and inject them into the IPv4/IPv6 hooks. Apparently, this module allows users to use iptables and all of its feature-set from the bridge, including stateful filtering. However, this approach is flawed in many aspects that have been discussed many times. This is a step forward to deprecate `br_netfilter'. v2: Fix English typo in commit message. v3: Fix another English typo in commit message. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30netfilter: nf_conntrack_bridge: register inet conntrack for bridgePablo Neira Ayuso1-16/+42
This patch enables IPv4 and IPv6 conntrack from the bridge to deal with local traffic. Hence, packets that are passed up to the local input path are confirmed later on from the {ipv4,ipv6}_confirm() hooks. For packets leaving the IP stack (ie. output path), fragmentation occurs after the inet postrouting hook. Therefore, the bridge local out and postrouting bridge hooks see fragments with conntrack objects, which is inconsistent. In this case, we could defragment again from the bridge output hook, but this is expensive. The recommended filtering spot for outgoing locally generated traffic leaving through the bridge interface is to use the classic IPv4/IPv6 output hook, which comes earlier. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30netfilter: nf_conntrack_bridge: add support for IPv6Pablo Neira Ayuso3-2/+230
br_defrag() and br_fragment() indirections are added in case that IPv6 support comes as a module, to avoid pulling innecessary dependencies in. The new fraglist iterator and fragment transformer APIs are used to implement the refragmentation code. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30netfilter: bridge: add connection tracking systemPablo Neira Ayuso8-4/+410
This patch adds basic connection tracking support for the bridge, including initial IPv4 support. This patch register two hooks to deal with the bridge forwarding path, one from the bridge prerouting hook to call nf_conntrack_in(); and another from the bridge postrouting hook to confirm the entry. The conntrack bridge prerouting hook defragments packets before passing them to nf_conntrack_in() to look up for an existing entry, otherwise a new entry is allocated and it is attached to the skbuff. The conntrack bridge postrouting hook confirms new conntrack entries, ie. if this is the first packet seen, then it adds the entry to the hashtable and (if needed) it refragments the skbuff into the original fragments, leaving the geometry as is if possible. Exceptions are linearized skbuffs, eg. skbuffs that are passed up to nfqueue and conntrack helpers, as well as cloned skbuff for the local delivery (eg. tcpdump), also in case of bridge port flooding (cloned skbuff too). The packet defragmentation is done through the ip_defrag() call. This forces us to save the bridge control buffer, reset the IP control buffer area and then restore it after call. This function also bumps the IP fragmentation statistics, it would be probably desiderable to have independent statistics for the bridge defragmentation/refragmentation. The maximum fragment length is stored in the control buffer and it is used to refragment the skbuff from the postrouting path. The new fraglist splitter and fragment transformer APIs are used to implement the bridge refragmentation code. The br_ip_fragment() function drops the packet in case the maximum fragment size seen is larger than the output port MTU. This patchset follows the principle that conntrack should not drop packets, so users can do it through policy via invalid state matching. Like br_netfilter, there is no refragmentation for packets that are passed up for local delivery, ie. prerouting -> input path. There are calls to nf_reset() already in several spots in the stack since time ago already, eg. af_packet, that show that skbuff fraglist handling from the netif_rx path is supported already. The helpers are called from the postrouting hook, before confirmation, from there we may see packet floods to bridge ports. Then, although unlikely, this may result in exercising the helpers many times for each clone. It would be good to explore how to pass all the packets in a list to the conntrack hook to do this handle only once for this case. Thanks to Florian Westphal for handing me over an initial patchset version to add support for conntrack bridge. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30netfilter: nf_conntrack: allow to register bridge supportPablo Neira Ayuso3-3/+72
This patch adds infrastructure to register and to unregister bridge support for the conntrack module via nf_ct_bridge_register() and nf_ct_bridge_unregister(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: ipv4: place control buffer handling away from fragmentation iteratorsPablo Neira Ayuso1-18/+37
Deal with the IPCB() area away from the iterators. The bridge codebase has its own control buffer layout, move specific IP control buffer handling into the IPv4 codepath. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: ipv6: split skbuff into fragments transformerPablo Neira Ayuso2-76/+126
This patch exposes a new API to refragment a skbuff. This allows you to split either a linear skbuff or to force the refragmentation of an existing fraglist using a different mtu. The API consists of: * ip6_frag_init(), that initializes the internal state of the transformer. * ip6_frag_next(), that allows you to fetch the next fragment. This function internally allocates the skbuff that represents the fragment, it pushes the IPv6 header, and it also copies the payload for each fragment. The ip6_frag_state object stores the internal state of the splitter. This code has been extracted from ip6_fragment(). Symbols are also exported to allow to reuse this iterator from the bridge codepath to build its own refragmentation routine by reusing the existing codebase. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: ipv4: split skbuff into fragments transformerPablo Neira Ayuso2-88/+128
This patch exposes a new API to refragment a skbuff. This allows you to split either a linear skbuff or to force the refragmentation of an existing fraglist using a different mtu. The API consists of: * ip_frag_init(), that initializes the internal state of the transformer. * ip_frag_next(), that allows you to fetch the next fragment. This function internally allocates the skbuff that represents the fragment, it pushes the IPv4 header, and it also copies the payload for each fragment. The ip_frag_state object stores the internal state of the splitter. This code has been extracted from ip_do_fragment(). Symbols are also exported to allow to reuse this iterator from the bridge codepath to build its own refragmentation routine by reusing the existing codebase. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: ipv6: add skbuff fraglist splitterPablo Neira Ayuso2-55/+102
This patch adds the skbuff fraglist split iterator. This API provides an iterator to transform the fraglist into single skbuff objects, it consists of: * ip6_fraglist_init(), that initializes the internal state of the fraglist iterator. * ip6_fraglist_prepare(), that restores the IPv6 header on the fragment. * ip6_fraglist_next(), that retrieves the fragment from the fraglist and updates the internal state of the iterator to point to the next fragment in the fraglist. The ip6_fraglist_iter object stores the internal state of the iterator. This code has been extracted from ip6_fragment(). Symbols are also exported to allow to reuse this iterator from the bridge codepath to build its own refragmentation routine by reusing the existing codebase. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30net: ipv4: add skbuff fraglist splitterPablo Neira Ayuso2-33/+78
This patch adds the skbuff fraglist splitter. This API provides an iterator to transform the fraglist into single skbuff objects, it consists of: * ip_fraglist_init(), that initializes the internal state of the fraglist splitter. * ip_fraglist_prepare(), that restores the IPv4 header on the fragments. * ip_fraglist_next(), that retrieves the fragment from the fraglist and it updates the internal state of the splitter to point to the next fragment skbuff in the fraglist. The ip_fraglist_iter object stores the internal state of the iterator. This code has been extracted from ip_do_fragment(). Symbols are also exported to allow to reuse this iterator from the bridge codepath to build its own refragmentation routine by reusing the existing codebase. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30Merge branch 'add-TFO-backup-key'David S. Miller11-118/+694
Jason Baron says: ==================== add TFO backup key Christoph, Igor, and I have worked on an API that facilitates TFO key rotation. This is a follow up to the series that Christoph previously posted, with an API that meets both of our use-cases. Here's a link to the previous work: https://patchwork.ozlabs.org/cover/1013753/ Changes in v2: -spelling fixes in ip-sysctl.txt (Jeremy Sowden) -re-base to latest net-next ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30selftests/net: add TFO key rotation selftestJason Baron4-0/+394
Demonstrate how the primary and backup TFO keys can be rotated while minimizing the number of client cookies that are rejected. Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30Documentation: ip-sysctl.txt: Document tcp_fastopen_keyJason Baron1-0/+20
Add docs for /proc/sys/net/ipv4/tcp_fastopen_key Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Cc: Jeremy Sowden <jeremy@azazel.net> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30tcp: add support for optional TFO backup key to net.ipv4.tcp_fastopen_keyJason Baron1-24/+71
Add the ability to add a backup TFO key as: # echo "x-x-x-x,x-x-x-x" > /proc/sys/net/ipv4/tcp_fastopen_key The key before the comma acks as the primary TFO key and the key after the comma is the backup TFO key. This change is intended to be backwards compatible since if only one key is set, userspace will simply read back that single key as follows: # echo "x-x-x-x" > /proc/sys/net/ipv4/tcp_fastopen_key # cat /proc/sys/net/ipv4/tcp_fastopen_key x-x-x-x Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30tcp: add support to TCP_FASTOPEN_KEY for optional backup keyJason Baron1-10/+20
Add support for get/set of an optional backup key via TCP_FASTOPEN_KEY, in addition to the current 'primary' key. The primary key is used to encrypt and decrypt TFO cookies, while the backup is only used to decrypt TFO cookies. The backup key is used to maximize successful TFO connections when TFO keys are rotated. Currently, TCP_FASTOPEN_KEY allows a single 16-byte primary key to be set. This patch now allows a 32-byte value to be set, where the first 16 bytes are used as the primary key and the second 16 bytes are used for the backup key. Similarly, for getsockopt(), we can receive a 32-byte value as output if requested. If a 16-byte value is used to set the primary key via TCP_FASTOPEN_KEY, then any previously set backup key will be removed. Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30tcp: add backup TFO key infrastructureJason Baron6-58/+162
We would like to be able to rotate TFO keys while minimizing the number of client cookies that are rejected. Currently, we have only one key which can be used to generate and validate cookies, thus if we simply replace this key clients can easily have cookies rejected upon rotation. We propose having the ability to have both a primary key and a backup key. The primary key is used to generate as well as to validate cookies. The backup is only used to validate cookies. Thus, keys can be rotated as: 1) generate new key 2) add new key as the backup key 3) swap the primary and backup key, thus setting the new key as the primary We don't simply set the new key as the primary key and move the old key to the backup slot because the ip may be behind a load balancer and we further allow for the fact that all machines behind the load balancer will not be updated simultaneously. We make use of this infrastructure in subsequent patches. Suggested-by: Igor Lubashev <ilubashe@akamai.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30tcp: introduce __tcp_fastopen_cookie_gen_cipher()Christoph Paasch1-36/+37
Restructure __tcp_fastopen_cookie_gen() to take a 'struct crypto_cipher' argument and rename it as __tcp_fastopen_cookie_gen_cipher(). Subsequent patches will provide different ciphers based on which key is being used for the cookie generation. Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30Merge branch 'mlxsw-Hardware-monitoring-enhancements'David S. Miller7-111/+274
Ido Schimmel says: ==================== mlxsw: Hardware monitoring enhancements This patchset from Vadim provides various hardware monitoring related improvements for mlxsw. Patch #1 allows querying firmware version from the switch driver when the underlying bus is I2C. This is useful for baseboard management controller (BMC) systems that communicate with the ASIC over I2C. Patch #2 improves driver's performance over I2C by utilizing larger transactions sizes, if possible. Patch #3 re-orders driver's initialization sequence to enforce a specific firmware version before new firmware features are utilized. This is a prerequisite for patches #4-#6. Patches #4-#6 expose the temperature of inter-connect devices (gearboxes) that are present in Mellanox SN3800 systems and split 2x50Gb/s lanes to 4x25Gb/s lanes. Patches #7-#8 reduce the transaction size when reading SFP modules temperatures, which is crucial when working over I2C. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30mlxsw: core: Reduce buffer size in transactions for SFP modules temperature ↵Vadim Pasternak3-74/+33
readout Obtain SFP modules temperatures through MTMP register instead of MTBR register, because the first one utilizes shorter transaction buffer size for request. It improves performance in case low frequency interface (I2C) is used for communication with a chip. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30mlxsw: core: Extend the index size for temperature sensors readoutVadim Pasternak1-3/+4
Extend sensor index size for Management Temperature Bulk Register (MTBR) and Management Temperature Register (MTMP) upto 12 bits in order to align registers description with new version of PRM document. Add define for base sensor index for SFP modules temperature reading for MTMP register. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30mlxsw: core: Extend hwmon interface with inter-connect temperature attributesVadim Pasternak1-5/+96
Add new attributes to hwmon object for exposing inter-connects temperature input, highest, reset_history temperatures and label. Temperatures are read from Management Temperature Register. The number of inter-connect devices is read from Management General Peripheral Information Register. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>