summaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2019-09-16tcp: Add TCP_INFO counter for packets received out-of-orderThomas Higdon4-0/+7
For receive-heavy cases on the server-side, we want to track the connection quality for individual client IPs. This counter, similar to the existing system-wide TCPOFOQueue counter in /proc/net/netstat, tracks out-of-order packet reception. By providing this counter in TCP_INFO, it will allow understanding to what degree receive-heavy sockets are experiencing out-of-order delivery and packet drops indicating congestion. Please note that this is similar to the counter in NetBSD TCP_INFO, and has the same name. Also note that we avoid increasing the size of the tcp_sock struct by taking advantage of a hole. Signed-off-by: Thomas Higdon <tph@fb.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16net: mdio: switch to using gpiod_get_optional()Dmitry Torokhov1-13/+9
The MDIO device reset line is optional and now that gpiod_get_optional() returns proper value when GPIO support is compiled out, there is no reason to use fwnode_get_named_gpiod() that I plan to hide away. Let's switch to using more standard gpiod_get_optional() and gpiod_set_consumer_name() to keep the nice "PHY reset" label. Also there is no reason to only try to fetch the reset GPIO when we have OF node, gpiolib can fetch GPIO data from firmwares as well. Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller21-322/+260
Daniel Borkmann says: ==================== pull-request: bpf-next 2019-09-16 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Now that initial BPF backend for gcc has been merged upstream, enable BPF kselftest suite for bpf-gcc. Also fix a BE issue with access to bpf_sysctl.file_pos, from Ilya. 2) Follow-up fix for link-vmlinux.sh to remove bash-specific extensions related to recent work on exposing BTF info through sysfs, from Andrii. 3) AF_XDP zero copy fixes for i40e and ixgbe driver which caused umem headroom to be added twice, from Ciara. 4) Refactoring work to convert sock opt tests into test_progs framework in BPF kselftests, from Stanislav. 5) Fix a general protection fault in dev_map_hash_update_elem(), from Toke. 6) Cleanup to use BPF_PROG_RUN() macro in KCM, from Sami. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16bpf: fix accessing bpf_sysctl.file_pos on s390Ilya Leoshkevich4-9/+22
"ctx:file_pos sysctl:read write ok" fails on s390 with "Read value != nux". This is because verifier rewrites a complete 32-bit bpf_sysctl.file_pos update to a partial update of the first 32 bits of 64-bit *bpf_sysctl_kern.ppos, which is not correct on big-endian systems. Fix by using an offset on big-endian systems. Ditto for bpf_sysctl.file_pos reads. Currently the test does not detect a problem there, since it expects to see 0, which it gets with high probability in error cases, so change it to seek to offset 3 and expect 3 in bpf_sysctl.file_pos. Fixes: e1550bfe0de4 ("bpf: Add file_pos field to bpf_sysctl ctx") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20190816105300.49035-1-iii@linux.ibm.com/
2019-09-16xdp: Fix race in dev_map_hash_update_elem() when replacing elementToke Høiland-Jørgensen1-5/+12
syzbot found a crash in dev_map_hash_update_elem(), when replacing an element with a new one. Jesper correctly identified the cause of the crash as a race condition between the initial lookup in the map (which is done before taking the lock), and the removal of the old element. Rather than just add a second lookup into the hashmap after taking the lock, fix this by reworking the function logic to take the lock before the initial lookup. Fixes: 6f9d451ab1a3 ("xdp: Add devmap_hash map type for looking up devices by hashed index") Reported-and-tested-by: syzbot+4e7a85b1432052e8d6f8@syzkaller.appspotmail.com Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-16Merge branch 'bpf-af-xdp-unaligned-fixes'Daniel Borkmann3-5/+5
Ciara Loftus says: ==================== This patch set contains some fixes for AF_XDP zero copy in the i40e and ixgbe drivers as well as a fix for the 'xdpsock' sample application when running in unaligned mode. Patches 1 and 2 fix a regression for the i40e and ixgbe drivers which caused the umem headroom to be added to the xdp handle twice, resulting in an incorrect value being received by the user for the case where the umem headroom is non-zero. Patch 3 fixes an issue with the xdpsock sample application whereby the start of the tx packet data (offset) was not being set correctly when the application was being run in unaligned mode. This patch set has been applied against commit a2c11b034142 ("kcm: use BPF_PROG_RUN") ==================== Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-16samples/bpf: fix xdpsock l2fwd tx for unaligned modeCiara Loftus1-1/+1
Preserve the offset of the address of the received descriptor, and include it in the address set for the tx descriptor, so the kernel can correctly locate the start of the packet data. Fixes: 03895e63ff97 ("samples/bpf: add buffer recycling for unaligned chunks to xdpsock") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-16ixgbe: fix xdp handle calculationsCiara Loftus1-2/+2
Commit 7cbbf9f1fa23 ("ixgbe: fix xdp handle calculations") reintroduced the addition of the umem headroom to the xdp handle in the ixgbe_zca_free, ixgbe_alloc_buffer_slow_zc and ixgbe_alloc_buffer_zc functions. However, the headroom is already added to the handle in the function ixgbe_run_xdp_zc. This commit removes the latter addition and fixes the case where the headroom is non-zero. Fixes: 7cbbf9f1fa23 ("ixgbe: fix xdp handle calculations") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-16i40e: fix xdp handle calculationsCiara Loftus1-2/+2
Commit 4c5d9a7fa149 ("i40e: fix xdp handle calculations") reintroduced the addition of the umem headroom to the xdp handle in the i40e_zca_free, i40e_alloc_buffer_slow_zc and i40e_alloc_buffer_zc functions. However, the headroom is already added to the handle in the function i40_run_xdp_zc. This commit removes the latter addition and fixes the case where the headroom is non-zero. Fixes: 4c5d9a7fa149 ("i40e: fix xdp handle calculations") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-16selftests/bpf: add bpf-gcc supportIlya Leoshkevich3-23/+67
Now that binutils and gcc support for BPF is upstream, make use of it in BPF selftests using alu32-like approach. Share as much as possible of CFLAGS calculation with clang. Fixes only obvious issues, leaving more complex ones for later: - Use gcc-provided bpf-helpers.h instead of manually defining the helpers, change bpf_helpers.h include guard to avoid conflict. - Include <linux/stddef.h> for __always_inline. - Add $(OUTPUT)/../usr/include to include path in order to use local kernel headers instead of system kernel headers when building with O=. In order to activate the bpf-gcc support, one needs to configure binutils and gcc with --target=bpf and make them available in $PATH. In particular, gcc must be installed as `bpf-gcc`, which is the default. Right now with binutils 25a2915e8dba and gcc r275589 only a handful of tests work: # ./test_progs_bpf_gcc # Summary: 7/39 PASSED, 1 SKIPPED, 98 FAILED The reason for those failures are as follows: - Build errors: - `error: too many function arguments for eBPF` for __always_inline functions read_str_var and read_map_var - must be inlining issue, and for process_l3_headers_v6, which relies on optimizing away function arguments. - `error: indirect call in function, which are not supported by eBPF` where there are no obvious indirect calls in the source calls, e.g. in __encap_ipip_none. - `error: field 'lock' has incomplete type` for fields of `struct bpf_spin_lock` type - bpf_spin_lock is re#defined by bpf-helpers.h, so its usage is sensitive to order of #includes. - `error: eBPF stack limit exceeded` in sysctl_tcp_mem. - Load errors: - Missing object files due to above build errors. - `libbpf: failed to create map (name: 'test_ver.bss')`. - `libbpf: object file doesn't contain bpf program`. - `libbpf: Program '.text' contains unrecognized relo data pointing to section 0`. - `libbpf: BTF is required, but is missing or corrupted` - no BTF support in gcc yet. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Jose E. Marchesi <jose.marchesi@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-09-16net: stmmac: socfpga: re-use the `interface` parameter from platform dataAlexandru Ardelean1-5/+10
The socfpga sub-driver defines an `interface` field in the `socfpga_dwmac` struct and parses it on init. The shared `stmmac_probe_config_dt()` function also parses this from the device-tree and makes it available on the returned `plat_data` (which is the same data available via `netdev_priv()`). All that's needed now is to dig that information out, via some `dev_get_drvdata()` && `netdev_priv()` calls and re-use it. Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16Merge branch 'More-fixes-for-unlocked-cls-hardware-offload-API-refactoring'David S. Miller8-57/+116
Vlad Buslov says: ==================== More fixes for unlocked cls hardware offload API refactoring Two fixes for my "Refactor cls hardware offload API to support rtnl-independent drivers" series and refactoring patch that implements infrastructure necessary for the fixes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16net: sched: use get_dev() action API in flow_action infraVlad Buslov3-20/+18
When filling in hardware intermediate representation tc_setup_flow_action() directly obtains, checks and takes reference to dev used by mirred action, instead of using act->ops->get_dev() API created specifically for this purpose. In order to remove code duplication, refactor flow_action infra to use action API when obtaining mirred action target dev. Extend get_dev() with additional argument that is used to provide dev destructor to the user. Fixes: 5a6ff4b13d59 ("net: sched: take reference to action dev before calling offloads") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16net: sched: take reference to psample group in flow_action infraVlad Buslov6-14/+58
With recent patch set that removed rtnl lock dependency from cls hardware offload API rtnl lock is only taken when reading action data and can be released after action-specific data is parsed into intermediate representation. However, sample action psample group is passed by pointer without obtaining reference to it first, which makes it possible to concurrently overwrite the action and deallocate object pointed by psample_group pointer after rtnl lock is released but before driver finished using the pointer. To prevent such race condition, obtain reference to psample group while it is used by flow_action infra. Extend psample API with function psample_group_take() that increments psample group reference counter. Extend struct tc_action_ops with new get_psample_group() API. Implement the API for action sample using psample_group_take() and already existing psample_group_put() as a destructor. Use it in tc_setup_flow_action() to take reference to psample group pointed to by entry->sample.psample_group and release it in tc_cleanup_flow_action(). Disable bh when taking psample_groups_lock. The lock is now taken while holding action tcf_lock that is used by data path and requires bh to be disabled, so doing the same for psample_groups_lock is necessary to preserve SOFTIRQ-irq-safety. Fixes: 918190f50eb6 ("net: sched: flower: don't take rtnl lock for cls hw offloads API") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16net: sched: extend flow_action_entry with destructorVlad Buslov2-33/+50
Generalize flow_action_entry cleanup by extending the structure with pointer to destructor function. Set the destructor in tc_setup_flow_action(). Refactor tc_cleanup_flow_action() to call entry->destructor() instead of using switch that dispatches by entry->id and manually executes cleanup. This refactoring is necessary for following patches in this series that require destructor to use tc_action->ops callbacks that can't be easily obtained in tc_cleanup_flow_action(). Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16net/wan: dscc4: remove broken dscc4 driverDan Carpenter4-2078/+0
Using static analysis, I discovered that the "dpriv->pci_priv->pdev" pointer is always NULL. This pointer was supposed to be initialized during probe and is essential for the driver to work. It would be easy to add a "ppriv->pdev = pdev;" to dscc4_found1() but this driver has been broken since before we started using git and no one has complained so probably we should just remove it. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16qed: fix spelling mistake "fullill" -> "fulfill"Colin Ian King1-1/+1
There is a spelling mistake in a DP_VERBOSE debug message. Fix it. (Using American English spelling as this is the most common way to spell this in the kernel). Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-16net: dsa: b53: Add support for port_egress_floods callbackFlorian Fainelli2-0/+35
Add support for configuring the per-port egress flooding control for both Unicast and Multicast traffic. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller159-1377/+1672
Minor overlapping changes in the btusb and ixgbe drivers. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-14Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds6-4/+124
Pull kvm fixes from Paolo Bonzini: "The main change here is a revert of reverts. We recently simplified some code that was thought unnecessary; however, since then KVM has grown quite a few cond_resched()s and for that reason the simplified code is prone to livelocks---one CPUs tries to empty a list of guest page tables while the others keep adding to them. This adds back the generation-based zapping of guest page tables, which was not unnecessary after all. On top of this, there is a fix for a kernel memory leak and a couple of s390 fixlets as well" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot KVM: x86: work around leak of uninitialized stack contents KVM: nVMX: handle page fault in vmread KVM: s390: Do not leak kernel stack data in the KVM_S390_INTERRUPT ioctl KVM: s390: kvm_s390_vm_start_migration: check dirty_bitmap before using it as target for memset()
2019-09-14Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-4/+2
Pull virtio fix from Michael Tsirkin: "A last minute revert The 32-bit build got broken by the latest defence in depth patch. Revert and we'll try again in the next cycle" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: Revert "vhost: block speculation of translated descriptors"
2019-09-14Merge tag 'riscv/for-v5.3' of ↵Linus Torvalds3-14/+15
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fix from Paul Walmsley: "Last week, Palmer and I learned that there was an error in the RISC-V kernel image header format that could make it less compatible with the ARM64 kernel image header format. I had missed this error during my original reviews of the patch. The kernel image header format is an interface that impacts bootloaders, QEMU, and other user tools. Those packages must be updated to align with whatever is merged in the kernel. We would like to avoid proliferating these image formats by keeping the RISC-V header as close as possible to the existing ARM64 header. Since the arch/riscv patch that adds support for the image header was merged with our v5.3-rc1 pull request as commit 0f327f2aaad6a ("RISC-V: Add an Image header that boot loader can parse."), we think it wise to try to fix this error before v5.3 is released. The fix itself should be backwards-compatible with any project that has already merged support for premature versions of this interface. It primarily involves ensuring that the RISC-V image header has something useful in the same field as the ARM64 image header" * tag 'riscv/for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: modify the Image header to improve compatibility with the ARM64 header
2019-09-14Revert "vhost: block speculation of translated descriptors"Michael S. Tsirkin1-4/+2
This reverts commit a89db445fbd7f1f8457b03759aa7343fa530ef6b. I was hasty to include this patch, and it breaks the build on 32 bit. Defence in depth is good but let's do it properly. Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-09-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds67-289/+443
Pull networking fixes from David Miller: 1) Don't corrupt xfrm_interface parms before validation, from Nicolas Dichtel. 2) Revert use of usb-wakeup in btusb, from Mario Limonciello. 3) Block ipv6 packets in bridge netfilter if ipv6 is disabled, from Leonardo Bras. 4) IPS_OFFLOAD not honored in ctnetlink, from Pablo Neira Ayuso. 5) Missing ULP check in sock_map, from John Fastabend. 6) Fix receive statistic handling in forcedeth, from Zhu Yanjun. 7) Fix length of SKB allocated in 6pack driver, from Christophe JAILLET. 8) ip6_route_info_create() returns an error pointer, not NULL. From Maciej Żenczykowski. 9) Only add RDS sock to the hashes after rs_transport is set, from Ka-Cheong Poon. 10) Don't double clean TX descriptors in ixgbe, from Ilya Maximets. 11) Presence of transmit IPSEC offload in an SKB is not tested for correctly in ixgbe and ixgbevf. From Steffen Klassert and Jeff Kirsher. 12) Need rcu_barrier() when register_netdevice() takes one of the notifier based failure paths, from Subash Abhinov Kasiviswanathan. 13) Fix leak in sctp_do_bind(), from Mao Wenan. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits) cdc_ether: fix rndis support for Mediatek based smartphones sctp: destroy bucket if failed to bind addr sctp: remove redundant assignment when call sctp_get_port_local sctp: change return type of sctp_get_port_local ixgbevf: Fix secpath usage for IPsec Tx offload sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()' ixgbe: Fix secpath usage for IPsec TX offload. net: qrtr: fix memort leak in qrtr_tun_write_iter net: Fix null de-reference of device refcount ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()' tun: fix use-after-free when register netdev failed tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR ixgbe: fix double clean of Tx descriptors with xdp ixgbe: Prevent u8 wrapping of ITR value to something less than 10us mlx4: fix spelling mistake "veify" -> "verify" net: hns3: fix spelling mistake "undeflow" -> "underflow" net: lmc: fix spelling mistake "runnin" -> "running" NFC: st95hf: fix spelling mistake "receieve" -> "receive" net/rds: An rds_sock is added too early to the hash table mac80211: Do not send Layer 2 Update frame before authorization ...
2019-09-14Merge tag 'mmc-v5.3-rc8' of ↵Linus Torvalds7-29/+17
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: - tmio: Fixup runtime PM management during probe and remove - sdhci-pci-o2micro: Fix eMMC initialization for an AMD SoC - bcm2835: Prevent lockups when terminating work * tag 'mmc-v5.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: tmio: Fixup runtime PM management during remove mmc: tmio: Fixup runtime PM management during probe Revert "mmc: tmio: move runtime PM enablement to the driver implementations" Revert "mmc: sdhci: Remove unneeded quirk2 flag of O2 SD host controller" Revert "mmc: bcm2835: Terminate timeout work synchronously"
2019-09-14Merge tag 'drm-fixes-2019-09-13' of git://anongit.freedesktop.org/drm/drmLinus Torvalds4-8/+11
Pull drm fixes from Dave Airlie: "From the maintainer summit, just some last minute fixes for final: lima: - fix gem_wait ioctl core: - constify modes list i915: - DP MST high color depth regression - GPU hangs on vulkan compute workloads" * tag 'drm-fixes-2019-09-13' of git://anongit.freedesktop.org/drm/drm: drm/lima: fix lima_gem_wait() return value drm/i915: Restore relaxed padding (OCL_OOB_SUPPRES_ENABLE) for skl+ drm/i915: Limit MST to <= 8bpc once again drm/modes: Make the whitelist more const
2019-09-14Merge tag 'wireless-drivers-next-for-davem-2019-09-14' of ↵David S. Miller76-3570/+8505
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for 5.4 Last set of patches for 5.4. wil6210 and rtw88 being most active this time, but ath9k also having a new module to load devices without EEPROM. Major changes: wil6210 * add support for Enhanced Directional Multi-Gigabit (EDMG) channels 9-11 * add debugfs file to show PCM ring content * report boottime_ns in scan results ath9k * add a separate loader for AR92XX (and older) pci(e) without eeprom brcmfmac * use the same wiphy after PCIe reset to not confuse the user space rtw88 * enable interrupt migration * enable AMSDU in AMPDU aggregation * report RX power for each antenna * enable to DPK and IQK calibration methods to improve performance ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-14Merge tag 'kvm-s390-master-5.3-1' of ↵Paolo Bonzini2-1/+13
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master KVM: s390: Fixes for 5.3 - prevent a user triggerable oops in the migration code - do not leak kernel stack content
2019-09-14KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslotSean Christopherson2-2/+101
James Harvey reported a livelock that was introduced by commit d012a06ab1d23 ("Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot""). The livelock occurs because kvm_mmu_zap_all() as it exists today will voluntarily reschedule and drop KVM's mmu_lock, which allows other vCPUs to add shadow pages. With enough vCPUs, kvm_mmu_zap_all() can get stuck in an infinite loop as it can never zap all pages before observing lock contention or the need to reschedule. The equivalent of kvm_mmu_zap_all() that was in use at the time of the reverted commit (4e103134b8623, "KVM: x86/mmu: Zap only the relevant pages when removing a memslot") employed a fast invalidate mechanism and was not susceptible to the above livelock. There are three ways to fix the livelock: - Reverting the revert (commit d012a06ab1d23) is not a viable option as the revert is needed to fix a regression that occurs when the guest has one or more assigned devices. It's unlikely we'll root cause the device assignment regression soon enough to fix the regression timely. - Remove the conditional reschedule from kvm_mmu_zap_all(). However, although removing the reschedule would be a smaller code change, it's less safe in the sense that the resulting kvm_mmu_zap_all() hasn't been used in the wild for flushing memslots since the fast invalidate mechanism was introduced by commit 6ca18b6950f8d ("KVM: x86: use the fast way to invalidate all pages"), back in 2013. - Reintroduce the fast invalidate mechanism and use it when zapping shadow pages in response to a memslot being deleted/moved, which is what this patch does. For all intents and purposes, this is a revert of commit ea145aacf4ae8 ("Revert "KVM: MMU: fast invalidate all pages"") and a partial revert of commit 7390de1e99a70 ("Revert "KVM: x86: use the fast way to invalidate all pages""), i.e. restores the behavior of commit 5304b8d37c2a5 ("KVM: MMU: fast invalidate all pages") and commit 6ca18b6950f8d ("KVM: x86: use the fast way to invalidate all pages") respectively. Fixes: d012a06ab1d23 ("Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"") Reported-by: James Harvey <jamespharvey20@gmail.com> Cc: Alex Willamson <alex.williamson@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-14KVM: x86: work around leak of uninitialized stack contentsFuqian Huang1-0/+7
Emulation of VMPTRST can incorrectly inject a page fault when passed an operand that points to an MMIO address. The page fault will use uninitialized kernel stack memory as the CR2 and error code. The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR exit to userspace; however, it is not an easy fix, so for now just ensure that the error code and CR2 are zero. Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com> Cc: stable@vger.kernel.org [add comment] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-14KVM: nVMX: handle page fault in vmreadPaolo Bonzini1-1/+3
The implementation of vmread to memory is still incomplete, as it lacks the ability to do vmread to I/O memory just like vmptrst. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-13riscv: modify the Image header to improve compatibility with the ARM64 headerPaul Walmsley3-14/+15
Part of the intention during the definition of the RISC-V kernel image header was to lay the groundwork for a future merge with the ARM64 image header. One error during my original review was not noticing that the RISC-V header's "magic" field was at a different size and position than the ARM64's "magic" field. If the existing ARM64 Image header parsing code were to attempt to parse an existing RISC-V kernel image header format, it would see a magic number 0. This is undesirable, since it's our intention to align as closely as possible with the ARM64 header format. Another problem was that the original "res3" field was not being initialized correctly to zero. Address these issues by creating a 32-bit "magic2" field in the RISC-V header which matches the ARM64 "magic" field. RISC-V binaries will store "RSC\x05" in this field. The intention is that the use of the existing 64-bit "magic" field in the RISC-V header will be deprecated over time. Increment the minor version number of the file format to indicate this change, and update the documentation accordingly. Fix the assembler directives in head.S to ensure that reserved fields are properly zero-initialized. Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Reported-by: Palmer Dabbelt <palmer@sifive.com> Reviewed-by: Palmer Dabbelt <palmer@sifive.com> Cc: Atish Patra <atish.patra@wdc.com> Cc: Karsten Merker <merker@debian.org> Link: https://lore.kernel.org/linux-riscv/194c2f10c9806720623430dbf0cc59a965e50448.camel@wdc.com/T/#u Link: https://lore.kernel.org/linux-riscv/mhng-755b14c4-8f35-4079-a7ff-e421fd1b02bc@palmer-si-x1e/T/#t
2019-09-13Merge branch ↵David S. Miller8-31/+106
'devlink-move-reload-fail-indication-to-devlink-core-and-expose-to-user' Jiri Pirko says: ==================== net: devlink: move reload fail indication to devlink core and expose to user First two patches are dependencies of the last one. That moves devlink reload failure indication to the devlink code, so the drivers do not have to track it themselves. Currently it is only mlxsw, but I will send a follow-up patchset that introduces this in netdevsim too. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-13net: devlink: move reload fail indication to devlink core and expose to userJiri Pirko4-11/+30
Currently the fact that devlink reload failed is stored in drivers. Move this flag into devlink core. Also, expose it to the user. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-13net: devlink: split reload op into twoJiri Pirko5-16/+56
In order to properly implement failure indication during reload, split the reload op into two ops, one for down phase and one for up phase. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-13mlx4: Split restart_one into two functionsJiri Pirko3-7/+23
Split the function restart_one into two functions and separate teardown and buildup. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-13cdc_ether: fix rndis support for Mediatek based smartphonesBjørn Mork1-1/+9
A Mediatek based smartphone owner reports problems with USB tethering in Linux. The verbose USB listing shows a rndis_host interface pair (e0/01/03 + 10/00/00), but the driver fails to bind with [ 355.960428] usb 1-4: bad CDC descriptors The problem is a failsafe test intended to filter out ACM serial functions using the same 02/02/ff class/subclass/protocol as RNDIS. The serial functions are recognized by their non-zero bmCapabilities. No RNDIS function with non-zero bmCapabilities were known at the time this failsafe was added. But it turns out that some Wireless class RNDIS functions are using the bmCapabilities field. These functions are uniquely identified as RNDIS by their class/subclass/protocol, so the failing test can safely be disabled. The same applies to the two types of Misc class RNDIS functions. Applying the failsafe to Communication class functions only retains the original functionality, and fixes the problem for the Mediatek based smartphone. Tow examples of CDC functional descriptors with non-zero bmCapabilities from Wireless class RNDIS functions are: 0e8d:000a Mediatek Crosscall Spider X5 3G Phone CDC Header: bcdCDC 1.10 CDC ACM: bmCapabilities 0x0f connection notifications sends break line coding and serial state get/set/clear comm features CDC Union: bMasterInterface 0 bSlaveInterface 1 CDC Call Management: bmCapabilities 0x03 call management use DataInterface bDataInterface 1 and 19d2:1023 ZTE K4201-z CDC Header: bcdCDC 1.10 CDC ACM: bmCapabilities 0x02 line coding and serial state CDC Call Management: bmCapabilities 0x03 call management use DataInterface bDataInterface 1 CDC Union: bMasterInterface 0 bSlaveInterface 1 The Mediatek example is believed to apply to most smartphones with Mediatek firmware. The ZTE example is most likely also part of a larger family of devices/firmwares. Suggested-by: Lars Melin <larsm17@gmail.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-13Merge branch 'sctp_do_bind-leak'David S. Miller1-10/+11
Mao Wenan says: ==================== fix memory leak for sctp_do_bind First two patches are to do cleanup, remove redundant assignment, and change return type of sctp_get_port_local. Third patch is to fix memory leak for sctp_do_bind if failed to bind address. v2: add one patch to change return type of sctp_get_port_local. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-13sctp: destroy bucket if failed to bind addrMao Wenan1-4/+6
There is one memory leak bug report: BUG: memory leak unreferenced object 0xffff8881dc4c5ec0 (size 40): comm "syz-executor.0", pid 5673, jiffies 4298198457 (age 27.578s) hex dump (first 32 bytes): 02 00 00 00 81 88 ff ff 00 00 00 00 00 00 00 00 ................ f8 63 3d c1 81 88 ff ff 00 00 00 00 00 00 00 00 .c=............. backtrace: [<0000000072006339>] sctp_get_port_local+0x2a1/0xa00 [sctp] [<00000000c7b379ec>] sctp_do_bind+0x176/0x2c0 [sctp] [<000000005be274a2>] sctp_bind+0x5a/0x80 [sctp] [<00000000b66b4044>] inet6_bind+0x59/0xd0 [ipv6] [<00000000c68c7f42>] __sys_bind+0x120/0x1f0 net/socket.c:1647 [<000000004513635b>] __do_sys_bind net/socket.c:1658 [inline] [<000000004513635b>] __se_sys_bind net/socket.c:1656 [inline] [<000000004513635b>] __x64_sys_bind+0x3e/0x50 net/socket.c:1656 [<0000000061f2501e>] do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296 [<0000000003d1e05e>] entry_SYSCALL_64_after_hwframe+0x49/0xbe This is because in sctp_do_bind, if sctp_get_port_local is to create hash bucket successfully, and sctp_add_bind_addr failed to bind address, e.g return -ENOMEM, so memory leak found, it needs to destroy allocated bucket. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Mao Wenan <maowenan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-13sctp: remove redundant assignment when call sctp_get_port_localMao Wenan1-2/+1
There are more parentheses in if clause when call sctp_get_port_local in sctp_do_bind, and redundant assignment to 'ret'. This patch is to do cleanup. Signed-off-by: Mao Wenan <maowenan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-13sctp: change return type of sctp_get_port_localMao Wenan1-4/+4
Currently sctp_get_port_local() returns a long which is either 0,1 or a pointer casted to long. It's neither of the callers use the return value since commit 62208f12451f ("net: sctp: simplify sctp_get_port"). Now two callers are sctp_get_port and sctp_do_bind, they actually assumend a casted to an int was the same as a pointer casted to a long, and they don't save the return value just check whether it is zero or non-zero, so it would better change return type from long to int for sctp_get_port_local. Signed-off-by: Mao Wenan <maowenan@huawei.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-13ip: support SO_MARK cmsgWillem de Bruijn9-8/+15
Enable setting skb->mark for UDP and RAW sockets using cmsg. This is analogous to existing support for TOS, TTL, txtime, etc. Packet sockets already support this as of commit c7d39e32632e ("packet: support per-packet fwmark for af_packet sendmsg"). Similar to other fields, implement by 1. initialize the sockcm_cookie.mark from socket option sk_mark 2. optionally overwrite this in ip_cmsg_send/ip6_datagram_send_ctl 3. initialize inet_cork.mark from sockcm_cookie.mark 4. initialize each (usually just one) skb->mark from inet_cork.mark Step 1 is handled in one location for most protocols by ipcm_init_sk as of commit 351782067b6b ("ipv4: ipcm_cookie initializers"). Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-13Merge ath-next from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.gitKalle Valo33-215/+1202
ath.git patches for 5.4. Major changes: wil6210 * add support for Enhanced Directional Multi-Gigabit (EDMG) channels 9-11 * add debugfs file to show PCM ring content * report boottime_ns in scan results ath9k * add a separate loader for AR92XX (and older) pci(e) without eeprom, enabled with the new ATH9K_PCI_NO_EEPROM Kconfig option
2019-09-13rtw88: report RX power for each antennaYan-Hsuan Chuang3-3/+9
Report chains and chain_signal in ieee80211_rx_status. It is useful for program such as tcpdump to see if the antennas are well connected/placed. 8822C is able to receive CCK rates with 2 antennas, while 8822B can only use 1 antenna path to receive CCK rates. Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com> Reviewed-by: Chris Chiu <chiu@endlessm.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-09-13rtw88: fix wrong rx power calculationTsang-Shian Lin1-2/+2
Fix the wrong RF path for CCK rx power calculation. Fixes: e3037485c68e ("rtw88: new Realtek 802.11ac driver") Signed-off-by: Tsang-Shian Lin <thlin@realtek.com> Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com> Reviewed-by: Chris Chiu <chiu@endlessm.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-09-13rtlwifi: rtl8192de: replace _rtl92d_evm_db_to_percentage with generic versionMichael Straube1-16/+2
Function _rtl92d_evm_db_to_percentage is functionally identical to the generic version rtl_evm_db_to_percentage, so remove _rtl92d_evm_db_to_percentage and use the generic version instead. Signed-off-by: Michael Straube <straube.linux@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-09-13rtlwifi: rtl8192cu: replace _rtl92c_evm_db_to_percentage with generic versionMichael Straube1-17/+1
Function _rtl92c_evm_db_to_percentage is functionally identical to the generic version rtl_evm_db_to_percentage, so remove _rtl92c_evm_db_to_percentage and use the generic version instead. Signed-off-by: Michael Straube <straube.linux@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-09-13rtlwifi: rtl8192ce: replace _rtl92c_evm_db_to_percentage with generic versionMichael Straube1-22/+1
Function _rtl92c_evm_db_to_percentage is functionally identical to the generic version rtl_evm_db_to_percentage, so remove _rtl92c_evm_db_to_percentage and use the generic version instead. Signed-off-by: Michael Straube <straube.linux@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-09-13rtw88: allows to receive AMSDU in AMPDUYan-Hsuan Chuang1-0/+1
The hardware has enough buffer to receive like 8K for an MPDU. So tell mac80211 that we can receive AMSDU in AMPDU. Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2019-09-13rtw88: add dynamic cck pd mechanismTzu-En Huang4-0/+188
This mechanism reduces the numbers of false alram in cck rate by dynamically adjusting the value of power threshold and cs_ratio. We determine the new value by three factors, which are rssi, false alarm count and igi. Based on these factors, we define the current condition into five levels. Compared to the previous level, if the level is changed, we set the new values for power threshold and cs_ratio. Signed-off-by: Tzu-En Huang <tehuang@realtek.com> Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>