summaryrefslogtreecommitdiffstats
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2019-08-23Merge tag 'ceph-for-5.3-rc6' of git://github.com/ceph/ceph-clientLinus Torvalds1-1/+2
Pull ceph fixes from Ilya Dryomov: "Three important fixes tagged for stable (an indefinite hang, a crash on an assert and a NULL pointer dereference) plus a small series from Luis fixing instances of vfree() under spinlock" * tag 'ceph-for-5.3-rc6' of git://github.com/ceph/ceph-client: libceph: fix PG split vs OSD (re)connect race ceph: don't try fill file_lock on unsuccessful GETFILELOCK reply ceph: clear page dirty before invalidate page ceph: fix buffer free while holding i_ceph_lock in fill_inode() ceph: fix buffer free while holding i_ceph_lock in __ceph_build_xattrs_blob() ceph: fix buffer free while holding i_ceph_lock in __ceph_setxattr() libceph: allow ceph_buffer_put() to receive a NULL ceph_buffer
2019-08-22libceph: allow ceph_buffer_put() to receive a NULL ceph_bufferLuis Henriques1-1/+2
Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-08-19Merge branch 'siginfo-linus' of ↵Linus Torvalds1-1/+14
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull kernel thread signal handling fix from Eric Biederman: "I overlooked the fact that kernel threads are created with all signals set to SIG_IGN, and accidentally caused a regression in cifs and drbd when replacing force_sig with send_sig. This is my fix for that regression. I add a new function allow_kernel_signal which allows kernel threads to receive signals sent from the kernel, but continues to ignore all signals sent from userspace. This ensures the user space interface for cifs and drbd remain the same. These kernel threads depend on blocking networking calls which block until something is received or a signal is pending. Making receiving of signals somewhat necessary for these kernel threads. Perhaps someday we can cleanup those interfaces and remove allow_kernel_signal. If not allow_kernel_signal is pretty trivial and clearly documents what is going on so I don't think we will mind carrying it" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signal: Allow cifs and drbd to receive their terminating signals
2019-08-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds14-19/+43
Pull networking fixes from David Miller: 1) Fix jmp to 1st instruction in x64 JIT, from Alexei Starovoitov. 2) Severl kTLS fixes in mlx5 driver, from Tariq Toukan. 3) Fix severe performance regression due to lack of SKB coalescing of fragments during local delivery, from Guillaume Nault. 4) Error path memory leak in sch_taprio, from Ivan Khoronzhuk. 5) Fix batched events in skbedit packet action, from Roman Mashak. 6) Propagate VLAN TX offload to hw_enc_features in bond and team drivers, from Yue Haibing. 7) RXRPC local endpoint refcounting fix and read after free in rxrpc_queue_local(), from David Howells. 8) Fix endian bug in ibmveth multicast list handling, from Thomas Falcon. 9) Oops, make nlmsg_parse() wrap around the correct function, __nlmsg_parse not __nla_parse(). Fix from David Ahern. 10) Memleak in sctp_scend_reset_streams(), fro Zheng Bin. 11) Fix memory leak in cxgb4, from Wenwen Wang. 12) Yet another race in AF_PACKET, from Eric Dumazet. 13) Fix false detection of retransmit failures in tipc, from Tuong Lien. 14) Use after free in ravb_tstamp_skb, from Tho Vu. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits) ravb: Fix use-after-free ravb_tstamp_skb netfilter: nf_tables: map basechain priority to hardware priority net: sched: use major priority number as hardware priority wimax/i2400m: fix a memory leak bug net: cavium: fix driver name ibmvnic: Unmap DMA address of TX descriptor buffers after use bnxt_en: Fix to include flow direction in L2 key bnxt_en: Use correct src_fid to determine direction of the flow bnxt_en: Suppress HWRM errors for HWRM_NVM_GET_VARIABLE command bnxt_en: Fix handling FRAG_ERR when NVM_INSTALL_UPDATE cmd fails bnxt_en: Improve RX doorbell sequence. bnxt_en: Fix VNIC clearing logic for 57500 chips. net: kalmia: fix memory leaks cx82310_eth: fix a memory leak bug bnx2x: Fix VF's VLAN reconfiguration in reload. Bluetooth: Add debug setting for changing minimum encryption key size tipc: fix false detection of retransmit failures lan78xx: Fix memory leaks MAINTAINERS: r8169: Update path to the driver MAINTAINERS: PHY LIBRARY: Update files in the record ...
2019-08-19keys: Fix description sizeDavid Howells1-4/+4
The maximum key description size is 4095. Commit f771fde82051 ("keys: Simplify key description management") inadvertantly reduced that to 255 and made sizes between 256 and 4095 work weirdly, and any size whereby size & 255 == 0 would cause an assertion in __key_link_begin() at the following line: BUG_ON(index_key->desc_len == 0); This can be fixed by simply increasing the size of desc_len in struct keyring_index_key to a u16. Note the argument length test in keyutils only checked empty descriptions and descriptions with a size around the limit (ie. 4095) and not for all the values in between, so it missed this. This has been addressed and https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/commit/?id=066bf56807c26cd3045a25f355b34c1d8a20a5aa now exhaustively tests all possible lengths of type, description and payload and then some. The assertion failure looks something like: kernel BUG at security/keys/keyring.c:1245! ... RIP: 0010:__key_link_begin+0x88/0xa0 ... Call Trace: key_create_or_update+0x211/0x4b0 __x64_sys_add_key+0x101/0x200 do_syscall_64+0x5b/0x1e0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 It can be triggered by: keyctl add user "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" a @s Fixes: f771fde82051 ("keys: Simplify key description management") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-19signal: Allow cifs and drbd to receive their terminating signalsEric W. Biederman1-1/+14
My recent to change to only use force_sig for a synchronous events wound up breaking signal reception cifs and drbd. I had overlooked the fact that by default kthreads start out with all signals set to SIG_IGN. So a change I thought was safe turned out to have made it impossible for those kernel thread to catch their signals. Reverting the work on force_sig is a bad idea because what the code was doing was very much a misuse of force_sig. As the way force_sig ultimately allowed the signal to happen was to change the signal handler to SIG_DFL. Which after the first signal will allow userspace to send signals to these kernel threads. At least for wake_ack_receiver in drbd that does not appear actively wrong. So correct this problem by adding allow_kernel_signal that will allow signals whose siginfo reports they were sent by the kernel through, but will not allow userspace generated signals, and update cifs and drbd to call allow_kernel_signal in an appropriate place so that their thread can receive this signal. Fixing things this way ensures that userspace won't be able to send signals and cause problems, that it is clear which signals the threads are expecting to receive, and it guarantees that nothing else in the system will be affected. This change was partly inspired by similar cifs and drbd patches that added allow_signal. Reported-by: ronnie sahlberg <ronniesahlberg@gmail.com> Reported-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Tested-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Cc: Steve French <smfrench@gmail.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: David Laight <David.Laight@ACULAB.COM> Fixes: 247bc9470b1e ("cifs: fix rmmod regression in cifs.ko caused by force_sig changes") Fixes: 72abe3bcf091 ("signal/cifs: Fix cifs_put_tcp_session to call send_sig instead of force_sig") Fixes: fee109901f39 ("signal/drbd: Use send_sig not force_sig") Fixes: 3cf5d076fb4d ("signal: Remove task parameter from force_sig") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2019-08-18netfilter: nf_tables: map basechain priority to hardware priorityPablo Neira Ayuso1-0/+2
This patch adds initial support for offloading basechains using the priority range from 1 to 65535. This is restricting the netfilter priority range to 16-bit integer since this is what most drivers assume so far from tc. It should be possible to extend this range of supported priorities later on once drivers are updated to support for 32-bit integer priorities. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-18net: sched: use major priority number as hardware priorityPablo Neira Ayuso1-1/+1
tc transparently maps the software priority number to hardware. Update it to pass the major priority which is what most drivers expect. Update drivers too so they do not need to lshift the priority field of the flow_cls_common_offload object. The stmmac driver is an exception, since this code assumes the tc software priority is fine, therefore, lshift it just to be conservative. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-18Merge tag 'usb-5.3-rc5' of ↵Linus Torvalds2-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are number of small USB fixes for 5.3-rc5. Syzbot has been on a tear recently now that it has some good USB debugging hooks integrated, so there's a number of fixes in here found by those tools for some _very_ old bugs. Also a handful of gadget driver fixes for reported issues, some hopefully-final dma fixes for host controller drivers, and some new USB serial gadget driver ids. All of these have been in linux-next this week with no reported issues (the usb-serial ones were in linux-next in its own branch, but merged into mine on Friday)" * tag 'usb-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: add a hcd_uses_dma helper usb: don't create dma pools for HCDs with a localmem_pool usb: chipidea: imx: fix EPROBE_DEFER support during driver probe usb: host: fotg2: restart hcd after port reset USB: CDC: fix sanity checks in CDC union parser usb: cdc-acm: make sure a refcount is taken early enough USB: serial: option: add the BroadMobi BM818 card USB: serial: option: Add Motorola modem UARTs USB: core: Fix races in character device registration and deregistraion usb: gadget: mass_storage: Fix races between fsg_disable and fsg_set_alt usb: gadget: composite: Clear "suspended" on reset/disconnect usb: gadget: udc: renesas_usb3: Fix sysfs interface of "role" USB: serial: option: add D-Link DWM-222 device ID USB: serial: option: Add support for ZTE MF871A
2019-08-17Merge tag 'for-linus-2019-08-17' of git://git.kernel.dk/linux-blockLinus Torvalds1-4/+1
Pull block fixes from Jens Axboe: "A collection of fixes that should go into this series. This contains: - Revert of the REQ_NOWAIT_INLINE and associated dio changes. There were still corner cases there, and even though I had a solution for it, it's too involved for this stage. (me) - Set of NVMe fixes (via Sagi) - io_uring fix for fixed buffers (Anthony) - io_uring defer issue fix (Jackie) - Regression fix for queue sync at exit time (zhengbin) - xen blk-back memory leak fix (Wenwen)" * tag 'for-linus-2019-08-17' of git://git.kernel.dk/linux-block: io_uring: fix an issue when IOSQE_IO_LINK is inserted into defer list block: remove REQ_NOWAIT_INLINE io_uring: fix manual setup of iov_iter for fixed buffers xen/blkback: fix memory leaks blk-mq: move cancel of requeue_work to the front of blk_exit_queue nvme-pci: Fix async probe remove race nvme: fix controller removal race with scan work nvme-rdma: fix possible use-after-free in connect error flow nvme: fix a possible deadlock when passthru commands sent to a multipath device nvme-core: Fix extra device_put() call on error path nvmet-file: fix nvmet_file_flush() always returning an error nvmet-loop: Flush nvme_delete_wq when removing the port nvmet: Fix use-after-free bug when a port is removed nvme-multipath: revalidate nvme_ns_head gendisk in nvme_validate_ns
2019-08-17Merge branch 'for-upstream' of ↵David S. Miller1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Johan Hedberg says: ==================== pull request: bluetooth 2019-08-17 Here's a set of Bluetooth fixes for the 5.3-rc series: - Multiple fixes for Qualcomm (btqca & hci_qca) drivers - Minimum encryption key size debugfs setting (this is required for Bluetooth Qualification) - Fix hidp_send_message() to have a meaningful return value ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-17Bluetooth: Add debug setting for changing minimum encryption key sizeMarcel Holtmann1-0/+1
For testing and qualification purposes it is useful to allow changing the minimum encryption key size value that the host stack is going to enforce. This adds a new debugfs setting min_encrypt_key_size to achieve this functionality. Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2019-08-16Merge tag 'pm-5.3-rc5' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These add a check to avoid recent suspend-to-idle power regression on systems with NVMe drives where the PCIe ASPM policy is "performance" (or when the kernel is built without ASPM support), fix an issue related to frequency limits in the schedutil cpufreq governor and fix a mistake related to the PM QoS usage in the cpufreq core introduced recently. Specifics: - Disable NVMe power optimization related to suspend-to-idle added recently on systems where PCIe ASPM is not able to put PCIe links into low-power states to prevent excess power from being drawn by the system while suspended (Rafael Wysocki). - Make the schedutil governor handle frequency limits changes properly in all cases (Viresh Kumar). - Prevent the cpufreq core from treating positive values returned by dev_pm_qos_update_request() as errors (Viresh Kumar)" * tag 'pm-5.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: nvme-pci: Allow PCI bus-level PM to be used if ASPM is disabled PCI/ASPM: Add pcie_aspm_enabled() cpufreq: schedutil: Don't skip freq update when limits change cpufreq: dev_pm_qos_update_request() can return 1 on success
2019-08-15Merge tag 'rxrpc-fixes-20190814' of ↵David S. Miller1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fix local endpoint handling Here's a pair of patches that fix two issues in the handling of local endpoints (rxrpc_local structs): (1) Use list_replace_init() rather than list_replace() if we're going to unconditionally delete the replaced item later, lest the list get corrupted. (2) Don't access the rxrpc_local object after passing our ref to the workqueue, not even to illuminate tracepoints, as the work function may cause the object to be freed. We have to cache the information beforehand. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller1-2/+7
Pablo Neira Ayuso says: ==================== Netfilter fixes for net This patchset contains Netfilter fixes for net: 1) Extend selftest to cover flowtable with ipsec, from Florian Westphal. 2) Fix interaction of ipsec with flowtable, also from Florian. 3) User-after-free with bound set to rule that fails to load. 4) Adjust state and timeout for flows that expire. 5) Timeout update race with flows in teardown state. 6) Ensure conntrack id hash calculation use invariants as input, from Dirk Morris. 7) Do not push flows into flowtable for TCP fin/rst packets. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-15block: remove REQ_NOWAIT_INLINEJens Axboe1-4/+1
We had a few issues with this code, and there's still a problem around how we deal with error handling for chained/split bios. For now, just revert the code and we'll try again with a thoroug solution. This reverts commits: e15c2ffa1091 ("block: fix O_DIRECT error handling for bio fragments") 0eb6ddfb865c ("block: Fix __blkdev_direct_IO() for bio fragments") 6a43074e2f46 ("block: properly handle IOCB_NOWAIT for async O_DIRECT IO") 893a1c97205a ("blk-mq: allow REQ_NOWAIT to return an error inline") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-08-15Merge tag 'auxdisplay-for-linus-v5.3-rc5' of git://github.com/ojeda/linuxLinus Torvalds1-39/+0
Pull auxdisplay fixes from Miguel Ojeda: "A few minor auxdisplay improvements: - A couple of small header cleanups for charlcd (Masahiro Yamada) - A trivial typo fix for the examples of cfag12864b (Masahiro Yamada) - An Kconfig help text improvement for charlcd (Mans Rullgard) - An error path fix for panel (zhengbin)" * tag 'auxdisplay-for-linus-v5.3-rc5' of git://github.com/ojeda/linux: auxdisplay: Fix a typo in cfag12864b-example.c auxdisplay: charlcd: add include guard to charlcd.h auxdisplay: charlcd: move charlcd.h to drivers/auxdisplay auxdisplay: charlcd: add help text for backlight initial state auxdisplay: panel: need to delete scan_timer when misc_register fails in panel_attach
2019-08-15usb: add a hcd_uses_dma helperChristoph Hellwig2-1/+4
The USB buffer allocation code is the only place in the usb core (and in fact the whole kernel) that uses is_device_dma_capable, while the URB mapping code uses the uses_dma flag in struct usb_bus. Switch the buffer allocation to use the uses_dma flag used by the rest of the USB code, and create a helper in hcd.h that checks this flag as well as the CONFIG_HAS_DMA to simplify the caller a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20190811080520.21712-3-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-14Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-1/+2
Pull rdma fixes from Doug Ledford: "Fairly small pull request for -rc3. I'm out of town the rest of this week, so I made sure to clean out as much as possible from patchworks in enough time for 0-day to chew through it (Yay! for 0-day being back online! :-)). Jason might send through any emergency stuff that could pop up, otherwise I'm back next week. The only real thing of note is the siw ABI change. Since we just merged siw *this* release, there are no prior kernel releases to maintain kernel ABI with. I told Bernard that if there is anything else about the siw ABI he thinks he might want to change before it goes set in stone, he should get it in ASAP. The siw module was around for several years outside the kernel tree, and it had to be revamped considerably for inclusion upstream, so we are making no attempts to be backward compatible with the out of tree version. Once 5.3 is actually released, we will have our baseline ABI to maintain. Summary: - Fix a memory registration release flow issue that was causing a WARN_ON (mlx5) - If the counters for a port aren't allocated, then we can't do operations on the non-existent counters (core) - Check the right variable for error code result (mlx5) - Fix a use after free issue (mlx5) - Fix an off by one memory leak (siw) - Actually return an error code on error (core) - Allow siw to be built on 32bit arches (siw, ABI change, but OK since siw was just merged this merge window and there is no prior released kernel to maintain compatibility with and we also updated the rdma-core user space package to match)" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/siw: Change CQ flags from 64->32 bits RDMA/core: Fix error code in stat_get_doit_qp() RDMA/siw: Fix a memory leak in siw_init_cpulist() IB/mlx5: Fix use-after-free error while accessing ev_file pointer IB/mlx5: Check the correct variable in error handling code RDMA/counter: Prevent QP counter binding if counters unsupported IB/mlx5: Fix implicit MR release flow
2019-08-14Merge tag 'dma-mapping-5.3-4' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds1-4/+9
Pull dma-mapping fixes from Christoph Hellwig: - fix the handling of the bus_dma_mask in dma_get_required_mask, which caused a regression in this merge window (Lucas Stach) - fix a regression in the handling of DMA_ATTR_NO_KERNEL_MAPPING (me) - fix dma_mmap_coherent to not cause page attribute mismatches on coherent architectures like x86 (me) * tag 'dma-mapping-5.3-4' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: fix page attributes for dma_mmap_* dma-direct: don't truncate dma_required_mask to bus addressing capabilities dma-direct: fix DMA_ATTR_NO_KERNEL_MAPPING
2019-08-14rxrpc: Fix read-after-free in rxrpc_queue_local()David Howells1-3/+3
rxrpc_queue_local() attempts to queue the local endpoint it is given and then, if successful, prints a trace line. The trace line includes the current usage count - but we're not allowed to look at the local endpoint at this point as we passed our ref on it to the workqueue. Fix this by reading the usage count before queuing the work item. Also fix the reading of local->debug_id for trace lines, which must be done with the same consideration as reading the usage count. Fixes: 09d2bf595db4 ("rxrpc: Add a tracepoint to track rxrpc_local refcounting") Reported-by: syzbot+78e71c5bab4f76a6a719@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-13netlink: Fix nlmsg_parse as a wrapper for strict message parsingDavid Ahern1-3/+2
Eric reported a syzbot warning: BUG: KMSAN: uninit-value in nh_valid_get_del_req+0x6f1/0x8c0 net/ipv4/nexthop.c:1510 CPU: 0 PID: 11812 Comm: syz-executor444 Not tainted 5.3.0-rc3+ #17 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x191/0x1f0 lib/dump_stack.c:113 kmsan_report+0x162/0x2d0 mm/kmsan/kmsan_report.c:109 __msan_warning+0x75/0xe0 mm/kmsan/kmsan_instr.c:294 nh_valid_get_del_req+0x6f1/0x8c0 net/ipv4/nexthop.c:1510 rtm_del_nexthop+0x1b1/0x610 net/ipv4/nexthop.c:1543 rtnetlink_rcv_msg+0x115a/0x1580 net/core/rtnetlink.c:5223 netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:5241 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0xf6c/0x1050 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x110f/0x1330 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:637 [inline] sock_sendmsg net/socket.c:657 [inline] ___sys_sendmsg+0x14ff/0x1590 net/socket.c:2311 __sys_sendmmsg+0x53a/0xae0 net/socket.c:2413 __do_sys_sendmmsg net/socket.c:2442 [inline] __se_sys_sendmmsg+0xbd/0xe0 net/socket.c:2439 __x64_sys_sendmmsg+0x56/0x70 net/socket.c:2439 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:297 entry_SYSCALL_64_after_hwframe+0x63/0xe7 The root cause is nlmsg_parse calling __nla_parse which means the header struct size is not checked. nlmsg_parse should be a wrapper around __nlmsg_parse with NL_VALIDATE_STRICT for the validate argument very much like nlmsg_parse_deprecated is for NL_VALIDATE_LIBERAL. Fixes: 3de6440354465 ("netlink: re-add parse/validate functions in strict mode") Reported-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-08-13Revert "mm, thp: restore node-local hugepage allocations"Andrea Arcangeli1-0/+2
This reverts commit 2f0799a0ffc033b ("mm, thp: restore node-local hugepage allocations"). commit 2f0799a0ffc033b was rightfully applied to avoid the risk of a severe regression that was reported by the kernel test robot at the end of the merge window. Now we understood the regression was a false positive and was caused by a significant increase in fairness during a swap trashing benchmark. So it's safe to re-apply the fix and continue improving the code from there. The benchmark that reported the regression is very useful, but it provides a meaningful result only when there is no significant alteration in fairness during the workload. The removal of __GFP_THISNODE increased fairness. __GFP_THISNODE cannot be used in the generic page faults path for new memory allocations under the MPOL_DEFAULT mempolicy, or the allocation behavior significantly deviates from what the MPOL_DEFAULT semantics are supposed to be for THP and 4k allocations alike. Setting THP defrag to "always" or using MADV_HUGEPAGE (with THP defrag set to "madvise") has never meant to provide an implicit MPOL_BIND on the "current" node the task is running on, causing swap storms and providing a much more aggressive behavior than even zone_reclaim_node = 3. Any workload who could have benefited from __GFP_THISNODE has now to enable zone_reclaim_mode=1||2||3. __GFP_THISNODE implicitly provided the zone_reclaim_mode behavior, but it only did so if THP was enabled: if THP was disabled, there would have been no chance to get any 4k page from the current node if the current node was full of pagecache, which further shows how this __GFP_THISNODE was misplaced in MADV_HUGEPAGE. MADV_HUGEPAGE has never been intended to provide any zone_reclaim_mode semantics, in fact the two are orthogonal, zone_reclaim_mode = 1|2|3 must work exactly the same with MADV_HUGEPAGE set or not. The performance characteristic of memory depends on the hardware details. The numbers below are obtained on Naples/EPYC architecture and the N/A projection extends them to show what we should aim for in the future as a good THP NUMA locality default. The benchmark used exercises random memory seeks (note: the cost of the page faults is not part of the measurement). D0 THP | D0 4k | D1 THP | D1 4k | D2 THP | D2 4k | D3 THP | D3 4k | ... 0% | +43% | +45% | +106% | +131% | +224% | N/A | N/A D0 means distance zero (i.e. local memory), D1 means distance one (i.e. intra socket memory), D2 means distance two (i.e. inter socket memory), etc... For the guest physical memory allocated by qemu and for guest mode kernel the performance characteristic of RAM is more complex and an ideal default could be: D0 THP | D1 THP | D0 4k | D2 THP | D1 4k | D3 THP | D2 4k | D3 4k | ... 0% | +58% | +101% | N/A | +222% | N/A | N/A | N/A NOTE: the N/A are projections and haven't been measured yet, the measurement in this case is done on a 1950x with only two NUMA nodes. The THP case here means THP was used both in the host and in the guest. After applying this commit the THP NUMA locality order that we'll get out of MADV_HUGEPAGE is this: D0 THP | D1 THP | D2 THP | D3 THP | ... | D0 4k | D1 4k | D2 4k | D3 4k | ... Before this commit it was: D0 THP | D0 4k | D1 4k | D2 4k | D3 4k | ... Even if we ignore the breakage of large workloads that can't fit in a single node that the __GFP_THISNODE implicit "current node" mbind caused, the THP NUMA locality order provided by __GFP_THISNODE was still not the one we shall aim for in the long term (i.e. the first one at the top). After this commit is applied, we can introduce a new allocator multi order API and to replace those two alloc_pages_vmas calls in the page fault path, with a single multi order call: unsigned int order = (1 << HPAGE_PMD_ORDER) | (1 << 0); page = alloc_pages_multi_order(..., &order); if (!page) goto out; if (!(order & (1 << 0))) { VM_WARN_ON(order != 1 << HPAGE_PMD_ORDER); /* THP fault */ } else { VM_WARN_ON(order != 1 << 0); /* 4k fallback */ } The page allocator logic has to be altered so that when it fails on any zone with order 9, it has to try again with a order 0 before falling back to the next zone in the zonelist. After that we need to do more measurements and evaluate if adding an opt-in feature for guest mode is worth it, to swap "DN 4k | DN+1 THP" with "DN+1 THP | DN 4k" at every NUMA distance crossing. Link: http://lkml.kernel.org/r/20190503223146.2312-3-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13Revert "Revert "mm, thp: consolidate THP gfp handling into ↵Andrea Arcangeli1-8/+4
alloc_hugepage_direct_gfpmask"" Patch series "reapply: relax __GFP_THISNODE for MADV_HUGEPAGE mappings". The fixes for what was originally reported as "pathological THP behavior" we rightfully reverted to be sure not to introduced regressions at end of a merge window after a severe regression report from the kernel bot. We can safely re-apply them now that we had time to analyze the problem. The mm process worked fine, because the good fixes were eventually committed upstream without excessive delay. The regression reported by the kernel bot however forced us to revert the good fixes to be sure not to introduce regressions and to give us the time to analyze the issue further. The silver lining is that this extra time allowed to think more at this issue and also plan for a future direction to improve things further in terms of THP NUMA locality. This patch (of 2): This reverts commit 356ff8a9a78fb35d ("Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"). So it reapplies 89c83fb539f954 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"). Consolidation of the THP allocation flags at the same place was meant to be a clean up to easier handle otherwise scattered code which is imposing a maintenance burden. There were no real problems observed with the gfp mask consolidation but the reversion was rushed through without a larger consensus regardless. This patch brings the consolidation back because this should make the long term maintainability easier as well as it should allow future changes to be less error prone. [mhocko@kernel.org: changelog additions] Link: http://lkml.kernel.org/r/20190503223146.2312-2-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13include/asm-generic/5level-fixup.h: fix variable 'p4d' set but not usedQian Cai1-3/+18
A compiler throws a warning on an arm64 system since commit 9849a5697d3d ("arch, mm: convert all architectures to use 5level-fixup.h"), mm/kasan/init.c: In function 'kasan_free_p4d': mm/kasan/init.c:344:9: warning: variable 'p4d' set but not used [-Wunused-but-set-variable] p4d_t *p4d; ^~~ because p4d_none() in "5level-fixup.h" is compiled away while it is a static inline function in "pgtable-nopud.h". However, if converted p4d_none() to a static inline there, powerpc would be unhappy as it reads those in assembler language in "arch/powerpc/include/asm/book3s/64/pgtable.h", so it needs to skip assembly include for the static inline C function. While at it, converted a few similar functions to be consistent with the ones in "pgtable-nopud.h". Link: http://lkml.kernel.org/r/20190806232917.881-1-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13mm: workingset: fix vmstat counters for shadow nodesRoman Gushchin1-0/+19
Memcg counters for shadow nodes are broken because the memcg pointer is obtained in a wrong way. The following approach is used: virt_to_page(xa_node)->mem_cgroup Since commit 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages") page->mem_cgroup pointer isn't set for slab pages, so memcg_from_slab_page() should be used instead. Also I doubt that it ever worked correctly: virt_to_head_page() should be used instead of virt_to_page(). Otherwise objects residing on tail pages are not accounted, because only the head page contains a valid mem_cgroup pointer. That was a case since the introduction of these counters by the commit 68d48e6a2df5 ("mm: workingset: add vmstat counter for shadow nodes"). Link: http://lkml.kernel.org/r/20190801233532.138743-1-guro@fb.com Fixes: 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages") Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13mm: document zone device struct page field usageRalph Campbell1-1/+10
Patch series "mm/hmm: fixes for device private page migration", v3. Testing the latest linux git tree turned up a few bugs with page migration to and from ZONE_DEVICE private and anonymous pages. Hopefully it clarifies how ZONE_DEVICE private struct page uses the same mapping and index fields from the source anonymous page mapping. This patch (of 3): Struct page for ZONE_DEVICE private pages uses the page->mapping and and page->index fields while the source anonymous pages are migrated to device private memory. This is so rmap_walk() can find the page when migrating the ZONE_DEVICE private page back to system memory. ZONE_DEVICE pmem backed fsdax pages also use the page->mapping and page->index fields when files are mapped into a process address space. Add comments to struct page and remove the unused "_zd_pad_1" field to make this more clear. Link: http://lkml.kernel.org/r/20190724232700.23327-2-rcampbell@nvidia.com Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-13RDMA/siw: Change CQ flags from 64->32 bitsBernard Metzler1-1/+2
This patch changes the driver/user shared (mmapped) CQ notification flags field from unsigned 64-bits size to unsigned 32-bits size. This enables building siw on 32-bit architectures. This patch changes the siw-abi, but as siw was only just merged in this merge window cycle, there are no released kernels with the prior abi. We are making no attempt to be binary compatible with siw user space libraries prior to the merge of siw into the upstream kernel, only moving forward with upstream kernels and upstream rdma-core provided siw libraries are we guaranteeing compatibility. Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20190809151816.13018-1-bmt@zurich.ibm.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-12PCI/ASPM: Add pcie_aspm_enabled()Rafael J. Wysocki1-0/+2
Add a function checking whether or not PCIe ASPM has been enabled for a given device. It will be used by the NVMe driver to decide how to handle the device during system suspend. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
2019-08-10dma-mapping: fix page attributes for dma_mmap_*Christoph Hellwig1-4/+9
All the way back to introducing dma_common_mmap we've defaulted to mark the pages as uncached. But this is wrong for DMA coherent devices. Later on DMA_ATTR_WRITE_COMBINE also got incorrect treatment as that flag is only treated special on the alloc side for non-coherent devices. Introduce a new dma_pgprot helper that deals with the check for coherent devices so that only the remapping cases ever reach arch_dma_mmap_pgprot and we thus ensure no aliasing of page attributes happens, which makes the powerpc version of arch_dma_mmap_pgprot obsolete and simplifies the remaining ones. Note that this means arch_dma_mmap_pgprot is a bit misnamed now, but we'll phase it out soon. Fixes: 64ccc9c033c6 ("common: dma-mapping: add support for generic dma_mmap_* calls") Reported-by: Shawn Anastasio <shawn@anastas.io> Reported-by: Gavin Li <git@thegavinli.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> # arm64
2019-08-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds3-2/+7
Pull kvm fixes from Paolo Bonzini: "Bugfixes (arm and x86) and cleanups" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: selftests: kvm: Adding config fragments KVM: selftests: Update gitignore file for latest changes kvm: remove unnecessary PageReserved check KVM: arm/arm64: vgic: Reevaluate level sensitive interrupts on enable KVM: arm: Don't write junk to CP15 registers on reset KVM: arm64: Don't write junk to sysregs on reset KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block x86: kvm: remove useless calls to kvm_para_available KVM: no need to check return value of debugfs_create functions KVM: remove kvm_arch_has_vcpu_debugfs() KVM: Fix leak vCPU's VMCS value into other pCPU KVM: Check preempted_in_kernel for involuntary preemption KVM: LAPIC: Don't need to wakeup vCPU twice afer timer fire arm64: KVM: hyp: debug-sr: Mark expected switch fall-through KVM: arm64: Update kvm_arm_exception_class and esr_class_str for new EC KVM: arm: vgic-v3: Mark expected switch fall-through arm64: KVM: regmap: Fix unexpected switch fall-through KVM: arm/arm64: Introduce kvm_pmu_vcpu_init() to setup PMU counter index
2019-08-09Merge tag 'mlx5-fixes-2019-08-08' of ↵David S. Miller2-5/+4
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2019-08-08 This series introduces some fixes to mlx5 driver. Highlights: 1) From Tariq, Critical mlx5 kTLS fixes to better align with hw specs. 2) From Aya, Fixes to mlx5 tx devlink health reporter. 3) From Maxim, aRFs parsing to use flow dissector to avoid relying on invalid skb fields. Please pull and let me know if there is any problem. For -stable v4.3 ('net/mlx5e: Only support tx/rx pause setting for port owner') For -stable v4.9 ('net/mlx5e: Use flow keys dissector to parse packets for ARFS') For -stable v5.1 ('net/mlx5e: Fix false negative indication on tx reporter CQE recovery') ('net/mlx5e: Remove redundant check in CQE recovery flow of tx reporter') ('net/mlx5e: ethtool, Avoid setting speed to 56GBASE when autoneg off') Note: when merged with net-next this minor conflict will pop up: ++<<<<<<< (net-next) + if (is_eswitch_flow) { + flow->esw_attr->match_level = match_level; + flow->esw_attr->tunnel_match_level = tunnel_match_level; ++======= + if (flow->flags & MLX5E_TC_FLOW_ESWITCH) { + flow->esw_attr->inner_match_level = inner_match_level; + flow->esw_attr->outer_match_level = outer_match_level; ++>>>>>>> (net) To resolve, use hunks from net (2nd) and replace: if (flow->flags & MLX5E_TC_FLOW_ESWITCH) with if (is_eswitch_flow) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-09sock: make cookie generation global instead of per netnsDaniel Borkmann2-3/+2
Generating and retrieving socket cookies are a useful feature that is exposed to BPF for various program types through bpf_get_socket_cookie() helper. The fact that the cookie counter is per netns is quite a limitation for BPF in practice in particular for programs in host namespace that use socket cookies as part of a map lookup key since they will be causing socket cookie collisions e.g. when attached to BPF cgroup hooks or cls_bpf on tc egress in host namespace handling container traffic from veth or ipvlan devices with peer in different netns. Change the counter to be global instead. Socket cookie consumers must assume the value as opqaue in any case. Not every socket must have a cookie generated and knowledge of the counter value itself does not provide much value either way hence conversion to global is fine. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Martynas Pumputis <m@lambda.lt> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-09Merge tag 'drm-fixes-2019-08-09' of git://anongit.freedesktop.org/drm/drmLinus Torvalds1-19/+1
Pull drm fixes from Dave Airlie: "Usual fixes roundup. Nothing too crazy or serious, one non-released ioctl is removed in the amdkfd driver. core: - mode parser strncpy fix i915: - GLK DSI escape clock setting - HDCP memleak fix tegra: - one gpiod/of regression fix amdgpu: - fix VCN to handle the latest navi10 firmware - fix for fan control on navi10 - properly handle SMU metrics table on navi10 - fix a resume regression on Stoney - kfd revert a GWS ioctl vmwgfx: - memory leak fix rockchip: - suspend fix" * tag 'drm-fixes-2019-08-09' of git://anongit.freedesktop.org/drm/drm: drm/vmwgfx: fix memory leak when too many retries have occurred Revert "drm/amdkfd: New IOCTL to allocate queue GWS" Revert "drm/amdgpu: fix transform feedback GDS hang on gfx10 (v2)" drm/amdgpu: pin the csb buffer on hw init for gfx v8 drm/rockchip: Suspend DP late drm/i915: Fix wrong escape clock divisor init for GLK drm/i915: fix possible memory leak in intel_hdcp_auth_downstream() drm/modes: Fix unterminated strncpy drm/amd/powerplay: correct navi10 vcn powergate drm/amd/powerplay: honor hw limit on fetching metrics data for navi10 drm/amd/powerplay: Allow changing of fan_control in smu_v11_0 drm/amd/amdgpu/vcn_v2_0: Move VCN 2.0 specific dec ring test to vcn_v2_0 drm/amd/amdgpu/vcn_v2_0: Mark RB commands as KMD commands drm/tegra: Fix gpiod_get_from_of_node() regression
2019-08-09Merge tag 'sound-5.3-rc4' of ↵Linus Torvalds3-13/+21
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Lots of small fixes at this time since we've received the ASoC fix batch now. - Some coverage in ASoC core mostly for minor issues like NULL checks for DPCM and proper error handling in DAI instantiation - A collection of small device-specific changes in various ASoC codec and platform drivers - OF-tree refcount fixes in a few ASoC drivers - Fixes of memory leaks in the error paths of various ASoC / ALSA drivers - A workaround for a long-standing issue on AMD HD-audio device - Updates of MAINTAINERS, mail addresses, file permission fixups" * tag 'sound-5.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (38 commits) ALSA: firewire: fix a memory leak bug sound: fix a memory leak bug ALSA: hda - Workaround for crackled sound on AMD controller (1022:1457) ALSA: hiface: fix multiple memory leak bugs ALSA: hda - Don't override global PCM hw info flag ALSA: usb-audio: fix a memory leak bug ASoC: max98373: Remove executable bits ASoC: amd: acp3x: use dma address for acp3x dma driver ASoC: amd: acp3x: use dma_ops of parent device for acp3x dma driver ASoC: max98373: add 88200 and 96000 sampling rate support ASoC: sun4i-i2s: Incorrect SR and WSS computation MAINTAINERS: Update Intel ASoC drivers maintainers ASoC: ti: davinci-mcasp: Correct slot_width posed constraint ASoC: rockchip: Fix mono capture ASoC: Intel: Fix some acpi vs apci typo in somme comments ASoC: ti: davinci-mcasp: Fix clk PDIR handling for i2s master mode ASoC: Fail card instantiation if DAI format setup fails ASoC: SOF: Intel: hda: remove misleading error trace from IRQ thread ASoC: qcom: apq8016_sbc: Fix oops with multiple DAI links ASoC: dapm: fix a memory leak bug ...
2019-08-09Merge branch 'linus' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "Fix a number of bugs in the ccp driver" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: ccp - Ignore tag length when decrypting GCM ciphertext crypto: ccp - Add support for valid authsize values less than 16 crypto: ccp - Fix oops by properly managing allocated structures
2019-08-09Merge tag 'kvmarm-fixes-for-5.3-2' of ↵Paolo Bonzini1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm fixes for 5.3, take #2 - Fix our system register reset so that we stop writing non-sensical values to them, and track which registers get reset instead. - Sync VMCR back from the GIC on WFI so that KVM has an exact vue of PMR. - Reevaluate state of HW-mapped, level triggered interrupts on enable.
2019-08-09Merge tag 'kvmarm-fixes-for-5.3' of ↵Paolo Bonzini276-2346/+6668
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm fixes for 5.3 - A bunch of switch/case fall-through annotation, fixing one actual bug - Fix PMU reset bug - Add missing exception class debug strings
2019-08-09netfilter: nf_tables: use-after-free in failing rule with bound setPablo Neira Ayuso1-2/+7
If a rule that has already a bound anonymous set fails to be added, the preparation phase releases the rule and the bound set. However, the transaction object from the abort path still has a reference to the set object that is stale, leading to a use-after-free when checking for the set->bound field. Add a new field to the transaction that specifies if the set is bound, so the abort path can skip releasing it since the rule command owns it and it takes care of releasing it. After this update, the set->bound field is removed. [ 24.649883] Unable to handle kernel paging request at virtual address 0000000000040434 [ 24.657858] Mem abort info: [ 24.660686] ESR = 0x96000004 [ 24.663769] Exception class = DABT (current EL), IL = 32 bits [ 24.669725] SET = 0, FnV = 0 [ 24.672804] EA = 0, S1PTW = 0 [ 24.675975] Data abort info: [ 24.678880] ISV = 0, ISS = 0x00000004 [ 24.682743] CM = 0, WnR = 0 [ 24.685723] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000428952000 [ 24.692207] [0000000000040434] pgd=0000000000000000 [ 24.697119] Internal error: Oops: 96000004 [#1] SMP [...] [ 24.889414] Call trace: [ 24.891870] __nf_tables_abort+0x3f0/0x7a0 [ 24.895984] nf_tables_abort+0x20/0x40 [ 24.899750] nfnetlink_rcv_batch+0x17c/0x588 [ 24.904037] nfnetlink_rcv+0x13c/0x190 [ 24.907803] netlink_unicast+0x18c/0x208 [ 24.911742] netlink_sendmsg+0x1b0/0x350 [ 24.915682] sock_sendmsg+0x4c/0x68 [ 24.919185] ___sys_sendmsg+0x288/0x2c8 [ 24.923037] __sys_sendmsg+0x7c/0xd0 [ 24.926628] __arm64_sys_sendmsg+0x2c/0x38 [ 24.930744] el0_svc_common.constprop.0+0x94/0x158 [ 24.935556] el0_svc_handler+0x34/0x90 [ 24.939322] el0_svc+0x8/0xc [ 24.942216] Code: 37280300 f9404023 91014262 aa1703e0 (f9401863) [ 24.948336] ---[ end trace cebbb9dcbed3b56f ]--- Fixes: f6ac85858976 ("netfilter: nf_tables: unbind set in rule from commit path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-08net/tls: prevent skb_orphan() from leaking TLS plain text with offloadJakub Kicinski3-1/+20
sk_validate_xmit_skb() and drivers depend on the sk member of struct sk_buff to identify segments requiring encryption. Any operation which removes or does not preserve the original TLS socket such as skb_orphan() or skb_clone() will cause clear text leaks. Make the TCP socket underlying an offloaded TLS connection mark all skbs as decrypted, if TLS TX is in offload mode. Then in sk_validate_xmit_skb() catch skbs which have no socket (or a socket with no validation) and decrypted flag set. Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and sk->sk_validate_xmit_skb are slightly interchangeable right now, they all imply TLS offload. The new checks are guarded by CONFIG_TLS_DEVICE because that's the option guarding the sk_buff->decrypted member. Second, smaller issue with orphaning is that it breaks the guarantee that packets will be delivered to device queues in-order. All TLS offload drivers depend on that scheduling property. This means skb_orphan_partial()'s trick of preserving partial socket references will cause issues in the drivers. We need a full orphan, and as a result netem delay/throttling will cause all TLS offload skbs to be dropped. Reusing the sk_buff->decrypted flag also protects from leaking clear text when incoming, decrypted skb is redirected (e.g. by TC). See commit 0608c69c9a80 ("bpf: sk_msg, sock{map|hash} redirect through ULP") for justification why the internal flag is safe. The only location which could leak the flag in is tcp_bpf_sendmsg(), which is taken care of by clearing the previously unused bit. v2: - remove superfluous decrypted mark copy (Willem); - remove the stale doc entry (Boris); - rely entirely on EOR marking to prevent coalescing (Boris); - use an internal sendpages flag instead of marking the socket (Boris). v3 (Willem): - reorganize the can_skb_orphan_partial() condition; - fix the flag leak-in through tcp_bpf_sendmsg. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08inet: frags: re-introduce skb coalescing for local deliveryGuillaume Nault1-1/+1
Before commit d4289fcc9b16 ("net: IP6 defrag: use rbtrees for IPv6 defrag"), a netperf UDP_STREAM test[0] using big IPv6 datagrams (thus generating many fragments) and running over an IPsec tunnel, reported more than 6Gbps throughput. After that patch, the same test gets only 9Mbps when receiving on a be2net nic (driver can make a big difference here, for example, ixgbe doesn't seem to be affected). By reusing the IPv4 defragmentation code, IPv6 lost fragment coalescing (IPv4 fragment coalescing was dropped by commit 14fe22e33462 ("Revert "ipv4: use skb coalescing in defragmentation"")). Without fragment coalescing, be2net runs out of Rx ring entries and starts to drop frames (ethtool reports rx_drops_no_frags errors). Since the netperf traffic is only composed of UDP fragments, any lost packet prevents reassembly of the full datagram. Therefore, fragments which have no possibility to ever get reassembled pile up in the reassembly queue, until the memory accounting exeeds the threshold. At that point no fragment is accepted anymore, which effectively discards all netperf traffic. When reassembly timeout expires, some stale fragments are removed from the reassembly queue, so a few packets can be received, reassembled and delivered to the netperf receiver. But the nic still drops frames and soon the reassembly queue gets filled again with stale fragments. These long time frames where no datagram can be received explain why the performance drop is so significant. Re-introducing fragment coalescing is enough to get the initial performances again (6.6Gbps with be2net): driver doesn't drop frames anymore (no more rx_drops_no_frags errors) and the reassembly engine works at full speed. This patch is quite conservative and only coalesces skbs for local IPv4 and IPv6 delivery (in order to avoid changing skb geometry when forwarding). Coalescing could be extended in the future if need be, as more scenarios would probably benefit from it. [0]: Test configuration Sender: ip xfrm policy flush ip xfrm state flush ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1 ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir in tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1 ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir out tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow netserver -D -L fc00:2::1 Receiver: ip xfrm policy flush ip xfrm state flush ip xfrm state add src fc00:2::1 dst fc00:1::1 proto esp spi 0x1001 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:2::1 dst fc00:1::1 ip xfrm policy add src fc00:2::1 dst fc00:1::1 dir in tmpl src fc00:2::1 dst fc00:1::1 proto esp mode transport action allow ip xfrm state add src fc00:1::1 dst fc00:2::1 proto esp spi 0x1000 aead 'rfc4106(gcm(aes))' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b 96 mode transport sel src fc00:1::1 dst fc00:2::1 ip xfrm policy add src fc00:1::1 dst fc00:2::1 dir out tmpl src fc00:1::1 dst fc00:2::1 proto esp mode transport action allow netperf -H fc00:2::1 -f k -P 0 -L fc00:1::1 -l 60 -t UDP_STREAM -I 99,5 -i 5,5 -T5,5 -6 Signed-off-by: Guillaume Nault <gnault@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-08net/mlx5e: kTLS, Fix progress params context WQE layoutTariq Toukan1-3/+2
The TLS progress params context WQE should not include an Eth segment, drop it. In addition, align the tls_progress_params layout with the HW specification document: - fix the tisn field name. - remove the valid bit. Fixes: a12ff35e0fb7 ("net/mlx5: Introduce TLS TX offload hardware bits and structures") Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-08net/mlx5: kTLS, Fix wrong TIS opmod constantsTariq Toukan1-2/+2
Fix the used constants for TLS TIS opmods, per the HW specification. Fixes: a12ff35e0fb7 ("net/mlx5: Introduce TLS TX offload hardware bits and structures") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-08auxdisplay: charlcd: move charlcd.h to drivers/auxdisplayMasahiro Yamada1-39/+0
This header is included in drivers/auxdisplay/. Make it a local header. Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
2019-08-08Merge tag 'drm-fixes-5.3-2019-08-07' of ↵Dave Airlie1-19/+1
git://people.freedesktop.org/~agd5f/linux into drm-fixes drm-fixes-5.3-2019-08-07: amdgpu: - Fixes VCN to handle the latest navi10 firmware - Fixes for fan control on navi10 - Properly handle SMU metrics table on navi10 - Fix a resume regression on Stoney amdkfd: - Revert new GWS ioctl. It's not ready. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190807184221.3323-1-alexander.deucher@amd.com
2019-08-07Revert "drm/amdkfd: New IOCTL to allocate queue GWS"Alex Deucher1-19/+1
This reverts commit 1a058c3376765ee31d65e28cbbb9d4ff15120056. This interface is still in too much flux. Revert until it's sorted out. Acked-by: Oak Zeng <Oak.Zeng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds14-74/+79
Pull networking fixes from David Miller: "Yeah I should have sent a pull request last week, so there is a lot more here than usual: 1) Fix memory leak in ebtables compat code, from Wenwen Wang. 2) Several kTLS bug fixes from Jakub Kicinski (circular close on disconnect etc.) 3) Force slave speed check on link state recovery in bonding 802.3ad mode, from Thomas Falcon. 4) Clear RX descriptor bits before assigning buffers to them in stmmac, from Jose Abreu. 5) Several missing of_node_put() calls, mostly wrt. for_each_*() OF loops, from Nishka Dasgupta. 6) Double kfree_skb() in peak_usb can driver, from Stephane Grosjean. 7) Need to hold sock across skb->destructor invocation, from Cong Wang. 8) IP header length needs to be validated in ipip tunnel xmit, from Haishuang Yan. 9) Use after free in ip6 tunnel driver, also from Haishuang Yan. 10) Do not use MSI interrupts on r8169 chips before RTL8168d, from Heiner Kallweit. 11) Upon bridge device init failure, we need to delete the local fdb. From Nikolay Aleksandrov. 12) Handle erros from of_get_mac_address() properly in stmmac, from Martin Blumenstingl. 13) Handle concurrent rename vs. dump in netfilter ipset, from Jozsef Kadlecsik. 14) Setting NETIF_F_LLTX on mac80211 causes complete breakage with some devices, so revert. From Johannes Berg. 15) Fix deadlock in rxrpc, from David Howells. 16) Fix Kconfig deps of enetc driver, we must have PHYLIB. From Yue Haibing. 17) Fix mvpp2 crash on module removal, from Matteo Croce. 18) Fix race in genphy_update_link, from Heiner Kallweit. 19) bpf_xdp_adjust_head() stopped working with generic XDP when we fixes generic XDP to support stacked devices properly, fix from Jesper Dangaard Brouer. 20) Unbalanced RCU locking in rt6_update_exception_stamp_rt(), from David Ahern. 21) Several memory leaks in new sja1105 driver, from Vladimir Oltean" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (214 commits) net: dsa: sja1105: Fix memory leak on meta state machine error path net: dsa: sja1105: Fix memory leak on meta state machine normal path net: dsa: sja1105: Really fix panic on unregistering PTP clock net: dsa: sja1105: Use the LOCKEDS bit for SJA1105 E/T as well net: dsa: sja1105: Fix broken learning with vlan_filtering disabled net: dsa: qca8k: Add of_node_put() in qca8k_setup_mdio_bus() net: sched: sample: allow accessing psample_group with rtnl net: sched: police: allow accessing police->params with rtnl net: hisilicon: Fix dma_map_single failed on arm64 net: hisilicon: fix hip04-xmit never return TX_BUSY net: hisilicon: make hip04_tx_reclaim non-reentrant tc-testing: updated vlan action tests with batch create/delete net sched: update vlan action for batched events operations net: stmmac: tc: Do not return a fragment entry net: stmmac: Fix issues when number of Queues >= 4 net: stmmac: xgmac: Fix XGMAC selftests be2net: disable bh with spin_lock in be_process_mcc net: cxgb3_main: Fix a resource leak in a error path in 'init_one()' net: ethernet: sun4i-emac: Support phy-handle property for finding PHYs net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER ...
2019-08-06net: sched: sample: allow accessing psample_group with rtnlVlad Buslov1-1/+1
Recently implemented support for sample action in flow_offload infra leads to following rcu usage warning: [ 1938.234856] ============================= [ 1938.234858] WARNING: suspicious RCU usage [ 1938.234863] 5.3.0-rc1+ #574 Not tainted [ 1938.234866] ----------------------------- [ 1938.234869] include/net/tc_act/tc_sample.h:47 suspicious rcu_dereference_check() usage! [ 1938.234872] other info that might help us debug this: [ 1938.234875] rcu_scheduler_active = 2, debug_locks = 1 [ 1938.234879] 1 lock held by tc/19540: [ 1938.234881] #0: 00000000b03cb918 (rtnl_mutex){+.+.}, at: tc_new_tfilter+0x47c/0x970 [ 1938.234900] stack backtrace: [ 1938.234905] CPU: 2 PID: 19540 Comm: tc Not tainted 5.3.0-rc1+ #574 [ 1938.234908] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 1938.234911] Call Trace: [ 1938.234922] dump_stack+0x85/0xc0 [ 1938.234930] tc_setup_flow_action+0xed5/0x2040 [ 1938.234944] fl_hw_replace_filter+0x11f/0x2e0 [cls_flower] [ 1938.234965] fl_change+0xd24/0x1b30 [cls_flower] [ 1938.234990] tc_new_tfilter+0x3e0/0x970 [ 1938.235021] ? tc_del_tfilter+0x720/0x720 [ 1938.235028] rtnetlink_rcv_msg+0x389/0x4b0 [ 1938.235038] ? netlink_deliver_tap+0x95/0x400 [ 1938.235044] ? rtnl_dellink+0x2d0/0x2d0 [ 1938.235053] netlink_rcv_skb+0x49/0x110 [ 1938.235063] netlink_unicast+0x171/0x200 [ 1938.235073] netlink_sendmsg+0x224/0x3f0 [ 1938.235091] sock_sendmsg+0x5e/0x60 [ 1938.235097] ___sys_sendmsg+0x2ae/0x330 [ 1938.235111] ? __handle_mm_fault+0x12cd/0x19e0 [ 1938.235125] ? __handle_mm_fault+0x12cd/0x19e0 [ 1938.235138] ? find_held_lock+0x2b/0x80 [ 1938.235147] ? do_user_addr_fault+0x22d/0x490 [ 1938.235160] __sys_sendmsg+0x59/0xa0 [ 1938.235178] do_syscall_64+0x5c/0xb0 [ 1938.235187] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 1938.235192] RIP: 0033:0x7ff9a4d597b8 [ 1938.235197] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54 [ 1938.235200] RSP: 002b:00007ffcfe381c48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 1938.235205] RAX: ffffffffffffffda RBX: 000000005d4497f9 RCX: 00007ff9a4d597b8 [ 1938.235208] RDX: 0000000000000000 RSI: 00007ffcfe381cb0 RDI: 0000000000000003 [ 1938.235211] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006 [ 1938.235214] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000001 [ 1938.235217] R13: 0000000000480640 R14: 0000000000000012 R15: 0000000000000001 Change tcf_sample_psample_group() helper to allow using it from both rtnl and rcu protected contexts. Fixes: a7a7be6087b0 ("net/sched: add sample action to the hardware intermediate representation") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06net: sched: police: allow accessing police->params with rtnlVlad Buslov1-2/+2
Recently implemented support for police action in flow_offload infra leads to following rcu usage warning: [ 1925.881092] ============================= [ 1925.881094] WARNING: suspicious RCU usage [ 1925.881098] 5.3.0-rc1+ #574 Not tainted [ 1925.881100] ----------------------------- [ 1925.881104] include/net/tc_act/tc_police.h:57 suspicious rcu_dereference_check() usage! [ 1925.881106] other info that might help us debug this: [ 1925.881109] rcu_scheduler_active = 2, debug_locks = 1 [ 1925.881112] 1 lock held by tc/18591: [ 1925.881115] #0: 00000000b03cb918 (rtnl_mutex){+.+.}, at: tc_new_tfilter+0x47c/0x970 [ 1925.881124] stack backtrace: [ 1925.881127] CPU: 2 PID: 18591 Comm: tc Not tainted 5.3.0-rc1+ #574 [ 1925.881130] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 1925.881132] Call Trace: [ 1925.881138] dump_stack+0x85/0xc0 [ 1925.881145] tc_setup_flow_action+0x1771/0x2040 [ 1925.881155] fl_hw_replace_filter+0x11f/0x2e0 [cls_flower] [ 1925.881175] fl_change+0xd24/0x1b30 [cls_flower] [ 1925.881200] tc_new_tfilter+0x3e0/0x970 [ 1925.881231] ? tc_del_tfilter+0x720/0x720 [ 1925.881243] rtnetlink_rcv_msg+0x389/0x4b0 [ 1925.881250] ? netlink_deliver_tap+0x95/0x400 [ 1925.881257] ? rtnl_dellink+0x2d0/0x2d0 [ 1925.881264] netlink_rcv_skb+0x49/0x110 [ 1925.881275] netlink_unicast+0x171/0x200 [ 1925.881284] netlink_sendmsg+0x224/0x3f0 [ 1925.881299] sock_sendmsg+0x5e/0x60 [ 1925.881305] ___sys_sendmsg+0x2ae/0x330 [ 1925.881309] ? task_work_add+0x43/0x50 [ 1925.881314] ? fput_many+0x45/0x80 [ 1925.881329] ? __lock_acquire+0x248/0x1930 [ 1925.881342] ? find_held_lock+0x2b/0x80 [ 1925.881347] ? task_work_run+0x7b/0xd0 [ 1925.881359] __sys_sendmsg+0x59/0xa0 [ 1925.881375] do_syscall_64+0x5c/0xb0 [ 1925.881381] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 1925.881384] RIP: 0033:0x7feb245047b8 [ 1925.881388] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54 [ 1925.881391] RSP: 002b:00007ffc2d2a5788 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 1925.881395] RAX: ffffffffffffffda RBX: 000000005d4497ed RCX: 00007feb245047b8 [ 1925.881398] RDX: 0000000000000000 RSI: 00007ffc2d2a57f0 RDI: 0000000000000003 [ 1925.881400] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006 [ 1925.881403] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000001 [ 1925.881406] R13: 0000000000480640 R14: 0000000000000012 R15: 0000000000000001 Change tcf_police_rate_bytes_ps() and tcf_police_tcfp_burst() helpers to allow using them from both rtnl and rcu protected contexts. Fixes: 8c8cfc6ed274 ("net/sched: add police action to the hardware intermediate representation") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06Merge tag 'asoc-fix-v5.3-rc3' of ↵Takashi Iwai3-13/+21
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v5.3 A relatively large batch of mostly unremarkable fixes here, a couple of small core fixes for fairly obscure issues, more comment/email updates with no code impact than usual and a bunch of small driver fixes. The support for new sample rates in the max98373 driver is a fix for the fact that the driver declared support for those rates but would in fact return an error if these rates were selected.