summaryrefslogtreecommitdiffstats
path: root/drivers
AgeCommit message (Collapse)AuthorFilesLines
2020-02-13bcache: ignore pending signals when creating gc and allocator threadColy Li2-2/+29
When run a cache set, all the bcache btree node of this cache set will be checked by bch_btree_check(). If the bcache btree is very large, iterating all the btree nodes will occupy too much system memory and the bcache registering process might be selected and killed by system OOM killer. kthread_run() will fail if current process has pending signal, therefore the kthread creating in run_cache_set() for gc and allocator kernel threads are very probably failed for a very large bcache btree. Indeed such OOM is safe and the registering process will exit after the registration done. Therefore this patch flushes pending signals during the cache set start up, specificly in bch_cache_allocator_start() and bch_gc_thread_start(), to make sure run_cache_set() won't fail for large cahced data set. Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-02-13RDMA/rxe: Fix soft lockup problem due to using tasklets in softirqZhu Yanjun1-4/+4
When run stress tests with RXE, the following Call Traces often occur watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0] ... Call Trace: <IRQ> create_object+0x3f/0x3b0 kmem_cache_alloc_node_trace+0x129/0x2d0 __kmalloc_reserve.isra.52+0x2e/0x80 __alloc_skb+0x83/0x270 rxe_init_packet+0x99/0x150 [rdma_rxe] rxe_requester+0x34e/0x11a0 [rdma_rxe] rxe_do_task+0x85/0xf0 [rdma_rxe] tasklet_action_common.isra.21+0xeb/0x100 __do_softirq+0xd0/0x298 irq_exit+0xc5/0xd0 smp_apic_timer_interrupt+0x68/0x120 apic_timer_interrupt+0xf/0x20 </IRQ> ... The root cause is that tasklet is actually a softirq. In a tasklet handler, another softirq handler is triggered. Usually these softirq handlers run on the same cpu core. So this will cause "soft lockup Bug". Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20200212072635.682689-8-leon@kernel.org Signed-off-by: Zhu Yanjun <yanjunz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13RDMA/mlx5: Prevent overflow in mmap offset calculationsLeon Romanovsky1-2/+2
The cmd and index variables declared as u16 and the result is supposed to be stored in u64. The C arithmetic rules doesn't promote "(index >> 8) << 16" to be u64 and leaves the end result to be u16. Fixes: 7be76bef320b ("IB/mlx5: Introduce VAR object and its alloc/destroy methods") Link: https://lore.kernel.org/r/20200212072635.682689-10-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13IB/umad: Fix kernel crash while unloading ib_umadYonatan Cohen1-2/+3
When disassociating a device from umad we must ensure that the sysfs access is prevented before blocking the fops, otherwise assumptions in syfs don't hold: CPU0 CPU1 ib_umad_kill_port() ibdev_show() port->ib_dev = NULL dev_name(port->ib_dev) The prior patch made an error in moving the device_destroy(), it should have been split into device_del() (above) and put_device() (below). At this point we already have the split, so move the device_del() back to its original place. kernel stack PF: error_code(0x0000) - not-present page Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI RIP: 0010:ibdev_show+0x18/0x50 [ib_umad] RSP: 0018:ffffc9000097fe40 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffffa0441120 RCX: ffff8881df514000 RDX: ffff8881df514000 RSI: ffffffffa0441120 RDI: ffff8881df1e8870 RBP: ffffffff81caf000 R08: ffff8881df1e8870 R09: 0000000000000000 R10: 0000000000001000 R11: 0000000000000003 R12: ffff88822f550b40 R13: 0000000000000001 R14: ffffc9000097ff08 R15: ffff8882238bad58 FS: 00007f1437ff3740(0000) GS:ffff888236940000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000004e8 CR3: 00000001e0dfc001 CR4: 00000000001606e0 Call Trace: dev_attr_show+0x15/0x50 sysfs_kf_seq_show+0xb8/0x1a0 seq_read+0x12d/0x350 vfs_read+0x89/0x140 ksys_read+0x55/0xd0 do_syscall_64+0x55/0x1b0 entry_SYSCALL_64_after_hwframe+0x44/0xa9: Fixes: cf7ad3030271 ("IB/umad: Avoid destroying device while it is accessed") Link: https://lore.kernel.org/r/20200212072635.682689-9-leon@kernel.org Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13RDMA/mlx5: Fix async events cleanup flowsYishai Hadas1-23/+28
As in the prior patch, the devx code is not fully cleaning up its event_lists before finishing driver_destroy allowing a later read to trigger user after free conditions. Re-arrange things so that the event_list is always empty after destroy and ensure it remains empty until the file is closed. Fixes: f7c8416ccea5 ("RDMA/core: Simplify destruction of FD uobjects") Link: https://lore.kernel.org/r/20200212072635.682689-7-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13RDMA/core: Add missing list deletion on freeing event queueMichael Guralnik1-0/+1
When the uobject file scheme was revised to allow device disassociation from the file it became possible for read() to still happen the driver destroys the uobject. The old clode code was not tolerant to concurrent read, and when it was moved to the driver destroy it creates a bug. Ensure the event_list is empty after driver destroy by adding the missing list_del(). Otherwise read() can trigger a use after free and double kfree. Fixes: f7c8416ccea5 ("RDMA/core: Simplify destruction of FD uobjects") Link: https://lore.kernel.org/r/20200212072635.682689-6-leon@kernel.org Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-02-13EDAC/sysfs: Remove csrow objects on errorsRobert Richter1-2/+1
All created csrow objects must be removed in the error path of edac_create_csrow_objects(). The objects have been added as devices. They need to be removed by doing a device_del() *and* put_device() call to also free their memory. The missing put_device() leaves a memory leak. Use device_unregister() instead of device_del() which properly unregisters the device doing both. Fixes: 7adc05d2dc3a ("EDAC/sysfs: Drop device references properly") Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: John Garry <john.garry@huawei.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200212120340.4764-4-rrichter@marvell.com
2020-02-13EDAC/mc: Fix use-after-free and memleaks during device removalRobert Richter2-21/+6
A test kernel with the options DEBUG_TEST_DRIVER_REMOVE, KASAN and DEBUG_KMEMLEAK set, revealed several issues when removing an mci device: 1) Use-after-free: On 27.11.19 17:07:33, John Garry wrote: > [ 22.104498] BUG: KASAN: use-after-free in > edac_remove_sysfs_mci_device+0x148/0x180 The use-after-free is caused by the mci_for_each_dimm() macro called in edac_remove_sysfs_mci_device(). The iterator was introduced with c498afaf7df8 ("EDAC: Introduce an mci_for_each_dimm() iterator"). The iterator loop calls device_unregister(&dimm->dev), which removes the sysfs entry of the device, but also frees the dimm struct in dimm_attr_release(). When incrementing the loop in mci_for_each_dimm(), the dimm struct is accessed again, after having been freed already. The fix is to free all the mci device's subsequent dimm and csrow objects at a later point, in _edac_mc_free(), when the mci device itself is being freed. This keeps the data structures intact and the mci device can be fully used until its removal. The change allows the safe usage of mci_for_each_dimm() to release dimm devices from sysfs. 2) Memory leaks: Following memory leaks have been detected: # grep edac /sys/kernel/debug/kmemleak | sort | uniq -c 1 [<000000003c0f58f9>] edac_mc_alloc+0x3bc/0x9d0 # mci->csrows 16 [<00000000bb932dc0>] edac_mc_alloc+0x49c/0x9d0 # csr->channels 16 [<00000000e2734dba>] edac_mc_alloc+0x518/0x9d0 # csr->channels[chn] 1 [<00000000eb040168>] edac_mc_alloc+0x5c8/0x9d0 # mci->dimms 34 [<00000000ef737c29>] ghes_edac_register+0x1c8/0x3f8 # see edac_mc_alloc() All leaks are from memory allocated by edac_mc_alloc(). Note: The test above shows that edac_mc_alloc() was called here from ghes_edac_register(), thus both functions show up in the stack trace but the module causing the leaks is edac_mc. The comments with the data structures involved were made manually by analyzing the objdump. The data structures listed above and created by edac_mc_alloc() are not properly removed during device removal, which is done in edac_mc_free(). There are two paths implemented to remove the device depending on device registration, _edac_mc_free() is called if the device is not registered and edac_unregister_sysfs() otherwise. The implemenations differ. For the sysfs case, the mci device removal lacks the removal of subsequent data structures (csrows, channels, dimms). This causes the memory leaks (see mci_attr_release()). [ bp: Massage commit message. ] Fixes: c498afaf7df8 ("EDAC: Introduce an mci_for_each_dimm() iterator") Fixes: faa2ad09c01c ("edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.") Fixes: 7a623c039075 ("edac: rewrite the sysfs code to use struct device") Reported-by: John Garry <john.garry@huawei.com> Signed-off-by: Robert Richter <rrichter@marvell.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: John Garry <john.garry@huawei.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200212120340.4764-3-rrichter@marvell.com
2020-02-12hwmon: (pmbus/xdpe12284) fix typo in compatible stringsJohan Hovold1-2/+2
Make sure that the driver compatible strings matches the binding by removing the space between the manufacturer and model. Fixes: aaafb7c8eb1c ("hwmon: (pmbus) Add support for Infineon Multi-phase xdpe122 family controllers") Cc: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20200212092426.24012-1-johan@kernel.org Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2020-02-12ice: Trivial fixesTony Nguyen11-29/+31
This is a collection of trivial fixes including fixing whitespace, typos, function headers, reverse Christmas tree, etc. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-12ice: Use correct netif error functionBen Shelton1-1/+1
Use the correct netif_msg_[tx,rx]_error() function to determine whether to print the MDD event type. Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-12ice: Cleanup ice_vsi_alloc_q_vectorsAnirudh Venkataramanan1-8/+3
1. Remove local variable num_q_vectors and use vsi->num_q_vectors instead 2. Remove local variable pf and pass vsi->back to ice_pf_to_dev Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-12ice: Make print statements more compactAnirudh Venkataramanan8-227/+123
Formatting strings in print function calls (like dev_info, dev_err, etc.) can exceed 80 columns without making checkpatch unhappy. So remove newlines where applicable and make print statements more compact. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-12ice: Use ice_pf_to_devAnirudh Venkataramanan6-27/+27
Use ice_pf_to_dev(pf) instead of &pf->pdev->dev Use ice_pf_to_dev(vsi->back) instead of &vsi->back->pdev->dev When a pointer to the pf instance is available, use ice_pf_to_dev instead of ice_hw_to_dev Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-12ice: Remove possible null dereferenceTony Nguyen1-2/+0
Commit 1f45ebe0d8fb ("ice: add extra check for null Rx descriptor") moved the call to ice_construct_skb() under a null check as Coverity reported a possible use of null skb. However, the original call was not deleted, do so now. Fixes: 1f45ebe0d8fb ("ice: add extra check for null Rx descriptor") Reported-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-12ice: update Unit Load Status bitmask to check after resetBruce Allan2-5/+18
After a reset the Unit Load Status bits in the GLNVM_ULD register to check for completion should be 0x7FF before continuing. Update the mask to check (minus the three reserved bits that are always set). Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-12ice: fix and consolidate logging of NVM/firmware version informationBruce Allan4-28/+13
Logging the firmware/NVM information during driver load is redundant since that information is also available via ethtool. Move the functionality found in ice_nvm_version_str() directly into ice_get_drvinfo() and remove calling the former and logging that info during driver probe. This also gets rid of a bug in ice_nvm_version_str() where it returns a pointer to a buffer which is free'ed when that function exits. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-12ice: Modify link message loggingAkeem G Abodunrin1-1/+1
This patch modifies link message logging to include "Full Duplex" and "Negotiated" for FEC, so as to distinguish it from "Requested" FEC. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-12ice: Remove CONFIG_PCI_IOV wrap in ice_set_pf_capsAnirudh Venkataramanan1-2/+0
Remove unnecessary CONFIG_PCI_IOV wrapping in ice_set_pf_caps. None of the data structures accessed within the block are wrapped with this flag. When CONFIG_PCI_IOV is undefined, pf->num_vfs_supported will be 0 anyway. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-12ice: Remove ice_dev_onetime_setup()Brett Creeley4-20/+0
ice_dev_onetime_setup contains driver workarounds needed for firmware limitations. These issues have now been resolved in newer NVMs so remove the function. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-12ice: Don't allow same value for Rx tail to be written twiceBrett Creeley1-1/+1
Currently we compare the value we are about to write to the Rx tail register with the previous value of next_to_use. The problem with this is we only write tail on 8 descriptor boundaries, but next_to_use is updated whenever we clean Rx descriptors. Fix this by comparing the value we are about to write to tail with the previously written tail value. This will prevent duplicate Rx tail bumps. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-12ice: display supported and advertised link modesPaul Greenwalt1-280/+2
Display all of the supported and advertised link modes based on the PHY capability with media. Displaying all supported modes is more informative then only displaying the current link mode. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-12ice: Fix switch between FW and SW LLDPDave Ertman1-5/+5
When switching between FW and SW LLDP mode, the number of configured TLV apps in the driver's DCB configuration is getting out of synch with what lldpad thinks is configured. This is causing a problem when shutting down lldpad. The cleanup is trying to delete TLV apps that are not defined in the kernel. Since the driver is keeping an accurate account of the apps defined, use the drivers number of apps to determine if there is an app to delete. If the number of apps is <= 1, then do not attempt to delete. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-12ice: Fix DCB rebuild after resetDave Ertman2-53/+36
The function ice_dcb_rebuild had some logic flaws in it, and also didn't differentiate between FW and SW modes needs. For FW flow, the willing setting was being forced to OFF and left that way. Unwilling in DCB FW mode is not a supported model. Leave the config alone and use the return value from the set command to determine if setting the config was successful. The SW DCB flow does not need to need to register for MIB change events (as they are not used in SW mode). Use !is_sw_lldp checks to only perform FW specific task while in FW mode. Also adding a reapplication of the current DCB config after a link event. Some NVMs are not maintaining their DCB configs across link events. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-12net: ethernet: ave: Add capability of rgmii-id modeKunihiko Hayashi1-0/+9
This allows you to specify the type of rgmii-id that will enable phy internal delay in ethernet phy-mode. This adds all RGMII cases to all of get_pinmode() except LD11, because LD11 SoC doesn't support RGMII due to the constraint of the hardware. When RGMII phy mode is specified in the devicetree for LD11, the driver will abort with an error. Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-12enic: prevent waking up stopped tx queues over watchdog resetFiro Yang1-1/+1
Recent months, our customer reported several kernel crashes all preceding with following message: NETDEV WATCHDOG: eth2 (enic): transmit queue 0 timed out Error message of one of those crashes: BUG: unable to handle kernel paging request at ffffffffa007e090 After analyzing severl vmcores, I found that most of crashes are caused by memory corruption. And all the corrupted memory areas are overwritten by data of network packets. Moreover, I also found that the tx queues were enabled over watchdog reset. After going through the source code, I found that in enic_stop(), the tx queues stopped by netif_tx_disable() could be woken up over a small time window between netif_tx_disable() and the napi_disable() by the following code path: napi_poll-> enic_poll_msix_wq-> vnic_cq_service-> enic_wq_service-> netif_wake_subqueue(enic->netdev, q_number)-> test_and_clear_bit(__QUEUE_STATE_DRV_XOFF, &txq->state) In turn, upper netowrk stack could queue skb to ENIC NIC though enic_hard_start_xmit(). And this might introduce some race condition. Our customer comfirmed that this kind of kernel crash doesn't occur over 90 days since they applied this patch. Signed-off-by: Firo Yang <firo.yang@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-12drm/i915: Mark the removal of the i915_request from the sched.linkChris Wilson1-0/+2
Keep the rq->fence.flags consistent with the status of the rq->sched.link, and clear the associated bits when decoupling the link on retirement (as we may wish to inspect those flags independent of other state). Fixes: c3f1ed90e6ff ("drm/i915/gt: Allow temporary suspension of inflight requests") References: https://gitlab.freedesktop.org/drm/intel/issues/997 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-3-chris@chris-wilson.co.uk (cherry picked from commit b4a9a149f91ea345da76bcfe3f8a39715ac346a6) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-12drm/i915/execlists: Reclaim the hanging virtual requestChris Wilson2-0/+185
If we encounter a hang on a virtual engine, as we process the hang the request may already have been moved back to the virtual engine (we are processing the hang on the physical engine). We need to reclaim the request from the virtual engine so that the locking is consistent and local to the real engine on which we will hold the request for error state capturing. v2: Pull the reclamation into execlists_hold() and assert that cannot be called from outside of the reset (i.e. with the tasklet disabled). v3: Added selftest v4: Drop the reference owned by the virtual engine Fixes: ad18ba7b5eeb ("drm/i915/execlists: Offline error capture") Testcase: igt/gem_exec_balancer/hang Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-2-chris@chris-wilson.co.uk (cherry picked from commit 989df3a7bd2abe566521e61d1aebf603eb013b7f) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-12drm/i915/execlists: Take a reference while capturing the guilty requestChris Wilson2-10/+32
Thanks to preempt-to-busy, we leave the request on the HW as we submit the preemption request. This means that the request may complete at any moment as we process HW events, and in particular the request may be retired as we are planning to capture it for a preemption timeout. Be more careful while obtaining the request to capture after a preemption timeout, and check to see if it completed before we were able to put it on the on-hold list. If we do see it did complete just before we capture the request, proclaim the preemption-timeout a false positive and pardon the reset as we should hit an arbitration point momentarily and so be able to process the preemption. Note that even after we move the request to be on hold it may be retired (as the reset to stop the HW comes after), so we do require to hold our own reference as we work on the request for capture (and all of the peeking at state within the request needs to be carefully protected). Fixes: c3f1ed90e6ff ("drm/i915/gt: Allow temporary suspension of inflight requests") Closes: https://gitlab.freedesktop.org/drm/intel/issues/997 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200122140243.495621-1-chris@chris-wilson.co.uk (cherry picked from commit 4ba5c086a1d8e38d6927967ae1a3271a6ab7a927) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-12drm/i915/execlists: Offline error captureChris Wilson1-2/+120
Currently, we skip error capture upon forced preemption. We apply forced preemption when there is a higher priority request that should be running but is being blocked, and we skip inline error capture so that the preemption request is not further delayed by a user controlled capture -- extending the denial of service. However, preemption reset is also used for heartbeats and regular GPU hangs. By skipping the error capture, we remove the ability to debug GPU hangs. In order to capture the error without delaying the preemption request further, we can do an out-of-line capture by removing the guilty request from the execution queue and scheduling a worker to dump that request. When removing a request, we need to remove the entire context and all descendants from the execution queue, so that they do not jump past. Closes: https://gitlab.freedesktop.org/drm/intel/issues/738 Fixes: 3a7a92aba8fb ("drm/i915/execlists: Force preemption") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-3-chris@chris-wilson.co.uk (cherry picked from commit 748317386afb235e11616098d2c7772e49776b58) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-12drm/i915/gt: Allow temporary suspension of inflight requestsChris Wilson5-6/+321
In order to support out-of-line error capture, we need to remove the active request from HW and put it to one side while a worker compresses and stores all the details associated with that request. (As that compression may take an arbitrary user-controlled amount of time, we want to let the engine continue running on other workloads while the hanging request is dumped.) Not only do we need to remove the active request, but we also have to remove its context and all requests that were dependent on it (both in flight, queued and future submission). Finally once the capture is complete, we need to be able to resubmit the request and its dependents and allow them to execute. v2: Replace stack recursion with a simple list. v3: Check all the parents, not just the first, when searching for a stuck ancestor! References: https://gitlab.freedesktop.org/drm/intel/issues/738 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-2-chris@chris-wilson.co.uk (cherry picked from commit 32ff621fd74496f0c33644125fb69ff175859b1f) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-12drm/i915: Keep track of request among the scheduling listsChris Wilson4-18/+38
If we keep track of when the i915_request.sched.link is on the HW runlist, or in the priority queue we can simplify our interactions with the request (such as during rescheduling). This also simplifies the next patch where we introduce a new in-between list, for requests that are ready but neither on the run list or in the queue. v2: Update i915_sched_node.link explanation for current usage where it is a link on both the queue and on the runlists. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200116184754.2860848-1-chris@chris-wilson.co.uk (cherry picked from commit 672c368f9398042b629740cc9816e8e939eff2db) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-12Merge tag 'gvt-fixes-2020-02-12' of https://github.com/intel/gvt-linux into ↵Jani Nikula2-2/+6
drm-intel-next-fixes gvt-fixes-2020-02-12 - fix possible high-order allocation fail for late load (Igor) - fix one missed lock for ppgtt mm LRU list (Igor) Signed-off-by: Jani Nikula <jani.nikula@intel.com> From: Zhenyu Wang <zhenyuw@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200212065912.GB4997@zhen-hp.sh.intel.com
2020-02-12drm/i915/gem: Tighten checks and acquiring the mmap objectChris Wilson2-30/+21
Make sure we hold the rcu lock as we acquire the rcu protected reference of the object when looking it up from the associated mmap vma. Closes: https://gitlab.freedesktop.org/drm/intel/issues/1083 Fixes: cc662126b413 ("drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200130143931.1906301-1-chris@chris-wilson.co.uk (cherry picked from commit 280d14a69da2e71f43408537c008f2775d5e5360) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-12drm/i915: Fix preallocated barrier list appendJosé Roberto de Souza1-9/+10
Only the first and the last nodes were being added to ref->preallocated_barriers. Renaming variables to make it more easy to read. Fixes: 841350223816 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200129232345.84512-1-jose.souza@intel.com (cherry picked from commit d4c3c0b8221a72107eaf35c80c40716b81ca463e) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-12drm/i915/gt: Acquire ce->active before ce->pin_count/ce->pin_mutexChris Wilson2-21/+31
Similar to commit ac0e331a628b ("drm/i915: Tighten atomicity of i915_active_acquire vs i915_active_release") we have the same race of trying to pin the context underneath a mutex while allowing the decrement to be atomic outside of that mutex. This leads to the problem where two threads may simultaneously try to pin the context and the second not notice that they needed to repin the context. <2> [198.669621] kernel BUG at drivers/gpu/drm/i915/gt/intel_timeline.c:387! <4> [198.669703] invalid opcode: 0000 [#1] PREEMPT SMP PTI <4> [198.669712] CPU: 0 PID: 1246 Comm: gem_exec_create Tainted: G U W 5.5.0-rc6-CI-CI_DRM_7755+ #1 <4> [198.669723] Hardware name: /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017 <4> [198.669776] RIP: 0010:timeline_advance+0x7b/0xe0 [i915] <4> [198.669785] Code: 00 48 c7 c2 10 f1 46 a0 48 c7 c7 70 1b 32 a0 e8 bb dd e7 e0 bf 01 00 00 00 e8 d1 af e7 e0 31 f6 bf 09 00 00 00 e8 35 ef d8 e0 <0f> 0b 48 c7 c1 48 fa 49 a0 ba 84 01 00 00 48 c7 c6 10 f1 46 a0 48 <4> [198.669803] RSP: 0018:ffffc900004c3a38 EFLAGS: 00010296 <4> [198.669810] RAX: ffff888270b35140 RBX: ffff88826f32ee00 RCX: 0000000000000006 <4> [198.669818] RDX: 00000000000017c5 RSI: 0000000000000000 RDI: 0000000000000009 <4> [198.669826] RBP: ffffc900004c3a64 R08: 0000000000000000 R09: 0000000000000000 <4> [198.669834] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88826f9b5980 <4> [198.669841] R13: 0000000000000cc0 R14: ffffc900004c3dc0 R15: ffff888253610068 <4> [198.669849] FS: 00007f63e663fe40(0000) GS:ffff888276c00000(0000) knlGS:0000000000000000 <4> [198.669857] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [198.669864] CR2: 00007f171f8e39a8 CR3: 000000026b1f6005 CR4: 00000000003606f0 <4> [198.669872] Call Trace: <4> [198.669924] intel_timeline_get_seqno+0x12/0x40 [i915] <4> [198.669977] __i915_request_create+0x76/0x5a0 [i915] <4> [198.670024] i915_request_create+0x86/0x1c0 [i915] <4> [198.670068] i915_gem_do_execbuffer+0xbf2/0x2500 [i915] <4> [198.670082] ? __lock_acquire+0x460/0x15d0 <4> [198.670128] i915_gem_execbuffer2_ioctl+0x11f/0x470 [i915] <4> [198.670171] ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915] <4> [198.670181] drm_ioctl_kernel+0xa7/0xf0 <4> [198.670188] drm_ioctl+0x2e1/0x390 <4> [198.670233] ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915] Fixes: 841350223816 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin") References: ac0e331a628b ("drm/i915: Tighten atomicity of i915_active_acquire vs i915_active_release") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200127152829.2842149-1-chris@chris-wilson.co.uk (cherry picked from commit e5429340bfa2dc43a07c3329e0c30cdae4cc0b35) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-12drm/i915: Tighten atomicity of i915_active_acquire vs i915_active_releaseChris Wilson1-7/+9
As we use a mutex to serialise the first acquire (as it may be a lengthy operation), but only an atomic decrement for the release, we have to be careful in case a second thread races and completes both acquire/release as the first finishes its acquire. Thread A Thread B i915_active_acquire i915_active_acquire atomic_read() == 0 atomic_read() == 0 mutex_lock() mutex_lock() atomic_read() == 0 ref->active(); atomic_inc() mutex_unlock() atomic_read() == 1 i915_active_release atomic_dec_and_test() -> 0 ref->retire() atomic_inc() -> 1 mutex_unlock() So thread A has acquired the ref->active_count but since the ref was still active at the time, it did not initialise it. By switching the check inside the mutex to an atomic increment only if already active, we close the race. Fixes: c9ad602feabe ("drm/i915: Split i915_active.mutex into an irq-safe spinlock for the rbtree") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200126102346.1877661-3-chris@chris-wilson.co.uk (cherry picked from commit ac0e331a628b5ded087eab09fad2ffb082ac61ba) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-12drm/i915: Stub out i915_gpu_coredump_putChris Wilson1-0/+4
i915_gpu_coreddump_put is currently only defined if CONFIG_DRM_I915_CAPTURE_ERROR is enabled, provide a stub otherwise. Reported-by: Mike Lothian <mike@fireburn.co.uk> Fixes: 742379c0c400 ("drm/i915: Start chopping up the GPU error capture") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mike Lothian <mike@fireburn.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200124192255.541355-1-chris@chris-wilson.co.uk (cherry picked from commit 7e36505d0cf82f2920f2fd22ebb14a8b540396a3) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-02-11net: ena: ena-com.c: prevent NULL pointer dereferenceArthur Kiyanovski1-0/+5
comp_ctx can be NULL in a very rare case when an admin command is executed during the execution of ena_remove(). The bug scenario is as follows: * ena_destroy_device() sets the comp_ctx to be NULL * An admin command is executed before executing unregister_netdev(), this can still happen because our device can still receive callbacks from the netdev infrastructure such as ethtool commands. * When attempting to access the comp_ctx, the bug occurs since it's set to NULL Fix: Added a check that comp_ctx is not NULL Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-11net: ena: ethtool: use correct value for crc32 hashSameeh Jubran1-2/+2
Up till kernel 4.11 there was no enum defined for crc32 hash in ethtool, thus the xor enum was used for supporting crc32. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-11net: ena: make ena rxfh support ETH_RSS_HASH_NO_CHANGEArthur Kiyanovski3-0/+16
As the name suggests ETH_RSS_HASH_NO_CHANGE is received upon changing the key or indirection table using ethtool while keeping the same hash function. Also add a function for retrieving the current hash function from the ena-com layer. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Saeed Bshara <saeedb@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-11net: ena: fix corruption of dev_idx_to_host_tblArthur Kiyanovski1-28/+0
The function ena_com_ind_tbl_convert_from_device() has an overflow bug as explained below. Either way, this function is not needed at all since we don't retrieve the indirection table from the device at any point which means that this conversion is not needed. The bug: The for loop iterates over all io_sq_queues, when passing the actual number of used queues the io_sq_queues[i].idx equals 0 since they are uninitialized which results in the following code to be executed till the end of the loop: dev_idx_to_host_tbl[0] = i; This results dev_idx_to_host_tbl[0] in being equal to ENA_TOTAL_NUM_QUEUES - 1. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-11net: ena: fix incorrectly saving queue numbers when setting RSS indirection ↵Arthur Kiyanovski2-1/+25
table The indirection table has the indices of the Rx queues. When we store it during set indirection operation, we convert the indices to our internal representation of the indices. Our internal representation of the indices is: even indices for Tx and uneven indices for Rx, where every Tx/Rx pair are in a consecutive order starting from 0. For example if the driver has 3 queues (3 for Tx and 3 for Rx) then the indices are as follows: 0 1 2 3 4 5 Tx Rx Tx Rx Tx Rx The BUG: The issue is that when we satisfy a get request for the indirection table, we don't convert the indices back to the original representation. The FIX: Simply apply the inverse function for the indices of the indirection table after we set it. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-11net: ena: rss: store hash function as values and not bitsArthur Kiyanovski1-1/+5
The device receives, stores and retrieves the hash function value as bits and not as their enum value. The bug: * In ena_com_set_hash_function() we set cmd.u.flow_hash_func.selected_func to the bit value of rss->hash_func. (1 << rss->hash_func) * In ena_com_get_hash_function() we retrieve the hash function and store it's bit value in rss->hash_func. (Now the bit value of rss->hash_func is stored in rss->hash_func instead of it's enum value) The fix: This commit fixes the issue by converting the retrieved hash function values from the device to the matching enum value of the set bit using ffs(). ffs() finds the first set bit's index in a word. Since the function returns 1 for the LSB's index, we need to subtract 1 from the returned value (note that BIT(0) is 1). Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-11net: ena: rss: fix failure to get indirection tableSameeh Jubran1-0/+14
On old hardware, getting / setting the hash function is not supported while gettting / setting the indirection table is. This commit enables us to still show the indirection table on older hardwares by setting the hash function and key to NULL. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-11net: ena: rss: do not allocate key when not supportedSameeh Jubran1-3/+21
Currently we allocate the key whether the device supports setting the key or not. This commit adds a check to the allocation function and handles the error accordingly. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-11net: ena: fix incorrect default RSS keyArthur Kiyanovski2-0/+16
Bug description: When running "ethtool -x <if_name>" the key shows up as all zeros. When we use "ethtool -X <if_name> hfunc toeplitz hkey <some:random:key>" to set the key and then try to retrieve it using "ethtool -x <if_name>" then we return the correct key because we return the one we saved. Bug cause: We don't fetch the key from the device but instead return the key that we have saved internally which is by default set to zero upon allocation. Fix: This commit fixes the issue by initializing the key to a random value using netdev_rss_key_fill(). Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-11net: ena: add missing ethtool TX timestamping indicationArthur Kiyanovski1-0/+1
Current implementation of the driver calls skb_tx_timestamp()to add a software tx timestamp to the skb, however the software-transmit capability is not reported in ethtool -T. This commit updates the ethtool structure to report the software-transmit capability in ethtool -T using the standard ethtool_op_get_ts_info(). This function reports all software timestamping capabilities (tx and rx), as well as setting phc_index = -1. phc_index is the index of the PTP hardware clock device that will be used for hardware timestamps. Since we don't have such a device in ENA, using the default -1 value is the correct setting. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Ezequiel Lara Gomez <ezegomez@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-11net: ena: fix uses of round_jiffies()Arthur Kiyanovski1-3/+3
>From the documentation of round_jiffies(): "Rounds a time delta in the future (in jiffies) up or down to (approximately) full seconds. This is useful for timers for which the exact time they fire does not matter too much, as long as they fire approximately every X seconds. By rounding these timers to whole seconds, all such timers will fire at the same time, rather than at various times spread out. The goal of this is to have the CPU wake up less, which saves power." There are 2 parts to this patch: ================================ Part 1: ------- In our case we need timer_service to be called approximately every X=1 seconds, and the exact time does not matter, so using round_jiffies() is the right way to go. Therefore we add round_jiffies() to the mod_timer() in ena_timer_service(). Part 2: ------- round_jiffies() is used in check_for_missing_keep_alive() when getting the jiffies of the expiration of the keep_alive timeout. Here it is actually a mistake to use round_jiffies() because we want the exact time when keep_alive should expire and not an approximate rounded time, which can cause early, false positive, timeouts. Therefore we remove round_jiffies() in the calculation of keep_alive_expired() in check_for_missing_keep_alive(). Fixes: 82ef30f13be0 ("net: ena: add hardware hints capability to the driver") Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-11net: ena: fix potential crash when rxfh key is NULLArthur Kiyanovski1-8/+9
When ethtool -X is called without an hkey, ena_com_fill_hash_function() is called with key=NULL, which is passed to memcpy causing a crash. This commit fixes this issue by checking key is not NULL. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>