summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/hfi1
AgeCommit message (Collapse)AuthorFilesLines
2016-10-09Merge tag 'for-linus' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull main rdma updates from Doug Ledford: "This is the main pull request for the rdma stack this release. The code has been through 0day and I had it tagged for linux-next testing for a couple days. Summary: - updates to mlx5 - updates to mlx4 (two conflicts, both minor and easily resolved) - updates to iw_cxgb4 (one conflict, not so obvious to resolve, proper resolution is to keep the code in cxgb4_main.c as it is in Linus' tree as attach_uld was refactored and moved into cxgb4_uld.c) - improvements to uAPI (moved vendor specific API elements to uAPI area) - add hns-roce driver and hns and hns-roce ACPI reset support - conversion of all rdma code away from deprecated create_singlethread_workqueue - security improvement: remove unsafe ib_get_dma_mr (breaks lustre in staging)" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (75 commits) staging/lustre: Disable InfiniBand support iw_cxgb4: add fast-path for small REG_MR operations cxgb4: advertise support for FR_NSMR_TPTE_WR IB/core: correctly handle rdma_rw_init_mrs() failure IB/srp: Fix infinite loop when FMR sg[0].offset != 0 IB/srp: Remove an unused argument IB/core: Improve ib_map_mr_sg() documentation IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets IB/mthca: Move user vendor structures IB/nes: Move user vendor structures IB/ocrdma: Move user vendor structures IB/mlx4: Move user vendor structures IB/cxgb4: Move user vendor structures IB/cxgb3: Move user vendor structures IB/mlx5: Move and decouple user vendor structures IB/{core,hw}: Add constant for node_desc ipoib: Make ipoib_warn ratelimited IB/mlx4/alias_GUID: Remove deprecated create_singlethread_workqueue IB/ipoib_verbs: Remove deprecated create_singlethread_workqueue IB/ipoib: Remove deprecated create_singlethread_workqueue ...
2016-10-07IB/{core,hw}: Add constant for node_descYuval Shaia1-1/+2
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Fix trace of atomic ackMike Marciniszyn2-3/+3
The length is incorrect, causing the trace data to be truncated. Add the additional 8 bytes that should have been there. Also trace out the atomic ack in hex to aid debugging. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Update SMA ingress checks for response packetsJianxin Xiong1-25/+24
Fix "unsupported method" error by skipping ingress pkey checks on response SMA packets. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Use EPROM platform configuration readDean Luick2-15/+26
The driver will now try to read directly from the EPROM as its first choice for the platform configuration file. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Add ability to read platform config from the EPROMDean Luick2-0/+84
Add a function to read the platform configuration file from the EPROM. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Restore EPROM read abilityDean Luick3-3/+103
Partially revert commit d07903174202 ("IB/hfi1: Remove EPROM functionality from data device"), bringing back the ability to read from the EPROM. This code will be used for driver-only acccess to the EPROM, hence change EPROM read to save to a buffer instead of copy touser. Also allow any offset and remove missed includes and leftover declarations. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Add new debugfs sdma_cpu_list fileTadeusz Struk3-0/+83
Add a debugfs sdma_cpu_list file that can be used to examine the CPU to sdma engine assignments for the whole device. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Add irq affinity notification handlerTadeusz Struk2-10/+103
This patch adds an irq affinity notification handler. When a user changes interrupt affinity settings for an sdma engine, the driver needs to make changes to its internal sde structures and also update the affinity_hint. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Add a new VL sysfs attribute for sdma enginesTadeusz Struk1-1/+14
This patch adds a read-only "VL" attribute for the sysfs entry of each sdma engine. It will allow the user to check VL to sdma engine mappings. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Add sysfs interface for affinity setupTadeusz Struk5-7/+412
Some users want more control over which cpu cores are being used by the driver. For example, users might want to restrict the driver to some specified subset of the cores so that they can appropriately partition processes, irq handlers, and work threads. To allow the user to fine tune system affinity settings new sysfs attributes are introduced per sdma engine. This patch adds a new attribute type for sdma engine and a new cpu_list attribute. When the user writes a cpu range to the cpu_list attribute the driver will create an internal cpu->sdma map, which will be used later as a look-up table to choose an optimal engine for a user requests. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Fix resource release in context allocationJakub Pawlak2-5/+13
Correct resource free in allocate_ctxt() function. When context creation fails allocated resources are properly released and pointer in receive context data table is set back to NULL. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Remove unused variable from devdataDennis Dalessandro1-2/+0
We no longer use an error tasklet. Remove it from the hfi1_devdata structure. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Cleanup tasklet refs in commentsDennis Dalessandro2-10/+10
The code no longer uses tasklets for the send engine. However it does use a tasklet for sdma but the send routines use a workqueue now days. Update the comments to reflect that. Make things more generic with saying "send engine" because that is what is being referred to. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Adjust hardware buffering parameterHarish Chegondi2-3/+3
It was determined that 0x880 is a better value for hardware buffering, use it. Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Act on external device timeoutDean Luick2-2/+6
Add missing external device timeout notification. Recognize it as a failed LNI signal from the 8051 firmware. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Fix defered ack race with qp destroyMike Marciniszyn1-1/+4
There is a a bug in defered ack stuff that causes a race with the destroy of a QP. A packet causes a defered ack to be pended by putting the QP into an rcd queue. A return from the driver interrupt processing will process that rcd queue of QPs and attempt to do a direct send of the ack. At this point no locks are held and the above QP could now be put in the reset state in the qp destroy logic. A refcount protects the QP while it is in the rcd queue so it isn't going anywhere yet. If the direct send fails to allocate a pio buffer, hfi1_schedule_send() is called to trigger sending an ack from the send engine. There is no state test in that code path. The refcount is then dropped from the driver.c caller potentially allowing the qp destroy to continue from its refcount wait in parallel with the workqueue scheduling of the qp. Cc: stable@vger.kernel.org Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Combine shift copy and byte copy for SGE readsSebastian Sanchez1-137/+23
Prevent over-reading the SGE length by using byte reads for non quad-word reads. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Do not read more than a SGE lengthSebastian Sanchez1-48/+40
In certain cases, if the tail of an SGE is not 8-byte aligned, bytes beyond the end to an 8-byte alignment can be read. Change the copy routine to avoid the over-read. Instead, stop on the final whole quad-word, then read the remaining bytes. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Extend i2c timeoutDean Luick1-1/+1
Allow a longer timeout for i2c due to clock stretching and inaccurate jiffy timing when under a spin lock. This timeout is consistent with other i2c-algo-bit users. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Increase default settings of max_cqes and max_qpsJianxin Xiong1-2/+2
The ib_write_bw test allows using up to 16384 QPs. When a relatively large number of QPs (within that range) is used, the test can fail because the number of CQ entries needed exceeds the limit set by the driver. This patch increases the default setting of max_cqes from 0x2FFFF (196607) to 0x2FFFFF(3145727), which is sufficient to cover the maximum number needed by the ib_write_bw test (2097152). The default setting of max_qps is also increased from 16384 to 32768 to allow the test to run successfully with 16383 or 16384 QPs. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Remove filtering of Set(PkeyTable) in HFI SMASebastian Sanchez1-6/+0
The FM should have full control to set the pkeys in the driver pkey table. Remove filtering done by the driver. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Consolidate pio control masks into single definitionMike Marciniszyn4-30/+32
This allows for adding additional pages of adaptive pio opcode control including manufacturer specific ones. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/rdmavt, IB/hfi1: Add lockdep asserts for lock debugMike Marciniszyn2-2/+22
This patch adds lockdep asserts in key code paths for insuring lock correctness. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Move iowait_init() to priv allocateMike Marciniszyn1-7/+7
The call is misplaced in the reset calldown function and causes issues with lockdep assertions that are to be added. Fixes: Commit a2c2d608957c ("staging/rdma/hfi1: Remove create_qp functionality") Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Fix locking scheme for affinity settingsTadeusz Struk2-43/+51
Existing locking scheme in affinity.c file using the &node_affinity.lock spinlock is not very elegant. We acquire the lock to get hfi1_affinity_node entry, unlock, and then use the entry without the lock held. With more functions being added, which access and modify the entries, this can lead to race conditions. This patch makes this locking scheme more consistent. It changes the spinlock to mutex. Since all the code is executed in a user process context there is no need for a spinlock. This also allows to keep the lock not only while we look up for the node affinity entry, but over the whole section where the entry is being used. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Fix user-space buffers mapping with IOMMU enabledTymoteusz Kielan7-63/+79
The dma_XXX API functions return bus addresses which are physical addresses when IOMMU is disabled. Buffer mapping to user-space is done via remap_pfn_range() with PFN based on bus address instead of physical. This results in wrong pages being mapped to user-space when IOMMU is enabled. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Tymoteusz Kielan <tymoteusz.kielan@intel.com> Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Fix the count of user packets submitted to an SDMA engineHarish Chegondi3-28/+30
Each user SDMA request coming into the driver may contain multiple packets. Each user packet may use multiple SDMA descriptors to fill the send buffer. The field seqsubmitted in struct user_sdma_request counts the number of user packets submitted to an SDMA engine. Sometimes, the intermediate count may not be updated properly. However, once all the packets' descriptors are successfully submitted to the SDMA engine, the final count is updated correctly. But, if only some of the packets are submitted to the engine due to an error, the intermediate count doesn't reflect the partial number of packets submitted to the SDMA engine. This can cause a hang later in the code as the count of packets submitted to the SDMA engine doesn't match the the count of packets processed by the SDMA engine. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/hfi1: Move serdes tune inside link start functionDean Luick2-15/+8
All calls to tune_serdes and start_link are paired. Move tune_serdes inside start_link. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-10-02IB/qib,IB/hfi: Use core common header fileMike Marciniszyn14-209/+107
Use common header file structs, defines, and accessors in the drivers. The old declarations are removed. The repositioning of the includes allows for the removal of hfi1_message_header and replaces its use with ib_header. Also corrected are two issues with set_armed_to_active(): - The "packet" parameter is now a pointer as it should have been - The etype is validated to insure that the header is correct Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-16IB/rdmavt, IB/qib, IB/hfi1: Use new QP put get routinesMike Marciniszyn4-15/+11
This improves readability and hides the reference count mechanism from the client drivers. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/hfi1: Rework debugfs to use SRCUMike Marciniszyn1-80/+52
The debugfs RCU trips many debug kernel warnings because of potential sleeps with an RCU read lock held. This includes both user copy calls and slab allocations throughout the file. This patch switches the RCU to use SRCU for file remove/access race protection. In one case, the SRCU is implicit in the use of the raw debugfs file object and just works. In the seq_file case, a wrapper around seq_read() and seq_lseek() is used to enforce the SRCU using the debugfs supplied functions debugfs_use_file_start() and debugfs_use_file_stop(). The sychronize_rcu() is deleted since the SRCU prevents the remove access race. The RCU locking is kept for qp_stats since the QP hash list is protected using the non-sleepable RCU. Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/hfi1: Make n_krcvqs be an unsigned long integerHarish Chegondi3-5/+5
The global variable n_krcvqs stores the sum of the number of kernel receive queues of VLs 0-7 which the user can pass to the driver through the module parameter array krcvqs which is of type unsigned integer. If the user passes large value(s) into krcvqs parameter array, it can cause an arithmetic overflow while calculating n_krcvqs which is also of type unsigned int. The overflow results in an incorrect value of n_krcvqs which can lead to kernel crash while loading the driver. Fix by changing the data type of n_krcvqs to unsigned long. This patch also changes the data type of other variables that get their values from n_krcvqs. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/hfi1: Add QSFP sanity pre-checkDean Luick4-8/+82
Sometimes a QSFP device does not respond in the expected time after a power-on. Add a read pre-check/retry when starting the link on driver load. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/hfi1: Fix AHG KDETH Intr shiftJubin John1-1/+4
In the set_txreq_header_ahg(), The KDETH Intr bit is obtained from the header in the user sdma request using a KDETH_GET shift and mask macro. This value is then futher right shifted by 16 causing us to lose the value i.e it is shifted to zero, leading to the following smatch warning: drivers/infiniband/hw/hfi1/user_sdma.c:1482 set_txreq_header_ahg() warn: mask and shift to zero The Intr bit should be left shifted into its correct position in the KDETH header before the AHG update. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Jubin John <jubin.john@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/hfi1: Fix SGE length for misaligned PIO copySebastian Sanchez1-0/+12
When trying to align the source pointer and there's a byte carry in an SGE copy, bytes are borrowed from the next quad-word X to complete the required quad-word copy. Then, the SGE length is reduced by the number of borrowed bytes. After this, if the remaining number of bytes from quad-word X (extra bytes) is greater than the new SGE length, the number of extra bytes needs to be updated to the new SGE length. Otherwise, when the SGE length gets updated again after the extra bytes are read to create the new byte carry, it goes negative, which then becomes a very large number as the SGE length is an unsigned integer. This causes SGE buffer to be over-read. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-02IB/hfi1: Fix the size parameter to find_first_bitChristophe Jaillet1-4/+4
The 2nd parameter of 'find_first_bit' is the number of bits to search. In this case, we are passing 'sizeof(u64)' which is 8. It is likely that the number of bits of 'port_mask' was expected here. Use sizeof() * 8 to get the correct number. It has been spotted by the following coccinelle script: @@ expression ret, x; @@ * ret = \(find_first_bit \| find_first_zero_bit\) (x, sizeof(...)); Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-26IB/hfi1: Clean up type used and castingChristophe Jaillet1-2/+2
In all other places in this file where 'find_first_bit' is called, port_num is defined as a 'u8' and no casting is done. Do the same here in order to be more consistent. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-22IB/hfi1: Fix mm_struct use after freeIra Weiny1-0/+2
Testing with CONFIG_SLUB_DEBUG_ON=y resulted in the kernel panic below. This is the result of the mm_struct sometimes being free'd prior to hfi1_file_close being called. This was due to the combination of 2 reasons: 1) hfi1_file_close is deferred in process exit and it therefore may not be called synchronously with process exit. 2) exit_mm is called prior to exit_files in do_exit. Normally this is ok however, our kernel bypass code requires us to have access to the mm_struct for house keeping both at "normal" close time as well as at process exit. Therefore, the fix is to simply keep a reference to the mm_struct until we are done with it. [ 3006.340150] general protection fault: 0000 [#1] SMP [ 3006.346469] Modules linked in: hfi1 rdmavt rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm dm_mirror dm_region_hash dm_log dm_mod snd_hda_code c_realtek iTCO_wdt snd_hda_codec_generic iTCO_vendor_support sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass c rct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw snd_hda_intel gf128mul snd_hda_codec glue_helper snd_hda_core ablk_helper sn d_hwdep cryptd snd_seq snd_seq_device snd_pcm snd_timer snd soundcore pcspkr shpchp mei_me sg lpc_ich mei i2c_i801 mfd_core ioatdma ipmi_devi ntf wmi ipmi_si ipmi_msghandler acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 jbd2 mbcache mlx4_en ib_core sr_mod s d_mod cdrom crc32c_intel mgag200 drm_kms_helper syscopyarea sysfillrect igb sysimgblt fb_sys_fops ptp mlx4_core ttm isci pps_core ahci drm li bsas libahci dca firewire_ohci i2c_algo_bit scsi_transport_sas firewire_core crc_itu_t i2c_core libata [last unloaded: mlx4_ib] [ 3006.461759] CPU: 16 PID: 11624 Comm: mpi_stress Not tainted 4.7.0-rc5+ #1 [ 3006.469915] Hardware name: Intel Corporation W2600CR ........../W2600CR, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013 [ 3006.483027] task: ffff8804102f0040 ti: ffff8804102f8000 task.ti: ffff8804102f8000 [ 3006.491971] RIP: 0010:[<ffffffff810f0383>] [<ffffffff810f0383>] __lock_acquire+0xb3/0x19e0 [ 3006.501905] RSP: 0018:ffff8804102fb908 EFLAGS: 00010002 [ 3006.508447] RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000001 RCX: 0000000000000000 [ 3006.517012] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff880410b56a40 [ 3006.525569] RBP: ffff8804102fb9b0 R08: 0000000000000001 R09: 0000000000000000 [ 3006.534119] R10: ffff8804102f0040 R11: 0000000000000000 R12: 0000000000000000 [ 3006.542664] R13: ffff880410b56a40 R14: 0000000000000000 R15: 0000000000000000 [ 3006.551203] FS: 00007ff478c08700(0000) GS:ffff88042e200000(0000) knlGS:0000000000000000 [ 3006.560814] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3006.567806] CR2: 00007f667f5109e0 CR3: 0000000001c06000 CR4: 00000000000406e0 [ 3006.576352] Stack: [ 3006.579157] ffffffff8124b819 ffffffffffffffff 0000000000000000 ffff8804102fb940 [ 3006.588072] 0000000000000002 0000000000000000 ffff8804102f0040 0000000000000007 [ 3006.596971] 0000000000000006 ffff8803cad6f000 0000000000000000 ffff8804102f0040 [ 3006.605878] Call Trace: [ 3006.609220] [<ffffffff8124b819>] ? uncharge_batch+0x109/0x250 [ 3006.616382] [<ffffffff810f2313>] lock_acquire+0xd3/0x220 [ 3006.623056] [<ffffffffa0a30bfc>] ? hfi1_release_user_pages+0x7c/0xa0 [hfi1] [ 3006.631593] [<ffffffff81775579>] down_write+0x49/0x80 [ 3006.638022] [<ffffffffa0a30bfc>] ? hfi1_release_user_pages+0x7c/0xa0 [hfi1] [ 3006.646569] [<ffffffffa0a30bfc>] hfi1_release_user_pages+0x7c/0xa0 [hfi1] [ 3006.654898] [<ffffffffa0a2efb6>] cacheless_tid_rb_remove+0x106/0x330 [hfi1] [ 3006.663417] [<ffffffff810efd36>] ? mark_held_locks+0x66/0x90 [ 3006.670498] [<ffffffff817771f6>] ? _raw_spin_unlock_irqrestore+0x36/0x60 [ 3006.678741] [<ffffffffa0a2f1ee>] tid_rb_remove+0xe/0x10 [hfi1] [ 3006.686010] [<ffffffffa0a0c5d5>] hfi1_mmu_rb_unregister+0xc5/0x100 [hfi1] [ 3006.694387] [<ffffffffa0a2fcb9>] hfi1_user_exp_rcv_free+0x39/0x120 [hfi1] [ 3006.702732] [<ffffffffa09fc6ea>] hfi1_file_close+0x17a/0x330 [hfi1] [ 3006.710489] [<ffffffff81263e9a>] __fput+0xfa/0x230 [ 3006.716595] [<ffffffff8126400e>] ____fput+0xe/0x10 [ 3006.722696] [<ffffffff810b95c6>] task_work_run+0x86/0xc0 [ 3006.729379] [<ffffffff81099933>] do_exit+0x323/0xc40 [ 3006.735672] [<ffffffff8109a2dc>] do_group_exit+0x4c/0xc0 [ 3006.742371] [<ffffffff810a7f55>] get_signal+0x345/0x940 [ 3006.748958] [<ffffffff810340c7>] do_signal+0x37/0x700 [ 3006.755328] [<ffffffff8127872a>] ? poll_select_set_timeout+0x5a/0x90 [ 3006.763146] [<ffffffff811609cb>] ? __audit_syscall_exit+0x1db/0x260 [ 3006.770853] [<ffffffff8110f3e3>] ? rcu_read_lock_sched_held+0x93/0xa0 [ 3006.778765] [<ffffffff812347a4>] ? kfree+0x1e4/0x2a0 [ 3006.784986] [<ffffffff8108e75a>] ? exit_to_usermode_loop+0x33/0xac [ 3006.792551] [<ffffffff8108e785>] exit_to_usermode_loop+0x5e/0xac [ 3006.799907] [<ffffffff81003dca>] do_syscall_64+0x12a/0x190 [ 3006.806664] [<ffffffff81777a7f>] entry_SYSCALL64_slow_path+0x25/0x25 [ 3006.814396] Code: 24 08 44 89 44 24 10 89 4c 24 18 e8 a8 d8 ff ff 48 85 c0 8b 4c 24 18 44 8b 44 24 10 44 8b 4c 24 08 4c 8b 14 24 0f 84 30 08 00 00 <f0> ff 80 98 01 00 00 8b 3d 48 ad be 01 45 8b a2 90 0b 00 00 85 [ 3006.837158] RIP [<ffffffff810f0383>] __lock_acquire+0xb3/0x19e0 [ 3006.844401] RSP <ffff8804102fb908> [ 3006.851170] ---[ end trace b7b9f21cf06c27df ]--- [ 3006.927420] Kernel panic - not syncing: Fatal exception [ 3006.933954] Kernel Offset: disabled [ 3006.940961] ---[ end Kernel panic - not syncing: Fatal exception [ 3006.948249] ------------[ cut here ]------------ Fixes: 3faa3d9a308e ("IB/hfi1: Make use of mm consistent") Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-22IB/hfi1: Improve J_KEY generationMitko Haralanov1-1/+18
Previously, J_KEY generation was based on the lower 16 bits of the user's UID. While this works, it was not good enough as a non-root user could collide with a root user given a sufficiently large UID. This patch attempt to improve the J_KEY generation by using the following algorithm: The 16 bit J_KEY space is partitioned into 3 separate spaces reserved for different user classes: * all users with administtor privileges (including 'root') will use J_KEYs in the range of 0 to 31, * all kernel protocols, which use KDETH packets will use J_KEYs in the range of 32 to 63, and * all other users will use J_KEYs in the range of 64 to 65535. The above separation is aimed at preventing different user levels from sending packets to each other and, additionally, separate kernel protocols from all other types of users. The later is meant to prevent the potential corruption of kernel memory by any other type of user. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-22IB/hfi1: Return invalid field for non-QSFP CableInfo queriesEaswar Hariharan1-0/+5
The driver does not check if the CableInfo query is supported for the port type. Return early if CableInfo is not supported for the port type, making compliance with the specification explicit and preventing lower level code from potentially doing the wrong thing if the query is not supported for the hardware implementation. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-22IB/hfi1: Add missing error code assignment before testChristophe Jaillet1-1/+1
It is likely that checking the result of 'setup_ctxt' is expected here. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-22IB/hfi1: Using kfree_rcu() to simplify the codeWei Yongjun3-10/+2
The callback function of call_rcu() just calls a kfree(), so we can use kfree_rcu() instead of call_rcu() + callback function. Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-22IB/hfi1: Validate header in set_armed_activeMike Marciniszyn1-1/+2
Validate the etype to insure that the header is correct. Reviewed-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-22IB/hfi1: Pass packet ptr to set_armed_activeMike Marciniszyn1-5/+5
The "packet" parameter was being passed on the stack, change it to a pointer. Reviewed-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-22IB/hfi1: Fetch monitor values on-demand for CableInfo queryEaswar Hariharan2-2/+33
The monitor values from bytes 22 through 81 of the QSFP memory space (SFF 8636) are dynamic and serving them out of the QSFP memory cache maintained by the driver provides stale data to the CableInfo SMA query. This patch refreshes the dynamic values from the QSFP memory on request and overwrites the stale data from the cache for the overlap between the requested range and the monitor range. Reviewed-by: Jubin John <jubin.john@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-22IB/hfi1,IB/qib: Fix qp_stats sleep with rcu read lock heldMike Marciniszyn2-9/+9
The qp init function does a kzalloc() while holding the RCU lock that encounters the following warning with a debug kernel when a cat of the qp_stats is done: [ 231.723948] rcu_scheduler_active = 1, debug_locks = 0 [ 231.731939] 3 locks held by cat/11355: [ 231.736492] #0: (debugfs_srcu){......}, at: [<ffffffff813001a5>] debugfs_use_file_start+0x5/0x90 [ 231.746955] #1: (&p->lock){+.+.+.}, at: [<ffffffff81289a6c>] seq_read+0x4c/0x3c0 [ 231.755873] #2: (rcu_read_lock){......}, at: [<ffffffffa0a0c535>] _qp_stats_seq_start+0x5/0xd0 [hfi1] [ 231.766862] The init functions do an implicit next which requires the rcu read lock before the kzalloc(). Fix for both drivers is to change the scope of the init function to only do the allocation and the initialization of the just allocated iter. The implict next is moved back into the respective start functions to fix the issue. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> CC: <stable@vger.kernel.org> # 4.6.x- Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-22IB/hfi1: Remove duplicated include from affinity.cWei Yongjun1-1/+0
Remove duplicated include. Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-22IB/hfi1: Allocate cpu mask on the heap to silence warningTadeusz Struk1-7/+13
If CONFIG_FRAME_WARN is small (1K) and CONFIG_NR_CPUS big then a frame size warning is triggered during build. Allocate the cpu mask dynamically to silence the warning. Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-04Merge tag 'for-linus-2' of ↵Linus Torvalds45-2892/+3829
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull second round of rdma updates from Doug Ledford: "This can be split out into just two categories: - fixes to the RDMA R/W API in regards to SG list length limits (about 5 patches) - fixes/features for the Intel hfi1 driver (everything else) The hfi1 driver is still being brought to full feature support by Intel, and they have a lot of people working on it, so that amounts to almost the entirety of this pull request" * tag 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (84 commits) IB/hfi1: Add cache evict LRU list IB/hfi1: Fix memory leak during unexpected shutdown IB/hfi1: Remove unneeded mm argument in remove function IB/hfi1: Consistently call ops->remove outside spinlock IB/hfi1: Use evict mmu rb operation IB/hfi1: Add evict operation to the mmu rb handler IB/hfi1: Fix TID caching actions IB/hfi1: Make the cache handler own its rb tree root IB/hfi1: Make use of mm consistent IB/hfi1: Fix user SDMA racy user request claim IB/hfi1: Fix error condition that needs to clean up IB/hfi1: Release node on insert failure IB/hfi1: Validate SDMA user iovector count IB/hfi1: Validate SDMA user request index IB/hfi1: Use the same capability state for all shared contexts IB/hfi1: Prevent null pointer dereference IB/hfi1: Rename TID mmu_rb_* functions IB/hfi1: Remove unneeded empty check in hfi1_mmu_rb_unregister() IB/hfi1: Restructure hfi1_file_open IB/hfi1: Make iovec loop index easy to understand ...