summaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox/mlx5/core/uar.c
diff options
context:
space:
mode:
authorAchiad Shochat <achiad@mellanox.com>2015-07-23 23:35:59 +0300
committerDavid S. Miller <davem@davemloft.net>2015-07-27 00:29:17 -0700
commit88a85f99e51fb2373259ab83c8bb130a9bbf3804 (patch)
treef27980b2cdd3c9601a5953c7d268eabc18452a2e /drivers/net/ethernet/mellanox/mlx5/core/uar.c
parent58d522912ac7d25b63f468fa4a4e8bb059c5144e (diff)
downloadlinux-88a85f99e51fb2373259ab83c8bb130a9bbf3804.tar.bz2
net/mlx5e: TX latency optimization to save DMA reads
A regular TX WQE execution involves two or more DMA reads - one to fetch the WQE, and another one per WQE gather entry. These DMA reads obviously increase the TX latency. There are two mlx5 mechanisms to bypass these DMA reads: 1) Inline WQE 2) Blue Flame (BF) An inline WQE contains a whole packet, thus saves the DMA read/s of the regular WQE gather entry/s. Inline WQE support was already added in the previous commit. A BF WQE is written directly to the device I/O mapped memory, thus enables saving the DMA read that fetches the WQE. The BF WQE I/O write must be in cache line granularity, thus uses the CPU write combining mechanism. A BF WQE I/O write acts also as a TX doorbell for notifying the device of new TX WQEs. A BF WQE is written to the same I/O mapped address as the regular TX doorbell, thus this address is being mapped twice - once by ioremap() and once by io_mapping_map_wc(). While both mechanisms reduce the TX latency, they both consume more CPU cycles than a regular WQE: - A BF WQE must still be written to host memory, in addition to being written directly to the device I/O mapped memory. - An inline WQE involves copying the SKB data into it. To handle this tradeoff, we introduce here a heuristic algorithm that strives to avoid using these two mechanisms in case the TX queue is being back-pressured by the device, and limit their usage rate otherwise. An inline WQE will always be "Blue Flamed" (written directly to the device I/O mapped memory) while a BF WQE may not be inlined (may contain gather entries). Preliminary testing using netperf UDP_RR shows that the latency goes down from 17.5us to 16.9us, while the message rate (tested with pktgen) stays the same. Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'drivers/net/ethernet/mellanox/mlx5/core/uar.c')
-rw-r--r--drivers/net/ethernet/mellanox/mlx5/core/uar.c6
1 files changed, 6 insertions, 0 deletions
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/uar.c b/drivers/net/ethernet/mellanox/mlx5/core/uar.c
index 9ef85873ceea..eb05c845ece9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/uar.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/uar.c
@@ -32,6 +32,7 @@
#include <linux/kernel.h>
#include <linux/module.h>
+#include <linux/io-mapping.h>
#include <linux/mlx5/driver.h>
#include <linux/mlx5/cmd.h>
#include "mlx5_core.h"
@@ -246,6 +247,10 @@ int mlx5_alloc_map_uar(struct mlx5_core_dev *mdev, struct mlx5_uar *uar)
goto err_free_uar;
}
+ if (mdev->priv.bf_mapping)
+ uar->bf_map = io_mapping_map_wc(mdev->priv.bf_mapping,
+ uar->index << PAGE_SHIFT);
+
return 0;
err_free_uar:
@@ -257,6 +262,7 @@ EXPORT_SYMBOL(mlx5_alloc_map_uar);
void mlx5_unmap_free_uar(struct mlx5_core_dev *mdev, struct mlx5_uar *uar)
{
+ io_mapping_unmap(uar->bf_map);
iounmap(uar->map);
mlx5_cmd_free_uar(mdev, uar->index);
}