Optimizing o2iblnd Performance
Revision as of 15:59, 28 August 2018
In addition to defining the LNet interfaces, the kernel module files can be used to supply parameters to other kernel modules used by Lustre. This is commonly used to supply tuning optimizations to the LNet drivers, to maximize performance of the network interface. An example of this optimization can be seen in Lustre version 2.8.0 and later, in the file /etc/modprobe.d/ko2iblnd.conf, which includes the following:

alias ko2iblnd-opa ko2iblnd
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
install ko2iblnd /usr/sbin/ko2iblnd-probe
This configuration is automatically applied to the LNet kernel module when an Intel® Omni-Path interface is installed, but not when a different network interface is present. Please note that when a host is connected to more than one fabric sharing the same Lustre network driver, options set by modprobe will be applied to all interfaces using the same driver. To set individual, per-device tuning parameters, use the Dynamic LNet configuration utility, lnetctl, to configure the interfaces instead.
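The per-device alternative can be sketched with lnetctl. This is a configuration fragment, not a script to run verbatim: the network name o2ib0 and interface name ib0 are placeholders for this example, and the credit values are simply the OPA defaults shown above.

```shell
# Add a single network interface with its own tuning, instead of
# module-wide modprobe options (substitute the real net/interface names).
lnetctl net add --net o2ib0 --if ib0 --peer-credits 128 --credits 1024

# Show the configured networks, including the tunables that were applied.
lnetctl net show -v
```

Because these settings are applied per network interface, two fabrics served by the same ko2iblnd driver can carry different credit tuning, which modprobe options cannot express.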
The following set of options has been defined to optimize the performance of Intel® Omni-Path Architecture. A detailed description is beyond the scope of this article, but the following summary provides an overview:
- peer_credits=128 - the number of concurrent sends to a single peer
- peer_credits_hiw=64 - Hold In Wait: when to eagerly return credits
- credits=1024 - the number of concurrent sends (to all peers)
- concurrent_sends=256 - send work-queue sizing
- ntx=2048 - the number of message descriptors that are pre-allocated when the ko2iblnd module is loaded in the kernel
- map_on_demand=32 - the number of noncontiguous memory regions that will be mapped into a virtually contiguous region
- fmr_pool_size=2048 - the size of the Fast Memory Registration (FMR) pool (must be >= ntx/4)
- fmr_flush_trigger=512 - the dirty FMR pool flush trigger
- fmr_cache=1 - enable FMR caching
- conns_per_peer=4 - create multiple queue pairs per peer to allow higher throughput from a single client. This is of most benefit to OPA interfaces, when coupled with the krcvqs parameter of the OPA hfi1 kernel driver. The hfi1 driver option krcvqs must also be set; it is recommended to set krcvqs=8. If queue-pair memory usage becomes excessive, reduce the ko2iblnd conns_per_peer value to 2 and reduce krcvqs to 4.
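The constraints above can be checked mechanically before loading the module. The helper below is hypothetical, written for this article: it verifies the stated requirement fmr_pool_size >= ntx/4, plus the assumption (not stated in the source) that the hold-in-wait threshold should sit below peer_credits.

```shell
#!/bin/sh
# Sanity-check a ko2iblnd tuning set against the constraints described above.
# Argument order: ntx fmr_pool_size peer_credits peer_credits_hiw
check_ko2iblnd_tuning() {
  ntx=$1; fmr_pool_size=$2; peer_credits=$3; peer_credits_hiw=$4
  # Stated requirement: the FMR pool must hold at least ntx/4 entries.
  if [ "$fmr_pool_size" -lt $((ntx / 4)) ]; then
    echo "FAIL: fmr_pool_size ($fmr_pool_size) must be >= ntx/4 ($((ntx / 4)))"
    return 1
  fi
  # Assumption for this sketch: the hold-in-wait mark stays below peer_credits.
  if [ "$peer_credits_hiw" -ge "$peer_credits" ]; then
    echo "FAIL: peer_credits_hiw ($peer_credits_hiw) should be below peer_credits ($peer_credits)"
    return 1
  fi
  echo "OK"
}

# The OPA values from ko2iblnd.conf above pass both checks:
check_ko2iblnd_tuning 2048 2048 128 64
```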
The default values used by Lustre if no parameters are given are:
peer_credits=8
peer_credits_hiw=8
concurrent_sends=8
credits=64
Optimizations are applied automatically on detection of an Intel® high-performance network interface. Some of the parameters, such as FMR, are incompatible with other devices, such as Mellanox InfiniBand products using the MLX5 driver; FMR mapping can be disabled by setting map_on_demand=0 (the default). The configuration file can be modified or deleted to meet the specific requirements of a given installation.
In general, the default ko2iblnd settings work well with Mellanox InfiniBand HCAs and no tuning is normally required. Architectural differences between Intel® fabrics and Mellanox mean that setting universal defaults is very difficult. Intel® OPA and Intel® True Scale Fabric have an architecture that favors lightweight, high-frequency message-passing communications, whereas Mellanox has historically placed an emphasis on throughput-oriented workloads. Because Mellanox InfiniBand has historically been the dominant high-speed fabric, LNet driver development has naturally tended to align with this technology, aided by interfaces that are intended to support storage-like workloads. The settings above tune the LNet driver for communications on Intel® fabrics, if present.
Note: It is possible to use the socklnd driver on RDMA fabrics if there is an upper-level protocol that supports TCP/IP traffic, such as the IPoIB driver for InfiniBand fabrics. This use of socklnd on InfiniBand, RoCE, and Intel® OPA networks is not recommended because it will compromise the performance of LNet compared to the RDMA-based o2iblnd, and can have a negative impact on the stability of the resulting network connection. Instead, it is strongly recommended that o2iblnd is used wherever possible; it provides the highest performance with the lowest overheads on these fabrics.
Additional Intel® Omni-Path Optimization
Intel makes the following recommendations for OPA hfi1 driver options for use with Lustre:
options hfi1 krcvqs=4 piothreshold=0 sge_copy_mode=2 wss_threshold=70
Some experimentation with the krcvqs parameter may be required to find the optimal balance of Lustre IO performance against other workloads. Lustre servers may derive additional performance from increasing the value up to 8. For Lustre clients, higher values can improve Lustre performance but might impact application performance.
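While experimenting, the values actually in effect can be read back from sysfs once the hfi1 module is loaded. The helper below is a sketch for this article; its directory argument exists only so it can be exercised without OPA hardware, and defaults to the standard module-parameter path.

```shell
#!/bin/sh
# Print the hfi1 tunables discussed above from a sysfs-style directory.
show_hfi1_params() {
  base=${1:-/sys/module/hfi1/parameters}
  for p in krcvqs piothreshold sge_copy_mode wss_threshold; do
    printf '%s=%s\n' "$p" "$(cat "$base/$p")"
  done
}
```

Running show_hfi1_params on a live system after changing /etc/modprobe.d options confirms whether a module reload actually picked up the new values.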
See also:
- Intel® Omni-Path Performance Tuning User Guide from the OPA documentation bundle: End User Publications, Release Notes, and EULAs for Intel® Omni-Path Software
- LU-8943
irqbalance
The purpose of irqbalance is to distribute hardware interrupts across processors on a multiprocessor system in order to increase performance. According to the Intel® Omni-Path tuning guide, setting the irqbalance hint policy to exact can be beneficial to the hfi1 receive and send DMA interrupt algorithms in the driver.
To install the irqbalance package, run the following command:
yum -y install irqbalance
Once installed, edit /etc/sysconfig/irqbalance and add the following line:
IRQBALANCE_ARGS=--hintpolicy=exact
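For automated deployments, the edit can be made idempotent so that re-running configuration management does not append duplicate lines. This helper is hypothetical, written for this article; it takes the file path as an argument (defaulting to the standard location) so it can be tried on a copy first.

```shell
#!/bin/sh
# Idempotently set IRQBALANCE_ARGS in an irqbalance sysconfig file.
set_irqbalance_hintpolicy() {
  f=${1:-/etc/sysconfig/irqbalance}
  if grep -q '^IRQBALANCE_ARGS=' "$f"; then
    # Replace any existing setting rather than appending a second one.
    sed -i 's|^IRQBALANCE_ARGS=.*|IRQBALANCE_ARGS=--hintpolicy=exact|' "$f"
  else
    echo 'IRQBALANCE_ARGS=--hintpolicy=exact' >> "$f"
  fi
}
```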
Enable the irqbalance service and reload the configuration (make sure the hfi1 driver is loaded first):
systemctl restart irqbalance.service