Optimizing o2iblnd Performance

In addition to defining the LNet interfaces, kernel module configuration files can be used to supply parameters to the other kernel modules used by Lustre. This is commonly used to supply tuning optimizations to the LNet drivers in order to maximize performance of the network interface. An example of this optimization can be seen in Lustre version 2.8.0 and later in the file /etc/modprobe.d/ko2iblnd.conf, which includes the following:

 alias ko2iblnd-opa ko2iblnd
 options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 conns_per_peer=4
 install ko2iblnd /usr/sbin/ko2iblnd-probe

This configuration is automatically applied to the LNet kernel module when an Intel® Omni-Path interface is detected, but not when a different network interface is present. Please note that when a host is connected to more than one fabric sharing the same Lustre network driver, options set in the modprobe configuration will be applied to all interfaces using that driver. To set individual, per-device tuning parameters, use the dynamic LNet configuration utility, lnetctl, to configure the interfaces instead.
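As an illustrative sketch, per-interface tunables can be supplied in a YAML file and applied with lnetctl import. The key names below follow the layout produced by lnetctl export; the tunable values shown are examples only, not recommendations:

```yaml
# Illustrative lnet.conf fragment; apply with: lnetctl import < lnet.conf
# (values are examples only)
net:
    - net type: o2ib
      local NI(s):
        - interfaces:
              0: ib0
          tunables:
              peer_timeout: 180
              peer_credits: 128
              peer_buffer_credits: 0
              credits: 1024
```

Because the tunables are attached to a specific network interface (ib0 here), each device can carry its own settings even when several devices share the ko2iblnd driver.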

The following set of options has been defined to optimize the performance of Intel® Omni-Path Architecture. A detailed description is beyond the scope of this exercise, but the following summary provides an overview:


 * peer_credits - the number of concurrent sends to a single peer
 * peer_credits_hiw - the high-water mark for when to eagerly return credits
 * credits - the number of concurrent sends (to all peers)
 * concurrent_sends - send work-queue sizing
 * ntx - the number of message descriptors that are pre-allocated when the ko2iblnd module is loaded in the kernel
 * map_on_demand - the number of noncontiguous memory regions that will be mapped into a virtually contiguous region
 * fmr_pool_size - the size of the Fast Memory Registration (FMR) pool (must be ≥ ntx/4)
 * fmr_flush_trigger - the number of dirty FMRs that triggers a pool flush
 * fmr_cache - enable FMR caching
 * conns_per_peer - create multiple queue pairs per peer to allow higher throughput from a single client. This is of most benefit to OPA interfaces, when coupled with the krcvqs parameter of the OPA hfi1 kernel driver (see the hfi1 recommendations below). In some cases, a higher value will yield improved IO performance, but this can impact other workloads, especially on clients. If queue-pair memory usage becomes excessive, reduce the conns_per_peer value.
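
The values currently in effect can be inspected through sysfs, since the kernel exposes module parameters under /sys/module/<module>/parameters once the module is loaded. A minimal sketch:

```shell
# Print the ko2iblnd tunables currently in effect. If the module is not
# loaded, each parameter is reported as such instead of failing.
show_ko2iblnd_params() {
    for p in peer_credits peer_credits_hiw credits concurrent_sends ntx \
             map_on_demand fmr_pool_size fmr_flush_trigger fmr_cache conns_per_peer; do
        f="/sys/module/ko2iblnd/parameters/$p"
        if [ -r "$f" ]; then
            printf '%s=%s\n' "$p" "$(cat "$f")"
        else
            printf '%s=<module not loaded>\n' "$p"
        fi
    done
}

show_ko2iblnd_params
```

This is useful for confirming that a modprobe configuration file was actually picked up after reloading the module.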

The default values used by Lustre when no parameters are given are:



Optimizations are applied automatically on detection of an Intel® high-performance network interface. Some of the optimizations, such as FMR, are incompatible with other devices, such as Mellanox InfiniBand products using the MLX5 driver; FMR can be disabled by setting map_on_demand=0 (the default). The configuration file can be modified or deleted to meet the specific requirements of a given installation.

In general, the default ko2iblnd settings work well with Mellanox InfiniBand HCAs and no tuning is normally required. Architectural differences between Intel® fabrics and Mellanox fabrics make setting universal defaults very difficult. Intel® OPA and Intel® True Scale Fabric have an architecture that favors lightweight, high-frequency message-passing communications, whereas Mellanox has historically placed an emphasis on throughput-oriented workloads. Because Mellanox InfiniBand has historically been the dominant high-speed fabric, LNet driver development has naturally tended to align with this technology, aided by interfaces intended to support storage-like workloads. The settings above tune the LNet driver for communications on Intel® fabrics, when present.

Note: It is possible to use the ksocklnd driver on RDMA fabrics if there is an upper-level protocol that supports TCP/IP traffic, such as the IPoIB driver for InfiniBand fabrics. This use of ksocklnd on InfiniBand, RoCE, and Intel® OPA networks is not recommended, because it compromises the performance of LNet compared to the RDMA-based o2iblnd, and can have a negative impact on the stability of the resulting network connection. Instead, it is strongly recommended that o2iblnd is used wherever possible; it provides the highest performance with the lowest overheads on these fabrics.

Additional Intel® Omni-Path Optimization
Intel makes the following recommendations for OPA hfi1 driver options for use with Lustre:

 options hfi1 krcvqs=4 piothreshold=0 sge_copy_mode=2 wss_threshold=70
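
To make these settings persistent, the options line is typically placed in a file under /etc/modprobe.d/ so that it is applied whenever the module loads; the filename below is illustrative:

```
# /etc/modprobe.d/hfi1.conf (illustrative filename)
options hfi1 krcvqs=4 piothreshold=0 sge_copy_mode=2 wss_threshold=70
```

The hfi1 module must be reloaded (or the node rebooted) before the new values take effect.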

Some experimentation with the krcvqs parameter may be required to find the optimal balance of Lustre IO performance against other workloads. Lustre servers may derive additional performance from increasing the value up to 8. For Lustre clients, higher values can improve Lustre performance but might impact application performance.

See also:


 * Intel® Omni-Path Performance Tuning User Guide from the OPA documentation bundle: End User Publications, Release Notes, and EULAs for Intel® Omni-Path Software
 * LU-8943

irqbalance
The purpose of irqbalance is to distribute hardware interrupts across processors on a multiprocessor system in order to improve performance. According to the Intel® Omni-Path tuning guide, setting the irqbalance hint policy to exact can be beneficial to the hfi1 receive and send DMA interrupt algorithms in the driver.

To install the irqbalance package, run the following command:

 yum -y install irqbalance

Once installed, edit /etc/sysconfig/irqbalance and add the following line:

 IRQBALANCE_ARGS=--hintpolicy=exact
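
A minimal sketch of making this edit idempotently, so that repeated runs of a provisioning script do not duplicate the line (the function takes the file path as an argument; on RHEL/CentOS this is /etc/sysconfig/irqbalance):

```shell
# Append the hint-policy setting to an irqbalance sysconfig file only if
# it is not already present (exact, full-line match).
ensure_hintpolicy() {
    conf="$1"
    line='IRQBALANCE_ARGS=--hintpolicy=exact'
    touch "$conf"
    grep -qxF "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
}

# Usage (requires root on a real system):
# ensure_hintpolicy /etc/sysconfig/irqbalance
```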

Enable the irqbalance service and reload the configuration (make sure the hfi1 driver is loaded first):

 systemctl restart irqbalance.service