Optimizing o2iblnd Performance

In addition to defining the LNet interfaces, the kernel module option files can be used to supply parameters to other kernel modules used by Lustre. This is commonly done to supply tuning optimizations to the LNet drivers, to maximize performance of the network interface. An example of this optimization can be seen in Lustre version 2.8.0 and later, in the file /etc/modprobe.d/ko2iblnd.conf, which includes the following:

 alias ko2iblnd-opa ko2iblnd

 options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1

install ko2iblnd /usr/sbin/ko2iblnd-probe
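The alias and install lines work together: modprobe runs the probe script instead of loading ko2iblnd directly, and the script decides whether to load the module through the tuned alias. The shipped /usr/sbin/ko2iblnd-probe is more involved than this; the following is only a minimal sketch of the idea, and the hfi1-based detection heuristic is an illustrative assumption:

```shell
#!/bin/sh
# Illustrative sketch only, not the shipped ko2iblnd-probe script.
# If an Omni-Path HFI (hfi1) device is present, load ko2iblnd through the
# tuned "ko2iblnd-opa" alias so the optimized options line applies;
# otherwise load the module with its built-in defaults.
# --ignore-install prevents modprobe from re-running this install rule
# recursively.
if ls /sys/class/infiniband 2>/dev/null | grep -q '^hfi1'; then
    exec modprobe --ignore-install ko2iblnd $(modprobe -c | \
        awk '/^options ko2iblnd-opa/ {$1=""; $2=""; print}')
else
    exec modprobe --ignore-install ko2iblnd
fi
```

Because the detection happens at load time, the same configuration file can be deployed on every host, with only the Omni-Path hosts picking up the tuned values.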

This configuration is automatically applied to the LNet kernel module when an Intel® Omni-Path interface is installed, but not when a different network interface is present. Please note that when a host is connected to more than one fabric sharing the same Lustre network driver, options set by modprobe will be applied to all interfaces using the same driver. To set individual, per-device tuning parameters, use the Dynamic LNet Configuration utility, lnetctl, to configure the interfaces instead.
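Per-device tunables can be set at runtime with lnetctl rather than module-wide through modprobe. A hedged sketch follows; the interface name ib0 is an assumption for illustration, the credit values mirror the Omni-Path tuning above, and the exact option names should be confirmed against `lnetctl net add --help` on the installed Lustre version:

```shell
# Bring up LNet, then attach one interface to the o2ib network with its
# own credit settings instead of relying on global module parameters.
modprobe lnet
lnetctl lnet configure

# ib0 is a placeholder device name; substitute the local interface.
lnetctl net add --net o2ib0 --if ib0 --peer-credits 128 --credits 1024

# Display the resulting configuration, including the per-NI tunables.
lnetctl net show --verbose
```

This keeps a multi-fabric host from applying Omni-Path-oriented values to interfaces on a different fabric that happens to share the same driver.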

The following set of options has been defined to optimize the performance of Intel® Omni-Path Architecture. A detailed description is beyond the scope of this exercise, but the following summary provides an overview:


 * peer_credits - the number of concurrent sends to a single peer
 * peer_credits_hiw - the "high water" mark that determines when to eagerly return credits
 * credits - the number of concurrent sends (to all peers)
 * concurrent_sends - send work-queue sizing
 * ntx - the number of message descriptors that are pre-allocated when the ko2iblnd module is loaded into the kernel
 * map_on_demand - the number of noncontiguous memory regions that will be mapped into a virtually contiguous region
 * fmr_pool_size - the size of the Fast Memory Registration (FMR) pool (must be ≥ ntx/4)
 * fmr_flush_trigger - the dirty FMR pool flush trigger
 * fmr_cache - enable FMR caching (1 enables, 0 disables)

If no parameters are given, Lustre uses the driver's built-in default value for each of these options.



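Whether the defaults or the tuned values are in effect on a given host can be checked from sysfs. A small sketch, assuming the ko2iblnd module is loaded (the loop prints a notice otherwise):

```shell
# Print the module parameters ko2iblnd is actually running with.
# Parameter names are taken from the options line shown earlier.
dir=/sys/module/ko2iblnd/parameters
if [ -d "$dir" ]; then
    for p in peer_credits peer_credits_hiw credits concurrent_sends \
             ntx map_on_demand fmr_pool_size fmr_flush_trigger fmr_cache; do
        printf '%-18s %s\n' "$p" "$(cat "$dir/$p" 2>/dev/null)"
    done
else
    echo "ko2iblnd module not loaded"
fi
```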
Optimizations are applied automatically on detection of an Intel® high performance network interface. Some of the optimizations, such as FMR, are incompatible with other devices, such as Mellanox InfiniBand products using the MLX5 driver. FMR can be disabled by setting map_on_demand=0 (the default). The configuration file can be modified or deleted to meet the specific requirements of a given installation.
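On a host where FMR must stay off, one approach is to override the shipped settings from a separate, later-sorting file rather than editing the original. The file name below is hypothetical, and this sketch assumes the usual modprobe behavior that a later duplicate parameter takes precedence:

```
# /etc/modprobe.d/zz-ko2iblnd-local.conf  (hypothetical local override)
# Force FMR off on this host: map_on_demand=0 is the driver default and
# avoids the FMR path that is incompatible with mlx5-based devices.
options ko2iblnd-opa map_on_demand=0
```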

In general, the default ko2iblnd settings work well with Mellanox InfiniBand HCAs and no tuning is normally required. Architectural differences between Intel® fabrics and Mellanox hardware mean that setting universal defaults is very difficult. Intel® OPA and Intel® True Scale Fabric have an architecture that favors lightweight, high-frequency message-passing communications, whereas Mellanox has historically placed an emphasis on throughput-oriented workloads. Because Mellanox InfiniBand has historically been the dominant high-speed fabric, LNet driver development has naturally tended to align with this technology, aided by interfaces that are intended to support storage-like workloads. The settings above tune the LNet driver for communications on Intel® fabrics, when present.

Note: It is possible to use the ksocklnd driver on RDMA fabrics if there is an upper-level protocol that supports TCP/IP traffic, such as the IPoIB driver for InfiniBand fabrics. This use of ksocklnd on InfiniBand, RoCE, and Intel® OPA networks is not recommended because it will compromise the performance of LNet compared to the RDMA-based ko2iblnd, and can have a negative impact on the stability of the resulting network connection. Instead, it is strongly recommended that ko2iblnd is used wherever possible; it provides the highest performance with the lowest overheads on these fabrics.