Legacy LNet Active-Passive InfiniBand Bonding with ib-bond HCA Driver

From Lustre Wiki

Note: This article is superseded in later versions of Lustre that include Multi-Rail LNet support. The Multi-Rail LNet feature was introduced in Lustre 2.10.0.

For versions of Lustre that do not include Multi-Rail LNet support, LNet can still be configured to take advantage of bonded network interfaces when the underlying transport presents them as a single device. Be aware that devices using the OFED or in-kernel InfiniBand bonding drivers support only active-passive (failover) bonding, which means that only one physical interface is active at any given time. Thus, configuring multiple InfiniBand connections on a single fabric using the ib-bond kernel driver improves fault tolerance but does not increase throughput.

Refer to the Multi-Rail LNet article for information on the far more powerful and versatile networking functionality implemented in LNet since Lustre 2.10.0. Multi-Rail LNet offers multiple active network paths and is implemented in a manner that is fabric-agnostic. Lustre's native Multi-Rail LNet functionality allows data to be aggregated across multiple network transports simultaneously, and also provides fault tolerance features.

Note: A host can have multiple independent LNet interfaces configured and connected to separate networks (multi-homing), without requiring either bonding or Multi-Rail functionality. This allows servers to be directly connected to multiple fabrics simultaneously, and allows a Lustre client to mount file systems that are presented over different fabrics.

The ko2iblnd LND provides support for InfiniBand network device bonding in an active-passive configuration, for the purposes of high availability (HA). Because the bonded interface is active-passive, there is no improvement in throughput, so the feature is only suitable for situations where service availability is a mandated requirement (for example, mission-critical platforms).

With this form of bonding, the server actively uses one interface in the bonded group at a time. If the active interface fails, traffic fails over to the remaining interface in the bond group.

This form of InfiniBand bonding support is distinct from the use of bonded network interfaces with ksocklnd, which runs over TCP/IP sockets. The ksocklnd TCP/IP LNet driver does not distinguish between bonded and single interfaces, and no specific LNet configuration is required.

Enabling Active-Passive InfiniBand (o2ib) Bonding

To enable failover support in LNet for bonded InfiniBand interfaces (or other network interfaces supported by OFED), add the following option to the kernel module configuration:

options ko2iblnd dev_failover=1

The common convention is to create files in the directory /etc/modprobe.d containing options for loadable kernel modules.

With this option enabled, one can refer to the bonded network interface in the LNet configuration. For example:

options lnet networks=o2ib0(bond0)
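
After the modules have been loaded, it can be useful to confirm that the option took effect. The following is a minimal sketch, not a Lustre-provided tool: the function name module_param is hypothetical, and the sketch assumes that the ko2iblnd module exports dev_failover through the standard /sys/module/<module>/parameters/ directory, which the kernel only does for parameters declared with read permissions.

```shell
#!/bin/sh
# Hypothetical helper: print the current value of a loaded module's parameter.
# Arguments: module name, parameter name, optional sysfs root (default: /sys).
# Assumption: the parameter is exported through sysfs; parameters declared
# without read permissions do not appear there.
module_param() {
    param_file="${3:-/sys}/module/$1/parameters/$2"
    if [ -r "$param_file" ]; then
        cat "$param_file"
    else
        echo "parameter $2 of module $1 is not visible under ${3:-/sys}" >&2
        return 1
    fi
}
```

Under these assumptions, running module_param ko2iblnd dev_failover on a configured host would be expected to print the value set in the modprobe configuration.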

The following example, based on a RHEL / CentOS operating platform, illustrates a bonded network configuration for a Lustre system with two InfiniBand interfaces.

/etc/modprobe.d/lustre.conf:
alias ibbond bonding
options lnet networks=o2ib0(ibbond)
options ko2iblnd dev_failover=1

/etc/sysconfig/network-scripts/ifcfg-ibbond:
DEVICE=ibbond
BOOTPROTO=none
IPADDR=10.0.0.11
NETMASK=255.255.0.0
ONBOOT=yes
TYPE=Bonding
USERCTL=no
MTU=2044
BONDING_OPTS="mode=1 miimon=100 primary=ib0"

/etc/sysconfig/network-scripts/ifcfg-ib0:
DEVICE=ib0
USERCTL=no
ONBOOT=yes
MASTER=ibbond
SLAVE=yes
BOOTPROTO=none
TYPE=InfiniBand

/etc/sysconfig/network-scripts/ifcfg-ib1:
DEVICE=ib1
USERCTL=no
ONBOOT=yes
MASTER=ibbond
SLAVE=yes
BOOTPROTO=none
TYPE=InfiniBand

The ibbond alias name in the sample /etc/modprobe.d/lustre.conf configuration file is arbitrary, but is more descriptive than e.g. bond0. It is common to encounter installations where there are both bonded Ethernet and bonded IB interfaces on the same host, and choosing a descriptive naming convention simplifies administration of the machines.
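
Once the bond is up, the Linux bonding driver reports its state in a status file under /proc/net/bonding/. The following is a minimal sketch, assuming the ibbond device name from the example above, of a helper that prints the interface currently carrying traffic. The function name active_slave is hypothetical and is not part of Lustre or the bonding driver.

```shell
#!/bin/sh
# Hypothetical helper: print the currently active slave of a bonded interface.
# Argument 1: path to the bonding status file (defaults to
#             /proc/net/bonding/ibbond, the device name used in this example).
active_slave() {
    status_file="${1:-/proc/net/bonding/ibbond}"
    # The bonding driver reports a line of the form:
    #   Currently Active Slave: ib0
    sed -n 's/^Currently Active Slave: *//p' "$status_file"
}
```

With the sample configuration, active_slave would be expected to print ib0 while the primary link is healthy, and ib1 after a failover.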

Restrictions for Legacy ib-bond LNet Topologies

If the version of Lustre does not natively support multi-rail topologies, i.e., multiple network interfaces connected to the same subnet, attempts to assign two interfaces to the same LNet will fail.

For example:

options lnet networks="tcp0(eth0),tcp0(eth1)"

The above configuration is rejected as invalid: the kernel module loads, but the attempt to start the network fails. The following transcript shows the behavior when this unsupported configuration is attempted:

[root@rh7z-pe ~]# modprobe -v lnet
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/net/lustre/libcfs.ko 
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/net/lustre/lnet.ko networks="tcp0(eth0),tcp0(eth1)" 
[root@rh7z-pe ~]# lctl network up
LNET configure error 22: Invalid argument

The kernel ring buffer will have a record of the error reported by the LNet driver, for example:

[root@rh7z-pe ~]# dmesg | tail -1
[ 6620.324053] LNetError: 111-1: Duplicate network specified: tcp

The same error is also recorded in the system log:

[root@rh7z-pe ~]# tail -1 /var/log/messages
Feb 21 21:11:28 rh7z-pe kernel: LNetError: 111-1: Duplicate network specified: tcp
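
A duplicate network can be caught before the module is loaded. The following is a minimal sketch of such a pre-flight check; the function name duplicate_lnets is hypothetical and not part of Lustre. It assumes the simple name(interface) syntax shown in this article and compares network names literally, so it does not recognize that LNet treats tcp and tcp0 as the same network.

```shell
#!/bin/sh
# Hypothetical pre-flight check: print every LNet network name that appears
# more than once in a "networks" string such as tcp0(eth0),tcp0(eth1).
# Assumption: names are compared literally ("tcp" and "tcp0" are treated as
# different strings here, although LNet considers them the same network).
duplicate_lnets() {
    # 1. strip the "(interface,...)" part of each entry
    # 2. split the remaining comma-separated names onto separate lines
    # 3. remove stray spaces
    # 4. keep only names that occur more than once
    printf '%s\n' "$1" |
        sed 's/([^)]*)//g' |
        tr ',' '\n' |
        tr -d ' ' |
        sort | uniq -d
}
```

For the configuration above, duplicate_lnets "tcp0(eth0),tcp0(eth1)" prints tcp0; a valid string such as "tcp0(eth0),o2ib0(ibbond)" produces no output.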

Similarly, one cannot specify multiple interfaces within the parentheses of a single LNet network entry. In the following example, only the first interface, eth0, is used to create a NID for the host; the second interface, eth1, is ignored without any error being reported:

# eth0 inet 192.168.207.2/24
# eth1 inet 192.168.207.111/24
# options lnet networks="tcp0(eth0,eth1)"
[root@rh7z-pe ~]# modprobe -v lnet
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/net/lustre/libcfs.ko 
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/net/lustre/lnet.ko networks="tcp0(eth0,eth1)" 
[root@rh7z-pe ~]# lctl network up
LNET configured
[root@rh7z-pe ~]# lctl list_nids
192.168.207.2@tcp

To take advantage of complex topologies and to aggregate performance across multiple network interfaces, use a version of Lustre that includes the Multi-Rail LNet feature (Lustre 2.10.0 or later).