LNet Configuration Edge Case Behaviors and Side-Effects

From Lustre Wiki

If no explicit configuration is supplied to LNet, either through a modprobe options file or a YAML description for lnetctl, LNet will attempt to create a valid TCP/IP (socklnd) NID for tcp0 on the first network interface detected by the operating system (e.g. eth0) when the module is loaded and the LNet service is started.

The order of interface detection is entirely at the discretion of the operating system. There is therefore no guarantee that interfaces will be enumerated in the same order after a reboot or after a new hardware device is installed, and the default behavior of a host can differ depending on its hardware configuration. Most operating systems do try to ensure that a device keeps the same name (eth0, eth1, etc.) across reboots once it has been detected. Nevertheless, it is strongly recommended that all configuration be stated explicitly: defining the configuration also defines the expected behavior of the system, which makes it easier to audit.
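The following is a hypothetical Python model of the default behavior, not LNet code. It assumes the OS reports interfaces as an ordered list of (name, IPv4 address) pairs that already excludes interfaces LNet would not consider (such as loopback). It shows that the resulting NID depends only on which interface happens to come first.

```python
from typing import List, Optional, Tuple

Iface = Tuple[str, str]  # (interface name, IPv4 address), in OS detection order


def default_tcp0_nid(detected: List[Iface]) -> Optional[Tuple[str, str]]:
    """Hypothetical model: with no explicit configuration, tcp0 is built on
    the first detected interface. Assumes `detected` already excludes
    interfaces LNet would skip (for example loopback)."""
    if not detected:
        return None  # nothing to configure
    name, ip = detected[0]
    # A tcp0 NID is written as <address>@tcp; the network number 0 is implied.
    return f"{ip}@tcp", name


boot1 = [("eth0", "10.0.0.1"), ("eth1", "192.168.1.1")]
boot2 = list(reversed(boot1))  # same hardware, different detection order
print(default_tcp0_nid(boot1))  # ('10.0.0.1@tcp', 'eth0')
print(default_tcp0_nid(boot2))  # ('192.168.1.1@tcp', 'eth1')
```

The same hardware yields two different NIDs purely because of enumeration order. This is why an explicit networks or ip2nets setting is preferable.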

The ip2nets option for the LNet kernel module is a list of pairs separated by semicolons. Each pair consists of a network definition and an IP address pattern, and the pairs are processed in sequence. If a local IP address matches a pattern, that network definition is used for the node, and any later pairs for the same network are ignored. Several different networks can be matched in this way.

For example:

ip2nets="tcp(eth2) 134.32.1.[4-10/2]; tcp(eth1) *.*.*.*"

This set of rules creates network tcp0 (the 0 is implied because the LNet network number is omitted). If a local IP address matches 134.32.1.[4-10/2], that is, if it is one of 134.32.1.4, 134.32.1.6, 134.32.1.8, or 134.32.1.10, then tcp0 is created on interface eth2. Otherwise the second pair is used. Because "*.*.*.*" matches every address, tcp0 is then created on eth1.
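The rule-processing order can be made concrete with a simplified Python model of how an ip2nets string might be evaluated against a host's local IP addresses. It follows the description above: pairs are tried in order, the first matching pair for a network wins, and later pairs for that network are skipped. This is an illustrative sketch, not LNet's parser, and the helper names are hypothetical. It assumes only the pattern forms used on this page (literal octets, `*`, and `[lo-hi/step]` ranges) and ignores any other syntax the real module may accept.

```python
import re
from typing import Dict, List, Optional


def octet_matches(token: str, value: int) -> bool:
    """Match one octet against '*', a literal such as '32', or '[lo-hi/step]'."""
    if token == "*":
        return True
    if token.startswith("["):
        body, _, step = token.strip("[]").partition("/")
        lo, hi = (int(x) for x in body.split("-"))
        return lo <= value <= hi and (value - lo) % int(step or 1) == 0
    return int(token) == value


def ip_matches(pattern: str, ip: str) -> bool:
    tokens, octets = pattern.split("."), [int(o) for o in ip.split(".")]
    return len(tokens) == len(octets) == 4 and all(
        octet_matches(t, o) for t, o in zip(tokens, octets))


def normalize_net(name: str) -> str:
    """'tcp' -> 'tcp0': a missing network number means 0."""
    m = re.fullmatch(r"(.*?)(\d*)", name)
    return m.group(1) + (m.group(2) or "0")


def evaluate_ip2nets(spec: str, local_ips: List[str]) -> Dict[str, Optional[str]]:
    """Map each configured network to the interface named in its rule (or None)."""
    chosen: Dict[str, Optional[str]] = {}
    for rule in filter(None, (r.strip() for r in spec.split(";"))):
        net_spec, *patterns = rule.split()
        m = re.fullmatch(r"([^()]+)(?:\(([^)]*)\))?", net_spec)
        net, iface = normalize_net(m.group(1)), m.group(2)
        if net in chosen:
            continue  # an earlier pair already configured this network
        # Any local address can satisfy the pattern, whichever interface owns it.
        if any(ip_matches(p, ip) for p in patterns for ip in local_ips):
            chosen[net] = iface
    return chosen


spec = "tcp(eth2) 134.32.1.[4-10/2]; tcp(eth1) *.*.*.*"
print(evaluate_ip2nets(spec, ["134.32.1.6"]))  # {'tcp0': 'eth2'}
print(evaluate_ip2nets(spec, ["134.32.1.5"]))  # {'tcp0': 'eth1'}
```

Here 134.32.1.5 falls inside 4-10 but is not on the step of 2, so the first pair does not match and the catch-all second pair supplies tcp0 on eth1.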

Note that ip2nets matches against the IP addresses configured on the host, not against the interface named in the rule. It does not verify that the matched IP address belongs to the network interface given in the specification, so a pattern can match the address of an interface that will not actually be used for LNet communications. In the example above, if a host has an interface eth3 with IP address 134.32.1.4, that match is sufficient to trigger the creation of the NID on tcp0(eth2).
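This caveat becomes visible when you track which interface's address actually satisfied the pattern. In the hypothetical sketch below, the host's interfaces are a dict mapping name to IPv4 address, and patterns are assumed to use only literal octets, `*`, and `[lo-hi/step]` ranges. The rule still configures eth2 even though only eth3's address matched, as described in the paragraph above. This is a model of that description, not LNet code.

```python
from typing import Dict, List, Optional, Tuple


def octet_matches(token: str, value: int) -> bool:
    """Match one octet against '*', a literal, or '[lo-hi/step]'."""
    if token == "*":
        return True
    if token.startswith("["):
        body, _, step = token.strip("[]").partition("/")
        lo, hi = (int(x) for x in body.split("-"))
        return lo <= value <= hi and (value - lo) % int(step or 1) == 0
    return int(token) == value


def ip_matches(pattern: str, ip: str) -> bool:
    tokens, octets = pattern.split("."), [int(o) for o in ip.split(".")]
    return len(tokens) == len(octets) == 4 and all(
        octet_matches(t, o) for t, o in zip(tokens, octets))


def explain_match(rule_iface: str, pattern: str,
                  host: Dict[str, str]) -> Tuple[Optional[str], List[str]]:
    """Return (interface that gets the NID, interfaces whose address matched).

    The match is made on addresses only; nothing checks that rule_iface is
    one of the interfaces whose address matched."""
    matched_via = [name for name, ip in host.items() if ip_matches(pattern, ip)]
    return (rule_iface if matched_via else None), matched_via


host = {"eth1": "10.1.1.1", "eth2": "10.2.2.2", "eth3": "134.32.1.4"}
print(explain_match("eth2", "134.32.1.[4-10/2]", host))  # ('eth2', ['eth3'])
```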

Also, if no device is specified in an ip2nets definition, LNet uses the first available device rather than the device whose IP address matches the pattern. For example, consider a host with a 10.70.x.x address (/16) on eth0 and a 192.168.x.x address (/24) on eth1. The following ip2nets definition creates a NID on eth0, even though the pattern matches the address on eth1:

options lnet ip2nets="tcp0 192.168.*.*"

If, instead, the device is included in the spec, then the configuration will be applied to eth1:

options lnet ip2nets="tcp0(eth1) 192.168.*.*"

The device-less definition is therefore interpreted as follows: if the host has any IP address matching 192.168.*.*, configure a socklnd NID on the first interface found. In this respect, it is consistent with the behavior of the much simpler networks syntax in the following example:

options lnet networks="tcp0"

Because no device is specified, this example creates a NID on the first network device detected by the operating system. As with the ip2nets parameter, leaving out the network interface means that LNet falls back to the first interface the host operating system detected.
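The three cases above (ip2nets without a device, ip2nets with a device, and plain networks="tcp0") can be summarized in one hypothetical Python function. It sketches the behavior as described on this page and is not LNet code. It assumes the host reports interfaces as an ordered list of (name, IPv4 address) pairs and supports only literal and `*` octets. It also does not model any check between a named interface and the address that matched.

```python
from typing import List, Optional, Tuple

Iface = Tuple[str, str]  # (name, IPv4 address), in OS detection order


def ip_matches(pattern: str, ip: str) -> bool:
    """Simplified matcher: each octet is '*' or a literal number."""
    tokens, octets = pattern.split("."), ip.split(".")
    return len(tokens) == len(octets) == 4 and all(
        t == "*" or int(t) == int(o) for t, o in zip(tokens, octets))


def pick_interface(detected: List[Iface], pattern: Optional[str] = None,
                   named_iface: Optional[str] = None) -> Optional[str]:
    """Which interface receives the tcp0 NID.

    pattern=None models networks="tcp0"; otherwise an ip2nets address pattern.
    named_iface models an explicit device, e.g. tcp0(eth1)."""
    if pattern is not None and not any(ip_matches(pattern, ip) for _, ip in detected):
        return None  # no local address matches, so this rule configures nothing
    if named_iface is not None:
        return named_iface
    # No device given: fall back to the first detected interface, regardless
    # of which interface's address actually matched the pattern.
    return detected[0][0] if detected else None


host = [("eth0", "10.70.1.5"), ("eth1", "192.168.1.5")]
print(pick_interface(host, "192.168.*.*"))          # eth0
print(pick_interface(host, "192.168.*.*", "eth1"))  # eth1
print(pick_interface(host))                         # eth0
```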

If both an interface and a pattern are specified, the interface whose IP address matches the pattern is compared against the explicitly named interface. For example, suppose the ip2nets definition is "tcp(eth0) 192.168.*.3" and the host has a device eth0 with IP address 192.0.19.3 and a device eth1 with IP address 192.168.3.3. Configuration then fails, because the pattern matches eth1 and so contradicts the interface specified. A clear warning is displayed when such an inconsistent configuration is encountered.
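Below is a minimal Python sketch of the consistency check described above. It assumes the host's interfaces are given as a name-to-address dict and that patterns use only literal and `*` octets. The function name is hypothetical, and raising an exception is only an illustrative stand-in for LNet's failure and warning. This is not LNet's implementation and does not reproduce its warning text.

```python
from typing import Dict, Optional


def ip_matches(pattern: str, ip: str) -> bool:
    """Simplified matcher: each octet is '*' or a literal number."""
    tokens, octets = pattern.split("."), ip.split(".")
    return len(tokens) == len(octets) == 4 and all(
        t == "*" or int(t) == int(o) for t, o in zip(tokens, octets))


class InconsistentConfig(Exception):
    """Stands in for LNet refusing a contradictory rule (hypothetical)."""


def check_named_iface(named_iface: str, pattern: str,
                      host: Dict[str, str]) -> Optional[str]:
    """Return the named interface if the pattern matches its address."""
    matched = [name for name, ip in host.items() if ip_matches(pattern, ip)]
    if not matched:
        return None  # pattern matches nothing: the rule does not apply
    if named_iface not in matched:
        raise InconsistentConfig(
            f"{pattern} matches {matched}, which contradicts {named_iface}")
    return named_iface


host = {"eth0": "192.0.19.3", "eth1": "192.168.3.3"}
print(check_named_iface("eth1", "192.168.*.3", host))  # eth1
try:
    check_named_iface("eth0", "192.168.*.3", host)
except InconsistentConfig as err:
    print("configuration fails:", err)
```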

If the LNet network number in a NID is 0 (zero), as in tcp0 or o2ib0, the number is sometimes omitted from command output and can usually be omitted from configuration files as well. Omitting it is not recommended, however: for clarity alone, supply as much information as is reasonable when creating configuration.
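When comparing configuration files or command output, it can help to normalize network names so that an omitted number is treated as 0. A small hypothetical Python helper, not part of any Lustre tool, might look like the sketch below. The type name itself can contain digits, as in o2ib, so the helper treats only the trailing digits as the network number.

```python
import re


def normalize_net(name: str) -> str:
    """Return the network name with an explicit number: 'tcp' -> 'tcp0'.

    Only trailing digits are the network number, so 'o2ib' keeps its '2'."""
    m = re.fullmatch(r"([a-z0-9]*?)(\d*)", name)
    if m is None or not m.group(1):
        raise ValueError(f"not a network name: {name!r}")
    return m.group(1) + (m.group(2) or "0")


for net in ("tcp", "tcp0", "tcp1", "o2ib", "o2ib0"):
    print(net, "->", normalize_net(net))
```

Treating "tcp" and "tcp0" as the same key avoids false differences when auditing configurations written with and without the explicit number.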