LNet Configuration Edge Case Behaviors and Side-Effects

If no explicit configuration is supplied to LNet, either through a modprobe options file or a YAML description, it will attempt to create a valid TCP/IP NID using the first network interface detected by the operating system when the module is loaded and the LNet service is started.

The order of interface detection is entirely at the discretion of the operating system, so there is no guarantee that the ordering of interfaces will be preserved between reboots or after a new hardware device is inserted. It also means that the default behavior for a host will differ depending on its hardware configuration. Most operating systems do try to ensure that a device, once detected, keeps the same device name (eth0, eth1, etc.) between reboots. Nevertheless, it is strongly recommended that all configuration be stated explicitly: defining the configuration also defines the expected behavior of the system, making it easier to audit.

The ip2nets option for the LNet kernel module is a list of network definition and IP-match pairs. These pairs are processed in sequence. If there is a match for a local IP address, then that network definition is used for the node, and further pairs for that network are ignored. Multiple networks can be matched.

For example:

 ip2nets="tcp(eth2) 134.32.1.[4-10/2]; tcp(eth1) *.*.*.*"

This set of rules is used to create network tcp0 (the 0 is implied, because the LNet network number is omitted). If a local IP address matches the first pattern, meaning it is one of 134.32.1.4, 134.32.1.6, 134.32.1.8, or 134.32.1.10, then tcp0 is created using interface eth2. Otherwise the second pair is used, and because "*.*.*.*" matches every address, it always creates tcp0 on eth1.
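The rule processing described above can be modeled in a few lines of Python. This is a simplified sketch for illustration, not LNet's actual parser: it assumes one IP pattern per rule and supports only the pattern forms that appear in the example (literal octets, "*", and "[lo-hi/step]" ranges). The function names ip_matches, parse_ip2nets, and resolve are hypothetical.

```python
import re

# "[4-10/2]" style octet range; the "/step" part is optional.
RANGE = re.compile(r"^\[(\d+)-(\d+)(?:/(\d+))?\]$")
# Network name with an optional "(device)" suffix, e.g. "tcp(eth2)".
NET = re.compile(r"^(\w+)(?:\((\w+)\))?$")


def octet_matches(expr, value):
    """Return True if one octet expression accepts the octet value."""
    if expr == "*":
        return True
    m = RANGE.match(expr)
    if m:
        lo, hi = int(m.group(1)), int(m.group(2))
        step = int(m.group(3) or 1)
        if step < 1:
            return False
        return lo <= value <= hi and (value - lo) % step == 0
    return expr.isdigit() and int(expr) == value


def ip_matches(pattern, ip):
    """Match a dotted-quad IP address against a four-octet pattern."""
    pat, octets = pattern.split("."), ip.split(".")
    if len(pat) != 4 or len(octets) != 4:
        return False
    return all(octet_matches(p, int(o)) for p, o in zip(pat, octets))


def parse_ip2nets(spec):
    """Split an ip2nets string into (network, device, pattern) rules."""
    rules = []
    for chunk in spec.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        net_part, pattern = chunk.split()
        m = NET.match(net_part)
        if m is None:
            raise ValueError("cannot parse network definition: " + net_part)
        net, device = m.group(1), m.group(2)
        if not net[-1].isdigit():
            net += "0"  # an omitted network number implies 0
        rules.append((net, device, pattern))
    return rules


def resolve(spec, local_ips):
    """Return {network: device}, using the first matching rule per network."""
    chosen = {}
    for net, device, pattern in parse_ip2nets(spec):
        if net in chosen:
            continue  # later pairs for an already matched network are ignored
        if any(ip_matches(pattern, ip) for ip in local_ips):
            chosen[net] = device
    return chosen


SPEC = "tcp(eth2) 134.32.1.[4-10/2]; tcp(eth1) *.*.*.*"
print(resolve(SPEC, ["134.32.1.8"]))  # {'tcp0': 'eth2'}
print(resolve(SPEC, ["134.32.1.5"]))  # {'tcp0': 'eth1'}
```

Note that resolve() takes a flat list of local addresses: in this model, as in the text, a rule is matched against the host's addresses rather than against the address of the named device.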

Note that ip2nets uses the IP address definition to match the host, not the interface. The ip2nets definition does not verify or otherwise qualify that the matched IP address is associated with the physical network interface named in the specification. This means that a pattern can match the IP address of an interface that will not actually be used for LNet communications. In the example above, if a host has an interface other than eth2 whose IP address matches the first pattern, that is still considered a match good enough to trigger the creation of the NID on eth2.

Also, if the device is not specified in an ip2nets definition, LNet will pick up the first available device rather than the device that matches the IP address pattern. For example, if the IP address pattern matches the IP address on eth1, but no device is mentioned in the ip2nets definition, then eth0 will get an LNet configuration. As an illustration, consider a host with an address that does not match the pattern on eth0, and a 192.168.x.x address on eth1. The following ip2nets definition will create a NID on eth0, even though the pattern matches the eth1 device:

 options lnet ip2nets="tcp0 192.168.*.*"

If, instead, the device is included in the spec, then the configuration will be applied to eth1:

 options lnet ip2nets="tcp0(eth1) 192.168.*.*"

A definition that omits the device is interpreted as follows: configure the first socklnd NID found on any host that has an IP address matching the pattern. In this respect, it is consistent with the behavior of the much simpler networks syntax in the following example:

 options lnet networks="tcp0"

This example creates a NID on the first network device detected by the operating system, because no device was specified. In common with the ip2nets parameter, the lack of definition of a specific network interface means that LNet will configure the first interface that was detected by the host operating system.
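The device-selection behavior described above can be summarized with a small model. This is a sketch under stated assumptions, not LNet code: the function names and the host's 10.0.0.5 address are hypothetical, the matcher handles only literal octets and "*", and no consistency check between the pattern and the named device is modeled.

```python
from typing import List, Optional, Tuple


def simple_match(pattern: str, ip: str) -> bool:
    """Match a dotted-quad pattern made of literal octets and '*' only."""
    p, o = pattern.split("."), ip.split(".")
    return len(p) == len(o) == 4 and all(a == "*" or a == b for a, b in zip(p, o))


def select_device(pattern: str,
                  device: Optional[str],
                  interfaces: List[Tuple[str, str]]) -> Optional[str]:
    """Pick the device that receives the NID for a single rule.

    interfaces lists (name, ip) pairs in operating system detection order.
    """
    # The pattern is tested against every local address, not one interface.
    if not any(simple_match(pattern, ip) for _, ip in interfaces):
        return None  # the rule does not apply to this host
    if device is not None:
        return device  # an explicitly named device is used as given
    return interfaces[0][0]  # no device named: first detected interface


# Hypothetical host: eth0 is detected first, eth1 carries the matching address.
host = [("eth0", "10.0.0.5"), ("eth1", "192.168.1.20")]

print(select_device("192.168.*.*", None, host))    # eth0
print(select_device("192.168.*.*", "eth1", host))  # eth1
```

The first call corresponds to ip2nets="tcp0 192.168.*.*" and the second to ip2nets="tcp0(eth1) 192.168.*.*".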

If an interface is explicitly specified as well as a pattern, the interface matched using the IP pattern will be compared against the explicitly defined interface. For example, if the ip2nets definition names one device, but the IP pattern matches only the address of a different device on the system, then configuration will fail, because the pattern contradicts the interface specified. A clear warning is displayed when an inconsistent configuration is encountered.

If the LNet number for a NID is 0 (zero), for example tcp0, the number will sometimes be omitted from command output, and can usually be omitted from configuration files as well. Omitting it is not recommended, however: for reasons of clarity alone, supply as much information as is reasonable when creating configuration information.