Starting and Stopping LNet

LNet, like most of the services that comprise a Lustre file system, runs in the Linux kernel and is incorporated as a kernel module. LNet is started in two steps:


1. Load the kernel modules
2. Start the services

The lnet kernel module can be loaded directly with the modprobe command or indirectly by loading a kernel module that depends on it. In normal operation, the lnet module is loaded indirectly as a consequence of starting a Lustre service, e.g. by mounting a file system on a client. However, one can treat LNet as independent of Lustre and start it on its own. This is useful for testing and debugging, and for verifying at boot time that the network is configured correctly before committing to loading the higher-level services (i.e. Lustre).

To load the LNet kernel module, run:

 modprobe [-v] lnet

The -v flag is optional and provides verbose output, which is useful for debugging but is normally omitted. For example:

 [root@rh7z-pe ~]# modprobe -v lnet
 insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/net/lustre/libcfs.ko
 insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/net/lustre/lnet.ko networks="tcp0(eth1)"

Notice that a second module, called libcfs, was also loaded. The libcfs module provides an API used throughout Lustre and LNet, with primitives for things like process management, memory management, and debugging.

After the module is loaded, the LNet service needs to be started:

 lctl network up
 # or
 lctl network configure

The lctl command works on all versions of Lustre, and prior to version 2.7, it is the only way to manually start LNet. In Lustre 2.7 and onward, there is also the lnetctl utility:

 lnetctl lnet configure [--all]

The lnetctl command will not automatically configure networks that are specified in the kernel module parameters; the LNet service will start, but the interfaces will not be configured. Supplying the --all flag will cause all of the networks defined as kernel module options to be loaded and started.
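As a minimal sketch of the two start-up methods described above, the following shell function uses lnetctl when it is available and falls back to lctl otherwise. The function name configure_lnet is hypothetical, and the sketch assumes that it is run as root after the lnet module has been loaded.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): start the LNet service using
# whichever utility is present on the host.
configure_lnet() {
    if command -v lnetctl >/dev/null 2>&1; then
        # Lustre 2.7 and later: --all also configures the networks
        # defined in the kernel module options.
        lnetctl lnet configure --all
    else
        # Earlier releases: lctl is the only manual method.
        lctl network up
    fi
}
```

Calling configure_lnet after modprobe lnet should leave the networks from the module options configured in either case.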

To view the loaded configuration:

 lctl list_nids
 # or for dynamic lnet in Lustre 2.7+
 lnetctl net show [--verbose]
 lnetctl export

The lnetctl export command is equivalent to lnetctl net show --verbose.
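The load, start, and view steps can be combined into a simple boot-time check of the kind mentioned earlier. The following is a sketch only: the function name start_and_verify_lnet is hypothetical, and it assumes root privileges and that the networks are defined in the lnet kernel module options.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): load LNet on its own, start it,
# and confirm that at least one NID has been configured.
start_and_verify_lnet() {
    modprobe lnet || return 1                  # step 1: load the kernel modules
    lctl network up >/dev/null || return 1     # step 2: start the LNet service
    nids=$(lctl list_nids) || return 1         # view the loaded configuration
    if [ -z "$nids" ]; then
        echo "LNet started but no NIDs are configured" >&2
        return 1
    fi
    printf '%s\n' "$nids"
}
```

On success the function prints the configured NIDs, one per line; a non-zero exit status indicates that LNet should be investigated before the higher-level Lustre services are started.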

To shut down LNet and unload the kernel modules, first stop the LNet networks on the host:

 lctl network down
 # or
 lctl network unconfigure

Then use the lustre_rmmod command to unload the kernel modules:

 lustre_rmmod

One can also unload the modules by using rmmod directly:

 rmmod lnet
 rmmod libcfs

However, lustre_rmmod is the recommended method for unloading Lustre and LNet kernel modules, because it checks for dependencies and spares the systems administrator from having to identify all of the modules to unload and the correct sequence for doing so.
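The shutdown sequence can be sketched as a single shell function. The name stop_lnet is hypothetical. Because the exit status of lustre_rmmod is not described here, the sketch does not rely on it and instead checks the lsmod output to confirm that the lnet module is gone.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): stop the LNet networks, unload
# the modules, and confirm that lnet is no longer loaded.
stop_lnet() {
    lctl network down || return 1    # stop the LNet networks on this host
    lustre_rmmod                     # unload the modules in dependency order
    # Verify the result from lsmod rather than from the exit status above.
    if lsmod | grep -q '^lnet '; then
        echo "lnet module is still loaded" >&2
        return 1
    fi
}
```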

LNet can also be loaded indirectly, as a dependency of the lustre kernel module. If LNet is loaded in this way, its start-up behavior is different, because the LNet networks defined in the kernel module options are automatically configured and brought online. This is easily illustrated by loading the lustre module:

 [root@rh7z-pe ~]# modprobe -v lustre
 insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/net/lustre/libcfs.ko
 insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/net/lustre/lnet.ko networks="tcp0(eth1)"
 insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/obdclass.ko
 insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/ptlrpc.ko
 insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/fld.ko
 insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/fid.ko
 insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/lov.ko
 insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/mdc.ko
 insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/lmv.ko
 insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/lustre.ko

Notice that the lnet module is loaded as a dependency. The console and kernel ring buffer output will look something like this:

 [266699.213610] LNet: HW CPU cores: 2, npartitions: 1
 [266699.232630] alg: No test for adler32 (adler32-zlib)
 [266699.234184] alg: No test for crc32 (crc32-table)
 [266707.286906] Lustre: Lustre: Build Version: jenkins-arch=x86_64,build_type=server,distro=el7,ib_stack=inkernel-40--PRISTINE-3.10.0-327.13.1.el7_lustre.x86_64
 [266707.338890] LNet: Added LNI 192.168.207.2@tcp [8/256/0/180]
 [266707.339851] LNet: Accept secure, port 988

As can be seen in the above output, the LNet networks were automatically loaded.

The lustre_rmmod behavior is also different in this circumstance, compared to loading LNet on its own. If the administrator loads and configures LNet on its own, independently of the lustre module, then it is necessary to unconfigure the LNet networks before removing the kernel modules:

 [root@rh7z-pe ~]# modprobe lnet
 [root@rh7z-pe ~]# lctl network up
 LNET configured
 [root@rh7z-pe ~]# lctl list_nids
 192.168.207.2@tcp
 [root@rh7z-pe ~]# lustre_rmmod
 Modules still loaded:
 lnet/klnds/socklnd/ksocklnd.o lnet/lnet/lnet.o libcfs/libcfs/libcfs.o
 [root@rh7z-pe ~]# lctl network down
 LNET ready to unload
 [root@rh7z-pe ~]# lustre_rmmod
 [root@rh7z-pe ~]# lsmod |grep lnet

However, if the lnet module is loaded indirectly, as a dependency of the lustre kernel module, then lustre_rmmod will gracefully unload all modules, including lnet:

 [root@rh7z-pe ~]# modprobe lustre
 [root@rh7z-pe ~]# lctl list_nids
 192.168.207.2@tcp
 [root@rh7z-pe ~]# lustre_rmmod
 [root@rh7z-pe ~]# lsmod | grep -E lnet\|lustre

This behavior is consistent, but not entirely intuitive. The reason for this behavior has to do with a special function of LNet: routing. LNet routing enables a node that is connected to more than one LNet fabric to route traffic between the networks. LNet routing is a complex topic and is not discussed in this article. For more information on LNet routers, see:

http://www.intel.com/content/www/us/en/software/configuring-lnet-routers-file-systems-lustre-guide.html

Because routing is a function of the network, not of the Lustre file system itself, lustre_rmmod effectively assumes that a host with only the lnet module loaded and running is providing routing services. It will therefore refuse to unload the modules unless the LNet service is explicitly unconfigured.

If, on the other hand, the lustre kernel module is also loaded, and there are no file systems mounted, then lustre_rmmod will assume that the host is either an idle server or an idle client and will unload the entire stack, including the lnet modules.
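The distinction between the two cases can be sketched as a small filter over lsmod output. The function name lnet_is_standalone is hypothetical, and the sketch mirrors only the behavior described in this article, not the internal logic of lustre_rmmod.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): read lsmod output on standard
# input and succeed only if lnet is loaded and lustre is not.
lnet_is_standalone() {
    awk '$1 == "lnet"   { lnet = 1 }
         $1 == "lustre" { lustre = 1 }
         END { exit !(lnet && !lustre) }'
}

# Example use (requires root and a host with Lustre installed):
#   if lsmod | lnet_is_standalone; then
#       lctl network down    # unconfigure LNet first, as for a router
#   fi
#   lustre_rmmod
```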

If a Lustre OSD is mounted on a host, then the lustre_rmmod command will not unload the Lustre kernel modules and will report an error:

 [root@rh7z-mds1 ~]# df -ht lustre
 File system     Size  Used Avail Use% Mounted on
 mgspool/mgt     960M  2.2M  956M   1% /lfs/mgt

 [root@rh7z-mds1 ~]# lustre_rmmod
   0 UP osd-zfs MGS-osd MGS-osd_UUID 5
   1 UP mgs MGS MGS 7
   2 UP mgc MGC192.168.227.11@tcp1 c2108a9c-a62f-6626-48e4-68f1caf1bce3 5
 Modules still loaded:
 lustre/mgs/mgs.o lustre/mgc/mgc.o lustre/quota/lquota.o lustre/fid/fid.o lustre/fld/fld.o lnet/klnds/socklnd/ksocklnd.o lustre/ptlrpc/ptlrpc.o lustre/obdclass/obdclass.o lnet/lnet/lnet.o libcfs/libcfs/libcfs.o

 [root@rh7z-mds1 ~]# df -ht lustre
 File system     Size  Used Avail Use% Mounted on
 mgspool/mgt     960M  2.2M  956M   1% /lfs/mgt

From this example, it can be seen that because the MGT is mounted, lustre_rmmod takes no action to remove the kernel modules. Instead, it lists the active services running on the host and exits, leaving the MGT mounted and the MGS running. The lustre_rmmod command is therefore a useful tool for ensuring the correct and safe unloading of Lustre kernel modules.
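An administrator who wants to see what is holding the modules in place before running lustre_rmmod can list the mounted Lustre targets first. The following is a sketch under stated assumptions: both function names are hypothetical, and the optional file argument exists only so that the check can be exercised against a saved copy of /proc/mounts.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): read mount table entries on
# standard input and print the mount point of every entry of type lustre.
lustre_mount_points() {
    awk '$3 == "lustre" { print $2 }'
}

# Hypothetical helper: call lustre_rmmod only when nothing of type lustre
# is mounted. The mount table defaults to /proc/mounts.
unload_if_idle() {
    mounts=$(lustre_mount_points < "${1:-/proc/mounts}")
    if [ -n "$mounts" ]; then
        echo "Lustre still mounted at: $mounts" >&2
        return 1
    fi
    lustre_rmmod
}
```

This is a convenience only; lustre_rmmod performs its own checks, as the example above shows.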