Starting and Stopping LNet
LNet, like most of the services that comprise a Lustre file system, runs in the Linux kernel and is incorporated as a kernel module. LNet is started in two steps:
- Load the kernel modules
- Start the services
The lnet kernel module can be loaded directly with the modprobe command, or indirectly by loading a kernel module that depends on LNet. In normal operation, the lnet module is loaded indirectly as a consequence of starting a Lustre service, e.g. by mounting a file system on a client. However, LNet can also be treated as independent of Lustre and started on its own. This is useful for testing and debugging, and for verifying that the network is configured correctly when a system boots, before committing to loading the higher-level services (i.e. Lustre).
To load the LNet kernel module, run:
modprobe [-v] lnet
The -v flag is optional and produces verbose output. It is useful for debugging but is normally omitted. For example:
# modprobe -v lnet
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/net/lustre/libcfs.ko
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/net/lustre/lnet.ko networks="tcp0(eth1)"
Notice that a second module, libcfs.ko, was also loaded. The libcfs module provides a support API used throughout Lustre and LNet, with primitives for process management, memory management, debugging, and similar tasks.
After the module is loaded, the LNet service needs to be started:
lctl network up # or lctl network configure
The lctl network command works on all versions of Lustre and, prior to version 2.7, is the only way to start LNet manually. Lustre 2.7 and later also provide the lnetctl command:
lnetctl lnet configure [--all]
Unlike lctl network up, the lnetctl lnet configure command does not automatically configure the networks specified in the kernel module parameters: the lnet service starts, but no interfaces are configured. Supplying the --all flag causes all of the networks defined as kernel module options to be configured and started.
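The networks that --all configures are taken from the lnet module's networks parameter, which is usually set in a modprobe configuration file. The modprobe -v output shown earlier, which passes networks="tcp0(eth1)" to lnet.ko, is what such an entry produces. The function below is a minimal sketch that writes one; the function name and the default file name /etc/modprobe.d/lustre.conf are assumptions (modprobe reads any *.conf file in /etc/modprobe.d), and the interface eth1 is simply taken from the example output.

```shell
#!/bin/sh
# Hypothetical helper: write an "options lnet networks=..." line so that
# modprobe passes the networks parameter whenever lnet.ko is loaded.
# $1: target file (default assumed: /etc/modprobe.d/lustre.conf)
# $2: LNet network specification (default: the tcp0(eth1) from the example)
# Note: the target file is overwritten, not appended to.
write_lnet_options() {
    conf="${1:-/etc/modprobe.d/lustre.conf}"
    networks="${2:-tcp0(eth1)}"
    printf 'options lnet networks="%s"\n' "$networks" > "$conf"
}
```

For example, running write_lnet_options /etc/modprobe.d/lustre.conf 'tcp0(eth1)' as root and then lnetctl lnet configure --all would be expected to bring up tcp0 on eth1.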
To view the loaded configuration:
lctl list_nids
# or, for dynamic LNet configuration in Lustre 2.7 and later:
lnetctl net show [--verbose]
lnetctl export
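The lnetctl export command includes the information shown by lnetctl net show --verbose, together with the rest of the LNet configuration (such as routes, peers, and global settings), in YAML form.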
To shut down LNet and unload the kernel modules, first stop the LNet networks on the host:
lctl network down # or lctl network unconfigure
Then use the lustre_rmmod command to unload the kernel modules:
lustre_rmmod
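Alternatively, the modules can be unloaded individually with rmmod, in reverse dependency order. Any LNet network driver (LND) module, such as ksocklnd for TCP networks, must be removed before lnet:
rmmod ksocklnd
rmmod lnet
rmmod libcfs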
lustre_rmmod is the recommended method for unloading Lustre and LNet kernel modules, because it checks for dependencies and removes the guesswork involved in identifying all of the modules to unload and the correct order in which to unload them.
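The standalone start and stop sequences described above can be collected into a pair of shell functions. This is a minimal sketch: the function names and the DRY_RUN switch are hypothetical conveniences, while the commands themselves are exactly the ones shown in this section.

```shell
#!/bin/sh
# Sketch of the standalone LNet lifecycle (LNet started without Lustre).
# DRY_RUN=1 prints each command instead of executing it; this switch is an
# illustration aid only and is not part of any Lustre tool.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

start_lnet() {
    run modprobe lnet || return 1     # step 1: load the libcfs and lnet modules
    run lctl network up || return 1   # step 2: bring up networks from module options
    run lctl list_nids                # verify: print this node's NIDs
}

stop_lnet() {
    run lctl network down || return 1 # unconfigure the LNet networks first
    run lustre_rmmod                  # then unload the modules in dependency order
}
```

On a real host both functions must run as root. Stopping the networks before calling lustre_rmmod matters for a standalone LNet, for the reasons explained later in this article.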
LNet can also be loaded indirectly, as a dependency of the lustre kernel module. When LNet is loaded this way, its start-up behavior is different: the LNet networks defined in the kernel module options are automatically configured and brought online. Loading the Lustre module illustrates this:
# modprobe -v lustre
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/net/lustre/libcfs.ko
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/net/lustre/lnet.ko networks="tcp0(eth1)"
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/obdclass.ko
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/ptlrpc.ko
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/fld.ko
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/fid.ko
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/lov.ko
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/mdc.ko
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/lmv.ko
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/lustre.ko
Notice that the
lnet.ko module is loaded as a dependency. The console and kernel ring buffer output will look something like this:
[266699.213610] LNet: HW CPU cores: 2, npartitions: 1
[266699.232630] alg: No test for adler32 (adler32-zlib)
[266699.234184] alg: No test for crc32 (crc32-table)
[266707.286906] Lustre: Lustre: Build Version: jenkins-arch=x86_64,build_type=server,distro=el7,ib_stack=inkernel-40--PRISTINE-3.10.0-327.13.1.el7_lustre.x86_64
[266707.338890] LNet: Added LNI [email protected] [8/256/0/180]
[266707.339851] LNet: Accept secure, port 988
As the above output shows, the LNet networks were configured automatically: the "Added LNI" message reports the network interface that was brought online.
The behavior of lustre_rmmod also differs in this case. If the administrator loads and configures LNet independently of the Lustre module, the LNet networks must be unconfigured before the kernel modules can be removed:
# modprobe lnet
# lctl network up
LNET configured
# lctl list_nids
[email protected]
# lustre_rmmod
Modules still loaded: lnet/klnds/socklnd/ksocklnd.o lnet/lnet/lnet.o libcfs/libcfs/libcfs.o
# lctl network down
LNET ready to unload
# lustre_rmmod
# lsmod |grep lnet
However, if the lnet module is loaded indirectly, as a dependency of the Lustre kernel module, then lustre_rmmod gracefully unloads all of the modules, including lnet:
# modprobe lustre
# lctl list_nids
[email protected]
# lustre_rmmod
# lsmod | grep -E lnet\|lustre
This behavior is consistent, but not entirely intuitive. The reason for this behavior has to do with a special function of LNet: routing. LNet routing enables a node that is connected to more than one LNet fabric to route traffic between the networks. LNet routing is a complex topic and is not discussed in this article. For more information on LNet routers, see:
Because routing is a function of the network, not of the Lustre file system itself,
lustre_rmmod will effectively assume that if a host has only the
lnet module loaded and running, then it is providing routing services.
lustre_rmmod will therefore refuse to unload the modules unless the
lnet service is explicitly unconfigured.
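Which of these two paths applies depends on which modules are loaded. The function below is a hypothetical helper, not part of lustre_rmmod and not a description of its implementation, that reads /proc/modules (or a file in the same format, for testing) and prints which of the unload behaviors described in this article to expect.

```shell
#!/bin/sh
# Hypothetical helper: classify the host by its loaded modules.
# $1: file in /proc/modules format (default: /proc/modules).
# The first field of each line in that format is the module name.
lnet_unload_hint() {
    mods="${1:-/proc/modules}"
    if awk '$1 == "lustre" { found = 1 } END { exit !found }' "$mods"; then
        echo "lustre loaded: lustre_rmmod unloads the whole stack if nothing is mounted"
    elif awk '$1 == "lnet" { found = 1 } END { exit !found }' "$mods"; then
        # Only LNet is present: lustre_rmmod treats the host as a possible router.
        echo "lnet only: run 'lctl network down' before lustre_rmmod"
    else
        echo "lnet not loaded"
    fi
}
```

The check is deliberately coarse: it looks only at module names, so it cannot tell whether the LNet networks have already been unconfigured.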
If, on the other hand, the lustre kernel module is also loaded and no file systems are mounted, then lustre_rmmod assumes that the host is an idle server or client and unloads the entire stack, including the lnet module.
If a Lustre OSD is mounted on a host, then the
lustre_rmmod command will not unload the Lustre kernel modules and will report an error:
# df -ht lustre
Filesystem      Size  Used Avail Use% Mounted on
mgspool/mgt     960M  2.2M  956M   1% /lfs/mgt
# lustre_rmmod
  0 UP osd-zfs MGS-osd MGS-osd_UUID 5
  1 UP mgs MGS MGS 7
  2 UP mgc [email protected] c2108a9c-a62f-6626-48e4-68f1caf1bce3 5
Modules still loaded: lustre/mgs/mgs.o lustre/mgc/mgc.o lustre/quota/lquota.o lustre/fid/fid.o lustre/fld/fld.o lnet/klnds/socklnd/ksocklnd.o lustre/ptlrpc/ptlrpc.o lustre/obdclass/obdclass.o lnet/lnet/lnet.o libcfs/libcfs/libcfs.o
# df -ht lustre
Filesystem      Size  Used Avail Use% Mounted on
mgspool/mgt     960M  2.2M  956M   1% /lfs/mgt
This example shows that, because the MGT is mounted, lustre_rmmod takes no action to remove the kernel modules. Instead, it lists the devices that are still active on the host and the modules that remain loaded, then exits. The MGT is still mounted and the MGS is still running. The lustre_rmmod command is therefore a useful tool for ensuring that Lustre kernel modules are unloaded safely and in the correct order.
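Because lustre_rmmod refuses to act while Lustre targets or clients are mounted, a script can check for such mounts first. They appear with file system type lustre, which is the same information df -ht lustre reports in the example. The sketch below assumes /proc/mounts is available; both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-check before calling lustre_rmmod.
# $1: file in /proc/mounts format (default: /proc/mounts).
# Fields: device, mount point, fs type, options, dump, pass.
lustre_mounts() {
    mounts="${1:-/proc/mounts}"
    awk '$3 == "lustre" { print $2 }' "$mounts"
}

# Returns 0 if no Lustre file systems or targets are mounted; otherwise
# lists the active mount points on stderr and returns 1.
lustre_unmounted() {
    active=$(lustre_mounts "${1:-}")
    if [ -n "$active" ]; then
        echo "Lustre mounts still active:" >&2
        echo "$active" >&2
        return 1
    fi
    return 0
}
```

A caller might run lustre_unmounted && lustre_rmmod, so that the modules are only unloaded once every Lustre mount has been removed.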