Starting and Stopping LNet
Latest revision as of 22:47, 30 August 2017
LNet, like most of the services that comprise a Lustre file system, runs in the Linux kernel and is incorporated as a kernel module. LNet is started in two steps:
- Load the kernel modules
- Start the services
The lnet kernel module can be loaded directly through the modprobe command or indirectly by loading a kernel module that has a dependency on LNet. In normal operation, the lnet module will be loaded indirectly as a consequence of attempting to start a Lustre service, e.g. by mounting a file system on a client. However, one can treat LNet as independent of Lustre and start it on its own. This is useful for testing and debugging purposes, and to provide some verification of correctness when a system boots up prior to committing to loading the higher-level services (i.e. Lustre).
To load the LNet kernel module, run:
modprobe [-v] lnet
The -v flag is optional and provides verbose output, which is useful for debugging purposes but is normally omitted. For example:
[root@rh7z-pe ~]# modprobe -v lnet
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/net/lustre/libcfs.ko
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/net/lustre/lnet.ko networks="tcp0(eth1)"
Notice that a second module, called libcfs.ko, was also loaded. The libcfs module provides an API used throughout Lustre and LNet, with primitives for things like process management, memory management, and debugging.
After the module is loaded, the LNet service needs to be started:
lctl network up
# or
lctl network configure
The lctl network command works on all versions of Lustre, and prior to version 2.7, it is the only way to manually start LNet. In Lustre 2.7 and onward, there is also the lnetctl utility:
lnetctl lnet configure [--all]
The lnetctl lnet configure command will not automatically configure networks that are specified in the kernel module parameters; the lnet service will start, but the interfaces will not be configured. Supplying the --all flag will cause all of the networks defined as kernel module options to be loaded and started.
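The two start-up steps can be wrapped in a small shell function. The following is a minimal sketch, not an official Lustre tool: the names start_lnet, run and DRY_RUN are hypothetical and introduced here only for illustration. It uses lnetctl when that utility is on the PATH and falls back to lctl network up otherwise.

```shell
#!/bin/sh
# Sketch: load the lnet module, then start the LNet service.
# start_lnet, run and DRY_RUN are illustrative names, not part of Lustre.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

start_lnet() {
    # Step 1: load the kernel module (libcfs is pulled in as a dependency).
    run modprobe lnet || return 1

    # Step 2: start the service.
    if command -v lnetctl >/dev/null 2>&1; then
        # --all also configures the networks named in the module options.
        run lnetctl lnet configure --all
    else
        # lctl network up works on all versions of Lustre.
        run lctl network up
    fi
}
```

Setting DRY_RUN=1 before calling start_lnet prints the commands that would be run, which is a convenient way to review the sequence on a host before executing it as root.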
To view the loaded configuration:
lctl list_nids
# or for dynamic lnet in Lustre 2.7+
lnetctl net show [--verbose]
lnetctl export
The lnetctl export command is equivalent to lnetctl net show --verbose.
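When the check needs to be scripted, for example at boot time, the output of lctl list_nids can be tested for an expected NID. This is a minimal sketch that assumes lctl list_nids prints one NID per line, as in the examples in this article; the function name nid_is_configured is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if the NID given as $1 appears, as a whole line,
# in the `lctl list_nids` output supplied on standard input.
# nid_is_configured is an illustrative name, not a Lustre command.
nid_is_configured() {
    # -F: treat the NID as a fixed string; -x: match the whole line;
    # -q: produce no output, only an exit status.
    grep -F -x -q -- "$1"
}
```

A typical call would be `lctl list_nids | nid_is_configured 192.168.207.2@tcp`, which returns a zero exit status only when that exact NID is present.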
To shut down LNet and unload the kernel modules, first stop the LNet networks on the host:
lctl network down
# or
lctl network unconfigure
Then use the lustre_rmmod command to unload the kernel modules:
lustre_rmmod
Alternatively, the modules can be unloaded by using rmmod directly:
rmmod lnet
rmmod libcfs
lustre_rmmod is the recommended method for unloading Lustre and LNet kernel modules, because it checks for dependencies and removes the guesswork of identifying all of the modules to unload and the correct sequence for unloading them.
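The shutdown sequence can be sketched in the same way. The function name stop_lnet and the LCTL and LUSTRE_RMMOD variables are hypothetical; the variables exist only so that the commands can be substituted when the function is exercised on a machine without Lustre installed.

```shell
#!/bin/sh
# Sketch: stop the LNet networks, then unload the modules.
# stop_lnet, LCTL and LUSTRE_RMMOD are illustrative names.
LCTL=${LCTL:-lctl}
LUSTRE_RMMOD=${LUSTRE_RMMOD:-lustre_rmmod}

stop_lnet() {
    # Stop the LNet networks first; give up if that fails.
    $LCTL network down || return 1
    # lustre_rmmod determines the module dependencies and unload order.
    $LUSTRE_RMMOD
}
```

Stopping at the first failure avoids calling lustre_rmmod while the networks are still configured.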
LNet can also be loaded indirectly, as a dependency of the lustre kernel module. If LNet is loaded in this way, its start-up behavior is different because the LNet networks defined in kernel module options will be automatically configured and brought online. This is easily illustrated just by loading the Lustre module:
[root@rh7z-pe ~]# modprobe -v lustre
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/net/lustre/libcfs.ko
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/net/lustre/lnet.ko networks="tcp0(eth1)"
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/obdclass.ko
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/ptlrpc.ko
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/fld.ko
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/fid.ko
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/lov.ko
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/mdc.ko
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/lmv.ko
insmod /lib/modules/3.10.0-327.13.1.el7_lustre.x86_64/extra/kernel/fs/lustre/lustre.ko
Notice that the lnet.ko module is loaded as a dependency. The console and kernel ring buffer output will look something like this:
[266699.213610] LNet: HW CPU cores: 2, npartitions: 1
[266699.232630] alg: No test for adler32 (adler32-zlib)
[266699.234184] alg: No test for crc32 (crc32-table)
[266707.286906] Lustre: Lustre: Build Version: jenkins-arch=x86_64,build_type=server,distro=el7,ib_stack=inkernel-40--PRISTINE-3.10.0-327.13.1.el7_lustre.x86_64
[266707.338890] LNet: Added LNI 192.168.207.2@tcp [8/256/0/180]
[266707.339851] LNet: Accept secure, port 988
As the "Added LNI" line in the above output shows, the LNet network was configured and brought online automatically.
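To confirm from a script which networks came up, the "Added LNI" messages can be extracted from the kernel log. This is a minimal sketch that assumes the message format shown in the sample output above; the function name added_lnis is hypothetical.

```shell
#!/bin/sh
# Sketch: read kernel log text on standard input and print the NID
# from every "LNet: Added LNI <nid> [...]" message, one per line.
# added_lnis is an illustrative name, not a Lustre command.
added_lnis() {
    # Capture the first space-delimited word after "Added LNI".
    sed -n 's/.*LNet: Added LNI \([^ ]*\).*/\1/p'
}
```

It would typically be fed from the ring buffer, as in `dmesg | added_lnis`. Lines that do not contain an "Added LNI" message are ignored.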
The lustre_rmmod behavior is also different in this circumstance, compared to loading LNet on its own. If the administrator loads and configures LNet on its own, independently of the Lustre module, then it is necessary to unconfigure the LNet networks before removing the kernel modules:
[root@rh7z-pe ~]# modprobe lnet
[root@rh7z-pe ~]# lctl network up
LNET configured
[root@rh7z-pe ~]# lctl list_nids
192.168.207.2@tcp
[root@rh7z-pe ~]# lustre_rmmod
Modules still loaded:
lnet/klnds/socklnd/ksocklnd.o lnet/lnet/lnet.o libcfs/libcfs/libcfs.o
[root@rh7z-pe ~]# lctl network down
LNET ready to unload
[root@rh7z-pe ~]# lustre_rmmod
[root@rh7z-pe ~]# lsmod |grep lnet
However, if the lnet module is loaded indirectly, as a dependency of the Lustre kernel module, then lustre_rmmod will gracefully unload all modules including lnet:
[root@rh7z-pe ~]# modprobe lustre
[root@rh7z-pe ~]# lctl list_nids
192.168.207.2@tcp
[root@rh7z-pe ~]# lustre_rmmod
[root@rh7z-pe ~]# lsmod | grep -E lnet\|lustre
This behavior is consistent, but not entirely intuitive. The reason for this behavior has to do with a special function of LNet: routing. LNet routing enables a node that is connected to more than one LNet fabric to route traffic between the networks. LNet routing is a complex topic and is not discussed in this article. For more information on LNet routers, see:
Because routing is a function of the network, not of the Lustre file system itself, lustre_rmmod will effectively assume that if a host has only the lnet module loaded and running, then it is providing routing services. lustre_rmmod will therefore refuse to unload the modules unless the lnet service is explicitly unconfigured.
If, on the other hand, the lustre kernel module is also loaded, and there are no file systems mounted, then lustre_rmmod will assume that the host is an idle server or client and will unload the entire stack, including the lnet modules.
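A script that wraps lustre_rmmod can apply the same distinction to decide whether lctl network down is needed first. The following is only a heuristic sketch based on the description above, not the logic that lustre_rmmod itself implements; the function name lnet_only_loaded is hypothetical, and it assumes the usual lsmod layout in which the module name is the first column.

```shell
#!/bin/sh
# Sketch: read `lsmod` output on standard input and succeed when the
# lnet module is loaded but the lustre module is not, i.e. the case in
# which LNet must be unconfigured before lustre_rmmod will unload it.
# lnet_only_loaded is an illustrative name, not a Lustre command.
lnet_only_loaded() {
    awk '
        $1 == "lnet"   { lnet = 1 }
        $1 == "lustre" { lustre = 1 }
        # Exit status 0 (success) only for "lnet without lustre".
        END { exit !(lnet && !lustre) }
    '
}
```

One possible use is `lsmod | lnet_only_loaded && lctl network down`, followed by lustre_rmmod.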
If a Lustre OSD is mounted on a host, then the lustre_rmmod command will not unload the Lustre kernel modules and will report an error:
[root@rh7z-mds1 ~]# df -ht lustre
File system     Size  Used Avail Use% Mounted on
mgspool/mgt     960M  2.2M  956M   1% /lfs/mgt
[root@rh7z-mds1 ~]# lustre_rmmod
0 UP osd-zfs MGS-osd MGS-osd_UUID 5
1 UP mgs MGS MGS 7
2 UP mgc MGC192.168.227.11@tcp1 c2108a9c-a62f-6626-48e4-68f1caf1bce3 5
Modules still loaded:
lustre/mgs/mgs.o lustre/mgc/mgc.o lustre/quota/lquota.o lustre/fid/fid.o lustre/fld/fld.o lnet/klnds/socklnd/ksocklnd.o lustre/ptlrpc/ptlrpc.o lustre/obdclass/obdclass.o lnet/lnet/lnet.o libcfs/libcfs/libcfs.o
[root@rh7z-mds1 ~]# df -ht lustre
File system     Size  Used Avail Use% Mounted on
mgspool/mgt     960M  2.2M  956M   1% /lfs/mgt
From this example, it can be seen that because the MGS is mounted, lustre_rmmod takes no action to remove the kernel modules. Instead, it lists the active services running on the host and exits. The MGT is still mounted and the MGS is still running. The lustre_rmmod command is therefore a useful tool for ensuring the correct and safe unloading of Lustre kernel modules.
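Before calling lustre_rmmod from a script, it can be helpful to list any Lustre mounts that would block it. This is a minimal sketch that assumes the standard /proc/mounts layout (device, mount point, file system type, and so on); the function name lustre_mounts is hypothetical.

```shell
#!/bin/sh
# Sketch: print the mount point of every file system of type "lustre".
# Reads /proc/mounts by default; another file in the same format can be
# passed as $1. lustre_mounts is an illustrative name, not a Lustre command.
lustre_mounts() {
    # Field 2 is the mount point, field 3 is the file system type.
    awk '$3 == "lustre" { print $2 }' "${1:-/proc/mounts}"
}
```

If the function prints nothing, no Lustre targets or clients are mounted; otherwise each listed mount point needs to be unmounted before the modules can be unloaded.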