Mounting a Lustre File System on Client Nodes
All end-user application I/O happens via a service called the Lustre client. The client is responsible for providing a POSIX interface to applications, creating a coherent presentation of the metadata (file system name space) and object data (file content) to applications running on the client operating system. All Lustre file system I/O is transacted over a network protocol.
Client specifications are entirely application-driven and vary widely across the spectrum of applications, organisations and industries. Lustre clients must run a Linux operating system. The client software consists of kernel modules, with some user-space tools to assist with configuration and management.
Starting and stopping the Lustre Client
Start a Lustre client using the mount command, the basic syntax of which is:
mount -t lustre \
  [-o <options>] \
  <MGS NID>[:<MGS NID>]:/<fsname> \
  /lustre/<fsname>
To stop the Lustre client, unmount the file system:
umount /lustre/<fsname>
The mount and umount commands require super-user privileges to run.
The mount point directory must exist before the mount command is executed. The recommended convention for the mount point of the client is /lustre/<fsname>, where <fsname> is the name of the file system.
When the mount command is invoked, the client first registers with the MGS to retrieve the configuration information, also referred to as the log, for the file system that it wants to mount. A single MGS can store the configuration information for more than one file system.
The following example shows the command line used to mount a file system named demo:
mkdir -p /lustre/demo
mount -t lustre \
  192.168.227.11@tcp1:192.168.227.12@tcp1:/demo \
  /lustre/demo
The client will try to connect to the MGS in the order of the NID addresses supplied on the command line. If connection to the first NID fails, the client will attempt a connection using the next NID.
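The mount source string follows directly from the syntax above: the MGS NIDs, in the order the client should try them, joined by colons and followed by :/<fsname>. The following is a minimal sketch of assembling that string in a script. The helper name lustre_mount_source is hypothetical and not part of Lustre, and the example only prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Sketch only: lustre_mount_source is a hypothetical helper, not a Lustre tool.
# It builds "<MGS NID>[:<MGS NID>]:/<fsname>" from a file system name and one
# or more MGS NIDs, keeping the NIDs in the order given (the order in which
# the client will try them).
lustre_mount_source() {
    if [ "$#" -lt 2 ]; then
        echo "usage: lustre_mount_source <fsname> <MGS NID> [<MGS NID>...]" >&2
        return 1
    fi
    local fsname=$1
    shift
    local IFS=:
    # "$*" joins the remaining arguments with the first character of IFS.
    printf '%s:/%s\n' "$*" "$fsname"
}

# Print the commands for the demo file system instead of executing them.
fsname=demo
src=$(lustre_mount_source "$fsname" 192.168.227.11@tcp1 192.168.227.12@tcp1)
echo "mkdir -p /lustre/$fsname"
echo "mount -t lustre $src /lustre/$fsname"
```

Printing the commands first makes it easy to review the NID order before running them with super-user privileges.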
To verify that the file system is mounted on the client, use the df command:
[root@rh7z-c3 ~]# df -ht lustre
File system                                    Size  Used Avail Use% Mounted on
192.168.227.11@tcp1:192.168.227.12@tcp1:/demo   49G  2.9M   49G   1% /lustre/demo
The lctl dl command provides detail on the connections to the Lustre services:
[root@rh7z-c3 ~]# lctl dl
  0 UP mgc MGC192.168.227.11@tcp1 7f07b5f9-27e3-0b09-7456-d83ae184d204 5
  1 UP lov demo-clilov-ffff8800bab6a000 c04fa65d-3f0b-9cbf-b373-6a894da8e0be 4
  2 UP lmv demo-clilmv-ffff8800bab6a000 c04fa65d-3f0b-9cbf-b373-6a894da8e0be 4
  3 UP mdc demo-MDT0000-mdc-ffff8800bab6a000 c04fa65d-3f0b-9cbf-b373-6a894da8e0be 5
  4 UP osc demo-OST0000-osc-ffff8800bab6a000 c04fa65d-3f0b-9cbf-b373-6a894da8e0be 5
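For scripted checks, the lctl dl output can be filtered for devices that are not in the UP state. The sketch below assumes the column layout seen in the sample output above (index, state, type, name, UUID, reference count). The function name lustre_devices_not_up is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: lustre_devices_not_up is a hypothetical helper.
# Reads `lctl dl`-style output on stdin and prints "type name state" for every
# device whose state column is anything other than UP. The column positions
# are an assumption based on the sample output in the text.
lustre_devices_not_up() {
    awk 'NF >= 4 && $2 != "UP" { print $3, $4, $2 }'
}

# Typical use on a client (requires the Lustre client tools):
#   lctl dl | lustre_devices_not_up
# No output means every listed device reported UP.
```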
If the MGS is unavailable, the mount command will return an error, similar to the following example:
[root@rh7z-c3 ~]# mount -t lustre \
> 192.168.227.11@tcp1:192.168.227.12@tcp1:/demo \
> /lustre/demo
mount.lustre: mount 192.168.227.11@tcp1:192.168.227.12@tcp1:/demo at /lustre/demo failed: Input/output error
Is the MGS running?
More detailed information on the failure will be in the syslog and kernel ring buffer:
[ 9996.909126] Lustre: 10199:0:(client.c:1967:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1459822631/real 1459822631] req@ffff8800ad298000 x1530734379532292/t0(0) o250->MGC192.168.227.11@tcp1@<MGS NID>:26/25 lens 400/544 e 0 to 1 dl 1459822636 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[10021.923403] Lustre: 10199:0:(client.c:1967:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1459822656/real 1459822656] req@ffff880138238000 x1530734379532308/t0(0) o250->MGC192.168.227.11@tcp1@<MGS NID>:26/25 lens 400/544 e 0 to 1 dl 1459822661 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[10027.155495] LustreError: 15c-8: MGC192.168.227.11@tcp1: The configuration from log 'demo-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[10027.207044] Lustre: Unmounted demo-client
[10027.214618] LustreError: 10212:0:(obd_mount.c:1342:lustre_fill_super()) Unable to mount (-5)
The most common cause of failure is an improperly configured network interface or LNet NID. Verify that LNet can communicate with the MGS using the lctl ping command:
lctl ping <MGS NID>
If the ping fails, the command will return an I/O error:
[root@rh7z-c3 ~]# lctl ping 192.168.227.11@tcp1
failed to ping 192.168.227.11@tcp1: Input/output error
Check the LNet settings before continuing.
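When more than one MGS NID is configured, it can help to test each one in the same order the client would use. The sketch below loops over the NIDs and reports the first that answers. It assumes that lctl ping exits with a non-zero status on failure. The names first_reachable_nid and PING_CMD are hypothetical; PING_CMD exists only so the check can be pointed at a stub on hosts without the Lustre tools.

```shell
#!/usr/bin/env bash
# Sketch only: first_reachable_nid and PING_CMD are hypothetical names.
# Assumption: `lctl ping <NID>` returns a non-zero exit status on failure.
PING_CMD=${PING_CMD:-lctl ping}

# Try each NID in the order given and print the first one that responds.
# Returns 1 if none of them respond.
first_reachable_nid() {
    local nid
    for nid in "$@"; do
        # $PING_CMD is deliberately unquoted so that "lctl ping" splits into
        # the command and its sub-command.
        if $PING_CMD "$nid" >/dev/null 2>&1; then
            printf '%s\n' "$nid"
            return 0
        fi
    done
    return 1
}

# Typical use on a client (requires the Lustre client tools):
#   first_reachable_nid 192.168.227.11@tcp1 192.168.227.12@tcp1 \
#       || echo "no MGS NID reachable; check the LNet settings" >&2
```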
If the ping succeeds, but the mount still fails, verify that the Lustre services are running on the target host. Also check whether any services running on the client might be interfering with communication, such as a firewall or SELinux. SELinux is supported from Lustre 2.8 onwards; older releases of Lustre are not compatible with it. Temporarily disabling the firewall and SELinux can help narrow down the root cause of issues with Lustre communications.
If there are no OSS services online, but the MGS and the MDS for MDT0 are running, then the client mount command will block until an OSS service starts up.
There are options specific to Lustre that can be applied to the Lustre client mount command. The most common of these are:
flock: Enable support for cluster-wide, coherent POSIX file locks (flocks). Must be applied to the mount commands for all clients that will be accessing common data requiring lock functionality. This has no measurable performance impact in modern (2.x) versions of Lustre (this page previously stated there was a performance impact; this is not correct for modern versions). This is the recommended mode, and is enabled by default in Lustre 2.13 and newer.
localflock: Enable client-local flock support. This option is a holdover from older versions where full flock support had a performance impact. It is potentially unsafe and should be used with caution: it is only suitable for applications that require POSIX file locks, but don’t run on multiple hosts (or where the data will not be accessed in a manner that would require locking across multiple hosts). This generally implies that you have a multi-threaded app using flocks that is known to be running on only one node. If the app is ever run across multiple hosts, it will *not* receive the expected locking behavior, which would generally result in application failures.
user_xattr: Enable support for user extended attributes.
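Options are passed to mount as a single comma-separated list after -o, as in the basic syntax shown earlier. The following sketch wraps that in a small function with a dry-run mode, so the resulting command can be inspected before it is run as root. The names mount_lustre_client and DRY_RUN are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: mount_lustre_client and DRY_RUN are hypothetical names.
# Usage: mount_lustre_client <options> <mount source> <mount point>
# With DRY_RUN=1 (the default here) the command is printed, not executed.
mount_lustre_client() {
    local opts=$1 src=$2 mountpoint=$3
    local cmd=(mount -t lustre)
    if [ -n "$opts" ]; then
        cmd+=(-o "$opts")
    fi
    cmd+=("$src" "$mountpoint")
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# Show the command for mounting demo with flock and user_xattr enabled.
DRY_RUN=1 mount_lustre_client flock,user_xattr \
    192.168.227.11@tcp1:192.168.227.12@tcp1:/demo /lustre/demo
```

Setting DRY_RUN=0 would execute the mount, which requires super-user privileges and an existing mount point.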
Additionally, consider using the _netdev mount option when mounting the Lustre client, especially when adding an entry into /etc/fstab. This option indicates to the operating system that the file system depends on the network: it should not be mounted before the network is online, and it should be unmounted on shutdown before the network stack is stopped. An example entry for /etc/fstab:
192.168.227.11@tcp1:192.168.227.12@tcp1:/demo /lustre/demo lustre defaults,_netdev 0 0
For systemd-based systems, the following is recommended to ensure correct startup and shutdown. Be sure the lnet service is properly set up in systemd before using these options:
192.168.227.11@tcp1:192.168.227.12@tcp1:/demo /lustre/demo lustre defaults,_netdev,noauto,x-systemd.automount,x-systemd.requires=lnet.service 0 0
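When provisioning many clients, the fstab entry can be generated rather than typed by hand. The sketch below formats an entry in the same layout as the examples above and appends it to a file only if an identical line is not already there. Both function names are hypothetical, and the duplicate check is an exact whole-line match, so an existing entry with different spacing or options would not be detected.

```shell
#!/usr/bin/env bash
# Sketch only: lustre_fstab_entry and append_once are hypothetical helpers.

# Format an fstab line for a Lustre client mount.
# Options default to "defaults,_netdev" when the third argument is omitted.
lustre_fstab_entry() {
    local src=$1 mountpoint=$2 opts=${3:-defaults,_netdev}
    printf '%s %s lustre %s 0 0\n' "$src" "$mountpoint" "$opts"
}

# Append a line to a file unless an identical line is already present.
append_once() {
    local file=$1 line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Build the systemd-oriented entry from the text and print it for review.
entry=$(lustre_fstab_entry 192.168.227.11@tcp1:192.168.227.12@tcp1:/demo /lustre/demo \
    defaults,_netdev,noauto,x-systemd.automount,x-systemd.requires=lnet.service)
echo "$entry"

# To install it (requires root; back up /etc/fstab first):
#   append_once /etc/fstab "$entry"
```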
Refer to the mount.lustre(8) man page for more information on the available options.