Mounting a Lustre File System on Client Nodes

Revision as of 09:39, 7 May 2018
All end-user application I/O happens via a service called the Lustre client. The client is responsible for providing a POSIX interface to applications, presenting a coherent view of the metadata (file system name space) and object data (file content) to applications running on the client operating system. All Lustre file system I/O is transacted over a network protocol.

Client specifications are entirely application-driven and vary widely across the spectrum of applications, organisations and industries. Lustre clients must run a Linux operating system, and the client software comprises kernel modules, with some user-space tools to assist with configuration and management.
Starting and stopping the Lustre Client
Start a Lustre client using the mount command, the basic syntax of which is:

mount -t lustre \
  [-o <options>] \
  <MGS NID>[:<MGS NID>]:/<fsname> \
  /lustre/<fsname>
To stop the Lustre client, unmount the file system:

umount /lustre/<fsname>

The mount and umount commands require super-user privileges to run.
The mount point directory must exist before the mount command is executed. The recommended convention for the client mount point is /lustre/<fsname>, where <fsname> is the name of the file system.

When the mount command is invoked, the client first registers with the MGS to retrieve the configuration information, also referred to as the log, for the file system that it wants to mount. A single MGS can store the configuration information for more than one file system.
The following example shows the command line used to mount a file system named demo:

mkdir -p /lustre/demo
mount -t lustre \
  192.168.227.11@tcp1:192.168.227.12@tcp1:/demo \
  /lustre/demo
The client will try to connect to the MGS in the order of the NID addresses supplied on the command line. If connection to the first NID fails, the client will attempt a connection using the next NID.
To verify that the file system is mounted on the client, use the df command:

[root@client ~]# df -ht lustre
Filesystem                                     Size  Used Avail Use% Mounted on
192.168.227.11@tcp1:192.168.227.12@tcp1:/demo   49G  2.9M   49G   1% /lustre/demo

The lctl dl command provides detail on the connections to the Lustre services:

[root@client ~]# lctl dl
  0 UP mgc MGC192.168.227.11@tcp1 7f07b5f9-27e3-0b09-7456-d83ae184d204 5
  1 UP lov demo-clilov-ffff8800bab6a000 c04fa65d-3f0b-9cbf-b373-6a894da8e0be 4
  2 UP lmv demo-clilmv-ffff8800bab6a000 c04fa65d-3f0b-9cbf-b373-6a894da8e0be 4
  3 UP mdc demo-MDT0000-mdc-ffff8800bab6a000 c04fa65d-3f0b-9cbf-b373-6a894da8e0be 5
  4 UP osc demo-OST0000-osc-ffff8800bab6a000 c04fa65d-3f0b-9cbf-b373-6a894da8e0be 5
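Beyond reading df output by hand, a script can confirm that the client is mounted before applications start. This is a minimal sketch using the generic mountpoint utility from util-linux (not a Lustre-specific tool); /lustre/demo is the example mount point from above, and any path can be substituted:

```shell
#!/bin/sh
# Return success if the given directory is an active mount point.
# mountpoint(1) is part of util-linux and works for any file system type.
is_mounted() {
    mountpoint -q "$1"
}

# Example usage: "/" is always a mount point; substitute /lustre/demo
# to gate application start-up on the Lustre client mount.
if is_mounted /; then
    root_state="mounted"
else
    root_state="not mounted"
fi
echo "$root_state"
```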
If the MGS is unavailable, the mount command will return an error similar to the following example:

[root@client ~]# mount -t lustre \
> 192.168.227.11@tcp1:192.168.227.12@tcp1:/demo \
> /lustre/demo
mount.lustre: mount 192.168.227.11@tcp1:192.168.227.12@tcp1:/demo at /lustre/demo failed: Input/output error
Is the MGS running?
More detailed information on the failure will be in the syslog and kernel ring buffer:
[ 9996.909126] Lustre: 10199:0:(client.c:1967:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1459822631/real 1459822631]  req@... x1530734379532292/t0(0) o250->MGC192.168.227.11@tcp1@192.168.227.11@tcp1:26/25 lens 400/544 e 0 to 1 dl 1459822636 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[10021.923403] Lustre: 10199:0:(client.c:1967:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1459822656/real 1459822656]  req@... x1530734379532308/t0(0) o250->MGC192.168.227.11@tcp1@192.168.227.11@tcp1:26/25 lens 400/544 e 0 to 1 dl 1459822661 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[10027.155495] LustreError: 15c-8: MGC192.168.227.11@tcp1: The configuration from log 'demo-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
[10027.207044] Lustre: Unmounted demo-client
[10027.214618] LustreError: 10212:0:(obd_mount.c:1342:lustre_fill_super()) Unable to mount  (-5)
The most common cause of failure is an improperly configured network interface or LNet NID. Verify that the LNet protocol is able to communicate with the MGS:

lctl ping <MGS NID>

If the ping fails, the command will return an I/O error:

[root@client ~]# lctl ping 192.168.227.11@tcp1
failed to ping 192.168.227.11@tcp1: Input/output error

Check the LNet settings before continuing.
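As part of checking the LNet settings, lctl list_nids prints the NIDs this node has brought up, which is what the MGS NID on the mount command line must be reachable from. A guarded sketch, assuming only that the Lustre client tools may or may not be installed on the node where it runs:

```shell
#!/bin/sh
# Print the local LNet NIDs when the Lustre tools are available.
# lctl list_nids requires the lnet kernel module to be loaded;
# both conditions are guarded so the script degrades gracefully.
if command -v lctl >/dev/null 2>&1; then
    nids=$(lctl list_nids 2>/dev/null || echo "lnet not loaded")
else
    nids="lctl not installed"
fi
echo "$nids"
```

An empty or missing NID list on the client points to an LNet configuration problem rather than a server-side fault.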
If the ping succeeds but the mount still fails, verify that the Lustre services are running on the target host. Also check whether any services running on the client might be interfering with communication, such as a firewall or SELinux. While SELinux is supported from Lustre 2.8 onwards, older releases are not compatible. Temporarily disabling the firewall and SELinux can help narrow down the root cause of issues with Lustre communications.
If there are no OSS services online, but the MGS and the MDS for MDT0 are running, then the client mount command will block until an OSS service starts up.
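One way to check OST availability is to query a client that is already mounted with lfs df, which lists each MDT and OST together with its usage; a target that is down is reported as inactive or absent. A guarded sketch (lfs is part of the Lustre client tools; /lustre/demo is the example mount point used above):

```shell
#!/bin/sh
# Show per-target usage for a mounted Lustre file system.
# Guarded so the script still produces output on a node without
# the Lustre tools or without the file system mounted.
if command -v lfs >/dev/null 2>&1; then
    status=$(lfs df /lustre/demo 2>&1 || true)
else
    status="lfs not installed"
fi
echo "$status"
```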
There are options specific to Lustre that can be applied to the Lustre client mount command. The most common of these are:

flock: enable support for cluster-wide, coherent file locks. This option must be applied to the mount commands of all clients that will access common data requiring lock functionality. Cluster-wide locking has a detrimental impact on file system performance and should only be enabled when absolutely required. For some applications, locking is only necessary on a subset of nodes. For example, the CTDB cluster framework, used by Samba to provide a parallel, high-availability SMB gateway, relies on locking of a shared file when coordinating cluster start-up and recovery; however, only the CTDB nodes need to mount the Lustre file system with the flock option. This is an example of application- or domain-specific lock requirements.
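The behaviour can be exercised with the flock(1) utility from util-linux, which takes a lock on a file and runs a command while holding it. A sketch using a temporary file to demonstrate the tool; on a Lustre client, the file would live under the mount, and without the flock or localflock mount option the lock request is expected to fail:

```shell
#!/bin/sh
# Take an exclusive lock on a file, run a command while holding it,
# then release the lock on exit. On Lustre, file locks only work when
# the client is mounted with the flock or localflock option.
lockfile=$(mktemp)
result=$(flock "$lockfile" -c 'echo "lock acquired"')
echo "$result"
rm -f "$lockfile"
```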
localflock: enable client-local flock support. This is much faster than cluster-wide flock support, but is only suitable for applications that require locks but do not run on multiple hosts (or where the data will not be accessed in a manner that would require locking across multiple hosts).
user_xattr: enable support for user extended attributes.
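User extended attributes can be exercised with the setfattr and getfattr tools from the attr package. A sketch using a temporary file; on a Lustre client the file would live under a mount with user_xattr enabled, and since not every local file system supports user.* attributes, the example falls back gracefully:

```shell
#!/bin/sh
# Set a user extended attribute on a file and read it back.
# Falls back cleanly where the tools or the file system do not
# support user.* attributes.
f=$(mktemp)
if setfattr -n user.demo -v hello "$f" 2>/dev/null; then
    value=$(getfattr --only-values -n user.demo "$f" 2>/dev/null)
else
    value="user xattrs unavailable here"
fi
echo "$value"
rm -f "$f"
```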
Additionally, consider using the _netdev mount option when mounting the Lustre client, especially when adding an entry to /etc/fstab. This option indicates to the operating system that the file system has a dependency on the network: it should not be mounted before the network is online, and it should be unmounted on shutdown prior to stopping the network stack. An example /etc/fstab entry:

192.168.227.11@tcp1:192.168.227.12@tcp1:/demo /lustre/demo lustre defaults,_netdev 0 0
For systemd-based systems, the following is recommended to ensure correct startup and shutdown:

192.168.227.11@tcp1:192.168.227.12@tcp1:/demo /lustre/demo lustre defaults,_netdev,noauto,x-systemd.automount,x-systemd.requires=lnet.service 0 0
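With x-systemd.automount, systemd generates mount and automount units named after the escaped mount path, so the client mount can be inspected and controlled like any other unit. A sketch deriving the unit name (systemd-escape is part of systemd; the fallback shows the expected result for the /lustre/demo example):

```shell
#!/bin/sh
# Derive the automount unit name systemd generates for /lustre/demo.
if command -v systemd-escape >/dev/null 2>&1; then
    unit=$(systemd-escape -p --suffix=automount /lustre/demo)
else
    # What systemd-escape would print for this path.
    unit="lustre-demo.automount"
fi
echo "$unit"
# The automount state can then be inspected with: systemctl status "$unit"
```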
Refer to the mount.lustre(8) man page for more information on the available options.