Creating the Lustre Management Service (MGS)
MGT Formatted as an LDISKFS OSD
The syntax for creating an LDISKFS-based MGT is as follows:
mkfs.lustre --mgs \
    [ --reformat ] \
    [ --servicenode <NID> [--servicenode <NID> …] ] \
    [ --failnode <NID> [--failnode <NID> …] ] \
    [ --backfstype=ldiskfs ] \
    [ --mkfsoptions <options> ] \
    <device path>
The next example uses the --servicenode syntax to create an MGT that can be run on two servers as a high-availability failover resource:
[root@rh7z-mds1 ~]# mkfs.lustre --mgs \
    --servicenode 192.168.227.11@tcp1 \
    --servicenode 192.168.227.12@tcp1 \
    --backfstype=ldiskfs \
    /dev/dm-1
This command formats a new MGT that the MGS will use for storage. Two server NIDs, 192.168.227.11@tcp1 and 192.168.227.12@tcp1, are supplied as service nodes for the MGS.
The --failnode syntax is similar, but defines only a failover target for the storage service. For example:
[root@rh7z-mds1 ~]# mkfs.lustre --mgs \
    --failnode 192.168.227.12@tcp1 \
    --backfstype=ldiskfs \
    /dev/dm-1
Here, the failover host is identified as 192.168.227.12@tcp1, one server in an HA pair (which, for the purposes of this example, has the hostname rh7z-mds2). The mkfs.lustre command was executed on rh7z-mds1 (NID: 192.168.227.11@tcp1), and the mount command must also be run from rh7z-mds1 when the MGS service starts for the very first time.
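After formatting, the parameters written to the target can be reviewed with tunefs.lustre before the service is first mounted. A minimal sketch, assuming the device path from the example above:

```shell
# Print the configuration stored on the MGT without modifying it.
# --dryrun reports the target's parameters (service node NIDs,
# flags, backing file system type) and makes no changes.
tunefs.lustre --dryrun /dev/dm-1
```

This is a read-only check, so it is safe to run on an already-formatted target.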
MGT Formatted as a ZFS OSD
Formatting the MGT using only the mkfs.lustre command
Note: For the greatest flexibility and control when creating ZFS-based Lustre storage targets, do not use this approach – instead, create the zpool separately from formatting the Lustre OSD. See Formatting the MGT using zpool and mkfs.lustre.
The syntax for creating a ZFS-based MGT using only the mkfs.lustre command is as follows:
mkfs.lustre --mgs \
    [ --reformat ] \
    [ --servicenode <NID> [--servicenode <NID> …] ] \
    [ --failnode <NID> [--failnode <NID> …] ] \
    --backfstype=zfs \
    [ --mkfsoptions <options> ] \
    <zpool>/<dataset> <zpool specification>
This example uses the --servicenode syntax to create an MGT that can be run on two servers as a high-availability failover resource:
[root@rh7z-mds1 ~]# mkfs.lustre --mgs \
    --servicenode 192.168.227.11@tcp1 \
    --servicenode 192.168.227.12@tcp1 \
    --backfstype=zfs \
    mgspool/mgt mirror sda sdc
This command formats a new MGT that the MGS will use for storage. It also defines a mirrored zpool called mgspool (consisting of two devices) and creates a ZFS dataset called mgt. Two server NIDs, 192.168.227.11@tcp1 and 192.168.227.12@tcp1, are supplied as service nodes for the MGS.
The --failnode syntax is similar, but defines only a failover target for the storage service. For example:
[root@rh7z-mds1 ~]# mkfs.lustre --mgs \
    --failnode 192.168.227.12@tcp1 \
    --backfstype=zfs \
    mgspool/mgt mirror sda sdc
As with the LDISKFS example in the previous section, the failover host is identified as 192.168.227.12@tcp1, one server in an HA pair (which has the hostname rh7z-mds2). The mkfs.lustre command was executed on rh7z-mds1 (NID: 192.168.227.11@tcp1), and the mount command must also be run from rh7z-mds1 when the MGS service starts for the very first time.
Note that when creating a ZFS-based OSD using only the mkfs.lustre command, it is not possible to set or change some properties of the zpool or its vdevs, such as the ashift property. For this reason, it is highly recommended that zpools be created independently of the mkfs.lustre command, as shown in the next section.
Formatting the MGT using zpool and mkfs.lustre
To create a ZFS-based MGT, first create a zpool to contain the MGT file system dataset:
zpool create [-f] -O canmount=off \
    [-o ashift=<n>] \
    -o cachefile=/etc/zfs/<zpool name>.spec | -o cachefile=none \
    <zpool name> <zpool specification>
Next, use mkfs.lustre to create the file system inside the zpool:
mkfs.lustre --mgs \
    [ --reformat ] \
    [ --servicenode <NID> [--servicenode <NID> …] ] \
    [ --failnode <NID> [--failnode <NID> …] ] \
    --backfstype=zfs \
    [ --mkfsoptions <options> ] \
    <pool name>/<dataset>
For example:
# Create the zpool
zpool create -O canmount=off \
    -o cachefile=none \
    mgspool mirror sda sdc

# Format the Lustre MGT
mkfs.lustre --mgs \
    --servicenode 192.168.227.11@tcp1 \
    --servicenode 192.168.227.12@tcp1 \
    --backfstype=zfs \
    mgspool/mgt
Use the zfs get command to retrieve comprehensive metadata information about the file system dataset and to confirm that the Lustre properties have been set correctly:
zfs get all | awk '$2 ~ /lustre/'
Alternatively, use the following command to retrieve all properties that have been explicitly set (which may be a larger list than just the lustre: properties):
zfs get all -s local
An MGT example:
[root@rh7z-mds1 ~]# zfs get all -s local
NAME         PROPERTY              VALUE                                    SOURCE
mgspool      canmount              off                                      local
mgspool/mgt  canmount              off                                      local
mgspool/mgt  xattr                 sa                                       local
mgspool/mgt  lustre:version        1                                        local
mgspool/mgt  lustre:index          65535                                    local
mgspool/mgt  lustre:failover.node  192.168.227.11@tcp1:192.168.227.12@tcp1  local
mgspool/mgt  lustre:svname         MGS                                      local
mgspool/mgt  lustre:flags          4132                                     local
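As a side note, the awk filter shown earlier works because zfs get prints the property name in the second column. Its behaviour can be illustrated with a few captured sample lines (static text standing in for live zfs output):

```shell
# Pipe a few captured lines of `zfs get all -s local` output through
# the same filter used above: awk prints only rows whose second
# (PROPERTY) column contains the string "lustre".
printf '%s\n' \
  'mgspool/mgt  canmount        off  local' \
  'mgspool/mgt  lustre:version  1    local' \
  'mgspool/mgt  lustre:svname   MGS  local' |
  awk '$2 ~ /lustre/'
# prints only the two lustre:* rows
```

The same pattern can be adapted to select any other property family by changing the regular expression.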
Starting the MGS Service
The mount command is used to start all Lustre storage services, including the MGS. Therefore, to start the MGS, mount the MGT on the server.
The syntax is:
mount -t lustre [-o <options>] \
    <ldiskfs blockdev>|<zpool>/<dataset> <mount point>
The mount command syntax is very similar for both LDISKFS and ZFS MGT storage targets. The main difference is the format of the path to the storage. For LDISKFS, the path will resolve to a block device, such as /dev/sda or /dev/mapper/mpatha, whereas for ZFS, the path resolves to a dataset in a zpool, e.g., mgspool/mgt.
The mount point directory must exist before the mount command is executed. The recommended convention for the mount point of the MGT storage is /lustre/mgt.
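Where a persistent record of the mount is desired, an /etc/fstab entry can also be created. This is a sketch for the ZFS example in this section; the noauto option is an assumption suited to HA configurations, where the HA framework, not the boot process, decides which server mounts the target:

```
mgspool/mgt  /lustre/mgt  lustre  noauto  0 0
```

With noauto set, the target is still started manually (or by the HA resource manager) with mount /lustre/mgt.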
The following example starts a ZFS-based MGT:
# Ignore MOUNTPOINT column in output: not used by Lustre
[root@rh7z-mds1 ~]# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
mgspool      1.67M   974M    19K  /mgspool
mgspool/mgt  1.59M   974M  1.59M  /mgspool/mgt

[root@rh7z-mds1 ~]# mkdir -p /lustre/mgt
[root@rh7z-mds1 ~]# mount -t lustre mgspool/mgt /lustre/mgt
[root@rh7z-mds1 ~]# df -ht lustre
Filesystem   Size  Used Avail Use% Mounted on
mgspool/mgt  960M  1.7M  957M   1% /lustre/mgt
Note that the default output for zfs list shows the mount points for the MGS pool and MGT dataset in the MOUNTPOINT column. The content of the MOUNTPOINT column can be ignored, because the referenced mount points are not used for mounting Lustre ZFS OSDs. Instead, the mount point is created explicitly by the administrator, just as for an LDISKFS-based storage target or any other regular file system mount point.
To reduce confusion, the ZFS file system mountpoint property can be set to none. For example:
zfs set mountpoint=none mgspool
zfs set mountpoint=none mgspool/mgt
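The result can be verified with zfs get; a brief sketch, assuming the pool and dataset names used above:

```shell
# Confirm that the ZFS-level mountpoints are disabled; the VALUE
# column should read "none" for both the pool and the MGT dataset.
zfs get mountpoint mgspool mgspool/mgt
```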
Note: Only the mount -t lustre command can start Lustre services. Mounting storage as type ldiskfs or zfs will mount a storage target on the host, but it will not trigger the startup of the Lustre kernel processes.
To verify that the MGS is running, check that the device has been mounted, then get the Lustre device list with lctl dl and review the running processes:
[root@rh7z-mds1 lustre]# df -ht lustre
Filesystem   Size  Used Avail Use% Mounted on
mgspool/mgt  960M  2.0M  956M   1% /lustre/mgt

[root@rh7z-mds1 ~]# lctl dl
  0 UP osd-zfs MGS-osd MGS-osd_UUID 5
  1 UP mgs MGS MGS 5
  2 UP mgc MGC192.168.227.11@tcp1 5d62a612-f872-09a4-7da8-4ce562af6e0c 5

[root@rh7z-mds1 ~]# ps -ef | awk '/mgs/ && !/awk/'
root     15162     2  0 02:44 ?        00:00:00 [mgs_params_noti]
root     15163     2  0 02:44 ?        00:00:00 [ll_mgs_0000]
root     15164     2  0 02:44 ?        00:00:00 [ll_mgs_0001]
root     15165     2  0 02:44 ?        00:00:00 [ll_mgs_0002]
Stopping the MGS Service
To stop a Lustre service, run umount on the corresponding target:
umount <mount point>
The mount point must correspond to the mount point used with the mount -t lustre command. For example:
[root@rh7z-mds1 ~]# df -ht lustre
Filesystem   Size  Used Avail Use% Mounted on
mgspool/mgt  960M  2.0M  956M   1% /lustre/mgt

[root@rh7z-mds1 ~]# umount /lustre/mgt
[root@rh7z-mds1 ~]# df -ht lustre
df: no file systems processed
[root@rh7z-mds1 ~]# lctl dl
[root@rh7z-mds1 ~]#
Using the regular umount command is the correct way to stop a given Lustre service and unmount the associated storage, for both LDISKFS- and ZFS-based Lustre storage volumes.
Do not use the zfs unmount command to stop a Lustre service. Attempting to use zfs commands to unmount a storage target that is mounted as part of an active Lustre service will return an error:
[root@rh7z-mds1 ~]# lctl dl
  0 UP osd-zfs MGS-osd MGS-osd_UUID 5
  1 UP mgs MGS MGS 5
  2 UP mgc MGC192.168.227.11@tcp1 be9fad27-107b-d165-8494-9a723b90e863 5

[root@rh7z-mds1 ~]# mount -t lustre
mgspool/mgt on /lustre/mgt type lustre (ro)

[root@rh7z-mds1 ~]# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
mgspool      2.05M   974M    19K  /mgspool
mgspool/mgt  1.97M   974M  1.97M  /mgspool/mgt

[root@rh7z-mds1 ~]# zpool status
  pool: mgspool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mgspool     ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdc     ONLINE       0     0     0

errors: No known data errors

[root@rh7z-mds1 ~]# zfs unmount mgspool/mgt
cannot unmount 'mgspool/mgt': not currently mounted
[root@rh7z-mds1 ~]# zfs unmount /lustre/mgt
cannot unmount '/lustre/mgt': not a ZFS file system
In the example, the MGS is up and running on a host, and the MGT storage is formatted as a ZFS dataset in a mirrored zpool. The service is online and the storage is mounted with the Lustre file system type. When an attempt is made to unmount the volume with zfs unmount, the command fails regardless of whether <zpool>/<dataset> or the mount point is used to refer to the storage volume.
These examples are provided to reinforce the point that many of the Lustre server management tools are the same whether LDISKFS or ZFS is used for the underlying storage. Of course there are storage-level differences, but where possible, the Lustre tools are common to both storage target formats.