Creating the Lustre Management Service (MGS)

MGT Formatted as an LDISKFS OSD
The syntax for creating an LDISKFS-based MGT is as follows:

 mkfs.lustre --mgs \
   [ --reformat ] \
   [ --servicenode <NID> [--servicenode <NID> …]] \
   [ --failnode <NID> [--failnode <NID> …]] \
   [ --backfstype=ldiskfs ] \
   [ --mkfsoptions <options> ] \
   <device path>

The next example uses the --servicenode syntax to create an MGT that can be run on two servers as a high availability failover resource:

 [root@rh7z-mds1 ~]# mkfs.lustre --mgs \
   --servicenode 192.168.227.11@tcp1 \
   --servicenode 192.168.227.12@tcp1 \
   --backfstype=ldiskfs \
   /dev/dm-1

The command line formats a new MGT that will be used by the MGS for storage. Two server NIDs are supplied as service nodes for the MGS, 192.168.227.11@tcp1 and 192.168.227.12@tcp1.
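To review the parameters written to the newly formatted target without modifying it, the settings can be read back with tunefs.lustre (a quick check, using the same device path as the example above; output omitted):

 [root@rh7z-mds1 ~]# tunefs.lustre --dryrun /dev/dm-1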

The --failnode syntax is similar, but is used to define only a failover target for the storage service. For example:

 [root@rh7z-mds1 ~]# mkfs.lustre --mgs \
   --failnode 192.168.227.12@tcp1 \
   --backfstype=ldiskfs \
   /dev/dm-1

Here, the failover host is identified by its NID, 192.168.227.12@tcp1, one server in an HA pair. The mkfs.lustre command was executed on the other server in the pair, rh7z-mds1 (NID: 192.168.227.11@tcp1), and the mount command must also be run from rh7z-mds1 when the MGS service starts for the very first time.

Formatting the MGT using only the mkfs.lustre command
Note: For the greatest flexibility and control when creating ZFS-based Lustre storage targets, do not use this approach – instead, create the zpool separately from formatting the Lustre OSD. See Formatting the MGT using zpool and mkfs.lustre.

The syntax for creating a ZFS-based MGT using only the mkfs.lustre command is as follows:

 mkfs.lustre --mgs \
   [ --reformat ] \
   [ --servicenode <NID> [--servicenode <NID> …]] \
   [ --failnode <NID> [--failnode <NID> …]] \
   --backfstype=zfs \
   [ --mkfsoptions <options> ] \
   <pool name>/<dataset name> <zpool specification>

The next example uses the --servicenode syntax to create an MGT that can be run on two servers as a high availability failover resource:

 [root@rh7z-mds1 ~]# mkfs.lustre --mgs \
   --servicenode 192.168.227.11@tcp1 \
   --servicenode 192.168.227.12@tcp1 \
   --backfstype=zfs \
   mgspool/mgt mirror sda sdc

The command line formats a new MGT that will be used by the MGS for storage. The command further defines a mirrored zpool called mgspool (consisting of the two devices sda and sdc) and creates a ZFS dataset called mgt within that pool. Two server NIDs are supplied as service nodes for the MGS, 192.168.227.11@tcp1 and 192.168.227.12@tcp1.
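The resulting pool layout and dataset can be reviewed immediately after formatting (a quick check using the pool and dataset names from the example above; output omitted here):

 [root@rh7z-mds1 ~]# zpool status mgspool
 [root@rh7z-mds1 ~]# zfs list mgspool/mgt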

The --failnode syntax is similar, but is used to define only a failover target for the storage service. For example:

 [root@rh7z-mds1 ~]# mkfs.lustre --mgs \
   --failnode 192.168.227.12@tcp1 \
   --backfstype=zfs \
   mgspool/mgt mirror sda sdc

As with the LDISKFS example in the previous section, the failover host is identified by its NID, 192.168.227.12@tcp1, one server in an HA pair. The mkfs.lustre command was executed on the other server in the pair, rh7z-mds1 (NID: 192.168.227.11@tcp1), and the mount command must also be run from rh7z-mds1 when the MGS service starts for the very first time.

Note that when creating a ZFS-based OSD using only the mkfs.lustre command, it is not possible to set or change some properties of the zpool or its vdevs, such as the ashift property. For this reason, it is highly recommended that the zpools be created independently of the mkfs.lustre command, as shown in the next section.

Formatting the MGT using zpool and mkfs.lustre
To create a ZFS-based MGT, first create a zpool to contain the MGT file system dataset:

 zpool create [-f] -O canmount=off \
   [ -o ashift=<n> ] \
   -o cachefile=/etc/zfs/<pool name>.spec | -o cachefile=none \
   <pool name> <zpool specification>

Next, use mkfs.lustre to create the file system dataset inside the zpool:

 mkfs.lustre --mgs \
   [ --reformat ] \
   [ --servicenode <NID> [--servicenode <NID> …]] \
   [ --failnode <NID> [--failnode <NID> …]] \
   --backfstype=zfs \
   [ --mkfsoptions <options> ] \
   <pool name>/<dataset name>

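For example, a minimal sketch of this two-step procedure, reusing the pool name, mirrored devices and service node NIDs from the earlier examples (the ashift=12 value is an assumption, suitable for disks with 4KiB sectors):

 [root@rh7z-mds1 ~]# zpool create -f -O canmount=off -o ashift=12 -o cachefile=none \
   mgspool mirror sda sdc

 [root@rh7z-mds1 ~]# mkfs.lustre --mgs \
   --servicenode 192.168.227.11@tcp1 \
   --servicenode 192.168.227.12@tcp1 \
   --backfstype=zfs \
   mgspool/mgt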

Use the zfs get command to retrieve comprehensive metadata information about the file system dataset and to confirm that the Lustre properties have been set correctly:

 zfs get all <pool name>/<dataset name> | awk '$2 ~ /lustre/'

Alternatively, use the following command to retrieve all properties that have been explicitly set (which may be a larger list than just the lustre: properties):

 zfs get all -s local

An MGT example:

 [root@rh7z-mds1 ~]# zfs get all -s local
 NAME        PROPERTY              VALUE                                    SOURCE
 mgspool     canmount              off                                      local
 mgspool/mgt canmount              off                                      local
 mgspool/mgt xattr                 sa                                       local
 mgspool/mgt lustre:version        1                                        local
 mgspool/mgt lustre:index          65535                                    local
 mgspool/mgt lustre:failover.node  192.168.227.11@tcp1:192.168.227.12@tcp1  local
 mgspool/mgt lustre:svname         MGS                                      local
 mgspool/mgt lustre:flags          4132                                     local

Starting the MGS Service
The mount command is used to start all Lustre storage services, including the MGS. Therefore, to start up the MGS, one must mount the MGT on the server.

The syntax is:

 mount -t lustre [-o <options>] \
   <device path> | <pool name>/<dataset name> \
   <mount point>

The mount command syntax is very similar for both LDISKFS and ZFS MGT storage targets. The main difference is the format of the path to the storage. For LDISKFS, the path will resolve to a block device, such as /dev/dm-1, whereas for ZFS, the path resolves to a dataset in a zpool, e.g., mgspool/mgt.

The mount point directory must exist before the mount command is executed. The recommended convention for the mount point of the MGT storage is /lustre/mgt.
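For an LDISKFS-based MGT, starting the service might look like the following sketch (reusing the block device from the earlier LDISKFS example):

 [root@rh7z-mds1 ~]# mkdir -p /lustre/mgt
 [root@rh7z-mds1 ~]# mount -t lustre /dev/dm-1 /lustre/mgt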

The following example starts a ZFS-based MGT:

 [root@rh7z-mds1 ~]# zfs list
 NAME         USED  AVAIL  REFER  MOUNTPOINT
 mgspool     1.67M   974M    19K  /mgspool
 mgspool/mgt 1.59M   974M  1.59M  /mgspool/mgt

Note: ignore the MOUNTPOINT column in the output; it is not used by Lustre.

[root@rh7z-mds1 ~]# mkdir -p /lustre/mgt

[root@rh7z-mds1 ~]# mount -t lustre mgspool/mgt /lustre/mgt

 [root@rh7z-mds1 ~]# df -ht lustre
 Filesystem      Size  Used Avail Use% Mounted on
 mgspool/mgt     960M  1.7M  957M   1% /lustre/mgt

Note that the default output of zfs list shows the mount points for the MGS pool and MGT dataset in the MOUNTPOINT column. The content of the MOUNTPOINT column can be ignored because the referenced mount points are not used for mounting Lustre ZFS OSDs. Instead, the mount point is created explicitly by the administrator, just as for an LDISKFS-based storage target or any other regular file system mount point.

To reduce confusion, the ZFS file system mountpoint property can be set to none. For example:

 zfs set mountpoint=none mgspool
 zfs set mountpoint=none mgspool/mgt
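The change can be confirmed by reading the property back, for example:

 zfs get -s local mountpoint mgspool mgspool/mgt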

Note: Only the mount -t lustre command can start Lustre services. Mounting storage as type ldiskfs or zfs will mount a storage target on the host, but it will not trigger the startup of the Lustre kernel processes.
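For instance, the LDISKFS device from the earlier example could be mounted directly as type ldiskfs for maintenance (a sketch, assuming the ldiskfs kernel module is loaded); the target contents become visible under /mnt, but no MGS device appears in the lctl dl listing because the Lustre service has not been started:

 mount -t ldiskfs /dev/dm-1 /mnt
 umount /mnt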

To verify that the MGS is running, check that the device has been mounted, then get the Lustre device list with lctl dl and review the running processes:

 [root@rh7z-mds1 lustre]# df -ht lustre
 Filesystem      Size  Used Avail Use% Mounted on
 mgspool/mgt     960M  2.0M  956M   1% /lustre/mgt

 [root@rh7z-mds1 ~]# lctl dl
   0 UP osd-zfs MGS-osd MGS-osd_UUID 5
   1 UP mgs MGS MGS 5
   2 UP mgc MGC192.168.227.11@tcp1 5d62a612-f872-09a4-7da8-4ce562af6e0c 5

 [root@rh7z-mds1 ~]# ps -ef | awk '/mgs/ && !/awk/'
 root    15162     2  0 02:44 ?        00:00:00 [mgs_params_noti]
 root    15163     2  0 02:44 ?        00:00:00 [ll_mgs_0000]
 root    15164     2  0 02:44 ?        00:00:00 [ll_mgs_0001]
 root    15165     2  0 02:44 ?        00:00:00 [ll_mgs_0002]

Stopping the MGS Service
To stop a Lustre service, run umount on the corresponding target:

 umount <mount point>

The mount point must correspond to the mount point that was used when the service was started with the mount command. For example:

 [root@rh7z-mds1 ~]# df -ht lustre
 Filesystem      Size  Used Avail Use% Mounted on
 mgspool/mgt     960M  2.0M  956M   1% /lustre/mgt

 [root@rh7z-mds1 ~]# umount /lustre/mgt

 [root@rh7z-mds1 ~]# df -ht lustre
 df: no file systems processed

 [root@rh7z-mds1 ~]# lctl dl
 [root@rh7z-mds1 ~]#

Using the regular umount command is the correct way to stop a given Lustre service and unmount the associated storage, for both LDISKFS and ZFS-based Lustre storage volumes.

Do not use the zfs unmount command to stop a Lustre service. Attempting to use ZFS commands to unmount a storage target that is mounted as part of an active Lustre service will return an error:

 [root@rh7z-mds1 ~]# lctl dl
   0 UP osd-zfs MGS-osd MGS-osd_UUID 5
   1 UP mgs MGS MGS 5
   2 UP mgc MGC192.168.227.11@tcp1 be9fad27-107b-d165-8494-9a723b90e863 5

 [root@rh7z-mds1 ~]# mount -t lustre
 mgspool/mgt on /lustre/mgt type lustre (ro)

 [root@rh7z-mds1 ~]# zfs list
 NAME          USED  AVAIL  REFER  MOUNTPOINT
 mgspool      2.05M   974M    19K  /mgspool
 mgspool/mgt  1.97M   974M  1.97M  /mgspool/mgt

 [root@rh7z-mds1 ~]# zpool status
   pool: mgspool
  state: ONLINE
   scan: none requested
 config:

         NAME        STATE     READ WRITE CKSUM
         mgspool     ONLINE       0     0     0
           mirror-0  ONLINE       0     0     0
             sda     ONLINE       0     0     0
             sdc     ONLINE       0     0     0

 errors: No known data errors

 [root@rh7z-mds1 ~]# zfs unmount mgspool/mgt
 cannot unmount 'mgspool/mgt': not currently mounted

 [root@rh7z-mds1 ~]# zfs unmount /lustre/mgt
 cannot unmount '/lustre/mgt': not a ZFS file system

In the example, the MGS is up and running on the host, and the MGT storage is formatted as a ZFS dataset in a mirrored zpool. The service is online and the storage is mounted as a Lustre file system type. When an attempt is made to use the zfs unmount command to unmount the volume, the command fails regardless of whether the dataset name or the mount point is used as the reference to the storage volume.

These examples are provided to reinforce the point that many of the Lustre server management tools are the same whether LDISKFS or ZFS is used for the underlying storage. Of course there are storage-level differences, but where possible, the Lustre tools are common to both storage target formats.