Lustre Management Service (MGS)

The Management Service (MGS) is a global resource that can support multiple file systems in a service domain. It stores configuration information for one or more Lustre file systems in a cluster and provides this information to other Lustre hosts. Servers and clients connect to the MGS on startup to retrieve the configuration log for the file system. The MGS also distributes notifications of changes to a file system’s configuration, including server restarts. The MGS stores its data on a block device called the management target (MGT).

The MGS has very modest resource requirements compared to the Metadata Service (MDS) and the Object Storage Service (OSS), because it is neither compute- nor storage-intensive. The MGT needs only around 100-200 MiB of storage, and any physical storage allocation is likely to far exceed this. In practice, therefore, the minimum requirement is a fault-tolerant storage device, ideally a two-disk RAID 1 mirror. Some storage arrays support the creation of small virtual disks (vdisks) from a larger physical array configuration. For example, one might create a physical storage array consisting of 24 disks in a RAID 10 configuration and split it into two vdisks: a small vdisk for the MGT (perhaps 1 GB), with the remainder allocated to, for example, MDT0. Each vdisk must be presented to the servers as an independent block device, so that each can be managed independently for failover.
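The vdisk split in the example above can be worked through as simple arithmetic. The following is a minimal sketch, not a sizing tool: the 960 GB disk size and the function names are assumptions made for illustration, and real arrays reserve some capacity for their own metadata.

```python
def raid10_usable_gb(disk_count, disk_size_gb):
    """Usable capacity of a RAID 10 set: half the disks hold mirror copies."""
    if disk_count < 4 or disk_count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    return (disk_count // 2) * disk_size_gb


def split_vdisks(usable_gb, mgt_gb=1):
    """Carve a small MGT vdisk out of the array; the remainder goes to MDT0."""
    if mgt_gb >= usable_gb:
        raise ValueError("MGT vdisk must be smaller than the usable capacity")
    return {"MGT": mgt_gb, "MDT0": usable_gb - mgt_gb}


# Hypothetical array: 24 disks of 960 GB each (disk size is an assumption).
usable = raid10_usable_gb(24, 960)
layout = split_vdisks(usable, mgt_gb=1)
print(usable, layout)
```

Even a 1 GB MGT vdisk is several times larger than the 100-200 MiB the MGT needs, and it takes a negligible share of the array away from MDT0.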

The MGS can run on a standalone server but, as with all Lustre services, if that host fails the MGS will be offline until the host is restored. Lustre servers and clients will be unable to start while the MGS is offline, although services that are already running will continue to work. In addition, Lustre nodes will be unable to receive notifications of changes in the Lustre configuration while the MGS is unavailable. The MGS is therefore most often deployed in an HA failover configuration, with a small shared storage device or volume that can be mounted on more than one host. Because the MGS consumes only a small amount of a server’s resources, it is unusual to create an HA cluster that contains only the MGS. Instead, the MGS is typically integrated into a failover HA cluster framework with the root MDS (i.e., the metadata service for MDT0, which is the storage target that contains the root of a Lustre file system namespace).

For truly flexible, high-availability configurations, where resources are somewhat autonomous and can be managed independently, the MGT storage should be allocated to a device or volume that is independent of the MDT storage, from the perspective of the OS (although both MGT and MDT might be in the same physical storage enclosure). The intention is that each service can migrate independently between hosts in a high availability server configuration.

For ldiskfs-based storage, this generally means that the MGT and MDT are contained on separate LUNs or vdisks in a storage array, and for ZFS, the MGT and MDT should be in separate zpools.

When using ZFS for the MGT and MDT storage in a high availability configuration, do not configure the MGT and MDT as datasets in the same pool. The pool can only be imported onto one host at a time, which will prevent the services from running on separate hosts, and will not allow independent service migration or failover.
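The pool constraint described above can be checked mechanically before deployment. The sketch below assumes each target is described by a ZFS dataset name in `pool/dataset` form; the pool and dataset names and the `shared_pools` helper are hypothetical and are not part of any Lustre or ZFS tool.

```python
from collections import defaultdict


def shared_pools(targets):
    """Return {pool: [target names]} for every pool that holds more than one target.

    `targets` maps a target name (e.g. "MGT") to a dataset name "pool/dataset".
    """
    by_pool = defaultdict(list)
    for name, dataset in targets.items():
        pool = dataset.split("/", 1)[0]  # the pool is the first path component
        by_pool[pool].append(name)
    return {pool: sorted(names) for pool, names in by_pool.items() if len(names) > 1}


# Hypothetical layouts; all pool and dataset names are illustrative only.
bad_layout = {"MGT": "metapool/mgt", "MDT0": "metapool/mdt0"}
good_layout = {"MGT": "mgspool/mgt", "MDT0": "mdt0pool/mdt0"}

print(shared_pools(bad_layout))   # MGT and MDT0 would have to move together
print(shared_pools(good_layout))  # empty: each target can fail over on its own
```

Any pool reported by the check ties its targets to a single host, because the pool is the unit that is imported and exported during failover.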

Refer to the Metadata server cluster diagram for a typical high availability MGS and MDS server cluster configuration.

Only one MGS can run on a node at any one time. It is therefore not possible to configure multiple MGTs in the same HA cluster: even if the services initially start on separate nodes, a failover will leave both on the same host. Incoming client connections will then be unable to determine which service to connect to and may be connected arbitrarily, so that registration and other configuration data is split unpredictably across the competing resources. This could corrupt the configuration on both targets. Lustre has no specific requirement for multiple MGSs; one MGS will often suffice for many file systems in a subnet.
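The failover collision described above can be illustrated with a toy placement model. This is a sketch under simplifying assumptions, not a model of any real HA framework: each resource has an ordered list of candidate nodes and runs on the first one that is still up. The resource names, node names, and both functions are hypothetical.

```python
from collections import defaultdict


def placement_after_failure(resources, failed_node=None):
    """Place each resource on its first candidate node that has not failed.

    `resources` maps a resource name to an ordered list of candidate nodes.
    """
    placement = {}
    for name, nodes in resources.items():
        survivors = [n for n in nodes if n != failed_node]
        placement[name] = survivors[0] if survivors else None
    return placement


def nodes_with_multiple_mgs(placement):
    """Return {node: [MGS resources]} for nodes that would host more than one MGS."""
    per_node = defaultdict(list)
    for name, node in placement.items():
        if node is not None and name.startswith("MGS"):
            per_node[node].append(name)
    return {node: sorted(names) for node, names in per_node.items() if len(names) > 1}


# Hypothetical two-node HA cluster with two MGS resources (not a valid design).
cluster = {"MGS-a": ["node1", "node2"], "MGS-b": ["node2", "node1"]}

healthy = placement_after_failure(cluster)
degraded = placement_after_failure(cluster, failed_node="node1")

print(nodes_with_multiple_mgs(healthy))   # no collision while both nodes are up
print(nodes_with_multiple_mgs(degraded))  # both MGS resources land on node2
```

The layout looks valid while both nodes are healthy, which is why the problem only surfaces after the first failover.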