File:LUG2019-Smart Policies Data Placement Tiering Lustre-LiXi.pdf

In massive storage systems, it is increasingly common to see heterogeneous media used side by side. Different types of mechanical hard disks, SSDs and NVMe devices can all be attached as storage media in a single Lustre file system with a unified namespace. These devices differ in capacity, latency, bandwidth, reliability, cost and so on. A major challenge for the Lustre file system is how to give users the support they need to get the maximum benefit out of these differing characteristics.

The Lustre OST/MDT pool mechanism provides a natural basis for supporting heterogeneous storage devices. OST/MDT pools can be used to classify and isolate storage devices logically according to their specifications. However, building a sophisticated and complete data management solution for a file system with different storage pools requires more than pools alone: OST/MDT pools need to be complemented with further mechanisms, including policies and tools for data placement and movement.

One improvement to OST/MDT pools that we have recently been working on (LU-11234) is a Data Placement Policy (DPP) mechanism. DPP enables users to define rules that determine which pools newly created files will be placed on. The rules can be based on UID, GID, JobID, file name, and expressions that combine these attributes. By configuring suitable rules in DPP, administrators of a Lustre file system with different storage types gain better control over how storage space and bandwidth are allocated. DPP is useful for the following use cases: (1) Allocating higher-performance storage space to critical jobs or workflows. High-performance pools usually cost more and therefore should not be occupied by low-priority jobs; DPP lets users define rules that direct this space and bandwidth to high-priority jobs. (2) Controlling priorities between different users and groups. With rules based on UID/GID, files created by higher-priority users or groups are placed on the fast storage pool. (3) Using different storage types for files with different I/O patterns. In many cases the I/O pattern can be inferred from the filename extension. DPP offers a way to split and isolate I/O with different patterns, which may improve the overall efficiency of the file system.
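The rule matching described above can be illustrated with a minimal Python sketch. This is not the DPP implementation or its rule syntax, which the abstract does not specify; it only models the idea of choosing a pool for a new file from its UID, GID, JobID and file name. All names here (FileAttrs, Rule, choose_pool, the pool names, the IDs and the patterns) are hypothetical, and the first-match-wins ordering and the fallback to a default pool are assumptions made for the example.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, Sequence


@dataclass(frozen=True)
class FileAttrs:
    """Attributes of a file being created (hypothetical model)."""
    uid: int
    gid: int
    jobid: str
    name: str


Predicate = Callable[[FileAttrs], bool]


# Basic predicates on a single attribute.
def uid_is(uid: int) -> Predicate:
    return lambda f: f.uid == uid


def gid_is(gid: int) -> Predicate:
    return lambda f: f.gid == gid


def jobid_matches(pattern: str) -> Predicate:
    return lambda f: fnmatchcase(f.jobid, pattern)


def name_matches(pattern: str) -> Predicate:
    return lambda f: fnmatchcase(f.name, pattern)


# Combinators that build expressions from other predicates.
def all_of(*preds: Predicate) -> Predicate:
    return lambda f: all(p(f) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    return lambda f: any(p(f) for p in preds)


@dataclass(frozen=True)
class Rule:
    predicate: Predicate
    pool: str


def choose_pool(rules: Sequence[Rule], attrs: FileAttrs,
                default_pool: str = "hdd_pool") -> str:
    """Return the pool of the first matching rule (assumed semantics)."""
    for rule in rules:
        if rule.predicate(attrs):
            return rule.pool
    return default_pool


# Example rule set mirroring the three use cases in the text.
RULES = [
    # Critical workflow: a given group running jobs named "climate.*".
    Rule(all_of(gid_is(500), jobid_matches("climate.*")), "nvme_pool"),
    # Higher-priority user.
    Rule(uid_is(1001), "ssd_pool"),
    # I/O pattern inferred from the filename extension.
    Rule(any_of(name_matches("*.h5"), name_matches("*.ckpt")), "ssd_pool"),
]

if __name__ == "__main__":
    new_file = FileAttrs(uid=2000, gid=500, jobid="climate.42", name="a.dat")
    print(choose_pool(RULES, new_file))
```

Expressing each rule as a predicate plus a target pool keeps combined expressions (AND/OR over attributes) composable, which is the property the abstract attributes to DPP rules; how the real mechanism orders or evaluates rules may differ.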

Besides the internals and use cases of DPP, the presentation will introduce how DPP can be used together with existing and upcoming Lustre features and tools for better data management, space allocation and quality of service in a Lustre file system with multiple storage tiers, including: data placement and migration between tiers, together with parallel data migration tools; QoS control of OST/MDT pools, together with NRS TBF policies; space management of OST/MDT pools, together with Pool Quota; and multi-level tiering that uses LPCC as a burst buffer, heterogeneous OST/MDT pools for hot/cold data, and HSM for archiving.