File:LUG2019-Flexible Lustre Management-Simmons.pdf

Over the years, Lustre has continued to grow in its feature set, and the HPC systems that deploy Lustre have continued to grow in size. The combination of these two factors has created an incredible burden for sites that must handle such systems. The largest cost comes from the complexity of configuring the file system to optimize performance, as well as from the day-to-day maintenance needed to keep the file system operating. Traditionally, file systems are not interactive stacks, which requires sites to develop novel techniques to gauge the state of the file system.

For Lustre to move into the Linux kernel source tree, certain requirements have to be met. Adopting those requirements has opened Lustre up to powerful new functionality. Some of these new approaches offer better performance and scalability, and adopting the new APIs also allows Lustre to integrate better with the standard OS software stack. This new functionality can ease the burden of configuring and maintaining a Lustre deployment of any size. In this presentation we will examine new ways to handle large-scale configuration, how Lustre can be monitored for state changes, and what administrative setups can be put in place to act on those changes. This opens up the potential for a cluster to manage its file system without direct administrative action. We can demonstrate the use of various tools in typical HPC environments that were never available before. We also explore potential new features, such as automated file system recovery or the ability for Lustre-aware utilities to be aware of Lustre events that occurred on another node.