File:LUG2019-Unscratching Lustre-Harr.pdf

File systems, especially complex, parallel ones, take many years to mature. Given the risky behavior inherent in adolescence, these young file systems are often used as “scratch” file systems, holding only non-critical or easily reproducible data so that loss or corruption from unforeseen bugs does limited harm. Now starting its third decade of life since its conception at Carnegie Mellon University in 1999, Lustre is striving to leave its teenage years behind and present itself in a responsible and mature fashion.

As one of the original funders of Lustre and its first production user, back in 2003, Lawrence Livermore National Laboratory’s Livermore Computing (LC) has a long and close relationship with the file system and an interest in seeing it mature further. To that end, in the second half of 2018, coinciding with the retirement of many older Lustre file systems, LC began the “un-scratching” of Lustre: the migration of multiple production “scratch” Lustre file systems to persistent, non-scratch, non-purgeable file systems.

This presentation first addresses the state of Lustre in LC through the first half of 2018. It then details the rationale behind this change from both a user and an administrative perspective. Next, it covers some of the mechanics, results, and yes, even a bit of the politics involved in the conversion. The presentation then addresses the current state of the production Lustre file systems in LC, including the implementation of user quotas and the takeaways from that experience. Finally, it touches on what this change means for the future, specifically with regard to refreshing the hardware underlying Lustre.