Lustre 101

From Lustre Wiki
Revision as of 08:50, 30 April 2026 by Rfmohr

Lustre 101 Course Series (Lustre Administration Essentials)

NOTE: These slides were originally hosted by ORNL in 2016 as part of a self-paced, web-based course series designed to provide an introduction to Lustre. The course was targeted at experienced system administrators who were relatively new to Lustre. Although the content may be old, some of the information and concepts presented in this course are still relevant to newer versions of Lustre.

  • Introduction to Lustre - This presentation provides a general overview of the Lustre file system for anyone wanting to learn more about basic Lustre functionality, features, and architecture. The basic components of Lustre are discussed, including the LNet transport layer. Information about Lustre file striping is also included.
  • Hardware Selection and Benchmarking for Lustre - This presentation covers topics relevant to the process of hardware selection for a Lustre file system. Recommendations for servers, clients, and networking are provided. In addition, general concepts for benchmarking a Lustre file system are discussed, and a list of some useful tools is provided.
  • Basic Lustre Installation and Setup from Stock RPMs - This presentation illustrates how to set up a simple Lustre file system using the stock Lustre RPM packages. The process for installing the RPM packages is covered, along with a description of the configuration files that are needed by Lustre. Options for formatting and mounting the Lustre storage are also covered.
  • Creating a Lustre Test System from Source with Virtual Machines - This presentation describes how to build a small Lustre file system using virtual machines that can be used as a test platform for anyone wanting to experiment with Lustre. Rather than use the stock Lustre RPM packages, the process of building Lustre from source code will be demonstrated.
  • Lustre Tuning and Advanced LNet Configuration - This presentation discusses several of the Lustre kernel modules and their performance tuning options, as well as tuning options for Lustre servers and clients. Examples of more complex LNet configurations are also illustrated.
  • File System Administration and Monitoring - This presentation covers some basic Lustre file system administration tasks such as starting and stopping a Lustre file system, mounting the file system on a client node, and usage reporting. An overview of several useful monitoring tools is also presented.
  • Analysis of Crash Dumps and Log Files - This presentation discusses how to gather diagnostic information for Lustre using kernel crash dumps and log files. An overview of crash dumps is given, including the steps needed to generate dumps and use them to analyze the cause of Lustre kernel module exceptions (also known as LBUGs). The analysis of system logs and Lustre-specific logs to identify problems is also covered.
  • Evaluating the Functionality, Performance, and Reliability of Lustre - This presentation describes the methods used by the Oak Ridge Leadership Computing Facility (OLCF) to evaluate the functionality, performance, and reliability of new Lustre versions before deploying them into production.
  • Lustre and Memory
  • Lustre Over Long-Haul Connections Using LNet Routers
  • Working with Problematic Nodes
  • Recovery and Eviction

Acknowledgments: The Lustre 101 course series was developed by the Computational Research and Development Programs at Oak Ridge National Laboratory (ORNL), with support from the U.S. Department of Defense and the Oak Ridge Leadership Computing Facility (OLCF). OLCF is supported by the Office of Science of the U.S. Department of Energy.

Lustre Ecosystem Tutorials

The following tutorials were presented at the International Workshop on the Lustre Ecosystem held in 2015 and 2016:

2015

2016