Robinhood Policy Engine
Introduction
The Robinhood Policy Engine is an open-source tool that assists in the management of large file systems. It keeps an up-to-date copy of the file system metadata in a MySQL database and understands the `hsm state` of Lustre files. Based on that state, it can perform actions specified by a policy, such as "archive all modified files to the HSM storage". These capabilities make it an integral component of many Lustre HSM solutions.
Mailing Lists
- robinhood-news: announcement of new releases.
- robinhood-support: user discussions and support.
- robinhood-devel: design and coding discussions.
Overview
Robinhood supports any POSIX file system, but implements advanced features specifically for Lustre file systems. It is distributed under the CeCILL-C license (LGPL-compatible).
Robinhood maintains a replica of the file system metadata in a MySQL/MariaDB database, which provides an overall view of the file system without the overhead of scanning it directly. Building on this replica, it provides the following main features:
- Policy Engine: Schedule actions on file system entries according to admin-defined criteria, based on entry attributes (age, size, path, owner, Lustre-specific attributes, etc.).
- User/group usage accounting: File size profiling and top-consumer reporting.
- Fast find and du clones: rbh-find and rbh-du query the metadata database instead of the file system for near-instant results.
- Customizable alerts: Trigger notifications on file system entries meeting specified conditions.
- Lustre-aware: Aware of Lustre OSTs and MDTs; can read MDT changelogs for near-real-time database updates.
- Lustre/HSM integration: Schedule file archive, release, and remove actions through Lustre's Hierarchical Storage Management (HSM) framework.
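To illustrate why a metadata replica makes reporting cheap, here is a minimal Python sketch of a top-consumers query that runs against a database rather than the file system. It uses an in-memory SQLite database as a stand-in for MySQL/MariaDB. The `entries` table, its columns, the helper names, and the sample rows are all hypothetical; they are not Robinhood's actual schema or API.

```python
import sqlite3


def build_sample_db():
    """Create a tiny in-memory metadata replica (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries (path TEXT PRIMARY KEY, owner TEXT, size INTEGER)"
    )
    conn.executemany(
        "INSERT INTO entries VALUES (?, ?, ?)",
        [
            ("/lustre/a/run1.dat", "alice", 4_000_000_000),
            ("/lustre/a/run2.dat", "alice", 1_000_000_000),
            ("/lustre/b/model.h5", "bob", 2_500_000_000),
            ("/lustre/c/notes.txt", "carol", 12_000),
        ],
    )
    return conn


def top_consumers(conn, limit=3):
    """Return (owner, total_bytes, file_count) rows, largest consumers first."""
    query = """
        SELECT owner, SUM(size) AS total_bytes, COUNT(*) AS file_count
        FROM entries
        GROUP BY owner
        ORDER BY total_bytes DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


if __name__ == "__main__":
    for owner, total, count in top_consumers(build_sample_db()):
        print(f"{owner:8s} {total:>14d} bytes in {count} file(s)")
```

The report is a single aggregate query over the replica, so its cost depends on the database rather than on walking every directory of the file system.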
Example: Using Robinhood to Archive Lustre Files with HSM
A common use case for Robinhood is driving Lustre/HSM to automatically archive files from Lustre to a backend storage system that is typically slower but cheaper, such as Azure Blob Storage or Amazon S3. The example below shows how this works.
First, include a policy file in your Robinhood configuration file:
    %include "includes/lhsm_archive.inc"
The `lhsm_archive.inc` policy file could contain, for example:
    lhsm_archive_rules {
        # Don't archive empty files or .tmp or .log files
        ignore { size == 0 or name == "*.tmp" or name == "*.log" }

        rule default {
            # Last modification to the file should be at least 1 hour ago
            condition { last_mod > 1h }
        }
    }

    lhsm_archive_trigger {
        trigger_on = scheduled;
        # Schedule archive checks/actions every 15 minutes
        check_interval = 15min;
    }
To activate the above policy:
    robinhood --run=lhsm_archive --target=all
You can then check HSM status across the file system using:
    rbh-report --status-info lhsm_archive
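The selection logic expressed by the `lhsm_archive_rules` block above can be sketched in Python as follows. This is only an illustration of how the `ignore` clause and the `default` rule combine, not Robinhood's implementation: the `Entry` class and the function names are hypothetical, and the sketch does not model HSM status or the trigger schedule.

```python
import fnmatch
import os
import time
from dataclasses import dataclass

ONE_HOUR = 3600  # seconds, mirrors "last_mod > 1h" in the rule


@dataclass
class Entry:
    """Hypothetical stand-in for one file system entry's metadata."""
    path: str
    size: int      # bytes
    mtime: float   # last modification time, seconds since the epoch


def is_ignored(entry):
    """Mirror the ignore clause: empty files, *.tmp and *.log names."""
    name = os.path.basename(entry.path)
    return (
        entry.size == 0
        or fnmatch.fnmatchcase(name, "*.tmp")
        or fnmatch.fnmatchcase(name, "*.log")
    )


def matches_default_rule(entry, now):
    """Mirror the default rule: last modified more than one hour ago."""
    return (now - entry.mtime) > ONE_HOUR


def select_for_archive(entries, now=None):
    """Return the paths that are not ignored and satisfy the default rule."""
    now = time.time() if now is None else now
    return [
        e.path
        for e in entries
        if not is_ignored(e) and matches_default_rule(e, now)
    ]
```

In this sketch the ignore check is evaluated first, so an ignored entry is skipped regardless of its age; a file modified within the last hour is simply left for a later pass.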
Papers and Presentations
- Robinhood User Group (RUG) event papers and presentations
- Improving overall Robinhood performance for use on large scale deployments (LUG 2016): video, slides
- Data Life Cycle Monitoring using RobinHood at Scale (LAD 2015): video, slides
- Robinhood v3 and Beyond (LAD 2015): video, slides
- Taking back control of HPC file systems with Robinhood Policy Engine (International Workshop on the Lustre Ecosystem, 2015): slides
- Using Robinhood to Purge Data from Lustre File Systems (Cray User Group, 2014): slides
- Robinhood Policy Engine (LUG 2013): video, slides