File:LUG2019-Robinhood Reports-Hall.pdf

For many Lustre sites, file system purging is not an option. At these sites, the main way to manage the data on the file system is to rely on users to clean up on their own. This is frequently a losing battle.

Finding data on a file system is also often a difficult task, especially when wading through millions or billions of files. Thankfully, Robinhood allows us to maintain a replica of our file system metadata in a database, so we have the foundation necessary for users to find their data easily. The trouble is that Robinhood offers only a handful of ways to retrieve that data, and those methods typically aren’t detailed enough for the typical user or easy for them to digest.

That’s where we have filled the gap. To give our users a simple view into their data, we developed robinhood-reports, a new web front end for Robinhood databases that simplifies how users search for their data by providing reports tailored to users’ needs. Currently, robinhood-reports includes the following preconfigured reports:

- Summary – summarizes all the file systems being monitored by Robinhood
- Detailed – breaks down the summary on a per-user basis
- Age Histogram – age histogram of a user’s data
- Big Directories – directories with the largest number of files contained immediately within
- Interesting Files – files with interesting “extensions”, such as scripts and source code; requires Robinhood file class configuration
- Print Files – files matching a certain pattern, typically used for log files from jobs
- Largest Directories – directories with the largest total size of files contained immediately within
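
To make the report definitions concrete, here is a minimal Python sketch of the aggregation behind an age histogram and of the difference between Big Directories (ranked by file count) and Largest Directories (ranked by total size). It is not taken from robinhood-reports and does not query a Robinhood database: the record layout, the function names, the sample paths, and the bucket edges are all assumptions made for illustration.

```python
from collections import Counter
from posixpath import dirname

DAY = 86400  # seconds per day

def age_histogram(records, owner, now, edges_days=(30, 90, 180, 365)):
    """Count one owner's files per age bucket (days since last modification).

    Each record is a hypothetical (path, owner, size_bytes, mtime_epoch) tuple.
    A label such as "<90d" means "at least 30 days but less than 90 days old".
    """
    labels = [f"<{d}d" for d in edges_days] + [f">={edges_days[-1]}d"]
    counts = {label: 0 for label in labels}
    for _path, rec_owner, _size, mtime in records:
        if rec_owner != owner:
            continue
        age_days = (now - mtime) / DAY
        for edge, label in zip(edges_days, labels):
            if age_days < edge:
                counts[label] += 1
                break
        else:
            # Older than every edge: falls into the final open-ended bucket.
            counts[labels[-1]] += 1
    return counts

def directory_rankings(records, top=3):
    """Rank directories by their immediate children only (no recursion).

    Returns (by_file_count, by_total_size), each a list of (directory, value).
    """
    file_counts = Counter()
    total_sizes = Counter()
    for path, _owner, size, _mtime in records:
        parent = dirname(path)
        file_counts[parent] += 1
        total_sizes[parent] += size
    return file_counts.most_common(top), total_sizes.most_common(top)

# Made-up sample data; a real report would read rows from the database.
SAMPLE_NOW = 1000 * DAY
SAMPLE_RECORDS = [
    ("/lustre/alice/run1/job.log", "alice", 10, SAMPLE_NOW - 5 * DAY),
    ("/lustre/alice/run1/out.dat", "alice", 5000, SAMPLE_NOW - 100 * DAY),
    ("/lustre/alice/run1/in.dat", "alice", 20, SAMPLE_NOW - 400 * DAY),
    ("/lustre/alice/big/core", "alice", 900000, SAMPLE_NOW - 40 * DAY),
    ("/lustre/bob/a.sh", "bob", 1, SAMPLE_NOW - 1 * DAY),
]

if __name__ == "__main__":
    print(age_histogram(SAMPLE_RECORDS, "alice", SAMPLE_NOW))
    by_count, by_size = directory_rankings(SAMPLE_RECORDS)
    print("most files:", by_count[0])
    print("most bytes:", by_size[0])
```

In the sample data the directory holding the most files is not the one holding the most bytes, which is why the two directory reports are worth keeping separate.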