ZFS MDT ENOSPC Recovery

As of version 2.10, Lustre on ZFS does not prevent users from filling up the dataset in which an MDT resides. Possible symptoms are that df -i shows no free inodes, or that mkdir and attempts to create new files fail with ENOSPC. This can happen for several reasons, including:


 * Enabled changelogs without an active reader consuming the entries created
 * Users creating many files and directories and exceeding the storage capacity of the pool
 * Lustre's internal update logs not being purged due to issues during recovery

This is problematic because Lustre and ZFS need some working space to delete files and directories. A completely full MDT will return ENOSPC in response to rmdir or rm, preventing the user from cleaning up.

Administrators who encounter this problem have a recovery option if they are using ZFS 0.6.5 or above. The method of recovery is independent of the Lustre version.

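ZFS normally holds back some space in the pool so that it cannot be consumed. The amount of reserved free space is controlled by spa_slop_shift: roughly 1/2^spa_slop_shift of the pool is reserved (1/32, or about 3.2%, at the default value of 5), so increasing the value shrinks the reservation and releases working space. See zfs-module-parameters(5) for details. Assuming the MDT is already started and the file system is mounted on a client, perform the following steps to recover.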

1. Coordinate with users or use permissions to prevent user access to the file system temporarily.

2. On the impacted MDS, record the current value of spa_slop_shift:

   mds$ cat /sys/module/zfs/parameters/spa_slop_shift

3. On the impacted MDS, increment the value of spa_slop_shift by one (writing the parameter requires root):

   mds$ echo $(( $(cat /sys/module/zfs/parameters/spa_slop_shift) + 1 )) > /sys/module/zfs/parameters/spa_slop_shift

4. On the impacted MDS, run zfs list. Record the working space made available under AVAIL for the dataset containing the MDT.

5. On the client, remove directories and files.

6. On the impacted MDS, run zfs list again. The difference between the current value under AVAIL and the value from step 4 is the space freed by your file and directory removals. Repeat step 5 until you have freed at least 20 MB.

7. On the impacted MDS, restore spa_slop_shift to the value recorded in step 2.
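
The AVAIL comparison in steps 4 and 6 can be scripted rather than read off by eye. The following is a minimal sketch, not a tested procedure: the dataset name mdtpool/mdt0 and both function names are hypothetical, and it assumes zfs list accepts -H (no header) and -p (exact byte values) on your ZFS version.

```shell
#!/bin/sh
# Sketch only. DATASET is a hypothetical name; substitute the dataset
# that actually contains the MDT.
DATASET="${DATASET:-mdtpool/mdt0}"

# Print AVAIL for a dataset in bytes (-H: no header, -p: exact numbers).
avail_bytes() {
    zfs list -H -p -o avail "$1"
}

# Print the growth from a first AVAIL reading ($1) to a second ($2),
# in whole MiB.
freed_mib() {
    echo $(( ($2 - $1) / 1048576 ))
}

# Intended use on the MDS:
#   before=$(avail_bytes "$DATASET")      # step 4
#   ... remove files from the client ...  # step 5
#   after=$(avail_bytes "$DATASET")       # step 6
#   [ "$(freed_mib "$before" "$after")" -ge 20 ] && echo "enough space freed"
```

Working in bytes avoids the rounding in the human-readable AVAIL column, which can hide small gains on a large pool.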

Once spa_slop_shift has been restored, Lustre should allow further purging from client mounts without any special steps. Restore user access when the available free space meets your local requirements.
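
Because forgetting to restore spa_slop_shift leaves the pool with a smaller safety margin, it can help to wrap steps 2, 3, and 7 in small helpers. This is a hedged sketch under stated assumptions: the function names and the save-file location /root/spa_slop_shift.orig are hypothetical, the helpers must run as root on the MDS, and slop_bytes is only a rough estimate that ignores any minimum reservation ZFS enforces.

```shell
#!/bin/sh
# Sketch only. PARAM is the module parameter path used in the steps above;
# SAVE is a hypothetical location for the recorded value.
PARAM="${PARAM:-/sys/module/zfs/parameters/spa_slop_shift}"
SAVE="${SAVE:-/root/spa_slop_shift.orig}"

# Step 2: record the current value, refusing to overwrite an earlier record.
record_shift() {
    if [ -e "$SAVE" ]; then
        echo "refusing to overwrite $SAVE" >&2
        return 1
    fi
    cat "$PARAM" > "$SAVE"
}

# Step 3: increment the value by one, which halves the reserved fraction.
raise_shift() {
    echo $(( $(cat "$PARAM") + 1 )) > "$PARAM"
}

# Step 7: write the recorded value back, then remove the record.
restore_shift() {
    cat "$SAVE" > "$PARAM" && rm -f "$SAVE"
}

# Rough estimate of the reserved bytes for a pool size ($1) and shift ($2).
slop_bytes() {
    echo $(( $1 >> $2 ))
}
```

As a worked example of the estimate, slop_bytes 1099511627776 5 gives the approximate reservation for a 1 TiB pool at shift 5, and passing 6 instead shows how much one increment releases.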

As of this writing, patches to improve this behavior are in progress; see https://jira.hpdd.intel.com/browse/LU-8856