Handling Full OSTs

(Updated: Oct 2009)

Sometimes a Lustre™ file system becomes unbalanced, often because stripe settings have been changed. If an OST is full, any write that needs space on that OST fails with an error. The procedures below describe how to handle a full OST.

Checking File System Usage
The example below shows an unbalanced file system:

[root@LustreClient01 ~]# lfs df -h
UUID                 bytes    Used  Available  Use%  Mounted on
lustre-MDT0000_UUID   4.4G  214.5M       3.9G    4%  /mnt/lustre[MDT:0]
lustre-OST0000_UUID   2.0G  751.3M       1.1G   37%  /mnt/lustre[OST:0]
lustre-OST0001_UUID   2.0G  755.3M       1.1G   37%  /mnt/lustre[OST:1]
lustre-OST0002_UUID   2.0G    1.7G     155.1M   86%  /mnt/lustre[OST:2] <-
lustre-OST0003_UUID   2.0G  751.3M       1.1G   37%  /mnt/lustre[OST:3]
lustre-OST0004_UUID   2.0G  747.3M       1.1G   37%  /mnt/lustre[OST:4]
lustre-OST0005_UUID   2.0G  743.3M       1.1G   36%  /mnt/lustre[OST:5]

filesystem summary:  11.8G    5.4G       5.8G   45%  /mnt/lustre
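On a file system with many OSTs, spotting the full one by eye gets tedious. The shell function below is a minimal sketch that filters lfs df output and prints each OST whose Use% is at or above a threshold. The function name find_full_osts and the default 80% threshold are illustrative assumptions, and the parser assumes the column layout shown in the example above (Use% in the fifth column).

```shell
# find_full_osts [THRESHOLD]
# Hypothetical helper: reads "lfs df" output on stdin and prints the UUID
# and Use% of every OST row whose Use% is >= THRESHOLD (default 80).
# Assumes the columns shown above: UUID, bytes, Used, Available, Use%, Mounted on.
find_full_osts() {
    local threshold="${1:-80}"
    awk -v limit="$threshold" '
        # Only OST rows; the header, MDT rows and summary line are skipped.
        $1 ~ /-OST[0-9a-f]+_UUID$/ {
            pct = $5
            sub(/%$/, "", pct)          # "86%" -> "86"
            if (pct + 0 >= limit + 0)
                print $1, $5
        }'
}
```

For example, lfs df -h /mnt/lustre | find_full_osts 80 would report only lustre-OST0002_UUID 86% for the output shown above.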

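In this case, OST:2 is almost full. The example below sets the default striping on /mnt/lustre to a 4 MB stripe size, starting OST index 0 and a stripe count of -1 (stripe over all OSTs), so every new file has a stripe on OST:2. When an attempt is made to write additional data, the write fails once OST:2 runs out of space: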

[root@LustreClient01 ~]# lfs setstripe /mnt/lustre 4M 0 -1
[root@LustreClient01 ~]# dd if=/dev/zero of=/mnt/lustre/test_3 bs=10M count=100
dd: writing `/mnt/lustre/test_3': No space left on device
98+0 records in
97+0 records out
1017192448 bytes (1.0 GB) copied, 23.2411 seconds, 43.8 MB/s

Disabling MDS Object Creation on OST
To keep the file system usable, disable object creation on the full OST. Do this on every MDS node, because the MDS allocates OST objects for new files.

1. Log in to the MDS server:

[root@LustreClient01 ~]# ssh root@mds01
root@mds01's password:
Last login: Wed Nov 26 13:35:12 2008 from LustreClient01

2. Use the lctl set_param command to disable object creation on the OST:

[root@mds01 ~]# lctl set_param osp.testfs-OST0002-*.max_create_count=0
osp.testfs-OST0002-MDT0000.max_create_count=0
osp.testfs-OST0002-MDT0001.max_create_count=0

The MDS connections to OST0002 (one per MDT, as the output shows) no longer create objects on that OST. In this example the file system is named testfs; substitute the name of your own file system. Repeat this step on every other MDS node. New files written to the file system are now created successfully, because their stripes are allocated across the remaining OSTs.

3. Once the OST is no longer full (for example, after objects have been deleted or migrated off it), enable object creation on it again:

[root@mds01 ~]# lctl set_param osp.testfs-OST0002-*.max_create_count=20000
osp.testfs-OST0002-MDT0000.max_create_count=20000
osp.testfs-OST0002-MDT0001.max_create_count=20000
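Steps 2 and 3 are the same lctl set_param command with a different max_create_count value, and it has to be run on each MDS node. The sketch below builds that command from a file system name and an OST index, then runs it over ssh on every MDS host given. The helper names, root ssh access and host names such as mds02 are assumptions for illustration. It also assumes, following Lustre's target naming, that the OST index appears in the name as four hexadecimal digits (OST0002, OST000a).

```shell
# ost_create_cmd FSNAME OST_INDEX COUNT
# Hypothetical helper: prints (does not run) the lctl command that sets
# max_create_count for one OST on all local MDT connections.
# COUNT=0 stops new object creation; 20000 is the value used above to re-enable it.
ost_create_cmd() {
    local fsname="$1" index="$2" count="$3"
    # %04x renders the index as four hex digits: 2 -> 0002, 10 -> 000a.
    # The parameter is single-quoted so the remote shell does not glob the '*'.
    printf "lctl set_param 'osp.%s-OST%04x-*.max_create_count=%s'\n" \
        "$fsname" "$index" "$count"
}

# set_ost_create_all_mds FSNAME OST_INDEX COUNT HOST...
# Hypothetical helper: runs the command on each MDS host over ssh as root,
# stopping at the first host where it fails.
set_ost_create_all_mds() {
    local fsname="$1" index="$2" count="$3"
    shift 3
    local cmd host
    cmd=$(ost_create_cmd "$fsname" "$index" "$count")
    for host in "$@"; do
        echo "== $host: $cmd"
        ssh "root@$host" "$cmd" || return 1
    done
}
```

For example, set_ost_create_all_mds testfs 2 0 mds01 mds02 disables object creation on OST0002, and the same call with 20000 re-enables it. Keeping the string builder separate from the ssh loop makes it easy to print and check the command before running it.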

Migrating Data within a File System
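Because existing stripes cannot be moved within the file system, data must be migrated manually: copy the file to a temporary name, remove the original file, and rename the copy to the original file name. Do this while object creation on the full OST is disabled, so that the objects for the new copy are allocated on other OSTs.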
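The copy, remove and rename sequence can be wrapped in a small function. This is a minimal sketch: the name migrate_file is hypothetical, and the .tmp suffix matches the example below. It refuses to overwrite an existing temporary file and removes the original only after cmp confirms the copy is identical, which means the file is read twice. It assumes no application has the file open while it runs. The copy is a new file, so it takes the default striping of its directory rather than the original file's layout.

```shell
# migrate_file PATH
# Hypothetical helper: copy PATH to PATH.tmp, verify the copy, remove the
# original, then rename the copy back to PATH.
migrate_file() {
    local src="$1"
    local tmp="$src.tmp"
    if [ ! -f "$src" ]; then
        echo "migrate_file: not a regular file: $src" >&2
        return 1
    fi
    if [ -e "$tmp" ]; then
        echo "migrate_file: temporary name already exists: $tmp" >&2
        return 1
    fi
    # The copy is a new file, so the MDS allocates fresh OST objects for it.
    # -p keeps mode, timestamps and (when run as root) ownership.
    if ! cp -p "$src" "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # Verify the copy before the original is removed.
    if ! cmp -s "$src" "$tmp"; then
        echo "migrate_file: copy differs from original: $src" >&2
        rm -f "$tmp"
        return 1
    fi
    rm -f "$src" && mv "$tmp" "$src"
}
```

For example, migrate_file /mnt/lustre/test_2 performs the copy, removal and rename from steps 2 and 4 of the procedure below in a single call.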

1. Identify the file(s) to be moved. In the example below, the output of lfs getstripe shows that the file test_2 is located entirely on OST0002 (obdidx 2):

[root@LustreClient01 ~]# lfs getstripe /mnt/lustre/test_2
OBDS:
0: lustre-OST0000_UUID ACTIVE
1: lustre-OST0001_UUID ACTIVE
2: lustre-OST0002_UUID ACTIVE
3: lustre-OST0003_UUID ACTIVE
4: lustre-OST0004_UUID ACTIVE
5: lustre-OST0005_UUID ACTIVE
/mnt/lustre/test_2
     obdidx     objid     objid     group
          2         8       0x8         0

2. Copy the file(s) to a temporary name and remove the original(s):

[root@LustreClient01 ~]# cp /mnt/lustre/test_2 /mnt/lustre/test_2.tmp
[root@LustreClient01 ~]# rm /mnt/lustre/test_2
rm: remove regular file `/mnt/lustre/test_2'? Y

3. Check the file system balance. The lfs df output below shows a more balanced file system than the output in Checking File System Usage.

[root@LustreClient01 ~]# lfs df -h
UUID                 bytes    Used  Available  Use%  Mounted on
lustre-MDT0000_UUID   4.4G  214.5M       3.9G    4%  /mnt/lustre[MDT:0]
lustre-OST0000_UUID   2.0G    1.3G     598.1M   65%  /mnt/lustre[OST:0]
lustre-OST0001_UUID   2.0G    1.3G     594.1M   65%  /mnt/lustre[OST:1]
lustre-OST0002_UUID   2.0G  913.4M    1000.0M   45%  /mnt/lustre[OST:2]
lustre-OST0003_UUID   2.0G    1.3G     602.1M   65%  /mnt/lustre[OST:3]
lustre-OST0004_UUID   2.0G    1.3G     606.1M   64%  /mnt/lustre[OST:4]
lustre-OST0005_UUID   2.0G    1.3G     610.1M   64%  /mnt/lustre[OST:5]

filesystem summary:  11.8G    7.3G       3.9G   61%  /mnt/lustre

4. Rename the copy back to the original file name so that clients can find it:

[root@LustreClient01 ~]# mv /mnt/lustre/test_2.tmp /mnt/lustre/test_2
[root@LustreClient01 ~]# ls /mnt/lustre
test1  test_2  test3  test_3  test4  test_4  test_x

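5. Re-enable the OST on the MDS for further writes. If object creation was disabled with max_create_count, restore it as shown in step 3 of Disabling MDS Object Creation on OST. If the OST's OSC device on the MDS was instead deactivated, reactivate that device by number; in the lctl dl listing below, lustre-OST0002-osc is device 7: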

[root@mds ~]# lctl --device 7 activate

[root@mds ~]# lctl dl
  0 UP mgs MGS MGS 9
  1 UP mgc MGC192.168.0.10@tcp e384bb0e-680b-ce25-7bc9-81655dd1e813 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
  4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 5
  5 UP osc lustre-OST0000-osc lustre-mdtlov_UUID 5
  6 UP osc lustre-OST0001-osc lustre-mdtlov_UUID 5
  7 UP osc lustre-OST0002-osc lustre-mdtlov_UUID 5
  8 UP osc lustre-OST0003-osc lustre-mdtlov_UUID 5
  9 UP osc lustre-OST0004-osc lustre-mdtlov_UUID 5
 10 UP osc lustre-OST0005-osc lustre-mdtlov_UUID
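Step 1 of the procedure above, checking which OSTs a file's stripes are on, can also be scripted. The sketch below reads lfs getstripe output, prints the obdidx value of each stripe, and tests whether all stripes are on one OST. The function names are hypothetical. The parser assumes the output layout shown in step 1, with stripe rows following the obdidx header line; other Lustre versions may format this output differently.

```shell
# ost_indexes
# Hypothetical helper: reads "lfs getstripe" output for one file on stdin
# and prints the OST index (obdidx column) of each stripe, one per line.
ost_indexes() {
    awk '
        /^[ \t]*obdidx/ { in_table = 1; next }   # stripe rows start after this header
        in_table && $1 ~ /^[0-9]+$/ { print $1 }'
}

# only_on_ost INDEX
# Hypothetical helper: succeeds only if the input lists at least one stripe
# and every stripe is on OST INDEX.
only_on_ost() {
    ost_indexes | awk -v want="$1" '
        { n++; if ($1 != want) bad = 1 }
        END { if (n > 0 && !bad) exit 0; exit 1 }'
}
```

For example, lfs getstripe /mnt/lustre/test_2 | only_on_ost 2 succeeds for the file in step 1, whose only stripe is on obdidx 2.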