Handling Full OSTs

<small>''(Updated: Mar 2018)''</small>
__TOC__
Sometimes the OSTs in a file system have unbalanced usage, either due to the addition of new OSTs, or because of user error such as explicitly specifying the same starting OST index (e.g. '''-i 0''') for a large number of files, or when creating a single large file on one OST.  If an OST is full and an attempt is made to write more information to that OST (e.g. extending an existing file), an error may occur. The procedures below describe how to handle a full OST.
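
For example, a command like the following (purely illustrative; the directory name is hypothetical) pins every file subsequently created in that directory to OST index 0, which can fill that OST while the others stay underused:

<pre>
# Force new files in this directory onto OST0000, with a single stripe each
[root@client01 ~]# lfs setstripe -i 0 -c 1 /mnt/testfs/results
</pre>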


== Checking File System Usage ==
The example below shows an unbalanced file system:


<pre>
[root@client01 ~]# lfs df -h
UUID                 bytes   Used  Available Use%  Mounted on
testfs-MDT0000_UUID  4.4G   214.5M   3.9G     4%   /mnt/testfs[MDT:0]
testfs-MDT0001_UUID  4.4G   144.5M   4.0G     4%   /mnt/testfs[MDT:1]
testfs-OST0000_UUID  2.0T   751.3G   1.1T    37%   /mnt/testfs[OST:0]
testfs-OST0001_UUID  2.0T   755.3G   1.1T    37%   /mnt/testfs[OST:1]
testfs-OST0002_UUID  2.0T     1.9T  55.1M    99%   /mnt/testfs[OST:2] <-
testfs-OST0003_UUID  2.0T   751.3G   1.1T    37%   /mnt/testfs[OST:3]
testfs-OST0004_UUID  2.0T   747.3G   1.1T    37%   /mnt/testfs[OST:4]
testfs-OST0005_UUID  2.0T   743.3G   1.1T    36%   /mnt/testfs[OST:5]

filesystem summary: 11.8T     5.5T   5.7T    46%  /mnt/testfs
</pre>


In this case, ''OST:2'' is almost full, and when an attempt is made to write additional information to the file system (with uniform striping over all the OSTs), the write command fails as follows:


<pre>
[root@client01 ~]# lfs setstripe -c -1 /mnt/testfs
[root@client01 ~]# dd if=/dev/zero of=/mnt/testfs/test_3 bs=10M count=100
dd: writing `/mnt/testfs/test_3': No space left on device
98+0 records in
97+0 records out
1017192448 bytes (1.0 GB) copied, 23.2411 seconds, 43.8 MB/s
</pre>

== Disabling MDS Object Creation on OST ==
To enable continued use of the file system, the full OST has to have object creation disabled.  This needs to be done on all MDS nodes, since the MDS allocates OST objects for new files.


1. As the root user on the MDS, use the ''lctl set_param'' command to disable object creation on the OST:

<pre>
[root@mds01 ~]# lctl set_param osp.testfs-OST0002-*.max_create_count=0
osp.testfs-OST0002-MDT0000.max_create_count=0
osp.testfs-OST0002-MDT0001.max_create_count=0
</pre>
The MDS connections to OST0002 will no longer create objects there. This process should be repeated for other MDS nodes if needed. If a new file is now written to the file system, the write will be successful as the stripes are allocated across the remaining active OSTs.
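
To confirm the change took effect, the parameter can be read back on each MDS (a quick check; the output format matches the ''set_param'' echoes above):

<pre>
# Read back the creation limit for OST0002 on this MDS
[root@mds01 ~]# lctl get_param osp.testfs-OST0002-*.max_create_count
osp.testfs-OST0002-MDT0000.max_create_count=0
osp.testfs-OST0002-MDT0001.max_create_count=0
</pre>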


2. Once the OST is no longer full (e.g. objects deleted or migrated off the full OST), it should be enabled again:
<pre>
[root@mds01 ~]# lctl set_param osp.testfs-OST0002-*.max_create_count=20000
osp.testfs-OST0002-MDT0000.max_create_count=20000
osp.testfs-OST0002-MDT0001.max_create_count=20000
</pre>


'''Note:''' for releases 2.10.6 and earlier, the '''create_count''' must also be set to a non-zero value:

<pre>
[root@mds01 ~]# lctl set_param osp.testfs-OST0002-*.create_count=128
osp.testfs-OST0002-MDT0000.create_count=128
osp.testfs-OST0002-MDT0001.create_count=128
</pre>

== Migrating Data within a File System ==
Data from existing files can be migrated to other OSTs using the ''lfs_migrate'' command.  This can be done either while the full OST is deactivated, as described above, or while the OST is still active (in which case the full OST will have a reduced, but not zero, chance of being used for new files).

1. Identify the file(s) to be moved. In the example below, ''lfs find'' locates files larger than 1TB with objects on OST2, and the ''getstripe'' output confirms that the file ''test_2'' is located entirely on OST2:


<pre>
[root@client01 ~]# lfs find /mnt/testfs -size +1T --ost 2
/mnt/testfs/test_2
[root@client01 ~]# lfs getstripe /mnt/testfs/test_2
/mnt/testfs/test_2
lmm_stripe_count:  1
lmm_stripe_size:   4194304
lmm_pattern:       1
lmm_layout_gen:    0
lmm_stripe_offset: 2
	obdidx		 objid		 objid		 group
	     2	       1424032	      0x15baa0	             0
</pre>


2. If the file is very large, it should also be striped across multiple OSTs. Use ''lfs_migrate'' to move the file(s) to new OSTs; the '''-c 4''' option restripes the file across four OSTs during the migration:


<pre>
[root@client01 ~]# lfs_migrate -c 4 /mnt/testfs/test_2
</pre>


3. Check the file system balance. The ''lfs df'' output in the example below shows a more balanced system compared to the output in [[#Checking File System Usage|Checking File System Usage]] above:

<pre>
[root@client01 ~]# lfs df -h
UUID                  bytes   Used Available Use% Mounted on
testfs-MDT0000_UUID   4.4G  214.5M      3.9G   4% /mnt/testfs[MDT:0]
testfs-MDT0001_UUID   4.4G  144.5M      4.0G   4% /mnt/testfs[MDT:1]
testfs-OST0000_UUID   2.0T    1.3T    598.1G  65% /mnt/testfs[OST:0]
testfs-OST0001_UUID   2.0T    1.3T    594.1G  65% /mnt/testfs[OST:1]
testfs-OST0002_UUID   2.0T  913.4G   1000.0G  45% /mnt/testfs[OST:2]
testfs-OST0003_UUID   2.0T    1.3T    602.1G  65% /mnt/testfs[OST:3]
testfs-OST0004_UUID   2.0T    1.3T    606.1G  64% /mnt/testfs[OST:4]
testfs-OST0005_UUID   2.0T    1.3T    610.1G  64% /mnt/testfs[OST:5]

filesystem summary:  11.8T    7.3T      3.9T  61% /mnt/testfs
</pre>
