Configuring Lustre File Striping

From Lustre Wiki
Latest revision as of 13:28, 6 February 2019

One of the main factors leading to the high performance of Lustre™ file systems is the ability to stripe data over multiple OSTs. The stripe count can be set on a file system, directory, or file level. An example showing the use of striping is provided below.

The Lustre Striping Guide provides a good overview of how Lustre file striping works.

For more detailed information, see Chapter 19: Managing File Striping and Free Space in the Lustre Operations Manual.

Setting Up Striping

To see the layout of a particular file, or the default layout to be used for new files in a particular directory, use the command lfs getstripe {file|directory|root}. If run against the filesystem root directory, it will show the default layout for all files created in the filesystem that do not otherwise specify a layout at creation time or inherit it from a layout on the parent directory. For example, running the command on the filesystem root directory shows the following (the -d option limits the output to the specified directory):

root@client# lfs getstripe -d /mnt/testfs
stripe_count:  2 stripe_size:   4194304 pattern:       0 stripe_offset: -1

In this example, the default stripe_count is 2 (that is, file data is striped alternately over two OSTs), the default stripe_size is 4MB (that is, 4MB of data is read or written in the first stripe on one OST before moving on to the second stripe on the next OST), and the stripe_offset of -1 means that files do not start on a specific OST index (that is, the MDS balances new file creations across all OSTs in the filesystem for maximum performance).
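The alternating placement described above can be worked through with a little shell arithmetic. This is only a sketch of simple round-robin (raid0) striping; the helper name stripe_location is hypothetical and is not part of the lfs tool. Given a byte offset in the file, it reports which stripe of the layout holds that byte and the offset inside that stripe's object.

```shell
#!/bin/sh
# Sketch: locate a file offset within a round-robin (raid0) striped layout.
# stripe_location is a hypothetical helper, not an lfs subcommand.
# Arguments: <offset_bytes> <stripe_size_bytes> <stripe_count>
# Prints:    "<stripe_index> <object_offset_bytes>"
stripe_location() {
    offset=$1
    stripe_size=$2
    stripe_count=$3
    chunk=$(( offset / stripe_size ))          # stripe-sized chunk of the file
    stripe_index=$(( chunk % stripe_count ))   # stripe (object) holding that chunk
    row=$(( chunk / stripe_count ))            # full rounds completed before it
    object_offset=$(( row * stripe_size + offset % stripe_size ))
    echo "$stripe_index $object_offset"
}

# With stripe_size 4MB (4194304 bytes) and stripe_count 2:
stripe_location 0 4194304 2         # first 4MB chunk  -> stripe 0
stripe_location 4194304 4194304 2   # second 4MB chunk -> stripe 1
stripe_location 8388608 4194304 2   # third 4MB chunk  -> back to stripe 0
```

The stripe index is a position within the file's layout, not an OST number; which OST backs each stripe is chosen when the file is created.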

Note that the stripe_size does not affect the allocation size of the file on disk (which is controlled by the underlying OST filesystem, typically 4KB for ldiskfs), but only the distribution of file data across OSTs.

The command to set the above default layout on the file system looked like this:

 root@client# lfs setstripe -S 4M -c 2 /mnt/testfs

If a new 2000MB file is created with these filesystem default settings, the following results are seen:

root@client# dd if=/dev/zero of=/mnt/testfs/test1 bs=20M count=100
root@client# lfs df -h
UUID                  bytes     Used  Available   Use%   Mounted on
testfs-MDT0000_UUID    4.4G   214.5M       3.9G     4%   /mnt/testfs[MDT:0]
testfs-OST0000_UUID    2.0G     1.1G     830.1M    53%   /mnt/testfs[OST:0]
testfs-OST0001_UUID    2.0G     1.1G     830.1M    53%   /mnt/testfs[OST:1]
testfs-OST0002_UUID    2.0G    83.3M       1.8G     4%   /mnt/testfs[OST:2]
testfs-OST0003_UUID    2.0G    83.3M       1.8G     4%   /mnt/testfs[OST:3]
testfs-OST0004_UUID    2.0G    83.3M       1.8G     4%   /mnt/testfs[OST:4]
testfs-OST0005_UUID    2.0G    83.3M       1.8G     4%   /mnt/testfs[OST:5]

filesystem_summary:   11.8G     2.5G       8.8G    20%   /mnt/testfs

In this example, the entire file was written to the first two OSTs (1000MB per OST), and the other four OSTs were not used. Using only two OSTs is the expected and requested behavior, since other files will be created on the remaining OSTs to balance space and bandwidth usage. Note that the layout (stripe_count, stripe_size, OSTs) of a file is fixed when the file is first created (opened). To change the layout of a file after it is created, the lfs migrate command (which takes the same parameters as lfs setstripe) is needed to move the file data to different OSTs.

Continuing with this example, a new 1000MB file can be created with an explicit stripe_count of -1, which stripes it over all available OSTs instead of using the filesystem default:

 root@client# lfs setstripe -c -1 /mnt/testfs/test2

Now, when this file is written, the new stripe setting evenly distributes about 160MB of the file data over each of the available OSTs. Using a widely-striped file is good if the file is very large, or if a lot of clients will be accessing it concurrently, but typically it is best to have individual files striped over only 1 or 2 OSTs for minimal overhead, and let multiple processes creating separate files handle the parallelism across different OSTs.

root@client# dd if=/dev/zero of=/mnt/testfs/test2 bs=10M count=100
root@client# lfs df -h
UUID                  bytes     Used  Available   Use%   Mounted on
testfs-MDT0000_UUID    4.4G   214.5M       3.9G     4%  /mnt/testfs[MDT:0]
testfs-OST0000_UUID    2.0G     1.3G     670.2M    61%  /mnt/testfs[OST:0]
testfs-OST0001_UUID    2.0G     1.3G     670.2M    61%  /mnt/testfs[OST:1]
testfs-OST0002_UUID    2.0G   251.3M       1.6G    12%  /mnt/testfs[OST:2]
testfs-OST0003_UUID    2.0G   251.3M       1.6G    12%  /mnt/testfs[OST:3]
testfs-OST0004_UUID    2.0G   247.3M       1.6G    12%  /mnt/testfs[OST:4]
testfs-OST0005_UUID    2.0G   247.3M       1.6G    12%  /mnt/testfs[OST:5]

filesystem_summary:   11.8G     3.5G       7.7G    12%  /mnt/testfs
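The per-OST figures in the two examples can be estimated with simple arithmetic. The sketch below assumes plain round-robin striping and a file size that is an exact multiple of the stripe size; the helper name stripe_share_mb is hypothetical. It prints how much data lands in each stripe of a file.

```shell
#!/bin/sh
# Sketch: estimate how much file data each stripe holds.
# stripe_share_mb is a hypothetical helper, not an lfs subcommand.
# Assumes the file size is an exact multiple of the stripe size.
# Arguments: <file_size_mb> <stripe_size_mb> <stripe_count>
stripe_share_mb() {
    size=$1
    ssize=$2
    count=$3
    chunks=$(( size / ssize ))   # number of stripe-sized chunks in the file
    base=$(( chunks / count ))   # chunks that every stripe receives
    extra=$(( chunks % count ))  # leftover chunks go to the first stripes
    i=0
    while [ "$i" -lt "$count" ]; do
        n=$base
        if [ "$i" -lt "$extra" ]; then
            n=$(( base + 1 ))
        fi
        echo "stripe $i: $(( n * ssize )) MB"
        i=$(( i + 1 ))
    done
}

stripe_share_mb 2000 4 2   # the 2000MB file on 2 stripes of 4MB
stripe_share_mb 1000 4 6   # the 1000MB file striped over 6 OSTs
```

For the 1000MB file, 250 chunks of 4MB do not divide evenly by six, so four stripes hold 168MB and two hold 164MB under this model.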

Displaying Layout Information for a File

The lfs getstripe command can be used to show which specific OSTs a file is distributed over. For example, the output from the following command indicates that the file test2 is striped over all six active OSTs in the filesystem: the lmm_stripe_count: line shows 6, and 6 separate l_fid: objects are allocated for the file. The objects start at OST0002 because the test1 file had just allocated objects on OST0000 and OST0001. The -y option prints the output in YAML format:

root@client# lfs getstripe -y /mnt/testfs/test2
lmm_stripe_count:  6
lmm_stripe_size:   4194304
lmm_pattern:       raid0
lmm_layout_gen:    0
lmm_stripe_offset: 2
lmm_objects:
      - l_ost_idx: 2
        l_fid:     0x100020000:0x2:0x0
      - l_ost_idx: 3
        l_fid:     0x100030000:0x2:0x0
      - l_ost_idx: 4
        l_fid:     0x100040000:0x2:0x0
      - l_ost_idx: 5
        l_fid:     0x100050000:0x2:0x0
      - l_ost_idx: 0
        l_fid:     0x100000000:0x3:0x0
      - l_ost_idx: 1
        l_fid:     0x100010000:0x3:0x0

In contrast, the output from the following command, which shows an lmm_stripe_count: of 2 and lists only two l_fid entries, indicates that the file test1 is stored on two OSTs, namely OST0000 and OST0001:

root@client# lfs getstripe -y /mnt/testfs/test1
lmm_stripe_count:  2
lmm_stripe_size:   4194304
lmm_pattern:       raid0
lmm_layout_gen:    0
lmm_stripe_offset: 0
lmm_objects:
      - l_ost_idx: 0
        l_fid:     0x100000000:0x2:0x0
      - l_ost_idx: 1
        l_fid:     0x100010000:0x2:0x0

See the lfs-getstripe(1) and lfs-setstripe(1) man pages for full details of what options are available.
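When only the list of OSTs is of interest, the YAML output shown above can be filtered with standard tools. The following is a sketch that assumes the output format matches the examples on this page; the function name ost_indices is hypothetical. It reads lfs getstripe -y output on standard input and prints the OST indices in stripe order.

```shell
#!/bin/sh
# Sketch: extract OST indices from `lfs getstripe -y` output read on stdin.
# ost_indices is a hypothetical helper; it assumes lines of the form
# "- l_ost_idx: N" as in the example output on this page.
ost_indices() {
    awk '$1 == "-" && $2 == "l_ost_idx:" { print $3 }'
}

# On a Lustre client this would typically be used as:
#   lfs getstripe -y /mnt/testfs/test2 | ost_indices
# Demonstration on sample text so that the sketch runs without Lustre:
ost_indices <<'EOF'
lmm_stripe_count:  2
lmm_objects:
      - l_ost_idx: 0
        l_fid:     0x100000000:0x2:0x0
      - l_ost_idx: 1
        l_fid:     0x100010000:0x2:0x0
EOF
```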

Setting Up Progressive File Layouts

With Lustre 2.10 and later, it is possible to configure Progressive File Layouts (PFL) on a file, which avoids much of the need to explicitly specify layouts for files of different sizes. A PFL file can have different layout parameters for different regions of a single file, and as the file size increases it activates the later parts of the file layout. This can allow lower overhead for small files that only need a single stripe, increased bandwidth for larger files, and wide distribution of space usage for a very large file.

To create a PFL file layout, the lfs setstripe -E <size> option is used to specify the layout for each extent of the file up to the specified size, and the parameters following -E can be set arbitrarily for each extent of the file. Typically, small files should have a lower stripe_count (for low overhead) and as the file size increases the stripe_count should also increase (to distribute space usage and increase bandwidth):

root@client# lfs setstripe -E 256M -c 1 -E 4G -c 4 -E -1 -c -1 -S 4M /mnt/testfs/test3
root@client# lfs getstripe /mnt/testfs/test3
/mnt/testfs/test3
  lcm_layout_gen:    3
  lcm_mirror_count:  1
  lcm_entry_count:   3
    lcme_id:             1
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   268435456
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x4:0x0] }

    lcme_id:             2
    lcme_mirror_id:      0
    lcme_flags:          0
    lcme_extent.e_start: 268435456
    lcme_extent.e_end:   4294967296
      lmm_stripe_count:  4
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: -1

    lcme_id:             3
    lcme_mirror_id:      0
    lcme_flags:          0
    lcme_extent.e_start: 4294967296
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  -1
      lmm_stripe_size:   4194304
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: -1

This example shows a file with 3 components. The first component has a single stripe and covers the file up to 256MB, the second component has 4 stripes and covers the file from 256MB up to 4GB, and the last component extends to the end of the file (-E -1) and stripes over all OSTs (-c -1) with a stripe size of 4MB. The first component of a file is always initialized (has objects allocated), while the later components only have objects allocated once the file grows large enough to reach them:

root@client# dd if=/dev/zero of=/mnt/testfs/test3 bs=10M count=30
root@client# lfs getstripe /mnt/testfs/test3
/mnt/testfs/test3
  lcm_layout_gen:    4
  lcm_mirror_count:  1
  lcm_entry_count:   3
    lcme_id:             1
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 0
    lcme_extent.e_end:   268435456
      lmm_stripe_count:  1
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 0
      lmm_objects:
      - 0: { l_ost_idx: 0, l_fid: [0x100000000:0x4:0x0] }

    lcme_id:             2
    lcme_mirror_id:      0
    lcme_flags:          init
    lcme_extent.e_start: 268435456
    lcme_extent.e_end:   4294967296
      lmm_stripe_count:  4
      lmm_stripe_size:   1048576
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: 2
      lmm_objects:
      - 0: { l_ost_idx: 2, l_fid: [0x100020000:0x4:0x0] }
      - 1: { l_ost_idx: 3, l_fid: [0x100030000:0x4:0x0] }
      - 2: { l_ost_idx: 4, l_fid: [0x100040000:0x4:0x0] }
      - 3: { l_ost_idx: 5, l_fid: [0x100050000:0x4:0x0] }

    lcme_id:             3
    lcme_mirror_id:      0
    lcme_flags:          0
    lcme_extent.e_start: 4294967296
    lcme_extent.e_end:   EOF
      lmm_stripe_count:  -1
      lmm_stripe_size:   4194304
      lmm_pattern:       raid0
      lmm_layout_gen:    0
      lmm_stripe_offset: -1
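
The reason the second component is now initialized follows from the extent boundaries in the output above. As a sketch, with the hypothetical helper pfl_component hard-coding the three extents of this example layout (ends at 268435456 bytes, 4294967296 bytes, and EOF), the component that covers a given byte offset can be looked up like this:

```shell
#!/bin/sh
# Sketch: find which component of the example PFL layout covers an offset.
# pfl_component is a hypothetical helper; the boundaries are the
# lcme_extent.e_end values from the example (256MB, 4GB, EOF).
# Each extent covers e_start up to, but not including, e_end.
pfl_component() {
    offset=$1
    if [ "$offset" -lt 268435456 ]; then
        echo 1
    elif [ "$offset" -lt 4294967296 ]; then
        echo 2
    else
        echo 3
    fi
}

# The dd command above writes 30 blocks of 10MB, so the last byte written is:
last_byte=$(( 30 * 10 * 1048576 - 1 ))
pfl_component 0             # start of the file
pfl_component "$last_byte"  # last byte of the 300MB written
```

Because the 300MB written extends past the 256MB boundary, the write reaches the second component, while the third component (from 4GB onward) stays uninitialized.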