ZFS Tunables for Lustre Object Storage Servers (OSS)

{| class="wikitable"
!Parameter
!Notes
!Default
!Suggested
|-
|metaslab_debug_unload
|Prevent ZFS from unloading the spacemaps from a metaslab once it is read in
|0
|1
|-
|zfetch_max_distance
|Maximum number of bytes to read ahead per stream
|8MB
|64MB
|-
|zfs_vdev_scheduler
|VDEV scheduler
|noop
|deadline
|-
|zfs_arc_max
|Maximum size of the ARC (RAM cache)
|50% RAM
|75% RAM
|-
|zfs_dirty_data_max^*
|Maximum amount of dirty data on the system; a larger value absorbs more workload variation before throttling
|10% RAM (max 4GB)
|10% RAM or 2-3s of full-bandwidth writes
|-
|zfs_vdev_async_read_max_active
|Maximum number of asynchronous read I/Os active to each device
|3
|16
|-
|zfs_vdev_aggregation_limit
|Maximum amount of data to aggregate into a single write
|128KB
|recordsize (1-4MB)
|-
|zfs_vdev_async_write_active_min_dirty_percent
|Threshold below which the I/O scheduler limits concurrent asynchronous writes to the minimum. Above this value, the number of concurrent operations increases linearly until <code>zfs_vdev_async_write_active_max_dirty_percent</code> is reached.
|30
|20
|-
|zfs_vdev_async_write_min_active
|Minimum number of asynchronous write I/Os active to each device
|2
|5
|-
|zfs_vdev_async_write_max_active
|Maximum number of asynchronous write I/Os active to each device
|10
|10
|-
|zfs_vdev_sync_read_min_active
|Minimum number of synchronous read I/Os active to each device
|10
|16
|-
|zfs_vdev_sync_read_max_active
|Maximum number of synchronous read I/Os active to each device
|10
|16
|-
|spl_kmem_cache_slab_limit
|Objects of this size or smaller are allocated using the Linux slab allocator; larger objects use the SPL allocator. A cut-off of 16K was determined to be optimal for architectures using 4K pages.
|16384
|16384
|}
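One way to apply the suggested values persistently is through a module options file that is read when the <code>zfs</code> and <code>spl</code> modules are loaded. The sketch below is illustrative only: the file name <code>/etc/modprobe.d/lustre-zfs.conf</code>, the assumed 128GiB of OSS RAM, and the 4MB recordsize are not part of this page and should be adjusted to the actual hardware (<code>zfs_dirty_data_max</code> is discussed separately below).

<pre>
# /etc/modprobe.d/lustre-zfs.conf -- illustrative sketch; the values simply mirror
# the "Suggested" column above and assume a 128GiB OSS with 4MB recordsize OSTs.
options zfs metaslab_debug_unload=1
options zfs zfetch_max_distance=67108864          # 64MB of readahead per stream
options zfs zfs_vdev_scheduler=deadline
options zfs zfs_arc_max=103079215104              # 75% of 128GiB RAM, in bytes
options zfs zfs_vdev_async_read_max_active=16
options zfs zfs_vdev_aggregation_limit=4194304    # match a 4MB recordsize
options zfs zfs_vdev_async_write_active_min_dirty_percent=20
options zfs zfs_vdev_async_write_min_active=5
options zfs zfs_vdev_async_write_max_active=10
options zfs zfs_vdev_sync_read_min_active=16
options zfs zfs_vdev_sync_read_max_active=16
options spl spl_kmem_cache_slab_limit=16384
</pre>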

When the amount of dirty data is less than <code>zfs_vdev_async_write_active_min_dirty_percent</code> of <code>zfs_dirty_data_max</code>, ZFS keeps only <code>zfs_vdev_async_write_min_active</code> outstanding writes per VDEV. Because so few writes are outstanding per disk by default, dirty data builds up quickly below this threshold, and ZFS will start to delay or even halt writes.
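The parameters involved in this throttling behaviour can be inspected, and in most cases changed, at runtime through <code>/sys/module/zfs/parameters</code>; the loop below is only a convenience sketch run from a shell on the OSS node.

<pre>
# Inspect the dirty-data throttle parameters on a running OSS
for p in zfs_dirty_data_max \
         zfs_vdev_async_write_active_min_dirty_percent \
         zfs_vdev_async_write_min_active \
         zfs_vdev_async_write_max_active; do
    printf '%-50s %s\n' "$p" "$(cat /sys/module/zfs/parameters/$p)"
done

# Example runtime change (not persistent across a reboot or module reload)
echo 5 > /sys/module/zfs/parameters/zfs_vdev_async_write_min_active
</pre>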

Note that the <code>zfs_dirty_data_max</code> parameter should ideally match the capability of the backend storage, allowing 2-3 seconds of dirty data to be aggregated on the server for write merging and more efficient I/O ordering. The code simply uses 10% of system memory as the default, capped at <code>zfs_dirty_data_max_max</code> (default 25% of RAM or 4GB, whichever is less). Setting <code>zfs_dirty_data_max</code> explicitly bypasses the default <code>zfs_dirty_data_max_max</code> limit of 4GB.
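As a worked example of the 2-3 second sizing rule (the bandwidth figure is purely hypothetical): an OSS whose backend storage sustains roughly 5GB/s of writes would want about 2.5s × 5GB/s ≈ 12.5GB of dirty data.

<pre>
# Hypothetical sizing: ~2.5 seconds of writes at ~5 GB/s of backend bandwidth
# 2.5 s * 5,000,000,000 B/s = 12,500,000,000 bytes (~12.5 GB)
echo 12500000000 > /sys/module/zfs/parameters/zfs_dirty_data_max

# or persistently, alongside the other module options:
# options zfs zfs_dirty_data_max=12500000000
</pre>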

For a comprehensive description of all available ZFS and SPL module parameters, refer to the [https://github.com/zfsonlinux/zfs/blob/master/man/man5/zfs-module-parameters.5 zfs-module-parameters(5)] and [https://github.com/zfsonlinux/spl/blob/master/man/man5/spl-module-parameters.5 spl-module-parameters(5)] man pages.

In addition to the kernel module parameters, it is recommended that ZFS compression also be enabled when creating ZFS datasets for OSTs. [[Creating Lustre Object Storage Services (OSS)]] provides examples of the commands to create OSTs with compression enabled.
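On the ZFS side, enabling compression amounts to setting the <code>compression</code> property so that OST datasets inherit it; the pool name <code>ostpool</code> below is a placeholder, and the linked page should be followed for the complete OST creation procedure.

<pre>
# "ostpool" is a placeholder for the zpool backing the OST dataset(s).
# Setting the property on the pool's root dataset lets OST datasets inherit it.
zfs set compression=lz4 ostpool

# Verify that the property is set and inherited
zfs get -r compression ostpool
</pre>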