ZFS Tunables for Lustre Object Storage Servers (OSS)

Parameter | Notes | Default | Suggested
metaslab_debug_unload | Prevent ZFS from unloading the space maps from a metaslab once it is read in | 0 | 1
zfs_vdev_scheduler | VDEV scheduler | noop | deadline
zfs_arc_max | Maximum size of the ARC | 50% RAM | 75% RAM
zfs_dirty_data_max | Amount of dirty data allowed on the system; a larger value absorbs more workload variation before throttling | 10% RAM | 1 - 4 GB
zfs_vdev_async_write_active_min_dirty_percent | Threshold below which the I/O scheduler limits concurrent operations to the minimum; above this value, concurrent operations increase linearly up to the maximum | 30 | 20
zfs_vdev_async_write_min_active | Minimum asynchronous write I/Os active to each device | 1 | 5
zfs_vdev_async_write_max_active | Maximum asynchronous write I/Os active to each device | 10 | 10
zfs_vdev_sync_read_min_active | Minimum synchronous read I/Os active to each device | 10 | 16
zfs_vdev_sync_read_max_active | Maximum synchronous read I/Os active to each device | 10 | 16
spl_kmem_cache_slab_limit | Objects of this size or smaller are allocated with the Linux slab allocator; larger objects use the SPL allocator. A cut-off of 16K was determined to be optimal for architectures using 4K pages | 16384 | 16384
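
As a rough sketch, the tunables above can be applied persistently through a modprobe configuration file. The file name below is only an example, and the zfs_arc_max value assumes an OSS with 64 GB of RAM; adjust both to the actual hardware. Note that spl_kmem_cache_slab_limit belongs to the spl module, not zfs, and that modprobe merges multiple options lines for the same module.

  # /etc/modprobe.d/lustre-oss-zfs.conf  (example file name; values are illustrative)
  options zfs metaslab_debug_unload=1
  options zfs zfs_vdev_scheduler=deadline
  options zfs zfs_arc_max=51539607552          # 48 GiB, roughly 75% of an assumed 64 GiB OSS
  options zfs zfs_dirty_data_max=4294967296    # 4 GiB
  options zfs zfs_vdev_async_write_active_min_dirty_percent=20
  options zfs zfs_vdev_async_write_min_active=5
  options zfs zfs_vdev_async_write_max_active=10
  options zfs zfs_vdev_sync_read_min_active=16
  options zfs zfs_vdev_sync_read_max_active=16
  options spl spl_kmem_cache_slab_limit=16384

These settings take effect when the modules are loaded, i.e. at the next reboot or module reload; many of them can also be changed on a running system, as shown below.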

When the amount of dirty data is below 30% of zfs_dirty_data_max (the default zfs_vdev_async_write_active_min_dirty_percent), ZFS keeps only one outstanding asynchronous write per VDEV (the default zfs_vdev_async_write_min_active). Under a heavy write load dirty data therefore builds up very quickly, and because only one write per VDEV is in flight, ZFS will start to delay or even halt writes. Lowering the threshold to 20 and raising the minimum number of active writes to 5, as suggested above, lets the pool drain dirty data with more concurrency before the write throttle engages.
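
If the running zfs module exposes these parameters as writable, they can also be changed on a live system through sysfs; the settings do not persist across a reboot unless they are also added to a modprobe configuration file. A minimal sketch using the suggested values:

  echo 20 > /sys/module/zfs/parameters/zfs_vdev_async_write_active_min_dirty_percent
  echo 5  > /sys/module/zfs/parameters/zfs_vdev_async_write_min_active
  echo 10 > /sys/module/zfs/parameters/zfs_vdev_async_write_max_active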

The zfs_dirty_data_max parameter should ideally be matched to the capability of the backend storage. By default, the code simply uses 10% of system memory, capped at zfs_dirty_data_max_max.
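
As an illustration of matching the value to the backend (the throughput figure is an assumption, not a measurement): if each OSS backend can sustain roughly 2 GB/s, allowing about two seconds' worth of dirty data, i.e. around 4 GiB, keeps the devices busy without letting write-back fall far behind. The cap only needs to be raised if it would otherwise clip the desired value.

  # Assumed backend: ~2 GB/s sustained write bandwidth per OSS
  # Target: roughly 2 s of dirty data  ->  2 GB/s * 2 s = 4 GiB
  options zfs zfs_dirty_data_max=4294967296        # 4 GiB
  options zfs zfs_dirty_data_max_max=4294967296    # raise the cap so the value above is not clipped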

For a comprehensive description of all available ZFS and SPL module parameters, refer to the zfs-module-parameters(5) and spl-module-parameters(5) man pages.

In addition to the kernel module parameters, it is recommended that ZFS compression also be enabled when creating ZFS datasets for OSTs. Creating Lustre Object Storage Services (OSS) provides examples of the commands used to create OSTs with compression enabled.
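
A minimal sketch, assuming a pool named ostpool and an OST dataset named ostpool/ost0 (both placeholder names): setting the compression property on the pool's root dataset lets OST datasets created afterwards inherit it, and lz4 is a common choice.

  zfs set compression=lz4 ostpool        # OST datasets created under ostpool inherit this
  zfs get compression ostpool/ost0       # verify the value on an existing OST dataset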

A few additional parameters can provide some extra bandwidth. The suggested values may need tweaking for the specific setup.

Parameter | Notes | Default | Suggested
zfetch_max_distance | Maximum bytes to prefetch per stream | 8 MB | 64 MB
zfs_vdev_async_read_max_active | Maximum asynchronous read I/Os active to each device | 3 | 16
zfs_vdev_aggregation_limit | Maximum vdev I/O aggregation size | 128 KB | 16 MB
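
A sketch of checking the current values and applying the suggested ones at runtime, if the running module exposes them as writable (sizes are in bytes, so 64 MB = 67108864 and 16 MB = 16777216); for persistence, add matching options lines to the modprobe configuration file shown earlier.

  grep . /sys/module/zfs/parameters/{zfetch_max_distance,zfs_vdev_async_read_max_active,zfs_vdev_aggregation_limit}
  echo 67108864 > /sys/module/zfs/parameters/zfetch_max_distance
  echo 16       > /sys/module/zfs/parameters/zfs_vdev_async_read_max_active
  echo 16777216 > /sys/module/zfs/parameters/zfs_vdev_aggregation_limit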