SGPDD Survey

From Lustre Wiki

Description

sgpdd-survey is an IO workload generator for benchmarking the performance of disk storage, generating a large sequential IO workload on the target storage devices. It is a wrapper for the sgp_dd (SCSI Generic Parallel DD) command found in the SCSI device utilities (sg3_utils) package.

sgp_dd is a scalable version of the “dd” command with options for multi-threaded IO with different blocksizes and regionsizes. From the sgp_dd(8) man page:

"[sgp_dd is] specialised for "files" that are Linux SCSI generic (sg) and raw devices. Similar syntax and semantics to dd(1) but does not perform any conversions. Uses POSIX threads to increase the amount of parallelism. This improves speed in some cases."

Purpose

sgpdd-survey is intended to evaluate the raw performance of all LUNs in the storage arrays attached to a server. The benchmark is useful for testing the overall throughput of a storage controller and is typically run after establishing the baseline performance of the individual drives and/or LUNs attached through the controller.

The logic behind this process flow is to determine whether or not the storage controller or HBA or some other higher-level component within the server platform introduces a performance bottleneck, at least for bandwidth. One can normally derive the theoretical bandwidth of the data path from the hardware specifications. Low-level specifications for the disk drives and the I/O paths will determine the theoretical maximum bandwidth of the storage subsystem. One must consider the bandwidth of the PCIe bus, HBA, storage controller and the individual storage devices (disk drives and SSDs). The slowest component in the IO path will determine the maximum throughput that can be obtained by the system overall. The sgpdd-survey benchmark can help to determine how close to the theoretical bandwidth the system is operating and highlight any significant deviation from the ideal.
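The bottleneck reasoning above can be illustrated with a back-of-envelope sketch. The per-component bandwidth figures below are assumed, illustrative values, not measurements from any particular system; the point is only that the smallest figure sets the ceiling.

```shell
#!/bin/sh
# Back-of-envelope sketch: the sustainable bandwidth of the IO path is
# bounded by its slowest component. All figures are illustrative
# assumptions, in MB/s.
pcie=8000        # e.g. a PCIe 3.0 x8 slot
hba=4800         # e.g. a 4-lane 12Gb/s SAS HBA
controller=3000  # storage controller
disks=2400       # e.g. 24 drives at ~100 MB/s sequential each
min=$pcie
for bw in $hba $controller $disks; do
    if [ "$bw" -lt "$min" ]; then
        min=$bw
    fi
done
echo "expected bandwidth ceiling: $min MB/s"
```

A measured sgpdd-survey result well below this ceiling suggests a configuration problem somewhere in the IO path.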

sgpdd-survey is typically run within an individual host across all LUNs presented to that host. In high availability (HA) server configurations, where the storage will be presented in an active-passive failover configuration, the benchmark is normally run against the "primary" storage targets for each server only. That is, the LUNs are assumed to be mapped to their preferred servers.

Preparation

The sgpdd-survey script is distributed in the lustre-iokit package, and requires the sgp_dd program from the sg3_utils package. Install the lustre-iokit package on each of the machines that will be evaluated by the benchmark. On RHEL / CentOS systems, yum will automatically resolve additional dependencies.
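Before starting a run, it is worth confirming that both tools are actually on the PATH of each host under test. A minimal sanity check (sgp_dd is provided by sg3_utils, sgpdd-survey by lustre-iokit):

```shell
#!/bin/sh
# Sanity check: verify the benchmark prerequisites are installed.
# sgp_dd comes from the sg3_utils package, sgpdd-survey from lustre-iokit.
missing=0
for cmd in sgp_dd sgpdd-survey; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
        echo "missing: $cmd (install sg3_utils / lustre-iokit)"
        missing=1
    fi
done
if [ "$missing" -eq 0 ]; then
    echo "prerequisites OK"
fi
```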

Configure all storage volumes into their production configuration and present the LUNs for use on the target host.

Benchmark Execution

The sgpdd-survey script takes its parameters from environment variables established at run time. The parameters that are of most interest are as follows (refer to the section SGPDD-Survey Input Parameters for a detailed breakdown of the parameters and how to calculate suitable values):

crglo: Initial number of concurrent regions. Typical value: 1

crghi: Maximum number of concurrent regions. Typical value: 256

thrlo: Initial number of threads. Typical value: 1

thrhi: Maximum number of threads. Typical value: 4096

size: Data set size in MiB per storage device (LUN). Typical value: 2.5 * size(RAM) / count(LUNs)

rslt_loc: Directory to contain results. Must exist before the benchmark is run. Typical value: /var/tmp/sgpdd-survey_out

scsidevs: List of storage targets to test. Space-separated list enclosed in quotes; each target has the format <hostname>:<device path>. Choose either scsidevs or rawdevs, but not both. Example:

scsidevs="oss01:/dev/sdb oss01:/dev/sdc"

rawdevs: List of raw devices to test. Space-separated list enclosed in quotes; each target has the format <hostname>:<device path>. Choose either scsidevs or rawdevs, but not both. Example:

rawdevs="oss01:/dev/raw/raw1 oss01:/dev/raw/raw2"

The scsidevs option is typically chosen for hardware RAID storage systems, whereas the rawdevs option is typically required for software RAID devices.

The following is an example command line for executing sgpdd-survey:

mkdir -p /var/tmp/sgpdd-survey_out

crglo=1 crghi=256 \
thrlo=1 thrhi=4096 \
size=51200 \
rslt_loc=/var/tmp/sgpdd-survey_out \
scsidevs="ct7-oss1:/dev/sdb ct7-oss1:/dev/sdc ct7-oss1:/dev/sdd" \
sgpdd-survey

SGPDD-Survey Input Parameters

crglo, crghi

The range of concurrent regions to exercise on each iteration, per device. Starting at crglo, the number of concurrent regions is doubled on each iteration until crghi is reached.

The crglo and crghi parameters control how many independent regions on the storage device will be read or written simultaneously. This is intended to simulate multiple Lustre clients accessing each OST.

More concurrent regions generally reduce throughput, because they increase the amount of seeking the storage devices must perform.

Typical values: crglo=1, crghi=256

thrlo, thrhi

The range of thread counts to iterate over. For each value of the concurrent region count, sgp_dd is run with the number of threads starting at thrlo and doubling on each iteration until thrhi is reached. The starting thread count is thrlo or the current concurrent region count, whichever is greater.

The thrlo and thrhi parameters control the number of worker threads running in parallel. This is intended to simulate the Lustre OSS service threads.

Typical values: thrlo=1, thrhi=4096
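The nested sweep over regions and threads can be sketched as follows. The ranges here are reduced illustrative values, and the real script invokes sgp_dd at each step rather than printing the combination:

```shell
#!/bin/sh
# Sketch of the parameter sweep sgpdd-survey performs: for each
# concurrent-region count (doubling from crglo to crghi), the thread
# count doubles from max(thrlo, regions) up to thrhi. Small illustrative
# ranges; the real script runs sgp_dd at each step.
crglo=1; crghi=4
thrlo=1; thrhi=8
crg=$crglo
while [ "$crg" -le "$crghi" ]; do
    # Threads start at thrlo or the region count, whichever is greater.
    thr=$(( thrlo > crg ? thrlo : crg ))
    while [ "$thr" -le "$thrhi" ]; do
        echo "regions=$crg threads=$thr"
        thr=$(( thr * 2 ))
    done
    crg=$(( crg * 2 ))
done
```

Note that with the typical values (crghi=256, thrhi=4096) this sweep produces a large number of sgp_dd runs, so a full survey can take a long time to complete.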

size

The data set size in MiB per storage device (LUN). The total data set size for the entire benchmark is calculated as the per-LUN data set size multiplied by the number of LUNs in the benchmark:

ds_total = size * count(LUNs)

Set size to a small value (e.g. size=100, i.e. 100 MiB) to quickly test the configuration for correctness.

For a full benchmarking run, set ds_total to at least twice the target system's RAM, so that caching within the target system cannot inflate the results; a factor of 2.5 is commonly used to provide additional headroom. From this, calculate the value of the size input parameter for sgpdd-survey as follows:

size = ds_total / count(LUNs), where ds_total = 2.5 * RAM

As an example, for a server with 32 GiB RAM and 4 LUNs, the input size is calculated as follows:

size = (32 GiB * 2.5) / 4 LUNs
     = (32768 MiB * 2.5) / 4
     = 81920 / 4
     = 20480
Typical value: 2.5 * size(RAM) / count(LUNs), in MiB; e.g. size=20480
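The calculation above is easy to script with shell integer arithmetic. The values below reproduce the worked example (32 GiB RAM, 4 LUNs):

```shell
#!/bin/sh
# Compute the per-LUN "size" parameter (in MiB) from total RAM and LUN
# count, using ds_total = 2.5 * RAM. Integer arithmetic: 2.5x is
# expressed as * 5 / 2.
ram_mib=32768                    # 32 GiB of RAM, in MiB
luns=4                           # number of LUNs under test
ds_total=$(( ram_mib * 5 / 2 ))  # total data set size: 2.5 * RAM
size=$(( ds_total / luns ))      # per-LUN size for sgpdd-survey
echo "size=$size"
```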

rslt_loc

Directory to contain results. Must exist before the benchmark is run.

Typical value: rslt_loc=/var/tmp/sgpdd-survey_out
scsidevs

List of storage targets to test. Space-separated list enclosed in quotes; each target has the format <hostname>:<device path>. The host component is optional and can be used when sgpdd-survey is scaled across multiple servers (ref: LU-2043). Normally, sgpdd-survey benchmarking tasks are contained within individual hosts.

Choose either scsidevs or rawdevs, but not both. Under normal circumstances, use scsidevs unless working with MD RAID, in which case use rawdevs.

Example:

scsidevs="oss01:/dev/sdb oss01:/dev/sdc"

rawdevs

List of raw devices to test. Space-separated list enclosed in quotes; each target has the format <hostname>:<device path>. When benchmarking software RAID, use the raw command to map the MD RAID device path to a raw device. See the MD RAID and Raw Devices section for more detailed information.

Choose either scsidevs or rawdevs, but not both. rawdevs is normally only used when benchmarking MD RAID.

Example:

rawdevs="oss01:/dev/raw/raw1 oss01:/dev/raw/raw2"

MD RAID and Raw Devices

Software RAID volumes created with MD RAID should be submitted to the benchmark using the rawdevs input parameter, not scsidevs. For this to work, bind the MD RAID block device to a raw device using the raw command. e.g.:

raw /dev/raw/raw1 /dev/md0

This creates a raw binding from the MD RAID device /dev/md0 to /dev/raw/raw1. The raw device name is arbitrary, but take care not to reuse an existing binding: it will be replaced without warning. The major and minor numbers of the block device can also be specified in place of its path. To list the existing raw bindings:

raw -qa

To remove a binding, map the raw device to major and minor device numbers 0 0, e.g.:

raw /dev/raw/raw1 0 0
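When several MD RAID volumes are under test, the bindings can be generated in a loop. The sketch below (device names illustrative) echoes the raw commands so the mapping can be reviewed first; pipe the output to sh to apply it:

```shell
#!/bin/sh
# Generate raw(8) binding commands for a set of MD RAID devices prior to
# an sgpdd-survey run with rawdevs. Device names are illustrative; the
# commands are echoed for review rather than executed directly.
i=1
for md in /dev/md0 /dev/md1; do
    echo "raw /dev/raw/raw$i $md"
    i=$(( i + 1 ))
done
```

The resulting raw device paths (/dev/raw/raw1, /dev/raw/raw2, ...) are then the values to list in the rawdevs parameter.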