VDBench

Description

VDBench is an I/O workload generator for measuring storage performance and verifying the data integrity of direct-attached and network-connected storage. The software runs on several operating platforms.

Purpose

VDBench is typically used to establish baseline performance characteristics of block storage, both for individual disk drives and for RAID LUNs. It can also be used at the file system level to simulate application I/O. The procedures in this document use the benchmark to verify the performance of individual drives and LUNs. VDBench will destroy content when running write workloads against raw devices; do not use it on raw devices containing production data.

Preparation

  1. Install the Java Runtime Environment (JRE), if not already present on the target machine. Use the official JRE from http://java.com. The complete list of supported runtimes is available at: https://www.java.com/en/download/manual.jsp.
    • Unless Java is a permanent fixture of the platform run-time, download the 64-bit tarball rather than the RPM package. This will allow the Java software to be installed in an arbitrary, isolated directory structure and easily deleted when the benchmark is concluded. For example:
      cd $HOME
      tar zxf $HOME/jre-8u131-linux-x64.tar.gz
      
  2. Download VDBench. The current official version is available from the Oracle Technology Network (OTN):

    http://www.oracle.com/technetwork/server-storage/vdbench-downloads-1901681.html

    The download is free, but Oracle requires that users register an account. An older version remains on SourceForge:

    https://sourceforge.net/projects/vdbench/

  3. Unzip the VDBench archive:
    mkdir $HOME/vdbench
    cd $HOME/vdbench && unzip $HOME/vdbench50406.zip
    
  4. Update the vdbench wrapper script contained within the VDBench distribution to point to the JRE location (a quick check of the result is sketched after this list). e.g.:
    cd $HOME/vdbench
    sed -i.inst 's/^\(java\)=.*$/\1=$HOME\/jre1.8.0_131\/bin\/java/' vdbench
    
  5. Run a quick test to ensure that vdbench can run on the target system:
    ./vdbench -t
    

Older versions of VDBench required CSH or TCSH to be installed; the release used here does not, so we can be thankful for small mercies.
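
As a quick check of the preparation steps above, the following commands confirm that the JRE runs and that the vdbench wrapper now points at it. This is a minimal sketch that assumes the example install locations used above ($HOME/jre1.8.0_131 and $HOME/vdbench); adjust the paths to match the actual installation.

    # Confirm the JRE executes and report its version
    $HOME/jre1.8.0_131/bin/java -version
    # Confirm the java= line in the wrapper points at the chosen JRE
    grep '^java=' $HOME/vdbench/vdbench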

Benchmark Execution

Establish Baseline Performance of Individual Drives

  1. Ensure that all individual disks are presented to the operating platform for testing. For some storage arrays, one must create a separate LUN for each target device.
  2. Create a test profile for each target, designed to run read-only and write-only tests. Use the following loop to create one template per device:
    for i in b c d e f; do
    sed 's/sdX/sd'${i}'/g' > input_sd${i}_rw_test <<__EOF
    # SD -- Storage Definition
    sd=sdX,lun=/dev/sdX,openflags=o_direct
    # WD -- Workload Definition
    wd=sdX_wd_r_seq,sd=sd*,xfersize=1024k,rdpct=100,seekpct=sequential
    wd=sdX_wd_w_seq,sd=sd*,xfersize=1024k,rdpct=0,seekpct=sequential
    # RD -- Run Definition
    rd=sdX_run_r_seq_iomax,wd=sdX_wd_r_seq,iorate=max,elapsed=100,interval=10,forthreads=(1-1024,d),warmup=20
    rd=sdX_run_w_seq_iomax,wd=sdX_wd_w_seq,iorate=max,elapsed=100,interval=10,forthreads=(1-1024,d),warmup=20
    __EOF
    done
    

    Initially, structure the read tests to run first. This gives the best opportunity to discover and fix any potential error in the benchmark configuration before a write test is run that destroys the data. Adjust the input files according to the results obtained when looking at optimisation; for example, if write performance is poor, one may wish to disable the read tests altogether while adjusting parameters that affect write performance.

    Create one test profile for each device under test. Templates containing multiple storage definitions will be evaluated for future use once issues relating to CPU utilisation have been resolved (see notes).

  3. Run each vdbench test case in sequence:
    for i in b c d e f; do
    ./vdbench -f input_sd${i}_rw_test -o o_sd${i}_rw_test.tod
    done
    
  4. Tabulate the results in a spreadsheet such as Excel and generate graphs to visualise the data. Establish the performance trend and look for any exceptions. One should normally expect to see healthy drives performing within +/-5% of one another; faulty drives normally stand out quite clearly. A sketch for extracting the per-run averages from the output directories follows this list.
  5. Replace any bad drives and re-run VDBench against those targets.
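
To help with the tabulation in step 4, the per-run averages can be collected from the vdbench output directories in a single pass. The sketch below assumes the output directory naming used above (o_sd<X>_rw_test plus a timestamp suffix) and that the per-run averages appear in summary.html on lines labelled avg_<n>-<m>, which is the layout produced by recent vdbench releases; confirm the report layout against the actual output before relying on it.

    for d in o_sd?_rw_test.*; do
        echo "== ${d} =="
        # The avg_ lines hold the averages for each run definition and
        # thread count, with the warmup intervals already excluded.
        grep 'avg_' "${d}/summary.html"
    done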

Establish Baseline Performance of RAID LUNs

  1. Create the RAID LUNs that will be used to establish the file system storage volumes for Lustre.
  2. Repeat the VDBench benchmark using the same test profile structure as was used for the individual storage devices:
    for i in b c d e f; do
    sed 's/sdX/sd'${i}'/g' > input_sd${i}_raid_rw_test <<__EOF
    # VDBench baseline performance test for RAID Volumes
    # SD -- Storage Definition
    sd=sdX,lun=/dev/sdX,openflags=o_direct
    # WD -- Workload Definition
    wd=sdX_raid_wd_r_seq,sd=sd*,xfersize=1024k,rdpct=100,seekpct=sequential
    wd=sdX_raid_wd_w_seq,sd=sd*,xfersize=1024k,rdpct=0,seekpct=sequential
    # RD -- Run Definition
    rd=sdX_raid_run_r_seq_iomax,wd=sdX_raid_wd_r_seq,iorate=max,elapsed=100,interval=10,forthreads=(1-1024,d),warmup=20
    rd=sdX_raid_run_w_seq_iomax,wd=sdX_raid_wd_w_seq,iorate=max,elapsed=100,interval=10,forthreads=(1-1024,d),warmup=20
    __EOF
    done
    

    Note that the device names may be different for assembled LUNs, depending on the driver used and/or vendor-supplied software, e.g. MD RAID devices are typically /dev/mdX and kernel multipath devices are typically /dev/dm-XX. This can vary by Linux distribution as well as by storage vendor. A sketch adapting the profile-generation loop for MD device names follows this list.

  3. Run each vdbench test case in sequence:
    for i in b c d e f; do
    ./vdbench -f input_sd${i}_raid_rw_test -o o_sd${i}_raid_rw_test.tod
    done
    
  4. Tabulate the results in a spreadsheet such as Excel and generate graphs to visualise the data. Establish the performance trend and look for any exceptions. One should normally expect to see healthy volumes performing within +/-5% of one another.
  5. If any exceptions are discovered, examine the affected LUN to identify the root cause. If a hardware fault has been identified, replace the affected component.
  6. If one or more disk drives have been replaced, re-run vdbench against the replacement device(s). Note that it may be necessary to destroy the RAID volume in order to re-run the vdbench test case for individual drives. When individual testing is complete, re-assemble the RAID volume and re-run the benchmark for RAID LUNs.
  7. Finalise the results and record in the spreadsheet.
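
Where the assembled LUNs appear under different device names, only the device list and the sed substitution in the profile-generation loop need to change. The following is a minimal sketch that assumes MD RAID devices named /dev/md0 through /dev/md4; substitute the device names actually present on the system under test.

    for i in 0 1 2 3 4; do
    sed 's/sdX/md'${i}'/g' > input_md${i}_raid_rw_test <<__EOF
    # VDBench baseline performance test for RAID Volumes
    # SD -- Storage Definition
    sd=sdX,lun=/dev/sdX,openflags=o_direct
    # WD -- Workload Definition
    wd=sdX_raid_wd_r_seq,sd=sd*,xfersize=1024k,rdpct=100,seekpct=sequential
    wd=sdX_raid_wd_w_seq,sd=sd*,xfersize=1024k,rdpct=0,seekpct=sequential
    # RD -- Run Definition
    rd=sdX_raid_run_r_seq_iomax,wd=sdX_raid_wd_r_seq,iorate=max,elapsed=100,interval=10,forthreads=(1-1024,d),warmup=20
    rd=sdX_raid_run_w_seq_iomax,wd=sdX_raid_wd_w_seq,iorate=max,elapsed=100,interval=10,forthreads=(1-1024,d),warmup=20
    __EOF
    done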

Notes

VDBench Test Definition Files

In a vdbench input template, there are three main sections of importance:

  • SD: storage definition
  • WD: workload definition
  • RD: run definition

Definitions must be recorded in the template in the specific order listed above, i.e. SD, then WD and finally RD.

Each definition is contained on a single line. Continuation over multiple lines can be managed by using the standard shell continuation character '\' (backslash) at the end of the line. The continuation character must be immediately preceded by whitespace and must be the last character on the line.
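
For illustration, a run definition split over two lines using the continuation character might look like the following (a sketch based on the rule above, using the run parameters from this document):

rd=sdb_run_r_seq_iomax,wd=sdb_wd_r_seq,iorate=max,elapsed=100, \
interval=10,forthreads=(1-1024,d),warmup=20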

SD: Storage Definition

SD, the storage definition, is used to define the characteristics of the disk or LUN to be tested, e.g.:

sd=sdb,lun=/dev/sdb,openflags=o_direct
The parameters are as follows:

  • sd=sdb: Marks the start of a storage definition. sdb is an arbitrary label and must be unique within the definition file. Using the device name is recommended; a WWID or any other identifier that can be uniquely associated with the device under test is also suitable. Multiple storage definitions can be listed, one per line (see the example after the note below).
  • lun=/dev/sdb: The path to the storage target.
  • openflags=o_direct: Additional controls or options for opening or closing LUNs or files.

Note: On Linux, one must specify openflags=o_direct when referencing a device file, e.g. /dev/sdX
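
For example, a template covering two devices would simply declare one storage definition per line. The per-device procedure earlier in this document deliberately keeps one device per template, but the following is valid:

sd=sdb,lun=/dev/sdb,openflags=o_direct
sd=sdc,lun=/dev/sdc,openflags=o_direct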

WD: Workload Definition

WD, the workload definition, describes the test characteristics for a given storage definition, e.g.:

wd=wd1,sd=sd*,xfersize=1024k,rdpct=100,seekpct=sequential
wd=wd2,sd=sd*,xfersize=1024k,rdpct=0,seekpct=sequential

In the above examples, wd1 represents a 100% sequential read workload and wd2 represents a 100% sequential write workload.

The parameters are as follows:

  • wd=wd1: Marks the start of a workload definition. Must appear after the storage definitions and before any run definitions. wd1 is an arbitrary label and must be unique within the definition file. It is recommended that the workload definition name reflect the type of test, e.g. sdX_wd_r_seq (where sdX is the device label, wd stands for workload definition, r means a read workload and seq means a sequential workload).
  • sd=sd*: The name of the storage definition(s) to use. There can be more than one, e.g. sd=(sd1,sd2). An asterisk (*) acts as a wildcard; in this example, sd* refers to all storage definitions with a label beginning with sd.
  • xfersize=1024k: The data transfer size distribution. Normally use 1024k for Lustre workloads.
  • rdpct=100, rdpct=0: The percentage of read operations in the workload. 0 indicates no reads (in other words, a 100% write workload); 100 indicates a 100% read workload.
  • seekpct=sequential: The percentage of random seeks in the workload. 0 or sequential indicates no random seeks; 100 or random means every I/O goes to a random seek address.
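
As a further illustration of these parameters, a small-block random-read workload, which this guide does not otherwise use, could be defined as follows (a sketch; the 4k transfer size is an arbitrary value chosen for the example):

wd=sdb_wd_r_rnd,sd=sd*,xfersize=4k,rdpct=100,seekpct=random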

RD: Run Definition

RD, the run definition, specifies the workload definition to run, the I/O rate to generate and how long to run for, e.g.:

rd=run1,wd=wd1,iorate=max,elapsed=100,interval=10,warmup=20,forthreads=(1-1024,d)
rd=run2,wd=wd2,iorate=max,elapsed=100,interval=10,warmup=20,forthreads=(1-1024,d)
The parameters are as follows:

  • rd=run1, rd=run2: Marks the start of a run definition. Must appear after the storage definitions and workload definitions. run1 is an arbitrary label and must be unique within the definition file. It is recommended that the run definition name reflect the test characteristics, e.g. sdX_run_r_seq_iomax (where sdX is the device label, run means run definition, r means a read workload, seq means a sequential workload and iomax means maximum I/O rate).
  • wd=wd1, wd=wd2: The workload definition(s) to use. Normally select just one.
  • iorate=max: Run the workload at the maximum I/O rate the storage can sustain, rather than at a fixed, regulated rate.
  • elapsed=100: The time, in seconds, for each run. Must be at least twice the reporting interval. Does not include any warmup time, if specified (the total run time will be the elapsed time plus the warm-up).
  • interval=10: The reporting interval, i.e. the number of seconds between each report update.
  • warmup=20: The time to wait before recording results in the run total. Must be a multiple of the reporting interval. In the above example, the first two reports will not be recorded in the overall results. The warmup intervals are still reported in the output but do not form part of the overall result.
  • forthreads=(1-1024,d): Creates a loop that generates results for a range of thread counts. (1-1024,d) represents a range from 1 to 1024 threads, (d)oubling the thread count on each iteration (i.e. 1, 2, 4, 8, 16, ... 1024).
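
Note that with these settings each run definition takes a significant amount of time: forthreads=(1-1024,d) expands to 11 thread counts (1, 2, 4, ..., 1024) and each iteration runs for the elapsed time plus the warm-up, so a single run definition takes roughly 11 x (100 + 20) = 1320 seconds, or about 22 minutes, ignoring vdbench start-up overhead. A read-plus-write template for one device therefore takes around 45 minutes.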

Running VDBench

When executing the vdbench command, use the -f flag to refer to the input test definition and -o to refer to the output directory that will contain the results. e.g.:

./vdbench -f sdb_read_write -o o_sdb_read_write

The input definition file name should conform to the following format:

input_<device>_<test type>_test

e.g.:

input_sdb_rw_test

The output directory name should conform to the following format:

o_<device>_<test type>_test.tod

e.g.:

o_sdb_rw_test.tod

The suffix, .tod, instructs VDBench to add the date and time of day as a suffix to the output directory name. This helps to prevent test results from being overwritten on repeat test runs.
