Introduction
Benchmarking is a complex and, to a certain extent, controversial subject, certain to stimulate debate among interested parties, who are unlikely to reach any meaningful agreement on tools, approach, workloads or acceptable outcomes. Nevertheless, there is a general consensus that benchmarking is necessary.

What follows is a general outline of benchmarking for the purpose of establishing the overall health of a Lustre file system and its supporting infrastructure. The methods are intended to exercise the infrastructure (stress testing) to identify faults and establish baseline performance, rather than to monitor it.

The goal is to ensure that a system is running in accordance with its specification, and that defects have been identified and eliminated. These requirements are typical of acceptance criteria for delivery of a new system, or of ongoing reliability and performance testing as part of a service level agreement (SLA) or audit.

Baseline measurements provide a reference for users against which to compare the performance or efficiency of their applications.

Benchmarking will also highlight potential defects in the environment, for example if a result is lower than expected when compared to the established reference, or if no result is returned at all.

Process Outline
Benchmarking is, in principle, a reasonably straightforward endeavour: one runs a test, records a set of measurements and compares them to an established reference. The closer the measurements are to matching the reference, the better the result.

With Lustre, the typical goal is to deliver streaming IO performance in excess of 90% of the measured baseline performance of the underlying platform. Put another way, the target is 90%+ efficiency relative to the underlying system performance (for large-scale streaming IO workloads).
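
The efficiency figure can be worked out with a small helper. The following is a minimal sketch, not part of any Lustre tooling: the function names `efficiency` and `meets_target` and the throughput figures are hypothetical, and both measurements are assumed to be in the same unit.

```shell
#!/bin/sh
# efficiency MEASURED BASELINE
# Prints the measured throughput as a percentage of the platform baseline,
# to one decimal place. Both arguments must use the same unit (e.g. MB/s).
efficiency() {
    awk -v m="$1" -v b="$2" 'BEGIN {
        if (b <= 0) { print "baseline must be positive" > "/dev/stderr"; exit 1 }
        printf "%.1f\n", 100 * m / b
    }'
}

# meets_target MEASURED BASELINE [TARGET_PERCENT]
# Exit status 0 when the efficiency reaches the target (default 90),
# 1 when it falls short, 2 when the baseline is not a positive number.
meets_target() {
    awk -v m="$1" -v b="$2" -v t="${3:-90}" 'BEGIN {
        if (b <= 0) exit 2
        exit ((100 * m / b >= t) ? 0 : 1)
    }'
}

# Hypothetical figures: 9200 MB/s through Lustre, 10000 MB/s platform baseline.
efficiency 9200 10000    # prints 92.0
```

The exit status of `meets_target` makes it easy to use in an acceptance script, for example `meets_target 9200 10000 || echo "below target"`.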

Lustre is a virtual file system running in software (albeit within the Linux kernel), so it depends upon the capabilities of the underlying platform: the hardware, the networking and the OS.

Therefore, in order to determine whether or not Lustre is capable of meeting a performance goal, one must determine the capabilities of the platform upon which Lustre is running. This is again conducted through benchmarking of the computers, storage and networking, comparing the results with the specifications provided by the vendors.

A picture of Lustre health is built up in layers:


 1) Establish the base operating platform
    * Physical integration, cabling and power
    * Firmware updates
    * Storage configuration
    * OS installation and configuration
    * Device driver installation
    * Application software installation (including Lustre)
 2) Benchmark the network
 3) Benchmark the individual storage devices or LUNs
 4) Benchmark the OSS servers
 5) Benchmark the MDS servers
 6) Userspace / application benchmarks

Network
Establish a network performance baseline, independent of Lustre or LNet, using the tools relevant to the network technology itself. Once the baseline is established, and it is determined that the fabric is clean (and optionally, soft and fragrant), run the LNet benchmark, lnet_selftest.

This section skips basic connectivity testing, on the assumption that readers are already familiar with the standard connectivity tools for their network technology. All of the benchmarking tools depend on being able to establish connections between hosts as a fundamental prerequisite. Diagnostic tools are also omitted, for the sake of brevity.

These benchmarks are intended to establish a point-to-point performance baseline and, as such, may not scale to very large clusters, at least not without some scripting to automate the process.
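
One way to approach the automation is to generate the list of host pairs first and then feed it to whichever tool suits the fabric. The sketch below is tool-agnostic: the function name `host_pairs` and the host names are hypothetical. It emits every ordered pair, so each link is exercised in both directions; the number of pairs grows as N x (N - 1), which is why exhaustive point-to-point testing becomes impractical on very large clusters.

```shell
#!/bin/sh
# host_pairs HOST...
# Prints every ordered "server client" pair, one per line,
# skipping the case where a host would be paired with itself.
host_pairs() {
    for server in "$@"; do
        for client in "$@"; do
            [ "$server" = "$client" ] || printf '%s %s\n' "$server" "$client"
        done
    done
}

# Hypothetical host names: three hosts produce six ordered pairs.
host_pairs oss1 oss2 client1
```

The output can be consumed with a loop such as `host_pairs oss1 oss2 client1 | while read -r server client; do ...; done`, with the loop body starting the server side on one host and the client side on the other.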

Ethernet / TCP
The netperf utility provides comprehensive bandwidth and latency performance testing for TCP networks. The software is hosted on GitHub:

https://github.com/HewlettPackard/netperf

Netperf has two components: a server daemon and a client. The server process acts as a target for the client to connect to.

The server is invoked from the command line as follows:

 netserver

The netserver process will report the port number it is listening on, e.g. 12865.

Here are two simple examples of running the Netperf client:

 # Measure request / response (latency)
 netperf -H <host> -p <port> -t TCP_RR -f B -l 60 -v 2

 # Measure bulk transfer
 netperf -H <host> -p <port> -t TCP_STREAM -f M -l 60 -v 2

For example:

 netperf -H 10.73.2.22 -p 12865 -t TCP_RR -f B -l 60 -v 2

 netperf -H 10.73.2.22 -p 12865 -t TCP_STREAM -f M -l 60 -v 2
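
When several netserver hosts need to be tested, the two invocations can be generated rather than typed. This is a minimal sketch that only prints the commands so that they can be reviewed first; the function name `netperf_cmds` is hypothetical, and the netperf options are the same ones used in the examples above.

```shell
#!/bin/sh
# netperf_cmds PORT HOST...
# Prints the latency (TCP_RR) and bulk transfer (TCP_STREAM) netperf
# commands for each netserver host. Nothing is executed.
netperf_cmds() {
    port=$1
    shift
    for host in "$@"; do
        # Each entry is "test name:output unit", as in the examples above
        for spec in TCP_RR:B TCP_STREAM:M; do
            printf 'netperf -H %s -p %s -t %s -f %s -l 60 -v 2\n' \
                "$host" "$port" "${spec%%:*}" "${spec##*:}"
        done
    done
}

netperf_cmds 12865 10.73.2.22
```

Once the list looks right, it can be piped to a shell on the client host to run the tests one after another, for example `netperf_cmds 12865 10.73.2.22 | sh`.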

InfiniBand
The simplest tools in the IB benchmarking arsenal are the point-to-point tests, such as ib_read_bw, that are distributed in the InfiniBand performance tools package. They provide a simple ready reckoner of point-to-point performance.

The tools share similar command line options and are invoked in a similar manner. Each tool has a receiver (server) mode and a sender (client) mode.

For example, to start a server process:

 ib_read_bw [-F] [-d <IB device>] [-i <IB port>]

To run a client process, add the host name or address of the server:

 ib_read_bw [-F] [-d <IB device>] [-i <IB port>] <server>

There are more options available. Refer to the man page for more details.

Omni-Path
For Omni-Path fabrics, the hfi1_pkt_test command, which is distributed as part of the Omni-Path software, is the easiest to operate. The command has server and client modes. Launch the server / receiver as follows:

 hfi1_pkt_test -r

The server invocation will output the LID and context that it is listening on.

To run the client process:

 hfi1_pkt_test -L <LID> -C <context>

LNet Selftest (lnet_selftest)
LNet Selftest is a kernel module and application used to benchmark the performance of the Lustre Networking (LNet) protocol. A page dedicated to LNet Selftest has been created: LNET_Selftest

Storage Devices / LUNs
Benchmarking the individual storage devices for an installation is recommended, although the process can be daunting for very large installations. At the very least, consider benchmarking all of the LUNs that will ultimately be used to create the Lustre OSDs (MGT, MDTs, OSTs) before committing any permanent production data to the storage. This will help to identify early life defects, as well as establish the baseline performance of the storage subsystem.

Benchmarking the storage is critical, particularly as this is one of the most difficult components of the infrastructure to replace once the file system enters production. Intermittent faults, performance variation and slow-downs of individual devices can be difficult to isolate once a system is live.
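
Once every LUN has been measured, slow devices are easier to spot by comparing each result against the group rather than by eye. The following is a minimal sketch: the function name `slow_devices`, the input format (one "device throughput" pair per line), the 90% default threshold and the figures are all assumptions for illustration, and the input would need to be extracted from whichever benchmark produced the measurements.

```shell
#!/bin/sh
# slow_devices [THRESHOLD_PERCENT]
# Reads "device throughput" lines on stdin and prints each device whose
# throughput is below the given percentage (default 90) of the median.
slow_devices() {
    sort -k2,2n | awk -v pct="${1:-90}" '
        NF >= 2 { n++; name[n] = $1; rate[n] = $2 }
        END {
            if (n == 0) exit
            # Input is sorted by rate, so the median is the middle value
            # (or the mean of the two middle values when n is even).
            median = (n % 2) ? rate[(n + 1) / 2] : (rate[n / 2] + rate[n / 2 + 1]) / 2
            for (i = 1; i <= n; i++)
                if (rate[i] < median * pct / 100)
                    printf "%s %s (median %s)\n", name[i], rate[i], median
        }'
}

# Hypothetical results in MB/s for four LUNs.
printf 'sdb 1180\nsdc 1195\nsdd 870\nsde 1190\n' | slow_devices
# prints: sdd 870 (median 1185)
```

The median is used instead of the mean so that a single failing device does not drag the reference value down with it.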

VDBench
VDBench is a mature and very capable application for storage benchmarking. VDBench is available for free from Oracle, although it may require a portion of your soul to acquire. Also, it is a Java application, which amounts to much the same thing. There is a page dedicated to VDBench: VDBench

In addition to benchmarking low-level devices, VDBench can be used as a general file system benchmark as well. It is versatile and can simulate different types of application workloads.

SGP_DD Survey (sgpdd-survey)
sgpdd-survey is an IO workload generator for benchmarking the performance of disk storage, generating a large sequential IO workload on the target storage devices. It is a wrapper for the sgp_dd (SCSI Generic Parallel DD) command found in the SCSI device utilities package.

sgpdd-survey is part of the Lustre IO Kit package. There is a page dedicated to sgpdd-survey: SGPDD_Survey

OBDFilter Survey (obdfilter-survey)
OBDFilter-Survey tests the performance of one or more OSTs by simulating client IO. Each OSS server in an installation is tested individually. The script is a wrapper around an lctl sub-command. OBDFilter-Survey requires a functional Lustre file system, i.e. the MGS and MDT running and the target OSTs running. There are three modes of execution (disk, network and netdisk), but normally only the disk test is run. The other two are less reliable, and the network test in particular has been superseded by LNet Selftest.

obdfilter-survey is part of the Lustre IO Kit package. There is a page dedicated to obdfilter-survey: OBDFilter_Survey

MDTest (mdtest)
MDTest is an MPI-based application for evaluating the metadata performance of a file system and has been designed to test parallel file systems including Lustre.

The application runs on Lustre clients and requires a fully configured Lustre file system. mdtest should be run in parallel across several nodes in order to saturate the file system. The program can create directory trees of arbitrary depth and can be directed to create a mixture of workloads, including file-only tests. One specifies how many threads per client to create and how many files or directories per thread.
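
The relationship between these parameters can be sketched as a command builder. This is an illustration under stated assumptions rather than a recommended invocation: the function name `mdtest_cmd`, the mount point and the sizes are hypothetical, `mpirun -np` is assumed as the MPI launcher, and the mdtest options used (`-n` items per task, `-d` working directory, `-u` unique directory per task, `-F` files only) should be checked against the help output of the installed version. How the tasks are placed on the client nodes depends on the MPI host file or job scheduler and is left out.

```shell
#!/bin/sh
# mdtest_cmd CLIENTS TASKS_PER_CLIENT ITEMS_PER_TASK DIR [extra mdtest options...]
# Prints an mpirun/mdtest command line. Nothing is executed.
mdtest_cmd() {
    clients=$1
    per_client=$2
    items=$3
    dir=$4
    shift 4
    # Total MPI tasks = number of clients x tasks per client
    set -- mpirun -np "$((clients * per_client))" mdtest -n "$items" -d "$dir" -u "$@"
    echo "$*"
}

# Hypothetical run: 4 clients, 16 tasks each, 10000 files per task, files only.
mdtest_cmd 4 16 10000 /mnt/lustre/mdtest -F
# prints: mpirun -np 64 mdtest -n 10000 -d /mnt/lustre/mdtest -u -F
```

In this example the run would create 64 x 10000 = 640000 files in total, which is worth calculating in advance so that the MDT has enough free inodes for the test.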

An overview of the software is available here: MDTest

Note: MDTest is not a Lustre-specific benchmark and can be run on any POSIX-compliant file system. It does require a fully configured file system implementation in order to run; for Lustre this means the MDS service, OSSs and clients are installed, configured and running.

IOR
IOR is a commonly used file system benchmarking application particularly well-suited for evaluating the performance of parallel file systems. It is, in some ways, the equivalent of Linpack for high performance storage.

The application runs on Lustre clients and requires a fully configured Lustre file system, although it is not a Lustre-specific application. There is a description of how to use IOR here: IOR

Additional References
Highly recommended resource for reviewing Linux performance tools:
 * http://www.brendangregg.com/linuxperf.html