OBDFilter Survey

Description
OBDFilter-Survey tests the performance of one or more OSTs by simulating Lustre client I/O. Each OSS server in an installation is tested individually. The obdfilter-survey script is a wrapper around the   sub-command. It requires a functional Lustre file system, i.e. the MGS and MDT must be running, as must the target OSTs. Lustre clients are not required for the disk-only test but are needed for the network and remote file system (netdisk) modes (although in practice the latter two modes are not normally used).

There are 3 test cases covered by the obdfilter-survey benchmark, referred to as disk, network and netdisk.

The network and netdisk modes are not normally used for benchmarking as they may produce unreliable results. The network test has been effectively superseded by the LNET_Selftest benchmark. Also note that the obdfilter-survey benchmark itself does not scale well beyond a small number of OSTs. From the Lustre discussion mailing list:

 The obdfilter_survey script is NOT scalable beyond tens of OSTs since it is only intended to measure the I/O performance of individual storage subsystems, not the scalability of the entire system.

Therefore, only run obdfilter-survey on individual OSTs, using the disk test case.

Note: obdfilter-survey is a potentially destructive test and there is a small risk that pre-existing data will be lost during execution. The test consumes capacity on the storage targets while it runs and, like all benchmarks, competes with other processes for resources. Do not run this benchmark on a system containing production data.

Purpose
OBDFilter-Survey provides feedback on the potential performance of OSTs attached to an OSS. The obdfilter-survey script generates sequential I/O from varying numbers of threads and objects (files) to simulate the I/O patterns of a Lustre client. It can be run directly on an OSS node to measure the OST storage performance without any intervening network, or it can be run remotely on a Lustre client to measure the OST performance including network overhead.

The approach and methodology for obdfilter-survey are very similar to those for sgpdd-survey.

Preparation

1. Install the host operating system for each of the Lustre servers (MGS, MDS and OSS). On some sites, the MGS will be co-located with an MDS.
2. Install the Lustre server software distribution on each system.
3. Configure the Lustre Network (LNet) module and verify that it is operating correctly.
4. Create the MGS, MDT and OST file system targets according to the system design.
5. Start the Lustre services (mount MGS, OSTs and MDTs).

Benchmark Execution
The obdfilter-survey script takes its parameters from environment variables established at run time. The parameters that are of most interest are as follows (refer to the section OBDFilter-Survey Input Parameters for a detailed breakdown of the parameters and how to calculate suitable values):

The following is an example command line for executing obdfilter-survey:

 mkdir -p /var/tmp/obdfilter-survey_out
 
 nobjlo=1 nobjhi=512 \
 thrlo=1 thrhi=1024 \
 size=51200 \
 rslt_loc=/var/tmp/obdfilter-survey_out \
 targets="lustre-OST0006 lustre-OST0007 lustre-OST0008" \
 case=disk \
 obdfilter-survey
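Since the recommendation is to test OSTs individually, the example command can be wrapped in a loop that handles one target at a time. The following is a minimal dry-run sketch, not a definitive procedure: it only prints the command line for each OST instead of executing it. The helper name `survey_cmd_for` is hypothetical, and the parameter values are simply copied from the example above.

```shell
#!/bin/bash
# Dry-run sketch: print one obdfilter-survey command line per OST.
# Nothing is executed against Lustre; review the output before running it.

# survey_cmd_for TARGET OUTDIR
# Hypothetical helper (not part of Lustre). Parameter values are taken
# from the example command line in this page.
survey_cmd_for() {
    local target=$1 outdir=$2
    printf 'nobjlo=1 nobjhi=512 thrlo=1 thrhi=1024 size=51200 '
    printf 'rslt_loc=%s targets="%s" case=disk obdfilter-survey\n' \
        "$outdir" "$target"
}

# One command line per OST, so that each target is surveyed on its own.
for ost in lustre-OST0006 lustre-OST0007 lustre-OST0008; do
    survey_cmd_for "$ost" /var/tmp/obdfilter-survey_out
done
```

Printing the commands first makes it easy to check the target names and output directory before any I/O is generated on the storage.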

OBDFilter-Survey Input Parameters
Note: The parameters to obdfilter-survey are largely undocumented. This section attempts to accurately reflect the intended meaning and usage of each parameter.

Note: Due to the similarity between the obdfilter-survey and sgpdd-survey command options, some material is duplicated between the two pages.

Issues
If the test is aborted before completion, the target OST will not be cleaned up. If runs are aborted repeatedly, the OST will gradually fill with objects, and these objects are difficult to remove. If this happens, unmount the OST and re-mount as type. Go to the O (the capital letter 'O', not the numeral zero) directory and remove the objects found there. Do not remove the  file.

Alternatively, read the object IDs from the detail log of the test run and use the following command while the file system is mounted as type :

 lctl --device <devno> destroy <objid>

ERROR entries in the benchmark output usually indicate an out-of-space condition; check the detail log.

SHORT entries in the benchmark output usually mean that the test completed too quickly, e.g. because the data size is too small; check the detail log.
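The ERROR and SHORT entries can be located quickly with a text search over the benchmark output. The following is a minimal sketch under stated assumptions: `flag_survey_problems` is a hypothetical helper, the file to scan is passed as an argument (the names of the files written under `rslt_loc` are not given here), and the markers are assumed to appear as whole words.

```shell
#!/bin/bash
# Sketch: list lines of a survey output file that contain ERROR or SHORT.

# flag_survey_problems FILE
# Hypothetical helper (not part of Lustre). Prints matching lines with
# their line numbers and returns 1 if any are found, 0 otherwise.
flag_survey_problems() {
    local file=$1
    # -w matches ERROR/SHORT as whole words only; -n adds line numbers.
    if grep -nwE 'ERROR|SHORT' "$file"; then
        return 1
    fi
    echo "no ERROR or SHORT entries in $file"
}

# Run only when a file name is given on the command line.
if [ "$#" -gt 0 ]; then
    flag_survey_problems "$1"
fi
```

Any line it reports still needs to be cross-checked against the detail log, as described above.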

On catastrophic failure, echo clients might not be cleaned up. Use lctl to fix this, e.g.:

 [root@oss1 ~]# lctl dl
   0 UP mgc MGC10.73.0.11@tcp 82e16bb1-9e83-1236-4435-c0dcf29a04da 5
   1 UP ost OSS OSS_uuid 3
   2 UP obdfilter lustre-OST0000 lustre-OST0000_UUID 7
   3 AT echo_client lustre-OST0000_ecc lustre-OST0000_ecc_UUID 1
 
 [root@oss1 ~]# lctl
 lctl > cfg lustre-OST0000_ecc
 lctl > cleanup
 lctl > detach
 lctl > exit
 
 [root@oss1 ~]# lctl dl
   0 UP mgc MGC10.73.0.11@tcp 82e16bb1-9e83-1236-4435-c0dcf29a04da 5
   1 UP ost OSS OSS_uuid 3
   2 UP obdfilter lustre-OST0000 lustre-OST0000_UUID 7
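When several echo clients are left behind, the device listing can be filtered rather than read by eye. The following is a minimal sketch, assuming the `lctl dl` column layout shown in the listing above (device type in the third column, device name in the fourth). The helper name `echo_client_cleanup_cmds` is hypothetical, and the script only prints the cfg/cleanup/detach sequence for review; it does not run lctl.

```shell
#!/bin/bash
# Sketch: turn `lctl dl` output into the cleanup commands for any
# leftover echo_client devices. Commands are printed, not executed.

# echo_client_cleanup_cmds
# Hypothetical helper (not part of Lustre). Reads `lctl dl` output on
# stdin and prints three lctl commands per echo_client device.
echo_client_cleanup_cmds() {
    awk '$3 == "echo_client" { printf "cfg %s\ncleanup\ndetach\n", $4 }'
}

# Sample input copied from the device listing in this page.
echo_client_cleanup_cmds <<'EOF'
  0 UP mgc MGC10.73.0.11@tcp 82e16bb1-9e83-1236-4435-c0dcf29a04da 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter lustre-OST0000 lustre-OST0000_UUID 7
  3 AT echo_client lustre-OST0000_ecc lustre-OST0000_ecc_UUID 1
EOF
```

On an OSS the same function could be fed from `lctl dl` directly, and the printed commands entered in an lctl session as in the example above.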

Appendix: Sample script
 #!/bin/bash
 # Minimal obdfilter-survey script
 TARGETS="oss01:lustre oss02:lustre1 oss02:lustre2"
 NOBJLO=1
 NOBJHI=1
 THRLO=16
 THRHI=16
 OUTPUT="/tmp/obdfilter-survey-1j-16t-100s.out"
 # SIZE="46000"
 SIZE="100"
 # Create the results directory on each OSS
 ssh oss01 mkdir -p "$OUTPUT"
 ssh oss02 mkdir -p "$OUTPUT"
 # Run the disk test case with the parameters set above
 thrhi=$THRHI thrlo=$THRLO \
 nobjhi=$NOBJHI nobjlo=$NOBJLO \
 size=$SIZE case="disk" \
 targets="$TARGETS" rslt_loc="$OUTPUT" \
 /usr/bin/obdfilter-survey