LNET Selftest

Description
LNET Selftest is a network verification application for Lustre that performs two principal functions:
 * Confirm the correctness of the Lustre Networking configuration of a given machine
 * Measure the performance of the connection between a set of machines

LNET Selftest requires that the Lustre software is installed and the LNET kernel module is configured on each of the machines that will participate in the Lustre network. A Lustre file system is not required in order to verify LNet functionality with the LNet-selftest benchmark.

Purpose
LNET-selftest is used to verify that the Lustre networking (LNET) module is properly installed and configured, as well as to establish the performance of the underlying hardware that supports Lustre on a computer system. LNET-selftest is very useful in determining the performance of the networking layer, which is essential when conducting a survey of a Lustre file system installation, either for benchmarking purposes or for identifying bottlenecks or limitations in the data pipeline from client to server.

There are two test types that the software supports: "brw" (BRW) and "ping". From the Lustre manual:
 * ping - a ping test generates a short request message, which results in a short response. Pings are useful to determine latency and small message overhead and to simulate Lustre metadata traffic.
 * brw - in a brw ('bulk read write') test, data is transferred from the target machine to the source machine (brw read) or data is transferred from the source to the target (brw write). The size of the bulk transfer is set using the size parameter. A BRW test is useful to determine network bandwidth and to simulate Lustre I/O traffic.

Preparation
The LNET Selftest kernel module must be installed and loaded on all machines in the test before the application is started. Identify the set of all systems that will participate in a session and ensure that the kernel module has been loaded. To load the kernel module run this command:

 modprobe lnet_selftest

Kernel module dependencies are automatically resolved and loaded by modprobe. This will make sure all the necessary modules are loaded: lnet_selftest, lnet, libcfs, and the kernel Lustre network driver (LND) appropriate to the fabric or fabrics upon which LNet operates (e.g. ksocklnd for TCP/IP, ko2iblnd for RDMA fabrics).
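The load step can be scripted across all participating nodes. The following is a minimal sketch assuming passwordless ssh and a hypothetical NODES list; the leading "echo" previews each command, so remove it to actually execute:

```shell
# Hypothetical list of participating nodes; adjust for the target environment.
NODES="10.73.2.21 10.73.2.22"

# Preview the command for each node; delete the leading "echo" to run it.
for node in $NODES; do
    echo ssh "$node" modprobe lnet_selftest
done
```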

Identify a "console" node from which to conduct the tests. This is the single system from which all LNET selftest commands will be executed. The console node owns the LNET selftest session and there should be only one active session on the network for a given set of nodes. Make sure that each session is isolated from network activity that might distort the results, and do not run any workloads except for LNet-selftest on machines participating in the session.

It is strongly recommended that a survey of overall network health and an analysis of raw network performance between the target machines be carried out prior to running the LNET-selftest benchmark. Fabric health is critically important to preserving performance and reliability in a distributed system. Baseline benchmarking of the fabric will help to identify and measure any performance overhead introduced by LNET and provides a point of comparison between the low-level network capability and the results from the LNet benchmarks. If results are not within expectations, it is important to establish a measurement that is independent of Lustre or LNet, in order to isolate root causes quickly and effectively.

Using the Wrapper Script
Use the LNET Selftest wrapper in the appendix to execute the test cases referenced in this document. The header of the script has some variables that need to be set in accordance with the target environment. Without changes, the script is very unlikely to operate correctly, if at all. Here is a listing of the header:

 ST=lst-output-$(date +%Y-%m-%d-%H:%M:%S)  # Output file
 CN=64                                     # Concurrency
 SZ=1M                                     # Size
 TM=30                                     # Length of time to run test (secs)
 BRW=read                                  # Which BRW test to run (read or write)
 CKSUM=simple                              # Checksum calculation (simple or full)
 LFROM="10.73.2.21@tcp"                    # The LST "from" list -- e.g. Lustre clients. Space separated list of NIDs.
 LTO="10.73.2.22@tcp"                      # The LST "to" list -- e.g. Lustre servers. Space separated list of NIDs.

Single Client Throughput – LNET Selftest Read (2 Nodes, 1:1)
Used to establish point-to-point unidirectional read performance between two nodes.

Set the wrapper up as follows:

 ST=lst-output-$(date +%Y-%m-%d-%H:%M:%S)  # Output file
 CN=32                                     # Concurrency
 SZ=1M                                     # Size
 TM=60                                     # Length of time to run test (secs)
 BRW=read                                  # Which BRW test to run (read or write)
 CKSUM=simple                              # Checksum calculation (simple or full)
 LFROM="10.73.2.21@tcp"                    # The LST "from" list -- e.g. Lustre clients. Space separated list of NIDs.
 LTO="10.73.2.22@tcp"                      # The LST "to" list -- e.g. Lustre servers. Space separated list of NIDs.

Change the LFROM and LTO lists as required.

Run the script several times, changing the concurrency (CN) setting at the start of every new run. Use the sequence 1, 2, 4, 8, 16, 32, 64, 128. Modify the output filename for each run so that it is clear which results have been captured into each file.
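The sweep described above can itself be scripted. The following sketch assumes the lst-bench.sh wrapper from the appendix is in the current directory and uses placeholder NIDs; the "echo" prefix prints each invocation instead of running it:

```shell
# Iterate over the concurrency sequence; each run gets its own CN value
# and a distinct output file name. Remove "echo" to launch the runs.
for CN in 1 2 4 8 16 32 64 128; do
    ST="lst-read-cn${CN}-$(date +%Y-%m-%d-%H:%M:%S)"
    echo ST=$ST CN=$CN BRW=read TM=60 \
         LFROM=10.73.2.21@tcp LTO=10.73.2.22@tcp ./lst-bench.sh
done
```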

Single Client Throughput – LNET Selftest Write (2 Nodes, 1:1)
Used to establish point-to-point unidirectional write performance between two nodes.

Set the wrapper up as follows:

 ST=lst-output-$(date +%Y-%m-%d-%H:%M:%S)  # Output file
 CN=32                                     # Concurrency
 SZ=1M                                     # Size
 TM=60                                     # Length of time to run test (secs)
 BRW=write                                 # Which BRW test to run (read or write)
 CKSUM=simple                              # Checksum calculation (simple or full)
 LFROM="10.73.2.21@tcp"                    # The LST "from" list -- e.g. Lustre clients. Space separated list of NIDs.
 LTO="10.73.2.22@tcp"                      # The LST "to" list -- e.g. Lustre servers. Space separated list of NIDs.

Change the LFROM and LTO lists as required.

Run the script several times, changing the concurrency (CN) setting at the start of every new run. Use the sequence 1, 2, 4, 8, 16, 32, 64, 128. Modify the output filename for each run so that it is clear which results have been captured into each file.

LNET Selftest (1:1 Matrix)
The following command invocation will automatically run a series of benchmarks, testing the connections, in pairs, between all of the nodes in the list. The awk one-liner generates a list of all unique permutations of host pairs, and then launches the lst-bench.sh wrapper script for each pair.

 echo "<nid1> <nid2> ... <nidN>" | \
 awk '{for (i=1;i<=NF;i++)T[i]=$i} END {for (i=1; i<NF; i++) {for (j=i+1; j<=NF; j++) system("LFROM="T[i]" LTO="T[j]" ./lst-bench.sh")}}'
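To preview which pairs the one-liner will generate before launching anything, the system() call can be swapped for a print (placeholder NIDs shown):

```shell
# Same pair-generation logic, but printing each invocation instead of
# executing it via system(). Three NIDs yield three unique pairs.
echo "n1@tcp n2@tcp n3@tcp" | \
awk '{for (i=1;i<=NF;i++) T[i]=$i}
     END {for (i=1; i<NF; i++)
            for (j=i+1; j<=NF; j++)
              print "LFROM=" T[i] " LTO=" T[j] " ./lst-bench.sh"}'
```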

LNET Selftest Read (Many:1)
Many clients, one server read test. Find the saturation point for one server. To test, edit the wrapper variables:
 * Set BRW=read
 * Add a space-separated list of NIDs to the LFROM variable, using quotes to encapsulate the entire string.
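As a sketch with placeholder NIDs, a Many:1 read run through the wrapper would look like this; the "echo" previews the invocation, so drop it to run:

```shell
# Three clients (LFROM) reading from a single server (LTO).
LFROM="10.73.2.21@tcp 10.73.2.22@tcp 10.73.2.23@tcp"
LTO="10.73.2.31@tcp"
echo BRW=read LFROM=\"$LFROM\" LTO=\"$LTO\" ./lst-bench.sh
```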

LNET Selftest Write (Many:1)
Many clients, one server write test. Find the saturation point for one server. Set BRW=write, and add a space-separated list of NIDs to the LFROM variable, using quotes to encapsulate the entire string.

LNET Selftest Read (Many:Many)
Many clients, many servers read test. Starting from the saturation point found for many clients against a single server, grow the number of servers and re-test. Then grow the number of clients to meet the saturation point of the server pool.

LNET Selftest Write (Many:Many)
Many clients, many servers write test. Starting from the saturation point found for many clients against a single server, grow the number of servers and re-test. Then grow the number of clients to meet the saturation point of the server pool.
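The grow-and-retest procedure above can be sketched as a loop (placeholder NIDs; the "echo" previews each invocation rather than running it):

```shell
# Hold the client list fixed and grow the server pool one NID at a
# time, previewing a Many:Many write run at each step.
CLIENTS="10.73.2.21@tcp 10.73.2.22@tcp 10.73.2.23@tcp 10.73.2.24@tcp"
SERVERS=""
for s in 10.73.2.31@tcp 10.73.2.32@tcp 10.73.2.33@tcp; do
    SERVERS="${SERVERS:+$SERVERS }$s"
    echo BRW=write LFROM=\"$CLIENTS\" LTO=\"$SERVERS\" ./lst-bench.sh
done
```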

Running the Benchmark
The test cases referenced in this document use a small wrapper script, the listing for which is provided as an appendix to this document. The script is intended to simplify and to a certain extent automate the setup of an LNET selftest session, using a small number of well-defined variables to isolate the parameters of interest to benchmarking requirements.

In some circumstances, it may be necessary or preferable to run the LNET selftest benchmark without recourse to the wrapper. The application is very flexible and can be used to create more complex scenarios than those afforded by the wrapper, so the following section describes how to run LNET selftest entirely from the command line.

1. Load the lnet_selftest kernel module on all nodes that are participating in the benchmark.

2. On the LNET Selftest console node, create a new session. For example:

    export LST_SESSION=$$
    lst new_session twonoderead

   This creates a session called "twonoderead". The name of the session is arbitrary but should reflect the type of test being created. The LST_SESSION environment variable is mandatory – most lst commands will fail to run if it is not set. It is common practice to use the PID of the command shell as the session identifier, but any other unique identifier can be used.

3. Define the LNET host groups. Each group is comprised of one or more hosts referenced by their NID. Multiple NIDs can be specified as a space-separated list and/or using pattern matching rules. Group names are arbitrary but should be relevant to the members of that group. In the following example, there is a group called "client" and a group called "server", each with a single host/NID:

    lst add_group client 192.168.0.21@o2ib
    lst add_group server 192.168.0.11@o2ib

4. Create a new batch (a batch is a container for grouping a set of tests). Name the batch after the type of test that will be run. In this example, the batch is called "bulk_read":

    lst add_batch bulk_read

5. Add a test to the batch:

    lst add_test --batch bulk_read --from client --to server brw read check=full size=1M --concurrency=16

   Multiple tests can be added to a single batch. Tests will be run in parallel. There are two test types: ping and brw. Results will be affected by the amount of concurrency defined in the test.

6. Execute the tests in the "bulk_read" batch:

    lst run bulk_read

   "lst run" is launched as a background process and returns to the command prompt immediately. The test will continue to run until either "lst stop" or "lst end_session" is run. No output is returned; use "lst stat" to monitor.

7. To monitor the execution of a batch, use the "lst stat" command:

    lst stat GROUP [GROUP ...]

   "lst stat" will continue to display output until the process is killed. There must be at least one group listed on the command line. Multiple groups can be added as a space-separated list.

8. Capture 30 seconds worth of output for groups "client" and "server" to a file:

    lst stat client server > /tmp/lst-out-`date +%s` &
    LSTPID=$!
    sleep 30
    kill $LSTPID

9. Stop the batch:

    lst stop bulk_read

   This step is optional if there are no more tests to run for the active session, since ending a session will end any active tests.

10. Clean up the session when complete:

    lst end_session

11. Remove the lnet_selftest kernel module from all of the machines.
 * The command adds a "brw read" test to the batch "bulk_read".
 * This test runs from the nodes in the client group to the nodes in the server group.
 * "check=full" forces the test to validate the data (using a checksum).
 * "size=1M" sets the size of each IO transaction to 1MB.
 * Setting the concurrency simulates running multiple threads. One can also set distribution for running one to many tests (refer to the Lustre manual).

A complete example in one code block:

 export LST_SESSION=$$
 lst new_session twonoderead
 lst add_group client 192.168.0.21@o2ib
 lst add_group server 192.168.0.11@o2ib
 lst add_batch bulk_read
 lst add_test --batch bulk_read --from client --to server brw read check=full size=1M
 lst run bulk_read
 lst stat client server & sleep 30; kill $!
 lst stop bulk_read
 lst end_session

This is the entire test case code, except for the lnet_selftest kernel module load. Just change the NIDs of each group to run.

Appendix: LNET Selftest Wrapper
 #!/bin/sh
 # Simple wrapper script for LNET Selftest

 # Parameters are supplied as environment variables.
 # The defaults are reasonable for quick verification.
 # For in-depth benchmarking, increase the time (TM)
 # variable to e.g. 60 seconds, and iterate over
 # concurrency to find optimal values.
 # Reference: http://wiki.lustre.org/LNET_Selftest

 ST=${ST:-lst-output-$(date +%Y-%m-%d-%H:%M:%S)}  # Output file
 CN=${CN:-32}              # Concurrency
 SZ=${SZ:-1M}              # Size
 TM=${TM:-10}              # Length of time to run test (secs)
 BRW=${BRW:-"read"}        # Which BRW test to run (read or write)
 CKSUM=${CKSUM:-"simple"}  # Checksum calculation (simple or full)

 # The LST "from" list -- e.g. Lustre clients. Space separated list of NIDs.
 # LFROM="10.10.2.21@tcp"
 LFROM=${LFROM:?ERROR: the LFROM variable is not set}

 # The LST "to" list -- e.g. Lustre servers. Space separated list of NIDs.
 # LTO="10.10.2.22@tcp"
 LTO=${LTO:?ERROR: the LTO variable is not set}

 # End of customisation.

 export LST_SESSION=$$
 echo LST_SESSION = ${LST_SESSION}

 lst new_session lst${BRW}
 lst add_group lfrom ${LFROM}
 lst add_group lto ${LTO}
 lst add_batch bulk_${BRW}
 lst add_test --batch bulk_${BRW} --from lfrom --to lto brw ${BRW} \
     --concurrency=${CN} check=${CKSUM} size=${SZ}
 lst run bulk_${BRW}

 echo -n "Capturing statistics for ${TM} secs "
 lst stat lfrom lto > ${ST} &
 LSTPID=$!

 # Delay loop with interval markers displayed every 5 secs.
 # Test time is rounded up to the nearest 5 seconds.
 i=1
 j=$((${TM}/5))
 if [ $((${TM}%5)) -ne 0 ]; then let j++; fi
 while [ $i -le $j ]; do
     echo -n "."
     sleep 5
     let i++
 done

 kill ${LSTPID} && wait ${LSTPID} >/dev/null 2>&1
 echo
 lst show_error lfrom lto
 lst stop bulk_${BRW}
 lst end_session