LNET Selftest

Description

LNET Selftest is a network verification application for Lustre that performs two principal functions:

  • Confirm the correctness of the Lustre Networking configuration of a given machine
  • Measure the performance of the connection between a set of machines

LNET Selftest requires that the Lustre software is installed and the LNET kernel module is configured on each of the machines that will participate in the Lustre network. A Lustre file system is not required in order to verify LNet functionality with the LNet-selftest benchmark.

Purpose

LNET-selftest is used to verify that the Lustre networking (LNET) module is properly installed and configured, as well as to establish the performance of the underlying hardware that supports Lustre on a computer system. LNET-selftest is very useful in determining the performance of the networking layer, which is essential when conducting a survey of a Lustre file system installation, either for benchmarking purposes or for identifying bottlenecks or limitations in the data pipeline from client to server.

There are two test types that the software supports: "bulk read-write" (BRW) and "ping". From the Lustre manual:

  • ping - a ping generates a short request message, which results in a short response. Pings are useful to determine latency and small message overhead and to simulate Lustre metadata traffic.
  • brw - in a brw ('bulk read write') test, data is transferred from the target machine to the source machine (brwread) or data is transferred from the source to the target (brwwrite). The size of the bulk transfer is set using the size parameter. A BRW test is useful to determine network bandwidth and to simulate Lustre I/O traffic.

Preparation

The LNET Selftest kernel module must be installed and loaded on all machines in the test before the application is started. Identify the set of all systems that will participate in a session and ensure that the kernel module has been loaded. To load the kernel module run this command:

modprobe lnet_selftest

Kernel module dependencies are automatically resolved and loaded by modprobe. This will make sure all the necessary modules are loaded: libcfs, lnet, lnet_selftest and the kernel lustre network driver (LND) appropriate to the fabric or fabrics upon which LNet operates (e.g. ksocklnd for TCP/IP, ko2iblnd for RDMA fabrics).
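
To confirm that the modules have loaded and that LNet is configured with the expected NID on each node, a quick check such as the following can be used (lctl list_nids prints the NIDs configured on the local node):

lsmod | grep lnet_selftest
lctl list_nids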

Identify a "console" node from which to conduct the tests. This is the single system from which all LNET selftest commands will be executed. The console node owns the LNET selftest session and there should be only one active session on the network for a given set of nodes. Make sure that each session is isolated from network activity that might distort the results, and do not run any workloads except for LNet-selftest on machines participating in the session.

It is strongly recommended that a survey of overall network health and an analysis of raw network performance between the target machines be carried out prior to running the LNET Selftest benchmark. Fabric health is critically important to preserving performance and reliability in a distributed system. Baseline benchmarking of the fabric will help to identify and measure any performance overhead introduced by LNET and provides a point of comparison between the low-level network capability and the results from the LNet benchmarks. If results are not within expectations, it is important to establish a measurement that is independent of Lustre or LNet, in order to isolate root causes quickly and effectively.
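
As an illustrative sketch only (the tools, host names, and options shown here are assumptions and not part of LNET Selftest), a raw TCP baseline could be taken with a general-purpose tool such as iperf3; on RDMA fabrics, the perftest utilities (e.g. ib_write_bw) serve the same purpose:

# On the receiving node:
iperf3 -s
# On the sending node (replace server01 with the receiver's address):
iperf3 -c server01 -P 8 -t 60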

Using the Wrapper Script

Use the LNET Selftest wrapper script in the appendix to execute the test cases referenced in this document. The header of the script contains several variables that must be set to match the target environment; without these changes the script is unlikely to operate correctly, if at all. Here is a listing of the header:

#Output file
ST=lst-output-$(date +%Y-%m-%d-%H:%M:%S)
# Concurrency
CN=64
#Size
SZ=1M
# Length of time to run test (secs)
TM=30
# Which BRW test to run (read or write)
BRW=read
# Checksum calculation (simple or full)
CKSUM=simple
# The LST "from" list -- e.g. Lustre clients. Space separated list of NIDs.
LFROM="10.73.2.21@tcp"
# The LST "to" list -- e.g. Lustre servers. Space separated list of NIDs.
LTO="10.73.2.22@tcp"

Notes

  • CN: the concurrency setting simulates the number of threads performing communication. The LNET Selftest default is 1, which is not enough to properly exercise the connection. Set to at least 16, but experiment with higher values (32 or 64 being reasonable choices).
  • SZ: the size setting determines the size of the IO transaction. For bandwidth (throughput) measurements, use 1M.
  • TM: test time in seconds, i.e. how long to run the benchmark. Set to a value large enough to collect sufficient data for a meaningful average (at least 60 seconds).
  • BRW: the bulk read/write test to use. There are only two choices: "read" or "write".
  • CKSUM: The checksum checking method. Choose either "simple" or "full".
  • LFROM: a space-separated list of NIDs that represent the "from" list (or source) in LNET Selftest. This is often a set of clients.
  • LTO: a space-separated list of NIDs that represent the "to" list (or destination) in LNET Selftest. This is often a set of servers.

Note: The wrapper script does not currently validate input parameters or changes to variables. It does no pre-parsing of variables and will not clean up the environment on error. Use it with caution and as a reference for how to operate the LNET Selftest application.
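
The appendix version of the wrapper reads its parameters from the environment, so the variables can either be edited in the header or supplied on the command line. For example, assuming the listing has been saved as an executable file named lst-wrapper.sh (the file name is arbitrary):

LFROM="10.73.2.21@tcp" LTO="10.73.2.22@tcp" CN=32 SZ=1M TM=60 BRW=read ./lst-wrapper.sh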

In the sections following this one, the parameters required for the LNET selftest wrapper will be displayed first, followed by a description of the manual command-line process.

Single Client Throughput – LNET Selftest Read (2 Nodes, 1:1)

Used to establish point to point unidirectional read performance between two nodes.

Set the wrapper up as follows:

#Output file
ST=lst-output-$(date +%Y-%m-%d-%H:%M:%S)
# Concurrency
CN=32
#Size
SZ=1M
# Length of time to run test (secs)
TM=60
# Which BRW test to run (read or write)
BRW=read
# Checksum calculation (simple or full)
CKSUM=simple
# The LST "from" list -- e.g. Lustre clients. Space separated list of NIDs.
LFROM="10.73.2.21@tcp"
# The LST "to" list -- e.g. Lustre servers. Space separated list of NIDs.
LTO="10.73.2.22@tcp"

Change the LFROM and LTO lists as required.

Run the script several times, changing the concurrency setting at the start of each new run; use the sequence 1, 2, 4, 8, 16, 32, 64, 128 (a loop like the sketch below can automate this). Modify the output filename for each run so that it is clear which results have been captured in each file.
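
A minimal sketch of such a sweep, assuming the wrapper is saved as lst-wrapper.sh (a hypothetical file name) and takes its parameters from the environment as in the appendix listing:

for CN in 1 2 4 8 16 32 64 128; do
    CN=${CN} SZ=1M TM=60 BRW=read \
    LFROM="10.73.2.21@tcp" LTO="10.73.2.22@tcp" \
    ./lst-wrapper.sh > lst-read-cn${CN}-$(date +%Y-%m-%d-%H:%M:%S).log 2>&1
done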

Single Client Throughput – LNET Selftest Write (2 Nodes, 1:1)

Used to establish point to point unidirectional write performance between two nodes.

Set the wrapper up as follows:

#Output file
ST=lst-output-$(date +%Y-%m-%d-%H:%M:%S)
# Concurrency
CN=32
#Size
SZ=1M
# Length of time to run test (secs)
TM=60
# Which BRW test to run (read or write)
BRW=write
# Checksum calculation (simple or full)
CKSUM=simple
# The LST "from" list -- e.g. Lustre clients. Space separated list of NIDs.
LFROM="10.73.2.21@tcp"
# The LST "to" list -- e.g. Lustre servers. Space separated list of NIDs.
LTO="10.73.2.22@tcp"

Change the LFROM and LTO lists as required.

Run the script several times, changing the concurrency setting at the start of each new run, again using the sequence 1, 2, 4, 8, 16, 32, 64, 128 (the loop sketched in the previous section applies equally here, with BRW=write). Modify the output filename for each run so that it is clear which results have been captured in each file.

LNET Selftest Read (Many:1)

Many clients, one server read test. Find the saturation point for one server. To test, edit the wrapper variables:

  • Set BRW=read
  • Add a space-separated list of NIDs to the LFROM variable, using quotes to encapsulate the entire string (see the example below).
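
For example, with three clients reading from one server (the NIDs shown are placeholders for the target environment):

BRW=read
LFROM="10.73.2.21@tcp 10.73.2.23@tcp 10.73.2.25@tcp"
LTO="10.73.2.22@tcp"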

LNET Selftest Write (Many:1)

Many clients, one server write test. Find the saturation point for one server. Set BRW=write, and add a space-separated list of NIDs to the LFROM variable, using quotes to encapsulate the entire string.

LNET Selftest Read (Many:Many)

Many clients, many servers read test. Starting from the saturation point established for many clients against a single server, grow the number of servers and re-test, then grow the number of clients to meet the saturation point of the server pool.

LNET Selftest Write (Many:Many)

Many clients, many servers write test. As with the read case, start from the saturation point established for many clients against a single server, grow the number of servers and re-test, then grow the number of clients to meet the saturation point of the server pool (see the example below).
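
In both the read and write cases, the only change from the Many:1 tests is that the LFROM and LTO variables each contain multiple NIDs, for example (placeholder NIDs):

LFROM="10.73.2.21@tcp 10.73.2.23@tcp 10.73.2.25@tcp 10.73.2.27@tcp"
LTO="10.73.2.22@tcp 10.73.2.24@tcp"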

Running the Benchmark

The test cases referenced in this document use a small wrapper script, the listing for which is provided in the appendix. The script is intended to simplify and, to a certain extent, automate the setup of an LNET selftest session, using a small number of well-defined variables to isolate the parameters of interest to benchmarking requirements.

In some circumstances, it may be necessary or preferable to run the LNET selftest benchmark without recourse to the wrapper. The application is very flexible and can be used to create more complex scenarios than those afforded by the wrapper, so the following section describes how to run LNET selftest entirely from the command line.

  1. Load the lnet_selftest kernel module on all nodes that are participating in the benchmark.
  2. On the LNET Selftest console node, create a new session. For example:
    export LST_SESSION=$$
    lst new_session twonoderead 
    

    This creates a session called twonoderead. The name of the session is arbitrary but should reflect the type of test being created. The LST_SESSION variable is mandatory – most lst commands will fail to run if it is not set. It is common practice to use the PID of the command shell ($$) as the session identifier in the environment variable but one could also use some other unique identifier, e.g. `date +%s`.

  3. Define the LNET host groups. Each group comprises one or more hosts referenced by their NID. Multiple NIDs can be specified as a space-separated list and/or using pattern matching rules. Group names are arbitrary but should be relevant to the members of that group. In the following example, there is a group called client and a group called server, each with a single host/NID:
    lst add_group client 192.168.0.21@o2ib
    lst add_group server 192.168.0.11@o2ib
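
    A group can also contain several NIDs, either listed individually or expressed with the pattern matching rules described in the Lustre manual. For example (illustrative addresses), the following adds four clients to one group using a range expression:

    lst add_group client 192.168.0.[21-24]@o2ib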
    
  4. Create a new batch (a batch is a container for grouping a set of tests). Name the batch after the type of test that will be run. In this example, the batch is called "bulk_read":
    lst add_batch bulk_read
    
  5. Add a test to the batch:
    lst add_test --batch bulk_read --from client --to server brw read check=full size=1M --concurrency=16
    

    Multiple tests can be added to a single batch. Tests will be run in parallel.

    • The command adds a "brw read" test to the batch "bulk_read".
    • This test runs from the nodes in the client group to the nodes in the server group.
    • "check=full" forces the test to validate the data (using a checksum).
    • "size=1M" sets the size of each IO transaction to 1MB.
    • Setting the concurrency simulates running multiple threads. One can also set distribution for running one to many tests (refer to the Lustre manual).

    There are 3 test types: brw read, brw write and ping. Results will be affected by the amount of concurrency defined in the test.
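
    For example, a latency-oriented ping test could be added to its own batch in the same way (the batch name here is arbitrary):

    lst add_batch ping_lat
    lst add_test --batch ping_lat --from client --to server ping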

  6. Execute the tests in the "bulk_read" batch:
    lst run bulk_read
    

    "lst run" is launched as a background process and returns to the command prompt immediately. The test will continue to run until either "lst stop" or "lst end_session" is run. No output is returned; use "lst stat" to monitor.

  7. To monitor the execution of a batch, use the "lst stat" command:
    lst stat <group name> [<group name> ...]
    

    lst stat will continue to display output until the process is killed (Control-C). There must be at least one group listed on the command line. Multiple groups can be added as a space-separated list.

  8. Capture 30 seconds worth of output for groups "client" and "server" to a file:
    lst stat client server > /tmp/lst-out-`date +%s` &
    LSTPID=$!
    sleep 30
    kill $LSTPID
    
  9. Stop the batch:
    lst stop bulk_read
    

    This step is optional if there are no more tests to run for the active session, since ending a session will end any active tests.

  10. Clean up the session when complete:
    lst end_session
    
  11. Remove the lnet_selftest kernel module from all of the machines.
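
    For example, on each node:

    modprobe -r lnet_selftest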

A complete example in one code block:

export LST_SESSION=$$
lst new_session twonoderead
lst add_group client 192.168.0.21@o2ib
lst add_group server 192.168.0.11@o2ib
lst add_batch bulk_read
lst add_test --batch bulk_read --from client --to server brw read check=full size=1M
lst run bulk_read
lst stat client server & sleep 30; kill $!
lst stop bulk_read
lst end_session

This is the entire test case, except for loading the lnet_selftest kernel module. Just change the NIDs of each group to run it in a different environment.

Appendix: LNET Selftest Wrapper

#!/bin/sh
#
# Simple wrapper script for LNET Selftest
#

# Parameters are supplied as environment variables
# The defaults are reasonable for quick verification.
# For in-depth benchmarking, increase the time (TM)
# variable to e.g. 60 seconds, and iterate over
# concurrency to find optimal values.
#
# Reference: http://wiki.lustre.org/LNET_Selftest

# Concurrency
CN=${CN:-32}
#Size
SZ=${SZ:-1M}
# Length of time to run test (secs)
TM=${TM:-10}
# Which BRW test to run (read or write)
BRW=${BRW:-"read"}
# Checksum calculation (simple or full)
CKSUM=${CKSUM:-"simple"}

# The LST "from" list -- e.g. Lustre clients. Space separated list of NIDs.
# LFROM="10.10.2.21@tcp"
LFROM=${LFROM:?ERROR: the LFROM variable is not set}
# The LST "to" list -- e.g. Lustre servers. Space separated list of NIDs.
# LTO="10.10.2.22@tcp"
LTO=${LTO:?ERROR: the LTO variable is not set}

### End of customisation.

export LST_SESSION=$$
echo LST_SESSION = ${LST_SESSION}
lst new_session lst${BRW}
lst add_group lfrom ${LFROM}
lst add_group lto ${LTO}
lst add_batch bulk_${BRW}
lst add_test --batch bulk_${BRW} --from lfrom --to lto brw ${BRW} \
  --concurrency=${CN} check=${CKSUM} size=${SZ}
lst run bulk_${BRW}
echo -n "Capturing statistics for ${TM} secs "
lst stat lfrom lto &
LSTPID=$!
# Delay loop with interval markers displayed every 5 secs.
# Test time is rounded up to the nearest 5 seconds.
i=1
j=$((${TM}/5))
if [ $((${TM}%5)) -ne 0 ]; then j=$((j+1)); fi
while [ $i -le $j ]; do
  sleep 5
  i=$((i+1))
done
kill ${LSTPID} && wait ${LSTPID} >/dev/null 2>&1
echo
lst show_error lfrom lto
lst stop bulk_${BRW}
lst end_session