| C H A P T E R 23 |
|
Testing Lustre Network Performance (LNET Self-Test) |
This chapter describes the LNET self-test, which is used by site administrators to confirm that Lustre Networking (LNET) has been properly installed and configured, and that underlying network software and hardware are performing according to expectations. The chapter includes:
LNET Self-Test Command Reference
LNET self-test is a kernel module that runs over LNET and the Lustre network drivers (LNDs. It is designed to:
After you have obtained performance results for your Lustre network, refer to Chapter 25: Lustre Tuning for information about parameters that can be used to tune LNET for optimum performance.
| Note - Apart from the performance impact, LNET self-test is invisible to Lustre. |
An LNET self-test cluster includes two types of nodes:
LNET self-test has two user utilities:
To run LNET self-test, these modules must be loaded:
To load the required modules, run:
modprobe lnet_selftest
This command recursively loads the modules on which LNET self-test depends.
| Note - While the console and test nodes require all the prerequisite modules to be loaded, userspace test nodes do not require these modules. |
This section describes how to create and run an LNET self-test. The examples shown are for a test that simulates the traffic pattern of a set of Lustre servers on a TCP network accessed by Lustre clients on an InfiniBand network connected via LNET routers. In this example, half the clients are reading and half the clients are writing.
A session is a set of processes that run on a test node. Only one session can be run at a time on a test node to ensure that the session has exclusive use of the node. The console node is used to create, change or destroy a session (new_session, end_session, show_session). For more about session parameters, see Session Commands.
Almost all operations should be performed within the context of a session. From the console node, a user can only operate nodes in his own session. If a session ends, the session context in all test nodes is stopped.
The following commands set the LST_SESSION environment variable to identify the session on the console node and create a session called read_write:
export LST_SESSION=$$ lst new_session read_write
A group is a named collection of nodes. Any number of groups can exist in a single LNET self-test session. Group membership is not restricted in that a node can be included in any number of groups.
Each node in a group has a rank, determined by the order in which it was added to the group. The rank is used to establish test traffic patterns.
A user can only control nodes in his/her session. To allocate nodes to the session, the user needs to add nodes to a group (of the session). All nodes in a group can be referenced by the group name. A node can be allocated to multiple groups of a session.
In the following example, three groups are established:
lst add_group servers 192.168.10.[8,10,12-16]@tcp lst add_group readers 192.168.1.[1-253/2]@o2ib lst add_group writers 192.168.1.[2-254/2]@o2ib
A test generates a network load between two groups of nodes, a source group identified using the --from parameter and a target group identified using the --to parameter. When a test is running, each node in the --from <group> simulates a client by sending requests to nodes in the --to <group>, which are simulating a set of servers, and then receives responses in return. This activity is designed to mimic Lustre RPC traffic.
A batch is a collection of tests that are started and stopped together and run in parallel. A test must always be run as part of a batch, even if it is just a single test. Users can only run or stop a test batch, not individual tests.
Tests in a batch are non-destructive to the file system, and can be run in a normal Lustre environment (provided the performance impact is acceptable).
A simple batch might contain a single test, for example, to determine whether the network bandwidth presents an I/O bottleneck. In this example, the --to <group> could be comprised of Lustre OSSs and --from <group> the compute nodes. A second test could be added to perform pings from a login node to the MDS to see how checkpointing affects the ls -l process.
Two types of tests are available:
In the example below, a batch is created called bulk_rw. Then two brw tests are added. In the first test, 1M of data is sent from the servers to the clients as a simulated read operation with a simple data validation check. In the second test, 4K of data is sent from the clients to the servers as a simulated write operation with a full data validation check.
lst add_batch bulk_rw lst add_test --batch bulk_rw --from readers --to servers \ brw read check=simple size=1M lst add_test --batch bulk_rw --from writers --to servers \ brw write check=full size=4K
The traffic pattern and test intensity is determined by several properties such as test type, distribution of test nodes, concurrency of test, and RDMA operation type. For more details, see Batch and Test Commands.
This sample LNET self-test script simulates the traffic pattern of a set of Lustre servers on a TCP network, accessed by Lustre clients on an InfiniBand network (connected via LNET routers). In this example, half the clients are reading and half the clients are writing.
Run this script on the console node:
#!/bin/bash export LST_SESSION=$$ lst new_session read/write lst add_group servers 192.168.10.[8,10,12-16]@tcp lst add_group readers 192.168.1.[1-253/2]@o2ib lst add_group writers 192.168.1.[2-254/2]@o2ib lst add_batch bulk_rw lst add_test --batch bulk_rw --from readers --to servers \ brw read check=simple size=1M lst add_test --batch bulk_rw --from writers --to servers \ brw write check=full size=4K # start running lst run bulk_rw # display server stats for 30 seconds lst stat servers & sleep 30; kill $! # tear down lst end_session
| Note - This script can be easily adapted to pass the group NIDs by shell variables or command line arguments (making it good for general-purpose use). |
The LNET self-test (lst) utility is used to issue LNET self-test commands. The lst utility takes a number of command line arguments. The first argument is the command name and subsequent arguments are command-specific.
This section describes lst session commands.
The lst utility uses the LST_SESSION environmental variable to identify the session locally on the self-test console node. This should be a numeric value that uniquely identifies all session processes on the node. It is convenient to set this to the process ID of the shell both for interactive use and in shell scripts. Almost all lst commands require LST_SESSION to be set.
export LST_SESSION=$$
new_session [--timeout SECONDS] [--force] NAME
$ lst new_session --force read_write
Stops all operations and tests in the current session and clears the session’s status.
$ lst end_session
Shows the session information. This command prints information about the current session. It does not require LST_SESSION to be defined in the process environment.
$ lst show_session
This section describes lst group commands.
add_group <name> <NIDS> [<NIDs>...]
Creates the group and adds a list of test nodes to the group.
|
A string that may be expanded to include one or more LNET NIDs. |
$ lst add_group servers 192.168.10.[35,40-45]@tcp $ lst add_group clients 192.168.1.[10-100]@tcp 192.168.[2,4].\ [10-20]@tcp
update_group <name> [--refresh] [--clean <status>] [--remove <NIDs>]
Updates the state of nodes in a group or adjusts a group’s membership. This command is useful if some nodes have crashed and should be excluded from the group.
|
Removes nodes with a specified status from the group. Status may be: |
||
$ lst update_group clients --refresh $ lst update_group clients --clean busy $ lst update_group clients --clean invalid // \ invalid == busy || down || unknown $ lst update_group clients --remove \192.168.1.[10-20]@tcp
list_group [<name>] [--active] [--busy] [--down] [--unknown] [--all]
Prints information about a group or lists all groups in the current session if no group is specified.
$ lst list_group 1) clients 2) servers Total 2 groups $ lst list_group clients ACTIVE BUSY DOWN UNKNOWN TOTAL 3 1 2 0 6 $ lst list_group clients --all 192.168.1.10@tcp Active 192.168.1.11@tcp Active 192.168.1.12@tcp Busy 192.168.1.13@tcp Active 192.168.1.14@tcp DOWN 192.168.1.15@tcp DOWN Total 6 nodes $ lst list_group clients --busy 192.168.1.12@tcp Busy Total 1 node
Removes a group from the session. If the group is referred to by any test, then the operation fails. If nodes in the group are referred to only by this group, then they are kicked out from the current session; otherwise, they are still in the current session.
$ lst del_group clients
lstclient --sesid <NID> --group <name> [--server_mode]
Use lstclient to run the userland self-test client. The lstclient command should be executed after creating a session on the console. There are only two mandatory options for lstclient:
|
When included, forces LNET to behave as a server, such as starting an acceptor if the underlying NID needs it or using privileged ports. Only root is allowed to use the --server_mode option. |
Console $ lst new_session testsession Client1 $ lstclient --sesid 192.168.1.52@tcp --group clients
Client1 $ lstclient --sesid 192.168.1.52@tcp |--group clients --server_mode
This section describes lst batch and test commands.
A default batch test set named batch is created when the session is started. You can specify a batch name by using add_batch:
$ lst add_batch bulkperf
Creates a batch test called bulkperf.
add_test --batch <batchname> [--loop <#>] [--concurrency <#>] [--distribute <#:#>]
--from<group> --to <group> {brw|ping} <test options>
Adds a test to a batch. The parameters are described below.
|
Determines the ratio of client nodes to server nodes for the specified test. This allows you to specify a wide range of topologies, including one-to-one and all-to-all. Distribution divides the source group into subsets, which are paired with equivalent subsets from the target group so only nodes in matching subsets communicate. |
||
|
Sends a small request message, resulting in a small reply message. For more details, see Defining and Running the Tests |
||
|
Sends a small request message followed by a bulk data transfer, resulting in a small reply message. Defining and Running the Tests. Options are: |
||
|
I/O size in bytes, KB or MB (i.e., size=1024, size=4K, size=1M). The default is 4K bytes. |
||
|
A data validation check (checksum of data). The default is that no check is done. |
||
Examples showing use of the distribute parameter:
Clients: (C1, C2, C3, C4, C5, C6) Server: (S1, S2, S3) --distribute 1:1 (C1->S1), (C2->S2), (C3->S3), (C4->S1), (C5->S2), \(C6->S3) /* -> means test conversation */ --distribute 2:1 (C1,C2->S1), (C3,C4->S2), (C5,C6->S3) --distribute 3:1 (C1,C2,C3->S1), (C4,C5,C6->S2), (NULL->S3) --distribute 3:2 (C1,C2,C3->S1,S2), (C4,C5,C6->S3,S1) --distribute 4:1 (C1,C2,C3,C4->S1), (C5,C6->S2), (NULL->S3) --distribute 4:2 (C1,C2,C3,C4->S1,S2), (C5, C6->S3, S1) --distribute 6:3 (C1,C2,C3,C4,C5,C6->S1,S2,S3)
The setting --distribute 1:1 is the default setting where each source node communicates with one target node.
When the setting --distribute 1:<n> (where <n> is the size of the target group) is used, each source node communicates with every node in the target group.
Note that if there are more source nodes than target nodes, some source nodes may share the same target nodes. Also, if there are more target nodes than source nodes, some higher-ranked target nodes will be idle.
$ lst add_group clients 192.168.1.[10-17]@tcp $ lst add_group servers 192.168.10.[100-103]@tcp $ lst add_batch bulkperf $ lst add_test --batch bulkperf --loop 100 --concurrency 4 \ --distribute 4:2 --from clients brw WRITE size=16K
In the example above, a batch test called bulkperf that will do a 16 kbyte bulk write request. In this test, two groups of four clients (sources) write to each of four servers (targets) as shown below:
list_batch [<name>] [--test <index>] [--active] [--invalid] [--server | client]
Lists batches in the current session or lists client and server nodes in a batch or a test.
|
Lists tests in a batch. If no option is used, all tests in the batch are listed. IIf one of these options are used, only specified tests in the batch are listed: |
||
$ lst list_batch bulkperf $ lst list_batch bulkperf Batch: bulkperf Tests: 1 State: Idle ACTIVE BUSY DOWN UNKNOWN TOTAL client 8 0 0 0 8 server 4 0 0 0 4 Test 1(brw) (loop: 100, concurrency: 4) ACTIVE BUSY DOWN UNKNOWN TOTAL client 8 0 0 0 8 server 4 0 0 0 4 $ lst list_batch bulkperf --server --active 192.168.10.100@tcp Active 192.168.10.101@tcp Active 192.168.10.102@tcp Active 192.168.10.103@tcp Active
$ lst run bulkperf
$ lst stop bulkperf
query <name> [--test <index>] [--timeout <seconds> [--loop <#>] [--delay <seconds>] [--all]
|
Only queries the specified test. The test index starts from 1. |
|
|
The timeout value to wait for RPC. The default is 5 seconds. |
|
$ lst run bulkperf $ lst query bulkperf --loop 5 --delay 3 Batch is running Batch is running Batch is running Batch is running Batch is running $ lst query bulkperf --all 192.168.1.10@tcp Running 192.168.1.11@tcp Running 192.168.1.12@tcp Running 192.168.1.13@tcp Running 192.168.1.14@tcp Running 192.168.1.15@tcp Running 192.168.1.16@tcp Running 192.168.1.17@tcp Running $ lst stop bulkperf $ lst query bulkperf Batch is idle
This section describes other lst commands.
ping [-session] [--group <name>] [--nodes <NIDs>] [--batch <name>] [--server] [--timeout <seconds>]
Sends a “hello” query to the nodes.
|
Sends RPC to all server nodes instead of client nodes. This option is only used with --batch <name>. |
|
$ lst ping 192.168.10.[15-20]@tcp 192.168.1.15@tcp Active [session: liang id: 192.168.1.3@tcp] 192.168.1.16@tcp Active [session: liang id: 192.168.1.3@tcp] 192.168.1.17@tcp Active [session: liang id: 192.168.1.3@tcp] 192.168.1.18@tcp Busy [session: Isaac id: 192.168.10.10@tcp] 192.168.1.19@tcp Down [session: <NULL> id: LNET_NID_ANY] 192.168.1.20@tcp Down [session: <NULL> id: LNET_NID_ANY]
stat [--bw] [--rate] [--read] [--write] [--max] [--min] [--avg] " " [--timeout <seconds>]
[--delay <seconds>] <group>|<NIDs> [<group>|<NIDs>]
The collection performance and RPC statistics of one or more nodes.
|
The timeout of the statistics RPC. The default is 5 seconds. |
|
$ lst run bulkperf $ lst stat clients [LNet Rates of clients] [W] Avg: 1108 RPC/s Min: 1060 RPC/s Max: 1155 RPC/s [R] Avg: 2215 RPC/s Min: 2121 RPC/s Max: 2310 RPC/s [LNet Bandwidth of clients] [W] Avg: 16.60 MB/s Min: 16.10 MB/s Max: 17.1 MB/s [R] Avg: 40.49 MB/s Min: 40.30 MB/s Max: 40.68 MB/s
Specifying a group name (<group>) causes statistics to be gathered for all nodes in a test group. For example:
$ lst stat servers
where servers is the name of a test group created by lst add_group
Specifying a NID range (<NIDs>) causes statistics to be gathered for selected nodes. For example:
$ lst stat 192.168.0.[1-100/2]@tcp Only LNET performance statistics are available. By default, all statistics information is displayed. Users can specify additional information with these options.
show_error [--session] [<group>|<NIDs>]...
Lists the number of failed RPCs on test nodes.
|
Lists errors in the current test session. With this option, historical RPC errors are not listed. |
$ lst show_error clientsclients 12345-192.168.1.15@tcp: [Session: 1 brw errors, 0 ping errors] \ [RPC: 20 errors, 0 dropped, 12345-192.168.1.16@tcp: [Session: 0 brw errors, 0 ping errors] \ [RPC: 1 errors, 0 dropped, Total 2 error nodes in clients $ lst show_error --session clients clients 12345-192.168.1.15@tcp: [Session: 1 brw errors, 0 ping errors] Total 1 error nodes in clients
Copyright © 2011, Oracle and/or its affiliates. All rights reserved.