Testing HOWTO

This HOWTO is intended to demonstrate the basics of configuring and running tests on a multi-node configuration. [[Test Descriptions | There are many tests available in the Lustre Test Suites]], and the principles demonstrated in configuring your system and running the sanity test suite will apply to them all. We will also demonstrate adding additional configuration for feature-specific test suites.
 
While these examples use virtual machines, the specifics should be easy to apply to real hardware with the same prerequisites set up.


==== System Configuration ====


This HOWTO uses a cluster of six virtual machines running CentOS 7.1 with a recent lustre-master build to run the Lustre tests: two clients, two MDS nodes, and two OSS nodes. This enables testing of a wide variety of Lustre features.


* node01 - MGS and MDS - 192.168.56.201
** 1GB MGT/MDT - /dev/sdb
* node02 - MDS - 192.168.56.202
** 1GB MDT - /dev/sdb
* node03 - OSS - 192.168.56.203
** Four 16GB OSTs - /dev/sdb, /dev/sdc, /dev/sdd, /dev/sde
* node04 - OSS - 192.168.56.204
** Four 16GB OSTs - /dev/sdb, /dev/sdc, /dev/sdd, /dev/sde
* node05 - client - 192.168.56.205
** 16GB shared directory for tests and results
* node06 - client - 192.168.56.206


==== System Setup ====


* Install the Lustre client packages on the two client nodes, and the Lustre server packages on the other four, using the same version of Lustre
** Follow this guide to install the Lustre RPMs - [https://wiki.whamcloud.com/display/PUB/Walk-thru-+Deploying+Lustre+pre-built+RPMs Walk-thru - Deploying Lustre pre-built RPMs]
** Also be sure to install the latest e2fsprogs on the servers - [https://build.whamcloud.com/job/e2fsprogs-master/ e2fsprogs-master builds]
* Disable SELinux
** Set SELINUX=disabled in /etc/sysconfig/selinux
* Disable the firewall
** service firewalld stop && systemctl disable firewalld.service
* Generate passwordless ssh keys, exchange identities across all nodes, and accept the host fingerprints
** The goal is to be able to run pdsh over ssh from any machine without requiring user input (see the sketch after this list)
* Install PDSH and ensure you can execute commands across the cluster
** https://build.whamcloud.com/job/toolkit/arch=x86_64,distro=el7/lastSuccessfulBuild/artifact/_topdir/RPMS/x86_64/
* Install the epel-release package to enable the EPEL repo
* Install the net-tools package for netstat
* Add a user 'runas' with UID 500 and GID 500 on all the nodes
* Create an NFS share that is mounted on all the nodes
** A small number of tests make use of a shared storage location
* Configure hostnames and populate /etc/hosts
<pre>
127.0.0.1  localhost localhost.localdomain localhost4 localhost4.localdomain4
::1        localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.56.201 node01
192.168.56.202 node02
192.168.56.203 node03
192.168.56.204 node04
192.168.56.205 node05
192.168.56.206 node06
</pre>
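
The exact mechanics of the key exchange are up to you; below is a minimal sketch of one way to do it from a single node, assuming root logins over ssh and the /etc/hosts entries shown above. The node names, the pdsh hostlist syntax, and the runas line simply match this example cluster.
<pre>
# Run once, as root, from any node. The one-time password prompts from
# ssh-copy-id are expected; after this, no user input should be needed.
nodes="node01 node02 node03 node04 node05 node06"

# Generate a key pair if one does not already exist.
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa

# Pre-accept the host fingerprints.
ssh-keyscan $nodes >> ~/.ssh/known_hosts

# Authorize the key everywhere and share the same identity with every node.
for n in $nodes; do
    ssh-copy-id root@$n
    scp ~/.ssh/id_rsa ~/.ssh/id_rsa.pub ~/.ssh/known_hosts root@$n:.ssh/
done

# Verify pdsh can run commands across the whole cluster without prompting.
pdsh -R ssh -w node[01-06] uptime

# While pdsh is working, create the 'runas' test user on all of the nodes.
pdsh -R ssh -w node[01-06] "groupadd -g 500 runas; useradd -u 500 -g 500 runas"
</pre>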


==== Installing Additional Applications for Testing ====


* Install [https://dbench.samba.org/ dbench] - optional, used to gather baseline metrics of the disks used in testing (see the example after this list)
** Available from the EPEL repo, or from the Whamcloud toolkit builds - https://build.whamcloud.com/job/toolkit/arch=x86_64,distro=el7/lastSuccessfulBuild/artifact/_topdir/RPMS/x86_64/
* Install [http://www.iozone.org/ iozone] - optional, used to gather baseline metrics of filesystem performance
** Available from the EL7 builds of the toolkit - https://build.whamcloud.com/job/toolkit/arch=x86_64,distro=el7/lastSuccessfulBuild/artifact/_topdir/RPMS/x86_64/
* Install the Parallel IO Simulator (PIOS) - optional, used to simulate shared-file and file-per-process IO
** Available from the EL6 builds of the toolkit - https://build.whamcloud.com/job/toolkit/arch=x86_64,distro=el6/lastSuccessfulBuild/artifact/_topdir/RPMS/x86_64/
* Install the POSIX package - optional, used by the posix.sh test suite
** Available from the EL6 builds of the toolkit - https://build.whamcloud.com/job/toolkit/arch=x86_64,distro=el6/lastSuccessfulBuild/artifact/_topdir/RPMS/x86_64/
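
If you do install dbench and iozone, a quick way to gather baseline numbers is to run them against a locally mounted filesystem on the node you want to characterize (an OSS, for example) before starting the Lustre tests. The target directory, client count, run length, and file size below are arbitrary example values, not anything the test framework requires.
<pre>
# Example baseline runs; /tmp/baseline is an arbitrary directory on a
# locally mounted filesystem.
mkdir -p /tmp/baseline

# dbench: 4 simulated clients for 60 seconds.
dbench -D /tmp/baseline -t 60 4

# iozone: automatic mode, capped at a 1GB maximum file size.
iozone -a -g 1g -f /tmp/baseline/iozone.tmp
</pre>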


==== Test Configuration ====


There is a configuration distributed with the Lustre test suite, local.sh, that easily enables you to test Lustre on a single node using loopback devices with no additional configuration needed. This works well, and tests the Lustre software, but the purpose of this HOWTO is to demonstrate using multiple servers and clients to test more Lustre features in an environment representative of a real install.

Fortunately, many of the required environment variables have defaults defined in local.sh, so we only need to define the specifics for our system and then source local.sh. Because we are using multiple clients, we instead source the ncli.sh configuration, which ultimately sources local.sh but also adds some functions to set up a multiple-client environment.

Install this configuration file in /usr/lib64/lustre/tests/cfg/multinode.sh on all of the nodes (one way to distribute it is sketched after the file):

<pre>
# MDS and MDT configuration
MDSCOUNT=2

mds_HOST="node01"
MDSDEV1="/dev/sdb"

mds2_HOST="node02"
MDSDEV2="/dev/sdb"

# OSS and OST configuration
OSTCOUNT=8

ost_HOST="node03"
OSTDEV1="/dev/sdb"

ost2_HOST="node03"
OSTDEV2="/dev/sdc"

ost3_HOST="node03"
OSTDEV3="/dev/sdd"

ost4_HOST="node03"
OSTDEV4="/dev/sde"

ost5_HOST="node04"
OSTDEV5="/dev/sdb"

ost6_HOST="node04"
OSTDEV6="/dev/sdc"

ost7_HOST="node04"
OSTDEV7="/dev/sdd"

ost8_HOST="node04"
OSTDEV8="/dev/sde"

# Client configuration
CLIENTCOUNT=2
RCLIENTS="node05 node06"

PDSH="/usr/bin/pdsh -S -Rssh -w"

# Shared storage location used by a small number of tests
SHARED_DIRECTORY=${SHARED_DIRECTORY:-/opt/testing/shared}

# Pull in the multi-client defaults, which in turn source local.sh
. /usr/lib64/lustre/tests/cfg/ncli.sh
</pre>
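
Two practical details are not handled by the test framework itself: this file has to exist on every node, and SHARED_DIRECTORY has to point at storage visible from every node. Below is a minimal sketch of one way to do both; using pdcp for the copy, node05 as the NFS server, and these particular export options are assumptions for this example, not requirements.
<pre>
# Push the configuration file to every node (pdcp ships with pdsh).
pdcp -R ssh -w node[01-06] /usr/lib64/lustre/tests/cfg/multinode.sh \
    /usr/lib64/lustre/tests/cfg/

# Export the shared directory from this node (node05 here, an arbitrary choice).
mkdir -p /opt/testing/shared
echo '/opt/testing/shared *(rw,no_root_squash)' >> /etc/exports
systemctl enable nfs-server && systemctl start nfs-server
exportfs -a

# Mount it at the same path on the remaining nodes.
pdsh -R ssh -w node[01-04],node06 \
    "mkdir -p /opt/testing/shared && mount -t nfs node05:/opt/testing/shared /opt/testing/shared"
</pre>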
 
==== Running the Tests ====
 
Now you are ready to run the tests. I recommend starting with a single subtest to check that you have everything configured correctly.

Run this command from the client1 node (node05 in this example) to launch the sanity test suite, running only test 0a.

auster is the tool used to drive the tests. In this example we are passing the following flags:
* -f multinode - use the multinode.sh configuration in /usr/lib64/lustre/tests/cfg/ when running the tests
* -r - allow the tests to reformat the devices
* -s - run the SLOW tests, which are skipped if we don't pass this flag
* -v - provide verbose output
* -d /opt/results/ - write the test results under the /opt/results/ base directory
* sanity --only 0a - run the sanity test suite, but only test 0a
 
<pre>
[root@node05 tests]# pwd
/usr/lib64/lustre/tests
[root@node05 tests]# ./auster -f multinode -rsv -d /opt/results/ sanity --only 0a
</pre>
 
You will see a lot of output while the tests format the targets, start the filesystem, and mount it on the clients.
 
You will see this output if the test successfully runs:
<pre>
== sanity test 0a: touch; rm ========================================================================= 11:17:34 (1441210654)
/mnt/testfs/f0a.sanity has type file OK
/mnt/testfs/f0a.sanity: absent OK
Resetting fail_loc on all nodes...done.
PASS 0a (1s)
== sanity test complete, duration 6 sec ============================================================== 11:17:35 (1441210655)
</pre>
 
Now you're ready to run the full sanity test suite. Just remove the --only flag from the auster command:
<pre>
[root@node05 tests]# pwd
/usr/lib64/lustre/tests
[root@node05 tests]# ./auster -f multinode -rsv -d /opt/results/ sanity
</pre>
 
Running sanity with the -s flag takes about 1h32m in the virtual machine cluster on my computer.
 
==== Test Output ====
 
By default, auster will write output to <pre> /tmp/test_logs/<date>/<time>/ </pre>
 
The -d flag directed auster to store the results in the base directory /opt/results/. auster will create <pre> /opt/results/<date>/<time>/ </pre> directories. Inside the directory corresponding to your test run, you'll find the test output.


==== Adding Additional Configuration for sanity-hsm ====


Now that we are able to run the sanity test suite, we can expand our testing by running additional test suites. For example, you may want to test the HSM feature of Lustre. This is covered by the sanity-hsm test suite.
The sanity-hsm suite needs some additions to the basic configuration file we created earlier.

In particular, we now need a client that will act as an agent for the HSM copying and a filesystem to archive files to.
We will add node07 to our setup:

* node07 - client/HSM agent - 192.168.56.207
** 8GB filesystem formatted as ext4 and mounted at /archive - /dev/sdb (see the sketch after this list)
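
Preparing the archive filesystem on node07 is left to you; the following is a minimal sketch assuming /dev/sdb is the spare 8GB disk and that a plain ext4 filesystem at /archive is all that is needed (the mkfs defaults and the fstab entry are example choices).
<pre>
# Run on node07 as root.
mkfs.ext4 /dev/sdb
mkdir -p /archive
mount /dev/sdb /archive

# Optional: make the mount persistent across reboots.
echo '/dev/sdb /archive ext4 defaults 0 0' >> /etc/fstab
</pre>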


Add this machine to the hosts file on all of the nodes:
<pre>
192.168.56.207 node07
</pre>


Expand the RCLIENTS definition in multinode.sh to include node07:
<pre>
RCLIENTS="node05 node06 node07"
</pre>


Add additional environment variables to our multinode.sh configuration file for sanity-hsm:


<pre>
AGTDEV1=/archive                      # archive location used by the first HSM agent
agt1_HOST=node07                      # node that will run the HSM copytool agent
HSMTOOL_VERBOSE="-v -v -v -v -v -v"   # verbosity flags passed to the HSM copytool
</pre>
Our system is now set up to run the sanity-hsm test suite. This is the auster command you would run to start the test:
<pre>
[root@node05 tests]# pwd
/usr/lib64/lustre/tests
[root@node05 tests]# ./auster -f multinode -rsv -d /opt/results/ sanity-hsm
</pre>
[[Category: Testing]]
