Difference between revisions of "Testing HOWTO"


Revision as of 09:18, 3 September 2015

This HOWTO demonstrates the basics of configuring and running a small subset of tests on a multi-node configuration. There are many tests available in the suite, and the principles demonstrated here apply to them all, but this HOWTO focuses on the sanity test suite.

Note that a configuration distributed with the Lustre test suite, local.sh, lets you test Lustre on a single node using loopback devices with no additional configuration. This works well and tests the Lustre software, but the purpose of this HOWTO is to demonstrate using multiple servers and clients to test more Lustre features in an environment representative of a real install.

While these examples use virtual machines, they are merely examples, and the specifics should be easy to apply to real hardware once the prerequisites are set up.

System Configuration

This HOWTO uses a cluster of six virtual machines running CentOS 7.1 with a recent lustre-master build to run the Lustre tests: two clients, two MDS nodes, and two OSS nodes. This layout enables testing of a wide variety of Lustre features.

  • node01 - MGS and MDS - 192.168.56.201
    • 512MB MGT - /dev/sdb
    • 1GB MDT - /dev/sdc
  • node02 - MDS - 192.168.56.202
    • 1GB MDT - /dev/sdb
  • node03 - OSS - 192.168.56.203
    • Four 16GB OST - /dev/sdb, /dev/sdc, /dev/sdd, /dev/sde
  • node04 - OSS - 192.168.56.204
    • Four 16GB OST - /dev/sdb, /dev/sdc, /dev/sdd, /dev/sde
  • node05 - client - 192.168.56.205
    • 16GB shared directory for tests and results
  • node06 - client - 192.168.56.206

System Setup

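  • Install the Lustre client packages on two machines and the Lustre server packages on the other four, using the same version of Lustre on every node
    • Follow this guide to install Lustre RPMs: Walk-thru - Deploying Lustre pre-built RPMs (https://wiki.hpdd.intel.com/display/PUB/Walk-thru-+Deploying+Lustre+pre-built+RPMs)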
  • Disable SELinux
    • Set SELINUX=disabled in /etc/sysconfig/selinux, then reboot for the change to take effect
  • Disable the firewall
    • systemctl stop firewalld.service && systemctl disable firewalld.service
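  • Generate passwordless SSH keys on each host, exchange the public keys across all nodes, and accept the host keys
  • Install PDSH and ensure you can execute commands across the cluster
    • https://build.hpdd.intel.com/job/toolkit/arch=x86_64,distro=el7/lastSuccessfulBuild/artifact/_topdir/RPMS/x86_64/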
  • Install the epel-release package to enable the EPEL repo
  • Install the net-tools package for netstat
  • Add user 'runas' with UID 500 and GID 500 to all the nodes
  • Create an NFS share that is mounted on all the nodes
    • A small number of tests will make use of a shared storage location
  • Configure hostnames and populate /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.56.201	node01	node01.unix.localdomain
192.168.56.202	node02	node02.unix.localdomain
192.168.56.203	node03	node03.unix.localdomain
192.168.56.204	node04	node04.unix.localdomain
192.168.56.205	node05	node05.unix.localdomain
192.168.56.206	node06	node06.unix.localdomain
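
The /etc/hosts entries and most of the per-node setup steps above can be scripted. The following is a minimal sketch, not an official procedure: the helper names gen_hosts and print_node_setup and the variables NODE_PREFIX, BASE_IP and DOMAIN are made up for this example. It prints the host entries and the setup commands so they can be reviewed before being run as root on each node.

```shell
#!/bin/sh
# Sketch only: the variable and function names below are illustrative,
# not part of Lustre or its test suite.
NODE_PREFIX="node"
BASE_IP="192.168.56"
DOMAIN="unix.localdomain"

# Print one /etc/hosts line per node: node01..node06 map to .201...206.
gen_hosts() {
    i=1
    while [ "$i" -le 6 ]; do
        name=$(printf '%s%02d' "$NODE_PREFIX" "$i")
        printf '%s.%d\t%s\t%s.%s\n' "$BASE_IP" $((200 + i)) "$name" "$name" "$DOMAIN"
        i=$((i + 1))
    done
}

# Print (do not run) the per-node setup commands from the list above.
# --follow-symlinks is used because /etc/sysconfig/selinux is normally a
# symlink; the SELinux change only takes effect after a reboot.
print_node_setup() {
    cat <<'EOF'
sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/' /etc/sysconfig/selinux
systemctl stop firewalld.service
systemctl disable firewalld.service
yum install -y epel-release
yum install -y net-tools
groupadd -g 500 runas
useradd -u 500 -g 500 runas
EOF
}

gen_hosts
print_node_setup
```

Printing the commands instead of executing them keeps the sketch safe to try on any machine; once the output looks right, the host lines can be appended to /etc/hosts and the commands run on each node.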

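Installing Additional Applications for Testing

  • Install dbench (https://dbench.samba.org/) - optional, used to gather baseline metrics of disks used in testing
    • Available from the EPEL repo, or from the Intel HPDD toolkit builds: https://build.hpdd.intel.com/job/toolkit/arch=x86_64,distro=el7/lastSuccessfulBuild/artifact/_topdir/RPMS/x86_64/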

Test Configuration

Install this configuration file in /usr/lib64/lustre/tests/cfg/multinode.sh on all of the nodes.

# Enables verbose acc-sm output.
VERBOSE=${VERBOSE:-"false"}

# File system configuration
FSNAME="testfs"
FSTYPE=ldiskfs

# Network configuration
NETTYPE="tcp"

# facet hosts
mds_HOST=${mds_HOST:-node01}
mgs_HOST=${mgs_HOST:-$mds_HOST}
ost_HOST=${ost_HOST:-node03}

# MDS and MDT configuration
MDSCOUNT=2
SINGLEMDS=${SINGLEMDS:-mds1}

MDSSIZE=8589000
MDS_FS_MKFS_OPTS=${MDS_FS_MKFS_OPTS:-}
MDS_MOUNT_OPTS=${MDS_MOUNT_OPTS:-}
MDSFSTYPE=ldiskfs

mds1_HOST="node01"
MDSDEV1="/dev/sdc"
mds1_MOUNT="/mnt/testfs/mdt1"
mds1_FSTYPE=ldiskfs

mds2_HOST="node02"
MDSDEV2="/dev/sdb"
mds2_MOUNT="/mnt/testfs/mdt2"
mds2_FSTYPE=ldiskfs

# MGS and MGT configuration
mgs_HOST=${mgs_HOST:-"$mds_HOST"} # combination mgs/mds
MGSOPT=${MGSOPT:-}
MGS_FS_MKFS_OPTS=${MGS_FS_MKFS_OPTS:-}
MGS_MOUNT_OPTS=${MGS_MOUNT_OPTS:-}
MGSFSTYPE=ldiskfs

MGSDEV="/dev/sdb"
MGSSIZE=536000
mgs_MOUNT="/mnt/testfs/mgt"
MGSNID="192.168.56.201@tcp"
mgs_FSTYPE=ldiskfs

# OSS and OST configuration
OSTCOUNT=${OSTCOUNT:-8}
OSTSIZE=${OSTSIZE:-16777216}
OSTFSTYPE=ldiskfs

ost1_HOST="node03"
OSTDEV1="/dev/sdb"
ost1_MOUNT="/mnt/testfs/ost1"
ost1_FSTYPE=ldiskfs

ost2_HOST="node03"
OSTDEV2="/dev/sdc"
ost2_MOUNT="/mnt/testfs/ost2"
ost2_FSTYPE=ldiskfs

ost3_HOST="node03"
OSTDEV3="/dev/sdd"
ost3_MOUNT="/mnt/testfs/ost3"
ost3_FSTYPE=ldiskfs

ost4_HOST="node03"
OSTDEV4="/dev/sde"
ost4_MOUNT="/mnt/testfs/ost4"
ost4_FSTYPE=ldiskfs

ost5_HOST="node04"
OSTDEV5="/dev/sdb"
ost5_MOUNT="/mnt/testfs/ost5"
ost5_FSTYPE=ldiskfs

ost6_HOST="node04"
OSTDEV6="/dev/sdc"
ost6_MOUNT="/mnt/testfs/ost6"
ost6_FSTYPE=ldiskfs

ost7_HOST="node04"
OSTDEV7="/dev/sdd"
ost7_MOUNT="/mnt/testfs/ost7"
ost7_FSTYPE=ldiskfs

ost8_HOST="node04"
OSTDEV8="/dev/sde"
ost8_MOUNT="/mnt/testfs/ost8"
ost8_FSTYPE=ldiskfs

# OST striping configuration
STRIPE_BYTES=${STRIPE_BYTES:-1048576}
STRIPES_PER_OBJ=${STRIPES_PER_OBJ:-0}

# Client configuration
CLIENTCOUNT=2
CLIENTS="node05,node06"
CLIENT1="node05"
CLIENT2="node06"
RCLIENTS="node06"

MOUNT="/mnt/testfs"
MOUNT1="/mnt/testfs"
MOUNT2="/mnt/testfs2"
DIR=${DIR:-$MOUNT}
DIR1=${DIR:-$MOUNT1}
DIR2=${DIR2:-$MOUNT2}

# UID and GID configuration
# Used by several tests to set the UID and GID
if [ $UID -ne 0 ]; then
        log "running as non-root uid $UID"
        RUNAS_ID="$UID"
        RUNAS_GID=`id -g $USER`
        RUNAS=""
else
        RUNAS_ID=${RUNAS_ID:-500}
        RUNAS_GID=${RUNAS_GID:-$RUNAS_ID}
        RUNAS=${RUNAS:-"runas -u $RUNAS_ID -g $RUNAS_GID"}
fi

# Software configuration
PDSH="/usr/bin/pdsh -S -Rssh -w"
FAILURE_MODE=${FAILURE_MODE:-SOFT} # or HARD
POWER_DOWN=${POWER_DOWN:-"powerman --off"}
POWER_UP=${POWER_UP:-"powerman --on"}
SLOW=${SLOW:-no}
FAIL_ON_ERROR=${FAIL_ON_ERROR:-true}

# Debug configuration
#PTLDEBUG=${PTLDEBUG:-"vfstrace rpctrace dlmtrace neterror ha config \
PTLDEBUG=${PTLDEBUG:-"vfstrace dlmtrace neterror ha config \
		      ioctl super lfsck"}
SUBSYSTEM=${SUBSYSTEM:-"all -lnet -lnd -pinger"}

# Lustre timeout
TIMEOUT=${TIMEOUT:-"30"}

# promise 2MB for every cpu
if [ -f /sys/devices/system/cpu/possible ]; then
    _debug_mb=$((($(cut -d "-" -f 2 /sys/devices/system/cpu/possible)+1)*2))
else
    _debug_mb=$(($(getconf _NPROCESSORS_CONF)*2))
fi

DEBUG_SIZE=${DEBUG_SIZE:-$_debug_mb}
DEBUGFS=${DEBUGFS:-"/sbin/debugfs"}

TMP=${TMP:-/tmp}
SHARED_DIRECTORY=${SHARED_DIRECTORY:-/opt/testing/shared}
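
Because every OST and MDT needs a matching host and device entry, it is easy to leave one out when adapting this file. The sketch below is one possible sanity check, assuming the variable naming shown in the configuration above (ostN_HOST, OSTDEVN, mdsN_HOST, MDSDEVN); check_targets is a hypothetical helper, not part of the Lustre test suite.

```shell
# Sketch only: check_targets is a hypothetical helper. It sources a config
# file in a subshell and reports every OST or MDT index up to OSTCOUNT or
# MDSCOUNT that lacks a host or a device setting.
check_targets() {
    (
        # The config calls log() when run as non-root; provide a stub.
        log() { echo "$*" >&2; }
        . "$1" || exit 2
        status=0

        n=1
        while [ "$n" -le "${OSTCOUNT:-0}" ]; do
            eval "host=\${ost${n}_HOST:-} dev=\${OSTDEV${n}:-}"
            if [ -z "$host" ] || [ -z "$dev" ]; then
                echo "ost$n: missing host or device"
                status=1
            fi
            n=$((n + 1))
        done

        n=1
        while [ "$n" -le "${MDSCOUNT:-0}" ]; do
            eval "host=\${mds${n}_HOST:-} dev=\${MDSDEV${n}:-}"
            if [ -z "$host" ] || [ -z "$dev" ]; then
                echo "mds$n: missing host or device"
                status=1
            fi
            n=$((n + 1))
        done

        exit "$status"
    )
}
```

A typical call would be check_targets /usr/lib64/lustre/tests/cfg/multinode.sh; the path must contain a slash so the shell does not search PATH for the file. The function returns 0 when every target is fully defined and 1 otherwise.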

Running the Tests

Now you are ready to run the tests. I recommend starting with a single subtest to check that everything is configured correctly.

Run this command from the client1 node (node05 in this example) to launch the sanity test, running only test 0a.

auster is the tool used to drive the tests. In this example we are passing these flags:

  • -f multinode - use the multinode.sh configuration in /usr/lib64/lustre/tests/cfg/ when running the tests
  • -r - allow the tests to reformat the devices
  • -s - run the SLOW tests, which are skipped if we don't pass this flag
  • -v - provide verbose output
  • -d /opt/results/ - store the test results under the base directory /opt/results/
  • sanity --only 0a - run the sanity tests, but only test 0a

[root@node05 tests]# pwd
/usr/lib64/lustre/tests
[root@node05 tests]# ./auster -f multinode -rsv -d /opt/results/ sanity --only 0a

You will see a lot of output while the tests format the targets, start the filesystem, and mount it on the clients.

You will see this output if the test runs successfully:

== sanity test 0a: touch; rm ========================================================================= 11:17:34 (1441210654)
/mnt/testfs/f0a.sanity has type file OK
/mnt/testfs/f0a.sanity: absent OK
Resetting fail_loc on all nodes...done.
PASS 0a (1s)
== sanity test complete, duration 6 sec ============================================================== 11:17:35 (1441210655)
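
When many subtests run, it helps to reduce the output to a short summary. The following is a rough sketch, assuming result lines have the shape "PASS 0a (1s)" shown above and that failures are reported the same way with a leading "FAIL" (an assumption, since no failing output appears in this HOWTO). summarize_results is a hypothetical helper, not part of auster.

```shell
# Sketch only: summarize_results is a hypothetical helper. It counts lines
# whose first field is PASS or FAIL and lists the failed subtest names.
# It reads the files given as arguments, or standard input if none are given.
summarize_results() {
    awk '
        $1 == "PASS" { pass++ }
        $1 == "FAIL" { fail++; failed = failed " " $2 }
        END {
            printf "passed=%d failed=%d\n", pass, fail
            if (fail) printf "failed tests:%s\n", failed
            exit (fail > 0)
        }
    ' "$@"
}
```

For example, saving the auster output to a file and running summarize_results on that file prints the counts and returns a non-zero status if any subtest failed, which makes it usable in scripts.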

Now you're ready to run the full sanity test. Just remove the --only flag to auster:

[root@node05 tests]# pwd
/usr/lib64/lustre/tests
[root@node05 tests]# ./auster -f multinode -rsv -d /opt/results/ sanity

Running sanity with the -s flag takes about 1h32m in the virtual machine cluster on my computer.

Test Output

The -d flag directed auster to store the results in the base directory /opt/results/. auster will create a /opt/results/<date>/<time> directory for each run. Inside the directory corresponding to your test run, you'll find the test output.
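
To jump straight to the most recent run, the two directory levels can be walked from a script. This is a minimal sketch under one assumption: that the <date> and <time> directory names sort chronologically when sorted as text. latest_results is a hypothetical helper, not part of auster.

```shell
# Sketch only: latest_results is a hypothetical helper. It picks the last
# <date> directory, then the last <time> directory inside it, relying on
# the names sorting chronologically as plain text (an assumption).
latest_results() {
    base=${1:-/opt/results}
    base=${base%/}
    day=$(ls -1 "$base" 2>/dev/null | sort | tail -n 1)
    [ -n "$day" ] || return 1
    run=$(ls -1 "$base/$day" 2>/dev/null | sort | tail -n 1)
    [ -n "$run" ] || return 1
    printf '%s/%s/%s\n' "$base" "$day" "$run"
}
```

Calling latest_results /opt/results/ prints the path of the newest run directory, or returns a non-zero status if no results exist yet. Stray files placed directly in the results tree would confuse this simple listing, so it is only suitable for a directory that auster alone writes to.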