Testing HOWTO

From Lustre Wiki
Revision as of 08:34, 28 August 2015 by Justinmiller (talk | contribs) (Rough draft of HOWTO)

This HOWTO is intended to demonstrate the basics of configuring and running a small subset of tests on a multi-node configuration. There are many tests available in the suite, and the principles demonstrated here will apply to them all, but this HOWTO will focus on the sanity test.

Note that the Lustre test suite ships with a configuration, local.sh, that lets you test Lustre on a single node using loopback devices with no additional configuration. This works well and exercises the Lustre software, but the purpose of this HOWTO is to demonstrate using multiple servers and clients to test more Lustre features in an environment representative of a real installation.

Although these examples use virtual machines, the specifics should be easy to apply to real hardware once the prerequisites are set up.

Example System Configuration

This HOWTO uses a cluster of six virtual machines to run the Lustre tests: two clients, two MDS nodes, and two OSS nodes. This layout enables testing of a wide variety of Lustre features.

  • node01 - 192.168.1.101
    • MGS and MDS
      • 512MB MGT
      • 1GB MDT
  • node02 - 192.168.1.102
    • MDS
      • 1GB MDT
  • node03 - 192.168.1.103
    • OSS
      • Four 16GB OST
  • node04 - 192.168.1.104
    • OSS
      • Four 16GB OST
  • node05 - 192.168.1.105
    • client
  • node06 - 192.168.1.106
    • client
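
If the six nodes are not already resolvable by name, the address table above could be turned into /etc/hosts-style entries with a short loop. This is only a sketch: the function name gen_hosts is hypothetical, and it assumes the node01..node06 naming and the 192.168.1.101..106 addresses listed above.

```shell
# Print one "address<TAB>hostname" line per node in the example cluster.
# Host names and addresses follow the table above; adjust for your site.
gen_hosts() {
    for i in 1 2 3 4 5 6; do
        printf '192.168.1.%d\tnode%02d\n' "$((100 + i))" "$i"
    done
}

gen_hosts
```

The output can be reviewed and then appended to /etc/hosts on each node, or ignored entirely if DNS already covers these names.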

Prerequisites

  • The same version of Lustre installed on all nodes
  • PDSH installed, and passwordless ssh authentication configured between all nodes, with all host identities already accepted
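
Both prerequisites can be checked in one pass before running any tests. The following is a minimal sketch, not part of the Lustre test suite: same_version is a hypothetical helper that reads pdsh-style "host: output" lines and succeeds only if every host reported the same text. The commented usage line assumes that `lctl --version` is available on your installation; substitute your package manager's query if it is not.

```shell
# Succeed only when every "host: text" line on stdin carries identical text.
# Empty input is treated as a failure, since no host answered.
same_version() {
    awk '{ sub(/^[^:]*: /, ""); seen[$0] = 1 }
         END { n = 0; for (v in seen) n++; exit (n == 1 ? 0 : 1) }'
}

# Hypothetical usage on the example cluster (needs pdsh and working ssh):
#   pdsh -S -Rssh -w node[01-06] 'lctl --version' | same_version \
#       && echo "versions match" || echo "version mismatch or unreachable node"
```

If pdsh prompts for a password or a host key during this check, the ssh prerequisite is not yet satisfied.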

Configuration

Install the following configuration file as /usr/lib64/lustre/tests/cfg/multinode.sh:

# Enables verbose acc-sm output.
VERBOSE=${VERBOSE:-"false"}

# File system configuration
FSNAME="testfs"
FSTYPE=ldiskfs

# Network configuration
NETTYPE="tcp"

# facet hosts
mds_HOST=${mds_HOST:-node01}
mgs_HOST=${mgs_HOST:-$mds_HOST}
ost_HOST=${ost_HOST:-node03}

# MDS and MDT configuration
MDSCOUNT=1
SINGLEMDS=${SINGLEMDS:-mds1}

MDSSIZE=8589000
MDS_FS_MKFS_OPTS=${MDS_FS_MKFS_OPTS:-}
MDS_MOUNT_OPTS=${MDS_MOUNT_OPTS:-}
MDSFSTYPE=ldiskfs

mds1_HOST="node01"
MDSDEV1="/dev/sdd"
mds1_MOUNT="/mnt/testfs/mdt1"
mds1_FSTYPE=ldiskfs

# MGS and MGT configuration
mgs_HOST=${mgs_HOST:-"$mds_HOST"} # combination mgs/mds
MGSOPT=${MGSOPT:-}
MGS_FS_MKFS_OPTS=${MGS_FS_MKFS_OPTS:-}
MGS_MOUNT_OPTS=${MGS_MOUNT_OPTS:-}
MGSFSTYPE=ldiskfs

MGSDEV="/dev/sdb"
MGSSIZE=536000
mgs_MOUNT="/mnt/testfs/mgt"
MGSNID="192.168.1.101@tcp"
mgs_FSTYPE=ldiskfs

# OSS and OST configuration
OSTCOUNT=${OSTCOUNT:-8}
OSTSIZE=${OSTSIZE:-16777216}
OSTFSTYPE=ldiskfs

ost1_HOST="node03"
OSTDEV1="/dev/sdb"
ost1_MOUNT="/mnt/testfs/ost1"
ost1_FSTYPE=ldiskfs

ost2_HOST="node03"
OSTDEV2="/dev/sdc"
ost2_MOUNT="/mnt/testfs/ost2"
ost2_FSTYPE=ldiskfs

ost3_HOST="node03"
OSTDEV3="/dev/sdd"
ost3_MOUNT="/mnt/testfs/ost3"
ost3_FSTYPE=ldiskfs

ost4_HOST="node03"
OSTDEV4="/dev/sde"
ost4_MOUNT="/mnt/testfs/ost4"
ost4_FSTYPE=ldiskfs

ost5_HOST="node04"
OSTDEV5="/dev/sdb"
ost5_MOUNT="/mnt/testfs/ost5"
ost5_FSTYPE=ldiskfs

ost6_HOST="node04"
OSTDEV6="/dev/sdc"
ost6_MOUNT="/mnt/testfs/ost6"
ost6_FSTYPE=ldiskfs

ost7_HOST="node04"
OSTDEV7="/dev/sdd"
ost7_MOUNT="/mnt/testfs/ost7"
ost7_FSTYPE=ldiskfs

ost8_HOST="node04"
OSTDEV8="/dev/sde"
ost8_MOUNT="/mnt/testfs/ost8"
ost8_FSTYPE=ldiskfs

# OST striping configuration
STRIPE_BYTES=${STRIPE_BYTES:-1048576}
STRIPES_PER_OBJ=${STRIPES_PER_OBJ:-0}

# Client configuration
CLIENTCOUNT=2
CLIENTS="node05,node06"
CLIENT1="node05"
CLIENT2="node06"
RCLIENTS="node06"

MOUNT="/mnt/testfs"
MOUNT1="/mnt/testfs"
MOUNT2="/mnt/testfs2"
DIR=${DIR:-$MOUNT}
DIR1=${DIR:-$MOUNT1}
DIR2=${DIR2:-$MOUNT2}

# UID and GID configuration
# Used by several tests to set the UID and GID
if [ $UID -ne 0 ]; then
    log "running as non-root uid $UID"
    RUNAS_ID="$UID"
    RUNAS_GID=`id -g $USER`
    RUNAS=""
else
    RUNAS_ID=${RUNAS_ID:-500}
    RUNAS_GID=${RUNAS_GID:-$RUNAS_ID}
    RUNAS=${RUNAS:-"runas -u $RUNAS_ID -g $RUNAS_GID"}
fi

# Software configuration
PDSH="/usr/bin/pdsh -S -Rssh -w"
FAILURE_MODE=${FAILURE_MODE:-SOFT} # or HARD
POWER_DOWN=${POWER_DOWN:-"powerman --off"}
POWER_UP=${POWER_UP:-"powerman --on"}
SLOW=${SLOW:-no}
FAIL_ON_ERROR=${FAIL_ON_ERROR:-true}

# Debug configuration
#PTLDEBUG=${PTLDEBUG:-"vfstrace rpctrace dlmtrace neterror ha config \
PTLDEBUG=${PTLDEBUG:-"vfstrace dlmtrace neterror ha config \
                      ioctl super lfsck"}
SUBSYSTEM=${SUBSYSTEM:-"all -lnet -lnd -pinger"}

# Lustre timeout
TIMEOUT=${TIMEOUT:-"30"}

# promise 2MB for every cpu
if [ -f /sys/devices/system/cpu/possible ]; then
    _debug_mb=$((($(cut -d "-" -f 2 /sys/devices/system/cpu/possible)+1)*2))
else
    _debug_mb=$(($(getconf _NPROCESSORS_CONF)*2))
fi

DEBUG_SIZE=${DEBUG_SIZE:-$_debug_mb}
DEBUGFS=${DEBUGFS:-"/sbin/debugfs"}

TMP=${TMP:-/tmp}
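
The eight ostN stanzas in the configuration above differ only in their index, host, and device, so when adapting the file to a different cluster it may be easier to generate them than to edit them by hand. The following is a rough sketch under the assumptions of the example cluster (two OSS nodes, node03 and node04, each with devices /dev/sdb through /dev/sde); gen_ost_stanzas is a hypothetical helper, not something provided by the Lustre test suite.

```shell
# Emit the ostN_HOST / OSTDEVN / ostN_MOUNT / ostN_FSTYPE assignments
# for four devices on each of two OSS hosts, numbered ost1..ost8.
gen_ost_stanzas() {
    n=1
    for host in node03 node04; do
        for dev in sdb sdc sdd sde; do
            printf 'ost%d_HOST="%s"\n' "$n" "$host"
            printf 'OSTDEV%d="/dev/%s"\n' "$n" "$dev"
            printf 'ost%d_MOUNT="/mnt/testfs/ost%d"\n' "$n" "$n"
            printf 'ost%d_FSTYPE=ldiskfs\n\n' "$n"
            n=$((n + 1))
        done
    done
}

gen_ost_stanzas
```

The generated text can be pasted into multinode.sh in place of the hand-written stanzas; OSTCOUNT still has to match the number of stanzas produced.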