CHAPTER 27

Lustre Operating Tips

This chapter describes tips to improve Lustre operations and includes the following sections:


27.1 Adding an OST to a Lustre File System

To add an OST to an existing Lustre file system:

1. Add a new OST by running the following commands:

$ mkfs.lustre --fsname=spfs --ost --mgsnode=mds16@tcp0 /dev/sda
$ mkdir -p /mnt/test/ost0
$ mount -t lustre /dev/sda /mnt/test/ost0

2. Migrate the data (possibly).

Adding new, empty OSTs leaves the file system unbalanced. New files are balanced across the OSTs automatically as they are created, so if this is a scratch file system or files are pruned at a regular interval, no further work may be needed. Files that existed before the expansion can be rebalanced with an in-place copy, which can be done with a simple script.

The basic method is to copy each existing file to a temporary file, then move the temporary file over the original. Do not attempt this with files that users or applications are currently writing to. This operation redistributes the stripes over the entire set of OSTs. For a sample data migration script, see A Simple Data Migration Script.

A very clever migration script would do the following:

An administrator who wants to explore this approach further can find per-OST statistics under /proc/fs/lustre/osc/*/rpc_stats.
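
One possible starting point for examining how data is currently distributed is the output of lfs df on a client. The following is a minimal sketch, not part of the Lustre tools: the function name full_osts, the 80 percent default threshold, and the mount point in the usage comment are all assumptions, and the sketch assumes the usual lfs df column layout (UUID, 1K-blocks, Used, Available, Use%, Mounted on).

```shell
# Print the UUID and Use% of every OST whose usage exceeds a threshold.
# Reads "lfs df"-style text on stdin; the threshold (in percent) is the
# first argument and defaults to 80 (an arbitrary, illustrative value).
full_osts() {
    awk -v limit="${1:-80}" '
        # Only OST lines carry an [OST:n] tag in the "Mounted on" column.
        /\[OST:[0-9]+\]/ {
            pct = $5              # Use% column, for example "90%"
            sub(/%/, "", pct)     # strip the percent sign
            if (pct + 0 > limit + 0) print $1, pct
        }'
}

# Hypothetical usage on a client (the mount point is an assumption):
#   lfs df /mnt/spfs | full_osts 75
```

OSTs reported by such a filter are candidates to pass to the migration script with -O, so that only files with objects on the fullest OSTs are copied.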


27.2 A Simple Data Migration Script

#!/bin/bash
# set -x
  
# A script to copy and check files.
# To avoid allocating objects on one or more OSTs, they should be
# deactivated on the MDS via "lctl --device {device_number} deactivate",
# where {device_number} is from the output of "lctl dl" on the MDS.
# To guard against corruption, the file is checksummed
# before and after the operation.
# 
  
CKSUM=${CKSUM:-md5sum}
 
usage() {
    echo "usage: $0 [-O <OST_UUID-to-empty>] <dir>" 1>&2
    echo "    -O can be specified multiple times" 1>&2
    exit 1
}
 
while getopts "O:" opt "$@"; do
    case $opt in
        O) OST_PARAM="$OST_PARAM -O $OPTARG";;
        \?) usage;;
    esac
done
 
shift $((OPTIND - 1))
MVDIR=$1
 
if [ $# -ne 1 ] || [ ! -d "$MVDIR" ]; then
    usage
fi
 
lfs find -type f $OST_PARAM "$MVDIR" | while IFS= read -r OLDNAME; do
    echo -n "$OLDNAME: "
    if [ ! -w "$OLDNAME" ]; then
        echo "No write permission, skipping"
        continue
    fi
		
    OLDCHK=$($CKSUM "$OLDNAME" | awk '{print $1}')
    if [ -z "$OLDCHK" ]; then
        echo "checksum error - exiting" 1>&2
        exit 1
    fi
 
    NEWNAME=$(mktemp "$OLDNAME.tmp.XXXXXX")
    if [ $? -ne 0 -o -z "$NEWNAME" ]; then
        echo "unable to create temp file - exiting" 1>&2
        exit 2
    fi
    
    cp -a "$OLDNAME" "$NEWNAME"
    if [ $? -ne 0 ]; then 
        echo "copy error - exiting" 1>&2
        rm -f "$NEWNAME"
        exit 4
    fi
 
    NEWCHK=$($CKSUM "$NEWNAME" | awk '{print $1}') 
    if [ -z "$NEWCHK" ]; then
        echo "'$NEWNAME' checksum error - exiting" 1>&2
        exit 6
    fi
    if [ "$OLDCHK" != "$NEWCHK" ]; then
        echo "'$NEWNAME' bad checksum - '$OLDNAME' not moved, exiting" 1>&2
        rm -f "$NEWNAME"
        exit 8
    else
        mv "$NEWNAME" "$OLDNAME"
        if [ $? -ne 0 ]; then 
            echo "rename error - exiting" 1>&2
            rm -f "$NEWNAME"
            exit 12
        fi
    fi
    echo "done"
done


27.3 Adding Multiple SCSI LUNs on Single HBA

The configuration of the kernels packaged by the Lustre group is similar to that of the upstream RedHat and SuSE packages. Currently, RHEL does not enable CONFIG_SCSI_MULTI_LUN because it can cause problems with SCSI hardware.

To enable this, set the scsi_mod max_scsi_luns=xx option (typically, xx is 128) in either modprobe.conf (2.6 kernel) or modules.conf (2.4 kernel).
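
As an illustration of the module option described above, the sketch below prints the corresponding line for modprobe.conf or modules.conf. The helper name scsi_luns_option is hypothetical, and the /etc/modprobe.conf path in the usage comment is an assumption about where the file lives on a given distribution.

```shell
# Print the scsi_mod option line for modprobe.conf (2.6) or modules.conf (2.4).
# The LUN count is the first argument and defaults to 128, the typical value.
scsi_luns_option() {
    max_luns=${1:-128}
    printf 'options scsi_mod max_scsi_luns=%s\n' "$max_luns"
}

# Show the line so it can be reviewed before it is added to the file.
scsi_luns_option 128

# Hypothetical usage, as root (the file path is an assumption):
#   scsi_luns_option 128 >> /etc/modprobe.conf
```

The setting takes effect the next time the scsi_mod module is loaded.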

To pass this option as a kernel boot argument (in grub.conf or lilo.conf), compile the kernel with CONFIG_SCSI_MULTI_LUN=y.


27.4 Failures Running a Client and OST on the Same Machine

Running a client and an OST on the same machine is inherently problematic because the two share a single memory pool. Any effort by the client to relieve memory pressure requires memory to be available to the OST. But if the client is under memory pressure, so is the OST, and the OST may be unable to get the memory it needs to help the client free memory. The result is a deadlock.

Running a client and an OST on the same machine can cause these failures:

As a result, running an OST and a client on the same machine can cause a double failure and prevent a complete recovery.


27.5 Improving Lustre Metadata Performance While Using Large Directories

To improve metadata performance while using large directories, follow these tips: