Lustre with ZFS Install

Introduction
This page is an attempt to provide some information on how to install Lustre with a ZFS backend. You are encouraged to add your own version, either as a separate section or by editing this page into a general guide.

Build Lustre with ZFS (state as of 2017-12-04)
The following guides are valid for CentOS 7.

Build Lustre
HowTo: build a one-node Lustre file system with a ZFS backend:

 1. Prepare the system
    1. Disable SELinux (required for older clients):

       sed -i '/^SELINUX=/s/.*/SELINUX=disabled/' /etc/selinux/config

    2. Install the kernel development tools:

       yum -y groupinstall 'Development Tools'
       yum -y install epel-release

    3. Install additional dependencies:

       yum -y install xmlto asciidoc elfutils-libelf-devel zlib-devel kernel-devel libyaml-devel
       yum -y install binutils-devel newt-devel python-devel hmaccalc perl-ExtUtils-Embed
       yum -y install bison elfutils-devel audit-libs-devel python-docutils sg3_utils expect
       yum -y install attr lsof quilt libselinux-devel

 2. Prepare the ZFS backend (the steps below use packaged ZFS; for a custom build see the "Build ZFS" section)
    1. Install the ZFS on Linux EPEL release package:

       URL='http://download.zfsonlinux.org'
       yum -y install --nogpgcheck $URL/epel/zfs-release.el7.noarch.rpm

    2. For the newest Lustre releases, edit /etc/yum.repos.d/zfs.repo to switch from the dkms to the kmod packages:

       [zfs]
       name=ZFS on Linux for EL 7 - dkms
       baseurl=http://download.zfsonlinux.org/epel/7/$basearch/
      -enabled=1
      +enabled=0
       metadata_expire=7d
       gpgcheck=1
       gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux

       [zfs-kmod]
       name=ZFS on Linux for EL 7 - kmod
       baseurl=http://download.zfsonlinux.org/epel/7/kmod/$basearch/
      -enabled=0
      +enabled=1
       metadata_expire=7d
       gpgcheck=1
       gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux

    3. Install ZFS and its associated SPL packages:
       * kmod packages (newer releases):

         yum install -y zfs libzfs2-devel kmod-spl-devel kmod-zfs-devel

       * dkms packages (for dkms support):

         yum install -y zfs libzfs2-devel zfs-dkms

 3. Build Lustre
    1. Get the Lustre source code:

       git clone git://git.hpdd.intel.com/fs/lustre-release.git

    2. Configure (--disable-ldiskfs for a ZFS-only backend; add --disable-server for a client-only build):

       cd lustre-release/
       sh ./autogen.sh
       ./configure --disable-ldiskfs

    3. Build and install the RPMs:

       make rpms
       yum -y install *.$(arch).rpm

    4. You may need to reboot and to explicitly load the ZFS and Lustre modules:

       reboot
       modprobe zfs
       modprobe lustre

    5. Format the targets (change /tmp in this example to real devices or partitions, in which case --device-size is not needed):

       mkfs.lustre --mgs --backfstype=zfs --fsname=lustre --device-size=1048576 lustre-mgs/mgs /tmp/lustre-mgs
       mkfs.lustre --mdt --backfstype=zfs --fsname=lustre --index=0 --mgsnode=$(hostname)@tcp --device-size=1048576 lustre-mdt0/mdt0 /tmp/lustre-mdt0
       mkfs.lustre --ost --backfstype=zfs --fsname=lustre --index=0 --mgsnode=$(hostname)@tcp --device-size=1048576 lustre-ost0/ost0 /tmp/lustre-ost0

    6. Edit /etc/ldev.conf (replace "hostname" with the server's host name):

       hostname - mgs    zfs:lustre-mgs/mgs
       hostname - mdt0   zfs:lustre-mdt0/mdt0
       hostname - ost0   zfs:lustre-ost0/ost0

 4. Run Lustre
    1. Reconfigure the firewall to allow incoming connections on TCP port 988 (socklnd only), or temporarily disable it and fix it up later:

       systemctl stop firewalld
       systemctl disable firewalld

    2. Start the servers:

       systemctl start lustre

    3. Mount the client:

       mkdir -p /mnt/lustre/client
       mount -t lustre $(hostname):/lustre /mnt/lustre/client
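The /etc/ldev.conf entries above can be generated for the current host with a short script. This is a sketch that writes to a local ./ldev.conf (a path chosen here so it is safe to experiment with; copy the result to /etc/ldev.conf on a real server). The pool/dataset names are the ones from the example above.

```shell
# Sketch: generate the single-node ldev.conf shown above for this host.
# Writes ./ldev.conf (not /etc/ldev.conf) so it is safe to run anywhere.
HOST=$(hostname)
cat > ./ldev.conf <<EOF
$HOST - mgs    zfs:lustre-mgs/mgs
$HOST - mdt0   zfs:lustre-mdt0/mdt0
$HOST - ost0   zfs:lustre-ost0/ost0
EOF
```

On a multi-node cluster each line would instead name the server that hosts that target.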

Build ZFS
HowTo: build a custom ZFS:

 1. Prepare the system
    1. Disable SELinux:

       sed -i '/^SELINUX=/s/.*/SELINUX=disabled/' /etc/selinux/config

    2. Install the kernel development tools:

       yum -y groupinstall 'Development Tools'
       yum -y install epel-release

    3. Install additional dependencies:

       yum -y install parted lsscsi wget ksh
       yum -y install kernel-devel zlib-devel libattr-devel
       yum -y install libuuid-devel libblkid-devel libselinux-devel libudev-devel
       yum -y install device-mapper-devel openssl-devel

 2. Clone the Git repositories:

       git clone https://github.com/zfsonlinux/spl.git
       git clone https://github.com/zfsonlinux/zfs.git

 3. Perform all the following steps in both directories (complete spl first):
    1. Configure for the specific system:

       cd <spl|zfs>
       ./autogen.sh
       ./configure --with-spec=redhat

    2. Build the RPMs:
       * kmod:

         make pkg-utils pkg-kmod

       * dkms:

         make pkg-utils rpm-dkms

    3. Install the RPMs:

       yum localinstall *.$(arch).rpm
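The per-repository steps above can be sketched as a single loop (kmod variant shown; the dkms target is noted in a comment). The function name build_zfs_repos and the skip-if-not-cloned guard are conveniences added here, not part of the upstream build system; the spl and zfs directories are the clones from the previous step.

```shell
# Sketch of the per-repository build sequence above (kmod variant).
# spl is built first, then zfs; directories not cloned yet are skipped.
build_zfs_repos() {
    for dir in spl zfs; do
        if [ ! -d "$dir" ]; then
            echo "skipping $dir: directory not found"
            continue
        fi
        (
            cd "$dir" || exit 1
            ./autogen.sh
            ./configure --with-spec=redhat
            make pkg-utils pkg-kmod   # use 'make pkg-utils rpm-dkms' for dkms builds
            yum localinstall ./*."$(arch)".rpm
        )
    done
}
build_zfs_repos
```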

Helpful links

 * http://zfsonlinux.org/lustre-configure-single.html
 * https://github.com/chaos/lustre/commit/04a38ba7 - ZFS and HA

This version applies to systems with JBODs, where ZFS manages the disks directly without a Dell RAID controller in between. This guide is very specific to a single installation at UW SSEC: versions have changed, and we use puppet to provide various software packages and configurations. However, it is included here as some of the information may be useful to others.


 * 1) Lustre Server Prep Work
 * 2) OS Installation (RHEL6)
 * 3) You must use the RHEL/CentOS 6.4 kernel 2.6.32-358
 * 4) Use the "lustre" kickstart option, which installs a 6.4 kernel
 * 5) Define the host in puppet so that it is not a default host - NOTE: we use puppet at SSEC to distribute various required packages; other environments will vary!
 * 6) Lustre 2.4 installation
 * 7) Puppet modules needed:
    * zfs-repo
    * lustre-healthcheck
    * ib-mellanox
    * check_mk_agent-ssec
    * puppetConfigFile
    * lustre-shutdown
    * nagios_plugins
    * lustre24-server-zfs
    * selinux-disable
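Since the guide pins a specific kernel (step 3), a quick guard can catch a mismatched host before any Lustre packages go on. This is a sketch added here for convenience; the helper name kernel_ok is hypothetical, and the required version string is the one from the list above.

```shell
# Sketch: warn if the running kernel is not the required 2.6.32-358 series.
required_kernel='2.6.32-358'
kernel_ok() {
    case "$(uname -r)" in
        "$required_kernel"*) return 0 ;;
        *)                   return 1 ;;
    esac
}
if kernel_ok; then
    echo "kernel $(uname -r) matches $required_kernel"
else
    echo "WARNING: kernel $(uname -r) does not match required $required_kernel"
fi
```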
 * 1) Configure Metadata Controller
 * 2) Map metadata drives to enclosures (with scripts to help)
 * 3) For our example MDS system we made aliases ssd0, ssd1, ssd2, and ssd3
 * 4) Put these in /etc/zfs/vdev_id.conf - for example:
 * 5) alias arch03e07s6 /dev/disk/by-path/pci-0000:04:00.0-sas-0x5000c50056b69199-lun-0
 * 6) Run udevadm trigger to load the drive aliases
 * 7) On metadata controller, run mkfs.lustre to create metadata partition. On our example system:
 * 8) Use a separate MGS when hosting multiple file systems on the same metadata server.
 * 9) Separate MGS: mkfs.lustre --mgs --backfstype=zfs lustre-meta/mgs mirror d2 d3 mirror d4 d5
 * 10) Separate MDT: mkfs.lustre --fsname=arcdata1 --mdt --mgsnode=172.16.23.14@o2ib --backfstype=zfs lustre-meta/arcdata1-meta
 * 11) Create /etc/ldev.conf and add the metadata partition. On example system, we added:
 * 12) geoarc-2-15 - MGS             zfs:lustre-meta/mgs
       geoarc-2-15 - arcdata-MDT0000 zfs:lustre-meta/arcdata-meta
 * 13) Create /etc/modprobe.d/lustre.conf
 * 14) options lnet networks="o2ib" routes="tcp metadataip@o2ib0 172.16.24.[220-229]@o2ib0"
 * 15) NOTE: if you do not want routing, or if you are having trouble with setup, the simpler 'options lnet networks="o2ib"' is fine
 * 16) Start Lustre. If you have multiple metadata mounts, you can just run service lustre start.
 * 17) Add the lnet service to chkconfig and ensure it starts on boot. We may want to leave lustre off on startup for metadata controllers.
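The vdev_id.conf aliasing in steps 4-6 above can be scripted once the name/by-path pairs are known. This sketch writes to a local ./vdev_id.conf (a path chosen here so the snippet is safe to run; the real file is /etc/zfs/vdev_id.conf, followed by udevadm trigger). The helper name emit_vdev_aliases is hypothetical, and the single entry is the example alias from step 5.

```shell
# Sketch: build vdev_id.conf "alias" lines from "name path" pairs on stdin.
# Writes ./vdev_id.conf locally; install to /etc/zfs/vdev_id.conf on the server.
emit_vdev_aliases() {
    while read -r name path; do
        # skip blank lines; emit one alias directive per pair
        [ -n "$name" ] && printf 'alias %s %s\n' "$name" "$path"
    done
}
emit_vdev_aliases > ./vdev_id.conf <<'EOF'
arch03e07s6 /dev/disk/by-path/pci-0000:04:00.0-sas-0x5000c50056b69199-lun-0
EOF
```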


 * 1) Configure OSTs
 * 2) Map drives to enclosures (with scripts to help!)
 * 3) Run udevadm trigger to load drive aliases.
 * 4) mkfs.lustre on MD1200s.
 * 5) Example RAIDZ2 on one MD1200: mkfs.lustre --fsname=cove --ost --backfstype=zfs --index=0 --mgsnode=172.16.24.12@o2ib lustre-ost0/ost0 raidz2 e17s0 e17s1 e17s2 e17s3 e17s4 e17s5 e17s6 e17s7 e17s8 e17s9 e17s10 e17s11
 * 6) Example RAIDZ2 with 2 disks from each enclosure, 5 enclosures (our cove test example): mkfs.lustre --fsname=cove --ost --backfstype=zfs --index=0 --mgsnode=172.16.24.12@o2ib lustre-ost0/ost0 raidz2 e13s0 e13s1 e15s0 e15s1 e17s0 e17s1 e19s0 e19s1 e21s0 e21s1
 * 7) Repeat as necessary for additional enclosures.
 * 8) Create /etc/ldev.conf
 * 9) Example on lustre2-8-11:
 * 10) lustre2-8-11 - cove-OST0000    zfs:lustre-ost0/ost0
       lustre2-8-11 - cove-OST0001    zfs:lustre-ost1/ost1
       lustre2-8-11 - cove-OST0002    zfs:lustre-ost2/ost2
 * 11) Start OSTs. Example: service lustre start. Repeat as necessary for additional enclosures.
 * 12) Add the services to chkconfig so they start on boot.
 * 13) Configure backup metadata controller (future)
 * 14) Mount the Lustre file system on clients
 * 15) Add entry to /etc/fstab. With our example system, our fstab entry is:
 * 16) 172.16.24.12@o2ib:/cove        /cove            lustre  defaults,_netdev,user_xattr       0 0
 * 17) Create an empty directory for the mount point, then mount the file system (e.g., mkdir /cove; mount /cove).
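The client-side steps 15-17 can be sketched as a small idempotent script that only appends the fstab entry if it is not already present. To keep the sketch safe to run, FSTAB points at a local ./fstab copy here (an assumption of this example); on a real client set it to /etc/fstab and follow with mkdir /cove; mount /cove. The NID and fsname are the example values from above.

```shell
# Sketch: idempotently add the example Lustre entry to an fstab file.
# FSTAB is a local copy here; use /etc/fstab on a real client.
FSTAB=./fstab
ENTRY='172.16.24.12@o2ib:/cove /cove lustre defaults,_netdev,user_xattr 0 0'
touch "$FSTAB"
# append only if no entry for this filesystem exists yet
grep -q '172\.16\.24\.12@o2ib:/cove' "$FSTAB" || echo "$ENTRY" >> "$FSTAB"
```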