Lustre with ZFS Install

From Lustre Wiki

Introduction

This page provides information on how to install Lustre with a ZFS backend. You are encouraged to add your own version, either as a separate section or by editing this page into a general guide.


Build Lustre with ZFS (as of 13.12.2016)

The following guides are valid for CentOS 7.

Build Lustre

How to build a one-node Lustre file system with a ZFS backend:

  1. Prepare System
    1. Disable SELinux for older clients
      sed -i '/^SELINUX=/s/.*/SELINUX=disabled/' /etc/selinux/config
    2. Install the kernel development tools
      yum -y groupinstall 'Development Tools'
      yum -y install epel-release
    3. Install additional dependencies
      yum -y install xmlto asciidoc elfutils-libelf-devel zlib-devel
      yum -y install binutils-devel newt-devel python-devel hmaccalc perl-ExtUtils-Embed
      yum -y install bison elfutils-devel audit-libs-devel python-docutils sg3_utils expect
      yum -y install attr lsof quilt libselinux-devel
  2. Prepare ZFS backend (follow this guide for packaged ZFS, or go to the Build ZFS section below for a custom ZFS build)
    1. Install the ZFS release package (EPEL)
      URL='http://archive.zfsonlinux.org'
      yum -y install --nogpgcheck $URL/epel/zfs-release.el7.noarch.rpm
    2. For the newest Lustre releases change /etc/yum.repos.d/zfs.repo to switch from dkms to kmod (more info [1] and [2])
       [zfs]
       name=ZFS on Linux for EL 7 - dkms
       baseurl=http://download.zfsonlinux.org/epel/7/$basearch/
      -enabled=1
      +enabled=0
       metadata_expire=7d
       gpgcheck=1
       gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux
      @@ -9,7 +9,7 @@
       [zfs-kmod]
       name=ZFS on Linux for EL 7 - kmod
       baseurl=http://download.zfsonlinux.org/epel/7/kmod/$basearch/
      -enabled=0
      +enabled=1
       metadata_expire=7d
       gpgcheck=1
       gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-zfsonlinux
    3. Install ZFS and its associated SPL packages
      • kmod packages for newer releases
        yum install -y zfs libzfs2-devel kmod-spl-devel kmod-zfs-devel
      • dkms packages for dkms support
        yum install -y zfs libzfs2-devel zfs-dkms
  3. Build Lustre
    1. Get Lustre source code
      git clone git://git.hpdd.intel.com/fs/lustre-release.git
    2. Configure (--disable-ldiskfs for ZFS backend, --without-server for client only)
      cd lustre-release/
      sh ./autogen.sh
      ./configure --disable-ldiskfs
    3. Make and install rpms
      make rpms
      yum -y install *.$(arch).rpm
  4. You may need to reboot and to explicitly load the ZFS and Lustre modules
    reboot
    modprobe zfs
    modprobe lustre
  5. Format targets (change /tmp in this example to real devices or partitions; with real devices you may not need --device-size)
    mkfs.lustre --mgs --backfstype=zfs --fsname=lustre --device-size=1048576 lustre-mgs/mgs /tmp/lustre-mgs
    mkfs.lustre --mdt --backfstype=zfs --fsname=lustre --index=0 --mgsnode=$(hostname)@tcp --device-size=1048576 lustre-mdt0/mdt0 /tmp/lustre-mdt0
    mkfs.lustre --ost --backfstype=zfs --fsname=lustre --index=0 --mgsnode=$(hostname)@tcp --device-size=1048576 lustre-ost0/ost0 /tmp/lustre-ost0
    • Change /etc/ldev.conf (replace hostname with the name of the node)
    hostname - mgs     zfs:lustre-mgs/mgs
    hostname - mdt0    zfs:lustre-mdt0/mdt0
    hostname - ost0    zfs:lustre-ost0/ost0
  6. Run Lustre
    1. Reconfigure the firewall to allow incoming connections on TCP port 988 (for socklnd only), or temporarily disable it and fix it up later
      systemctl stop firewalld
      systemctl disable firewalld
    2. Start servers
      systemctl start lustre
    3. Mount client
      mkdir -p /mnt/lustre/client
      mount -t lustre $(hostname):/lustre /mnt/lustre/client
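
The switch from dkms to kmod in /etc/yum.repos.d/zfs.repo (step 2) can also be scripted instead of edited by hand. The following is only a sketch of one possible approach: it assumes the stock section names [zfs] and [zfs-kmod] shown in the diff above, the function name switch_zfs_repo_to_kmod is made up for illustration, and it is worth trying it on a copy of the file first.

```shell
#!/bin/sh
# Hypothetical helper: disable the [zfs] (dkms) section and enable the
# [zfs-kmod] section of a yum repo file. Assumes the section names shown
# in the diff in step 2; all other sections and keys are left untouched.
switch_zfs_repo_to_kmod() {
    repo_file="$1"
    tmp_file="$(mktemp)" || return 1
    awk '
        /^\[/ { section = $0 }    # remember the current section header
        /^enabled=/ && section == "[zfs]"      { print "enabled=0"; next }
        /^enabled=/ && section == "[zfs-kmod]" { print "enabled=1"; next }
        { print }
    ' "$repo_file" > "$tmp_file" && cat "$tmp_file" > "$repo_file"
    status=$?
    rm -f "$tmp_file"
    return "$status"
}
```

It would be called as switch_zfs_repo_to_kmod /etc/yum.repos.d/zfs.repo. The result is written back with cat rather than mv so that the ownership and permissions of the original file are kept.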

Build ZFS

How to build a custom ZFS:

  1. Prepare System
    1. Disable SELinux
      sed -i '/^SELINUX=/s/.*/SELINUX=disabled/' /etc/selinux/config
    2. Install the kernel development tools
      yum -y groupinstall 'Development Tools'
      yum -y install epel-release
    3. Install additional dependencies
      yum -y install parted lsscsi wget ksh
      yum -y install kernel-devel zlib-devel libattr-devel
      yum -y install libuuid-devel libblkid-devel libselinux-devel libudev-devel
      yum -y install device-mapper-devel
  2. Clone the Git repositories
    git clone https://github.com/zfsonlinux/spl.git
    git clone https://github.com/zfsonlinux/zfs.git
  3. Perform all the following steps for both directories (complete spl first)
    1. Configure for the specific system
      cd <spl|zfs>
      ./autogen.sh
      ./configure --with-spec=redhat
    2. Build RPMs in both directories
      • kmod
        make pkg-utils pkg-kmod
      • dkms
        make pkg-utils rpm-dkms
    3. Install RPMs
      yum localinstall *.$(arch).rpm
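
The per-directory steps above can be wrapped in a small loop that enforces the spl-before-zfs order. This is only a sketch under a few assumptions: both repositories were cloned into the current directory, the kmod targets are wanted (swap in rpm-dkms for dkms packages), and the function names and the DRY_RUN switch are illustrative rather than part of either project.

```shell
#!/bin/sh
# Illustrative sketch: build spl first, then zfs, using the commands from
# the steps above. DRY_RUN is a made-up switch: when it is set to 1 the
# commands are only printed, which helps to review them before a real build.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_component() {
    # $1 is the checkout directory (spl or zfs); the subshell keeps the
    # caller's working directory unchanged.
    (
        run cd "$1" || exit 1
        run ./autogen.sh || exit 1
        run ./configure --with-spec=redhat || exit 1
        run make pkg-utils pkg-kmod || exit 1
        # uname -m is used in place of arch; the glob is left unquoted on
        # purpose so that the shell expands it to the built packages.
        run yum localinstall *."$(uname -m)".rpm
    )
}

build_spl_then_zfs() {
    # spl has to be built and installed before zfs is configured
    build_component spl && build_component zfs
}
```

Running DRY_RUN=1 build_spl_then_zfs first prints the full command sequence without touching the system; without DRY_RUN the same function performs the build and stops at the first failing command.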

SSEC Example

Helpful links

This version applies to systems with JBODs where ZFS manages the disks directly, without a Dell RAID controller in between. This guide is specific to a single installation at UW SSEC: versions have changed, and we use Puppet to provide various software packages and configurations. It is included because some of the information may be useful to others.

  1. Lustre Server Prep Work
    1. OS Installation (RHEL6)
      1. You must use the RHEL/CentOS 6.4 kernel 2.6.32-358
      2. Use the "lustre" kickstart option which installs a 6.4 kernel
      3. Define the host in Puppet so that it is not a default host. NOTE: We use Puppet at SSEC to distribute various required packages; other environments will vary!
    2. Lustre 2.4 installation
      1. Puppet Modules Needed
        • zfs-repo
        • lustre-healthcheck
        • ib-mellanox
        • check_mk_agent-ssec
        • puppetConfigFile
        • lustre-shutdown
        • nagios_plugins
        • lustre24-server-zfs
        • selinux-disable
  1. Configure Metadata Controller
    1. Map metadata drives to enclosures (with scripts to help)
    2. For our example MDS system we made aliases for ssd0, ssd1, ssd2, and ssd3
      1. Put these in /etc/zfs/vdev_id.conf, for example:
        alias arch03e07s6 /dev/disk/by-path/pci-0000:04:00.0-sas-0x5000c50056b69199-lun-0
    3. Run udevadm trigger to load drive aliases
    4. On the metadata controller, run mkfs.lustre to create the metadata partition. On our example system:
      1. Use separate MGS for multiple filesystems on same metadata server.
      2. Separate MGS: mkfs.lustre --mgs --backfstype=zfs lustre-meta/mgs mirror d2 d3 mirror d4 d5
      3. Separate MDT: mkfs.lustre --fsname=arcdata1 --mdt --mgsnode=172.16.23.14@o2ib --backfstype=zfs lustre-meta/arcdata1-meta
      4. Create /etc/ldev.conf and add the metadata partition. On example system, we added:
        1. geoarc-2-15 - MGS zfs:lustre-meta/mgs
           geoarc-2-15 - arcdata-MDT0000 zfs:lustre-meta/arcdata-meta
      5. Create /etc/modprobe.d/lustre.conf
        1. options lnet networks="o2ib" routes="tcp metadataip@o2ib0 172.16.24.[220-229]@o2ib0"
        2. NOTE: if you do not want routing, or if you are having trouble with setup, the simpler line options lnet networks="o2ib" is sufficient
    5. Start Lustre. If you have multiple metadata mounts, you can just run service lustre start.
    6. Add the lnet service to chkconfig and ensure it is enabled on startup. We may want to leave lustre off on startup for metadata controllers.
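
Before running mkfs.lustre it can help to confirm that udev actually created the drive aliases. A minimal sketch follows; it assumes the aliases from /etc/zfs/vdev_id.conf show up as links under /dev/disk/by-vdev, and the function name check_vdev_aliases and the VDEV_DIR override are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given aliases are missing.
# VDEV_DIR defaults to /dev/disk/by-vdev, where vdev_id is assumed to place
# the alias links; override it to check a different directory.
check_vdev_aliases() {
    dir="${VDEV_DIR:-/dev/disk/by-vdev}"
    missing=0
    for alias_name in "$@"; do
        # -e follows symlinks, so a dangling link also counts as missing
        if [ ! -e "$dir/$alias_name" ]; then
            echo "missing alias: $alias_name" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every alias was found
    [ "$missing" -eq 0 ]
}
```

For the example MGS pool this could be used as check_vdev_aliases d2 d3 d4 d5 before the mkfs.lustre call, so that a mistyped or unloaded alias is caught before any pool is created.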
  1. Configure OSTs
    1. Map drives to enclosures (with scripts to help!)
    2. Run udevadm trigger to load drive aliases.
    3. Run mkfs.lustre on the MD1200s.
      1. Example RAIDZ2 on one MD1200: mkfs.lustre --fsname=cove --ost --backfstype=zfs --index=0 --mgsnode=172.16.24.12@o2ib lustre-ost0/ost0 raidz2 e17s0 e17s1 e17s2 e17s3 e17s4 e17s5 e17s6 e17s7 e17s8 e17s9 e17s10 e17s11
      2. Example RAIDZ2 with 2 disks from each enclosure, 5 enclosures (our cove test example): mkfs.lustre --fsname=cove --ost --backfstype=zfs --index=0 --mgsnode=172.16.24.12@o2ib lustre-ost0/ost0 raidz2 e13s0 e13s1 e15s0 e15s1 e17s0 e17s1 e19s0 e19s1 e21s0 e21s1
    4. Repeat as necessary for additional enclosures.
    5. Create /etc/ldev.conf
      1. Example on lustre2-8-11:
      2. lustre2-8-11 - cove-OST0000 zfs:lustre-ost0/ost0
         lustre2-8-11 - cove-OST0001 zfs:lustre-ost1/ost1
         lustre2-8-11 - cove-OST0002 zfs:lustre-ost2/ost2
    6. Start OSTs. Example: service lustre start. Repeat as necessary for additional enclosures.
    7. Add the services to chkconfig and set them up to start at boot.
  2. Configure backup metadata controller (future)
  3. Mount the Lustre file system on clients
    1. Add entry to /etc/fstab. With our example system, our fstab entry is:
      1. 172.16.24.12@o2ib:/cove /cove lustre defaults,_netdev,user_xattr 0 0
    2. Create an empty directory for the mount point, and mount the file system (e.g., mkdir /cove; mount /cove).
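
The /etc/ldev.conf entries for the OSTs in the example above follow a regular pattern, so they can be generated rather than typed. The sketch below assumes the naming used in that example (pool lustre-ostN, dataset ostN, label <fsname>-OSTnnnn with a four-digit hexadecimal index) and no failover partner; the function name gen_ost_ldev is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print ldev.conf lines for OST indexes 0..count-1.
# Assumes the pool/dataset naming from the example (lustre-ostN/ostN) and
# no failover partner (the "-" in the second column).
gen_ost_ldev() {
    host="$1"; fsname="$2"; count="$3"
    i=0
    while [ "$i" -lt "$count" ]; do
        # OST labels carry the index as four hexadecimal digits
        printf '%s - %s-OST%04x zfs:lustre-ost%d/ost%d\n' \
            "$host" "$fsname" "$i" "$i" "$i"
        i=$((i + 1))
    done
}
```

Calling gen_ost_ldev lustre2-8-11 cove 3 prints the three entries shown for lustre2-8-11; the output can be reviewed and then appended to /etc/ldev.conf.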