CHAPTER 15

Backup and Restore

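Lustre supports backups at the file system level, the device level and the file level. This chapter describes how to back up and restore Lustre, and includes the following sections: Backing up a File System, Backing up a Device (MDS or OST), Backing up Files, Restoring from a File-level Backup, and Using LVM Snapshots with Lustre.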


15.1 Backing up a File System

Backing up a complete file system gives you full control over the files to back up, and allows restoration of individual files as needed. File system-level backups are also the easiest to integrate into existing backup solutions.

File system backups are performed from a Lustre client (or many clients working in parallel in different directories) rather than on individual server nodes; this is no different from backing up any other file system.

However, due to the large size of most Lustre file systems, a complete backup is not always possible. We recommend that you back up subsets of the file system instead, such as individual subdirectories, the files belonging to a single user, files changed since a given date (incremental backups), and so on.


15.2 Backing up a Device (MDS or OST)

In some cases, it is useful to do a full, device-level backup of an individual device (MDS or OST) before replacing hardware, performing maintenance, and so on. A full device-level backup preserves all of the data in its original state and is the easiest backup method.



Note - A device-level backup of the MDS is especially important because, if the MDS fails permanently, the entire file system must be restored.


If hardware replacement is the reason for the backup or if a spare storage device is available, it is possible to do a raw copy of the MDS or OST from one block device to the other, as long as the new device is at least as large as the original device. To do this, run:

dd if=/dev/{original} of=/dev/{new} bs=1M

If hardware errors cause read problems on the original device, use the command below to allow as much data as possible to be read from the original device while skipping sections of the disk with errors:

dd if=/dev/{original} of=/dev/{new} bs=4k conv=sync,noerror count={original size in 4kB blocks}
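Because count must be given in 4 kB blocks, it can help to compute it from the device size rather than by hand. The sketch below uses hypothetical helper names (rescue_count and rescue_cmd are not Lustre tools); it rounds the byte size up to whole 4 kB blocks so a trailing partial block is not skipped, and it only prints the dd command so you can review it before running it. On a real system the byte size would typically come from blockdev --getsize64 /dev/{original}.

```shell
#!/bin/sh
# Hypothetical helpers for the error-tolerant dd copy described above.

# rescue_count BYTES: number of 4 kB blocks needed to cover BYTES,
# rounded up so that a trailing partial block is still copied.
rescue_count() {
    echo $(( ($1 + 4095) / 4096 ))
}

# rescue_cmd ORIGINAL NEW BYTES: print (do not run) the dd command.
rescue_cmd() {
    printf 'dd if=%s of=%s bs=4k conv=sync,noerror count=%s\n' \
        "$1" "$2" "$(rescue_count "$3")"
}
```

For example, rescue_cmd /dev/sdb /dev/sdc "$(blockdev --getsize64 /dev/sdb)" prints the command for copying the (hypothetical) device /dev/sdb to /dev/sdc.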

Even after hardware errors, the ext3 file system is robust enough that it may be possible to recover the file system data by running e2fsck -f on the new device.

15.2.1 Backing Up the MDS

This procedure provides another way to back up the MDS: a file-level backup made by mounting the MDS device directly as ldiskfs.

1. Make a mount point for the file system. Run:

mkdir -p /mnt/mds

2. Mount the file system. Run:

mount -t ldiskfs {mdsdev} /mnt/mds

3. Change to the mount point being backed up. Run:

cd /mnt/mds

4. Back up the extended attributes (EAs). Run:

getfattr -R -d -m '.*' -P . > ea.bak


Note - In most distributions, the getfattr command is part of the "attr" package. If the getfattr command returns errors like Operation not supported, then the kernel does not correctly support EAs. Stop and use a different backup method or contact us for assistance.


5. Verify that the ea.bak file has properly backed up the EA data on the MDS. Without this EA data, the backup is not useful. Look at this file with "more" or a text editor. For each file, it should have an item similar to this:

# file: ROOT/mds_md5sum3.txt
trusted.lov=0s0AvRCwEAAABXoKUCAAAAAAAAAAAAAAAAAAAQAAEAAADD5QoAAAAAAAAAAAAAAAAAAAAAAAEAAAA=

6. Back up all file system data. Run:

tar czvf {backup file}.tgz --sparse .


Note - In Lustre 1.6.7 and later, the --sparse option reduces the size of the backup file. Be sure to use it so the tar command does not mistakenly create an archive full of zeros.


7. Change directory out of the mounted file system. Run:

cd -

8. Unmount the file system. Run:

umount /mnt/mds
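The steps above can be collected into one shell function. This is a minimal sketch, not part of Lustre: the names backup_target and run and the DRY_RUN switch are hypothetical, paths must not contain spaces, and the manual check in step 5 is left to you. Pass an absolute archive path outside the mount point; a relative name would create the archive inside the file system being archived. Commands are passed as strings and evaluated so that the redirection in step 4 and the cd in step 3 behave as in the manual; with DRY_RUN=1 they are only printed.

```shell
#!/bin/sh
# run CMD: print CMD when DRY_RUN=1, otherwise evaluate it in the
# current shell (so "cd" affects the following commands).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$1"
    else
        eval "$1"
    fi
}

# backup_target DEVICE MOUNTPOINT ARCHIVE
# Steps 1-4 and 6-8 above; stops at the first failing step.
# Step 5 (inspecting ea.bak) is not automated here.
backup_target() {
    dev=$1 mnt=$2 archive=$3
    run "mkdir -p $mnt" &&
    run "mount -t ldiskfs $dev $mnt" &&
    run "cd $mnt" &&
    run "getfattr -R -d -m '.*' -P . > ea.bak" &&
    run "tar czvf $archive --sparse ." &&
    run "cd -" &&
    run "umount $mnt"
}
```

For example, DRY_RUN=1 backup_target {mdsdev} /mnt/mds /backup/mds.tgz lists the commands (the /backup path is hypothetical); drop DRY_RUN=1 to run them as root while the target is not mounted as Lustre.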


Note - When restoring an MDT backup on a different node as part of an MDT migration, you also have to change the server NIDs and use the --writeconf option of tunefs.lustre to regenerate the configuration logs. See Changing a Server NID and Regenerating Lustre Configuration Logs.


15.2.2 Backing Up an OST

Follow the same procedure as in Backing Up the MDS, except skip Step 5, and for each OST device file system replace mds with ost in the commands.


15.3 Backing up Files

In other cases, it is desirable to back up only the file data on an MDS or OST instead of the entire device: for example, if the device is very large but holds little data, if parameters of the ext3 file system need to be changed, or if the backup should use less space.

In this situation, you can mount the ext3 file system directly from the storage device and do a file-level backup. Lustre MUST be stopped on this node first.

15.3.1 Backing up Extended Attributes

In Lustre, each OST object has an extended attribute (EA) that contains the MDT inode number and stripe index for the object, and each file on the MDT has an EA that holds its striping information, including the location of the file data on the OSTs and its OST pool membership. The EA data must be backed up or the file backup will not be useful. Current backup tools do not properly save the EA data, so the following extra steps are required.

1. Make a mount point for the file system.

mkdir /mnt/mds

2. Mount the file system.

mount -t ldiskfs {olddev} /mnt/mds

3. Change to the mount point being backed up.

cd /mnt/mds

4. Back up the extended attributes.

getfattr -R -d -m '.*' -P . > ea.bak

In most distributions, the getfattr command is part of the "attr" package. If the getfattr command returns errors like "Operation not supported", then your kernel does not support EAs correctly. Stop and use a different backup method or submit a Bugzilla ticket.

5. Verify that the ea.bak file has properly backed up the EA data on the MDS. Look at this file with "more" or a text editor. For each file, it should have an item similar to this:

# file: ROOT/mds_md5sum3.txt 
trusted.lov=0s0AvRCwEAAABXoKUCAAAAAAAAAAAAAAAAAAAQAAEAAADD5QoAAAAAAAAAAAAAAAAAAAAAAAEAAAA=

6. Back up all file system data.

tar czvf {backup file}.tgz --sparse .

7. Change out of the mounted file system.

cd -

8. Unmount the file system.

umount /mnt/mds

9. Print the file system label and write it down.

e2label {olddev}

Follow the same process on each MDS and OST file system.
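Checking ea.bak by eye does not scale to large file systems. A minimal sketch, assuming the getfattr dump format shown in step 5, summarizes the dump and lists entries that have no trusted.lov attribute. The function names are hypothetical. On an MDT, directories and some special files may legitimately lack trusted.lov, and OST objects do not carry it at all, so treat the list as something to review rather than as a list of failures.

```shell
#!/bin/sh
# check_ea_backup FILE: print "<entries> <with_lov>", the number of
# "# file:" entries in a getfattr dump and how many trusted.lov lines
# it contains (at most one per entry).
check_ea_backup() {
    awk '
        /^# file: /      { entries++ }
        /^trusted\.lov=/ { lov++ }
        END { printf "%d %d\n", entries, lov }
    ' "$1"
}

# missing_lov FILE: print the name of every entry without trusted.lov.
missing_lov() {
    awk '
        function report() { if (name != "" && !has) print name }
        /^# file: /      { report(); name = substr($0, 9); has = 0; next }
        /^trusted\.lov=/ { has = 1 }
        END              { report() }
    ' "$1"
}
```

For example, run missing_lov ea.bak from the mount point after step 5, while the file system is still mounted.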


15.4 Restoring from a File-level Backup

To restore data from a file-level backup, you need to format the device, restore the file data and then restore the EA data.

1. Format the new device. Run:

mkfs.lustre {--mdt|--ost} {other options} {newdev}

2. Mount the file system. Run:

mount -t ldiskfs {newdev} /mnt/mds

3. Change to the new file system mount point. Run:

cd /mnt/mds

4. Restore the file system backup. Run:

tar xzvpf {backup file} --sparse

5. Restore the file system extended attributes. Run:

setfattr --restore=ea.bak

6. Verify that the extended attributes were restored. If they were not restored correctly, all data in the files is lost, and every file in the file system will appear to have zero length.

getfattr -d -m ".*" ROOT/mds_md5sum3.txt
trusted.lov=0s0AvRCwEAAABXoKUCAAAAAAAAAAAAAAAAAAAQAAEAAADD5QoAAAAAAAAAAAAAAAAAAAAAAAEAAAA=

7. Remove the (now invalid) recovery logs. Run:

rm OBJECTS/* CATALOGS

8. Change out of the MDS file system.

cd -

9. Unmount the MDS file system.

umount /mnt/mds
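Step 6 checks a single file by eye. A minimal sketch, assuming ea.bak was restored into the mount point by step 4 and that you run it as root from that mount point before unmounting, compares the restored trusted.lov value of a file with the value recorded in ea.bak. Both values come from getfattr -d with its default encoding, so matching attributes should print identically. The function names are hypothetical.

```shell
#!/bin/sh
# expected_lov EA_BACKUP PATH: print the trusted.lov line recorded in
# the dump for PATH (relative to the mount point, e.g. ROOT/file).
expected_lov() {
    awk -v want="# file: $2" '
        $0 == want                { found = 1; next }
        /^# file: /               { found = 0 }
        found && /^trusted\.lov=/ { print; exit }
    ' "$1"
}

# verify_lov EA_BACKUP PATH: compare the live attribute with the dump.
# Prints "OK PATH" on a match; otherwise reports to stderr, returns 1.
verify_lov() {
    want=$(expected_lov "$1" "$2")
    have=$(getfattr -d -m '^trusted\.lov$' "$2" 2>/dev/null |
           grep '^trusted\.lov=')
    if [ -n "$want" ] && [ "$want" = "$have" ]; then
        echo "OK $2"
    else
        echo "MISMATCH $2" >&2
        return 1
    fi
}
```

For example, verify_lov ea.bak ROOT/mds_md5sum3.txt performs the same check as step 6 for that file.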

If the file system was used between the time the backup was made and when it was restored, you can run the lfsck tool (part of Lustre e2fsprogs) to ensure that the file system is coherent. This is not necessary if all of the device file systems were backed up at the same time, after the entire Lustre file system was stopped. The file system should be usable immediately even if lfsck is not run, but reading files that are present on the MDS but not on the OSTs will return I/O errors, and files created after the MDS backup will not be visible or accessible.


15.5 Using LVM Snapshots with Lustre

If you want to perform disk-based backups (because, for example, access to the backup system needs to be as fast as to the primary Lustre file system), you can use the Linux LVM snapshot tool to maintain multiple, incremental file system backups.

Because LVM snapshots cost CPU cycles as new files are written, taking snapshots of the main Lustre file system will probably cause unacceptable performance losses. Instead, create a new backup Lustre file system and periodically (e.g., nightly) back up new and changed files to it. Periodic snapshots of this backup file system then provide a series of "full" backups.



Note - Creating an LVM snapshot is not as reliable as making a separate backup, because the LVM snapshot shares the same disks as the primary MDT device, and depends on the primary MDT device for much of its data. If the primary MDT device becomes corrupted, this may result in the snapshot being corrupted.


15.5.1 Creating an LVM-based Backup File System

Use this procedure to create a backup Lustre file system for use with the LVM snapshot mechanism.

1. Create LVM volumes for the MDT and OSTs.

Create LVM devices for your MDT and OST targets. Do not use the entire disk for the targets; save some room for the snapshots. The snapshots start out at zero size, but grow as you make changes to the current file system. If you expect to change 20% of the file system between backups, the most recent snapshot will be 20% of the target size, the next older one 40%, and so on. Here is an example:

cfs21:~# pvcreate /dev/sda1
	Physical volume "/dev/sda1" successfully created
cfs21:~# vgcreate volgroup /dev/sda1
	Volume group "volgroup" successfully created
cfs21:~# lvcreate -L200M -nMDT volgroup
	Logical volume "MDT" created
cfs21:~# lvcreate -L200M -nOST0 volgroup
	Logical volume "OST0" created
cfs21:~# lvscan
	ACTIVE				'/dev/volgroup/MDT' [200.00 MB] inherit
	ACTIVE				'/dev/volgroup/OST0' [200.00 MB] inherit

2. Format the LVM volumes as Lustre targets.

In this example, the backup file system is called “main” and designates the current, most up-to-date backup.

cfs21:~# mkfs.lustre --mdt --fsname=main /dev/volgroup/MDT
 No management node specified, adding MGS to this MDT.
    Permanent disk data:
 Target:     main-MDTffff
 Index:      unassigned
 Lustre FS:  main
 Mount type: ldiskfs
 Flags:      0x75
               (MDT MGS needs_index first_time update )
 Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
 Parameters:
checking for existing Lustre data
 device size = 200MB
 formatting backing filesystem ldiskfs on /dev/volgroup/MDT
         target name  main-MDTffff
         4k blocks     0
         options -i 4096 -I 512 -q -O dir_index -F
 mkfs_cmd = mkfs.ext2 -j -b 4096 -L main-MDTffff  -i 4096 -I 512 -q -O dir_index -F /dev/volgroup/MDT
 Writing CONFIGS/mountdata
cfs21:~# mkfs.lustre --ost --mgsnode=cfs21 --fsname=main /dev/volgroup/OST0
    Permanent disk data:
 Target:     main-OSTffff
 Index:      unassigned
 Lustre FS:  main
 Mount type: ldiskfs
 Flags:      0x72
               (OST needs_index first_time update )
 Persistent mount opts: errors=remount-ro,extents,mballoc
 Parameters: mgsnode=192.168.0.21@tcp
checking for existing Lustre data
 device size = 200MB
 formatting backing filesystem ldiskfs on /dev/volgroup/OST0
         target name  main-OSTffff
         4k blocks     0
         options -I 256 -q -O dir_index -F
 mkfs_cmd = mkfs.ext2 -j -b 4096 -L main-OSTffff  -I 256 -q -O dir_index -F /dev/volgroup/OST0
 Writing CONFIGS/mountdata
cfs21:~# mount -t lustre /dev/volgroup/MDT /mnt/mdt
cfs21:~# mount -t lustre /dev/volgroup/OST0 /mnt/ost
cfs21:~# mount -t lustre cfs21:/main /mnt/main

15.5.2 Backing up New/Changed Files to the Backup File System

At periodic intervals (e.g., nightly), back up new and changed files to the LVM-based backup file system.

cfs21:~# cp /etc/passwd /mnt/main
cfs21:~# cp /etc/fstab /mnt/main
cfs21:~# ls /mnt/main
fstab  passwd
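The example copies files by hand. For a nightly job, a minimal sketch copies only files modified since the previous run, using a timestamp file with find -newer. It assumes absolute paths, file names without newlines, and that deletions on the main file system need not be mirrored; the function name is hypothetical, and rsync is a common alternative.

```shell
#!/bin/sh
# incremental_copy SRC DST STAMP
# Copy regular files under SRC changed since STAMP was last refreshed
# (all files on the first run) into DST, keeping relative paths, then
# refresh STAMP. SRC, DST and STAMP must be absolute paths.
incremental_copy() {
    src=$1 dst=$2 stamp=$3
    # Mark the start of this run before scanning, so files modified
    # during the copy are picked up again next time.
    : > "$stamp.new" || return 1
    (
        cd "$src" || exit 1
        if [ -e "$stamp" ]; then
            find . -type f -newer "$stamp"
        else
            find . -type f
        fi | while IFS= read -r f; do
            mkdir -p "$dst/${f%/*}" && cp -p "$f" "$dst/$f" || exit 1
        done
    ) || return 1
    mv "$stamp.new" "$stamp"
}
```

For example, a cron job could run incremental_copy /mnt/lustre /mnt/main /var/lib/lustre-backup/stamp, where /mnt/lustre (the main file system) and the stamp location are hypothetical.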

15.5.3 Creating Snapshot Volumes

Whenever you want to make a "checkpoint" of the main Lustre file system, create LVM snapshots of all target MDT and OSTs in the LVM-based backup file system. You must decide the maximum size of a snapshot ahead of time, although you can change it dynamically later. The size of a daily snapshot depends on the amount of data changed daily in the main Lustre file system. A two-day-old snapshot is likely to be twice as big as a one-day-old snapshot.

You can create as many snapshots as you have room for in the volume group. If necessary, you can dynamically add disks to the volume group.
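The growth rule above can be turned into a rough capacity estimate. Assuming a constant fraction of the target changes per interval, the k-th newest snapshot holds about k times that fraction of the target, capped at the full target size. The helper below (a hypothetical name) sums this in integer megabytes and percent; it ignores LVM metadata overhead and rounding up to whole physical extents, so treat the result as an estimate.

```shell
#!/bin/sh
# snapshot_space_mb TARGET_MB CHANGE_PCT KEEP
# Approximate space (MB) needed to keep KEEP snapshots of one target
# when about CHANGE_PCT percent of it changes between snapshots.
snapshot_space_mb() {
    target=$1 pct=$2 keep=$3 total=0 k=1
    while [ "$k" -le "$keep" ]; do
        share=$(( k * pct ))          # k-th newest holds ~k*pct percent
        if [ "$share" -gt 100 ]; then
            share=100                 # never more than the whole target
        fi
        total=$(( total + target * share / 100 ))
        k=$(( k + 1 ))
    done
    echo "$total"
}
```

For the 200 MB targets in the earlier example with 20% change per day, snapshot_space_mb 200 20 3 prints 240 (40 + 80 + 120), the approximate space per target for three daily snapshots.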

The snapshots of the target MDT and OSTs should be taken at the same point in time. Make sure that the cronjob updating the backup file system is not running, since that is the only thing writing to the disks. Here is an example:

cfs21:~# modprobe dm-snapshot
cfs21:~# lvcreate -L50M -s -n MDTb1 /dev/volgroup/MDT
   Rounding up size to full physical extent 52.00 MB
   Logical volume "MDTb1" created
cfs21:~# lvcreate -L50M -s -n OSTb1 /dev/volgroup/OST0
   Rounding up size to full physical extent 52.00 MB
   Logical volume "OSTb1" created
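With the update job stopped, the snapshots can be created in one pass. The lvcreate calls run one after another rather than atomically; they form a consistent set only because nothing writes to the backup file system in between, as the text above requires. This is a minimal sketch: the function name, the DRY_RUN switch and the naming scheme (LV name, then b and a suffix, so OST0 becomes OST0b1 rather than the OSTb1 used in the example) are assumptions.

```shell
#!/bin/sh
# snapshot_all SUFFIX SIZE LV...
# Snapshot each logical volume back to back as <LV name>b<SUFFIX>.
# Assumes the job that updates the backup file system is stopped.
# With DRY_RUN=1, only print the lvcreate commands.
snapshot_all() {
    suffix=$1 size=$2
    shift 2
    if [ "${DRY_RUN:-0}" != 1 ]; then
        modprobe dm-snapshot || return 1
    fi
    for lv in "$@"; do
        name="${lv##*/}b$suffix"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "lvcreate -L$size -s -n $name $lv"
        else
            lvcreate -L"$size" -s -n "$name" "$lv" || return 1
        fi
    done
}
```

For example, snapshot_all 1 50M /dev/volgroup/MDT /dev/volgroup/OST0 snapshots both targets from the example with 50 MB each.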

After the snapshots are taken, you can continue to back up new/changed files to "main". The snapshots will not contain the new files.

cfs21:~# cp /etc/termcap /mnt/main
cfs21:~# ls /mnt/main
fstab  passwd  termcap

15.5.4 Restoring the File System From a Snapshot

Use this procedure to restore the file system from an LVM snapshot.

1. Rename the LVM snapshot.

Rename the file system snapshot from "main" to "back" so you can mount it without unmounting "main". This is recommended, but not required. Use the --reformat flag to tunefs.lustre to force the name change. For example:

cfs21:~# tunefs.lustre --reformat --fsname=back --writeconf /dev/volgroup/MDTb1
 checking for existing Lustre data
 found Lustre data
 Reading CONFIGS/mountdata
Read previous values:
 Target:     main-MDT0000
 Index:      0
 Lustre FS:  main
 Mount type: ldiskfs
 Flags:      0x5
              (MDT MGS )
 Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
 Parameters:
Permanent disk data:
 Target:     back-MDT0000
 Index:      0
 Lustre FS:  back
 Mount type: ldiskfs
 Flags:      0x105
              (MDT MGS writeconf )
 Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
 Parameters:
Writing CONFIGS/mountdata
cfs21:~# tunefs.lustre --reformat --fsname=back --writeconf /dev/volgroup/OSTb1
 checking for existing Lustre data
 found Lustre data
 Reading CONFIGS/mountdata
Read previous values:
 Target:     main-OST0000
 Index:      0
 Lustre FS:  main
 Mount type: ldiskfs
 Flags:      0x2
              (OST )
 Persistent mount opts: errors=remount-ro,extents,mballoc
 Parameters: mgsnode=192.168.0.21@tcp
Permanent disk data:
 Target:     back-OST0000
 Index:      0
 Lustre FS:  back
 Mount type: ldiskfs
 Flags:      0x102
              (OST writeconf )
 Persistent mount opts: errors=remount-ro,extents,mballoc
 Parameters: mgsnode=192.168.0.21@tcp
Writing CONFIGS/mountdata
When renaming a file system, you must also erase the last_rcvd file from the snapshots. For example:
cfs21:~# mount -t ldiskfs /dev/volgroup/MDTb1 /mnt/mdtback
cfs21:~# rm /mnt/mdtback/last_rcvd
cfs21:~# umount /mnt/mdtback
cfs21:~# mount -t ldiskfs /dev/volgroup/OSTb1 /mnt/ostback
cfs21:~# rm /mnt/ostback/last_rcvd
cfs21:~# umount /mnt/ostback

2. Mount the file system from the LVM snapshot.

For example:

cfs21:~# mount -t lustre /dev/volgroup/MDTb1 /mnt/mdtback
cfs21:~# mount -t lustre /dev/volgroup/OSTb1 /mnt/ostback
cfs21:~# mount -t lustre cfs21:/back /mnt/back

3. Note the old directory contents, as of the snapshot time.

For example:

cfs21:~/cfs/b1_5/lustre/utils# ls /mnt/back
fstab  passwd
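Step 1 repeats the same commands for every snapshot device. A minimal sketch, with the hypothetical helpers rename_snapshot and run and a DRY_RUN switch, applies the rename and the last_rcvd removal to one device; it also creates the mount point, which the example assumes already exists.

```shell
#!/bin/sh
# run CMD ARGS...: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# rename_snapshot NEWNAME DEVICE MOUNTPOINT
# Rename the file system on one snapshot device, then delete its
# last_rcvd file. Stops at the first failing command.
rename_snapshot() {
    run tunefs.lustre --reformat --fsname="$1" --writeconf "$2" &&
    run mkdir -p "$3" &&
    run mount -t ldiskfs "$2" "$3" &&
    run rm "$3/last_rcvd" &&
    run umount "$3"
}
```

Call it once per snapshot, for example rename_snapshot back /dev/volgroup/MDTb1 /mnt/mdtback and rename_snapshot back /dev/volgroup/OSTb1 /mnt/ostback, before mounting as in step 2.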

15.5.5 Deleting Old Snapshots

To reclaim disk space, you can erase old snapshots as your backup policy dictates. Run:

lvremove /dev/volgroup/MDTb1
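A backup policy such as "keep the last seven checkpoints" can be scripted. A minimal sketch, assuming snapshot names are passed oldest first (for example MDTb1, MDTb2, MDTb3 in creation order); the function name is hypothetical, and it only prints candidates, leaving the lvremove to you. Because the MDT and OST snapshots of one checkpoint form a consistent set, remove them together.

```shell
#!/bin/sh
# snapshots_to_remove KEEP NAME...
# Given snapshot names ordered oldest first, print every name except
# the newest KEEP, i.e. the candidates for lvremove.
snapshots_to_remove() {
    keep=$1
    shift
    drop=$(( $# - keep ))
    while [ "$drop" -gt 0 ]; do
        echo "$1"
        shift
        drop=$(( drop - 1 ))
    done
}
```

For example, snapshots_to_remove 2 MDTb1 MDTb2 MDTb3 prints MDTb1, which you would then remove with lvremove /dev/volgroup/MDTb1.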

15.5.6 Changing Snapshot Volume Size

You can also extend or shrink snapshot volumes if you find your daily deltas are smaller or larger than expected. Run:

lvextend -L10G /dev/volgroup/MDTb1
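To decide when a snapshot needs more room, compare its reported usage with a threshold. A minimal sketch; the function name and the 80% threshold below are assumptions, and the lvs field name snap_percent should be checked against lvs -o help on your system, since newer LVM2 releases may report it as data_percent. Note that lvextend -L+SIZE grows a volume by SIZE, whereas -L10G above sets an absolute size.

```shell
#!/bin/sh
# needs_extend USED_PCT THRESHOLD_PCT
# Succeed (exit status 0) when a snapshot's reported usage, which may
# be a decimal such as 85.20, is at or above the threshold.
needs_extend() {
    awk -v used="$1" -v limit="$2" \
        'BEGIN { exit !(used + 0 >= limit + 0) }'
}
```

For example: used=$(lvs --noheadings -o snap_percent volgroup/MDTb1 | tr -d ' ') followed by needs_extend "$used" 80 && lvextend -L+1G /dev/volgroup/MDTb1.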


Note - Extending snapshots may not work in older LVM versions. It is known to work in LVM v2.02.01.