The Lustre target filesystem ldiskfs (based on ext4) offers no guarantee of data integrity: it does not checksum file data, so silent corruption on the underlying storage goes undetected. To improve the reliability and resilience of the backing filesystem on the OST and MDT components, Lustre added support for using ZFS as the backing filesystem for both OSTs and MDTs.
Lustre ZFS targets offer a number of advantages, such as improved data integrity through transaction-based, copy-on-write operations, snapshots, and persistent checksums of data and metadata on every write.
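Copy-on-write means that ZFS never overwrites existing data in place. Changed information is written to a new block, and the block pointer that referenced the old data is moved to the new block only after the write transaction has completed. Because each parent block holds the pointers to its children, updating a pointer also rewrites the parent to a new block, and this repeats all the way up the block tree to the top block (the uberblock).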
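The update order described above can be illustrated with a toy Python model. It is a sketch under simplifying assumptions, not ZFS's on-disk format: the "disk" is an append-only list, a two-level tree stands in for the full block tree, and the names `CowStore`, `update`, and the `commit` flag are hypothetical. It shows that old blocks remain readable after an update (the basis of cheap snapshots) and that a crash before the top pointer moves leaves the previous tree intact.

```python
from typing import Optional


class CowStore:
    """Toy copy-on-write block store (illustrative only, not ZFS's on-disk format).

    The "disk" is an append-only list: once written, a block is never modified.
    A single mutable pointer, self.root, plays the role of the top block.
    """

    def __init__(self, leaves: list) -> None:
        self.disk: list = []
        ptrs = tuple(self._write(leaf) for leaf in leaves)
        self.root = self._write(ptrs)  # parent block holding the leaf addresses

    def _write(self, block) -> int:
        # Always allocate a fresh address; existing blocks are never overwritten.
        self.disk.append(block)
        return len(self.disk) - 1

    def read(self, i: int, root: Optional[int] = None) -> bytes:
        top = self.root if root is None else root
        return self.disk[self.disk[top][i]]

    def update(self, i: int, data: bytes, commit: bool = True) -> None:
        new_leaf = self._write(data)         # 1. changed data goes to a new block
        ptrs = list(self.disk[self.root])
        ptrs[i] = new_leaf
        new_root = self._write(tuple(ptrs))  # 2. new parent with the updated pointer
        if commit:                           # 3. move the top pointer last
            self.root = new_root             #    (commit=False simulates a crash here)


store = CowStore([b"alpha", b"beta"])
before = store.root
store.update(1, b"BETA")
print(store.read(1))                     # b'BETA'
print(store.read(1, root=before))        # b'beta': the old tree is untouched
store.update(0, b"ALPHA", commit=False)  # "crash" before the commit step
print(store.read(0))                     # b'alpha': live tree is still consistent
```

Because only the final pointer switch makes an update visible, the live tree is always either entirely old or entirely new.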
To detect data corruption, ZFS computes checksums of all data and metadata in the filesystem. The checksum is not stored with the data block, but in the parent's pointer to that block, so a block cannot vouch for itself. All checksums are computed and verified in server memory, so ZFS detects errors that other filesystems miss, such as:
- Phantom writes, where the disk acknowledges a write but silently drops it.
- Misdirected reads or writes, where the disk accesses the wrong block.
- DMA parity errors between the array and server memory, or errors introduced by the driver, since the checksum is verified in server memory after the data has crossed that path.
- Driver errors, where data winds up in the wrong buffer inside the kernel.
- Accidental overwrites, such as swapping to a live file system.
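To see why keeping the checksum in the parent pointer matters, here is a minimal Python sketch with a fault-injecting toy disk. `Disk`, `BlockPointer`, `ChecksumError`, and the `fault` argument are hypothetical, and SHA-256 is only one of the checksum algorithms ZFS can use. After a phantom write the block still holds the old data, which would match a checksum stored next to it; only the checksum kept in the pointer, computed in memory from the data that was meant to be written, exposes the mismatch.

```python
import hashlib
from dataclasses import dataclass


def checksum(data: bytes) -> bytes:
    # SHA-256 for illustration; ZFS supports several checksum algorithms.
    return hashlib.sha256(data).digest()


@dataclass(frozen=True)
class BlockPointer:
    """Hypothetical parent-side pointer: where the block is and what it must hash to."""
    addr: int
    cksum: bytes


class ChecksumError(Exception):
    pass


class Disk:
    """Toy disk that can silently misbehave on writes."""

    def __init__(self, nblocks: int) -> None:
        self.blocks = [b"\0"] * nblocks

    def write(self, addr: int, data: bytes, fault: str = "") -> BlockPointer:
        if fault == "phantom":
            pass  # write acknowledged but never performed
        elif fault == "misdirected":
            self.blocks[(addr + 1) % len(self.blocks)] = data  # lands on the wrong block
        else:
            self.blocks[addr] = data
        # The checksum comes from the in-memory data, not from what the disk did.
        return BlockPointer(addr, checksum(data))

    def read(self, bp: BlockPointer) -> bytes:
        data = self.blocks[bp.addr]
        if checksum(data) != bp.cksum:
            raise ChecksumError(f"block {bp.addr}: checksum mismatch")
        return data


disk = Disk(4)
disk.write(0, b"v1")
for fault in ("phantom", "misdirected"):
    bp = disk.write(0, b"v2", fault=fault)
    try:
        disk.read(bp)
    except ChecksumError as err:
        print(fault, "->", err)
# phantom -> block 0: checksum mismatch
# misdirected -> block 0: checksum mismatch
```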
Lustre's support for ZFS offers several specific advantages:
- Self-healing capability - In a mirrored or RAID-Z configuration, ZFS not only detects data corruption but also automatically repairs the bad data from a redundant copy or from parity.
- Improved administration - Because ZFS detects and reports read, write, and checksum errors at the block level, system administrators can quickly identify which hardware components are corrupting data. ZFS also has easy-to-use command-line administration utilities (zpool and zfs).
- Hybrid storage support - ZFS supports adding high-speed I/O devices, such as SSDs, to the same storage pool as HDDs. The Read Cache Pool, or L2ARC (Level 2 Adaptive Replacement Cache), acts as a read-only cache layer between memory and the disk, which can substantially improve the performance of random read operations. SSDs can also improve metadata write performance when added to the pool as "special" VDEVs, which store only metadata and small file blocks. You can add as many SSDs to the storage pool as needed to increase the read cache size and IOPS, the metadata write IOPS, or both.
- Scalability - ZFS is a 128-bit filesystem, so a single MDT or OST can scale to a very large size, and the single-file size limit imposed by ldiskfs is effectively removed.
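The self-healing bullet above can be sketched for a mirror, assuming a toy N-way mirror and a checksum that the caller already holds (standing in for the parent block pointer). `Mirror` and its methods are hypothetical; RAID-Z heals by reconstructing data from parity rather than reading another copy, which this sketch does not model.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class Mirror:
    """Toy N-way mirror showing checksum-driven self-healing (not ZFS code)."""

    def __init__(self, ncopies: int, nblocks: int) -> None:
        self.devs = [[b""] * nblocks for _ in range(ncopies)]
        self.repairs = 0

    def write(self, addr: int, data: bytes) -> bytes:
        for dev in self.devs:
            dev[addr] = data
        return checksum(data)  # caller keeps this, as a parent block pointer would

    def read(self, addr: int, expected: bytes) -> bytes:
        """Return the first copy that verifies and rewrite the bad copies seen before it.

        Copies after the first good one are not checked on this path;
        verifying every copy is the job of a scrub.
        """
        bad = []
        for dev in self.devs:
            if checksum(dev[addr]) == expected:
                for victim in bad:  # self-heal: overwrite copies that failed the check
                    victim[addr] = dev[addr]
                    self.repairs += 1
                return dev[addr]
            bad.append(dev)
        raise OSError(f"block {addr}: no copy matches its checksum")


m = Mirror(ncopies=2, nblocks=8)
ck = m.write(3, b"payload")
m.devs[0][3] = b"bit rot"       # silently corrupt the first copy
print(m.read(3, ck))            # b'payload', served from the second copy
print(m.devs[0][3], m.repairs)  # b'payload' 1
```

The key design point is that repair is driven by the checksum, not by a disk error: the corrupted copy reads back without any I/O error, yet it is still detected and rewritten.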
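The "special" VDEV placement mentioned in the hybrid storage bullet above can be sketched as a small placement rule. In ZFS, the dataset property `special_small_blocks` sets the size threshold: file data blocks no larger than it go to the special allocation class, and the default of 0 sends only metadata there. The `Block` type and the `allocation_class` function are hypothetical simplifications that ignore space accounting, dedup tables, and what happens when the special VDEV fills up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    size: int          # block size in bytes
    is_metadata: bool


def allocation_class(block: Block, special_small_blocks: int = 0) -> str:
    """Hypothetical, simplified placement rule for a pool with a special VDEV.

    Metadata goes to the special class. File data blocks whose size is at most
    `special_small_blocks` also go there; 0 (the ZFS default) disables that.
    """
    if block.is_metadata:
        return "special"
    if special_small_blocks > 0 and block.size <= special_small_blocks:
        return "special"
    return "normal"


for b in (Block(16 * 1024, True), Block(16 * 1024, False), Block(1024 * 1024, False)):
    print(b, allocation_class(b, special_small_blocks=32 * 1024))
```

With a 32 KiB threshold, the metadata block and the 16 KiB data block land on the SSD-backed special class, while the 1 MiB data block stays on the HDDs.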