Architecture - Wide Striping
From Lustre.org
Note: The content on this page reflects the state of design of a Lustre feature at a particular point in time and may contain outdated information.
There are several use cases in which Lustre needs to create files with an exceptionally large number of stripes:
- Major HPC installations may have many hundreds or thousands of OSTs and we need to be able to stripe files over all of them
- Server Network Striping (SNS) will use parity declustering, resulting in a very large number of objects making up the striped file.
Therefore, wide striping will be a commonly encountered case. The goal is to encode the striping information in a very compact way.
Definitions (see fid-hld)
- A pool
- defines an unordered set of OSTs and will be used to describe the striping in a manageable way.
- fid seq number
- part of a fully specified FID; contains the sequence in which the object was created
- fid number
- part of a fully specified FID; contains the object id within its sequence
- object version
- part of a fully specified FID; contains the object version number
- FID
- fully specified object identification structure: FID = {f-sequence, f-number, f-version}
- FLDB
- FID Location DataBase, provides fid sequence to server (OST, MDS) mapping
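The definitions above can be sketched as a small data model. This is only an illustration: the field names `seq`, `num`, and `ver`, the helper `fid_server`, the sequence values, and the server names are assumptions, and the FLDB is modelled as a plain dictionary rather than a real service.

```python
from typing import Dict, NamedTuple


class Fid(NamedTuple):
    """Fully specified FID = {f-sequence, f-number, f-version}."""
    seq: int  # sequence in which the object was created
    num: int  # object id within its sequence
    ver: int  # object version number


# Toy FLDB: maps a fid sequence to the server (OST or MDS) that owns it.
# The sequence values and server names below are made up for illustration.
fldb: Dict[int, str] = {
    0x400: "MDS0000",
    0x401: "OST0000",
    0x402: "OST0001",
}


def fid_server(fid: Fid) -> str:
    """Return the server holding the object; only the sequence is consulted."""
    return fldb[fid.seq]
```

Because the lookup depends on the sequence alone, two objects with the same fid number and version but different sequences can live on different servers.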
APIs required
- Get a consecutive set of fid sequence numbers from the FLDB
- Define an on-disk EA that contains a pool name and other RAID striping parameters, for use as a default directory EA
- Define an on-disk EA that contains a RAID type, RAID parameters, a starting fid sequence number, a count of objects over which the file may be striped, a sequence skip count, a single fid number used by this file in all specified sequences, the object version, and possibly the pool from which this object was allocated (for future reference)
- offsets within the file are {lov_offset, stripe_index} = fn(file_offset, raid_type, raid_parameters)
- individual objects OBJ{0, ..., num_obj - 1} in the file can be located:
- OST(stripe_idx) = FLDB(seq_start + stripe_idx*seq_skip)
- OBJ(stripe_idx) = FID{seq_start + stripe_idx*seq_skip,fid_number,obj_version}
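The compact EA and the two location formulas above can be sketched as follows. The names `StripeEA`, `obj_fid`, `obj_ost`, and `raid0_map` are hypothetical, the FLDB is again modelled as a dictionary, and the offset mapping assumes plain RAID0 with a `stripe_size` parameter, reading `lov_offset` as the offset within the selected object. None of these choices comes from the design itself.

```python
from typing import Dict, NamedTuple, Tuple


class Fid(NamedTuple):
    seq: int  # fid sequence number
    num: int  # fid number within the sequence
    ver: int  # object version


class StripeEA(NamedTuple):
    """Hypothetical compact striping EA; field names are illustrative."""
    seq_start: int    # starting fid sequence number
    num_obj: int      # count of objects the file is striped over
    seq_skip: int     # sequence skip count between consecutive stripes
    fid_number: int   # single fid number used in every sequence
    obj_version: int  # object version
    stripe_size: int  # assumed RAID0 parameter, in bytes


def obj_fid(ea: StripeEA, stripe_idx: int) -> Fid:
    """OBJ(stripe_idx) = FID{seq_start + stripe_idx*seq_skip, fid_number, obj_version}."""
    if not 0 <= stripe_idx < ea.num_obj:
        raise IndexError("stripe index out of range")
    return Fid(ea.seq_start + stripe_idx * ea.seq_skip,
               ea.fid_number, ea.obj_version)


def obj_ost(ea: StripeEA, stripe_idx: int, fldb: Dict[int, str]) -> str:
    """OST(stripe_idx) = FLDB(seq_start + stripe_idx*seq_skip)."""
    return fldb[obj_fid(ea, stripe_idx).seq]


def raid0_map(ea: StripeEA, file_offset: int) -> Tuple[int, int]:
    """Return (object_offset, stripe_index) under an assumed RAID0 layout."""
    stripe_no, within = divmod(file_offset, ea.stripe_size)
    row, stripe_idx = divmod(stripe_no, ea.num_obj)
    return row * ea.stripe_size + within, stripe_idx
```

With this encoding the EA has a fixed size regardless of how many objects make up the file, since every object FID is derived from `seq_start`, `seq_skip`, and the stripe index rather than stored individually.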

