Progressive File Layouts

From Lustre Wiki
Revision as of 15:57, 11 January 2017 by Adilger

The Lustre Progressive File Layout (PFL) feature is intended to simplify the use of Lustre so that users can expect reasonable performance for a variety of normal file IO patterns without needing to understand their IO model or Lustre usage details in advance. In particular, users do not need to know the size or concurrency of output files before they are created, nor explicitly specify an optimal layout for each file, in order to achieve good performance for both highly concurrent IO to a single large shared file and parallel IO to many smaller per-process files.

The PFL feature is implemented in several phases, each providing incremental functionality. These phases include the base functionality of composite layouts, which can also be used by several other features that affect the file layout.

Phase 1: Prototype Implementation

Phase 2: Static Layout Implementation

The Static PFL Implementation provides a functional implementation that allows the full layout to be specified using standard user tools, and addresses any shortcuts and/or defects in the Prototype implementation. The following work was completed:

  • PFL2 Scope Statement describes the overall goals and intended outcomes of the production implementation
  • PFL2 Solution Architecture describes how the goals of the PFL project may be implemented, and how to measure the completion and outcomes
  • PFL2 High Level Design describes the implementation details for the PFL feature
  • Implement improved layout handling APIs
  • Address technical debt from prototype phase
  • Implement RPCs for modifying composite layouts (depends on the layout APIs)
  • Server composite layout support (depends on the layout APIs)

Phase 3a: PFL Usability Improvements

Server LOD support for composite layouts

On the MDS, the Logical Object Device (LOD) manages the operational aspects of files that have components on multiple OSTs. The LOD will primarily be concerned with creating new files that use progressive layouts. In some cases, it will need to decode the layout and interact with objects one at a time for operations such as unlink, setattr, and LFSCK. The LOD code will also handle layout modification RPCs arriving from clients, both when the file is idle and while it is in use by multiple clients.

LFSCK support for composite layouts

The Lustre File System Checker (LFSCK) verifies the structure of a Lustre filesystem. It ensures that the file layout on the MDT matches the objects located on the OST(s), and it reconstructs the filesystem structure if that structure becomes inconsistent or corrupted. To do this, LFSCK needs to understand the file layout stored in the MDT inode. The OST objects also need to store information about their part of the file layout so that the layout can be rebuilt if needed. With the addition of composite file layouts, LFSCK must be enhanced to support the new layout type. The OST on-disk format must also be extended so that each OST object can be identified as part of the correct layout component.
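This rough illustration shows why each OST object needs to record its place in the layout. The Python sketch rebuilds per-file composite layouts purely from per-object records. The `OstObjectInfo` record and the `rebuild_layouts()` function are hypothetical and do not reflect the actual Lustre on-disk format. The sketch assumes only that each object remembers its parent file, its component ID and extent, its stripe index, and the OST that holds it.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical per-object record; NOT the real Lustre on-disk structure.
@dataclass(frozen=True)
class OstObjectInfo:
    parent_fid: str    # identifier of the file this object belongs to
    comp_id: int       # layout component this object is part of
    comp_start: int    # component extent start (bytes)
    comp_end: int      # component extent end (bytes), -1 meaning EOF
    stripe_index: int  # position of this object within its component
    ost_index: int     # OST holding this object

def rebuild_layouts(objects):
    """Group OST objects by file and component to rebuild composite layouts."""
    files = defaultdict(lambda: defaultdict(list))
    for obj in objects:
        files[obj.parent_fid][obj.comp_id].append(obj)

    layouts = {}
    for fid, comps in files.items():
        rebuilt = []
        for comp_id, objs in comps.items():
            # Restore stripe order inside the component.
            objs.sort(key=lambda o: o.stripe_index)
            rebuilt.append({
                "comp_id": comp_id,
                "extent": (objs[0].comp_start, objs[0].comp_end),
                "osts": [o.ost_index for o in objs],
            })
        # Order components by their starting file offset.
        rebuilt.sort(key=lambda c: c["extent"][0])
        layouts[fid] = rebuilt
    return layouts
```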

Default layout inheritance

In order to realize the full benefits of PFL, the progressive layout extents should not create OST objects until the size of the file grows sufficiently to need those objects. However, it is also necessary to be able to specify the layout template for the whole file at file creation time, so that the user or administrator can get the performance profile desired as the file is written.

It should be possible to specify a default layout template on a directory that is inherited by new files and subdirectories created within that directory. If no default layout template is specified on the parent directory, it should also be possible to inherit the filesystem-wide default layout template when a file is created.
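The sketch below shows the inheritance order described above in minimal form. It assumes a layout template is a list of hypothetical `(end_offset, stripe_count)` pairs, with `-1` meaning end of file. `Directory`, `layout_for_new_file()`, `mkdir()`, and `FS_DEFAULT_LAYOUT` are illustrative names, not Lustre APIs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Template = List[Tuple[int, int]]  # hypothetical: (end_offset, stripe_count), -1 = EOF

# Hypothetical filesystem-wide default: one 1-stripe component up to EOF.
FS_DEFAULT_LAYOUT: Template = [(-1, 1)]

@dataclass
class Directory:
    name: str
    default_layout: Optional[Template] = None

def layout_for_new_file(parent: Directory,
                        explicit: Optional[Template] = None) -> Template:
    """Choose the layout template for a file created in `parent`."""
    if explicit is not None:
        return explicit                 # specified by the user at create time
    if parent.default_layout is not None:
        return parent.default_layout    # inherited from the parent directory
    return FS_DEFAULT_LAYOUT            # filesystem-wide fallback

def mkdir(parent: Directory, name: str) -> Directory:
    """New subdirectories inherit the parent's default template.

    The template is copied at creation time (an assumption of this sketch).
    """
    inherited = None if parent.default_layout is None else list(parent.default_layout)
    return Directory(name, inherited)
```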

Phase 3b: Dynamic Layout Implementation

Composite file templates

In order to realize the full benefits of PFL, the progressive layout extents should not allocate OST objects until the size of the file grows sufficiently to need those objects. However, it is also necessary to be able to specify the layout template for the whole file at file creation time without allocating OST objects for all components, so that the user or administrator can get the performance profile desired as file size grows during writes.
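The following Python sketch models a composite file template. Each component describes a byte extent and a stripe count, but no OST objects are allocated until that component is instantiated. The `Component` class, `make_template()`, `component_for()`, the `EOF` sentinel, and the example sizes are assumptions made for illustration. The extent checks enforce contiguous, non-overlapping components, with the last one running to end of file.

```python
from dataclasses import dataclass, field
from typing import List

EOF = -1  # hypothetical sentinel: component extends to end of file

@dataclass
class Component:
    start: int                 # first byte covered (inclusive)
    end: int                   # end of extent (exclusive), or EOF
    stripe_count: int
    osts: List[int] = field(default_factory=list)  # empty until instantiated

    @property
    def instantiated(self) -> bool:
        return bool(self.osts)

    def covers(self, offset: int) -> bool:
        return self.start <= offset and (self.end == EOF or offset < self.end)

def make_template(spec):
    """Build a template from (end, stripe_count) pairs without allocating objects.

    Each component starts where the previous one ended.
    """
    comps, start = [], 0
    for end, count in spec:
        if start == EOF or (end != EOF and end <= start):
            raise ValueError("extents must increase and only the last may reach EOF")
        comps.append(Component(start, end, count))
        start = end
    if not comps or comps[-1].end != EOF:
        raise ValueError("last component must extend to EOF")
    return comps

def component_for(comps, offset):
    """Return the component whose extent contains `offset`."""
    return next(c for c in comps if c.covers(offset))
```

With the template `[(64 MiB, 1), (1 GiB, 4), (EOF, 16)]`, a write at 100 MiB falls in the second, 4-stripe component, which has no objects until it is instantiated.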

It should be possible to specify a default layout template on a directory that is inherited by new files and subdirectories created within that directory. If no default layout template is specified on the parent directory, it should inherit the filesystem-wide default layout template when a file is created.

Dynamic layout instantiation based on file offset

To simplify the implementation, this project will focus on composite layouts that grow by allocating objects in non-overlapping layout extents at the end of the file. It will not implement modification of already-allocated layout extents that contain data.

The client IO (CLIO) layer needs to manage the growth of the file layout by reconfiguring its IO stack to add new OST objects to the layout. Before writing to a file offset beyond the currently instantiated layout components, the client will request that the MDS instantiate OST objects based on the layout template. The layout generation stored in the composite layout and in each layout extent will allow CLIO to detect whether a specific layout extent has been modified when the layout lock is revoked. Since the existing components of a PFL file layout are not modified, in-flight IO operations and cached data do not need to be interrupted.
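The simplified Python model below sketches this flow. It assumes a single shared layout object, with a `FakeMDS` class standing in for the MDS-side RPC handler. `Component`, `Layout`, `FakeMDS`, `Client`, and their methods are hypothetical. The client asks for instantiation before writing past the instantiated components. When its layout lock is revoked, it compares per-component generations to find which extents changed. Components whose generation is unchanged keep their cached data.

```python
from dataclasses import dataclass, field
from typing import List

EOF = -1  # hypothetical sentinel: extent runs to end of file

@dataclass
class Component:
    start: int
    end: int                          # exclusive end, or EOF
    stripe_count: int
    osts: List[int] = field(default_factory=list)  # empty = not instantiated
    generation: int = 0               # layout generation that last changed it

    def overlaps(self, lo: int, hi: int) -> bool:
        """True if the byte range [lo, hi) intersects this component."""
        return self.start < hi and (self.end == EOF or lo < self.end)

@dataclass
class Layout:
    comps: List[Component]
    generation: int = 0               # bumped on every layout change

class FakeMDS:
    """Stand-in for the MDS; in Lustre this would be handled via RPC."""
    def __init__(self, layout: Layout, free_osts):
        self.layout = layout
        self.free_osts = list(free_osts)

    def instantiate(self, lo: int, hi: int) -> None:
        """Allocate objects for every uninstantiated component touching [lo, hi)."""
        for c in self.layout.comps:
            if not c.osts and c.overlaps(lo, hi):
                self.layout.generation += 1
                c.osts = [self.free_osts.pop(0) for _ in range(c.stripe_count)]
                c.generation = self.layout.generation

class Client:
    def __init__(self, mds: FakeMDS):
        self.mds = mds
        self.refresh()

    def refresh(self) -> None:
        """Cache per-component generations, as if the layout lock was granted."""
        self.seen = [c.generation for c in self.mds.layout.comps]

    def write(self, offset: int, length: int) -> None:
        lo, hi = offset, offset + length
        if any(not c.osts and c.overlaps(lo, hi) for c in self.mds.layout.comps):
            # Request instantiation before writing past instantiated components.
            self.mds.instantiate(lo, hi)
            self.refresh()
        # IO to the instantiated objects would be issued here.

    def on_layout_lock_revoked(self) -> List[int]:
        """Return indices of components whose generation changed."""
        current = [c.generation for c in self.mds.layout.comps]
        return [i for i, (old, new) in enumerate(zip(self.seen, current)) if old != new]
```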

Improved MDS object allocator

The current MDS object allocator is designed only to allocate objects for a file at the time the file is first created. For progressive file layouts, the allocator will at minimum need to be enhanced to avoid allocating objects on OSTs that are already used by the file's other components. If a file has multiple objects allocated on the same OST before objects are allocated from unused OSTs, there may be a significant performance loss from oversubscribing that OST's bandwidth relative to the other OSTs. The only exception may be a fully-striped component at the end of the file (see Example Progressive Layouts for more detail), where it would be acceptable to allocate objects across all available OSTs to maximize the bandwidth available to the file.
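The Python sketch below shows one possible allocation policy consistent with this requirement. It prefers OSTs not yet used by the file, reuses OSTs only when the unused ones run out, and allows a final fully-striped component to span every OST. `allocate_component()` is a hypothetical helper. Its simple in-order selection stands in for whatever round-robin or space-weighted ordering a real allocator would apply.

```python
from typing import List, Set

def allocate_component(all_osts: List[int], used: Set[int], stripe_count: int,
                       full_stripe: bool = False) -> List[int]:
    """Pick OSTs for a new component of one file (hypothetical policy)."""
    if full_stripe:
        # A fully-striped last component may reuse OSTs to use all bandwidth.
        return list(all_osts)
    if stripe_count > len(all_osts):
        raise ValueError("stripe_count exceeds the number of OSTs")
    # Prefer OSTs that do not yet hold an object of this file.
    unused = [o for o in all_osts if o not in used]
    # Fall back to already-used OSTs only after the unused ones run out.
    reused = [o for o in all_osts if o in used]
    return (unused + reused)[:stripe_count]
```

For eight OSTs, a 1-stripe first component gets OST 0. A 4-stripe second component then gets OSTs 1 to 4 rather than reusing OST 0.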