Lustre Internals Documentation Update

From Lustre Wiki

Revision as of 10:04, 6 April 2017

Organization

Goals

  • Incremental updating of Lustre internals documentation.
  • Start by bringing the Understanding Lustre Filesystem Internals document up to date.
  • Those with Lustre internals knowledge have limited time available, so find a way of working that makes the most efficient use of the time they spend on this effort.

Online Document

Community web-editable version of Understanding Lustre Filesystem Internals:

https://docs.google.com/document/d/1sbtonyl66h7g5AficO6BLRMeXwAeNp-pwswirWAkgIE

Mailing List

Google Groups Mailing List:

https://groups.google.com/forum/#!forum/lustre-internals-update/

Todo

  • Online Document
    • Overall review, remove out-of-date information
    • Add new sections for subsystems that did not exist in Lustre 1.6
    • Section-by-section review
    • Add section with timeline for architectural changes
    • Add HSM section
    • Add DNE coverage
    • Add OSD section (research materials from 2.3-2.4 time-frame)
    • Add ZFS coverage
    • Expand protocol coverage and reference RPC documentation
    • The TOC was lost in the conversion. It should be possible to regenerate it automatically from the section headings by applying heading styles
  • Wiki Page
    • Review Doxygen comments in code base and reference them here
    • Continue adding references to pertinent LUG/LAD/Lustre-Ecosystem/etc. presentations
  • Organizational
    • Find a way for people with internals knowledge to collaborate with people who lack it but want to help
    • Gather existing materials (presentations, etc.)
    • Investigate IO simplification project materials for updates and integration
    • Generate Doxygen documentation from the code comments and publish it on the web (automate?)
    • Decide how to divide coordination between the document, the wiki, and the mailing list

Community Feedback

Feedback from discussions with community members on updating Lustre internals documentation in general:

  • Top-down versus bottom-up approach
    • Viewpoints differ on this. Is it possible to find a way to address both?
  • Separate documents (e.g. Understanding Lustre Filesystem Internals document) versus documentation in the code (e.g. Doxygen)
    • Is it possible to expand and update both incrementally, balance the content between them, and have them reference each other?

Resources

Doxygen Code Documentation

Lustre Protocol Documentation

Presentations

Sequoia and the ZFS OSD (LUG2013)

Intel® Lustre* File Level Replication (LUG2014)

Distributed Name Space Phase I (LUG2013)

Lustre & Kerberos: in theory and in practice (LUG2015)

Documentation in Lustre Tree

Overview of the Lustre Client I/O (CLIO) subsystem

http://git.hpdd.intel.com/fs/lustre-release.git/blob_plain/HEAD:/Documentation/clio.txt

LFSCK

http://git.hpdd.intel.com/fs/lustre-release.git/blob_plain/HEAD:/Documentation/lfsck.txt

  • LFSCK master slave design
  • Object traversal design reference

Lock Ordering

http://git.hpdd.intel.com/fs/lustre-release.git/blob_plain/HEAD:/Documentation/lock-ordering

/* dot input file for lock-ordering diagram */

Overview of the Lustre Object Storage Device API

http://git.hpdd.intel.com/fs/lustre-release.git/blob_plain/HEAD:/Documentation/osd-api.txt

Overview of Dynamic LNet Configuration

http://git.hpdd.intel.com/fs/lustre-release.git/blob_plain/HEAD:/Documentation/dlc.txt

Lustre versioning

http://git.hpdd.intel.com/fs/lustre-release.git/blob_plain/HEAD:/Documentation/versioning.txt

Old Wiki Architectural Documents

Architecture - Backup
Architecture - CROW
Architecture - CTDB with Lustre
Architecture - Caching OSS
Architecture - Changelogs
Architecture - Changelogs 1.6
Architecture - Client Cleanup
Architecture - Clustered Metadata
Architecture - Commit on Share
Architecture - Cuts
Architecture - DMU OSD
Architecture - DMU Zerocopy
Architecture - End-to-end Checksumming
Architecture - Epochs
Architecture - External File Locking
Architecture - FIDs on OST
Architecture - Feature FS Replication
Architecture - Fileset
Architecture - Flash Cache
Architecture - Free Space Management
Architecture - GNS
Architecture - HSM
Architecture - HSM Migration
Architecture - HSM and Cache
Architecture - IO system
Architecture - Interoperability 1.6 1.8 2.0
Architecture - Interoperability fids zfs
Architecture - LRE Images
Architecture - Libcfs
Architecture - Llog over OSD
Architecture - Lustre DLDs
Architecture - Lustre HLDs
Architecture - Lustre Logging API
Architecture - MDS-on-DMU
Architecture - MDS striping format
Architecture - MPI IO and NetCDF
Architecture - MPI LND
Architecture - Metadata API
Architecture - Migration (1)
Architecture - Migration (2)
Architecture - Multiple Interfaces For LNET
Architecture - Network Request Scheduler
Architecture - New Metadata API
Architecture - OSS-on-DMU
Architecture - Open by fid
Architecture - PAG
Architecture - Pools of targets
Architecture - Profiling Tools for IO
Architecture - Proxy Cache
Architecture - Punch and Extent Migration
Architecture - Punch and Extent Migration Requirements
Architecture - Recovery Failures
Architecture - Request Redirection
Architecture - Scalable Pinger
Architecture - Security
Architecture - Server Network Striping
Architecture - Simple Space Balance Migration
Architecture - Simplified Interoperation
Architecture - Space Manager
Architecture - Sub Tree Locks
Architecture - User Level Access
Architecture - User Level OSS
Architecture - Userspace Servers
Architecture - Version Based Recovery
Architecture - Wide Striping
Architecture - Wire Level Protocol
Architecture - Write Back Cache
Architecture - Writing Architecture Documents
Architecture - ZFS TinyZAP
Architecture - ZFS for Lustre
Architecture - ZFS large dnodes
Architecture Descriptions

Old Site Lustre Internals Documentation Area

http://wiki.old.lustre.org/lid/

Glossary

http://wiki.old.lustre.org/lid/glossary/glossary.html

Brief descriptions of Lustre concepts, objects and major components indexed in various ways.

Lustre Internals: A Gentle Introduction

http://wiki.old.lustre.org/lid/agi/agi.html

Subsystem Map

TODO: Generate new version of

Original Understanding Lustre Filesystem Internals (ULFI) Table of Contents

  • Component View on Architecture
    • Lustre Client
    • OSS
    • MDS
  • Lustre Lite
      • Connection
      • Dentry Object
      • Lustre Superblock
      • Lustre inode
    • Path Lookup
      • Path
      • Asynchronous I/O
      • Group I/O (or Synchronous I/O)
      • Direct I/O
      • Interface with VFS
    • Read-Ahead
  • LOV and OSC
      • Device Operations
    • Page Management
    • From OSC Client To OST
    • Grant
  • LDLM: Lock Manager
    • Namespace
    • Resource
    • Lock Type and Mode
    • Callbacks
    • Intent
    • Lock Manager
      • Requesting a Lock
      • Canceling a Lock
      • Policy Function
      • Cases
      • MDS: One Client Read
      • MDS: Two Clients
      • OST: Two Clients Read and Write
  • OST and Obdfilter
      • as OST
      • Initial Setup
      • Dispatching
      • Directory Layout
      • Group Number
      • Object Id
    • obdfilter
      • File Deletion
      • File Creation
  • MDC and Lustre Metadata
      • Overview
    • Striping EA
    • Striping API
  • Infrastructure Support
    • Lustre Client Registration
    • Superblock and Inode Registration
      • Device
    • Import and Export
  • Portal RPC
    • Client Side Interface
    • Server Side Interface
    • Bulk Transfer
      • NRS Optimization
    • Error Recovery: A Client Perspective
  • LNET: Lustre Networking
    • Core Concepts
      • LNET Process Id
      • ME: Matching Entry
      • MD: Memory Descriptor
      • Example Use of Offset
      • MD Options
      • Event Queue
    • Portal RPC: A Client of LNET
      • Get and Put Confusion
      • Router In the Middle
      • Round 1: Client Server Interactions
      • Round 2: More details
    • LNET API
      • Naming Conventions
      • Initialization and Teardown
      • Memory-Oriented Communication Semantics
      • Match Entry Management
    • LNET/LND Semantics and API
      • API Summary
    • LNET Startup and Data Transfer
      • Startup
      • LNET Send
      • LNET Receive
      • The Case for RDMA
    • LNET Routing
      • General Concepts
      • Asymmetric Routing Failure
      • Routing Buffer Management and Flow Control
      • Fine Grain Routing
  • Lustre Generic Filesystem Wrapper Layer: fsfilt
    • Overview
    • fsfilt for ext3
    • fsfilt Use Case Examples
      • DIRECT_IO in Lustre
      • Replaying Last Transactions After a Server Crash
      • Client Connect/Disconnect
      • Why ls Is Expensive on Lustre
  • Lustre Disk Filesystem: ldiskfs
    • Kernel Patches
    • Patches: ext3 to ldiskfs
  • Future Work