Lustre File System

Operations Manual - Version 2.0




Part I Introducing Lustre

1. Understanding Lustre

1.1 What Lustre Is (and What It Isn’t)

1.1.1 Lustre Features

1.2 Lustre Components

1.2.1 Management Server (MGS)

1.2.2 Lustre File System Components

1.2.3 Lustre Networking (LNET)

1.2.4 Lustre Cluster

1.3 Lustre Storage and I/O

1.3.1 Lustre File System and Striping

2. Understanding Lustre Networking (LNET)

2.1 Introducing LNET

2.2 Key Features of LNET

2.3 Supported Network Types

3. Understanding Failover in Lustre

3.1 What is Failover?

3.1.1 Failover Capabilities

3.1.2 Types of Failover Configurations

3.2 Failover Functionality in Lustre

3.2.1 MDT Failover Configuration (Active/Passive)

3.2.2 OST Failover Configuration (Active/Active)

Part II Installing and Configuring Lustre

4. Installation Overview

4.1 Steps to Installing Lustre

5. Setting Up a Lustre File System

5.1 Hardware Considerations

5.1.1 MDT Storage Hardware Considerations

5.1.2 OST Storage Hardware Considerations

5.2 Determining Space Requirements

5.2.1 Determining MDS/MDT Space Requirements

5.2.2 Determining OSS/OST Space Requirements

5.3 Setting File System Formatting Options

5.3.1 Setting the Number of Inodes for the MDS

5.3.2 Setting the Inode Size for the MDT

5.3.3 Setting the Number of Inodes for an OST

5.3.4 File and File System Limits

5.4 Determining Memory Requirements

5.4.1 Client Memory Requirements

5.4.2 MDS Memory Requirements

Calculating MDS Memory Requirements

5.4.3 OSS Memory Requirements

Calculating OSS Memory Requirements

5.5 Implementing Networks To Be Used by Lustre

6. Configuring Storage on a Lustre File System

6.1 Selecting Storage for the MDT and OSTs

6.1.1 Metadata Target (MDT)

6.1.2 Object Storage Target (OST)

6.2 Reliability Best Practices

6.3 Performance Tradeoffs

6.4 Formatting Options for RAID Devices

6.4.1 Computing file system parameters for mkfs

6.4.2 Choosing Parameters for an External Journal

6.5 Connecting a SAN to a Lustre File System

7. Setting Up Network Interface Bonding

7.1 Network Interface Bonding Overview

7.2 Requirements

7.3 Bonding Module Parameters

7.4 Setting Up Bonding

7.4.1 Examples

7.5 Configuring Lustre with Bonding

7.6 Bonding References

8. Installing the Lustre Software

8.1 Preparing to Install the Lustre Software

8.1.1 Required Software

Network-specific kernel modules and libraries

Lustre-Specific Tools and Utilities

(Optional) High-Availability Software

(Optional) Debugging Tools and Other Optional Packages

8.1.2 Environmental Requirements

8.2 Lustre Installation Procedure

9. Configuring Lustre Networking (LNET)

9.1 Overview of LNET Module Parameters

9.1.1 Using a Lustre Network Identifier (NID) to Identify a Node

9.2 Setting the LNET Module networks Parameter

9.2.1 Multihome Server Example

9.3 Setting the LNET Module ip2nets Parameter

9.4 Setting the LNET Module routes Parameter

9.4.1 Routing Example

9.5 Testing the LNET Configuration

9.6 Configuring the Router Checker

9.7 Best Practices for LNET Options

10. Configuring Lustre

10.1 Configuring a Simple Lustre File System

10.1.1 Simple Lustre Configuration Example

10.2 Additional Configuration Options

10.2.1 Scaling the Lustre File System

10.2.2 Changing Striping Defaults

10.2.3 Using the Lustre Configuration Utilities

11. Configuring Lustre Failover

11.1 Creating a Failover Environment

11.1.1 Power Management Software

11.1.2 Power Equipment

11.2 Setting up High-Availability (HA) Software with Lustre

Part III Administering Lustre

12. Lustre Monitoring

12.1 Lustre Changelogs

12.1.1 Working with Changelogs

12.1.2 Changelog Examples

12.2 Lustre Monitoring Tool

12.3 CollectL

12.4 Other Monitoring Options

13. Lustre Operations

13.1 Mounting by Label

13.2 Starting Lustre

13.3 Mounting a Server

13.4 Unmounting a Server

13.5 Specifying Failout/Failover Mode for OSTs

13.6 Handling Degraded OST RAID Arrays

13.7 Running Multiple Lustre File Systems

13.8 Setting and Retrieving Lustre Parameters

13.8.1 Setting Parameters with mkfs.lustre

13.8.2 Setting Parameters with tunefs.lustre

13.8.3 Setting Parameters with lctl

Setting Temporary Parameters

Setting Permanent Parameters

Listing Parameters

Reporting Current Parameter Values

13.9 Specifying NIDs and Failover

13.10 Erasing a File System

13.11 Reclaiming Reserved Disk Space

13.12 Replacing an Existing OST or MDS

13.13 Identifying To Which Lustre File an OST Object Belongs

14. Lustre Maintenance

14.1 Working with Inactive OSTs

14.2 Finding Nodes in the Lustre File System

14.3 Mounting a Server Without Lustre Service

14.4 Regenerating Lustre Configuration Logs

14.5 Changing a Server NID

14.6 Adding a New OST to a Lustre File System

14.7 Removing and Restoring OSTs

14.7.1 Removing an OST from the File System

14.7.2 Backing Up OST Configuration Files

14.7.3 Restoring OST Configuration Files

14.7.4 Returning a Deactivated OST to Service

14.8 Aborting Recovery

14.9 Determining Which Machine is Serving an OST

14.10 Changing the Address of a Failover Node

15. Managing Lustre Networking (LNET)

15.1 Updating the Health Status of a Peer or Router

15.2 Starting and Stopping LNET

15.2.1 Starting LNET

Starting Clients

15.2.2 Stopping LNET

15.3 Multi-Rail Configurations with LNET

15.4 Load Balancing with InfiniBand

15.4.1 Setting Up modprobe.conf for Load Balancing

16. Upgrading Lustre

16.1 Lustre Interoperability

16.2 Upgrading Lustre 1.8.x to 2.0

16.2.1 Performing a File System Upgrade

17. Backing Up and Restoring a File System

17.1 Backing up a File System

17.1.1 Lustre_rsync

Using Lustre_rsync

lustre_rsync Examples

17.2 Backing Up and Restoring an MDS or OST (Device Level)

17.3 Making a File-Level Backup of an OST File System

17.4 Restoring a File-Level Backup

17.5 Using LVM Snapshots with Lustre

17.5.1 Creating an LVM-based Backup File System

17.5.2 Backing up New/Changed Files to the Backup File System

17.5.3 Creating Snapshot Volumes

17.5.4 Restoring the File System From a Snapshot

17.5.5 Deleting Old Snapshots

17.5.6 Changing Snapshot Volume Size

18. Managing File Striping and Free Space

18.1 How Lustre Striping Works

18.2 Lustre File Striping Considerations

18.2.1 Choosing a Stripe Size

18.3 Setting the File Layout/Striping Configuration (lfs setstripe)

18.3.1 Using a Specific Striping Pattern/File Layout for a Single File

Setting the Stripe Size

Setting the Stripe Count

18.3.2 Changing Striping for a Directory

18.3.3 Changing Striping for a File System

18.3.4 Creating a File on a Specific OST

18.4 Retrieving File Layout/Striping Information (getstripe)

18.4.1 Displaying the Current Stripe Size

18.4.2 Inspecting the File Tree

18.5 Managing Free Space

18.5.1 Checking File System Free Space

18.5.2 Using Stripe Allocations

18.5.3 Adjusting the Weighting Between Free Space and Location

19. Managing the File System and I/O

19.1 Handling Full OSTs

19.1.1 Checking OST Space Usage

19.1.2 Taking a Full OST Offline

19.1.3 Migrating Data within a File System

19.1.4 Returning an Inactive OST Back Online

19.2 Creating and Managing OST Pools

19.2.1 Working with OST Pools

Using the lfs Command with OST Pools

19.2.2 Tips for Using OST Pools

19.3 Adding an OST to a Lustre File System

19.4 Performing Direct I/O

19.4.1 Making File System Objects Immutable

19.5 Other I/O Options

19.5.1 Lustre Checksums

Changing Checksum Algorithms

20. Managing Failover

20.1 Lustre Failover and Multiple-Mount Protection

20.1.1 Working with Multiple-Mount Protection

21. Configuring and Managing Quotas

21.1 Working with Quotas

21.2 Enabling Disk Quotas

Administrative and Operational Quotas

21.3 Creating Quota Files and Quota Administration

21.4 Quota Allocation

21.5 Known Issues with Quotas

21.5.1 Granted Cache and Quota Limits

21.5.2 Quota Limits

21.5.3 Quota File Formats

21.6 Lustre Quota Statistics

21.6.1 Interpreting Quota Statistics

22. Managing Lustre Security

22.1 Using ACLs

22.1.1 How ACLs Work

22.1.2 Using ACLs with Lustre

22.1.3 Examples

22.2 Using Root Squash

22.2.1 Configuring Root Squash

22.2.2 Enabling and Tuning Root Squash

22.2.3 Tips on Using Root Squash

Part IV Tuning Lustre for Performance

23. Testing Lustre Network Performance (LNET Self-Test)

23.1 LNET Self-Test Overview

23.1.1 Prerequisites

23.2 Using LNET Self-Test

23.2.1 Creating a Session

23.2.2 Setting Up Groups

23.2.3 Defining and Running the Tests

23.2.4 Sample Script

23.3 LNET Self-Test Command Reference

23.3.1 Session Commands

23.3.2 Group Commands

23.3.3 Batch and Test Commands

23.3.4 Other Commands

24. Benchmarking Lustre Performance (Lustre I/O Kit)

24.1 Using Lustre I/O Kit Tools

24.1.1 Contents of the Lustre I/O Kit

24.1.2 Preparing to Use the Lustre I/O Kit

24.2 Testing I/O Performance of Raw Hardware (sgpdd_survey)

24.2.1 Tuning Linux Storage Devices

24.2.2 Running sgpdd_survey

24.3 Testing OST Performance (obdfilter_survey)

24.3.1 Testing Local Disk Performance

24.3.2 Testing Network Performance

24.3.3 Testing Remote Disk Performance

24.3.4 Output Files

Script Output

Visualizing Results

24.4 Testing OST I/O Performance (ost_survey)

24.5 Collecting Application Profiling Information (stats-collect)

24.5.1 Using stats-collect

25. Lustre Tuning

25.1 Optimizing the Number of Service Threads

25.1.1 Specifying the OSS Service Thread Count

25.1.2 Specifying the MDS Service Thread Count

25.2 Tuning LNET Parameters

25.2.1 Transmit and Receive Buffer Size

25.2.2 Hardware Interrupts (enable_irq_affinity)

25.3 Lockless I/O Tunables

25.4 Improving Lustre Performance When Working with Small Files

25.5 Understanding Why Write Performance Is Better Than Read Performance

Part V Troubleshooting Lustre

26. Lustre Troubleshooting

26.1 Lustre Error Messages

26.1.1 Error Numbers

26.1.2 Viewing Error Messages

26.2 Reporting a Lustre Bug

26.3 Common Lustre Problems

26.3.1 OST Object is Missing or Damaged

26.3.2 OSTs Become Read-Only

26.3.3 Identifying a Missing OST

26.3.4 Fixing a Bad LAST_ID on an OST

26.3.5 Handling/Debugging "Bind: Address already in use" Error

26.3.6 Handling/Debugging Error "-28"

26.3.7 Triggering Watchdog for PID NNN

26.3.8 Handling Timeouts on Initial Lustre Setup

26.3.9 Handling/Debugging "LustreError: xxx went back in time"

26.3.10 Lustre Error: "Slow Start_Page_Write"

26.3.11 Drawbacks in Doing Multi-client O_APPEND Writes

26.3.12 Slowdown Occurs During Lustre Startup

26.3.13 Log Message ‘Out of Memory’ on OST

26.3.14 Setting SCSI I/O Sizes

27. Troubleshooting Recovery

27.1 Recovering from Errors or Corruption on a Backing File System

27.2 Recovering from Corruption in the Lustre File System

27.2.1 Working with Orphaned Objects

27.3 Recovering from an Unavailable OST

28. Lustre Debugging

28.1 Diagnostic and Debugging Tools

28.1.1 Lustre Debugging Tools

28.1.2 External Debugging Tools

Tools for Administrators and Developers

Tools for Developers

28.2 Lustre Debugging Procedures

28.2.1 Understanding the Lustre Debug Messaging Format

Lustre Debug Messages

Format of Lustre Debug Messages

Lustre Debug Messages Buffer

28.2.2 Using the lctl Tool to View Debug Messages

Sample lctl Run

28.2.3 Dumping the Buffer to a File (debug_daemon)

lctl debug_daemon Commands

28.2.4 Controlling Information Written to the Kernel Debug Log

28.2.5 Troubleshooting with strace

28.2.6 Looking at Disk Content

28.2.7 Finding the Lustre UUID of an OST

28.2.8 Printing Debug Messages to the Console

28.2.9 Tracing Lock Traffic

28.3 Lustre Debugging for Developers

28.3.1 Adding Debugging to the Lustre Source Code

28.3.2 Accessing a Ptlrpc Request History

28.3.3 Finding Memory Leaks Using

Part VI Reference

29. Installing Lustre from Source Code

29.1 Overview and Prerequisites

29.2 Patching the Kernel

29.2.1 Introducing the Quilt Utility

29.2.2 Get the Lustre Source and Unpatched Kernel

29.2.3 Patch the Kernel

29.3 Creating and Installing the Lustre Packages

29.4 Installing Lustre with a Third-Party Network Stack

30. Lustre Recovery

30.1 Recovery Overview

30.1.1 Client Failure

30.1.2 Client Eviction

30.1.3 MDS Failure (Failover)

30.1.4 OST Failure (Failover)

30.1.5 Network Partition

30.1.6 Failed Recovery

30.2 Metadata Replay

30.2.1 XID Numbers

30.2.2 Transaction Numbers

30.2.3 Replay and Resend

30.2.4 Client Replay List

30.2.5 Server Recovery

30.2.6 Request Replay

30.2.7 Gaps in the Replay Sequence

30.2.8 Lock Recovery

30.2.9 Request Resend

30.3 Reply Reconstruction

30.3.1 Required State

30.3.2 Reconstruction of Open Replies

30.4 Version-based Recovery

30.4.1 VBR Messages

30.4.2 Tips for Using VBR

30.5 Commit on Share

30.5.1 Working with Commit on Share

30.5.2 Tuning Commit On Share

31. LustreProc

31.1 Proc Entries for Lustre

31.1.1 Locating Lustre File Systems and Servers

31.1.2 Lustre Timeouts

31.1.3 Adaptive Timeouts

Configuring Adaptive Timeouts

Interpreting Adaptive Timeouts Information

31.1.4 LNET Information

31.1.5 Free Space Distribution

Managing Stripe Allocation

31.2 Lustre I/O Tunables

31.2.1 Client I/O RPC Stream Tunables

31.2.2 Watching the Client RPC Stream

31.2.3 Client Read-Write Offset Survey

31.2.4 Client Read-Write Extents Survey

31.2.5 Watching the OST Block I/O Stream

31.2.6 Using File Readahead and Directory Statahead

Tuning File Readahead

Tuning Directory Statahead

31.2.7 OSS Read Cache

Using OSS Read Cache

31.2.8 OSS Asynchronous Journal Commit

31.2.9 mballoc History

31.2.10 mballoc3 Tunables

31.2.11 Locking

31.2.12 Setting MDS and OSS Thread Counts

31.3 Debug

31.3.1 RPC Information for Other OBD Devices

Interpreting OST Statistics

Interpreting MDT Statistics

32. User Utilities

32.1 lfs

32.2 lfs_migrate

32.3 lfsck

32.4 Filefrag

32.5 Mount

32.6 Handling Timeouts

33. Lustre Programming Interfaces

33.1 User/Group Cache Upcall

33.1.1 Name

33.1.2 Description

Primary and Secondary Groups

33.1.3 Parameters

33.1.4 Data Structures

33.2 l_getgroups Utility

34. Setting Lustre Properties in a C Program (llapi)

34.1 llapi_file_create

34.2 llapi_file_get_stripe

34.3 llapi_file_open

34.4 llapi_quotactl

34.5 llapi_path2fid

34.6 Example Using the llapi Library

35. Configuration Files and Module Parameters

35.1 Introduction

35.2 Module Options

35.2.1 LNET Options

Network Topology

networks ("tcp")

routes ("")

forwarding ("")

35.2.2 SOCKLND Kernel TCP/IP LND

35.2.3 Portals LND (Linux)

35.2.4 MX LND

36. System Configuration Utilities

36.1 e2scan

36.2 l_getidentity

36.3 lctl

36.4 ll_decode_filter_fid

36.5 ll_recover_lost_found_objs

36.6 llobdstat

36.7 llog_reader

36.8 llstat

36.9 llverdev

36.10 lshowmount

36.11 lst


36.13 lustre_rsync

36.14 mkfs.lustre

36.15 mount.lustre

36.16 plot-llstat

36.17 routerstat

36.18 tunefs.lustre

36.19 Additional System Configuration Utilities

36.19.1 Application Profiling Utilities

36.19.2 More /proc Statistics for Application Profiling

36.19.3 Testing / Debugging Utilities

36.19.4 Flock Feature Example