Lustre File System
Operations Manual - Version 2.0
821-2076-10
Contents
1.1 What Lustre Is (and What It Isn’t)
1.2.2 Lustre File System Components
1.2.3 Lustre Networking (LNET)
1.3.1 Lustre File System and Striping
2. Understanding Lustre Networking (LNET)
3. Understanding Failover in Lustre
3.1.2 Types of Failover Configurations
3.2 Failover Functionality in Lustre
3.2.1 MDT Failover Configuration (Active/Passive)
3.2.2 OST Failover Configuration (Active/Active)
Part II Installing and Configuring Lustre
4.1 Steps to Installing Lustre
5. Setting Up a Lustre File System
5.1.1 MDT Storage Hardware Considerations
5.1.2 OST Storage Hardware Considerations
5.2 Determining Space Requirements
5.2.1 Determining MDS/MDT Space Requirements
5.2.2 Determining OSS/OST Space Requirements
5.3 Setting File System Formatting Options
5.3.1 Setting the Number of Inodes for the MDS
5.3.2 Setting the Inode Size for the MDT
5.3.3 Setting the Number of Inodes for an OST
5.3.4 File and File System Limits
5.4 Determining Memory Requirements
5.4.1 Client Memory Requirements
5.4.2.1 Calculating MDS Memory Requirements
5.4.3.1 Calculating OSS Memory Requirements
5.5 Implementing Networks To Be Used by Lustre
6. Configuring Storage on a Lustre File System
6.1 Selecting Storage for the MDT and OSTs
6.1.2 Object Storage Server (OST)
6.2 Reliability Best Practices
6.4 Formatting Options for RAID Devices
6.4.1 Computing file system parameters for mkfs
6.4.2 Choosing Parameters for an External Journal
6.5 Connecting a SAN to a Lustre File System
7. Setting Up Network Interface Bonding
7.1 Network Interface Bonding Overview
7.5 Configuring Lustre with Bonding
8. Installing the Lustre Software
8.1 Preparing to Install the Lustre Software
8.1.1.1 Network-specific kernel modules and libraries
8.1.1.2 Lustre-Specific Tools and Utilities
8.1.1.3 (Optional) High-Availability Software
8.1.1.4 (Optional) Debugging Tools and Other Optional Packages
8.1.2 Environmental Requirements
8.2 Lustre Installation Procedure
9. Configuring Lustre Networking (LNET)
9.1 Overview of LNET Module Parameters
9.1.1 Using a Lustre Network Identifier (NID) to Identify a Node
9.2 Setting the LNET Module networks Parameter
9.2.1 Multihome Server Example
9.3 Setting the LNET Module ip2nets Parameter
9.4 Setting the LNET Module routes Parameter
9.5 Testing the LNET Configuration
9.6 Configuring the Router Checker
9.7 Best Practices for LNET Options
10.1 Configuring a Simple Lustre File System
10.1.1 Simple Lustre Configuration Example
10.2 Additional Configuration Options
10.2.1 Scaling the Lustre File System
10.2.2 Changing Striping Defaults
10.2.3 Using the Lustre Configuration Utilities
11. Configuring Lustre Failover
11.1 Creating a Failover Environment
11.1.1 Power Management Software
11.2 Setting up High-Availability (HA) Software with Lustre
12.1.1 Working with Changelogs
13.5 Specifying Failout/Failover Mode for OSTs
13.6 Handling Degraded OST RAID Arrays
13.7 Running Multiple Lustre File Systems
13.8 Setting and Retrieving Lustre Parameters
13.8.1 Setting Parameters with mkfs.lustre
13.8.2 Setting Parameters with tunefs.lustre
13.8.3 Setting Parameters with lctl
13.8.3.1 Setting Temporary Parameters
13.8.3.2 Setting Permanent Parameters
13.8.3.4 Reporting Current Parameter Values
13.9 Specifying NIDs and Failover
13.11 Reclaiming Reserved Disk Space
13.12 Replacing an Existing OST or MDS
13.13 Identifying To Which Lustre File an OST Object Belongs
14.1 Working with Inactive OSTs
14.2 Finding Nodes in the Lustre File System
14.3 Mounting a Server Without Lustre Service
14.4 Regenerating Lustre Configuration Logs
14.6 Adding a New OST to a Lustre File System
14.7 Removing and Restoring OSTs
14.7.1 Removing an OST from the File System
14.7.2 Backing Up OST Configuration Files
14.7.3 Restoring OST Configuration Files
14.7.4 Returning a Deactivated OST to Service
14.9 Determining Which Machine is Serving an OST
14.10 Changing the Address of a Failover Node
15. Managing Lustre Networking (LNET)
15.1 Updating the Health Status of a Peer or Router
15.2 Starting and Stopping LNET
15.3 Multi-Rail Configurations with LNET
15.4 Load Balancing with InfiniBand
15.4.1 Setting Up modprobe.conf for Load Balancing
16.2 Upgrading Lustre 1.8.x to 2.0
16.2.1 Performing a File System Upgrade
17. Backing Up and Restoring a File System
17.1.1.2 lustre_rsync Examples
17.2 Backing Up and Restoring an MDS or OST (Device Level)
17.3 Making a File-Level Backup of an OST File System
17.4 Restoring a File-Level Backup
17.5 Using LVM Snapshots with Lustre
17.5.1 Creating an LVM-based Backup File System
17.5.2 Backing up New/Changed Files to the Backup File System
17.5.3 Creating Snapshot Volumes
17.5.4 Restoring the File System From a Snapshot
17.5.6 Changing Snapshot Volume Size
18. Managing File Striping and Free Space
18.1 How Lustre Striping Works
18.2 Lustre File Striping Considerations
18.3 Setting the File Layout/Striping Configuration (lfs setstripe)
18.3.1 Using a Specific Striping Pattern/File Layout for a Single File
18.3.1.1 Setting the Stripe Size
18.3.1.2 Setting the Stripe Count
18.3.2 Changing Striping for a Directory
18.3.3 Changing Striping for a File System
18.3.4 Creating a File on a Specific OST
18.4 Retrieving File Layout/Striping Information (getstripe)
18.4.1 Displaying the Current Stripe Size
18.4.2 Inspecting the File Tree
18.5.1 Checking File System Free Space
18.5.2 Using Stripe Allocations
18.5.3 Adjusting the Weighting Between Free Space and Location
19. Managing the File System and I/O
19.1.1 Checking OST Space Usage
19.1.2 Taking a Full OST Offline
19.1.3 Migrating Data within a File System
19.1.4 Returning an Inactive OST Back Online
19.2 Creating and Managing OST Pools
19.2.1.1 Using the lfs Command with OST Pools
19.2.2 Tips for Using OST Pools
19.3 Adding an OST to a Lustre File System
19.4.1 Making File System Objects Immutable
19.5.1.1 Changing Checksum Algorithms
20.1 Lustre Failover and Multiple-Mount Protection
20.1.1 Working with Multiple-Mount Protection
21. Configuring and Managing Quotas
21.2.0.1 Administrative and Operational Quotas
21.3 Creating Quota Files and Quota Administration
21.5.1 Granted Cache and Quota Limits
21.6.1 Interpreting Quota Statistics
22.2.1 Configuring Root Squash
22.2.2 Enabling and Tuning Root Squash
22.2.3 Tips on Using Root Squash
Part IV Tuning Lustre for Performance
23. Testing Lustre Network Performance (LNET Self-Test)
23.2.3 Defining and Running the Tests
23.3 LNET Self-Test Command Reference
23.3.3 Batch and Test Commands
24. Benchmarking Lustre Performance (Lustre I/O Kit)
24.1 Using Lustre I/O Kit Tools
24.1.1 Contents of the Lustre I/O Kit
24.1.2 Preparing to Use the Lustre I/O Kit
24.2 Testing I/O Performance of Raw Hardware (sgpdd_survey)
24.2.1 Tuning Linux Storage Devices
24.3 Testing OST Performance (obdfilter_survey)
24.3.1 Testing Local Disk Performance
24.3.2 Testing Network Performance
24.3.3 Testing Remote Disk Performance
24.4 Testing OST I/O Performance (ost_survey)
24.5 Collecting Application Profiling Information (stats-collect)
25.1 Optimizing the Number of Service Threads
25.1.1 Specifying the OSS Service Thread Count
25.1.2 Specifying the MDS Service Thread Count
25.2.1 Transmit and Receive Buffer Size
25.2.2 Hardware Interrupts (enable_irq_affinity)
25.4 Improving Lustre Performance When Working with Small Files
25.5 Understanding Why Write Performance Is Better Than Read Performance
26.3.1 OST Object is Missing or Damaged
26.3.3 Identifying a Missing OST
26.3.4 Fixing a Bad LAST_ID on an OST
26.3.5 Handling/Debugging "Bind: Address already in use" Error
26.3.6 Handling/Debugging Error "- 28"
26.3.7 Triggering Watchdog for PID NNN
26.3.8 Handling Timeouts on Initial Lustre Setup
26.3.9 Handling/Debugging "LustreError: xxx went back in time"
26.3.10 Lustre Error: "Slow Start_Page_Write"
26.3.11 Drawbacks in Doing Multi-client O_APPEND Writes
26.3.12 Slowdown Occurs During Lustre Startup
26.3.13 Log Message ‘Out of Memory’ on OST
26.3.14 Setting SCSI I/O Sizes
27.1 Recovering from Errors or Corruption on a Backing File System
27.2 Recovering from Corruption in the Lustre File System
27.2.1 Working with Orphaned Objects
27.3 Recovering from an Unavailable OST
28.1 Diagnostic and Debugging Tools
28.1.2 External Debugging Tools
28.1.2.1 Tools for Administrators and Developers
28.2 Lustre Debugging Procedures
28.2.1 Understanding the Lustre Debug Messaging Format
28.2.1.1 Lustre Debug Messages
28.2.1.2 Format of Lustre Debug Messages
28.2.1.3 Lustre Debug Messages Buffer
28.2.2 Using the lctl Tool to View Debug Messages
28.2.3 Dumping the Buffer to a File (debug_daemon)
28.2.3.1 lctl debug_daemon Commands
28.2.4 Controlling Information Written to the Kernel Debug Log
28.2.5 Troubleshooting with strace
28.2.6 Looking at Disk Content
28.2.7 Finding the Lustre UUID of an OST
28.2.8 Printing Debug Messages to the Console
28.3 Lustre Debugging for Developers
28.3.1 Adding Debugging to the Lustre Source Code
28.3.2 Accessing a Ptlrpc Request History
28.3.3 Finding Memory Leaks Using leak_finder.pl
29. Installing Lustre from Source Code
29.1 Overview and Prerequisites
29.2.1 Introducing the Quilt Utility
29.2.2 Get the Lustre Source and Unpatched Kernel
29.3 Creating and Installing the Lustre Packages
29.4 Installing Lustre with a Third-Party Network Stack
30.2.7 Gaps in the Replay Sequence
30.3.2 Reconstruction of Open Replies
30.5.1 Working with Commit on Share
31.1.1 Locating Lustre File Systems and Servers
31.1.3.1 Configuring Adaptive Timeouts
31.1.3.2 Interpreting Adaptive Timeouts Information
31.1.5 Free Space Distribution
31.1.5.1 Managing Stripe Allocation
31.2.1 Client I/O RPC Stream Tunables
31.2.2 Watching the Client RPC Stream
31.2.3 Client Read-Write Offset Survey
31.2.4 Client Read-Write Extents Survey
31.2.5 Watching the OST Block I/O Stream
31.2.6 Using File Readahead and Directory Statahead
31.2.6.1 Tuning File Readahead
31.2.6.2 Tuning Directory Statahead
31.2.8 OSS Asynchronous Journal Commit
31.2.12 Setting MDS and OSS Thread Counts
31.3.1 RPC Information for Other OBD Devices
31.3.1.1 Interpreting OST Statistics
31.3.1.2 Interpreting MDT Statistics
33. Lustre Programming Interfaces
33.1.2.1 Primary and Secondary Groups
34. Setting Lustre Properties in a C Program (llapi)
34.6 Example Using the llapi Library
35. Configuration Files and Module Parameters
35.2.2 SOCKLND Kernel TCP/IP LND
36. System Configuration Utilities
36.5 ll_recover_lost_found_objs
36.19 Additional System Configuration Utilities
36.19.1 Application Profiling Utilities
36.19.2 More /proc Statistics for Application Profiling
36.19.3 Testing / Debugging Utilities
Copyright © 2011, Oracle and/or its affiliates. All rights reserved.