Lustre File System
Operations Manual - Version 1.8
821-0035-12
Contents
1.1 Introducing the Lustre File System
1.2.1 Lustre Networking (LNET)
1.4 Files in the Lustre File System
1.4.1 Lustre File System and Striping
1.7 Lustre Failover and Rolling Upgrades
2. Understanding Lustre Networking
2.3 Designing Your Lustre Network
2.3.1 Identify All Lustre Networks
2.3.2 Identify Nodes to Route Between Networks
2.3.3 Identify Network Interfaces to Include/Exclude from LNET
2.3.4 Determine Cluster-wide Module Configuration
2.3.5 Determine Appropriate Mount Parameters for Clients
2.4.1.2 OFED InfiniBand Options
2.4.2 Module Parameters - Routing
2.5 Starting and Stopping LNET
3.1 Preparing to Install Lustre
3.1.1 Supported Operating System, Platform and Interconnect
3.1.2 Required Lustre Software
3.1.3 Required Tools and Utilities
3.1.4 (Optional) High-Availability Software
3.1.6 Environmental Requirements
3.1.7.1 MDS Memory Requirements
3.1.7.2 OSS Memory Requirements
3.2 Installing Lustre from RPMs
3.3 Installing Lustre from Source Code
3.3.1.1 Introducing the Quilt Utility
3.3.1.2 Get the Lustre Source and Unpatched Kernel
3.3.2 Create and Install the Lustre Packages
3.3.3 Installing Lustre with a Third-Party Network Stack
4.1 Configuring the Lustre File System
4.1.0.1 Simple Lustre Configuration Example
4.1.1 Scaling the Lustre File System
4.2 Additional Lustre Configuration
4.3 Basic Lustre Administration
4.3.1 Specifying the File System Name
4.3.5 Working with Inactive OSTs
4.3.6 Finding Nodes in the Lustre File System
4.3.7 Mounting a Server Without Lustre Service
4.3.8 Specifying Failout/Failover Mode for OSTs
4.3.9 Running Multiple Lustre File Systems
4.3.10 Setting and Retrieving Lustre Parameters
4.3.10.1 Setting Parameters with mkfs.lustre
4.3.10.2 Setting Parameters with tunefs.lustre
4.3.10.3 Setting Parameters with lctl
4.3.10.4 Reporting Current Parameter Values
4.3.11 Regenerating Lustre Configuration Logs
4.3.13 Removing and Restoring OSTs
4.3.13.1 Removing an OST from the File System
4.3.13.2 Restoring an OST in the File System
4.3.15 Determining Which Machine is Serving an OST
4.4 More Complex Configurations
4.5.1 Changing the Address of a Failover Node
5.1 Introduction to Service Tags
5.2.2 Discovering and Registering Lustre Components
5.2.3 Service Tag Registration Information
6. Configuring Lustre - Examples
6.1.1 Lustre with Combined MGS/MDT
6.1.1.2 Configuration Generation and Application
6.1.2 Lustre with Separate MGS and MDT
6.1.2.2 Configuration Generation and Application
6.1.2.3 Configuring Lustre with a CSV File
7. More Complicated Configurations
7.3 Load Balancing with InfiniBand
7.3.1 Setting Up modprobe.conf for Load Balancing
7.4 Multi-Rail Configurations with LNET
8.1.2 Types of Failover Configurations
8.2 Failover Functionality in Lustre
8.2.1 MDT Failover Configuration (Active/Passive)
8.2.2 OST Failover Configuration (Active/Active)
8.3 Configuring and Using Heartbeat with Lustre Failover
8.3.1 Creating a Failover Environment
8.3.1.1 Power Management Software
8.3.2 Setting up the Heartbeat Software
8.3.2.3 (Optional) Migrating a Heartbeat Configuration (v1 to v2)
8.3.3.2 Switching Resources Between Nodes
9.1.1.1 Administrative and Operational Quotas
9.1.2 Creating Quota Files and Quota Administration
9.1.4 Known Issues with Quotas
9.1.4.1 Granted Cache and Quota Limits
9.1.5.1 Interpreting Quota Statistics
10.1 Considerations for Backend Storage
10.1.1 Selecting Storage for the MDS or OSTs
10.1.2 Reliability Best Practices
10.1.4 Formatting Options for RAID Devices
10.1.4.1 Creating an External Journal
10.1.5 Handling Degraded RAID Arrays
10.2 Insights into Disk Performance Measurement
10.3 Lustre Software RAID Support
10.3.0.1 Enabling Software RAID on Lustre
11.2 Lustre Setup with Kerberos
11.2.1 Configuring Kerberos for Lustre
11.2.1.1 Kerberos Distributions Supported on Lustre
11.2.1.2 Preparing to Set Up Lustre with Kerberos
11.2.1.3 Configuring Lustre for Kerberos
11.2.1.5 Setting the Environment
11.2.2 Types of Lustre-Kerberos Flavors
11.2.2.4 Specifying Security Flavors
11.2.2.6 Rules, Syntax and Examples
11.2.2.7 Authenticating Normal Users
12.3 Using Lustre with Multiple NICs versus Bonding NICs
12.4 Bonding Module Parameters
12.6 Configuring Lustre with Bonding
13. Upgrading and Downgrading Lustre
13.3 Upgrading Lustre 1.6.x to 1.8.x
13.3.1 Performing a Complete File System Upgrade
13.3.2 Performing a Rolling Upgrade
13.4 Upgrading Lustre 1.8.x to the Next Minor Version
13.5 Downgrading from Lustre 1.8.x to 1.6.x
13.5.1 Performing a Complete File System Downgrade
13.5.2 Performing a Rolling Downgrade
14.1 Installing the Lustre SNMP Module
14.2 Building the Lustre SNMP Module
14.3 Using the Lustre SNMP Module
15.2 Backing up a Device (MDS or OST)
15.3.1 Backing up Extended Attributes
15.4 Restoring from a File-level Backup
15.5 Using LVM Snapshots with Lustre
15.5.1 Creating an LVM-based Backup File System
15.5.2 Backing up New/Changed Files to the Backup File System
15.5.3 Creating Snapshot Volumes
15.5.4 Restoring the File System From a Snapshot
15.5.6 Changing Snapshot Volume Size
16.2.1 POSIX Installation Using a Quick Start Version
16.3 Building and Running a POSIX Compliance Test Suite on Lustre
16.3.1 Building the Test Suite from Scratch
16.3.2 Running the Test Suite Against Lustre
16.4 Isolating and Debugging Failures
18.1 Lustre I/O Kit Description and Prerequisites
18.1.2 Prerequisites to Using an I/O Kit
18.2.2.1 Running obdfilter_survey Against a Local Disk
18.2.2.2 Running obdfilter_survey Against a Network
18.2.2.3 Running obdfilter_survey Against a Network Disk
18.4.1 Basic Concepts of LNET Self-Test
18.4.2 LNET Self-Test Commands
19.2.7 Gaps in the Replay Sequence
19.3.2 Reconstruction of Open Replies
19.5 Recovering from Errors or Corruption on a Backing File System
19.6 Recovering from Corruption in the Lustre File System
19.6.1 Working with Orphaned Objects
Part III Lustre Tuning, Monitoring and Troubleshooting
20.1.1 OSS Service Thread Count
20.1.1.1 Optimizing the Number of Service Threads
20.1.2 MDS Service Thread Count
20.2.1 Transmit and Receive Buffer Size
20.3 Options for Formatting the MDT and OSTs
20.4 Overriding Default Formatting Options
20.4.1 Number of Inodes for the MDS
20.4.3 Number of Inodes for an OST
20.5 Large-Scale Tuning for Cray XT and Equivalents
21.1.1 Locating Lustre File Systems and Servers
21.1.3.1 Configuring Adaptive Timeouts
21.1.3.2 Interpreting Adaptive Timeouts Information
21.1.5 Free Space Distribution
21.1.5.1 Managing Stripe Allocation
21.2.1 Client I/O RPC Stream Tunables
21.2.2 Watching the Client RPC Stream
21.2.3 Client Read-Write Offset Survey
21.2.4 Client Read-Write Extents Survey
21.2.5 Watching the OST Block I/O Stream
21.2.6 Using File Readahead and Directory Statahead
21.2.6.1 Tuning File Readahead
21.2.6.2 Tuning Directory Statahead
21.2.8 OSS Asynchronous Journal Commit
21.2.12 Setting MDS and OSS Thread Counts
21.3.1 RPC Information for Other OBD Devices
21.3.1.1 Interpreting OST Statistics
21.3.1.3 Interpreting MDT Statistics
23.3 Common Lustre Problems and Performance Tips
23.3.1 Recovering from an Unavailable OST
23.3.2 Write Performance Better Than Read Performance
23.3.3 OST Object is Missing or Damaged
23.3.5 Identifying a Missing OST
23.3.6 Improving Lustre Performance When Working with Small Files
23.3.9 How to Fix a Bad LAST_ID on an OST
23.3.10 Reclaiming Reserved Disk Space
23.3.11 Considerations in Connecting a SAN with Lustre
23.3.12 Handling/Debugging "Bind: Address already in use" Error
23.3.13 Replacing An Existing OST or MDS
23.3.14 Handling/Debugging Error "-28"
23.3.15 Triggering Watchdog for PID NNN
23.3.16 Handling Timeouts on Initial Lustre Setup
23.3.17 Handling/Debugging "LustreError: xxx went back in time"
23.3.18 Lustre Error: "Slow Start_Page_Write"
23.3.19 Drawbacks in Doing Multi-client O_APPEND Writes
23.3.20 Slowdown Occurs During Lustre Startup
23.3.21 Log Message ‘Out of Memory’ on OST
23.3.22 Number of OSTs Needed for Sustained Throughput
23.3.23 Setting SCSI I/O Sizes
23.3.24 Identifying Which Lustre File an OST Object Belongs To
24.1.1 Format of Lustre Debug Messages
24.1.2 Lustre Debug Messages Buffer
24.2 Tools for Lustre Debugging
24.2.1 Debug Daemon Option to lctl
24.2.1.1 lctl Debug Daemon Commands
24.2.2 Controlling the Kernel Debug Log
24.2.5 Printing to /var/log/messages
24.2.8 Adding Debugging to the Lustre Source Code
24.3 Troubleshooting with strace
24.4.1 Determine the Lustre UUID of an OST
25.1.2 Disadvantages of Striping
25.2 Setting and Retrieving Striping Information
25.2.2 Changing Striping for a Subdirectory
25.2.3 Using a Specific Striping Pattern/File Layout for a Single File
25.2.4 Creating a File on a Specific OST
25.3.1 Checking File System Free Space
25.3.2 Using Stripe Allocations
25.3.5 Adjusting the Weighting Between Free Space and Location
25.4.1 Checking File System Usage
25.4.2 Taking a Full OST Offline
25.4.3 Migrating Data within a File System
25.5 Creating and Managing OST Pools
25.5.1.1 Using the lfs Command with OST Pools
25.5.2 Tips for Using OST Pools
25.6.1 Making File System Objects Immutable
25.7.1.1 Changing Checksum Algorithms
26.2.1 Configuring Root Squash
26.2.2 Enabling and Tuning Root Squash
27.1 Adding an OST to a Lustre File System
27.2 A Simple Data Migration Script
27.3 Adding Multiple SCSI LUNs on Single HBA
27.4 Failures Running a Client and OST on the Same Machine
27.5 Improving Lustre Metadata Performance While Using Large Directories
29. Lustre Programming Interfaces (man2)
29.1.2.1 Primary and Secondary Groups
30. Setting Lustre Properties (man3)
31. Configuration Files and Module Parameters (man5)
31.2.2 SOCKLND Kernel TCP/IP LND
31.2.8 Portals LND (Catamount)
32. System Configuration Utilities (man8)
32.5 Additional System Configuration Utilities
32.5.3 Utilities to Manage Large Clusters
32.5.4 Application Profiling Utilities
32.5.5 More /proc Statistics for Application Profiling
32.5.6 Testing / Debugging Utilities
32.5.14 ll_recover_lost_found_objs
33.4 Maximum Number of OSTs and MDTs
33.5 Maximum Number of Clients
33.6 Maximum Size of a File System
33.8 Maximum Number of Files or Subdirectories in a Single Directory
33.10 Maximum Length of a Filename and Pathname
33.11 Maximum Number of Open Files for Lustre File Systems
Copyright © 2010, Oracle and/or its affiliates. All rights reserved.