CHAPTER 22
Lustre Monitoring
This chapter provides information on monitoring Lustre. It covers the Lustre Monitoring Tool, the Red Hat Cluster Manager, the Lustre SNMP module, CollectL, and other monitoring options.
The Lustre Monitoring Tool (LMT[1]) is a Python-based, distributed system that provides a top-like display of activity on server-side nodes[2] (MDS, OSS and portals routers) on one or more Lustre file systems. For more information on LMT, including the setup procedure, see:
LMT questions can be directed to:
The Red Hat Cluster Manager provides high availability features that are essential for data integrity, application availability and uninterrupted service under various failure conditions. You can use the Cluster Manager to test MDS/OST failure in Lustre clusters.
To use Cluster Manager to test MDS failover, specific hardware is required: a compute node, OSTs and two machines (to act as the active and failover MDSs). Both MDS nodes must be able to see the same shared storage, so you need to prepare a shared disk for the Cluster Manager and the MDSs. Several RPM packages are also required[3], along with certain configuration changes.
For more information on the Cluster Manager (bundled in the Red Hat Cluster Suite), see the Red Hat Cluster Suite. Supporting documentation is available in the Red Hat Cluster Suite Overview.
For more information on installing and configuring Cluster Manager for Lustre failover, and testing MDS failover, see Cluster Manager.
Lustre has a native SNMP module, which enables you to use various standard SNMP monitoring packages (any package that uses RRDTool as a backend) to track performance. For more information on installing, building and using the SNMP module, see Lustre SNMP Module.
CollectL is another tool that can be used to monitor Lustre. You can run CollectL on a Lustre system that has any combination of MDSs, OSTs and clients. The collected data can be written to a file for continuous logging and played back at a later time. It can also be converted to a format suitable for plotting.
For more information about CollectL, see:
http://collectl.sourceforge.net
Lustre-specific documentation is also available. See:
http://collectl.sourceforge.net/Tutorial-Lustre.html
Another option is to script a simple monitoring solution that looks at various reports from ifconfig, as well as the procfs files generated by Lustre.
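As an illustration of the scripted approach, the following Python sketch takes two snapshots of Lustre procfs stats files and prints the counters that changed between them. It is a minimal example under stated assumptions, not a supported tool: the glob pattern /proc/fs/lustre/*/*/stats and the line layout "<name> <count> samples [<unit>] ..." are assumptions that may not match every Lustre version, and the function names (parse_stats, diff_counters, read_all) are hypothetical helpers introduced here.

```python
import glob
import time


def parse_stats(text):
    """Return {counter_name: sample_count} for one stats file's contents.

    Assumes counter lines look like "<name> <count> samples [<unit>] ...".
    Lines that do not match (for example a snapshot_time line) are skipped.
    """
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "samples" and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


def diff_counters(before, after):
    """Return how much each counter in `after` grew relative to `before`."""
    return {name: after[name] - before.get(name, 0) for name in after}


def read_all(pattern):
    """Parse every readable file matching `pattern`; unreadable files are skipped."""
    snapshots = {}
    for path in glob.glob(pattern):
        try:
            with open(path) as stats_file:
                snapshots[path] = parse_stats(stats_file.read())
        except OSError:
            continue
    return snapshots


if __name__ == "__main__":
    pattern = "/proc/fs/lustre/*/*/stats"  # assumed location; adjust for your system
    interval = 5  # seconds between the two snapshots

    before = read_all(pattern)
    time.sleep(interval)
    after = read_all(pattern)

    for path, counters in sorted(after.items()):
        delta = diff_counters(before.get(path, {}), counters)
        # Show only counters that changed during the interval.
        active = {name: value for name, value in delta.items() if value}
        print(path, active)
```

Keeping the parsing separate from the file access means the same functions can be run against saved copies of stats files, which makes the script easy to check on a machine without Lustre installed.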
Copyright © 2010, Oracle and/or its affiliates. All rights reserved.