Operating System Configuration Guidelines For Lustre

==== Using YUM to Manage Software Distribution ====
To streamline the installation process, the Lustre packages can be copied to an HTTP server on the network and incorporated into local YUM repositories.
Using YUM repositories simplifies the distribution of software packages to computers, aiding provisioning and configuration automation, and simplifying tasks such as auditing and updating.
The following instructions can be used to help establish a web server as a YUM repository host for the Lustre packages. The examples make use of the default directory structure for an Apache HTTP server on RHEL / CentOS 7. NGINX and other web servers may use different directory structures to store content.
<ol>
<li>Create a temporary YUM repository definition. This will be used to assist with the initial acquisition of Lustre and related packages.
<pre style="overflow-x:auto;">
cat >/tmp/lustre-repo.conf <<\__EOF
[lustre-server-2.10.0]
name=lustre-server
baseurl=https://downloads.hpdd.intel.com/public/lustre/lustre-2.10.0/el7/server
# exclude=*debuginfo*
gpgcheck=0
[lustre-client-2.10.0]
name=lustre-client
baseurl=https://downloads.hpdd.intel.com/public/lustre/lustre-2.10.0/el7/client
# exclude=*debuginfo*
gpgcheck=0
[e2fsprogs-wc]
name=e2fsprogs-wc
baseurl=https://downloads.hpdd.intel.com/public/e2fsprogs/latest/el7
# exclude=*debuginfo*
gpgcheck=0
__EOF
</pre>
<p>'''Note:''' The above example references the Lustre version explicitly. To always pull the latest version, replace <code>lustre-2.10.0</code> with <code>latest-release</code> in the URLs and remove the version numbers from the repository section header. Always pull in the latest <code>e2fsprogs</code> package unless directed otherwise.
</p>
<p>To reduce the size of the download when testing, uncomment the <code>exclude</code> lines. This omits the download of the debuginfo packages, which can be large. It is nevertheless generally a good idea to download these packages as well, so that they are readily available to aid debugging.
</p>
</li>
<li>Use the <code>reposync</code> command (distributed in the <code>yum-utils</code> package) to download mirrors of the Lustre repositories to the web server host:
<pre style="overflow-x:auto;">
mkdir -p /var/www/html/repo
cd /var/www/html/repo
reposync -c /tmp/lustre-repo.conf -n \
-r lustre-server-2.10.0 \
-r lustre-client-2.10.0 \
-r e2fsprogs-wc
</pre>
</li>
<li>Create the repository metadata:
<pre style="overflow-x:auto;">
cd /var/www/html/repo
for i in e2fsprogs-wc lustre-client-2.10.0 lustre-server-2.10.0; do
(cd $i && createrepo .)
done
</pre>
</li>
<li>Create a YUM repository definition file. The following script creates a file containing repository definitions for the Lustre packages, and stores it in the web server static content directory. This makes it easy to distribute to the Lustre servers and clients.
<p>Review the content and adjust according to the requirements of the target environment. Run the script on the web server host:
</p>
<pre style="overflow-x:auto;">
hn=`hostname --fqdn`
cat >/var/www/html/lustre-2.10.0.repo <<__EOF
[lustre-server-2.10.0]
name=lustre-server
baseurl=https://$hn/repo/lustre-server-2.10.0
enabled=0
gpgcheck=0
proxy=_none_
[lustre-client-2.10.0]
name=lustre-client
baseurl=https://$hn/repo/lustre-client-2.10.0
enabled=0
gpgcheck=0
[e2fsprogs-wc]
name=e2fsprogs-wc
baseurl=https://$hn/repo/e2fsprogs-wc
enabled=0
gpgcheck=0
__EOF
</pre>
<p>Make sure that the <code>$hn</code> variable matches the host name that will be used by the Lustre servers and clients to access the YUM web server.
</p>
<p>The use of version numbers in the repository definitions is a matter of preference and can be altered. As new versions of Lustre are released, these version numbers will need to be changed.
</p>
</li>
<li>Apply any configuration changes that may be necessary for the web server to incorporate the new repository directories. The configuration may need to be reloaded, or the web service restarted, when done.
</li>
<li>Copy the Lustre repository definition file into the directory <code>/etc/yum.repos.d/</code> on each of the Lustre servers and clients. Utilities such as <code>curl</code> and <code>wget</code> can be used to retrieve the file from the web server as part of a configuration management system rule/promise or during system provisioning, as shown in the sketch after this list.
</li>
</ol>
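
The repository definition file can then be pulled down during provisioning and the repositories verified from a client. The following is a minimal sketch, assuming the web server is reachable at the placeholder name <code>web-server.example.com</code>:

<pre style="overflow-x:auto;">
# Retrieve the repository definition from the web server
# (the host name is a placeholder)
curl -o /etc/yum.repos.d/lustre-2.10.0.repo \
  https://web-server.example.com/lustre-2.10.0.repo

# Confirm that the new repositories are visible to YUM
# (the repositories are defined with enabled=0, so enable them explicitly)
yum --disablerepo="*" --enablerepo="lustre-*" --enablerepo="e2fsprogs-wc" repolist
</pre>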



This guide does not provide OS management instructions except as they directly relate to the installation and management of Lustre software. Refer to the documentation supplied with the OS for the details of what is required. The guide has been developed using RHEL 7 as the base operating system platform, and all examples have been taken from the same OS unless otherwise stated.

Lustre servers and clients can be configured from a common operating system base. A minimal installation consisting of the @core and @base package groups is the recommended starting point for both server and client OS installations running RHEL or CentOS.

A kickstart template for the base OS is included in Appendix B: RHEL / CentOS Kickstart Template.

With modern package management systems such as YUM and DNF, package updates and dependency resolution are automatically managed, further simplifying the installation process. It is recommended that the operating system installation be as small and simple as possible, given that additional packages will automatically be installed through dependency resolution when the Lustre packages are installed.
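
As an illustration, the corresponding package selection in a kickstart file is short. This is a sketch only; refer to Appendix B for the complete template:

<pre style="overflow-x:auto;">
# Minimal package selection for the Lustre server and client base OS
%packages
@core
@base
%end
</pre>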

== Network Addresses ==

Lustre servers must have a globally unique and persistent network identifier, which is derived from the IPv4 addresses of the interfaces used for Lustre network communications. The network interfaces for the Lustre servers must therefore be given static IPv4 address allocations. Lustre clients can be assigned static IP addresses or use DHCP. Lustre does not support the use of IPv6 addresses.
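
For example, on RHEL / CentOS 7 a static IPv4 address can be assigned through the interface configuration file. This is a minimal sketch; the interface name and addresses are placeholders:

<pre style="overflow-x:auto;">
# /etc/sysconfig/network-scripts/ifcfg-eth0
# (interface name and addresses are examples only)
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.10.0.10
PREFIX=24
</pre>

Once LNet is running, the <code>lctl list_nids</code> command reports the network identifier derived from the interface address.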

== Date and Time Synchronization with NTP ==

While not a strict requirement of Lustre itself, time synchronization across the cluster is very important for overall consistency and coherence. Many applications and file management tools rely on accurate, or at least consistent, time-stamp information. Using NTP to keep time synchronized across the network ensures that time stamps for files are read and written consistently, so that applications get accurate information regardless of where they run in the cluster.

In addition to maintaining consistency in the time stamp records for metadata inodes and file objects, ensuring consistent time representation across a distributed IT infrastructure greatly aids with forensic tasks, such as application debugging or investigations into system failure. When the hosts all report the same time and date, it is much easier to establish correlations between events reported in the logs for the hosts.
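
As an illustration, time synchronization can be enabled on RHEL / CentOS 7 with chrony, the NTP implementation installed by default. A minimal sketch, in which the time server name is a placeholder for a local time source:

<pre style="overflow-x:auto;">
# Add a local time source to the chrony configuration
# (the server name is a placeholder)
echo "server ntp1.example.com iburst" >> /etc/chrony.conf

# Enable and start the service, then verify the time sources
systemctl enable chronyd
systemctl start chronyd
chronyc sources
</pre>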

== Identity Management ==

Identity management is an important component of IT infrastructure and cannot be overlooked in Lustre environments. Users and groups are managed by the host operating system, not by Lustre, and all UIDs and GIDs must be made globally consistent across all Lustre clients and metadata servers. Object storage servers do not have the same requirement, because they do not need to perform permissions checking for Lustre file access.

Any identity services supported by the C library Name Service Switch (NSSwitch) will be compatible with Lustre installations. It is the administrator’s choice whether the UNIX identity databases (passwd, shadow, group and gshadow) are used, or a centralized system such as LDAP.
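
For example, adding an LDAP directory to the lookup order is a matter of editing <code>/etc/nsswitch.conf</code>. A minimal sketch, assuming an LDAP identity service is available:

<pre style="overflow-x:auto;">
# /etc/nsswitch.conf: consult the local files first, then LDAP
passwd:  files ldap
group:   files ldap
shadow:  files ldap
</pre>

Running <code>getent passwd</code> for a given user on each host is a simple way to confirm that identities resolve consistently across the cluster.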

== SELinux and Firewall Configuration ==

For Lustre versions prior to 2.8, and for Intel® Enterprise Edition for Lustre* software versions older than 3.0.0.0, SELinux is not supported and must be disabled across all servers and clients participating in a Lustre file system.

For ease of installation and management, it is suggested that firewall software be disabled. If there is a strong requirement for the operating system firewall to remain in place, make sure that port 988 is open to facilitate LNet communications over TCP/IP networks, and that the NTP port (default: UDP/123) is also open to allow time synchronization.

On Lustre servers using a Pacemaker and Corosync HA framework, ports must be opened to enable Corosync communications. For RHEL/CentOS servers, a port must also be opened to support the pcsd helper daemon for the PCS cluster management software. Instructions on how to do this are provided in Red Hat Enterprise Linux HA Framework Configuration for Two-Node Lustre Cluster. Please refer to the documentation provided by the operating system vendor for further information on the configuration of high availability software on systems where the firewall is enabled.

Firewalls and SELinux add complexity and overhead to installations. If communication issues appear when setting up an environment, disabling these features as a first step in debugging will often save time in identifying the root cause.
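
Where the firewall must remain enabled, the required ports can be opened with firewalld, the default firewall service on RHEL / CentOS 7. The following is a sketch, assuming the default zone is in use:

<pre style="overflow-x:auto;">
# Open the LNet port for TCP/IP networks and allow NTP traffic
firewall-cmd --permanent --add-port=988/tcp
firewall-cmd --permanent --add-service=ntp

# On HA servers, the predefined high-availability service
# covers the Corosync and pcsd ports
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload
</pre>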

== Operating System Software Package Management ==

=== Red Hat Enterprise Linux (RHEL) and CentOS ===

Red Hat Enterprise Linux and CentOS both rely heavily on the YUM package manager to install software. Software repositories can be local to the host in the form of a directory tree or a locally-mounted DVD-ROM or ISO, or made accessible from a network server, usually via the HTTP[S] protocol. Both Red Hat and CentOS maintain repositories accessible via the Internet. CentOS, being a free distribution with no subscription support, provides access to these repositories free of charge. Systems running Red Hat software require an active subscription to the Red Hat Content Delivery Network.

'''Note:''' The RHEL High Availability Add-On entitlement is required for Lustre systems that will make use of the Pacemaker and Corosync HA framework software on Red Hat supported systems.

At a minimum, the following subscriptions are required for Lustre systems based on RHEL 7:

<pre style="overflow-x:auto;">
[root@rh7z-mds1 ~]# subscription-manager repos --list-enabled
+----------------------------------------------------------+
    Available Repositories in /etc/yum.repos.d/redhat.repo
+----------------------------------------------------------+
Repo ID:   rhel-7-server-rpms
Repo Name: Red Hat Enterprise Linux 7 Server (RPMs)
Repo URL:  https://cdn.redhat.com/content/dist/rhel/server/7/$releasever/$basearch/os
Enabled:   1

Repo ID:   rhel-ha-for-rhel-7-server-rpms
Repo Name: Red Hat Enterprise Linux High Availability (for RHEL 7 Server) (RPMs)
Repo URL:  https://cdn.redhat.com/content/dist/rhel/server/7/$releasever/$basearch/highavailability/
           os
Enabled:   1
</pre>

To register a subscription entitlement for a server, use the <code>subscription-manager</code> command. For example:

<pre style="overflow-x:auto;">
subscription-manager register --autosubscribe
</pre>

This will automatically select the most suitable subscription for the registered server based on the entitlements granted to the licensee. For more information on managing Red Hat software subscriptions, see the relevant product documentation for the operating system release.

The <code>subscription-manager</code> command can also be used to configure specific RHEL package repositories:

<pre style="overflow-x:auto;">
subscription-manager repos --enable <repo name>
</pre>

For example:

<pre style="overflow-x:auto;">
subscription-manager repos \
  --enable rhel-ha-for-rhel-7-server-rpms
</pre>

Disabling a repository is achieved by using the <code>--disable</code> option in place of <code>--enable</code>:

<pre style="overflow-x:auto;">
subscription-manager repos --disable <repo name>
</pre>

To get a list of the available RHEL repositories for a given subscription, use the following command:

<pre style="overflow-x:auto;">
subscription-manager repos --list
</pre>

To get the list of currently enabled repositories:

<pre style="overflow-x:auto;">
subscription-manager repos --list-enabled
</pre>

=== SUSE Linux Enterprise Server (SLES) ===

This documentation was originally developed to provide instructions for Lustre for use with Red Hat Enterprise Linux (RHEL) or CentOS. SUSE Linux Enterprise Server (SLES), like Red Hat Enterprise Linux, uses an RPM-based package management system, although there are some significant differences between the two platforms. SLES configuration is not currently incorporated into the documentation.

The general structure of the process for managing software installation and configuration for SLES is similar to that of RHEL, but the tools often differ. In particular, SLES makes use of a command called <code>zypper</code> in place of Red Hat's <code>yum</code>; and Red Hat's <code>pcs</code> application is replaced by <code>crmsh</code> on SLES when managing high-availability clusters.
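
For example, a package installation that would be performed with <code>yum</code> on RHEL is expressed as follows with <code>zypper</code>. A brief sketch, in which the package name is a placeholder:

<pre style="overflow-x:auto;">
# Refresh the repository metadata, then install a package
zypper refresh
zypper install lustre-client
</pre>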

SUSE servers require installation of the SUSE Linux Enterprise High Availability Extension to enable HA failover configuration and management of Lustre services.

'''Note:''' SUSE Linux will mark self-compiled kernel modules as unsupported by the operating system. By default, SLES will refuse to load kernel modules that do not have the supported flag set. The following is an example of the error that is returned when attempting to load an unsupported kernel module:

<pre style="overflow-x:auto;">
sl12sp2-b:~ # modprobe zfs
modprobe: ERROR: module 'zavl' is unsupported
modprobe: ERROR: Use --allow-unsupported or set allow_unsupported_modules 1 in
modprobe: ERROR: /etc/modprobe.d/10-unsupported-modules.conf
modprobe: ERROR: could not insert 'zfs': Operation not permitted
sl12sp2-b:~ # vi /etc/modprobe.d/10-unsupported-modules.conf
</pre>

To allow self-compiled kernel modules to be loaded on a SLES OS, add the following entry to <code>/etc/modprobe.d/10-unsupported-modules.conf</code>:

<pre style="overflow-x:auto;">
allow_unsupported_modules 1
</pre>

More information is available from the SUSE documentation.

== Device Drivers for High Performance Network Fabrics ==

For the most part, the documentation assumes that the machine OS is using the device driver software supplied by the operating system vendor and does not make use of specific third-party device drivers for network interfaces, storage, or other hardware. There are circumstances where the networking software stack provided by the operating system will need to be replaced by a specific vendor version. This requirement is most common when working with InfiniBand network fabrics, which use specific versions of the OFED software distribution from either the OpenFabrics Alliance or InfiniBand vendors. In this case, the Lustre network drivers need to be recompiled to make use of the third-party network drivers.

Instructions for compiling Lustre from source are available in the Compiling Lustre wiki page, including how to compile Lustre with support for third party network device drivers (InfiniBand and Intel OPA).
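
As a rough outline of the steps involved, the build is configured against the vendor OFED tree before the packages are created. This is a sketch only; the <code>--with-o2ib</code> path depends on the OFED distribution, and the Compiling Lustre page is the authoritative reference:

<pre style="overflow-x:auto;">
# Rebuild Lustre against a third-party OFED stack
# (the source path given to --with-o2ib is an example only)
cd lustre-release
sh autogen.sh
./configure --with-o2ib=/usr/src/ofa_kernel/default
make rpms
</pre>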