Creating a Framework for High Availability with Pacemaker

Introduction
High availability, usually abbreviated to "HA", is a term used to describe systems and software frameworks that are designed to preserve application service availability even in the event of failure of a component of the system. The failed component might be software or hardware; the HA framework will attempt to respond to the failure such that the applications running within the framework continue to operate correctly.

While the number of discrete failure scenarios that might be catalogued is potentially very large, they generally fall into one of a very small number of categories:


 * 1) Failure of the application providing the service
 * 2) Failure of a software dependency upon which the application relies
 * 3) Failure of a hardware dependency upon which the application relies
 * 4) Failure of an external service or infrastructure component upon which the application or supporting framework relies

HA systems protect application availability by grouping sets of servers and software into cooperative units or clusters. HA clusters are typically groups of two or more servers, each running their own operating platform, that communicate with one another over a network connection. HA clusters will often have multi-ported, shared external storage, with each server in the cluster connected over redundant storage paths to the storage hardware.

A cluster software framework manages communication between the cluster participants (nodes). The framework will communicate the health of system hardware and application services between the nodes in the cluster and provide means to manage services and nodes, as well as react to changes in the cluster environment (e.g., server failure).

HA systems are characterized as typically having redundancy in the hardware configuration: two or more servers, each with two or more storage IO paths and often two or more network interfaces configured using bonding or link aggregation. Storage systems will often have similar redundancy characteristics, such as RAID data protection.

Measurements of availability are normally applied to the availability of the applications running on the HA cluster, rather than the hosting infrastructure. For example, loss of a physical server due to a component failure would trigger a failover or migration of the services that the server was providing to another node in the cluster. In this scenario, the outage duration would be the measure of time taken to migrate the applications to another node and restore the applications to running state. The service may be considered degraded until the failed component is repaired and restored, but the HA framework has avoided an ongoing outage.

On systems running an operating system based on Linux, the most commonly used HA cluster framework comprises two software applications used in combination: Pacemaker – to provide resource management – and Corosync – to provide cluster communications and low-level management, such as membership and quorum. Pacemaker can trace its genesis back to the original Linux HA project, called Heartbeat, while Corosync is derived from the OpenAIS project.

Pacemaker and Corosync are widely supported across the major Linux distributions, including Red Hat Enterprise Linux and SUSE Linux Enterprise Server. Red Hat Enterprise Linux version 6 used a very complex HA solution incorporating several other tools, and while this has been simplified since the release of RHEL 6.4, there is still some legacy software in the framework.

With the release of RHEL 7, the high-availability framework from Red Hat has been rationalized around Pacemaker and Corosync v2, simplifying the software environment. Red Hat also provides a command line tool called PCS (Pacemaker and Corosync Shell) that is available for both RHEL version 6 and version 7. PCS provides a consistent system management command interface for the high availability software and abstracts the underlying software implementation.

Note: Lustre does not absolutely need to be incorporated into an HA software framework such as Pacemaker, but doing so enables the operating platform to automatically make decisions about failover/migration of services without operator intervention. HA frameworks also help with general maintenance and management of application resources.

Red Hat Enterprise Linux and CentOS
Red Hat Enterprise Linux version 6 has a complex history with regard to the development and provision of HA software. Prior to version 6.4, Red Hat's high availability software was complex and difficult to install and maintain. With the release of RHEL 6.4 and in all subsequent RHEL 6 updates, this has been consolidated around three principal packages: Pacemaker, Corosync version 1, and CMAN. The software stack was further simplified in RHEL 7 to just Pacemaker and Corosync version 2.

Red Hat EL 6 HA clusters use Pacemaker to provide cluster resource management (CRM), while CMAN is used to provide cluster membership and quorum services. Corosync provides communications but no other services. CMAN is unique to Red Hat Enterprise Linux and is part of an older framework. In RHEL 7, CMAN is no longer required and its functionality is entirely accommodated by Corosync version 2, but for any HA clusters running RHEL 6, Red Hat stipulates the use of CMAN in Pacemaker clusters.

The PCS application (Pacemaker and Corosync Shell) was also introduced in RHEL 6.4 and is available in current releases of both RHEL 6 and 7. PCS simplifies the installation and configuration of HA clusters in Red Hat.

Hardware and Server Infrastructure Prerequisites
This article will demonstrate how to configure a Lustre high-availability building block using two servers and a dedicated external storage array that is connected to both servers. The two-node building block designs for metadata servers and object storage servers provide a suitable basis for deployment of a production-ready, high-availability Lustre parallel file system cluster.

Figure 1 shows a blueprint for typical high-availability Lustre server building blocks, one for the metadata and management services, and one for object storage.



Each server depicted in Figure 1 requires three network interfaces:
 * 1) A dedicated cluster communication network between paired servers, used as a Corosync communications ring. This can be a cross-over / point-to-point connection, or can be made via a switch.
 * 2) A management network or public interface connection. This will be used by the HA cluster as an additional communications ring for Corosync.
 * 3) Public interface, used for connection to the high performance data network – this is the network from which Lustre services will normally be accessed by client computers

A variation on this architecture, not specifically covered in this guide, has a single Corosync communications ring made from two network interfaces that are configured into a bond on a private network. The bond is created per the operating system documented process, and then added to the Corosync configuration.

RHEL / CentOS
In addition to the prerequisites previously described for Lustre, the operating system requires installation of the HA software suite. It may also be necessary to enable optional repositories. For RHEL systems, the subscription-manager command can be used to enable the software entitlements for the HA software packages. For example:

 subscription-manager repos \
   --enable rhel-ha-for-rhel-7-server-rpms \
   --enable rhel-7-server-optional-rpms

or:

 subscription-manager repos \
   --enable rhel-ha-for-rhel-6-server-rpms \
   --enable rhel-6-server-optional-rpms

This step is not required for CentOS. Refer to the documentation for the operating system distribution for more complete information on enabling subscription entitlements.

RHEL / CentOS
<ol>
<li>Log in as the super-user on each of the servers in the proposed cluster and install the HA framework software:
<pre style="overflow-x:auto;">
yum -y install pcs pacemaker corosync fence-agents [cman]
</pre>
Note: The cman package is only required for RHEL/CentOS 6 servers.</li>
<li>On each server, add a user account to be used for cluster management and set a password for that account. The convention is to create a user account with the name hacluster. The hacluster user should have been installed as part of the package installation (the account is created during installation of the pacemaker package). PCS will make use of this account to facilitate cluster management: the hacluster account is used to authenticate the command line application, pcs, with the pcsd configuration daemon running on each cluster node (pcsd is used by the pcs application to manage distribution of commands and synchronize the cluster configuration between the nodes).

The following is taken from the pacemaker package postinstall script and shows the basic procedure for adding the hacluster account if it does not already exist:
<pre style="overflow-x:auto;">
getent group haclient >/dev/null || groupadd -r haclient -g 189
getent passwd hacluster >/dev/null || useradd -r -g haclient -u 189 -s /sbin/nologin -c "cluster user" hacluster
</pre></li>
<li>Set a password for the hacluster account. This must be set, and there is no default. Make the password the same on each cluster node:
<pre style="overflow-x:auto;">
passwd hacluster
</pre></li>
<li>Modify or disable the firewall software on each server in the cluster. According to Red Hat, the following ports need to be enabled:
<ul>
<li>TCP: ports 2224, 3121, 21064</li>
<li>UDP: ports 5405</li>
</ul>

In RHEL 7, the firewall software can be configured to permit cluster traffic as follows:
<pre style="overflow-x:auto;">
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --add-service=high-availability
</pre>

Verify the firewall configuration:
<pre style="overflow-x:auto;">
firewall-cmd --list-service
</pre>

Lustre also requires port 988 to be open for incoming connections, and ports 1021-1023 for outgoing connections.

Alternatively, disable the firewall completely.

For RHEL 7:
<pre style="overflow-x:auto;">
systemctl stop firewalld
systemctl disable firewalld
</pre>

And for RHEL 6:
<pre style="overflow-x:auto;">
chkconfig iptables off
chkconfig ip6tables off
service iptables stop
service ip6tables stop
</pre></li>
</ol>
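As an alternative to disabling iptables on RHEL 6, the cluster ports listed above could be opened individually. The following is a minimal sketch rather than a vendor-documented procedure: the function name print_ha_iptables_rules is hypothetical, and inserting the rules at the top of the INPUT chain is an assumption that should be adapted to the local firewall policy. The function only prints the commands so that they can be reviewed before being applied.

```shell
#!/bin/sh
# Print (do not run) iptables rules for the cluster ports named in this
# article: TCP 2224, 3121, 21064 and UDP 5405.
# Assumption: rules are inserted at the top of the INPUT chain.
print_ha_iptables_rules() {
    for port in 2224 3121 21064; do
        echo "iptables -I INPUT -p tcp --dport $port -j ACCEPT"
    done
    echo "iptables -I INPUT -p udp --dport 5405 -j ACCEPT"
}

print_ha_iptables_rules
```

If a non-default Corosync port is chosen when the cluster is created, the UDP rule would need to be adjusted to match. Once reviewed, the printed commands can be run as root and made persistent with service iptables save.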

Note: When working with hostnames in Pacemaker and Corosync, always use the fully qualified domain name to reference cluster nodes.

Configure the PCS Daemon
<ol>
<li>Start the Pacemaker configuration daemon, pcsd, on all servers:
<ul>
<li>RHEL 7:
<pre style="overflow-x:auto;">
systemctl start pcsd.service
systemctl enable pcsd.service
</pre></li>
<li>RHEL 6:
<pre style="overflow-x:auto;">
service pcsd start
chkconfig pcsd on
</pre></li>
</ul></li>
<li>Verify that the service is running:
<ul>
<li>RHEL 7: systemctl status pcsd.service</li>
<li>RHEL 6: service pcsd status</li>
</ul>

The following example is taken from a server running RHEL 7:
<pre style="overflow-x:auto;">
[root@rh7z-mds1 ~]# systemctl start pcsd.service
[root@rh7z-mds1 ~]# systemctl status pcsd.service
● pcsd.service - PCS GUI and remote configuration interface
   Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-04-13 01:30:52 EDT; 1min 11s ago
 Main PID: 29343 (pcsd)
   CGroup: /system.slice/pcsd.service
           ├─29343 /bin/sh /usr/lib/pcsd/pcsd start
           ├─29347 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/bin/ruby -I/usr/lib/pcsd /u...
           └─29348 /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb

Apr 13 01:30:50 rh7z-mds1 systemd[1]: Starting PCS GUI and remote configuration interface...
Apr 13 01:30:52 rh7z-mds1 systemd[1]: Started PCS GUI and remote configuration interface.
</pre></li>
<li>Set up PCS authentication by executing the following command on just one of the cluster nodes:
<pre style="overflow-x:auto;">
pcs cluster auth NODE1 NODE2 [...] -u hacluster
</pre>

For example:
<pre style="overflow-x:auto;">
[root@rh7z-mds1 ~]# pcs cluster auth \
>  rh7z-mds1.lfs.intl rh7z-mds2.lfs.intl \
>  -u hacluster
Password:
rh7z-mds2.lfs.intl: Authorized
rh7z-mds1.lfs.intl: Authorized
</pre></li>
</ol>

Create the Cluster Framework
The pcs cluster setup command syntax is comprehensive, but not all of the functionality is available for RHEL 6 clusters. For example, the syntax for configuring the redundant ring protocol (RRP) for Corosync communications has only recently been added to RHEL 6.

Unless otherwise stated, the commands in this section are executed on only one node in the cluster.

The command line syntax is:

<pre style="overflow-x:auto;">
pcs cluster setup [ --start ] --name CLUSTER_NAME \
  NODE1_SPEC [ NODE2_SPEC ] \
  [ --transport {udpu|udp} ] \
  [ --rrpmode {active|passive} ] \
  [ --addr0 ADDRESS ] \
  [ --addr1 ADDRESS ] \
  [ --mcast0 ADDRESS ] [ --mcastport0 PORT ] \
  [ --mcast1 ADDRESS ] [ --mcastport1 PORT ] \
  [ --token TIMEOUT ] [ --join TIMEOUT ] \
  [ ... ]
</pre>

The node specification is a comma-separated list of hostnames or IP addresses for the host interfaces that will be used for Corosync’s communications. The cluster name is an arbitrary string and will default to  if the option is omitted.

It is possible to create a cluster configuration that comprises a single node. Additional nodes can be added to the cluster configuration at any time after the initial cluster has been created. This can be particularly useful when conducting a major operating system upgrade or server migration, where new servers need to be commissioned and it is necessary to minimize the duration of any outages.

For example, upgrading from RHEL 6 to RHEL 7 usually requires installing the new OS from a clean baseline: there is no "in-place" upgrade path. One way to work around this limitation is to upgrade the nodes one at a time, creating a new framework on the first upgraded node, stopping the resources on the old cluster and recreating them on the new cluster, then rebuilding the second node (and possibly any additional nodes).

The minimum requirement for cluster network communications is a single interface in the cluster configuration, but further interfaces can be added in order to increase the robustness of the HA cluster’s inter-node messaging. Communications are organized into rings, with each ring representing a separate network. Corosync can support multiple rings using a feature called the Redundant Ring Protocol (RRP).

There are two transport types supported by the PCS command: udpu (UDP unicast) and udp (used for multicast). The udp transport is recommended, as it is more efficient. The udpu transport, which is the default if no transport is specified†, should only be selected for circumstances where multicast cannot be used.

† Note: The default transport may differ, depending on the tools used to create the cluster configuration. According to the corosync.conf man page, the default transport is udp. However, the pcs man page states that the default transport for RHEL 7 is udpu and the default for RHEL 6 is udp.

When using udpu (unicast), the Corosync communication rings are determined by the node specification, which is a comma-separated list of hostnames or IP addresses associated with the ring interfaces. For example:

<pre style="overflow-x:auto;">
pcs cluster setup --name demo node1-A,node1-B node2-A,node2-B
</pre>

When the udp (multicast) transport is chosen, the communications rings are defined by listing the networks upon which the Corosync multicast traffic will be carried, along with an optional list of the multicast addresses and ports that will be used. The rings are specified using the flags --addr0 and --addr1, for example:

<pre style="overflow-x:auto;">
pcs cluster setup --name demo node1-A node2-A \
  --transport udp \
  --addr0 10.70.0.0 --addr1 192.168.227.0
</pre>

Use network addresses rather than host IP addresses for defining the udp interfaces, as this allows a common Corosync configuration to be used across all cluster nodes and simplifies setup and maintenance. If host IP addresses are used, additional manual configuration of Corosync will be required on the cluster nodes.

Note: Corosync cannot parse network addresses supplied in the CIDR (Classless Inter-Domain Routing) notation, e.g., 10.70.0.0/16. Always use the full dot notation for specifying networks, e.g. 10.70.0.0 or 192.168.227.0.
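Where interface addresses are recorded in CIDR form, the dotted network address can be derived before it is passed to the --addr0 or --addr1 flags. The following is a minimal sketch, assuming IPv4 and a prefix length between 1 and 32; the helper name cidr_to_network is hypothetical and is not part of PCS or Corosync.

```shell
#!/bin/sh
# Hypothetical helper: print the dotted-decimal network address for an
# IPv4 address given in CIDR form. Assumes a prefix length from 1 to 32.
# The body runs in a subshell, so the caller's variables are untouched.
cidr_to_network() (
    addr=${1%/*}      # part before the slash
    prefix=${1#*/}    # part after the slash
    IFS=.
    set -- $addr      # split the address into its four octets
    ip=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    mask=$(( (4294967295 << (32 - $prefix)) & 4294967295 ))
    net=$(( $ip & $mask ))
    printf '%d.%d.%d.%d\n' \
        $(( ($net >> 24) & 255 )) $(( ($net >> 16) & 255 )) \
        $(( ($net >> 8) & 255 )) $(( $net & 255 ))
)

cidr_to_network 10.70.227.11/16      # prints 10.70.0.0
cidr_to_network 192.168.227.11/24    # prints 192.168.227.0
```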

The multicast addresses default to 239.255.1.1 for ring0 and 239.255.2.1 for ring1. The default multicast port is 5405 for both multicast rings.

Corosync actually uses two multicast ports for communication in each ring. Ports are assigned in receive / send pairs, but only the receive port number is specified when configuring the cluster. The send port is one less than the receive port number (i.e. mcastport - 1). Make sure that there is a gap of at least one between assigned ports for a given multicast address in a subnet. Also, if there are several HA clusters with Corosync rings on the same subnet, each cluster will require a unique multicast port pair (different clusters can use the same multicast address, but not the same multicast ports).

For example, if there are six OSSs configured into three HA pairs, and an MDS pair, then each pair of servers will require a unique multicast port for each ring, and there must be a gap of at least one between the port numbers. So, a range of 49152, 49154, 49156, 49158 might be suitable. A range of 49152, 49153, 49154, 49155 is not valid because there are no gaps between the numbers to accommodate the send port.
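The spacing rule can be checked mechanically before the clusters are created. The following is a minimal sketch, assuming that all of the listed receive ports share one multicast address on the same subnet; the helper name check_mcast_ports and the port values are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: given the receive (mcastport) values planned for the
# clusters sharing a subnet, report any pair that leaves no room for the
# send port (receive port minus one). Returns non-zero on a clash.
check_mcast_ports() {
    printf '%s\n' "$@" | sort -n | awk '
        NR > 1 && $1 - prev < 2 {
            printf "clash: %d and %d\n", prev, $1
            bad = 1
        }
        { prev = $1 }
        END { exit bad + 0 }'
}

check_mcast_ports 49152 49154 49156 49158 && echo "ports OK"
check_mcast_ports 49152 49153 49154 49155 || echo "ports overlap"
```

The check treats two ports as clashing when they differ by less than two, because each receive port also occupies the port immediately below it.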

The redundant ring protocol (RRP) mode is specified by the --rrpmode flag. Valid options are: none, active and passive. If only one interface is defined, then none is automatically selected. If multiple rings are defined, either active or passive must be used.

When set to active, Corosync will send all messages across all interfaces simultaneously. Throughput is not as fast but overall latency is improved, especially when communicating over faulty or unreliable networks.

The passive setting tells Corosync to use one interface, with the remaining interfaces available as standbys. If the interface fails, one of the standby interfaces will be used instead. This is also the default mode when creating an RRP configuration with pcs.

In theory, the active mode provides better reliability across multiple interfaces, while passive mode may be preferred when the messaging rate is more important. However, the manual page for pcs makes the choice clear and straightforward: only passive mode is supported by pcs and it is the only mode that receives testing.

The --token flag specifies the timeout in milliseconds after which a token is declared lost. The default is 1000 (1000ms or 1 second). The value represents the overall length of time before a token is declared lost. Any retransmits occur within this window.

On a Lustre server cluster, the default token timeout is generally too short to accommodate variation in response when servers are under heavy load. An otherwise healthy server that is busy can take longer to pass the token to the next server in the ring compared to when the server is idle; if the timeout is too short, the cluster might declare the token lost. If there are too many lost tokens from one node, the cluster framework will consider the node dead.

It is recommended that the value of the token parameter be increased significantly from the default. 20000ms is a reasonable, conservative value, but users will want to experiment to find the optimal setting. If the cluster seems to fail over too frequently under load, but without any other symptoms, the value should be increased as a first step to see if it alleviates the problem.
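When tuning the timeout, it can help to confirm which value is actually present in the Corosync configuration file. The following is a minimal sketch, assuming a Corosync v2 node whose configuration uses the usual "token: VALUE" form in the totem section; the helper name get_token_ms is hypothetical, and on RHEL 6 clusters managed by CMAN the setting may be held elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: print the token timeout (ms) set in a Corosync
# configuration file, or 1000 (the default quoted above) if none is set.
# Assumption: the file uses the "token: VALUE" form in the totem section.
get_token_ms() {
    awk -F: '
        /^[ \t]*token[ \t]*:/ {
            gsub(/[ \t]/, "", $2)   # strip blanks around the value
            value = $2
        }
        END { print (value == "" ? 1000 : value) }' "$1"
}

# Example, assuming the conventional configuration path:
#   get_token_ms /etc/corosync/corosync.conf
```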

PCS Configuration Examples
The following example uses the simplest invocation to create a cluster framework configuration comprising two nodes. This example does not specify a transport, so the default of udpu will be chosen by PCS for cluster communications on RHEL 7, and udp will be chosen for RHEL 6:

<pre style="overflow-x:auto;">
pcs cluster setup --name demo-MDS \
  rh7z-mds1.lfs.intl rh7z-mds2.lfs.intl
</pre>

The next example again uses udpu but incorporates a second, redundant, ring for cluster communications:

<pre style="overflow-x:auto;">
pcs cluster setup --name demo-MDS-1-2 \
  rh7z-mds1.lfs.intl,192.168.227.11 \
  rh7z-mds2.lfs.intl,192.168.227.12
</pre>

The hostname specification is comma-separated, and the node interfaces are specified in ring priority order. The first interface in the list will join ring0, the second interface will join ring1. In the above example, the ring0 interfaces correspond to the hostname rh7z-mds1.lfs.intl for the first node, and rh7z-mds2.lfs.intl for the second node. The ring1 interfaces are 192.168.227.11 and 192.168.227.12 for node 1 and node 2 respectively. One could also add the IP addresses for ring1 into the hosts table or DNS if there is a preference to refer to the interfaces by name rather than by address.

The next example demonstrates the syntax for creating a two-node cluster with two Corosync communications rings using udp multicast:

<pre style="overflow-x:auto;">
pcs cluster setup --name demo-MDS-1-2 \
  rh7z-mds1.lfs.intl rh7z-mds2.lfs.intl \
  --transport udp \
  --rrpmode passive \
  --token 20000 \
  --addr0 10.70.0.0 \
  --addr1 192.168.227.0 \
  --mcast0 239.255.1.1 --mcastport0 49152 \
  --mcast1 239.255.2.1 --mcastport1 49152
</pre>

This example uses the preferred syntax and configuration for a two-node HA cluster. The names, IP addresses, etc. will be different for each individual installation, but the structure is consistent and is a good template to copy.

Note: The above example will create different results when run on RHEL 6 versus RHEL 7. This is because RHEL 6 uses an additional package called CMAN, which assumes some of the responsibilities that on RHEL 7 are managed entirely by Corosync. Because of this difference, RHEL 6 clusters may behave a little differently to RHEL 7 clusters, even though the commands used to configure each might be identical.

Note: If there are any unexpected or unexplained side-effects when running with RHEL 6 clusters, try simplifying the configuration. For example, try changing the transport from udp multicast to the simpler udpu unicast configuration, and use the comma-separated syntax to define the node addresses for RRP, rather than using the --addr0 and --addr1 flags.

Changing the Default Security Key
Changing the default key used by Corosync for communications is optional, but will improve the overall security of the cluster installation. The different operating system distributions and releases have different procedures for managing the cluster framework authentication key, so the following information is provided for information only. Refer to the OS vendor’s documentation for up to date instructions.

The default key can be changed by running the command corosync-keygen. The key will be written to the file /etc/corosync/authkey. Run the command on a single host in the cluster, then copy the resulting key to each node. The file must be owned by the root user and given read-only permissions. Example output follows:

<pre style="overflow-x:auto;">
[root@rh7z-mds1 ~]# corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Writing corosync key to /etc/corosync/authkey.
[root@rh7z-mds1 ~]# ll /etc/corosync/authkey
-r-------- 1 root root 128 Apr 13 23:48 /etc/corosync/authkey
</pre>

Note: If the key is not the same for every node in the cluster, then they will not be able to communicate with each other to form a cluster. For hosts running Corosync version 2, creating the key and copying to all the nodes should be sufficient. For hosts running RHEL 6 with the CMAN software, the cluster framework also needs to be made aware of the new key:

<pre style="overflow-x:auto;">
ccs -f /etc/cluster/cluster.conf \
  --setcman keyfile="/etc/corosync/authkey"
</pre>

Starting and Stopping the cluster framework
To start the cluster framework, issue the following command from one of the cluster nodes:

<pre style="overflow-x:auto;">
pcs cluster start [ NODE [ ...] | --all ]
</pre>

To start the cluster framework on the current node only, run the pcs cluster start command without any additional options. To start the cluster on all nodes, supply the --all flag, and to limit the startup to a specific set of nodes, list them individually on the command line.

To shut down part or all of the cluster framework, issue the pcs cluster stop command:

<pre style="overflow-x:auto;">
pcs cluster stop [ NODE [ ...] | --all ]
</pre>

The parameters for the pcs cluster stop command are the same as the parameters for pcs cluster start.

Do not configure the cluster software to run automatically on system boot. If an error occurs during the operation of the cluster and a node is isolated and powered off or rebooted as a consequence, it is imperative that the node be repaired, reviewed and restored to a healthy state before committing it back to the cluster framework. Until the root cause of the fault has been isolated and corrected, adding a node back into the framework may be dangerous and could put services and data at risk.

For this reason, ensure that the corosync and pacemaker services are disabled in the sysvinit or systemd boot sequences:

RHEL 7:

<pre style="overflow-x:auto;">
systemctl disable corosync.service
systemctl disable pacemaker.service
</pre>

RHEL 6:

<pre style="overflow-x:auto;">
chkconfig cman off
chkconfig corosync off
chkconfig pacemaker off
</pre>

However, it is safe to keep the PCS helper daemon, pcsd, enabled.
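The boot-time state of the daemons is reported in the Daemon Status section of the pcs status output, so it can be checked with a small filter. The following is a minimal sketch, assuming each daemon appears on its own line in the form "name: running-state/boot-state"; the helper name check_boot_disabled is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "pcs status" output on stdin and fail if
# corosync or pacemaker is set to start at boot.
check_boot_disabled() {
    awk '
        $1 == "corosync:" || $1 == "pacemaker:" {
            split($2, state, "/")          # e.g. active/disabled
            if (state[2] != "disabled") {
                print $1, "boot state is", state[2]
                bad = 1
            }
        }
        END { exit bad + 0 }'
}

# Usage on a cluster node:
#   pcs status | check_boot_disabled || echo "disable the services listed above"
```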

Set Global Cluster Properties
When the cluster framework has been created and is running on at least one of the nodes, set the following global defaults for properties and resources.

The no-quorum-policy property defines how the cluster will behave when there is a loss of quorum. For two-node HA clusters, this property should be set to ignore, which tells the cluster to keep running. When there are more than two nodes, set the value of the property to stop.

<pre style="overflow-x:auto;">
pcs property set no-quorum-policy=ignore
#    For 2 node cluster:
#        no-quorum-policy=ignore
#    For > 2 node cluster:
#        no-quorum-policy=stop
</pre>
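For sites that script cluster creation, the choice of policy can be derived from the node count. The following is a minimal sketch of the rule described above; the helper name quorum_policy_command is hypothetical, and it only prints the pcs command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the pcs command that sets no-quorum-policy
# for a cluster of the given size (ignore for two nodes, stop for more).
quorum_policy_command() {
    nodes=$1
    if [ "$nodes" -eq 2 ]; then
        policy=ignore
    elif [ "$nodes" -gt 2 ]; then
        policy=stop
    else
        echo "expected two or more nodes, got $nodes" >&2
        return 1
    fi
    echo "pcs property set no-quorum-policy=$policy"
}

quorum_policy_command 2    # prints: pcs property set no-quorum-policy=ignore
quorum_policy_command 6    # prints: pcs property set no-quorum-policy=stop
```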

The stonith-enabled property tells the cluster whether or not there are fencing agents configured on the cluster. If set to true (strongly recommended and essential for any production deployment), the cluster will try to fence any nodes that are running resources that cannot be stopped. The cluster will also refuse to start any resources unless there is at least one STONITH resource configured.

The property should only ever be set to false when the cluster will be used for demonstration purposes.

<pre style="overflow-x:auto;">
pcs property set stonith-enabled=true
# values: true (default) or false
</pre>

When symmetric-cluster is set equal to true, this indicates that all of the nodes in the cluster have equivalent configurations and are equally capable of running any of the defined resources. For a simple two-node cluster with shared storage, as is commonly used for Lustre services, symmetric-cluster should nearly always be set to true.

<pre style="overflow-x:auto;">
pcs property set symmetric-cluster=true
# values: true (default) or false
</pre>

resource-stickiness is a resource property that defines how much a resource prefers to stay on the node where it is currently running. The higher the value, the more sticky the resource, and the less likely it is to migrate automatically to its most preferred location if it is running on a non-preferred / non-default node in the cluster and the resource is healthy.

If a resource is running on a non-preferred node, and the resource is healthy, it will not be migrated automatically back to its preferred node. If the stickiness is higher than the preference score of a resource, the resource will not move automatically while the machine it is running on remains healthy.

The default value is 0 (zero). It's common to set the value greater than 100 as an indicator that the resource should not be disrupted by migrating it automatically if the resource and the node it is running on are both healthy.

<pre style="overflow-x:auto;">
pcs resource defaults resource-stickiness=200
</pre>

Verify cluster configuration and status
To view overall cluster status:

<pre style="overflow-x:auto;">
pcs status [ COMMAND | --help ]
</pre>

For example:

<pre style="overflow-x:auto;">
[root@rh7z-mds1 ~]# pcs status
Cluster name: demo-MDS-1-2
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Thu Apr 14 00:58:29 2016		Last change: Wed Apr 13 21:16:13 2016 by hacluster via crmd on rh7z-mds1.lfs.intl
Stack: corosync
Current DC: rh7z-mds1.lfs.intl (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 0 resources configured

Online: [ rh7z-mds1.lfs.intl rh7z-mds2.lfs.intl ]

Full list of resources:

PCSD Status:
  rh7z-mds1.lfs.intl: Online
  rh7z-mds2.lfs.intl: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
</pre>

To review the cluster configuration:

<pre style="overflow-x:auto;">
pcs cluster cib
</pre>

The output will be in the CIB XML format.

The Corosync run-time configuration can also be reviewed:

<ul>
<li>RHEL 7 / Corosync v2: corosync-cmapctl</li>
<li>RHEL 6 / Corosync v1: corosync-objctl</li>
</ul>

This can be very useful when verifying specific changes to the cluster communications configuration, such as the RRP setup. For example:

<pre style="overflow-x:auto;">
[root@rh7z-mds1 ~]# corosync-cmapctl | grep interface
totem.interface.0.bindnetaddr (str) = 10.70.0.0
totem.interface.0.mcastaddr (str) = 239.255.1.1
totem.interface.0.mcastport (u16) = 49152
totem.interface.1.bindnetaddr (str) = 192.168.227.0
totem.interface.1.mcastaddr (str) = 239.255.2.1
totem.interface.1.mcastport (u16) = 49152
</pre>

To check the status of the Corosync rings:

<pre style="overflow-x:auto;">
[root@rh7z-mds1 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
	id	= 10.70.227.11
	status	= ring 0 active with no faults
RING ID 1
	id	= 192.168.227.11
	status	= ring 1 active with no faults
</pre>
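The ring status output lends itself to a simple automated check. The following is a minimal sketch, assuming the status lines take the form shown above ("status = ring N active with no faults" for a healthy ring); the helper name check_rings is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "corosync-cfgtool -s" output on stdin and
# print any ring status line that is not "active with no faults".
# Returns non-zero if at least one such line is found.
check_rings() {
    awk '
        /^[ \t]*status[ \t]*=/ && !/active with no faults/ {
            print
            bad = 1
        }
        END { exit bad + 0 }'
}

# Usage on a cluster node:
#   corosync-cfgtool -s | check_rings || echo "ring fault detected"
```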

To get the cluster status from CMAN on RHEL 6 clusters:

<pre style="overflow-x:auto;">
[root@rh6-mds1 ~]# cman_tool status
Version: 6.2.0
Config Version: 14
Cluster Name: demo-MDS-1-2
Cluster Id: 28594
Cluster Member: Yes
Cluster Generation: 24
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 9
Flags: 2node
Ports Bound: 0
Node name: rh6-mds1.lfs.intl
Node ID: 1
Multicast addresses: 239.255.1.1 239.255.2.1
Node addresses: 10.70.206.11 192.168.206.11
</pre>

If the cluster appears to start, but there are errors reported by  and in the syslog related to Corosync totem, then there may be a conflict in the multicast address configuration with another cluster or service on the same subnet. A typical error in the syslog would look similar to the following output:

<pre style="overflow-x:auto;">
Apr 13 22:11:15 rh67-pe corosync[26370]:  [TOTEM ] Received message has invalid digest... ignoring.
Apr 13 22:11:15 rh67-pe corosync[26370]:  [TOTEM ] Invalid packet data
</pre>

These errors indicate that the node has intercepted traffic intended for a node on a different cluster.

Also be careful in the definition of the network and multicast addresses. PCS will often create the configuration without complaint, and the cluster framework may even load without reporting any errors to the command shell. However, a misconfiguration may lead to a failure in the RRP that is not immediately obvious. Look for unexpected information in the Corosync database and the cluster CIB.

For example, if one of the cluster node addresses shows up as localhost or 127.0.0.1, this indicates a problem with the addresses supplied to pcs with the --addr0 or --addr1 flags.

Next Steps

 * Lustre Server Fault Isolation with Pacemaker Node Fencing
 * Creating Pacemaker Resources for Lustre Storage Services