Creating a Framework for High Availability with Pacemaker

High availability, usually abbreviated to "HA", is a term used to describe systems and software frameworks that are designed to preserve application service availability even in the event of failure of a component of the system. The failed component might be software or hardware; the HA framework will attempt to respond to the failure such that the applications running within the framework continue to operate correctly.

While the number of discrete failure scenarios that might be catalogued is potentially very large, they generally fall into one of a very small number of categories:

  1. Failure of the application providing the service
  2. Failure of a software dependency upon which the application relies
  3. Failure of a hardware dependency upon which the application relies
  4. Failure of an external service or infrastructure component upon which the application or supporting framework relies

HA systems protect application availability by grouping sets of servers and software into cooperative units or clusters. HA clusters are typically groups of two or more servers, each running their own operating platform, that communicate with one another over a network connection. HA clusters will often have multi-ported, shared external storage, with each server in the cluster connected over redundant storage paths to the storage hardware.

A cluster software framework manages communication between the cluster participants (nodes). The framework will communicate the health of system hardware and application services between the nodes in the cluster and provide means to manage services and nodes, as well as react to changes in the cluster environment (e.g., server failure).

HA systems are characterized as typically having redundancy in the hardware configuration: two or more servers, each with two or more storage IO paths and often two or more network interfaces configured using bonding or link aggregation. Storage systems will often have similar redundancy characteristics, such as RAID data protection.

Measurements of availability are normally applied to the availability of the applications running on the HA cluster, rather than the hosting infrastructure. For example, loss of a physical server due to a component failure would trigger a failover or migration of the services that the server was providing to another node in the cluster. In this scenario, the outage duration would be the measure of time taken to migrate the applications to another node and restore the applications to running state. The service may be considered degraded until the failed component is repaired and restored, but the HA framework has avoided an ongoing outage.

On systems running an operating system based on Linux, the most commonly used HA cluster framework comprises two software applications used in combination: Pacemaker – to provide resource management – and Corosync – to provide cluster communications and low-level management, such as membership and quorum. Pacemaker can trace its genesis back to the original Linux HA project, called Heartbeat, while Corosync is derived from the OpenAIS project.

Pacemaker and Corosync are widely supported across the major Linux distributions, including Red Hat Enterprise Linux and SuSE Linux Enterprise Server. Red Hat Enterprise Linux version 6 used a very complex HA solution incorporating several other tools, and while this has been simplified since the release of RHEL 6.4, there is still some legacy software in the framework.

With the release of RHEL 7, the high-availability framework from Red Hat has been rationalized around Pacemaker and Corosync v2, simplifying the software environment. Red Hat also provides a command line tool called PCS (the Pacemaker/Corosync configuration system) that is available for both RHEL version 6 and version 7. PCS provides a consistent system management command interface for the high availability software and abstracts the underlying software implementation.

Note: Lustre does not absolutely need to be incorporated into an HA software framework such as Pacemaker, but doing so enables the operating platform to automatically make decisions about failover/migration of services without operator intervention. HA frameworks also help with general maintenance and management of application resources.

HA Framework Configuration for a Two-Node Cluster

Red Hat Enterprise Linux and CentOS

Red Hat Enterprise Linux version 6 has a complex history with regard to the development and provision of HA software. Prior to version 6.4, Red Hat's high availability software was complex and difficult to install and maintain. With the release of RHEL 6.4 and in all subsequent RHEL 6 updates, this has been consolidated around three principal packages: Pacemaker, Corosync version 1, and CMAN. The software stack was further simplified in RHEL 7 to just Pacemaker and Corosync version 2.

Red Hat EL 6 HA clusters use Pacemaker to provide cluster resource management (CRM), while CMAN is used to provide cluster membership and quorum services. Corosync provides communications but no other services. CMAN is unique to Red Hat Enterprise Linux and is part of an older framework. In RHEL 7, CMAN is no longer required and its functionality is entirely accommodated by Corosync version 2, but for any HA clusters running RHEL 6, Red Hat stipulates the use of CMAN in Pacemaker clusters.

The PCS application (the Pacemaker/Corosync configuration system) was also introduced in RHEL 6.4 and is available in current releases of both RHEL 6 and 7. PCS simplifies the installation and configuration of HA clusters on Red Hat platforms.

Hardware and Server Infrastructure Prerequisites

This article will demonstrate how to configure a Lustre high-availability building block using two servers and a dedicated external storage array that is connected to both servers. The two-node building block designs for metadata servers and object storage servers provide a suitable basis for deployment of a production-ready, high-availability Lustre parallel file system cluster.

Figure 13 shows a blueprint for typical high-availability Lustre server building blocks, one for the metadata and management services, and one for object storage.

Figure 13. Lustre Server High-Availability Building Blocks

Each server depicted in Figure 13 requires three network interfaces:

  1. A dedicated cluster communication network between paired servers, used as a Corosync communications ring. This can be a cross-over / point-to-point connection, or can be made via a switch.
  2. A management network or public interface connection. This will be used by the HA cluster as an additional communications ring for Corosync.
  3. A public interface, used for connection to the high-performance data network – this is the network from which Lustre services will normally be accessed by client computers.

A variation on this architecture, not specifically covered in this guide, has a single Corosync communications ring made from two network interfaces that are configured into a bond on a private network. The bond is created per the operating system documented process, and then added to the Corosync configuration.

Software Prerequisites

RHEL / CentOS

In addition to the prerequisites previously described for Lustre, the operating system requires installation of the HA software suite. It may also be necessary to enable optional repositories. For RHEL systems, the subscription-manager command can be used to enable the software entitlements for the HA software packages. For example:

subscription-manager repos \
  --enable rhel-ha-for-rhel-7-server-rpms \
  --enable rhel-7-server-optional-rpms

or:

subscription-manager repos \
  --enable rhel-ha-for-rhel-6-server-rpms \
  --enable rhel-6-server-optional-rpms

This step is not required for CentOS. Refer to the documentation for the operating system distribution for more complete information on enabling subscription entitlements.

Install the HA software

RHEL / CentOS

  1. Login as the super-user (root) on each of the servers in the proposed cluster and install the HA framework software:
    yum -y install pcs pacemaker corosync fence-agents [cman]
    

    Note: The cman package is only required for RHEL/CentOS 6 servers.

  2. On each server, add a user account to be used for cluster management and set a password for that account. The convention is to create a user account with the name hacluster. The hacluster user should already have been created as part of the package installation (the account is created when the pacemaker-libs package is installed). pcs makes use of this account to facilitate cluster management: the hacluster account is used to authenticate the command line application, pcs, with the pcsd configuration daemon running on each cluster node. pcsd is used by the pcs application to distribute commands and synchronize the cluster configuration between the nodes.

    The following is taken from the pacemaker-libs package postinstall script and shows the basic procedure for adding the hacluster account if it does not already exist:

    getent group haclient >/dev/null || groupadd -r haclient -g 189
    getent passwd hacluster >/dev/null || useradd -r -g haclient -u 189 -s /sbin/nologin -c "cluster user" hacluster
    
  3. Set a password for the hacluster account. This must be set, and there is no default. Make the password the same on each cluster node:
    passwd hacluster
    
  4. Modify or disable the firewall software on each server in the cluster. According to Red Hat, the following ports need to be opened:
    • TCP: ports 2224, 3121, 21064
    • UDP: port 5405

    In RHEL 7, the firewall software can be configured to permit cluster traffic as follows:

    firewall-cmd --permanent --add-service=high-availability
    firewall-cmd --add-service=high-availability
    

    Verify the firewall configuration:

    firewall-cmd --list-service
    
  5. Lustre also requires port 988 to be open for incoming connections, and ports 1021-1023 for outgoing connections (see the example following this list).
  6. Alternatively, disable the firewall completely.

    For RHEL 7:

    systemctl stop firewalld
    systemctl disable firewalld
    
  7. And for RHEL 6:

    chkconfig iptables off
    chkconfig ip6tables off
    service iptables stop
    service ip6tables stop
    
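
If the firewall is left enabled, the Lustre port referenced in step 5 can be opened as well. The following is a sketch for RHEL 7 only, and assumes the Lustre network interface is in the default firewalld zone; adjust the zone and rules to suit the installation:

# Open the Lustre service port (988/tcp) in the permanent and running configurations
firewall-cmd --permanent --add-port=988/tcp
firewall-cmd --add-port=988/tcp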

Note: When working with hostnames in Pacemaker and Corosync, always use the fully qualified domain name to reference cluster nodes.

Configure the Core HA Framework – PCS Instructions

Configure the PCS Daemon

  1. Start the Pacemaker configuration daemon, pcsd, on all servers:
    • RHEL 7: systemctl start pcsd.service
    • RHEL 6: service pcsd start
  2. Verify that the service is running:
    • RHEL 7: systemctl status pcsd.service
    • RHEL 6: service pcsd status

    The following example is taken from a server running RHEL 7:

    [root@rh7z-mds1 ~]# systemctl start pcsd.service
    [root@rh7z-mds1 ~]# systemctl status pcsd.service
    • pcsd.service - PCS GUI and remote configuration interface
       Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; vendor preset: disabled)
       Active: active (running) since Wed 2016-04-13 01:30:52 EDT; 1min 11s ago
     Main PID: 29343 (pcsd)
       CGroup: /system.slice/pcsd.service
               ├─29343 /bin/sh /usr/lib/pcsd/pcsd start
               ├─29347 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/bin/ruby -I/usr/lib/pcsd /u...
               └─29348 /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
    
    Apr 13 01:30:50 rh7z-mds1 systemd[1]: Starting PCS GUI and remote configuration interface...
    Apr 13 01:30:52 rh7z-mds1 systemd[1]: Started PCS GUI and remote configuration interface.
    
  3. Set up PCS authentication by executing the following command on just one of the cluster nodes:
    pcs cluster auth <node 1> <node 2> [...] -u hacluster
    

    For example:

    [root@rh7z-mds1 ~]# pcs cluster auth \
    >   rh7z-mds1.lfs.intl rh7z-mds2.lfs.intl \
    >   -u hacluster
    Password: 
    rh7z-mds2.lfs.intl: Authorized
    rh7z-mds1.lfs.intl: Authorized
    

Create the Cluster Framework

The pcs command syntax is comprehensive, but not all of the functionality is available for RHEL 6 clusters. For example, the syntax for configuring the redundant ring protocol (RRP) for Corosync communications has only recently been added to RHEL 6.

Unless otherwise stated, the commands in this section are executed on only one node in the cluster.

The command line syntax is:

pcs cluster setup [ --start ] --name <cluster name> \
  <node 1 specification> [ <node 2 specification> ] \
  [ --transport {udpu|udp} ] \
  [ --rrpmode {active|passive} ] \
  [ --addr0 <address> ] \
  [ --addr1 <address> ] \
  [ --mcast0 <address> ] [ --mcastport0 <port> ] \
  [ --mcast1 <address> ] [ --mcastport1 <port> ] \
  [ --token <timeout> ] [ --join <timeout> ] \
  [ ... ]

The node specification is a comma-separated list of hostnames or IP addresses for the host interfaces that will be used for Corosync’s communications. The cluster name is an arbitrary string and will default to pcmk if the option is omitted.

It is possible to create a cluster configuration that comprises a single node. Additional nodes can be added to the cluster configuration at any time after the initial cluster has been created. This can be particularly useful when conducting a major operating system upgrade or server migration, where new servers need to be commissioned and it is necessary to minimize the duration of any outages.

For example, upgrading from RHEL 6 to RHEL 7 usually requires installing the new OS from a clean baseline: there is no "in-place" upgrade path. One way to work around this limitation is to upgrade the nodes one at a time, creating a new framework on the first upgraded node, stopping the resources on the old cluster and recreating them on the new cluster, then rebuilding the second node (and possibly any additional nodes).
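
As an illustrative sketch (the hostname is hypothetical, and the exact pcs syntax varies slightly between releases), a new node is first authenticated and then added to a running PCS-managed cluster:

pcs cluster auth new-mds.lfs.intl -u hacluster
pcs cluster node add new-mds.lfs.intl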

The minimum requirement for cluster network communications is a single interface in the cluster configuration, but further interfaces can be added in order to increase the robustness of the HA cluster’s inter-node messaging. Communications are organized into rings, with each ring representing a separate network. Corosync can support multiple rings using a feature called the Redundant Ring Protocol (RRP).

There are two transport types supported by the PCS command: udpu (UDP unicast) and udp (used for multicast). The udp transport is recommended, as it is more efficient. udpu, which is the default if no transport is specified (but see the note below), should only be selected in circumstances where multicast cannot be used.

Note: The default transport may differ, depending on the tools used to create the cluster configuration. According to the corosync.conf(5) man page, the default transport is udp. However, the pcs(8) man page states that the default transport for RHEL 7 is udpu and the default for RHEL 6 is udp.

When using udpu (unicast), the Corosync communication rings are determined by the node specification, which is a comma-separated list of hostnames or IP addresses associated with the ring interfaces. For example:

pcs cluster setup --name demo node1-A,node1-B node2-A,node2-B

When the udp (multicast) transport is chosen, the communications rings are defined by listing the networks upon which the Corosync multicast traffic will be carried, along with an optional list of the multicast addresses and ports that will be used. The rings are specified using the flags --addr0 and --addr1, for example:

pcs cluster setup --name demo node1-A node2-A \
  --transport udp \
  --addr0 10.70.0.0 --addr1 192.168.227.0

Use network addresses rather than host IP addresses for defining the udp interfaces, as this will allow a common Corosync configuration to be used across all cluster nodes. If host IP addresses are used, additional manual configuration of Corosync will be required on the cluster nodes. Using network addresses will simplify setup and maintenance.

Note: Corosync cannot parse network addresses supplied in the CIDR (Classless Inter-Domain Routing) notation, e.g., 10.70/16. Always use the full dot notation for specifying networks, e.g. 10.70.0.0 or 192.168.227.0.

The multicast addresses default to 239.255.1.1 for ring0 and 239.255.2.1 for ring1. The default multicast port is 5405 for both multicast rings.

Corosync actually uses two multicast ports for communication in each ring. Ports are assigned in receive / send pairs, but only the receive port number is specified when configuring the cluster. The send port is one less than the receive port number (i.e. send port = mcastport - 1). Make sure that there is a gap of at least 1 between assigned ports for a given multicast address in a subnet. Also, if there are several HA clusters with Corosync rings on the same subnet, each cluster will require a unique multicast port pair (different clusters can use the same multicast address, but not the same multicast ports).

For example, if there are six OSSs configured into three HA pairs, plus an MDS pair, then each pair of servers will require a unique multicast port for each ring, with a gap of at least one between the port numbers. So, a range of 49152, 49154, 49156, 49158 might be suitable. A range of 49152, 49153, 49154, 49155 is not valid because there are no gaps between the numbers to accommodate the send port.

The redundant ring protocol (RRP) mode is specified by the --rrpmode flag. Valid options are: none, active and passive. If only one interface is defined, then none is automatically selected. If multiple rings are defined, either active or passive must be used.

When set to active, Corosync will send all messages across all interfaces simultaneously. Throughput is not as fast but overall latency is improved, especially when communicating over faulty or unreliable networks.

The passive setting tells Corosync to use one interface, with the remaining interfaces available as standbys. If the interface fails, one of the standby interfaces will be used instead. This is also the default mode when creating an RRP configuration with pcs.

In theory, the active mode provides better reliability across multiple interfaces, while passive mode may be preferred when the messaging rate is more important. However, the manual page for pcs makes the choice clear and straightforward: only passive mode is supported by pcs and it is the only mode that receives testing.

The --token flag specifies the timeout in milliseconds after which a token is declared lost. The default is 1000 (1000ms or 1 second). The value represents the overall length of time before a token is declared lost. Any retransmits occur within this window.

On a Lustre server cluster, the default token timeout is generally too short to accommodate variation in response when servers are under heavy load. An otherwise healthy server that is busy can take longer to pass the token to the next server in the ring compared to when the server is idle; if the timeout is too short, the cluster might declare the token lost. If there are too many lost tokens from one node, the cluster framework will consider the node dead.

It is recommended that the value of the token parameter be increased significantly from the default. 20000 (20 seconds) is a reasonable, conservative value, but users will want to experiment to find the optimal setting. If the cluster seems to fail over too frequently under load, but without any other symptoms, increase the token value as a first step to see if it alleviates the problem.
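
As a quick check on a running Corosync version 2 (RHEL 7) cluster, the token timeout actually in effect can be read back from the Corosync runtime database; the key name shown here is an assumption and may differ between Corosync releases:

corosync-cmapctl | grep totem.token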

PCS Configuration Examples

The following example uses the simplest invocation to create a cluster framework configuration comprising two nodes. This example does not specify a transport, so PCS will choose the default for cluster communications: udpu on RHEL 7 and udp on RHEL 6:

pcs cluster setup --name demo-MDS \
  rh7z-mds1.lfs.intl rh7z-mds2.lfs.intl

The next example again uses udpu but incorporates a second, redundant, ring for cluster communications:

pcs cluster setup --name demo-MDS-1-2 \
  rh7z-mds1.lfs.intl,192.168.227.11 \
  rh7z-mds2.lfs.intl,192.168.227.12

The hostname specification is comma-separated, and the node interfaces are specified in ring priority order. The first interface in the list will join ring0, the second interface will join ring1. In the above example, the ring0 interfaces correspond to the hostname rh7z-mds1.lfs.intl for the first node, and rh7z-mds2.lfs.intl for the second node. The ring1 interfaces are 192.168.227.11 and 192.168.227.12 for node 1 and node 2 respectively. One could also add the IP addresses for ring1 into the hosts table or DNS if there is a preference to refer to the interfaces by name rather than by address.
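
For example, hypothetical ring1 hostnames could be added to /etc/hosts on every node (the names below are illustrative only and are not part of the cluster configuration itself):

# Example /etc/hosts entries for the private ring1 interfaces
192.168.227.11   rh7z-mds1-ring1.lfs.intl   rh7z-mds1-ring1
192.168.227.12   rh7z-mds2-ring1.lfs.intl   rh7z-mds2-ring1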

The next example demonstrates the syntax for creating a two-node cluster with two Corosync communications rings using udp multicast:

pcs cluster setup --name demo-MDS-1-2 \
  rh7z-mds1.lfs.intl rh7z-mds2.lfs.intl \
  --transport udp \
  --rrpmode passive \
  --token 20000 \
  --addr0 10.70.0.0 \
  --addr1 192.168.227.0 \
  --mcast0 239.255.1.1 --mcastport0 49152 \
  --mcast1 239.255.2.1 --mcastport1 49152

This example uses the preferred syntax and configuration for a two-node HA cluster. The names, IP addresses, etc. will be different for each individual installation, but the structure is consistent and is a good template to copy.

Note: The above example will create different results when run on RHEL 6 versus RHEL 7. This is because RHEL 6 uses an additional package called CMAN, which assumes some of the responsibilities that on RHEL 7 are managed entirely by Corosync. Because of this difference, RHEL 6 clusters may behave a little differently to RHEL 7 clusters, even though the commands used to configure each might be identical.

Note: If there are any unexpected or unexplained side-effects when running with RHEL 6 clusters, try simplifying the configuration. For example, try changing the transport from udp multicast to the simpler udpu unicast configuration, and use the comma-separated syntax to define the node addresses for RRP, rather than using the --addr[0,1] flags.

Changing the Default Security Key

Changing the default key used by Corosync for communications is optional, but will improve the overall security of the cluster installation. The different operating system distributions and releases have different procedures for managing the cluster framework authentication key, so the following is provided as general guidance only. Refer to the OS vendor's documentation for up-to-date instructions.

The default key can be changed by running the command corosync-keygen. The key will be written to the file /etc/corosync/authkey. Run the command on a single host in the cluster, then copy the resulting key to each node. The file must be owned by the root user and given read-only permissions. Example output follows:

[root@rh7z-mds1 ~]# corosync-keygen 
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Writing corosync key to /etc/corosync/authkey.
[root@rh7z-mds1 ~]# ll /etc/corosync/authkey 
-r-------- 1 root root 128 Apr 13 23:48 /etc/corosync/authkey
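
For example, the key might be distributed from the node on which it was generated as follows (the hostname is taken from the earlier examples; adjust to suit the installation):

scp -p /etc/corosync/authkey rh7z-mds2.lfs.intl:/etc/corosync/authkey
ssh rh7z-mds2.lfs.intl "chown root:root /etc/corosync/authkey && chmod 400 /etc/corosync/authkey"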

Note: If the key is not the same for every node in the cluster, then they will not be able to communicate with each other to form a cluster. For hosts running Corosync version 2, creating the key and copying to all the nodes should be sufficient. For hosts running RHEL 6 with the CMAN software, the cluster framework also needs to be made aware of the new key:

ccs -f /etc/cluster/cluster.conf \
  --setcman keyfile="/etc/corosync/authkey"

Starting and Stopping the cluster framework

To start the cluster framework, issue the following command from one of the cluster nodes:

pcs cluster start [ <node> [<node> ...] | --all ]

To start the cluster framework on the current node only, run the pcs cluster start command without any additional options. To start the cluster on all nodes, supply the --all flag, and to limit the startup to a specific set of nodes, list them individually on the command line.
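
For example, to start the framework on every node of the example cluster and then confirm that it is running:

pcs cluster start --all
pcs status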

To shut down part or all of the cluster framework, issue the pcs cluster stop command:

pcs cluster stop [ <node> [<node> ...] | --all ]

The parameters for the pcs cluster stop command are the same as the parameters for pcs cluster start.

Do not configure the cluster software to run automatically on system boot. If an error occurs during the operation of the cluster and a node is isolated and powered off or rebooted as a consequence, it is imperative that the node be repaired, reviewed and restored to a healthy state before committing it back to the cluster framework. Until the root cause of the fault has been isolated and corrected, adding a node back into the framework may be dangerous and could put services and data at risk.

For this reason, ensure that the pacemaker and corosync services are disabled in the sysvinit or systemd boot sequences:

RHEL 7:

systemctl disable corosync.service
systemctl disable pacemaker.service

RHEL 6:

chkconfig cman off 
chkconfig corosync off
chkconfig pacemaker off

However, it is safe to keep the PCS helper daemon, pcsd, enabled.
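
If desired, pcsd can be left to start at boot while Pacemaker and Corosync remain disabled:

# RHEL 7:
systemctl enable pcsd.service

# RHEL 6:
chkconfig pcsd on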

Set Global Cluster Properties

When the cluster framework has been created and is running on at least one of the nodes, set the following global defaults for properties and resources.

no-quorum-policy

The no-quorum-policy property defines how the cluster will behave when there is a loss of quorum. For two-node HA clusters, this property should be set to ignore, which tells the cluster to keep running. When there are more than two nodes, set the value of the property to stop.

###    For 2 node cluster:
###        no-quorum-policy=ignore
###    For > 2 node cluster:
###        no-quorum-policy=stop
pcs property set no-quorum-policy=ignore

stonith-enabled

The stonith-enabled property tells the cluster whether or not there are fencing agents configured on the cluster. If set to true (strongly recommended and essential for any production deployment), the cluster will try to fence any nodes that are running resources that cannot be stopped. The cluster will also refuse to start any resources unless there is at least one STONITH resource configured.

The property should only ever be set to false when the cluster will be used for demonstration purposes.

### values: true (default) or false
pcs property set stonith-enabled=true

symmetric-cluster

When symmetric-cluster is set equal to true, this indicates that all of the nodes in the cluster have equivalent configurations and are equally capable of running any of the defined resources. For a simple two-node cluster with shared storage, as is commonly used for Lustre services, symmetric-cluster should nearly always be set to true.

### values: true (default) or false
pcs property set symmetric-cluster=true

resource-stickiness

resource-stickiness is a resource property that defines how much a resource prefers to stay on the node where it is currently running. The higher the value, the more sticky the resource, and the less likely it is to migrate automatically to its most preferred location if it is running on a non-preferred / non-default node in the cluster and the resource is healthy. resource-stickiness affects the behaviour of auto-failback.

If a resource is running on a non-preferred node, and the resource is healthy, it will not be migrated automatically back to its preferred node. If the stickiness is higher than the preference score of a resource, the resource will not move automatically while the machine it is running on remains healthy.

The default value is 0 (zero). It's common to set the value greater than 100 as an indicator that the resource should not be disrupted by migrating it automatically, if the resource and the node it is running on are both healthy.

pcs resource defaults resource-stickiness=200
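
The values that have been applied can be read back with pcs (output format varies between pcs releases):

pcs property list
pcs resource defaults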

Verify cluster configuration and status

To view overall cluster status:

pcs status [ <options> | --help]

For example:

[root@rh7z-mds1 ~]# pcs status
Cluster name: demo-MDS-1-2
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Thu Apr 14 00:58:29 2016		Last change: Wed Apr 13 21:16:13 2016 by hacluster via crmd on rh7z-mds1.lfs.intl
Stack: corosync
Current DC: rh7z-mds1.lfs.intl (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 0 resources configured

Online: [ rh7z-mds1.lfs.intl rh7z-mds2.lfs.intl ]

Full list of resources:


PCSD Status:
  rh7z-mds1.lfs.intl: Online
  rh7z-mds2.lfs.intl: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

To review the cluster configuration:

pcs cluster cib

The output will be in the CIB XML format.
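
Because the CIB is plain XML, it can be filtered with standard tools. For example, to confirm the global properties set earlier (a simple sketch, relying only on grep):

pcs cluster cib | grep -E 'no-quorum-policy|stonith-enabled|symmetric-cluster'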

The Corosync run-time configuration can also be reviewed:

  • RHEL 7 / Corosync v2: corosync-cmapctl
  • RHEL 6 / Corosync v1: corosync-objctl

This can be very useful when verifying specific changes to the cluster communications configuration, such as the RRP setup. For example:

[root@rh7z-mds1 ~]# corosync-cmapctl | grep interface
totem.interface.0.bindnetaddr (str) = 10.70.0.0
totem.interface.0.mcastaddr (str) = 239.255.1.1
totem.interface.0.mcastport (u16) = 49152
totem.interface.1.bindnetaddr (str) = 192.168.227.0
totem.interface.1.mcastaddr (str) = 239.255.2.1
totem.interface.1.mcastport (u16) = 49152

To check the status of the Corosync rings:

[root@rh7z-mds1 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
	id	= 10.70.227.11
	status	= ring 0 active with no faults
RING ID 1
	id	= 192.168.227.11
	status	= ring 1 active with no faults

To get the cluster status from CMAN on RHEL 6 clusters:

[root@rh6-mds1 ~]# cman_tool status
Version: 6.2.0
Config Version: 14
Cluster Name: demo-MDS-1-2
Cluster Id: 28594
Cluster Member: Yes
Cluster Generation: 24
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1  
Active subsystems: 9
Flags: 2node 
Ports Bound: 0  
Node name: rh6-mds1.lfs.intl
Node ID: 1
Multicast addresses: 239.255.1.1 239.255.2.1 
Node addresses: 10.70.206.11 192.168.206.11 

If the cluster appears to start, but there are errors reported by pcs cluster status and in the syslog related to Corosync totem, then there may be a conflict in the multicast address configuration with another cluster or service on the same subnet. A typical error in the syslog would look similar to the following output:

Apr 13 22:11:15 rh67-pe corosync[26370]:   [TOTEM ] Received message has invalid digest... ignoring.
Apr 13 22:11:15 rh67-pe corosync[26370]:   [TOTEM ] Invalid packet data

These errors indicate that the node has intercepted traffic intended for a node on a different cluster.

Also be careful in the definition of the network and multicast addresses. pcs will often create the configuration without complaint, and the cluster framework may even load without reporting any errors to the command shell. However, a misconfiguration may lead to a failure in the RRP that is not immediately obvious. Look for unexpected information in the Corosync database and the cluster CIB.

For example, if one of the cluster node addresses shows up as localhost or 127.0.0.1, this indicates a problem with the addresses supplied to pcs with the --addr0 or --addr1 flags.
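
A quick way to catch this kind of misconfiguration on RHEL 7 with the udp transport is to compare the interface definitions recorded in the Corosync configuration with the live ring status; the commands below only read existing state:

grep -A 3 interface /etc/corosync/corosync.conf
corosync-cfgtool -s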