Validating a Failover Cluster

This post is part of the Failover Cluster Checklist series.

Validation process

Beware: Running cluster validation may cause failovers or take cluster resources offline.

Running cluster validation in production is not recommended, especially if you are troubleshooting an unstable cluster. I once took down a six-node Hyper-V cluster by running validation. It broke the storage connection to all nodes, and all the VMs crashed as a result. I recommend scheduling an outage, or at least informing the powers that be before you press the validate button.

  • Start the “Validate Configuration” wizard

  • Add servers. If you run this wizard on an existing cluster, the existing cluster nodes may be pre-added. Make changes as necessary.
  • Select “Run All tests”
  • Wait for the test to complete (5-60 minutes depending on configuration)

If you are troubleshooting a specific validation error or warning, re-running a full validation test may be time-consuming. Try re-running just the failing test until you get it working again, but remember to run a full validation afterwards to make sure you haven’t “fixed” something else.
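
If you prefer scripting to the wizard, the same validation can be started with the Test-Cluster PowerShell cmdlet, which accepts -Node and -Include parameters. The following is a minimal sketch, not a definitive implementation: it only builds the command line for a full or a targeted run. The helper name build_test_cluster_command and the node names are hypothetical, and the value passed to -Include should be a test or category name as it appears in your own validation report.

```python
from typing import List, Optional, Sequence


def build_test_cluster_command(
    nodes: Sequence[str],
    include: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return an argv list that runs Test-Cluster through powershell.exe."""
    if not nodes:
        raise ValueError("at least one node name is required")

    def quote(value: str) -> str:
        # PowerShell single-quoted string; embedded single quotes are doubled.
        return "'" + value.replace("'", "''") + "'"

    command = "Test-Cluster -Node " + ",".join(quote(n) for n in nodes)
    if include:
        # -Include limits the run to the named tests or test categories.
        command += " -Include " + ",".join(quote(t) for t in include)
    return ["powershell.exe", "-NoProfile", "-Command", command]


if __name__ == "__main__":
    # Hypothetical node names; replace them with your own.
    full_run = build_test_cluster_command(["node1", "node2"])
    targeted = build_test_cluster_command(["node1", "node2"], include=["Network"])
    print(full_run[-1])
    print(targeted[-1])
```

The returned list can be handed to subprocess.run on a machine with the Failover Clustering tools installed. The warning above applies to scripted runs as well.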

Common errors in the report

Software updates missing, Validate software update levels

You can usually ignore this on a new cluster, as you should run Windows Update after installing the cluster anyway. On an existing cluster, fix it as soon as possible. Sometimes this rule generates false positives because the cluster nodes were not patched at the same time: one update may supersede another, so the nodes report different KB numbers. You may have to remove patches from some nodes to correct this.
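
To see where the nodes differ, it can help to compare the installed KB lists side by side. The sketch below assumes you have already collected the KB numbers per node (for example, from the validation report); the function name missing_updates and the KB numbers in the usage example are made up for illustration. It does not know about supersedence, so a KB it reports as missing may be covered by a newer update on that node.

```python
from typing import Dict, Set


def missing_updates(installed: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each node, return the KBs found on another node but absent here.

    Nodes that are not missing anything are left out of the result.
    """
    # Union of every KB seen on any node.
    all_kbs: Set[str] = set().union(*installed.values())
    result: Dict[str, Set[str]] = {}
    for node, kbs in installed.items():
        gap = all_kbs - kbs
        if gap:
            result[node] = gap
    return result


if __name__ == "__main__":
    # Hypothetical inventory; the KB numbers are placeholders.
    inventory = {
        "node1": {"KB5000001", "KB5000002"},
        "node2": {"KB5000001", "KB5000003"},
        "node3": {"KB5000001", "KB5000002", "KB5000003"},
    }
    for node, gap in sorted(missing_updates(inventory).items()):
        print(node, "is missing", ", ".join(sorted(gap)))
```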

More than one VLAN on the same MAC address due to teaming.

Also known as a converged networking setup. Not recommended. Make sure you have at least two completely independent networks.

Node reachable by only one interface

Make sure the network is highly available, for example by using a NIC team.
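
As a rough pre-check before running validation, you can list which nodes have addresses in fewer than two subnets. This is only a sketch under stated assumptions: the function name nodes_with_single_network and the addresses are hypothetical, and distinct subnets are just a proxy. Two subnets on the same physical adapter or switch are still not independent, so treat the result as a hint rather than proof.

```python
import ipaddress
from typing import Dict, List


def nodes_with_single_network(addresses: Dict[str, List[str]]) -> List[str]:
    """Return the nodes whose addresses fall into fewer than two subnets."""
    flagged = []
    for node, cidrs in addresses.items():
        # ip_interface("10.0.0.11/24").network gives the subnet 10.0.0.0/24.
        subnets = {ipaddress.ip_interface(c).network for c in cidrs}
        if len(subnets) < 2:
            flagged.append(node)
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical addresses in CIDR notation, one list per node.
    cluster = {
        "node1": ["10.0.0.11/24", "192.168.50.11/24"],
        "node2": ["10.0.0.12/24", "10.0.0.22/24"],
    }
    print(nodes_with_single_network(cluster))
```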

No disks

This is normal if you have a cluster configuration without shared storage. Otherwise, this warning points to misconfigured storage.

MPIO related errors or warnings

The SAN connection is misconfigured or faulty, or you need a DSM (Device-Specific Module) update. Check with your SAN admin.

SCSI-3 persistent reservations

Your SAN does not support clustered storage pools with the current firmware. If you do not plan to use storage pools in your cluster, this warning can safely be ignored.

Author: DizzyBadger

SQL Server DBA, Cluster expert, Principal Analyst
