This post is part of the Failover Cluster Checklist series.
Beware: Running cluster validation may cause failovers or take cluster resources offline.
Running cluster validation in production is not recommended, especially if you are troubleshooting an unstable cluster. I once took down a six-node Hyper-V cluster by running validation. It broke the storage connection to all nodes, and all the VMs crashed as a result. I recommend scheduling an outage, or at least informing the powers that be before you press the validate button.
- Start the “Validate Configuration” wizard
- Add servers. If you run this wizard on an existing cluster, the existing cluster nodes may be pre-added. Make changes as necessary.
- Select “Run all tests”
- Wait for the test to complete (5-60 minutes depending on configuration)
If you are troubleshooting a specific validation error or warning, re-running a full validation test may be time-consuming. Try re-running just the failing test until you get it working again, but remember to run a full validation afterwards to make sure you haven’t “fixed” something else.
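The full and single-test runs described above can also be started from PowerShell with the `Test-Cluster` cmdlet, which takes `-Node`, `-Include` and `-ReportName` parameters. Below is a minimal Python sketch that only builds the command line. The node names and the test name are placeholders, and `build_test_cluster_command` and `run_validation` are hypothetical helpers, not part of any library. The warning at the top of this post applies just as much to a scripted run.

```python
import subprocess


def build_test_cluster_command(nodes=None, include=None, report_name=None):
    """Return the argv list for a PowerShell Test-Cluster run.

    nodes:       node names to validate (placeholders in the example below)
    include:     test or category names to run; None means all tests
    report_name: optional name for the validation report
    """
    def quote(value):
        # PowerShell single-quoted strings escape ' by doubling it.
        return "'" + value.replace("'", "''") + "'"

    parts = ["Test-Cluster"]
    if nodes:
        parts += ["-Node", ",".join(quote(n) for n in nodes)]
    if include:
        parts += ["-Include", ",".join(quote(t) for t in include)]
    if report_name:
        parts += ["-ReportName", quote(report_name)]
    return ["powershell.exe", "-NoProfile", "-Command", " ".join(parts)]


def run_validation(command):
    """Run the command. Only call this on a cluster node, ideally during an outage."""
    return subprocess.run(command, check=True)


# Full validation of two hypothetical nodes.
full = build_test_cluster_command(["node1", "node2"])

# Re-run a single test while troubleshooting. The test name is an assumption;
# use the name exactly as it appears in your own validation report.
targeted = build_test_cluster_command(
    ["node1", "node2"], include=["Validate Software Update Levels"]
)
```

Nothing is executed until you pass one of the command lists to `run_validation`, so you can print and review the command first.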
Common errors in the report
Software updates missing, Validate software update levels
You can usually ignore this on a new cluster, as you should run Windows Update after installing the cluster anyway. On an existing cluster, fix it as soon as possible. Sometimes this rule generates false positives because the cluster nodes were not patched at the same time. The nodes can then end up with different KB numbers installed, as one update may supersede another. You may have to remove patches from some nodes to correct this.
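To see which nodes differ before you start removing patches, it can help to compare the installed KB lists side by side. The sketch below assumes you have already collected the KB numbers per node, for example from the output of `Get-HotFix` on each node. The node names and KB numbers are made-up placeholders, and `missing_updates` is a hypothetical helper.

```python
def missing_updates(installed_by_node):
    """Map each node to the KBs that some other node has but it lacks.

    Nodes that are missing nothing are left out of the result.
    """
    all_kbs = set().union(*installed_by_node.values())
    result = {}
    for node, kbs in installed_by_node.items():
        missing = all_kbs - set(kbs)
        if missing:
            result[node] = sorted(missing)
    return result


# Placeholder inventory; real KB numbers would come from your own nodes.
inventory = {
    "node1": {"KB0000001", "KB0000002"},
    "node2": {"KB0000001", "KB0000003"},
}

print(missing_updates(inventory))
```

A difference in this list is not necessarily a problem. If one KB supersedes another, both nodes may be at the same effective patch level, which is exactly the false positive described above.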
More than one VLAN on the same MAC address due to teaming.
Also known as a converged networking setup. Not recommended. Make sure you have at least two completely independent networks.
Node reachable by only one interface
Make sure the network is highly available, for example by using a NIC team.
This is normal if you have a cluster configuration without shared storage. Otherwise, this warning points to misconfigured storage.
MPIO related errors or warnings
The SAN connection is misconfigured or faulty, or you need a DSM update. Check with your SAN admin.
SCSI-3 persistent reservations
Your SAN does not support clustered storage pools with the current firmware. If you do not plan to use storage pools in your cluster, this warning can safely be ignored.