After installing Exchange 2010 SP1 Rollup 5, I discovered that one of my networks was listed as partitioned in the Exchange Management Console (EMC). I ran several tests without finding a cause. I can ping the addresses in both directions, and there are no routes from network 1 to network 2. I even ran “Validate this cluster” (network tests only), and it passed. Further investigation showed that the cluster core resources were offline as well.
After a lot of fruitless searching, I came across this TechNet blog post: http://blogs.technet.com/b/timmcmic/archive/2010/05/12/cluster-core-resources-fail-to-come-online-on-some-exchange-2010-database-availability-group-dag-nodes.aspx, which describes this situation and how to resolve it. However, it also claims that the error is fixed in SP1. Since I have been on SP1 since installation, I was skeptical, but I tried it anyway. As I suspected, it didn’t help, but a comment from MattP_75 put me on the right track to a solution.
The problem is that one or both of the cluster networks do not allow client connections. I even found this in the event log: event 1223 from FailoverClustering.
When I tried to change the setting in Failover Cluster Manager, as the event suggested, it just reverted to not allowing client access. To make the change stick, I had to edit the registry.
I have no idea why this happens, but several of the comments on the article mentioned above point to backup agents, mostly Backup Exec, which I don’t have on my servers.
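To see which networks are affected before touching anything, a read-only check of the registry helps. The following is a minimal Python sketch, not the exact procedure I used. It assumes that the cluster hive is loaded at HKEY_LOCAL_MACHINE\Cluster and that each network has a subkey under Networks carrying "Name" and "Role" values. The role numbers follow the CLUSTER_NETWORK_ROLE values used by failover clustering (0 = none, 1 = cluster only, 2 = client only, 3 = cluster and client). The helper names read_networks, describe_role and allows_client_access are hypothetical.

```python
"""Sketch: list cluster networks and decode their Role value (read-only).

Assumption: the cluster hive is loaded under HKEY_LOCAL_MACHINE\\Cluster,
with one subkey per network under Networks holding "Name" and "Role".
"""
try:
    import winreg  # Windows-only standard library module
except ImportError:  # keeps the pure helpers usable on other platforms
    winreg = None

NETWORKS_KEY = r"Cluster\Networks"

# CLUSTER_NETWORK_ROLE values: bit 0x1 = cluster use, bit 0x2 = client access.
ROLE_NAMES = {
    0: "Not used by the cluster",
    1: "Cluster only",
    2: "Client access only",
    3: "Cluster and client",
}


def describe_role(role: int) -> str:
    """Return a human-readable description of a network's Role value."""
    return ROLE_NAMES.get(role, f"Unknown role {role}")


def allows_client_access(role: int) -> bool:
    """Client access is enabled when the 0x2 bit of Role is set."""
    return bool(role & 0x2)


def read_networks():
    """Yield (name, role) for each cluster network found in the registry."""
    if winreg is None:
        raise RuntimeError("winreg is only available on Windows")
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, NETWORKS_KEY) as networks:
        subkey_count = winreg.QueryInfoKey(networks)[0]
        for i in range(subkey_count):
            guid = winreg.EnumKey(networks, i)
            with winreg.OpenKey(networks, guid) as net:
                name, _ = winreg.QueryValueEx(net, "Name")
                role, _ = winreg.QueryValueEx(net, "Role")
                yield name, role


if __name__ == "__main__" and winreg is not None:
    for name, role in read_networks():
        warning = "" if allows_client_access(role) else "  <-- no client access"
        print(f"{name}: {describe_role(role)}{warning}")
```

Run elevated on a cluster node; any network flagged "no client access" matches the situation event 1223 complains about.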
This is what I did to resolve the issue:
- Shut down one of the nodes to ensure quorum
- Change the Role value in the registry to 3 (cluster and client) on all cluster networks
- Get the core resources online
- Restart the other node
- Check the registry on both nodes to verify
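The registry part of these steps can be sketched in Python as below. This is an illustration under assumptions, not a tested tool: it assumes the same HKEY_LOCAL_MACHINE\Cluster\Networks\<GUID> layout with "Name" and "Role" values, that it runs elevated on the node that is still up, and that you have a registry backup. The functions plan_changes, read_networks and apply_changes are hypothetical names. It defaults to a dry run, and the final re-read covers the "check the registry" step; run it on the second node after it restarts as well.

```python
"""Sketch: set Role = 3 on every cluster network, then re-read to verify.

Assumptions (not from the original post): the cluster hive is loaded at
HKEY_LOCAL_MACHINE\\Cluster\\Networks\\<GUID> with "Name" and "Role" values,
and the script runs elevated on the node that is still up.
"""
try:
    import winreg  # Windows-only standard library module
except ImportError:
    winreg = None

NETWORKS_KEY = r"Cluster\Networks"
TARGET_ROLE = 3  # cluster and client


def plan_changes(networks, target=TARGET_ROLE):
    """Return (guid, name, old_role) for networks whose Role differs from target."""
    return [(guid, name, role) for guid, name, role in networks if role != target]


def read_networks():
    """Return a list of (guid, name, role) read from the cluster hive."""
    if winreg is None:
        raise RuntimeError("winreg is only available on Windows")
    result = []
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, NETWORKS_KEY) as networks:
        for i in range(winreg.QueryInfoKey(networks)[0]):
            guid = winreg.EnumKey(networks, i)
            with winreg.OpenKey(networks, guid) as net:
                name, _ = winreg.QueryValueEx(net, "Name")
                role, _ = winreg.QueryValueEx(net, "Role")
                result.append((guid, name, role))
    return result


def apply_changes(changes, target=TARGET_ROLE, dry_run=True):
    """Write the target Role to each planned network unless dry_run is set."""
    if not dry_run and winreg is None:
        raise RuntimeError("winreg is only available on Windows")
    for guid, name, old in changes:
        suffix = " (dry run)" if dry_run else ""
        print(f"{name}: Role {old} -> {target}{suffix}")
        if dry_run:
            continue
        path = NETWORKS_KEY + "\\" + guid
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path, 0,
                            winreg.KEY_SET_VALUE) as net:
            winreg.SetValueEx(net, "Role", 0, winreg.REG_DWORD, target)


if __name__ == "__main__" and winreg is not None:
    # Change dry_run to False only after reviewing the printed plan.
    apply_changes(plan_changes(read_networks()), dry_run=True)
    # Verification: anything listed here does not (yet) have Role = 3.
    print("Not at target:", plan_changes(read_networks()))
```

Keeping the planning logic separate from the registry writes makes it easy to review exactly which networks would change before committing anything.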
This has happened to me again when I restart the cluster node that hosts the Public Folders database, but not every time.
Microsoft recently released a hotfix that might be related to this error: KB2550886. According to the Exchange Team blog, it is highly recommended for Exchange DAGs running on Windows Server 2008 R2, and they describe a scenario resembling the problem described above. I have not verified that this update solves the problem permanently, but it is certainly worth installing at your earliest convenience if you haven’t done so already.