Exchange 2010 Dag won’t go online

Problem

After installation of Exchange 2010 SP1 Rollup5 I discovered that one of my networks was listed as partitioned in EMC. I ran several test without being able to find a cause for this. I am able to ping the addresses both ways, and there are no routes from network 1 to network 2. I even ran “Validate this cluster” (network only) successfully. Further investigation established that the cluster core resources were offline as well:

SNAGHTML5c562012

After a lot of fruitless searching I came across this Technet blog: http://blogs.technet.com/b/timmcmic/archive/2010/05/12/cluster-core-resources-fail-to-come-online-on-some-exchange-2010-database-availability-group-dag-nodes.aspx , who describes this situation and how to resolve it. But it also claims that the error is fixed in SP1. Since I’ve been at SP1 since installation I was skeptical, but I tried it anyway. As I suspected it didn’t help, but a comment from MattP_75 put me on the right track to a solution.

The problem is related to one or both of the networks not allowing client connections. I even found it in the eventlog, event 1223 from FailoverClustering.

SNAGHTML5c5c3660image

When I tried to change it in Failover Cluster management as the event suggested, it just bounced back to not allowing client access. To get it to stick, I had to change it in the registry.

I have no idea why this happens, but several of the comments on the article mentioned above talk about backup agents, mostly backup exec which I don’t have on my servers.

Solution

This is what I did to resolve the issue:

  • Shut down one of the nodes to ensure quorum
  • Change the role value to 3 on all cluster networks
    image
  • Get the core resources online
  • Restart the other node
  • Check the registry on both nodes to verify

I have had this happen again when I restart the cluster node that is hosting the Public Folders database, but it doesn’t happen every time.

Update 2011.11.23:

Microsoft recently released a hotfix which might be related to this error, kb2550886. According to the Exchange Team blog this is highly recommended for Exchange DAG’s running on Windows 2008R2, and they describe a scenario resembling the problem mentioned above. I have not verified that this update solves the problem permanently, but it is most certainly worth installing at your earliest convenience if you haven’t done so already.

Author: DizzyBadger

SQL Server DBA, Cluster expert, Principal Analyst

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.