Exchange 2010 DAG won’t go online

Problem

After installing Exchange 2010 SP1 Update Rollup 5 I discovered that one of my networks was listed as partitioned in EMC. I ran several tests without finding a cause. I am able to ping the addresses both ways, and there are no routes from network 1 to network 2. I even ran “Validate this cluster” (network only) successfully. Further investigation established that the cluster core resources were offline as well.

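The same state can also be inspected from PowerShell. The following is a minimal sketch, not what I originally ran: it assumes the FailoverClusters module that ships with Windows Server 2008 R2 is available on the node, and that the core resources live in the default group named "Cluster Group".

```powershell
# Sketch only: run on a DAG node with the failover clustering tools installed.
Import-Module FailoverClusters

# Cluster networks with their state (for example Up or Partitioned) and role.
# Role 0 = not used by the cluster, 1 = cluster only, 3 = cluster and client.
Get-ClusterNetwork | Format-Table Name, State, Role, Address -AutoSize

# Core resources; "Cluster Group" is the default group name (assumption).
Get-ClusterGroup "Cluster Group" | Get-ClusterResource |
	Format-Table Name, State, ResourceType -AutoSize
```

Resources shown as Offline here correspond to what Failover Cluster Manager shows under Cluster Core Resources.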

After a lot of fruitless searching I came across this TechNet blog post: http://blogs.technet.com/b/timmcmic/archive/2010/05/12/cluster-core-resources-fail-to-come-online-on-some-exchange-2010-database-availability-group-dag-nodes.aspx , which describes this situation and how to resolve it. It also claims that the error is fixed in SP1. Since I’ve been at SP1 since installation I was skeptical, but I tried it anyway. As I suspected it didn’t help, but a comment from MattP_75 put me on the right track to a solution.

The problem is related to one or both of the networks not allowing client connections. I even found it in the event log: event 1223 from FailoverClustering.

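To look for the event without clicking through Event Viewer, something like the following should work. It is a sketch that assumes the event is written to the System log by the provider named Microsoft-Windows-FailoverClustering.

```powershell
# Sketch only: list the most recent occurrences of event 1223.
# Log name and provider name are assumptions; adjust if your node logs elsewhere.
Get-WinEvent -FilterHashtable @{
	LogName      = 'System'
	ProviderName = 'Microsoft-Windows-FailoverClustering'
	Id           = 1223
} -MaxEvents 10 -ErrorAction SilentlyContinue |
	Format-List TimeCreated, Id, Message
```

If nothing is returned, the event has not been logged on that node, or it sits in a different log.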

When I tried to change it in Failover Cluster Manager as the event suggested, the setting just bounced back to not allowing client access. To get it to stick, I had to change it in the registry.

I have no idea why this happens, but several of the comments on the article mentioned above talk about backup agents, mostly Backup Exec, which I don’t have on my servers.

Solution

This is what I did to resolve the issue:

  • Shut down one of the nodes to ensure quorum
  • Change the role value to 3 on all cluster networks
  • Get the core resources online
  • Restart the other node
  • Check the registry on both nodes to verify
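The final verification step can be scripted. This is a hedged sketch: it assumes the cluster hive is loaded under HKLM:\Cluster on the node and that each network key below Networks carries Name and Role values. It only reads the values; the commented-out line shows how a change could be made, but test that outside production first.

```powershell
# Sketch only: list the Role value stored for each cluster network on this node.
# Assumes the cluster hive is loaded at HKLM:\Cluster (cluster service installed).
Get-ChildItem 'HKLM:\Cluster\Networks' | ForEach-Object {
	Get-ItemProperty -Path $_.PSPath | Select-Object Name, Role
}

# Hypothetical change for a single network key ($key is a placeholder, not a real path):
# Set-ItemProperty -Path $key -Name Role -Value 3
```

Run it on both nodes and compare; every network should report Role 3.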

I have had this happen again when I restart the cluster node that is hosting the Public Folders database, but it doesn’t happen every time.

Update 2011.11.23:

Microsoft recently released a hotfix, KB2550886, which might be related to this error. According to the Exchange Team blog it is highly recommended for Exchange DAGs running on Windows Server 2008 R2, and they describe a scenario resembling the problem mentioned above. I have not verified that this update solves the problem permanently, but it is certainly worth installing at your earliest convenience if you haven’t done so already.
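A quick way to see whether the hotfix is already present on a node is Get-HotFix. This is a sketch; it assumes the hotfix registers under the ID KB2550886 in the list that Get-HotFix reads.

```powershell
# Sketch only: check whether KB2550886 is installed on the local computer.
$hotfix = Get-HotFix -Id 'KB2550886' -ErrorAction SilentlyContinue
if ($hotfix) {
	Write-Host "KB2550886 is installed (installed on: $($hotfix.InstalledOn))"
} else {
	Write-Host "KB2550886 was not found on this computer"
}
```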

Repairing the search index on a database copy

Problem

An attempt to activate a passive database copy fails with the following message:

Database copy [name] on server [server] has content index catalog files in the following state: ‘Failed’.

Solution

Check whether more databases are affected by running the following script:

###############################################################################
###   Find database copies with failed ContentIndex State                  ###
###   Jan Kåre Lokna                                                        ###
###   v 1                                                                   ###
###############################################################################
try
{
	# Get all servers that hold the Mailbox role
	$Servers = Get-ExchangeServer -ErrorAction Stop | Where-Object { $_.ServerRole -match "Mailbox" }
	foreach ($Server in $Servers)
	{
		# List every database copy on this server whose content index is not healthy
		Get-MailboxDatabaseCopyStatus -Server $Server.Name -ErrorAction Stop |
			Where-Object { $_.ContentIndexState -ne "Healthy" }
	}
}
catch [Exception]
{
	# -ErrorAction Stop above turns cmdlet errors into terminating errors so they end up here
	Write-Host "Something went horribly wrong..."
	Write-Host $_.Exception.ToString()
}

Sample output:

Name            Status   CopyQueue  ReplayQueue  LastInspectedLogTime  ContentIndex
                         Length     Length                             State
----            ------   ---------  -----------  --------------------  ------------
DB2\EXCserver2  Healthy  0          0            27.05.2011 13:51:07   Failed

Run the following command to reseed the catalog from the active copy:

Update-MailboxDatabaseCopy [name] -CatalogOnly 

The name is taken from the output of the script above. Note that the command may have to be run on the server that owns the database copy (EXCServer2 in the example above).
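Put together for the copy in the sample output, the repair and a follow-up check might look like this. It is a sketch to be run in the Exchange Management Shell; the copy name DB2\EXCserver2 comes from the example above and must be replaced with your own.

```powershell
# Sketch only: reseed the content index catalog for one copy and verify the result.
# 'DB2\EXCserver2' is the example copy name; substitute the name from your own output.
$copy = 'DB2\EXCserver2'

# Reseed only the content index catalog, not the database itself.
Update-MailboxDatabaseCopy -Identity $copy -CatalogOnly

# Confirm that the content index state has recovered.
Get-MailboxDatabaseCopyStatus -Identity $copy |
	Format-List Name, Status, ContentIndexState
```

The state may not change to Healthy immediately, so rerun the last command if it still shows Failed right after the reseed.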