Clustered MSDTC fails

Problem

While setting up a new clustered Distributed Transaction Coordinator (MSDTC) for a SQL Server failover cluster instance (FCI), I found that it failed to come online after a restart. This time it happened after I enabled Network DTC Access, but I have seen the same thing many times during patching and cluster failovers. Usually, I would just remove and reinstall the resource, but that didn’t help this time. No matter what I did, Failover Cluster Manager just listed it as Failed:

image

Analysis

Looking in Services, I could see the DTC service was disabled:

image

The GUID in the service name matches the key of the corresponding cluster resource in the registry (under HKLM\Cluster\Resources). This is useful if you have more than one DTC in your cluster: the cluster will not allow several DTC resources with the same name, so additional DTCs are named “New Distributed Transaction Coordinator (1)”, “(2)” and so on.

image
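If you would rather do the GUID lookup from a script than by eye, here is a minimal Python sketch. It assumes the cluster database is loaded under HKLM\Cluster\Resources with one subkey per resource, named by its GUID and holding a `Name` value; it makes no assumption about the exact service-name format beyond the name containing a GUID. The function names are hypothetical, and the registry part uses the standard `winreg` module, so it only runs on a Windows cluster node (typically from an elevated prompt).

```python
import re
from typing import Dict, Optional

# A GUID such as 1a2b3c4d-0000-1111-2222-333344445555, optionally in braces.
GUID_RE = re.compile(
    r"\{?([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12})\}?"
)


def extract_guid(service_name: str) -> Optional[str]:
    """Return the first GUID found in a service name (lower-case), or None."""
    match = GUID_RE.search(service_name)
    return match.group(1).lower() if match else None


def read_cluster_resources() -> Dict[str, str]:
    """Map resource GUID -> resource name, read from HKLM\\Cluster\\Resources.

    Windows only; assumes each resource is a subkey named by its GUID
    with a "Name" value.
    """
    import winreg  # imported here so the rest of the module works elsewhere

    resources: Dict[str, str] = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, r"Cluster\Resources") as root:
        index = 0
        while True:
            try:
                key_name = winreg.EnumKey(root, index)
            except OSError:
                break  # no more subkeys
            index += 1
            with winreg.OpenKey(root, key_name) as sub:
                try:
                    name, _ = winreg.QueryValueEx(sub, "Name")
                except OSError:
                    continue  # subkey without a Name value
            resources[key_name.strip("{}").lower()] = name
    return resources


def resource_for_service(service_name: str, resources: Dict[str, str]) -> Optional[str]:
    """Look up the cluster resource name for a service whose name contains a GUID."""
    guid = extract_guid(service_name)
    return resources.get(guid) if guid else None
```

On a node, you would pass the DTC service name as shown in Services to `resource_for_service(...)` together with `read_cluster_resources()` to see which “New Distributed Transaction Coordinator (n)” it belongs to.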

I tried to enable the service, only to be greeted with a snarky “This service is marked for deletion” message.

image

Then I tried removing the resource and adding a new one, as this is my standard MO whenever I have trouble with a clustered MSDTC. Doing that, I just ended up with another DTC service “marked for deletion”. My next idea was to fail over the instance, but then I thought: what if this had been in production? So I kept searching for another solution, and it turned out to be a simple one…

Solution

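Log off ALL user sessions on the active node. That means all of them, including your own and any disconnected sessions belonging to others. Then log back in and bring the DTC resource online. This works because Windows only removes a service marked for deletion once every open handle to it has been closed, and a tool left open in any session, such as the Services console, can keep such a handle open.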

image

And by the way, remember to change the failure policy of the DTC resource so that errors like this don’t take down and fail over the entire instance. A failover might have solved this particular problem, but it could just as easily leave the instance failing back and forth until it ends up failed. Adding a script or policy that automatically logs off inactive users from the cluster nodes once a day is also a good idea.

image
