Problem
I always disable the named pipes protocol on my SQL Servers if they are going to accept network connections from clients. I am not sure if named pipes still poises a threat to my network in a 10GB Ethernet world, but I have spent a considerable amount of time troubleshooting network latencies caused by named pipes previously. For some background on TCP/IP vs NP, check out this article by devpro. Going into SQL Config manager and turning it of has become part of my routine, so imagine my surprise when one of the junior DBAs at my workplace told me that this caused a new cluster instance he was installing to fail. I could of course have dismissed it as some strange fault during installation, but a quick survey revealed that he had followed my SQL Server cluster installation manual to the letter. And this happened only in production, not in the identical QA environment. The only error message logged during SQL startup was this one:
“Error: 26058, Severity: 16, State: 1. A TCP provider is enabled, but there are no TCP listening ports configured. The server cannot accept TCP connections.”
The failover cluster service does not like SQL Server instances that are not network connectable, so it promptly shut down the instance, followed by a slew of error messages in the event log. If I remember correctly, the default timeout for a SQL Server instance to come online after it is started is 3 minutes. That is, it has to be ready for client connections and respond to the cluster service’s health checks in 3 minutes. This does not include any time needed to recover user databases, so in a physical environment with a functional storage subsystem this poses no problem at all. If you have slow hardware or other problems causing a slow startup you can increase the timeout through failover cluster manager, but I have personally never had the need to increase it.
So far, I have only seen this issue in the following scenario:
- MSSQL 2012 SP1
- Windows 2012
- Failover cluster
- >1 instance of SQL Server in the cluster
But I have been unable to reproduce the problem consistently in my lab. I have otherwise equal cluster environments, where some have this issue and some does not.
Steps to reproduce:
- Install instance 1
- Apply patches to Instance 1
- Test failover of instance 1
- Disable named pipes on instance 1
- Restart Instance 1
- Install instance 2
- Apply patches to instance 2
- Test Failover of instance 2
- Disable named pipes on instance 2
- Restart instance 2
- Instance 2 fails
Analysis
We spent quite a lot of time comparing the working cluster with the failing one. Down to firmware versions, drivers and patches applied. Everything was equal. I even considered a local disk fault, but failing over to another node didn’t change anything. The registry settings also looked ok. On a cluster, the registry settings are only available on the node currently owning the instance. The network settings are found in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL11.[Instance name]\MSSQLServer\SuperSocketNetLib\.
WARNING: NEVER try to change anything directly in the registry on a clustered instance, especially one that is not working. See Technet for a guide to changing clustered registry settings on an offline SQL instance.
As the sample shows, we even had Listen All enabled.
Workaround
After a lot of tinkering we came up empty and resorted to re-enabling Named Pipes. If anyone finds or know of a solution to this, please leave a comment below.
Note: the following guide is vague by design. It is very difficult to provide a proper step by step guide for this workaround covering all alternatives. If you have questions, leave a comment including details about your situation, and I will try to clarify.
To enable named pipes again, we just started sqlserver directly through the command line on the active node. This will usually bring it online. If not, stop the cluster service while you do this. Then we enabled named pipes in Config Manager and stopped the instance again. After that, we were able to start the instance as expected, albeit with named pipes enabled.
Warning: If you stop the cluster service on an active node, the cluster may try to fail over the instance to another node if there are other active nodes. Make sure the instance is listed as Failed before you try this. If all else fails, you may have to shut down the other nodes as well, thus taking down all instances in the cluster.