Error 87 when installing SQL Server Failover Cluster instance

Problem

When trying to install a SQL Server Failover Cluster instance, the installer fails with error 87. Full error message from the summary log:

Feature: Database Engine Services
Status: Failed
Reason for failure: An error occurred during the setup process of the feature.
Next Step: Use the following information to resolve the error, uninstall this feature, and then run the setup process again.
Component name: SQL Server Database Engine Services Instance Features
Component error code: 87
Error description: The parameter is incorrect.

And from the GUI:

Analysis

This is an error I have not come across before.

The SQL Server instance is kind of installed, but it is not working, so you have to uninstall it and remove it from the cluster to try again. This is rather common when a clustered installation fails though.

As luck would have it, the culprit was easy to identify. A quick look in the cluster log (the one for the SQL Server installation, not the actual cluster log) revealed that the IP address supplied for the SQL Server instance was invalid. The cluster in question was located in a very small subnet, a /26 subnet. The IP allocated was .63. A /26 subnet contains 64 addresses. As you may or may not know, the first and last addresses in a subnet are reserved. The first address (.0) is the network address, and the last address (yes, that would be .63) is reserved as the broadcast address. It is also common to reserve the second or second to last address for a gateway, that would be .1 or .62 in this case.

Snippet from the log:

Solution

Allocate a different IP address. In our case that meant moving the cluster to a different subnet. as the subnet was completely filled to the brim.

Action plan:

  • Replace the IP on node 1
  • Wait for the cluster to arbitrate using the heartbeat VLAN or direct attach crossover cable.
  • Add an IP to the cluster goup resource group and give it an address in the new subnet.
  • Bring the new IP online
  • Replace the IP on node 2
  • Wait for the cluster to achieve quorum
  • Remove the old IP from the Cluster Group resource group
  • Validate the cluster
  • Re-try the SQL Server Failover Cluster instance installation.

Connect SQL Server Management Studio to a different AD domain

Problem

  • SSMS is installed on a management workstation in DomainA.
  • The SQL Server is installed on a server in DomainB.
  • There is no trust between DomainA and DomainB.
  • The server is set to use AD Authentication only.’

Solution

Use runas /netonly to start SSMS.

The netonly switch will cause SSMS to use alternate credentials for all remote access. This will enable access to remote SQL Servers using the supplied credentials as long as you are able to authenticate to the domain. Network capture tests indicates that you need network access to a domain controller in DomainB from your management workstation for this to work.

  • Run the following command in the folder where SSMS.EXE is installed:
RUNAS /netonly /user:DomainB\user SSMS.EXE
  • Then connect to the server you want to talk to in DomainB as you would if you were running SSMS from a computer in DomainB.

SSMS will indicate that you are running as DomainA\user, but if you run a SELECT SYSTEM_USER command you will see that your commands are executed as DomainB\user. When you open the Connect to Server dialog, the DomainA user will be shown (and greyed out as usual), but you will actually connect as the specified DomainB user.

image
image

Be aware that if you want to connect to SQL Servers in several disjointed domains, you will need to have one window for each account. All of them will seem as they are using the DomainA account, so it can get a bit confusing. I recommend connecting to a server at once, that way you should be able to easily identify which domain your window is connecting to.

SQL Server 2016 sysadmin escalation using PowerUpSQL

Scenario

For some reason you need to gain sysadmin access to a SQL Server instance. Maybe you have inherited it from a DBA that was eaten by a sleepy tiger, or more likely, the SQL Server instance was installed by a consultant and the sysadmin password was forgotten a long time ago. Oh, and you want to do it while the SQL Server is running. No matter why, here is one possible solution, provided that you have local admin access. There are probably ways to escalate access without local admin as well. And of course there are tons of ways to escalate a normal user to an admin user, but that is not the focus of this post.

This procedure has been tested on SQL Server 2014 and 2016. On previous versions, simply running SQL Server Management Studio as admin locally on the server was enough, but on never versions that path has been blocked by default.

To make the process easier, we will utilize the PowerUpSQL PowerShell module from NetSPI.

Procedure

Note: One of the SQL Server instances used in the example is the default instance of the server IM01. As such, it has no instance specification on the form Server\instance. If your server has on or more named instances, you will have to specify the instance name.

You need to obtain local or remote console access to the server. The user has to be a local admin on the server, and should be a user on the SQL Server. Application service accounts are nice starting point if no one has any access to the SQL Server at all. That is, you may not have access to the server, but you have an application using it, and that application accesses the SQL Server using a service account in Active Directory. You can give that account temporary local admin access on the SQL Server. Then you can escalate it to sysadmin access, and then use that login to grant yourself access. You should of course remove sysadmin access from the service account when you are done.

As a sidenote, you will often find that the service account already has sysadmin access, or that the application has a hard-coded SQL Server login account (not in AD but a SQL Server specific account) that is called sa (sysadmin) or is a member of the sysadmin group.

  • Download the module from https://github.com/NetSPI/PowerUpSQL. There are ways to launch the module remotely, but in this example we are copying them to the server.
  • Copy the files to a local folder, we use C:\Temp in our example.
  • Get a copy of psexec.exe from sysinternals.
  • Open an administrative PowerShell session running as the local system.
  • PsExec.exe -i -s powershell.exe

image

  • Run the following commands in the black PowerShell console.
  • Verify that you are running as the nt authority\system account by running whoami.
  • Import-Module c:\Temp\PowerUpSQL.psd1
  • Enumerate local SQL Server instances
  • Get-SQLInstanceLocal
  • Check the current access level
  • Get-SQLServerInfo -Verbose -Instance Server\instance

image

  • Look for IsSysadmin in the output
  • Escalate access
    • Invoke-SQLEscalatePriv –Verbose –Instance “SQLServer1\Instance1”
    • Invoke-SQLImpersonateService -Verbose -Instance SQLServer1\Instance1

Note: Sadly, some of the screenshots were lost. I will add some new ones later if I remember.

  • If you are successful, you may execute TSQL code to grant yourself access.
  • Get-SQLServerInfo -Verbose -Instance Server\instance
  • Look for IsSysadmin in the output
  • Import SQLPS
  • Import-Module -Name Sqlps
  • Test that you are able to execute arbitrary TSQL
  • Invoke-Sqlcmd -Query “Select @@version” -serverinstance SQLServer1\Instance1
  • Run the TSQL necessary to grant yourself access. Sample:
  • Invoke-Sqlcmd -Query “ALTER SERVER ROLE [sysadmin] ADD MEMBER [Domain\user]” -serverinstance
  • De-escalate access
  • Invoke-SQLImpersonateService -Verbose -Rev2Self
  • Connect to the server using your chosen account.

If that did not work

There is another method, based on shutting the SQL Server down and starting it in single user mode.

  • Log in to the SQL Server as a local admin.
  • Shut down SQL Server and the SQL Server Agent.
  • image
  • Start an administrative command prompt.
  • Navigate to the SQL Server instance directory, usually something like this: “C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Binn”
  • If you are using a default instance, execute sqlservr.exe –m –c
  • For a named instance, use sqlservr.exe –m –c –s Instancename
  • If someone else keep nabbing your connection, use -m”SQLCMD” to only allow sqlcmd connections.
  • Open a new command prompt and navigate to the instance directory.
  • Execute sqlcmd.exe
  • You should get a 1> prompt.
  • Input “@@SERVERVERSION” on line 1 without the quotes and press enter.
  • Input “GO” at line 2 and again press enter. The server should respond with the current version.

image

  • Add yourself to the sysadmin role by entering the following lines:
USE [master]
GO
CREATE LOGIN [DOMAIN\user] FROM WINDOWS WITH DEFAULT_DATABASE=[master]
GO
ALTER SERVER ROLE [sysadmin] ADD MEMBER [DOMAIN\user]
GO

image

  • There will be no response, but you can verify by executing the following lines.
EXEC sp_helpsrvrolemember 'sysadmin'
GO
  • It will return a list of sysadmin group members.
  • When you are done, return to the first cmd window and press [Ctrl]+C to shut down the SQL Server.
  • Restart the SQL Server and SQL Server Agent services in the normal mode.

Unexpected sense, Sense key B code 41

Problem

The system event log is running over with Event Id 2095 from Server Administrator. “Unexpected sense. SCSI sense data: Sense key:  B Sense code: 41 Sense qualifier:  0:  Physical Disk 0:1:20 Controller 1, Connector 0”.

Several events per second at the most.

image

 

Analysis

https://en.wikipedia.org/wiki/Key_Code_Qualifier shows a list of common Sense Key Codes. Key B translates to aborted command. Key 41 is not listed, but it is very likely that something is not as it should be with disk 20 on controller 1. OMSA can tell you which disk this is, and what arrays it is a part of.

image

OMSA claims that the disk is working fine, but unless the drive is trying to tell me that it has found some missing common sense, I think I have to respectfully disagree. Such faults are usually not a good sign, especially when they are so prevalent as in this case. So I performed a firmware/driver upgrade as that will often provide some insight. In this case SUU 1809 replaced SUU 1803, that is a 6 month span in revisions.

The upgrade resulted in a new error:

Event ID 2335, Controller event log: PD 14(e0x20/s20) is not a certified drive:  Controller 1 (PERC H730P Adapter) .

image

 

OMSA tells me that the disk is in a failed state.

image

 

Time to register a support case with the evil wizards of the silver slanted E.

 

Solution

The disk was replaced. OMSA still complains about an error, specifically an enclosure error, but the iDRAC shows a green storage status.

OMSA:

image

image

 

iDRAC:

image

After restarting all the OMSA services and the iDRAC Service Module service the status returned to green in OMSA as well.

image

Microsoft Update with PSWindowsUpdate 2.0

Preface

This is an update to my previous post about PSWindowsUpdate located here: https://lokna.no/?p=2132. The content is pretty much the same, but updated for PSWindowsUpdate 2.0.

Most of my Windows servers are patched by WSUS, SCCM or a similar automated patch management solution at regular intervals. But not all. Some servers are just too important to be autopatched. This is a combination of SLA requirements making downtime difficult to schedule and the sheer impact of a botched patch run on backend servers. Thus, a more hands-on approach is needed. In W2012R2 and far back this was easily achieved by running the manual Windows Update application. I ran through the process in QA, let it simmer for a while and went on to repeat the process in production if no nefarious effects were found during testing. Some systems even have three or more staging levels. It is a very manual process, but it works, and as we are required to hand-hold the servers during the update anyway, it does not really cost anything. Then along came Windows Server 2016. Or Windows 10 I should really say, as the Update-module in W2016 is carbon copied from W10 without changes. It is even trying to convince me to install W10 Creators update on my servers…

clip_image001

In Windows Server 2016 the lazy bastards at Microsoft just could not be bothered to implement the functionality from W2012R2 WU. It is no longer possible to defer specific updates I do not want, such as the stupid Silverlight mess. If I want Microsoft update, then I have to take it all. And if I should become slightly insane and suddenly decide I want driver updates from WU, the only way to do that is to go through device manager and check every single device for updates. Or install WUMT, a shady custom WU client of unknown origin.

I could of course use WSUS or SCCM to push just the updates I want, but then I have to magically imagine what updates each server wants and add them to an ever growing number of target groups. Every time I have a patch run. Now that is expensive. If I had enough of the “special needs” servers to justify the manpower-cost, I would have done so long ago. Thus, another solution was needed…

PSWindowsUpdate to the rescue. PSWindUpdate is a Powershell module written by a user called MichalGajda enabling management of Windows Update through Powershell. You can find it here: https://www.powershellgallery.com/packages/PSWindowsUpdate/2.0.0.0. In this post I go through how to install the module and use it to run Microsoft Update in a way that resembles the functionality from W2012R2. You could tell the module to install a certain list of updates, but I found it easier to hide the unwanted updates. It also ensures that they are not added by mistake with the next round of patches.

Getting started

(See the following chapters for details.)

  • You should of course start by installing the module. This should be a one-time deal, unless a new version has been released since last time you used it. New versions of the module should of course be tested in QA like any other software.
  • Then, make sure that Microsoft Update is active.
  • Check for updates to get a list of available patches.
  • Hide any unwanted patches
  • Install the updates
  • Re-check for updates to make sure there are no “round-two” patches to install.

Continue reading “Microsoft Update with PSWindowsUpdate 2.0”

sql_text returns *password—–

Problem

While hunting for performance bottlenecks using Spotlight some months ago I came across some strange statements listed only as *password—– (the – continues for a couple of pages). This was new to me, and even a search on the pesky trackers-with-integrated-search came up blank. We passed it off to the developers and never heard back, and I forgot about the whole thing. Then, during a stress test in QA it popped up on the radar again, and this time I decided to dig into it myself.

Analysis

Being a good boy, I was using extended events to trace down the problem. I do not know why, but profiler still feels more at home. Anyhow, I set up a session triggering on sql_text like “%*password—%. Looking something like this:

CREATE EVENT SESSION [pwdstar] ON SERVER
ADD EVENT sqlserver.rpc_starting( ACTION(sqlserver.client_app_name,sqlserver.client_connection_id,sqlserver.client_hostname,sqlserver.client_pid,sqlserver.database_name,sqlserver.session_id,sqlserver.sql_text,sqlserver.tsql_stack,sqlserver.username)
WHERE ([package0].[greater_than_uint64]([sqlserver].[database_id],(4)) AND [package0].[equal_boolean]([sqlserver].[is_system],(0)) AND ([sqlserver].[like_i_sql_unicode_string]([sqlserver].[sql_text],N'%*password--%') AND [sqlserver].[equal_i_sql_unicode_string]([sqlserver].[database_name],N'DBNAME'))))
ADD TARGET package0.event_file(SET filename=N'pwdstar',max_file_size=(200))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=ON,STARTUP_STATE=OFF
GO

I came back an hour later to look at the results, and was surprised to find that the tsql_stack returned a more normal looking statement:

image

Back to the trackers I went, this time trying to figure out why sql_text differed from statement. This produced far better results, and I came across this article: https://www.sqlskills.com/blogs/jonathan/understanding-the-sql_text-action-in-extended-events/, and following the links produced the following statement from Mr. Keheyias:

· If the sql_text Action returns:



*password------------------------------


the text was stripped by the engine because it contained a password reference in it.  This functionality occurs in SQL Trace as well and is a security measure to prevent passwords from being returned in the trace results.

As I said, this was news to me, but the function seems to be an old one not necessarily related to xEvents. Extended Events does however use a more aggressive masking algorithm than what is used in profiler from what I can tell.

Solution

If you come across such queries in your profiler or xEvents results, you are dealing with something SQL server believes is password related. The results are thus hidden. The easiest way to get the statement is capturing tsql_stack as well. In SQL 2016 you can capture the statement directly:

image

Microsoft Update with PSWindowsUpdate

Preface

Most of my Windows servers are patched by WSUS, SCCM or a similar automated patch management solution at regular intervals. But not all. Some servers are just too important to be autopatched. This is a combination of SLA requirements making downtime difficult to schedule and the sheer impact of a botched patch run on backend servers. Thus, a more hands-on approach is needed. In W2012R2 and far back this was easily achieved by running the manual Windows Update application. I ran through the process in QA, let it simmer for a while and went on to repeat the process in production if no nefarious effects were found during testing. Some systems even have three or more staging levels. It is a very manual process, but it works, and as we are required to hand-hold the servers during the update anyway, it does not really cost anything. Then along came Windows Server 2016. Or Windows 10 I should really say, as the Update-module in W2016 is carbon copied from W10 without changes. It is even trying to convince me to install W10 Creators update on my servers…

clip_image001

In Windows Server 2016 the lazy bastards at Microsoft just could not be bothered to implement the functionality from W2012R2 WU. It is no longer possible to defer specific updates I do not want, such as the stupid Silverlight mess. If I want Microsoft update, then I have to take it all. And if I should become slightly insane and suddenly decide I want driver updates from WU, the only way to do that is to go through device manager and check every single device for updates. Or install WUMT, a shady custom WU client of unknown origin.

I could of course use WSUS or SCCM to push just the updates I want, but then I have to magically imagine what updates each server wants and add them to an ever growing number of target groups. Every time I have a patch run. Now that is expensive. If I had enough of the “special needs” servers to justify the manpower-cost, I would have done so long ago. Thus, another solution was needed…

PSWindowsUpdate to the rescue. PSWindUpdate is a Powershell module written by a user called MichalGajda on the technet gallery enabling management of Windows Update through Powershell. In this post I go through how to install the module and use it to run Microsoft Update in a way that resembles the functionality from W2012R2. You could tell the module to install a certain list of updates, but I found it easier to hide the unwanted updates. It also ensures that they are not added by mistake with the next round of patches.

Getting started

(See the following chapters for details.)

  • You should of course start by installing the module. This should be a one-time deal, unless a new version has been released since last time you used it. New versions of the module should of course be tested in QA like any other software.
  • Then, make sure that Microsoft Update is active.
  • Check for updates to get a list of available patches.
  • Hide any unwanted patches
  • Install the updates
  • Re-check for updates to make sure there are no “round-two” patches to install.

Continue reading “Microsoft Update with PSWindowsUpdate”

No Microsoft Update

Problem

I was preparing to roll out SQL Server 2016 and Windows Server 2016 and had deployed the first server in  production. I suddenly noticed that even if I selected “Check online for updates from Microsoft Update” in the horrible new update dialog, I never got any of the additional updates. Btw, this link/button only appears when you have an internal SCCM or WSUS server configured. Clicking the normal Check For Updates button will get updates from WSUS.

image

 

Analysis

This was working as expected in the lab, but the lab does not have the fancy System Center Configuration Manager and WSUS systems. So of course I blamed SCCM and uninstalled the agent. But to no avail, still no updates. I lurked around the update dialog and found that the “Give me updates for other Microsoft products..” option was grayed out and disabled. I am sure that I checked this box during installation, as I remember looking for its location. But it was no longer selected, it was even grayed out.

image

This smells of GPOs. But I also remembered trying to get this option checked by a GPO to save time during installation, and that it was not possible to do so in Win2012R2. Into the Group Policy Manager of the lab DC I went…

It appears that GPO management of the Microsoft Update option has been added in Win2016:

image

This option is not available in Win2012R2, but as we have a GPO that defines “Configure Automatic Updates”, it defaults to disabled.

solution

Alternative 1: Upgrade your domain controllers to Win2016.

Alternative 2: Install the Win2016 .admx files on all your domain controllers and administrative workstations.

Then, change the GPO ensuring that “Install updates for other Microsoft products is enabled. Selecting 3 – Auto download used to be a safe setting.

Alternative 3: Remove the GPO or set “Configure Automatic Updates” to “Not Configured”, thus allowing local configuration.

The SQL 2012 SP3 update killed my cluster

Problem

After an upgrade to SQL 2012 SP3, a clustered instance fails to start listing error 33009, 912 and 3417:

2016-04-28 14:32:23.39 spid14s     Error: 33009, Severity: 16, State: 2.
2016-04-28 14:32:23.39 spid14s     The database owner SID recorded in the master database differs from the database owner SID recorded in database 'msdb'. You should correct this situation by resetting the owner of database 'msdb' using the ALTER AUTHORIZATION statement.
2016-04-28 14:32:23.39 spid14s     Error: 912, Severity: 21, State: 2.
2016-04-28 14:32:23.39 spid14s     Script level upgrade for database 'master' failed because upgrade step 'msdb110_upgrade.sql' encountered error 33009, state 2, severity 16. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the 'master' database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.
2016-04-28 14:32:23.39 spid14s     Error: 3417, Severity: 21, State: 3.
2016-04-28 14:32:23.39 spid14s     Cannot recover the master database. SQL Server is unable to run. Restore master from a full backup, repair it, or rebuild it. For more information about how to rebuild the master database, see SQL Server Books Online.

Something went down the steep road to oblivion during the upgrade process, and the instance is as dead as an overrun squirrel. To be more specific, the master database does not like the looks of the owner attribute on the msdb database, and is down right refusing to complete the upgrade process.

Analysis

  • Start the instance with trace flag 902 to disable the upgrade scripts and get the server back online. You can do this by command line if you wish, but for clustered instances it is easier to use SQL Server config manager. There will be some existing startup parameters. Do not mess with those, just add a new one using the handy “Add” button provided.
  • image
  • Wait for the instance to start, and log in using SSMS.
  • To find out what SID is where, run the following script
--Sid in master
 Select SID from master..sysdatabases Where  Name = 'msdb'
--Sid in database
 Select [SID]  From msdb.sys.database_principals Where Name = 'DBO'

It will probably show 0x01  (sa) for one of the results, and an Active directory SID for the other one:

image

If you want to know what username SQL Server has stored for the SID, use this command, replacing 0x01 with the SID from the previous result:

 --Sid to username
 SELECT  Name as [LoginName]  FROM   master..syslogins  Where  SID = 0x01

Be aware that if the username has been changed in AD, lets say you changed DOMAIN\johnsmith to DOMAIN\jsmith1, SQL Server will not necessarily be aware of this. You can validate the username in AD using powershell. See https://technet.microsoft.com/en-us/library/ff730940.aspx. And just to make the point clear, do not mess around with account names in AD if they are linked to SQL Server. It may lead to login failures, and possibly cows being abducted by aliens at your local dairy farm.

Solution

Change the database owner to match the value stored in master. Remember to use brackets for AD users.

Alter Authorization on Database::msdb To [Domain\user]

Then remove the trace flag and restart the instance to try the upgrade scripts again. If the current owner is not the owner you want, allow the upgrade to finish and change it again.

LCK_M_U Wait troubleshooting

Problem

I got a callout from our monitoring system, alerting me of a high Average Wait Time on one of my production servers:

image

The baseline average wait value for this server is around 9ms, so an average in the 900ms range is a definitive sign of something. Whether it is a sign of something bad or not is for the investigation to decide. All other operating parameters where in the normal range, both for the SQL Server and the connected application servers. Usually, such warnings are accompanied by a complaint from the application team about slow application performance, but not today.

Analysis

I started out running Mr. Randal’s brilliant wait stats scripts. As the graph from the monitoring system contained a lot of jitter, I went with http://www.sqlskills.com/blogs/paul/capturing-wait-statistics-period-time/ and let it run for the 30 minute default period when the waits were high to se what was going on. The result revealed that LCK_M_U had joined the usual band of ASYNC_NETWORK_IO, TRACEWRITE and CXPACKET waits:

image

LCK_M_U indicates that someone somewhere is waiting for an update lock. This particular system has a very high Read ratio versus Writes, and most writes are done during nightly update tasks. As such, this wait type was not expected and further analysis was needed. I found an old article by Kendra Little describing how to collect data using  sp_WhoIsactive: http://www.littlekendra.com/2011/02/01/whoisactive/

Using her example as a guide, I created a script to collect some data in my monitoring database. To use this script, you need a user database that you can play around with and an up to date version of Adam Machanics sp_WhoIsactive. I tried using the @find_block_leaders option, but I ran in to a known recursion level limitation so I had to remove it. The script has three parts:

Create a table

This part creates a table tagged with todays date to make it easy to separate different troubleshooting sessions. You could also schedule daily automatic runs of the entire script to collect data at specific times each day.

DECLARE @destination_table VARCHAR(4000) ;
SET @destination_table = 'WhoIsActive_' + CONVERT(VARCHAR, GETDATE(), 112) ;

DECLARE @schema VARCHAR(4000) ;
EXEC dbo.sp_WhoIsActive
@get_transaction_info = 1,
@get_plans = 1,
@find_block_leaders = 0,
@get_locks=true,
@RETURN_SCHEMA = 1,
@SCHEMA = @schema OUTPUT;

SET @schema = REPLACE(@schema, '', @destination_table) ;

PRINT @schema
EXEC(@schema) ;
go

Collect data

You can run this part several times, and the intervals can be adjusted to suit your needs. Just be aware that if you cross midnight, you either have to create another table for the new day, or modify the collection script to log to yesterdays table.

DECLARE @destination_table VARCHAR(4000) ;
SET @destination_table = 'WhoIsActive_' + CONVERT(VARCHAR, GETDATE(), 112) ;
DECLARE @msg NVARCHAR(1000) ;
DECLARE @numberOfRuns INT ;
SET @numberOfRuns = 100 ;

WHILE @numberOfRuns > 0
    BEGIN;
        EXEC dbo.sp_WhoIsActive 
		@get_transaction_info = 1, 
		@get_plans = 1, 
		@find_block_leaders = 0, 
		@get_locks=true,
		@DESTINATION_TABLE = @destination_table ;

        SET @numberOfRuns = @numberOfRuns - 1 ;

        IF @numberOfRuns > 0
            BEGIN
                SET @msg = CONVERT(CHAR(19), GETDATE(), 121) + ': ' +
                 'Logged info. Waiting...'
                RAISERROR(@msg,0,0) WITH nowait ;

                WAITFOR DELAY '00:00:05'
            END
        ELSE
            BEGIN
                SET @msg = CONVERT(CHAR(19), GETDATE(), 121) + ': ' + 'Done.'
                RAISERROR(@msg,0,0) WITH nowait ;
            END

    END ;
GO

The default is 100 runs at 5 second intervals. I had to run the collection script three times to get the data I wanted, so don’t give up if you don’t catch it on the first run.

Analyze the data

Two sample analysis scripts. The first one gives you all the collected data sorted by collection time, the second filters out the LCK_M_U waits only and has a hardcoded table name as I didn’t finish the first day.

DECLARE @destination_table NVARCHAR(2000), @dSQL NVARCHAR(4000) ;
SET @destination_table = 'WhoIsActive_' + CONVERT(VARCHAR, GETDATE(), 112) ;
SET @dSQL = N'SELECT collection_time, * FROM dbo.' +
 QUOTENAME(@destination_table) + N' order by 1 desc' ;
print @dSQL
EXEC sp_executesql @dSQL
go
select collection_time, * 
from JKL_Log.dbo.WhoIsActive_20150616
where session_id not like 51 and wait_info like '%LCK_M_U%'
order by 1 desc;

The result clearly identified the waiting code. This code is not necessarily the culprit, it could be another slow task hogging locks on the data.

SNAGHTML33261fb4

Solution

With the troublesome T-SQL code identified, the next step would be debugging the application. In this case, the code was from an Entity Framework app, and I have to get the developers of said application to look into the issue.