How to collect the AOAG replica state change log

Problem

You want to read the AOAG replica state change log for the entire cluster, not just from a single node. You could of course read it from one node at the time and concatenate the data manually, but where is the fun in that?

Analysis

We have many possible avenues to explore:

  • Central Management Server
  • Storing the xel files on a network share
  • Linking the servers
  • SQLCMD
  • PowerShell

After considering my options I landed on PowerShell, more specifically on the dbatools PowerShell module.  To be even more specific, I employed the invoke-dbaquery command: Invoke-DbaQuery | dbatools | SQL Server automation with PowerShell.

Why? I hear you ask. Well, because I already have dbatools installed and the other options required more hassle. And who wants that? It is not like I spent a day making this work or anything. That would be stupid. Especially if you have only one cluster. If you have many on the other hand…

The usual caveats and gotchas

I am not responsible for what you do with the code and advice given here. If you do not understand it, do your research and get help before you run this in production. It might not even work at all. Tests have been performed at SQL 2022/25 and Win 2022/2025 only.

A simple solution for simple people, eh simple setups

Let us be completely honest; if you have one or two clusters and need these data once a year, you do not need a fancy solution. Just run the script below on each node, paste the results into excel and perform your further analysis in excel:

WITH exEvents AS (SELECT object_name, CONVERT(XML, event_data) AS data
FROM sys.fn_xe_file_target_read_file('AlwaysOn*.xel', null, null, null)
WHERE object_name = 'availability_replica_state_change'
)
--select * from exEvents
SELECT  
	data.value('(/event/@timestamp)[1]','datetime') AS [TimeStamp],
	data.value('(event/data[@name=''availability_group_name''])[1]','varchar(max)') AS [AGName],
	data.value('(event/data[@name=''availability_replica_name''])[1]','varchar(max)') AS [AGReplicaName], 
	data.value('(event/data[@name=''previous_state'']/value)[1]','varchar(max)') AS [PreviousStateID], 
	data.value('(event/data[@name=''previous_state'']/text)[1]','varchar(max)') AS [PreviousState], 
	data.value('(event/data[@name=''current_state'']/value)[1]','varchar(max)') AS [CurrentStateID],
	data.value('(event/data[@name=''current_state'']/text)[1]','varchar(max)') AS [CurrentState]
FROM exEvents
ORDER BY TimeStamp DESC

The results should look something like this:

What the script does when it works

  • Gathers output files (.xel files) from the AlwaysOn_health session. If you do not have this session, you have to re-create it. The easiest way to do so is to copy it from a cluster node where it has not been tampered wth.
  • The data is stored in the temporary table exEvents
  • And the fancy select statement parses said xml data into something we can read without getting a headache

A more complex solution for stubborn people

…or people with many clusters and/or more complex needs.

First, let us make a table to store our results. You can create this table in any database on any server reachable from the machine on which you run these commands. You also need access to all the cluster nodes and a recent copy of the dbatools PowerShell module. Then, we need a script file to execute. Let us call it AOAGLog.sql. Or perhaps Bob.sql. The name does not matter, but it is referenced in the next step.

--Results table
USE TestDB;
CREATE TABLE xeData(object_name nVarchar(MAX), data XML);

--AOAGLog.sql, used to collect xel files. Referenced by invoke-dbaquery below
SELECT object_name, CONVERT(XML, event_data) AS data
FROM sys.fn_xe_file_target_read_file('AlwaysOn*.xel', null, null, null)
WHERE object_name = 'availability_replica_state_change'

Here comes the PowerShell part

We need to execute tha AOAGLog script against all the nodes in the cluster. First we create an array with all the node names. Then we run Invoke DbaQuery and store the result in the $results variable.

$Instances=  @( 'node1', 'node2', 'node3' )
$results = Invoke-DbaQuery -SqlInstance $Instances -File C:\Tools\aoaglog.sql

Next, we take the results and store them in the exData table. This table does not have to be located on one of the nodes. Remember to empty the table between runs to avoid duplicates.

foreach ($row in $results) {
    Invoke-DbaQuery -SqlInstance "xeDataServer" -Database "TestDB" -Query @"
  INSERT INTO xeData (object_name, data)
  VALUES ('$($row.object_name)', '$($row.data)')
  "@
  }

Parse the results

Then, all that remains is to parse the results and analyse them. The default is sorted by timestamp, newest first, but you can play around with this as needed. You can also store the results for later reference.

-- Parse results (runs on the server hosting the results table)
   USE TestDB;
   SELECT  
   	data.value('(/event/@timestamp)[1]','datetime') AS [TimeStamp],
   	data.value('(event/data[@name=''availability_group_name''])[1]','varchar(max)') AS [AGName],
   	data.value('(event/data[@name=''availability_replica_name''])[1]','varchar(max)') AS [AGReplicaName], 
   	data.value('(event/data[@name=''previous_state'']/value)[1]','varchar(max)') AS [PreviousStateID], 
   	data.value('(event/data[@name=''previous_state'']/text)[1]','varchar(max)') AS [PreviousState], 
   	data.value('(event/data[@name=''current_state'']/value)[1]','varchar(max)') AS [CurrentStateID],
   	data.value('(event/data[@name=''current_state'']/text)[1]','varchar(max)') AS [CurrentState]
   FROM xeData
   ORDER BY TimeStamp DESC

sql_text returns *password—–

Problem

While hunting for performance bottlenecks using Spotlight some months ago I came across some strange statements listed only as *password—– (the – continues for a couple of pages). This was new to me, and even a search on the pesky trackers-with-integrated-search came up blank. We passed it off to the developers and never heard back, and I forgot about the whole thing. Then, during a stress test in QA it popped up on the radar again, and this time I decided to dig into it myself.

Analysis

Being a good boy, I was using extended events to trace down the problem. I do not know why, but profiler still feels more at home. Anyhow, I set up a session triggering on sql_text like “%*password—%. Looking something like this:

CREATE EVENT SESSION [pwdstar] ON SERVER
ADD EVENT sqlserver.rpc_starting( ACTION(sqlserver.client_app_name,sqlserver.client_connection_id,sqlserver.client_hostname,sqlserver.client_pid,sqlserver.database_name,sqlserver.session_id,sqlserver.sql_text,sqlserver.tsql_stack,sqlserver.username)
WHERE ([package0].[greater_than_uint64]([sqlserver].[database_id],(4)) AND [package0].[equal_boolean]([sqlserver].[is_system],(0)) AND ([sqlserver].[like_i_sql_unicode_string]([sqlserver].[sql_text],N'%*password--%') AND [sqlserver].[equal_i_sql_unicode_string]([sqlserver].[database_name],N'DBNAME'))))
ADD TARGET package0.event_file(SET filename=N'pwdstar',max_file_size=(200))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=ON,STARTUP_STATE=OFF
GO

I came back an hour later to look at the results, and was surprised to find that the tsql_stack returned a more normal looking statement:

image

Back to the trackers I went, this time trying to figure out why sql_text differed from statement. This produced far better results, and I came across this article: https://www.sqlskills.com/blogs/jonathan/understanding-the-sql_text-action-in-extended-events/, and following the links produced the following statement from Mr. Keheyias:

· If the sql_text Action returns:



*password------------------------------


the text was stripped by the engine because it contained a password reference in it.  This functionality occurs in SQL Trace as well and is a security measure to prevent passwords from being returned in the trace results.

As I said, this was news to me, but the function seems to be an old one not necessarily related to xEvents. Extended Events does however use a more aggressive masking algorithm than what is used in profiler from what I can tell.

Solution

If you come across such queries in your profiler or xEvents results, you are dealing with something SQL server believes is password related. The results are thus hidden. The easiest way to get the statement is capturing tsql_stack as well. In SQL 2016 you can capture the statement directly:

image