Orchestrator bug kills database server

Update

20131031: Added code to the solution that limits the size of the error log files. So far it has only been tested on SQL Server 2012 SP1 CU2.

Problem

The database server or operating system starts complaining about low disk space. Analysis reveals that the volume containing the default MSSQL log folder is very low on space, possibly with 0 bytes free. Further analysis shows that the ERRORLOG files are taking up all the remaining space on the volume.
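To confirm this diagnosis from inside SQL Server, a few queries like the ones below can help. This is a sketch, assuming SQL Server 2012 or later (needed for SERVERPROPERTY('ErrorLogFileName')). Note that sp_enumerrorlogs and xp_fixeddrives are undocumented procedures, so their output format may change between versions.

```sql
USE [master];
GO
-- Full path of the current ERRORLOG file (SQL Server 2012 and later)
SELECT SERVERPROPERTY('ErrorLogFileName') AS CurrentErrorLog;
GO
-- Current and archived error logs with their sizes in bytes
-- (undocumented procedure)
EXEC sp_enumerrorlogs;
GO
-- Free space in MB on each fixed drive (undocumented extended procedure)
EXEC xp_fixeddrives;
GO
```

If the volume that holds the current ERRORLOG has close to 0 MB free and the log file sizes run into gigabytes, you are looking at the problem described here.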

Analysis

The ERRORLOG files are filled to the brim with messages like this one:

Date		07.01.2013 21:09:09
Log		SQL Server (Current - 07.01.2013 14:57:00)

Source		spid29s

Message
The activated proc '[Microsoft.SystemCenter.Orchestrator.Maintenance].[MaintenanceWorker]' running on queue 'OrchestratorDB.Microsoft.SystemCenter.Orchestrator.Maintenance.MaintenanceServiceQueue' output the following:  'Error: 9002, The transaction log for database 'OrchestratorDB' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases'
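The message itself points to the log_reuse_wait_desc column in sys.databases. A minimal query for it, together with the current log usage, could look like the one below. The database name OrchestratorDB is taken from the message above; adjust it if your Orchestrator database has a different name.

```sql
-- Why can't the log space be reused? Typical values include
-- LOG_BACKUP, ACTIVE_TRANSACTION and NOTHING.
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'OrchestratorDB';

-- Log size (MB) and percentage of log space in use for every database
DBCC SQLPERF(LOGSPACE);
```

A value of LOG_BACKUP combined with the full recovery model usually means the log backups are not running, which matches the failed backup job scenario in the solution below.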



Orchestrator has a maintenance process that does not handle the situation where the database transaction log is unable to grow, either because of growth restrictions or because the disk is full. When this happens, it floods the ERRORLOG with messages like the one above. Because the process retries several times each second, this happens very quickly: even a 10 GB volume fills up in hours. If the situation is left unchecked for a couple of days, the risk of NTFS corruption on the volume is high.
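To see which of the two causes applies, you can inspect the size and growth settings of the log file. A sketch, again assuming the database is called OrchestratorDB. In sys.master_files, size and max_size are stored in 8 KB pages, max_size = -1 means the file may grow until the disk is full, and growth = 0 means autogrowth is disabled.

```sql
-- Size, growth settings and growth limit of the OrchestratorDB log file(s).
-- size/max_size are in 8 KB pages; cast to bigint to avoid int overflow.
SELECT name,
       physical_name,
       CAST(size AS bigint) * 8 / 1024 AS size_mb,
       CASE max_size
            WHEN -1 THEN NULL            -- unlimited (until the disk is full)
            ELSE CAST(max_size AS bigint) * 8 / 1024
       END AS max_size_mb,
       growth,                           -- percent if is_percent_growth = 1, else 8 KB pages
       is_percent_growth
FROM sys.master_files
WHERE database_id = DB_ID(N'OrchestratorDB')
  AND type_desc = N'LOG';
```

If size_mb has reached max_size_mb, or growth is 0, the log is blocked by a restriction; otherwise the disk itself is most likely full.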

Solution

  • Disable access to the Orchestrator database or stop the Orchestrator servers to prevent further bloating.
  • Stop the database instance.
  • Delete the bloated error log to free up space.
  • Restart the instance.
  • Change the number of archived error logs SQL Server keeps (the default is 6; the sample sets it to 10) and limit the size of each error log file to about 50 MB. Together with the current log, this gives you a total of 11 error log files, each with a maximum size of 50 000 KB, for a total of about 550 MB.
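    USE [master]
    GO
    -- Keep 10 archived error logs in addition to the current one (default is 6).
    -- xp_instance_regwrite is undocumented; it resolves the key for the current instance.
    EXEC xp_instance_regwrite N'HKEY_LOCAL_MACHINE',
     N'Software\Microsoft\MSSQLServer\MSSQLServer',
     N'NumErrorLogs', REG_DWORD, 10
    GO
    -- Limit each error log file to 50 000 KB (tested on SQL Server 2012 SP1 CU2).
    EXEC xp_instance_regwrite N'HKEY_LOCAL_MACHINE',
     N'Software\Microsoft\MSSQLServer\MSSQLServer',
     N'ErrorLogSizeInKb', REG_DWORD, 50000
    GO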
  • Fix the underlying problem with the Orchestrator database transaction log. Such problems are usually caused either by a failed backup job (so the log is never truncated) or simply by the need to allocate more space for the data. If a failed backup job is the cause, consider shrinking the transaction log back to its normal size once the backups are running again. See https://lokna.no/?p=349 for more information about shrinking troublesome logs (Norwegian version only).
  • Enable access to the Orchestrator database.
  • Monitor the error logs and check whether the values you chose for log size and number of logs fit your workload. If the logs together cover at least a day of messages, but not more than a month, you are in the butter zone.
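For the shrink step, a minimal sketch is shown below. It assumes the log's logical file name is OrchestratorDB_log and uses a target of 1024 MB; both are hypothetical placeholders, so look up the real name with the first query and pick a target that matches the normal size of your log. A log file can only shrink once the space at its end is no longer in use, so run this after a successful log backup (full recovery model).

```sql
USE [OrchestratorDB];
GO
-- Logical name and current size (MB) of the log file
SELECT name, CAST(size AS bigint) * 8 / 1024 AS size_mb
FROM sys.database_files
WHERE type_desc = N'LOG';
GO
-- Shrink the log to a target size in MB.
-- 'OrchestratorDB_log' and 1024 are hypothetical placeholders.
DBCC SHRINKFILE (N'OrchestratorDB_log', 1024);
GO
```

If the file does not shrink as far as expected, check log_reuse_wait_desc in sys.databases again; the log may still be waiting for a backup.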
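To verify that the error log settings were stored, and to see how much time the error logs cover, you can read the registry values back with xp_instance_regread and list the logs with sp_enumerrorlogs. Both are undocumented, like xp_instance_regwrite, so treat this as a sketch that may behave differently on other versions.

```sql
USE [master];
GO
-- Read back the error log settings (undocumented extended procedure)
DECLARE @NumErrorLogs int, @ErrorLogSizeInKb int;
EXEC xp_instance_regread N'HKEY_LOCAL_MACHINE',
     N'Software\Microsoft\MSSQLServer\MSSQLServer',
     N'NumErrorLogs', @NumErrorLogs OUTPUT;
EXEC xp_instance_regread N'HKEY_LOCAL_MACHINE',
     N'Software\Microsoft\MSSQLServer\MSSQLServer',
     N'ErrorLogSizeInKb', @ErrorLogSizeInKb OUTPUT;
SELECT @NumErrorLogs AS NumErrorLogs, @ErrorLogSizeInKb AS ErrorLogSizeInKb;
GO
-- Archive number, date and size in bytes of each error log.
-- Compare the dates of the oldest and newest entries to see
-- how much time the logs cover.
EXEC sp_enumerrorlogs;
GO
```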

Author: DizzyBadger

SQL Server DBA, Cluster expert, Principal Analyst
