The event log is littered with regular copies of the following event:
"GetSQL2008SPNState.vbs : The Query 'SELECT ProtocolName FROM ServerNetworkProtocol where Enabled = true and InstanceName = 'INSTANCE'' did not return any valid instances. Please check to see if this is a valid WMI Query.. Invalid class"
I do not know what impact this has on SCOM (if any), but the warning is escalated to the Administrative Events view and thus makes it harder to spot other errors, as it occurs very frequently. On one of my servers the error occurred every 10 minutes or so. Furthermore, I suspect the cause of the problem to be changes in Run As accounts in SCOM, but I am not certain.
I found this on the Technet forums http://social.technet.microsoft.com/Forums/en-US/operationsmanagermgmtpacks/thread/01eff618-1087-4b6a-9d3f-9f1402ddf3f4:
Re-register the SQL management object (MOF) data. MOF (Managed Object Format) files define WMI classes, and SCOM uses these classes to monitor the SQL Server instance(s) on the server. Perform the following operations:
- Start an administrative command prompt
- Browse to “C:\Program Files (x86)\Microsoft SQL Server\100\Shared”
- Run mofcomp sqlmgmproviderxpsp2up.mof
- Restart the SCOM agent service (System Center Management)
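The steps above as a command sequence, assuming the default shared-files path from the list and that the SCOM agent service is registered under its usual service name, HealthService:

```shell
rem Run from an elevated command prompt
cd /d "C:\Program Files (x86)\Microsoft SQL Server\100\Shared"
mofcomp sqlmgmproviderxpsp2up.mof
rem Restart the SCOM agent (display name "System Center Management")
net stop HealthService
net start HealthService
```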
Every time a maintenance plan-generated SQL Agent job tries to execute, the following error is logged in the system event log:
Log Name: System
Date: 02.02.2013 16:00:02
Event ID: 10016
Task Category: None
User: SQL AGENT Service account
Computer: SQL Server
The application-specific permission settings do not grant Local Launch permission for the COM Server application with CLSID to the user "SQL AGENT Service account" SID (S-1-5-21-) from address LocalHost (Using LRPC). This security permission can be modified using the Component Services administrative tool.
Searching the registry for the CLSID revealed that the DCOM application in question is MsDtsServer100:
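The registry lookup can also be scripted with reg.exe; the CLSID below is a placeholder for the value taken from the 10016 event. The default value of the CLSID key (and, if the event lists one, the AppID key) usually names the application:

```shell
rem Read the default value of the CLSID key to identify the DCOM application
reg query "HKCR\CLSID\{CLSID-from-the-event}" /ve
rem If the event also lists an APPID, its key may carry the friendlier name
reg query "HKLM\SOFTWARE\Classes\AppID\{APPID-from-the-event}" /ve
```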
A quick peek in the settings reveals that only local administrators and SYSTEM are granted the Local Launch permission by default:
During further analysis I discovered that this error only occurred on one of the cluster nodes (the server in question is part of a two-node failover cluster). I then found that the SQL Agent service account had been added to local admins on the other cluster node, but not on this one. Whether or not the SQL Agent service account should be a member of local admins is debatable, but it sure gives you a lot of grief if it isn't.
1: Add the SQL Agent service account to the local administrators group on the server.
2: Or give the SQL Agent service account explicit local launch permissions on MsDtsServer100 using dcomcnfg:
Trying to restore a backup of a database on a different server than the one where the backup originates generates the following error message: "The operating system returned the error '5(failed to retrieve text for this error. Reason: 15105)' while attempting 'RestoreContainer::ValidateTargetForCreation'":
Error 5 is, as always, access denied. I don't know why the text retrieval fails, but that is a problem for another day. Restoring backups originating from the destination server on the destination server works as expected, and the SQL Server service account has the required permissions granted on both the .bak file and the target folder. Restoring the same backup file on other servers during staging also worked as expected. I thus concluded that there had to be some difference between the staging and production servers causing the issue.
- staging servers are virtual and stand-alone servers, production is a physical cluster
- staging servers use vm drives as data volumes, production has SAN disks attached to mount points
- staging has a couple of cores and > 20 GB of RAM, production has 16 cores and > 200 GB
Then it hit me: mount points are renowned for causing strange permission issues, due to the fact that the permissions for the mount point and for the mounted volume itself are stored separately in different ACLs:
Further investigation revealed that the service account had full control permissions in both ACLs, but the volume permissions on the data and transaction log volumes were granted via group membership, while the mount point permissions on those volumes were granted explicitly. This doesn't seem to be a problem for other operations, but when you try to restore a database to a different destination than where it came from, the service account needs explicit permissions on the destination folder(s). That is, when the original file name and Restore As are not the same.
Grant the SQL Server service account explicit permissions on both the volume and the mount point. If you have different mount points for transaction log and data files, you have to do this on both folders. Furthermore, I would guess similar errors could occur if the service account lacks access to the source .bak file.
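As a sketch, the grants can be scripted with icacls; the mount point paths and the service account name below are hypothetical examples, so substitute your own:

```shell
rem Grant the SQL Server service account full control, inherited by subfolders
rem and files ((OI)(CI)F). With the volume mounted, the path resolves to the
rem ACL of the mounted volume root; the mount point folder's own ACL lives on
rem the parent volume and may need to be set separately.
icacls "E:\Data" /grant "DOMAIN\svc-sqlserver:(OI)(CI)F"
icacls "E:\Log" /grant "DOMAIN\svc-sqlserver:(OI)(CI)F"
```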
20131031: Code to limit the size of the error log files added to the solution. So far only tested on SQL 2012 SP1 CU2.
The Database server/operating system starts complaining about low disk space. Analysis reveals that the volume containing the default MSSQL log folder is very low on space, maybe even 0 bytes free. Further analysis shows that the ERRORLOG files are taking up all remaining space on the volume.
The ERRORLOG files are filled to the brim with messages like this one:
Date 07.01.2013 21:09:09
Log SQL Server (Current - 07.01.2013 14:57:00)
The activated proc '[Microsoft.SystemCenter.Orchestrator.Maintenance].[MaintenanceWorker]' running on queue 'OrchestratorDB.Microsoft.SystemCenter.Orchestrator.Maintenance.MaintenanceServiceQueue' output the following: 'Error: 9002, The transaction log for database 'OrchestratorDB' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases'
Orchestrator has a maintenance process that does not handle a situation where the database transaction log is unable to grow, either due to restrictions or because the disk is full. When this happens, it bloats the ERRORLOG with the messages mentioned above. This happens very quickly, as it keeps retrying several times each second, so even a 10GB volume fills up in hours. If the situation is left unchecked for a couple of days, the risk of NTFS volume corruption is high.
- Disable access to the Orchestrator database or stop the Orchestrator servers to prevent further bloating.
- Stop the database instance
- Delete the bloated error log to free up space
- Restart the instance
- Change the number of error logs SQL Server is allowed to create (the default is 6; the sample below limits it to 10) and limit the size of each error log file to 50MB. This gives you a total of 11 error log files, each with a max size of about 50 000 KB, for a total of roughly 550MB.
EXEC xp_instance_regwrite N'HKEY_LOCAL_MACHINE',
N'Software\Microsoft\MSSQLServer\MSSQLServer',
N'NumErrorLogs', REG_DWORD, 10
EXEC xp_instance_regwrite N'HKEY_LOCAL_MACHINE',
N'Software\Microsoft\MSSQLServer\MSSQLServer',
N'ErrorLogSizeInKb', REG_DWORD, 50000
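To verify the values afterwards, the corresponding read calls can be used; they return the current registry values as result sets:

```sql
EXEC xp_instance_regread N'HKEY_LOCAL_MACHINE',
N'Software\Microsoft\MSSQLServer\MSSQLServer',
N'NumErrorLogs'
EXEC xp_instance_regread N'HKEY_LOCAL_MACHINE',
N'Software\Microsoft\MSSQLServer\MSSQLServer',
N'ErrorLogSizeInKb'
```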
- Solve the problem regarding the Orchestrator DB transaction log. Usually such problems are caused by either a failed backup job (thus no log truncation) or the simple fact that you need to allocate more space for the data. If a failed backup job is the problem, consider shrinking the transaction logs back down to their normal size. See https://lokna.no/?p=349 for more information about shrinking troublesome logs (Norwegian version only).
- Enable access to the Orchestrator database.
- Monitor the error logs and see if the values you chose for log size and number of logs fits your workload. If the logs contain at least a day of messages, but not more than a month, you are in the butter zone.
Sometimes you might want to change the owner of a job connected to a maintenance plan. For instance, the current owner could be a deleted account, something that will cause the jobs to fail with not-so-funny error messages regarding permissions or missing accounts. It is possible to change the job owner using T-SQL or Management Studio, but each time the plan is updated, the owner is overwritten by the owner of the maintenance plan. To change that, you have to use T-SQL, AFAIK.
NB: Running this is dangerous if you don't know what you are doing! The msdb.dbo.sysssispackages table contains system objects as well as your manually created plans, and could also contain SSIS packages, as the name indicates. Always take a backup of msdb before making changes.
Changing the maintenance plan owner on MSSQL 2000:
UPDATE msdb.dbo.sysdbmaintplans -- SQL 2000 stores maintenance plans in this msdb table
SET [owner] = 'NEW USER'
WHERE [owner] LIKE 'OLD USER'
On MSSQL 2008R2:
UPDATE msdb.dbo.sysssispackages
SET ownersid = suser_sid('NEW USER')
WHERE ownersid = suser_sid('OLD USER')
or just set sa as the owner of everything:
UPDATE msdb.dbo.sysssispackages
SET ownersid = 0x01 -- 0x01 is the sid of sa
WHERE ownersid != 0x01
Afterwards, either save the plan to update job ownership or run the following to change the job owner as well.
UPDATE msdb.dbo.sysjobs
SET owner_sid = suser_sid('NEW USER')
WHERE suser_sname(owner_sid) LIKE 'OLD USER'
or just set sa as the owner of everything here as well:
UPDATE msdb.dbo.sysjobs
SET owner_sid = 0x01 -- 0x01 is the sid of sa
WHERE owner_sid != 0x01
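To verify the result, job ownership can be listed with a plain read-only query:

```sql
SELECT name, suser_sname(owner_sid) AS job_owner
FROM msdb.dbo.sysjobs
ORDER BY name;
```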
If it doesn’t seem to work as expected, restart the agent service.