Failover Cluster: access to update the secure DNS Zone was denied.

Problem

After you have built a cluster, the Cluster Events page fills up with Event ID 1257 from FailoverClustering, complaining about not being able to write the DNS records to AD:

“Cluster network name resource failed registration of one or more associated DNS name(s) because the access to update the secure DNS Zone was denied.


Cluster Network name: X
DNS Zone: Y


Ensure that cluster name object (CNO) is granted permissions to the Secure DNS Zone.”


Solution

There may be other root cause scenarios, but in my case the problem was a static DNS reservation on the domain controller.

As usual, if you do not understand the action plan below, seek help or get educated before you continue. Your friendly local search engine is a nice place to start if you do not have a local cluster expert. This action plan includes actions that will take down parts of your cluster momentarily, so do not perform these steps on a production cluster during peak load. Schedule a maintenance window.

  • Identify the source of the static reservation and apply public shaming and/or pain as necessary to ensure that this does not happen again. Cluster DNS records should be dynamic.
  • Identify the static DNS record in your Active Directory Integrated DNS forward lookup zone. Ask for help from your DNS or AD team if necessary.
  • Delete the static record.
  • Take the Cluster Name Object representing the DNS record offline in Failover Cluster Manager (or with PowerShell, see the sketch after this list). Be aware that any dependent resources will also go offline.
  • Bring everything back online. This should trigger a new DNS registration attempt. You could also wait for the cluster to attempt this automatically, but client connections may fail while you are waiting.
  • Verify that the DNS record is created as a dynamic record. It should have a current Timestamp.
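
If you prefer PowerShell, here is a minimal sketch of the offline/online dance. It assumes the default resource names "Cluster Name" and "Cluster Group"; replace them with the network name and group from your event:

Import-Module FailoverClusters
#Taking the network name offline also takes its dependent resources offline
Stop-ClusterResource -Name "Cluster Name"
#Bring the whole group back online to trigger a new DNS registration attempt
Start-ClusterGroup -Name "Cluster Group"
#On newer versions you can also ask the resource to re-register its DNS records directly
Update-ClusterNetworkNameResource -Name "Cluster Name"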

Identify which drive \Device\Harddisk#\DR# represents

Problem

You get an error message stating that there is some kind of problem with \Device\Harddisk#\DR#. For instance, Event ID 11 from Disk: The driver detected a controller error on \Device\Harddisk4\DR4.


The disk number referenced in the error message does not necessarily correspond to the disk id numbers in Disk Management. To figure out which disk is referenced, some digging is required.

Solution

There may be several ways to identify the drive. In this post I will show the WinObj method, as it is the only one that has worked consistently for me.

  • First, get a hold of WinObj from http://sysinternals.com 
  • Run WinObj as admin
  • Browse to \Device\Harddisk#. We will use Harddisk4\DR4 as a sample from here on out, but you should of course replace that with the numbers from your error message.


  • Look at the SymLink column to identify the entry from the error message.


  • Go to the GLOBAL?? folder and sort by the SymLink column.
  • Scroll down to the \Device\Harddisk4\DR4 entries
  • You will find several entries, some for the drive and some for the volume or volumes.


  • The most interesting one in this example is the drive letter D for Volume 5 (the only volume on this drive).
  • Looking in Disk Management we can identify the drive; in this case it was an empty SD card slot. We also see that the Disk number and DR number are both 4, but there is no definitive guarantee that these numbers will be equal.


Most likely, the event was caused by an improper removal of the SD card. As this is a server and the SD card slot could be a virtual device connected to the out-of-band IPMI/ILO/DRAC/IMM management chip, the message could also be related to a restart or upgrade of the out-of-band chip. In any case, there is no threat to our important data, which are hopefully not located on an SD card.

If you receive this error on a SAN HBA or local RAID controller, check the management software for your device. You may need a firmware upgrade, or you could have a slightly defective drive that is about to go up in flames.
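
If you want a quick cross-check from PowerShell, the Index reported by Win32_DiskDrive normally matches the Harddisk# part of the device path (the DR# is the part that drifts), but treat this as a sanity check rather than proof:

Get-CimInstance -ClassName Win32_DiskDrive |
    Sort-Object Index |
    Format-Table Index, Model, SerialNumber, Name #Name is the \\.\PHYSICALDRIVE# path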

iSCSI in the LAB

I sometimes run internal Windows Failover Clustering training, and I am sometimes asked “How can I test this at home when I do not have a SAN?”. As you may know, even though clusters without shared storage are indeed possible in the current year, some cluster types still rely heavily on shared storage. When it comes to SQL Server clusters for instance, a Failover Cluster instance (which relies on shared storage) is a lot easier to operate than an AOAG (Always On Availability Group) cluster, which does not rely on shared storage. There are a lot of possible solutions to this problem; you could for instance use your home NAS as an iSCSI “SAN”, as many popular NAS boxes have iSCSI support. In this post however, I will focus on how to build a Windows Server 2019 VM with an iSCSI target for LAB usage. This is NOT intended as a guide for creating a production iSCSI server. It is most definitely a bad iSCSI server, with poor performance, and not suited for anything requiring production-level performance.

“What will I need to do this?” I hear you ask. You will need a Windows Server 2019 VM with some spare storage mounted as a secondary drive, a domain controller VM and some cluster VMs. You could also use a physical machine if you want, but usually this kind of setup involves one physical machine running a set of VMs. In this setup I have four VMs:

  • DC03, the domain controller
  • SQL19-1 and SQL19-2, the cluster nodes for a SQL 2019 CTP 2.2 failover cluster instance
  • iSCSI19, the iSCSI server.

The domain controller and SQL Servers are not covered in this post. See my Failover Cluster series for information on how to build a cluster.

Action plan

  • Make sure that your domain controller and iSCSI client VMs are running.
  • Make sure that Failover Cluster features are installed on the client VMs.
  • Enable the iSCSI service on the cluster nodes that are going to use the iSCSI server. All you have to do is start the iSCSI initiator program; it will ask you to enable the iSCSI service:

  • Create a Windows 2019 Server VM or physical server.
  • Add it to your lab domain.
  • Set a static IP, both v4 and v6. This is important for stability; iSCSI does not play well with DHCP. In fact, all servers involved should have static IPs to ensure that the iSCSI storage is reconnected properly at boot.
  • Install the iSCSI target feature using PowerShell.
    • Install-WindowsFeature FS-iSCSITarget-Server
  • Add a virtual hard drive to serve as storage for the iSCSI volumes.
  • Initialize the drive, add a volume and format it. Assign the drive letter I:


  • Open Server Manager and navigate to File and Storage Services, iSCSI
  • Start the “New iSCSI virtual disk wizard”
  • Select the I: drive as a location for the new virtual disk


  • Select a name for your disk. Here, I have called it SQL19_Backup01

  • Set a disk size in accordance with your needs. As this is for a LAB environment, select a Dynamically expanding disk type.
  • Create a new target. I used SQL19Cluster as the name for the target.
  • Add the cluster nodes as initiators. You should have enabled iSCSI on the cluster nodes before this step (see above). The cluster nodes also have to be online for this to be successful.

  • As this is a lab, we skip the authentication setup.

  • Done!
  • On the cluster nodes, start the iSCSI initiator.
  • Input the name of the iSCSI target and click the quick connect button.
  • Initialize, online and format the disk on one of the cluster nodes
  • Run storage validation in failover cluster manager to verify your setup. Make sure that your disk/disks are part of the storage validation. You may have to add the disk to the cluster and set it offline for validation to work.
  • Check that the disk fails over properly to the other node. It should appear with the same drive letter or mapped folder on both nodes, but only on one node at a time unless you convert it to a Cluster Shared Volume. (Cluster Shared Volumes are for Hyper-V Clusters.)
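
For reference, most of the wizard steps above can also be scripted. This is a rough sketch using the sample names from this post; the disk number, size and initiator IQNs are assumptions you must adjust to your own lab:

#On iSCSI19: prepare the storage drive (assumes the new data disk is disk 1)
Initialize-Disk -Number 1 -PartitionStyle GPT
New-Partition -DiskNumber 1 -DriveLetter I -UseMaximumSize
Format-Volume -DriveLetter I -FileSystem NTFS

#Create a dynamically expanding virtual disk and a target for the cluster nodes
New-IscsiVirtualDisk -Path "I:\iSCSIVirtualDisks\SQL19_Backup01.vhdx" -SizeBytes 50GB
New-IscsiServerTarget -TargetName "SQL19Cluster" -InitiatorIds @(
    "IQN:iqn.1991-05.com.microsoft:sql19-1.lab.local",
    "IQN:iqn.1991-05.com.microsoft:sql19-2.lab.local")
Add-IscsiVirtualDiskTargetMapping -TargetName "SQL19Cluster" -Path "I:\iSCSIVirtualDisks\SQL19_Backup01.vhdx"

#On each cluster node: connect to the target
New-IscsiTargetPortal -TargetPortalAddress "iSCSI19"
Get-IscsiTarget | Connect-IscsiTarget -IsPersistent $true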

Static IPv6 Cluster address

Problem

If you install a Windows Failover cluster using IPv6, you will get a dynamic IPv6 address as the standard failover clustering management tools only support dynamic IPv6 addresses. If you live in Microsoft Wonderland (a mythical land without any firewalls) this is fine, but for the rest of us, this is kind of impractical. We need a static address.

Solution

Run this script to add a static address. Then you have to find and remove the dynamic address. You can do this in Failover Cluster Manager or through PowerShell, but you have to identify it first, so I cannot script it for you.

The script defaults to the Cluster Group resource group, but you can do the same for say a SQL Server instance or any other cluster group with IP addresses and network names.

#Add a static IPv6 address to the cluster group and network name specified
$IPv6="STATIC ADDRESS HERE"
$IPv6Name="IP Address $IPv6" #Do not change this line
$Group= "Cluster Group"
$NetworkName = "Cluster Name"

#Create the address resource, then set the address and prefix
Add-ClusterResource -Name $IPv6Name -ResourceType "IPv6 Address" -Group $Group
Get-ClusterResource -Name $IPv6Name | Get-ClusterParameter
Get-ClusterResource -Name $IPv6Name | Set-ClusterParameter -Multiple @{"PrefixLength" = "64";"Address"= $IPv6}
Get-ClusterResource -Name $IPv6Name | Start-ClusterResource
#Take the network name offline, make it depend on the new address, and bring it back online
Stop-ClusterResource $NetworkName
Add-ClusterResourceDependency -Resource $NetworkName -Provider $IPv6Name
Start-ClusterResource $NetworkName
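
To verify the result, check the dependency report for the network name. Note that Add-ClusterResourceDependency creates an AND dependency; if you want the name to come online with either address, set an OR expression instead (resource names go in brackets):

Get-ClusterResourceDependency -Resource $NetworkName
#Something like this, with your own IPv4 resource name:
#Set-ClusterResourceDependency -Resource $NetworkName -Dependency "[Cluster IP Address] or [$IPv6Name]"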

Disable automatic loading of Server Manager

Problem

When you log in to a server, the Server Manager window loads automatically. On a small VM this can take some time and waste some resources, especially if you forget to close it and log off.

Solution

Create a GPO to disable the automatic loading of Server Manager.

  • Start Group Policy Management Editor
  • Create and link a new GPO on the OU/OUs where you want to apply it.
  • Find the setting Computer Configuration\Policies\Administrative Templates\System\Server Manager\Do not display Server Manager at logon
  • Enable it
  • Close the GPO and wait for a GPO refresh, or trigger a gpupdate /force on any applicable computers.

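If you just need to silence a single machine and do not want a GPO, you can disable the scheduled task that launches Server Manager at logon. A one-off sketch (the task path shown is the standard one, but verify it on your system):

Get-ScheduledTask -TaskPath "\Microsoft\Windows\Server Manager\" -TaskName "ServerManager" | Disable-ScheduledTask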

Interface Metric for cluster nodes

In place of the hopefully well-known binding order setting, Windows Server 2016 requires us to set the interface metric to prioritize traffic on multi-homed servers. This is especially relevant on cluster servers with a separate Internal/Heartbeat interface or LiveMigration interface.

First, list the interfaces and their current metric:
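
Get-NetIPInterface will show you the metrics; sorting on the metric makes the priority order obvious:

Get-NetIPInterface | Sort-Object InterfaceMetric | Format-Table ifIndex, InterfaceAlias, AddressFamily, InterfaceMetric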

Then, change the metric of your domain-facing interface to a lower value than any other interface. In the example, we will set the metric for Public (index 10) to 14.

Set-NetIPInterface -ifIndex 10 -InterfaceMetric 14

Make sure that the domain-facing adapter has the lowest metric; it will then be first in line whenever an application sends network packets without specifying an interface.

More about the default values

Failover Cluster Checklist, Windows Server 2019

Introduction

This post was originally written for Windows 2012R2. This is a rework with updates for Windows 2019. It is currently a work in progress.

OK, so you want to install a cluster? This is not a “Should I build a cluster?” post, this is a “How to build a proper cluster” post. I like checklists, so I made a Windows Failover Cluster installation checklist. Some of the points have their own post, some are just a short sentence. I will add more details as time allows. The goal is to share my knowledge about how to build stable clusters. I may disagree with other best practices out there, but this list is based on my experience, what works in production and what does not. I use it to build new clusters, as well as to troubleshoot clusters made by others. Clustering is so easy that anyone can build a working cluster these days, but building a stable production-worthy cluster may still be like finding your way out of a maze. A difficult maze filled with ghosts, trolls and angry badgers.

There are some things you need to know about this post before you continue reading:

  • This list is made for production clusters. There is nothing stopping you from building a lab using this list, but if you do as I say, you will build a very expensive lab.
  • I work with SQL Server, Hyper-V and File clusters. This list may work for other kinds of clusters as well, but I have not tested it on recent versions.
  • This list was originally published in 2014 for Windows 2008R2 up until Windows 2012R2. It is now updated for Windows Server 2019. I will try to add version specific instructions when necessary.
  • This list is for physical clusters. I dislike virtual clusters, because most organizations are not clever enough to create functioning virtual production clusters that won’t fail miserably due to user error someday. (By “virtual clusters” I mean cluster nodes on top of hypervisors, not clustered hypervisors). It is however entirely possible to build virtual clusters using this list, especially if you employ technologies such as Virtual FC.
  • This is my checklist. I have spent more than a decade honing it, and it works very well for me. That does not guarantee that it will work for you. I welcome any comments on alternative approaches, but don’t expect me to agree with you.
  • This list is mostly written in a “How to do it” manner, and may be lacking in the “But why should I do it” department. This is due to several reasons, but mostly a lack of time on my part. I do however want you to know that there are several hours, if not days of work behind each point.
  • Updates will be made as I discover new information.
  • The list is chronological. That is, start at the top and make your way down the list. If you jump back and forth, you will not achieve the desired result.
  • This list is based on the LTSC (Long-Term Servicing Channel) GUI version of Windows Server, not Core. You can build clusters on Core, but I do not recommend it. Clusters may be very finicky to troubleshoot when things go wrong, and doing so on Windows Core is like trying to paint a room through the keyhole. So unless you have the infrastructure and budget necessary to treat your physical servers as throw-away commodities, I recommend installing the “Desktop Experience”. To elaborate: if you have trouble with a core server, you remove it and deploy a replacement server. All automated of course.
  • Understanding this list requires knowledge of Active Directory and basic knowledge of Failover Clustering.
  • There are many special cases not covered. This list is for the basic 2-10 node single datacenter cluster. The basic rules still apply though, even if you have nodes in four datacenters and use a hybrid cloud setup.

The design phase

In the design phase, there are a lot of decisions you have to make BEFORE you start building the cluster. These are just a few of them:

  • How many nodes do you need? Remember you need at least one standby node for HA (High Availability). Depending on the total number of nodes you may need several standby nodes. Some managers will complain about the extra nodes just sitting there unused, but they forget that they are there to provide HA. No matter the number of nodes, make sure the hardware is as equal as possible. I don’t care what the manual says, having cluster nodes with different hardware in them is a recipe for disaster. If possible, all nodes should be built on the same day by the same persons and have consecutive serial numbers.
  • How many network fabrics do you need? And how many can you afford? See Networks, teaming and heartbeats for clusters for more information. This is where most troublesome clusters fail.
  • Will you use shared storage? And what kind of shared storage? In short: FCoE is bad for you, iSCSI is relatively cheap, SMB3 is complicated and may be cheap, shared DAS/SAS is very cheap, FC is the enterprise norm and Infiniband is for those who want very high performance at any cost. Note that the deployment cost for Infiniband in small deployments has fallen significantly in the last couple of years. In most cases you will have to use what is already in place in your datacenter though. And it is usually better to have something your storage guys are used to supporting. Just remember that storage is very important for your overall performance, no matter what kind of cluster. For file clusters, high throughput is important. For SQL Server, low latency is key and you should use FC or Infiniband.
  • What kind of hardware should you use in your cluster nodes? These are my opinions, based on my personal experience to date. My opinions on this change frequently as new generations are released, but here goes:
    • Emulex should stop making any kind of hardware. It is all bad for you and bad for your cluster. If you are having trouble with cluster stability and you have Emulex made parts in your nodes, remove them at once.
    • QLogic make good FC HBAs. If you have a FC SAN, QLogic HBAs are highly recommended. If you have QLogic network adapters on the other hand, use them for target practice.
    • Broadcom network adapters used to be good, but the drivers for Windows are getting worse by the minute.
    • Intel X560 is my current favorite network adapter. It is sold under many names, so check what chip is actually used on the cards offered by your server manufacturer.
    • Use Brocade FC switches only. They are sold under many other brand names as well, I have seen them with both HP and IBM stickers.
    • Use Cisco or HP ProCurve network switches, but do not use them for FC traffic.
    • Make sure your nodes have local disk controllers with battery or flash backed cache. Entry level disk controllers are not worth the cardboard box they are delivered in and may slow down the most hard-core cluster.
    • Intel Xeon CPUs currently reign supreme for most applications. There are however some edge cases for SQL Server where AMD CPUs will perform better. I recommend reading Glenn Berry’s blogs for up to date SQL Server CPU information.
    • HP, Lenovo and Dell all make reasonably good servers for clustering. Or, I should say equally bad, but better than the alternatives.
  • RACK or Blade?
    • RACK servers
      • are easier to troubleshoot
      • are versatile
      • give you a lot of expansion options
      • are cheaper to buy
    • Blade servers are
      • space efficient
      • cheaper to maintain if you rent rack space
      • easier to install
      • limited in terms of expansion options
  • Where should your nodes be located physically? I do not recommend putting them all in the same rack. The best solution is to put them in separate rooms within sub-millisecond network distance. You can also place them in separate data centers with a long distance between them if you do not use shared storage or use some kind of hybrid solution. I do not recommend SAN synchronization to data centers far, far away though, it is better to have synchronization higher up in the stack. If you only have one datacenter, place the nodes in different racks and make sure they have redundant power supplies.
  • Talking about power, your redundant power supplies should be connected to separate power circuits, preferably with each connected to an independent UPS.
  • What domain should your servers be members of, and which organizational unit should you use? Failover clustering will not work without Active Directory. Domain-less clusters are supported from W2019, but not recommended. You probably need AD for other stuff anyway.
  • The Active Directory role should NOT be installed on the cluster nodes. You should have at least two domain controllers, one of which should be a dedicated physical machine. I know that MS now supports virtualizing all your domain controllers, but that does not mean that you should do it, or that it is smart to do so. I would also recommend creating a separate OU for each cluster.
  • What account should you use to run the installation? Previously a separate cluster installation account was recommended, but with newer versions it is usually no problem using a regular sysadmin account. The account should be a domain administrator to make everything easy, but this checklist will work as long as you have local admin on the cluster nodes. (Be aware that some points require some form of AD write access.)
  • And then there are a lot of product and project specifics, such as storage requirements, CPU and memory sizing and so on, all of which may affect your cluster design.

The actual checklist

All list items should be performed on each node in the cluster unless specified otherwise. You can do one node at a time or all at once until you get to cluster validation. All nodes should be ready when you run cluster validation. I find it easiest to remember everything by doing one list item for each node before I move on to the next, making notes along the way.

  • Mount the hardware
  • Set BIOS/UEFI settings as required by your environment. Remember to enable High Performance mode, otherwise you will be chasing performance gremlins.
  • If your cluster nodes are virtual machines, make sure that they are not allowed to be hosted by the same host. How you configure this will depend on your virtualization platform.
  • Install Windows Server
  • Copy any required media, drivers etc. to a folder on each node
  • Static or reserved IP addresses are recommended, both IPv4 and IPv6.
  • If you are not able to use IPv6 to talk to your domain controllers, disable IPv6 completely in registry (see How to disable IPv6, and the sketch further down this list).
  • Make sure all your drivers are installed using Device Manager.
  • Make sure you are running current BIOS, Remote Access, RAID, HBA and Network firmware according to your patch regime. If in doubt, use the latest available version from your server vendor. Do NOT download drivers and firmware from the chip vendor unless you are troubleshooting a specific problem.
  • Make sure your drivers are compatible with the firmware mentioned above.
  • Check whether the server is a member of the domain, and add it to the domain if necessary.
  • Activate a machine proxy if you use a proxy server to access the internet. See Proxy for cluster nodes for more information.
  • Activate RDP.
  • Create a firewall rule to allow ICMP (ping) on all interfaces regardless of profile.
New-NetFirewallRule -DisplayName "Allow ICMP all profiles IPv4" -Direction Inbound -Protocol ICMPv4  -Action Allow
New-NetFirewallRule -DisplayName "Allow ICMP all profiles IPv6" -Direction Inbound -Protocol ICMPv6  -Action Allow
  • Select the High performance power plan.
  • If virtual node, enable VRSS. If physical, enable RSS. If you are creating a Hyper-V cluster, enable VMQ as well. See https://lokna.no/?p=2464 for details.
  • Make sure that your nodes are located in the correct OU. The default “Computers” container is not the correct OU.
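
A sketch of the IPv6 registry change mentioned above, using the documented DisabledComponents value (0xFF disables all IPv6 components; a reboot is required):

New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip6\Parameters" -Name "DisabledComponents" -PropertyType DWord -Value 0xFF -Force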

  • Add the failover cluster features:
Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools
  • Check the interface metric. Your domain facing team/adapter should have the lowest metric. See https://lokna.no/?p=2637 
  • Disable NICs that are not in use
  • Install any prerequisites required by your shared storage. Check with your SAN admin for details.
  • Change page file settings according to Page file defaults
  • Install PSWindowsUpdate and run it for Microsoft Update.
  • Install cluster hotfixes. See Does your cluster have the recommended hotfixes?
  • If you are using shared storage, verify storage connections and MPIO in accordance with guidelines from your SAN vendor. Most SAN vendors have specific guidelines/whitepapers for Failover Clustering.
  • Make sure that you are connected to your shared storage on all nodes and have at least one LUN (drive) presented for validation.
  • Validate the configuration: Validating a Failover Cluster. Do not continue until your cluster passes validation. I have yet to see a production cluster without validation warnings, but you should document why you have each warning before you continue.
  • Create the cluster: Creating a Failover Cluster
  • Verify the quorum configuration. Make sure dynamic quorum is enabled. You should always have a quorum witness drive (even if you don’t use it). The create cluster wizard will, without fail, select a different quorum witness drive than the one you intended to use, so make sure to correct this as well (see the sketch after this list).
  • Grant create computer object permissions to the cluster. This is necessary for installation of most clustered roles, and this is why each cluster should have its own OU.
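
For reference, the last few steps can be scripted as well. A minimal sketch; the node names, cluster name, IP address and witness disk name are placeholders:

Test-Cluster -Node "Node1","Node2" #Run validation first
New-Cluster -Name "CLUSTER01" -Node "Node1","Node2" -StaticAddress "192.168.0.10"
Get-ClusterQuorum #Verify the current quorum configuration
Set-ClusterQuorum -DiskWitness "Cluster Disk 1" #Point the witness at the drive you intended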

SQL Server 2016 sysadmin escalation using PowerUpSQL

Scenario

For some reason you need to gain sysadmin access to a SQL Server instance. Maybe you have inherited it from a DBA that was eaten by a sleepy tiger, or more likely, the SQL Server instance was installed by a consultant and the sysadmin password was forgotten a long time ago. Oh, and you want to do it while the SQL Server is running. No matter why, here is one possible solution, provided that you have local admin access. There are probably ways to escalate access without local admin as well. And of course there are tons of ways to escalate a normal user to an admin user, but that is not the focus of this post.

This procedure has been tested on SQL Server 2014 and 2016. On previous versions, simply running SQL Server Management Studio as admin locally on the server was enough, but on newer versions that path has been blocked by default.

To make the process easier, we will utilize the PowerUpSQL PowerShell module from NetSPI.

Procedure

Note: One of the SQL Server instances used in the example is the default instance of the server IM01. As such, it has no instance specification of the form Server\instance. If your server has one or more named instances, you will have to specify the instance name.

You need to obtain local or remote console access to the server. The user has to be a local admin on the server, and should be a user on the SQL Server. Application service accounts are a nice starting point if no one has any access to the SQL Server at all. That is, you may not have access to the server, but you have an application using it, and that application accesses the SQL Server using a service account in Active Directory. You can give that account temporary local admin access on the SQL Server. Then you can escalate it to sysadmin access, and then use that login to grant yourself access. You should of course remove sysadmin access from the service account when you are done.

As a sidenote, you will often find that the service account already has sysadmin access, or that the application has a hard-coded SQL Server login (not in AD, but a SQL Server specific account) that is called sa (sysadmin) or is a member of the sysadmin server role.

  • Download the module from https://github.com/NetSPI/PowerUpSQL. There are ways to launch the module remotely, but in this example we are copying them to the server.
  • Copy the files to a local folder, we use C:\Temp in our example.
  • Get a copy of psexec.exe from sysinternals.
  • Open an administrative PowerShell session running as the local system.
  • PsExec.exe -i -s powershell.exe


  • Run the following commands in the black PowerShell console.
  • Verify that you are running as the nt authority\system account by running whoami.
  • Import-Module c:\Temp\PowerUpSQL.psd1
  • Enumerate local SQL Server instances
  • Get-SQLInstanceLocal
  • Check the current access level
  • Get-SQLServerInfo -Verbose -Instance Server\instance


  • Look for IsSysadmin in the output
  • Escalate access
    • Invoke-SQLEscalatePriv -Verbose -Instance "SQLServer1\Instance1"
    • Invoke-SQLImpersonateService -Verbose -Instance SQLServer1\Instance1

Note: Sadly, some of the screenshots were lost. I will add some new ones later if I remember.

  • If you are successful, you may execute TSQL code to grant yourself access.
  • Get-SQLServerInfo -Verbose -Instance Server\instance
  • Look for IsSysadmin in the output
  • Import SQLPS
  • Import-Module -Name Sqlps
  • Test that you are able to execute arbitrary TSQL
  • Invoke-Sqlcmd -Query "Select @@version" -ServerInstance SQLServer1\Instance1
  • Run the TSQL necessary to grant yourself access. Sample:
  • Invoke-Sqlcmd -Query "ALTER SERVER ROLE [sysadmin] ADD MEMBER [Domain\user]" -ServerInstance SQLServer1\Instance1
  • De-escalate access
  • Invoke-SQLImpersonateService -Verbose -Rev2Self
  • Connect to the server using your chosen account.
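
For reference, here is the whole sequence condensed into one block. The instance name and account are the sample values from this post; run it in the SYSTEM PowerShell console opened with psexec:

Import-Module C:\Temp\PowerUpSQL.psd1
Get-SQLInstanceLocal #Enumerate local instances
Invoke-SQLImpersonateService -Verbose -Instance "SQLServer1\Instance1" #Escalate
Get-SQLServerInfo -Verbose -Instance "SQLServer1\Instance1" #IsSysadmin should now be 1
Import-Module -Name Sqlps
Invoke-Sqlcmd -Query "ALTER SERVER ROLE [sysadmin] ADD MEMBER [Domain\user]" -ServerInstance "SQLServer1\Instance1"
Invoke-SQLImpersonateService -Verbose -Rev2Self #De-escalate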

If that did not work

There is another method, based on shutting the SQL Server down and starting it in single user mode.

  • Log in to the SQL Server as a local admin.
  • Shut down SQL Server and the SQL Server Agent.
  • Start an administrative command prompt.
  • Navigate to the SQL Server instance directory, usually something like this: “C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Binn”
  • If you are using a default instance, execute sqlservr.exe -m -c
  • For a named instance, use sqlservr.exe -m -c -s Instancename
  • If someone else keeps nabbing your connection, use -m"SQLCMD" to only allow sqlcmd connections.
  • Open a new command prompt and navigate to the instance directory.
  • Execute sqlcmd.exe
  • You should get a 1> prompt.
  • Input "SELECT @@VERSION" on line 1 without the quotes and press enter.
  • Input "GO" at line 2 and press enter again. The server should respond with the current version.


  • Add yourself to the sysadmin role by entering the following lines:
USE [master]
GO
CREATE LOGIN [DOMAIN\user] FROM WINDOWS WITH DEFAULT_DATABASE=[master]
GO
ALTER SERVER ROLE [sysadmin] ADD MEMBER [DOMAIN\user]
GO


  • There will be no response, but you can verify by executing the following lines.
EXEC sp_helpsrvrolemember 'sysadmin'
GO
  • It will return a list of sysadmin group members.
  • When you are done, return to the first cmd window and press [Ctrl]+C to shut down the SQL Server.
  • Restart the SQL Server and SQL Server Agent services in the normal mode.

Unexpected sense, Sense key B code 41

Problem

The system event log is running over with Event ID 2095 from Server Administrator: “Unexpected sense. SCSI sense data: Sense key: B Sense code: 41 Sense qualifier: 0: Physical Disk 0:1:20 Controller 1, Connector 0”.

At worst, several events per second.


Analysis

https://en.wikipedia.org/wiki/Key_Code_Qualifier shows a list of common Sense Key Codes. Sense key B translates to Aborted Command. Sense code 41 is not listed, but it is very likely that something is not as it should be with disk 20 on controller 1. OMSA can tell you which disk this is, and what arrays it is a part of.


OMSA claims that the disk is working fine, but unless the drive is trying to tell me that it has found some missing common sense, I think I have to respectfully disagree. Such faults are usually not a good sign, especially when they are as prevalent as in this case. So I performed a firmware/driver upgrade, as that will often provide some insight. In this case SUU 1809 replaced SUU 1803, that is a six-month span in revisions.

The upgrade resulted in a new error:

Event ID 2335, Controller event log: PD 14(e0x20/s20) is not a certified drive: Controller 1 (PERC H730P Adapter).


OMSA tells me that the disk is in a failed state.


Time to register a support case with the evil wizards of the silver slanted E.

 

Solution

The disk was replaced. OMSA still complains about an error, specifically an enclosure error, but the iDRAC shows a green storage status.


After restarting all the OMSA services and the iDRAC Service Module service, the status returned to green in OMSA as well.


Event ID 3 from Resourcemanager: No such user

Problem

A MIMWAL workflow fails, and the ForeFront Identity Manager event log records Event ID 3 from Microsoft.Resourcemanager with the message:

GetCurrentUserFromSecurityIdentifier: No such user [DOMAIN\USER], [SID]


The WAL event log was not particularly helpful in this case; it just threw out generic exceptions:

Event ID 40405: WAL (2.17.0927.0): 09/27/2018 14:20:47.3127: The type initializer for ‘Microsoft.ResourceManagement.WebServices.Client.ResourceManagementClient’ threw an exception.

Analysis

The Workflow in question tries to execute a PowerShell script, and we spent quite a lot of time troubleshooting the script to no avail. In retrospect, the problem is actually stated quite directly in the first event from the MIM/FIM log: “No such user”.

The user mentioned is the service account for the FIM Service, and both the username and SID are correct. Thus the error message did not really make any sense; I was not able to figure out why MIM could not find the user in AD when both the sAMAccountName and SID were correct. And therein lies the problem: I was thinking backwards. The problem was not in AD, but in the FIMService database. For some reason the service account was not registered with a SID in the database. To troubleshoot, you have to run some queries against the FIMService database.

Find the ObjectKey for the user

USE FIMService;
SELECT [ObjectKey],[DomainAndAccountName]
FROM [FIMService].[fim].[DomainAndAccountName]
WHERE DomainAndAccountName = 'DOMAIN\USER';
ObjectKey            DomainAndAccountName
-------------------- ----------------------------------------------------
12345                DOMAIN\USER

(1 row affected)

We find the object key “12345” and place it into the where-clause for the next query.

Find the SID

USE FIMService;
SELECT  [UserObjectKey] ,[SecurityIdentifier]
  FROM [FIMService].[fim].[UserSecurityIdentifiers]
  WHERE UserObjectKey = '12345';
UserObjectKey        SecurityIdentifier
-------------------- ----------------------

(0 rows affected)

This query returns no result. It should return the SID for the user in hexadecimal format.

Solution

Add the SID <-> ObjectKey mapping. As usual, make sure that you understand these steps before you execute them.

  • Get the hexadecimal SID. You can get it from AD Users and Computers in advanced mode.
  • Open your user, look at the attribute list and find the ObjectSID attribute.
  • View the attribute in hexadecimal form. It should look something like 01 05 00 00 00 and so on.
  • Remove the spaces using your favorite text editor and add a 0x prefix to indicate that this is a hex value. (Or generate the string with the PowerShell sketch below.)
  • Your result should look like this: 0x010500000000000512345234504560734063457AFCDEBB69EE0000
  • Run the following query, inserting the username and hexadecimal SID
USE FIMService;
INSERT INTO  [FIMService].[fim].UserSecurityIdentifiers VALUES ('12345', 0x010500000000000512345234504560734063457AFCDEBB69EE0000);
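
Instead of copying hex bytes out of AD Users and Computers, you can generate the 0x string with PowerShell. A sketch, assuming the ActiveDirectory module is installed and USER is the account in question:

Import-Module ActiveDirectory
$sid = (Get-ADUser -Identity "USER").SID
$bytes = New-Object byte[] ($sid.BinaryLength)
$sid.GetBinaryForm($bytes, 0) #Fill the array with the binary SID
"0x" + ([System.BitConverter]::ToString($bytes) -replace '-','')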

To verify, run the query against UserSecurityIdentifiers again (the one that returned 0 rows) and verify that you now get a response mapping your ObjectKey to your SID. If you are lucky, your workflow is now working as expected. If you are not so lucky, you should at least get a different error message…