Event 20501 Hyper-V-VMMS

Problem

The following event is logged non-stop in the Hyper-V High Availability log:

Log Name:      Microsoft-Windows-Hyper-V-High-Availability-Admin
Source:        Microsoft-Windows-Hyper-V-VMMS
Date:          27.07.2017 12.59.35
Event ID:      20501
Task Category: None
Level:         Warning
Description:
Failed to register cluster name in the local user groups: A new member could not be added to a local group because the member has the wrong account type. (0x8007056C). Hyper-V will retry the operation.
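
The code in parentheses is an HRESULT. As a quick sanity check, here is a minimal Python sketch that splits it into its standard fields (the function name decode_hresult is my own, not part of any Windows tooling). The low 16 bits come out as Win32 error 1388, ERROR_INVALID_MEMBER, which is the "wrong account type" message quoted in the event.

```python
def decode_hresult(value):
    """Split a 32-bit HRESULT into (failed, facility, code)."""
    value &= 0xFFFFFFFF
    failed = bool(value >> 31)        # severity bit: set means failure
    facility = (value >> 16) & 0x7FF  # 11-bit facility; 7 is FACILITY_WIN32
    code = value & 0xFFFF             # low 16 bits: the wrapped error code
    return failed, facility, code


failed, facility, code = decode_hresult(0x8007056C)
print(failed, facility, code)  # True 7 1388
```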


Analysis

I received this as an error report for a new Windows Server 2016 Hyper-V cluster that I had not built myself. I ran a full cluster validation report, and it returned this warning:

Validating network name resource Name: [Cluster resource] for Active Directory issues.

Validating create computer object permissions was not run because the Cluster network name is part of the local Administrators group. Ensure that the Cluster Network name has “Create Computer Object” permissions.

I then checked AD, and found that the cluster object did in fact have the Create Computer Object permissions mentioned in the message.

The event log error refers to the cluster computer object being a member of the local admins group. I checked, and found that this was the case. The nodes themselves had also been added as local admins on all cluster nodes. That is, the computer objects for node 1, 2 and so on were members of the local admins group on all nodes. My records show that this practice was necessary when using SOFS storage in 2012. It is not necessary for Hyper-V clusters using FC-based shared storage.
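
To make the check repeatable, here is a rough Python sketch that picks computer accounts out of the text printed by "net localgroup Administrators" on a node. It assumes the usual layout of that output (a dashed separator line followed by one member per line) and relies on computer accounts ending in "$". The function name and the sample names in the usage note are made up for illustration.

```python
def computer_accounts(net_output):
    """Return the members ending in '$' from 'net localgroup' output."""
    accounts = []
    in_members = False
    for raw_line in net_output.splitlines():
        line = raw_line.strip()
        if line.startswith("---"):
            # The dashed separator marks the start of the member list.
            in_members = True
            continue
        if in_members and line.endswith("$"):
            # Computer accounts (cluster name object, nodes) end in '$'.
            accounts.append(line)
    return accounts
```

On a node, the input could come from subprocess.run(["net", "localgroup", "Administrators"], capture_output=True, text=True).stdout. Note that the group name is localized on non-English installations. If entries like CONTOSO\HVCLUSTER$ (hypothetical) show up, you are looking at the situation described above.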

The permissions needed to create a cluster in AD

  • Local admin on all the cluster nodes
  • Create computer objects on the Computers container, the default container for new computers in AD. The default container can be changed, in which case you need the permissions in the new container.
  • Read all properties permissions in the Computers container.
  • If you specify a specific OU for the cluster object, you need permissions in this OU in addition to the new objects container.
  • If your nodes are located in a specific OU, and not the Computers container, you will also need permissions in that OU, as the cluster object will be created in the OU where the nodes reside.

See Grant create computer object permissions to the cluster for more details.

Solution

As usual, a warning: If you do not understand these tasks and their possible ramifications, seek help from someone who does before you continue.

Solution 1, low impact

If destroying the cluster is difficult because it requires the VMs to be removed from the cluster temporarily, you can try this method. We do not know if there are other detrimental effects caused by not having the proper permissions when creating the cluster.

  • Remove the cluster object from the local admin group on all cluster nodes.
  • Remove the cluster nodes from the local admin group on all nodes.
  • Make sure that the cluster object has create computer objects permissions on the OU in which the cluster object and nodes are located.
  • Make sure that the cluster object and the cluster node computer objects are all located in the same OU.
  • Validate the cluster and make sure that it is all green.
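
The first two steps can be done in the Computer Management GUI, but with many nodes it may be easier to generate the commands. This is a small Python sketch that only builds "net localgroup ... /delete" command lines as text; it does not run anything. The domain, cluster and node names are placeholders, and the group name Administrators is an assumption that only holds on English-language installations.

```python
def removal_commands(domain, cluster_name, node_names, group="Administrators"):
    """Build one 'net localgroup ... /delete' line per computer account."""
    accounts = [cluster_name] + list(node_names)
    # Computer accounts are written as DOMAIN\NAME$ in local group membership.
    return [
        f'net localgroup {group} "{domain}\\{name}$" /delete'
        for name in accounts
    ]


# Placeholder names, for illustration only.
for command in removal_commands("CONTOSO", "HVCLUSTER", ["HVNODE1", "HVNODE2"]):
    print(command)
```

The resulting lines have to be run from an elevated prompt on every node, since each node has its own local admin group.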

Solution 2, high impact

Shotgun approach, removes any collateral damage from failed attempts at fixing the problem.

  • Migrate any VMs away from the cluster
  • Remove the cluster from VMM if it is a member.
  • Remove the “Create computer objects” permissions for the cluster object
  • Destroy the cluster.
  • Delete the cluster object from AD
  • Re-create the cluster with the same name and IP, using a domain admin account.
  • Add create computer objects and read all properties permissions to the new cluster object in the current OU. 
  • Validate the cluster and make sure it is all green.
  • Add the cluster back to VMM if necessary.
  • Migrate the VMs back.

Cluster disk resource XX contains an invalid mount point

Problem

During cluster startup or failover, one of the following events is logged in the system event log:


Event-ID 1208 from Physical Disk Resource: Cluster disk resource ‘[Resource name]’ contains an invalid mount point. Both the source and target disks associated with the mount point must be clustered disks, and must be members of the same group.
Mount point ‘[Mount path]’ for volume ‘\\?\Volume{[GUID]}\’ references an invalid target disk. Please ensure that the target disk is also a clustered disk and in the same group as the source disk (hosting the mount point).
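
If you have many of these events to go through, a short Python sketch like the following can pull the mount path and volume out of the message text and flag paths under $Recycle.bin. It assumes the English wording quoted above and accepts both straight and typographic quotes. The function name and the returned dictionary keys are my own invention.

```python
import re
from pathlib import PureWindowsPath

# Matches the second sentence of event 1208 as quoted above.
_EVENT_1208 = re.compile(
    r"Mount point ['‘](?P<path>.+?)['’] for volume "
    r"['‘](?P<volume>.+?)['’] references"
)


def parse_event_1208(message):
    """Return mount path, volume and a recycle-bin flag, or None if no match."""
    match = _EVENT_1208.search(message)
    if match is None:
        return None
    path = match.group("path")
    parts = PureWindowsPath(path).parts
    # parts[0] is the drive root (for example 'C:\\'), parts[1] the first folder.
    in_recycle_bin = len(parts) > 1 and parts[1].lower() == "$recycle.bin"
    return {
        "mount_path": path,
        "volume": match.group("volume"),
        "in_recycle_bin": in_recycle_bin,
    }
```

A result with in_recycle_bin set to True points towards the recycle bin cause discussed in the next section.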

Cause and investigation

The cause could of course be that the base drive is not a clustered disk, as the event message states. If that is the case, read a book about WFC (Windows failover clustering) and try again. If not, I have found the following causes:

  • If the mount point path is C:\$Recycle.bin\[guid], the cause is a SAN drive that has been replaced with another one at the same drive letter or mount point, but with a different LUN. This confuses the recycle bin.
  • The clustered drive for either the mount point or the volume being mounted is in maintenance mode and/or currently running autochk/chkdsk. This could happen so quickly that you are unable to detect it, and when you come back to check, the services are already up and running. Unless you disable it, WFC will run autochk/chkdsk when a drive with the dirty bit set is brought online. This is probably logged somewhere, but I have yet to determine in which log. Look in the application event log for Chkdsk events or something like this:

Event 17207 from MSSQL[instance]:

Event 1066 from FailoverClustering

 

Resolution

  • If it is the $Recycle.bin folder, make sure you have a backup of your data and delete the mount point folder under C:\$Recycle.bin. You might have to take ownership of the folder to be able to complete this task. If files are in use, take all cluster resources offline and try again.
  • If you suspect a corrupt mount point or drive, run chkdsk on ALL clustered drives. See https://lokna.no/?p=1194 for details.

Check C:\Windows\Cluster\Reports (default location) for files titled ChkDSK_[volume].txt, indicating that the cluster service has triggered an automatic chkdsk on a drive.
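
A small Python sketch for that check, assuming the default report location mentioned above and the ChkDSK_[volume].txt naming. The function name is hypothetical. It compares file names case-insensitively and lists the newest report first.

```python
from pathlib import Path


def find_chkdsk_reports(reports_dir=r"C:\Windows\Cluster\Reports"):
    """Return ChkDsk_*.txt files in reports_dir, newest first."""
    root = Path(reports_dir)
    if not root.is_dir():
        return []
    hits = [
        entry for entry in root.iterdir()
        if entry.is_file()
        and entry.name.lower().startswith("chkdsk_")
        and entry.name.lower().endswith(".txt")
    ]
    # Newest first, so the most recent automatic chkdsk is at the top.
    return sorted(hits, key=lambda entry: entry.stat().st_mtime, reverse=True)


for report in find_chkdsk_reports():
    print(report.name)
```

An empty result only means that no report files were found in that folder. It does not prove that chkdsk never ran.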

Run disk maintenance on a failover cluster mountpoint

Problem

“Validate this cluster” or another tool tells you that the dirty bit is set for a cluster shared volume, and taking the disk offline and online again (to trigger autochk) does not help. The error message from “Validate this cluster” looks like this:

[Screenshot of the “Validate this cluster” error message]

 

Continue reading “Run disk maintenance on a failover cluster mountpoint”

Permission error installing Failover Cluster instance

Problem

While testing out MSSQL 2012 Always On Failover Clustering in my lab, I stumbled upon a strange error which I have never seen before: “Updating permission settings for file [Shared drive]\[Mountpoint]\System Volume Information\ResumeKeyFilter.store failed”. This happened for all drives that were mounted as a mount point (folder) instead of a drive letter.

Continue reading “Permission error installing Failover Cluster instance”