Hyper-V VM with VirtualFC fails to start

Problem

This is just a quick note to remember the solution and EventIDs.

The VM fails to start complaining about failed resources or resource not available in Failover Cluster manager. Analysis of the event log reveals messages related to VirtualFC:

  • EventID 32110 from Hyper-V-SynthFC: ‘VMName’: NPIV virtual port operation on virtual port (WWN) failed with an error: The world wide port name already exists on the fabric. (Virtual machine ID ID)
  • EventID 32265 from Hyper-V-SynthFC: ‘VMName’: Virtual port (WWN) creation failed with a NPIV error(Virtual machine ID ID).
  • EventID 32100 from Hyper-V-VMMS: ‘VMNAME’: NPIV virtual port operation on virtual port (WWN) failed with an unknown error. (Virtual machine ID ID)
  • EventID 1205 from Microsoft-Windows-FailoverClustering: The Cluster service failed to bring clustered role ‘SCVMM VM Name Resources’ completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

Analysis

The events point in the direction of Virtual Fibre Channel or Fibre Channel issues. After a while we realised that one of the nodes in the cluster did not release the WWN when a VM migrated away from it. Further analysis revealed that the FC driver versions were different.

SNAGHTML66058b10

SNAGHTML660875d4

Solution

  • Make sure all cluster nodes are running the exact same driver and firmware for the SAN and network adapters. This is crucial for failovers to function smoothly.
  • To “release” the stuck WWNs you have to reboot the offending node. To figure out which node is holding the WWN you have to consult the FC Switch logs. Or you could just do a rolling restart and restart all nodes until it starts working.
  • I have successfully worked around the problem by removing and re-adding the virtual FC adapters n the VM that is not working. I do not know why this resolved the problem.
  • Another workaround would be to change the WWN on the virtual FC adapters. You would of course have to make this change at the SAN side as well.

Slow startup on VMs with Virtual Fibre Channel

Problem

After you enable Virtual Fibre Channel on a Hyper-V VM it takes forever to start up. In this instance forever equals about two minutes. The VM is stuck in the Starting… stat for most of this period.

Analysis

During startup, the Hypervisor waits for LUNs to appear on the virtual HBA before the VM is allowed to boot the OS. When there are no LUNs defined for a VM, i.e. when you are deploying a new VM, the Hypervisor patiently waits for 90 seconds before it gives up. Thus startup of the VM is delayed by 90 seconds if there are no LUNs presented to the VM, or if the SAN is down or misconfigured. Event ID 32213 from Hyper-V-SynthFC is logged:

SNAGHTML24190ea1

Solution

Depending on your specific cause, one of the following should do the trick:

  • Present some LUNs
  • Remove the Virtual HBA adapters if they are not in use
  • Correct the SAN config to make sure the VM is able to talk to the SAN