The event log fills up with Event ID 2 from Kernel-EventTracing stating Session “” failed to start with the following error: 0xC0000022.



If you look into the system data for one of the events, you will find the associated ProcessID and ThreadID:


If the event is relatively recent, the Process ID should still belong to the offending process. Open Process Explorer and list processes by PID:
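If you prefer the command line to Process Explorer, the same lookup can be done with tasklist. The PID below is a placeholder; substitute the ProcessID from your event's system data:

```shell
:: Look up the offending process from an elevated prompt.
:: 2780 is a placeholder PID; use the ProcessID from your event.
tasklist /fi "PID eq 2780"
```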


We can clearly see that the culprit is one of those pesky WMI processes. The ThreadID is far more transient than the ProcessID, but we can always take a chance and see if it reveals more data. I spent a few minutes writing this, and in that time the thread had already disappeared. I waited for another event and immediately went to Process Explorer to look for thread 18932. Sadly though, this didn't do me any good. For someone more versed in kernel API calls the data might make some sense, but not to me.


I had more luck rummaging around in the ad-profile generator (Google search). It pointed me in the direction of KB3087042, which talks about WMI calls to LBFO teaming (Windows 2012 native network teaming) and conflicts with third-party WMI providers. Some more digging around indicated that the third-party WMI provider in question is HP WBEM, a piece of software used on HP servers to facilitate centralized server management (HP Insight). As KB3087042 states, the third-party provider is not the culprit. That implies a fault in Windows itself, but one must not admit such things publicly of course.

In their infinite wisdom (or as an attempt to compensate for their lack thereof), the good people of Microsoft have also provided a manual workaround for the issue. It is a bit difficult to understand, so I will provide my own version below.


As usual, if the following looks to you like something that belongs in a Harry Potter charms class, please seek assistance before you implement this in production. You will be messing with central operating system files, and a slip of the hand may very well end up with a defective server. You have been warned.

The fix

But let us get on with the fix. First, you have to get yourself an administrative command prompt. The good old-fashioned black cmd.exe (or any of the 16 available colors). There is no reason why this would not work in one of those fancy new blue PowerShell thingies as well, but why take unnecessary risks?

Then, we have a list of four incantations – uh.., commands to run through. Be aware that if for some reason your system drive is not C:, you will have to take that into account. And then spend five hours repenting and trying to come up with a good excuse for why you did it in the first place. Or perhaps spend the time looking for the person who did it and give them a good talking to. But I digress. The commands to run from the administrative command prompt are as follows:

Takeown /f c:\windows\inf
icacls c:\windows\inf /grant "NT AUTHORITY\NETWORK SERVICE":"(OI)(CI)(F)"
icacls c:\windows\inf\netcfgx.0.etl /grant "NT AUTHORITY\NETWORK SERVICE":F
icacls c:\windows\inf\netcfgx.1.etl /grant "NT AUTHORITY\NETWORK SERVICE":F

The first command takes ownership of the Windows\Inf folder. This is done to make sure that you are able to make the changes. The three icacls commands grant permissions to the NETWORK SERVICE system account on the INF folder and two ETL files. The result should look something like this:


To test if you were successful, run this command:

icacls c:\windows\inf

And look for the highlighted result:


Should you want to learn more about the icacls command, this is a good starting point.

The cleanup

This point is very important. If you do not hand over ownership of Windows\Inf back to the system, bad things will happen in your life.

This time, you only need a normal file explorer window. Open it, and navigate to C:\Windows. Then open the advanced security dialog for the folder.

Next to the name of the current owner (should be your account) click the change button/link.


Then, select the Local Computer as location and NT SERVICE\TrustedInstaller as object name. Click Check Names to make sure you entered everything correctly. If you did, the object name changes to TrustedInstaller (underlined).


Click OK twice to get back to the file explorer window. If you did not get any error messages, you are done.

It IS possible to script the ownership transfer as well, but in my experience the failure rate is way too high. I guess the writers of the KB agree, as they have only given a manual approach.
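For completeness, if you want to attempt the scripted route anyway, icacls itself can hand the ownership back. Treat this as an untested sketch, and verify the result in the advanced security dialog afterwards:

```shell
:: Return ownership of the Inf folder to TrustedInstaller (elevated prompt)
icacls c:\windows\inf /setowner "NT SERVICE\TrustedInstaller"
```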




For some reason, the Store icon comes back to haunt you every time you restart. That is, it stays pinned to the taskbar no matter what, and if you un-pin it, like a zombie it will rise from the grave as soon as you reboot…


This is probably a scheme to make us buy more of those stupid “modern” apps. Not that there aren’t useful apps, but they are few and far between. Anyways, the point is to get rid of the icon. I could of course disable the store altogether, but I just want it out of my way and off my lawn –eh, taskbar.


The good people of Microsoft have finally given us a proper option to get rid of it. Salvation comes in the form of a GPO called "Do not allow pinning Store app to the Taskbar". The wording is such as to make us believe that it is all our fault to begin with, but no matter, let's just remove it.

The GPO is hidden in User Configuration under Policies, Administrative Templates, Start Menu and Taskbar:


Set it as enabled and deploy it to your users as best fits you. If you are looking to make this change on your own local computer without a domain, just start gpedit.msc to edit your local policy.
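If you prefer to set this outside gpedit.msc, the policy is generally reported to map to the registry value below. The value name is taken from the ADMX mapping and should be verified against your own policy definitions before you rely on it:

```shell
:: Registry equivalent of the GPO (value name assumed from the ADMX mapping;
:: verify against your local admx files before relying on it)
reg add "HKCU\Software\Policies\Microsoft\Windows\Explorer" /v NoPinningStoreToTaskbar /t REG_DWORD /d 1 /f
```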




Western Digital Scorpio Blue.





When trying to start Failover Cluster manager you get an error message: “Microsoft Management Console has stopped working”


Inspection of the application event log reveals Event ID 1000, also known as an application error, with the following text:

Faulting application name: mmc.exe, version: 6.3.9600.17415, time stamp: 0x54504e26
Faulting module name: clr.dll, version: 4.6.1055.0, time stamp: 0x563c12de
Exception code: 0xc0000409




As usual, this is a .NET Framework debacle. Remove KB 3102467 (Update for .NET Framework 4.6.1), or wait for a fix.
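The removal can be done from an elevated command prompt with wusa:

```shell
:: Uninstall the offending update; drop /norestart if an immediate reboot is fine
wusa /uninstall /kb:3102467 /norestart
```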




It was nearing the end of summer, but most of the Knights of Hyper-V were still on vacation. There was of course always one knight on call, but the others were lazily roaming the countryside, or lounging along the bank of a river pretending to be on a fishing trip. Some even went on expeditions to far away realms looking for trouble, relaxation, fancy fishing gear, VMWare-proof armor, or new riding boots. The all-seeing monitors however, were not on vacation. To be honest we do not even know if they ever sleep, they just seem to take turns going into hibernation mode. Instead they had spent the summer installing new crystal orbs, automated all-seeing eyes and such. One of their new contraptions was some kind of network enabled spooky ghost detector. Its purpose was to send probes into The Wasteland of Nexus and attempt to locate signs of the ghosts of forgotten VMs and other security problems.

This came about as flaws had been discovered in the procedure for disposal of outdated VMs. The minions responsible for dealing with outdated VM disposal had gotten increasingly bureaucratic, spending most of their time hassling others with demands of forms filled in triplicate to update documentation. And such tasks are of course important, but the most important thing is to actually dispose of the old VM. The result was a number of undocumented (as the documentation had been updated) VMs roaming The Wasteland of Nexus without updated security software, making the entire realm vulnerable to outside attacks from beyond the wall. Firewall that is.

Such was the back-story, when one dark and gloomy midsummer morning, a trouble ticket landed in the inbox of the knight on call with a loud boom. It was another list of suspect activities detected in the wasteland. A couple of probes had returned during the night, complaining about servers with patches several years old. To add a little spice to the mix, these were ghost servers. If you knocked on the right door they would answer, but they were not listed anywhere. Not in the labyrinthine CMDB, and certainly not in any of the address books. For all intents and purposes they did not exist. Except of course for the undeniable fact that they most certainly did. This was something that could provide days, if not months of confused contemplation for social studies majors, human resources, project managers and others of similar ilk. But the knight was an engineer and simply scoffed at such irrelevancies. To him this was simply a problem looking for a solution. But which solution? The available information pointed to an ancient server from 2010. That is a very long time ago, and at least two documentation systems had been sent off to Valhalla by way of funeral pyre in the meantime. The current buzzword-friendly variant was named after the Chinese philosopher Confucius. He was the inventor of the term "Do not do to others what you do not want done to yourself", but if such terms were to be enforced in documentation systems, violent outbreaks would be the norm, as most documentation can be interpreted as a form of torture. Anyways, no trace of the ghosts was found in the current system, and the old ones were burned. There was always a faint hope that someone had kept a personal log mentioning the ghosts' former names, but no such luck was to be had this time around.

The knight went back to the all-seeing monitors and requested more information to aid him in his search. Another probe was dispatched into the spirit world, this time with instructions to look for identifying marks instead of fussing about missing security updates, foul stenches and gates left open. While waiting for the probes to return, the knight identified an old, long-forgotten storage system. The storage minions swore it had been properly decommissioned and disposed of years ago, but it was found to be chugging along under a desk, consuming power and collecting dust.

Another sub-quest expedition to the physical realm of Hyper-V hosts revealed that someone had been re-inserting old decommissioned servers that were kept around for spare parts into the magic cabinet of the silver slanted 'E'. Or it could of course be that they had never been removed in the first place due to bureaucratic loops and lost scrolls of Todo. Anyways, the knight had bagged two ghosts.

We rejoin our knight the next morning. For once it was a good morning. The sun was shining, and the success of yesterday's sub-quests was still lingering in the knight's mind. Sadly, that would soon change. The probes were back, and they were happily reporting that the former names of the ghosts had been decoded. This identified the responsible service team, but the service team minions were all relatively new and had never heard of these old ghosts. Armed with new knowledge the knight went straight to the VMM daemons to demand an explanation. But to his great alarm, he found that the VMM daemons too had never heard of these ghosts. Feverishly the knight searched the scrolls of physical servers, in a vain hope that the servers nevertheless were physical beings, but no. No such server had ever existed. With that, only one possible solution remained: the ghosts were located in the realm of VMWare!

There was no choice other than to beseech the man with the crowbar to borrow his Hazard Suit and plan an expedition to the toxic fields of vCenter. Once there, the ghosts were immediately detected. On a closer (but hasty) inspection of the remaining area, the knight also identified two other ghosts. He quickly filled out a scroll identifying the ghosts, and went back to more pleasing surroundings. He then updated the trouble ticket and forwarded it to the unholy riders of VMWare, hoping that he wouldn’t have to go back for a long, long time.


This is an attempt at giving a technical overview of how the native network teaming in Windows 2012R2 works, and how I would recommend using it. From time to time I am presented with problems "caused" by network teaming, so figuring out how it all works has been essential. Compared to the days of old, where teaming was NIC vendor dependent, today's Windows native teaming is a delight, but it is not necessarily trouble free.


Someone at Microsoft has written an excellent guide called Windows Server 2012 R2 NIC Teaming (LBFO) Deployment and Management, available here. It gives a detailed technical guide to all the available options. I have added my field experience to the mix to create this guide.


  • NIC: Network Interface Card. Also known as Network Adapter.
  • vNIC/virtual NIC: a team adapter on a host or another computer (virtual or physical) that uses teaming.
  • Physical NIC/adapter: An adapter port that is a member of a team. Usually a physical NIC, but could be a virtual NIC if someone has made a complicated setup with teaming on a virtual machine.
  • vSwitch: A virtual switch, usually a Hyper-V switch.
  • Team member: a NIC that is a member of a team.
  • LACP: Link Aggregation Control Protocol, also known as IEEE 802.3ad.

Active-Active vs Active-Passive


If none of the adapters are set as standby, you are running an Active-Active config. If one is standby and you have a total of two adapters, you are running an Active-Passive config. If you have more than two team members, you may be running a mixed Active-Active-Passive config (standby adapter set), or an Active-Active config without a standby adapter.

If you are using a configuration with more than one active team member on a 10G infrastructure, my recommendation is to make sure that both members are connected to the same physical switch and in the same module. If not, be prepared to sink literally hundreds, if not thousands of hours into troubleshooting that could otherwise be avoided. There are far too many problems related to the switch teaming protocols used on 10G, especially with the Cisco Nexus platform. And it is not that they do not work, it is usually an implementation problem. A particularly nasty kind of device is something Cisco refers to as a FEX or fabric extender. Again, it is not that it cannot work. It’s just that when you connect it to the main switch with a long cable run it usually works fine for a couple of months. And then it starts dropping packets and pretends nothing happened. So if you connect one of your team members to a FEX, and another to a switch, you are setting yourself up for failure.

Due to the problems mentioned above and similar troubles, many IT operations have a ban on Active-Active teaming. It is just not worth the hassle. If you really want to try it out, I recommend one of the following configurations:

  • Switch independent, Hyper-V load balancing. Naturally for vSwitch connected teams only. No, do not use Dynamic.
  • LACP with Address Hash or Hyper-V load balancing. Again, do not use Dynamic mode.
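For reference, both configurations can be created with the built-in NetLbfo PowerShell cmdlets. A sketch, with team and adapter names as placeholders you must adjust to your hardware:

```shell
# Switch independent + Hyper-V port load balancing (names are examples)
New-NetLbfoTeam -Name "Team1" -TeamMembers "NIC1","NIC2" `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort

# LACP + Address hash (TransportPorts is the port-and-IP variant)
New-NetLbfoTeam -Name "Team2" -TeamMembers "NIC3","NIC4" `
    -TeamingMode Lacp -LoadBalancingAlgorithm TransportPorts
```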

Team members

I do not recommend using more than two team members in Switch Independent teaming due to artifacts in load distribution. Your servers and switches may handle everything correctly, but the rest of the network may not. For switch dependent teaming, you should be OK, provided that all team members are connected to the same switch module. I do not recommend using more than four team members though, as it seems to be the breaking point between added redundancy and too much complexity.

Make sure all team members are using the exact same network adapter with the exact same firmware and driver versions. Mixing them up will work, but even if base jumping is legal you don't have to go jumping. NICs are cheap, so fork over the cash for a proper Intel card.

Load distribution algorithms

Be aware that the load distribution algorithm primarily affects outbound connections only. The behavior of inbound connections and routing for switch independent mode is described for each algorithm. In switch dependent mode (either LACP or static) the switch will determine where to send the inbound packets.

Address hash

Using parts of the address components, a hash is created for each load/connection. There are three different modes available, but the default one available in the GUI (port and IP) is the one mostly used. The other alternatives are IP only and MAC only. For traffic that does not support the default method, one of the others is used as a fallback.

Address hash creates a very granular distribution of traffic initiated at the VM, as each packet/connection is load balanced independently. The hash is kept for the duration of the connection, as long as the active team members are the same. If a failover occurs, or if you add or remove a team member, the connections are rebalanced. The total outbound load from one source is limited by the total outbound capacity of the team and the distribution.
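The mechanism can be illustrated with a toy sketch. This is not Microsoft's actual hash function, just the general idea: hash the connection's address tuple, and let the hash pick the outbound team member.

```shell
# Toy sketch only: hash a connection's address tuple to pick a team member.
# Microsoft's real implementation uses its own hash, not cksum.
flow="10.0.0.5:49152->10.0.0.9:443"   # one connection's addresses and ports
members=2                             # number of active team members
hash=$(printf '%s' "$flow" | cksum | cut -d' ' -f1)
echo "flow goes out via team member $((hash % members))"
```

As long as the flow's addresses and the set of active members stay the same, the same member is chosen every time, which is why the hash is kept for the duration of the connection.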


Inbound connections

The IP address for the vNIC is bound to the so-called primary team member, which is selected from the available team members when the team goes online. Thus, everything that uses this team will share one inbound interface. Furthermore, the inbound route may be different from the outbound route. If the primary adapter goes offline, a new primary adapter is selected from the remaining team members.

Recommended usage
  • Active/passive teams with two members
  • Never ever use this for a Virtual Switch
  • Using more than two team members with this algorithm is highly discouraged. Do not do it.

MS recommends this for VM teaming, but you should never create teams in a VM. I have yet to hear a good reason to do so in production. What you do in your lab is between you and your therapist.

Hyper-V mode

Each vNIC, be it on a VM or on the host, is assigned to a team adapter and stays connected to this as long as it is online. The advantage is a predictable network path, the disadvantage is poor load balancing. As adapters are assigned in a round robin fashion, all your high bandwidth usage may overload one team adapter while the other team adapters have no traffic. There is no rebalancing of traffic. The outbound capacity for each vNIC is limited to the capacity of the Physical NIC it is attached to.

This algorithm supports VMQ.


It may be the case that the red connection in the example above is saturating the physical NIC, thus causing trouble for the green connection. The load will not be rebalanced as long as both physical NICs are online, even if the blue connection is completely idle.

The upside is that the connection is attached to a physical NIC, and thus incoming traffic is routed to the same NIC as outbound traffic.

Inbound connections

Inbound connections for VMs are routed to the physical NIC assigned to the vNIC. Inbound connections to a host are routed to the primary team member (see Address hash). Thus inbound load is balanced for VMs, and we are able to utilize VMQ for better performance. This mode has the same inbound load balancing problems as Address hash for host inbound connections.

Recommended use

Not recommended for use on 2012R2, as Dynamic will offer better performance in all scenarios. But, if you need MAC address stability for VMs on a Switch Independent team, Hyper-V load distribution mode may offer a solution.

On 2012, recommended for teams that are connected to a vSwitch.


Dynamic is a mix between Hyper-V and Address hash. It is an attempt to create a best-of-both-worlds scenario by distributing outbound loads using the address hash algorithms, and inbound loads as in Hyper-V mode; that is, each vNIC is assigned one physical NIC for inbound traffic. Outbound loads are rebalanced in real time. The team detects breaks in the communication stream where no traffic is sent. The traffic between two such breaks is called a flowlet. After each flowlet the team will rebalance the load if deemed necessary, expecting that the next flowlet will be equal to the previous one.

The teaming algorithm will also trigger a rebalancing of outbound streams if the total load becomes very unbalanced, a team member fails, or other hidden magic black-box settings determine that immediate rebalancing is required.

This mode supports VMQ.


Inbound connections

Inbound connections are mapped to one specific physical NIC for each workload, be it a VM or a workload originating on the host. Thus, the inbound path may differ from the outbound path, as in Address hash.

Recommended use

MS recommends this mode for all teams with the following exceptions:

  • Teams inside a VM (which I do not recommend, no matter what).
  • LACP Switch dependent teaming
  • Active/Passive teams

I will add the following exception: if your network contains load balancers that do not employ proper routing, e.g. F5 BigIP with the "Auto Last Hop" option enabled to work around routing problems, they will not work together with this teaming algorithm. Use Hyper-V or Address Hash Active/Passive instead.

Source MAC address in Switch independent mode

Outbound packets from a VM that exit the host through the primary adapter will use the MAC address of the VM as source address. Outbound packets that use a different physical adapter to exit the host will get another MAC address as source address to avoid triggering a MAC flapping alert on the physical switches. This is done to ensure that one MAC address is only present at one physical NIC at any one point in time. The MAC assigned to the packet is the MAC of the physical NIC in question.

To try to clarify, for Address Hash:

  • If a packet from a VM exits through the primary team member, the MAC of the vNIC on the VM is kept as source MAC address in the packet.
  • If a packet from a VM exits through (one of) the secondary team members, the source MAC address is changed to the MAC address of the secondary team member.

for Hyper-V:

  • Every vSwitch port is assigned to a physical NIC/team member. If you use this for host teaming (no vSwitch), you have 1 vSwitch port and all inbound traffic is assigned to one physical NIC.
  • Every packet uses this team member until a failover occurs for any reason.

for Dynamic:

  • Every vSwitch port is assigned to a physical NIC. If you use this for host teaming (no vSwitch), you have 1 vSwitch port and all inbound traffic is assigned to one physical NIC.
  • Outbound traffic will be balanced. MAC address will be changed for packets on secondary adapters.

For Hyper-V and Dynamic, the primary is not the team primary but the assigned team member. It will thus be different for each VM.

For host teaming without a vSwitch the behavior is similar. One of the team members' MACs is chosen as the primary for host traffic, and the MAC replacement rules apply as for VMs. Remember, you should not use Hyper-V load balancing mode for host teaming. Use Address hash or Dynamic.

Algorithm    | Source MAC on primary | Source MAC on secondary adapters
Address hash | Unchanged             | MAC of the secondary in use
Hyper-V      | Unchanged             | Not used
Dynamic      | Unchanged             | MAC of the secondary in use

Source MAC address in switch dependent mode

No MAC replacement is performed on outbound packets. To be overly specific:

Algorithm           | Source MAC on primary | Source MAC on secondary adapters
Static Address hash | Unchanged             | Unchanged
Static Hyper-V      | Unchanged             | Unchanged
Static Dynamic      | Unchanged             | Unchanged
LACP Address hash   | Unchanged             | Unchanged
LACP Hyper-V        | Unchanged             | Unchanged
LACP Dynamic        | Unchanged             | Unchanged


Posting reviews of software is not something that I do every day. Or every year for that matter. But something unexplainable about the incident recounted in this post made me write it. You have been warned…

I have been on the lookout for a new password manager, especially one with "secure" cloud sync capabilities, and someone recommended 1Password (name withheld to protect the guilty). What piqued my interest was the claim that no one but me would be able to decrypt the data.

This is in stark contrast to most cloud solutions. Let us use one such service as an example. It is touted as a completely secure way to receive digital documents from the Norwegian government and anyone else willing to pay for sender-access to the system. For instance, several brick and mortar stores in Norway are able to send you receipts and warranty certificates over the system. But is it secure? Their FAQ claims that it is as safe as your bank. And maybe it is. But my bank does not aggregate data about me from other sources, at least not to my knowledge. Browsing further down the FAQ reveals the following quote: "Et fåtall sikkerhetsklarerte medarbeidere er autorisert til å vedlikeholde og korrigere kundeopplysninger." Sadly this is in Norwegian only, but it basically says that a small number of security-cleared employees are authorized to view or alter your data to perform "maintenance". Images of underpaid outsourcing employees from Asia looking to make a quick buck on the side by datamining flashed before my inner eye, but even if these people are all highly trustworthy, that is beside the point. The point is that someone other than me and the sender can access these data without me giving them the key. And then it is not really safer than regular email. And if you still believe that your emails, cloud storage and Facebook messages are not stored, tagged and analyzed automatically by at least two governments beside your own, please stop reading. You are outside the target demographic and should keep your current post-it-under-the-keyboard password manager.

But I digress. I was supposed to write about password managers, more specifically 1Password from AgileBits.

I registered and downloaded a trial of the subscription-based "family" version, as it came so highly recommended by the website and was the only version targeted at end users with internet sync that didn't include a known NSA-infected third party.


I was surprised to find that there was no stable version of the Windows application available, only a beta, but I was feeling adventurous and downloaded the desktop version. The Modern/Metro version reports itself as an alpha version in the Windows Store and was thus left alone.



Next, I attempted to import my existing data. The online help directed me to a community-built Perl script and a PDF. I went through the Perl script maze and ended up with a 1pif file, which I was to import into the main program. 1pif is some form of proprietary intermediate import/export format. All that remained was importing it into 1Password. To my astonishment, there was no import button to be found. Not even the File menu, where the Import button is supposed to be located according to the PDF, was available. The app is almost completely devoid of buttons and menus. I tried inputting data manually, but the fancy modern UI is not exactly user friendly, so I gave that up. Inputting 200+ entries manually at the pace the UI allowed was out of the question. There may be a hidden import function there somewhere, but I was unable to find it.


Rummaging around the 1Password website I found the stable 4.x version. This is the one that only supports Dropbox sync or similar. It has the aforementioned import button (which worked), but after the data was imported and I tried opening the resulting vault in v6 (beta), the vault was locked and could not be reopened. After a second try with another file I got it going, and I was able to access the data in v6 through some kind of legacy function whose location I forgot to screenshot. I was about to move the data over to the "cloud" part, but I stopped… Glancing at my main monitor, I noticed it was filling up with security warnings complaining about unsafe access to system resources. By 1Password v6. See screenshot below. Sadly, it is written in Norwegian, but it is basically a warning about invalid code signing certificates.


I have once before lost data due to poorly managed updates to a password manager, and here I am about to put my trust in beta software? Remembering the non-decryptable data from some years back and the time spent recovering the lost data, I was not feeling safe at all. If the claim that I am the only one with the encryption keys is true, is it then even possible to restore from a backup if a botched software update garbles the data? Are there in fact any backups at all? The documentation talks about a password history, indicating that delete means tag-as-deleted-but-keep-in-database, but says nothing about a restore function as far as I can tell.

There are stable clients for most other platforms though. I realized that most if not all screenshots on the 1Password site are from the Mac version, so I guess they just couldn't be bothered to build a proper Windows client before they launched v6 for Mac. A stroll down memory lane confirmed my suspicions. In May 2016 they launched 1Password 6.3 for Mac. 6.0 was launched in January, with several updates in-between. The most recent post I can find about the stable Windows version is from July 2015, and as far as I can tell it just confirms that the current stable 4.6 version is compatible with Windows 10. Almost a year ago to the day.

I seriously considered reaching out to AgileBits support, but at this point I doubt there is anything they can tell me that will convince me to move my data to 1Password Families. The 4.6 product looks a lot better, but I guess it is the old stuff now, as there does not seem to be any development on it. The MD5 signature on the current download as of July 2016 is from February 23, 2016. Neither does it support the kind of sync I was looking for, and if the horrible UI of Families v6 is a sign of what is to come, I am out.

I have since moved on to somewhat greener pastures, and I am currently testing another similar product. If that results in another horrible experience, maybe there will be another review…

Update 2017.03.16

In response to comments, I have written another post here:



Post originally from 2010, updated 2016.06.20. IPMI seems to be an endless source of "entertainment"…

Original post:


The system event log is overflowing with EventID 1004 from IPMIDRV: "The IPMI device driver attempted to communicate with the IPMI BMC device during normal operation. However the operation failed due to a timeout."

The frequency may vary from a couple of messages per day upwards to several messages per minute.


The BMC (Baseboard Management Controller) is a component found on most server motherboards. It is a microcontroller responsible for communication between the motherboard and management software. See Wikipedia for more information. The BMC is also used for communication between the motherboard and dedicated out-of-band management boards such as Dell iDRAC. I have seen these error messages on systems from several suppliers, most notably on IBM and Dell blade servers, but most server motherboards have a BMC.

As the error message states, you can resolve this error by increasing the timeout, and this is usually sufficient. I have found that the Windows default settings for the timeouts may cause conflicts, especially on blade servers. Thus an increase in the timeout values may be in order, as described on TechNet.

Lately though, I have found this error to be a symptom of more serious problems. To understand this, we have to look at what is actually happening. If you have some kind of monitoring agent running on the server, such as SCOM or similar, the error could be triggered by said agent trying to read the current voltage levels on the motherboard. If such operations fail routinely during the day, it is a sign of a conflict. This could be competing monitoring agents querying data too frequently, an issue with the BMC itself, or an issue with the out-of-band management controller. In my experience, this issue is more frequent on blade servers than rack-based servers. This makes sense, as most blade servers have a local out-of-band controller that is continuously talking to a chassis management controller to provide a central overview of the chassis.





A newly converted Cluster Shared Volume refuses to come online. Cluster validation passed with flying colours pre-conversion. Looking in the event log, you find this:

Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 05.06.2016 15:01:31
Event ID: 5120
Task Category: Cluster Shared Volume
Level: Error
Computer: HyperVHostname
Cluster Shared Volume ‘Volume1’ (‘VMStore1’) has entered a paused state because of ‘(c00000be)’. All I/O will temporarily be queued until a path to the volume is reestablished.

The event is repeated on all nodes.


The crafty SAN admins have probably enabled some kind of fancy SAN mirroring on your LUN. If you check, you will probably find twice the usual number of storage paths. A typical SAN has 4 connections per LUN, so you may see 8 paths. Be aware that your results may vary; the point is that you now have more paths than usual. The problem is that you cannot use all of them simultaneously. Half of them belong to the SAN mirror, and your LUNs are offline at the mirror location. If a failover is triggered on the SAN side, your primary paths go down and your secondary paths come alive. Your poor server knows nothing about this, though; it only registers that some of the paths do not work even if they claim to be operative. This confuses Failover Clustering. And if there is one thing Failover Clustering does not like, it is getting confused. As a result, the CSV volume is put in a paused state while it waits for the confusion to clear.
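One way to check the path count is the built-in mpclaim tool. A quick sketch; the disk number is an example, so run the overview first to find your own:

```powershell
# Overview of all MPIO-claimed disks.
mpclaim -s -d

# Detailed path list for disk 0, including the state of each path.
# If half of the paths belong to an inactive SAN mirror, you will
# typically see twice the expected number here.
mpclaim -s -d 0
```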


You have to give MPIO permission to verify the claims made by the SAN as to whether or not a path is active. Run the following PowerShell command on all cluster nodes. Be aware that this is a system-wide setting and is activated for all MPIO connections that use the Microsoft DSM.

Set-MPIOSetting -NewPathVerificationState Enabled

Then reboot the nodes and all should be well in the realm again.
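To save some legwork, the setting can be pushed to all nodes with PowerShell remoting and verified after the reboot. A sketch, assuming hypothetical node names:

```powershell
# Hypothetical node names - replace with your own cluster nodes.
$nodes = 'HV01', 'HV02', 'HV03'

# Enable path verification on every node.
Invoke-Command -ComputerName $nodes -ScriptBlock {
    Set-MPIOSetting -NewPathVerificationState Enabled
}

# After rebooting the nodes, confirm that the setting stuck.
Invoke-Command -ComputerName $nodes -ScriptBlock { Get-MPIOSetting } |
    Select-Object PSComputerName, PathVerificationState
```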




After an upgrade to SQL Server 2012 SP3, a clustered instance fails to start, logging errors 33009, 912 and 3417:

2016-04-28 14:32:23.39 spid14s     Error: 33009, Severity: 16, State: 2.
2016-04-28 14:32:23.39 spid14s     The database owner SID recorded in the master database differs from the database owner SID recorded in database 'msdb'. You should correct this situation by resetting the owner of database 'msdb' using the ALTER AUTHORIZATION statement.
2016-04-28 14:32:23.39 spid14s     Error: 912, Severity: 21, State: 2.
2016-04-28 14:32:23.39 spid14s     Script level upgrade for database 'master' failed because upgrade step 'msdb110_upgrade.sql' encountered error 33009, state 2, severity 16. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the 'master' database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.
2016-04-28 14:32:23.39 spid14s     Error: 3417, Severity: 21, State: 3.
2016-04-28 14:32:23.39 spid14s     Cannot recover the master database. SQL Server is unable to run. Restore master from a full backup, repair it, or rebuild it. For more information about how to rebuild the master database, see SQL Server Books Online.

Something went down the steep road to oblivion during the upgrade process, and the instance is as dead as an overrun squirrel. To be more specific, the master database does not like the looks of the owner attribute on the msdb database, and is downright refusing to complete the upgrade process.


  • Start the instance with trace flag 902 to disable the upgrade scripts and get the server back online. You can do this from the command line if you wish, but for clustered instances it is easier to use SQL Server Configuration Manager. There will be some existing startup parameters. Do not mess with those, just add a new one using the handy “Add” button provided.
  • Wait for the instance to start, and log in using SSMS.
  • To find out which SID is stored where, run the following script:
--Sid in master
SELECT SID FROM master..sysdatabases WHERE Name = 'msdb'
--Sid in database
SELECT [SID] FROM msdb.sys.database_principals WHERE Name = 'DBO'

It will probably show 0x01 (sa) for one of the results, and an Active Directory SID for the other:


If you want to know what username SQL Server has stored for the SID, use this command, replacing 0x01 with the SID from the previous result:

--Sid to username
SELECT Name AS [LoginName] FROM master..syslogins WHERE SID = 0x01

Be aware that if the username has been changed in AD, let's say you changed DOMAIN\johnsmith to DOMAIN\jsmith1, SQL Server will not necessarily be aware of this. You can validate the username in AD using PowerShell. And just to make the point clear: do not mess around with account names in AD if they are linked to SQL Server. It may lead to login failures, and possibly cows being abducted by aliens at your local dairy farm.
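For reference, a PowerShell sketch for translating the binary SID returned by SQL Server into an AD account name. The hex string below is a made-up example; paste your own SID without the 0x prefix, and note that the translation only works on a machine that can reach the domain:

```powershell
# Made-up example SID - replace with the hex string from the query above,
# without the leading 0x.
$hexSid = '010500000000000515000000A1A2A3A4B1B2B3B4C1C2C3C4E8030000'

# Convert the hex string to a byte array.
$bytes = [byte[]] -split ($hexSid -replace '..', '0x$& ')

# Build a SID object and translate it to an account name.
$sid = New-Object System.Security.Principal.SecurityIdentifier($bytes, 0)
$sid.Value                                              # S-1-5-21-... form
$sid.Translate([System.Security.Principal.NTAccount])   # DOMAIN\username
```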


Change the database owner to match the value stored in master. Remember to use brackets for AD users.
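The error message itself points to ALTER AUTHORIZATION, so the fix is a one-liner. A sketch; [DOMAIN\johnsmith] is a placeholder for whatever the queries above returned:

```sql
-- Set the owner of msdb to the login recorded in master.
-- [DOMAIN\johnsmith] is a placeholder - substitute your own result,
-- or use sa if master reported 0x01.
ALTER AUTHORIZATION ON DATABASE::msdb TO [DOMAIN\johnsmith];
```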


Then remove the trace flag and restart the instance to try the upgrade scripts again. If the current owner is not the owner you want, allow the upgrade to finish and change it again.



