Redirecting the root site to the MIM Portal

Problem

When you install the MIM Portal, the root site of your portal will display a rather glum page devoid of anything immediately useful. It looks like this:

image

This is not the portal you are looking for… The one you want is located at /IdentityManagement. There are many ways to work around this. You can publish the portal through some kind of load balancer and add the redirect there, or you can install IIS URL Rewrite and fiddle around with the settings for a while. The most elegant solution I have come across so far, however, is to change the WelcomePage in SharePoint.

Solution

Source: http://konab.com/redirect-identitymanagement-site-spf-2013/.

 

Notes

  • I use an http link in the example as this is from a lab setup, but you should of course use SSL in production. HTTP to HTTPS redirect is another issue (for another post).
  • I have not tested this on SharePoint 2016, but I see no reason why it shouldn’t work.

 

Action plan

  • Start by opening the SharePoint 2013 Management Shell.
  • Enter the following commands, replacing the web application name with the URL for your MIM Portal.
$webapp = Get-SPWeb http://portal.mim.local
$root = $webapp.RootFolder
$root.WelcomePage = "IdentityManagement/default.aspx"
$root.Update()
  • The change should take effect immediately.

Hypervisor not running

Problem

After upgrading my LAB to VMM 1801, and subsequently VMM 1806 (see https://lokna.no/?p=2519), VMs refuse to start on one of my hosts. EventID 20148 was logged when I tried to create a new VM. I restarted the host in hope of a quick fix, but the result was that none of the VMs living on this host wanted to boot.

“Virtual machine ‘NAME’ could not be started because the hypervisor is not running”

image

Solution

For some reason the Hypervisor has been disabled. You can check this by running BCDEDIT in an administrative command prompt. The hypervisorlaunchtype should be set to Auto. If it is not, change it by running the following command:

bcdedit /set hypervisorlaunchtype auto


After that, reboot the host and everything should be running again. Unless, of course, you have a completely different issue preventing your VMs from starting.
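
To check the setting without reading the whole BCDEDIT listing by eye, the relevant line can be picked out with a few lines of PowerShell. This is only a minimal sketch: the function name Get-HypervisorLaunchType is made up for this example, and it assumes the English output format where the entry reads hypervisorlaunchtype followed by its value. If the entry is missing, the function returns nothing.

```powershell
function Get-HypervisorLaunchType {
    param(
        # Lines of text as produced by: bcdedit /enum '{current}'
        [string[]]$BcdOutput
    )
    foreach ($line in $BcdOutput) {
        # The entry looks like: "hypervisorlaunchtype    Auto"
        if ($line -match '^\s*hypervisorlaunchtype\s+(\S+)') {
            return $Matches[1]
        }
    }
}

# Usage on the host, from an elevated PowerShell prompt:
# $type = Get-HypervisorLaunchType -BcdOutput (bcdedit /enum '{current}')
# if ($type -ne 'Auto') { bcdedit /set hypervisorlaunchtype auto }
```

The parsing is kept separate from the call to bcdedit so that it can be tried against saved output before it is pointed at a live host.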

image

MIM: SharePoint Central Administration returns 404

Problem

After an unscheduled reboot of some of my production MIM 2016 servers, SharePoint Central Administration just returned a 404 error. As the reboot was caused by a massive network outage, pinpointing this specific error took some time.

 

Analysis

Rebooting the servers brought the MIM portal up again (after some time), but SharePoint Central Administration was still down. Rummaging around in the ULS log, I came across this message, the first error logged after trying to load the management site:

Failed to get document content data. System.TypeInitializationException: The type initializer for 'Cobalt.MetricsBase`1' threw an exception. ---> System.IO.FileLoadException: Loading this assembly would produce a different grant set from other instances. (Exception from HRESULT: 0x80131401)

at System.Linq.Expressions.Expression.Parameter(Type type, String name)

at Cobalt.MetricsBase`1..cctor()

--- End of inner exception stack trace ---

at Cobalt.MetricsBase`1..ctor()

at Microsoft.SharePoint.SPFileStreamHostBlobStore..ctor(SPFileStreamStore spFileStreamStore, Config config)

at Microsoft.SharePoint.SPFileStreamManager.CreateCobaltStreamContainer(SPFileStreamStore spfs, ILockBytes ilb, Boolean copyOnFirstWrite, Boolean disposeIlb)

at Microsoft.SharePoint.SPFileStreamManager.SetInputLockBytes(SPFileInfo& fileInfo, SqlSession session, PrefetchResult prefetchResult)

at Microsoft.SharePoint.CoordinatedStreamBuffer.SPCoordinatedStreamBufferFactory.CreateFromDocumentRowset(Guid databaseId, SqlSession session, SPFileStreamManager spfstm, Object[] metadataRow, SPRowset contentRowset, SPDocumentBindRequest& dbreq, SPDocumentBindResults& dbres)

at Microsoft.SharePoint.SPSqlClient.GetDocumentContentRow(Int32 rowOrd, Object ospFileStmMgr, SPDocumentBindRequest& dbreq, SPDocumentBindResults& dbres)

And this one followed:

Could not get DocumentContent row: 0x80131534.

Some searching brought up this discussion on TechNet that contained the solution. It worked like a charm. As it is currently past 3am, I have not made any attempt to discover why this happened or why this solution works.

Update 2018-08-18: J. Qvarnström on the MIM FB Group suggested that the problem could be caused by .NET Framework patches from June 2018. I checked, and there were indeed some .NET patches installed recently. In this particular case KB4099639 and KB4099635.

Update 2018-08-23: As suggested by M Kaufman on the MIM FB Group, I tried to remove SCOM APM. That allowed me to remove the LoaderOptimization registry setting. SCOM APM has been installed since the server was created years ago according to our SCOM team, so something else must have triggered the problem. The .NET Framework updates mentioned above are my primary suspects. That being said, SCOM APM is not supported on SharePoint servers, so it should be removed in any case.

Solution

As usual, do not perform these steps in production if you do not understand them. These steps should be performed on the MIM Portal servers.

Alternative 1: Disable SCOM APM

  • Acquire a copy of your current SCOM msi, usually called MOMAgent.msi, and place it on the server.
  • Run the following command from an administrative command prompt:
msiexec.exe /fvomus "MOMagent.msi" NOAPM=1
  • Restart the server

Alternative 2: Change the .Net framework LoaderOptimization

If alternative 1 did not help, change the LoaderOptimization to 1. Be aware that this is a sledgehammer approach, but it is highly likely to get your system back up and running. Further investigation into the root cause is recommended.

  • Locate HKLM\SOFTWARE\Microsoft\.NETFramework
  • Add a new DWORD value called LoaderOptimization.
  • Set the value to 1. See MSDN for documentation.
  • Perform an iisreset.

image

Upgrade to VMM 1801, a knights tale

This is a story in the “Knights of Hyper-V” series, an attempt at humor with actual technical content hidden in the details.

A proclamation had been issued several moons ago by the gremlins of the blue window, declaring that a new version of the Virtual Machine Manager had been released. This had mostly been ignored by our merry knights, they were all busy building new systems, putting out fires and slaying dragons. You know, the usual stuff. Thus they had no time to spare for doing such things as maintenance on systems that were chugging along nicely without issues. But when a second proclamation appeared about an even newer version, it was decided to spend some time trying to do an upgrade in the lab, down in the spare dungeon.

Alas, this was not to be an easy task. The lab servers were in dire need of some maintenance as well, and one of the hosts flat out refused to respond to commands. Closer inspection revealed a “No bootable device” error on the local console, the result of a botched patching run a long time ago. For some reason the main partition was no longer marked active, a relatively easy fix in diskpart. But on to the main quest. Rumors would have it that there was no in-place upgrade from SCVMM 2016 to SCVMM 1801. Those rumors were true indeed.

A knight was sent into the maze of documentation to look for answers. He came upon several dead ends and a lot of references to the hidden cat of 404, but he persisted and finally ended up at https://docs.microsoft.com/en-us/system-center/vmm/upgrade-vmm?view=sc-vmm-1801. Just as in the upgrade from SCVMM 2012 to SCVMM 2016, an uninstall/reinstall was required.

A cunning plan is devised

The SCVMM 1801 scroll of system requirements was reviewed to make sure that our systems were supported. The spare dungeon contains a single VM running both SCVMM and SQL Server and some old hosts. The VMM VM has the following setup:

  • Windows Server 2016
  • SQL Server 2012 SP4
  • SCVMM 2016 4.0.2314.0 (UR5)

After some pondering around the table reading the scroll of upgrade instructions mentioned above, the following plan was agreed upon:

  • Checkpoint/snapshot the VMM server.
  • Create a Copy-Only backup of the VMM database.
  • Reboot the VMM Server to make sure there are no pending reboots or other nasty stuff lurking in memory.
  • Uninstall VMM 2016.
  • Restart the server.
  • Install VMM 1801.
  • Upgrade VMM to 1807.
  • Remove the checkpoint.
  • Update the VMM Agent on the hosts.
  • Turn off Diagnostic and Usage data.

Note: If you are running other System Center products, make sure that you review the upgrade sequence. Especially noteworthy is the fact that Operations Manager should be upgraded before VMM.


CredSSP encryption oracle remediation

Problem

One of my minions contacted me about a strange error message connecting to a server. He was running scheduled maintenance, but he was unable to connect via RDP to one of his servers. The error message looked like this:

image

“An authentication error occurred The Function requested is not supported”

“This could be due to CredSSP encryption oracle remediation”

Analysis

Some Microsoft gremlin thought it was a good idea to block remote connections to Windows Server 2012 R2 servers missing the March 2018 CredSSP patch if your client is patched. You know, just to make it extra easy to patch the servers. They even try to blame Oracle for their mess.

According to KB4093492, this fine function was enabled on 2018-05-08. “By default, after this update is installed, patched clients cannot communicate with unpatched servers.” You can override this by creating a GPO and restarting all affected systems, but that would leave you permanently vulnerable to what is in fact a security issue. Moreover, as a reboot is needed for the workaround, it is easier to just patch the servers (which was our initial plan).

Solution

Install the patches from this list on your servers: https://portal.msrc.microsoft.com/en-us/security-guidance/advisory/CVE-2018-0886. If you are lucky they are just VMs and you have access to the VM console or some kind of KVM. If you are not lucky, a trip to the server room it is.

Configure VMQ and RSS on physical servers

Introduction

Samples below are collected from Windows Server 2016

The primary objective is to avoid weighing down Core 0 with networking traffic. This is the first core on the first NUMA node, and this core is responsible for a lot of kernel processing. If this core suffers from contention, a wild blue screen of death will appear. Thus, we want our network adapters to use other cores to process their traffic. We can achieve this in three ways, depending on what we use the adapter for:

  • Enable Receive Side Scaling (RSS) and configure it to use specific cores.
  • Enable Virtual Machine Queue (VMQ) and configure it to use specific cores.
  • Set the preferred NUMA node

For physical machines

On network adapters used for generic traffic, we should enable RSS and disable VMQ. On adapters that are part of a virtual switch, we should disable RSS and enable VMQ. The preferred NUMA node should be configured for all physical adapters.
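
The rule above can be written down as a small decision function, which makes it easy to review the plan for each adapter before touching anything. This is a sketch only: Get-OffloadPlan and the adapter names are invented for the example. The commented lines show how the plan might be applied with the Enable-/Disable-NetAdapterRss and Enable-/Disable-NetAdapterVmq cmdlets from the NetAdapter module. Which cores and which NUMA node to use still has to be decided per server.

```powershell
function Get-OffloadPlan {
    param(
        [Parameter(Mandatory)][string]$AdapterName,
        # $true when the adapter is part of a Hyper-V virtual switch
        [bool]$InVirtualSwitch
    )
    # Generic traffic: RSS on, VMQ off. Virtual switch member: RSS off, VMQ on.
    [pscustomobject]@{
        Adapter   = $AdapterName
        EnableRss = -not $InVirtualSwitch
        EnableVmq = $InVirtualSwitch
    }
}

# Applying a plan on a real server (elevated prompt, NetAdapter module):
# $plan = Get-OffloadPlan -AdapterName 'NIC1' -InVirtualSwitch $false
# if ($plan.EnableRss) { Enable-NetAdapterRss -Name $plan.Adapter } else { Disable-NetAdapterRss -Name $plan.Adapter }
# if ($plan.EnableVmq) { Enable-NetAdapterVmq -Name $plan.Adapter } else { Disable-NetAdapterVmq -Name $plan.Adapter }
```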

For virtual machines

If the machine has more than one CPU, enable vRSS.

Investigating the NUMA architecture

Sockets and NUMA nodes

Sysinternals coreinfo -s -n will show the relationship between logical processors, sockets and NUMA nodes. In the example below we have a system with two sockets and four NUMA nodes.

clip_image002

Closest NUMA node

Each PCIe adapter is physically connected to a specific NUMA node. If possible, RSS / VMQ should be mapped to cores on the same NUMA node that the NIC is connected to. Get-NetAdapterRss will show you which NUMA node is closest for each adapter. The one in the sample is connected to/closest to NUMA node 0, as the NUMA distance for cores in group 0 is 0. We can also see that the NUMA distance to node 1 for this particular port is lower than the distance to nodes 2 and 3. This is caused by the fact that nodes 0 and 1 are on the same physical CPU, whereas nodes 2 and 3 are on another physical CPU.

clip_image004

Fake Anderson Powerpole

I recently replaced a set of winch connectors on a Land Cruiser rescue-vehicle. The vehicle is about 10 years old, but I do not know the age of the connectors; they may have been replaced previously. They have been fitted to the vehicle since it was new, and were mounted underneath the bumper front and back. Local regulations make permanent winch mounts difficult on road legal vehicles, so the winch was stored in the trunk. The connector looks like and mates with the original Anderson Powerpole SB 175 connectors, but closer inspection revealed that the connector was un-keyed. The original connectors are keyed such that you can only mate connectors of the same color, with the exception of black connectors that are un-keyed. This one was red, so it should be keyed. The Anderson site contains detailed drawings of the original connector. If you are unfamiliar with the SB-series of connectors and want to learn more, I recommend taking a look at the catalog and mounting instructions.

The connectors show signs of heavy corrosion, and the green death is prevalent on the contacts. They were fitted with a plastic end cap, but it only offers limited protection against the environment. There is also a power shut-off switch for these connectors made to limit the corrosion caused by electrical current when the winch is not in use, but it has probably been left in the “ON” position for some months. Interestingly, the actual wire contact lugs appear to be original Powerpole contacts. It is a bit difficult to make out on the photo as the contact is upside down, but the “A” stamp was clearly visible underneath the green death. I guess the reason is that the original contacts are relatively cheap if you buy bulk, whereas the housings are expensive.

 

clip_image001

 

clip_image001

 

The housing on the left was mounted underneath the rear bumper, the one on the right was mounted on the winch and was mostly kept in the trunk. The housing broke apart when I tried to remove the cable. There is a spring in the part that is broken off that is supposed to hold the contact in place. This spring was completely corroded, thus proper removal of the contact was impossible. As you can see there is also a bolt and captive nut stuck to the connector housing. There are two apparent reasons for this: the bolt is corroded (even though it is actually stainless) and the steel panel has collision damage. Someone had backed into something, buckling the steel panel under the car where the captive nuts were mounted. I was unable to un-screw the captive nut, it just rotated in its hole, so I had to employ the “Clarkson-method”, using hammers and pry-bars.

 

clip_image001[5]

I was unable to verify if the captive nuts were original. The Land Cruiser has a lot of captive nuts installed at the factory, but these may have been installed with the winch kit.

Script to migrate VMs back to their preferred node

Situation

You have a set of VMs that are not running at their preferred node due to maintenance or some kind of outage triggering unscheduled migration. You have set one (and just one) preferred host for all your VMs. You have done this because you want to balance your VMs manually to guarantee a certain minimum of performance to each VM. By the way, automatic load balancing cannot do that; there will be a lag between a usage spike and load balancing if load balancing is required. But I digress. The point is, the VMs are not running where they should, and you have not enabled automatic failback because you are afraid of node flapping or other inconveniences that could create problems. Hopefully though, you have some kind of monitoring in place to tell you that the VMs are restless and you need to fix the problem and subsequently corral them into their designated hosts. Oh, and you are using Virtual Machine Manager. You could do this on the individual cluster level as well, but that would be another script for another day.

If you understand the scenario above and self-identify with at least parts of it, this script is for you. If not, this script could cause all kinds of trouble.

Script Notes

  • You can skip “Connect to VMM Server” and “Add VMM Cmdlets” if you are running this script from the SCVMM PowerShell window.
  • The MoveVMs variable can be set to $false to get a list. This could be a smart choice for your first run.
  • The script ignores VMs that are not clustered and VMs that do not have a preferred server set.
  • I do not know what will happen if you have more than one preferred server set.

 

The script

 

#######################################################################################################################
#   _____     __     ______     ______     __  __     ______     ______     _____     ______     ______     ______    #
#  /\  __-.  /\ \   /\___  \   /\___  \   /\ \_\ \   /\  == \   /\  __ \   /\  __-.  /\  ___\   /\  ___\   /\  == \   #
#  \ \ \/\ \ \ \ \  \/_/  /__  \/_/  /__  \ \____ \  \ \  __< \ \  __ \  \ \ \/\ \ \ \ \__ \  \ \  __\   \ \  __<   #
#   \ \____-  \ \_\   /\_____\   /\_____\  \/\_____\  \ \_____\  \ \_\ \_\  \ \____-  \ \_____\  \ \_____\  \ \_\ \_\ #
#    \/____/   \/_/   \/_____/   \/_____/   \/_____/   \/_____/   \/_/\/_/   \/____/   \/_____/   \/_____/   \/_/ /_/ #
#                                                                                                                     #
#                                                   http://lokna.no                                                   #
#---------------------------------------------------------------------------------------------------------------------#
#                                          -----=== Elevation required ===----                                        #
#---------------------------------------------------------------------------------------------------------------------#
# Purpose: List VMs that are not running at their preferred host, and migrate them to the correct host.               #
#                                                                                                                     #
#=====================================================================================================================#
# Notes:                                                                                                              #
# There is an option to disable VM migration. If migration is disabled, a list is returned of VMs that are running at #
# the wrong host.                                                                                                     #
#                                                                                                                     #
#######################################################################################################################



$CaptureTime = (Get-Date -Format "yyyy-MM-dd HH:mm:ss")
Write-Host "-----$CaptureTime-----`n"
# Load the VMM cmdlets into the PowerShell session
Import-Module -Name "virtualmachinemanager"

# Connect to the VMM server
Get-VMMServer -ComputerName VMM.Server.Name | Select-Object Name

# Options
$HostGroup = "All Hosts\SQLMGMT\*" # End this with a star. You can go down to an individual VM: All Hosts\Hostgroup\VM.
$MoveVMs = $true # If true, we will try to migrate VMs to their preferred host. Set to $false to only list them.
# List clustered VMs in the host group that have a preferred owner set
$VMs = Get-SCVirtualMachine | Where-Object { $_.IsHighlyAvailable -eq $true -and $_.ClusterPreferredOwner -ne $null -and $_.HostGroupPath -like $HostGroup }

# Process
Foreach ($VM in $VMs) 
{
    # Get the Preferred Owner and the Current Owner
    $Preferred = Get-SCVirtualMachine $VM.Name | Select-Object -ExpandProperty clusterpreferredowner
    $Current = $VM.HostName
    
    
    # List discrepancies
    If ($Preferred -ne $Current) 
    {
        Write-Host "VM $VM should be running at $Preferred but is running at $Current." -ForegroundColor Yellow
        If ($MoveVMs -eq $true)
        {
            $NewHost = Get-SCVMHost -ComputerName $Preferred.Name
            Write-Host "We are trying to move $VM from  $Current to $NewHost." -ForegroundColor Green
            Move-SCVirtualMachine -VM $VM -VMHost $NewHost|Select-Object ComputerNameString, HostName
        } 
    }
}
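
If you want to try the comparison logic without a VMM server at hand, it can be exercised on plain objects first. The sketch below is not part of the script above: Find-MisplacedVM and the property names Name, HostName and PreferredHost are assumptions made for this example, and the inventory is made-up sample data. It mirrors the list-only mode, where VMs without a preferred host are skipped and the rest are reported when the current host differs from the preferred one.

```powershell
function Find-MisplacedVM {
    param(
        # Objects with Name, HostName and PreferredHost properties (all strings)
        [object[]]$VM
    )
    foreach ($item in $VM) {
        # Skip VMs that have no preferred host set
        if ([string]::IsNullOrEmpty($item.PreferredHost)) { continue }
        # String comparison in PowerShell is case-insensitive by default
        if ($item.PreferredHost -ne $item.HostName) { $item }
    }
}

# Made-up sample inventory
$inventory = @(
    [pscustomobject]@{ Name = 'VM1'; HostName = 'HostA'; PreferredHost = 'HostA' }
    [pscustomobject]@{ Name = 'VM2'; HostName = 'HostB'; PreferredHost = 'HostA' }
    [pscustomobject]@{ Name = 'VM3'; HostName = 'HostB'; PreferredHost = '' }
)

# Wrap in @() so the result is an array even when only one VM is returned
$misplaced = @(Find-MisplacedVM -VM $inventory)
$misplaced | ForEach-Object { Write-Host "$($_.Name) should be on $($_.PreferredHost) but is on $($_.HostName)." }
```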


Hyper-V VM with VirtualFC fails to start

Problem

This is just a quick note to remember the solution and EventIDs.

The VM fails to start complaining about failed resources or resource not available in Failover Cluster manager. Analysis of the event log reveals messages related to VirtualFC:

  • EventID 32110 from Hyper-V-SynthFC: ‘VMName’: NPIV virtual port operation on virtual port (WWN) failed with an error: The world wide port name already exists on the fabric. (Virtual machine ID ID)
  • EventID 32265 from Hyper-V-SynthFC: ‘VMName’: Virtual port (WWN) creation failed with a NPIV error(Virtual machine ID ID).
  • EventID 32100 from Hyper-V-VMMS: ‘VMNAME’: NPIV virtual port operation on virtual port (WWN) failed with an unknown error. (Virtual machine ID ID)
  • EventID 1205 from Microsoft-Windows-FailoverClustering: The Cluster service failed to bring clustered role ‘SCVMM VM Name Resources’ completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

Analysis

The events point in the direction of Virtual Fibre Channel or Fibre Channel issues. After a while we realised that one of the nodes in the cluster did not release the WWN when a VM migrated away from it. Further analysis revealed that the FC driver versions were different.

SNAGHTML66058b10

SNAGHTML660875d4

Solution

  • Make sure all cluster nodes are running the exact same driver and firmware for the SAN and network adapters. This is crucial for failovers to function smoothly.
  • To “release” the stuck WWNs you have to reboot the offending node. To figure out which node is holding the WWN you have to consult the FC Switch logs. Or you could just do a rolling restart and restart all nodes until it starts working.
  • I have successfully worked around the problem by removing and re-adding the virtual FC adapters on the VM that is not working. I do not know why this resolved the problem.
  • Another workaround would be to change the WWN on the virtual FC adapters. You would of course have to make this change at the SAN side as well.

Is your LAPS working as it should?

Intro

So, you have implemented LAPS, and you are wondering whether or not it is working as it should? Or at least, you should wonder about that. You see, LAPS is a solution with quite a few “moving parts”, and all of them have to work for your local administrator passwords to be randomized and rotated automatically. You need a Group Policy Client Side Extension on each and every Server and Workstation (client), you need a GPO using said extension, and you need to extend the schema and set AD permissions. If any of these are not working properly somewhere, LAPS will not work properly. The most usual problems are:

  • The GPO CSE is not deployed to some clients.
  • The GPO is not linked in all OUs where you have clients.

 Detection

We can easily check if LAPS is working for a specific client by reading the contents of the AD attributes associated with LAPS. We need access to read these properties, so all automated and manual tests mentioned henceforth have to be run by an account with permissions to read the properties. The LAPS operations guide details how you should configure the permissions. That being said, you should of course also test the permissions to make sure that only privileged users are able to read said properties. The properties are called:

  • ms-Mcs-AdmPwd stores the password as clear text.
  • ms-Mcs-AdmPwdExpirationTime stores the point in time for the next password change. The GPO checks this value when it is applied and resets the password if the time has passed.

We can use both of these to test if LAPS has been applied to a specific computer object at least once. If you do a manual test by using the Attribute Editor in AD Users and Computers you will see both. I have written PowerShell commands to automate the process based on the value of the ms-Mcs-AdmPwdExpirationTime attribute.

List computers without LAPS

This lists all computer objects without a LAPS expiry set. Virtual cluster computer objects are excluded. The results are exported to the file C:\TEMP\NoLaps.csv.

get-adcomputer -Properties Name, operatingSystem, Description, ms-Mcs-AdmPwdExpirationTime `
-LDAPFilter "(&(!ms-Mcs-AdmPwdExpirationTime=*)(operatingSystem=Windows*)(!Description=ClusterAwareUpdate*)(!Description=Failover cluster virtual network name account))"|`
Select Name, operatingSystem, Description, ms-Mcs-AdmPwdExpirationTime| Sort-Object Name | export-csv C:\Temp\NoLaps.csv -Delimiter ";" -NoTypeInformation

List computers with expired LAPS

Lists all computer objects where LAPS has been applied at least once, where the expiration time has passed. These are usually computers that are not powered on, maybe removed but not properly deleted from the AD. The results are exported to the file C:\TEMP\ExpiredLaps.csv.

$now = Get-Date
get-adcomputer -Properties Name, operatingSystem, Description, ms-Mcs-AdmPwdExpirationTime `
-LDAPFilter "(&(ms-Mcs-AdmPwdExpirationTime=*)(operatingSystem=Windows*)(!Description=ClusterAwareUpdate*)(!Description=Failover cluster virtual network name account))"|`
Select Name, operatingSystem, Description, @{N='ExpiryTime'; E={[DateTime]::FromFileTime($_."ms-Mcs-AdmPwdExpirationTime")}}| `
Where-Object ExpiryTime -lt $now| Sort-Object ExpiryTime| export-csv C:\Temp\ExpiredLAPS.csv -Delimiter ";" -NoTypeInformation 

Get LAPS Expiration date for one or more computer(s)

This command lists the expiration time for one or more computers based on an LDAP filter. The sample filter (Name=Badger*) will list all computers whose name starts with Badger. Computers where the expiration time is not set are filtered out. For more information about the LDAP filter syntax, see this link: https://social.technet.microsoft.com/wiki/contents/articles/5392.active-directory-ldap-syntax-filters.aspx

$now = Get-Date
 get-adcomputer -Properties Name, operatingSystem, Description, ms-Mcs-AdmPwdExpirationTime `
-LDAPFilter "(&(ms-Mcs-AdmPwdExpirationTime=*)(Name=Badger*))"|`
Select Name, operatingSystem, Description, @{N='ExpiryTime'; E={[DateTime]::FromFileTime($_."ms-Mcs-AdmPwdExpirationTime")}}| Sort-Object Name

Get LAPS expiration date for one or more computers, including those with no expiry set

Similar to above, but includes computer objects where the expiration time is not set. Those return 01.01.1601 01.00.00 as ExpiryTime because of the conversion of 0 from FileTime to DateTime. To put it in another way, if the expiration time is reported as 01.01.1601 01.00.00 it has not been set.

get-adcomputer -Properties Name, operatingSystem, Description, ms-Mcs-AdmPwdExpirationTime `
-LDAPFilter "(Name=Badger*)"|`
Select Name, operatingSystem, Description, @{N='ExpiryTime'; E={[DateTime]::FromFileTime($_."ms-Mcs-AdmPwdExpirationTime")}}| Sort-Object Name
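
The odd 1601 date comes from the FileTime format, which counts from midnight on 1 January 1601 (UTC). A value of 0, or a missing attribute that PowerShell converts to 0, therefore lands on that date, and FromFileTime() then shifts it to local time, which is presumably why the sample shows 01.00.00. The sketch below turns this into a small classifier. Get-LapsStatus is a name invented for this example and is not part of LAPS or the ActiveDirectory module.

```powershell
function Get-LapsStatus {
    param(
        # Raw ms-Mcs-AdmPwdExpirationTime value; $null is converted to 0 by the [long] type
        [long]$ExpirationFileTime,
        [datetime]$Now = (Get-Date)
    )
    # 0 means the attribute has never been written, so LAPS has not been applied
    if ($ExpirationFileTime -eq 0) { return 'NotSet' }
    $expiry = [DateTime]::FromFileTime($ExpirationFileTime)
    if ($expiry -lt $Now) { 'Expired' } else { 'OK' }
}

# FileTime 0 expressed in UTC, independent of the local time zone
$epoch = [DateTime]::FromFileTimeUtc(0)
Write-Host "FileTime 0 is $($epoch.ToString('yyyy-MM-dd HH:mm:ss')) UTC"
```

Testing for 0 up front avoids having to recognize the 1601 date in the output, which looks different depending on the time zone and regional settings of the machine running the query.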