Script to migrate VMs back to their preferred node

Situation

You have a set of VMs that are not running at their preferred node due to maintenance or some kind of outage triggering an unscheduled migration. You have set one (and just one) preferred host for all your VMs. You have done this because you want to balance your VMs manually to guarantee a certain minimum of performance to each VM. By the way, automatic load balancing cannot do that: there will always be a lag between a usage spike and the load balancer reacting to it. But I digress.

The point is, the VMs are not running where they should, and you have not enabled automatic failback because you are afraid of node flapping or other inconveniences that could create problems. Hopefully though, you have some kind of monitoring in place to tell you that the VMs are restless, so that you can fix the problem and subsequently corral them into their designated hosts. Oh, and you are using Virtual Machine Manager. You could do this on the individual cluster level as well, but that would be another script for another day.

If you understand the scenario above and self-identify with at least parts of it, this script is for you. If not, this script could cause all kinds of trouble.

Script Notes

  • You can skip “Connect to VMM Server” and “Add VMM Cmdlets” if you are running this script from the SCVMM PowerShell window.
  • The MoveVMs variable can be set to $false to get a list. This could be a smart choice for your first run.
  • The script ignores VMs that are not clustered and VMs that do not have a preferred server set.
  • I do not know what will happen if you have more than one preferred server set.

The script

#######################################################################################################################
#   _____     __     ______     ______     __  __     ______     ______     _____     ______     ______     ______    #
#  /\  __-.  /\ \   /\___  \   /\___  \   /\ \_\ \   /\  == \   /\  __ \   /\  __-.  /\  ___\   /\  ___\   /\  == \   #
#  \ \ \/\ \ \ \ \  \/_/  /__  \/_/  /__  \ \____ \  \ \  __<   \ \  __ \  \ \ \/\ \ \ \ \__ \  \ \  __\   \ \  __<   #
#   \ \____-  \ \_\   /\_____\   /\_____\  \/\_____\  \ \_____\  \ \_\ \_\  \ \____-  \ \_____\  \ \_____\  \ \_\ \_\ #
#    \/____/   \/_/   \/_____/   \/_____/   \/_____/   \/_____/   \/_/\/_/   \/____/   \/_____/   \/_____/   \/_/ /_/ #
#                                                                                                                     #
#                                                   http://lokna.no                                                   #
#---------------------------------------------------------------------------------------------------------------------#
#                                          -----=== Elevation required ===----                                        #
#---------------------------------------------------------------------------------------------------------------------#
# Purpose: List VMs that are not running at their preferred host, and migrate them to the correct host.               #
#                                                                                                                     #
#=====================================================================================================================#
# Notes:                                                                                                              #
# There is an option to disable VM migration. If migration is disabled, a list is returned of VMs that are running at #
# the wrong host.                                                                                                     #
#                                                                                                                     #
#######################################################################################################################
 
 
 
$CaptureTime = (Get-Date -Format "yyyy-MM-dd HH:mm:ss")
Write-Host "-----$CaptureTime-----`n"
# Add the VMM cmdlets to the powershell
Import-Module -Name "virtualmachinemanager"
 
# Connect to the VMM server
Get-VMMServer -ComputerName VMM.Server.Name | Select-Object Name
 
#Options
$HostGroup = "All Hosts\SQLMGMT\*" #End this with a star. You can go down to an individual VM. All Hosts\Hostgroup\VM.
$MoveVMs = $true #If you set this to true, we will try to migrate VMS to their preferred host.
#List VMS in the host group
$VMs = Get-SCVirtualMachine | where { $_.IsHighlyAvailable -eq $true -and $_.ClusterPreferredOwner -ne $null -and $_.HostGroupPath -like $HostGroup }
 
# Process
Foreach ($VM in $VMs) 
{
    # Get the Preferred Owner and the Current Owner
    $Preferred = Get-SCVirtualMachine $VM.Name | Select-Object -ExpandProperty clusterpreferredowner
    $Current = $VM.HostName
 
 
    # List discrepancies
    If ($Preferred -ne $Current) 
    {
        Write-Host "VM $VM should be running at $Preferred but is running at $Current." -ForegroundColor Yellow
        If ($MoveVMs -eq $true)
        {
            $NewHost = Get-SCVMHost -ComputerName $Preferred.Name
            Write-Host "We are trying to move $VM from  $Current to $NewHost." -ForegroundColor Green
            Move-SCVirtualMachine -VM $VM -VMHost $NewHost|Select-Object ComputerNameString, HostName
        } 
    }
}
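
The comparison at the core of the script can be tried out without a VMM server. The following is a minimal sketch and not part of the original script: the sample objects and the property name PreferredOwner are made up for illustration, and it says nothing about how VMM itself reports several preferred owners. It treats a VM as misplaced when its current host is not among its preferred owners, which is one possible way to cope with more than one preferred server.

```powershell
# Hypothetical sample data standing in for VMM objects; not real VMM output.
$SampleVMs = @(
    [pscustomobject]@{ Name = 'SQL01'; HostName = 'NodeA'; PreferredOwner = @('NodeA') }
    [pscustomobject]@{ Name = 'SQL02'; HostName = 'NodeB'; PreferredOwner = @('NodeA') }
    [pscustomobject]@{ Name = 'SQL03'; HostName = 'NodeC'; PreferredOwner = @('NodeB', 'NodeC') }
)

# A VM is misplaced when its current host is not among its preferred owners.
# The @() wrapper keeps the result an array even when only one VM matches.
$Misplaced = @($SampleVMs | Where-Object { $_.PreferredOwner -notcontains $_.HostName })

foreach ($VM in $Misplaced) {
    Write-Output "$($VM.Name) runs at $($VM.HostName), preferred: $($VM.PreferredOwner -join ', ')"
}
```

With this sample data only SQL02 is reported, since SQL03 is running on one of its two preferred owners.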

Hyper-V VM with VirtualFC fails to start

Problem

This is just a quick note to remember the solution and EventIDs.

The VM fails to start, complaining about failed resources or resources not being available in Failover Cluster Manager. Analysis of the event log reveals messages related to VirtualFC:

  • EventID 32110 from Hyper-V-SynthFC: ‘VMName’: NPIV virtual port operation on virtual port (WWN) failed with an error: The world wide port name already exists on the fabric. (Virtual machine ID ID)
  • EventID 32265 from Hyper-V-SynthFC: ‘VMName’: Virtual port (WWN) creation failed with a NPIV error(Virtual machine ID ID).
  • EventID 32100 from Hyper-V-VMMS: ‘VMNAME’: NPIV virtual port operation on virtual port (WWN) failed with an unknown error. (Virtual machine ID ID)
  • EventID 1205 from Microsoft-Windows-FailoverClustering: The Cluster service failed to bring clustered role ‘SCVMM VM Name Resources’ completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

Analysis

The events point in the direction of Virtual Fibre Channel or Fibre Channel issues. After a while we realised that one of the nodes in the cluster did not release the WWN when a VM migrated away from it. Further analysis revealed that the FC driver versions were different.

Solution

  • Make sure all cluster nodes are running the exact same driver and firmware for the SAN and network adapters. This is crucial for failovers to function smoothly.
  • To “release” the stuck WWNs you have to reboot the offending node. To figure out which node is holding the WWN you have to consult the FC Switch logs. Or you could just do a rolling restart and restart all nodes until it starts working.
  • I have successfully worked around the problem by removing and re-adding the virtual FC adapters on the VM that is not working. I do not know why this resolved the problem.
  • Another workaround would be to change the WWN on the virtual FC adapters. You would of course have to make this change at the SAN side as well.

Is your LAPS working as it should?

Intro

So, you have implemented LAPS, and you are wondering whether or not it is working as it should? Or at least, you should wonder about that. You see, LAPS is a solution with quite a few “moving parts”, and all of them have to work for your local administrator passwords to be randomized and rotated automatically. You need a Group Policy Client Side Extension on each and every server and workstation (client), you need a GPO using said extension, and you need to extend the schema and set AD permissions. If any of these are not working properly somewhere, LAPS will not work properly. The most common problems are:

  • The GPO CSE is not deployed to some clients.
  • The GPO is not linked in all OUs where you have clients.

Detection

We can easily check whether LAPS is working for a specific client by reading the contents of the AD attributes associated with LAPS. We need access to read these properties, so all automated and manual tests mentioned henceforth have to be run by an account with permission to read them. The LAPS operations guide details how you should configure the permissions. That being said, you should of course also test the permissions to make sure that only privileged users are able to read said properties. The properties are called:

  • ms-Mcs-AdmPwd stores the password as clear text.
  • ms-Mcs-AdmPwdExpirationTime stores the point in time for the next password change. The GPO checks this value when it is applied and resets the password if the time has passed.

We can use both of these to test if LAPS has been applied to a specific computer object at least once. If you do a manual test by using the Attribute Editor in AD Users and Computers you will see both. I have written PowerShell commands to automate the process based on the value of the ms-Mcs-AdmPwdExpirationTime attribute.

List computers without LAPS

This lists all computer objects without a LAPS expiry set. Virtual cluster computer objects are excluded. The results are exported to the file C:\TEMP\NoLaps.csv.

# Windows computers without a LAPS expiry time, excluding cluster virtual computer objects
Get-ADComputer -Properties Name, operatingSystem, Description, ms-Mcs-AdmPwdExpirationTime `
    -LDAPFilter "(&(!(ms-Mcs-AdmPwdExpirationTime=*))(operatingSystem=Windows*)(!(Description=ClusterAwareUpdate*))(!(Description=Failover cluster virtual network name account)))" |
    Select-Object Name, operatingSystem, Description, ms-Mcs-AdmPwdExpirationTime |
    Sort-Object Name |
    Export-Csv C:\Temp\NoLaps.csv -Delimiter ";" -NoTypeInformation

List computers with expired LAPS

Lists all computer objects where LAPS has been applied at least once, where the expiration time has passed. These are usually computers that are not powered on, maybe removed but not properly deleted from the AD. The results are exported to the file C:\TEMP\ExpiredLaps.csv.

# Windows computers whose LAPS expiry time has passed, excluding cluster virtual computer objects
$now = Get-Date
Get-ADComputer -Properties Name, operatingSystem, Description, ms-Mcs-AdmPwdExpirationTime `
    -LDAPFilter "(&(ms-Mcs-AdmPwdExpirationTime=*)(operatingSystem=Windows*)(!(Description=ClusterAwareUpdate*))(!(Description=Failover cluster virtual network name account)))" |
    Select-Object Name, operatingSystem, Description, @{N='ExpiryTime'; E={[DateTime]::FromFileTime($_."ms-Mcs-AdmPwdExpirationTime")}} |
    Where-Object ExpiryTime -lt $now |
    Sort-Object ExpiryTime |
    Export-Csv C:\Temp\ExpiredLAPS.csv -Delimiter ";" -NoTypeInformation

Get LAPS Expiration date for one or more computer(s)

This command lists the expiration time for one or more computers based on an LDAP filter. The sample filter (Name=Badger*) will list all computers whose name starts with Badger. Computers where the expiration time is not set are filtered out. For more information about the LDAP filter syntax, see this link: https://social.technet.microsoft.com/wiki/contents/articles/5392.active-directory-ldap-syntax-filters.aspx

Get-ADComputer -Properties Name, operatingSystem, Description, ms-Mcs-AdmPwdExpirationTime `
    -LDAPFilter "(&(ms-Mcs-AdmPwdExpirationTime=*)(Name=Badger*))" |
    Select-Object Name, operatingSystem, Description, @{N='ExpiryTime'; E={[DateTime]::FromFileTime($_."ms-Mcs-AdmPwdExpirationTime")}} |
    Sort-Object Name

Get LAPS expiration date for one or more computers, including those with no expiry set

Similar to above, but includes computer objects where the expiration time is not set. Those return 01.01.1601 01.00.00 as ExpiryTime because an empty value is converted as FileTime 0, and the hour is shifted by the local time zone (UTC+1 in my case). To put it another way, if the expiration time is reported as 01.01.1601 it has not been set.

Get-ADComputer -Properties Name, operatingSystem, Description, ms-Mcs-AdmPwdExpirationTime `
    -LDAPFilter "(Name=Badger*)" |
    Select-Object Name, operatingSystem, Description, @{N='ExpiryTime'; E={[DateTime]::FromFileTime($_."ms-Mcs-AdmPwdExpirationTime")}} |
    Sort-Object Name
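
The 1601 date is simply the FILETIME epoch showing through. A small sketch of the conversion follows. Convert-LapsExpiry is a hypothetical helper name of my own, not something that ships with LAPS, and it uses UTC so that the result does not depend on the local time zone.

```powershell
# FILETIME counts 100-nanosecond intervals since 1601-01-01 00:00:00 UTC,
# so a value of 0 converts to the epoch itself.
$Epoch = [DateTime]::FromFileTimeUtc(0)
Write-Output $Epoch.Year

# Hypothetical helper: return $null instead of the 1601 placeholder
# when the attribute is empty or 0.
function Convert-LapsExpiry {
    param($FileTime)
    if ($null -eq $FileTime -or [long]$FileTime -eq 0) { return $null }
    return [DateTime]::FromFileTimeUtc([long]$FileTime)
}

# 131610528000000000 is 2018-01-22 00:00:00 UTC expressed as FILETIME.
$Sample = Convert-LapsExpiry 131610528000000000
$Unset  = Convert-LapsExpiry $null
```

The same helper could be used in the calculated ExpiryTime property if you prefer an empty column over a date in 1601.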

Securing Windows Active Directory

This is a list of measures you can implement to increase your Windows AD Security. The list is in no way exhaustive, and some of the items overlap. Be aware that security recommendations change over time. This article was originally created 2018.01.22. If that is several years in the past when you read this, I cannot promise that all recommendations are up to date.

LAPS – Local administrator password management

Implementing LAPS ensures that all your domain-joined computers have a unique password that is changed periodically for the local administrator account. It operates as a GPO Client Side Extension, and thus requires you to install and register a DLL on each target computer. You can do this via GPO, in your VM image, or through any other software deployment solution you may use.

On the management computers and/or the DC itself, you have to add management tools and GPO Editor templates. There is a graphical user interface and a PowerShell module. The PowerShell module also includes the commands necessary to extend the AD Schema for storing the passwords and their associated expiry date.

See https://technet.microsoft.com/en-us/mt227395.aspx for details.

Securing the built-in Administrator account

 

The built-in Administrator account in the domain should be secured. Its ObjectSID always ends in -500, so it is easy to identify even if the name has been changed. The guidance used to be “Disable the Administrator account”, but this has changed because some recovery scenarios require an active Administrator account. Specifically, the Administrator account is the only account able to log on when no global catalogs are online.

See https://docs.microsoft.com/en-us/windows-server/identity/ad-ds/plan/security-best-practices/appendix-d--securing-built-in-administrator-accounts-in-active-directory for details and an implementation guide. Some highlights are shown below.

Set the DOMAIN\Administrator account as sensitive and require smart card

 

Create a GPO to prevent Domain Admins from logging on to member servers or workstations

I have gone a bit further than the guide here, adding Domain Admins and Guests for good measure. The “Local account and member of Administrators group” is related to denying local administrator accounts access to the computer from the network. More about this below.

Make sure that this GPO does not apply to domain controllers, that is, do not link it at the domain level.

Block remote access for local accounts

Add Guests, Local account and member of Administrators group, Domain Admins, Enterprise Admins and Schema Admins to the policy Computer Configuration\Windows Settings\Security Settings\Local Policies\User Rights Assignment\Deny access to this computer from the network.

For details, see https://blogs.technet.microsoft.com/secguide/2014/09/02/blocking-remote-use-of-local-accounts/

Disable weak ciphers for Windows Secure Channel

You can build a GPO to limit the cipher suites used by the Windows Secure Channel API, and by extension IIS. Be aware that this does not in any way limit other usage of weak ciphers. For instance, a Tomcat server running on the same computer may very well use RC4 even if you have removed it from the list of Windows Secure Channel ciphers.

The GPO is located at Computer Configuration\Administrative Templates\Network\SSL Configuration Settings\Cipher Suites.

When you enable this setting, you get a list of all the default ciphers as a long comma-separated string. Which ciphers you get depends on the Windows version. The easiest way to edit this list is to copy the string into a text editor. You can change the order to change the priority, and remove weak ciphers.

Do not allow local users to run remote elevated sessions

Do not apply this fix: https://support.microsoft.com/en-us/help/951016/description-of-user-account-control-and-remote-restrictions-in-windows

That is, do not create the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System\LocalAccountTokenFilterPolicy value, and if it exists, make sure it is set to 0. We could of course create a GPO to enforce this setting.

For details, see https://www.harmj0y.net/blog/redteaming/pass-the-hash-is-dead-long-live-localaccounttokenfilterpolicy/

Set a password policy and a lockout policy

 

  • Password length: 8 characters. Encourage users to create passwords with a random length between 8 and 20 characters. You want your users to have passwords that vary in length. If you set this limit to 14, chances are all passwords are exactly 14 characters long. This makes it a lot easier to crack them.
  • Complexity not required. If you require complexity, users tend to add numbers and capitals at the start and end of the password.
  • Password history: 10.
  • Maximum password age: 0, that is, the password never expires. Too frequent password changes may lead to bad password diversity and predictable passwords. Leaked passwords are almost always exploited immediately, so there is no point in forcing a monthly password change. If you must, set the maximum age to one year. Urge users to choose new passwords that are completely different from the previous passwords. That is, do not use MypassWord1, MypasswOrd2 and so on.
  • Do not enable the reversible encryption option. Ever. Just don’t.
  • Lockout policy: Locked for 24 hours after five unsuccessful attempts.

For background information, see:

Enforce SMB Signing and disable SMB1

 

Enforce signing

You can enforce signing on both the server and client side. The server side is shown below. Be aware that some services require this setting to be disabled. If you have such services, create an overriding GPO for those servers only, leaving SMB signing on in the rest of the domain.

 

See https://technet.microsoft.com/en-us/library/cc731957(v=ws.11).aspx

Disable SMB1

You have to create some registry-GPO settings. Details are at the link below. Be aware that legacy clients like Windows XP will be dependent on SMBv1 on Domain Controllers to access the Sysvol share. The recommendation is still to disable SMBv1 everywhere.

 

See https://blogs.technet.microsoft.com/staysafe/2017/05/17/disable-smb-v1-in-managed-environments-with-ad-group-policy/ for details.

Create new computer objects in a separate OU, not in the Computers container

 

That way you can delegate permissions to manage them, and you can apply GPOs to newly added computers. You do this with the redircmp console application.

  • Log on to a domain controller.
  • Start an administrative CMD shell.
  • Execute redircmp [distinguished name of OU]

You can verify or check this setting using PowerShell:

Get-ADDomain | Select-Object ComputersContainer

Limit the number of domain admins

Domain admin accounts should only be used for domain administration tasks, and you should not have many of them. Do not use service accounts with domain admin access.

A recommended number of domain admins is 5.

Avoid explicit permissions, prefer group permissions

 

All permissions in AD should preferably be given to groups, not individual users. This makes it a lot easier to manage permissions, and it is also easier to see what permissions a user has based on which groups they are a member of. That is, if you follow this principle. There will always be exceptions, but they should be few and far between.

Limit the number of people with delegated access to AD

 

AD administration tasks can be delegated. For instance, your service desk could be able to reset passwords and create users without full domain admin access. It is important to limit these delegations and keep tabs on them.

Use dedicated domain controllers

 

  • Make sure that your domain has at least two domain controllers.
  • If they are virtual, they should not be on the same cluster. Preferably you should have at least one dedicated physical domain controller.
  • Do not install anything on your domain controllers, with the exception of backup agents, antivirus software, monitoring agents and software deployment agents.
  • Do not enable the Hypervisor role on your physical domain controllers to run other software in a VM.
  • Make sure you have a system state backup of your domain controllers.

No trusts between domains

 

Avoid using forests and trusts between domains. Trusted domains should be handled as a single security context (e.g. dev, test, production, management etc), and thus you only really need one domain for each security context.

Enforce the Windows firewall

 

Make sure that Windows Firewall is turned on. There are many ways to do this, e.g. SCCM or GPO.

Install antivirus software on all servers and workstations

 

And make sure that it is activated and up to date. SCCM enables you to monitor and manage the default Windows Defender antivirus. Most commercial AntiVirus software comes with some kind of centralized management and monitoring tool.

Log out RDP sessions after 24 hours

 

Remote desktop server sessions that are still active (idle or disconnected) after 24 hours should be logged out automatically. Really active sessions are left for 5 days.

The GPO settings are located at:

Computer Configuration\Administrative Templates\Windows Components\Remote Desktop Services\Remote Desktop Session Host\Session Time Limits

  • Set time limit for disconnected sessions: 1 day.
  • Set time limit for active but idle RDS sessions: 1 day.
  • Set time limit for active RDS sessions: 5 days.

Group Managed Service accounts and Managed Service Accounts

Enable the domain for group managed service accounts, and encourage their use on supported services.

https://docs.microsoft.com/en-us/windows-server/security/group-managed-service-accounts/group-managed-service-accounts-overview

https://blogs.msdn.microsoft.com/markweberblog/2016/05/25/group-managed-service-accounts-gmsa-and-sql-server-2016/

Removing a drive from a cluster group moves all cluster group resources to Available Storage

Problem

During routine maintenance on a SQL Server cluster we were planning to remove one of the clustered drives. We had previously replaced the SAN, and this disk was backed by an old storage unit that we wanted to decommission. So we made sure that there were no dependencies, right-clicked the drive in Failover Cluster Manager under the SQL Server role and pressed “Remove from SQL Server”. Promptly the drive vanished from view, together with all other cluster resources associated with the role…

After a slightly panicky check to make sure that the SQL Server instance was still running (it was), we started to wonder about what was happening. Running Get-ClusterResource in PowerShell revealed that all our missing resources had been moved to the “Available Storage” resource group.

We did a failover to verify that the instance was still working, and it gladly failed over with the Available Storage group. There are a total of four SQL Server instances on the sample cluster.

Solution

The usual warning: Performing this procedure may result in an outage. If you do not understand the commands, read up on them before you try.

Move the resources back to the SQL Server resource group. If you move the SQL Server resource, that is the resource with the ResourceType SQL Server, all other dependent resources should follow. If your dependency settings are not configured correctly, you may have to move some of the resources independently.

Command: Get-ClusterResource "SQL Server (instance)" | Move-ClusterResource -Group "SQL Server (instance)"

Just replace instance with the name of your SQL Server instance.

Then, run Get-ClusterResource | Sort-Object OwnerGroup, ResourceType to verify that all your resources are associated with the correct resource group. As a minimum, each SQL Server group should have an IP address, a network name, SQL Server, SQL Server Agent and one or more physical disks.

Microsoft Update with PSWindowsUpdate 2.0

Preface

This is an update to my previous post about PSWindowsUpdate located here: https://lokna.no/?p=2132. The content is pretty much the same, but updated for PSWindowsUpdate 2.0.

Most of my Windows servers are patched by WSUS, SCCM or a similar automated patch management solution at regular intervals. But not all. Some servers are just too important to be autopatched. This is a combination of SLA requirements making downtime difficult to schedule and the sheer impact of a botched patch run on backend servers. Thus, a more hands-on approach is needed. In W2012R2 and earlier this was easily achieved by running the manual Windows Update application. I ran through the process in QA, let it simmer for a while and went on to repeat the process in production if no nefarious effects were found during testing. Some systems even have three or more staging levels. It is a very manual process, but it works, and as we are required to hand-hold the servers during the update anyway, it does not really cost anything. Then along came Windows Server 2016. Or Windows 10 I should really say, as the update module in W2016 is carbon copied from W10 without changes. It is even trying to convince me to install the W10 Creators Update on my servers…

In Windows Server 2016 the lazy bastards at Microsoft just could not be bothered to implement the functionality from W2012R2 WU. It is no longer possible to defer specific updates I do not want, such as the stupid Silverlight mess. If I want Microsoft update, then I have to take it all. And if I should become slightly insane and suddenly decide I want driver updates from WU, the only way to do that is to go through device manager and check every single device for updates. Or install WUMT, a shady custom WU client of unknown origin.

I could of course use WSUS or SCCM to push just the updates I want, but then I have to magically imagine what updates each server wants and add them to an ever growing number of target groups. Every time I have a patch run. Now that is expensive. If I had enough of the “special needs” servers to justify the manpower-cost, I would have done so long ago. Thus, another solution was needed…

PSWindowsUpdate to the rescue. PSWindowsUpdate is a PowerShell module written by a user called MichalGajda enabling management of Windows Update through PowerShell. You can find it here: https://www.powershellgallery.com/packages/PSWindowsUpdate/2.0.0.0. In this post I go through how to install the module and use it to run Microsoft Update in a way that resembles the functionality from W2012R2. You could tell the module to install a certain list of updates, but I found it easier to hide the unwanted updates. It also ensures that they are not added by mistake with the next round of patches.

Getting started

(See the following chapters for details.)

  • You should of course start by installing the module. This should be a one-time deal, unless a new version has been released since last time you used it. New versions of the module should of course be tested in QA like any other software.
  • Then, make sure that Microsoft Update is active.
  • Check for updates to get a list of available patches.
  • Hide any unwanted patches.
  • Install the updates.
  • Re-check for updates to make sure there are no “round-two” patches to install.

About the UserAccountControl attribute

Intro

When working with MIM you will sooner or later have to deal directly with the UserAccountControl Active Directory attribute. This attribute defines account options, and we use it most prevalently to enable and disable users, but there are a lot of other options as well. These options are stored in a binary value as bit flags, where each bit defines a specific function.

Bit number 1 (or 2 if you are not used to zero-based numbering) defines whether or not an account is enabled. Bit number 9 defines an account as a normal account. Thus, a normal disabled account will have bits 1 and 9 set to one. As long as no other bits are set, the decimal value is 2^9 + 2^1 = 514 or (0010 0000 0010). If we enable the account, the value is 2^9 = 512 (0010 0000 0000).

In MIM we are usually presented with decimal values. These are easier to read, but not necessarily easier to understand.
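
To make the arithmetic concrete, here is a short PowerShell sketch using the two flags discussed above. The variable names are my own; the flag values (2 for disabled, 512 for a normal account) follow directly from the bit positions.

```powershell
# Flag values for the two bits discussed above.
$AccountDisable = 0x0002   # bit 1: account is disabled
$NormalAccount  = 0x0200   # bit 9: normal account

# A normal, disabled account.
$Uac = 514

# Test individual bits with a bitwise AND.
$IsDisabled = ($Uac -band $AccountDisable) -ne 0
$IsNormal   = ($Uac -band $NormalAccount) -ne 0

# Enable the account by clearing bit 1 and leaving all other bits untouched.
$EnabledUac = $Uac -band (-bnot $AccountDisable)

# Show the value as 12 binary digits for comparison with the text.
$Binary = [Convert]::ToString($Uac, 2).PadLeft(12, '0')
Write-Output "UAC $Uac = $Binary, disabled: $IsDisabled, normal: $IsNormal, enabled value: $EnabledUac"
```

Clearing a single bit like this is safer than writing a fixed 512, because any other flags that happen to be set on the account are preserved.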

MIM: The Portal cannot connect to the middle tier using the web service interface

Problem

After installing the MIM Service and Portal successfully, you are greeted by a portal that never loads and eventually displays a generic 503 error or a “Service not available” notice.

Analysis

This is a list of things I checked while trying to smoke out the badger causing this issue:

  • IIS bindings, even though I tested this prior to running the installer.
  • The enormous setup log (verbose logging).
  • IISRESET.
  • SharePoint alternate access mappings, also checked and found to be working prior to the installation.
  • Service status: both the FIM service and the SharePoint services were running.
  • Restarted the server (have you tried turning it off and on again?).
  • FIM event log: empty.

And then I finally had the bright idea to check the application event log. It looked like the remnants of a great battle, only red and yellow messages in sight:

I dug in and found this one particularly interesting, Event 10 from Microsoft.ResourceManagement.PortalHealthSource:

The Portal cannot connect to the middle tier using the web service interface.  This failure prevents all portal scenarios from functioning correctly.


The cause may be due to a missing or invalid server url, a downed server, or an invalid server firewall configuration.


Ensure the portal configuration is present and points to the resource management service.

I suddenly remembered that the load balancer was not yet configured and went to check the DNS records for the MIM urls. As I suspected, they were pointing to the load balancer, but the load balancer did not know where to redirect the traffic and thus did nothing.

Solution

For once, a simple solution without much of a risk factor:

  • Change the DNS record for the load balanced addresses, in this case the MIM Service server address, to point directly to one of the portal servers.
  • Perform an IISRESET on the portal servers.

I could of course fix the load balancer as well, but that requires a minion with access, and as the local time is 00:18 on a Saturday I will just add it to the list of things to fix later.

Administrator locked out of the MIM Portal after initial MA sync

Problem

After the first MIM Portal / Service management agent sync run, the initial portal administrator account (the one used during portal installation) is locked out of the portal. The error messages “Unable to process your request” and “The requestor of this operation is invalid” are displayed when you try to log in:

Analysis

For some reason, the user mapping is removed from the FIMService database. The query SELECT * FROM [FIMService].[fim].[UserSecurityIdentifiers] returns 0 rows. At this point, one row should be returned, linking the default admin UserObjectKey (2340 at time of writing) with the SID for the account used to install the MIM Service.

I found the solution with help from this post: http://dloder.blogspot.no/2011/12/administrator-locked-out-of-fim-portal.html.

In short, use Extended Events or SQL Profiler to find the ObjectID and corresponding ObjectKey.

SELECT *  FROM [FIMService].[fim].[Objects]
  WHERE ObjectKey = '2340'
 
ObjectKey	ObjectTypeKey	ObjectID
2340	24	7FB2B853-24F0-4498-9534-4E10589723C4

The ObjectKey and ObjectID for the first administrator account seem to be hard-coded into the FIMService database. This conclusion is based on the fact that a fresh MIM 2016 install gave me the same values as those listed in a post from 2011.

What remains is to re-establish the link between the FIMService Object and the AD SID (user).

Update 2017.09.20: Further analysis strongly indicates that the root cause of the problem is lack of a filter in the MIM/FIM Service MA during the initial sync run. There should be a filter in the MA preventing synchronization of the primary administrator account (the account used during installation) and the Built-in Synchronization account.

Solution

The usual warning: This solution details commands that should be understood before they are executed in a production environment. If the solution looks like gibberish, seek help before you continue. You may need a DBA to interpret the commands. And remember backups.

Get your SID in hexadecimal form. You can get it from AD Users and Computers in advanced mode. Open your user, look at the attribute list and find the ObjectSID attribute. View it in hexadecimal form. It should look something like 01 05 00 00 00 and so on. Remove the spaces using your favorite text editor, and add a 0x prefix to indicate that this is a hex value. Your result should look like this:

0x010500000000000512345234504560734063457AFCDEBB69EE0000
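
The manual text editing can also be done in PowerShell. This is a minimal sketch, assuming you have copied the hexadecimal view of ObjectSID as a space-separated string. ConvertTo-SqlHexLiteral is a hypothetical helper name of my own, and the sample input is the same obfuscated value as above, not a real SID.

```powershell
# Hypothetical helper: turn "01 05 00 ..." into a 0x-prefixed hex literal for SQL.
function ConvertTo-SqlHexLiteral {
    param([Parameter(Mandatory)] [string] $HexView)

    # Strip all whitespace and normalize to upper case.
    $Digits = ($HexView -replace '\s', '').ToUpperInvariant()

    # Refuse anything that is not a whole number of hex byte pairs.
    if ($Digits -notmatch '^([0-9A-F]{2})+$') {
        throw "Not a sequence of hex byte pairs: $HexView"
    }
    return "0x$Digits"
}

# Sample input: the obfuscated example value from this post, as shown in the attribute editor.
$HexView = '01 05 00 00 00 00 00 05 12 34 52 34 50 45 60 73 40 63 45 7A FC DE BB 69 EE 00 00'
$SqlLiteral = ConvertTo-SqlHexLiteral $HexView
Write-Output $SqlLiteral
```

The validation step is there because a stray character in the literal would make the INSERT fail, or worse, map the wrong SID.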

Execute the following SQL command against your FIMService database:

INSERT INTO [FIMService].[fim].UserSecurityIdentifiers 
VALUES (2340, 0x010500000000000512345234504560734063457AFCDEBB69EE0000)

Then, perform an IISRESET. You should now be able to log in to the portal again.

MIM LAB7: Testing Run profiles and populating data

This post is part of a series. The chapter index is located here.

In this post we will create run profiles and initialize the MAs.
