SMBv3.1.1 disconnects and fails to reconnect on Windows 10

Be warned: This will be a long one with a lot of text and few images. I never planned on doing a write-up on this issue, so I did not take a lot of pictures.

I have been troubleshooting this issue on and off for two years, and I was on the brink of giving up several times. I pride myself on finding solutions where others only find stress and hair loss, and I do so routinely, but sadly there are still nuts I cannot crack. I believed this issue to be such a nut. But I was wrong. The solution had been staring me straight in the eyes for quite some time, but we must not get ahead of ourselves. Let us start at the beginning.

Problem

SMB sessions are invalidated in such a way that it is impossible to reconnect. This happens only on Windows 10 clients; Windows 7 (and possibly Windows 8) clients running SMBv2.* can still reconnect as normal.

User story:

  • The user opens a file explorer window and navigates to a folder on a fileserver containing documents the user wants to read and/or edit.
  • This works without issue 100% of the time as long as the client computer has a network connection to the file server.
  • After a period of inactivity the SMB session is suspended. The user does not notice this; everything still appears to be OK.
  • Some time later, the user will either
    • Try to save a file
    • Try to open a new file using the same File Explorer window
  • Possible outcomes
    • Everything works as expected
    • It is impossible to save the file to the server; it has to be saved locally.
    • The File Explorer window is gone. The user has to re-open the window and navigate back to the folder in question.
  • Thus, the user gets annoyed and complains about the stupid Windows 10 upgrade, which is understandable.

Relevant Event IDs: 30807 from SMBClient and 1016 from SMBServer.
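To check whether a given client and server have logged these events, something like the following could be used. This is a minimal sketch, not a definitive procedure: the log names Microsoft-Windows-SmbClient/Connectivity and Microsoft-Windows-SMBServer/Operational are assumptions about where events 30807 and 1016 are written, so adjust them if your systems log elsewhere. Run the commands from an elevated PowerShell prompt.

```powershell
# On the Windows 10 client: list current SMB connections and the
# dialect that was negotiated for each of them.
Get-SmbConnection | Select-Object ServerName, ShareName, Dialect

# On the client: the 20 most recent occurrences of event 30807.
# The log name is an assumption; -ErrorAction suppresses the error
# Get-WinEvent raises when no matching events exist.
Get-WinEvent -FilterHashtable @{
    LogName = 'Microsoft-Windows-SmbClient/Connectivity'
    Id      = 30807
} -MaxEvents 20 -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Id, Message

# On the file server: the 20 most recent occurrences of event 1016.
# Again, the log name is an assumption.
Get-WinEvent -FilterHashtable @{
    LogName = 'Microsoft-Windows-SMBServer/Operational'
    Id      = 1016
} -MaxEvents 20 -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Id, Message
```

Comparing the timestamps from the client and the server makes it easier to see whether a failed reconnect on one side lines up with an event on the other.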

Unable to add shares to Windows 2012 File Cluster

Problem

When you try to add a share to a newly formed (and perhaps also an existing) Windows 2012 File Server Cluster, you get an error message stating that you are unable to do so due to a lack of WinRM communication between the cluster nodes. Additionally, you may spot event ID 49 from WinRM MI Operation in the Windows Remote Management operational event log with the following message:

“The WinRM protocol operation failed due to the following error: The WinRM client sent a request to an HTTP server and got a response saying the requested HTTP URL was not available. This is usually returned by a HTTP server that does not support the WS-Management protocol..”

Or the following text for Event 49:

“The WinRM protocol operation failed due to the following error: The connection to the specified remote host was refused. Verify that the WS-Management service is running on the remote host and configured to listen for requests on the correct port and HTTP URL..”

And event ID 142 from Windows Remote Management stating:

“WSMan operation Enumeration failed, error code 2150859027”

Other possible events:

EventID 0 from FileServices-Manager.Eventprovider

“ Exception: Caught exception Microsoft.Management.Infrastructure.CimException: The WinRM client received an HTTP status code of 502 from the remote WS-Management service.
   at Microsoft.Management.Infrastructure.Internal.Operations.CimSyncEnumeratorBase`1.MoveNext()
   at Microsoft.FileServer.Management.Plugin.Services.FSCimSession.PerformQuery(String cimNamespace, String queryString)
   at Microsoft.FileServer.Management.Plugin.Services.ClusterEnumerator.RetrieveClusterConnections(ComputerName serverName, ClusterMemberTypes memberTypeToQuery)”

HTTP status code 504 has also been observed in place of 502.
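To see whether a node has logged the WinRM events described above, a query along these lines could be run locally on each cluster node. It is a sketch under the assumption that the events are written to the standard Microsoft-Windows-WinRM/Operational log; the event count of 50 is an arbitrary choice.

```powershell
# Run in an elevated PowerShell prompt on each cluster node.
# Lists the most recent WinRM events 49 and 142 in chronological order.
Get-WinEvent -FilterHashtable @{
    LogName = 'Microsoft-Windows-WinRM/Operational'
    Id      = 49, 142
} -MaxEvents 50 -ErrorAction SilentlyContinue |
    Sort-Object TimeCreated |
    Format-Table TimeCreated, Id, Message -Wrap
```

If only one node returns events, that is a hint that the fault is isolated to that node.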

Analysis

The problem is clearly related to Windows Remote Management. What was even more peculiar in this case was that when I failed over to another node, the error message disappeared. Thus I knew that the error was isolated to the one node. But even though I spent hours comparing settings on the nodes, all I was able to establish was that they were exactly alike. Then I remembered something from my Exchange admin days: in earlier versions of Windows, WinRM could be removed from the system and reinstalled. I remember this because Exchange 2010 relied heavily on WinRM and remote PowerShell, both of which could be a major pain to get working properly. In Windows Server 2012, remote management is heavily integrated in Server Manager, and I was unable to find a way to remove it. I did, however, find a way to turn it off and on again.

Update 2016.11.24:

I found another version of this problem where solution one did not work. It was still a WinRM problem, but this time it was proxy-related. You may need an explicit proxy exception for the local domain.

Solution one

Disable and enable WinRM. There are of course multiple ways to achieve this. I used PowerShell, but there is an option in the GUI, and the command works in CMD.EXE as well. Beware, you have to use an elevated PowerShell prompt. Come to think of it, most things that are worth doing seem to require an elevated shell.

Configure-SMRemoting -disable
Configure-SMRemoting -enable

That is it. No need to reboot or anything, just run the two commands and wait for them to finish. If you get a message that remoting is enforced by Group Policy, look for this GPO:

[image: the Group Policy setting in question]

It has to be set to Not configured to allow you to disable and enable WinRM. If it is enforced by a domain policy, you have to block said policy temporarily while you fix this.

Disabling and re-enabling should also make sure that the necessary firewall rules are enabled. If you have a proxy server defined, make sure you have exceptions added for your local servers, as a proxy can also block WinRM, albeit with other error messages.
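After re-enabling, the state of remote management can be checked with a few read-only commands. This is only a sketch of one way to verify the result; none of these commands change any settings.

```powershell
# Run in an elevated PowerShell prompt on the affected node.

# Report whether Server Manager remoting is currently enabled.
Configure-SMRemoting -get

# Confirm that the WinRM service is running.
Get-Service -Name WinRM

# List the configured WinRM listeners (transport, port, addresses).
winrm enumerate winrm/config/listener
```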

Solution two

Make sure you have an exception in your proxy definition for the local domain. For system proxy setups:

netsh winhttp set proxy [proxyserveraddress]:[proxy port] bypass-list="*.ADDomain.local;<local>"

For other proxy configs, ask your proxy admin.
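Before and after changing the system proxy, it can be useful to check what is actually configured. A minimal sketch:

```powershell
# Show the current WinHTTP (system) proxy server and bypass list.
# Works in both PowerShell and CMD.EXE.
netsh winhttp show proxy
```

If a proxy server is listed without a bypass entry that covers the cluster nodes, WinRM traffic between the nodes may be going through the proxy.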

Testing WinRM with PowerShell

You can use the Invoke-Command cmdlet to test PowerShell remote connections:

Invoke-Command -ComputerName Lab-DC -ScriptBlock { Get-ChildItem c:\ } -credential lab\sauser

This command will output a directory listing of c:\ on the computer Lab-DC. The command will be executed with the lab\sauser account. PowerShell prompts for the account password on execution. Sample output:

[image: screenshot of the sample output, 07-05-2014 11-57-04]
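A lighter test that does not run anything on the remote side is Test-WSMan, which only checks that the WS-Management service answers. The sketch below reuses the example names Lab-DC and lab\sauser from above; substitute your own server and account.

```powershell
# Unauthenticated check: does the WinRM service on Lab-DC respond at all?
Test-WSMan -ComputerName Lab-DC

# Authenticated check: also verifies that the credentials are accepted.
# PowerShell prompts for the password of lab\sauser.
Test-WSMan -ComputerName Lab-DC -Credential lab\sauser -Authentication Default
```

If the first command fails, the problem is likely at the connection level (service, listener, firewall or proxy), and there is little point in debugging credentials yet.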

Unable to access local drive(s)

Problem

On a Windows 2008 or 2008 R2 server, administrators are unable to browse the contents of local drives while logged on to the server, either directly at the console or via Remote Desktop. Access to the same drive through a network share works fine. UAC is turned on, and the local Administrators group has full control of the drive(s) in question. You get an “Access denied” error in Windows Explorer even when running in an elevated process (administrator mode).

The problem also affects Windows Vista and 7.

Analysis

If you try to access the drive using a program other than Windows Explorer, you can access the drive as long as the program is running in an elevated session. The problem seems to affect Windows Explorer alone, but I am not sure about that. What I have been able to establish though, is that it only affects users who are members of the local “Administrators” group. If a user has explicit access or access through another group, everything works as expected.

I detected the problem while migrating files and permissions from an old 2003 server to a new one running 2008 R2, and I think it is related to the local “Users” group not being granted access to the drive. Not denied, just removed from the root ACL on the drive.

Solutions

  • Add explicit access to the drive for the administrative users that need access
  • Turn off UAC (not recommended)
  • Create a new group called Local_Admin_Access or something like that, add the local administrators group as a member, and give the new group full control of the drive.
  • Give the local group “Interactive” full control of the drive. This grants access to any user who has local logon permission and is currently logged on to the server.
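As an illustration of the last option, the ACL can be inspected and changed with icacls. This is a sketch under assumptions: D: is a placeholder drive letter, and the Interactive identity is referenced by its well-known SID S-1-5-4 so that the command does not depend on the display language of the server. Run it from an elevated prompt.

```powershell
# Show the current ACL on the root of the drive (D: is a placeholder).
icacls D:\

# Grant Interactive (well-known SID S-1-5-4) full control (F) on the root,
# inherited by subfolders (CI) and files (OI).
icacls D:\ /grant "*S-1-5-4:(OI)(CI)F"
```

Checking the ACL first also shows whether the “Users” group is missing from the root of the drive, which I suspect is what triggers the problem.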