Run disk maintenance on a failover cluster mountpoint

Problem

“Validate this cluster” or another tool tells you that the dirty bit is set for a cluster shared volume, and taking the disk offline and online again (to trigger autochk) does not help. The error message from “Validate this cluster” looks like this:

[Screenshot: the dirty bit error reported by “Validate this cluster”]

Cause

Such issues are usually caused by intermittent SAN connection errors, as modern SANs have very resilient drives that rarely cause actual hard faults on the LUNs. In most cases the faults should be correctable soft errors. If you don’t know the cause of the error, check with your SAN admin BEFORE you run chkdsk. A hard fault on the SAN is no walk in the park and usually requires that you restore from backup onto separate spindles or another SAN.

Resolution

Perform a manual disk check. If you are using drive letters, follow the procedure in http://technet.microsoft.com/en-us/library/cc772587.aspx to enable maintenance mode:

Start by taking the services that use the drives offline, e.g. SQL Server:

[Screenshot: taking the clustered service offline]

Then, put the drives in maintenance mode:

[Screenshot: turning on maintenance mode for the drives]

Then run chkdsk [Driveletter:] /f as usual. With mountpoints, the easiest way is to use the graphical version of chkdsk:

[Screenshot: the graphical disk check for the mountpoint]

Or you can run chkdsk against the mountpoint:

chkdsk [mountpoint path] /f
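
If you would rather script the check, here is a minimal Python sketch. It is not part of the original procedure, just one way to wrap the same command. It builds the chkdsk command line and translates the exit code using the values Microsoft documents for chkdsk. The function names and the mountpoint path in the usage comment are made up for illustration. Run it from an elevated prompt, with the disk already in maintenance mode.

```python
import subprocess

# Exit codes as documented by Microsoft for chkdsk.
CHKDSK_EXIT_CODES = {
    0: "no errors were found",
    1: "errors were found and fixed",
    2: "disk cleanup was performed, or skipped because /f was not specified",
    3: "could not check the disk, or errors were not fixed",
}


def build_chkdsk_command(target, fix=True):
    """Return the chkdsk command for a drive letter or a mountpoint path."""
    command = ["chkdsk", target]
    if fix:
        command.append("/f")
    return command


def describe_exit_code(code):
    """Translate a chkdsk exit code into a short description."""
    return CHKDSK_EXIT_CODES.get(code, f"unexpected exit code {code}")


def run_chkdsk(target, fix=True):
    """Run chkdsk (Windows only) and return (exit code, description)."""
    result = subprocess.run(build_chkdsk_command(target, fix))
    return result.returncode, describe_exit_code(result.returncode)


# Usage, with a hypothetical mountpoint path:
#   code, meaning = run_chkdsk(r"M:\SQLData\Logs")
#   print(code, meaning)
```

Anything other than 0 or 1 is worth a closer look before you bring the services back online.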

To check whether the disk is dirty, use:

fsutil dirty query [Driveletter:]
fsutil dirty query [mountpoint path]
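
If you have many volumes or mountpoints to check, the dirty query can be wrapped in a few lines of Python. This is only a sketch under assumptions: it expects the English output wording, "is Dirty" or "is NOT Dirty", so a localized Windows would need a different match. The function names and the path in the usage comment are hypothetical.

```python
import subprocess


def parse_dirty_output(output):
    """Return True if fsutil reports the volume as dirty, False if not.

    Assumes the English wording "... is Dirty" / "... is NOT Dirty".
    """
    text = output.lower()
    if "not dirty" in text:
        return False
    if "dirty" in text:
        return True
    raise ValueError(f"unrecognized fsutil output: {output!r}")


def is_dirty(target):
    """Run 'fsutil dirty query' (Windows only, elevated) for a drive or mountpoint."""
    result = subprocess.run(
        ["fsutil", "dirty", "query", target],
        capture_output=True,
        text=True,
        check=True,  # raise if fsutil itself fails, e.g. access denied
    )
    return parse_dirty_output(result.stdout)


# Usage, with a hypothetical drive letter and mountpoint path:
#   for target in ["M:", r"M:\SQLData\Logs"]:
#       print(target, "dirty" if is_dirty(target) else "clean")
```

The parsing is kept separate from the fsutil call so it can be tried against sample output without touching a real volume.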

When you are finished, turn off maintenance mode and restart the services. Hopefully this will resolve your issues. If not, check with your SAN administrator. If no errors are reported on the SAN side, the volume can safely be formatted and the data restored from the backup you most certainly have available :-)

Author: DizzyBadger

SQL Server DBA, Cluster expert, Principal Analyst
