PERCSAS2 Event ID 129

Problem

Event ID 129 from percsas2 shows up in the system event log several times a day, stating “Reset to device, \Device\RaidPort4, was issued.”

image

I suddenly noticed this event in the log on four of my servers (Dell M820 blades). This is usually a bad tiding, foreboding imminent disk failure or a system-wide badger infestation. As these servers are all quite new and still running fine, though, I suspected the problem was located elsewhere. The other usual culprit is drivers or firmware. Amazing as it may sound, it actually does happen that vendor support engineers are correct in demanding you update everything and the kitchen sink.

 

Analysis

Thus, I peered into the depths of the event log and various other logs, searching for signs of driver changes. And lo and behold, in the update history I found this:

image

A driver update from Windows Update! These may be quite mundane on an end user system, but on servers such updates are outright dangerous. Further analysis revealed that my troubles began just after the driver update was applied:

image
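If you want to make the same before/after comparison without eyeballing the log, it can be scripted once the timestamps are exported. Here is a minimal sketch in Python, assuming you have already extracted the Event ID 129 timestamps and the driver install time yourself. The function name and the sample timestamps are made up for illustration and are not taken from my servers:

```python
from datetime import datetime

def split_events_by_update(event_times, update_time):
    """Count events logged before and after a driver update.

    event_times: iterable of datetime objects, one per Event ID 129 entry
    update_time: datetime of the driver installation
    Returns a tuple (count_before, count_at_or_after).
    """
    before = 0
    after = 0
    for logged_at in event_times:
        if logged_at < update_time:
            before += 1
        else:
            after += 1
    return before, after

# Illustrative data only: hypothetical timestamps, not real log entries.
update = datetime(2013, 7, 1, 3, 0)
events = [
    datetime(2013, 7, 1, 9, 15),
    datetime(2013, 7, 1, 14, 40),
    datetime(2013, 7, 2, 8, 5),
]
print(split_events_by_update(events, update))  # prints (0, 3)
```

A result with zero events before the update and all of them after it is a strong hint, though not proof, that the update is involved.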

The driver version on the troubled servers is 6.801.5.0:

image

but the latest official release from Dell is 5.2.220.64:

image

I tried downgrading to the previous driver version (5.2.220), but that didn’t solve the problem. I then checked the firmware. My servers had version 21.2.0-0007 installed, and the latest version from Dell is 21.2.1-0000. The release notes for the new version listed this fix:

–  Fixes an issue where IO can be stalled (up to 30s in some cases)

To me, this sounds like a possible root cause. Maybe the issue was dormant until the new driver version lured it out of hiding?

Solution

AFAIK it is impossible to uninstall the driver that Windows Update installed. Thus, you have to install the latest official driver and firmware from Dell to drive out the badgers. At the time of writing, the latest (official) versions were:

  • Driver 5.2.220.64 (2012.08.09)
  • Firmware 21.2.1-0000  (2013.06.12)
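With several servers to check, comparing installed versions against these targets by hand gets tedious. The following is a rough sketch in Python, not a Dell tool: the helper names are hypothetical, and it assumes that comparing the numeric components left to right reflects the intended ordering, which holds for the version strings quoted in this post.

```python
import re

def version_key(version):
    """Turn '5.2.220.64' or '21.2.1-0000' into a tuple of ints for comparison."""
    return tuple(int(part) for part in re.split(r"[.-]", version))

def required_action(installed, target):
    """Return 'upgrade', 'downgrade' or 'ok' to get from installed to target."""
    have = version_key(installed)
    want = version_key(target)
    if have < want:
        return "upgrade"
    if have > want:
        return "downgrade"
    return "ok"

# Versions taken from this post: installed version first, Dell's release second.
print(required_action("6.801.5.0", "5.2.220.64"))     # driver: downgrade
print(required_action("21.2.0-0007", "21.2.1-0000"))  # firmware: upgrade
```

Note that a "downgrade" result is exactly the warning sign here: something other than the vendor package has put a higher-numbered driver on the box.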

So far, I haven’t logged any further error messages after upgrading the firmware and downgrading the driver.

Author: DizzyBadger

SQL Server DBA, Cluster expert, Principal Analyst

8 thoughts on “PERCSAS2 Event ID 129”

  1. I spoke too soon – I have this exact same problem… the driver was upgraded via WHQL for my PERC H700… I tried to roll back and install the driver from the Dell website; however, after the driver is installed, the machine BSODs on bootup at the next reboot, with no recovery methods working except to restore from backup… it seems I’m trapped on the WHQL driver version :( Dell ProSupport are no help

    1. What BSOD are you getting? And did you get the exact firmware version, with the correct build date? I have heard rumors of a new PERC firmware that reintroduces the bug found in 21.2.0-0007.

  2. I have a solution… I updated (3) T710 servers with H700 controllers via Microsoft Update on 7-12-12. Driver version 5.2.220.64, a version that was only supposed to work with H710 controllers (I have seen multiple posts saying that H710 controllers have the same issue the H700 did). I have tried numerous times to update the driver. I updated the firmware successfully to the latest version using the Q1 2014 SUU. When updating the driver: BSOD. FYI, Dell SUU actually showed 4.31.1.64 as the latest version, and it would be a downgrade to bring the driver to the latest supported driver version. Now the solution: I found that the problem was the percsas2.sys file. If you changed that file, manually or during the driver update utility, BSOD. The key was to find a driver for the PERC H700 controller that was not using a file named percsas2.sys. For my controller I found that LSI had a driver, Megasas.sys version 4.5.1.64, that would allow me to boot to Windows. Process: From Windows (if you can’t boot to Windows, load Windows recovery, open a command prompt, open notepad, choose open file, then use the explorer dialog to replace percsas2.sys {whatever version you are using} with percsas2.sys 5.2.220.64, the Windows Update version. That should allow you to boot to Windows). From Windows, manually update the H700 or H710 controller with the Megasas2.sys driver. Reboot. Delete percsas2.sys version 5.2.220.64. Manually update the controller to the current driver using the manual or utility method. Reboot. Problem solved. Sorry so long, hope I covered everything. This is the 1st forum I have ever posted a reply on, and only because this problem really p**sed me off and it took hours of after-hours troubleshooting for me to come up with a resolution. Good luck.

    1. Thanks for sharing your experience. Most modern PERC adapters are rebranded/modified LSI hardware and will work with LSI drivers.

  3. Thanks for the reply – I fixed ours by restoring from backup and manually selecting a Windows Update driver for the PERC H700 internal instead of the H700 PCIe, and now it’s stable… but it seems percsas2.sys is definitely the root cause of the problem, only I fixed it with a different method. Thanks! Dell was useless on this!

    1. Glad to hear it worked out in the end.
      In my experience, the quality of Dell-rebranded drivers and firmware has been on a rapid decline lately. I have spent quite a lot of time troubleshooting driver memory leaks and firmware errors on Dell-rebranded network mezzanine cards. I have reached a point where I no longer update drivers unless I have a specific problem, as each new version seems to introduce new errors…

  4. Thanks for the post, Shane Albee. I had the exact same problem after running Dell SUU on a system with an H700 adapter. I was able to recover the system after reading your post. In my case, restoring percsas2.sys from version 6.801.5 did the trick.
