PERCSAS2 Event ID 129

Problem

Event ID 129 from percsas2 shows up in the system event log several times a day, stating “Reset to device, \Device\RaidPort4, was issued.”

image

I suddenly noticed this event in the log on four of my servers (Dell M820 blades). This is usually a bad tiding, foreboding imminent disk failure or a system-wide badger infestation. As these servers are all quite new and still running fine, though, I suspected the problem might lie elsewhere. The other usual culprit is drivers or firmware. Amazing as it may sound, it actually does happen that vendor support engineers are correct in demanding you update everything and the kitchen sink.

Poor disk performance on Dell servers

Problem

I’ve been managing a lot of Dell servers lately, where the baseline showed very poor performance for local drives connected to PERC (PowerEdge Expandable RAID Controller) controllers. Poor enough to trigger negative marks on an MSSQL RAP. Typically, read and write latency would never get below 11 ms, even with next to no load on a freshly reinstalled server. Even the cheapest laptops with 4500 RPM SATA drives would outperform such stats, and these servers had 10K or 15K RPM SAS drives on a 6 Gbps bus. We have a combination of H200, H700 and H710 PERC controllers on these servers, and the issues didn’t seem to follow a pattern, with one exception: all H200-equipped servers experienced poor performance.
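To put numbers on a baseline like this, the perfmon counters "Avg. Disk sec/Read" and "Avg. Disk sec/Write" report latency in seconds per operation. The following is a minimal sketch, assuming the counter samples have already been exported into a plain list of floats. The function name, the sample values and the use of 11 ms as a threshold are illustrative assumptions taken from the observation above, not an official limit.

```python
from statistics import mean


def summarize_latency(samples_sec, threshold_ms=11.0):
    """Summarize perfmon 'Avg. Disk sec/...' samples given in seconds.

    threshold_ms defaults to the 11 ms figure observed in this post;
    it is an assumption for illustration, not a vendor limit.
    """
    ms = [s * 1000.0 for s in samples_sec]  # seconds -> milliseconds
    return {
        "min_ms": min(ms),
        "avg_ms": mean(ms),
        "max_ms": max(ms),
        # True when even the best sample stays at or above the threshold
        "never_below_threshold": min(ms) >= threshold_ms,
    }


# Hypothetical samples from an idle server (made-up values)
idle_read_samples = [0.0123, 0.0141, 0.0118, 0.0135]
print(summarize_latency(idle_read_samples))
```

If `never_below_threshold` comes back true on an idle server, the baseline matches the pattern described here.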

Analysis

A support ticket with Dell gave the usual response: update your firmware and drivers. We did, and one of the H700-equipped servers got worse. Further inquiries with Dell gave a recommendation to replace the H200 controllers with the more powerful H700. After having a look at the specs for the H200 I fully agree with their assessment, although I do wonder why on earth they sold them in the first place. The H200 doesn’t appear to be worth the price of the cardboard box it is delivered in. It has absolutely no cache whatsoever, and it also disables the built-in cache on the drives. Excerpt from the H200 User’s Guide:

image

This sounds like something one would use in a print server or small departmental file server on a very limited budget, not in a four-way database cluster node. And it explains why the connected drives are painfully slow: you are reduced to platter speed.
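To give a rough idea of what platter speed means, the average rotational latency of a spinning drive is the time it takes for half a revolution. The sketch below works that out for the 10K and 15K RPM drives mentioned earlier. The function name is my own, and the figure covers rotation only: seek time and controller overhead come on top and are not modelled here.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency in milliseconds: half a revolution."""
    seconds_per_revolution = 60.0 / rpm
    return seconds_per_revolution / 2.0 * 1000.0


# The two SAS drive speeds mentioned in this post
for rpm in (10000, 15000):
    print(f"{rpm} RPM: {avg_rotational_latency_ms(rpm):.2f} ms")
# prints:
# 10000 RPM: 3.00 ms
# 15000 RPM: 2.00 ms
```

Without any cache on the controller or the drive, every random I/O has to pay this rotational delay plus the seek, with nothing to absorb or reorder the requests.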

Note: The H200 is replaced by the H310 on newer servers. I have yet to test it, but from what the specs tell me it is just as bad as the H200.

Update: Test data from an H310-equipped test server doing nothing but displaying the perfmon curve:

image
