Kernel memory leak analysis

I have had several issues in the past year involving kernel memory leaks, so I decided to make a separate blog post about general kernel memory leak analysis. In this post I mostly use the amazing Sysinternals tools for troubleshooting. You also need Poolmon.exe, a small utility currently part of the Windows Driver Kit. Sadly, this 35k self contained .exe is not available as a separate download, you have to download and install the entire 500+MiB WDK somewhere to extract it. You only have to do this once though, as there is no need to install the WDK on every system you analyze. You can just copy the executable from the machine where you installed the WDK.

Problem

Something is causing the kernel paged or non paged pools to rise uncontrollably. Screenshot from Process Explorer’s System Information dialog:

image

In this sample, the non paged pool has grown to an unhealthy 2,2GB, and continues to grow. Even though the pool limit is 128GIB and the server has a whopping  256GIB of RAM, the kernel memory pools are usually way below the 1GiB mark. You should of course baseline this to make sure you actually have an issue, but generally speaking, every time I find a Kernel memory value above 1GiB I go hunting for the cause.

Note: To show the pool limits, you have to enable symbols in Process Explorer. Scott Hanselman has blogged about that here: http://www.hanselman.com/blog/SetUpYourSystemToUseMicrosoftsPublicSymbolServer.aspx

Analysis

Kernel leaks are usually caused by a driver. Kernel leaks in the OS itself are very rare, unless you are running some sort of beta version of Windows. To investigate further, you have to fire up poolmon.exe.

image

Poolmon has a lot of shortcuts. From KB177415:

P – Sorts tag list by Paged, Non-Paged, or mixed. Note that P cycles through each one.
B – Sorts tags by max byte usage.
M – Sorts tags by max byte allocation.
T – Sort tags alphabetically by tag name.
E – Display Paged, Non-paged total across bottom. Cycles through.
A – Sorts tags by allocation size.
F – Sorts tags by “frees”.
S – Sorts tags by the differences of allocs and frees.
E – Display Paged, Non-paged total across bottom. Cycles through.
Q – Quit.

The important ones are “P”, to view either paged or non-paged pool tags, and “B”, to list the ones using the most of it at the top. The same view as above, after pressing “P” and “B”:

image

The “Cont” tag relates to “Contiguous physical memory allocations for device drivers”, and is usually the largest tag on a normal system.

And this screenshot is from the server with the non-paged leak:

image

As you can see, the LkaL tag is using more than 1GiB on its own, accounting for half of the pool. we have identified the pool tag, now we have to look for the driver that owns it. To do that, I use one of two methods:

1: Do an internet search for the pool tag.

http://blogs.technet.com/b/yongrhee/archive/2009/06/24/pool-tag-list.aspx contains a large list of tags.

2: Use Sysinternals strings together with findstr.

Most kernel mode drivers are located in “%systemroot%\System32\drivers”. First you have to start an elevated command prompt.  Make sure the Sysinternals suite is installed somewhere on the path, and enter the following commands:

  • cd “%systemroot%\System32\drivers”
  • strings * | findstr [tag]

Example:

image

Hopefully, you should now have the name of the offending driver. To gather more intel about it, use Sysinternals sigcheck:

image

In this case, the offending driver is part of Diskeeper.

Solution

You have to contact the manufacturer of the offending driver and check for an updated version. In the example, I got a version of Diskeeper without the offending driver, but a good place to start is the manufacturers website. And remember, if you already have the newest version, you can always try downgrading to a previous version. If you present your findings as detailed above, most vendors are very interested in identifying the cause of the leak and will work with you to resolve the issue.

Author: DizzyBadger

SQL Server DBA, Cluster expert, Principal Analyst

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.