I have had several issues in the past year involving kernel memory leaks, so I decided to make a separate blog post about general kernel memory leak analysis. In this post I mostly use the amazing Sysinternals tools for troubleshooting. You also need Poolmon.exe, a small utility currently part of the Windows Driver Kit. Sadly, this 35k self contained .exe is not available as a separate download, you have to download and install the entire 500+MiB WDK somewhere to extract it. You only have to do this once though, as there is no need to install the WDK on every system you analyze. You can just copy the executable from the machine where you installed the WDK.
Problem
Something is causing the kernel paged or non paged pools to rise uncontrollably. Screenshot from Process Explorer’s System Information dialog:
In this sample, the non paged pool has grown to an unhealthy 2,2GB, and continues to grow. Even though the pool limit is 128GIB and the server has a whopping 256GIB of RAM, the kernel memory pools are usually way below the 1GiB mark. You should of course baseline this to make sure you actually have an issue, but generally speaking, every time I find a Kernel memory value above 1GiB I go hunting for the cause.
Note: To show the pool limits, you have to enable symbols in Process Explorer. Scott Hanselman has blogged about that here: http://www.hanselman.com/blog/SetUpYourSystemToUseMicrosoftsPublicSymbolServer.aspx
Analysis
Kernel leaks are usually caused by a driver. Kernel leaks in the OS itself are very rare, unless you are running some sort of beta version of Windows. To investigate further, you have to fire up poolmon.exe.
Poolmon has a lot of shortcuts. From KB177415:
P – Sorts tag list by Paged, Non-Paged, or mixed. Note that P cycles through each one.
B – Sorts tags by max byte usage.
M – Sorts tags by max byte allocation.
T – Sort tags alphabetically by tag name.
E – Display Paged, Non-paged total across bottom. Cycles through.
A – Sorts tags by allocation size.
F – Sorts tags by “frees”.
S – Sorts tags by the differences of allocs and frees.
E – Display Paged, Non-paged total across bottom. Cycles through.
Q – Quit.
The important ones are “P”, to view either paged or non-paged pool tags, and “B”, to list the ones using the most of it at the top. The same view as above, after pressing “P” and “B”:
The “Cont” tag relates to “Contiguous physical memory allocations for device drivers”, and is usually the largest tag on a normal system.
And this screenshot is from the server with the non-paged leak:
As you can see, the LkaL tag is using more than 1GiB on its own, accounting for half of the pool. we have identified the pool tag, now we have to look for the driver that owns it. To do that, I use one of two methods:
1: Do an internet search for the pool tag.
http://blogs.technet.com/b/yongrhee/archive/2009/06/24/pool-tag-list.aspx contains a large list of tags.
2: Use Sysinternals strings together with findstr.
Most kernel mode drivers are located in “%systemroot%\System32\drivers”. First you have to start an elevated command prompt. Make sure the Sysinternals suite is installed somewhere on the path, and enter the following commands:
- cd “%systemroot%\System32\drivers”
- strings * | findstr [tag]
Example:
Hopefully, you should now have the name of the offending driver. To gather more intel about it, use Sysinternals sigcheck:
In this case, the offending driver is part of Diskeeper.
Solution
You have to contact the manufacturer of the offending driver and check for an updated version. In the example, I got a version of Diskeeper without the offending driver, but a good place to start is the manufacturers website. And remember, if you already have the newest version, you can always try downgrading to a previous version. If you present your findings as detailed above, most vendors are very interested in identifying the cause of the leak and will work with you to resolve the issue.