I pulled some old source code off the shelf recently, dusted it off and started modifying it. Okay, that was my first mistake… I really should have tested it first but that actually would not have helped this time.
When the system booted up, it successfully loaded drivers, and then when it came to loading Explorer.exe all heck broke loose. Sometimes the desktop would appear and other times it would not. When the desktop did appear, GWES would data abort. The system was a mess, and needed to be fixed.
We knew the problem wasn’t really in GWES, so we didn’t even try to find the source of the data abort. This was a good decision because it wouldn’t have told us anything even if we did track down where the problem was. 
Instead, we went through my list of known problems that could stop the system from working. If I ignored the data abort, it looked like the scheduler had stopped scheduling threads to run, which could explain why the desktop was sometimes not displayed (that happened more often than it being displayed). Maybe it was the old Intel bug that caused the timer interrupt to stop for 20 minutes. Looking at the code showed me that this project was old enough that the bug had not been fixed, so we fixed it and had the timer interrupt output a character every 100 milliseconds. To my surprise, the interrupt stopped. Not only that, but all interrupts stopped.
Being a hopeful kind of engineer, I decided that maybe I could write an application that would continue to run long enough to read registers and see why the interrupts had stopped. I went back to my desk and wrote the following:
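int WINAPI WinMain(     HINSTANCE hInstance,
                        HINSTANCE hPrevInstance,
                        LPTSTR    lpCmdLine,
                        int       nCmdShow)
{
                // Run at a fairly high priority so other threads are less likely to preempt us
                CeSetThreadPriority( GetCurrentThread(), 50 );
                do
                {
                                // Never blocks; just keeps printing to the debug output
                                RETAILMSG( 1, (TEXT("L\n")));
                } while( TRUE );
}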
It isn’t the most remarkable code that I have ever written, but before I put too much into it, I wanted to make sure that it ran long enough to be useful. This code lives a little on the edge; it sets the thread priority fairly high and never blocks to let other threads run, so it doesn’t play nice with the system. My hope was that once it started running, the scheduler would let it keep running even after interrupts stopped.
The system that I am working with includes an application that is started using the HKEY_LOCAL_MACHINE\Init key. That application then starts other applications it finds on a CompactFlash card. So I grabbed a CompactFlash card that I had lying around to copy the new test application to. But first, before testing, I thought that it would be good to look at the files on the card to see if there was anything that should be deleted, like a bootloader or OS image that might not be for this system. That is when it hit me like a ton of bricks what the problem was, because it was something that I had seen, and fixed, before on all of my current projects (the ones that aren’t so dusty).
The Problem
When I looked at the CompactFlash card, I saw that it had a special little exe on it. It is special in that if the application that runs other applications sees this file on the CF card, it copies the file to the flash drive so that it can run again even when the CF card is removed. The start of the problem is how it copies the file, which I won’t go into in great detail, except to say that it creates a zero byte file on the flash drive even if the file isn’t on the CF card. So the file always exists on the flash drive, but may not really be an executable file.
The code then tries to run the file from the flash drive. The code appeared to be good on the surface. Here is some pseudo code that shows what it was doing:
                Check for the application on the flash drive
                If the application exists on the flash drive
                                Call CreateProcess to start the application
                                Close the handles returned from CreateProcess to avoid leaking handles
Now the problem comes from closing the handles returned from CreateProcess without first checking to see if the call to CreateProcess was successful. As it turns out, if you call CreateProcess on a zero byte exe file, CreateProcess will fail. I know, that is a shocker. To exacerbate this problem, the PROCESS_INFORMATION structure being passed into CreateProcess was declared on the stack, but not initialized in any way, shape or form. So the values contained in the handles were unknown and certainly not set to INVALID_HANDLE_VALUE, which means that when CloseHandle was called the values could appear to actually mean something, and CloseHandle could end up closing a handle that something else was still using.
So while I titled this post to suggest that it is about calling CloseHandle with invalid handles, this application was a Perfect Storm just waiting to bite me in the …  It really had three problems: it created a zero byte exe, it didn’t initialize the PROCESS_INFORMATION structure, and it didn’t check the result from CreateProcess before closing the handles. Any one of those problems, if fixed, would have prevented the failure.
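For illustration, here is a minimal sketch of how two of those three fixes might look in C: zero the PROCESS_INFORMATION structure, and close the handles only when CreateProcess reports success. This is not the actual code from the project. LaunchApp is a hypothetical helper name, and the #else branch (including g_closeCalls) is a made-up stand-in that exists only so the sketch compiles away from a Windows build environment; it always fails the way a zero byte exe would.

```c
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Hypothetical stand-ins so the sketch builds off-device; not the real API. */
typedef void *HANDLE;
typedef int BOOL;
typedef const char *LPCTSTR;
typedef struct { HANDLE hProcess; HANDLE hThread; } PROCESS_INFORMATION;
static int g_closeCalls = 0;   /* counts CloseHandle calls in the stand-in */
static BOOL CloseHandle(HANDLE h) { (void)h; g_closeCalls++; return 1; }
/* Stand-in always fails, like a zero byte exe would, and leaves *pi untouched. */
static BOOL CreateProcess(LPCTSTR app, void *cmd, void *pa, void *ta,
                          BOOL inherit, unsigned long flags, void *env,
                          void *dir, void *si, PROCESS_INFORMATION *pi)
{
    (void)app; (void)cmd; (void)pa; (void)ta; (void)inherit;
    (void)flags; (void)env; (void)dir; (void)si; (void)pi;
    return 0;
}
#endif

/* LaunchApp is a hypothetical helper name. Returns nonzero on success. */
static int LaunchApp(LPCTSTR path)
{
    PROCESS_INFORMATION pi;

    /* Start from known values instead of whatever was on the stack. */
    memset(&pi, 0, sizeof(pi));

    /* Windows CE style call: the unsupported parameters are NULL or 0.
       (Desktop Windows expects a STARTUPINFO pointer instead of NULL.) */
    if (!CreateProcess(path, NULL, NULL, NULL, 0, 0, NULL, NULL, NULL, &pi))
    {
        /* Creation failed, so there are no handles to close. */
        return 0;
    }

    /* Only handles that CreateProcess actually returned get closed. */
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 1;
}
```

With this shape, a failed CreateProcess never reaches CloseHandle, so stale stack values can’t be mistaken for real handles. The third fix, not creating the zero byte file in the first place, belongs in the copy code and isn’t shown here.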
So in the end, by fixing this code we were able to boot the system successfully.
Copyright © 2008 – Bruce Eitman
All Rights Reserved