Geeks With Blogs
Paul Kelly

This is a cautionary tale about the differences between unmanaged languages like C++ and managed languages like C# on .NET, and about the hidden gotchas when you make assumptions. We have some functionality that uses temporary files to pass information held in .NET streams and strings to another process. The temporary files have to get cleaned up eventually, although we aren't too fussy about when. The functionality:

  1. Writes a string to a temporary source file.
  2. Generates some directives in another temporary file.
  3. Starts a child process which processes the temporary files.
Initially the test failed because the child process said it couldn't find the directives file. The test only failed on our test machine, and when I put it under a debugger there, a different test in the same test suite failed instead. I started adding extra logging to the code to see what was happening. The test now failed with a missing source file instead of a missing directives file; the logging showed that the directives file was being generated, and with the expected name. It looked like a race condition between two threads, because the behaviour appeared to be timing-sensitive. So what other threads are always running in a .NET application? There's always the garbage collector, and its finalizer thread, regardless of what your application does.
 
We use the .NET TempFileCollection class (in System.CodeDom.Compiler) to manage our temporary files. You create an instance of this class, provide it with an extension and it returns the name of a temporary file you can use. The nice thing is that when the object is garbage collected, it removes all the temporary files for you. In .NET, an object can be garbage collected any time the garbage collector determines that there is no code with an outstanding reference to that object.
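As a minimal sketch of that lifecycle (not our production code), the snippet below asks a TempFileCollection for a name, writes to it, and then deletes it explicitly. The class name TempFileCollectionSketch and the method CreateThenDelete are made up for illustration, and it assumes System.CodeDom.Compiler is available (it is part of the .NET Framework; newer runtimes need the System.CodeDom package).

```csharp
using System;
using System.CodeDom.Compiler;
using System.IO;

public static class TempFileCollectionSketch
{
    // Hypothetical helper: reports whether the file existed before and after Delete().
    public static (bool before, bool after) CreateThenDelete()
    {
        TempFileCollection tempfiles = new TempFileCollection();

        // AddExtension hands back a file name; keepFile: false means the
        // collection is allowed to delete the file later.
        string name = tempfiles.AddExtension("xxx", false);
        File.WriteAllText(name, "some source text");
        bool before = File.Exists(name);

        // Explicit, deterministic clean-up instead of waiting for finalization.
        tempfiles.Delete();
        bool after = File.Exists(name);

        return (before, after);
    }

    public static void Main()
    {
        var result = CreateThenDelete();
        Console.WriteLine("Exists after write:  " + result.before);
        Console.WriteLine("Exists after Delete: " + result.after);
    }
}
```

The same clean-up also runs when the collection is disposed or finalized, which is what makes the timing of finalization matter in the rest of this story.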
 
The failing code looks like this:
 
public Results ProcessThis(Parameters options, string source)
{
   TempFileCollection tempfiles = new TempFileCollection();

   // Create a temporary source file
   string sourceFilename = tempfiles.AddExtension("xxx", false);
   CreateSourceFileFromStream(sourceFilename, source);

   Results r = ProcessHelper(options, sourceFilename);

   return r;
}
 
The source file was being deleted before ProcessHelper() had had a chance to finish its work. In the end, I guessed that the garbage collector must be collecting tempfiles (and hence deleting all the temporary files created) while ProcessHelper() was still running. The proof was that adding the call to tempfiles.Delete() after ProcessHelper() fixed the test.
 
public Results ProcessThis(Parameters options, string source)
{
   TempFileCollection tempfiles = new TempFileCollection();

   // Create a temporary source file
   string sourceFilename = tempfiles.AddExtension("xxx", false);
   CreateSourceFileFromStream(sourceFilename, source);

   Results r = ProcessHelper(options, sourceFilename);

   tempfiles.Delete(); // Deterministically delete the files rather than wait for GC

   return r;
}
 
When I wrote the original code (a good few years ago), I had assumed that the tempfiles reference would still be on the stack while we were in the ProcessThis() method, so the object wouldn't get collected until the reference went out of scope at the end of the method (the tempfiles object itself is actually on the heap, like all reference type objects). But in fact, the CLR or JIT stops treating the reference as live as soon as we stop using it, which here is straight after the call to AddExtension(). So the garbage collector thinks it's fine to remove the object from the heap. Except it isn't, because that also removes the file we were expecting to use. Adding the call to Delete() keeps the reference alive until ProcessHelper() has returned, and makes the deletion of our temporary files deterministic rather than leaving it up to the garbage collector. Alternatively, you could wrap the code that relies on the temp file inside a C# using block (TempFileCollection implements IDisposable):
 
public Results ProcessThis(Parameters options, string source)
{
   TempFileCollection tempfiles = new TempFileCollection();

   // Create a temporary source file
   using (tempfiles)
   {
      . . .
   }
}
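To see the lifetime issue without involving TempFileCollection at all, here is a rough sketch using a made-up stand-in class, TempFileOwner, whose finalizer deletes its file. It also shows GC.KeepAlive(), a standard .NET call that is not part of the fix described above but achieves the same effect of keeping the reference alive. Whether WithoutKeepAlive() loses its file depends on the build configuration and the JIT, so treat it as an illustration of what may happen rather than a guaranteed reproduction.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for TempFileCollection: owns one temp file
// and deletes it when the object is finalized.
public sealed class TempFileOwner
{
    public string FileName { get; }

    public TempFileOwner()
    {
        FileName = Path.GetTempFileName();
    }

    ~TempFileOwner()
    {
        try { File.Delete(FileName); } catch (IOException) { }
    }
}

public static class LifetimeDemo
{
    // Stand-in for ProcessHelper(): forces a collection, runs finalizers,
    // then checks whether the file is still there.
    static bool FileSurvivesCollection(string fileName)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        return File.Exists(fileName);
    }

    public static bool WithoutKeepAlive()
    {
        TempFileOwner owner = new TempFileOwner();
        string fileName = owner.FileName;
        // 'owner' is never used again, so an optimized build may treat it
        // as collectable while the helper is still running.
        return FileSurvivesCollection(fileName);
    }

    public static bool WithKeepAlive()
    {
        TempFileOwner owner = new TempFileOwner();
        string fileName = owner.FileName;
        bool survived = FileSurvivesCollection(fileName);
        GC.KeepAlive(owner); // 'owner' is considered reachable up to this call
        return survived;
    }

    public static void Main()
    {
        Console.WriteLine("Without KeepAlive, file survived: " + WithoutKeepAlive());
        Console.WriteLine("With KeepAlive, file survived:    " + WithKeepAlive());
    }
}
```

In a debug build both calls will most likely report that the file survived, which matches our experience of the failure moving around once a debugger was attached. For something that owns files, an explicit Delete() or a using block is still the better choice, because it also makes the clean-up itself deterministic.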
 
The moral is: C# might look a little bit like C++, but just because an object reference looks like it is still on the stack, it doesn't mean it is. And because garbage collection is non-deterministic, if your object is looking after an unmanaged resource for you, that resource might go missing when you don't expect it to.

 

Posted on Sunday, December 27, 2009 6:41 AM .NET


Comments on this post: One of our tests is failing - heaps, stacks and garbage collection



Copyright © cyberycon | Powered by: GeeksWithBlogs.net