Geeks With Blogs
Chris Breisch   .NET Data Practices
Search this Blog!

I was doing a little research for a project the other day and stumbled upon this article on Informit.Com.  It's old (2002), so it doesn't even take into account new features in SQL Server 2005 (NEWSEQUENTIALID()), but does give you something to think about regarding GUIDs.

This table in particular is an eye-opener:

 

Result of Test A: INSERT of 500,000 Orders
Test A Time (with 500,000 Orders)
INTEGER 1
GUID 30
GUID (COMB) 1.1

To see what the difference between GUID and GUID(COMB) is, you'll have to read the article.  In short, it's similar to the newsequentialid() mentioned earlier.

Here are some pros and cons, according to Jeff Atwood:

GUID Pros

  • Unique across every table, every database, every server
  • Allows easy merging of records from different databases
  • Allows easy distribution of databases across multiple servers
  • You can generate IDs anywhere, instead of having to roundtrip to the database
  • Most replication scenarios require GUID columns anyway

GUID Cons
  • It is a whopping 4 times larger than the traditional 4-byte index value; this can have serious performance and storage implications if you're not careful
  • Cumbersome to debug (where userid='{BAE7DF4-DDF-3RG-5TY3E3RF456AS10}')
  • The generated GUIDs should be partially sequential for best performance (eg, newsequentialid() on SQL 2005) and to enable use of clustered indexes

There are still places where you'll likely want to use an INT (or, more likely, a composite key consisting of INTs), because that size issue is a big one when considering index performance.  If you're looking at very large tables, you'll want your indexes as small as possible (actually, that's a good guideline regardless, but becomes a bigger deal the larger the table is).

Remember that the speed of your index is determined by how many levels you have to traverse in the B-tree, and the number of levels is defined by how many index entries can fit on a page (8,060 bytes).  So, with a 4-byte index you get 1 level for the first 2,000 entries, 2 levels for the first 4,000,000, and 3 levels for the first 8,000,000,000.  If you expand the size of your index to 16 bytes, but you still have that 8,000,000,000 row table you now have 4 levels.  Still pretty good, but you have increased your search time by roughly 33%.  If data retrieval speed is highly important to you, this may be a factor.  These numbers get even worse if you're doing a lot of index scans versus index seeks, because with a large index, you'll likely have to scan multiple pages.

I recently had to deal with a legacy database where the primary key consisted of three 50 byte varchars.  And they wondered why their performance was bad.  But, even in that case, we could handle an 8,000,000,000 row table with only 6 levels in our index.

Posted on Tuesday, May 8, 2007 10:04 AM Database Practices , Architecture | Back to top


Comments on this post: The Cost of GUIDs as Primary Keys

No comments posted yet.
Your comment:
 (will show your gravatar)


Copyright © Chris J. Breisch | Powered by: GeeksWithBlogs.net