TheChaseMan's Frenetic SoapBox

Always looking for better ways to do things...

To GUID...or not to GUID

To GUID or not to GUID...that is the question....

I would sure like to have the simple ability to create new records on the fly by creating a new GUID rather than dealing with identity columns. Obviously with IDENTITY columns you have to insert a record, then get back the inserted key value (old school @@IDENTITY, the correct way SCOPE_IDENTITY). Using GUIDs you could reduce round trips to the server, but at the same time there are performance issues. You never want to put a clustered index on a GUID column, index speed is slow anyway, GUID columns are large (16 bytes versus 4 on an Int32), sorting is a pain, more likely to cause table scans instead of index scans which hurts performance...

So what's your feedback? Here's what I've found on Google...

SQL Team .COM

The major advantage of using GUIDs is that they are unique across all space and time. This comes in handy if you're consolidating records from multiple SQL Servers into one table, as in a data warehousing situation. GUIDs are also used heavily by SQL Server replication to keep track of rows when they're spread out among multiple SQL Servers.

The main disadvantage to using GUIDs as key values is that they are BIG. At 16 bytes a pop, they are one of the largest datatypes in SQL Server. Indexes built on GUIDs are going to be larger and slower than indexes built on IDENTITY columns, which are usually ints (4 bytes).

Not only that, but they're just plain hard to read. Unless you need a truly globally unique identifier, you're probably better off sticking with an IDENTITY.

http://www.sqlteam.com/item.asp?ItemID=283

I can't stress enough that this should be your last option when designing your tables. It may not seem worth the effort to go to such lengths, but to paraphrase Ron Soukup, former SQL Server project manager, “Identity primary keys are for people who believe there’s never time to design a table right but there’s always time to do it over.”

http://www.sqlteam.com/item.asp?ItemID=2599

Definite Maybe?

Host: DavidSc (Microsoft)
Q: Would you recommend using a guid or an autoincrement column as a record id?
A: That's a hotly contested debate. Developers often feel strongly one way or the other and debates on the topic sometimes end with pie throwing and/or name calling.

Host: DavidSc (Microsoft)
AutoIncrements are smaller values, so they take up less space and yield better performance in queries. However, since they're server-generated, an insert into the database requires that you query the database for the new key value.

Host: DavidSc (Microsoft)
This also complicates the process of submitting new parent and child data, as discussed earlier. Also, there is no standard for auto-increment functionality across databases.

Host: DavidSc (Microsoft)
Relying on Guids can simplify some of these issues, since you're generating your key values at the client. However, these values are much larger than integers and are less performant during searches as a result.

Host: DavidSc (Microsoft)
Not all databases natively support a guid type, but guids can be cast as strings.

http://msdn.microsoft.com/chats/vstudio/vstudio_110403.asp

GUID and IDENTITY Keys

Avoid using GUIDs (globally unique identifiers) as keys in the data warehouse database. GUIDs may be used in data from distributed source systems, but they are difficult to use as table keys. GUIDs use a significant amount of storage (16 bytes each), cannot be efficiently sorted, and are difficult for humans to read. Indexes on GUID columns may be relatively slower than indexes on integer keys because GUIDs are four times larger. The Transact-SQL NEWID function can be used to create GUIDs for a column of uniqueidentifier data type, and the ROWGUIDCOL property can be set for such a column to indicate that the GUID values in the column uniquely identify rows in the table, but uniqueness is not enforced.

Because a uniqueidentifier data type cannot be sorted, the GUID cannot be used in a GROUP BY statement, nor can the occurrences of the uniqueidentifier GUID be distinctly counted—both GROUP BY and COUNT DISTINCT operations are very common in data warehouses. The uniqueidentifier GUID cannot be used as a measure in an Analysis Services cube.

The IDENTITY property and IDENTITY function can be used to create identity columns in tables and to manage series of generated numeric keys. IDENTITY functionality is more useful in surrogate key management than uniqueidentifier GUIDs. 

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/sql_dwdesign.asp

Another Google Search

The primary index tree will be high and narrow, especially for a large table. This means that every time you traverse the primary index tree, you'll travel through many levels. This is quite opposite to what you want from an index tree. An integer surrogate tree will be comparably very low and narrow, and you'll be able to come to the root level much more quickly. You'll have the same problem in every foreign key index, all for the same reason...You need to be very careful when using GUIDs as PK's in the database, because they tend to give you index trees that are high and narrow, which is the opposite to what you want from an index. You'll find that SQL Server -- to mention one database product -- will tend to use more table scans and less index searches because of that. Many times you'll find that you can't take that performance hit in the database, which is the weakest point in terms of scalability anyway, because it can't be entirely transparently distributed. ---Sten Sundblad (Microsoft Regional Director, Sweden)

 


Digg!

posted on Friday, May 14, 2004 1:00 PM

Feedback

No comments posted yet.