Monday, March 19, 2012

From GUIDs to ints

Say you have a complex database which, due to the usual reasons, was
designed with most PKs as GUIDs. Knowing better now (and replication
in SQL2K no longer requires GUIDs), you rework the database and use
identity ints as PKs (OK, still not my personal favorite technique,
but undeniably common out here in the world).
I'm guessing this kind of GUID-to-int redesign has happened a lot out
there over the past year or two. Any good war stories? Did it turn
out to be a good thing?
Thanks.
J.
jxstern,
Sounds like an ambitious, but worthwhile endeavour for future projects.
If your existing application is working and performing well, and you
think it will scale, I would leave alone. However if it is causing some
performance issues (confirmed by testing), I would go ahead and convert
to int.
The motto of "If it ain't broke, don't fix it." has served me well over
the years.
Mark Allison, SQL Server MVP
http://www.markallison.co.uk
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602m.html
jxstern wrote:
> Say you have a complex database which, due to the usual reasons, was
> designed with most PKs as GUIDs. Knowing better now (and replication
> in SQL2K no longer requires GUIDs), you rework the database and use
> identity ints as PKs (OK, still not my personal favorite technique,
> but undeniably common out here in the world).
> I'm guessing this kind of GUID-to-int redesign has happened a lot out
> there over the past year or two. Any good war stories? Did it turn
> out to be a good thing?
> Thanks.
> J.
>
|||On Fri, 05 Aug 2005 09:21:21 +0100, Mark Allison
<mark@.no.tinned.meat.mvps.org> wrote:
>Sounds like an ambitious, but worthwhile endeavour for future projects.
>If your existing application is working and performing well, and you
>think it will scale, I would leave alone. However if it is causing some
>performance issues (confirmed by testing), I would go ahead and convert
>to int.
>The motto of "If it ain't broke, don't fix it." has served me well over
>the years.
It's a fine motto I fully endorse, but in this case the system is
broken, confirmed by testing, profiling, perfmon, yada ya.
In fact, I find it very interesting and somewhat hard to explain, much
less believe, just how incredibly broken it is. That's a different
rant, but as a tease, initial estimates are that simply doing the
conversion will improve performance by 100x, yes, that's 100 times,
not just 100%. For a little database on a big server that might not
matter, but this is at least a medium database and even on a big
server it won't have nearly the required capacity until and unless we
spike the GUIDs.
Josh
|||On Fri, 05 Aug 2005 13:34:32 -0700, jxstern <jxstern@.nowhere.xyz>
wrote:
> initial estimates are that simply doing the
>conversion will improve performance by 100x, yes, that's 100 times,
>not just 100%.
Hmm, put a hold on that. Anecdotal reports are mixed, and a search of
archived discussions on the topic - that's mixed, too.
My assumption going in was that the conversion should result in a
modest 2x improvement, maybe a bit more. Including all FK's there's a
pretty significant effect on total database size, ok that's hard to
quantify exactly, too, but if it's even 10% that would ... well, maybe
that would just correspond to a linear improvement in performance of
10% also, now that I think about it. Or even ln(10%).
I even see cases where people are converting databases *from* ints
*to* GUIDs, and not just because of replication.
Bottom line, on further study, I'm more confused than before.
Tie in the debate about using surrogate keys in the first place, add
in a discussion of what happens to natural keys, or even
normalization, when you use surrogates, and hey, somebody ought to
write a book, you really can't enumerate the arguments, much less
decide between them, in anything shorter.
Fun and games,
J.
|||100X improvement would be totally unreasonable just attributed from changing
a GUID to an INT. Personally I hate GUID's myself and would always welcome
the chance to use a INT key over a GUID where possible. But performance is
most likely suffering from two aspects associated with the guids themselves.
One is the amount of page splitting that can occur on poorly designed
schemas especially if the clustered index is on the GUID. The second is the
fact the GUID's do take up more room. While disk space is cheap these days
it is the memory space I am talking about. Since a guid is 4 times large
than an int you can get a significant larger amount of rows onto a page with
ints than guids. That means better use of memory for data and index pages.
This is compounded if you have lots of page splits associated with the
guids. There are other aspects along those lines but those are the biggest I
see. The other thing that is most likely hurting you is not necessarily
attributed to the guid over the int but the guid's do tend to make it worse.
That is poor plan reuse. If you are not using stored procedures or at lease
queries that can be parameterized properly you will not get plan reuse. I
had a system once that used guids as PK's with clustered indexes on them.
They then did all adhoc queries. I made two simple changes. One was to
change the Ci to a better column that was not guid. The other was to change
a single adhoc call to a sp. I immediately received a 54,863% gain in
performance when pumping calls intothe box. We went from 8 procs at over
60% utilization down to .2% for much more calls.
Andrew J. Kelly SQL MVP
"JXStern" <JXSternChangeX2R@.gte.net> wrote in message
news:45n9f11h4vm4mqhfgsti98p91ble64b4jc@.4ax.com...
> On Fri, 05 Aug 2005 13:34:32 -0700, jxstern <jxstern@.nowhere.xyz>
> wrote:
> Hmm, put a hold on that. Anecdotal reports are mixed, and a search of
> archived discussions on the topic - that's mixed, too.
> My assumption going in was that the conversion should result in a
> modest 2x improvement, maybe a bit more. Including all FK's there's a
> pretty significant effect on total database size, ok that's hard to
> quantify exactly, too, but if it's even 10% that would ... well, maybe
> that would just correspond to a linear improvement in performance of
> 10% also, now that I think about it. Or even ln(10%).
> I even see cases where people are converting databases *from* ints
> *to* GUIDs, and not just because of replication.
> Bottom line, on further study, I'm more confused than before.
> Tie in the debate about using surrogate keys in the first place, add
> in a discussion of what happens to natural keys, or even
> normalization, when you use surrogates, and hey, somebody ought to
> write a book, you really can't enumerate the arguments, much less
> decide between them, in anything shorter.
> Fun and games,
> J.
>
|||Just to add that you can reduce page split by guid by using sequential guid.
Gert has made one available for sql2k.
http://sqldev.net/xp/xpguid.htm
-oj
"Andrew J. Kelly" <sqlmvpnooospam@.shadhawk.com> wrote in message
news:OY3m7nqmFHA.3960@.TK2MSFTNGP12.phx.gbl...
> 100X improvement would be totally unreasonable just attributed from
> changing a GUID to an INT. Personally I hate GUID's myself and would
> always welcome the chance to use a INT key over a GUID where possible. But
> performance is most likely suffering from two aspects associated with the
> guids themselves. One is the amount of page splitting that can occur on
> poorly designed schemas especially if the clustered index is on the GUID.
> The second is the fact the GUID's do take up more room. While disk space
> is cheap these days it is the memory space I am talking about. Since a
> guid is 4 times large than an int you can get a significant larger amount
> of rows onto a page with ints than guids. That means better use of memory
> for data and index pages. This is compounded if you have lots of page
> splits associated with the guids. There are other aspects along those
> lines but those are the biggest I see. The other thing that is most
> likely hurting you is not necessarily attributed to the guid over the int
> but the guid's do tend to make it worse. That is poor plan reuse. If you
> are not using stored procedures or at lease queries that can be
> parameterized properly you will not get plan reuse. I had a system once
> that used guids as PK's with clustered indexes on them. They then did all
> adhoc queries. I made two simple changes. One was to change the Ci to a
> better column that was not guid. The other was to change a single adhoc
> call to a sp. I immediately received a 54,863% gain in performance when
> pumping calls intothe box. We went from 8 procs at over 60% utilization
> down to .2% for much more calls.
> --
> Andrew J. Kelly SQL MVP
>
> "JXStern" <JXSternChangeX2R@.gte.net> wrote in message
> news:45n9f11h4vm4mqhfgsti98p91ble64b4jc@.4ax.com...
>

No comments:

Post a Comment