Home All Groups Group Topic Archive Search About

Update Timeout and indexes

Author
21 Nov 2007 7:38 PM
Jim
Hi,

I have two questions about Updating a large table.  It has about 40
million rows.  When I try to do an update on one of the fields, it
goes on forever until I kill it.  The last time this happened, after I
killed it, I noticed that the messages screen said that  the number of
records (40 million) were effected by the update  (that they had
changed).  It also had something about a Timeout error (sorry but I
did not catch the exact wording and I won't be running it again like
that).  In order to try to get by this, I do the update in sections
(update the first 2 million, then the next, etc...)  My thought is
that some sort of timeout is occuring at the end and the connection is
lost.  Is that correct?

The other question has to do with updating a field that is indexed.
When is the index changed?  Is it each time the field is changed
(which would be 40 million times for my update above) or is it done
all at once at the end?

My idea is that the index is being changed at the end of the big
update (40 million rows) and it is timeing out.  The problem with that
theory though is that the problem with the 40 million row update
happens if the updated field is indexed or not.  Of course, there are
indexes on other fields so it could still be the cause.

Any ideas?

Thanks alot!

Jim

Author
21 Nov 2007 8:55 PM
Andrew J. Kelly
It's almost never a good idea to update that many rows at one time. You
would be better off wrapping the updates in a loop and updating say 5K or
10K rows at a time. That not only gives you better concurrency but will
usually finish much faster. If you have an index on the column being updated
and the table is not in use other than the update you might be better off
dropping the index and recreating it once all the updates are done. Also
make sure you have plenty of free space in your log file to cover all the
updates.

--
Andrew J. Kelly    SQL MVP
Solid Quality Mentors


Show quote
"Jim" <jsh***@datamann.com> wrote in message
news:85ef74d9-7c5a-414c-b670-aa5708c00655@l1g2000hsa.googlegroups.com...
>
> Hi,
>
> I have two questions about Updating a large table.  It has about 40
> million rows.  When I try to do an update on one of the fields, it
> goes on forever until I kill it.  The last time this happened, after I
> killed it, I noticed that the messages screen said that  the number of
> records (40 million) were effected by the update  (that they had
> changed).  It also had something about a Timeout error (sorry but I
> did not catch the exact wording and I won't be running it again like
> that).  In order to try to get by this, I do the update in sections
> (update the first 2 million, then the next, etc...)  My thought is
> that some sort of timeout is occuring at the end and the connection is
> lost.  Is that correct?
>
> The other question has to do with updating a field that is indexed.
> When is the index changed?  Is it each time the field is changed
> (which would be 40 million times for my update above) or is it done
> all at once at the end?
>
> My idea is that the index is being changed at the end of the big
> update (40 million rows) and it is timeing out.  The problem with that
> theory though is that the problem with the 40 million row update
> happens if the updated field is indexed or not.  Of course, there are
> indexes on other fields so it could still be the cause.
>
> Any ideas?
>
> Thanks alot!
>
> Jim
Author
23 Nov 2007 1:50 PM
Jim
Thanks!

The scripts that I have to do the loop on a smaller update statement
are useful.  Each update runs quickly (about 20 minutes for 2 million
rows) but there are several databases  that have a large number or
rows.   I can easily generalize the scripts so that it work anywhere
but I'm wondering why this is a problem.  In the grand scheme of
things, all tables are getting larger and 40 million rows is not
alot.  It should work (and maybe even faster than the speed I got from
the sectioning :-)

As for updating an indexed field, that too should work without a
decrease in speed.




Show quote
On Nov 21, 3:55 pm, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com>
wrote:
> It's almost never a good idea to update that many rows at one time. You
> would be better off wrapping the updates in a loop and updating say 5K or
> 10K rows at a time. That not only gives you better concurrency but will
> usually finish much faster. If you have an index on the column being updated
> and the table is not in use other than the update you might be better off
> dropping the index and recreating it once all the updates are done. Also
> make sure you have plenty of free space in your log file to cover all the
> updates.
>
> --
> Andrew J. Kelly    SQL MVP
> Solid Quality Mentors
>
> "Jim" <jsh***@datamann.com> wrote in message
>
> news:85ef74d9-7c5a-414c-b670-aa5708c00655@l1g2000hsa.googlegroups.com...
>
>
>
>
>
> > Hi,
>
> > I have two questions about Updating a large table.  It has about 40
> > million rows.  When I try to do an update on one of the fields, it
> > goes on forever until I kill it.  The last time this happened, after I
> > killed it, I noticed that the messages screen said that  the number of
> > records (40 million) were effected by the update  (that they had
> > changed).  It also had something about a Timeout error (sorry but I
> > did not catch the exact wording and I won't be running it again like
> > that).  In order to try to get by this, I do the update in sections
> > (update the first 2 million, then the next, etc...)  My thought is
> > that some sort of timeout is occuring at the end and the connection is
> > lost.  Is that correct?
>
> > The other question has to do with updating a field that is indexed.
> > When is the index changed?  Is it each time the field is changed
> > (which would be 40 million times for my update above) or is it done
> > all at once at the end?
>
> > My idea is that the index is being changed at the end of the big
> > update (40 million rows) and it is timeing out.  The problem with that
> > theory though is that the problem with the 40 million row update
> > happens if the updated field is indexed or not.  Of course, there are
> > indexes on other fields so it could still be the cause.
>
> > Any ideas?
>
> > Thanks alot!
>
> > Jim- Hide quoted text -
>
> - Show quoted text -
Author
23 Nov 2007 3:57 PM
Andrew J. Kelly
The default timeout for most ODBC connections is 30 seconds.  So if your
update takes longer than that it can very easily timeout. You can change the
timeout to anything you like so it does not timeout but if you are trying to
update 40 million rows online your other users will not be happy with you.
The amount of time it takes depends on lots of factors such as the hardware
and database configurations, the type of updates, number of rows affected,
blocking, number and type of indexes etc.  The hardware itself will dictate
a lot of how fast this happens so if you plan to do lots of these large
updates you need to ensure the hardware is setup for that as well. Your
transaction log will also play a large role in this and if you have the tran
log on the same drive array as the data files it will make things much
worse. I don't know how fast you think it should be but I don't believe you
realize everything that is happening when you make an update such as that.

--
Andrew J. Kelly    SQL MVP
Solid Quality Mentors


Show quote
"Jim" <jsh***@datamann.com> wrote in message
news:d7dd33f0-78ca-4af2-a288-84c174b149ef@l1g2000hsa.googlegroups.com...
>
> Thanks!
>
> The scripts that I have to do the loop on a smaller update statement
> are useful.  Each update runs quickly (about 20 minutes for 2 million
> rows) but there are several databases  that have a large number or
> rows.   I can easily generalize the scripts so that it work anywhere
> but I'm wondering why this is a problem.  In the grand scheme of
> things, all tables are getting larger and 40 million rows is not
> alot.  It should work (and maybe even faster than the speed I got from
> the sectioning :-)
>
> As for updating an indexed field, that too should work without a
> decrease in speed.
>
>
>
>
> On Nov 21, 3:55 pm, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com>
> wrote:
>> It's almost never a good idea to update that many rows at one time. You
>> would be better off wrapping the updates in a loop and updating say 5K or
>> 10K rows at a time. That not only gives you better concurrency but will
>> usually finish much faster. If you have an index on the column being
>> updated
>> and the table is not in use other than the update you might be better off
>> dropping the index and recreating it once all the updates are done. Also
>> make sure you have plenty of free space in your log file to cover all the
>> updates.
>>
>> --
>> Andrew J. Kelly    SQL MVP
>> Solid Quality Mentors
>>
>> "Jim" <jsh***@datamann.com> wrote in message
>>
>> news:85ef74d9-7c5a-414c-b670-aa5708c00655@l1g2000hsa.googlegroups.com...
>>
>>
>>
>>
>>
>> > Hi,
>>
>> > I have two questions about Updating a large table.  It has about 40
>> > million rows.  When I try to do an update on one of the fields, it
>> > goes on forever until I kill it.  The last time this happened, after I
>> > killed it, I noticed that the messages screen said that  the number of
>> > records (40 million) were effected by the update  (that they had
>> > changed).  It also had something about a Timeout error (sorry but I
>> > did not catch the exact wording and I won't be running it again like
>> > that).  In order to try to get by this, I do the update in sections
>> > (update the first 2 million, then the next, etc...)  My thought is
>> > that some sort of timeout is occuring at the end and the connection is
>> > lost.  Is that correct?
>>
>> > The other question has to do with updating a field that is indexed.
>> > When is the index changed?  Is it each time the field is changed
>> > (which would be 40 million times for my update above) or is it done
>> > all at once at the end?
>>
>> > My idea is that the index is being changed at the end of the big
>> > update (40 million rows) and it is timeing out.  The problem with that
>> > theory though is that the problem with the 40 million row update
>> > happens if the updated field is indexed or not.  Of course, there are
>> > indexes on other fields so it could still be the cause.
>>
>> > Any ideas?
>>
>> > Thanks alot!
>>
>> > Jim- Hide quoted text -
>>
>> - Show quoted text -
>
Author
26 Nov 2007 2:06 PM
jimshain
Yes, I understand that the hardware plays a large part in the time the
update takes.  I have the configurations you discuss below and  the
users are ok with a long update running.

When I run the update in 2 million row chunks, each set takes about 20
minutes.  With the 40 million row update, I would expect it to take 20
* 20 = 400 minutes (ok, maybe a little longer :-).  Is that
unreasonable? What happens is that something times out and after a day
of running, I kill it and it tries to undo the update.  I then reboot
the system and the server runs a recovery which takes several
hours....  Very frustrating...

You mention a timeout below.  What is it?  Can I make it longer?  I
still think it is timing out when it is doing the indexing portion of
the update.  I  can't test it by dropping the indexes and running it
because it has several foreign indexes.

I don't think I mentioned it above but I'm running SQL Server 200 on
an XP machine.

Thanks for your help!

Jim

Show quote
On Nov 23, 10:57 am, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com>
wrote:
> The default timeout for most ODBC connections is 30 seconds.  So if yourupdatetakes longer than that it can very easily timeout. You can change the
> timeout to anything you like so it does not timeout but if you are trying toupdate40millionrowsonline your other users will not be happy with you.
> The amount of time it takes depends on lots of factors such as the hardware
> and database configurations, the type of updates, number ofrowsaffected,
> blocking, number and type of indexes etc.  The hardware itself will dictate
> a lot of how fast this happens so if you plan to do lots of these large
> updates you need to ensure the hardware is setup for that as well. Your
> transaction log will also play a large role in this and if you have the tran
> log on the same drive array as the data files it will make things much
> worse. I don't know how fast you think it should be but I don't believe you
> realize everything that is happening when you make anupdatesuch as that.
>
> --
> Andrew J. Kelly    SQL MVP
> Solid Quality Mentors
>
> "Jim" <jsh***@datamann.com> wrote in message
>
> news:d7dd33f0-78ca-4af2-a288-84c174b149ef@l1g2000hsa.googlegroups.com...
>
>
>
>
>
> > Thanks!
>
> > The scripts that I have to do the loop on a smallerupdatestatement
> > are useful.  Eachupdateruns quickly (about 20 minutes for 2million
> >rows) but there are several databases  that have a large number or
> >rows.   I can easily generalize the scripts so that it work anywhere
> > but I'm wondering why this is a problem.  In the grand scheme of
> > things, all tables are getting larger and40millionrowsis not
> > alot.  It should work (and maybe even faster than the speed I got from
> > the sectioning :-)
>
> > As for updating an indexed field, that too should work without a
> > decrease in speed.
>
> > On Nov 21, 3:55 pm, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com>
> > wrote:
> >> It's almost never a good idea toupdatethat manyrowsat one time. You
> >> would be better off wrapping the updates in a loop and updating say 5K or
> >> 10Krowsat a time. That not only gives you better concurrency but will
> >> usually finish much faster. If you have an index on the column being
> >> updated
> >> and the table is not in use other than theupdateyou might be better off
> >> dropping the index and recreating it once all the updates are done. Also
> >> make sure you have plenty of free space in your log file to cover all the
> >> updates.
>
> >> --
> >> Andrew J. Kelly    SQL MVP
> >> Solid Quality Mentors
>
> >> "Jim" <jsh***@datamann.com> wrote in message
>
> >>news:85ef74d9-7c5a-414c-b670-aa5708c00655@l1g2000hsa.googlegroups.com...
>
> >> > Hi,
>
> >> > I have two questions about Updating a large table.  It has about40
> >> >millionrows.  When I try to do anupdateon one of the fields, it
> >> > goes on forever until I kill it.  The last time this happened, after I
> >> > killed it, I noticed that the messages screen said that  the number of
> >> > records (40million) were effected by theupdate (that they had
> >> > changed).  It also had something about a Timeout error (sorry but I
> >> > did not catch the exact wording and I won't be running it again like
> >> > that).  In order to try to get by this, I do theupdatein sections
> >> > (updatethe first 2million, then the next, etc...)  My thought is
> >> > that some sort of timeout is occuring at the end and the connection is
> >> > lost.  Is that correct?
>
> >> > The other question has to do with updating a field that is indexed.
> >> > When is the index changed?  Is it each time the field is changed
> >> > (which would be40milliontimes for myupdateabove) or is it done
> >> > all at once at the end?
>
> >> > My idea is that the index is being changed at the end of the big
> >> >update(40millionrows) and it is timeing out.  The problem with that
> >> > theory though is that the problem with the40millionrowupdate
> >> > happens if the updated field is indexed or not.  Of course, there are
> >> > indexes on other fields so it could still be the cause.
>
> >> > Any ideas?
>
> >> > Thanks alot!
>
> >> > Jim- Hide quoted text -
>
> >> - Show quoted text -- Hide quoted text -
>
> - Show quoted text -
Author
26 Nov 2007 4:00 PM
TheSQLGuru
Actually your 20*20 theory is probably 'unreasonable'.  Server resources
often reach a point of utilization I call the break-over point, whereby
increased requests for the resource lead to SLOWER performance - sometimes
slowing at an increasing rate with increased requests.  The 40M row update
batch could well take 500 or even 1000 minutes, depending on the slope of
the curve.

Yes, applications can manually set their connection timeout.  The mechanism
for this varies with the differing connectivity methods.  Are you using a
DSN or DSN-less connection?  Does the code present an ini file you can use?

--
Kevin G. Boles
TheSQLGuru
Indicium Resources, Inc.


<jimsh***@gmail.com> wrote in message
Show quote
news:faade63f-27fc-427a-a859-9873ca07144c@o6g2000hsd.googlegroups.com...
>
>
> Yes, I understand that the hardware plays a large part in the time the
> update takes.  I have the configurations you discuss below and  the
> users are ok with a long update running.
>
> When I run the update in 2 million row chunks, each set takes about 20
> minutes.  With the 40 million row update, I would expect it to take 20
> * 20 = 400 minutes (ok, maybe a little longer :-).  Is that
> unreasonable? What happens is that something times out and after a day
> of running, I kill it and it tries to undo the update.  I then reboot
> the system and the server runs a recovery which takes several
> hours....  Very frustrating...
>
> You mention a timeout below.  What is it?  Can I make it longer?  I
> still think it is timing out when it is doing the indexing portion of
> the update.  I  can't test it by dropping the indexes and running it
> because it has several foreign indexes.
>
> I don't think I mentioned it above but I'm running SQL Server 200 on
> an XP machine.
>
> Thanks for your help!
>
> Jim
>
> On Nov 23, 10:57 am, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com>
> wrote:
>> The default timeout for most ODBC connections is 30 seconds.  So if
>> yourupdatetakes longer than that it can very easily timeout. You can
>> change the
>> timeout to anything you like so it does not timeout but if you are trying
>> toupdate40millionrowsonline your other users will not be happy with you.
>> The amount of time it takes depends on lots of factors such as the
>> hardware
>> and database configurations, the type of updates, number ofrowsaffected,
>> blocking, number and type of indexes etc.  The hardware itself will
>> dictate
>> a lot of how fast this happens so if you plan to do lots of these large
>> updates you need to ensure the hardware is setup for that as well. Your
>> transaction log will also play a large role in this and if you have the
>> tran
>> log on the same drive array as the data files it will make things much
>> worse. I don't know how fast you think it should be but I don't believe
>> you
>> realize everything that is happening when you make anupdatesuch as that.
>>
>> --
>> Andrew J. Kelly    SQL MVP
>> Solid Quality Mentors
>>
>> "Jim" <jsh***@datamann.com> wrote in message
>>
>> news:d7dd33f0-78ca-4af2-a288-84c174b149ef@l1g2000hsa.googlegroups.com...
>>
>>
>>
>>
>>
>> > Thanks!
>>
>> > The scripts that I have to do the loop on a smallerupdatestatement
>> > are useful.  Eachupdateruns quickly (about 20 minutes for 2million
>> >rows) but there are several databases  that have a large number or
>> >rows.   I can easily generalize the scripts so that it work anywhere
>> > but I'm wondering why this is a problem.  In the grand scheme of
>> > things, all tables are getting larger and40millionrowsis not
>> > alot.  It should work (and maybe even faster than the speed I got from
>> > the sectioning :-)
>>
>> > As for updating an indexed field, that too should work without a
>> > decrease in speed.
>>
>> > On Nov 21, 3:55 pm, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com>
>> > wrote:
>> >> It's almost never a good idea toupdatethat manyrowsat one time. You
>> >> would be better off wrapping the updates in a loop and updating say 5K
>> >> or
>> >> 10Krowsat a time. That not only gives you better concurrency but will
>> >> usually finish much faster. If you have an index on the column being
>> >> updated
>> >> and the table is not in use other than theupdateyou might be better
>> >> off
>> >> dropping the index and recreating it once all the updates are done.
>> >> Also
>> >> make sure you have plenty of free space in your log file to cover all
>> >> the
>> >> updates.
>>
>> >> --
>> >> Andrew J. Kelly    SQL MVP
>> >> Solid Quality Mentors
>>
>> >> "Jim" <jsh***@datamann.com> wrote in message
>>
>> >>news:85ef74d9-7c5a-414c-b670-aa5708c00655@l1g2000hsa.googlegroups.com...
>>
>> >> > Hi,
>>
>> >> > I have two questions about Updating a large table.  It has about40
>> >> >millionrows.  When I try to do anupdateon one of the fields, it
>> >> > goes on forever until I kill it.  The last time this happened, after
>> >> > I
>> >> > killed it, I noticed that the messages screen said that  the number
>> >> > of
>> >> > records (40million) were effected by theupdate (that they had
>> >> > changed).  It also had something about a Timeout error (sorry but I
>> >> > did not catch the exact wording and I won't be running it again like
>> >> > that).  In order to try to get by this, I do theupdatein sections
>> >> > (updatethe first 2million, then the next, etc...)  My thought is
>> >> > that some sort of timeout is occuring at the end and the connection
>> >> > is
>> >> > lost.  Is that correct?
>>
>> >> > The other question has to do with updating a field that is indexed.
>> >> > When is the index changed?  Is it each time the field is changed
>> >> > (which would be40milliontimes for myupdateabove) or is it done
>> >> > all at once at the end?
>>
>> >> > My idea is that the index is being changed at the end of the big
>> >> >update(40millionrows) and it is timeing out.  The problem with that
>> >> > theory though is that the problem with the40millionrowupdate
>> >> > happens if the updated field is indexed or not.  Of course, there
>> >> > are
>> >> > indexes on other fields so it could still be the cause.
>>
>> >> > Any ideas?
>>
>> >> > Thanks alot!
>>
>> >> > Jim- Hide quoted text -
>>
>> >> - Show quoted text -- Hide quoted text -
>>
>> - Show quoted text -
>
Author
26 Nov 2007 5:43 PM
jimshain
I agree with the time estimate increase you mention below.  It just
seems weird that the whole thing would timeout though.  The server
could do the breakup if there were alot of rows (IMHO).

Do you know  if SQL 2005 or even the latest version would be better at
large updates?

I have not setup a DSN on the server.  There isn't an ini file that I
know of.

Thanks again!


Jim

Show quote
On Nov 26, 11:00 am, "TheSQLGuru" <kgbo***@earthlink.net> wrote:
> Actually your 20*20 theory is probably 'unreasonable'.  Server resources
> often reach a point of utilization I call the break-over point, whereby
> increased requests for the resource lead to SLOWER performance - sometimes
> slowing at an increasing rate with increased requests.  The 40M rowupdate
> batch could well take 500 or even 1000 minutes, depending on the slope of
> the curve.
>
> Yes, applications can manually set their connection timeout.  The mechanism
> for this varies with the differing connectivity methods.  Are you using a
> DSN or DSN-less connection?  Does the code present an ini file you can use?
>
> --
> Kevin G. Boles
> TheSQLGuru
> Indicium Resources, Inc.
>
> <jimsh***@gmail.com> wrote in message
>
> news:faade63f-27fc-427a-a859-9873ca07144c@o6g2000hsd.googlegroups.com...
>
>
>
>
>
> > Yes, I understand that the hardware plays a large part in the time the
> >updatetakes.  I have the configurations you discuss below and  the
> > users are ok with a longupdaterunning.
>
> > When I run theupdatein 2millionrow chunks, each set takes about 20
> > minutes.  With the40millionrowupdate, I would expect it to take 20
> > * 20 = 400 minutes (ok, maybe a little longer :-).  Is that
> > unreasonable? What happens is that something times out and after a day
> > of running, I kill it and it tries to undo theupdate.  I then reboot
> > the system and the server runs a recovery which takes several
> > hours....  Very frustrating...
>
> > You mention a timeout below.  What is it?  Can I make it longer?  I
> > still think it is timing out when it is doing the indexing portion of
> > theupdate.  I  can't test it by dropping the indexes and running it
> > because it has several foreign indexes.
>
> > I don't think I mentioned it above but I'm running SQL Server 200 on
> > an XP machine.
>
> > Thanks for your help!
>
> > Jim
>
> > On Nov 23, 10:57 am, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com>
> > wrote:
> >> The default timeout for most ODBC connections is 30 seconds.  So if
> >> yourupdatetakes longer than that it can very easily timeout. You can
> >> change the
> >> timeout to anything you like so it does not timeout but if you are trying
> >> toupdate40millionrowsonline your other users will not be happy with you.
> >> The amount of time it takes depends on lots of factors such as the
> >> hardware
> >> and database configurations, the type of updates, number ofrowsaffected,
> >> blocking, number and type of indexes etc.  The hardware itself will
> >> dictate
> >> a lot of how fast this happens so if you plan to do lots of these large
> >> updates you need to ensure the hardware is setup for that as well. Your
> >> transaction log will also play a large role in this and if you have the
> >> tran
> >> log on the same drive array as the data files it will make things much
> >> worse. I don't know how fast you think it should be but I don't believe
> >> you
> >> realize everything that is happening when you make anupdatesuch as that.
>
> >> --
> >> Andrew J. Kelly    SQL MVP
> >> Solid Quality Mentors
>
> >> "Jim" <jsh***@datamann.com> wrote in message
>
> >>news:d7dd33f0-78ca-4af2-a288-84c174b149ef@l1g2000hsa.googlegroups.com...
>
> >> > Thanks!
>
> >> > The scripts that I have to do the loop on a smallerupdatestatement
> >> > are useful.  Eachupdateruns quickly (about 20 minutes for 2million
> >> >rows) but there are several databases  that have a large number or
> >> >rows.   I can easily generalize the scripts so that it work anywhere
> >> > but I'm wondering why this is a problem.  In the grand scheme of
> >> > things, all tables are getting larger and40millionrowsis not
> >> > alot.  It should work (and maybe even faster than the speed I got from
> >> > the sectioning :-)
>
> >> > As for updating an indexed field, that too should work without a
> >> > decrease in speed.
>
> >> > On Nov 21, 3:55 pm, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com>
> >> > wrote:
> >> >> It's almost never a good idea toupdatethat manyrowsat one time. You
> >> >> would be better off wrapping the updates in a loop and updating say 5K
> >> >> or
> >> >> 10Krowsat a time. That not only gives you better concurrency but will
> >> >> usually finish much faster. If you have an index on the column being
> >> >> updated
> >> >> and the table is not in use other than theupdateyou might be better
> >> >> off
> >> >> dropping the index and recreating it once all the updates are done.
> >> >> Also
> >> >> make sure you have plenty of free space in your log file to cover all
> >> >> the
> >> >> updates.
>
> >> >> --
> >> >> Andrew J. Kelly    SQL MVP
> >> >> Solid Quality Mentors
>
> >> >> "Jim" <jsh***@datamann.com> wrote in message
>
> >> >>news:85ef74d9-7c5a-414c-b670-aa5708c00655@l1g2000hsa.googlegroups.com...
>
> >> >> > Hi,
>
> >> >> > I have two questions about Updating a large table.  It has about40
> >> >> >millionrows.  When I try to do anupdateon one of the fields, it
> >> >> > goes on forever until I kill it.  The last time this happened, after
> >> >> > I
> >> >> > killed it, I noticed that the messages screen said that  the number
> >> >> > of
> >> >> > records (40million) were effected by theupdate (that they had
> >> >> > changed).  It also had something about a Timeout error (sorry but I
> >> >> > did not catch the exact wording and I won't be running it again like
> >> >> > that).  In order to try to get by this, I do theupdatein sections
> >> >> > (updatethe first 2million, then the next, etc...)  My thought is
> >> >> > that some sort of timeout is occuring at the end and the connection
> >> >> > is
> >> >> > lost.  Is that correct?
>
> >> >> > The other question has to do with updating a field that is indexed.
> >> >> > When is the index changed?  Is it each time the field is changed
> >> >> > (which would be40milliontimes for myupdateabove) or is it done
> >> >> > all at once at the end?
>
> >> >> > My idea is that the index is being changed at the end of the big
> >> >> >update(40millionrows) and it is timeing out.  The problem with that
> >> >> > theory though is that the problem with the40millionrowupdate
> >> >> > happens if the updated field is indexed or not.  Of course, there
> >> >> > are
> >> >> > indexes on other fields so it could still be the cause.
>
> >> >> > Any ideas?
>
> >> >> > Thanks alot!
>
> >> >> > Jim- Hide quoted text -
>
> >> >> - Show quoted text -- Hide quoted text -
>
> >> - Show quoted text -- Hide quoted text -
>
> - Show quoted text -
Author
27 Nov 2007 12:27 AM
Andrew J. Kelly
> I agree with the time estimate increase you mention below.  It just
> seems weird that the whole thing would timeout though.

I agree with Kevin on this as the time will definitely not be double for
updating double the rows with that many.

> The server
> could do the breakup if there were allot of rows (IMHO).

No that would be bad. The update must be atomic across all the rows being
updated otherwise the database would not be very well accepted in the
enterprise.  You can't leave it up to the Engine to determine where to break
this up that is your job because it may be different for each instance.


> Do you know  if SQL 2005 or even the latest version would be better at
> large updates?

The principles are the same. You have to understand everything that happens
when you do an update of any size and then multiply that by x for larger
ones. Each row that gets updated must be logged to the transaction log in a
way that it can redo or undo in the event of issues. The larger the
transaction the more overhead it will take and the more it will affect other
users and other operations in the transaction log. For instance you can't
truncate or reuse any portion of the log that has open transactions in it.
By breaking it up into smaller transactions you dramatically reduce the
exposure and overhead of each transaction. This is how it has been from day
one and will be tomorrow and the day after. But again the biggest factor in
how long it takes is your hardware setup. Having the data and log files on
the same physical drive when trying to do large data modifications is asking
for trouble. If you want fast (or reasonable) updates on 40 million rows you
need to ensure you have the proper hardware for the task.

As for the timeouts that can be set at the connection level. Change the
timeout on the connection to infinite and it should not timeout.




--
Andrew J. Kelly    SQL MVP
Solid Quality Mentors


<jimsh***@gmail.com> wrote in message
Show quote
news:f88c15e0-fbb4-45b1-86f6-7f9de0f4bb30@e25g2000prg.googlegroups.com...
>
Author
27 Nov 2007 2:14 PM
jimshain
Thanks!

The log and db files have been  on separate disks  since before the
problem occurred.

I agree that the hardware configuration dictates the time an update
will take.

It's down below in one of the messages so I just want to point out
that when I killed the larger update last time, I got a  message
saying that there were 40 million records affected.  There was then a
timeout message below that. These messages indicate to me that the
update and the indexing completed without a problem.  It also
indicates to me that the Hardware configuration is acceptable for a 40
million row update. It also indicates to me that the server is not
waiting around for the update to complete.  That is how I got to
trying to fix the server timeout problem.  The hardware is fine.  The
timeout is not.

Thanks again!

Jim


Show quote
On Nov 26, 7:27 pm, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com>
wrote:
> > I agree with the time estimate increase you mention below.  It just
> > seems weird that the whole thing would timeout though.
>
> I agree with Kevin on this as the time will definitely not be double for
> updating double therowswith that many.
>
> > The server
> > could do the breakup if there were allot ofrows(IMHO).
>
> No that would be bad. Theupdatemust be atomic across all therowsbeing
> updated otherwise the database would not be very well accepted in the
> enterprise.  You can't leave it up to the Engine to determine where to break
> this up that is your job because it may be different for each instance.
>
> > Do you know  if SQL 2005 or even the latest version would be better at
> > large updates?
>
> The principles are the same. You have to understand everything that happens
> when you do anupdateof any size and then multiply that by x for larger
> ones. Each row that gets updated must be logged to the transaction log in a
> way that it can redo or undo in the event of issues. The larger the
> transaction the more overhead it will take and the more it will affect other
> users and other operations in the transaction log. For instance you can't
> truncate or reuse any portion of the log that has open transactions in it.
> By breaking it up into smaller transactions you dramatically reduce the
> exposure and overhead of each transaction. This is how it has been from day
> one and will be tomorrow and the day after. But again the biggest factor in
> how long it takes is your hardware setup. Having the data and log files on
> the same physical drive when trying to do large data modifications is asking
> for trouble. If you want fast (or reasonable) updates on40millionrowsyou
> need to ensure you have the proper hardware for the task.
>
> As for the timeouts that can be set at the connection level. Change the
> timeout on the connection to infinite and it should not timeout.
>
> --
> Andrew J. Kelly    SQL MVP
> Solid Quality Mentors
>
> <jimsh***@gmail.com> wrote in message
>
> news:f88c15e0-fbb4-45b1-86f6-7f9de0f4bb30@e25g2000prg.googlegroups.com...
>
>
>
> - Hide quoted text -
>
> - Show quoted text -
Author
27 Nov 2007 2:50 PM
Andrew J. Kelly
Jim,

> The log and db files have been  on separate disks  since before the
> problem occurred.

Yes but are they on separate physical disks or just logical? It makes a big
difference and I wasn't sure which way you have it.

> I got a  message
> saying that there were 40 million records affected.  There was then a
> timeout message below that.

Interesting, what was the exact code or sequence you used for the update?
Did you wrap it in a transaction or was there any other code in the batch?
You can run a profiler trace at the statement level (both starting and
completed events) for that spid to see what is happening and when in
relation to the timeout.  And I would also try changing the connection
timeout to infinite to see if that changes the behavior. Between the two it
should narrow down the issue significantly.


--
Andrew J. Kelly    SQL MVP
Solid Quality Mentors


<jimsh***@gmail.com> wrote in message
Show quote
news:eb71f288-ef8c-47bf-bf44-4a0e285339e7@v4g2000hsf.googlegroups.com...
>
> Thanks!
>
> The log and db files have been  on separate disks  since before the
> problem occurred.
>
> I agree that the hardware configuration dictates the time an update
> will take.
>
> It's down below in one of the messages so I just want to point out
> that when I killed the larger update last time, I got a  message
> saying that there were 40 million records affected.  There was then a
> timeout message below that. These messages indicate to me that the
> update and the indexing completed without a problem.  It also
> indicates to me that the Hardware configuration is acceptable for a 40
> million row update. It also indicates to me that the server is not
> waiting around for the update to complete.  That is how I got to
> trying to fix the server timeout problem.  The hardware is fine.  The
> timeout is not.
>
> Thanks again!
>
> Jim
>
>
> On Nov 26, 7:27 pm, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com>
> wrote:
>> > I agree with the time estimate increase you mention below.  It just
>> > seems weird that the whole thing would timeout though.
>>
>> I agree with Kevin on this as the time will definitely not be double for
>> updating double therowswith that many.
>>
>> > The server
>> > could do the breakup if there were allot ofrows(IMHO).
>>
>> No that would be bad. Theupdatemust be atomic across all therowsbeing
>> updated otherwise the database would not be very well accepted in the
>> enterprise.  You can't leave it up to the Engine to determine where to
>> break
>> this up that is your job because it may be different for each instance.
>>
>> > Do you know  if SQL 2005 or even the latest version would be better at
>> > large updates?
>>
>> The principles are the same. You have to understand everything that
>> happens
>> when you do anupdateof any size and then multiply that by x for larger
>> ones. Each row that gets updated must be logged to the transaction log in
>> a
>> way that it can redo or undo in the event of issues. The larger the
>> transaction the more overhead it will take and the more it will affect
>> other
>> users and other operations in the transaction log. For instance you can't
>> truncate or reuse any portion of the log that has open transactions in
>> it.
>> By breaking it up into smaller transactions you dramatically reduce the
>> exposure and overhead of each transaction. This is how it has been from
>> day
>> one and will be tomorrow and the day after. But again the biggest factor
>> in
>> how long it takes is your hardware setup. Having the data and log files
>> on
>> the same physical drive when trying to do large data modifications is
>> asking
>> for trouble. If you want fast (or reasonable) updates on40millionrowsyou
>> need to ensure you have the proper hardware for the task.
>>
>> As for the timeouts that can be set at the connection level. Change the
>> timeout on the connection to infinite and it should not timeout.
>>
>> --
>> Andrew J. Kelly    SQL MVP
>> Solid Quality Mentors
>>
>> <jimsh***@gmail.com> wrote in message
>>
>> news:f88c15e0-fbb4-45b1-86f6-7f9de0f4bb30@e25g2000prg.googlegroups.com...
>>
>>
>>
>> - Hide quoted text -
>>
>> - Show quoted text -
>
Author
27 Nov 2007 5:46 PM
jimshain
The log and DB files are on separate physical disks.

The update used an inner join to another table.  The other table is
fully indexed and there is a one-to-one match between the field in the
big table and the field in the other (smaller) table.  It was
something like:

Update bigtable
set field1 = SmallTable.field6
inner join Smalltable on bigtable.field2 = SmallTable.field7 and
BigTable.field3 = SmallTable.field8
where BigTable.field4 = 'SomeString'

The above updates field1 of the Bigtable with field6 from SmallTable.
The inner join involves 2 other fields in the Bigtable and and 2 other
fields in smalltable.  This update is only done when field4 of the
Bigtable is 'SomeString'.  Everything is indexed.  I have tried the
Update using a variety of other configurations (remove the inner join,
remove the where clause, etc.) and each one failed.

It was not run in a transaction.  There wasn't any other code.

I did not run it in the profiler because it was running too slow
already.  :-)  I was planning on running it through the profiler the
next time I need to do this.  I have no time estimate so I will
probably have to run it for a day and then kill it.

Thanks!

Jim

Show quote
On Nov 27, 9:50 am, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com>
wrote:
> Jim,
>
> > The log and db files have been  on separate disks  since before the
> > problem occurred.
>
> Yes but are they on separate physical disks or just logical? It makes a big
> difference and I wasn't sure which way you have it.
>
> > I got a  message
> > saying that there were40millionrecords affected.  There was then a
> > timeout message below that.
>
> Interesting, what was the exact code or sequence you used for theupdate?
> Did you wrap it in a transaction or was there any other code in the batch?
> You can run a profiler trace at the statement level (both starting and
> completed events) for that spid to see what is happening and when in
> relation to the timeout.  And I would also try changing the connection
> timeout to infinite to see if that changes the behavior. Between the two it
> should narrow down the issue significantly.
>
> --
> Andrew J. Kelly    SQL MVP
> Solid Quality Mentors
>
> <jimsh***@gmail.com> wrote in message
>
> news:eb71f288-ef8c-47bf-bf44-4a0e285339e7@v4g2000hsf.googlegroups.com...
>
>
>
>
>
> > Thanks!
>
> > The log and db files have been  on separate disks  since before the
> > problem occurred.
>
> > I agree that the hardware configuration dictates the time anupdate
> > will take.
>
> > It's down below in one of the messages so I just want to point out
> > that when I killed the largerupdatelast time, I got a  message
> > saying that there were40millionrecords affected.  There was then a
> > timeout message below that. These messages indicate to me that the
> >updateand the indexing completed without a problem.  It also
> > indicates to me that the Hardware configuration is acceptable for a40
> >millionrowupdate. It also indicates to me that the server is not
> > waiting around for theupdateto complete.  That is how I got to
> > trying to fix the server timeout problem.  The hardware is fine.  The
> > timeout is not.
>
> > Thanks again!
>
> > Jim
>
> > On Nov 26, 7:27 pm, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com>
> > wrote:
> >> > I agree with the time estimate increase you mention below.  It just
> >> > seems weird that the whole thing would timeout though.
>
> >> I agree with Kevin on this as the time will definitely not be double for
> >> updating double therowswith that many.
>
> >> > The server
> >> > could do the breakup if there were allot ofrows(IMHO).
>
> >> No that would be bad. Theupdatemust be atomic across all therowsbeing
> >> updated otherwise the database would not be very well accepted in the
> >> enterprise.  You can't leave it up to the Engine to determine where to
> >> break
> >> this up that is your job because it may be different for each instance.
>
> >> > Do you know  if SQL 2005 or even the latest version would be better at
> >> > large updates?
>
> >> The principles are the same. You have to understand everything that
> >> happens
> >> when you do anupdateof any size and then multiply that by x for larger
> >> ones. Each row that gets updated must be logged to the transaction log in
> >> a
> >> way that it can redo or undo in the event of issues. The larger the
> >> transaction the more overhead it will take and the more it will affect
> >> other
> >> users and other operations in the transaction log. For instance you can't
> >> truncate or reuse any portion of the log that has open transactions in
> >> it.
> >> By breaking it up into smaller transactions you dramatically reduce the
> >> exposure and overhead of each transaction. This is how it has been from
> >> day
> >> one and will be tomorrow and the day after. But again the biggest factor
> >> in
> >> how long it takes is your hardware setup. Having the data and log files
> >> on
> >> the same physical drive when trying to do large data modifications is
> >> asking
> >> for trouble. If you want fast (or reasonable) updates on40millionrowsyou
> >> need to ensure you have the proper hardware for the task.
>
> >> As for the timeouts that can be set at the connection level. Change the
> >> timeout on the connection to infinite and it should not timeout.
>
> >> --
> >> Andrew J. Kelly    SQL MVP
> >> Solid Quality Mentors
>
> >> <jimsh***@gmail.com> wrote in message
>
> >>news:f88c15e0-fbb4-45b1-86f6-7f9de0f4bb30@e25g2000prg.googlegroups.com...
>
> >> - Hide quoted text -
>
> >> - Show quoted text -- Hide quoted text -
>
> - Show quoted text -
Author
27 Nov 2007 7:41 PM
Andrew J. Kelly
Jim,

An update (or any DML operation) on millions of rows will almost never use
an index. It is far more expensive to traverse the BTrees for that many rows
than to just scan the table. If you look at the estimated query plan you
should see two scans.  The performance should be greatly enhanced by
specifying a TABLOCK for the table being updated (assuming it can get one).
That way it will only take 1 lock vs. up to several million. If the smaller
table isn't changing try a NOLOCK on it. This may just be the pseudo code
you wrote but the JOIN syntax for the UPDATE is incorrect. It should read
closer to this:



Update b WITH (TABLOCK)
set b.field1 = s.field6
FROM BigTable AS b inner join Smalltable AS s WITH (NOLOCK)
on b.field2 = s.field7 and
b.field3 = s.field8
where b.field4 = 'SomeString'

If you filter on the SPID that you will do the update on the overhead will
be negligable.

--
Andrew J. Kelly    SQL MVP
Solid Quality Mentors


<jimsh***@gmail.com> wrote in message
Show quote
news:97e01f46-8b63-4c9b-82ed-f84b101bd573@s19g2000prg.googlegroups.com...
>
> The log and DB files are on separate physical disks.
>
> The update used an inner join to another table.  The other table is
> fully indexed and there is a one-to-one match between the field in the
> big table and the field in the other (smaller) table.  It was
> something like:
>
> Update bigtable
> set field1 = SmallTable.field6
> inner join Smalltable on bigtable.field2 = SmallTable.field7 and
> BigTable.field3 = SmallTable.field8
> where BigTable.field4 = 'SomeString'
>
> The above updates field1 of the Bigtable with field6 from SmallTable.
> The inner join involves 2 other fields in the Bigtable and and 2 other
> fields in smalltable.  This update is only done when field4 of the
> Bigtable is 'SomeString'.  Everything is indexed.  I have tried the
> Update using a variety of other configurations (remove the inner join,
> remove the where clause, etc.) and each one failed.
>
> It was not run in a transaction.  There wasn't any other code.
>
> I did not run it in the profiler because it was running too slow
> already.  :-)  I was planning on running it through the profiler the
> next time I need to do this.  I have no time estimate so I will
> probably have to run it for a day and then kill it.
>
> Thanks!
>
> Jim
>
> On Nov 27, 9:50 am, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com>
> wrote:
>> Jim,
>>
>> > The log and db files have been  on separate disks  since before the
>> > problem occurred.
>>
>> Yes but are they on separate physical disks or just logical? It makes a
>> big
>> difference and I wasn't sure which way you have it.
>>
>> > I got a  message
>> > saying that there were40millionrecords affected.  There was then a
>> > timeout message below that.
>>
>> Interesting, what was the exact code or sequence you used for theupdate?
>> Did you wrap it in a transaction or was there any other code in the
>> batch?
>> You can run a profiler trace at the statement level (both starting and
>> completed events) for that spid to see what is happening and when in
>> relation to the timeout.  And I would also try changing the connection
>> timeout to infinite to see if that changes the behavior. Between the two
>> it
>> should narrow down the issue significantly.
>>
>> --
>> Andrew J. Kelly    SQL MVP
>> Solid Quality Mentors
>>
>> <jimsh***@gmail.com> wrote in message
>>
>> news:eb71f288-ef8c-47bf-bf44-4a0e285339e7@v4g2000hsf.googlegroups.com...
>>
>>
>>
>>
>>
>> > Thanks!
>>
>> > The log and db files have been  on separate disks  since before the
>> > problem occurred.
>>
>> > I agree that the hardware configuration dictates the time anupdate
>> > will take.
>>
>> > It's down below in one of the messages so I just want to point out
>> > that when I killed the largerupdatelast time, I got a  message
>> > saying that there were40millionrecords affected.  There was then a
>> > timeout message below that. These messages indicate to me that the
>> >updateand the indexing completed without a problem.  It also
>> > indicates to me that the Hardware configuration is acceptable for a40
>> >millionrowupdate. It also indicates to me that the server is not
>> > waiting around for theupdateto complete.  That is how I got to
>> > trying to fix the server timeout problem.  The hardware is fine.  The
>> > timeout is not.
>>
>> > Thanks again!
>>
>> > Jim
>>
>> > On Nov 26, 7:27 pm, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com>
>> > wrote:
>> >> > I agree with the time estimate increase you mention below.  It just
>> >> > seems weird that the whole thing would timeout though.
>>
>> >> I agree with Kevin on this as the time will definitely not be double
>> >> for
>> >> updating double therowswith that many.
>>
>> >> > The server
>> >> > could do the breakup if there were allot ofrows(IMHO).
>>
>> >> No that would be bad. Theupdatemust be atomic across all therowsbeing
>> >> updated otherwise the database would not be very well accepted in the
>> >> enterprise.  You can't leave it up to the Engine to determine where to
>> >> break
>> >> this up that is your job because it may be different for each
>> >> instance.
>>
>> >> > Do you know  if SQL 2005 or even the latest version would be better
>> >> > at
>> >> > large updates?
>>
>> >> The principles are the same. You have to understand everything that
>> >> happens
>> >> when you do anupdateof any size and then multiply that by x for larger
>> >> ones. Each row that gets updated must be logged to the transaction log
>> >> in
>> >> a
>> >> way that it can redo or undo in the event of issues. The larger the
>> >> transaction the more overhead it will take and the more it will affect
>> >> other
>> >> users and other operations in the transaction log. For instance you
>> >> can't
>> >> truncate or reuse any portion of the log that has open transactions in
>> >> it.
>> >> By breaking it up into smaller transactions you dramatically reduce
>> >> the
>> >> exposure and overhead of each transaction. This is how it has been
>> >> from
>> >> day
>> >> one and will be tomorrow and the day after. But again the biggest
>> >> factor
>> >> in
>> >> how long it takes is your hardware setup. Having the data and log
>> >> files
>> >> on
>> >> the same physical drive when trying to do large data modifications is
>> >> asking
>> >> for trouble. If you want fast (or reasonable) updates
>> >> on40millionrowsyou
>> >> need to ensure you have the proper hardware for the task.
>>
>> >> As for the timeouts that can be set at the connection level. Change
>> >> the
>> >> timeout on the connection to infinite and it should not timeout.
>>
>> >> --
>> >> Andrew J. Kelly    SQL MVP
>> >> Solid Quality Mentors
>>
>> >> <jimsh***@gmail.com> wrote in message
>>
>> >>news:f88c15e0-fbb4-45b1-86f6-7f9de0f4bb30@e25g2000prg.googlegroups.com...
>>
>> >> - Hide quoted text -
>>
>> >> - Show quoted text -- Hide quoted text -
>>
>> - Show quoted text -
>

AddThis Social Bookmark Button