|
sql
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Update Timeout and indexesI have two questions about Updating a large table. It has about 40 million rows. When I try to do an update on one of the fields, it goes on forever until I kill it. The last time this happened, after I killed it, I noticed that the messages screen said that the number of records (40 million) were effected by the update (that they had changed). It also had something about a Timeout error (sorry but I did not catch the exact wording and I won't be running it again like that). In order to try to get by this, I do the update in sections (update the first 2 million, then the next, etc...) My thought is that some sort of timeout is occuring at the end and the connection is lost. Is that correct? The other question has to do with updating a field that is indexed. When is the index changed? Is it each time the field is changed (which would be 40 million times for my update above) or is it done all at once at the end? My idea is that the index is being changed at the end of the big update (40 million rows) and it is timeing out. The problem with that theory though is that the problem with the 40 million row update happens if the updated field is indexed or not. Of course, there are indexes on other fields so it could still be the cause. Any ideas? Thanks alot! Jim It's almost never a good idea to update that many rows at one time. You
would be better off wrapping the updates in a loop and updating say 5K or 10K rows at a time. That not only gives you better concurrency but will usually finish much faster. If you have an index on the column being updated and the table is not in use other than the update you might be better off dropping the index and recreating it once all the updates are done. Also make sure you have plenty of free space in your log file to cover all the updates. -- Show quoteAndrew J. Kelly SQL MVP Solid Quality Mentors "Jim" <jsh***@datamann.com> wrote in message news:85ef74d9-7c5a-414c-b670-aa5708c00655@l1g2000hsa.googlegroups.com... > > Hi, > > I have two questions about Updating a large table. It has about 40 > million rows. When I try to do an update on one of the fields, it > goes on forever until I kill it. The last time this happened, after I > killed it, I noticed that the messages screen said that the number of > records (40 million) were effected by the update (that they had > changed). It also had something about a Timeout error (sorry but I > did not catch the exact wording and I won't be running it again like > that). In order to try to get by this, I do the update in sections > (update the first 2 million, then the next, etc...) My thought is > that some sort of timeout is occuring at the end and the connection is > lost. Is that correct? > > The other question has to do with updating a field that is indexed. > When is the index changed? Is it each time the field is changed > (which would be 40 million times for my update above) or is it done > all at once at the end? > > My idea is that the index is being changed at the end of the big > update (40 million rows) and it is timeing out. The problem with that > theory though is that the problem with the 40 million row update > happens if the updated field is indexed or not. Of course, there are > indexes on other fields so it could still be the cause. > > Any ideas? > > Thanks alot! > > Jim Thanks!
The scripts that I have to do the loop on a smaller update statement are useful. Each update runs quickly (about 20 minutes for 2 million rows) but there are several databases that have a large number or rows. I can easily generalize the scripts so that it work anywhere but I'm wondering why this is a problem. In the grand scheme of things, all tables are getting larger and 40 million rows is not alot. It should work (and maybe even faster than the speed I got from the sectioning :-) As for updating an indexed field, that too should work without a decrease in speed. Show quote On Nov 21, 3:55 pm, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com> wrote: > It's almost never a good idea to update that many rows at one time. You > would be better off wrapping the updates in a loop and updating say 5K or > 10K rows at a time. That not only gives you better concurrency but will > usually finish much faster. If you have an index on the column being updated > and the table is not in use other than the update you might be better off > dropping the index and recreating it once all the updates are done. Also > make sure you have plenty of free space in your log file to cover all the > updates. > > -- > Andrew J. Kelly SQL MVP > Solid Quality Mentors > > "Jim" <jsh***@datamann.com> wrote in message > > news:85ef74d9-7c5a-414c-b670-aa5708c00655@l1g2000hsa.googlegroups.com... > > > > > > > Hi, > > > I have two questions about Updating a large table. It has about 40 > > million rows. When I try to do an update on one of the fields, it > > goes on forever until I kill it. The last time this happened, after I > > killed it, I noticed that the messages screen said that the number of > > records (40 million) were effected by the update (that they had > > changed). It also had something about a Timeout error (sorry but I > > did not catch the exact wording and I won't be running it again like > > that). In order to try to get by this, I do the update in sections > > (update the first 2 million, then the next, etc...) My thought is > > that some sort of timeout is occuring at the end and the connection is > > lost. Is that correct? > > > The other question has to do with updating a field that is indexed. > > When is the index changed? Is it each time the field is changed > > (which would be 40 million times for my update above) or is it done > > all at once at the end? > > > My idea is that the index is being changed at the end of the big > > update (40 million rows) and it is timeing out. The problem with that > > theory though is that the problem with the 40 million row update > > happens if the updated field is indexed or not. Of course, there are > > indexes on other fields so it could still be the cause. > > > Any ideas? > > > Thanks alot! > > > Jim- Hide quoted text - > > - Show quoted text - The default timeout for most ODBC connections is 30 seconds. So if your
update takes longer than that it can very easily timeout. You can change the timeout to anything you like so it does not timeout but if you are trying to update 40 million rows online your other users will not be happy with you. The amount of time it takes depends on lots of factors such as the hardware and database configurations, the type of updates, number of rows affected, blocking, number and type of indexes etc. The hardware itself will dictate a lot of how fast this happens so if you plan to do lots of these large updates you need to ensure the hardware is setup for that as well. Your transaction log will also play a large role in this and if you have the tran log on the same drive array as the data files it will make things much worse. I don't know how fast you think it should be but I don't believe you realize everything that is happening when you make an update such as that. -- Show quoteAndrew J. Kelly SQL MVP Solid Quality Mentors "Jim" <jsh***@datamann.com> wrote in message news:d7dd33f0-78ca-4af2-a288-84c174b149ef@l1g2000hsa.googlegroups.com... > > Thanks! > > The scripts that I have to do the loop on a smaller update statement > are useful. Each update runs quickly (about 20 minutes for 2 million > rows) but there are several databases that have a large number or > rows. I can easily generalize the scripts so that it work anywhere > but I'm wondering why this is a problem. In the grand scheme of > things, all tables are getting larger and 40 million rows is not > alot. It should work (and maybe even faster than the speed I got from > the sectioning :-) > > As for updating an indexed field, that too should work without a > decrease in speed. > > > > > On Nov 21, 3:55 pm, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com> > wrote: >> It's almost never a good idea to update that many rows at one time. You >> would be better off wrapping the updates in a loop and updating say 5K or >> 10K rows at a time. That not only gives you better concurrency but will >> usually finish much faster. If you have an index on the column being >> updated >> and the table is not in use other than the update you might be better off >> dropping the index and recreating it once all the updates are done. Also >> make sure you have plenty of free space in your log file to cover all the >> updates. >> >> -- >> Andrew J. Kelly SQL MVP >> Solid Quality Mentors >> >> "Jim" <jsh***@datamann.com> wrote in message >> >> news:85ef74d9-7c5a-414c-b670-aa5708c00655@l1g2000hsa.googlegroups.com... >> >> >> >> >> >> > Hi, >> >> > I have two questions about Updating a large table. It has about 40 >> > million rows. When I try to do an update on one of the fields, it >> > goes on forever until I kill it. The last time this happened, after I >> > killed it, I noticed that the messages screen said that the number of >> > records (40 million) were effected by the update (that they had >> > changed). It also had something about a Timeout error (sorry but I >> > did not catch the exact wording and I won't be running it again like >> > that). In order to try to get by this, I do the update in sections >> > (update the first 2 million, then the next, etc...) My thought is >> > that some sort of timeout is occuring at the end and the connection is >> > lost. Is that correct? >> >> > The other question has to do with updating a field that is indexed. >> > When is the index changed? Is it each time the field is changed >> > (which would be 40 million times for my update above) or is it done >> > all at once at the end? >> >> > My idea is that the index is being changed at the end of the big >> > update (40 million rows) and it is timeing out. The problem with that >> > theory though is that the problem with the 40 million row update >> > happens if the updated field is indexed or not. Of course, there are >> > indexes on other fields so it could still be the cause. >> >> > Any ideas? >> >> > Thanks alot! >> >> > Jim- Hide quoted text - >> >> - Show quoted text - > Yes, I understand that the hardware plays a large part in the time the
update takes. I have the configurations you discuss below and the users are ok with a long update running. When I run the update in 2 million row chunks, each set takes about 20 minutes. With the 40 million row update, I would expect it to take 20 * 20 = 400 minutes (ok, maybe a little longer :-). Is that unreasonable? What happens is that something times out and after a day of running, I kill it and it tries to undo the update. I then reboot the system and the server runs a recovery which takes several hours.... Very frustrating... You mention a timeout below. What is it? Can I make it longer? I still think it is timing out when it is doing the indexing portion of the update. I can't test it by dropping the indexes and running it because it has several foreign indexes. I don't think I mentioned it above but I'm running SQL Server 200 on an XP machine. Thanks for your help! Jim Show quote On Nov 23, 10:57 am, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com> wrote: > The default timeout for most ODBC connections is 30 seconds. So if yourupdatetakes longer than that it can very easily timeout. You can change the > timeout to anything you like so it does not timeout but if you are trying toupdate40millionrowsonline your other users will not be happy with you. > The amount of time it takes depends on lots of factors such as the hardware > and database configurations, the type of updates, number ofrowsaffected, > blocking, number and type of indexes etc. The hardware itself will dictate > a lot of how fast this happens so if you plan to do lots of these large > updates you need to ensure the hardware is setup for that as well. Your > transaction log will also play a large role in this and if you have the tran > log on the same drive array as the data files it will make things much > worse. I don't know how fast you think it should be but I don't believe you > realize everything that is happening when you make anupdatesuch as that. > > -- > Andrew J. Kelly SQL MVP > Solid Quality Mentors > > "Jim" <jsh***@datamann.com> wrote in message > > news:d7dd33f0-78ca-4af2-a288-84c174b149ef@l1g2000hsa.googlegroups.com... > > > > > > > Thanks! > > > The scripts that I have to do the loop on a smallerupdatestatement > > are useful. Eachupdateruns quickly (about 20 minutes for 2million > >rows) but there are several databases that have a large number or > >rows. I can easily generalize the scripts so that it work anywhere > > but I'm wondering why this is a problem. In the grand scheme of > > things, all tables are getting larger and40millionrowsis not > > alot. It should work (and maybe even faster than the speed I got from > > the sectioning :-) > > > As for updating an indexed field, that too should work without a > > decrease in speed. > > > On Nov 21, 3:55 pm, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com> > > wrote: > >> It's almost never a good idea toupdatethat manyrowsat one time. You > >> would be better off wrapping the updates in a loop and updating say 5K or > >> 10Krowsat a time. That not only gives you better concurrency but will > >> usually finish much faster. If you have an index on the column being > >> updated > >> and the table is not in use other than theupdateyou might be better off > >> dropping the index and recreating it once all the updates are done. Also > >> make sure you have plenty of free space in your log file to cover all the > >> updates. > > >> -- > >> Andrew J. Kelly SQL MVP > >> Solid Quality Mentors > > >> "Jim" <jsh***@datamann.com> wrote in message > > >>news:85ef74d9-7c5a-414c-b670-aa5708c00655@l1g2000hsa.googlegroups.com... > > >> > Hi, > > >> > I have two questions about Updating a large table. It has about40 > >> >millionrows. When I try to do anupdateon one of the fields, it > >> > goes on forever until I kill it. The last time this happened, after I > >> > killed it, I noticed that the messages screen said that the number of > >> > records (40million) were effected by theupdate (that they had > >> > changed). It also had something about a Timeout error (sorry but I > >> > did not catch the exact wording and I won't be running it again like > >> > that). In order to try to get by this, I do theupdatein sections > >> > (updatethe first 2million, then the next, etc...) My thought is > >> > that some sort of timeout is occuring at the end and the connection is > >> > lost. Is that correct? > > >> > The other question has to do with updating a field that is indexed. > >> > When is the index changed? Is it each time the field is changed > >> > (which would be40milliontimes for myupdateabove) or is it done > >> > all at once at the end? > > >> > My idea is that the index is being changed at the end of the big > >> >update(40millionrows) and it is timeing out. The problem with that > >> > theory though is that the problem with the40millionrowupdate > >> > happens if the updated field is indexed or not. Of course, there are > >> > indexes on other fields so it could still be the cause. > > >> > Any ideas? > > >> > Thanks alot! > > >> > Jim- Hide quoted text - > > >> - Show quoted text -- Hide quoted text - > > - Show quoted text - Actually your 20*20 theory is probably 'unreasonable'. Server resources
often reach a point of utilization I call the break-over point, whereby increased requests for the resource lead to SLOWER performance - sometimes slowing at an increasing rate with increased requests. The 40M row update batch could well take 500 or even 1000 minutes, depending on the slope of the curve. Yes, applications can manually set their connection timeout. The mechanism for this varies with the differing connectivity methods. Are you using a DSN or DSN-less connection? Does the code present an ini file you can use? -- Show quoteKevin G. Boles TheSQLGuru Indicium Resources, Inc. <jimsh***@gmail.com> wrote in message news:faade63f-27fc-427a-a859-9873ca07144c@o6g2000hsd.googlegroups.com... > > > Yes, I understand that the hardware plays a large part in the time the > update takes. I have the configurations you discuss below and the > users are ok with a long update running. > > When I run the update in 2 million row chunks, each set takes about 20 > minutes. With the 40 million row update, I would expect it to take 20 > * 20 = 400 minutes (ok, maybe a little longer :-). Is that > unreasonable? What happens is that something times out and after a day > of running, I kill it and it tries to undo the update. I then reboot > the system and the server runs a recovery which takes several > hours.... Very frustrating... > > You mention a timeout below. What is it? Can I make it longer? I > still think it is timing out when it is doing the indexing portion of > the update. I can't test it by dropping the indexes and running it > because it has several foreign indexes. > > I don't think I mentioned it above but I'm running SQL Server 200 on > an XP machine. > > Thanks for your help! > > Jim > > On Nov 23, 10:57 am, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com> > wrote: >> The default timeout for most ODBC connections is 30 seconds. So if >> yourupdatetakes longer than that it can very easily timeout. You can >> change the >> timeout to anything you like so it does not timeout but if you are trying >> toupdate40millionrowsonline your other users will not be happy with you. >> The amount of time it takes depends on lots of factors such as the >> hardware >> and database configurations, the type of updates, number ofrowsaffected, >> blocking, number and type of indexes etc. The hardware itself will >> dictate >> a lot of how fast this happens so if you plan to do lots of these large >> updates you need to ensure the hardware is setup for that as well. Your >> transaction log will also play a large role in this and if you have the >> tran >> log on the same drive array as the data files it will make things much >> worse. I don't know how fast you think it should be but I don't believe >> you >> realize everything that is happening when you make anupdatesuch as that. >> >> -- >> Andrew J. Kelly SQL MVP >> Solid Quality Mentors >> >> "Jim" <jsh***@datamann.com> wrote in message >> >> news:d7dd33f0-78ca-4af2-a288-84c174b149ef@l1g2000hsa.googlegroups.com... >> >> >> >> >> >> > Thanks! >> >> > The scripts that I have to do the loop on a smallerupdatestatement >> > are useful. Eachupdateruns quickly (about 20 minutes for 2million >> >rows) but there are several databases that have a large number or >> >rows. I can easily generalize the scripts so that it work anywhere >> > but I'm wondering why this is a problem. In the grand scheme of >> > things, all tables are getting larger and40millionrowsis not >> > alot. It should work (and maybe even faster than the speed I got from >> > the sectioning :-) >> >> > As for updating an indexed field, that too should work without a >> > decrease in speed. >> >> > On Nov 21, 3:55 pm, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com> >> > wrote: >> >> It's almost never a good idea toupdatethat manyrowsat one time. You >> >> would be better off wrapping the updates in a loop and updating say 5K >> >> or >> >> 10Krowsat a time. That not only gives you better concurrency but will >> >> usually finish much faster. If you have an index on the column being >> >> updated >> >> and the table is not in use other than theupdateyou might be better >> >> off >> >> dropping the index and recreating it once all the updates are done. >> >> Also >> >> make sure you have plenty of free space in your log file to cover all >> >> the >> >> updates. >> >> >> -- >> >> Andrew J. Kelly SQL MVP >> >> Solid Quality Mentors >> >> >> "Jim" <jsh***@datamann.com> wrote in message >> >> >>news:85ef74d9-7c5a-414c-b670-aa5708c00655@l1g2000hsa.googlegroups.com... >> >> >> > Hi, >> >> >> > I have two questions about Updating a large table. It has about40 >> >> >millionrows. When I try to do anupdateon one of the fields, it >> >> > goes on forever until I kill it. The last time this happened, after >> >> > I >> >> > killed it, I noticed that the messages screen said that the number >> >> > of >> >> > records (40million) were effected by theupdate (that they had >> >> > changed). It also had something about a Timeout error (sorry but I >> >> > did not catch the exact wording and I won't be running it again like >> >> > that). In order to try to get by this, I do theupdatein sections >> >> > (updatethe first 2million, then the next, etc...) My thought is >> >> > that some sort of timeout is occuring at the end and the connection >> >> > is >> >> > lost. Is that correct? >> >> >> > The other question has to do with updating a field that is indexed. >> >> > When is the index changed? Is it each time the field is changed >> >> > (which would be40milliontimes for myupdateabove) or is it done >> >> > all at once at the end? >> >> >> > My idea is that the index is being changed at the end of the big >> >> >update(40millionrows) and it is timeing out. The problem with that >> >> > theory though is that the problem with the40millionrowupdate >> >> > happens if the updated field is indexed or not. Of course, there >> >> > are >> >> > indexes on other fields so it could still be the cause. >> >> >> > Any ideas? >> >> >> > Thanks alot! >> >> >> > Jim- Hide quoted text - >> >> >> - Show quoted text -- Hide quoted text - >> >> - Show quoted text - > I agree with the time estimate increase you mention below. It just
seems weird that the whole thing would timeout though. The server could do the breakup if there were alot of rows (IMHO). Do you know if SQL 2005 or even the latest version would be better at large updates? I have not setup a DSN on the server. There isn't an ini file that I know of. Thanks again! Jim Show quote On Nov 26, 11:00 am, "TheSQLGuru" <kgbo***@earthlink.net> wrote: > Actually your 20*20 theory is probably 'unreasonable'. Server resources > often reach a point of utilization I call the break-over point, whereby > increased requests for the resource lead to SLOWER performance - sometimes > slowing at an increasing rate with increased requests. The 40M rowupdate > batch could well take 500 or even 1000 minutes, depending on the slope of > the curve. > > Yes, applications can manually set their connection timeout. The mechanism > for this varies with the differing connectivity methods. Are you using a > DSN or DSN-less connection? Does the code present an ini file you can use? > > -- > Kevin G. Boles > TheSQLGuru > Indicium Resources, Inc. > > <jimsh***@gmail.com> wrote in message > > news:faade63f-27fc-427a-a859-9873ca07144c@o6g2000hsd.googlegroups.com... > > > > > > > Yes, I understand that the hardware plays a large part in the time the > >updatetakes. I have the configurations you discuss below and the > > users are ok with a longupdaterunning. > > > When I run theupdatein 2millionrow chunks, each set takes about 20 > > minutes. With the40millionrowupdate, I would expect it to take 20 > > * 20 = 400 minutes (ok, maybe a little longer :-). Is that > > unreasonable? What happens is that something times out and after a day > > of running, I kill it and it tries to undo theupdate. I then reboot > > the system and the server runs a recovery which takes several > > hours.... Very frustrating... > > > You mention a timeout below. What is it? Can I make it longer? I > > still think it is timing out when it is doing the indexing portion of > > theupdate. I can't test it by dropping the indexes and running it > > because it has several foreign indexes. > > > I don't think I mentioned it above but I'm running SQL Server 200 on > > an XP machine. > > > Thanks for your help! > > > Jim > > > On Nov 23, 10:57 am, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com> > > wrote: > >> The default timeout for most ODBC connections is 30 seconds. So if > >> yourupdatetakes longer than that it can very easily timeout. You can > >> change the > >> timeout to anything you like so it does not timeout but if you are trying > >> toupdate40millionrowsonline your other users will not be happy with you. > >> The amount of time it takes depends on lots of factors such as the > >> hardware > >> and database configurations, the type of updates, number ofrowsaffected, > >> blocking, number and type of indexes etc. The hardware itself will > >> dictate > >> a lot of how fast this happens so if you plan to do lots of these large > >> updates you need to ensure the hardware is setup for that as well. Your > >> transaction log will also play a large role in this and if you have the > >> tran > >> log on the same drive array as the data files it will make things much > >> worse. I don't know how fast you think it should be but I don't believe > >> you > >> realize everything that is happening when you make anupdatesuch as that. > > >> -- > >> Andrew J. Kelly SQL MVP > >> Solid Quality Mentors > > >> "Jim" <jsh***@datamann.com> wrote in message > > >>news:d7dd33f0-78ca-4af2-a288-84c174b149ef@l1g2000hsa.googlegroups.com... > > >> > Thanks! > > >> > The scripts that I have to do the loop on a smallerupdatestatement > >> > are useful. Eachupdateruns quickly (about 20 minutes for 2million > >> >rows) but there are several databases that have a large number or > >> >rows. I can easily generalize the scripts so that it work anywhere > >> > but I'm wondering why this is a problem. In the grand scheme of > >> > things, all tables are getting larger and40millionrowsis not > >> > alot. It should work (and maybe even faster than the speed I got from > >> > the sectioning :-) > > >> > As for updating an indexed field, that too should work without a > >> > decrease in speed. > > >> > On Nov 21, 3:55 pm, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com> > >> > wrote: > >> >> It's almost never a good idea toupdatethat manyrowsat one time. You > >> >> would be better off wrapping the updates in a loop and updating say 5K > >> >> or > >> >> 10Krowsat a time. That not only gives you better concurrency but will > >> >> usually finish much faster. If you have an index on the column being > >> >> updated > >> >> and the table is not in use other than theupdateyou might be better > >> >> off > >> >> dropping the index and recreating it once all the updates are done. > >> >> Also > >> >> make sure you have plenty of free space in your log file to cover all > >> >> the > >> >> updates. > > >> >> -- > >> >> Andrew J. Kelly SQL MVP > >> >> Solid Quality Mentors > > >> >> "Jim" <jsh***@datamann.com> wrote in message > > >> >>news:85ef74d9-7c5a-414c-b670-aa5708c00655@l1g2000hsa.googlegroups.com... > > >> >> > Hi, > > >> >> > I have two questions about Updating a large table. It has about40 > >> >> >millionrows. When I try to do anupdateon one of the fields, it > >> >> > goes on forever until I kill it. The last time this happened, after > >> >> > I > >> >> > killed it, I noticed that the messages screen said that the number > >> >> > of > >> >> > records (40million) were effected by theupdate (that they had > >> >> > changed). It also had something about a Timeout error (sorry but I > >> >> > did not catch the exact wording and I won't be running it again like > >> >> > that). In order to try to get by this, I do theupdatein sections > >> >> > (updatethe first 2million, then the next, etc...) My thought is > >> >> > that some sort of timeout is occuring at the end and the connection > >> >> > is > >> >> > lost. Is that correct? > > >> >> > The other question has to do with updating a field that is indexed. > >> >> > When is the index changed? Is it each time the field is changed > >> >> > (which would be40milliontimes for myupdateabove) or is it done > >> >> > all at once at the end? > > >> >> > My idea is that the index is being changed at the end of the big > >> >> >update(40millionrows) and it is timeing out. The problem with that > >> >> > theory though is that the problem with the40millionrowupdate > >> >> > happens if the updated field is indexed or not. Of course, there > >> >> > are > >> >> > indexes on other fields so it could still be the cause. > > >> >> > Any ideas? > > >> >> > Thanks alot! > > >> >> > Jim- Hide quoted text - > > >> >> - Show quoted text -- Hide quoted text - > > >> - Show quoted text -- Hide quoted text - > > - Show quoted text - > I agree with the time estimate increase you mention below. It just I agree with Kevin on this as the time will definitely not be double for > seems weird that the whole thing would timeout though. updating double the rows with that many. > The server No that would be bad. The update must be atomic across all the rows being > could do the breakup if there were allot of rows (IMHO). updated otherwise the database would not be very well accepted in the enterprise. You can't leave it up to the Engine to determine where to break this up that is your job because it may be different for each instance. > Do you know if SQL 2005 or even the latest version would be better at The principles are the same. You have to understand everything that happens > large updates? when you do an update of any size and then multiply that by x for larger ones. Each row that gets updated must be logged to the transaction log in a way that it can redo or undo in the event of issues. The larger the transaction the more overhead it will take and the more it will affect other users and other operations in the transaction log. For instance you can't truncate or reuse any portion of the log that has open transactions in it. By breaking it up into smaller transactions you dramatically reduce the exposure and overhead of each transaction. This is how it has been from day one and will be tomorrow and the day after. But again the biggest factor in how long it takes is your hardware setup. Having the data and log files on the same physical drive when trying to do large data modifications is asking for trouble. If you want fast (or reasonable) updates on 40 million rows you need to ensure you have the proper hardware for the task. As for the timeouts that can be set at the connection level. Change the timeout on the connection to infinite and it should not timeout. -- Show quoteAndrew J. Kelly SQL MVP Solid Quality Mentors <jimsh***@gmail.com> wrote in message news:f88c15e0-fbb4-45b1-86f6-7f9de0f4bb30@e25g2000prg.googlegroups.com... > Thanks!
The log and db files have been on separate disks since before the problem occurred. I agree that the hardware configuration dictates the time an update will take. It's down below in one of the messages so I just want to point out that when I killed the larger update last time, I got a message saying that there were 40 million records affected. There was then a timeout message below that. These messages indicate to me that the update and the indexing completed without a problem. It also indicates to me that the Hardware configuration is acceptable for a 40 million row update. It also indicates to me that the server is not waiting around for the update to complete. That is how I got to trying to fix the server timeout problem. The hardware is fine. The timeout is not. Thanks again! Jim Show quote On Nov 26, 7:27 pm, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com> wrote: > > I agree with the time estimate increase you mention below. It just > > seems weird that the whole thing would timeout though. > > I agree with Kevin on this as the time will definitely not be double for > updating double therowswith that many. > > > The server > > could do the breakup if there were allot ofrows(IMHO). > > No that would be bad. Theupdatemust be atomic across all therowsbeing > updated otherwise the database would not be very well accepted in the > enterprise. You can't leave it up to the Engine to determine where to break > this up that is your job because it may be different for each instance. > > > Do you know if SQL 2005 or even the latest version would be better at > > large updates? > > The principles are the same. You have to understand everything that happens > when you do anupdateof any size and then multiply that by x for larger > ones. Each row that gets updated must be logged to the transaction log in a > way that it can redo or undo in the event of issues. The larger the > transaction the more overhead it will take and the more it will affect other > users and other operations in the transaction log. For instance you can't > truncate or reuse any portion of the log that has open transactions in it. > By breaking it up into smaller transactions you dramatically reduce the > exposure and overhead of each transaction. This is how it has been from day > one and will be tomorrow and the day after. But again the biggest factor in > how long it takes is your hardware setup. Having the data and log files on > the same physical drive when trying to do large data modifications is asking > for trouble. If you want fast (or reasonable) updates on40millionrowsyou > need to ensure you have the proper hardware for the task. > > As for the timeouts that can be set at the connection level. Change the > timeout on the connection to infinite and it should not timeout. > > -- > Andrew J. Kelly SQL MVP > Solid Quality Mentors > > <jimsh***@gmail.com> wrote in message > > news:f88c15e0-fbb4-45b1-86f6-7f9de0f4bb30@e25g2000prg.googlegroups.com... > > > > - Hide quoted text - > > - Show quoted text - Jim,
> The log and db files have been on separate disks since before the Yes but are they on separate physical disks or just logical? It makes a big > problem occurred. difference and I wasn't sure which way you have it. > I got a message Interesting, what was the exact code or sequence you used for the update? > saying that there were 40 million records affected. There was then a > timeout message below that. Did you wrap it in a transaction or was there any other code in the batch? You can run a profiler trace at the statement level (both starting and completed events) for that spid to see what is happening and when in relation to the timeout. And I would also try changing the connection timeout to infinite to see if that changes the behavior. Between the two it should narrow down the issue significantly. -- Show quoteAndrew J. Kelly SQL MVP Solid Quality Mentors <jimsh***@gmail.com> wrote in message news:eb71f288-ef8c-47bf-bf44-4a0e285339e7@v4g2000hsf.googlegroups.com... > > Thanks! > > The log and db files have been on separate disks since before the > problem occurred. > > I agree that the hardware configuration dictates the time an update > will take. > > It's down below in one of the messages so I just want to point out > that when I killed the larger update last time, I got a message > saying that there were 40 million records affected. There was then a > timeout message below that. These messages indicate to me that the > update and the indexing completed without a problem. It also > indicates to me that the Hardware configuration is acceptable for a 40 > million row update. It also indicates to me that the server is not > waiting around for the update to complete. That is how I got to > trying to fix the server timeout problem. The hardware is fine. The > timeout is not. > > Thanks again! > > Jim > > > On Nov 26, 7:27 pm, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com> > wrote: >> > I agree with the time estimate increase you mention below. It just >> > seems weird that the whole thing would timeout though. >> >> I agree with Kevin on this as the time will definitely not be double for >> updating double therowswith that many. >> >> > The server >> > could do the breakup if there were allot ofrows(IMHO). >> >> No that would be bad. Theupdatemust be atomic across all therowsbeing >> updated otherwise the database would not be very well accepted in the >> enterprise. You can't leave it up to the Engine to determine where to >> break >> this up that is your job because it may be different for each instance. >> >> > Do you know if SQL 2005 or even the latest version would be better at >> > large updates? >> >> The principles are the same. You have to understand everything that >> happens >> when you do anupdateof any size and then multiply that by x for larger >> ones. Each row that gets updated must be logged to the transaction log in >> a >> way that it can redo or undo in the event of issues. The larger the >> transaction the more overhead it will take and the more it will affect >> other >> users and other operations in the transaction log. For instance you can't >> truncate or reuse any portion of the log that has open transactions in >> it. >> By breaking it up into smaller transactions you dramatically reduce the >> exposure and overhead of each transaction. This is how it has been from >> day >> one and will be tomorrow and the day after. But again the biggest factor >> in >> how long it takes is your hardware setup. Having the data and log files >> on >> the same physical drive when trying to do large data modifications is >> asking >> for trouble. If you want fast (or reasonable) updates on40millionrowsyou >> need to ensure you have the proper hardware for the task. >> >> As for the timeouts that can be set at the connection level. Change the >> timeout on the connection to infinite and it should not timeout. >> >> -- >> Andrew J. Kelly SQL MVP >> Solid Quality Mentors >> >> <jimsh***@gmail.com> wrote in message >> >> news:f88c15e0-fbb4-45b1-86f6-7f9de0f4bb30@e25g2000prg.googlegroups.com... >> >> >> >> - Hide quoted text - >> >> - Show quoted text - > The log and DB files are on separate physical disks.
The update used an inner join to another table. The other table is fully indexed and there is a one-to-one match between the field in the big table and the field in the other (smaller) table. It was something like: Update bigtable set field1 = SmallTable.field6 inner join Smalltable on bigtable.field2 = SmallTable.field7 and BigTable.field3 = SmallTable.field8 where BigTable.field4 = 'SomeString' The above updates field1 of the Bigtable with field6 from SmallTable. The inner join involves 2 other fields in the Bigtable and and 2 other fields in smalltable. This update is only done when field4 of the Bigtable is 'SomeString'. Everything is indexed. I have tried the Update using a variety of other configurations (remove the inner join, remove the where clause, etc.) and each one failed. It was not run in a transaction. There wasn't any other code. I did not run it in the profiler because it was running too slow already. :-) I was planning on running it through the profiler the next time I need to do this. I have no time estimate so I will probably have to run it for a day and then kill it. Thanks! Jim Show quote On Nov 27, 9:50 am, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com> wrote: > Jim, > > > The log and db files have been on separate disks since before the > > problem occurred. > > Yes but are they on separate physical disks or just logical? It makes a big > difference and I wasn't sure which way you have it. > > > I got a message > > saying that there were40millionrecords affected. There was then a > > timeout message below that. > > Interesting, what was the exact code or sequence you used for theupdate? > Did you wrap it in a transaction or was there any other code in the batch? > You can run a profiler trace at the statement level (both starting and > completed events) for that spid to see what is happening and when in > relation to the timeout. And I would also try changing the connection > timeout to infinite to see if that changes the behavior. Between the two it > should narrow down the issue significantly. > > -- > Andrew J. Kelly SQL MVP > Solid Quality Mentors > > <jimsh***@gmail.com> wrote in message > > news:eb71f288-ef8c-47bf-bf44-4a0e285339e7@v4g2000hsf.googlegroups.com... > > > > > > > Thanks! > > > The log and db files have been on separate disks since before the > > problem occurred. > > > I agree that the hardware configuration dictates the time anupdate > > will take. > > > It's down below in one of the messages so I just want to point out > > that when I killed the largerupdatelast time, I got a message > > saying that there were40millionrecords affected. There was then a > > timeout message below that. These messages indicate to me that the > >updateand the indexing completed without a problem. It also > > indicates to me that the Hardware configuration is acceptable for a40 > >millionrowupdate. It also indicates to me that the server is not > > waiting around for theupdateto complete. That is how I got to > > trying to fix the server timeout problem. The hardware is fine. The > > timeout is not. > > > Thanks again! > > > Jim > > > On Nov 26, 7:27 pm, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com> > > wrote: > >> > I agree with the time estimate increase you mention below. It just > >> > seems weird that the whole thing would timeout though. > > >> I agree with Kevin on this as the time will definitely not be double for > >> updating double therowswith that many. > > >> > The server > >> > could do the breakup if there were allot ofrows(IMHO). > > >> No that would be bad. Theupdatemust be atomic across all therowsbeing > >> updated otherwise the database would not be very well accepted in the > >> enterprise. You can't leave it up to the Engine to determine where to > >> break > >> this up that is your job because it may be different for each instance. > > >> > Do you know if SQL 2005 or even the latest version would be better at > >> > large updates? > > >> The principles are the same. You have to understand everything that > >> happens > >> when you do anupdateof any size and then multiply that by x for larger > >> ones. Each row that gets updated must be logged to the transaction log in > >> a > >> way that it can redo or undo in the event of issues. The larger the > >> transaction the more overhead it will take and the more it will affect > >> other > >> users and other operations in the transaction log. For instance you can't > >> truncate or reuse any portion of the log that has open transactions in > >> it. > >> By breaking it up into smaller transactions you dramatically reduce the > >> exposure and overhead of each transaction. This is how it has been from > >> day > >> one and will be tomorrow and the day after. But again the biggest factor > >> in > >> how long it takes is your hardware setup. Having the data and log files > >> on > >> the same physical drive when trying to do large data modifications is > >> asking > >> for trouble. If you want fast (or reasonable) updates on40millionrowsyou > >> need to ensure you have the proper hardware for the task. > > >> As for the timeouts that can be set at the connection level. Change the > >> timeout on the connection to infinite and it should not timeout. > > >> -- > >> Andrew J. Kelly SQL MVP > >> Solid Quality Mentors > > >> <jimsh***@gmail.com> wrote in message > > >>news:f88c15e0-fbb4-45b1-86f6-7f9de0f4bb30@e25g2000prg.googlegroups.com... > > >> - Hide quoted text - > > >> - Show quoted text -- Hide quoted text - > > - Show quoted text - Jim,
An update (or any DML operation) on millions of rows will almost never use an index. It is far more expensive to traverse the BTrees for that many rows than to just scan the table. If you look at the estimated query plan you should see two scans. The performance should be greatly enhanced by specifying a TABLOCK for the table being updated (assuming it can get one). That way it will only take 1 lock vs. up to several million. If the smaller table isn't changing try a NOLOCK on it. This may just be the pseudo code you wrote but the JOIN syntax for the UPDATE is incorrect. It should read closer to this: Update b WITH (TABLOCK) set b.field1 = s.field6 FROM BigTable AS b inner join Smalltable AS s WITH (NOLOCK) on b.field2 = s.field7 and b.field3 = s.field8 where b.field4 = 'SomeString' If you filter on the SPID that you will do the update on the overhead will be negligable. -- Show quoteAndrew J. Kelly SQL MVP Solid Quality Mentors <jimsh***@gmail.com> wrote in message news:97e01f46-8b63-4c9b-82ed-f84b101bd573@s19g2000prg.googlegroups.com... > > The log and DB files are on separate physical disks. > > The update used an inner join to another table. The other table is > fully indexed and there is a one-to-one match between the field in the > big table and the field in the other (smaller) table. It was > something like: > > Update bigtable > set field1 = SmallTable.field6 > inner join Smalltable on bigtable.field2 = SmallTable.field7 and > BigTable.field3 = SmallTable.field8 > where BigTable.field4 = 'SomeString' > > The above updates field1 of the Bigtable with field6 from SmallTable. > The inner join involves 2 other fields in the Bigtable and and 2 other > fields in smalltable. This update is only done when field4 of the > Bigtable is 'SomeString'. Everything is indexed. I have tried the > Update using a variety of other configurations (remove the inner join, > remove the where clause, etc.) and each one failed. > > It was not run in a transaction. There wasn't any other code. > > I did not run it in the profiler because it was running too slow > already. :-) I was planning on running it through the profiler the > next time I need to do this. I have no time estimate so I will > probably have to run it for a day and then kill it. > > Thanks! > > Jim > > On Nov 27, 9:50 am, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com> > wrote: >> Jim, >> >> > The log and db files have been on separate disks since before the >> > problem occurred. >> >> Yes but are they on separate physical disks or just logical? It makes a >> big >> difference and I wasn't sure which way you have it. >> >> > I got a message >> > saying that there were40millionrecords affected. There was then a >> > timeout message below that. >> >> Interesting, what was the exact code or sequence you used for theupdate? >> Did you wrap it in a transaction or was there any other code in the >> batch? >> You can run a profiler trace at the statement level (both starting and >> completed events) for that spid to see what is happening and when in >> relation to the timeout. And I would also try changing the connection >> timeout to infinite to see if that changes the behavior. Between the two >> it >> should narrow down the issue significantly. >> >> -- >> Andrew J. Kelly SQL MVP >> Solid Quality Mentors >> >> <jimsh***@gmail.com> wrote in message >> >> news:eb71f288-ef8c-47bf-bf44-4a0e285339e7@v4g2000hsf.googlegroups.com... >> >> >> >> >> >> > Thanks! >> >> > The log and db files have been on separate disks since before the >> > problem occurred. >> >> > I agree that the hardware configuration dictates the time anupdate >> > will take. >> >> > It's down below in one of the messages so I just want to point out >> > that when I killed the largerupdatelast time, I got a message >> > saying that there were40millionrecords affected. There was then a >> > timeout message below that. These messages indicate to me that the >> >updateand the indexing completed without a problem. It also >> > indicates to me that the Hardware configuration is acceptable for a40 >> >millionrowupdate. It also indicates to me that the server is not >> > waiting around for theupdateto complete. That is how I got to >> > trying to fix the server timeout problem. The hardware is fine. The >> > timeout is not. >> >> > Thanks again! >> >> > Jim >> >> > On Nov 26, 7:27 pm, "Andrew J. Kelly" <sqlmvpnooos***@shadhawk.com> >> > wrote: >> >> > I agree with the time estimate increase you mention below. It just >> >> > seems weird that the whole thing would timeout though. >> >> >> I agree with Kevin on this as the time will definitely not be double >> >> for >> >> updating double therowswith that many. >> >> >> > The server >> >> > could do the breakup if there were allot ofrows(IMHO). >> >> >> No that would be bad. Theupdatemust be atomic across all therowsbeing >> >> updated otherwise the database would not be very well accepted in the >> >> enterprise. You can't leave it up to the Engine to determine where to >> >> break >> >> this up that is your job because it may be different for each >> >> instance. >> >> >> > Do you know if SQL 2005 or even the latest version would be better >> >> > at >> >> > large updates? >> >> >> The principles are the same. You have to understand everything that >> >> happens >> >> when you do anupdateof any size and then multiply that by x for larger >> >> ones. Each row that gets updated must be logged to the transaction log >> >> in >> >> a >> >> way that it can redo or undo in the event of issues. The larger the >> >> transaction the more overhead it will take and the more it will affect >> >> other >> >> users and other operations in the transaction log. For instance you >> >> can't >> >> truncate or reuse any portion of the log that has open transactions in >> >> it. >> >> By breaking it up into smaller transactions you dramatically reduce >> >> the >> >> exposure and overhead of each transaction. This is how it has been >> >> from >> >> day >> >> one and will be tomorrow and the day after. But again the biggest >> >> factor >> >> in >> >> how long it takes is your hardware setup. Having the data and log >> >> files >> >> on >> >> the same physical drive when trying to do large data modifications is >> >> asking >> >> for trouble. If you want fast (or reasonable) updates >> >> on40millionrowsyou >> >> need to ensure you have the proper hardware for the task. >> >> >> As for the timeouts that can be set at the connection level. Change >> >> the >> >> timeout on the connection to infinite and it should not timeout. >> >> >> -- >> >> Andrew J. Kelly SQL MVP >> >> Solid Quality Mentors >> >> >> <jimsh***@gmail.com> wrote in message >> >> >>news:f88c15e0-fbb4-45b1-86f6-7f9de0f4bb30@e25g2000prg.googlegroups.com... >> >> >> - Hide quoted text - >> >> >> - Show quoted text -- Hide quoted text - >> >> - Show quoted text - > |
|||||||||||||||||||||||