|
sql
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
SP4 reports queries blocking on themselves?Greetings.
I just installed SP4 yesterday. I've noticed that several heavy queries that worked fine in SP3a now have unusual behavior. The queries report that they are terminated as a deadlock victim; however, there aren't any DML operations running on the database at the time. When I look at the status with sp_who2, I see that the query appears to be blocked by itself: SPID 71 sleeping reporting KAGUYA 71 Jon_Test3 SELECT 904579 1404452 05/12 14:56:01 .Net SqlClient Data Provider 71 Sorry for premature posting. Here's what I meant to ask:
I've noticed that several heavy queries that worked fine in SP3a now have unusual behavior. The queries report that they are terminated as a deadlock victim; however, there aren't any DML operations running on the database at the time. When I look at the status with sp_who2, I see that the query appears to be blocked by itself: SPID Status Login BlkBy Command 71 sleeping reporting ... 71 ... SELECT sp_lock reports that the process has an exclusive lock on an extent in tempdb, but I don't see any other X locks around. Has anyone else seen anything like this? Regards, Jonathan If you haven't done it already, update statistics with full scan or run dbcc
dbreindex. Then reproduce the problem again and post the output of sp_lock and select * from sysprocesses for that spid. Adrian Show quote "Jonathan Levine" <myfoo@nospam.nospam> wrote in message news:A9F5109B-6C54-424B-9D03-A9B84E01B7F8@microsoft.com... > Sorry for premature posting. Here's what I meant to ask: > > I've noticed that several heavy queries that worked fine in SP3a now have > unusual behavior. The queries report that they are terminated as a > deadlock > victim; however, there aren't any DML operations running on the database > at > the time. When I look at the status with sp_who2, I see that the query > appears to be blocked by itself: > > SPID Status Login BlkBy Command > 71 sleeping reporting ... 71 ... SELECT > > sp_lock reports that the process has an exclusive lock on an extent in > tempdb, but I don't see any other X locks around. > > Has anyone else seen anything like this? > > Regards, > > Jonathan Adrian,
Thanks for your reply. I'm currently working on updating the stats for my big tables. I have had some update stats running for more than 24 hours -- is that normal? Here is an example of sp_locks and sysprocesses output for a query that took about an hour in sp3a, and is taking 5-10 times longer with sp4. I have updated the stats for all the tables in the query. sp_lock: 64 8 0 0 DB S GRANT 64 8 1332199796 0 TAB Sch-S GRANT 64 8 435532635 1 PAG 5:3926175 IS GRANT 64 8 1316199739 0 TAB Sch-S GRANT 64 8 1145823194 1 PAG 5:3999638 S GRANT 64 8 1284199625 0 TAB IS GRANT 64 8 435532635 1 KEY (7b001c56dc00) S GRANT 64 8 1333631844 0 TAB IS GRANT 64 8 1145823194 0 TAB IS GRANT 64 8 435532635 0 TAB IS GRANT 64 8 784721848 0 TAB IS GRANT 64 8 1236199454 0 TAB Sch-S GRANT 64 8 435532635 2 PAG 4:3195035 IS GRANT 64 8 1076198884 0 TAB Sch-S GRANT 64 8 1284199625 1 PAG 1:4998275 IS GRANT 64 8 1333631844 1 PAG 4:3107397 IS GRANT sysprocesses 64 4232 64 0x0422 0 PAGEIOLATCH_SH 8:1:5156833 8 0 13219878 1160381 0 2005-05-14 11:53:46.567 2005-05-14 12:04:11.867 0 0 sleeping 0x2B82AF0AE9CA284BB2C9C1DC1A71D94A00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 KAGUYA-Cube Microsoft SQL Server Analysis Services 2104 SELECT 000D601CAB48 TCP/IP analysis 0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0x010008001012822AC080EDB40000000000000000 76 -1 Appreciate your help. -- J Show quote "Adrian Zajkeskovic" wrote: > If you haven't done it already, update statistics with full scan or run dbcc > dbreindex. Then reproduce the problem again and post the output of sp_lock > and select * from sysprocesses for that spid. > > Adrian > > > > > "Jonathan Levine" <myfoo@nospam.nospam> wrote in message > news:A9F5109B-6C54-424B-9D03-A9B84E01B7F8@microsoft.com... > > Sorry for premature posting. Here's what I meant to ask: > > > > I've noticed that several heavy queries that worked fine in SP3a now have > > unusual behavior. The queries report that they are terminated as a > > deadlock > > victim; however, there aren't any DML operations running on the database > > at > > the time. When I look at the status with sp_who2, I see that the query > > appears to be blocked by itself: > > > > SPID Status Login BlkBy Command > > 71 sleeping reporting ... 71 ... SELECT > > > > sp_lock reports that the process has an exclusive lock on an extent in > > tempdb, but I don't see any other X locks around. > > > > Has anyone else seen anything like this? > > > > Regards, > > > > Jonathan > > > Is there any more word on this issue? We are having the same issue.
Thanks in advance. Show quote "Jonathan Levine" <myfoo@nospam.nospam> wrote in message 2005-05-14news:7A8A5EF7-4459-40B4-99A7-0EDAE49DD8EF@microsoft.com... > Adrian, > > Thanks for your reply. > > I'm currently working on updating the stats for my big tables. I have had > some update stats running for more than 24 hours -- is that normal? > > Here is an example of sp_locks and sysprocesses output for a query that took > about an hour in sp3a, and is taking 5-10 times longer with sp4. I have > updated the stats for all the tables in the query. > > sp_lock: > 64 8 0 0 DB S GRANT > 64 8 1332199796 0 TAB Sch-S GRANT > 64 8 435532635 1 PAG 5:3926175 IS GRANT > 64 8 1316199739 0 TAB Sch-S GRANT > 64 8 1145823194 1 PAG 5:3999638 S GRANT > 64 8 1284199625 0 TAB IS GRANT > 64 8 435532635 1 KEY (7b001c56dc00) S GRANT > 64 8 1333631844 0 TAB IS GRANT > 64 8 1145823194 0 TAB IS GRANT > 64 8 435532635 0 TAB IS GRANT > 64 8 784721848 0 TAB IS GRANT > 64 8 1236199454 0 TAB Sch-S GRANT > 64 8 435532635 2 PAG 4:3195035 IS GRANT > 64 8 1076198884 0 TAB Sch-S GRANT > 64 8 1284199625 1 PAG 1:4998275 IS GRANT > 64 8 1333631844 1 PAG 4:3107397 IS GRANT > > sysprocesses > 64 4232 64 0x0422 0 PAGEIOLATCH_SH 8:1:5156833 8 0 13219878 1160381 0 > 11:53:46.567 2005-05-14 12:04:11.867 0 0 sleeping 0000000000000000000000 KAGUYA-Cube> 0x2B82AF0AE9CA284BB2C9C1DC1A71D94A000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000 Show quote > > Microsoft SQL Server Analysis Services > > 2104 SELECT > > > > 000D601CAB48 TCP/IP analysis > > 0x00000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000 0x010008001012822AC080EDB40000000000000000 76 -1 > > > Appreciate your help. > > -- J > "Adrian Zajkeskovic" wrote: > > > If you haven't done it already, update statistics with full scan or run dbcc > > dbreindex. Then reproduce the problem again and post the output of sp_lock > > and select * from sysprocesses for that spid. > > > > Adrian > > > > > > > > > > "Jonathan Levine" <myfoo@nospam.nospam> wrote in message > > news:A9F5109B-6C54-424B-9D03-A9B84E01B7F8@microsoft.com... > > > Sorry for premature posting. Here's what I meant to ask: > > > > > > I've noticed that several heavy queries that worked fine in SP3a now have > > > unusual behavior. The queries report that they are terminated as a > > > deadlock > > > victim; however, there aren't any DML operations running on the database > > > at > > > the time. When I look at the status with sp_who2, I see that the query > > > appears to be blocked by itself: > > > > > > SPID Status Login BlkBy Command > > > 71 sleeping reporting ... 71 ... SELECT > > > > > > sp_lock reports that the process has an exclusive lock on an extent in > > > tempdb, but I don't see any other X locks around. > > > > > > Has anyone else seen anything like this? > > > > > > Regards, > > > > > > Jonathan > > > > > > Are Microsoft folks no longer responding to questions in this newsgroup? I
started this thread with my MSDN-registered alias last week but still haven't heard anything from [MS] folks. Thanks for your help! -- J What you are seeing is not related to locks, rather it is related to the
latch waits you see in sysprocesses. In SP4 we are able to determine the blocking spid for latch waits when the blocking spid holds the latch in exclusive or update mode. In these cases the blocking spid information is populated into the "blocked" field in sysprocesses. So why is the spid blocked by itself? Well, that has to do with how latches are used for IO operations. When an IO is issued for page, a latch is held on the page. Depending on the IO operation, the latch mode acquired is shared (SH) or exclusive (EX). The latch for the IO is acquired by the thread that issues the IO. Since all SQL Server IO operations are asynchronous, if the spid that issued the IO wants to wait for the IO to complete it will attempt to acquire another latch on the same page after issuing the IO. The first latch is released when the IO completes. This release allows the second latch request to be granted. Here's an example of how this works: 1. Spid 55 wants to read page P1 which does not exist in the buffer pool. 2. Spid 55 acquires an EX latch on page P1 -- this marks spid 55 as owning the latch. The latch is in an in memory data structure, not the physical page itself. Since the page does not yet exist in memory the mode is EX in order to force other spids that may also want to access the page to wait for the IO to complete and also to prevent them from issueing a second IO operation for the same page. 3. Spid 55 issues the IO request to read P1 from disk. 4. Since Spid 55 wants to read the page, it must wait for the IO to complete. It does this by attempting to acquire another latch (in this case a share (SH)) latch on the page. Since the latch is already held in EX, the SH request is blocked and the spid is suspended. 5. Upon completion of the IO the EX latch on the page is released. 6. The release of the EX latch grants the SH latch to spid 55. 7. Spid 55 can now read the page. For the duration between steps 4 (the SH latch acquire) and step 5 (EX latch release) sysprocesses will indicate that spid 55 is blocked by itself with a wait type of PAGEIOLATCH_XX (where XX can be SH, UP, or EX) as an indication that it is waiting for the completion of an IO that it itself issued. -- Show quoteSanteri (Santtu) Voutilainen This posting is provided "AS IS" with no warranties, and confers no rights. "Jonathan Levine" <myfoo@nospam.nospam> wrote in message news:7A8A5EF7-4459-40B4-99A7-0EDAE49DD8EF@microsoft.com... > Adrian, > > Thanks for your reply. > > I'm currently working on updating the stats for my big tables. I have > had > some update stats running for more than 24 hours -- is that normal? > > Here is an example of sp_locks and sysprocesses output for a query that > took > about an hour in sp3a, and is taking 5-10 times longer with sp4. I have > updated the stats for all the tables in the query. > > sp_lock: > 64 8 0 0 DB S GRANT > 64 8 1332199796 0 TAB Sch-S GRANT > 64 8 435532635 1 PAG 5:3926175 IS GRANT > 64 8 1316199739 0 TAB Sch-S GRANT > 64 8 1145823194 1 PAG 5:3999638 S GRANT > 64 8 1284199625 0 TAB IS GRANT > 64 8 435532635 1 KEY (7b001c56dc00) S GRANT > 64 8 1333631844 0 TAB IS GRANT > 64 8 1145823194 0 TAB IS GRANT > 64 8 435532635 0 TAB IS GRANT > 64 8 784721848 0 TAB IS GRANT > 64 8 1236199454 0 TAB Sch-S GRANT > 64 8 435532635 2 PAG 4:3195035 IS GRANT > 64 8 1076198884 0 TAB Sch-S GRANT > 64 8 1284199625 1 PAG 1:4998275 IS GRANT > 64 8 1333631844 1 PAG 4:3107397 IS GRANT > > sysprocesses > 64 4232 64 0x0422 0 PAGEIOLATCH_SH 8:1:5156833 8 0 13219878 1160381 0 > 2005-05-14 > 11:53:46.567 2005-05-14 12:04:11.867 0 0 sleeping > 0x2B82AF0AE9CA284BB2C9C1DC1A71D94A00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 > KAGUYA-Cube > > Microsoft SQL Server Analysis > Services > > 2104 SELECT > > > > 000D601CAB48 TCP/IP analysis > > 0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 > 0x010008001012822AC080EDB40000000000000000 76 -1 > > > Appreciate your help. > > -- J > "Adrian Zajkeskovic" wrote: > >> If you haven't done it already, update statistics with full scan or run >> dbcc >> dbreindex. Then reproduce the problem again and post the output of >> sp_lock >> and select * from sysprocesses for that spid. >> >> Adrian >> >> >> >> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message >> news:A9F5109B-6C54-424B-9D03-A9B84E01B7F8@microsoft.com... >> > Sorry for premature posting. Here's what I meant to ask: >> > >> > I've noticed that several heavy queries that worked fine in SP3a now >> > have >> > unusual behavior. The queries report that they are terminated as a >> > deadlock >> > victim; however, there aren't any DML operations running on the >> > database >> > at >> > the time. When I look at the status with sp_who2, I see that the query >> > appears to be blocked by itself: >> > >> > SPID Status Login BlkBy Command >> > 71 sleeping reporting ... 71 ... SELECT >> > >> > sp_lock reports that the process has an exclusive lock on an extent in >> > tempdb, but I don't see any other X locks around. >> > >> > Has anyone else seen anything like this? >> > >> > Regards, >> > >> > Jonathan >> >> >> Santtu,
Thanks very much for your reply. "Santeri Voutilainen [MSFT]" wrote: That information is certainly going to be quite helpful.> What you are seeing is not related to locks, rather it is related to the > latch waits you see in sysprocesses. In SP4 we are able to determine the > blocking spid for latch waits when the blocking spid holds the latch in > exclusive or update mode. In these cases the blocking spid information is > populated into the "blocked" field in sysprocesses. But what's alarming is that the queries in question seem to be running much slower than they did in SP3a. As I think I mentioned, several of my queries now take 5-10x longer than they used to. And one query that used to complete in under an hour now takes about 10-12 hours, then is terminated as a deadlock victim -- even though there shouldn't be any other queries running on the same database at the time the deadlock is detected. I'm not exactly sure what's going on. It happens on two different machines, but only on specific queries. One more clue is that the queries seem to do a lot of work in parallel fashion (with 4-10 threads working on the query). They then go to one thread that spends a lot of time in PAGEIOLATCH_SH state, and accumulate many many locks (after five hours, one query had accumulated more than a million locks). Any suggestions gladly welcomed! Regards, Jonathan Show quote > So why is the spid blocked by itself? Well, that has to do with how latches > are used for IO operations. When an IO is issued for page, a latch is held > on the page. Depending on the IO operation, the latch mode acquired is > shared (SH) or exclusive (EX). The latch for the IO is acquired by the > thread that issues the IO. Since all SQL Server IO operations are > asynchronous, if the spid that issued the IO wants to wait for the IO to > complete it will attempt to acquire another latch on the same page after > issuing the IO. The first latch is released when the IO completes. This > release allows the second latch request to be granted. > Here's an example of how this works: > 1. Spid 55 wants to read page P1 which does not exist in the buffer pool. > 2. Spid 55 acquires an EX latch on page P1 -- this marks spid 55 as owning > the latch. The latch is in an in memory data structure, not the physical > page itself. Since the page does not yet exist in memory the mode is EX in > order to force other spids that may also want to access the page to wait for > the IO to complete and also to prevent them from issueing a second IO > operation for the same page. > 3. Spid 55 issues the IO request to read P1 from disk. > 4. Since Spid 55 wants to read the page, it must wait for the IO to > complete. It does this by attempting to acquire another latch (in this case > a share (SH)) latch on the page. Since the latch is already held in EX, the > SH request is blocked and the spid is suspended. > 5. Upon completion of the IO the EX latch on the page is released. > 6. The release of the EX latch grants the SH latch to spid 55. > 7. Spid 55 can now read the page. > > For the duration between steps 4 (the SH latch acquire) and step 5 (EX latch > release) sysprocesses will indicate that spid 55 is blocked by itself with a > wait type of PAGEIOLATCH_XX (where XX can be SH, UP, or EX) as an indication > that it is waiting for the completion of an IO that it itself issued. > > -- > Santeri (Santtu) Voutilainen > > This posting is provided "AS IS" with no warranties, and confers no rights. > > > "Jonathan Levine" <myfoo@nospam.nospam> wrote in message > news:7A8A5EF7-4459-40B4-99A7-0EDAE49DD8EF@microsoft.com... > > Adrian, > > > > Thanks for your reply. > > > > I'm currently working on updating the stats for my big tables. I have > > had > > some update stats running for more than 24 hours -- is that normal? > > > > Here is an example of sp_locks and sysprocesses output for a query that > > took > > about an hour in sp3a, and is taking 5-10 times longer with sp4. I have > > updated the stats for all the tables in the query. > > > > sp_lock: > > 64 8 0 0 DB S GRANT > > 64 8 1332199796 0 TAB Sch-S GRANT > > 64 8 435532635 1 PAG 5:3926175 IS GRANT > > 64 8 1316199739 0 TAB Sch-S GRANT > > 64 8 1145823194 1 PAG 5:3999638 S GRANT > > 64 8 1284199625 0 TAB IS GRANT > > 64 8 435532635 1 KEY (7b001c56dc00) S GRANT > > 64 8 1333631844 0 TAB IS GRANT > > 64 8 1145823194 0 TAB IS GRANT > > 64 8 435532635 0 TAB IS GRANT > > 64 8 784721848 0 TAB IS GRANT > > 64 8 1236199454 0 TAB Sch-S GRANT > > 64 8 435532635 2 PAG 4:3195035 IS GRANT > > 64 8 1076198884 0 TAB Sch-S GRANT > > 64 8 1284199625 1 PAG 1:4998275 IS GRANT > > 64 8 1333631844 1 PAG 4:3107397 IS GRANT > > > > sysprocesses > > 64 4232 64 0x0422 0 PAGEIOLATCH_SH 8:1:5156833 8 0 13219878 1160381 0 > > 2005-05-14 > > 11:53:46.567 2005-05-14 12:04:11.867 0 0 sleeping > > 0x2B82AF0AE9CA284BB2C9C1DC1A71D94A00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 > > KAGUYA-Cube > > > > Microsoft SQL Server Analysis > > Services > > > > 2104 SELECT > > > > > > > > 000D601CAB48 TCP/IP analysis > > > > 0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 > > 0x010008001012822AC080EDB40000000000000000 76 -1 > > > > > > Appreciate your help. > > > > -- J > > "Adrian Zajkeskovic" wrote: > > > >> If you haven't done it already, update statistics with full scan or run > >> dbcc > >> dbreindex. Then reproduce the problem again and post the output of > >> sp_lock > >> and select * from sysprocesses for that spid. > >> > >> Adrian > >> > >> > >> > >> > >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message > >> news:A9F5109B-6C54-424B-9D03-A9B84E01B7F8@microsoft.com... > >> > Sorry for premature posting. Here's what I meant to ask: > >> > > >> > I've noticed that several heavy queries that worked fine in SP3a now > >> > have > >> > unusual behavior. The queries report that they are terminated as a > >> > deadlock > >> > victim; however, there aren't any DML operations running on the > >> > database > >> > at > >> > the time. When I look at the status with sp_who2, I see that the query > >> > appears to be blocked by itself: > >> > > >> > SPID Status Login BlkBy Command > >> > 71 sleeping reporting ... 71 ... SELECT > >> > > >> > sp_lock reports that the process has an exclusive lock on an extent in > >> > tempdb, but I don't see any other X locks around. > >> > > >> > Has anyone else seen anything like this? > >> > > >> > Regards, > >> > > >> > Jonathan > >> > >> > >> > > > Jonathan,
Do you happen to know whether your queries had any plan changes? The increased PAGEIOLATCH waits and increased lock accumulation could be symptoms of a plan change, for example from an index seek to a table scan, or changes in join types. -- Show quoteSanteri (Santtu) Voutilainen This posting is provided "AS IS" with no warranties, and confers no rights. "Jonathan Levine" <myfoo2@nospam.nospam> wrote in message news:7E009F0E-C354-4975-854D-5A9A93036507@microsoft.com... > Santtu, > > Thanks very much for your reply. > > "Santeri Voutilainen [MSFT]" wrote: >> What you are seeing is not related to locks, rather it is related to the >> latch waits you see in sysprocesses. In SP4 we are able to determine the >> blocking spid for latch waits when the blocking spid holds the latch in >> exclusive or update mode. In these cases the blocking spid information >> is >> populated into the "blocked" field in sysprocesses. > > That information is certainly going to be quite helpful. > > But what's alarming is that the queries in question seem to be running > much > slower than they did in SP3a. > As I think I mentioned, several of my queries now take 5-10x longer than > they used to. And one query that > used to complete in under an hour now takes about 10-12 hours, then is > terminated as a deadlock victim -- > even though there shouldn't be any other queries running on the same > database at the time the deadlock is > detected. > > I'm not exactly sure what's going on. It happens on two different > machines, > but only on specific queries. One > more clue is that the queries seem to do a lot of work in parallel fashion > (with 4-10 threads working on the > query). They then go to one thread that spends a lot of time in > PAGEIOLATCH_SH state, and accumulate many > many locks (after five hours, one query had accumulated more than a > million > locks). > > Any suggestions gladly welcomed! > > Regards, > > Jonathan > >> So why is the spid blocked by itself? Well, that has to do with how >> latches >> are used for IO operations. When an IO is issued for page, a latch is >> held >> on the page. Depending on the IO operation, the latch mode acquired is >> shared (SH) or exclusive (EX). The latch for the IO is acquired by the >> thread that issues the IO. Since all SQL Server IO operations are >> asynchronous, if the spid that issued the IO wants to wait for the IO to >> complete it will attempt to acquire another latch on the same page after >> issuing the IO. The first latch is released when the IO completes. This >> release allows the second latch request to be granted. >> Here's an example of how this works: >> 1. Spid 55 wants to read page P1 which does not exist in the buffer pool. >> 2. Spid 55 acquires an EX latch on page P1 -- this marks spid 55 as >> owning >> the latch. The latch is in an in memory data structure, not the physical >> page itself. Since the page does not yet exist in memory the mode is EX >> in >> order to force other spids that may also want to access the page to wait >> for >> the IO to complete and also to prevent them from issueing a second IO >> operation for the same page. >> 3. Spid 55 issues the IO request to read P1 from disk. >> 4. Since Spid 55 wants to read the page, it must wait for the IO to >> complete. It does this by attempting to acquire another latch (in this >> case >> a share (SH)) latch on the page. Since the latch is already held in EX, >> the >> SH request is blocked and the spid is suspended. >> 5. Upon completion of the IO the EX latch on the page is released. >> 6. The release of the EX latch grants the SH latch to spid 55. >> 7. Spid 55 can now read the page. >> >> For the duration between steps 4 (the SH latch acquire) and step 5 (EX >> latch >> release) sysprocesses will indicate that spid 55 is blocked by itself >> with a >> wait type of PAGEIOLATCH_XX (where XX can be SH, UP, or EX) as an >> indication >> that it is waiting for the completion of an IO that it itself issued. >> >> -- >> Santeri (Santtu) Voutilainen >> >> This posting is provided "AS IS" with no warranties, and confers no >> rights. >> >> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message >> news:7A8A5EF7-4459-40B4-99A7-0EDAE49DD8EF@microsoft.com... >> > Adrian, >> > >> > Thanks for your reply. >> > >> > I'm currently working on updating the stats for my big tables. I have >> > had >> > some update stats running for more than 24 hours -- is that normal? >> > >> > Here is an example of sp_locks and sysprocesses output for a query that >> > took >> > about an hour in sp3a, and is taking 5-10 times longer with sp4. I >> > have >> > updated the stats for all the tables in the query. >> > >> > sp_lock: >> > 64 8 0 0 DB S GRANT >> > 64 8 1332199796 0 TAB Sch-S GRANT >> > 64 8 435532635 1 PAG 5:3926175 IS GRANT >> > 64 8 1316199739 0 TAB Sch-S GRANT >> > 64 8 1145823194 1 PAG 5:3999638 S GRANT >> > 64 8 1284199625 0 TAB IS GRANT >> > 64 8 435532635 1 KEY (7b001c56dc00) S GRANT >> > 64 8 1333631844 0 TAB IS GRANT >> > 64 8 1145823194 0 TAB IS GRANT >> > 64 8 435532635 0 TAB IS GRANT >> > 64 8 784721848 0 TAB IS GRANT >> > 64 8 1236199454 0 TAB Sch-S GRANT >> > 64 8 435532635 2 PAG 4:3195035 IS GRANT >> > 64 8 1076198884 0 TAB Sch-S GRANT >> > 64 8 1284199625 1 PAG 1:4998275 IS GRANT >> > 64 8 1333631844 1 PAG 4:3107397 IS GRANT >> > >> > sysprocesses >> > 64 4232 64 0x0422 0 PAGEIOLATCH_SH 8:1:5156833 8 0 13219878 1160381 0 >> > 2005-05-14 >> > 11:53:46.567 2005-05-14 12:04:11.867 0 0 sleeping >> > 0x2B82AF0AE9CA284BB2C9C1DC1A71D94A00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 >> > KAGUYA-Cube >> > >> > Microsoft SQL Server Analysis >> > Services >> > >> > 2104 SELECT >> > >> > >> > >> > 000D601CAB48 TCP/IP analysis >> > >> > 0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 >> > 0x010008001012822AC080EDB40000000000000000 76 -1 >> > >> > >> > Appreciate your help. >> > >> > -- J >> > "Adrian Zajkeskovic" wrote: >> > >> >> If you haven't done it already, update statistics with full scan or >> >> run >> >> dbcc >> >> dbreindex. Then reproduce the problem again and post the output of >> >> sp_lock >> >> and select * from sysprocesses for that spid. >> >> >> >> Adrian >> >> >> >> >> >> >> >> >> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message >> >> news:A9F5109B-6C54-424B-9D03-A9B84E01B7F8@microsoft.com... >> >> > Sorry for premature posting. Here's what I meant to ask: >> >> > >> >> > I've noticed that several heavy queries that worked fine in SP3a now >> >> > have >> >> > unusual behavior. The queries report that they are terminated as a >> >> > deadlock >> >> > victim; however, there aren't any DML operations running on the >> >> > database >> >> > at >> >> > the time. When I look at the status with sp_who2, I see that the >> >> > query >> >> > appears to be blocked by itself: >> >> > >> >> > SPID Status Login BlkBy Command >> >> > 71 sleeping reporting ... 71 ... SELECT >> >> > >> >> > sp_lock reports that the process has an exclusive lock on an extent >> >> > in >> >> > tempdb, but I don't see any other X locks around. >> >> > >> >> > Has anyone else seen anything like this? >> >> > >> >> > Regards, >> >> > >> >> > Jonathan >> >> >> >> >> >> >> >> >> Santu,
Thanks for your reply. "Santeri Voutilainen [MSFT]" wrote: I don't think there's been any change in plan.> Do you happen to know whether your queries had any plan changes? The > increased PAGEIOLATCH waits and increased lock accumulation could be > symptoms of a plan change, for example from an index seek to a table scan, > or changes in join types. I tried creating an indexed view to eliminate one of the joins on one of the queries. It did simplify the Estimated plan, and there are no table scans, but the query still does not complete. In the case of this query, if I limit it the query (for example SELECT TOP 100 ...) it does complete. (I estimate the result set for the full query to be on the order of hundreds of thousands of records, with some fairly straightforward GROUP BY/SUM calculations). It seems like this query ran for about 10 hours and then terminated on deadlock. I'm going to turn on the deadlock trace flags to see if I can get some more information about it. In the case of the other query, it is being run on another server during an Analysis Services cube update. Once again, I suppose the plan could have changed from SP3a to SP4, but I haven't done anything else that I would expect to make the plan change. Regards, Jonathan Show quote > > -- > Santeri (Santtu) Voutilainen > > This posting is provided "AS IS" with no warranties, and confers no rights. > > > "Jonathan Levine" <myfoo2@nospam.nospam> wrote in message > news:7E009F0E-C354-4975-854D-5A9A93036507@microsoft.com... > > Santtu, > > > > Thanks very much for your reply. > > > > "Santeri Voutilainen [MSFT]" wrote: > >> What you are seeing is not related to locks, rather it is related to the > >> latch waits you see in sysprocesses. In SP4 we are able to determine the > >> blocking spid for latch waits when the blocking spid holds the latch in > >> exclusive or update mode. In these cases the blocking spid information > >> is > >> populated into the "blocked" field in sysprocesses. > > > > That information is certainly going to be quite helpful. > > > > But what's alarming is that the queries in question seem to be running > > much > > slower than they did in SP3a. > > As I think I mentioned, several of my queries now take 5-10x longer than > > they used to. And one query that > > used to complete in under an hour now takes about 10-12 hours, then is > > terminated as a deadlock victim -- > > even though there shouldn't be any other queries running on the same > > database at the time the deadlock is > > detected. > > > > I'm not exactly sure what's going on. It happens on two different > > machines, > > but only on specific queries. One > > more clue is that the queries seem to do a lot of work in parallel fashion > > (with 4-10 threads working on the > > query). They then go to one thread that spends a lot of time in > > PAGEIOLATCH_SH state, and accumulate many > > many locks (after five hours, one query had accumulated more than a > > million > > locks). > > > > Any suggestions gladly welcomed! > > > > Regards, > > > > Jonathan > > > >> So why is the spid blocked by itself? Well, that has to do with how > >> latches > >> are used for IO operations. When an IO is issued for page, a latch is > >> held > >> on the page. Depending on the IO operation, the latch mode acquired is > >> shared (SH) or exclusive (EX). The latch for the IO is acquired by the > >> thread that issues the IO. Since all SQL Server IO operations are > >> asynchronous, if the spid that issued the IO wants to wait for the IO to > >> complete it will attempt to acquire another latch on the same page after > >> issuing the IO. The first latch is released when the IO completes. This > >> release allows the second latch request to be granted. > >> Here's an example of how this works: > >> 1. Spid 55 wants to read page P1 which does not exist in the buffer pool. > >> 2. Spid 55 acquires an EX latch on page P1 -- this marks spid 55 as > >> owning > >> the latch. The latch is in an in memory data structure, not the physical > >> page itself. Since the page does not yet exist in memory the mode is EX > >> in > >> order to force other spids that may also want to access the page to wait > >> for > >> the IO to complete and also to prevent them from issueing a second IO > >> operation for the same page. > >> 3. Spid 55 issues the IO request to read P1 from disk. > >> 4. Since Spid 55 wants to read the page, it must wait for the IO to > >> complete. It does this by attempting to acquire another latch (in this > >> case > >> a share (SH)) latch on the page. Since the latch is already held in EX, > >> the > >> SH request is blocked and the spid is suspended. > >> 5. Upon completion of the IO the EX latch on the page is released. > >> 6. The release of the EX latch grants the SH latch to spid 55. > >> 7. Spid 55 can now read the page. > >> > >> For the duration between steps 4 (the SH latch acquire) and step 5 (EX > >> latch > >> release) sysprocesses will indicate that spid 55 is blocked by itself > >> with a > >> wait type of PAGEIOLATCH_XX (where XX can be SH, UP, or EX) as an > >> indication > >> that it is waiting for the completion of an IO that it itself issued. > >> > >> -- > >> Santeri (Santtu) Voutilainen > >> > >> This posting is provided "AS IS" with no warranties, and confers no > >> rights. > >> > >> > >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message > >> news:7A8A5EF7-4459-40B4-99A7-0EDAE49DD8EF@microsoft.com... > >> > Adrian, > >> > > >> > Thanks for your reply. > >> > > >> > I'm currently working on updating the stats for my big tables. I have > >> > had > >> > some update stats running for more than 24 hours -- is that normal? > >> > > >> > Here is an example of sp_locks and sysprocesses output for a query that > >> > took > >> > about an hour in sp3a, and is taking 5-10 times longer with sp4. I > >> > have > >> > updated the stats for all the tables in the query. > >> > > >> > sp_lock: > >> > 64 8 0 0 DB S GRANT > >> > 64 8 1332199796 0 TAB Sch-S GRANT > >> > 64 8 435532635 1 PAG 5:3926175 IS GRANT > >> > 64 8 1316199739 0 TAB Sch-S GRANT > >> > 64 8 1145823194 1 PAG 5:3999638 S GRANT > >> > 64 8 1284199625 0 TAB IS GRANT > >> > 64 8 435532635 1 KEY (7b001c56dc00) S GRANT > >> > 64 8 1333631844 0 TAB IS GRANT > >> > 64 8 1145823194 0 TAB IS GRANT > >> > 64 8 435532635 0 TAB IS GRANT > >> > 64 8 784721848 0 TAB IS GRANT > >> > 64 8 1236199454 0 TAB Sch-S GRANT > >> > 64 8 435532635 2 PAG 4:3195035 IS GRANT > >> > 64 8 1076198884 0 TAB Sch-S GRANT > >> > 64 8 1284199625 1 PAG 1:4998275 IS GRANT > >> > 64 8 1333631844 1 PAG 4:3107397 IS GRANT > >> > > >> > sysprocesses > >> > 64 4232 64 0x0422 0 PAGEIOLATCH_SH 8:1:5156833 8 0 13219878 1160381 0 > >> > 2005-05-14 > >> > 11:53:46.567 2005-05-14 12:04:11.867 0 0 sleeping > >> > 0x2B82AF0AE9CA284BB2C9C1DC1A71D94A00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 > >> > KAGUYA-Cube > >> > > >> > Microsoft SQL Server Analysis > >> > Services > >> > > >> > 2104 SELECT > >> > > >> > > >> > > >> > 000D601CAB48 TCP/IP analysis > >> > > >> > 0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 > >> > 0x010008001012822AC080EDB40000000000000000 76 -1 > >> > > >> > > >> > Appreciate your help. > >> > > >> > -- J > >> > "Adrian Zajkeskovic" wrote: > >> > > >> >> If you haven't done it already, update statistics with full scan or > >> >> run > >> >> dbcc > >> >> dbreindex. Then reproduce the problem again and post the output of > >> >> sp_lock > >> >> and select * from sysprocesses for that spid. > >> >> > >> >> Adrian > >> >> > >> >> > >> >> > >> >> > >> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message > >> >> news:A9F5109B-6C54-424B-9D03-A9B84E01B7F8@microsoft.com... > >> >> > Sorry for premature posting. Here's what I meant to ask: > >> >> > > >> >> > I've noticed that several heavy queries that worked fine in SP3a now > >> >> > have > >> >> > unusual behavior. The queries report that they are terminated as a > >> >> > deadlock > >> >> > victim; however, there aren't any DML operations running on the > >> >> > database > >> >> > at > >> >> > the time. When I look at the status with sp_who2, I see that the > >> >> > query > >> >> > appears to be blocked by itself: > >> >> > > >> >> > SPID Status Login BlkBy Command > >> >> > 71 sleeping reporting ... 71 ... SELECT > >> >> > > >> >> > sp_lock reports that the process has an exclusive lock on an extent > >> >> > in > >> >> > tempdb, but I don't see any other X locks around. > >> >> > > >> >> > Has anyone else seen anything like this? > >> >> > > >> >> > Regards, > >> >> > > >> >> > Jonathan > >> >> > >> >> > >> >> > >> > >> > >> > > > Jonathan,
To help track this down, could you turn on statistics profile (set statistics profile on), run your query, and send me the statistics profile output? You can turn the statistics profile off after running the query. Do you have you auto stats enabled? If not, do you regular update statistics? Thanks, Santeri (Santtu) Voutilainen This posting is provided "AS IS" with no warranties, and confers no rights. Show quote "Jonathan Levine" <myfoo2@nospam.nospam> wrote in message news:B8769300-13CC-4E09-8CE4-672585525EFC@microsoft.com... > Santu, > > Thanks for your reply. > > "Santeri Voutilainen [MSFT]" wrote: >> Do you happen to know whether your queries had any plan changes? The >> increased PAGEIOLATCH waits and increased lock accumulation could be >> symptoms of a plan change, for example from an index seek to a table >> scan, >> or changes in join types. > > I don't think there's been any change in plan. > > I tried creating an indexed view to eliminate one of the joins on one of > the > queries. It did simplify the Estimated plan, > and there are no table scans, but the query still does not complete. > > In the case of this query, if I limit it the query (for example SELECT TOP > 100 ...) it does complete. (I estimate the result set for the full query > to > be > on the order of hundreds of thousands of records, with some fairly > straightforward GROUP BY/SUM calculations). > > It seems like this query ran for about 10 hours and then terminated on > deadlock. I'm going to turn on the deadlock trace flags > to see if I can get some more information about it. > > In the case of the other query, it is being run on another server during > an > Analysis Services cube update. Once again, > I suppose the plan could have changed from SP3a to SP4, but I haven't done > anything else that I would expect to make > the plan change. > > Regards, > > Jonathan > >> >> -- >> Santeri (Santtu) Voutilainen >> >> This posting is provided "AS IS" with no warranties, and confers no >> rights. >> >> >> "Jonathan Levine" <myfoo2@nospam.nospam> wrote in message >> news:7E009F0E-C354-4975-854D-5A9A93036507@microsoft.com... >> > Santtu, >> > >> > Thanks very much for your reply. >> > >> > "Santeri Voutilainen [MSFT]" wrote: >> >> What you are seeing is not related to locks, rather it is related to >> >> the >> >> latch waits you see in sysprocesses. In SP4 we are able to determine >> >> the >> >> blocking spid for latch waits when the blocking spid holds the latch >> >> in >> >> exclusive or update mode. In these cases the blocking spid >> >> information >> >> is >> >> populated into the "blocked" field in sysprocesses. >> > >> > That information is certainly going to be quite helpful. >> > >> > But what's alarming is that the queries in question seem to be running >> > much >> > slower than they did in SP3a. >> > As I think I mentioned, several of my queries now take 5-10x longer >> > than >> > they used to. And one query that >> > used to complete in under an hour now takes about 10-12 hours, then is >> > terminated as a deadlock victim -- >> > even though there shouldn't be any other queries running on the same >> > database at the time the deadlock is >> > detected. >> > >> > I'm not exactly sure what's going on. It happens on two different >> > machines, >> > but only on specific queries. One >> > more clue is that the queries seem to do a lot of work in parallel >> > fashion >> > (with 4-10 threads working on the >> > query). They then go to one thread that spends a lot of time in >> > PAGEIOLATCH_SH state, and accumulate many >> > many locks (after five hours, one query had accumulated more than a >> > million >> > locks). >> > >> > Any suggestions gladly welcomed! >> > >> > Regards, >> > >> > Jonathan >> > >> >> So why is the spid blocked by itself? Well, that has to do with how >> >> latches >> >> are used for IO operations. When an IO is issued for page, a latch is >> >> held >> >> on the page. Depending on the IO operation, the latch mode acquired >> >> is >> >> shared (SH) or exclusive (EX). The latch for the IO is acquired by >> >> the >> >> thread that issues the IO. Since all SQL Server IO operations are >> >> asynchronous, if the spid that issued the IO wants to wait for the IO >> >> to >> >> complete it will attempt to acquire another latch on the same page >> >> after >> >> issuing the IO. The first latch is released when the IO completes. >> >> This >> >> release allows the second latch request to be granted. >> >> Here's an example of how this works: >> >> 1. Spid 55 wants to read page P1 which does not exist in the buffer >> >> pool. >> >> 2. Spid 55 acquires an EX latch on page P1 -- this marks spid 55 as >> >> owning >> >> the latch. The latch is in an in memory data structure, not the >> >> physical >> >> page itself. Since the page does not yet exist in memory the mode is >> >> EX >> >> in >> >> order to force other spids that may also want to access the page to >> >> wait >> >> for >> >> the IO to complete and also to prevent them from issueing a second IO >> >> operation for the same page. >> >> 3. Spid 55 issues the IO request to read P1 from disk. >> >> 4. Since Spid 55 wants to read the page, it must wait for the IO to >> >> complete. It does this by attempting to acquire another latch (in >> >> this >> >> case >> >> a share (SH)) latch on the page. Since the latch is already held in >> >> EX, >> >> the >> >> SH request is blocked and the spid is suspended. >> >> 5. Upon completion of the IO the EX latch on the page is released. >> >> 6. The release of the EX latch grants the SH latch to spid 55. >> >> 7. Spid 55 can now read the page. >> >> >> >> For the duration between steps 4 (the SH latch acquire) and step 5 (EX >> >> latch >> >> release) sysprocesses will indicate that spid 55 is blocked by itself >> >> with a >> >> wait type of PAGEIOLATCH_XX (where XX can be SH, UP, or EX) as an >> >> indication >> >> that it is waiting for the completion of an IO that it itself issued. >> >> >> >> -- >> >> Santeri (Santtu) Voutilainen >> >> >> >> This posting is provided "AS IS" with no warranties, and confers no >> >> rights. >> >> >> >> >> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message >> >> news:7A8A5EF7-4459-40B4-99A7-0EDAE49DD8EF@microsoft.com... >> >> > Adrian, >> >> > >> >> > Thanks for your reply. >> >> > >> >> > I'm currently working on updating the stats for my big tables. I >> >> > have >> >> > had >> >> > some update stats running for more than 24 hours -- is that normal? >> >> > >> >> > Here is an example of sp_locks and sysprocesses output for a query >> >> > that >> >> > took >> >> > about an hour in sp3a, and is taking 5-10 times longer with sp4. I >> >> > have >> >> > updated the stats for all the tables in the query. >> >> > >> >> > sp_lock: >> >> > 64 8 0 0 DB S GRANT >> >> > 64 8 1332199796 0 TAB Sch-S GRANT >> >> > 64 8 435532635 1 PAG 5:3926175 IS GRANT >> >> > 64 8 1316199739 0 TAB Sch-S GRANT >> >> > 64 8 1145823194 1 PAG 5:3999638 S GRANT >> >> > 64 8 1284199625 0 TAB IS GRANT >> >> > 64 8 435532635 1 KEY (7b001c56dc00) S GRANT >> >> > 64 8 1333631844 0 TAB IS GRANT >> >> > 64 8 1145823194 0 TAB IS GRANT >> >> > 64 8 435532635 0 TAB IS GRANT >> >> > 64 8 784721848 0 TAB IS GRANT >> >> > 64 8 1236199454 0 TAB Sch-S GRANT >> >> > 64 8 435532635 2 PAG 4:3195035 IS GRANT >> >> > 64 8 1076198884 0 TAB Sch-S GRANT >> >> > 64 8 1284199625 1 PAG 1:4998275 IS GRANT >> >> > 64 8 1333631844 1 PAG 4:3107397 IS GRANT >> >> > >> >> > sysprocesses >> >> > 64 4232 64 0x0422 0 PAGEIOLATCH_SH 8:1:5156833 8 0 13219878 1160381 >> >> > 0 >> >> > 2005-05-14 >> >> > 11:53:46.567 2005-05-14 12:04:11.867 0 0 sleeping >> >> > 0x2B82AF0AE9CA284BB2C9C1DC1A71D94A00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 >> >> > KAGUYA-Cube >> >> > >> >> > Microsoft SQL Server Analysis >> >> > Services >> >> > >> >> > 2104 SELECT >> >> > >> >> > >> >> > >> >> > 000D601CAB48 TCP/IP analysis >> >> > >> >> > 0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 >> >> > 0x010008001012822AC080EDB40000000000000000 76 -1 >> >> > >> >> > >> >> > Appreciate your help. >> >> > >> >> > -- J >> >> > "Adrian Zajkeskovic" wrote: >> >> > >> >> >> If you haven't done it already, update statistics with full scan or >> >> >> run >> >> >> dbcc >> >> >> dbreindex. Then reproduce the problem again and post the output of >> >> >> sp_lock >> >> >> and select * from sysprocesses for that spid. >> >> >> >> >> >> Adrian >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message >> >> >> news:A9F5109B-6C54-424B-9D03-A9B84E01B7F8@microsoft.com... >> >> >> > Sorry for premature posting. Here's what I meant to ask: >> >> >> > >> >> >> > I've noticed that several heavy queries that worked fine in SP3a >> >> >> > now >> >> >> > have >> >> >> > unusual behavior. The queries report that they are terminated as >> >> >> > a >> >> >> > deadlock >> >> >> > victim; however, there aren't any DML operations running on the >> >> >> > database >> >> >> > at >> >> >> > the time. When I look at the status with sp_who2, I see that the >> >> >> > query >> >> >> > appears to be blocked by itself: >> >> >> > >> >> >> > SPID Status Login BlkBy Command >> >> >> > 71 sleeping reporting ... 71 ... SELECT >> >> >> > >> >> >> > sp_lock reports that the process has an exclusive lock on an >> >> >> > extent >> >> >> > in >> >> >> > tempdb, but I don't see any other X locks around. >> >> >> > >> >> >> > Has anyone else seen anything like this? >> >> >> > >> >> >> > Regards, >> >> >> > >> >> >> > Jonathan >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> Santtu,
Thanks for your reply. "Santeri Voutilainen [MSFT]" wrote: I'll do my best. The problem is that the main query I'm having trouble with > To help track this down, could you turn on statistics profile (set > statistics profile on), run your query, and send me the statistics profile > output? You can turn the statistics profile off after running the query. typically no longer runs to completion. The query is input to a Report Services daily report based on data loaded daily. The query used to finish in about 45 minutes (SP3a), now it can run for up to about 24 hours, at which point it deadlocks on the daily INSERT and is terminated. Another query I'm having trouble with is done by Analysis Services to refresh a partition. This one used to take about 30 minutes, now it can take up to 4 hours. The problem with this one is that the output is pretty big, so I'm not sure how long it will take to run the query in Query Analyzer. One interesting thing: If I run the Report Services query as SELECT TOP n (I've tried with n as large as 10000), it finishes really quickly. This is pretty strange to me, as most of the "heavy" work of the query is done in two sub-selects which are later joined together. > Do you have you auto stats enabled? If not, do you regular update I update statistics every day, and I automatically REINDEX when > statistics? fragmentation reaches about 10%. Regards, Jonathan Show quote > > Thanks, > Santeri (Santtu) Voutilainen > > This posting is provided "AS IS" with no warranties, and confers no rights. > > > "Jonathan Levine" <myfoo2@nospam.nospam> wrote in message > news:B8769300-13CC-4E09-8CE4-672585525EFC@microsoft.com... > > Santu, > > > > Thanks for your reply. > > > > "Santeri Voutilainen [MSFT]" wrote: > >> Do you happen to know whether your queries had any plan changes? The > >> increased PAGEIOLATCH waits and increased lock accumulation could be > >> symptoms of a plan change, for example from an index seek to a table > >> scan, > >> or changes in join types. > > > > I don't think there's been any change in plan. > > > > I tried creating an indexed view to eliminate one of the joins on one of > > the > > queries. It did simplify the Estimated plan, > > and there are no table scans, but the query still does not complete. > > > > In the case of this query, if I limit it the query (for example SELECT TOP > > 100 ...) it does complete. (I estimate the result set for the full query > > to > > be > > on the order of hundreds of thousands of records, with some fairly > > straightforward GROUP BY/SUM calculations). > > > > It seems like this query ran for about 10 hours and then terminated on > > deadlock. I'm going to turn on the deadlock trace flags > > to see if I can get some more information about it. > > > > In the case of the other query, it is being run on another server during > > an > > Analysis Services cube update. Once again, > > I suppose the plan could have changed from SP3a to SP4, but I haven't done > > anything else that I would expect to make > > the plan change. > > > > Regards, > > > > Jonathan > > > >> > >> -- > >> Santeri (Santtu) Voutilainen > >> > >> This posting is provided "AS IS" with no warranties, and confers no > >> rights. > >> > >> > >> "Jonathan Levine" <myfoo2@nospam.nospam> wrote in message > >> news:7E009F0E-C354-4975-854D-5A9A93036507@microsoft.com... > >> > Santtu, > >> > > >> > Thanks very much for your reply. > >> > > >> > "Santeri Voutilainen [MSFT]" wrote: > >> >> What you are seeing is not related to locks, rather it is related to > >> >> the > >> >> latch waits you see in sysprocesses. In SP4 we are able to determine > >> >> the > >> >> blocking spid for latch waits when the blocking spid holds the latch > >> >> in > >> >> exclusive or update mode. In these cases the blocking spid > >> >> information > >> >> is > >> >> populated into the "blocked" field in sysprocesses. > >> > > >> > That information is certainly going to be quite helpful. > >> > > >> > But what's alarming is that the queries in question seem to be running > >> > much > >> > slower than they did in SP3a. > >> > As I think I mentioned, several of my queries now take 5-10x longer > >> > than > >> > they used to. And one query that > >> > used to complete in under an hour now takes about 10-12 hours, then is > >> > terminated as a deadlock victim -- > >> > even though there shouldn't be any other queries running on the same > >> > database at the time the deadlock is > >> > detected. > >> > > >> > I'm not exactly sure what's going on. It happens on two different > >> > machines, > >> > but only on specific queries. One > >> > more clue is that the queries seem to do a lot of work in parallel > >> > fashion > >> > (with 4-10 threads working on the > >> > query). They then go to one thread that spends a lot of time in > >> > PAGEIOLATCH_SH state, and accumulate many > >> > many locks (after five hours, one query had accumulated more than a > >> > million > >> > locks). > >> > > >> > Any suggestions gladly welcomed! > >> > > >> > Regards, > >> > > >> > Jonathan > >> > > >> >> So why is the spid blocked by itself? Well, that has to do with how > >> >> latches > >> >> are used for IO operations. When an IO is issued for page, a latch is > >> >> held > >> >> on the page. Depending on the IO operation, the latch mode acquired > >> >> is > >> >> shared (SH) or exclusive (EX). The latch for the IO is acquired by > >> >> the > >> >> thread that issues the IO. Since all SQL Server IO operations are > >> >> asynchronous, if the spid that issued the IO wants to wait for the IO > >> >> to > >> >> complete it will attempt to acquire another latch on the same page > >> >> after > >> >> issuing the IO. The first latch is released when the IO completes. > >> >> This > >> >> release allows the second latch request to be granted. > >> >> Here's an example of how this works: > >> >> 1. Spid 55 wants to read page P1 which does not exist in the buffer > >> >> pool. > >> >> 2. Spid 55 acquires an EX latch on page P1 -- this marks spid 55 as > >> >> owning > >> >> the latch. The latch is in an in memory data structure, not the > >> >> physical > >> >> page itself. Since the page does not yet exist in memory the mode is > >> >> EX > >> >> in > >> >> order to force other spids that may also want to access the page to > >> >> wait > >> >> for > >> >> the IO to complete and also to prevent them from issueing a second IO > >> >> operation for the same page. > >> >> 3. Spid 55 issues the IO request to read P1 from disk. > >> >> 4. Since Spid 55 wants to read the page, it must wait for the IO to > >> >> complete. It does this by attempting to acquire another latch (in > >> >> this > >> >> case > >> >> a share (SH)) latch on the page. Since the latch is already held in > >> >> EX, > >> >> the > >> >> SH request is blocked and the spid is suspended. > >> >> 5. Upon completion of the IO the EX latch on the page is released. > >> >> 6. The release of the EX latch grants the SH latch to spid 55. > >> >> 7. Spid 55 can now read the page. > >> >> > >> >> For the duration between steps 4 (the SH latch acquire) and step 5 (EX > >> >> latch > >> >> release) sysprocesses will indicate that spid 55 is blocked by itself > >> >> with a > >> >> wait type of PAGEIOLATCH_XX (where XX can be SH, UP, or EX) as an > >> >> indication > >> >> that it is waiting for the completion of an IO that it itself issued. > >> >> > >> >> -- > >> >> Santeri (Santtu) Voutilainen > >> >> > >> >> This posting is provided "AS IS" with no warranties, and confers no > >> >> rights. > >> >> > >> >> > >> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message > >> >> news:7A8A5EF7-4459-40B4-99A7-0EDAE49DD8EF@microsoft.com... > >> >> > Adrian, > >> >> > > >> >> > Thanks for your reply. > >> >> > > >> >> > I'm currently working on updating the stats for my big tables. I > >> >> > have > >> >> > had > >> >> > some update stats running for more than 24 hours -- is that normal? > >> >> > > >> >> > Here is an example of sp_locks and sysprocesses output for a query > >> >> > that > >> >> > took > >> >> > about an hour in sp3a, and is taking 5-10 times longer with sp4. I > >> >> > have > >> >> > updated the stats for all the tables in the query. > >> >> > > >> >> > sp_lock: > >> >> > 64 8 0 0 DB S GRANT > >> >> > 64 8 1332199796 0 TAB Sch-S GRANT > >> >> > 64 8 435532635 1 PAG 5:3926175 IS GRANT > >> >> > 64 8 1316199739 0 TAB Sch-S GRANT > >> >> > 64 8 1145823194 1 PAG 5:3999638 S GRANT > >> >> > 64 8 1284199625 0 TAB IS GRANT > >> >> > 64 8 435532635 1 KEY (7b001c56dc00) S GRANT > >> >> > 64 8 1333631844 0 TAB IS GRANT > >> >> > 64 8 1145823194 0 TAB IS GRANT > >> >> > 64 8 435532635 0 TAB IS GRANT > >> >> > 64 8 784721848 0 TAB IS GRANT > >> >> > 64 8 1236199454 0 TAB Sch-S GRANT > >> >> > 64 8 435532635 2 PAG 4:3195035 IS GRANT > >> >> > 64 8 1076198884 0 TAB Sch-S GRANT > >> >> > 64 8 1284199625 1 PAG 1:4998275 IS GRANT > >> >> > 64 8 1333631844 1 PAG 4:3107397 IS GRANT > >> >> > > >> >> > sysprocesses > >> >> > 64 4232 64 0x0422 0 PAGEIOLATCH_SH 8:1:5156833 8 0 13219878 1160381 > >> >> > 0 > >> >> > 2005-05-14 > >> >> > 11:53:46.567 2005-05-14 12:04:11.867 0 0 sleeping > >> >> > 0x2B82AF0AE9CA284BB2C9C1DC1A71D94A00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 > >> >> > KAGUYA-Cube > >> >> > > >> >> > Microsoft SQL Server Analysis > >> >> > Services > >> >> > > >> >> > 2104 SELECT > >> >> > > >> >> > > >> >> > > >> >> > 000D601CAB48 TCP/IP analysis > >> >> > > >> >> > 0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 > >> >> > 0x010008001012822AC080EDB40000000000000000 76 -1 > >> >> > > >> >> > > >> >> > Appreciate your help. > >> >> > > >> >> > -- J > >> >> > "Adrian Zajkeskovic" wrote: > >> >> > > >> >> >> If you haven't done it already, update statistics with full scan or > >> >> >> run > >> >> >> dbcc > >> >> >> dbreindex. Then reproduce the problem again and post the output of > >> >> >> sp_lock > >> >> >> and select * from sysprocesses for that spid. > >> >> >> > >> >> >> Adrian > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message > >> >> >> news:A9F5109B-6C54-424B-9D03-A9B84E01B7F8@microsoft.com... > >> >> >> > Sorry for premature posting. Here's what I meant to ask: > >> >> >> > > >> >> >> > I've noticed that several heavy queries that worked fine in SP3a > >> >> >> > now > >> >> >> > have > >> >> >> > unusual behavior. The queries report that they are terminated as > >> >> >> > a > >> >> >> > deadlock > >> >> >> > victim; however, there aren't any DML operations running on the > >> >> >> > database > >> >> >> > at > >> >> >> > the time. When I look at the status with sp_who2, I see that the > >> >> >> > query > >> >> >> > appears to be blocked by itself: > >> >> >> > > >> >> >> > SPID Status Login BlkBy Command > >> >> >> > 71 sleeping reporting ... 71 ... SELECT > >> >> >> > > >> >> >> > sp_lock reports that the process has an exclusive lock on an > >> >> >> > extent > >> >> >> > in > >> >> >> > tempdb, but I don't see any other X locks around. > >> >> >> > > >> >> >> > Has anyone else seen anything like this? > >> >> >> > > >> >> >> > Regards, > >> >> >> > > >> >> >> > Jonathan > >> >> >> > >> >> >> > >> >> >> > >> >> > >> >> > >> >> > >> > >> > >> > > > Santtu,
Thanks for your reply via email to my statistics output. Here's something else interesting. If I add a normal index on the columns DateTable.year, DateTable.month and DateTable.quarter, then the SP4 query plan for: SELECT "dbo"."PurchaseCubeView_2"."Gender", "dbo"."PurchaseCubeView_2"."Age", "dbo"."DateView"."day", "dbo"."PurchaseCubeView_2"."genre_id", "dbo"."ItemView"."shop_item_id", "dbo"."PurchaseCubeView_2"."units", "dbo"."PurchaseCubeView_2"."price", "dbo"."PurchaseCubeView_2"."price"*"dbo"."PurchaseCubeView_2"."units" FROM "dbo"."PurchaseCubeView_2", "dbo"."DateView", "dbo"."ItemView" WHERE (reg_date >= '2005-5-1' and reg_date < '2005-6-1') AND ("dbo"."DateView"."month"=5) AND ("dbo"."PurchaseCubeView_2"."reg_date"="dbo"."DateView"."date_to_hour") AND ("dbo"."DateView"."quarter"=2) AND ("dbo"."DateView"."year"=2005) AND ("dbo"."ItemView"."shop_id"="dbo"."PurchaseCubeView_2"."shop_id") AND ("dbo"."ItemView"."item_id"="dbo"."PurchaseCubeView_2"."item_id") is much, much worse than the plan for SELECT "dbo"."PurchaseCubeView_2"."Gender", "dbo"."PurchaseCubeView_2"."Age", "dbo"."DateView"."day", "dbo"."PurchaseCubeView_2"."genre_id", "dbo"."ItemView"."shop_item_id", "dbo"."PurchaseCubeView_2"."units", "dbo"."PurchaseCubeView_2"."price", "dbo"."PurchaseCubeView_2"."price"*"dbo"."PurchaseCubeView_2"."units" FROM "dbo"."PurchaseCubeView_2", "dbo"."DateView", "dbo"."ItemView" WHERE (reg_date >= '2005-5-1' and reg_date < '2005-6-1') AND (date_to_hour >= '2005-5-1' and date_to_hour < '2005-6-1') AND ("dbo"."DateView"."month"=5) AND ("dbo"."PurchaseCubeView_2"."reg_date"="dbo"."DateView"."date_to_hour") AND ("dbo"."DateView"."quarter"=2) AND ("dbo"."DateView"."year"=2005) AND ("dbo"."ItemView"."shop_id"="dbo"."PurchaseCubeView_2"."shop_id") AND ("dbo"."ItemView"."item_id"="dbo"."PurchaseCubeView_2"."item_id") There is a clustered index on DateTable.date_to_hour (the underlying table for the view) The first query (without (date_to_hour >= '2005-5-1' and date_to_hour < '2005-6-1')) seems to scan the entire DateTable clustered index (about 3 million rows). The second query (with (date_to_hour >= '2005-5-1' and date_to_hour < '2005-6-1')) does an index seek on the non-clustered index (about 500 rows). I think this is the problem. Seems like a regression in SP4? I can provide more details if you'd like. Regards, Jonathan Show quote "Jonathan Levine" wrote: > Santtu, > > Thanks for your reply. > > "Santeri Voutilainen [MSFT]" wrote: > > To help track this down, could you turn on statistics profile (set > > statistics profile on), run your query, and send me the statistics profile > > output? You can turn the statistics profile off after running the query. > > I'll do my best. The problem is that the main query I'm having trouble with > typically > no longer runs to completion. The query is input to a Report Services daily > report based > on data loaded daily. The query used to finish in about 45 minutes (SP3a), > now it can > run for up to about 24 hours, at which point it deadlocks on the daily > INSERT and is > terminated. > > Another query I'm having trouble with is done by Analysis Services to > refresh a partition. > This one used to take about 30 minutes, now it can take up to 4 hours. The > problem with > this one is that the output is pretty big, so I'm not sure how long it will > take to run the > query in Query Analyzer. > > One interesting thing: If I run the Report Services query as SELECT TOP n > (I've tried with n as > large as 10000), it finishes really quickly. This is pretty strange to me, > as most of the > "heavy" work of the query is done in two sub-selects which are later joined > together. > > > Do you have you auto stats enabled? If not, do you regular update > > statistics? > > I update statistics every day, and I automatically REINDEX when > fragmentation reaches about > 10%. > > Regards, > > Jonathan > > > > Thanks, > > Santeri (Santtu) Voutilainen > > > > This posting is provided "AS IS" with no warranties, and confers no rights. > > > > > > "Jonathan Levine" <myfoo2@nospam.nospam> wrote in message > > news:B8769300-13CC-4E09-8CE4-672585525EFC@microsoft.com... > > > Santu, > > > > > > Thanks for your reply. > > > > > > "Santeri Voutilainen [MSFT]" wrote: > > >> Do you happen to know whether your queries had any plan changes? The > > >> increased PAGEIOLATCH waits and increased lock accumulation could be > > >> symptoms of a plan change, for example from an index seek to a table > > >> scan, > > >> or changes in join types. > > > > > > I don't think there's been any change in plan. > > > > > > I tried creating an indexed view to eliminate one of the joins on one of > > > the > > > queries. It did simplify the Estimated plan, > > > and there are no table scans, but the query still does not complete. > > > > > > In the case of this query, if I limit it the query (for example SELECT TOP > > > 100 ...) it does complete. (I estimate the result set for the full query > > > to > > > be > > > on the order of hundreds of thousands of records, with some fairly > > > straightforward GROUP BY/SUM calculations). > > > > > > It seems like this query ran for about 10 hours and then terminated on > > > deadlock. I'm going to turn on the deadlock trace flags > > > to see if I can get some more information about it. > > > > > > In the case of the other query, it is being run on another server during > > > an > > > Analysis Services cube update. Once again, > > > I suppose the plan could have changed from SP3a to SP4, but I haven't done > > > anything else that I would expect to make > > > the plan change. > > > > > > Regards, > > > > > > Jonathan > > > > > >> > > >> -- > > >> Santeri (Santtu) Voutilainen > > >> > > >> This posting is provided "AS IS" with no warranties, and confers no > > >> rights. > > >> > > >> > > >> "Jonathan Levine" <myfoo2@nospam.nospam> wrote in message > > >> news:7E009F0E-C354-4975-854D-5A9A93036507@microsoft.com... > > >> > Santtu, > > >> > > > >> > Thanks very much for your reply. > > >> > > > >> > "Santeri Voutilainen [MSFT]" wrote: > > >> >> What you are seeing is not related to locks, rather it is related to > > >> >> the > > >> >> latch waits you see in sysprocesses. In SP4 we are able to determine > > >> >> the > > >> >> blocking spid for latch waits when the blocking spid holds the latch > > >> >> in > > >> >> exclusive or update mode. In these cases the blocking spid > > >> >> information > > >> >> is > > >> >> populated into the "blocked" field in sysprocesses. > > >> > > > >> > That information is certainly going to be quite helpful. > > >> > > > >> > But what's alarming is that the queries in question seem to be running > > >> > much > > >> > slower than they did in SP3a. > > >> > As I think I mentioned, several of my queries now take 5-10x longer > > >> > than > > >> > they used to. And one query that > > >> > used to complete in under an hour now takes about 10-12 hours, then is > > >> > terminated as a deadlock victim -- > > >> > even though there shouldn't be any other queries running on the same > > >> > database at the time the deadlock is > > >> > detected. > > >> > > > >> > I'm not exactly sure what's going on. It happens on two different > > >> > machines, > > >> > but only on specific queries. One > > >> > more clue is that the queries seem to do a lot of work in parallel > > >> > fashion > > >> > (with 4-10 threads working on the > > >> > query). They then go to one thread that spends a lot of time in > > >> > PAGEIOLATCH_SH state, and accumulate many > > >> > many locks (after five hours, one query had accumulated more than a > > >> > million > > >> > locks). > > >> > > > >> > Any suggestions gladly welcomed! > > >> > > > >> > Regards, > > >> > > > >> > Jonathan > > >> > > > >> >> So why is the spid blocked by itself? Well, that has to do with how > > >> >> latches > > >> >> are used for IO operations. When an IO is issued for page, a latch is > > >> >> held > > >> >> on the page. Depending on the IO operation, the latch mode acquired > > >> >> is > > >> >> shared (SH) or exclusive (EX). The latch for the IO is acquired by > > >> >> the > > >> >> thread that issues the IO. Since all SQL Server IO operations are > > >> >> asynchronous, if the spid that issued the IO wants to wait for the IO > > >> >> to > > >> >> complete it will attempt to acquire another latch on the same page > > >> >> after > > >> >> issuing the IO. The first latch is released when the IO completes. > > >> >> This > > >> >> release allows the second latch request to be granted. > > >> >> Here's an example of how this works: > > >> >> 1. Spid 55 wants to read page P1 which does not exist in the buffer > > >> >> pool. > > >> >> 2. Spid 55 acquires an EX latch on page P1 -- this marks spid 55 as > > >> >> owning > > >> >> the latch. The latch is in an in memory data structure, not the > > >> >> physical > > >> >> page itself. Since the page does not yet exist in memory the mode is > > >> >> EX > > >> >> in > > >> >> order to force other spids that may also want to access the page to > > >> >> wait > > >> >> for > > >> >> the IO to complete and also to prevent them from issueing a second IO > > >> >> operation for the same page. > > >> >> 3. Spid 55 issues the IO request to read P1 from disk. > > >> >> 4. Since Spid 55 wants to read the page, it must wait for the IO to > > >> >> complete. It does this by attempting to acquire another latch (in > > >> >> this > > >> >> case > > >> >> a share (SH)) latch on the page. Since the latch is already held in > > >> >> EX, > > >> >> the > > >> >> SH request is blocked and the spid is suspended. > > >> >> 5. Upon completion of the IO the EX latch on the page is released. > > >> >> 6. The release of the EX latch grants the SH latch to spid 55. > > >> >> 7. Spid 55 can now read the page. > > >> >> > > >> >> For the duration between steps 4 (the SH latch acquire) and step 5 (EX > > >> >> latch > > >> >> release) sysprocesses will indicate that spid 55 is blocked by itself > > >> >> with a > > >> >> wait type of PAGEIOLATCH_XX (where XX can be SH, UP, or EX) as an > > >> >> indication > > >> >> that it is waiting for the completion of an IO that it itself issued. > > >> >> > > >> >> -- > > >> >> Santeri (Santtu) Voutilainen > > >> >> > > >> >> This posting is provided "AS IS" with no warranties, and confers no > > >> >> rights. > > >> >> > > >> >> > > >> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message > > >> >> news:7A8A5EF7-4459-40B4-99A7-0EDAE49DD8EF@microsoft.com... > > >> >> > Adrian, > > >> >> > > > >> >> > Thanks for your reply. > > >> >> > > > >> >> > I'm currently working on updating the stats for my big tables. I > > >> >> > have > > >> >> > had > > >> >> > some update stats running for more than 24 hours -- is that normal? > > >> >> > > > >> >> > Here is an example of sp_locks and sysprocesses output for a query > > >> >> > that > > >> >> > took > > >> >> > about an hour in sp3a, and is taking 5-10 times longer with sp4. I > > >> >> > have > > >> >> > updated the stats for all the tables in the query. > > >> >> > > > >> >> > sp_lock: > > >> >> > 64 8 0 0 DB S GRANT > > >> >> > 64 8 1332199796 0 TAB Sch-S GRANT > > >> >> > 64 8 435532635 1 PAG 5:3926175 IS GRANT > > >> >> > 64 8 1316199739 0 TAB Sch-S GRANT > > >> >> > 64 8 1145823194 1 PAG 5:3999638 S GRANT > > >> >> > 64 8 1284199625 0 TAB IS GRANT > > >> >> > 64 8 435532635 1 KEY (7b001c56dc00) S GRANT > > >> >> > 64 8 1333631844 0 TAB IS GRANT > > >> >> > 64 8 1145823194 0 TAB IS GRANT > > >> >> > 64 8 435532635 0 TAB IS GRANT > > >> >> > 64 8 784721848 0 TAB IS GRANT > > >> >> > 64 8 1236199454 0 TAB Sch-S GRANT > > >> >> > 64 8 435532635 2 PAG 4:3195035 IS GRANT > > >> >> > 64 8 1076198884 0 TAB Sch-S GRANT > > >> >> > 64 8 1284199625 1 PAG 1:4998275 IS GRANT > > >> >> > 64 8 1333631844 1 PAG 4:3107397 IS GRANT > > >> >> > > > >> >> > sysprocesses > > >> >> > 64 4232 64 0x0422 0 PAGEIOLATCH_SH 8:1:5156833 8 0 13219878 1160381 > > >> >> > 0 > > >> >> > 2005-05-14 > > >> >> > 11:53:46.567 2005-05-14 12:04:11.867 0 0 sleeping > > >> >> > 0x2B82AF0AE9CA284BB2C9C1DC1A71D94A00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 > > >> >> > KAGUYA-Cube > > >> >> > > > >> >> > Microsoft SQL Server Analysis > > >> >> > Services > > >> >> > > > >> >> > 2104 SELECT > > >> >> > > > >> >> > > > >> >> > > > >> >> > 000D601CAB48 TCP/IP analysis > > >> >> > > > >> >> > 0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 > > >> >> > 0x010008001012822AC080EDB40000000000000000 76 -1 > > >> >> > > > >> >> > > > >> >> > Appreciate your help. > > >> >> > > > >> >> > -- J > > >> >> > "Adrian Zajkeskovic" wrote: > > >> >> > > > >> >> >> If you haven't done it already, update statistics with full scan or > > >> >> >> run > > >> >> >> dbcc > > >> >> >> dbreindex. Then reproduce the problem again and post the output of > > >> >> >> sp_lock > > >> >> >> and select * from sysprocesses for that spid. > > >> >> >> > > >> >> >> Adrian > > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> > > >> >> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message > > >> >> >> news:A9F5109B-6C54-424B-9D03-A9B84E01B7F8@microsoft.com... > > >> >> >> > Sorry for premature posting. Here's what I meant to ask: > > >> >> >> > > > >> >> >> > I've noticed that several heavy queries that worked fine in SP3a > > >> >> >> > now > > >> >> >> > have > > >> >> >> > unusual behavior. The queries report that they are terminated as > > >> >> >> > a > > >> >> >> > deadlock > > >> >> >> > victim; however, there aren't any DML operations running on the > > >> >> >> > database > > >> >> >> > at > > >> >> >> > the time. When I look at the status with sp_who2, I see that the > > >> >> >> > query > > >> >> >> > appears to be blocked by itself: Santtu, Here's something strange. When I run the query from Query Analyzer, it seems to compile to a different execution plan than when I run the same query from Analysis Services SP1. I've tried looking at SQL Profiler output for Show Plan All, and when I run the query from Query Analyzer, it picks an Indexed View to use, and runs much faster. When I run the same query from Analysis Services (I cut and paste), it picks the underlying tables, and runs much, much slower. I'm looking into this some more, but if this reminds you of something, I'd appreciate the help... Regards, Jonathan Show quote "Santeri Voutilainen [MSFT]" wrote: > Jonathan, > > To help track this down, could you turn on statistics profile (set > statistics profile on), run your query, and send me the statistics profile > output? You can turn the statistics profile off after running the query. > Do you have you auto stats enabled? If not, do you regular update > statistics? > > Thanks, > Santeri (Santtu) Voutilainen > > This posting is provided "AS IS" with no warranties, and confers no rights. > > > "Jonathan Levine" <myfoo2@nospam.nospam> wrote in message > news:B8769300-13CC-4E09-8CE4-672585525EFC@microsoft.com... > > Santu, > > > > Thanks for your reply. > > > > "Santeri Voutilainen [MSFT]" wrote: > >> Do you happen to know whether your queries had any plan changes? The > >> increased PAGEIOLATCH waits and increased lock accumulation could be > >> symptoms of a plan change, for example from an index seek to a table > >> scan, > >> or changes in join types. > > > > I don't think there's been any change in plan. > > > > I tried creating an indexed view to eliminate one of the joins on one of > > the > > queries. It did simplify the Estimated plan, > > and there are no table scans, but the query still does not complete. > > > > In the case of this query, if I limit it the query (for example SELECT TOP > > 100 ...) it does complete. (I estimate the result set for the full query > > to > > be > > on the order of hundreds of thousands of records, with some fairly > > straightforward GROUP BY/SUM calculations). > > > > It seems like this query ran for about 10 hours and then terminated on > > deadlock. I'm going to turn on the deadlock trace flags > > to see if I can get some more information about it. > > > > In the case of the other query, it is being run on another server during > > an > > Analysis Services cube update. Once again, > > I suppose the plan could have changed from SP3a to SP4, but I haven't done > > anything else that I would expect to make > > the plan change. > > > > Regards, > > > > Jonathan > > > >> > >> -- > >> Santeri (Santtu) Voutilainen > >> > >> This posting is provided "AS IS" with no warranties, and confers no > >> rights. > >> > >> > >> "Jonathan Levine" <myfoo2@nospam.nospam> wrote in message > >> news:7E009F0E-C354-4975-854D-5A9A93036507@microsoft.com... > >> > Santtu, > >> > > >> > Thanks very much for your reply. > >> > > >> > "Santeri Voutilainen [MSFT]" wrote: > >> >> What you are seeing is not related to locks, rather it is related to > >> >> the > >> >> latch waits you see in sysprocesses. In SP4 we are able to determine > >> >> the > >> >> blocking spid for latch waits when the blocking spid holds the latch > >> >> in > >> >> exclusive or update mode. In these cases the blocking spid > >> >> information > >> >> is > >> >> populated into the "blocked" field in sysprocesses. > >> > > >> > That information is certainly going to be quite helpful. > >> > > >> > But what's alarming is that the queries in question seem to be running > >> > much > >> > slower than they did in SP3a. > >> > As I think I mentioned, several of my queries now take 5-10x longer > >> > than > >> > they used to. And one query that > >> > used to complete in under an hour now takes about 10-12 hours, then is > >> > terminated as a deadlock victim -- > >> > even though there shouldn't be any other queries running on the same > >> > database at the time the deadlock is > >> > detected. > >> > > >> > I'm not exactly sure what's going on. It happens on two different > >> > machines, > >> > but only on specific queries. One > >> > more clue is that the queries seem to do a lot of work in parallel > >> > fashion > >> > (with 4-10 threads working on the > >> > query). They then go to one thread that spends a lot of time in > >> > PAGEIOLATCH_SH state, and accumulate many > >> > many locks (after five hours, one query had accumulated more than a > >> > million > >> > locks). > >> > > >> > Any suggestions gladly welcomed! > >> > > >> > Regards, > >> > > >> > Jonathan > >> > > >> >> So why is the spid blocked by itself? Well, that has to do with how > >> >> latches > >> >> are used for IO operations. When an IO is issued for page, a latch is > >> >> held > >> >> on the page. Depending on the IO operation, the latch mode acquired > >> >> is > >> >> shared (SH) or exclusive (EX). The latch for the IO is acquired by > >> >> the > >> >> thread that issues the IO. Since all SQL Server IO operations are > >> >> asynchronous, if the spid that issued the IO wants to wait for the IO > >> >> to > >> >> complete it will attempt to acquire another latch on the same page > >> >> after > >> >> issuing the IO. The first latch is released when the IO completes. > >> >> This > >> >> release allows the second latch request to be granted. > >> >> Here's an example of how this works: > >> >> 1. Spid 55 wants to read page P1 which does not exist in the buffer > >> >> pool. > >> >> 2. Spid 55 acquires an EX latch on page P1 -- this marks spid 55 as > >> >> owning > >> >> the latch. The latch is in an in memory data structure, not the > >> >> physical > >> >> page itself. Since the page does not yet exist in memory the mode is > >> >> EX > >> >> in > >> >> order to force other spids that may also want to access the page to > >> >> wait > >> >> for > >> >> the IO to complete and also to prevent them from issueing a second IO > >> >> operation for the same page. > >> >> 3. Spid 55 issues the IO request to read P1 from disk. > >> >> 4. Since Spid 55 wants to read the page, it must wait for the IO to > >> >> complete. It does this by attempting to acquire another latch (in > >> >> this > >> >> case > >> >> a share (SH)) latch on the page. Since the latch is already held in > >> >> EX, > >> >> the > >> >> SH request is blocked and the spid is suspended. > >> >> 5. Upon completion of the IO the EX latch on the page is released. > >> >> 6. The release of the EX latch grants the SH latch to spid 55. > >> >> 7. Spid 55 can now read the page. > >> >> > >> >> For the duration between steps 4 (the SH latch acquire) and step 5 (EX > >> >> latch > >> >> release) sysprocesses will indicate that spid 55 is blocked by itself > >> >> with a > >> >> wait type of PAGEIOLATCH_XX (where XX can be SH, UP, or EX) as an > >> >> indication > >> >> that it is waiting for the completion of an IO that it itself issued. > >> >> > >> >> -- > >> >> Santeri (Santtu) Voutilainen > >> >> > >> >> This posting is provided "AS IS" with no warranties, and confers no > >> >> rights. > >> >> > >> >> > >> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message > >> >> news:7A8A5EF7-4459-40B4-99A7-0EDAE49DD8EF@microsoft.com... > >> >> > Adrian, > >> >> > > >> >> > Thanks for your reply. > >> >> > > >> >> > I'm currently working on updating the stats for my big tables. I > >> >> > have > >> >> > had > >> >> > some update stats running for more than 24 hours -- is that normal? > >> >> > > >> >> > Here is an example of sp_locks and sysprocesses output for a query > >> >> > that > >> >> > took > >> >> > about an hour in sp3a, and is taking 5-10 times longer with sp4. I > >> >> > have > >> >> > updated the stats for all the tables in the query. > >> >> > > >> >> > sp_lock: > >> >> > 64 8 0 0 DB S GRANT > >> >> > 64 8 1332199796 0 TAB Sch-S GRANT > >> >> > 64 8 435532635 1 PAG 5:3926175 IS GRANT > >> >> > 64 8 1316199739 0 TAB Sch-S GRANT > >> >> > 64 8 1145823194 1 PAG 5:3999638 S GRANT > >> >> > 64 8 1284199625 0 TAB IS GRANT > >> >> > 64 8 435532635 1 KEY (7b001c56dc00) S GRANT > >> >> > 64 8 1333631844 0 TAB IS GRANT > >> >> > 64 8 1145823194 0 TAB IS GRANT > >> >> > 64 8 435532635 0 TAB IS GRANT > >> >> > 64 8 784721848 0 TAB IS GRANT > >> >> > 64 8 1236199454 0 TAB Sch-S GRANT > >> >> > 64 8 435532635 2 PAG 4:3195035 IS GRANT > >> >> > 64 8 1076198884 0 TAB Sch-S GRANT > >> >> > 64 8 1284199625 1 PAG 1:4998275 IS GRANT > >> >> > 64 8 1333631844 1 PAG 4:3107397 IS GRANT > >> >> > > >> >> > sysprocesses > >> >> > 64 4232 64 0x0422 0 PAGEIOLATCH_SH 8:1:5156833 8 0 13219878 1160381 > >> >> > 0 > >> >> > 2005-05-14 > >> >> > 11:53:46.567 2005-05-14 12:04:11.867 0 0 sleeping > >> >> > 0x2B82AF0AE9CA284BB2C9C1DC1A71D94A00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 > >> >> > KAGUYA-Cube > >> >> > > >> >> > Microsoft SQL Server Analysis > >> >> > Services > >> >> > > >> >> > 2104 SELECT > >> >> > > >> >> > > >> >> > > >> >> > 000D601CAB48 TCP/IP analysis > >> >> > > >> >> > 0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 > >> >> > 0x010008001012822AC080EDB40000000000000000 76 -1 > >> >> > > >> >> > > >> >> > Appreciate your help. > >> >> > > >> >> > -- J > >> >> > "Adrian Zajkeskovic" wrote: > >> >> > > >> >> >> If you haven't done it already, update statistics with full scan or > >> >> >> run > >> >> >> dbcc > >> >> >> dbreindex. Then reproduce the problem again and post the output of > >> >> >> sp_lock > >> >> >> and select * from sysprocesses for that spid. > >> >> >> > >> >> >> Adrian > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message > >> | |||||||||||||||||||||||