Home All Groups Group Topic Archive Search About

SP4 reports queries blocking on themselves?

Author
12 May 2005 8:30 AM
Jonathan Levine
Greetings.

I just installed SP4 yesterday.

I've noticed that several heavy queries that worked fine in SP3a now have
unusual behavior.  The queries report that they are terminated as a deadlock
victim; however, there aren't any DML operations running on the database at
the time.  When I look at the status with sp_who2, I see that the query
appears to be blocked by itself:

SPID
71       sleeping                          reporting    KAGUYA      71  
    Jon_Test3    SELECT              904579    1404452    05/12 14:56:01    .Net SqlClient Data
Provider                                                 71

Author
12 May 2005 8:41 AM
Jonathan Levine
Sorry for premature posting.  Here's what I meant to ask:

I've noticed that several heavy queries that worked fine in SP3a now have
unusual behavior.  The queries report that they are terminated as a deadlock
victim; however, there aren't any DML operations running on the database at
the time.  When I look at the status with sp_who2, I see that the query
appears to be blocked by itself:

SPID         Status      Login              BlkBy     Command
71       sleeping   reporting    ... 71     ... SELECT         

sp_lock reports that the process has an exclusive lock on an extent in
tempdb, but I don't see any other X locks around.

Has anyone else seen anything like this?

Regards,

Jonathan
Author
12 May 2005 11:46 AM
Adrian Zajkeskovic
If you haven't done it already, update statistics with full scan or run dbcc
dbreindex. Then reproduce the problem again and post the output of sp_lock
and select * from sysprocesses for that spid.

Adrian




Show quote
"Jonathan Levine" <myfoo@nospam.nospam> wrote in message
news:A9F5109B-6C54-424B-9D03-A9B84E01B7F8@microsoft.com...
> Sorry for premature posting.  Here's what I meant to ask:
>
> I've noticed that several heavy queries that worked fine in SP3a now have
> unusual behavior.  The queries report that they are terminated as a
> deadlock
> victim; however, there aren't any DML operations running on the database
> at
> the time.  When I look at the status with sp_who2, I see that the query
> appears to be blocked by itself:
>
> SPID         Status      Login              BlkBy     Command
> 71   sleeping   reporting ... 71     ... SELECT
>
> sp_lock reports that the process has an exclusive lock on an extent in
> tempdb, but I don't see any other X locks around.
>
> Has anyone else seen anything like this?
>
> Regards,
>
> Jonathan
Author
14 May 2005 8:29 AM
Jonathan Levine
Adrian,

Thanks for your reply.

I'm currently working on updating the stats for my big tables.   I have had
some update stats running for more than 24 hours -- is that normal?

Here is an example of sp_locks and sysprocesses output for a query that took
about an hour in sp3a, and is taking 5-10 times longer with sp4.  I have
updated the stats for all the tables in the query.

sp_lock:
64    8    0    0    DB                        S    GRANT
64    8    1332199796    0    TAB                        Sch-S    GRANT
64    8    435532635    1    PAG    5:3926175           IS    GRANT
64    8    1316199739    0    TAB                        Sch-S    GRANT
64    8    1145823194    1    PAG    5:3999638           S    GRANT
64    8    1284199625    0    TAB                        IS    GRANT
64    8    435532635    1    KEY    (7b001c56dc00)      S    GRANT
64    8    1333631844    0    TAB                        IS    GRANT
64    8    1145823194    0    TAB                        IS    GRANT
64    8    435532635    0    TAB                        IS    GRANT
64    8    784721848    0    TAB                        IS    GRANT
64    8    1236199454    0    TAB                        Sch-S    GRANT
64    8    435532635    2    PAG    4:3195035           IS    GRANT
64    8    1076198884    0    TAB                        Sch-S    GRANT
64    8    1284199625    1    PAG    1:4998275           IS    GRANT
64    8    1333631844    1    PAG    4:3107397           IS    GRANT

sysprocesses
64    4232    64    0x0422    0    PAGEIOLATCH_SH    8:1:5156833    8    0    13219878    1160381    0    2005-05-14
11:53:46.567    2005-05-14 12:04:11.867    0    0    sleeping                     
    0x2B82AF0AE9CA284BB2C9C1DC1A71D94A00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000    KAGUYA-Cube

                                          Microsoft SQL Server Analysis Services

               2104        SELECT                                                     



    000D601CAB48    TCP/IP          analysis                                          

    0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000    0x010008001012822AC080EDB40000000000000000    76    -1


Appreciate your help.

-- J
Show quote
"Adrian Zajkeskovic" wrote:

> If you haven't done it already, update statistics with full scan or run dbcc
> dbreindex. Then reproduce the problem again and post the output of sp_lock
> and select * from sysprocesses for that spid.
>
> Adrian
>
>
>
>
> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message
> news:A9F5109B-6C54-424B-9D03-A9B84E01B7F8@microsoft.com...
> > Sorry for premature posting.  Here's what I meant to ask:
> >
> > I've noticed that several heavy queries that worked fine in SP3a now have
> > unusual behavior.  The queries report that they are terminated as a
> > deadlock
> > victim; however, there aren't any DML operations running on the database
> > at
> > the time.  When I look at the status with sp_who2, I see that the query
> > appears to be blocked by itself:
> >
> > SPID         Status      Login              BlkBy     Command
> > 71   sleeping   reporting ... 71     ... SELECT
> >
> > sp_lock reports that the process has an exclusive lock on an extent in
> > tempdb, but I don't see any other X locks around.
> >
> > Has anyone else seen anything like this?
> >
> > Regards,
> >
> > Jonathan
>
>
>
Author
17 May 2005 6:30 PM
Richard Richards
Is there any more word on this issue? We are having the same issue.

Thanks in advance.

Show quote
"Jonathan Levine" <myfoo@nospam.nospam> wrote in message
news:7A8A5EF7-4459-40B4-99A7-0EDAE49DD8EF@microsoft.com...
> Adrian,
>
> Thanks for your reply.
>
> I'm currently working on updating the stats for my big tables.   I have
had
> some update stats running for more than 24 hours -- is that normal?
>
> Here is an example of sp_locks and sysprocesses output for a query that
took
> about an hour in sp3a, and is taking 5-10 times longer with sp4.  I have
> updated the stats for all the tables in the query.
>
> sp_lock:
> 64 8 0 0 DB                 S GRANT
> 64 8 1332199796 0 TAB                 Sch-S GRANT
> 64 8 435532635 1 PAG 5:3926175       IS GRANT
> 64 8 1316199739 0 TAB                 Sch-S GRANT
> 64 8 1145823194 1 PAG 5:3999638       S GRANT
> 64 8 1284199625 0 TAB                 IS GRANT
> 64 8 435532635 1 KEY (7b001c56dc00)  S GRANT
> 64 8 1333631844 0 TAB                 IS GRANT
> 64 8 1145823194 0 TAB                 IS GRANT
> 64 8 435532635 0 TAB                 IS GRANT
> 64 8 784721848 0 TAB                 IS GRANT
> 64 8 1236199454 0 TAB                 Sch-S GRANT
> 64 8 435532635 2 PAG 4:3195035       IS GRANT
> 64 8 1076198884 0 TAB                 Sch-S GRANT
> 64 8 1284199625 1 PAG 1:4998275       IS GRANT
> 64 8 1333631844 1 PAG 4:3107397       IS GRANT
>
> sysprocesses
> 64 4232 64 0x0422 0 PAGEIOLATCH_SH 8:1:5156833 8 0 13219878 1160381 0
2005-05-14
> 11:53:46.567 2005-05-14 12:04:11.867 0 0 sleeping
>
0x2B82AF0AE9CA284BB2C9C1DC1A71D94A000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000 KAGUYA-Cube
Show quote
>
>                                       Microsoft SQL Server Analysis
Services
>
>            2104    SELECT
>
>
>
> 000D601CAB48 TCP/IP      analysis
>
>
0x00000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000 0x010008001012822AC080EDB40000000000000000
76 -1
>
>
> Appreciate your help.
>
> -- J
> "Adrian Zajkeskovic" wrote:
>
> > If you haven't done it already, update statistics with full scan or run
dbcc
> > dbreindex. Then reproduce the problem again and post the output of
sp_lock
> > and select * from sysprocesses for that spid.
> >
> > Adrian
> >
> >
> >
> >
> > "Jonathan Levine" <myfoo@nospam.nospam> wrote in message
> > news:A9F5109B-6C54-424B-9D03-A9B84E01B7F8@microsoft.com...
> > > Sorry for premature posting.  Here's what I meant to ask:
> > >
> > > I've noticed that several heavy queries that worked fine in SP3a now
have
> > > unusual behavior.  The queries report that they are terminated as a
> > > deadlock
> > > victim; however, there aren't any DML operations running on the
database
> > > at
> > > the time.  When I look at the status with sp_who2, I see that the
query
> > > appears to be blocked by itself:
> > >
> > > SPID         Status      Login              BlkBy     Command
> > > 71   sleeping   reporting ... 71     ... SELECT
> > >
> > > sp_lock reports that the process has an exclusive lock on an extent in
> > > tempdb, but I don't see any other X locks around.
> > >
> > > Has anyone else seen anything like this?
> > >
> > > Regards,
> > >
> > > Jonathan
> >
> >
> >
Author
18 May 2005 2:19 AM
Jonathan Levine
Are Microsoft folks no longer responding to questions in this newsgroup?  I
started this
thread with my MSDN-registered alias last week but still haven't heard
anything from [MS]
folks.

Thanks for your help!

-- J
Author
19 May 2005 6:42 PM
Santeri Voutilainen [MSFT]
What you are seeing is not related to locks, rather it is related to the
latch waits you see in sysprocesses.  In SP4 we are able to determine the
blocking spid for latch waits when the blocking spid holds the latch in
exclusive or update mode.  In these cases the blocking spid information is
populated into the "blocked" field in sysprocesses.
So why is the spid blocked by itself?  Well, that has to do with how latches
are used for IO operations.  When an IO is issued for page, a latch is held
on the page.  Depending on the IO operation, the latch mode acquired is
shared (SH) or exclusive (EX).  The latch for the IO is acquired by the
thread that issues the IO.  Since all SQL Server IO operations are
asynchronous, if the spid that issued the IO wants to wait for the IO to
complete it will attempt to acquire another latch on the same page after
issuing the IO.  The first latch is released when the IO completes.  This
release allows the second latch request to be granted.
Here's an example of how this works:
1. Spid 55 wants to read page P1 which does not exist in the buffer pool.
2. Spid 55 acquires an EX latch on page P1 -- this marks spid 55 as owning
the latch.  The latch is in an in memory data structure, not the physical
page itself.  Since the page does not yet exist in memory the mode is EX in
order to force other spids that may also want to access the page to wait for
the IO to complete and also to prevent them from issueing a second IO
operation for the same page.
3. Spid 55 issues the IO request to read P1 from disk.
4. Since Spid 55 wants to read the page, it must wait for the IO to
complete.  It does this by attempting to acquire another latch (in this case
a share (SH)) latch on the page.  Since the latch is already held in EX, the
SH request is blocked and the spid is suspended.
5. Upon completion of the IO the EX latch on the page is released.
6. The release of the EX latch grants the SH latch to spid 55.
7. Spid 55 can now read the page.

For the duration between steps 4 (the SH latch acquire) and step 5 (EX latch
release) sysprocesses will indicate that spid 55 is blocked by itself with a
wait type of PAGEIOLATCH_XX (where XX can be SH, UP, or EX) as an indication
that it is waiting for the completion of an IO that it itself issued.

--
Santeri (Santtu) Voutilainen

This posting is provided "AS IS" with no warranties, and confers no rights.


Show quote
"Jonathan Levine" <myfoo@nospam.nospam> wrote in message
news:7A8A5EF7-4459-40B4-99A7-0EDAE49DD8EF@microsoft.com...
> Adrian,
>
> Thanks for your reply.
>
> I'm currently working on updating the stats for my big tables.   I have
> had
> some update stats running for more than 24 hours -- is that normal?
>
> Here is an example of sp_locks and sysprocesses output for a query that
> took
> about an hour in sp3a, and is taking 5-10 times longer with sp4.  I have
> updated the stats for all the tables in the query.
>
> sp_lock:
> 64 8 0 0 DB                 S GRANT
> 64 8 1332199796 0 TAB                 Sch-S GRANT
> 64 8 435532635 1 PAG 5:3926175       IS GRANT
> 64 8 1316199739 0 TAB                 Sch-S GRANT
> 64 8 1145823194 1 PAG 5:3999638       S GRANT
> 64 8 1284199625 0 TAB                 IS GRANT
> 64 8 435532635 1 KEY (7b001c56dc00)  S GRANT
> 64 8 1333631844 0 TAB                 IS GRANT
> 64 8 1145823194 0 TAB                 IS GRANT
> 64 8 435532635 0 TAB                 IS GRANT
> 64 8 784721848 0 TAB                 IS GRANT
> 64 8 1236199454 0 TAB                 Sch-S GRANT
> 64 8 435532635 2 PAG 4:3195035       IS GRANT
> 64 8 1076198884 0 TAB                 Sch-S GRANT
> 64 8 1284199625 1 PAG 1:4998275       IS GRANT
> 64 8 1333631844 1 PAG 4:3107397       IS GRANT
>
> sysprocesses
> 64 4232 64 0x0422 0 PAGEIOLATCH_SH 8:1:5156833 8 0 13219878 1160381 0
> 2005-05-14
> 11:53:46.567 2005-05-14 12:04:11.867 0 0 sleeping
> 0x2B82AF0AE9CA284BB2C9C1DC1A71D94A00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> KAGUYA-Cube
>
>                                      Microsoft SQL Server Analysis
> Services
>
>           2104    SELECT
>
>
>
> 000D601CAB48 TCP/IP      analysis
>
> 0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> 0x010008001012822AC080EDB40000000000000000 76 -1
>
>
> Appreciate your help.
>
> -- J
> "Adrian Zajkeskovic" wrote:
>
>> If you haven't done it already, update statistics with full scan or run
>> dbcc
>> dbreindex. Then reproduce the problem again and post the output of
>> sp_lock
>> and select * from sysprocesses for that spid.
>>
>> Adrian
>>
>>
>>
>>
>> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message
>> news:A9F5109B-6C54-424B-9D03-A9B84E01B7F8@microsoft.com...
>> > Sorry for premature posting.  Here's what I meant to ask:
>> >
>> > I've noticed that several heavy queries that worked fine in SP3a now
>> > have
>> > unusual behavior.  The queries report that they are terminated as a
>> > deadlock
>> > victim; however, there aren't any DML operations running on the
>> > database
>> > at
>> > the time.  When I look at the status with sp_who2, I see that the query
>> > appears to be blocked by itself:
>> >
>> > SPID         Status      Login              BlkBy     Command
>> > 71   sleeping   reporting ... 71     ... SELECT
>> >
>> > sp_lock reports that the process has an exclusive lock on an extent in
>> > tempdb, but I don't see any other X locks around.
>> >
>> > Has anyone else seen anything like this?
>> >
>> > Regards,
>> >
>> > Jonathan
>>
>>
>>
Author
20 May 2005 5:42 AM
Jonathan Levine
Santtu,

Thanks very much for your reply.

"Santeri Voutilainen [MSFT]" wrote:
> What you are seeing is not related to locks, rather it is related to the
> latch waits you see in sysprocesses.  In SP4 we are able to determine the
> blocking spid for latch waits when the blocking spid holds the latch in
> exclusive or update mode.  In these cases the blocking spid information is
> populated into the "blocked" field in sysprocesses.

That  information is certainly going to be quite helpful.

But what's alarming is that the queries in question seem to be running much
slower than they did in SP3a.
As I think I mentioned, several of my queries now take 5-10x longer than
they used to.  And one query that
used to complete in under an hour now takes about 10-12 hours, then is
terminated as a deadlock victim --
even though there shouldn't be any other queries running on the same
database at the time the deadlock is
detected.

I'm not exactly sure what's going on.  It happens on two different machines,
but only on specific queries.  One
more clue is that the queries seem to do a lot of work in parallel fashion
(with 4-10 threads working on the
query).  They then go to one thread that spends a lot of time in
PAGEIOLATCH_SH state, and accumulate many
many locks (after five hours, one query had accumulated more than a million
locks).

Any suggestions gladly welcomed!

Regards,

Jonathan

Show quote
> So why is the spid blocked by itself?  Well, that has to do with how latches
> are used for IO operations.  When an IO is issued for page, a latch is held
> on the page.  Depending on the IO operation, the latch mode acquired is
> shared (SH) or exclusive (EX).  The latch for the IO is acquired by the
> thread that issues the IO.  Since all SQL Server IO operations are
> asynchronous, if the spid that issued the IO wants to wait for the IO to
> complete it will attempt to acquire another latch on the same page after
> issuing the IO.  The first latch is released when the IO completes.  This
> release allows the second latch request to be granted.
> Here's an example of how this works:
> 1. Spid 55 wants to read page P1 which does not exist in the buffer pool.
> 2. Spid 55 acquires an EX latch on page P1 -- this marks spid 55 as owning
> the latch.  The latch is in an in memory data structure, not the physical
> page itself.  Since the page does not yet exist in memory the mode is EX in
> order to force other spids that may also want to access the page to wait for
> the IO to complete and also to prevent them from issueing a second IO
> operation for the same page.
> 3. Spid 55 issues the IO request to read P1 from disk.
> 4. Since Spid 55 wants to read the page, it must wait for the IO to
> complete.  It does this by attempting to acquire another latch (in this case
> a share (SH)) latch on the page.  Since the latch is already held in EX, the
> SH request is blocked and the spid is suspended.
> 5. Upon completion of the IO the EX latch on the page is released.
> 6. The release of the EX latch grants the SH latch to spid 55.
> 7. Spid 55 can now read the page.
>
> For the duration between steps 4 (the SH latch acquire) and step 5 (EX latch
> release) sysprocesses will indicate that spid 55 is blocked by itself with a
> wait type of PAGEIOLATCH_XX (where XX can be SH, UP, or EX) as an indication
> that it is waiting for the completion of an IO that it itself issued.
>
> --
> Santeri (Santtu) Voutilainen
>
> This posting is provided "AS IS" with no warranties, and confers no rights.
>
>
> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message
> news:7A8A5EF7-4459-40B4-99A7-0EDAE49DD8EF@microsoft.com...
> > Adrian,
> >
> > Thanks for your reply.
> >
> > I'm currently working on updating the stats for my big tables.   I have
> > had
> > some update stats running for more than 24 hours -- is that normal?
> >
> > Here is an example of sp_locks and sysprocesses output for a query that
> > took
> > about an hour in sp3a, and is taking 5-10 times longer with sp4.  I have
> > updated the stats for all the tables in the query.
> >
> > sp_lock:
> > 64 8 0 0 DB                 S GRANT
> > 64 8 1332199796 0 TAB                 Sch-S GRANT
> > 64 8 435532635 1 PAG 5:3926175       IS GRANT
> > 64 8 1316199739 0 TAB                 Sch-S GRANT
> > 64 8 1145823194 1 PAG 5:3999638       S GRANT
> > 64 8 1284199625 0 TAB                 IS GRANT
> > 64 8 435532635 1 KEY (7b001c56dc00)  S GRANT
> > 64 8 1333631844 0 TAB                 IS GRANT
> > 64 8 1145823194 0 TAB                 IS GRANT
> > 64 8 435532635 0 TAB                 IS GRANT
> > 64 8 784721848 0 TAB                 IS GRANT
> > 64 8 1236199454 0 TAB                 Sch-S GRANT
> > 64 8 435532635 2 PAG 4:3195035       IS GRANT
> > 64 8 1076198884 0 TAB                 Sch-S GRANT
> > 64 8 1284199625 1 PAG 1:4998275       IS GRANT
> > 64 8 1333631844 1 PAG 4:3107397       IS GRANT
> >
> > sysprocesses
> > 64 4232 64 0x0422 0 PAGEIOLATCH_SH 8:1:5156833 8 0 13219878 1160381 0
> > 2005-05-14
> > 11:53:46.567 2005-05-14 12:04:11.867 0 0 sleeping
> > 0x2B82AF0AE9CA284BB2C9C1DC1A71D94A00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> > KAGUYA-Cube
> >
> >                                      Microsoft SQL Server Analysis
> > Services
> >
> >           2104    SELECT
> >
> >
> >
> > 000D601CAB48 TCP/IP      analysis
> >
> > 0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> > 0x010008001012822AC080EDB40000000000000000 76 -1
> >
> >
> > Appreciate your help.
> >
> > -- J
> > "Adrian Zajkeskovic" wrote:
> >
> >> If you haven't done it already, update statistics with full scan or run
> >> dbcc
> >> dbreindex. Then reproduce the problem again and post the output of
> >> sp_lock
> >> and select * from sysprocesses for that spid.
> >>
> >> Adrian
> >>
> >>
> >>
> >>
> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message
> >> news:A9F5109B-6C54-424B-9D03-A9B84E01B7F8@microsoft.com...
> >> > Sorry for premature posting.  Here's what I meant to ask:
> >> >
> >> > I've noticed that several heavy queries that worked fine in SP3a now
> >> > have
> >> > unusual behavior.  The queries report that they are terminated as a
> >> > deadlock
> >> > victim; however, there aren't any DML operations running on the
> >> > database
> >> > at
> >> > the time.  When I look at the status with sp_who2, I see that the query
> >> > appears to be blocked by itself:
> >> >
> >> > SPID         Status      Login              BlkBy     Command
> >> > 71   sleeping   reporting ... 71     ... SELECT
> >> >
> >> > sp_lock reports that the process has an exclusive lock on an extent in
> >> > tempdb, but I don't see any other X locks around.
> >> >
> >> > Has anyone else seen anything like this?
> >> >
> >> > Regards,
> >> >
> >> > Jonathan
> >>
> >>
> >>
>
>
>
Author
20 May 2005 7:48 PM
Santeri Voutilainen [MSFT]
Jonathan,

Do you happen to know whether your queries had any plan changes?  The
increased PAGEIOLATCH waits and increased lock accumulation could be
symptoms of a plan change, for example from an index seek to a table scan,
or changes in join types.

--
Santeri (Santtu) Voutilainen

This posting is provided "AS IS" with no warranties, and confers no rights.


Show quote
"Jonathan Levine" <myfoo2@nospam.nospam> wrote in message
news:7E009F0E-C354-4975-854D-5A9A93036507@microsoft.com...
> Santtu,
>
> Thanks very much for your reply.
>
> "Santeri Voutilainen [MSFT]" wrote:
>> What you are seeing is not related to locks, rather it is related to the
>> latch waits you see in sysprocesses.  In SP4 we are able to determine the
>> blocking spid for latch waits when the blocking spid holds the latch in
>> exclusive or update mode.  In these cases the blocking spid information
>> is
>> populated into the "blocked" field in sysprocesses.
>
> That  information is certainly going to be quite helpful.
>
> But what's alarming is that the queries in question seem to be running
> much
> slower than they did in SP3a.
> As I think I mentioned, several of my queries now take 5-10x longer than
> they used to.  And one query that
> used to complete in under an hour now takes about 10-12 hours, then is
> terminated as a deadlock victim --
> even though there shouldn't be any other queries running on the same
> database at the time the deadlock is
> detected.
>
> I'm not exactly sure what's going on.  It happens on two different
> machines,
> but only on specific queries.  One
> more clue is that the queries seem to do a lot of work in parallel fashion
> (with 4-10 threads working on the
> query).  They then go to one thread that spends a lot of time in
> PAGEIOLATCH_SH state, and accumulate many
> many locks (after five hours, one query had accumulated more than a
> million
> locks).
>
> Any suggestions gladly welcomed!
>
> Regards,
>
> Jonathan
>
>> So why is the spid blocked by itself?  Well, that has to do with how
>> latches
>> are used for IO operations.  When an IO is issued for page, a latch is
>> held
>> on the page.  Depending on the IO operation, the latch mode acquired is
>> shared (SH) or exclusive (EX).  The latch for the IO is acquired by the
>> thread that issues the IO.  Since all SQL Server IO operations are
>> asynchronous, if the spid that issued the IO wants to wait for the IO to
>> complete it will attempt to acquire another latch on the same page after
>> issuing the IO.  The first latch is released when the IO completes.  This
>> release allows the second latch request to be granted.
>> Here's an example of how this works:
>> 1. Spid 55 wants to read page P1 which does not exist in the buffer pool.
>> 2. Spid 55 acquires an EX latch on page P1 -- this marks spid 55 as
>> owning
>> the latch.  The latch is in an in memory data structure, not the physical
>> page itself.  Since the page does not yet exist in memory the mode is EX
>> in
>> order to force other spids that may also want to access the page to wait
>> for
>> the IO to complete and also to prevent them from issueing a second IO
>> operation for the same page.
>> 3. Spid 55 issues the IO request to read P1 from disk.
>> 4. Since Spid 55 wants to read the page, it must wait for the IO to
>> complete.  It does this by attempting to acquire another latch (in this
>> case
>> a share (SH)) latch on the page.  Since the latch is already held in EX,
>> the
>> SH request is blocked and the spid is suspended.
>> 5. Upon completion of the IO the EX latch on the page is released.
>> 6. The release of the EX latch grants the SH latch to spid 55.
>> 7. Spid 55 can now read the page.
>>
>> For the duration between steps 4 (the SH latch acquire) and step 5 (EX
>> latch
>> release) sysprocesses will indicate that spid 55 is blocked by itself
>> with a
>> wait type of PAGEIOLATCH_XX (where XX can be SH, UP, or EX) as an
>> indication
>> that it is waiting for the completion of an IO that it itself issued.
>>
>> --
>> Santeri (Santtu) Voutilainen
>>
>> This posting is provided "AS IS" with no warranties, and confers no
>> rights.
>>
>>
>> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message
>> news:7A8A5EF7-4459-40B4-99A7-0EDAE49DD8EF@microsoft.com...
>> > Adrian,
>> >
>> > Thanks for your reply.
>> >
>> > I'm currently working on updating the stats for my big tables.   I have
>> > had
>> > some update stats running for more than 24 hours -- is that normal?
>> >
>> > Here is an example of sp_locks and sysprocesses output for a query that
>> > took
>> > about an hour in sp3a, and is taking 5-10 times longer with sp4.  I
>> > have
>> > updated the stats for all the tables in the query.
>> >
>> > sp_lock:
>> > 64 8 0 0 DB                 S GRANT
>> > 64 8 1332199796 0 TAB                 Sch-S GRANT
>> > 64 8 435532635 1 PAG 5:3926175       IS GRANT
>> > 64 8 1316199739 0 TAB                 Sch-S GRANT
>> > 64 8 1145823194 1 PAG 5:3999638       S GRANT
>> > 64 8 1284199625 0 TAB                 IS GRANT
>> > 64 8 435532635 1 KEY (7b001c56dc00)  S GRANT
>> > 64 8 1333631844 0 TAB                 IS GRANT
>> > 64 8 1145823194 0 TAB                 IS GRANT
>> > 64 8 435532635 0 TAB                 IS GRANT
>> > 64 8 784721848 0 TAB                 IS GRANT
>> > 64 8 1236199454 0 TAB                 Sch-S GRANT
>> > 64 8 435532635 2 PAG 4:3195035       IS GRANT
>> > 64 8 1076198884 0 TAB                 Sch-S GRANT
>> > 64 8 1284199625 1 PAG 1:4998275       IS GRANT
>> > 64 8 1333631844 1 PAG 4:3107397       IS GRANT
>> >
>> > sysprocesses
>> > 64 4232 64 0x0422 0 PAGEIOLATCH_SH 8:1:5156833 8 0 13219878 1160381 0
>> > 2005-05-14
>> > 11:53:46.567 2005-05-14 12:04:11.867 0 0 sleeping
>> > 0x2B82AF0AE9CA284BB2C9C1DC1A71D94A00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
>> > KAGUYA-Cube
>> >
>> >                                      Microsoft SQL Server Analysis
>> > Services
>> >
>> >           2104    SELECT
>> >
>> >
>> >
>> > 000D601CAB48 TCP/IP      analysis
>> >
>> > 0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
>> > 0x010008001012822AC080EDB40000000000000000 76 -1
>> >
>> >
>> > Appreciate your help.
>> >
>> > -- J
>> > "Adrian Zajkeskovic" wrote:
>> >
>> >> If you haven't done it already, update statistics with full scan or
>> >> run
>> >> dbcc
>> >> dbreindex. Then reproduce the problem again and post the output of
>> >> sp_lock
>> >> and select * from sysprocesses for that spid.
>> >>
>> >> Adrian
>> >>
>> >>
>> >>
>> >>
>> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message
>> >> news:A9F5109B-6C54-424B-9D03-A9B84E01B7F8@microsoft.com...
>> >> > Sorry for premature posting.  Here's what I meant to ask:
>> >> >
>> >> > I've noticed that several heavy queries that worked fine in SP3a now
>> >> > have
>> >> > unusual behavior.  The queries report that they are terminated as a
>> >> > deadlock
>> >> > victim; however, there aren't any DML operations running on the
>> >> > database
>> >> > at
>> >> > the time.  When I look at the status with sp_who2, I see that the
>> >> > query
>> >> > appears to be blocked by itself:
>> >> >
>> >> > SPID         Status      Login              BlkBy     Command
>> >> > 71   sleeping   reporting ... 71     ... SELECT
>> >> >
>> >> > sp_lock reports that the process has an exclusive lock on an extent
>> >> > in
>> >> > tempdb, but I don't see any other X locks around.
>> >> >
>> >> > Has anyone else seen anything like this?
>> >> >
>> >> > Regards,
>> >> >
>> >> > Jonathan
>> >>
>> >>
>> >>
>>
>>
>>
Author
20 May 2005 11:01 PM
Jonathan Levine
Santu,

Thanks for your reply.

"Santeri Voutilainen [MSFT]" wrote:
> Do you happen to know whether your queries had any plan changes?  The
> increased PAGEIOLATCH waits and increased lock accumulation could be
> symptoms of a plan change, for example from an index seek to a table scan,
> or changes in join types.

I don't think there's been any change in plan.

I tried creating an indexed view to eliminate one of the joins on one of the
queries.  It did simplify the Estimated plan,
and there are no table scans, but the query still does not complete.

In the case of this query, if I limit it the query (for example SELECT TOP
100 ...) it does complete.  (I estimate the result set for the full query to
be
on the order of hundreds of thousands of records, with some fairly
straightforward GROUP BY/SUM calculations).

It seems like this query ran for about 10 hours and then terminated on
deadlock.  I'm going to turn on the deadlock trace flags
to see if I can get some more information about it.

In the case of the other query, it is being run on another server during an
Analysis Services cube update.  Once again,
I suppose the plan could have changed from SP3a to SP4, but I haven't done
anything else that I would expect to make
the plan change.

Regards,

Jonathan

Show quote
>
> --
> Santeri (Santtu) Voutilainen
>
> This posting is provided "AS IS" with no warranties, and confers no rights.
>
>
> "Jonathan Levine" <myfoo2@nospam.nospam> wrote in message
> news:7E009F0E-C354-4975-854D-5A9A93036507@microsoft.com...
> > Santtu,
> >
> > Thanks very much for your reply.
> >
> > "Santeri Voutilainen [MSFT]" wrote:
> >> What you are seeing is not related to locks, rather it is related to the
> >> latch waits you see in sysprocesses.  In SP4 we are able to determine the
> >> blocking spid for latch waits when the blocking spid holds the latch in
> >> exclusive or update mode.  In these cases the blocking spid information
> >> is
> >> populated into the "blocked" field in sysprocesses.
> >
> > That  information is certainly going to be quite helpful.
> >
> > But what's alarming is that the queries in question seem to be running
> > much
> > slower than they did in SP3a.
> > As I think I mentioned, several of my queries now take 5-10x longer than
> > they used to.  And one query that
> > used to complete in under an hour now takes about 10-12 hours, then is
> > terminated as a deadlock victim --
> > even though there shouldn't be any other queries running on the same
> > database at the time the deadlock is
> > detected.
> >
> > I'm not exactly sure what's going on.  It happens on two different
> > machines,
> > but only on specific queries.  One
> > more clue is that the queries seem to do a lot of work in parallel fashion
> > (with 4-10 threads working on the
> > query).  They then go to one thread that spends a lot of time in
> > PAGEIOLATCH_SH state, and accumulate many
> > many locks (after five hours, one query had accumulated more than a
> > million
> > locks).
> >
> > Any suggestions gladly welcomed!
> >
> > Regards,
> >
> > Jonathan
> >
> >> So why is the spid blocked by itself?  Well, that has to do with how
> >> latches
> >> are used for IO operations.  When an IO is issued for page, a latch is
> >> held
> >> on the page.  Depending on the IO operation, the latch mode acquired is
> >> shared (SH) or exclusive (EX).  The latch for the IO is acquired by the
> >> thread that issues the IO.  Since all SQL Server IO operations are
> >> asynchronous, if the spid that issued the IO wants to wait for the IO to
> >> complete it will attempt to acquire another latch on the same page after
> >> issuing the IO.  The first latch is released when the IO completes.  This
> >> release allows the second latch request to be granted.
> >> Here's an example of how this works:
> >> 1. Spid 55 wants to read page P1 which does not exist in the buffer pool.
> >> 2. Spid 55 acquires an EX latch on page P1 -- this marks spid 55 as
> >> owning
> >> the latch.  The latch is in an in memory data structure, not the physical
> >> page itself.  Since the page does not yet exist in memory the mode is EX
> >> in
> >> order to force other spids that may also want to access the page to wait
> >> for
> >> the IO to complete and also to prevent them from issueing a second IO
> >> operation for the same page.
> >> 3. Spid 55 issues the IO request to read P1 from disk.
> >> 4. Since Spid 55 wants to read the page, it must wait for the IO to
> >> complete.  It does this by attempting to acquire another latch (in this
> >> case
> >> a share (SH)) latch on the page.  Since the latch is already held in EX,
> >> the
> >> SH request is blocked and the spid is suspended.
> >> 5. Upon completion of the IO the EX latch on the page is released.
> >> 6. The release of the EX latch grants the SH latch to spid 55.
> >> 7. Spid 55 can now read the page.
> >>
> >> For the duration between steps 4 (the SH latch acquire) and step 5 (EX
> >> latch
> >> release) sysprocesses will indicate that spid 55 is blocked by itself
> >> with a
> >> wait type of PAGEIOLATCH_XX (where XX can be SH, UP, or EX) as an
> >> indication
> >> that it is waiting for the completion of an IO that it itself issued.
> >>
> >> --
> >> Santeri (Santtu) Voutilainen
> >>
> >> This posting is provided "AS IS" with no warranties, and confers no
> >> rights.
> >>
> >>
> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message
> >> news:7A8A5EF7-4459-40B4-99A7-0EDAE49DD8EF@microsoft.com...
> >> > Adrian,
> >> >
> >> > Thanks for your reply.
> >> >
> >> > I'm currently working on updating the stats for my big tables.   I have
> >> > had
> >> > some update stats running for more than 24 hours -- is that normal?
> >> >
> >> > Here is an example of sp_locks and sysprocesses output for a query that
> >> > took
> >> > about an hour in sp3a, and is taking 5-10 times longer with sp4.  I
> >> > have
> >> > updated the stats for all the tables in the query.
> >> >
> >> > sp_lock:
> >> > 64 8 0 0 DB                 S GRANT
> >> > 64 8 1332199796 0 TAB                 Sch-S GRANT
> >> > 64 8 435532635 1 PAG 5:3926175       IS GRANT
> >> > 64 8 1316199739 0 TAB                 Sch-S GRANT
> >> > 64 8 1145823194 1 PAG 5:3999638       S GRANT
> >> > 64 8 1284199625 0 TAB                 IS GRANT
> >> > 64 8 435532635 1 KEY (7b001c56dc00)  S GRANT
> >> > 64 8 1333631844 0 TAB                 IS GRANT
> >> > 64 8 1145823194 0 TAB                 IS GRANT
> >> > 64 8 435532635 0 TAB                 IS GRANT
> >> > 64 8 784721848 0 TAB                 IS GRANT
> >> > 64 8 1236199454 0 TAB                 Sch-S GRANT
> >> > 64 8 435532635 2 PAG 4:3195035       IS GRANT
> >> > 64 8 1076198884 0 TAB                 Sch-S GRANT
> >> > 64 8 1284199625 1 PAG 1:4998275       IS GRANT
> >> > 64 8 1333631844 1 PAG 4:3107397       IS GRANT
> >> >
> >> > sysprocesses
> >> > 64 4232 64 0x0422 0 PAGEIOLATCH_SH 8:1:5156833 8 0 13219878 1160381 0
> >> > 2005-05-14
> >> > 11:53:46.567 2005-05-14 12:04:11.867 0 0 sleeping
> >> > 0x2B82AF0AE9CA284BB2C9C1DC1A71D94A00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> >> > KAGUYA-Cube
> >> >
> >> >                                      Microsoft SQL Server Analysis
> >> > Services
> >> >
> >> >           2104    SELECT
> >> >
> >> >
> >> >
> >> > 000D601CAB48 TCP/IP      analysis
> >> >
> >> > 0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> >> > 0x010008001012822AC080EDB40000000000000000 76 -1
> >> >
> >> >
> >> > Appreciate your help.
> >> >
> >> > -- J
> >> > "Adrian Zajkeskovic" wrote:
> >> >
> >> >> If you haven't done it already, update statistics with full scan or
> >> >> run
> >> >> dbcc
> >> >> dbreindex. Then reproduce the problem again and post the output of
> >> >> sp_lock
> >> >> and select * from sysprocesses for that spid.
> >> >>
> >> >> Adrian
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message
> >> >> news:A9F5109B-6C54-424B-9D03-A9B84E01B7F8@microsoft.com...
> >> >> > Sorry for premature posting.  Here's what I meant to ask:
> >> >> >
> >> >> > I've noticed that several heavy queries that worked fine in SP3a now
> >> >> > have
> >> >> > unusual behavior.  The queries report that they are terminated as a
> >> >> > deadlock
> >> >> > victim; however, there aren't any DML operations running on the
> >> >> > database
> >> >> > at
> >> >> > the time.  When I look at the status with sp_who2, I see that the
> >> >> > query
> >> >> > appears to be blocked by itself:
> >> >> >
> >> >> > SPID         Status      Login              BlkBy     Command
> >> >> > 71   sleeping   reporting ... 71     ... SELECT
> >> >> >
> >> >> > sp_lock reports that the process has an exclusive lock on an extent
> >> >> > in
> >> >> > tempdb, but I don't see any other X locks around.
> >> >> >
> >> >> > Has anyone else seen anything like this?
> >> >> >
> >> >> > Regards,
> >> >> >
> >> >> > Jonathan
> >> >>
> >> >>
> >> >>
> >>
> >>
> >>
>
>
>
Author
23 May 2005 10:39 PM
Santeri Voutilainen [MSFT]
Jonathan,

To help track this down, could you turn on statistics profile (set
statistics profile on), run your query, and send me the statistics profile
output?  You can turn the statistics profile off after running the query.
Do you have you auto stats enabled?  If not, do you regular update
statistics?

Thanks,
Santeri (Santtu) Voutilainen

This posting is provided "AS IS" with no warranties, and confers no rights.


Show quote
"Jonathan Levine" <myfoo2@nospam.nospam> wrote in message
news:B8769300-13CC-4E09-8CE4-672585525EFC@microsoft.com...
> Santu,
>
> Thanks for your reply.
>
> "Santeri Voutilainen [MSFT]" wrote:
>> Do you happen to know whether your queries had any plan changes?  The
>> increased PAGEIOLATCH waits and increased lock accumulation could be
>> symptoms of a plan change, for example from an index seek to a table
>> scan,
>> or changes in join types.
>
> I don't think there's been any change in plan.
>
> I tried creating an indexed view to eliminate one of the joins on one of
> the
> queries.  It did simplify the Estimated plan,
> and there are no table scans, but the query still does not complete.
>
> In the case of this query, if I limit it the query (for example SELECT TOP
> 100 ...) it does complete.  (I estimate the result set for the full query
> to
> be
> on the order of hundreds of thousands of records, with some fairly
> straightforward GROUP BY/SUM calculations).
>
> It seems like this query ran for about 10 hours and then terminated on
> deadlock.  I'm going to turn on the deadlock trace flags
> to see if I can get some more information about it.
>
> In the case of the other query, it is being run on another server during
> an
> Analysis Services cube update.  Once again,
> I suppose the plan could have changed from SP3a to SP4, but I haven't done
> anything else that I would expect to make
> the plan change.
>
> Regards,
>
> Jonathan
>
>>
>> --
>> Santeri (Santtu) Voutilainen
>>
>> This posting is provided "AS IS" with no warranties, and confers no
>> rights.
>>
>>
>> "Jonathan Levine" <myfoo2@nospam.nospam> wrote in message
>> news:7E009F0E-C354-4975-854D-5A9A93036507@microsoft.com...
>> > Santtu,
>> >
>> > Thanks very much for your reply.
>> >
>> > "Santeri Voutilainen [MSFT]" wrote:
>> >> What you are seeing is not related to locks, rather it is related to
>> >> the
>> >> latch waits you see in sysprocesses.  In SP4 we are able to determine
>> >> the
>> >> blocking spid for latch waits when the blocking spid holds the latch
>> >> in
>> >> exclusive or update mode.  In these cases the blocking spid
>> >> information
>> >> is
>> >> populated into the "blocked" field in sysprocesses.
>> >
>> > That  information is certainly going to be quite helpful.
>> >
>> > But what's alarming is that the queries in question seem to be running
>> > much
>> > slower than they did in SP3a.
>> > As I think I mentioned, several of my queries now take 5-10x longer
>> > than
>> > they used to.  And one query that
>> > used to complete in under an hour now takes about 10-12 hours, then is
>> > terminated as a deadlock victim --
>> > even though there shouldn't be any other queries running on the same
>> > database at the time the deadlock is
>> > detected.
>> >
>> > I'm not exactly sure what's going on.  It happens on two different
>> > machines,
>> > but only on specific queries.  One
>> > more clue is that the queries seem to do a lot of work in parallel
>> > fashion
>> > (with 4-10 threads working on the
>> > query).  They then go to one thread that spends a lot of time in
>> > PAGEIOLATCH_SH state, and accumulate many
>> > many locks (after five hours, one query had accumulated more than a
>> > million
>> > locks).
>> >
>> > Any suggestions gladly welcomed!
>> >
>> > Regards,
>> >
>> > Jonathan
>> >
>> >> So why is the spid blocked by itself?  Well, that has to do with how
>> >> latches
>> >> are used for IO operations.  When an IO is issued for page, a latch is
>> >> held
>> >> on the page.  Depending on the IO operation, the latch mode acquired
>> >> is
>> >> shared (SH) or exclusive (EX).  The latch for the IO is acquired by
>> >> the
>> >> thread that issues the IO.  Since all SQL Server IO operations are
>> >> asynchronous, if the spid that issued the IO wants to wait for the IO
>> >> to
>> >> complete it will attempt to acquire another latch on the same page
>> >> after
>> >> issuing the IO.  The first latch is released when the IO completes.
>> >> This
>> >> release allows the second latch request to be granted.
>> >> Here's an example of how this works:
>> >> 1. Spid 55 wants to read page P1 which does not exist in the buffer
>> >> pool.
>> >> 2. Spid 55 acquires an EX latch on page P1 -- this marks spid 55 as
>> >> owning
>> >> the latch.  The latch is in an in memory data structure, not the
>> >> physical
>> >> page itself.  Since the page does not yet exist in memory the mode is
>> >> EX
>> >> in
>> >> order to force other spids that may also want to access the page to
>> >> wait
>> >> for
>> >> the IO to complete and also to prevent them from issueing a second IO
>> >> operation for the same page.
>> >> 3. Spid 55 issues the IO request to read P1 from disk.
>> >> 4. Since Spid 55 wants to read the page, it must wait for the IO to
>> >> complete.  It does this by attempting to acquire another latch (in
>> >> this
>> >> case
>> >> a share (SH)) latch on the page.  Since the latch is already held in
>> >> EX,
>> >> the
>> >> SH request is blocked and the spid is suspended.
>> >> 5. Upon completion of the IO the EX latch on the page is released.
>> >> 6. The release of the EX latch grants the SH latch to spid 55.
>> >> 7. Spid 55 can now read the page.
>> >>
>> >> For the duration between steps 4 (the SH latch acquire) and step 5 (EX
>> >> latch
>> >> release) sysprocesses will indicate that spid 55 is blocked by itself
>> >> with a
>> >> wait type of PAGEIOLATCH_XX (where XX can be SH, UP, or EX) as an
>> >> indication
>> >> that it is waiting for the completion of an IO that it itself issued.
>> >>
>> >> --
>> >> Santeri (Santtu) Voutilainen
>> >>
>> >> This posting is provided "AS IS" with no warranties, and confers no
>> >> rights.
>> >>
>> >>
>> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message
>> >> news:7A8A5EF7-4459-40B4-99A7-0EDAE49DD8EF@microsoft.com...
>> >> > Adrian,
>> >> >
>> >> > Thanks for your reply.
>> >> >
>> >> > I'm currently working on updating the stats for my big tables.   I
>> >> > have
>> >> > had
>> >> > some update stats running for more than 24 hours -- is that normal?
>> >> >
>> >> > Here is an example of sp_locks and sysprocesses output for a query
>> >> > that
>> >> > took
>> >> > about an hour in sp3a, and is taking 5-10 times longer with sp4.  I
>> >> > have
>> >> > updated the stats for all the tables in the query.
>> >> >
>> >> > sp_lock:
>> >> > 64 8 0 0 DB                 S GRANT
>> >> > 64 8 1332199796 0 TAB                 Sch-S GRANT
>> >> > 64 8 435532635 1 PAG 5:3926175       IS GRANT
>> >> > 64 8 1316199739 0 TAB                 Sch-S GRANT
>> >> > 64 8 1145823194 1 PAG 5:3999638       S GRANT
>> >> > 64 8 1284199625 0 TAB                 IS GRANT
>> >> > 64 8 435532635 1 KEY (7b001c56dc00)  S GRANT
>> >> > 64 8 1333631844 0 TAB                 IS GRANT
>> >> > 64 8 1145823194 0 TAB                 IS GRANT
>> >> > 64 8 435532635 0 TAB                 IS GRANT
>> >> > 64 8 784721848 0 TAB                 IS GRANT
>> >> > 64 8 1236199454 0 TAB                 Sch-S GRANT
>> >> > 64 8 435532635 2 PAG 4:3195035       IS GRANT
>> >> > 64 8 1076198884 0 TAB                 Sch-S GRANT
>> >> > 64 8 1284199625 1 PAG 1:4998275       IS GRANT
>> >> > 64 8 1333631844 1 PAG 4:3107397       IS GRANT
>> >> >
>> >> > sysprocesses
>> >> > 64 4232 64 0x0422 0 PAGEIOLATCH_SH 8:1:5156833 8 0 13219878 1160381
>> >> > 0
>> >> > 2005-05-14
>> >> > 11:53:46.567 2005-05-14 12:04:11.867 0 0 sleeping
>> >> > 0x2B82AF0AE9CA284BB2C9C1DC1A71D94A00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
>> >> > KAGUYA-Cube
>> >> >
>> >> >                                      Microsoft SQL Server Analysis
>> >> > Services
>> >> >
>> >> >           2104    SELECT
>> >> >
>> >> >
>> >> >
>> >> > 000D601CAB48 TCP/IP      analysis
>> >> >
>> >> > 0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
>> >> > 0x010008001012822AC080EDB40000000000000000 76 -1
>> >> >
>> >> >
>> >> > Appreciate your help.
>> >> >
>> >> > -- J
>> >> > "Adrian Zajkeskovic" wrote:
>> >> >
>> >> >> If you haven't done it already, update statistics with full scan or
>> >> >> run
>> >> >> dbcc
>> >> >> dbreindex. Then reproduce the problem again and post the output of
>> >> >> sp_lock
>> >> >> and select * from sysprocesses for that spid.
>> >> >>
>> >> >> Adrian
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message
>> >> >> news:A9F5109B-6C54-424B-9D03-A9B84E01B7F8@microsoft.com...
>> >> >> > Sorry for premature posting.  Here's what I meant to ask:
>> >> >> >
>> >> >> > I've noticed that several heavy queries that worked fine in SP3a
>> >> >> > now
>> >> >> > have
>> >> >> > unusual behavior.  The queries report that they are terminated as
>> >> >> > a
>> >> >> > deadlock
>> >> >> > victim; however, there aren't any DML operations running on the
>> >> >> > database
>> >> >> > at
>> >> >> > the time.  When I look at the status with sp_who2, I see that the
>> >> >> > query
>> >> >> > appears to be blocked by itself:
>> >> >> >
>> >> >> > SPID         Status      Login              BlkBy     Command
>> >> >> > 71   sleeping   reporting ... 71     ... SELECT
>> >> >> >
>> >> >> > sp_lock reports that the process has an exclusive lock on an
>> >> >> > extent
>> >> >> > in
>> >> >> > tempdb, but I don't see any other X locks around.
>> >> >> >
>> >> >> > Has anyone else seen anything like this?
>> >> >> >
>> >> >> > Regards,
>> >> >> >
>> >> >> > Jonathan
>> >> >>
>> >> >>
>> >> >>
>> >>
>> >>
>> >>
>>
>>
>>
Author
24 May 2005 12:41 AM
Jonathan Levine
Santtu,

   Thanks for your reply.

"Santeri Voutilainen [MSFT]" wrote:
> To help track this down, could you turn on statistics profile (set
> statistics profile on), run your query, and send me the statistics profile
> output?  You can turn the statistics profile off after running the query.

I'll do my best.  The problem is that the main query I'm having trouble with
typically
no longer runs to completion. The query is input to a Report Services daily
report based
on data loaded daily.  The query used to finish in about 45 minutes (SP3a),
now it can
run for up to about 24 hours, at which point it deadlocks on the daily
INSERT and is
terminated.

Another query I'm having trouble with is done by Analysis Services to
refresh a partition.
This one used to take about 30 minutes, now it can take up to 4 hours.  The
problem with
this one is that the output is pretty big, so I'm not sure how long it will
take to run the
query in Query Analyzer.

One interesting thing: If I run the Report Services query as SELECT TOP n
(I've tried with n as
large as 10000), it finishes really quickly.  This is pretty strange to me,
as most of the
"heavy" work of the query is done in two sub-selects which are later joined
together.

> Do you have you auto stats enabled?  If not, do you regular update
> statistics?

I update statistics every day, and I automatically REINDEX when
fragmentation reaches about
10%.

Regards,

Jonathan
Show quote
>
> Thanks,
> Santeri (Santtu) Voutilainen
>
> This posting is provided "AS IS" with no warranties, and confers no rights.
>
>
> "Jonathan Levine" <myfoo2@nospam.nospam> wrote in message
> news:B8769300-13CC-4E09-8CE4-672585525EFC@microsoft.com...
> > Santu,
> >
> > Thanks for your reply.
> >
> > "Santeri Voutilainen [MSFT]" wrote:
> >> Do you happen to know whether your queries had any plan changes?  The
> >> increased PAGEIOLATCH waits and increased lock accumulation could be
> >> symptoms of a plan change, for example from an index seek to a table
> >> scan,
> >> or changes in join types.
> >
> > I don't think there's been any change in plan.
> >
> > I tried creating an indexed view to eliminate one of the joins on one of
> > the
> > queries.  It did simplify the Estimated plan,
> > and there are no table scans, but the query still does not complete.
> >
> > In the case of this query, if I limit it the query (for example SELECT TOP
> > 100 ...) it does complete.  (I estimate the result set for the full query
> > to
> > be
> > on the order of hundreds of thousands of records, with some fairly
> > straightforward GROUP BY/SUM calculations).
> >
> > It seems like this query ran for about 10 hours and then terminated on
> > deadlock.  I'm going to turn on the deadlock trace flags
> > to see if I can get some more information about it.
> >
> > In the case of the other query, it is being run on another server during
> > an
> > Analysis Services cube update.  Once again,
> > I suppose the plan could have changed from SP3a to SP4, but I haven't done
> > anything else that I would expect to make
> > the plan change.
> >
> > Regards,
> >
> > Jonathan
> >
> >>
> >> --
> >> Santeri (Santtu) Voutilainen
> >>
> >> This posting is provided "AS IS" with no warranties, and confers no
> >> rights.
> >>
> >>
> >> "Jonathan Levine" <myfoo2@nospam.nospam> wrote in message
> >> news:7E009F0E-C354-4975-854D-5A9A93036507@microsoft.com...
> >> > Santtu,
> >> >
> >> > Thanks very much for your reply.
> >> >
> >> > "Santeri Voutilainen [MSFT]" wrote:
> >> >> What you are seeing is not related to locks, rather it is related to
> >> >> the
> >> >> latch waits you see in sysprocesses.  In SP4 we are able to determine
> >> >> the
> >> >> blocking spid for latch waits when the blocking spid holds the latch
> >> >> in
> >> >> exclusive or update mode.  In these cases the blocking spid
> >> >> information
> >> >> is
> >> >> populated into the "blocked" field in sysprocesses.
> >> >
> >> > That  information is certainly going to be quite helpful.
> >> >
> >> > But what's alarming is that the queries in question seem to be running
> >> > much
> >> > slower than they did in SP3a.
> >> > As I think I mentioned, several of my queries now take 5-10x longer
> >> > than
> >> > they used to.  And one query that
> >> > used to complete in under an hour now takes about 10-12 hours, then is
> >> > terminated as a deadlock victim --
> >> > even though there shouldn't be any other queries running on the same
> >> > database at the time the deadlock is
> >> > detected.
> >> >
> >> > I'm not exactly sure what's going on.  It happens on two different
> >> > machines,
> >> > but only on specific queries.  One
> >> > more clue is that the queries seem to do a lot of work in parallel
> >> > fashion
> >> > (with 4-10 threads working on the
> >> > query).  They then go to one thread that spends a lot of time in
> >> > PAGEIOLATCH_SH state, and accumulate many
> >> > many locks (after five hours, one query had accumulated more than a
> >> > million
> >> > locks).
> >> >
> >> > Any suggestions gladly welcomed!
> >> >
> >> > Regards,
> >> >
> >> > Jonathan
> >> >
> >> >> So why is the spid blocked by itself?  Well, that has to do with how
> >> >> latches
> >> >> are used for IO operations.  When an IO is issued for page, a latch is
> >> >> held
> >> >> on the page.  Depending on the IO operation, the latch mode acquired
> >> >> is
> >> >> shared (SH) or exclusive (EX).  The latch for the IO is acquired by
> >> >> the
> >> >> thread that issues the IO.  Since all SQL Server IO operations are
> >> >> asynchronous, if the spid that issued the IO wants to wait for the IO
> >> >> to
> >> >> complete it will attempt to acquire another latch on the same page
> >> >> after
> >> >> issuing the IO.  The first latch is released when the IO completes.
> >> >> This
> >> >> release allows the second latch request to be granted.
> >> >> Here's an example of how this works:
> >> >> 1. Spid 55 wants to read page P1 which does not exist in the buffer
> >> >> pool.
> >> >> 2. Spid 55 acquires an EX latch on page P1 -- this marks spid 55 as
> >> >> owning
> >> >> the latch.  The latch is in an in memory data structure, not the
> >> >> physical
> >> >> page itself.  Since the page does not yet exist in memory the mode is
> >> >> EX
> >> >> in
> >> >> order to force other spids that may also want to access the page to
> >> >> wait
> >> >> for
> >> >> the IO to complete and also to prevent them from issueing a second IO
> >> >> operation for the same page.
> >> >> 3. Spid 55 issues the IO request to read P1 from disk.
> >> >> 4. Since Spid 55 wants to read the page, it must wait for the IO to
> >> >> complete.  It does this by attempting to acquire another latch (in
> >> >> this
> >> >> case
> >> >> a share (SH)) latch on the page.  Since the latch is already held in
> >> >> EX,
> >> >> the
> >> >> SH request is blocked and the spid is suspended.
> >> >> 5. Upon completion of the IO the EX latch on the page is released.
> >> >> 6. The release of the EX latch grants the SH latch to spid 55.
> >> >> 7. Spid 55 can now read the page.
> >> >>
> >> >> For the duration between steps 4 (the SH latch acquire) and step 5 (EX
> >> >> latch
> >> >> release) sysprocesses will indicate that spid 55 is blocked by itself
> >> >> with a
> >> >> wait type of PAGEIOLATCH_XX (where XX can be SH, UP, or EX) as an
> >> >> indication
> >> >> that it is waiting for the completion of an IO that it itself issued.
> >> >>
> >> >> --
> >> >> Santeri (Santtu) Voutilainen
> >> >>
> >> >> This posting is provided "AS IS" with no warranties, and confers no
> >> >> rights.
> >> >>
> >> >>
> >> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message
> >> >> news:7A8A5EF7-4459-40B4-99A7-0EDAE49DD8EF@microsoft.com...
> >> >> > Adrian,
> >> >> >
> >> >> > Thanks for your reply.
> >> >> >
> >> >> > I'm currently working on updating the stats for my big tables.   I
> >> >> > have
> >> >> > had
> >> >> > some update stats running for more than 24 hours -- is that normal?
> >> >> >
> >> >> > Here is an example of sp_locks and sysprocesses output for a query
> >> >> > that
> >> >> > took
> >> >> > about an hour in sp3a, and is taking 5-10 times longer with sp4.  I
> >> >> > have
> >> >> > updated the stats for all the tables in the query.
> >> >> >
> >> >> > sp_lock:
> >> >> > 64 8 0 0 DB                 S GRANT
> >> >> > 64 8 1332199796 0 TAB                 Sch-S GRANT
> >> >> > 64 8 435532635 1 PAG 5:3926175       IS GRANT
> >> >> > 64 8 1316199739 0 TAB                 Sch-S GRANT
> >> >> > 64 8 1145823194 1 PAG 5:3999638       S GRANT
> >> >> > 64 8 1284199625 0 TAB                 IS GRANT
> >> >> > 64 8 435532635 1 KEY (7b001c56dc00)  S GRANT
> >> >> > 64 8 1333631844 0 TAB                 IS GRANT
> >> >> > 64 8 1145823194 0 TAB                 IS GRANT
> >> >> > 64 8 435532635 0 TAB                 IS GRANT
> >> >> > 64 8 784721848 0 TAB                 IS GRANT
> >> >> > 64 8 1236199454 0 TAB                 Sch-S GRANT
> >> >> > 64 8 435532635 2 PAG 4:3195035       IS GRANT
> >> >> > 64 8 1076198884 0 TAB                 Sch-S GRANT
> >> >> > 64 8 1284199625 1 PAG 1:4998275       IS GRANT
> >> >> > 64 8 1333631844 1 PAG 4:3107397       IS GRANT
> >> >> >
> >> >> > sysprocesses
> >> >> > 64 4232 64 0x0422 0 PAGEIOLATCH_SH 8:1:5156833 8 0 13219878 1160381
> >> >> > 0
> >> >> > 2005-05-14
> >> >> > 11:53:46.567 2005-05-14 12:04:11.867 0 0 sleeping
> >> >> > 0x2B82AF0AE9CA284BB2C9C1DC1A71D94A00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> >> >> > KAGUYA-Cube
> >> >> >
> >> >> >                                      Microsoft SQL Server Analysis
> >> >> > Services
> >> >> >
> >> >> >           2104    SELECT
> >> >> >
> >> >> >
> >> >> >
> >> >> > 000D601CAB48 TCP/IP      analysis
> >> >> >
> >> >> > 0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> >> >> > 0x010008001012822AC080EDB40000000000000000 76 -1
> >> >> >
> >> >> >
> >> >> > Appreciate your help.
> >> >> >
> >> >> > -- J
> >> >> > "Adrian Zajkeskovic" wrote:
> >> >> >
> >> >> >> If you haven't done it already, update statistics with full scan or
> >> >> >> run
> >> >> >> dbcc
> >> >> >> dbreindex. Then reproduce the problem again and post the output of
> >> >> >> sp_lock
> >> >> >> and select * from sysprocesses for that spid.
> >> >> >>
> >> >> >> Adrian
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message
> >> >> >> news:A9F5109B-6C54-424B-9D03-A9B84E01B7F8@microsoft.com...
> >> >> >> > Sorry for premature posting.  Here's what I meant to ask:
> >> >> >> >
> >> >> >> > I've noticed that several heavy queries that worked fine in SP3a
> >> >> >> > now
> >> >> >> > have
> >> >> >> > unusual behavior.  The queries report that they are terminated as
> >> >> >> > a
> >> >> >> > deadlock
> >> >> >> > victim; however, there aren't any DML operations running on the
> >> >> >> > database
> >> >> >> > at
> >> >> >> > the time.  When I look at the status with sp_who2, I see that the
> >> >> >> > query
> >> >> >> > appears to be blocked by itself:
> >> >> >> >
> >> >> >> > SPID         Status      Login              BlkBy     Command
> >> >> >> > 71   sleeping   reporting ... 71     ... SELECT
> >> >> >> >
> >> >> >> > sp_lock reports that the process has an exclusive lock on an
> >> >> >> > extent
> >> >> >> > in
> >> >> >> > tempdb, but I don't see any other X locks around.
> >> >> >> >
> >> >> >> > Has anyone else seen anything like this?
> >> >> >> >
> >> >> >> > Regards,
> >> >> >> >
> >> >> >> > Jonathan
> >> >> >>
> >> >> >>
> >> >> >>
> >> >>
> >> >>
> >> >>
> >>
> >>
> >>
>
>
>
Author
25 May 2005 1:33 AM
Jonathan Levine
Santtu,

   Thanks for your reply via email to my statistics output.

    Here's something else interesting.  If I add a normal index on the
columns DateTable.year, DateTable.month
and DateTable.quarter, then the SP4 query plan for:

SELECT "dbo"."PurchaseCubeView_2"."Gender",
"dbo"."PurchaseCubeView_2"."Age",
"dbo"."DateView"."day", "dbo"."PurchaseCubeView_2"."genre_id",
"dbo"."ItemView"."shop_item_id", "dbo"."PurchaseCubeView_2"."units",
"dbo"."PurchaseCubeView_2"."price",
"dbo"."PurchaseCubeView_2"."price"*"dbo"."PurchaseCubeView_2"."units"
FROM "dbo"."PurchaseCubeView_2", "dbo"."DateView", "dbo"."ItemView"
WHERE
(reg_date >= '2005-5-1' and reg_date < '2005-6-1') AND
("dbo"."DateView"."month"=5) AND
("dbo"."PurchaseCubeView_2"."reg_date"="dbo"."DateView"."date_to_hour") AND
("dbo"."DateView"."quarter"=2) AND ("dbo"."DateView"."year"=2005) AND
("dbo"."ItemView"."shop_id"="dbo"."PurchaseCubeView_2"."shop_id") AND
("dbo"."ItemView"."item_id"="dbo"."PurchaseCubeView_2"."item_id")

is much, much worse than the plan for

SELECT "dbo"."PurchaseCubeView_2"."Gender",
"dbo"."PurchaseCubeView_2"."Age",
"dbo"."DateView"."day", "dbo"."PurchaseCubeView_2"."genre_id",
"dbo"."ItemView"."shop_item_id", "dbo"."PurchaseCubeView_2"."units",
"dbo"."PurchaseCubeView_2"."price",
"dbo"."PurchaseCubeView_2"."price"*"dbo"."PurchaseCubeView_2"."units"
FROM "dbo"."PurchaseCubeView_2", "dbo"."DateView", "dbo"."ItemView"
WHERE
(reg_date >= '2005-5-1' and reg_date < '2005-6-1') AND
(date_to_hour >= '2005-5-1' and date_to_hour < '2005-6-1') AND
("dbo"."DateView"."month"=5) AND
("dbo"."PurchaseCubeView_2"."reg_date"="dbo"."DateView"."date_to_hour") AND
("dbo"."DateView"."quarter"=2) AND ("dbo"."DateView"."year"=2005) AND
("dbo"."ItemView"."shop_id"="dbo"."PurchaseCubeView_2"."shop_id") AND
("dbo"."ItemView"."item_id"="dbo"."PurchaseCubeView_2"."item_id")

There is a clustered index on DateTable.date_to_hour (the underlying table
for the view)

The first query (without (date_to_hour >= '2005-5-1' and date_to_hour <
'2005-6-1'))
seems to scan the entire DateTable clustered index (about 3 million rows).

The second query (with (date_to_hour >= '2005-5-1' and date_to_hour <
'2005-6-1'))
does an index seek on the non-clustered index (about 500 rows).

I think this is the problem.  Seems like a regression in SP4?  I can provide
more details if you'd like.

Regards,

Jonathan

Show quote
"Jonathan Levine" wrote:

> Santtu,
>
>    Thanks for your reply.
>
> "Santeri Voutilainen [MSFT]" wrote:
> > To help track this down, could you turn on statistics profile (set
> > statistics profile on), run your query, and send me the statistics profile
> > output?  You can turn the statistics profile off after running the query.
>
> I'll do my best.  The problem is that the main query I'm having trouble with
> typically
> no longer runs to completion. The query is input to a Report Services daily
> report based
> on data loaded daily.  The query used to finish in about 45 minutes (SP3a),
> now it can
> run for up to about 24 hours, at which point it deadlocks on the daily
> INSERT and is
> terminated.
>
> Another query I'm having trouble with is done by Analysis Services to
> refresh a partition.
> This one used to take about 30 minutes, now it can take up to 4 hours.  The
> problem with
> this one is that the output is pretty big, so I'm not sure how long it will
> take to run the
> query in Query Analyzer.
>
> One interesting thing: If I run the Report Services query as SELECT TOP n
> (I've tried with n as
> large as 10000), it finishes really quickly.  This is pretty strange to me,
> as most of the
> "heavy" work of the query is done in two sub-selects which are later joined
> together.
>
> > Do you have you auto stats enabled?  If not, do you regular update
> > statistics?
>
> I update statistics every day, and I automatically REINDEX when
> fragmentation reaches about
> 10%.
>
> Regards,
>
> Jonathan
> >
> > Thanks,
> > Santeri (Santtu) Voutilainen
> >
> > This posting is provided "AS IS" with no warranties, and confers no rights.
> >
> >
> > "Jonathan Levine" <myfoo2@nospam.nospam> wrote in message
> > news:B8769300-13CC-4E09-8CE4-672585525EFC@microsoft.com...
> > > Santu,
> > >
> > > Thanks for your reply.
> > >
> > > "Santeri Voutilainen [MSFT]" wrote:
> > >> Do you happen to know whether your queries had any plan changes?  The
> > >> increased PAGEIOLATCH waits and increased lock accumulation could be
> > >> symptoms of a plan change, for example from an index seek to a table
> > >> scan,
> > >> or changes in join types.
> > >
> > > I don't think there's been any change in plan.
> > >
> > > I tried creating an indexed view to eliminate one of the joins on one of
> > > the
> > > queries.  It did simplify the Estimated plan,
> > > and there are no table scans, but the query still does not complete.
> > >
> > > In the case of this query, if I limit it the query (for example SELECT TOP
> > > 100 ...) it does complete.  (I estimate the result set for the full query
> > > to
> > > be
> > > on the order of hundreds of thousands of records, with some fairly
> > > straightforward GROUP BY/SUM calculations).
> > >
> > > It seems like this query ran for about 10 hours and then terminated on
> > > deadlock.  I'm going to turn on the deadlock trace flags
> > > to see if I can get some more information about it.
> > >
> > > In the case of the other query, it is being run on another server during
> > > an
> > > Analysis Services cube update.  Once again,
> > > I suppose the plan could have changed from SP3a to SP4, but I haven't done
> > > anything else that I would expect to make
> > > the plan change.
> > >
> > > Regards,
> > >
> > > Jonathan
> > >
> > >>
> > >> --
> > >> Santeri (Santtu) Voutilainen
> > >>
> > >> This posting is provided "AS IS" with no warranties, and confers no
> > >> rights.
> > >>
> > >>
> > >> "Jonathan Levine" <myfoo2@nospam.nospam> wrote in message
> > >> news:7E009F0E-C354-4975-854D-5A9A93036507@microsoft.com...
> > >> > Santtu,
> > >> >
> > >> > Thanks very much for your reply.
> > >> >
> > >> > "Santeri Voutilainen [MSFT]" wrote:
> > >> >> What you are seeing is not related to locks, rather it is related to
> > >> >> the
> > >> >> latch waits you see in sysprocesses.  In SP4 we are able to determine
> > >> >> the
> > >> >> blocking spid for latch waits when the blocking spid holds the latch
> > >> >> in
> > >> >> exclusive or update mode.  In these cases the blocking spid
> > >> >> information
> > >> >> is
> > >> >> populated into the "blocked" field in sysprocesses.
> > >> >
> > >> > That  information is certainly going to be quite helpful.
> > >> >
> > >> > But what's alarming is that the queries in question seem to be running
> > >> > much
> > >> > slower than they did in SP3a.
> > >> > As I think I mentioned, several of my queries now take 5-10x longer
> > >> > than
> > >> > they used to.  And one query that
> > >> > used to complete in under an hour now takes about 10-12 hours, then is
> > >> > terminated as a deadlock victim --
> > >> > even though there shouldn't be any other queries running on the same
> > >> > database at the time the deadlock is
> > >> > detected.
> > >> >
> > >> > I'm not exactly sure what's going on.  It happens on two different
> > >> > machines,
> > >> > but only on specific queries.  One
> > >> > more clue is that the queries seem to do a lot of work in parallel
> > >> > fashion
> > >> > (with 4-10 threads working on the
> > >> > query).  They then go to one thread that spends a lot of time in
> > >> > PAGEIOLATCH_SH state, and accumulate many
> > >> > many locks (after five hours, one query had accumulated more than a
> > >> > million
> > >> > locks).
> > >> >
> > >> > Any suggestions gladly welcomed!
> > >> >
> > >> > Regards,
> > >> >
> > >> > Jonathan
> > >> >
> > >> >> So why is the spid blocked by itself?  Well, that has to do with how
> > >> >> latches
> > >> >> are used for IO operations.  When an IO is issued for page, a latch is
> > >> >> held
> > >> >> on the page.  Depending on the IO operation, the latch mode acquired
> > >> >> is
> > >> >> shared (SH) or exclusive (EX).  The latch for the IO is acquired by
> > >> >> the
> > >> >> thread that issues the IO.  Since all SQL Server IO operations are
> > >> >> asynchronous, if the spid that issued the IO wants to wait for the IO
> > >> >> to
> > >> >> complete it will attempt to acquire another latch on the same page
> > >> >> after
> > >> >> issuing the IO.  The first latch is released when the IO completes.
> > >> >> This
> > >> >> release allows the second latch request to be granted.
> > >> >> Here's an example of how this works:
> > >> >> 1. Spid 55 wants to read page P1 which does not exist in the buffer
> > >> >> pool.
> > >> >> 2. Spid 55 acquires an EX latch on page P1 -- this marks spid 55 as
> > >> >> owning
> > >> >> the latch.  The latch is in an in memory data structure, not the
> > >> >> physical
> > >> >> page itself.  Since the page does not yet exist in memory the mode is
> > >> >> EX
> > >> >> in
> > >> >> order to force other spids that may also want to access the page to
> > >> >> wait
> > >> >> for
> > >> >> the IO to complete and also to prevent them from issueing a second IO
> > >> >> operation for the same page.
> > >> >> 3. Spid 55 issues the IO request to read P1 from disk.
> > >> >> 4. Since Spid 55 wants to read the page, it must wait for the IO to
> > >> >> complete.  It does this by attempting to acquire another latch (in
> > >> >> this
> > >> >> case
> > >> >> a share (SH)) latch on the page.  Since the latch is already held in
> > >> >> EX,
> > >> >> the
> > >> >> SH request is blocked and the spid is suspended.
> > >> >> 5. Upon completion of the IO the EX latch on the page is released.
> > >> >> 6. The release of the EX latch grants the SH latch to spid 55.
> > >> >> 7. Spid 55 can now read the page.
> > >> >>
> > >> >> For the duration between steps 4 (the SH latch acquire) and step 5 (EX
> > >> >> latch
> > >> >> release) sysprocesses will indicate that spid 55 is blocked by itself
> > >> >> with a
> > >> >> wait type of PAGEIOLATCH_XX (where XX can be SH, UP, or EX) as an
> > >> >> indication
> > >> >> that it is waiting for the completion of an IO that it itself issued.
> > >> >>
> > >> >> --
> > >> >> Santeri (Santtu) Voutilainen
> > >> >>
> > >> >> This posting is provided "AS IS" with no warranties, and confers no
> > >> >> rights.
> > >> >>
> > >> >>
> > >> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message
> > >> >> news:7A8A5EF7-4459-40B4-99A7-0EDAE49DD8EF@microsoft.com...
> > >> >> > Adrian,
> > >> >> >
> > >> >> > Thanks for your reply.
> > >> >> >
> > >> >> > I'm currently working on updating the stats for my big tables.   I
> > >> >> > have
> > >> >> > had
> > >> >> > some update stats running for more than 24 hours -- is that normal?
> > >> >> >
> > >> >> > Here is an example of sp_locks and sysprocesses output for a query
> > >> >> > that
> > >> >> > took
> > >> >> > about an hour in sp3a, and is taking 5-10 times longer with sp4.  I
> > >> >> > have
> > >> >> > updated the stats for all the tables in the query.
> > >> >> >
> > >> >> > sp_lock:
> > >> >> > 64 8 0 0 DB                 S GRANT
> > >> >> > 64 8 1332199796 0 TAB                 Sch-S GRANT
> > >> >> > 64 8 435532635 1 PAG 5:3926175       IS GRANT
> > >> >> > 64 8 1316199739 0 TAB                 Sch-S GRANT
> > >> >> > 64 8 1145823194 1 PAG 5:3999638       S GRANT
> > >> >> > 64 8 1284199625 0 TAB                 IS GRANT
> > >> >> > 64 8 435532635 1 KEY (7b001c56dc00)  S GRANT
> > >> >> > 64 8 1333631844 0 TAB                 IS GRANT
> > >> >> > 64 8 1145823194 0 TAB                 IS GRANT
> > >> >> > 64 8 435532635 0 TAB                 IS GRANT
> > >> >> > 64 8 784721848 0 TAB                 IS GRANT
> > >> >> > 64 8 1236199454 0 TAB                 Sch-S GRANT
> > >> >> > 64 8 435532635 2 PAG 4:3195035       IS GRANT
> > >> >> > 64 8 1076198884 0 TAB                 Sch-S GRANT
> > >> >> > 64 8 1284199625 1 PAG 1:4998275       IS GRANT
> > >> >> > 64 8 1333631844 1 PAG 4:3107397       IS GRANT
> > >> >> >
> > >> >> > sysprocesses
> > >> >> > 64 4232 64 0x0422 0 PAGEIOLATCH_SH 8:1:5156833 8 0 13219878 1160381
> > >> >> > 0
> > >> >> > 2005-05-14
> > >> >> > 11:53:46.567 2005-05-14 12:04:11.867 0 0 sleeping
> > >> >> > 0x2B82AF0AE9CA284BB2C9C1DC1A71D94A00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> > >> >> > KAGUYA-Cube
> > >> >> >
> > >> >> >                                      Microsoft SQL Server Analysis
> > >> >> > Services
> > >> >> >
> > >> >> >           2104    SELECT
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > 000D601CAB48 TCP/IP      analysis
> > >> >> >
> > >> >> > 0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> > >> >> > 0x010008001012822AC080EDB40000000000000000 76 -1
> > >> >> >
> > >> >> >
> > >> >> > Appreciate your help.
> > >> >> >
> > >> >> > -- J
> > >> >> > "Adrian Zajkeskovic" wrote:
> > >> >> >
> > >> >> >> If you haven't done it already, update statistics with full scan or
> > >> >> >> run
> > >> >> >> dbcc
> > >> >> >> dbreindex. Then reproduce the problem again and post the output of
> > >> >> >> sp_lock
> > >> >> >> and select * from sysprocesses for that spid.
> > >> >> >>
> > >> >> >> Adrian
> > >> >> >>
> > >> >> >>
> > >> >> >>
> > >> >> >>
> > >> >> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message
> > >> >> >> news:A9F5109B-6C54-424B-9D03-A9B84E01B7F8@microsoft.com...
> > >> >> >> > Sorry for premature posting.  Here's what I meant to ask:
> > >> >> >> >
> > >> >> >> > I've noticed that several heavy queries that worked fine in SP3a
> > >> >> >> > now
> > >> >> >> > have
> > >> >> >> > unusual behavior.  The queries report that they are terminated as
> > >> >> >> > a
> > >> >> >> > deadlock
> > >> >> >> > victim; however, there aren't any DML operations running on the
> > >> >> >> > database
> > >> >> >> > at
> > >> >> >> > the time.  When I look at the status with sp_who2, I see that the
> > >> >> >> > query
> > >> >> >> > appears to be blocked by itself:
Author
24 May 2005 7:15 AM
Jonathan Levine
Santtu,

    Here's something strange.

    When I run the query from Query Analyzer, it seems to compile to a
different execution
plan than when I run the same query from Analysis Services SP1.

    I've tried looking at SQL Profiler output for Show Plan All, and when I
run the query from Query Analyzer,
it picks an Indexed View to use, and runs much faster.

    When I run the same query from Analysis Services (I cut and paste), it
picks the underlying tables, and runs
much, much slower.

    I'm looking into this some more, but if this reminds you of something,
I'd appreciate the help...

Regards,

Jonathan
Show quote
"Santeri Voutilainen [MSFT]" wrote:

> Jonathan,
>
> To help track this down, could you turn on statistics profile (set
> statistics profile on), run your query, and send me the statistics profile
> output?  You can turn the statistics profile off after running the query.
> Do you have you auto stats enabled?  If not, do you regular update
> statistics?
>
> Thanks,
> Santeri (Santtu) Voutilainen
>
> This posting is provided "AS IS" with no warranties, and confers no rights.
>
>
> "Jonathan Levine" <myfoo2@nospam.nospam> wrote in message
> news:B8769300-13CC-4E09-8CE4-672585525EFC@microsoft.com...
> > Santu,
> >
> > Thanks for your reply.
> >
> > "Santeri Voutilainen [MSFT]" wrote:
> >> Do you happen to know whether your queries had any plan changes?  The
> >> increased PAGEIOLATCH waits and increased lock accumulation could be
> >> symptoms of a plan change, for example from an index seek to a table
> >> scan,
> >> or changes in join types.
> >
> > I don't think there's been any change in plan.
> >
> > I tried creating an indexed view to eliminate one of the joins on one of
> > the
> > queries.  It did simplify the Estimated plan,
> > and there are no table scans, but the query still does not complete.
> >
> > In the case of this query, if I limit it the query (for example SELECT TOP
> > 100 ...) it does complete.  (I estimate the result set for the full query
> > to
> > be
> > on the order of hundreds of thousands of records, with some fairly
> > straightforward GROUP BY/SUM calculations).
> >
> > It seems like this query ran for about 10 hours and then terminated on
> > deadlock.  I'm going to turn on the deadlock trace flags
> > to see if I can get some more information about it.
> >
> > In the case of the other query, it is being run on another server during
> > an
> > Analysis Services cube update.  Once again,
> > I suppose the plan could have changed from SP3a to SP4, but I haven't done
> > anything else that I would expect to make
> > the plan change.
> >
> > Regards,
> >
> > Jonathan
> >
> >>
> >> --
> >> Santeri (Santtu) Voutilainen
> >>
> >> This posting is provided "AS IS" with no warranties, and confers no
> >> rights.
> >>
> >>
> >> "Jonathan Levine" <myfoo2@nospam.nospam> wrote in message
> >> news:7E009F0E-C354-4975-854D-5A9A93036507@microsoft.com...
> >> > Santtu,
> >> >
> >> > Thanks very much for your reply.
> >> >
> >> > "Santeri Voutilainen [MSFT]" wrote:
> >> >> What you are seeing is not related to locks, rather it is related to
> >> >> the
> >> >> latch waits you see in sysprocesses.  In SP4 we are able to determine
> >> >> the
> >> >> blocking spid for latch waits when the blocking spid holds the latch
> >> >> in
> >> >> exclusive or update mode.  In these cases the blocking spid
> >> >> information
> >> >> is
> >> >> populated into the "blocked" field in sysprocesses.
> >> >
> >> > That  information is certainly going to be quite helpful.
> >> >
> >> > But what's alarming is that the queries in question seem to be running
> >> > much
> >> > slower than they did in SP3a.
> >> > As I think I mentioned, several of my queries now take 5-10x longer
> >> > than
> >> > they used to.  And one query that
> >> > used to complete in under an hour now takes about 10-12 hours, then is
> >> > terminated as a deadlock victim --
> >> > even though there shouldn't be any other queries running on the same
> >> > database at the time the deadlock is
> >> > detected.
> >> >
> >> > I'm not exactly sure what's going on.  It happens on two different
> >> > machines,
> >> > but only on specific queries.  One
> >> > more clue is that the queries seem to do a lot of work in parallel
> >> > fashion
> >> > (with 4-10 threads working on the
> >> > query).  They then go to one thread that spends a lot of time in
> >> > PAGEIOLATCH_SH state, and accumulate many
> >> > many locks (after five hours, one query had accumulated more than a
> >> > million
> >> > locks).
> >> >
> >> > Any suggestions gladly welcomed!
> >> >
> >> > Regards,
> >> >
> >> > Jonathan
> >> >
> >> >> So why is the spid blocked by itself?  Well, that has to do with how
> >> >> latches
> >> >> are used for IO operations.  When an IO is issued for page, a latch is
> >> >> held
> >> >> on the page.  Depending on the IO operation, the latch mode acquired
> >> >> is
> >> >> shared (SH) or exclusive (EX).  The latch for the IO is acquired by
> >> >> the
> >> >> thread that issues the IO.  Since all SQL Server IO operations are
> >> >> asynchronous, if the spid that issued the IO wants to wait for the IO
> >> >> to
> >> >> complete it will attempt to acquire another latch on the same page
> >> >> after
> >> >> issuing the IO.  The first latch is released when the IO completes.
> >> >> This
> >> >> release allows the second latch request to be granted.
> >> >> Here's an example of how this works:
> >> >> 1. Spid 55 wants to read page P1 which does not exist in the buffer
> >> >> pool.
> >> >> 2. Spid 55 acquires an EX latch on page P1 -- this marks spid 55 as
> >> >> owning
> >> >> the latch.  The latch is in an in memory data structure, not the
> >> >> physical
> >> >> page itself.  Since the page does not yet exist in memory the mode is
> >> >> EX
> >> >> in
> >> >> order to force other spids that may also want to access the page to
> >> >> wait
> >> >> for
> >> >> the IO to complete and also to prevent them from issueing a second IO
> >> >> operation for the same page.
> >> >> 3. Spid 55 issues the IO request to read P1 from disk.
> >> >> 4. Since Spid 55 wants to read the page, it must wait for the IO to
> >> >> complete.  It does this by attempting to acquire another latch (in
> >> >> this
> >> >> case
> >> >> a share (SH)) latch on the page.  Since the latch is already held in
> >> >> EX,
> >> >> the
> >> >> SH request is blocked and the spid is suspended.
> >> >> 5. Upon completion of the IO the EX latch on the page is released.
> >> >> 6. The release of the EX latch grants the SH latch to spid 55.
> >> >> 7. Spid 55 can now read the page.
> >> >>
> >> >> For the duration between steps 4 (the SH latch acquire) and step 5 (EX
> >> >> latch
> >> >> release) sysprocesses will indicate that spid 55 is blocked by itself
> >> >> with a
> >> >> wait type of PAGEIOLATCH_XX (where XX can be SH, UP, or EX) as an
> >> >> indication
> >> >> that it is waiting for the completion of an IO that it itself issued.
> >> >>
> >> >> --
> >> >> Santeri (Santtu) Voutilainen
> >> >>
> >> >> This posting is provided "AS IS" with no warranties, and confers no
> >> >> rights.
> >> >>
> >> >>
> >> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message
> >> >> news:7A8A5EF7-4459-40B4-99A7-0EDAE49DD8EF@microsoft.com...
> >> >> > Adrian,
> >> >> >
> >> >> > Thanks for your reply.
> >> >> >
> >> >> > I'm currently working on updating the stats for my big tables.   I
> >> >> > have
> >> >> > had
> >> >> > some update stats running for more than 24 hours -- is that normal?
> >> >> >
> >> >> > Here is an example of sp_locks and sysprocesses output for a query
> >> >> > that
> >> >> > took
> >> >> > about an hour in sp3a, and is taking 5-10 times longer with sp4.  I
> >> >> > have
> >> >> > updated the stats for all the tables in the query.
> >> >> >
> >> >> > sp_lock:
> >> >> > 64 8 0 0 DB                 S GRANT
> >> >> > 64 8 1332199796 0 TAB                 Sch-S GRANT
> >> >> > 64 8 435532635 1 PAG 5:3926175       IS GRANT
> >> >> > 64 8 1316199739 0 TAB                 Sch-S GRANT
> >> >> > 64 8 1145823194 1 PAG 5:3999638       S GRANT
> >> >> > 64 8 1284199625 0 TAB                 IS GRANT
> >> >> > 64 8 435532635 1 KEY (7b001c56dc00)  S GRANT
> >> >> > 64 8 1333631844 0 TAB                 IS GRANT
> >> >> > 64 8 1145823194 0 TAB                 IS GRANT
> >> >> > 64 8 435532635 0 TAB                 IS GRANT
> >> >> > 64 8 784721848 0 TAB                 IS GRANT
> >> >> > 64 8 1236199454 0 TAB                 Sch-S GRANT
> >> >> > 64 8 435532635 2 PAG 4:3195035       IS GRANT
> >> >> > 64 8 1076198884 0 TAB                 Sch-S GRANT
> >> >> > 64 8 1284199625 1 PAG 1:4998275       IS GRANT
> >> >> > 64 8 1333631844 1 PAG 4:3107397       IS GRANT
> >> >> >
> >> >> > sysprocesses
> >> >> > 64 4232 64 0x0422 0 PAGEIOLATCH_SH 8:1:5156833 8 0 13219878 1160381
> >> >> > 0
> >> >> > 2005-05-14
> >> >> > 11:53:46.567 2005-05-14 12:04:11.867 0 0 sleeping
> >> >> > 0x2B82AF0AE9CA284BB2C9C1DC1A71D94A00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> >> >> > KAGUYA-Cube
> >> >> >
> >> >> >                                      Microsoft SQL Server Analysis
> >> >> > Services
> >> >> >
> >> >> >           2104    SELECT
> >> >> >
> >> >> >
> >> >> >
> >> >> > 000D601CAB48 TCP/IP      analysis
> >> >> >
> >> >> > 0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> >> >> > 0x010008001012822AC080EDB40000000000000000 76 -1
> >> >> >
> >> >> >
> >> >> > Appreciate your help.
> >> >> >
> >> >> > -- J
> >> >> > "Adrian Zajkeskovic" wrote:
> >> >> >
> >> >> >> If you haven't done it already, update statistics with full scan or
> >> >> >> run
> >> >> >> dbcc
> >> >> >> dbreindex. Then reproduce the problem again and post the output of
> >> >> >> sp_lock
> >> >> >> and select * from sysprocesses for that spid.
> >> >> >>
> >> >> >> Adrian
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> "Jonathan Levine" <myfoo@nospam.nospam> wrote in message
> >>