Home All Groups Group Topic Archive Search About

Intermittent and undesired SQL execution performance behavior



Author
9 Dec 2008 7:29 PM
Martin Gasevski
All,

Please help!  Have a trully bizarre (at least to me) performance case that
cannot put a finger on and need prompt resolution advice as its been
troubling me for weeks on.

The sproc is attached.   At first I suspected its the nested views, so
created a flat sproc code where the views were all "AS my*" aliases.  The
issue still comes back.

The PROBLEM:
- One sproc that accepts bigint (a userID) at times performs as expected:
about 2500 logical disk reads; CPU and Duration well under 1sec.
- At irregular times during the day (or night), the same sproc takes 5-7min
and logical data reads reach over 11 million!!  I am not able to reproduce
this even when copy/paste the same RPC:Completed call into SQL Mgmt Studio
and run it immediately after seeing the anomalous behavior.

Note: The issue is intermittently observed even when same userID is passed.
Example of random userId:  these sometimes perform fine, sometimes i see the
11+ million reads.
"exec GetUserWorkflowCaseWorklist @userId=33" or "exec
GetUserWorkflowCaseWorklist @userId=52"

I will provide SQL Profiler Traces of 'good' and 'bad' performing SProcs
along
with execution plans additionally.

The SYSTEM:
Windows Server 2008 x32-bit
2 x QuadCore CPUs (not HT as Task Manager shows 8 CPU windows)
4GB RAM
SQL: all system mdf/ldf and user mdf files are on C:\ disk.  One exception,
the db's ldf file (whose sproc is mis-behaving) is on disk D:\.


(see Results03.txt)  All static values seem default SQL2005:
- SQL2005 SP2 STD
- Recovery model = Simple
- Compat. Level = SQL90
- data growth = 100MB, autogrow
- Log growth 10%, autogrow
- Nightly maint. plan run (all user databases, all default values, in
following order):
    a) Check DB Integrity
    b) Rebuild INdexes
    c) Reorganize Indexes
    d) Shrink Database
    e) Update Stats
    f) Cleanup History

Don't want to lead the thread and the discussions, but i cannot determine
where are those 11M reads spent on; the major difference I see b/w the two
execution plans is the use of Paralellism in the good exec plan.  Could this
be the cause and if so, why would SQL Engine choose NOT to use parallelism
at times but decide to go with a single thread executing it all?

Thank you much in advance,
-mg


[attached file: Results03.txt]
[attached file: Corrected_GetUserWorkflowCaseWorklist_Query.sql]

Author
9 Dec 2008 7:53 PM
Martin Gasevski
[addendum]

Attached are both of the Execution Plans...

I will try running in SSMS the "SET ARITHABORT OFF" statement before the
offending sproc execution to see if I could repro the issue.  Even if I
could repro, this still leaves me with question of why is this happening and
how to avoid this 'randomness' I'm seeing.

Any help is appreciated,
-mg

Show quoteHide quote
"Martin Gasevski" <martinDOTgasevskiATautodeskDOTcom> wrote in message
news:%23oEGRSjWJHA.556@TK2MSFTNGP06.phx.gbl...
> All,
>
> Please help!  Have a trully bizarre (at least to me) performance case that
> cannot put a finger on and need prompt resolution advice as its been
> troubling me for weeks on.
>
> The sproc is attached.   At first I suspected its the nested views, so
> created a flat sproc code where the views were all "AS my*" aliases.  The
> issue still comes back.
>
> The PROBLEM:
> - One sproc that accepts bigint (a userID) at times performs as expected:
> about 2500 logical disk reads; CPU and Duration well under 1sec.
> - At irregular times during the day (or night), the same sproc takes
> 5-7min
> and logical data reads reach over 11 million!!  I am not able to reproduce
> this even when copy/paste the same RPC:Completed call into SQL Mgmt Studio
> and run it immediately after seeing the anomalous behavior.
>
> Note: The issue is intermittently observed even when same userID is
> passed.
> Example of random userId:  these sometimes perform fine, sometimes i see
> the
> 11+ million reads.
> "exec GetUserWorkflowCaseWorklist @userId=33" or "exec
> GetUserWorkflowCaseWorklist @userId=52"
>
> I will provide SQL Profiler Traces of 'good' and 'bad' performing SProcs
> along
> with execution plans additionally.
>
> The SYSTEM:
> Windows Server 2008 x32-bit
> 2 x QuadCore CPUs (not HT as Task Manager shows 8 CPU windows)
> 4GB RAM
> SQL: all system mdf/ldf and user mdf files are on C:\ disk.  One
> exception,
> the db's ldf file (whose sproc is mis-behaving) is on disk D:\.
>
>
> (see Results03.txt)  All static values seem default SQL2005:
> - SQL2005 SP2 STD
> - Recovery model = Simple
> - Compat. Level = SQL90
> - data growth = 100MB, autogrow
> - Log growth 10%, autogrow
> - Nightly maint. plan run (all user databases, all default values, in
> following order):
>    a) Check DB Integrity
>    b) Rebuild INdexes
>    c) Reorganize Indexes
>    d) Shrink Database
>    e) Update Stats
>    f) Cleanup History
>
> Don't want to lead the thread and the discussions, but i cannot determine
> where are those 11M reads spent on; the major difference I see b/w the two
> execution plans is the use of Paralellism in the good exec plan.  Could
> this
> be the cause and if so, why would SQL Engine choose NOT to use parallelism
> at times but decide to go with a single thread executing it all?
>
> Thank you much in advance,
> -mg
>
>
>
>

[attached file: GOOD vs. BAD.zip]
Are all your drivers up to date? click for free checkup

Author
10 Dec 2008 9:21 AM
Uri Dimant
Martin

You can try to recompile the sp every time it runs. Also SQL Server 2005
introduced OPTION(RECOMPILE) on statement level.

Run the following

GetUserWorkflowCaseWorklist @userId=33"
SET SHOWPLAN_XML ON

GetUserWorkflowCaseWorklist @userId=52"

Near the bottom of the XML document, you should see the following:

  <ParameterList>

                    <ColumnReference Column="@p"
ParameterCompiledValue="(33)" ParameterRuntimeValue="(52")" />

  </ParameterList>

Seeing the different values used for compilation and execution can lead you
to suspect an issue with parameter sniffing.





Show quoteHide quote
"Martin Gasevski" <martinDOTgasevskiATautodeskDOTcom> wrote in message
news:O56mhfjWJHA.4184@TK2MSFTNGP06.phx.gbl...
> [addendum]
>
> Attached are both of the Execution Plans...
>
> I will try running in SSMS the "SET ARITHABORT OFF" statement before the
> offending sproc execution to see if I could repro the issue.  Even if I
> could repro, this still leaves me with question of why is this happening
> and how to avoid this 'randomness' I'm seeing.
>
> Any help is appreciated,
> -mg
>
> "Martin Gasevski" <martinDOTgasevskiATautodeskDOTcom> wrote in message
> news:%23oEGRSjWJHA.556@TK2MSFTNGP06.phx.gbl...
>> All,
>>
>> Please help!  Have a trully bizarre (at least to me) performance case
>> that
>> cannot put a finger on and need prompt resolution advice as its been
>> troubling me for weeks on.
>>
>> The sproc is attached.   At first I suspected its the nested views, so
>> created a flat sproc code where the views were all "AS my*" aliases.  The
>> issue still comes back.
>>
>> The PROBLEM:
>> - One sproc that accepts bigint (a userID) at times performs as expected:
>> about 2500 logical disk reads; CPU and Duration well under 1sec.
>> - At irregular times during the day (or night), the same sproc takes
>> 5-7min
>> and logical data reads reach over 11 million!!  I am not able to
>> reproduce
>> this even when copy/paste the same RPC:Completed call into SQL Mgmt
>> Studio
>> and run it immediately after seeing the anomalous behavior.
>>
>> Note: The issue is intermittently observed even when same userID is
>> passed.
>> Example of random userId:  these sometimes perform fine, sometimes i see
>> the
>> 11+ million reads.
>> "exec GetUserWorkflowCaseWorklist @userId=33" or "exec
>> GetUserWorkflowCaseWorklist @userId=52"
>>
>> I will provide SQL Profiler Traces of 'good' and 'bad' performing SProcs
>> along
>> with execution plans additionally.
>>
>> The SYSTEM:
>> Windows Server 2008 x32-bit
>> 2 x QuadCore CPUs (not HT as Task Manager shows 8 CPU windows)
>> 4GB RAM
>> SQL: all system mdf/ldf and user mdf files are on C:\ disk.  One
>> exception,
>> the db's ldf file (whose sproc is mis-behaving) is on disk D:\.
>>
>>
>> (see Results03.txt)  All static values seem default SQL2005:
>> - SQL2005 SP2 STD
>> - Recovery model = Simple
>> - Compat. Level = SQL90
>> - data growth = 100MB, autogrow
>> - Log growth 10%, autogrow
>> - Nightly maint. plan run (all user databases, all default values, in
>> following order):
>>    a) Check DB Integrity
>>    b) Rebuild INdexes
>>    c) Reorganize Indexes
>>    d) Shrink Database
>>    e) Update Stats
>>    f) Cleanup History
>>
>> Don't want to lead the thread and the discussions, but i cannot determine
>> where are those 11M reads spent on; the major difference I see b/w the
>> two
>> execution plans is the use of Paralellism in the good exec plan.  Could
>> this
>> be the cause and if so, why would SQL Engine choose NOT to use
>> parallelism
>> at times but decide to go with a single thread executing it all?
>>
>> Thank you much in advance,
>> -mg
>>
>>
>>
>>
>
>
>

Bookmark and Share