Home All Groups Group Topic Archive Search About

Troubleshooting High CPU



Author
13 Mar 2007 5:56 PM
cbrichards
We are having occasional high, SUSTAINED, CPU. If we just let it go, it stays
sustained upwards of 90 percent. We have not discovered the root cause, but
when it occurs we are forced to restart SQL Server Service to get back to our
client service levels, since our clients our timing out. We have 18 servers,
all running Windows 2003 R2, with SQL Server 2005 SP1. We have approximately
40 databases per server (give or take, plus or minus), where the average size
is about 1.5 GB. All the servers have with 8 GB RAM, with MinServerMemory = 0,
and MaxServerMemory = 6144. The Total and Target Server memory counters all
run at about 6 GB. Each server is identical. Each database setup, options,
and configurations are identical. The tables within the databases have
identical schemas, but containing different data.

Our SQL Servers are dedicated SQL Servers, with no other applications.

When I say we occasionally have high CPU, it has only occurred on three of
our of our 18 servers. Of the three servers, on two of the servers it has
occurred once each, and the third server it has occurred twice. This third
server where it has occurred twice, it has happened these two times in the
last two weeks and usually occurs in the mid-morning hours, but on different
days of the week. We have TRAN backups that occur every hour, and take about
5 minutes to complete total, for the approximate 40 databases per server.

We run update statistics with fullscan every night, and reindex once a week.

During these high CPU occurances, in Task Manager, SQL Server is the top
process.

During these high CPU occurances, our reads and writes are at normal levels.

During these high CPU occurances, our buffer cache hit ratio is 99%.

Running CHECKPOINT on all databases, including TEMPDB, provides momentary
improvement, but in a matter of seconds, CPU is back and sustained over 90
percent.

Querying sys.sysprocesses, and viewing Activity Monitor does not reveal any
abnormal high CPU, but it does reveal quite a few open trans, and spwho does
show some blocking.

While I am not 100% sure, it appears our compilations escalate to
approximately 100-150 per second, while during normal CPU load we are at
approximately 20 per second. (Whether this is a sympton across all other high
CPU times we have experienced I do not know, but when it occurred yesterday,
this was the case.)

The SQL Error log has a couple of interesting lines:
2007-03-12 10:48:10.040 spid1s AppDomain 15 (<I have replaced the real
database name here>.dbo[runtime].14) is marked for unload due to memory
pressure.
2007-03-12 10:49:22.850 spid245 There is insufficient system memory to run
this query.

In looking at Memory Object Pages/Sec, this server that has had the high CPU
occur twice, the average is 316. This average is during normal CPU load. I do
not know what it is during high CPU load. However, another server that also
experienced high CPU, has Pages/Sec = 13. And another that has never
experience this high CPU has Pages/Sec = 247. Yet most of the 18 servers, are
in the range of 10-15 Pages/Sec.

I find that interesting, because, as I said, the servers, databases, and
table schemas are identical across all servers, the only difference being the
data. These servers, at least to our best attempts are load balanced. It is
interesting to me that identical servers, databases, etc., where two servers
each have such high Pages/Sec, compared to the other 16.

In reading on http://www.sql-server-performance.com/qdpma/inst_3_pmlogs.asp,
I came across the following: "The SQL Server process Working Set should track
closely with Private Bytes, the SQL Server:Buffer Manager->Total Pages
counters and the SQL Server:Memory Manager->Total Server Memory(KB) counters.
It is not necessary to log all of these counters. If the working set suddenly
drops much below private bytes, then this could indicate that the OS had to
taken memory from SQL Server for another application."

When running Private Bytes and Working Set on this server that has had the
two high CPU occurances, the Private Bytes averages 280300544 and stays
rather steady. However, the Working Set averages 116593746 and every 4 to 5
minutes dips down to near zero.

Which leads to another statement I read on
http://www.sql-server-performance.com/performance_monitor_counters_memory.asp
which states: "Generally, on a physical server dedicated to SQL Server with
an adequate amount of RAM, paging will average near zero. An adequate amount
of RAM for SQL Server is a server that has a Buffer Hit Cache Ratio
(described in more detail later) of 99% and higher. If you have a SQL Server
that has a Buffer Hit Cache Ratio of 99% or higher for a period of 24 hours,
but you are getting an average paging level of over 1, this generally means
that you may be running other applications on the physical server other than
SQL Server. If this is the case, you should remove those applications,
allowing SQL Server to be the only major application on the physical server.

If your SQL Server is not running any other applications, and paging exceeds
1 on average for a 24-hour period, this could mean that you have changed the
SQL Server memory settings. SQL Server should be configured so that it is set
to the "Dynamically configure SQL Server memory" option, and the "Maximum
Memory" setting should be set at the highest level. For optimum performance,
SQL Server should be allowed to take as much RAM as it wants for its own use
without having to compete for RAM with other applications."

So, while things point to a possible memory issue, I have run out of ideas on
how to perhaps prove my findings. Please help with or any other ideas.

--
Message posted via http://www.sqlmonster.com

Author
14 Mar 2007 2:50 AM
M A Srinivas
Show quote Hide quote
On Mar 13, 10:56 pm, "cbrichards" <u3288@uwe> wrote:
> We are having occasional high, SUSTAINED, CPU. If we just let it go, it stays
> sustained upwards of 90 percent. We have not discovered the root cause, but
> when it occurs we are forced to restart SQL Server Service to get back to our
> client service levels, since our clients our timing out. We have 18 servers,
> all running Windows 2003 R2, with SQL Server 2005 SP1. We have approximately
> 40 databases per server (give or take, plus or minus), where the average size
> is about 1.5 GB. All the servers have with 8 GB RAM, with MinServerMemory = 0,
> and MaxServerMemory = 6144. The Total and Target Server memory counters all
> run at about 6 GB. Each server is identical. Each database setup, options,
> and configurations are identical. The tables within the databases have
> identical schemas, but containing different data.
>
> Our SQL Servers are dedicated SQL Servers, with no other applications.
>
> When I say we occasionally have high CPU, it has only occurred on three of
> our of our 18 servers. Of the three servers, on two of the servers it has
> occurred once each, and the third server it has occurred twice. This third
> server where it has occurred twice, it has happened these two times in the
> last two weeks and usually occurs in the mid-morning hours, but on different
> days of the week. We have TRAN backups that occur every hour, and take about
> 5 minutes to complete total, for the approximate 40 databases per server.
>
> We run update statistics with fullscan every night, and reindex once a week.
>
> During these high CPU occurances, in Task Manager, SQL Server is the top
> process.
>
> During these high CPU occurances, our reads and writes are at normal levels.
>
> During these high CPU occurances, our buffer cache hit ratio is 99%.
>
> Running CHECKPOINT on all databases, including TEMPDB, provides momentary
> improvement, but in a matter of seconds, CPU is back and sustained over 90
> percent.
>
> Querying sys.sysprocesses, and viewing Activity Monitor does not reveal any
> abnormal high CPU, but it does reveal quite a few open trans, and spwho does
> show some blocking.
>
> While I am not 100% sure, it appears our compilations escalate to
> approximately 100-150 per second, while during normal CPU load we are at
> approximately 20 per second. (Whether this is a sympton across all other high
> CPU times we have experienced I do not know, but when it occurred yesterday,
> this was the case.)
>
> The SQL Error log has a couple of interesting lines:
> 2007-03-12 10:48:10.040 spid1s AppDomain 15 (<I have replaced the real
> database name here>.dbo[runtime].14) is marked for unload due to memory
> pressure.
> 2007-03-12 10:49:22.850 spid245 There is insufficient system memory to run
> this query.
>
> In looking at Memory Object Pages/Sec, this server that has had the high CPU
> occur twice, the average is 316. This average is during normal CPU load. I do
> not know what it is during high CPU load. However, another server that also
> experienced high CPU, has Pages/Sec = 13. And another that has never
> experience this high CPU has Pages/Sec = 247. Yet most of the 18 servers, are
> in the range of 10-15 Pages/Sec.
>
> I find that interesting, because, as I said, the servers, databases, and
> table schemas are identical across all servers, the only difference being the
> data. These servers, at least to our best attempts are load balanced. It is
> interesting to me that identical servers, databases, etc., where two servers
> each have such high Pages/Sec, compared to the other 16.
>
> In reading onhttp://www.sql-server-performance.com/qdpma/inst_3_pmlogs.asp,
> I came across the following: "The SQL Server process Working Set should track
> closely with Private Bytes, the SQL Server:Buffer Manager->Total Pages
> counters and the SQL Server:Memory Manager->Total Server Memory(KB) counters.
> It is not necessary to log all of these counters. If the working set suddenly
> drops much below private bytes, then this could indicate that the OS had to
> taken memory from SQL Server for another application."
>
> When running Private Bytes and Working Set on this server that has had the
> two high CPU occurances, the Private Bytes averages 280300544 and stays
> rather steady. However, the Working Set averages 116593746 and every 4 to 5
> minutes dips down to near zero.
>
> Which leads to another statement I read onhttp://www.sql-server-performance.com/performance_monitor_counters_me...
> which states: "Generally, on a physical server dedicated to SQL Server with
> an adequate amount of RAM, paging will average near zero. An adequate amount
> of RAM for SQL Server is a server that has a Buffer Hit Cache Ratio
> (described in more detail later) of 99% and higher. If you have a SQL Server
> that has a Buffer Hit Cache Ratio of 99% or higher for a period of 24 hours,
> but you are getting an average paging level of over 1, this generally means
> that you may be running other applications on the physical server other than
> SQL Server. If this is the case, you should remove those applications,
> allowing SQL Server to be the only major application on the physical server.
>
> If your SQL Server is not running any other applications, and paging exceeds
> 1 on average for a 24-hour period, this could mean that you have changed the
> SQL Server memory settings. SQL Server should be configured so that it is set
> to the "Dynamically configure SQL Server memory" option, and the "Maximum
> Memory" setting should be set at the highest level. For optimum performance,
> SQL Server should be allowed to take as much RAM as it wants for its own use
> without having to compete for RAM with other applications."
>
> So, while things point to a possible memory issue, I have run out of ideas on
> how to perhaps prove my findings. Please help with or any other ideas.
>
> --
> Message posted viahttp://www.sqlmonster.com

Following may be causes

1. Lot of cursors and unreleased cursors
2. Lot of Prepared statements in cache . These statements are not
going out of cache quickly. Look  at syscacheobjects table . If this
is the case use sp_executesql instead of prepared statements .
3. Look at DBCC MEMORY STATUS
4. sysprocesses table see cpu column

M A Srinivas
Are all your drivers up to date? click for free checkup

Author
16 Mar 2007 8:52 PM
cbrichards via SQLMonster.com
Thanks for the suggestions. I have responded to each suggestion, one with a
question.

1. Lot of cursors and unreleased cursors
We have very, very few cursors. I will not rule it out, as the procedures
that do have cursors could have possibly been hit hard at the time. I will
review my traces.

2. Lot of Prepared statements in cache.
Is this a count of cacheobjtype = 'Compiled Plan' ? Or is it Count(*) From
sys.syscacheobjects with no filtering?

3. Look at DBCC MEMORY STATUS
I have had a look at this and my target and committed values were identical.

4. sysprocesses table see cpu column
I ran this during the high CPU episode and no one process stood out as
consuming CPU greatly one over the other.

M A Srinivas wrote:
Show quoteHide quote
>> We are having occasional high, SUSTAINED, CPU. If we just let it go, it stays
>> sustained upwards of 90 percent. We have not discovered the root cause, but
>[quoted text clipped - 98 lines]
>> --
>> Message posted viahttp://www.sqlmonster.com
>
>Following may be causes
>
>1. Lot of cursors and unreleased cursors
>2. Lot of Prepared statements in cache . These statements are not
>going out of cache quickly. Look  at syscacheobjects table . If this
>is the case use sp_executesql instead of prepared statements .
>3. Look at DBCC MEMORY STATUS
>4. sysprocesses table see cpu column
>
>M A Srinivas

Author
17 Mar 2007 2:49 PM
Andrew J. Kelly
That's a lot to digest but a few comments none the less. One is that with
only 8GB total and no other apps running on the server you should be able to
get by with setting the MAX memory to around 7GB or so. But be careful in
that if you are seeing real paging at the OS level this may increase it but
that would seem to indicate you do have other processes running. Even though
these servers may seem identical the data and users in each db are not. So
that patterns and usage's can vary greatly. If you have one db that does a
higher level of adhoc or non-reusable queries you can dramatically affect
the proc cache. How much memory is taken up by the proc cache on the servers
that have issues? What do you get when you select * from syscacheobjects (or
the dmv) ordered by UseCounts Desc?  If there are only a few at the top with
high usecounts and a lot of ones with a usecount of 1 you are not getting
good plan reuse. This will also eat up the cache. I take it these are 32 bit
servers?  Do you have /3GB set?  If not you may want to try setting that.
If you are doing things that need contiguous memory you could be depleting
or fragmenting your memtoleave area. If so you can get some relief by
increasing its size with the /g startup option.  Running on X64 will help a
number of these issues.  But I also recommend you load the latest version of
SP2 for SQL2005 as well.

--
Andrew J. Kelly SQL MVP

Show quoteHide quote
"cbrichards" <u3288@uwe> wrote in message news:6f20b5a5da07c@uwe...
> We are having occasional high, SUSTAINED, CPU. If we just let it go, it
> stays
> sustained upwards of 90 percent. We have not discovered the root cause,
> but
> when it occurs we are forced to restart SQL Server Service to get back to
> our
> client service levels, since our clients our timing out. We have 18
> servers,
> all running Windows 2003 R2, with SQL Server 2005 SP1. We have
> approximately
> 40 databases per server (give or take, plus or minus), where the average
> size
> is about 1.5 GB. All the servers have with 8 GB RAM, with MinServerMemory
> = 0,
> and MaxServerMemory = 6144. The Total and Target Server memory counters
> all
> run at about 6 GB. Each server is identical. Each database setup, options,
> and configurations are identical. The tables within the databases have
> identical schemas, but containing different data.
>
> Our SQL Servers are dedicated SQL Servers, with no other applications.
>
> When I say we occasionally have high CPU, it has only occurred on three of
> our of our 18 servers. Of the three servers, on two of the servers it has
> occurred once each, and the third server it has occurred twice. This third
> server where it has occurred twice, it has happened these two times in the
> last two weeks and usually occurs in the mid-morning hours, but on
> different
> days of the week. We have TRAN backups that occur every hour, and take
> about
> 5 minutes to complete total, for the approximate 40 databases per server.
>
> We run update statistics with fullscan every night, and reindex once a
> week.
>
> During these high CPU occurances, in Task Manager, SQL Server is the top
> process.
>
> During these high CPU occurances, our reads and writes are at normal
> levels.
>
> During these high CPU occurances, our buffer cache hit ratio is 99%.
>
> Running CHECKPOINT on all databases, including TEMPDB, provides momentary
> improvement, but in a matter of seconds, CPU is back and sustained over 90
> percent.
>
> Querying sys.sysprocesses, and viewing Activity Monitor does not reveal
> any
> abnormal high CPU, but it does reveal quite a few open trans, and spwho
> does
> show some blocking.
>
> While I am not 100% sure, it appears our compilations escalate to
> approximately 100-150 per second, while during normal CPU load we are at
> approximately 20 per second. (Whether this is a sympton across all other
> high
> CPU times we have experienced I do not know, but when it occurred
> yesterday,
> this was the case.)
>
> The SQL Error log has a couple of interesting lines:
> 2007-03-12 10:48:10.040 spid1s AppDomain 15 (<I have replaced the real
> database name here>.dbo[runtime].14) is marked for unload due to memory
> pressure.
> 2007-03-12 10:49:22.850 spid245 There is insufficient system memory to run
> this query.
>
> In looking at Memory Object Pages/Sec, this server that has had the high
> CPU
> occur twice, the average is 316. This average is during normal CPU load. I
> do
> not know what it is during high CPU load. However, another server that
> also
> experienced high CPU, has Pages/Sec = 13. And another that has never
> experience this high CPU has Pages/Sec = 247. Yet most of the 18 servers,
> are
> in the range of 10-15 Pages/Sec.
>
> I find that interesting, because, as I said, the servers, databases, and
> table schemas are identical across all servers, the only difference being
> the
> data. These servers, at least to our best attempts are load balanced. It
> is
> interesting to me that identical servers, databases, etc., where two
> servers
> each have such high Pages/Sec, compared to the other 16.
>
> In reading on
> http://www.sql-server-performance.com/qdpma/inst_3_pmlogs.asp,
> I came across the following: "The SQL Server process Working Set should
> track
> closely with Private Bytes, the SQL Server:Buffer Manager->Total Pages
> counters and the SQL Server:Memory Manager->Total Server Memory(KB)
> counters.
> It is not necessary to log all of these counters. If the working set
> suddenly
> drops much below private bytes, then this could indicate that the OS had
> to
> taken memory from SQL Server for another application."
>
> When running Private Bytes and Working Set on this server that has had the
> two high CPU occurances, the Private Bytes averages 280300544 and stays
> rather steady. However, the Working Set averages 116593746 and every 4 to
> 5
> minutes dips down to near zero.
>
> Which leads to another statement I read on
> http://www.sql-server-performance.com/performance_monitor_counters_memory.asp
> which states: "Generally, on a physical server dedicated to SQL Server
> with
> an adequate amount of RAM, paging will average near zero. An adequate
> amount
> of RAM for SQL Server is a server that has a Buffer Hit Cache Ratio
> (described in more detail later) of 99% and higher. If you have a SQL
> Server
> that has a Buffer Hit Cache Ratio of 99% or higher for a period of 24
> hours,
> but you are getting an average paging level of over 1, this generally
> means
> that you may be running other applications on the physical server other
> than
> SQL Server. If this is the case, you should remove those applications,
> allowing SQL Server to be the only major application on the physical
> server.
>
> If your SQL Server is not running any other applications, and paging
> exceeds
> 1 on average for a 24-hour period, this could mean that you have changed
> the
> SQL Server memory settings. SQL Server should be configured so that it is
> set
> to the "Dynamically configure SQL Server memory" option, and the "Maximum
> Memory" setting should be set at the highest level. For optimum
> performance,
> SQL Server should be allowed to take as much RAM as it wants for its own
> use
> without having to compete for RAM with other applications."
>
> So, while things point to a possible memory issue, I have run out of ideas
> on
> how to perhaps prove my findings. Please help with or any other ideas.
>
> --
> Message posted via http://www.sqlmonster.com
>
Author
19 Mar 2007 4:17 PM
cbrichards via SQLMonster.com
Thanks for your response Andrew. I will attempt to address some of your
questions.

1. These are 32 bit servers?
Answer: Yes, all are 32 bit.

2. Do you have /3GB set?
Answer: Yes, all servers have /3GB set.

3. How much memory is taken up by the proc cache on the servers that have
issues?
Answer: The values in (A) are from the troublesome server. The values in (B)
are from one of the 16 servers not having high CPU issues. The following is
from DBCC MemoryStatus


Buffer Distribution  Buffers (A) Buffers (B)   
--------------------       -----------      -----------
Stolen                     -69747      -249540
Free                    1127         7755
Cached                244096       446426
Database (clean)      590999       532995
Database (dirty)       19952        48794


Procedure Cache      Value (A)    Value (B)
--------------------           -----------        -----------
TotalProcs              4934         4478
TotalPages            155087       179879
InUsePages               202           50

4. What do you get when you select * from syscacheobjects (or the dmv)
ordered by UseCounts Desc?
Answer: When I run the following statement, I get the counts following the
statement:

select sql_text.text,
CP.UseCounts,
CP.Size_In_Bytes,
CP.CacheObjType,
CP.ObjType
from sys.dm_exec_cached_plans CP
cross apply sys.dm_exec_sql_text(plan_handle) as sql_text
WHERE <<See Below>>

No WHERE filter, total count: 2693
WHERE UseCounts = 1: 1707
WHERE UseCounts >= 100: 113


NOTE: The values shown here were taken this morning during normal, non-High
CPU operations. Not knowing when high CPU will be experienced again, these
are the values obtained.



Andrew J. Kelly wrote:
Show quoteHide quote
>That's a lot to digest but a few comments none the less. One is that with
>only 8GB total and no other apps running on the server you should be able to
>get by with setting the MAX memory to around 7GB or so. But be careful in
>that if you are seeing real paging at the OS level this may increase it but
>that would seem to indicate you do have other processes running. Even though
>these servers may seem identical the data and users in each db are not. So
>that patterns and usage's can vary greatly. If you have one db that does a
>higher level of adhoc or non-reusable queries you can dramatically affect
>the proc cache. How much memory is taken up by the proc cache on the servers
>that have issues? What do you get when you select * from syscacheobjects (or
>the dmv) ordered by UseCounts Desc?  If there are only a few at the top with
>high usecounts and a lot of ones with a usecount of 1 you are not getting
>good plan reuse. This will also eat up the cache. I take it these are 32 bit
>servers?  Do you have /3GB set?  If not you may want to try setting that.
>If you are doing things that need contiguous memory you could be depleting
>or fragmenting your memtoleave area. If so you can get some relief by
>increasing its size with the /g startup option.  Running on X64 will help a
>number of these issues.  But I also recommend you load the latest version of
>SP2 for SQL2005 as well.
>
>> We are having occasional high, SUSTAINED, CPU. If we just let it go, it
>> stays
>[quoted text clipped - 138 lines]
>> on
>> how to perhaps prove my findings. Please help with or any other ideas.

Author
20 Mar 2007 4:21 AM
Andrew J. Kelly
Wow those are some large query plans. Even the A server has about 1.2GB
taken up just for the proc cache.
       155087 pages * 8192 bytes / 1024 /1024 = 1.2MB
You only have less than 5K plans in total. These must be very huge queries
and it is easy to see how you can run up a large CPU hit when compiling
these types of plans. You mentioned that you see lots of blocking when this
is happening. If you have lots of locks taken out with so much memory taken
by proc cache you can easily exhaust enough memory to see the types of
errors you listed. Can you explain what is going on with these large plans
in the first place?  If you can reduce their size and get better plan reuse
you should be able to cut down on the CPU usages and wasted space for the
proc cache. I assume that the reason one or two servers show this more than
the others is due to the way in which they are utilized. Possibly several
really inefficient sps are called more often on those servers than the
others?  A couple more things to try or at least consider. You may want to
try changing the Force Parameterization option to try and get better plan
reuse to cut down on compiles and get better use of plan cache memory. Load
SP2 (latest version) of SQL Server since it has some optimizations for
managing Plan cache memory.  But you really need to find out why the plans
are so large and not being reused in general.

--
Andrew J. Kelly SQL MVP

"cbrichards via SQLMonster.com" <u3288@uwe> wrote in message
news:6f6bc5a76962c@uwe...
Show quoteHide quote
> Thanks for your response Andrew. I will attempt to address some of your
> questions.
>
> 1. These are 32 bit servers?
> Answer: Yes, all are 32 bit.
>
> 2. Do you have /3GB set?
> Answer: Yes, all servers have /3GB set.
>
> 3. How much memory is taken up by the proc cache on the servers that have
> issues?
> Answer: The values in (A) are from the troublesome server. The values in
> (B)
> are from one of the 16 servers not having high CPU issues. The following
> is
> from DBCC MemoryStatus
>
>
> Buffer Distribution  Buffers (A) Buffers (B)
> --------------------       -----------      -----------
> Stolen                     -69747   -249540
> Free                    1127      7755
> Cached                244096    446426
> Database (clean)      590999    532995
> Database (dirty)       19952     48794
>
>
> Procedure Cache      Value (A) Value (B)
> --------------------           -----------        -----------
> TotalProcs              4934      4478
> TotalPages            155087    179879
> InUsePages               202        50
>
> 4. What do you get when you select * from syscacheobjects (or the dmv)
> ordered by UseCounts Desc?
> Answer: When I run the following statement, I get the counts following the
> statement:
>
> select sql_text.text,
> CP.UseCounts,
> CP.Size_In_Bytes,
> CP.CacheObjType,
> CP.ObjType
> from sys.dm_exec_cached_plans CP
> cross apply sys.dm_exec_sql_text(plan_handle) as sql_text
> WHERE <<See Below>>
>
> No WHERE filter, total count: 2693
> WHERE UseCounts = 1: 1707
> WHERE UseCounts >= 100: 113
>
>
> NOTE: The values shown here were taken this morning during normal,
> non-High
> CPU operations. Not knowing when high CPU will be experienced again, these
> are the values obtained.
>
>
>
> Andrew J. Kelly wrote:
>>That's a lot to digest but a few comments none the less. One is that with
>>only 8GB total and no other apps running on the server you should be able
>>to
>>get by with setting the MAX memory to around 7GB or so. But be careful in
>>that if you are seeing real paging at the OS level this may increase it
>>but
>>that would seem to indicate you do have other processes running. Even
>>though
>>these servers may seem identical the data and users in each db are not. So
>>that patterns and usage's can vary greatly. If you have one db that does a
>>higher level of adhoc or non-reusable queries you can dramatically affect
>>the proc cache. How much memory is taken up by the proc cache on the
>>servers
>>that have issues? What do you get when you select * from syscacheobjects
>>(or
>>the dmv) ordered by UseCounts Desc?  If there are only a few at the top
>>with
>>high usecounts and a lot of ones with a usecount of 1 you are not getting
>>good plan reuse. This will also eat up the cache. I take it these are 32
>>bit
>>servers?  Do you have /3GB set?  If not you may want to try setting that.
>>If you are doing things that need contiguous memory you could be depleting
>>or fragmenting your memtoleave area. If so you can get some relief by
>>increasing its size with the /g startup option.  Running on X64 will help
>>a
>>number of these issues.  But I also recommend you load the latest version
>>of
>>SP2 for SQL2005 as well.
>>
>>> We are having occasional high, SUSTAINED, CPU. If we just let it go, it
>>> stays
>>[quoted text clipped - 138 lines]
>>> on
>>> how to perhaps prove my findings. Please help with or any other ideas.
>
> --
> Message posted via SQLMonster.com
> http://www.sqlmonster.com/Uwe/Forums.aspx/sql-server/200703/1
>
Author
20 Mar 2007 2:45 PM
cbrichards via SQLMonster.com
As far as identifying these large query plans, would the following query do
the trick?

select top 50 
    sum(qs.total_worker_time) as total_cpu_time, 
    sum(qs.execution_count) as total_execution_count,
    count(*) as  number_of_statements, 
    sql_text.text
from 
    sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(plan_handle) as sql_text
group by sql_text.text
order by sum(qs.total_worker_time) desc


Andrew J. Kelly wrote:
Show quoteHide quote
>Wow those are some large query plans. Even the A server has about 1.2GB
>taken up just for the proc cache.
>       155087 pages * 8192 bytes / 1024 /1024 = 1.2MB
>You only have less than 5K plans in total. These must be very huge queries
>and it is easy to see how you can run up a large CPU hit when compiling
>these types of plans. You mentioned that you see lots of blocking when this
>is happening. If you have lots of locks taken out with so much memory taken
>by proc cache you can easily exhaust enough memory to see the types of
>errors you listed. Can you explain what is going on with these large plans
>in the first place?  If you can reduce their size and get better plan reuse
>you should be able to cut down on the CPU usages and wasted space for the
>proc cache. I assume that the reason one or two servers show this more than
>the others is due to the way in which they are utilized. Possibly several
>really inefficient sps are called more often on those servers than the
>others?  A couple more things to try or at least consider. You may want to
>try changing the Force Parameterization option to try and get better plan
>reuse to cut down on compiles and get better use of plan cache memory. Load
>SP2 (latest version) of SQL Server since it has some optimizations for
>managing Plan cache memory.  But you really need to find out why the plans
>are so large and not being reused in general.
>
>> Thanks for your response Andrew. I will attempt to address some of your
>> questions.
>[quoted text clipped - 84 lines]
>>>> on
>>>> how to perhaps prove my findings. Please help with or any other ideas.

Author
20 Mar 2007 3:35 PM
Andrew J. Kelly
You would be better off looking for the plans that take up the most space.
This query will show the largest memory hogs on top.

SELECT b.[bucketid], b.[cacheobjtype], b.[objtype], b.[refcounts],
b.[usecounts]
    , a.[dbid], a.[objectid], b.[size_in_bytes], a.[text]
FROM sys.dm_exec_cached_plans as b
CROSS APPLY sys.dm_exec_sql_text(b.[plan_handle]) AS a
ORDER BY [size_in_bytes] DESC


--
Andrew J. Kelly SQL MVP

"cbrichards via SQLMonster.com" <u3288@uwe> wrote in message
news:6f778a39a4039@uwe...
Show quoteHide quote
> As far as identifying these large query plans, would the following query
> do
> the trick?
>
> select top 50
>    sum(qs.total_worker_time) as total_cpu_time,
>    sum(qs.execution_count) as total_execution_count,
>    count(*) as  number_of_statements,
>    sql_text.text
> from
>    sys.dm_exec_query_stats qs
> cross apply sys.dm_exec_sql_text(plan_handle) as sql_text
> group by sql_text.text
> order by sum(qs.total_worker_time) desc
>
>
> Andrew J. Kelly wrote:
>>Wow those are some large query plans. Even the A server has about 1.2GB
>>taken up just for the proc cache.
>>       155087 pages * 8192 bytes / 1024 /1024 = 1.2MB
>>You only have less than 5K plans in total. These must be very huge queries
>>and it is easy to see how you can run up a large CPU hit when compiling
>>these types of plans. You mentioned that you see lots of blocking when
>>this
>>is happening. If you have lots of locks taken out with so much memory
>>taken
>>by proc cache you can easily exhaust enough memory to see the types of
>>errors you listed. Can you explain what is going on with these large plans
>>in the first place?  If you can reduce their size and get better plan
>>reuse
>>you should be able to cut down on the CPU usages and wasted space for the
>>proc cache. I assume that the reason one or two servers show this more
>>than
>>the others is due to the way in which they are utilized. Possibly several
>>really inefficient sps are called more often on those servers than the
>>others?  A couple more things to try or at least consider. You may want to
>>try changing the Force Parameterization option to try and get better plan
>>reuse to cut down on compiles and get better use of plan cache memory.
>>Load
>>SP2 (latest version) of SQL Server since it has some optimizations for
>>managing Plan cache memory.  But you really need to find out why the plans
>>are so large and not being reused in general.
>>
>>> Thanks for your response Andrew. I will attempt to address some of your
>>> questions.
>>[quoted text clipped - 84 lines]
>>>>> on
>>>>> how to perhaps prove my findings. Please help with or any other ideas.
>
> --
> Message posted via SQLMonster.com
> http://www.sqlmonster.com/Uwe/Forums.aspx/sql-server/200703/1
>
Author
20 Mar 2007 2:48 PM
cbrichards via SQLMonster.com
Based upon having 8 GB RAM, and 6 GB dedicated to SQL Server, what would be a
more optimal size to shoot for when it comes to the size of the proc cache?
Is there a recommended size for the proc cache under these conditions?

Andrew J. Kelly wrote:
Show quoteHide quote
>Wow those are some large query plans. Even the A server has about 1.2GB
>taken up just for the proc cache.
>       155087 pages * 8192 bytes / 1024 /1024 = 1.2MB
>You only have less than 5K plans in total. These must be very huge queries
>and it is easy to see how you can run up a large CPU hit when compiling
>these types of plans. You mentioned that you see lots of blocking when this
>is happening. If you have lots of locks taken out with so much memory taken
>by proc cache you can easily exhaust enough memory to see the types of
>errors you listed. Can you explain what is going on with these large plans
>in the first place?  If you can reduce their size and get better plan reuse
>you should be able to cut down on the CPU usages and wasted space for the
>proc cache. I assume that the reason one or two servers show this more than
>the others is due to the way in which they are utilized. Possibly several
>really inefficient sps are called more often on those servers than the
>others?  A couple more things to try or at least consider. You may want to
>try changing the Force Parameterization option to try and get better plan
>reuse to cut down on compiles and get better use of plan cache memory. Load
>SP2 (latest version) of SQL Server since it has some optimizations for
>managing Plan cache memory.  But you really need to find out why the plans
>are so large and not being reused in general.
>
>> Thanks for your response Andrew. I will attempt to address some of your
>> questions.
>[quoted text clipped - 84 lines]
>>>> on
>>>> how to perhaps prove my findings. Please help with or any other ideas.

Author
20 Mar 2007 3:38 PM
Andrew J. Kelly
It's not a matter of a size to shoot for. As with anything that uses memory
you want it to be as efficient as possible. I have seen 10K + plans take up
only several hundred MB's. You have only a few thousand and they take up
over a GB. The bottom line is that they take up what they need to but I
question what the plans are that are taking up so much space in yours.

--
Andrew J. Kelly SQL MVP

"cbrichards via SQLMonster.com" <u3288@uwe> wrote in message
news:6f7791933c00d@uwe...
Show quoteHide quote
> Based upon having 8 GB RAM, and 6 GB dedicated to SQL Server, what would
> be a
> more optimal size to shoot for when it comes to the size of the proc
> cache?
> Is there a recommended size for the proc cache under these conditions?
>
> Andrew J. Kelly wrote:
>>Wow those are some large query plans. Even the A server has about 1.2GB
>>taken up just for the proc cache.
>>       155087 pages * 8192 bytes / 1024 /1024 = 1.2MB
>>You only have less than 5K plans in total. These must be very huge queries
>>and it is easy to see how you can run up a large CPU hit when compiling
>>these types of plans. You mentioned that you see lots of blocking when
>>this
>>is happening. If you have lots of locks taken out with so much memory
>>taken
>>by proc cache you can easily exhaust enough memory to see the types of
>>errors you listed. Can you explain what is going on with these large plans
>>in the first place?  If you can reduce their size and get better plan
>>reuse
>>you should be able to cut down on the CPU usages and wasted space for the
>>proc cache. I assume that the reason one or two servers show this more
>>than
>>the others is due to the way in which they are utilized. Possibly several
>>really inefficient sps are called more often on those servers than the
>>others?  A couple more things to try or at least consider. You may want to
>>try changing the Force Parameterization option to try and get better plan
>>reuse to cut down on compiles and get better use of plan cache memory.
>>Load
>>SP2 (latest version) of SQL Server since it has some optimizations for
>>managing Plan cache memory.  But you really need to find out why the plans
>>are so large and not being reused in general.
>>
>>> Thanks for your response Andrew. I will attempt to address some of your
>>> questions.
>>[quoted text clipped - 84 lines]
>>>>> on
>>>>> how to perhaps prove my findings. Please help with or any other ideas.
>
> --
> Message posted via SQLMonster.com
> http://www.sqlmonster.com/Uwe/Forums.aspx/sql-server/200703/1
>
Author
20 Mar 2007 5:26 PM
JXStern
On Tue, 13 Mar 2007 17:56:29 GMT, "cbrichards" <u3288@uwe> wrote:
>So, while things point to a possible memory issue, I have run out of ideas on
>how to perhaps prove my findings. Please help with or any other ideas.

I see downthread discussion about huge query plans.

Do you have any idea what code is involved?  Is it even SP, or would
something be submitting large (say >1000 line) dynamic queries?  Would
several SPIDs be contesting for data?

You say sp_who2 shows some blocking, is it self-blocking or real
cross-spid blocking?  Any app-level idea why that would occur?

Do you have any loops in your SPs?

I don't see anything in your research that eliminates just some one
bad piece of code, with a bad query plan, showing up and causing the
choke.  It may be too obvious for you to mention, but I thought I'd
ask!  The one thing I think you said is that no one SPID seems to be
eating the CPU time.

Narrowing down the code executing during these episodes would be very,
very helpful!

Josh
Author
21 Mar 2007 2:53 PM
cbrichards via SQLMonster.com
Thanks Josh. Some of what you wrote I can answer. Some I just do not have
saved data to verify, other than what I noticed during the episode(s), such
as, is it real cross-spid blocking or self blocking. This I will have to pay
more attention too.

I do notice this morning, that as connectivity has increased, that the page
life expectancy has dropped to an average of 280 and falling as we have not
yet reached the peak of our connectivity. Page Faults/Sec are averaging 750+.
Our Buffer Cache Hit Ratio remains at 99%, Available Memory = 1.3 GB, and our
Target and Total are matched at 6 GB.

When I run a query to return the plans that are the largest consumers of
memory, the ones at the top of the list are several thousand lines of code
each. One in particular always seems to be prevalent in my profiler traces
when these episodes occur. During these episodes it is compiling, because
it's plan has been removed from cache, and with a plan as large as it is, I
am sure it takes upward of 5 seconds or more to compile.

I have a developer currently looking into optimizing this procedure.

JXStern wrote:
Show quoteHide quote
>>So, while things point to a possible memory issue, I have run out of ideas on
>>how to perhaps prove my findings. Please help with or any other ideas.
>
>I see downthread discussion about huge query plans.
>
>Do you have any idea what code is involved?  Is it even SP, or would
>something be submitting large (say >1000 line) dynamic queries?  Would
>several SPIDs be contesting for data?
>
>You say sp_who2 shows some blocking, is it self-blocking or real
>cross-spid blocking?  Any app-level idea why that would occur?
>
>Do you have any loops in your SPs?
>
>I don't see anything in your research that eliminates just some one
>bad piece of code, with a bad query plan, showing up and causing the
>choke.  It may be too obvious for you to mention, but I thought I'd
>ask!  The one thing I think you said is that no one SPID seems to be
>eating the CPU time.
>
>Narrowing down the code executing during these episodes would be very,
>very helpful!
>
>Josh

Author
23 Mar 2007 1:03 PM
Andrew J. Kelly
I think you have found the root cause. Sp's that are several thousand lines
are extremely hard to optimize and keep plans in cache. Usually operations
that have that much processing are batch type operations and may be better
served as an SSIS package. But in any case try split the procs up into
several smaller ones and really try hard to optimize the existing code.
That is a lot of code for a sp.

--
Andrew J. Kelly SQL MVP

"cbrichards via SQLMonster.com" <u3288@uwe> wrote in message
news:6f842de0ee8f9@uwe...
Show quoteHide quote
> Thanks Josh. Some of what you wrote I can answer. Some I just do not have
> saved data to verify, other than what I noticed during the episode(s),
> such
> as, is it real cross-spid blocking or self blocking. This I will have to
> pay
> more attention too.
>
> I do notice this morning, that as connectivity has increased, that the
> page
> life expectancy has dropped to an average of 280 and falling as we have
> not
> yet reached the peak of our connectivity. Page Faults/Sec are averaging
> 750+.
> Our Buffer Cache Hit Ratio remains at 99%, Available Memory = 1.3 GB, and
> our
> Target and Total are matched at 6 GB.
>
> When I run a query to return the plans that are the largest consumers of
> memory, the ones at the top of the list are several thousand lines of code
> each. One in particular always seems to be prevalent in my profiler traces
> when these episodes occur. During these episodes it is compiling, because
> it's plan has been removed from cache, and with a plan as large as it is,
> I
> am sure it takes upward of 5 seconds or more to compile.
>
> I have a developer currently looking into optimizing this procedure.
>
> JXStern wrote:
>>>So, while things point to a possible memory issue, I have run out of
>>>ideas on
>>>how to perhaps prove my findings. Please help with or any other ideas.
>>
>>I see downthread discussion about huge query plans.
>>
>>Do you have any idea what code is involved?  Is it even SP, or would
>>something be submitting large (say >1000 line) dynamic queries?  Would
>>several SPIDs be contesting for data?
>>
>>You say sp_who2 shows some blocking, is it self-blocking or real
>>cross-spid blocking?  Any app-level idea why that would occur?
>>
>>Do you have any loops in your SPs?
>>
>>I don't see anything in your research that eliminates just some one
>>bad piece of code, with a bad query plan, showing up and causing the
>>choke.  It may be too obvious for you to mention, but I thought I'd
>>ask!  The one thing I think you said is that no one SPID seems to be
>>eating the CPU time.
>>
>>Narrowing down the code executing during these episodes would be very,
>>very helpful!
>>
>>Josh
>
> --
> Message posted via SQLMonster.com
> http://www.sqlmonster.com/Uwe/Forums.aspx/sql-server/200703/1
>

Bookmark and Share