|
sql
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Troubleshooting High CPU
sustained upwards of 90 percent. We have not discovered the root cause, but when it occurs we are forced to restart SQL Server Service to get back to our client service levels, since our clients our timing out. We have 18 servers, all running Windows 2003 R2, with SQL Server 2005 SP1. We have approximately 40 databases per server (give or take, plus or minus), where the average size is about 1.5 GB. All the servers have with 8 GB RAM, with MinServerMemory = 0, and MaxServerMemory = 6144. The Total and Target Server memory counters all run at about 6 GB. Each server is identical. Each database setup, options, and configurations are identical. The tables within the databases have identical schemas, but containing different data. Our SQL Servers are dedicated SQL Servers, with no other applications. When I say we occasionally have high CPU, it has only occurred on three of our of our 18 servers. Of the three servers, on two of the servers it has occurred once each, and the third server it has occurred twice. This third server where it has occurred twice, it has happened these two times in the last two weeks and usually occurs in the mid-morning hours, but on different days of the week. We have TRAN backups that occur every hour, and take about 5 minutes to complete total, for the approximate 40 databases per server. We run update statistics with fullscan every night, and reindex once a week. During these high CPU occurances, in Task Manager, SQL Server is the top process. During these high CPU occurances, our reads and writes are at normal levels. During these high CPU occurances, our buffer cache hit ratio is 99%. Running CHECKPOINT on all databases, including TEMPDB, provides momentary improvement, but in a matter of seconds, CPU is back and sustained over 90 percent. Querying sys.sysprocesses, and viewing Activity Monitor does not reveal any abnormal high CPU, but it does reveal quite a few open trans, and spwho does show some blocking. While I am not 100% sure, it appears our compilations escalate to approximately 100-150 per second, while during normal CPU load we are at approximately 20 per second. (Whether this is a sympton across all other high CPU times we have experienced I do not know, but when it occurred yesterday, this was the case.) The SQL Error log has a couple of interesting lines: 2007-03-12 10:48:10.040 spid1s AppDomain 15 (<I have replaced the real database name here>.dbo[runtime].14) is marked for unload due to memory pressure. 2007-03-12 10:49:22.850 spid245 There is insufficient system memory to run this query. In looking at Memory Object Pages/Sec, this server that has had the high CPU occur twice, the average is 316. This average is during normal CPU load. I do not know what it is during high CPU load. However, another server that also experienced high CPU, has Pages/Sec = 13. And another that has never experience this high CPU has Pages/Sec = 247. Yet most of the 18 servers, are in the range of 10-15 Pages/Sec. I find that interesting, because, as I said, the servers, databases, and table schemas are identical across all servers, the only difference being the data. These servers, at least to our best attempts are load balanced. It is interesting to me that identical servers, databases, etc., where two servers each have such high Pages/Sec, compared to the other 16. In reading on http://www.sql-server-performance.com/qdpma/inst_3_pmlogs.asp, I came across the following: "The SQL Server process Working Set should track closely with Private Bytes, the SQL Server:Buffer Manager->Total Pages counters and the SQL Server:Memory Manager->Total Server Memory(KB) counters. It is not necessary to log all of these counters. If the working set suddenly drops much below private bytes, then this could indicate that the OS had to taken memory from SQL Server for another application." When running Private Bytes and Working Set on this server that has had the two high CPU occurances, the Private Bytes averages 280300544 and stays rather steady. However, the Working Set averages 116593746 and every 4 to 5 minutes dips down to near zero. Which leads to another statement I read on http://www.sql-server-performance.com/performance_monitor_counters_memory.asp which states: "Generally, on a physical server dedicated to SQL Server with an adequate amount of RAM, paging will average near zero. An adequate amount of RAM for SQL Server is a server that has a Buffer Hit Cache Ratio (described in more detail later) of 99% and higher. If you have a SQL Server that has a Buffer Hit Cache Ratio of 99% or higher for a period of 24 hours, but you are getting an average paging level of over 1, this generally means that you may be running other applications on the physical server other than SQL Server. If this is the case, you should remove those applications, allowing SQL Server to be the only major application on the physical server. If your SQL Server is not running any other applications, and paging exceeds 1 on average for a 24-hour period, this could mean that you have changed the SQL Server memory settings. SQL Server should be configured so that it is set to the "Dynamically configure SQL Server memory" option, and the "Maximum Memory" setting should be set at the highest level. For optimum performance, SQL Server should be allowed to take as much RAM as it wants for its own use without having to compete for RAM with other applications." So, while things point to a possible memory issue, I have run out of ideas on how to perhaps prove my findings. Please help with or any other ideas.
Show quote
Hide quote
On Mar 13, 10:56 pm, "cbrichards" <u3288@uwe> wrote: Following may be causes> We are having occasional high, SUSTAINED, CPU. If we just let it go, it stays > sustained upwards of 90 percent. We have not discovered the root cause, but > when it occurs we are forced to restart SQL Server Service to get back to our > client service levels, since our clients our timing out. We have 18 servers, > all running Windows 2003 R2, with SQL Server 2005 SP1. We have approximately > 40 databases per server (give or take, plus or minus), where the average size > is about 1.5 GB. All the servers have with 8 GB RAM, with MinServerMemory = 0, > and MaxServerMemory = 6144. The Total and Target Server memory counters all > run at about 6 GB. Each server is identical. Each database setup, options, > and configurations are identical. The tables within the databases have > identical schemas, but containing different data. > > Our SQL Servers are dedicated SQL Servers, with no other applications. > > When I say we occasionally have high CPU, it has only occurred on three of > our of our 18 servers. Of the three servers, on two of the servers it has > occurred once each, and the third server it has occurred twice. This third > server where it has occurred twice, it has happened these two times in the > last two weeks and usually occurs in the mid-morning hours, but on different > days of the week. We have TRAN backups that occur every hour, and take about > 5 minutes to complete total, for the approximate 40 databases per server. > > We run update statistics with fullscan every night, and reindex once a week. > > During these high CPU occurances, in Task Manager, SQL Server is the top > process. > > During these high CPU occurances, our reads and writes are at normal levels. > > During these high CPU occurances, our buffer cache hit ratio is 99%. > > Running CHECKPOINT on all databases, including TEMPDB, provides momentary > improvement, but in a matter of seconds, CPU is back and sustained over 90 > percent. > > Querying sys.sysprocesses, and viewing Activity Monitor does not reveal any > abnormal high CPU, but it does reveal quite a few open trans, and spwho does > show some blocking. > > While I am not 100% sure, it appears our compilations escalate to > approximately 100-150 per second, while during normal CPU load we are at > approximately 20 per second. (Whether this is a sympton across all other high > CPU times we have experienced I do not know, but when it occurred yesterday, > this was the case.) > > The SQL Error log has a couple of interesting lines: > 2007-03-12 10:48:10.040 spid1s AppDomain 15 (<I have replaced the real > database name here>.dbo[runtime].14) is marked for unload due to memory > pressure. > 2007-03-12 10:49:22.850 spid245 There is insufficient system memory to run > this query. > > In looking at Memory Object Pages/Sec, this server that has had the high CPU > occur twice, the average is 316. This average is during normal CPU load. I do > not know what it is during high CPU load. However, another server that also > experienced high CPU, has Pages/Sec = 13. And another that has never > experience this high CPU has Pages/Sec = 247. Yet most of the 18 servers, are > in the range of 10-15 Pages/Sec. > > I find that interesting, because, as I said, the servers, databases, and > table schemas are identical across all servers, the only difference being the > data. These servers, at least to our best attempts are load balanced. It is > interesting to me that identical servers, databases, etc., where two servers > each have such high Pages/Sec, compared to the other 16. > > In reading onhttp://www.sql-server-performance.com/qdpma/inst_3_pmlogs.asp, > I came across the following: "The SQL Server process Working Set should track > closely with Private Bytes, the SQL Server:Buffer Manager->Total Pages > counters and the SQL Server:Memory Manager->Total Server Memory(KB) counters. > It is not necessary to log all of these counters. If the working set suddenly > drops much below private bytes, then this could indicate that the OS had to > taken memory from SQL Server for another application." > > When running Private Bytes and Working Set on this server that has had the > two high CPU occurances, the Private Bytes averages 280300544 and stays > rather steady. However, the Working Set averages 116593746 and every 4 to 5 > minutes dips down to near zero. > > Which leads to another statement I read onhttp://www.sql-server-performance.com/performance_monitor_counters_me... > which states: "Generally, on a physical server dedicated to SQL Server with > an adequate amount of RAM, paging will average near zero. An adequate amount > of RAM for SQL Server is a server that has a Buffer Hit Cache Ratio > (described in more detail later) of 99% and higher. If you have a SQL Server > that has a Buffer Hit Cache Ratio of 99% or higher for a period of 24 hours, > but you are getting an average paging level of over 1, this generally means > that you may be running other applications on the physical server other than > SQL Server. If this is the case, you should remove those applications, > allowing SQL Server to be the only major application on the physical server. > > If your SQL Server is not running any other applications, and paging exceeds > 1 on average for a 24-hour period, this could mean that you have changed the > SQL Server memory settings. SQL Server should be configured so that it is set > to the "Dynamically configure SQL Server memory" option, and the "Maximum > Memory" setting should be set at the highest level. For optimum performance, > SQL Server should be allowed to take as much RAM as it wants for its own use > without having to compete for RAM with other applications." > > So, while things point to a possible memory issue, I have run out of ideas on > how to perhaps prove my findings. Please help with or any other ideas. > > -- > Message posted viahttp://www.sqlmonster.com 1. Lot of cursors and unreleased cursors 2. Lot of Prepared statements in cache . These statements are not going out of cache quickly. Look at syscacheobjects table . If this is the case use sp_executesql instead of prepared statements . 3. Look at DBCC MEMORY STATUS 4. sysprocesses table see cpu column M A Srinivas Thanks for the suggestions. I have responded to each suggestion, one with a
question. 1. Lot of cursors and unreleased cursors We have very, very few cursors. I will not rule it out, as the procedures that do have cursors could have possibly been hit hard at the time. I will review my traces. 2. Lot of Prepared statements in cache. Is this a count of cacheobjtype = 'Compiled Plan' ? Or is it Count(*) From sys.syscacheobjects with no filtering? 3. Look at DBCC MEMORY STATUS I have had a look at this and my target and committed values were identical. 4. sysprocesses table see cpu column I ran this during the high CPU episode and no one process stood out as consuming CPU greatly one over the other. M A Srinivas wrote: Show quoteHide quote >> We are having occasional high, SUSTAINED, CPU. If we just let it go, it stays >> sustained upwards of 90 percent. We have not discovered the root cause, but >[quoted text clipped - 98 lines] >> -- >> Message posted viahttp://www.sqlmonster.com > >Following may be causes > >1. Lot of cursors and unreleased cursors >2. Lot of Prepared statements in cache . These statements are not >going out of cache quickly. Look at syscacheobjects table . If this >is the case use sp_executesql instead of prepared statements . >3. Look at DBCC MEMORY STATUS >4. sysprocesses table see cpu column > >M A Srinivas That's a lot to digest but a few comments none the less. One is that with
only 8GB total and no other apps running on the server you should be able to get by with setting the MAX memory to around 7GB or so. But be careful in that if you are seeing real paging at the OS level this may increase it but that would seem to indicate you do have other processes running. Even though these servers may seem identical the data and users in each db are not. So that patterns and usage's can vary greatly. If you have one db that does a higher level of adhoc or non-reusable queries you can dramatically affect the proc cache. How much memory is taken up by the proc cache on the servers that have issues? What do you get when you select * from syscacheobjects (or the dmv) ordered by UseCounts Desc? If there are only a few at the top with high usecounts and a lot of ones with a usecount of 1 you are not getting good plan reuse. This will also eat up the cache. I take it these are 32 bit servers? Do you have /3GB set? If not you may want to try setting that. If you are doing things that need contiguous memory you could be depleting or fragmenting your memtoleave area. If so you can get some relief by increasing its size with the /g startup option. Running on X64 will help a number of these issues. But I also recommend you load the latest version of SP2 for SQL2005 as well. -- Show quoteHide quoteAndrew J. Kelly SQL MVP "cbrichards" <u3288@uwe> wrote in message news:6f20b5a5da07c@uwe... > We are having occasional high, SUSTAINED, CPU. If we just let it go, it > stays > sustained upwards of 90 percent. We have not discovered the root cause, > but > when it occurs we are forced to restart SQL Server Service to get back to > our > client service levels, since our clients our timing out. We have 18 > servers, > all running Windows 2003 R2, with SQL Server 2005 SP1. We have > approximately > 40 databases per server (give or take, plus or minus), where the average > size > is about 1.5 GB. All the servers have with 8 GB RAM, with MinServerMemory > = 0, > and MaxServerMemory = 6144. The Total and Target Server memory counters > all > run at about 6 GB. Each server is identical. Each database setup, options, > and configurations are identical. The tables within the databases have > identical schemas, but containing different data. > > Our SQL Servers are dedicated SQL Servers, with no other applications. > > When I say we occasionally have high CPU, it has only occurred on three of > our of our 18 servers. Of the three servers, on two of the servers it has > occurred once each, and the third server it has occurred twice. This third > server where it has occurred twice, it has happened these two times in the > last two weeks and usually occurs in the mid-morning hours, but on > different > days of the week. We have TRAN backups that occur every hour, and take > about > 5 minutes to complete total, for the approximate 40 databases per server. > > We run update statistics with fullscan every night, and reindex once a > week. > > During these high CPU occurances, in Task Manager, SQL Server is the top > process. > > During these high CPU occurances, our reads and writes are at normal > levels. > > During these high CPU occurances, our buffer cache hit ratio is 99%. > > Running CHECKPOINT on all databases, including TEMPDB, provides momentary > improvement, but in a matter of seconds, CPU is back and sustained over 90 > percent. > > Querying sys.sysprocesses, and viewing Activity Monitor does not reveal > any > abnormal high CPU, but it does reveal quite a few open trans, and spwho > does > show some blocking. > > While I am not 100% sure, it appears our compilations escalate to > approximately 100-150 per second, while during normal CPU load we are at > approximately 20 per second. (Whether this is a sympton across all other > high > CPU times we have experienced I do not know, but when it occurred > yesterday, > this was the case.) > > The SQL Error log has a couple of interesting lines: > 2007-03-12 10:48:10.040 spid1s AppDomain 15 (<I have replaced the real > database name here>.dbo[runtime].14) is marked for unload due to memory > pressure. > 2007-03-12 10:49:22.850 spid245 There is insufficient system memory to run > this query. > > In looking at Memory Object Pages/Sec, this server that has had the high > CPU > occur twice, the average is 316. This average is during normal CPU load. I > do > not know what it is during high CPU load. However, another server that > also > experienced high CPU, has Pages/Sec = 13. And another that has never > experience this high CPU has Pages/Sec = 247. Yet most of the 18 servers, > are > in the range of 10-15 Pages/Sec. > > I find that interesting, because, as I said, the servers, databases, and > table schemas are identical across all servers, the only difference being > the > data. These servers, at least to our best attempts are load balanced. It > is > interesting to me that identical servers, databases, etc., where two > servers > each have such high Pages/Sec, compared to the other 16. > > In reading on > http://www.sql-server-performance.com/qdpma/inst_3_pmlogs.asp, > I came across the following: "The SQL Server process Working Set should > track > closely with Private Bytes, the SQL Server:Buffer Manager->Total Pages > counters and the SQL Server:Memory Manager->Total Server Memory(KB) > counters. > It is not necessary to log all of these counters. If the working set > suddenly > drops much below private bytes, then this could indicate that the OS had > to > taken memory from SQL Server for another application." > > When running Private Bytes and Working Set on this server that has had the > two high CPU occurances, the Private Bytes averages 280300544 and stays > rather steady. However, the Working Set averages 116593746 and every 4 to > 5 > minutes dips down to near zero. > > Which leads to another statement I read on > http://www.sql-server-performance.com/performance_monitor_counters_memory.asp > which states: "Generally, on a physical server dedicated to SQL Server > with > an adequate amount of RAM, paging will average near zero. An adequate > amount > of RAM for SQL Server is a server that has a Buffer Hit Cache Ratio > (described in more detail later) of 99% and higher. If you have a SQL > Server > that has a Buffer Hit Cache Ratio of 99% or higher for a period of 24 > hours, > but you are getting an average paging level of over 1, this generally > means > that you may be running other applications on the physical server other > than > SQL Server. If this is the case, you should remove those applications, > allowing SQL Server to be the only major application on the physical > server. > > If your SQL Server is not running any other applications, and paging > exceeds > 1 on average for a 24-hour period, this could mean that you have changed > the > SQL Server memory settings. SQL Server should be configured so that it is > set > to the "Dynamically configure SQL Server memory" option, and the "Maximum > Memory" setting should be set at the highest level. For optimum > performance, > SQL Server should be allowed to take as much RAM as it wants for its own > use > without having to compete for RAM with other applications." > > So, while things point to a possible memory issue, I have run out of ideas > on > how to perhaps prove my findings. Please help with or any other ideas. > > -- > Message posted via http://www.sqlmonster.com > Thanks for your response Andrew. I will attempt to address some of your
questions. 1. These are 32 bit servers? Answer: Yes, all are 32 bit. 2. Do you have /3GB set? Answer: Yes, all servers have /3GB set. 3. How much memory is taken up by the proc cache on the servers that have issues? Answer: The values in (A) are from the troublesome server. The values in (B) are from one of the 16 servers not having high CPU issues. The following is from DBCC MemoryStatus Buffer Distribution Buffers (A) Buffers (B) -------------------- ----------- ----------- Stolen -69747 -249540 Free 1127 7755 Cached 244096 446426 Database (clean) 590999 532995 Database (dirty) 19952 48794 Procedure Cache Value (A) Value (B) -------------------- ----------- ----------- TotalProcs 4934 4478 TotalPages 155087 179879 InUsePages 202 50 4. What do you get when you select * from syscacheobjects (or the dmv) ordered by UseCounts Desc? Answer: When I run the following statement, I get the counts following the statement: select sql_text.text, CP.UseCounts, CP.Size_In_Bytes, CP.CacheObjType, CP.ObjType from sys.dm_exec_cached_plans CP cross apply sys.dm_exec_sql_text(plan_handle) as sql_text WHERE <<See Below>> No WHERE filter, total count: 2693 WHERE UseCounts = 1: 1707 WHERE UseCounts >= 100: 113 NOTE: The values shown here were taken this morning during normal, non-High CPU operations. Not knowing when high CPU will be experienced again, these are the values obtained. Andrew J. Kelly wrote: Show quoteHide quote >That's a lot to digest but a few comments none the less. One is that with >only 8GB total and no other apps running on the server you should be able to >get by with setting the MAX memory to around 7GB or so. But be careful in >that if you are seeing real paging at the OS level this may increase it but >that would seem to indicate you do have other processes running. Even though >these servers may seem identical the data and users in each db are not. So >that patterns and usage's can vary greatly. If you have one db that does a >higher level of adhoc or non-reusable queries you can dramatically affect >the proc cache. How much memory is taken up by the proc cache on the servers >that have issues? What do you get when you select * from syscacheobjects (or >the dmv) ordered by UseCounts Desc? If there are only a few at the top with >high usecounts and a lot of ones with a usecount of 1 you are not getting >good plan reuse. This will also eat up the cache. I take it these are 32 bit >servers? Do you have /3GB set? If not you may want to try setting that. >If you are doing things that need contiguous memory you could be depleting >or fragmenting your memtoleave area. If so you can get some relief by >increasing its size with the /g startup option. Running on X64 will help a >number of these issues. But I also recommend you load the latest version of >SP2 for SQL2005 as well. > >> We are having occasional high, SUSTAINED, CPU. If we just let it go, it >> stays >[quoted text clipped - 138 lines] >> on >> how to perhaps prove my findings. Please help with or any other ideas. Wow those are some large query plans. Even the A server has about 1.2GB
taken up just for the proc cache. 155087 pages * 8192 bytes / 1024 /1024 = 1.2MB You only have less than 5K plans in total. These must be very huge queries and it is easy to see how you can run up a large CPU hit when compiling these types of plans. You mentioned that you see lots of blocking when this is happening. If you have lots of locks taken out with so much memory taken by proc cache you can easily exhaust enough memory to see the types of errors you listed. Can you explain what is going on with these large plans in the first place? If you can reduce their size and get better plan reuse you should be able to cut down on the CPU usages and wasted space for the proc cache. I assume that the reason one or two servers show this more than the others is due to the way in which they are utilized. Possibly several really inefficient sps are called more often on those servers than the others? A couple more things to try or at least consider. You may want to try changing the Force Parameterization option to try and get better plan reuse to cut down on compiles and get better use of plan cache memory. Load SP2 (latest version) of SQL Server since it has some optimizations for managing Plan cache memory. But you really need to find out why the plans are so large and not being reused in general. -- Andrew J. Kelly SQL MVP "cbrichards via SQLMonster.com" <u3288@uwe> wrote in message news:6f6bc5a76962c@uwe...Show quoteHide quote > Thanks for your response Andrew. I will attempt to address some of your > questions. > > 1. These are 32 bit servers? > Answer: Yes, all are 32 bit. > > 2. Do you have /3GB set? > Answer: Yes, all servers have /3GB set. > > 3. How much memory is taken up by the proc cache on the servers that have > issues? > Answer: The values in (A) are from the troublesome server. The values in > (B) > are from one of the 16 servers not having high CPU issues. The following > is > from DBCC MemoryStatus > > > Buffer Distribution Buffers (A) Buffers (B) > -------------------- ----------- ----------- > Stolen -69747 -249540 > Free 1127 7755 > Cached 244096 446426 > Database (clean) 590999 532995 > Database (dirty) 19952 48794 > > > Procedure Cache Value (A) Value (B) > -------------------- ----------- ----------- > TotalProcs 4934 4478 > TotalPages 155087 179879 > InUsePages 202 50 > > 4. What do you get when you select * from syscacheobjects (or the dmv) > ordered by UseCounts Desc? > Answer: When I run the following statement, I get the counts following the > statement: > > select sql_text.text, > CP.UseCounts, > CP.Size_In_Bytes, > CP.CacheObjType, > CP.ObjType > from sys.dm_exec_cached_plans CP > cross apply sys.dm_exec_sql_text(plan_handle) as sql_text > WHERE <<See Below>> > > No WHERE filter, total count: 2693 > WHERE UseCounts = 1: 1707 > WHERE UseCounts >= 100: 113 > > > NOTE: The values shown here were taken this morning during normal, > non-High > CPU operations. Not knowing when high CPU will be experienced again, these > are the values obtained. > > > > Andrew J. Kelly wrote: >>That's a lot to digest but a few comments none the less. One is that with >>only 8GB total and no other apps running on the server you should be able >>to >>get by with setting the MAX memory to around 7GB or so. But be careful in >>that if you are seeing real paging at the OS level this may increase it >>but >>that would seem to indicate you do have other processes running. Even >>though >>these servers may seem identical the data and users in each db are not. So >>that patterns and usage's can vary greatly. If you have one db that does a >>higher level of adhoc or non-reusable queries you can dramatically affect >>the proc cache. How much memory is taken up by the proc cache on the >>servers >>that have issues? What do you get when you select * from syscacheobjects >>(or >>the dmv) ordered by UseCounts Desc? If there are only a few at the top >>with >>high usecounts and a lot of ones with a usecount of 1 you are not getting >>good plan reuse. This will also eat up the cache. I take it these are 32 >>bit >>servers? Do you have /3GB set? If not you may want to try setting that. >>If you are doing things that need contiguous memory you could be depleting >>or fragmenting your memtoleave area. If so you can get some relief by >>increasing its size with the /g startup option. Running on X64 will help >>a >>number of these issues. But I also recommend you load the latest version >>of >>SP2 for SQL2005 as well. >> >>> We are having occasional high, SUSTAINED, CPU. If we just let it go, it >>> stays >>[quoted text clipped - 138 lines] >>> on >>> how to perhaps prove my findings. Please help with or any other ideas. > > -- > Message posted via SQLMonster.com > http://www.sqlmonster.com/Uwe/Forums.aspx/sql-server/200703/1 > As far as identifying these large query plans, would the following query do
the trick? select top 50 sum(qs.total_worker_time) as total_cpu_time, sum(qs.execution_count) as total_execution_count, count(*) as number_of_statements, sql_text.text from sys.dm_exec_query_stats qs cross apply sys.dm_exec_sql_text(plan_handle) as sql_text group by sql_text.text order by sum(qs.total_worker_time) desc Andrew J. Kelly wrote: Show quoteHide quote >Wow those are some large query plans. Even the A server has about 1.2GB >taken up just for the proc cache. > 155087 pages * 8192 bytes / 1024 /1024 = 1.2MB >You only have less than 5K plans in total. These must be very huge queries >and it is easy to see how you can run up a large CPU hit when compiling >these types of plans. You mentioned that you see lots of blocking when this >is happening. If you have lots of locks taken out with so much memory taken >by proc cache you can easily exhaust enough memory to see the types of >errors you listed. Can you explain what is going on with these large plans >in the first place? If you can reduce their size and get better plan reuse >you should be able to cut down on the CPU usages and wasted space for the >proc cache. I assume that the reason one or two servers show this more than >the others is due to the way in which they are utilized. Possibly several >really inefficient sps are called more often on those servers than the >others? A couple more things to try or at least consider. You may want to >try changing the Force Parameterization option to try and get better plan >reuse to cut down on compiles and get better use of plan cache memory. Load >SP2 (latest version) of SQL Server since it has some optimizations for >managing Plan cache memory. But you really need to find out why the plans >are so large and not being reused in general. > >> Thanks for your response Andrew. I will attempt to address some of your >> questions. >[quoted text clipped - 84 lines] >>>> on >>>> how to perhaps prove my findings. Please help with or any other ideas. You would be better off looking for the plans that take up the most space.
This query will show the largest memory hogs on top. SELECT b.[bucketid], b.[cacheobjtype], b.[objtype], b.[refcounts], b.[usecounts] , a.[dbid], a.[objectid], b.[size_in_bytes], a.[text] FROM sys.dm_exec_cached_plans as b CROSS APPLY sys.dm_exec_sql_text(b.[plan_handle]) AS a ORDER BY [size_in_bytes] DESC -- Andrew J. Kelly SQL MVP "cbrichards via SQLMonster.com" <u3288@uwe> wrote in message news:6f778a39a4039@uwe...Show quoteHide quote > As far as identifying these large query plans, would the following query > do > the trick? > > select top 50 > sum(qs.total_worker_time) as total_cpu_time, > sum(qs.execution_count) as total_execution_count, > count(*) as number_of_statements, > sql_text.text > from > sys.dm_exec_query_stats qs > cross apply sys.dm_exec_sql_text(plan_handle) as sql_text > group by sql_text.text > order by sum(qs.total_worker_time) desc > > > Andrew J. Kelly wrote: >>Wow those are some large query plans. Even the A server has about 1.2GB >>taken up just for the proc cache. >> 155087 pages * 8192 bytes / 1024 /1024 = 1.2MB >>You only have less than 5K plans in total. These must be very huge queries >>and it is easy to see how you can run up a large CPU hit when compiling >>these types of plans. You mentioned that you see lots of blocking when >>this >>is happening. If you have lots of locks taken out with so much memory >>taken >>by proc cache you can easily exhaust enough memory to see the types of >>errors you listed. Can you explain what is going on with these large plans >>in the first place? If you can reduce their size and get better plan >>reuse >>you should be able to cut down on the CPU usages and wasted space for the >>proc cache. I assume that the reason one or two servers show this more >>than >>the others is due to the way in which they are utilized. Possibly several >>really inefficient sps are called more often on those servers than the >>others? A couple more things to try or at least consider. You may want to >>try changing the Force Parameterization option to try and get better plan >>reuse to cut down on compiles and get better use of plan cache memory. >>Load >>SP2 (latest version) of SQL Server since it has some optimizations for >>managing Plan cache memory. But you really need to find out why the plans >>are so large and not being reused in general. >> >>> Thanks for your response Andrew. I will attempt to address some of your >>> questions. >>[quoted text clipped - 84 lines] >>>>> on >>>>> how to perhaps prove my findings. Please help with or any other ideas. > > -- > Message posted via SQLMonster.com > http://www.sqlmonster.com/Uwe/Forums.aspx/sql-server/200703/1 > Based upon having 8 GB RAM, and 6 GB dedicated to SQL Server, what would be a
more optimal size to shoot for when it comes to the size of the proc cache? Is there a recommended size for the proc cache under these conditions? Andrew J. Kelly wrote: Show quoteHide quote >Wow those are some large query plans. Even the A server has about 1.2GB >taken up just for the proc cache. > 155087 pages * 8192 bytes / 1024 /1024 = 1.2MB >You only have less than 5K plans in total. These must be very huge queries >and it is easy to see how you can run up a large CPU hit when compiling >these types of plans. You mentioned that you see lots of blocking when this >is happening. If you have lots of locks taken out with so much memory taken >by proc cache you can easily exhaust enough memory to see the types of >errors you listed. Can you explain what is going on with these large plans >in the first place? If you can reduce their size and get better plan reuse >you should be able to cut down on the CPU usages and wasted space for the >proc cache. I assume that the reason one or two servers show this more than >the others is due to the way in which they are utilized. Possibly several >really inefficient sps are called more often on those servers than the >others? A couple more things to try or at least consider. You may want to >try changing the Force Parameterization option to try and get better plan >reuse to cut down on compiles and get better use of plan cache memory. Load >SP2 (latest version) of SQL Server since it has some optimizations for >managing Plan cache memory. But you really need to find out why the plans >are so large and not being reused in general. > >> Thanks for your response Andrew. I will attempt to address some of your >> questions. >[quoted text clipped - 84 lines] >>>> on >>>> how to perhaps prove my findings. Please help with or any other ideas. It's not a matter of a size to shoot for. As with anything that uses memory
you want it to be as efficient as possible. I have seen 10K + plans take up only several hundred MB's. You have only a few thousand and they take up over a GB. The bottom line is that they take up what they need to but I question what the plans are that are taking up so much space in yours. -- Andrew J. Kelly SQL MVP "cbrichards via SQLMonster.com" <u3288@uwe> wrote in message news:6f7791933c00d@uwe...Show quoteHide quote > Based upon having 8 GB RAM, and 6 GB dedicated to SQL Server, what would > be a > more optimal size to shoot for when it comes to the size of the proc > cache? > Is there a recommended size for the proc cache under these conditions? > > Andrew J. Kelly wrote: >>Wow those are some large query plans. Even the A server has about 1.2GB >>taken up just for the proc cache. >> 155087 pages * 8192 bytes / 1024 /1024 = 1.2MB >>You only have less than 5K plans in total. These must be very huge queries >>and it is easy to see how you can run up a large CPU hit when compiling >>these types of plans. You mentioned that you see lots of blocking when >>this >>is happening. If you have lots of locks taken out with so much memory >>taken >>by proc cache you can easily exhaust enough memory to see the types of >>errors you listed. Can you explain what is going on with these large plans >>in the first place? If you can reduce their size and get better plan >>reuse >>you should be able to cut down on the CPU usages and wasted space for the >>proc cache. I assume that the reason one or two servers show this more >>than >>the others is due to the way in which they are utilized. Possibly several >>really inefficient sps are called more often on those servers than the >>others? A couple more things to try or at least consider. You may want to >>try changing the Force Parameterization option to try and get better plan >>reuse to cut down on compiles and get better use of plan cache memory. >>Load >>SP2 (latest version) of SQL Server since it has some optimizations for >>managing Plan cache memory. But you really need to find out why the plans >>are so large and not being reused in general. >> >>> Thanks for your response Andrew. I will attempt to address some of your >>> questions. >>[quoted text clipped - 84 lines] >>>>> on >>>>> how to perhaps prove my findings. Please help with or any other ideas. > > -- > Message posted via SQLMonster.com > http://www.sqlmonster.com/Uwe/Forums.aspx/sql-server/200703/1 > On Tue, 13 Mar 2007 17:56:29 GMT, "cbrichards" <u3288@uwe> wrote: I see downthread discussion about huge query plans.>So, while things point to a possible memory issue, I have run out of ideas on >how to perhaps prove my findings. Please help with or any other ideas. Do you have any idea what code is involved? Is it even SP, or would something be submitting large (say >1000 line) dynamic queries? Would several SPIDs be contesting for data? You say sp_who2 shows some blocking, is it self-blocking or real cross-spid blocking? Any app-level idea why that would occur? Do you have any loops in your SPs? I don't see anything in your research that eliminates just some one bad piece of code, with a bad query plan, showing up and causing the choke. It may be too obvious for you to mention, but I thought I'd ask! The one thing I think you said is that no one SPID seems to be eating the CPU time. Narrowing down the code executing during these episodes would be very, very helpful! Josh Thanks Josh. Some of what you wrote I can answer. Some I just do not have
saved data to verify, other than what I noticed during the episode(s), such as, is it real cross-spid blocking or self blocking. This I will have to pay more attention too. I do notice this morning, that as connectivity has increased, that the page life expectancy has dropped to an average of 280 and falling as we have not yet reached the peak of our connectivity. Page Faults/Sec are averaging 750+. Our Buffer Cache Hit Ratio remains at 99%, Available Memory = 1.3 GB, and our Target and Total are matched at 6 GB. When I run a query to return the plans that are the largest consumers of memory, the ones at the top of the list are several thousand lines of code each. One in particular always seems to be prevalent in my profiler traces when these episodes occur. During these episodes it is compiling, because it's plan has been removed from cache, and with a plan as large as it is, I am sure it takes upward of 5 seconds or more to compile. I have a developer currently looking into optimizing this procedure. JXStern wrote: Show quoteHide quote >>So, while things point to a possible memory issue, I have run out of ideas on >>how to perhaps prove my findings. Please help with or any other ideas. > >I see downthread discussion about huge query plans. > >Do you have any idea what code is involved? Is it even SP, or would >something be submitting large (say >1000 line) dynamic queries? Would >several SPIDs be contesting for data? > >You say sp_who2 shows some blocking, is it self-blocking or real >cross-spid blocking? Any app-level idea why that would occur? > >Do you have any loops in your SPs? > >I don't see anything in your research that eliminates just some one >bad piece of code, with a bad query plan, showing up and causing the >choke. It may be too obvious for you to mention, but I thought I'd >ask! The one thing I think you said is that no one SPID seems to be >eating the CPU time. > >Narrowing down the code executing during these episodes would be very, >very helpful! > >Josh I think you have found the root cause. Sp's that are several thousand lines
are extremely hard to optimize and keep plans in cache. Usually operations that have that much processing are batch type operations and may be better served as an SSIS package. But in any case try split the procs up into several smaller ones and really try hard to optimize the existing code. That is a lot of code for a sp. -- Andrew J. Kelly SQL MVP "cbrichards via SQLMonster.com" <u3288@uwe> wrote in message news:6f842de0ee8f9@uwe...Show quoteHide quote > Thanks Josh. Some of what you wrote I can answer. Some I just do not have > saved data to verify, other than what I noticed during the episode(s), > such > as, is it real cross-spid blocking or self blocking. This I will have to > pay > more attention too. > > I do notice this morning, that as connectivity has increased, that the > page > life expectancy has dropped to an average of 280 and falling as we have > not > yet reached the peak of our connectivity. Page Faults/Sec are averaging > 750+. > Our Buffer Cache Hit Ratio remains at 99%, Available Memory = 1.3 GB, and > our > Target and Total are matched at 6 GB. > > When I run a query to return the plans that are the largest consumers of > memory, the ones at the top of the list are several thousand lines of code > each. One in particular always seems to be prevalent in my profiler traces > when these episodes occur. During these episodes it is compiling, because > it's plan has been removed from cache, and with a plan as large as it is, > I > am sure it takes upward of 5 seconds or more to compile. > > I have a developer currently looking into optimizing this procedure. > > JXStern wrote: >>>So, while things point to a possible memory issue, I have run out of >>>ideas on >>>how to perhaps prove my findings. Please help with or any other ideas. >> >>I see downthread discussion about huge query plans. >> >>Do you have any idea what code is involved? Is it even SP, or would >>something be submitting large (say >1000 line) dynamic queries? Would >>several SPIDs be contesting for data? >> >>You say sp_who2 shows some blocking, is it self-blocking or real >>cross-spid blocking? Any app-level idea why that would occur? >> >>Do you have any loops in your SPs? >> >>I don't see anything in your research that eliminates just some one >>bad piece of code, with a bad query plan, showing up and causing the >>choke. It may be too obvious for you to mention, but I thought I'd >>ask! The one thing I think you said is that no one SPID seems to be >>eating the CPU time. >> >>Narrowing down the code executing during these episodes would be very, >>very helpful! >> >>Josh > > -- > Message posted via SQLMonster.com > http://www.sqlmonster.com/Uwe/Forums.aspx/sql-server/200703/1 >
Other interesting topics
query
Problem starting up databases Trace flags for deadlock detection on startup. Second Instance os SQL 2005? Backup log isn't overwritten in SQL Server 2005 Deadlocks in the error log Full Backup DB is too large of DB & TL Service account authentication cache SQL Express on a small network How to setup SQL Mail on SQL 2005 express |
|||||||||||||||||||||||