|
sql
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Statistics and MaintenanceI am trying to determine the best approach to creating and maintaining
statistics in a large data warehouse environment on SQL Server 2000. The current environment has auto create statistics on and auto update statistics off. There are scheduled jobs that manually update statistics. Those jobs are taking longer and I want to know what the impact would be if we dropped some of the statistics. Due to the auto create statistics option, there are statistics on nearly every column. What is the benefit of this if a column is not indexed? Most tables only have on or two indexes. Thanks, Ed Ed,
The stats help the optimizer chose the most efficient access method (plan) for the query. Knowing the distribution of values for a given column would help the optimizer more quickly choose that plan while ruling out other plans. You might also want to review: Microsoft SQL Server 2000 Index Defragmentation Best Practices http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ss2kidbp.mspx and Microsoft SQL Server 2000 RDBMS Performance Tuning Guide for Data Warehousing http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/rdbmspft.mspx HTH Jerry Show quoteHide quote "Ed Mays" <EdM***@discussions.microsoft.com> wrote in message news:3B94BD42-2F7F-44C4-B7AC-B34CCEE6B99B@microsoft.com... >I am trying to determine the best approach to creating and maintaining > statistics in a large data warehouse environment on SQL Server 2000. The > current environment has auto create statistics on and auto update > statistics > off. There are scheduled jobs that manually update statistics. Those > jobs > are taking longer and I want to know what the impact would be if we > dropped > some of the statistics. Due to the auto create statistics option, there > are > statistics on nearly every column. What is the benefit of this if a > column > is not indexed? Most tables only have on or two indexes. > > Thanks, Ed Thanks Jerry - I still have a question. What is the benefit of creating and
maintaining a statistic on a column that will never benefit from an index. Wouldn't the query plan always be the same? For example, if I had a 2 million row table and had a column that only had 5 unique values evenly distributed, an index on that column would not be selective enough to be used. So, what would the benefit be of having statistics on that column? Ed Show quoteHide quote "Jerry Spivey" wrote: > Ed, > > The stats help the optimizer chose the most efficient access method (plan) > for the query. Knowing the distribution of values for a given column would > help the optimizer more quickly choose that plan while ruling out other > plans. > > You might also want to review: > > Microsoft SQL Server 2000 Index Defragmentation Best Practices > http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ss2kidbp.mspx > > and > > Microsoft SQL Server 2000 RDBMS Performance Tuning Guide for Data > Warehousing > http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/rdbmspft.mspx > > HTH > > Jerry > > "Ed Mays" <EdM***@discussions.microsoft.com> wrote in message > news:3B94BD42-2F7F-44C4-B7AC-B34CCEE6B99B@microsoft.com... > >I am trying to determine the best approach to creating and maintaining > > statistics in a large data warehouse environment on SQL Server 2000. The > > current environment has auto create statistics on and auto update > > statistics > > off. There are scheduled jobs that manually update statistics. Those > > jobs > > are taking longer and I want to know what the impact would be if we > > dropped > > some of the statistics. Due to the auto create statistics option, there > > are > > statistics on nearly every column. What is the benefit of this if a > > column > > is not indexed? Most tables only have on or two indexes. > > > > Thanks, Ed > > > Ed,
Agreed. But how would the optimizer know that the column has a low selectivity if the column is not the first column in an index, an index doesn't exist or if the column does not have statistics on it? HTH Jerry Show quoteHide quote "Ed Mays" <EdM***@discussions.microsoft.com> wrote in message news:D585B614-E767-40E7-A370-FE393FCE2BEE@microsoft.com... > Thanks Jerry - I still have a question. What is the benefit of creating > and > maintaining a statistic on a column that will never benefit from an index. > Wouldn't the query plan always be the same? For example, if I had a 2 > million row table and had a column that only had 5 unique values evenly > distributed, an index on that column would not be selective enough to be > used. So, what would the benefit be of having statistics on that column? > > Ed > > "Jerry Spivey" wrote: > >> Ed, >> >> The stats help the optimizer chose the most efficient access method >> (plan) >> for the query. Knowing the distribution of values for a given column >> would >> help the optimizer more quickly choose that plan while ruling out other >> plans. >> >> You might also want to review: >> >> Microsoft SQL Server 2000 Index Defragmentation Best Practices >> http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ss2kidbp.mspx >> >> and >> >> Microsoft SQL Server 2000 RDBMS Performance Tuning Guide for Data >> Warehousing >> http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/rdbmspft.mspx >> >> HTH >> >> Jerry >> >> "Ed Mays" <EdM***@discussions.microsoft.com> wrote in message >> news:3B94BD42-2F7F-44C4-B7AC-B34CCEE6B99B@microsoft.com... >> >I am trying to determine the best approach to creating and maintaining >> > statistics in a large data warehouse environment on SQL Server 2000. >> > The >> > current environment has auto create statistics on and auto update >> > statistics >> > off. There are scheduled jobs that manually update statistics. Those >> > jobs >> > are taking longer and I want to know what the impact would be if we >> > dropped >> > some of the statistics. Due to the auto create statistics option, >> > there >> > are >> > statistics on nearly every column. What is the benefit of this if a >> > column >> > is not indexed? Most tables only have on or two indexes. >> > >> > Thanks, Ed >> >> >> That takes me back to my original question. Even if there were an index,
the optimizer would not use the index, so what is the benefit of the statistic for that column? I understand how it would benefit if there were a chance of using the index, but if the optimizer is going to choose a table scan regardless, why have and maintain the statistic? Ed Show quoteHide quote "Jerry Spivey" wrote: > Ed, > > Agreed. But how would the optimizer know that the column has a low > selectivity if the column is not the first column in an index, an index > doesn't exist or if the column does not have statistics on it? > > HTH > > Jerry > > "Ed Mays" <EdM***@discussions.microsoft.com> wrote in message > news:D585B614-E767-40E7-A370-FE393FCE2BEE@microsoft.com... > > Thanks Jerry - I still have a question. What is the benefit of creating > > and > > maintaining a statistic on a column that will never benefit from an index. > > Wouldn't the query plan always be the same? For example, if I had a 2 > > million row table and had a column that only had 5 unique values evenly > > distributed, an index on that column would not be selective enough to be > > used. So, what would the benefit be of having statistics on that column? > > > > Ed > > > > "Jerry Spivey" wrote: > > > >> Ed, > >> > >> The stats help the optimizer chose the most efficient access method > >> (plan) > >> for the query. Knowing the distribution of values for a given column > >> would > >> help the optimizer more quickly choose that plan while ruling out other > >> plans. > >> > >> You might also want to review: > >> > >> Microsoft SQL Server 2000 Index Defragmentation Best Practices > >> http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ss2kidbp.mspx > >> > >> and > >> > >> Microsoft SQL Server 2000 RDBMS Performance Tuning Guide for Data > >> Warehousing > >> http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/rdbmspft.mspx > >> > >> HTH > >> > >> Jerry > >> > >> "Ed Mays" <EdM***@discussions.microsoft.com> wrote in message > >> news:3B94BD42-2F7F-44C4-B7AC-B34CCEE6B99B@microsoft.com... > >> >I am trying to determine the best approach to creating and maintaining > >> > statistics in a large data warehouse environment on SQL Server 2000. > >> > The > >> > current environment has auto create statistics on and auto update > >> > statistics > >> > off. There are scheduled jobs that manually update statistics. Those > >> > jobs > >> > are taking longer and I want to know what the impact would be if we > >> > dropped > >> > some of the statistics. Due to the auto create statistics option, > >> > there > >> > are > >> > statistics on nearly every column. What is the benefit of this if a > >> > column > >> > is not indexed? Most tables only have on or two indexes. > >> > > >> > Thanks, Ed > >> > >> > >> > > > Good point.
How would the optimizer know not to use the index? Because it would analyze the stats of the index. I'm thinking along the lines of composite NC indexes where multiple columns are used and only the first column has stats (behavior when creating an index) and coverying queries by only using the NC index(es) if possible. If the optimizer can determine that it is more efficient to cover the query via the NC index(es) by using the available column(s) stats then it might not perform a table scan which could considerably reduce I/O and increase the speed of the query while optimizing the cache usage. HTH Jerry Show quoteHide quote "Ed Mays" <EdM***@discussions.microsoft.com> wrote in message news:A99102B2-18B7-4C10-9A28-B4F1C4A38313@microsoft.com... > That takes me back to my original question. Even if there were an index, > the optimizer would not use the index, so what is the benefit of the > statistic for that column? I understand how it would benefit if there > were a > chance of using the index, but if the optimizer is going to choose a table > scan regardless, why have and maintain the statistic? > > Ed > > "Jerry Spivey" wrote: > >> Ed, >> >> Agreed. But how would the optimizer know that the column has a low >> selectivity if the column is not the first column in an index, an index >> doesn't exist or if the column does not have statistics on it? >> >> HTH >> >> Jerry >> >> "Ed Mays" <EdM***@discussions.microsoft.com> wrote in message >> news:D585B614-E767-40E7-A370-FE393FCE2BEE@microsoft.com... >> > Thanks Jerry - I still have a question. What is the benefit of >> > creating >> > and >> > maintaining a statistic on a column that will never benefit from an >> > index. >> > Wouldn't the query plan always be the same? For example, if I had a 2 >> > million row table and had a column that only had 5 unique values evenly >> > distributed, an index on that column would not be selective enough to >> > be >> > used. So, what would the benefit be of having statistics on that >> > column? >> > >> > Ed >> > >> > "Jerry Spivey" wrote: >> > >> >> Ed, >> >> >> >> The stats help the optimizer chose the most efficient access method >> >> (plan) >> >> for the query. Knowing the distribution of values for a given column >> >> would >> >> help the optimizer more quickly choose that plan while ruling out >> >> other >> >> plans. >> >> >> >> You might also want to review: >> >> >> >> Microsoft SQL Server 2000 Index Defragmentation Best Practices >> >> http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ss2kidbp.mspx >> >> >> >> and >> >> >> >> Microsoft SQL Server 2000 RDBMS Performance Tuning Guide for Data >> >> Warehousing >> >> http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/rdbmspft.mspx >> >> >> >> HTH >> >> >> >> Jerry >> >> >> >> "Ed Mays" <EdM***@discussions.microsoft.com> wrote in message >> >> news:3B94BD42-2F7F-44C4-B7AC-B34CCEE6B99B@microsoft.com... >> >> >I am trying to determine the best approach to creating and >> >> >maintaining >> >> > statistics in a large data warehouse environment on SQL Server 2000. >> >> > The >> >> > current environment has auto create statistics on and auto update >> >> > statistics >> >> > off. There are scheduled jobs that manually update statistics. >> >> > Those >> >> > jobs >> >> > are taking longer and I want to know what the impact would be if we >> >> > dropped >> >> > some of the statistics. Due to the auto create statistics option, >> >> > there >> >> > are >> >> > statistics on nearly every column. What is the benefit of this if a >> >> > column >> >> > is not indexed? Most tables only have on or two indexes. >> >> > >> >> > Thanks, Ed >> >> >> >> >> >> >> >> >> |
|||||||||||||||||||||||