Home All Groups Group Topic Archive Search About

Covering indexes question



Author
19 Nov 2008 5:21 PM
David Walker
If I have a nonclustered index in a Sales table that uses the CustomerId
and DateSold columns, and I do a Select query that filters by the DateSold
column only, will that index be used?

Or do I need a separate index for only the DateSold column?

Covering indexes are confusing, even though I have read Kalen and other
people on the topic.

At one time, I thought that if 5 columns of a table are searched with
various queries, then, instead of creating 5 separate indexes, I should
just create ONE index that uses all 5 columns... and whichever column was
used in any query, the covering index would be enough.

But somehow I don't think that's the right answer.

Thanks for any clarification.


David Walker

Author
19 Nov 2008 5:28 PM
Tibor Karaszi
You need to differentiate between seek and a scan. consider a
telephone directory (being the index) and the real world (being the
data). You can seek for the last name. You can scan for a first name.
As long as you are only interested in info which is in the directory,
then it covers, but if you need eye colour, you need to visit
data-pages.

Show quoteHide quote
"David Walker" <n***@none.com> wrote in message
news:Xns9B5B695DDE288DavidWalker@207.46.248.16...
> If I have a nonclustered index in a Sales table that uses the
> CustomerId
> and DateSold columns, and I do a Select query that filters by the
> DateSold
> column only, will that index be used?
>
> Or do I need a separate index for only the DateSold column?
>
> Covering indexes are confusing, even though I have read Kalen and
> other
> people on the topic.
>
> At one time, I thought that if 5 columns of a table are searched
> with
> various queries, then, instead of creating 5 separate indexes, I
> should
> just create ONE index that uses all 5 columns... and whichever
> column was
> used in any query, the covering index would be enough.
>
> But somehow I don't think that's the right answer.
>
> Thanks for any clarification.
>
>
> David Walker
Are all your drivers up to date? click for free checkup

Author
19 Nov 2008 9:26 PM
Tom Cooper
That index would be used, but the entire index would be scanned.  For
example, suppose the table had 1,000,000 rows averaging 80 bytes each.
Suppose also that the table has 20 years of data in it so that there are
about 50,000 rows for each year.  And your query is to get the count of
sales for each customer which had at least one sale in 2005:

Select CustomerID, Count(*)
From Sales Where DateSold >= '20050101' And DateSold < '20060101'

Then if you don't have a index that can be used, SQL will have to scan the
whole table which is about 80MB (1,000,000 rows * 80 bytes per row).

If you have a nonclustered index on CustomerID, DateSold, that index covers
the query and can be used, however SQL will need to scan the entire index.
But each leaf level entry in the index will take about 4 bytes for
CustomerID (assuming it is an int) plus 8 bytes for the DateSold (assuming
it is a datetime) plus the length of any clustered index key columns other
than CustomerID and DateSold (let's assume that's another 4 bytes).  So the
total is 16 bytes for each leaf level entry.  So the total size of the index
will be on the order of 16MB (1,000,000 rows * 16 bytes per row) (actually
more depending on how much empty space is in the index, but you get the
idea).  So this index will run this query much faster than if you didn't
have the index.

But if you have a nonclustered index on DateSold, then each leaf level entry
takes 12 bytes (8 for DateSold and 4 for the clustered index key).  And
since all of the 2005 rows are together in the index, only the 50,000 rows
for the year 2005 need to be scanned, so that is about 600,000 bytes (plus
overhead).  So this index is much better for this query than a nonclustered
index on CustomerID, DateSold.

Tom

Show quoteHide quote
"David Walker" <n***@none.com> wrote in message
news:Xns9B5B695DDE288DavidWalker@207.46.248.16...
> If I have a nonclustered index in a Sales table that uses the CustomerId
> and DateSold columns, and I do a Select query that filters by the DateSold
> column only, will that index be used?
>
> Or do I need a separate index for only the DateSold column?
>
> Covering indexes are confusing, even though I have read Kalen and other
> people on the topic.
>
> At one time, I thought that if 5 columns of a table are searched with
> various queries, then, instead of creating 5 separate indexes, I should
> just create ONE index that uses all 5 columns... and whichever column was
> used in any query, the covering index would be enough.
>
> But somehow I don't think that's the right answer.
>
> Thanks for any clarification.
>
>
> David Walker
Author
20 Nov 2008 4:34 AM
Uri Dimant
Tom
Another option could be having a CI  on DateSold  and NCI on CustomerID.....
to cover the query




Show quoteHide quote
"Tom Cooper" <tomcooper@comcast.no.spam.please.net> wrote in message
news:uQYwv1oSJHA.3584@TK2MSFTNGP06.phx.gbl...
> That index would be used, but the entire index would be scanned.  For
> example, suppose the table had 1,000,000 rows averaging 80 bytes each.
> Suppose also that the table has 20 years of data in it so that there are
> about 50,000 rows for each year.  And your query is to get the count of
> sales for each customer which had at least one sale in 2005:
>
> Select CustomerID, Count(*)
> From Sales Where DateSold >= '20050101' And DateSold < '20060101'
>
> Then if you don't have a index that can be used, SQL will have to scan the
> whole table which is about 80MB (1,000,000 rows * 80 bytes per row).
>
> If you have a nonclustered index on CustomerID, DateSold, that index
> covers the query and can be used, however SQL will need to scan the entire
> index. But each leaf level entry in the index will take about 4 bytes for
> CustomerID (assuming it is an int) plus 8 bytes for the DateSold (assuming
> it is a datetime) plus the length of any clustered index key columns other
> than CustomerID and DateSold (let's assume that's another 4 bytes).  So
> the total is 16 bytes for each leaf level entry.  So the total size of the
> index will be on the order of 16MB (1,000,000 rows * 16 bytes per row)
> (actually more depending on how much empty space is in the index, but you
> get the idea).  So this index will run this query much faster than if you
> didn't have the index.
>
> But if you have a nonclustered index on DateSold, then each leaf level
> entry takes 12 bytes (8 for DateSold and 4 for the clustered index key).
> And since all of the 2005 rows are together in the index, only the 50,000
> rows for the year 2005 need to be scanned, so that is about 600,000 bytes
> (plus overhead).  So this index is much better for this query than a
> nonclustered index on CustomerID, DateSold.
>
> Tom
>
> "David Walker" <n***@none.com> wrote in message
> news:Xns9B5B695DDE288DavidWalker@207.46.248.16...
>> If I have a nonclustered index in a Sales table that uses the CustomerId
>> and DateSold columns, and I do a Select query that filters by the
>> DateSold
>> column only, will that index be used?
>>
>> Or do I need a separate index for only the DateSold column?
>>
>> Covering indexes are confusing, even though I have read Kalen and other
>> people on the topic.
>>
>> At one time, I thought that if 5 columns of a table are searched with
>> various queries, then, instead of creating 5 separate indexes, I should
>> just create ONE index that uses all 5 columns... and whichever column was
>> used in any query, the covering index would be enough.
>>
>> But somehow I don't think that's the right answer.
>>
>> Thanks for any clarification.
>>
>>
>> David Walker
>
>
Author
19 Nov 2008 9:54 PM
Daniel Jameson
David,

"Covering Indexes" are about increasing the performance of very specific and
fequently used queries.  The performance comes from allowing the server to
fulfill the query entirely from the non-clustered index without having to
process from the full body of data in the clustered index (or heap).  The
columns of the WHERE clause should be the first columns of the index and the
columns of the SELECT clause should be the last columns of the index.  If
you need to frequently look up customer name by customer id, then you would
want this index:

CREATE INDEX CustIDName on XXX (CustID, LName, FName);

If you need to frequently look up user names from e-mail addresses, then you
would use this:

CREATE INDEX EMailUName on XXX (EMail, UserID);

As always, every index speeds SELECTs and slows INSERTs, so add indexes
judiciously.

--
Thank you,

Daniel Jameson
SQL Server DBA
Children's Oncology Group
www.childrensoncologygroup.org


Show quoteHide quote
"David Walker" <n***@none.com> wrote in message
news:Xns9B5B695DDE288DavidWalker@207.46.248.16...
> If I have a nonclustered index in a Sales table that uses the CustomerId
> and DateSold columns, and I do a Select query that filters by the DateSold
> column only, will that index be used?
>
> Or do I need a separate index for only the DateSold column?
>
> Covering indexes are confusing, even though I have read Kalen and other
> people on the topic.
>
> At one time, I thought that if 5 columns of a table are searched with
> various queries, then, instead of creating 5 separate indexes, I should
> just create ONE index that uses all 5 columns... and whichever column was
> used in any query, the covering index would be enough.
>
> But somehow I don't think that's the right answer.
>
> Thanks for any clarification.
>
>
> David Walker

Bookmark and Share