Home All Groups Group Topic Archive Search About

SQL2k: Constant subquery executed as correlated - performance problem

Author
2 Mar 2007 2:34 AM
mzacharzewski
Hi All,
I have a great problem with SQL Server 2000 (Ent. Ed., sp4,
8.00.2187) performance.
I don't know exactly when discribed behaviour of SQL Svr has started
(maybe after sp, fix?), but I am sure, this performed well in the
past.
The server has started to choose bad plans when executing stored
procedures.
Example:
SELECT Col1, Col2
FROM TableA
WHERE ID_Data IN ( select ID from TableB where Val = 123 )

- TableA -> 6000000 rows;
- TableB -> 7000 rows;
- query: (select ID from TableB where Val = 123 )   -> returns 10 rows
IDs: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
- all indexes are present and rebuild:
   TableA (ID_Data)
   TableB (Val, ID)
- statistics are updated with full scan

SQL Server chooses to execute subquery as correlated subquery -
executes subquery once for every row from TableA - why?

Different query:
SELECT Col1, Col2
FROM TableA
WHERE ID_Data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

executes as expected (index seek TableA.ID_Data)

In some places I had to use query plan hints in order to achive
satisfactory perfomance - but in my opinion SQL Svr shouldn't behave
that way..
This is just an easy example, but the problem exists in many SP's /
queries much more complicated - the reason is always the same:
subquery executed in a bad way.

I've noticed same SP executed in query analyzer - chooses proper
execution plan, but
called from web page (ASP.NET 2.0) - chooses bad plan described above.
I know SQL Svr stores different SP execution plans, but even with all
the same settings:
set quoted_identifier on
set implicit_transactions off
set cursor_close_on_commit off
set ansi_warnings on
set ansi_padding on
set ansi_nulls on
set concat_null_yields_null on
set language us_english
set dateformat mdy
set datefirst 7

SP executes differently in both environment..

My question is, why SQL Server is choosing such an inefficient plan
for constant subquery ?

Regards:
  Marcin Zacharzewski

Author
2 Mar 2007 3:19 AM
oj
Try rewrite it as a join.

e.g.
SELECT Col1, Col2
FROM TableA join TableB on TableA .ID_Data=TableB.ID
WHERE  TableA.Val = 123

Also, have you looked at this article. Excellent info there.
http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx



--
-oj



<mzacharzew***@linksoft.pl> wrote in message
Show quoteHide quote
news:1172802851.773779.16460@z35g2000cwz.googlegroups.com...
> Hi All,
> I have a great problem with SQL Server 2000 (Ent. Ed., sp4,
> 8.00.2187) performance.
> I don't know exactly when discribed behaviour of SQL Svr has started
> (maybe after sp, fix?), but I am sure, this performed well in the
> past.
> The server has started to choose bad plans when executing stored
> procedures.
> Example:
> SELECT Col1, Col2
> FROM TableA
> WHERE ID_Data IN ( select ID from TableB where Val = 123 )
>
> - TableA -> 6000000 rows;
> - TableB -> 7000 rows;
> - query: (select ID from TableB where Val = 123 )   -> returns 10 rows
> IDs: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
> - all indexes are present and rebuild:
>   TableA (ID_Data)
>   TableB (Val, ID)
> - statistics are updated with full scan
>
> SQL Server chooses to execute subquery as correlated subquery -
> executes subquery once for every row from TableA - why?
>
> Different query:
> SELECT Col1, Col2
> FROM TableA
> WHERE ID_Data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
>
> executes as expected (index seek TableA.ID_Data)
>
> In some places I had to use query plan hints in order to achive
> satisfactory perfomance - but in my opinion SQL Svr shouldn't behave
> that way..
> This is just an easy example, but the problem exists in many SP's /
> queries much more complicated - the reason is always the same:
> subquery executed in a bad way.
>
> I've noticed same SP executed in query analyzer - chooses proper
> execution plan, but
> called from web page (ASP.NET 2.0) - chooses bad plan described above.
> I know SQL Svr stores different SP execution plans, but even with all
> the same settings:
> set quoted_identifier on
> set implicit_transactions off
> set cursor_close_on_commit off
> set ansi_warnings on
> set ansi_padding on
> set ansi_nulls on
> set concat_null_yields_null on
> set language us_english
> set dateformat mdy
> set datefirst 7
>
> SP executes differently in both environment..
>
> My question is, why SQL Server is choosing such an inefficient plan
> for constant subquery ?
>
> Regards:
>  Marcin Zacharzewski
>
Are all your drivers up to date? click for free checkup

Author
2 Mar 2007 3:37 AM
mzacharzewski
Thank you,
for your quick answer.
Yes, I know I can use joins usually, but sometimes I can't use them,
for example with SQL CE RDA synchronization - when you get a table
with TRACKINGON, SQL CE refuses selects with joins - so all the
necessary conditions must be set in WHERE.

But the question is still - why SQL Server chooses so strange/
inefficient execution plan ? Why reevaluates constant subquery ?

Regards:
    Marcin Zacharzewski

Show quoteHide quote
On 2 Mar, 04:19, "oj" <nospam_oj...@home.com> wrote:
> Try rewrite it as a join.
>
> e.g.
>  SELECT Col1, Col2
>  FROM TableA join TableB on TableA .ID_Data=TableB.ID
>  WHERE  TableA.Val = 123
>
> Also, have you looked at this article. Excellent info there.http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
>
> --
> -oj
>
> <mzacharzew***@linksoft.pl> wrote in message
>
> news:1172802851.773779.16460@z35g2000cwz.googlegroups.com...
>
>
>
> > Hi All,
> > I have a great problem with SQL Server 2000 (Ent. Ed., sp4,
> > 8.00.2187) performance.
> > I don't know exactly when discribed behaviour of SQL Svr has started
> > (maybe after sp, fix?), but I am sure, this performed well in the
> > past.
> > The server has started to choose bad plans when executing stored
> > procedures.
> > Example:
> > SELECT Col1, Col2
> > FROM TableA
> > WHERE ID_Data IN ( select ID from TableB where Val = 123 )
>
> > - TableA -> 6000000 rows;
> > - TableB -> 7000 rows;
> > - query: (select ID from TableB where Val = 123 )   -> returns 10 rows
> > IDs: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
> > - all indexes are present and rebuild:
> >   TableA (ID_Data)
> >   TableB (Val, ID)
> > - statistics are updated with full scan
>
> > SQL Server chooses to execute subquery as correlated subquery -
> > executes subquery once for every row from TableA - why?
>
> > Different query:
> > SELECT Col1, Col2
> > FROM TableA
> > WHERE ID_Data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
>
> > executes as expected (index seek TableA.ID_Data)
>
> > In some places I had to use query plan hints in order to achive
> > satisfactory perfomance - but in my opinion SQL Svr shouldn't behave
> > that way..
> > This is just an easy example, but the problem exists in many SP's /
> > queries much more complicated - the reason is always the same:
> > subquery executed in a bad way.
>
> > I've noticed same SP executed in query analyzer - chooses proper
> > execution plan, but
> > called from web page (ASP.NET 2.0) - chooses bad plan described above.
> > I know SQL Svr stores different SP execution plans, but even with all
> > the same settings:
> > set quoted_identifier on
> > set implicit_transactions off
> > set cursor_close_on_commit off
> > set ansi_warnings on
> > set ansi_padding on
> > set ansi_nulls on
> > set concat_null_yields_null on
> > set language us_english
> > set dateformat mdy
> > set datefirst 7
>
> > SP executes differently in both environment..
>
> > My question is, why SQL Server is choosing such an inefficient plan
> > for constant subquery ?
>
> > Regards:
> >  Marcin Zacharzewski- Ukryj cytowany tekst -
>
> - Poka¿ cytowany tekst -
Author
2 Mar 2007 5:12 AM
oj
sqlserver is cost-based optimizer. depending on the amount of data, index,
stats, it will pick the best plan that it can evaluate. there is also an
excellent article on this subject on technet. i suggest you take a read.

http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx

also, i recommend you take a look at bart's blog:
http://blogs.msdn.com/bartd/archive/2006/07/25/678015.aspx



--
-oj



<mzacharzew***@linksoft.pl> wrote in message
news:1172806655.928429.237480@j27g2000cwj.googlegroups.com...
Thank you,
for your quick answer.
Yes, I know I can use joins usually, but sometimes I can't use them,
for example with SQL CE RDA synchronization - when you get a table
with TRACKINGON, SQL CE refuses selects with joins - so all the
necessary conditions must be set in WHERE.

But the question is still - why SQL Server chooses so strange/
inefficient execution plan ? Why reevaluates constant subquery ?

Regards:
    Marcin Zacharzewski

Show quoteHide quote
On 2 Mar, 04:19, "oj" <nospam_oj...@home.com> wrote:
> Try rewrite it as a join.
>
> e.g.
>  SELECT Col1, Col2
>  FROM TableA join TableB on TableA .ID_Data=TableB.ID
>  WHERE  TableA.Val = 123
>
> Also, have you looked at this article. Excellent info
> there.http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
>
> --
> -oj
>
> <mzacharzew***@linksoft.pl> wrote in message
>
> news:1172802851.773779.16460@z35g2000cwz.googlegroups.com...
>
>
>
> > Hi All,
> > I have a great problem with SQL Server 2000 (Ent. Ed., sp4,
> > 8.00.2187) performance.
> > I don't know exactly when discribed behaviour of SQL Svr has started
> > (maybe after sp, fix?), but I am sure, this performed well in the
> > past.
> > The server has started to choose bad plans when executing stored
> > procedures.
> > Example:
> > SELECT Col1, Col2
> > FROM TableA
> > WHERE ID_Data IN ( select ID from TableB where Val = 123 )
>
> > - TableA -> 6000000 rows;
> > - TableB -> 7000 rows;
> > - query: (select ID from TableB where Val = 123 )   -> returns 10 rows
> > IDs: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
> > - all indexes are present and rebuild:
> >   TableA (ID_Data)
> >   TableB (Val, ID)
> > - statistics are updated with full scan
>
> > SQL Server chooses to execute subquery as correlated subquery -
> > executes subquery once for every row from TableA - why?
>
> > Different query:
> > SELECT Col1, Col2
> > FROM TableA
> > WHERE ID_Data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
>
> > executes as expected (index seek TableA.ID_Data)
>
> > In some places I had to use query plan hints in order to achive
> > satisfactory perfomance - but in my opinion SQL Svr shouldn't behave
> > that way..
> > This is just an easy example, but the problem exists in many SP's /
> > queries much more complicated - the reason is always the same:
> > subquery executed in a bad way.
>
> > I've noticed same SP executed in query analyzer - chooses proper
> > execution plan, but
> > called from web page (ASP.NET 2.0) - chooses bad plan described above.
> > I know SQL Svr stores different SP execution plans, but even with all
> > the same settings:
> > set quoted_identifier on
> > set implicit_transactions off
> > set cursor_close_on_commit off
> > set ansi_warnings on
> > set ansi_padding on
> > set ansi_nulls on
> > set concat_null_yields_null on
> > set language us_english
> > set dateformat mdy
> > set datefirst 7
>
> > SP executes differently in both environment..
>
> > My question is, why SQL Server is choosing such an inefficient plan
> > for constant subquery ?
>
> > Regards:
> >  Marcin Zacharzewski- Ukryj cytowany tekst -
>
> - Poka¿ cytowany tekst -
Author
2 Mar 2007 10:27 AM
mzacharzewski
So you mean, it is OK, and designed behaviour?

Just imagine how dramatically this decreases performance - some
queries using this senseless plan run about 10 minutes instead of 5
sec!!!

In my opinion there must be a bug in optimizer. Hope fix comes ASAP.

Regards:
  Marcin Zacharzewski (MCDBA, MCSD, MCT, MCSE)


Show quoteHide quote
On 2 Mar, 06:12, "oj" <nospam_oj...@home.com> wrote:
> sqlserver is cost-based optimizer. depending on the amount of data, index,
> stats, it will pick the best plan that it can evaluate. there is also an
> excellent article on this subject on technet. i suggest you take a read.
>
> http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx
>
> also, i recommend you take a look at bart's blog:http://blogs.msdn.com/bartd/archive/2006/07/25/678015.aspx
>
> --
> -oj
>
> <mzacharzew***@linksoft.pl> wrote in message
>
> news:1172806655.928429.237480@j27g2000cwj.googlegroups.com...
> Thank you,
> for your quick answer.
> Yes, I know I can use joins usually, but sometimes I can't use them,
> for example with SQL CE RDA synchronization - when you get a table
> with TRACKINGON, SQL CE refuses selects with joins - so all the
> necessary conditions must be set in WHERE.
>
> But the question is still - why SQL Server chooses so strange/
> inefficient execution plan ? Why reevaluates constant subquery ?
>
> Regards:
>     Marcin Zacharzewski
>
> On 2 Mar, 04:19, "oj" <nospam_oj...@home.com> wrote:
>
>
>
> > Try rewrite it as a join.
>
> > e.g.
> >  SELECT Col1, Col2
> >  FROM TableA join TableB on TableA .ID_Data=TableB.ID
> >  WHERE  TableA.Val = 123
>
> > Also, have you looked at this article. Excellent info
> > there.http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
>
> > --
> > -oj
>
> > <mzacharzew***@linksoft.pl> wrote in message
>
> >news:1172802851.773779.16460@z35g2000cwz.googlegroups.com...
>
> > > Hi All,
> > > I have a great problem with SQL Server 2000 (Ent. Ed., sp4,
> > > 8.00.2187) performance.
> > > I don't know exactly when discribed behaviour of SQL Svr has started
> > > (maybe after sp, fix?), but I am sure, this performed well in the
> > > past.
> > > The server has started to choose bad plans when executing stored
> > > procedures.
> > > Example:
> > > SELECT Col1, Col2
> > > FROM TableA
> > > WHERE ID_Data IN ( select ID from TableB where Val = 123 )
>
> > > - TableA -> 6000000 rows;
> > > - TableB -> 7000 rows;
> > > - query: (select ID from TableB where Val = 123 )   -> returns 10 rows
> > > IDs: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
> > > - all indexes are present and rebuild:
> > >   TableA (ID_Data)
> > >   TableB (Val, ID)
> > > - statistics are updated with full scan
>
> > > SQL Server chooses to execute subquery as correlated subquery -
> > > executes subquery once for every row from TableA - why?
>
> > > Different query:
> > > SELECT Col1, Col2
> > > FROM TableA
> > > WHERE ID_Data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
>
> > > executes as expected (index seek TableA.ID_Data)
>
> > > In some places I had to use query plan hints in order to achive
> > > satisfactory perfomance - but in my opinion SQL Svr shouldn't behave
> > > that way..
> > > This is just an easy example, but the problem exists in many SP's /
> > > queries much more complicated - the reason is always the same:
> > > subquery executed in a bad way.
>
> > > I've noticed same SP executed in query analyzer - chooses proper
> > > execution plan, but
> > > called from web page (ASP.NET 2.0) - chooses bad plan described above.
> > > I know SQL Svr stores different SP execution plans, but even with all
> > > the same settings:
> > > set quoted_identifier on
> > > set implicit_transactions off
> > > set cursor_close_on_commit off
> > > set ansi_warnings on
> > > set ansi_padding on
> > > set ansi_nulls on
> > > set concat_null_yields_null on
> > > set language us_english
> > > set dateformat mdy
> > > set datefirst 7
>
> > > SP executes differently in both environment..
>
> > > My question is, why SQL Server is choosing such an inefficient plan
> > > for constant subquery ?
>
> > > Regards:
> > >  Marcin Zacharzewski- Ukryj cytowany tekst -
>
> > - Poka¿ cytowany tekst -- Ukryj cytowany tekst -
>
> - Pokaż cytowany tekst -
Author
3 Mar 2007 9:58 PM
Gert-Jan Strik
Marcin, see inline

mzacharzew***@linksoft.pl wrote:
>
> Hi All,
> I have a great problem with SQL Server 2000 (Ent. Ed., sp4,
> 8.00.2187) performance.
> I don't know exactly when discribed behaviour of SQL Svr has started
> (maybe after sp, fix?), but I am sure, this performed well in the
> past.
> The server has started to choose bad plans when executing stored
> procedures.
> Example:
>  SELECT Col1, Col2
>  FROM TableA
>  WHERE ID_Data IN ( select ID from TableB where Val = 123 )

Does your procedure really have the literal 123 in it, or are you using
a parameter (or variable) instead?

> - TableA -> 6000000 rows;
> - TableB -> 7000 rows;
> - query: (select ID from TableB where Val = 123 )   -> returns 10 rows
> IDs: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
> - all indexes are present and rebuild:
>    TableA (ID_Data)
>    TableB (Val, ID)
> - statistics are updated with full scan
>
> SQL Server chooses to execute subquery as correlated subquery -
> executes subquery once for every row from TableA - why?

Because it estimates that the alternative query plan is more expensive.

This would be the case if the optimizer estimates that the subquery will
return many rows and that looking up all those matching rows in TableA
would require more I/O than a table scan.

Another situation when this might happen is when the data distribution
of ID_Data in TableA is skewed, and the optimizer estimates that one of
the high frequency values might be returned by the subquery.

> Different query:
>  SELECT Col1, Col2
>  FROM TableA
>  WHERE ID_Data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
>
> executes as expected (index seek TableA.ID_Data)
>
> In some places I had to use query plan hints in order to achive
> satisfactory perfomance - but in my opinion SQL Svr shouldn't behave
> that way..

At compile time, the optimizer does not which IDs the subquery will
return. If you are using a parameter or variable then it doesn't even
know how many rows the subquery will select. If the values of ID_Data in
TableA are not evenly distributed, then the optimizer has no choice but
to select the conservative strategy of the table scan.

Show quoteHide quote
> This is just an easy example, but the problem exists in many SP's /
> queries much more complicated - the reason is always the same:
> subquery executed in a bad way.
>
> I've noticed same SP executed in query analyzer - chooses proper
> execution plan, but
> called from web page (ASP.NET 2.0) - chooses bad plan described above.
> I know SQL Svr stores different SP execution plans, but even with all
> the same settings:
> set quoted_identifier on
> set implicit_transactions off
> set cursor_close_on_commit off
> set ansi_warnings on
> set ansi_padding on
> set ansi_nulls on
> set concat_null_yields_null on
> set language us_english
> set dateformat mdy
> set datefirst 7
>
> SP executes differently in both environment..
>
> My question is, why SQL Server is choosing such an inefficient plan
> for constant subquery ?

See above. By the way: what is a constant subquery? The optimizer cannot
assume any table data to be constant.

HTH,
Gert-Jan

Show quoteHide quote
> Regards:
>   Marcin Zacharzewski
Author
4 Mar 2007 12:42 AM
mzacharzewski
Thank you for your answer, see inline for my comments:



On 3 Mar, 22:58, Gert-Jan Strik <s...@toomuchspamalready.nl> wrote:
Show quoteHide quote
> Marcin, see inline
>
> mzacharzew***@linksoft.pl wrote:
>
> > Hi All,
> > I have a great problem with SQL Server 2000 (Ent. Ed., sp4,
> > 8.00.2187) performance.
> > I don't know exactly when discribed behaviour of SQL Svr has started
> > (maybe after sp, fix?), but I am sure, this performed well in the
> > past.
> > The server has started to choose bad plans when executing stored
> > procedures.
> > Example:
> >  SELECT Col1, Col2
> >  FROM TableA
> >  WHERE ID_Data IN ( select ID from TableB where Val = 123 )
>
> Does your procedure really have the literal 123 in it, or are you using
> a parameter (or variable) instead?
Yes, there was a literal 711 exactly:)

>
> > - TableA -> 6000000 rows;
> > - TableB -> 7000 rows;
> > - query: (select ID from TableB where Val = 123 )   -> returns 10 rows
> > IDs: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
> > - all indexes are present and rebuild:
> >    TableA (ID_Data)
> >    TableB (Val, ID)
> > - statistics are updated with full scan
>
> > SQL Server chooses to execute subquery as correlated subquery -
> > executes subquery once for every row from TableA - why?
> Because it estimates that the alternative query plan is more expensive.

But TableA is so big, and TableB is so small - is there any other more
expensive plan then correlated subquery??? That's why I was so
surprised!

Show quoteHide quote
> This would be the case if the optimizer estimates that the subquery will
> return many rows and that looking up all those matching rows in TableA
> would require more I/O than a table scan.
> Another situation when this might happen is when the data distribution
> of ID_Data in TableA is skewed, and the optimizer estimates that one of
> the high frequency values might be returned by the subquery.
>
> > Different query:
> >  SELECT Col1, Col2
> >  FROM TableA
> >  WHERE ID_Data IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
>
> > executes as expected (index seek TableA.ID_Data)
>
> > In some places I had to use query plan hints in order to achive
> > satisfactory perfomance - but in my opinion SQL Svr shouldn't behave
> > that way..
>
> At compile time, the optimizer does not which IDs the subquery will
> return. If you are using a parameter or variable then it doesn't even
> know how many rows the subquery will select. If the values of ID_Data in
> TableA are not evenly distributed, then the optimizer has no choice but
> to select the conservative strategy of the table scan.
Yes, but compiler knows that TableB has 7000 rows, and TableA has
6000000 rows, also knows the statistical data about Val data in TableB
- for any TableB returns 10 rows in average and 100 rows maximum...
So these parameters suggest that seeking index on TableB for each one
record from TableA is the worst solution.

Show quoteHide quote
> > This is just an easy example, but the problem exists in many SP's /
> > queries much more complicated - the reason is always the same:
> > subquery executed in a bad way.
>
> > I've noticed same SP executed in query analyzer - chooses proper
> > execution plan, but
> > called from web page (ASP.NET 2.0) - chooses bad plan described above.
> > I know SQL Svr stores different SP execution plans, but even with all
> > the same settings:
> > set quoted_identifier on
> > set implicit_transactions off
> > set cursor_close_on_commit off
> > set ansi_warnings on
> > set ansi_padding on
> > set ansi_nulls on
> > set concat_null_yields_null on
> > set language us_english
> > set dateformat mdy
> > set datefirst 7
>
> > SP executes differently in both environment..
>
> > My question is, why SQL Server is choosing such an inefficient plan
> > for constant subquery ?
>
> See above. By the way: what is a constant subquery? The optimizer cannot
> assume any table data to be constant.
Constant subquery - I meant such a query should be estimated once
during execution (like deterministic functions).

Here is the whole exact query:
------- START
SELECT ID_Survey_Question, Answer_Date, ID_Outlet, Txt
FROM Surveys_Answers with (nolock)
WHERE ( Param = @Param AND answer_date > dateadd( mm, -2, getdate()))
OR ( @Param like '4%' AND  id_survey_question in (select
id_survey_question from surveys_questions where id_survey = 711))
------- END

@Param -> varchar(6) parameter always 6 digits egz: '123456', '432123'
and so on

Table Survey_Answers indexes:
Clustered: Answer_date desc, ID_Outlet, ID_Survey_Question
Nonclustered: ID_Survey_Question, ID_Outlet, Param

Table Surveys_Questions indexes:
Clustered: ID_Survey_Question
Nonclustered: ID_Survey

Maybe that "OR" condition, or @Param change anything? - but shouldn't
in my opinion - why then in some situations (the same connection
properties) the query executes a proper plan ?

Regards:
    Marcin Zacharzewski
Author
4 Mar 2007 11:47 PM
Gert-Jan Strik
Marcin,

Show quoteHide quote
> > > - TableA -> 6000000 rows;
> > > - TableB -> 7000 rows;
> > > - query: (select ID from TableB where Val = 123 )   -> returns 10 rows
> > > IDs: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
> > > - all indexes are present and rebuild:
> > >    TableA (ID_Data)
> > >    TableB (Val, ID)
> >
> > At compile time, the optimizer does not which IDs the subquery will
> > return. If you are using a parameter or variable then it doesn't even
> > know how many rows the subquery will select. If the values of ID_Data in
> > TableA are not evenly distributed, then the optimizer has no choice but
> > to select the conservative strategy of the table scan.
>
> Yes, but compiler knows that TableB has 7000 rows, and TableA has
> 6000000 rows, also knows the statistical data about Val data in TableB
> - for any TableB returns 10 rows in average and 100 rows maximum...
> So these parameters suggest that seeking index on TableB for each one
> record from TableA is the worst solution.

Not necessarily.

Before I explain this, some assumptions: I will assume that the index on
TableA(ID_Data) is not clustered. Otherwise, you most probably would see
a different query plan. That is also why I will assume that this index
does not cover all the columns of TableA that are required for this
query. If these assumptions are incorrect, then please let us know.

With up to date statistics, the optimizer knows that the select from
TableB will return approximately 10 rows, but it does not know which
values. It also knows how many distinct values of ID_Data there are in
TableA, and has information about the number of rows per ID_Data (or
group of ID_Data values). Most likely, each ID_Data value has many rows
in TableA, anywhere from a 1000 to several 1000 of rows. If the
selection from TableB returns 10 rows, and the nonclustered index on
TableA is used, then the storage engine would need 10 index seeks /
partial scans on the nonclustered index, which could easily result in
10,000 entries that have to be looked up in the base table. That are
10,000 potential I/O's. That is probably less than what a table scan
would require.

But if the optimizer estimates 20 rows as the result of the subquery
(which could happen because the statistics of TableB values are probably
in range groups), and estimates 2500 rows in TableA per ID_Data, then
that might result in 50,000 I/O's which might be more than what a table
scan would require. I am guessing here, because I don't know how many
pages are really reserved for TableA, I don't know the estimate for the
TableB subquery and I don't know the exact worst case estimate for
ID_Data values.

Show quoteHide quote
> Here is the whole exact query:
> ------- START
> SELECT ID_Survey_Question, Answer_Date, ID_Outlet, Txt
> FROM Surveys_Answers with (nolock)
> WHERE ( Param = @Param AND answer_date > dateadd( mm, -2, getdate()))
> OR ( @Param like '4%' AND  id_survey_question in (select
> id_survey_question from surveys_questions where id_survey = 711))
> ------- END

> Table Survey_Answers indexes:
> Clustered: Answer_date desc, ID_Outlet, ID_Survey_Question
> Nonclustered: ID_Survey_Question, ID_Outlet, Param

> Maybe that "OR" condition, or @Param change anything? - but shouldn't
> in my opinion - why then in some situations (the same connection
> properties) the query executes a proper plan ?

Yeah, it does change things. It means the query cannot be satisfied if
only mathing id_survey_questions are looked up, because of the other
predicates. For the predicate "Param = @Param", there is no suitable
index. For the predicate "answer_date > dateadd(mm, -2, getdate())" the
optimizer might want to use the clustered index.

Now both these predicates are also covered by the nonclustered index,
but that would require a full nonclustered index scan, followed by
bookmark lookups.

If there are statistics of column Param (which are probably
autocreated), then the value of @Param is very important too, because
assuming Parameter Sniffing is used (if you don't know what that is, you
should look it up) this would give the optimizer a good estimate of the
maximum number of qualifying TableA rows (for the first two predicates).

The query may seems simple, but for the optimizer there are many
variables in this: the two sets of predicates and an "OR", the
nondeterministic getdate() and parameter sniffing for @Param.

HTH,
Gert-Jan

Bookmark and Share

Post Thread options