Home All Groups Group Topic Archive Search About

collation for multiple language in a column

Author
23 Jun 2009 9:43 AM
Boyan
I need to store different languages of text in a single column as nvarchar.
When selecting text from this column targeting a particular language, I wish
I get the results sort by meaning dictionary order. One way is to append a
collate clause with the appropriate collation.
  order by collate Chinese_PRC_CI_A
or when  filtering data
   column_name = value collate Chinese_PRC_CI_A

If there is an index on this column, I guess the index is physically
arranged based on the default column collation (latin general). In this case,
if I specify a collate clause when filtering or sorting, will the index be
used? Or in general, will using a collate clause bring performance penalty?

Author
25 Jun 2009 10:18 PM
Erland Sommarskog
Boyan (Bo***@discussions.microsoft.com) writes:
> I need to store different languages of text in a single column as
> nvarchar. When selecting text from this column targeting a particular
> language, I wish I get the results sort by meaning dictionary order. One
> way is to append a collate clause with the appropriate collation.
>   order by collate Chinese_PRC_CI_A
> or when  filtering data
>    column_name = value collate Chinese_PRC_CI_A
>
> If there is an index on this column, I guess the index is physically
> arranged based on the default column collation (latin general). In this
> case, if I specify a collate clause when filtering or sorting, will the
> index be used? Or in general, will using a collate clause bring
> performance penalty?

If the index is sorted according to one collation, and you want data
according to another, the index is not useful for what you want, so
yes, there will be a performance penalty.

If you need to sort by different collations, and need to have indexes
to support all sorts, you can add computed columns and then index these.

   chinese_data AS data COLLATE Chinesse_PRC_CI_AS


--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Are all your drivers up to date? click for free checkup

Author
26 Jun 2009 1:41 AM
Boyan
Hi Erland, thanks for you reply. The computed column probably won't work for
me, because it forces me to add one column for one lanuage. If that is the
case, I can divide the column into multiple columns in the first place.
Regarding to the performance penalty, do you have any data revealing the
difference in real world example?

Show quoteHide quote
"Erland Sommarskog" wrote:

> Boyan (Bo***@discussions.microsoft.com) writes:
> > I need to store different languages of text in a single column as
> > nvarchar. When selecting text from this column targeting a particular
> > language, I wish I get the results sort by meaning dictionary order. One
> > way is to append a collate clause with the appropriate collation.
> >   order by collate Chinese_PRC_CI_A
> > or when  filtering data
> >    column_name = value collate Chinese_PRC_CI_A
> >
> > If there is an index on this column, I guess the index is physically
> > arranged based on the default column collation (latin general). In this
> > case, if I specify a collate clause when filtering or sorting, will the
> > index be used? Or in general, will using a collate clause bring
> > performance penalty?

> If the index is sorted according to one collation, and you want data
> according to another, the index is not useful for what you want, so
> yes, there will be a performance penalty.
>
> If you need to sort by different collations, and need to have indexes
> to support all sorts, you can add computed columns and then index these.
>
>    chinese_data AS data COLLATE Chinesse_PRC_CI_AS
>
>
> --
> Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se
>
> Links for SQL Server Books Online:
> SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
> SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
> SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
>
>
Author
27 Jun 2009 9:02 AM
Erland Sommarskog
Boyan (Bo***@discussions.microsoft.com) writes:
> Hi Erland, thanks for you reply. The computed column probably won't work
> for me, because it forces me to add one column for one lanuage. If that
> is the case, I can divide the column into multiple columns in the first
> place. Regarding to the performance penalty, do you have any data
> revealing the difference in real world example?

To find out the performance impact, you need to run your own benchmarks.
The effect can be miniscule, and it can be magnitudes.

Say that you have a query that goes:

   SELECT ...
   FROM   ...
   ORDER  BY col COLLATE Chinese_PRC_CI_AS

The query returns 100 rows, and without the collate clause, it would have
been possible to use the index on col to satisfy the sorting. With the
COLLATE clause, SQL Server needs to add a Sort operator to the plan. With a
100 rows that's not a big deal.

But consider this query:

   SELECT TOP 100 ...
   FROM   ...
   ORDER  BY col


With the TOP clause, the query would return millions of rows. The query
is such that the TOP 100 restriction makes it very easy for the optimizer
to find 100, and there is no need to read the entire table. But if you
add a COLLATE clause, that is exactly what the optimizer will need to
do, because the index is not useful anymore.

As I've understood, you are looking at forcing the collation in the output.
In this case, you may get away with not indexing for each language. But
if you also want searches to respect collation rules, then you will
probably have no other choice.


--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

Bookmark and Share