You may not know the character of the comparison in the "secret"

Source: Internet
Author: User

Original: You may not know the character of the comparison in the "secret"

Sometimes, a simple character comparison, you may also be confused. Why do you say that? Take a look at the following example (the code is not posted because the page is later found to not support the display of both characters). Guess what the result would be? Is it 1 or 0? Before you answer this question, please continue looking down. Create a database of several different collations (see the database name).

Figure-1: Comparison under SQL_Latin1_General_CP1_CI_AS collation

Figure-2: Comparison under Chinese_prc_ci_as collation

In SQL_Latin1_General_CP1_CI_AS or chinese_prc_ci_as two different sort rules, these two characters should be equal!! It's obvious that it's not equal! Using the Unicode function conversion is also invalid. What's going on?! Here is the answer to this question. The original is in English, I have translated into Chinese (English proficiency is limited, mistakes are unavoidable, it is best to see the original Kazakhstan).

Attached to the original address:

Http://blogs.msdn.com/b/sqlprogrammability/archive/2006/04/07/570397.aspx

Undefined character sorting problem in Unicode or Windows collation

When you compare two Unicode strings in a non-binary collation, SQL Server uses a library that is basically the same as the Windows API CompareStringW. This library defines a weight value (Weight) for each identified character, and compares the characters with this weight value. However, not all code points have been defined in the sort library. The reasons why they may be undefined are:

    1. Code points are not defined in the Unicode standard.

    2. Code points are defined in the Unicode standard but are not defined in Windows. This takes time and effort to define the sort of language semantics for new characters. Windows teams often need to work with standard organizations and/or regional project managers at the time to define collations for new characters. They add support for new characters in each version and try to catch up. Some characters may already have a font definition, so they may appear normally, but the comparison is still not defined. such as NCHAR (13144)-NCHAR (13174).

    3. Code points are defined in Windows but are not defined in SQL Server.

The Windows NLS team has decided that undefined characters are ignored when compared, in part because there is no good way to compare undefined characters with other defined characters. SQL Server inherits this semantics. This has indeed led to some confusing behavior. Take a look at the following example.

Declare @undefined_char1 nvarchar(Ten),@undefined_char2 nvarchar(Ten)Set @undefined_char1 = nchar(0x0000)Set @undefined_char2 = nchar(13144)Select 'undefine characters compare equal to empty string'where @undefined_char1 = "'Select 'All undefine characters compare equal'where @undefined_char1 = @undefined_char2Go
Create TableT (cnvarchar(Ten))GoCreate Unique IndexIt onT (c)Go--First insert succeeds, but second inserts fails with duplicate key error.InsertTValues(nchar(0x0000))InsertTValues(nchar(13144))Go

As you can see, because all the undefined character comparisons are equal, they cause duplicate key errors. Similarly, if you create a table that does not have a defined character as the table name, and then try to create another table that does not have a defined character as the table name, the second table will fail because the table name repeats, even if the code points for the two undefined characters are different. This can also lead to confusing results in a built-in string match (function) such as CHARINDEX, PATINDEX, or like.

Although these results may seem confusing, the basic rules are simple. That is, comparisons that do not define characters and strings are ignored. Once you understand and remember this rule, this behavior is easy to understand.

As long as there are undefined characters the parameters will be ignored. Since this is the behavior on the Windows platform, there is no absolute better way to sort them, and right-compatible, we want to maintain this behavior.

If your application uses these undefined characters and treats them as regular characters, you can use binary collation (binary collation). In binary collations, comparisons are entirely based on code points, not language rules, and therefore there are no defined and undefined concepts. Finish

After reading this blog, you should understand what's going on. The reason: SQL Server is considered equal when compared to characters that are not defined. If the actual application is to be treated as a regular character, a binary comparison can be made. Because the binary comparison is purely based on the character code point comparison. So it's not difficult to solve the initial problem as follows:

Figure-3: Comparison under Latin1_General_BIN collation

It is not difficult to deduce that, in a database under binary collation, the two characters are not equal by default.

Figure-4: Comparison under Latin1_General_BIN collation

FIGURE-5: Unable to insert two rows of records under SQL_Latin1_General_CP1_CI_AS collation

Figure-6: The two rows of records were successfully inserted under the Latin1_General_BIN collation.

You may not know the character of the comparison in the "secret"

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.