Sort out the SQL Server collation

Collations are not something most SQL Server users deal with every day, and many beginners may be unfamiliar with them. There is one error, however, that almost everyone runs into sooner or later: when a query joins tables across databases and the two databases have different default collations, the system returns an error like this:

"Cannot resolve collation conflict for equal to operation."

I. Error analysis:
This error is caused by inconsistent collations. Let's run a test:
CREATE TABLE #t1 (
    name varchar(20) COLLATE Albanian_CI_AI_WS,
    value int)

CREATE TABLE #t2 (
    name varchar(20) COLLATE Chinese_PRC_CI_AI_WS,
    value int)

Once the tables are created, run the join query:

SELECT * FROM #t1 A INNER JOIN #t2 B ON A.name = B.name

This produces the error:

Server: Msg 446, Level 16, State 9, Line 1
Cannot resolve collation conflict for equal to operation.

The simplest way to avoid this error is to specify a collation explicitly in the join condition, so that the conflict no longer arises. The statement can be written like this:

SELECT *
FROM #t1 A INNER JOIN #t2 B
ON A.name = B.name COLLATE Chinese_PRC_CI_AI_WS
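
When the query should not hard-code one side's collation, another option is to coerce both sides to the current database's default collation. The following is a minimal sketch of that idea, assuming two temporary tables shaped like the ones in the test above and a fresh session in which they do not exist yet. The sample rows and output column aliases are made up for illustration.

```sql
-- Sketch: resolve the conflict by coercing both sides to the current
-- database's default collation instead of naming a specific one.
CREATE TABLE #t1 (name varchar(20) COLLATE Albanian_CI_AI_WS, value int);
CREATE TABLE #t2 (name varchar(20) COLLATE Chinese_PRC_CI_AI_WS, value int);

-- Illustrative rows (not from the original article)
INSERT INTO #t1 (name, value) VALUES ('abc', 1);
INSERT INTO #t2 (name, value) VALUES ('abc', 2);

SELECT A.name, A.value AS value1, B.value AS value2
FROM #t1 AS A
INNER JOIN #t2 AS B
    ON A.name COLLATE DATABASE_DEFAULT = B.name COLLATE DATABASE_DEFAULT;

DROP TABLE #t1;
DROP TABLE #t2;
```

Note that applying COLLATE to a column inside a join condition may keep the optimizer from using an index on that column, so for large tables it can be worth aligning the column collations themselves.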

II. Introduction to collations:

What is a collation? Microsoft describes it this way: "In Microsoft SQL Server 2000, the physical storage of character strings is controlled by collations. A collation specifies the bit patterns that represent each character and the rules by which characters are sorted and compared."
Run the following statement in Query Analyzer to list all the collations that SQL Server supports:

SELECT * FROM ::fn_helpcollations()
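
The full list is long, so in practice it helps to filter it. The sketch below assumes SQL Server 2005 or later, where the function is exposed as sys.fn_helpcollations(); it narrows the list to the Chinese_PRC family and also shows which collation is currently in effect for the server and the current database.

```sql
-- Sketch: narrow the collation list to the Chinese_PRC family.
-- [_] escapes the underscore, which is otherwise a LIKE wildcard.
SELECT name, description
FROM sys.fn_helpcollations()
WHERE name LIKE N'Chinese[_]PRC%'
ORDER BY name;

-- Collation currently in effect at the server and database level
SELECT
    SERVERPROPERTY('Collation') AS server_collation,
    DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS database_collation;
```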

A collation name is made up of two parts. The first part identifies the character set (language) that the collation supports.
For example:
Chinese_PRC_CS_AI_WS
First part: Chinese_PRC_ refers to the Unicode sorting rules for Simplified Chinese as used in mainland China.
The second part is a set of suffixes with the following meanings:
_BIN        binary sort
_CI (_CS)   case sensitivity: CI is case-insensitive, CS is case-sensitive
_AI (_AS)   accent sensitivity: AI is accent-insensitive, AS is accent-sensitive
_KS         kana type sensitivity: present means kana-sensitive, omitted means kana-insensitive
_WS         width sensitivity: present means width-sensitive, omitted means width-insensitive

Case-sensitive: select this option if you want comparisons to treat uppercase and lowercase letters as unequal.
Accent-sensitive: select this option if you want comparisons to treat accented and unaccented letters as unequal. With this option selected, comparisons also treat letters with different accents as unequal.
Kana-sensitive: select this option if you want comparisons to treat katakana and hiragana Japanese syllables as unequal.
Width-sensitive: select this option if you want comparisons to treat half-width and full-width characters as unequal.
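
The effect of each suffix is easiest to see by comparing the same pair of strings under two collations that differ only in that suffix. This is a small sketch, not taken from the original article; the collation names used are standard Windows collations, and the sample strings are chosen for illustration.

```sql
-- Sketch: each column compares one pair of strings and reports
-- 'equal' or 'different' under the collation named in the expression.
SELECT
    -- lowercase versus uppercase
    CASE WHEN N'abc' = N'ABC' COLLATE Chinese_PRC_CI_AS
         THEN 'equal' ELSE 'different' END AS case_insensitive,
    CASE WHEN N'abc' = N'ABC' COLLATE Chinese_PRC_CS_AS
         THEN 'equal' ELSE 'different' END AS case_sensitive,
    -- half-width 'A' versus full-width 'Ａ'
    CASE WHEN N'A' = N'Ａ' COLLATE Chinese_PRC_CI_AS
         THEN 'equal' ELSE 'different' END AS width_insensitive,
    CASE WHEN N'A' = N'Ａ' COLLATE Chinese_PRC_CI_AS_WS
         THEN 'equal' ELSE 'different' END AS width_sensitive,
    -- hiragana 'あ' versus katakana 'ア'
    CASE WHEN N'あ' = N'ア' COLLATE Japanese_CI_AS
         THEN 'equal' ELSE 'different' END AS kana_insensitive,
    CASE WHEN N'あ' = N'ア' COLLATE Japanese_CI_AS_KS
         THEN 'equal' ELSE 'different' END AS kana_sensitive;
```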

III. Applying collations:
SQL Server provides a large number of Windows collations and SQL Server-specific collations, but developers often overlook them. In practice they are very useful.

Example 1: Sort the contents of the table's name column by pinyin:

CREATE TABLE #t (id int, name varchar(20))
INSERT #t SELECT 1, '中'
UNION ALL SELECT 2, '国'
UNION ALL SELECT 3, '人'
UNION ALL SELECT 4, '阿'

SELECT * FROM #t ORDER BY name COLLATE Chinese_PRC_CS_AS_KS_WS
DROP TABLE #t
/* Results:
id          name
----------- --------------------
4           阿
2           国
3           人
1           中
*/

Example 2: Sort the contents of the table's name column by surname stroke count:

CREATE TABLE #t (id int, name varchar(20))

INSERT #t SELECT 1, '三'
UNION ALL SELECT 2, '乙'
UNION ALL SELECT 3, '二'
UNION ALL SELECT 4, '一'
UNION ALL SELECT 5, '十'
SELECT * FROM #t ORDER BY name COLLATE Chinese_PRC_Stroke_CS_AS_KS_WS
DROP TABLE #t
/* Results:
id          name
----------- --------------------
4           一
2           乙
3           二
5           十
1           三
*/
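
If a column should always sort by stroke count, the collation can also be declared on the column itself rather than in every query. The following sketch illustrates that variation with five sample characters. The table name #names is hypothetical, and it uses nvarchar with N'' literals (an assumption that differs from the varchar columns in the examples above) so that it does not depend on the database's code page.

```sql
-- Sketch: declare the collation on the column so that a plain ORDER BY
-- sorts by stroke count without a COLLATE clause in the query.
CREATE TABLE #names (
    id int,
    name nvarchar(20) COLLATE Chinese_PRC_Stroke_CS_AS_KS_WS
);

INSERT INTO #names (id, name)
SELECT 1, N'三'
UNION ALL SELECT 2, N'乙'
UNION ALL SELECT 3, N'二'
UNION ALL SELECT 4, N'一'
UNION ALL SELECT 5, N'十';

-- Uses the column's own (stroke) collation
SELECT id, name FROM #names ORDER BY name;

-- A per-query COLLATE still overrides it, here switching to pinyin order
SELECT id, name FROM #names ORDER BY name COLLATE Chinese_PRC_CS_AS_KS_WS;

DROP TABLE #names;
```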

IV. Extending collation use in practice
SQL Server's Chinese collations can sort by pinyin, by stroke count, and so on. How can we take advantage of this to solve some problems with Chinese characters? Here is an example:

Calculating the stroke count of Chinese characters using collation behavior

To calculate stroke counts we first need some preparation. As we know, Windows handles the Chinese characters of multiple regions through Unicode, which here covers 20,902 Chinese characters in total. The Unicode values of the Simplified Chinese (GBK) characters start at 19968.
First, we use SQL Server itself to generate all of these characters. No dictionary is needed; a simple SQL statement is enough:

SELECT TOP 20902 code = IDENTITY(int, 19968, 1) INTO #t FROM syscolumns a, syscolumns b

The following statement then returns all of the Chinese characters, sorted by Unicode value:

SELECT code, NCHAR(code) AS cnword FROM #t

Next, we use a SELECT statement to sort them by stroke count:

SELECT code, NCHAR(code) AS cnword
FROM #t
ORDER BY NCHAR(code) COLLATE Chinese_PRC_Stroke_CS_AS_KS_WS, code

Results:
code        cnword
----------- ------
19968       一
20008       丨
20022       丶
20031       丿
20032       乀
20033       乁
20057       乙
20058       乚
20059       乛
20101       亅
19969       丁
..........

From these results we can clearly see that for the one-stroke characters the code runs from 19968 to 20101 in ascending order, but at "丁", the first of the two-stroke characters, the code is 19969: the sequence does not continue but starts over. With this result, we can easily use SQL to get the first or last character of each stroke-count group.
The following statements get the last character of each group:

CREATE TABLE #t1 (id int IDENTITY, code int, cnword nvarchar(2))

INSERT #t1 (code, cnword)
SELECT code, NCHAR(code) AS cnword FROM #t
ORDER BY NCHAR(code) COLLATE Chinese_PRC_Stroke_CS_AS_KS_WS, code

SELECT A.cnword
FROM #t1 A
LEFT JOIN #t1 B ON A.id = B.id - 1 AND A.code < B.code
WHERE B.code IS NULL
ORDER BY A.id

This returns 36 characters, each one being the last character of its stroke-count group under the Chinese_PRC_Stroke_CS_AS_KS_WS collation:

亅阝马风龙齐龟齿鸩龀龛龂龆龈龊龍龠龎龐龑龡龢龝齹龣龥齈龞麷鸞麣龖龗齾齉龘

As we can see, "亅" is the last of all one-stroke characters after sorting, "阝" is the last of all two-stroke characters, and so on.
We also find, however, that after the 33rd character "龗" (33 strokes) the stroke order becomes somewhat inconsistent. That is not a problem: only four characters have more strokes than "龗", so we add them by hand: 齾 (35 strokes), 齉 (36 strokes), 靐 (39 strokes), and 龘 (64 strokes).

Build the Chinese character stroke table (TAB_HZBH):
CREATE TABLE TAB_HZBH (id int IDENTITY, cnword nchar(1))
-- Insert the first 33 characters first
INSERT TAB_HZBH
SELECT TOP 33 A.cnword
FROM #t1 A
LEFT JOIN #t1 B ON A.id = B.id - 1 AND A.code < B.code
WHERE B.code IS NULL
ORDER BY A.id
-- Then add the last four characters
SET IDENTITY_INSERT TAB_HZBH ON
GO
INSERT TAB_HZBH (id, cnword)
SELECT 35, N'齾'
UNION ALL SELECT 36, N'齉'
UNION ALL SELECT 39, N'靐'
UNION ALL SELECT 64, N'龘'
GO
SET IDENTITY_INSERT TAB_HZBH OFF
GO

At this point we can already get results. For example, to find the stroke count of the character "国":

DECLARE @a nchar(1)
SET @a = N'国'
SELECT TOP 1 id
FROM TAB_HZBH
WHERE cnword >= @a COLLATE Chinese_PRC_Stroke_CS_AS_KS_WS
ORDER BY id

id
-----------
8
(Result: the character "国" has 8 strokes)

All the preparation above was only so that we can write the following function. The function sets aside all of the temporary and permanent tables created above: for generality and to make the code easy to move around, the contents of TAB_HZBH are written directly into the statement. It then calculates the total stroke count of a string of Chinese characters supplied by the user:

CREATE FUNCTION fun_getbh (@str nvarchar(4000))
RETURNS int
AS
BEGIN
    DECLARE @word nchar(1), @n int
    SET @n = 0
    WHILE LEN(@str) > 0
    BEGIN
        SET @word = LEFT(@str, 1)
        -- Non-Chinese characters count as 0 strokes
        SET @n = @n + (CASE WHEN UNICODE(@word) BETWEEN 19968 AND 19968 + 20901
            THEN (SELECT TOP 1 id FROM (
                SELECT 1 AS id, N'亅' AS word
                UNION ALL SELECT 2, N'阝'
                UNION ALL SELECT 3, N'马'
                UNION ALL SELECT 4, N'风'
                UNION ALL SELECT 5, N'龙'
                UNION ALL SELECT 6, N'齐'
                UNION ALL SELECT 7, N'龟'
                UNION ALL SELECT 8, N'齿'
                UNION ALL SELECT 9, N'鸩'
                UNION ALL SELECT 10, N'龀'
                UNION ALL SELECT 11, N'龛'
                UNION ALL SELECT 12, N'龂'
                UNION ALL SELECT 13, N'龆'
                UNION ALL SELECT 14, N'龈'
                UNION ALL SELECT 15, N'龊'
                UNION ALL SELECT 16, N'龍'
                UNION ALL SELECT 17, N'龠'
                UNION ALL SELECT 18, N'龎'
                UNION ALL SELECT 19, N'龐'
                UNION ALL SELECT 20, N'龑'
                UNION ALL SELECT 21, N'龡'
                UNION ALL SELECT 22, N'龢'
                UNION ALL SELECT 23, N'龝'
                UNION ALL SELECT 24, N'齹'
                UNION ALL SELECT 25, N'龣'
                UNION ALL SELECT 26, N'龥'
                UNION ALL SELECT 27, N'齈'
                UNION ALL SELECT 28, N'龞'
                UNION ALL SELECT 29, N'麷'
                UNION ALL SELECT 30, N'鸞'
                UNION ALL SELECT 31, N'麣'
                UNION ALL SELECT 32, N'龖'
                UNION ALL SELECT 33, N'龗'
                UNION ALL SELECT 35, N'齾'
                UNION ALL SELECT 36, N'齉'
                UNION ALL SELECT 39, N'靐'
                UNION ALL SELECT 64, N'龘'
            ) T
            WHERE word >= @word COLLATE Chinese_PRC_Stroke_CS_AS_KS_WS
            ORDER BY id ASC) ELSE 0 END)
        SET @str = RIGHT(@str, LEN(@str) - 1)
    END
    RETURN @n
END

-- Example call:
SELECT dbo.fun_getbh(N'中华人民共和国'), dbo.fun_getbh(N'中華人民共和國')

Result: the total stroke counts are 39 and 46 respectively. Both simplified and traditional characters work.

Of course, you can also move the characters and stroke counts from the "UNION ALL" list above into a permanent table, create a clustered index on the character column, and set the column's collation to:
Chinese_PRC_Stroke_CS_AS_KS_WS
This is faster. If you are using an operating system with the BIG5 code page, you will have to generate the characters separately, using the same method.
One thing to keep in mind: these characters were produced by a SQL SELECT statement. They were not typed in by hand, and they were certainly not looked up in a dictionary, because the Xinhua Dictionary does not necessarily agree with the Unicode character set, and results taken from a dictionary would be incorrect.
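
The permanent-table variant described above could look roughly like the following. This is a sketch under stated assumptions rather than the author's own code: the table name tab_hzbh_indexed and the index name are hypothetical, and the rows are copied from the TAB_HZBH table built earlier in this article, which must already exist.

```sql
-- Sketch: keep the stroke boundaries in a permanent, indexed table
-- whose character column carries the stroke collation.
CREATE TABLE tab_hzbh_indexed (
    id int NOT NULL,   -- stroke count
    cnword nchar(1) COLLATE Chinese_PRC_Stroke_CS_AS_KS_WS NOT NULL
);

CREATE CLUSTERED INDEX ix_tab_hzbh_indexed_cnword
    ON tab_hzbh_indexed (cnword);

-- Copy the boundary characters from the TAB_HZBH table built earlier
INSERT INTO tab_hzbh_indexed (id, cnword)
SELECT id, cnword FROM TAB_HZBH;

-- Lookup: the comparison uses the column's collation,
-- so no COLLATE clause is needed in the query.
DECLARE @word nchar(1);
SET @word = N'国';

SELECT TOP 1 id
FROM tab_hzbh_indexed
WHERE cnword >= @word
ORDER BY id;
```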

　　
Getting the first letter of a character's pinyin using collation behavior

Using the same approach as for the total stroke count, we can also write a function that returns the first pinyin letter of each Chinese character:

CREATE FUNCTION fun_getpy (@str nvarchar(4000))
RETURNS nvarchar(4000)
AS
BEGIN
    DECLARE @word nchar(1), @PY nvarchar(4000)
    SET @PY = ''
    WHILE LEN(@str) > 0
    BEGIN
        SET @word = LEFT(@str, 1)
        -- Non-Chinese characters are returned unchanged
        SET @PY = @PY + (CASE WHEN UNICODE(@word) BETWEEN 19968 AND 19968 + 20901
            THEN (SELECT TOP 1 PY FROM (
                SELECT 'A' AS PY, N'驁' AS word
                UNION ALL SELECT 'B', N'簿'
                UNION ALL SELECT 'C', N'錯'
                UNION ALL SELECT 'D', N'鵽'
                UNION ALL SELECT 'E', N'樲'
                UNION ALL SELECT 'F', N'鰒'
                UNION ALL SELECT 'G', N'腂'
                UNION ALL SELECT 'H', N'夻'
                UNION ALL SELECT 'J', N'攈'
                UNION ALL SELECT 'K', N'穒'
                UNION ALL SELECT 'L', N'鱳'
                UNION ALL SELECT 'M', N'旀'
                UNION ALL SELECT 'N', N'桛'
                UNION ALL SELECT 'O', N'漚'
                UNION ALL SELECT 'P', N'曝'
                UNION ALL SELECT 'Q', N'囕'
                UNION ALL SELECT 'R', N'鶸'
                UNION ALL SELECT 'S', N'蜶'
                UNION ALL SELECT 'T', N'籜'
                UNION ALL SELECT 'W', N'鶩'
                UNION ALL SELECT 'X', N'鑂'
                UNION ALL SELECT 'Y', N'韻'
                UNION ALL SELECT 'Z', N'咗'
            ) T
            WHERE word >= @word COLLATE Chinese_PRC_CS_AS_KS_WS
            ORDER BY PY ASC) ELSE @word END)
        SET @str = RIGHT(@str, LEN(@str) - 1)
    END
    RETURN @PY
END

-- Example call:
SELECT dbo.fun_getpy(N'中华人民共和国'), dbo.fun_getpy(N'中華人民共和國')
Both return: ZHRMGHG

If you are interested, the same method can be extended into a function that returns the full pinyin of Chinese characters, and even the tone of each syllable, although full pinyin has far more categories. For full pinyin it is best to use a lookup table: searching some 20,000 characters is fast, and a lookup table can take full advantage of the table's indexes.
Collations have many other clever uses. Everyone is welcome to discuss them.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
