Understanding Collate Chinese_prc_ci_as

Source: Internet
Author: User

we are CREATE TABLE often encounter such a statement, for example: password nvarchar (ten) collate chinese_prc_ci_as null , what does it mean? Take a look at the following:

First, Collate is a clause that can be applied to a database definition or column definition to define a collation, or to a string expression to apply a collation projection.

syntax is Collate collation_name

Collation_name ::={windows_collation_name}| {Sql_collation_name}

Parameters Collate_name is the name of the collation that is applied to the expression, column definition, or database definition. collation_name can be just the specified windows_collation_name or sql_collation_name.

Windows_collation_name is The collation name of the Windows collation. See Windows collation name. sql_collation_name is The collation name of the SQL collation. See SQL collation name.

Here's a quick introduction Sorting rules:

What sort of rules? ms " microsoft SQL Server The physical storage of strings is controlled by collations. Collations specify the bit patterns that represent each character and the rules that are used to save the and compare characters. " Execute the following statement inside the Query Analyzer, You can get sql Span style= "Font-size:medium" >server all collations supported.

    SELECT * FROM:: Fn_helpcollations ()

The collation name consists of two parts, and the first half refers to the character set supported by this collation. such as:  Chinese_prc_cs_ai_wsfirst half: meansUNICODECharacter Set,chinese_prc_pointers to Chinese Simplified charactersUNICODEsorting rules. the second half of the collation is the suffix Meaning:   _bin Binary Sort   _ci (CS) is case-sensitive,CIdo not differentiate,CSdifferentiate  _ai (AS) is accent-sensitive,AIdo not differentiate, asdifferentiate  _ki (KS) whether to differentiate kana types, KIdo not differentiate,KSdifferentiate_wi (WS) whether width is distinguished WIdo not differentiate,WSdifferentiate

Case Sensitive : Select this option if you want comparisons to treat uppercase and lowercase letters as unequal. Accent Sensitivity : Select this option if you want the comparison to treat accented and non-accented letters as unequal. If you select this option, The comparison also treats letters with different accents as unequal. Kana distinction : Select this option if you want the comparison to treat katakana and hiragana Japanese syllables as unequal. Width Difference : Select this option if you want the comparison to treat half-width characters and full-width characters as unequal

using collation features to calculate Chinese character strokes and obtain pinyin initials

SQL Server collation usually not a lot of use, perhaps a lot of beginners are unfamiliar, but there is a mistake people should often encounter: SQL Server database, in the cross-Library multi-table connection query, if the two databases default character set, the system will return such an error: "The collation violation for the equal to operation could not be resolved. ”

I. Error ANALYSIS: This error is due to inconsistent collation, we do a test, for example: Create TABLE #t1 (name varchar () collate albanian_ci_ai_ws, value int)

CREATE TABLE #t2 (name varchar (COLLATE) CHINESE_PRC_CI_AI_WS, value int)

After the table is built, execute the connection query:

SELECT * from #t1 A inner joins #t2 B on A.name=b.name

In this way, the error occurs:

Server: Msg 446, Level 16, State 9, Line 1 cannot resolve collation violation for equal to operation. The simplest way to exclude this error is to specify the collation of the table when it joins, so that the error no longer occurs. Statement to write this:

SELECT * from #t1 A inner joins #t2 B on A.name=b.name collate Chinese_prc_ci_ai_ws

Two. Introduction to collation Rules:

What sort of rules? MS is described as: "in Microsoft SQL Server 2000, the physical storage of strings is controlled by collations. Collations specify the bit patterns that represent each character and the rules that are used to store and compare characters. "Execute the following statement inside the Query Analyzer to get all the collations supported by SQL Server.

SELECT * FROM:: Fn_helpcollations ()

The collation name consists of two parts, and the first half refers to the character set supported by this collation. such as: Chinese_prc_cs_ai_ws first half: Refers to the Unicode character set, chinese_prc_ the pointer to the mainland simplified Unicode collation.       The second half of the collation is the suffix meaning: _bin binary sort _ci (CS) is case sensitive, CI is not differentiated, CS differentiates _ai (AS) whether accent-sensitive, AI does not differentiate, as differentiated _ki (KS) Whether the kana type is differentiated, KI does not differentiate, KS differentiates _WI (WS) Whether the width of the WI is differentiated, WS-Differentiated

Case Sensitivity: Select this option if you want the comparison to treat uppercase and lowercase letters as unequal. Accent Sensitivity: Select this option if you want the comparison to treat accented and non-accented letters as unequal. If you select this option, the comparison also treats letters with different accents as unequal. Kana distinction: Select this option if you want the comparison to treat katakana and hiragana Japanese syllables as unequal. Width difference: Select this option if you want the comparison to treat half-width characters and full-width characters as unequal

Three. Application of collations: SQL Server provides a large number of Windows-and SQL-server-specific collations, but its application is often overlooked by developers. In fact, it is of great use in practice.

Example 1: Make the contents of the table Name column sorted by phonetic alphabet:

CREATE TABLE #t (ID int,name varchar) Insert #t Select 1, ' Union ALL ' select 2, ' Country ' union ALL select 3, ' Man ' union All SE Lect 4, ' ah '

SELECT * from #t ORDER by name collate chinese_prc_cs_as_ks_ws DROP TABLE #t/* Result: ID Name----- --------------------------4, 2 countries, 3 people, 1 medium * *

Example 2: Make the contents of the table Name column sorted by last name stroke:

CREATE TABLE #t (ID int,name varchar (20))

Insert #t Select 1, ' Three ' union ALL select 2, ' B ' union ALL select 3, ' Two ' union ALL select 4, ' One ' union ALL select 5, ' Ten ' Select * from #t ORDER by name collate chinese_prc_stroke_cs_as_ks_ws DROP TABLE #t/* Result: ID Name--- ----------------------------412 B 3,263 */

Four. The extended SQL server Kanji collation applied in practice can be sorted by pinyin, stroke, etc. how do we use this function to deal with some of the problems of Chinese characters? I now give an example:

Calculation of Chinese character strokes with collation characteristics

To calculate the Chinese character strokes, we have to do the preparatory work, we know that the Windows multi-national characters, Unicode currently contains a total of 20,902 kanji.   Simplified GBK Code Chinese Unicode value starts from 19968. First of all, we use SQL Server method to get all the Chinese characters, without a dictionary, we can simply use the syntax to get:

Select Top 20902 code=identity (int,19968,1) to #t from syscolumns A,syscolumns b

With the following statement, we get all the kanji, which are sorted by Unicode value:

Select Code,nchar (code) as Cnword from #t

We then use the SELECT statement to sort it by stroke.

Select Code,nchar (code) as Cnword from #t ORDER by NCHAR (code) COLLATE Chinese_prc_stroke_cs_as_ks_ws,code

Result: Code cnword-----------------19,968,120,008 丨 20022, 20031 丿 20032 Cookbook 20033 Xiha 20057 b 20058 乚 20059 bereaved 20101 sanctioned 19969 D .....

From the above results, we can clearly see that a stroke of Chinese characters, code is from 19968 to 20101, from small to large rows, but to two characters of the first word "ding", code 19969, not in order to start again. With this result, we can easily use SQL statements to get the first or last Chinese character for each stroke. The following sentence is used to get the last kanji:

CREATE table #t1 (id int identity,code int,cnword nvarchar (2))

Insert #t1 (Code,cnword) Select Code,nchar (code) as Cnword from #t ORDER by NCHAR (code) COLLATE Chinese_prc_stroke_cs_as_ Ks_ws,code

Select A.cnword from #t1 A left join #t1 B on a.id=b.id-1 and a.code where b.code are null ORDER by a.id

Get 36 Chinese characters, each of which is the last Chinese character sorted by chinese_prc_stroke_cs_as_ks_ws sorting rule:

Sanctioned 阝 Ma Finglong qi turtle teeth poison ageing niches 龂 nightmares gum looser dragon Yue 龎 bond Qiuyan 龡 kotewall 龝 齹 龣 calls 齈 龞 麷 鸞 麣 龖 龗 齾 snuffle Tantrayana

As can be seen above: "Sanctioned" is the last word after all the Chinese characters are sorted, "阝" is the last word after all two Chinese characters are sorted ...   Wait a minute. But also found that from the 33rd Chinese character "龗 (33 pen)" After the stroke is a bit messy, incorrect. But it doesn't matter, more than "龗" stroke more than four characters, we add: 齾 35 pen, Snuffle 36 pen, 靐 39 pen, Tantrayana 64 pen

Chinese Character Stroke Table (TAB_HZBH): CREATE TABLE TAB_HZBH (ID int identity,cnword nchar (1))--insert first 33 Chinese characters inserts TAB_HZBH select top A.cnwor D from #t1 A left join #t1 B on a.id=b.id-1 and a.code where b.code are null ORDER by a.id--plus last four kanji set IDENTITY_INSERT t AB_HZBH on Go Insert tab_hzbh (id,cnword) Select 35,n ' 齾 ' union ALL Select 36,n ' snuffle ' union ALL Select 39,n ' 靐 ' UNION ALL s Elect 64,n ' Tantrayana ' Go Set identity_insert tab_hzbh off go

So far, we can get the results, like we want the strokes of the Chinese character "country":

DECLARE @a nchar (1) Set @a= ' state ' select top 1 ID from TAB_HZBH where cnword>[email protected] collate Chinese_prc_strok E_CS_AS_KS_WS ORDER BY ID

ID-----------8 (results: Chinese characters "country" stroke number is 8)

above all the preparation process, just to write the following function, this function is set aside all the temporary table and fixed table above, for general purpose and code transfer convenience, the table TAB_HZBH content written in the statement, The user then computes the total stroke of a string of characters:

Create function fun_getbh (@str nvarchar (4000)) returns INT as BEGIN DECLARE @word nchar (1), @n int set @n=0 while Len (@str) >0 begin set @word =left (@str, 1)--If non-kanji, stroke 0 Set @[email protected]+ (case when Unicode (@word) between 19968 and 1 9968+20901 then (select top 1 ID from (select 1 as Id,n ' sanctioned ' as Word UNION ALL select 2,n ' 阝 ' union ALL select 3,n ' ma ' union All select 4,n ' Wind ' union ALL select 5,n ' Dragon ' UNION ALL select 6,n ' Align ' union ALL select 7,n ' Turtle ' union ALL Select 8,n ' Teeth ' union al L Select 9,n ' Poison ' union ALL select 10,n ' ageing ' UNION ALL select 11,n ' niches ' union ALL select 12,n ' 龂 ' union ALL Select 13,n ' Nightmares ' union All select 14,n ' Gum ' union ALL select 15,n ' Looser ' union ALL select 16,n ' Dragon ' UNION ALL Select 17,n ' Yue ' union ALL select 18,n ' 龎 ' Uni On all select 19,n ' Bond ' all select 20,n ' Qiuyan ' union ALL Select 21,n ' 龡 ' union ALL Select 22,n ' kotewall ' union ALL Select 23,n ' 龝 ' UNION ALL select 24,n ' 齹 ' union ALL Select 25,n ' 龣 ' union ALL Select 26,n ' calls ' union ALL select 27,n ' 齈 ' union ALL Select 28,n ' 龞 ' UNION ALL Select 29,n' 麷 ' UNION ALL Select 30,n ' 鸞 ' union ALL Select 31,n ' 麣 ' union ALL Select 32,n ' 龖 ' union ALL Select 33,n ' 龗 ' UNION ALL select 3 5,n ' 齾 ' union ALL Select 36,n ' snuffle ' union ALL Select 39,n ' 靐 ' union ALL Select 64,n ' Tantrayana ') T where word>[email protected] Collate CHINESE_PRC_STROKE_CS_AS_KS_WS ORDER BY ID ASC) ELSE 0 end) Set @str =right (@str, Len (@str)-1) End return @n end

--Function Call instance: Select DBO.FUN_GETBH (' People's Republic of China '), DBO.FUN_GETBH (' Chinese People's Republic ') execution result: The total number of strokes is 39 and 46 respectively, simple and traditional.

Of course, you can also put the above "UNION all" within the fixed table of Chinese characters and strokes, in the Chinese character column built clustered INDEX, the column collation is set to: Chinese_prc_stroke_cs_as_ks_ws this faster. If you use the BIG5 code of the operating system, you have to generate additional characters, the same way. But one thing to keep in mind: These characters are chosen by SQL statements, not by hand, not by dictionary, because the Xinhua dictionary is different from the Unicode character set, and the results of the dictionary search will be incorrect.

To get the initials of Chinese characters by using the character of sorting rules

With the same total number of strokes, we can also write a function to find the first letter of Chinese pinyin. As follows:

Create function fun_getpy (@str nvarchar (4000)) returns nvarchar (4000) as BEGIN DECLARE @word nchar (1), @PY nvarchar (4000) s ET @PY = ' while Len (@str) >0 begin set @word =left (@str, 1)--if non-kanji characters, return the original character set @[email protected]+ (case when Unicod E (@word) between 19968 and 19968+20901 then (select top 1 PY from (SELECT ' A ' as Py,n ' 驁 ' as Word union ALL select ' B ', N ' book ' UNION ALL SELECT ' C ', n ' wrong ' union ALL Select ' D ', n ' 鵽 ' union ALL Select ' E ', n ' 樲 ' union ALL Select ' F ', n ' 鰒 ' UNION ALL Select ' G ', n ' hiker ' union ALL Select ' H ', n ' inceѕt ' union ALL Select ' J ', n ' 攈 ' union ALL Select ' K ', n ' 穒 ' union ALL Select ' L ', n ' 鱳 ' union al L Select ' M ', n ' temperature ' union ALL select ' N ', n ' 桛 ' union ALL Select ' O ', n ' 漚 ' union ALL Select ' P ', n ' exposure ' UNION ALL SELECT ' Q ', n ' 囕 ' UNION ALL SELECT ' R ', n ' 鶸 ' union ALL Select ' S ', n ' 蜶 ' union ALL Select ' T ', n ' 籜 ' union ALL Select ' W ', n ' clamoring ' union ALL SELECT ' X ', n ' 鑂 ' union All-select ' Y ', N ' Wan Leng ' union ALL Select ' Z ', n ' off ') T where word>[email protected] collate Chinese_prc_c S_as_ks_ws ORDER by PY asC) Else @word end) Set @str =right (@str, Len (@str)-1) End return @PY end

--Function Call instance: Select Dbo.fun_getpy (' People's Republic of China '), dbo.fun_getpy (' The People's Republic '), the results are: ZHRMGHG

Understanding Collate Chinese_prc_ci_as

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.