Understanding collate Chinese_PRC_CI

Understanding collate Chinese_PRC_CI_AS

Last Update:2018-12-05 Source: Internet

Author: User

Tags sorts

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

We often encounter such statements when creating a table, for example, password nvarchar (10)Collate Chinese_prc_ci_asNull, what does it mean? Let's take a look at the following: first,CollateIs a clause that can be applied to database definitions or column definitions to define sorting rules, or to string expressions to apply projection of sorting rules. The syntax isCollateCollation_name: = {windows_collation_name} | {SQL _collation_name} ParameterCollate_ Name is the name of the sorting rule applied to expressions, column definitions, or database definitions. collation_name can only be the specified Windows_collation_name or SQL _collation_name. windows_collation_name is the name of a Windows sorting rule. see Windows sorting rule name. SQL _collation_name is the name of the SQL sorting rule. see SQL sorting rule name. next, we will briefly introduce the sorting rules: What is a sorting rule? The MS is described as follows: "In Microsoft SQL Server 2000, the physical storage of strings is controlled by sorting rules. the sorting rule specifies the bit mode of each character and the rules used for saving and comparing characters. "execute the following statement in the query analyzer to obtain all the sorting rules supported by SQL SERVER. select * from: fn_helpcollations () the sorting rule name consists of two parts. The first half is the character set supported by this sorting rule. for example, the first half of Chinese_PRC_CS_AI_WS indicates the UNICODE Character Set and the Chinese_PRC _ pointer sorts UNICODE in simplified Chinese characters. the second half of the sorting rule is the suffix meaning: _ BIN binary sorting _ CI (CS) is case sensitive, CI is not differentiated, CS is differentiated _ AI (AS) is accent sensitive, AI is not distinguished, AS differentiation _ KI (KS) is differentiated by Kana type, KI is not distinguished, KS differentiation _ WI (WS) is differentiated by width WI is not distinguished, WS is case sensitive: select this option if you want to compare uppercase and lowercase letters. accent differentiation: select this option if you want to treat the comparison as different from the accent and non-accent letters. if this option is selected, letters with different accents are considered unequal. kana differentiation: select this option if you want to treat Katakana and katakana as different Japanese syllables. width differentiation: If you want to compare half-width and full-width characters, please select this option to use the sorting rule features to calculate Chinese Character strokes and obtain the first letter of the pinyin SQL SERVER sorting rules are usually not used a lot, maybe many beginners are still unfamiliar, however, when you encounter a cross-database multi-Table connection query in an SQL SERVER database, if the default Character Set of the two databases is different, the system will return the following error: "the sorting rule conflict of the equal to operation cannot be solved. "1. error Analysis: this error is caused by inconsistent sorting rules. For example, create table # t1 (name varchar (20)CollateAlbanian_CI_AI_WS, value int) create table # t2 (name varchar (20)CollateAfter the Chinese_PRC_CI_AI_WS, value int) Table is created, run the join query: select * from # t1 A inner join # t2 B on. name = B. in this case, the following error occurs: SERVER: Message 446, level 16, status 9, and line 1 cannot resolve the sorting rule conflict for the equal to operation. to eliminate this error, the simplest way is to specify its sorting rules when the table is connected, so that the error will no longer appear. write the statement as follows: select * from # t1 A inner join # t2 B on. name = B. nameCollateChinese_PRC_CI_AI_WS II. Introduction to sorting rules: What is a sorting rule? The MS is described as follows: "In Microsoft SQL Server 2000, the physical storage of strings is controlled by sorting rules. the sorting rule specifies the bit mode of each character and the rules used for saving and comparing characters. "execute the following statement in the query analyzer to obtain all the sorting rules supported by SQL SERVER. select * from: fn_helpcollations () the sorting rule name consists of two parts. The first half is the character set supported by this sorting rule. for example, the first half of Chinese_PRC_CS_AI_WS indicates the UNICODE Character Set and the Chinese_PRC _ pointer sorts UNICODE in simplified Chinese characters. the second half of the sorting rule is the suffix meaning: _ BIN binary sorting _ CI (CS) is case sensitive, CI is not differentiated, CS is differentiated _ AI (AS) is accent sensitive, AI is not distinguished, AS differentiation _ KI (KS) is differentiated by Kana type, KI is not distinguished, KS differentiation _ WI (WS) is differentiated by width WI is not distinguished, WS is case sensitive: select this option if you want to compare uppercase and lowercase letters. accent differentiation: select this option if you want to treat the comparison as different from the accent and non-accent letters. if this option is selected, letters with different accents are considered unequal. kana differentiation: select this option if you want to treat Katakana and katakana as different Japanese syllables. width differentiation: If you want to compare half-width and full-width characters, select this option. application of sorting rules: SQL server provides a large number of sorting rules for WINDOWS and SQL SERVER, but its application is often ignored by developers. in fact, it is of great use in practice. example 1: sort the content of the table NAME column in pinyin: create table # t (id int, name varchar (20) insert # t select 1, 'Union all select 2, 'guo' union all select 3, 'people' union all select 4, 'A' select * from # t order by nameCollateChinese_PRC_CS_AS_KS_WS drop table # t/* result: id name -------------------------- 4 2 3 1 */Example 2 of a Chinese population: sort the content of the table NAME column by name strokes: create table # t (id int, name varchar (20) insert # t select 1, '3' union all select 2, 'B' union all select 3, '2' union all select 4, '1' union all select 5, '10' select * from # t order by nameCollateChinese_PRC_Stroke_CS_AS_KS_WS drop table # t/* result: id name ----------- ------------------- 4 1 2 B 3 2 5 10 1 3 */4. in practice, the extended SQL SERVER Chinese Character sorting rules applied by sorting rules can be sorted by pinyin, strokes, etc. So how can we use this function to solve some difficulties of Chinese characters? Now, for example, to calculate a Chinese character stroke based on the characteristics of sorting rules, we have to prepare for the calculation. We know that for WINDOWS, there are currently 20902 Chinese characters included in UNICODE. the UNICODE value of the simplified GBK code starts from 19968. first, we first use the SQLSERVER method to get all Chinese characters without a dictionary. We can simply use SQL statements to get: select top 20902 code = identity (int, 19968,1) into # t from syscolumns, syscolumns B then uses the following statement to obtain all Chinese characters sorted by UNICODE values: select code, nchar (code) as CNWord from # t. Then, we use the SELECT statement, sort it by strokes. select code, nchar (code) as CNWord from # t order by nchar (code)CollateChinese_PRC_Stroke_CS_AS_KS_WS, code result: code ---------- 19968 20008 20022 20031 20032 20033 20057 20058 20059 20101 .......... CNWord ----- from the above results, we can clearly see that the code for a Chinese character ranges from 19968 to 20101, from small to large, however, when the first word "ding" of two Chinese characters is entered, the CODE is 19969, and it is not started again in order. with this result, we can use SQL statements to obtain the first or last Chinese character of each stroke. the following statement is used to obtain the last Chinese Character: create table # t1 (id int identity, code int, cnword nvarchar (2) insert # t1 (code, cnword) select code, nchar (code) as CNWord from # t order by nchar (code)CollateChinese_PRC_Stroke_CS_AS_KS_WS, code select. cnword from # t1 A left join # t1 B on. id = B. id-1 and. code where B. code is null order by. id to get 36 Chinese characters, each of which is the last Chinese character after sorting by Chinese_PRC_Stroke_CS_AS_KS_WS sorting rules: ma Fenglong Qi turtle tooth: "marker" is the last word after sorting all Chinese characters. "marker" is the last word after sorting all two Chinese characters ...... and so on. at the same time, it was found that the strokes after the 33rd Chinese characters "33 strokes" were messy and incorrect. but it doesn't matter. There are only four Chinese characters more than the "yellow" strokes. We manually add 35 strokes, 36 strokes, and 39 strokes, 64-bit TAB_HZBH: create table tab_hzbh (id int identity, cnword nchar (1) -- insert the first 33 Chinese characters into tab_hzbh select top 33. cnword from # t1 A left join # t1 B on. id = B. id-1 and. code where B. code is null order by. id -- add the last four Chinese characters set identity_insert tab_hzbh on go insert tab_hzbh (id, cnword) select 35, N 'weight' union all select 36, N 'weight' union all select 39, N 'authorization' union all select 64, N 'authorization' go set identity_insert tab_hzbh off go so far, we can get the result. For example, we want to get the stroke of the Chinese character "country: declare @ a nchar (1) set @ a = 'country 'select top 1 id from tab_hzbh where cnword >=@CollateChinese_PRC_Stroke_CS_AS_KS_WS order by id ---------- 8, this function disconnects all the temporary tables and fixed tables created above. For the convenience of common use and code transfer, the content of the tab_hzbh table is written in the statement, and then the total strokes of a string of Chinese characters entered by the user are calculated: create function fun_getbh (@ str nvarchar (4000) returns int as begin declare @ word nchar (1), @ n int set @ n = 0 while len (@ str)> 0 begin set @ word = left (@ str, 1) -- if the stroke is not a Chinese character, set @ n = @ n + (case when unicode (@ word) between 19968 and 19968 + 20901 then (select top 1 id from (select 1 as id, N 'region' as word union all select 2, N 'region' union all select 3, N 'map' union all select 4, N 'wind' union all select 5, N 'long' union all select 6, N 'qy' union all select 7, N 'turtle 'Union all select 8, n' tooth 'Union all select 9, n' 'Union all select 10, n' 'union all select 11, N ''union all select 12, N'' union all select 13, N ''union all select 14, N ''union all select 15, N 'ignore' union all select 16, N 'long' union all select 17, N 'hangzhou' union all select 18, N 'hangzhou' union all select 19, N 'pon' union all select 20, N 'hangzhou' union all select 21, N 'hangzhou' union all select 22, N 'hangzhou' union all select 23, N 'hour' union all select 24, N 'hangzhou' union all select 25, N 'hangzhou' union all select 26, N 'hangzhou' union all select 27, N 'hour' union all select 28, N 'hangzhou' union all select 29, N 'hangzhou' union all select 30, N 'hangzhou' union all select 31, N 'hour' union all select 32, N 'hour' union all select 33, N 'hour' union all select 35, N 'hour' union all select 36, N 'hangzhou' union all select 39, N 'hangzhou' union all select 64, N 'hangzhou') T where word> = @ wordCollateChinese_PRC_Stroke_CS_AS_KS_WS order by id ASC) else 0 end) set @ str = right (@ str, len (@ str)-1) end return @ n end -- function call instance: select dbo. fun_getbh ('People's Republic of China'), dbo. fun_getbh ('People's Republic of China') execution result: the total number of strokes is 39 and 46, which can be simplified or traditional Chinese. of course, you can also change the Chinese characters and strokes in the above "union all" to a fixed table and create the clustered index in the Chinese character column. The column sorting rule is set to Chinese_PRC_Stroke_CS_AS_KS_WS, which is faster. if you are using a BIG5 code operating system, you must generate another Chinese character. however, remember that these Chinese characters are selected using SQL statements, not manually input, but not dictionary query, because the Xinhua Dictionary is different from the UNICODE Character Set after all, the dictionary query result is incorrect. you can use the sorting rule to obtain the first letter of Chinese Character and Pinyin and use the same total number of strokes. You can also write a function to obtain the first letter of Chinese Character and pinyin. example: create function fun_getPY (@ str nvarchar (4000) returns nvarchar (4000) as begin declare @ word nchar (1), @ PY nvarchar (4000) set @ PY = ''while len (@ str)> 0 begin set @ word = left (@ str, 1) -- if it is not a Chinese character, returns the original character set @ PY = @ PY + (case when unicode (@ word) between 19968 and 19968 + 20901 then (select top 1 PY from (select 'A' as PY, N 'hour' as word union all select 'B', n' 'union all select 'C', n' error 'Union all select 'D ', N 'hour' union all select 'E', N 'hangzhou' union all select 'F', N 'hangzhou' union all select 'G ', N 'hour' union all select 'h', N 'hangzhou' union all select 'J', N 'hangzhou' union all select 'k ', N 'hour' union all select 'l', N 'hour' union all select 'M', N 'hour' union all select 'n ', N 'hour' union all select 'O', N 'hour' union all select 'P', N 'exposure 'Union all select 'Q ', N 'hour' union all select 'R', N 'region' union all select's ', N 'region' union all select 'T ', N 'weight' union all select 'w', N 'weight' union all select 'x', N 'weight' union all select 'y ', N 'yun 'union all select 'Z', n' Yun') T where word> = @ wordCollateChinese_PRC_CS_AS_KS_WS order by py asc) else @ word end) set @ str = right (@ str, len (@ str)-1) end return @ PY end -- function call instance: select dbo. fun_getPY ('People's Republic of China'), dbo. the results of fun_getPY ('People's Republic of China' are: ZHRMGHG

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More