Understanding COLLATE Chinese_PRC_CI_AS NULL

Last Update:2018-12-06 Source: Internet

Author: User

When creating a table, we often come across a column definition such as password nvarchar(10) COLLATE Chinese_PRC_CI_AS NULL. What does it mean? Let's take a look.

First, COLLATE is a clause that can be applied to a database definition or a column definition to define the collation (the "sorting rule"), or to a character string expression to apply a collation cast.

The syntax is COLLATE collation_name.

collation_name ::= { Windows_collation_name } | { SQL_collation_name }

The collation_name parameter is the name of the collation to be applied to the expression, column definition, or database definition. collation_name can only be a specified Windows_collation_name or SQL_collation_name.

Windows_collation_name is the collation name for a Windows collation. See Windows Collation Names.
SQL_collation_name is the collation name for a SQL collation. See SQL Collation Names.
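
As a minimal sketch of the two places the clause can appear, the following T-SQL declares a column with an explicit collation and then overrides it for a single comparison. The table name #users and the sample value are illustrative assumptions, not part of the original example.

```sql
-- Hypothetical temporary table: the column collation is fixed when the table is created.
CREATE TABLE #users (
    password nvarchar(10) COLLATE Chinese_PRC_CI_AS NULL
);

INSERT INTO #users (password) VALUES (N'Secret');

-- The column is case-insensitive (CI), so this literal matches the stored value.
SELECT password FROM #users WHERE password = N'secret';

-- COLLATE on the expression makes this one comparison case-sensitive (CS),
-- so the same literal no longer matches.
SELECT password FROM #users WHERE password = N'secret' COLLATE Chinese_PRC_CS_AS;

DROP TABLE #users;
```

An explicit COLLATE on either operand takes precedence over the collation the column was declared with, which is why the second query behaves differently.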

The following describes collations:

What is a collation? "In Microsoft SQL Server 2000, the physical storage of character strings is controlled by collations. A collation specifies the bit patterns that represent each character and the rules by which characters are sorted and compared."
Run the following statement in Query Analyzer to list all the collations supported by SQL Server.

SELECT * FROM ::fn_helpcollations()
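
The full list is long, so it can help to filter it. The sketch below narrows the output to the Simplified Chinese collations; it assumes a version of SQL Server later than 2000, where the function can be called without the leading double colon.

```sql
-- List only the Chinese_PRC collations together with their descriptions.
-- [_] matches a literal underscore; a bare _ would match any single character.
SELECT name, description
FROM fn_helpcollations()
WHERE name LIKE N'Chinese[_]PRC%'
ORDER BY name;
```

The description column spells out the sensitivity options of each collation in words, which is a convenient way to check how a name should be read.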

A collation name consists of two parts. The first part identifies the character set that the collation supports.
For example:
Chinese_PRC_CS_AI_WS
First part: refers to the Unicode character set; Chinese_PRC_ means that Unicode data is sorted according to the rules for Simplified Chinese (PRC).
The second part of the collation name is a set of suffixes with the following meanings:
_BIN: binary sort
_CI (CS): case sensitivity; CI is case-insensitive, CS is case-sensitive
_AI (AS): accent sensitivity; AI is accent-insensitive, AS is accent-sensitive
_KI (KS): kana sensitivity; KI is kana-insensitive, KS is kana-sensitive
_WI (WS): width sensitivity; WI is width-insensitive, WS is width-sensitive
In actual collation names, kana and width insensitivity are indicated by omitting _KS and _WS.

Case-sensitive: select this option if you want comparisons to treat uppercase and lowercase letters as unequal.
Accent-sensitive: select this option if you want comparisons to treat accented and unaccented letters as unequal. If this option is selected, comparisons also treat letters with different accents as unequal.
Kana-sensitive: select this option if you want comparisons to treat hiragana and katakana Japanese syllables as unequal.
Width-sensitive: select this option if you want comparisons to treat half-width and full-width characters as unequal.
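
The four options can be tried out directly. The following is a small sketch, with sample characters chosen purely for illustration, that compares the same pair of strings under an insensitive and a sensitive collation for each option.

```sql
-- Each pair of columns compares the same two strings under a collation that
-- ignores a property and under one that distinguishes it.
SELECT
    CASE WHEN N'a' = N'A' COLLATE Chinese_PRC_CI_AS    THEN 'equal' ELSE 'different' END AS case_insensitive,
    CASE WHEN N'a' = N'A' COLLATE Chinese_PRC_CS_AS    THEN 'equal' ELSE 'different' END AS case_sensitive,
    CASE WHEN N'e' = N'é' COLLATE Chinese_PRC_CI_AI    THEN 'equal' ELSE 'different' END AS accent_insensitive,
    CASE WHEN N'e' = N'é' COLLATE Chinese_PRC_CI_AS    THEN 'equal' ELSE 'different' END AS accent_sensitive,
    CASE WHEN N'あ' = N'ア' COLLATE Chinese_PRC_CI_AS    THEN 'equal' ELSE 'different' END AS kana_insensitive,
    CASE WHEN N'あ' = N'ア' COLLATE Chinese_PRC_CI_AS_KS THEN 'equal' ELSE 'different' END AS kana_sensitive,
    CASE WHEN N'A' = N'Ａ' COLLATE Chinese_PRC_CI_AS    THEN 'equal' ELSE 'different' END AS width_insensitive,
    CASE WHEN N'A' = N'Ａ' COLLATE Chinese_PRC_CI_AS_WS THEN 'equal' ELSE 'different' END AS width_sensitive;
```

The insensitive variant of each pair is expected to report the strings as equal and the sensitive variant as different.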

Using collation features to calculate Chinese character strokes and obtain the first letter of pinyin

SQL Server collations are not used very often, and many beginners may be unfamiliar with them, but there is one error that most people have run into: when a query joins tables from different databases and the default character sets of those databases differ, the system returns the following error:

"Cannot resolve collation conflict for equal to operation."

I. Error analysis:
This error is caused by inconsistent collations. Let's run a test, for example:
CREATE TABLE #t1 (
name varchar(20) COLLATE Albanian_CI_AI_WS,
value int)

CREATE TABLE #t2 (
name varchar(20) COLLATE Chinese_PRC_CI_AI_WS,
value int)

After the tables are created, run the join query:

SELECT * FROM #t1 a INNER JOIN #t2 b ON a.name = b.name

This produces the error:

Server: Msg 446, Level 16, State 9, Line 1
Cannot resolve collation conflict for equal to operation.
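
When the conflict comes from database defaults rather than from explicit column collations, it can help to look at those defaults first. The query below is a sketch using the built-in DATABASEPROPERTYEX and SERVERPROPERTY functions; comparing against tempdb is an illustrative choice, since temporary tables are created there.

```sql
-- Character columns declared without a COLLATE clause take the default collation
-- of the database they are created in (tempdb for temporary tables).
SELECT
    DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS current_db_collation,
    DATABASEPROPERTYEX('tempdb', 'Collation')  AS tempdb_collation,
    SERVERPROPERTY('Collation')                AS server_collation;
```

If the first two values differ, joins between temporary tables and tables in the current database can hit the same conflict.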
To eliminate this error, the simplest way is to specify the collation explicitly in the join condition, so that the error no longer appears. The statement is written as follows:

SELECT *
FROM #t1 a INNER JOIN #t2 b
ON a.name = b.name COLLATE Chinese_PRC_CI_AI_WS
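
A variation on the same fix, sketched here under the assumption that no particular collation is required for the comparison, is to cast both sides to the current database default with COLLATE DATABASE_DEFAULT. The table names are hypothetical stand-ins for the two test tables.

```sql
-- Hypothetical tables with deliberately different column collations.
CREATE TABLE #albanian_names (name varchar(20) COLLATE Albanian_CI_AI_WS, value int);
CREATE TABLE #chinese_names  (name varchar(20) COLLATE Chinese_PRC_CI_AI_WS, value int);

-- Casting both sides to the database default removes the conflict
-- without hard-coding a collation name in the query.
SELECT *
FROM #albanian_names a
INNER JOIN #chinese_names b
    ON a.name COLLATE DATABASE_DEFAULT = b.name COLLATE DATABASE_DEFAULT;

DROP TABLE #albanian_names;
DROP TABLE #chinese_names;
```

The trade-off is that the comparison then follows whatever rules the database default defines, so naming a specific collation remains the better choice when case or accent behaviour matters.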

II. Introduction to collations:

SELECT * FROM ::fn_helpcollations()

III. Applications of collations:
SQL Server provides a large number of Windows and SQL Server collations, but their applications are often overlooked by developers. In fact, they are very useful in practice.

Example 1: sort the contents of the name column in pinyin order:

CREATE TABLE #t (id int, name nvarchar(20))
INSERT #t SELECT 1, N'中'
UNION ALL SELECT 2, N'国'
UNION ALL SELECT 3, N'人'
UNION ALL SELECT 4, N'阿'

SELECT * FROM #t ORDER BY name COLLATE Chinese_PRC_CS_AS_KS_WS
DROP TABLE #t

Example 2: sort the contents of the name column by surname stroke count:

CREATE TABLE #t (id int, name nvarchar(20))

INSERT #t SELECT 1, N'三'
UNION ALL SELECT 2, N'乙'
UNION ALL SELECT 3, N'二'
UNION ALL SELECT 4, N'一'
UNION ALL SELECT 5, N'十'
SELECT * FROM #t ORDER BY name COLLATE Chinese_PRC_Stroke_CS_AS_KS_WS
DROP TABLE #t

IV. Extending the use of collations in practice
SQL Server's Chinese collations can sort by pinyin, by stroke count, and so on. How can we use this capability to solve problems involving Chinese characters? Here is an example:

Calculating Chinese character strokes using collation features

To calculate Chinese character strokes, we first need some preparation. Windows supports many Chinese characters; the Unicode range used here contains 20,902 of them, and the Unicode values of the Simplified Chinese GBK characters start at 19968 (U+4E00). First, we use SQL Server to obtain all of these characters. No dictionary is needed; a simple SQL statement is enough:

SELECT TOP 20902 code = IDENTITY(int, 19968, 1) INTO #t FROM syscolumns a, syscolumns b

The following statement returns all the Chinese characters in Unicode order:

SELECT code, NCHAR(code) AS cnword FROM #t
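
The range arithmetic used above can be checked directly with the built-in UNICODE and NCHAR functions. This is only a sanity-check sketch; the column aliases are illustrative.

```sql
-- 19968 is U+4E00, the first character in the range; adding 20,902 - 1
-- gives 40869 (U+9FA5), the last one.
SELECT
    UNICODE(N'一')        AS first_code,  -- 19968
    NCHAR(19968)          AS first_char,  -- 一
    19968 + 20902 - 1     AS last_code,   -- 40869
    NCHAR(19968 + 20901)  AS last_char;   -- 龥
```

The same bounds appear later in the article as the BETWEEN 19968 AND 19968 + 20901 test that decides whether a character is treated as Chinese.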

Then we use a SELECT statement to sort them by stroke count:

SELECT code, NCHAR(code) AS cnword
FROM #t
ORDER BY NCHAR(code) COLLATE Chinese_PRC_Stroke_CS_AS_KS_WS, code

Result:
code       cnword
-----------------
19968     一
20008     丨
20022     丶
20031     丿
20032     乀
20033     乁
20057     乙
20058     乚
20059     乛
20101     亅
19969     丁
..........

From the results above, we can clearly see that the codes of the one-stroke characters run from 19968 to 20101 in ascending order, but at the first two-stroke character, "丁" (ding), the code is 19969, so the sequence starts over. With this result, we can easily use SQL statements to obtain the first or last character of each stroke count.
The following statements obtain the last character of each stroke count:

CREATE TABLE #t1 (id int IDENTITY, code int, cnword nvarchar(2))

INSERT #t1 (code, cnword)
SELECT code, NCHAR(code) AS cnword FROM #t
ORDER BY NCHAR(code) COLLATE Chinese_PRC_Stroke_CS_AS_KS_WS, code

-- A row is the last of its stroke group when the next row does not have a larger code
SELECT a.cnword
FROM #t1 a
LEFT JOIN #t1 b ON a.id = b.id - 1 AND a.code < b.code
WHERE b.code IS NULL
ORDER BY a.id

This returns 36 Chinese characters, each of which is the last character of its stroke count under the Chinese_PRC_Stroke_CS_AS_KS_WS collation:

亅阝马风龙齐龟齿鸩龀龛龂龆龈龊龍龠龎龐龑龡龢龝齹龣龥齈龞麷鸞麣龖龗齾齉龘

As shown above, "亅" is the last character after sorting all one-stroke characters, "阝" is the last character after sorting all two-stroke characters, and so on.
However, the strokes after the 33rd character, "龗" (33 strokes), turn out to be somewhat messy and incorrect. That does not matter: only four characters have more strokes than "龗", and we add them manually: 齾 (35 strokes), 齉 (36 strokes), 靐 (39 strokes), and 龘 (64 strokes).

Create the Chinese character stroke table (tab_hzbh):
CREATE TABLE tab_hzbh (id int IDENTITY, cnword nchar(1))
-- Insert the first 33 Chinese characters
INSERT tab_hzbh
SELECT TOP 33 a.cnword
FROM #t1 a
LEFT JOIN #t1 b ON a.id = b.id - 1 AND a.code < b.code
WHERE b.code IS NULL
ORDER BY a.id
-- Add the last four Chinese characters
SET IDENTITY_INSERT tab_hzbh ON
GO
INSERT tab_hzbh (id, cnword)
SELECT 35, N'齾'
UNION ALL SELECT 36, N'齉'
UNION ALL SELECT 39, N'靐'
UNION ALL SELECT 64, N'龘'
GO
SET IDENTITY_INSERT tab_hzbh OFF
GO

At this point we can get results. For example, to get the stroke count of the Chinese character "国" (country):

DECLARE @a nchar(1)
SET @a = N'国'
SELECT TOP 1 id
FROM tab_hzbh
WHERE cnword >= @a COLLATE Chinese_PRC_Stroke_CS_AS_KS_WS
ORDER BY id

id
-----------
8
(Result: the Chinese character "国" has 8 strokes)

All the preparation above serves only to write the following function. The function does away with all the temporary and permanent tables created above; for generality and ease of porting, the content of the tab_hzbh table is written into the statement itself. It calculates the total stroke count of the Chinese characters in a string supplied by the user:

CREATE FUNCTION fun_getbh (@str nvarchar(4000))
RETURNS int
AS
BEGIN
    DECLARE @word nchar(1), @n int
    SET @n = 0
    WHILE LEN(@str) > 0
    BEGIN
        SET @word = LEFT(@str, 1)
        -- If it is not a Chinese character, the stroke count is 0.
        SET @n = @n + (CASE WHEN UNICODE(@word) BETWEEN 19968 AND 19968 + 20901
            THEN (SELECT TOP 1 id FROM (
                SELECT 1 AS id, N'亅' AS word
                UNION ALL SELECT 2, N'阝'
                UNION ALL SELECT 3, N'马'
                UNION ALL SELECT 4, N'风'
                UNION ALL SELECT 5, N'龙'
                UNION ALL SELECT 6, N'齐'
                UNION ALL SELECT 7, N'龟'
                UNION ALL SELECT 8, N'齿'
                UNION ALL SELECT 9, N'鸩'
                UNION ALL SELECT 10, N'龀'
                UNION ALL SELECT 11, N'龛'
                UNION ALL SELECT 12, N'龂'
                UNION ALL SELECT 13, N'龆'
                UNION ALL SELECT 14, N'龈'
                UNION ALL SELECT 15, N'龊'
                UNION ALL SELECT 16, N'龍'
                UNION ALL SELECT 17, N'龠'
                UNION ALL SELECT 18, N'龎'
                UNION ALL SELECT 19, N'龐'
                UNION ALL SELECT 20, N'龑'
                UNION ALL SELECT 21, N'龡'
                UNION ALL SELECT 22, N'龢'
                UNION ALL SELECT 23, N'龝'
                UNION ALL SELECT 24, N'齹'
                UNION ALL SELECT 25, N'龣'
                UNION ALL SELECT 26, N'龥'
                UNION ALL SELECT 27, N'齈'
                UNION ALL SELECT 28, N'龞'
                UNION ALL SELECT 29, N'麷'
                UNION ALL SELECT 30, N'鸞'
                UNION ALL SELECT 31, N'麣'
                UNION ALL SELECT 32, N'龖'
                UNION ALL SELECT 33, N'龗'
                UNION ALL SELECT 35, N'齾'
                UNION ALL SELECT 36, N'齉'
                UNION ALL SELECT 39, N'靐'
                UNION ALL SELECT 64, N'龘'
            ) t
            WHERE word >= @word COLLATE Chinese_PRC_Stroke_CS_AS_KS_WS
            ORDER BY id ASC) ELSE 0 END)
        SET @str = RIGHT(@str, LEN(@str) - 1)
    END
    RETURN @n
END

-- Example function call:
SELECT dbo.fun_getbh(N'中华人民共和国'), dbo.fun_getbh(N'中華人民共和國')

Execution result: the total stroke counts are 39 and 46 respectively; both Simplified and Traditional Chinese work.

Of course, you can also store the characters and stroke counts from the UNION ALL list above in a permanent table, create a clustered index on the character column, and set the collation of that column to Chinese_PRC_Stroke_CS_AS_KS_WS. That is faster. If you are using a Big5 operating system, you will have to generate a different set of characters by the same method.
Remember, however, that these characters were selected with SQL statements, not entered by hand, and still less looked up in a dictionary: the Xinhua Dictionary differs from the Unicode character set after all, so results taken from a dictionary would be incorrect.
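
The permanent-table variant described above might look like the following sketch. The table and index names are hypothetical, and the example assumes that tab_hzbh has already been populated as shown earlier, with its id column holding the stroke count.

```sql
-- Hypothetical lookup table: the character column carries the stroke collation,
-- so comparisons against it need no COLLATE clause in the query.
CREATE TABLE tab_stroke_lookup (
    strokes int NOT NULL,
    cnword  nchar(1) COLLATE Chinese_PRC_Stroke_CS_AS_KS_WS NOT NULL
);

CREATE CLUSTERED INDEX ix_tab_stroke_lookup_cnword ON tab_stroke_lookup (cnword);

-- Copy the boundary characters from tab_hzbh (id = stroke count).
INSERT INTO tab_stroke_lookup (strokes, cnword)
SELECT id, cnword FROM tab_hzbh;

-- Look up one character: the first boundary character that sorts at or after it
-- gives the stroke count.
DECLARE @word nchar(1);
SET @word = N'国';

SELECT TOP 1 strokes
FROM tab_stroke_lookup
WHERE cnword >= @word
ORDER BY strokes;
```

Because the column has its own collation and the variable does not, the comparison follows the column's stroke collation, and the clustered index on cnword can support the range condition.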

　　
Obtaining the first letter of Chinese pinyin using collation features

Using the same method as for the total stroke count, we can also write a function that returns the first pinyin letter of each Chinese character, as follows:

CREATE FUNCTION fun_getpy (@str nvarchar(4000))
RETURNS nvarchar(4000)
AS
BEGIN
    DECLARE @word nchar(1), @py nvarchar(4000)
    SET @py = ''
    WHILE LEN(@str) > 0
    BEGIN
        SET @word = LEFT(@str, 1)
        -- If it is not a Chinese character, the original character is returned.
        SET @py = @py + (CASE WHEN UNICODE(@word) BETWEEN 19968 AND 19968 + 20901
            THEN (SELECT TOP 1 py FROM (
                SELECT 'A' AS py, N'驁' AS word
                UNION ALL SELECT 'B', N'簿'
                UNION ALL SELECT 'C', N'錯'
                UNION ALL SELECT 'D', N'鵽'
                UNION ALL SELECT 'E', N'樲'
                UNION ALL SELECT 'F', N'鰒'
                UNION ALL SELECT 'G', N'腂'
                UNION ALL SELECT 'H', N'夻'
                UNION ALL SELECT 'J', N'攈'
                UNION ALL SELECT 'K', N'穒'
                UNION ALL SELECT 'L', N'鱳'
                UNION ALL SELECT 'M', N'旀'
                UNION ALL SELECT 'N', N'桛'
                UNION ALL SELECT 'O', N'漚'
                UNION ALL SELECT 'P', N'曝'
                UNION ALL SELECT 'Q', N'囕'
                UNION ALL SELECT 'R', N'鶸'
                UNION ALL SELECT 'S', N'蜶'
                UNION ALL SELECT 'T', N'籜'
                UNION ALL SELECT 'W', N'鶩'
                UNION ALL SELECT 'X', N'鑂'
                UNION ALL SELECT 'Y', N'韻'
                UNION ALL SELECT 'Z', N'咗'
            ) t
            WHERE word >= @word COLLATE Chinese_PRC_CS_AS_KS_WS
            ORDER BY py ASC) ELSE @word END)
        SET @str = RIGHT(@str, LEN(@str) - 1)
    END
    RETURN @py
END

-- Example function call:
SELECT dbo.fun_getpy(N'中华人民共和国'), dbo.fun_getpy(N'中華人民共和國')
Both results are: ZHRMGHG

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
