Using collation (sorting rule) features to count Chinese character strokes and get Pinyin initials

Source: Internet
Author: User

SQL Server collations (often translated as "sorting rules") are not used very often, so many beginners are unfamiliar with them. One error, however, is common: when a query joins tables from different databases and those databases have different default collations, the system returns the following error:

"Cannot resolve collation conflict for equal to operation."

I. Error analysis:
This error is caused by inconsistent collations. Let's run a test, for example:
create table #t1 (
name varchar(20) collate Albanian_CI_AI_WS,
value int)

create table #t2 (
name varchar(20) collate Chinese_PRC_CI_AI_WS,
value int)

After the tables are created, run the join query:

select * from #t1 A inner join #t2 B on A.name = B.name

This produces the error:

Server: Msg 446, Level 16, State 9, Line 1
Cannot resolve collation conflict for equal to operation.

The simplest way to eliminate this error is to specify the collation explicitly in the join condition, after which the error no longer appears. The statement is written as follows:

select *
from #t1 A inner join #t2 B
on A.name = B.name collate Chinese_PRC_CI_AI_WS

II. Introduction to collations:

What is a collation? "In Microsoft SQL Server 2000, the physical storage of character strings is controlled by collations. A collation specifies the bit patterns that represent each character and the rules by which characters are sorted and compared."
Run the following statement in Query Analyzer to list all the collations that SQL Server supports:

select * from ::fn_helpcollations()
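
The full list is long. As a small sketch, the same function can be filtered to show only the Simplified Chinese collations; name and description are the two columns the function returns, and the LIKE pattern is just one possible filter:

```sql
-- Sketch: list only the collations whose name starts with Chinese_PRC.
-- [_] matches a literal underscore instead of acting as a LIKE wildcard.
select name, description
from ::fn_helpcollations()
where name like 'Chinese[_]PRC%'
order by name
```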

The name of a collation consists of two parts: the first half names the character set that the collation supports, and the second half is a set of suffixes.
For example:
Chinese_PRC_CS_AI_WS
First half: Chinese_PRC_ refers to the Unicode character set, sorted by the rules for Simplified Chinese (PRC).
Second half, the meaning of the suffixes:
_BIN binary sort
_CI (CS) case sensitivity: CI is case-insensitive, CS is case-sensitive
_AI (AS) accent sensitivity: AI is accent-insensitive, AS is accent-sensitive
_KI (KS) kana sensitivity: KI is kana-insensitive, KS is kana-sensitive
_WI (WS) width sensitivity: WI is width-insensitive, WS is width-sensitive
In actual collation names only _KS and _WS are written; leaving them out means kana-insensitive and width-insensitive.

Case-sensitive: select this option if uppercase and lowercase letters should compare as unequal.
Accent-sensitive: select this option if accented and unaccented letters should compare as unequal. With this option selected, letters with different accents also compare as unequal.
Kana-sensitive: select this option if Hiragana and Katakana Japanese syllables should compare as unequal.
Width-sensitive: select this option if half-width and full-width characters should compare as unequal.
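
To see the effect of one of these options, the sketch below compares the same two strings under a case-insensitive and a case-sensitive collation. The column aliases are illustrative only:

```sql
-- Sketch: the same comparison under CI and CS variants of Chinese_PRC.
select
    case when 'abc' = 'ABC' collate Chinese_PRC_CI_AS
         then 'equal' else 'different' end as ci_result,
    case when 'abc' = 'ABC' collate Chinese_PRC_CS_AS
         then 'equal' else 'different' end as cs_result
-- ci_result is 'equal', cs_result is 'different'
```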


III. Applying collations:
SQL Server provides a large number of Windows collations and SQL Server-specific collations, but developers often overlook them. In fact, they are very useful in practice.

Example 1: sort the name column of a table in Pinyin order:

create table #t (id int, name varchar(20))
insert #t select 1, '中'
union all select 2, '国'
union all select 3, '人'
union all select 4, '阿'

select * from #t order by name collate Chinese_PRC_CS_AS_KS_WS
drop table #t
/* Result:
id name
----------- --------------------
4 阿
2 国
3 人
1 中
*/

Example 2: sort the name column of a table by surname stroke count:

create table #t (id int, name varchar(20))

insert #t select 1, '三'
union all select 2, '乙'
union all select 3, '二'
union all select 4, '一'
union all select 5, '十'
select * from #t order by name collate Chinese_PRC_Stroke_CS_AS_KS_WS
drop table #t
/* Result:
id name
----------- --------------------
4 一
2 乙
3 二
5 十
1 三
*/

IV. Extending collations in practice
SQL Server's Chinese collations can sort by Pinyin, by stroke count and so on. How can we use this feature to solve Chinese character problems? Here is an example:

Counting Chinese character strokes using collation features

To count strokes we first need some preparation. Windows supports many Chinese characters, and Unicode currently includes 20902 of them; the Unicode values of the Simplified Chinese GBK characters start at 19968.
First, we use SQL Server itself to obtain all of these characters, with no dictionary. A simple SQL statement is enough:

select top 20902 code = identity(int, 19968, 1) into #t from syscolumns a, syscolumns b
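
Before going on, it can be worth checking that the temporary table really covers the intended range. A quick sketch, assuming #t was created by the statement above and that the cross join of syscolumns supplied at least 20902 rows:

```sql
-- Sketch: sanity-check the generated code range.
select count(*) as total, min(code) as first_code, max(code) as last_code
from #t
-- expected: 20902 rows, codes 19968 through 40869 (19968 + 20901)
```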

The following statement returns all the Chinese characters, ordered by Unicode value:

select code, nchar(code) as cnword from #t

Then we use a SELECT statement to sort them by stroke count:

select code, nchar(code) as cnword
from #t
order by nchar(code) collate Chinese_PRC_Stroke_CS_AS_KS_WS, code

Result:
code cnword
----------- ------
19968 一
20008 丨
20022 丶
20031 丿
20032 乀
20033 乁
20057 乙
20058 乚
20059 乛
20101 亅
19969 丁
..........

From these results we can see that the one-stroke characters run from code 19968 to 20101 in ascending order, but the first two-stroke character, 丁, has code 19969: the codes do not continue upward but start over. Using this, we can easily write SQL that finds the first or last character of each stroke count.
The following statements find the last character of each stroke count:

create table #t1 (id int identity, code int, cnword nvarchar(2))

insert #t1 (code, cnword)
select code, nchar(code) as cnword from #t
order by nchar(code) collate Chinese_PRC_Stroke_CS_AS_KS_WS, code


select A.cnword
from #t1 A
left join #t1 B on A.id = B.id - 1 and A.code < B.code
where B.code is null
order by A.id
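
The same self-join can be turned around to find the first character of each stroke count instead of the last. This is only a sketch under the same assumptions as the query above: #t1 was filled in stroke order, and the code value drops at every boundary between stroke counts.

```sql
-- Sketch: a row starts a new stroke group when there is no previous row,
-- or when the previous row (id - 1) has a larger code than the current one.
select A.cnword
from #t1 A
left join #t1 B on A.id = B.id + 1 and A.code > B.code
where B.code is null
order by A.id
```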

This returns 36 characters, each the last character of one stroke count under the Chinese_PRC_Stroke_CS_AS_KS_WS collation:

亅阝马风龙齐龟齿鸩龀龛龂龆龈龊龍龠龎龐龑龡龢龝齹龣龥齈龞麷鸞麣龖龗齾齉龘

As shown above, 亅 is the last of all one-stroke characters after sorting, 阝 is the last of all two-stroke characters, and so on.
However, the results after the 33rd character, 龗 (33 strokes), turn out to be out of order and incorrect. That does not matter: only four characters have more strokes than 龗, so we add them by hand: 齾 (35 strokes), 齉 (36 strokes), 靐 (39 strokes) and 龘 (64 strokes).

Create the Chinese character stroke table (tab_hzbh):
create table tab_hzbh (id int identity, cnword nchar(1))
-- Insert the first 33 characters
insert tab_hzbh
select top 33 A.cnword
from #t1 A
left join #t1 B on A.id = B.id - 1 and A.code < B.code
where B.code is null
order by A.id
-- Add the last four characters
set identity_insert tab_hzbh on
go
insert tab_hzbh (id, cnword)
select 35, N'齾'
union all select 36, N'齉'
union all select 39, N'靐'
union all select 64, N'龘'
go
set identity_insert tab_hzbh off
go

At this point we can already get results. For example, to get the stroke count of the character 国:

declare @a nchar(1)
set @a = N'国'
select top 1 id
from tab_hzbh
where cnword >= @a collate Chinese_PRC_Stroke_CS_AS_KS_WS
order by id

id
-----------
8
(Result: the character 国 has 8 strokes)

All the preparation above only serves to write the following function. The function does away with all the temporary tables and fixed tables used above: for generality and easy portability, the content of the tab_hzbh table is written into the statement itself. It computes the total stroke count of the Chinese characters in a string supplied by the user:

create function fun_getbh (@str nvarchar(4000))
returns int
as
begin
    declare @word nchar(1), @n int
    set @n = 0
    while len(@str) > 0
    begin
        set @word = left(@str, 1)
        -- A character that is not a Chinese character counts as 0 strokes
        set @n = @n + (case when unicode(@word) between 19968 and 19968 + 20901
            then (select top 1 id from (
                select 1 as id, N'亅' as word
                union all select 2, N'阝'
                union all select 3, N'马'
                union all select 4, N'风'
                union all select 5, N'龙'
                union all select 6, N'齐'
                union all select 7, N'龟'
                union all select 8, N'齿'
                union all select 9, N'鸩'
                union all select 10, N'龀'
                union all select 11, N'龛'
                union all select 12, N'龂'
                union all select 13, N'龆'
                union all select 14, N'龈'
                union all select 15, N'龊'
                union all select 16, N'龍'
                union all select 17, N'龠'
                union all select 18, N'龎'
                union all select 19, N'龐'
                union all select 20, N'龑'
                union all select 21, N'龡'
                union all select 22, N'龢'
                union all select 23, N'龝'
                union all select 24, N'齹'
                union all select 25, N'龣'
                union all select 26, N'龥'
                union all select 27, N'齈'
                union all select 28, N'龞'
                union all select 29, N'麷'
                union all select 30, N'鸞'
                union all select 31, N'麣'
                union all select 32, N'龖'
                union all select 33, N'龗'
                union all select 35, N'齾'
                union all select 36, N'齉'
                union all select 39, N'靐'
                union all select 64, N'龘'
            ) T
            where word >= @word collate Chinese_PRC_Stroke_CS_AS_KS_WS
            order by id asc) else 0 end)
        set @str = right(@str, len(@str) - 1)
    end
    return @n
end

-- Example call:
select dbo.fun_getbh(N'中华人民共和国'), dbo.fun_getbh(N'中華人民共和國')

Result: the total stroke counts are 39 and 46. Both Simplified and Traditional Chinese work.
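
As a usage sketch, the function can also be applied to a column, for example to order names by total stroke count. The #names table and its rows are hypothetical and serve only as illustration:

```sql
-- Sketch: order a column of names by total stroke count using dbo.fun_getbh.
create table #names (name nvarchar(20))
insert #names select N'中'
union all select N'国'
union all select N'人'

select name, dbo.fun_getbh(name) as strokes
from #names
order by strokes, name
drop table #names
```

Because the function runs once per row, this is likely to be slow on large tables; storing the computed value in a column may be a better fit there.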
----------------------------------------

Of course, you can also move the characters and stroke counts from the "union all" list above into a fixed table, create a clustered index on the character column, and set the column's collation to:
Chinese_PRC_Stroke_CS_AS_KS_WS
This is faster. If you use an operating system based on the Big5 code page, you will have to generate a different set of characters, using the same method.
Do remember, though, that these characters must be obtained with a SQL SELECT statement, not typed in by hand and certainly not looked up in a dictionary: the Xinhua Dictionary differs from the Unicode character set, so results taken from a dictionary will be incorrect.
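
The fixed-table variant described above might look like the following. This is only a sketch: the table name tab_hzbh_idx and the index name are hypothetical, and it assumes the tab_hzbh table built earlier is still available as the source of the boundary characters.

```sql
-- Sketch: boundary characters in a permanent table whose column carries the stroke collation.
create table tab_hzbh_idx (
    id   int      not null,
    word nchar(1) collate Chinese_PRC_Stroke_CS_AS_KS_WS not null
)
create clustered index ix_tab_hzbh_idx_word on tab_hzbh_idx (word)

insert tab_hzbh_idx (id, word)
select id, cnword from tab_hzbh

-- The column collation now applies to the comparison, so no COLLATE clause is needed.
declare @a nchar(1)
set @a = N'国'
select top 1 id
from tab_hzbh_idx
where word >= @a
order by id
```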


Getting Pinyin initials using collation features

Using the same approach as for the total stroke count, we can also write a function that returns the first Pinyin letter of each Chinese character, as follows:

create function fun_getpy (@str nvarchar(4000))
returns nvarchar(4000)
as
begin
    declare @word nchar(1), @py nvarchar(4000)
    set @py = ''
    while len(@str) > 0
    begin
        set @word = left(@str, 1)
        -- A character that is not a Chinese character is returned unchanged
        set @py = @py + (case when unicode(@word) between 19968 and 19968 + 20901
            then (select top 1 py from (
                select 'A' as py, N'驁' as word
                union all select 'B', N'簿'
                union all select 'C', N'錯'
                union all select 'D', N'鵽'
                union all select 'E', N'樲'
                union all select 'F', N'鰒'
                union all select 'G', N'腂'
                union all select 'H', N'夻'
                union all select 'J', N'攈'
                union all select 'K', N'穒'
                union all select 'L', N'鱳'
                union all select 'M', N'旀'
                union all select 'N', N'桛'
                union all select 'O', N'漚'
                union all select 'P', N'曝'
                union all select 'Q', N'囕'
                union all select 'R', N'鶸'
                union all select 'S', N'蜶'
                union all select 'T', N'籜'
                union all select 'W', N'鶩'
                union all select 'X', N'鑂'
                union all select 'Y', N'韻'
                union all select 'Z', N'咗'
            ) T
            where word >= @word collate Chinese_PRC_CS_AS_KS_WS
            order by py asc) else @word end)
        set @str = right(@str, len(@str) - 1)
    end
    return @py
end

-- Example call:
select dbo.fun_getpy(N'中华人民共和国'), dbo.fun_getpy(N'中華人民共和國')
Both return: ZHRMGHG
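
One typical use of such a function is filtering rows by Pinyin initial. The sketch below uses a hypothetical #names table with made-up rows; if the function behaves as in the call above, only the row whose initials start with Z should be returned.

```sql
-- Sketch: find names whose Pinyin initials start with Z using dbo.fun_getpy.
create table #names (name nvarchar(20))
insert #names select N'中国'
union all select N'人民'
union all select N'共和'

select name, dbo.fun_getpy(name) as initials
from #names
where dbo.fun_getpy(name) like 'Z%'
drop table #names
```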

If you are interested, you can use the same method to extend this into a function that returns the full Pinyin of each character, or even the full Pinyin with tones, although there are many more categories to handle. For full Pinyin it is best to use a lookup table: looking up a little over 20,000 characters is very fast, and a table can take full advantage of indexes.
There are many other clever uses of collations, which are not detailed here. You are welcome to discuss them.
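
The lookup-table approach for full Pinyin could be sketched as follows. The table name tab_pinyin, its columns and the two sample rows are all hypothetical; a real table would hold one row for each of the 20902 characters.

```sql
-- Sketch: one row per character with its full Pinyin; the primary key gives an indexed lookup.
create table tab_pinyin (
    word   nchar(1)    collate Chinese_PRC_CS_AS_KS_WS not null primary key,
    pinyin varchar(10) not null
)
insert tab_pinyin select N'中', 'zhong'
union all select N'国', 'guo'

declare @word nchar(1)
set @word = N'国'
select pinyin from tab_pinyin where word = @word
```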

------------------------------------

The following is the same function written in PL/SQL:

create or replace function fun_getpy (p_str in nvarchar2)
return nvarchar2
as
    v_rest nvarchar2(4000) := p_str;
    v_word nvarchar2(1);
    v_one  nvarchar2(1);
    v_py   nvarchar2(4000);
begin
    while length(v_rest) > 0 loop
        v_word := substr(v_rest, 1, 1);
        -- asciistr() returns '\XXXX' for a non-ASCII character: this tests U+4E00 to U+9FA5
        if asciistr(v_word) between '\4E00' and '\9FA5' then
            -- Assumption: NLS_SORT=SCHINESE_PINYIN_M stands in for the SQL Server collation.
            -- The boundary characters were derived on SQL Server; re-check them on Oracle.
            select min(py) into v_one from (
                select 'A' as py, N'驁' as word from dual
                union all select 'B', N'簿' from dual
                union all select 'C', N'錯' from dual
                union all select 'D', N'鵽' from dual
                union all select 'E', N'樲' from dual
                union all select 'F', N'鰒' from dual
                union all select 'G', N'腂' from dual
                union all select 'H', N'夻' from dual
                union all select 'J', N'攈' from dual
                union all select 'K', N'穒' from dual
                union all select 'L', N'鱳' from dual
                union all select 'M', N'旀' from dual
                union all select 'N', N'桛' from dual
                union all select 'O', N'漚' from dual
                union all select 'P', N'曝' from dual
                union all select 'Q', N'囕' from dual
                union all select 'R', N'鶸' from dual
                union all select 'S', N'蜶' from dual
                union all select 'T', N'籜' from dual
                union all select 'W', N'鶩' from dual
                union all select 'X', N'鑂' from dual
                union all select 'Y', N'韻' from dual
                union all select 'Z', N'咗' from dual
            ) t
            where nlssort(t.word, 'NLS_SORT=SCHINESE_PINYIN_M')
                  >= nlssort(v_word, 'NLS_SORT=SCHINESE_PINYIN_M');
        else
            -- A character that is not a Chinese character is returned unchanged
            v_one := v_word;
        end if;
        v_py := v_py || v_one;
        v_rest := substr(v_rest, 2);
    end loop;
    return v_py;
end fun_getpy;
