Extending cvcode Simplified/Traditional Chinese conversion: GBK and Big5

Last Update:2018-12-07 Source: Internet

Author: User

Cvcode converts between Simplified and Traditional Chinese by looking characters up in a code table. Even now that Unicode is widespread, this approach still has practical value.
The most common scenario is an enterprise with employees in both Taiwan and mainland China, where both Simplified and Traditional Chinese operating systems are in use. In the MIS system, GB2312, GBK, and Big5 all have to work: data entered in Big5 must display correctly on a GBK system, and it must also match text entered in GB2312 (querying by name is the most common case).
For such an application, cvcode provides a code table lookup method. In theory, as long as the code table is defined, Big5 and GB2312 can be made fully interoperable.
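The table lookup idea can be sketched as follows. This is only an illustration, not cvcode's actual data structure: the names `GbToBig5`, `Gb2312Offset`, and `LookupBig5` are hypothetical, and the table is filled with just two real mappings (中 is D6D0 in GB2312 and A4A4 in Big5; 文 is CEC4 in GB2312 and A4E5 in Big5). A real code table would carry one entry per position.

```pascal
program CodeTableSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

const
  RowLen    = $FE - $A1 + 1;               // 94 low-byte positions per high byte
  TableSize = ($F7 - $A1 + 1) * RowLen;    // GB2312 rows A1..F7

var
  // Hypothetical stand-in for the real code table:
  // one Big5 code per GB2312 position, 0 meaning "no mapping".
  GbToBig5: array[0..TableSize - 1] of Word;

// Position of a GB2312 byte pair in the table (row-major order).
function Gb2312Offset(Lead, Trail: Byte): Integer;
begin
  Result := (Lead - $A1) * RowLen + (Trail - $A1);
end;

// Conversion is a single indexed read once the position is known.
function LookupBig5(Lead, Trail: Byte): Word;
begin
  Result := GbToBig5[Gb2312Offset(Lead, Trail)];
end;

begin
  // Only two entries are filled in for this demonstration.
  GbToBig5[Gb2312Offset($D6, $D0)] := $A4A4;
  GbToBig5[Gb2312Offset($CE, $C4)] := $A4E5;

  WriteLn(IntToHex(LookupBig5($D6, $D0), 4));   // A4A4
  WriteLn(IntToHex(LookupBig5($CE, $C4), 4));   // A4E5
end.
```

Because the lookup depends entirely on the position calculation, any extension of the supported character range has to keep existing positions unchanged.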

However, cvcode converts only between GB2312 and Big5. Now that GBK input methods are common, GB2312 is clearly not enough. Moreover, the Big5 character set is much larger than GB2312, so cvcode needs to be extended to convert between GBK and Big5.

The GBK character ranges are as follows:

GBK character set ranges
Level: high byte | low byte
----------------------------------------------
● GBK/1: GB2312 non-Chinese characters: A1 ~ A9 | A1 ~ FE
● GBK/2: GB2312 Chinese characters: B0 ~ F7 | A1 ~ FE
● GBK/3: Extended Chinese characters: 81 ~ A0 | 40 ~ FE
● GBK/4: Extended Chinese characters: AA ~ FE | 40 ~ A0
● GBK/5: Extended non-Chinese characters: A8 ~ A9 | 40 ~ A0
Levels 1 and 2 correspond to the GB2312 character set.
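As a quick check of the table, the number of code positions in each level is the count of high bytes multiplied by the count of low bytes. The sketch below computes these counts; the record type `TGbkLevel` and the function `LevelSize` are hypothetical names introduced for illustration.

```pascal
program GbkLevelSizes;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  // One row of the range table: inclusive high-byte and low-byte bounds.
  TGbkLevel = record
    Name: string;
    HiFrom, HiTo, LoFrom, LoTo: Byte;
  end;

const
  Levels: array[1..5] of TGbkLevel = (
    (Name: 'GBK/1'; HiFrom: $A1; HiTo: $A9; LoFrom: $A1; LoTo: $FE),
    (Name: 'GBK/2'; HiFrom: $B0; HiTo: $F7; LoFrom: $A1; LoTo: $FE),
    (Name: 'GBK/3'; HiFrom: $81; HiTo: $A0; LoFrom: $40; LoTo: $FE),
    (Name: 'GBK/4'; HiFrom: $AA; HiTo: $FE; LoFrom: $40; LoTo: $A0),
    (Name: 'GBK/5'; HiFrom: $A8; HiTo: $A9; LoFrom: $40; LoTo: $A0)
  );

// Code positions in a level = (high bytes) * (low bytes).
function LevelSize(const L: TGbkLevel): Integer;
begin
  Result := (L.HiTo - L.HiFrom + 1) * (L.LoTo - L.LoFrom + 1);
end;

var
  I: Integer;
begin
  for I := Low(Levels) to High(Levels) do
    WriteLn(Levels[I].Name, ': ', LevelSize(Levels[I]));
  // GBK/1: 846
  // GBK/2: 6768
  // GBK/3: 6112
  // GBK/4: 8245
  // GBK/5: 194
end.
```

These counts are positions in the rectangular ranges as listed in the table, not the number of characters actually assigned in each level.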

Making cvcode support GBK raises three problems:

1. Determining whether a code is GB.
2. Calculating the character's sequence number.
3. Staying compatible with the original code table.

To solve the first problem, modify isgb as follows:

function isgb(value: string): Boolean;
var
  mhigh, mlow: Integer;
begin
  if Length(value) >= 2 then
  begin
    mhigh := Ord(value[1]);
    mlow := Ord(value[2]);
    Result := False;
    // GBK/1: GB2312 non-Chinese characters: A1 ~ A9 | A1 ~ FE
    if (mhigh in [$A1..$A9]) and (mlow in [$A1..$FE]) then Result := True;
    // GBK/2: GB2312 Chinese characters: B0 ~ F7 | A1 ~ FE
    if (mhigh in [$B0..$F7]) and (mlow in [$A1..$FE]) then Result := True;
    // GBK/3: Extended Chinese characters: 81 ~ A0 | 40 ~ FE
    if (mhigh in [$81..$A0]) and (mlow in [$40..$FE]) then Result := True;
    // GBK/4: Extended Chinese characters: AA ~ FE | 40 ~ A0
    if (mhigh in [$AA..$FE]) and (mlow in [$40..$A0]) then Result := True;
    // GBK/5: Extended non-Chinese characters: A8 ~ A9 | 40 ~ A0
    if (mhigh in [$A8..$A9]) and (mlow in [$40..$A0]) then Result := True;
  end
  else
    Result := True;
  {// This is the original version. It is based only on GB2312.
  if (Length(value) >= 2) then
  begin
    if (value[1] <= #161) and (value[1] >= #247) then
      Result := False
    else
      if (value[2] <= #161) and (value[2] >= #254) then
        Result := False
      else
        Result := True
  end
  else
    Result := True;
  }
end;
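To see how the range test behaves on a few inputs, here is a minimal sketch that takes the two bytes directly instead of a string. The name `IsGbkPair` is hypothetical; the ranges are the five levels listed earlier.

```pascal
program IsGbkPairDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Byte-pair variant of the range test (hypothetical helper, not part of cvcode).
function IsGbkPair(Lead, Trail: Byte): Boolean;
begin
  Result :=
    ((Lead in [$A1..$A9]) and (Trail in [$A1..$FE])) or   // GBK/1
    ((Lead in [$B0..$F7]) and (Trail in [$A1..$FE])) or   // GBK/2
    ((Lead in [$81..$A0]) and (Trail in [$40..$FE])) or   // GBK/3
    ((Lead in [$AA..$FE]) and (Trail in [$40..$A0])) or   // GBK/4
    ((Lead in [$A8..$A9]) and (Trail in [$40..$A0]));     // GBK/5
end;

begin
  WriteLn(IsGbkPair($B0, $A1));  // TRUE: first position of GBK/2
  WriteLn(IsGbkPair($81, $40));  // TRUE: first position of GBK/3
  WriteLn(IsGbkPair($41, $42));  // FALSE: two ASCII letters
  WriteLn(IsGbkPair($A4, $A4));  // TRUE: inside GBK/1, but also a valid Big5 pair
end.
```

The last case shows that this is a range test only: a byte pair that is valid Big5 can also fall inside a GBK range, so the check cannot by itself tell which encoding a string was written in.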

The second and third problems are solved together: calculate the sequence number while staying compatible with the original code table. In practice, compatibility is mainly a matter of keeping the same ordering:

function gboffset(value: string): Integer;
var
  mhigh, mlow: Integer;
  mgbk1, mgbk2, mgbk3, mgbk4, mgbk5: Integer;
begin
  {// This is the original version
  if Length(value) >= 2 then
    Result := (Ord(value[1]) - $A1) * $5E + (Ord(value[2]) - $A1)
  else
    Result := -1;
  }
  Result := -1;
  if Length(value) >= 2 then
  begin
    mhigh := Ord(value[1]);
    mlow := Ord(value[2]);
    // Number of code positions in each level:
    // mgbk1 := ($A9 - $A1 + 1) * ($FE - $A1 + 1); // = 846 = $34E
    // mgbk2 := ($F7 - $B0 + 1) * ($FE - $A1 + 1); // = 6768 = $1A70
    // mgbk3 := ($A0 - $81 + 1) * ($FE - $40 + 1);
    // mgbk4 := ($FE - $AA + 1) * ($A0 - $40 + 1);
    // mgbk5 := ($A9 - $A8 + 1) * ($A0 - $40 + 1);
    mgbk1 := $34E; // 846
    // Pad the unused rows AA..AF so that GBK/2 keeps the offsets of the previous code table
    mgbk1 := mgbk1 + ($B0 - $A9 - 1) * ($FE - $A1 + 1);
    mgbk2 := $1A70; // 6768
    mgbk3 := $17E0; // 6112
    mgbk4 := $2035; // 8245
    mgbk5 := $C2; // 194
    // GBK/1: GB2312 non-Chinese characters: A1 ~ A9 | A1 ~ FE
    if (mhigh in [$A1..$A9]) and (mlow in [$A1..$FE]) then
      Result := (mhigh - $A1) * ($FE - $A1 + 1) + (mlow - $A1)
    // GBK/2: GB2312 Chinese characters: B0 ~ F7 | A1 ~ FE
    else if (mhigh in [$B0..$F7]) and (mlow in [$A1..$FE]) then
      Result := mgbk1 +
        (mhigh - $B0) * ($FE - $A1 + 1) + (mlow - $A1)
    // GBK/3: Extended Chinese characters: 81 ~ A0 | 40 ~ FE
    else if (mhigh in [$81..$A0]) and (mlow in [$40..$FE]) then
      Result := mgbk1 + mgbk2 +
        (mhigh - $81) * ($FE - $40 + 1) + (mlow - $40)
    // GBK/4: Extended Chinese characters: AA ~ FE | 40 ~ A0
    else if (mhigh in [$AA..$FE]) and (mlow in [$40..$A0]) then
      Result := mgbk1 + mgbk2 + mgbk3 +
        (mhigh - $AA) * ($A0 - $40 + 1) + (mlow - $40)
    // GBK/5: Extended non-Chinese characters: A8 ~ A9 | 40 ~ A0
    else if (mhigh in [$A8..$A9]) and (mlow in [$40..$A0]) then
      Result := mgbk1 + mgbk2 + mgbk3 + mgbk4 +
        (mhigh - $A8) * ($A0 - $40 + 1) + (mlow - $40);
  end
end;
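The compatibility claim can be checked mechanically: for every GB2312 position, the extended calculation should return the same number as the original formula, with the new GBK levels appended after the old table. The sketch below assumes byte-pair helpers named `OldOffset` and `NewOffset` (hypothetical names) that mirror the two calculations above.

```pascal
program GbOffsetCheck;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

const
  Row94  = $FE - $A1 + 1;                       // low bytes per row, GBK/1 and GBK/2
  Row191 = $FE - $40 + 1;                       // low bytes per row, GBK/3
  Row97  = $A0 - $40 + 1;                       // low bytes per row, GBK/4 and GBK/5
  Base2  = ($B0 - $A1) * Row94;                 // 1410: GBK/1 plus padded rows AA..AF
  Base3  = Base2 + ($F7 - $B0 + 1) * Row94;     // 8178
  Base4  = Base3 + ($A0 - $81 + 1) * Row191;    // 14290
  Base5  = Base4 + ($FE - $AA + 1) * Row97;     // 22535

// Original GB2312-only formula.
function OldOffset(Lead, Trail: Byte): Integer;
begin
  Result := (Lead - $A1) * $5E + (Trail - $A1);
end;

// Extended calculation covering all five GBK levels; -1 if outside them.
function NewOffset(Lead, Trail: Byte): Integer;
begin
  Result := -1;
  if (Lead in [$A1..$A9]) and (Trail in [$A1..$FE]) then
    Result := (Lead - $A1) * Row94 + (Trail - $A1)
  else if (Lead in [$B0..$F7]) and (Trail in [$A1..$FE]) then
    Result := Base2 + (Lead - $B0) * Row94 + (Trail - $A1)
  else if (Lead in [$81..$A0]) and (Trail in [$40..$FE]) then
    Result := Base3 + (Lead - $81) * Row191 + (Trail - $40)
  else if (Lead in [$AA..$FE]) and (Trail in [$40..$A0]) then
    Result := Base4 + (Lead - $AA) * Row97 + (Trail - $40)
  else if (Lead in [$A8..$A9]) and (Trail in [$40..$A0]) then
    Result := Base5 + (Lead - $A8) * Row97 + (Trail - $40);
end;

var
  Lead, Trail: Byte;
  Mismatches: Integer;
begin
  // Compare both formulas over every GB2312 position (GBK/1 and GBK/2).
  Mismatches := 0;
  for Lead := $A1 to $F7 do
    if Lead in [$A1..$A9, $B0..$F7] then
      for Trail := $A1 to $FE do
        if NewOffset(Lead, Trail) <> OldOffset(Lead, Trail) then
          Inc(Mismatches);

  WriteLn('GB2312 mismatches: ', Mismatches);            // 0
  WriteLn('First GBK/3 offset: ', NewOffset($81, $40));  // 8178
  WriteLn('Last GBK/5 offset: ', NewOffset($A9, $A0));   // 22728
end.
```

Padding the six unused rows AA to AF is what makes the GBK/2 base equal 15 × 94 = 1410, the same value the original formula yields for high byte B0, so an existing GB2312 code table can be reused and only the entries from offset 8178 onward need to be added.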

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.