cvcode converts between Simplified and Traditional Chinese by looking characters up in a code table. Even now that Unicode is widespread, this approach still has practical value.
The most common scenario is an enterprise with staff from both Taiwan and mainland China, where Simplified and Traditional Chinese operating systems are used side by side. How can the MIS system make sure that GB2312, GBK, and Big5 all work properly? Data entered in Big5 must display correctly on a GBK system, and it must also match characters entered in GB2312 (looking up a record by name is the most common case).
For this kind of application, cvcode provides the code table lookup method. In theory, as long as the code table is defined, Big5 and GB2312 data can be fully interchanged.
However, cvcode only converts between GB2312 and Big5. Now that GBK input methods are common, GB2312 is clearly not enough. In addition, the Big5 character set is much larger than GB2312, so cvcode needs to be extended to convert between GBK and Big5.
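The code table idea can be sketched in a few lines. The example below is in Python rather than Delphi so that it runs standalone, and it is only an illustration: the two-entry table and the names `gb_offset`, `TABLE`, and `gb_to_big5` are hypothetical, while a real table would cover every position. The index formula is the one used by the original cvcode, `(high - $A1) * $5E + (low - $A1)`.

```python
def gb_offset(pair: bytes) -> int:
    """Row-major position of a GB2312 double-byte code (94 columns per row)."""
    hi, lo = pair[0], pair[1]
    return (hi - 0xA1) * 0x5E + (lo - 0xA1)


# Hypothetical sample entries: simplified character -> traditional character.
SAMPLE_PAIRS = {"国": "國", "学": "學"}

# Code table: GB2312 position -> Big5 bytes (sparse dict here, an array in practice).
TABLE = {
    gb_offset(simp.encode("gb2312")): trad.encode("big5")
    for simp, trad in SAMPLE_PAIRS.items()
}


def gb_to_big5(data: bytes) -> bytes:
    """Convert GB2312 bytes to Big5 by table lookup; unmapped bytes pass through."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] < 0x80 or i + 1 >= len(data):
            out.append(data[i])          # ASCII or dangling byte: copy as-is
            i += 1
        else:
            pair = data[i:i + 2]
            out += TABLE.get(gb_offset(pair), pair)
            i += 2
    return bytes(out)
```

With these sample entries, `gb_to_big5("国学A".encode("gb2312")).decode("big5")` gives `"國學A"`. The conversion is only as complete as the table, which is why the position calculation matters once GBK is added.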
The GBK character ranges are as follows:
GBK character set ranges
  Zone: content: high byte | low byte
----------------------------------------------
● GBK/1: GB2312 non-Chinese characters: A1 ~ A9 | A1 ~ FE
● GBK/2: GB2312 Chinese characters: B0 ~ F7 | A1 ~ FE
● GBK/3: Extended Chinese characters: 81 ~ A0 | 40 ~ FE
● GBK/4: Extended Chinese characters: AA ~ FE | 40 ~ A0
● GBK/5: Extended non-Chinese characters: A8 ~ A9 | 40 ~ A0
GBK/1 and GBK/2 together correspond to the GB2312 character set.
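To make the table concrete, the following Python sketch (Python is used only so the check can be run directly; `GBK_ZONES` and `gbk_zone` are illustrative names, not part of cvcode) classifies a byte pair into one of the five zones and prints the zone for a few sample characters.

```python
# (zone name, high-byte range, low-byte range), taken from the table above
GBK_ZONES = [
    ("GBK/1", (0xA1, 0xA9), (0xA1, 0xFE)),
    ("GBK/2", (0xB0, 0xF7), (0xA1, 0xFE)),
    ("GBK/3", (0x81, 0xA0), (0x40, 0xFE)),
    ("GBK/4", (0xAA, 0xFE), (0x40, 0xA0)),
    ("GBK/5", (0xA8, 0xA9), (0x40, 0xA0)),
]


def gbk_zone(hi: int, lo: int):
    """Return the zone name for a high/low byte pair, or None if outside all zones."""
    for name, (h0, h1), (l0, l1) in GBK_ZONES:
        if h0 <= hi <= h1 and l0 <= lo <= l1:
            return name
    return None


if __name__ == "__main__":
    # Sample characters; the last two are not in GB2312.
    for ch in ("啊", "丂", "镕"):
        hi, lo = ch.encode("gbk")
        print(f"U+{ord(ch):04X} -> {hi:02X}{lo:02X} {gbk_zone(hi, lo)}")
```

For instance, `啊` encodes as B0 A1 and falls in GBK/2, while `丂` encodes as 81 40 and falls in GBK/3, a zone that a GB2312-only check never reaches.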
Making cvcode support GBK raises three problems:
1. Determining whether a code is a GB code.
2. Calculating the character's position in the code table.
3. Staying compatible with the original code table.
To solve the first problem, modify isgb as follows:
function isgb(value: string): Boolean;
var
  mhigh, mlow: Integer;
begin
  // value is expected to hold raw double-byte data (one byte per Char)
  if Length(value) >= 2 then
  begin
    mhigh := Ord(value[1]);
    mlow := Ord(value[2]);
    Result := False;
    // ● GBK/1: GB2312 non-Chinese characters: A1 ~ A9 | A1 ~ FE
    if (mhigh in [$A1..$A9]) and (mlow in [$A1..$FE]) then Result := True;
    // ● GBK/2: GB2312 Chinese characters: B0 ~ F7 | A1 ~ FE
    if (mhigh in [$B0..$F7]) and (mlow in [$A1..$FE]) then Result := True;
    // ● GBK/3: Extended Chinese characters: 81 ~ A0 | 40 ~ FE
    if (mhigh in [$81..$A0]) and (mlow in [$40..$FE]) then Result := True;
    // ● GBK/4: Extended Chinese characters: AA ~ FE | 40 ~ A0
    if (mhigh in [$AA..$FE]) and (mlow in [$40..$A0]) then Result := True;
    // ● GBK/5: Extended non-Chinese characters: A8 ~ A9 | 40 ~ A0
    if (mhigh in [$A8..$A9]) and (mlow in [$40..$A0]) then Result := True;
  end
  else
    Result := True;
  { // This is the original version. It is based only on GB2312.
  if Length(value) >= 2 then
  begin
    if (value[1] <= #161) and (value[1] >= #247) then
      Result := False
    else
      if (value[2] <= #161) and (value[2] >= #254) then
        Result := False
      else
        Result := True
  end
  else
    Result := True;
  }
end;
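One way such a check might be used is to walk a mixed byte string and split it into single-byte and double-byte units. The Python sketch below is illustrative only: `is_gb_pair` mirrors the range test in isgb, while `split_units` and its fallback rule (treat any byte that does not start a valid pair as a single unit) are assumptions, not part of cvcode.

```python
def is_gb_pair(hi: int, lo: int) -> bool:
    """Same five range tests as isgb, applied to a high/low byte pair."""
    return (
        (0xA1 <= hi <= 0xA9 and 0xA1 <= lo <= 0xFE)      # GBK/1
        or (0xB0 <= hi <= 0xF7 and 0xA1 <= lo <= 0xFE)   # GBK/2
        or (0x81 <= hi <= 0xA0 and 0x40 <= lo <= 0xFE)   # GBK/3
        or (0xAA <= hi <= 0xFE and 0x40 <= lo <= 0xA0)   # GBK/4
        or (0xA8 <= hi <= 0xA9 and 0x40 <= lo <= 0xA0)   # GBK/5
    )


def split_units(data: bytes):
    """Split a byte string into 1-byte and 2-byte units using is_gb_pair."""
    units = []
    i = 0
    while i < len(data):
        if i + 1 < len(data) and is_gb_pair(data[i], data[i + 1]):
            units.append(data[i:i + 2])   # double-byte character
            i += 2
        else:
            units.append(data[i:i + 1])   # ASCII or unrecognized byte
            i += 1
    return units
```

For the bytes `A`, 81 40, B0 A1, `split_units` returns three units. The low byte 40 of the second character is also the ASCII code for `@`, so the pair has to be consumed as a whole. This is a case that only appears once the extended GBK zones are accepted.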
The second step is to calculate the character's position while staying compatible with the original code table. In practice, compatibility comes down to keeping the existing positions unchanged:
function gboffset(value: string): Integer;
var
  mhigh, mlow: Integer;
  mgbk1, mgbk2, mgbk3, mgbk4, mgbk5: Integer;
begin
  { // This is the original version ---
  if Length(value) >= 2 then
    Result := (Ord(value[1]) - $A1) * $5E + (Ord(value[2]) - $A1)
  else
    Result := -1;
  }
  Result := -1;
  if Length(value) >= 2 then
  begin
    mhigh := Ord(value[1]);
    mlow := Ord(value[2]);
    // How many code positions are there in each zone?
    // mgbk1 := ($A9 - $A1 + 1) * ($FE - $A1 + 1); // = 846 = $34E
    // mgbk2 := ($F7 - $B0 + 1) * ($FE - $A1 + 1); // = 6768 = $1A70
    // mgbk3 := ($A0 - $81 + 1) * ($FE - $40 + 1); // = 6112 = $17E0
    // mgbk4 := ($FE - $AA + 1) * ($A0 - $40 + 1); // = 8245 = $2035
    // mgbk5 := ($A9 - $A8 + 1) * ($A0 - $40 + 1); // = 194 = $C2
    mgbk1 := $34E; // 846
    // Pad for rows AA ~ AF so that GBK/2 keeps its positions in the previous code table
    mgbk1 := mgbk1 + ($B0 - $A9 - 1) * ($FE - $A1 + 1);
    mgbk2 := $1A70; // 6768
    mgbk3 := $17E0; // 6112
    mgbk4 := $2035; // 8245
    mgbk5 := $C2; // 194 (size of the last zone; not needed as a base offset)
    // ● GBK/1: GB2312 non-Chinese characters: A1 ~ A9 | A1 ~ FE
    if (mhigh in [$A1..$A9]) and (mlow in [$A1..$FE]) then
      Result := (mhigh - $A1) * ($FE - $A1 + 1) + (mlow - $A1)
    // ● GBK/2: GB2312 Chinese characters: B0 ~ F7 | A1 ~ FE
    else if (mhigh in [$B0..$F7]) and (mlow in [$A1..$FE]) then
      Result := mgbk1 +
        (mhigh - $B0) * ($FE - $A1 + 1) + (mlow - $A1)
    // ● GBK/3: Extended Chinese characters: 81 ~ A0 | 40 ~ FE
    else if (mhigh in [$81..$A0]) and (mlow in [$40..$FE]) then
      Result := mgbk1 + mgbk2 +
        (mhigh - $81) * ($FE - $40 + 1) + (mlow - $40)
    // ● GBK/4: Extended Chinese characters: AA ~ FE | 40 ~ A0
    else if (mhigh in [$AA..$FE]) and (mlow in [$40..$A0]) then
      Result := mgbk1 + mgbk2 + mgbk3 +
        (mhigh - $AA) * ($A0 - $40 + 1) + (mlow - $40)
    // ● GBK/5: Extended non-Chinese characters: A8 ~ A9 | 40 ~ A0
    else if (mhigh in [$A8..$A9]) and (mlow in [$40..$A0]) then
      Result := mgbk1 + mgbk2 + mgbk3 + mgbk4 +
        (mhigh - $A8) * ($A0 - $40 + 1) + (mlow - $40);
  end
end;
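The compatibility claim can be checked by arithmetic. The sketch below ports the position calculation to Python (chosen only so the check runs standalone; `old_offset` and `gbk_offset` are illustrative names) and compares it with the original formula over the whole GB2312 range.

```python
ROW = 0xFE - 0xA1 + 1    # 94 columns per GB2312 row
WIDE = 0xFE - 0x40 + 1   # 191 columns in GBK/3
NARROW = 0xA0 - 0x40 + 1  # 97 columns in GBK/4 and GBK/5


def old_offset(hi: int, lo: int) -> int:
    """Position formula of the original, GB2312-only version."""
    return (hi - 0xA1) * 0x5E + (lo - 0xA1)


def gbk_offset(hi: int, lo: int) -> int:
    """Port of the extended calculation; returns -1 outside the five zones."""
    gbk1 = 0x34E + (0xB0 - 0xA9 - 1) * ROW   # 846 + padding for rows AA..AF = 1410
    gbk2, gbk3, gbk4 = 0x1A70, 0x17E0, 0x2035
    if 0xA1 <= hi <= 0xA9 and 0xA1 <= lo <= 0xFE:
        return (hi - 0xA1) * ROW + (lo - 0xA1)
    if 0xB0 <= hi <= 0xF7 and 0xA1 <= lo <= 0xFE:
        return gbk1 + (hi - 0xB0) * ROW + (lo - 0xA1)
    if 0x81 <= hi <= 0xA0 and 0x40 <= lo <= 0xFE:
        return gbk1 + gbk2 + (hi - 0x81) * WIDE + (lo - 0x40)
    if 0xAA <= hi <= 0xFE and 0x40 <= lo <= 0xA0:
        return gbk1 + gbk2 + gbk3 + (hi - 0xAA) * NARROW + (lo - 0x40)
    if 0xA8 <= hi <= 0xA9 and 0x40 <= lo <= 0xA0:
        return gbk1 + gbk2 + gbk3 + gbk4 + (hi - 0xA8) * NARROW + (lo - 0x40)
    return -1


# Every GB2312 position keeps the value given by the original formula.
gb2312_rows = list(range(0xA1, 0xAA)) + list(range(0xB0, 0xF8))
assert all(
    gbk_offset(hi, lo) == old_offset(hi, lo)
    for hi in gb2312_rows
    for lo in range(0xA1, 0xFF)
)
```

Because the padding makes the GBK/2 base equal to (B0 - A1) * 94 = 1410, an existing GB2312 table stays valid as the first part of the larger table. The new zones follow from position 8178 onward, and the highest position is 22728. The ranges as given include low byte 7F, which GBK leaves unassigned, so those slots simply stay empty.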