GB2312: Chinese character coded character set for information interchange, Basic Set
The GB2312 standard divides its Chinese characters into two levels. Level 1 contains 3755 commonly used characters, arranged in alphabetical order of their Hanyu Pinyin. Level 2 contains 3008 less commonly used characters, arranged by radical. The GB2312 code range is 2121h-777Eh.
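The code range and the two levels can be made concrete with a small sketch. It assumes the EUC-CN byte form of GB2312 that gb2312 pages normally carry (each byte of the 7-bit code plus 80h); the names toGbCode and classifyGb2312 are made up for illustration and are not part of the conversion program.

```javascript
// Hypothetical helpers for illustration only.
// Input: the two bytes of one character as stored in a gb2312 (EUC-CN) page.

// Strip the high bit of each byte to recover the 7-bit GB2312 code (2121h-777Eh).
function toGbCode(lead, trail) {
  return ((lead - 0x80) << 8) | (trail - 0x80);
}

// Classify a byte pair by its row (from the lead byte) and cell (from the trail byte).
// Rows and cells both run from 1 to 94.
function classifyGb2312(lead, trail) {
  var row = lead - 0xA0;
  var cell = trail - 0xA0;
  if (row < 1 || row > 94 || cell < 1 || cell > 94) return "invalid";
  if (row <= 9) return "symbol";
  if (row >= 16 && row <= 55) {
    // Row 55 holds only 89 characters: 39 * 94 + 89 = 3755.
    return (row === 55 && cell > 89) ? "unassigned" : "level 1";
  }
  if (row >= 56 && row <= 87) return "level 2"; // 32 * 94 = 3008
  return "unassigned";
}

console.log(toGbCode(0xD2, 0xBB).toString(16)); // 7-bit code of the bytes D2 BB
console.log(classifyGb2312(0xD2, 0xBB));        // which level those bytes fall in
```

The row arithmetic also accounts for the character counts quoted above: rows 16 to 55 give 3755 characters and rows 56 to 87 give 3008.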
Unicode, in the fixed-width 16-bit form discussed here, encodes every character in two bytes, ASCII characters included. Code-page encodings such as GB2312, by contrast, use the value range of the high byte to decide whether a byte is an ASCII character or the high byte of a Chinese character. If the data is damaged at some point, the Chinese characters that follow may be decoded incorrectly. Because Unicode uses two bytes for every character, its most obvious advantage is that it simplifies the processing of Chinese characters.
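To illustrate why a code-page byte stream is fragile, here is a minimal sketch of high-byte detection. It assumes gb2312 (EUC-CN) bytes, where any byte of A1h or above starts a two-byte character; splitGbBytes is a hypothetical helper written for this example.

```javascript
// Hypothetical helper: split a gb2312 (EUC-CN) byte array into characters.
function splitGbBytes(bytes) {
  var chars = [];
  var i = 0;
  while (i < bytes.length) {
    // A byte of A1h or above is treated as the lead byte of a two-byte character.
    if (bytes[i] >= 0xA1 && i + 1 < bytes.length) {
      chars.push(bytes.slice(i, i + 2));
      i += 2;
    } else {
      chars.push(bytes.slice(i, i + 1));
      i += 1;
    }
  }
  return chars;
}

var intact = [0xD2, 0xBB, 0xD2, 0xBB, 0x41]; // two Chinese characters, then "A"
var damaged = intact.slice(1);               // the first byte has been lost

console.log(splitGbBytes(intact).length);    // number of characters found
console.log(splitGbBytes(damaged).length);   // boundaries shift after the damage
```

With the first byte missing, every later pair is read one byte off and the trailing ASCII "A" is absorbed into a bogus two-byte character. A fixed two-byte encoding has no lead-byte test to get out of step in this way.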
For more information about encoding, see:
http://blog.iyi.cn/tech/2005/10/unicode_2.html
http://blog.iyi.cn/tech/2005/10/unicode.html
Baidu's pages are encoded in gb2312, so its URL encoding is derived from the GB bytes. For example, the character "一" becomes %D2%BB on Baidu, whereas sites that use UTF-8, such as Google, produce %E4%B8%80 (GB is a 2-byte encoding; UTF-8 is a variable-length encoding that uses 3 bytes for this character).
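You can reproduce the UTF-8 result with JavaScript's encodeURI and decodeURI. Note that encodeURI itself always emits UTF-8 escapes whatever the page encoding; the GB form appears when the browser encodes form data or a URL on a gb2312 page, so switching the page encoding there gives different results.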
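The two URL forms of the character U+4E00 ("一") can be compared with a short sketch. encodeURI and decodeURI are standard functions; percentEncodeBytes is a hypothetical helper that stands in for what a browser does with the GB bytes D2 BB.

```javascript
// Hypothetical helper: percent-encode every byte in the array.
// This is enough for the non-ASCII bytes of a GB-encoded character.
function percentEncodeBytes(bytes) {
  var out = "";
  for (var i = 0; i < bytes.length; i++) {
    var hex = bytes[i].toString(16).toUpperCase();
    out += "%" + (hex.length < 2 ? "0" + hex : hex);
  }
  return out;
}

var utf8Form = encodeURI("\u4e00");            // UTF-8 bytes E4 B8 80
var gbForm = percentEncodeBytes([0xD2, 0xBB]); // GB2312 bytes D2 BB

console.log(utf8Form); // %E4%B8%80
console.log(gbForm);   // %D2%BB
console.log(decodeURI(utf8Form) === "\u4e00"); // true
```

decodeURI cannot reverse the GB form, because %D2%BB is not a valid UTF-8 sequence; that gap is what the lookup table fills.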
I had to write the conversion program myself. Fortunately, GB-to-UTF lookup tables are easy to find on the Internet, and one can be used after a little modification: gb-utf.txt
This table maps the GB byte encoding to the UTF hexadecimal code (the %uXXXX form) rather than to the UTF-8 byte encoding.
In JavaScript, escape and unescape convert to and from the %uXXXX hexadecimal form. The idea for converting GB Chinese characters to UTF Chinese characters is therefore: start from the percent-encoded GB form of the text (for example %D2%BB, as it arrives in a URL from a gb2312 page), look up the corresponding UTF hexadecimal code in the table, and call unescape("hexadecimal UTF code") to obtain the UTF Chinese characters.
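The lookup and unescape steps can be sketched in a few lines. This is a simplified illustration, not the server program given in this post: sampleTable is a two-entry stand-in for the real gb-utf.txt table, and the function names are hypothetical.

```javascript
// Two-entry excerpt standing in for the full GB2312 -> Unicode table (assumed layout).
var sampleTable = {
  "%a1%a1": "%u3000", // ideographic (full-width) space
  "%d2%bb": "%u4e00"  // the character U+4E00
};

// Lookup step: replace each %XX%XX pair whose bytes are A0h or above
// with its %uXXXX escape from the table.
function gbEscapesToUnicodeEscapes(gbEscaped) {
  return gbEscaped.replace(/%[a-f][0-9a-f]%[a-f][0-9a-f]/gi, function (pair) {
    var hit = sampleTable[pair.toLowerCase()];
    return hit ? hit : "%u3000"; // unknown codes fall back to a full-width space
  });
}

// Final step: unescape turns %uXXXX escapes into characters.
function gbEscapedToText(gbEscaped) {
  return unescape(gbEscapesToUnicodeEscapes(gbEscaped));
}

console.log(gbEscapesToUnicodeEscapes("%D2%BB"));          // %u4e00
console.log(gbEscapedToText("abc%D2%BB") === "abc\u4e00"); // true
```

An object keyed by the lowercase GB escape keeps the lookup simple here; the server program instead searches one long cached string, which suits the ASP Application object.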
The middle step is the important one, and it is the only one my program performs; the other two steps need no custom code. The conversion program follows:
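function genCodeStr() {
    // Each table line pairs a GB code with a UTF code in quotes; capture both sides.
    // \s* tolerates an optional space after the colon.
    var codeRE = new RegExp("'(.*)':\\s*'(.*)'", "gi");
    var tempStr, codeStr = "";
    var myReader = new Reader();
    myReader.loadFile('inc/gb2312_utf.txt'); // change this to the path where your table is stored
    // Read to the end of the file (AtEndOfStream, not AtEndOfLine).
    while (!myReader.fStream.AtEndOfStream) {
        tempStr = new String(myReader.fStream.ReadLine());
        // Append "gbcode:utfcode:" for every table line.
        codeStr += tempStr.replace(codeRE, "$1") + ":" + tempStr.replace(codeRE, "$2") + ":";
    }
    myReader.closeFile();
    // Cache the flattened table in the ASP Application object.
    Application("codeData") = codeStr;
}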
function getCodeStr() {
    var codeStr = new String(Application("codeData"));
    // Rebuild the cache if the first table entry (%a1%a1) is missing.
    if (codeStr.indexOf("%a1%a1") == -1) {
        genCodeStr();
    }
    return new String(Application("codeData"));
}
function gb2utf(gbStr) {
    var codeStr = getCodeStr();
    // A GB character is a pair of escapes, %XX%XX (6 characters),
    // which matches the offsets used below.
    var codeRE = new RegExp("(%..%..)", "gi");
    var replaceRE = new RegExp("(%..%..)", "i");
    var gbCode;
    var utfCode;
    var gbStart;
    while ((codeRE.lastIndex < gbStr.length) && replaceRE.test(gbStr)) {
        codeRE.exec(gbStr);
        gbCode = new String(RegExp.$1);
        gbStart = new Number(codeStr.indexOf(gbCode.toLowerCase()));
        var utfStart = 0;
        if (gbStart != -1) {
            // Skip the 6-character GB code and the ":" separator.
            utfStart = gbStart + 7;
            utfCode = codeStr.substring(utfStart, utfStart + 6);
        } else {
            // Unknown code: substitute a full-width space.
            utfCode = "%u3000";
        }
        gbStr = gbStr.replace(replaceRE, utfCode);
    }
    return gbStr;
}
function Reader() { // class Reader
    this.fso;     // private fso
    this.fUri;    // private fUri
    this.fStream; // private fStream
    try {
        this.fso = new ActiveXObject("Scripting.FileSystemObject");
    } catch (exception) {
        throw exception;
    }
    this.loadFile = function (file) { // public loadFile(file)
        this.fUri = Server.MapPath(file);
        // var fStream = fso.CreateTextFile(tfolder, true, false);
        // fStream.WriteLine('test');
        if (this.fso.FileExists(this.fUri)) {
            this.fStream = this.fso.OpenTextFile(this.fUri);
        } else {
            Response.Write('file does not exist');
        }
    };
    // Skip to line number num and return it.
    this.readLineN = function (num) {
        var i = 1;
        while (i < num && !this.fStream.AtEndOfStream) {
            this.fStream.SkipLine();
            i++;
        }
        return this.fStream.ReadLine();
    };
    this.closeFile = function () {
        this.fStream.Close();
        this.fso = null; // FileSystemObject has no Close method; just release it
    };
}
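The offsets 7 and 6 in gb2utf depend on how genCodeStr flattens the table into one string. The sketch below reproduces that layout without the ASP objects so it can be tried anywhere. It assumes table lines of the form '%a1%a1':'%u3000', which is a guess based on the regular expression in genCodeStr; buildCodeStr and lookupUtf are hypothetical names.

```javascript
// Flatten table lines into "gbcode:utfcode:gbcode:utfcode:...".
// Assumed line format: '%a1%a1':'%u3000'
function buildCodeStr(lines) {
  var lineRE = /'(.*)':\s*'(.*)'/;
  var codeStr = "";
  for (var i = 0; i < lines.length; i++) {
    var m = lineRE.exec(lines[i]);
    if (m) {
      codeStr += m[1] + ":" + m[2] + ":";
    }
  }
  return codeStr;
}

// Find a 6-character GB escape and read the 6-character UTF escape after it.
function lookupUtf(codeStr, gbCode) {
  var gbStart = codeStr.indexOf(gbCode.toLowerCase());
  if (gbStart === -1) {
    return "%u3000"; // same fallback as gb2utf
  }
  var utfStart = gbStart + 7; // 6 characters of GB code plus the ":" separator
  return codeStr.substring(utfStart, utfStart + 6);
}

var codeStr = buildCodeStr(["'%a1%a1':'%u3000'", "'%d2%bb':'%u4e00'"]);
console.log(codeStr);                      // %a1%a1:%u3000:%d2%bb:%u4e00:
console.log(lookupUtf(codeStr, "%D2%BB")); // %u4e00
```

Because every GB key and every UTF value is exactly 6 characters long, a plain indexOf plus fixed offsets is enough and no per-entry parsing is needed at lookup time.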
See the gb-utf.txt table for the mappings. Change the path the program reads it from to suit your own setup.
Note that the above program must run on the server (it is classic ASP written in JScript), because it relies on file operations.
For javascript client programs, see