The Chinese character exchange code standard (GB2312) is divided into two levels. Level 1 holds the 3755 commonly used characters, ordered by pinyin; Level 2 holds 3008 less common characters, ordered by radical. The GB2312 code range is 2121H-777EH.
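As a rough illustration of how the exchange code relates to the bytes that show up in GB URL encoding: the two-byte form used in files and URLs is the exchange code with the high bit of each byte set (0x80 added to each). The function names below are hypothetical, and the sketch only range-checks the code; it does not check whether a position is actually assigned.

```javascript
// Convert a GB2312 exchange code (0x2121-0x777E) to its two in-file bytes
// by adding 0x80 to each byte (hypothetical helper, for illustration only).
function exchangeToEucCn(code) {
    var hi = (code >> 8) & 0xff;
    var lo = code & 0xff;
    if (hi < 0x21 || hi > 0x77 || lo < 0x21 || lo > 0x7e) {
        throw new RangeError("not a GB2312 exchange code: 0x" + code.toString(16));
    }
    return [hi + 0x80, lo + 0x80];
}

// Format bytes the way URL encoding shows them.
function toPercent(bytes) {
    return bytes.map(function (b) {
        return "%" + b.toString(16).toUpperCase();
    }).join("");
}

// 0x523B is the exchange code of the character "一" (one).
console.log(toPercent(exchangeToEucCn(0x523b))); // %D2%BB
```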
Unicode, in the 16-bit form discussed here, is a fixed two-byte encoding: even ASCII characters are represented with two bytes. A code page such as GB2312, by contrast, uses the value range of the high byte to decide whether a byte is an ASCII character or the high byte of a Chinese character. If the data is corrupted and a byte is lost or damaged, the Chinese characters that follow can be misread. Because Unicode always uses two bytes per character, its most obvious advantage is that it simplifies the processing of Chinese characters.
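The high-byte rule and its fragility can be sketched as follows. This is a simplified model, not a full GB2312 decoder: it assumes any byte with the high bit set is a lead byte, and `splitCodePageBytes` is a hypothetical name.

```javascript
// Split a code-page byte stream into characters: a byte below 0x80 is ASCII,
// a byte with the high bit set is treated as the lead byte of a 2-byte character.
function splitCodePageBytes(bytes) {
    var chars = [];
    for (var i = 0; i < bytes.length; i++) {
        if (bytes[i] < 0x80 || i + 1 >= bytes.length) {
            chars.push([bytes[i]]);               // ASCII, or a dangling lead byte
        } else {
            chars.push([bytes[i], bytes[i + 1]]); // lead byte plus trail byte
            i++;                                  // consume the trail byte too
        }
    }
    return chars;
}

var intact = [0xd2, 0xbb, 0xd2, 0xbb]; // the character "一" twice, in GB2312
var damaged = intact.slice(1);         // the first byte has been lost

console.log(splitCodePageBytes(intact));  // two correct pairs: D2 BB, D2 BB
console.log(splitCodePageBytes(damaged)); // misaligned: BB D2, then a stray BB
```

After a single lost byte, every following pair is read one byte out of step, which is the confusion described above.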
For more on encodings, see:
Baidu's pages use GB2312, so its URL encoding is derived from GB as well. For the character "一" (one), for example, Baidu produces %D2%BB, whereas sites that convert from UTF-8, such as Google, produce %E4%B8%80. (GB2312 is a 2-byte encoding; UTF-8 is a variable-length encoding that uses 3 bytes for this character.)
You can use JavaScript's encodeURI and decodeURI to reproduce these results, and you can set the page encoding to compare the different results.
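A quick check of the two forms for the same character is sketched below. Note that in current standards-based engines encodeURI always works from UTF-8, so this only shows the UTF-8 side and the %uXXXX form; escape and unescape are legacy functions, used here because the converter relies on them.

```javascript
var ch = "\u4e00"; // the character "一" (one)

console.log(encodeURI(ch)); // %E4%B8%80  (UTF-8 bytes, percent-encoded)
console.log(escape(ch));    // %u4E00     (16-bit code unit in hexadecimal)

// Both forms decode back to the same character.
console.log(decodeURI("%E4%B8%80") === ch); // true
console.log(unescape("%u4E00") === ch);     // true
```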
I searched the Internet and found no ready-made conversion routine, so I had to write my own. Fortunately, GB-to-UTF comparison tables are easy to find online, and one can be used after a little modification: gb-utf.txt
Note that this table maps each GB code to the hexadecimal (%uXXXX) form of the UTF code, not to a byte-by-byte encoding.
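As a sketch of what one table entry looks like when parsed: the line layout `'%d2%bb':'%u4e00'` is an assumption about the file, and `parseTableLine` is a hypothetical helper.

```javascript
// Assumed line layout: 'gbcode':'utfcode', for example '%d2%bb':'%u4e00'
var lineRE = /'(.*)':'(.*)'/;

function parseTableLine(line) {
    var m = lineRE.exec(line);
    return m ? { gb: m[1], utf: m[2] } : null; // null for lines that do not match
}

var entry = parseTableLine("'%d2%bb':'%u4e00'");
console.log(entry.gb);            // %d2%bb
console.log(entry.utf);           // %u4e00
console.log(unescape(entry.utf)); // 一
```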
In JavaScript, escape and unescape convert to and from this hexadecimal form, so converting GB Chinese characters to UTF takes three steps: call encodeURI("GB characters"), look up the hexadecimal UTF code in the comparison table, then call unescape("hexadecimal UTF code") to get the UTF Chinese characters.
The middle step is the critical one. My converter implements only that step; the other two are direct calls to the two functions. Here is the converter:
// Load the comparison table and cache it as one "gb:utf:gb:utf:..." string.
function genCodeStr() {
    var codeRE = new RegExp("'(.*)':'(.*)'", "gi");
    var tempStr, codeStr = "";
    var myReader = new Reader();
    myReader.loadFile('inc/gb2312_utf.txt'); // change this to the path where your table is stored
    // AtEndOfStream (rather than AtEndOfLine) so that the whole file is read
    while (!myReader.fStream.atEndOfStream) {
        tempStr = new String(myReader.fStream.readLine());
        codeStr += tempStr.replace(codeRE, "$1") + ":" + tempStr.replace(codeRE, "$2") + ":";
    }
    myReader.closeFile();
    Application("CodeData") = codeStr;
}

// Return the cached table, generating it first if it is not loaded yet.
function getCodeStr() {
    var codeStr = new String(Application("CodeData"));
    if (codeStr.indexOf("%a1%a1") == -1) {
        genCodeStr();
    }
    return new String(Application("CodeData"));
}

// Replace each %xx%xx GB code in gbStr with its %uXXXX code from the table.
function gb2utf(gbStr) {
    var codeStr = getCodeStr();
    var codeRE = new RegExp("(%..%..)", "gi");
    var replaceRE = new RegExp("(%..%..)", "i");
    var gbCode;
    var utfCode;
    var gbStart;
    while ((codeRE.lastIndex < gbStr.length) && replaceRE.test(gbStr)) {
        codeRE.exec(gbStr);
        gbCode = new String(RegExp.$1);
        gbStart = new Number(codeStr.indexOf(gbCode.toLowerCase()));
        var utfStart = 0;
        if (gbStart != -1) {
            utfStart = gbStart + 7; // skip the 6-character GB code and the ":"
            utfCode = codeStr.substring(utfStart, utfStart + 6);
        } else {
            utfCode = "%u3000"; // unknown code: fall back to the ideographic space
        }
        gbStr = gbStr.replace(replaceRE, utfCode);
    }
    return gbStr;
}

function Reader() { // class Reader()
    this.fso = null;     // private fso
    this.fUri = null;    // private fUri
    this.fStream = null; // private fStream
    try {
        this.fso = new ActiveXObject("Scripting.FileSystemObject");
    } catch (exception) {
        throw exception;
    }
    this.loadFile = function (file) { // public loadFile(file)
        this.fUri = Server.MapPath(file);
        // Leftover test code, disabled:
        // var fStream = fso.CreateTextFile(tFolder, true, false);
        // fStream.writeLine('Test');
        if (this.fso.fileExists(this.fUri)) {
            this.fStream = this.fso.openTextFile(this.fUri);
        } else {
            Response.Write('File does not exist');
        }
    }
    // Skip to line number num (1-based) and return it.
    this.readLineN = function (num) {
        var i = 1;
        while (i < num && !this.fStream.atEndOfStream) {
            this.fStream.skipLine();
            i++;
        }
        return this.fStream.readLine();
    }
    this.closeFile = function () {
        this.fStream.close();
        this.fso = null; // FileSystemObject has no close method; just release it
    }
}
The comparison table is gb-utf.txt; change the path it is read from to suit your setup. Also note that the program above must run on the server side, because it involves file operations.
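To try the lookup step without a server, the sketch below swaps the file and the Application cache for a tiny in-memory table in the same "gb:utf:" layout. It is not the converter above: `gb2utfSketch` is a hypothetical name, the two table entries are just sample data, and a single replace with a callback stands in for the loop.

```javascript
// Tiny stand-in for the flattened table: "gb:utf:" pairs, each code 6 characters long.
var codeStr = "%a1%a1:%u3000:%d2%bb:%u4e00:";

function gb2utfSketch(gbStr) {
    // Replace every %xx%xx pair with its %uXXXX entry, or %u3000 if it is unknown.
    return gbStr.replace(/%[0-9a-f]{2}%[0-9a-f]{2}/gi, function (gbCode) {
        var gbStart = codeStr.indexOf(gbCode.toLowerCase());
        if (gbStart === -1) {
            return "%u3000";
        }
        // Skip the 6-character GB code and the ":" to reach the UTF code.
        return codeStr.substring(gbStart + 7, gbStart + 13);
    });
}

var utfCode = gb2utfSketch("%D2%BB");
console.log(utfCode);           // %u4e00
console.log(unescape(utfCode)); // 一
```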