Converting GB2312 encoding to UTF-8 encoding in JavaScript

Source: Internet
Author: User

The Chinese character standard exchange code (GB2312) divides its characters into two levels. Level 1 holds the 3,755 commonly used characters, arranged in pinyin alphabetical order; level 2 holds 3,008 less common characters, arranged by radical. The GB2312 code range is 2121H-777EH.
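As an illustration of the 2121H-777EH range, here is a minimal sketch. It assumes the byte form commonly used in GB2312 web pages (EUC-CN), in which each byte of the exchange code has its high bit set; the function names are hypothetical and not part of the original program.

```javascript
// Convert an EUC-CN byte pair back to the 7-bit exchange code by clearing
// the high bit of each byte (assumption: the input is EUC-CN encoded).
function eucToExchangeCode(lead, trail) {
  return ((lead & 0x7f) << 8) | (trail & 0x7f);
}

// Check that each byte of the exchange code lies in 21H-7EH and that the
// first byte does not exceed 77H, i.e. the code is inside 2121H-777EH.
function isInGb2312Range(code) {
  var hi = code >> 8;
  var lo = code & 0xff;
  return hi >= 0x21 && hi <= 0x77 && lo >= 0x21 && lo <= 0x7e;
}

var code = eucToExchangeCode(0xd2, 0xbb); // the bytes D2 BB discussed in this article
console.log(code.toString(16));           // 523b
console.log(isInGb2312Range(code));       // true
```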

Unicode, in the two-byte form discussed here, is a full two-byte encoding: it uses two bytes even for ASCII characters. A code page such as GB2312, by contrast, decides from the value range of the high byte whether a byte is an ASCII character or the high byte of a Chinese character. If data corruption damages part of the content, the Chinese characters that follow can be garbled as well. Because Unicode represents every character with two bytes, its most obvious advantage is that it simplifies the processing of Chinese characters.
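The high-byte rule and the effect of corruption can be sketched as follows. This is only an illustration: the function name and the sample byte sequences are hypothetical, and the threshold of 0x80 is an assumption matching GB2312 as used in web pages.

```javascript
// Split a GB-encoded byte sequence into characters: a byte below 0x80 is
// one ASCII character; a byte of 0x80 or above is treated as the high byte
// of a two-byte Chinese character and consumes the following byte as well.
function splitGbBytes(bytes) {
  var units = [];
  for (var i = 0; i < bytes.length; i++) {
    if (bytes[i] < 0x80) {
      units.push([bytes[i]]);
    } else {
      units.push(bytes.slice(i, i + 2));
      i++; // skip the low byte that was just consumed
    }
  }
  return units;
}

var intact = [0x41, 0xd2, 0xbb, 0xd2, 0xbb, 0x42]; // "A", D2 BB, D2 BB, "B"
var damaged = [0x41, 0xbb, 0xd2, 0xbb, 0x42];      // the first D2 byte was lost

console.log(JSON.stringify(splitGbBytes(intact)));  // [[65],[210,187],[210,187],[66]]
console.log(JSON.stringify(splitGbBytes(damaged))); // [[65],[187,210],[187,66]]
```

In the damaged sequence every pair after the lost byte is misaligned, and the trailing ASCII "B" is swallowed as the low byte of a bogus character.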

The article on encoding can refer to:


Baidu's pages are GB2312, so its URL encoding is naturally derived from GB as well. For the character "一" (one), for example, Baidu produces %D2%BB, whereas conversion from UTF-8, as used by Google, produces %E4%B8%80. (GB is a two-byte encoding; UTF-8 is a variable-length encoding that uses three bytes for this character.)

You can use JavaScript's encodeURI and decodeURI to obtain these results, and you can change the page encoding to compare the different results.
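For instance, in a standards-conforming JavaScript engine the UTF-8 form of the character can be reproduced as in the short sketch below. The variable name is hypothetical; these built-in functions always work with UTF-8, so this shows only the UTF-8 side of the comparison.

```javascript
var ch = "\u4e00"; // the character "一"

console.log(encodeURI(ch));                 // %E4%B8%80
console.log(encodeURIComponent(ch));        // %E4%B8%80
console.log(decodeURI("%E4%B8%80") === ch); // true
```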

I searched the Internet but did not find a ready-made conversion program, so I had to write my own. Fortunately, GB-to-UTF comparison tables are easy to find online, and one can be used after a few modifications: gb-utf.txt

This table maps each GB code to its hexadecimal (base-16) Unicode code, not to a UTF-8 byte sequence.

In JavaScript, escape and unescape work with hexadecimal codes, so converting GB Chinese characters to UTF takes three steps: call encodeURI("GB characters"), look up the hexadecimal UTF code in the comparison table, then call unescape("hexadecimal UTF code") to get the UTF Chinese characters.
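The lookup and unescape steps can be sketched in isolation as follows. The two-entry table and the function name are hypothetical stand-ins for the full comparison table, and the input is assumed to be already in percent-escaped GB form.

```javascript
// Hypothetical two-entry excerpt of a GB-to-Unicode comparison table,
// keyed by the lowercase percent-escaped GB bytes.
var sampleTable = {
  "%d2%bb": "%u4e00", // "一"
  "%a1%a1": "%u3000"  // ideographic space
};

// Replace each escaped GB byte pair with its %uXXXX code from the table,
// then let unescape() turn the %uXXXX codes into characters.
function gbEscapedToText(gbEscaped, table) {
  var hexForm = gbEscaped.replace(/%[0-9a-f]{2}%[0-9a-f]{2}/gi, function (pair) {
    var code = table[pair.toLowerCase()];
    return code !== undefined ? code : pair; // leave unknown pairs untouched
  });
  return unescape(hexForm);
}

console.log(escape("\u4e00"));                                    // %u4E00
console.log(gbEscapedToText("%D2%BB", sampleTable) === "\u4e00"); // true
```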

The middle step is the critical one, and it is the only step my conversion code implements; the other two are direct calls to the built-in functions. The conversion code follows:

// Load the comparison table and cache it in the ASP Application object
// as one flat string of the form "gbcode:utfcode:gbcode:utfcode:..."
function genCodeStr() {
    // The expression assumes table lines of the form 'gbcode':'utfcode'
    var codeRE = new RegExp("'(.*)':'(.*)'", "gi");
    var tempStr, codeStr = "";
    var myReader = new Reader();
    myReader.loadFile('inc/gb2312_utf.txt'); // change this to the path where your table is stored
    while (!myReader.fStream.AtEndOfStream) {
        tempStr = new String(myReader.fStream.ReadLine());
        codeStr += tempStr.replace(codeRE, "$1") + ":" + tempStr.replace(codeRE, "$2") + ":";
    }
    myReader.closeFile();
    Application("codeData") = codeStr;
}

// Return the cached table string, loading it first if it is not cached yet
function getCodeStr() {
    var codeStr = new String(Application("codeData"));
    if (codeStr.indexOf("%a1%a1") == -1) {
        genCodeStr();
    }
    return new String(Application("codeData"));
}

// Replace every escaped GB byte pair (%xx%xx) in gbStr with its %uXXXX code
function gb2utf(gbStr) {
    var codeStr = getCodeStr();
    var pairRE = new RegExp("(%..%..)", "i"); // one escaped two-byte GB code
    var match;
    var gbCode;
    var utfCode;
    var gbStart;
    var utfStart;
    while ((match = pairRE.exec(gbStr)) != null) {
        gbCode = match[1];
        gbStart = codeStr.indexOf(gbCode.toLowerCase());
        if (gbStart != -1) {
            utfStart = gbStart + 7; // skip the 6-character GB code and the ":"
            utfCode = codeStr.substring(utfStart, utfStart + 6);
        } else {
            utfCode = "%u3000"; // not in the table: use an ideographic space
        }
        gbStr = gbStr.replace(pairRE, utfCode);
    }
    return gbStr;
}

function Reader() { // class Reader()
    this.fso;     // private fso
    this.fUri;    // private fUri
    this.fStream; // private fStream
    try {
        this.fso = new ActiveXObject("Scripting.FileSystemObject");
    } catch (exception) {
        throw exception;
    }
    this.loadFile = function (file) { // public loadFile(file)
        this.fUri = Server.MapPath(file);
        if (this.fso.FileExists(this.fUri)) {
            this.fStream = this.fso.OpenTextFile(this.fUri);
        } else {
            Response.Write('File does not exist');
        }
    };
    // Skip to line number num (1-based) and return that line
    this.readLineN = function (num) {
        var i = 1;
        while (i < num && !this.fStream.AtEndOfStream) {
            this.fStream.SkipLine();
            i++;
        }
        return this.fStream.ReadLine();
    };
    this.closeFile = function () {
        this.fStream.Close();
        this.fso = null; // FileSystemObject has no Close method; just release it
    };
}

The comparison table is gb-utf.txt; you can change the path it is read from. Also note that the program above must run on the server side, because it performs file operations.
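Because the program depends on server-side objects, the table lookup at its core can be tried on its own with a small sketch like the one below. The sample string is a hypothetical two-entry excerpt in the flat "gbcode:utfcode:" layout that the program builds, and the function name is an assumption made for illustration.

```javascript
// Hypothetical excerpt of the flat lookup string: each entry is a
// 6-character GB code, ":", a 6-character %uXXXX code, and ":".
var sampleCodeStr = "%a1%a1:%u3000:%d2%bb:%u4e00:";

function lookupUtfCode(codeStr, gbCode) {
  var gbStart = codeStr.indexOf(gbCode.toLowerCase());
  if (gbStart === -1) {
    return "%u3000"; // same fallback as the program: ideographic space
  }
  var utfStart = gbStart + 7; // skip the 6-character GB code and the ":"
  return codeStr.substring(utfStart, utfStart + 6);
}

console.log(lookupUtfCode(sampleCodeStr, "%D2%BB")); // %u4e00
console.log(lookupUtfCode(sampleCodeStr, "%B0%A1")); // %u3000 (not in this excerpt)
```

Storing the table as one string keeps it easy to cache in a single application variable, at the cost of a linear search for every lookup.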


