Summary of garbled Chinese characters in Ajax

Last Update:2014-07-17 Source: Internet

Author: User

This article covers common Chinese-character problems in Ajax: it analyzes why Chinese text becomes garbled and shows how to fix it.

1. HTTP protocol encoding rules

In HTTP, the browser cannot send certain special characters to the server as they are. They must be URL encoded before transmission. The URL encoding rules are:

A space is converted to a plus sign (+).

The characters 0-9, a-z, and A-Z remain unchanged.

Every other character is converted to its bytes in the current character set, and each byte is written as a percent sign (%) followed by two hexadecimal digits. For example, "+" becomes %2B, "=" becomes %3D, and "&" becomes %26. The Chinese character "国" becomes %B9%FA when the character set is GB2312 or GBK.

The byte value of a Chinese character depends on the character set, and URL encoding operates on those bytes. The same character therefore produces different URL-encoded results under different character sets. For example, "国" becomes %E5%9B%BD under UTF-8.
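As a rough illustration of the rules above, the following sketch applies them to the UTF-8 bytes of a string. The helper name formUrlEncodeUtf8 is hypothetical, and UTF-8 is an assumed character set: a page encoded in GBK would produce different bytes for the same Chinese character.

```javascript
// Hypothetical helper: applies the three rules to the UTF-8 bytes of a string.
function formUrlEncodeUtf8(text) {
  const bytes = new TextEncoder().encode(text); // TextEncoder always emits UTF-8
  let out = "";
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    if (/[0-9A-Za-z]/.test(ch)) {
      out += ch; // rule 2: digits and letters stay unchanged
    } else if (ch === " ") {
      out += "+"; // rule 1: a space becomes a plus sign
    } else {
      // rule 3: every other byte becomes %XX
      out += "%" + b.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(formUrlEncodeUtf8("a b+c=国&")); // a+b%2Bc%3D%E5%9B%BD%26
```

The three bytes %E5%9B%BD come from UTF-8. Under GB2312 or GBK the same character occupies two bytes, which is why the server has to know which character set the client used.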

2. The encodeURI() and encodeURIComponent() functions

JavaScript provides two functions for URL encoding: encodeURI() and encodeURIComponent(). The difference is that encodeURI() leaves untouched the characters that have a special meaning in a URI, namely ; , / ? : @ & = + $ #, as well as - _ . ! ~ * ' ( ). encodeURIComponent() encodes more characters: it leaves only - _ . ! ~ * ' ( ) untouched, so a URI delimiter such as "/" is encoded. Both functions encode a value by taking the UTF-8 bytes of each character. For example, the UTF-8 encoding of "张三" (Zhang San) is 0xE5BCA0E4B889 (the 0x prefix marks a hexadecimal value): "张" is 0xE5BCA0 and "三" is 0xE4B889, so the encoded result is "%E5%BC%A0%E4%B8%89".

Note that the result has nothing to do with the encoding of the web page, because both functions always take the UTF-8 bytes of the character and then URL encode them. Whether the page is encoded in GBK or UTF-8, the result is the same.

Therefore, if the request we send to the server contains Chinese characters or special characters such as a space, we need to use one of these two functions to URL encode them. Both functions encode a space as %20 rather than +.
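To make the difference between the two functions concrete, here is a minimal sketch that can be run in Node.js or a browser console. The sample strings are made up for illustration.

```javascript
// Both functions always work on UTF-8 bytes, whatever the page encoding is.
const encodedByUri = encodeURI("张三");
const encodedByComponent = encodeURIComponent("张三");
console.log(encodedByUri);       // %E5%BC%A0%E4%B8%89
console.log(encodedByComponent); // %E5%BC%A0%E4%B8%89

// URI delimiters survive encodeURI but not encodeURIComponent.
const sampleQuery = "a=1&b=x/y?z";
const queryByUri = encodeURI(sampleQuery);
const queryByComponent = encodeURIComponent(sampleQuery);
console.log(queryByUri);         // a=1&b=x/y?z
console.log(queryByComponent);   // a%3D1%26b%3Dx%2Fy%3Fz

// A space becomes %20 in both; "+" is only encoded by encodeURIComponent.
const spaceAndPlusByUri = encodeURI("a b+c");
const spaceAndPlusByComponent = encodeURIComponent("a b+c");
console.log(spaceAndPlusByUri);       // a%20b+c
console.log(spaceAndPlusByComponent); // a%20b%2Bc
```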

3. Encapsulate the Ajax request code for later use

Create a new web project and add an ajax.js file to it. The file contains the following two functions:

createXMLHttp()

function createXMLHttp() {
    if (window.XMLHttpRequest) {
        // alert("non-IE browser");
        return new XMLHttpRequest();
    } else if (window.ActiveXObject && !window.XMLHttpRequest) {
        // Old versions of IE: try the known ProgIDs from newest to oldest
        var aVersion = ["Msxml2.XMLHTTP.6.0",
            "Msxml2.XMLHTTP.5.0", "Msxml2.XMLHTTP.4.0",
            "Msxml2.XMLHTTP.3.0", "Msxml2.XMLHTTP",
            "Microsoft.XMLHTTP"];
        for (var i = 0; i < aVersion.length; i++) {
            try {
                var oXmlHttp = new ActiveXObject(aVersion[i]);
                // alert("IE browser version " + aVersion[i]);
                return oXmlHttp;
            } catch (ex) {}
        }
        throw new Error("An error occurred while creating the XMLHttpRequest object!");
    }
}

The doGet(url, callback) function takes two parameters. Whenever you need to send an Ajax GET request, you can call it directly. The first parameter is the URL of the request, and the second is a callback function that processes the data returned from the server.

/**
 * @param url the URL of the request
 * @param callback the callback function
 * @return
 */
function doGet(url, callback) {
    var request = createXMLHttp();
    request.onreadystatechange = function () {
        if (request.readyState == 4 && request.status == 200) {
            // The callback must declare one parameter to receive the returned data
            callback(request.responseText);
        }
    };
    request.open("GET", url);
    request.send(null);
}

4. Write a page that uses the UTF-8 character set

HTML section:

<body>
<h3>Verify whether the user name exists</h3>
Input username: <input type="text" id="username"/>
<input type="button" value="Verify" onclick="checkUserName('username')"/>
<span id="warning"></span><br/>
</body>

JavaScript section:

First include the ajax.js file, and then write the code that runs when the button is clicked:

<script type="text/javascript" src="ajax.js"></script>
<script type="text/javascript">
function checkUserName(tagId) {
    // Obtain the value entered in the text box
    var username = document.getElementById(tagId).value;
    // URL encode the Chinese characters
    var url = "ajax.do?" + encodeURI("username=" + username); // ①
    // data is the data returned from the server
    doGet(url, function (data) {
        document.getElementById("warning").innerHTML = data;
    });
}
</script>

Page effect: after you enter "张三" (Zhang San) in the text box and click "Verify", the JavaScript code runs to ① and the value of url becomes "ajax.do?username=%E5%BC%A0%E4%B8%89". You can use the Firebug plug-in for Firefox to set a breakpoint and inspect the URL that is sent.

Why is encodeURIComponent() not used here? Because when it is applied to the whole query string, encodeURIComponent() changes "=" to "%3D" and "?" to "%3F", and with multiple parameters the "&" separator is converted to "%26" as well. These delimiters must be sent unconverted, so encodeURI() is used here: it leaves "?", "=", and "&" alone. The trailing "%E5%BC%A0%E4%B8%89" is the result of URL encoding the two characters "张三" with the UTF-8 character set.
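One caveat is worth sketching: because encodeURI() leaves "&" and "=" alone, a value that itself contains those characters is split into extra parameters. A common alternative, shown here only as an illustration and not as the approach used in this article, is to apply encodeURIComponent() to each value separately. The helper name buildQueryUrl and the sample value are hypothetical.

```javascript
// Hypothetical helper: encodes each key and value on its own,
// so the "?", "=" and "&" delimiters are added unencoded afterwards.
function buildQueryUrl(base, params) {
  const pairs = Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + "=" + encodeURIComponent(params[key]);
  });
  return base + "?" + pairs.join("&");
}

// Encoding the whole query string: the "&" inside the value is left as is.
const wholeString = "ajax.do?" + encodeURI("username=" + "张&三");
console.log(wholeString); // ajax.do?username=%E5%BC%A0&%E4%B8%89

// Encoding the value alone: the "&" inside the value becomes %26.
const perValue = buildQueryUrl("ajax.do", { username: "张&三" });
console.log(perValue);    // ajax.do?username=%E5%BC%A0%26%E4%B8%89
```

For values that contain no delimiter characters, such as "张三", both approaches produce the same URL.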

5. Obtain the submitted data on the server

Write a servlet mapped to /ajax.do. Its doGet method is as follows:

public void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    // Tell the client that the response is encoded in UTF-8
    response.setContentType("text/html; charset=UTF-8");
    String username = request.getParameter("username"); // ②
    PrintWriter out = response.getWriter();
    out.print("The user name you want to verify is: " + username + ", this user name can be used");
}

We set a breakpoint at ② and start Tomcat in debug mode. After the page submits the request, the value of username turns out to be garbled ("???"). Why?

Let's analyze. The request that the client sends to the server through Ajax is

"ajax.do?username=%E5%BC%A0%E4%B8%89",

When request.getParameter() reads the parameter value, it first URL decodes it, that is, it turns each %XX sequence back into the byte it stands for. Tomcat then converts the resulting bytes to a string using its internal default character set, ISO-8859-1 (the default for query strings in Tomcat 7 and earlier), and this is where the garbling starts: the bytes should have been converted to a string as UTF-8, but ISO-8859-1 was used instead.
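The effect described above can be reproduced outside a servlet container. The following Node.js sketch is only a stand-in for what the server does: the helper percentDecodeToBytes is hypothetical, and Node's "latin1" encoding is used in place of ISO-8859-1.

```javascript
// Hypothetical helper: turns each %XX sequence back into the byte it stands for.
function percentDecodeToBytes(encoded) {
  const bytes = [];
  for (let i = 0; i < encoded.length; i++) {
    if (encoded[i] === "%") {
      bytes.push(parseInt(encoded.slice(i + 1, i + 3), 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      bytes.push(encoded.charCodeAt(i));
    }
  }
  return Buffer.from(bytes);
}

const rawBytes = percentDecodeToBytes("%E5%BC%A0%E4%B8%89");

// Decoding the six bytes as ISO-8859-1 yields six unrelated characters.
const decodedAsLatin1 = rawBytes.toString("latin1");
// Decoding the same bytes as UTF-8 yields the two characters the client sent.
const decodedAsUtf8 = rawBytes.toString("utf8");

console.log(decodedAsLatin1.length); // 6
console.log(decodedAsUtf8);          // 张三
```

The bytes themselves arrive intact. Only the choice of character set used to turn them into a string is wrong, which is why the damage can be undone.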

Once the cause is known, the problem is easy to solve. Since the string was produced by decoding with ISO-8859-1, we convert the string back to its ISO-8859-1 bytes and then convert those bytes to a string using the correct character set, UTF-8. This yields the correct characters. Modify the servlet code as follows:

public void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    // Tell the client that the response is encoded in UTF-8
    response.setContentType("text/html; charset=UTF-8");
    System.out.println("Enter servlet");
    String username = request.getParameter("username");
    // Recover the original bytes, then decode them as UTF-8
    username = new String(username.getBytes("ISO-8859-1"), "UTF-8");
    System.out.println(username);
    PrintWriter out = response.getWriter();
    out.print("The user name you want to verify is: " + username + ", this user name can be used");
}

After this change, the response displayed on the client contains the correct Chinese characters.
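The round trip performed by new String(username.getBytes("ISO-8859-1"), "UTF-8") can be imitated in Node.js. This is a sketch that uses Buffer and the "latin1" encoding as stand-ins for the Java calls, not the servlet code itself.

```javascript
const originalName = "张三";

// Simulate the server misreading the UTF-8 bytes as ISO-8859-1.
const garbledName = Buffer.from(originalName, "utf8").toString("latin1");

// Reverse the mistake: recover the ISO-8859-1 bytes, then decode them as UTF-8.
const repairedName = Buffer.from(garbledName, "latin1").toString("utf8");

console.log(garbledName === originalName);  // false
console.log(repairedName === originalName); // true
```

The repair works because ISO-8859-1 maps every byte value to exactly one character, so no information is lost in the wrong decoding step.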

6. Try changing the submission method to POST

Add a function to the ajax.js file that submits a POST request.

/**
 * @param url the URL to submit to
 * @param submitData the data to be submitted
 * @param callback the callback function
 * @return
 */
function doPost(url, submitData, callback) {
    var request = createXMLHttp();
    request.onreadystatechange = function () {
        if (request.readyState == 4 && request.status == 200) {
            // The callback must declare one parameter to receive the returned data
            callback(request.responseText);
        }
    };
    // open() must be called before setRequestHeader()
    request.open("POST", url);
    request.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
    request.send(submitData);
}

Modify the JavaScript code on the page:

<script type="text/javascript">
function checkUserName(tagId) {
    // Obtain the value entered in the text box
    var username = document.getElementById(tagId).value;
    // data is the data returned from the server
    doPost("ajax.do", "username=" + username, function (data) {
        document.getElementById("warning").innerHTML = data;
    });
}
</script>

When we send a POST request through Ajax, the data is not URL encoded even though we have set the Content-Type to application/x-www-form-urlencoded. A traditional HTML form whose method is post, by contrast, URL encodes its data automatically on submission.

Therefore, when an Ajax POST request is sent to the server, you only need to call request.setCharacterEncoding() on the server, before reading any parameter, to set the correct character set (UTF-8, because XMLHttpRequest transmits a string body as UTF-8). The data can then be retrieved correctly.
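To see what actually travels in the body of such a POST request, the following Node.js sketch converts the unencoded string to UTF-8 bytes, which is how XMLHttpRequest serializes a string passed to send(). Buffer is used here as a stand-in for the browser, and the variable names are made up for illustration.

```javascript
const submitData = "username=" + "张三";

// The string body is serialized as UTF-8: no %XX escapes, just raw bytes.
const bodyBytes = Buffer.from(submitData, "utf8");
const bodyHex = bodyBytes.toString("hex");

console.log(bodyBytes.length); // 15 (9 ASCII bytes + 2 characters of 3 bytes each)
console.log(bodyHex);          // 757365726e616d653de5bca0e4b889

// A server that reads the body as UTF-8 recovers the original text.
const readAsUtf8 = bodyBytes.toString("utf8");
console.log(readAsUtf8 === submitData); // true
```

This is why telling the server to read the body as UTF-8 is enough in this case. Note that a value containing "&" or "=" would still need URL encoding to be parsed as a single parameter.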

7. Best solution

Although the preceding methods solve the Chinese-character problem for both GET and POST, the two cases must be handled separately. In addition, different servers use different default character sets, so the manual transcoding used for the GET method is not portable.

Is there a uniform solution for both GET and POST requests? We can do the following:

On the client, use the JavaScript encodeURI() function to encode the submitted data twice.

On the server, perform URL decoding once.

The advantage of this method is that it depends neither on the character set of the client web page nor on the default character set of the server, and it is compatible with almost all browsers.

The following uses the GET method as an example to walk through the whole process:

Modify the JavaScript code:

<script type="text/javascript">
function checkUserName(tagId) {
    // Obtain the value entered in the text box
    var username = document.getElementById(tagId).value;
    // Encode the value twice
    var url = "ajax.do?username=" + encodeURI(encodeURI(username));
    // data is the data returned from the server
    doGet(url, function (data) {
        document.getElementById("warning").innerHTML = data;
    });
}
</script>

Modify the servlet code:

public void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    // Tell the client that the response is encoded in UTF-8
    response.setContentType("text/html; charset=UTF-8");
    String username = request.getParameter("username");
    // getParameter() has already decoded once; decode the remaining layer as UTF-8
    username = URLDecoder.decode(username, "UTF-8");
    System.out.println(username);
    PrintWriter out = response.getWriter();
    out.print("The user name you want to verify is: " + username + ", this user name can be used");
}

After running this, there are no garbled characters in various browsers. The POST method does not produce garbled characters either, and the same holds if the page is changed to GBK encoding.

Why does this method work? Why do we need to call encodeURI() twice? We only need to track the submitted data:

Assume that we submit "李四" (Li Si):

① The result after the first encodeURI() is:

%E6%9D%8E%E5%9B%9B

② The result after the second encodeURI() is:

%25E6%259D%258E%25E5%259B%259B

③ Comparing the two values, we see that the first encoding contains % characters and that the second encoding replaces each % with %25, so the data finally sent is:

ajax.do?username=%25E6%259D%258E%25E5%259B%259B

④ In the servlet, when we call request.getParameter("username"), the getParameter method URL decodes %25E6%259D%258E%25E5%259B%259B. The decoded result is %E6%9D%8E%E5%9B%9B, that is, each %25 is replaced with %. This string contains only ASCII characters, so when Tomcat converts it with its default ISO-8859-1 character set, nothing changes: the value is still %E6%9D%8E%E5%9B%9B.

⑤ When we URL decode again with URLDecoder.decode(username, "UTF-8"), each %XX sequence is turned back into a byte, giving E6 9D 8E E5 9B 9B. This is exactly the UTF-8 encoding of "李四", so decoding the bytes as UTF-8 produces the string "李四".

Looking at the whole process, the advantage of this method is that it depends neither on the page encoding nor on the character set used by the server. All we need to do is apply encodeURI() twice to the submitted data, whether it is POST data or GET data.
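The whole trace can be checked with a few lines of JavaScript. In this sketch, decodeURIComponent() stands in for both server-side decoding steps (the container's getParameter() and the explicit URLDecoder.decode call). That substitution is an assumption made so the example runs without a servlet container.

```javascript
const submittedName = "李四";

// Client side: encode twice.
const encodedOnce = encodeURI(submittedName);
const encodedTwice = encodeURI(encodedOnce);
console.log(encodedOnce);  // %E6%9D%8E%E5%9B%9B
console.log(encodedTwice); // %25E6%259D%258E%25E5%259B%259B

// Server side, first decode (done by the container): only %25 -> % happens,
// and the result is pure ASCII, so the server's default character set is irrelevant.
const afterContainerDecode = decodeURIComponent(encodedTwice);
console.log(afterContainerDecode === encodedOnce); // true

// Server side, second decode (done explicitly, as UTF-8).
const decodedName = decodeURIComponent(afterContainerDecode);
console.log(decodedName); // 李四
```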

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
