Summary of Chinese Garbled-Text Problems in Ajax
This chapter covers the common Chinese character problems in Ajax: it analyzes why Chinese text becomes garbled and shows how to fix it.
1. Encoding rules in the HTTP protocol
In HTTP, the browser cannot transmit certain special characters to the server directly. These characters must be URL encoded first. The URL encoding rules are:
A space is converted to a plus sign (+).
The characters 0-9, a-z, and A-Z remain unchanged.
All other characters are converted to bytes using the current character set, and each byte is written as a percent sign (%) followed by its two-digit hexadecimal value. For example, "+" is encoded as %2B, "=" as %3D, and "&" as %26, while the Chinese character "国" ("country") is encoded as %B9%FA under GB2312. The byte value of a given Chinese character depends on the character set, and URL encoding operates on those bytes, so the same character URL encoded under different character sets produces different results.
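The dependence on the character set can be sketched in a few lines of JavaScript. This is only an illustration: percentEncodeBytes is a hypothetical helper that encodes every byte it is given (real URL encoding leaves letters and digits alone), and the GB2312 bytes of 国 ("country") are hard-coded from the value quoted above because JavaScript has no built-in GB2312 encoder. TextEncoder is assumed to be available (modern browsers and Node.js).

```javascript
// Hypothetical helper: write every byte as %XX in uppercase hex.
function percentEncodeBytes(bytes) {
    return Array.from(bytes, function (b) {
        return "%" + b.toString(16).toUpperCase().padStart(2, "0");
    }).join("");
}

// UTF-8 bytes of the character, computed at run time.
var utf8Bytes = new TextEncoder().encode("国");
// GB2312 bytes of the same character, hard-coded (assumption taken from the text).
var gb2312Bytes = [0xB9, 0xFA];

var utf8Encoded = percentEncodeBytes(utf8Bytes);
var gb2312Encoded = percentEncodeBytes(gb2312Bytes);

console.log(utf8Encoded);   // same character, three bytes under UTF-8
console.log(gb2312Encoded); // same character, two bytes under GB2312
```

The two printed strings differ even though the character is the same, which is why the sender and the receiver must agree on a character set.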
2. The encodeURI() and encodeURIComponent() functions
JavaScript provides two functions for URL encoding: encodeURI() and encodeURIComponent(). The difference is that encodeURI() leaves the following characters untouched: ! @ # $ & * ( ) = : / ; ? + ' while encodeURIComponent() encodes more of them; for example, the URI separator "/" is encoded by encodeURIComponent(). Both functions encode a value by looking up the UTF-8 bytes of each character. For example, the UTF-8 encoding of "张三" is 0xE5BCA0E4B889 (the leading 0x marks a hexadecimal value): "张" is 0xE5BCA0 and "三" is 0xE4B889, so the encoded result is "%E5%BC%A0%E4%B8%89". Note that the result does not depend on the encoding of the web page, because both functions always take the UTF-8 bytes of a character and then URL encode them. Whether the page is GBK or UTF-8, the result is the same.
Therefore, if a request we send to the server contains Chinese characters or other special characters such as a space or "+", we need to URL encode them with one of these two functions.
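A short comparison may make the difference concrete. The sample strings below are made up for illustration; only the two built-in functions are real.

```javascript
// Chinese characters: both functions produce the same UTF-8 based result.
var encodedByUri = encodeURI("张三");
var encodedByComponent = encodeURIComponent("张三");

// URI punctuation: only encodeURIComponent() touches these characters.
var sample = "username=a&b/c";
var sampleByUri = encodeURI(sample);
var sampleByComponent = encodeURIComponent(sample);

// Both functions write a space as %20 rather than as "+".
var spaceByUri = encodeURI("a b");

console.log(encodedByUri, encodedByComponent);
console.log(sampleByUri, sampleByComponent);
console.log(spaceByUri);
```

Because encodeURI() keeps "=", "&", and "/" as they are, it suits a whole URL, while encodeURIComponent() suits a single value that must not be mistaken for URL structure.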
3. Wrap the Ajax request code for later reuse
Create a new web project and add an ajax.js file to it. The file contains the following two functions:
createxmlhttp()
function createxmlhttp() {
    if (window.XMLHttpRequest) {
        // alert("non-IE browser");
        return new XMLHttpRequest();
    } else if (window.ActiveXObject && !window.XMLHttpRequest) {
        var aVersion = ["MSXML2.XMLHTTP.6.0",
            "MSXML2.XMLHTTP.5.0", "MSXML2.XMLHTTP.4.0",
            "MSXML2.XMLHTTP.3.0", "MSXML2.XMLHTTP",
            "Microsoft.XMLHTTP"];
        for (var i = 0; i < aVersion.length; i++) {
            try {
                var oXmlHttp = new ActiveXObject(aVersion[i]);
                // alert("IE browser version " + aVersion[i]);
                return oXmlHttp;
            }
            catch (ex) {}
        }
    }
    throw new Error("An error occurred while creating the XMLHttpRequest object!");
}
The doget(url, callback) function takes two parameters. Whenever you need to send an Ajax GET request, you can call it directly. The first parameter is the URL to request, and the second is a callback function that processes the data returned by the server.
/**
 * @param url the URL of the request
 * @param callback the callback function
 * @return
 */
function doget(url, callback) {
    var request = createxmlhttp();
    request.onreadystatechange = function () {
        if (request.readyState == 4 && request.status == 200) {
            // The callback is defined with one parameter, which receives the returned data.
            callback(request.responseText);
        }
    };
    request.open("GET", url);
    request.send(null);
}
4. Write a page that uses the UTF-8 character set:
HTML section:
<body>
    <h3>Verify whether the user name exists</h3>
    Input username: <input type="text" id="username"/> <span id="warning"></span> <br/>
    <input type="button" value="verify" onclick="checkusername('username')"/>
</body>
JavaScript section:
First include the ajax.js file, then write the code that runs when the button is clicked:
<script type="text/javascript" src="ajax.js"></script>
<script type="text/javascript">
function checkusername(tagid) {
    // Get the value entered in the text box
    var username = document.getElementById(tagid).value;
    // URL encode the Chinese characters
    var url = "ajax.do?" + encodeURI("username=" + username); // ①
    // data is the data returned from the server
    doget(url, function (data) {
        document.getElementById("warning").innerHTML = data;
    });
}
</script>
Page effect:
After "张三" is entered in the text box and "verify" is clicked, the JavaScript code runs to ① and the value of url becomes "ajax.do?username=%E5%BC%A0%E4%B8%89". You can use the Firebug plug-in for Firefox to set a breakpoint and inspect the URL that is sent.
Why not use encodeURIComponent() here? Because encodeURIComponent() would turn "=" into "%3D" and "?" into "%3F", and if there were several parameters, the "&" separators would be converted as well. These characters can be submitted as they are, so encodeURI() is used here; it does not convert "?", "=", or "&". The trailing "%E5%BC%A0%E4%B8%89" is the result of URL encoding the characters "张三" using the UTF-8 character set.
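One caveat is worth noting: because encodeURI() leaves "&" and "=" untouched, a user name that itself contains one of those characters would alter the structure of the query string. A common alternative, shown here as a sketch and not as the method used in this article, is to apply encodeURIComponent() to each name and value separately. buildQuery is a hypothetical helper introduced only for this illustration.

```javascript
// Hypothetical helper: encode each key and value on its own, then join them.
function buildQuery(params) {
    return Object.keys(params).map(function (key) {
        return encodeURIComponent(key) + "=" + encodeURIComponent(params[key]);
    }).join("&");
}

// A plain Chinese user name gives the same URL as the encodeURI() approach.
var urlPlain = "ajax.do?" + buildQuery({ username: "张三" });

// A value containing "&" and "=" stays a single parameter.
var urlTricky = "ajax.do?" + buildQuery({ username: "a&b=c" });

// With encodeURI() the same value would be read as two parameters.
var urlBroken = "ajax.do?" + encodeURI("username=" + "a&b=c");

console.log(urlPlain);
console.log(urlTricky);
console.log(urlBroken);
```

For the simple input used in this chapter the two approaches send identical requests, so the choice only matters once values may contain URL punctuation.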
5. Read the submitted data on the server
Write a servlet mapped to /ajax.do. Its doGet method is as follows:
public void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    // Tell the client that the response is encoded in UTF-8
    response.setContentType("text/html;charset=UTF-8");
    String username = request.getParameter("username"); // ②
    PrintWriter out = response.getWriter();
    out.print("The user name you want to verify is: " + username + ", this user name can be used");
}
We set a breakpoint at ② and start Tomcat in debug mode. After the page submits the request, the value of username turns out to be "???". Why is it garbled?
Let's analyze. The request that the client sends to the server via Ajax is
"ajax.do?username=%E5%BC%A0%E4%B8%89".
To obtain a parameter value, request.getParameter() first URL decodes it, which in effect strips the "%" signs and turns each pair of hex digits back into one byte. The resulting bytes are then converted to a string using Tomcat's internal default character set, ISO-8859-1, and this is where the garbling starts: the bytes left after removing the "%" signs should have been converted to a string as UTF-8, but ISO-8859-1 was used instead.
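The mismatch can be reproduced outside the servlet container. The sketch below assumes Node.js, whose Buffer type can read the same bytes under two character sets ("latin1" is Node's name for ISO-8859-1). percentDecodeToBytes is a hypothetical helper standing in for the URL decoding step; it is not Tomcat's actual code.

```javascript
// Hypothetical stand-in for URL decoding: each %XX becomes one byte.
function percentDecodeToBytes(text) {
    var bytes = [];
    for (var i = 0; i < text.length; i++) {
        if (text[i] === "%") {
            bytes.push(parseInt(text.slice(i + 1, i + 3), 16));
            i += 2; // skip the two hex digits just consumed
        } else {
            bytes.push(text.charCodeAt(i));
        }
    }
    return Buffer.from(bytes);
}

var rawBytes = percentDecodeToBytes("%E5%BC%A0%E4%B8%89");

// What the server does by default: one character per byte.
var readAsLatin1 = rawBytes.toString("latin1");
// What the client intended: three bytes per Chinese character.
var readAsUtf8 = rawBytes.toString("utf8");

console.log(readAsLatin1.length); // number of characters after the wrong decoding
console.log(readAsUtf8);
```

The six bytes are identical in both cases; only the character set used to interpret them differs.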
Once the cause is known, the fix is easy. Since the string was produced by decoding with ISO-8859-1, we convert it back to bytes using ISO-8859-1 and then convert those bytes to a string using the correct character set, UTF-8. This yields the correct characters. Modify the servlet code as follows:
public void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    // Tell the client that the response is encoded in UTF-8
    response.setContentType("text/html;charset=UTF-8");
    System.out.println("Enter servlet");
    String username = request.getParameter("username");
    // Undo the ISO-8859-1 decoding, then decode the same bytes as UTF-8
    username = new String(username.getBytes("ISO-8859-1"), "UTF-8");
    System.out.println(username);
    PrintWriter out = response.getWriter();
    out.print("The user name you want to verify is: " + username + ", this user name can be used");
}
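The repair works because ISO-8859-1 maps every byte to exactly one character, so the wrong decoding loses no information and can be reversed. A minimal sketch of the same round trip, assuming Node.js and its Buffer type (the variable names are made up for illustration):

```javascript
var original = "张三";

// Simulate the server's mistake: UTF-8 bytes read as ISO-8859-1.
var garbled = Buffer.from(original, "utf8").toString("latin1");

// Counterpart of new String(username.getBytes("ISO-8859-1"), "UTF-8"):
// turn the garbled string back into bytes, then read them as UTF-8.
var repaired = Buffer.from(garbled, "latin1").toString("utf8");

console.log(garbled.length); // one character per original byte
console.log(repaired);
```

The same trick would not be safe with a character set that cannot represent every byte value, because the intermediate string would then drop data.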
The client then displays the response with the user name shown correctly.
6. Switch the submission method to POST
Add a function to the ajax.js file that submits a POST request.
/**
 *
 * @param url the URL to submit to
 * @param submitdata the data to submit
 * @param callback the callback function
 * @return
 */
function dopost(url, submitdata, callback) {
    var request = createxmlhttp();
    request.onreadystatechange = function () {
        if (request.readyState == 4 && request.status == 200) {
            // The callback is defined with one parameter, which receives the returned data.
            callback(request.responseText);
        }
    };
    request.open("POST", url);
    // setRequestHeader() may only be called after open()
    request.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
    request.send(submitdata);
}
Modify the JavaScript code on the page:
<script type="text/javascript" src="ajax.js"></script>
<script type="text/javascript">
function checkusername(tagid) {
    // Get the value entered in the text box
    var username = document.getElementById(tagid).value;
    // data is the data returned from the server
    dopost("ajax.do", "username=" + username, function (data) {
        document.getElementById("warning").innerHTML = data;
    });
}
</script>
When we send a POST request this way, even though the Content-Type is set to application/x-www-form-urlencoded, the submitted data is not URL encoded automatically. A traditional HTML form whose method is set to post, by contrast, is URL encoded automatically when it is submitted.
Therefore, when an Ajax POST request reaches the server, you only need to call request.setCharacterEncoding() with the correct character set before reading any parameter, and you can then retrieve the data correctly.
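To see what "not URL encoded" means in practice, the sketch below inspects the bytes of such a request body. It relies on the fact that XMLHttpRequest transmits a string body as UTF-8, and uses TextEncoder (assumed available) to produce the same bytes. The final line shows the explicitly encoded alternative, which is a common practice and not the approach taken in this section.

```javascript
// The body exactly as it is passed to send() in the page code.
var body = "username=" + "张三";

// A string body goes over the wire as UTF-8 bytes.
var bodyBytes = new TextEncoder().encode(body);

// Hex dump of the bytes that follow "username=" (9 ASCII characters).
var valueHex = Array.from(bodyBytes.slice(9), function (b) {
    return b.toString(16).padStart(2, "0");
}).join(" ");

// Alternative: URL encode the value yourself, as a form submission would.
var encodedBody = "username=" + encodeURIComponent("张三");

console.log(bodyBytes.length);
console.log(valueHex);
console.log(encodedBody);
```

Since the raw bytes are UTF-8, the character set given to request.setCharacterEncoding() on the server has to be UTF-8 for the parameter to be read correctly.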
7. Best solution
Although the methods above solve the Chinese character problem for both GET and POST, the two cases have to be handled separately. In addition, different servers use different default character sets, so the manual transcoding used for GET cannot be relied on everywhere.
Is there a uniform solution that works for both GET and POST requests? We can do the following:
Use the JavaScript encodeURI() function to encode the submitted data twice.
On the server, perform URL decoding once.
The advantage of this method is that it does not depend on the character set of the client web page or on the default character set of the server, and it works in almost all browsers.
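The client half of this recipe can be sketched as follows. doubleEncode is a hypothetical helper name, and decodeURIComponent() is used here only to check that the two encoding layers can be peeled off again one at a time.

```javascript
// Hypothetical helper: apply encodeURI() twice to a value.
function doubleEncode(value) {
    return encodeURI(encodeURI(value));
}

var once = encodeURI("张三");
var twice = doubleEncode("张三");

// After two passes only ASCII characters remain, so no character set
// on the way to the server can misread the value.
var isPlainAscii = /^[\x00-\x7F]*$/.test(twice);

// Removing one layer gives back the single-encoded form,
// removing the second layer gives back the original text.
var oneLayerOff = decodeURIComponent(twice);
var twoLayersOff = decodeURIComponent(oneLayerOff);

console.log(once);
console.log(twice);
console.log(isPlainAscii, twoLayersOff);
```

The second pass changes only the "%" signs, because everything else in the single-encoded string is already a character that encodeURI() leaves alone.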
The following walks through the whole process using the GET method as an example:
Modify the JavaScript code:
<script type="text/javascript" src="ajax.js"></script>
<script type="text/javascript">
function checkusername(tagid) {
    // Get the value entered in the text box
    var username = document.getElementById(tagid).value;
    // Encode the submitted value twice with encodeURI()
    var url = "ajax.do?username=" + encodeURI(encodeURI(username));
    // data is the data returned from the server
    doget(url, function (data) {
        document.getElementById("warning").innerHTML = data;
    });
}
</script>
Modify the servlet code:
public void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    // Tell the client that the response is encoded in UTF-8
    response.setContentType("text/html;charset=UTF-8");
    String username = request.getParameter("username");
    // getParameter() has already decoded once; decode the second layer as UTF-8
    username = URLDecoder.decode(username, "UTF-8");
    System.out.println(username);
    PrintWriter out = response.getWriter();
    out.print("The user name you want to verify is: " + username + ", this user name can be used");
}
After this change, no garbling occurs in the various browsers. The POST method does not produce garbled characters either, and the result is the same if the page is switched to GBK encoding.
Why does this method work? Why is encodeURI() needed twice? We only need to trace the submitted data:
Assume that we submit "李四":
① After the first encodeURI() call, the result is:
%E6%9D%8E%E5%9B%9B
② After the second encodeURI() call, the result is:
%25E6%259D%258E%25E5%259B%259B
③ Comparing the two values, the first result contains "%" characters, and the second encoding replaces each "%" with "%25". The data finally sent is therefore:
ajax.do?username=%25E6%259D%258E%25E5%259B%259B
④ In the servlet, when we call request.getParameter("username"), the getParameter method URL decodes %25E6%259D%258E%25E5%259B%259B. The decoded result is %E6%9D%8E%E5%9B%9B, that is, each "%25" is turned back into "%". All of these characters are plain ASCII, so when Tomcat converts them to a string with its default ISO-8859-1 nothing changes, and the value is still %E6%9D%8E%E5%9B%9B.
⑤ When we URL decode once more with URLDecoder.decode(username, "UTF-8"), the "%" signs are removed, leaving the bytes E69D8EE59B9B. These are exactly the UTF-8 bytes of "李四", so converting them to a string as UTF-8 gives "李四".
Looking at the whole process, the advantage of this method is that it depends neither on the page encoding nor on the character set used by the server. All we need to do is apply encodeURI() twice to the submitted data (whether POST data or GET data).
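The whole trace can be replayed in one script. This is a simulation under stated assumptions, not Tomcat's code: it assumes Node.js, containerDecode is a hypothetical stand-in for the decoding that getParameter() performs (URL decode, then read the bytes as ISO-8859-1), and decodeURIComponent() stands in for URLDecoder.decode(username, "UTF-8").

```javascript
// Hypothetical stand-in for the container: %XX to bytes, bytes read as ISO-8859-1.
function containerDecode(value) {
    var bytes = [];
    for (var i = 0; i < value.length; i++) {
        if (value[i] === "%") {
            bytes.push(parseInt(value.slice(i + 1, i + 3), 16));
            i += 2; // skip the two hex digits just consumed
        } else {
            bytes.push(value.charCodeAt(i));
        }
    }
    return Buffer.from(bytes).toString("latin1");
}

var submitted = "李四";
var first = encodeURI(submitted);   // step ①
var second = encodeURI(first);      // step ②, the value placed in the URL

// Step ④: the container's decoding only turns %25 back into %.
var received = containerDecode(second);
// Step ⑤: the servlet's own decoding, done as UTF-8.
var finalName = decodeURIComponent(received);

// For contrast: with a single encoding the container garbles the value.
var singleEncodingResult = containerDecode(first);

console.log(received);
console.log(finalName);
console.log(singleEncodingResult === submitted);
```

The container's character set never sees a non-ASCII byte in the double-encoded case, which is why its default no longer matters.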