Solving Garbled Text on the Web

Source: Internet
Author: User
Tags: i18n

Preface: My previous approach to garbled text was to search the Internet for a fix and try suggestions one after another until something worked, without studying the cause. As a result, every time garbled text appeared I still did not know how to solve it and had to repeat the same search-and-try cycle. This way of learning is undesirable: it teaches nothing and it dampens the enthusiasm for learning. This time I want to solve the problem thoroughly.

Basic knowledge of encoding

Encoding and decoding: first, be clear about what encoded data is. It is a byte array. Encoding (encode) obtains a byte array from a string, and decoding (decode) obtains a string from a byte array. Both directions need a standard, namely the correspondence between bytes and characters; this is the character set. In Java, the String class has two commonly used members for encoding and decoding:

getBytes: for example, "中".getBytes("character set") encodes the string according to the specified character set.

new String(bytes, "character set"): decodes a byte array according to the specified character set.
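As a minimal sketch of these two operations, the following shows that the same character produces different byte arrays under different character sets. The class name EncodeDecodeDemo and its helper methods are made up for illustration, and the GBK charset is assumed to be available in the running JDK. The string literal is written as a Unicode escape so that the source file's own encoding does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodeDecodeDemo {
    // Encoding: string -> byte array, according to a character set.
    public static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Decoding: byte array -> string, according to a character set.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Illustrative helper: render bytes as space-separated hex pairs.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumed to be installed
        String zhong = "\u4E2D";              // the character "中"

        byte[] utf8Bytes = encode(zhong, StandardCharsets.UTF_8);
        byte[] gbkBytes = encode(zhong, gbk);

        System.out.println(hex(utf8Bytes)); // E4 B8 AD
        System.out.println(hex(gbkBytes));  // D6 D0

        // Decoding with the matching character set restores the string.
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8).equals(zhong)); // true
        System.out.println(decode(gbkBytes, gbk).equals(zhong));                     // true
    }
}
```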

There is also the concept of URI encoding and URI decoding. URI encoding and decoding is not a conversion between a string and a byte stream; it uses one string to represent another. For example:

"中" URI-encoded with UTF-8 is %E4%B8%AD

The string %E4%B8%AD URI-decoded with UTF-8 is the character "中"

As can be seen, a URI-encoded string consists of % followed by the hexadecimal value of each byte that the character produces in the chosen character set. The Java methods for URI encoding and URI decoding are:

URLEncoder.encode("中", "UTF-8")

URLDecoder.decode("%E4%B8%AD", "UTF-8");
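A small sketch of how the result of URI encoding depends on the character set. The class UriEncodingDemo and its wrapper methods are hypothetical helpers; only URLEncoder and URLDecoder come from the JDK, and GBK is assumed to be available. Note that URLEncoder targets the HTML form format, so it also turns spaces into +.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class UriEncodingDemo {
    // URI-encode text using the bytes it produces in the given character set.
    public static String uriEncode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // URI-decode text, interpreting the %XX bytes in the given character set.
    public static String uriDecode(String text, String charsetName) {
        try {
            return URLDecoder.decode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String zhong = "\u4E2D"; // the character "中"

        System.out.println(uriEncode(zhong, "UTF-8")); // %E4%B8%AD
        System.out.println(uriEncode(zhong, "GBK"));   // %D6%D0

        // Decoding only restores the character when the same character set is used.
        System.out.println(uriDecode("%E4%B8%AD", "UTF-8").equals(zhong)); // true
        System.out.println(uriDecode("%E4%B8%AD", "GBK").equals(zhong));   // false
    }
}
```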

In addition, the common functions for URI encoding in JavaScript are encodeURIComponent() and encodeURI(); see another note for their differences.

The cause of garbled text:

The web is built on request and response. When we send a request to the server through the browser, the request carries parameters, and those parameters must be converted to a byte stream to travel over the network; that is, they are encoded. The browser therefore follows certain rules to choose a character set for encoding the request parameters. When the parameters reach the server, the server follows its own rules to choose a character set and decodes the byte stream from the browser back into strings. Once this process is understood, the problem is clear: if the encoding and the decoding use different character sets, garbled text naturally appears. This is the essence of the problem.

Let's start by looking at how the browser encodes:

There are three ways to submit a request to a server: 1. Submitting a form 2. Ajax 3. Hyperlinks

1. Form submission, which is further divided into POST and GET

With POST, the browser encodes the strings in the form into a byte stream using the page's character set and sends it to the server.

With GET, the browser first URI-encodes the form values using the page's character set, then appends them to the action URL and sends the request to the application server.

So for both POST and GET, the encoding is determined by the character set specified on the page.

2. Ajax

This one is more painful, because the encoding set on the page has no effect on Ajax, which always encodes in UTF-8. I learned this the hard way on the fiber map project.

3. Hyperlinks

For hyperlinks that carry Chinese characters, the browser URI-encodes them, but the character set it uses depends on the client's environment. Therefore, to avoid an unpredictable URI encoding by the browser, URI-encode the Chinese text in the program before placing it in the URL.
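Encoding the parameter in the program before it goes into the link could look like the sketch below. LinkBuilder and buildLink are hypothetical names, the /search path and the q parameter are placeholders, and UTF-8 is an assumed choice; use whichever character set the server decodes with.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class LinkBuilder {
    // Build a link whose parameter value is already URI-encoded, so the
    // browser only ever sees ASCII and has nothing left to guess.
    public static String buildLink(String baseUrl, String paramName, String value) {
        try {
            // Assumption: the server URI-decodes query strings as UTF-8.
            return baseUrl + "?" + paramName + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported by the JDK, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "\u4E2D\u56FD" is the string "中国".
        System.out.println(buildLink("/search", "q", "\u4E2D\u56FD"));
        // /search?q=%E4%B8%AD%E5%9B%BD
    }
}
```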

The following is how the server decodes:

The following applies to Tomcat:

For POST submissions (both forms and Ajax POST), the decoding character set is determined by request.setCharacterEncoding(), which must be called before getParameter is first used in the servlet.

In Struts it is set through the constant <constant name="struts.i18n.encoding" value="GBK"></constant>.

For GET and hyperlinks, here is what others say:

In Tomcat 5.5, getParameter decodes arguments passed by GET or by hyperlink with ISO-8859-1 by default. For example, if the browser sends a UTF-8 encoded request and Tomcat 5.5's getParameter decodes it with ISO-8859-1, the result is wrong. To get the correct value, getParameter must decode with UTF-8. Setting URIEncoding="UTF-8" or useBodyEncodingForURI="true" makes Tomcat do so (useBodyEncodingForURI="true" means the URI is decoded with the same character set as the request body, that is, the one from the Content-Type header or request.setCharacterEncoding(), which normally matches the page encoding). For hyperlinks, the server must likewise URI-decode with the same character set that was used for the URI encoding.
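When the connector configuration cannot be changed, a commonly used fallback (not part of the quoted text above, so treat it as an assumption about your setup) is to undo the ISO-8859-1 decoding in code. This works because ISO-8859-1 maps every byte to exactly one character, so no bytes are lost. The class and method names below are illustrative, and the sketch assumes the browser really sent UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Recovery {
    // Undo a wrong ISO-8859-1 decoding: get the original bytes back,
    // then decode them with the character set the browser actually used.
    public static String recover(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sent = "\u4E2D\u56FD"; // the string "中国"

        // Simulate the container decoding UTF-8 bytes as ISO-8859-1.
        String misdecoded = new String(sent.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        System.out.println(misdecoded.equals(sent));          // false
        System.out.println(recover(misdecoded).equals(sent)); // true
    }
}
```

The same trick is not safe with GBK or other multi-byte character sets as the wrong decoder, because they can discard bytes they consider invalid.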

From the analysis above, we can see that as long as encoding and decoding use the same character set, there will be no garbled text. In summary:

1. Set the encoding of the requesting page, e.g. pageEncoding="GBK"

2. For hyperlinks, URI-encode the Chinese text in the program with that character set, e.g. URLEncoder.encode("中国", "GBK") (JavaScript's encodeURI() accepts only one argument and always encodes as UTF-8).

3. On the server side, call request.setCharacterEncoding() or set <constant name="struts.i18n.encoding" value="GBK"></constant>

4. In Tomcat, set URIEncoding="UTF-8" or useBodyEncodingForURI="true"

One problem remains: Ajax. If the unified encoding is UTF-8, Ajax is not a problem. If the unified encoding is GBK, there is a problem, because on the browser side Ajax always encodes in UTF-8 while the server decodes with GBK, which naturally produces garbled text. The garbling can be analyzed as:

string -> UTF-8 encoding -> GBK decoding

A first idea is to reverse the conversion: re-encode with GBK, then decode with UTF-8, for example:

    byte[] buf = "中国".getBytes("UTF-8");    // encode using UTF-8
    String str1 = new String(buf, "GBK");     // decode using GBK
    System.out.println(str1);                 // prints garbled text
    byte[] buf2 = str1.getBytes("GBK");       // re-encode using GBK
    String str2 = new String(buf2, "UTF-8");  // decode using UTF-8
    System.out.println(str2);                 // prints 中国

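This approach seems reasonable, but it is wrong. Once text has been garbled, re-encoding the garbled string does not necessarily give back the original byte stream. In the example above, replacing "中国" with certain other Chinese strings no longer yields the correct result, because bytes that do not form a valid GBK sequence are replaced during decoding and cannot be recovered.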
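A minimal sketch of such a failing case, using the single character "中". Its three UTF-8 bytes cannot be split into complete two-byte GBK sequences, so the last byte is replaced during GBK decoding and is lost. The class and method names are made up for illustration, and GBK is assumed to be available in the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyRoundTrip {
    // UTF-8 encode -> GBK decode (garbled) -> GBK encode -> UTF-8 decode.
    public static String roundTrip(String original) {
        Charset gbk = Charset.forName("GBK"); // assumed to be installed
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8), gbk);
        return new String(garbled.getBytes(gbk), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String zhong = "\u4E2D"; // the character "中", 3 bytes in UTF-8

        // The dangling third byte is replaced while decoding as GBK,
        // so the original byte stream cannot be rebuilt.
        System.out.println(roundTrip(zhong).equals(zhong)); // false

        // Pure ASCII text survives, because UTF-8 and GBK agree on ASCII.
        System.out.println(roundTrip("abc").equals("abc")); // true
    }
}
```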

So the feasible approach is the following.

First URI-encode the Chinese string so that it becomes an ASCII string. No matter which of these character sets the server uses to decode, it obtains this same ASCII string. Then URI-decode the ASCII string with the character set that was used for the URI encoding, and the correct Chinese parameter is guaranteed.

In other words, treat Ajax parameters like hyperlinks: URI-encode them once before the client sends them, so that the parameters become an ASCII string. Because an ASCII string gives the same result under any ASCII-compatible character set (UTF-8, GBK, ISO-8859-1), the server still obtains the URI-encoded ASCII string even when it decodes with a different character set, and one final URI decoding on the server restores the original parameter.

    String a = URLEncoder.encode("中国", "UTF-8"); // simulate the client URI-encoding the parameter once before sending; it becomes an ASCII string
    byte[] buf = a.getBytes("UTF-8");              // simulate the browser encoding the parameter
    String b = new String(buf, "GBK");             // simulate the server decoding the parameter
    String c = URLDecoder.decode(b, "UTF-8");      // finally, URI-decode the parameter ourselves
    System.out.println(c);                         // prints 中国

Finally, here is why we do not need Spring's encoding filter. This is what the Spring encoding filter does:

    public void prepare(HttpServletRequest request, HttpServletResponse response) {
        String encoding = null;
        if (defaultEncoding != null) {
            encoding = defaultEncoding;
        }
        // omitted some code.
        if (encoding != null) {
            try {
                request.setCharacterEncoding(encoding); // set the character set encoding
            } catch (Exception e) {
                log.error("Error setting character encoding to '" + encoding + "' - ignoring.", e);
            }
        }
        // omitted some code.
    }

We can see that it actually calls request.setCharacterEncoding(encoding); which is not much different from what we do ourselves.

Setting the browser's page encoding

Content the server sends to the browser is also encoded into a byte stream for transmission over the network. After receiving the byte stream, the browser decodes it into strings with a particular character set for display. If the character sets of these two steps do not match, garbled text results here too.

For example, if static HTML files or JSPs are stored in UTF-8, you need to tell the browser to decode with UTF-8:

    • For a JSP, use <%@ page contentType="text/html; charset=utf-8" language="java" %>
    • For static files, use <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    • For output written directly in a servlet, use response.setCharacterEncoding("UTF-8"), response.setContentType("text/html;charset=utf-8"), or response.setHeader("Content-Type", "text/html;charset=utf-8")

The JSP directive and the servlet calls add "Content-Type: text/html;charset=utf-8" to the response header; the META tag declares the same information inside the document itself.

The encoding information in the header takes priority over the HTML META tag. That is, if setContentType("text/html;charset=utf-8") is set in the servlet while the JSP contains <meta http-equiv="Content-Type" content="text/html; charset=GBK"/>, the browser decodes with the UTF-8 character set.
