Encoding and decoding in Java web applications, and fixing garbled Chinese URLs

Source: Internet
Author: User
Tags form post tomcat server

Encoding & Decoding
The following figure shows where encoding and decoding take place in a Java web application:

To send an HTTP request to the server, the client has to encode the URL, the cookies, and the parameters. When the server receives the request, it parses it and decodes the URL, the cookies, and the parameters. While executing business logic the server may also read from a database, from local files, or from other files on the network, and each of these steps involves encoding and decoding as well. When processing is complete, the server encodes the result and sends it to the client, which decodes it and displays it to the user. Many places in this chain involve encoding and decoding, and the place most prone to garbled text is the interaction between server and client.
In short: the page encodes data and sends it to the server; the server decodes the data, runs its business logic, encodes the final result, and returns it to the client; the client decodes the result and shows it to the user. The rest of this article walks through encoding and decoding in Java web applications along that path.
Request
A client can send a request to the server in four ways:
1. Direct access by entering a URL.
2. Page links.
3. Form GET submission.
4. Form POST submission.
URL method
If a URL contains only English (ASCII) characters there is no problem, but as soon as it contains Chinese characters, encoding comes into play. How is it encoded, according to which rules, and how is it decoded? The answers follow. First, look at the parts of a URL:

In this URL the browser will encode the path and parameter. To better explain the coding process, use the following URL
http://127.0.0.1:8080/perbank/, I'm cm?name=, I'm cm.
Enter the above address into the browser URL input box, by looking at the HTTP header information we can see how the browser is encoded. The following is the code for IE, Firefox, and Chrome three browsers:

The major browsers encode "我是" ("I am") as follows:

Browser    Path section         Query string
Firefox    E6 88 91 E6 98 AF    E6 88 91 E6 98 AF
Chrome     E6 88 91 E6 98 AF    E6 88 91 E6 98 AF
IE         E6 88 91 E6 98 AF    CE D2 CA C7

Comparing these bytes with the encodings covered in the previous blog post shows that Firefox, Chrome, and IE all use UTF-8 for the path section, while for the query string Firefox and Chrome use UTF-8 and IE uses GBK. As for the "%" signs: the URL encoding rules require the browser to convert every non-ASCII character into bytes in some encoding, write each byte as a two-digit hexadecimal number, and put "%" in front of each of those hexadecimal bytes.
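The byte values in the table can be reproduced on the Java side. The following is a minimal sketch, assuming the JDK in use ships the GBK charset; the class name PercentEncodingDemo is made up for illustration, and URLEncoder is only a stand-in for what the browser does (it follows form-encoding rules, so a space would become "+").

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class PercentEncodingDemo {

    // Percent-encodes text with the named charset.
    // URLEncoder follows form-encoding rules, so a space would become "+";
    // the sample string below contains no spaces.
    static String encode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Charset not available: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String text = "\u6211\u662fcm"; // the characters 我是 followed by "cm"
        System.out.println(encode(text, "UTF-8")); // %E6%88%91%E6%98%AFcm
        System.out.println(encode(text, "GBK"));   // %CE%D2%CA%C7cm
    }
}
```

The same two characters become six bytes in UTF-8 and four bytes in GBK, which is why the server has to know which encoding the client used.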
Of course, different browsers, different versions of the same browser, and different operating systems can all produce different results, so it is premature to draw general conclusions about URL encoding rules from a single setup. Because browsers and operating systems may encode the URI and the query string differently, decoding on the server is bound to be troublesome. Taking Tomcat as the example, here is how it decodes the URL.
The request URL is parsed in the parseRequestLine method of org.apache.coyote.http11.InternalInputBuffer, which stores the received URL as a byte[] in the corresponding property of org.apache.coyote.Request. At this point the URL is still in byte form; the conversion to char happens in the convertURI method of org.apache.catalina.connector.CoyoteAdapter:

protected void convertURI(MessageBytes uri, Request request) throws Exception {
    ByteChunk bc = uri.getByteChunk();
    int length = bc.getLength();
    CharChunk cc = uri.getCharChunk();
    cc.allocate(length, -1);

    // Get the URI decoding charset configured on the connector
    String enc = connector.getURIEncoding();
    if (enc != null) {
        B2CConverter conv = request.getURIConverter();
        try {
            if (conv == null) {
                conv = new B2CConverter(enc);
                request.setURIConverter(conv);
            }
        } catch (IOException e) {...}
        if (conv != null) {
            try {
                conv.convert(bc, cc, cc.getBuffer().length - cc.getEnd());
                uri.setChars(cc.getBuffer(), cc.getStart(), cc.getLength());
                return;
            } catch (IOException e) {...}
        }
    }

    // Default encoding: fast conversion
    byte[] bbuf = bc.getBuffer();
    char[] cbuf = cc.getBuffer();
    int start = bc.getStart();
    for (int i = 0; i < length; i++) {
        cbuf[i] = (char) (bbuf[i + start] & 0xff);
    }
    uri.setChars(cbuf, 0, length);
}

As the code shows, decoding the URI starts by reading the connector's URI encoding, which is configured in server.xml:

<Connector URIEncoding="UTF-8"/>

If it is not defined, the default encoding ISO-8859-1 is used in the Tomcat version shown here (Tomcat 8 and later default to UTF-8 instead).
For the query string: whether the request is submitted with GET or POST, all parameters are stored in Parameters, and we read them through request.getParameter. Decoding happens the first time getParameter is called. Internally, getParameter calls the parseParameters method of org.apache.catalina.connector.Request, which decodes the submitted parameters. The following is only part of parseParameters:

// Get the request's character encoding
String enc = getCharacterEncoding();
// Whether the charset defined in Content-Type should also be used for the query string
boolean useBodyEncodingForURI = connector.getUseBodyEncodingForURI();
if (enc != null) {
    // An encoding has been set: use it for the parameters
    parameters.setEncoding(enc);
    if (useBodyEncodingForURI) {
        // Decode the query string with the same charset
        parameters.setQueryStringEncoding(enc);
    }
} else {
    // Fall back to the default encoding
    parameters.setEncoding(org.apache.coyote.Constants.DEFAULT_CHARACTER_ENCODING);
    if (useBodyEncodingForURI) {
        parameters.setQueryStringEncoding(org.apache.coyote.Constants.DEFAULT_CHARACTER_ENCODING);
    }
}

From the code above, when useBodyEncodingForURI is enabled the query string is decoded either with the charset that has been set or, failing that, with the default ISO-8859-1. Note that this charset is the one defined by Content-Type in the HTTP header; for it to take effect on the query string, the connector has to be configured as follows:

<Connector URIEncoding="UTF-8" useBodyEncodingForURI="true"/>

The above describes in detail how a request made by the URL method is encoded and decoded. In practice, though, we submit data through forms far more often.
Form GET
Since submitting data directly in a URL easily produces garbled text, forms are usually preferred. When the user clicks the submit button, the browser encodes the data using the page's encoding and sends it to the server. Data submitted with GET is appended to the URL as the query string, so on the Tomcat side it is URIEncoding that governs decoding. Tomcat decodes with the configured URIEncoding and falls back to the default ISO-8859-1 if none is set. If the page encoding is UTF-8 while URIEncoding is set to something else or not set at all, the server produces garbled text when decoding. In that case the correct value can usually be recovered with new String(request.getParameter("name").getBytes("ISO-8859-1"), "UTF-8").
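The recovery trick works because ISO-8859-1 maps every byte to exactly one character, so the original bytes can be retrieved without loss. A small sketch, using a hypothetical helper named recover and simulating the container's wrong decoding by hand:

```java
import java.nio.charset.StandardCharsets;

final class GetParameterRecoveryDemo {

    // Re-interprets a string that was wrongly decoded as ISO-8859-1:
    // get the original bytes back, then decode them as UTF-8.
    static String recover(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate what the container hands back when the client sent UTF-8
        // but the server decoded with ISO-8859-1.
        byte[] sent = "\u6211\u662fcm".getBytes(StandardCharsets.UTF_8);
        String received = new String(sent, StandardCharsets.ISO_8859_1);

        System.out.println(recover(received).equals("\u6211\u662fcm")); // true
    }
}
```

This only helps when the wrong decoding really was ISO-8859-1. If the server already decodes with UTF-8, applying the same conversion again corrupts a value that was correct.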
Form POST
For POST, the encoding is also determined by the page, that is, by its contentType. When the form is submitted by clicking the submit button, the browser encodes the POST parameters using the charset given in contentType and sends them to the server. The server likewise decodes with the charset from Content-Type (this differs from GET), which is why parameters submitted by POST generally do not come out garbled. This charset can also be set explicitly with request.setCharacterEncoding(charset), as long as that call is made before the first getParameter call, since that is when decoding happens.
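The effect of the Content-Type charset on a POST body can be sketched without a servlet container. The code below is a simplified illustration and not how Tomcat is implemented: PostBodyDecodingDemo, charsetOf, and decodeValue are hypothetical names, the header parsing ignores quoted values, and the ISO-8859-1 fallback mirrors the default described earlier.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class PostBodyDecodingDemo {

    // Extracts the charset parameter from a Content-Type value,
    // falling back to ISO-8859-1 when none is present.
    static String charsetOf(String contentType) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                return p.substring(8).trim();
            }
        }
        return "ISO-8859-1";
    }

    // Decodes the value of a single "name=value" pair from a form body.
    static String decodeValue(String pair, String contentType) {
        String raw = pair.substring(pair.indexOf('=') + 1);
        try {
            return URLDecoder.decode(raw, charsetOf(contentType));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String body = "name=%E6%88%91%E6%98%AFcm";
        String withCharset = "application/x-www-form-urlencoded; charset=UTF-8";
        String withoutCharset = "application/x-www-form-urlencoded";

        System.out.println(decodeValue(body, withCharset).equals("\u6211\u662fcm")); // true
        System.out.println(decodeValue(body, withoutCharset).length()); // 8
    }
}
```

With the charset present the value comes back as two Chinese characters plus "cm"; without it the same bytes are read as eight Latin-1 characters.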


Resolving garbled Chinese in URLs
Requests reach the server mainly in two forms: URLs and forms. Forms generally do not produce garbled text; the problem lies mainly with URLs. As described above, the way a URL gets encoded on its way to the server is confusing: different operating systems, different browsers, and different page character sets all produce different results. Having to account for every possible outcome would be a nightmare for programmers. Is there a way to make sure clients always use a single encoding when sending requests?
Yes. The main approaches are the following.
JavaScript
use JavaScript encoding to not give the browser a chance to intervene, then send the request to the server after encoding and then decode it in the server. In mastering this method, we need three methods that are encoded with javascript: Escape (), encodeURI (), encodeURIComponent ().
Escape
encodes the specified string using the Sio Latin character set. All non-ASCII characters are encoded into%XX-formatted strings, where XX represents the 16-digit number that the character corresponds to in the character set. For example, the encoding for the format corresponds to%20. Its corresponding decoding method is unescape ().

In fact, escape () cannot be used directly for URL encoding, and its true function is to return a Unicode encoded value of one character. For example, the result of "I am cm" above is%U6211%U662FCM, where "I" corresponds to a code of 6211, "yes" is encoded 662F, and "CM" is encoded as cm.
Note that escape () does not encode "+". But we know that when the Web page submits the form, if there are spaces, it will be converted to the + character. When the server processes the data, the + number is processed into spaces. So be careful when you use it.
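The server-side half of that "+" rule is easy to check with java.net.URLDecoder, which follows the same form-decoding convention. A short sketch (PlusAndSpaceDemo is an illustrative name only):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class PlusAndSpaceDemo {

    // Form-style decoding: "+" becomes a space, "%2B" becomes a literal plus.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("a+b"));   // a b
        System.out.println(decode("a%2Bb")); // a+b
        System.out.println(decode("a%20b")); // a b
    }
}
```

A literal "+" that is left unencoded by the client therefore reaches the application as a space.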
encodeURI
Encodes an entire URL and outputs the encoded string using UTF-8. However, encodeURI leaves some special ASCII characters unencoded, for example: ! @ # $ & * ( ) = : / ; ? + '.

encodeURIComponent
Converts a URI string into an escaped string using UTF-8. Compared with encodeURI, encodeURIComponent is more thorough: the symbols that encodeURI() leaves alone (; / ? : @ & = + $ , #) are all encoded. For that reason encodeURIComponent should be applied to individual components of a URL, not to the whole URL. The corresponding decoding function is decodeURIComponent.
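When the same component encoding is needed in Java, for example to build a redirect URL on the server, encodeURIComponent can be approximated with URLEncoder plus a few replacements. This is a commonly used approximation rather than an exact port (it does not reproduce JavaScript's error on lone surrogates), and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class UriComponentEncoder {

    // Approximates JavaScript's encodeURIComponent:
    // URLEncoder turns a space into "+" and escapes ! ' ( ) ~,
    // whereas encodeURIComponent uses %20 and leaves those five characters alone.
    static String encodeURIComponent(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8")
                    .replace("+", "%20")
                    .replace("%21", "!")
                    .replace("%27", "'")
                    .replace("%28", "(")
                    .replace("%29", ")")
                    .replace("%7E", "~");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // 我是 cm&a=b as a single parameter value
        System.out.println(encodeURIComponent("\u6211\u662f cm&a=b"));
        // %E6%88%91%E6%98%AF%20cm%26a%3Db
    }
}
```

Because "&" and "=" are escaped, the result is safe to use as one parameter value but not as a complete URL.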
Of course, we usually use the encodeURI side to encode operations. The so-called JavaScript two-time coding background two times decoding is the use of this method. JavaScript solves the problem with a one-time transcoding and two-time transcoding methods.
One turn code
JavaScript Transfer code:

var url = ' <s:property value= ' webpath '/>/showmoblieqrcode.servlet?name= I am cm '; 

URL:HTTP://127.0.0.1:8080/PERBANK/SHOWMOBLIEQRCODE.SERVLET?NAME=%E6%88%91%E6%98%AFCM after the turn code
Background processing:

String name = Request.getparameter ("name"); 
System.out.println ("Foreground incoming parameter:" + name); 
name = new String (name.getbytes ("iso-8859-1"), "UTF-8"); 
System.out.println ("After decoding parameter:" + name); 

Output results:
Foreground incoming parameters:?????? Cm
After decoding the parameters: I am cm
Two times turn code
Javascript

var url = ' <s:property value= ' webpath '/>/showmoblieqrcode.servlet?name= I am cm '; 
Window.location.href = encodeURI (encodeURI (URL)); 

URL:HTTP://127.0.0.1:8080/PERBANK/SHOWMOBLIEQRCODE.SERVLET?NAME=%25E6%2588%2591%25E6%2598%25AFCM after the turn code
Background processing:

String name = Request.getparameter ("name"); 
System.out.println ("Foreground incoming parameter:" + name); 
Name = Urldecoder.decode (name, "UTF-8"); 
System.out.println ("After decoding parameter:" + name); 

Output results:
Foreground incoming parameters: E68891E698AFCM
After decoding the parameters: I am cm
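Why encoding twice sidesteps the container's charset can be traced step by step in plain Java. In this sketch URLEncoder stands in for encodeURI (the two differ for reserved characters, but not for this sample value), the container's automatic decoding is simulated with ISO-8859-1, and DoubleEncodingDemo is a hypothetical name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class DoubleEncodingDemo {

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    static String decode(String s, String charsetName) {
        try {
            return URLDecoder.decode(s, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String once = encode("\u6211\u662fcm"); // %E6%88%91%E6%98%AFcm
        String twice = encode(once);            // %25E6%2588%2591%25E6%2598%25AFcm

        // First decode: done by the container, here with its ISO-8859-1 default.
        // The input is pure ASCII, so the charset makes no difference.
        String fromContainer = decode(twice, "ISO-8859-1");
        System.out.println(fromContainer); // %E6%88%91%E6%98%AFcm

        // Second decode: done by the application, explicitly as UTF-8.
        String name = decode(fromContainer, "UTF-8");
        System.out.println(name.equals("\u6211\u662fcm")); // true
    }
}
```

After the second encoding every "%" has become "%25", so the container only ever sees ASCII and cannot misinterpret it; the charset-sensitive step is left to the application's own URLDecoder.decode call.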

Filter
Two filters are provided here: the first sets the request encoding, and the second decodes the parameters directly inside the filter.
Filter 1
This filter directly sets the encoding of the request.

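import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class CharacterEncoding implements Filter {

    private FilterConfig config;
    String encoding = null;

    public void destroy() {
        config = null;
    }

    public void doFilter(ServletRequest request, ServletResponse response,
            FilterChain chain) throws IOException, ServletException {
        // Apply the configured encoding before any parameter is read
        if (encoding != null) {
            request.setCharacterEncoding(encoding);
        }
        chain.doFilter(request, response);
    }

    public void init(FilterConfig config) throws ServletException {
        this.config = config;
        // Read the "encoding" init parameter
        String str = config.getInitParameter("encoding");
        if (str != null) {
            encoding = str;
        }
    }
}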

Configuration:

<!-- Chinese filter configuration -->
<filter>
    <filter-name>chineseEncoding</filter-name>
    <filter-class>com.test.filter.CharacterEncoding</filter-class>

    <init-param>
        <param-name>encoding</param-name>
        <param-value>utf-8</param-value>
    </init-param>
</filter>

<filter-mapping>
    <filter-name>chineseEncoding</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

Filter 2
This filter decodes the parameters itself in doFilter and then stores the decoded values back on the request as attributes.

import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.util.Enumeration;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CharacterEncoding implements Filter {

    protected FilterConfig filterConfig;
    String encoding = null;

    public void destroy() {
        this.filterConfig = null;
    }

    /** Initialization */
    public void init(FilterConfig filterConfig) {
        this.filterConfig = filterConfig;
    }

    /**
     * Convert inStr to its UTF-8 encoded form.
     *
     * @param inStr input string
     * @return UTF-8 encoded string
     * @throws UnsupportedEncodingException
     */
    private String toUTF(String inStr) throws UnsupportedEncodingException {
        String outStr = "";
        if (inStr != null) {
            outStr = new String(inStr.getBytes("ISO-8859-1"), "UTF-8");
        }
        return outStr;
    }

    /** Filter processing for garbled Chinese */
    public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse,
            FilterChain chain) throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) servletRequest;
        HttpServletResponse response = (HttpServletResponse) servletResponse;

        // Get the request method (1. POST or 2. GET) and handle each differently
        String method = request.getMethod();
        if (method.equalsIgnoreCase("POST")) {
            // 1. POST request: set the encoding to UTF-8 directly
            try {
                request.setCharacterEncoding("UTF-8");
            } catch (UnsupportedEncodingException e) {
                e.printStackTrace();
            }
        } else {
            // 2. GET request: take the set of parameter names submitted by the client
            Enumeration<String> paramNames = request.getParameterNames();
            // Walk the set and read the name and values of each parameter
            while (paramNames.hasMoreElements()) {
                String name = paramNames.nextElement();
                String[] values = request.getParameterValues(name);
                if (values != null) {
                    for (int i = 0; i < values.length; i++) {
                        try {
                            // Convert the character encoding of each value
                            values[i] = toUTF(values[i]);
                        } catch (UnsupportedEncodingException e) {
                            e.printStackTrace();
                        }
                    }
                    // Store the decoded values on the request as an attribute
                    request.setAttribute(name, values);
                }
            }
        }
        // Set the response content type with a charset that supports Chinese
        response.setContentType("text/html;charset=UTF-8");
        // Continue with the next filter, or with the request itself if there is none
        chain.doFilter(request, response);
    }
}

Configuration:

<!-- Chinese filter configuration -->
<filter>
    <filter-name>chineseEncoding</filter-name>
    <filter-class>com.test.filter.CharacterEncoding</filter-class>
</filter>

<filter-mapping>
    <filter-name>chineseEncoding</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

Other
1. Set pageEncoding and contentType

<%@ page language="java" contentType="text/html;charset=UTF-8" pageEncoding="UTF-8"%>

2. Set Tomcat's URIEncoding
By default the Tomcat server described here uses ISO-8859-1, and the URIEncoding parameter controls how the URL of a GET request is decoded, so it is enough to add URIEncoding="UTF-8" to the <Connector> tag in Tomcat's server.xml.
