How to solve garbled-text problems in Java Web applications



This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 China Mainland license.

Consider how the Web looks from a Java programmer's perspective. On one side, the client browser (IE or Firefox, for example) submits HTTP requests through forms or links, processes the response data returned by the HTTP server, and presents that data stream (HTML or other kinds of data) to the user in an appropriate way. On the other side, on the Java Web application server, a servlet class or JSP page handles each HTTP request: it reads the request data from HttpServletRequest and writes the response data to HttpServletResponse. A complete HTTP exchange therefore consists of four steps: the client submits the request, the server processes the request, the server returns the response data, and the client processes the response data. At each of these four steps the data is encoded or decoded with a specified character encoding, and handling any step incorrectly produces garbled text.

Client-side processing

When a client sends an HTTP request, it sends data in the following format to the server:

<request-line>
<headers>
<CRLF>
[<request-body><CRLF>]

For details on the HTTP request format, see the HTTP protocol specification and the documentation on HTML forms (in particular, the difference between GET and POST).
In this message, both the request-line and the request-body must be encoded appropriately.

Encoding of the request-line

The URL portion of the request-line must be encoded as application/x-www-form-urlencoded. The character set used for this encoding is the one the browser is using to display the current Web page.
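To see why the character set matters here, the following minimal sketch URL-encodes the same string with two different character sets. The class name CharsetDependentEncoding and the sample string "café" are illustrative assumptions, not part of the original article; only java.net.URLEncoder comes from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class CharsetDependentEncoding {
    // Percent-encodes a value using the named character set.
    // Hypothetical helper for illustration only.
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String value = "caf\u00e9"; // "café"
        // The same character becomes two bytes in UTF-8 but one byte in ISO-8859-1.
        System.out.println(encode(value, "UTF-8"));      // caf%C3%A9
        System.out.println(encode(value, "ISO-8859-1")); // caf%E9
    }
}
```

Because the two outputs differ, a server that decodes with a different character set than the one the page used for encoding will reconstruct the wrong characters.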

The JDK has two classes for handling application/x-www-form-urlencoded data: java.net.URLEncoder and java.net.URLDecoder. When data on a Web page needs to be URL-encoded manually, use URLEncoder to do the encoding. Places that need manual URL-encoding include the href attribute of a link (<a></a>) and the action attribute of a form (<form></form>) submitted with POST.

For example, you should not create such a link on a Web page:

<!-- incorrect -->
<a href="/hello/checkuser.html?opt=中文">User authentication</a>

The correct form is:

<!-- URL-encoded with the UTF-8 character set -->
<a href="/hello/checkuser.html?opt=%E4%B8%AD%E6%96%87">User authentication</a>
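The encoded value in the correct link can be reproduced in plain Java. This is a small sketch, assuming the unencoded parameter value is the two-character string 中文 (written below as Unicode escapes so that the source file encoding does not matter); the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class LinkBuilder {
    // Builds the href used in the example link, URL-encoding the opt value as UTF-8.
    static String checkUserHref(String opt) {
        try {
            return "/hello/checkuser.html?opt=" + URLEncoder.encode(opt, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is the string 中文.
        System.out.println(checkUserHref("\u4e2d\u6587"));
        // /hello/checkuser.html?opt=%E4%B8%AD%E6%96%87

        // Reserved characters are escaped too; a space becomes '+'.
        System.out.println(checkUserHref("a b&c"));
        // /hello/checkuser.html?opt=a+b%26c
    }
}
```

URLEncoder emits uppercase hex digits; percent-encoded hex digits are case-insensitive, so %e4 and %E4 denote the same byte.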

One way to do this is to URL-encode the value with a scriptlet expression in the JSP page. For example:

<%@ page import="java.net.URLEncoder" %>
<a href="/hello/checkuser.html?opt=<%= URLEncoder.encode("中文", "UTF-8") %>">User authentication</a>

Encoding of the request-body

A request-body is generated only when the form is submitted with POST. Its encoding format is specified by the form's enctype attribute and, as with the request-line, the character set used to encode the request-body is the one the browser is using to display the current Web page. The browser encodes the request-body automatically, so no additional programming is required.

Server-side processing

When the server receives an HTTP request, it handles the request data in one of two ways: automatic processing or no processing.
Servers typically process application/x-www-form-urlencoded data automatically, in both the request-line and the request-body. In a servlet (a servlet class or JSP page), this data is available through the getParameter() or getParameterValues() method of the request object. For data of any other MIME type, the HTTP server hands processing directly to the servlet (servlet class or JSP page) that corresponds to the HTTP request.
For example, suppose the client submits the following form:

<form action="checkuser.html?opt=xxx" method="POST">
<input type="text" name="username" value="yyy"/>
<input type="text" name="username" value="zzz"/>
<input type="submit" value="Submit"/>
</form>

When the form is submitted, the server automatically processes the data before it reaches the servlet (servlet class or JSP page) mapped to checkuser.html, which can then read it as follows:

String opt = request.getParameter("opt");
String[] users = request.getParameterValues("username");
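The automatic processing described above can be pictured with a small stand-alone sketch. It is not how any particular server is implemented; FormParameters and its parse method are hypothetical names, and the input strings are the query string and POST body that the example form would produce.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FormParameters {
    // Splits application/x-www-form-urlencoded data into name -> values,
    // URL-decoding each name and value with the given character set.
    static Map<String, List<String>> parse(String encoded, String charsetName) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (encoded == null || encoded.isEmpty()) {
            return params;
        }
        for (String pair : encoded.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String rawName = eq < 0 ? pair : pair.substring(0, eq);
            String rawValue = eq < 0 ? "" : pair.substring(eq + 1);
            params.computeIfAbsent(decode(rawName, charsetName), k -> new ArrayList<>())
                  .add(decode(rawValue, charsetName));
        }
        return params;
    }

    private static String decode(String s, String charsetName) {
        try {
            return URLDecoder.decode(s, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Query string from the form's action attribute (request-line).
        Map<String, List<String>> query = parse("opt=xxx", "UTF-8");
        // Body produced by the two username fields (request-body).
        Map<String, List<String>> body = parse("username=yyy&username=zzz", "UTF-8");

        System.out.println(query.get("opt").get(0)); // xxx
        System.out.println(body.get("username"));    // [yyy, zzz]
    }
}
```

A repeated field name yields several values, which is why the servlet API offers getParameterValues() alongside getParameter().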

By default, the server URL-decodes the application/x-www-form-urlencoded data it receives using the ISO-8859-1 character set, so the resulting string holds characters decoded as ISO-8859-1. On an HTTP server with no additional configuration, the servlet must therefore re-decode the data after reading it to obtain a correct UTF-16 (Unicode) string.
For example, if the client URL-encoded the request data with the UTF-8 character set, the servlet needs to decode it as follows:

String opt = request.getParameter("opt");
if (opt != null && !"".equals(opt)) {
    opt = new String(opt.getBytes("ISO-8859-1"), "UTF-8");
}
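The re-decoding trick works because ISO-8859-1 maps every byte value to exactly one character, so the original bytes can be recovered without loss. The sketch below simulates the server-side mistake and the repair outside a servlet container; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeRepair {
    // Simulates a server that interprets UTF-8 bytes as ISO-8859-1.
    static String decodeAsLatin1(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Recovers the original bytes from the garbled string and decodes them as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // the string 中文
        String garbled = decodeAsLatin1(original.getBytes(StandardCharsets.UTF_8));

        // Two characters became six, one per UTF-8 byte.
        System.out.println(garbled.length());                 // 6
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

The repair is only valid when the wrong decoding really was ISO-8859-1; applying it to a string that was already decoded correctly would corrupt it.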

To avoid this extra encode/decode step, the server needs to know which character set the client used for URL-encoding so that it can URL-decode with that character set directly. Different HTTP servers provide different ways to do this.
In Tomcat, automatic decoding of the request-line is configured in the Tomcat configuration file server.xml. The Connector element in server.xml accepts a URIEncoding attribute; once it names the character set to decode with, Tomcat automatically decodes the application/x-www-form-urlencoded data in the request-line. For example:

<Connector connectionTimeout="40000" port="8080" protocol="HTTP/1.1"
URIEncoding="UTF-8" redirectPort="8443"/>

Tomcat's automatic decoding of the request-body is controlled by the character encoding set on the request. For example:

request.setCharacterEncoding("UTF-8");

However, this call takes effect only if it is made before any request parameter or the request body is read; made later, it has no effect. Doing it up front in a filter guarantees that ordering. An example filter follows:

import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class CharacterEncodingFilter implements Filter {
    private String encoding;

    public CharacterEncodingFilter() {
        encoding = null;
    }

    public void destroy() {
        encoding = null;
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Set the encoding before anything downstream reads the request parameters.
        request.setCharacterEncoding(encoding);
        chain.doFilter(request, response);
    }

    public void init(FilterConfig filterConfig) throws ServletException {
        encoding = filterConfig.getInitParameter("encoding");
        // Fall back to UTF-8 when no init parameter is configured.
        if (encoding == null || "".equals(encoding)) {
            encoding = "UTF-8";
        }
    }
}

This filter is registered in web.xml. The corresponding configuration is as follows:

<filter>
    <filter-name>Character Encoding Filter</filter-name>
    <filter-class>CharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>Character Encoding Filter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>

With the two kinds of preprocessing described above, data read in the servlet can be used directly, with no ISO-8859-1 re-decoding.

Selection of character sets

Another issue to watch when processing application/x-www-form-urlencoded data is the choice of character set. As mentioned above, the character set used for URL-encoding, in both the request-line and the request-body, is the one the browser is using to display the current Web page. The HTTP server supplies this information in the HTTP response when it generates the HTML page.
Whenever an HTTP server receives an HTTP request, it must send an HTTP response back to the client. An HTTP response has the same overall format as an HTTP request and consists of the following parts:

<response-line>
<headers>
<CRLF>
[<response-body><CRLF>]

The following is a server's response to a request for an HTML page:

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=utf-8
Content-Length: 265
Date: Thu, Dec 2009 05:20:36 GMT

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>test</title>
<body>
</body>
[end]

The Content-Type header specifies the data format of the response stream and the character set to use when displaying it. It can be specified in the following ways:
1. HTML Web page
An HTML page can contain several <meta/> tags; the one that sets Content-Type is:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

2. JSP Web page
In a JSP page, in addition to the <meta/> tag, you also need the following page directive at the top of the page:

<%@ page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>

3. Servlet class
To write HTML data to the client through the response in a servlet class, set Content-Type before writing any output. The code is as follows:

response.setContentType("text/html;charset=UTF-8");
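On the receiving side, a client has to pull the character set back out of the Content-Type value before it can decode the body. The following is a minimal sketch of that step, not a full header parser; the class name, the method name, and the choice of ISO-8859-1 as the fallback are assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class ContentTypeCharset {
    // Returns the charset named in a Content-Type value such as
    // "text/html;charset=utf-8", or the fallback when none is usable.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; the remaining parts are parameters.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset fallback = StandardCharsets.ISO_8859_1; // assumed default for this sketch
        System.out.println(charsetOf("text/html;charset=utf-8", fallback).name());  // UTF-8
        System.out.println(charsetOf("text/html; Charset=UTF-8", fallback).name()); // UTF-8
        System.out.println(charsetOf("text/html", fallback).name());                // ISO-8859-1
    }
}
```

The body bytes would then be decoded with new String(bytes, charset), which is why a missing or wrong charset parameter shows up as garbled text in the browser.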

With these three methods, you can ensure that when the response data reaches the client browser, the browser displays its content with the correct decoding method and character set.

Concluding remarks

In short, Content-Type is the link between client and server: through it, both sides can encode and decode the data correctly. Once you understand what Content-Type does and how to use it, garbled-text problems can be resolved.
