Tomcat character sets and solutions for garbled Chinese characters

Last Update:2018-07-23 Source: Internet

Author: User


Nearly everyone who uses Tomcat runs into the problem of garbled Chinese text. It typically shows up in two ways:
1. Chinese data obtained from a form is garbled.
2. Chinese data submitted from a page arrives on the server as garbled text.

I. The basic solution
A search turns up one method that many people adopt: take the string as received, convert it back to bytes using ISO-8859-1, then decode those bytes as GB2312 (or whichever encoding the page actually used) to recover the correct content. The sample code is as follows:
Page request: http://xxx.do?ptname='I'm Chinese'
Server-side conversion:
    String strPtname = request.getParameter("ptname");
    strPtname = new String(strPtname.getBytes("ISO-8859-1"), "UTF-8");
    String para = new String(request.getParameter("para").getBytes("ISO-8859-1"), "GB2312");
This works because Tomcat decodes request parameters as ISO-8859-1 by default.
However, our servlets and JSP pages pass a great many parameters, so converting each one this way adds a lot of conversion code and is very inconvenient.
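
The conversion above can be tried outside a servlet container. The following is a minimal sketch, not Tomcat code: the class and method names (Latin1RoundTrip, misdecode, repair) are made up for illustration, and it assumes the client really sent UTF-8 bytes. It shows that decoding bytes as ISO-8859-1 is lossless, which is why re-encoding with ISO-8859-1 recovers the original bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use any Tomcat or servlet API.
class Latin1RoundTrip {

    // Simulates a container that decodes the raw request bytes as ISO-8859-1,
    // although the client actually sent them in the charset `actual`.
    public static String misdecode(String original, Charset actual) {
        byte[] wire = original.getBytes(actual);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // The workaround from the text: get the raw bytes back via ISO-8859-1,
    // then decode them with the charset the client really used.
    public static String repair(String garbled, Charset actual) {
        byte[] wire = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wire, actual);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters, written as escapes
        String garbled = misdecode(original, StandardCharsets.UTF_8);
        String repaired = repair(garbled, StandardCharsets.UTF_8);
        System.out.println("garbled equals original:  " + garbled.equals(original));
        System.out.println("repaired equals original: " + repaired.equals(original));
    }
}
```

The repair only works when the wrong decoding step was ISO-8859-1 (or another charset that maps every byte to a character); a lossy wrong decoding cannot be undone this way.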

II. Entry-level solution
Later, people began writing a filter: before the parameters from the client are read, the filter sets the request encoding to GB2312, after which getParameter returns the correct values directly. The jsp-examples sample code shipped with Tomcat contains a detailed example of such a filter. It is configured in web.xml as follows; the example uses a Japanese encoding, so we only need to change it to GB2312:
    <filter>
        <filter-name>Set Character Encoding</filter-name>
        <filter-class>filters.SetCharacterEncodingFilter</filter-class>
        <init-param>
            <param-name>encoding</param-name>
            <param-value>EUC_JP</param-value>
        </init-param>
    </filter>
The code for the filter is as follows:
    package filters;

    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;

    public class SetCharacterEncodingFilter implements Filter {

        // Encoding name read from the filter configuration
        protected String encoding = null;

        // Filter configuration
        protected FilterConfig filterConfig = null;

        // Whether to ignore the encoding specified by the client
        protected boolean ignore = true;

        // Destroy the filter
        public void destroy() {
            this.encoding = null;
            this.filterConfig = null;
        }

        // Filter method
        public void doFilter(ServletRequest request, ServletResponse response,
                FilterChain chain) throws IOException, ServletException {
            // If the client's encoding is ignored, or the client specified none,
            // apply the encoding configured for the filter
            if (ignore || (request.getCharacterEncoding() == null)) {
                String encoding = selectEncoding(request);
                if (encoding != null)
                    request.setCharacterEncoding(encoding);
            }
            // Pass the request on to the next filter
            chain.doFilter(request, response);
        }

        // Initialize the filter
        public void init(FilterConfig filterConfig) throws ServletException {
            this.filterConfig = filterConfig;
            this.encoding = filterConfig.getInitParameter("encoding");
            String value = filterConfig.getInitParameter("ignore");
            if (value == null)
                this.ignore = true;
            else if (value.equalsIgnoreCase("true"))
                this.ignore = true;
            else if (value.equalsIgnoreCase("yes"))
                this.ignore = true;
            else
                this.ignore = false;
        }

        // Return the encoding configured for the filter
        protected String selectEncoding(ServletRequest request) {
            return (this.encoding);
        }
    }
However, in Tomcat 5 the parameters can still come out garbled even with the filter in place. Why?

III. Advanced solutions
It turns out that Tomcat 4 and Tomcat 5 do not process parameters in the same way.
In Tomcat 4, GET and POST are handled identically, so calling request.setCharacterEncoding once in the filter fixes both GET and POST.
In Tomcat 5, however, GET and POST are processed separately.
To address the encoding problem, the Tomcat 5 developers added the following configuration attributes to the Connector element in Tomcat's configuration file server.xml, so that the encoding can be configured directly.
URIEncoding sets the encoding used for content passed in the URI; Tomcat decodes the URI content sent by the client using the encoding specified here.
What is a URI?
The Java documentation describes it as follows: a URI is a Uniform Resource Identifier and a URL is a Uniform Resource Locator. Therefore, generally speaking, every URL is a URI, but not every URI is a URL. This is because there is another subcategory of URI, the Uniform Resource Name (URN), which names a resource but does not specify how to locate it.

That is, the parameters we submit through the GET method are actually passed in the URI and are governed by this attribute. If it is not set, Tomcat decodes the client's content using the default ISO-8859-1.
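
The effect of the URI decoding charset can be illustrated with the JDK's own URLDecoder. This is only a sketch of the idea, not Tomcat's actual decoding code; the class name QueryDecodingDemo and the sample value are assumptions made for the example. It decodes the same percent-encoded query value once as ISO-8859-1 and once as UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class standing in for the container's URI decoding step.
class QueryDecodingDemo {

    // Decodes a percent-encoded query value using the named charset.
    public static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // UTF-8 bytes of the characters U+4E2D U+6587, percent-encoded
        String raw = "%E4%B8%AD%E6%96%87";

        String asLatin1 = decode(raw, "ISO-8859-1"); // one char per byte: garbled
        String asUtf8 = decode(raw, "UTF-8");        // the two intended characters

        System.out.println("ISO-8859-1 length: " + asLatin1.length());
        System.out.println("UTF-8 length:      " + asUtf8.length());
    }
}
```

The bytes on the wire are identical in both cases; only the charset chosen on the server side decides whether the text comes out readable.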

useBodyEncodingForURI tells Tomcat to decode the URI with the same encoding as the request body; it exists for compatibility with Tomcat 4. In Tomcat 5, GET parameters are handled through the URIEncoding attribute described above, while POST content is still handled through request.setCharacterEncoding, so this setting is kept for compatibility.
When useBodyEncodingForURI is set to true, the garbled-text problem in both GET and POST can be solved directly through request.setCharacterEncoding.
In other words, we can fix parameters sent by GET by setting URIEncoding in server.xml, and use the filter to fix parameters sent by POST.
Alternatively, we can set useBodyEncodingForURI to true in server.xml and rely on the filter for both.
Here I strongly recommend using UTF-8 throughout the whole site, from start to finish, to eliminate the garbled-text problem completely.
The specific steps are as follows:
1. Save page content in UTF-8 format and add <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> to the page.
2. In the server-side server.xml, set useBodyEncodingForURI="true".
3. Use the filter, with its encoding set to UTF-8.

IV. If some parameters still cannot be converted, open Tomcat's server.xml and find
    <Connector acceptCount="..." connectionTimeout="20000" disableUploadTimeout="true" port="80" redirectPort="8443" />
then add useBodyEncodingForURI="true" URIEncoding="UTF-8" at the end, as follows:
    <Connector acceptCount="..." connectionTimeout="20000" disableUploadTimeout="true" port="80" redirectPort="8443" useBodyEncodingForURI="true" URIEncoding="UTF-8" />

V. If you use JSTL, you can write a custom EL function that calls URLEncoder.encode to encode the value.

By default, IE sends the parameters in a URL without encoding them, while Tomcat by default decodes URLs as ISO-8859-1, so the error occurs. Good practice is to:

1. Make sure URL parameters are UTF-8 encoded, using either the JavaScript encodeURI() function or a custom EL function;
2. Set the Connector attribute URIEncoding="UTF-8" in server.xml so that the decoding and encoding formats match.
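
The Java side of such a custom EL function might look like the sketch below. The class name UrlFunctions and the method name encode are hypothetical, and the tag library descriptor needed to register the function with JSTL is not shown. The only library call used is the JDK's URLEncoder.encode(String, String).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper. To register it as an EL function, the class would need
// to be declared public in its own file and referenced from a TLD (not shown).
class UrlFunctions {

    private UrlFunctions() {
        // utility class, not meant to be instantiated
    }

    // Encodes a parameter value as UTF-8 in application/x-www-form-urlencoded form.
    public static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to exist on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("\u4e2d\u6587"));
        System.out.println(encode("a b"));
    }
}
```

Note that URLEncoder produces form encoding, so a space becomes "+" rather than "%20" as it would with JavaScript's encodeURI(); either form is decoded correctly when it appears in a query string.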

Method four:

In the action:
    String s = request.getParameter("s");
    s = new String(s.getBytes("ISO-8859-1"), "GBK");

VI. Fixing garbled parameters sent from JavaScript
1. Client:
    url = encodeURI(url);
Server:
    String linename = new String(request.getParameter("name").getBytes("ISO-8859-1"), "UTF-8");
2. Client:
    url = encodeURI(encodeURI(url));
Calling encodeURI twice is the more widely recommended practice. There is a reason for doing it this way, which I will post later.

Server:
    String linename = request.getParameter(name);

Java character decoding:
    linename = java.net.URLDecoder.decode(linename, "UTF-8");
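
One plausible reason for the double encoding, offered here as an assumption rather than as the author's own explanation: after two rounds of encoding the value on the wire is pure ASCII, so the container's first decode gives the same result whatever charset it uses, and the application can then decode once more with UTF-8. The sketch below simulates the three steps in plain Java. The class and method names are hypothetical, and URLEncoder stands in for JavaScript's encodeURI(), which escapes a slightly different set of characters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical simulation of the client, container and application steps.
class DoubleEncodingDemo {

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    private static String dec(String s, String charsetName) {
        try {
            return URLDecoder.decode(s, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Client side: stands in for encodeURI(encodeURI(value)).
    public static String clientEncodeTwice(String value) {
        return enc(enc(value));
    }

    // Container side: one decode with its default ISO-8859-1.
    public static String containerDecode(String wire) {
        return dec(wire, "ISO-8859-1");
    }

    // Application side: the explicit URLDecoder.decode(..., "UTF-8") call.
    public static String applicationDecode(String parameter) {
        return dec(parameter, "UTF-8");
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";
        String wire = clientEncodeTwice(original);
        String parameter = containerDecode(wire);     // still ASCII percent-escapes
        String decoded = applicationDecode(parameter);
        System.out.println(wire);
        System.out.println(parameter);
        System.out.println(decoded.equals(original));
    }
}
```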

This article is an English version of an article originally published in Chinese on aliyun.com.
