Servlet & JSP (22)

Source: Internet
Author: User

This article covers garbled Chinese text: it analyzes the causes and presents several ways to solve the problem.

A computer works only with binary data. Whether data is held in memory or on external storage, the characters we see are stored as binary numbers. The rules that map each character to a binary number are called a character encoding, and the set of characters an encoding covers is called a character set.

Common Character sets

Commonly used character sets and encodings include ASCII, ISO-8859-1, GB2312, GBK, Unicode, and UTF-8. ASCII is a 7-bit code that defines 128 characters and is usually stored one character per 8-bit byte; for example, the character "0" is represented by the number 48. It was originally designed only for English letters and a few other symbols. As computers spread, other countries needed to represent their own languages, and ASCII was clearly not enough. ISO-8859-1 extends ASCII to 8 bits (256 code values) and adds characters used by Western European languages such as Spanish. GB2312 is the Chinese national standard character set for the interchange of Chinese characters. GBK extends GB2312: besides being fully compatible with GB2312, it also supports traditional Chinese characters, rarely used Chinese characters, and many symbols. However, GBK is not a national standard, only a specification.
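The numeric codes mentioned above can be inspected directly from Java. The following is a minimal sketch; the class name CharsetCodes and the helper hex are made up for illustration, and it assumes the JDK in use ships the GBK charset (most full JDK distributions do). Non-ASCII characters are written as \u escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows the number(s) an encoding assigns to a character.
final class CharsetCodes {

    // Encode the text in the given charset and render each byte as two hex digits.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: the GBK charset is available in this JDK.
        Charset gbk = Charset.forName("GBK");

        System.out.println((int) '0');                                   // 48
        System.out.println(hex("0", StandardCharsets.US_ASCII));         // 30 (hexadecimal for 48)
        System.out.println(hex("\u00E9", StandardCharsets.ISO_8859_1));  // E9 (the letter e with acute accent)
        System.out.println(hex("\u4E2D", gbk));                          // D6D0 (a Chinese character, two bytes)
    }
}
```

The ASCII and ISO-8859-1 results are one byte each, while the GBK result is two bytes, which is why a single-byte character set cannot hold Chinese text.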

Each country defined its own character set. Suppose you send an email to a friend in Germany: the text is converted to binary data according to a Chinese character encoding, and when the email arrives, the binary data is decoded with a German (Western European) character encoding, so garbled characters appear. The reason is that, in different character sets, the same number may correspond to different symbols, or may have no corresponding symbol at all. To solve these problems, the Unicode Consortium developed the Unicode standard. Unicode originally encoded each character as a two-byte unsigned number in the range 0-65535; the code space has since been extended up to 0x10FFFF, and well over 100,000 characters are now defined. UTF-8 was designed to reduce the amount of data needed to store and transfer English text: a fixed two-byte encoding spends two bytes on every character, yet most information on the network is in English, so UTF-8 encodes ASCII characters in a single byte and uses more bytes only for other characters (three for most Chinese characters).
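Both points in the paragraph above, the size advantage of UTF-8 for English text and the damage done when sender and receiver use different character sets, can be reproduced with the JDK alone. This is a sketch under stated assumptions: the class EncodingMismatch and its methods are hypothetical names, GBK stands in for the sender's Chinese encoding, ISO-8859-1 stands in for the receiver's Western European encoding, and the GBK charset is assumed to be available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares encoded sizes and shows a sender/receiver charset mismatch.
final class EncodingMismatch {

    // Number of bytes the text occupies in UTF-8.
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes in UTF-16 big-endian without a byte-order mark
    // (two bytes for every character used in this example).
    static int utf16Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Encode with the sender's charset, then decode with the receiver's charset.
    static String transmit(String text, Charset sender, Charset receiver) {
        return new String(text.getBytes(sender), receiver);
    }

    public static void main(String[] args) {
        System.out.println(utf8Bytes("Hello") + " vs " + utf16Bytes("Hello"));   // 5 vs 10
        System.out.println(utf8Bytes("\u4E2D") + " vs " + utf16Bytes("\u4E2D")); // 3 vs 2

        // Assumption: the GBK charset is available in this JDK.
        String received = transmit("\u4E2D", Charset.forName("GBK"), StandardCharsets.ISO_8859_1);
        // One Chinese character arrives as two unrelated Latin-1 letters (U+00D6, U+00D0).
        System.out.println(received.equals("\u00D6\u00D0")); // true
    }
}
```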

Analysis of the Causes of garbled characters

To let programs written in Java run on platforms in any language, Java represents characters internally in Unicode, which means converting between Unicode and the local character set. When Java reads local character data, it converts the data to Unicode; on output, it converts Unicode back to the local character encoding. For example, when the character "中" is read from the console on a Chinese system, what is actually read is its GBK code 0xD6D0. Java converts the GBK code to the Unicode code 0x4E2D, so the value stored in memory for "中" is 0x4E2D. When the data is written to the console, it is converted back to GBK.

Judging from the process above, the conversion is reversible and should not produce garbled characters. In practice, however, the browser, the Web server, the Web application, and the database may each use a different character set, and conversions between these character sets lead to garbled characters.
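The conversion between the GBK code 0xD6D0 and the Unicode code 0x4E2D described above can be checked with a short program. This is only an illustrative sketch: GbkRoundTrip and its methods are made-up names, and the GBK charset is assumed to be present in the JDK.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class: local bytes -> Unicode in memory -> local bytes again.
final class GbkRoundTrip {

    // Assumption: the GBK charset is available in this JDK.
    static final Charset GBK = Charset.forName("GBK");

    // Input direction: GBK bytes become a Java String (Unicode in memory).
    static String decodeGbk(byte[] bytes) {
        return new String(bytes, GBK);
    }

    // Output direction: the Java String becomes GBK bytes again.
    static byte[] encodeGbk(String text) {
        return text.getBytes(GBK);
    }

    public static void main(String[] args) {
        byte[] input = {(byte) 0xD6, (byte) 0xD0};
        String inMemory = decodeGbk(input);

        System.out.println(inMemory.length());                        // 1
        System.out.println(Integer.toHexString(inMemory.charAt(0)));  // 4e2d
        System.out.println(Arrays.equals(input, encodeGbk(inMemory))); // true
    }
}
```

As long as the same character set is used in both directions, the round trip is lossless.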

When converting from Unicode to another character set, if that character set has no code for the character, the result is 0x3F (the question mark "?"). When converting from another character set to Unicode, if the binary number is not recognized in that character set, the result is 0xFFFD.
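Both replacement values can be observed with the standard String conversions, which substitute the charset's default replacement instead of failing. The sketch below uses hypothetical names (LossyConversion and its methods) and picks ISO-8859-1 and UTF-8 as example target and source character sets.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows the two lossy conversions.
final class LossyConversion {

    // Unicode -> ISO-8859-1; characters without a Latin-1 code become '?' (0x3F).
    static byte[] encodeLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // UTF-8 -> Unicode; byte sequences that are not valid UTF-8 become U+FFFD.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] lost = encodeLatin1("\u4E2D");   // a Chinese character has no Latin-1 code
        System.out.println(lost.length);                         // 1
        System.out.println(Integer.toHexString(lost[0]));        // 3f

        String broken = decodeUtf8(new byte[]{(byte) 0xFF});     // 0xFF never occurs in valid UTF-8
        System.out.println(Integer.toHexString(broken.charAt(0))); // fffd
    }
}
```

Once either replacement has happened, the original character cannot be recovered, so a "?" or U+FFFD in the output usually points to the conversion step where the wrong character set was applied.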

Solutions to garbled Chinese text

Form data submitted with the POST method contains Chinese characters

By default the Web container uses ISO-8859-1, so in a servlet or JSP the string returned by the request object's getParameter() method has been decoded as ISO-8859-1. This is one cause of garbled text. To keep the container from decoding with ISO-8859-1, for form data submitted with the POST method, call request.setCharacterEncoding("GBK") before reading any request parameter; this tells the container that the request body is encoded in GBK. Before sending data to the browser, call response.setContentType("text/html;charset=GBK") to specify GBK as the encoding of the output.

For JSP pages, write the following code before obtaining the request parameter values:

<%request.setCharacterEncoding("gb2312");%>

To specify the encoding of the output, set the contentType attribute of the page directive:

<%@ page contentType="text/html;charset=GBK" %>

When the Web container translates the JSP page, it automatically adds the following code to the generated servlet class:

response.setContentType("text/html;charset=GBK");
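To see why the declared request encoding matters, the decoding step that a container performs on a POST body can be imitated with the JDK alone. This is a sketch, not the container's actual code: FormDecoding and decode are hypothetical names, the value %D6%D0 is what a browser would send for the GBK code 0xD6D0, and the GBK charset is assumed to be available.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class: decodes one percent-encoded form value with a chosen charset.
final class FormDecoding {

    static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            // Rethrown unchecked to keep the example short.
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String raw = "%D6%D0"; // form value as it travels in the request body

        // Decoded with the charset the page was actually submitted in.
        System.out.println(decode(raw, "GBK").equals("\u4E2D"));               // true

        // Decoded with the ISO-8859-1 default: two Latin-1 letters, that is, garbled text.
        System.out.println(decode(raw, "ISO-8859-1").equals("\u00D6\u00D0"));  // true
    }
}
```

Calling request.setCharacterEncoding("GBK") before the first getParameter() call corresponds to choosing the first of these two decodings.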

Form data submitted with the GET method contains Chinese characters

When a form is submitted with the GET method, the submitted data is appended to the URL as the query string and sent to the server. In this case calling setCharacterEncoding() on the server has no effect, because it applies only to the request body, so the encoding must be converted correctly after the request parameter value has been obtained:

String name = request.getParameter("name");
// Recover the raw bytes that the container decoded as ISO-8859-1, then decode them as GBK.
name = new String(name.getBytes("ISO-8859-1"), "GBK");
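The re-decoding trick above works because ISO-8859-1 maps every byte value to exactly one character, so the original bytes can always be recovered from the garbled string. The following sketch reproduces the situation without a servlet container; ReDecode and its methods are hypothetical names, and the GBK charset is assumed to be available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: repairs a string that was wrongly decoded as ISO-8859-1.
final class ReDecode {

    // What the container does by default: decode the raw bytes as ISO-8859-1.
    static String containerDecode(byte[] rawBytes) {
        return new String(rawBytes, StandardCharsets.ISO_8859_1);
    }

    // The repair: get the raw bytes back, then decode them with the real charset.
    static String fix(String garbled, String actualCharset) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, Charset.forName(actualCharset));
    }

    public static void main(String[] args) {
        byte[] sent = {(byte) 0xD6, (byte) 0xD0};   // GBK bytes sent by the browser

        String garbled = containerDecode(sent);
        System.out.println(garbled.length());                      // 2

        String repaired = fix(garbled, "GBK");
        System.out.println(repaired.equals("\u4E2D"));             // true
    }
}
```

The repair only works while the intermediate string is still the ISO-8859-1 view of the bytes; if a lossy conversion has already replaced characters with "?" or U+FFFD, the original bytes are gone.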

Reading Chinese data stored in the database

To fix garbled Chinese text read from the database, you only need to change the database's default encoding to GBK or GB2312.

Using a filter to solve garbled Chinese text

The filter code is as follows:

package com.shan.filter;

import java.io.IOException;
import javax.servlet.*;

public class EncodingFilter implements Filter {

    private String encoding = null;
    private FilterConfig filterConfig = null;

    // Read the "encoding" init parameter configured in web.xml.
    public void init(FilterConfig filterConfig) throws ServletException {
        this.filterConfig = filterConfig;
        this.encoding = filterConfig.getInitParameter("encoding");
    }

    public void destroy() {
        this.encoding = null;
        this.filterConfig = null;
    }

    // Set the request encoding before any parameter is read, then continue the chain.
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        if (null != encoding) {
            request.setCharacterEncoding(encoding);
        }
        chain.doFilter(request, response);
    }
}

The content of the web.xml configuration file is as follows:

<?xml version="1.0" encoding="ISO-8859-1"?>
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
                      http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
  version="3.0"
  metadata-complete="true">
  <filter>
    <filter-name>EncodingFilter</filter-name>
    <filter-class>com.shan.filter.EncodingFilter</filter-class>
    <init-param>
      <param-name>encoding</param-name>
      <param-value>gb2312</param-value>
    </init-param>
  </filter>
  <filter-mapping>
    <filter-name>EncodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
  </filter-mapping>
</web-app>

Making Tomcat support Chinese file names

A Web application may use files with Chinese names, but by default Tomcat cannot access such files. To make them accessible, edit the server.xml file in the conf folder under the Tomcat installation directory: locate the <Connector> element, add the URIEncoding attribute, and set its value to UTF-8, as shown below:

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" URIEncoding="UTF-8"/>

Most browsers encode URIs in UTF-8 when sending them. In a Web application, if you call the response.sendRedirect() method to redirect to a page with a Chinese file name, call it as follows:

response.sendRedirect(java.net.URLEncoder.encode("中文文件.html", "UTF-8"));

For other ways of accessing the file, you can write the Chinese file name directly without encoding it.
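What the redirect target looks like after encoding can be checked without a servlet container. The sketch below is illustrative only: RedirectTarget and its methods are hypothetical names, and the file name (written with \u escapes) is a made-up example of a Chinese file name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: percent-encodes a Chinese file name as UTF-8 and decodes it again.
final class RedirectTarget {

    static String encode(String fileName) {
        try {
            return URLEncoder.encode(fileName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported by the JDK
        }
    }

    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String fileName = "\u4E2D\u6587\u6587\u4EF6.html"; // example Chinese file name
        String encoded = encode(fileName);

        System.out.println(encoded); // %E4%B8%AD%E6%96%87%E6%96%87%E4%BB%B6.html
        System.out.println(decode(encoded).equals(fileName)); // true
    }
}
```

Each Chinese character becomes three percent-encoded UTF-8 bytes, which Tomcat decodes back to the file name when URIEncoding is set to UTF-8. Note that URLEncoder implements form encoding and turns a space into "+", so this approach is best kept to file names without spaces.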

When reprinting, please cite the source: http://blog.csdn.net/iAm333
