The ultimate solution to garbled Chinese characters in Java/J2EE

Source: Internet
Author: User

The oldest workaround is to convert the bytes of each String by hand. This is inconvenient: it breaks object encapsulation, because every piece of code that touches the string has to perform the byte conversion itself.
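The manual conversion described above can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and UTF-8 and ISO-8859-1 are assumed as the "actual" and "wrongly applied" character sets. The text is written with Unicode escapes so the example does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ManualReencode {

    // Undo a wrong decoding: turn the garbled string back into its original
    // bytes using the charset that was wrongly applied, then decode those
    // bytes with the charset they were really written in.
    static String repair(String garbled, Charset wronglyApplied, Charset actual) {
        byte[] rawBytes = garbled.getBytes(wronglyApplied);
        return new String(rawBytes, actual);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // the two characters of "Chinese text"

        // Simulate the bug: UTF-8 bytes decoded as ISO-8859-1.
        String garbled = new String(
                original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        String repaired = repair(garbled,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        System.out.println(original.equals(garbled));  // false
        System.out.println(original.equals(repaired)); // true
    }
}
```

The repair works here only because ISO-8859-1 maps every byte value to a character, so no bytes are lost. It also has to be repeated wherever such a string enters the code, which is the inconvenience the paragraph describes.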

Another method is to configure the encoding in the J2EE container. But if the J2EE application is moved out of that container, garbled characters reappear, and depending on container-specific configuration violates the J2EE principle of separating the application from the container.

Internally, Java represents every string as Unicode (UTF-16 code units), so any text entering the system must first be decoded from some external encoding. Which encoding is that? Unless told otherwise, Java assumes the default character set of the operating system, and the input and output of a Java system are likewise encoded with that default.
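A small sketch of this default behaviour, using only the JDK (the class and method names are hypothetical): decoding bytes without naming a character set is the same as decoding them with the platform default, so the result depends on the machine the code runs on.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {

    // Decodes without naming a charset, so the platform default is used.
    static String decodeWithDefault(byte[] bytes) {
        return new String(bytes);
    }

    // Decodes with an explicitly named charset, independent of the platform.
    static String decodeExplicitly(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // UTF-8 bytes of a two-character Chinese string.
        byte[] utf8Bytes = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);

        System.out.println("Platform default: " + Charset.defaultCharset());

        // Explicit decoding gives the same answer on every machine.
        System.out.println(decodeExplicitly(utf8Bytes, StandardCharsets.UTF_8).length());      // 2
        System.out.println(decodeExplicitly(utf8Bytes, StandardCharsets.ISO_8859_1).length()); // 6

        // Implicit decoding matches whichever charset the platform defaults to.
        boolean sameAsDefault = decodeWithDefault(utf8Bytes)
                .equals(decodeExplicitly(utf8Bytes, Charset.defaultCharset()));
        System.out.println(sameAsDefault); // true
    }
}
```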

Therefore, if the input, the output, and the operating system of a Java system all use the same character set, the system can process and display Chinese characters correctly. This is the basic principle for handling Chinese in Java. In real projects, however, it is hard to identify and control every input and output of the system. In J2EE the problem is especially prominent, because external browsers and databases are involved.

A J2EE application runs in a J2EE container, and it has several kinds of input. The first is the page form, packaged into a request and sent to the server. The second is data read from the database. The third is subtler: a JSP is compiled into a Servlet the first time it runs, and JSP source often contains Chinese characters. When javac compiles that source, it uses the operating system's default encoding as the source encoding unless another one is specified, for example in the project settings of JBuilder or Eclipse.
There are also several kinds of output. The first is the output of JSP pages: because a JSP has been compiled into a Servlet, its output is encoded with the operating system's default encoding unless an output encoding is specified. The second is the database, to which strings are written.

Seen this way, the input and output of a J2EE system are complex and dynamic, and Java runs across platforms, so compilation and execution may take place on different operating systems. If Java is left to choose the input and output character set from the operating system, the result is garbled characters that cannot be controlled.

Precisely because Java is cross-platform, the character set problem has to be solved by the application system itself. The fundamental solution to garbled Chinese text in a Java application is therefore to specify one unified character set for the entire application system.

Which unified character set should be specified: ISO8859_1, GBK, or UTF-8?

(1) Suppose ISO8859_1 is chosen, on the grounds that most software is written by Westerners whose default character set is ISO8859_1, including the Linux operating system and the MySQL database. If the unified encoding for Jive is set to ISO8859_1, the following three conditions must hold:

During development and compilation, the source character set is specified as ISO8859_1.

The default encoding of the operating system is ISO8859_1, as on Linux, for example.

The JSP header declares ISO8859_1 as the page character set.

(2) If the GBK Chinese character set is chosen, the same three conditions apply. The difference is that the application can then run only on an operating system whose default encoding is GBK, such as Chinese Windows.

Unifying on ISO8859_1 or GBK makes the code easy to compile, but the application can then run only on an operating system with the matching default encoding. This undermines Java's cross-platform advantage and works only within a limited range. For example, to run a GBK-encoded application on Linux, Linux itself must be set to GBK.

Is there a fundamental solution for Chinese encoding that requires no settings outside the application system itself?

Yes: define UTF-8 as the unified encoding of the Java/J2EE system. UTF-8 is an encoding that can represent all languages. The only trouble is that you have to find every entry and exit point of the application system and tie each one to UTF-8.
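
To illustrate why UTF-8 is the safer unified choice, the sketch below checks whether a piece of mixed text can be represented in a given character set. The class and method names are hypothetical, and the sample text is an assumption chosen for the demonstration; only JDK classes are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CoverageCheck {

    // True if every character of the text can be encoded in the charset.
    static boolean canRepresent(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    // Encodes and decodes with the same charset; characters the charset
    // cannot represent are silently replaced during encoding.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Two Chinese characters followed by a Western word with an accent.
        String mixed = "\u4e2d\u6587 caf\u00e9";

        System.out.println(canRepresent(mixed, StandardCharsets.ISO_8859_1)); // false
        System.out.println(canRepresent(mixed, StandardCharsets.UTF_8));      // true

        // ISO-8859-1 loses the Chinese characters; UTF-8 keeps everything.
        System.out.println(mixed.equals(roundTrip(mixed, StandardCharsets.ISO_8859_1))); // false
        System.out.println(mixed.equals(roundTrip(mixed, StandardCharsets.UTF_8)));      // true
    }
}
```
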
A J2EE application system requires the following steps:

Specify UTF-8 as the character set when developing and compiling code. Both JBuilder and Eclipse let you set this in the project properties.

Use a filter. If all requests pass through a Servlet controller that dispatches them, use a Servlet filter to set the encoding of every request coming from the browser to UTF-8. This is needed because the request sent by the browser is encoded according to the browser's operating system and may therefore arrive in any of several encodings. The key statement is:


request.setCharacterEncoding("UTF-8");

The source code of this filter is available on the Internet.
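
The sketch below is not the filter itself. It uses only the JDK to mimic, under simplifying assumptions, what happens when a URL-encoded form value is decoded with one character set or another, which is the choice the request encoding controls. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class FormDecodeDemo {

    // Encodes a form value the way a browser would for a page in the given charset.
    static String encodeValue(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // Decodes a form value using the charset the server believes the request uses.
    static String decodeValue(String encodedValue, String charsetName) {
        try {
            return URLDecoder.decode(encodedValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String typed = "\u4e2d\u6587"; // two Chinese characters typed into a form
        String onTheWire = encodeValue(typed, "UTF-8");
        System.out.println(onTheWire); // %E4%B8%AD%E6%96%87

        // Server assumes the wrong charset: six garbled characters.
        System.out.println(decodeValue(onTheWire, "ISO-8859-1").length()); // 6

        // Server assumes UTF-8: the original two characters come back.
        System.out.println(typed.equals(decodeValue(onTheWire, "UTF-8"))); // true
    }
}
```
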
Declare in the JSP header:

<%@ page contentType="text/html; charset=UTF-8" %>

In the HTML code of the JSP, declare UTF-8 as well:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

Set the database connection to use UTF-8. For example, when connecting to MySQL, configure the URL as follows:

jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=UTF-8

Most databases can also be set to UTF-8 through their administration settings.

Wherever else the system interacts with the outside world and an encoding can be set, such as when reading files or processing XML, set it to UTF-8.
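
For file access, one way to follow this advice is to name UTF-8 explicitly whenever a reader or writer is opened, rather than relying on the platform default. The following is a minimal sketch using the JDK's `java.nio.file.Files` helpers; the class name, method name, and temporary file prefix are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8FileIo {

    // Writes one line of text to a temporary file as UTF-8 and reads it back
    // as UTF-8. The platform default charset is never consulted.
    static String roundTrip(String line) {
        try {
            Path file = Files.createTempFile("utf8-demo", ".txt");
            try {
                try (BufferedWriter writer =
                        Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                    writer.write(line);
                }
                try (BufferedReader reader =
                        Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                    return reader.readLine();
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters
        System.out.println(text.equals(roundTrip(text))); // true
    }
}
```
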

I first adopted this principle with JSP/Servlet. Later, when I used Struts, Tapestry, EJB, Hibernate, Jdon, and other frameworks, I was never troubled by garbled characters, so the approach can be said to suit a wide range of architectures. I hope more beginners will share this solution, so that the first obstacle in learning Java/J2EE is lowered and temporary workarounds do not carry Chinese encoding problems into new technical architectures.
