Several Principles for Analyzing Chinese Character Problems in Java

Last Update:2013-12-12 Source: Internet

Author: User


Java's Chinese character handling problems have been discussed many times, yet they remain hard to pin down. Java technology covers a wide range of content (J2EE alone includes more than a dozen related technologies), there are many vendors, and Java-oriented Web servers, application servers, and JDBC database drivers follow no official standard for character handling. As a result, Java applications have inherent problems when processing Chinese characters, and those problems also vary with the choice of server and driver, which adds to the complexity. So how can we find the crux of the problem amid such complicated symptoms?

General Solutions to Java Chinese problems

In fact, Java's Chinese problems arise from a difference between the default encoding used by the Java application and the encoding of the characters that the application reads or that its target expects (see document 1 for details). There are usually four ways to solve them:

1) Use the localized Chinese version of the JDK. The Chinese localized version of the Java 2 JDK (http://java.sun.com/products/jdk/1.2/chinesejdk.html) is not an official release, and Sun does not promise to upgrade it, but it is still one solution to the Java Chinese problem.

2) Select appropriate compilation parameters. With the international version of Java, we can tell the compiler which encoding the source files use, so that the compiled application handles Chinese correctly. For example, compile the source program with javac -encoding big5 sourcefile.java or javac -encoding gb2312 sourcefile.java.
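Why the compiler needs to know the source encoding can be illustrated with a small sketch. It is not how javac is implemented; it only shows that the same bytes on disk yield different strings depending on the charset used to decode them. The class and method names are hypothetical, and the sketch assumes a JDK that ships the GB2312 charset.

```java
import java.nio.charset.Charset;

public class SourceEncodingDemo {
    // Decode raw file bytes with the named charset, much as a compiler
    // must decode a source file before it can read the string literals.
    static String decode(byte[] fileBytes, String charsetName) {
        return new String(fileBytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String literal = "\u4e2d\u6587"; // the two characters of "Chinese"
        // Pretend this is how the literal was saved in a GB2312 source file.
        byte[] onDisk = literal.getBytes(Charset.forName("GB2312"));

        // Matching charset: the literal survives.
        System.out.println(decode(onDisk, "GB2312").equals(literal));     // true
        // Mismatched charset: four unrelated Latin-1 characters appear instead.
        System.out.println(decode(onDisk, "ISO-8859-1").equals(literal)); // false
    }
}
```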

3) Convert character encodings in code. Solving Java's Chinese problem through programming has become common practice. The following is the most common conversion function; it takes a string whose bytes were decoded as ISO-8859-1 and re-decodes them as GBK, the encoding used by Chinese Windows systems.

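// Requires: import java.io.UnsupportedEncodingException;
// Re-decodes a string that was wrongly decoded as ISO-8859-1 into GBK text.
public static String toChinese(String strvalue) {
    if (strvalue == null) {
        return null;
    }
    try {
        // Recover the original bytes, then decode them as GBK.
        return new String(strvalue.getBytes("ISO8859_1"), "GBK");
    } catch (UnsupportedEncodingException e) {
        // One of the charset names is not supported by this JVM.
        return null;
    }
}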
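To see what this function repairs, the following sketch simulates the failure and then undoes it. It assumes the garbling came from GBK bytes being decoded as ISO-8859-1 (for example by a server or driver), which is the case the function targets. The class name is hypothetical, and the function is restated with Charset objects so the sketch is self-contained.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ToChineseDemo {
    static final Charset GBK = Charset.forName("GBK");

    // Same conversion as above, written with Charset objects
    // so that no checked exception has to be handled.
    static String toChinese(String value) {
        if (value == null) {
            return null;
        }
        return new String(value.getBytes(StandardCharsets.ISO_8859_1), GBK);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";
        // Simulate a component that decoded GBK bytes as ISO-8859-1.
        String garbled = new String(original.getBytes(GBK), StandardCharsets.ISO_8859_1);

        System.out.println(garbled.equals(original));            // false
        System.out.println(toChinese(garbled).equals(original)); // true
    }
}
```

The repair works because ISO-8859-1 maps every byte value to exactly one character, so the original bytes can be recovered without loss.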

4) Define the output character set. For JSP applications, use <%@ page contentType="text/html; charset=GBK" %> or <%@ page contentType="text/html; charset=GB2312" %> to define the character set of the JSP page's output. We can also define it with the HTML tag <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=gb2312">.

Problems

Based on how they are implemented, the four methods above fall into two categories. The first relies on standards or conventions; methods 1), 2), and 4) belong to this category. The second relies on targeted programming; method 3) belongs to this category.

Because methods 1), 2), and 4) are convention-based, they are relatively simple to apply, and the fix is general rather than targeted. For example, with method 2) we compile the Java source files with a preset internal encoding, without having to work out which part of the source code actually runs into the Chinese processing problem, such as garbled output.

However, because these methods are untargeted and apply one uniform fix, in some cases they cannot completely solve the problem. Here is a very common example. Java applications often need to interact with other interfaces, such as accessing a database through a particular version of JDBC. JDBC drivers support different encodings depending on the vendor and version, so if the driver does not handle Chinese correctly on its own, we must perform two opposite encoding conversions: one when data is written to the database and the reverse when it is read back. Methods 1), 2), and 4) usually cannot do this.

For method 2), some techniques can cover this case. The most effective is to componentize the application as much as possible; for example, we can place the database read code and write code in different source files and compile each with the encoding it needs. But ordinary program design is unlikely to allow this, because the resulting division of the program is likely to be unreasonable. Encapsulating a database's read and write methods in one class is a sound design; implementing the two methods in two separate files is not. So although methods 1), 2), and 4) are relatively simple to implement, they have shortcomings that cannot be overcome. This is also why the programming approach prevails even though it is more complicated to implement.
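The two opposite conversions can be sketched as a pair of helper methods. This is a minimal illustration, not a description of any particular driver: it assumes a hypothetical driver that treats every string as ISO-8859-1 and an application that works in GBK. The class and method names are invented for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DbTextCodec {
    // Assumption: the application's text is GBK-encodable Chinese.
    static final Charset APP = Charset.forName("GBK");
    // Assumption: the (hypothetical) driver handles strings as ISO-8859-1.
    static final Charset DRIVER = StandardCharsets.ISO_8859_1;

    // Apply before writing: re-express the GBK bytes as the
    // one-byte-per-char string the driver will pass through unchanged.
    static String toDriver(String s) {
        return s == null ? null : new String(s.getBytes(APP), DRIVER);
    }

    // Apply after reading: the exact reverse of toDriver.
    static String fromDriver(String s) {
        return s == null ? null : new String(s.getBytes(DRIVER), APP);
    }

    public static void main(String[] args) {
        String name = "\u4e2d\u6587";
        String stored = toDriver(name);       // what would be handed to the driver
        String loaded = fromDriver(stored);   // what the application sees after reading
        System.out.println(loaded.equals(name)); // true
    }
}
```

Keeping both directions in one class matches the design point above: reads and writes stay together, and only the conversion direction differs.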

Compared with methods 1), 2), and 4), method 3) is more targeted and more flexible. The program can handle each situation differently and convert character encodings wherever needed. However, this places higher demands on developers: they must accurately identify the places where Chinese processing problems may occur, and judge and handle them correctly.
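One rough way to spot such places during debugging is to check whether a string looks like multi-byte text that was decoded one byte per character. The sketch below is only a heuristic under that assumption: it flags strings that contain characters in the range U+0080 to U+00FF and nothing above U+00FF. Genuine Western European text with accented letters would also be flagged, so it is a hint for investigation, not proof. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeCheck {
    // Heuristic: true if the string has at least one char in
    // U+0080..U+00FF and no char above U+00FF.
    static boolean looksMisdecoded(String s) {
        if (s == null) {
            return false;
        }
        boolean sawHighLatin = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0xFF) {
                return false; // real CJK (or other wide) characters are present
            }
            if (c >= 0x80) {
                sawHighLatin = true;
            }
        }
        return sawHighLatin;
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";
        String garbled = new String(original.getBytes(Charset.forName("GBK")),
                                    StandardCharsets.ISO_8859_1);
        System.out.println(looksMisdecoded(original));      // false
        System.out.println(looksMisdecoded(garbled));       // true
        System.out.println(looksMisdecoded("plain ASCII")); // false
    }
}
```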

Analysis principles

In general, none of the solutions to Java's Chinese processing problems is very complicated. What is complicated is discovering the problems correctly and promptly, because Java technology, and J2EE in particular, spans so many Web servers, application servers, and JDBC database drivers. How can we find these problems?

Generally, problems in Java's handling of Chinese characters arise from a difference between the default encoding used by the Java application and the encoding of the characters that the application reads or that its target expects. One of the main causes of this difference is that the Java application exchanges data with other applications using mismatched encodings (through direct or indirect data input and output). To detect problems promptly, we can start from this point and analyze the application according to the following principles:

Watch character variables. The encoding behind a character variable is largely hidden, so assignments and operations between variables can silently mix character sets. When variables are combined with data submitted from a page, it is easy to end up operating on characters in different encodings.
Watch every form of character input and output. Most Java applications are developed as network applications, so compared with applications in other languages they exchange far more character data over the network: form submissions, data read from URLs, character data exchanged after encryption, results selected in Web controls, the displayed content of controls (such as list controls), and so on.
Use third-party components and applications with caution. Their implementations are opaque, so it is generally hard to determine the default encoding of these components or drivers, and we cannot control it. Pay special attention when exchanging data through the interfaces they provide. If Chinese is not processed correctly, we should first check our own code and adapt it to these interfaces, because these components or applications generally provide no way to adjust their encoding. If necessary, we may need to switch to alternative components or applications.
Watch the data input and output contained in requested objects. This case is easy to overlook. When our application interacts with an object (such as a serialized object), if

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
