Chinese in Java

Source: Internet
Author: User
Abstract: Many discussions already cover the problems Java applications encounter when processing Chinese characters. Unlike most of them, this article examines those problems from the perspective of how Chinese characters are input and output.

Although Java's problems with Chinese characters have been widely discussed, Java technology covers a wide range of content (J2EE alone includes more than a dozen related technologies) and involves many vendors. There is no official standard governing how Java-oriented Web servers, application servers, and JDBC database drivers handle character encodings, so Java applications have inherent problems when processing Chinese characters. In addition, the choice of server and driver can introduce platform-specific problems. In other words, the portability of Java code is reduced where Chinese text is concerned.

In general, Java's Chinese processing problems are concentrated in JSP applications and in Java database access. Both involve interaction between a Java program and another system, and that interaction inevitably requires exchanging data and passing parameters between systems. The problems usually arise at the points where data is read in and written out.

Notes on Chinese in JSP programs

Take a JSP application running on Tomcat 3.2.1 as an example. If you encounter a Chinese character problem, you can use the following forced encoding conversion function to convert the internal code.

public static String toChinese(String strvalue)
{
    try {
        if (strvalue == null)
            return null;
        else
        {
            // Recover the raw bytes via ISO-8859-1, then decode them as GBK
            strvalue = new String(strvalue.getBytes("ISO8859_1"), "GBK");
            return strvalue;
        }
    } catch (Exception e) {
        return null;
    }
}

Note: before using this function, analyze why Chinese text is not being output correctly instead of treating the function as a cure for every Chinese processing problem. For example, if you forget to declare the JSP output encoding as GB2312 or GBK, this function will not make Chinese characters display correctly. A good habit is to declare the output character set on the first line of every JSP page, such as:

<%@ page contentType="text/html; charset=GBK" %> or <%@ page contentType="text/html; charset=GB2312" %>

For JSP versions that do not support defining output character sets, we can also make the following settings:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=gb2312">

In addition, note that this function is a remedy for code that fails to output Chinese characters correctly, not a general-purpose function for guaranteeing correct output. Chinese characters fail to be output or read correctly because the encoding of the characters differs from the default character set encoding (or from the character set the application outputs, which is usually the same). So before applying this function, we must determine whether the encoding of the characters we want to read or output matches the default character set encoding.
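As a rough illustration of the check described above, the sketch below compares the encoding of incoming data with the output character set before deciding whether a conversion is needed at all. The class name CharsetCheck and the method needsConversion are hypothetical names introduced only for this illustration; they are not part of the original article or of any library.

```java
import java.nio.charset.Charset;

public class CharsetCheck {
    // Hypothetical helper: returns true when data encoded in dataCharsetName
    // differs from the charset the application outputs, meaning an explicit
    // conversion may be required before output.
    public static boolean needsConversion(String dataCharsetName, String outputCharsetName) {
        Charset data = Charset.forName(dataCharsetName);
        Charset output = Charset.forName(outputCharsetName);
        // Charset.equals compares canonical names, so aliases are treated as equal.
        return !data.equals(output);
    }

    public static void main(String[] args) {
        // The platform default is what new String(byte[]) and getBytes() fall back to.
        System.out.println("Platform default charset: " + Charset.defaultCharset());
        System.out.println(needsConversion("ISO-8859-1", "GBK"));
        System.out.println(needsConversion("ISO8859_1", "ISO-8859-1"));
    }
}
```

Only when the two character sets actually differ is a function such as toChinese worth considering.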
The following example shows correct and incorrect use of the function. In this example, the JSP container is Tomcat 3.2.1, and both the client and the server run Chinese Windows 2000.

Example 1

<%@ page contentType="text/html; charset=GBK" %>
<html>
<head>
<title>
TestJSP
</title>
</head>
<body>
<h1>
<%
class testChina extends Object {
    public String toChinese(String strvalue) {
        try {
            if (strvalue == null)
                return null;
            else {
                strvalue = new String(strvalue.getBytes("ISO8859_1"), "GBK");
                return strvalue;
            }
        } catch (Exception e) {
            return null;
        }
    }
    public void test() {
    }
}
testChina testC = new testChina();
// Note: use a Chinese string literal here; ASCII text passes through these conversions unchanged.
String str1 = new String("this is a test of Chinese support".getBytes("GBK"));
String str2 = new String("this is a test of Chinese support".getBytes("GBK"), "ISO-8859-1");
String str3 = new String(testC.toChinese(str2));
out.println("Begin <br>");
out.println("str1 ");
out.println(str1 + "<br>");
out.println("str2 ");
out.println(str2 + "<br>");
out.println("str3 ");
out.println(str3 + "<br>");
out.println("End <br>");
System.getProperties().list(System.out);
%>
</h1>
</body>
</html>

We know that the Java language represents characters internally as Unicode, but the Java compiler reads source files using the operating system's default character set: GBK on Chinese Windows, and ISO-8859-1 (or the closely related Cp1252 on Windows) on English systems. In Example 1, the system default character set is GBK and the JSP output character set is also GBK, so the two are consistent. str1 uses the default character set encoding. str2 is deliberately converted to ISO-8859-1 to produce a result that cannot be displayed correctly as Chinese. str3 is a correct use of the toChinese function of the testC class: it repairs the output error of str2. Applying toChinese to a string that is already correct, such as str1, would be an incorrect use, because it converts correctly encoded characters into an encoding that does not match the system character set. Therefore, we must analyze the cause of abnormal character output before using the toChinese function. So how can we identify the characters that may cause problems? Note the following principles:

1) Focus on string variables. Because the encoding of a variable's contents is not visible, assignments and operations between variables can mix character sets without warning. When working with variables and with data submitted from a page, it is easy to end up operating on characters in different encodings.

2) Pay attention to character input and output. Most conflicts between a character's encoding and the target encoding occur while characters are being read or output, for example during form submission, URL parameter retrieval, and display of control content (such as a List control).

3) Test where necessary. Because Java's Chinese problems can vary with the Web server, browser, runtime environment, and development tools, we should run targeted tests to avoid them.
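A targeted test of the kind suggested in principle 3 can be run outside a JSP container. The sketch below reproduces the str2 and str3 steps of Example 1 on a two-character Chinese string written with Unicode escapes, so the result does not depend on the source file encoding. The class name ToChineseDemo and the helper misdecode are assumptions made for this illustration, and the sketch assumes the JDK in use provides the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ToChineseDemo {
    private static final Charset GBK = Charset.forName("GBK");

    // Same conversion as toChinese: re-read the ISO-8859-1 bytes as GBK.
    public static String toChinese(String value) {
        if (value == null) {
            return null;
        }
        return new String(value.getBytes(StandardCharsets.ISO_8859_1), GBK);
    }

    // Hypothetical helper: simulates a component that decodes GBK bytes
    // as ISO-8859-1, which is how str2 is produced in Example 1.
    public static String misdecode(String value) {
        return new String(value.getBytes(GBK), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters, written as escapes
        String garbled = misdecode(original);    // the str2 case
        String repaired = toChinese(garbled);    // the str3 case: correct use
        String damaged = toChinese(original);    // incorrect use on a correct string

        System.out.println(repaired.equals(original)); // true
        System.out.println(damaged.equals(original));  // false
    }
}
```

The last line shows why the function must not be applied blindly: Chinese characters cannot be represented in ISO-8859-1, so converting a string that is already correct loses the original text.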

Of course, forced encoding conversion is not the only way to solve Java's Chinese problems. We can also use the following methods:

1) Compile the source program with javac -encoding big5 sourcefile.java or javac -encoding gb2312 sourcefile.java.

2) Use the Chinese localized version of the Java 2 JDK (http://java.sun.com/products/jdk/1.2/chinesejdk.html). However, this is an unofficial version, and Sun does not guarantee that it will be upgraded.

Chinese problems during Database Access

After the above discussion, it is not difficult to understand the Chinese problems in the database access process.

At present, most JDBC drivers are not designed for Chinese systems (most of them handle Chinese data as ISO-8859-1), so character encoding conversion is often needed when reading and writing data.

If the system runs on the Chinese operating system platform:

1) the following code can be used to read Chinese characters:

strChinese = new String(rs.getObject(j).toString().getBytes("ISO-8859-1"));

On the Win2000 platform, you can read Chinese text through the JDBC driver provided by WebLogic 6.0, as shown in the following code (the example also performs string concatenation):

Driver myDriver = (Driver) Class.forName("weblogic.jdbc.mssqlserver4.Driver").newInstance();
Connection conn = myDriver.connect("jdbc:weblogic:mssqlserver4", props);
conn.setCatalog("labmanager");
Statement st = conn.createStatement();
// execute a query
String testStr;
String testTempStr = new String();
testStr = new String(testTempStr.getBytes("ISO-8859-1")); // encoding conversion
DatabaseMetaData DBMetaData = conn.getMetaData();
// getTables takes catalog, schema pattern, table name pattern, and table types
ResultSet rs = DBMetaData.getTables(null, null, null, new String[] {"TABLE"});
while (rs.next()) {
    for (int j = 1; j <= rs.getMetaData().getColumnCount(); j++) {
        // Recover the raw bytes and decode them with the platform default charset
        testStr = testStr + new String(rs.getObject(j).toString().getBytes("ISO-8859-1"));
    }
}


2) Chinese output. Writing Chinese text is exactly the inverse of reading it: we need to convert characters from the system default encoding to the ISO-8859-1 encoding that the JDBC driver expects. The code can be written as follows:

byte[] tempBytes = strInput.getText().getBytes();
String SQLstr = new String(tempBytes, "ISO-8859-1");
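The following sketch illustrates why the write conversion and the read conversion are inverses of each other. ISO-8859-1 maps every byte value to exactly one character, so a string built this way carries the original bytes unchanged, one byte per character. The class name JdbcEncodingSketch and the methods toDriver and fromDriver are hypothetical names used only here. Unlike the snippets above, the sketch passes the character set explicitly instead of relying on the platform default, and it runs without a database.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class JdbcEncodingSketch {
    // Write direction: encode the text, then pack each byte into one
    // ISO-8859-1 character so the driver passes the bytes through unchanged.
    public static String toDriver(String text, Charset dataCharset) {
        if (text == null) {
            return null;
        }
        return new String(text.getBytes(dataCharset), StandardCharsets.ISO_8859_1);
    }

    // Read direction: unpack the bytes and decode them with the real charset.
    public static String fromDriver(String raw, Charset dataCharset) {
        if (raw == null) {
            return null;
        }
        return new String(raw.getBytes(StandardCharsets.ISO_8859_1), dataCharset);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumes the JDK ships the GBK charset
        String text = "\u4e2d\u6587";         // two Chinese characters
        String wire = toDriver(text, gbk);

        System.out.println(text.length());                      // 2
        System.out.println(wire.length());                      // 4, one char per GBK byte
        System.out.println(fromDriver(wire, gbk).equals(text)); // true
    }
}
```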

Note that different JDBC drivers support the same database differently, and the same JDBC driver supports different databases differently. In other words, whenever the JDBC driver changes, the character conversion code must be retested to confirm that it still works and is still needed; otherwise the conversion may be superfluous. For example, with I-net Una 2000 Driver Version 2.03 for MS SQL Server, no encoding conversion is needed for Chinese text to work correctly. However, since many JDBC drivers do not explicitly document their support for Chinese characters, we recommend testing whenever you use JDBC.
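One possible shape for such a driver test is to write a known Chinese string, read it back, and classify the result. The sketch below covers only the classification step and takes the two strings as plain arguments, so it makes no assumptions about any particular driver. The class DriverEncodingProbe, its Result enum, and the classify method are hypothetical names introduced for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DriverEncodingProbe {
    public enum Result {
        NO_CONVERSION_NEEDED,          // the driver returned the text unchanged
        NEEDS_ISO_8859_1_CONVERSION,   // the text is recoverable by re-decoding
        UNRECOGNIZED                   // some other transformation, or data loss
    }

    // written: the string the test stored; readBack: what the driver returned.
    public static Result classify(String written, String readBack, Charset dataCharset) {
        if (written.equals(readBack)) {
            return Result.NO_CONVERSION_NEEDED;
        }
        // Try the read-side conversion discussed in the article.
        String converted =
                new String(readBack.getBytes(StandardCharsets.ISO_8859_1), dataCharset);
        if (written.equals(converted)) {
            return Result.NEEDS_ISO_8859_1_CONVERSION;
        }
        return Result.UNRECOGNIZED;
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumes the JDK ships the GBK charset
        String written = "\u4e2d\u6587";
        // Simulated read-back values; a real test would obtain them from a ResultSet.
        String passThrough = written;
        String latin1View = new String(written.getBytes(gbk), StandardCharsets.ISO_8859_1);

        System.out.println(classify(written, passThrough, gbk));
        System.out.println(classify(written, latin1View, gbk));
        System.out.println(classify(written, "??", gbk));
    }
}
```

Running the classification again after every driver or database change shows whether the conversion code is still required.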

Conclusion

In fact, the root cause of Java's Chinese processing problems is that the encoding of the Chinese characters (variables) being handled differs from the target encoding, and these problems all arise while characters are being read and output. As long as we keep this in mind, we can better understand and handle Java's Chinese problems.
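Since the problems arise where characters are read and written, one way to keep the two sides consistent is to name the character set explicitly at every such boundary rather than relying on the platform default. The sketch below is a minimal illustration using in-memory streams. The class name ExplicitCharsetIo and its two methods are assumptions made for this example and do not come from the original article.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

public class ExplicitCharsetIo {
    // Output boundary: encode the text with an explicitly named charset.
    public static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Input boundary: decode the bytes with an explicitly named charset.
    public static String read(byte[] bytes, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            char[] buffer = new char[256];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumes the JDK ships the GBK charset
        String text = "\u4e2d\u6587";
        byte[] encoded = write(text, gbk);

        // Matching charsets on both sides restore the text.
        System.out.println(read(encoded, gbk).equals(text)); // true
        // A mismatched reader reproduces the garbling discussed in this article.
        System.out.println(read(encoded, Charset.forName("ISO-8859-1")).equals(text)); // false
    }
}
```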