A Discussion of Chinese Character Problems in Java

Source: Internet
Author: User
Abstract: There has been a great deal of discussion about the problems Java applications have in handling Chinese. Unlike most of those discussions, this article examines Java's Chinese processing problems from the angle of the input and output of Chinese characters.

Although there is no shortage of discussion about the problems Java has in handling Chinese characters, there is no official standard solution. Java technology covers a wide range of content (more than 10 related technologies) and involves many vendors supplying Java-oriented Web servers, application servers, and JDBC database drivers. As a result, besides the inherent problems Java applications have in processing Chinese, there are also platform-related problems that depend on the servers and drivers chosen. In other words, the portability of Java code is compromised when it comes to handling Chinese.

In general, Java's Chinese processing problems are concentrated in JSP applications and in Java database access. Both involve interaction between a Java program and another system, which inevitably requires data exchange and parameter passing between the two. Chinese problems in Java therefore tend to appear where data is read in and written out.

Chinese problems to watch for in JSP programs

Taking a JSP application on Tomcat 3.2.1 as an example, the following encoding conversion function can generally be used to solve the Chinese character problem.

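public static String toChinese(String strValue)
{
    try {
        if (strValue == null)
            return null;
        else
        {
            // Recover the raw bytes of a string that was wrongly decoded as
            // ISO-8859-1, then decode those bytes as GBK.
            strValue = new String(strValue.getBytes("ISO8859_1"), "GBK");
            return strValue;
        }
    } catch (Exception e) {
        // Unsupported encoding: no conversion is possible
        return null;
    }
}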

Note that before using this function we need to analyze why the Chinese text is not displayed correctly; not every Chinese display problem can be solved this way. For example, if you forget to declare the JSP output encoding as GB2312 or GBK, the Chinese output will be garbled and this function will not fix it. A good habit is to declare the character set the page will output on the first line of every JSP file, such as

<%@ page contentType="text/html; charset=GBK" %> or <%@ page contentType="text/html; charset=gb2312" %>

For some JSP versions that do not support this output declaration, we can also use the following setting:

<meta http-equiv="Content-Type" content="text/html; charset=gb2312">

Also note that this function is meant to repair text that is not output correctly; it is not a general-purpose function that guarantees correct output of Chinese characters. Chinese characters fail to be output or read correctly because their encoding differs from the system default character set encoding (or from the output character set, which is generally the same). So before applying this function we must determine whether the encoding of the characters we are about to read or output is the same as the system default encoding.
The following example shows correct and incorrect uses of the function. The example runs as a JSP on Tomcat 3.2.1; both the client and the server run Chinese Windows 2000.

Example 1

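<%@ page contentType="text/html; charset=GBK" %>
<title>
testjsp
</title>
<body>
<h1>
<%
// Helper class declared locally inside the scriptlet
class TestChina extends Object {
    // Re-decode a string whose GBK bytes were wrongly read as ISO-8859-1
    public String toChinese(String strValue)
    {
        try {
            if (strValue == null)
                return null;
            else
            {
                strValue = new String(strValue.getBytes("ISO8859_1"), "GBK");
                return strValue;
            }
        } catch (Exception e) {
            return null;
        }
    }
    public void test() {
    }
}
TestChina testC = new TestChina();
// The string literals below stand for a Chinese sentence; they appear here in English translation.
// str1: GBK bytes decoded with the system default character set
String str1 = new String("This is a test for Chinese support".getBytes("GBK"));
// str2: GBK bytes deliberately decoded as ISO-8859-1 (garbled)
String str2 = new String("This is a test for Chinese support".getBytes("GBK"), "ISO-8859-1");
// str3: str2 repaired with toChinese
String str3 = new String(testC.toChinese(str2));
out.println("Begin <br>");
out.println("str1");
out.println(str1 + "<br>");
out.println("str2");
out.println(str2 + "<br>");
out.println("str3");
out.println(str3 + "<br>");
out.println("End <br>");
// Dump the system properties (including file.encoding) to the server console
System.getProperties().list(System.out);
%>
</h1>
</body>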
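The same experiment can be sketched outside a JSP container. The block below is a minimal illustration, not the article's own code: the class name ToChineseDemo is hypothetical, the sample text is the two characters for "Chinese" written as Unicode escapes so that the source file encoding does not matter, and the charsets are passed explicitly instead of relying on the platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical standalone demo of the toChinese idea from Example 1.
final class ToChineseDemo {
    // GBK is provided by the JDK's extended charsets (assumed to be installed).
    static final Charset GBK = Charset.forName("GBK");

    // Re-decode a string whose GBK bytes were wrongly read as ISO-8859-1.
    static String toChinese(String s) {
        if (s == null) {
            return null;
        }
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), GBK);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters
        // Simulate the fault: GBK bytes decoded as ISO-8859-1 (like str2).
        String misread = new String(original.getBytes(GBK), StandardCharsets.ISO_8859_1);
        // Correct use: repair the misread string (like str3).
        String repaired = toChinese(misread);
        // Incorrect use: "repair" a string that was never broken.
        String damaged = toChinese(original);

        System.out.println("repaired equals original: " + original.equals(repaired));
        System.out.println("damaged equals original:  " + original.equals(damaged));
    }
}
```

In this sketch the incorrect use loses information: characters that ISO-8859-1 cannot represent are replaced by question marks when the bytes are extracted, so the original text cannot be recovered afterwards.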


As we know, Java represents characters internally in Unicode, but the Java compiler reads source files in the operating system's default character set, which is GBK on Chinese Windows and ISO-8859-1 on English systems. In Example 1 the system default character set is GBK and the JSP output character set is also GBK, so the two are consistent. For str1 we use the system default character set encoding. For str2 we deliberately convert the text to ISO-8859-1 so that the Chinese cannot be output correctly. Applying toChinese to a string such as str1 would be an incorrect use of the function: it turns characters that were originally correct into an encoding that does not match the system character set and so causes garbled Chinese output. str3 is the correct use of the toChinese function of the TestChina class; it repairs the output error in str2. So we must correctly analyze why the output is wrong before using the toChinese function. How, then, do we tell which characters might be problematic? Here are a few key principles to note:

1) Pay most attention to string variables. The encoding of a variable's contents is hidden, and passing variables around and operating on them can change the character set involved. Operations on variables and on data submitted from pages are where characters in different encodings are most easily mixed.

2) Pay attention to where characters are read in and written out. Most conflicts between a character's encoding and the target encoding occur during reading and output, for example form submission, URL access, and the display of control contents (such as a list control).

3) Test when necessary. Because Java's Chinese problems vary with the Web server, browser, operating environment, and development tools, targeted testing is needed to avoid them.
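As one possible form of the targeted testing mentioned in point 3, the sketch below checks whether a string looks like GBK text that was read as ISO-8859-1. It is only a heuristic under stated assumptions: the class and method names are hypothetical, and ordinary Western European text with accented letters can occasionally pass the check as well.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper; a heuristic, not a guarantee.
final class EncodingCheck {
    static boolean looksLikeGbkReadAsLatin1(String s) {
        boolean hasHighChar = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0xFF) {
                return false; // real non-Latin characters: not misread this way
            }
            if (c >= 0x80) {
                hasHighChar = true;
            }
        }
        if (!hasHighChar) {
            return false; // pure ASCII reads the same in both encodings
        }
        // Strict decoder: fail instead of silently substituting characters.
        CharsetDecoder strict = Charset.forName("GBK").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strict.decode(ByteBuffer.wrap(s.getBytes(StandardCharsets.ISO_8859_1)));
            return true; // the bytes form valid GBK
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The GBK bytes of two Chinese characters, decoded as ISO-8859-1.
        String misread = "\u00d6\u00d0\u00ce\u00c4";
        System.out.println(looksLikeGbkReadAsLatin1(misread));
        System.out.println(looksLikeGbkReadAsLatin1("\u4e2d\u6587"));
    }
}
```

A check like this is best used in logging or test code, to find the places where a conversion is actually needed, rather than as a condition for converting automatically.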

Of course, the solutions to Java's Chinese problems are not limited to forcing an encoding conversion on output. The following methods can also be used:

1) Compile the source program with javac -encoding Big5 Sourcefile.java or javac -encoding gb2312 Sourcefile.java.

2) Use the Chinese localized version of the Java 2 JDK (http://java.sun.com/products/jdk/1.2/chinesejdk.html). Note, however, that this is an unofficial version and Sun does not guarantee upgrades for it.

Chinese problems during database access

After the discussion above, the Chinese problems that arise during database access are not hard to understand.

At present, most JDBC drivers are not designed for Chinese systems (they mostly handle Chinese data as ISO-8859-1), so it is often necessary to convert the character encoding when reading and writing data.

If the system runs on a Chinese operating system platform, then:

1) Reading Chinese. When reading Chinese characters, you can use the following code:

strChinese = new String(rs.getObject(j).toString().getBytes("ISO-8859-1"));

On the Windows 2000 platform, code that reads Chinese through the JDBC driver provided by WebLogic 6.0 can be written as follows (using string operations as the example):

// props: connection properties, assumed to be defined elsewhere
Driver myDriver = (Driver) Class.forName("weblogic.jdbc.mssqlserver4.Driver").newInstance();
conn = myDriver.connect("jdbc:weblogic:mssqlserver4", props);
conn.setCatalog("Labmanager");
Statement st = conn.createStatement();
// execute a query
String testStr;
String testTempStr = new String();
testStr = new String(testTempStr.getBytes("ISO-8859-1")); // encoding conversion
DatabaseMetaData dbMetaData = conn.getMetaData();
ResultSet rs = dbMetaData.getTables(null, null, null, new String[]{"TABLE"});
while (rs.next()) {
    for (int j = 1; j <= rs.getMetaData().getColumnCount(); j++) {
        // Re-decode each column value using the system default character set
        testStr = testStr + new String(rs.getObject(j).toString().getBytes("ISO-8859-1"));
    }
}


2) Writing Chinese. Writing Chinese is simply the reverse of reading it. We need to convert the characters from the system default encoding to the ISO-8859-1 encoding supported by the JDBC driver. The code can be written as follows:

tempBytes = strInput.getText().getBytes();

sqlStr = new String(tempBytes, "ISO-8859-1");
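The read and write conversions can be collected into a pair of helper methods. The sketch below is one way to do that, assuming the application side uses GBK and the driver treats strings as ISO-8859-1; the class name JdbcChineseCodec and both method names are hypothetical. Unlike the snippets above, it names the charset explicitly rather than relying on the platform default, so it behaves the same on a non-Chinese operating system.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper pairing the read and write conversions.
final class JdbcChineseCodec {
    // Assumption: the application works in GBK.
    static final Charset APP = Charset.forName("GBK");
    // Assumption: the driver passes strings through as ISO-8859-1.
    static final Charset DRIVER = StandardCharsets.ISO_8859_1;

    // Reading: turn a string returned by the driver into proper Chinese text.
    static String fromDriver(String raw) {
        return raw == null ? null : new String(raw.getBytes(DRIVER), APP);
    }

    // Writing: turn Chinese text into the form the driver expects.
    static String toDriver(String text) {
        return text == null ? null : new String(text.getBytes(APP), DRIVER);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters
        String wire = toDriver(text);
        System.out.println("round trip ok: " + text.equals(fromDriver(wire)));
    }
}
```

Keeping both directions in one place makes it easier to switch the conversion off when a driver turns out not to need it.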

It is important to note that different JDBC drivers support the same database differently, and the same JDBC driver supports different databases differently. Our character conversion code must therefore be retested whenever the JDBC driver changes; otherwise the conversion may be superfluous and do more harm than good. For example, with the i-net UNA driver version 2.03 for MS SQL Server, no encoding conversion is needed at all for Chinese to work correctly. However, since many JDBC drivers do not explicitly state their support for Chinese characters, it is recommended that you test with the JDBC driver you actually use.
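Such a test can be reduced to comparing a known sample with what comes back from the database. The sketch below shows only the comparison step; the class DriverEncodingProbe and its Verdict enum are hypothetical names, the GBK/ISO-8859-1 pairing is an assumption carried over from the discussion above, and writing and reading the sample through a real driver is left to the caller.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical probe: does this driver need the encoding conversion?
final class DriverEncodingProbe {
    enum Verdict { NO_CONVERSION_NEEDED, CONVERSION_NEEDED, UNKNOWN }

    // written:  the Chinese sample the application stored
    // readBack: the string the driver returned for that sample
    static Verdict probe(String written, String readBack) {
        if (written.equals(readBack)) {
            return Verdict.NO_CONVERSION_NEEDED;
        }
        // Try the read-side conversion: ISO-8859-1 bytes re-decoded as GBK.
        String converted = new String(
                readBack.getBytes(StandardCharsets.ISO_8859_1),
                Charset.forName("GBK"));
        if (written.equals(converted)) {
            return Verdict.CONVERSION_NEEDED;
        }
        return Verdict.UNKNOWN; // some other fault, e.g. data already lost
    }

    public static void main(String[] args) {
        String sample = "\u4e2d\u6587"; // two Chinese characters
        System.out.println(probe(sample, sample));
        System.out.println(probe(sample, "\u00d6\u00d0\u00ce\u00c4"));
    }
}
```

Running a probe like this once per driver and database combination gives a concrete answer before any conversion code is enabled.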

Conclusion

In fact, the root cause of Java's Chinese processing problems is that the encoding of the Chinese characters (variables) being handled differs from the encoding expected by the target, and these mismatches all occur while characters are being read or output. As long as we keep this point in mind, we can better understand and deal with Java's Chinese problems.


