1. Introduction
When developing in Java, characters are occasionally lost during IO operations. For example, when developing a CMP EJB with BEA's Workshop, compilation can fail every time with an error.
The error message contains an obvious misspelling, "excetion", even though the code in question is generated automatically by Workshop. Yet on some machines the same project files compile without any problem. Contacting BEA engineers did not resolve the issue.
The author consulted a large amount of material but found almost nothing that described the problem. Only while browsing Sun's bug database [i] by chance did it become clear that the cause was a problem with the GB18030 Chinese encoding.
2. Problem analysis
The national standard GB18030-2000, the extension of the basic set of the Chinese coded character set for information interchange, is the most important encoding standard in China after GB2312-1980 and GB13000-1993, and is one of the basic standards that computer systems in China must follow. Under the rules of the State Administration of Quality Supervision, products officially released or shipped after the end of the GB18030 transition period (August 31, 2001) must meet the requirements of GB18030.
The operating system's default internal encoding is generally not GB18030. However, it is now known that on Windows XP, after certain components are upgraded, the default encoding of the operating system changes from GB2312 to GB18030.
Even in the latest JDK release, 1.4.2_06, support for GB18030 still has problems. The main symptom is that Java applications lose characters when converting between GB18030 and other encoding schemes.
The cause lies in how Java buffers characters when it handles the extended character sets provided by sun.nio.cs.ext.ExtendedCharsets. The character buffering does not use the new sun.nio.cs.ext package but the original processing mode. Under multithreaded operation this mode mishandles the GB18030 encoding scheme, and some characters are lost.
The problem affects only the GB18030 encoding scheme. Other Chinese encoding schemes, such as GB2312, are not affected.
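One way to look for lost characters on a particular JDK is to round-trip a string through GB18030 on several threads and count how often the result differs from the input. The sketch below is only an illustration: the class and method names are hypothetical, and it is an assumption that a plain encode and decode cycle exercises the same code path as the defect described above. On a JDK without the defect, both counts would be expected to be zero.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of the original article.
class Gb18030RoundTripCheck {

    // Encodes the text to bytes in the named charset and decodes it back.
    static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return new String(text.getBytes(cs), cs);
    }

    // Runs the round trip on several threads at once and returns the
    // total number of iterations whose result differed from the input.
    static int countMismatches(String text, String charsetName,
                               int threads, int iterations) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(() -> {
                    int bad = 0;
                    for (int i = 0; i < iterations; i++) {
                        if (!text.equals(roundTrip(text, charsetName))) {
                            bad++;
                        }
                    }
                    return bad;
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get(); // waits for each worker to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Unicode escapes avoid any dependence on the source file encoding.
        String sample = "exception \u4e2d\u6587\u7f16\u7801";
        System.out.println("GB18030 mismatches: "
                + countMismatches(sample, "GB18030", 4, 1000));
        System.out.println("GB2312 mismatches: "
                + countMismatches(sample, "GB2312", 4, 1000));
    }
}
```

The thread count and iteration count are arbitrary values chosen for the sketch.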
The GB18030 problem is most likely to appear when the operating system default encoding is GB18030 and a file is written without specifying an encoding, because Java then falls back to the operating system default.
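Within a single application, the fallback to the default encoding can be avoided by naming the encoding explicitly when writing. The following is a minimal sketch rather than code from the original article: the class name, the method name and the file name out.txt are placeholders, and GB2312 is chosen only because the article treats it as unaffected.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

// Hypothetical helper, not part of the original article.
class ExplicitEncodingWrite {

    // Writes the text using the named encoding instead of the platform default.
    static void write(OutputStream out, String text, String charsetName)
            throws IOException {
        Writer writer = new OutputStreamWriter(out, charsetName);
        try {
            writer.write(text);
        } finally {
            writer.close(); // flushes and closes the underlying stream
        }
    }

    public static void main(String[] args) throws IOException {
        // "out.txt" is a placeholder file name.
        write(new FileOutputStream("out.txt"), "\u4e2d\u6587", "GB2312");
    }
}
```

Code that passes the encoding explicitly does not depend on the operating system default, so it behaves the same on machines whose default is GB18030.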
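To view the operating system default encoding, you can run the following Java program:
public class Test {
    public static void main(String[] args) {
        String encoding = System.getProperty("file.encoding"); // default encoding seen by the JVM
        System.out.println("Default System Encoding: " + encoding);
    }
}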
On the machines where CMP EJB development with Workshop failed, the operating system default encoding was GB18030.
Few people run into this problem, and many of those who do resolve it by reinstalling the operating system, so information about it is hard to find.
3. Solutions
The ideal solution would be for Sun to fix the bug. The issue was reported as early as November 2003, but as of this writing (2004/12/30) its status is still "In progress, bug".
The main idea of the alternative solutions is to avoid the GB18030 encoding. There are two main approaches:
Changing the operating system default encoding scheme
On Unix/Linux platforms it is easy to change the operating system encoding. For example, on Solaris you can change the system encoding by running the following command:
LANG=zh.GBK; export LANG
On Windows platforms it is harder to change the operating system's default Chinese encoding. Switching the regional and language options to another region or language has no effect, and Microsoft customer service could not provide a solution either.
Specifying the default encoding when running a Java application
When running a Java-based application, add the following parameter:
java -Dfile.encoding=GB2312
This binds the default encoding of the Java application to GB2312: whenever no encoding is specified, GB2312 is used.
Making this change for every application is a lot of work, and some applications implicitly invoke other Java applications, which makes the correction harder still. A more practical approach is to modify the Java launcher itself so that it automatically adds the "-Dfile.encoding=GB2312" parameter at run time.
This method is recommended for the Windows platform. The procedure is as follows:
1. Rename the original java.exe and javaw.exe, for example to javabak.exe and javawbak.exe.
2. Write replacements for java.exe and javaw.exe that invoke javabak.exe and javawbak.exe and add the "-Dfile.encoding" parameter at run time.
The following C code can complete the above functions:
After compiling (remember to adjust the ARG value), name the resulting files java.exe and javaw.exe and place them in the <java_home>/bin and <java_home>/jre/bin directories.
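As an illustration of the logic only, the argument forwarding described in step 2 might be sketched as follows. This is written in Java rather than C, so it cannot itself serve as a replacement for java.exe, and it is not the author's code. The class and method names are hypothetical, and javabak.exe is simply the renamed launcher from step 1.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of the original article.
class EncodingLauncherSketch {

    // Builds the command line: the renamed launcher, the encoding flag,
    // then the caller's arguments in their original order.
    static List<String> buildCommand(String realLauncher, String encoding,
                                     String[] args) {
        List<String> command = new ArrayList<>();
        command.add(realLauncher);
        command.add("-Dfile.encoding=" + encoding);
        command.addAll(Arrays.asList(args));
        return command;
    }

    // Starts the command, shares this process's console with the child,
    // and returns the child's exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process child = new ProcessBuilder(command).inheritIO().start();
        return child.waitFor();
    }

    public static void main(String[] args) {
        // Only prints the command; pass the list to run(...) to start it.
        List<String> command = buildCommand("javabak.exe", "GB2312", args);
        System.out.println(String.join(" ", command));
    }
}
```

The flag is placed before the caller's arguments so that the renamed launcher treats it as a JVM option and not as an argument to the application.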
In practice, this method solves the GB18030 problem without introducing other hidden problems. The only drawback is that an extra DOS window opens when a Java application runs. The window can be closed without affecting the application.
4. Summary
Chinese encoding has long been a troublesome problem in application development. Although GB18030 is now a mandatory national standard with many advantages, it was introduced only recently and application support for it is still imperfect. For now, it is better to use Chinese encoding schemes compatible with GB2312 wherever possible.
The solution given in this article not only addresses the Java platform's GB18030 support problem but also offers a general way to specify default parameters for the Java runtime.