Summary of Chinese Text Handling in JNI


For work, I need to use JNI to make method calls and transfer data between C++ and Java programs. In the past I always worked in an English environment, so I paid little attention to problems with Chinese text (the same reasoning applies to other language encodings). Recently I took some time to study the topic and have organized what I learned below, for discussion or reference.

Before further discussion, there are a few basics to note:

Inside Java, all strings are held as Unicode, as sequences of 16-bit code units (originally UCS-2, and UTF-16 since J2SE 5.0). Commonly used characters, including almost all everyday Chinese characters, fit in a single 16-bit unit (two bytes); characters outside the Basic Multilingual Plane take two units, known as a surrogate pair. Unicode aims to cover the characters of all the world's writing systems. Therefore the encoding of each region can establish a mapping to and from Unicode, and Java uses this mapping to convert between the encodings of different languages;
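
As a small illustration of the point about 16-bit units, the following sketch compares the number of code units with the number of code points in a string. The class name Utf16Units and its helper methods are made up for this example; the characters are written as \u escapes so that the result does not depend on the encoding the source file is saved in.

```java
final class Utf16Units {
    // Number of 16-bit code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points, i.e. characters as a reader counts them.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587"; // two characters inside the BMP
        String rare = "\uD842\uDFB7";    // U+20BB7, one character outside the BMP
        System.out.println(codeUnits(chinese) + " units, " + codePoints(chinese) + " code points");
        System.out.println(codeUnits(rare) + " units, " + codePoints(rare) + " code points");
    }
}
```

Run as is, this prints "2 units, 2 code points" followed by "2 units, 1 code points".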

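UTF-8 is another encoding scheme, different from UCS-2/UCS-4; UTF stands for UCS Transformation Format. It is a variable-length encoding that uses 1 to 4 bytes per character: 1 to 3 bytes for characters in the Basic Multilingual Plane and 4 bytes for the rest (the original design allowed sequences of up to 6 bytes, but the current standard limits them to 4).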
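
The variable length can be observed directly from Java. This is only a sketch; Utf8Lengths and utf8Length are names invented for the example.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Lengths {
    // Number of bytes UTF-8 needs to encode the given string.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // ASCII letter
        System.out.println(utf8Length("\u00e9"));       // Latin letter e with acute accent
        System.out.println(utf8Length("\u4e2d"));       // a Chinese character
        System.out.println(utf8Length("\uD842\uDFB7")); // U+20BB7, outside the BMP
    }
}
```

The four calls print 1, 2, 3 and 4 respectively.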

A string encoded as UCS-2/UCS-4 contains special bytes such as 0x00 (for example, the high byte of every character in the range 0~255 is 0x00), which can cause trouble in some situations, such as transmission or parsing. These encodings also waste a lot of space on ordinary English letters. In addition, UTF-8 is said to offer a degree of error detection that UCS-2 lacks: its byte patterns are self-synchronizing, so a decoder can recognize a malformed sequence and find the next character boundary. For these reasons, Unicode is often used only as an intermediate code for logical representation. For more information on Unicode/UTF-8, see reference 1;
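
The zero bytes and the wasted space are easy to see for a short English string. The sketch below uses Java's UTF-16BE charset as a stand-in for UCS-2, on the assumption that the two are byte-for-byte identical for characters in the Basic Multilingual Plane; ZeroBytes and countZeroBytes are hypothetical names. Zero bytes matter for JNI in particular because C code usually treats 0x00 as the end of a string.

```java
import java.nio.charset.StandardCharsets;

final class ZeroBytes {
    // Counts how many bytes of the encoded form are 0x00.
    static int countZeroBytes(byte[] encoded) {
        int zeros = 0;
        for (byte b : encoded) {
            if (b == 0) {
                zeros++;
            }
        }
        return zeros;
    }

    public static void main(String[] args) {
        String text = "JNI";
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16BE); // no BOM is written
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-16BE: " + utf16.length + " bytes, " + countZeroBytes(utf16) + " zero");
        System.out.println("UTF-8:    " + utf8.length + " bytes, " + countZeroBytes(utf8) + " zero");
    }
}
```

For "JNI" the 16-bit form takes 6 bytes, 3 of them zero, while UTF-8 takes 3 bytes with no zero byte.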

Garbled Chinese text in Java can appear in many situations: across different applications, different platforms, and so on. Many excellent articles already discuss these problems, so I will not go into depth here; see references 2, 3, 4, and 5. Here is a quick summary:

1. When we save a source file using the default encoding, the contents of the file are actually encoded according to our system settings. That encoding is the value of the file.encoding system property, which can be obtained with the following program:

public class Encoding {
  public static void main(String[] args) {
    System.out.println(System.getProperty("file.encoding"));
  }
}
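When javac is run without the -encoding option, it decodes source files using this default encoding. If the locale does not match the encoding the file was actually saved in, decoding errors can result; this typically happens when compiling a file that came from another environment;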
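
What goes wrong when a file is decoded with the wrong encoding can be reproduced in a few lines. This is a sketch of the general effect, not of what javac does internally; WrongDecoding and saveThenRead are names invented for the example, and ISO-8859-1 stands in for "some other default encoding" because it is available in every Java runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class WrongDecoding {
    // Encodes text with one charset and decodes the resulting bytes with
    // another, as happens when a tool assumes the wrong file encoding.
    static String saveThenRead(String text, Charset savedAs, Charset readAs) {
        byte[] onDisk = text.getBytes(savedAs);
        return new String(onDisk, readAs);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";
        String garbled = saveThenRead(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        String intact = saveThenRead(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(garbled.equals(original)); // false
        System.out.println(garbled.length());         // 6: one character per UTF-8 byte
        System.out.println(intact.equals(original));  // true
    }
}
```

The two Chinese characters come back as six unrelated Latin characters when the decoder's charset does not match the one used for saving.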

2. Although strings inside Java (that is, at runtime) are in Unicode form, string data in class files is stored in a UTF-8 form (Unicode serves only as the intermediate code for logical representation);
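
One detail is worth knowing when working with JNI: the UTF-8 form used in class files is the JVM's "modified UTF-8", which is also the format that the JNI functions NewStringUTF and GetStringUTFChars work with. The same format is produced by DataOutputStream.writeUTF, so its differences from standard UTF-8 can be inspected from plain Java. The sketch below is only an illustration; ModifiedUtf8 and encode are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class ModifiedUtf8 {
    // Encodes a string as modified UTF-8 via DataOutputStream.writeUTF and
    // drops the 2-byte length prefix that writeUTF puts in front of the data.
    static byte[] encode(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        byte[] withLength = buffer.toByteArray();
        return Arrays.copyOfRange(withLength, 2, withLength.length);
    }

    public static void main(String[] args) {
        System.out.println(encode("\u4e2d").length);             // 3, same as standard UTF-8
        System.out.println(encode(String.valueOf('\0')).length); // 2 (0xC0 0x80), never a bare 0x00
        System.out.println(encode("\uD842\uDFB7").length);       // 6, each surrogate encoded separately
    }
}
```

For ordinary Chinese characters the two forms agree, so the difference only shows up for the NUL character and for characters outside the Basic Multilingual Plane.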

3. For Web applications, taking Tomcat as an example, the JSP/servlet engine provides a JSP conversion tool (JSPC) that looks in the JSP file for the charset specified by <%@ page contentType="text/html; charset=<jsp-charset>" %>. If <jsp-charset> is not specified in the JSP file, the system default file.encoding is used (on a Chinese platform this value is GBK; it can be changed through the Regional Options in the Control Panel). JSPC then interprets all characters that appear in the JSP file, both Chinese and ASCII, with the equivalent of "javac -encoding <jsp-charset>", converts them to Unicode characters, and finally converts those to UTF-8 and stores the result as a Java file.

I once accidentally saved a JSP file as UTF-8 while the charset declared inside the file was GB2312. At runtime the Chinese text never displayed correctly, and it returned to normal only after I saved the file in the default encoding again. As long as the file's storage format is consistent with the charset setting at the top of the JSP, the page displays correctly (although my experiment with saving the file as UTF-16 did not succeed);

4. In an XML file, encoding denotes the encoding of the file itself. If this setting is inconsistent with the file's actual encoding, decoding may fail, so encoding should always be set to match the encoding the file is saved in. The charset in JSP/HTML indicates which character set to use when decoding the strings read from the file (when reasoning about Chinese-text problems, think of a string as a sequence of binary or hexadecimal bytes, which may map to different characters under different charsets).
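
The effect of a mismatched XML encoding declaration can be tried with the JDK's built-in parser. This is a minimal sketch; XmlEncodingCheck and parse are names made up for the example, and how a mismatch is reported (an exception, or silently wrong text) depends on the parser and on the two encodings involved.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

final class XmlEncodingCheck {
    // Builds a tiny XML document that declares one encoding but is saved in
    // another, then parses it. Returns the text of the root element, or null
    // if the parser rejects the input.
    static String parse(String declared, Charset actual, String text) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declared + "\"?><t>" + text + "</t>";
        byte[] onDisk = xml.getBytes(actual);
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(onDisk))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            return null; // the declared encoding could not decode the bytes
        }
    }

    public static void main(String[] args) {
        // Declaration and file agree: the character survives.
        System.out.println("\u4e2d".equals(parse("UTF-8", StandardCharsets.UTF_8, "\u4e2d")));
        // Saved as UTF-8 but declared ISO-8859-1: parsing succeeds with garbled text.
        System.out.println(parse("ISO-8859-1", StandardCharsets.UTF_8, "\u4e2d").length());
        // Saved as ISO-8859-1 but declared UTF-8: the bytes are not valid UTF-8.
        System.out.println(parse("UTF-8", StandardCharsets.ISO_8859_1, "\u00e9"));
    }
}
```

The second case is the more dangerous one in practice, because nothing fails: the single Chinese character simply turns into three Latin characters. In the third case the parser may also print its own diagnostic to standard error.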

I have discussed the exact meaning of encoding with others on the Internet: if encoding refers to the encoding of the file itself, how can an application that reads the file interpret it correctly before it knows the encoding setting?

According to that discussion and my own understanding, a handler such as JSPC always reads the input file as ISO8859-1 at first, and then checks the first few bytes of the file (that is, the byte order mark, BOM) to work out the format in which the file was saved. For how this check is done, see the getEncodingName method of $source_dir\jasper\jasper2\src\share\org\apache\jasper\xmlparser\XMLEncodingDetector.java in the Tomcat source code; the Page Character Encoding section of the JSP Specification also discusses it in detail. When the parser then reaches the encoding option and finds that the setting is inconsistent with the format the file was actually saved in, it attempts a conversion. However, this conversion may go wrong when the file is actually saved in an encoding other than ISO8859-1/UTF-8, such as Unicode or UTF-16.
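
The BOM check itself is simple. The following is a simplified sketch of the idea only, not a copy of Tomcat's implementation, which handles more cases; BomSniffer, detect and startsWith are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

final class BomSniffer {
    // Returns the encoding implied by a leading byte order mark, or null if
    // the data does not start with one of the BOMs checked here.
    // Limitation: a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE.
    static String detect(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Java's "UTF-16" encoder writes a big-endian BOM before the text.
        System.out.println(detect("\u4e2d".getBytes(StandardCharsets.UTF_16)));
        // Java's UTF-8 encoder writes no BOM, so nothing is detected.
        System.out.println(detect("\u4e2d".getBytes(StandardCharsets.UTF_8)));
    }
}
```

When no BOM is present, as in the second call, a reader has to fall back on other information such as the declared encoding or a default, which is where the mismatches described above come from.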
