Analysis of the Principles of Java Character Encoding

Source: Internet
Author: User

This week I ran into a garbled-text problem in Java, so I ran some experiments to understand how Java handles character encoding. A brief analysis follows:

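First look at the following code. The string literal is the Chinese word for "basketball"; it has to be non-ASCII for the experiment to show anything:

import java.io.UnsupportedEncodingException;

public class CharsetTest {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String test = "篮球";
        // Encode the string with the JVM's default charset.
        byte[] defaultResult = test.getBytes();
        // Print each byte as a signed decimal value.
        for (byte e : defaultResult) {
            System.out.print(e + " ");
        }
        // Show which default encoding the JVM is using.
        System.out.println(System.getProperty("file.encoding"));
        System.out.println("test=" + test);
    }
}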

1. Running javac CharsetTest.java compiles the file, but produces the following warning:
CharsetTest.java:5: warning: unmappable character for encoding ASCII
String test = "????";
Why does this happen? To the Java compiler, CharsetTest.java is just a text file that it has to decode and parse before it can generate the .class file. The file is stored on disk in some encoding, so the compiler must know which encoding to decode it with. If none is specified, it falls back to the platform default, which in this environment is "ANSI_X3.4-1968" (that is, US-ASCII; other environments may differ). The Chinese characters cannot be decoded as ASCII, so the compiler sees them as garbage.
The fix is to tell the compiler the encoding of the source file at compile time; otherwise it treats any bytes it cannot decode as garbled characters. Because CharsetTest.java is saved as GBK, the command is:
javac CharsetTest.java -encoding GBK
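
What the compiler runs into in step 1 can be imitated in a few lines. This is only an analogy, not javac's actual code path: it decodes the four GBK bytes of the string literal once as GBK and once as US-ASCII. The class name SourceDecodingDemo and its helper are made up for illustration, the expected text is written with Unicode escapes so the example does not depend on its own source encoding, and it assumes the runtime provides the GBK charset, as full JDK installations normally do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SourceDecodingDemo {
    // The four bytes GBK uses for the two-character literal in CharsetTest.java.
    static final byte[] GBK_BYTES = {(byte) 0xC0, (byte) 0xBA, (byte) 0xC7, (byte) 0xF2};

    // Decode raw file bytes with the charset the reader believes the file uses.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String right = decode(GBK_BYTES, Charset.forName("GBK"));
        String wrong = decode(GBK_BYTES, StandardCharsets.US_ASCII);

        // Correct charset: the original two characters come back.
        System.out.println(right.equals("\u7bee\u7403")); // true
        // Wrong charset: every byte above 0x7F becomes U+FFFD.
        System.out.println(wrong.length());               // 4
    }
}
```

Decoded as ASCII, each of the four bytes turns into a replacement character, which is in line with the four question marks in the compiler warning.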

2. After javac CharsetTest.java -encoding GBK the class file is correct, but running java CharsetTest prints:
ANSI_X3.4-1968
test=??
If the compiler did its job, why is the output still garbled? The Chinese string stored in the class file is certainly correct, and the JVM turns it into a String object correctly. The problem arises on the way out: test.getBytes() and System.out both have to convert the String back into bytes, and both use the JVM's default charset to do so. Nobody told the JVM which charset that should be, so it took the platform default, again "ANSI_X3.4-1968" (other environments may differ). ASCII cannot represent the Chinese characters, so each one is replaced by "?" in the byte stream sent to the console.
So we have to tell the JVM which encoding our console uses, or more generally which encoding it should use when it builds a byte stream from a string. Assuming the console uses GBK, the command is:
java -Dfile.encoding=GBK CharsetTest
which gives:
-64 -70 -57 -14 GBK
test=篮球
These tests were deliberately run outside Eclipse, because Eclipse makes some of these encoding decisions for us. Results may also differ between environments.
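
The effect of the charset on getBytes() can also be checked without restarting the JVM, by passing the charset explicitly instead of relying on file.encoding. The following is a minimal sketch; GetBytesDemo and encode are hypothetical names, and the GBK charset is assumed to be available in the runtime. The default charset it prints is environment-specific and may be chosen differently on newer JDKs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class GetBytesDemo {
    // The same two characters as in CharsetTest, written as Unicode escapes.
    static final String TEXT = "\u7bee\u7403";

    // Encode with an explicit charset, independent of the JVM default.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        System.out.println("default:  " + Charset.defaultCharset());
        // Prints [-64, -70, -57, -14]
        System.out.println("GBK:      " + Arrays.toString(encode(TEXT, Charset.forName("GBK"))));
        // Prints [63, 63]: each unmappable character is replaced by '?'
        System.out.println("US-ASCII: " + Arrays.toString(encode(TEXT, StandardCharsets.US_ASCII)));
        // UTF-8 needs three bytes for each of these characters.
        System.out.println("UTF-8:    " + Arrays.toString(encode(TEXT, StandardCharsets.UTF_8)));
    }
}
```

The GBK line matches the bytes printed by java -Dfile.encoding=GBK CharsetTest, and the US-ASCII line shows the "?" substitution behind test=??.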

Final summary:
String constants in Java class files are stored in UTF-8 (a modified form of it), while the JVM uses UTF-16 for strings in memory. The whole encoding conversion process is:
source file (any encoding) -> javac, told the source encoding with -encoding -> .class file (UTF-8) -> JVM (UTF-16) -> output bytes, in the encoding given by file.encoding

This flow shows that whatever encoding the source file uses, the compiler produces the correct result as long as it is told that encoding. Likewise, the JVM produces an output stream in the correct encoding as long as it is told which encoding the output needs.
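
The two internal forms mentioned in the summary can be observed from ordinary code. The sketch below is illustrative only; InternalFormsDemo and modifiedUtf8 are made-up names. It relies on DataOutputStream.writeUTF, which writes a two-byte length followed by modified UTF-8, the same layout class files use for string constants, and compares that with the number of UTF-16 code units the String holds in memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class InternalFormsDemo {
    // Serialize text as a 2-byte length prefix plus modified UTF-8 bytes.
    static byte[] modifiedUtf8(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(text);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u7bee\u7403";
        // In memory: two UTF-16 code units, one per character.
        System.out.println(text.length());             // 2
        // Class-file style: 2 length bytes + 3 bytes per character.
        System.out.println(modifiedUtf8(text).length); // 8
    }
}
```

Neither form depends on the encoding of the source file or of the console, which is why only the two boundaries (compiling and output) need to be told an encoding.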


So to avoid garbled text, pay attention to two points:
1.    Tell the compiler the encoding of your source file.
2.    Tell the JVM which encoding to use when it displays text or builds a string output stream.
For JSP in particular, make sure the encoding used to send a request matches the encoding used to parse it, and that the response encoding matches the charset declared in the HTML.
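
For the second point, an alternative to the -Dfile.encoding switch is to name the encoding at the place where the output stream is built. The following is a rough sketch of that idea, not something the experiment above did: ExplicitOutputDemo and printWith are hypothetical names, the output is captured in a byte array instead of going to a real console, and GBK is assumed to be available in the runtime.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

class ExplicitOutputDemo {
    // Print text through a PrintStream that uses the named encoding
    // and return the bytes that would have reached the console.
    static byte[] printWith(String text, String encoding) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(buffer, true, encoding)) {
            out.print(text);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u7bee\u7403";
        // A GBK stream emits the bytes a GBK console expects.
        System.out.println(Arrays.toString(printWith(text, "GBK")));      // [-64, -70, -57, -14]
        // An ASCII stream can only emit '?' for these characters.
        System.out.println(Arrays.toString(printWith(text, "US-ASCII"))); // [63, 63]
    }
}
```

Naming the encoding in code keeps the result the same on every machine, at the cost of having to know the encoding of the receiving side in advance.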

The garbled-text problems we often run into with JSP, databases and so on can usually be traced to one of two causes:
1.    Misinterpretation: a file is encoded in GBK, but you assume it is UTF-8 and decode it as UTF-8, so the text comes out garbled.
2.    Incorrect conversion: a file is encoded in GBK and you know it, but you convert it to ISO-8859-1 for display. Many GBK characters cannot be represented in ISO-8859-1, so the text again comes out garbled.
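
Both causes can be reproduced in a few lines. This is a small sketch under the same assumptions as before (GBK available in the runtime); GarbleCausesDemo and its method names are invented for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class GarbleCausesDemo {
    static final Charset GBK = Charset.forName("GBK");

    // Cause 1: the bytes are GBK, but the reader decodes them as UTF-8.
    static String misinterpret(byte[] gbkBytes) {
        return new String(gbkBytes, StandardCharsets.UTF_8);
    }

    // Cause 2: the text is known, but it is converted to ISO-8859-1,
    // which has no code for these characters.
    static String convertToLatin1(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u7bee\u7403";
        byte[] gbkBytes = original.getBytes(GBK);

        String misread = misinterpret(gbkBytes);
        System.out.println(misread.equals(original));                   // false
        // Invalid UTF-8 input shows up as U+FFFD replacement characters.
        System.out.println(misread.indexOf('\uFFFD') >= 0);             // true
        // Decoding with the charset the bytes were written in works.
        System.out.println(new String(gbkBytes, GBK).equals(original)); // true
        // The conversion replaced both characters with '?'.
        System.out.println(convertToLatin1(original));                  // ??
    }
}
```

In the first case the bytes are still intact and decoding them as GBK recovers the text; in the second the characters were already replaced during conversion and cannot be recovered from the converted data.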
