Character set encoding relationships between .java and .class files on Windows, with a similar analysis for C++

Last Update:2016-08-23 Source: Internet

Author: User

Summary: Chinese-language Windows defaults to the GBK character set, so text stored as UTF-8 is not decoded correctly unless its encoding is stated explicitly. This article first describes the character set used on Windows, then analyzes the character set relationships among .java files, .class files, and javac, and gives a similar analysis of the source files, binaries, and compiler of C/C++ projects in VS. The conclusion is that it is best to pass the -encoding option to javac to specify the character set of the .java file, so that javac does not produce unrecoverably garbled Chinese characters.
"Problem Reproduction"
In a Java project, the program's output is garbled or not depending on the character set used to store the source file. When the source file is stored as GBK, the string "中文" ("Chinese") is printed normally; when the source file is stored as UTF-8, the output is garbled.
[Screenshot: source file stored as UTF-8, garbled output]

[Screenshot: source file stored as GBK, normal output]

"One, Windows system default character set"
In 1980, China published GB2312-80, which contains 7,445 characters in total. Unicode 1.1, released in 1993, included 20,902 Chinese characters, and China issued GB 13000.1-93, which is equivalent to Unicode 1.1 and is known as GB13000 for short. Microsoft extended GB2312-80 to cover the Chinese characters in GB13000 and Unicode 1.1, producing the GBK encoding, which Windows identifies as code page CP936. For example, running the chcp command in the console shows the code page that Windows is currently using.
GBK is the default character set on Chinese-language Windows.
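
As a rough illustration of how GBK and UTF-8 differ at the byte level, the sketch below encodes "中文" both ways and prints the JVM's default charset. The class name CharsetBytesDemo is made up for this example, and it assumes the JDK in use ships the GBK charset (standard full JDK builds do). The string is written with Unicode escapes so the file compiles the same way whatever encoding it is saved in.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetBytesDemo {
    // "中文" written with Unicode escapes, so the source file itself is pure ASCII.
    public static final String TEXT = "\u4e2d\u6587";

    // Render a byte array as upper-case hex, two digits per byte.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // On Chinese-language Windows this is typically GBK;
        // newer JDKs may report UTF-8 instead.
        System.out.println("default charset: " + Charset.defaultCharset());
        // GBK uses two bytes per Chinese character, UTF-8 uses three.
        System.out.println("GBK bytes:   " + hex(TEXT.getBytes(Charset.forName("GBK"))));
        System.out.println("UTF-8 bytes: " + hex(TEXT.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The same two characters occupy four bytes in GBK and six bytes in UTF-8, which is why a program that guesses the wrong character set cannot recover the text.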

"Two, the relationship between source files, binaries, and the compiler in VS C/C++ projects"
The author used VC compiler version 19.00.24210 with VS2015.
Every file is saved in some character set. On Windows, a source file can be saved as either GBK or UTF-8; on Chinese-language Windows 7, VS saves source files as GBK by default.
After the VC compiler builds the executable, the character set of the strings in the binary follows the table below. "UTF-8 (with BOM)" means that the file begins with a three-byte BOM header, which marks the file as UTF-8.

Source file character set	Compiled binary character set
GBK	GBK
UTF-8 (with BOM)	GBK
UTF-8 (no BOM)	UTF-8
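
Since the table depends on whether a UTF-8 file carries a BOM, here is a minimal sketch of how the BOM can be detected. It is written in Java to match the rest of the article; the class and method names (BomCheck, hasUtf8Bom) are invented for illustration and do not come from any tool mentioned here. The only fact it relies on is that the UTF-8 BOM is the three bytes EF BB BF.

```java
public class BomCheck {
    // Returns true when the data starts with the UTF-8 byte order mark EF BB BF.
    public static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) {
        // In-memory stand-ins for file contents: "A" with and without a BOM.
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x41};
        byte[] withoutBom = {0x41};
        System.out.println("with BOM:    " + hasUtf8Bom(withBom));
        System.out.println("without BOM: " + hasUtf8Bom(withoutBom));
    }
}
```

To check a real file, the byte array could be obtained with java.nio.file.Files.readAllBytes.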

"Three, the relationship between .java, .class, the JVM, and the output console"
At Alibaba, many people use IntelliJ IDEA as the IDE for developing Java applications. IntelliJ IDEA uses the UTF-8 character set by default: the IDE Encoding setting means that the IDE as a whole uses UTF-8, and the Project Encoding setting means that the current project uses UTF-8.

The character set relationship between .java files and .class files is shown in the table below. Take a string str with the value "中文" ("Chinese") in the .java file; what ends up in the .class file falls into three cases:
① The .java file is saved as GBK, so str holds "中文" in GBK. After javac compiles it, str in the .class file is "中文" stored as UTF-8.
② The .java file is saved as UTF-8 (no BOM), so str holds "中文" in UTF-8. After javac compiles it, str in the .class file is garbled text (mojibake along the lines of "涓枃") stored as UTF-8.
③ The .java file is saved as UTF-8 (with BOM) and cannot be compiled.

.java file character set	.class file character set
GBK	UTF-8
UTF-8 (no BOM)	UTF-8 (but the Chinese text is garbled)
UTF-8 (with BOM)	Compilation fails; no .class file is produced
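
Case ① can be approximated in a few lines. The sketch below is not javac itself: it decodes source bytes with a stated charset and then serializes the string with DataOutputStream.writeUTF, which writes a two-byte length followed by the same modified UTF-8 form that class files use for string constants. The class and method names (ClassFileUtf8Demo, storedHex) are hypothetical, and the GBK charset is assumed to be available in the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class ClassFileUtf8Demo {
    // Decode the source bytes with the given charset, then write the string
    // as a 2-byte length plus modified UTF-8 and return the result as hex.
    public static String storedHex(byte[] sourceBytes, String sourceCharset) {
        String decoded = new String(sourceBytes, Charset.forName(sourceCharset));
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(decoded);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : buf.toByteArray()) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "中文" as it is stored in a GBK-encoded source file: D6 D0 CE C4.
        byte[] gbkSource = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
        // Decoding with the correct charset yields the six UTF-8 bytes of "中文".
        System.out.println(storedHex(gbkSource, "GBK"));
    }
}
```

When the source bytes are decoded with the character set they were actually written in, the stored form is correct UTF-8 no matter what the source encoding was.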

In the second case above, why does the .class file hold garbled UTF-8? A .class file always stores strings in Unicode (in a UTF-8-based form), whereas a .java file may use any character set. The character sets used in the Java build process are: .java (any encoding) -> .class (Unicode) -> JVM (Unicode). The garbling happens because javac reads the UTF-8 .java file as if it were GBK. The character set of a .java file can be specified with javac's -encoding option; when it is not specified, javac assumes that the .java file uses the system default character set. Since -encoding was not used here, javac treated the UTF-8 .java file as a GBK file, which produced the garbled characters. For details, see: https://www.zhihu.com/question/30977092
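
The mis-decoding described above can be imitated without javac. The following sketch takes the UTF-8 bytes of "中文" and decodes them once as GBK (what happens when the default is GBK and -encoding is omitted) and once as UTF-8 (what -encoding UTF-8 asks for). The names JavacDecodeDemo and readSource are made up for this example, and readSource only models the decoding step, not the compiler.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class JavacDecodeDemo {
    // "中文" written with Unicode escapes, so this file is encoding-independent.
    public static final String ORIGINAL = "\u4e2d\u6587";

    // Models the step where a compiler turns source bytes into characters
    // using whatever charset it assumes the file is written in.
    public static String readSource(byte[] fileBytes, Charset assumed) {
        return new String(fileBytes, assumed);
    }

    public static void main(String[] args) {
        byte[] utf8File = ORIGINAL.getBytes(StandardCharsets.UTF_8);

        // Wrong assumption: UTF-8 bytes interpreted as GBK.
        String wrong = readSource(utf8File, Charset.forName("GBK"));
        // Correct assumption: UTF-8 bytes interpreted as UTF-8.
        String right = readSource(utf8File, StandardCharsets.UTF_8);

        System.out.println("decoded as GBK matches original:   " + wrong.equals(ORIGINAL));
        System.out.println("decoded as UTF-8 matches original: " + right.equals(ORIGINAL));
    }
}
```

For a real source file, the corresponding fix is to compile with something like javac -encoding UTF-8 Main.java, where Main.java stands for whichever UTF-8 source file is being built.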

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
