In-depth analysis of garbled text (mojibake) in Java

Source: Internet
Author: User

1. What is an encoding, and why do we need one?


I never used to think very deeply about this; I took it all for granted. Then one day garbled Java text brought me to my knees. It would not do what I told it, and there was garbled text everywhere. This time I am not letting it go; I want to get to the bottom of it.

As we all know, a text file stored on a hard disk is just a bunch of binary, a combination of 0s and 1s. It carries no information at all, not even a little, telling the text editor "Hi buddy, I'm GBK-encoded", or "I'm UTF-8-encoded", or "I'm ...".

It really is that simple: just a combination of 0s and 1s that doesn't know what it is. Only once we know what it is can we have the editor interpret it properly. Let's start with a small example.


Take this text file: on disk, its binary form is just a string of 0s and 1s.

First, I display it in a terminal whose encoding is GBK.

Then I display it with the terminal set to UTF-8.


It is that simple, and that fickle. As luck would have it, this string of bits is valid in both GBK and UTF-8: by UTF-8's encoding rules it represents "Han", and parsed by GBK's rules it is "Cha". This is the root of the problem: the text was encoded with one encoding and decoded with another. That is also why, most of the time, you have no idea how a document ended up garbled.

This is why we need encodings. A computer can handle a binary string of 0s and 1s, but to it the bits mean nothing, so we have to translate them into something people can read. To me, an encoding is simply a mapping, which is what I just described: in a GBK environment the bits map to one thing, and in a UTF-8 environment they map to another, yet underneath there is only one thing, the same bits.

Someone may ask: some characters, such as "a", look exactly the same in UTF-8 and GBK environments. That comes down to compatibility between the encodings (both are ASCII-compatible, so an ASCII character gets the same single byte in each).

If you are curious, you can explore the encoding rules and their compatibility yourselves; I won't say more about them here.
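To make the "same bits, different meaning" idea concrete, here is a minimal Java sketch (my own illustration, not from the original post). It encodes a character with one charset and decodes the resulting bytes with another. It assumes the JDK provides the GBK charset, which most full JDK installations do; the class and method names are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: the same bytes read with two different charsets.
// Assumes the JDK provides the "GBK" charset (most full JDK installs do).
public class SameBytesTwoDecodings {
    static final Charset GBK = Charset.forName("GBK");

    // Encode text with one charset, then decode the resulting bytes with another.
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // "Han", written as a Unicode escape so this file's own encoding doesn't matter.
        String han = "\u6C49";
        Charset utf8 = StandardCharsets.UTF_8;

        // Same charset on both sides: the text survives the round trip.
        System.out.println(reinterpret(han, utf8, utf8).equals(han)); // true

        // Written as UTF-8, read as GBK: the bits are reinterpreted and the text changes.
        System.out.println(reinterpret(han, utf8, GBK).equals(han));  // false

        // The two encodings use different bytes for the same character...
        System.out.println(han.getBytes(utf8).length); // 3
        System.out.println(han.getBytes(GBK).length);  // 2

        // ...but agree on ASCII characters such as 'a' (both are ASCII-compatible).
        System.out.println(Arrays.equals("a".getBytes(utf8), "a".getBytes(GBK))); // true
    }
}
```

Only the decoding side differs between the second and first calls; the bytes themselves never change, which is exactly the point of the section above.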



2. Getting started with code



Let's write a Java file, starting with a HelloWorld.
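The original HelloWorld was only shown as a screenshot, which has not survived. Judging from the rest of the post, it prints the name Zhang San (张三), so the file below is a hypothetical reconstruction on that assumption, not the author's actual code. If you save it as GBK and compile it on a UTF-8 system without telling javac the encoding, you should typically run into the kind of error described next; saved as UTF-8, it compiles as is.

```java
// Hypothetical reconstruction of the article's HelloWorld (assumption: it prints Zhang San).
// Save this file as GBK to reproduce the article's setup.
public class HelloWorld {
    static String greeting() {
        // The bytes javac reads for this literal depend on how the file was saved.
        return "张三"; // Zhang San
    }

    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```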




Here is the same file viewed in different encoding environments, and what you see differs. My system is Ubuntu 12.04, whose default encoding is UTF-8. But let me state up front that this file is GBK-encoded, so only the GBK view makes sense: it shows Zhang San. Now let's compile my HelloWorld file and see what miracle happens.




Wow, I'm done for: I can't even get HelloWorld right. How am I supposed to go on? But don't panic; look at the error. It complains about a character. (Strictly speaking, it is not that the characters for Zhang San cannot be found; rather, the string of bits that represents Zhang San in GBK has no mapping in UTF-8, so it cannot be parsed. Anyone who understands UTF-8's encoding rules will see that failing to find a mapping is exactly right.) Isn't Java supposed to be write once, run anywhere? Here it won't even compile everywhere. So what now?

Relax. javac has an option, -encoding <encoding>, which specifies the character encoding used by source files. In other words, you tell javac what encoding this Java source file is in, and you had better not get it wrong. If you do get it wrong and you are lucky, the compilation fails and tells you where the problem is. If you are unlucky, "Han" silently becomes "Cha", which is a horrible thought. So the absence of a compiler error does not mean the program is fine; try it if you don't believe me.

As I said, my system's default is UTF-8, so this GBK-encoded source file would not compile. Most people use Windows with the Chinese language pack installed, so their default encoding is mostly GBK and GBK-encoded source files compile without any problem. But most programs eventually run on Linux, which is why everyone sets the JVM option -Dfile.encoding=GBK (the property name is case-sensitive); nobody dares to skip that.

(But it's a little early to talk about this.)
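The lucky and unlucky cases can be illustrated with the JDK's charset API. This is only an analogy for what javac does, not its actual code. A CharsetDecoder set to CodingErrorAction.REPORT refuses bytes that are invalid in the chosen charset, much like a failed compilation. The String constructor never fails: invalid bytes become U+FFFD, and bytes that happen to be valid in the wrong charset (the "Han becomes Cha" case) simply decode to the wrong characters. The class and method names are made up, and the sketch assumes the GBK charset is available.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Sketch: reading bytes with the wrong charset, strictly versus leniently.
// An analogy for javac's behaviour, not its implementation. Assumes "GBK" is available.
public class StrictVsLenient {
    static final Charset GBK = Charset.forName("GBK");

    // Returns the decoded text, or null if the bytes are not valid in the charset.
    static String decodeStrictly(byte[] bytes, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null; // the "lucky" case: the mismatch is reported
        }
    }

    // Decodes leniently: invalid sequences become U+FFFD and no error is raised.
    static String decodeLeniently(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String zhangSan = "\u5F20\u4E09";           // Zhang San
        byte[] gbkBytes = zhangSan.getBytes(GBK);   // what a GBK-saved source file contains

        // Right charset: the original text comes back.
        System.out.println(zhangSan.equals(decodeStrictly(gbkBytes, GBK))); // true

        // Wrong charset, strict: the error is detected.
        System.out.println(decodeStrictly(gbkBytes, StandardCharsets.UTF_8)); // null

        // Wrong charset, lenient: no error, just wrong text.
        System.out.println(zhangSan.equals(decodeLeniently(gbkBytes, StandardCharsets.UTF_8))); // false
    }
}
```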




When I add this parameter, the compilation goes through.


With a sense of ceremony, I run my HelloWorld.


In a UTF-8 environment:




I'm so excited: it finally works.


In a GBK environment (this is where most garbled output comes from):





This can't be right. What a mess. Good grief.


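Surely we haven't forgotten what encoding a Java class file uses for strings: not GBK, but a Unicode encoding. (Strictly speaking, string constants in a class file are stored as modified UTF-8, and at run time a String is a sequence of UTF-16 code units.) So once the string Zhang San is compiled, it is no longer GBK. Whatever encoding the source files use, the strings end up encoded the same way in the class file.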
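A small sketch to go with this (my own illustration, not the author's). The literal below is Zhang San written with Unicode escapes, so the source file itself is pure ASCII and compiles the same way whatever -encoding javac is given. Inside the program it is just two UTF-16 code units, U+5F20 and U+4E09; an encoding comes back into play only when the String is turned into bytes again. The class name and the codeUnits helper are hypothetical, and the last line assumes the GBK charset is available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: after compilation a String is a sequence of UTF-16 code units and
// no longer remembers what encoding the source file was saved in.
// Writing non-ASCII literals as Unicode escapes keeps this file pure ASCII,
// so it compiles the same way whatever -encoding javac is given.
public class EncodingIndependentLiteral {
    // Zhang San, spelled with Unicode escapes.
    static final String ZHANG_SAN = "\u5F20\u4E09";

    // Lists a string's UTF-16 code units in U+XXXX form.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(ZHANG_SAN.length());   // 2
        System.out.println(codeUnits(ZHANG_SAN)); // U+5F20 U+4E09
        // An encoding matters again only when the String becomes bytes.
        System.out.println(ZHANG_SAN.getBytes(StandardCharsets.UTF_8).length); // 6
        System.out.println(ZHANG_SAN.getBytes(Charset.forName("GBK")).length); // 4
    }
}
```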


So I dutifully added it (the environment is still GBK):


What is the out in System.out? It's a PrintStream, a stream. Which encoding should this stream use to display the text? In a GBK environment it certainly has to be GBK. So why was the output garbled the first time? Here is what the JVM does.

It holds a Unicode-encoded Zhang San. My system's default is UTF-8, and since I haven't told the JVM which encoding rules to use, it naturally encodes the string as UTF-8, producing an output stream whose bits represent Zhang San in UTF-8. Those bytes are then sent to the screen, but the GBK screen can't digest them: to GBK, that string of bits does not represent Zhang San but three characters I can't even read (how embarrassing). So we have to tell the virtual machine: I want Zhang San encoded in GBK, that is, the string of bits that corresponds to Zhang San in GBK. Once that is added, everything works.

I'll stop here! It's late and I still have a bus to catch.
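The fix described above can be sketched in Java as follows. This is an assumption about what the author added (the original code was a screenshot): a PrintStream given an explicit charset, so the bytes it writes are the GBK bytes the terminal expects. The hypothetical bytesWritten helper writes into a memory buffer so the resulting bytes can be inspected. Which charset System.out uses by default depends on the JDK version and platform; since JDK 18 the default charset is UTF-8 everywhere (JEP 400). The PrintStream constructor that takes a Charset requires Java 10 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;

// Sketch: choose the output encoding explicitly instead of relying on the JVM default.
// Assumes Java 10+ (PrintStream constructor taking a Charset) and a GBK terminal,
// as in the article's setup.
public class ExplicitOutputEncoding {
    static final Charset GBK = Charset.forName("GBK");

    // Returns the bytes a PrintStream using the given charset emits for the text.
    static byte[] bytesWritten(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, charset);
        out.print(text);
        out.flush(); // make sure everything has reached the sink
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // What the JVM uses when we don't say anything.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        String zhangSan = "\u5F20\u4E09"; // Zhang San

        // A second stream on standard output that always encodes as GBK.
        PrintStream gbkOut = new PrintStream(new FileOutputStream(FileDescriptor.out), true, GBK);
        gbkOut.println(zhangSan); // writes the GBK bytes for Zhang San
    }
}
```

Writing to a buffer rather than straight to the terminal makes the difference visible without needing a GBK terminal: the same String yields 4 bytes through a GBK stream and 6 through a UTF-8 one.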


Next time I'll write about encoding in I/O, which I think is a bit simpler than this (and it will cover more of the Java classes related to encoding).


That's all for now. Have a great National Day holiday, everyone. Don't forget to knock out a couple of lines of code for the fun of it, because after a few days away, your code has probably forgotten you.






