An in-depth look at garbled characters in Java

Source: Internet
Author: User


1. What is encoding and why?


I never thought deeply about this problem before; I took everything for granted. I always knew that one day garbled characters in Java would bring me to my knees. They never behaved, and gibberish showed up everywhere. This time I don't want to let it go. I want to clean it up for good.

As we all know, a text file stored on a hard disk is just a string of binary, a combination of 0s and 1s. It carries nothing else, not even a single bit of information that tells the text editor "hi buddy, I'm GBK-encoded" or "I'm UTF-8 encoded" or anything of the kind.

It really is that simple: a combination of 0s and 1s. The file does not know what it is, so only when we know its encoding can we edit and use it correctly. First, let's look at a small example.





This is the same file viewed in different encoding environments, and what I see is not the same. My system is Ubuntu 12.04, whose default encoding is UTF-8, but note up front that this file is encoded in GBK.
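To make the point concrete, here is a minimal sketch of "same bytes, different view". The class name `SameBytesTwoViews` and the sample text are my own assumptions, not the author's file, and the sketch assumes the JDK in use ships the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SameBytesTwoViews {
    // Decode raw bytes under a chosen charset, roughly what a text editor does.
    public static String view(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JDK
        String text = "\u4e2d\u6587";          // sample Chinese text (hypothetical)
        byte[] onDisk = text.getBytes(gbk);    // the "file": bytes only, no encoding label

        // Read with the right charset, the text comes back intact.
        System.out.println(text.equals(view(onDisk, gbk)));                    // true
        // Read as UTF-8, the same bytes no longer give the original text.
        System.out.println(text.equals(view(onDisk, StandardCharsets.UTF_8))); // false
    }
}
```

Nothing in `onDisk` says which charset produced it; the reader has to know.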


So checking the encoding really does matter. Let's keep going and compile my HelloWorld file to see whether a miracle happens.




Even HelloWorld fails to compile. Unbelievable; this can't go on. But don't worry, look at the error. The character it complains about (not the character shown in the message, of course, but the 0101 binary string that represents the Chinese text in GBK) has no mapping in UTF-8, so it cannot be decoded; the UTF-8 encoding rules explain why. UTF-8 can't find it, fine, but isn't Java supposed to be "compile once, run anywhere"? Compilation failed. Why?
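The failure can be reproduced outside the compiler. The following is only a sketch of the idea, not how javac is implemented: it decodes GBK bytes strictly as UTF-8 and reports whether the byte sequence is valid. The class and method names are hypothetical, and the sample text stands in for the Chinese string in the source file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    // Returns true only if the bytes form a valid sequence in the given charset.
    public static boolean decodes(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");            // assumption: GBK is available
        byte[] gbkBytes = "\u4e2d\u6587".getBytes(gbk);  // sample text encoded as GBK

        System.out.println(decodes(gbkBytes, gbk));                    // true
        System.out.println(decodes(gbkBytes, StandardCharsets.UTF_8)); // false
    }
}
```

The GBK bytes are simply not a legal UTF-8 sequence, which is the situation the compiler error describes.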

Don't worry: javac has a parameter, -encoding ("Specify character encoding used by source files"), which lets you tell javac what encoding the Java source file uses.
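As a rough illustration, the sketch below drives the compiler through the standard `javax.tools` API and passes `-encoding` the same way you would on the command line. It assumes it runs on a JDK (not a bare JRE) that includes the GBK charset; the class name, the temporary directory and the generated HelloWorld source are all made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileWithEncoding {
    // Writes a small GBK-encoded source file, then compiles it while telling
    // javac to read it as sourceEncoding. Returns javac's exit code (0 = success).
    public static int compileGbkSource(String sourceEncoding) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("No system compiler: run this on a JDK, not a JRE");
        }
        try {
            Path dir = Files.createTempDirectory("encoding-demo");
            Path source = dir.resolve("HelloWorld.java");
            String code = "public class HelloWorld {\n"
                    + "    public static void main(String[] args) {\n"
                    + "        System.out.println(\"\u4e2d\u6587\");\n"
                    + "    }\n"
                    + "}\n";
            Files.write(source, code.getBytes(Charset.forName("GBK")));
            return javac.run(null, null, null,
                    "-encoding", sourceEncoding,
                    "-d", dir.toString(),
                    source.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read as GBK:   exit code " + compileGbkSource("GBK"));
        System.out.println("read as UTF-8: exit code " + compileGbkSource("UTF-8"));
    }
}
```

With the matching encoding the compile should succeed; with the wrong one, the result depends on the JDK version, which may report an unmappable-character error or only a warning.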

Specify the wrong encoding, though, and you may be unlucky in a different way: the compiler reports no error and never tells you where the problem is, while the strings are silently corrupted. It is scary to think about. So the absence of an error does not mean the program is fine. Try it if you don't believe me.

As I just said, my system defaults to UTF-8, which is why the compilation failed earlier. This is a GBK-encoded source file. Most people still use Windows with a Chinese language pack installed, so their default encoding is mostly GBK, and there the file compiles normally.

With a GBK-encoded source file on such a machine there is generally no problem, but most programs eventually run in a Linux environment. So everyone sets the virtual machine parameter -Dfile.encoding=GBK, and nobody dares to be lazy about it.

(But the problem should really be dealt with earlier.)
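To see what the JVM actually picked up, a small check like the one below can help. It is only a sketch with a hypothetical class name; it reads the `file.encoding` system property and the JVM's default charset.

```java
import java.nio.charset.Charset;

public class ShowDefaults {
    // Summarizes the encoding-related defaults of the running JVM.
    public static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it as `java ShowDefaults` and then as `java -Dfile.encoding=GBK ShowDefaults` shows whether the parameter took effect. The output is version-dependent: since JDK 18 the default charset is UTF-8 regardless of the system locale, so newer JDKs may behave differently from the setup described here.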




When I add this parameter, the compilation passes smoothly.


Feeling triumphant, I run my HelloWorld.


In the UTF-8 Environment




So excited. The output finally came out right.


Running in a GBK environment (this is the cause of most garbled output)





This makes no sense. Why is the output garbled?


You may have forgotten what encoding a Java class file uses. Unicode, right? So once the Chinese string has been compiled, it is no longer GBK. No matter what encoding the source file used, the encoding inside the class file is always the same.
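The same idea can be checked at runtime. In this minimal sketch (class name and sample text are my own), the byte sequences differ between GBK and UTF-8, but once decoded correctly they yield the identical Unicode string, which is the form Java works with internally.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class OneStringManySources {
    // True if two byte sequences, each decoded with its own charset, are the same text.
    public static boolean sameText(byte[] a, Charset charsetA, byte[] b, Charset charsetB) {
        return new String(a, charsetA).equals(new String(b, charsetB));
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available
        String text = "\u4e2d\u6587";          // sample Chinese text (hypothetical)
        byte[] gbkBytes = text.getBytes(gbk);
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);

        System.out.println(Arrays.equals(gbkBytes, utf8Bytes));                         // false
        System.out.println(sameText(gbkBytes, gbk, utf8Bytes, StandardCharsets.UTF_8)); // true
    }
}
```

The source encoding matters only while bytes are being turned into characters; after that, the string carries no trace of it.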


So I dutifully added the parameter (this environment is still GBK).

When the system default is UTF-8 and you do not tell the JVM which encoding rules to use, it naturally encodes with UTF-8, and what you get is a stream of bytes. That stream is a string of binary, say 00111, which represents the text under UTF-8. When it is written to the screen, the GBK screen does not buy it: there, that same binary does not represent the text at all but the characters shown above, which I cannot even read (embarrassing). So I have to tell the virtual machine that I want the text encoded as GBK, that is, I want the binary string that corresponds to the text in GBK, and adding this parameter is all it takes.

I'll stop here for now; any later and I'll miss the No. 2 bus.
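A small model of that mismatch: the JVM encodes the string with one charset, and the terminal decodes the resulting bytes with another. This is a sketch under my own naming and sample text, not a description of how any particular terminal works.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TerminalMismatch {
    // Models output: the JVM encodes with jvmCharset, the terminal decodes with terminalCharset.
    public static String shownOnTerminal(String text, Charset jvmCharset, Charset terminalCharset) {
        byte[] written = text.getBytes(jvmCharset);
        return new String(written, terminalCharset);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available
        String text = "\u4e2d\u6587";          // sample Chinese text (hypothetical)

        // JVM writes UTF-8 but the terminal expects GBK: the text is garbled.
        System.out.println(text.equals(shownOnTerminal(text, StandardCharsets.UTF_8, gbk))); // false
        // JVM writes GBK and the terminal expects GBK: the text survives.
        System.out.println(text.equals(shownOnTerminal(text, gbk, gbk)));                     // true
    }
}
```

Setting the JVM's output encoding to match the terminal corresponds to the second case.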


Next time I'll write another post, about encoding in I/O; I think that one will be easier. (Some of the encoding-related Java classes show the effect even better.)


I'll stop here and wish you a good day. Don't forget to write a couple of lines of code for fun; go a few days without it and it may well forget you.






