Getting Started with Character Set Encoding in Java (V): Character Encoding Conversion in Java Code

Source: Internet
Author: User

Suppose you were the designer of the JVM and had to decide how every character is represented inside it. Would you allow characters in many different encodings to coexist?

I suspect your answer is no. If Java characters in memory could exist in various encodings such as GB2312, UTF-16, and BIG5, developers would be helpless with even the most basic string operations, such as printing and concatenation. For example, if a GB2312 string is concatenated with a UTF-8 string, what encoding should the result use? Arbitrarily picking one or the other makes no sense.
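The concatenation problem goes away once every byte sequence is decoded into one internal form at the boundary. Here is a minimal sketch of that idea. It uses UTF-8 and UTF-16BE rather than GB2312, because only the charsets in `StandardCharsets` are guaranteed to exist on every JVM; the class name `ConcatDemo` and its method are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class ConcatDemo {
    // Bytes for the character U+6C49 in two different encodings
    static final byte[] UTF8_BYTES = {(byte) 0xE6, (byte) 0xB1, (byte) 0x89};
    static final byte[] UTF16BE_BYTES = {(byte) 0x6C, (byte) 0x49};

    // Decode each byte sequence with its own charset, then concatenate.
    // After decoding, both halves are ordinary UTF-16 Java strings.
    static String decodeAndJoin() {
        String fromUtf8 = new String(UTF8_BYTES, StandardCharsets.UTF_8);
        String fromUtf16 = new String(UTF16BE_BYTES, StandardCharsets.UTF_16BE);
        return fromUtf8 + fromUtf16;
    }

    public static void main(String[] args) {
        String joined = decodeAndJoin();
        System.out.println(joined.length()); // 2
        System.out.printf("%x %x%n", (int) joined.charAt(0), (int) joined.charAt(1)); // 6c49 6c49
    }
}
```

Because both inputs end up as the same UTF-16 value, the "which encoding is the result?" question never arises inside the JVM.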

With that in mind, Java settles on a single rule that developers can all rely on: in Java, characters exist in only one encoding form, and that form is UTF-16 (every char is one UTF-16 code unit).
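The "one UTF-16 code unit per char" point can be made visible with a small sketch (the class name `Utf16Demo` is hypothetical). The character 汉 (U+6C49) fits in a single char, while a character outside the Basic Multilingual Plane, such as U+1F600, needs a surrogate pair of two chars.

```java
public class Utf16Demo {
    // Number of UTF-16 code units (chars) needed to represent a code point
    static int charsFor(int codePoint) {
        return Character.toChars(codePoint).length;
    }

    public static void main(String[] args) {
        String han = "\u6c49";                                  // U+6C49, inside the BMP
        String emoji = new String(Character.toChars(0x1F600)); // U+1F600, outside the BMP

        System.out.println(han.length());                            // 1
        System.out.println(emoji.length());                          // 2 (a surrogate pair)
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1
        System.out.printf("%x %x%n", (int) emoji.charAt(0), (int) emoji.charAt(1)); // d83d de00
    }
}
```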

But where exactly is "in Java"? It means inside the JVM: in memory, in every char and String variable declared in your code. For example, suppose you write this in a program:

char han = '汉';

In memory, this character is stored as 0x6C49. You can verify this with the following code:

char han = '汉';
System.out.format("%x%n", (int) han); // print the char's UTF-16 value in hex

The output is:

6c49

Conversely, you can specify a character directly by its UTF-16 value, like this:

char han = 0x6c49; // assign the UTF-16 value directly
System.out.println(han);

The output is:

汉

In other words, as long as you read the character 汉 in correctly, its representation in memory must be 0x6C49; no other value can represent it. (Of course, if you read it in wrong, there is no telling what you will get. As Fan Wei asks in the comedy sketch: "I got it wrong? So I just lost a hundred million?" And Brother Benshan replies: "You never won the hundred million in the first place. Please listen to the next question.")
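To see what "reading it wrong" looks like, here is a minimal sketch (class and helper names are hypothetical). It takes the UTF-8 bytes of 汉 and decodes them twice: once with the correct charset, and once with ISO-8859-1, which maps each byte to its own char.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MisreadDemo {
    // Format each char of a String as a 4-digit hex UTF-16 code unit
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%04x", (int) c));
        }
        return sb.toString();
    }

    // Decode the same bytes with whichever charset the caller chooses
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        byte[] utf8Bytes = "\u6c49".getBytes(StandardCharsets.UTF_8); // e6 b1 89
        System.out.println(toHex(decode(utf8Bytes, StandardCharsets.UTF_8)));      // 6c49
        System.out.println(toHex(decode(utf8Bytes, StandardCharsets.ISO_8859_1))); // 00e6 00b1 0089
    }
}
```

Only the correct decoding yields the single value 0x6C49; the wrong one produces three unrelated chars.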

This JVM convention divides the world of characters into two parts: the inside of the JVM and the OS file system. Inside the JVM, UTF-16 is used uniformly. When a character moves from inside the JVM to the outside (for example, when it is saved as part of a file's contents in the file system), an encoding conversion takes place using a specific encoding scheme. (There is also a very special case that requires conversion within the JVM itself, but more on that later.)
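The outbound conversion can be sketched with `String.getBytes(Charset)`: the same in-memory char 0x6C49 becomes different byte sequences depending on the encoding scheme chosen. The class and helper names here are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodeDemo {
    // Encode a String with the given charset and return its bytes as hex
    static String encodeToHex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String han = "\u6c49"; // inside the JVM: one UTF-16 char, 0x6C49
        System.out.println(encodeToHex(han, StandardCharsets.UTF_8));    // e6 b1 89
        System.out.println(encodeToHex(han, StandardCharsets.UTF_16BE)); // 6c 49
        System.out.println(encodeToHex(han, StandardCharsets.UTF_16LE)); // 49 6c
    }
}
```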

So we can say that all encoding conversions happen only at the boundary where the JVM meets the OS, which is exactly where your various input and output streams (or Reader and Writer classes) do their work.
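A rough sketch of that boundary, assuming a writable temporary directory: text is written to a temp file through a Writer with an explicit UTF-8 charset, then read back through a Reader with the same charset. The class name `FileBoundaryDemo` and its `Result` holder are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileBoundaryDemo {
    // Holds what ended up on disk and what came back into the JVM
    static final class Result {
        final int bytesOnDisk;
        final String decoded;

        Result(int bytesOnDisk, String decoded) {
            this.bytesOnDisk = bytesOnDisk;
            this.decoded = decoded;
        }
    }

    // Writes text to a temporary file as UTF-8, then reads it back as UTF-8
    static Result roundTrip(String text) {
        try {
            Path file = Files.createTempFile("encoding-demo", ".txt");
            try {
                // JVM -> OS: the Writer encodes UTF-16 chars into UTF-8 bytes
                try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                    w.write(text);
                }
                int size = Files.readAllBytes(file).length;
                // OS -> JVM: the Reader decodes UTF-8 bytes back into UTF-16 chars
                StringBuilder sb = new StringBuilder();
                try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                    int c;
                    while ((c = r.read()) != -1) {
                        sb.append((char) c);
                    }
                }
                return new Result(size, sb.toString());
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Result r = roundTrip("\u6c49");
        System.out.println(r.bytesOnDisk);                    // 3 bytes in the file
        System.out.printf("%x%n", (int) r.decoded.charAt(0)); // 6c49 back in memory
    }
}
```

Passing the charset explicitly, rather than relying on the platform default, keeps the conversion at the boundary predictable.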

That means we have to say a few words about the Java IO system.

Although it may look messy, all of Java IO falls into two camps: the character-oriented Reader and Writer classes, and the byte-oriented InputStream and OutputStream classes.
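The two camps meet through the bridge classes `OutputStreamWriter` and `InputStreamReader`, which is where the encoding conversion actually happens. In this minimal sketch (class and method names are hypothetical), the byte side sees three UTF-8 bytes while the character side sees a single UTF-16 char.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class TwoCampsDemo {
    // Character side -> byte side: OutputStreamWriter encodes chars into bytes
    static byte[] encode(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // the writer is closed (and flushed) by now
    }

    // Byte side -> character side: InputStreamReader decodes bytes into chars
    static String decode(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            char[] buf = new char[64];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = encode("\u6c49");
        System.out.println(data.length);           // 3: the byte stream sees UTF-8 bytes
        System.out.println(decode(data).length()); // 1: the Reader sees one UTF-16 char
    }
}
```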
