Analysis of Java Encoding (Note Three Concepts)

Last Update: 2016-05-20 Source: Internet

Author: User


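Java and Unicode:

Java class files store string constants in (modified) UTF-8. At runtime, the JVM represents char and String values in UTF-16.

A Java String is therefore always Unicode: conceptually, it is a sequence of UTF-16 code units.

In short, Java uses the Unicode character set internally, which makes internationalization easier.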
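
The UTF-16 point can be made concrete with a minimal sketch. The class and helper names are hypothetical. It compares length(), which counts UTF-16 code units, with the number of code points. The code uses \u escapes so the result does not depend on the encoding of the source file. U+1D11E is an arbitrary character chosen from outside the Basic Multilingual Plane.

```java
public class Utf16Demo {
    // Number of UTF-16 code units (chars): this is what length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points: a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "\u4E2D";                 // U+4E2D, a BMP character: one code unit
        String supplementary = "\uD834\uDD1E"; // U+1D11E written as a surrogate pair
        System.out.println(codeUnits(bmp) + " " + codePoints(bmp));                     // 1 1
        System.out.println(codeUnits(supplementary) + " " + codePoints(supplementary)); // 2 1
    }
}
```

A character outside the BMP therefore occupies two chars in a Java String, even though it is a single Unicode character.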

Which character sets are supported by Java:

Which character sets can Java recognize and handle correctly?

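Look at the Charset class. The JDK that was current when this article was written supported 160 character sets; the exact number depends on the JDK version and platform. The static method Charset.availableCharsets() returns all character sets supported by the running JVM.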

Java code

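import java.nio.charset.Charset;
import java.util.Set;
import static org.junit.Assert.*;
// The count varies by JDK version and platform (160 in the JDK the author used)
System.out.println(Charset.availableCharsets().size());
Set<String> charsetNames = Charset.availableCharsets().keySet();
assertTrue(charsetNames.contains("UTF-8"));
assertTrue(charsetNames.contains("UTF-16"));
assertTrue(charsetNames.contains("GB2312"));
assertTrue(Charset.isSupported("UTF-8"));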
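
Because availableCharsets() is keyed by canonical name, it helps to see how aliases resolve. The sketch below uses hypothetical class and method names. It relies on Charset.forName, which accepts canonical names and aliases case-insensitively, and on Charset.isSupported, which returns false for a legal but unknown name.

```java
import java.nio.charset.Charset;

public class CharsetLookup {
    // Resolves a canonical name or alias (case-insensitively) to the canonical name.
    static String canonicalName(String nameOrAlias) {
        return Charset.forName(nameOrAlias).name();
    }

    public static void main(String[] args) {
        // The count depends on the JDK version and platform.
        System.out.println(Charset.availableCharsets().size() + " charsets available");
        System.out.println(canonicalName("utf8"));   // UTF-8
        System.out.println(canonicalName("gb2312")); // GB2312
        System.out.println(Charset.isSupported("no-such-charset")); // false
        // Aliases the running JVM accepts for UTF-8 (order is unspecified).
        System.out.println(Charset.forName("UTF-8").aliases());
    }
}
```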

When do you need to pay attention to encoding issues?

1. Reading data from an external resource:

The right way to read external data depends on how the resource is encoded. We must read it with the same character set that was used to encode it:

Java code

import java.io.*;
InputStream is = new FileInputStream("res/input2.data");
InputStreamReader streamReader = new InputStreamReader(is, "GB18030");

Here we read the external data as GB18030. We can verify this by checking the reader's encoding:

Java code

assertEquals("GB18030", streamReader.getEncoding());
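
One caveat applies to this check: getEncoding() returns the historical java.io name of the charset, which does not always match Charset.name(). The minimal sketch below shows the difference for UTF-8. The names are hypothetical, and an empty in-memory stream stands in for the file.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class ReaderEncodingName {
    // Name that InputStreamReader.getEncoding() reports for a given charset.
    static String readerEncoding(String charsetName) {
        InputStreamReader reader = new InputStreamReader(
                new ByteArrayInputStream(new byte[0]), Charset.forName(charsetName));
        return reader.getEncoding(); // historical java.io name, may differ from Charset.name()
    }

    public static void main(String[] args) {
        System.out.println(readerEncoding("GB18030")); // GB18030
        System.out.println(readerEncoding("UTF-8"));   // UTF8, not UTF-8
    }
}
```

Comparing Charset objects, or names obtained from Charset.name(), avoids surprises from these historical names.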

Precisely because we specified the correct encoding for the external resource, the data is decoded correctly when it is read into a char array (GB18030 -> Unicode):

Java code

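// available() only estimates the readable bytes; it is not a char count
char[] chars = new char[is.available()];
int charsRead = streamReader.read(chars, 0, chars.length); // may read fewer chars than requested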
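
A single read() call is not guaranteed to fill the array. A more robust approach reads in a loop until read() returns -1. The sketch below makes several assumptions. In-memory GB18030 bytes stand in for res/input2.data. The class and method names are hypothetical. The JVM is assumed to provide GB18030, as common JDK builds do.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

public class ReadWithCharset {
    // Reads the whole stream, decoding its bytes with the given charset.
    static String readAll(InputStream in, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, cs)) {
            char[] buf = new char[1024];
            int n;
            // read() may return fewer chars than requested; -1 marks end of stream
            while ((n = reader.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Convenience wrapper: decode an in-memory byte array by charset name.
    static String decode(byte[] data, String charsetName) {
        return readAll(new ByteArrayInputStream(data), Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the contents of res/input2.data
        byte[] data = "\u4E2D\u6587".getBytes(Charset.forName("GB18030"));
        System.out.println(decode(data, "GB18030").equals("\u4E2D\u6587")); // true
    }
}
```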

But the code we usually write looks like this:

Java code

InputStream is = new FileInputStream("res/input2.data");
InputStreamReader streamReader = new InputStreamReader(is); // no charset argument

Which encoding does InputStreamReader use to read the external resource in this case? Unicode? No. It uses the JVM's default character set. This is determined when the virtual machine starts, usually from the locale and the character set of the underlying operating system. Since JDK 18, the default is UTF-8 on all platforms unless overridden. You can obtain the JVM's default character set as follows:

Java code

Charset.defaultCharset();

Why this design? Data read from external resources is often encoded in the operating system's character set, so it is reasonable to use that character set as the default.
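
The defaulting rule can be checked directly. The sketch below uses hypothetical names. It compares the encoding reported by a reader created without a charset against a reader given Charset.defaultCharset() explicitly. The printed default varies by JDK version, locale, and startup options.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class DefaultCharsetDemo {
    // Encoding reported by a reader created WITHOUT a charset argument.
    static String implicitReaderEncoding() {
        return new InputStreamReader(new ByteArrayInputStream(new byte[0])).getEncoding();
    }

    // Encoding reported when the JVM default charset is passed explicitly.
    static String explicitReaderEncoding() {
        return new InputStreamReader(new ByteArrayInputStream(new byte[0]),
                Charset.defaultCharset()).getEncoding();
    }

    public static void main(String[] args) {
        System.out.println("JVM default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding: " + System.getProperty("file.encoding"));
        System.out.println(implicitReaderEncoding().equals(explicitReaderEncoding())); // true
    }
}
```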

Then I created a file in my IDE (IntelliJ IDEA) and read it using the JVM's default encoding, but the data came out garbled. Why? The file created by IDEA was encoded in UTF-8, which did not match the JVM's default. To get a file in the JVM's default encoding, try creating a TXT file manually instead.
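
The same kind of mismatch can be reproduced in memory. In the sketch below (hypothetical names), the text is encoded as UTF-8, as IDEA saved it. It is then decoded once with UTF-8 and once with GB18030. GB18030 stands in for a non-UTF-8 JVM default; this choice is only an assumption for illustration.

```java
import java.nio.charset.Charset;

public class MismatchDemo {
    // Decodes bytes with the named charset, whether or not it is the right one.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587";
        byte[] utf8 = text.getBytes(Charset.forName("UTF-8")); // as the IDE saved it
        System.out.println(decode(utf8, "UTF-8").equals(text));   // true: charsets match
        System.out.println(decode(utf8, "GB18030").equals(text)); // false: garbled
    }
}
```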

2. Converting between strings and byte arrays

We usually convert a string to a byte array with the following code:

Java code

"string".getBytes();

But have you ever noticed which encoding this conversion uses? The code above is in fact equivalent to:

Java code

"string".getBytes(Charset.defaultCharset());

That is, it encodes the string into bytes using the JVM's default character set, rather than some Unicode encoding as you might assume.
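
The byte count depends on the charset, so comparing a few explicit encodings of the same string shows why the default matters. The sketch below uses hypothetical names; "\u4E2D\u6587" is the two-character string 中文.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class GetBytesDemo {
    // Number of bytes the string occupies in the named charset.
    static int encodedLength(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String s = "\u4E2D\u6587";
        System.out.println(encodedLength(s, "UTF-8"));    // 6: 3 bytes per character
        System.out.println(encodedLength(s, "GB18030"));  // 4: 2 bytes per character
        System.out.println(encodedLength(s, "UTF-16BE")); // 4: one 16-bit unit per character
        // The no-argument form follows the JVM default, whatever that is on this machine.
        System.out.println(Arrays.equals(s.getBytes(), s.getBytes(Charset.defaultCharset()))); // true
    }
}
```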

Conversely, how do you create a string from a byte array?

Java code

new String("string".getBytes());

Again, this constructor decodes the byte array using the platform's default character set. Here, decoding means converting from that character set to Unicode.
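
Passing the charset explicitly to both getBytes and the String constructor avoids the default altogether. The sketch below (hypothetical names) checks whether a string survives an encode/decode round trip in a given charset. ISO-8859-1 cannot represent Chinese characters, so getBytes replaces them with '?' and the round trip fails.

```java
import java.nio.charset.Charset;

public class RoundTrip {
    // True if encoding and then decoding with the same charset returns the original text.
    static boolean roundTrips(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return new String(text.getBytes(cs), cs).equals(text);
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587";
        System.out.println(roundTrips(text, "UTF-8"));      // true
        System.out.println(roundTrips(text, "GB18030"));    // true
        System.out.println(roundTrips(text, "ISO-8859-1")); // false: unmappable chars become '?'
    }
}
```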

String encoding myth:

Java code

new String(input.getBytes("ISO-8859-1"), "GB18030")

What does the code above do? Some would say it "converts the input string from ISO-8859-1 encoding to GB18030 encoding." If that were true, how could it be reconciled with the fact, mentioned earlier, that Java strings are always Unicode?

That description is not just imprecise; it is wrong. Here is what actually happens. The data should have been decoded into a string with GB18030, but it was decoded with ISO-8859-1 instead, producing a wrong string. To recover, we first encode the wrong string back into the original byte array with ISO-8859-1. We then decode that array again with the correct encoding, GB18030; that is, we convert the GB18030-encoded data into a Unicode string. Note that the string itself is always Unicode.
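
The analysis above can be replayed in a few lines. In the sketch below (hypothetical names), misdecode simulates the original mistake. recover applies the repair, which is the same as new String(input.getBytes("ISO-8859-1"), "GB18030") written with Charset objects.

```java
import java.nio.charset.Charset;

public class RecoverMisdecoded {
    static final Charset GB18030 = Charset.forName("GB18030");
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");

    // Simulates the mistake: GB18030 data decoded as ISO-8859-1.
    static String misdecode(String text) {
        return new String(text.getBytes(GB18030), LATIN1);
    }

    // The repair: get the original bytes back with ISO-8859-1, then decode them as GB18030.
    static String recover(String misdecoded) {
        byte[] original = misdecoded.getBytes(LATIN1);
        return new String(original, GB18030);
    }

    public static void main(String[] args) {
        String wrong = misdecode("\u4E2D\u6587");
        System.out.println(wrong.length());                        // 4: one char per byte
        System.out.println(recover(wrong).equals("\u4E2D\u6587")); // true
    }
}
```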

But encoding conversions are not simply reversible, the way two negatives make a positive. We can convert back correctly here only because ISO-8859-1 is a single-byte encoding that maps every byte to exactly one character as is. So although the first conversion was wrong, no information was lost, and we still have a chance to recover the original data.
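
To see why the single-byte property matters, the sketch below (hypothetical names) runs the same "encode back, decode again" repair with two different wrong charsets. When UTF-8 is the wrong charset, the GB18030 bytes are not valid UTF-8. Decoding replaces them with U+FFFD, so the original bytes are lost.

```java
import java.nio.charset.Charset;

public class RecoveryCheck {
    // Decodes GB18030 bytes with the wrong charset, then tries the "encode back and
    // decode again" repair. Returns true if the original text survives.
    static boolean recoverable(String text, String wrongCharsetName) {
        Charset gb18030 = Charset.forName("GB18030");
        Charset wrong = Charset.forName(wrongCharsetName);
        String misdecoded = new String(text.getBytes(gb18030), wrong);
        return new String(misdecoded.getBytes(wrong), gb18030).equals(text);
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587";
        System.out.println(recoverable(text, "ISO-8859-1")); // true
        System.out.println(recoverable(text, "UTF-8"));      // false: invalid bytes became U+FFFD
    }
}
```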

Summary:

When we deal with Java encoding problems, we need to keep three concepts clear:

- the encoding Java uses internally (Unicode),
- the JVM platform's default character set, and
- the encoding of the external resource.
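
Putting the three concepts together, converting data from one external encoding to another always passes through a Unicode String in the middle. The minimal sketch below uses hypothetical names. Both charsets are explicit, so the JVM default never takes part.

```java
import java.nio.charset.Charset;

public class Transcode {
    // External bytes (source encoding) -> Unicode String -> external bytes (target encoding).
    // Both charsets are explicit, so the JVM default charset is never used.
    static byte[] transcode(byte[] input, String fromCharset, String toCharset) {
        String unicode = new String(input, Charset.forName(fromCharset));
        return unicode.getBytes(Charset.forName(toCharset));
    }

    public static void main(String[] args) {
        byte[] gb = "\u4E2D\u6587".getBytes(Charset.forName("GB18030"));
        byte[] utf8 = transcode(gb, "GB18030", "UTF-8");
        System.out.println(gb.length + " -> " + utf8.length); // 4 -> 6
    }
}
```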

http://www.iteye.com/topic/311583


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
