Solving garbled-text problems with Java's String.getBytes(String charsetName) and new String(byte[] bytes, String charsetName)

Last Update: 2016-08-14 Source: Internet

Author: User


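How does Java store a String's data? The source code shows that the characters are kept in a member variable, char[] value, and a char in Java is 2 bytes. (This describes JDK 8 and earlier; since JDK 9, "compact strings" store a byte[] plus an encoding flag internally, but the char-level behavior described here is unchanged.)
This is often described as UCS-2, the form of Unicode that uses 2 bytes per character. Strictly speaking, Java uses UTF-16: a character in the Basic Multilingual Plane occupies one char, exactly as in UCS-2, while a character outside it (an emoji, for example) occupies a surrogate pair of two chars. Either way, each char in value simply holds a number: a Unicode code unit.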

For example, the Unicode code point of '你' ("you") is U+4F60. See the test code below:

char c = '你';
System.out.println(Integer.toHexString(c));
System.out.println(Integer.valueOf(c));
System.out.println(c);

The result is:
4f60
20320
你
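The snippet above can be turned into a small runnable program, as a sketch. It writes '你' as the escape '\u4F60' so the result does not depend on the source file's encoding; the class name CharStorageDemo and the helper codeUnits are made up for illustration. It also checks the UTF-16 point: a character outside the Basic Multilingual Plane, here U+1F600, occupies two chars.

```java
class CharStorageDemo {
    // Returns each UTF-16 code unit (char) of s as a lower-case hex string.
    static String[] codeUnits(String s) {
        String[] out = new String[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = Integer.toHexString(s.charAt(i));
        }
        return out;
    }

    public static void main(String[] args) {
        char c = '\u4F60';                          // '你'
        System.out.println(Integer.toHexString(c)); // 4f60
        System.out.println((int) c);                // 20320
        System.out.println(c);                      // 你, if the console can display it

        // U+1F600 lies outside the BMP, so UTF-16 stores it as a surrogate pair.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());                          // 2 chars
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1 code point
        System.out.println(String.join(" ", codeUnits(emoji)));      // d83d de00
    }
}
```

Whether the third line prints 你 correctly depends on the console's output encoding, not on how the String stores the character.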

So a String stores its characters as plain Unicode code units, not as bytes in any particular charset encoding. With that in mind, let's look at our two methods:

getBytes(String charsetName)
This method encodes the string into a byte array using the named charset.
What does that mean?
It converts the in-memory Unicode code units into the byte sequence that represents the same characters in the charsetName encoding.
For example, '你' takes three bytes in UTF-8, so the resulting byte array has three bytes:
[E4 BD A0]
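A minimal sketch of getBytes in action, assuming only the JDK. It prints the encoded bytes of '你' in a few charsets; the GetBytesDemo class and the hex helper are hypothetical names. Besides the getBytes(String charsetName) overload from the text, it uses getBytes(Charset) with the StandardCharsets constants, which performs the same encoding without the checked UnsupportedEncodingException.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class GetBytesDemo {
    // Formats bytes as upper-case hex, e.g. [E4 BD A0].
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "\u4F60"; // '你'
        // getBytes(String charsetName): the overload discussed in the text
        System.out.println(hex(s.getBytes("UTF-8")));                     // [E4 BD A0]
        // getBytes(Charset): same idea, no checked exception
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE)));   // [4F 60]
        // ISO-8859-1 cannot represent '你', so it is replaced by '?' (0x3F)
        System.out.println(hex(s.getBytes(StandardCharsets.ISO_8859_1))); // [3F]
    }
}
```

The last line shows that encoding into a charset that lacks the character does not throw: getBytes silently substitutes the charset's replacement byte, which for ISO-8859-1 is '?'.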

And what about new String(byte[] bytes, String charsetName)?

It decodes the byte array bytes according to charsetName and builds a String from the resulting characters.
For example, decoding the byte array [E4 BD A0] above as UTF-8 yields the string "你"; decoding it with another charset will generally not yield "你".
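The decoding direction can be sketched the same way. This example (the class name DecodeDemo is hypothetical) decodes the three bytes [E4 BD A0] once as UTF-8 and once as ISO-8859-1, using the String(byte[], Charset) constructor, which behaves like the String(byte[], String charsetName) form discussed here.

```java
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // The UTF-8 encoding of '你'.
    static final byte[] BYTES = {(byte) 0xE4, (byte) 0xBD, (byte) 0xA0};

    public static void main(String[] args) {
        String right = new String(BYTES, StandardCharsets.UTF_8);
        String wrong = new String(BYTES, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals("\u4F60")); // true: one character, '你'
        System.out.println(right.length());         // 1
        // ISO-8859-1 turns each byte into its own character:
        // U+00E4 'ä', U+00BD '½', U+00A0 (no-break space)
        System.out.println(wrong.length());                     // 3
        System.out.println(wrong.equals("\u00E4\u00BD\u00A0")); // true
    }
}
```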

A related question: why is it often necessary to re-decode request parameters in a servlet with code like this?

String str = new String(param.getBytes("ISO-8859-1"), "UTF-8");

This is actually easy to understand. The browser sends the parameter as UTF-8 bytes, but the web container assumes by default that the bytes are ISO-8859-1 and decodes them into a String with ISO-8859-1, which is equivalent to:

String s = new String(utf8Bytes, "ISO-8859-1");

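Note that ISO-8859-1 is a single-byte encoding: every byte value from 0x00 to 0xFF is decoded into exactly one Unicode character (U+0000 to U+00FF), so no information is lost. Fortunately, this means we can call getBytes("ISO-8859-1") to get back exactly the same byte array and then decode that array correctly as UTF-8. This is the re-decoding process we use most often in practice.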
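To see why the servlet trick works, here is a sketch that simulates it without a web container. The helpers containerDecode and fix are hypothetical names standing in for the container's default ISO-8859-1 decoding and for the re-decoding line shown earlier; isoIsLossless checks the single-byte property for all 256 byte values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class IsoRoundTripDemo {
    // Simulates the container: UTF-8 bytes from the browser decoded as ISO-8859-1.
    static String containerDecode(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // The servlet fix: recover the original bytes, then decode them as UTF-8.
    static String fix(String param) {
        return new String(param.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // True if ISO-8859-1 maps every byte 0x00..0xFF to one char and back unchanged.
    static boolean isoIsLossless() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) all[i] = (byte) i;
        String decoded = new String(all, StandardCharsets.ISO_8859_1);
        return decoded.length() == 256
            && Arrays.equals(all, decoded.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        byte[] fromBrowser = "\u4F60\u597D".getBytes(StandardCharsets.UTF_8); // "你好"
        String garbled = containerDecode(fromBrowser);
        System.out.println(garbled.length());                    // 6: one char per byte
        System.out.println(fix(garbled).equals("\u4F60\u597D")); // true
        System.out.println(isoIsLossless());                     // true
    }
}
```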

Finally, let me stress once more that most confusion about encodings comes from a misunderstanding. Remember:

Java stores strings internally as Unicode.
We often hear people say, "I need to convert this string from ISO-8859-1 to GBK." What does that actually mean? We are not converting "an ISO-8859-1-encoded string" into "a GBK-encoded string": as stated repeatedly, a Java String is always Unicode, so there is no such thing as an "ISO-8859-1 string" or a "GBK string". The only reason a conversion is needed is that the string was decoded with the wrong charset in the first place. Whenever we need to "convert" from ISO-8859-1 to GBK, UTF-8, and so on, the actual process is: String -> byte[] -> String.
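The String -> byte[] -> String process can be wrapped in a small helper. This is a sketch, not a library API: the class Redecode, the method redecode, and its parameters are made up here. The second call shows the limit of the approach: if the wrong charset was UTF-8 rather than a single-byte charset like ISO-8859-1, malformed bytes become U+FFFD during the first decode and the original bytes cannot be recovered.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Redecode {
    // Hypothetical helper: undo a decode done with the wrong charset.
    // String -> byte[] (using the charset that was wrongly assumed)
    //        -> String (using the charset the bytes were really in).
    // Only reliable when 'assumed' maps every byte to a char and back, as ISO-8859-1 does.
    static String redecode(String garbled, Charset assumed, Charset actual) {
        byte[] original = garbled.getBytes(assumed);
        return new String(original, actual);
    }

    public static void main(String[] args) {
        String text = "\u4F60\u597D"; // "你好"

        // Works: UTF-8 bytes that were wrongly decoded as ISO-8859-1.
        String garbled = new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(redecode(garbled, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8).equals(text)); // true

        // Fails: the ISO-8859-1 byte 0xE9 ('é') wrongly decoded as UTF-8 is replaced
        // by U+FFFD, so re-encoding cannot give back the original byte.
        String lost = new String(new byte[] {(byte) 0xE9}, StandardCharsets.UTF_8);
        System.out.println(redecode(lost, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals("\u00E9")); // false
    }
}
```

For the ISO-8859-1 to GBK case mentioned above, the actual charset would be Charset.forName("GBK"). GBK is not one of the charsets every Java platform is required to support, although standard JDK builds usually include it.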


This article is an English version of an article originally published in Chinese on aliyun.com.
