The Chinese encoding problem in the String.getBytes() method (repost)

Last Update:2015-05-12 Source: Internet

Author: User

The getBytes() method of String returns a byte array encoded in the system's default encoding.
getBytes("UTF-8") returns a byte array encoded in UTF-8.

Converting a string to bytes gives different results under different encodings: in UTF-8 a common Chinese character takes 3 bytes, while in GBK it takes 2 bytes. So state the encoding explicitly; otherwise the default encoding is used.

In Java, String.getBytes() returns a byte array in the default encoding of the platform it runs on. This means the returned bytes can differ from one operating system to another!
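The platform dependence described above can be seen with a minimal sketch like the following. The class name DefaultCharsetDemo and the helper encodeUtf8 are hypothetical names used only for illustration; the output of the no-argument getBytes() call depends on the JVM and its settings, so no fixed result is claimed for it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DefaultCharsetDemo {
    // Hypothetical helper: encodes with an explicit charset,
    // so the result does not depend on the platform default.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "\u4e2d"; // the Chinese character 中

        // The default charset varies with the JVM version and OS settings.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Platform-dependent bytes.
        System.out.println("getBytes():      " + Arrays.toString(s.getBytes()));

        // Platform-independent bytes.
        System.out.println("getBytes(UTF-8): " + Arrays.toString(encodeUtf8(s)));
    }
}
```

Passing a Charset constant instead of a charset name also avoids the checked UnsupportedEncodingException that getBytes(String) declares.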

The String.getBytes(String decode) method returns the byte array representation of the string in the encoding named by decode, for example:
byte[] B_GBK = "中".getBytes("GBK");
byte[] B_utf8 = "中".getBytes("UTF-8");
byte[] b_iso88591 = "中".getBytes("ISO8859-1");
These return the byte arrays for the Chinese character "中" in the GBK, UTF-8, and ISO8859-1 encodings, respectively. At this point:

The length of B_GBK is 2,

The length of B_utf8 is 3,

The length of b_iso88591 is 1.
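The three lengths can be checked with a short sketch such as the one below. The class name EncodedLengthDemo and the helper encodedLength are hypothetical, and the sketch assumes the running JDK ships the GBK charset (standard full JDK distributions normally do). It also prints the single ISO-8859-1 byte, which in practice is the replacement byte for '?', since "中" has no mapping in that charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedLengthDemo {
    // Hypothetical helper: number of bytes s occupies in the given charset.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "\u4e2d"; // the Chinese character 中
        Charset gbk = Charset.forName("GBK"); // assumes GBK is available in this JDK

        System.out.println("GBK:        " + encodedLength(s, gbk));
        System.out.println("UTF-8:      " + encodedLength(s, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + encodedLength(s, StandardCharsets.ISO_8859_1));

        // The unmappable character is replaced by a single substitute byte.
        byte[] iso = s.getBytes(StandardCharsets.ISO_8859_1);
        System.out.printf("ISO-8859-1 byte: 0x%02X%n", iso[0]);
    }
}
```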

Going the other way from getBytes, the character "中" can be restored with new String(byte[], decode).

new String(byte[], decode) parses the byte[] into a string using the encoding named by decode.
String S_GBK = new String(B_GBK, "GBK");
String S_utf8 = new String(B_utf8, "UTF-8");
String s_iso88591 = new String(b_iso88591, "ISO8859-1");
Printing S_GBK, S_utf8, and s_iso88591 shows that S_GBK and S_utf8 are both "中", while s_iso88591 is an unrecognized character (in effect, garbled text). Why can't "中" be restored after encoding with ISO8859-1? The reason is simple: the ISO8859-1 code table contains no Chinese characters at all, so "中".getBytes("ISO8859-1") cannot produce a correct encoded value for "中", and restoring it through new String() is therefore out of the question.
Therefore, when using String.getBytes(String decode) to obtain a byte[], make sure that the characters in the string actually exist in the decode code table; only then can the resulting byte[] be correctly restored.
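One way to check this condition before encoding is to ask the charset's encoder whether it can represent the string. The sketch below is only an illustration: the class name RoundTripCheck and the helpers canEncode and roundTrips are hypothetical, and GBK is assumed to be available in the running JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripCheck {
    // Hypothetical helper: true if every character of s exists in the charset.
    static boolean canEncode(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    // Hypothetical helper: true if encoding then decoding gives back s unchanged.
    static boolean roundTrips(String s, Charset cs) {
        return new String(s.getBytes(cs), cs).equals(s);
    }

    public static void main(String[] args) {
        String s = "\u4e2d"; // the Chinese character 中
        Charset gbk = Charset.forName("GBK"); // assumes GBK is available in this JDK

        for (Charset cs : new Charset[] {gbk, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1}) {
            System.out.println(cs.name()
                    + " canEncode=" + canEncode(s, cs)
                    + " roundTrips=" + roundTrips(s, cs));
        }
    }
}
```

Checking with canEncode first makes the data loss visible up front instead of discovering a replaced character after decoding.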

Attention:

Sometimes, to make Chinese text meet certain special requirements (for example, HTTP headers require their content to be ISO8859-1 encoded), the Chinese characters can be carried as raw bytes, for example:
String s_iso88591 = new String("中".getBytes("UTF-8"), "ISO8859-1");
The resulting s_iso88591 string is actually three ISO8859-1 characters. After these characters are passed to the destination, the destination program reverses the process with
String s_utf8 = new String(s_iso88591.getBytes("ISO8859-1"), "UTF-8");
to recover the correct Chinese character "中". This both complies with the protocol and supports Chinese.
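The round trip described in this note can be sketched as a pair of helper methods. The class name Latin1Tunnel and the methods wrap and unwrap are hypothetical names chosen for this illustration. The approach relies on ISO-8859-1 mapping every byte value to exactly one character, so the UTF-8 bytes survive the detour unchanged.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Tunnel {
    // Hypothetical helper: reinterpret the UTF-8 bytes of s as ISO-8859-1 characters.
    static String wrap(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: reverse of wrap, recovering the original text.
    static String unwrap(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d"; // the Chinese character 中
        String carried = wrap(original);

        // One Chinese character becomes three ISO-8859-1 characters.
        System.out.println("carried length: " + carried.length());

        // The receiving side restores the original text.
        System.out.println("restored equals original: " + unwrap(carried).equals(original));
    }
}
```

Both sides must agree on the inner encoding (UTF-8 here); unwrapping with a different charset would produce garbled text.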

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
