Using String(byte[] bytes, Charset charset) and getBytes()


Refer to this article: http://blog.csdn.net/maxracer/article/details/6075057

Test code:

import java.io.UnsupportedEncodingException;
import org.junit.Test;

// ... inside a JUnit 4 test class:
@Test
public void testBytes() {
    // Bytes per character:
    //   Chinese character: ISO-8859-1: 1, GBK: 2, UTF-8: 3
    //   digit or letter:   ISO-8859-1: 1, GBK: 1, UTF-8: 1
    String username = "中";
    try {
        // String ---> byte array in the specified encoding
        byte[] u_iso = username.getBytes("iso8859-1");
        byte[] u_gbk = username.getBytes("GBK");
        byte[] u_utf8 = username.getBytes("utf-8");
        System.out.println(u_iso.length);
        System.out.println(u_gbk.length);
        System.out.println(u_utf8.length);
        // byte array ---> String: exactly the inverse of the above
        String un_iso = new String(u_iso, "iso8859-1");
        String un_gbk = new String(u_gbk, "GBK");
        String un_utf8 = new String(u_utf8, "utf-8");
        System.out.println(un_iso);
        System.out.println(un_gbk);
        System.out.println(un_utf8);
        // Sometimes the string must be ISO-8859-1 encoded; handle it like this
        String un_utf8_iso = new String(u_utf8, "iso8859-1");
        System.out.println(un_utf8_iso);
        // Restore the ISO-encoded string
        String un_iso_utf8 = new String(un_utf8_iso.getBytes("iso8859-1"), "UTF-8");
        System.out.println(un_iso_utf8);
    } catch (UnsupportedEncodingException e) {
        // Thrown when a named encoding is not supported by the JVM
        e.printStackTrace();
    }
}

Test results:

1
2
3
?
中
中
ĸ
中
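The title refers to the Charset overloads, while the test passes encoding names as strings, which is why it has to catch the checked UnsupportedEncodingException. A minimal sketch of the same round trips with getBytes(Charset) and new String(byte[], Charset), which throw no checked exception, also checks the byte counts claimed in the test's opening comment. The class and method names are hypothetical, and it assumes the JDK provides the GBK charset (standard OpenJDK builds do, but Java does not require every platform to support it).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: encode/decode round trips using the Charset-based overloads.
// Assumption: the "GBK" charset is available in this JDK.
// CharsetRoundTrip, byteLength and roundTrip are hypothetical names.
class CharsetRoundTrip {
    static final Charset GBK = Charset.forName("GBK");

    // Number of bytes the text occupies when encoded with cs
    static int byteLength(String text, Charset cs) {
        return text.getBytes(cs).length;
    }

    // Encode with cs, then decode the bytes with the same cs.
    // getBytes(Charset) replaces unmappable characters instead of throwing.
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        Charset[] charsets = {StandardCharsets.ISO_8859_1, GBK, StandardCharsets.UTF_8};
        String[] samples = {"a", "\u4E2D"}; // a letter and U+4E2D, the character from the test
        for (String s : samples) {
            for (Charset cs : charsets) {
                System.out.println(cs.name() + ": " + byteLength(s, cs)
                        + " byte(s), round trip -> " + roundTrip(s, cs));
            }
        }
    }
}
```

For "a" every charset reports 1 byte; for "中" ISO-8859-1 reports 1 byte and round-trips to "?", GBK reports 2 bytes and UTF-8 reports 3, in line with the test results.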

Excerpt from the referenced article:

Why the text is garbled: why can the character "中" not be restored after encoding it with ISO-8859-1 and decoding it again? The reason is simple: the ISO-8859-1 code table contains no Chinese characters, so "中".getBytes("iso8859-1") cannot produce a correct ISO-8859-1 code for "中" (in practice it becomes a single '?' byte), and restoring the character from those bytes with new String() is therefore impossible.
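This can be checked directly. A minimal sketch, using CharsetEncoder.canEncode and the Charset overloads rather than the string-named ones; the class and method names are hypothetical. With getBytes(Charset) the substitution of the charset's replacement byte is documented behaviour, whereas for getBytes(String) the Javadoc leaves unmappable input unspecified (the test above nevertheless printed "?").

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: ISO-8859-1 has no code for U+4E2D, so encoding loses the character.
// Latin1Check, canEncodeLatin1 and latin1RoundTrip are hypothetical names.
class Latin1Check {
    // True if the ISO-8859-1 code table contains c (only U+0000..U+00FF)
    static boolean canEncodeLatin1(char c) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(c);
    }

    // Encode to ISO-8859-1 bytes, then decode them again
    static String latin1RoundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String zhong = "\u4E2D"; // U+4E2D, the character from the test
        System.out.println(canEncodeLatin1('A'));             // prints true
        System.out.println(canEncodeLatin1(zhong.charAt(0))); // prints false
        // getBytes(Charset) substitutes the replacement byte '?' (63)
        System.out.println(Arrays.toString(zhong.getBytes(StandardCharsets.ISO_8859_1))); // prints [63]
        // Decoding yields "?", so the original character is gone
        System.out.println(latin1RoundTrip(zhong));           // prints ?
    }
}
```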

Sometimes, to meet special requirements (for example, HTTP headers whose content must be ISO-8859-1 encoded), Chinese characters can be carried as individual bytes, for example:

String s_iso88591 = new String("中".getBytes("UTF-8"), "iso8859-1");

The resulting string s_iso88591 actually consists of three ISO-8859-1 characters. Once these characters reach the destination, the receiving program applies the reverse operation,

String s_utf8 = new String(s_iso88591.getBytes("iso8859-1"), "UTF-8");

to get back the correct Chinese character "中". This both complies with the protocol and supports Chinese.
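The trick works because ISO-8859-1 maps each byte value 0x00 to 0xFF to exactly one character (U+0000 to U+00FF), so the bytes -> characters -> bytes trip is lossless. A minimal sketch of both ends with the Charset overloads; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Sketch of carrying UTF-8 bytes inside an ISO-8859-1-only string.
// Latin1Carrier, toLatin1Carrier and fromLatin1Carrier are hypothetical names.
class Latin1Carrier {
    // Sender side: each UTF-8 byte becomes one ISO-8859-1 character
    static String toLatin1Carrier(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Receiver side: recover the original bytes, then decode them as UTF-8
    static String fromLatin1Carrier(String carrier) {
        return new String(carrier.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String zhong = "\u4E2D"; // U+4E2D, the character from the test
        String carrier = toLatin1Carrier(zhong);
        System.out.println(carrier.length()); // prints 3
        for (char c : carrier.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c); // prints U+00E4, U+00B8, U+00AD (one per line)
        }
        System.out.println(fromLatin1Carrier(carrier).equals(zhong)); // prints true
    }
}
```

The three carrier characters are exactly the UTF-8 bytes E4 B8 AD of "中"; printed directly they show up as garbage characters rather than "中", and only the reverse step restores the original text.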



