The problem of ASCII transcoding in Java

Source: Internet
Author: User

Question:

I develop in Java. A business requirement has me converting a byte[] whose character set is ASCII into Chinese text.
For example:
String chinastring = "你好"; // "Hello" in Chinese
byte[] cascii = chinastring.getBytes("US-ASCII");

Addition to the question:

String s1 = new String(chinastring.getBytes("US-ASCII"), "GB2312");
String s2 = new String(chinastring.getBytes("US-ASCII"), "UTF-8");
String s3 = new String(chinastring.getBytes("ISO-8859-1"), "GBK");
I have tried these and other character sets, and none of them work. The output is: ??

Reply to Looking for the countercurrent fish: It is not that I want to use ASCII. As I said earlier, this is a business requirement: it is what the other side's interface returns to me, and I am not happy about it either. So I am asking the experts here.
Reply:
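Some of the answers were a little amusing to read. The original poster started from a false premise, and several of the answers did not understand character set conversion either. The original poster also has not noticed the fatal error: "你好" ("Hello") consists of characters that need a multi-byte encoding, yet the code forces it through US-ASCII, a 7-bit single-byte encoding. US-ASCII has no representation for Chinese characters, so getBytes("US-ASCII") replaces each one with the byte for '?', and no later decoding step can bring them back.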
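
The loss described above can be seen directly with a minimal sketch. The class name AsciiLossDemo and the helper toAscii are made up for illustration; the string is written with Unicode escapes so the example does not depend on the source file's own encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiLossDemo {
    // Hypothetical helper: encodes text as US-ASCII. Characters that US-ASCII
    // cannot represent are replaced with the byte for '?' (0x3F).
    static byte[] toAscii(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String chinastring = "\u4f60\u597d"; // the two Chinese characters for "hello"
        byte[] cascii = toAscii(chinastring);

        // Both characters were replaced, so only two '?' bytes remain.
        System.out.println(Arrays.toString(cascii)); // [63, 63]

        // Decoding those bytes with any charset can only yield "??".
        System.out.println(new String(cascii, StandardCharsets.UTF_8)); // ??
    }
}
```

The two bytes printed are all that is left of the original text, which is why every decoding the original poster tried printed ??.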

String s1 = new String(chinastring.getBytes("GB2312"), "ISO-8859-1");
The "GB2312" argument can be omitted here, in which case Java uses the platform default character set (on Windows, the one configured in Control Panel; note that since Java 18 the default is UTF-8 unless overridden). Because the original text is Chinese, you have to encode it with a multi-byte encoding that can represent it, such as GB2312 or UTF-8. chinastring.getBytes("GB2312") produces a byte array, and at that point you can decode those bytes with a single-byte encoding such as ISO-8859-1, so that s1 holds one character per byte. To get back to Chinese, you must re-encode with the same character set you just decoded with, which here is ISO-8859-1:
String s2 = new String(s1.getBytes("ISO-8859-1"), "GB2312");
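
As a rough sketch of that round trip, the two steps can be wrapped in a pair of helpers. The names Latin1RoundTrip, wrap, and unwrap are invented here, and the example assumes the runtime ships the GB2312 charset, which standard JDK builds do but the Java specification does not guarantee.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Latin1RoundTrip {
    // Hypothetical helper: encodes text with its real charset, then carries
    // those bytes inside an ISO-8859-1 string (one char per byte).
    static String wrap(String text, Charset real) {
        return new String(text.getBytes(real), StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: turns the carrier back into bytes via ISO-8859-1,
    // then decodes those bytes with the real charset.
    static String unwrap(String carrier, Charset real) {
        return new String(carrier.getBytes(StandardCharsets.ISO_8859_1), real);
    }

    public static void main(String[] args) {
        Charset gb2312 = Charset.forName("GB2312"); // assumed to be available
        String chinastring = "\u4f60\u597d";        // Chinese for "hello"

        String s1 = wrap(chinastring, gb2312);  // unreadable, but lossless
        String s2 = unwrap(s1, gb2312);         // Chinese again

        System.out.println(s1.length());            // 4 (two bytes per character)
        System.out.println(s2.equals(chinastring)); // true
    }
}
```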

Now s2 is Chinese again, so printing s2 gives "你好" ("Hello"). Why use ISO-8859-1 here rather than another character set? Because other character sets lose information in this round trip. Try replacing ISO-8859-1 with UTF-8: the bytes cannot be recovered, which produces garbled text. If you replace ISO-8859-1 with GBK, it works most of the time because GBK is a superset of GB2312, but not every byte sequence converts back. It is best to use ISO-8859-1 as the intermediate character set: it maps each of the 256 possible byte values to exactly one character, so bytes in any encoding, whether GBK, GB2312, or UTF-8, convert to an ISO-8859-1 string and back without loss.

Where is this useful? Here is an example. People often write web crawlers that download pages in a variety of encodings, mainly ISO-8859-1, GBK, GB2312, and UTF-8. The page arrives as bytes from an I/O stream, and to convert it to characters, that is, text, you need to know its encoding. How? First decode the bytes as ISO-8859-1 without worrying about the real encoding, then use a regular expression to inspect the head of the page, for example:
<meta content="text/html; charset=gb2312" http-equiv="Content-Type">
Now you know the page is encoded in GB2312, and
String s2 = new String(s1.getBytes("ISO-8859-1"), "GB2312");
gives an s2 that displays the page content correctly. If you used another character set such as UTF-8 as the intermediate encoding, you could still read the charset declaration because it is plain ASCII, but
String s2 = new String(s1.getBytes("UTF-8"), "GB2312");
would produce garbled text, because the invalid byte sequences were already replaced when the page was decoded as UTF-8.
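
The crawler approach might be sketched as follows. This is only an illustration: the class PageDecoder, its decode method, and the regular expression are assumptions rather than anything from the original answer, and a real crawler would also consult the HTTP Content-Type header.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PageDecoder {
    // Hypothetical pattern: finds "charset=..." anywhere in the page,
    // ignoring case and an optional opening quote.
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([A-Za-z0-9_\\-]+)", Pattern.CASE_INSENSITIVE);

    static String decode(byte[] raw) {
        // Step 1: decode as ISO-8859-1, which never loses a byte.
        String s1 = new String(raw, StandardCharsets.ISO_8859_1);

        // Step 2: look for the declared charset in the markup.
        Matcher m = CHARSET.matcher(s1);
        if (!m.find()) {
            return s1; // nothing declared: keep the ISO-8859-1 view
        }
        try {
            Charset declared = Charset.forName(m.group(1));
            // Step 3: recover the original bytes and decode them properly.
            return new String(s1.getBytes(StandardCharsets.ISO_8859_1), declared);
        } catch (IllegalArgumentException e) {
            return s1; // illegal or unsupported charset name
        }
    }

    public static void main(String[] args) {
        String html = "<meta content=\"text/html; charset=gb2312\" "
                + "http-equiv=\"Content-Type\"><p>\u4f60\u597d</p>";
        byte[] raw = html.getBytes(Charset.forName("GB2312")); // assumed available
        System.out.println(decode(raw).contains("\u4f60\u597d")); // true
    }
}
```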

So ISO-8859-1 is the best choice for the intermediate encoding. Besides the error described above, the original poster should not use US-ASCII as the intermediate encoding either: it cannot carry Chinese text, so nothing can be converted back. Of the character sets discussed here, only ISO-8859-1 round-trips the bytes of the other encodings without loss.
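
To check this claim for yourself, a small sketch can test whether raw bytes survive a decode and re-encode through a candidate intermediate charset. The class IntermediateCheck and the method survives are hypothetical names; the four bytes are the GB2312 encoding of the Chinese word for "hello".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class IntermediateCheck {
    // Hypothetical helper: true if `raw` comes back unchanged after being
    // decoded to a String with `middle` and encoded again with `middle`.
    static boolean survives(byte[] raw, Charset middle) {
        String carrier = new String(raw, middle);
        return Arrays.equals(raw, carrier.getBytes(middle));
    }

    public static void main(String[] args) {
        // GB2312 bytes for the two Chinese characters meaning "hello".
        byte[] raw = {(byte) 0xC4, (byte) 0xE3, (byte) 0xBA, (byte) 0xC3};

        System.out.println(survives(raw, StandardCharsets.ISO_8859_1)); // true
        System.out.println(survives(raw, StandardCharsets.UTF_8));      // false
        System.out.println(survives(raw, StandardCharsets.US_ASCII));   // false
    }
}
```

UTF-8 and US-ASCII fail because these bytes are not valid input for them, so the decoder substitutes replacement characters and the original values are gone.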