Java Puzzlers: Using the String Byte Array Constructor

Source: Internet
Author: User


The following code:

    import java.io.UnsupportedEncodingException;

    public class Example018 {
        public static void main(String[] args) {
            // Fill the array with every possible byte value, 0 through 255.
            byte[] bs = new byte[256];
            for (int i = 0; i < 256; i++) {
                bs[i] = (byte) i;
            }
            string(bs);               // call 1
            string(bs, "iso-8859-1"); // call 2
            string(bs, "GBK");        // call 3
            string(bs, "UTF-8");      // call 4
        }

        static void string(byte[] bs) {
            String str = new String(bs); // uses the String(byte[]) constructor
            for (int i = 0, length = bs.length; i < length; i++) {
                System.out.print((int) str.charAt(i) + " ");
            }
        }

        static void string(byte[] bs, String charset) {
            try {
                String str = new String(bs, charset); // uses the String(byte[], String) constructor
                for (int i = 0, length = bs.length; i < length; i++) {
                    System.out.print((int) str.charAt(i) + " ");
                }
            } catch (UnsupportedEncodingException e) {
                e.printStackTrace();
            }
        }
    }


Result Description:

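In the code above, only call 2 reliably produces the expected result, printing the integers 0 through 255 in order. The other three calls do not: the output of call 1 depends on the platform default character set, and calls 3 and 4 decode byte values that are not all valid in GBK or UTF-8.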


Results Analysis:

All four calls in the code above construct a String from a byte array. Call 1 uses the String(byte[]) constructor, which the API documentation describes as follows: it constructs a new String by decoding the specified byte array using the platform's default charset. The length of the new String is a function of the charset, and hence may not be equal to the length of the byte array. The behavior of this constructor when the given bytes are not valid in the default charset is unspecified. Calls 2 through 4 use String(byte[], String), which decodes with the named charset instead of the default one and carries the same caveats about length and invalid bytes.
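To make the point about length concrete, the following minimal sketch decodes the same three bytes with two different charsets. The class name LengthByCharset and the helper decodedLength are hypothetical names introduced only for this illustration. It uses the StandardCharsets constants (available since Java 7) rather than charset names, so no checked exception is involved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LengthByCharset {
    // Hypothetical helper: number of UTF-16 code units produced by decoding bs with cs.
    static int decodedLength(byte[] bs, Charset cs) {
        return new String(bs, cs).length();
    }

    public static void main(String[] args) {
        // These three bytes are the UTF-8 encoding of the single character U+4E2D.
        byte[] bs = {(byte) 0xE4, (byte) 0xB8, (byte) 0xAD};

        // Decoded as UTF-8, the three bytes form one character.
        System.out.println(decodedLength(bs, StandardCharsets.UTF_8));      // prints 1

        // Decoded as ISO-8859-1, every byte becomes its own character.
        System.out.println(decodedLength(bs, StandardCharsets.ISO_8859_1)); // prints 3
    }
}
```

This is why a loop bounded by bs.length, as in the example program, can run past the end of the decoded string; bounding the loop by str.length() avoids that.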

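ISO-8859-1, better known as Latin-1, is the only default character set under which the program above prints the integers 0 through 255 in order. It maps each byte value to the Unicode character with the same numeric value, so every byte is valid and decodes to exactly one character.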
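One way to avoid depending on the platform default is to name the charset explicitly. The sketch below is an illustration under that assumption, not the original author's code: the class DecodeAllBytes and its helpers are hypothetical names. It decodes all 256 byte values with ISO-8859-1 and bounds the loop by the length of the decoded string rather than the byte array.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeAllBytes {
    // Builds the same array as the example program: byte values 0 through 255.
    static byte[] allByteValues() {
        byte[] bs = new byte[256];
        for (int i = 0; i < 256; i++) {
            bs[i] = (byte) i;
        }
        return bs;
    }

    // Decodes bs with an explicit charset and returns the resulting char values.
    static int[] decode(byte[] bs, Charset charset) {
        String str = new String(bs, charset);
        int[] codes = new int[str.length()]; // sized by the string, not the byte array
        for (int i = 0; i < codes.length; i++) {
            codes[i] = str.charAt(i);
        }
        return codes;
    }

    public static void main(String[] args) {
        int[] latin1 = decode(allByteValues(), StandardCharsets.ISO_8859_1);
        for (int code : latin1) {
            System.out.print(code + " ");
        }
        System.out.println();
    }
}
```

With ISO-8859-1 the printed values run from 0 to 255 regardless of the platform default, because the charset is fixed in the call itself.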


Background knowledge:

Character set (charset): the combination of a coded character set and a character-encoding scheme. In other words, a charset is a collection of characters, the numeric codes that represent them, and a way to translate back and forth between a sequence of character codes and a sequence of bytes. The translation scheme differs greatly among charsets: some have a one-to-one mapping between characters and bytes, but most do not.

  • ISO-8859-1: an early single-byte encoding that extends ASCII. Each character occupies one byte, so it can represent at most 256 characters (byte values 0-255). It is used for English and other Western European languages and cannot represent Chinese.

  • GB2312/GBK: encodings designed for Chinese characters, which they represent with two bytes each. English letters are encoded in a single byte exactly as in ASCII, so both are ASCII-compatible. GBK can represent both traditional and simplified characters, while GB2312 can represent only simplified characters; GBK is compatible with GB2312.

  • Unicode: the most universal coded character set, able to represent characters in all languages. Its fixed-length encodings use two bytes per character (UCS-2) or four bytes per character (UCS-4/UTF-32), so at the byte level they are incompatible with ISO-8859-1.

  • UTF: because the fixed-length Unicode encodings are incompatible with single-byte encodings and take up more space for English text, they are inconvenient to transmit and store. UTF-8 was introduced to address this: it is compatible with ASCII and can still represent characters in all languages. UTF-8 is a variable-length encoding; the original design allowed 1-6 bytes per character, and the current standard limits it to 1-4 bytes. Its byte structure also provides a simple validity check, because lead bytes and continuation bytes have distinct bit patterns. In general, an English letter takes one byte, while a Chinese character usually takes three bytes.
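The byte counts described in the list above can be checked with a small sketch. The class EncodedSizes and the helper encodedLength are hypothetical names used only for illustration. GBK is not among the charsets every Java runtime is required to support, so the sketch checks for it before use.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedSizes {
    // Hypothetical helper: number of bytes needed to encode s in the given charset.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String latin = "A";
        String han = "\u4E2D"; // a common Chinese character, U+4E2D

        System.out.println(encodedLength(latin, StandardCharsets.ISO_8859_1)); // prints 1
        System.out.println(encodedLength(latin, StandardCharsets.UTF_8));      // prints 1
        System.out.println(encodedLength(han, StandardCharsets.UTF_8));        // prints 3

        // UTF-16BE is a two-byte form for these characters (no byte-order mark).
        System.out.println(encodedLength(latin, StandardCharsets.UTF_16BE));   // prints 2
        System.out.println(encodedLength(han, StandardCharsets.UTF_16BE));     // prints 2

        // ISO-8859-1 cannot represent U+4E2D; getBytes substitutes a single '?' byte.
        System.out.println(encodedLength(han, StandardCharsets.ISO_8859_1));   // prints 1

        // GBK support is optional, so guard the lookup.
        if (Charset.isSupported("GBK")) {
            System.out.println(encodedLength(han, Charset.forName("GBK")));    // prints 2
        }
    }
}
```

The silent substitution in the ISO-8859-1 case is worth noting: encoding a string with a charset that cannot represent its characters loses information without raising an error.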







Source code: https://github.com/rocwinger/java-disabuse


This article is from the "Winger" blog; reprinting is not permitted.
