One: Initializing a char
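char is a reserved keyword in Java. Unlike C and C++, where a char is usually 8 bits, a Java char is 16 bits wide because Java uses Unicode. ASCII, a 7-bit code covering the values 0 to 127, is included in Unicode as its first 128 characters, so those values mean the same thing in both.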
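A small illustration of both points: a char's numeric value for an ASCII letter is the same as its ASCII code, and the char type spans 0 to 65535. The class name CharRange and the helper codeOf are hypothetical, chosen only for this sketch.

```java
// Illustration: char values 0-127 coincide with ASCII, and char is an
// unsigned 16-bit type ranging from 0 to 65535.
class CharRange {
    // Returns the numeric code of a char (a widening char-to-int conversion).
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A'));                 // 65, same as ASCII
        System.out.println(codeOf('z'));                 // 122, same as ASCII
        System.out.println((int) Character.MIN_VALUE);   // 0
        System.out.println((int) Character.MAX_VALUE);   // 65535
        System.out.println(Character.SIZE);              // 16 (bits)
    }
}
```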
Java uses Unicode because Java programs, originally applets, were meant to run anywhere in the world, which requires a character encoding that can represent all human languages. For English, Spanish, German, French and similar languages, however, a single-byte encoding such as ASCII would be more compact. There is a trade-off between the two: universal coverage versus space efficiency.
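To put a number on the trade-off, the sketch below encodes the same English text with a one-byte-per-character charset and with a 16-bit Unicode encoding. It uses UTF_16BE rather than UTF_16 only because UTF_16 prepends a 2-byte byte-order mark, which would muddy the comparison. UTF-8 is included as a point of reference: it is a Unicode encoding that still uses one byte per ASCII character. The class and method names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Compares how many bytes the same English text takes in different encodings.
class EncodingSize {
    // Number of bytes needed to encode text in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "Hello";
        System.out.println("US-ASCII: " + encodedLength(text, StandardCharsets.US_ASCII)); // 5
        System.out.println("UTF-16BE: " + encodedLength(text, StandardCharsets.UTF_16BE)); // 10
        System.out.println("UTF-8:    " + encodedLength(text, StandardCharsets.UTF_8));    // 5
    }
}
```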
Because a char is a 16-bit Unicode value, it can be initialized in the following ways:
char c = 'C'; a character literal. It can also be a Chinese character such as '中', because a char holds a Unicode value.
char c = 67; (or 0103 in octal, 0x43 in hexadecimal): any integer constant in the range 0 to 65535 can be assigned directly.
char c = '\u0043'; a Unicode escape that gives the character's code as four hexadecimal digits. For example, char c = '\u0000'; is the null character (the string terminator in C), whose ASCII code is 0, so it means the same as char c = 0;
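The initialization forms listed above can be checked side by side. The sketch writes the character 'C' (code 67, which is 0103 in octal and 0x43 in hexadecimal) five different ways and confirms that the null-character escape equals 0. The Chinese character is written as an escape (U+4E2D) so the example does not depend on the encoding the source file is saved in. The class and method names are hypothetical.

```java
// Five equivalent ways to initialize a char, plus two escape examples.
class CharInit {
    // Returns the character 'C' (code 67) written five different ways.
    static char[] sameCharFiveWays() {
        char fromLiteral = 'C';      // character literal
        char fromDecimal = 67;       // decimal int constant (must fit 0..65535)
        char fromOctal = 0103;       // octal: 1*64 + 0*8 + 3 = 67
        char fromHex = 0x43;         // hexadecimal: 4*16 + 3 = 67
        char fromEscape = '\u0043';  // Unicode escape with four hex digits
        return new char[] {fromLiteral, fromDecimal, fromOctal, fromHex, fromEscape};
    }

    public static void main(String[] args) {
        for (char c : sameCharFiveWays()) {
            System.out.print(c);                 // prints CCCCC in total
        }
        System.out.println();

        char chinese = '\u4E2D';                 // U+4E2D, the Chinese character for "middle"
        System.out.println((int) chinese);       // 20013

        char nul = '\u0000';                     // the null character
        System.out.println(nul == 0);            // true: same as char nul = 0;
    }
}
```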
Two: How many bytes does a char occupy?
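1: A byte and a bit are different units;
2: 1 byte = 8 bits;
A char in Java is 2 bytes: Java represents each char as one 16-bit (2-byte) UTF-16 code unit. Strictly speaking, characters outside the Basic Multilingual Plane (code points U+10000 and above, such as many emoji) need two chars, called a surrogate pair.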
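The following sketch shows the 2-byte size directly and the difference between a char and a character: "中" (U+4E2D) fits in one char, while U+1F600 (an emoji, written here as its surrogate pair) needs two chars but counts as one code point. The class name CharSize and its two helpers are illustrative only.

```java
// Shows the size of a char and how characters map onto chars.
class CharSize {
    // Number of char values (UTF-16 code units) in the string.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points (characters) in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(Character.BYTES);      // 2 (constant added in Java 8)

        String middle = "\u4E2D";                 // U+4E2D, inside the BMP
        System.out.println(utf16Units(middle));   // 1

        String grin = "\uD83D\uDE00";             // U+1F600, outside the BMP
        System.out.println(utf16Units(grin));     // 2: a surrogate pair
        System.out.println(codePoints(grin));     // 1
    }
}
```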
The example code is as follows:
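```java
import java.io.UnsupportedEncodingException;

public class Test {
    public static void main(String[] args) {
        String str = "中";   // the Chinese character for "middle"
        char x = '中';
        byte[] bytes = null;
        byte[] bytes1 = null;
        try {
            // Encode the string as UTF-8 (compile this file as UTF-8, e.g. javac -encoding UTF-8).
            bytes = str.getBytes("utf-8");
            // Split the 16-bit char into its two bytes.
            bytes1 = charToByte(x);
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported, so this should not happen.
            e.printStackTrace();
        }
        System.out.println("Bytes Size: " + bytes.length);
        System.out.println("Bytes1 Size: " + bytes1.length);
    }

    // Converts a char to a 2-byte array, high byte first (big-endian).
    public static byte[] charToByte(char c) {
        byte[] b = new byte[2];
        b[0] = (byte) ((c & 0xFF00) >> 8);
        b[1] = (byte) (c & 0xFF);
        return b;
    }
}
```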
Output:
Bytes Size: 3
Bytes1 Size: 2
Java represents characters in Unicode, and the Chinese character "中" (U+4E2D, meaning "middle") fits in a single 16-bit char, so charToByte returns 2 bytes.
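To see that those 2 bytes really hold the character's Unicode value, the sketch below repeats the charToByte logic from the example and adds a hypothetical inverse, byteToChar, that rebuilds the char. The "& 0xFF" masks are needed because Java's byte type is signed; without them a byte such as 0xAD would sign-extend to a negative int. The last check confirms that charToByte produces the same bytes as big-endian UTF-16.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Round-trips a char through two bytes, high byte first.
class CharBytes {
    // Same logic as charToByte in the example: high byte, then low byte.
    static byte[] charToByte(char c) {
        byte[] b = new byte[2];
        b[0] = (byte) ((c & 0xFF00) >> 8);
        b[1] = (byte) (c & 0xFF);
        return b;
    }

    // Hypothetical inverse helper: rebuilds a char from two big-endian bytes.
    // "& 0xFF" removes sign extension, since Java bytes are signed.
    static char byteToChar(byte[] b) {
        return (char) (((b[0] & 0xFF) << 8) | (b[1] & 0xFF));
    }

    public static void main(String[] args) {
        char x = '\u4E2D';                                  // the character for "middle"
        byte[] b = charToByte(x);
        System.out.printf("%02X %02X%n", b[0], b[1]);       // 4E 2D
        System.out.println(byteToChar(b) == x);             // true
        // The two bytes match the UTF-16BE encoding of the same character.
        byte[] utf16 = String.valueOf(x).getBytes(StandardCharsets.UTF_16BE);
        System.out.println(Arrays.equals(b, utf16));        // true
    }
}
```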
String.getBytes(charsetName) returns the string's bytes in the specified encoding. For a common Chinese character such as "中", GBK/GB2312 uses 2 bytes and UTF-8 uses 3 bytes, which is why bytes.length is 3 in the example.
If no encoding is specified, getBytes() uses the platform's default charset (UTF-8 by default since JDK 18).
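The encoding-dependent lengths can be compared directly with getBytes(Charset). GBK ships with standard JDK builds but is not one of the charsets every Java implementation must support, so this sketch checks Charset.isSupported before using it. The default-charset line prints whatever the current platform uses, so its result will vary. The class and method names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Compares the byte length of one Chinese character in several encodings.
class CharsetLengths {
    // Number of bytes the string occupies in the given charset.
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "\u4E2D";                                   // the character for "middle"
        System.out.println("UTF-8:    " + byteLength(s, StandardCharsets.UTF_8));    // 3
        System.out.println("UTF-16BE: " + byteLength(s, StandardCharsets.UTF_16BE)); // 2

        // GBK is an optional charset, so check availability first.
        if (Charset.isSupported("GBK")) {
            System.out.println("GBK:      " + byteLength(s, Charset.forName("GBK")));
        }

        // No charset argument: the platform default is used.
        System.out.println("default (" + Charset.defaultCharset() + "): " + s.getBytes().length);
    }
}
```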