First, a char in Java is 2 bytes. Java represents characters in Unicode, and each char is one 16-bit UTF-16 code unit. This differs from C, where a char is 1 byte on most systems and typically holds an ASCII value, a mapping between characters and the integers 0~127. Unicode is backward compatible with ASCII. In Java, a common Chinese or English character fits in a single 2-byte char (characters outside the Basic Multilingual Plane need two chars), but when a string is converted to another encoding, each character can occupy a different number of bytes.
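The difference between a char and a full character can be checked directly. The following is a minimal sketch; the class name `CharSizeDemo` and its helper methods are made up for illustration, and the supplementary character U+20000 is just one convenient example of a character outside the Basic Multilingual Plane.

```java
public class CharSizeDemo {
    // Number of UTF-16 code units (Java chars) in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Size of one Java char in bytes.
        System.out.println("bytes per char: " + Character.BYTES);

        // U+4F60 is a common Chinese character that fits in one char.
        String bmp = new String(Character.toChars(0x4F60));
        // U+20000 lies outside the BMP and needs a surrogate pair (two chars).
        String supplementary = new String(Character.toChars(0x20000));

        System.out.println("U+4F60:  chars=" + charCount(bmp)
                + " codePoints=" + codePointCount(bmp));
        System.out.println("U+20000: chars=" + charCount(supplementary)
                + " codePoints=" + codePointCount(supplementary));
    }
}
```

For text that may contain such characters, `codePointCount` is the safer way to count characters, since `length()` counts code units.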
In GBK encoding, an English letter requires 1 byte and a Chinese character requires 2 bytes. In UTF-8 encoding, an English letter requires 1 byte and a Chinese character requires 3 to 4 bytes. In UTF-16 encoding, an English letter requires 2 bytes and a Chinese character requires 2 or 4 bytes (some Chinese characters in the Unicode extension blocks require 4 bytes). In UTF-32 encoding, any character requires 4 bytes.
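These per-character sizes can be verified with a small program. This is a sketch under some assumptions: the class `BytesPerCharacter` and the helper `encodedSize` are hypothetical names, the sample characters are arbitrary, and the "UTF-32BE" and "GBK" charsets are assumed to be available in the runtime (they are in common JDK builds, but only the `StandardCharsets` constants are guaranteed). The big-endian variants are used so that no byte order mark is counted.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BytesPerCharacter {
    // Encoded size, in bytes, of one character (given as a code point).
    static int encodedSize(int codePoint, Charset charset) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // An English letter, a common Chinese character, and a Chinese
        // character from a Unicode extension block.
        int[] samples = { 'a', 0x4F60, 0x20000 };

        // Assumption: this charset exists in the running JVM.
        Charset utf32 = Charset.forName("UTF-32BE");

        for (int cp : samples) {
            System.out.printf("U+%04X  UTF-8=%d  UTF-16=%d  UTF-32=%d%n",
                    cp,
                    encodedSize(cp, StandardCharsets.UTF_8),
                    encodedSize(cp, StandardCharsets.UTF_16BE),
                    encodedSize(cp, utf32));
        }

        // GBK is optional, so check before using it.
        if (Charset.isSupported("GBK")) {
            Charset gbk = Charset.forName("GBK");
            System.out.println("GBK: a=" + encodedSize('a', gbk)
                    + "  U+4F60=" + encodedSize(0x4F60, gbk));
        }
    }
}
```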
1, my system's default encoding method is GBK, so for the string "Hello Hello",
If the length () method is called, the result will be: 7. The method returns the number of characters in a string, either Chinese or English, and is considered a character.
If you convert it to a byte array and then return the length of the byte array, the result will be: 9. Because in GBK encoding, Chinese accounts for 2 bytes, while English characters account for 1 bytes.
Running the following code confirms this:
public class Hel { public static void main (string[] args) {String str = "Hi Hello" ; int byte_len = Str.getbytes (). length; Str.length (); System.out.println ( "byte length:" + Byte_len); System.out.println ( "character length:" + Len); SYSTEM.OUT.PRINTLN ( system default encoding: "+ system.getproperty (" file.encoding "
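The output matches the values above: the byte length is 9, the character length is 7, and the system default encoding is GBK.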
2. Change the encoding to UTF-8, that is, run the following code:
Public classHel { Public Static voidMain (string[] args)throwsexception{String str= "Hi Hello"; intByte_len = Str.getbytes ("Utf-8"). length; intLen =str.length (); System.out.println ("Byte length is:" +Byte_len); System.out.println ("Character length is:" +Len); System.out.println ("System default encoding mode:" + system.getproperty ("file.encoding")); } }
For the string "Hello Hello", the resulting output is as follows:
The length of the resulting byte array is: 11. Because in Utf-8 encoding, the Chinese character takes up 3 bytes and the English character occupies 1 bytes.
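The arithmetic behind the 11 bytes can be made visible by encoding the string one character at a time. This is a minimal sketch; `Utf8Breakdown` and `utf8Sizes` are illustrative names, and the string is written with Unicode escapes (U+4F60 and U+597D are the two Chinese characters) so that the source file's own encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Breakdown {
    // UTF-8 byte count for each code point of the string, in order.
    static int[] utf8Sizes(String s) {
        return s.codePoints()
                .map(cp -> new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8).length)
                .toArray();
    }

    public static void main(String[] args) {
        // Two Chinese characters followed by "hello".
        String str = "\u4f60\u597dhello";

        int total = 0;
        for (int size : utf8Sizes(str)) {
            System.out.print(size + " ");
            total += size;
        }
        System.out.println("= " + total); // prints "3 3 1 1 1 1 1 = 11"

        // The per-character sum agrees with encoding the whole string at once.
        System.out.println(total == str.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Using `StandardCharsets.UTF_8` instead of the charset name "UTF-8" also removes the need for `throws Exception`, because that overload of `getBytes` declares no checked exception.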
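3. If the encoding is changed to UTF-16, the output is as follows:
The length of the resulting byte array is 16. In UTF-16 encoding, both the Chinese and the English characters in this string take 2 bytes each (7 × 2 = 14), and Java's "UTF-16" charset prepends a 2-byte byte order mark (BOM) when encoding, which accounts for the remaining 2 bytes.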
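The two extra bytes can be isolated by comparing the "UTF-16" charset with its fixed-byte-order variants, which write no byte order mark. A minimal sketch follows; the class `Utf16BomDemo` and the helper `startsWithBigEndianBom` are hypothetical names used only for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf16BomDemo {
    // True if the array starts with the big-endian byte order mark FE FF.
    static boolean startsWithBigEndianBom(byte[] bytes) {
        return bytes.length >= 2
                && (bytes[0] & 0xFF) == 0xFE
                && (bytes[1] & 0xFF) == 0xFF;
    }

    public static void main(String[] args) {
        // Two Chinese characters (U+4F60, U+597D) followed by "hello".
        String str = "\u4f60\u597dhello";

        byte[] withBom = str.getBytes(StandardCharsets.UTF_16);
        byte[] bigEndian = str.getBytes(StandardCharsets.UTF_16BE);
        byte[] littleEndian = str.getBytes(StandardCharsets.UTF_16LE);

        System.out.println("UTF-16:   " + withBom.length
                + " bytes, BOM present: " + startsWithBigEndianBom(withBom));
        System.out.println("UTF-16BE: " + bigEndian.length + " bytes");
        System.out.println("UTF-16LE: " + littleEndian.length + " bytes");
    }
}
```

When the byte count has to reflect only the characters themselves, `UTF_16BE` or `UTF_16LE` is the more predictable choice.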
4. If the encoding is changed to UTF-32, the output is as follows:
The length of the resulting byte array is 28, because in UTF-32 encoding every character takes 4 bytes (7 × 4 = 28).
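The four experiments can also be run in one pass. The sketch below is an illustration rather than the original code: `EncodingSummary` and `byteLengths` are made-up names, and charsets the running JVM does not support (GBK and UTF-32 are optional) are simply skipped.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingSummary {
    // Maps each supported charset name to the encoded byte length of s.
    static Map<String, Integer> byteLengths(String s, String... charsetNames) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String name : charsetNames) {
            // Skip charsets that this JVM does not provide.
            if (Charset.isSupported(name)) {
                result.put(name, s.getBytes(Charset.forName(name)).length);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two Chinese characters (U+4F60, U+597D) followed by "hello".
        String str = "\u4f60\u597dhello";

        System.out.println("character length: " + str.length());
        byteLengths(str, "GBK", "UTF-8", "UTF-16", "UTF-32")
                .forEach((name, len) -> System.out.println(name + ": " + len + " bytes"));
    }
}
```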