UTF-32 stores every code point in 4 bytes, which guarantees that the entire UCS can be represented. However, the UCS code space does not come close to needing 32 bits (the highest code point, U+10FFFF, fits in 21 bits), so UTF-32 wastes a great deal of space. In addition, because of combining characters, the fixed width does not let you locate user-perceived characters as quickly as you might expect, so in practice UTF-32 is rarely a good choice.
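A quick way to see the 4-bytes-per-code-point layout is to encode a short string with the JDK's "UTF-32BE" charset; this is a minimal sketch that assumes the runtime ships that charset (standard OpenJDK does):

```java
import java.nio.charset.Charset;

public class Utf32Demo {
    public static void main(String[] args) {
        // "A" (U+0041) and an emoji (U+1F600) each occupy exactly 4 bytes in UTF-32.
        byte[] bytes = "A\uD83D\uDE00".getBytes(Charset.forName("UTF-32BE"));
        System.out.println(bytes.length); // 8: two code points * 4 bytes each
        for (byte b : bytes) {
            System.out.printf("%02X ", b); // 00 00 00 41 00 01 F6 00
        }
    }
}
```

Note how U+0041 still needs a full 4 bytes even though 1 byte of information is enough, which is exactly the waste described above.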
UTF-16 maps UCS code points to 16-bit integers (code units) for storage or transport. A code point needs either one or two 16-bit code units, so UTF-16 is also a variable-length encoding. In addition, UTF-16 requires the byte order to be specified. Strings in Java and C# are UTF-16 encoded, which is why their char type is 16 bits wide, the same size as short, and why a separate byte type exists to represent an 8-bit byte.
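The "one or two code units" behaviour is easy to observe from Java itself, since String is UTF-16 internally. A minimal sketch using only standard-library calls:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16Demo {
    public static void main(String[] args) {
        String bmp = "A";              // U+0041, inside the Basic Multilingual Plane
        String emoji = "\uD83D\uDE00"; // U+1F600, outside the BMP, needs a surrogate pair

        System.out.println(bmp.length());               // 1 -> one 16-bit code unit
        System.out.println(emoji.length());             // 2 -> two 16-bit code units
        System.out.println(emoji.codePointCount(0, 2)); // 1 -> but only one code point

        // Byte order matters: the same text gives different byte sequences.
        System.out.println(Arrays.toString(emoji.getBytes(StandardCharsets.UTF_16BE)));
        System.out.println(Arrays.toString(emoji.getBytes(StandardCharsets.UTF_16LE)));
    }
}
```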
UTF-8 is also a variable-length character encoding, and it is a prefix code: no legal encoding is a prefix of another, so byte boundaries are unambiguous and UTF-8 does not need a byte order to be specified. In the original design a UTF-8 sequence can be 1 to 6 bytes long, and the number of leading 1 bits in the first byte tells you how many bytes the character occupies. For example, a two-byte character is encoded as 110xxxxx 10xxxxxx, and a six-byte character as 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx, so UTF-8 could encode up to 2^31 characters. (The current specification limits sequences to 4 bytes, which is enough to cover all of Unicode up to U+10FFFF.)
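To make the bit patterns concrete, here is a hand-rolled encoder for the 1- to 4-byte forms (the range actually used today), checked against the JDK's own UTF-8 encoder; it is a sketch for illustration, not a production implementation:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Demo {
    // Encode one code point (up to U+10FFFF) using the UTF-8 patterns described above.
    static byte[] encode(int cp) {
        if (cp < 0x80) {                       // 0xxxxxxx
            return new byte[]{(byte) cp};
        } else if (cp < 0x800) {               // 110xxxxx 10xxxxxx
            return new byte[]{(byte) (0xC0 | (cp >> 6)),
                              (byte) (0x80 | (cp & 0x3F))};
        } else if (cp < 0x10000) {             // 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[]{(byte) (0xE0 | (cp >> 12)),
                              (byte) (0x80 | ((cp >> 6) & 0x3F)),
                              (byte) (0x80 | (cp & 0x3F))};
        } else {                               // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            return new byte[]{(byte) (0xF0 | (cp >> 18)),
                              (byte) (0x80 | ((cp >> 12) & 0x3F)),
                              (byte) (0x80 | ((cp >> 6) & 0x3F)),
                              (byte) (0x80 | (cp & 0x3F))};
        }
    }

    public static void main(String[] args) {
        int cp = 0x4E2D; // '中', a three-byte character in UTF-8 (E4 B8 AD)
        byte[] mine = encode(cp);
        byte[] jdk = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(mine));    // [-28, -72, -83]
        System.out.println(Arrays.equals(mine, jdk)); // true
    }
}
```

Notice that every continuation byte starts with 10, a pattern that never begins a sequence, which is what makes UTF-8 self-synchronizing.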
"Character set and character encoding" UTF-8, UTF-16, and UTF-32