http://blog.csdn.net/thl789/article/details/7506133
https://zhuanlan.zhihu.com/p/23654187?refer=dreawer
http://www.ruanyifeng.com/blog/2007/10/ascii_unicode_and_utf-8.html
UTF-8
UTF-8 (8-bit Unicode Transformation Format) is a variable-length character encoding for Unicode that encodes each character with one to four bytes:
The 128 ASCII characters, in the Unicode range U+0000..U+007F, need only one byte;
Characters in the Unicode range U+0080..U+07FF need two bytes;
The remaining BMP characters, in the range U+0800..U+FFFF, which include most commonly used characters, need three bytes;
Characters in the Unicode supplementary planes (other rarely used characters) need four bytes.
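The four ranges above can be checked with a short Python sketch. The function name utf8_length and the sample characters are illustrative choices, not part of the original notes; the result is compared against Python's built-in str.encode.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 uses for one Unicode code point."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point <= 0x7F:      # ASCII
        return 1
    if code_point <= 0x7FF:     # U+0080..U+07FF
        return 2
    if code_point <= 0xFFFF:    # rest of the BMP
        return 3
    return 4                    # supplementary planes


# One sample per range: "A", e with acute accent, a CJK character, an emoji.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

for ch in samples:
    predicted = utf8_length(ord(ch))
    actual = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}: predicted {predicted} byte(s), actual {actual}")
```

For each sample the predicted length should match the length of the bytes that str.encode produces.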
For the fourth category above, four bytes per character may seem expensive. But UTF-8 represents every commonly used character in at most three bytes, UTF-16 also needs four bytes for characters in that fourth category, and for text that is mostly ASCII, UTF-8 saves a great deal of storage space. UTF-8 has gradually become the preferred encoding for e-mail, web pages, and other applications that store or transmit text. The Internet Engineering Task Force (IETF) requires all Internet protocols to support UTF-8. The Internet Mail Consortium (IMC) recommends that all e-mail software support UTF-8.
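The storage comparison with UTF-16 can be illustrated with a small sketch. The helper encoded_sizes and the sample strings are hypothetical, chosen only for this illustration; "utf-16-le" is used so that no byte order mark is counted.

```python
def encoded_sizes(text: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for text, without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


ascii_text = "Hello, world"       # 12 ASCII characters
cjk_text = "\u4e2d\u6587"         # two BMP characters outside ASCII
supplementary = "\U0001F600"      # one supplementary-plane character

for label, text in [("ASCII", ascii_text),
                    ("CJK", cjk_text),
                    ("supplementary", supplementary)]:
    utf8_size, utf16_size = encoded_sizes(text)
    print(f"{label}: UTF-8 {utf8_size} bytes, UTF-16 {utf16_size} bytes")
```

Under these assumptions the ASCII sample takes half the space in UTF-8, and the supplementary-plane character takes four bytes in both encodings. The BMP sample shows the trade-off: three bytes per character in UTF-8 against two in UTF-16.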