Ext.: https://www.ibm.com/developerworks/cn/java/j-lo-chinesecoding/
Several common encoding formats: why encode at all?
Have you ever wondered why we need to encode at all? Could we do without encoding? To answer that, we have to go back to how a computer represents the symbols humans can understand, that is, the languages we use. There are many human languages and far too many symbols in them to represent each one with the computer's basic storage unit, the byte, so a symbol has to be split up or translated before the computer can handle it. We can think of the computer as understanding only English: any other language has to be translated into English before it can be used on a computer, and that translation process is encoding. It follows that any country whose language is not English must encode before it can use computers. This may seem somewhat overbearing, but it is the status quo. It is much like our country's current effort to promote Chinese: if other countries all spoke Chinese and every other language were translated into Chinese, we could make the Chinese character the smallest unit of information stored in a computer, and then we would have no encoding problem.
So, in general, the reasons for encoding can be summed up as follows:
- The smallest unit of storage in a computer is the byte, which is 8 bits, so a single byte can represent only 256 distinct values (0 to 255).
- Humans need to represent far more symbols than that, so a single byte cannot cover them all.
- To resolve this contradiction, a new data type, char, is needed, and converting from char to byte requires encoding.
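The char-to-byte step in the list above can be illustrated with a minimal Java sketch. The class name `EncodingDemo` and the helper methods `encode` and `toHex` are hypothetical names introduced only for this illustration; the charsets used are standard ones from `java.nio.charset.StandardCharsets`. It shows that an ASCII letter fits in one byte, while a Chinese character needs several bytes and is lost when forced into a single-byte charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Encode a string (a sequence of chars) into bytes with the given charset.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Render bytes as unsigned hex so that values above 0x7f stay readable.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String ascii = "A";
        String chinese = "\u4e2d"; // the Chinese character U+4E2D

        // An ASCII letter fits in a single byte.
        System.out.println(toHex(encode(ascii, StandardCharsets.UTF_8)));      // 41

        // The same single char needs three bytes in UTF-8 and two in UTF-16BE.
        System.out.println(toHex(encode(chinese, StandardCharsets.UTF_8)));    // e4 b8 ad
        System.out.println(toHex(encode(chinese, StandardCharsets.UTF_16BE))); // 4e 2d

        // A single-byte charset cannot represent it; getBytes substitutes '?' (0x3f).
        System.out.println(toHex(encode(chinese, StandardCharsets.ISO_8859_1))); // 3f
    }
}
```

The choice of charset decides how many bytes each char becomes, which is why the same text can produce different byte sequences under different encodings.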
Reposted from "In-depth analysis of the Chinese encoding problem in Java"