Internally, Java uses Unicode: when a program processes characters, they are represented in Unicode, but files and streams store data as bytes. Among Java's primitive types, char holds a Unicode (UTF-16) code unit, while byte holds a single 8-bit byte. Java therefore has to convert between bytes and chars at several points, for example when reading a file, decoding a request, or writing to a database. If the wrong character encoding is chosen for one of these conversions, the text comes out garbled.
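To make the point above concrete, here is a minimal Java sketch that encodes the same text to bytes and decodes those bytes once with the right charset and once with the wrong one. It assumes the JDK provides the GBK charset, as full JDK installations normally do. CharsetMismatchDemo and roundTrip are hypothetical names used only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Encode text to bytes with one charset (char -> byte),
    // then decode those bytes with another charset (byte -> char).
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String original = "\u4f60"; // the Chinese character for "you"

        // Same charset on both sides: the text survives unchanged.
        System.out.println(roundTrip(original, gbk, gbk).equals(original)); // true

        // Mismatched charsets: the two GBK bytes (C4 E3) are read as two
        // unrelated ISO-8859-1 characters instead of one Chinese character.
        String garbled = roundTrip(original, gbk, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length()); // 2
    }
}
```

Nothing is lost in the bytes themselves here; the damage comes entirely from choosing the wrong charset at the byte-to-char step.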
The common symptoms of garbled text are:
1. Chinese characters turn into question marks ("?").
2. Some Chinese characters display correctly, while others do not.
3. Garbled characters appear (some are Chinese characters, but not the expected ones).
4. Text becomes garbled when reading from or writing to a database.
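Symptoms 1 and 3 can be reproduced in a few lines. This is a hedged sketch, not the author's code: GarbleSymptoms, questionMarks and wrongDecode are hypothetical names, and the choice of ISO-8859-1 and UTF-8 as the "wrong" charsets is only an example. String.getBytes(Charset) silently substitutes the charset's replacement byte for characters it cannot encode, which for ISO-8859-1 is '?'.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GarbleSymptoms {

    // Symptom 1: encoding Chinese text with a charset that cannot represent it.
    // getBytes replaces the unmappable character with '?' (0x3F), so the
    // original character is discarded at this point and cannot be recovered.
    static String questionMarks(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Symptom 3: bytes written as UTF-8 but read back as GBK produce
    // characters (possibly Chinese ones) that are not the original text.
    static String wrongDecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("GBK"));
    }

    public static void main(String[] args) {
        String you = "\u4f60";
        System.out.println(questionMarks(you));           // ?
        System.out.println(wrongDecode(you).equals(you)); // false
    }
}
```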
Next, we explain the cause of each problem in turn.
First, let us look at why Chinese characters turn into question marks.
In Java, conversion between byte and char is handled by the sun.io package. The usual way to obtain a byte-to-char converter is:
public static ByteToCharConverter getConverter(String encoding);
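The sun.io package is an internal JDK package rather than a public API, and it is absent from recent JDKs. A minimal sketch of the same "look up a byte-to-char converter by encoding name" idea using the public java.nio.charset API follows. StrictDecode, strictDecoder and decode are hypothetical names. Unlike the String(byte[], Charset) constructor, which silently substitutes replacement characters, a decoder configured to REPORT errors throws, so a wrong encoding choice surfaces as an exception instead of garbled text.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class StrictDecode {

    // Obtain a byte-to-char decoder for the named encoding.
    // REPORT is already the default for newDecoder(); it is set explicitly for clarity.
    static CharsetDecoder strictDecoder(String encoding) {
        return Charset.forName(encoding).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
    }

    // Decode bytes to chars, throwing instead of guessing on bad input.
    static String decode(byte[] bytes, String encoding) throws CharacterCodingException {
        CharBuffer chars = strictDecoder(encoding).decode(ByteBuffer.wrap(bytes));
        return chars.toString();
    }

    public static void main(String[] args) {
        byte[] gbkBytes = {(byte) 0xC4, (byte) 0xE3}; // GBK encoding of the character for "you"
        try {
            String s = decode(gbkBytes, "GBK");
            System.out.printf("GBK -> U+%04X%n", (int) s.charAt(0)); // GBK -> U+4F60
        } catch (CharacterCodingException e) {
            System.out.println("GBK failed: " + e);
        }
        try {
            decode(gbkBytes, "UTF-8");
            System.out.println("UTF-8 accepted (unexpected)");
        } catch (CharacterCodingException e) {
            // 0xC4 starts a two-byte UTF-8 sequence, but 0xE3 is not a valid continuation byte.
            System.out.println("UTF-8 rejected: " + e.getClass().getSimpleName());
        }
    }
}
```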
To make this concrete, let us run a small experiment. The GBK encoding of the Chinese character "你" ("you") is 0xC4E3, and its Unicode code point is \u4F60. The experiment works as follows: a page named a_gbk.jsp lets the user enter "你" and submits it to the page b_gbk.jsp. In b_gbk.jsp, we obtain the byte array of "你" using some encoding and then convert that array back to char using some encoding. If the resulting char value is 0x4F60, the conversion is correct.
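The two conversion steps of the experiment can be simulated without a web container. This sketch rests on an assumption that is not stated in the text: that the receiving page decoded the submitted GBK bytes as ISO-8859-1, which is a common servlet default when no request encoding is set. GbkExperiment and convert are hypothetical names. Because ISO-8859-1 maps every byte to exactly one char and back, taking the ISO-8859-1 bytes of the received string recovers C4 E3, and decoding those bytes as GBK yields 0x4F60.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GbkExperiment {

    // Step 1: get the byte array of `text` using one encoding.
    // Step 2: convert that byte array back to chars using another encoding.
    static String convert(String text, Charset bytesFrom, Charset charsWith) {
        return new String(text.getBytes(bytesFrom), charsWith);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        Charset latin1 = StandardCharsets.ISO_8859_1;

        // The bytes a GBK page submits for the character "you".
        byte[] submitted = "\u4f60".getBytes(gbk);
        StringBuilder hex = new StringBuilder();
        for (byte b : submitted) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // C4 E3

        // Assumption: the receiving side decoded those bytes as ISO-8859-1,
        // so it holds two chars (U+00C4, U+00E3) instead of one.
        String received = new String(submitted, latin1);

        // ISO-8859-1 bytes restore C4 E3; decoding them as GBK gives U+4F60.
        String fixed = convert(received, latin1, gbk);
        boolean correct = fixed.length() == 1 && fixed.charAt(0) == 0x4F60;
        System.out.println("correct: " + correct); // correct: true
    }
}
```

Pairing the encodings differently, for example taking GBK bytes of the already-garbled string, does not produce 0x4F60, which is exactly the failure the experiment is designed to expose.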
The code of a_gbk.jsp is as follows: