Java basics: character encoding
Character encoding
I. Overview
InputStreamReader
OutputStreamWriter
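A character conversion stream is a bridge between a byte stream and a character stream. InputStreamReader decodes bytes into characters, and OutputStreamWriter encodes characters into bytes, so these two classes are where encoding conversion happens.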
II. Origin of the encoding table
Computers can only recognize binary data, which originates as electrical signals.
To make computers practical to use, they had to be able to represent the text of different countries.
Each country's characters were therefore assigned numbers, one to one, and collected in a table. That table is an encoding table.
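III. Common encoding tables
Note:
1. The char type in Java uses Unicode (a char holds one 16-bit UTF-16 code unit).
2. Every byte of a UTF-8 sequence begins with an identification header (a fixed bit prefix), which makes it easy to tell where each character starts. Most Chinese characters take 3 bytes in UTF-8.
3. The Chinese code tables (such as GB2312 and GBK) are compatible with the ASCII code table, so ASCII characters, including the letters used to write pinyin, keep the same byte values.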
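The byte counts in the notes above can be checked with a small sketch. The class name EncodingLengths and the sample character are illustrative choices, and the code assumes the JDK in use ships the GBK charset (standard full JDKs do). The Chinese character is written as a Unicode escape so that the source file's own encoding does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: counts how many bytes a string needs in a given charset.
class EncodingLengths {
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JDK
        String zhong = "\u4E2D";              // the character "zhong" (middle)

        System.out.println(byteLength(zhong, StandardCharsets.UTF_8)); // 3
        System.out.println(byteLength(zhong, gbk));                    // 2
        // ASCII characters keep a single byte in both encodings.
        System.out.println(byteLength("A", StandardCharsets.UTF_8));   // 1
        System.out.println(byteLength("A", gbk));                      // 1
    }
}
```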
IV. Encoding with conversion streams
1. You can write characters out in a specified encoding.
2. You can read text data using a specified encoding.
3. The encoding is specified through the constructor. The constructors are:
InputStreamReader(InputStream in, String charsetName): creates an InputStreamReader that uses the named character set.
OutputStreamWriter(OutputStream out, String charsetName): creates an OutputStreamWriter that uses the named character set.
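A minimal sketch of these two constructors, using in-memory byte arrays instead of files so it can run anywhere. The class and method names are hypothetical, and checked exceptions are wrapped only to keep the helpers short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical demo class for the two conversion-stream constructors.
class ConversionStreamDemo {
    // Encodes text into bytes through an OutputStreamWriter with the named charset.
    static byte[] write(String text, String charsetName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, charsetName)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decodes bytes into text through an InputStreamReader with the named charset.
    static String read(byte[] data, String charsetName) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), charsetName)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u4F60\u597D"; // "ni hao" (hello), as Unicode escapes
        byte[] utf8 = write(text, "UTF-8");
        System.out.println(utf8.length);                      // 6
        System.out.println(read(utf8, "UTF-8").equals(text)); // true
        // Reading with a different charset than the one used for writing garbles the text.
        System.out.println(read(utf8, "GBK").equals(text));   // false
    }
}
```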
V. Encoding and decoding
1. Definition
Encoding: String → byte[], using byte[] getBytes(String charsetName)
Decoding: byte[] → String, using String(byte[] bytes, String charsetName)
or String(byte[] bytes, int offset, int length, String charsetName)
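The definitions above can be tried out as follows. This is only a sketch: the class name, helper names, and sample text are illustrative, and GBK is assumed to be available in the JDK.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical demo class: encode with getBytes, decode with the String constructors.
class EncodeDecodeDemo {
    // Encodes s with one charset and decodes the bytes with another.
    static String roundTrip(String s, String encodeWith, String decodeWith) {
        try {
            byte[] bytes = s.getBytes(encodeWith);
            return new String(bytes, decodeWith);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Decodes only the first `length` bytes, using the offset/length constructor.
    static String decodePrefix(byte[] bytes, int length, String charsetName) {
        try {
            return new String(bytes, 0, length, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "\u4F60\u597D"; // "ni hao" (hello)

        // Same charset on both sides: the text survives.
        System.out.println(roundTrip(text, "UTF-8", "UTF-8").equals(text)); // true
        System.out.println(roundTrip(text, "GBK", "GBK").equals(text));     // true

        // Mismatched charsets: the text is garbled.
        System.out.println(roundTrip(text, "GBK", "UTF-8").equals(text));   // false

        // The first 3 UTF-8 bytes are exactly the first character.
        byte[] utf8 = text.getBytes("UTF-8");
        System.out.println(decodePrefix(utf8, 3, "UTF-8").equals("\u4F60")); // true
    }
}
```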
2. Illustration
Note:
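1) If the text was encoded with the wrong charset, it cannot be recovered by decoding.
2) If the text was only decoded with the wrong charset, it can sometimes be recovered by encoding the garbled string again with that same charset and then decoding with the correct one. This works with ISO8859-1: it does not recognize Chinese characters, but it maps every byte to exactly one character, so the original bytes are preserved. It does not work if the first, wrong decoding used UTF-8: UTF-8 has its own multi-byte rules (it supports Chinese), so bytes that do not fit those rules are changed during decoding and the original data is lost.
3) The Tomcat server's default encoding is ISO8859-1. When data received through Tomcat is garbled, you can encode it again with ISO8859-1 and then decode it with the correct charset to restore it.
4) Opening a file in Notepad is itself a decoding step, because the data stored on the computer is binary.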
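Note 2 can be illustrated with a short sketch. The class and method names are hypothetical, the original bytes are assumed to be GBK, and GBK is assumed to be available in the JDK. The helper decodes with a wrong charset, encodes the garbled string again with that same charset, and finally decodes with the right one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: recovering text after decoding with the wrong charset.
class RecoverDemo {
    static String recover(byte[] original, Charset wrong, Charset right) {
        String garbled = new String(original, wrong); // wrong decoding
        byte[] back = garbled.getBytes(wrong);        // encode again with the same wrong charset
        return new String(back, right);               // decode with the correct charset
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JDK
        String text = "\u4F60\u597D";         // "ni hao" (hello)
        byte[] gbkBytes = text.getBytes(gbk);

        // ISO8859-1 maps each byte to one character, so the bytes survive the detour.
        System.out.println(recover(gbkBytes, StandardCharsets.ISO_8859_1, gbk).equals(text)); // true

        // UTF-8 replaces byte sequences it cannot decode, so the original bytes are lost.
        System.out.println(recover(gbkBytes, StandardCharsets.UTF_8, gbk).equals(text));      // false
    }
}
```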
3. The "联通" (China Unicom) phenomenon: type 联通 in Notepad and save the file. When the file is opened again, garbled characters are displayed.
Explanation:
UTF-8 has its own identification headers, as follows:
1 byte: 0xxxxxxx
2 bytes: 110xxxxx 10xxxxxx
3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
Binary of "联通" (China Unicom):
11000001 (starts with 110)
10101010 (starts with 10)
11001101 (starts with 110)
10101000 (starts with 10)
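The bytes of "联通" in the Chinese code table (GBK) happen to match the UTF-8 two-byte pattern (a byte starting with 110 followed by a byte starting with 10). When Notepad opens the file again, it therefore decodes the data as UTF-8, which produces garbled characters.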
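The bit patterns can be reproduced with a short sketch. The class and helper names are hypothetical, GBK is assumed to be available in the JDK, and the check only tests the two-byte header bits described above. It is not how Notepad actually detects encodings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for the "China Unicom" phenomenon.
class LiantongDemo {
    // Formats one byte as 8 binary digits.
    static String toBinary(byte b) {
        String s = Integer.toBinaryString(b & 0xFF);
        return "00000000".substring(s.length()) + s;
    }

    // True if the pair has the UTF-8 two-byte header bits: 110xxxxx 10xxxxxx.
    static boolean looksLikeTwoByteUtf8(byte lead, byte cont) {
        return (lead & 0xE0) == 0xC0 && (cont & 0xC0) == 0x80;
    }

    public static void main(String[] args) {
        String liantong = "\u8054\u901A"; // the two characters of "China Unicom"
        byte[] gbk = liantong.getBytes(Charset.forName("GBK"));

        for (byte b : gbk) {
            System.out.println(toBinary(b));
        }
        // prints 11000001, 10101010, 11001101, 10101000 (one per line)

        System.out.println(looksLikeTwoByteUtf8(gbk[0], gbk[1])
                && looksLikeTwoByteUtf8(gbk[2], gbk[3])); // true

        // Decoding the GBK bytes as UTF-8 does not give back the original text.
        String misread = new String(gbk, StandardCharsets.UTF_8);
        System.out.println(misread.equals(liantong)); // false
    }
}
```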