Java basics: character encoding

Source: Internet
Author: User

Character encoding

I. Overview
InputStreamReader
OutputStreamWriter
The character conversion streams are the bridge between byte streams and character streams; they perform the encoding conversion.

II. Origin of the encoding table
Computers can only process binary data, which originates as electrical signals.
To make computers practical for the written languages of different countries, the characters of each language are represented by numbers.
Matching each character to a number, one to one, forms a table; this table is the encoding table.

III. Common encoding tables

Note:
1. The char type in Java uses Unicode (each char holds one UTF-16 code unit).
2. Every byte of a UTF-8 sequence begins with identifying header bits, which makes the bytes easy to tell apart. Common Chinese characters take 3 bytes in UTF-8.
3. The Chinese code tables (such as GB2312 and GBK) are compatible with the ASCII code table, so Latin letters, including those used to write pinyin, keep their ASCII values.
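
The byte counts and header bits mentioned in the notes can be checked directly. The following is a minimal sketch; the class name EncodedLengthDemo and its helper methods are hypothetical names chosen for illustration, and the Chinese character is written as a Unicode escape so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLengthDemo {
    // Number of bytes the text occupies when encoded with the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // One byte rendered as 8 binary digits, so the header bits are visible.
    static String toBinary(byte b) {
        String bits = Integer.toBinaryString(b & 0xFF);
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        String ni = "\u4F60"; // the Chinese character U+4F60, written as an escape

        System.out.println(byteLength("A", StandardCharsets.UTF_8)); // 1
        System.out.println(byteLength(ni, StandardCharsets.UTF_8));  // 3

        // Prints 11100100, 10111101, 10100000:
        // a 1110 header on the first byte, a 10 header on the following bytes.
        for (byte b : ni.getBytes(StandardCharsets.UTF_8)) {
            System.out.println(toBinary(b));
        }
    }
}
```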

IV. Encoding in conversion streams
1. Characters can be stored in a specified encoding.
2. Text data can be read using a specified encoding.
3. The encoding is specified through the constructor. The constructors are as follows:

InputStreamReader(InputStream in, String charsetName): creates an InputStreamReader that uses the specified character set
OutputStreamWriter(OutputStream out, String charsetName): creates an OutputStreamWriter that uses the specified character set
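
As an illustration of these two constructors, here is a minimal sketch that writes text through an OutputStreamWriter and reads it back through an InputStreamReader. It uses in-memory byte streams instead of files so that it is self-contained; that choice, the class name ConversionStreamDemo and its method names are assumptions made for the example, not part of the original article.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

class ConversionStreamDemo {
    // Characters -> bytes: the writer encodes with the named charset.
    static byte[] write(String text, String charsetName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charsetName)) {
            writer.write(text);
        } catch (IOException e) {
            // Also covers UnsupportedEncodingException for an unknown charset name.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Bytes -> characters: the reader decodes with the named charset.
    static String read(byte[] data, String charsetName) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charsetName)) {
            char[] buffer = new char[1024];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String original = "\u4F60\u597D"; // two Chinese characters, written as escapes
        byte[] stored = write(original, "UTF-8");
        System.out.println(stored.length);                          // 6
        System.out.println(original.equals(read(stored, "UTF-8"))); // true
    }
}
```

Reading the same bytes with a different charset name than the one used for writing is what produces garbled text.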

V. Encoding and decoding
1. Definition
Encoding: String → byte[], via byte[] getBytes(Charset)
Decoding: byte[] → String, via String(byte[], charsetName)
or String(byte[], int offset, int length, charsetName)
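
The definitions above can be sketched as follows. This example uses the overloads that take a Charset object (StandardCharsets.UTF_8) rather than a charset name, which avoids the checked UnsupportedEncodingException; the class name EncodeDecodeDemo and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodeDecodeDemo {
    // Encoding: String -> byte[]
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decoding: byte[] -> String
    static String decode(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    // Decoding only part of the array, using the offset/length constructor.
    static String decodeSlice(byte[] data, int offset, int length) {
        return new String(data, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u4F60\u597D"; // two Chinese characters, 3 UTF-8 bytes each
        byte[] data = encode(text);

        System.out.println(Arrays.toString(data));   // [-28, -67, -96, -27, -91, -67]
        System.out.println(text.equals(decode(data))); // true
        // Bytes 3..5 hold the second character only.
        System.out.println("\u597D".equals(decodeSlice(data, 3, 3))); // true
    }
}
```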
2. Illustration

Note:

1) If the text was encoded with the wrong code table, it cannot be decoded correctly.
2) Data that was wrongly decoded as ISO8859-1 can be re-encoded and decoded again, because ISO8859-1 does not recognize Chinese characters and simply maps each byte to one character, so the original bytes are preserved. If the data had been wrongly decoded as UTF-8 instead, then because UTF-8 has its own rules for Chinese characters, the data changes during decoding and is no longer the original data.
3) The Tomcat server's default encoding is ISO8859-1. When text from Tomcat is garbled, you can re-encode it with ISO8859-1 and decode it again with the correct encoding to restore the data.
4) Opening a file in Notepad is itself a decoding step, because the data is stored on the computer in binary form.
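
The difference between a wrong decode through ISO8859-1 and a wrong decode through UTF-8 can be sketched as below. The class name RoundTripDemo and its methods are hypothetical. The four sample bytes are the GBK encoding of a two-character Chinese greeting; they are hardcoded so the sketch does not assume the GBK charset is installed in the Java runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripDemo {
    // Decode with a (possibly wrong) charset, then encode again with the same charset.
    static byte[] roundTrip(byte[] original, Charset charset) {
        String decoded = new String(original, charset);
        return decoded.getBytes(charset);
    }

    // Undo a wrong decode: get the bytes back, then decode them correctly.
    static String recover(String garbled, Charset wrong, Charset correct) {
        return new String(garbled.getBytes(wrong), correct);
    }

    public static void main(String[] args) {
        // GBK bytes of a two-character Chinese string (C4 E3 BA C3).
        byte[] gbkBytes = {(byte) 0xC4, (byte) 0xE3, (byte) 0xBA, (byte) 0xC3};

        // ISO-8859-1 maps every byte to one character, so nothing is lost.
        byte[] viaLatin1 = roundTrip(gbkBytes, StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(gbkBytes, viaLatin1)); // true

        // These bytes are not valid UTF-8, so decoding replaces them and the original is lost.
        byte[] viaUtf8 = roundTrip(gbkBytes, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(gbkBytes, viaUtf8)); // false

        // The Tomcat-style repair: text wrongly decoded as ISO-8859-1 is restored.
        String text = "\u4F60\u597D";
        String garbled = new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        String restored = recover(garbled, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(text.equals(restored)); // true
    }
}
```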

3. The "联通" (China Unicom) phenomenon: type "联通" in Notepad and save the file; when the file is opened again, garbled characters are displayed.

Symptom explanation:
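UTF-8 marks each byte with its own identification header bits, as follows:
0xxxxxxx (1-byte character)
110xxxxx 10xxxxxx (2-byte character)
1110xxxx 10xxxxxx 10xxxxxx (3-byte character)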

Binary of "联通" (GBK bytes C1 AA CD A8; the leading 110 and 10 bits are the ones that match the UTF-8 headers):
11000001
10101010
11001101
10101000

The binary code of "联通" happens to match the UTF-8 header pattern, so when Notepad opens the file again it decodes the bytes as UTF-8, which produces the garbled characters.
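
The coincidence can be reproduced with a short sketch. The class name UnicomDemo and its methods are hypothetical, and the shape check is only a rough stand-in for the kind of header-based guess the article attributes to Notepad. The GBK bytes are hardcoded so the example does not assume the GBK charset is available.

```java
import java.nio.charset.StandardCharsets;

class UnicomDemo {
    // GBK bytes of the two characters U+8054 U+901A (C1 AA CD A8).
    static final byte[] GBK_BYTES = {(byte) 0xC1, (byte) 0xAA, (byte) 0xCD, (byte) 0xA8};

    // One byte rendered as 8 binary digits.
    static String toBinary(byte b) {
        String bits = Integer.toBinaryString(b & 0xFF);
        return String.format("%8s", bits).replace(' ', '0');
    }

    // True if the pair has the 2-byte UTF-8 shape 110xxxxx 10xxxxxx.
    static boolean hasTwoByteUtf8Shape(byte first, byte second) {
        return (first & 0xE0) == 0xC0 && (second & 0xC0) == 0x80;
    }

    public static void main(String[] args) {
        for (byte b : GBK_BYTES) {
            System.out.println(toBinary(b)); // 11000001, 10101010, 11001101, 10101000
        }
        System.out.println(hasTwoByteUtf8Shape(GBK_BYTES[0], GBK_BYTES[1])); // true
        System.out.println(hasTwoByteUtf8Shape(GBK_BYTES[2], GBK_BYTES[3])); // true

        // Decoding the GBK bytes as UTF-8 does not give back the original text.
        String decoded = new String(GBK_BYTES, StandardCharsets.UTF_8);
        System.out.println("\u8054\u901A".equals(decoded)); // false
    }
}
```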
