Java character encoding basics in detail

Source: Internet
Author: User

1. What is character encoding?

A character is the general term for written signs and symbols, including letters, graphic symbols, mathematical symbols and so on. A collection of abstract characters is a character set (Charset). Character sets exist to make information easier to transmit and store. Character sets in common use include ASCII, ISO 8859-1, Unicode and GB2312.

2. What are the characteristics of the various character sets?

ASCII:

ASCII (American Standard Code for Information Interchange) is a computer coding system based on the Latin alphabet.

Contents: control characters (carriage return, backspace, line feed) and printable characters (upper and lowercase English letters, Arabic numerals and punctuation symbols).

Technical features: 7 bits represent one character, for a total of 128 characters.

Shortcomings: It covers only English. The symbols of Western European, East Asian and Latin American languages cannot be represented.
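The 7-bit limit can be observed directly in Java. The following is a minimal sketch, not a definitive implementation; the class name `AsciiLimitDemo` and its helper methods are made up for illustration. It relies on the documented behavior of `String.getBytes(Charset)`, which substitutes the charset's replacement byte for characters it cannot map.

```java
import java.nio.charset.StandardCharsets;

public class AsciiLimitDemo {

    // Encodes text as US-ASCII. Characters outside the 128-character
    // repertoire are replaced by the charset's replacement byte, '?'.
    public static byte[] toAscii(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    // True when every char of the text fits in the 7-bit range 0..127.
    public static boolean isPureAscii(String text) {
        return text.chars().allMatch(c -> c < 128);
    }

    public static void main(String[] args) {
        String accented = "caf\u00e9"; // "cafe" with an acute accent on the e

        System.out.println(isPureAscii("Java"));   // true
        System.out.println(isPureAscii(accented)); // false

        // The accented letter cannot be expressed, so it is lost.
        byte[] bytes = toAscii(accented);
        System.out.println((char) bytes[3]);       // ?
    }
}
```

The lost character cannot be recovered from the bytes afterwards, which is why the choice of character set matters before any text is written out.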

ISO 8859-1:

ISO 8859-1, officially ISO/IEC 8859-1:1998 and also known as Latin-1 or "Western European", is the first part of the ISO/IEC 8859 series of 8-bit character sets.

It is based on ASCII and adds 96 letters and symbols in the previously unused range 0xA0-0xFF, for languages that write the Latin alphabet with additional diacritical marks. An earlier edition, ISO 8859-1:1987, was also published.

Contents: everything in ASCII, plus the characters used by a number of Western European languages.

Technical features: 8 bits represent one character.
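A short Java sketch of the one-byte-per-character property follows. The class name `Latin1Demo` and its method are illustrative assumptions; only `StandardCharsets.ISO_8859_1` and `String.getBytes(Charset)` come from the JDK.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Demo {

    // Encodes text as ISO 8859-1, which uses exactly one byte per character.
    public static byte[] toLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String accented = "caf\u00e9"; // e with acute accent is U+00E9

        byte[] bytes = toLatin1(accented);
        System.out.println(bytes.length);                   // 4
        System.out.printf("0x%02X%n", bytes[3] & 0xFF);     // 0xE9

        // Decoding with the same charset restores the original text.
        String back = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(back.equals(accented));          // true

        // A Chinese character is outside the repertoire and becomes '?'.
        byte[] lost = toLatin1("\u4e2d");
        System.out.println((char) lost[0]);                 // ?
    }
}
```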

Unicode:

Unicode is a character coding system developed by an organization called the Unicode Consortium and kept in step with the Universal Multiple-Octet Coded Character Set (UCS) of ISO/IEC 10646. It supports the interchange, processing and display of written text in the languages of the world today. Development began in 1990 and the first version of the standard was published in 1991. When this article was written, the latest version was Unicode 4.1.0, released on March 31, 2005.

Technical features: Unicode was originally designed as a 16-bit encoding in which each character occupies 2 bytes. Since Unicode 2.0 the code space extends to U+10FFFF, and a character above U+FFFF needs two 16-bit units (a surrogate pair) in UTF-16. The Unicode code of a character is fixed. In actual transmission, however, different platforms are not necessarily designed the same way, and space needs to be saved, so Unicode is implemented in more than one way. An implementation of Unicode is called a Unicode Transformation Format (UTF). If a file contains only 7-bit ASCII characters, transmitting each of them as 2 bytes wastes a great deal of space. In that case UTF-8 can be used. It is a variable-length encoding that still represents each 7-bit ASCII character in a single byte whose highest bit is 0. Other characters are converted by a fixed algorithm into multi-byte sequences: 2 or 3 bytes for characters up to U+FFFF and 4 bytes for characters above it. In those sequences the highest bit of every byte is 1, and the leading bits show whether a byte starts a sequence or continues one.
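The space trade-off between the transformation formats can be sketched in Java as follows. The class name `UtfLengthDemo` and its methods are hypothetical; `UTF_16BE` is chosen here only so that no byte order mark is added to the count.

```java
import java.nio.charset.StandardCharsets;

public class UtfLengthDemo {

    // Number of bytes the text occupies in UTF-8.
    public static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the text occupies in UTF-16 (big-endian, no BOM).
    public static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // True when the first UTF-8 byte has its highest bit clear,
    // which is the case only for 7-bit ASCII characters.
    public static boolean firstByteHighBitClear(String text) {
        return (text.getBytes(StandardCharsets.UTF_8)[0] & 0x80) == 0;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // U+0041, ASCII letter
            "\u00e9",       // U+00E9, Latin small e with acute
            "\u4e2d",       // U+4E2D, a Chinese character
            "\uD83D\uDE00"  // U+1F600, written as a surrogate pair
        };
        for (String s : samples) {
            System.out.printf("U+%04X  UTF-8: %d bytes  UTF-16: %d bytes  high bit clear: %b%n",
                    s.codePointAt(0), utf8Length(s), utf16Length(s),
                    firstByteHighBitClear(s));
        }
    }
}
```

For the four samples the UTF-8 lengths are 1, 2, 3 and 4 bytes, and the UTF-16 lengths are 2, 2, 2 and 4 bytes, so UTF-8 saves space for ASCII-heavy text while UTF-16 is more compact for most Chinese text.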

GB2312:

GB 2312, or GB 2312-80, is the Chinese national standard character set for Simplified Chinese. Its full name is "Chinese Character Encoding Character Set for Information Interchange, Basic Set", and it is also known as GB0. It was issued by the national standards administration of China and took effect on May 1, 1981. GB 2312 is used in mainland China and also in Singapore and other regions. Almost all Chinese-language systems and internationalized software in mainland China support GB 2312.

Contents: 6,763 Chinese characters, of which 3,755 are level-1 characters and 3,008 are level-2 characters, plus 682 other characters including Latin letters, Greek letters, Japanese hiragana and katakana, and Russian Cyrillic letters.

Technical features: Each Chinese character or symbol is represented by two bytes. The first byte is called the "high byte" and the second the "low byte". The high byte uses 0xA1-0xF7 (the row number 01-87 plus 0xA0), and the low byte uses 0xA1-0xFE (the cell number 01-94 plus 0xA0). Since the Chinese characters start at row 16, their high byte range is 0xB0-0xF7 and their low byte range is 0xA1-0xFE, which gives 72*94=6768 code positions. 5 of these positions, D7FA-D7FE, are vacant.
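The byte ranges and the code position count above can be checked with a little arithmetic. The sketch below assumes the usual GB 2312 layout in which a character's row and cell numbers (each 1 to 94) are offset by 0xA0 to form the high and low byte; the class `Gb2312Layout` and its methods are illustrative names, not part of any library.

```java
public class Gb2312Layout {

    private static final int OFFSET = 0xA0;

    // Converts a row and cell number (each 1..94) into the two byte
    // values {high, low} by adding 0xA0 to each. Assumed layout.
    public static int[] toBytes(int row, int cell) {
        if (row < 1 || row > 94 || cell < 1 || cell > 94) {
            throw new IllegalArgumentException("row and cell must be in 1..94");
        }
        return new int[] { row + OFFSET, cell + OFFSET };
    }

    // Code positions in the Chinese character area:
    // high byte 0xB0..0xF7 times low byte 0xA1..0xFE.
    public static int hanziCodePositions() {
        int highBytes = 0xF7 - 0xB0 + 1; // 72 rows
        int lowBytes = 0xFE - 0xA1 + 1;  // 94 cells per row
        return highBytes * lowBytes;
    }

    public static void main(String[] args) {
        // First position of the Chinese character area: row 16, cell 1.
        int[] first = toBytes(16, 1);
        System.out.printf("%02X %02X%n", first[0], first[1]); // B0 A1

        int positions = hanziCodePositions();
        System.out.println(positions);     // 6768
        // Subtracting the 5 vacant positions D7FA-D7FE leaves the
        // 6,763 Chinese characters listed in the contents.
        System.out.println(positions - 5); // 6763
    }
}
```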
