Common Character Sets and Garbled-Text Problems

Source: Internet
Author: User

Character sets

Common character set classification

ASCII and its extended character set

Role: represents English and Western European languages.

Number of bits: ASCII uses 7 bits and can represent 128 characters; its extensions use 8 bits and can represent 256 characters.

Range: ASCII from 00 to 7F, extended from 00 to FF.

ISO-8859-1 Character Set

Role: extends ASCII to cover Western European languages. (Greek and other scripts are covered by other parts of ISO 8859, such as ISO-8859-7.)

Number of bits: 8 bits. Range: from 00 to FF. Compatible with the ASCII character set.

GB2312 Character Set

Role: the Chinese national standard character set for Simplified Chinese, compatible with ASCII.

Number of bytes: 2 bytes per character. It can represent 7,445 symbols, including 6,763 Chinese characters, which covers almost all high-frequency Chinese characters.

Range: high byte from A1 to F7, low byte from A1 to FE. The high and low bytes are obtained by adding 0xA0 to the character's row number and cell number, respectively.
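
As a minimal sketch of the 0xA0 offset described above, the following Java snippet converts a GB2312 row and cell position into its two encoded bytes. The class and method names are hypothetical and chosen only for illustration; they are not part of any library.

```java
// Illustrative sketch only: Gb2312Offset and encode are hypothetical names.
class Gb2312Offset {
    /** Converts a GB2312 row (1-94) and cell (1-94) into the two encoded byte values. */
    static int[] encode(int row, int cell) {
        if (row < 1 || row > 94 || cell < 1 || cell > 94) {
            throw new IllegalArgumentException("row and cell must be in 1..94");
        }
        // Each encoded byte is the row or cell number plus 0xA0.
        return new int[] { row + 0xA0, cell + 0xA0 };
    }

    public static void main(String[] args) {
        int[] bytes = encode(16, 1); // row 16, cell 1: the first Chinese character position
        System.out.printf("%02X %02X%n", bytes[0], bytes[1]); // prints "B0 A1"
    }
}
```

The same arithmetic explains the range: row 1 gives high byte A1, and row 87 gives F7.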

GBK Character Set

Role: an extension of GB2312 that adds support for Traditional Chinese characters; compatible with GB2312.

Number of bytes: 2 bytes per character, representing 21,886 characters.

Range: high byte from 81 to FE, low byte from 40 to FE (excluding 7F).

Unicode Character Set

Role: a unified encoding for about 650 of the world's languages; its first 256 code points match ISO-8859-1.

Encoding forms: the Unicode character set can be encoded in several ways, namely UTF-8, UTF-16, and UTF-32.
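
To make the difference between the three encoding forms concrete, here is a small Java sketch that counts how many bytes the same character occupies in each. The class and method names are assumptions made for this example. The big-endian variants are used so that no byte-order mark is added to the output.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: UnicodeEncodingForms and byteLength are hypothetical names.
class UnicodeEncodingForms {
    /** Returns how many bytes the text occupies when encoded with the given charset. */
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        Charset utf32 = Charset.forName("UTF-32BE");
        String[] samples = { "A", "\u4e2d" }; // U+0041 and U+4E2D
        for (String s : samples) {
            System.out.printf("U+%04X: UTF-8=%d, UTF-16=%d, UTF-32=%d bytes%n",
                    s.codePointAt(0),
                    byteLength(s, StandardCharsets.UTF_8),
                    byteLength(s, StandardCharsets.UTF_16BE),
                    byteLength(s, utf32));
        }
        // prints:
        // U+0041: UTF-8=1, UTF-16=2, UTF-32=4 bytes
        // U+4E2D: UTF-8=3, UTF-16=2, UTF-32=4 bytes
    }
}
```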

BIG5 Character Set

Role: a unified encoding for Traditional Chinese characters.

Number of bytes: 2 bytes per character, representing 13,053 Chinese characters.

Range: high byte from A1 to F9, low byte from 40 to 7E and from A1 to FE.

GB18030 Character Set

Role: covers the encoding of Chinese, Japanese, Korean, and other scripts, and is compatible with GBK.

Number of bytes: variable length (1 byte for ASCII, otherwise 2 or 4 bytes). It can represent 27,484 Chinese characters.

Range: 1-byte codes from 00 to 7F; 2-byte codes with high byte from 81 to FE and low byte from 40 to 7E and 80 to FE; 4-byte codes with first and third bytes from 81 to FE and second and fourth bytes from 30 to 39.
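
The ranges above are enough to tell how long each GB18030 sequence is by looking at its first two bytes. The following Java sketch illustrates that check; the class and method names are hypothetical, and it only tests the byte ranges listed here without verifying that a code is actually assigned to a character.

```java
// Illustrative sketch only: Gb18030Length and sequenceLength are hypothetical names.
class Gb18030Length {
    /**
     * Returns the length (1, 2, or 4) of the GB18030 sequence starting at offset,
     * or -1 if the bytes do not fall into any of the listed ranges.
     */
    static int sequenceLength(byte[] data, int offset) {
        int b1 = data[offset] & 0xFF;
        if (b1 <= 0x7F) {
            return 1; // single-byte ASCII
        }
        if (b1 < 0x81 || b1 > 0xFE || offset + 1 >= data.length) {
            return -1;
        }
        int b2 = data[offset + 1] & 0xFF;
        if ((b2 >= 0x40 && b2 <= 0x7E) || (b2 >= 0x80 && b2 <= 0xFE)) {
            return 2; // two-byte code
        }
        if (b2 >= 0x30 && b2 <= 0x39 && offset + 3 < data.length) {
            int b3 = data[offset + 2] & 0xFF;
            int b4 = data[offset + 3] & 0xFF;
            if (b3 >= 0x81 && b3 <= 0xFE && b4 >= 0x30 && b4 <= 0x39) {
                return 4; // four-byte code
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(sequenceLength(new byte[] { 0x41 }, 0));                                 // 1
        System.out.println(sequenceLength(new byte[] { (byte) 0xB0, (byte) 0xA1 }, 0));             // 2
        System.out.println(sequenceLength(new byte[] { (byte) 0x81, 0x30, (byte) 0x81, 0x30 }, 0)); // 4
    }
}
```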

UCS Character Set

Role: the international standard ISO 10646 defines the Universal Character Set (UCS). It is kept consistent with the Unicode standard, and UCS-2 is compatible with Unicode.

Number of bytes: it has two formats, UCS-2 and UCS-4, using 2 bytes and 4 bytes respectively.

Range: at present, the UCS-4 form of a UCS-2 character is simply the UCS-2 value with 0x0000 added in front.
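
The relationship between the two formats can be sketched in a few lines of Java. The class and method names below are assumptions made for illustration, and big-endian byte order is assumed.

```java
// Illustrative sketch only: Ucs2ToUcs4 and toUcs4 are hypothetical names.
class Ucs2ToUcs4 {
    /** Widens a 16-bit UCS-2 value to 4 big-endian UCS-4 bytes by prefixing 0x0000. */
    static byte[] toUcs4(char ucs2) {
        return new byte[] { 0x00, 0x00, (byte) (ucs2 >> 8), (byte) ucs2 };
    }

    public static void main(String[] args) {
        byte[] ucs4 = toUcs4('\u4e2d'); // UCS-2 value 4E2D
        System.out.printf("%02X %02X %02X %02X%n", ucs4[0], ucs4[1], ucs4[2], ucs4[3]); // prints "00 00 4E 2D"
    }
}
```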

Classification by language

Language                           | Character set     | Official name
English, Western European          | ASCII, ISO-8859-1 | MBCS (multi-byte)
Simplified Chinese                 | GB2312            | MBCS (multi-byte)
Traditional Chinese                | BIG5              | MBCS (multi-byte)
Simplified and Traditional Chinese | GBK               | MBCS (multi-byte)
Chinese, Japanese, and Korean      | GB18030           | MBCS (multi-byte)
All languages                      | Unicode, UCS      | DBCS (wide byte)

Conversion between encodings:

Requirement: you must know both the current encoding of the content and the encoding you want to convert it to.

Example:

    String username = request.getParameter("username").trim();
    String password = request.getParameter("password").trim();

The String variables username and password obtained this way were decoded as ISO-8859-1.

To convert them to UTF-8 so that they do not appear garbled, use code like the following:

    // request is the servlet request object already in scope
    String parameter = request.getParameter("username");
    // Get the original bytes behind the parameter
    // (getBytes(String) can throw UnsupportedEncodingException)
    byte[] temp = parameter.getBytes("ISO-8859-1");
    // Decode those bytes again, this time as UTF-8
    String param = new String(temp, "UTF-8");

Principle:

The underlying bytes of a piece of content do not change; only their interpretation does. So, to pass content between different encodings without garbling, first turn the content back into its byte sequence using the encoding it was originally read with. Then decode that byte sequence using the encoding you want, and no garbled characters appear.
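
The principle can be tried out without a servlet container. The sketch below, with hypothetical class and method names, simulates UTF-8 bytes being wrongly decoded as ISO-8859-1 and then recovered by re-encoding and decoding again.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: ReDecodeDemo and redecode are hypothetical names.
class ReDecodeDemo {
    /** Recovers text that was decoded with the wrong charset. */
    static String redecode(String garbled, Charset wrong, Charset right) {
        // Step 1: get back the original bytes. Step 2: decode them correctly.
        return new String(garbled.getBytes(wrong), right);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587";
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        // Simulate a receiver that wrongly assumes ISO-8859-1.
        String garbled = new String(wire, StandardCharsets.ISO_8859_1);
        String fixed = redecode(garbled, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(fixed.equals(original)); // prints "true"
    }
}
```

This works because ISO-8859-1 maps every byte value to a character, so no bytes are lost in the wrong decoding step. If the wrong charset discards or replaces bytes, the original content cannot be recovered this way.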

What the garbled forms mean:

?????? ---> caused by a character encoding mismatch

Ÿ?� ---> indicates that no such encoding method is available
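
As a rough illustration, the Java sketch below reproduces one common way that question marks and replacement characters arise. It is not a complete diagnosis of the two cases above, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: GarbleForms and its methods are hypothetical names.
class GarbleForms {
    /** Encodes text into ISO-8859-1; characters it cannot represent become '?'. */
    static String lossyEncode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    /** Decodes bytes as UTF-8; invalid sequences become U+FFFD. */
    static String badDecode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(lossyEncode("\u4e2d\u6587")); // prints "??"
        // D6 D0 is a two-byte GBK code, not valid UTF-8.
        String decoded = badDecode(new byte[] { (byte) 0xD6, (byte) 0xD0 });
        System.out.println(decoded.contains("\uFFFD")); // prints "true"
    }
}
```

Once text has been reduced to question marks in this way, the original bytes are gone and re-decoding cannot bring them back.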
