Several encoding methods

Source: Internet
Author: User

UTF-16 is a character encoding form, the third layer of the five-layer Unicode character encoding model. It maps each abstract code point of the Unicode character set to a sequence of 16-bit integers (code units) for data storage or transmission. A Unicode code point requires one or two 16-bit code units, so UTF-16 is a variable-length encoding.
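
The one-or-two code unit rule can be checked with a minimal sketch. The helper name utf16_code_units is made up for this illustration, and the three sample characters are arbitrary choices.

```python
def utf16_code_units(ch: str) -> int:
    """Return how many 16-bit code units UTF-16 needs for one character."""
    # "utf-16-le" is used because plain "utf-16" prepends a 2-byte byte order mark.
    return len(ch.encode("utf-16-le")) // 2

# A Latin letter, a Chinese character, and an emoji outside the 16-bit range.
for ch in ["A", "中", "😀"]:
    print(f"U+{ord(ch):04X} needs {utf16_code_units(ch)} code unit(s)")
```

Characters above U+FFFF, such as the emoji here, are the ones that take two code units.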

The advantage of UTF-16 over UTF-8 is that most characters in common use (those in the Basic Multilingual Plane) are stored in a fixed 2 bytes. The drawback is that UTF-16 is not compatible with ASCII.
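
A short sketch of both points, assuming Python's built-in codecs; the variable names are illustrative only.

```python
ascii_bytes = "A".encode("ascii")
utf8_bytes = "A".encode("utf-8")
utf16_bytes = "A".encode("utf-16-le")  # little-endian, no byte order mark

print(utf8_bytes == ascii_bytes)   # UTF-8 keeps ASCII bytes unchanged
print(utf16_bytes == ascii_bytes)  # UTF-16 pads with a zero byte, so it differs

# Two Chinese characters: 2 bytes each in UTF-16, 3 bytes each in UTF-8.
chinese = "中文"
utf16_size = len(chinese.encode("utf-16-le"))
utf8_size = len(chinese.encode("utf-8"))
print(utf16_size, utf8_size)
```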

Unicode (also called Uniform Code, Universal Code, or Single Code) is an industry standard in computing that covers a character set, encoding schemes, and more. It was created to overcome the limitations of traditional character encoding schemes: it assigns a uniform, unique code to every character in every language, to meet the needs of cross-language, cross-platform text conversion and processing.

GB2312 is a Chinese national standard for encoding Chinese characters; it can also be described as the Simplified Chinese character set encoding.

GBK is an extension of GB2312. Besides remaining compatible with GB2312, it can also represent Traditional Chinese characters, and it covers Japanese kana as well.
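
The relationship can be probed with Python's built-in "gb2312" and "gbk" codecs. This is only a sketch: can_encode is a helper invented for this example, and the sample characters are arbitrary.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

simplified, traditional, kana = "龙", "龍", "あ"

# A GB2312 character keeps the same bytes in GBK (backward compatibility).
same_bytes = simplified.encode("gb2312") == simplified.encode("gbk")
print(same_bytes)

for ch in (simplified, traditional, kana):
    print(ch, "gb2312:", can_encode(ch, "gb2312"), "gbk:", can_encode(ch, "gbk"))
```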

The difference between UTF-8 and GBK:

GBK represents each Chinese character with two bytes. To distinguish Chinese characters from single-byte ASCII characters, the highest bit of the first byte is set to 1.
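
A small sketch of the byte layout, using Python's "gbk" codec. It inspects only the first (lead) byte; the variable names are illustrative.

```python
encoded = "中".encode("gbk")
print(encoded.hex())  # two bytes, shown as four hex digits

lead_byte = encoded[0]
high_bit_set = bool(lead_byte & 0x80)  # 0x80 masks the highest bit
print(high_bit_set)

# ASCII characters stay as a single byte with the high bit clear.
ascii_encoded = "A".encode("gbk")
print(ascii_encoded, bool(ascii_encoded[0] & 0x80))
```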

UTF-8 is a multi-byte encoding designed for international text. It encodes English (ASCII) characters in 8 bits (one byte) and most Chinese characters in 24 bits (three bytes). This makes UTF-8 compact for text that is mostly English, which is why forums with mainly English content often use it to save space.
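
To make the byte counts concrete, here is a rough comparison. The function encoded_sizes is a hypothetical helper and the sample strings are arbitrary.

```python
def encoded_sizes(text: str) -> dict:
    """Return the encoded length in bytes of text under UTF-8 and GBK."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "gbk")}

english = encoded_sizes("hello")  # 5 ASCII letters
chinese = encoded_sizes("你好")    # 2 Chinese characters

print("hello:", english)
print("你好:", chinese)
```

In this sketch the English text costs one byte per letter in both encodings, while each Chinese character costs three bytes in UTF-8 and two in GBK.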

GBK covers the Chinese characters in common use, while UTF-8 can encode every character in Unicode, including the characters used by all countries in the world.

UTF-8 encoded text can be displayed in browsers in any country, as long as the browser supports the UTF-8 character set.
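
The difference in coverage can be illustrated as follows. The sample characters and the encodable helper are assumptions made for this sketch.

```python
def encodable(ch: str, encoding: str) -> bool:
    """Return True if ch can be represented in the given encoding."""
    try:
        ch.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

samples = {"Chinese": "汉", "Korean": "한", "Arabic": "ع", "Emoji": "😀"}

# For each sample: (fits in GBK?, fits in UTF-8?)
coverage = {
    name: (encodable(ch, "gbk"), encodable(ch, "utf-8"))
    for name, ch in samples.items()
}
for name, (in_gbk, in_utf8) in coverage.items():
    print(f"{name}: gbk={in_gbk} utf-8={in_utf8}")
```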

Anyone who has used Word has seen this: when Word opens a text file and detects that the encoding is not the system default, it asks the user to choose one. It also recommends an encoding, but the recommendation is only a guess and can be wrong, so the user makes the final decision about which encoding to display the file with.
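
The guessing described above can be imitated very roughly by trying candidate encodings in order. This is not how Word does it; guess_encoding and its candidate list are hypothetical, and real detectors use more elaborate statistics.

```python
from typing import Optional, Sequence

def guess_encoding(data: bytes,
                   candidates: Sequence[str] = ("utf-8", "gbk")) -> Optional[str]:
    """Return the first candidate encoding that decodes data without error."""
    for enc in candidates:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        return enc
    return None

print(guess_encoding("中文".encode("utf-8")))
print(guess_encoding("中文".encode("gbk")))
# Pure ASCII bytes are valid in both encodings, so the answer is only a guess.
print(guess_encoding(b"hello"))
```

The last call shows why such a guess may be wrong: when the bytes are valid in several encodings, the result depends only on the order of the candidates.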

CSV file: CSV (comma-separated values) is a plain-text format that separates fields with commas. Opened in Notepad, a line looks like "A","B","C".

Benefits: CSV files are easy to import into spreadsheets and databases. Each row represents one record, so data exported from one database can be imported elsewhere in bulk.
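
As an illustration of the one-row-per-record layout, the following sketch parses a small CSV text with Python's standard csv module. The sample data is invented.

```python
import csv
import io

# Two records in the quoted style shown above.
raw = '"A","B","C"\n"1","2","3"\n'

# csv.reader yields one list of fields per row and strips the quotes.
rows = list(csv.reader(io.StringIO(raw)))
for row in rows:
    print(row)
```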

Half-width comma: ,
Full-width comma: ，

The two look similar, but a half-width comma occupies only half the width of a Chinese character, while a full-width comma occupies the full width of one. The half-width comma is the one used in English text, and it is the one CSV uses as its delimiter.
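
The two commas are different characters, which matters when splitting text into fields. A minimal sketch with invented sample data:

```python
import unicodedata

half, full = ",", "\uff0c"  # U+002C and U+FF0C

# East Asian width class: "Na" (narrow) versus "F" (fullwidth).
half_width = unicodedata.east_asian_width(half)
full_width = unicodedata.east_asian_width(full)
print(half_width, full_width)

line = "a\uff0cb\uff0cc"  # items separated by full-width commas
unsplit = line.split(",")                      # no half-width comma to split on
fixed = line.replace(full, half).split(",")    # normalize first, then split
print(unsplit, fixed)
```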

If you have a TXT file whose items are separated by commas, you can simply change its extension to .csv and open it in Excel; each comma-separated item is placed in its own column.
You can also use Excel to save content in CSV format.
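
Outside Excel, a CSV file can also be produced programmatically. The sketch below uses Python's csv module with made-up records and writes to an in-memory buffer instead of a real file. It also shows one caveat of simply renaming a TXT file: a field that itself contains a comma must be quoted.

```python
import csv
import io

records = [
    ["name", "city"],
    ["Li", "Beijing, China"],  # this field contains a comma
]

buffer = io.StringIO()
csv.writer(buffer).writerows(records)  # quotes fields only when needed
output = buffer.getvalue()
print(output)

# Reading the text back recovers the original fields.
round_trip = list(csv.reader(io.StringIO(output)))
```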
