What is the difference between the encodings of the Encoding class in C#

Source: Internet
Author: User
Tags: character set

Simply put, why is encoding needed? A computer has to represent letters such as 'a' and 'B', but how are these letters stored in memory? Data in computer memory is binary, so every letter, digit, or symbol must be converted into a binary representation that the computer can store. That conversion is what encoding means.

Encoding a character into an in-memory binary representation takes two steps. First, a character set assigns each character a fixed code. Then that code is converted into a binary representation in memory.
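As a rough illustration of these two steps, the following C# sketch reads the code assigned to each character and then prints that code in binary. The variable names are chosen for this example only.

```csharp
using System;

// Top-level statements; requires C# 9 or later.
// Step 1: the character set assigns each character a fixed code.
int codeOfA = 'a';   // 97
int codeOfB = 'B';   // 66

// Step 2: the code is written out as binary digits (padded to one byte here).
string bitsOfA = Convert.ToString(codeOfA, 2).PadLeft(8, '0');
string bitsOfB = Convert.ToString(codeOfB, 2).PadLeft(8, '0');

Console.WriteLine($"a -> {codeOfA} -> {bitsOfA}");   // a -> 97 -> 01100001
Console.WriteLine($"B -> {codeOfB} -> {bitsOfB}");   // B -> 66 -> 01000010
```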

Computers use two main character codes: ASCII and Unicode.

1. ASCII Code

ASCII (American Standard Code for Information Interchange) is a computer coding system based on the Latin alphabet. It is a standard single-byte character encoding scheme for text data. Standard ASCII uses 7 bits to represent 128 characters; 8-bit extensions of ASCII represent up to 256.

The biggest drawback of ASCII is that it can represent only the characters and symbols commonly used in American English. It cannot represent characters from other languages, such as Chinese characters.

2. Unicode Code

Unicode is a coding scheme designed to accommodate the characters and symbols of all the world's writing systems, giving a single unified code that meets cross-language and cross-platform requirements. Unicode is developed in line with the Universal Character Set (UCS) standard. Because Unicode can hold all of these characters, it is used far more widely, and ASCII is rarely chosen on its own.

These two character codes define how commonly used characters are numbered: each character is given a fixed code point (a number), which makes later processing convenient. For example, the Chinese character "字" has the Unicode code point 23383 (U+5B57).
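The code point mentioned above can be checked with a short C# sketch. It assumes the character in question is 字, written here as the escape `\u5B57` so that the result does not depend on the encoding of the source file.

```csharp
using System;

// Top-level statements; requires C# 9 or later.
char ch = '\u5B57';   // the character 字

// Converting a char to int yields its UTF-16 code unit. For a character in
// the Basic Multilingual Plane this is the same number as its code point.
int codePoint = ch;

Console.WriteLine(codePoint);                         // 23383
Console.WriteLine("U+" + codePoint.ToString("X4"));   // U+5B57
```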

Once characters have been given codes in this way, each code can be expressed in a binary form that can be stored in memory.

1. ASCII is the simpler case. Every ASCII code fits in a single byte (at most 127 for standard ASCII, or 255 for 8-bit extensions), so it can be stored directly in one byte of memory with no special processing.

2. Unicode is more complicated. Because it covers the characters of all languages, turning its code points into bytes is not as simple.

Here's a description of how Unicode is encoded.

The .NET Framework provides the following five encoding classes for this purpose:

ASCIIEncoding

UTF7Encoding

UTF8Encoding

UnicodeEncoding

UTF32Encoding
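To give a feel for how these classes differ, the sketch below encodes the same two-character string with four of them and prints the resulting bytes. UTF7Encoding is left out because it is marked obsolete in recent .NET versions. The sample string is an assumption made for this example.

```csharp
using System;
using System.Text;

// Top-level statements; requires C# 9 or later.
// Sample text: the letter 'a' followed by the character 字 (U+5B57).
string text = "a\u5B57";

Encoding[] encodings =
{
    new ASCIIEncoding(),
    new UTF8Encoding(),
    new UnicodeEncoding(),   // UTF-16, little-endian by default
    new UTF32Encoding(),     // little-endian by default
};

foreach (Encoding encoding in encodings)
{
    // GetBytes returns only the encoded text; no byte order mark is added.
    byte[] bytes = encoding.GetBytes(text);
    Console.WriteLine(
        $"{encoding.GetType().Name,-16} {bytes.Length} bytes: {BitConverter.ToString(bytes)}");
}
```

With default settings, ASCIIEncoding produces 2 bytes (the second is a question mark, because 字 is outside its range), UTF8Encoding and UnicodeEncoding produce 4 bytes each, and UTF32Encoding produces 8.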

The following first explains what an encoding is, and then discusses the advantages and disadvantages of these encodings.

Understanding Encoding

Internally, the .NET Framework stores text as Unicode UTF-16. An encoder transforms this text data to a sequence of bytes. A decoder transforms a sequence of bytes into this internal format. An encoding describes the rules by which an encoder or decoder operates. For example, the UTF8Encoding class describes the rules for encoding to and decoding from a sequence of bytes representing text as UTF-8. Encoding and decoding can also include certain validation steps. For example, the UnicodeEncoding class checks all surrogates to make sure they constitute valid surrogate pairs. Both of these classes inherit from the Encoding class.

The key sentence is: An encoding describes the rules by which an encoder or decoder operates.
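The following sketch shows both ideas from this paragraph under stated assumptions: a string is encoded to bytes and decoded back with UTF8Encoding, and a UnicodeEncoding created with throwOnInvalidBytes set to true is asked to encode a lone surrogate. The sample strings and variable names are illustrative only.

```csharp
using System;
using System.Text;

// Top-level statements; requires C# 9 or later.
// Encoding: internal UTF-16 text -> bytes. Decoding: bytes -> internal text.
var utf8 = new UTF8Encoding();
byte[] encoded = utf8.GetBytes("a\u5B57");
string decoded = utf8.GetString(encoded);
Console.WriteLine(decoded == "a\u5B57");   // True

// Validation: a high surrogate with no matching low surrogate is not a valid
// surrogate pair. With throwOnInvalidBytes set to true, the encoding reports
// the problem with an exception instead of substituting a replacement.
var strictUtf16 = new UnicodeEncoding(
    bigEndian: false, byteOrderMark: true, throwOnInvalidBytes: true);

bool rejected = false;
try
{
    strictUtf16.GetBytes("\uD83D");
}
catch (ArgumentException)   // EncoderFallbackException derives from this
{
    rejected = true;
}
Console.WriteLine(rejected);
```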

A UTF is a method of encoding a Unicode code point into an in-memory binary representation. The Unicode Standard assigns a code point (a number) to each character in every supported script. A Unicode Transformation Format (UTF) is a way to encode that code point.
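As a small illustration, the sketch below takes one code point and encodes it with three different UTFs. The code point U+1F600 is an assumption chosen for this example because it lies outside the Basic Multilingual Plane, so UTF-16 needs a surrogate pair for it.

```csharp
using System;
using System.Text;

// Top-level statements; requires C# 9 or later.
int codePoint = 0x1F600;
string text = char.ConvertFromUtf32(codePoint);   // two UTF-16 code units

byte[] utf8 = new UTF8Encoding().GetBytes(text);
byte[] utf16 = new UnicodeEncoding().GetBytes(text);   // little-endian
byte[] utf32 = new UTF32Encoding().GetBytes(text);     // little-endian

Console.WriteLine("UTF-8:  " + BitConverter.ToString(utf8));    // F0-9F-98-80
Console.WriteLine("UTF-16: " + BitConverter.ToString(utf16));   // 3D-D8-00-DE
Console.WriteLine("UTF-32: " + BitConverter.ToString(utf32));   // 00-F6-01-00

// All three byte sequences stand for the same code point.
Console.WriteLine(char.ConvertToUtf32(text, 0) == codePoint);   // True
```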

Selecting an Encoding Class

When you have the opportunity to choose an encoding, you are strongly recommended to use a Unicode encoding, typically either UTF8Encoding or UnicodeEncoding (UTF32Encoding is also supported). In particular, UTF8Encoding is preferred over ASCIIEncoding. If the content is ASCII, the two encodings are identical, but UTF8Encoding can also represent every Unicode character, while ASCIIEncoding supports only the Unicode character values between U+0000 and U+007F. Because ASCIIEncoding does not provide error detection, UTF8Encoding is also better for security.
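One way to picture the error-detection point is the sketch below. It assumes a UTF8Encoding constructed with throwOnInvalidBytes set to true, and feeds the same malformed bytes to it and to a default ASCIIEncoding. The byte values are made up for the example.

```csharp
using System;
using System.Text;

// Top-level statements; requires C# 9 or later.
// 0x61 is 'a'; 0xFF can never appear in well-formed UTF-8 and is not ASCII.
byte[] suspect = { 0x61, 0xFF };

// ASCIIEncoding with default settings silently substitutes a question mark.
string asciiResult = new ASCIIEncoding().GetString(suspect);
Console.WriteLine(asciiResult);   // a?

// A strict UTF8Encoding reports the malformed input instead.
var strictUtf8 = new UTF8Encoding(
    encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

bool detected = false;
try
{
    strictUtf8.GetString(suspect);
}
catch (DecoderFallbackException)
{
    detected = true;
}
Console.WriteLine(detected);   // True
```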

UTF8Encoding has been tuned to be as fast as possible and should be faster than any other encoding. Even for content that is entirely ASCII, operations performed with UTF8Encoding are faster than operations performed with ASCIIEncoding. You should consider using ASCIIEncoding only for certain legacy applications. However, even in this case, UTF8Encoding might still be a better choice. Assuming default settings, the following scenarios can occur:

If your application has content that is not strictly ASCII and encodes it with ASCIIEncoding, each non-ASCII character encodes as a question mark ("?"). If the application then decodes this data, the information is lost.

If the application has content that is not strictly ASCII and encodes it with UTF8Encoding, the result seems unintelligible if interpreted as ASCII. However, if the application then decodes this data, the data performs a round trip successfully.
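The two scenarios can be reproduced with a minimal sketch that uses default settings. The sample string "café" is an assumption made for this example; any text with a non-ASCII character would behave similarly.

```csharp
using System;
using System.Text;

// Top-level statements; requires C# 9 or later.
string original = "caf\u00E9";   // "café": the last character is not ASCII

// Scenario 1: ASCIIEncoding replaces the non-ASCII character with '?',
// so decoding cannot recover the original text.
byte[] asciiBytes = new ASCIIEncoding().GetBytes(original);
string viaAscii = new ASCIIEncoding().GetString(asciiBytes);
Console.WriteLine(viaAscii);               // caf?

// Scenario 2: UTF8Encoding keeps every character. The bytes do not read as
// the original text when treated as ASCII, but decoding them as UTF-8
// completes the round trip.
byte[] utf8Bytes = new UTF8Encoding().GetBytes(original);
string misread = new ASCIIEncoding().GetString(utf8Bytes);
string viaUtf8 = new UTF8Encoding().GetString(utf8Bytes);
Console.WriteLine(misread == original);    // False
Console.WriteLine(viaUtf8 == original);    // True
```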
