Differences between various encoding formats in C#

Source: Internet
Author: User

I recently learned about the differences between the encoding classes in C#. I would like to share my notes; please point out any mistakes.

Simply put, why is encoding required? Our programs need letters such as 'A' and 'B'. How are these letters represented in computer memory? As we all know, data in memory is binary, so letters, digits, and symbols must be converted into a binary representation. That is what encoding means.

Encoding a character takes two steps. First, a character set assigns each character a fixed number. Second, that number is converted into a binary representation in memory.

Commonly used character sets fall into two types: ASCII and Unicode.

1. ASCII

ASCII (American Standard Code for Information Interchange) is a character encoding based on the Latin alphabet. It is a standard single-byte encoding scheme for text data. Standard ASCII uses 7 bits and defines 128 characters; 8-bit extensions define up to 256. The biggest disadvantage of ASCII is that it can represent only the letters, digits, and symbols common in American English. It cannot represent characters from other languages, such as Chinese characters.

2. Unicode

Unicode is a unified character set designed to accommodate the text and symbols of all the world's writing systems, which meets cross-language and cross-platform requirements. Unicode was developed in step with the Universal Character Set (UCS) standard. Because it covers so many characters, Unicode is widely used, and plain ASCII is now used far less.

Both character sets define how to assign each character a code point (a number). This assignment is fixed, which makes it convenient for applications to rely on.
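The idea that each character is assigned a number can be checked directly in C#. The following is a minimal sketch; the class name CodePointDemo and the helper CodePointOf are made up for this illustration, while char.ConvertToUtf32 is the standard .NET method.

```csharp
using System;

public static class CodePointDemo
{
    // Hypothetical helper: returns the Unicode code point of the first character.
    // char.ConvertToUtf32 also handles characters stored as surrogate pairs.
    public static int CodePointOf(string text)
    {
        return char.ConvertToUtf32(text, 0);
    }

    public static void Main()
    {
        foreach (string s in new[] { "A", "B", "z", "\u5B57" })
        {
            int codePoint = CodePointOf(s);
            // ASCII covers only code points 0 through 127.
            bool isAscii = codePoint <= 0x7F;
            Console.WriteLine("U+{0:X4} (decimal {0}) ASCII: {1}", codePoint, isAscii);
        }
    }
}
```

The letters 'A', 'B', and 'z' fall inside the ASCII range, while the Chinese character U+5B57 does not, which is why ASCII alone cannot represent it.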
For example, the Unicode code point of the Chinese character "字" is 23383 (U+5B57). Once a character has a code point, the code point can be turned into a binary form that can be stored in memory.

1. ASCII is relatively simple to encode. Standard ASCII values never exceed 127 (255 for 8-bit extensions), so each character fits in a single byte and no special processing is needed.

2. Unicode is more complicated to encode, because Unicode covers the letters and symbols of all languages. The rest of this article describes the Unicode encoding methods.

.NET provides the following five encoding classes: ASCIIEncoding, UTF7Encoding, UTF8Encoding, UnicodeEncoding, and UTF32Encoding. The following sections describe the advantages, disadvantages, and differences of these classes.

Encoding

Internally, the .NET Framework stores text as Unicode UTF-16. An encoder transforms this text data to a sequence of bytes. A decoder transforms a sequence of bytes into this internal format. An encoding describes the rules by which an encoder or decoder operates. For example, the UTF8Encoding class describes the rules for encoding to and decoding from a sequence of bytes representing text as UTF-8. Encoding and decoding can also include certain validation steps. For example, the UnicodeEncoding class checks all surrogates to make sure they constitute valid surrogate pairs. Both of these classes inherit from the Encoding class.

The key sentence is: "An encoding describes the rules by which an encoder or decoder operates."

A UTF is a way to encode Unicode code points into a binary representation in memory:

The Unicode Standard assigns a code point (a number) to each character in every supported script. A Unicode Transformation Format (UTF) is a way to encode that code point.
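To make the difference between a code point and its encoded form concrete, the sketch below encodes the single character U+5B57 (code point 23383) with three of the built-in encodings and prints the resulting bytes. The class name UtfBytesDemo and the helper ToHex are illustrative names, not part of .NET.

```csharp
using System;
using System.Text;

public static class UtfBytesDemo
{
    // Hypothetical helper: encodes the text and formats the bytes as hex.
    public static string ToHex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        string text = "\u5B57"; // code point 23383

        Console.WriteLine("UTF-8 : " + ToHex(Encoding.UTF8, text));    // E5-AD-97
        Console.WriteLine("UTF-16: " + ToHex(Encoding.Unicode, text)); // 57-5B (little-endian)
        Console.WriteLine("UTF-32: " + ToHex(Encoding.UTF32, text));   // 57-5B-00-00 (little-endian)
    }
}
```

The code point is the same in all three cases; only the byte sequence chosen by each UTF differs.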
Selecting an Encoding Class

When you have the opportunity to choose an encoding, you are strongly recommended to use a Unicode encoding, typically either UTF8Encoding or UnicodeEncoding (UTF32Encoding is also supported). In particular, UTF8Encoding is preferred over ASCIIEncoding. If the content is ASCII, the two encodings are identical, but UTF8Encoding can also represent every Unicode character, while ASCIIEncoding supports only the Unicode character values between U+0000 and U+007F. Because ASCIIEncoding does not provide error detection, UTF8Encoding is also better for security.

UTF8Encoding has been tuned to be as fast as possible and should be faster than any other encoding. Even for content that is entirely ASCII, operations performed with UTF8Encoding are faster than operations performed with ASCIIEncoding.

You should consider using ASCIIEncoding only for certain legacy applications. However, even in this case, UTF8Encoding might still be a better choice. Assuming default settings, the following scenarios can occur:

If your application has content that is not strictly ASCII and encodes it with ASCIIEncoding, each non-ASCII character encodes as a question mark ("?"). If the application then decodes this data, the information is lost.

If the application has content that is not strictly ASCII and encodes it with UTF8Encoding, the result seems unintelligible if interpreted as ASCII. However, if the application then decodes this data, the data performs a round trip successfully.

1. ASCIIEncoding

ASCIIEncoding uses exactly one byte per character. It is limited to the first 128 Unicode characters, from U+0000 to U+007F. ASCIIEncoding does not provide error detection. If you need error detection, use UTF8Encoding, UnicodeEncoding, or UTF32Encoding, which are also more suitable for building global applications.
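The two scenarios described above can be reproduced with a short round-trip test. This is a minimal sketch using the default Encoding.ASCII and Encoding.UTF8 instances; the class name RoundTripDemo and the helper RoundTrip are illustrative names.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Hypothetical helper: encodes the text to bytes and decodes it again
    // with the same encoding.
    public static string RoundTrip(Encoding encoding, string text)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        string original = "caf\u00E9"; // "café": the last character is not ASCII

        string viaAscii = RoundTrip(Encoding.ASCII, original);
        string viaUtf8 = RoundTrip(Encoding.UTF8, original);

        Console.WriteLine(viaAscii == original); // False: the text came back as "caf?"
        Console.WriteLine(viaUtf8 == original);  // True
    }
}
```

With ASCIIEncoding the non-ASCII character is silently replaced, so the loss only becomes visible when the decoded text is compared with the original.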
When selecting the ASCII encoding for your applications, consider the following:

The ASCII encoding is usually appropriate for protocols that require ASCII.
If your application requires 8-bit encoding, the UTF-8 encoding is recommended over the ASCII encoding. For the characters 0-7F, the results are identical, but use of UTF-8 avoids data loss by allowing representation of all Unicode characters that are representable. Note that the ASCII encoding has an 8th bit ambiguity that can allow malicious use, but the UTF-8 encoding removes ambiguity about the 8th bit.
Previous versions of the .NET Framework allowed spoofing by merely ignoring the 8th bit. The current version has been changed so that non-ASCII code points fall back during the decoding of bytes.

2. UTF7Encoding

Represents a UTF-7 encoding of Unicode characters. The UTF-7 encoding represents Unicode characters as sequences of 7-bit ASCII characters. This encoding supports certain protocols for which it is required, most often e-mail or newsgroup protocols. Since UTF-7 is not particularly secure or robust, and most modern systems allow 8-bit encodings, UTF-8 should normally be preferred to UTF-7.

UTF7Encoding does not provide error detection. For security reasons, applications should use UTF8Encoding, UnicodeEncoding, or UTF32Encoding and enable error detection. UTF7Encoding is not recommended.

3. UTF8Encoding

The UTF-8 encoding represents each code point as a sequence of one to four bytes. Characters in different code point ranges are encoded with different lengths, and the maximum length is 4 bytes. As noted in the passage above, UTF8Encoding has been tuned for speed and should be faster than the other encodings, even when the content is entirely ASCII.
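The notes above recommend enabling error detection. One way to do that, sketched here under the assumption that invalid input should be rejected rather than repaired, is the UTF8Encoding constructor whose second argument (throwOnInvalidBytes) makes decoding throw on malformed bytes. The class name StrictDecodingDemo and the helper IsValidUtf8 are illustrative names.

```csharp
using System;
using System.Text;

public static class StrictDecodingDemo
{
    // Hypothetical helper: returns true if the bytes are well-formed UTF-8.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // The second constructor argument, throwOnInvalidBytes, turns on error detection.
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            strictUtf8.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    public static void Main()
    {
        byte[] valid = { 0xE5, 0xAD, 0x97 };  // UTF-8 for U+5B57
        byte[] truncated = { 0xE5, 0xAD };    // last byte of the sequence is missing

        Console.WriteLine(IsValidUtf8(valid));      // True
        Console.WriteLine(IsValidUtf8(truncated));  // False

        // The default Encoding.UTF8 does not throw; it substitutes U+FFFD instead.
        Console.WriteLine(Encoding.UTF8.GetString(truncated).Contains("\uFFFD")); // True
    }
}
```

The default instance is convenient for display purposes, while the strict instance is the safer choice when the bytes come from an untrusted source.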
For these reasons, UTF8Encoding is recommended over ASCIIEncoding.

4. UnicodeEncoding

UnicodeEncoding encodes each code point as one or two 16-bit unsigned integers (UTF-16).

The Unicode Standard uses the following UTFs:

UTF-8, which represents each code point as a sequence of one to four bytes.
UTF-16, which represents each code point as a sequence of one to two 16-bit integers.
UTF-32, which represents each code point as a 32-bit integer.

UnicodeEncoding is not compatible with ASCII. C# strings are stored in memory as UTF-16, which is the encoding that UnicodeEncoding implements.

5. UTF32Encoding

UTF32Encoding encodes each code point as a single 32-bit unsigned integer.
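The unit counts listed for the three UTFs can be observed by encoding one sample character from each UTF-8 length class. This is a minimal sketch; the class name CodeUnitDemo, the helper ByteCount, and the choice of sample characters are assumptions made for the illustration.

```csharp
using System;
using System.Text;

public static class CodeUnitDemo
{
    // Hypothetical helper: number of bytes the text occupies in the given encoding.
    public static int ByteCount(Encoding encoding, string text)
    {
        return encoding.GetByteCount(text);
    }

    public static void Main()
    {
        // One character from each UTF-8 length class.
        string[] samples = { "A", "\u00E9", "\u5B57", "\U0001F600" };

        foreach (string s in samples)
        {
            Console.WriteLine(
                "U+{0:X4}: UTF-8 {1} bytes, UTF-16 {2} units, UTF-32 {3} units",
                char.ConvertToUtf32(s, 0),
                ByteCount(Encoding.UTF8, s),
                ByteCount(Encoding.Unicode, s) / 2,  // a UTF-16 unit is 2 bytes
                ByteCount(Encoding.UTF32, s) / 4);   // a UTF-32 unit is 4 bytes
        }
        // Expected unit counts: A = 1/1/1, U+00E9 = 2/1/1, U+5B57 = 3/1/1, U+1F600 = 4/2/1.
    }
}
```

Only U+1F600, which lies outside the first 65,536 code points, needs two 16-bit units (a surrogate pair) in UTF-16, while UTF-32 always uses exactly one unit.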
