.NET Character Sets

Source: Internet
Author: User

http://msdn.microsoft.com/en-us/library/ms404377.aspx

Each entry below gives the encoding name, the .NET class that implements it (where there is one), a description, and its advantages and disadvantages.

ASCII (ASCIIEncoding)

Encodes a limited range of characters by using the lower seven bits of a byte.

Because this encoding only supports character values from U+0000 through U+007F, in most cases it is inadequate for internationalized applications.
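To make the U+007F limit concrete, here is a small sketch in Python (chosen only because its standard codecs show the same byte-level behavior; it is not the .NET API). It encodes text with the `ascii` codec and substitutes `?` for unsupported characters, which is similar to the default replacement behavior of .NET's `ASCIIEncoding`. The helper names `to_ascii_bytes` and `fits_in_ascii` are hypothetical.

```python
def to_ascii_bytes(text: str) -> bytes:
    """Encode text as 7-bit ASCII, replacing anything above U+007F with '?'."""
    return text.encode("ascii", errors="replace")


def fits_in_ascii(text: str) -> bool:
    """True if every character lies in the range U+0000..U+007F."""
    return all(ord(ch) <= 0x7F for ch in text)


if __name__ == "__main__":
    for sample in ["Hello", "Grüße", "日本"]:
        # German and Japanese characters are lost: they become '?'.
        print(sample, fits_in_ascii(sample), to_ascii_bytes(sample))
```

"Grüße" comes out as `b'Gr??e'`, which shows why ASCII alone cannot serve an internationalized application.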

UTF-7 (UTF7Encoding)

Represents characters as sequences of 7-bit ASCII characters. Non-ASCII Unicode characters are represented by an escape sequence of ASCII characters.

UTF-7 supports protocols such as e-mail and newsgroup protocols. However, UTF-7 is not particularly secure or robust. In some cases, changing one bit can radically alter the interpretation of an entire UTF-7 string. In other cases, different UTF-7 strings can encode the same text. For sequences that include non-ASCII characters, UTF-7 requires more space than UTF-8, and encoding/decoding is slower. Consequently, you should use UTF-8 instead of UTF-7 if possible.
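The escape mechanism and its drawbacks can be seen with Python's `utf-7` codec (used here as a stand-in for .NET's `UTF7Encoding`; both follow the UTF-7 scheme, but optional details may differ). The sketch shows a non-ASCII character becoming a `+...-` escape of base64-encoded UTF-16, two different byte strings decoding to the same text, and the size penalty compared with UTF-8. The helper `compare_utf7_utf8` is hypothetical.

```python
def compare_utf7_utf8(text: str) -> tuple[int, int]:
    """Return (UTF-7 byte count, UTF-8 byte count) for text."""
    return len(text.encode("utf-7")), len(text.encode("utf-8"))


if __name__ == "__main__":
    # A non-ASCII character becomes a '+...-' escape of base64-encoded UTF-16.
    print("é".encode("utf-7"))  # b'+AOk-'
    # The literal '+' must itself be escaped.
    print("1+1".encode("utf-7"))
    # Two different UTF-7 byte strings that decode to the same text "A".
    print(b"A".decode("utf-7"), b"+AEE-".decode("utf-7"))
    # UTF-7 needs more bytes than UTF-8 for non-ASCII text.
    print(compare_utf7_utf8("Grüße"))
```

The `+AEE-` case illustrates the robustness concern: a decoder must accept several spellings of the same text, which complicates validation and comparison.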

UTF-8 (UTF8Encoding)

Represents each Unicode code point as a sequence of one to four bytes.

UTF-8 supports 8-bit data sizes and works well with many existing operating systems. For the ASCII range of characters, UTF-8 is identical to ASCII encoding, yet it also supports a much broader set of characters. However, for Chinese-Japanese-Korean (CJK) scripts, UTF-8 can require three bytes for each character and can potentially produce larger data sizes than UTF-16. Note that sometimes the amount of ASCII data, such as HTML tags, justifies the increased size for the CJK range.
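The size trade-off between UTF-8 and UTF-16 can be checked by counting bytes. This Python sketch (a stand-in for .NET's `UTF8Encoding` and `UnicodeEncoding`, which produce the same byte sequences) measures a few sample strings; `byte_sizes` and `SAMPLES` are hypothetical names, and UTF-16 is measured little-endian without a byte order mark.

```python
SAMPLES = ["A", "é", "中", "😀", "<b>中文</b>", "中文字符"]


def byte_sizes(text: str) -> dict[str, int]:
    """Encoded length of text in UTF-8 and UTF-16 (little-endian, no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


if __name__ == "__main__":
    for s in SAMPLES:
        print(repr(s), byte_sizes(s))
```

The tagged string is smaller in UTF-8 (13 bytes versus 18), while the pure CJK string is smaller in UTF-16 (8 bytes versus 12), which is exactly the trade-off described above.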

UTF-16 (UnicodeEncoding)

Represents each Unicode code point as a sequence of one or two 16-bit integers (code units). Most common Unicode characters require only one UTF-16 code unit, although Unicode supplementary characters (U+10000 and greater) require two UTF-16 surrogate code units, known as a surrogate pair. Both little-endian and big-endian byte orders are supported.

UTF-16 encoding is used by the common language runtime to represent Char and String values, and it is used by the Windows operating system to represent WCHAR values.
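The surrogate-pair rule for supplementary characters is simple arithmetic: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. The Python sketch below computes the pair by hand and checks it against the built-in codec, then shows the two byte orders. `surrogate_pair` is a hypothetical helper; note that .NET's `Encoding.Unicode` corresponds to the little-endian form.

```python
def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low (trail) surrogate
    return high, low


if __name__ == "__main__":
    print([hex(u) for u in surrogate_pair(0x1F600)])  # ['0xd83d', '0xde00']
    # The same code unit serialized in both byte orders.
    print("A".encode("utf-16-le"), "A".encode("utf-16-be"))  # b'A\x00' b'\x00A'
    print("😀".encode("utf-16-be").hex())  # d83dde00
```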

UTF-32 (UTF32Encoding)

Represents each Unicode code point as a 32-bit integer. Both little-endian and big-endian byte orders are supported.

UTF-32 encoding is used when applications want to avoid the surrogate code point behavior of UTF-16 encoding on operating systems for which encoded space is not too important. Single glyphs rendered on a display can still be encoded with more than one UTF-32 character, for example when a base letter is followed by a combining accent.
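The sketch below (Python, standing in for .NET's `UTF32Encoding`) shows both halves of that statement: a supplementary character needs only one 32-bit unit, but "é" written as `e` plus U+0301 COMBINING ACUTE ACCENT still needs two units even though it displays as one glyph. The helper `utf32_units` is hypothetical.

```python
import unicodedata


def utf32_units(text: str) -> int:
    """Number of 32-bit units needed: exactly one per code point."""
    return len(text.encode("utf-32-le")) // 4


if __name__ == "__main__":
    precomposed = "\u00e9"   # é as a single code point
    decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT, rendered as one glyph
    print(utf32_units(precomposed), utf32_units(decomposed))  # 1 2
    # Normalization (NFC) can merge the pair back into one code point.
    print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
    print(utf32_units("😀"))  # 1: no surrogate pairs in UTF-32
```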

ANSI/ISO encodings

Provides support for a variety of code pages. On Windows operating systems, code pages are used to support a specific language or group of languages. For a table that lists the code pages supported by the .NET Framework, see the Encoding class. You can retrieve an encoding object for a particular code page by calling the Encoding.GetEncoding(Int32) method.

A code page contains 256 code points and is zero-based. In most code pages, code points 0 through 127 represent the ASCII character set, and code points 128 through 255 differ significantly between code pages. For example, code page 1252 provides the characters for Latin writing systems, including English, German, and French; its last 128 code points contain the accented characters. Code page 1253 provides the character codes required by the Greek writing system; its last 128 code points contain the Greek characters. As a result, an application that relies on ANSI code pages cannot store Greek and German in the same text stream unless it includes an identifier that indicates the referenced code page.
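The code page problem is easy to reproduce with Python's `cp1252` and `cp1253` codecs, which here play the role of `Encoding.GetEncoding(1252)` and `Encoding.GetEncoding(1253)` in .NET. The same byte 0xE9 decodes to "é" under 1252 and to the Greek "ι" under 1253, and a string mixing German and Greek fits in neither code page. The helpers `decode_with` and `encodable` are hypothetical.

```python
def decode_with(code_page: int, data: bytes) -> str:
    """Decode bytes with a Windows code page, e.g. 1252 or 1253."""
    return data.decode(f"cp{code_page}")


def encodable(text: str, code_page: int) -> bool:
    """True if every character of text exists in the given code page."""
    try:
        text.encode(f"cp{code_page}")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    raw = bytes([0x41, 0xE9])        # 'A' followed by a byte from the upper half
    print(decode_with(1252, raw))    # Aé (Latin)
    print(decode_with(1253, raw))    # Aι (Greek)
    mixed = "Straße Ω"
    print(encodable(mixed, 1252), encodable(mixed, 1253))  # False False
```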

Double-byte character set (DBCS) encodings

Supports languages, such as Chinese, Japanese, and Korean, that contain more than 256 characters. In a DBCS, a pair of code points (a double byte) represents each character outside the single-byte range; ASCII characters typically still occupy one byte. The Encoding.IsSingleByte property returns false for DBCS encodings. You can retrieve an encoding object for a particular DBCS by calling the Encoding.GetEncoding(Int32) method.

When an application handles DBCS data, the first byte of a DBCS character (the lead byte) is processed in combination with the trail byte that immediately follows it. Because a single pair of double-byte code points can represent different characters depending on the code page, this scheme still does not allow two languages, such as Japanese and Chinese, to be combined in the same data stream.
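The lead-byte/trail-byte scheme and its ambiguity can be sketched with Python's `cp932` (Windows Japanese, Shift_JIS based) and `cp936` (Windows Simplified Chinese, GBK) codecs, which correspond to code pages 932 and 936. ASCII stays one byte while kanji take two, and the same byte pairs decode to different characters under the Chinese code page. The helper `byte_lengths` is hypothetical.

```python
def byte_lengths(text: str, codec: str) -> list[int]:
    """Bytes used by each character of text in the given codec."""
    return [len(ch.encode(codec)) for ch in text]


if __name__ == "__main__":
    print(byte_lengths("A日本", "cp932"))  # [1, 2, 2]
    data = "日本".encode("cp932")
    # Two lead/trail byte pairs, one per kanji.
    print(data.hex())
    # The same bytes read as code page 936 yield different characters.
    print(data.decode("cp936", errors="replace"))
```

Nothing in the bytes themselves says which code page produced them, which is why Japanese and Chinese text cannot share one DBCS stream without extra labeling.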

Encoding class
