Character Set and Encoding 01: charset vs encoding

Source: Internet
Author: User


Note: This article is reprinted from http://my.oschina.net/goldenshaw/blog/304493

 

Character sets and encodings are often confused, but they are different things. As a first step toward understanding them properly, we need to be clear that:

A character set and a character set encoding are concepts at two different levels.
 
 
  • charset is short for character set.

  • encoding is short for charset encoding, that is, character set encoding, or simply encoding.

Analogy with interface and interface implementation

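You can compare the two to an interface and its implementation: the character set plays the role of the interface, and the encoding plays the role of the implementation.

From this we can see that the character set only specifies which characters exist and the number assigned to each, while the encoding specifies how those characters are actually stored as bytes.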
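To make the analogy concrete, here is a minimal Python sketch. It is our own illustration, not code from the original article: the class names are hypothetical, the "interface" works on Unicode code points (the character-set level), and each "implementation" decides what the bytes look like.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class UnicodeEncoding(ABC):
    """Hypothetical 'interface': its contract is set by the Unicode character set."""

    @abstractmethod
    def encode(self, code_points: list[int]) -> bytes:
        """Turn a sequence of code points into bytes."""

    @abstractmethod
    def decode(self, data: bytes) -> list[int]:
        """Turn bytes back into code points."""


class Utf32BigEndian(UnicodeEncoding):
    """One 'implementation': every code point becomes exactly 4 bytes.

    Simplified sketch: it does not reject surrogates or out-of-range values.
    """

    def encode(self, code_points: list[int]) -> bytes:
        return b"".join(cp.to_bytes(4, "big") for cp in code_points)

    def decode(self, data: bytes) -> list[int]:
        return [int.from_bytes(data[i:i + 4], "big") for i in range(0, len(data), 4)]


class Utf8ViaCodec(UnicodeEncoding):
    """Another 'implementation': delegates to Python's built-in UTF-8 codec."""

    def encode(self, code_points: list[int]) -> bytes:
        return "".join(map(chr, code_points)).encode("utf-8")

    def decode(self, data: bytes) -> list[int]:
        return [ord(ch) for ch in data.decode("utf-8")]


text_points = [ord(c) for c in "A中"]  # code points 0x41 and 0x4E2D
for impl in (Utf32BigEndian(), Utf8ViaCodec()):
    data = impl.encode(text_points)
    assert impl.decode(data) == text_points  # same characters come back
    print(type(impl).__name__, data.hex(" "))
# Utf32BigEndian 00 00 00 41 00 00 4e 2d
# Utf8ViaCodec 41 e4 b8 ad
```

Both implementations honor the same contract (same code points in, same code points out), yet the byte layouts differ, which is exactly the interface/implementation split.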

Examples and usage 

Let's look at two examples. The first comes from an HTML file and uses charset:

<meta http-equiv="content-type" content="text/html;charset=utf-8">

The second comes from an XML file and uses encoding:

<?xml version="1.0" encoding="UTF-8"?>

Which style is more precise? Clearly the latter, because it matches the actual concepts of character set and encoding.
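An XML parser is exactly the kind of reader the encoding declaration is written for: it tells the parser how to turn the document's bytes into characters. A minimal sketch using Python's standard xml.etree.ElementTree; the two documents are made-up examples holding the same text stored under two different encodings.

```python
import xml.etree.ElementTree as ET

# The same word "café" (é is code point U+00E9), stored as bytes two different ways.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><word>caf\xe9</word>'
utf8_doc = b'<?xml version="1.0" encoding="UTF-8"?><word>caf\xc3\xa9</word>'

for doc in (latin1_doc, utf8_doc):
    root = ET.fromstring(doc)  # the parser follows the declared encoding
    print(root.text)           # café (both times)
```

The bytes differ, but because each document declares its encoding, the parser recovers the same characters.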

"charset=UTF-8" easily gives the false impression that there is a character set called "UTF-8". In fact, UTF-8, UTF-16, and UTF-32 are all just different encodings of the same character set.
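One way to see that "UTF-8" is not a character set of its own: UTF-8, UTF-16, and UTF-32 accept exactly the same characters, because they share one character set, whereas an encoding tied to a different, smaller character set such as ISO-8859-1 (Latin-1) rejects characters outside it. The helper fits_in below is hypothetical and simply probes this by attempting to encode.

```python
def fits_in(text: str, encoding: str) -> bool:
    """True if every character of `text` can be represented by `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


text = "Aé中😀"
for enc in ("utf-8", "utf-16", "utf-32", "latin-1"):
    print(enc, fits_in(text, enc))
# utf-8 True
# utf-16 True
# utf-32 True
# latin-1 False   ("中" and "😀" are not in the Latin-1 character set)
```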

Why strictly distinguish between character set and encoding?

One-to-one correspondence between character set and encoding

In many character encoding schemes, a character set has exactly one encoding implementation, so the two correspond one-to-one; GB2312 is an example. In this case, whether you say "GB2312 encoding" or "GB2312 character set", you are really talking about the same thing. People may not bother to distinguish the two, and neither usage is wrong.
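Even in the one-to-one case the two levels are still there; they just share a name. As a sketch (our illustration, not from the original article): the GB2312 character set places each character at a row and cell (qu-wei) position, for example "中" at row 54, cell 48, and the encoding almost always meant by "GB2312" (EUC-CN, which is what Python's gb2312 codec produces) stores each of those two numbers as one byte by adding 0xA0. The helper euc_cn_bytes is hypothetical.

```python
def euc_cn_bytes(row: int, cell: int) -> bytes:
    """Encoding level: turn a GB2312 row/cell position into its two EUC-CN bytes."""
    return bytes([row + 0xA0, cell + 0xA0])


# Character-set level: "中" sits at row 54, cell 48 of GB2312.
data = euc_cn_bytes(54, 48)
print(data.hex(" "))                      # d6 d0
print(data == "中".encode("gb2312"))      # True
```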

Why is one-to-one the common case?

Take GB2312 as an example. GB stands for Guo Biao, that is, National Standard. A standard exists precisely to unify things; if one standard came with N different encodings, which one would you use?

One-to-many correspondence between character set and encoding

This is mainly the case with Unicode. The Unicode character set corresponds to three encodings: UTF-8, UTF-16, and UTF-32. If we keep using the terms as loosely as before, they are easy to confuse.
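The three Unicode encodings differ mainly in how many bytes each code point takes. A quick Python check; the -le codec variants are used only so that no byte order mark is counted, and byte_sizes is a hypothetical helper.

```python
def byte_sizes(ch: str) -> dict:
    """Number of bytes one character occupies in each Unicode encoding (no BOM)."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


for ch in ("A", "é", "中", "😀"):
    print(f"U+{ord(ch):04X}", byte_sizes(ch))
# U+0041 {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
# U+00E9 {'utf-8': 2, 'utf-16-le': 2, 'utf-32-le': 4}
# U+4E2D {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
# U+1F600 {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```

UTF-8 is compact for ASCII-heavy text, UTF-32 spends a fixed four bytes on everything, and UTF-16 sits in between; trade-offs like these are why more than one encoding of the same character set is useful.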

Why is Unicode so special?

The driving force behind new character set standards is almost always that the old character sets do not contain enough characters.

The goal of Unicode is to unify all character sets and include every character, so the development of character sets essentially ends with it; there is no longer any need to create a new character set.

But what if you feel its existing encoding scheme is not good enough? Since there is no point in creating a new character set, the only place left to innovate is the encoding. That is how Unicode came to have multiple implementations, which breaks the traditional one-to-one correspondence.

This is why we distinguish strictly between character sets and encodings.

Once the encoding is specified, the corresponding character set is naturally specified as well, so the encoding is what we ultimately need to care about.
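A small illustration of why the encoding is what matters in practice: bytes carry no record of how they were produced, so the same bytes read with the wrong encoding turn into different characters (mojibake) or fail outright. A minimal Python sketch:

```python
text = "é"                      # one character: code point U+00E9
data = text.encode("utf-8")     # the bytes b'\xc3\xa9'

right = data.decode("utf-8")    # 'é'
wrong = data.decode("latin-1")  # 'Ã©': each byte read as its own Latin-1 character
print(right, wrong)

# The reverse mistake can fail outright: b'\xe9' on its own is not valid UTF-8.
try:
    text.encode("latin-1").decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True
print("decode failed:", failed)  # decode failed: True
```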

Unicode comparison

Let's look at a chart that shows some differences between early and present-day Unicode:

Note: for historical reasons, in many places you will also see "Unicode" listed side by side with "UTF-8" as if they were parallel choices. In such cases "Unicode" usually means UTF-16, or the earlier UCS-2 encoding. We will analyze this further in later chapters.

The following is the save dialog of the Notepad program, which is a non-standard use of "Unicode": here "Unicode" refers to UTF-16.
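To make the "Unicode means UTF-16" point concrete: as far as we know, Notepad's "Unicode" option writes UTF-16 little-endian preceded by a byte order mark (BOM). The sketch below builds such bytes by hand rather than reading a real Notepad file, then decodes them with Python's generic utf-16 codec.

```python
import codecs

text = "Hi"
# Assumption: Notepad's "Unicode" = BOM (FF FE) followed by UTF-16 little-endian.
notepad_bytes = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
print(notepad_bytes.hex(" "))  # ff fe 48 00 69 00

# The generic "utf-16" codec uses the BOM to pick the byte order, then drops it.
print(notepad_bytes.decode("utf-16"))  # Hi
```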

We have said a lot about Unicode. For various reasons, we have to accept that the word "Unicode" means different things in different contexts. It may refer to:

  • Unicode Standard

  • Unicode Character Set

  • An abstract Unicode number, that is, a code point

  • A specific Unicode encoding implementation, usually variable-length UTF-16 (16 or 32 bits), or the earlier, fixed-length 16-bit UCS-2
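The difference between a code point, variable-length UTF-16, and fixed-length UCS-2 can be sketched in a few lines of Python. The function names are hypothetical; the surrogate-pair arithmetic is the standard UTF-16 rule, and the result is checked against Python's own utf-16-be codec.

```python
from __future__ import annotations


def utf16_code_units(cp: int) -> list[int]:
    """UTF-16: one 16-bit unit up to U+FFFF, a surrogate pair (32 bits) above it."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0xFFFF:
        return [cp]
    v = cp - 0x10000                               # 20 bits left to split
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


def ucs2_code_unit(cp: int) -> int:
    """UCS-2: always exactly 16 bits, so anything above U+FFFF cannot be stored."""
    if cp < 0 or cp > 0xFFFF:
        raise ValueError(f"U+{cp:04X} does not fit in UCS-2")
    return cp


cp = ord("😀")                                     # the code point itself: 0x1F600
units = utf16_code_units(cp)
print([hex(u) for u in units])                     # ['0xd83d', '0xde00']
assert b"".join(u.to_bytes(2, "big") for u in units) == "😀".encode("utf-16-be")

try:
    ucs2_code_unit(cp)
    fits_ucs2 = True
except ValueError:
    fits_ucs2 = False
print("fits in UCS-2:", fits_ucs2)                 # fits in UCS-2: False
```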

These topics will be further discussed in the subsequent chapters.
