Character Set and Encoding 01 -- charset vs encoding

Last Update:2016-08-31 Source: Internet

Author: User


Statement: This article is reprinted from http://my.oschina.net/goldenshaw/blog/304493

Character set and encoding are often confused, but the two are different. As a first step toward a deeper understanding, we must make one thing clear:

Character set and character set encoding are concepts at two different levels.

charset is short for character set.

encoding is short for charset encoding, that is, character set encoding, usually abbreviated to just "encoding".

Comparison with interface and interface implementation

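You can compare the two to an interface and an interface implementation:

From here we can clearly see that the character set plays the role of the interface, while an encoding plays the role of an implementation of it.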
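The interface analogy can be sketched in a few lines of Python. This is only an illustration: the class names UnicodeCharacterSet, Utf8 and Utf16BE are hypothetical and do not come from the original article or from any library. The abstract class stands for the character set (which characters exist and what their numbers are), and each subclass stands for one encoding (how a number becomes bytes).

```python
from abc import ABC, abstractmethod


class UnicodeCharacterSet(ABC):
    """Hypothetical 'interface': knows each character's number, not its bytes."""

    def code_point(self, char: str) -> int:
        # The number assigned to a character is the same for every encoding.
        return ord(char)

    @abstractmethod
    def to_bytes(self, char: str) -> bytes:
        """How the number is stored as bytes is left to a concrete encoding."""


class Utf8(UnicodeCharacterSet):
    """One 'implementation' of the character set."""

    def to_bytes(self, char: str) -> bytes:
        return char.encode("utf-8")


class Utf16BE(UnicodeCharacterSet):
    """Another 'implementation' of the same character set."""

    def to_bytes(self, char: str) -> bytes:
        return char.encode("utf-16-be")


results = [
    (type(enc).__name__, enc.code_point("A"), enc.to_bytes("A"))
    for enc in (Utf8(), Utf16BE())
]
for name, number, raw in results:
    print(name, number, raw)
```

Both implementations report the same number for "A" but produce different bytes, which is the sense in which the character set and the encoding sit at different levels.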

Examples and usage

Let's take a look at two examples. One is from an html file and uses charset:

<meta http-equiv="content-type" content="text/html;charset=utf-8">

The other is from an xml file, which uses encoding:

<?xml version="1.0" encoding="UTF-8"?>

Which usage is more accurate? Clearly the latter: it respects the distinction between character set and encoding.

Writing charset=utf-8 easily gives the impression that there is a character set called "UTF-8". In fact, UTF-8, UTF-16, and UTF-32 are just different encodings of the same character set.
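A quick way to see this is to take one character and encode it three ways. The sketch below uses Python's built-in codecs; the choice of the character 中 and the big-endian variants (picked so that no byte order mark is added) are assumptions made for the illustration.

```python
char = "中"

# One character set: the character has a single number (code point).
code_point = ord(char)

# Several encodings: the same number is stored as different byte sequences.
encoded = {
    name: char.encode(name).hex()
    for name in ("utf-8", "utf-16-be", "utf-32-be")
}

print(f"U+{code_point:04X}")
for name, hex_bytes in encoded.items():
    print(name, hex_bytes)

# Decoding with the matching encoding always recovers the same character.
round_trip = [bytes.fromhex(h).decode(name) for name, h in encoded.items()]
```

The code point never changes; only the bytes do.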

Why strictly differentiate the two concepts of character set and encoding?

Character set and encoding: the one-to-one scenario

In many character encoding schemes, a character set has only one encoding implementation, so the two correspond one-to-one. GB2312 is an example. In this case, whether you say "GB2312 encoding" or "GB2312 character set", you are referring to the same thing. Speakers may not consciously distinguish the two, but nothing goes wrong either way.
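As a small illustration of the one-to-one case, Python exposes GB2312 under a single codec name, so naming the character set is enough to determine the bytes. The sample text 中文 is chosen here only for the example.

```python
text = "中文"

# GB2312 has one encoding, so the name alone fixes the byte sequence.
encoded = text.encode("gb2312")
decoded = encoded.decode("gb2312")

print(encoded.hex())  # two bytes per Chinese character
print(decoded)
```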

Why is one-to-one a common situation?

Take GB2312 as an example. GB stands for Guo Biao (国标), meaning "national standard". A standard exists to unify things. If one standard came with N encodings, which one would people use?

Character set and encoding: the one-to-many scenario

Unicode is the only such case. The Unicode character set corresponds to three encodings: UTF-8, UTF-16, and UTF-32. If we keep using the names loosely here, confusion comes easily.

Why is Unicode so special?

People devise new character set standards for one main reason: the old character set does not have enough characters.

The goal of Unicode is to unify all character sets and include every character. Character set development therefore reaches its end with Unicode, and there is no longer any need to create a new character set.

But what if its existing encoding scheme is considered unsatisfactory? With no new character set to turn to, the only place left to make changes is the encoding. That is how multiple implementations appeared, breaking the traditional one-to-one correspondence.

This is why we strictly distinguish character set from encoding.

Once the encoding is specified, the corresponding character set is naturally specified as well. The encoding is what we ultimately need to care about.
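The practical consequence is that bytes can only be read correctly when the encoding is known. A minimal sketch, assuming two bytes that were produced with GB2312 (the byte values are an example, not taken from the original article):

```python
data = b"\xd6\xd0"  # bytes written by some program using GB2312

# With the right encoding named, the character comes back.
as_gb2312 = data.decode("gb2312")

# With the wrong encoding named, the same bytes are not even valid.
try:
    data.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(as_gb2312, utf8_ok)
```

Naming the encoding ("gb2312", "utf-8") is sufficient; the character set never has to be stated separately.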

Unicode comparison

Let's look at a chart that shows some differences between early and present-day Unicode:

Note: For historical reasons, you will also see "Unicode" and "UTF-8" listed side by side in many places. In such cases "Unicode" usually means the UTF-16 encoding or the earlier UCS-2 encoding. Later chapters analyze this further.

One example is the encoding option offered when saving a file in the Notepad program. This is a non-standard use of "Unicode"; here "Unicode" refers to UTF-16.
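To make the Notepad case concrete, the sketch below builds the kind of byte sequence such a file typically starts with. It assumes that the "Unicode" option means UTF-16 in little-endian byte order preceded by a byte order mark; that detail is not stated in the original article, so treat it as an assumption.

```python
import codecs

text = "A"

# Assumed layout: byte order mark (ff fe) followed by UTF-16 little-endian data.
notepad_style = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
print(notepad_style.hex())

# Python's generic "utf-16" codec reads the mark to pick the byte order
# and drops it from the decoded text.
decoded = notepad_style.decode("utf-16")
print(decoded)
```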

We have mentioned Unicode many times. For various reasons, we must admit that the word "Unicode" has different meanings in different contexts. It may refer to:

The Unicode Standard
The Unicode character set
The Unicode abstract code (number), that is, the code point
A specific Unicode encoding implementation, usually the variable-length UTF-16 (16 or 32 bits), or the earlier fixed-length 16-bit UCS-2
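The "16 or 32 bits" in the last item can be checked directly. In this sketch the two sample characters are assumptions chosen for illustration: one inside the 16-bit range and one beyond it, which UTF-16 stores as a pair of 16-bit units (a surrogate pair) and which the older UCS-2 could not represent at all.

```python
inside_16_bits = "中"           # code point U+4E2D
beyond_16_bits = "\U0001F600"   # code point U+1F600

short_form = inside_16_bits.encode("utf-16-be")
long_form = beyond_16_bits.encode("utf-16-be")

# The code point is a number; the encoding decides how many bytes it takes.
print(hex(ord(inside_16_bits)), len(short_form), short_form.hex())
print(hex(ord(beyond_16_bits)), len(long_form), long_form.hex())
```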

These topics will be further discussed in the subsequent chapters.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
