Encoding, character sets, garbled text, Unicode, UTF-8 and .NET: a simple explanation

Source: Internet
Author: User
I haven't written a blog post in a long time. Work wrapped up this afternoon and I have a bit of time, so I'm writing up what I spent much of last week sorting out about how computers handle characters. I hope it helps people who were as confused as I was, and that it prompts others to share better explanations (enough rambling, let's get started...)

My company uses a Traditional Chinese operating system, and I sometimes write programs at home on a Simplified Chinese computer. When I copy code between the two on a USB drive, the Chinese text in the files often turns into garbage. So I spent some time searching online, found a lot of discussion about garbled text, and summarized it here in my own way (there are surely some mistakes, and I hope you will point them out):

1. Files are divided into text files and binary files, but in essence they are the same: both are just sequences of 0s and 1s.

2. A computer's storage devices store 0s and 1s; each one is a bit.

3. The 0s and 1s of a binary file are read by a specialized application, so there is no garbled-text problem as long as that program recognizes the format (for example doc, xls, exe, dll).

4. Text files are different: Notepad has to understand them, VS.NET has to understand them, UE has to understand them too... So there has to be a standard. The principle of this standard is actually very simple: give every character a serial number, then look the character up by that number. This thing is the code table, also known as the character set (charset).

5. Text files store characters, such as a, ?, @, x. Obviously a single bit cannot represent a character. The computer's storage unit, the byte, happens to be made of several bits (1 byte = 8 bits), so representing characters with bytes was the natural choice.

6. The first code table, ASCII, soon appeared. It is very simple: one byte represents one character (the highest bit is 0), so a total of 128 (2^7) characters can be represented. For example, uppercase A is 65, stored in the computer as 01000001; for convenience we usually write it as 0x41 (hexadecimal). Lowercase a is 97, stored as 01100001 and written as 0x61. The character ? is 63, written as 0x3F.
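
The numbers in item 6 can be checked with a few lines of Python (Python is only my choice for illustration here, and the helper name `describe` is made up for this sketch):

```python
def describe(ch):
    """Return (decimal, 8-bit binary, hex) for a single ASCII character."""
    code = ord(ch)  # the character's serial number in the code table
    assert code < 128, "not an ASCII character"
    return code, format(code, "08b"), format(code, "#04x")

for ch in "Aa?":
    print(ch, describe(ch))
```

Every binary string printed this way starts with 0, because ASCII only uses the lower seven bits of the byte.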

7. English has only 52 uppercase and lowercase letters in total; add digits, punctuation and some special characters, and 128 is enough. So ASCII was very popular at first (who told us the computer wasn't invented in China...)

8. As computers spread and non-English-speaking countries began to use them, ASCII was clearly no longer enough (you can't spend every day typing the pinyin "xiao sheng" in place of the Chinese characters it stands for), so these countries (regions) began to develop their own standards.

9. Mainland China developed GB2312, a character set for Simplified Chinese. Unlike English, Chinese has far more than 128 characters, so one byte is not enough. Add another byte, then: 16 bits (65,536 values) should do. That solves the capacity problem, but what about all the existing English documents? They can't all be converted to double-byte. Luckily, the highest bit of every ASCII byte is 0, so we set the highest bit to 1 for our own characters. From then on, a byte that starts with 0 is read as a one-byte character, and a byte that starts with 1 is read together with the next byte as a two-byte character. (And 128*128 combinations are enough for all the simplified characters.)

10. So in the GB2312 standard, the code for "小" is 0xD0A1, stored as 11010000 10100001, while A is still stored as 01000001. This is why a Simplified Chinese operating system reads ASCII files without any garbling, while the reverse does not hold.
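
The reading rule from items 9 and 10 can be sketched in Python as follows. This is a minimal illustration, not a real decoder: `split_gb2312` is a made-up helper, it assumes well-formed input, and it leans on Python's built-in `gb2312` codec only to show the resulting characters.

```python
def split_gb2312(data: bytes):
    """Split a GB2312-style byte string into per-character byte chunks.

    Rule from the text: highest bit 0 -> one byte (ASCII),
    highest bit 1 -> this byte and the next one form one character.
    """
    chunks = []
    i = 0
    while i < len(data):
        width = 1 if data[i] < 0x80 else 2  # test the highest bit
        chunks.append(data[i:i + width])
        i += width
    return chunks

raw = "A小".encode("gb2312")  # one ASCII byte followed by two GB2312 bytes
for chunk in split_gb2312(raw):
    print(chunk.hex(), chunk.decode("gb2312"))
```

A pure ASCII file never has the highest bit set, so this loop reads it one byte at a time, exactly as an ASCII reader would.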

11. For the time being things were fine, and computers in mainland China worked normally.

12. Other countries and regions were not to be outdone and rolled out character sets of their own: Big5 (Taiwan), Shift_JIS (Japan), ks_c_5601-1987 (Korea) and others all made their debut, a hundred flowers blooming at once.

13. Naturally every one of them wanted to stay compatible with ASCII, but the characters beyond ASCII are completely different in each. So the same 0xD0A1 is "小" in GB2312 but "苤" in Big5. Think about it: what a mess.
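
The clash described in item 13 is easy to reproduce. The sketch below decodes the bytes 0xD0A1 from the text with two of Python's built-in codecs; the variable names are mine, and `errors="replace"` is there only so the example does not raise if a codec has no mapping for these bytes.

```python
raw = bytes([0xD0, 0xA1])  # the two bytes discussed in the text

as_gb2312 = raw.decode("gb2312")
as_big5 = raw.decode("big5", errors="replace")

print("GB2312:", as_gb2312)
print("Big5:  ", as_big5)
```

This is the USB-drive problem in miniature: the bytes on disk never change, only the code table used to look them up.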

14. At this point some people inevitably thought: this cannot go on. What if there were a single standard that included every character?

15. So the "big brother" standard arrived: Unicode. To take on the glorious and great task of representing every character in the world, this guy used four bytes per character (how much 2 to the 32nd power actually comes to, I'm too lazy to work out). Now all is well: peace at last, no more trouble, and my ears are quiet... (Stop, you're rambling again.)

16. Unicode is good, but spending four bytes on every character is too "wasteful" (getting online through my modem is hard enough; the telecom company is shady, they promised 2M and give me 200K...). And everyone was "surprised" to find that the characters of the world's "major" countries all happen to fall within the first 65,536. So Unicode ended up with a two-byte form and a four-byte form (UTF-16 and UTF-32). The two-byte form directly covers only the first 65,536 characters, which is fine for most characters used in Europe and Asia; UTF-16 reaches anything beyond that by using two 16-bit units for one character. (What, your @$Y$% characters aren't in there? Heh, not my problem, go find the standards body, those guys did it...)
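
The size difference between the two-byte and four-byte forms mentioned above (Python names them UTF-16 and UTF-32) can be measured directly. A small sketch, where `widths` is a made-up helper and the `-le` codec variants are used so that no byte-order mark is counted:

```python
def widths(ch):
    """Bytes needed for one character in UTF-16 and UTF-32 (no BOM)."""
    return len(ch.encode("utf-16-le")), len(ch.encode("utf-32-le"))

# An ASCII letter, a Chinese character, and a character beyond the first 65,536
for ch in ["A", "小", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", widths(ch))
```

Characters among the first 65,536 take two bytes in UTF-16 and four in UTF-32, while a character beyond that range takes four bytes in both.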
