Typing "联通" (Unicom) in Notepad: garbled text after closing and reopening the file

Last Update: 2014-09-27 Source: Internet

Author: User

Tags: coding standards

Something curious happens if you type the two characters "联通" (Unicom) in Notepad, save and close the file, and then open it again: the text is garbled. To explain this we need to mention ANSI. Different countries and regions developed their own standards, which produced GB2312, BIG5, JIS and other encodings; Windows refers to these collectively as ANSI. These encodings have no fixed, self-identifying format. UTF-8, by contrast, has a very recognizable one: the first byte of a character begins with the bits 0, 110, 1110 and so on, depending on the length of the sequence, and every following byte begins with 10.
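The UTF-8 bit patterns mentioned above can be sketched as a small classifier. This is only an illustration of the leading-bit templates; the function name `utf8_sequence_length` is made up for this example, and it checks structure only (it does not reject overlong or otherwise invalid sequences).

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the sequence length implied by a UTF-8 lead byte's top bits.

    Returns 0 for a continuation byte (10xxxxxx) or an unrecognized pattern.
    """
    if lead & 0b10000000 == 0b00000000:   # 0xxxxxxx: single byte (ASCII)
        return 1
    if lead & 0b11100000 == 0b11000000:   # 110xxxxx: two-byte sequence
        return 2
    if lead & 0b11110000 == 0b11100000:   # 1110xxxx: three-byte sequence
        return 3
    if lead & 0b11111000 == 0b11110000:   # 11110xxx: four-byte sequence
        return 4
    return 0                              # 10xxxxxx or invalid


for byte in (0x41, 0xC1, 0xAA, 0xE4):
    print(f"{byte:#04x} {byte:08b} -> {utf8_sequence_length(byte)}")
```

A GB2312 byte carries no such marker, which is why a program cannot tell from a single byte that a file is GB2312 rather than something else.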

When Notepad saves a file, the default encoding on Windows is ANSI (GB2312 in China).

Saved this way, the internal codes of the two characters are as follows (each character occupies two bytes):

"联" (Link): the ANSI encoding is 0xC1AA, which in binary is 1100 0001 1010 1010;

"通" (Pass): the ANSI encoding is 0xCDA8, which in binary is 1100 1101 1010 1000.

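The byte values above can be checked with a short sketch. It assumes the two characters are "联" and "通" (rendered as "Link" and "Pass" in the translation) and uses Python's built-in "gbk" codec, a superset of GB2312, as a stand-in for the Windows ANSI code page on a Simplified Chinese system.

```python
text = "联通"

# Encode with GBK, standing in for the ANSI code page Notepad would use.
raw = text.encode("gbk")
print(raw.hex())  # c1aacda8

# Show each character with its two bytes in hex and binary.
for ch in text:
    encoded = ch.encode("gbk")
    bits = " ".join(f"{byte:08b}" for byte in encoded)
    print(ch, encoded.hex().upper(), bits)
```
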
Coincidentally, the ANSI codes of "联通" fit the second UTF-8 encoding template, the two-byte form 110xxxxx 10xxxxxx. In both "联" (Link) and "通" (Pass) the first byte starts with "110" and the second byte starts with "10", exactly as the UTF-8 rule for two-byte sequences requires, so when the file is opened again Notepad mistakenly concludes that it is UTF-8 encoded. That is the cause of the problem. Now decode the first character as UTF-8: remove the 110 from the first byte and the 10 from the second, and we get "00001 101010"; pad with leading zeros and we get "0000 0000 0110 1010". Unfortunately, this is Unicode 0x006A, the lowercase letter "j" (you can see the letter j using UltraEdit). Decoding the second pair of bytes as UTF-8 in the same way gives 0x0368, which is not an ordinary visible character, so garbled text is displayed.
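The bit manipulation described above can be written out as a minimal sketch. The helper `decode_two_byte` is a hypothetical name for this example; it deliberately applies only the 110xxxxx 10xxxxxx template and performs none of the validity checks a strict UTF-8 decoder would.

```python
def decode_two_byte(b1: int, b2: int) -> int:
    """Decode a 110xxxxx 10xxxxxx byte pair by hand into a code point."""
    if b1 & 0b11100000 != 0b11000000:
        raise ValueError("first byte does not start with 110")
    if b2 & 0b11000000 != 0b10000000:
        raise ValueError("second byte does not start with 10")
    # Keep the low 5 bits of the first byte and the low 6 bits of the second.
    return ((b1 & 0b00011111) << 6) | (b2 & 0b00111111)


first = decode_two_byte(0xC1, 0xAA)
second = decode_two_byte(0xCD, 0xA8)
print(f"{first:#06x} {chr(first)!r}")  # 0x006a 'j'
print(f"{second:#06x}")                # 0x0368
```

Note that 0xC1 0xAA is an overlong encoding of "j", so a strict decoder such as Python's own `bytes.decode("utf-8")` rejects it; the hand decoding here mirrors the lenient interpretation the text describes.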

This is why a file containing only the two characters "联通" is not displayed correctly in Notepad.

In other words, when every character in the document is encoded as a byte pair with C0 ≤ first byte ≤ DF and 80 ≤ second byte ≤ BF, Notepad cannot determine the encoding of the text and displays it as UTF-8. Applying this to "联通": C1 is within [C0, DF], CD is also within [C0, DF], and AA and A8 are both within [80, BF].
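The range rule above can be expressed as a small check. This is a simplified model of the condition as stated in the text, not a reconstruction of Notepad's actual detection logic, and the function name `looks_like_two_byte_utf8` is invented for the example.

```python
def looks_like_two_byte_utf8(data: bytes) -> bool:
    """True if data consists only of byte pairs in [C0, DF] then [80, BF]."""
    if len(data) == 0 or len(data) % 2 != 0:
        return False
    return all(
        0xC0 <= data[i] <= 0xDF and 0x80 <= data[i + 1] <= 0xBF
        for i in range(0, len(data), 2)
    )


print(looks_like_two_byte_utf8("联通".encode("gbk")))  # True
print(looks_like_two_byte_utf8("中国".encode("gbk")))  # False
```

The second call returns False because the GBK bytes of "中" are D6 D0, and D0 lies outside [80, BF].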

So it is not just the word "Unicom" that produces garbled text. You only need to find characters whose GBK encoding has a first byte within [C0, DF] and a second byte within [80, BF]. Unicom seems to be an innocent bystander caught in the crossfire here, so let's pick on it a little more: for example, the four characters "China Unicom" are still garbled. Haha, just kidding, enough of that. In fact, it is very easy to find Chinese characters that qualify.
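To see how easy it is to find such characters, one can simply enumerate every byte pair in the two ranges and keep those that decode under GBK. The sketch below assumes Python's built-in "gbk" codec is an adequate stand-in for the Windows code page; the function name `matching_gbk_chars` is hypothetical.

```python
def matching_gbk_chars() -> list[str]:
    """List characters whose GBK bytes are in [C0, DF] followed by [80, BF]."""
    found = []
    for b1 in range(0xC0, 0xDF + 1):
        for b2 in range(0x80, 0xBF + 1):
            try:
                found.append(bytes([b1, b2]).decode("gbk"))
            except UnicodeDecodeError:
                # Skip byte pairs the codec does not map to a character.
                pass
    return found


chars = matching_gbk_chars()
print(len(chars), "candidate characters, for example:", "".join(chars[:20]))
```

Any text built solely from characters in this list satisfies the byte-range condition described above.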

Create a Notepad file, enter the word "unknown", save and close the file, and then open it again: the text is garbled.

This is caused by Microsoft's failure to handle the encoding problem properly. As I recall, it is not only these two characters that come out garbled. Have you ever seen a person using Notepad or using a special encoding?
In short, this is a Windows system problem and there is no solution ~~

Garbled text in Notepad

Notepad saves the file in ANSI encoding, which gives the bytes C1 AA CD A8.
In binary that is 11000001 10101010 11001101 10101000, which happens to match the UTF-8 encoding pattern exactly. Notepad therefore mistakes the file for UTF-8 and decodes it as UTF-8. This flaw exists in Notepad. You can also enter aaaa aaa aaaaa to try it; the result is interesting too.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
