Type "联通" (Unicom) in Notepad, save, close, and reopen the file: garbled text
Here is a curious one: type the word "联通" (Unicom) in Notepad, save and close the file, then open it again, and the text comes back garbled. To explain this we need to talk about ANSI. Different countries and regions developed their own standards, which produced GB2312, BIG5, JIS, and other encodings; Windows refers to these legacy code pages collectively as ANSI. These encodings have no fixed signature that identifies them. UTF-8, by contrast, has a very recognizable format: the high bits of each byte follow fixed patterns such as 0, 110, 1110, and 10.
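The UTF-8 bit patterns mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name `utf8_byte_kind` and its labels are made up for this example, and it looks at the high bits only, whereas a strict UTF-8 validator applies further rules on top of these patterns.

```python
def utf8_byte_kind(b: int) -> str:
    """Classify one byte by its high bits (hypothetical helper for illustration)."""
    if (b & 0x80) == 0x00:
        return "single"        # 0xxxxxxx
    if (b & 0xC0) == 0x80:
        return "continuation"  # 10xxxxxx
    if (b & 0xE0) == 0xC0:
        return "lead-2"        # 110xxxxx
    if (b & 0xF0) == 0xE0:
        return "lead-3"        # 1110xxxx
    if (b & 0xF8) == 0xF0:
        return "lead-4"        # 11110xxx
    return "invalid"

# The letter "A", followed by the four bytes C1 AA CD A8.
kinds = [utf8_byte_kind(b) for b in (0x41, 0xC1, 0xAA, 0xCD, 0xA8)]
print(kinds)
# ['single', 'lead-2', 'continuation', 'lead-2', 'continuation']
```

Judged by bit patterns alone, the bytes C1 AA CD A8 look like two well-formed two-byte sequences.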
When Notepad saves a file, the default encoding on Windows is ANSI (GB2312 on Chinese systems).
So the internal binary codes of the two characters are as follows (each character occupies two bytes):
"联" ("Link"): ANSI code 0xC1AA, binary 1100 0001 1010 1010;
"通" ("Pass"): ANSI code 0xCDA8, binary 1100 1101 1010 1000;
Coincidentally, the ANSI code of the word "联通" fits the second UTF-8 encoding template, the two-byte form 110xxxxx 10xxxxxx. The two bytes of both "联" and "通" begin with "110" and "10" respectively, which is exactly the two-byte template in the UTF-8 rules, so when the file is opened again Notepad mistakenly takes it for a UTF-8 file (this is the root cause). Strip the 110 from the first byte and the 10 from the second and we get "00001 101010"; pad with leading zeros and we get "0000 0000 0110 1010". Unfortunately, this is Unicode 0x006A, the lowercase letter "j" (you can see the letter j in UltraEdit). The next two bytes, decoded the same way, give 0x0368, which is a combining mark with no visible glyph of its own, so garbled characters are displayed.
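The bit stripping described above can be reproduced with a minimal Python sketch. The helper name `decode_two_byte` is hypothetical, and the template is applied by hand on purpose: a strict decoder such as Python's `bytes.decode("utf-8")` rejects 0xC1 as a lead byte, so it would not show the lenient behaviour discussed here.

```python
def decode_two_byte(lead: int, trail: int) -> int:
    """Apply the 110xxxxx 10xxxxxx template with no further validity checks."""
    if (lead & 0xE0) != 0xC0 or (trail & 0xC0) != 0x80:
        raise ValueError("bytes do not fit the two-byte template")
    # Keep the low 5 bits of the lead byte and the low 6 bits of the trail byte.
    return ((lead & 0x1F) << 6) | (trail & 0x3F)

first = decode_two_byte(0xC1, 0xAA)   # bytes of the first character
second = decode_two_byte(0xCD, 0xA8)  # bytes of the second character

print(f"{first:016b}", hex(first), chr(first))  # 0000000001101010 0x6a j
print(hex(second))                              # 0x368
```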
This is why a file containing only the word "联通" cannot be displayed correctly in Notepad.
Put more generally: when every character in the document is encoded as a byte pair with C0 ≤ first byte ≤ DF and 80 ≤ second byte ≤ BF, Notepad cannot determine the encoding of the text and displays it as UTF-8. In the case of "联通", C1 and CD both lie in [C0, DF], and AA and A8 both lie in [80, BF].
So it is not only the word "联通" that ends up garbled. You only need to find two characters whose GBK codes have a first byte in [C0, DF] and a second byte in [80, BF]. Poor Unicom just got caught in the crossfire. Sticking with China Unicom as the example anyway, the four characters "China Unicom" still come out garbled. Haha, enough of that, just kidding. In fact, qualifying Chinese characters are very easy to find.
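The range rule above is easy to test in Python. The sketch below is an assumption-laden illustration, not Notepad's actual detection logic: the function name `looks_like_two_byte_utf8` is made up, and Python's `gb2312` codec is assumed to match the ANSI code page in question.

```python
def looks_like_two_byte_utf8(raw: bytes) -> bool:
    """True if raw consists only of pairs with lead in [C0, DF], trail in [80, BF]."""
    if len(raw) == 0 or len(raw) % 2 != 0:
        return False
    return all(
        0xC0 <= raw[i] <= 0xDF and 0x80 <= raw[i + 1] <= 0xBF
        for i in range(0, len(raw), 2)
    )

print(looks_like_two_byte_utf8("联通".encode("gb2312")))  # True
print(looks_like_two_byte_utf8("中文".encode("gb2312")))  # False

# Collect every GB2312 character whose two bytes fall inside both ranges.
# GB2312 trail bytes start at A1, so only A1..BF needs to be scanned.
candidates = []
for lead in range(0xC0, 0xE0):
    for trail in range(0xA1, 0xC0):
        try:
            candidates.append(bytes([lead, trail]).decode("gb2312"))
        except UnicodeDecodeError:
            pass  # code point not assigned in this codec

print(len(candidates), "qualifying characters found")
```

Any word built only from characters in `candidates` would satisfy the rule described above.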
Create a new Notepad file, enter the word "unknown", save and close the file, then open it again: the text is garbled.
This is caused by Microsoft's failure to handle encoding detection properly. As I recall, these two characters are not the only ones that come out garbled. Has anyone else run into this with Notepad or with some particular encoding?
In short, this is a Windows problem, and there is no fix.
Garbled text in Notepad
Notepad saves the file in ANSI encoding, so the bytes are C1 AA CD A8.
That is, 11000001 10101010 11001101 10101000, which happens to match the UTF-8 encoding pattern exactly. Notepad therefore mistakes the file for UTF-8 and decodes it that way. This flaw is in Notepad itself. You can also enter aaaa aaa aaaaa to try it; the result is interesting too.
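The byte values and the bit string above can be checked with a short Python sketch. It assumes Python's built-in `gb2312` codec stands in for the ANSI code page Notepad uses on a Simplified Chinese system; the variable names are illustrative only.

```python
# Encode the two characters the way an ANSI (GB2312) save is assumed to.
raw = "联通".encode("gb2312")

hex_view = raw.hex()                          # bytes as hexadecimal
bit_view = "".join(f"{b:08b}" for b in raw)   # bytes as one bit string

print(hex_view)  # c1aacda8
print(bit_view)  # 11000001101010101100110110101000
```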