If your computer runs Windows 2000 or Windows XP, try this:
1. Right-click on the desktop and choose New > Text Document;
2. Open "New Text Document", type "移动" (China Mobile), save, and exit;
3. Open "New Text Document" again. What do you see?
4. Is it the word "移动" you just entered?
Now let's swap the word:
1. Right-click on the desktop and choose New > Text Document;
2. Open "New Text Document", type "联通" (China Unicom), save, and exit;
3. Open "New Text Document" again. What do you see?
4. Is the word "联通" you just entered missing? Does it look like a burning cell phone battery instead?
When you create a new file in Windows Notepad, type the word "联通" (China Unicom), save, close, and open it again, you will find that the two characters have disappeared, replaced by a few garbled characters! Some people joke that this is why China Unicom cannot compete with China Mobile.
In fact, this happens because the GB2312 encoding collides with the UTF-8 encoding.
When you create a text file, Notepad's default encoding is ANSI. If you type Chinese characters under ANSI encoding, they are actually stored in a GB-series encoding. In this encoding, the internal code of "联通" (China Unicom) is:
C1 1100 0001
AA 1010 1010
CD 1100 1101
A8 1010 1000
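The byte table above can be reproduced with a short sketch. It assumes Python's built-in "gbk" codec is a fair stand-in for the GB-series encoding that Notepad's ANSI mode uses on a Simplified Chinese system; the variable names are only illustrative.

```python
# Assumption: the "gbk" codec stands in for Notepad's GB-series ANSI encoding.
text = "联通"
raw = text.encode("gbk")

# Print each byte as hex followed by its two 4-bit halves in binary.
rows = [f"{b:02X} {b >> 4:04b} {b & 0x0F:04b}" for b in raw]
for row in rows:
    print(row)
```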
Have you noticed? The first and third bytes start with "110", and the second and fourth bytes start with "10", which exactly matches the two-byte template in the UTF-8 rules (110xxxxx 10xxxxxx).
So when the file is opened again, Notepad mistakenly takes it for a UTF-8 encoded file. Strip the "110" from the first byte and the "10" from the second byte,
and we get "00001 101010"; pad it with leading zeros and you get "0000 0000 0110 1010". Sorry,
this is Unicode 006A, the lowercase letter "j". The next two bytes, decoded as UTF-8, give 0368, which is a combining mark rather than an ordinary visible character. This is why a file containing only the word "联通" (China Unicom) cannot be displayed correctly in Notepad.
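The bit arithmetic above can be checked with a minimal sketch. The helper name decode_two_byte is hypothetical; it applies the two-byte template mechanically and deliberately skips the validity checks a strict decoder would perform.

```python
def decode_two_byte(lead: int, trail: int) -> int:
    """Apply the UTF-8 two-byte template 110xxxxx 10xxxxxx, with no validity checks."""
    if lead >> 5 != 0b110 or trail >> 6 != 0b10:
        raise ValueError("bytes do not match the two-byte template")
    # Keep the low 5 bits of the lead byte and the low 6 bits of the trail byte.
    return ((lead & 0b0001_1111) << 6) | (trail & 0b0011_1111)

first = decode_two_byte(0xC1, 0xAA)
second = decode_two_byte(0xCD, 0xA8)
print(f"U+{first:04X} U+{second:04X}")  # U+006A U+0368
```

Note that a strict decoder does not accept the first pair: in Python, bytes([0xC1, 0xAA]).decode("utf-8") raises UnicodeDecodeError, because a two-byte form of a code point below U+0080 is an overlong encoding.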
If you type a few more characters after "联通" (China Unicom), their encodings will not necessarily start with 110 or 10, so Notepad will no longer insist that this is a UTF-8 encoded file. It will interpret the file as ANSI instead, and the garbled characters will not appear.
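The guess described here can be illustrated with a rough sketch. The function looks_like_utf8 is a hypothetical, lenient structural check written for this example; it is not Notepad's actual detection code, and the "gbk" codec is again assumed as the ANSI encoding.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Lenient check: each lead byte must be followed by the right number of
    10xxxxxx continuation bytes. Overlong forms are NOT rejected (assumption)."""
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            need = 0            # plain ASCII byte
        elif b >> 5 == 0b110:
            need = 1            # 110xxxxx: one continuation byte
        elif b >> 4 == 0b1110:
            need = 2            # 1110xxxx: two continuation bytes
        elif b >> 3 == 0b11110:
            need = 3            # 11110xxx: three continuation bytes
        else:
            return False        # stray continuation or invalid lead byte
        tail = data[i + 1 : i + 1 + need]
        if len(tail) != need or any(t >> 6 != 0b10 for t in tail):
            return False
        i += 1 + need
    return True

only_unicom = "联通".encode("gbk")
with_more = "联通移动".encode("gbk")
print(looks_like_utf8(only_unicom))  # True: all four bytes fit the template
print(looks_like_utf8(with_more))    # False: the added bytes break the pattern
```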