On the difference between ANSI, Unicode and UTF-8
A very good article and well worth a look, so I am reposting it here.
About the difference between the encodings ANSI, GB2312, Unicode and UTF-8
Let's do a little experiment first:
Put a txt file in a folder. The file contains a Chinese sentence meaning "The weather is very good today". Save it three times, once each as an ANSI, a Unicode and a UTF-8 encoded txt file. Then right-click the folder and select "Search (E)...".
Search for the word "weather". The search finds the ANSI and the Unicode encoded txt files, but it does not find the UTF-8 encoded file.
Reason:
1. A Chinese operating system uses ANSI encoding by default, and the txt files it creates are ANSI-encoded by default, so the search can find them.
2. Unicode is the international universal encoding, so the search can find it too.
3. UTF-8 is a "workaround" or "bridge" form of Unicode. It is meant for transmitting Unicode text across networks, primarily in web pages, and it can reduce the amount of data transferred. Therefore, the operating system's search cannot find the UTF-8 txt file.
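One way to see why the result depends on the encoding is that the keyword has a different byte pattern in each file. A tool that searches raw bytes without knowing a file's encoding only matches files in the encoding it assumed. The sketch below is illustrative only and does not claim to reproduce how Windows Search itself works. It makes several assumptions: the sentence is 今天天气很好, the keyword is 天气 ("weather"), "ANSI" means code page 936, and Notepad's "Unicode" means UTF-16 little endian. The helper names are hypothetical.

```python
# Illustrative sketch (hypothetical helpers): one sentence saved in three
# encodings, then a naive byte-level search for the keyword.
# Assumptions: "ANSI" on Simplified Chinese Windows = code page 936 (cp936),
# Notepad's "Unicode" = UTF-16 little endian.
SENTENCE = "今天天气很好"
KEYWORD = "天气"  # "weather"

ENCODINGS = {
    "ANSI (cp936)": "cp936",
    "Unicode (UTF-16LE)": "utf-16-le",
    "UTF-8": "utf-8",
}


def file_bytes(encoding: str) -> bytes:
    """Bytes a file would contain if SENTENCE were saved with `encoding`."""
    return SENTENCE.encode(encoding)


def naive_search(data: bytes, pattern: bytes) -> bool:
    """Plain substring search on raw bytes, with no knowledge of the encoding."""
    return pattern in data


if __name__ == "__main__":
    # The same text produces three different byte sequences.
    for label, enc in ENCODINGS.items():
        print(label, file_bytes(enc).hex(" "))
    # A search that assumes the ANSI byte pattern only matches the ANSI file.
    ansi_pattern = KEYWORD.encode("cp936")
    for label, enc in ENCODINGS.items():
        print(label, naive_search(file_bytes(enc), ansi_pattern))
```

A search tool that decodes each file with the right encoding before matching would find the keyword in all three. So whether a file is "searchable" depends on whether the tool recognizes that file's encoding.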
As the creators of UTF-8 intended:
End (Unicode)--Transfer (UTF-8)--End (Unicode)
Later, however, many Web developers used UTF-8 encoding directly when developing Web pages.
End (UTF-8)--Transfer (UTF-8)--End (UTF-8)
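The two pipelines above can be sketched in Python. This is a minimal illustration and does not model any real network stack. `send_over_network` is a hypothetical stand-in that returns its input unchanged, and "Unicode" at the ends is assumed to mean UTF-16 little-endian bytes. The sketch also shows where the savings in transit come from. Markup is mostly ASCII, which takes one byte per character in UTF-8 but two in UTF-16. Chinese characters, by contrast, take three bytes in UTF-8 and two in UTF-16.

```python
def send_over_network(payload: bytes) -> bytes:
    """Hypothetical stand-in for a transfer: the same bytes come out."""
    return payload


def original_intent(text: str) -> bytes:
    """End (Unicode) -- Transfer (UTF-8) -- End (Unicode), assuming UTF-16LE ends."""
    stored = text.encode("utf-16-le")                     # End (Unicode)
    wire = stored.decode("utf-16-le").encode("utf-8")     # Transfer (UTF-8)
    received = send_over_network(wire)
    return received.decode("utf-8").encode("utf-16-le")   # End (Unicode)


def common_practice(text: str) -> str:
    """End (UTF-8) -- Transfer (UTF-8) -- End (UTF-8): no conversion step."""
    return send_over_network(text.encode("utf-8")).decode("utf-8")


page = "<p>严</p>"
print(original_intent(page) == page.encode("utf-16-le"), common_practice(page) == page)  # True True
# 7 ASCII characters plus one Chinese character.
print(len(page.encode("utf-8")), len(page.encode("utf-16-le")))  # 10 16
```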
This is why browsers list the encoding as "Unicode (UTF-8)". Because the browser's menu places Unicode and UTF-8 side by side like this, many users (and even many programmers) mistakenly believe that Unicode = UTF-8. In fact, by the original intent of UTF-8, using UTF-8 directly when developing web pages was wrong, and early browsers could not parse UTF-8 at all. But the weight of common practice was huge. Microsoft had to jump on the bandwagon and add UTF-8 parsing to its browser.
The result is that UTF-8 has shaped web development, or rather, web developers have "expanded" the use of UTF-8. However, web developers still cannot influence the developers of every kind of document. Word documents and some international documents therefore still use Unicode rather than UTF-8.
For example, the Chinese character for "strict" (严) has the Unicode code point 4E25, while its UTF-8 encoding is E4B8A5. The two are not the same.
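Here is the arithmetic behind that example. UTF-8 spreads a code point between U+0800 and U+FFFF over three bytes, using the bit pattern 1110xxxx 10xxxxxx 10xxxxxx. For U+4E25 (binary 0100 111000 100101), that gives E4 B8 A5. The hand-written encoder below is a sketch of the UTF-8 rules from RFC 3629, checked against Python's built-in `str.encode`. The function name is illustrative.

```python
def utf8_encode_code_point(cp: int) -> bytes:
    """Encode one Unicode scalar value by hand, following the UTF-8 bit layout."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


cp = ord("严")
print(f"U+{cp:04X}", utf8_encode_code_point(cp).hex().upper())  # U+4E25 E4B8A5
```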
The TXT and XML files generated on Chinese and Japanese operating systems are ANSI-encoded. But on a Simplified Chinese system, ANSI means GB2312 (strictly, its superset GBK, code page 936), while on a Japanese system it means Shift-JIS (code page 932). These ANSI encodings are incompatible with one another. So when information is exchanged internationally, text in two such languages cannot be stored in a single ANSI-encoded file.
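Here is a sketch of that incompatibility. It assumes that Python's `gb2312` and `shift_jis` codecs can stand in for the Simplified Chinese and Japanese ANSI code pages. The GB2312 bytes for 严, read back as Shift-JIS, come out as different characters. A string that mixes Chinese with a script GB2312 does not cover cannot be encoded in GB2312 at all, while UTF-8 handles it. Korean Hangul is used here only as an illustration of such a script. The `fits` helper is hypothetical.

```python
def fits(text: str, encoding: str) -> bool:
    """Hypothetical helper: can `text` be stored in `encoding` without loss?"""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Same bytes, two "ANSI" code pages, two different readings.
gb_bytes = "严".encode("gb2312")
as_japanese = gb_bytes.decode("shift_jis", errors="replace")
print(gb_bytes.hex(" "), repr(as_japanese))
print(as_japanese == "严")  # False

# A Chinese character followed by a Korean (Hangul) syllable.
mixed = "严한"
print(fits(mixed, "gb2312"), fits(mixed, "utf-8"))  # False True
```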
Conclusion: using Unicode for international documents (TXT and XML) is the proper practice, because both the operating system and the browser "understand" Unicode. The browser was "pressured" into understanding UTF-8, whereas the operating system sometimes recognizes only Unicode.
The difference between Unicode and Unicode big endian: do you crack an egg at the big end or the little end? The two differ only in byte order, that is, in whether the low-order byte (little endian) or the high-order byte (big endian) of each code unit is stored first. In practice, either form of Unicode works fine.
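To make the byte-order difference concrete, U+4E25 is stored as 25 4E in little-endian UTF-16 and as 4E 25 in big-endian UTF-16. Files often begin with a byte order mark (FF FE or FE FF) so that a reader can tell which order was used. The sketch assumes that Notepad's "Unicode" option means UTF-16 little endian and "Unicode big endian" means UTF-16 big endian. That mapping is an assumption and is not stated in the article.

```python
import codecs

text = "严"  # U+4E25
le = text.encode("utf-16-le")  # assumed "Unicode": low-order byte first
be = text.encode("utf-16-be")  # assumed "Unicode big endian": high-order byte first
print(le.hex(" "), "|", be.hex(" "))  # 25 4e | 4e 25

# With a byte order mark in front, a decoder can detect the order itself.
with_bom_le = codecs.BOM_UTF16_LE + le
with_bom_be = codecs.BOM_UTF16_BE + be
print(with_bom_le.hex(" "), with_bom_be.hex(" "))  # ff fe 25 4e fe ff 4e 25
print(with_bom_le.decode("utf-16") == with_bom_be.decode("utf-16") == text)  # True
```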