Background of the problem I ran into:
On Windows, I used Notepad++ 6.7 over an FTP connection to a remote Ubuntu host. I created a file for the remote host locally, edited it, and after uploading it the Chinese text came out garbled.
At first I could not understand the problem, because I had already set Notepad++ to use UTF-8 as the default encoding for new files, and it still did not help.
I investigated the problem step by step:
- Create a new file containing plain English HTML and upload it to the Ubuntu host. At this point, running :set fileencoding in Vi reports UTF-8.
- Add a few Chinese characters to the same file, save it, and upload it to the Ubuntu host again. This time fileencoding reports latin1, and the Chinese portion is garbled.
- Many people online suggest setting a meta tag, most commonly the charset=utf-8 attribute. I tried declaring it too, but it did not solve the garbling.
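To see why Vi switches to latin1, here is a minimal Python sketch that roughly imitates how Vim picks fileencoding: it tries candidate encodings in order and keeps the first one that decodes the bytes without error. It assumes a candidate list like Vim's common default (ucs-bom,utf-8,default,latin1 under a UTF-8 locale), and it assumes the editor actually wrote the file as ANSI (GBK) despite the UTF-8 setting; guess_fileencoding is a hypothetical helper, not Vim code.

```python
import codecs


def guess_fileencoding(data: bytes) -> str:
    """Hypothetical helper: roughly mimic Vim's 'fileencodings' fallback order.

    Assumes candidates like "ucs-bom,utf-8,default,latin1" under a UTF-8 locale.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 (with BOM)"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # latin1 maps every byte to some character, so it can never fail.
        return "latin1"


# Assumption: the editor saved both versions as ANSI, i.e. GBK on Chinese Windows.
english_only = "<p>hello</p>".encode("gbk")
with_chinese = "<p>你好</p>".encode("gbk")

print(guess_fileencoding(english_only))  # utf-8
print(guess_fileencoding(with_chinese))  # latin1
```

Pure ASCII bytes are valid UTF-8, so the first file passes the UTF-8 check; the GBK bytes for the Chinese characters are not valid UTF-8, so the sketch falls through to latin1, matching what Vi reported.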
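As I saw it, the crux of the problem was this: the same file is UTF-8 before any Chinese is written into it, and latin1 afterwards. (Vim most likely reports latin1 because the saved bytes are no longer valid UTF-8, so it falls back to latin1, which accepts any byte sequence.)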
After a round of searching on Baidu, I found that Notepad++ has a "Format (M)" menu (called "Encoding" in the English interface) with many options, split into two large groups: "Encode in XXX" and "Convert to XXX". This is the key to solving the problem.
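The difference between the two groups can be sketched in Python, assuming the usual reading of these commands: "Encode in X" keeps the bytes on disk and only re-reads them as X, while "Convert to X" decodes with the current encoding and writes new bytes in X. The functions encode_in and convert_to are hypothetical illustrations, not Notepad++ code.

```python
def encode_in(raw: bytes, encoding: str) -> str:
    """'Encode in X' (assumed behavior): reread the same bytes as X."""
    return raw.decode(encoding, errors="replace")


def convert_to(raw: bytes, current: str, target: str) -> bytes:
    """'Convert to X' (assumed behavior): decode with the real encoding, re-encode as X."""
    return raw.decode(current).encode(target)


raw = "中文".encode("gbk")  # an ANSI (GBK) file from Chinese Windows

print(encode_in(raw, "utf-8"))  # bytes unchanged, shown as replacement characters
fixed = convert_to(raw, "gbk", "utf-8")
print(fixed.decode("utf-8"))  # 中文
```

Under these assumed semantics, when the bytes on disk are GBK, only a "Convert to UTF-8" style command actually changes what gets uploaded; "Encode in UTF-8" just changes how the editor displays the existing bytes.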
Since I have run into garbled Chinese several times, and so many people online keep struggling with it, I have summarized the underlying principles below.
The discussion covers three parts: the Windows operating system, the Linux operating system, and browser parsing.
First, understand that the default character encoding on Windows differs from the one on Linux: usually the former is ANSI and the latter is UTF-8.
Garbled text when files move between platforms is caused by this difference in default encodings. In the example above, with plain English text, ANSI and UTF-8 produce exactly the same bytes, so there is no problem.
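A quick Python check of the plain-English case: ASCII characters have the same single-byte codes in GBK (assumed here as the ANSI code page on Chinese Windows) and in UTF-8, so the uploaded bytes are identical either way. The sample string is made up for illustration.

```python
text = "<html><body>Hello, world</body></html>"

gbk_bytes = text.encode("gbk")     # what an ANSI save produces on Chinese Windows
utf8_bytes = text.encode("utf-8")  # what a UTF-8 save produces

print(gbk_bytes == utf8_bytes)  # True
```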
Chinese, however, is a different story. On Chinese-language Windows, ANSI means the GBK code page (936), which encodes Chinese characters with different bytes than UTF-8 does. When an ANSI-encoded file is moved to Linux and decoded as UTF-8, the result is a pile of garbled characters.
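The same two characters make the difference visible. This Python sketch prints the GBK bytes (the assumed ANSI code page) next to the UTF-8 bytes, then decodes the GBK bytes as UTF-8, which is roughly what happens when the file is read on the Linux side. bytes.hex with a separator needs Python 3.8 or later.

```python
chinese = "中文"

gbk_bytes = chinese.encode("gbk")     # ANSI save on Chinese Windows (code page 936)
utf8_bytes = chinese.encode("utf-8")  # what Linux tools expect

print(gbk_bytes.hex(" "))   # d6 d0 ce c4
print(utf8_bytes.hex(" "))  # e4 b8 ad e6 96 87

# Reading the GBK bytes as UTF-8 gives replacement characters instead of 中文.
print(gbk_bytes.decode("utf-8", errors="replace"))
```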
In addition, even when both Windows and Linux use UTF-8, there is a slight difference: the byte order mark (BOM). Windows editors may save UTF-8 with a BOM, while Linux files normally have none (see other articles for details).
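The BOM is just three extra bytes at the start of the file. This Python sketch uses the standard "utf-8-sig" codec to produce UTF-8 with a BOM and compares it with plain UTF-8; the sample text is arbitrary.

```python
import codecs

text = "中文"

with_bom = text.encode("utf-8-sig")  # UTF-8 with BOM, as some Windows editors save it
without_bom = text.encode("utf-8")   # UTF-8 without BOM, the usual form on Linux

print(with_bom[:3].hex(" "))                      # ef bb bf
print(with_bom == codecs.BOM_UTF8 + without_bom)  # True

# A plain UTF-8 decoder keeps the BOM as an invisible U+FEFF character.
print(repr(with_bom.decode("utf-8")[0]))  # '\ufeff'
```

That leftover U+FEFF is why a BOM can show up as stray output or break things on Linux; for example, a BOM in front of a script's #! line stops the kernel from recognizing the shebang.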
Summary:
- To make files display correctly on both platforms, the key point is: on Windows, save files as UTF-8, preferably without BOM. Do not save Chinese text in any other encoding; if, say, you save it as ANSI and later want to convert it to another encoding, the Chinese is already garbled, so the converted text will most likely be garbled too.
- Before uploading a file to the Linux host, check it: convert it to UTF-8 (without BOM) first and only then upload it, so that nothing goes wrong.
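The pre-upload check can be sketched as a small Python script. It assumes that any file which is not valid UTF-8 was saved as GBK (the usual ANSI code page on Chinese Windows); normalize_to_utf8 and convert_file are hypothetical helpers, and the file name in the demo is made up.

```python
import codecs
import tempfile
from pathlib import Path


def normalize_to_utf8(data: bytes, fallback_encoding: str = "gbk") -> tuple[bytes, str]:
    """Return (UTF-8 bytes without BOM, encoding the input was read as).

    Assumption: input that is not valid UTF-8 is in `fallback_encoding`.
    """
    if data.startswith(codecs.BOM_UTF8):
        text, source = data[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8 with BOM"
    else:
        try:
            text, source = data.decode("utf-8"), "utf-8"
        except UnicodeDecodeError:
            text, source = data.decode(fallback_encoding), fallback_encoding
    return text.encode("utf-8"), source


def convert_file(path: Path, fallback_encoding: str = "gbk") -> str:
    """Rewrite `path` in place as UTF-8 without BOM; return the detected source encoding."""
    new_data, source = normalize_to_utf8(path.read_bytes(), fallback_encoding)
    # write_bytes avoids newline translation, so only the encoding changes.
    path.write_bytes(new_data)
    return source


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp) / "index.html"  # hypothetical file name
        page.write_bytes("<p>中文</p>".encode("gbk"))
        print(convert_file(page))                 # gbk
        print(page.read_bytes().decode("utf-8"))  # <p>中文</p>
```

The UTF-8-first guess is a heuristic: a short GBK byte sequence can occasionally also be valid UTF-8. When the source encoding is known, converting explicitly on Linux with iconv -f GBK -t UTF-8 is a more direct alternative.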
Second, the meta tag, most commonly with the attribute charset=utf-8.
This kind of garbling comes from a mismatch between the encoding of the text file itself and the encoding the browser uses to parse it, which is a different concept from the garbling described in the first part.
- Example 1: You visit a UTF-8 web page that does not specify <meta charset=utf-8>. The browser may well default to parsing the page as GBK. Result: UTF-8 encoded text decoded as GBK, which is garbled.
- Example 2: You visit a UTF-8 web page that does specify <meta charset=utf-8>. The browser may start parsing the page as GBK by default, but when it encounters the meta tag it knows to parse it as UTF-8. Result: UTF-8 encoded text correctly decoded as UTF-8, no problem.
- PS: In Example 2 the Chinese may still come out garbled. Why? Because in the UTF-8 file on the server, the Chinese text was already garbled. How does that happen? This is exactly where the two causes of garbled Chinese meet: when the code written on Windows was transferred to the server host, the encoding mismatch between the two systems had already garbled the Chinese. When the browser later requests the page, it does not matter whether it parses with the correct encoding or a wrong one; the result is wrong either way. This is the root of the problem.
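The three cases can be reproduced with a deliberately simplified "browser" in Python. browser_decode is a hypothetical helper: it honors a meta charset in the first 1024 bytes and otherwise falls back to GBK as the assumed locale default. Real browsers follow the fuller HTML encoding sniffing algorithm (BOM, the HTTP Content-Type header, the meta prescan, then a locale-dependent default), so treat this as a sketch only.

```python
import codecs
import re

META_CHARSET = re.compile(rb"""<meta[^>]*charset=["']?([\w-]+)""", re.IGNORECASE)


def browser_decode(body: bytes, default: str = "gbk") -> str:
    """Very rough sketch: honor <meta charset> in the first 1024 bytes, else use `default`."""
    match = META_CHARSET.search(body[:1024])
    encoding = match.group(1).decode("ascii") if match else default
    try:
        codecs.lookup(encoding)
    except LookupError:
        encoding = default  # unknown label: fall back to the default
    return body.decode(encoding, errors="replace")


original = "<p>中文</p>"

# Example 1: UTF-8 page without a meta tag; the sketch decodes it as GBK.
print(browser_decode(original.encode("utf-8")))

# Example 2: UTF-8 page with <meta charset=utf-8>; decoded correctly.
print(browser_decode(("<meta charset=utf-8>" + original).encode("utf-8")))

# PS: hypothetical server file damaged before any browser sees it,
# e.g. GBK bytes misread as latin1 and then saved as UTF-8.
damaged = ("<meta charset=utf-8>" + original).encode("gbk").decode("latin1").encode("utf-8")
print(browser_decode(damaged))  # meta tag is honored, yet 中文 is gone
```

In the last case the meta tag is found and UTF-8 is used, but the characters that were stored are already the wrong ones, which is the point of the PS above.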
Summary: when transferring files between systems, use the correct encoding so that the file itself is not corrupted; when a client accesses the page, use the meta tag to tell the browser how the file is encoded so that it is not parsed incorrectly. With both steps in place, garbled Chinese generally will not occur.
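Both steps can be folded into one pre-upload check. This is a minimal Python sketch with a hypothetical helper, upload_problems, which assumes an HTML file should be UTF-8 without BOM and should declare charset=utf-8 in a meta tag within its first 1024 bytes; it only reports problems and changes nothing.

```python
import codecs
import re


def upload_problems(html: bytes) -> list[str]:
    """Hypothetical check covering both steps: file encoding and the meta declaration."""
    problems = []
    if html.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 BOM")
    try:
        html.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8 (maybe saved as ANSI/GBK)")
    if not re.search(rb"""<meta[^>]*charset=["']?utf-8""", html[:1024], re.IGNORECASE):
        problems.append("no <meta charset=utf-8> in the first 1024 bytes")
    return problems


print(upload_problems("<meta charset=utf-8><p>中文</p>".encode("utf-8")))  # []
print(upload_problems("<p>中文</p>".encode("gbk")))  # two problems reported
```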
A summary of garbled Chinese text problems in program development on Windows