[Post] Vim Encoding

Source: Internet
Author: User

In Vim, there are four encoding-related options: fileencodings, fileencoding, encoding and termencoding. In practice, a wrong value for any one of them can cause garbled characters, so every Vim user should understand what these four options mean.

1. encoding

encoding is the character encoding Vim uses internally. Once encoding is set, all buffers, registers and strings in scripts inside Vim use this encoding. When Vim works with text whose encoding differs from its internal encoding, it first converts that text to the internal encoding. If the text contains characters that cannot be represented in the internal encoding, those characters are lost. Therefore, the internal encoding you choose must be able to represent enough characters that normal work is not affected.
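The conversion loss described above can be modeled in a few lines of Python. This is a sketch, not Vim's code: the helper `to_internal` is hypothetical, and Python's codecs stand in for Vim's own conversion. It converts text into a chosen internal encoding and drops every character that encoding cannot represent, which is why a narrow internal encoding such as latin1 loses Chinese text while utf-8 keeps everything.

```python
def to_internal(text: str, internal_encoding: str) -> str:
    """Hypothetical model of converting text into Vim's internal encoding.

    Characters the internal encoding cannot represent are dropped,
    mirroring the loss described above.
    """
    kept = []
    for ch in text:
        try:
            ch.encode(internal_encoding)  # can this character be represented?
            kept.append(ch)
        except UnicodeEncodeError:
            pass  # not representable: the character is lost
    return "".join(kept)


sample = "Vim 中文 café"
print(to_internal(sample, "latin1"))  # the Chinese characters are lost
print(to_internal(sample, "utf-8"))   # every character survives
```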

Because the encoding option determines the internal representation of every character in Vim, it should be set only once, when Vim starts; changing encoding while Vim is running can cause many problems. Unless you have a specific reason not to, always set encoding to utf-8. To avoid garbled menus and system messages on non-UTF-8 systems such as Windows, you can also add these settings:

   " Vim's internal encoding
   set encoding=utf-8
   " Language and encoding of the menus
   set langmenu=zh_CN.UTF-8
   " Language of Vim's messages
   language messages zh_CN.UTF-8

2. termencoding

termencoding is the encoding Vim uses for screen (terminal) output. When displaying text, Vim converts it from the internal encoding to termencoding before writing it out. If the internal text contains a character that cannot be converted to the screen encoding, that character is shown as a question mark, but editing is not affected. If termencoding is not set, the value of encoding is used and no conversion is performed.

For example, when you log in to Linux via Telnet from Windows, the Windows Telnet client uses GBK while Linux uses UTF-8, so Vim shows garbled characters in the Telnet session. There are two ways to solve this. The first is to change Vim's encoding to gbk; the second is to keep encoding as utf-8 and change termencoding to gbk, so that Vim converts the text only when displaying it. With the first method, if the file being edited contains characters that GBK cannot represent, those characters are lost. With the second method, such characters cannot be displayed, but they are not lost while editing.
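The two Telnet fixes can be compared with a small Python model. It is an illustration under assumptions, not Vim's implementation: the helpers `display` and `load` are hypothetical, a Python `str` plays the role of the buffer, and the emoji stands in for any character that GBK cannot represent. With termencoding=gbk the buffer keeps the character and only the screen shows `?`; with encoding=gbk the character is gone from the buffer itself, so it is also missing when the file is saved.

```python
def display(buffer: str, termencoding: str) -> bytes:
    """Model of termencoding: convert the buffer only for output.

    Unconvertible characters become '?' on screen; the buffer is untouched.
    """
    return buffer.encode(termencoding, errors="replace")


def load(text: str, encoding: str) -> str:
    """Model of a narrow internal encoding: unconvertible characters are dropped."""
    return text.encode(encoding, errors="ignore").decode(encoding)


file_text = "中文😀"  # assumed file content; the emoji is not representable in GBK

# Method 2: keep encoding=utf-8, set termencoding=gbk
screen = display(file_text, "gbk")   # the emoji is shown as '?'
saved_2 = file_text.encode("utf-8")  # the buffer is written back unchanged

# Method 1: set encoding=gbk
buffer_1 = load(file_text, "gbk")    # the emoji is lost from the buffer
saved_1 = buffer_1.encode("utf-8")

print(screen)
print(saved_2.decode("utf-8"))  # 中文😀
print(saved_1.decode("utf-8"))  # 中文
```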

gvim, the graphical version, does not display its text through a terminal, so termencoding has no meaning for it. In gvim under GTK2, termencoding is always utf-8 and cannot be changed, while gvim on Windows ignores termencoding.

3. fileencoding

When Vim reads a file from disk, it detects the file's encoding. If that encoding differs from Vim's internal encoding, Vim converts the text, and after the conversion it sets fileencoding to the file's encoding. When Vim writes the file back to disk, if encoding and fileencoding differ, Vim converts again. Therefore, after opening a file, you can set fileencoding to convert the file from one encoding to another when it is saved. However, as described above, fileencoding is set automatically when Vim detects the file's encoding on opening. So if a file is displayed with garbled characters, resetting fileencoding after opening it will not fix the garbling.
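The read and write conversions around fileencoding can be sketched in Python. The function `convert_file` is hypothetical and only approximates opening a file with the right encoding, running `:set fileencoding=utf-8` and writing it: the bytes are decoded into internal text once, then encoded with the new fileencoding on write. The second case shows why changing fileencoding cannot repair garbling: if the read step used the wrong encoding (latin1 here), the internal text is already wrong, and re-encoding it only saves the wrong characters.

```python
import tempfile
from pathlib import Path


def convert_file(path: Path, read_encoding: str, write_encoding: str) -> None:
    """Hypothetical sketch of reading with one encoding and writing with another."""
    text = path.read_bytes().decode(read_encoding)  # read: file bytes -> internal text
    path.write_bytes(text.encode(write_encoding))   # write: internal text -> fileencoding


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.txt"
    good.write_bytes("中文".encode("gbk"))
    convert_file(good, "gbk", "utf-8")              # correct read, then transcode
    good_result = good.read_bytes().decode("utf-8")

    bad = Path(tmp) / "bad.txt"
    bad.write_bytes("中文".encode("gbk"))
    convert_file(bad, "latin1", "utf-8")            # wrong read: text is already garbled
    bad_result = bad.read_bytes().decode("utf-8")

print(good_result)  # 中文
print(bad_result)   # ÖÐÎÄ (garbled, not 中文)
```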

4. fileencodings

Automatic encoding detection is controlled by fileencodings. Note the plural form. fileencodings is a comma-separated list in which each item is an encoding name. When a file is opened, Vim tries to decode it with each encoding in fileencodings in turn. If decoding succeeds, Vim uses that encoding and sets fileencoding to it; if it fails, Vim tries the next encoding. Therefore, when setting fileencodings, put strict encodings (those that are likely to fail on a file that is not actually in that encoding) at the front, and loose encodings at the back.

For example, latin1 is a very loose encoding: bytes produced by any encoding can be decoded as latin1 without failure, although the result is naturally garbled. Therefore, if you put latin1 first in fileencodings, every Chinese file you open will be garbled.
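The trial-decoding procedure of fileencodings, including the latin1 pitfall just described, can be approximated in Python. This is a simplified sketch, not Vim's code: `detect` is a hypothetical helper, Python's strict codecs stand in for Vim's conversion, and `ucs-bom` is modeled as "the data starts with a UTF-8, UTF-16 or UTF-32 byte order mark". The first entry that decodes without error wins, which is why a loose encoding placed early hides every entry after it.

```python
import codecs

# Byte order marks checked for the "ucs-bom" entry. The UTF-32 LE BOM
# starts with the UTF-16 LE BOM, so UTF-32 is checked first.
BOMS = [
    codecs.BOM_UTF8,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
]


def detect(data: bytes, fileencodings: list[str]) -> str:
    """Return the first entry of fileencodings that can decode data."""
    for enc in fileencodings:
        if enc == "ucs-bom":
            if any(data.startswith(bom) for bom in BOMS):
                return enc
            continue
        try:
            data.decode(enc)  # strict decoding raises on an illegal byte sequence
            return enc
        except UnicodeDecodeError:
            continue  # decoding failed: try the next encoding
    return ""  # no entry matched


recommended = ["ucs-bom", "utf-8", "cp936", "gb18030", "big5",
               "euc-jp", "euc-kr", "latin1"]
gbk_bytes = "中文".encode("gbk")
print(detect(gbk_bytes, recommended))               # cp936
print(detect(gbk_bytes, ["latin1"] + recommended))  # latin1: decodes, but garbled
```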

A commonly recommended fileencodings setting is:

   set fileencodings=ucs-bom,utf-8,cp936,gb18030,big5,euc-jp,euc-kr,latin1

Here ucs-bom is a very strict encoding: a file that is not in this encoding is almost never mistaken for it, so it comes first. utf-8 is also quite strict. Apart from very short files (a classic misdetection is the word "联通" ("Unicom") saved in GBK being taken for UTF-8), real files are almost never misdetected, so it comes second. Next are cp936 and gb18030, which are relatively loose; placed earlier, they would cause many misdetections, so they come after utf-8. cp936 has a smaller code space than gb18030, so cp936 goes before gb18030. big5, euc-jp and euc-kr are about as strict as cp936, so placing them after it means that files in those encodings will often be misdetected; this is a limitation of Vim's built-in detection mechanism that cannot be worked around. Since Chinese users rarely need to edit files in these encodings, we chose to guarantee correct detection of cp936 and gb18030 instead. latin1 is an extremely loose encoding, so it has to come last. Unfortunately, this means that a file that really is in latin1 rarely gets the chance to fall back to latin1; in most cases it is misdetected as one of the earlier encodings. As mentioned above, however, Chinese users seldom deal with such files.

If the wrong encoding is chosen, the decoded text is unreadable, and we say the file is garbled. If you know the file's correct encoding, you can open it with ++enc=encoding, for example:

   :e ++enc=utf-8 myfile.txt

5. fencview

As explained above, Vim's built-in encoding detection has a low recognition rate, especially when it must tell apart Simplified Chinese (GBK/gb18030), Traditional Chinese (big5), Japanese (euc-jp) and Korean (euc-kr). For ordinary users, identifying a file's encoding by eye is unrealistic. Therefore, we strongly recommend the fencview plugin developed by mbbill of the Shui Mu community. This plugin identifies the encoding using character frequency statistics and is highly accurate.
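The general idea of frequency-based detection can be illustrated with a toy scorer. This is not fencview's algorithm, which the text does not describe: the function `score_detect`, the candidate list and the tiny table of common characters are all assumptions made for this example. Each candidate decoding is scored by how many of its characters appear in the common-character table, and the highest score wins, so a decoding that succeeds but yields unusual characters loses to one that yields everyday text.

```python
# Assumed stand-in for a real frequency table: a handful of very common characters.
COMMON_CHARS = set("的是我们中文了不在有人这")


def score_detect(data: bytes, candidates: list[str]) -> str:
    """Toy frequency-based detector: pick the decoding with the most common characters."""
    def score(enc: str) -> int:
        text = data.decode(enc, errors="replace")  # undecodable bytes become U+FFFD
        return sum(ch in COMMON_CHARS for ch in text)

    return max(candidates, key=score)  # ties go to the earlier candidate


sample = "我们的中文文件".encode("gbk")
print(score_detect(sample, ["big5", "gbk", "utf-8"]))  # gbk
```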
