Let Vim know more about encoding.

With a UTF-8 locale, opening a GB2312-encoded file in Vim shows garbled text, while gedit displays the same file correctly. Is Vim really worse than gedit? Hardly: recognizing an encoding should be a piece of cake for Vim. Here is the solution:

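Edit ~/.vimrc and add the following three lines:
set encoding=utf-8
set fileencodings=ucs-bom,utf-8,gb2312,gb18030,gbk,cp936,latin1
set termencoding=utf-8
If the encoding of a file you want to open is not in the fileencodings list, add it. Keep ucs-bom ahead of utf-8, as Vim's documentation requires for the BOM check to work, and keep latin1 last, because an 8-bit encoding never fails detection.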
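
Vim tries the entries of fileencodings in order when it reads a file and uses the first one that converts without error. The following Python sketch is only a rough model of that idea, not Vim's actual code: detect_encoding is a hypothetical helper, and the ucs-bom check is left out. It shows why the position of latin1 in the list matters.

```python
def detect_encoding(raw, candidates):
    """Return (name, text) for the first candidate that decodes raw without error."""
    for name in candidates:
        try:
            return name, raw.decode(name)
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding fits")


original = "中文"
raw = original.encode("gb2312")  # the bytes as stored in a GB2312 file

# latin1 last: utf-8 is rejected, gb18030 decodes the text correctly.
good_name, good_text = detect_encoding(raw, ["utf-8", "gb18030", "latin1"])

# latin1 before gb18030: latin1 accepts any byte sequence, so it "wins"
# and the result is garbled.
bad_name, bad_text = detect_encoding(raw, ["utf-8", "latin1", "gb18030"])

print(good_name, good_text == original)
print(bad_name, bad_text == original)
```

Every byte value is valid in latin1, so a decoder can never reject it; any entry placed after it is never reached.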

It is better to teach someone to fish than to hand them a fish. Below is a reprinted article that explains Vim's encoding handling in detail.
Reprinted from: http://dawnh.net/software/59/vim-charset-encode/

Notes on editing files with different encodings in Vim

This article covers some basics of editing documents in multi-byte encodings (Chinese) with Vim. Note that gvim is not covered; only Vim on character terminals is discussed.
Vim encoding basics:

1. Three options are involved:
encoding -- the encoding Vim uses internally: for buffer text (the files you are editing), registers, Vim script files, and so on. You can think of 'encoding' as the setting for Vim's internal working encoding.
fileencoding -- the encoding Vim uses when writing the file.
termencoding -- the encoding used for output to the client terminal (the term).
2. Default values of the three options:
encoding -- the same as the current system locale. So take the current locale into account when editing files; otherwise there will be more to set.
fileencoding -- Vim detects the encoding automatically when it opens a file, and fileencoding takes the detected value. If it is empty, the file is saved in the encoding given by 'encoding'; if 'encoding' has not been changed, that is the current system locale.
termencoding -- empty by default, which means output to the terminal undergoes no encoding conversion.

So editing files in different encodings involves not only these three options but also three key points: the current system locale, the file's encoding together with automatic encoding detection, and the encoding used by the client from which Vim is run. These three key points determine how the three options should be set.
If someone asks: why do I get garbled text when I open a Chinese document in Vim?
The answer is: it depends. The reasons have been given above. Unless you have worked out the three key points and the values of the three options, garbled text is to be expected, and it is a coincidence if none appears.

Let's look at the values these three key points take in common cases, and the resulting values of the three options:
1. Locale -- most Linux systems now use UTF-8 as the default locale, but not all do; some systems use a Chinese locale such as zh_CN.gb18030. When the locale is UTF-8, Vim sets encoding to UTF-8 at startup. This is the most compatible choice: with UTF-8 used internally, conversion is lossless whatever encoding the file is stored in. The locale determines the encoding Vim uses for internal processing, that is, encoding.
2. File encoding and automatic encoding detection -- this involves the rules of the various encodings and will not be covered in detail. What you do need to understand is that a file does not store its own encoding type: there is no descriptive field recording which encoding the document uses. So when editing a document we must either know the encoding it was saved in, or determine it by other means. Those other means rely on characteristics of each encoding's code table, such as the number of bytes each character occupies or whether byte values fall above a certain threshold. Vim uses this approach too; it is Vim's automatic encoding detection mechanism. But because there are so many encodings, and not every one has features distinctive enough to identify it, the mechanism cannot be 100% accurate. GB2312, for example, encodes each Chinese character as two bytes whose values are both above 127, so a GB2312 file cannot be told apart from a Latin1 file. Automatic detection therefore fails for GB2312: the file is recognized as Latin1. The same problem affects GBK and Big5. When editing such documents you need to set encoding and fileencoding manually. If the file is UTF-8 encoded, Vim can identify the encoding correctly on its own.

3. The encoding used by the client from which Vim is run -- like the second point, this is hard to determine. The second key point concerns the encoding used when reading content from the file and writing it back; this one concerns the encoding Vim uses when sending content to the terminal. If it differs from the encoding the terminal assumes for incoming data, garbled characters result. In a local Linux X environment, the terminal generally assumes that incoming data uses the system locale's encoding, so there is nothing to worry about. With a remote terminal, such as logging in to a server over SSH, problems can arise. For example, connect over SSH from a system whose locale is GB2312 (the client) to a system whose locale is UTF-8 (the server) and start Vim to edit a document. With no adjustments, the server sends UTF-8 data, but the client assumes it is GB2312 and interprets it as such, so the result is garbled. In this case, set termencoding to gb2312 to solve the problem.
The problem is even more common when logging in to a server over SSH from a Windows desktop, where encoding conversion between different systems comes into play, so it depends heavily on Windows and on the SSH client. On Windows there are two kinds of software with respect to encoding: programs written for Unicode, and ANSI programs, which process data as raw byte streams without regard to encoding. The former display multilingual text correctly on any language version of Windows; the latter display text correctly only on a system of the matching language. The two kinds must be treated differently. Take SSH clients as an example: PuTTY is a Unicode program, while SecureCRT is an ANSI program. For the former, Chinese is handled correctly as long as Vim's output to the terminal is UTF-8, that is, termencoding=utf-8. For the latter, first confirm that the default Windows code page is cp936 (the default on Chinese Windows), and then set termencoding=cp936 in Vim.
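
The mismatch described in the third key point can be reproduced outside Vim. The Python sketch below is an illustration under simplifying assumptions, not a model of any particular SSH client: terminal_shows is a hypothetical helper that stands in for a terminal interpreting whatever bytes arrive in its own encoding.

```python
text = "中文"  # what Vim holds internally when encoding=utf-8


def terminal_shows(sent_bytes, terminal_encoding):
    """Model a terminal that decodes incoming bytes in its own encoding."""
    return sent_bytes.decode(terminal_encoding, errors="replace")


# termencoding empty: UTF-8 bytes are sent unconverted to a cp936 client.
unconverted = terminal_shows(text.encode("utf-8"), "cp936")

# termencoding=cp936: the text is converted to the client's encoding first.
converted = terminal_shows(text.encode("cp936"), "cp936")

print(unconverted == text)  # False
print(converted == text)    # True
```

The text itself never changes; only the byte encoding chosen for output has to match what the receiving side expects.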

Finally, let's look at typical situations in handling Chinese documents and how to set things up:

1. The system locale is UTF-8 (the default locale on many Linux systems), the document being edited is in GB2312 or GBK (the default save format of Windows Notepad; most editors also save in this format by default, so it is the most common), and the terminal is UTF-8 (that is, the client is assumed to be Unicode software of the PuTTY kind).
After Vim opens the document, encoding=utf-8 (determined by the locale), fileencoding=latin1 (the result of wrong automatic detection), and termencoding is empty (by default no terminal conversion is done). The file shows up garbled.
Solution 1: first change fileencoding to cp936 or euc-cn (for GB2312 text the two are interchangeable; cp936 is the GBK superset of euc-cn). Note that :set fileencoding=cp936 is not the right way to do this; that only causes the file to be saved as cp936. The right way is to reload the file with cp936 as its encoding, using :edit ++enc=cp936, which can be abbreviated to :e ++enc=cp936.
Solution 2: temporarily change the locale Vim runs under, by starting it as LANG=zh_CN vim abc.txt. Then encoding=euc-cn (determined by the locale), fileencoding is empty (automatic detection is not enabled under this locale, so fileencoding follows the file's own encoding, euc-cn), and termencoding is empty (the default). The text is still garbled at this point, because our SSH terminal treats incoming data as UTF-8 while Vim sends EUC-CN. Run the following command:
:set termencoding=utf-8
Terminal output is then converted to UTF-8 and the display is normal.

2. The situation is basically the same as scenario 1, except that the SSH client is ANSI software of the SecureCRT kind.

After Vim opens the document, encoding=utf-8 (determined by the locale), fileencoding=latin1 (the result of wrong automatic detection), and termencoding is empty (by default no terminal conversion is done). The file shows up garbled.

Solution 1: make sure the default code page of the Windows machine running SecureCRT is cp936, which is already the default on Chinese Windows. The rest is the same as solution 1 above, with one extra step: :set termencoding=cp936

Solution 2: similar to solution 2 above, but the final step of changing termencoding is omitted. This requires the fewest changes: once the locale is set to zh_CN, encoding=euc-cn, and fileencoding and termencoding are both empty, which means they follow the value of encoding.
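
The distinction drawn in solution 1 of the first scenario, between setting fileencoding and reloading with ++enc, can be illustrated with a small Python analogy. This is a sketch of the idea only: the variable names are made up for illustration, and Vim's buffer handling is more involved than a single string.

```python
raw = "中文".encode("cp936")  # bytes on disk, written by a cp936 editor

# Wrong detection: the bytes were decoded as latin1, so the buffer is garbled.
buffer_text = raw.decode("latin1")

# Analogue of ":set fileencoding=cp936": only the encoding used for the next
# save changes; the already-decoded buffer text is left as it was.
save_encoding = "cp936"

# Analogue of ":e ++enc=cp936": read the original bytes again as cp936.
reloaded = raw.decode("cp936")

print(buffer_text == "中文")  # False
print(reloaded == "中文")     # True
```

The garbling happens at read time, so it has to be fixed by reading again with the right encoding rather than by changing what is used for writing.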

As you can see, understanding the three key points and the meaning of the three options goes a long way toward solving encoding problems, and you will be able to handle documents as you wish. This applies beyond Vim: in any environment that involves encoding conversion, similar reasoning can be used to solve the problem.

Finally, I recommend a powerful Windows SSH client, Xshell, which has multiple tabs like SecureCRT. Most conveniently, it can switch the terminal encoding itself, so there is no need to keep adjusting termencoding; just switch the encoding in the SSH software. It is the most convenient SSH tool I have used. It is commercial software, but unregistered users face no functional restrictions: after the 30-day trial period it prompts for registration at each start, with no other impact.
