Notes on editing files with different encodings in Vim

Source: Internet
Author: User
Tags: windows ssh client

Tags: vim, linux-tool, edit

This article covers some basic knowledge about how Vim handles documents in multi-byte encodings (Chinese). Note that it does
not cover gVim; it refers only to Vim running in a character terminal.

1. Basic knowledge of encodings in Vim:

1.1 The three variables:

encoding: this option sets the encoding Vim uses internally, for buffer text (the files you are editing), registers, Vim
script files, and so on. You can think of the encoding option as the encoding of Vim's internal machinery.
fileencoding: this option is the encoding Vim uses when it writes the file.
termencoding: this option is the encoding used for output to the client terminal (term). Characters are displayed properly
only when it matches the encoding the terminal actually uses!

1.2 Default values of the three variables:

encoding: the same as the system's current locale. So when editing files, take the current locale into account; otherwise
more settings will be needed.
fileencoding: Vim automatically detects the encoding when it opens a file, and fileencoding is set to the detected value.
If fileencoding is empty, the value of encoding is used when saving the file; if encoding has not been modified, that is
the value taken from the locale.
termencoding: empty by default, that is, no encoding conversion is performed on output to the terminal.

So editing files in different encodings depends not only on these three variables but also on three key points: the current
locale, the file's encoding together with automatic encoding detection, and the encoding used by the client terminal that
runs Vim. These three key points determine how the three variables should be set.

If someone asks: why do I get garbled text when I open a Chinese document in Vim?

There is no single answer, for the reasons discussed above. If you have not worked out the three key points and the values
of the three variables, garbled text is the normal result, and a correct display is a coincidence.

1.3 The three key points, and the values of the three variables in each case

1.3.1 locale

Currently most Linux systems use UTF-8 as the default locale, but not all do. For example, some systems use the Chinese
locale zh_CN.GB18030. When the locale is UTF-8, encoding is set to UTF-8, which gives the best compatibility, because if
internal processing uses UTF-8, text can be handled whatever encoding is used for external storage. (The locale determines
Vim's internal processing encoding, that is, encoding.)
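
As a rough illustration of how the locale name carries the codeset that ends up in encoding, here is a minimal Python
sketch. The function name codeset_from_locale and the latin1 fallback for locale names without a codeset are assumptions
made for this example; Vim's own start-up logic is more involved than this.

```python
def codeset_from_locale(locale_name, fallback="latin1"):
    """Extract the codeset from a POSIX locale name (hypothetical helper)."""
    # Strip an optional modifier such as "@euro".
    base = locale_name.split("@", 1)[0]
    # Names like "C" or "zh_CN" carry no codeset at all.
    if "." not in base:
        return fallback
    return base.split(".", 1)[1].lower()


for name in ("en_US.UTF-8", "zh_CN.GB18030", "C"):
    print(name, "->", codeset_from_locale(name))
```

The point of the sketch is only that the part after the dot in the locale name is what decides the internal encoding.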

1.3.2 File encoding and automatic encoding detection

This involves the rules of the various encodings, so we will not go through them one by one. What we do need to understand
is that a file's encoding is not stored in the file; there is no descriptive field that records the document's encoding.
So when editing a document we must either know the encoding it was saved in, or determine the encoding by other means.
Those other means rely on features of the encoding's code table, for example the number of bytes each character occupies,
or whether each byte value is above a certain threshold. Vim uses this approach too; it is Vim's automatic encoding
detection mechanism.

There are many encodings, and not every one has features distinctive enough to identify it, so detection cannot be 100%
accurate. Take GB2312: each Chinese character is stored as two bytes whose values are above 127, so a GB2312 file cannot be
told apart from the upper range of latin1. Automatic detection therefore fails for GB2312 and identifies the file as
latin1. The same problem occurs with GBK and Big5. For such documents you must set encoding and fileencoding manually when
editing. If the document is encoded in UTF-8, Vim can generally detect the encoding correctly.
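
The detection problem described above can be reproduced with a small Python sketch. It tries candidate encodings in order
and accepts the first one that decodes without error. The helper name guess_encoding and the candidate lists are
assumptions for illustration; this is a simplification, not Vim's actual detection code.

```python
def guess_encoding(raw, candidates):
    """Return the first candidate that decodes raw without error (hypothetical helper)."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


gb_bytes = "中文".encode("gb2312")    # two Chinese characters, four bytes, all above 127
utf8_bytes = "中文".encode("utf-8")

# UTF-8 has a strict byte structure, so real UTF-8 text is recognised.
print(guess_encoding(utf8_bytes, ["utf-8", "latin-1"]))

# GB2312 bytes are not valid UTF-8, and latin-1 accepts any byte, so the guess is latin-1.
print(guess_encoding(gb_bytes, ["utf-8", "latin-1"]))

# Read as latin-1, the four bytes become four accented Latin letters instead of two Chinese characters.
misread = gb_bytes.decode("latin-1")
print(len(misread))

# Only when the right encoding is named explicitly, ahead of latin-1, is it picked.
print(guess_encoding(gb_bytes, ["utf-8", "gb2312", "latin-1"]))
```

Because latin-1 assigns a character to every byte value, it can never be ruled out by inspection, which is why the
misdetection is silent.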

1.3.3 Encoding used by the client terminal that runs Vim

Like the second key point, this one is hard to determine. The second key point determines the encoding used to read content
from the file and to write content back to it. This key point determines the encoding Vim uses when it outputs content to
the terminal. If that encoding differs from the one the terminal assumes for the data it receives, garbled characters appear.

In a local X environment on Linux, the terminal generally assumes that incoming data is encoded according to the locale, so
there is nothing to worry about. A remote terminal is different, for example when logging on to a server through SSH.
Suppose you SSH from a system whose locale is GB2312 (the client) to a system whose locale is UTF-8 (the server) and start
Vim to edit a document. With no other changes, the data the server returns is UTF-8, but the client assumes it is GB2312
and interprets it that way, so it is garbled. In this case, setting termencoding to gb2312 solves the problem.

When we log on to the server over SSH from a Windows desktop, encoding conversion between different systems is involved,
and the result depends heavily on Windows and on the SSH client. There are two kinds of software on Windows: programs
written for Unicode, and ANSI programs, which process data directly as a byte stream without regard to encoding. The former
can display text in other languages correctly on any language version of Windows; the latter displays text correctly only
on a system of the same language. The two kinds must be treated differently. Take SSH clients as an example: PuTTY is
Unicode software, while SecureCRT is ANSI software. For the former, handling Chinese correctly only requires that Vim's
output to the terminal is UTF-8, that is, termencoding=utf-8. For the latter, first confirm that the default Windows code
page is cp936 (the default on Chinese Windows), then set termencoding=cp936.
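
The terminal mismatch can be mimicked in a few lines of Python. This is a minimal sketch under the assumption that the
terminal simply decodes whatever bytes it receives with its own charset; the variable names are invented for the example.

```python
text = "中文"

# The server-side Vim sends UTF-8 bytes, but the client terminal decodes them as GB2312.
garbled = text.encode("utf-8").decode("gb2312", errors="replace")
print(garbled == text)   # False: the terminal shows the wrong characters

# With termencoding=gb2312, Vim converts its output before sending it.
fixed = text.encode("gb2312").decode("gb2312")
print(fixed == text)     # True

# The same reasoning applies to an ANSI client on a cp936 Windows system.
ansi_client = text.encode("cp936").decode("cp936")
print(ansi_client == text)   # True
```

In every case the rule is the same: the bytes Vim sends must be in the encoding the terminal uses to interpret them.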

2. Typical situations when handling Chinese documents, and how to set things up:

1) The system locale is UTF-8 (the default on many Linux systems). The edited document is in GB2312 or GBK format (Windows
Notepad saves in this format by default, as do most editors, so this is the most common case). The terminal type is UTF-8
(that is, assume the client is Unicode software of the PuTTY kind).

After Vim opens the document, encoding=utf-8 (determined by the locale), fileencoding=latin1 (automatic encoding detection
gets it wrong), and termencoding is empty (by default no conversion of terminal output is needed). The file is displayed as
garbled text.

Solution:

1. First, change fileencoding to cp936 or euc-cn (for GB2312 text the two are equivalent, just named differently). Note
that the right way is not :set fileencoding=cp936, which only makes Vim save the file as cp936. The right way is to reload
the file with the cp936 encoding: :edit ++enc=cp936, which can be abbreviated to :e ++enc=cp936.

2. Temporarily change the locale that Vim runs under by starting it with LANG=zh_CN vim abc.txt. Then encoding=euc-cn
(determined by the locale) and fileencoding is empty (under this locale, automatic file encoding detection is not enabled,
so the file is read using the value of encoding, which is EUC-CN). termencoding is empty (the default; when empty it is
equal to encoding). At this point the text is still garbled, because our SSH terminal assumes the data it receives is UTF-8
while Vim sends it as EUC-CN. Now run :set termencoding=utf-8 so that output to the terminal is UTF-8, and the text is
displayed normally.
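
Why solution 1 reloads the file instead of only changing fileencoding can be shown with a short Python sketch. It is an
analogy under stated assumptions, not Vim's implementation: decoding stands for reading the file into the buffer, and
encoding stands for writing the buffer back out.

```python
raw = "中文".encode("cp936")    # the bytes as stored in the file

# Misdetected read: the buffer now holds four latin-1 characters, not two Chinese ones.
buffer_text = raw.decode("latin-1")

# Changing only the write encoding converts that wrong text,
# so the bytes written are no longer the original file content.
written = buffer_text.encode("cp936", errors="replace")
print(written == raw)        # False

# Re-reading the same bytes with the right encoding recovers the text.
reloaded = raw.decode("cp936")
print(reloaded == "中文")     # True
```

The write encoding only matters once the buffer holds the right characters, which is why the reload has to come first.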

2) The situation is basically the same as scenario 1, except that the SSH software used is ANSI software of the SecureCRT
kind. After Vim opens the document, encoding=utf-8 (determined by the locale), fileencoding=latin1 (automatic encoding
detection gets it wrong), and termencoding is empty (by default no conversion of terminal output is needed). The file is
displayed as garbled text.

Solution:

1. First make sure that the default code page of the Windows machine running SecureCRT is cp936; on Chinese Windows this is
already the default. The rest is the same as solution 1 above, with one extra step: :set termencoding=cp936

2. Similar to solution 2 above, but the last step of changing termencoding is omitted. In this case, once Vim is started
with the zh_CN locale, encoding=euc-cn, and fileencoding and termencoding are both empty, that is, they take the value of
encoding. This is the ideal situation.
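
The scenarios above can be checked against a toy model of the path from file to screen. The function render and its
parameter names are hypothetical, chosen to mirror the option names; Python strings are already Unicode, so the internal
encoding option has no counterpart here and the model covers only the read and output conversions.

```python
def render(file_bytes, fileencoding, termencoding, terminal_charset):
    """Model what the user sees (hypothetical helper, not Vim's real code path)."""
    text = file_bytes.decode(fileencoding)   # file bytes -> text held in the buffer
    wire = text.encode(termencoding)         # buffer text -> bytes sent to the terminal
    # The terminal interprets the bytes with its own charset.
    return wire.decode(terminal_charset, errors="replace")


doc = "中文"
gb_file = doc.encode("gb2312")

# Scenario 1, solution 2, before the last step: EUC-CN bytes reach a UTF-8 terminal.
print(render(gb_file, "gb2312", "gb2312", "utf-8") == doc)   # False

# After :set termencoding=utf-8 the terminal receives what it expects.
print(render(gb_file, "gb2312", "utf-8", "utf-8") == doc)    # True

# Scenario 2, solution 2: an ANSI client on cp936 Windows needs no conversion.
print(render(gb_file, "gb2312", "gb2312", "cp936") == doc)   # True
```

Each call changes one stage at a time, which makes it easier to see which of the three key points is responsible for a
garbled display.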

So understanding the three key points and the meaning of the three variables goes a long way toward solving encoding
problems. The same understanding can be applied in other environments that need encoding conversion.

Finally, we recommend a powerful Windows SSH client, Xshell, which also offers multi-tab SSH windows. Its most convenient
feature is that it can change the terminal encoding, so you do not need to adjust termencoding all the time; you only need
to switch the encoding in the SSH software. It is the most convenient SSH tool we have used. It is commercial software, but
unregistered use is not restricted: once the 30-day trial period has passed, it only prompts for registration at every
start, which does not affect its functions.
