Why does Vim show garbled characters in Chinese-encoded files, and how do you fix it?

Source: Internet
Author: User
If files in a Chinese encoding show up garbled in Vim, you can change your .vimrc so that GB2312 files are detected automatically. You can refer to my settings, which set the file encoding options to solve the Chinese encoding problem:

    " Send terminal output in the same encoding as Vim's internal 'encoding'
    let &termencoding = &encoding
    " Encodings to try when reading a file; ucs-bom must come before utf-8
    set fileencodings=ucs-bom,utf-8,gbk,cp936

------------------------------------------------------------------------------
oldniu fixed the garbled text he saw in vi in a terminal by following karron's method, as follows:

There are a few things to watch out for when editing files in different encodings in Vim. This article covers some basics of Vim's multi-byte encoding support, with Chinese in mind. Note that it does not cover gvim. It only concerns Vim running in a character terminal.

[Vim encoding basics]

1. Three options are involved:
encoding ---- the encoding used for buffer text (the file you are editing), registers, Vim script files, and so on. You can think of 'encoding' as the encoding of Vim's internal workings.
fileencoding ---- the encoding Vim uses when writing the file.
termencoding ---- the encoding used for output to the client's terminal.

2. Default values of the three options:
encoding ---- derived from the system's current locale. That is why you must take the current locale into account when editing files; otherwise you need to set more options.
fileencoding ---- when opening a file, Vim detects its encoding automatically, and fileencoding is set to the detected value. If it is empty, the file is saved in the 'encoding' encoding. Unless you have changed 'encoding', that is the encoding of the system's current locale.
termencoding ---- empty by default, meaning output to the terminal is not converted.

Editing files in different encodings therefore involves more than these three options. Three key factors also matter:
- the current locale;
- the file's encoding, together with automatic encoding detection;
- the encoding expected by the terminal of the client running Vim.

These three factors determine how the three options should be set.

Someone may ask: "Why do I get garbled text when I open a Chinese document in Vim?" There is no single answer; the causes are the ones described above. Work out the three factors and the values of the three options, and the cause becomes clear. When no garbled text appears, it is because these settings happen to line up.

Let's look at the three factors and how they determine the three options:

1. Locale. Most Linux systems now use a UTF-8 locale by default, but not all of them do; some use a Chinese locale such as zh_CN.GB18030. When the locale is UTF-8, Vim sets encoding to utf-8 at startup. This is the most compatible setup. Text is processed internally as UTF-8, so it can be converted without loss whatever encoding the file uses on disk. In short, the locale determines the encoding Vim uses internally, that is, 'encoding'.

2. File encoding and automatic encoding detection. The details of the various encoding schemes are beyond the scope of this article. What matters is that a file does not record its own encoding: there is no field in the file that says which encoding it uses. So when editing a document, we must either know which encoding it was saved in or infer the encoding by other means. Inference relies on characteristics of each encoding's code table, such as how many bytes each character occupies and whether byte values fall in particular ranges (for example, above 127).
Vim uses the same approach; this is its automatic encoding detection. The mechanism cannot be 100% accurate, though. There are many encodings, and not every encoding has distinctive features that identify it. GB2312 encodes each Chinese character as two bytes, both with values above 127. Any byte sequence is also valid Latin-1, so a GB2312 file cannot be told apart from a Latin-1 file. With the default 'fileencodings', detection therefore fails for GB2312 and reports the file as latin1. GBK and Big5 files have the same problem. When editing such documents, you need to tell Vim the encoding manually. If the file is UTF-8, Vim detects the encoding correctly.

3. The encoding used by the terminal of the client running Vim. Like the second factor, this one is hard to pin down. The second factor determines the encoding used to read from and write to the file. This one determines the encoding Vim uses when sending output to the terminal. If that differs from the encoding the terminal expects, garbled text appears. In a local X session on Linux, the terminal generally assumes that incoming data uses the system locale's encoding, so this is rarely a problem. It can become one with a remote terminal, for example when you log in to a server over ssh. Suppose you ssh from a system with a GB2312 locale (the client) to a system with a UTF-8 locale (the server) and edit a document in Vim without changing anything. The server sends UTF-8 data, but the client assumes it is GB2312.
Interpreted as GB2312, this data is inevitably garbled. In this case, set termencoding to gb2312 to fix it.

The problem is even more common when you log in to a server over ssh from a Windows desktop. Encodings must then be converted between different systems, so the result depends heavily on Windows and on the ssh client.

With respect to encoding, Windows software comes in two kinds:
- Unicode programs, which display multilingual text correctly on Windows in any language;
- ANSI programs, which process data directly as byte streams without regard to encoding, and display text correctly only on a system whose language matches the text.

The two kinds must be handled differently. Taking ssh clients as an example, PuTTY is a Unicode program, while SecureCRT is an ANSI program.
- For PuTTY to display Chinese correctly, we only need to make sure that Vim's output to the terminal is UTF-8, that is, termencoding=utf-8, with PuTTY's remote character set also set to UTF-8.
- For SecureCRT, we need to confirm two things. First, the Windows system's default code page must be cp936, which is the default on Chinese Windows. Second, Vim's termencoding must be cp936.

Finally, let's look at typical situations when handling Chinese documents and how to configure Vim for each:

1. The system locale is UTF-8, the default on many Linux systems. The documents being edited are in GB2312 or GBK. This is the format Notepad on Chinese Windows has traditionally saved in by default, and most editors there default to it as well, so this is the most common case.
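In any of these situations, you can check the values Vim actually chose from inside Vim before applying a fix. This is a minimal sketch that assumes classic Vim; Neovim has no 'termencoding' option. Type the commands one at a time on Vim's command line:

```vim
" Character-encoding locale Vim was started under (its LC_CTYPE value)
:echo v:ctype
" Current values of the three options for this buffer
:set encoding? fileencoding? termencoding?
" Encodings tried, in order, when a file is read
:set fileencodings?
```

An empty termencoding in the output means the same as encoding, so no terminal conversion takes place.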
Suppose the terminal expects UTF-8, that is, the client is a Unicode program such as PuTTY. After Vim opens the document, the options are:
- encoding=utf-8 (determined by the locale);
- fileencoding=latin1 (automatic detection guessed wrong);
- termencoding is empty (no terminal conversion by default).

The file displays as garbled text.

Solution 1: Reload the file as cp936 or euc-cn. euc-cn is GB2312. cp936 is essentially GBK, a superset of GB2312, so either one works for GB2312 files. Note that :set fileencoding=cp936 is not the right way to do this; it only makes Vim save the file as cp936. The correct way is to reload the file, decoding it as cp936:

    :edit ++enc=cp936

This can be abbreviated to :e ++enc=cp936.

Solution 2: Temporarily change the locale Vim runs under by starting it as LANG=zh_CN vim abc.txt. The options are then:
- encoding=euc-cn (determined by the locale);
- fileencoding is empty. With a non-Unicode 'encoding', the default 'fileencodings' only checks for a Unicode BOM, so the file is read as euc-cn;
- termencoding is empty (the default), which means the same as encoding.

The text is still garbled, because the ssh terminal expects UTF-8 while Vim sends euc-cn. In this case, run :set termencoding=utf-8 so that output to the terminal is UTF-8.

2. This situation is basically the same as situation 1, except that the ssh client is an ANSI program such as SecureCRT. After Vim opens the document, the options are:
- encoding=utf-8 (determined by the locale);
- fileencoding=latin1 (automatic detection guessed wrong);
- termencoding is empty (no terminal conversion by default).

The file displays as garbled text.

Solution 1: Make sure the default code page of the Windows machine running SecureCRT is CP936, which is already the default on Chinese Windows.
The rest is the same as solution 1 of situation 1, with one extra step: :set termencoding=cp936.

Solution 2: This is similar to solution 2 of situation 1, but without the final step of changing termencoding, so it needs the fewest changes. When Vim is started with the zh_CN locale, encoding=euc-cn. fileencoding and termencoding are both empty, which means both take the value of encoding. This is the ideal situation.
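Putting the pieces together, here is one possible ~/.vimrc for the common case of reading GB2312/GBK files on a UTF-8 system. It is a sketch under assumptions, not the author's configuration:
- It follows a widely used Vim idiom: keep the locale's encoding in 'termencoding', then switch 'encoding' to UTF-8.
- It assumes classic Vim rather than Neovim.
- It uses cp936 as the only Chinese entry in 'fileencodings'.

```vim
" Sketch of a ~/.vimrc for reading GB2312/GBK files (assumes classic Vim).

" Keep terminal output in the encoding of the locale Vim started with
" (assumes the terminal expects that encoding), then process text
" internally as UTF-8.
if &termencoding ==# ''
  let &termencoding = &encoding
endif
set encoding=utf-8

" When a file is read, each entry is tried in turn and the first one that
" converts without error is used. ucs-bom must come before utf-8, and
" latin1 accepts any byte sequence, so it has to be last.
set fileencodings=ucs-bom,utf-8,cp936,latin1

" With an ANSI client such as SecureCRT on Chinese Windows, uncomment:
" set termencoding=cp936

" A buffer that was still misdetected can be reloaded by hand, e.g.:
"   :e ++enc=cp936
```

Listing cp936 before latin1 has a trade-off. A genuine Latin-1 file containing accented letters may now be read as cp936, because many such byte pairs are also valid GBK. To fix an individual file, :e ++enc=latin1 reloads it with the right encoding.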