How does vim edit GB2312 encoded files?
How do you edit GB2312-encoded files with Vim on Linux?
System environment: LC_ALL=zh_CN.UTF-8
Modify the .vimrc file as follows so that Vim supports GB2312:
" Set the file encoding types to solve the Chinese encoding problem
let &termencoding=&encoding
set fileencodings=utf-8,gbk,ucs-bom,cp936
After some research, the meaning of the lines added to .vimrc is as follows:
Things to be aware of when editing files in different encodings with Vim
This article explains some basics of editing documents in multibyte (Chinese) encodings with Vim. Note that it does not cover gVim, only Vim running in a pure character terminal.
Basics of Vim encodings:
1. There are three variables:
encoding: this option applies to buffer text (the file you are editing), registers, Vim script files, and so on. You can think of 'encoding' as the setting for the encoding Vim works in internally.
fileencoding: the encoding Vim uses when writing the file.
termencoding: the encoding used for output to the user's terminal.
2. The default values of these three variables:
encoding: the same as the current system locale. Take the current locale into account when editing a file, otherwise you will have more settings to adjust.
fileencoding: Vim automatically detects the file's encoding when opening it, and fileencoding is set to the detected value. If it is empty, the file is saved using the value of encoding; if encoding has not been changed, that is the encoding of the current system locale.
termencoding: empty by default, meaning that output to the terminal is not converted.
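As a rough model of how the three variables relate, the following Python sketch treats the file as bytes on disk, the buffer as decoded text, and the terminal as another byte stream. It is only an analogy: the function names are hypothetical and are not Vim APIs, and Python's str type stands in for Vim's internal encoding.

```python
# Illustrative model only; these helpers are hypothetical, not Vim functions.

def read_file(raw: bytes, fileencoding: str) -> str:
    """Decode bytes from disk into the internal representation."""
    return raw.decode(fileencoding)


def write_file(text: str, fileencoding: str) -> bytes:
    """Encode the internal text back into the on-disk encoding."""
    return text.encode(fileencoding)


def to_terminal(text: str, termencoding: str) -> bytes:
    """Encode the internal text into whatever the terminal expects."""
    return text.encode(termencoding)


on_disk = "中文".encode("gb2312")                  # the file as stored
buffer_text = read_file(on_disk, "gb2312")         # role of fileencoding on read
screen_bytes = to_terminal(buffer_text, "utf-8")   # role of termencoding
saved = write_file(buffer_text, "gb2312")          # role of fileencoding on write
```

The same text has a different byte representation at each boundary, which is why each boundary needs its own setting.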
This shows that editing files in different encodings involves not only these three variables but also three key points: the current system locale, the file's own encoding together with automatic encoding detection, and the encoding used by the terminal in which the user runs Vim. These three key points determine how the three variables should be set.
If someone asks, "Why do I get garbled text when I open a Chinese document in Vim?",
no single answer can be given, for the reasons above. Without understanding the three key points and the values of the three variables, garbled text is to be expected, and a correct display is just a coincidence.
Let's look at the three key points in common situations, and at the values the three variables take in each:
1. Locale: most Linux systems now use UTF-8 as the default locale, but not all; some systems use a Chinese locale such as zh_CN.GB18030. When the locale is UTF-8, encoding is set to utf-8 when Vim starts. This is the most compatible arrangement: because internal processing uses UTF-8, text can be converted to and from whatever encoding is used on disk without loss. The locale determines the encoding Vim uses for its internal data, that is, encoding.
2. File encoding and automatic detection: the rules of the various encodings are not covered in detail here. What matters is that the encoding is not stored in the file; there is no descriptive field recording which encoding a document uses. So when editing a document, we either have to know which encoding it was saved in or determine it by other means, namely from characteristics of the encoding's code table, such as the number of bytes per character or whether each byte's value lies above a certain threshold. Vim works this way too; this is its automatic encoding detection. Because there are so many encodings and not every one has distinctive features, detection cannot be 100% accurate. GB2312 forms each Chinese character from two bytes whose values are greater than 127, so a GB2312 file cannot be told apart from a latin1 file. Automatic detection therefore fails for GB2312 and identifies the file as latin1. The same problem occurs with GBK and Big5. When editing such documents we need to set encoding and fileencoding manually. If the document is UTF-8, Vim will normally detect the encoding correctly.
3. The encoding used by the terminal in which the user runs Vim: like the second point, this is hard to determine. The second key point determines the encoding used when reading from and writing to a file; this one determines the encoding Vim uses when sending content to the terminal. Garbled text also results if the encoding Vim outputs differs from the encoding the terminal assumes for the data it receives. In a local X environment on Linux, the terminal generally assumes that incoming data matches the system locale, so there is nothing to worry about. With a remote terminal, such as an SSH login to a server, problems are likely. For example, SSH from a system whose locale is GB2312 (the client) to a system whose locale is UTF-8 (the server) and open a document in Vim. Without any changes, the server returns UTF-8 data, but the client interprets it as GB2312, so the text is garbled. Setting termencoding to gb2312 solves the problem.
This problem is most common when logging in to a server over SSH from a Windows desktop, which involves encoding conversion between different systems, so it depends heavily on Windows itself and on the SSH client. Windows software handles encodings in one of two ways: there is Unicode software, and there is ANSI software, which processes data as a raw byte stream without regard to encoding. The former displays multilingual text correctly on Windows in any language; the latter displays text correctly only on a Windows system in the language it was written for. The two kinds of program need to be treated differently.
Taking SSH clients as an example, PuTTY is Unicode software, while SecureCRT is ANSI software. For the former, Chinese is handled correctly as long as Vim outputs UTF-8 to the terminal, that is, termencoding=utf-8. For the latter, confirm that the Windows default code page is cp936 (the default on Chinese Windows) and set termencoding=cp936 in Vim.
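The second and third key points can be illustrated with a short Python sketch. It assumes that detection tries a list of candidate encodings in order and takes the first one that decodes without error, which is how the fileencodings option is documented to behave. detect_encoding and display are hypothetical helper names, not part of Vim.

```python
from typing import List, Optional


def detect_encoding(raw: bytes, candidates: List[str]) -> Optional[str]:
    """Return the first candidate that decodes raw without error, else None."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


def display(text: str, sent_as: str, terminal_expects: str) -> str:
    """Model a terminal that decodes incoming bytes with its own encoding."""
    raw = text.encode(sent_as)
    return raw.decode(terminal_expects, errors="replace")


gb_bytes = "中文".encode("gb2312")

# latin-1 accepts every byte sequence, so it always "succeeds" once reached.
guess_default = detect_encoding(gb_bytes, ["utf-8", "latin-1"])

# Listing gbk before latin-1 gives the GB2312 bytes a chance to be recognised.
guess_with_gbk = detect_encoding(gb_bytes, ["utf-8", "gbk", "latin-1"])

# Sender and terminal disagree: the terminal shows the wrong characters.
garbled = display("中文", sent_as="utf-8", terminal_expects="gbk")

# Sender converts to the terminal's encoding first: the text survives.
correct = display("中文", sent_as="gbk", terminal_expects="gbk")
```

The order of the candidate list matters: any encoding that never fails, such as latin1, hides every candidate listed after it.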
Finally, here are some of the most typical cases of working with Chinese documents, and the settings for each:
1. The system locale is UTF-8 (the default on many Linux systems), the document being edited is GB2312 or GBK (the default save format of Windows Notepad and of most other editors, so this is the most common case), and the terminal is UTF-8 (that is, the client is assumed to be Unicode software like PuTTY).
When Vim opens the document, encoding=utf-8 (determined by the locale), fileencoding=latin1 (automatic detection gets it wrong), and termencoding is empty (by default no conversion is done for the terminal). The file is displayed as garbled text.
Solution 1: First correct fileencoding to cp936 or euc-cn (for GB2312 text the two are equivalent; cp936 is a superset). Note that :set fileencoding=cp936 is not the right way to do this, because it only makes Vim save the file as cp936. The right way is to reload the file as cp936 with :edit ++enc=cp936, which can be abbreviated to :e ++enc=cp936.
Solution 2: Temporarily change the locale in which Vim runs by starting it with LANG=zh_CN vim abc.txt. Then encoding=euc-cn (determined by the locale), fileencoding is empty (automatic detection is not enabled under this locale, so the file is assumed to be in the same encoding as encoding, that is, euc-cn), and termencoding is empty (the default; empty means the same as encoding). The display is still garbled at this point, because the SSH terminal expects UTF-8 while Vim sends euc-cn. Running :set termencoding=utf-8 converts the terminal output to UTF-8, and the display becomes correct.
2. The same situation as case 1, except that the SSH client is ANSI software like SecureCRT.
When Vim opens the document, encoding=utf-8 (determined by the locale), fileencoding=latin1 (automatic detection gets it wrong), and termencoding is empty (by default no conversion is done for the terminal). The file is displayed as garbled text.
Solution 1: First make sure that the default code page of the Windows machine running SecureCRT is cp936, which is already the default on Chinese Windows. The rest is the same as Solution 1 of case 1, with one extra step: :set termencoding=cp936
Solution 2: Similar to Solution 2 of case 1, but the final step of changing termencoding is omitted. This case needs the fewest changes: just start Vim with the zh_CN locale. Then encoding=euc-cn, and fileencoding and termencoding are both empty and so take the value of encoding, which is the ideal situation.
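The difference between the two commands in Solution 1 of case 1 can be sketched in Python. This is an analogy, assuming that :set fileencoding only changes how the buffer is encoded on the next write, while :e ++enc decodes the bytes on disk again. The variable names are illustrative.

```python
raw = "中文".encode("cp936")   # bytes on disk, e.g. as saved by Notepad

# What failed detection produced: the bytes were decoded as latin1.
misread = raw.decode("latin-1")

# Analogue of ":set fileencoding=cp936": keep the misread text and only
# change the encoding used when writing it back.
try:
    rewritten = misread.encode("cp936")
except UnicodeEncodeError:
    rewritten = None   # the misread characters may not exist in cp936 at all

# Analogue of ":e ++enc=cp936": decode the original bytes again.
reloaded = raw.decode("cp936")
```

In the first path the text in the buffer is still wrong, so nothing useful can be written; only decoding the original bytes again recovers the Chinese text.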
Understanding the three key points and the meaning of the three variables is a great help with encoding problems and makes it possible to handle documents in any encoding. The same ideas apply beyond Vim, in any environment that requires encoding conversion.
iconv: illegal input sequence at position 189
Workaround: add the -c option
iconv -c -f utf-8 -t gb2312 menu.sql > menu1.sql
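For comparison, a roughly equivalent conversion can be sketched in Python. It assumes that silently dropping unconvertible characters is acceptable, which is what -c does; the function name is hypothetical.

```python
def convert_dropping_unmappable(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode data from src to dst, discarding anything that cannot be converted."""
    text = data.decode(src, errors="ignore")
    return text.encode(dst, errors="ignore")


# The emoji has no GB2312 representation, so it is dropped from the output.
utf8_input = "中文\U0001F600".encode("utf-8")
gb_output = convert_dropping_unmappable(utf8_input, "utf-8", "gb2312")
```

Because the dropped characters are lost without any warning, it is worth comparing the input and output before replacing the original file.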