Vim-based file encoding in Linux

Source: Internet
Author: User

After some struggle, we compromised on GBK. All three projects will be based on GB encodings, and I suspect I will miss the utf8 days. In principle, if everyone understands how these encodings relate and pays attention to file encoding during development, there should be no garbled text and no encoding or decoding problems before and after Ajax requests. In practice, someone always forgets to save a file in a GB encoding at some point, and someone always forgets this root cause when development is otherwise going smoothly. With the encoding unified, front-end files developed under Windows must be stored as GBK; in EditPlus or Notepad this means saving with the ANSI encoding. These editors decide whether a file is GBK, gb2312, or gb18030 from its byte stream. The GB encodings are at least partly compatible with one another, and their byte streams overlap heavily.
Many text editors on Windows do not force a particular encoding. On Linux it is not that simple, and you need to be careful when using Vim.

Let's go over these encodings. gb2312 extends ASCII, and GBK and gb18030 extend gb2312 in turn. The gb18030 revision that was current at the time of writing was released in 2005. For commonly used simplified Chinese text the three encodings are fully compatible; the newer standards add many uncommon characters, radicals, and traditional Chinese characters. Unicode is a separate character set that was not derived from the GB family, and utf8 is one encoding of it. utf8 covers languages other than Chinese and is suitable for global use, so its repertoire is much larger than the combined character sets of the older GB encodings, although for Chinese characters it is roughly equivalent to gb18030. Outside the ASCII range, utf8 byte sequences are incompatible with those of the GB series. On Linux, if fileencodings is not configured in vimrc, Vim uses the default system encoding to read and write files. The encodings to try are defined in vimrc. My vimrc defines them as follows:

set fileencodings=ucs-bom,utf-8,gb18030,gbk
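The relationships among the encodings described above can be checked outside Vim. The following is a minimal sketch in Python, not part of the original setup; the sample strings and variable names are chosen here for illustration. It encodes the same text with each codec and shows that ASCII is stored identically everywhere, while common Chinese characters take two bytes each in the GB family and three bytes each in utf8.

```python
# Encode the same text with each codec and compare the stored bytes.
CODECS = ("gb2312", "gbk", "gb18030", "utf-8")

ascii_text = "vim"      # sample ASCII text (illustrative)
chinese_text = "中文"    # sample simplified Chinese text (illustrative)

# ASCII characters are stored as the same single bytes in all four encodings.
ascii_bytes = {codec: ascii_text.encode(codec) for codec in CODECS}

# The three GB codecs agree on common Chinese characters; utf8 does not.
chinese_bytes = {codec: chinese_text.encode(codec) for codec in CODECS}

for codec in CODECS:
    print(f"{codec:8} ascii={ascii_bytes[codec].hex()} chinese={chinese_bytes[codec].hex()}")
```

The three GB rows print the same hex string for the Chinese sample, and the utf8 row prints a longer, unrelated one.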

ucs-bom matches Unicode files that begin with a byte order mark (BOM); it is closely related to utf8. ucs-bom and utf-8 are placed at the front of the list because Vim tries them first. If a conversion error occurs, Vim falls back to the next encoding in the list. Vim cannot, however, tell GBK and gb18030 apart from conversion errors. Why no gb2312? Because setting gb2312 in vimrc is useless. The experiments that follow all use this setting.
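The fallback order can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Vim's actual implementation: detect_encoding is a hypothetical helper, the BOM check only covers the UTF-8 and UTF-16 marks, and Python's codecs stand in for the converters Vim uses.

```python
import codecs

# Byte order marks that the "ucs-bom" entry is assumed to look for in this sketch.
BOMS = (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def detect_encoding(data, candidates=("ucs-bom", "utf-8", "gb18030", "gbk")):
    """Return the first candidate that accepts the bytes, or None (hypothetical helper)."""
    for name in candidates:
        if name == "ucs-bom":
            # Matches only when the data starts with a byte order mark.
            if data.startswith(BOMS):
                return name
            continue
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # conversion error: fall through to the next entry
        return name
    return None


print(detect_encoding("中文".encode("utf-8")))          # utf-8
print(detect_encoding("中文".encode("gb2312")))         # gb18030
print(detect_encoding(codecs.BOM_UTF8 + b"plain"))      # ucs-bom
```

Because gb18030 accepts any bytes that were written as gb2312 or GBK, the gbk entry after it is never reached for such files, which matches the behaviour described in the experiments.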

On Linux, open a new, empty file in Vim; here it is UTF-8 encoded by default. Type the word "Chinese" in Chinese characters and save the file as 11.utf8. Run the file command on it: it reports UTF-8 Unicode text, which is correct.

Open another file in Vim, enter the same Chinese text, run :set fileencoding=gb18030, save it as 22.gb18030, and exit. Checked with file, it is reported as ISO-8859 text rather than gb18030. The file command can only judge the encoding from the byte stream, and the byte streams of GBK, gb2312, and gb18030 are identical within the simplified Chinese range, so it cannot tell which character set is in use. utf8, by contrast, has a recognizable byte pattern. In short, GBK, gb2312, and gb18030 files cannot be told apart from their bytes. Next, create 33.gbk in Vim, enter the same text, run :set fileencoding=gbk, then save and exit. The result is the same. However, after the set command is executed, Vim shows the file encoding as cp936; cp936 is an alias of GBK.

If you run :set fileencoding=gb2312 on a file in Vim (44.gb2312 in these experiments), the editor displays euc-cn.
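Why a byte-stream check ends up with a generic answer for GB files can be illustrated with a small Python sketch. It is a rough stand-in written for this article, not the real logic of the file command; classify_bytes and decodable_with are hypothetical helpers.

```python
def classify_bytes(data):
    """Very rough byte-stream classifier (hypothetical; not file's real logic)."""
    if all(byte < 0x80 for byte in data):
        return "ASCII text"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: the bytes alone do not reveal which 8-bit encoding it is.
        return "unidentified 8-bit text"
    return "UTF-8 text"


def decodable_with(data, codecs_to_try=("gb2312", "gbk", "gb18030")):
    """Return every GB codec that decodes the bytes without error."""
    accepted = []
    for codec in codecs_to_try:
        try:
            data.decode(codec)
        except UnicodeDecodeError:
            continue
        accepted.append(codec)
    return accepted


sample = "中文"
utf8_label = classify_bytes(sample.encode("utf-8"))
gb_label = classify_bytes(sample.encode("gb18030"))
gb_candidates = decodable_with(sample.encode("gb18030"))
print(utf8_label, "|", gb_label, "|", gb_candidates)
```

All three GB codecs accept the same bytes, so nothing in the stream singles out one of them.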

However, when the GB* files are opened a second time, the encoding Vim detects differs from the one originally set: 22.gb18030, 33.gbk, and 44.gb2312 are all detected as gb18030. Vim picks the largest character set for the sake of compatibility. Can a character outside gb2312 be written to a file whose fileencoding is gb2312? For example, create a file in Vim, run :set fileencoding=gb2312, enter a Chinese character that is not in the gb2312 character set, and save. Vim reports the error "write error, conversion failed", and the file cannot be force-saved.

The file can only be saved after running :set fileencoding=gbk or :set fileencoding=gb18030. Even then, a prompt is displayed before saving, and after the file is force-saved the message "converted" appears, which suggests that an encoding conversion takes place along the way.
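The same failure can be reproduced outside Vim. The sketch below uses Python's codecs as a stand-in for Vim's converter, and the character 囧 is an assumption made for this example: it is one character that GBK and gb18030 contain and gb2312 does not. try_save is a hypothetical helper.

```python
def try_save(text, codec):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None  # comparable to Vim's "conversion failed" write error


rare_char = "囧"  # assumed example of a character outside gb2312

results = {codec: try_save(rare_char, codec) for codec in ("gb2312", "gbk", "gb18030")}

for codec, raw in results.items():
    print(codec, "cannot encode" if raw is None else raw.hex())
```

gb2312 rejects the character, while GBK and gb18030 both encode it, and to the same two bytes.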

When you reopen the file that was saved earlier with :set fileencoding=gb2312, type the "awkward" character, and save, "converted" also appears. An encoding conversion seems to happen here too, but without the serious warning seen in the previous example.

This is probably the benefit of Vim treating GBK, gb2312, and gb18030 files alike as gb18030. As mentioned earlier, the GB encodings are almost identical: any character that exists in all three sets has the same bytes in each. When you run the file command on files in the three formats, it reports ISO-8859 for all of them. In this context that indicates a GB-series file whose specific character set is unknown.

Now look at the binary content of the four files, which all hold the same text. The three GB files are byte-for-byte identical. (So listing gb2312 in fileencodings in vimrc is useless, and listing GBK is useless too. gb18030 alone is enough.)
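The comparison can be repeated without Vim. The following Python sketch writes the same text under the four file names used in the experiments and dumps the stored bytes. It is an approximation: the files are created in a temporary directory, and unlike files saved by Vim they contain no trailing newline.

```python
import tempfile
from pathlib import Path

text = "中文"  # sample text (illustrative)

# File names follow the experiments above; each maps to the codec used to write it.
files = {
    "11.utf8": "utf-8",
    "22.gb18030": "gb18030",
    "33.gbk": "gbk",
    "44.gb2312": "gb2312",
}

stored = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, codec in files.items():
        path = Path(tmp) / name
        path.write_text(text, encoding=codec)
        stored[name] = path.read_bytes()  # raw bytes as they sit on disk

for name, raw in stored.items():
    print(f"{name:12} {raw.hex(' ')}")

# 11.utf8      e4 b8 ad e6 96 87
# 22.gb18030   d6 d0 ce c4
# 33.gbk       d6 d0 ce c4
# 44.gb2312    d6 d0 ce c4
```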

In addition, some Vim configurations do not show the encoding of the current file. I use the following setting to display it in the status line:

set statusline=%<[%n]\ %F\ %h%m%r%=%k[%{(&fenc==\"\")?&enc:&fenc}%{(&bomb?\",BOM\":\"\")}][%{&ff}][ASCII=\%03.3b]\ %-10.(%l,%c%V%)\ %P

Conclusion:

1. Development on Linux should not depend on the default system encoding.
2. For development on Linux, configure fileencodings in vimrc as shown above.
3. If vimrc is not set up to write new files as gb18030 by default, run :set fileencoding=gb18030 manually when creating a file in Vim.
4. We do not recommend placing gb18030 ahead of ucs-bom and utf-8 in vimrc, so that not every file created on Linux ends up GB-encoded. After all, forgetting to convert the encoding is common.
