Vim encoding and font

Source: Internet
Author: User
Tags gtk

Garbled characters often appear when Vim opens a Chinese file. The reasons are somewhat involved, so here is the direct solution first:
set fileencoding=gb18030
set fileencodings=utf-8,gb18030,utf-16,big5
Want to know the reasoning behind these settings? Read on. The following explanation circulates widely on the Internet.

Encoding in Vim mainly involves three options: enc (encoding), fenc (fileencoding), and fencs (fileencodings).

fenc is the encoding of the current file. Once a file is displayed correctly in Vim (provided your system environment matches your enc setting), you can change fenc and then run :w to save the file in a different encoding. For example, :set fenc=utf-8 followed by :w saves the file as UTF-8, and :set fenc=gb18030 followed by :w saves it as GB18030. This value has no influence on whether the file is decoded correctly when it is opened.
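As a rough illustration of what changing fenc before :w means for the bytes on disk, here is a small Python sketch (Python is used only for illustration; it is not what Vim runs internally, and the sample string is made up). The same text becomes a different byte sequence under each encoding, and both decode back to identical text when read with the matching codec.

```python
# Illustration only: the same text stored under two different file encodings.
text = "Vim 编码"  # hypothetical sample content

# Roughly the text bytes written after ":set fenc=utf-8" then ":w"
utf8_bytes = text.encode("utf-8")
# Roughly the text bytes written after ":set fenc=gb18030" then ":w"
gb_bytes = text.encode("gb18030")

print(len(utf8_bytes), len(gb_bytes))

# Read back with the matching codec, both give the original text.
assert utf8_bytes.decode("utf-8") == text
assert gb_bytes.decode("gb18030") == text
```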

fencs is the list of guesses used to decode a file when it is opened. A file does not record its own encoding, so Vim can only guess. For example, the setting in my vimrc is
set fileencodings=utf-8,gb18030,utf-16,big5

Therefore, each time I open a file, Vim first tries to decode it as UTF-8. If an error occurs (meaning that some byte sequence is not valid UTF-8), Vim starts over from the beginning with gb18030. If gb18030 also fails (note that GB18030 does not have strict structural rules the way UTF-8 does, so an error here means that some code has no meaningful character assigned to it), Vim tries utf-16, and if that fails too, big5. As soon as one pass decodes the file from beginning to end without error, Vim assumes the file uses that encoding and stops trying. The fenc value is then set to the encoding that finally worked. You can check it with :set fenc?
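The trial-and-error procedure described above can be sketched in Python as follows. This is a minimal sketch, not Vim's actual implementation: the function name guess_encoding is hypothetical, and Python's codecs may accept or reject slightly different byte sequences than Vim and iconv do.

```python
def guess_encoding(data, candidates=("utf-8", "gb18030", "utf-16", "big5")):
    """Return the first candidate that decodes all of data without error.

    Mirrors the idea behind 'fileencodings': try each name in order,
    restart from the beginning on failure, stop at the first success.
    """
    for name in candidates:
        try:
            data.decode(name)  # strict decoding raises on an invalid sequence
        except UnicodeDecodeError:
            continue
        return name
    return None  # nothing in the list fits


print(guess_encoding("中文文件".encode("gb18030")))
print(guess_encoding("中文文件".encode("utf-8")))
```

The returned name plays the role of the value Vim stores in fenc after detection.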

Of course, this guessing can go wrong. For example, if your file is GB18030 encoded but contains only one or two Chinese characters, their bytes may happen to form valid UTF-8 as well. The file is then mistakenly identified as UTF-8 and decoded incorrectly.
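To make this failure mode concrete, the following Python sketch builds a short GB18030 file whose only non-ASCII character happens to use bytes that are also valid UTF-8. The byte pair 0xC2 0xA0 is chosen for illustration: it is a two-byte GB18030 character and, read as UTF-8, a non-breaking space. The surrounding text is made up.

```python
# 0xC2 0xA0 is a valid two-byte GB18030 sequence and also valid UTF-8 (U+00A0).
ambiguous = b"\xc2\xa0"
hanzi = ambiguous.decode("gb18030")  # the character the author actually typed

# A mostly-ASCII file saved as GB18030 with that single Chinese character.
gb_file = ("abc " + hanzi + " def").encode("gb18030")

# UTF-8 decoding succeeds, so a list that tries utf-8 first stops here...
wrong = gb_file.decode("utf-8")
# ...even though the intended reading is GB18030.
right = gb_file.decode("gb18030")

# No error was raised, yet the two readings disagree.
assert wrong != right
print(repr(wrong), repr(right))
```

In Vim, the usual remedy when this happens is to reopen the file with an explicit encoding, for example :e ++enc=gb18030.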

enc is mainly about display. Whatever encoding the file is stored in, Vim converts it to the current system encoding for processing so that it displays correctly on the current system; that is the job of enc. On Chinese Windows the default value of enc is cp936, the system's default encoding, so enc does not need to be changed there. On Linux your system locale may be zh_CN.GB18030 or zh_CN.UTF-8, so enc should be set to gb18030 or utf-8 accordingly (or gbk).

Finally, a word about the default encoding of a new, empty file. It seems that the first encoding in fencs is used as the default for new files. This creates a conflict, because the order of fencs has a large effect on the detection success rate, and in my experience putting utf-8 first succeeds more often than putting gb18030 first. So what if I want new files to be gb18030 by default? One method is to run :set fenc=gb18030 after creating each file, but I found that putting set fenc=gb18030 in vimrc achieves the same effect.

In addition, someone on the Ubuntu Chinese forum proposed the following ready-made configuration.

Paste all of the code below directly into a terminal to run it!
Install the programs
Code:
sudo apt-get install vim-gtk vim-doc cscope

Create a launcher entry
Code:

cat > /usr/share/applications/gvim.desktop << ...
(The body of this desktop entry is missing from this copy; only the start of the command survives.)

Configuration file for a UTF-8 locale
Code:

cat > $HOME/.vimrc << "EOF"
"=========================================================================
" Project: gvim configuration file
" Author: yonsan [QQ: 82555472]
" Installation: sudo apt-get install vim-gtk
" Usage: copy this file (.vimrc) to $HOME/
"=========================================================================

" Use the murphy color scheme
colo murphy
" Set the font list for the GUI
set guifont=SimSun\ 10
"
set nocompatible
" Set the file browser directory to the directory of the current buffer
set bsdir=buffer
set autochdir
" Set the internal encoding
set enc=utf-8
" Set the file encoding
set fenc=utf-8
" Set the list of encodings tried when detecting a file's encoding
set fencs=utf-8,ucs-bom,gb18030,gbk,gb2312,cp936
" Specify the menu language
set langmenu=zh_CN.UTF-8
source $VIMRUNTIME/delmenu.vim
source $VIMRUNTIME/menu.vim
" Set syntax highlighting
set syn=cpp
" Show line numbers
set nu!
" Highlight search results
set hlsearch
" Tab width
set tabstop=4
set cindent shiftwidth=4
set autoindent shiftwidth=4
" C/C++ comments
set comments=://
" Corrected automatic C-style comment handling
set comments=s1:/*,mb:*,ex0:/
" Enhanced tag search
set tags=./tags,./../tags,./**/tags
" File formats used when saving
set fileformats=unix,dos
" Keyboard operation
map <Up> gk
map <Down> gj
" Command-line height
set cmdheight=1
" Use cscope
if has("cscope")
    set csprg=/usr/bin/cscope
    set csto=0
    set cst
    set nocsverb
    " Add any database in the current directory
    if filereadable("cscope.out")
        cs add cscope.out
    " Else add the database pointed to by the environment
    elseif $CSCOPE_DB != ""
        cs add $CSCOPE_DB
    endif
    set csverb
endif
" Chinese help
if version > 603
    set helplang=cn
endif
EOF

Configuration file for the zh_CN.GBK locale
Code:

cat > $HOME/.vimrc << "EOF"
"=========================================================================
" Project: gvim configuration file
" Author: yonsan [QQ: 82555472]
" Installation: sudo apt-get install vim-gtk
" Usage: copy this file (.vimrc) to $HOME/
"=========================================================================

" Use the murphy color scheme
colo murphy
" Set the font list for the GUI
set guifont=SimSun\ 10
"
set nocompatible
" Set the file browser directory to the directory of the current buffer
set bsdir=buffer
set autochdir
" Set the internal encoding
set enc=chinese
" Set the file encoding
set fenc=chinese
" Set the list of encodings tried when detecting a file's encoding
set fencs=gbk,utf-8,ucs-bom,gb18030,gb2312,cp936
" Specify the menu language
set langmenu=zh_CN.GBK
source $VIMRUNTIME/delmenu.vim
source $VIMRUNTIME/menu.vim
" Set syntax highlighting
set syn=cpp
" Show line numbers
set nu!
" Highlight search results
set hlsearch
" Tab width
set tabstop=4
set cindent shiftwidth=4
set autoindent shiftwidth=4
" C/C++ comments
set comments=://
" Corrected automatic C-style comment handling
set comments=s1:/*,mb:*,ex0:/
" Enhanced tag search
set tags=./tags,./../tags,./**/tags
" File formats used when saving
set fileformats=unix,dos
" Keyboard operation
map <Up> gk
map <Down> gj
" Command-line height
set cmdheight=1
" Use cscope
if has("cscope")
    set csprg=/usr/bin/cscope
    set csto=0
    set cst
    set nocsverb
    " Add any database in the current directory
    if filereadable("cscope.out")
        cs add cscope.out
    " Else add the database pointed to by the environment
    elseif $CSCOPE_DB != ""
        cs add $CSCOPE_DB
    endif
    set csverb
endif
" Chinese help
if version > 603
    set helplang=cn
endif
EOF


Like all popular text editors, Vim handles files in many character encodings well, including popular Unicode encodings such as UCS-2 and UTF-8. Unfortunately, as with much software from the Linux world, you have to configure this yourself.

Vim has four options related to character encoding: encoding, fileencoding, fileencodings, and termencoding (for the possible values of these options, see Vim's online help, :help encoding-names). Their meanings are as follows:

encoding: The character encoding Vim uses internally, covering buffers, menu text, and message text. By default it is chosen from your locale. The user manual recommends changing it only in .vimrc, and in fact it only makes sense to change it there. You can still edit and save files in another encoding. For example, if Vim's encoding is utf-8 and the file being edited is in cp936, Vim automatically converts the file to utf-8 when reading it (so that Vim can work with it) and converts it back to cp936 (the encoding the file is stored in) when writing it.

fileencoding: The character encoding of the file currently being edited in Vim. When saving, Vim writes the file in this encoding (whether or not the file is new).

fileencodings: The ordered list of encodings Vim uses to detect fileencoding automatically. When opening a file, Vim tests the encodings in this list one by one and sets fileencoding to the one that is finally detected. It is therefore best to put the Unicode encodings at the beginning of this list and the Latin encoding latin1 at the very end.

termencoding: The character encoding of the terminal (or Windows console window) in which Vim runs. If the terminal uses the same encoding as Vim's encoding, you do not need to set it. Otherwise, set termencoding so that Vim converts automatically to the terminal's encoding. This option has no effect for gvim in the usual GUI mode on Windows. For Vim in console mode it is the code page of the Windows console, and it usually does not need to be changed.
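The advice to keep latin1 at the end of fileencodings follows from the fact that latin1 assigns a character to every possible byte value, so it never fails and nothing after it is ever tried. A small Python sketch of this, with a hypothetical helper first_match and Python codecs standing in for Vim's converters:

```python
def first_match(data, candidates):
    """Return the first encoding name that decodes data without error."""
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            pass
    return None


# latin-1 accepts every byte value, so it can never be ruled out.
assert len(bytes(range(256)).decode("latin-1")) == 256

gbk_bytes = "简体中文".encode("gbk")
# With latin-1 first, detection stops immediately with the wrong answer.
print(first_match(gbk_bytes, ["latin-1", "utf-8", "gbk"]))
# With latin-1 last, the stricter encodings get a chance to reject or accept.
print(first_match(gbk_bytes, ["utf-8", "gbk", "latin-1"]))
```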

That covers this pile of options, which easily confuses new users. Now let's look at how Vim's support for multiple character encodings actually works.

1. Vim starts and sets the encoding of buffers, menu text, and message text according to the encoding value set in .vimrc.

2. Vim reads the file to be edited and tests its encoding against the encodings listed in fileencodings, one by one. It sets fileencoding to the detected encoding that looks correct (Note 1).

3. Vim compares fileencoding with encoding. If they differ, it calls iconv to convert the file content to the encoding given by encoding and places the converted content in the buffer opened for this file. The file can now be edited. Note: on Windows this step calls the external iconv.dll (Note 2), so make sure that file exists in $VIMRUNTIME or in another directory listed in the PATH environment variable.

4. When the file is saved after editing, Vim compares fileencoding with encoding again. If they differ, it calls iconv again to convert the buffer text to the encoding given by fileencoding and writes the result to the file. This also requires iconv.dll.

Because Unicode can represent the characters of almost every language, and UTF-8 is a very economical Unicode encoding (for text that is mostly ASCII it takes less space than UCS-2), we recommend setting encoding to utf-8. Another reason is that Vim detects file encodings more accurately when encoding is set to utf-8 (perhaps this is the main reason). Files edited on Chinese Windows should still be stored as GB2312/GBK to stay compatible with other software, so we recommend setting fileencoding to chinese (chinese is an alias that means euc-cn, the GB2312 encoding, on Unix and cp936, the GBK code page, on Windows).
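Steps 2 to 4 above can be sketched as a read, convert, edit, convert, write round trip. The Python sketch below is an illustration under stated assumptions, not Vim's code: the constants INTERNAL and ON_DISK stand in for encoding and fileencoding, the file name and contents are made up, and Python's codecs stand in for iconv.

```python
import os
import tempfile

INTERNAL = "utf-8"  # plays the role of 'encoding'
ON_DISK = "cp936"   # plays the role of 'fileencoding'

# Create a small cp936 file to work on (hypothetical sample content).
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "wb") as f:
    f.write("你好, Vim\n".encode(ON_DISK))

# Step 3: the encodings differ, so convert the file content on read.
with open(path, "rb") as f:
    raw = f.read()
buffer = raw.decode(ON_DISK).encode(INTERNAL)

# Editing happens entirely in the internal encoding.
buffer += "再见\n".encode(INTERNAL)

# Step 4: convert back to the on-disk encoding when saving.
with open(path, "wb") as f:
    f.write(buffer.decode(INTERNAL).encode(ON_DISK))

with open(path, "rb") as f:
    saved = f.read()
print(saved.decode(ON_DISK))
```

The file stays in cp936 from the point of view of other programs, while everything in between is handled as UTF-8.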

In my .vimrc (see Appendix 1), the character encoding settings are flexible: they use the system environment variable $LANG (%LANG% on Windows) to choose a suitable encoding automatically. In that case it is recommended to set %LANG%=zh_CN.UTF-8, which can be done conveniently with a Windows registry script file.

Note 1: In fact, Vim's detection accuracy is not high, especially when encoding is not set to utf-8. We therefore strongly recommend setting encoding to utf-8, although this may cause another minor problem if you want Vim to display Chinese menus and messages.

Note 2: A Win32 build of iconv can be downloaded from the GNU FTP mirror (http://mirrors.kernel.org/gnu/libiconv/libiconv-1.9.1.bin.woe32.zip). Downloading libiconv from GnuWin32 (http://gnuwin32.sourceforge.net/) is not recommended, because that version is older and the DLL file has to be renamed.

Note 3: See the help, :h iconv-dynamic

On MS-Windows Vim can be compiled with the |+iconv/dyn| feature. This means Vim will search for the "iconv.dll" and "libiconv.dll" libraries. When neither of them can be found Vim will still work but some conversions won't be possible.

Appendix 1: vimrc file

" Multi-encoding setting, MUST BE IN THE BEGINNING OF .vimrc!
"
if has("multi_byte")
    " When 'fileencodings' starts with 'ucs-bom', don't do this manually
    "set bomb
    set fileencodings=ucs-bom,chinese,taiwan,japan,korea,utf-8,latin1
    " CJK environment detection and corresponding setting
    if v:lang =~ "^zh_CN"
        " Simplified Chinese, on Unix euc-cn, on MS-Windows cp936
        set encoding=chinese
        set termencoding=chinese
        if &fileencoding == ''
            set fileencoding=chinese
        endif
    elseif v:lang =~ "^zh_TW"
        " Traditional Chinese, on Unix euc-tw, on MS-Windows cp950
        set encoding=taiwan
        set termencoding=taiwan
        if &fileencoding == ''
            set fileencoding=taiwan
        endif
    elseif v:lang =~ "^ja_JP"
        " Japanese, on Unix euc-jp, on MS-Windows cp932
        set encoding=japan
        set termencoding=japan
        if &fileencoding == ''
            set fileencoding=japan
        endif
    elseif v:lang =~ "^ko"
        " Korean, on Unix euc-kr, on MS-Windows cp949
        set encoding=korea
        set termencoding=korea
        if &fileencoding == ''
            set fileencoding=korea
        endif
    endif
    " Detect UTF-8 locale, and override CJK setting if needed
    if v:lang =~ "utf8$" || v:lang =~ "UTF-8$"
        set encoding=utf-8
    endif
else
    echoerr 'Sorry, this version of (g)Vim was not compiled with "multi_byte"'
endif
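The decision logic of the vimrc above can be restated compactly, which makes it easy to check what a given locale string leads to. The Python sketch below is only a restatement for illustration: the function name pick_encoding is hypothetical, and it assumes case-sensitive pattern matching, whereas Vim's =~ operator follows the 'ignorecase' setting.

```python
import re

# Locale prefix -> Vim encoding alias, in the order the vimrc tests them.
CJK_RULES = [
    (r"^zh_CN", "chinese"),
    (r"^zh_TW", "taiwan"),
    (r"^ja_JP", "japan"),
    (r"^ko", "korea"),
]


def pick_encoding(lang):
    """Return the 'encoding' value the vimrc would set for this locale string.

    Returns None when the vimrc would leave Vim's default untouched.
    """
    chosen = None
    for pattern, name in CJK_RULES:
        if re.search(pattern, lang):
            chosen = name
            break
    # A UTF-8 locale overrides the CJK choice, as in the final test above.
    if re.search(r"(utf8|UTF-8)$", lang):
        chosen = "utf-8"
    return chosen


for lang in ("zh_CN.GBK", "zh_CN.UTF-8", "ja_JP.eucJP", "en_US.ISO-8859-1"):
    print(lang, "->", pick_encoding(lang))
```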

Appendix 2:

Supported 'encoding' values are: *encoding-values*
1   latin1        8-bit characters (ISO 8859-1)
1   iso-8859-n    ISO_8859 variant (n = 2 to 15)
1   koi8-r        Russian
1   koi8-u        Ukrainian
1   macroman      MacRoman (Macintosh encoding)
1   8bit-{name}   any 8-bit encoding (Vim specific name)
1   cp437         similar to iso-8859-1
1   cp737         similar to iso-8859-7
1   cp775         Baltic
1   cp850         similar to iso-8859-4
1   cp852         similar to iso-8859-1
1   cp855         similar to iso-8859-2
1   cp857         similar to iso-8859-5
1   cp860         similar to iso-8859-9
1   cp861         similar to iso-8859-1
1   cp862         similar to iso-8859-1
1   cp863         similar to iso-8859-8
1   cp865         similar to iso-8859-1
1   cp866         similar to iso-8859-5
1   cp869         similar to iso-8859-7
1   cp874         Thai
1   cp1250        Czech, Polish, etc.
1   cp1251        Cyrillic
1   cp1253        Greek
1   cp1254        Turkish
1   cp1255        Hebrew
1   cp1256        Arabic
1   cp1257        Baltic
1   cp1258        Vietnamese
1   cp{number}    MS-Windows: any installed single-byte codepage
2   cp932         Japanese (Windows only)
2   euc-jp        Japanese (Unix only)
2   sjis          Japanese (Unix only)
2   cp949         Korean (Unix and Windows)
2   euc-kr        Korean (Unix only)
2   cp936         simplified Chinese (Windows only)
2   euc-cn        simplified Chinese (Unix only)
2   cp950         traditional Chinese (on Unix alias for big5)
2   big5          traditional Chinese (on Windows alias for cp950)
2   euc-tw        traditional Chinese (Unix only)
2   2byte-{name}  Unix: any double-byte encoding (Vim specific name)
2   cp{number}    MS-Windows: any installed double-byte codepage
u   utf-8         32 bit UTF-8 encoded Unicode (ISO/IEC 10646-1)
u   ucs-2         16 bit UCS-2 encoded Unicode (ISO/IEC 10646-1)
u   ucs-2le       like ucs-2, little endian
u   utf-16        ucs-2 extended with double-words for more characters
u   utf-16le      like utf-16, little endian
u   ucs-4         32 bit UCS-4 encoded Unicode (ISO/IEC 10646-1)
u   ucs-4le       like ucs-4, little endian

The {name} can be any encoding name that your system supports. It is passed
to iconv() to convert between the encoding of the file and the current locale.
For MS-Windows "cp{number}" means using codepage {number}.

Several aliases can be used, they are translated to one of the names above.
An incomplete list:

1   ansi          same as latin1 (obsolete, for backward compatibility)
2   japan         Japanese: on Unix "euc-jp", on MS-Windows cp932
2   korea         Korean: on Unix "euc-kr", on MS-Windows cp949
2   prc           simplified Chinese: on Unix "euc-cn", on MS-Windows cp936
2   chinese       same as "prc"
2   taiwan        traditional Chinese: on Unix "euc-tw", on MS-Windows cp950
u   utf8          same as utf-8
u   unicode       same as ucs-2
u   ucs2be        same as ucs-2 (big endian)
u   ucs-2be       same as ucs-2 (big endian)
u   ucs-4be       same as ucs-4 (big endian)
    default       stands for the default value of 'encoding', depends on the
                  environment
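The platform-dependent aliases in the list above can be read as a small lookup table. The Python sketch below encodes only the entries shown in that list; the function name resolve_alias and the "unix"/"windows" platform labels are made up for the example, and lowercasing the input is an assumption of the sketch.

```python
# Alias -> (name on Unix, name on MS-Windows), taken from the list above.
ALIASES = {
    "ansi": ("latin1", "latin1"),
    "japan": ("euc-jp", "cp932"),
    "korea": ("euc-kr", "cp949"),
    "prc": ("euc-cn", "cp936"),
    "chinese": ("euc-cn", "cp936"),  # same as "prc"
    "taiwan": ("euc-tw", "cp950"),
    "utf8": ("utf-8", "utf-8"),
    "unicode": ("ucs-2", "ucs-2"),
}


def resolve_alias(name, platform):
    """Translate an alias to the concrete encoding name for a platform.

    platform is "unix" or "windows"; names that are not aliases are
    returned unchanged (lowercased).
    """
    key = name.lower()
    if key not in ALIASES:
        return key
    on_unix, on_windows = ALIASES[key]
    return on_unix if platform == "unix" else on_windows


print(resolve_alias("chinese", "unix"))
print(resolve_alias("chinese", "windows"))
```

This is why the same set encoding=chinese line in a shared vimrc yields euc-cn on Unix but cp936 on Windows.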

 

Reprint notice: This article is from http://www.cnblogs.com/h2appy/archive/2008/08/14/1267593.html
