File encoding: converting file content and file name encodings
Source: Internet
Author: User
If you work in Linux with files that were created in Windows, you will frequently run into file encoding conversion problems. On Chinese-language Windows the default file encoding is GBK (a superset of GB2312), while Linux generally uses UTF-8. The following describes how to view a file's encoding in Linux and how to convert it.

1. Viewing the file encoding

You can view a file's encoding in Linux in the following ways:

(1) In Vim, run

    :set fileencoding

to display the encoding of the current file. If you want Vim to open files in other encodings correctly, or need to fix garbled text when viewing a file in Vim, add the following to ~/.vimrc:

    set encoding=utf-8 fileencodings=ucs-bom,utf-8,cp936

With this setting Vim can identify UTF-8 and GBK files automatically. It does so by trying the encodings in the fileencodings list in order. If none of them fits, fileencoding is left empty and the file is read using the value of encoding. Many configurations therefore put latin1 at the end of the list, because latin1 accepts any byte sequence. Note that latin1 is not the same as ASCII.

(2) Use enca (if it is not installed, you can install it with sudo yum install -y enca):

    $ enca filename
    filename: Universal transformation format 8 bits; UTF-8
      CRLF line terminators

Note that enca does not recognize some GBK-encoded files well. In that case it reports:

    Unrecognized encoding

2. Converting the file encoding

(1) Convert directly in Vim. For example, to convert a file to UTF-8, set the option and then save the file:

    :set fileencoding=utf-8

(2) Use iconv. The command format is:

    iconv -f encoding -t encoding inputfile

For example, to convert a GBK-encoded file to UTF-8:

    iconv -f GBK -t UTF-8 file1 -o file2

(3) Use enconv. For example, to convert a GBK-encoded file to UTF-8:

    enconv -L zh_CN -x UTF-8 filename

3. Converting file name encodings

When you copy files from Linux to Windows or from Windows to Linux, Chinese file names may come out garbled. This happens because Chinese Windows encodes file names in GBK by default, while Linux uses UTF-8 by default. Because the encodings differ, the file names are garbled, and the fix is to transcode them.

In Linux, the tool convmv converts file names from GBK to UTF-8 or from UTF-8 to GBK. First check whether convmv is installed on your system. If it is not, install it with:

    yum -y install convmv

The usage of convmv is:

    convmv -f source-encoding -t target-encoding [options] filename

Common options:

    -r          process subfolders recursively
    --notest    actually rename the files. By default convmv only performs a test run and does not touch any file.
    --list      show all supported encodings
    --unescape  convert escape sequences, for example %20 into a space

For example, to convert a UTF-8 encoded file name to GBK:

    convmv -f UTF-8 -t GBK --notest utf8-encoded-filename

After this command the file name is encoded in GBK. Only the encoding of the file name is converted; the file content does not change.

4. Vim's encoding options

Like all popular text editors, Vim can edit files in many character encodings, including popular Unicode encodings such as UCS-2 and UTF-8. Unfortunately, like a lot of software from the Linux world, it needs to be configured by hand. Vim has four options related to character encoding: encoding, fileencoding, fileencodings, and termencoding.
For the possible values of these options, see Vim's online help (:help encoding-names). Their meanings are as follows:

* encoding: the character encoding Vim uses internally, including for buffers, menu text, and message text. By default it is chosen from your locale. The user manual recommends changing it only in .vimrc, and in fact it only makes sense to change it there. You can still edit and save files in a different encoding. For example, if Vim's encoding is UTF-8 and the file being edited is in cp936, Vim automatically converts the file to UTF-8 when reading it (so that Vim can work with it) and converts it back to cp936 (the file's storage encoding) when writing it.

* fileencoding: the character encoding of the file currently being edited. When saving, Vim writes the file in this encoding, whether or not it is a new file.

* fileencodings: the list of encodings Vim tries when detecting fileencoding automatically. When a file is opened, Vim probes it against the listed encodings one by one and sets fileencoding to the encoding that is finally detected. It is therefore best to put the Unicode encodings at the beginning of the list and latin1 at the end.

* termencoding: the character encoding of the terminal (or Windows console window) in which Vim runs. If the terminal uses the same encoding as Vim, you do not need to set it. Otherwise, set termencoding and Vim converts its output to the terminal's encoding automatically. This option has no effect for gVim in the usual GUI mode on Windows. For Vim in console mode it is the code page of the Windows console, and it usually does not need to be changed.

5. How Vim handles multiple character encodings

1. Vim starts and sets the encoding of buffers, menu text, and message text according to the value of encoding in .vimrc.

2. Vim reads the file to be edited and probes its encoding against the encodings listed in fileencodings, one by one. It sets fileencoding to the detected encoding that looks correct (note 1).

3. Vim compares the values of fileencoding and encoding. If they differ, it calls iconv to convert the file content to the encoding given by encoding and places the converted content in the buffer opened for this file. The file can now be edited. On Windows this step calls the external iconv.dll (note 2), so make sure that this file exists in $VIMRUNTIME or in another directory listed in the PATH environment variable.

4. When the file is saved after editing, Vim compares the values of fileencoding and encoding again. If they differ, it calls iconv again to convert the text in the buffer to the encoding given by fileencoding and writes it to the specified file. This step also needs iconv.dll.

Because Unicode covers the characters of almost all languages, and UTF-8 is an economical Unicode encoding (for text that is mostly ASCII it takes less space than UCS-2), we recommend setting encoding to utf-8. Another reason is that Vim detects file encodings more accurately when encoding is set to utf-8 (perhaps this is the main reason).

Files edited on Chinese Windows are usually stored as GB2312/GBK to stay compatible with other software. For such files it is therefore recommended to set fileencoding to chinese. chinese is an alias that means gb2312 on Unix and cp936, the GBK code page, on Windows.
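The probe-then-convert steps above can be sketched outside Vim in Python. This is only an illustration under stated assumptions, not Vim's actual implementation: the names detect_encoding, convert_file, and CANDIDATES are hypothetical, Python's built-in codecs stand in for iconv, and only the UTF-8 byte-order mark is checked where Vim's ucs-bom entry also covers other Unicode BOMs.

```python
import codecs
import os
import tempfile

# Hypothetical candidate list in the spirit of
# fileencodings=ucs-bom,utf-8,cp936 (strict encodings first).
CANDIDATES = ("utf-8", "cp936")


def detect_encoding(data: bytes, candidates=CANDIDATES) -> str:
    """Return the first encoding that decodes data without errors."""
    # Check for a UTF-8 byte-order mark first (simplified ucs-bom step).
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so it never fails.
    return "latin-1"


def convert_file(src: str, dst: str, target: str = "utf-8") -> str:
    """Re-encode src into dst and return the detected source encoding."""
    with open(src, "rb") as f:
        data = f.read()
    source = detect_encoding(data)
    text = data.decode(source)          # like converting into 'encoding'
    with open(dst, "wb") as f:
        f.write(text.encode(target))    # like writing in 'fileencoding'
    return source


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        gbk_path = os.path.join(tmp, "gbk.txt")
        utf8_path = os.path.join(tmp, "utf8.txt")
        with open(gbk_path, "wb") as f:
            f.write("中文".encode("cp936"))
        # Prints the encoding that was detected for the GBK sample file.
        print(convert_file(gbk_path, utf8_path))
```

The order of the candidate list matters for the same reason given for fileencodings: UTF-8 rejects most byte sequences that are not valid UTF-8, so it is tried first, while the permissive latin-1 fallback comes last.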