Viewing and converting file encodings and file name encodings on Linux

Source: Internet
Author: User
If you work on Linux with files created on Windows, you will often run into encoding conversion problems. On Chinese-language Windows the default text encoding is GBK (a superset of GB2312), while Linux generally uses UTF-8. The following describes how to view a file's encoding on Linux and how to convert it.

I. Viewing the file encoding

You can view the file encoding in Linux in the following ways:
1. View the file encoding directly in Vim:

:set fileencoding

The file's encoding is displayed.
If Vim shows garbled characters, or you want it to open files in other encodings correctly, add the following to your ~/.vimrc file:

set encoding=utf-8 fileencodings=ucs-bom,utf-8,cp936

Vim can then identify the file encoding automatically (UTF-8 and GBK files in this case). In fact, it simply tries the encodings in the fileencodings list in order; if none of them fits, fileencoding is left empty and the file is read using the value of encoding.

2. Use enca to view the file encoding (if the command is not installed on your system, you can install it with sudo yum install -y enca):

$ enca filename
filename: Universal transformation format 8 bits; UTF-8
  CRLF line terminators
Note that enca does not recognize some GBK-encoded files well. For such files it prints:
Unrecognized encoding
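
When neither Vim nor enca gives a usable answer, the try-each-encoding-in-order idea behind fileencodings can be imitated with iconv, which exits with an error when the input is not valid in the encoding it was told to expect. The following is a minimal sketch under that assumption. The function name guess_encoding and its default candidate list are hypothetical and are not part of any tool mentioned here.

```shell
# guess_encoding FILE [ENCODING...]
# Prints the first candidate encoding in which FILE decodes without error.
# Many byte sequences are valid in more than one encoding, so the order of
# the candidates matters: strict encodings first, permissive ones last.
guess_encoding() {
    file=$1
    shift
    # Default candidate list (an assumption made for this sketch).
    [ "$#" -gt 0 ] || set -- UTF-8 GBK
    for enc in "$@"; do
        # iconv exits non-zero when FILE is not valid in $enc.
        if iconv -f "$enc" -t UTF-8 "$file" >/dev/null 2>&1; then
            printf '%s\n' "$enc"
            return 0
        fi
    done
    return 1    # no candidate matched
}
```

Calling guess_encoding notes.txt prints the name of the first matching candidate. This is a guess, not a proof: a pure ASCII file matches every candidate, so it is reported as the first one in the list.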
II. File encoding conversion

1. Convert the file encoding directly in Vim. For example, to convert a file to UTF-8, run the following and then save the file with :w

:set fileencoding=utf-8

2. Convert with iconv. The iconv command format is as follows:

iconv -f from-encoding -t to-encoding inputfile

For example, to convert a GBK-encoded file into UTF-8:

iconv -f GBK -t UTF-8 file1 -o file2

3. Convert with enconv. For example, to convert a GBK-encoded file into UTF-8:

enconv -L zh_CN -x UTF-8 filename

III. File name encoding conversion
If you copy files from Linux to Windows, or from Windows to Linux, Chinese file names may become garbled. The cause is that Chinese file names coming from Windows are GBK-encoded by default, while file names on Linux are UTF-8 by default. Because the encodings differ, the names are garbled. To solve this, you need to transcode the file names. Linux provides a tool for this, convmv, which can convert file names from GBK to UTF-8 or from UTF-8 to GBK. First check whether convmv is installed on your system. If not, install it with:

yum -y install convmv

The usage of convmv is:

convmv -f source-encoding -t new-encoding [options] filename

Common options:
-r          process subfolders recursively
--notest    actually rename the files; by default convmv only runs a test and changes nothing
--list      list all supported encodings
--unescape  convert escape sequences, for example %20 into a space

For example, to convert a UTF-8 encoded file name to GBK:

convmv -f UTF-8 -t GBK --notest utf8-encoded-file-name
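
When convmv is not available, a similar rename can be approximated by running iconv on the file name itself. The following is only a rough sketch under that assumption. rename_to_utf8 and its apply argument are hypothetical names invented here and are not part of convmv. The function handles a single path and only prints the planned rename unless apply is given, mirroring convmv's test-by-default behaviour.

```shell
# rename_to_utf8 FROM_ENCODING PATH [apply]
# Prints "old -> new"; the rename itself happens only when the third
# argument is "apply". Only the last path component is converted.
rename_to_utf8() {
    from=$1 target=$2 mode=${3:-dry-run}
    dir=$(dirname "$target")
    old=$(basename "$target")
    # Transcode the name bytes; fail if they are not valid in $from.
    new=$(printf '%s' "$old" | iconv -f "$from" -t UTF-8) || return 1
    if [ "$old" = "$new" ]; then
        return 0    # nothing to do, for example a pure ASCII name
    fi
    printf '%s -> %s\n' "$target" "$dir/$new"
    if [ "$mode" = apply ]; then
        mv "$target" "$dir/$new"
    fi
}
```

A cautious way to use it is to run rename_to_utf8 GBK ./some-file first, check the printed result, and then repeat the call with apply as the third argument. The sketch does not check whether a name is already UTF-8, so running it a second time on the same file could mangle the name. It also assumes the path does not begin with a dash.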
The convmv command above converts the "utf8 encoded file name" to GBK. Only the file name is converted; the file content is not changed.

IV. Vim encoding

Like all popular text editors, Vim can edit files in a wide variety of character encodings, including popular Unicode encodings such as UCS-2 and UTF-8. Unfortunately, like a lot of software from the Linux world, you have to configure this yourself. Vim has four options related to character encoding: encoding, fileencoding, fileencodings, and termencoding (for the possible values of these options, see Vim's online help, :help encoding-names). Their meanings are as follows:

* encoding: the character encoding Vim uses internally, including for buffers, menu text, and message text. The default depends on your locale. The user manual recommends changing it only in .vimrc; in fact, .vimrc seems to be the only place where changing it makes sense. You can still edit and save files in a different encoding. For example, if Vim's encoding is UTF-8 and the file being edited is cp936, Vim automatically converts the file to UTF-8 when reading it (so that Vim can work with it) and converts it back to cp936 (the encoding the file is stored in) when writing it.
* fileencoding: the character encoding of the file currently being edited. When saving, Vim writes the file in this encoding, whether or not the file is new.
* fileencodings: the list of encodings Vim uses for automatic detection. When opening a file, Vim tries the encodings in this list one by one and sets fileencoding to the first one that fits. It is therefore best to put the Unicode encodings at the beginning of the list and the Latin encoding latin1 at the end.
* termencoding: the character encoding of the terminal Vim runs in (or of the Windows console window). If it is the same as encoding, you do not need to set it. Otherwise, Vim uses termencoding to convert automatically between the two. The option has no effect for gvim in ordinary GUI mode on Windows; for Vim in console mode it is the code page of the Windows console, which usually does not need to be changed.

V. How Vim handles multiple character encodings

1. Vim starts and sets the encoding of buffers, menu text, and message text according to the encoding value in .vimrc.
2. Vim reads the file to be edited and tries the encodings listed in fileencodings one by one, then sets fileencoding to the detected encoding that appears to be correct (note 1).
3. Vim compares fileencoding with encoding. If they differ, it calls iconv to convert the file content to the encoding named by encoding and puts the converted content into the buffer opened for the file. The file can now be edited. Note: this step calls the external iconv.dll (note 2), which must exist in $VIMRUNTIME or in another directory listed in the PATH environment variable.
4. When the file is saved after editing, Vim compares fileencoding with encoding again. If they differ, it calls iconv again to convert the buffer text to the encoding named by fileencoding and writes it to the file. This also requires iconv.dll.

Because Unicode can represent the characters of almost all languages, and UTF-8 is a space-efficient Unicode encoding (for mostly ASCII text it takes less space than UCS-2), we recommend setting encoding to utf-8. Another reason is that Vim detects file encodings more accurately when encoding is utf-8 (this may well be the main reason). When editing files on Chinese Windows, it is better to keep the file encoding as GB2312/GBK for compatibility with other software, so we recommend setting fileencoding to chinese (chinese is an alias: gb2312 on Unix, and cp936, that is GBK, on Windows).
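
The read and write halves of the workflow described above (steps 3 and 4) can be imitated outside Vim to see why the internal encoding and the file encoding may differ. The sketch below is an illustration only and does not show how Vim is implemented. edit_in_encoding is a hypothetical helper that takes the file's encoding as given (step 2 is skipped), uses UTF-8 as the internal "buffer" encoding, and applies a sed script in place of interactive editing.

```shell
# edit_in_encoding FILEENCODING FILE SED_SCRIPT
# Read:  decode FILE from FILEENCODING into a UTF-8 buffer.
# Edit:  apply SED_SCRIPT to the buffer, which is always UTF-8.
# Write: encode the edited buffer back into FILEENCODING.
# FILE is overwritten only if all three stages succeed.
edit_in_encoding() {
    fenc=$1 file=$2 script=$3
    work=$(mktemp -d "${TMPDIR:-/tmp}/encsketch.XXXXXX") || return 1
    rc=1
    if iconv -f "$fenc" -t UTF-8 "$file" > "$work/buffer" &&
       sed "$script" "$work/buffer" > "$work/edited" &&
       iconv -f UTF-8 -t "$fenc" "$work/edited" > "$work/out"
    then
        # Write through the existing file so its permissions are kept.
        cat "$work/out" > "$file" && rc=0
    fi
    rm -rf "$work"
    return "$rc"
}
```

For example, edit_in_encoding GBK report.txt 's/old/new/' edits a GBK file while the edit script itself only ever sees UTF-8 text, which is the same division of labour as between encoding and fileencoding. The file name report.txt is a placeholder.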
