File encoding: viewing and converting file content and file name encodings

Source: Internet
Author: User
If you work in Linux with files created under Windows, you will often need to convert file encodings. On Simplified Chinese Windows the default text encoding is usually GBK (code page 936, a superset of GB2312), while Linux generally uses UTF-8. The following describes how to view and convert a file's encoding in Linux.
One, viewing the file encoding:
There are several ways to view the file encoding in Linux:
1. View the file encoding directly in Vim
:set fileencoding
This displays the encoding Vim is using for the current file.
If you want to view files in other encodings, or files appear garbled in Vim, add the following to ~/.vimrc:
set encoding=utf-8 fileencodings=ucs-bom,utf-8,cp936
With this setting Vim can recognize UTF-8 and GBK files automatically. It does so by trying each encoding in the fileencodings list in order and using the first one that decodes the file without error. If none fits, fileencoding is left empty and the value of encoding is used. Many configurations append latin1 to the end of the list as a catch-all, because an 8-bit encoding such as latin1 never fails.
2. View the file encoding with enca (if it is not installed, you can install it with sudo yum install -y enca)
$ enca filename
filename: Universal transformation format 8 bits; UTF-8
  CRLF line terminators
Note that enca does not recognize some GBK-encoded files reliably. In that case it reports:
Unrecognized encoding
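When neither Vim nor enca gives a clear answer, the "try each candidate in order" idea behind fileencodings can be approximated with iconv. The following is only a minimal sketch: guess_encoding is a hypothetical helper name, the sample files are created just for the demonstration, and a successful decode is a heuristic rather than proof of the encoding. UTF-8 is listed first because it is strict, whereas many byte sequences happen to be valid GBK.

```shell
#!/bin/sh
# Hypothetical helper: print the first candidate encoding under which
# the file decodes without error (the same idea as Vim's fileencodings).
guess_encoding() {
    file=$1
    shift
    for enc in "$@"; do
        if iconv -f "$enc" -t UTF-8 "$file" >/dev/null 2>&1; then
            printf '%s\n' "$enc"
            return 0
        fi
    done
    return 1
}

# Demo data: the text "中文" written once as UTF-8 and once as GBK bytes.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf '\344\270\255\346\226\207\n' > "$tmpdir/utf8.txt"
printf '\326\320\316\304\n' > "$tmpdir/gbk.txt"

guess_encoding "$tmpdir/utf8.txt" UTF-8 GBK   # prints UTF-8
guess_encoding "$tmpdir/gbk.txt" UTF-8 GBK    # prints GBK
```

The order of the candidates matters: putting GBK first would label many UTF-8 files as GBK, which is why strict encodings belong at the front of the list.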
Two, converting the file encoding:
1. Convert the file encoding directly in Vim, for example to UTF-8
:set fileencoding=utf-8
The new encoding takes effect when the file is saved with :w.
2. Convert with iconv. The command format is:
iconv -f from-encoding -t to-encoding inputfile
For example, to convert a GBK-encoded file to UTF-8:
iconv -f GBK -t UTF-8 file1 -o file2
Without -o, iconv writes the converted text to standard output.
3. Convert with enconv
For example, to convert a GBK-encoded file to UTF-8:
enconv -L zh_CN -x UTF-8 filename
enconv belongs to the enca package and converts the file in place, so keep a copy if you need the original.
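A common pitfall with iconv is redirecting the output onto the input file, which truncates the file before it is read. The sketch below shows one way to convert a file in place safely by going through a temporary file. convert_in_place is a hypothetical helper name, and the sample file is created only for the demonstration.

```shell
#!/bin/sh
# Hypothetical helper: convert FILE from one encoding to another in place.
# The result goes to a temporary file first, and the original is
# overwritten only if iconv succeeded.
convert_in_place() {
    from=$1 to=$2 file=$3
    tmp=$(mktemp) || return 1
    if iconv -f "$from" -t "$to" "$file" > "$tmp"; then
        # cat keeps the original file's inode and permissions.
        cat "$tmp" > "$file" && rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Demo: a file holding "中文" in GBK becomes UTF-8.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '\326\320\316\304\n' > "$workdir/sample.txt"
convert_in_place GBK UTF-8 "$workdir/sample.txt"
od -An -tx1 "$workdir/sample.txt"   # hex dump of the converted bytes
```

If the conversion fails, for instance because the file is not really GBK, the original file is left untouched.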
Three, converting the file name encoding:
When files are copied between Linux and Windows, Chinese file names sometimes appear garbled. The cause is that Chinese Windows encodes file names in GBK by default, while Linux uses UTF-8 by default. To resolve the mismatch, the file names need to be transcoded.
On Linux, the convmv tool exists specifically for this purpose. It can convert file names from GBK to UTF-8 or from UTF-8 to GBK.
First check whether convmv is installed on your system. If it is not, install it with:
yum -y install convmv
The general usage of convmv is:
convmv -f source-encoding -t target-encoding [options] filename
Common options:
-r  Recursively process subdirectories
--notest  Actually perform the rename. By default convmv only does a dry run and prints what it would do, without touching any file.
--list  Show all supported encodings
--unescape  Unescape sequences in file names, for example turning %20 into a space
For example, to convert a file name from UTF-8 to GBK, the command is as follows, where utf8-encoded-filename stands for the file to rename:
convmv -f UTF-8 -t GBK --notest utf8-encoded-filename
After this conversion the file name is encoded in GBK. Only the encoding of the file name changes; the file content is not touched.
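To make the dry-run behaviour concrete, here is a rough sketch of the same idea using only iconv and mv. It is not how convmv is implemented. rename_gbk_to_utf8 is a hypothetical helper that handles a single bare file name in the current directory, and the "apply" argument is an invented stand-in for --notest.

```shell
#!/bin/sh
# Hypothetical helper: re-encode one file name from GBK to UTF-8.
# Like convmv without --notest, it only prints the planned rename
# unless the second argument is "apply".
rename_gbk_to_utf8() {
    old=$1
    mode=${2:-dry-run}
    # The name is treated as raw bytes and passed through iconv.
    new=$(printf '%s' "$old" | iconv -f GBK -t UTF-8) || return 1
    if [ "$mode" = "apply" ]; then
        mv -- "$old" "$new"
    else
        printf 'would rename: %s -> %s\n' "$old" "$new"
    fi
}

# Demo: the name "中文.txt" as GBK bytes, dry run only (no file is touched).
gbk_name=$(printf '\326\320\316\304.txt')
rename_gbk_to_utf8 "$gbk_name"
```

For real directories convmv remains the better choice, since it also handles recursion and refuses to re-encode names that already look converted.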
Four, Vim's encoding settings
Like other popular text editors, Vim handles files in many character encodings well, including Unicode encodings such as UCS-2 and UTF-8. As with much software from the Linux world, however, you have to configure this yourself.
Vim has four options related to character encoding: encoding, fileencoding, fileencodings, and termencoding. For the possible values of these options, see Vim's online help (:help encoding-names). Their meanings are as follows:
* encoding: the character encoding Vim uses internally, covering its buffers, menu text, message text, and so on. The default is chosen according to your locale. The user manual recommends changing its value only in .vimrc, and in practice that is the only place where changing it makes sense. A file can still be edited and saved in a different encoding. For example, if encoding is utf-8 and the edited file is in cp936, Vim converts the text to utf-8 (the encoding it works in internally) when reading the file and converts it back to cp936 (the encoding the file is saved in) when writing it.
* fileencoding: the character encoding of the file currently being edited. When Vim saves the file, it writes it in this encoding, whether or not the file is new.
* fileencodings: the ordered list of encodings Vim uses to detect fileencoding. When a file is opened, Vim tries the listed encodings one by one and sets fileencoding to the first one that fits. For this reason it is best to put the Unicode encodings at the front of the list and the Latin encodings at the end.
* termencoding: the character encoding of the terminal (or Windows console window) in which Vim runs. If the terminal uses the same encoding as Vim's encoding, you do not need to set it. Otherwise, set termencoding so that Vim converts between the two automatically. The option has no effect for gVim, the GUI version commonly used under Windows. For console-mode Vim on Windows it corresponds to the console's code page, which usually does not need to be changed.
Five, how Vim handles multiple character encodings
1. Vim starts and sets the character encoding of its buffers, menu text, and messages according to the encoding value set in .vimrc.
2. Vim reads the file to be edited and tries the encodings listed in fileencodings one by one, setting fileencoding to the first one that appears to decode the file correctly.
3. Vim compares the values of fileencoding and encoding. If they differ, it calls iconv to convert the file content to the encoding given by encoding and places the converted text in the buffer opened for the file, after which editing can begin. On Windows this step calls the external iconv.dll, so make sure that file exists in $VIMRUNTIME or in a directory listed in the PATH environment variable.
4. When the file is saved after editing, Vim compares fileencoding and encoding again. If they differ, it calls iconv again to convert the buffer text to the encoding given by fileencoding and writes the result to the file. On Windows this also requires iconv.dll.
Because Unicode covers the characters of almost all languages, and UTF-8 is a space-efficient Unicode encoding (for mostly-ASCII text it takes less space than UCS-2), setting encoding to utf-8 is recommended. Another reason is that Vim detects file encodings more accurately when encoding is utf-8 (perhaps this is the main reason). For files edited on Chinese Windows, compatibility with other software usually means keeping them in GB2312/GBK, so setting fileencoding to chinese is recommended. chinese is an alias that means euc-cn (GB2312) on Unix and cp936, the code page of GBK, on Windows.
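The read and write steps above can be imitated outside Vim, with iconv standing in for Vim's internal conversion. This is only an illustration of the data flow, assuming encoding is utf-8 and fileencoding is GBK. The file names are invented for the demo, and a second file plays the role of Vim's in-memory buffer.

```shell
#!/bin/sh
# Illustration of the read/edit/write cycle, assuming encoding=utf-8
# and a file whose fileencoding is GBK (cp936 on Windows).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

file="$workdir/note.txt"
buffer="$workdir/buffer.utf8"        # stands in for Vim's buffer

printf '\326\320\n' > "$file"        # file on disk: "中" in GBK

# Read: fileencoding differs from encoding, so convert GBK to UTF-8.
iconv -f GBK -t UTF-8 "$file" > "$buffer"

# Edit: all changes happen in the internal encoding (UTF-8).
printf '\346\226\207\n' >> "$buffer" # append "文" in UTF-8

# Write: convert the buffer back to fileencoding before saving.
iconv -f UTF-8 -t GBK "$buffer" > "$file"

od -An -tx1 "$file"                  # hex dump: the saved file is GBK again
```

The file on disk never changes encoding from the point of view of other programs, even though every edit was made on UTF-8 text.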
