Linux: viewing file encodings, converting file encodings, and converting file name encodings

Source: Internet
Author: User

Linux related, 2008-10-07 10:46

If you work in Linux with files created under Windows, you will often run into file encoding problems. On Chinese-language Windows the default text encoding is GBK (a superset of GB2312), while Linux generally uses UTF-8. This article shows how to view a file's encoding in Linux and how to convert file contents and file names between encodings.


View File Encoding
There are several ways to view a file's encoding in Linux:
1. View the file encoding directly in Vim
:set fileencoding
This displays the encoding of the current file.
If you mainly want to open files in other encodings, or want to fix files that look garbled in Vim, add the following to your ~/.vimrc file:

set encoding=utf-8 fileencodings=ucs-bom,utf-8,cp936

With this setting Vim identifies the file encoding automatically (here, UTF-8 or GBK). In practice it simply tries the encodings listed in fileencodings in order and uses the first one that decodes the file without errors. If none of them fits, Vim leaves fileencoding empty and reads the file using the value of encoding; add latin1 at the end of the list if you want Latin-1 as the final fallback (Latin-1 is an 8-bit superset of ASCII, not ASCII itself).
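
The ordered probing described above can be illustrated outside Vim. The following is a minimal Python sketch, not Vim's actual implementation: the function name guess_encoding and the BOM table are assumptions made for illustration, and Python's codecs stand in for Vim's converters. It checks for a byte order mark first (roughly the role of the ucs-bom entry) and then returns the first candidate that decodes the data without errors.

```python
import codecs

# BOM signatures checked first, roughly what the "ucs-bom" entry stands for.
# UTF-32 is listed before UTF-16 because the UTF-32 LE BOM begins with the
# same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data, candidates=("utf-8", "cp936")):
    """Return the first encoding that fits the bytes, or None if none does.

    Hypothetical helper: it mimics the order-dependent probing of
    fileencodings, it is not what Vim runs internally.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate does not fit, try the next one
        return name
    return None


# Usage: the same two characters stored in two different encodings.
utf8_guess = guess_encoding("\u4e2d\u6587".encode("utf-8"))
gbk_guess = guess_encoding("\u4e2d\u6587".encode("gbk"))
```

The order of the candidates matters: a permissive 8-bit encoding placed early in the list would accept almost any input and hide the encodings listed after it.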



File Encoding Conversion
1. Convert the file encoding directly in Vim, for example to UTF-8. The new encoding is applied when the file is written (:w).
:set fileencoding=utf-8



2. Convert with iconv. The iconv command format is as follows:
iconv -f FROM_ENCODING -t TO_ENCODING inputfile
For example, to convert a GBK encoded file into UTF-8 (without -o, iconv writes the converted text to standard output):
iconv -f gbk -t UTF-8 file1 -o file2
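
For scripts that cannot rely on iconv being available, the same conversion can be sketched in Python. This is a minimal illustration under stated assumptions, not a replacement for iconv: convert_file is a hypothetical helper name, the whole file is read into memory, and the file names file1 and file2 simply mirror the command above.

```python
import tempfile
from pathlib import Path


def convert_file(src, dst, from_enc, to_enc):
    """Re-encode src into dst, similar in spirit to
    `iconv -f from_enc -t to_enc src -o dst` (hypothetical helper)."""
    # Working on bytes avoids any newline translation; decode() raises
    # UnicodeDecodeError if the input is not valid in from_enc.
    text = Path(src).read_bytes().decode(from_enc)
    Path(dst).write_bytes(text.encode(to_enc))


# Usage: convert a small GBK file to UTF-8 inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    file1 = Path(tmp) / "file1"
    file2 = Path(tmp) / "file2"
    file1.write_bytes("\u4e2d\u6587\r\n".encode("gbk"))
    convert_file(file1, file2, "gbk", "utf-8")
    converted = file2.read_bytes()
```

Writing to a separate output file, as the iconv example does, keeps the original intact if the input turns out not to be in the assumed source encoding.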





File Name Encoding Conversion



When you copy files from Linux to Windows or from Windows to Linux, Chinese file names sometimes come out garbled. The cause is that Chinese Windows encodes file names in GBK by default, while Linux defaults to UTF-8. Because the encodings differ, the names are displayed incorrectly, and the fix is to transcode the file names.

Linux has a tool made specifically for this, convmv, which can convert file names from GBK to UTF-8 or from UTF-8 to GBK.

First check whether convmv is installed on your system. If it is not, install it with:
yum -y install convmv



Here is the specific usage of convmv:

convmv -f SOURCE_ENCODING -t NEW_ENCODING [options] FILENAME

Common options:
-r recursively process subdirectories
--notest actually perform the rename. Note that by default convmv only does a test run and reports what it would do, without renaming anything.
--list display all supported encodings
--unescape unescape names, for example turn %20 into a space

For example, to convert a UTF-8 encoded file name to GBK encoding, the command is as follows:

convmv -f UTF-8 -t GBK --notest FILENAME

After this conversion the name of FILENAME is encoded in GBK (only the file name is converted; the contents of the file do not change).
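
The idea behind convmv, including its test-run default, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how convmv itself is written: convert_names is a hypothetical helper, the notest parameter is named after the convmv flag, and the sketch assumes a Linux file system that accepts arbitrary bytes in file names.

```python
import os
import tempfile


def convert_names(directory, from_enc, to_enc, notest=False):
    """Return (old, new) byte-name pairs for entries in directory.

    Hypothetical helper: like convmv, it only reports the planned renames
    unless notest is True, in which case it performs them.
    """
    plan = []
    dir_bytes = os.fsencode(directory)
    for old in sorted(os.listdir(dir_bytes)):  # bytes in, bytes out
        try:
            new = old.decode(from_enc).encode(to_enc)
        except (UnicodeDecodeError, UnicodeEncodeError):
            continue  # not valid in from_enc, or not representable in to_enc
        if new == old:
            continue  # pure ASCII names are identical in both encodings
        plan.append((old, new))
        if notest:
            os.rename(os.path.join(dir_bytes, old),
                      os.path.join(dir_bytes, new))
    return plan


# Usage: one UTF-8 named file, first a test run, then the real rename.
with tempfile.TemporaryDirectory() as tmp:
    name_utf8 = "\u4e2d\u6587.txt".encode("utf-8")
    name_gbk = "\u4e2d\u6587.txt".encode("gbk")
    open(os.path.join(os.fsencode(tmp), name_utf8), "wb").close()
    dry_run = convert_names(tmp, "utf-8", "gbk")
    after_dry_run = os.listdir(os.fsencode(tmp))
    convert_names(tmp, "utf-8", "gbk", notest=True)
    after_rename = os.listdir(os.fsencode(tmp))
```

Only the bytes of the name change; the file's contents are never opened for reading or rewritten, which matches the behaviour described for convmv above.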




Vim Encoding Settings



Like all popular text editors, Vim handles files in a wide range of character encodings well, including popular Unicode encodings such as UCS-2 and UTF-8. Unfortunately, as with much software from the Linux world, you have to set this up yourself.

Vim has four options related to character encoding: encoding, fileencoding, fileencodings, and termencoding (the possible values for these options are listed in Vim's online help, :help encoding-names). They have the following meanings:

* encoding: the character encoding Vim uses internally, including for buffers, menu text, and message text. The default is chosen from your locale. The user manual recommends changing it only in .vimrc, and in practice changing it anywhere else does not seem to make sense. The files you edit and save may use a different encoding: if encoding is utf-8 and the file is cp936, Vim converts the file to UTF-8 when reading it (the form Vim works with) and converts it back to cp936 (the file's own encoding) when writing it.

* fileencoding: the character encoding of the file currently being edited in Vim. Vim also saves the file in this encoding (whether or not it is a new file).

* fileencodings: the ordered list of encodings Vim tries when detecting fileencoding. When a file is opened, Vim tests the encodings in the list one by one and sets fileencoding to the first one that fits. It is therefore best to put the Unicode encodings at the front of the list and latin1 at the very end.

* termencoding: the character encoding of the terminal (or Windows console window) Vim is running in. If the terminal uses the same encoding as Vim's encoding, nothing needs to be set. If not, set termencoding so that Vim converts to the terminal's encoding automatically. This option has no effect on gVim, the GUI version commonly used under Windows; for console-mode Vim it is the Windows console code page, and we usually do not need to change it.

Now that these options, which beginners easily confuse, have been explained, let's look at how Vim's multi-encoding support works.

1. On startup, Vim sets the character encoding of buffers, menu text, and message text according to the value of encoding set in .vimrc.

2. Vim reads the file to be edited and probes its encoding using the encodings listed in fileencodings, then sets fileencoding to the detected encoding that appears to be correct (note 1).

3. Vim compares fileencoding with encoding. If they differ, it calls iconv to convert the file contents to the encoding named by encoding and puts the converted text into the buffer opened for the file, after which editing can begin. Note that on Windows this step needs the external iconv.dll (note 2), which must be present in $VIMRUNTIME or in another directory listed in the PATH environment variable.

4. When you save the file after editing, Vim compares fileencoding and encoding again. If they differ, it calls iconv again to convert the buffer text to the encoding named by fileencoding and writes the result to the file. Again, this requires iconv.dll.

Because Unicode can represent characters from almost all languages, and UTF-8 is a very economical Unicode encoding (for text that is mostly ASCII it takes less space than UCS-2), it is recommended to set encoding to utf-8. Another reason is that with encoding set to utf-8, Vim detects file encodings more accurately (perhaps this is the main reason).

When editing files on Chinese Windows, it is more appropriate to keep the file encoding as GB2312/GBK for compatibility with other software, so fileencoding is best set to chinese (chinese is an alias: on Unix it stands for gb2312, and on Windows for cp936, the code page for GBK).
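
The read, convert, and write cycle in steps 2 to 4 can be mimicked in Python to make the roles of the options concrete. This is a rough analogy under stated assumptions, not Vim's code: read_file and write_file are hypothetical names, a Python str plays the part of the buffer held in Vim's internal encoding, and the candidate list corresponds to fileencodings with latin1 added as a last resort.

```python
def read_file(raw, fileencodings=("utf-8", "cp936", "latin1")):
    """Steps 2 and 3: find the first encoding that decodes cleanly and
    return the buffer text together with the detected fileencoding."""
    for enc in fileencodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # try the next entry in the list
    raise ValueError("no candidate encoding matched")


def write_file(buffer_text, fileencoding):
    """Step 4: convert the buffer back to the file's encoding on save."""
    return buffer_text.encode(fileencoding)


# Usage: open a cp936 file, then save it unchanged and save it as UTF-8.
raw = "\u4e2d\u6587".encode("cp936")
buffer_text, fileencoding = read_file(raw)
saved = write_file(buffer_text, fileencoding)   # plain save
resaved = write_file(buffer_text, "utf-8")      # like :set fileencoding=utf-8 before saving
```

Because the buffer is always held in one internal form, changing fileencoding before saving is all it takes to convert a file, which is why the Vim conversion method earlier in the article needs only a single setting.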
