Viewing character encoding on Linux

Source: Internet
Author: User

Viewing and converting file character encodings under Linux



The Linux commune (linuxidc.com) registered and opened its website on September 25, 2006. Linux has now become one of the most widely watched and supported operating systems; IDC stands for Internet Data Center, so LINUXIDC is a data center about Linux.

Linuxidc.com is a professional Linux website covering Ubuntu, Fedora, and SUSE technology, as well as the latest IT news.



If you work on Linux with files created on Windows, you will often run into file encoding conversion problems. On Chinese-language Windows the default text encoding is GBK (a superset of GB2312), while Linux generally uses UTF-8. The following describes how to view a file's encoding on Linux and how to convert it.
One, view the file encoding:
There are several ways to view a file's encoding on Linux:
1. View the file encoding directly in Vim:
:set fileencoding
This displays the encoding of the current file.
If you just want to view files in other encodings, or want to fix garbled text when viewing files in Vim, add the following to your ~/.vimrc file:
set encoding=utf-8
set fileencodings=ucs-bom,utf-8,cp936
This lets Vim identify the file encoding automatically (here it can recognize UTF-8 and GBK files). In practice Vim simply tries the encodings in the fileencodings list in order. If none of them fits, Vim falls back to the value of encoding; a list that ends with latin1 always finds a match, because any byte sequence is valid latin-1 (latin-1 is a superset of ASCII, not the same thing).
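The try-in-order idea behind fileencodings can be imitated outside Vim. The following is a minimal sketch, assuming iconv is installed and knows the name GBK; guess_encoding is a hypothetical helper name, and the script is not how Vim is implemented. It checks for a UTF-8 byte order mark, then tries UTF-8 and GBK in turn, and finally settles on latin1 as a catch-all.

```shell
#!/usr/bin/env bash
# Sketch: try candidate encodings in order, like Vim's fileencodings list.
# guess_encoding is a hypothetical helper name, not part of Vim or iconv.
guess_encoding() {
    local file=$1
    # A UTF-8 byte order mark (EF BB BF) settles the question at once.
    if [ "$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        echo "utf-8 (with BOM)"
        return 0
    fi
    local enc
    for enc in UTF-8 GBK; do
        # iconv exits non-zero when the input is not valid in $enc.
        if iconv -f "$enc" -t UTF-8 "$file" >/dev/null 2>&1; then
            echo "$enc"
            return 0
        fi
    done
    # Every byte sequence is valid Latin-1, so it is the last resort.
    echo "latin1"
}

# Demo: the same two Chinese characters in two encodings.
tmpdir=$(mktemp -d)
printf '\344\270\255\346\226\207\n' > "$tmpdir/utf8.txt"   # UTF-8 bytes
printf '\326\320\316\304\n'         > "$tmpdir/gbk.txt"    # GBK bytes
guess_encoding "$tmpdir/utf8.txt"
guess_encoding "$tmpdir/gbk.txt"
```

The order matters for the same reason it does in Vim: a strict encoding such as UTF-8 rejects most non-UTF-8 input, so it is tried first, while a permissive one such as latin1 accepts anything and must come last. A pure ASCII file is reported as UTF-8 here, since ASCII is valid UTF-8.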

2. View the file encoding with enca (if the command is not installed on your system, install it with sudo yum install -y enca):
$ enca filename
filename: Universal transformation format 8 bits; UTF-8
  CRLF line terminators
Note that enca does not identify some GBK-encoded files well. When it cannot identify a file, it prints:
Unrecognized encoding
Two, file encoding conversion
1. Convert the file encoding directly in Vim, for example to convert a file to UTF-8 (the new encoding is applied when the file is next saved with :w):
:set fileencoding=utf-8
2. Convert with iconv. The options of the iconv command are as follows:
Input/output format specification:
-f, --from-code=NAME    encoding of the original text
-t, --to-code=NAME      encoding for the output
Information:
-l, --list              list all known character sets
Output control:
-c                      omit invalid characters from the output
-o, --output=FILE       output file
-s, --silent            suppress warnings
--verbose               print progress information
-?, --help              give this help list
--usage                 give a short usage message
-V, --version           print the program version
Example:
iconv -f utf-8 -t gb2312 aaa.txt > bbb.txt

This command reads the aaa.txt file, converts it from UTF-8 to GB2312, and redirects the output to the bbb.txt file.

iconv -f ENCODING -t ENCODING inputfile
For example, to convert a GBK-encoded file to UTF-8:
iconv -f gbk -t utf-8 file1 -o file2
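To see what such a conversion does to the bytes, here is a small round-trip sketch. It assumes iconv is installed and supports GBK; the file names are made up for the demo, and shell redirection is used instead of -o because not every iconv build has that option.

```shell
#!/usr/bin/env bash
# Sketch: convert a small UTF-8 file to GBK and back, inspecting the bytes.
workdir=$(mktemp -d)
printf '\344\270\255\346\226\207\n' > "$workdir/utf8.txt"   # two Chinese characters, UTF-8

# -f names the source encoding, -t the target encoding.
iconv -f UTF-8 -t GBK "$workdir/utf8.txt" > "$workdir/gbk.txt"

# Raw bytes: 3 bytes per character in UTF-8, 2 per character in GBK.
od -An -tx1 "$workdir/utf8.txt"
od -An -tx1 "$workdir/gbk.txt"

# Converting back should reproduce the original file exactly.
iconv -f GBK -t UTF-8 "$workdir/gbk.txt" > "$workdir/roundtrip.txt"
cmp "$workdir/utf8.txt" "$workdir/roundtrip.txt" && echo "round trip OK"
```

Writing to a separate output file, as here, is the safer habit: redirecting iconv's output onto its own input file truncates the input before it is read.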
3. Convert the file encoding with enconv
For example, to convert a GBK-encoded file to UTF-8:
enconv -L zh_CN -x UTF-8 filename
Three, file name encoding conversion:
When files are copied from Linux to Windows, or from Windows to Linux, Chinese file names sometimes come out garbled. The cause is that the default file name encoding on Windows is GBK, while on Linux it is UTF-8. Because the encodings differ, the names are garbled, and the fix is to transcode the file names.
Linux has a dedicated tool for file name encoding, convmv, which can convert file names from GBK to UTF-8 or from UTF-8 to GBK.
First check whether convmv is installed on your system; if it is not, install it with:
yum -y install convmv
Here is the specific usage of convmv. For example:
convmv -f gbk -t utf-8 *.mp3
This command does not actually rename anything; it only shows the names before and after conversion. To perform the conversion for real, add the --notest parameter:
convmv -f gbk -t utf-8 --notest *.mp3

The -f parameter gives the encoding before conversion and -t the encoding after conversion. Do not mix them up, or the names may end up garbled. Another useful parameter is -r, which converts all subdirectories under the current directory recursively.

convmv -f source-encoding -t new-encoding [options] filename
Common parameters:
-r           process subfolders recursively
--notest     actually perform the operation; note that by default nothing is renamed and the run is only a test
--list       display all supported encodings
--unescape   unescape names, for example turn %20 into a space
For example, to convert a UTF-8 encoded file name to GBK, the command is:
convmv -f utf-8 -t gbk --notest filename
The UTF-8 encoded file name is converted to GBK (only the encoding of the name changes; the contents of the file are not touched).
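For readers curious what a file name conversion amounts to, the sketch below re-encodes names with iconv and mv. It is not convmv and lacks its safety checks; rename_gbk_to_utf8 is a hypothetical helper name, and only the dry-run-by-default habit and the --notest switch are borrowed from convmv. It assumes a Linux file system that accepts arbitrary bytes in file names.

```shell
#!/usr/bin/env bash
# Sketch: re-encode file names from GBK to UTF-8 with iconv and mv.
# This is NOT convmv; it only mimics its "dry run unless told otherwise" habit.
# rename_gbk_to_utf8 is a hypothetical helper name.
rename_gbk_to_utf8() {
    local apply=no
    if [ "$1" = "--notest" ]; then
        apply=yes
        shift
    fi
    local path dir old new
    for path in "$@"; do
        dir=$(dirname -- "$path")
        old=$(basename -- "$path")
        # Convert only the name; skip names that are not valid GBK.
        new=$(printf '%s' "$old" | iconv -f GBK -t UTF-8 2>/dev/null) || continue
        # Pure ASCII names convert to themselves: nothing to do.
        [ "$new" = "$old" ] && continue
        if [ "$apply" = yes ]; then
            mv -- "$dir/$old" "$dir/$new"
        else
            printf 'would rename: %s -> %s\n' "$path" "$dir/$new"
        fi
    done
}
```

Calling `rename_gbk_to_utf8 *.mp3` only prints the planned renames, and `rename_gbk_to_utf8 --notest *.mp3` performs them. Unlike convmv, this sketch cannot tell whether a name is already UTF-8, so running it twice on the same files may garble them; for real work convmv remains the better choice.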
Four, setting Vim's encoding options
Like all popular text editors, Vim handles text in many character encodings well, including popular Unicode encodings such as UCS-2 and UTF-8. Unfortunately, as with much software from the Linux world, you have to configure this yourself.
Vim has four options related to character encoding: encoding, fileencoding, fileencodings, and termencoding (for the possible values of these options, see :help encoding-names in Vim). Their meanings are as follows:
* encoding: the character encoding Vim uses internally, including for its buffers, menu text, message text, and so on. The default is chosen according to your locale. Change its value only in .vimrc; in fact that seems to be the only place where changing it makes sense. You can edit and save a file in an encoding different from encoding. For example, if Vim's encoding is utf-8 and the file being edited uses cp936, Vim automatically converts the text it reads to utf-8 (which Vim can work with), and when you write the file it is automatically converted back to cp936 (the encoding the file is saved in).
* fileencoding: the character encoding of the file currently being edited in Vim. When Vim saves the file it also uses this encoding (whether or not the file is new).
* fileencodings: the ordered list of encodings Vim tries when detecting fileencoding automatically. At startup Vim probes the file to be opened against each encoding in the list in turn and sets fileencoding to the one finally detected. It is therefore best to put the Unicode encodings at the front of the list and the Latin encoding latin1 at the very end.
* termencoding: the character encoding of the terminal Vim runs in (or of the console window on Windows). If the terminal uses the same encoding as Vim's encoding, it does not need to be set. Otherwise, the termencoding option lets Vim convert automatically to the terminal's encoding. Under Windows this option has no effect on the GUI-mode gVim we commonly use, while for console-mode Vim it is the code page of the Windows console, which usually does not need changing.
Five, how Vim works with multiple character encodings
1. When Vim starts, it sets the character encoding of buffers, menu text, and message text according to the value of encoding set in .vimrc.

2. Vim reads the file to be edited and probes its encoding against each of the encodings listed in fileencodings in turn, then sets fileencoding to the detected encoding that appears to be correct (note 1).
3. Vim compares the values of fileencoding and encoding. If they differ, it calls iconv to convert the file contents to the encoding given by encoding and puts the converted contents into the buffer opened for the file; at this point we can start editing. Note that this step requires the external iconv.dll (note 2), and you need to make sure the file exists in $VIMRUNTIME or in one of the directories listed in the PATH environment variable.
4. When editing is finished and the file is saved, Vim compares fileencoding and encoding again. If they differ, it calls iconv again to convert the buffer text about to be saved to the encoding given by fileencoding, and writes the result to the specified file. Again, this requires iconv.dll.
Because Unicode can cover the characters of almost every language, and UTF-8 is a very cost-effective Unicode encoding (for mostly-ASCII text it takes less space than UCS-2), it is recommended to set encoding to utf-8. Another reason is that with encoding set to utf-8, Vim detects file encodings more accurately (perhaps this is the main reason). For files edited on Chinese Windows, keeping the file encoding as GB2312/GBK is more appropriate for compatibility with other software, so it is recommended to set fileencoding to chinese (chinese is an alias that stands for gb2312 on Unix and for cp936, the code page of GBK, on Windows).
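The read, convert, edit, convert-back cycle from the numbered steps can be walked through by hand. This is only an illustration with iconv standing in for Vim's internal conversion; the variables ENCODING and FILEENCODING and the file names are assumptions made up for the demo, and no real Vim configuration is read.

```shell
#!/usr/bin/env bash
# Sketch of the cycle, with iconv standing in for Vim's conversion step.
# ENCODING and FILEENCODING are illustrative shell variables only.
ENCODING=UTF-8        # what the "editor" uses internally
FILEENCODING=GBK      # what the file on disk uses

workdir=$(mktemp -d)
printf '\326\320\316\304\n' > "$workdir/note.txt"   # two Chinese characters stored as GBK

# Read: the encodings differ, so convert the file into the internal encoding.
iconv -f "$FILEENCODING" -t "$ENCODING" "$workdir/note.txt" > "$workdir/buffer"

# Edit: append a line to the buffer (a plain file plays the buffer's role).
printf 'edited\n' >> "$workdir/buffer"

# Save: convert the buffer back to the file's own encoding before writing.
iconv -f "$ENCODING" -t "$FILEENCODING" "$workdir/buffer" > "$workdir/note.txt"

# The saved file is GBK again, now with the extra line at the end.
od -An -tx1 "$workdir/note.txt"
```

The file on disk never changes encoding; only the working copy does. That is why a single internal encoding such as utf-8 can serve files in many different encodings.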
