1. Use the iconv command to convert the encoding of file contents
Usage: iconv [OPTION...] [FILE...]
The following options are available:
Input/output format specification:
-f, --from-code=NAME    encoding of the original text
-t, --to-code=NAME      encoding for output
Information:
-l, --list              list all known character sets
Output control:
-c                      omit invalid characters from output
-o, --output=FILE       output file
-s, --silent            suppress warnings
--verbose               print progress information
-?, --help              give this help list
--usage                 give a short usage message
-V, --version           print program version
Example:
iconv -f gb2312 -t UTF-8 aaa.txt > bbb.txt
This command reads aaa.txt, converts it from GB2312 to UTF-8, and redirects the output to bbb.txt. Note: on Windows, .txt files generated by WordPad are generally GB18030-encoded. If the wrong source encoding is specified, iconv fails with an error such as:
iconv: illegal input sequence at position 6071
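The redirect form above can be wrapped so that a failed conversion never leaves a truncated output file behind. This is only a minimal sketch: the function name to_utf8 is hypothetical, and the default source encoding of GB18030 is an assumption, chosen because GB18030 is backward compatible with GBK and GB2312.

```shell
#!/bin/sh
# Convert one file to UTF-8 without overwriting the original.
# Usage: to_utf8 SRC DST [FROM]
# FROM defaults to GB18030 (an assumption; pass the real encoding if known).
to_utf8() {
    src=$1
    dst=$2
    from=${3:-GB18030}
    # Write to a temporary file first so that a failed conversion
    # never leaves a half-written DST behind.
    tmp="$dst.tmp.$$"
    if iconv -f "$from" -t UTF-8 "$src" > "$tmp"; then
        mv "$tmp" "$dst"
    else
        rm -f "$tmp"
        echo "to_utf8: could not convert $src from $from" >&2
        return 1
    fi
}
```

For the example above, this would be called as: to_utf8 aaa.txt bbb.txt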
2. Use the convmv command to convert file name encodings
I now work on Linux, but file names created on Windows are typically encoded in GBK, so they appear garbled after the files are copied to Linux. iconv can convert the file contents, but many Chinese file names remain garbled. The convmv command converts the encoding of file names.
convmv command parameters
For example:
convmv -f GBK -t UTF-8 *.mp3
By default this command does not rename anything; it only shows the names before and after conversion. To actually perform the conversion, add the --notest parameter:
convmv -f GBK -t UTF-8 --notest *.mp3
The -f parameter is the encoding before conversion and -t is the encoding after conversion. Do not mix them up, or the names may end up garbled.
Another useful parameter is -r, which recursively converts everything in all subdirectories of the current directory.
* convmv must be installed first, for example from the package convmv-1.10-1.el5.noarch.rpm.
3. A more foolproof command-line tool: enca
enca can not only detect a file's encoding intelligently but also convert files in batches.
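One more note on file names before installing enca: the preview-first behaviour of convmv described above can be approximated with iconv and mv when convmv is not available. This is a rough sketch, not how convmv is implemented. The function names convert_name and rename_gbk are hypothetical, it only handles names in the current directory, and it has none of convmv's safety checks (for instance, it cannot tell whether a name is already UTF-8).

```shell
#!/bin/sh
# Print NAME converted from one encoding to another.
# Usage: convert_name NAME [FROM] [TO]   (defaults: GBK, UTF-8)
convert_name() {
    printf '%s' "$1" | iconv -f "${2:-GBK}" -t "${3:-UTF-8}"
}

# Preview or perform GBK -> UTF-8 renames for files in the current directory.
# Usage: rename_gbk preview FILE...   (only print what would happen)
#        rename_gbk apply   FILE...   (actually rename)
rename_gbk() {
    mode=$1
    shift
    for old in "$@"; do
        if ! new=$(convert_name "$old"); then
            echo "skip (not valid GBK): $old" >&2
            continue
        fi
        # Nothing to do for names that do not change (e.g. pure ASCII).
        if [ "$old" = "$new" ]; then
            continue
        fi
        # Never overwrite an existing file.
        if [ -e "$new" ]; then
            echo "skip (target exists): $new" >&2
            continue
        fi
        if [ "$mode" = apply ]; then
            mv -- "$old" "$new"
        else
            printf 'would rename "%s" to "%s"\n' "$old" "$new"
        fi
    done
}
```

Running rename_gbk preview *.mp3 first and rename_gbk apply *.mp3 only after checking the output mirrors the convmv workflow of testing before adding --notest.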
1. Install
$ sudo apt-get install enca
2. View the current file encoding
$ enca -L zh_CN ip.txt
Simplified Chinese National Standard; GB2312
Surrounded by/intermixed with non-text data
3. Conversion
Command Format:
$ enca -L <current language> -x <target encoding> <file name>
For example, to convert all files in the current directory to UTF-8:
enca -L zh_CN -x UTF-8 *
enca -L zh_CN file                        check the file's encoding
enca -L zh_CN -x UTF-8 file               convert the file to UTF-8 in place
enca -L zh_CN -x UTF-8 < file1 > file2    convert without overwriting the original file
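Because the batch form rewrites files in place, it can be safer to apply the non-overwriting form to every file and collect the results in a separate directory. The following is a sketch under stated assumptions: the function name convert_copy and the ENCA and FALLBACK_FROM variables are hypothetical, and falling back to iconv with GB18030 when enca is missing or cannot detect the encoding is an addition of this sketch, not part of enca.

```shell
#!/bin/sh
# Convert each file to UTF-8 into OUTDIR, leaving the originals untouched.
# Usage: convert_copy OUTDIR FILE...
ENCA=${ENCA:-enca}                       # enca binary to use
FALLBACK_FROM=${FALLBACK_FROM:-GB18030}  # assumed encoding when enca cannot help

convert_copy() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for f in "$@"; do
        out="$outdir/$(basename "$f")"
        # Preferred path: let enca detect the encoding and convert,
        # using the filter form so the original file is not modified.
        if command -v "$ENCA" >/dev/null 2>&1 &&
           "$ENCA" -L zh_CN -x UTF-8 < "$f" > "$out" 2>/dev/null; then
            continue
        fi
        # enca is missing or could not detect the encoding: fall back to iconv.
        if ! iconv -f "$FALLBACK_FROM" -t UTF-8 "$f" > "$out"; then
            rm -f "$out"
            echo "convert_copy: failed on $f" >&2
        fi
    done
}
```

For example, convert_copy utf8 *.txt writes the converted copies into a directory named utf8 and leaves the original .txt files as they were.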
From: tonychiu.blog.51cto.com