First, converting line-break formats
DOS/Windows and Linux/Unix text files use different line-break formats. A DOS/Windows text file ends each line with two characters, CR (carriage return) followed by LF (line feed), while a Unix text file ends each line with a single LF.
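To see which format a file actually uses, you can dump its bytes. Below is a minimal sketch using od and grep. The sample files dos.txt and unix.txt are made up for the demonstration.

```shell
# Work in a scratch directory with one sample file of each kind.
cd "$(mktemp -d)" || exit 1
printf 'one\r\ntwo\r\n' > dos.txt   # DOS/Windows: CR LF after each line
printf 'one\ntwo\n' > unix.txt      # Unix: LF only

# od -c prints every byte; a CR shows up as \r and an LF as \n.
od -c dos.txt
od -c unix.txt

# Count the lines that end in a CR (0 for a pure Unix file).
grep -c "$(printf '\r')\$" dos.txt
grep -c "$(printf '\r')\$" unix.txt
```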
1. Moving files from DOS/Windows to Linux/Unix.
Many programs do not care whether a text file uses DOS/Windows CR/LF line endings, but a few do, and the best known is bash. Bash treats the trailing CR as part of each line, so a script with CR/LF endings breaks as soon as bash meets a carriage return. The following sed command converts DOS/Windows text to Unix format:
$ sed -e 's/.$//' mydos.txt > myunix.txt
The script is simple: the regular expression matches the last character of each line, which in a DOS/Windows file is the carriage return. Replacing that character with nothing deletes it from the output. If you run this script and notice that the last real character of every line has disappeared, the input file was already in Unix format, so there was no need to convert it!
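A safer variant matches the carriage return itself rather than whatever the last character happens to be. With that change, running the command on a file already in Unix format leaves it unchanged. The sketch below assumes GNU sed, which understands \r inside a regular expression. The tr line is a portable alternative, but it deletes every CR in the file, not only the ones at line ends. The file alreadyunix.txt and the extra output files exist only for the demonstration.

```shell
cd "$(mktemp -d)" || exit 1
printf 'one\r\ntwo\r\n' > mydos.txt
printf 'one\ntwo\n' > alreadyunix.txt

# Delete a CR only when it is the last character of the line (GNU sed).
sed -e 's/\r$//' mydos.txt > myunix.txt
# The same command on a Unix file changes nothing.
sed -e 's/\r$//' alreadyunix.txt > stillunix.txt

# Portable alternative: remove every CR byte from the file.
tr -d '\r' < mydos.txt > myunix2.txt
```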
2. To move Linux/Unix text to a Windows system, use the following script to perform the conversion:
$ sed -e 's/$/\r/' myunix.txt > mydos.txt
In this script, the regular expression '$' matches the end of each line, and '\r' tells sed to insert a carriage return there, just before the line feed. Each line then ends with CR/LF. Note that '\r' is replaced with a CR only by GNU sed 3.02.80 or later.
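If the input may already have CR/LF endings, the command above adds a second CR to every line. Below is a minimal idempotent variant, again assuming GNU sed. '\r*$' first absorbs any carriage returns already at the end of the line, and the replacement then puts back exactly one.

```shell
cd "$(mktemp -d)" || exit 1
printf 'one\ntwo\n' > myunix.txt

# Replace any trailing CRs with exactly one CR (GNU sed for \r).
sed -e 's/\r*$/\r/' myunix.txt > mydos.txt
# Running it again on the DOS file produces the same bytes.
sed -e 's/\r*$/\r/' mydos.txt > mydos-again.txt
```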
Second, file encoding
1. Viewing a file's encoding
In vi (Vim), run :set fileencoding to display the encoding of the current file.
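On the command line, you can sketch a quick check for whether a file is valid UTF-8 with iconv itself. iconv exits with a non-zero status when it meets an invalid byte sequence. The helper name is_utf8 and the sample files are hypothetical. Plain ASCII also passes the check, and a failing file tells you nothing about which encoding it does use.

```shell
cd "$(mktemp -d)" || exit 1
# "中文" in UTF-8 and in GB2312, written as raw octal byte escapes.
printf '\344\270\255\346\226\207\n' > utf8.txt
printf '\326\320\316\304\n' > gb.txt

# is_utf8 FILE: succeeds only if FILE is well-formed UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

for f in utf8.txt gb.txt; do
    if is_utf8 "$f"; then echo "$f: valid UTF-8"; else echo "$f: not UTF-8"; fi
done
```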
2. Converting file contents with iconv
Usage: iconv [option...] [file...]
The following options are available:
Input/output format specifications:
-f, --from-code=NAME   encoding of the original text
-t, --to-code=NAME     encoding for output
Example:
iconv -f gb2312 -t UTF-8 aaa.txt > bbb.txt
This command reads aaa.txt, converts it from GB2312 to UTF-8, and writes the result to bbb.txt.
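To check that a conversion is lossless, convert the result back and compare it byte for byte with the original. Below is a minimal sketch, assuming an iconv (such as the glibc one) that knows GB2312. Here aaa.txt is a made-up sample holding the two characters 中文 as raw GB2312 bytes.

```shell
cd "$(mktemp -d)" || exit 1
# aaa.txt: "中文" encoded in GB2312 (octal byte escapes).
printf '\326\320\316\304\n' > aaa.txt

# GB2312 -> UTF-8, as in the example above.
iconv -f gb2312 -t UTF-8 aaa.txt > bbb.txt

# UTF-8 -> GB2312 again; the bytes should match the original exactly.
iconv -f UTF-8 -t gb2312 bbb.txt > roundtrip.txt
cmp aaa.txt roundtrip.txt && echo "round trip OK"

# Show the UTF-8 bytes that iconv produced.
od -An -tx1 bbb.txt
```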
3. Converting file-name encoding
convmv can convert Chinese file names that were created on Windows in GBK encoding into UTF-8:
convmv -f GBK -t UTF-8 *.mp3
By default, however, this command does not rename anything. It only shows each name before and after conversion. To actually perform the conversion, add the --notest parameter:
convmv -f GBK -t UTF-8 --notest *.mp3
-f gives the encoding before conversion, and -t gives the encoding after conversion. Adding -r converts recursively through all subdirectories of the current directory.
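If convmv is not installed, you can sketch the same idea with iconv and mv. This is not convmv, and the helper gbk_names_to_utf8 is hypothetical. It handles only the names it is given (no -r), skips names that are already valid UTF-8, and, like convmv, only prints the planned renames unless --notest is passed. A GBK name that happens to be valid UTF-8 by accident would also be skipped.

```shell
cd "$(mktemp -d)" || exit 1
# Sample file whose name "中文.mp3" is stored as GBK bytes.
touch "$(printf '\326\320\316\304.mp3')"

# gbk_names_to_utf8 [--notest] FILE...   (hypothetical helper)
gbk_names_to_utf8() {
    doit=no
    if [ "$1" = "--notest" ]; then
        doit=yes
        shift
    fi
    for f in "$@"; do
        # Leave names that are already valid UTF-8 (including plain ASCII).
        if printf '%s' "$f" | iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1; then
            continue
        fi
        new=$(printf '%s' "$f" | iconv -f GBK -t UTF-8) || continue
        if [ -e "$new" ]; then
            echo "target exists, skipping: $new" >&2
        elif [ "$doit" = yes ]; then
            mv -- "$f" "$new"
        else
            printf '%s -> %s\n' "$f" "$new"   # dry run only
        fi
    done
}

gbk_names_to_utf8 *.mp3            # dry run: shows old -> new names
gbk_names_to_utf8 --notest *.mp3   # performs the renames
```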
4. enca can not only detect a file's encoding automatically but also convert files in batches.
1) Viewing the current file's encoding
enca -L zh_CN ip.txt
Simplified Chinese National Standard; GB2312
Surrounded by/intermixed with non-text data
2) Conversion
Command format:
$ enca -L <current language> -x <target encoding> <file name>
For example, to convert all files in the current directory to UTF-8:
enca -L zh_CN -x UTF-8 *
enca -L zh_CN file                         # check the encoding of file
enca -L zh_CN -x UTF-8 file                # convert file to UTF-8, overwriting it
enca -L zh_CN -x UTF-8 < file1 > file2     # write the result to file2 instead, if you don't want to overwrite the original
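When enca is not available but you already know the source encoding (for example from an enca -L check), you can sketch the same batch in-place conversion with iconv. The helper to_utf8 is hypothetical. It writes each result to a temporary file first, because 'iconv ... file > file' would let the shell truncate the file before iconv reads it. It then copies the result back with cat so that the original file keeps its permissions.

```shell
cd "$(mktemp -d)" || exit 1
# Two sample files containing "中文" in GB2312.
printf '\326\320\316\304\n' > a.txt
printf '\326\320\316\304\n' > b.txt

# to_utf8 FROM FILE...: convert each FILE from encoding FROM to UTF-8 in place.
to_utf8() {
    from=$1
    shift
    for f in "$@"; do
        [ -f "$f" ] || continue                  # skip directories and the like
        tmp=$(mktemp "$f.XXXXXX") || return 1
        if iconv -f "$from" -t UTF-8 "$f" > "$tmp"; then
            cat "$tmp" > "$f"                    # overwrite, keeping permissions
        else
            echo "skipped (conversion failed): $f" >&2
        fi
        rm -f -- "$tmp"
    done
}

to_utf8 GB2312 *.txt
```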
Reference: http://www.bkjia.com/OS/201110/106727.html