Article Title: Unicode BOM file processing
Recently I ran into a problem: on Ubuntu 10.04, vi could not correctly display the Chinese content of a file saved as Unicode with a BOM.
System Information:
Ubuntu 10.04
Environment variable:
~$ echo $LANG
en_US.utf8
Related Software Versions:
~$ vim --version
VIM - Vi IMproved 7.2 (2008 Aug 9, compiled Apr 16 2010 12:47:47)
Included patches: 1-330
Compiled by buildd@
~$ enca --version
enca 1.12
~$ iconv --version
iconv (Ubuntu EGLIBC 2.11.1-0ubuntu7) 2.11.1
When I opened the file in vim, its Chinese content was garbled. I used enca to check the file's encoding:
~$ enca -L none test.cgi
Universal transformation format 8 bits; UTF-8
The -L none option tells enca that the language of the file is unknown, so enca has to determine the encoding on its own. According to enca, test.cgi is encoded as UTF-8, so I tried converting it with iconv:
~$ iconv -l | grep UTF-8
ISO-10646/UTF-8/
UTF-8//
I first checked that iconv supports UTF-8, which it does, and then started the conversion:
~$ iconv -f UTF-8 -t GB2312 test.cgi
iconv: illegal input sequence at position 0
iconv reports an error: it cannot handle the byte sequence at position 0, the very start of the file, so the conversion fails.
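One way to follow up on an error like this is to look at the raw bytes at the position iconv reported. The following is only a minimal sketch, assuming a POSIX shell with od available; dump_bytes_at is a hypothetical helper name, not part of iconv or any other tool used here.

```shell
#!/bin/sh
# Hypothetical helper: dump COUNT bytes of FILE starting at byte OFFSET,
# printing each byte as a two-digit hex value.
# $1 = file, $2 = offset (decimal), $3 = number of bytes to show
dump_bytes_at() {
    # -A d : print offsets in decimal
    # -t x1: print one hex value per byte
    # -j   : skip to the given offset; -N: limit the number of bytes read
    od -A d -t x1 -j "$2" -N "$3" "$1"
}

# Example (test.cgi is the file from this article; position 0 is the
# offset named in the iconv error message):
# dump_bytes_at test.cgi 0 8
```

Anything unexpected in front of the first visible character of the file shows up immediately in such a dump.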
Searching Google turned up nothing useful.
Finally, I used the file command to check the file type:
~$ file test.cgi
test.cgi: UTF-8 Unicode (with BOM) text
That reveals the problem: the file is not plain UTF-8 but UTF-8 with a BOM. For background on the BOM, see:
http://unicode.org/faq/utf_bom.html
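That page gives the details. Put simply, the BOM (byte order mark) is a marker placed at the very start of a file. In a UTF-8 file it is the code point U+FEFF, stored as the three bytes EF BB BF in front of the actual content.
Because of these extra bytes, vim could not display the Chinese characters.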
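Since a UTF-8 BOM is always the same three bytes, its presence can also be checked directly from the shell. This is a minimal sketch, assuming head, od and tr are available; has_utf8_bom is a hypothetical function name chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if FILE starts with the
# UTF-8 BOM bytes EF BB BF, fail otherwise.
has_utf8_bom() {
    # Read the first three bytes and render them as lowercase hex
    # with all spaces and newlines removed, e.g. "efbbbf".
    first3=$(head -c 3 "$1" | od -A n -t x1 | tr -d ' \n')
    [ "$first3" = "efbbbf" ]
}

# Usage (test.cgi is the file from this article):
# if has_utf8_bom test.cgi; then echo "BOM found"; else echo "no BOM"; fi
```

The check only looks at the first three bytes, so it says nothing about whether the rest of the file is valid UTF-8.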
Once the cause is known, it can be fixed. There are many tools for this on Windows 7 and on Ubuntu; to keep things simple, I used vim to edit the file.
~$ vim -b test.cgi
At the beginning of the file content there are extra bytes, which vim displays in hexadecimal notation directly in front of the first line:
#!/bin/sh
The value vim shows inside <> is the BOM in hexadecimal. Just delete it and save the file.
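The same removal can be done without opening an editor. The following is a sketch rather than the method used in this article, assuming head, od, tr and tail are available; strip_utf8_bom and the output file name test_nobom.cgi are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write FILE to standard output without a leading
# UTF-8 BOM. Files that have no BOM are passed through unchanged.
strip_utf8_bom() {
    first3=$(head -c 3 "$1" | od -A n -t x1 | tr -d ' \n')
    if [ "$first3" = "efbbbf" ]; then
        # tail -c +4 starts output at the 4th byte, skipping the 3 BOM bytes.
        tail -c +4 "$1"
    else
        cat "$1"
    fi
}

# Usage: write to a new file, not back onto the input file itself.
# strip_utf8_bom test.cgi > test_nobom.cgi
```

Writing to a separate file keeps the original intact in case the result is not what was expected.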
Check the file encoding type again:
~$ file test.cgi
test.cgi: POSIX shell script text executable
~$ enca -L none test.cgi
Universal transformation format 8 bits; UTF-8
The file tool now identifies test.cgi as a shell script, while enca, rather unhelpfully, reports the same result as before. The real test is the iconv conversion:
~$ iconv -f UTF-8 -t GB2312 test.cgi -o test1.cgi
This time no error is reported and the conversion succeeds. Viewing the converted file in vim again, the Chinese content displays normally.
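The two manual steps, removing the BOM and running iconv, could also be combined into one small script. This is a sketch under the assumption that iconv supports the target encoding; convert_without_bom is a hypothetical function name, and the GB2312 default simply mirrors the conversion in this article.

```shell
#!/bin/sh
# Hypothetical helper: strip a leading UTF-8 BOM (if present) from SRC and
# convert the remainder from UTF-8 to TARGET, writing the result to DST.
# $1 = source file, $2 = destination file, $3 = target encoding (default GB2312)
convert_without_bom() {
    src=$1
    dst=$2
    target=${3:-GB2312}
    first3=$(head -c 3 "$src" | od -A n -t x1 | tr -d ' \n')
    if [ "$first3" = "efbbbf" ]; then
        # Skip the 3 BOM bytes, then convert what follows.
        tail -c +4 "$src" | iconv -f UTF-8 -t "$target" > "$dst"
    else
        iconv -f UTF-8 -t "$target" "$src" > "$dst"
    fi
}

# Usage, mirroring the file names in this article:
# convert_without_bom test.cgi test1.cgi GB2312
```

The function returns the exit status of iconv, so a failed conversion can still be detected by the caller.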
PS: This assumes that Chinese language support is installed on your system.