Unicode BOM File Processing

Source: Internet
Author: User

Recently I ran into a problem: on Ubuntu 10.04, vi could not display the Chinese content of a file saved as UTF-8 with a Unicode BOM.

System Information:

Ubuntu 10.04

Environment variable:

~$ echo $LANG

en_US.utf8

Related Software Versions:

~$ vim --version

VIM - Vi IMproved 7.2 (2008 Aug 9, compiled Apr 16 2010 12:47:47)

Included patches: 1-330

Compiled by buildd@

~$ enca --version

enca 1.12

~$ iconv --version

iconv (Ubuntu EGLIBC 2.11.1-0ubuntu7) 2.11.1

Opening the file in vim showed the Chinese content as garbled text, so I used enca to check the file's encoding:

~$ enca -L none test.cgi

Universal transformation format 8 bits; UTF-8

The -L option tells enca the language of the file; "none" means the language is unknown, leaving enca to work out the encoding on its own. According to enca, test.cgi is UTF-8 encoded, so I turned to iconv for the conversion:

~$ iconv -l | grep UTF-8

ISO-10646/UTF-8/

UTF-8//

This first checks whether iconv supports UTF-8, and the output shows that it does. Now for the conversion:

~$ iconv -f UTF-8 -t GB2312 test.cgi

iconv: illegal input sequence at position 0

iconv reports an error at position 0: it cannot handle the bytes at the very start of the file, so the conversion fails.

Searching Google did not turn up an answer.

Finally, I used file to check the file type:

~$ file test.cgi

test.cgi: UTF-8 Unicode (with BOM) text

That revealed the problem: the file is not plain UTF-8 but UTF-8 with a BOM. The BOM is described here:

http://unicode.org/faq/utf_bom.html

See that page for the details. Simply put, the BOM (byte order mark) is a marker placed at the very beginning of a file. It is the code point U+FEFF, which in a UTF-8 file is stored as the three bytes EF BB BF.

Because of these extra bytes, vim could not display the Chinese characters.
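The presence of the BOM can also be confirmed by hand by dumping the first bytes of the file. The following is only a minimal sketch and not part of the original troubleshooting: the function names first3_hex and has_utf8_bom and the temporary demo file are invented for illustration, and it assumes head -c, od, and tr are available, as they are on typical Linux systems.

```shell
# Print the first three bytes of a file as lowercase hex, e.g. "efbbbf".
first3_hex() {
    head -c 3 "$1" | od -An -tx1 | tr -d ' \n'
}

# Succeed (exit status 0) if the file starts with the UTF-8 BOM EF BB BF.
has_utf8_bom() {
    [ "$(first3_hex "$1")" = "efbbbf" ]
}

# Demo on a throwaway file rather than the real test.cgi.
demo=$(mktemp)
printf '\357\273\277#!/bin/sh\n' > "$demo"   # octal escapes for EF BB BF
if has_utf8_bom "$demo"; then
    echo "BOM present"
else
    echo "no BOM"
fi
rm -f "$demo"
```

On the demo file this prints "BOM present". Running has_utf8_bom test.cgi on the file from this article would be the equivalent check.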

Once the cause is known, the fix is straightforward. Many tools on Windows 7 and on Ubuntu can remove a BOM; to keep things simple, I used vim in binary mode:

~$ vim -b test.cgi

The BOM shows up as extra content at the very beginning of the file, which vim displays in hexadecimal notation:

<feff>#!/bin/sh

The content inside <> is the BOM in hexadecimal. Just delete it and save the file.
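For scripts, or when many files are affected, the same edit can be made without opening an editor. This is a sketch under stated assumptions rather than the method used above: strip_utf8_bom is a hypothetical helper name, and it relies on head -c, od, tr, and tail -c being available.

```shell
# Write a copy of file $1 to $2 without a leading UTF-8 BOM.
# If $1 has no BOM, the copy is byte-for-byte identical.
strip_utf8_bom() {
    if [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        tail -c +4 "$1" > "$2"   # start output at byte 4, skipping the BOM
    else
        cat "$1" > "$2"
    fi
}

# Demo on a throwaway file rather than the real test.cgi.
demo=$(mktemp)
printf '\357\273\277#!/bin/sh\necho hello\n' > "$demo"
strip_utf8_bom "$demo" "$demo.nobom"
head -n 1 "$demo.nobom"    # prints: #!/bin/sh
rm -f "$demo" "$demo.nobom"
```

Writing to a second file instead of editing in place keeps the original intact until the result has been checked.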

Check the file encoding type again:

~$ file test.cgi

test.cgi: POSIX shell script text executable

~$ enca -L none test.cgi

Universal transformation format 8 bits; UTF-8

The file tool now identifies test.cgi as a shell script, while enca is less discerning: its result is the same as before. The real test is the iconv conversion:

~$ iconv -f UTF-8 -t GB2312 test.cgi -o test1.cgi

This time the conversion succeeds with no errors. Viewing the converted file in vim, the Chinese content displays correctly.
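The BOM removal and the conversion can also be combined into one step. The sketch below is an illustration only: utf8_to_gb2312 is a hypothetical helper name, and it assumes the installed iconv supports GB2312 and reads standard input when no file is given, as the glibc iconv used in this article does.

```shell
# Convert UTF-8 file $1 to GB2312 file $2, dropping a leading BOM if present.
utf8_to_gb2312() {
    if [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        # Skip the three BOM bytes, then convert the rest from stdin.
        tail -c +4 "$1" | iconv -f UTF-8 -t GB2312 > "$2"
    else
        iconv -f UTF-8 -t GB2312 "$1" > "$2"
    fi
}

# Usage, with the file names from this article:
#   utf8_to_gb2312 test.cgi test1.cgi
```

The function still fails, as it should, if the file contains characters that GB2312 cannot represent.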

PS: This assumes that Chinese language support is installed on your system.
