Linux Command: iconv

Description

The iconv command converts the encoding of given files from one encoding to another. For example, it can convert UTF-8 to GB18030 and vice versa. The JDK provides a similar tool, native2ascii. The iconv development library on Linux includes C functions such as iconv_open, iconv_close, and iconv, which make it convenient to convert character encodings in C/C++ programs. This is useful in programs that scrape web pages, and the iconv command is handy for debugging such programs.

Common Parameters

First, we need to know which character encodings are supported. The -l option lists them (list known coded character sets).

Format: iconv -l

Second, the conversion itself is done as follows:

Format: iconv -f from-encoding -t to-encoding inputfile

The call above prints the converted text to the screen. To write the result to a file instead, use the -o option:

Format: iconv -f from-encoding -t to-encoding inputfile -o outputfile
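The -o form above can be wrapped in a small helper. This is a minimal sketch, assuming a POSIX shell and an iconv that accepts -o (GNU iconv does); convert_file and the temporary file names are hypothetical. The demo converts to ISO-8859-1 only so that it runs where Chinese charset tables are not installed; passing gb18030 as the target works the same way.

```shell
# Hypothetical helper: convert a file and save the result with -o.
# $1 = source encoding, $2 = target encoding, $3 = input file, $4 = output file
convert_file() {
    iconv -f "$1" -t "$2" -o "$4" "$3"
}

# Demo with a throwaway file containing "café" followed by a newline.
workdir=$(mktemp -d)
printf 'caf\303\251\n' > "$workdir/in.txt"
convert_file UTF-8 ISO-8859-1 "$workdir/in.txt" "$workdir/out.txt"

# The accented letter takes two bytes in UTF-8 and one in ISO-8859-1,
# so the converted file is one byte shorter.
wc -c < "$workdir/in.txt"
wc -c < "$workdir/out.txt"
```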

Usage Examples

Example 1: List the supported character encodings

[root@new55 ~]# iconv -l
The following list contain all the coded character sets known.  This does
not necessarily mean that all combinations of these names can be used for
the FROM and TO command line parameters.  One coded character set can be
listed with several different names (aliases).
  437, 500, 500V1, 850, 851, 852, 855, 856, 857, 860, 861, 862, 863, 864, 865,
  866, 866NAV, 869, 874, 904, 1026, 1046, 1047, 8859_1, 8859_2, 8859_3, 8859_4,
  8859_5, 8859_6, 8859_7, 8859_8, 8859_9, 10646-1:1993, 10646-1:1993/UCS4,
  ANSI_X3.4-1968, ANSI_X3.4-1986, ANSI_X3.4, ANSI_X3.110-1983, ANSI_X3.110,
  ARABIC, ARABIC7, ARMSCII-8, ASCII, ASMO-708, ASMO_449, BALTIC, BIG-5,
  BIG-FIVE, BIG5-HKSCS, BIG5, BIG5HKSCS, BIGFIVE, BS_4730, CA, CN-BIG5, CN-GB,
(output omitted)
  EUCJP-OPEN, EUCJP-WIN, EUCJP, EUCKR, EUCTW, FI, FR, GB, GB2312, GB13000,
  GB18030, GBK, GB_1988-80, GB_198880, GEORGIAN-ACADEMY, GEORGIAN-PS,
  GOST_19768-74, GOST_19768, GOST_1976874, GREEK-CCITT, GREEK, GREEK7-OLD,
  GREEK7, GREEK7OLD, GREEK8, GREEKCCITT, HEBREW, HP-ROMAN8, HPROMAN8, HU,
(output omitted)
  TIS620.2529-1, TIS620.2533-0, TIS620, TS-5881, TSCII, UCS-2, UCS-2BE,
  UCS-2LE, UCS-4, UCS-4BE, UCS-4LE, UCS2, UCS4, UHC, UJIS, UK, UNICODE,
  UNICODEBIG, UNICODELITTLE, US-ASCII, US, UTF-7, UTF-8, UTF-16, UTF-16BE,
  UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, UTF7, UTF8, UTF16, UTF16BE, UTF16LE,
  UTF32, UTF32BE, UTF32LE, VISCII, WCHAR_T, WIN-SAMI-2, WINBALTRIM,
  WINDOWS-31J, WINDOWS-874, WINDOWS-936, WINDOWS-1250, WINDOWS-1251,
  WINDOWS-1252, WINDOWS-1253, WINDOWS-1254, WINDOWS-1255, WINDOWS-1256,
  WINDOWS-1257, WINDOWS-1258, WINSAMI2, WS2, YU

That is too many. I only want to know which Chinese encodings are supported.
[root@new55 ~]# iconv -l | grep GB
CN-GB//
CSGB2312//
CSISO58GB1988//
EBCDIC-CP-GB//
GB//
GB2312//
GB13000//
GB18030//
GBK//
GB_1988-80//
GB_198880//
ISO646-GB//

Notice anything odd? When the output goes through a pipe, each name is printed on its own line with two slashes appended.
[root@new55 ~]#
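Because the piped listing carries the trailing slashes, it can help to normalize it before searching. A minimal sketch, assuming a POSIX shell and an iconv that supports -l; list_encodings is a hypothetical helper name, not an iconv feature:

```shell
# Hypothetical helper: print one encoding name per line, without the
# trailing "//" that appears when iconv -l writes to a pipe.
list_encodings() {
    iconv -l |
        tr ', ' '\n\n' |              # split comma- or space-separated names
        sed -e 's,/*$,,' -e '/^$/d'   # drop trailing slashes and empty lines
}

# Example: show only the names that start with GB.
list_encodings | grep '^GB'
```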

Example 2: Convert Google Hong Kong's Big5 encoding to GBK

[root@new55 ~]# curl -s http://www.google.com.hk/ | iconv -f big5 -t gbk
<!doctype html>
(output omitted)
})();
</script>[root@new55 ~]#

Example 3: Convert the homepage of my JavaEye blog from UTF-8 to GBK

[root@new55 ~]# curl -s http://codingstandards.javaeye.com/ | iconv -f utf8 -t gbk
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="zh-CN" dir="ltr">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<title>bash @ Linux - JavaEye technical website</title>
<meta name="description" content=""/>
<meta name="keywords" content="codingstandards bash @ Linux"/>
(output omitted)
<div class="blog_main">
<div class="blog_title">
<div class="date"><span class='year'>2010</span><span class='sep_year'>-</span><span class='month'>10</span><span class='sep_month'>-</span><span class='day'>17</span></div>
<div class="show_full_flag"><a href='?show_full=true'>full text display</a></div>
<h3><a href='/blog/8080'>[Pinned] Directory of the Linux command series I have used</a> <strong>Article category: <a href="http://www.javaeye.com/blogs/category/OS" style="text-decoration:none;padding-right:10px;">Operating System</a></strong>
</div>
<div class="blog_content">
Directory of the Linux command series I have used
Link: http://codingstandards.javaeye.com/blog/786653
iconv: illegal input sequence at position 3345

The last line reports an error. Converting to GB18030 instead succeeds:
[root@new55 ~]# curl -s http://codingstandards.javaeye.com/ | iconv -f utf8 -t gb18030

The output is omitted here; if you are interested, try it yourself to see the full page source. This works because GBK is a subset of GB18030, so GB18030 covers more characters.

[root@new55 ~]#
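The retry above can be automated: attempt the narrower target first and fall back to the wider one only if iconv reports an error. A minimal sketch, assuming a POSIX shell with mktemp; convert_with_fallback is a hypothetical helper, and the encodings are passed as arguments so that the same function covers the GBK and GB18030 case.

```shell
# Hypothetical helper: convert stdin, retrying with a wider target encoding.
# $1 = source encoding, $2 = preferred target, $3 = fallback target
convert_with_fallback() (
    tmp=$(mktemp) || exit 1
    cat > "$tmp"                      # buffer stdin so it can be read twice
    if iconv -f "$1" -t "$2" "$tmp" > "$tmp.out" 2>/dev/null; then
        status=0
    else
        # The preferred target could not represent the input; try the fallback.
        iconv -f "$1" -t "$3" "$tmp" > "$tmp.out"
        status=$?
    fi
    cat "$tmp.out"
    rm -f "$tmp" "$tmp.out"
    exit "$status"
)

# Intended use for the page in this example (not run here):
#   curl -s http://codingstandards.javaeye.com/ | convert_with_fallback utf8 gbk gb18030
```

The function body runs in a subshell, so its temporary variables do not leak into the calling shell.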

Example 4: Convert the UTF-8 homepage of mengzhidu to GBK

[root@new55 ~]# curl -s http://www.dreamdu.com/ | iconv -f utf8 -t gbk
iconv: illegal input sequence at position 0

Something went wrong. Inspecting the bytes with hexdump shows that the page begins with the BOM bytes ef bb bf, which iconv does not handle here.
[root@new55 ~]# curl -s http://www.dreamdu.com/ | hexdump -C | less
00000000  ef bb bf 3c 21 44 4f 43  54 59 50 45 20 68 74 6d  |...<!DOCTYPE htm|
00000010  6c 20 50 55 42 4c 49 43  20 22 2d 2f 2f 57 33 43  |l PUBLIC "-//W3C|
00000020  2f 2f 44 54 44 20 58 48  54 4d 4c 20 31 2e 30 20  |//DTD XHTML 1.0 |
00000030  53 74 72 69 63 74 2f 2f  45 4e 22 20 22 68 74 74  |Strict//EN" "htt|
00000040  70 3a 2f 2f 77 77 77 2e  77 33 2e 6f 72 67 2f 54  |p://www.w3.org/T|
00000050  52 2f 78 68 74 6d 6c 31  2f 44 54 44 2f 78 68 74  |R/xhtml1/DTD/xht|
00000060  6d 6c 31 2d 73 74 72 69  63 74 2e 64 74 64 22 3e  |ml1-strict.dtd">|
00000070  0d 0a 3c 68 74 6d 6c 20  78 6d 6c 6e 73 3d 22 68  |..<html xmlns="h|

:q
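The same check can be scripted instead of read off the hex dump. A minimal sketch, assuming a POSIX shell and od; has_utf8_bom is a hypothetical helper that reads standard input and succeeds only when the first three bytes are ef bb bf.

```shell
# Hypothetical helper: exit status 0 if stdin starts with the UTF-8 BOM.
has_utf8_bom() {
    # od prints the first three bytes as lowercase hex; tr removes the spacing.
    [ "$(od -An -tx1 -N3 | tr -d ' \n')" = "efbbbf" ]
}

# Demo on generated input rather than a live page.
if printf '\357\273\277<!DOCTYPE html>\n' | has_utf8_bom; then
    echo "BOM found"
fi

# Intended use for the page in this example (not run here):
#   curl -s http://www.dreamdu.com/ | has_utf8_bom && echo "BOM found"
```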

Let's try dropping the first three bytes.

[root@new55 ~]# curl -s http://www.dreamdu.com/ | cut -b 4- | iconv -f utf8 -t gbk
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
ml xmlns="http://www.w3.org/1999/xhtml" xml:lang="zh-CN" dir="ltr">
ead>
meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
meta http-equiv="content-language" content="zh-CN" />
link rel="stylesheet" type="text/css" href="/style.css?v=1" media="screen" />
script type="text/javascript" src="/js.js"></script>
title>mengzhidu - website design and development tutorial</title>
head>
ody>

(output omitted)
body>
tml>

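Spot the problem? The first few characters of every line are gone! cut works line by line, so -b 4- drops the first three bytes of each line, not just the BOM at the start of the stream.
[root@new55 ~]#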
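One way to remove only the leading BOM is to restrict the edit to the first line of the stream. A minimal sketch, assuming a POSIX shell and a sed that treats input as raw bytes under LC_ALL=C (GNU sed does); strip_utf8_bom is a hypothetical helper name, not part of iconv.

```shell
# Hypothetical helper: delete a UTF-8 BOM at the very start of stdin,
# leaving every other line untouched.
strip_utf8_bom() {
    bom=$(printf '\357\273\277')      # the three BOM bytes ef bb bf
    LC_ALL=C sed "1s/^${bom}//"       # substitute on line 1 only
}

# Demo on generated input: the BOM disappears, both lines stay whole.
printf '\357\273\277<html>\n<head>\n' | strip_utf8_bom

# Intended use for the page in this example (not run here):
#   curl -s http://www.dreamdu.com/ | strip_utf8_bom | iconv -f utf8 -t gbk
```

Input without a BOM passes through unchanged, so the helper can be left in the pipeline unconditionally.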
