Using iconv for Character Set Conversion

Source: Internet
Author: User

On Linux, character encodings can be converted either programmatically, with the iconv family of functions, or from the shell, with the iconv command. The command operates on files: it converts a specified file from one encoding to another.

I. Using the iconv function family for encoding conversion
The iconv functions are declared in the header file iconv.h, which must be included before use:

#include <iconv.h>
The iconv function family has three functions, with the following prototypes:

(1) iconv_t iconv_open(const char *tocode, const char *fromcode);
This function specifies the two encodings to convert between: tocode is the target encoding and fromcode is the source encoding. It returns a conversion handle used by the other two functions, or (iconv_t)-1 on failure.
(2) size_t iconv(iconv_t cd, char **inbuf, size_t *inbytesleft, char **outbuf, size_t *outbytesleft);
This function reads bytes from *inbuf, converts them, and writes the result to *outbuf. On return, *inbuf and *outbuf have been advanced past the bytes consumed and produced, *inbytesleft holds the number of input bytes not yet converted, and *outbytesleft holds the space remaining in the output buffer. It returns (size_t)-1 on error.
(3) int iconv_close(iconv_t cd);
This function closes the conversion handle and releases its resources.

Example 1: A conversion example program in C

#include <iconv.h>
#include <stdio.h>
#include <string.h>

#define OUTLEN 255

/* Code conversion: convert from one encoding to another */
int code_convert(const char *from_charset, const char *to_charset,
                 char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
    iconv_t cd;
    char **pin = &inbuf;
    char **pout = &outbuf;
    size_t rc;

    cd = iconv_open(to_charset, from_charset);
    if (cd == (iconv_t)-1)      /* failure is reported as (iconv_t)-1, not 0 */
        return -1;
    memset(outbuf, 0, outlen);
    rc = iconv(cd, pin, &inlen, pout, &outlen);
    iconv_close(cd);            /* release the handle on the error path too */
    return rc == (size_t)-1 ? -1 : 0;
}

/* Convert UTF-8 to GB2312 */
int u2g(char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
    return code_convert("UTF-8", "gb2312", inbuf, inlen, outbuf, outlen);
}

/* Convert GB2312 to UTF-8 */
int g2u(char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
    return code_convert("gb2312", "UTF-8", inbuf, inlen, outbuf, outlen);
}

int main(void)
{
    /* The original string literals were garbled in this copy. The bytes below
       are an assumed reconstruction: the Chinese for "installing" (正在安装),
       encoded as UTF-8 and as GB2312 respectively. */
    char in_utf8[] = "\xe6\xad\xa3\xe5\x9c\xa8\xe5\xae\x89\xe8\xa3\x85";
    char in_gb2312[] = "\xd5\xfd\xd4\xda\xb0\xb2\xd7\xb0";
    char out[OUTLEN + 1] = "";  /* one extra byte keeps the output NUL-terminated */
    int rc;

    /* UTF-8 --> GB2312 */
    rc = u2g(in_utf8, strlen(in_utf8), out, OUTLEN);
    if (rc != 0)
        printf("UTF-8 --> gb2312 failed\n");
    printf("UTF-8 --> gb2312 out=%s\n", out);

    /* GB2312 --> UTF-8 */
    rc = g2u(in_gb2312, strlen(in_gb2312), out, OUTLEN);
    if (rc != 0)
        printf("gb2312 --> UTF-8 failed\n");
    printf("gb2312 --> UTF-8 out=%s\n", out);
    return 0;
}

Example 2: A conversion example program in C++

/* f.cpp: code conversion example C++ program */
#include <iconv.h>
#include <cstring>
#include <iostream>

#define OUTLEN 255

using namespace std;

// Code conversion class
class CodeConverter {
private:
    iconv_t cd;
public:
    // Constructor
    CodeConverter(const char *from_charset, const char *to_charset) {
        cd = iconv_open(to_charset, from_charset);
    }

    // Destructor
    ~CodeConverter() {
        if (cd != (iconv_t)-1)
            iconv_close(cd);
    }

    // Convert inbuf into outbuf; returns (size_t)-1 on failure
    size_t convert(char *inbuf, size_t inlen, char *outbuf, size_t outlen) {
        char **pin = &inbuf;
        char **pout = &outbuf;

        memset(outbuf, 0, outlen);
        if (cd == (iconv_t)-1)
            return (size_t)-1;
        // size_t parameters avoid casting int* to size_t*, which breaks on 64-bit systems
        return iconv(cd, pin, &inlen, pout, &outlen);
    }
};

int main(int argc, char **argv)
{
    // The original string literals were garbled in this copy. The bytes below
    // are an assumed reconstruction: the Chinese for "installing" (正在安装),
    // encoded as UTF-8 and as GB2312 respectively.
    char in_utf8[] = "\xe6\xad\xa3\xe5\x9c\xa8\xe5\xae\x89\xe8\xa3\x85";
    char in_gb2312[] = "\xd5\xfd\xd4\xda\xb0\xb2\xd7\xb0";
    char out[OUTLEN + 1] = "";  // one extra byte keeps the output NUL-terminated

    // UTF-8 --> GB2312
    CodeConverter cc("UTF-8", "gb2312");
    cc.convert(in_utf8, strlen(in_utf8), out, OUTLEN);
    cout << "UTF-8 --> gb2312 in=" << in_utf8 << ", out=" << out << endl;

    // GB2312 --> UTF-8
    CodeConverter cc2("gb2312", "UTF-8");
    cc2.convert(in_gb2312, strlen(in_gb2312), out, OUTLEN);
    cout << "gb2312 --> UTF-8 in=" << in_gb2312 << ", out=" << out << endl;
    return 0;
}

 

II. Using the iconv command for encoding conversion

The iconv command converts the encoding of the specified files. By default it writes the result to standard output; an output file can be specified instead.

Usage: iconv [OPTION...] [FILE...]

The following options are available:

Input/output format specification:
  -f, --from-code=NAME   encoding of the original text
  -t, --to-code=NAME     encoding for the output

Information:
  -l, --list             list all known character sets

Output control:
  -c                     omit invalid characters from the output
  -o, --output=FILE      output file
  -s, --silent           suppress warnings
      --verbose          print progress information

  -?, --help             give the help list
      --usage            give a short usage message
  -V, --version          print the program version

Example:
iconv -f UTF-8 -t gb2312 aaa.txt > bbb.txt
This command reads aaa.txt, converts it from UTF-8 to GB2312, and redirects the output to bbb.txt.

 

Overview
iconv is a library that converts between character encodings, using Unicode as the intermediate representation. It supports most encodings in common use, for example ASCII, GB2312, GBK, GB18030, Big5, UTF-8, UCS-2, UCS-2BE, UCS-2LE, UCS-4, UCS-4BE, UCS-4LE, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, and UTF-32LE, as well as encodings for Thai, Japanese, Korean, Western European, and other languages. The DEMO code in this section converts Big5 to GB2312; changing the two encoding names passed to iconv_open() is enough to convert between any other supported pair.

Download
GNU libiconv, an implementation of iconv for Linux and other Unix-like systems, can be downloaded from http://www.gnu.org/software/libiconv/
A Win32 build of iconv can be downloaded from http://gnuwin32.sourceforge.net/packages/libiconv.htm

SVN source code
Some demo code is also available from my SVN repository:
http://xcyber.googlecode.com/svn/trunk/Convert/

DEMO code

/****************************************************************************
 * big5togb2312 - convert big5 encoding file to gb2312 encoding file
 * File:
 *   big5togb2312.c
 * Description:
 *   Convert big5 encoding file to gb2312 encoding file using iconv library
 * Author:
 *   XCyber   email: XCyber@sohu.com
 * Date:
 *   August 7, 2008
 * Other:
 *   Visit http://www.gnu.org/software/libiconv/ for more help of iconv
 ***************************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>     // memmove
#include <tchar.h>
#include <locale.h>
#include "../iconv-1.9.2.win32/include/iconv.h"

//#pragma comment(lib, "../iconv-1.9.2.win32/lib/iconv.lib")  // using iconv dynamic-link lib, iconv.dll
#pragma comment(lib, "../iconv-1.9.2.win32/lib/iconv_a.lib")  // using iconv static lib

#define BUFFER_SIZE 1024    // BUFFER_SIZE must be >= 2

void usage()
{
    printf("\nbig5togb2312 - convert big5 encoding file to gb2312 encoding file\n");
    printf("on August 7, 2008\n");
    printf("Usage:\n");
    printf("    big5togb2312 [big5 file(in)] [gb2312 file(out)]\n");
}

int main(int argc, char *argv[])
{
    FILE *psrcfile = NULL;
    FILE *pdstfile = NULL;

    char szsrcbuf[BUFFER_SIZE];
    char szdstbuf[BUFFER_SIZE];

    size_t nsrc = 0;
    size_t ndst = 0;
    size_t nread = 0;
    size_t nret = 0;

    char *psrcbuf = szsrcbuf;
    char *pdstbuf = szdstbuf;

    iconv_t icv;
    int argument = 1;

    // Check input arguments
    if (argc != 3)
    {
        usage();
        return -1;
    }

    psrcfile = fopen(argv[1], "r");
    if (psrcfile == NULL)
    {
        printf("can't open source file!\n");
        return -1;
    }

    pdstfile = fopen(argv[2], "w");
    if (pdstfile == NULL)   // the original tested psrcfile here by mistake
    {
        printf("can't open destination file!\n");
        fclose(psrcfile);
        return -1;
    }

    // Initialize iconv routine, perform conversion from big5 to gb2312
    // TODO: if you want to perform another type of conversion, e.g. gb2312 -> big5, gb2312 -> UTF-8 ...
    // just change the following two parameters of iconv_open()
    icv = iconv_open("gb2312", "big5");
    if (icv == (iconv_t)-1)  // iconv_open reports failure as (iconv_t)-1, not 0
    {
        printf("can't initialize iconv routine!\n");
        return -1;
    }

    // Enable "illegal sequence discard and continue" feature, so that if an illegal sequence is met,
    // conversion will continue instead of being terminated
    if (iconvctl(icv, ICONV_SET_DISCARD_ILSEQ, &argument) != 0)
    {
        printf("can't enable \"illegal sequence discard and continue\" feature!\n");
        return -1;
    }

    while (!feof(psrcfile))
    {
        psrcbuf = szsrcbuf;
        pdstbuf = szdstbuf;
        ndst = BUFFER_SIZE;

        // Read data from source file, appending after any bytes left from the previous round
        nread = fread(szsrcbuf + nsrc, sizeof(char), BUFFER_SIZE - nsrc, psrcfile);
        if (nread == 0)
            break;

        // The amount of data to be converted should include previous left data and current read data
        nsrc = nsrc + nread;

        // Perform conversion (this libiconv version declares inbuf as const char **)
        nret = iconv(icv, (const char **)&psrcbuf, &nsrc, &pdstbuf, &ndst);

        if (nret == (size_t)-1)
        {
            // Covers all errno cases: E2BIG, EILSEQ, EINVAL
            // E2BIG: there is not sufficient room at *outbuf.
            // EILSEQ: an invalid multibyte sequence has been encountered in the input.
            // EINVAL: an incomplete multibyte sequence has been encountered in the input.
            // Move the left data to the head of szsrcbuf in order to link it with the next data block
            memmove(szsrcbuf, psrcbuf, nsrc);
        }

        // Write data to destination file
        fwrite(szdstbuf, sizeof(char), BUFFER_SIZE - ndst, pdstfile);
    }
    iconv_close(icv);
    fclose(psrcfile);
    fclose(pdstfile);

    printf("Conversion complete.\n");

    return 0;
}
