Character Encoding Conversion with iconv

Source: Internet
Author: User
I. Using the iconv function family for encoding conversion
On Linux, you can convert between character encodings either by programming against the iconv function family or by running the iconv command. The command operates on files: it converts a specified file from one encoding to another.
The iconv function family is declared in the header iconv.h, which must be included before use:
#include <iconv.h>
The family consists of three functions, with the following prototypes:
(1) iconv_t iconv_open(const char *tocode, const char *fromcode);
This function specifies which two encodings to convert between: tocode is the target encoding and fromcode is the source encoding. It returns a conversion handle used by the other two functions, or (iconv_t)-1 on failure.
(2) size_t iconv(iconv_t cd, char **inbuf, size_t *inbytesleft, char **outbuf, size_t *outbytesleft);
This function reads bytes from *inbuf, converts them, and writes the result to *outbuf. *inbytesleft holds the number of input bytes that have not been converted yet, and *outbytesleft holds the remaining space in the output buffer, in bytes.
(3) int iconv_close(iconv_t cd);
This function closes the conversion handle and releases its resources.
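Because iconv_open signals failure with (iconv_t)-1 rather than 0 or NULL, the check is easy to get wrong. The following is a minimal sketch of the open/close life cycle with that check in place. The helper name conversion_available is hypothetical and only serves the illustration.

```c
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a conversion from `from` to `to`
 * can be opened on this system, 0 otherwise. */
int conversion_available(const char *to, const char *from)
{
    iconv_t cd = iconv_open(to, from);

    /* Failure is reported as (iconv_t)-1, not as 0 or NULL. */
    if (cd == (iconv_t)-1) {
        fprintf(stderr, "iconv_open(%s, %s): %s\n", to, from, strerror(errno));
        return 0;
    }

    /* Every successfully opened handle must be closed again. */
    iconv_close(cd);
    return 1;
}
```

A caller could use it as conversion_available("UTF-8", "GB2312") before starting a longer job. Which encoding names are accepted depends on the installed iconv implementation.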
Example 1: a conversion program implemented in C
/* f.c: encoding conversion example in C */
#include <iconv.h>
#include <stdio.h>
#include <string.h>

#define OUTLEN 255

int u2g(char *inbuf, size_t inlen, char *outbuf, size_t outlen);
int g2u(char *inbuf, size_t inlen, char *outbuf, size_t outlen);

int main(void)
{
    /* The same sample text in both encodings, written as byte escapes so
       the result does not depend on the encoding of this source file.
       The bytes are assumed to spell the Chinese word for "installing". */
    char in_utf8[] = "\xe6\xad\xa3\xe5\x9c\xa8\xe5\xae\x89\xe8\xa3\x85";
    char in_gb2312[] = "\xd5\xfd\xd4\xda\xb0\xb2\xd7\xb0";
    char out[OUTLEN];
    int rc;

    /* Convert UTF-8 to GB2312 */
    rc = u2g(in_utf8, strlen(in_utf8), out, OUTLEN);
    printf("UTF-8 --> GB2312 rc = %d, out = %s\n", rc, out);
    /* Convert GB2312 to UTF-8 */
    rc = g2u(in_gb2312, strlen(in_gb2312), out, OUTLEN);
    printf("GB2312 --> UTF-8 rc = %d, out = %s\n", rc, out);
    return 0;
}

/* Convert inbuf from one encoding to another.
   Returns 0 on success and -1 on error. */
int code_convert(const char *from_charset, const char *to_charset,
                 char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
    iconv_t cd;
    size_t rc;

    if (outlen == 0) return -1;
    cd = iconv_open(to_charset, from_charset);
    if (cd == (iconv_t)-1) return -1;   /* failure is (iconv_t)-1, not 0 */
    memset(outbuf, 0, outlen);
    outlen--;                           /* keep room for the terminating NUL */
    rc = iconv(cd, &inbuf, &inlen, &outbuf, &outlen);
    iconv_close(cd);                    /* close the handle on every path */
    return rc == (size_t)-1 ? -1 : 0;
}

/* Convert UTF-8 to GB2312 */
int u2g(char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
    return code_convert("UTF-8", "GB2312", inbuf, inlen, outbuf, outlen);
}

/* Convert GB2312 to UTF-8 */
int g2u(char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
    return code_convert("GB2312", "UTF-8", inbuf, inlen, outbuf, outlen);
}
Example 2: a conversion program in C++
/* f.cpp: encoding conversion example in C++ */
#include <iconv.h>
#include <cstring>
#include <iostream>

#define OUTLEN 255

using namespace std;

// Encoding conversion class
class CodeConverter {
private:
    iconv_t cd;
    // The handle must not be closed twice, so copying is disabled
    CodeConverter(const CodeConverter &);
    CodeConverter &operator=(const CodeConverter &);
public:
    // Constructor: open the conversion handle
    CodeConverter(const char *from_charset, const char *to_charset) {
        cd = iconv_open(to_charset, from_charset);
    }
    // Destructor: close the handle if it was opened
    ~CodeConverter() {
        if (cd != (iconv_t)-1) iconv_close(cd);
    }
    // Convert inbuf into outbuf; returns 0 on success, -1 on error
    int convert(char *inbuf, size_t inlen, char *outbuf, size_t outlen) {
        if (cd == (iconv_t)-1 || outlen == 0) return -1;
        memset(outbuf, 0, outlen);
        outlen--;  // keep room for the terminating NUL
        size_t rc = iconv(cd, &inbuf, &inlen, &outbuf, &outlen);
        return rc == (size_t)-1 ? -1 : 0;
    }
};

int main()
{
    // The same sample text in both encodings, written as byte escapes
    // (assumed to spell the Chinese word for "installing")
    char in_utf8[] = "\xe6\xad\xa3\xe5\x9c\xa8\xe5\xae\x89\xe8\xa3\x85";
    char in_gb2312[] = "\xd5\xfd\xd4\xda\xb0\xb2\xd7\xb0";
    char out[OUTLEN];

    // UTF-8 --> GB2312
    CodeConverter cc("UTF-8", "GB2312");
    cc.convert(in_utf8, strlen(in_utf8), out, OUTLEN);
    cout << "UTF-8 --> GB2312 in=" << in_utf8 << ", out=" << out << endl;
    // GB2312 --> UTF-8
    CodeConverter cc2("GB2312", "UTF-8");
    cc2.convert(in_gb2312, strlen(in_gb2312), out, OUTLEN);
    cout << "GB2312 --> UTF-8 in=" << in_gb2312 << ", out=" << out << endl;
    return 0;
}
II. Using the iconv command for encoding conversion
The iconv command converts the encoding of the specified files. By default it writes the result to standard output; an output file can be specified instead.
Usage: iconv [OPTION...] [FILE...]
The following options are available:
Input/output format specification:
  -f, --from-code=NAME    encoding of the original text
  -t, --to-code=NAME      encoding for output
Information:
  -l, --list              list all known character sets
Output control:
  -c                      omit invalid characters from the output
  -o, --output=FILE       output file
  -s, --silent            suppress warnings
      --verbose           print progress information
  -?, --help              print the help list
      --usage             print a brief usage message
  -V, --version           print the program version
Example:
iconv -f UTF-8 -t GB2312 aaa.txt > bbb.txt
This command reads aaa.txt, converts it from UTF-8 to GB2312, and redirects the output to bbb.txt.
Summary: Linux provides a powerful encoding conversion tool that makes this kind of work convenient.
glibc ships with the transcoding function iconv, which is easy to use and recognizes many encodings. A program that needs to convert between encodings can use it.
iconv command usage:
$ iconv --list    # display the recognized encoding names
$ iconv -f GB2312 -t UTF-8 a.html > b.html    # convert a.html from GB2312 to UTF-8 and save it as b.html
$ iconv -f GB2312 -t BIG5 a.html > b.html    # convert a.html from GB2312 to Big5 and save it as b.html
Programming with iconv involves the following glibc functions:
#include <iconv.h>
iconv_t iconv_open(const char *tocode, const char *fromcode);
int iconv_close(iconv_t cd);
size_t iconv(iconv_t cd,
             char **inbuf, size_t *inbytesleft,
             char **outbuf, size_t *outbytesleft);
To transcode with iconv, first call iconv_open to obtain a conversion handle, then call iconv to convert, and finally call iconv_close to close the handle.
In the iconv function:
The cd parameter is the conversion handle returned by iconv_open;
The inbuf parameter points to a pointer to the buffer to be converted;
The inbytesleft parameter points to the number of bytes in that buffer that are still to be converted;
The outbuf parameter points to a pointer to the buffer that receives the result;
The outbytesleft parameter points to the number of bytes of space left in the output buffer.
On success, iconv returns the number of characters converted in a non-reversible way during the call (reversible conversions are not counted). Otherwise it returns (size_t)-1 and sets errno accordingly.
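To make the parameter behaviour concrete, here is a small sketch that performs a single iconv call and records how the pointers and counters moved. The struct iconv_progress and the function trace_iconv are hypothetical names introduced for this illustration, and the UTF-8 to UCS-4LE pair is chosen only because every character then occupies exactly four output bytes.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of what one iconv() call did. */
struct iconv_progress {
    size_t result;       /* return value of iconv() */
    size_t in_consumed;  /* how far *inbuf advanced, in bytes */
    size_t in_left;      /* *inbytesleft after the call */
    size_t out_written;  /* how far *outbuf advanced, in bytes */
    size_t out_left;     /* *outbytesleft after the call */
};

/* Converts the UTF-8 string `text` to UCS-4LE into `out` and reports how
 * the pointers and counters moved. Returns 0, or -1 if the converter
 * could not be opened. */
int trace_iconv(const char *text, char *out, size_t outsize,
                struct iconv_progress *p)
{
    iconv_t cd = iconv_open("UCS-4LE", "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    /* iconv() takes char **, although it only reads the input bytes. */
    char *in = (char *)text;
    char *outp = out;
    size_t inleft = strlen(text);
    size_t outleft = outsize;

    p->result = iconv(cd, &in, &inleft, &outp, &outleft);
    p->in_consumed = (size_t)(in - text);
    p->in_left = inleft;
    p->out_written = (size_t)(outp - out);
    p->out_left = outleft;

    iconv_close(cd);
    return 0;
}
```

For a 6-byte UTF-8 input containing 5 characters and a 64-byte output buffer, the sketch should report 6 bytes consumed, 0 bytes left, 20 bytes written and 44 bytes of space remaining.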
iconv scans the input step by step. Each time a character is converted, *inbuf is advanced and *inbytesleft is decreased by the number of input bytes consumed, the result is stored at *outbuf, *outbuf is advanced, and *outbytesleft is decreased by the number of bytes written. The scan stops and iconv returns in the following cases:
1. An invalid multibyte sequence is encountered in the input: errno is EILSEQ and *inbuf points to the beginning of the invalid sequence;
2. The input ends with an incomplete multibyte sequence: errno is EINVAL and *inbuf points to the beginning of the incomplete sequence;
3. The output buffer has no room for the next converted character: errno is E2BIG;
4. The input has been converted completely.
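The stop conditions listed above can be reproduced with small inputs. The sketch below is only an illustration: stop_reason is a hypothetical helper, and it assumes a UTF-8 to UCS-4LE conversion so that each character needs four output bytes.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: converts `len` bytes of UTF-8 input to UCS-4LE,
 * using at most `outsize` bytes of a 64-byte scratch buffer. Returns 0 if
 * the conversion completed, or the errno value that stopped it. */
int stop_reason(const char *input, size_t len, size_t outsize)
{
    char out[64];
    char *in = (char *)input;
    char *outp = out;
    size_t inleft = len;
    size_t outleft = outsize < sizeof out ? outsize : sizeof out;
    int reason = 0;

    iconv_t cd = iconv_open("UCS-4LE", "UTF-8");
    if (cd == (iconv_t)-1)
        return errno;

    /* Save errno right away, before any other call can overwrite it. */
    if (iconv(cd, &in, &inleft, &outp, &outleft) == (size_t)-1)
        reason = errno;

    iconv_close(cd);
    return reason;
}
```

With this helper, a stray 0xFF byte in the input should yield EILSEQ, an input that ends in the middle of a two-byte character should yield EINVAL, and three characters with only 8 bytes of output space should yield E2BIG.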
Besides ordinary conversion, iconv can be called in two special ways:
1. If inbuf or *inbuf is NULL while outbuf and *outbuf are not NULL, iconv resets the conversion state to the initial state and stores the corresponding shift sequence at *outbuf. If the output buffer has too little space, errno is set to E2BIG and (size_t)-1 is returned;
2. If inbuf or *inbuf is NULL and outbuf or *outbuf is also NULL, iconv only resets the conversion state to the initial state.
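The two special calls could be wrapped as follows. This is a minimal sketch, and finish_and_reset is a hypothetical helper name. The second call is redundant after the first and appears only to show both forms side by side.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: ends a conversion by flushing any pending shift
 * sequence into the output buffer, then resets the handle for reuse.
 * Returns 0 on success, -1 on failure (errno is set by iconv). */
int finish_and_reset(iconv_t cd, char **outbuf, size_t *outbytesleft)
{
    /* Case 1: no input, valid output. A stateful target encoding may
     * write a shift sequence here; a stateless one writes nothing. */
    if (iconv(cd, NULL, NULL, outbuf, outbytesleft) == (size_t)-1)
        return -1;

    /* Case 2: no input, no output. Only the state is reset. */
    if (iconv(cd, NULL, NULL, NULL, NULL) == (size_t)-1)
        return -1;

    return 0;
}
```

For a stateless target encoding such as UCS-4LE or UTF-8, the output pointer and the remaining space are expected to be unchanged after the call.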
The iconv command is convenient, but by default it stops as soon as it runs into a problem during conversion. Sometimes we want to skip the byte sequences that cannot be converted and carry on with the rest. The following program implements this behaviour.
/**
 * siconv.cpp - a simple way to demonstrate the usage of iconv calling
 *
 * Report bugs to marchday2004@yahoo.com.cn
 * July 15th, 2006
 */
#include <iconv.h>
#include <stdio.h>
#include <string>
#include <stdarg.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <sys/mman.h>
#ifdef DEBUG
#define TRACE(fmt, args...) fprintf(stderr, "%s:%s:%d:" fmt, \
    __FILE__, __FUNCTION__, __LINE__, ##args)
#else
#define TRACE(fmt, args...)
#endif
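The listing above ends after its macro definitions, so what follows is a separate minimal sketch of the skip-and-continue idea in C, not the original siconv.cpp. The function convert_skipping is a hypothetical name; it assumes the caller has already opened the handle and that dropping a single byte per EILSEQ is an acceptable recovery strategy.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: converts `inlen` bytes from `in` into `out`,
 * dropping one input byte whenever iconv() reports EILSEQ and then
 * continuing. Returns the number of bytes written to `out`, or
 * (size_t)-1 on any other error (including E2BIG when `out` is too small). */
size_t convert_skipping(iconv_t cd, const char *in, size_t inlen,
                        char *out, size_t outsize)
{
    char *inp = (char *)in;   /* iconv() only reads the input bytes */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outsize;

    while (inleft > 0) {
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1)
            break;                 /* all remaining input was converted */
        if (errno == EILSEQ) {
            inp++;                 /* skip the offending byte ... */
            inleft--;
            continue;              /* ... and carry on after it */
        }
        if (errno == EINVAL)
            break;                 /* truncated sequence at the end: stop */
        return (size_t)-1;         /* E2BIG or another failure */
    }
    return outsize - outleft;
}
```

Called with a handle from iconv_open("ASCII", "UTF-8") and an input that contains one stray 0xFF byte, the sketch is expected to copy the surrounding ASCII text and leave the bad byte out. A production version would also need to grow or flush the output buffer on E2BIG.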
