Encoding conversion on Linux (the iconv function family)

Source: Internet
Author: User

From: http://www.linuxdiyf.com/viewarticle.php?id=45164

For encoding conversion on Linux, you can either program against the iconv function family or use the iconv command (for converting files).

I. Using the iconv function family for encoding conversion

The header file of the iconv function family is iconv.h.

It must be included before use: #include <iconv.h>

The iconv function family has three functions. Their prototypes are as follows:

(1) iconv_open ()

Function prototype:

iconv_t iconv_open(const char *tocode, const char *fromcode);

Function:

This function allocates a conversion handle (descriptor) for an encoding conversion.

Parameters:

1> tocode is the target encoding.

2> fromcode is the source encoding.

(The specific encoding names that can be used are easy to look up with a search engine.)

Supported encodings include Unicode encodings such as UTF-8 and UTF-16, as well as the national "ANSI" encodings, including Chinese encodings such as GB2312 and Big5.

Return Value:

If the call succeeds, the function returns a conversion handle to be used by the following two functions;

if the call fails, (iconv_t)-1 is returned and errno is set.

Possible error types:

EINVAL: the conversion from fromcode to tocode is not supported by the implementation.
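
The check described above can be sketched as follows. This is only an illustration: the helper name can_convert is made up for this example, and the set of encoding names that succeed depends on the iconv implementation installed on the system.

```cpp
#include <iconv.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not part of iconv): returns true if this system's
// iconv can convert from `fromcode` to `tocode`.
bool can_convert(const char *fromcode, const char *tocode) {
    // Note the argument order: the target encoding comes first.
    iconv_t cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t)-1) {
        // errno is EINVAL when the conversion is not supported.
        std::fprintf(stderr, "iconv_open(%s, %s): %s\n",
                     tocode, fromcode, std::strerror(errno));
        return false;
    }
    iconv_close(cd);  // release the handle again
    return true;
}
```

The failure value is (iconv_t)-1 rather than NULL, so the comparison needs the cast.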

(2) iconv ()

Function: performs the actual encoding conversion.

Function prototype:

size_t iconv(iconv_t cd,
             char **inbuf, size_t *inbytesleft,
             char **outbuf, size_t *outbytesleft);

Parameters:

1> cd is the conversion handle returned by iconv_open();

2> inbuf points to the buffer to be converted;

3> inbytesleft points to the number of bytes in inbuf that remain to be converted;

4> outbuf points to the buffer that receives the converted result;

5> outbytesleft points to the number of bytes still available in outbuf.

(Disadvantage: there is no way to find out in advance how much memory the converted output will need, so the output buffer usually has to be over-allocated, which wastes memory!)

Return Value:

If the call succeeds, it returns the number of characters that were converted in a non-reversible way (reversible conversions are not counted).

If the call fails, (size_t)-1 is returned and the corresponding errno is set.

Note:

iconv() scans inbuf step by step. For each character converted, *inbuf is advanced and *inbytesleft is decreased; the result is written to *outbuf, which is advanced as well, and *outbytesleft is decreased by the number of bytes written.
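
The pointer and counter updates described in the note can be observed with a small sketch. The struct Progress and the function convert_once are hypothetical names introduced for this illustration, and the code assumes the glibc-style prototype shown above, where inbuf is a char **.

```cpp
#include <iconv.h>
#include <cstddef>
#include <cstring>

// Hypothetical result type: how far one iconv() call moved each side.
struct Progress {
    std::size_t consumed;  // bytes read from the input buffer
    std::size_t produced;  // bytes written to the output buffer
    bool ok;               // false if iconv_open() or iconv() failed
};

Progress convert_once(const char *fromcode, const char *tocode,
                      const char *text, char *out, std::size_t outsize) {
    Progress p = {0, 0, false};
    iconv_t cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t)-1) return p;

    // iconv() takes a non-const char **; it does not modify the input bytes.
    char *in = const_cast<char *>(text);
    std::size_t inleft = std::strlen(text);
    char *outp = out;
    std::size_t outleft = outsize;

    std::size_t rc = iconv(cd, &in, &inleft, &outp, &outleft);

    // Both pointers have been advanced and both counters decreased.
    p.consumed = static_cast<std::size_t>(in - text);
    p.produced = static_cast<std::size_t>(outp - out);
    p.ok = (rc != (std::size_t)-1);
    iconv_close(cd);
    return p;
}
```

Because iconv() moves the pointers it is given, the caller has to keep its own copy of the start of each buffer, as text and out do here.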

There are three common cases:

Case 1: inbuf is not NULL and *inbuf is not NULL

In this case, normal encoding conversion is performed.

The scan stops and iconv() returns in the following situations:

(1) An invalid multibyte sequence is encountered in inbuf.

In this case, errno is set to EILSEQ and (size_t)-1 is returned;

(2) All bytes in inbuf have been converted.

In this case, the number of non-reversible conversions performed is returned.

(3) An incomplete multibyte sequence is encountered at the end of inbuf.

In this case, errno is set to EINVAL and (size_t)-1 is returned;

(4) outbuf does not have enough space for the next converted character.

In this case, errno is set to E2BIG and (size_t)-1 is returned;

Case 2: inbuf is NULL or *inbuf is NULL, but outbuf is not NULL and *outbuf is not NULL

iconv() resets the conversion state to the initial state and stores the corresponding shift sequence in *outbuf. If outbuf does not have enough space, errno is set to E2BIG and (size_t)-1 is returned;

Case 3: inbuf is NULL or *inbuf is NULL, and outbuf is NULL or *outbuf is NULL

iconv() resets the conversion state to the initial state.
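
The cases above can be combined into one conversion loop. The following is a sketch under stated assumptions, not a definitive implementation: convert_all is a hypothetical helper, the initial buffer size and the doubling strategy are arbitrary choices, and errors are reported through std::runtime_error purely for brevity. It performs Case 1 until the input is consumed, grows the output buffer whenever E2BIG is reported, and finishes with a Case 2 call to flush any pending shift sequence.

```cpp
#include <iconv.h>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts all of `input`, growing the output buffer
// whenever iconv() reports E2BIG. Throws std::runtime_error on other errors.
std::string convert_all(const char *fromcode, const char *tocode,
                        const std::string &input) {
    iconv_t cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t)-1) throw std::runtime_error("iconv_open failed");

    std::vector<char> out(input.size() + 16);  // initial guess, grown on demand
    std::size_t used = 0;                      // bytes of `out` already filled
    char *in = const_cast<char *>(input.data());
    std::size_t inleft = input.size();
    bool flushing = false;  // false: Case 1 (convert), true: Case 2 (flush)

    for (;;) {
        char *outp = out.data() + used;
        std::size_t outleft = out.size() - used;
        std::size_t rc = flushing
            ? iconv(cd, NULL, NULL, &outp, &outleft)
            : iconv(cd, &in, &inleft, &outp, &outleft);
        used = out.size() - outleft;

        if (rc != (std::size_t)-1) {
            if (flushing) break;  // state reset and flushed: all done
            flushing = true;      // input consumed: now reset the state
        } else if (errno == E2BIG) {
            out.resize(out.size() * 2);  // not enough room: grow and retry
        } else {
            int err = errno;             // EILSEQ or EINVAL
            iconv_close(cd);
            throw std::runtime_error(err == EILSEQ
                ? "invalid multibyte sequence in input"
                : "incomplete multibyte sequence in input");
        }
    }
    iconv_close(cd);
    return std::string(out.data(), used);
}
```

After an E2BIG failure, iconv() has already advanced in and decreased inleft for the part that did fit, so the retry simply continues where the previous call stopped.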

(3) iconv_close ()

Function prototype:

int iconv_close(iconv_t cd);

Function:

This function is used to close the conversion handle and release resources.

Sample Code:

Encapsulate the conversion in a class:

#include <iconv.h>
#include <errno.h>
#include <stdio.h>

class ccodeconverter {
public:
    ccodeconverter(const char *fromcode, const char *tocode) : nerr(0) {
        // iconv_open() takes the target encoding first
        hcodeconverter = iconv_open(tocode, fromcode);
    }

    ~ccodeconverter() {
        if (hcodeconverter != (iconv_t)-1)
            iconv_close(hcodeconverter);
    }

    // convert; returns the iconv() result, or -1 on failure
    int convert(char *srcbuf, int srclen, char *destbuf, int destlen) {
        // iconv() needs size_t counters; casting int* to size_t* is not
        // safe where the two types differ in size, so copy the lengths
        size_t inleft = srclen;
        size_t outleft = destlen;
        size_t nconv = iconv(hcodeconverter, &srcbuf, &inleft, &destbuf, &outleft);
        // if an error occurs, record the error code
        if (nconv == (size_t)-1) {
            nerr = errno;
            return -1;
        }
        return (int)nconv;
    }

    // print the error message and return the error code
    int geterrinfo() {
        switch (nerr) {
        case E2BIG:
            printf("errno: E2BIG (insufficient outbuf space)\n");
            break;
        case EILSEQ:
            printf("errno: EILSEQ (invalid multibyte sequence in inbuf)\n");
            break;
        case EINVAL:
            printf("errno: EINVAL (incomplete multibyte sequence left unconverted)\n");
            break;
        default:
            break;
        }
        return nerr;
    }

private:
    // conversion handle
    iconv_t hcodeconverter;
    int nerr;
};

Example: converting UTF-8 to UTF-16

#include <string.h>

int main() {
    int srclen = 12;
    char *srcbuf = new char[srclen];
    memset(srcbuf, 0, srclen);
    strcpy(srcbuf, "Baise");

    // UTF-16 output may begin with a 2-byte byte order mark, so leave room for it
    int destlen = 2 * srclen + 2;
    char *destbuf = new char[destlen];
    memset(destbuf, 0, destlen);

    ccodeconverter cv("UTF-8", "UTF-16");
    int nret = cv.convert(srcbuf, srclen, destbuf, destlen);
    if (nret < 0) {
        cv.geterrinfo();
        delete[] srcbuf;
        delete[] destbuf;
        return -1;
    }
    printf("converted\n");
    delete[] srcbuf;
    delete[] destbuf;
    return 0;
}

Inspecting memory in the debugger shows the following content of destbuf:

(gdb) print destbuf
$1 = 0x804b018 "\377\376b"
(gdb) print destbuf + 1
$2 = 0x804b019 "\376b"
(gdb) print destbuf + 2
$3 = 0x804b01a "B"
(gdb) print destbuf + 3
$4 = 0x804b01b ""
(gdb) print destbuf + 4
$5 = 0x804b01c ""
(gdb) print destbuf + 5
$6 = 0x804b01d ""
(gdb) print destbuf + 6
$7 = 0x804b01e "I"

Q: Why does the output start with "\377\376"? Is it there to identify the encoding type?
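
The two bytes \377 \376 are the octal form of 0xFF 0xFE, which is the UTF-16 byte order mark (BOM) as it appears in little-endian byte order. The sketch below is one way to examine this. The helpers to_bytes and has_utf16_bom are hypothetical names for this illustration, and whether the plain "UTF-16" target writes a mark depends on the iconv implementation; the only behavior assumed here is that naming the byte order explicitly ("UTF-16LE") produces output without a mark, as glibc does.

```cpp
#include <iconv.h>
#include <cstddef>
#include <string>

// Hypothetical helper: converts a short UTF-8 string and returns the raw
// output bytes, or an empty string on any error.
std::string to_bytes(const char *tocode, const std::string &utf8) {
    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1) return std::string();

    char out[64];  // ample for the short inputs used here
    char *in = const_cast<char *>(utf8.data());
    std::size_t inleft = utf8.size();
    char *outp = out;
    std::size_t outleft = sizeof out;

    std::size_t rc = iconv(cd, &in, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (std::size_t)-1) return std::string();
    return std::string(out, sizeof out - outleft);
}

// True if the data starts with a UTF-16 byte order mark (either byte order).
bool has_utf16_bom(const std::string &s) {
    if (s.size() < 2) return false;
    unsigned char b0 = s[0], b1 = s[1];
    return (b0 == 0xFF && b1 == 0xFE) || (b0 == 0xFE && b1 == 0xFF);
}
```

Calling has_utf16_bom(to_bytes("UTF-16", "b")) on the target system shows whether the plain "UTF-16" name adds the mark there; when the mark is unwanted, requesting "UTF-16LE" or "UTF-16BE" is the usual way to avoid it.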

II. The iconv command

The iconv command converts the encoding of the specified files. By default the result is written to standard output; it can also be written to an output file.

Usage: iconv [OPTION...] [FILE...]

The following options are available:

Input/output format specification:

  -f, --from-code=NAME    encoding of the original text
  -t, --to-code=NAME      encoding for the output

Information:

  -l, --list              list all known character sets

Output control:

  -c                      omit invalid characters from the output
  -o, --output=FILE       output file
  -s, --silent            suppress warnings
      --verbose           print progress information

  -?, --help              give the help list
      --usage             give a brief usage message
  -V, --version           print the program version number

Example:

iconv -f UTF-8 -t GB2312 aaa.txt > bbb.txt

This command reads aaa.txt, converts it from UTF-8 to GB2312, and redirects the output to bbb.txt.
