From: http://www.linuxdiyf.com/viewarticle.php?id=45164
On Linux, character-encoding conversion can be done either programmatically, with the iconv family of functions, or from the shell, with the iconv command (for converting files).
I. Converting encodings with the iconv function family
The iconv functions are declared in the header iconv.h, which must be included before use: #include <iconv.h>
The family consists of three functions, whose prototypes are as follows:
(1) iconv_open()
Function prototype:
iconv_t iconv_open(const char *tocode, const char *fromcode);
Purpose:
This function allocates a conversion descriptor (handle) for converting from fromcode to tocode.
Parameters:
1> tocode is the target encoding;
2> fromcode is the source encoding.
(Which encodings are available depends on the implementation; on Linux, iconv -l lists them.)
Supported encodings typically include Unicode encodings such as UTF-8 and UTF-16, and national "ANSI" code pages, including Chinese encodings such as GB2312 and BIG5.
Return value:
On success, the function returns a conversion descriptor for use with the other two functions.
On failure, (iconv_t)-1 is returned and errno is set.
Possible errors:
EINVAL: the conversion from fromcode to tocode is not supported by the implementation.
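An unsupported pair is reported only through the (iconv_t)-1 return value and errno. For that reason it can be worth probing for support up front. The sketch below assumes a POSIX iconv such as glibc on Linux. The helper name conversion_supported is hypothetical and is not part of the iconv API.

```cpp
#include <iconv.h>
#include <cerrno>

// Hypothetical helper: returns true if this iconv implementation can convert
// fromcode -> tocode. On failure, *err receives errno (EINVAL for an
// unsupported pair); on success it is set to 0.
bool conversion_supported(const char *tocode, const char *fromcode, int *err) {
    iconv_t cd = iconv_open(tocode, fromcode);  // note: target first, source second
    if (cd == (iconv_t)-1) {
        *err = errno;
        return false;
    }
    *err = 0;
    iconv_close(cd);  // only probing, so release the descriptor immediately
    return true;
}
```

For example, conversion_supported("GB2312", "UTF-8", &err) returns false with err == EINVAL on a system whose iconv lacks a GB2312 converter.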
(2) iconv()
Purpose: performs the actual encoding conversion.
Function prototype:
size_t iconv(iconv_t cd,
             char **inbuf, size_t *inbytesleft,
             char **outbuf, size_t *outbytesleft);
Parameters:
1> cd is the conversion descriptor returned by iconv_open();
2> inbuf points to a pointer to the input bytes to be converted;
3> inbytesleft points to the number of input bytes remaining in *inbuf;
4> outbuf points to a pointer to the output buffer that receives the result;
5> outbytesleft points to the number of free bytes remaining in *outbuf.
(Drawback: there is no way to ask in advance how much output memory a conversion will need, so the output buffer is usually over-allocated, which can waste memory.)
Return value:
On success, returns the number of characters converted in a non-reversible way during the call (reversible conversions are not counted).
On failure, returns (size_t)-1 and sets errno accordingly.
Note:
iconv() scans *inbuf one character at a time. For each character it converts, it performs four updates:
- advances *inbuf and decrements *inbytesleft;
- writes the result at *outbuf;
- advances *outbuf;
- decrements *outbytesleft by the number of bytes written.
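The bookkeeping above is easier to see in a small example. The following sketch performs a single iconv() call and reports how far the input and output pointers moved. It assumes glibc's prototype, which uses char ** for inbuf (some older implementations declare const char ** instead). StepResult and convert_once are hypothetical names used only for illustration.

```cpp
#include <iconv.h>
#include <cstddef>

// Hypothetical result type: what one iconv() call did to its arguments.
struct StepResult {
    size_t ret;       // iconv() return value ((size_t)-1 on error)
    size_t consumed;  // input bytes consumed (decrease of inbytesleft)
    size_t produced;  // output bytes written (decrease of outbytesleft)
};

// Runs one iconv() call and reports how the in/out counters changed.
StepResult convert_once(iconv_t cd, const char *in, size_t inlen,
                        char *out, size_t outlen) {
    char *inp = const_cast<char *>(in);  // iconv() reads but never writes the input
    char *outp = out;
    size_t inleft = inlen, outleft = outlen;
    StepResult r;
    r.ret = iconv(cd, &inp, &inleft, &outp, &outleft);
    r.consumed = inlen - inleft;    // same as inp - in
    r.produced = outlen - outleft;  // same as outp - out
    return r;
}
```

Take the 3-byte UTF-8 string "A\xC3\xA9" ("Aé") converted to UTF-16LE. The call should consume 3 bytes, produce 4 bytes and return 0, because both characters convert reversibly.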
There are three common cases:
Case 1: inbuf != NULL and *inbuf != NULL
Normal encoding conversion is performed.
Scanning stops and iconv() returns in the following situations:
(1) An invalid multi-byte sequence is encountered in the input.
errno is set to EILSEQ and (size_t)-1 is returned; *inbuf is left pointing at the start of the invalid sequence.
(2) All input bytes have been converted.
The number of non-reversible conversions is returned.
(3) An incomplete multi-byte sequence is encountered at the end of the input.
errno is set to EINVAL and (size_t)-1 is returned.
(4) The output buffer does not have enough room for the next converted character.
errno is set to E2BIG and (size_t)-1 is returned.
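The three error stops of Case 1 can be triggered deliberately. This sketch again assumes a glibc-style iconv. It converts UTF-8 to UTF-16LE and returns the errno that iconv() set. The helper name utf8_to_utf16le_errno is hypothetical.

```cpp
#include <iconv.h>
#include <cerrno>
#include <cstddef>

// Hypothetical helper: converts UTF-8 input to UTF-16LE using an output buffer
// of outcap bytes (capped at 64). Returns 0 on success, otherwise the errno
// value iconv() reported (EILSEQ, EINVAL or E2BIG).
int utf8_to_utf16le_errno(const char *in, size_t inlen, size_t outcap) {
    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");
    if (cd == (iconv_t)-1) return errno;
    char out[64];
    if (outcap > sizeof out) outcap = sizeof out;  // stay inside the local buffer
    char *inp = const_cast<char *>(in);
    char *outp = out;
    size_t inleft = inlen, outleft = outcap;
    size_t ret = iconv(cd, &inp, &inleft, &outp, &outleft);
    int err = (ret == (size_t)-1) ? errno : 0;  // read errno before iconv_close()
    iconv_close(cd);
    return err;
}
```

Each input triggers a different error:
- The byte 0xFF is never valid in UTF-8, so it gives EILSEQ.
- "\xE4\xBD" is the first two bytes of a three-byte character, so it gives EINVAL.
- "abc" with only a 4-byte output buffer needs 6 bytes of UTF-16LE, so it gives E2BIG.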
Case 2: inbuf == NULL or *inbuf == NULL, but outbuf != NULL and *outbuf != NULL
iconv() resets the conversion state to the initial state and writes any shift sequence this requires to *outbuf. If the output buffer is too small, errno is set to E2BIG and (size_t)-1 is returned.
Case 3: inbuf == NULL or *inbuf == NULL, and outbuf == NULL or *outbuf == NULL
iconv() resets the conversion state to the initial state.
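Together, the E2BIG rule and Case 2 give a common way around the buffer-size drawback of iconv(). Start with a guessed output size, grow the buffer whenever E2BIG is reported, and finish with a flush call. Below is a sketch assuming C++11 and glibc iconv. convert_all is a hypothetical name, and doubling the buffer is just one growth strategy.

```cpp
#include <iconv.h>
#include <cerrno>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts a whole string, growing the output on E2BIG
// and flushing the conversion state at the end (Case 2: inbuf == NULL).
// Throws std::runtime_error on EILSEQ/EINVAL or an unsupported pair.
std::string convert_all(const std::string &input, const char *tocode,
                        const char *fromcode) {
    iconv_t cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t)-1) throw std::runtime_error("unsupported conversion");

    std::string out(input.size() * 2 + 16, '\0');  // initial guess, grown as needed
    char *inp = const_cast<char *>(input.data());
    size_t inleft = input.size();
    size_t used = 0;        // bytes of out filled so far
    bool flushing = false;  // true once all input has been consumed

    for (;;) {
        char *outp = &out[used];
        size_t outleft = out.size() - used;
        size_t ret = flushing
            ? iconv(cd, nullptr, nullptr, &outp, &outleft)  // emit reset sequence
            : iconv(cd, &inp, &inleft, &outp, &outleft);
        used = out.size() - outleft;
        if (ret != (size_t)-1) {
            if (flushing) break;  // input converted and state flushed
            flushing = true;      // all input consumed; flush next
            continue;
        }
        if (errno == E2BIG) {
            out.resize(out.size() * 2);  // grow, then resume where iconv stopped
            continue;
        }
        int err = errno;
        iconv_close(cd);
        throw std::runtime_error(err == EILSEQ ? "invalid input sequence"
                                               : "incomplete input sequence");
    }
    iconv_close(cd);
    out.resize(used);  // drop the unused tail
    return out;
}
```

The flush writes nothing for stateless encodings such as UTF-8 or UTF-16LE. It matters for stateful encodings such as ISO-2022-JP, which must return to the initial shift state at the end of the output.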
(3) iconv_close()
Function prototype:
int iconv_close(iconv_t cd);
Purpose:
This function closes the conversion descriptor and releases its resources. It returns 0 on success, or -1 with errno set on failure.
Sample code:
A wrapper class for the conversion:
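#include <iconv.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

class ccodeconverter {
public:
    ccodeconverter(const char *fromcode, const char *tocode) : nerr(0) {
        // iconv_open() takes the target encoding first
        hcodeconverter = iconv_open(tocode, fromcode);
        if (hcodeconverter == (iconv_t)-1) nerr = errno;
    }
    ~ccodeconverter() {
        if (hcodeconverter != (iconv_t)-1) iconv_close(hcodeconverter);
    }
    ccodeconverter(const ccodeconverter &) = delete;  // owns the handle: no copies
    ccodeconverter &operator=(const ccodeconverter &) = delete;

    // Converts srclen bytes of srcbuf into destbuf (capacity destlen).
    // Returns the number of non-reversible conversions, or -1 on error.
    int convert(char *srcbuf, size_t srclen, char *destbuf, size_t destlen) {
        if (hcodeconverter == (iconv_t)-1) return -1;
        // The counters must really be size_t: casting int* to size_t* is a bug on 64-bit
        size_t nconv = iconv(hcodeconverter, &srcbuf, &srclen, &destbuf, &destlen);
        if (nconv == (size_t)-1) { nerr = errno; return -1; }  // keep the error code
        return (int)nconv;
    }
    // Prints and returns the last error code
    int geterrinfo() {
        switch (nerr) {
        case E2BIG:  printf("errno: E2BIG (insufficient outbuf space)\n"); break;
        case EILSEQ: printf("errno: EILSEQ (invalid multi-byte sequence in inbuf)\n"); break;
        case EINVAL: printf("errno: EINVAL (residual bytes not converted)\n"); break;
        default: break;
        }
        return nerr;
    }
private:
    iconv_t hcodeconverter;  // conversion handle
    int nerr;                // last errno
};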
Example: converting UTF-8 to UTF-16
int main() {
    size_t srclen = 12;
    char *srcbuf = new char[srclen];
    memset(srcbuf, 0, srclen);
    strcpy(srcbuf, "Baise");
    size_t destlen = 2 * srclen;
    char *destbuf = new char[destlen];
    memset(destbuf, 0, destlen);
    ccodeconverter cv("UTF-8", "UTF-16");  // construct directly; the class is not copyable
    // Convert only the string itself, not the trailing NUL padding
    int nret = cv.convert(srcbuf, strlen(srcbuf), destbuf, destlen);
    if (nret < 0) {
        cv.geterrinfo();
        delete[] srcbuf;
        delete[] destbuf;
        return -1;
    }
    printf("converted\n");
    delete[] srcbuf;
    delete[] destbuf;
    return 0;
}
Inspecting the memory in gdb shows the following contents of destbuf:
(gdb) print destbuf
$1 = 0x804b018 "\377\376b"
(gdb) print destbuf + 1
$2 = 0x804b019 "\376b"
(gdb) print destbuf + 2
$3 = 0x804b01a "b"
(gdb) print destbuf + 3
$4 = 0x804b01b ""
(gdb) print destbuf + 4
$5 = 0x804b01c ""
(gdb) print destbuf + 5
$6 = 0x804b01d ""
(gdb) print destbuf + 6
$7 = 0x804b01e "I"
Q: Why is "\377\376" at the front of the output? Is it there to identify the encoding?
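"\377\376" is gdb's octal notation for the bytes 0xFF 0xFE. Those bytes are the byte order mark (BOM, U+FEFF) stored little-endian. When the target is plain "UTF-16", glibc's iconv prepends a BOM so that a reader can tell the byte order. Naming the byte order explicitly with "UTF-16LE" or "UTF-16BE" produces no BOM. The sketch below assumes glibc and uses a hypothetical helper, to_utf16_bytes, to make the difference visible.

```cpp
#include <iconv.h>
#include <string>

// Hypothetical helper: converts UTF-8 text to the given UTF-16 variant and
// returns the raw output bytes, or an empty string on any failure.
std::string to_utf16_bytes(const std::string &utf8, const char *tocode) {
    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1) return std::string();
    // UTF-16 needs at most 2 bytes per UTF-8 input byte, plus 2 for a BOM
    std::string out(utf8.size() * 2 + 2, '\0');
    char *inp = const_cast<char *>(utf8.data());
    char *outp = &out[0];
    size_t inleft = utf8.size(), outleft = out.size();
    size_t ret = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (ret == (size_t)-1) return std::string();
    out.resize(out.size() - outleft);  // keep only the bytes actually written
    return out;
}
```

The three targets give different results:
- to_utf16_bytes("A", "UTF-16LE") yields 0x41 0x00.
- "UTF-16BE" yields 0x00 0x41.
- "UTF-16" yields four bytes that start with a BOM (FF FE on typical x86 Linux systems).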
II. The iconv command
The iconv command converts the encoding of the given files. By default the result is written to standard output; it can also be written to an output file.
Usage: iconv [OPTION...] [FILE...]
The following options are available:
Input/output format specification:
-f, --from-code=NAME   encoding of the original text
-t, --to-code=NAME     encoding for the output
Information:
-l, --list             list all known coded character sets
Output control:
-c                     omit invalid characters from the output
-o, --output=FILE      write output to FILE
-s, --silent           suppress warnings
--verbose              print progress information
-?, --help             give this help list
--usage                give a short usage message
-V, --version          print the program version
Example:
iconv -f UTF-8 -t GB2312 aaa.txt > bbb.txt
This command reads aaa.txt, converts it from UTF-8 to GB2312, and redirects the output to bbb.txt.