Character set research: how to convert between different character sets

Zhu Jinchan

Source: http://blog.csdn.net/clever101

The previous article described the multibyte character set and the Unicode character set. Today we look at how to convert between these two major character sets.

The first thing to discuss is Microsoft's attitude toward the Unicode character set. In Windows development, the Unicode character set is called the wide-character set, and the multibyte character set is called the narrow-character set. Microsoft strongly supports Unicode, as the following points show: Windows has been developed on Unicode since Windows 2000; Windows CE is itself a Unicode operating system and does not support the ANSI versions of the Windows API functions at all; and new VC projects use the Unicode character set (UTF-16) by default. So the question for a C++ programmer is whether to use the Unicode character set.

Why use the Unicode character set? First, it improves runtime efficiency: the Windows kernel itself is based on Unicode characters, so non-Unicode characters must first be converted to Unicode (see "Detailed explanation of Windows core programming"). Second, it makes exchanging data between languages easy. For example, if you enter a Chinese path on an English version of the operating system, a program that uses non-Unicode characters will show garbled text unless the Chinese character set is installed.

Why not use the Unicode character set? Because tradition is a powerful force: many cross-platform third-party libraries are developed on multibyte character sets. Programming habits matter too: in Windows development everyone knows that the function for computing the length of a string is strlen, so who would reach for the wide-character version, wcslen? See the article I wrote earlier:

"Unicode character set: use it or not?"
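To make the strlen versus wcslen habit concrete, here is a minimal sketch of how the two functions count the same two Chinese characters. The helper names NarrowUnits and WideUnits and the two constants are hypothetical, introduced only for illustration. The text is spelled with escape sequences so the result does not depend on the encoding of the source file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical helper: number of char units (bytes) before the terminating '\0'.
std::size_t NarrowUnits(const char *s)
{
    assert(s != 0);
    return std::strlen(s);
}

// Hypothetical helper: number of wchar_t units before the terminating L'\0'.
std::size_t WideUnits(const wchar_t *s)
{
    assert(s != 0);
    return std::wcslen(s);
}

// The two characters U+4E2D U+6587, encoded as UTF-8 bytes (three bytes each).
const char kUtf8Text[] = "\xE4\xB8\xAD\xE6\x96\x87";

// The same two characters as a wide string (one wchar_t each).
const wchar_t kWideText[] = L"\u4E2D\u6587";
```

strlen counts bytes, so the narrow string reports a length larger than the number of characters, while wcslen counts wchar_t units. Code that mixes the two conventions needs to be explicit about which unit a length refers to.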

Finally, let's look at converting between multibyte character sets and the Unicode character set. There are two ways. One is to use the cross-platform iconv library; the sample code is as follows:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>   // strlen, memset
#include <iconv.h>    // encoding conversion library

#define OUTLEN 255    // file path length

// Code conversion: from one encoding to another.
// Returns 0 on success, -1 on failure.
int code_convert(const char *from_charset, const char *to_charset,
                 char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
    if (outlen == 0)
        return -1;

    char *pin = inbuf;
    char *pout = outbuf;
    size_t outleft = outlen - 1;   // keep one byte for the terminating '\0'

    iconv_t cd = iconv_open(to_charset, from_charset);
    if (cd == (iconv_t)-1)         // iconv_open signals failure with (iconv_t)-1, not 0
        return -1;
    memset(outbuf, 0, outlen);
    size_t rc = iconv(cd, &pin, &inlen, &pout, &outleft);
    iconv_close(cd);               // close the descriptor on every path
    return rc == (size_t)-1 ? -1 : 0;
}

// UTF-8 to GB2312
int u2g(char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
    return code_convert("utf-8", "gb2312", inbuf, inlen, outbuf, outlen);
}

// GB2312 to UTF-8
int g2u(char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
    return code_convert("gb2312", "utf-8", inbuf, inlen, outbuf, outlen);
}

// Callback for executing an SQL statement
static int _sql_callback(void *pused, int argc, char **argv, char **ppszcolname)
{
    for (int i = 0; i < argc; i++)
    {
        printf("%s = %s\n", ppszcolname[i], argv[i] == 0 ? "NULL" : argv[i]);
    }
    return 0;
}

int main()
{
    // The bytes of this path are expected to be GB2312-encoded
    char in_gb2312[] = "d:\\Control point Library\\gcpdb.3sdb";
    char out[OUTLEN];

    // GB2312 to UTF-8
    if (g2u(in_gb2312, strlen(in_gb2312), out, OUTLEN) == 0)
        printf("gb2312-->unicode out=%s\n", out);
    return 0;
}
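The sample above writes into a fixed 255-byte buffer, which is enough for a file path but not for arbitrary text. As a hedged sketch, the same iconv calls can be wrapped so that the output grows as needed. The function name ConvertEncoding and the chunk size of 256 are assumptions made for illustration, not part of the original code, and the iconv signature assumed here is the glibc one (char ** for the input buffer).

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <iconv.h>

// Hypothetical helper: converts `input` from encoding `from` to encoding `to`.
// Returns true on success and stores the converted bytes in `output`.
bool ConvertEncoding(const std::string &input, const char *from,
                     const char *to, std::string &output)
{
    assert(from != 0 && to != 0);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return false;              // unsupported encoding pair

    std::string in = input;        // iconv needs a non-const input pointer
    char *src = &in[0];
    std::size_t src_left = in.size();

    output.clear();
    bool ok = true;
    char chunk[256];               // assumed chunk size, refilled until input is consumed

    while (src_left > 0)
    {
        char *dst = chunk;
        std::size_t dst_left = sizeof(chunk);
        std::size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
        int err = errno;           // save errno before any other call can change it

        output.append(chunk, sizeof(chunk) - dst_left);

        // E2BIG only means the chunk filled up; any other error is fatal.
        if (rc == (std::size_t)-1 && err != E2BIG)
        {
            ok = false;
            break;
        }
    }

    // Note: stateful encodings would also need a final flush call;
    // UTF-8 and GB2312 are not stateful, so it is omitted in this sketch.
    iconv_close(cd);
    return ok;
}
```

A caller would pass the GB2312 path and "GB2312", "UTF-8" as the encoding names, then use the resulting std::string wherever a UTF-8 path is expected. Returning a bool and filling an output parameter keeps the error path explicit without exceptions; other designs are equally reasonable.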

The other way is to use the Windows API; the sample code is as follows:

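#include <windows.h>
#include <string>

// Converts a string in the current file-API code page (ANSI or OEM) to UTF-8.
std::string MbcsToUtf8(const char *pszMbcs)
{
    std::string str;
    WCHAR *pwchar = 0;
    CHAR *pchar = 0;
    int len = 0;

    int codepage = AreFileApisANSI() ? CP_ACP : CP_OEMCP;
    // With a null output buffer, the call returns the required length in WCHARs
    len = MultiByteToWideChar(codepage, 0, pszMbcs, -1, NULL, 0);
    pwchar = new WCHAR[len];
    if (pwchar != 0)
    {
        // Step 1: multibyte to UTF-16
        len = MultiByteToWideChar(codepage, 0, pszMbcs, -1, pwchar, len);
        if (len != 0)
        {
            // Step 2: UTF-16 to UTF-8, again querying the required size first
            len = WideCharToMultiByte(CP_UTF8, 0, pwchar, -1, 0, 0, 0, 0);
            pchar = new CHAR[len];
            if (pchar != 0)
            {
                len = WideCharToMultiByte(CP_UTF8, 0, pwchar, -1, pchar, len, 0, 0);
                if (len != 0)
                {
                    str = pchar;
                }
                delete[] pchar;   // memory from new[] must be released with delete[]
            }
        }
        delete[] pwchar;          // released even when the first conversion fails
    }
    return str;
}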

Reference documents:

1. Use SQLite3 to support Chinese path
