Character set research: how to convert between different character sets

Last Update:2016-03-12 Source: Internet

Author: User


Zhu Jinchan

Source: http://blog.csdn.net/clever101

The previous article described the multibyte character set and the Unicode character set. Today we describe how to convert between these two major character sets.

The first thing to discuss is Microsoft's attitude towards the Unicode character set. In Windows development, the Unicode character set is called the wide character set, and the multibyte character set is called the narrow character set. Microsoft strongly supports Unicode, as the following points show: Windows has been developed with Unicode since Windows 2000; Windows CE is itself a Unicode operating system and does not support the ANSI versions of the Windows API functions at all; and new VC projects use the Unicode character set (UTF-16) by default. So the question for a C++ programmer is whether to use the Unicode character set.

Why use the Unicode character set? First, it improves runtime efficiency: the Windows kernel itself is based on Unicode, so non-Unicode strings must first be converted into Unicode (see "Windows Core Programming"). Second, it makes it easy to exchange data between different languages. For example, if a Chinese path is entered on an English version of the operating system and the program is non-Unicode, the path is garbled unless the Chinese character set is installed.

Why not use the Unicode character set? Because tradition is strong. Many cross-platform third-party libraries are developed on multibyte character sets, and there are programming habits to consider: in Windows development, everyone knows that the function for calculating the length of a string is strlen, so who would bother with the wide-character version, wcslen? See the article I wrote before:

"Unicode character set: use it or not?"
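To make the strlen/wcslen distinction concrete, the sketch below measures the same two Chinese characters both ways. The helper names narrow_length and wide_length are hypothetical, and the narrow string is assumed to be UTF-8 encoded; it is spelled out as byte escapes so the result does not depend on the encoding of the source file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical helper: length of a narrow (multibyte) string.
std::size_t narrow_length(const char *s)
{
    assert(s != nullptr);
    return std::strlen(s);   // counts bytes, not characters
}

// Hypothetical helper: length of a wide string.
std::size_t wide_length(const wchar_t *s)
{
    assert(s != nullptr);
    return std::wcslen(s);   // counts wchar_t units
}

// The two characters U+4E2D U+6587, assumed UTF-8 encoded (3 bytes each).
const char kNarrow[] = "\xe4\xb8\xad\xe6\x96\x87";
// The same two characters as a wide string literal.
const wchar_t kWide[] = L"\u4e2d\u6587";
```

With these definitions, narrow_length(kNarrow) gives 6 and wide_length(kWide) gives 2. Code written around strlen is counting bytes, so moving a project to wide characters means revisiting every such length calculation, which is part of why the habit is hard to break.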

Finally, we discuss how to convert between multibyte character sets and the Unicode character set. There are two ways. One is to use the cross-platform iconv library; the sample code is as follows:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <iconv.h>   // encoding conversion library

#define OUTLEN 255   // maximum file path length

// Convert inbuf from one encoding to another.
// Returns 0 on success and -1 on failure.
int code_convert(const char *from_charset, const char *to_charset,
                 char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
    char **pin = &inbuf;
    char **pout = &outbuf;

    if (outlen == 0)
        return -1;

    iconv_t cd = iconv_open(to_charset, from_charset);
    if (cd == (iconv_t)-1)   // iconv_open reports failure as (iconv_t)-1, not 0
        return -1;

    memset(outbuf, 0, outlen);
    size_t outleft = outlen - 1;   // keep one byte for the terminating '\0'
    size_t rc = iconv(cd, pin, &inlen, pout, &outleft);
    iconv_close(cd);   // close the descriptor on success and on failure
    return rc == (size_t)-1 ? -1 : 0;
}

// UTF-8 to GB2312
int u2g(char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
    return code_convert("utf-8", "gb2312", inbuf, inlen, outbuf, outlen);
}

// GB2312 to UTF-8
int g2u(char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
    return code_convert("gb2312", "utf-8", inbuf, inlen, outbuf, outlen);
}

// Callback invoked for each row returned by an SQL statement
static int _sql_callback(void *pused, int argc, char **argv, char **ppszcolname)
{
    for (int i = 0; i < argc; i++)
        printf("%s = %s\n", ppszcolname[i], argv[i] == 0 ? "NULL" : argv[i]);
    return 0;
}

int main()
{
    // The path is meant to hold GB2312-encoded (for example Chinese) characters.
    char in_gb2312[] = "d:\\Control point Library\\gcpdb.3sdb";
    char out[OUTLEN];

    // GB2312 to UTF-8
    if (g2u(in_gb2312, strlen(in_gb2312), out, OUTLEN) == 0)
        printf("gb2312-->utf-8 out=%s\n", out);
    return 0;
}
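The sample above writes into a fixed-size buffer, which fails once the converted text outgrows it. As a rough sketch of one alternative, the hypothetical helper convert_charset below returns a std::string and converts the input chunk by chunk, continuing whenever iconv reports E2BIG (output chunk full). The function name and the 256-byte chunk size are assumptions for illustration, not part of the original code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <iconv.h>

// Hypothetical helper: convert `input` from `from_charset` to `to_charset`,
// growing the result as needed instead of using a fixed-size buffer.
std::string convert_charset(const std::string &input,
                            const char *from_charset,
                            const char *to_charset)
{
    assert(from_charset != nullptr && to_charset != nullptr);

    iconv_t cd = iconv_open(to_charset, from_charset);
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    // iconv takes a non-const char**, so work on a mutable copy of the input.
    std::string in_copy = input;
    char *in_ptr = in_copy.empty() ? nullptr : &in_copy[0];
    std::size_t in_left = in_copy.size();

    std::string result;
    char chunk[256];   // assumed chunk size; any reasonable size works

    while (in_left > 0) {
        char *out_ptr = chunk;
        std::size_t out_left = sizeof(chunk);
        std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
        int err = errno;   // capture before any other call can change it
        result.append(chunk, sizeof(chunk) - out_left);
        if (rc == (std::size_t)-1 && err != E2BIG) {
            // EILSEQ or EINVAL: invalid or truncated input sequence
            iconv_close(cd);
            throw std::runtime_error("iconv failed on invalid or incomplete input");
        }
    }
    // Stateful encodings would also need a final flush call; omitted here.
    iconv_close(cd);
    return result;
}
```

For example, convert_charset("\xd6\xd0\xce\xc4", "GB2312", "UTF-8") converts the GB2312 bytes for the two characters U+4E2D U+6587 into their six-byte UTF-8 form, provided the local iconv implementation supports both charset names.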

The other way is to use the Windows API; the sample code is as follows:

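#include <windows.h>
#include <string>

// Convert a multibyte (ANSI or OEM code page) string to UTF-8.
std::string MbcsToUtf8(const char *pszMbcs)
{
    std::string str;
    WCHAR *pwchar = 0;
    CHAR *pchar = 0;
    int len = 0;
    int codepage = AreFileApisANSI() ? CP_ACP : CP_OEMCP;

    // Step 1: multibyte to UTF-16. The first call returns the required length.
    len = MultiByteToWideChar(codepage, 0, pszMbcs, -1, NULL, 0);
    pwchar = new WCHAR[len];
    if (pwchar != 0)
    {
        len = MultiByteToWideChar(codepage, 0, pszMbcs, -1, pwchar, len);
        if (len != 0)
        {
            // Step 2: UTF-16 to UTF-8, again asking for the length first.
            len = WideCharToMultiByte(CP_UTF8, 0, pwchar, -1, 0, 0, 0, 0);
            pchar = new CHAR[len];
            if (pchar != 0)
            {
                len = WideCharToMultiByte(CP_UTF8, 0, pwchar, -1, pchar, len, 0, 0);
                if (len != 0)
                {
                    str = pchar;
                }
                delete[] pchar;   // memory from new[] must be freed with delete[]
            }
        }
        delete[] pwchar;
    }
    return str;
}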

Reference documents:

1. Use SQLite3 to support Chinese path


