Notes on VC6 and MySQL Encoding

Source: Internet
Author: User

Recently I used VC6 to scrape information from the web. I had started learning regular expressions only two days earlier, and with them it was easy to extract information from web pages. I used MySQL for back-end storage, mainly to make later web development easier. However, I ran into several encoding problems with MySQL along the way. I am recording them here as a reference for anyone who hits the same problems.

Chinese web pages today are mostly encoded in GB2312 or UTF-8. A GB2312 page needs no conversion when it is captured in VC6, because VC6 stores strings as multi-byte text by default, so nothing is garbled. A UTF-8 page, however, comes out garbled, so the UTF-8 data must first be converted to a multi-byte string before VC can process it. For example, convert it as follows:

//////////////////////////

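// strData is assumed to be a CString holding the raw UTF-8 bytes
// Step 1: UTF-8 -> UTF-16
int n = MultiByteToWideChar(CP_UTF8, 0, strData, strData.GetLength(), NULL, 0);
WCHAR *pChar = new WCHAR[n + 1];
MultiByteToWideChar(CP_UTF8, 0, strData, strData.GetLength(), pChar, n);
pChar[n] = 0;
// Step 2: UTF-16 -> ANSI; text longer than the buffer will not fit
char szAnsi[1024];
WideCharToMultiByte(CP_ACP, WC_COMPOSITECHECK, pChar, -1, szAnsi, sizeof(szAnsi), NULL, NULL);
delete[] pChar; // release the temporary wide-character buffer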

///////////////////////////

szAnsi now holds a multi-byte string that can be used directly.
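Since only UTF-8 pages need this conversion, it can help to check first whether the captured bytes are well-formed UTF-8 at all. The following is a minimal, portable sketch and not part of the original code: the name `LooksLikeUtf8` is hypothetical, and the check is only a heuristic, because short GB2312 text can occasionally happen to be valid UTF-8 as well.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original article): returns true when
// `data` is well-formed UTF-8. Overlong forms and surrogates are rejected.
bool LooksLikeUtf8(const std::string &data)
{
    std::size_t i = 0;
    const std::size_t n = data.size();
    while (i < n)
    {
        unsigned char c = static_cast<unsigned char>(data[i]);
        if (c < 0x80) { i++; continue; }   // plain ASCII byte

        std::size_t extra;     // continuation bytes expected after the lead byte
        unsigned long cp;      // code point being assembled
        unsigned long min_cp;  // smallest code point allowed for this length
        if ((c & 0xE0) == 0xC0)      { extra = 1; cp = c & 0x1F; min_cp = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min_cp = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min_cp = 0x10000; }
        else return false;           // stray continuation or invalid lead byte

        if (i + extra >= n) return false;  // sequence is cut off

        for (std::size_t k = 1; k <= extra; k++)
        {
            unsigned char t = static_cast<unsigned char>(data[i + k]);
            if ((t & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (t & 0x3F);
        }
        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode
        i += extra + 1;
    }
    return true;
}
```

If the check fails, the page is more likely GB2312 and can be used without the conversion.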

Even after the strings have been processed correctly inside VC6, encoding problems can still occur when they are stored in the MySQL database.

Note that if data is not written in the encoding the database is configured for, it may still be stored successfully, but problems appear on retrieval: in most cases you get a pile of garbled characters and no clue how to fix them.

Therefore, here we set the database's storage encoding to utf8 (mainly for ease of use in applications; another encoding such as gb2312 would also work), and it is best to state it explicitly when creating the table.

//////////////////////////

CREATE TABLE IF NOT EXISTS XXX (ID INT(4) NOT NULL PRIMARY KEY AUTO_INCREMENT, ...) DEFAULT CHARSET=utf8;

When connecting to the database, set the read/write encoding after mysql_init(&g_mysql), once the connection has been established:

mysql_query(&g_mysql, _T("SET NAMES 'utf8'"));

//////////////////////////

With that, the preparation is done, and what remains is the encoding conversion at storage time. Strings in VC6 are multi-byte encoded (in practice GB2312), while the MySQL database stores utf8. Without conversion, the insert fails with an error message that is hard to interpret.

This requires a character-encoding conversion function; see the conversion function GB2312ToUTF_8 in http://www.vckbase.com/document/viewdoc?id=1444:

//////////////////////////

// Convert GB2312 to UTF-8
// Returns a buffer allocated with new[]; the caller must delete[] it.
char *GB2312ToUTF_8(char *pText, int pLen)
{
    // Each 2-byte GB2312 character becomes 3 UTF-8 bytes, plus the terminator
    int nULen = 1 + pLen * 2; // was: pLen + (pLen >> 2) + 2;
    char buf[4];
    char *rst = new char[nULen];

    memset(buf, 0, 4);
    memset(rst, 0, nULen);

    int i = 0;
    int j = 0;
    while (i < pLen)
    {
        // Copy ASCII (English) bytes directly; relies on char being signed
        if (*(pText + i) >= 0)
        {
            rst[j++] = pText[i++];
        }
        else
        {
            // Double-byte character: GB2312 -> Unicode -> 3-byte UTF-8
            WCHAR pBuffer;
            GB2312ToUnicode(&pBuffer, pText + i);
            UnicodeToUTF_8(buf, &pBuffer);
            rst[j] = buf[0];
            rst[j + 1] = buf[1];
            rst[j + 2] = buf[2];
            j += 3;
            i += 2;
        }
    }
    rst[j] = '\0';
    return rst;
}

Note that the original article has a small bug in its buffer-length calculation, which I changed above. The function returns a pointer to newly allocated memory, so remember to release it with delete[] after use.
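The length fix can be checked with simple arithmetic: an ASCII byte stays one byte, while each 2-byte GB2312 character is written as 3 UTF-8 bytes by this code. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the input is assumed to be well-formed, so that every byte with the high bit set starts a complete 2-byte character.

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes (including the terminating '\0')
// that the GB2312 -> UTF-8 loop writes for a buffer of `len` bytes.
// Assumes every high-bit byte starts a well-formed 2-byte character.
std::size_t Utf8BytesNeeded(const char *text, std::size_t len)
{
    std::size_t out = 1; // terminating '\0'
    std::size_t i = 0;
    while (i < len)
    {
        if ((static_cast<unsigned char>(text[i]) & 0x80) == 0)
        {
            out += 1;   // ASCII: 1 byte in, 1 byte out
            i += 1;
        }
        else
        {
            out += 3;   // double-byte character: 2 bytes in, 3 bytes out
            i += 2;
        }
    }
    return out;
}

// The commented-out formula and the replacement used in GB2312ToUTF_8.
std::size_t OldEstimate(std::size_t len) { return len + (len >> 2) + 2; }
std::size_t NewEstimate(std::size_t len) { return 1 + len * 2; }
```

For 8 bytes of purely Chinese text (four characters) the loop writes 13 bytes, but the old formula reserves only 12, which overflows the buffer. For well-formed input, 1 + pLen * 2 is always enough.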

For reference, the helper functions it uses are:

// Convert UTF-8 to Unicode (handles 3-byte sequences only)
void UTF_8ToUnicode(WCHAR *pOut, char *pText)
{
    char *uchar = (char *)pOut;
    uchar[1] = ((pText[0] & 0x0F) << 4) + ((pText[1] >> 2) & 0x0F);
    uchar[0] = ((pText[1] & 0x03) << 6) + (pText[2] & 0x3F);
    return;
}

// Convert Unicode to UTF-8 (always writes 3 bytes)
void UnicodeToUTF_8(char *pOut, WCHAR *pText)
{
    // Mind the byte order of WCHAR: the low byte comes first, the high byte second
    char *pchar = (char *)pText;
    pOut[0] = (0xE0 | ((pchar[1] & 0xF0) >> 4));
    pOut[1] = (0x80 | ((pchar[1] & 0x0F) << 2)) + ((pchar[0] & 0xC0) >> 6);
    pOut[2] = (0x80 | (pchar[0] & 0x3F));
    return;
}

// Convert Unicode to GB2312
void UnicodeToGB2312(char *pOut, unsigned short uData)
{
    WideCharToMultiByte(CP_ACP, 0, &uData, 1, pOut, sizeof(WCHAR), NULL, NULL);
    return;
}

// Convert GB2312 to Unicode
void GB2312ToUnicode(WCHAR *pOut, char *gbBuffer)
{
    ::MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, gbBuffer, 2, pOut, 1);
    return;
}

//////////////////////////

After the conversion above, the string can be stored in the utf8-encoded MySQL database.

strSql.Format("insert into XXX (name, ...) values ('%s', ...)", szName ...);
if (mysql_query(&g_mysql, strSql) != 0)
{
    // Report the error and log the failing statement
    cout << mysql_error(&g_mysql) << endl;
    errorFile.WriteString(strSql);
    delete[] szName; // release the converted string before skipping this record
    continue;
}
delete[] szName;

/////////////////////////////////////////

However, I still ran into a problem with the code above. The following UTF-8 byte sequences, when they appear in a string, can cause the SQL statement to fail:

\xE0\x84\x81

\xE0\x82\xB7

\xE0\x80\xBF

\xE0\x90\x96

\xE0\x8B\x8A

My initial idea is simply to replace these characters. If there is a better way, further discussion is welcome.
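One possible explanation, offered as a reading of the bytes rather than something the original article states: all five sequences decode to code points below U+0800 (U+0101, U+00B7, U+003F, U+0416 and U+02CA), which UTF-8 requires to be written in one or two bytes. UnicodeToUTF_8 always writes three bytes, so these characters come out in "overlong" form, which strict UTF-8 decoders reject, and that would explain the failed statements. A minimal sketch of an encoder that picks the length by code point range follows. `EncodeUtf8Bmp` is a hypothetical name, and the input is assumed to be a single 16-bit code unit with no surrogate pairs.

```cpp
#include <string>

// Hypothetical replacement for the fixed 3-byte encoder: encodes one 16-bit
// code unit (Basic Multilingual Plane, no surrogate pairs) as UTF-8 in its
// shortest form.
std::string EncodeUtf8Bmp(unsigned short cp)
{
    std::string out;
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);                          // 0xxxxxxx
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    }
    else
    {
        out += static_cast<char>(0xE0 | (cp >> 12));           // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));   // 10xxxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    }
    return out;
}
```

With this approach U+00B7 is written as C2 B7 instead of E0 82 B7, so the characters would not need to be replaced.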

--------------------------------------------

Ppzhang | giszhang@gmail.com |

 

 
