Character set and MySQL character set processing (ii)

Source: Internet
Author: User

Continuing from the previous article on character sets, this article focuses on the various character set settings in MySQL. For the basic theory, refer to the previous article.

1. MySQL system variables

character_set_server: the default character set for internal operations

character_set_client: the character set of the data sent by the client

character_set_connection: the connection-layer character set

character_set_results: the character set of query results

character_set_database: the default character set of the currently selected database

character_set_system: the character set of system metadata (field names, etc.)

Simply put, for those of us using the MySQL C API, the main concern is three character sets: character_set_client, character_set_connection and character_set_results. From my own experience, though, I always feel that character_set_connection is a bit superfluous.

2. The character set conversion process in MySQL

This section is copied wholesale from http://www.laruence.com/2008/01/05/12.html. It is easy to read, so I paste it here again.

1) When the MySQL server receives a request, it converts the request data from character_set_client to character_set_connection;

2) Before performing internal operations, it converts the request data from character_set_connection to the internal operation character set, which is determined as follows:

    • Use the CHARACTER SET value of each data field;

    • If that value does not exist, use the DEFAULT CHARACTER SET value of the corresponding data table (a MySQL extension, not part of the SQL standard);

    • If that value does not exist, use the DEFAULT CHARACTER SET value of the corresponding database;

    • If that value does not exist, use the value of character_set_server.

3) It converts the operation result from the internal operation character set to character_set_results.

The conversion from character_set_connection to the internal operation character set looks complicated, but if we specify the character set of the data table when creating it, we can simply assume that this "internal operation character set" is the character set of the corresponding table. So I recommend including the clause "DEFAULT CHARSET=xxx" when creating a table, where the possible values of xxx can be obtained with "SELECT character_set_name FROM information_schema.character_sets". The suggested value is "utf8".

3. The character set conversion experiment in MySQL

My environment is as follows.

    • main.cpp is saved as UTF-8, and GCC is invoked without -finput-charset or -fexec-charset, so the Chinese text in the executable should also be stored as UTF-8;
    • The Linux terminal environment is de_DE (export LANG=de_DE)
    • The CREATE TABLE statement in MySQL is

        CREATE TABLE `tbl_test` (
          `id` int,
          name varchar(20000),
          uptime date,
          PRIMARY KEY (`id`)
        ) ENGINE=InnoDB DEFAULT CHARSET=utf8

Experiment one: The process of correctly handling Chinese

The general process of this experiment is:

    • After connecting, character_set_client, character_set_connection and character_set_results are all set with SET NAMES utf8.
    • The client character set is then changed with SET character_set_client=gbk.
    • The statement "INSERT INTO tbl_test (id, name, uptime) VALUES (100, '你好', '20130101')" is hard-coded in the program ('你好' means "hello").
    • The '好' ("good") is then changed to '饕' ("foodie").

The point to note is that I first convert the hard-coded (UTF-8) char* string in the binary to a wchar_t* string and then modify the Chinese character. Before sending the statement, I convert the wchar_t* string to a GBK char* string. After testing, the following code works properly.

    #include <clocale>
    #include <cstdlib>
    #include <cstring>
    #include <cwchar>
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>
    #include <tr1/memory>
    #include "common/dbcomm/dbcomm.h"

    using namespace std;

    common::DbLocation dbLocation1;

    void InsertBySqlStatmentTest1();

    int main()
    {
        dbLocation1.setDbId("test_db1");
        dbLocation1.setIp("127.0.0.1");
        dbLocation1.setPort("3306");
        dbLocation1.setUser("cup_dba");
        dbLocation1.setPassword("123456");
        InsertBySqlStatmentTest1();
        return 0;
    }

    void InsertBySqlStatmentTest1()
    {
        try
        {
            vector<common::DbLocation> dbLocations_array;
            dbLocations_array.push_back(dbLocation1);
            tr1::shared_ptr<common::IDbTasks> mysqlTasks(
                new common::MySqlDbTasks(dbLocations_array, true));
            mysqlTasks->connect();
            cout << "connect success" << endl;

            {
                common::DbExecuteAction* char_action = mysqlTasks->execute();
                common::ExecuteFilter char_filter("set names utf8");
                char_action->Do(&char_filter, &dbLocation1);

                // Change character_set_client to gbk.
                common::ExecuteFilter char_filter2("set character_set_client = gbk");
                char_action->Do(&char_filter2, &dbLocation1);
                char_action->EndAction();
            }

            common::DbExecuteAction* insert_action = mysqlTasks->insert(5000);
            stringstream ss;
            ss << "insert into tbl_test (id, name, uptime) values "
               << "(" << 100 << ", " << "'你好'," << "'20130101')";
            string statement = ss.str();

            // Use mbstowcs to change the SQL statement to a wide-char string.
            // We use the default value of -fexec-charset, which is UTF-8,
            // to compile this file with GCC.
            setlocale(LC_ALL, "zh_CN.utf8");
            size_t wcs_size = mbstowcs(NULL, statement.c_str(), 0);
            if (wcs_size == (size_t)-1) { cout << "mbstowcs failed" << endl; return; }
            wchar_t* dest = new wchar_t[wcs_size + 1];
            wmemset(dest, L'\0', wcs_size + 1);
            // The third argument is the capacity of dest in wide characters.
            mbstowcs(dest, statement.c_str(), wcs_size + 1);

            // Change the last '好' to '饕'.
            wchar_t* tmp = wcsrchr(dest, L'好');
            if (tmp != NULL)
                *tmp = L'饕';

            // Change the SQL statement to the charset that corresponds to
            // the character_set_client of MySQL.
            setlocale(LC_ALL, "zh_CN.gbk");
            size_t mbs_size = wcstombs(NULL, dest, 0);
            if (mbs_size == (size_t)-1) { cout << "wcstombs failed" << endl; delete[] dest; return; }
            char* buf_mbs = new char[mbs_size + 1];
            memset(buf_mbs, '\0', mbs_size + 1);
            // The third argument is the capacity of buf_mbs in bytes.
            wcstombs(buf_mbs, dest, mbs_size + 1);

            // Try to insert into MySQL.
            common::InsertFilter insertFilter(buf_mbs);
            insert_action->Do(&insertFilter);
            insert_action->EndAction();
            cout << "EndAction success" << endl;

            delete[] buf_mbs;
            delete[] dest;

            mysqlTasks->disconnect();
            cout << "disconnect success" << endl;
        }
        catch (common::ThrowableException& e)
        {
            cout << e.what() << endl;
        }
        catch (...)
        {
            cout << "unknown exception" << endl;
        }
    }

Experiment two: The process of handling Chinese the wrong way

Now let's make some changes and keep the situation simple: we no longer mischievously run SET character_set_client=gbk, but only run SET NAMES utf8. Then, after assembling the SQL statement, we use the string::find method to locate '你' ("you") and use the resulting index directly to overwrite it with '饕' ("foodie"). The specific code is as follows.

    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>
    #include <tr1/memory>
    #include "common/dbcomm/dbcomm.h"

    using namespace std;

    common::DbLocation dbLocation1;

    void InsertBySqlStatmentTest1();

    int main()
    {
        dbLocation1.setDbId("test_db1");
        dbLocation1.setIp("127.0.0.1");
        dbLocation1.setPort("3306");
        dbLocation1.setUser("cup_dba");
        dbLocation1.setPassword("123456");
        InsertBySqlStatmentTest1();
        return 0;
    }

    void InsertBySqlStatmentTest1()
    {
        try
        {
            vector<common::DbLocation> dbLocations_array;
            dbLocations_array.push_back(dbLocation1);
            tr1::shared_ptr<common::IDbTasks> mysqlTasks(
                new common::MySqlDbTasks(dbLocations_array, true));
            mysqlTasks->connect();
            cout << "connect success" << endl;

            {
                // ******** Here we no longer mischievously change
                // ******** character_set_client to gbk.
                common::DbExecuteAction* char_action = mysqlTasks->execute();
                common::ExecuteFilter char_filter("set names utf8");
                char_action->Do(&char_filter, &dbLocation1);
                char_action->EndAction();
            }

            common::DbExecuteAction* insert_action = mysqlTasks->insert(5000);
            stringstream ss;
            ss << "insert into tbl_test (id, name, uptime) values "
               << "(" << 100 << ", " << "'你好'," << "'20130101')";

            // ******** Directly modify the string (deliberately wrong) ********
            // Both literals are multi-byte characters squeezed into a single
            // char, so find() matches one byte and the assignment writes one byte.
            string statement = ss.str();
            size_t pos = statement.find('你');
            statement[pos] = '饕';

            // Try to insert into MySQL.
            common::InsertFilter insertFilter(statement);
            insert_action->Do(&insertFilter);
            insert_action->EndAction();
            cout << "EndAction success" << endl;

            mysqlTasks->disconnect();
            cout << "disconnect success" << endl;
        }
        catch (common::ThrowableException& e)
        {
            cout << e.what() << endl;
        }
        catch (...)
        {
            cout << "unknown exception" << endl;
        }
    }

As a result, the inserted data is wrong.

To track down the cause of the error, let's look at the data in hexadecimal.

We can see that

        size_t pos = statement.find('你');
        statement[pos] = '饕';

in essence changes only one byte. In UTF-8, '你' ("you") is e4bda0, and it has become '何' e4bd95; the 95 we wrote is just one byte of '饕' ("foodie"). This is consistent with our understanding of how string behaves: it operates on bytes, not on characters.

4. Summary and recommendations

    • It is recommended that you set the character set for each data table.
    • It is recommended that character_set_client, character_set_connection and character_set_results be set to match the character set of the data table.
    • When using the MySQL C API, immediately after initializing the database handle, use mysql_options to set the MYSQL_SET_CHARSET_NAME option to a character set consistent with that of the data table, or send the SQL statement SET NAMES xxx to set the character sets.
    • If you need to work with Chinese, the data table's character set is usually utf8 or gbk.
    • If you want to do character-level processing of Chinese text, you must call setlocale according to the actual encoding, use mbstowcs to convert to a wide-char string, operate on the wide-char string, and then use wcstombs to convert it back to a multibyte string that is passed to the database connection as the SQL statement.
