MySQL character encoding and conversion

Source: Internet
Author: User

The following article describes the detailed analysis of MySQL character encoding and conversion (latin1-> gbk). We all know that it is easy to develop a second development site, there are two types of codes available for the original database: gbk and lation1. I use gbk, which involves encoding conversion.

Here, we found a detailed Encoding comparison on the LiJun027's Blog, in the following situations:

I. Experiment:

1. Scenario 1

Database field MySQL Character Set: UTF-8

Connected Character Set: no explicit settings. The default value is latin1.

Page Character Set: gbk

Deposit Process:

1) the SQL statement expressed in GBK is used to submit a storage request to the server;

2) Set Names '? Is not used by default '?? ') The server uses latin1 to open the connection;

3) The server mistakenly believes that the current SQL statement is represented by latin1;

4) The server regards the GBK character as latin1 character, incorrectly uses the "latin1 to UTF-8 function" to convert the MySQL character and store it in the UTF-8 field;

5) Wrong latin1 (actually GBK) => wrong UTF-8)

6) If you use phpmyadmin to open the table and use utf8 to connect) the field will be garbled;

Read process:

1) Set Names '? Is not used by default '?? ') The server uses latin1 to open the connection;

2) The server converts the value in the UTF-8 field to latin1 and returns it to the client;

3) Wrong UTF-8 => wrong latin1 (actually GBK) This process is the inverse process of the Deposit Process 5. Right)

4) check whether the GBK encoding of latin1 is properly displayed by PAGE character set;

Used to indicate:

Deposit Process:

----------------------

Page connection Storage

----------------------

GBK = & gt; latin1 = & gt; UTF-8

---------------

------------- |

| + ------- The UTF-8 obtained in this process is a string of ant cloud garbled characters, but MySQL stubbornly thinks this string of code is a UTF-8

|

+ ------ MySQL mistaken GBK for latin1

Read process:

----------------------

Page connection Storage

----------------------

GBK & lt ;= latin1 & lt; = UTF-8

---------------

------------- |

| + ------- This garbled string is converted back to the correct GBK encoding through the inverse process, but MySQL considers it as latin1.

|

+ ------ MySQL mistakenly believes that the GBK encoding of latin1 is sent back to the page, and the correct encoding is obtained.

2. Case 2

Database field character set: UTF-8

Connect to the MySQL Character Set: gbk

Page Character Set: gbk

The text description is omitted.

:

Deposit Process:

----------------------

Page connection Storage

----------------------

GBK = & gt; UTF-8

------------

------------- |

| + ------- The UTF-8 obtained in this process is converted from GBK and is correctly encoded with UTF-8.

|

+ ------ The page character set is equal to the connected character set. MySQL considers that the page is passed to GBK encoding, and its idea is exactly in line with the facts.

Read process:

----------------------

Page connection Storage

----------------------

GBK & lt; = UTF-8

---------------

------------- |

| + ------- Use the "UTF-8 to GBK function" to convert the correct UTF-8 encoding back to GBK

|

The page character set is equal to the MySQL character set. No problem is displayed.

3. Case 3

Database field character set: gbk

Connected Character Set: no explicit settings. The default value is latin1.

Page Character Set: gbk

Deposit Process:

----------------------

Page connection Storage

----------------------

GBK = & gt; latin1 = & gt; GBK

------------

------------- |

| + ------- The character is converted into garbled characters by the "latin1 to GBK function", but MySQL regards it as GBK, so the tool cannot be properly displayed.

|

+ ------ MySQL considers that the page is latin1 encoded, And it will add correct GBK to garbled code in the subsequent process.

Read process:

----------------------

Page connection Storage

----------------------

GBK & lt; = latin1 & lt; = GBK

---------------

------------- |

| + ------- "GBK to latin1 function" will be converted into GBK garbled characters, but MySQL regards them as latin1

|

+ ------ The incorrect latin1 encoding is actually the correct GBK encoding. The page is displayed normally, but the tool is not displayed normally.

Ii. Conversion between MySQL character sets

The author tries to convert the GBK character into the wrong UTF-8 by mistake as latin1. In the reverse process, the garbled characters are converted back to latin1, which is exactly the correct GBK.

$ Str = "Chinese test ";

 
 
  1. $str_tran = iconv('latin1', 'utf-8', $str);   
  2. echo $str_tran;   

Displays garbled characters, neither GBK nor UTF-8 or latin1

 
 
  1. echo "<br>-----------<br>";  
  2. $str_re_tran = iconv('utf-8', 'latin1', $str_tran);   
  3. echo $str_re_tran;    

Show "Chinese test"

An error occurs when the GBK character is mistakenly converted to the wrong GBK Encoding As UTF-8.

$ Str = "Chinese test ";

 
 
  1. #$str_tran = iconv('utf-8', 'gbk', $str);     

Error !!!

It can be seen that whether an encoding can be converted to a third encoding as another encoding depends on the inherent attributes of the encoding, the first example above only happens that GBK encoding can be mistakenly converted to UTF-8 as latin1. In the following cases, the database certainly cannot access data normally.

Let's talk about the lesson. When we build a database, all the codes of the same application must be consistent. Otherwise, we will be worried about self-searching.

After half a day, I still cannot use iconv for conversion. To enable iconv in Windows, you only need to remove ";" in php. ini before extension = php_mbstring.dll. I checked it online. Many of them say they want to enable extension = php_iconv.dll, but this line is not seen in several versions of PHP. Is it estimated that this is the case for the old version ?)

Finally, find a tool to implement the conversion of latin1 <-> gbk, gbk <-> utf8, gbk <-> big5, encoding, the program can perform multiple conversions to implement latin1-> gbk-> utf8 conversion, but cannot skip the conversion example: latin1 cannot be directly converted to utf8 ).

Not bad. No garbled code was transferred, and the problem was finally solved.

Also, I would like to mention the backup database tool Empirebak ). A free and open-source stable and efficient software designed specifically for MySQL big data backup and import. The system uses volume-based backup and import to theoretically back up databases of any size.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.