Learn PHP & MySQL: character encoding (2) bitsCN.com
Following on from PHP & MySQL: character encoding (1), this article mainly covers garbled characters in MySQL and the character set settings that prevent them.
The MySQL character set conversion process is as follows:
This process involves at least three character sets: the client character set, the connector character set, and the server character set. The connector plays the crucial role. When the client saves data to the server, it sends the data, encoded in the client character set, to the connector. The connector first converts the data to the connection character set, then converts it again to the server character set and sends it to the server for storage. When the client retrieves data from the server, the process runs in reverse.
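The two-step conversion described above can be imitated outside MySQL with plain byte-string operations. The following Python sketch is only a model: the function name `store` and its parameters are hypothetical, Python codec names stand in for MySQL character set names, and replacing unconvertible characters with `?` is an assumption about how a lossy conversion behaves.

```python
def store(client_bytes: bytes, client: str, connection: str, server: str) -> bytes:
    """Model the path client -> connector -> server for one string.

    All names are illustrative; Python codecs stand in for MySQL
    character sets.
    """
    # The connector interprets the incoming bytes using the client character set.
    text = client_bytes.decode(client)
    # Step 1: convert to the connection character set ("?" marks lost characters).
    connection_bytes = text.encode(connection, errors="replace")
    # Step 2: convert from the connection character set to the server character set.
    return connection_bytes.decode(connection).encode(server, errors="replace")


# A GBK client sending two Chinese characters to a UTF-8 server.
gbk_input = "中文".encode("gbk")
stored = store(gbk_input, client="gbk", connection="utf-8", server="utf-8")
print(stored == "中文".encode("utf-8"))  # True
```

Passing `connection="gbk"` instead gives the same stored bytes for this input, because both GBK and UTF-8 can represent these characters.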
Consider one scenario:
The client uses GBK, the connector uses UTF8, and the server also uses UTF8. When the client sends GBK-encoded data to the connector, the connector converts it to UTF8 and holds it temporarily. The connector then sends the held data to the server, which saves it to the database without any further conversion. When the client retrieves data, the process runs in reverse.
This case can cause problems. If the database already stores characters that exist in UTF8 but not in GBK, those characters may be lost when the UTF8 data is converted to GBK on retrieval. (If the application only ever handles Chinese text, this may never be a problem.)
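The retrieval problem can be reproduced with Python's codecs standing in for the MySQL character sets. This is a minimal sketch, assuming a Korean syllable as the example of a character that UTF8 can store but GBK cannot, and assuming the lost character is replaced with `?`.

```python
# Text as stored on a UTF-8 server: two Chinese characters plus a Korean syllable.
stored = "中文한".encode("utf-8")

# Retrieval for a GBK client: UTF-8 -> GBK. GBK has no code for the
# Korean syllable, so it is replaced with "?" and cannot be restored.
received = stored.decode("utf-8").encode("gbk", errors="replace")
print(received.decode("gbk"))  # 中文?
```

The Chinese characters survive the trip; only the character outside GBK's repertoire is lost.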
Here is another scenario:
The client still uses GBK, the connector also uses GBK, and the server still uses UTF8. When the client sends GBK-encoded data to the connector, the connector does not convert it and simply holds it temporarily. The connector then converts the held GBK-encoded data to UTF8 and sends it to the server. Data retrieval is the reverse of this process. In this scenario characters can be lost in the same way: anything stored in UTF8 that GBK cannot represent is lost on the way back to the client.
Based on the scenarios above, avoiding garbled characters in MySQL requires two things. First, specify the client encoding, so that the connector does not misinterpret the incoming bytes and store the wrong data. Second, tell the connector which character set to use for returned results. In practice this means setting three character sets: the client character set, the returned result character set, and the connector character set.
Consider the following settings:
# Set the client character set to GBK
set character_set_client = gbk;
# Set the connector character set to latin1
set character_set_connection = latin1;
# Set the returned result character set to GBK
set character_set_results = gbk;
When the client uses GBK and the connector uses latin1, the client character set can represent more characters than the connector character set. For example, the client character set contains Chinese characters but the connector character set does not. When the client sends Chinese text to the connector and the connector converts it to latin1, the characters are lost and garbled text results. This kind of garbling is an irreparable loss of the original byte codes (the second case of garbled text in the previous article).
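The irreversibility of this loss can be seen in a small Python model. The sketch assumes Python's `gbk` and `latin-1` codecs as stand-ins for the MySQL character sets and assumes that unconvertible characters become `?`.

```python
original = "中文"
client_bytes = original.encode("gbk")  # what a GBK client sends

# The connector converts to latin1, which has no Chinese characters.
connection_bytes = client_bytes.decode("gbk").encode("latin-1", errors="replace")
print(connection_bytes)  # b'??'

# Converting back to GBK cannot recover the text: only "??" is left.
recovered = connection_bytes.decode("latin-1").encode("gbk").decode("gbk")
print(recovered == original)  # False
```

Once the question marks are stored, no later change of character set settings can bring the original characters back.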
In conclusion, the character sets should satisfy: server character set >= connection character set >= client character set, where >= means "can represent at least as many characters".
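One way to read this rule is that a piece of text must be representable at every stage it passes through. The helper below is a hypothetical sketch, not a MySQL feature: `survives_chain` and its parameters are illustrative names, and Python codecs again stand in for MySQL character sets.

```python
from typing import Sequence


def survives_chain(text: str, charsets: Sequence[str]) -> bool:
    """Return True if text can be encoded in every character set of the chain."""
    for charset in charsets:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            # This stage cannot represent the text, so data would be lost here.
            return False
    return True


# Chain order: client, connection, server.
print(survives_chain("中文", ["gbk", "gbk", "utf-8"]))      # True
print(survives_chain("中文", ["gbk", "latin-1", "utf-8"]))  # False
```

A check like this only covers the sample text it is given; it does not prove that one character set contains another in general.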
set character_set_client = gbk;
set character_set_connection = gbk;
set character_set_results = gbk;
# The three statements above all use gbk, so they can be abbreviated as:
set names gbk;
To sum up the previous article and this one, pay attention to the following points to avoid garbled characters in PHP + MySQL development:
- The charset declared in the meta information of HTML and PHP pages
- The encoding in which HTML, PHP, and other files are saved
- The client, connection, and results character sets in MySQL
- The encoding of the MySQL database, tables, and fields
The encodings in all four of the points above must be the same.
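A mismatch between the first two points, the declared charset and the encoding a file was actually saved in, can be detected by trying to decode the file's bytes. The following is a minimal sketch under stated assumptions: `check_declared_encoding` is a hypothetical helper, and the sample page content is made up for illustration.

```python
def check_declared_encoding(data: bytes, declared: str) -> bool:
    """Return True if data decodes cleanly in the declared character set."""
    try:
        data.decode(declared)
    except UnicodeDecodeError:
        return False
    return True


# A page saved as GBK but declared as UTF-8 in its meta information.
page_saved_as_gbk = "<title>中文</title>".encode("gbk")
print(check_declared_encoding(page_saved_as_gbk, "utf-8"))  # False
print(check_declared_encoding(page_saved_as_gbk, "gbk"))    # True
```

A clean decode does not prove the declared charset is the right one, since some byte sequences are valid in several encodings, but a failed decode is a reliable sign of a mismatch.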
Other related topics: UTF-8 BOM problems, and correctly parsing UTF-8 strings in PHP.