The "ultimate" solution for garbled Chinese characters in PHP and MySQL

Last Update:2018-05-10 Source: Internet

Author: User

Tags mediawiki oscommerce


Chinese encoding problems in PHP programming have plagued many people, yet the cause is simple. Every country or region specifies character encoding standards for computer information exchange, such as extended ASCII in the United States, GB2312-80 in China, and JIS in Japan. As the basis for information processing in each country or region, these character sets play an important role in unifying encoding. By length, character sets are divided into SBCS (single-byte character sets) and DBCS (double-byte character sets). To let computers process local text, early software (operating systems in particular) shipped in various localized versions (L10N), and concepts such as LANG and codepage were introduced to tell them apart. However, the code ranges of the local character sets overlap, which makes it difficult to exchange information between them, and maintaining each localized version of the software independently is expensive.

It therefore became necessary to extract what the localization work has in common and handle it consistently, keeping locale-specific processing to a minimum. This is internationalization (I18N). Language information was further standardized as locale information, and the underlying character set became Unicode, which contains almost all characters. Today the core text processing of most internationalized software is based on Unicode. At run time, the software determines the local character encoding from the current Locale/LANG/codepage settings and handles local characters accordingly. In the process, text must be converted between Unicode and the local character set, or between two different local character sets with Unicode as the pivot. The same approach extends to networked environments: the character data at either end of a connection must be converted into a form the other end accepts, according to each end's character set settings.
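As a small illustration of the conversion step described above, the following sketch converts the same two Chinese characters between UTF-8 and GBK with PHP's iconv() and prints the bytes of each form. The helper name to_hex is made up for this example, and the sketch assumes that the iconv extension on your platform supports GBK.

```php
<?php
// Hypothetical helper for this example: show a string's raw bytes as spaced hex.
function to_hex(string $bytes): string
{
    return implode(' ', str_split(bin2hex($bytes), 2));
}

// "中文" written as explicit UTF-8 bytes, so the example does not depend
// on the encoding this source file happens to be saved in.
$utf8 = "\xE4\xB8\xAD\xE6\x96\x87";

// Convert UTF-8 -> GBK and back again; Unicode acts as the common reference.
$gbk       = iconv('UTF-8', 'GBK', $utf8);
$roundtrip = iconv('GBK', 'UTF-8', $gbk);

echo 'UTF-8: ', to_hex($utf8), "\n"; // three bytes per character here
echo 'GBK:   ', to_hex($gbk), "\n";  // two bytes per character here
echo 'Round trip intact: ', ($roundtrip === $utf8 ? 'yes' : 'no'), "\n";
```

The same characters occupy different byte sequences in each encoding, which is why both ends of an exchange have to agree on which one is in use.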

　　Character set encoding in the database

Popular relational database systems support database character set encoding: when you create a database, you can specify its character set, and the data is stored in that encoding. When an application accesses the data, the encoding is converted on the way in and on the way out. For Chinese data, the database character set should be chosen so that data integrity is preserved. GB2312, GBK, and UTF-8 are all suitable choices. ISO8859-1 (8-bit) can also be made to work, but then the application has to split each 16-bit Chinese or Unicode character into two 8-bit characters before writing, recombine the two bytes after reading, and tell them apart from genuine SBCS characters. We therefore do not recommend ISO8859-1 as the database character set: it fails to use the database's own character set support and makes programming more complex. During development, you can use the administration tools provided by the database management system to check whether the Chinese data is stored correctly.
Before a PHP program queries the database, it should first execute mysql_query("SET NAMES xxxx"), where xxxx is the encoding of your web page (charset=xxxx). If the page uses charset=utf-8, then xxxx is utf8 (MySQL spells the name without the hyphen); if the page uses charset=gb2312, then xxxx is gb2312. Almost every web application keeps its database connection code in one shared file; add mysql_query("SET NAMES xxxx") to that file.
SET NAMES indicates the character set used in the SQL statements sent by the client. The statement SET NAMES 'utf8' therefore tells the server that "the information sent from this client will use the utf8 character set". It also specifies the character set for the results the server sends back to the client (for example, for a SELECT statement, the character set of the column values).
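Because the mysql_* functions were removed in PHP 7.0, the same idea can be sketched with mysqli. The two function names below are hypothetical, the charset map covers only a few common labels, and the host, user, password, and database name are placeholders.

```php
<?php
// Hypothetical helper: translate a page charset label into the name MySQL
// expects. Only a few common labels are mapped here; extend as needed.
function mysql_charset_for_page(string $pageCharset): string
{
    $map = [
        'utf-8'  => 'utf8',   // MySQL spells it without the hyphen
        'utf8'   => 'utf8',
        'gb2312' => 'gb2312',
        'gbk'    => 'gbk',
    ];
    $key = strtolower(trim($pageCharset));
    if (!isset($map[$key])) {
        throw new InvalidArgumentException("Unmapped charset: $pageCharset");
    }
    return $map[$key];
}

// Hypothetical connection routine with placeholder credentials.
// mysqli::set_charset() sets the connection character set, much like
// SET NAMES, and also informs the client library of the encoding.
function connect_for_page(string $pageCharset): mysqli
{
    $link = new mysqli('localhost', 'xxx', 'xxx', 'your_database_name_here');
    $link->set_charset(mysql_charset_for_page($pageCharset));
    return $link;
}

echo mysql_charset_for_page('UTF-8'), "\n";
```

Keeping the mapping in one place means the page charset and the connection charset are derived from the same value instead of being typed twice.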

Frequently used troubleshooting techniques:
The crudest but most effective way to locate a Chinese encoding problem is to print a string's internal byte codes right after each step you suspect of mishandling it. From the byte codes you can tell when Chinese characters are converted to Unicode, when Unicode is converted back to Chinese characters, when one Chinese character turns into two Unicode characters, when a Chinese string turns into question marks, and when the high bits of a Chinese string are truncated.
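A minimal sketch of this technique in PHP, assuming the iconv extension supports GBK on your platform; dump_bytes is a hypothetical helper name. It prints the bytes of one Chinese character after a correct conversion and after a typical wrong one, where the single character turns into two.

```php
<?php
// Hypothetical debugging helper: describe a string by byte length and hex bytes.
function dump_bytes(string $label, string $s): string
{
    $hex = implode(' ', str_split(bin2hex($s), 2));
    return sprintf('%s: %d bytes [%s]', $label, strlen($s), $hex);
}

$gbk = "\xD6\xD0"; // the character "中" encoded in GBK

// Correct handling: GBK -> UTF-8 yields one three-byte character.
$good = iconv('GBK', 'UTF-8', $gbk);

// Typical mistake: the same bytes are treated as ISO-8859-1, so each byte
// becomes its own character and the one Chinese character turns into two.
$bad = iconv('ISO-8859-1', 'UTF-8', $gbk);

echo dump_bytes('stored', $gbk), "\n";
echo dump_bytes('good  ', $good), "\n";
echo dump_bytes('bad   ', $bad), "\n";
```

A byte value of 3f in such a dump is a literal question mark, which indicates that a conversion has already discarded the original character.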

Choosing an appropriate sample string also helps identify the type of problem. For example, "aa, aa? @ Aa" is a string that mixes English letters with Chinese characters from both GB and GBK. In general, English characters are not distorted no matter how they are converted or processed (if you do see them distorted, try increasing the length of the run of consecutive English letters).

　　Solve the garbled problem of various applications

One question remains: why is the former always effective, while the latter sometimes is not? The reason lies in Apache.
3) AddDefaultCharset: the conf folder in the Apache root directory contains Apache's main configuration file, httpd.conf. Open httpd.conf in a text editor. One of its lines (the line number varies by version) reads AddDefaultCharset xxx, where xxx is the encoding name. It sets the character set in the HTTP header of the web pages on the server to xxx by default; in effect, this line adds header("content-type: text/html; charset=xxx") to every file. Now we can see why a page that is clearly set to UTF-8 may still always be displayed by the browser as gb2312.

The settings above take priority in this order: header("content-type: text/html; charset=xxx"), WVE, AddDefaultCharset xxx.

If you are a web programmer, we recommend that you add header("content-type: text/html; charset=xxx") to every page. That way the page is displayed correctly on any server, and it is also more portable.
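The recommendation above might look like this at the top of a page. This is only a sketch: content_type_header is a hypothetical helper, and the page is assumed to be saved and served as UTF-8.

```php
<?php
// Hypothetical helper: build the Content-Type header line for a charset.
function content_type_header(string $charset): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Send the header before any output; headers_sent() guards against the
// "headers already sent" warning if something was printed earlier.
$charset = 'utf-8'; // assumption: the page is saved and served as UTF-8
if (!headers_sent()) {
    header(content_type_header($charset));
}

echo "<p>charset declared as {$charset}</p>\n";
```

Placing this in a file that every page includes keeps the declared charset in a single place.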

4) default_charset in php.ini: the setting default_charset = "gb2312" in php.ini defines PHP's default character set. It is generally recommended to comment out this line so that the browser selects the encoding based on the charset in the web page header rather than having one forced on it; this makes it possible to serve pages in multiple languages from the same server.

　　Conclusion

　　In fact, Chinese encoding in PHP development is not as complicated as it seems. Although there is no fixed procedure for locating and solving problems, and runtime environments differ, the underlying principles are the same. Understanding character sets is the basis for solving character problems. However, as Chinese character sets continue to evolve, Chinese information processing problems will persist for some time, and not only in PHP programming.

When it comes to garbled characters in MySQL, start with the MySQL parameters. Starting with MySQL 5, there are several additional system variables for character set settings:

character_set_client: the client character set

character_set_connection: the character set used for the client-to-server connection

character_set_results: the character set of the data returned by SELECT

character_set_database: the character set used by the database

Garbled characters are generally caused by incorrect settings of the variables above. Many people who ask about garbled characters get the answer "SET NAMES first". So what is SET NAMES? It sets the three system variables character_set_client, character_set_connection, and character_set_results.
For example, SET NAMES 'gbk' is equivalent to:

SET @@character_set_client = 'gbk';

SET @@character_set_connection = 'gbk';

SET @@character_set_results = 'gbk';
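The equivalence can be written out as a small PHP helper. This is only an illustration: set_names_equivalent is a hypothetical function, and it merely builds the statement text rather than talking to a server.

```php
<?php
// Hypothetical helper: list the three statements that SET NAMES stands for.
// The charset is checked against a simple pattern because it is
// concatenated into SQL text.
function set_names_equivalent(string $charset): array
{
    if (!preg_match('/^[a-z0-9_]+$/i', $charset)) {
        throw new InvalidArgumentException("Invalid charset name: $charset");
    }
    $statements = [];
    foreach (['client', 'connection', 'results'] as $part) {
        $statements[] = "SET @@character_set_{$part} = '{$charset}'";
    }
    return $statements;
}

foreach (set_names_equivalent('gbk') as $sql) {
    echo $sql, ";\n";
}
```

Building the statements separately like this is only useful when the three variables need different values; when they should all match, SET NAMES does the same job in one statement.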

In many cases this setting solves the garbled character problem, but it cannot rule it out completely. Why?

The character_set_client and character_set_connection variables only need to be kept consistent with the character_set_database encoding, while character_set_results ensures that the results returned by SELECT statements match the encoding of the program.

For example, if your database (character_set_database) uses the utf8 character set, you must make sure that character_set_client and character_set_connection are also utf8. Your program, however, may not use utf8. If it uses gbk, for instance, and character_set_results is set to utf8, you may get garbled characters. In that case, set character_set_results to gbk so that the database returns results in the same encoding as your program.

Next is a piece of code for setting the character sets (it uses a db class I wrote myself, which should not get in the way of reading it):

// Assume that our program uses the utf8 character set.
$program_char = 'utf8';

// Check the MySQL version number first. If the major version is greater than 4,
// these system variables can be set (mysql4 does not have these system variables).
// Casting to int reads the leading major version, so two-digit versions work too.
$version = current($db->fetch_one('SELECT VERSION()'));
if ((int) $version > 4) {
    // Retrieve the character set of the current database.
    $sql  = 'SELECT @@character_set_database';
    $char = current($db->fetch_one($sql));

    // Make the client character set (character_set_client) and the connection
    // character set (character_set_connection) match the database character set.
    $db->query("SET @@character_set_client = '" . $char . "'");
    $db->query("SET @@character_set_connection = '" . $char . "'");

    // Make the character set of the data returned by SELECT match the program.
    $db->query("SET @@character_set_results = '" . $program_char . "'");
}

1. Ensure that the data stored in the database is consistent with the database encoding, that is, the data encoding matches character_set_database;

2. Ensure that the character sets used for communication are consistent with the database's, that is, character_set_client and character_set_connection match character_set_database;

3. Ensure that the results returned by SELECT use the same encoding as the program, that is, character_set_results matches the program's encoding;

4. Ensure that the program's encoding is consistent with the browser's encoding, that is, the program code and .
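The four rules above can be expressed as a simple check. This is a hypothetical sketch: the function name and every array key are invented for the example, and the values would have to be gathered by you (for instance from the server variables and your page settings).

```php
<?php
// Hypothetical checker for the four rules above. Every array key is an
// assumption of this sketch, not a MySQL or PHP API.
function find_charset_mismatches(array $cs): array
{
    $problems = [];
    if ($cs['data'] !== $cs['database']) {
        $problems[] = 'rule 1: stored data vs. character_set_database';
    }
    if ($cs['client'] !== $cs['database'] || $cs['connection'] !== $cs['database']) {
        $problems[] = 'rule 2: client/connection vs. character_set_database';
    }
    if ($cs['results'] !== $cs['program']) {
        $problems[] = 'rule 3: character_set_results vs. program';
    }
    if ($cs['program'] !== $cs['browser']) {
        $problems[] = 'rule 4: program vs. browser';
    }
    return $problems;
}

// Example: a utf8 database read by a gbk program whose results are still utf8.
$settings = [
    'data' => 'utf8', 'database' => 'utf8',
    'client' => 'utf8', 'connection' => 'utf8',
    'results' => 'utf8', 'program' => 'gbk', 'browser' => 'gbk',
];
print_r(find_charset_mismatches($settings));
```

With these sample values only the third rule is violated, which is the situation the gbk example earlier in the text describes.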

The ultimate solution to garbled Chinese characters in MySQL

I promise this is gonna be the last time on it
Since writing the following two articles:
Precautions for the WordPress 1.5 upgrade: Chinese garbled text
MySQL 4.1 Chinese garbled text, round two
people have kept writing to me over the past few months with questions about MySQL's compatibility with Chinese, which gave me the chance to see the many connection methods and programs that people use. It has been a real eye-opener.
Recently I switched to Flex 2 as my development platform and rewrote some products, so many of the frameworks I wrote last year can no longer be used, and some libraries that specifically handled multiple languages stopped working as well. As a result I have to face this problem again, so I would like to take this opportunity to write up some new experience.
* Causes of Chinese garbled characters in MySQL
Chinese garbled characters in MySQL generally come from the following:
- MySQL server settings, for example still stuck on latin1
- MySQL table language settings (including character set and collation)
- Connection language settings of the client program (such as php)
The previous two articles covered how to set the character set and collation of the MySQL server and tables.
Beyond that, you only need to pay attention to the following points:
* Solving Chinese garbled characters in MySQL
1. At startup MySQL reads a preset config file, generally named my.ini, and searches for it in the following two locations:
C:\windows\my.ini, in the installation directory of the operating system (it may also be C:\winnt\my.ini).
C:\my.cnf, in the root directory of drive C.
Note that the file extension differs between the two locations. The previous articles did not stress this, so at the time I used a more complicated method: registering MySQL as a service directly and specifying the location of my.ini.

2. The content of my.ini is:
[mysqld]
default-character-set = utf8
[client]
default-character-set = utf8
init_connect = 'SET NAMES utf8'
Here, [mysqld] is the language used when the server starts. Setting it to utf8, however, may upset a lot of English-language software, such as osCommerce or MediaWiki, so we recommend setting it to latin1.
Programs such as mysqld.exe, mysqladmin.exe, or MySQL Control Center read this configuration file and then connect using utf8.
Note: thanks to b6s for providing the second instruction line. It is said to be faster than setting the connection from a php program, and it should also solve the problem of phpMyAdmin not displaying Unicode Chinese characters correctly (amfphp, however, does not use this setting, so there you must set the language yourself).

However, most engineers write their own php/jsp programs to connect. Such programs naturally do not read this setting and keep using the default language, latin1.

This is exactly where the friends who wrote to me ran into trouble.

I usually use a separate file to handle the MySQL connection settings, for example:
<?php
// Database connection details.
$host = "localhost";
$link = mysql_connect($host, "xxx", "xxx");
mysql_query("SET NAMES 'utf8'");
mysql_select_db("your_table_name_here", $link);
?>
Note the "SET NAMES 'utf8'" command on the fifth line, right after mysql_connect: it tells MySQL that utf8 is used for everything sent over the connection from then on. With this setting, most problems are solved.
It follows that if you use a custom connection pooling mechanism, you should remember to set utf8 immediately after each new connection is created.
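For a home-grown pool, the point about new connections could be sketched as follows. The class Utf8Pool and its two callbacks are hypothetical: one callback opens a raw connection, and the other sets the character set and runs exactly once for each connection the pool creates.

```php
<?php
// Hypothetical minimal pool: $factory opens a raw connection and
// $setCharset runs once on every connection the pool creates.
final class Utf8Pool
{
    private $factory;
    private $setCharset;
    private $idle = [];

    public function __construct(callable $factory, callable $setCharset)
    {
        $this->factory    = $factory;
        $this->setCharset = $setCharset;
    }

    public function acquire()
    {
        if ($this->idle) {
            return array_pop($this->idle); // already initialised earlier
        }
        $conn = ($this->factory)();
        ($this->setCharset)($conn);        // immediately after connecting
        return $conn;
    }

    public function release($conn): void
    {
        $this->idle[] = $conn;
    }
}

// With mysqli this could be wired up as follows (placeholder credentials):
// $pool = new Utf8Pool(
//     function () { return new mysqli('localhost', 'xxx', 'xxx', 'your_database_name_here'); },
//     function (mysqli $c) { $c->set_charset('utf8'); }
// );
```

Running the charset step inside acquire() means that no caller can obtain a freshly opened connection that is still using the server default.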
After a few days of repeated tests (with traditional Chinese, simplified Chinese, Japanese, and Korean text), I have confirmed that with this group of settings characters no longer come out garbled and no characters turn into "口" ("mouth").
For example, if the test phrase "no choice" is written to MySQL and can be read back intact, Chinese is working; if it comes back as "口", the revolution has not yet succeeded and comrades must keep working hard... orz
Of course, I also verified that result sets passed back and forth through flex2-amfphp-php-MySQL no longer have Chinese garbled characters, so you can use it with peace of mind.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
