PHP Chinese character encoding and conversion methods

Source: Internet
Author: User
This article introduces Chinese character encoding conversion in PHP and analyzes the principles and methods behind it.

It describes how PHP can adapt to the character set changes introduced in MySQL 4.1, based on an understanding of how MySQL 4.1 handles character sets. It also applies to MySQL 5 and later versions.

I. Principles

MySQL's handling of text involves two concepts: "character set" and "collation".

1. Collations

A collation is a set of rules for comparing characters. In web development the term appears almost exclusively in MySQL, where it guides character comparison. For the ascii character set, for example, a collation specifies whether a is less than b and whether a is equal to A. You can usually ignore collations, because every character set has a default collation and the default is normally good enough.

2. Character sets

Compared with collations, character sets are a broader concept: even ordinary text files in Windows are affected by them. Different character sets define different ways of encoding characters. A character set is a set of symbols together with their encoding. The ascii character set, for example, contains digits, uppercase and lowercase letters, the semicolon, the line break, and so on, and it encodes each character in 7 bits (A is 65, b is 98).

ascii only defines an encoding for English letters and symbols, so non-English text cannot be represented in ascii. Different countries therefore created encodings for their own languages; China, for example, has gb2312. These national encodings are not compatible with one another, so international standards organizations developed universal encodings, of which utf8 is the most commonly used.

ascii encodes English symbols and English letters only. gb2312 encodes English symbols, English letters, and Chinese characters. utf8 encodes the text of all the world's languages. The characters in gb2312 include the ascii characters, and the characters that utf8 can represent include the gb2312 characters. utf8 therefore has the broadest coverage, which is why multilingual web systems generally use it (phpMyAdmin uses utf8).
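To make the difference between these encodings concrete, here is a minimal sketch that prints the bytes used for the same character under each one. It assumes PHP 7 or later with the mbstring extension; the helper name `encoded_hex` is invented for this illustration.

```php
<?php
// Requires the mbstring extension (an assumption about your PHP build).

/** Encode UTF-8 text as $charset and return the resulting bytes as hex. */
function encoded_hex(string $utf8Text, string $charset): string
{
    $bytes = mb_convert_encoding($utf8Text, $charset, 'UTF-8');
    return strtoupper(bin2hex($bytes));
}

$zhong = "\u{4E2D}"; // the Chinese character "中", written as a Unicode escape

echo encoded_hex('A', 'ASCII'), "\n";     // 41 (decimal 65)
echo encoded_hex('A', 'GB2312'), "\n";    // 41
echo encoded_hex('A', 'UTF-8'), "\n";     // 41
echo encoded_hex($zhong, 'GB2312'), "\n"; // D6D0
echo encoded_hex($zhong, 'UTF-8'), "\n";  // E4B8AD
```

The letter A has the same byte in all three encodings, while the Chinese character is two bytes in gb2312 and three different bytes in utf8. That difference is what makes text unreadable when the wrong encoding is assumed.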
Storing any text involves a character set, whether the text lives in a database or in an ordinary text file.

Main terms:

Character: a Chinese character, an English letter, a punctuation mark, a Latin letter, and so on.
Encoding: the conversion of a character into a format the computer can store; for example, A is represented as 65.
Character set: a group of characters together with their encoding.

A. MySQL character sets

MySQL supports multiple character sets and can convert between them (to ease migration and to support multiple languages). A character set can be set at the server level, the database level, the table level, and the column level. The character set that actually takes effect is the one on the column. For example, if column col1 of table1 has a character type, col1 uses a character set; if column col2 of table1 is of type int, the concept of a character set does not apply to col2. The server-level, database-level, and table-level character sets are only defaults for the column character set.

MySQL must have a server character set. It can be specified with a startup parameter, at compile time, or in the configuration file. The server character set is the default at the database level: when you create a database you can specify a character set, and if you do not, the server character set is used. Likewise, when you create a table you can specify a table-level character set, and if you do not, the table uses the character set of the database. When you create a column you can specify its character set, and if you do not, the column uses the character set of the table.

In general you only need to set the server-level character set; the database, table, and column levels then inherit it by default. Because utf8 has the broadest coverage, the MySQL server-level character set is usually set to utf8.
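The fallback chain described above can be modelled in a few lines. This is only an illustrative sketch of the inheritance rule, not code that talks to MySQL; the function name `effective_charset` and its parameters are invented for the example, and null stands for "not specified at this level".

```php
<?php
/**
 * Resolve the character set a column ends up with, following the
 * fallback chain column -> table -> database -> server.
 */
function effective_charset(
    string $server,
    ?string $database = null,
    ?string $table = null,
    ?string $column = null
): string {
    // The most specific level that was set wins.
    return $column ?? $table ?? $database ?? $server;
}

echo effective_charset('utf8'), "\n";                        // utf8
echo effective_charset('utf8', null, 'gb2312'), "\n";        // gb2312
echo effective_charset('latin1', 'utf8', null, 'gbk'), "\n"; // gbk
```

This is why setting only the server-level character set is usually enough: every level that leaves the value unspecified resolves to it.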

B. Character sets of ordinary text files

Every stored text has a character set, and ordinary text files are no exception. In Windows, open Notepad and choose "Save as": the dialog box lets you select the encoding used to store the text. Most people simply keep the default encoding, so they never run into character set problems.

Windows lets you choose the encoding when you save a text file, but when you open one, the encoding is determined automatically. There is a well-known joke on the Internet about Windows Notepad and the names of the carriers China Mobile and China Unicom (you can search for it); it comes from Windows guessing the wrong encoding when opening a text file.

Because automatic detection is sometimes wrong, some text formats define a way to declare the encoding they use. HTML is one example. An HTML file is a text file, so it is stored in some encoding, and inside the file HTML syntax is also used to declare that encoding (the charset value). If an HTML file declares no encoding, the browser detects the encoding automatically. If it declares one, the browser uses the declared encoding.

The charset declared in an HTML file is normally the same as the encoding the file is actually stored in, but the two can differ, and then the page is garbled. (Garbling of this kind comes from the text file alone and has nothing to do with the database.) Specialized web page editors such as Dreamweaver automatically save the file in the encoding given by the page's charset value.
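A rough way to spot a mismatch between the declared charset and the stored bytes is to check whether the bytes are valid in the declared encoding. The sketch below assumes the mbstring extension; the function names and the sample page are invented for illustration. Validity is only a necessary condition, so this check catches some mismatches (such as gb2312 bytes declared as utf-8) but not all of them.

```php
<?php
// Requires the mbstring extension (an assumption about your PHP build).

/** Extract the charset declared in an HTML document, or null if none is found. */
function declared_charset(string $html): ?string
{
    if (preg_match('/charset\s*=\s*["\']?([A-Za-z0-9_-]+)/i', $html, $m)) {
        return $m[1];
    }
    return null;
}

/** True when the document's bytes are valid in the charset it declares. */
function bytes_match_declaration(string $html): bool
{
    $charset = declared_charset($html);
    return $charset !== null && mb_check_encoding($html, $charset);
}

// A hypothetical page that declares utf-8 and contains the text "中文".
$template = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><p>%s</p>';
$utf8Page = sprintf($template, "\u{4E2D}\u{6587}");            // stored as utf-8
$gbPage   = mb_convert_encoding($utf8Page, 'GB2312', 'UTF-8'); // same page stored as gb2312

var_dump(bytes_match_declaration($utf8Page)); // bool(true)
var_dump(bytes_match_declaration($gbPage));   // bool(false)
```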

C. Character sets in PHP + MySQL

PHP ultimately generates a text file (the HTML page), but along the way it reads text from the database or stores text in it. MySQL supports multiple character sets, and by default it does not know which encoding PHP is sending. The client (PHP) therefore has to tell MySQL which character sets it is using. PHP sets character_set_client to tell MySQL the encoding of the text it sends for storage. PHP sets character_set_results to tell MySQL the encoding in which it wants data returned. PHP sets character_set_connection to tell MySQL the encoding used for the text inside its queries.

MySQL stores text in the encoding it has been configured with. Suppose MySQL stores text as setserver, PHP's character_set_client is setclient, and PHP's character_set_results is setresult. MySQL converts text sent by PHP from setclient to setserver and then stores it. When PHP retrieves text, MySQL converts it from setserver to setresult and then sends it to PHP. The PHP file (the final HTML page) has an encoding of its own; if the encoding of the text MySQL returns differs from it, the page will inevitably be garbled. That is why PHP normally tells MySQL the encoding of the page itself.

To avoid garbled text, three encodings must agree: the actual encoding of the web page file, the encoding declared in the HTML, and the encoding PHP reports to MySQL (character_set_client and character_set_results). The first two normally match if you use an editor such as Dreamweaver, but pages written in Notepad may be inconsistent. The third has to be reported to MySQL explicitly, which in PHP can be done with mysql_query("set names characterx").
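As a sketch of this step: SET NAMES sets character_set_client, character_set_results, and character_set_connection in one statement. The helper below builds that statement from the page's charset; its name and its whitelist are assumptions made for this example. Note that mysql_query() belongs to the old mysql extension, which was removed in PHP 7, so the commented usage assumes mysqli instead, with hypothetical connection details.

```php
<?php
/**
 * Map a page charset (as written in the HTML charset declaration) to a
 * MySQL character set name and build the matching SET NAMES statement.
 * The whitelist is illustrative; extend it for the charsets you use.
 */
function set_names_sql(string $pageCharset): string
{
    $map = [
        'utf-8'  => 'utf8',   // MySQL spells the name without the hyphen
        'utf8'   => 'utf8',
        'gb2312' => 'gb2312',
        'gbk'    => 'gbk',
    ];
    $key = strtolower(trim($pageCharset));
    if (!isset($map[$key])) {
        throw new InvalidArgumentException("Unsupported charset: $pageCharset");
    }
    return 'SET NAMES ' . $map[$key];
}

echo set_names_sql('UTF-8'), "\n";  // SET NAMES utf8
echo set_names_sql('gb2312'), "\n"; // SET NAMES gb2312

// With a live connection (hypothetical credentials) it would be sent as:
//   $db = new mysqli('localhost', 'user', 'password', 'mydb');
//   $db->query(set_names_sql('UTF-8'));
```

Restricting the value to a whitelist keeps a charset name taken from configuration or page metadata from being pasted into SQL unchecked.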

D. Character set conversion problems

Converting from a smaller character set to a larger one loses no data, but converting from a larger character set to a smaller one may. For example, some characters in utf8 do not exist in gb2312, so converting from utf8 to gb2312 may lose them. Converting from gb2312 to utf8 and then back from utf8 to gb2312, however, loses nothing: every character being converted came from gb2312, so the whole process only ever handles gb2312 characters.

Because utf8 can hold all the world's characters, databases generally use utf8, which allows any character to be stored.
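The following sketch checks both cases by converting text to gb2312 and back. It assumes PHP 7 or later with the mbstring extension; the helper name `survives_round_trip` and the sample strings are invented for illustration.

```php
<?php
// Requires the mbstring extension (an assumption about your PHP build).

/** Convert UTF-8 text to $charset and back, and report whether it survived. */
function survives_round_trip(string $utf8Text, string $charset): bool
{
    $narrow = mb_convert_encoding($utf8Text, $charset, 'UTF-8');
    $back   = mb_convert_encoding($narrow, 'UTF-8', $charset);
    return $back === $utf8Text;
}

$chinese = "\u{4E2D}\u{6587}"; // "中文": both characters exist in gb2312
$mixed   = "\u{4E2D}\u{D55C}"; // "中" plus the Korean syllable "한", which gb2312 lacks

var_dump(survives_round_trip($chinese, 'GB2312')); // bool(true)
var_dump(survives_round_trip($mixed, 'GB2312'));   // bool(false)
```

A character with no gb2312 equivalent is replaced by a substitute character during the first conversion, so it cannot be restored on the way back.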

E. Garbled text in phpMyAdmin

phpMyAdmin supports multiple languages, so its HTML pages must use utf8. Because the pages are utf8, character_set_client and character_set_results must also be utf8 when phpMyAdmin connects to MySQL.

At present a PHP connection can only tell MySQL its encoding with set names (or a few equivalent statements). If no encoding is declared explicitly, latin1 is used. Programs often do not declare character_set_client at all, so their gb2312 text is stored in the database as if it were latin1; phpMyAdmin then reads it as utf8, and it is bound to be garbled. If the PHP program had stored its text with the correct encoding, there would be no problem. So phpMyAdmin is not what needs to be modified (changing phpMyAdmin can sometimes hide the garbling, but that does not address the root cause).
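The effect can be reproduced without a database. In this sketch the bytes of a gb2312 string are treated as latin1 and converted to utf8, which is roughly what happens when a program never declares its encoding and a utf8 client later reads the data. It assumes the mbstring extension, and ISO-8859-1 stands in for MySQL's latin1 here; the function name `misread_as_latin1` is invented for the example.

```php
<?php
// Requires the mbstring extension (an assumption about your PHP build).

/** Treat raw bytes as latin1 text and return what a utf8 client would receive. */
function misread_as_latin1(string $rawBytes): string
{
    return mb_convert_encoding($rawBytes, 'UTF-8', 'ISO-8859-1');
}

$original = "\u{4E2D}";                                        // "中" as utf8
$gbBytes  = mb_convert_encoding($original, 'GB2312', 'UTF-8'); // what a gb2312 page sends
$seen     = misread_as_latin1($gbBytes);

echo bin2hex($gbBytes), "\n"; // d6d0
echo bin2hex($seen), "\n";    // c396c390, displayed as "ÖÐ" instead of "中"
var_dump($seen === $original); // bool(false)
```

Each gb2312 byte becomes a separate accented Latin letter, which is the typical appearance of this kind of garbling.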

II. Summary

1. Use utf8 for database storage whenever possible: edit /etc/my.cnf and add default-character-set = utf8 to the [mysqld] section (convert existing databases to utf8 first).

2. Before the PHP program queries the database, execute mysql_query("set names xxxx"), where xxxx is the encoding of your web page (charset = xxxx). If the page has charset = utf8, then xxxx = utf8; if the page has charset = gb2312, then xxxx = gb2312; if the page had charset = ipaddr, then xxxx = ipaddr (a joke, there is no such encoding). Almost every web program has a shared piece of code that connects to the database, kept in one file; add the mysql_query("set names") call to that file.

3. phpMyAdmin does not need to be modified.

4. Note: make sure the actual encoding of the web page (the encoding chosen in the Windows save dialog box) matches its declared encoding (charset = ?). Editing pages with a tool such as Dreamweaver helps keep the two the same.
