PHP and garbled Chinese characters: records stored incorrectly in the database

Source: Internet
Author: User
Tags mysql code

How to fix garbled json_encode data from PHP when it is stored in MySQL

When json_encode output from PHP is stored in MySQL and the data contains Chinese characters, those characters end up as a string of text like uxxxx.

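1. Cause analysis:
The problem arises when the data is written to the database. It is sometimes attributed to MySQL being unable to store Unicode characters:

MySQL's utf8 character set only supports characters from the Basic Multilingual Plane (0x0000-0xFFFF). Characters outside that range cannot be stored in it.

Update: as of MySQL 5.5.3 (not yet GA at the time of writing), supplementary characters can be stored if you use the utf8mb4 character set.

Common Chinese characters lie inside that range, however, so this limit is not what produces uxxxx. The actual cause is the escaping:
When json_encode processes Chinese text, each Chinese character is encoded as "\uxxxx".
When the string is saved to the database, the "\" is consumed as an escape character, so "\uxxxx" becomes "uxxxx".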
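
The effect described above can be reproduced without a database. The following is a minimal sketch, assuming the file is saved as UTF-8; stripslashes() is only a stand-in here for what happens when the backslashes are consumed inside an SQL string literal, not the actual MySQL code path.

```php
<?php
// Assumes this file is saved as UTF-8.
$json = json_encode('中文');   // default flags: non-ASCII becomes \uXXXX escapes

// stripslashes() only imitates the loss of the backslashes that occurs
// when the string is placed unescaped inside an SQL string literal.
$stored = stripslashes($json);

echo $json, "\n";    // "\u4e2d\u6587"
echo $stored, "\n";  // "u4e2du6587"

// The intact string decodes back to the Chinese text; the damaged one
// decodes to the literal characters u4e2du6587.
var_dump(json_decode($json));
var_dump(json_decode($stored));
```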

2. Solving the problem:

Once the cause is known, the fix is straightforward. You can either choose another way to store the data,
or escape "\" as "\\" so that the backslash survives.
Our solutions:

1. Prevent json_encode from converting Chinese to Unicode escapes.
PHP 5.4 added the JSON_UNESCAPED_UNICODE option to json_encode. With this option, Chinese characters are no longer escaped automatically.

$test = json_encode("深圳", JSON_UNESCAPED_UNICODE);

2. Combine urlencode, json_encode, and urldecode so that Chinese characters are not converted to Unicode escapes.

$test = urldecode(json_encode(array('brief' => urlencode('简介'), 'title' => urlencode('标题'))));

3. Escape "\" as "\\" so that the "\" in front of each escaped Chinese character is not removed by MySQL.

$str = json_encode('中文');

$test = addslashes($str); // or: $test = mysql_escape_string($str);

The result can then be inserted into MySQL directly. (mysql_escape_string belongs to the old mysql extension, which was removed in PHP 7.)
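
The three approaches can be compared side by side. This is a minimal sketch, assuming PHP 5.4 or later and a UTF-8 source file; the array keys and values are illustrative only.

```php
<?php
// Assumes this file is saved as UTF-8 and PHP 5.4 or later.
$data = array('brief' => '简介', 'title' => '标题');

// 1. Keep Chinese characters as-is in the JSON text.
$plain = json_encode($data, JSON_UNESCAPED_UNICODE);

// 2. urlencode each value, json_encode, then urldecode the whole string.
$viaUrl = urldecode(json_encode(array_map('urlencode', $data)));

// 3. Keep the \uXXXX escapes but escape the backslashes (and quotes)
//    before the string is placed in an SQL statement.
$escaped = addslashes(json_encode($data));

echo $plain, "\n";    // {"brief":"简介","title":"标题"}
echo $viaUrl, "\n";   // same text as $plain for this input
echo $escaped, "\n";  // every backslash and double quote is prefixed with a backslash
```

One caveat with approach 2: if a value contains a double quote or a backslash, urldecode puts it back unescaped inside the JSON text, so the result is no longer valid JSON. Approach 1 does not have this problem.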

I. PHP page encoding

1. The encoding of the PHP file should match the encoding of the web page.

A. If you want to use gb2312 encoding, PHP needs to output the header: header("Content-Type: text/html; charset=gb2312"), and static pages should add <meta http-equiv="Content-Type" content="text/html; charset=gb2312">. All files must be saved in ANSI encoding: open each file in Notepad, choose Save As, select ANSI, and overwrite the source file.

B. If you want to use UTF-8 encoding, PHP needs to output the header: header("Content-Type: text/html; charset=utf-8"), and static pages should add <meta http-equiv="Content-Type" content="text/html; charset=utf-8">. All files must be saved in UTF-8 encoding. Saving as UTF-8 can be a little troublesome, because editors commonly write a BOM at the beginning of a UTF-8 file, and the BOM can cause problems when sessions are used. In EditPlus, go to Tools -> Preferences -> Files -> UTF-8 signature, select "Always remove", and then save the file to strip the BOM.

2. PHP's string functions are not Unicode-aware; they operate on bytes. Functions such as substr must be replaced with mb_substr (which requires the mbstring extension), or the text must be transcoded with iconv.
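
Both points, the BOM and byte-based string functions, can be illustrated in a few lines. This is a minimal sketch, assuming a UTF-8 source file and a loaded mbstring extension; strip_utf8_bom is a hypothetical helper name, not a built-in function.

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.

// Hypothetical helper: remove a leading UTF-8 BOM (bytes EF BB BF) if present.
function strip_utf8_bom($text)
{
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        return substr($text, 3);
    }
    return $text;
}

$text = strip_utf8_bom("\xEF\xBB\xBF" . '中文字符串');

echo strlen($text), "\n";                    // 15 (bytes)
echo mb_strlen($text, 'UTF-8'), "\n";        // 5 (characters)
echo mb_substr($text, 0, 2, 'UTF-8'), "\n";  // 中文

// substr() counts bytes, so taking 2 bytes cuts the first character in half
// and leaves an invalid UTF-8 sequence.
var_dump(mb_check_encoding(substr($text, 0, 2), 'UTF-8'));  // bool(false)
```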

II. Data interaction between PHP and MySQL

The encoding used by PHP and by the database should be consistent.

1. Modify the MySQL configuration file my.ini or my.cnf. It is best to use utf8 encoding for MySQL.

[mysql]
default-character-set = utf8
[mysqld]
default-character-set = utf8
default-storage-engine = MyISAM
Add the following under [mysqld]:
default-collation = utf8_bin
init_connect = 'SET NAMES utf8'

Note that newer MySQL versions (5.5 and later) no longer accept default-character-set and default-collation under [mysqld]; the equivalent server options there are character-set-server and collation-server.

2. Before any database operation in the PHP program, call mysql_query("set names 'encoding'"), where the encoding matches the encoding of the PHP page. If the PHP page is gb2312, the MySQL encoding is gb2312; if it is UTF-8, the MySQL encoding is utf8. This way no garbled characters appear when data is inserted or retrieved.
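
One way to keep the page encoding and the connection encoding in step is to derive the SET NAMES statement from the page charset. The following is a rough sketch; mysql_charset_for and set_names_sql are hypothetical helper names, and the mapping covers only the encodings this article discusses.

```php
<?php
// Hypothetical helper: map the charset declared by the page to the
// character set name MySQL expects in SET NAMES.
function mysql_charset_for($pageCharset)
{
    $map = array(
        'utf-8'  => 'utf8',
        'utf8'   => 'utf8',
        'gb2312' => 'gb2312',
        'gbk'    => 'gbk',
    );
    $key = strtolower(trim($pageCharset));
    if (!isset($map[$key])) {
        throw new InvalidArgumentException("Unsupported charset: $pageCharset");
    }
    return $map[$key];
}

// Hypothetical helper: build the statement to run right after connecting.
function set_names_sql($pageCharset)
{
    return 'SET NAMES ' . mysql_charset_for($pageCharset);
}

echo set_names_sql('UTF-8'), "\n";   // SET NAMES utf8
echo set_names_sql('GB2312'), "\n";  // SET NAMES gb2312
```

The mysql_query function used in this article belongs to the old mysql extension, which was removed in PHP 7. With mysqli, the same effect is obtained by passing the mapped name to $mysqli->set_charset().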

III. PHP and the operating system

Windows and Linux use different encodings. On Windows, calling certain PHP functions with UTF-8 encoded arguments causes errors. Examples are move_uploaded_file(), filesize(), and readfile(), which are often used to handle uploads and downloads. The following errors may occur when they are called:

Warning: move_uploaded_file() [function.move-uploaded-file]: failed to open stream: Invalid argument in ...

Warning: move_uploaded_file() [function.move-uploaded-file]: Unable to move '' to '' in ...

Warning: filesize() [function.filesize]: stat failed for ... in ...

Warning: readfile() [function.readfile]: failed to open stream: Invalid argument in ...

Although using gb2312 encoding on Linux does not produce these errors, the stored file name is garbled and the file cannot be read afterwards. In both cases, first convert the argument to the encoding the operating system recognizes, using mb_convert_encoding(string, new encoding, original encoding) or iconv(original encoding, new encoding, string). The stored file name then contains no garbled characters, and files with Chinese names can be read, uploaded, and downloaded normally.
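
The conversion step might look like the following. This is a minimal sketch, assuming a UTF-8 source file and a loaded mbstring extension; CP936 (the Windows code page for GBK) is used as the target only as an example, and the right target depends on the operating system.

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.
// CP936 is the Windows code page for GBK; treating it as the file system
// encoding is an assumption made for this example.
$name = '中文.txt';

// mb_convert_encoding(string, new encoding, original encoding)
$forFilesystem = mb_convert_encoding($name, 'CP936', 'UTF-8');
$backToUtf8    = mb_convert_encoding($forFilesystem, 'UTF-8', 'CP936');

echo strlen($name), "\n";           // 10 bytes in UTF-8
echo strlen($forFilesystem), "\n";  // 8 bytes in CP936
var_dump($backToUtf8 === $name);    // bool(true)

// iconv takes its arguments in a different order:
// iconv(original encoding, new encoding, string)
```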

A better solution is to decouple from the system entirely so that its encoding no longer matters. Generate a sequence of letters and digits only as the stored file name, and keep the original Chinese name in the database. Calling move_uploaded_file() then causes no problems, and at download time you only need to send the original Chinese name as the file name. The download code is as follows:

header("Pragma: public");
header("Expires: 0");
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
header("Content-type: $file_type");
header("Content-Length: $file_size");
header("Content-Disposition: attachment; filename=\"$file_name\"");
header("Content-Transfer-Encoding: binary");
readfile($file_path);

$file_type is the file type, $file_size is the file size in bytes, $file_name is the original name, and $file_path is the path of the file stored on the server.
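
The idea of storing files under a letters-and-digits name could be sketched as follows. The helper names make_stored_name and content_disposition are hypothetical, random_bytes assumes PHP 7 or later, and the filename* form from RFC 6266 is a substitute for the plain filename form used in the code above, not something the article prescribes.

```php
<?php
// Assumes PHP 7 or later (random_bytes) and a UTF-8 source file.

// Hypothetical helper: build an ASCII-only name for the file on disk,
// keeping a sanitized lowercase extension from the original name.
function make_stored_name($originalName)
{
    $dot = strrpos($originalName, '.');
    $ext = $dot === false ? '' : strtolower(substr($originalName, $dot + 1));
    $ext = preg_replace('/[^a-z0-9]/', '', $ext);
    $base = bin2hex(random_bytes(16));  // 32 hexadecimal digits
    return $ext === '' ? $base : $base . '.' . $ext;
}

// Hypothetical helper: header value carrying the original UTF-8 name,
// percent-encoded in the filename* form described by RFC 6266.
function content_disposition($originalName)
{
    return "attachment; filename*=UTF-8''" . rawurlencode($originalName);
}

$original = '中文.TXT';
$stored = make_stored_name($original);  // random, so it differs on every run
echo $stored, "\n";
echo 'Content-Disposition: ' . content_disposition($original), "\n";
```

The stored name goes to move_uploaded_file() and the original name goes to the database; at download time the original name is read back and passed to the header.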

IV. Summary: why garbled text occurs

In general, garbled text has two possible causes. The first is an incorrect encoding (charset) setting, which makes the browser decode the page with the wrong encoding and fill the screen with unreadable "tianshu" (gibberish). The second is that a file was opened with the wrong encoding and then saved, for example a text file originally encoded as GB2312 that was opened as UTF-8 and saved. To solve these problems, you first need to know which stages of development involve encoding:

1. File encoding: the encoding in which the page file (.html, .php, etc.) itself is saved. Notepad and Dreamweaver automatically detect the file encoding when opening a page, so problems are rare. Zend Studio does not detect the encoding automatically; it opens every file with the encoding set in its preferences. If you are not careful and open a file with the wrong encoding, the text becomes garbled as soon as you save your changes (I have learned this the hard way).

2. Page-declared encoding: in the HEAD of the HTML code, you can use a <meta> charset declaration to tell the browser which encoding the page uses. Chinese websites currently mainly use two encodings, GB2312 and UTF-8.

3. Database connection encoding: the encoding used to transmit data to and from the database during database operations. Do not confuse it with the encoding of the database itself. For example, MySQL versions before 8.0 use latin1 by default, which means MySQL stores data in latin1, and data sent to MySQL in another encoding is converted to latin1.

Once we know where encoding is involved in web development, the cause of garbled text is clear: these three encoding settings are inconsistent. Because most encodings are compatible with ASCII, English letters and symbols are unaffected, while Chinese characters are not so lucky.
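
What an encoding mismatch does to Chinese text can be seen directly. This is a minimal sketch, assuming a UTF-8 source file and a loaded mbstring extension; ISO-8859-1 stands in for latin1 here, which is an approximation.

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.
$original = '中文';

// Treat the UTF-8 bytes as if they were ISO-8859-1 and convert them
// "to UTF-8" again: every byte becomes a character of its own.
$garbled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

echo mb_strlen($original, 'UTF-8'), "\n";  // 2 characters
echo mb_strlen($garbled, 'UTF-8'), "\n";   // 6 characters, one per original byte

// ASCII text passes through the same mismatch unchanged.
var_dump(mb_convert_encoding('abc', 'UTF-8', 'ISO-8859-1') === 'abc');  // bool(true)

// ISO-8859-1 maps every byte, so in this case the damage can be undone.
$repaired = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');
var_dump($repaired === $original);  // bool(true)
```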

V. Common errors in practice and their solutions:

1. The database uses UTF8 encoding while the page-declared encoding is GB2312. This is the most common cause of garbled text. In this case, data returned by SELECT in the PHP script is garbled. Before querying, call mysql_query("SET NAMES GBK") to set the MySQL connection encoding, and make sure the page-declared encoding is consistent with the connection encoding set here (GBK is an extension of GB2312). If the page is UTF-8 encoded, use: mysql_query("SET NAMES UTF8");
Note that the name is UTF8, not the usual UTF-8. If the encoding declared by the page is the same as the internal encoding of the database, the connection encoding usually does not need to be set.

Note: MySQL data input and output are actually more complex than described above. The MySQL configuration file my.ini defines two default encodings: default-character-set in [client] and default-character-set in [mysqld], which set the default encoding for client connections and for the database respectively. The encoding specified above with SET NAMES is actually the character_set_client setting of the connection (SET NAMES also sets character_set_connection and character_set_results). It tells the MySQL server which encoding the data received from the client is in, instead of relying on the default encoding.

2. The page-declared encoding is inconsistent with the encoding of the file itself. This rarely happens, because the author would see garbled text in the browser while building the page. More often it is caused by fixing minor bugs after release: the page is opened with the wrong encoding and then saved. It can also happen when files are edited online directly through FTP software such as CuteFTP, where an incorrect encoding setting in the software causes a conversion error.

3. Some people who rent a virtual host have confirmed that all three encodings above are set correctly and still see garbled text. For example, the page is encoded as GB2312, but IE and other browsers always detect it as UTF-8 even though the page HEAD declares GB2312; after manually switching the browser encoding to GB2312 the page displays correctly. The cause is that Apache sets a global default encoding for the server: AddDefaultCharset UTF-8 has been added to httpd.conf. The server then sends an HTTP header to the browser first, which takes priority over the encoding declared in the page, so the browser naturally detects the wrong encoding. There are two solutions: ask the administrator to add AddDefaultCharset GB2312 to the virtual host's section of the configuration file to override the global setting, or set it yourself in the .htaccess file of your own directory.
