Chinese character encoding in PHP programs often troubles people who have not dealt with it before. The cause is actually simple. Each country (or region) has defined its own character encoding set for computer information interchange, such as Extended ASCII in the United States, GB2312-80 in China, and JIS in Japan. As the basis of information processing in that country or region, the character encoding set plays an important role in unifying encodings. By length, character encoding sets are divided into SBCS (single-byte character sets) and DBCS (double-byte character sets).
Early software (operating systems especially) handled local characters by shipping separate localized versions (L10N), and concepts such as LANG and codepage were introduced to tell them apart. However, because the code ranges of the local character sets overlap, exchanging information between them is difficult, and maintaining each localized version independently is costly. It therefore became necessary to extract what the localization work had in common and handle it uniformly, so that locale-specific processing is kept to a minimum. This is called internationalization (I18N). Language-specific information was further standardized as locale information, and the underlying character set for processing became Unicode, which covers almost all characters.
Most internationalized software now does its core character processing in Unicode. At run time it determines the local character encoding from the locale/LANG/codepage settings and handles local characters accordingly. During processing, text has to be converted between Unicode and the local character set, or even between two different local character sets with Unicode as the intermediate form. The same approach extends to network environments: character data on either side of the connection must be converted into a form the other side accepts, based on each side's character set settings.
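The conversion between a local character set and Unicode described above can be sketched in PHP as follows. This is a minimal illustration, assuming the mbstring extension is loaded; the sample bytes are written as escapes so that the result does not depend on the encoding of the source file.

```php
<?php
// "中文" as GBK bytes: each Chinese character takes two bytes.
$gbk = "\xD6\xD0\xCE\xC4";

// Local character set (GBK) -> Unicode (UTF-8).
$utf8 = mb_convert_encoding($gbk, 'UTF-8', 'GBK');

// Unicode (UTF-8) -> local character set (GBK) again.
$roundTrip = mb_convert_encoding($utf8, 'GBK', 'UTF-8');

echo bin2hex($gbk), "\n";       // the GBK byte values
echo bin2hex($utf8), "\n";      // the UTF-8 byte values
var_dump($roundTrip === $gbk);  // whether the round trip preserved the bytes
```

The same characters have different byte values in each encoding, which is why both sides of an exchange must agree on which one is in use.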
Character set encoding problems in the database
Popular relational database systems support a database character set: the character set can be specified when the database is created, and the data is then stored in that encoding. When an application accesses the data, a character set conversion takes place both on the way in and on the way out. For Chinese data, the database character set must be chosen so that the data remains intact. GB2312, GBK, UTF-8 and so on are all valid choices. We could also choose ISO-8859-1 (8-bit), but then the program has to split each 16-bit Chinese or Unicode character into two 8-bit characters before writing the data, recombine the two bytes after reading it, and also tell SBCS characters apart from these pairs. Doing so not only fails to use the database's own character set support but also makes programming more complex, so we do not recommend ISO-8859-1 as the database character set. When programming, you can use the management tools provided by the database management system to check whether the Chinese data is stored correctly.
Before a PHP program queries the database, it should first execute mysql_query("SET NAMES xxxx"); where xxxx is the encoding of your page (charset=xxxx). If the page uses charset=utf8, then xxxx=utf8; if the page uses charset=gb2312, then xxxx=gb2312. Almost every web program keeps its common database connection code in one file, so adding mysql_query("SET NAMES xxxx") to that file is enough. (The mysql_* functions were removed in PHP 7.0; with the mysqli extension the equivalent call is mysqli_set_charset().)
SET NAMES tells the server which character set the client uses for the SQL statements it sends. The statement SET NAMES 'utf8' therefore tells the server that "the data coming from this client is in character set utf8" (MySQL spells the name without a hyphen; 'utf-8' is not accepted). It also specifies the character set for the results the server sends back to the client (for example, in a SELECT statement it determines which character set the column values are returned in).
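On current PHP versions the same idea can be sketched with mysqli. The helper names mysqlCharsetForPage() and connectWithCharset(), and the host, user, password and database values, are hypothetical placeholders for illustration. Mapping a UTF-8 page to utf8mb4 rather than utf8 is also an assumption made here, because MySQL's utf8 stores at most three bytes per character.

```php
<?php
// Hypothetical helper: map a page charset to a MySQL character set name.
function mysqlCharsetForPage(string $pageCharset): string
{
    $map = [
        'utf-8'  => 'utf8mb4',  // assumption: prefer full 4-byte UTF-8
        'utf8'   => 'utf8mb4',
        'gb2312' => 'gb2312',
        'gbk'    => 'gbk',
    ];
    $key = strtolower(trim($pageCharset));
    if (!isset($map[$key])) {
        throw new InvalidArgumentException("Unsupported page charset: $pageCharset");
    }
    return $map[$key];
}

// Hypothetical shared connection code; all credentials are placeholders.
function connectWithCharset(string $pageCharset): mysqli
{
    $db = new mysqli('localhost', 'user', 'password', 'example_db');
    // set_charset() sets the connection character set, as SET NAMES does,
    // and also makes the client library aware of it.
    $db->set_charset(mysqlCharsetForPage($pageCharset));
    return $db;
}
```

Calling connectWithCharset('utf-8') from the shared connection file would then cover every query made through that connection.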
Common tips for locating problems
When locating a Chinese encoding problem, the crudest but most effective method is to print the internal code (the raw byte values) of a string right after the code you suspect. By printing the internal code you can find out when Chinese characters are converted to Unicode, when Unicode is converted back to the Chinese internal code, when one Chinese character has turned into two Unicode characters, when a Chinese string has been turned into a string of question marks, and when the high bytes of a Chinese string have been truncated.
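Printing the internal code could look like the sketch below. dumpBytes() is a hypothetical helper name, not a built-in; it relies only on the standard bin2hex(), str_split() and implode() functions.

```php
<?php
// Hypothetical debugging helper: return a string's raw bytes as spaced hex pairs.
function dumpBytes(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

$utf8 = "\xE4\xB8\xAD";  // "中" encoded as UTF-8
$gbk  = "\xD6\xD0";      // "中" encoded as GBK
$lost = "?";             // what a failed conversion typically leaves behind

echo dumpBytes($utf8), "\n";
echo dumpBytes($gbk), "\n";
echo dumpBytes($lost), "\n";
```

Three bytes per Chinese character suggests UTF-8, two bytes suggests GB2312 or GBK, and a lone 3f means the character was already replaced by a question mark before this point.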
Taking the appropriate sample string also helps to differentiate between types of problems. such as: "AA ah AA @aa" and other Chinese and English, GB, GBK character strings. In general, no matter how the English characters are converted or processed, it will not be distorted (if encountered, you can try to increase the length of consecutive English letters). 1