PHP Programming in UTF-8

Source: Internet
Author: User

Preface:

To be honest,
Sandal also thinks UTF-8 is a good thing ......
After all, being able to show Chinese, Japanese, and Korean on the same screen is very attractive to East Asian users ......
(And that is not the only benefit ......)
It is not just web programs ......
Many applications use Unicode internally ......
The purpose is obvious: support for displaying multiple languages ......
Microsoft's software is built on a Unicode core ......
That is why Japanese software displays correctly on your Chinese Windows XP ......
On Chinese Windows 98, with its GB-encoding core, installing software in other languages produces garbled characters ......

As for UTF-8,
it is one of the encoding forms of Unicode.
It stores a common Chinese character in three bytes ......
(UTF-16 uses two bytes for the same character, and UTF-32 uses four)
Application software has all moved to Unicode ......
So why can't web programs use UTF-8 too?
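To make the byte counts concrete, here is a small sketch (assuming PHP 7 or later for the \u{...} string escapes) that prints how many bytes strlen() reports for single characters of different kinds. The sample characters are arbitrary illustrations; note that rare Chinese characters outside the Basic Multilingual Plane take four bytes, not three.

```php
<?php
// Bytes needed per character in UTF-8 (PHP's strlen() counts bytes, not characters).
$samples = [
    'A'       => 'A',          // ASCII: 1 byte
    'U+00E9'  => "\u{E9}",     // é, Latin-1 range: 2 bytes
    'U+4E2D'  => "\u{4E2D}",   // 中, common CJK ideograph: 3 bytes
    'U+20000' => "\u{20000}",  // CJK Extension B ideograph: 4 bytes
];
foreach ($samples as $label => $char) {
    printf("%-8s %d byte(s): %s\n", $label, strlen($char), bin2hex($char));
}
```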

In this article, I will do my best to introduce PHP programming with UTF-8 encoding ......
Why single out "Chinese"? ......
First, because for most of us, switching to UTF-8 is really about Chinese ......
unless you are building a multilingual system ......
(A complaint: developers abroad rarely pay attention to this issue when writing programs ......)
Second, UTF-8 handles Chinese, Japanese, Korean, and other multi-byte languages in much the same way ......
so what works for Chinese carries over to the others ......
All right ...... let's start with the database ......


==========================================================
Connecting to the database

Many people find their data garbled after upgrading to MySQL 4.1 ......
The reason is that MySQL has supported character sets since 4.1 ......
Every server, database, table, and connection now has a character set (latin1 unless the server is configured otherwise; many installations use utf8) ......
(Which goes to show how much following international standards matters ...... hey hey ......)
In the past, most of us stored UTF-8 or GBK data without telling MySQL what it was ......
so when MySQL now interprets it with the wrong character set, the output is garbled ......
To fix the garbled characters ......
you have to tell MySQL which encoding your program sends and expects ......

Suppose your existing data is UTF-8 encoded ......
Then run the following statement before your queries:

mysql_query('SET CHARACTER SET utf8') or die('Query failed: ' . mysql_error());
Of course, this statement is only needed on MySQL 4.1 and later,
so we can add a version check:

$ Mysqlversion = $ db-> query_first ("select version () AS version ");
If ($ mysqlversion ['version']> = '4. 1 ')
{
Mysql_query ('set character set utf8') or die ("Query failed:". mysql_error ());
}
This way, MySQL returns your data correctly whatever the server's default character set is ......
(whatever environment your site runs in, it should just work ......)
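The mysql_* functions used above were removed in PHP 7.0. As a hedged sketch of the same idea for newer PHP, the version check can be wrapped in a small function and paired with mysqli, whose set_charset() method tells the server which character set the client uses. The helper name supports_charsets() is hypothetical, and the connection details in the comment are placeholders.

```php
<?php
// Hypothetical helper: does this MySQL server version know about character sets?
// version_compare() copes with "4.1.22-log", "5.0.51a" or "10.4.12-MariaDB",
// whereas a plain string comparison ('10.4.12-MariaDB' >= '4.1') is false.
function supports_charsets(string $serverVersion): bool
{
    return version_compare($serverVersion, '4.1', '>=');
}

// Sketch of use with mysqli (not executed here; credentials are placeholders):
//   $link = new mysqli('localhost', 'username', 'password', 'database');
//   if (supports_charsets($link->server_info)) {
//       $link->set_charset('utf8');
//   }
```

On MySQL 5.5.3 and later, utf8mb4 is the character set that covers all of UTF-8, including four-byte characters; there, utf8 means a three-byte subset.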

But now that everything is going international ......
are you still using a legacy encoding such as GBK?
How do you convert it?
And ......
what if garbled characters appear while upgrading the data?
Cool!
Find out in the next section ......

==========================================================
Upgrading data to 4.1

Upgrading ......
Exporting ......
Speaking of which, the foreign developers really were irresponsible ......
The old export method always lost some Chinese characters ......
For example, "I love you" would come out as "I love" ......
(usually the last character of a value is the one that goes missing)
and the whole thing is ruined ......
(In the words of Sister Pomegranate: "Such a difficult move is really exciting" ......)
To protect your fragile heart ......
and to uphold traditional Chinese virtues ......
you can change the fields that contain Chinese text to binary types ......
Specifically ......
run the following statement:

ALTER TABLE `table_name` CONVERT TO CHARACTER SET binary;
This converts the character columns
CHAR, VARCHAR, and TEXT
into
BINARY, VARBINARY, and BLOB respectively.
(CONVERT TO CHARACTER SET is available from MySQL 4.1.2 on.)
Then export the data and import it into the 4.1 environment ......
Finally comes the tedious part:
you have to change the column types back ......
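Changing the types back means one ALTER TABLE per column. As a sketch, a hypothetical helper can generate those statements from the binary type each column ended up with. It assumes you know each column's original type (a column that was a genuine BLOB before the conversion should stay a BLOB) and that utf8 is the target character set. MySQL does not convert the stored bytes when a column goes from a binary type to a text type, so the UTF-8 data is simply relabelled.

```php
<?php
// Hypothetical helper: build the ALTER TABLE statement that turns one binary
// column back into the matching text type with the given character set.
function restore_text_column(string $table, string $column, string $binaryType, string $charset = 'utf8'): string
{
    // Binary string type => matching non-binary (text) type.
    $map = [
        'BINARY'     => 'CHAR',
        'VARBINARY'  => 'VARCHAR',
        'TINYBLOB'   => 'TINYTEXT',
        'BLOB'       => 'TEXT',
        'MEDIUMBLOB' => 'MEDIUMTEXT',
        'LONGBLOB'   => 'LONGTEXT',
    ];
    // Split e.g. "VARBINARY(255)" into the base name and an optional "(255)".
    if (!preg_match('/^([A-Z]+)(\(\d+\))?$/i', trim($binaryType), $m)) {
        throw new InvalidArgumentException("Unrecognised type: $binaryType");
    }
    $base = strtoupper($m[1]);
    if (!isset($map[$base])) {
        throw new InvalidArgumentException("Not a binary string type: $binaryType");
    }
    $length = $m[2] ?? ''; // group 2 is absent when there is no length
    return sprintf(
        'ALTER TABLE `%s` MODIFY `%s` %s%s CHARACTER SET %s;',
        $table, $column, $map[$base], $length, $charset
    );
}
```

MODIFY redefines the whole column, so attributes such as NOT NULL or DEFAULT must be appended to the generated statement if the column had them.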

Where there are upgrades to 4.1 ......
there are, of course, downgrades too ......
How do you downgrade?
Sandal needs a bathroom break first ......
please turn to the next page ......

==========================================================
Downgrading data from 4.1

Some people find that SQL files exported from 4.1 cannot be imported into older versions ......
The fix is actually very simple ......
MySQL has already thought of this for us ......
just add the --compatible option when exporting ......
Suppose your database is UTF-8 encoded ......
and the target server is version 4.0 ......
Then use this command line:

shell> mysqldump --user=username --password=password --compatible=mysql40 --default-character-set=utf8 database > db.sql
The exported SQL file can then be imported into the older database without trouble ......

That takes care of the database ......
But what should we watch out for in the PHP code?
Turn to the next page ......
==========================================================
PHP file encoding

Do all PHP files have to be converted to UTF-8?
Sandal says NO ......

Put simply ......
if a file contains Chinese characters that will be output ......
it should be converted to UTF-8 ......
For example:

// 我是凉鞋 ("I am a sandal")
echo time();
Although the code above contains Chinese characters ......
they are only in a comment ......
and are never output ......
so this file does not need to be converted to UTF-8 ......

Another example:

echo "我是凉鞋"; // "I am a sandal"
This obviously outputs Chinese characters ......
so you had better convert the file to UTF-8 ......

Of course, many programs now use templates (language packs) ......
the program files themselves (everything except the language packs) contain no text to output ......
so only the language pack files need to be converted to UTF-8 ......
(That is one advantage of language packs ...... haha ......)
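Before and after converting files, it helps to know which ones are already UTF-8. One way to check, assuming only PHP's bundled PCRE, is that preg_match() fails on invalid UTF-8 when the /u modifier is used. The sketch also flags a UTF-8 byte order mark: some editors add one when saving as UTF-8, and because it sits in front of the opening PHP tag it is sent as output, which can break later header() or session_start() calls. The function name utf8_status() and the lang/ directory in the usage comment are hypothetical.

```php
<?php
// Hypothetical helper: classify a file's bytes as 'utf8', 'utf8-bom' or 'not-utf8'.
function utf8_status(string $bytes): string
{
    // With the /u modifier, preg_match() returns false for invalid UTF-8.
    if (preg_match('//u', $bytes) !== 1) {
        return 'not-utf8';
    }
    // EF BB BF is the UTF-8 byte order mark.
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return 'utf8-bom';
    }
    return 'utf8';
}

// Usage sketch: foreach (glob('lang/*.php') as $f) echo $f, ': ', utf8_status(file_get_contents($f)), "\n";
```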
==========================================================

Truncating Chinese text in UTF-8

Because UTF-8 uses three bytes for a Chinese character ......
the traditional substr function, which counts bytes, can cut a character in half ......
Many experts have written UTF-8-aware truncation functions ......
Here are a few of them:
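A quick way to see the problem, as a sketch assuming PHP 7 or later and the bundled PCRE extension: cut a two-character Chinese string after four bytes and inspect what is left.

```php
<?php
// Why byte-based substr() is unsafe for UTF-8 text.
$text = "\u{4E2D}\u{6587}";          // "中文": two characters, six bytes

$cut = substr($text, 0, 4);          // four bytes: all of 中 plus the first byte of 文
echo strlen($text), "\n";            // 6
echo bin2hex($cut), "\n";            // e4b8ade6
// preg_match() with /u returns false: the dangling byte makes the string invalid UTF-8.
var_dump(preg_match('//u', $cut));   // bool(false)
```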

1. Count first, then take

/**
 * Author: Dummy | Zandy
 * Email: lianxiwoo@gmail.com | hotmail.com
 * Create: 200512
 * Usage: echo join('', Utf8String::subString_UTF8('中文', 0, 1));
 */
ini_set('display_errors', 1);
error_reporting(E_ALL ^ E_NOTICE);
// "String" is a reserved word since PHP 7, so the class is renamed here.
class Utf8String {
    // Returns an array of up to $length characters, starting at character $start.
    static function subString_UTF8($str, $start, $length)
    {
        $len = strlen($str);
        $r = array();
        $n = 0; // characters skipped so far
        $m = 0; // characters collected so far
        for ($i = 0; $i < $len; $i++) {
            // The current byte as an 8-digit binary string, e.g. "11100100".
            $a = substr('00000000' . decbin(ord($str[$i])), -8);
            if ($n < $start) {
                // Still skipping: jump over the continuation bytes of this character.
                if (substr($a, 0, 3) === '110') {
                    $i += 1;
                } elseif (substr($a, 0, 4) === '1110') {
                    $i += 2;
                } elseif (substr($a, 0, 5) === '11110') {
                    $i += 3;
                }
                $n++;
            } else {
                if (substr($a, 0, 1) === '0') {           // 1-byte (ASCII) character
                    $r[] = substr($str, $i, 1);
                } elseif (substr($a, 0, 3) === '110') {   // 2-byte character
                    $r[] = substr($str, $i, 2);
                    $i += 1;
                } elseif (substr($a, 0, 4) === '1110') {  // 3-byte character (most Chinese)
                    $r[] = substr($str, $i, 3);
                    $i += 2;
                } elseif (substr($a, 0, 5) === '11110') { // 4-byte character
                    $r[] = substr($str, $i, 4);
                    $i += 3;
                } else {                                   // stray continuation byte
                    $r[] = '';
                }
                if (++$m >= $length) {
                    break;
                }
            }
        }
        return $r;
    } // End subString_UTF8
} // End Utf8String
echo join('', Utf8String::subString_UTF8('中文', 0, 1)); // prints 中
2. Cut first, then repair
Sandal finds this approach clever ......
Cut the string with the traditional byte-based substr ......
then make sure no half character is left at the end ......
Note that the third argument to substr should be greater than 3 ......
because utf8_trim always removes the last character, whole or broken, so a cut of 3 bytes or fewer can leave a Chinese string empty ......

// A trim function to remove the last character of a UTF-8 string
// By following instructions on http://en.wikipedia.org/wiki/UTF-8
// Dotann
// Usage: $str = utf8_trim(substr($str, 0, 50));
function utf8_trim($str) {
    // Walk backwards from the last byte, skipping continuation bytes (10xxxxxx).
    for ($i = strlen($str) - 1; $i >= 0; $i--) {
        $ch = ord($str[$i]);
        // An ASCII byte (0xxxxxxx) or a lead byte (11xxxxxx) starts the last
        // character: cut the string just before it.
        if (($ch & 128) == 0 || ($ch & 192) == 192) {
            return substr($str, 0, $i);
        }
    }
    // Only continuation bytes were found: there is no character to keep.
    return '';
}
$str = '汉字';
echo utf8_trim(substr($str, 0, 4)); // prints 汉
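Because utf8_trim() always drops the last character, a variant that removes only an incomplete trailing character avoids the "greater than 3" restriction. The sketch below is not Dotann's function, and utf8_cut_bytes() is a hypothetical name. It walks back to the lead byte of the last character and keeps that character only if all the bytes announced by its lead byte are present.

```php
<?php
// Variant sketch: cut to at most $maxBytes bytes, dropping only a broken last character.
function utf8_cut_bytes(string $str, int $maxBytes): string
{
    $cut = substr($str, 0, $maxBytes);
    $len = strlen($cut);
    // A UTF-8 character is at most 4 bytes, so look at the last 4 bytes at most.
    for ($i = $len - 1; $i >= 0 && $i >= $len - 4; $i--) {
        $ch = ord($cut[$i]);
        if (($ch & 0xC0) === 0x80) {
            continue; // continuation byte (10xxxxxx): keep looking for the lead byte
        }
        // Number of bytes the lead byte says this character needs.
        if ($ch < 0x80) {
            $need = 1;
        } elseif (($ch & 0xE0) === 0xC0) {
            $need = 2;
        } elseif (($ch & 0xF0) === 0xE0) {
            $need = 3;
        } else {
            $need = 4;
        }
        // Keep the last character only if all of its bytes made it into $cut.
        return ($len - $i >= $need) ? $cut : substr($cut, 0, $i);
    }
    // Empty input, or no lead byte found (the input was not valid UTF-8).
    return $cut;
}

echo utf8_cut_bytes('汉字', 3), "\n"; // prints 汉 (utf8_trim would return '')
```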
3. There are other methods too,
such as the function 007pig wrote for the Chinese edition of vBulletin ......
short and elegant ......
but its source code can't be published ......
sorry ......
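For comparison, there are also approaches that do not walk the bytes by hand. If the mbstring extension is installed, mb_substr($str, $start, $length, 'UTF-8') counts characters instead of bytes. Without it, PCRE's /u mode can split a string into whole characters, as in this sketch (utf8_substr_pcre() is a hypothetical name and is not 007pig's function).

```php
<?php
// Sketch: character-based substring using PCRE's UTF-8 mode.
// Returns '' when $str is not valid UTF-8.
function utf8_substr_pcre(string $str, int $start, int $length): string
{
    // '/./su' matches one complete UTF-8 character at a time ('s' so "\n" counts too).
    if (preg_match_all('/./su', $str, $m) === false) {
        return ''; // invalid UTF-8 input
    }
    return implode('', array_slice($m[0], $start, $length));
}

echo utf8_substr_pcre("Hi\u{4E2D}\u{6587}!", 1, 3), "\n"; // prints i中文
```

Splitting into an array costs memory proportional to the string, so for long texts mb_substr() is the better choice where it is available.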

That's all for today ......
Transcoding and a few other topics are not covered yet ......
I've been busy lately ......
I'll continue when I have time ......
http://www.quchao.com/?p=6&pp=1
