PHP programming with Chinese text under UTF8

Source: Internet
Author: User

[Abstract] In this article, Sandals tries to introduce PHP programming under UTF8 encoding as fully as possible. Why single out "Chinese"? For one thing, English really does not need to worry about UTF8 ... unless you are building a multi-language system (a complaint: foreign developers write their programs without paying attention to this problem). For another, multibyte languages such as Chinese, Japanese and Korean are in fact all handled the same way under UTF8 ...
To tell the truth, I also think UTF8 is a good thing. After all, being able to show Chinese, Japanese and Korean on the same screen is no small attraction (and that is not the only advantage ...). It is not just web pages either: the core of many applications is moving to Unicode, and the benefit is obvious: support for multi-language display ...
Microsoft's software now uses Unicode internally ...
so Japanese software displays correctly on your Chinese XP ...
while on Chinese Windows 98, with its GB-based core, installing software in other languages produces garbled text ...

As for UTF8,
it is one of the encodings of Unicode.
It stores a common Chinese character in three bytes ...
(UTF-32, by comparison, uses four bytes for every character)
Application software is going over to Unicode en masse ...
so why shouldn't we use UTF8 for web apps?

In this article Sandals will try to introduce PHP programming under UTF8 as fully as possible ...
As for why "Chinese" is singled out ...
one reason is that English really does not need to worry about UTF8 ...
unless you are building a multi-language system ...
(I do want to complain: foreign developers write their programs without paying attention to this problem ...)
The other is that multibyte languages such as Chinese, Japanese and Korean are in fact handled the same way under UTF8 ...
so you can simply follow the same pattern ...
OK ... let's start with the database part ...


==========================================
Connecting to a database

Many people who have just upgraded to MySQL 4.1 find that their data comes out garbled ...
That is because MySQL has supported character sets since 4.1 ...
and the default character set may well be UTF8, depending on how the server was built (a stock build defaults to latin1) ...
(Full proof of the importance of international standards ... heh ...)
In the past, most of us stored our data in UTF8 or GBK ...
so after the upgrade the output is of course garbled ...
To fix the garbling ...
you have to state which encoding the program expects its data in ...

Let's assume that your existing database was UTF8 encoded ...
Then add one statement before your queries:

mysql_query('SET CHARACTER SET utf8') or die('Query failed: ' . mysql_error());
Of course, only 4.1 and later need this,
so we can add a version check:

$mysqlversion = $db->query_first("SELECT VERSION() AS version");
// version_compare() compares the numeric parts; a plain string >= does not
if (version_compare($mysqlversion['version'], '4.1', '>='))
{
    mysql_query('SET CHARACTER SET utf8') or die('Query failed: ' . mysql_error());
}
That way the data is read correctly whatever MySQL's default encoding is ...
(Whatever script your text is written in, it is not a problem ...)
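
On the version check: a plain >= between version strings compares them as text, which happens to work for 4.x and 5.x but not for a two-digit major version. A minimal sketch of a sturdier check, assuming PHP's built-in version_compare(); the helper name server_supports_charsets and the sample version strings are made up for illustration:

```php
<?php
// Hypothetical helper: true when the server version string is 4.1 or later
function server_supports_charsets($version)
{
    return version_compare($version, '4.1', '>=');
}

var_dump('10.4.3' >= '4.1');                 // bool(false): compared as text, "1" sorts before "4"
var_dump(server_supports_charsets('10.4.3')); // bool(true)
var_dump(server_supports_charsets('4.0.27')); // bool(false)
var_dump(server_supports_charsets('4.1.22')); // bool(true)
```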

But everyone has gone international now ...
Do you still use UTF8?
How do you convert the encoding?
And also ...
what happens to the data when you upgrade?
Brr!
Stay tuned ...

==============================
Upgrading data to 4.1

To upgrade ...
we first have to export ...
And I must say the foreigners are really irresponsible here ...
The old export method always lost some Chinese characters ...
For example, "I love your Mother" turned into "I love You" ...
(usually it is the last character of a piece of data that gets lost)
(That is a whole generation of difference ...
In the words of Sister Pomegranate: "so outrageous it is downright thrilling" ...)
To protect your fragile heart ...
and to uphold traditional Chinese ethics and morals ...
you can convert the columns that contain Chinese characters to binary types ...
Concretely ...
you can run this statement:

ALTER TABLE `table_name` CONVERT TO CHARACTER SET binary;
This way the character columns, namely
CHAR, VARCHAR and TEXT,
are converted to
BINARY, VARBINARY and BLOB.
Then export and import into the 4.1 environment ...
The last, tedious task is of course:
you need to change their types back ...
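
Changing the types back is done per column with ALTER TABLE ... MODIFY. A rough sketch that only builds the statements, assuming the data really is UTF8; the helper name, table name and column list are hypothetical, and any NOT NULL or DEFAULT attributes would have to be restated by hand:

```php
<?php
// Hypothetical helper: build the statements that turn binary columns back into text columns.
// $columns maps column name => the text type the column had before, e.g. 'VARCHAR(255)'.
function build_restore_statements($table, array $columns, $charset = 'utf8')
{
    $sql = array();
    foreach ($columns as $name => $type) {
        $sql[] = "ALTER TABLE `$table` MODIFY `$name` $type CHARACTER SET $charset;";
    }
    return $sql;
}

// The table and column names below are made up for illustration.
$columns = array('title' => 'VARCHAR(255)', 'body' => 'TEXT');
foreach (build_restore_statements('post', $columns) as $statement) {
    echo $statement, "\n";
}
```

Printing the statements first, rather than running them directly, leaves room to review them against the original table definition.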

Where there are upgrades to 4.1 ...
there are of course downgrades too ...
How do you downgrade???
Sandals is off to the toilet ...
so please turn the page ...

=========================
Downgrading data from 4.1

Some people find that an SQL file exported from 4.1 cannot be imported into an older version ...
The problem is actually very simple ...
and MySQL has already thought of everything for us ...
Just add the --compatible option when exporting ...
Let's assume that your database is UTF8 encoded ...
and the target database version is 4.0 ...
Then the command line is:

shell> mysqldump --user=username --password=password --compatible=mysql40 --default-character-set=utf8 database > db.sql
An SQL file exported this way can be imported into the older database without trouble ...
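
If the export is driven from a PHP script, the same command line can be assembled with the arguments quoted for the shell. This is only a sketch: the helper name build_dump_command is hypothetical, the values are the placeholders from the command above, and newer mysqldump releases may no longer accept mysql40 as a --compatible value:

```php
<?php
// Hypothetical helper: assemble the mysqldump command line shown above.
function build_dump_command($user, $password, $database, $outFile, $charset = 'utf8')
{
    return 'mysqldump'
        . ' --user=' . escapeshellarg($user)
        . ' --password=' . escapeshellarg($password)
        . ' --compatible=mysql40'
        . ' --default-character-set=' . escapeshellarg($charset)
        . ' ' . escapeshellarg($database)
        . ' > ' . escapeshellarg($outFile);
}

// Prints the command instead of running it, so it can be checked first.
echo build_dump_command('username', 'password', 'database', 'db.sql'), "\n";
```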

That's the database part done ...
But what should you watch out for in PHP programming?
You'll have to turn the page again ...
=============================================
PHP file Encoding

Do all PHP files have to be converted to UTF8 encoding?
Sandals tells you: NO ...

Let's put it this way ...
If a file contains Chinese characters that need to be displayed ...
it should be converted to UTF8 ...
For example:

// 我是草鞋 ("I am Sandals", written in Chinese)
echo time();
The code above does contain Chinese characters ...
but because they sit in a comment ...
they are never output ...
so this page does not have to be converted to UTF8 ...

Another example:

echo "我是草鞋"; // outputs "I am Sandals" in Chinese
This one obviously outputs Chinese characters ...
so you have to convert it to UTF8 ...

Of course, many programs now use templates (language packs) ...
The program files themselves (the non-language-pack files) contain no characters destined for output ...
so we only need to convert the language pack files to UTF8 ...
(That's the advantage of language packs. Ha ha ...)
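
To check whether a language pack file really is UTF8, one option is PCRE's u modifier, which rejects malformed UTF8. A small sketch; the function name is_valid_utf8 is hypothetical, and the byte strings stand in for file contents you would read with file_get_contents():

```php
<?php
// Hypothetical helper: true when $bytes is well-formed UTF-8
function is_valid_utf8($bytes)
{
    // With the u modifier, preg_match() returns false instead of 1 on malformed UTF-8
    return preg_match('//u', $bytes) === 1;
}

$utf8 = "\xE6\xB1\x89"; // the character 汉 encoded as UTF-8
$gbk  = "\xBA\xBA";     // the same character encoded as GBK

var_dump(is_valid_utf8($utf8)); // bool(true)
var_dump(is_valid_utf8($gbk));  // bool(false)
```

A passing check only says the bytes are well-formed UTF8, so it is a quick filter rather than proof that the file was saved in the right encoding.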
==================================================

UTF8 Chinese interception

In UTF8 a Chinese character occupies three bytes ...
so the traditional substr function, which counts bytes, is no longer up to the job ...
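
To see the problem concretely, here is a short sketch. The string is written as explicit bytes so the result does not depend on how the file is saved, and the helper name utf8_char_count is made up for illustration:

```php
<?php
// "汉字" written as explicit UTF-8 bytes (two characters, three bytes each)
$str = "\xE6\xB1\x89\xE5\xAD\x97";

// Hypothetical helper: count characters by counting the bytes that are
// not continuation bytes (continuation bytes look like 10xxxxxx)
function utf8_char_count($s)
{
    $count = 0;
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        if ((ord($s[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}

echo strlen($str), "\n";                // 6: strlen counts bytes
echo utf8_char_count($str), "\n";       // 2: the number of characters
echo bin2hex(substr($str, 0, 4)), "\n"; // e6b189e5: one character plus a stray lead byte
```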
Plenty of experts have written character-truncation functions for UTF8 ...
Here are a few:

1. Count first, then take

/**
 * Author: dummy | Zandy
 * Email: lianxiwoo@gmail.com | hotmail.com
 * Create: 200512
 * Usage: echo join('', StringUtil::substring_utf8('汉字', 0, 1));
 */
ini_set('display_errors', 1);
error_reporting(E_ALL ^ E_NOTICE);
// The class was originally named String, which PHP 7 and later reserve; hence StringUtil.
class StringUtil {
    // Returns an array of up to $lenth characters, starting at character number $start.
    public static function substring_utf8($str, $start, $lenth)
    {
        $len = strlen($str);
        $r = array();
        $n = 0; // characters skipped so far
        $m = 0; // characters collected so far
        for ($i = 0; $i < $len; $i++) {
            $x = substr($str, $i, 1);
            // The leading byte as an 8-digit binary string
            $a = base_convert(ord($x), 10, 2);
            $a = substr('00000000' . $a, -8);
            if ($n < $start) {
                // Still before $start: step over the rest of this character
                if (substr($a, 0, 1) == 0) {
                } elseif (substr($a, 0, 3) == 110) {
                    $i += 1;
                } elseif (substr($a, 0, 4) == 1110) {
                    $i += 2;
                }
                $n++;
            } else {
                if (substr($a, 0, 1) == 0) {
                    $r[] = substr($str, $i, 1);
                } elseif (substr($a, 0, 3) == 110) {
                    $r[] = substr($str, $i, 2);
                    $i += 1;
                } elseif (substr($a, 0, 4) == 1110) {
                    $r[] = substr($str, $i, 3);
                    $i += 2;
                } else {
                    $r[] = '';
                }
                if (++$m >= $lenth) {
                    break;
                }
            }
        }
        return $r;
    } // End substring_utf8
} // End StringUtil
echo join('', StringUtil::substring_utf8('汉字', 0, 1));
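
For comparison, the same count-then-take idea can be written more briefly with PCRE's u modifier. This is a different technique from the class above and not the original author's code; the function name utf8_substr_chars is hypothetical:

```php
<?php
// Hypothetical helper: take $length characters starting at character number $start
function utf8_substr_chars($str, $start, $length)
{
    // With the u modifier the empty pattern matches between characters, not between bytes
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return ''; // the input was not valid UTF-8
    }
    return implode('', array_slice($chars, $start, $length));
}

$str = "\xE6\xB1\x89\xE5\xAD\x97"; // "汉字" as explicit UTF-8 bytes
echo bin2hex(utf8_substr_chars($str, 0, 1)), "\n"; // e6b189: the bytes of the first character
```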
2. Cut first, then take
Sandals finds this approach very clever ...
First truncate with the traditional function ...
then check whether a Chinese character has been split in the middle ...
If it has ... deal with it ...
Note in particular that the third parameter of the substr function must be greater than 3 ...
Sandals doesn't need to explain why, does he?

// A trim function to remove the last character of a utf-8 string
// by following instructions on http://en.wikipedia.org/wiki/UTF-8
// dotann
// usage: $str = utf8_trim(substr($str, 0, 50));
function utf8_trim($str) {
    $hex = '';
    $len = strlen($str);
    for ($i = strlen($str) - 1; $i >= 0; $i -= 1) {
        $hex .= ' ' . ord($str[$i]);
        $ch = ord($str[$i]);
        // An ASCII byte or a lead byte marks the start of the last character: cut there
        if (($ch & 128) == 0) return (substr($str, 0, $i));
        if (($ch & 192) == 192) return (substr($str, 0, $i));
    }
    return ($str . $hex);
}
$str = '汉字';
// Length 4 rather than 3, so that one whole character survives the trim
echo utf8_trim(substr($str, 0, 4));
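
Note that utf8_trim always removes the final character, whether it was split or not, which is why the substr length has to exceed three bytes. A variant that only steps back when the cut actually lands inside a character might look like the sketch below. It is not dotann's code, the name utf8_cut_bytes is hypothetical, and it assumes the input is valid UTF8:

```php
<?php
// Hypothetical helper: keep at most $maxBytes bytes without splitting a character
function utf8_cut_bytes($str, $maxBytes)
{
    if (strlen($str) <= $maxBytes) {
        return $str;
    }
    $end = $maxBytes;
    // $str[$end] is the first byte left out. If it is a continuation byte
    // (10xxxxxx), the cut falls inside a character, so move the cut back.
    while ($end > 0 && (ord($str[$end]) & 0xC0) === 0x80) {
        $end--;
    }
    return substr($str, 0, $end);
}

$str = "\xE6\xB1\x89\xE5\xAD\x97"; // "汉字" as explicit UTF-8 bytes
echo bin2hex(utf8_cut_bytes($str, 4)), "\n"; // e6b189: the split second character is dropped
echo bin2hex(utf8_cut_bytes($str, 3)), "\n"; // e6b189: a clean cut is left alone
```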
3. There are other ways too
For example, the function 007pig wrote for our Chinese edition of vBulletin ...
Short and neat ...
But the source is not convenient to release ...
Sorry ...


