First, understanding the concepts:
UTF-8 emerged as a compact transformation format for Unicode/UCS. To quote the first sentence of the official website: "UTF-8 stands for Unicode Transformation Format-8. It is an octet (8-bit) lossless encoding of Unicode characters." Because UTF also applies to UCS code points, it can also be called "UCS Transformation Format (UTF)".
UTF-8 uses 8 bits (1 byte) as its basic code unit. There are also encodings based on 16-bit and 32-bit units, called UTF-16 and UTF-32 respectively, but they are used less often for interchange, while UTF-8 is widely used for file storage and network transmission.
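As a rough illustration of the code-unit sizes described above, the following Python sketch counts the bytes a character occupies in UTF-8, UTF-16 and UTF-32. The sample characters and the helper name encoded_lengths are assumptions chosen for illustration; they do not come from the original text.

```python
# Sample characters are illustrative assumptions, not from the original text.
SAMPLES = ["A", "é", "中"]

def encoded_lengths(ch):
    """Return the number of bytes ch occupies in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The -le variants are used so that no byte-order mark is prepended.
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }

for ch in SAMPLES:
    print(ch, encoded_lengths(ch))
```

UTF-8 grows from one to three bytes across these samples, while UTF-32 always spends four bytes per character.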
--------------------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------
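GBK is a double-byte encoding for Chinese text: each Chinese character (and each full-width English letter or symbol) is represented by two bytes, while plain ASCII English characters remain one byte. To distinguish the two, the highest bit of the first byte of every double-byte character is set to 1.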
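To make the role of the highest bit concrete, here is a small Python sketch that prints the GBK bytes of one ASCII letter and one Chinese character and checks the top bit of each byte. The helper names gbk_bytes and high_bit_set are hypothetical, and the sample characters are assumptions.

```python
def gbk_bytes(text):
    """Return the GBK encoding of text as a list of byte values."""
    return list(text.encode("gbk"))

def high_bit_set(byte):
    """Return True if the most significant bit of the byte is 1."""
    return byte & 0x80 != 0

for ch in ["A", "中"]:
    data = gbk_bytes(ch)
    print(ch, [hex(b) for b in data], [high_bit_set(b) for b in data])
```

A decoder can therefore look at the first byte alone to decide whether it is reading a one-byte ASCII character or the start of a two-byte character.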
UTF-8 is a multibyte encoding designed to handle international characters. It uses 8 bits (one byte) for each English character and 24 bits (three bytes) for most Chinese characters. For forums whose content is mostly English, UTF-8 therefore costs little or no extra space.
GBK covers the Chinese characters in common use, while
UTF-8 can encode the characters needed by all countries in the world.
GBK is an extension of the national standard GB2312 and remains compatible with it.
UTF-8 encoded text can be displayed in the browsers of any country, as long as they support the UTF-8 character set.
For example, a UTF-8 page can display Chinese in a foreign user's English-language IE without requiring them to download IE's Chinese language support package.
For a mostly English forum the difference is small: ASCII characters take one byte in both GBK and UTF-8, and only Chinese characters and full-width forms take two bytes in GBK.
For Chinese text, however, the UTF-8 version, while having good international compatibility, requires about 50% more database storage space than the GBK/Big5 version (three bytes per character instead of two). It is therefore recommended only for users with special requirements for international compatibility.
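The storage figures above can be checked with a short Python sketch. The function name storage_cost and the sample strings are assumptions made for illustration only.

```python
def storage_cost(text):
    """Return (gbk_bytes, utf8_bytes, overhead_percent) for text."""
    gbk = len(text.encode("gbk"))
    utf8 = len(text.encode("utf-8"))
    # Extra space UTF-8 needs, relative to the GBK size.
    overhead = (utf8 - gbk) * 100 / gbk
    return gbk, utf8, overhead

for sample in ["hello forum", "中文论坛", "PHP 中文"]:
    print(repr(sample), storage_cost(sample))
```

Pure ASCII text costs the same in both encodings, pure Chinese text costs 50% more in UTF-8, and mixed text falls somewhere in between.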
1. Use vi /etc/httpd/conf/httpd.conf to set the encoding in Apache as follows (remember to restart):
AddDefaultCharset UTF-8
2. Use vi /etc/php.ini to set the encoding in PHP as follows (remember to restart):
default_charset = "utf-8"
3. Use vi /etc/my.cnf to set the encoding in MySQL as follows (remember to restart):
[mysqld]
init_connect='SET NAMES utf8'
# On MySQL 5.5 and later, use character-set-server=utf8 in this section instead.
default-character-set=utf8
[client]
default-character-set=utf8
4. Select the encoding when creating the database (remember to clear the DB cache):
DROP DATABASE IF EXISTS `AA`;
CREATE DATABASE `AA` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
USE `AA`;
CREATE TABLE IF NOT EXISTS `AAT` (
`id` char(1) NOT NULL default '1',
`mystr` varchar($) default NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
5. Convert all ANSI-formatted PHP documents to UTF-8 format using UltraEdit (version v11.20a):
File -> Conversions -> ASCII to UTF-8 (Unicode Editing). (In UltraEdit, go to Advanced -> Configuration -> File Handling -> Unicode/UTF-8 Detection and tick "Auto detect UTF-8 files".)
Run remove bom.php if necessary. When a PHP document is converted from ANSI to UTF-8 with Windows Notepad, a BOM is written at the start of the file.
This BOM can cause layout problems, or a blank page with no PHP error message, so it needs to be removed. Running remove bom.php removes it automatically.
remove bom.php can be downloaded from the following URL:
http://www.hoyo.idv.tw/hoyoweb/document/view.php?sid=13&author=hoyo&status=view
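The contents of remove bom.php are not shown here, so the following Python sketch is only an assumption about what such a tool does: it checks whether a file starts with the three-byte UTF-8 BOM (EF BB BF) and rewrites the file without it. The function names are hypothetical.

```python
import codecs

def strip_utf8_bom(data):
    """Return data without a leading UTF-8 byte-order mark (EF BB BF)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def strip_bom_in_file(path):
    """Rewrite the file at path without a BOM; return True if one was removed."""
    with open(path, "rb") as fh:
        data = fh.read()
    cleaned = strip_utf8_bom(data)
    if cleaned != data:
        with open(path, "wb") as fh:
            fh.write(cleaned)
        return True
    return False
```

Calling strip_bom_in_file on each PHP file under the document root, for example while walking the tree with os.walk, would cover a whole site. Files without a BOM are left untouched.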
6. In the PHP documents, you must include:
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head><body>
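To show what a client reads from the Content-Type value declared above, here is a minimal Python sketch that extracts the charset parameter using the standard library's email.message module. The helper name charset_of is hypothetical, and real browsers apply further detection rules that this sketch does not attempt to model.

```python
from email.message import Message

def charset_of(content_type):
    """Extract the charset parameter from a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset in lower case, or None.
    return msg.get_content_charset()

print(charset_of("text/html; charset=utf-8"))
```

When the charset parameter is missing the function returns None, which corresponds to the case where the browser has to guess the encoding.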
7. You must add 3 mysql_query lines in the document that connects to the DB:
Add the following 3 lines before the real query fetches data from the DB. (The mysql_* functions were removed in PHP 7; with the mysqli extension, mysqli_set_charset() serves the same purpose.)
mysql_query("SET NAMES 'utf8'");
mysql_query("SET character_set_client=utf8");
mysql_query("SET character_set_results=utf8");
$sql = "select * from AAT where crid='1'";
$rows = mysql_query($sql);
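The reason the connection character set matters can be sketched without a database. The Python example below assumes, for illustration, that UTF-8 bytes are read back as latin-1 (a common default on older MySQL installations). The function name misread and the sample string are hypothetical.

```python
def misread(text, stored_as="utf-8", read_as="latin-1"):
    """Encode text in one charset and decode the bytes in another."""
    return text.encode(stored_as).decode(read_as)

original = "中文"
garbled = misread(original)
print(garbled != original)  # the mismatched read does not give back the text

# Reversing the mismatch recovers the text, since latin-1 maps every byte.
recovered = garbled.encode("latin-1").decode("utf-8")
print(recovered == original)
```

Setting the client, connection and result character sets to utf8 avoids this mismatch in the first place, so no repair step is needed.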
8. In the PHP documents, note the following if necessary: [Optional]
When using htmlentities and htmlspecialchars, write them as follows:
$chars = htmlentities($chars, ENT_QUOTES, "UTF-8");
$chars = htmlspecialchars($chars, ENT_QUOTES, "UTF-8");
and before display, use:
$chars = html_entity_decode($chars, ENT_QUOTES, "UTF-8");
If you have used addslashes() or mysql_real_escape_string(), remember to use the following:
$chars = stripslashes($chars);
If necessary, you can use the following function to convert the encoding:
$chars = iconv('Big5', 'UTF-8', $chars); // converts from Big5 to UTF-8
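For readers who want to try the conversion and escaping steps outside PHP, here is a rough Python analogue. It is a sketch of the same ideas, not a drop-in equivalent: html.escape produces slightly different entities from htmlspecialchars with ENT_QUOTES, and the function names big5_to_utf8 and escape_for_html are hypothetical.

```python
import html

def big5_to_utf8(data):
    """Convert Big5-encoded bytes to UTF-8-encoded bytes."""
    return data.decode("big5").encode("utf-8")

def escape_for_html(text):
    """Escape &, <, > and both quote characters for safe HTML output."""
    return html.escape(text, quote=True)

big5_bytes = "中文".encode("big5")
print(big5_to_utf8(big5_bytes) == "中文".encode("utf-8"))
print(escape_for_html('<a href="x">'))
```

html.unescape reverses the escaping, much as html_entity_decode does on the PHP side.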
<Excerpted from: http://www.cnblogs.com/cy163/archive/2007/05/31/766886.html & http://blog.sina.com.cn/s/blog_6dd65c6f01019b37.html & http://www.verydemo.com/demo_c116_i116823.html>
Method for fully adopting UTF-8 across a website