Summary of methods for converting a site's encoding from GB2312 to UTF-8 (front end, program, database)

Source: Internet
Author: User

To internationalize a website, you need to convert its encoding from GB2312 to UTF-8. There are many pitfalls along the way, and if the conversion is not thorough, all kinds of encoding problems will appear.
There are five main areas:
I. UTF-8 encoding problems in HTML pages
II. UTF-8 encoding problems in PHP pages
III. Using UTF-8 encoding in a MySQL database
IV. UTF-8 encoding problems related to JS
V. UTF-8 encoding problems related to Flash

I. UTF-8 encoding problems in HTML pages

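1. Add the line <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> right after <head> and before <title>. Otherwise, if the title contains Chinese characters, it may be displayed garbled.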
2. HTML file encoding:
Click the editor's menu "File" -> "Save As" to see the file's current encoding. Make sure the file is saved as UTF-8; if it is ANSI, change the encoding to UTF-8.
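The "Save As" dialog is the simplest check. As a rough scripted alternative, the sketch below reads a file and reports whether its bytes are valid UTF-8 and whether it starts with a BOM. `check_utf8_file` is a hypothetical name, and the result is only a heuristic: a pure-ASCII file is also valid UTF-8, and some short GB2312 byte sequences happen to be valid UTF-8 as well.

```php
<?php
// Hypothetical helper: report whether a file's bytes are valid UTF-8 and
// whether the file starts with a UTF-8 BOM (EF BB BF). Heuristic only: it
// cannot prove which encoding the editor used when saving the file.
function check_utf8_file(string $path): array
{
    $data = file_get_contents($path);
    if ($data === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return [
        // preg_match with the /u modifier fails (returns false) on invalid UTF-8
        'valid_utf8' => preg_match('//u', $data) === 1,
        'has_bom'    => strncmp($data, "\xEF\xBB\xBF", 3) === 0,
    ];
}
```

A file saved as GB2312 that contains Chinese text will usually report `valid_utf8` as false.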
3. BOM at the head of HTML files:
When you convert a file from another encoding to UTF-8, some editors add a BOM (byte order mark) at the beginning of the file, and the BOM can cause the browser to display Chinese text as garbled characters.
To remove the BOM:
1. Open the file in Dreamweaver and save it again; this removes the BOM.
2. Open the file in EditPlus, set "Preferences" -> "File" -> "UTF-8 signature" to "Always remove signature", then save the file; this removes the BOM.
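The BOM is simply the three bytes EF BB BF at the start of the file, so it can also be stripped with a few lines of PHP. A minimal sketch (`strip_utf8_bom` is a hypothetical name):

```php
<?php
// Hypothetical helper: remove a leading UTF-8 BOM (bytes EF BB BF) from a
// string, e.g. file contents read with file_get_contents().
function strip_utf8_bom(string $data): string
{
    return strncmp($data, "\xEF\xBB\xBF", 3) === 0 ? substr($data, 3) : $data;
}

// Usage: rewrite a file in place without its BOM (the path is a placeholder)
// $path = 'index.html';
// file_put_contents($path, strip_utf8_bom(file_get_contents($path)));
```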
4. Web server encoding settings:
If Chinese text is still garbled after you have followed the steps above, check your web server's encoding settings.
If you use Apache, set the default character set to UTF-8 in its configuration file, for example with the directive AddDefaultCharset UTF-8.
If you use Nginx, find "charset gb2312;" or a similar line in nginx.conf and change it to "charset utf-8;".

II. UTF-8 encoding problems in PHP pages

1. Add this line at the very beginning of the code:
header("Content-Type: text/html; charset=utf-8");
2. PHP file encoding:
Click the editor's menu "File" -> "Save As" to see the file's current encoding. Make sure the file is saved as UTF-8; if it is ANSI, change the encoding to UTF-8.
3. BOM at the head of PHP files:
A PHP file must not have a BOM; otherwise sessions will not work, and you will see a warning like:
Warning: session_start() [function.session-start]: Cannot send session cache limiter - headers already sent
This happens because session_start() has to send HTTP headers, which is only possible before any output. PHP sends a BOM at the top of the file as output, so the headers can no longer be sent and the call fails.
Therefore the BOM must be removed from every PHP file.
To remove the BOM:
1. Open the file in Dreamweaver and save it again; this removes the BOM.
2. Open the file in EditPlus, set "Preferences" -> "File" -> "UTF-8 signature" to "Always remove signature", then save the file; this removes the BOM.
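On a site with many files, opening each one in an editor is tedious. The sketch below walks a directory tree, lists the .php, .html, .htm and .js files that start with a BOM, and optionally strips it. `find_bom_files` is a hypothetical name and the extension list is an assumption; back up the files before running it with `$strip = true`.

```php
<?php
// Hypothetical helper: scan a directory tree for files that start with the
// UTF-8 BOM (EF BB BF) and, if $strip is true, rewrite them without it.
function find_bom_files(string $dir, bool $strip = false): array
{
    $bom = "\xEF\xBB\xBF";
    $exts = ['php', 'html', 'htm', 'js']; // assumed set of file types to check
    $found = [];
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($it as $file) {
        if (!$file->isFile() || !in_array(strtolower($file->getExtension()), $exts, true)) {
            continue;
        }
        $path = $file->getPathname();
        $data = file_get_contents($path);
        if ($data !== false && strncmp($data, $bom, 3) === 0) {
            $found[] = $path;
            if ($strip) {
                file_put_contents($path, substr($data, 3)); // drop the 3 BOM bytes
            }
        }
    }
    sort($found); // stable, readable order for reporting
    return $found;
}
```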
4. File names when PHP sends a file as a download attachment:
When PHP sends a file as an attachment, the file name must be GB2312-encoded; otherwise a Chinese file name will be displayed garbled.
If your PHP file itself is UTF-8 encoded, convert the file name variable from UTF-8 to GB2312:
$filename = iconv("UTF-8", "GB2312", $filename);
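A minimal sketch of how the converted name is typically used: it goes into the `filename` parameter of the `Content-Disposition` header. `gb2312_attachment_header` is a hypothetical name; the function throws if the name contains characters that GB2312 cannot represent.

```php
<?php
// Hypothetical helper: build a Content-Disposition header whose file name is
// converted from UTF-8 to GB2312, as recommended above.
function gb2312_attachment_header(string $utf8Name): string
{
    $gbName = iconv('UTF-8', 'GB2312', $utf8Name);
    if ($gbName === false) {
        throw new RuntimeException('File name cannot be represented in GB2312');
    }
    return 'Content-Disposition: attachment; filename="' . $gbName . '"';
}

// Usage in a download script, before any output (paths are placeholders):
// header('Content-Type: application/octet-stream');
// header(gb2312_attachment_header('报告.txt'));
// readfile('/path/to/report.txt');
```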
5. Garbled text or "?" when truncating article titles:
Long article titles are often shown only in part, truncated to a fixed length. In UTF-8 a Chinese character takes 3 bytes, so the cut may land after only 1 or 2 bytes of a character; the incomplete character then shows up as garbled text or a "?". Truncating titles with the following function avoids the problem:

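function get_brief_str($str, $max_length)
{
    // Only truncate strings longer than $max_length bytes
    if (strlen($str) > $max_length)
    {
        // Count the bytes >= 0x80 (parts of multi-byte UTF-8 characters)
        // among the first $max_length bytes
        $check_num = 0;
        for ($i = 0; $i < $max_length; $i++)
        {
            if (ord($str[$i]) >= 128)
                $check_num++;
        }
        // Assumes every non-ASCII character is 3 bytes (true for common Chinese
        // characters): the remainder tells how many bytes of the last one were cut
        if ($check_num % 3 == 0)
            $str = substr($str, 0, $max_length) . " ...";
        else if ($check_num % 3 == 1)
            $str = substr($str, 0, $max_length + 2) . " ...";
        else if ($check_num % 3 == 2)
            $str = substr($str, 0, $max_length + 1) . " ...";
    }
    return $str;
}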
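The function above relies on every non-ASCII character being exactly 3 bytes. A variant that works for any valid UTF-8 is to step back from the cut point until it lands on a character boundary (UTF-8 continuation bytes always have the bit pattern 10xxxxxx). This is a different technique from the original, sketched here under the hypothetical name `utf8_truncate`; unlike `get_brief_str`, it never exceeds `$max_bytes`, because it drops the partial character instead of completing it.

```php
<?php
// Hypothetical helper: cut a UTF-8 string to at most $max_bytes bytes without
// splitting a multi-byte character, appending $suffix when something was cut.
function utf8_truncate(string $str, int $max_bytes, string $suffix = ' ...'): string
{
    if (strlen($str) <= $max_bytes) {
        return $str; // nothing to cut
    }
    $end = $max_bytes;
    // Continuation bytes are 0x80-0xBF (10xxxxxx); step back until the byte at
    // $end starts a new character, so the cut falls on a character boundary.
    while ($end > 0 && (ord($str[$end]) & 0xC0) === 0x80) {
        $end--;
    }
    return substr($str, 0, $end) . $suffix;
}
```

For example, `utf8_truncate('中文标题', 4)` returns `'中 ...'`: byte 4 falls inside 文, so the cut moves back to byte 3.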

III. Using UTF-8 encoding in a MySQL database
1. Creating the database and tables with phpMyAdmin
When you create the database, set the collation to "utf8_general_ci", or execute the statement:

CREATE DATABASE `dbname` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

When you create a table, set the collation of fields that store Chinese text to "utf8_general_ci";
for fields that store only English text or numbers, the default is fine.
The corresponding SQL statement, for example:

CREATE TABLE `test` (
  `id` INT NOT NULL,
  `name` VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,  -- length 255 is an assumed value
  PRIMARY KEY (`id`)
);

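2. Reading and writing the database in PHP
After connecting to the database:

$connection = mysql_connect($host_name, $host_user, $host_pass);

add these two lines:

mysql_query("SET CHARACTER SET 'utf8'");  // sets the client and result character sets
mysql_query("SET NAMES 'utf8'");          // sets the client, connection and result character sets

After that you can read and write the MySQL database normally. (SET NAMES alone already covers both reading and writing.) Note that the mysql_* functions were removed in PHP 7.0; with mysqli, call mysqli_set_charset($connection, 'utf8') instead.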
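A minimal mysqli version of the connection step, assuming the mysqli extension is installed. `connect_utf8` is a hypothetical name and the connection parameters are placeholders.

```php
<?php
// Hypothetical helper: open a mysqli connection and switch it to UTF-8.
function connect_utf8(string $host, string $user, string $pass, string $db): mysqli
{
    // On PHP 8.1+ mysqli throws on failure by default; the false checks
    // cover older error-reporting settings.
    $conn = mysqli_connect($host, $user, $pass, $db);
    if ($conn === false) {
        throw new RuntimeException('Connect failed: ' . mysqli_connect_error());
    }
    // Preferred over running SET NAMES by hand: it also tells the client
    // library the charset, which mysqli_real_escape_string() relies on.
    if (!mysqli_set_charset($conn, 'utf8')) {
        throw new RuntimeException('Could not set charset: ' . mysqli_error($conn));
    }
    return $conn;
}

// Usage (placeholder credentials):
// $conn = connect_utf8('localhost', 'user', 'secret', 'dbname');
```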

IV. UTF-8 encoding problems related to JS

1. Garbled Chinese when JS reads cookies
When PHP writes a cookie, its Chinese characters must be escaped; otherwise the Chinese characters that JS reads from the cookie will be garbled.
PHP has no built-in escape function, so we write one:

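function escape($str)
{
    // Split the UTF-8 string into runs of ASCII characters and single
    // non-ASCII characters (the /u modifier makes PCRE read it as UTF-8)
    preg_match_all('/[\x01-\x7f]+|[^\x01-\x7f]/u', $str, $r);
    $ar = $r[0];
    foreach ($ar as $k => $v)
    {
        if (ord($v[0]) < 128)
            $ar[$k] = rawurlencode($v);                               // ASCII: %XX form
        else
            $ar[$k] = "%u" . bin2hex(iconv("UTF-8", "UCS-2BE", $v));  // Chinese: %uXXXX form (BMP characters only)
    }
    return join("", $ar);
}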

When JS reads the cookie, decode it with unescape(); this solves the problem of garbled Chinese in cookies.
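The same format appears in the other direction when JS writes a cookie with escape() and PHP has to read it. A sketch of a PHP-side decoder, under the hypothetical name `js_unescape` (not part of the original method): it turns `%uXXXX` units and `%XX` bytes back into UTF-8. It handles BMP characters only, matching the `escape` function above.

```php
<?php
// Hypothetical helper: decode a string produced by JavaScript's escape(),
// i.e. %XX and %uXXXX sequences, into UTF-8. BMP characters only.
function js_unescape(string $str): string
{
    return preg_replace_callback(
        '/%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})/',
        function (array $m): string {
            if ($m[1] !== '') {
                // %uXXXX: one UCS-2 (big-endian) code unit -> UTF-8
                return iconv('UCS-2BE', 'UTF-8', hex2bin($m[1]));
            }
            // %XX: escape() uses this form only for characters below U+0100,
            // so the byte is a Latin-1 code point
            return iconv('ISO-8859-1', 'UTF-8', hex2bin($m[2]));
        },
        $str
    );
}
```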
2. Encoding of external JS files
When a UTF-8 encoded HTML or PHP page includes an external JS file, the JS file must also be converted to UTF-8; otherwise the include may fail and calls to its functions will have no effect.
Click the editor's menu "File" -> "Save As" to see the file's current encoding. Make sure the file is saved as UTF-8; if it is ANSI, change the encoding to UTF-8.

V. UTF-8 encoding problems related to Flash
Internally, Flash treats all strings as UTF-8 by default.
1. Flash reading ordinary text files (txt, html)
Save the text file with UTF-8 encoding.
Click the editor's menu "File" -> "Save As" to see the file's current encoding. Make sure the file is saved as UTF-8; if it is ANSI, change the encoding to UTF-8.
2. Flash reading XML files
Save the XML file with UTF-8 encoding.
Click the editor's menu "File" -> "Save As" to see the file's current encoding. Make sure the file is saved as UTF-8; if it is ANSI, change the encoding to UTF-8.
On the first line of the XML file, write:

<?xml version="1.0" encoding="utf-8"?>
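When the XML is generated by PHP rather than saved by hand, DOMDocument can write the declaration. A sketch assuming the DOM extension is available; `build_flash_xml` and the `items`/`item` element names are hypothetical. Passing `'utf-8'` to the constructor makes `saveXML()` emit a UTF-8 declaration and write Chinese text as raw UTF-8.

```php
<?php
// Hypothetical helper: build a small XML document for Flash. Declaring the
// document as utf-8 puts the UTF-8 declaration on the first line of output.
function build_flash_xml(array $items): string
{
    $doc = new DOMDocument('1.0', 'utf-8');
    $root = $doc->appendChild($doc->createElement('items'));
    foreach ($items as $text) {
        $item = $doc->createElement('item');
        // createTextNode() escapes &, < and > automatically
        $item->appendChild($doc->createTextNode($text));
        $root->appendChild($item);
    }
    return $doc->saveXML();
}

// Usage: header('Content-Type: text/xml; charset=utf-8');
//        echo build_flash_xml(['标题一', '标题二']);
```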
3. Flash reading data returned by PHP
If the PHP file itself is UTF-8 encoded, you can echo the data directly.
If the PHP file itself is GB2312 encoded, you can re-save it as a UTF-8 file and then echo the data directly.
If the PHP file is GB2312 encoded and you are not allowed to change the file's encoding, convert the string to UTF-8 with:

$new_str = iconv("GB2312", "UTF-8", $str);

and then echo it.
4. Flash reading data from a MySQL database
When Flash reads database data through PHP, the encoding of the PHP file itself does not matter. What matters is the database encoding: if it is GB2312, convert each string to UTF-8 with:

$new_str = iconv("GB2312", "UTF-8", $str);
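A fetched row usually contains several fields, so it is convenient to convert the whole row at once. A sketch under the hypothetical name `gb2312_to_utf8_deep`: it converts every string in a (possibly nested) array and leaves other values untouched.

```php
<?php
// Hypothetical helper: convert every string in a value or nested array, such
// as a row fetched from a GB2312 database, from GB2312 to UTF-8.
function gb2312_to_utf8_deep($value)
{
    if (is_array($value)) {
        foreach ($value as $k => $v) {
            $value[$k] = gb2312_to_utf8_deep($v);
        }
        return $value;
    }
    if (is_string($value)) {
        $converted = iconv('GB2312', 'UTF-8', $value);
        // iconv returns false when the bytes are not valid GB2312
        if ($converted === false) {
            throw new RuntimeException('Invalid GB2312 input');
        }
        return $converted;
    }
    return $value; // ints, floats, null, etc. are returned unchanged
}

// Usage: $row = gb2312_to_utf8_deep($row); echo $row['title'];
```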

5. Flash writing data through PHP
In short, the strings Flash sends are UTF-8. Convert them to the target encoding with iconv before using them (writing a file, writing to the database, displaying them directly, and so on).
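A sketch of that conversion for a GB2312 target, under the hypothetical name `flash_input_to_gb2312`. It first checks that the input really is valid UTF-8, and returns null instead of corrupt data when the input is invalid or contains characters GB2312 cannot represent (for example many traditional Chinese characters).

```php
<?php
// Hypothetical helper: validate a UTF-8 string received from Flash and
// convert it to GB2312 for a GB2312 file or database. Returns null on failure.
function flash_input_to_gb2312(string $utf8): ?string
{
    if (preg_match('//u', $utf8) !== 1) {
        return null; // not valid UTF-8
    }
    // @ hides the notice iconv raises for characters missing from GB2312
    $gb = @iconv('UTF-8', 'GB2312', $utf8);
    return $gb === false ? null : $gb;
}

// Usage: $title = flash_input_to_gb2312($_POST['title'] ?? '');
```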
6. Making Flash use the local encoding (not recommended)
If you want Flash to use the local encoding instead of UTF-8 (for mainland China, the local encoding is GB2312 or GBK), add the following to the ActionScript code:

System.useCodepage = true;  // ActionScript 2; in ActionScript 3 the property is System.useCodePage

After that, Flash treats all text as encoded in the system's local code page (GB2312 on mainland Chinese systems), so all data imported into or exported from Flash must use that encoding.
Because this depends on the local code page, it produces garbled text for users in regions that use traditional Chinese, so it is not recommended.
