Converting a Website from GB2312 to UTF-8 Encoding

Source: Internet
Author: User

Is your existing website encoded in GB2312? Are you tempted by the advantages of UTF-8? This article shows how to convert a website from GB2312 to UTF-8.

GB2312 is an encoding for Simplified Chinese. As a result, when an article or web page contains Traditional Chinese, Japanese, or Korean text, that content may not be encoded correctly.

GBK is a somewhat wider encoding than GB2312. It covers Traditional Chinese characters, but it still cannot represent the scripts of many other languages.

UTF-8 is an encoding that is widely used on web pages. It is an encoding of Unicode, which aims to bring the writing systems of all the world's languages into one unified character set. UTF-8 already covers the major Asian languages, including Simplified and Traditional Chinese, Japanese, and Korean. In a sense, a web page encoded in UTF-8 is "in line with international standards". In addition, many mobile devices use UTF-8, so if you plan to develop a WAP interface and the website data is already in UTF-8, you avoid having to transcode in the WAP interface.

Before transcoding, consider whether the website really needs it. Here are several points for reference:
1. Who the website is aimed at: a small circle of users, mainland China, Hong Kong, Macao and Taiwan, all of China, or the whole world.

2. In GB2312 a Chinese character occupies 2 bytes, while in UTF-8 it occupies 3 bytes. Decide whether this increase in storage is worth it.

3. Older database systems (such as MySQL 4.0 and earlier) may have no built-in support for UTF-8. This article offers a workaround, but some small problems may remain.

4. Whether the web files will still be convenient to edit once they are encoded in UTF-8. I currently use ZDE4, which supports UTF-8 very well once it is configured: open Tools -> Preferences, select the Editing tab, and change Encoding to UTF-8.
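To get a feel for the storage cost mentioned in point 2, the following minimal Python sketch compares the encoded length of the same text in both encodings. The sample strings are arbitrary and chosen only for illustration.

```python
# Compare the storage cost of the same text in GB2312 and UTF-8.
text = "网站编码"  # four Chinese characters, an arbitrary sample

gb_bytes = text.encode("gb2312")
utf8_bytes = text.encode("utf-8")

print(len(gb_bytes))    # 2 bytes per Chinese character
print(len(utf8_bytes))  # 3 bytes per Chinese character

# ASCII text (HTML tags, digits, Latin letters) takes 1 byte per
# character in both encodings, so markup-heavy pages grow less.
ascii_text = "<p>2024</p>"
print(len(ascii_text.encode("gb2312")), len(ascii_text.encode("utf-8")))
```

The growth therefore depends on how much of the stored data is Chinese text rather than markup or numbers.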

Once you have decided to transcode, you can start. This article covers only PHP 4.0 to 5.0 with MySQL 3.23 to 4.0.

First, create a new database, with the corresponding table structure, to hold the transcoded data. If your database system has no built-in support for UTF-8, we recommend changing the CHAR, VARCHAR, and TEXT columns that store Chinese text to BINARY, VARBINARY, and BLOB, although when I tried it without this change it caused no problems.

Run the following command on the operating system command line to export the original database (replace {dbname} with the database name, and replace {path1} with an existing temporary path where the exported data will be stored):
mysqldump --opt --comments=0 -n -t --fields-terminated-by=, --fields-escaped-by= {dbname} -uroot -p --tab={path1}

The root user in the command above can be replaced with another user, provided that user has permission to dump the database. Then use a transcoding tool, such as ConvertZ, to convert all the files under {path1} to UTF-8, with the BOM option disabled. Assume the transcoded files are saved in the path {path2}.
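If no transcoding tool is at hand, the same step can be approximated with a short script. The following Python sketch is an alternative to ConvertZ, not the method the article used. The function name convert_dir is hypothetical, and it assumes the exported .txt files contain only valid GB2312 data (for GBK data, the "gbk" codec would be the one to try).

```python
from pathlib import Path


def convert_dir(src_dir: str, dst_dir: str, pattern: str = "*.txt") -> list[str]:
    """Re-encode every matching file from GB2312 to UTF-8 without a BOM.

    Returns the names of files that could not be decoded, so that they
    can be inspected by hand instead of being silently damaged.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    failed = []
    for path in sorted(src.glob(pattern)):
        raw = path.read_bytes()
        try:
            text = raw.decode("gb2312")  # strict mode: raises on invalid bytes
        except UnicodeDecodeError:
            failed.append(path.name)
            continue
        # The "utf-8" codec writes no BOM ("utf-8-sig" would add one).
        (dst / path.name).write_bytes(text.encode("utf-8"))
    return failed
```

Calling convert_dir("{path1}", "{path2}") with the real paths substituted would write the converted copies and return the list of files that need manual attention.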

Connect to the MySQL server as a user who has permission to run LOAD DATA, select the database you just created with the use command, and then run the following command for each table {table_name}:
LOAD DATA INFILE '{path2}/{table_name}.txt' INTO TABLE {table_name} FIELDS TERMINATED BY ',' ESCAPED BY '';

Tip: When there are many tables, you can write a small program to generate the SQL script.
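Such a generator could look roughly like the Python sketch below. It assumes each table is loaded with a statement of the form LOAD DATA INFILE ... FIELDS TERMINATED BY ',' ESCAPED BY ''. The function name make_load_script, the table names, and the directory are hypothetical placeholders.

```python
def make_load_script(tables, data_dir):
    """Build one LOAD DATA INFILE statement per table name."""
    statements = []
    for table in tables:
        statements.append(
            f"LOAD DATA INFILE '{data_dir}/{table}.txt' "
            f"INTO TABLE {table} "
            "FIELDS TERMINATED BY ',' ESCAPED BY '';"
        )
    return "\n".join(statements) + "\n"


if __name__ == "__main__":
    # Hypothetical table names and path, for illustration only.
    print(make_load_script(["users", "articles"], "/tmp/utf8_dump"), end="")
```

The output can be saved to a file and run from the mysql client after selecting the new database.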

Warnings may appear when you run the command above. Note which rows they refer to, because some of that data may not have been converted correctly; for example, fields may be misaligned.

In my experience, most of these cases are caused by a byte with a value greater than 0x7F at the end of a field. There are usually only a few such rows, so you can fix them by hand.
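One way to locate such rows before loading is to scan the original, not yet converted export for fields that do not decode cleanly. The Python sketch below is only an aid under stated assumptions: the file is comma-delimited GB2312 with one row per line, and the function name find_suspect_rows is hypothetical. Splitting on the comma byte is safe here because both bytes of a valid GB2312 double-byte character are above 0x7F.

```python
def find_suspect_rows(data: bytes, encoding: str = "gb2312") -> list[int]:
    """Return 1-based line numbers whose fields fail strict decoding.

    A field that ends in a lone byte above 0x7F is an incomplete
    double-byte character, so decoding it raises UnicodeDecodeError.
    """
    suspects = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for field in line.split(b","):
            try:
                field.decode(encoding)
            except UnicodeDecodeError:
                suspects.append(lineno)
                break  # one bad field is enough to flag the row
    return suspects
```

The reported line numbers can then be checked and corrected by hand in the exported file.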

The database transcoding is now complete. Cleaning up the original database and the temporary files created during transcoding is not covered here.

Transcoding the web pages: likewise, use a transcoding tool to convert all web page files to UTF-8.

Then open the web page file or template file that contains the page header, find the line that declares the page encoding as gb2312, and replace it with one that declares utf-8.
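As a rough sketch of this replacement, assuming the header declares its encoding with a typical charset=gb2312 attribute (an assumption, since the exact tag depends on your templates), the change could be scripted in Python as follows. The names CHARSET_RE and declare_utf8 are hypothetical.

```python
import re

# Matches "charset=gb2312" with optional spaces and an optional opening
# quote, in any letter case. The matched prefix is kept as group 1.
CHARSET_RE = re.compile(r"(charset\s*=\s*[\"']?)gb2312", re.IGNORECASE)


def declare_utf8(html: str) -> str:
    """Rewrite a gb2312 charset declaration so that it names utf-8."""
    return CHARSET_RE.sub(r"\1utf-8", html)


if __name__ == "__main__":
    # Assumed example of an old-style header line.
    old = '<meta http-equiv="Content-Type" content="text/html; charset=gb2312">'
    print(declare_utf8(old))
```

This only changes the declaration. The file contents still have to be transcoded separately, otherwise the page will declare UTF-8 while containing GB2312 bytes.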

In my experience, if a web page uses a CSS style sheet to control its appearance and a font is set on the body selector, then under the original GB2312 encoding that font setting is inherited by input and textarea elements. After conversion to UTF-8, however, you need to set the font again on input and textarea.


