Batch Transcoding of website files

Source: Internet
Author: User

Because of database requirements, the site's original gb2312 encoding is being changed to UTF-8. This makes data exchange easier and reduces encoding inconsistencies.

Transcoding of the entire website (gb2312 --> UTF-8)

1. Find a batch transcoding tool online

 

Notes:

1. The tool lets you select individual files or whole directories, and the file-type filter can be set to all files, which is convenient but calls for care. Check whether the selection contains files that should not be transcoded, such as files already in a different encoding or images, and do not convert them with the rest.

2. There is no de-duplication, so be sure not to select the same file twice (what happens if you do? I will test this later).
3. If you select "Retain file backup", a matching .bak file is generated for every file. My project is already managed with git, which has its own way of restoring files, so I did not need the backups. Decide whether to keep backups based on your own situation, but be careful either way.
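The conversion such a tool performs can be sketched in PHP. This is only a minimal sketch, not the tool's actual logic: the function name `gbkToUtf8` is hypothetical, and it assumes the iconv extension is available. It leaves input that is already valid UTF-8 untouched, which also guards against converting the same content twice.

```php
<?php
// Hypothetical helper: convert a GB2312/GBK byte string to UTF-8.
// Assumes the iconv extension is loaded (it is in most PHP builds).
function gbkToUtf8(string $bytes): string
{
    // With the /u modifier, preg_match fails on invalid UTF-8, so a match
    // means the text is already UTF-8 (or plain ASCII) and is left alone.
    if (preg_match('//u', $bytes) === 1) {
        return $bytes;
    }
    $converted = iconv('GBK', 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException('Input is neither valid UTF-8 nor valid GBK');
    }
    return $converted;
}

// The two characters of "中文" encoded as GBK are the bytes D6 D0 CE C4.
$utf8 = gbkToUtf8("\xD6\xD0\xCE\xC4");
echo bin2hex($utf8), "\n"; // e4b8ade69687
```

The UTF-8 check is a heuristic: a short GBK sequence can occasionally also be valid UTF-8, so spot-check the results after a batch run.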

 


2. Remove BOM headers
Open a transcoded file in the EditPlus editor. The status bar at the bottom shows the encoding as "UTF-8 +", which means the file contains a BOM header.

What is a BOM? To quote a reference: "In a UTF-8 encoded file, the BOM occupies three bytes (EF BB BF) at the start of the file and indicates that the file is UTF-8 encoded. Many programs now recognize the BOM header, but some do not. PHP, for example, does not recognize it, which is also why errors appear after a UTF-8 file has been edited in Notepad."

As a result, PHP sends the BOM header as page output when it runs the script. Wherever no output may have been sent beforehand, for example before calling session_start(), an error occurs.
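To make the three bytes concrete, here is a small sketch that detects and strips a BOM from a string. The names `UTF8_BOM`, `hasUtf8Bom` and `stripUtf8Bom` are made up for illustration.

```php
<?php
// The UTF-8 BOM is the byte sequence EF BB BF at the very start of a file.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: true when $contents begins with the BOM.
function hasUtf8Bom(string $contents): bool
{
    return strncmp($contents, UTF8_BOM, 3) === 0;
}

// Hypothetical helper: return $contents without a leading BOM.
function stripUtf8Bom(string $contents): string
{
    return hasUtf8Bom($contents) ? substr($contents, 3) : $contents;
}

// A script saved with a BOM: the three bytes sit in front of the opening tag,
// so PHP treats them as ordinary page output.
$withBom = UTF8_BOM . "<?php session_start();";
echo bin2hex(substr($withBom, 0, 3)), "\n"; // efbbbf
var_dump(hasUtf8Bom($withBom));               // bool(true)
var_dump(hasUtf8Bom(stripUtf8Bom($withBom))); // bool(false)
```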

For a single file, open it in EditPlus and save it as "UTF-8" without the "+" (that is, without a BOM).

For a large number of files, some netizens have shared a script that removes the BOM header in batches quickly and accurately (I could not find the original author; thanks to whoever shared it). Create a PHP file in the root directory of the transcoded files, paste in the following code, and open the file's address in a browser to run it:

<?php
// Recursively scan a directory and strip the UTF-8 BOM from every file that has one.
if (isset($_GET['dir'])) { // directory to scan, taken from the URL
    $basedir = $_GET['dir'];
} else {
    $basedir = '.';
}
$auto = 1; // 1: remove the BOM when found; any other value: only report it
checkdir($basedir);

function checkdir($basedir) {
    if ($dh = opendir($basedir)) {
        while (($file = readdir($dh)) !== false) {
            if ($file != '.' && $file != '..') {
                if (!is_dir($basedir . "/" . $file)) {
                    echo "filename: $basedir/$file " . checkBOM("$basedir/$file") . "<br>";
                } else {
                    $dirname = $basedir . "/" . $file;
                    checkdir($dirname);
                }
            }
        }
        closedir($dh);
    }
}

function checkBOM($filename) {
    global $auto;
    $contents = file_get_contents($filename);
    $charset[1] = substr($contents, 0, 1);
    $charset[2] = substr($contents, 1, 1);
    $charset[3] = substr($contents, 2, 1);
    // 239 187 191 (EF BB BF) are the three bytes of the UTF-8 BOM
    if (ord($charset[1]) == 239 && ord($charset[2]) == 187 && ord($charset[3]) == 191) {
        if ($auto == 1) {
            $rest = substr($contents, 3);
            rewrite($filename, $rest);
            return ("<font color=red>BOM found, automatically removed. _<a href=http://www.k686.com>http://www.k686.com</a></font>");
        } else {
            return ("<font color=red>BOM found.</font>");
        }
    }
    else return ("BOM Not Found.");
}

// Overwrite the file with the BOM-free contents under an exclusive lock.
function rewrite($filename, $data) {
    $filenum = fopen($filename, "w");
    flock($filenum, LOCK_EX);
    fwrite($filenum, $data);
    fclose($filenum);
}
?>
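
The shared script takes its directory from the URL and walks every file it finds, including anything inside .git. A command-line variant might look like the sketch below. It is an illustration under assumptions, not the original author's method: the function name `removeBoms`, the script name `remove_bom.php` and the extension list are all hypothetical.

```php
<?php
// Hypothetical CLI variant: strip UTF-8 BOMs under $root, touching only
// files whose extension is in $extensions and skipping any .git directory.
function removeBoms(string $root, array $extensions = ['php', 'htm', 'html', 'js', 'css']): array
{
    $changed = [];
    $dirs = new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS);
    // Prune .git so repository internals are never rewritten.
    $filter = new RecursiveCallbackFilterIterator($dirs, function (SplFileInfo $item): bool {
        return $item->getFilename() !== '.git';
    });
    foreach (new RecursiveIteratorIterator($filter) as $file) {
        if (!$file->isFile() || !in_array(strtolower($file->getExtension()), $extensions, true)) {
            continue;
        }
        $path = $file->getPathname();
        $contents = file_get_contents($path);
        if ($contents !== false && strncmp($contents, "\xEF\xBB\xBF", 3) === 0) {
            file_put_contents($path, substr($contents, 3), LOCK_EX);
            $changed[] = $path;
        }
    }
    return $changed;
}

// Usage from a shell: php remove_bom.php /path/to/site
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    foreach (removeBoms($argv[1]) as $path) {
        echo "BOM removed: $path\n";
    }
}
```

Running from the command line avoids exposing a file-rewriting script through a URL, and the extension filter keeps images and other binary files out of the run.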

3. Use Zend Studio's powerful batch search and replace to change the gb2312 encoding declared in the .htm files to UTF-8

Note: First check that the newly created Zend project displays the .htm files correctly. If the text is garbled, check whether the project's encoding for .htm files is set to UTF-8. Then select the project and run a global search (Ctrl + H) to replace "charset=gb2312" with "charset=UTF-8" in batches.

Note: Some files brought in from outside the project must still be declared as gb2312, so exclude these exceptions rather than replacing them with the rest. The replacement applies only to the files transcoded in this round.

In addition, the declaration may be written with spaces, as in "charset = gb2312", or without them, so search for every variant to make sure none slips through.
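
One way to cover all those spellings at once is a regular expression. The sketch below is an assumption about how this could be scripted, not a Zend Studio feature, and the function name `replaceCharsetDeclaration` is hypothetical.

```php
<?php
// Hypothetical helper: rewrite charset declarations that name gb2312 or gbk,
// tolerating optional spaces around "=", an optional quote, and any letter case.
function replaceCharsetDeclaration(string $html): string
{
    return preg_replace(
        '/charset\s*=\s*(["\']?)\s*(?:gb2312|gbk)\b/i',
        'charset=${1}UTF-8',
        $html
    );
}

$samples = [
    '<meta http-equiv="Content-Type" content="text/html; charset=gb2312">',
    '<meta content="text/html; charset = GB2312">',
    '<meta charset="gbk">',
];
foreach ($samples as $sample) {
    echo replaceCharsetDeclaration($sample), "\n";
}
```

The replacement also normalizes the spacing around "=". Apply it only to the files that were actually transcoded, since files that must stay gb2312 have to keep their declaration.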

4. Finally, focus on gb2312 (or gbk) in the PHP files. Use the surrounding logic to decide whether each occurrence needs to be replaced. Again, search for every spelling, such as utf8, UTF-8, gbk, and gb2312.
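
Because each occurrence in PHP code needs a human decision, a script can at most collect the candidates. A minimal sketch, with the hypothetical function name `findEncodingMentions`, that lists the lines mentioning any of these spellings:

```php
<?php
// Hypothetical helper: list the lines of a PHP source string that mention an
// encoding name, so each hit can be reviewed by hand before it is changed.
function findEncodingMentions(string $source): array
{
    $hits = [];
    foreach (preg_split('/\R/', $source) as $index => $line) {
        // Covers gb2312, gbk, utf8 and utf-8 in any letter case.
        if (preg_match('/gb2312|gbk|utf-?8/i', $line) === 1) {
            $hits[$index + 1] = trim($line); // key is the 1-based line number
        }
    }
    return $hits;
}

$code = "<?php\n\$s = iconv('GBK', 'utf-8', \$s);\nheader('Content-Type: text/html; charset=gb2312');\necho 'done';\n";
print_r(findEncodingMentions($code));
```

For the sample above, the result holds lines 2 and 3: the iconv() call and the header() call.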


 
