Using PHP to display web pages correctly under any character set

Source: Internet
Author: User

In general, a web page needs to declare a character set, such as GB2312, UTF-8, or ISO-8859-1, so that the browser displays the text in the encoding we intended. Sometimes, though, we want to display Chinese characters on an ISO-8859-1 page, or Korean on a GB2312 page. One solution is to drop ISO-8859-1 and GB2312 and use UTF-8 everywhere. Under that encoding, text in any language can be mixed on one page, and this is the method many websites use.

That is not the method I want to discuss here, because it depends on the character set actually being UTF-8. If the user manually switches to another character set, or the declaration fails to take effect for some reason and the browser does not detect the encoding correctly, the page shows up garbled. This happens especially on pages that use frames: if the character set declaration does not take effect, Firefox displays garbled text and offers no way to change the encoding (unless the RightEncode plug-in is installed).

The method I describe here displays Chinese, Japanese, and other text correctly even when the page is declared as ISO-8859-1. The principle is simple: every character other than the first 128 characters of ISO-8859-1 is written as an NCR (numeric character reference). For example, if the word 汉字 ("Chinese character") is written as "&#27721;&#23383;", it displays correctly under any character set.

Based on this principle, I wrote a program that converts an existing web page into one that displays correctly under any character set. You only need to supply the source page and its character set, then click the submit button to get the converted page. You can also convert just a piece of text: enter it in the text box, specify its character set, and click the submit button, and the encoded text is shown on the page. I have also written a WordPress plug-in based on this, and my blog now displays correctly under any character set.
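To make the principle concrete, the following minimal sketch checks that a numeric character reference is simply the decimal Unicode code point of a character. It assumes PHP 7 or later (for the "\u{...}" escape), and the helper name ncr_to_utf8 is hypothetical, chosen only for this illustration.

```php
<?php
// Hypothetical helper (not part of the original program):
// decode numeric character references back into UTF-8 text.
function ncr_to_utf8(string $text): string
{
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// U+6C49 is 27721 in decimal and U+5B57 is 23383,
// so the two references below spell the word 汉字.
$decoded = ncr_to_utf8('&#27721;&#23383;');

var_dump($decoded === "\u{6C49}\u{5B57}"); // bool(true)
```

Because the references themselves use only ASCII characters, the browser can read them regardless of which character set the page declares.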

Implementation method:

The first step is to convert the string from the source character set to UTF-16. We do this because UTF-16 stores each character in two bytes (strictly, each code unit; characters outside the Basic Multilingual Plane take two units), which makes the later processing very easy, whereas working on the source character set directly would be complicated.

The source character set can be read from the meta tag of the original page, or it can be specified separately. My program lets the user specify it in the form, because I cannot guarantee that the submitted file is an HTML file. Other files are acceptable too: for example, the source files of a WordPress localization package are .po files, and their content can be processed the same way. Even an HTML file does not necessarily have a meta tag that declares the character set. Specifying the character set separately through the form is therefore safer.

Converting one character set to another may sound complicated, and implementing it yourself really is troublesome, but PHP already provides functions for it. The iconv function converts between a wide range of character sets. If the iconv extension is not installed on your machine, you can use the mb_convert_encoding function from the Multibyte String extension instead. If neither extension is installed, there is no practical solution: implementing the conversion of many encodings yourself is unrealistic unless you are a real expert. iconv is recommended because it is efficient and supports more character sets.
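The choice between the two extensions can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the helper name to_utf16be is hypothetical, and the sketch simply prefers iconv and falls back to mb_convert_encoding when iconv is missing.

```php
<?php
// Hypothetical helper: convert $str from the character set $encode
// to big-endian UTF-16, preferring iconv over mbstring.
function to_utf16be(string $encode, string $str): string
{
    if (function_exists('iconv')) {
        $result = iconv($encode, 'UTF-16BE', $str);
        if ($result === false) {
            throw new RuntimeException("iconv could not convert from $encode");
        }
        return $result;
    }
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($str, 'UTF-16BE', $encode);
    }
    throw new RuntimeException('Neither iconv nor mbstring is available');
}

// "A" is U+0041, so its UTF-16BE form is the two bytes 0x00 0x41.
echo bin2hex(to_utf16be('ISO-8859-1', 'A')), "\n"; // 0041
```

Asking for UTF-16BE rather than plain UTF-16 fixes the byte order, so the later step always knows that the high byte comes first.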

After this step, the string is processed two bytes at a time. Each pair of bytes is converted directly into a number, which is the xxxxx in &#xxxxx;. If the number is less than 128, the character itself is output (note that it becomes a single byte); otherwise the &#xxxxx; form is used. One special case: if the number is 65279 (hexadecimal 0xFEFF), ignore it. This is the Unicode byte order mark, and since our output string contains only the first 128 characters of ISO-8859-1, we no longer need it.
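The per-pair rule described above can be sketched in isolation. The helper names utf16be_units and encode_unit are hypothetical, and using unpack to read the 16-bit numbers is an alternative to multiplying the bytes by hand, chosen here only for brevity.

```php
<?php
// Hypothetical helper names, chosen for this illustration only.

// Split a UTF-16BE byte string into its 16-bit numbers.
function utf16be_units(string $utf16): array
{
    // 'n*' reads consecutive unsigned 16-bit big-endian integers.
    return array_values(unpack('n*', $utf16));
}

// Apply the rule from the text to a single number.
function encode_unit(int $code): string
{
    if ($code === 0xFEFF) {
        return '';              // byte order mark: dropped
    }
    if ($code < 128) {
        return chr($code);      // ASCII: emitted as a single byte
    }
    return '&#' . $code . ';';  // everything else: decimal NCR
}

// Bytes for: byte order mark, "A" (U+0041), U+6C49.
$units = utf16be_units("\xFE\xFF\x00\x41\x6C\x49");
echo implode('', array_map('encode_unit', $units)), "\n"; // A&#27721;
```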

Well, the basic idea is as follows:

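<?php
// Convert $str from the character set $encode to ASCII plus NCRs.
function nochaoscode($encode, $str) {
    // Big-endian UTF-16: every code unit is exactly two bytes.
    $str = iconv($encode, "UTF-16BE", $str);
    $output = "";
    for ($i = 0; $i < strlen($str); $i += 2) {
        // Combine the high byte and the low byte into one number.
        $code = ord($str[$i]) * 256 + ord($str[$i + 1]);
        if ($code < 128) {
            $output .= chr($code);            // plain ASCII stays as-is
        } else if ($code != 65279) {          // skip the byte order mark (0xFEFF)
            $output .= "&#" . $code . ";";
        }
    }
    return $output;
}
?>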

$encode is the source character set and $str is the string to convert. The function returns the converted string.
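One limitation worth noting: in UTF-16, characters outside the Basic Multilingual Plane (many emoji, for example) are stored as two surrogate code units, so a two-bytes-at-a-time loop would emit two separate references for them. A possible variation, which is not the author's code, converts to UTF-32BE instead, where every character is exactly four bytes. The function name nochaoscode_utf32 is hypothetical, and the sketch assumes the iconv extension is available.

```php
<?php
// Hypothetical variant of the function above: it goes through
// UTF-32BE so that each character maps to exactly one number.
function nochaoscode_utf32(string $encode, string $str): string
{
    $utf32 = iconv($encode, 'UTF-32BE', $str);
    if ($utf32 === false) {
        throw new RuntimeException("iconv could not convert from $encode");
    }
    if ($utf32 === '') {
        return '';
    }
    $output = '';
    // 'N*' reads consecutive unsigned 32-bit big-endian integers.
    foreach (unpack('N*', $utf32) as $code) {
        if ($code < 128) {
            $output .= chr($code);             // plain ASCII stays as-is
        } elseif ($code !== 0xFEFF) {          // skip a byte order mark
            $output .= '&#' . $code . ';';
        }
    }
    return $output;
}

// U+1F600 lies outside the Basic Multilingual Plane.
echo nochaoscode_utf32('UTF-8', "Hi \u{6C49}\u{5B57} \u{1F600}"), "\n";
// Hi &#27721;&#23383; &#128512;
```

For text that stays within the Basic Multilingual Plane, this variant is expected to produce the same output as the UTF-16 version.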

 
