I generally use UTF-8 as my character encoding, so if the blog on the other side uses GB2312, its posts will appear garbled (unless the other side converts the encoding before posting). If there is no guarantee that the other party uses UTF-8, an encoding check and conversion is necessary.
I wrote a function to do this. The principle is simple: in GB2312/GBK a Chinese character occupies two bytes, and each of those bytes falls within a known range of values, while in UTF-8 a Chinese character occupies three bytes, each of which also falls within a known range. English (ASCII) characters have values below 128 in either encoding and occupy only one byte (full-width characters excepted).
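To make the byte-length difference concrete, here is a minimal sketch that dumps the bytes of the character "中" in both encodings. The helper name byteDump is hypothetical, and the character is written as raw byte escapes so the result does not depend on how the script file itself is saved.

```php
<?php
// "中" as raw bytes, so the example does not depend on the file's own encoding
$utf8   = "\xE4\xB8\xAD"; // UTF-8: three bytes
$gb2312 = "\xD6\xD0";     // GB2312: two bytes

// Hypothetical helper: format each byte of a string as two uppercase hex digits
function byteDump($s) {
    return strtoupper(implode(' ', str_split(bin2hex($s), 2)));
}

echo strlen($utf8), ': ', byteDump($utf8), "\n";     // 3: E4 B8 AD
echo strlen($gb2312), ': ', byteDump($gb2312), "\n"; // 2: D6 D0
echo strlen('A'), ': ', byteDump('A'), "\n";         // 1: 41
```

Every byte of the Chinese character has its high bit set in both encodings, which is why the check has to look at the pattern of the leading bits and not only at whether a byte is 128 or above.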
If you are checking the encoding of a file, you can also look for the UTF-8 BOM. For more on this, see the code conversion function in the TP Toolbox; I wrote fairly detailed comments in the Appcodingswitch class.
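As a rough illustration of the BOM check mentioned above (this is not the TP Toolbox code, and the helper names hasUtf8Bom and stripUtf8Bom are hypothetical), the UTF-8 BOM is the three bytes EF BB BF at the very start of the data:

```php
<?php
// Hypothetical helper: true when the data starts with the UTF-8 BOM (EF BB BF)
function hasUtf8Bom($data) {
    return strncmp($data, "\xEF\xBB\xBF", 3) === 0;
}

// Hypothetical helper: return the data without a leading BOM, if there is one
function stripUtf8Bom($data) {
    return hasUtf8Bom($data) ? substr($data, 3) : $data;
}

$withBom = "\xEF\xBB\xBFhello";
var_dump(hasUtf8Bom($withBom));   // bool(true)
var_dump(hasUtf8Bom('hello'));    // bool(false)
echo stripUtf8Bom($withBom), "\n"; // hello
```

A missing BOM proves nothing, since many UTF-8 files are saved without one, so this check can only confirm UTF-8 and cannot rule it out.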
Enough talk; here is the function. It checks and transcodes strings, not files.
function safeencoding($string, $outEncoding = 'UTF-8') {
    $encoding = 'UTF-8';
    $length = strlen($string);
    for ($i = 0; $i < $length; $i++) {
        $byte = ord($string[$i]);
        // ASCII bytes are the same in both encodings, so skip them
        if ($byte < 128) {
            continue;
        }
        // First byte looks like the lead byte of a three-byte UTF-8 sequence
        if (($byte & 224) == 224 && $i + 2 < $length) {
            // Second and third bytes must be UTF-8 continuation bytes (10xxxxxx)
            if ((ord($string[$i + 1]) & 192) == 128
                && (ord($string[$i + 2]) & 192) == 128) {
                $encoding = 'UTF-8';
                break;
            }
        }
        // First byte looks like the lead byte of a two-byte GB2312 character
        if (($byte & 192) == 192 && $i + 1 < $length) {
            // Second byte must also have its high bit set
            if ((ord($string[$i + 1]) & 128) == 128) {
                $encoding = 'GB2312';
                break;
            }
        }
    }
    // Nothing to do when the detected encoding already matches the target
    if (strtoupper($encoding) == strtoupper($outEncoding)) {
        return $string;
    }
    return iconv($encoding, $outEncoding, $string);
}
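For comparison, here is a sketch of a different technique from the byte-mask loop above: validate the string as UTF-8 first and fall back to GB2312 only when validation fails. The function name toUtf8 and its $fallback parameter are hypothetical, and the sketch assumes the mbstring and iconv extensions are loaded.

```php
<?php
// Hypothetical alternative: strict UTF-8 validation, then a GB2312 fallback.
// Assumes the mbstring and iconv extensions are available.
function toUtf8($string, $fallback = 'GB2312') {
    if (mb_check_encoding($string, 'UTF-8')) {
        return $string; // already valid UTF-8 (pure ASCII included)
    }
    // iconv() returns false if the bytes are not valid in the fallback encoding
    return iconv($fallback, 'UTF-8', $string);
}

$gb = "\xD6\xD0\xCE\xC4";        // "中文" as GB2312 bytes
echo bin2hex(toUtf8($gb)), "\n"; // e4b8ade69687
echo toUtf8('plain ascii'), "\n"; // plain ascii
```

Full validation avoids guessing from the first non-ASCII bytes alone, at the cost of scanning the whole string. It is still a heuristic, because some short GB2312 byte sequences also happen to be valid UTF-8.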