Evaluating Several Methods for Dynamically Converting GB-Encoded Text to UTF-8 in PHP

In the article "IP address to geo-location conversion evaluation", reading the IP database file directly with the Ip2addr function turned out to be the most efficient method, while storing the IP data in a MySQL database and querying it with SQL was the least efficient. However, the IP database file QQWry.dat is GB2312-encoded, and I now need geo-location results encoded as UTF-8. With the MySQL method, the data can be converted to UTF-8 once, when it is loaded into the database, and the problem is solved for good. The QQWry.dat file, however, cannot be modified, so the output of the Ip2addr function has to be converted dynamically.

There are at least four ways to convert GB to UTF-8 dynamically:

Conversion with PHP's iconv extension

Conversion with PHP's mbstring extension

Conversion with a mapping table stored in a MySQL database

Conversion with a mapping table stored in a text file

The first two methods require server-side support (the corresponding extension must be compiled and installed). My virtual host has neither extension, so I have to consider the last two methods. The first two methods are not evaluated in this article.
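For reference, where one of the two extensions is available, the conversion is a single call. The following is a minimal sketch, not part of the evaluation: the helper name gbToUtf8ViaExtension is hypothetical, and "EUC-CN" is used as the mbstring name for the GB2312 byte encoding.

```php
<?php
// Hypothetical helper (not from the article): convert a GB2312 string to
// UTF-8 with whichever extension is installed; return null if neither is.
function gbToUtf8ViaExtension($strGB) {
    if (function_exists('iconv')) {
        $strUtf8 = iconv('GB2312', 'UTF-8', $strGB);
        if ($strUtf8 !== false) return $strUtf8;
    }
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($strGB, 'UTF-8', 'EUC-CN');
    }
    return null;
}

// "\xD6\xD0" is the GB2312 byte pair for the character U+4E2D.
$strResult = gbToUtf8ViaExtension("\xD6\xD0");
echo $strResult === null ? "no extension available\n" : bin2hex($strResult) . "\n";
```

When neither extension is installed the helper returns null, which is the situation that motivates the table-based methods below.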

The evaluation program is as follows (for func_ip.php, see the article "IP address to geo-location conversion evaluation"):

<?php
require_once("func_ip.php");

// Encode a Unicode code point as a UTF-8 byte string.
function U2utf8($c) {
    $str = "";
    if ($c < 0x80) {
        $str .= chr($c);
    } elseif ($c < 0x800) {
        $str .= chr(0xC0 | ($c >> 6));
        $str .= chr(0x80 | ($c & 0x3F));
    } elseif ($c < 0x10000) {
        $str .= chr(0xE0 | ($c >> 12));
        $str .= chr(0x80 | (($c >> 6) & 0x3F));
        $str .= chr(0x80 | ($c & 0x3F));
    } elseif ($c < 0x200000) {
        $str .= chr(0xF0 | ($c >> 18));
        $str .= chr(0x80 | (($c >> 12) & 0x3F));
        $str .= chr(0x80 | (($c >> 6) & 0x3F));
        $str .= chr(0x80 | ($c & 0x3F));
    }
    return $str;
}
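// Convert a GB2312 string to UTF-8, looking up each character in a MySQL table.
// Note: the mysql_* functions were removed in PHP 7; on current versions the
// same query would go through mysqli or PDO.
function Gb2utf8_sql($strGB) {
    if (!trim($strGB)) return $strGB;
    $strRet = "";
    $intLen = strlen($strGB);
    for ($i = 0; $i < $intLen; $i++) {
        if (ord($strGB[$i]) > 127) {
            // A byte above 127 starts a two-byte GB character.
            $strCurr = substr($strGB, $i, 2);
            $intGB = hexdec(bin2hex($strCurr)) - 0x8080;
            $strSql = "SELECT Code_unicode FROM Nnstats_gb_unicode
                WHERE CODE_GB = " . $intGB . " LIMIT 1";
            $resResult = mysql_query($strSql);
            if ($arrCode = mysql_fetch_array($resResult)) $strRet .= U2utf8($arrCode["Code_unicode"]);
            else $strRet .= "??";
            $i++; // skip the second byte of the character
        } else {
            $strRet .= $strGB[$i];
        }
    }
    return $strRet;
}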
// Convert a GB2312 string to UTF-8 using the text-file mapping table.
// The whole table is re-read and re-parsed on every call.
function Gb2utf8_file($strGB) {
    if (!trim($strGB)) return $strGB;
    $arrLines = file("Gb_unicode.txt");
    $arrCodeTable = array();
    foreach ($arrLines as $strLine) {
        // Skip the "0x" prefixes so hexdec() sees only hex digits.
        $arrCodeTable[hexdec(substr($strLine, 2, 4))] = hexdec(substr($strLine, 9, 4));
    }
    $strRet = "";
    $intLen = strlen($strGB);
    for ($i = 0; $i < $intLen; $i++) {
        if (ord($strGB[$i]) > 127) {
            $strCurr = substr($strGB, $i, 2);
            $intGB = hexdec(bin2hex($strCurr)) - 0x8080;
            if (isset($arrCodeTable[$intGB])) $strRet .= U2utf8($arrCodeTable[$intGB]);
            else $strRet .= "??";
            $i++; // skip the second byte of the character
        } else {
            $strRet .= $strGB[$i];
        }
    }
    return $strRet;
}
// Convert a dotted-quad IP address to its 32-bit integer value.
function Encodeip($strDotquadIp) {
    $arrIpSep = explode(".", $strDotquadIp);
    if (count($arrIpSep) != 4) return 0;
    $intIp = 0;
    foreach ($arrIpSep as $k => $v) $intIp += (int)$v * pow(256, 3 - $k);
    return $intIp;
    // Alternative hexadecimal form (unreachable after the return above):
    // return sprintf("%02x%02x%02x%02x", $arrIpSep[0], $arrIpSep[1], $arrIpSep[2], $arrIpSep[3]);
}

// Current Unix time in seconds, including the microsecond fraction.
function Getmicrotime() {
    list($msec, $sec) = explode(" ", microtime());
    return ((float)$msec + (float)$sec);
}
for ($i = 0; $i < 100; $i++) { // randomly generate 100 IP addresses
    $strIp = mt_rand(0, 255) . "." . mt_rand(0, 255) . "." . mt_rand(0, 255) . "." . mt_rand(0, 255);
    $arrAddr[$i] = Ip2addr(Encodeip($strIp));
}
$resConn = mysql_connect("localhost", "netnest", "netnest");
mysql_select_db("test");

// Evaluate encoding conversion via MySQL queries
$dblTimeStart = Getmicrotime();
for ($i = 0; $i < 100; $i++) {
    $strUTF8Region = Gb2utf8_sql($arrAddr[$i]["region"]);
    $strUTF8Address = Gb2utf8_sql($arrAddr[$i]["address"]);
}
$dblTimeDuration = Getmicrotime() - $dblTimeStart;
// End of evaluation; output the result
echo $dblTimeDuration; echo "\n";

// Evaluate encoding conversion via text file lookups
$dblTimeStart = Getmicrotime();
for ($i = 0; $i < 100; $i++) {
    $strUTF8Region = Gb2utf8_file($arrAddr[$i]["region"]);
    $strUTF8Address = Gb2utf8_file($arrAddr[$i]["address"]);
}
$dblTimeDuration = Getmicrotime() - $dblTimeStart;
// End of evaluation; output the result
echo $dblTimeDuration; echo "\n";
?>
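As a worked check of the two arithmetic steps the program relies on, the sketch below derives the GB code from a byte pair and encodes a code point as UTF-8. The helper names codePointToUtf8 and gbCodeOf are hypothetical; the bit layout mirrors U2utf8, restricted to code points below 0x10000 for brevity.

```php
<?php
// Hypothetical helper: UTF-8 encoding of a code point below 0x10000.
function codePointToUtf8($c) {
    if ($c < 0x80) return chr($c);
    if ($c < 0x800) return chr(0xC0 | ($c >> 6)) . chr(0x80 | ($c & 0x3F));
    return chr(0xE0 | ($c >> 12)) . chr(0x80 | (($c >> 6) & 0x3F)) . chr(0x80 | ($c & 0x3F));
}

// Hypothetical helper: GB code of a two-byte character, i.e. both bytes
// with their high bit cleared (the "- 0x8080" step in the program).
function gbCodeOf($strPair) {
    return hexdec(bin2hex($strPair)) - 0x8080;
}

printf("0x%04X\n", gbCodeOf("\xD6\xD0"));     // 0x5650
echo bin2hex(codePointToUtf8(0x4E2D)), "\n";  // e4b8ad
echo bin2hex(codePointToUtf8(0x3000)), "\n";  // e38080
```

The byte pair D6 D0 yields GB code 0x5650, which the GB2312 mapping table assigns to U+4E2D; that code point encodes to the three UTF-8 bytes E4 B8 AD.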

Results of two runs (in seconds, accurate to three decimal places):

MySQL query conversion: 0.112
Text file query conversion: 10.590

MySQL query conversion: 0.099
Text file query conversion: 10.623

This shows that the MySQL method is far ahead of the file query method. But there is no hurry to adopt the MySQL method: the text file method is so slow mainly because every conversion reads the entire Gb_unicode.txt into memory. Gb_unicode.txt is a text file in the following format:

0x2121 0x3000 # IDEOGRAPHIC SPACE
0x2122 0x3001 # IDEOGRAPHIC COMMA
0x2123 0x3002 # IDEOGRAPHIC FULL STOP
0x2124 0x30FB # KATAKANA MIDDLE DOT
0x2125 0x02C9 # MODIFIER LETTER MACRON (Mandarin Chinese first tone)
......
0x552A 0x6458 #
0x552B 0x658B #
0x552C 0x5B85 #
0x552D 0x7A84 #
......
0x777B 0x9F37 #
0x777C 0x9F3D #
0x777D 0x9F3E #
0x777E 0x9F44 #
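Each line of this format carries one GB code and one Unicode code point. A minimal sketch of parsing a single line follows; the helper name parseMappingLine is hypothetical, and it uses a regular expression rather than the fixed substr offsets of the evaluation program.

```php
<?php
// Hypothetical helper: parse one "0xGGGG 0xUUUU # comment" line into
// array(GB code, Unicode code point); returns null for any other line.
function parseMappingLine($strLine) {
    if (!preg_match('/^0x([0-9a-f]{4})\s+0x([0-9a-f]{4})/i', $strLine, $arrMatch)) {
        return null;
    }
    return array(hexdec($arrMatch[1]), hexdec($arrMatch[2]));
}

list($intGB, $intUnicode) = parseMappingLine("0x2121 0x3000 # IDEOGRAPHIC SPACE");
printf("GB 0x%04X -> U+%04X\n", $intGB, $intUnicode); // GB 0x2121 -> U+3000
```

Returning null for non-matching lines lets a caller skip separators and comment-only lines instead of storing a meaningless entry.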

The text file is inefficient, so consider converting it into a binary file and using binary search to look up entries without reading the whole file into memory. The file format is: a 2-byte header holding the number of records, followed by the records one after another. Each record is 4 bytes: the first 2 bytes hold the GB code and the last 2 bytes hold the Unicode code point. The conversion program is as follows:

<?php
$arrLines = file("Gb_unicode.txt");
$arrCodeTable = array();
foreach ($arrLines as $strLine) {
    // Skip the "0x" prefixes so hexdec() sees only hex digits.
    $arrCodeTable[hexdec(substr($strLine, 2, 4))] = hexdec(substr($strLine, 9, 4));
}
ksort($arrCodeTable); // sort by GB code so the file can be binary-searched
$intCount = count($arrCodeTable);
// Header: record count in 2 bytes, low byte first
$strCount = chr($intCount % 256) . chr($intCount >> 8);
$fileGBU = fopen("Gbu.dat", "wb");
fwrite($fileGBU, $strCount);
foreach ($arrCodeTable as $k => $v) {
    // Record: GB code, then Unicode code point, 2 bytes each, low byte first
    $strData = chr($k % 256) . chr($k >> 8) . chr($v % 256) . chr($v >> 8);
    fwrite($fileGBU, $strData);
}
fclose($fileGBU);
?>
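The 4-byte record layout can also be expressed with PHP's pack() and unpack(), whose "v" format code is an unsigned 16-bit little-endian integer (low byte first, the same order the conversion program writes). This is a minimal sketch; the helper names packRecord and unpackRecord are hypothetical.

```php
<?php
// Hypothetical helper: build one 4-byte record (GB code, Unicode code point).
function packRecord($intGB, $intUnicode) {
    return pack("vv", $intGB, $intUnicode);
}

// Hypothetical helper: split a 4-byte record back into its two values.
function unpackRecord($strRecord) {
    $arrFields = unpack("vgb/vunicode", $strRecord);
    return array($arrFields["gb"], $arrFields["unicode"]);
}

$strRecord = packRecord(0x2121, 0x3000);
echo bin2hex($strRecord), "\n"; // 21210030
list($intGB, $intUnicode) = unpackRecord($strRecord);
printf("GB 0x%04X -> U+%04X\n", $intGB, $intUnicode); // GB 0x2121 -> U+3000
```

Because every record has the same size, record number n (counting from 0) starts at byte offset 2 + 4n, which is what makes seeking directly to a record possible.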
Running the program produces the binary GB->Unicode table Gbu.dat, whose records are sorted by GB code, which makes binary search convenient. The function that transcodes using Gbu.dat is as follows:

function Gb2utf8_file1($strGB) {
    if (!trim($strGB)) return $strGB;
    $fileGBU = fopen("Gbu.dat", "rb");
    $strBuf = fread($fileGBU, 2);
    // Record count from the 2-byte header, low byte first
    $intCount = ord($strBuf[0]) + 256 * ord($strBuf[1]);
    $strRet = "";
    $intLen = strlen($strGB);
    for ($i = 0; $i < $intLen; $i++) {
        if (ord($strGB[$i]) > 127) {
            $strCurr = substr($strGB, $i, 2);
            $intGB = hexdec(bin2hex($strCurr)) - 0x8080;
            $intStart = 1;
            $intEnd = $intCount;
            while ($intStart < $intEnd - 1) { // binary search
                $intMid = floor($intStart + $intEnd
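The lookup that the function above begins amounts to a binary search over the sorted 4-byte records. The following is a minimal sketch of that idea, not the author's original code: the function name lookupUnicode, the 0-based record indexing, and the in-memory demo table are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch: binary search for $intGB in an open table file laid
// out as a 2-byte record count followed by sorted 4-byte records (low byte
// first). Returns the Unicode code point, or null if the code is absent.
function lookupUnicode($fileGBU, $intGB) {
    fseek($fileGBU, 0);
    $arrHead = unpack("vcount", fread($fileGBU, 2));
    $intLow = 0;
    $intHigh = $arrHead["count"] - 1;
    while ($intLow <= $intHigh) {
        $intMid = ($intLow + $intHigh) >> 1;
        fseek($fileGBU, 2 + 4 * $intMid); // skip the header, jump to the record
        $arrRec = unpack("vgb/vunicode", fread($fileGBU, 4));
        if ($arrRec["gb"] == $intGB) return $arrRec["unicode"];
        if ($arrRec["gb"] < $intGB) $intLow = $intMid + 1;
        else $intHigh = $intMid - 1;
    }
    return null;
}

// Build a three-record demo table in memory to exercise the search.
$fileDemo = fopen("php://memory", "w+b");
fwrite($fileDemo, pack("v", 3));
fwrite($fileDemo, pack("vv", 0x2121, 0x3000));
fwrite($fileDemo, pack("vv", 0x2122, 0x3001));
fwrite($fileDemo, pack("vv", 0x5650, 0x4E2D));
printf("U+%04X\n", lookupUnicode($fileDemo, 0x5650)); // U+4E2D
var_dump(lookupUnicode($fileDemo, 0x2130));           // NULL
fclose($fileDemo);
```

Each probe reads only 4 bytes, so a lookup touches on the order of log2(record count) records instead of loading the whole table.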
