Application of PHP binary method in IP address query _php tips

Source: Internet
Author: User
Tags fread mysql query pack sprintf unpack
The database probably stores hundreds of thousands of IP records, the following recordset:


+----------+----------+------------+---------+---------+--------+--------+
| Ip_begin | Ip_end | country_id | prov_id | city_id | isp_id | Netbar |
+----------+----------+------------+---------+---------+--------+--------+
| 0 |          16777215 |       2 |       0 |      0 |      0 | 0 |
| 16777216 |          33554431 |       2 |       0 |      0 |      0 | 0 |
| 33554432 |          50331647 |       2 |       0 |      0 |      0 | 0 |
| 50331648 |          67108863 |       3 |       0 |      0 |      0 | 0 |
| 67108864 |          67829759 |       3 |       0 |      0 |      0 | 0 |
+----------+----------+------------+---------+---------+--------+--------+
This query requires the following SQL:
<?php
$sql = ' SELECT * from i_m_ip WHERE ip_begin <= $client _ip and ip_end >= $client _ip ';
?>
Such a search is obviously not used to index, even if used, MySQL query efficiency is unlikely to reach more than 500 times per second, I did a lot of concurrent optimization, the final average query efficiency is only about 200 times per second, it is a headache. At first I also think of the use of the naïve IP library search methods, but I have been inconsistent with the algorithm, also thought that the two-point method is very difficult, so there is no attempt, until finally there is no way, the final implementation of the two-point IP address search.
From the table above you can see that the IP library is a continuous value from 0 to 4294967295, if the value of the open storage, there will be hundreds of g of data, so there is no way to use the index can not hash. Eventually I used PHP to convert these things into binary storage, discarding the retrieval of the database. You can see the IP start and end length of a 4-byte long Integer, the following country ID, the province ID, etc., you can use a 2-byte short integer to store, a total of one row of data has 18 bytes, a total of 310,000 data, count up to 5 m. The specific IP library generates code as follows:
<?php
/*
IP file Format:
3741319168 3758096383 182 0 0 0 0
3758096384 3774873599 3 0 0 0 0
3774873600 4026531839 182 0 0 0 0
4026531840 4278190079 182 0 0 0 0
4294967040 4294967295 312 0 0 0 0
*/
Set_time_limit (0);
$handle = fopen ('./ip.txt ', ' RB ');
$fp = fopen ("./ip.dat", ' ab ');
if ($handle) {
while (!feof ($handle)) {
$buffer = fgets ($handle);
$buffer = Trim ($buffer);
$buffer = Explode ("T", $buffer);
foreach ($buffer as $key => $value) {
$buffer [$key] = (float) trim ($value);
}
$str = Pack (' L ', $buffer [0]);
$str. = Pack (' L ', $buffer [1]);
$str. = Pack (' S ', $buffer [2]);
$str. = Pack (' S ', $buffer [3]);
$str. = Pack (' S ', $buffer [4]);
$str. = Pack (' S ', $buffer [5]);
$str. = Pack (' S ', $buffer [6]);
Fwrite ($fp, $STR);
}
}
?>

This IP is arranged in order of 18 bytes per unit, so it is easy to use the binary method to retrieve the IP information:
function GetIP ($ip, $fp) {
Fseek ($fp, 0);
$begin = 0;
$end = FileSize ('./ip.dat ');
$begin _ip = Implode (', Unpack (' L ', Fread ($FP, 4));
Fseek ($FP, $end-14);
$end _ip = Implode (', Unpack (' L ', Fread ($FP, 4));
$begin _ip = sprintf ('%u ', $begin _ip);
$end _ip = sprintf ('%u ', $end _ip);

do {
if ($end-$begin <= 18) {
Fseek ($fp, $begin + 8);
$info = Array ();
$info [0] = implode (', Unpack (' S ', Fread ($FP, 2));
$info [1] = implode (', Unpack (' S ', Fread ($FP, 2));
$info [2] = implode (', Unpack (' S ', Fread ($FP, 2));
$info [3] = implode (', Unpack (' S ', Fread ($FP, 2));
$info [4] = implode (', Unpack (' S ', Fread ($FP, 2));
return $info;
}

$middle _seek = Ceil (($end-$begin)/2) * + $begin;

Fseek ($fp, $middle _seek);
$middle _ip = Implode (', Unpack (' L ', Fread ($FP, 4));
$middle _ip = sprintf ('%u ', $middle _ip);

if ($ip >= $middle _ip) {
$begin = $middle _seek;
} else {
$end = $middle _seek;
}
} while (true);
}

The above $fp for open Ip.dat file handle, because it is a circular search, so write in the function outside, lest each retrieval must open a file, 30W row of data, the maximum number of two-point cycle only 7 times (2^7) to find accurate IP information. Then I would like to put Ip.dat in memory to speed up the search, and later found that the efficiency of the string positioning function, the root and file pointer offset positioning is not in a order of magnitude, so or discard the use of memory to store the IP library.
This implementation, so that the efficiency of IP retrieval is more than a hundredfold, just a simple application of the binary, from this algorithm in the Web application is not important concept completely dispelled. In fact, to achieve this, I also consulted the Golden Fox, I began to ask him to help me to generate a pure IP library, and then use Discuz IP query function to retrieve, but he refused to help me, finally made me this practice and learning. Sometimes it is better to ask for self.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.