Application of php binary in IP address query

Source: Internet
Author: User
Some time ago, data analysis requires a large number of IP address queries (nearly ten thousand queries per second). First, consider using the database. The database stores hundreds of thousands of IP records. the record set is as follows:


+ ---------- + ------------ + --------- + -------- +
| Ip_begin | ip_end | country_id | prov_id | city_id | isp_id | netbar |
+ ---------- + ------------ + --------- + -------- +
| 0 | 16777215 | 2 | 0 | 0 | 0 | 0 |
| 16777216 | 33554431 | 2 | 0 | 0 | 0 | 0 |
| 33554432 | 50331647 | 2 | 0 | 0 | 0 | 0 |
| 50331648 | 67108863 | 3 | 0 | 0 | 0 | 0 |
| 67108864 | 67829759 | 3 | 0 | 0 | 0 | 0 |
+ ---------- + ------------ + --------- + -------- +
The following SQL statement is required for this query:
$ SQL = 'select * FROM I _m_ip WHERE ip_begin <= $ client_ip AND ip_end> = $ client_ip ';
?>
This search obviously does not use indexes. even if it is used, the MySQL Query efficiency is unlikely to reach more than 500 times per second. I have done a lot of concurrent optimization, the final average query efficiency is only about 200 times per second, which is really a headache. At the beginning, I also came up with a reference to the retrieval method of the pure IP database. However, I have always been in conflict with the algorithm and thought it was difficult to use the binary method, so I didn't try to use it until I finally had no way to do it, in the end, the binary IP address retrieval is realized.
From the table above, we can see that the IP database is a continuous value from 0 to 4294967295. if this value is stored separately, there will be several hundred GB of data, so there is no way to use the index or hash. In the end, I used PHP to convert these things into binary storage and discarded database retrieval. You can see that the IP address starts and ends with a 4-byte long integer, followed by the country ID and province ID. you can use two-byte Short integer to store the IP address, A total of 18 bytes of data in a row, with a total of 0.31 million data records, is equivalent to 5 MB. The specific IP library generation code is as follows:
/*
IP file format:
3741319168 3758096383 182 0 0 0 0
3758096384 3774873599 3 0 0 0 0
3774873600 4026531839 182 0 0 0 0
4026531840 4278190079 182 0 0 0 0
4294967040 4294967295 312 0 0 0 0
*/
Set_time_limit (0 );
$ Handle = fopen ('./ip.txt', 'RB ');
$ Fp = fopen ("./ip. dat", 'AB ');
If ($ handle ){
While (! Feof ($ handle )){
$ Buffer = fgets ($ handle );
$ Buffer = trim ($ buffer );
$ Buffer = explode ("\ t", $ buffer );
Foreach ($ buffer as $ key => $ value ){
$ Buffer [$ key] = (float) trim ($ value );
}
$ Str = pack ('L', $ buffer [0]);
$ Str. = pack ('L', $ buffer [1]);
$ Str. = pack ('s ', $ buffer [2]);
$ Str. = pack ('s ', $ buffer [3]);
$ Str. = pack ('s ', $ buffer [4]);
$ Str. = pack ('s ', $ buffer [5]);
$ Str. = pack ('s ', $ buffer [6]);
Fwrite ($ fp, $ str );
}
}
?>

In this way, IP addresses are arranged every 18 bytes in order, so it is easy to retrieve IP information using the binary method:
Function getip ($ ip, $ fp ){
Fseek ($ fp, 0 );
$ Begin = 0;
$ End = filesize ('./ip. dat ');
$ Begin_ip = implode ('', unpack ('L', fread ($ fp, 4 )));
Fseek ($ fp, $ end-14 );
$ End_ip = implode ('', unpack ('L', fread ($ fp, 4 )));
$ Begin_ip = sprintf ('% u', $ begin_ip );
$ End_ip = sprintf ('% u', $ end_ip );

Do {
If ($ end-$ begin <= 18 ){
Fseek ($ fp, $ begin + 8 );
$ Info = array ();
$ Info [0] = implode ('', unpack ('s', fread ($ fp, 2 )));
$ Info [1] = implode ('', unpack ('s', fread ($ fp, 2 )));
$ Info [2] = implode ('', unpack ('s', fread ($ fp, 2 )));
$ Info [3] = implode ('', unpack ('s', fread ($ fp, 2 )));
$ Info [4] = implode ('', unpack ('s', fread ($ fp, 2 )));
Return $ info;
}

$ Middle_seek = ceil ($ end-$ begin)/18)/2) * 18 + $ begin;

Fseek ($ fp, $ middle_seek );
$ Middle_ip = implode ('', unpack ('L', fread ($ fp, 4 )));
$ Middle_ip = sprintf ('% u', $ middle_ip );

If ($ ip >=$ middle_ip ){
$ Begin = $ middle_seek;
} Else {
$ End = $ middle_seek;
}
} While (true );
}

The above $ fp is the ip address. dat file handle, because it is a loop search, so write it out of the function, so that you do not need to open the file every time you search, up to seven cycles of 30 million lines of data (2 ^ 7) you can find accurate IP information. Then I want to change the ip address. dat is placed in the memory to speed up the retrieval. later, we found that the efficiency of the string locating function is not equal to the offset of the file pointer, so we still give up using the memory to store the IP address Library.
This implementation improves the efficiency of IP retrieval by nearly Times. it is just a simple application of the binary method, and the concept of the algorithm being unimportant in WEB applications is completely dispelled. To implement this, I also asked Jin Hu. at first, I asked him to help me generate an IP library in pure format, and then I used the Discuz IP query function for retrieval, but he refused to help me, and finally created my practice and learning. Sometimes, it is better to ask for yourself.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.