I. Preface
Anyone who has built an IP location lookup feature has probably heard of the Pure IP library. A query against the Pure IP library looks similar to this:
If you only need to look up a user's location by IP and display it, just follow the format rules of the IP library to find the record and show it; those two steps are all it takes. (Detailed format)
But what if you need to get the location description for an IP and then associate it with your own existing administrative region table?
Both of these approaches should work, but the efficiency is very poor. In an application with even slightly higher traffic, neither of them holds up.
Why not take the data from the Pure IP library (other IP libraries work too), match it against your own region data, replace the Pure IP library's region description with your own region ID, and finally build your own binary IP library file?
Let's get to the point and see how to build your own binary IP library file from the Pure IP library data.
Note: this article only describes the general idea and contains no detailed code. Thank you.
II. Preparation
We need to have two pieces of data ready:
1. The txt file extracted from the Pure IP library.
The Pure IP library download includes an ip.exe tool; its decompress function generates the text file.
The generated data looks like Figure 1-1. The version I used has about 444,290 records.
Figure 1-1
2. Your own cascading table of national provinces, cities and districts.
Plenty of these are available online, so import one yourself. The table structure is similar to (area_id, area_level, area_name, area_pid), representing the region ID, region level, region name and parent region ID respectively.
Of course you can use a different structure of your own; it does not affect the processing described here.
III. The process
The data is ready, so now we plan the layout of the IP library we need to generate.
As the title says, the IP library we generate is a binary file rather than a plain text file, so what should its file structure look like?
As you can see, our structure is this:
- Head. Located in the first 8 bytes of the file. The first 4 bytes hold a 32-bit integer whose value is the position in the file where the data section begins; the last 4 bytes hold another 32-bit integer whose value is the position in the file where the data section ends.
- Body (data section). Consists of n fixed-size structures of 12 bytes each; each structure holds one IP range record (ip_start, ip_end, area_code). The three fields of a structure are 32-bit signed integers of 4 bytes each. (If the area_code values are small, a single byte can be used for that field instead.)
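The layout above can be sketched with PHP's pack() and unpack(). This is only an illustration under stated assumptions, not the author's code: the helper names (pack_header, pack_record, unpack_record) are hypothetical, and the `l` format code (signed 32-bit, machine byte order) is assumed for every field.

```php
<?php
// Sizes taken from the layout described above.
const HEADER_SIZE = 8;   // two 32-bit integers
const RECORD_SIZE = 12;  // ip_start, ip_end, area_code (4 bytes each)

// Hypothetical helper: build the 8-byte head.
function pack_header(int $dataStart, int $dataEnd): string
{
    return pack('ll', $dataStart, $dataEnd);
}

// Hypothetical helper: build one 12-byte IP range record.
function pack_record(int $ipStart, int $ipEnd, int $areaCode): string
{
    return pack('lll', $ipStart, $ipEnd, $areaCode);
}

// Hypothetical helper: decode one 12-byte record into named fields.
function unpack_record(string $bytes): array
{
    return unpack('lip_start/lip_end/larea_code', $bytes);
}
```

Because `l` uses the machine's byte order, a file written this way should be read on a machine with the same byte order.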
The structure of the IP library file is now settled, so next comes the step-by-step processing.
1. Read the IP text file, convert each IP to a 32-bit signed integer (with a custom ip2long), and parse the region text to get the final region
A. The format of each line of the IP text file is: the first 15 bytes are the IP start address, the next 15 bytes are the IP end address, and the rest is the region description text.
B. An IP converted to a 32-bit signed integer takes only 4 bytes. PHP's ip2long function returns different values on 32-bit and 64-bit systems, so a new function, ip2long32, is used to get the same signed value on both.
Of course, you can also develop your own PHP extension; see: http://www.cnblogs.com/iblaze/archive/2013/06/02/3112603.html
C. The region text needs to be split into the names of each level (province, city, county, district and so on; for foreign entries only the country is kept), using a regular expression.
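Steps A to C can be sketched as follows. The original ip2long32 listing is not available, so this version is an assumption about how such a function might look, not the author's code. parse_ip_line and split_region_levels are hypothetical helpers; the line parser splits on whitespace instead of fixed 15-byte columns, and the regular expression assumes the region text has already been converted to UTF-8 and follows the common province/city/county suffix pattern.

```php
<?php
// Convert a dotted-quad IP to a signed 32-bit integer on 32-bit and 64-bit PHP.
function ip2long32(string $ip)
{
    $n = ip2long($ip);
    if ($n === false) {
        return false; // not a valid IPv4 address
    }
    // On 64-bit PHP, 128.0.0.0 and above come back as values above 2^31 - 1;
    // fold them into the negative half of the signed 32-bit range.
    if ($n > 2147483647) {
        $n -= 4294967296;
    }
    return $n;
}

// Hypothetical helper: split one text line into start IP, end IP and region text.
function parse_ip_line(string $line): ?array
{
    $parts = preg_split('/\s+/', trim($line), 3);
    if ($parts === false || count($parts) < 2) {
        return null;
    }
    $start = ip2long32($parts[0]);
    $end = ip2long32($parts[1]);
    if ($start === false || $end === false) {
        return null;
    }
    return ['ip_start' => $start, 'ip_end' => $end, 'region' => $parts[2] ?? ''];
}

// Hypothetical helper: extract [province, city, county] from the region text.
// Levels that are not present come back as empty strings.
function split_region_levels(string $region): array
{
    $pattern = '/^(?<province>\S+?(?:省|自治区))?(?<city>\S+?市)?(?<county>\S+?(?:县|区))?/u';
    preg_match($pattern, $region, $m);
    return [$m['province'] ?? '', $m['city'] ?? '', $m['county'] ?? ''];
}
```

Real region text is messier than this pattern allows for (provinces written without a suffix, foreign entries, carrier names), so the expression would need tuning against the actual data.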
2. Convert the extracted region information to a region ID
This part is hard to describe precisely, because everyone's region data may differ. The general principle is to look up the ID by the lowest-level region name (depending on your data, you may need to strip suffixes such as "city" or "county" first); if it is not found, try the next level up, and keep looping until a region ID is obtained.
If no region ID is found, the record is classified as unknown.
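A minimal sketch of that lookup loop, assuming the region table has been loaded into an in-memory array that maps region names to IDs. The function name resolve_area_id, the `$areaIdsByName` map and the use of 0 as the "unknown" ID are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: $levels is [province, city, county] (empty string when a
// level is missing); $areaIdsByName maps a region name to its area_id.
function resolve_area_id(array $levels, array $areaIdsByName, int $unknownId = 0): int
{
    // Walk from the most specific level back up toward the province.
    foreach (array_reverse($levels) as $name) {
        if ($name === '') {
            continue;
        }
        if (isset($areaIdsByName[$name])) {
            return $areaIdsByName[$name];
        }
        // Retry without a trailing administrative suffix, in case the
        // region table stores short names.
        $short = preg_replace('/(省|市|县|区)$/u', '', $name);
        if ($short !== '' && isset($areaIdsByName[$short])) {
            return $areaIdsByName[$short];
        }
    }
    return $unknownId; // classified as unknown
}
```

A plain name map cannot tell apart two districts that share a name under different cities; handling that would mean checking the parent ID (area_pid) as well.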
3. Pack the data into the binary file; the resulting file is about 5.08 MB
The packing rule is that each value's format code corresponds to the matching type in PHP's pack().
One thing must be pointed out here: because IPs are converted to signed 32-bit integers, every IP from 128.0.0.0 upward becomes negative. You need to detect these negative values and place them at the front of the IP library, because the lookup uses binary search and binary search requires ordered data.
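The packing step might look like the sketch below. It is an illustration under assumptions rather than the author's code: build_ip_file is a hypothetical name, records are plain `[ip_start, ip_end, area_code]` arrays, every field uses the `l` format code, and the "end" value in the head is taken to be the byte offset just past the last record.

```php
<?php
// Hypothetical helper: write the head and the sorted records to $path.
// Returns the number of bytes written.
function build_ip_file(array $records, string $path): int
{
    // Sort by signed ip_start so the negative values (128.0.0.0 and above)
    // end up at the front and the data is ordered for binary search.
    usort($records, fn(array $a, array $b) => $a[0] <=> $b[0]);

    $body = '';
    foreach ($records as [$ipStart, $ipEnd, $areaCode]) {
        $body .= pack('lll', $ipStart, $ipEnd, $areaCode);
    }

    $dataStart = 8;                          // data begins right after the head
    $dataEnd = $dataStart + strlen($body);   // assumed: offset just past the last record

    $written = file_put_contents($path, pack('ll', $dataStart, $dataEnd) . $body);
    if ($written === false) {
        throw new RuntimeException("Cannot write $path");
    }
    return $written;
}
```

With 12 bytes per record and an 8-byte head, 444,290 records come to 5,331,488 bytes, which is consistent with the file size of about 5.08 MB mentioned above.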
4. Look up an IP using binary search; with about 440,000 records a lookup needs at most 19 probes.
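A binary search over the fixed 12-byte records could be sketched like this. The function name find_area_code is hypothetical, the whole file is assumed to be loaded into a string for simplicity, and the head's "end" value is assumed to be the offset just past the last record. The IP passed in must already be a signed 32-bit value, as produced by the ip2long32 conversion from step 1.

```php
<?php
// Hypothetical helper: return the area_code for $ip, or null if no range matches.
// $data is the full content of the binary IP library file.
function find_area_code(string $data, int $ip): ?int
{
    $head = unpack('lstart/lend', substr($data, 0, 8));
    $low = 0;
    $high = intdiv($head['end'] - $head['start'], 12) - 1;

    while ($low <= $high) {
        $mid = intdiv($low + $high, 2);
        $rec = unpack(
            'lip_start/lip_end/larea_code',
            substr($data, $head['start'] + $mid * 12, 12)
        );
        if ($ip < $rec['ip_start']) {
            $high = $mid - 1;          // target lies in an earlier range
        } elseif ($ip > $rec['ip_end']) {
            $low = $mid + 1;           // target lies in a later range
        } else {
            return $rec['area_code'];  // ip_start <= ip <= ip_end
        }
    }
    return null;
}
```

Each iteration halves the remaining range, so 444,290 records need at most 19 probes, since 2^19 = 524,288 is the first power of two above that count. For a long-running service, reading records with fseek and fread instead of loading the whole file is an alternative with the same logic.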
5. A single lookup test shows that the speed is acceptable
6. A simple load test to see the effect
A. Load testing with ab (ApacheBench), run from the local machine
B. The test script runs on a Linux test machine (a PC)
C. The load test script is as follows:
D. Load test command: ab -n 10000 -c http://192.168.206.71/ipdata.php?type=php
The performance is quite good.
That's all. If you know a better approach, let's discuss it. Thank you.