Create a binary IP library file

Source: Internet
Author: User
I. Preface

Anyone who has implemented IP address lookup has probably heard of the pure IP address library (the Chunzhen/QQWry database). A lookup against it returns a free-text description of where the IP address is located.

If you only need to look up the location text for a user's IP address and display it, a binary search that follows the file format of the IP library is enough. (Detailed format)

But what if you need to obtain the geographic description based on the IP address and associate it with your existing administrative region data table?

  1. Like this? Look up the text description by IP address -> extract the region name with a regular expression -> fetch the Region ID from the database or cache by region name
  2. Or like this? Import the IP library data into a database table (ip_start, ip_end, area_code) -> SQL query

Both approaches work, but what about efficiency? Very poor! Neither holds up under load, especially in high-concurrency applications.

So why not take the data of the pure IP library (other IP libraries work too), replace its region descriptions with your own Region IDs, and build a binary IP library file?

Let's get to the point and see how to create a binary IP library file from the pure IP library data.

Note: This article only describes the general idea and does not provide detailed code. Thank you.

II. Preparations

We need to prepare two pieces of data:

  1. A txt file extracted from the pure IP library.

The pure IP library download includes an ip.exe tool. Its decompress function generates the txt file mentioned above.

The generated data is shown in Figure 1-1. The version I used has about 444,290 records.

    

Figure 1-1

 

  2. Your own province/city cascading region table.

Plenty of such data sets are available online. The table structure is something like (area_id, area_level, area_name, area_pid), which stand for the Region ID, region level, region name, and parent Region ID.

Of course, you can also use different structures without affecting our processing.

 

III. Process

The data is ready. Now let's plan the structure of the IP library that we need to generate.

As the title says, the IP library we generate is a binary data file rather than an ordinary text file. So what does the structure of this file look like?


  

The structure is as follows:

    • Header. It occupies the first 8 bytes of the file. The first 4 bytes are a 32-bit integer holding the start position of the data section in the file; the last 4 bytes are another 32-bit integer holding the end position of the data section in the file.
    • Body (data section). It consists of N fixed-size structs of 12 bytes each. Each struct is one IP range record (ip_start, ip_end, area_code), and each of the three fields is a 32-bit signed integer of 4 bytes. (If the area_code values are small enough, that field can be stored as a 1-byte char instead.)
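The layout above can be sketched in PHP as follows. This is only a minimal illustration under some assumptions that the article does not confirm: every field uses the `l` format code of `pack()` (signed 32-bit, machine byte order), the "end position" in the header is taken to be the offset of the last record, and the sample ranges and area codes are made up.

```php
<?php
// Hypothetical sample ranges: [ip_start, ip_end, area_code] as signed 32-bit values.
$records = [
    [16777216, 16777471, 1001], // 1.0.0.0 - 1.0.0.255
    [16777472, 16778239, 1002], // 1.0.1.0 - 1.0.3.255
];

$headerSize = 8;  // two 32-bit integers
$recordSize = 12; // three 32-bit integers

// Body: N fixed 12-byte structs.
$body = '';
foreach ($records as [$start, $end, $code]) {
    $body .= pack('l3', $start, $end, $code);
}

// Header: offset of the first record, then offset of the last record (assumption).
$dataStart = $headerSize;
$dataEnd   = $headerSize + (count($records) - 1) * $recordSize;
$binary    = pack('l2', $dataStart, $dataEnd) . $body;

// Reading the header and the first record back.
$header = unpack('lstart/lend', substr($binary, 0, $headerSize));
$first  = unpack('lip_start/lip_end/larea_code', substr($binary, $header['start'], $recordSize));
```

Because every record has the same size, the record at index i always sits at `$header['start'] + i * 12`, which is what makes binary search on the file possible.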

  

With the structure of the IP data file settled, the rest is step-by-step processing.

    1. Read the IP text file line by line, convert each IP address to a 32-bit signed integer (with a custom ip2long), and work out the final region from the regional text.

A. Each line of the IP text file follows this rule: the first 15 bytes are the start IP address, the next 15 bytes are the end IP address, and the rest is the regional text description.

B. Converting the IP address to a 32-bit signed integer means it occupies only 4 bytes, and it also works around the problem that the PHP function ip2long returns different values on 32-bit and 64-bit systems. The new function is as follows:

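function ip2Long32($ip) {
    // Pack as a signed 32-bit integer and unpack it again, so the result
    // is the same on 32-bit and 64-bit PHP builds.
    $ip = unpack('l', pack('l', ip2long($ip)));
    return $ip[1];
} // end func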

Of course, you can also develop your own PHP extension, see here: http://www.cnblogs.com/iblaze/archive/2013/06/02/3112603.html

C. Extract the region name (province, city, county, or district). For addresses outside the country, keep only the country.
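Points A to C can be sketched roughly as below. The `parseIpLine` helper and the sample line are hypothetical; instead of cutting at fixed byte offsets, this sketch splits on whitespace, which is an assumption about the exported file rather than something the article specifies. Extracting the province or city from the region text depends on your own data and is left out.

```php
<?php
// From the article: convert a dotted IP address to a signed 32-bit integer.
function ip2Long32($ip) {
    $ip = unpack('l', pack('l', ip2long($ip)));
    return $ip[1];
}

// Hypothetical helper: split one exported line into start IP, end IP, and region text.
function parseIpLine($line) {
    if (!preg_match('/^(\S+)\s+(\S+)\s+(.*)$/', trim($line), $m)) {
        return null; // the line does not match the assumed layout
    }
    return [
        'ip_start' => ip2Long32($m[1]),
        'ip_end'   => ip2Long32($m[2]),
        'region'   => trim($m[3]),
    ];
}

// Made-up sample line in the assumed layout.
$row = parseIpLine('1.0.1.0         1.0.3.255       Fujian Province Fuzhou City');
```

Addresses from 128.0.0.0 upward come out negative, for example `ip2Long32('192.168.1.1')` gives -1062731519.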

      

  

    2. Convert the obtained region information to the Region ID.

    This part is hard to describe in detail because everyone's region data is different, but the general principle is: first look up the ID by the lowest-level region name (depending on your data, you may need to strip suffixes such as city or county). If that fails, look up the level above it, and repeat until a Region ID is found.

If no Region ID is found, it is classified as unknown.
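A minimal sketch of that fallback loop, assuming the region names have already been split into levels and that the region table has been loaded into an in-memory array. The `findAreaId` helper, the `$areaIds` map, and the sample IDs are all hypothetical.

```php
<?php
// Hypothetical in-memory copy of the region table: area_name => area_id.
$areaIds = [
    'Fujian' => 350000,
    'Fuzhou' => 350100,
];

// Hypothetical helper. $names is ordered from the highest level to the lowest,
// e.g. ['Fujian', 'Fuzhou', 'Gulou'].
function findAreaId(array $names, array $areaIds, $unknownId = 0) {
    // Try the lowest level first, then walk up one level at a time.
    for ($i = count($names) - 1; $i >= 0; $i--) {
        if (isset($areaIds[$names[$i]])) {
            return $areaIds[$names[$i]];
        }
    }
    return $unknownId; // nothing matched: classify as unknown
}

$id = findAreaId(['Fujian', 'Fuzhou', 'Gulou'], $areaIds);
```

Here 'Gulou' is not in the sample table, so the lookup falls back to 'Fuzhou'. In a real table, names are not unique across parents, so a lookup keyed on (area_pid, area_name) would be safer than the flat map used here.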

    3. Packing. The packed file is about 5.08 MB (roughly 444,290 records of 12 bytes each, plus the 8-byte header).

      Packing rule. The value in format corresponds to the format code of the pack function:

      

One thing to note: because IP addresses are converted to 32-bit signed integers, addresses from 128.0.0.0 upward become negative. These negative values must be detected and placed at the front of our IP library, because binary search requires ordered data.
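One way to get that ordering is simply to sort the records by their signed ip_start value before packing. This is a sketch with made-up ranges and area codes, not necessarily how the original script handled it.

```php
<?php
// From the article: convert a dotted IP address to a signed 32-bit integer.
function ip2Long32($ip) {
    $ip = unpack('l', pack('l', ip2long($ip)));
    return $ip[1];
}

// Hypothetical ranges in the order they appear in the text file.
$records = [
    [ip2Long32('1.0.0.0'),   ip2Long32('1.0.0.255'),       1001],
    [ip2Long32('127.0.0.0'), ip2Long32('127.255.255.255'), 1002],
    [ip2Long32('128.0.0.0'), ip2Long32('128.0.255.255'),   1003],
    [ip2Long32('223.0.0.0'), ip2Long32('223.255.255.255'), 1004],
];

// Sort by signed ip_start, so negative values (128.0.0.0 and above) come first.
usort($records, function ($a, $b) {
    return $a[0] <=> $b[0];
});

// Area codes in their final on-disk order.
$order = array_column($records, 2);
```

After sorting, the ranges starting at 128.0.0.0 and 223.0.0.0 precede those starting at 1.0.0.0 and 127.0.0.0. A range that straddles 127.255.255.255 and 128.0.0.0 would need to be split first, since its signed start would be greater than its signed end.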

      

    4. Search for IP addresses. Use binary search: with about 444,290 records, a lookup needs at most 19 comparisons. For example:
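The figure of 19 follows from 2^18 = 262,144 < 444,290 < 524,288 = 2^19. A lookup could be sketched as below; `findAreaCode` is a hypothetical helper, the sample file is built in memory instead of being read from disk, and the header is assumed to hold the offsets of the first and last records.

```php
<?php
// From the article: convert a dotted IP address to a signed 32-bit integer.
function ip2Long32($ip) {
    $ip = unpack('l', pack('l', ip2long($ip)));
    return $ip[1];
}

// Hypothetical helper: binary search over the packed records for $ip.
function findAreaCode($binary, $ip) {
    $target = ip2Long32($ip);
    $header = unpack('lstart/lend', substr($binary, 0, 8));
    $low  = 0;
    $high = intdiv($header['end'] - $header['start'], 12); // index of the last record
    while ($low <= $high) {
        $mid = intdiv($low + $high, 2);
        $rec = unpack(
            'lip_start/lip_end/larea_code',
            substr($binary, $header['start'] + $mid * 12, 12)
        );
        if ($target < $rec['ip_start']) {
            $high = $mid - 1;
        } elseif ($target > $rec['ip_end']) {
            $low = $mid + 1;
        } else {
            return $rec['area_code'];
        }
    }
    return null; // no range covers this address
}

// Tiny sample file: header plus three made-up records sorted by signed ip_start.
$rows = [
    ['128.0.0.0', '128.0.255.255', 1003],
    ['1.0.0.0',   '1.0.0.255',     1001],
    ['1.0.1.0',   '1.0.3.255',     1002],
];
$binary = pack('l2', 8, 8 + (count($rows) - 1) * 12);
foreach ($rows as [$start, $end, $code]) {
    $binary .= pack('l3', ip2Long32($start), ip2Long32($end), $code);
}

$code = findAreaCode($binary, '1.0.2.7');
```

For a real file you would read it once with `file_get_contents()` or seek with `fseek()` and `fread()`; which is faster depends on file size and how the script is deployed.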

      

    

    5. In a single test, the speed looks good.

      

    6. A simple stress test to check the effect

A. Stress test with ab (ApacheBench), run from the local machine

B. The test script runs on a Linux test machine (an ordinary PC)

C. The stress test script is as follows:

      

D. Stress test command: ab -n 10000 -c 50 http://192.168.206.71/ipdata.php?type=php

      

Good performance. Haha

      

 

That's all. Feel free to discuss it with me. Thank you ~

      
