Analysis of the Coral (CoralWry) IP database

Source: Internet
Author: User

This is nothing new; others did it long ago.
The idea is to use PHP to read the QQWry ("pure") IP database or the Coral (CoralWry) IP database and look up a visitor's physical location from their IP address.

I will post the code first and then analyze it step by step. I hope it helps anyone who wants to understand this topic.

The code targets PHP 5 only. I will keep optimizing it.

<?php
class IpLocation {
    private $fp;              // handle of the opened database file
    private $wrydat;          // path of the database file
    private $wrydat_version;
    private $ipnumber;        // number of index records
    private $firstip;         // offset of the first index record
    private $lastip;          // offset of the last index record
    private $ip_range_begin;
    private $ip_range_end;
    private $country;
    private $area;
    const REDIRECT_MODE_0 = 0;
    const REDIRECT_MODE_1 = 1;
    const REDIRECT_MODE_2 = 2;

    function __construct() {
        $args = func_get_args();
        $this->wrydat = func_num_args() > 0 ? $args[0] : 'CoralWry.dat';
        $this->initialize();
    }

    function __destruct() {
        if (is_resource($this->fp)) {
            fclose($this->fp);
        }
    }

    private function initialize() {
        if (file_exists($this->wrydat)) {
            $this->fp = fopen($this->wrydat, 'rb');
        }
        // Fail early instead of reading from a missing file.
        if (!is_resource($this->fp)) {
            throw new RuntimeException('Cannot open IP database: ' . $this->wrydat);
        }
        $this->getipnumber();
        $this->getwryversion();
    }

    public function get($str) {
        return $this->$str;
    }

    public function set($str, $val) {
        $this->$str = $val;
    }

    // Read $length bytes, optionally seeking to $offset first.
    private function getbyte($length, $offset = null) {
        if (!is_null($offset)) {
            fseek($this->fp, $offset, SEEK_SET);
        }
        $b = fread($this->fp, $length);
        return $b;
    }
    /**
     * Pack an IP address into binary data in big-endian format.
     * The database stores IP values in little endian (low-order byte first), for example:
     *   00 28 C6 DA  218.198.40.0   little endian
     *   3F 28 C6 DA  218.198.40.63  little endian
     * Data in this form cannot be compared directly during the binary search, so each
     * IP read from the file must first be converted to big endian with strrev().
     * @param string $ip dotted-quad IP address
     * @return string 4 bytes of binary data in big-endian format
     */
    private function packip($ip) {
        return pack("N", intval(ip2long($ip)));
    }

    // Read a little-endian integer of $length bytes (4 or fewer).
    private function getlong($length = 4, $offset = null) {
        // Pad short reads (such as 3-byte offsets) with zero bytes up to 4 bytes.
        $chr = '';
        for ($c = 0; $length % 4 != 0 && $c < (4 - $length % 4); $c++) {
            $chr .= chr(0);
        }
        $var = unpack("Vlong", $this->getbyte($length, $offset) . $chr);
        return $var['long'];
    }

    // The version string sits immediately before the index area.
    private function getwryversion() {
        $length = preg_match("/coral/i", $this->wrydat) ? 26 : 30;
        $this->wrydat_version = $this->getbyte($length, $this->firstip - $length);
    }

    // Read the 8-byte file header; every index record is 7 bytes long.
    private function getipnumber() {
        $this->firstip = $this->getlong();
        $this->lastip = $this->getlong();
        $this->ipnumber = ($this->lastip - $this->firstip) / 7 + 1;
    }

    // Read a NUL-terminated string, appending it to $data.
    private function getstring($data = "", $offset = null) {
        $char = $this->getbyte(1, $offset);
        while (ord($char) > 0) {
            $data .= $char;
            $char = $this->getbyte(1);
        }
        return $data;
    }

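    // Binary search over the index area; returns the offset of the matching index record.
    private function iplocaltion($ip) {
        $ip = $this->packip($ip);
        $low = 0;
        $high = $this->ipnumber - 1;
        $ipposition = $this->lastip;
        while ($low <= $high) {
            $t = (int) floor(($low + $high) / 2);
            // Start IP of index record $t, reversed to big endian.
            // strcmp() forces a byte-wise comparison of the two binary strings.
            if (strcmp($ip, strrev($this->getbyte(4, $this->firstip + $t * 7))) < 0) {
                $high = $t - 1;
            } else {
                // The next 3 bytes point to the data record, which begins with the end IP.
                if (strcmp($ip, strrev($this->getbyte(4, $this->getlong(3)))) > 0) {
                    $low = $t + 1;
                } else {
                    $ipposition = $this->firstip + $t * 7;
                    break;
                }
            }
        }
        return $ipposition;
    }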
    // Read the area string at the current file position, following a redirect if present.
    private function getarea() {
        $b = $this->getbyte(1);
        switch (ord($b)) {
            case self::REDIRECT_MODE_0:
                return "unknown";
            case self::REDIRECT_MODE_1:
            case self::REDIRECT_MODE_2:
                return $this->getstring("", $this->getlong(3));
            default:
                return $this->getstring($b);
        }
    }
    public function getiplocation($ip) {
        $ippos = $this->iplocaltion($ip);
        $this->ip_range_begin = long2ip($this->getlong(4, $ippos));
        $this->ip_range_end = long2ip($this->getlong(4, $this->getlong(3)));
        $b = $this->getbyte(1);
        switch (ord($b)) {
            case self::REDIRECT_MODE_1:
                // Country and area are both stored at the redirected offset.
                $b = $this->getbyte(1, $this->getlong(3));
                if (ord($b) == self::REDIRECT_MODE_2) {
                    $countryoffset = $this->getlong(3);
                    $this->area = $this->getarea();
                    $this->country = $this->getstring("", $countryoffset);
                } else {
                    $this->country = $this->getstring($b);
                    $this->area = $this->getarea();
                }
                break;

            case self::REDIRECT_MODE_2:
                // Only the country is redirected; the area follows the 3-byte offset.
                $countryoffset = $this->getlong(3);
                $this->area = $this->getarea();
                $this->country = $this->getstring("", $countryoffset);
                break;

            default:
                $this->country = $this->getstring($b);
                $this->area = $this->getarea();
                break;
        }
    }
}
/**/
echo microtime();
echo "\n";
$iploca = new IpLocation;
// $iploca = new IpLocation('qqwry.dat');
echo $iploca->get('wrydat_version');
echo "\n";
echo $iploca->get('ipnumber');
echo "\n";
$iploca->getiplocation('218.198.40.1'); // sample address, taken from the range in the packip() comment
/**/
echo $iploca->get('ip_range_begin');
echo "\n";
echo $iploca->get('ip_range_end');
echo "\n";
echo $iploca->get('country');
echo "\n";
echo $iploca->get('area');

echo "\n";
echo $iploca->get('lastip');
echo "\n";
echo microtime();
echo "\n";
unset($iploca);
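
The comment on packip() says that little-endian data cannot be compared during the binary search. The following is a minimal sketch of why, using only the PHP built-ins pack(), ip2long(), strrev() and strcmp(). The helper names pack_ip_be() and pack_ip_le() and the two sample addresses are assumptions made for illustration; they are not part of the class.

```php
<?php
// Hypothetical helpers for illustration; they are not part of IpLocation.

// Big-endian ("N") packing, as packip() does.
function pack_ip_be($ip) {
    return pack('N', ip2long($ip));
}

// Little-endian ("V") packing, the order in which the .dat file stores an IP.
function pack_ip_le($ip) {
    return pack('V', ip2long($ip));
}

$low  = '218.198.40.63'; // numerically smaller address
$high = '218.199.0.1';   // numerically larger address

// Big-endian byte strings sort in the same order as the addresses themselves.
$be_ordered = strcmp(pack_ip_be($low), pack_ip_be($high)) < 0;

// Little-endian byte strings are compared starting from the lowest octet,
// so the order comes out wrong here (0x3F is greater than 0x01).
$le_ordered = strcmp(pack_ip_le($low), pack_ip_le($high)) < 0;

// Reversing the stored bytes restores the big-endian form.
$restored = strrev(pack_ip_le($low)) === pack_ip_be($low);

var_dump($be_ordered, $le_ordered, $restored); // true, false, true
```

This is why the search reverses every IP it reads from the file before comparing it with the packed query address.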

Reference: LumaQQ's explanation of the QQWry ("pure") IP database format

The CoralWry.dat file structure is divided into three areas:

  • File header [fixed 8 bytes]
  • Data area [variable length, stores the IP address information records]
  • Index area [its position and size are given by the file header]

All data in the file is stored in little-endian byte order.
The passage quoted below explains the difference between little endian and big endian, using Unicode encoding as the example.

Quote:

Big endian and little endian are two ways in which CPUs handle multi-byte numbers. For example, the Unicode code of the Chinese character "汉" is 6C49. When writing it to a file, should 6C or 49 come first? Writing 6C first is big endian; writing 49 first is little endian.
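
The 6C49 example can be reproduced with pack(). This is a small sketch that relies only on the documented 16-bit format codes "n" (big endian) and "v" (little endian); the variable names are arbitrary.

```php
<?php
// The code 6C49 written as a 16-bit value in both byte orders.
$code = 0x6C49;

$big    = pack('n', $code); // "n": unsigned 16-bit, big endian
$little = pack('v', $code); // "v": unsigned 16-bit, little endian

echo bin2hex($big), "\n";    // 6c49 (6C written first)
echo bin2hex($little), "\n"; // 496c (49 written first)

// Decoding with the matching format code recovers the original value.
$decoded = unpack('vvalue', $little);
printf("%04X\n", $decoded['value']); // 6C49
```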

The word "endian" comes from Gulliver's Travels. The civil war in Lilliput broke out over whether an egg should be cracked at the big end (Big-Endian) or at the little end (Little-Endian). The dispute led to six rebellions, in which one emperor lost his life and another lost his throne.

"Endian" is generally translated as "byte order", and big endian and little endian are also called "big tail" and "small tail".

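File header:
The file header is the area outlined in red in the figure. Its first four bytes hold the start address of the index area, and its last four bytes hold the end address of the index area (judging by how getipnumber() uses it, this is the offset of the last index record).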

[Figure: hex view of CoralWry.dat with the 8-byte file header outlined in red]

Because the database stores values in little-endian byte order, the header bytes have to be decoded accordingly.
Read bytes 0-3 of the file header and pass them to unpack() with the "V" format code, which interprets the binary data as an unsigned 32-bit little-endian integer. Bytes 4-7 are handled the same way.
After processing, the start address of the index area is 00077450 and the end address of the index area is 000CE17C.
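
The decoding step can be sketched as follows. The eight header bytes are reconstructed from the two addresses quoted above rather than read from a real file, and the record count assumes the 7-byte index record size used by getipnumber().

```php
<?php
// The 8 header bytes implied by the two addresses, in file (little-endian) order.
// Reconstructed from the text for illustration, not read from CoralWry.dat.
$header = pack('H*', '507407007ce10c00');

// "V" decodes an unsigned 32-bit little-endian integer; two named fields.
$fields = unpack('Vfirst/Vlast', $header);

printf("%08X\n", $fields['first']); // 00077450
printf("%08X\n", $fields['last']);  // 000CE17C

// Same formula as getipnumber(): 7 bytes per index record.
$count = ($fields['last'] - $fields['first']) / 7 + 1;
echo $count, "\n"; // 50805
```

With a real file, the first line would instead read the header with fread($fp, 8) on a handle opened in 'rb' mode.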
If you have UltraEdit at hand, you can open CoralWry.dat and jump to offset 00077450, which is the beginning of the IP address index area.
[Figure: hex view of CoralWry.dat at offset 00077450, with the start of the index area outlined in red]

The red frame marks the starting position of the index area.
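
Judging by the class above, each index record is 7 bytes: a 4-byte little-endian start IP followed by a 3-byte little-endian offset into the data area. The sketch below decodes one such record. The function decode_index_record() is a hypothetical helper, and the sample record combines the IP bytes from the packip() comment with a made-up offset.

```php
<?php
// Hypothetical helper: split one 7-byte index record into its two fields.
function decode_index_record($record) {
    // First 4 bytes: start IP of the range, little endian.
    $ip = unpack('Vip', substr($record, 0, 4));
    // Last 3 bytes: offset of the data record. Pad to 4 bytes so "V" can
    // decode it, which is what getlong(3) does.
    $off = unpack('Voffset', substr($record, 4, 3) . "\x00");
    return array(
        'start_ip' => long2ip($ip['ip']),
        'offset'   => $off['offset'],
    );
}

// Synthetic record: IP bytes 00 28 C6 DA plus the made-up offset 0x012345.
$record = pack('H*', '0028c6da' . '452301');
$entry  = decode_index_record($record);

echo $entry['start_ip'], "\n";        // 218.198.40.0
printf("0x%06X\n", $entry['offset']); // 0x012345
```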
