Of the IP databases circulating on the Internet, the Pure edition is the most popular, and LumaQQ also uses it as the basis of its IP lookup feature. Documentation on its format is scarce, however. I eventually found a single document online that explains its internals, but it is terse and takes some patience to read. I think it is worthwhile to rewrite it here as part of the LumaQQ developer documentation. This article describes the format of the Pure IP database in detail and gives some examples for reference.
Luma, Tsinghua University
Modification Date: 2005/01/14
Note: Thanks to Kim Fox, the author of the Pure IP database, and to the author of the only existing document on its format.
Revision history:
2005-01-14 Fixed some unclear and incorrect wording in the original
Ever since IP databases appeared, QQ plug-ins that display a contact's IP address have appeared with them. My view is fairly narrow, so I do not know whether they have other applications, but the IP database really is a good thing. I believe the most popular IP database on the Internet today is the Pure edition (do not flame me if I am wrong). So far it holds close to 30,000 IP records, and for some IPs it is accurate down to the floor of a building, which is quite satisfying. In May 2004, when LumaQQ was just breaking ground, I also adopted the Pure edition IP database in order to add the IP display feature that everyone likes, although nobody seems to know why they like it. Its advantages are a large number of records, fast queries, and a single file, QQWry.dat, that contains all the records, which makes it easy to embed in other programs and easy to upgrade.
Basic structure
The QQWry.dat file has three parts: the file header, the record area, and the index area. To look up an IP, we normally first search the index area for the offset of the record, then read the information from the record area. Because records in the record area are variable in length, searching the record area directly is not possible. Because there are many records, a linear scan of the index area would be slow. Instead we can use binary search on the index area, which is several orders of magnitude faster than a linear scan. Figure 1 shows the structure of the QQWry.dat file.
[Img]/upload/2007721172051220.gif[img]
Figure 1. QQWry.dat file structure
Note that every multi-byte value in QQWry.dat is stored in little-endian byte order.
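As a small illustration of what little-endian storage means for the 3-byte and 4-byte fields used throughout the file, here is a sketch in Python. The helper name read_uint_le and the sample bytes are made up for this example and do not come from any existing QQWry.dat tool.

```python
def read_uint_le(data: bytes, offset: int, size: int) -> int:
    """Decode an unsigned little-endian integer of `size` bytes at `offset`."""
    return int.from_bytes(data[offset:offset + size], "little")


# A 3-byte pointer stored on disk as 10 27 00 decodes to 0x002710.
pointer = read_uint_le(bytes([0x10, 0x27, 0x00]), 0, 3)

# An IP address stored on disk as 00 00 6F A6 decodes to 0xA66F0000,
# that is, 166.111.0.0 (0xA6 = 166, 0x6F = 111).
ip = read_uint_le(bytes([0x00, 0x00, 0x6F, 0xA6]), 0, 4)
```

The least significant byte comes first, so a field cannot simply be read left to right as it appears in a hex dump.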
I. Understanding the file header
The file header of QQWry.dat is only 8 bytes long and its structure is very simple: the first four bytes are the absolute offset of the first index entry, and the last four bytes are the absolute offset of the last index entry.
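A minimal sketch of reading the header in Python, assuming the whole file has already been loaded into a bytes object. The function name parse_header and the file path in the comment are illustrative only and are not taken from LumaQQ.

```python
import struct


def parse_header(data: bytes):
    """Return (first_index_offset, last_index_offset) from the 8-byte header.

    "<II" means two unsigned 32-bit integers in little-endian byte order.
    """
    first_index, last_index = struct.unpack_from("<II", data, 0)
    return first_index, last_index


# Hypothetical usage, assuming a local copy of the database:
# with open("QQWry.dat", "rb") as f:
#     first_index, last_index = parse_header(f.read())
```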
II. Understanding the record area
Each IP record consists of a country name and a region name. "Country" and "region" are not quite accurate terms here, because you may come across something like "Tsinghua University, Department of Computer Science", in which Tsinghua University becomes the country name. The terms simply reflect how the IP database was originally produced. The record format is a bit like a QName, made up of a global part and a local part, but we will continue to use the terms country name and region name.
So we might imagine that the format of a record is [IP address][country name][region name]. That is correct, but it is only the simplest case. Country names and region names are obviously repeated a great deal, and storing a full copy of each name in every record would be very wasteful, so redirection is needed to save space. A country name or region name can therefore take one of two forms. The first is the string itself. The second is a 4-byte structure: the first byte indicates the redirect mode, and the following 3 bytes are the actual offset of the country name or region name. For country names the situation can be more complex, because there may be up to two levels of redirection.
So what are the redirect modes? As described above, the format of a record is [IP address][country record][region record]. If the country record is a redirect, the region record may or may not follow it, so there are two cases, which I call mode 1 and mode 2. The figures below illustrate these formats:
Figure 2. The simplest IP record format
Figure 2 shows the simplest IP record format, and I don't think there's anything to explain.
Figure 3. Redirect mode 1
Figure 3 illustrates redirect mode 1. In mode 1 the region record travels with the country record: after the IP address only the 4 bytes of the country record remain. The first byte is the flag, and the following 3 bytes form a pointer to the actual country name, which is in turn followed by the region name. The flag byte for mode 1 is 0x01.
Figure 4. Redirect mode 2
Figure 4 illustrates redirect mode 2, whose flag byte is 0x02. In mode 2 the region record does not travel with the country record, so the region record comes right after the 4 bytes of the country record. By now you have probably worked out the difference between the two modes: in mode 1 no region record follows the country record, while in mode 2 a region record does follow it. Now let's look at some more complicated cases.
Figure 5. Mixed case 1
Figure 5 illustrates a more complex case that can occur when the country record uses mode 1: the location that the redirect points to is itself a redirect, and this second redirect is mode 2. Don't worry, there is no mode 3. There are at most two levels of redirection. If a second redirect occurs it must be mode 2, and this only happens for country records. For region records, mode 1 and mode 2 are treated the same, and a region record is never redirected twice. The picture can get more complex still, as shown in Figure 6:
Figure 6. Mixed case 2
Figure 6 is the most complex mixed case under mode 1, but I think it is still easy to understand: the only addition is that the region record is also redirected. One reminder: if the redirect offset is 0, the region name is unknown.
To summarize: an IP record consists of [IP address][country record][region record]. A country record has three possible representations: string form, redirect mode 1, and redirect mode 2. A region record has two: string form and redirect. There is one more rule: a country record in redirect mode 1 cannot be followed by a region record. Combining these representations according to these rules yields every possible layout of an IP record.
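The rules summarized above can be sketched as a small Python decoder. This is only an illustration under stated assumptions, not LumaQQ's actual code. All function names are invented for the example, and it assumes that strings are NUL-terminated and GBK-encoded, which this article does not state explicitly.

```python
REDIRECT_MODE_1 = 0x01
REDIRECT_MODE_2 = 0x02


def read_offset3(data, pos):
    """Read a 3-byte little-endian offset."""
    return int.from_bytes(data[pos:pos + 3], "little")


def read_string(data, pos):
    """Return (text, position after the terminator).

    Assumption: strings are NUL-terminated and GBK-encoded.
    """
    end = data.index(b"\x00", pos)
    return data[pos:end].decode("gbk", errors="replace"), end + 1


def read_region(data, pos):
    """Region record: a string, or one redirect (flag 0x01 and 0x02 are equivalent)."""
    if data[pos] in (REDIRECT_MODE_1, REDIRECT_MODE_2):
        target = read_offset3(data, pos + 1)
        if target == 0:
            return ""  # a redirect offset of 0 means the region is unknown
        return read_string(data, target)[0]
    return read_string(data, pos)[0]


def read_country_and_region(data, pos):
    """Decode a country record at pos that is a string or a mode 2 redirect."""
    if data[pos] == REDIRECT_MODE_2:
        country = read_string(data, read_offset3(data, pos + 1))[0]
        return country, read_region(data, pos + 4)  # region follows the 4 bytes
    country, after = read_string(data, pos)
    return country, read_region(data, after)  # region follows the string


def read_record(data, record_offset):
    """Return (country, region) for the IP record at record_offset."""
    pos = record_offset + 4  # skip the 4-byte IP address
    if data[pos] == REDIRECT_MODE_1:
        # Mode 1: country and region both live at the redirect target,
        # which may itself start with a mode 2 redirect (Figures 5 and 6).
        pos = read_offset3(data, pos + 1)
    return read_country_and_region(data, pos)
```

Handling mode 1 as "jump, then decode as usual" keeps the at-most-two-levels rule in one place: after the jump, only a string or a mode 2 redirect is expected.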
The rationale for the design
Before moving on to the structure of the index area, let's look at why the record area is designed this way. You may have guessed the answer already: string reuse. With this structure, each country name and each region name needs to be stored only once. To illustrate, we use lowercase letters to denote IP records, C to denote country names, and A to denote region names:
There are two records a(C1, A1) and b(C2, A2). If C1 = C2 and A1 = A2, we can use the structure shown in Figure 3 to achieve reuse.
There are three records a(C1, A1), b(C2, A2), and c(C3, A3). If C1 = C2 and A2 = A3, and we now want to store record b, we can use the structure shown in Figure 6 to achieve reuse.
There are two records a(C1, A1) and b(C2, A2). If C1 = C2 and we now want to store record b, we can represent C2 with mode 2 and A2 as a string.
You can come up with more examples, and you will find that with this structure each distinct string needs to be stored only once.
Understanding Index Areas
In the section on the file header, we explained that the header is really just two pointers, holding the absolute offsets of the first and last index entries respectively, as shown in Figure 8:
Figure 8. The file header pointing to the index area
Simple, isn't it? From the file header you can locate the index area, and then you can start searching for an IP. Each index entry is 7 bytes long: the first 4 bytes are the start IP address, and the last 3 bytes point to the IP record. A few concepts need explaining here. What is the start IP, and is there an end IP? Suppose there is a record for 166.111.0.0-166.111.255.255. Then 166.111.0.0 is the start IP and 166.111.255.255 is the end IP, and the end IP is stored in the first 4 bytes of the IP record, which should now make sense. So each index entry combines with its record to form an IP range. If you are looking for the location of 166.111.138.138, you will find that it falls within the range 166.111.0.0-166.111.255.255, so you can follow that index entry to read the country and region names. Here is the most detailed illustration of all:
Figure 9. Detailed file structure
Everything should be clear now. There may be one more thing you are wondering: where is the version information of QQWry.dat? The answer is that the last IP record is actually the version information. The last record reads: 255.255.255.0 255.255.255.255 Pure Network, IP data of June 25, 2004. OK, now you should understand everything.
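To tie the index area together, here is a sketch of the binary search over index entries in Python. It assumes the whole file is held in a bytes object and that index entries are sorted by start IP, which binary search requires. The names ip_to_int and find_record_offset are invented for this example.

```python
import socket
import struct

INDEX_ENTRY_SIZE = 7  # 4-byte start IP + 3-byte record offset


def ip_to_int(ip_text):
    """Convert dotted-quad text to an integer, e.g. "166.111.0.0" -> 0xA66F0000."""
    return struct.unpack(">I", socket.inet_aton(ip_text))[0]


def find_record_offset(data, ip_text):
    """Return the offset of the record whose range contains ip_text, or None."""
    ip = ip_to_int(ip_text)
    first_index, last_index = struct.unpack_from("<II", data, 0)
    lo, hi = 0, (last_index - first_index) // INDEX_ENTRY_SIZE
    found = None
    # Find the last index entry whose start IP is <= the IP we are looking for.
    while lo <= hi:
        mid = (lo + hi) // 2
        entry = first_index + mid * INDEX_ENTRY_SIZE
        start_ip = int.from_bytes(data[entry:entry + 4], "little")
        if start_ip <= ip:
            found = entry
            lo = mid + 1
        else:
            hi = mid - 1
    if found is None:
        return None
    record_offset = int.from_bytes(data[found + 4:found + 7], "little")
    # The end IP is stored in the first 4 bytes of the record itself.
    end_ip = int.from_bytes(data[record_offset:record_offset + 4], "little")
    return record_offset if ip <= end_ip else None
```

The returned offset points at the start of the IP record, so the country and region records begin 4 bytes after it.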