Latest IP Address Database: Binary Approximation and Binary Search for Efficiently Resolving the Regional Distribution of 8 Million Records

Source: Internet
Author: User

The latest IP address database is from qqzeng.com

Using the binary approximation (bisection) method, resolving more than 8 million IP addresses takes only tens of seconds, which is far more efficient than a sequential scan.

The original version uses a sequential search, which is inefficient:

using System;
using System.IO;
using System.Text;

public class IpLocation
{
    readonly string ipBinaryFilePath = "qqzengipdb.dat";
    readonly byte[] dataBuffer, indexBuffer;
    readonly uint[] index = new uint[256];
    readonly int dataLength;

    public IpLocation()
    {
        try
        {
            FileInfo file = new FileInfo(ipBinaryFilePath);
            dataBuffer = new byte[file.Length];
            using (var fin = new FileStream(file.FullName, FileMode.Open, FileAccess.Read))
            {
                fin.Read(dataBuffer, 0, dataBuffer.Length);
            }

            // First four bytes: length of the index area, big endian
            var offset_len = BytesToLong(dataBuffer[0], dataBuffer[1], dataBuffer[2], dataBuffer[3]);
            indexBuffer = new byte[offset_len];
            Array.Copy(dataBuffer, 4, indexBuffer, 0, offset_len);
            dataLength = (int)offset_len;

            for (int loop = 0; loop < 256; loop++)
            {
                // Prefix index: one four-byte little-endian entry per first octet
                index[loop] = BytesToLong(indexBuffer[loop * 4 + 3], indexBuffer[loop * 4 + 2], indexBuffer[loop * 4 + 1], indexBuffer[loop * 4]);
            }
        }
        catch { }
    }

    public string[] Find(string ip)
    {
        var ips = ip.Split('.');
        uint ip_prefix = uint.Parse(ips[0]);
        // Query address packed as a big-endian uint
        uint find_uint32 = BytesToLong(byte.Parse(ips[0]), byte.Parse(ips[1]), byte.Parse(ips[2]), byte.Parse(ips[3]));

        int max_len = 0;
        int resultOffset = -1;
        int resultLength = -1;

        // Byte range of the 8-byte index entries that share this first octet
        uint start = index[ip_prefix] * 8 + 1024;
        if (ip_prefix != 255)
        {
            max_len = (int)index[ip_prefix + 1] * 8 + 1024;
        }
        else
        {
            max_len = (int)index[255] * 8 + 1024 + 1;
        }

        // Sequential scan: stop at the first entry whose end IP is >= the query
        for (; start < max_len; start += 8)
        {
            uint endipNum = BytesToLong(indexBuffer[start + 0], indexBuffer[start + 1], indexBuffer[start + 2], indexBuffer[start + 3]); // big endian
            if (endipNum >= find_uint32)
            {
                resultOffset = (int)BytesToLong((byte)0, indexBuffer[start + 6], indexBuffer[start + 5], indexBuffer[start + 4]); // little endian
                resultLength = 0xFF & indexBuffer[start + 7]; // length
                break;
            }
        }

        if (resultOffset == -1 || resultLength == -1)
        {
            return new string[] { "N/A" };
        }

        var areaBytes = new byte[resultLength];
        Array.Copy(dataBuffer, dataLength + resultOffset - 1024, areaBytes, 0, resultLength);
        // The field delimiter was lost in the source text; a tab is assumed here
        return Encoding.UTF8.GetString(areaBytes).Split('\t');
    }

    private static uint BytesToLong(byte a, byte b, byte c, byte d)
    {
        return ((uint)a << 24) | ((uint)b << 16) | ((uint)c << 8) | (uint)d;
    }

    public static string long2IP(long longIP)
    {
        StringBuilder sb = new StringBuilder("");
        sb.Append(longIP >> 24);
        sb.Append(".");
        // Clear the high 8 bits, then shift right by 16
        sb.Append((longIP & 0x00FFFFFF) >> 16);
        sb.Append(".");
        sb.Append((longIP & 0x0000FFFF) >> 8);
        sb.Append(".");
        sb.Append(longIP & 0x000000FF);
        return sb.ToString();
    }
}
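The lookup code above implies a record layout: each index entry is 8 bytes, holding a 4-byte big-endian end IP, a 3-byte little-endian offset, and a 1-byte length. The following is a minimal sketch of decoding one such entry, assuming that layout is read correctly from the code. `IndexRecord`, `RecordDecoder`, `IpToUInt32`, and `Decode` are hypothetical names used for illustration and are not part of the qqzeng code.

```csharp
using System;

// Hypothetical helper types for illustration; not part of the original IpLocation class.
public readonly struct IndexRecord
{
    public readonly uint EndIp;  // last IP covered by this range
    public readonly int Offset;  // offset of the region text
    public readonly int Length;  // length of the region text in bytes

    public IndexRecord(uint endIp, int offset, int length)
    {
        EndIp = endIp;
        Offset = offset;
        Length = length;
    }
}

public static class RecordDecoder
{
    // Packs a dotted IPv4 string into a big-endian uint, as BytesToLong does in the article.
    public static uint IpToUInt32(string ip)
    {
        string[] parts = ip.Split('.');
        if (parts.Length != 4) throw new FormatException("Expected four octets: " + ip);
        return ((uint)byte.Parse(parts[0]) << 24)
             | ((uint)byte.Parse(parts[1]) << 16)
             | ((uint)byte.Parse(parts[2]) << 8)
             | byte.Parse(parts[3]);
    }

    // Decodes the 8-byte index entry that starts at position pos (assumed layout).
    public static IndexRecord Decode(byte[] buffer, int pos)
    {
        // Bytes 0..3: end IP, big endian
        uint endIp = ((uint)buffer[pos] << 24) | ((uint)buffer[pos + 1] << 16)
                   | ((uint)buffer[pos + 2] << 8) | buffer[pos + 3];
        // Bytes 4..6: offset, little endian
        int offset = (buffer[pos + 6] << 16) | (buffer[pos + 5] << 8) | buffer[pos + 4];
        // Byte 7: length
        int length = buffer[pos + 7];
        return new IndexRecord(endIp, offset, length);
    }
}
```

Separating the decoding from the search makes the byte order of each field explicit, which is where the flattened original is easiest to misread.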

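The domestic (China) version uses a binary approximation algorithm, which is similar to binary search but not identical: instead of looking for an exact match, it finds the first record whose end IP is greater than or equal to the queried address. This performs much better: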

public string[] Find(string ip)
{
    var ips = ip.Split('.');
    uint ip_prefix = uint.Parse(ips[0]);
    // Query address packed as a big-endian uint
    uint find_uint32 = BytesToLong(byte.Parse(ips[0]), byte.Parse(ips[1]), byte.Parse(ips[2]), byte.Parse(ips[3]));
    uint max_len = 0;
    int resultOffset = -1;
    int resultLength = -1;

    // Entry numbers (not byte positions) bounding this first octet
    uint start = index[ip_prefix];
    if (ip_prefix != 255)
    {
        max_len = index[ip_prefix + 1];
    }
    else
    {
        max_len = index[255];
    }

    uint num = max_len - start;
    uint my_index = BinarySearch(start, max_len, find_uint32);

    start = my_index * 8 + 1024;
    resultOffset = (int)BytesToLong((byte)0, indexBuffer[start + 6], indexBuffer[start + 5], indexBuffer[start + 4]); // little endian
    resultLength = 0xFF & indexBuffer[start + 7]; // length

    if (resultOffset == -1 || resultLength == -1)
    {
        return new string[] { "N/A" };
    }

    var areaBytes = new byte[resultLength];
    Array.Copy(dataBuffer, dataLength + resultOffset - 1024, areaBytes, 0, resultLength);
    // The field delimiter was lost in the source text; a tab is assumed here
    return Encoding.UTF8.GetString(areaBytes).Split('\t');
}

/// <summary>
/// Binary approximation
/// </summary>
public uint BinarySearch(uint low, uint high, uint k)
{
    uint M = 0;
    while (low <= high)
    {
        uint mid = (low + high) / 2;
        // GetStartIp is not shown in the source; judging by its use, it reads the IP stored in index entry mid
        uint endipNum = GetStartIp(mid);
        if (endipNum >= k)
        {
            M = mid; // mid is a possible solution
            if (mid == 0) break; // guard: mid - 1 would wrap around for an unsigned value
            high = mid - 1;
        }
        else
            low = mid + 1;
    }
    return M;
}
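The search above can be sketched in isolation as a lower-bound search over a sorted array. This is a minimal illustration, assuming the end IPs are sorted in ascending order and held in memory. `RangeSearch` and `LowerBound` are hypothetical names, and the sketch uses signed indices and a -1 "not found" result, which differs from the `BinarySearch` method in the article.

```csharp
using System;

// Hypothetical standalone sketch; not the article's BinarySearch method.
public static class RangeSearch
{
    // Returns the index of the first element >= key in an ascending array,
    // or -1 when every element is smaller than key.
    public static int LowerBound(uint[] endIps, uint key)
    {
        int low = 0;
        int high = endIps.Length - 1;
        int result = -1;
        while (low <= high)
        {
            int mid = low + (high - low) / 2; // avoids overflow of low + high
            if (endIps[mid] >= key)
            {
                result = mid;   // candidate; keep looking to the left
                high = mid - 1; // signed arithmetic, so no wrap-around at mid == 0
            }
            else
            {
                low = mid + 1;
            }
        }
        return result;
    }
}
```

For example, with end IPs `{ 10, 20, 30, 40 }` a query of 15 falls into the range ending at 20, so `LowerBound` returns index 1. Each lookup inspects on the order of log2(n) entries rather than scanning every entry for the prefix.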

With this efficient algorithm, we can resolve more than 8 million records and then compute statistics over them with a simple LINQ query:

  var cn_result = from r in list
                  group r by r.cn into g
                  select new { key = g.Key, cnt = g.Count() };
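The same grouping can be written as a small self-contained method. This is a sketch only: the article does not show the element type of `list`, so plain strings stand in for the `cn` field, and `RegionStats` and `CountByRegion` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; the article's list element type is not shown.
public static class RegionStats
{
    // Counts how many entries share each key, mirroring the group-by query above.
    public static Dictionary<string, int> CountByRegion(IEnumerable<string> regions)
    {
        return regions
            .GroupBy(r => r)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

Calling `RegionStats.CountByRegion(new[] { "Beijing", "Shanghai", "Beijing" })` yields a count of 2 for "Beijing" and 1 for "Shanghai".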

[Figure: grouped statistics chart for the more than 8 million records]
