Others have already described the Bloom filter algorithm in great detail, so I will not reinvent the wheel. See:
(1) Wikipedia: introduction to the algorithm and error-rate analysis: http://en.wikipedia.org/wiki/Bloom_filter
(2) Chinese-language material: http://blog.csdn.net/jiaomeng/article/details/1495500
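For readers who want a quick refresher before going on, here is a minimal Bloom filter sketch in PHP. It is only an illustration and not the code from my GitHub repository: the class name SimpleBloomFilter, the use of a plain PHP string as the bit array (instead of the Bitset module), and the crc32/md5 double-hashing scheme are all assumptions made for this example, and it assumes 64-bit PHP 7.4 or later.

```php
<?php
// Minimal Bloom filter sketch (illustrative only, assumes 64-bit PHP 7.4+).
// The bit array is a plain PHP string used as a byte buffer, a stand-in for
// the Bitset module.
final class SimpleBloomFilter
{
    private string $bits;  // byte buffer holding the bit array
    private int $size;     // number of bits (m)
    private int $hashes;   // number of hash functions (k)

    public function __construct(int $size, int $hashes)
    {
        $this->size = $size;
        $this->hashes = $hashes;
        $this->bits = str_repeat("\0", intdiv($size + 7, 8));
    }

    // Derive k bit positions from two base hashes (double hashing).
    private function positions(string $item): array
    {
        $h1 = crc32($item);
        $h2 = hexdec(substr(md5($item), 0, 8)) | 1; // force an odd step
        $out = [];
        for ($i = 0; $i < $this->hashes; $i++) {
            $out[] = ($h1 + $i * $h2) % $this->size;
        }
        return $out;
    }

    // Set the k bits that belong to $item.
    public function add(string $item): void
    {
        foreach ($this->positions($item) as $p) {
            $byte = intdiv($p, 8);
            $this->bits[$byte] = chr(ord($this->bits[$byte]) | (1 << ($p % 8)));
        }
    }

    // False means "definitely absent"; true means "probably present".
    public function mightContain(string $item): bool
    {
        foreach ($this->positions($item) as $p) {
            if ((ord($this->bits[intdiv($p, 8)]) & (1 << ($p % 8))) === 0) {
                return false;
            }
        }
        return true;
    }
}

$filter = new SimpleBloomFilter(1 << 16, 3);
$filter->add('1800137');
var_dump($filter->mightContain('1800137')); // bool(true): added items are never missed
```

An item that was never added usually returns false, but it can occasionally return true; that false-positive rate is what the error-rate analysis in reference (1) covers.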
Now that you know what Bitset and Bloomfilter are, let's get started.
First, I converted the nationwide number-segment database into a TXT file. An example of the format:
[Beijingshi]151 --119, Max-169, +-1029,1690-1699,1790-1799, -- the Max --139, +-1159153 0- the, --139,1100-1199,1300-1399,2100-2199, the-3029,3104,3108-3109,4000-4019,5101-5105
...
...
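To make the range notation concrete, here is a rough sketch of how a comma-separated list such as 3104,3108-3109,4000-4019 could be expanded into individual 7-digit number segments. The function name expandSegments, the 3-digit prefix '138', and the idea that each entry is a 4-digit suffix or suffix range are assumptions for illustration; the real Database.txt layout and the parser on GitHub may differ.

```php
<?php
// Hypothetical helper: expand a list like "3104,3108-3109" under a 3-digit
// prefix into a lookup set keyed by 7-digit number segment.
function expandSegments(string $prefix, string $ranges): array
{
    $set = [];
    foreach (explode(',', $ranges) as $part) {
        $part = trim($part);
        if ($part === '') {
            continue; // skip empty entries
        }
        // A part is either a single suffix ("3104") or a range ("3108-3109").
        $bounds = explode('-', $part, 2);
        $from = (int) $bounds[0];
        $to = (int) ($bounds[1] ?? $bounds[0]);
        for ($n = $from; $n <= $to; $n++) {
            // Pad the suffix back to 4 digits so every key has 7 digits.
            $set[$prefix . str_pad((string) $n, 4, '0', STR_PAD_LEFT)] = true;
        }
    }
    return $set;
}

$set = expandSegments('138', '3104,3108-3109,4000-4019');
var_dump(isset($set['1383108'])); // bool(true)
var_dump(isset($set['1383105'])); // bool(false)
echo count($set), "\n";           // 23
```

Each expanded segment could then be added to a bit set or a Bloom filter for one province, which is roughly the job the load step performs.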
PS: Here I provide the SQLite database of nationwide mobile phone number-segment data crawled by a spider I wrote, together with the complete Database.txt file produced by formatting that database. It includes the newest segments such as '147', '155', '156', '170', '176', '177', '178', '181', and '184'. As of July 2014 it contains 299,488 number-segment records: SQLite library, Database.txt
Large blocks of code look ugly in a blog post, so I put the code on GitHub. Here I only describe how to use it and show the test results.
<?php
include('diparser.class.php');
$DP = new Diparser();

// init load test
$start = microtime(true);
$DP->load('Phonenum.database.txt');
$end = microtime(true);
$time = $end - $start;
echo "Database load time:" . $time . "s<br/>";

// no hit test
$sec = $DP->getsection('Beijingshi');
$start = microtime(true);
var_dump($sec->has('1386596')); // return true or false
$end = microtime(true);
$time = $end - $start;
echo "1386596 time:" . $time . "s<br/>";

// hit test
$start = microtime(true);
var_dump($sec->has('1800137')); // return true or false
$end = microtime(true);
$time = $end - $start;
echo "1800137 time:" . $time . "s<br/>";

// spend time test
$start = microtime(true);
for ($i = 1380000; $i < 1389999; $i++)
{
    // echo sprintf("%s:%d<br/>", $i, $sec->has($i));
}
$end = microtime(true);
$time = $end - $start;
echo "Has 1380000-1389999 time:" . $time . "s<br/>";
Test results:
Database load time: 0.10922789573669s //init load test
bool(false) 1386596 time: 0.0054430961608887s //no hit test
bool(true) 1800137 time: 0. ... //hit test
Has 1380000-1389999 time: 0.00032496452331543s //spend time test
Compared with a MySQL lookup on a well-indexed table, the time for a single hit is about the same (a MySQL hit takes about 0.0188s), but for many queries in a row this approach is more efficient than using a key-value database. Of course, whether to filter with a database or with the bitmap and bit-vector approach depends on the production environment and the requirements. I am not trying to decide here which of the two methods is superior.
The Bloomfilter implementation uses PHP's Bitset module. If you are not familiar with it, see my previous articles:
Introduction to and installation of the PHP Bitset module
Implementing the Bitset bit-processing module's functions with PHP arrays
Using the Bloomfilter algorithm to screen mobile phone numbers