Recently got a nearly 1.2 billion of the global NS node data, originally wanted to do a nationwide DNS anti-search domain name and then nationwide site collection and scanning, and later found that the number of sites is not very accurate, coupled with a person's energy and financial resources is really difficult to accomplish such a huge task, did not do it, Only the notes of this building were left.
Text format, simple text search, the speed is too slow, a search close to spend 5-10 minutes, decided to pour it into the database for an optimization, speed should be able to improve, the computer only AMP environment, then decided to pour into MySQL,
First use navicat to pour, just the format of the data is ip,ns such format, poured nearly 5 hours to find not pour into 1%, this is plain text format when the size of 54G data file Ah!
Later found that with MySQL's own load data local infile only 30 minutes, the first time when the import forgot to create a new key, had to re-import once
mysql> Load Data local infile ' e:\\dns\\rite\\20141217-rdns.txt ' into table dnsfields terminated by ', '; Query OK, 1194674130 rows affected, 1700 warnings (min 26.65 sec) records:1194674130 deleted:0 skipped:0 warnings:1700
Because an ID field was added, the import speed was significantly reduced, but it took about 1.5 hours or so to complete the import of 55G data.
Next is to build the index, because I need the fuzzy query, so here is the full text+btree, it took almost 3 days to set up the index to complete, during the period because accidentally the MySQL execution window closed, thought it was so finished, Finally found that in fact, MySQL still in the background silently indexing.
After the index is established, it is faster to find the query speed than to have no index.
SELECT * FROM NS where NS like '%weibo.com '
It took 210 seconds, or too slow.
And then we started using Sphinx to do index lifting,
Downloaded the 64-bit Sphinx MYSQL Support package download address from the official
Then configure the configuration file, src Configure the account password to MySQL
SOURCE src1{ Sql_host = localhost sql_user = root sql_pass = root sql_db = ns Sql_port = 3306 sql_query = \ SELECT id,ip,ns from NS//Here Write the query statement sql_attr_uint = ID
Then SEARCHD also need to configure, port and log, PID file path configuration can be
searchd{ Listen = 9312 Listen = 9306:mysql41 Log = E:/phpstudy/splinx/file/log.log Query_log = E:/phpstudy/splinx/file/query.log pid_file = e:/phpstudy/splinx/file/searchd.pid
Then switch to Sphinx's Bin directory for indexing, executing
Searchd test1 #test1是你source的名称
I've probably built it in less than 2 hours and it's done.
Then switch to the API directory to execute the
e:\phpstudy\splinx\api>test.py Asddeprecated:do Not call this method or, even better, use SPHINXQL instead of Anapique Ry ' ASD ' retrieved $209273 matches in 0.007 secquery stats: ' asd ' found 209291 times in 209273 Documentsmatche S:1. doc_id=20830, weight=12. doc_id=63547, weight=13. doc_id=96147, weight=14. doc_id=1717000, weight=15. doc_id=2213385, weight=16. doc_id=3916825, weight=17. doc_id=3981791, weight=18. doc_id=5489598, weight=19. doc_id=9348383, weight=110. doc_id=18194414, weight=111. doc_id=18194415, weight=112. doc_id=18195126, weight=113. doc_id=18195517, weight=114. doc_id=18195518, weight=115. doc_id=18195519, weight=116. doc_id=18195520, weight=117. doc_id=18195781, weight=118. doc_id=18195782, weight=119. doc_id=18200301, weight=120. doc_id=18200303, Weight=1
was tested and found that the speed was really fast and wrote a PHP script to call
Setserver (' 127.0.0.1 ', 9312); $result = $sphinx->query (' weibo.com ', ' test1 '); foreach ($result [' matches '] as $key = + $val) { $sql = "SELECT * from NS where id= ' {$key} '"; $res =mysql_query ($sql); $res =mysql_fetch_array ($res); echo "{$res [' IP ']}:{$res [' NS ']}";} Echo Time ()-$now;? >
The basic implementation of the second check! , the last output time only spent 0!
123.125.104.176:w-176.service.weibo.com123.125.104.178:w-178.service.weibo.com123.125.104.179: w-179.service.weibo.com123.125.104.207:w-207.service.weibo.com123.125.104.208: W-208.SERVICE.WEIBO.COM123.125.104.209:W-209.SERVICE.WEIBO.COM123.125.104.210: w-210.service.weibo.com202.106.169.235:staff.weibo.com210.242.10.56:weibo.com.tw218.30.114.174: w114-174.service.weibo.com219.142.118.228:staff.weibo.com60.28.2.221:w-221.hao.weibo.com60.28.2.222: w-222.hao.weibo.com60.28.2.250:w-222.hao.weibo.com61.135.152.194:sina152-194.staff.weibo.com61.135.152.212: sina152-212.staff.weibo.com65.111.180.3:pr1.cn-weibo.com160.34.0.155: Srm-weibo.us2.cloud.oracle.com202.126.57.40:w1.weibo.vip.hk3.tvb.com202.126.57.41:w1.weibo.hk3.tvb.com0