sphinx+mysql+php: sub-second search over 1.2 billion DNS records

I recently got hold of nearly 1.2 billion records of global NS node data. The original plan was to reverse-look-up domains from the nationwide DNS data and then collect and scan the sites behind them, but the resulting site counts turned out not to be very accurate, and a task this huge is really hard to pull off with one person's time and money, so I dropped it. Only these notes remain.
The data was plain text, and a simple text search was far too slow: a single search took close to 5-10 minutes. I decided to load it into a database, which should improve the speed. The machine only had an AMP environment, so MySQL it was.
I first tried importing with Navicat. The data is just ip,ns pairs in plain text, yet after nearly 5 hours less than 1% had been imported, and the file is 54 GB!
Later I found that MySQL's own LOAD DATA LOCAL INFILE needed only about 30 minutes. The first time I imported I forgot to create a key column, so I had to import all over again.

mysql> LOAD DATA LOCAL INFILE 'e:\\dns\\rite\\20141217-rdns.txt' INTO TABLE dns FIELDS TERMINATED BY ',';
Query OK, 1194674130 rows affected, 1700 warnings (min 26.65 sec)
Records: 1194674130  Deleted: 0  Skipped: 0  Warnings: 1700

Because an id field was added, the import was noticeably slower, but the 55 GB of data still finished loading in about 1.5 hours.
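For reference, a minimal sketch of what the table might have looked like. The column names id, ip, ns follow the sql_query used in the Sphinx config further down (the post calls the table dns during the import and ns afterwards; ns is used here), and the types, lengths and storage engine are my assumptions:

CREATE TABLE ns (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- the key column added before the second import
    ip VARCHAR(46)  NOT NULL,   -- long enough for IPv4 and IPv6 in text form
    ns VARCHAR(255) NOT NULL    -- reverse-DNS / NS host name
) ENGINE=MyISAM;                -- MyISAM assumed; it supported FULLTEXT indexes on MySQL versions of that era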
Next came the indexes. Since I need fuzzy queries, I built both a full-text and a BTREE index, which took almost 3 days to complete. At one point I accidentally closed the MySQL client window and thought it was all over, but it turned out MySQL was still quietly building the index in the background.
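A hedged sketch of the index DDL described above; the index names and the choice to index the full ns column are my assumptions:

ALTER TABLE ns ADD FULLTEXT INDEX ft_ns (ns);  -- full-text index, matches whole words/tokens
ALTER TABLE ns ADD INDEX idx_ns (ns);          -- BTREE index, useful for equality and prefix lookups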
Once the indexes were built, queries were indeed faster than with no index at all:

SELECT * FROM ns WHERE ns LIKE '%weibo.com';

It took 210 seconds, which is still far too slow.
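The leading % is the reason: MySQL cannot use a BTREE index for a pattern that starts with a wildcard, so it still has to scan an enormous number of rows, whereas a prefix pattern can use the index. A quick way to see the difference (index names as in the sketch above):

EXPLAIN SELECT * FROM ns WHERE ns LIKE '%weibo.com';  -- leading wildcard: no index, near-full scan
EXPLAIN SELECT * FROM ns WHERE ns LIKE 'weibo.com%';  -- prefix only: the BTREE index on ns can be used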
So I turned to Sphinx to speed up the lookups.
I downloaded the 64-bit Sphinx package with MySQL support from the official download page.
Then I edited the configuration file, setting the MySQL host, account and password in the source block:

source src1
{
    sql_host = localhost
    sql_user = root
    sql_pass = root
    sql_db   = ns
    sql_port = 3306
    # write the indexing query here
    sql_query = SELECT id, ip, ns FROM ns
    sql_attr_uint = id
}

The searchd block also needs configuring; setting the listen ports, the log paths and the PID file path is enough:

searchd
{
    listen    = 9312
    listen    = 9306:mysql41
    log       = E:/phpstudy/splinx/file/log.log
    query_log = E:/phpstudy/splinx/file/query.log
    pid_file  = e:/phpstudy/splinx/file/searchd.pid
}

Then switch to Sphinx's bin directory and build the index by running:

indexer test1   # test1 is the index name defined in the config

Building the index took just under 2 hours.
Then start searchd so it can serve queries, switch to the api directory, and run the bundled test script:

e:\phpstudy\splinx\api>test.py asd
DEPRECATED: Do not call this method or, even better, use SphinxQL instead of the API
Query 'asd' retrieved 209273 matches in 0.007 sec
Query stats:
        'asd' found 209291 times in 209273 documents
Matches:
1. doc_id=20830, weight=1
2. doc_id=63547, weight=1
3. doc_id=96147, weight=1
4. doc_id=1717000, weight=1
5. doc_id=2213385, weight=1
6. doc_id=3916825, weight=1
7. doc_id=3981791, weight=1
8. doc_id=5489598, weight=1
9. doc_id=9348383, weight=1
10. doc_id=18194414, weight=1
11. doc_id=18194415, weight=1
12. doc_id=18195126, weight=1
13. doc_id=18195517, weight=1
14. doc_id=18195518, weight=1
15. doc_id=18195519, weight=1
16. doc_id=18195520, weight=1
17. doc_id=18195781, weight=1
18. doc_id=18195782, weight=1
19. doc_id=18200301, weight=1
20. doc_id=18200303, weight=1
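The DEPRECATED notice in that output points at SphinxQL. Since the searchd block above also listens on 9306 with the mysql41 protocol, the same search can be issued from any MySQL client connected to 127.0.0.1:9306; a minimal sketch, assuming the index is named test1 as above:

SELECT id FROM test1 WHERE MATCH('weibo.com') LIMIT 20;  -- SphinxQL full-text query against the test1 index
SHOW META;                                               -- match count and query time of the last query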

The test showed the speed really is fast, so I wrote a PHP script to call it:

 
<?php
// sphinxapi.php is the PHP client shipped in Sphinx's api directory
require 'sphinxapi.php';

$now = time();
mysql_connect('localhost', 'root', 'root');   // credentials as in the sphinx.conf source block
mysql_select_db('ns');

$sphinx = new SphinxClient();
$sphinx->SetServer('127.0.0.1', 9312);
$result = $sphinx->Query('weibo.com', 'test1');

// Sphinx returns matching document ids; fetch the full rows from MySQL
foreach ($result['matches'] as $key => $val) {
    $sql = "SELECT * FROM ns WHERE id = '{$key}'";
    $res = mysql_query($sql);
    $res = mysql_fetch_array($res);
    echo "{$res['ip']}:{$res['ns']}\n";
}
echo time() - $now;
?>

Sub-second search is basically achieved! The elapsed time printed at the end was 0 seconds:

123.125.104.176:w-176.service.weibo.com
123.125.104.178:w-178.service.weibo.com
123.125.104.179:w-179.service.weibo.com
123.125.104.207:w-207.service.weibo.com
123.125.104.208:w-208.service.weibo.com
123.125.104.209:w-209.service.weibo.com
123.125.104.210:w-210.service.weibo.com
202.106.169.235:staff.weibo.com
210.242.10.56:weibo.com.tw
218.30.114.174:w114-174.service.weibo.com
219.142.118.228:staff.weibo.com
60.28.2.221:w-221.hao.weibo.com
60.28.2.222:w-222.hao.weibo.com
60.28.2.250:w-222.hao.weibo.com
61.135.152.194:sina152-194.staff.weibo.com
61.135.152.212:sina152-212.staff.weibo.com
65.111.180.3:pr1.cn-weibo.com
160.34.0.155:srm-weibo.us2.cloud.oracle.com
202.126.57.40:w1.weibo.vip.hk3.tvb.com
202.126.57.41:w1.weibo.hk3.tvb.com
0