sphinx+mysql+php 1.2 billion DNS data second-check

Source: Internet
Author: User
Tags php script

recently got a nearly 1.2 billion of the global NS node data, originally wanted to do a nationwide DNS anti-search domain name and then nationwide site collection and scanning, and later found that the number of sites is not very accurate, coupled with a person's energy and financial resources is really difficult to complete such a huge task, Did not go on, leaving only the notes of the building.
text format, simple text search, the speed is too slow, a search close to spend 5-10 minutes, decided to pour it into the database for an optimization, speed should be able to improve, the computer only AMP environment, then decided to pour into MySQL,
First Use navicat to pour, just the format of the data is ip,ns such format, poured nearly 5 hours to find not pour into 1%, this is plain text format when the size of 54G data file Ah!
later found that with MySQL's own load data local infile only 30 minutes, the first time when the import forgot to create a new key, had to re-import once

mysql> Load Data local infile ' e:\\dns\\rite\\20141217-rdns.txt ' into table dnsfields terminated by ', ';  Query OK, 1194674130 rows affected, 1700 warnings (min 26.65 sec) records:1194674130 deleted:0 skipped:0 warnings: 1700

because an ID field was added, the import speed was significantly reduced, but it took about 1.5 hours or so to complete the import of 55G data.
Next is to build the index, because I need the fuzzy query, so here is the full text+btree, it took almost 3 days to set up the index to complete, during the period because accidentally the MySQL execution window closed, thought it was so finished, Finally found that in fact, MySQL still in the background silently indexing.
after the index is established, it is faster to find the query speed than to have no index .

SELECT * FROM NS where NS like '%weibo.com '

It took 210 seconds, or too slow.
and then we started using Sphinx to do index lifting,
Download the 64-bit Sphinx MYSQL Support package from the official
then configure the configuration file, src Configure the account password to MySQL

source src1{    sql_host         = localhost    sql_user         = root    sql_pass         = root    sql_db           = ns    sql_port        =  3306      sql_query       =          select id,ip,ns from ns //here to write the query statement      sql_attr_uint       = id 

and then searchd also need to configure, port and log, PID file path configuration can be

searchd{    listen           = 9312    listen           = 9306:mysql41    log          = e:/phpstudy/splinx/file/log.log    query_log        = e:/phpstudy/splinx/file/query.log    pid_ File        = e:/phpstudy/splinx/file/searchd.pid 

Then switch to Sphinx's Bin directory for indexing, executing

Searchd test1 #test1是你source的名称

I probably built it up in less than 2 hours, and it was done,
then switch to the API directory to execute

E:\phpstudy\splinx\api>test.py asddeprecated: do not call this method or , even better, use sphinxql instead of anapiquery  ' asd  '   retrieved 1000 of 209273 matches in 0.007 secquery stats:          ' ASD '  found 209291 times in 209273  documentsmatches:1. doc_id=20830, weight=12. doc_id=63547, weight=13. doc_id=96147,  weight=14. doc_id=1717000, weight=15. doc_id=2213385, weight=16. doc_id=3916825 ,  weight=17. doc_id=3981791, weight=18. doc_id=5489598, weight=19. doc_id= 9348383, weight=110. doc_id=18194414, weight=111. doc_id=18194415, weight=112.  doc_id=18195126, weight=113. doc_id=18195517, weight=114. doc_id=18195518, weight= 115. doc_id=18195519, weight=116. doc_id=18195520, weight=117. doc_id=18195781, weight=118. doc_id=18195782,  weight=119. doc_id=18200301, weight=120. doc_id=18200303, weight=1

was tested and found that the speed was really fast and wrote a PHP script to call

 <?phpinclude  ' sphinxapi.php '; $conn =mysql_connect (' 127.0.0.1 ', ' root ', ' root '); mysql_select_db (' ns '), $conn), $sphinx  = new sphinxclient (), $now =time () $sphinx->setserver  (  ' 127.0.0.1 ',  9312 ), $result  =  $sphinx->query  (' weibo.com ',  ' test1 ');  foreach ($result [' Matches '] as  $key  =>  $val) {     $sql = "Select * from  ns where id= ' {$key} ' ";     $res =mysql_query ($sql);     $ Res=mysql_fetch_array ($res);    echo  "{$res [' IP ']}:{$res [' NS ']}";} Echo time ()-$now;? 

Basic implementation of the second check! , the last output time only spent 0!

123.125.104.176:w-176.service.weibo.com123.125.104.178:w-178.service.weibo.com123.125.104.179: w-179.service.weibo.com123.125.104.207:w-207.service.weibo.com123.125.104.208: W-208.SERVICE.WEIBO.COM123.125.104.209:W-209.SERVICE.WEIBO.COM123.125.104.210: w-210.service.weibo.com202.106.169.235:staff.weibo.com210.242.10.56:weibo.com.tw218.30.114.174: w114-174.service.weibo.com219.142.118.228:staff.weibo.com60.28.2.221:w-221.hao.weibo.com60.28.2.222: w-222.hao.weibo.com60.28.2.250:w-222.hao.weibo.com61.135.152.194:sina152-194.staff.weibo.com61.135.152.212: sina152-212.staff.weibo.com65.111.180.3:pr1.cn-weibo.com160.34.0.155: Srm-weibo.us2.cloud.oracle.com202.126.57.40:w1.weibo.vip.hk3.tvb.com202.126.57.41:w1.weibo.hk3.tvb.com0

sphinx+mysql+php 1.2 billion DNS data second-check

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.