A small task: given an Excel file containing thousands of domain names, detect whether each domain has been registered, obtain the registrant company if it has, check whether the corresponding website opens normally, and finally write the results back to an Excel file.
1. At first, the idea was to read each domain name, call a WHOIS query interface such as xinnet's over HTTP, match the registrant:, registrant organization:, and registrant name: fields in the returned page, and take the content after the colon on that line. The results were very messy: many WHOIS records do not use these three fields, and there is no unified standard for determining whether a domain is registered and which company registered it.
2. A similar solution was subsequently found at http://stackoverflow.com/questions/16234477/php-script-that-finds-the-registrar-of-any-domain-name, which queries domain names by reading directly from the WHOIS servers. It picks a WHOIS server based on the suffix of the queried domain name and queries it. However, its suffix list is not complete, many of the WHOIS responses contain only the Registrar field, and the connection drops easily while running.
3. Another option is to call the phpWhois component; the core code for a WHOIS lookup is:
<pre><?php
include('whois_inc/whois.main.php');

$whois  = new Whois();
$result = $whois->lookup($domain);            // $domain is the domain name to query
$output = implode("\n", $result['rawdata']);  // raw WHOIS lines joined into one string
echo $output;
?></pre>
4. Later, on expert advice, the approach became: first fetch the WHOIS information for all domain names into the database, then analyze it step by step, look for patterns, classify the records, classify them again, and then extract the registrant.
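For reference, the direct WHOIS-server query from approach 2 can be sketched roughly as follows. This is a minimal sketch, not the code from the linked answer: the function names `whoisServerFor` and `queryWhois` are hypothetical, the suffix map lists only three suffixes for illustration, and the code assumes PHP 7.1 or later.

```php
<?php
// Hypothetical suffix-to-server map; a real one needs many more suffixes.
const WHOIS_SERVERS = [
    'com' => 'whois.verisign-grs.com',
    'net' => 'whois.verisign-grs.com',
    'cn'  => 'whois.cnnic.cn',
];

// Return the WHOIS server for the domain's suffix, or null if it is not in the map.
function whoisServerFor(string $domain): ?string
{
    $suffix = strtolower(substr(strrchr('.' . trim($domain), '.'), 1));
    return WHOIS_SERVERS[$suffix] ?? null;
}

// Query the WHOIS server over TCP port 43; return the raw response or null on failure.
function queryWhois(string $domain, int $timeout = 10): ?string
{
    $server = whoisServerFor($domain);
    if ($server === null) {
        return null; // unknown suffix: nothing to query
    }
    $fp = @fsockopen($server, 43, $errno, $errstr, $timeout);
    if ($fp === false) {
        return null; // connection failed or timed out
    }
    stream_set_timeout($fp, $timeout);
    fwrite($fp, trim($domain) . "\r\n");
    $response = stream_get_contents($fp);
    fclose($fp);
    return $response === false ? null : $response;
}
```

Returning null for an unknown suffix or a failed connection lets the caller record the failure and retry later, which matters when connections drop mid-run.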
First, use PHPExcel to import the domain names into the database.
<?php
error_reporting(E_ALL); // report all errors
set_time_limit(0);      // script does not time out

require_once 'Library/PHPExcel.php';
require_once 'Library/PHPExcel/IOFactory.php';
require_once 'Library/PHPExcel/Reader/Excel5.php';

// Note: the mysql_* functions require PHP 5; they were removed in PHP 7.
$conn = mysql_connect("localhost", "root", "");
mysql_select_db("test1", $conn);
mysql_query("SET NAMES utf8");

$inputFileName = './example.xls';
// $inputFileName = './test.xls';
$objReader   = PHPExcel_IOFactory::createReader('Excel5');
$objPHPExcel = $objReader->load($inputFileName); // can be an uploaded file or a specified file
$sheet       = $objPHPExcel->getSheet(0);
// var_dump($sheet); exit;
$highestRow    = $sheet->getHighestRow();    // total number of rows
$highestColumn = $sheet->getHighestColumn(); // highest column letter

// Loop through the Excel file starting at row 2: read one row, insert one row
for ($j = 2; $j <= $highestRow; $j++) {
    $a = mysql_real_escape_string($objPHPExcel->getActiveSheet()->getCell("A" . $j)->getValue());
    $b = mysql_real_escape_string($objPHPExcel->getActiveSheet()->getCell("B" . $j)->getValue());
    $c = mysql_real_escape_string($objPHPExcel->getActiveSheet()->getCell("C" . $j)->getValue());
    $sql = "INSERT INTO domaininfo (domain, regdate, expdate) VALUES ('$a', '$b', '$c')";
    mysql_query($sql);
}
echo "Success";
?>
On Linux, the whois command can fetch the WHOIS information for a domain name directly, and the output is then saved to the database.
<?php
error_reporting(E_ALL); // report all errors
set_time_limit(0);      // script does not time out

// Note: the mysql_* functions require PHP 5; they were removed in PHP 7.
$conn = mysql_connect("localhost", "test", "test1");
mysql_select_db("test", $conn);
mysql_query("SET NAMES utf8");

$result = mysql_query("SELECT * FROM domaininfo");
while ($row = mysql_fetch_array($result)) {
    $domain = $row['domain'];
    // escapeshellarg() keeps a malformed domain from being run as a shell command
    $retval = shell_exec("whois " . escapeshellarg($domain));
    $retval = addslashes($retval);
    $sql = "UPDATE domaininfo SET whois = '" . $retval . "' WHERE id = {$row['id']}";
    // echo $sql; exit;
    mysql_query($sql);
}
echo "Over";
?>
The domain names in the main table are then moved step by step into other tables, classified by their WHOIS information.
Records whose WHOIS output matches no matching records, not found, or no match for are moved to the domainnomatch table.
Records that match registrant: with a value, i.e. if (preg_match('/registrant\s*:([^\r\n]+)/i', $row['whois'], $matches)), are moved to a separate table, with $matches[1] taken as the registrant company.
Records that match registrant organization: are moved to another table and the registrant company is extracted the same way; records that match registrant name: are likewise moved to their own table and extracted.
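The classification rules above could be combined into a single helper along these lines. This is only a sketch: the function name `classifyWhois` and the target table names other than domainnomatch and domaininfo are hypothetical, and it assumes PHP 7 or later.

```php
<?php
// Decide which table a raw WHOIS record belongs to and extract the registrant, if any.
// Returns ['table' => string, 'company' => string|null].
function classifyWhois(string $whois): array
{
    // Unregistered domains: the WHOIS server reports that nothing was found.
    if (preg_match('/no matching records|not found|no match for/i', $whois)) {
        return ['table' => 'domainnomatch', 'company' => null];
    }

    // Hypothetical table names, checked in the order described in the text.
    // (\S[^\r\n]*) requires a non-empty value on the same line.
    $patterns = [
        'registrant'      => '/registrant\s*:[ \t]*(\S[^\r\n]*)/i',
        'registrant_org'  => '/registrant\s+organization\s*:[ \t]*(\S[^\r\n]*)/i',
        'registrant_name' => '/registrant\s+name\s*:[ \t]*(\S[^\r\n]*)/i',
    ];
    foreach ($patterns as $table => $pattern) {
        if (preg_match($pattern, $whois, $matches)) {
            return ['table' => $table, 'company' => trim($matches[1])];
        }
    }

    // Nothing matched: leave the record in the main table for later handling.
    return ['table' => 'domaininfo', 'company' => null];
}
```

Keeping the rules in one ordered list makes it easy to add a pattern each time a new WHOIS format turns up during analysis.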
The main table still contained many domain names that are registered but whose WHOIS output returns only the registrar and no registrant company, mostly with the .com suffix. It is unclear why the whois command returns incomplete information for .com domain names. These domain names were later separated out and matched by scraping a lookup site:
function getRegistantName($domain)
{
    $url = "http://tool.admin5.com/whois/?q=" . urlencode($domain);
    $contents = @file_get_contents($url);
    // print_r($contents); echo "<br>";
    // (?:organization)? makes the whole word optional; the original [organization]*
    // was a character class that matched any run of those letters.
    if (preg_match('/registrant\s*(?:organization)?:([^<]+)/i', $contents, $matches)) {
        // print_r($matches);
        return trim($matches[1]);
    } elseif (preg_match('/registrant\s*(?:name)?:([^<]+)/i', $contents, $matches)) {
        return trim($matches[1]);
    } else {
        return "No Information found";
    }
}
In the end, only a single-digit number of special domain names remained, and these were handled manually.
To check whether each website opens, get its HTTP status code:
function getHttpStatusCode($url)
{
    $curl = curl_init();                           // initialize a new session and return a cURL handle
    curl_setopt($curl, CURLOPT_URL, $url);         // URL to fetch
    curl_setopt($curl, CURLOPT_HEADER, 1);         // include the HTTP header information
    curl_setopt($curl, CURLOPT_NOBODY, 1);         // do not fetch the HTML body
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); // return the data instead of printing it
    curl_setopt($curl, CURLOPT_TIMEOUT, 30);       // timeout in seconds
    curl_exec($curl);                              // execute the session
    $rtn = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);                             // close the session
    return $rtn;
}
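The status code still has to be turned into an "opens normally" verdict. One possible interpretation is sketched below. The function name `describeHttpStatus` and the rule that 2xx and 3xx count as normal are assumptions, not part of the original code.

```php
<?php
// Map an HTTP status code to a coarse verdict.
// curl_getinfo() reports 0 when no HTTP response was received at all.
function describeHttpStatus(int $code): string
{
    if ($code === 0) {
        return 'unreachable'; // DNS failure, refused connection, or timeout
    }
    if ($code >= 200 && $code < 400) {
        return 'ok';          // assumed: success and redirects count as "opens normally"
    }
    return 'error';           // 4xx and 5xx responses, or anything unexpected
}
```

For example, describeHttpStatus(getHttpStatusCode('http://' . $domain)) would give one verdict per domain to store next to the registrant.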
After each table has been processed, the results are merged and exported to an Excel file.
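The original export used an Excel file. As a dependency-free stand-in, the merged rows can also be written as CSV, which Excel opens directly. The sketch below swaps in PHP's built-in fputcsv for PHPExcel; the function name `rowsToCsv` and the column names are hypothetical.

```php
<?php
// Serialize merged result rows to a CSV string.
// Each row is assumed to have the keys 'domain', 'company' and 'http_status'.
function rowsToCsv(array $rows): string
{
    $fh = fopen('php://memory', 'w+'); // in-memory buffer, no temporary file needed
    fputcsv($fh, ['domain', 'company', 'http_status'], ',', '"', '\\');
    foreach ($rows as $row) {
        fputcsv($fh, [$row['domain'], $row['company'], $row['http_status']], ',', '"', '\\');
    }
    rewind($fh);
    $csv = stream_get_contents($fh);
    fclose($fh);
    return $csv;
}
```

The returned string can be saved with file_put_contents('result.csv', rowsToCsv($rows)).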