This article describes how to scrape Xunlei (Thunder) VIP accounts with PHP. It is shared here for your reference. The details are as follows:
After reading @jinn_wei's Python version of the account scraper, I wrote a PHP version while I was at it.
PS1: The code is not optimized; it only implements the basic functionality.
PS2: The code uses Snoopy.
PS3: Test address: http://xunlei.kphcdr.com
<?php
/**
 * Scrape "Love Password" Xunlei VIP accounts
 * @author kphcdr@163.com
 */
header('Content-Type: text/html; charset=utf-8');
include 'snoopy.php';

$url = 'http://www.521xunlei.com/forum-xunleihuiyuan-1.html';

// Find the matching URLs
$snoopy = new Snoopy();
$result = $snoopy->fetchlinks($url)->getresults();
foreach ($result as $key => $val)
{
    // strpos() returns false when the needle is absent, so compare strictly
    if (false === strpos($val, 'thread-'))
    {
        // Not a thread link
        unset($result[$key]);
    }
    else
    {
        if (false === strpos($val, '-1-1.html'))
        {
            // Not the first page of a thread
            unset($result[$key]);
        }
    }
}

$real = new Snoopy();
$result = array_values(array_unique($result));
// Fetch the text of the link at index 1 of the filtered list
$text = $real->fetchtext($result[1])->getresults();
// Convert the page text from GBK to UTF-8
$text = iconv('GBK', 'UTF-8//IGNORE', $text);

// Match the content
// The label strings below are English renderings; the live page uses Chinese
// labels, so substitute the exact wording shown on the page.
$pattern = '/^Thunder member account|Thunder share account+[a-zA-Z0-9_]{4,15}+:+[0-9]+Love password share password+[a-zA-Z0-9_]{4,20}\s/';
preg_match_all($pattern, $text, $return);
foreach ($return[0] as $a)
{
    echo $a;
    echo '<br/>';
}
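The link filtering and content matching steps in the script above can also be sketched as two small functions that run without Snoopy or network access. This is only a minimal sketch, not the author's code: the function names `filterThreadLinks` and `extractAccounts`, the example.com URLs, and the ASCII labels `account` and `password` are assumptions made for illustration, and the real page uses its own label text.

```php
<?php
// Hypothetical helpers; the names are illustrative and not part of the original script.

/**
 * Keep only links that look like the first page of a thread, without duplicates.
 */
function filterThreadLinks(array $links): array
{
    $kept = array_filter($links, function ($link) {
        // strpos() returns false when the needle is missing and can return 0
        // for a match at the start, so test with !== false.
        return strpos($link, 'thread-') !== false
            && strpos($link, '-1-1.html') !== false;
    });
    // array_filter() and array_unique() preserve keys; reindex from 0.
    return array_values(array_unique($kept));
}

/**
 * Convert GBK text to UTF-8 and extract account/password pairs.
 * Assumed layout: "<accountLabel> name:digits <passwordLabel> password".
 */
function extractAccounts(string $gbkText, string $accountLabel, string $passwordLabel): array
{
    $text = iconv('GBK', 'UTF-8//IGNORE', $gbkText);
    if ($text === false) {
        return []; // conversion failed
    }
    // preg_quote() escapes any regex metacharacters inside the labels.
    $pattern = '/' . preg_quote($accountLabel, '/')
             . '\s*([A-Za-z0-9_]{4,15}:[0-9]+)\s*'
             . preg_quote($passwordLabel, '/')
             . '\s*([A-Za-z0-9_]{4,20})/u';
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $accounts = [];
    foreach ($matches as $m) {
        $accounts[] = ['account' => $m[1], 'password' => $m[2]];
    }
    return $accounts;
}

// Usage with made-up sample data (no real accounts or URLs).
$sampleLinks = [
    'http://example.com/thread-10-1-1.html',
    'http://example.com/thread-10-1-1.html',
    'http://example.com/thread-10-2-1.html',
    'http://example.com/forum-1.html',
];
print_r(filterThreadLinks($sampleLinks));
print_r(extractAccounts("account demo_user:1 password secret123\n", 'account', 'password'));
```

Wrapping the labels in `preg_quote()` and capturing the two fields in groups avoids the precedence ambiguity of an ungrouped `|` alternation, and returning arrays keeps the parsing separate from the `echo` output.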
Snoopy (snoopy-1.2.3.tar.gz) can be downloaded from this site.
I hope this article helps you with your PHP programming.