A few days ago a friend asked me to help write a program that collects news information. I took a little time to write a PHP version, and I am recording it here.
When it comes to collecting (scraping), it is nothing more than fetching information remotely.
It amounts to an enhanced version of a simple "thief" (content-scraping) program.
Here is the corresponding core code (please don't use it for anything bad ^_^).
The content to be collected is the announcements on a game website.
The basic page information can be fetched first with file_get_contents and then extracted with a simple regular expression.
Tidy up the basic information and store what was collected:
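As a rough illustration of the extraction step, here is a minimal sketch that runs the same kind of pattern over a made-up HTML snippet instead of a live page. The function name extract_links and the sample markup are assumptions for the example, not part of the original program.

```php
<?php
// Hypothetical helper (not from the original post): pull title/href pairs
// out of list markup shaped like <li><a title="..." target="_blank" href="...">.
function extract_links(string $html): array
{
    // i = case-insensitive, U = ungreedy, s = dot also matches newlines.
    $pattern = '/<li><a title="(.*)" target="_blank" href="(.*)">/iUs';
    preg_match_all($pattern, $html, $m);

    $links = [];
    foreach ($m[1] as $key => $title) {
        // $m[1] holds the titles and $m[2] the hrefs; both share the same keys.
        $links[] = ['title' => $title, 'url' => $m[2][$key]];
    }
    return $links;
}

// Made-up sample markup, standing in for the result of file_get_contents().
$sample = '<ul>'
    . '<li><a title="Server maintenance" target="_blank" href="news_1.html">Server maintenance</a></li>'
    . '<li><a title="New event" target="_blank" href="news_2.html">New event</a></li>'
    . '</ul>';

print_r(extract_links($sample));
```

The ungreedy flag matters here: without it, each (.*) would try to swallow everything up to the last matching quote on the page.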
<?php
// Note: the mysql_* functions used here were removed in PHP 7.0.
include_once("conn.php");

$id = isset($_GET['id']) ? (int) $_GET['id'] : 0;

if ($id >= 1 && $id <= 8) {
    // Get the page content
    $conn = file_get_contents("http://www.93moli.com/news_list_4_$id.html");

    // Regular expression: i = case-insensitive, U = ungreedy, s = dot matches newlines
    $pattern = "/<li><a title=\"(.*)\" target=\"_blank\" href=\"(.*)\">/iUs";

    // Match the content into the $arr array
    preg_match_all($pattern, $conn, $arr);
    // print_r($arr); die;

    // $arr[2] has exactly the same keys as $arr[1], so reuse $key
    foreach ($arr[1] as $key => $value) {
        $url = "http://www.93moli.com/" . $arr[2][$key];

        // Escape the scraped text so a quote in a title cannot break the query
        $title = mysql_real_escape_string($value);
        $link  = mysql_real_escape_string($url);
        $sql   = "INSERT INTO list (title, url) VALUES ('$title', '$link')";
        mysql_query($sql);

        // echo "<a href='content.php?url=http://www.93moli.com/$url'>$value</a>" . "<br/>";
    }

    $id++;
    echo "Collecting URL data, list $id... please wait...";
    echo "<script>window.location='list.php?id=$id'</script>";
} else {
    echo "End of data acquisition.";
}
?>
conn.php is the database connection file.
list.php is this page itself.
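The script above builds its INSERT statement as a string. As an alternative, here is a minimal sketch of the same storage step using PDO prepared statements. This is a swapped-in technique, not what the original program does. It assumes the PDO SQLite driver is available and uses an in-memory database in place of the MySQL connection from conn.php; the column types, the save_links name and the sample row are also assumptions.

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real
// MySQL connection that conn.php would normally provide.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE list (title TEXT, url TEXT)');

// Hypothetical helper: insert scraped rows through one prepared statement.
function save_links(PDO $pdo, array $links): int
{
    $stmt = $pdo->prepare('INSERT INTO list (title, url) VALUES (:title, :url)');
    $saved = 0;
    foreach ($links as $link) {
        // Bound parameters are sent separately from the SQL text,
        // so quotes inside a title need no manual escaping.
        $stmt->execute([':title' => $link['title'], ':url' => $link['url']]);
        $saved++;
    }
    return $saved;
}

// Made-up sample row; the apostrophe would break a hand-built query.
$rows = [
    ['title' => "Tomorrow's maintenance", 'url' => 'http://www.93moli.com/news_1.html'],
];
save_links($pdo, $rows);

echo $pdo->query('SELECT COUNT(*) FROM list')->fetchColumn(), "\n";
```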
Because the data to be collected is paginated and the page addresses increase in a regular way, I used a JS redirect and let the id value control which page is collected. This also avoids running one overly large for loop in a single request.
That covers the simple data storage. The next blog post will cover collecting the content from the specific URLs.
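The redirect chain described above can be sketched as two small functions: one builds the source page address from the id, the other decides where the browser goes next. The names page_url and next_request are hypothetical; the upper bound of 8 pages and the URL shape come from the script itself.

```php
<?php
// Hypothetical helper: the announcement list page for a given id.
function page_url(int $id): string
{
    return "http://www.93moli.com/news_list_4_{$id}.html";
}

// Hypothetical helper: the list.php request that follows page $id,
// or null when $id is outside the range 1..$lastPage and the run ends.
function next_request(int $id, int $lastPage = 8): ?string
{
    if ($id < 1 || $id > $lastPage) {
        return null;
    }
    return 'list.php?id=' . ($id + 1);
}

// Walk the chain the way the browser would, one request per page.
$id = 1;
$visited = [];
while (next_request($id) !== null) {
    $visited[] = page_url($id);
    $id++;
}

echo count($visited), " pages collected\n";
```

Each request handles exactly one page, so no single request has to stay alive long enough to fetch every page.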