Crawling the Qiushibaike (Embarrassing Things Encyclopedia) home page with PHP

Source: Internet
Author: User

I suddenly felt like grabbing some data from the web to play with. I have a MySQL database on SAE (Sina App Engine) that was just sitting there doing nothing, so I started writing a small PHP program that crawls the stories on the Qiushibaike home page and stores them in MySQL. It turned out to be a lot of fun!

No sooner said than done! First, settle on the approach:

Get the HTML source ---> Parse the HTML ---> Save to the database

Nothing difficult there.

1. Create the PHP file "getdatatodb.php"

2. Get the HTML source of the specified URL

Here I use the curl functions; see the PHP manual for details.

The code is:

    // Get the HTML source of the given URL
    function getHtmlCode($url) {
        $ch = curl_init();                               // Initialize a curl handle
        curl_setopt($ch, CURLOPT_URL, $url);             // Set the page to crawl
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);     // Return the result as a string instead of printing it
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 1000);  // Connection timeout, in seconds
        $htmlCode = curl_exec($ch);                      // Run curl and request the page
        return $htmlCode;
    }

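The function above returns whatever curl_exec gives back, including false on failure. As a minimal sketch of a more defensive variant (the name fetchHtml, the 10-second default timeout and the "2xx only" rule are assumptions for illustration, not part of the original program):

```php
<?php
// Hypothetical helper: return the page body, or null if the request fails
// or the server answers with a non-2xx status.
function fetchHtml(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,            // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,            // follow redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeoutSeconds, // seconds allowed for the whole request
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        // Transport-level failure (DNS, timeout, unsupported protocol, ...).
        error_log('curl error: ' . curl_error($ch));
        return null;
    }

    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return ($status >= 200 && $status < 300) ? $body : null;
}
```

A caller can then skip the parsing step when the result is null instead of handing false to the HTML parser.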
3. Include the third-party file 'simple_html_dom.php' to parse the HTML

I am not good enough with regular expressions to parse the page that way, so I searched online and eventually found this library. It works much like Jsoup in Java (see my blog post on using Jsoup to parse the Chuzhou College website and fetch its news list).

The code is as follows:

    function getFmlDataToDb() {
        $link = mysql_connect(SAE_MYSQL_HOST_M . ':' . SAE_MYSQL_PORT, SAE_MYSQL_USER, SAE_MYSQL_PASS);
        // Get the page source
        $html = str_get_html(getHtmlCode("http://www.qiushibaike.com/"));
        if ($link) {
            mysql_select_db(SAE_MYSQL_DB, $link);
            mysql_query('set names utf8');
            // class="article block untagged mb15"
            foreach ($html->find('div[class=article block untagged mb15]') as $per) {
                $z = null; $t = null; $w = null; $d = null; $p = null; $ds = null; $ps = null;
                // Author
                $author = $per->find('div[class=author]');
                if ($author != null) {
                    $a = $author[0]->find('a');
                    $z = $a[1]->innertext;
                } else {
                    $z = 'no author';
                }
                // Avatar link
                if ($author != null) {
                    $icon = $author[0]->find('a');
                    $t = $icon[0]->src->innertext;
                } else {
                    $t = '... ..........';
                }
                // Article content
                $content = $per->find('div[class=content]');
                $w = $content[0]->innertext;
                // Number of likes
                $vote1 = $per->find('div[class=stats]');
                $vote2 = $vote1[0]->find('span[class=stats-vote]');
                $vote3 = $vote2[0]->find('i[class=number]');
                $d = $vote3[0]->innertext;
                // Number of comments
                $comments1 = $vote1[0]->find('span[class=stats-comments]');
                $comments2 = $comments1[0]->find('a[class=qiushi_comments]');
                $comments3 = $comments2[0]->find('i[class=number]');
                $p = $comments3[0]->innertext;
                // Up votes
                $up_down = $per->find('div[class=stats-buttons bar clearfix]');
                $up_down1 = $up_down[0]->find('ul');
                $li = $up_down1[0]->find('li');
                $up = $li[0]->find('span[class=number hidden]');
                $ds = $up[0]->innertext;
                // Down votes
                $down = $li[1]->find('span[class=number hidden]');
                $ps = $down[0]->innertext;
            }
        } else {
            echo 'database link ko';
        }
    }

This code is a bit convoluted. I tried to fetch the data of the child nodes directly and could not, so I had to peel the layers off one by one from the outside in. If I find a better way to write it I will update this post; suggestions from readers are welcome.
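For comparison, here is a sketch of reaching the nested nodes with one query each. It swaps simple_html_dom for PHP's built-in DOMDocument and DOMXPath, so it is a different technique from the one used above. The sample markup is hypothetical, modelled only on the class names that appear in the code; the real page may be structured differently, and parseArticles is a made-up name.

```php
<?php
// Hypothetical sample markup; the structure of the real page is an assumption.
$sample = <<<'HTML'
<div class="article block untagged mb15">
  <div class="author">
    <a href="#"><img src="avatar.jpg"></a>
    <a href="#">Alice</a>
  </div>
  <div class="content">A short story.</div>
  <div class="stats">
    <span class="stats-vote"><i class="number">12</i></span>
    <span class="stats-comments"><a class="qiushi_comments"><i class="number">3</i></a></span>
  </div>
</div>
HTML;

function parseArticles(string $html): array
{
    libxml_use_internal_errors(true);   // real-world HTML is rarely valid
    $doc = new DOMDocument();
    // The prefix asks libxml to treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//div[@class="article block untagged mb15"]') as $article) {
        // string(...) yields '' when the path matches nothing,
        // so no per-level null checks are needed.
        $text = function (string $path) use ($xpath, $article): string {
            return trim($xpath->evaluate("string($path)", $article));
        };
        $rows[] = [
            'author'   => $text('.//div[@class="author"]/a[2]') ?: 'no author',
            'icon_url' => $text('.//div[@class="author"]//img/@src'),
            'content'  => $text('.//div[@class="content"]'),
            'vote'     => $text('.//span[@class="stats-vote"]/i[@class="number"]'),
            'comments' => $text('.//span[@class="stats-comments"]//i[@class="number"]'),
        ];
    }
    return $rows;
}

$rows = parseArticles($sample);
print_r($rows);
```

Each path is evaluated relative to the current article node, which is what replaces the layer-by-layer find calls.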

4. Create a database and insert the data into it

Here I use the MySQL service on SAE. For the connection details, see my post on connecting to the SAE MySQL database from PHP.

The thing to watch out for is the character encoding: add the following statement before executing any SQL.

    mysql_query('set names utf8');

The core code is as follows:

    $sql = "INSERT INTO `app_bmhjqs`.`db_fml` (`ids`, `author`, `icon_url`, `content`, `vote`, `comments`, `up`, `down`) VALUES (NULL, '$z', '$t', '$w', '$d', '$p', '$ds', '$ps');";
    // Avoid garbled characters
    mysql_query('set names utf8');
    $result = mysql_query($sql);
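The statement above pastes the scraped text straight into the SQL string, so a story containing a single quote would break the query. The mysql_* functions were also removed in PHP 7. A minimal sketch of the same insert with a PDO prepared statement follows. To keep it runnable on its own it assumes an in-memory SQLite database in place of the SAE MySQL instance; the column types and the helper name insertStory are assumptions too. For MySQL only the DSN and credentials passed to the PDO constructor would change.

```php
<?php
// Assumption: in-memory SQLite stands in for the SAE MySQL database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Assumed schema, mirroring the column names used in the INSERT above.
$pdo->exec('CREATE TABLE db_fml (
    ids INTEGER PRIMARY KEY AUTOINCREMENT,
    author TEXT, icon_url TEXT, content TEXT,
    vote TEXT, comments TEXT, up TEXT, down TEXT
)');

// Hypothetical helper: insert one scraped story using bound parameters.
function insertStory(PDO $pdo, array $row): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO db_fml (author, icon_url, content, vote, comments, up, down)
         VALUES (:author, :icon_url, :content, :vote, :comments, :up, :down)'
    );
    // Values are sent separately from the SQL text, so quotes need no escaping.
    $stmt->execute($row);
}

insertStory($pdo, [
    'author'   => "O'Brien",
    'icon_url' => 'avatar.jpg',
    'content'  => "It's a story with 'quotes' in it.",
    'vote'     => '12',
    'comments' => '3',
    'up'       => '15',
    'down'     => '-3',
]);

$count = (int) $pdo->query('SELECT COUNT(*) FROM db_fml')->fetchColumn();
echo "rows stored: $count\n";
```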

With that, get ---> parse ---> insert is complete. Running the PHP file adds the stories from the Qiushibaike home page to the database! Next I wanted a timer that runs the code at regular intervals. I know how to do that in Java, but not in PHP; I am still a fledgling, after all. So I searched Baidu and found the following way of writing it:

    // Timer
    // ignore_user_abort();   // Run the script in the background
    // set_time_limit(0);     // Let the script run forever
    // $interval = 30;        // Interval between runs
    // do {
    //     echo date('Y-m-d H:i:s', time());
    //     echo 'Write to database';
    //     getFmlDataToDb();
    // } while (true);

I added this code to the file just before the school network was switched off for the night and published it to SAE without testing it. I could only wait until the next day to see the result!

This morning I couldn't wait to turn on my computer and open the SAE database to look at the result.

Oh my god! I hurriedly turned off the timer and wrote a button to trigger the crawl instead. If it had carried on like that, the database would have burst!
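The timer loop as listed has no pause between iterations, which may be why the database filled up so quickly. A minimal sketch of a bounded variant that sleeps between runs and stops after a fixed number of them (runPeriodically and its parameters are hypothetical names, and the counting task only stands in for the crawl function):

```php
<?php
/**
 * Hypothetical helper: call $task at most $maxRuns times,
 * sleeping $intervalSeconds between consecutive runs.
 * Returns the number of runs performed.
 */
function runPeriodically(callable $task, int $intervalSeconds, int $maxRuns): int
{
    $runs = 0;
    while ($runs < $maxRuns) {
        $task();
        $runs++;
        if ($runs < $maxRuns) {
            sleep($intervalSeconds);   // pause before the next run
        }
    }
    return $runs;
}

// Stand-in task that only counts calls; a real task would call the crawl function.
$calls = 0;
$done = runPeriodically(function () use (&$calls) {
    $calls++;
}, 0, 3);

echo "task ran $done times\n";
```

With an interval of 1800 seconds the task would run every half hour, and the cap on the number of runs keeps an untested script from writing forever.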

OK, that is it for crawling the Qiushibaike home page with PHP.

If you found this post helpful, please give it a like!


