Today we will build a movie crawler in PHP.
To collect the data we will use simple_html_dom, a PHP library that is easy to get started with.
simple_html_dom is a PHP wrapper class that makes it easy to parse HTML documents and manipulate HTML elements (requires PHP 5 or later).
Download: https://github.com/samacs/simple_html_dom
As an example, we take the list page http://paopaotv.com/tv-type-id-5-pg-1.html on http://www.paopaotv.com, which shows titles in alphabetical (A-Z) mode, and crawl both the list data on that page and the detailed information behind each entry.
    <?php
    include_once 'simple_html_dom.php';

    // Load the list page into a simple_html_dom object
    $html = file_get_html('http://paopaotv.com/tv-type-id-5-pg-1.html');

    // In the A-Z list, each entry is a dl tag inside the div with
    // class="letter-focus-item" under id="letter-focus"; find() locates them
    $listData = $html->find("#letter-focus .letter-focus-item"); // $listData is an array

    $cate = array(); // collected details, indexed by list position
    foreach ($listData as $key => $eachRowData) {
        $filmName = $eachRowData->find("dd span", 0)->plaintext; // movie name
        $filmUrl = $eachRowData->find("dd a", 0)->href; // link to the video inside the dd tag

        // Load the detail page to get more information about the movie
        $filmInfo = file_get_html("http://paopaotv.com" . $filmUrl);
        if (!$filmInfo) {
            continue; // skip entries whose detail page could not be loaded
        }
        $filmDetail = $filmInfo->find(".info dl");
        foreach ($filmDetail as $film) {
            $info = $film->find("dd");
            $row = array();
            foreach ($info as $childInfo) {
                $row[] = $childInfo->plaintext;
            }
            $cate[$key][] = join(",", $row); // store the video's details in the array
        }
    }
In this way, simple_html_dom lets you capture both the film and television list on paopaotv.com and the detailed information for each video. From there you can go on to crawl the address information on each detail page, and then store all of the film information in a database.
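The original post does not show the database step, so the following is only a minimal sketch of how the collected values might be stored, assuming PDO with an in-memory SQLite database. The table name `films`, its columns, the helper `saveFilm()`, and the sample values are all hypothetical and not taken from the site.

```php
<?php
// Hypothetical schema: one row per film, keyed by its detail-page URL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec(
    'CREATE TABLE films (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        url TEXT NOT NULL UNIQUE,
        detail TEXT
    )'
);

// Hypothetical helper: insert a film, or replace the row if the URL was already stored.
function saveFilm(PDO $pdo, $name, $url, $detail)
{
    $stmt = $pdo->prepare(
        'INSERT OR REPLACE INTO films (name, url, detail) VALUES (:name, :url, :detail)'
    );
    $stmt->execute(array(':name' => $name, ':url' => $url, ':detail' => $detail));
}

// Made-up sample values standing in for $filmName, $filmUrl and a joined $row.
saveFilm($pdo, 'Example Film', '/example-film.html', 'Drama,2014');
saveFilm($pdo, 'Example Film', '/example-film.html', 'Drama,2014,HD'); // same URL, so the row is replaced

$rows = $pdo->query('SELECT name, url, detail FROM films')->fetchAll(PDO::FETCH_ASSOC);
```

Using a prepared statement keeps scraped text from being interpreted as SQL, and the UNIQUE constraint on `url` means that re-running the crawler updates existing rows rather than duplicating them.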
Here are the properties and methods of simple_html_dom that are used most often:
    $html = file_get_html('http://paopaotv.com/tv-type-id-5-pg-1.html');
    $e = $html->find("div", 0);
    // Tag name
    $e->tag;
    // Outer text (the element's HTML including its own tag)
    $e->outertext;
    // Inner text (the HTML inside the element)
    $e->innertext;
    // Plain text (content with tags stripped)
    $e->plaintext;
    // Child elements; pass an int index to get a single child
    $e->children();
    // Parent element
    $e->parent();
    // First child element
    $e->first_child();
    // Last child element
    $e->last_child();
    // Next sibling element
    $e->next_sibling();
    // Previous sibling element
    $e->prev_sibling();
    // Array of all a tags
    $ret = $html->find('a');
    // First a tag
    $ret = $html->find('a', 0);
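To try these properties without fetching a live page, the sketch below parses a small inline HTML string with `str_get_html()`, the library's string counterpart to `file_get_html()`. The HTML snippet is made up for illustration, and the sketch assumes `simple_html_dom.php` is on the include path.

```php
<?php
include_once 'simple_html_dom.php';

// Made-up markup: a div with two paragraph children.
$html = str_get_html('<div id="box"><p>first</p><p>second</p></div>');
$div = $html->find('div', 0);

$tag       = $div->tag;                                     // tag name of the element
$firstText = $div->first_child()->plaintext;                // text of the first <p>
$lastText  = $div->last_child()->plaintext;                 // text of the last <p>
$nextText  = $div->children(0)->next_sibling()->plaintext;  // sibling after the first <p>
$parentId  = $div->first_child()->parent()->id;             // id attribute of the enclosing div

echo $tag, "\n", $firstText, "\n", $lastText, "\n", $nextText, "\n", $parentId, "\n";
```

Note that `find()` with an index returns null when nothing matches, so on real pages it is worth checking the result before reading a property from it.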
Original: http://www.cnblogs.com/blueel/p/3756446.html