Today, let's build a small PHP movie crawler.


We'll use simple_html_dom to collect the data. It is a PHP library that is easy to get started with: this wrapper class parses HTML documents from PHP and makes it easy to locate and manipulate HTML elements (PHP 5 or later).
Project page: https://github.com/samacs/simple_html_dom
As an example, we take the A-Z alphabetical listing of http://www.paopaotv.com on the list page http://paopaotv.com/tv-type-id-5-pg-1.html, crawl the list entries on that page, and then fetch the details from each entry's own page.

```php
<?php
include_once 'simple_html_dom.php';

// Fetch the list page and parse it into a DOM object
$html = file_get_html('http://paopaotv.com/tv-type-id-5-pg-1.html');
if (!$html) {
    exit("Could not load the list page\n");
}

// In the A-Z listing, each entry is a <dl> inside div.letter-focus-item under #letter-focus;
// find() takes a CSS-style selector and returns an array of matching elements
$listData = $html->find('#letter-focus .letter-focus-item');

$cate = array();
foreach ($listData as $key => $eachRowData) {
    $filmName = $eachRowData->find('dd span', 0)->plaintext; // movie name
    $filmUrl  = $eachRowData->find('dd a', 0)->href;         // detail-page link in the <dd> tag

    // Fetch the detail page for more information about the movie
    $filmInfo = file_get_html('http://paopaotv.com' . $filmUrl);
    if (!$filmInfo) {
        continue; // skip entries whose detail page fails to load
    }
    $filmDetail = $filmInfo->find('.info dl');
    foreach ($filmDetail as $film) {
        $info = $film->find('dd');
        $row  = array();
        foreach ($info as $childInfo) {
            $row[] = $childInfo->plaintext;
        }
        $cate[$key][] = join(',', $row); // store the movie's details in the array
    }
    $filmInfo->clear(); // free the parsed detail page before the next iteration
}
```
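To try the same two-level extraction without network access or the simple_html_dom library, it can be sketched with PHP's built-in DOMDocument and DOMXPath instead. This is a swapped-in technique, not the author's code: the HTML snippets are hypothetical and only mimic the structure the comments above describe (entries in `#letter-focus .letter-focus-item`, a `dd span` holding the name, a `dd a` holding the detail link, and `.info dl` on the detail page). The `$pages` array and the `loadXPath()` and `classPred()` helpers are illustrative assumptions standing in for HTTP requests and CSS selectors.

```php
<?php
// Hypothetical pages keyed by URL; they stand in for real file_get_html() requests.
$pages = [
    'http://paopaotv.com/tv-type-id-5-pg-1.html' =>
        '<div id="letter-focus">'
        . '<div class="letter-focus-item"><dl><dd><span>Film A</span> <a href="/a.html">more</a></dd></dl></div>'
        . '<div class="letter-focus-item"><dl><dd><span>Film B</span> <a href="/b.html">more</a></dd></dl></div>'
        . '</div>',
    'http://paopaotv.com/a.html' => '<div class="info"><dl><dd>Drama</dd><dd>2013</dd></dl></div>',
    'http://paopaotv.com/b.html' => '<div class="info"><dl><dd>Comedy</dd><dd>2014</dd></dl></div>',
];

// Hypothetical helper: parse an HTML string and wrap it in a DOMXPath object.
function loadXPath(string $html): DOMXPath
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML often triggers libxml warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    return new DOMXPath($doc);
}

// Hypothetical helper: XPath predicate equivalent to the CSS class selector ".name".
function classPred(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

$cate  = [];
$names = [];
$list  = loadXPath($pages['http://paopaotv.com/tv-type-id-5-pg-1.html']);
$items = $list->query("//*[@id='letter-focus']//*[" . classPred('letter-focus-item') . "]");
foreach ($items as $key => $item) {
    $names[$key] = trim($list->query('.//dd//span', $item)->item(0)->textContent);
    $filmUrl     = $list->query('.//dd//a', $item)->item(0)->getAttribute('href');

    $detailUrl = 'http://paopaotv.com' . $filmUrl;
    if (!isset($pages[$detailUrl])) {
        continue; // a real crawler would log the failed fetch here
    }
    $detail = loadXPath($pages[$detailUrl]);
    foreach ($detail->query('//*[' . classPred('info') . ']//dl') as $dl) {
        $row = [];
        foreach ($detail->query('.//dd', $dl) as $dd) {
            $row[] = trim($dd->textContent);
        }
        $cate[$key][] = implode(',', $row); // same shape as $cate in the crawler above
    }
}

print_r($cate);
```

Real pages containing non-ASCII text may need an explicit charset declaration in the markup before `loadHTML()`, because DOMDocument otherwise assumes ISO-8859-1.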

With simple_html_dom, then, you can capture the paopaotv.com film and TV listing together with each title's details. From there you can go on to crawl the addresses on each detail page and finally store all of the film information in a database.
The simple_html_dom properties and methods used most often:
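The last step, storing the captured data in a database, could look roughly like the sketch below. It assumes the pdo_sqlite extension is available and uses an in-memory SQLite database; the `films` table, its columns and the sample rows are hypothetical, standing in for the `$filmName`, `$filmUrl` and joined detail strings the crawler collects.

```php
<?php
// In-memory SQLite keeps the sketch self-contained; a real crawler would use a persistent DSN.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema: one row per film, keyed by its detail-page URL.
$pdo->exec('CREATE TABLE films (
    id     INTEGER PRIMARY KEY AUTOINCREMENT,
    name   TEXT NOT NULL,
    url    TEXT NOT NULL UNIQUE,
    detail TEXT
)');

// Hypothetical crawler output: name, detail-page URL, joined <dd> text.
$films = [
    ['Film A', 'http://paopaotv.com/a.html', 'Drama,2013'],
    ['Film B', 'http://paopaotv.com/b.html', 'Comedy,2014'],
    ['Film A', 'http://paopaotv.com/a.html', 'Drama,2013'], // duplicate, e.g. from a re-crawl
];

// Bound parameters keep scraped text out of the SQL string itself.
// INSERT OR IGNORE (SQLite syntax) skips rows whose URL is already stored.
$stmt = $pdo->prepare('INSERT OR IGNORE INTO films (name, url, detail) VALUES (?, ?, ?)');
$pdo->beginTransaction();
foreach ($films as $film) {
    $stmt->execute($film);
}
$pdo->commit();

$count = (int) $pdo->query('SELECT COUNT(*) FROM films')->fetchColumn();
echo "Stored $count films\n"; // prints "Stored 2 films"
```

With MySQL, only the DSN passed to `new PDO()` changes, and the SQLite-specific `INSERT OR IGNORE` becomes `INSERT IGNORE`.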

```php
<?php
include_once 'simple_html_dom.php';

$html = file_get_html('http://paopaotv.com/tv-type-id-5-pg-1.html');
$e = $html->find('div', 0);         // first <div> element

$tag       = $e->tag;               // tag name
$outerText = $e->outertext;         // outer HTML (the element plus its contents)
$innerText = $e->innertext;         // inner HTML (contents only)
$plainText = $e->plaintext;         // text with tags stripped

$children  = $e->children();        // all child elements
$child     = $e->children(0);       // child element at index 0
$parent    = $e->parent();          // parent element
$first     = $e->first_child();     // first child element
$last      = $e->last_child();      // last child element
$next      = $e->next_sibling();    // next sibling element
$prev      = $e->prev_sibling();    // previous sibling element

$ret = $html->find('a');            // array of all <a> elements
$ret = $html->find('a', 0);         // first <a> element
```
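If simple_html_dom is not available, PHP's bundled DOM extension offers rough counterparts to most of these members. The mapping below is a sketch, not an official equivalence table: it assumes PHP 8.0 or later (for `firstElementChild`, `nextElementSibling` and the related properties) and uses a small hypothetical HTML snippet. `innerHtml()` is a hypothetical helper, since DOMElement has no built-in innertext property.

```php
<?php
$doc = new DOMDocument();
libxml_use_internal_errors(true); // silence libxml warnings about real-world markup
$doc->loadHTML('<div id="box"><p>One</p><p>Two <b>bold</b></p><p>Three</p></div><a href="/x.html">x</a>');
libxml_clear_errors();

// Hypothetical helper: HTML of all child nodes, roughly simple_html_dom's innertext.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$e      = $doc->getElementsByTagName('div')->item(0); // roughly $html->find('div', 0)
$second = $e->getElementsByTagName('p')->item(1);     // the second <p> inside the div

$tag       = $second->nodeName;               // tag        ("p")
$outerHtml = $doc->saveHTML($second);         // outertext
$innerHtml = innerHtml($second);              // innertext
$plainText = $second->textContent;            // plaintext  ("Two bold")
$children  = $e->childNodes;                  // children() (also includes text nodes, if any)
$parent    = $second->parentNode;             // parent()
$first     = $e->firstElementChild;           // first_child()
$last      = $e->lastElementChild;            // last_child()
$next      = $second->nextElementSibling;     // next_sibling()
$prev      = $second->previousElementSibling; // prev_sibling()
$links     = $doc->getElementsByTagName('a'); // find('a'), but a DOMNodeList rather than an array
$firstLink = $links->item(0);                 // find('a', 0)

echo $first->textContent, ' / ', $next->textContent, ' / ', $prev->textContent, "\n";
// prints "One / Three / One"
```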

Original: http://www.cnblogs.com/blueel/p/3756446.html

