How to Quickly Build a PHP Movie Crawler

Source: Internet
Author: User
Tags: foreach

Today we'll build a small movie crawler in PHP.

We'll work through a data-scraping example with simple_html_dom, a PHP library that is easy to get started with.

simple_html_dom is a PHP wrapper class that makes it easy to parse HTML documents and manipulate their elements. It requires PHP 5 or later.

Download: https://github.com/samacs/simple_html_dom

As an example, we take the list page http://paopaotv.com/tv-type-id-5-pg-1.html from http://www.paopaotv.com and scrape the list data on that page, along with the detailed information for each entry.
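The list URL ends in pg-1, which suggests that the trailing number is the page index. That is an assumption read off the URL, not something confirmed here. Under that assumption, a minimal sketch for generating the URLs of several list pages might look like this; the function name buildListPageUrls and the page count of 3 are hypothetical:

```php
<?php
// Hypothetical helper: builds list-page URLs by substituting the page number.
// Assumes the "pg-N" part of the URL is the page index; the article does not confirm this.
function buildListPageUrls($typeId, $pageCount)
{
    $urls = array();
    for ($page = 1; $page <= $pageCount; $page++) {
        $urls[] = sprintf('http://paopaotv.com/tv-type-id-%d-pg-%d.html', $typeId, $page);
    }
    return $urls;
}

// Print the URLs for the first three pages of type 5.
foreach (buildListPageUrls(5, 3) as $url) {
    echo $url, "\n";
}
```

Each generated URL could then be passed to file_get_html() in the same way as the single page used in this article.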

<?php
include_once 'simple_html_dom.php';

// Load the list page into a DOM object.
$html = file_get_html('http://paopaotv.com/tv-type-id-5-pg-1.html');

// In the A-Z list, each entry is a dl tag with class="letter-focus-item"
// inside the div with id="letter-focus". find() locates them.
$listData = $html->find('#letter-focus .letter-focus-item'); // $listData is an array of element objects

$cate = array();
foreach ($listData as $key => $eachRowData) {
    $filmName = $eachRowData->find('dd span', 0)->plaintext; // movie name
    $filmUrl  = $eachRowData->find('dd a', 0)->href;         // address of the movie, taken from the a tag under dd

    // Fetch the detail page to get more information.
    $filmInfo   = file_get_html('http://paopaotv.com' . $filmUrl);
    $filmDetail = $filmInfo->find('.info dl');
    foreach ($filmDetail as $film) {
        $info = $film->find('dd');
        $row  = array();
        foreach ($info as $childInfo) {
            $row[] = $childInfo->plaintext;
        }
        $cate[$key][] = implode(',', $row); // store the movie details in the array
    }
}

In this way, simple_html_dom lets you crawl both the list data and the detailed information for each title on paopaotv.com. From there you can go on to scrape the video address from each detail page, and then store all of the information in a database.
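The article does not show the database step. As a rough sketch of how it might be done, the following uses PDO with an in-memory SQLite database so that it runs on its own (it assumes the pdo_sqlite extension is enabled). The table name films, its columns, the function saveFilms, and the sample row are all hypothetical; in a real crawler the rows would be built from $filmName, $filmUrl, and $cate in the loop above.

```php
<?php
// Sketch only: the table name, column names, and sample row are hypothetical.
// An in-memory SQLite database keeps the example self-contained (needs pdo_sqlite).
function saveFilms(PDO $pdo, array $films)
{
    $pdo->exec('CREATE TABLE IF NOT EXISTS films (
        name TEXT NOT NULL,
        url TEXT NOT NULL UNIQUE,
        detail TEXT
    )');

    // A prepared statement keeps scraped text from being interpreted as SQL.
    // INSERT OR REPLACE is SQLite syntax: a re-crawled URL overwrites its old row.
    $stmt = $pdo->prepare(
        'INSERT OR REPLACE INTO films (name, url, detail) VALUES (:name, :url, :detail)'
    );
    foreach ($films as $film) {
        $stmt->execute(array(
            ':name'   => $film['name'],
            ':url'    => $film['url'],
            ':detail' => $film['detail'],
        ));
    }

    // Return the number of rows now stored.
    return (int) $pdo->query('SELECT COUNT(*) FROM films')->fetchColumn();
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Placeholder data standing in for scraped values.
$films = array(
    array('name' => 'Example Film', 'url' => '/example-1.html', 'detail' => 'detail one,detail two'),
);
$count = saveFilms($pdo, $films);
echo "Stored rows: $count\n";
```

Making the url column unique is one possible design choice: it lets the crawler be re-run without piling up duplicate rows. For MySQL or another database, the DSN and the upsert statement would need to change.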

The following are the common properties and methods of simple_html_dom:

$html = file_get_html('http://paopaotv.com/tv-type-id-5-pg-1.html');
$e = $html->find('div', 0);
// Tag name
$e->tag;
// Outer text (the element's HTML, including its own tag)
$e->outertext;
// Inner text (the HTML inside the element)
$e->innertext;
// Plain text
$e->plaintext;
// Child elements
$e->children(); // or $e->children($index) for a single child
// Parent element
$e->parent();
// First child element
$e->first_child();
// Last child element
$e->last_child();
// Next sibling element
$e->next_sibling();
// Previous sibling element
$e->prev_sibling();
// Array of all a tags
$ret = $html->find('a');
// The first a tag
$ret = $html->find('a', 0);

For more usage, refer to the official manual.

Simple, isn't it? If you have any questions, feel free to get in touch and discuss.
