How to Crawl BT Paradise Movie Data

Source: Internet
Author: User
After getting off work one evening, I wanted to find a couple of good movies to watch.

I searched for a long time and found nothing I wanted to see.

Then I remembered that someone had once crawled user data, and on a whim

I decided to crawl the BT Paradise movie information, so that next time I can search the database directly.

Admittedly I just had too much time on my hands, haha, but it is also good coding practice ^_^


1. Fetch the website's HTML source

$url = "www.bttiantang.cc";
$html = shell_exec("curl $url");
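Shelling out to curl works, but the same fetch can also be sketched with PHP's cURL extension, which avoids passing the URL through a shell. This is a minimal sketch, not the author's code: the helper names `buildPageUrl` and `fetchHtml`, the `http://` scheme, and the 10-second timeout are all assumptions; only the `pageno` parameter comes from the attached source.

```php
<?php
// Hypothetical helper: build the listing URL, optionally for a given page number.
// The "pageno" query parameter follows the attached source; the scheme is assumed.
function buildPageUrl(string $host, ?int $page = null): string
{
    $base = 'http://' . $host . '/';
    if ($page === null) {
        return $base;
    }
    return $base . '?' . http_build_query(['pageno' => $page]);
}

// Hypothetical helper: fetch a page with the cURL extension instead of shell_exec.
function fetchHtml(string $url, int $timeoutSeconds = 10): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Fetch failed: ' . curl_error($ch));
    }
    return $body;
}
```

With these helpers, the first step would read `$html = fetchHtml(buildPageUrl('www.bttiantang.cc'));`, and a failed request raises an exception instead of silently returning an empty string.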

2. Get the total number of pages and movies (regex match)

preg_match("/.*?<\/span>/", $html, $pageCount);
preg_match_all("/\d{1,10000}/", $pageCount[0], $pageCount);
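The two-stage match above (isolate the pager element, then pull every run of digits out of it) can be tried on a small sample. The sketch below is only illustrative: the `pageinfo` class name, the sample markup, and the function name `parsePageTotals` are assumptions, since the real page's tags are not shown here.

```php
<?php
// Hypothetical parser for the pager element; the class name is an assumption.
function parsePageTotals(string $html): ?array
{
    // Stage 1: isolate the <span> that holds the pager text.
    if (!preg_match('/<span class="pageinfo">.*?<\/span>/s', $html, $span)) {
        return null;
    }
    // Stage 2: collect every run of digits; first = pages, second = movies.
    preg_match_all('/\d+/', $span[0], $numbers);
    if (count($numbers[0]) < 2) {
        return null;
    }
    return [
        'pages'  => (int) $numbers[0][0],
        'movies' => (int) $numbers[0][1],
    ];
}

// Made-up sample markup, used only to exercise the function.
$sample = '<div><span class="pageinfo">Total <strong>3</strong> pages, '
        . '<strong>57</strong> movies</span></div>';
$totals = parsePageTotals($sample);
```

Returning `null` when the pager is missing makes a layout change on the site visible immediately, rather than letting the crawl loop run zero times without explanation.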

3. Capture movie information (regex match)

preg_match("/\d{4}\/\d{2}\/\d{2}/", $pageInfo[0][$i], $updateTime);
preg_match("/(.*?)/", $pageInfo[0][$i], $movieName);
preg_match("/(\d{1})<\/strong>/", $pageInfo[0][$i], $movieScore_int);
preg_match("/(\d{1})<\/em>/", $pageInfo[0][$i], $movieScore_decimal);
preg_match("/href=\"(.*?)\"/", $pageInfo[0][$i], $movieUrl);
preg_match("/(.*?)<\/p>/", $pageInfo[0][$i], $actor);
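To make the field extraction easier to follow, here is a minimal sketch that applies one regex per field to a single listing entry and assembles the score from its integer and decimal digits. The sample markup, the tag and class names in the patterns, and the function name `parseMovieItem` are assumptions for illustration; the real page's markup may differ.

```php
<?php
// Hypothetical parser for one listing entry; tag and class names are assumptions.
function parseMovieItem(string $item): array
{
    // Return capture group 1 of the first match, or null when nothing matches.
    $first = function (string $pattern) use ($item): ?string {
        return preg_match($pattern, $item, $m) ? trim($m[1]) : null;
    };

    $int = $first('/<strong>(\d)<\/strong>/'); // integer part of the score
    $dec = $first('/<em>(\d)<\/em>/');         // decimal part of the score

    return [
        'update_ts' => $first('/(\d{4}\/\d{2}\/\d{2})/'),
        'name'      => $first('/<b>(.*?)<\/b>/s'),
        'url'       => $first('/href="(.*?)"/'),
        'actor'     => $first('/<p class="des">(.*?)<\/p>/s'),
        'score'     => ($int !== null && $dec !== null)
            ? (float) ($int . '.' . $dec)
            : null,
    ];
}

// Made-up entry, used only to exercise the function.
$item = '<div class="item"><a href="/subject/1.html"><b>Example Movie</b></a>'
      . '<span>2015/08/01</span><strong>7</strong><em>5</em>'
      . '<p class="des">Actor One / Actor Two</p></div>';
$movie = parseMovieItem($item);
```

Fields that fail to match come back as `null`, so a missing score or actor list can be detected before the row reaches the database.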


4. Insert into the database, done
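The attached source builds its INSERT by string interpolation with the old `mysql_*` functions, which were removed in PHP 7. As a hedged alternative, the sketch below uses PDO with a prepared statement, so a quote inside a movie title cannot break the SQL. The function name `insertMovies` and the array keys are assumptions; the table and column names follow the attached source.

```php
<?php
// Hypothetical helper: insert parsed movie rows using a PDO prepared statement.
// Each $movie is assumed to be an array with the keys used below.
function insertMovies(PDO $pdo, array $movies): int
{
    $stmt = $pdo->prepare(
        'INSERT INTO movie (name, actor, url, update_ts, score)
         VALUES (:name, :actor, :url, :update_ts, :score)'
    );

    $inserted = 0;
    $pdo->beginTransaction(); // a single transaction keeps bulk inserts fast
    foreach ($movies as $movie) {
        $stmt->execute([
            ':name'      => $movie['name'],
            ':actor'     => $movie['actor'],
            ':url'       => $movie['url'],
            ':update_ts' => $movie['update_ts'],
            ':score'     => $movie['score'],
        ]);
        $inserted += $stmt->rowCount();
    }
    $pdo->commit();

    return $inserted;
}
```

For MySQL, the connection would be created with something like `new PDO('mysql:host=***;dbname=***;charset=utf8', '***', '***')`, with the masked values replaced by real credentials.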

Overall, PHP crawls quite fast: in under 4 minutes it collected more than 20,000 records.

Start: 01:22:54

End: 01:26:11
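As a quick check of the timing, the two clock readings can be converted to seconds and subtracted. The helper name `clockToSeconds` is hypothetical, and the rate is only a lower bound because the record count is given as "more than" 20,000.

```php
<?php
// Convert an HH:MM:SS clock time to seconds since midnight.
function clockToSeconds(string $clock): int
{
    [$h, $m, $s] = array_map('intval', explode(':', $clock));
    return $h * 3600 + $m * 60 + $s;
}

// 01:22:54 to 01:26:11 is 3 minutes 17 seconds.
$elapsed = clockToSeconds('01:26:11') - clockToSeconds('01:22:54'); // 197

// Lower bound on throughput, since more than 20,000 records were collected.
$minRate = 20000 / $elapsed;

printf("%d s elapsed, at least %.1f records/s\n", $elapsed, $minRate);
```

So the crawl took 197 seconds, which works out to roughly 100 records per second or more.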



Attached database:



Attached source code:

    <?php
    $url = "www.bttiantang.cc";
    $html = shell_exec("curl $url");

    // Total number of pages and total number of movies
    preg_match("/.*?<\/span>/", $html, $pageCount);
    preg_match_all("/\d{1,10000}/", $pageCount[0], $pageCount);
    $pageSize = intval($pageCount[0][0]);
    $movieCount = $pageCount[0][1];

    $conn = mysql_connect('***', '***', '***');
    mysql_select_db('***', $conn);
    mysql_query('set names utf8', $conn);

    for ($j = 1; $j <= $pageSize; $j++) {
        $movieHtml = shell_exec("curl $url?pageno=$j");
        preg_match_all("/.*?<\/div>/s", $movieHtml, $pageInfo);

        // The loop header and the $updateTime lines are truncated in the source;
        // count($pageInfo[0]) and $updateTime[0] are assumptions based on step 3.
        for ($i = 0; $i < count($pageInfo[0]); $i++) {
            preg_match("/\d{4}\/\d{2}\/\d{2}/", $pageInfo[0][$i], $updateTime);
            $updateTime = $updateTime[0];

            preg_match("/(.*?)/", $pageInfo[0][$i], $movieName);
            /***** same conditions *****/
            if (empty($movieName)) preg_match("/(.*?)/", $pageInfo[0][$i], $movieName);
            if (empty($movieName)) preg_match("/(.*?)<\/b>/", $pageInfo[0][$i], $movieName);
            /************************/
            $movieName = $movieName[1];

            preg_match("/(\d{1})<\/strong>/", $pageInfo[0][$i], $movieScore_int);
            $movieScore_int = $movieScore_int[1];
            preg_match("/(\d{1})<\/em>/", $pageInfo[0][$i], $movieScore_decimal);
            $movieScore_decimal = $movieScore_decimal[1];
            $movieScore = floatval($movieScore_int . '.' . $movieScore_decimal);

            preg_match("/href=\"(.*?)\"/", $pageInfo[0][$i], $movieUrl);
            $movieUrl = $movieUrl[1];

            preg_match("/(.*?)<\/p>/", $pageInfo[0][$i], $actor);
            $movieActor = str_replace("", "", str_replace(" ", "", $actor[1]));

            mysql_unbuffered_query("INSERT INTO movie (name,actor,url,update_ts,score) VALUES ('$movieName', '$movieActor', '$movieUrl', '$updateTime', '$movieScore')");
        }
    }
    ?>

This movie information was crawled from BT Paradise and does not involve any confidential information, so I do not accept any legal responsibility for it!

If any of the film information affects your copyright, intellectual property, or other interests, please let me know; it will be deleted as soon as possible once confirmed.

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
