I had the night off and wanted to find a couple of good movies to watch.
After looking for a long while I still had not found anything I wanted to see.
Then I remembered that I had once crawled some user data, and on a whim
decided to crawl the movie information from BT Paradise and store it, so that next time I can go straight to the database.
Admittedly this is what boredom does to you, haha, but it is also a chance to practice writing code ^_^
1. Fetch the site's HTML source
$url = "www.bttiantang.cc";
$html = shell_exec("curl $url");
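Shelling out to curl works, but PHP's curl extension can do the same job without starting a process per request, and it makes failures visible. A minimal sketch, assuming the curl extension is installed; the function name fetchHtml and the timeout value are illustrative and not part of the original script:

```php
<?php
// Hypothetical helper (not from the original script): fetch a page with
// PHP's curl extension instead of shell_exec("curl ...").
function fetchHtml(string $url, int $timeoutSeconds = 10): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow redirects
        CURLOPT_TIMEOUT        => $timeoutSeconds, // give up on a hanging request
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $body;
}
```

Calling fetchHtml("www.bttiantang.cc") would then take the place of the shell_exec() call.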
2. Get the total number of pages and the total number of movies (regular expression match)
preg_match("/.*?<\/span>/", $html, $pageCount);
preg_match_all("/\d{1,10000}/", $pageCount[0], $pageCount);
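The two-step match (isolate the pagination element, then pull every number out of it) can be tried offline on a sample string. This is only a sketch: the span's class name and the sample text are assumptions, since the real page markup is not shown in this post, and parsePageInfo is a hypothetical name.

```php
<?php
// Sketch of step 2 on made-up markup. The class name "pageinfo" and the
// sample text are assumptions, not the real BT Paradise HTML.
function parsePageInfo(string $html): ?array
{
    // First isolate the pagination element...
    if (!preg_match('/<span class="pageinfo">.*?<\/span>/s', $html, $span)) {
        return null;
    }
    // ...then collect every run of digits inside it.
    preg_match_all('/\d+/', $span[0], $numbers);
    if (count($numbers[0]) < 2) {
        return null;
    }
    return [
        'pages'  => (int) $numbers[0][0],
        'movies' => (int) $numbers[0][1],
    ];
}

$sample = '<div><span class="pageinfo">Total 1024 pages, 20480 movies</span></div>';
print_r(parsePageInfo($sample)); // pages => 1024, movies => 20480
```

Returning null when the element is missing keeps a layout change on the site from turning into an undefined-index error later in the loop.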
3. Extract the movie information (regular expression matches)
preg_match("/\d{4}\/\d{2}\/\d{2}/", $pageInfo[0][$i], $updateTime);
preg_match("/(.*?)/", $pageInfo[0][$i], $movieName);
preg_match("/(\d{1})<\/strong>/", $pageInfo[0][$i], $movieScore_int);
preg_match("/(\d{1})<\/em>/", $pageInfo[0][$i], $movieScore_decimal);
preg_match("/href=\"(.*?)\"/", $pageInfo[0][$i], $movieUrl);
preg_match("/(.*?)<\/p>/", $pageInfo[0][$i], $actor);
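To see how the individual matches combine into one record, here is a small sketch that runs the same kind of patterns against a single made-up listing block. The HTML in $sample is an assumption (only the closing strong, em, b and p tags appear in this post), and parseMovie is a hypothetical name.

```php
<?php
// Sketch of step 3 on a made-up listing block; the markup is an assumption.
function parseMovie(string $block): array
{
    // Return capture group 1 of the pattern, or null when it does not match.
    $field = function (string $pattern) use ($block): ?string {
        return preg_match($pattern, $block, $m) ? $m[1] : null;
    };

    $int = $field('/<strong>(\d)<\/strong>/'); // integer part of the score
    $dec = $field('/<em>(\d)<\/em>/');         // decimal part of the score

    return [
        'name'    => $field('/<b>(.*?)<\/b>/'),
        'actor'   => $field('/<p>(.*?)<\/p>/'),
        'url'     => $field('/href="(.*?)"/'),
        'updated' => $field('/(\d{4}\/\d{2}\/\d{2})/'),
        'score'   => ($int !== null && $dec !== null)
            ? (float) ($int . '.' . $dec)
            : null,
    ];
}

$sample = '<div class="item"><a href="/subject/12345.html"><b>Example Movie</b></a>'
        . ' 2015/08/20 <strong>8</strong><em>5</em><p>Actor A / Actor B</p></div>';
print_r(parseMovie($sample));
```

Fields that fail to match come back as null, so an incomplete listing can be skipped or logged before it reaches the database.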
4. Insert into the database, and done.
Overall, PHP crawls pretty quickly: in under 4 minutes it collected more than 20,000 records.
Start: 01:22:54
End: 01:26:11
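As a quick check on those numbers, the elapsed time and a rough throughput can be worked out from the two timestamps. The figure of 20,000 is only the lower bound stated above, so the rate is a lower bound too.

```php
<?php
// Elapsed time between the two timestamps above, pinned to UTC so that
// daylight-saving changes cannot affect the subtraction.
$utc   = new DateTimeZone('UTC');
$start = new DateTimeImmutable('01:22:54', $utc);
$end   = new DateTimeImmutable('01:26:11', $utc);

$elapsed = $end->getTimestamp() - $start->getTimestamp();
$records = 20000; // lower bound: "more than 20,000 records"

printf("%d s elapsed, at least %d records/s\n", $elapsed, intdiv($records, $elapsed));
// prints "197 s elapsed, at least 101 records/s"
```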
Attached database:
Attached source:
<?php
// Written for PHP 5: the mysql_* functions used here were removed in PHP 7.
$url = "www.bttiantang.cc";
$html = shell_exec("curl $url");

// Total number of pages and total number of movies.
preg_match("/.*?<\/span>/", $html, $pageCount);
preg_match_all("/\d{1,10000}/", $pageCount[0], $pageCount);
$pageSize = intval($pageCount[0][0]);
$movieCount = $pageCount[0][1];

$conn = mysql_connect('***', '***', '***');
mysql_select_db('***', $conn);
mysql_query('set names utf8', $conn);

for ($j = 1; $j <= $pageSize; $j++) {
    $movieHtml = shell_exec("curl $url?pageno=$j");
    preg_match_all("/.*?<\/div>/s", $movieHtml, $pageInfo);
    // The source text is cut off after "for ($i = 0; $i". The loop bound and
    // the update-time match below are assumed, following step 3.
    for ($i = 0; $i < count($pageInfo[0]); $i++) {
        preg_match("/\d{4}\/\d{2}\/\d{2}/", $pageInfo[0][$i], $updateTime);
        $updateTime = $updateTime[0];
        preg_match("/(.*?)/", $pageInfo[0][$i], $movieName);
        /***** same conditions *****/
        if (empty($movieName)) preg_match("/(.*?)/", $pageInfo[0][$i], $movieName);
        if (empty($movieName)) preg_match("/(.*?)<\/b>/", $pageInfo[0][$i], $movieName);
        /************************/
        $movieName = $movieName[1];
        preg_match("/(\d{1})<\/strong>/", $pageInfo[0][$i], $movieScore_int);
        $movieScore_int = $movieScore_int[1];
        preg_match("/(\d{1})<\/em>/", $pageInfo[0][$i], $movieScore_decimal);
        $movieScore_decimal = $movieScore_decimal[1];
        $movieScore = floatval($movieScore_int . '.' . $movieScore_decimal);
        preg_match("/href=\"(.*?)\"/", $pageInfo[0][$i], $movieUrl);
        $movieUrl = $movieUrl[1];
        preg_match("/(.*?)<\/p>/", $pageInfo[0][$i], $actor);
        // The search strings (HTML tags) of both str_replace calls are missing from the source.
        $movieActor = str_replace("", "", str_replace("", "", $actor[1]));
        mysql_unbuffered_query("INSERT INTO movie (name, actor, url, update_ts, score) VALUES ('$movieName', '$movieActor', '$movieUrl', '$updateTime', '$movieScore')");
    }
}
?>
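The INSERT in the source builds its SQL by interpolating the scraped strings, so a movie title containing a single quote would break the statement. One way around that is a prepared statement through PDO, which is a different API from the mysql_* functions the script uses. The sketch below is an alternative under that assumption, not the author's code; the column names follow the INSERT in the source, and saveMovie is a hypothetical name.

```php
<?php
// Sketch: insert one movie row with a PDO prepared statement.
// Column names follow the INSERT in the post; everything else is illustrative.
function saveMovie(PDO $pdo, array $movie): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO movie (name, actor, url, update_ts, score)
         VALUES (:name, :actor, :url, :update_ts, :score)'
    );
    // Values are bound as parameters, so quotes in titles need no escaping.
    $stmt->execute([
        ':name'      => $movie['name'],
        ':actor'     => $movie['actor'],
        ':url'       => $movie['url'],
        ':update_ts' => $movie['update_ts'],
        ':score'     => $movie['score'],
    ]);
}
```

With MySQL the connection would be created along the lines of new PDO('mysql:host=...;dbname=...;charset=utf8', $user, $password), where the host, database name and credentials are whatever the crawler's own setup uses.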
This movie information was crawled from BT Paradise and does not involve any confidential information, so I accept no legal responsibility for it.
If any of this movie information touches on your copyright, intellectual property, or other interests, please let me know; once confirmed, it will be deleted as soon as possible.
Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission.