Introduction to PHP web collection (scraping): a tutorial that teaches you how to write a collector
Our first step is to collect all the links. This is not a simple collection of a single article, oh no: we are going to collect a whole book and save it to a text file, because MP3 players are everywhere now and they can display e-books.
How should a book be saved? Under its title, of course, so it is easy to find. So we first collect the title of the book.
Let's take a look at the original markup:
<meta name="description" content="Wu Xian (ii), after Jin Yong martial Arts Bible: 2">
The rule is:
<meta name="description" content="title">
Now let's write the regular expression. Don't tell me you can't; if you can't, come to Hunan, hey hey, there are plenty of experts here.
Regular expression:
<meta name="description" content="(.*?)">
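If you want to try the expression before touching the real site, here is a minimal sketch. It runs the pattern against a hard-coded sample string ($html is a made-up stand-in for a downloaded page, not the real page source), so it works without a network connection.

```php
<?php
// Hypothetical sample markup, modelled on the meta tag shown above.
$html = '<html><head><meta name="description" content="Wu Xian (ii), after Jin Yong martial Arts Bible: 2"></head></html>';

// (.*?) is non-greedy, so it stops at the first closing quote followed by ">".
// Modifier i ignores case; modifier s lets "." match line breaks as well.
$pattern = '/<meta name="description" content="(.*?)">/is';

if (preg_match($pattern, $html, $m)) {
    $bookname = $m[1]; // index 1 holds the first capture group, i.e. the title
    echo $bookname, "\n";
} else {
    echo "No title found\n";
}
```

Index 0 of the match array holds the whole matched tag; the title itself is always at index 1.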
Here we go! To fetch the resource, the first thing we need is a function:
file_get_contents()
Introduction:
Main function: reads an entire file into a string
The prototype is: string file_get_contents
(string filename [, bool use_include_path [, resource context [, int offset [, int maxlen]]]])
What does that mean? Simply that it reads the whole contents of the specified resource (a local file or a URL) and returns them as a string, which you assign to a variable.
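A small sketch of file_get_contents() in action. To keep it independent of the network it reads a temporary file created on the spot (the file name and its contents are made up for this example). The same call also accepts a URL when the allow_url_fopen setting is enabled.

```php
<?php
// Create a small temporary file so the example does not need the network.
$path = tempnam(sys_get_temp_dir(), 'fgc');
file_put_contents($path, "<title>demo page</title>\n");

// Read the whole file into one string.
$r = file_get_contents($path);
echo strlen($r), " bytes read\n";

// On failure the function returns false, so compare with === before using the result.
// The @ only silences the warning for the deliberately missing file.
$missing = @file_get_contents($path . '.does-not-exist');
var_dump($missing === false);

// Clean up the temporary file.
unlink($path);
```

Checking for false matters in a collector, because a single chapter page that fails to download would otherwise be processed as if it were empty text.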
That is all we need to get started. Once we understand a little we can start writing, and writing is what gives a deeper understanding and makes things stick, so I will analyze as I write the program:
We are collecting from an address, and not just for one book, so our collection address changes. What do we use for something that changes? At this point a huge piece of chalk came flying over: "Didn't I tell you? A variable!" The strict teacher Wangjianjun put all the strength in his body into that chalk and threw it at me mercilessly. I want to cry... The teacher hit me!!! Everyone come and see!
A variable it is, then. We store the address in one, like this:
$url = "http://book.sina.com.cn/nzt/lit/zhuxian2/index.shtml"; // Book address
With the above, you should be able to write the whole thing now. Here is the code:
<?php
//****************************************************************
$url = "http://book.sina.com.cn/nzt/lit/zhuxian2/index.shtml"; // Book address
$ver = "old"; // Page version: "old" or "new"
// The site has two kinds of book pages, so we have to tell them apart here.
//****************************************************************
// Get the page source: file_get_contents() reads the page into a string, which is needed further down.
$r = file_get_contents($url);
// Search that string for the title and put the result in $booktitle, which is an array. Let's get started!
preg_match('/<meta name="description" content="(.*?)">/is', $r, $booktitle);
// Index 1 holds the captured title; assign it to $bookname.
$bookname = $booktitle[1]; // Title
// print_r($booktitle); die(); // If you do not understand, uncomment this and look at the output, hey, it will help.
/*************************************************************************************
* Prototype: <li><a href=/nzt/lit/zhuxian2/1.shtml target=_blank class=a03>45th Chapter Pain (1)</a>
* The rule is: <li><a href=(not fixed).shtml target=_blank class=a03>(not fixed)</a>
* isU are the pattern modifiers: i ignores case, s lets "." match line breaks, U makes the pattern not greedy, that is, each match ends as early as it can
*************************************************************************************/
$preg = '/<li><a href=(.*)\.shtml target=_blank class=a03>/isU';
/********************************************************************************
* preg_match_all performs a global regular expression match.
* Prototype:
*
* int preg_match_all
*
* (string pattern, string subject, array matches [, int flags])
* Meaning: search the whole of $r with the pattern $preg and store every match in $zj, which is an array.
* After that, read the results by index. If you do not know arrays, go and look them up!
* Teacher Wang says: anyone who cannot use arrays goes outside to chew on the textbook and comes back in once they can.
**********************************************************************************/
preg_match_all($preg, $r, $zj);
// print_r($zj); die(); // If you do not understand, uncomment this and look at the output, hey, it will help.
// Count the chapters; the final message uses this to show how many chapters there are and how many have been collected.
$bookzj = count($zj[1]);
// Decide which kind of page you are collecting, because the body text starts differently. This can be detected automatically; I wrote that too, but I am not publishing it because it is very simple.
// The markers must match the comments in the page source exactly.
if ($ver == "new") {
    $content_start = "<!--the contents of the text began-->";
    $content_end = "<!--body content end-->";
}
if ($ver == "old") {
    $content_start = "</table><!--newszw_hzh_end-->";
    $content_end = "<br>";
}
// After a page is collected it still has to be processed. This sets the encoding. Why gb2312? Look at the site's source code, HEY!!!
header("content-type:text/html;charset=gb2312");
/*****************************************************************************************
* Merge pages 1 to 136 in one go. This is the most fun part... First put in a copyright line, in case someone infringes, hey, as if I were not the one infringing!!!
* So-and-so will surely want to kill me. In other words: write a copyright header, which also creates the file.
*****************************************************************************************/
writer($bookname . " has " . $bookzj . " sections in total\r\nCollected and compiled by handsome Liu on " . date("D M j G:i:s T Y") . " for a graduation project\r\n", "./ljy/" . $bookname . ".txt", "w+");
for ($i = 0; $i < $bookzj; $i++) { // Hint: $bookzj was computed above.
    // echo "http://book.sina.com.cn" . $zj[1][$i] . ".shtml"; die();
    $str = file_get_contents("http://book.sina.com.cn" . $zj[1][$i] . ".shtml");
    preg_match('/(<title>)(.*?)(<\/title>)/is', $str, $title);
    $title = str_replace("_ Reading Channel _ Sina Net", "", preg_replace('/<(.*?)>/s', "", $title[2]));
    /***************************************************************************
    * preg_replace performs a regular expression search and replace.
    * str_replace is hard to explain in words, so see the example! It is simply a replacement (shown here in JavaScript style):
    * str = "abcabc".replace(/a/g, "D"); The result is DbcDbc
    * str = "abcabc".replace(/a/, "D"); The result is Dbcabc
    ***************************************************************************/
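    // preg_quote() escapes the "/" and other special characters inside the markers.
    preg_match("/(" . preg_quote($content_start, "/") . ")(.*?)(" . preg_quote($content_end, "/") . ")/is", $str, $content);
    // Turn paragraph ends into line breaks, then strip all remaining tags.
    $content = preg_replace('/<(.*?)>/s', "", str_replace("</p>", "\r\n", $content[2]));
    // Drop a leading blank line, then remove the line-break and "?" filler characters.
    $content = str_replace("\n", "", preg_replace('/^[\s]*\n/is', "", $content));
    $content = str_replace("?", "", preg_replace('/^[\s]*\n/is', "", $content));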
    $result = "\r\n" . ($i + 1) . " Section--------" . $title . "_ Mr. Wang is handsome---------\r\n" . $content;
    // var_dump($result); die();
    // Append the chapter to the same file that received the header.
    writer($result, "./ljy/" . $bookname . ".txt", "a+");
    echo "Novel " . $bookname . ": " . $bookzj . " sections altogether, now sorted up to section " . ($i + 1) . "_" . $title . "<br>";
}
echo "Novel " . $bookname . ": all " . $bookzj . " sections have been sorted out!";
// Write $content to the file $url, opened in $mode ("w+" creates or truncates, "a+" appends).
function writer($content, $url, $mode)
{
    $fp = fopen($url, $mode);
    fwrite($fp, $content);
    fclose($fp);
}
?>
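To experiment with the two extraction steps without hitting the site, the sketch below runs them on made-up sample strings. The helper names extract_chapter_paths() and extract_between() are hypothetical, invented for this example, and the sample HTML only imitates the "old" page layout. It assumes PHP 7.1 or later because of the type declarations.

```php
<?php
// Return the chapter paths (without ".shtml") found in an index page.
function extract_chapter_paths(string $html): array
{
    // U makes (.*) ungreedy, so each match stops at the first ".shtml".
    preg_match_all('/<li><a href=(.*)\.shtml target=_blank class=a03>/isU', $html, $m);
    return $m[1];
}

// Return the text between two literal markers, or null if they are not found.
function extract_between(string $html, string $start, string $end): ?string
{
    // preg_quote() escapes regex metacharacters and the "/" delimiter in the markers.
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/is';
    return preg_match($pattern, $html, $m) ? $m[1] : null;
}

// Made-up index page with two chapter links.
$index = "<li><a href=/nzt/lit/zhuxian2/1.shtml target=_blank class=a03>One</a>\n"
       . "<li><a href=/nzt/lit/zhuxian2/2.shtml target=_blank class=a03>Two</a>\n";
$paths = extract_chapter_paths($index);
print_r($paths);

// Made-up chapter page using the "old" layout markers.
$page = "<table></table><!--newszw_hzh_end--><p>First paragraph.</p><br>footer";
$body = extract_between($page, "</table><!--newszw_hzh_end-->", "<br>");

// Turn paragraph ends into line breaks, then strip the remaining tags.
$text = preg_replace('/<(.*?)>/s', '', str_replace('</p>', "\r\n", $body));
echo $text;
```

Quoting the markers is the important design choice here: the "old" start marker contains a "/", which would otherwise end the pattern early when "/" is used as the delimiter.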