Tutorial: developing a PHP content-collection program with file_get_contents

Source: Internet
Author: User
Tags: preg, regular expression


file_get_contents() reads a file into a string; given a URL, it retrieves the content of a remote page.
preg_match_all() performs a global regular expression match and returns every match.
preg_match() performs a regular expression match and stops after the first match.
preg_replace() performs a regular expression search and replace; it is used here to filter the collected content.
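As a quick illustration of how the three preg functions differ, the sketch below runs them against a small inline HTML string rather than a remote page. The sample markup and the variable names are made up for this example; in a real collector the string would come from file_get_contents().

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$html = "<p>First</p>\n<p>Second</p>";

// preg_match() stops after the first match.
preg_match('|<p>(.*)</p>|iUs', $html, $first);
// $first[1] holds the text captured by the first group.

// preg_match_all() collects every match.
preg_match_all('|<p>(.*)</p>|iUs', $html, $all);
// $all[1] is the list of captured texts.

// preg_replace() rewrites every match; here it strips the <p> tags.
$plain = preg_replace('|</?p>|i', '', $html);

var_dump($first[1], $all[1], $plain);
```

The same pattern is passed to preg_match() and preg_match_all() on purpose, so the only difference in the results is how many matches each function returns.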

Procedure

Step 1: Collect a single list page and a single article
Before collecting lists and articles in batches, first collect one list page of the site and the content of one article, and use them to test the regular expressions.
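One way to test a pattern before going to batch collection is to run it against a saved sample and check how many matches it finds. The sketch below is only an illustration: the helper name count_matches() and the sample markup are hypothetical, not part of the original program.

```php
<?php
// Hypothetical helper: count how many times $pattern matches $sample.
// preg_match_all() returns the number of full matches, or false on error.
function count_matches(string $pattern, string $sample): int
{
    $n = preg_match_all($pattern, $sample, $unused);
    if ($n === false) {
        throw new RuntimeException('Regex failed, error code ' . preg_last_error());
    }
    return $n;
}

// Made-up sample standing in for a saved copy of a list page.
$sample = "<a href='/a.shtml'>A</a> <a href='/b.shtml'>B</a>";
echo count_matches("|<a href='(.*)'>(.*)</a>|iUs", $sample), "\n";
```

If the count differs from the number of links visible on the page, the pattern needs adjusting before it is used on many pages.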

Collecting the article links from the list page:

The code is as follows:

<?php
// Fetch the list page.
// Note: the scheme and host are missing from this URL in the source; prefix it with the site's address.
$url = '/s2005/shishi.shtml';
$con = file_get_contents($url);
// Extract the article links from the list with a regular expression.
/* Example: <a test=a href='/20130418/n373177942.shtml' target='_blank'>a total of 6 people were killed in the sinking accident of the Fuling power transmission project in Hunan</a> */
$preg = "|<a test=a href='(.*)' target='_blank'>(.*)</a>|iUs";
// Modifiers: i = case-insensitive, U = ungreedy (non-greedy) matching, s = the dot also matches line breaks.
preg_match_all($preg, $con, $arr);
// var_dump($arr);
/*
Output (abridged):
array(3) {
  [0]=>
  array(40) {
    [0]=>
    string(126) "<a test=a href='/20130418/n373180618.shtml' target='_blank'>The Hexi Corridor in Gansu province was attacked by strong winds and dust, and the instantaneous maximum wind power reached 9 levels</a>"
    [1]=>
    string(112) "<a test=a href='/20130418/n373180612.shtml' target='_blank'>all residential land prices in first-tier cities rose by month</a>"
    ......
    [39]=>
    string(124) "<a test=a href='/20130418/n372131633.shtml' target='_blank'>a shooting incident in Hengyang, Hunan province resulted in the arrest of one dead policeman.</a>"
  }
  [1]=>
  array(40) {
    [0]=>
    string(46) "/20130418/n373180618.shtml"
    [1]=>
    string(46) "/20130418/n373180612.shtml"
    ......
    [39]=>
    string(46) "/20130418/n372131633.shtml"
  }
  [2]=>
  array(40) {
    [0]=>
    string(42) "The Hexi Corridor in Gansu province was attacked by strong winds and dust, and the instantaneous maximum wind power reached 9 levels"
    [1]=>
    string(28) "all residential land prices in first-tier cities rose by month"
    ......
    [39]=>
    string(40) "......"
  }
}
*/
?>
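The result above keeps links and titles in two parallel arrays. As a sketch of one possible follow-up step, the example below pairs each link with its title and turns the relative paths into absolute URLs. The base URL http://example.com, the sample markup, and the $articles structure are assumptions made for illustration; they do not come from the original program.

```php
<?php
// Hypothetical base URL; replace it with the real site's scheme and host.
$base = 'http://example.com';

// Made-up list markup in the same shape as the links matched above.
$con = "<a test=a href='/20130418/n1.shtml' target='_blank'>Title one</a>\n"
     . "<a test=a href='/20130418/n2.shtml' target='_blank'>Title two</a>";

$preg = "|<a test=a href='(.*)' target='_blank'>(.*)</a>|iUs";

// PREG_SET_ORDER groups the results per match: $m[1] is the link, $m[2] the title.
preg_match_all($preg, $con, $matches, PREG_SET_ORDER);

$articles = [];
foreach ($matches as $m) {
    $articles[] = [
        'url'   => $base . $m[1],
        'title' => trim($m[2]),
    ];
}

print_r($articles);
```

Grouping by match keeps each URL next to its title, which tends to be more convenient when the list is later used to fetch the individual articles.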

Collecting a single article:

The code is as follows:

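<?php
$url = 'http://www.111cn.net';
$con = file_get_contents($url);
// Use separate regular expressions for the title and the content.
// The title pattern is truncated in the source; supply one that matches the page's title markup.
$title_preg = "|...|iUs";
$content_preg = "|<!-- Body -->(.*)<!-- Share -->|iUs";
preg_match($title_preg, $con, $title_arr);
preg_match($content_preg, $con, $content_arr);
// With a capture group in each pattern, $title_arr[1] and $content_arr[1] hold the extracted text.
?>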
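The code above assumes the request succeeds and leaves the captured content unfiltered. The sketch below shows one way to handle both points, reusing the content pattern from above. The helper names fetch_page() and extract_content(), the choice to strip script blocks and tags, and the sample markup are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: fetch a page, returning null on failure.
// file_get_contents() returns false (and emits a warning) when the request fails.
function fetch_page(string $url): ?string
{
    $con = @file_get_contents($url);
    return $con === false ? null : $con;
}

// Hypothetical helper: extract the text between the two marker comments and clean it up.
function extract_content(string $con): ?string
{
    if (!preg_match('|<!-- Body -->(.*)<!-- Share -->|iUs', $con, $content_arr)) {
        return null; // markers not found
    }
    // Filter step: drop <script> blocks first, then any remaining tags.
    $text = preg_replace('|<script\b.*</script>|iUs', '', $content_arr[1]);
    $text = preg_replace('|<[^>]+>|', '', $text);
    return trim($text);
}

// Made-up sample markup so the sketch runs without network access.
$sample = "<html><!-- Body --><p>Hello <b>world</b></p><script>track();</script><!-- Share --></html>";
echo extract_content($sample), "\n";
```

In a real run, the string passed to extract_content() would be the result of fetch_page($url), checked for null before use.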
