How to Implement Article Collection in PHP

Last Update:2016-07-13 Source: Internet

Author: User


Article collection (scraping) relies mostly on regular expressions. Here I briefly describe the approach, using PHP for the implementation. A script like this is usually run on a local machine. Running it on hosted web space is unwise: it consumes a lot of resources, and the host must also support remote-fetch functions such as file_get_contents($urls) and file($url).

1. Paging through the article list pages automatically and collecting the article paths

2. Getting the title and content

3. Storing in the database

4. Questions

1. Paging through the article list pages automatically and collecting the article paths

a. Automatic paging through list pages generally depends on the list being a dynamic page. For example:

http://www.phpfirst.com/foru...d=1&page=$i

Paging can be done by incrementing $i (for example $i++) or by iterating over a range.

You can also bound $i, from a first page to a last page, as Penzi demonstrates.
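As a minimal sketch of the paging idea above, the helper below builds the list-page URLs for a bounded range of page numbers. The function name listPageUrls and the example.com URL are illustrative assumptions, not part of the original code.

```php
<?php
// Build the list-page URLs for pages $first..$last.
// $base is a hypothetical list URL that ends in "page=".
function listPageUrls($base, $first, $last)
{
    $urls = array();
    for ($i = $first; $i <= $last; $i++) {
        $urls[] = $base . $i;
    }
    return $urls;
}

$pages = listPageUrls("http://www.example.com/forumdisplay.php?fid=1&page=", 1, 3);
print_r($pages);
```

Each URL in $pages can then be passed to whatever function collects the article links from one list page.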

b. There are two ways to get the article paths: one requires you to supply a regular expression and one does not.

1) Without a regular expression: collect every link on the article list page.

It is best, however, to filter and process the links: detect duplicate links and keep only one copy, and convert relative paths (for example ../ and ./) into absolute paths.

Here are some of the messy implementation functions I wrote:

PHP:

--------------------------------------------------------------------------------

$e = clinchgeturl("http://phpfirst.com/forumdisplay.php?fid=1");
var_dump($e);

// Return every unique page link found on $url as an absolute URL.
// eregi(), split() and each() no longer exist in PHP 7+, so this cleaned-up
// version uses preg_match(), explode() and foreach instead.
function clinchgeturl($url)
{
    // "http://host/dir/page.php" splits into: http:, "", host, dir, page.php
    $roopath = explode("/", $url);
    $host = "http://" . $roopath[2] . "/";

    // $rootpath is the directory of the list page:
    // the host plus every path segment except the last one.
    $rootpath = $host;
    $nnn = count($roopath) - 1;
    for ($yu = 3; $yu < $nnn; $yu++) {
        $rootpath .= $roopath[$yu] . "/";
    }

    $fcontents = file($url);
    if ($fcontents === false) {
        return array();
    }

    $out = array();
    foreach ($fcontents as $line) {
        // Capture the value of every href attribute on this line.
        if (!preg_match_all('/href\s*=\s*["\']?([^"\'\s>]+)/i', $line, $regs)) {
            continue;
        }
        foreach ($regs[1] as $link) {
            if (!preg_match('#^https?://#i', $link)) {
                if ($link[0] == "/") {
                    // Relative to the site root
                    $link = $host . ltrim($link, "/");
                } elseif (strpos($link, "../") === 0) {
                    // One level above the list page's directory (never above the host)
                    $parent = ($rootpath == $host) ? $host : dirname($rootpath) . "/";
                    $link = $parent . substr($link, 3);
                } else {
                    // "./x" and "x" are relative to the list page's directory
                    $link = $rootpath . preg_replace('#^\./#', '', $link);
                }
            }
            // Keep only links that look like pages.
            if (preg_match('/\.(htm|shtm|html|asp|aspx|php|jsp|cgi)/i', $link)) {
                $out[] = $link;
            }
        }
    }

    // Drop duplicate links and reindex the array.
    return array_values(array_unique($out));
}

The code above is an eyesore; it really ought to be hidden behind Zend encoding :(

After getting all the unique links, put them in an array.
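The relative-path and de-duplication steps can also be sketched separately from the fetching code. The helper below is a hypothetical toAbsolute() function (the name, the sample base URL and the sample links are assumptions for illustration). It resolves a link against the URL of the page it was found on and collapses any number of ./ and ../ segments. As a simplification, it drops a trailing slash on the resolved path.

```php
<?php
// Resolve $link against $base (the URL of the page the link was found on).
// Handles absolute URLs, "/x", "./x", "../x" and plain "x".
function toAbsolute($base, $link)
{
    if (preg_match('#^https?://#i', $link)) {
        return $link;
    }
    $p = parse_url($base);
    $root = $p['scheme'] . '://' . $p['host'];
    if ($link !== '' && $link[0] === '/') {
        $path = $link;
    } else {
        // Directory of the base page, including its trailing slash
        $basePath = isset($p['path']) ? $p['path'] : '/';
        $dir = substr($basePath, 0, strrpos($basePath, '/') + 1);
        $path = $dir . $link;
    }
    // Collapse "." and ".." segments.
    $out = array();
    foreach (explode('/', $path) as $seg) {
        if ($seg === '..') {
            array_pop($out);
        } elseif ($seg !== '.' && $seg !== '') {
            $out[] = $seg;
        }
    }
    return $root . '/' . implode('/', $out);
}

$base = "http://example.com/forum/list.php?fid=1";
$links = array("../a/1.html", "./2.html", "/a/1.html", "http://other.example/p.php");

$abs = array();
foreach ($links as $l) {
    $abs[] = toAbsolute($base, $l);
}
// "../a/1.html" and "/a/1.html" resolve to the same URL, so one copy is dropped.
$abs = array_values(array_unique($abs));
print_r($abs);
```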

2) With delimiters or a regular expression that you supply

If you want to get exactly the article links and nothing else, use this method.

Following Ketle's idea, use:

PHP:

--------------------------------------------------------------------------------

// Return the part of $file that lies between $from and $end.
function cut($file, $from, $end) {
    $message = explode($from, $file);
    $message = explode($end, $message[1]);
    return $message[0];
}

$from is the HTML code in front of the list.

$end is the HTML code that follows the list.

Both parameters can be submitted through a form.

Once everything on the list page that is not part of the list has been cut away, what remains is the required links.

They can then be picked out with a regular expression along these lines:

PHP:

--------------------------------------------------------------------------------

// Group 1 is an optional "http://"; group 2 is everything up to the next "/"
preg_match("/^(http:\/\/)?([^\/]+)/i",
    $url, $matches);
return $matches[2];
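To make the two steps concrete, here is a small sketch that first cuts the list out of a page with cut() and then collects every href inside the remaining fragment. The sample HTML, the delimiter strings and the href pattern are assumptions chosen for illustration. A real page needs its own $from and $end values.

```php
<?php
// Return the part of $file that lies between $from and $end.
function cut($file, $from, $end)
{
    $message = explode($from, $file);
    $message = explode($end, $message[1]);
    return $message[0];
}

// Hypothetical list page: navigation, the article list, then a footer.
$html = '<div id="nav"><a href="/">Home</a></div>'
      . '<ul class="list"><li><a href="a1.html">One</a></li>'
      . '<li><a href="a2.html">Two</a></li></ul>'
      . '<div id="foot"><a href="/about.html">About</a></div>';

// Keep only the markup between the list's opening and closing tags.
$list = cut($html, '<ul class="list">', '</ul>');

// Collect the href values that remain; navigation and footer links are gone.
preg_match_all('/href="([^"]+)"/i', $list, $m);
print_r($m[1]);
```

Note that cut() assumes $from actually occurs in the page. If it does not, $message[1] is undefined, so a production version would want to check for that case first.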

2. Getting the title and content

a. First, use the article path obtained above to read the target page.

You can use the following function:

PHP:

--------------------------------------------------------------------------------

function GetContent($url) {
    if ($handle = fopen($url, "rb")) {
        $contents = "";
        do {
            // Read in 2048-byte chunks until nothing is left
            $data = fread($handle, 2048);
            if ($data === false || strlen($data) == 0) {
                break;
            }
            $contents .= $data;
        } while (true);
        fclose($handle);
    } else {
        exit("...");
    }
    return $contents;
}

Or, directly:

PHP:

--------------------------------------------------------------------------------

file_get_contents($urls);

The latter is more convenient, but it has the drawback already mentioned: the host must support remote fetching.
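As a hedged sketch of how the convenient variant might be made a little more defensive, the wrapper below adds a timeout through a stream context and returns false instead of emitting a warning when the fetch fails. The function name fetchPage and the default timeout are assumptions. Remote URLs still only work when allow_url_fopen is enabled in php.ini.

```php
<?php
// Fetch $url with a timeout; returns the body, or false on failure.
function fetchPage($url, $timeoutSeconds = 10)
{
    $context = stream_context_create(array(
        'http' => array('timeout' => $timeoutSeconds),
    ));
    // @ suppresses the warning so the caller can test for false instead.
    return @file_get_contents($url, false, $context);
}

// Remote fetching depends on this php.ini setting.
if (!ini_get('allow_url_fopen')) {
    echo "allow_url_fopen is off: remote fetching is not available\n";
}
```

A caller would check the result with === false before passing the page on to the title and content steps.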

b. Next, get the title.

This is generally done with:

PHP:

--------------------------------------------------------------------------------

preg_match("|<title>(.*)</title>|", $allcontent, $title);

The <title> and </title> part can be obtained from the submitted form.

You can also use a series of cut functions.

For example, with the cut($file, $from, $end) function mentioned above, the specific string cutting can be done by character-processing functions. This is described in more detail under "Getting the content" below.
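A slightly more tolerant version of the title match can be sketched as follows. The function name getTitle and the sample page are assumptions. The pattern ignores case, lets the title span several lines, and stops at the first closing tag.

```php
<?php
// Return the text between <title> and </title>, or null when there is none.
function getTitle($allcontent)
{
    // "i" ignores case, "s" lets "." match newlines,
    // and ".*?" stops at the first </title>.
    if (preg_match('|<title[^>]*>(.*?)</title>|is', $allcontent, $title)) {
        return trim($title[1]);
    }
    return null;
}

$page = "<html><head><TITLE>\n  Hello, PHP \n</TITLE></head><body>...</body></html>";
echo getTitle($page), "\n";
```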

c. Getting the content

Getting the content follows the same idea as getting the title, but the situation is more complicated, because the markup around the content is rarely that simple.

1) Double quotes, spaces and line breaks in the feature strings near the content are big obstacles.

Double quotes need to become \", which addslashes() can handle.

Line breaks can be removed with:

PHP:

--------------------------------------------------------------------------------

$a = str_replace(array("\r", "\n"), "", $a);

2) The second approach is to extract the content with a lot of cut-style functions. This needs a lot of practice and debugging. I am still working on it and have not made a breakthrough yet.
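One possible way around the quoting problem, offered only as a sketch, is to locate the feature strings with strpos() instead of a regular expression. Plain string search needs no escaping for double quotes or other special characters. The helper name between(), the sample markup and the markers are assumptions for illustration.

```php
<?php
// Return the text between the first $from and the next $end, or null
// when either marker is missing. No regex, so nothing needs escaping.
function between($html, $from, $end)
{
    $start = strpos($html, $from);
    if ($start === false) {
        return null;
    }
    $start += strlen($from);
    $stop = strpos($html, $end, $start);
    if ($stop === false) {
        return null;
    }
    return substr($html, $start, $stop - $start);
}

// Hypothetical article page with quotes and Windows line breaks.
$html = "<div class=\"content\">\r\n<p>First \"quoted\" line</p>\r\n"
      . "<p>Second line</p>\r\n</div><div class=\"footer\">x</div>";

$body = between($html, '<div class="content">', '</div>');
// Remove the line breaks, as described above.
$body = str_replace(array("\r", "\n"), "", $body);
echo $body, "\n";
```

This only works when the content block itself contains no nested copy of the end marker, which is one reason real pages need per-site tuning.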

3. Storing in the database

a. Connect to your database and insert the record.

For example, I can insert directly like this:

PHP:

--------------------------------------------------------------------------------

$sql = "INSERT INTO $articles VALUES ('', '$title', '', '$article', '', '', 'clinch', 'from', 'keywords', 1, '$column_id', '$time', 1);";

Here the first value, '', is the automatically incrementing ID column.
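Because scraped titles and bodies routinely contain quotes, building the SQL by string interpolation is fragile. A prepared statement is one common alternative, sketched here with PDO and an in-memory SQLite database so that it runs without a server. The table layout, the column names and the function name saveArticle are all assumptions. The original table has more columns than shown.

```php
<?php
// Insert one article with a prepared statement, so quotes in the title or
// body cannot break the SQL. Table and column names are hypothetical.
function saveArticle(PDO $db, $title, $article, $columnId, $time)
{
    $stmt = $db->prepare(
        'INSERT INTO articles (title, content, author, column_id, created_at)
         VALUES (?, ?, ?, ?, ?)'
    );
    $stmt->execute(array($title, $article, 'clinch', $columnId, $time));
    // The id column is filled in by the database.
    return (int) $db->lastInsertId();
}

$id = null;
// The demo needs the pdo_sqlite extension; it is skipped when that is missing.
if (extension_loaded('pdo_sqlite')) {
    $db = new PDO('sqlite::memory:');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE articles (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT, content TEXT, author TEXT,
        column_id INTEGER, created_at INTEGER
    )');
    $id = saveArticle($db, 'It\'s a "title"', '<p>body</p>', 1, time());
    echo "Inserted row $id\n";
}
```

With MySQL the same saveArticle() call should work after changing only the PDO connection string, assuming a table with matching columns.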




