This article presents a small PHP program that extracts the text from a batch of news pages and saves each news item as a text file, using the title of the item as the file name. If you have a better approach, please contact me:
Lwx3069@sina.com
The example below uses the news listed under "Today's headlines" on www.unn.com.cn.
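<?php
// Default to the "Today's News" index page when no URL is given.
if (empty($url)) {
    $url = "http://www.unn.com.cn/GB/channel2/3/11/index.html";
}
if (isset($url) && $url != "") {
    // Read the whole index page into one string.
    $str = implode("", file($url));
    // Keep the part of the page after the first delimiter, then split it into
    // one entry per news item. The delimiters were HTML fragments that have
    // not survived intact in this listing.
    $str_ary = explode("\n", $str);
    $str_ary = explode("\n- ", trim($str_ary[1]));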
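    // Look at the first eight entries of the list.
    for ($i = 0; $i < 8; $i++) {
        // Skip entries that are missing or too short to hold a link.
        if (!isset($str_ary[$i]) || strlen(trim($str_ary[$i])) < 3) {
            continue;
        }
        echo "News" . $i . " : " . $str_ary[$i];
        // Cut the link out of the entry: $str1 starts at the link marker and
        // $str2 starts at "target", so the difference in length is the length
        // of the link part. The needle of the first strstr() call is missing
        // from this listing.
        $str1 = strstr($str_ary[$i], '');
        $str2 = strstr($str_ary[$i], "target");
        $len1 = strlen($str1);
        $len2 = strlen($str2);
        $len = $len1 - $len2;
        $url = substr($str1, 10, $len - 10);
        if (strlen(trim($url)) != 0) {
            $url = "http://www.unn.com.cn/" . $url;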
            // Directory that receives the text files; define it only once.
            defined("CONTENTS_DIR") || define("CONTENTS_DIR", "./contents/");
            if (isset($url) && $url != "") {
                $str = implode("", file($url));
                // The delimiters and search strings below were HTML fragments of
                // the article page and are missing from this listing.
                $str1 = explode('', $str);      // drop the useless upper part of the page
                $str2 = explode('', $str1[1]);  // drop the lower part, leaving only the useful section
                $str3 = explode('', $str2[0]);  // separate the title and body from the useful section
                $str4 = explode('', $str2[0]);  // take out the date and time
                $str5 = explode('', $str3[1]);  // take the title out of the title-and-body part
                $title = str_replace("\n", " ", $str5[0]);
                $str3 = explode('', $str2[0]);  // take the body out of the useful section
                $str3[1] = str_replace("\n", "\n" . "", $str3[1]);
                $str3[1] = str_replace('', " ", $str3[1]);
                $str3 = strip_tags($str3[1]);   // remove any remaining HTML tags
                // Save the article under its title.
                $pf = trim($title) . ".txt";
                $ppf = fopen(CONTENTS_DIR . $pf, 'w');
                fputs($ppf, $title);
                fputs($ppf, $str4[0]);
                fputs($ppf, $str3);
                fclose($ppf);
            }
        }
    }
}
?>
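The program above relies on two ideas: cutting a piece of text out of a page by looking for the markers on either side of it, and using the news title as the file name. The following is a minimal sketch of both, not the original code. The helper names `textBetween` and `safeFileName`, the sample markup, and the markers `href="`, `">` and `</a>` are all assumptions made for illustration; the real pages may use different markup.

```php
<?php
// Hypothetical helper: return the text between two markers,
// or null when either marker cannot be found.
function textBetween(string $html, string $start, string $end): ?string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);            // skip past the start marker
    $to = strpos($html, $end, $from);   // search for the end marker after it
    if ($to === false) {
        return null;
    }
    return substr($html, $from, $to - $from);
}

// Hypothetical helper: turn a news title into a file name by replacing
// whitespace and characters that are not allowed in file names.
function safeFileName(string $title): string
{
    $name = preg_replace('#[\\\\/:*?"<>|\s]+#', '_', trim($title));
    return trim($name, '_') . '.txt';
}

// Made-up sample entry; the layout of the real index page is assumed.
$sample = '<li><a href="/GB/news/1.html" target="_blank">Sample: headline?</a></li>';

$link  = textBetween($sample, 'href="', '"');
$title = textBetween($sample, '">', '</a>');

echo $link, "\n";                 // /GB/news/1.html
echo safeFileName($title), "\n";  // Sample_headline.txt
```

Returning null when a marker is missing lets the caller skip an entry instead of writing an empty file, and cleaning the title first avoids a failed fopen() when a headline contains characters such as "/" or "?".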