PHP thief program design and implementation example

Source: Internet
Author: User
A thief program is simply a scraper: it automatically collects content from other websites, processes it, and publishes it on your own site. dedecms now has this function built in, and the Locomotive collector is a tool of the same kind. The following is an example.

I have always wanted to build a website whose content is mainly pictures. My earlier idea was to build a CMS and then upload the pictures myself.

At first I had no motivation to do this, so I gave up. Then I studied cURL, which turned out to be a better way to realize the idea.

Using PHP to steal pictures is like wearing socks with sandals: it works, but it does hurt a little.

Let me first talk about the design of the PHP thief program. PHP does not support multithreading out of the box, so the steps can only run one after another.

Get the HTML page of the target website + parse the HTML page to get the links to the stored images + read each image and save it locally in binary mode + rename it = done

You can run the program in two ways:

First: run the program from the browser (it will usually hang unless you raise the timeout and the memory limit, and the wait is hard to sit through)

Second: start PHP from the command line (the PHP execution timeout does not apply there)
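Both ways of running the program can be handled from a single script. The following is only a minimal sketch: the helper name `prepareLongRun()` and the 256M memory limit are assumptions made for illustration and do not come from the original program.

```php
<?php
// Hypothetical helper (not part of the original program): relax limits before a long run.
function prepareLongRun(string $memoryLimit = '256M'): string
{
    // The CLI has no execution time limit by default; under a web server we lift it explicitly.
    if (PHP_SAPI !== 'cli') {
        set_time_limit(0);
    }
    // Raise the memory limit, since parsed HTML documents can be large.
    ini_set('memory_limit', $memoryLimit);

    return PHP_SAPI;
}

$sapi = prepareLongRun();
echo "Running under: {$sapi}\n";
```

From the command line, a script like this would be started with `php script.php`, where `script.php` is a placeholder file name.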

The code is as follows:

<?php
/**
 * HTML parsing class
 * Author: Summer
 * Date: 2014-08-22
 */

class Analytical
{
    public function __construct()
    {
        // Third-party simplehtmldom library, which provides file_get_html()
        require_once('class/SimpleHtmlDom.class.php');
        $this->_getDir();
    }

    private function _getDir()
    {
        $dir = "../TMP/HTML/Results/1";  // saved HTML pages to parse
        $imgBIG = "../TMP/IMG/JPG/BIG";  // target directory for the downloaded images
        $it = new DirectoryIterator($dir . "/");
        foreach ($it as $file) {
            // Use the isDot() method to filter out the "." and ".." directories.
            if (!$file->isDot()) {
                $dirs = $dir . "/" . $file->getFilename();
                $tmp = explode(".", $file->getFilename());
                $html = file_get_html($dirs);
                if (!$html) {
                    continue; // the page could not be parsed
                }
                $ulArr = $html->find('img');
                foreach ($ulArr as $key => $value) {
                    // Only download images whose class is "u"
                    if ($value->class == "u") {
                        $url = "http://www.111cn.net" . $value->src;
                        $infomation = file_get_contents($url);
                        if ($infomation === false) {
                            continue; // the download failed
                        }
                        // Name the image after the HTML file it came from
                        $result = $this->saveHtml($infomation, $imgBIG, $tmp[0] . ".jpg");
                        if ($result) {
                            echo $file . "OK\n";
                        }
                    }
                }
            }
        }
    }

    private function saveHtml($infomation, $filedir, $filename)
    {
        if (!$this->mkdirs($filedir)) {
            return 0;
        }

        $sf = $filedir . "/" . $filename;
        $fp = fopen($sf, "wb"); // open the file for writing in binary mode
        if ($fp === false) {
            return 0;
        }
        $written = fwrite($fp, $infomation); // save the content
        fclose($fp); // close the file before returning
        return $written;
    }

    // Create a directory, including any missing parent directories
    private function mkdirs($dir)
    {
        if (!is_dir($dir)) {
            if (!$this->mkdirs(dirname($dir))) {
                return false;
            }
            if (!mkdir($dir, 0777)) {
                return false;
            }
        }
        return true;
    }

}

new Analytical();

The code above is the process of extracting the IMG link addresses from the saved HTML pages.
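To try the extraction step without installing simplehtmldom, roughly the same filtering can be sketched with PHP's built-in DOMDocument. This swaps in a different parser from the one the class uses; the helper name `extractImageSources()` and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: return the src of every <img> whose class attribute equals $class.
// Uses the built-in DOM extension instead of simplehtmldom.
function extractImageSources(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->getAttribute('class') === $class && $img->hasAttribute('src')) {
            $sources[] = $img->getAttribute('src');
        }
    }
    return $sources;
}

// Made-up sample page with one matching and one non-matching image.
$sample = '<html><body><img class="u" src="/a/1.jpg"><img class="x" src="/b/2.jpg"></body></html>';
print_r(extractImageSources($sample, 'u'));
```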

Two important things are used:

1. The simplehtmldom library for parsing the HTML DOM in PHP

2. PHP's DirectoryIterator

Once you understand these two things, the parsing class poses no difficulty.
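As a small illustration of the second item, the sketch below lists the files in a directory with DirectoryIterator while skipping the "." and ".." entries. The helper name `listFiles()` and the temporary demo directory are assumptions for the example, not part of the original class.

```php
<?php
// Hypothetical helper: return the names of the regular files in $dir, sorted.
function listFiles(string $dir): array
{
    $names = [];
    foreach (new DirectoryIterator($dir) as $entry) {
        // isDot() is true for "." and ".."; isFile() also skips subdirectories.
        if ($entry->isDot() || !$entry->isFile()) {
            continue;
        }
        $names[] = $entry->getFilename();
    }
    sort($names);
    return $names;
}

// Demo in a throwaway directory, so nothing outside the temp folder is touched.
$demoDir = sys_get_temp_dir() . '/dir_iter_demo_' . uniqid();
mkdir($demoDir);
file_put_contents($demoDir . '/1.html', '<html></html>');
file_put_contents($demoDir . '/2.html', '<html></html>');

$found = listFiles($demoDir);
print_r($found);

// Clean up the demo files.
unlink($demoDir . '/1.html');
unlink($demoDir . '/2.html');
rmdir($demoDir);
```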

How do I get the pages to be parsed?

In fact, the principle is the same as above: obtain the URL of the page, read the page through cURL, which returns the HTML as a string,

then wrap the saving in a function that writes the HTML page to local disk.
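The fetch-and-save step could look roughly like the sketch below. The function names `fetchPage()` and `savePage()`, the 30-second timeout, and the placeholder URL in the usage comment are all assumptions; the original fetching code is not shown in this article.

```php
<?php
// Hypothetical helper: read a page through cURL and return the HTML string, or null on failure.
function fetchPage(string $url, int $timeoutSeconds = 30): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $html = curl_exec($ch);

    return is_string($html) ? $html : null;
}

// Hypothetical helper: save an HTML string under $dir/$filename, creating $dir if needed.
function savePage(string $html, string $dir, string $filename): bool
{
    if (!is_dir($dir) && !mkdir($dir, 0777, true)) {
        return false;
    }
    return file_put_contents($dir . '/' . $filename, $html) !== false;
}

// Usage (placeholder URL, not taken from the article):
// $html = fetchPage('http://example.com/page/1.html');
// if ($html !== null) {
//     savePage($html, '../TMP/HTML/Results/1', '1.html');
// }
```

Saving each page under a numeric name keeps it compatible with the parsing class, which names every image after the HTML file it was found in.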

I wanted to collect the images on the page (to get around other sites' anti-leech protection), so the design is somewhat complicated.

The reason for the separation is that simplehtmldom objects are very large, and splitting the work makes the process clearer.

Some people will ask: why go through the step of saving the HTML locally instead of just matching with regular expressions? BINGO! I couldn't be bothered to write regular expressions.
