PHP uses file_get_contents to collect site content in bulk

Source: Internet
Author: User
I recently came across a rather "stingy" learning site. Its content is protected against copying, so when we want to copy a little of it to prepare for a test, there seems to be no way other than typing it out word by word. Fortunately, we work in technology, and a problem like this will not stop us. If copying is not allowed, so be it; I am too lazy to copy by hand anyway. It is more convenient to write a script that pulls the lesson content down directly.
No sooner said than done: first, look at the page source. However, the page blocks the right mouse button, and right-clicking only brings up a prompt.

This is not a real obstacle. There are plenty of other ways to view a page's source code; search online if you do not know them. Looking at the source, I found that the page content does not appear in it. So I captured the traffic with HttpWatch and found the actual content in another request made by the page, but that content is encoded.

But this "encryption" is rather weak: it is plainly Base64, which is an encoding rather than real encryption. Decoding it is not hard; the base64 tool that ships with Linux can do it:
[root@web20 php]# base64 --help
Usage: base64 [OPTION] [FILE]
Base64 encode or decode FILE, or standard input, to standard output.

  -w, --wrap=COLS       Wrap encoded lines after COLS character (default 76).
                        Use 0 to disable line wrapping.
  -d, --decode          Decode data.
  -i, --ignore-garbage  When decoding, ignore non-alphabet characters.
      --help            Display this help and exit.
      --version         Output version information and exit.

With no FILE, or when FILE is -, read standard input.

The data are encoded as described for the base64 alphabet in RFC 3548.
Decoding require compliant input by default, use --ignore-garbage to
attempt to recover from non-alphabet characters (such as newlines) in the
encoded stream.
Running base64 -d on the file is all it takes. After decoding, however, the result turns out to be URL-encoded:
%20%20%5b%e8%af%86%e8%ae%b0%5d%e4%bc%9a%e8%ae%a1%e7%9a%84%e6%b6%b5%e4%b9%89%e6%98%af%e4%bb%80%e4%b9%88%ef%20%20
Do not be puzzled by this result; it is actually good news, because we are already halfway there. Isn't this the same kind of string that shows up in a URL when you search for Chinese characters?
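The two decoding steps described so far can be tried on a small sample. The sketch below is only an illustration under stated assumptions: decode_payload is a hypothetical helper name, and the sample value is built locally by Base64-encoding the first few escapes of the result shown above, so it is not real data from the site.

```php
<?php
// Hypothetical helper: undo the two layers described above.
// Layer 1 is Base64, layer 2 is URL percent-encoding.
function decode_payload(string $encoded): string
{
    // Strict mode makes base64_decode() return false on invalid input.
    $urlEncoded = base64_decode($encoded, true);
    if ($urlEncoded === false) {
        throw new InvalidArgumentException('input is not valid Base64');
    }
    return urldecode($urlEncoded);
}

// Sample built for this example: the leading escapes of the decoded result
// shown above, Base64-encoded again so that both layers are exercised.
$sample = base64_encode('%5b%e8%af%86%e8%ae%b0%5d');

echo decode_payload($sample), "\n";
```

On a UTF-8 terminal this should print the bracketed tag [识记] that opens the sample, which suggests the site stores its text as UTF-8 bytes behind the two encoding layers.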
For example, when I search Baidu for the Chinese word "测试" ("test") from hao123, the URL of the result page is
http://www.baidu.com/s?word=%B2%E2%CA%D4&tn=sitehao123
The two Chinese characters become %B2%E2%CA%D4 in the URL. (Baidu encodes them as GBK bytes here, while the site's data above is UTF-8, but the percent-escaping works the same way.) Once the principle is known, decoding is simple: PHP has a function for exactly this, urldecode. Here is my complete code:
<?php
// Walk through every lesson ID on the site.
for ($i = 18291; $i <= 18788; $i++) {
    $content = file_get_contents("http://www.XXX.com/test.php?wiki_id=" . $i);
    // echo $content;

    // Split the page into the question part and the answer part.
    $spwt1 = explode("question:", $content);
    $spwt2 = explode('));<', $spwt1[1]);
    $spdn = explode("Answer:", $spwt2[1]);
    // echo $spwt2[0];
    // echo $spdn[1];

    // Question: capture the Base64 string, then undo Base64 and URL encoding.
    preg_match('/base64decode\("(.*?)"/', $spwt2[0], $matchesw);
    $wen = urldecode(base64_decode($matchesw[1]));
    echo $wen;
    echo "\n";
    echo "\n";

    // Answer: the same two decoding steps.
    preg_match('/base64decode\("(.*?)"/', $spdn[1], $matchesd);
    $da = urldecode(base64_decode($matchesd[1]));
    echo $da;
}
?>
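The extraction step in the script can also be exercised without touching the network. The following is a rough sketch and not the original author's code: extract_payloads is a hypothetical helper, and $page is a made-up fragment that merely imitates the assumed base64decode("...") markup; the real page may look different.

```php
<?php
// Hypothetical helper: collect every base64decode("...") argument in a page
// fragment and undo both encoding layers (Base64, then URL encoding).
function extract_payloads(string $html): array
{
    $decoded = [];
    // preg_match_all() returns the number of matches (0 when there are none).
    if (preg_match_all('/base64decode\("(.*?)"/', $html, $matches)) {
        foreach ($matches[1] as $encoded) {
            $raw = base64_decode($encoded, true);
            if ($raw !== false) {   // skip anything that is not valid Base64
                $decoded[] = urldecode($raw);
            }
        }
    }
    return $decoded;
}

// Made-up sample page, encoded the same way the site appears to encode text.
$question = base64_encode(rawurlencode('What is accounting?'));
$answer   = base64_encode(rawurlencode('sample answer text'));
$page = 'question:document.write(base64decode("' . $question . '"));<br>'
      . 'Answer:document.write(base64decode("' . $answer . '"));<br>';

foreach (extract_payloads($page) as $text) {
    echo $text, "\n";
}
```

Compared with chained explode calls, matching all payloads in one pass avoids indexing into array elements that may not exist when a page is missing a question or an answer.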
Also, to protect that site's intellectual property, I have replaced its URL with http://www.XXX.com/test.php. (After all, it is not easy for them to produce this content.) My server has a PHP environment, so I ran php test.php directly. The result was very satisfying: within a short while, all the content for the subject had been printed out.
