Using PHP's file_get_contents to collect site content in bulk

Last Update:2017-01-13 Source: Internet

Author: User

Tags: base64, base64 encode, explode

I recently came across a rather "stingy" learning site. Its content is protected against copying, so anyone who wants to put together a small set of crib notes for exam time would have to type everything out word by word. Fortunately, we work in technology, and a problem like this does not stump us. The site does not allow copying? Fine, I am too lazy to copy by hand anyway. It is far more convenient to write a script that pulls the lesson content down directly.

No sooner said than done. The first step is to look at the page source. However, the page disables the right mouse button: a right-click only brings up a warning prompt.

That is no obstacle. There are plenty of other ways to view a page's source, and anyone who does not know them can look them up online. Looking at the source, I found that the lesson content does not appear in it. I then used HttpWatch to capture and analyze the traffic, and found the content in one of the other requests the page makes, but it was encrypted.

This "encryption" is rather weak, though: it is plainly just Base64 encoding. Decoding it is not difficult, and the base64 tool that ships with Linux can do it:

[root@web20 php]# base64 --help
Usage: base64 [OPTION] [FILE]
Base64 encode or decode FILE, or standard input, to standard output.

  -w, --wrap=COLS       Wrap encoded lines after COLS character (default 76).
                        Use 0 to disable line wrapping.
  -d, --decode          Decode data.
  -i, --ignore-garbage  When decoding, ignore non-alphabet characters.
      --help            Display this help and exit.
      --version         Output version information and exit.

With no FILE, or when FILE is -, read standard input.

The data are encoded as described for the base64 alphabet in RFC 3548.
Decoding require compliant input by default, use --ignore-garbage to
attempt to recover from non-alphabet characters (such as newlines) in the
encoded stream.

Running base64 -d with the file name is all it takes. However, the decoded output turned out to be URL-encoded. The result looks like this:

%20%20%5b%e8%af%86%e8%ae%b0%5d%e4%bc%9a%e8%ae%a1%e7%9a%84%e6%b6%b5%e4%b9%89%e6%98%af%e4%bb%80%e4%b9%88%ef%20% 20
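The same decoding step can also be done in PHP rather than on the command line. The following is a minimal sketch, not the author's code: decodeBase64Payload is a hypothetical helper name, and the sample payload is made up by encoding a short percent-encoded string.

```php
<?php
// Hypothetical helper: decode a Base64 payload and fail loudly on bad input.
function decodeBase64Payload(string $encoded): string
{
    // With the second argument set to true (strict mode), base64_decode()
    // returns false when the input contains invalid characters.
    $decoded = base64_decode($encoded, true);
    if ($decoded === false) {
        throw new InvalidArgumentException('Input is not valid Base64');
    }
    return $decoded;
}

// Made-up sample: the Base64 form of a short percent-encoded string.
$sample = base64_encode('%5b%e8%af%86%e8%ae%b0%5d');

// The output is still percent-encoded at this stage, just like the
// output of base64 -d shown above.
echo decodeBase64Payload($sample), "\n";
```

Strict mode is used here so that a damaged payload is reported instead of being silently decoded into garbage.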

Do not be discouraged by this result. This is actually the time to be happy, because we are already halfway there. Isn't this exactly the form that Chinese characters take in a URL when you search for them?

For example, when I search Baidu for "测试" ("test") from hao123, the page URL is

http://www.baidu.com/s?word=%B2%E2%CA%D4&tn=sitehao123

The two Chinese characters of "测试" become %B2%E2%CA%D4 in the URL. Once the principle is clear, decoding is simple: PHP has a function, urldecode, for exactly this. Here is my complete script:

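<?php
// Walk through every lesson page in the ID range and print its question and answer.
for ($i = 18291; $i <= 18788; $i++) {
    $content = file_get_contents("http://www.XXX.com/test.php?wiki_id=" . $i);
    // echo $content;
    // Cut the page into the question part and the answer part.
    $spwt1 = explode("question:", $content);
    $spwt2 = explode('));<', $spwt1[1]);
    $spdn = explode("Answer:", $spwt2[1]);
    // echo $spwt2[0];
    // echo $spdn[1];
    // The question text is the Base64 string passed to base64decode("...").
    preg_match('/base64decode\("(.*?)"/', $spwt2[0], $matchesw);
    $wen = urldecode(base64_decode($matchesw[1]));
    echo $wen;
    echo "\n";
    echo "\n";
    // The answer text is wrapped the same way.
    preg_match('/base64decode\("(.*?)"/', $spdn[1], $matchesd);
    $da = urldecode(base64_decode($matchesd[1]));
    echo $da;
}
?>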
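One detail of the urldecode step is worth illustrating: urldecode only turns each %XX back into a raw byte, and it does not know which character set those bytes belong to. The sketch below is an illustration under assumptions, not part of the original script. decodeUrlText is a hypothetical helper, and it assumes the mbstring extension is enabled. It uses the two samples from this article: the Baidu search term, whose bytes are GBK (CP936 in mbstring's naming), and the opening of the decoded lesson text, whose bytes are UTF-8.

```php
<?php
// Hypothetical helper: percent-decode a string and normalise it to UTF-8.
// The percent-encoded text does not record its own charset, so the caller
// has to supply $sourceCharset.
function decodeUrlText(string $encoded, string $sourceCharset = 'UTF-8'): string
{
    // urldecode() turns each %XX into one raw byte (and '+' into a space).
    $bytes = urldecode($encoded);
    if ($sourceCharset === 'UTF-8') {
        return $bytes;
    }
    // Requires the mbstring extension.
    return mb_convert_encoding($bytes, 'UTF-8', $sourceCharset);
}

// Search term from the Baidu URL: GBK bytes, so a conversion is needed.
echo decodeUrlText('%B2%E2%CA%D4', 'CP936'), "\n";

// Opening of the decoded lesson text: already UTF-8, no conversion needed.
echo decodeUrlText('%5b%e8%af%86%e8%ae%b0%5d'), "\n";
```

On a UTF-8 terminal this prints 测试 followed by [识记]. The script above needs no conversion because the lesson site's payload is already UTF-8.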

In addition, to protect that site's intellectual property, I have changed its URL to http://www.XXX.com/test.php. (After all, it was not easy for them to put the content together.) My server happens to have a PHP environment, so I simply ran php test.php. The result was very gratifying: before long, all the content for this subject had been printed out.
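Because the loop issues several hundred requests, a single failed download or an unexpected page layout would make the script index into missing array elements. The sketch below shows one possible way to separate downloading from parsing so that the parsing can be checked offline. The function names extractDecodedPayloads and fetchLesson are hypothetical, the base64decode("...") wrapper is an assumption based on the pattern used in the script, the page fragment in the demonstration is made up, and the trim call is an addition of this sketch.

```php
<?php
// Hypothetical helper: find every base64decode("...") payload in a page
// and return the decoded text of each one, in order of appearance.
function extractDecodedPayloads(string $html): array
{
    $texts = [];
    if (preg_match_all('/base64decode\("(.*?)"/', $html, $matches)) {
        foreach ($matches[1] as $payload) {
            $raw = base64_decode($payload, true);
            if ($raw === false) {
                continue; // skip anything that is not valid Base64
            }
            // trim() drops the padding spaces around the decoded text.
            $texts[] = trim(urldecode($raw));
        }
    }
    return $texts;
}

// Hypothetical wrapper around the download step with basic failure handling.
function fetchLesson(string $baseUrl, int $id): array
{
    $html = @file_get_contents($baseUrl . '?wiki_id=' . $id);
    if ($html === false) {
        return []; // request failed; the caller decides whether to retry
    }
    return extractDecodedPayloads($html);
}

// Offline demonstration with a made-up page fragment; no request is sent.
$page = 'question:document.write(base64decode("' . base64_encode('%5bQ%5d%20What%3F') . '"));<br>'
      . 'Answer:document.write(base64decode("' . base64_encode('%5bA%5d%20This') . '"));<br>';
print_r(extractDecodedPayloads($page));
```

For the made-up fragment, the function returns the two strings [Q] What? and [A] This. In a real run, fetchLesson would be called once per ID in the loop, and an empty array would signal a page that could not be downloaded or parsed.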

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
