1. Fetch the source code of the remote file (file_get_contents or fopen).
2. Parse the source code to extract the content you want (regular-expression matching is used here; pagination usually has to be handled as well).
3. Download the extracted content, store it in the database, and perform any other operations.
Step 2 may have to be repeated several times. For example, you first parse the pagination addresses, then parse the content of each inner page to get what you want.
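The two parsing passes described above can be sketched as follows. This is only an illustration: the function names extractDetailLinks and extractTitle, the sample markup, and the regular expressions are all assumptions made up for the example, and real pages will need their own patterns. Inline strings stand in for pages that would normally be fetched with file_get_contents.

```php
<?php
// Hypothetical helpers; the patterns only match the made-up markup below.

/** Pass 1: pull the inner-page links out of a listing (paging) page. */
function extractDetailLinks(string $listHtml): array
{
    preg_match_all('/<a\s+class="item"\s+href="([^"]+)"/i', $listHtml, $m);
    // $m[1] holds the captured href values; drop duplicates and reindex.
    return array_values(array_unique($m[1]));
}

/** Pass 2: pull the wanted field out of an inner page; null if absent. */
function extractTitle(string $detailHtml): ?string
{
    if (preg_match('/<h1>(.*?)<\/h1>/is', $detailHtml, $m)) {
        return trim($m[1]);
    }
    return null;
}

// Inline sample markup stands in for remotely fetched pages.
$listHtml = '<a class="item" href="/game/1.html">One</a>'
          . '<a class="item" href="/game/2.html">Two</a>'
          . '<a class="item" href="/game/1.html">One again</a>';
$links = extractDetailLinks($listHtml);
$title = extractTitle('<html><h1> Sample Game </h1></html>');
```

In a real collector, each entry in $links would be fetched in turn and passed to the second-pass function.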
Code:
I remember posting some of this code before. Today I am posting a simple version of it.
PHP code:
$nl = @file_get_contents($rs['url']); // fetch the remote content
preg_match_all('/var url\s*=\s*"gameswf\/(.*?)\.swf";/is', $nl, $connect); // regular-expression match to extract the desired content
mysql_query("insert ..."); // insert into the database (the mysql_* functions were removed in PHP 7; use mysqli or PDO there)
The code above is everything needed for collection. Of course, you can also use fopen. I personally prefer file_get_contents.
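For comparison, here is a minimal sketch of the fopen route. The helper name fetchWithFopen is hypothetical. For http URLs both fopen and file_get_contents rely on the allow_url_fopen setting being enabled; the demonstration below reads a local temporary file so that it runs without network access.

```php
<?php
/**
 * Read a whole resource through fopen(); returns null on failure.
 * fetchWithFopen is a hypothetical helper name.
 */
function fetchWithFopen(string $url): ?string
{
    $fp = @fopen($url, 'rb');
    if ($fp === false) {
        return null; // could not open the resource
    }
    $data = stream_get_contents($fp); // read until end of stream
    fclose($fp);
    return $data === false ? null : $data;
}

// Demonstrated on a local temp file instead of a remote URL.
$tmp = tempnam(sys_get_temp_dir(), 'fetch');
file_put_contents($tmp, 'var url="gameswf/demo.swf";');
$body = fetchWithFopen($tmp);
unlink($tmp);
```

The fopen version takes more lines, which is one reason to prefer file_get_contents when the whole body is wanted at once.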
Below is my method for downloading images and Flash files to the local machine. It is very simple and takes only a couple of lines of code.
PHP code:
if (@copy($url, $newurl)) {
    echo 'OK';
}
I once posted an image download function on the forum. Here it is as well.
PHP code:
/* Save image function */
function getimg($url, $filename) {
    /* If the image URL is empty, stop the function */
    if ($url == "") {
        return false;
    }
    /* Get the image extension and store it in $ext */
    $ext = strrchr($url, ".");
    /* Check whether it is an allowed image file */
    if ($ext != ".gif" && $ext != ".jpg") {
        return false;
    }
    /* Read the image */
    $img = @file_get_contents($url);
    if ($img === false) {
        return false;
    }
    /* Open the target file for binary writing */
    $fp = @fopen($filename . $ext, "wb");
    if ($fp === false) {
        return false;
    }
    /* Write the image to the target file */
    fwrite($fp, $img);
    /* Close the file */
    fclose($fp);
    /* Return the new image file name */
    return $filename . $ext;
}
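One limitation of taking the extension with strrchr on the whole URL is that a query string such as ?v=3 ends up inside the extension. A possible variation, shown here only as a sketch, is to look at the URL path alone. The helper name imageExtension and the example.com URLs are assumptions for illustration, and the allow-list mirrors the two extensions checked above.

```php
<?php
/**
 * Return the lowercase extension (with leading dot) of the URL path if it is
 * in the allow-list, otherwise null. imageExtension is a hypothetical helper.
 */
function imageExtension(string $url, array $allowed = ['.gif', '.jpg']): ?string
{
    // parse_url() returns null when there is no path and false on a malformed URL.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    // strrchr() returns false when the path contains no dot.
    $ext = strtolower((string) strrchr($path, '.'));
    return in_array($ext, $allowed, true) ? $ext : null;
}

$a = imageExtension('http://example.com/img/logo.GIF?v=3');    // query string ignored
$b = imageExtension('http://example.com/page.php?file=a.jpg'); // path ends in .php
```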
Some personal experience to share:
1. Avoid sites that use anti-hotlinking protection. You can spoof the referer, but the cost of collecting from such sites is too high.
2. Collect from sites that respond as quickly as possible, and preferably run the collection locally.
3. During collection you can often save part of the data to the database first and do the next stage of processing later.
4. Be sure to handle errors during collection. I usually skip an item once it has failed three times. In the past, the job often got stuck for a long time because a single piece of content could not be collected.
5. Validate before inserting into the database: check whether the content is valid and filter out unnecessary strings.
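The "skip after three failures" rule from point 4 could look roughly like this. It is a sketch under stated assumptions: fetchWithRetry and the $fetch callable are hypothetical names, and a stub fetcher replaces the real network call so the example runs offline.

```php
<?php
/**
 * Try $fetch($url) up to $maxAttempts times; return the body, or null so the
 * caller can skip the item. fetchWithRetry and $fetch are hypothetical names.
 */
function fetchWithRetry(string $url, callable $fetch, int $maxAttempts = 3): ?string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $body = $fetch($url);
        if (is_string($body) && $body !== '') {
            return $body; // success, stop retrying
        }
    }
    return null; // give up and let the caller move on
}

// A stub fetcher that fails twice and then succeeds, standing in for
// something like: function ($u) { return @file_get_contents($u); }
$calls = 0;
$flaky = function (string $url) use (&$calls) {
    $calls++;
    return $calls < 3 ? false : 'content of ' . $url;
};
$ok = fetchWithRetry('http://example.com/a', $flaky);

// A fetcher that never succeeds: the item is skipped after three attempts.
$alwaysFails = function (string $url) {
    return false;
};
$skipped = fetchWithRetry('http://example.com/b', $alwaysFails);
```

Returning null instead of looping forever keeps one unreachable page from stalling the whole run.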