PHP data collection and database import program (1)
A few days ago a friend asked me to help develop a program for collecting news information. I took some time to write a PHP version and am recording it here for future reference.
Collection is nothing more than: fetch the information remotely -> extract the required content -> classify and store it -> read it back -> display it.
It is essentially an enhanced version of a simple "thief program".
The following is the core code (please don't use it for anything bad ^_^).
The content to be collected is the announcements on a game website, for example:
You can use file_get_contents and simple regular expressions to obtain basic page information.
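To make the regular expression step concrete, here is a minimal sketch that runs against a made-up HTML fragment instead of the live page. The sample markup and the function name extractLinks are assumptions for illustration only; the real page may be structured differently.

```php
<?php
// Hypothetical helper: pull (title, href) pairs out of list markup.
function extractLinks(string $html): array
{
    // i = case-insensitive, U = ungreedy quantifiers, s = dot also matches newlines
    $pattern = '/<li>\s*<a title="(.*)" target="_blank" href="(.*)">/iUs';
    preg_match_all($pattern, $html, $matches);

    $links = [];
    foreach ($matches[1] as $key => $title) {
        // $matches[2] uses the same keys as $matches[1]
        $links[] = ['title' => $title, 'url' => $matches[2][$key]];
    }
    return $links;
}

// Made-up sample standing in for the result of file_get_contents()
$sample = '<ul>'
    . '<li><a title="Server maintenance" target="_blank" href="news_1.html">Server maintenance</a></li>'
    . '<li><a title="New event" target="_blank" href="news_2.html">New event</a></li>'
    . '</ul>';

print_r(extractLinks($sample));
```

Each entry of the returned array holds one title and its relative link, which is the shape needed for one database row.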
Organize the basic information and store it in the database:
<?php
// Note: the mysql_* functions used here belong to the legacy MySQL extension (removed in PHP 7).
include_once("conn.php");

$id = isset($_GET['id']) ? (int) $_GET['id'] : 0;
if ($id >= 1 && $id <= 8) {
    $conn = file_get_contents("http://www.93moli.com/news_list_4_$id.html"); // get the page content
    $pattern = "/<li><a title=\"(.*)\" target=\"_blank\" href=\"(.*)\">/iUs"; // regular expression
    preg_match_all($pattern, $conn, $arr); // put the matches into the $arr array
    // print_r($arr); die;
    foreach ($arr[1] as $key => $value) { // $arr[2] has the same keys as $arr[1], so index it with $key
        $url = "http://www.93moli.com/" . $arr[2][$key];
        // escape the scraped values before putting them into the SQL string
        $title = mysql_real_escape_string($value);
        $url = mysql_real_escape_string($url);
        $sql = "insert into list (title, url) values ('$title', '$url')";
        mysql_query($sql);
        // echo "<a href='content.php?url=http://www.93moli.com/$url'>$value</a>" . "<br/>";
    }
    $id++;
    echo "Collecting URL data, list $id... please wait...";
    echo "<script>window.location='list.php?id=$id'</script>";
} else {
    echo "Data collection finished.";
}
?>
conn.php is the database connection file
list.php is the current page
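The listing builds its SQL by string concatenation. As a hedged alternative, here is a small sketch of the same insert using PDO prepared statements. It assumes the pdo_sqlite extension and an in-memory SQLite database purely so that the example runs on its own (the original stores into MySQL through conn.php), and saveLinks is a hypothetical name.

```php
<?php
// Hypothetical helper: store collected rows with a prepared statement.
function saveLinks(PDO $db, array $links): int
{
    $stmt = $db->prepare('INSERT INTO list (title, url) VALUES (:title, :url)');
    $count = 0;
    foreach ($links as $link) {
        // Values are bound as parameters, so quotes in a title cannot break the SQL
        $stmt->execute([':title' => $link['title'], ':url' => $link['url']]);
        $count++;
    }
    return $count;
}

// Assumption: in-memory SQLite stands in for the real MySQL database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE list (id INTEGER PRIMARY KEY, title TEXT, url TEXT)');

$saved = saveLinks($db, [
    ['title' => "Player's guide", 'url' => 'http://www.93moli.com/news_1.html'],
]);
echo "Saved $saved row(s)\n";
```

With MySQL only the connection string passed to PDO would change; the prepare and execute calls stay the same.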
Because the data to be collected is spread over several pages and the page addresses increase in a regular pattern, I pass the page number in the id parameter and use a JS redirect to move from one page to the next. Each request then handles only one page, which avoids one long-running for loop.
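The page-by-page control can be sketched as a small function. This is only an illustration of the idea; the name nextStep and the $lastPage parameter are assumptions, while the redirect target list.php?id=... is the one used in the listing.

```php
<?php
// Hypothetical helper: given the id of the current request, return what the
// page should output once that list page has been processed.
function nextStep(int $id, int $lastPage): string
{
    if ($id < 1 || $id > $lastPage) {
        // Out of range: stop redirecting
        return 'Data collection finished.';
    }
    $next = $id + 1;
    // The browser requests the next page itself, so each request handles one page only
    return "<script>window.location='list.php?id=$next'</script>";
}

echo nextStep(3, 8), "\n"; // redirect to list.php?id=4
echo nextStep(9, 8), "\n"; // end message
```

The trade-off is that collection only proceeds while a browser keeps the page open and follows the redirects.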
That is all it takes to get the data into the database. In the next post I will write about collecting the content behind each specific URL.
How can PHP programmers master data collection?
Common PHP data collection techniques:
1. Extracting data with regular expressions: the key step in extracting content
2. Character encoding conversion and analysis: compatibility handling and keeping the data valid
3. Storing and organizing data: storing and managing collected content, including databases, files, and collection progress
4. Data mining and site crawling: analyzing the site structure and simplifying the crawl to improve efficiency
5. Handling anti-collection measures: techniques for targets that protect themselves against collection
6. Concurrent collection across multiple servers: a way of working that improves efficiency
7. Data collation and analysis: checking for missing data and verifying that the data is correct and valid
8. Protecting your own identity: keeping your own information private
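As an illustration of item 2, here is a minimal sketch of converting a fetched page to UTF-8 before it is stored. It assumes the mbstring extension and a GBK source page; both are assumptions, the function name toUtf8 is hypothetical, and the validity check is only a heuristic.

```php
<?php
// Hypothetical helper: normalize fetched text to UTF-8.
function toUtf8(string $raw, string $sourceEncoding = 'GBK'): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        // Already valid UTF-8 (heuristic), leave it untouched
        return $raw;
    }
    return mb_convert_encoding($raw, 'UTF-8', $sourceEncoding);
}

// The two characters U+4E2D U+6587 encoded as GBK bytes
$gbkBytes = "\xD6\xD0\xCE\xC4";
echo toUtf8($gbkBytes), "\n";
```

Converting once, right after fetching, keeps the regular expressions and the database working on a single encoding.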
Wanted: a PHP collection program that can collect lists and content, including paginated ones, and store them in the database, preferably with comments.
Use phpQuery for this; the database import has to be written separately.
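phpQuery is a third-party library, so to stay runnable without it the sketch below swaps in PHP's built-in DOMDocument and DOMXPath as a stand-in for selector-based extraction. It is not phpQuery's API; the sample markup and the name extractLinksDom are assumptions for illustration.

```php
<?php
// Hypothetical helper: extract links by walking the DOM instead of using a regex.
function extractLinksDom(string $html): array
{
    // Collect parser warnings internally; real-world HTML is rarely valid
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $links = [];
    // Every <a> with an href attribute directly inside an <li>
    foreach ($xpath->query('//li/a[@href]') as $a) {
        $links[] = [
            'title' => $a->getAttribute('title'),
            'url'   => $a->getAttribute('href'),
        ];
    }
    return $links;
}

// Made-up sample markup
$sample = '<ul>'
    . '<li><a title="Server maintenance" target="_blank" href="news_1.html">Server maintenance</a></li>'
    . '<li><a title="New event" target="_blank" href="news_2.html">New event</a></li>'
    . '</ul>';

print_r(extractLinksDom($sample));
```

Compared with a regular expression, this approach does not depend on attribute order or whitespace in the markup.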