PHP Data Collection and Database Storage Program (1)

Source: Internet
Author: User

A few days ago a friend asked me to help develop a program for collecting news information. I took some time to write a PHP version and am recording it here for future reference.

Collection is nothing more than fetching a page remotely -> extracting the required content -> classifying and storing it -> reading it back -> displaying it.

In other words, it is an enhanced version of a simple "thief program" (a script that copies content from another site).

The core code follows (please don't use it for anything bad ^_^).

 

The content to be collected is the announcement list on a game website, such as:

You can use file_get_contents and simple regular expressions to obtain basic page information.
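As a minimal sketch of the extraction step, the snippet below runs a regular expression of the same shape over a small hard-coded HTML fragment. The function name extractLinks and the sample markup are assumptions made for illustration; in real use the HTML string would come from file_get_contents on the list page.

```php
<?php
// Hypothetical helper: pull (title, url) pairs out of list markup shaped like
// <li><a title="..." target="_blank" href="...">.
function extractLinks(string $html): array
{
    // i = case-insensitive, U = ungreedy (.* stops at the first closing quote),
    // s = dot also matches newlines.
    $pattern = '/<li>\s*<a title="(.*)" target="_blank" href="(.*)">/iUs';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $match) {
        $links[] = ['title' => $match[1], 'url' => $match[2]];
    }
    return $links;
}

// Sample markup standing in for a fetched page (an assumption, not the real site).
$sample = '<ul>'
    . '<li><a title="Server maintenance" target="_blank" href="news_1.html">Server maintenance</a></li>' . "\n"
    . '<li><a title="New event" target="_blank" href="news_2.html">New event</a></li>'
    . '</ul>';

print_r(extractLinks($sample));
```

The ungreedy U modifier matters here: without it, the first (.*) would run to the last quote on the page rather than the end of the title attribute.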

Organize the extracted information and store it in the database:

<?php
include_once("conn.php"); // database connection

// Collect list pages 1 through 8; the page number arrives as ?id=N.
$id = isset($_GET['id']) ? (int) $_GET['id'] : 0;
if ($id >= 1 && $id <= 8) {
    $conn = file_get_contents("http://www.93moli.com/news_list_4_$id.html"); // get the page content
    $pattern = "/<li><a title=\"(.*)\" target=\"_blank\" href=\"(.*)\">/iUs"; // regular expression
    preg_match_all($pattern, $conn, $arr); // match the content into the $arr array
    // print_r($arr); die;
    foreach ($arr[1] as $key => $value) { // $arr[1] (titles) and $arr[2] (links) share the same keys
        $url = "http://www.93moli.com/" . $arr[2][$key];
        // Escape the values before building the SQL string.
        // Note: the mysql_* functions were removed in PHP 7; on newer versions use mysqli or PDO.
        $title = mysql_real_escape_string($value);
        $link = mysql_real_escape_string($url);
        $sql = "insert into list (title, url) values ('$title', '$link')";
        mysql_query($sql);
        // echo "<a href='content.php?url=$url'>$value</a>" . "<br/>";
    }
    $id++;
    echo "Collecting URL data list $id... please wait...";
    echo "<script>window.location='list.php?id=$id'</script>";
} else {
    echo "Data collection has ended.";
}
?>

 

conn.php is the database connection file.

list.php is the current page.
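The insert step in the listing above builds its SQL by placing the scraped title and URL directly into the query string. A hedged alternative, assuming PDO is available, is a prepared statement, which keeps quotes in a title from breaking the query. The function name saveLinks and the connection details in the comment are hypothetical; the table and column names (list, title, url) are taken from the listing.

```php
<?php
// Hypothetical helper: store extracted rows with a prepared statement.
// $links is expected to be a list of ['title' => ..., 'url' => ...] arrays.
function saveLinks(PDO $pdo, array $links): int
{
    $stmt = $pdo->prepare('INSERT INTO list (title, url) VALUES (:title, :url)');

    $saved = 0;
    foreach ($links as $link) {
        // Values are bound as parameters, never concatenated into the SQL text.
        $stmt->execute([':title' => $link['title'], ':url' => $link['url']]);
        $saved += $stmt->rowCount();
    }
    return $saved;
}

// Example wiring (DSN, user, and password are placeholders, not from the article):
// $pdo = new PDO('mysql:host=localhost;dbname=news;charset=utf8mb4', 'user', 'password');
// $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// echo saveLinks($pdo, [['title' => 'New event', 'url' => 'http://www.93moli.com/news_2.html']]);
```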

Because the data to be collected is paginated and the page addresses increase in a regular sequence, I pass the page number in the id parameter and use a JavaScript redirect to step through the pages. This also avoids running one large for loop in a single request.
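The page-stepping idea can be isolated into a small function, sketched here under the assumption that each request handles exactly one list page. The name planRequest and its return shape are hypothetical; the list.php?id=N address and the limit of 8 pages follow the code above.

```php
<?php
// Hypothetical helper: decide which page this request should collect and
// where the browser should be sent next. Returns nulls once collection is over.
function planRequest(array $query, int $lastPage): array
{
    $id = isset($query['id']) ? (int) $query['id'] : 0;
    if ($id < 1 || $id > $lastPage) {
        return ['collect' => null, 'redirect' => null];
    }
    return ['collect' => $id, 'redirect' => 'list.php?id=' . ($id + 1)];
}

// Usage with a fixed query instead of $_GET, so the sketch runs on its own.
$plan = planRequest(['id' => '3'], 8);
if ($plan['collect'] !== null) {
    // ... collect page $plan['collect'] here ...
    echo "<script>window.location='" . $plan['redirect'] . "'</script>";
} else {
    echo "Data collection has ended.";
}
```

Each request does a bounded amount of work and then hands off to the next one, so a slow remote site is less likely to push a single request past PHP's execution time limit.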

 

With that, the list data is imported into the database. In the next post I will write up the process of collecting the content behind each URL.

 


How can PHP programmers master data collection?

Common PHP data collection techniques:

1. Regular-expression extraction: the key step in pulling out content
2. Character encoding conversion and analysis: compatibility management and data validity control
3. Storage and organization: storing and managing collected content, including databases, files, and progress
4. Data mining and website crawling: analyzing the website structure to simplify crawling and improve efficiency
5. Handling anti-collection measures: techniques for target sites that defend against collection
6. Multi-server concurrent collection management: working methods that improve efficiency
7. Data collation and analysis: checking for missing data to verify correctness and validity
8. Self-identity protection: protecting your own information
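As one illustration of item 2, the sketch below reads the charset declared in a page's meta tag and converts the page to UTF-8 before storage. The function names detectCharset and toUtf8 are hypothetical, and the sketch assumes the iconv extension is available and that the page declares its encoding honestly.

```php
<?php
// Hypothetical helper: read the charset from a <meta> tag, falling back to a default.
function detectCharset(string $html, string $default = 'UTF-8'): string
{
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    return $default;
}

// Hypothetical helper: convert raw page bytes to UTF-8, failing loudly on bad input.
function toUtf8(string $raw, string $sourceCharset): string
{
    if (strcasecmp($sourceCharset, 'UTF-8') === 0) {
        return $raw; // already in the target encoding
    }
    $converted = @iconv($sourceCharset, 'UTF-8', $raw);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $sourceCharset to UTF-8");
    }
    return $converted;
}

// Usage on a fragment that declares GBK (sample markup, not from a real site).
$page = '<meta http-equiv="Content-Type" content="text/html; charset=gbk">';
echo detectCharset($page), "\n";
```

Converting once at collection time means the database and every later display step can assume a single encoding.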

A PHP collection program needs to be able to collect both list pages and content pages, including paginated ones, and it is best to store comments in the database as well.

You can use phpQuery to write another collection-and-storage program.
 
