PHP-Based Program for Collecting Data into a Database (2)

Source: Internet
Author: User


The previous article in this series covered collecting the list data from the news listing page. This article covers collecting the full content of each news item.

This is the final data table of the previous blog:

The next step is to read the URL to be collected from the database and capture the page.

Create a content table

Note that you cannot walk through the URLs by simply incrementing the id. The ids in the data table may have gaps, for example id = 9 followed by id = 11. When the program tries id = 10, no row is found, the URL is blank, and empty fields end up being collected.

The technique used here is to let the database query find the next row. After collecting one row, ask the database whether there is a row with an id greater than the current one. If there is, read that row; because it comes back from the query, it is guaranteed to exist.
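The "next id greater than the current one" lookup can be sketched as follows. This is a minimal illustration, not the article's own code: it assumes PDO with an in-memory SQLite table standing in for the MySQL `list` table so that it runs on its own, and the function name `fetchNextRow` and the sample rows are hypothetical. PDO is used because the `mysql_*` functions were removed in PHP 7.

```php
<?php
// Sketch only: an in-memory SQLite table stands in for the article's MySQL `list` table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE list (id INTEGER PRIMARY KEY, title TEXT, url TEXT)');

// Hypothetical sample rows: ids 9 and 11 exist, id 10 is missing (the gap case).
$pdo->exec("INSERT INTO list (id, title, url) VALUES
    (9, 'first', 'http://example.com/a'),
    (11, 'second', 'http://example.com/b')");

/**
 * Return the first row whose id is greater than $lastId,
 * or null when no such row is left.
 */
function fetchNextRow(PDO $pdo, int $lastId): ?array
{
    $stmt = $pdo->prepare(
        'SELECT id, title, url FROM list WHERE id > :id ORDER BY id ASC LIMIT 1'
    );
    $stmt->execute([':id' => $lastId]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    return $row === false ? null : $row;
}

// Walk the table in id order; the missing id 10 is skipped automatically.
$visited = [];
$lastId = 0;
while (($row = fetchNextRow($pdo, $lastId)) !== null) {
    $lastId = (int) $row['id'];
    $visited[] = $lastId;
}
```

Because the query asks for `id > :id` rather than `id = :id + 1`, a gap in the sequence never produces an empty row, and the loop ends cleanly when the query returns nothing.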

The code is as follows:

<?php
include_once("conn.php");

$id = (int) $_GET['id'];
$sql = "select * from list where id = $id";
$result = mysql_query($sql);
$row = mysql_fetch_array($result);

// Fetch the page at the stored URL
$content = file_get_contents($row['url']);

// Match the <dd class="dataWrap"> block that holds the article body
$pattern = "/<dd class=\"dataWrap\">(.*)<\/dd>/iUs";
preg_match($pattern, $content, $info);

// Title and content to be stored
echo $title = $row[1] . "<br/>";
echo $content = $info[0];
// ... (the source listing breaks off here)

In this way, the news content we want is collected into the database. All that remains is to clean up the formatting of the stored data.
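The extraction step can be tried in isolation on a sample string. The sketch below assumes the same `<dd class="dataWrap">` markup as the pattern in the article; the function name `extractContent` and the sample HTML are hypothetical. Unlike the listing, which echoes `$info[0]` (the whole match, including the `<dd>` tags), this version returns `$info[1]`, the captured inner HTML only.

```php
<?php
/**
 * Return the inner HTML of the first <dd class="dataWrap"> element,
 * or null if the page does not contain one.
 */
function extractContent(string $html): ?string
{
    // i: case-insensitive, U: ungreedy (stop at the first </dd>),
    // s: let "." match newlines so multi-line bodies are captured.
    $pattern = '/<dd class="dataWrap">(.*)<\/dd>/iUs';

    if (preg_match($pattern, $html, $info) !== 1) {
        return null; // no match: the caller can skip this page instead of storing an empty field
    }

    return trim($info[1]);
}

// Hypothetical sample page fragment.
$sampleHtml = "<dl><dd class=\"dataWrap\">\n<p>Body text</p>\n</dd><dd>other</dd></dl>";
$content = extractContent($sampleHtml);
$missing = extractContent('<p>no wrapper here</p>');
```

Checking the return value of `preg_match` before using `$info` avoids storing blank content when a page's layout differs from what the pattern expects.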


How can PHP programmers master data collection?

Common PHP data collection techniques:

1. Data extraction with regular expressions: the key step in extracting content
2. Character encoding conversion and analysis: compatibility handling and data validity control
3. Data storage and organization: storing and managing collected content, including databases, files, and progress tracking
4. Data mining and site crawling: analyzing the site structure to simplify crawling and improve efficiency
5. Handling anti-scraping measures: techniques for targets that deploy anti-scraping defenses
6. Multi-server concurrent collection management: working methods that improve efficiency
7. Data collation and analysis: checking for missing data and verifying correctness and validity
8. Identity protection: protecting your own information
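As an illustration of item 2, a page served in GBK can be converted to UTF-8 before it is stored. This is a minimal sketch assuming the `mbstring` extension is available and that the source encoding is already known (for example from the page's charset declaration); the helper name `toUtf8` is hypothetical.

```php
<?php
/**
 * Convert fetched page bytes to UTF-8.
 * $sourceEncoding is supplied by the caller; it is not detected here.
 */
function toUtf8(string $raw, string $sourceEncoding): string
{
    if (strcasecmp($sourceEncoding, 'UTF-8') === 0) {
        return $raw; // already in the target encoding
    }

    return mb_convert_encoding($raw, 'UTF-8', $sourceEncoding);
}

// The two characters "中文" encoded as GBK bytes.
$gbkBytes = "\xD6\xD0\xCE\xC4";
$utf8 = toUtf8($gbkBytes, 'GBK');

// Validity control: confirm the result is well-formed UTF-8 before storing it.
$isValid = mb_check_encoding($utf8, 'UTF-8');
```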

A problem with storing collected data in PHP

PHP has the implode function:
$nr = implode('#', $arr);
However, this produces "content 1#content 2", with no trailing #. If a trailing # is needed, append it:
$nr = implode('#', $arr) . '#';

The clumsier method is to use a loop:
$nr = '';
foreach ($arr as $vl) {
    $nr .= $vl . "#";
}
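The two ways of building the #-joined string can be compared side by side. This is a small sketch with hypothetical sample values, using only the built-in `implode` function.

```php
<?php
$arr = ['content 1', 'content 2'];

// implode() places the separator only between items.
$joined = implode('#', $arr);

// Append the separator by hand when a trailing "#" is required.
$withTrailing = implode('#', $arr) . '#';

// The loop version appends "#" after every item, including the last one.
$looped = '';
foreach ($arr as $vl) {
    $looped .= $vl . '#';
}

// Edge case: for an empty array the two approaches differ.
$emptyImplode = implode('#', []) . '#';
$emptyLooped = '';
foreach ([] as $vl) {
    $emptyLooped .= $vl . '#';
}
```

For a non-empty array the `implode` version with the appended `#` and the loop give the same string. For an empty array, the `implode` version yields a lone `#` while the loop yields an empty string, so it is worth deciding which behavior the stored format expects.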
