This article explains how to collect the next seven days of weather data from www.weather.com.cn with PHP. It walks through the implementation in detail and can also serve as an introductory tutorial on data collection (web scraping) with PHP.
Preface
When we write a web application, we always want to make our site more attractive and more functional. Sometimes writing a small tool or adding a small plug-in makes the site feel more complete: a perpetual calendar, for example, or the weather forecast feature we are going to build here.
Of course, we cannot receive data from professional satellites ourselves, so our weather data comes from an existing weather forecast website. Using the data that site already publishes, we can write a PHP crawler that dynamically collects the data we need; whenever the target site updates its data, our program picks up the new data automatically.
The following describes how to write a simple PHP data collection program (a PHP crawler).
Principle
Given the URL of a webpage, use PHP to download the webpage and obtain the webpage content. Then, extract the data we are interested in using a regular expression and output the data.
In this example, the webpage we want to capture shows the weather conditions for the next seven days.
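As a rough sketch of that download-then-extract pattern (the URL and the <title> extraction here are placeholders for illustration only, not part of the actual program), the whole idea fits in a few lines:

<?php
// Placeholder example only: fetch a page, extract one piece of data, print it.
$url  = "http://www.example.com/some-page.html";   // placeholder URL
$html = file_get_contents($url);                   // 1. download the page

// 2. extract the part we are interested in with a regular expression
if (preg_match('#<title>(.*?)</title>#is', $html, $m)) {
    echo $m[1];                                    // 3. output the data
}
?>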
Implementation
0. Obtain the content of the weather forecast webpage:
The Code is as follows:
$ Url = "http://www.weather.com.cn/weather/101050101.shtml ";
$ Page_content = file_get_contents ($ url );
Here, the file_get_contents() function downloads the webpage pointed to by $url and returns the webpage content as a string. Therefore, the $page_content variable contains all the HTML code of the webpage to be crawled. Next, we need to extract the data we want from it.
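One caveat: file_get_contents() returns false (and raises a warning) when the download fails, so in a real deployment it is worth checking the result before moving on. A minimal sketch:

<?php
$url = "http://www.weather.com.cn/weather/101050101.shtml";

// Suppress the warning and check the return value ourselves
$page_content = @file_get_contents($url);
if ($page_content === false) {
    die("Failed to download $url");
}
// ...continue with the extraction step below
?>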
1. Use a regular expression to match the string we need
Output the value of $page_content first, and then view the source code of the webpage.
The Code is as follows:
<!--day 1-->
......
<!--day 7-->
These two comment lines are found in the page source; they mark off the block of forecast data we want. Use a regular expression to obtain the content between <!--day 1--> and <!--day 7-->:
The Code is as follows:
eregi("<!--day 1-->(.*)<!--day 7-->", $page_content, $res);
// $res[1] will hold everything between the two comment markers
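Note that the ereg/eregi family of functions was deprecated in PHP 5.3 and removed in PHP 7, so on current PHP versions the same extraction would be written with preg_match. A rough equivalent, assuming the markers really are <!--day 1--> and <!--day 7-->:

// /s lets . match newlines, /i keeps the match case-insensitive like eregi()
if (preg_match('/<!--day 1-->(.*)<!--day 7-->/is', $page_content, $res)) {
    // $res[1] now holds the HTML fragment between the two comment markers
}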
2. Complete the image paths on the page
Because the image paths in the remote webpage are relative paths like /m2/i/icon_weather/29x20/d01.gif, we need to complete them by prepending the site's domain, http://www.weather.com.cn, so that the icons still load when shown on our own page.
The Code is as follows:
// Prepend the domain to the src attributes so the relative icon paths become absolute
$forecast = str_replace('src="', 'src="http://www.weather.com.cn', $res[1]);
Now $forecast contains the weather forecast information we need, and this simple PHP crawler is essentially done.
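If you want to be a little more careful and only rewrite paths that are actually relative (those starting with "/"), a preg_replace along these lines could be used instead of the plain str_replace; this is just a sketch, not what the snippet above does:

// Prepend the domain only to src attributes whose value starts with "/"
$forecast = preg_replace(
    '#src="/#i',
    'src="http://www.weather.com.cn/',
    $res[1]
);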
Source code
The following is the complete source code of this small weather forecast crawler. Some code is added to measure the running time of each part of the program, and you can set the values of $start and $end to control which days of the forecast are captured.
The Code is as follows:
$ Url = "http://www.weather.com.cn/weather/101050101.shtml ";
$ T1 = time ();
$ Page_content = file_get_contents ($ url );
$ T2 = time ();
$ Start = 1;
$ End = 3;
If ($ end> 7 ){
Echo "exceeds the forecast capability range. Please reset it! ";
} Else {
Echo "future". ($ end-$ start). "Weather Forecast for Harbin ("
. Date ('Y-m-J'). "published )";
Eregi ("-- day $ start -- (. *) -- day $ end --", $ page_content, $ res );
$ Forecast = str_replace ("" $ t3 = time ();
Echo $ forecast;
Echo 'first step costs '. ($ t2-$ t1). 'Ms .';
Echo 'last step costs '. ($ t3-$ t2). 'Ms .';
}
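As a simple usage example (the file name is an assumption, not from the original article), the finished script can be included wherever the forecast should appear on an existing page:

<?php
// weather.php is a hypothetical file containing the crawler above
include 'weather.php';
?>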
Other Application Examples
In the same way, we could build other things: a daily NBA scoreboard, synced news from the Harbin Institute of Technology news site, stock market quotes, and so on, all updated in real time along with the source sites. That is all I can think of for now; you are welcome to add your own ideas.