Preface
When we build a web application, we always want our own site to look better and offer more features. Sometimes a small tool or plug-in makes the site more complete: a perpetual calendar, for example, or the weather forecast feature discussed here.
Of course, we cannot receive data from weather satellites ourselves, so our weather data comes from an existing weather forecast website. Using the data that site publishes, we can write a PHP crawler that dynamically collects what we need. Whenever the target site updates its data, our program picks up the change automatically.
Here is how to write a simple PHP data acquisition program (a PHP crawler).
Principle
Given the URL of a web page, use PHP to download the page and get its content, extract the data we are interested in with a regular expression, and then output it.
In this example, the page we want to crawl is http://www.weather.com.cn/weather/101050101.shtml, and the data we are interested in is the weather for the next 7 days shown on that page.
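The download-then-extract idea can be sketched on a small inline string that stands in for a downloaded page. The sample HTML and the helper name extract_first are assumptions made for illustration; they are not part of the original program.

```php
<?php
// Hypothetical helper: return the first capture group of $pattern in $html,
// or null when the pattern does not match.
function extract_first(string $pattern, string $html): ?string
{
    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

// Stand-in for the string that a real download would return.
$html = "<html><head><title>Harbin weather</title></head><body>...</body></html>";

// Extract the part we are interested in, then output it.
$title = extract_first('~<title>(.*?)</title>~is', $html);
echo $title === null ? "no match" : $title;
```

Returning null for "no match" lets the caller tell an empty result apart from a page whose layout has changed.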
Implementation
0. Fetch the weather forecast page:
$url = "http://www.weather.com.cn/weather/101050101.shtml";
$page_content = file_get_contents($url);
Here, file_get_contents() downloads the web page that $url points to and returns its content as a string, so $page_content holds the entire HTML code of the page we are crawling. Next, we extract the data we need from it.
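A download can fail, in which case file_get_contents() returns false rather than a string. The wrapper below is a minimal sketch of handling that case; the function name fetch_page and the 10-second default timeout are assumptions, not part of the original program.

```php
<?php
// Hypothetical wrapper around file_get_contents() that reports failure as null.
function fetch_page(string $url, int $timeoutSeconds = 10): ?string
{
    // The "http" options only take effect for http:// and https:// URLs.
    $context = stream_context_create([
        'http' => ['timeout' => $timeoutSeconds],
    ]);
    // file_get_contents() returns false on failure and emits a warning;
    // the warning is suppressed here because the return value is checked.
    $content = @file_get_contents($url, false, $context);
    return $content === false ? null : $content;
}
```

Calling fetch_page($url) in place of file_get_contents($url) and continuing only when the result is not null keeps a failed download from reaching the regular expression step. Fetching an http:// URL this way also requires the allow_url_fopen setting to be enabled.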
1. Use a regular expression to extract the string we need
First, output the value of $page_content and look at the page source. The string we need sits between two HTML comment lines:
......
Everything we want is enclosed by the comments on these two lines.
Use a regular expression to get all the content between the opening comment and the closing comment:
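// eregi() was removed in PHP 7; preg_match() with the "i" and "s" modifiers is the replacement.
// The opening and closing comment markers go on either side of (.*).
preg_match("/(.*)/is", $page_content, $res);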
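The matching step can be sketched as a small function run against a sample string. The marker format "<!--day N-->" used here is an assumption for illustration, as are the function name and the sample HTML; the actual markers must be read from the page source.

```php
<?php
// Hypothetical helper: return the text between two HTML comment markers.
// The marker format "<!--day N-->" is an assumption, not taken from the page.
function extract_between_days(string $html, int $start, int $end): ?string
{
    // preg_quote() escapes characters that are special in a regular expression.
    $open  = preg_quote("<!--day $start-->", '/');
    $close = preg_quote("<!--day $end-->", '/');
    // "s" lets "." match newlines, "i" ignores case (as eregi did),
    // and ".*?" stops at the first closing marker.
    if (preg_match("/$open(.*?)$close/is", $html, $res) === 1) {
        return $res[1];
    }
    return null;
}

$sample = "<p>header</p><!--day 1-->\n<div>Sunny</div>\n<!--day 3--><p>footer</p>";
echo extract_between_days($sample, 1, 3);
```

Index 1 of the match array holds only the captured text, while index 0 would also include the two markers.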
2. Complete the image paths in the page
Because the image paths in the remote page are relative paths such as /m2/i/icon_weather/29x20/d01.gif, we need to complete them by prefixing each one with http://www.weather.com.cn.
$forecast = str_replace("
At this point, $forecast holds the weather information we need, and this simple PHP crawler is complete.
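One way to complete the paths is sketched below. It uses preg_replace_callback() rather than the str_replace() call of the original, and the function name absolutize_src and the sample fragment are assumptions for illustration.

```php
<?php
// Hypothetical helper: prefix root-relative src attributes with a base URL.
function absolutize_src(string $html, string $base): string
{
    // Matches src="/..." or src='/...', but leaves protocol-relative
    // URLs such as src="//host/..." and absolute URLs untouched.
    $result = preg_replace_callback(
        '~\bsrc=(["\'])/(?!/)~i',
        fn(array $m): string => 'src=' . $m[1] . $base . '/',
        $html
    );
    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $html;
}

$fragment = '<img src="/m2/i/icon_weather/29x20/d01.gif">';
echo absolutize_src($fragment, "http://www.weather.com.cn");
```

Matching only paths that begin with a single slash avoids prefixing links that are already absolute.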
Source
Here is the full source code of the weather forecast crawler. It adds some code to measure the run time of each part of the program, and lets you control which days are crawled by setting the values of $start and $end.
$url = "http://www.weather.com.cn/weather/101050101.shtml";
// microtime(true) has sub-second resolution; time() only counts whole seconds
$t1 = microtime(true);
$page_content = file_get_contents($url);
$t2 = microtime(true);
$start = 1;
$end = 3;
if ($end > 7) {
    echo "Beyond the range of the forecast, please reset!";
} else {
    echo "Weather forecast for Harbin for the next " . ($end - $start) . " days (released "
        . date('Y-m-j') . ")";
    // preg_match() replaces eregi(), which was removed in PHP 7
    preg_match("/--day $start-(.*)--day $end--/is", $page_content, $res);
    $forecast = str_replace("
    $t3 = microtime(true);
    echo $forecast;
    echo 'first step costs ' . round(($t2 - $t1) * 1000) . ' ms.';
    echo 'last step costs ' . round(($t3 - $t2) * 1000) . ' ms.';
}
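The program above only checks the upper bound of $end. A fuller check of the day range could look like the sketch below; the function name is_valid_day_range and the rule that $start must be smaller than $end are assumptions for illustration.

```php
<?php
// Hypothetical helper: check that a day range fits inside the 7-day forecast.
function is_valid_day_range(int $start, int $end, int $maxDay = 7): bool
{
    // The range must begin at day 1 or later, end within the forecast,
    // and span at least one day.
    return $start >= 1 && $start < $end && $end <= $maxDay;
}

var_dump(is_valid_day_range(1, 3)); // a range inside the forecast
var_dump(is_valid_day_range(1, 8)); // ends beyond day 7
```

Running this check before the download avoids fetching the page for a range that cannot be matched.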
Other application examples
The same approach can be applied elsewhere: daily NBA scoreboards, synchronized headline news, stock market quotes, and so on. Updates can follow the source site in near real time. That is all for now; criticism and suggestions are welcome.