Crawling weather data with Python from the China Weather Network

Last Update:2018-07-24 Source: Internet

Author: User


I needed today's weather data, so I wrote a Python crawler to fetch it from the China Weather Network. The data I need is simple: only the temperature (lowest and highest) and the weather in Beijing. The code is therefore short, and the crawl process is described below.

First step: web page analysis

To design the crawler, we must first analyze how the web page is requested. Open the China Weather Network homepage, type "Beijing" into the search box, and view the weather page for Beijing.

The "Today" data bar does not show the lowest and highest temperatures I want, so I chose the "7 Days" link.

The "7 Days" page has the data I want (lowest and highest temperature), so the next step is to analyze how the page is requested. Comparing the "Today" page with the "7 Days" page shows that the site is fetched with a simple GET request.
For example, the URL for the "7 Days" page is:
url = "http://www.weather.com.cn/weather/101010100.shtml"

Here, "weather" indicates that the "7 Days" page is requested; for the "Today" page it is "weather1d". The "101010100" that follows is the area code for Beijing.
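The URL pattern described above can be sketched as a small helper. This is only an illustration: the function name build_forecast_url and the view labels "7d" and "today" are made up for this example, and only the two page types mentioned in the text are covered.

```python
BASE_URL = "http://www.weather.com.cn"

# Page types named in the text: "weather" is the 7-day page,
# "weather1d" is the page for today. The dict keys are hypothetical labels.
PAGE_TYPES = {"7d": "weather", "today": "weather1d"}


def build_forecast_url(area_code, view="7d"):
    """Build the forecast URL for an area code and a view ("7d" or "today")."""
    if view not in PAGE_TYPES:
        raise ValueError("unknown view: " + repr(view))
    return "{}/{}/{}.shtml".format(BASE_URL, PAGE_TYPES[view], area_code)


BEIJING = "101010100"  # area code for Beijing, as given in the text
url_7d = build_forecast_url(BEIJING)
url_today = build_forecast_url(BEIJING, "today")
```

Keeping the area code as a string avoids losing leading zeros if other codes happen to start with one.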

With the URL figured out, the next task is to find where the data sits in the page source. A search of the source shows that the weather description and the temperature data are in two <p> tags; the highest temperature is in a <span> tag and the lowest temperature is in an <i> tag.
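To make the tag layout concrete, here is a dependency-free sketch that walks a small HTML sample with the standard library's html.parser. It is not the approach used in this article, which relies on Beautiful Soup. The sample markup is a simplified assumption modelled on the description above (with the class names "wea" and "tem" taken from this article's script), and the class ForecastParser is hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup; the real page contains many more tags.
SAMPLE_HTML = """
<ul>
  <li><p class="wea">Sunny</p><p class="tem"><span>33</span>/<i>22℃</i></p></li>
  <li><p class="wea">Cloudy</p><p class="tem"><span>31</span>/<i>21℃</i></p></li>
</ul>
"""


class ForecastParser(HTMLParser):
    """Collect one dict per day with 'wea', 'high' and 'low' keys.

    Assumes each day's <p class="wea"> comes before its <p class="tem">.
    """

    def __init__(self):
        super().__init__()
        self.days = []        # one dict per forecast day
        self._p_class = None  # class of the <p> currently open
        self._field = None    # field the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "p" and cls in ("wea", "tem"):
            self._p_class = cls
            if cls == "wea":
                # A weather description starts a new day entry.
                self.days.append({"wea": None, "high": None, "low": None})
                self._field = "wea"
        elif self._p_class == "tem" and tag == "span":
            self._field = "high"  # highest temperature lives in <span>
        elif self._p_class == "tem" and tag == "i":
            self._field = "low"   # lowest temperature lives in <i>

    def handle_endtag(self, tag):
        if tag == "p":
            self._p_class = None
        self._field = None

    def handle_data(self, data):
        if self._field and self.days and data.strip():
            self.days[-1][self._field] = data.strip()


parser = ForecastParser()
parser.feed(SAMPLE_HTML)
today = parser.days[0]
```

After feeding the sample, today holds the weather text, the <span> content as "high" and the <i> content as "low"; a day whose <span> is absent would keep "high" as None.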

One thing to note is that at night the page changes: the highest temperature is no longer shown.

In the source, the <span> tag is missing and only the <i> tag with the lowest temperature remains. Because my application requires a highest temperature, I work around the missing value by using the next day's highest temperature instead (a rough approximation).
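The fallback rule can be illustrated without any parsing. In this sketch, the function name high_with_fallback and the list-of-dicts layout are assumptions made for the example; a missing highest temperature is represented by None.

```python
def high_with_fallback(days):
    """Return today's high, or the next day's high when today's is missing.

    `days` is a list of dicts with a "high" key, ordered from today onward.
    A missing value is represented by None (an assumption of this sketch).
    """
    today_high = days[0]["high"]
    if today_high is not None:
        return today_high
    # Night-time case: no <span> for today, so borrow the next day's value.
    return days[1]["high"]


# Made-up sample values, not real forecast data.
daytime = [{"high": "33", "low": "22℃"}, {"high": "31", "low": "21℃"}]
night = [{"high": None, "low": "22℃"}, {"high": "31", "low": "21℃"}]

daytime_high = high_with_fallback(daytime)
night_high = high_with_fallback(night)
```

The substitution is only approximate, since the next day's high can differ noticeably from the value that was shown earlier in the day.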

That completes the page analysis; the next step is to get the data.

Second step: data acquisition

Given the elegance of Python, this simple crawler uses Python with Beautiful Soup 4 for page parsing. Beautiful Soup is a Python library that extracts data from HTML or XML files, and its parsing features make many such tasks quick and easy. For more about Beautiful Soup, refer to the official documentation or other material; my code is posted directly below.

from urllib.request import urlopen
from bs4 import BeautifulSoup

resp = urlopen('http://www.weather.com.cn/weather/101010100.shtml')
soup = BeautifulSoup(resp, 'html.parser')
# The first <p> tag with class="tem" holds today's temperature data
tagtoday = soup.find('p', class_="tem")
try:
    # The highest temperature is sometimes not shown (at night)
    temperaturehigh = tagtoday.span.string
except AttributeError:
    # Use the highest temperature of the next day instead
    temperaturehigh = tagtoday.find_next('p', class_="tem").span.string

temperaturelow = tagtoday.i.string  # get the lowest temperature
weather = soup.find('p', class_="wea").string  # get the weather

print('Minimum temperature: ' + temperaturelow)
print('Maximum temperature: ' + temperaturehigh)
print('Weather: ' + weather)

Running the program prints the lowest temperature, the highest temperature, and the weather for Beijing.
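The script above prints the temperatures as the raw strings found in the page. If numeric values are needed, a conversion along the following lines may help. This is a sketch under an assumption: that the strings look like "22℃" or "33", which is not confirmed by this article. The function name parse_temperature is hypothetical.

```python
import re


def parse_temperature(text):
    """Return the first integer found in a string such as "22℃", or None.

    The "digits plus optional ℃ suffix" format is an assumption of this
    sketch; adjust the pattern if the page uses a different format.
    """
    match = re.search(r"-?\d+", text or "")
    return int(match.group()) if match else None


low = parse_temperature("22℃")
high = parse_temperature("33")
below_zero = parse_temperature("-5℃")
missing = parse_temperature(None)
```

Returning None for missing or unparsable input lets the caller decide how to handle a gap, for example by applying the next-day fallback described earlier.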

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
