Crawling weather data from China Weather Network with Python

Source: Internet
Author: User

I needed today's weather data, so I picked up Python and wrote a crawler to fetch it from China Weather Network. My requirements are simple: I only need the temperature (lowest and highest) and the weather for Beijing, so the code is short. The crawl process is described below.

Step 1: Web page analysis

To design the crawler, we must first analyze how the web page is requested. Open the China Weather Network homepage and type "Beijing" into the search box to bring up the weather page for Beijing.




The "Today" panel does not show the lowest and highest temperatures I want, so I followed the "7 Days" link.




This page has the data I want (lowest and highest temperature), so the next thing to analyze is how the page is requested. Comparing the "Today" page with the "7 Days" page shows that each is fetched with a simple GET request.
For example, the URL requested for the "7 Days" page is:
url = "http://www.weather.com.cn/weather/101010100.shtml"

Here, "weather" in the path selects the "7 Days" page; the "Today" page uses "weather1d" instead. The trailing "101010100" is the area code for Beijing.
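The URL pattern described above can be sketched as a small helper. This is only an illustration: the names BASE_URL, PAGE_SEGMENTS and build_forecast_url are hypothetical, and the only facts taken from the text are the two path segments and the Beijing area code. The Request object is built but never sent, so no network access is needed.

```python
from urllib.request import Request

# Hypothetical names, introduced only for this illustration.
BASE_URL = "http://www.weather.com.cn"

# Path segment for each page type, as described in the text.
PAGE_SEGMENTS = {"7d": "weather", "today": "weather1d"}


def build_forecast_url(area_code, page="7d"):
    """Return the forecast page URL for an area code and page type."""
    segment = PAGE_SEGMENTS[page]
    return f"{BASE_URL}/{segment}/{area_code}.shtml"


beijing_7d = build_forecast_url("101010100")
beijing_today = build_forecast_url("101010100", "today")
print(beijing_7d)
print(beijing_today)

# A Request with no data attached defaults to the GET method.
request = Request(beijing_7d)
print(request.get_method())
```

Keeping the page type and area code as parameters makes it easy to request a different page or city later, provided the area code for that city is known.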


With the URL figured out, the next task is to find where the data sits in the page source. After some searching, I located it: the weather and the temperature are held in two separate p tags, and within the temperature tag the highest temperature is in a span tag and the lowest temperature is in an i tag.




One thing to watch out for: at night the page changes and no longer shows today's highest temperature.




In the page source, the span tag is then missing, leaving only the i tag that holds the lowest temperature. Because my application requires a highest temperature, I work around the gap by using the next day's highest temperature instead (a rough approximation).
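The fallback can be tried offline on a small HTML fragment. The markup below is made up for illustration and only mimics the structure described above (a p tag with class "tem" containing an optional span and an i); the real page is more complex. The function name extract_high_low is hypothetical, and this sketch checks for a missing span explicitly rather than catching an exception.

```python
from bs4 import BeautifulSoup

# Simplified, made-up markup modeled on the structure described in the text:
# the first entry has no <span> (night-time case), the second has both values.
SAMPLE_HTML = """
<ul>
  <li><p class="wea">Clear</p><p class="tem"><i>12℃</i></p></li>
  <li><p class="wea">Cloudy</p><p class="tem"><span>25</span>/<i>14℃</i></p></li>
</ul>
"""


def extract_high_low(soup):
    """Return (high, low) for today, borrowing the next day's high if today's is missing."""
    today = soup.find("p", class_="tem")
    span = today.span
    if span is None:
        # Night-time page: fall back to the next day's highest temperature.
        span = today.find_next("p", class_="tem").span
    return span.string, today.i.string


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
high, low = extract_high_low(soup)
print(high, low)
```

Here the first entry has no span, so the high value comes from the second entry while the low value still comes from the first.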




That completes the page analysis; the next step is to get the data.

Step 2: Data acquisition

Given the elegance of the Python language, this simple crawler uses Python with Beautiful Soup 4 for page parsing. Beautiful Soup is a Python library that extracts data from HTML or XML files, and its parsing features solve many problems quickly and easily. For more about Beautiful Soup, see the official documentation or other references; my code is posted directly below.

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch and parse the "7 Days" page for Beijing
resp = urlopen('http://www.weather.com.cn/weather/101010100.shtml')
soup = BeautifulSoup(resp, 'html.parser')

# The first p tag with class="tem" holds today's temperature data
tagtoday = soup.find('p', class_="tem")
try:
    # The highest temperature is sometimes not shown (the span is missing)
    temperaturehigh = tagtoday.span.string
except AttributeError:
    # Use the next day's highest temperature instead
    temperaturehigh = tagtoday.find_next('p', class_="tem").span.string

temperaturelow = tagtoday.i.string  # lowest temperature
weather = soup.find('p', class_="wea").string  # weather

print('Minimum temperature: ' + temperaturelow)
print('Maximum temperature: ' + temperaturehigh)
print('Weather: ' + weather)



Running the program prints the minimum temperature, the maximum temperature, and the weather for Beijing.
