Crawling historical weather data from Tianqi.com with Python
My first blog post, hahaha, recording my path toward advanced Python!
Today I wrote a simple crawler.
This crawler uses Python's requests and BeautifulSoup modules (Python 2.7.12); both can be installed with pip from the command line. The core of the crawler is using BeautifulSoup's select method to extract the required information.

pip install requests
pip install bs4
Take Wuhan as an example: we crawl Wuhan's historical weather data from Tianqi.com, using July's history page.
The URL for July is http://lishi.tianqi.com/wuhan/201707.html
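Since every month's page follows the same city/YYYYMM pattern, the URL can be built programmatically. A minimal sketch (the helper name `month_url` is my own, not from the original post):

```python
# Hypothetical helper: build a month-history URL for lishi.tianqi.com.
# The city slug ("wuhan") and the YYYYMM suffix are the only moving parts.
def month_url(city, year, month):
    """Return the history page URL for the given city and month."""
    return "http://lishi.tianqi.com/%s/%d%02d.html" % (city, year, month)

print(month_url("wuhan", 2017, 7))
# http://lishi.tianqi.com/wuhan/201707.html
```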
1. Fetch the webpage content with the requests module.

import requests
from bs4 import BeautifulSoup

url = 'http://lishi.tianqi.com/wuhan/201707.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
2. Use the select method to find the div on the page that contains the weather data.
weather_list = soup.select('div[class="tqtongji2"]')
3. Find the date, maximum temperature, minimum temperature, and weather, using li.string to get the text inside each li.

ul_list = weather.select('ul')
for ul in ul_list:
    li_list = ul.select('li')
    for li in li_list:
        li.string.encode('utf-8')  # the specific weather information
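To see steps 2 and 3 in action without hitting the network, here is a small sketch run against a hand-written HTML fragment shaped like the page (the class name tqtongji2 is from the page above; the sample rows are made up):

```python
from bs4 import BeautifulSoup

# Made-up fragment mimicking the structure of the tqtongji2 div.
html = '''
<div class="tqtongji2">
  <ul><li>Date</li><li>High</li><li>Low</li><li>Weather</li></ul>
  <ul><li>2017-07-01</li><li>35</li><li>28</li><li>Sunny</li></ul>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
weather_list = soup.select('div[class="tqtongji2"]')  # step 2: find the data div
rows = []
for weather in weather_list:
    for ul in weather.select('ul'):
        rows.append([li.string for li in ul.select('li')])  # step 3: text of each li

print(rows[1])  # ['2017-07-01', '35', '28', 'Sunny']
```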
The Code is as follows:
#encoding:utf-8
import requests
from bs4 import BeautifulSoup

urls = ["http://lishi.tianqi.com/wuhan/201707.html",
        "http://lishi.tianqi.com/wuhan/201706.html",
        "http://lishi.tianqi.com/wuhan/201705.html"]

file = open('wuhan_weather.csv', 'w')
for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    weather_list = soup.select('div[class="tqtongji2"]')
    for weather in weather_list:
        weather_date = weather.select('a')[0].string.encode('utf-8')
        ul_list = weather.select('ul')
        i = 0
        for ul in ul_list:
            li_list = ul.select('li')
            line = ""
            for li in li_list:
                line += li.string.encode('utf-8') + ','
            if i != 0:  # skip the header row
                file.write(line + '\n')
            i += 1
file.close()
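The script above is Python 2 (note the .encode('utf-8') calls and hand-joined commas). A hedged Python 3 rewrite is sketched below; the function names are my own, parsing is split from fetching so the parser can be exercised without network access, and the csv module replaces the manual comma joining:

```python
# Assumed rewrite, not the original author's code: Python 3, same page structure.
import csv
import requests
from bs4 import BeautifulSoup

def parse_month(html):
    """Extract one row per day ([date, high, low, weather]) from a month page."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for weather in soup.select('div[class="tqtongji2"]'):
        ul_list = weather.select('ul')
        for ul in ul_list[1:]:  # ul_list[0] is the header row (the i != 0 check above)
            rows.append([li.string for li in ul.select('li')])
    return rows

def crawl(urls, path):
    """Fetch each month page and append its rows to one CSV file."""
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        for url in urls:
            writer.writerows(parse_month(requests.get(url).text))

# Usage (network required):
# crawl(["http://lishi.tianqi.com/wuhan/201707.html"], "wuhan_weather.csv")
```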
Final result:
Compared with regular expressions, crawling data with select statements is so much easier!
I don't understand regular expressions very well yet; once I have figured them out I will write up a summary.
Migrated from my CSDN blog: http://blog.csdn.net/haha_point/article/details/77197230