Host environment: Python 2.7.9 / Windows 8 64-bit / BeautifulSoup 4
This script uses BeautifulSoup4 to crawl PM2.5 data from www.pm25.com. I picked this site because it publishes a city-by-city PM2.5 concentration ranking (it is also the first result when you search Baidu for PM2.5!).
The program only queries two cities, so the speedup from multithreading is not very noticeable; you could fetch 10 cities with 10 threads to see a clearer difference (a sketch of that variant follows the full listing below).
One last complaint: how is Shanghai's air quality this bad?!
pm25.py
The code is as follows:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# by Ustcwq

import urllib2
import threading
from time import ctime
from bs4 import BeautifulSoup

def getPM25(cityname):
    site = 'http://www.pm25.com/' + cityname + '.html'
    html = urllib2.urlopen(site)
    soup = BeautifulSoup(html)
    city = soup.find(class_='bi_loaction_city')            # city name
    aqi = soup.find('a', {'class': 'bi_aqiarea_num'})      # AQI value
    quality = soup.select('.bi_aqiarea_right span')        # air quality level
    result = soup.find('div', class_='bi_aqiarea_bottom')  # air quality description
    print city.text + u' AQI: ' + aqi.text + u'\nAir quality: ' + quality[0].text + result.text
    print '*' * 20 + ctime() + '*' * 20

def one_thread():    # single-threaded
    print 'one_thread start: ' + ctime() + '\n'
    getPM25('hefei')
    getPM25('shanghai')

def two_thread():    # multithreaded
    print 'two_thread start: ' + ctime() + '\n'
    threads = []
    t1 = threading.Thread(target=getPM25, args=('hefei',))
    threads.append(t1)
    t2 = threading.Thread(target=getPM25, args=('shanghai',))
    threads.append(t2)
    for t in threads:
        # t.setDaemon(True)
        t.start()
    for t in threads:
        t.join()     # wait for both threads to finish before returning

if __name__ == '__main__':
    one_thread()
    print '\n' * 2
    two_thread()
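
As suggested earlier, here is a minimal sketch of the many-cities variant, spawning one thread per city. It reuses getPM25 and the imports from the listing above; the city slugs in the example call are my assumptions, so check the actual pinyin names pm25.com uses in its URLs.

# A minimal sketch of the ten-city variant suggested above, reusing getPM25.
def many_threads(citynames):
    print 'many_threads start: ' + ctime() + '\n'
    # one thread per city
    threads = [threading.Thread(target=getPM25, args=(name,))
               for name in citynames]
    for t in threads:
        t.start()
    for t in threads:
        t.join()    # wait for every city before returning

# Example usage (city slugs are assumptions, verify against pm25.com):
# many_threads(['hefei', 'shanghai', 'beijing', 'guangzhou', 'shenzhen'])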
That's all for this article; I hope you enjoyed it.