1. Installing the Python Environment
Download the installer that matches your operating system from the official website, https://www.python.org/, then install it and configure the environment variables.
2. Installing the Python Plugin in IntelliJ IDEA
I searched for the plugin and installed it directly from within IDEA (search Baidu if you need detailed steps).
3. Installing the BeautifulSoup Library
https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/#attributes
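One way to confirm that the library is available after installing it (commonly with `pip install beautifulsoup4`) is to look for the `bs4` module from Python. The sketch below uses only the standard library and assumes Python 3; the helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    # find_spec returns None when no such top-level module exists.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "bs4" is the import name of the BeautifulSoup 4 package.
    print("bs4 installed:", is_installed("bs4"))
```

If this prints `False`, the package was probably installed for a different interpreter than the one the IDE is using.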
4. Crawler: Crawling cnblogs (Blog Park) Flash Posts
#!/usr/bin/python
# -*- coding: utf-8 -*-
import urllib2
import time
import bs4


class CnBlogsSpider:
    '''ing.cnblogs.com crawler'''
    url = "https://ing.cnblogs.com/ajax/ing/GetIngList?inglisttype=all&pageindex=${pageno}&pagesize=30&tag=&_="

    # Fetch the HTML of the current page
    def gethtml(self):
        request = urllib2.Request(self.pageurl)
        response = urllib2.urlopen(request)
        self.html = response.read()

    # Parse the HTML and print one line per post
    def analyze(self):
        self.gethtml()
        # Name the parser explicitly so bs4 does not have to guess one
        bsoup = bs4.BeautifulSoup(self.html, "html.parser")
        divs = bsoup.find_all("div", class_='ing-item')
        for div in divs:
            img = div.find("img")['src']
            item = div.find("div", class_='feed_body')
            username = item.find("a", class_='ing-author').text
            text = item.find("span", class_='ing_body').text
            pubtime = item.find("a", class_='ing_time').text
            # True if the post carries a star icon
            star = item.find("img", class_='ing-icon') is not None
            print '(Avatar: ', img, ', nickname: ', username, ', Flash: ', text, ', Time: ', pubtime, ', Star: ', star, ')'

    # Crawl pages 1..page
    def run(self, page):
        pageno = 1
        while pageno <= page:
            # Fill in the page number and append a timestamp to the "_" parameter
            self.pageurl = self.url.replace('${pageno}', str(pageno)) + str(int(time.time()))
            print '-------------\r\n', 'Data for page', pageno, 'is as follows:', self.pageurl
            self.analyze()
            pageno = pageno + 1


CnBlogsSpider().run(3)
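The page URL in `run` is built by substituting the page number into the template and appending the current Unix time as the value of the trailing `_` parameter. A minimal sketch of just that step, assuming Python 3, might look like the following. The function name `build_page_url` and its `now` parameter are hypothetical, added so the step can be checked without making a network request.

```python
import time

# Same template as in the crawler; ${pageno} is a placeholder.
URL_TEMPLATE = (
    "https://ing.cnblogs.com/ajax/ing/GetIngList?"
    "inglisttype=all&pageindex=${pageno}&pagesize=30&tag=&_="
)


def build_page_url(page_no, now=None):
    """Fill in the page number and append a Unix timestamp in seconds."""
    if now is None:
        now = time.time()
    return URL_TEMPLATE.replace("${pageno}", str(page_no)) + str(int(now))


if __name__ == "__main__":
    # Print the URLs for pages 1 to 3, as run(3) would request them.
    for page_no in range(1, 4):
        print(build_page_url(page_no))
```

Keeping URL construction in a pure function like this makes it easy to inspect the request addresses before fetching anything.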
5. Results
First Python crawler