A Python crawler that scrapes Dianping (大众点评) listing pages and pushes the results into Redis. Going straight to the code: the problem of Chinese text getting garbled when written into Redis is not solved yet; once it is, I'll post the fix right away. I'm a beginner, so please bear with me!
# -*- coding: utf-8 -*-
import re
import requests
from time import sleep, ctime
from urllib.request import urlopen
from urllib.request import Request
from lxml import etree
import redis
import MySQLdb  # imported in the original script but not used here

# host: your own Redis server's IP address
r = redis.Redis(host='192.168.60.112', port=6379, db=0)

# add a simulated browser request header
headers = {'user-agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}

# crawl the listing pages (note: range(1, 3) actually covers pages 1 and 2)
for page in range(1, 3):
    # Dianping (大众点评) listing URL, built by string concatenation
    url = "http://www.dianping.com/search/category/2/10/g112p%i" % (page) + \
          "?aid=90308842%2c21171398%2c22974252%2c77259356%2c79709316%2c69011566%2c93070619%2c75101541%2c5724122%2c21559834" \
          "&cpt=90308842%2c21171398%2c22974252%2c77259356%2c79709316%2c69011566%2c93070619%2c75101541%2c5724122%2c21559834&tc=1"
    # print(url)
    req_timeout = 5  # request timeout in seconds (the original value was lost to formatting)
    req = Request(url=url, headers=headers)
    f = urlopen(req, None, req_timeout)
    s = f.read()
    s = s.decode('utf-8')
    ss = str(s)
    # parse the page with lxml
    selector = etree.HTML(ss)

    # extract each shop's link (@href) and name (h4 text)
    links = selector.xpath('//div[@class="txt"]/div[@class="tit"]/a/@href | '
                           '//div[@class="txt"]/div[@class="tit"]/a/h4/text()')
    for link in links:
        print(link)
        # write into Redis using the list type (stack structure: lpush pushes onto the head)
        r.lpush('mylist', link)
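To check what actually landed in Redis, and as one possible angle on the Chinese-text problem mentioned at the top, here is a minimal sketch (not from the original post) that reads the list back. It assumes the same Redis host/port/db and the 'mylist' key used above; redis-py's decode_responses=True option makes the client decode replies as UTF-8, so Chinese shop names come back as str objects instead of raw bytes.

import redis

# Assumes the same Redis instance and key as the crawler above.
# decode_responses=True makes redis-py decode replies as UTF-8 strings.
r = redis.Redis(host='192.168.60.112', port=6379, db=0, decode_responses=True)

# Inspect everything currently on the list (lpush puts the newest item at index 0).
for item in r.lrange('mylist', 0, -1):
    print(item)

# Or consume entries one by one from the tail, queue-style.
item = r.rpop('mylist')
while item is not None:
    # ... process the link or shop name here ...
    item = r.rpop('mylist')

Without decode_responses, the same data is still stored correctly, but every entry comes back as bytes and needs an explicit item.decode('utf-8') before printing.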
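One more side note on the fetch itself: the script imports requests but actually downloads pages with urllib. Purely as a sketch of a possible simplification (not how the original post does it), the same request could be made with requests, using the same User-Agent and a single page URL (the long aid/cpt query string is dropped here for brevity):

import requests
from lxml import etree

# Same User-Agent header as in the script above.
headers = {'user-agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}

# One listing page, built the same way as in the loop above (page 1 here).
url = "http://www.dianping.com/search/category/2/10/g112p%i" % 1

resp = requests.get(url, headers=headers, timeout=5)
resp.encoding = 'utf-8'          # decode the Chinese page text as UTF-8
selector = etree.HTML(resp.text)

Everything after the selector line (the XPath extraction and the lpush into Redis) would stay the same.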