Lao Yan needed to crawl product information from an online shopping site, and since I have been learning Python recently, we wrote a simple crawler program together.
Requirement: collect product information from an online shop, including the product name, market price, and current price.
Tools: Python 2.7.8, urllib2, re
# coding=utf-8
import urllib2
import re

path = "Aaa.txt"
f = open(path, 'w+')
for i in range(4980, 4991):
    print i
    # Get the webpage content
    url = "http://*" + str(i) + "*"
    page = urllib2.urlopen(url).read()
    # Regular expression matching
    matchTitle = re.search(r'<dt>(.*?)</dt>', page)
    matchMarketPrice = re.search(r'<del.*?>(.*?)</del>', page)
    matchCurrentPrice = re.search(r'<b>(.*?)</b>', page)
    # Save the result only when all three fields were found
    if matchTitle and matchMarketPrice and matchCurrentPrice:
        f.write(matchTitle.group(1) + '\t' + matchMarketPrice.group(1) + '\t' + matchCurrentPrice.group(1) + '\n')
f.close()
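The three regular expressions in the crawler can be tried offline before pointing them at the real site. The following is only a minimal sketch: the HTML fragment in SAMPLE_PAGE and the helper names extract_product and format_row are made up for illustration, since the post does not show the site's actual markup. It avoids Python 2-only syntax, so it should run on Python 2.7 as well as Python 3.

```python
import re

# Hypothetical page fragment; the real site's markup is not shown in the post.
SAMPLE_PAGE = (
    '<dl><dt>Sample cream 30ml</dt>'
    '<dd><del class="market">120.00</del> <b>109.00</b></dd></dl>'
)

# The same three patterns as in the crawler. (.*?) is a non-greedy capture
# group, so each match stops at the first closing tag it meets.
TITLE_RE = re.compile(r'<dt>(.*?)</dt>')
MARKET_PRICE_RE = re.compile(r'<del.*?>(.*?)</del>')
CURRENT_PRICE_RE = re.compile(r'<b>(.*?)</b>')


def extract_product(page):
    """Return (title, market_price, current_price), or None if a field is missing."""
    patterns = (TITLE_RE, MARKET_PRICE_RE, CURRENT_PRICE_RE)
    matches = [pattern.search(page) for pattern in patterns]
    if not all(matches):
        return None
    return tuple(match.group(1) for match in matches)


def format_row(product):
    """Join the fields with tabs, matching the line format written by the crawler."""
    return '\t'.join(product) + '\n'


if __name__ == '__main__':
    product = extract_product(SAMPLE_PAGE)
    if product is not None:
        print(format_row(product).rstrip('\n'))
```

Note that `.` does not match a newline by default, so a tag whose content spans several lines would be skipped by these patterns unless the `re.S` flag is added.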
Part of the output:
L'Oreal cream 30ml	¥120.00	109.00
L-facial face milk 125ml	¥130.00	105.00
L'Oreal anti-wrinkle firming moisturizing eye cream 15ml	¥210.00	179.00
L'Oreal be evaluated Seminyak skin toner 175ml	¥160.00	138.00