Python 2.7: Using XPath syntax to crawl the Douban Books Top 250 (2017-01-29)

Source: Internet
Author: User
Tags: xpath

Day two: I was busy with some things at home, and along the way I helped someone crawl the Douban Books Top 250.

1. Build the list of page URLs: urls = ['https://book.douban.com/top250?start={}'.format(str(i)) for i in range(0, 226, 25)]

2. Use the requests module to fetch the page source, lxml to parse the page, and XPath to extract the data.

3. Extract the information for each book.

4. The steps could be wrapped in a function; here they are left inline and no function is called.
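Step 1 can be tried on its own before any request is sent. The snippet below is a minimal sketch: the URL pattern comes from the article, while the names BASE_URL, PAGE_SIZE and TOTAL are introduced here for illustration, and the page size of 25 is an assumption inferred from the step used in range().

```python
# Step 1 as a standalone snippet: one URL per results page.
BASE_URL = 'https://book.douban.com/top250?start={}'  # pattern taken from the article
PAGE_SIZE = 25   # assumed: 25 books per page, matching the range() step
TOTAL = 250      # assumed: the list holds 250 books in total

# range(0, 250, 25) yields the same offsets as range(0, 226, 25): 0, 25, ..., 225
urls = [BASE_URL.format(start) for start in range(0, TOTAL, PAGE_SIZE)]

print(len(urls))
print(urls[0])
print(urls[-1])
```

Printing the first and last entries is a quick way to confirm the offsets before crawling.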

Python code:

#coding: utf-8
import sys
reload(sys)
sys.setdefaultencoding('utf-8')  # Python 2 only: make implicit str/unicode conversions use UTF-8

from lxml import etree
import requests

# One URL per results page: start=0, 25, ..., 225
urls = ['https://book.douban.com/top250?start={}'.format(str(i)) for i in range(0, 226, 25)]

for url in urls:
    html = requests.get(url).content
    selector = etree.HTML(html)
    # Each book is one <tr class="item"> row
    infos = selector.xpath('//tr[@class="item"]')
    for info in infos:
        book_name = info.xpath('td/div/a/@title')[0]
        book_url = info.xpath('td/div/a/@href')[0]
        # Publication line: fields separated by '/', with date and price last
        published_infos = str(info.xpath('td/p/text()')[0])
        splitlistinfos = published_infos.split('/')
        # print splitlistinfos
        published_date = str(splitlistinfos[-2])
        # print published_date
        price = str(splitlistinfos[-1])
        # print price
        rate = info.xpath('td/div/span[2]/text()')[0]
        # comment_nums = info.xpath('td/div/span[3]/text()')[0]
        # print comment_nums
        # Strip the parentheses, whitespace and the '人评价' ("people rated") suffix, then re-append the suffix
        comment_nums = info.xpath('td/div/span[3]/text()')[0].strip('(').strip().strip(')').strip().strip('人评价').strip() + '人评价'
        # The one-line introduction is missing for some books
        introduceinfo = info.xpath('td/p/span/text()')
        print book_name, book_url, published_date, price, rate, comment_nums, introduceinfo[0] if len(introduceinfo) > 0 else ''
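The string clean-up in the script (splitting the publication line, trimming the rating count) is the part that step 4 suggests wrapping in functions. The following is only a sketch of how that might look, not the author's code: the function names parse_publication_info and parse_rating_count are hypothetical, the sample strings are made up in the shape the script expects, and a regular expression is used in place of the chained strip() calls. It uses function-style print so that it also runs on Python 3.

```python
# -*- coding: utf-8 -*-
import re


def parse_publication_info(text):
    """Split an 'author / publisher / date / price' line; return (date, price)."""
    parts = [part.strip() for part in text.split('/')]
    if len(parts) < 2:
        # Not enough fields to hold both a date and a price.
        return None, None
    return parts[-2], parts[-1]


def parse_rating_count(text):
    """Return the first run of digits in the text as an int, or None if absent."""
    match = re.search(r'\d+', text)
    return int(match.group()) if match else None


# Made-up sample strings (assumptions, not data taken from the site).
sample_pub = 'Some Author / Some Publisher / 2006-5 / 29.00元'
sample_count = '(\n    12345人评价\n)'

date, price = parse_publication_info(sample_pub)
count = parse_rating_count(sample_count)
print('{} {} {}'.format(date, price, count))
```

Returning the count as an integer, rather than rebuilding the original suffix, makes it easier to sort or filter books later; the suffix can be added back when printing if the original display format is wanted.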

 
