Use Python+xpath to get download links

Source: Internet
Author: User
Tags readfile xpath

Use Python+xpath to get the download link for https://pypi.python.org/pypi/lxml/2.3/:

After using requests to get the HTML, analyze the tags found in HTML to find the link in <table class= "list" >...</table>

They were then given<trclass="Odd"> and <tr class=" even"> content, which can be written as XPath when using XPath ('//table[@class = ' list ']/ tr[@class = "even" or "odd"]/td/span/a[1]/@href ')


Import re

Import requests

Import Urllib2

From lxml import etree


Url= ' https://pypi.python.org/pypi/lxml/2.3/'

head={' user-agent ': ' mozilla/5.0 (Windows NT 6.1; WOW64) applewebkit/537.36 (khtml, like Gecko) chrome/31.0.1650.63 safari/537.36 '}


def gethtml (URL, *args):

html = requests.get (URL, *args). Content

return HTML


def writfile (cont):

Try

FD = open (' X.txt ', ' W ')

Try

Fd.write (cont)

Finally

Fd.close ()

Except IOError:

Print "File not existing!"


def ReadFile ():

Try

FD = open (' X.txt ', ' R ')

Try

All_the_text = Fd.read ()

Finally

Fd.close ()

Except IOError:

Print "File Open Error!"


Return All_the_text


html = gethtml (URL, head)

Writfile (HTML)

All_text = ReadFile ()


Dom = etree. HTML (All_text)

Url_list = Dom.xpath ('//table[@class = "List"]/tr[@class = "even" or "odd"]/td/span/a[1]/@href ')

For URL in url_list:

Print URL








After testing, the corresponding download link can be obtained normally. As a beginner, the code has a lot of inappropriate places, but also ask Daniel to correct them after reviewing.





This article is from the "Ignorance" blog, please be sure to keep this source http://huangfuligang.blog.51cto.com/9181639/1706837

Use Python+xpath to get download links

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.