Checking whether a link already exists on a website with Python

Source: Internet
Author: User
Python is an interpreted, object-oriented, dynamically typed high-level programming language. This article shows how to use Python to check whether a given link already exists on a website.

Python was created by Guido van Rossum at the end of 1989, and the first public release came out in 1991.

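Like Perl, Python is open source. Its source code is distributed under the Python Software Foundation License, which is compatible with the GPL (GNU General Public License) but is not the GPL itself.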

I had often heard that Python is easy to use, and it lived up to that reputation: a few lines of code were enough to implement the basic functionality.

To check whether a specified URL appears on a target website, the process is simple:

1. Fetch the HTML code of the target web page.

2. Search the HTML code for the specified URL.

3. If it is found, report OK; otherwise, report Error.
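The three steps can be sketched in a few lines. This is a minimal Python 3 sketch, not the article's script: the names fetch_html and check_link and the fetch parameter are hypothetical, and step 2 is done with a plain substring search as the simplest reading of "search for the URL in the HTML".

```python
from urllib.request import urlopen


def fetch_html(page_url, timeout=10):
    """Step 1: download the page and return its HTML as text."""
    with urlopen(page_url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def check_link(page_url, target_url, fetch=fetch_html):
    """Run the three steps; `fetch` is a hypothetical hook for the downloader."""
    html = fetch(page_url)         # step 1: obtain the HTML
    found = target_url in html     # step 2: search for the URL
    return "OK" if found else "Error"  # step 3: report the result
```

Passing a different fetch callable makes it possible to try the logic on a fixed HTML string without touching the network.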

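The entire program relies on two Python 2 standard-library modules: urllib2 and sgmllib. (The listing below actually imports urllib, the older sibling of urllib2, which also provides urlopen.) Neither module exists under these names in Python 3: urllib2 was split into urllib.request and urllib.error, and sgmllib was removed.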

The urllib2 library mainly defines functions and classes for accessing URLs (basically over HTTP).
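As an illustration of URL access, here is a minimal sketch using urllib.request, the Python 3 successor of urllib2. The function name fetch_status and the opener parameter are assumptions made for this example; only urlopen, HTTPError, and URLError come from the standard library.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_status(url, timeout=10, opener=urlopen):
    """Return (status_code, body_bytes); status is None if the host is unreachable."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status, response.read()
    except HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code, b""
    except URLError:
        # DNS failure, refused connection, timeout, and similar problems.
        return None, b""
```

HTTPError is caught before URLError because it is a subclass of URLError. The opener parameter only exists so that the function can be exercised without a real network connection.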

The sgmllib library is mainly responsible for parsing HTML code.

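    # Python 2 only: sgmllib and the print statement do not exist in Python 3.
    import urllib
    from sgmllib import SGMLParser


    class URLLister(SGMLParser):
        def reset(self):
            SGMLParser.reset(self)
            self.urls = []

        def start_a(self, attrs):
            # attrs is a list of (name, value) pairs for one <a> tag.
            href = [v for k, v in attrs if k == 'href']
            if href:
                # The index and the compared value are missing in the source;
                # href[0] and 1 are assumptions.
                # 'http://website url' is a placeholder for the URL to look for.
                if href[0].count('http://website url') == 1:
                    self.urls.extend(href)


    # Pages to scan. The third entry is incomplete in the source;
    # replace it with a real address before running.
    links = ['http://www.yahoo.com/', 'http://www.bing.com/', 'http://www..com']

    for eachlink in links:
        f = urllib.urlopen(eachlink)
        # The status value is missing in the source; 200 (HTTP OK) is assumed.
        if f.code == 200:
            parser = URLLister()
            parser.feed(f.read())
            f.close()
            # The threshold is missing in the source; 1 is assumed.
            if len(parser.urls) >= 1:
                print 'the link from ' + eachlink + ' is OK!'
            else:
                print 'the link from ' + eachlink + ' is ERROR!'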
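The script above targets Python 2. A rough Python 3 port might look like the following sketch, which swaps sgmllib for html.parser.HTMLParser and urllib for urllib.request. The helper names page_links_to and report, the target_url parameter, and the UTF-8 decoding are assumptions of this sketch rather than part of the original program.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class URLLister(HTMLParser):
    """Collect href values of <a> tags that contain target_url."""

    def __init__(self, target_url):
        super().__init__()
        self.target_url = target_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser calls this for every opening tag, so filter for <a>.
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for attributes written without a value.
            if name == "href" and value and self.target_url in value:
                self.urls.append(value)


def page_links_to(html, target_url):
    """Return True if at least one <a href> in html contains target_url."""
    parser = URLLister(target_url)
    parser.feed(html)
    parser.close()
    return len(parser.urls) >= 1


def report(page_url, target_url, timeout=10):
    """Download page_url and build the same message the original prints."""
    with urlopen(page_url, timeout=timeout) as response:
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        html = response.read().decode("utf-8", errors="replace")
    status = "OK" if page_links_to(html, target_url) else "ERROR"
    return "the link from " + page_url + " is " + status + "!"
```

Keeping the parsing in page_links_to, separate from the download in report, means the link check can be tried on any HTML string without a network connection.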

The following are the main functions:

1. urllib2.urlopen(url[, data][, timeout]) // opens a URL

2. SGMLParser.feed(data) // passes the HTML data to the parser

3. SGMLParser.start_tag(attributes) // handles the opening of the HTML tag named "tag". This program defines start_a, so the parser calls it for every <a> tag in the HTML code. The method reads the value of the href attribute of each <a> tag, which yields all the links on the web page. If the specified URL is among them, the result is OK.
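The start_tag naming convention can be imitated on top of Python 3's html.parser, which otherwise routes every opening tag to a single handle_starttag method. The following is only a sketch of the dispatch idea, not sgmllib's actual implementation; the class names TagDispatchParser and HrefCollector are hypothetical.

```python
from html.parser import HTMLParser


class TagDispatchParser(HTMLParser):
    """Forward each opening tag to a start_<tag> method, if one is defined."""

    def handle_starttag(self, tag, attrs):
        handler = getattr(self, "start_" + tag, None)
        if handler is not None:
            handler(attrs)


class HrefCollector(TagDispatchParser):
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def start_a(self, attrs):
        # Called only for <a> tags; keep href values that are not empty.
        self.hrefs.extend(v for k, v in attrs if k == "href" and v)


parser = HrefCollector()
parser.feed('<h1>Links</h1><a href="http://www.bing.com/">Bing</a><a name="x">no href</a>')
parser.close()
print(parser.hrefs)  # ['http://www.bing.com/']
```

The h1 tag is ignored because no start_h1 method exists, and the second a tag contributes nothing because it has no href attribute.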

This is a very small script, but writing it was exciting for me. First, I have stepped into the world of Python and used it to solve a problem in practical work. Second, its simple syntax and indentation-based formatting really impressed me. In the future, I hope to use Python more often to solve all kinds of problems at work and to put what I have learned into practice.

The above is an introduction to checking website links with Python. I hope it will be helpful to you!
