Parsing HTML with Python's HTMLParser module: an example that extracts URLs

Source: Internet
Author: User

HTMLParser is the module that Python uses to parse HTML (in Python 3 it is named html.parser). It can pick out the tags, data, and other pieces of an HTML document, and is a simple way to process HTML. HTMLParser is event-driven: when it encounters a particular construct, such as a tag, it calls a user-definable method so that the program can handle it. These callback methods all have names that begin with handle_, and all of them are members of the HTMLParser class. To use it, derive a new class from HTMLParser and override the handle_ methods you need. They include the following:

handle_startendtag handles empty-element tags that open and close in one step, such as <xx/>
handle_starttag handles start tags, such as <xx>
handle_endtag handles end tags, such as </xx>
handle_charref handles numeric character references, which begin with &# and usually give the character's code
handle_entityref handles named character references, which begin with &
handle_data handles text data, that is, the data in the middle of <xx>data</xx>
handle_comment handles comments
handle_decl handles declarations that begin with <!, such as <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
handle_pi handles processing instructions, such as <?instruction>
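
To make the callback order concrete, here is a minimal sketch that records every event the parser reports. It assumes Python 3, where the class lives in html.parser; EventLogger and log_events are illustrative names, not part of the library. In Python 3 the parser must be created with convert_charrefs=False, otherwise handle_charref and handle_entityref are never called.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical helper: stores each callback as an (event, payload) tuple."""

    def __init__(self):
        # convert_charrefs=False keeps handle_charref/handle_entityref active.
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_startendtag(self, tag, attrs):
        self.events.append(("startendtag", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_charref(self, name):
        self.events.append(("charref", name))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))

    def handle_comment(self, data):
        self.events.append(("comment", data))

    def handle_decl(self, decl):
        self.events.append(("decl", decl))

    def handle_pi(self, data):
        self.events.append(("pi", data))


def log_events(html):
    """Feed html to a fresh EventLogger and return the recorded events."""
    parser = EventLogger()
    parser.feed(html)
    parser.close()
    return parser.events


if __name__ == "__main__":
    for event in log_events("<p>a&amp;b</p><br/>"):
        print(event)
```

Each handler only appends to a list, which keeps the sketch easy to inspect; a real parser would do its work inside the handlers it cares about and leave the rest at their default no-op implementations.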

As an example, here is how to extract the URLs from a web page. To get a URL, parse the <a> tag and then read the value of its href attribute. Here's the code:

# -*- encoding: gb2312 -*-
# Python 2 module name; in Python 3 use: from html.parser import HTMLParser
import HTMLParser

class MyParser(HTMLParser.HTMLParser):
  def __init__(self):
    HTMLParser.HTMLParser.__init__(self)

  def handle_starttag(self, tag, attrs):
    # Called for every start tag; attrs is a list of (name, value) pairs
    if tag == 'a':
      # Walk the attributes of the <a> tag
      for name, value in attrs:
        if name == 'href':
          print(value)


if __name__ == '__main__':
  a = '  
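
A complete, runnable version of the same idea might look like the following sketch. It assumes Python 3's html.parser rather than the Python 2 module used above, and it collects the URLs in a list instead of printing them from inside the handler. LinkCollector, extract_links, and SAMPLE_HTML are illustrative names, and the sample markup is made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical parser that gathers the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling us.
        if tag == "a":
            for name, value in attrs:
                # value is None for an attribute written without "=value".
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return the href values of all <a> tags in html, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample input, for illustration only.
SAMPLE_HTML = (
    '<p><a href="https://example.com/">Home</a> '
    '<A HREF="/about">About</A> '
    '<a name="top">no link here</a></p>'
)

if __name__ == "__main__":
    for url in extract_links(SAMPLE_HTML):
        print(url)
```

Returning a list rather than printing keeps the parser reusable: the caller decides whether to print the URLs, resolve relative ones, or fetch them.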
 