Use HTMLParser to parse HTML

Source: Internet
Author: User

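Python's HTMLParser class (in the html.parser module of the standard library) takes a different approach from HTML parsing libraries in C++ and other languages: it works through class inheritance.

You write a subclass of HTMLParser and override a few of its methods. The parser calls those methods as it encounters tags and text in the input.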

The main methods to override are:

handle_starttag # called for each start tag

handle_endtag # called for each end tag

handle_data # called for the text between tags


The following example shows how to use it (it is taken from the Python documentation):

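from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    # Called for every start tag, e.g. <h1>
    def handle_starttag(self, tag, attrs):
        print("Encountered a start tag:", tag)

    # Called for every end tag, e.g. </h1>
    def handle_endtag(self, tag):
        print("Encountered an end tag :", tag)

    # Called for the text between tags
    def handle_data(self, data):
        print("Encountered some data  :", data)

parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1>Parse me!</h1></body></html>')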

The source HTML is:

<html><head><title>Test</title></head><body><h1>Parse me!</h1></body></html>

Output result:

Encountered a start tag: html
Encountered a start tag: head
Encountered a start tag: title
Encountered some data  : Test
Encountered an end tag : title
Encountered an end tag : head
Encountered a start tag: body
Encountered a start tag: h1
Encountered some data  : Parse me!
Encountered an end tag : h1
Encountered an end tag : body
Encountered an end tag : html

Each tag and the text inside it is now reported to the parser's handler methods, so the content of any tag can be extracted.
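
To actually pick out the content of one particular tag, the handlers usually need to remember which tag they are currently inside. The sketch below is one possible way to do that, not part of the original example: the class name TitleParser and the attributes in_title and title_parts are hypothetical names chosen for illustration. It collects the text found between <title> and </title> in the same source HTML.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside <title> (hypothetical helper for illustration)."""

    def __init__(self):
        super().__init__()
        self.in_title = False   # True while between <title> and </title>
        self.title_parts = []   # text chunks seen inside the title

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text may arrive in more than one chunk, so collect and join later.
        if self.in_title:
            self.title_parts.append(data)

    @property
    def title(self):
        return "".join(self.title_parts)


parser = TitleParser()
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1>Parse me!</h1></body></html>')
parser.close()
print(parser.title)  # prints: Test
```

The same flag-based pattern should work for any other tag, such as h1, by changing the tag name that is compared.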


Summary:

1) Inherit from the HTMLParser class:

class MyParser(HTMLParser):

2) def handle_starttag(self, tag, attrs): # override to handle start tags. tag is the tag name (converted to lowercase), and attrs holds the tag's attributes as a list of (name, value) tuples. It is not a dict, but dict(attrs) converts it to one.

This method can be used, for example, to extract the URLs of the links in a page.

3) def handle_endtag(self, tag): # override to handle end tags

4) def handle_data(self, data): # override to handle the text between tags
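
As an illustration of step 2, here is a minimal sketch of extracting link URLs in handle_starttag. The class name LinkParser, the links attribute, and the sample HTML are assumptions made up for this example. Note that attrs arrives as a list of (name, value) pairs, and value is None for an attribute written without a value.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collects href values of <a> tags (hypothetical helper for illustration)."""

    def __init__(self):
        super().__init__()
        self.links = []  # URLs found so far, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) tuples, not a dict
        for name, value in attrs:
            if name == "href" and value is not None:
                self.links.append(value)


link_parser = LinkParser()
link_parser.feed('<p><a href="https://example.com/">Example</a> and '
                 '<a class="ref" href="/docs">Docs</a></p>')
link_parser.close()
print(link_parser.links)  # prints: ['https://example.com/', '/docs']
```

The URLs are returned exactly as written in the page. Resolving relative ones such as /docs against a base URL would be a separate step, for instance with urllib.parse.urljoin.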


