Use HTMLParser to parse HTML
 
 
 Python's HTMLParser class (in the html.parser module) works differently from the HTML parsing libraries typically used in C++ and other languages: it is built around class inheritance.
 
 
 
 By subclassing HTMLParser and overriding a few of its handler methods, we can parse HTML.
 
 
 
 The main methods to override are:
 
 
 
 
 handle_starttag # called for each start tag

 handle_endtag # called for each end tag

 handle_data # called for the text between tags
 
 
 
 The following example shows how to use it (it is taken from the Python documentation):
 
  
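    from html.parser import HTMLParser

    class MyHTMLParser(HTMLParser):
        # Called for every start tag, e.g. <title>
        def handle_starttag(self, tag, attrs):
            print("Encountered a start tag:", tag)

        # Called for every end tag, e.g. </title>
        def handle_endtag(self, tag):
            print("Encountered an end tag :", tag)

        # Called for the text between tags
        def handle_data(self, data):
            print("Encountered some data  :", data)

    parser = MyHTMLParser()
    parser.feed('<html><head><title>Test</title></head>'
                '<body><h1>Parse me!</h1></body></html>')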
  
 The source html is:
 
  
    <html><head><title>Test</title></head><body><h1>Parse me!</h1></body></html>
 
 Output result: 
 
 
 
    Encountered a start tag: html
    Encountered a start tag: head
    Encountered a start tag: title
    Encountered some data  : Test
    Encountered an end tag : title
    Encountered an end tag : head
    Encountered a start tag: body
    Encountered a start tag: h1
    Encountered some data  : Parse me!
    Encountered an end tag : h1
    Encountered an end tag : body
    Encountered an end tag : html

Every tag and its text content has now been passed to the corresponding handler.
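
Printing is only a starting point. To pull out the content of one particular tag, the three handlers can share a little state. The following is a minimal sketch under that assumption; the class name TitleParser and the attributes in_title and title_parts are made up for illustration and are not part of HTMLParser.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False   # True while the parser is inside <title>
        self.title_parts = []   # text chunks seen inside the title

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Keep the text only when it belongs to the title
        if self.in_title:
            self.title_parts.append(data)


title_parser = TitleParser()
title_parser.feed('<html><head><title>Test</title></head>'
                  '<body><h1>Parse me!</h1></body></html>')
title = "".join(title_parser.title_parts)
print(title)  # prints: Test
```

The text is gathered in a list and joined afterwards because handle_data may be called more than once for a single run of text.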
 
 
Summary:
 
1) Inherit from the HTMLParser class:
 
class MyParser(HTMLParser):
 
2) def handle_starttag(self, tag, attrs): # Override start-tag handling. tag is the tag name converted to lowercase, and attrs is a list of (name, value) tuples holding the tag's attributes (a list, not a dict).
 
# This is the place to extract, for example, the URLs of links from their attributes.
 
3) def handle_endtag(self, tag): # Override end-tag handling.
 
4) def handle_data(self, data): # Override handling of the text between tags.
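
The URL extraction mentioned in step 2 could look roughly like the sketch below. It assumes the goal is to collect the href values of <a> tags; the class name LinkParser, the links attribute, and the sample HTML are invented for illustration.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collects the href attribute of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) tuples, so iterate over the pairs
        for name, value in attrs:
            if name == "href" and value is not None:
                self.links.append(value)


link_parser = LinkParser()
link_parser.feed('<p><a href="https://www.python.org/">Python</a> and '
                 '<a class="doc" href="/docs">docs</a></p>')
print(link_parser.links)  # prints: ['https://www.python.org/', '/docs']
```

If dictionary-style lookup is more convenient, dict(attrs) converts the list of pairs, at the cost of keeping only the last value when an attribute name is repeated.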