HTML (or XHTML) is probably the markup language that computer users run into most often. Marveling at how powerful search engines such as Google, Bing, and Baidu are (and, by the way, my instructor's antu search), have you ever thought about writing one yourself?
The code below is just a test. It has plenty of problems, both on the surface and internally, so treat it as a reference only.
Code to obtain image URLs from a web page:
[Python]
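# Python 3 import; in Python 2 the module is named HTMLParser
from html.parser import HTMLParser


class ImgParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.tag = ''               # text collected while inside an <img> element
        self.attrs = ''
        self.readingtitle = False   # True between <img> and </img>

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            self.readingtitle = True
            # attrs is a list of (name, value) pairs; this prints every
            # attribute value of the tag, not only the src URL
            for name, value in attrs:
                print(value)

    def handle_data(self, data):
        if self.readingtitle:
            self.tag += data

    def handle_endtag(self, tag):
        if tag == 'img':
            self.readingtitle = False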
The HTMLParser module deserves a few words here (it is an interesting module):
HTMLParser itself does not do much on its own. To parse HTML, you subclass HTMLParser and override specific handler methods, which work much like virtual functions in C++ (my personal understanding). Each handler defines how one part of an HTML element is processed:
handle_starttag(self, tag, attrs): handles the start tag, that is, the <tag attrs="..."> part of <tag attrs="...">data</tag>; attrs is passed as a list of (name, value) pairs
handle_endtag(self, tag): handles the end tag, that is, the </tag> part of <tag attrs="...">data</tag>
handle_data(self, data): handles the element's text data, that is, the data part of <tag attrs="...">data</tag>
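To see when each handler fires, here is a minimal sketch, assuming Python 3 (where the class lives in html.parser). The class name EventLogger, the function log_events, and the sample markup are hypothetical names chosen for illustration; they are not part of the original test code.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical helper: records every handler call in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def log_events(html):
    """Feed html to a fresh EventLogger and return the recorded events."""
    parser = EventLogger()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.events


events = log_events('<p class="note">hello</p>')
for event in events:
    print(event)
```

For this one-element input the parser reports the start tag, then the text, then the end tag, which matches the three handlers described above.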
Test.html:
[Html]
<!-- Basic Title parsing -->
<HTML>
<HEAD>
<TITLE>
Document Title
</TITLE>
</HEAD>
<BODY id="1" name="this is a body">
hoho </img>
Here is the test
</BODY>
</HTML>
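If the goal is only the image URLs, one possible refinement is to keep just the src attribute of each img tag instead of printing every attribute value. The sketch below assumes Python 3. The class ImgSrcCollector, the function extract_img_sources, and the sample page with its image paths are all hypothetical and only loosely modeled on the test page above.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Hypothetical variant: keeps only the src value of <img> tags."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value:
                self.sources.append(value)


def extract_img_sources(html):
    """Return the src values of all <img> tags in html, in document order."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


# Made-up sample page; the image paths are placeholders
SAMPLE_HTML = """
<HTML>
<HEAD><TITLE>Document Title</TITLE></HEAD>
<BODY id="1" name="this is a body">
<IMG SRC="images/logo.png" alt="hoho">
Here is the test
<img alt="no source here">
<img src="images/photo.jpg">
</BODY>
</HTML>
"""

sources = extract_img_sources(SAMPLE_HTML)
print(sources)
```

Collecting the values in a list rather than printing them makes the result easier to reuse, for example to download the images later.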
Of course, a good parser (if that is the right translation of the term) cannot be built in two or three tries. Once you understand the parsing mechanism, you still have to keep studying, exchanging ideas with others, and working hard, all with humility.
From the column of FishinLab