Example:
HTML file:
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
Code:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, "html.parser")  # name the parser explicitly to avoid a warning
With the soup object in place, you can start using its features.
soup.X (where X is any tag name) returns the first tag with that name in full, including its attributes and contents.
For example:
soup.title
# <title>The Dormouse's story</title>
soup.p
# <p class="title"><b>The Dormouse's story</b></p>
soup.a  # note: only the first match is returned
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
soup.find_all('a')  # find_all returns every match
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
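The difference between tag-name access and find_all can be sketched on a smaller input. The two-link snippet and the variable names below are made up for illustration and are not part of the example document.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet with two links, used only for this illustration.
html = '<p><a href="/one" id="first">One</a> and <a href="/two" id="second">Two</a></p>'
soup = BeautifulSoup(html, "html.parser")

first_link = soup.a             # shorthand for soup.find("a"): the first match only
all_links = soup.find_all("a")  # a list-like result holding every match
no_table = soup.table           # a tag name that is absent gives None, not an error

print(first_link["id"])
print([a.get_text() for a in all_links])
print(no_table)
```

Because a missing tag gives None, it is worth checking the result before chaining further attribute access onto it.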
find can also search by attribute:
soup.find(id="link3")
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
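Searching by attribute is not limited to id. A minimal sketch, assuming a made-up list of fruit items, showing the id keyword, the class_ keyword (with the trailing underscore because class is reserved in Python), and the attrs dictionary:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = (
    '<ul>'
    '<li class="item" id="a">Apple</li>'
    '<li class="item special" id="b">Banana</li>'
    '</ul>'
)
soup = BeautifulSoup(html, "html.parser")

by_id = soup.find(id="b")                      # keyword argument matches an attribute
by_class = soup.find_all(class_="item")        # matches any tag carrying this CSS class
by_attrs = soup.find("li", attrs={"id": "a"})  # tag name plus an attribute dictionary
missing = soup.find(id="does-not-exist")       # no match gives None

print(by_id.get_text())
print(len(by_class))
print(by_attrs.get_text())
print(missing)
```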
To read an attribute from each tag, combine find_all with get:
for link in soup.find_all('a'):
    print(link.get('href'))
# http://example.com/elsie
# http://example.com/lacie
# http://example.com/tillie
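When some anchors have no href, get returns None for them. A small sketch with invented markup shows two ways of handling that: filtering with href=True so only tags that have the attribute are returned, or keeping the None placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the middle anchor has no href attribute.
html = '<a href="/docs">Docs</a> <a name="top">Top</a> <a href="/blog">Blog</a>'
soup = BeautifulSoup(html, "html.parser")

# href=True keeps only the tags that actually carry an href attribute.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]

# .get() never raises; a missing attribute shows up as None.
hrefs_or_none = [a.get("href") for a in soup.find_all("a")]

print(hrefs)
print(hrefs_or_none)
```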
To extract all the text in an HTML document, use get_text():
print(soup.get_text())
# The Dormouse's story
#
# The Dormouse's story
#
# Once upon a time there were three little sisters; and their names were
# Elsie,
# Lacie and
# Tillie;
# and they lived at the bottom of a well.
#
# ...
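get_text also accepts a separator and a strip flag, which helps when the raw output has awkward whitespace. A minimal sketch on an invented fragment:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with stray spaces around the heading text.
html = "<div><h1> Title </h1><p>First <b>bold</b> line.</p></div>"
soup = BeautifulSoup(html, "html.parser")

raw_text = soup.get_text()                             # text pieces joined as-is
clean_text = soup.get_text(separator="|", strip=True)  # each piece stripped, then joined

print(repr(raw_text))
print(clean_text)
```

With strip=True every text piece is trimmed and empty pieces are dropped, so the separator marks where one text node ended and the next began.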
To parse an HTML file on disk, pass an open file object instead of a string:
with open("index.html") as fp:
    soup = BeautifulSoup(fp, "html.parser")
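To try file parsing without an existing index.html, the sketch below first writes a tiny page into a temporary directory and then parses it. The page content and the variable names are assumptions made for the demonstration.

```python
import os
import tempfile

from bs4 import BeautifulSoup

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "index.html")

    # Write a throwaway page so the example does not depend on a real file.
    with open(path, "w", encoding="utf-8") as fp:
        fp.write("<html><head><title>Demo page</title></head><body></body></html>")

    # Parse from the file object; the with block closes the file afterwards.
    with open(path, encoding="utf-8") as fp:
        soup = BeautifulSoup(fp, "html.parser")

page_title = str(soup.title.string)
print(page_title)
```

Parsing finishes inside the with block, so the soup object stays usable after the file is closed and the temporary directory is removed.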
Objects in BeautifulSoup
Tag (corresponds to a tag in the HTML)
tag.attrs (returns all attributes of the tag as a dictionary)
You can add, delete, and modify a tag's attributes directly, just as you would with a dictionary.
tag['class'] = 'verybold'
tag['id'] = 1
tag
# <blockquote class="verybold" id="1">Extremely bold</blockquote>
del tag['class']
del tag['id']
tag
# <blockquote>Extremely bold</blockquote>
tag['class']
# KeyError: 'class'
print(tag.get('class'))
# None
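The snippet above uses a tag object without showing where it comes from. A self-contained sketch, assuming a one-tag document and a made-up id value, walks through the same dictionary-style operations:

```python
from bs4 import BeautifulSoup

# Hypothetical one-tag document for illustration.
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser")
tag = soup.b

original_attrs = dict(tag.attrs)  # class is multi-valued, so its value is a list

tag["id"] = "intro"               # add an attribute
del tag["class"]                  # remove an attribute

# Subscript access raises KeyError for a missing attribute; .get() returns None.
try:
    tag["class"]
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(original_attrs)
print(tag)
print(tag.get("class"), raised_key_error)
```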
X.contents (where X is a tag) returns the tag's direct children as a list.
e.g.
head_tag = soup.head
head_tag
# <head><title>The Dormouse's story</title></head>
head_tag.contents
# [<title>The Dormouse's story</title>]
title_tag = head_tag.contents[0]
title_tag
# <title>The Dormouse's story</title>
title_tag.contents
# ["The Dormouse's story"]
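As a further sketch of .contents, the invented list below shows that it holds only direct children, and that the same children can be iterated lazily through .children:

```python
from bs4 import BeautifulSoup

# Hypothetical two-item list for illustration.
soup = BeautifulSoup("<ul><li>one</li><li>two</li></ul>", "html.parser")
ul_tag = soup.ul

child_markup = [str(child) for child in ul_tag.contents]  # direct children as a list
item_texts = [str(li.string) for li in ul_tag.children]   # same children, as an iterator
first_item_children = ul_tag.contents[0].contents         # a text node is a child too

print(child_markup)
print(item_texts)
print(first_item_children)
```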