Let's go straight to an example:
The code is as follows:
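#!/usr/bin/python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(html_doc, "html.parser")
print(soup.title)            # the title tag
print(soup.title.name)       # the tag's name
print(soup.title.string)     # the text inside the tag
print(soup.p)                # the first p tag
print(soup.a)                # the first a tag
print(soup.find_all('a'))    # every a tag
print(soup.find(id='link3')) # the tag whose id is link3
print(soup.get_text())       # all text in the document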
The result is:
The code is as follows:
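<title>The Dormouse's story</title>
title
The Dormouse's story
<p class="title"><b>The Dormouse's story</b></p>
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
The Dormouse's story
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Elsie,
Lacie and
Tillie;
and they lived at the bottom of a well.
...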
As you can see, soup is the parsed document that BeautifulSoup builds from the string. soup.title returns the title tag, and soup.p returns the first p tag in the document. To get all tags of a given kind, use the find_all function. find_all returns a list, so you can loop over it and handle each result in turn.
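Looping over the result of find_all might look like the following minimal sketch. The shortened html string and the names list are assumptions made for this illustration, not part of the original example.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened version of the sample document.
html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like object, so a plain for loop works.
names = []
for link in soup.find_all("a"):
    names.append(link.get_text())
    print(link["id"], link.get_text())
```

Collecting into a list is only one option; a list comprehension over soup.find_all("a") would do the same job.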
get_text() returns the text content, and it is available on every object BeautifulSoup produces, including individual tags. For example, try print(soup.p.get_text()).
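As a small sketch of the difference between calling get_text() on one tag and on the whole document (the two-paragraph html string is a hypothetical, shortened input):

```python
from bs4 import BeautifulSoup

# Hypothetical two-paragraph snippet, shortened from the sample document.
html = '<p class="title"><b>The Dormouse\'s story</b></p><p class="story">...</p>'
soup = BeautifulSoup(html, "html.parser")

first_p_text = soup.p.get_text()  # text of the first p tag only
whole_text = soup.get_text()      # text of the entire document

print(first_p_text)
print(whole_text)
```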
You can also read a tag's other attributes. For example, to get the value of the href attribute of the a tag, use print(soup.a['href']). Other attributes work the same way, so class is available as soup.a['class'].
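Attribute access can be sketched as follows. The single-tag input is an assumption for brevity; .get() and .attrs are standard Tag members that the text above does not mention, shown here as optional alternatives.

```python
from bs4 import BeautifulSoup

# Hypothetical one-tag input, taken from the sample document's first link.
soup = BeautifulSoup(
    '<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>',
    "html.parser",
)
link = soup.a

href = link["href"]          # indexing a missing attribute raises KeyError
classes = link["class"]      # class can hold several values, so this is a list
missing = link.get("title")  # .get() returns None instead of raising
all_attrs = link.attrs       # dict of every attribute on the tag

print(href, classes, missing)
```

Using .get() is the safer choice when an attribute may be absent on some of the tags you loop over.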
Special tags such as head can be reached the same way, through soup.head, as described above.
How do you get a tag's contents as a list? Use the contents attribute: print(soup.head.contents) returns all direct children of head as a list. Individual children can be accessed with [num], and a child's tag name with .name.
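A minimal sketch of contents with indexing and .name, assuming a hypothetical head that holds a title and a meta tag:

```python
from bs4 import BeautifulSoup

# Hypothetical head section with two direct children.
soup = BeautifulSoup(
    '<html><head><title>The Dormouse\'s story</title><meta charset="utf-8"></head></html>',
    "html.parser",
)

children = soup.head.contents     # a real list of the direct children
first_name = children[0].name     # index with [num], then read .name
second_name = children[1].name

print(len(children), first_name, second_name)
```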
You can also get a tag's children with children, but print(soup.head.children) does not print a list, because children returns an iterator. Calling list() on it converts it to a list, and you can of course loop over the children with a for statement.
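The difference between children and a list can be sketched like this (the head markup is the same hypothetical input as an illustration, not taken from the original):

```python
from bs4 import BeautifulSoup

# Hypothetical head section with two direct children.
soup = BeautifulSoup(
    '<head><title>The Dormouse\'s story</title><meta charset="utf-8"></head>',
    "html.parser",
)

kids = soup.head.children              # an iterator, not a list
is_list = isinstance(kids, list)

as_list = list(soup.head.children)     # list() materializes the iterator

child_names = []
for child in soup.head.children:       # or simply loop over it
    child_names.append(child.name)

print(is_list, child_names)
```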
About the string attribute: if a tag has more than one child, string returns None; otherwise it returns the string itself, so print(soup.title.string) returns The Dormouse's story. When a tag contains more than one string, use strings instead.
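The behavior of string and strings can be illustrated with a short sketch. The two paragraphs below are a hypothetical, trimmed-down input chosen so that one has a single string and the other has several.

```python
from bs4 import BeautifulSoup

# Hypothetical input: one paragraph with a single string, one with several.
soup = BeautifulSoup(
    '<p class="title"><b>The Dormouse\'s story</b></p>'
    '<p class="story">Once upon a time <a>Elsie</a> and <a>Lacie</a></p>',
    "html.parser",
)
title_p, story_p = soup.find_all("p")

single = title_p.string          # only one string inside, so it is returned
multiple = story_p.string        # several children, so this is None
parts = list(story_p.strings)    # strings yields every piece of text in order

print(single, multiple, parts)
```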
To move up the tree, use the parent attribute; to get all ancestors, use parents.
To find the next sibling, use next_sibling; to find the previous sibling, use previous_sibling. To get all of them, add an s to the name: next_siblings and previous_siblings.
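Moving up and sideways in the tree might look like this sketch. The markup is a hypothetical input written without whitespace between the tags, so that every sibling is a tag rather than a text node.

```python
from bs4 import BeautifulSoup

# Hypothetical input; no whitespace between tags, so siblings are all tags.
soup = BeautifulSoup(
    '<html><body><p><a id="link1">Elsie</a><a id="link2">Lacie</a>'
    '<a id="link3">Tillie</a></p></body></html>',
    "html.parser",
)
first = soup.find(id="link1")

parent_name = first.parent.name                        # the enclosing tag
ancestor_names = [tag.name for tag in first.parents]   # every ancestor, innermost first
next_id = first.next_sibling["id"]                     # the tag right after
later_ids = [tag["id"] for tag in first.next_siblings] # all following siblings
previous = first.previous_sibling                      # nothing before the first link

print(parent_name, ancestor_names, next_id, later_ids, previous)
```

In real documents the siblings of a tag are often whitespace strings, so code that indexes a sibling with ["id"] usually needs a check that the sibling is a tag first.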
How do you search the tree?
Use the find_all function.
The code is as follows:
find_all(name, attrs, recursive, text, limit, **kwargs)
To illustrate:
The code is as follows:
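print(soup.find_all('title'))       # by tag name
print(soup.find_all('p', 'title'))  # p tags whose class is title
print(soup.find_all('a'))           # every a tag
print(soup.find_all(id="link2"))    # by attribute value
print(soup.find_all(id=True))       # every tag that has an id attribute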
The return value is:
The code is as follows:
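[<title>The Dormouse's story</title>]
[<p class="title"><b>The Dormouse's story</b></p>]
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]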
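The recursive parameter in the signature is not exercised by the calls above. The following sketch, on a hypothetical small document, shows how it changes a search, alongside the positional class argument and id=True.

```python
from bs4 import BeautifulSoup

# Hypothetical small document for illustrating the find_all parameters.
soup = BeautifulSoup(
    '<html><head><title>T</title></head>'
    '<body><p class="title" id="intro">x</p><p>y</p></body></html>',
    "html.parser",
)

nested = soup.html.find_all("title")                   # searches all descendants
direct = soup.html.find_all("title", recursive=False)  # direct children only
titled = soup.find_all("p", "title")                   # second argument matches the class
with_id = soup.find_all(id=True)                       # any tag that has an id

print(nested, direct, titled, with_id)
```

Here title is a grandchild of html, so the recursive=False search comes back empty.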
Searching by CSS class, straight to the example:
The code is as follows:
print(soup.find_all("a", class_="sister"))  # class_ avoids the Python keyword class
print(soup.select("p.title"))               # CSS selector syntax
Searching by attribute
The code is as follows:
print(soup.find_all("a", attrs={"class": "sister"}))
Searching by text
The code is as follows:
print(soup.find_all(text="Elsie"))
print(soup.find_all(text=["Tillie", "Elsie", "Lacie"]))
Limit the number of results
The code is as follows:
print(soup.find_all("a", limit=2))
The result is:
The code is as follows:
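[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
[<p class="title"><b>The Dormouse's story</b></p>]
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
['Elsie']
['Elsie', 'Lacie', 'Tillie']
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]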
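To compare the search styles side by side, here is a minimal sketch on a hypothetical, shortened document. The ids helper is introduced only for this illustration, and string= is used as the newer name for the text argument (available since Beautiful Soup 4.4.0).

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened version of the sample document.
html = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
<a class="sister" id="link1">Elsie</a>,
<a class="sister" id="link2">Lacie</a> and
<a class="sister" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")


def ids(tags):
    """Hypothetical helper: reduce a list of tags to their id values."""
    return [tag["id"] for tag in tags]


by_keyword = ids(soup.find_all("a", class_="sister"))
by_attrs = ids(soup.find_all("a", attrs={"class": "sister"}))
by_selector = ids(soup.select("p.story > a.sister"))

# Text search matches whole strings and returns them in document order.
exact_text = soup.find_all(string=["Tillie", "Elsie"])

first_two = ids(soup.find_all("a", limit=2))

print(by_keyword, by_attrs, by_selector, exact_text, first_two)
```

The three class searches return the same tags; select is the more convenient form once the condition involves nesting, as in the child selector used here.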
In short, these functions allow you to find what you want.