How to use Python BeautifulSoup

Last Update: 2016-06-06 Source: Internet

Author: User

Let's go straight to an example:

The code is as follows:

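#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

# Parse the string; naming the parser ("html.parser" ships with Python) avoids a warning
soup = BeautifulSoup(html_doc, "html.parser")
print(soup.title)             # the first <title> tag
print(soup.title.name)        # the tag's name
print(soup.title.string)      # the text inside <title>
print(soup.p)                 # the first <p> tag
print(soup.a)                 # the first <a> tag
print(soup.find_all('a'))     # every <a> tag, as a list
print(soup.find(id='link3'))  # the first tag whose id is "link3"
print(soup.get_text())        # all of the document's text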

The result is:

The code is as follows:

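<title>The Dormouse's story</title>
title
The Dormouse's story
<p class="title"><b>The Dormouse's story</b></p>
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>

The Dormouse's story

The Dormouse's story

Once upon a time there were three little sisters; and their names were
Elsie,
Lacie and
Tillie;
and they lived at the bottom of a well.

...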

As you can see, soup is the BeautifulSoup object built from the HTML string. soup.title gets the title tag, and soup.p gets the first p tag in the document. To get all matching tags, use the find_all() function. It returns a list, which you can loop over to process each match in turn.
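A minimal sketch of that loop follows. It uses a trimmed copy of the sample document, and both the trimmed HTML and the variable names are illustrative assumptions.

```python
from bs4 import BeautifulSoup

# Illustrative, trimmed HTML: only the three links from the sample document
html_doc = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find_all() returns a list (a ResultSet), so a plain for loop visits each match in turn
names = []
for link in soup.find_all("a"):
    names.append(link.get_text())
    print(link.get_text())
```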
get_text() returns the text content. It is available on every object BeautifulSoup produces, so you can also try print(soup.p.get_text()).
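The sketch below calls get_text() on the whole soup and on a single tag. It also shows the optional separator and strip arguments. The small HTML snippet is illustrative.

```python
from bs4 import BeautifulSoup

# Illustrative two-paragraph snippet
html_doc = """<p class="title"><b>The Dormouse's story</b></p><p class="story">Once upon a time</p>"""
soup = BeautifulSoup(html_doc, "html.parser")

first_p_text = soup.p.get_text()  # text of the first <p> only
all_text = soup.get_text()        # text of the whole document, strings joined as-is
# A separator plus strip=True trims each string and joins them more readably
joined = soup.get_text(" | ", strip=True)
print(first_p_text)
print(joined)
```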
You can also read a tag's other attributes. For example, to get the value of the href attribute of the a tag, use print(soup.a['href']). Other attributes work the same way. class, for example, is available as soup.a['class']; because class can hold several values, it comes back as a list.
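Here is a short sketch of attribute access on one link. The snippet is illustrative, and .get() and .attrs are standard Tag features not mentioned in the text above.

```python
from bs4 import BeautifulSoup

# Illustrative: a single link from the sample document
html_doc = '<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>'
soup = BeautifulSoup(html_doc, "html.parser")

link = soup.a
href = link["href"]          # single-valued attribute -> str
classes = link["class"]      # class is multi-valued -> list of str
missing = link.get("title")  # .get() returns None instead of raising KeyError
all_attrs = link.attrs       # every attribute as a dict
print(href, classes, all_attrs)
```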
Special tags such as head can be reached by name in the same way, with soup.head, as shown earlier.
How do I get the contents of a tag? Use the contents attribute: print(soup.head.contents) gets all the direct children of head and returns them as a list. Index the list with [num] to get a single child, and use .name to get that child's tag name.
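A minimal sketch of contents, indexing, and .name follows. It uses an illustrative one-line document.

```python
from bs4 import BeautifulSoup

# Illustrative document whose <head> has a single child
html_doc = "<html><head><title>The Dormouse's story</title></head><body></body></html>"
soup = BeautifulSoup(html_doc, "html.parser")

children = soup.head.contents  # a real list of the direct children of <head>
first = children[0]            # index it like any list
print(children)
print(first.name)              # the child's tag name
```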
You can also get a tag's children with children. However, print(soup.head.children) does not print a list, because children returns an iterator rather than a list. Use list() to convert it to a list, or traverse the children with a for statement.
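The sketch below shows both options: converting with list(), and looping while skipping text nodes. The HTML is illustrative. The isinstance check against bs4.element.Tag is one way, chosen here, to tell tags apart from the strings between them.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Illustrative paragraph: two links separated by a text node
html_doc = '<p class="story"><a id="link1">Elsie</a>, <a id="link2">Lacie</a></p>'
soup = BeautifulSoup(html_doc, "html.parser")

kids = soup.p.children  # an iterator, not a list
as_list = list(kids)    # materialize it with list()

# Or loop over children directly, keeping only the tags (skipping the ", " string)
tag_names = [child.name for child in soup.p.children if isinstance(child, Tag)]
print(as_list)
print(tag_names)
```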
About the string attribute: if a tag has more than one child, string returns None; otherwise it returns the specific string. For example, print(soup.title.string) returns The Dormouse's story. For a tag with more than one child, try strings instead.
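A short sketch of string versus strings follows, using an illustrative snippet. stripped_strings is a related attribute not mentioned above; it trims surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Illustrative: <title> has one string; <p> has three children
html_doc = "<title>The Dormouse's story</title><p>Once upon a time <a>Elsie</a> lived</p>"
soup = BeautifulSoup(html_doc, "html.parser")

title_string = soup.title.string             # exactly one child string -> that string
p_string = soup.p.string                     # several children -> None
p_strings = list(soup.p.strings)             # every string inside <p>, in document order
p_stripped = list(soup.p.stripped_strings)   # the same, with whitespace trimmed
print(title_string, p_string, p_strings, p_stripped)
```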
To look up the tree, use the parent attribute; to get all of a tag's ancestors, use parents.
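Here is a minimal sketch of parent and parents on an illustrative nested document. The last ancestor is the BeautifulSoup object itself, whose name is "[document]".

```python
from bs4 import BeautifulSoup

# Illustrative nesting: html > body > p > a
html_doc = "<html><body><p><a id='link1'>Elsie</a></p></body></html>"
soup = BeautifulSoup(html_doc, "html.parser")

link = soup.a
parent_name = link.parent.name                   # the directly enclosing tag
ancestor_names = [p.name for p in link.parents]  # walk all the way up to the soup
print(parent_name, ancestor_names)
```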
To find the next sibling, use next_sibling; to find the previous sibling, use previous_sibling. To get all of them, add an s to the name: next_siblings and previous_siblings.
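A sketch of the sibling attributes follows. The illustrative HTML deliberately has no whitespace between the links. In real documents, a newline between tags is itself a string sibling, so next_sibling can return that string instead of the next tag.

```python
from bs4 import BeautifulSoup

# Illustrative: three adjacent links with no whitespace between them
html_doc = '<p><a id="link1">Elsie</a><a id="link2">Lacie</a><a id="link3">Tillie</a></p>'
soup = BeautifulSoup(html_doc, "html.parser")

second = soup.find(id="link2")
nxt = second.next_sibling["id"]          # the tag right after link2
prev = second.previous_sibling["id"]     # the tag right before link2
# The plural forms yield every sibling in that direction, nearest first
following = [t["id"] for t in soup.find(id="link1").next_siblings]
preceding = [t["id"] for t in soup.find(id="link3").previous_siblings]
print(nxt, prev, following, preceding)
```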

How do I search the tree?

Use the find_all() function:

The code is as follows:

find_all(name, attrs, recursive, string, limit, **kwargs)   # string was called text before Beautiful Soup 4.4.0
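The examples that follow do not exercise the recursive argument, so here is a minimal sketch. By default find_all() searches all descendants; recursive=False limits it to direct children. The HTML is illustrative.

```python
from bs4 import BeautifulSoup

# Illustrative: <title> is a grandchild of <html>, not a direct child
html_doc = "<html><head><title>The Dormouse's story</title></head><body></body></html>"
soup = BeautifulSoup(html_doc, "html.parser")

deep = soup.html.find_all("title")                      # searches every descendant
shallow = soup.html.find_all("title", recursive=False)  # direct children only
print(deep)
print(shallow)
```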

To illustrate:

The code is as follows:

print(soup.find_all('title'))
print(soup.find_all('p', 'title'))
print(soup.find_all('a'))
print(soup.find_all(id="link2"))
print(soup.find_all(id=True))

The return value is:

The code is as follows:

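[<title>The Dormouse's story</title>]
[<p class="title"><b>The Dormouse's story</b></p>]
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]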

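Searching by CSS class and with CSS selectors, again straight to an example. class is a reserved word in Python, so the keyword argument is spelled class_: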

The code is as follows:

print(soup.find_all("a", class_="sister"))
print(soup.select("p.title"))
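select() accepts other CSS selectors too. The sketch below assumes Beautiful Soup 4.7 or later, which delegates select() to the soupsieve package (installed alongside it). It also uses select_one(), which returns only the first match. The HTML is illustrative.

```python
from bs4 import BeautifulSoup

# Illustrative subset of the sample document
html_doc = """<p class="title"><b>The Dormouse's story</b></p>
<p class="story"><a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a></p>"""
soup = BeautifulSoup(html_doc, "html.parser")

by_id = soup.select("#link2")            # id selector
nested = soup.select("p.story > a")      # <a> tags that are direct children of p.story
first_only = soup.select_one("a.sister") # first match only, or None
print(by_id, nested, first_only)
```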

Searching by attribute:

The code is as follows:

print(soup.find_all("a", attrs={"class": "sister"}))
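attrs is most useful when an attribute name cannot be written as a Python keyword argument, such as the hyphenated data-foo below. It also lets you combine several conditions. The HTML and attribute values are illustrative.

```python
from bs4 import BeautifulSoup

# Illustrative: a data-* attribute plus one link from the sample document
html_doc = '<div data-foo="value">foo!</div><a class="sister" id="link1">Elsie</a>'
soup = BeautifulSoup(html_doc, "html.parser")

# "data-foo" contains a hyphen, so it must go through attrs= rather than a keyword
data_divs = soup.find_all(attrs={"data-foo": "value"})
# Several conditions in attrs must all match
sisters = soup.find_all("a", attrs={"class": "sister", "id": "link1"})
print(data_divs, sisters)
```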

Searching by text:

The code is as follows:

print(soup.find_all(string="Elsie"))   # text= in releases before 4.4.0
print(soup.find_all(string=["Tillie", "Elsie", "Lacie"]))
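Besides exact strings and lists, the string argument also accepts a compiled regular expression or a function. Combined with a tag name, it returns the matching tags instead of the strings. A minimal sketch with an illustrative snippet follows; the particular pattern and predicate are arbitrary choices.

```python
import re
from bs4 import BeautifulSoup

# Illustrative: the story paragraph with simplified links
html_doc = """<p>Once upon a time there were three little sisters; and their names were
<a id="link1">Elsie</a>, <a id="link2">Lacie</a> and <a id="link3">Tillie</a>;</p>"""
soup = BeautifulSoup(html_doc, "html.parser")

# A regular expression: strings ending in "ie"
ending_in_ie = soup.find_all(string=re.compile(r"ie$"))
# A function: called with each string, keeps those for which it returns True
starts_with_t = soup.find_all(string=lambda s: s.startswith("T"))
# With a tag name, the result is the tag whose text matches
lacie_tag = soup.find("a", string="Lacie")
print(ending_in_ie, starts_with_t, lacie_tag)
```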

Limit the number of results

The code is as follows:

print(soup.find_all("a", limit=2))

The result is:

The code is as follows:

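[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
[<p class="title"><b>The Dormouse's story</b></p>]
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
['Elsie']
['Elsie', 'Lacie', 'Tillie']
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]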
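The limit example relates closely to find(). find() returns the first match itself rather than a one-element list. The sketch below also shows what each call returns when nothing matches. The HTML is illustrative.

```python
from bs4 import BeautifulSoup

# Illustrative: three links with no surrounding markup
html_doc = '<a id="link1">Elsie</a><a id="link2">Lacie</a><a id="link3">Tillie</a>'
soup = BeautifulSoup(html_doc, "html.parser")

first_two = soup.find_all("a", limit=2)  # stops searching after two matches
first_one = soup.find("a")               # the first match itself, not a list
nothing = soup.find("table")             # find() gives None when nothing matches
empty = soup.find_all("table")           # find_all() gives an empty list instead
print(first_two, first_one, nothing, empty)
```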

In short, these functions allow you to find what you want.



This article is an English version of an article originally written in Chinese on aliyun.com.
