How to use Python BeautifulSoup

Source: Internet
Author: User
Let's go straight to an example:

The code is as follows:


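#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

# Parse with Python's built-in parser (naming it explicitly avoids a warning)
soup = BeautifulSoup(html_doc, "html.parser")
print(soup.title)             # the <title> tag
print(soup.title.name)        # the tag's name
print(soup.title.string)      # the text inside <title>
print(soup.p)                 # the first <p> tag
print(soup.a)                 # the first <a> tag
print(soup.find_all('a'))     # every <a> tag, as a list
print(soup.find(id='link3'))  # the first tag whose id is "link3"
print(soup.get_text())        # all of the text in the document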

The result is:



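<title>The Dormouse's story</title>
title
The Dormouse's story
<p class="title"><b>The Dormouse's story</b></p>
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>

The Dormouse's story

The Dormouse's story

Once upon a time there were three little sisters; and their names were
Elsie,
Lacie and
Tillie;
and they lived at the bottom of a well.

...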

As you can see, soup is the BeautifulSoup object built from the HTML string. soup.title returns the <title> tag, and soup.p returns the first <p> tag in the document. To get all tags of a given kind, use the find_all function. It returns a list, which you can loop over to process each tag in turn.
get_text() returns all of the text inside an object. It is available on the soup itself and on every tag in it. Try print(soup.p.get_text()).
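Here is a minimal sketch of looping over find_all() results and calling get_text() on each tag. It assumes the bs4 package is installed and uses a small hypothetical HTML snippet, not html_doc:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration
html = "<ul><li>first</li><li>second <b>item</b></li></ul>"
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list-like result, so it can be looped over
texts = []
for li in soup.find_all("li"):
    # get_text() joins every string nested inside the tag
    texts.append(li.get_text())
print(texts)  # ['first', 'second item']

# On the whole soup, get_text() concatenates all text with no separator...
print(soup.get_text())  # firstsecond item
# ...so a separator and strip=True are often useful
print(soup.get_text(" ", strip=True))  # first second item
```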
You can also read a tag's attributes by indexing it like a dictionary. For example, to get the value of the href attribute of the first <a> tag, use print(soup.a['href']). Other attributes work the same way, for example soup.a['class']. Because class can hold several values, BeautifulSoup returns it as a list (['sister']).
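A short sketch of attribute access on a single hypothetical <a> tag (bs4 assumed installed). It also shows .get() and .attrs, which are part of the bs4 Tag API:

```python
from bs4 import BeautifulSoup

html = '<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>'
soup = BeautifulSoup(html, "html.parser")
a = soup.a

print(a["href"])       # http://example.com/elsie
print(a["class"])      # ['sister'] (class is multi-valued, so it is a list)
print(a.get("title"))  # None: .get() returns None instead of raising KeyError
print(a.attrs)         # every attribute of the tag, as a dictionary
```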
Other tags can be reached by name in the same way. For example, soup.head returns the <head> tag, just as soup.title returned <title> earlier.
How do you get the contents of a tag? The .contents attribute returns a tag's direct children as a list, so print(soup.head.contents) shows everything directly under <head>. Index the list with [n] to pick one child, and use .name to get a child's tag name.
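A minimal sketch of .contents, indexing and .name. It uses a hypothetical <head> with two children and assumes bs4 is installed:

```python
from bs4 import BeautifulSoup

html = "<html><head><title>Title</title><meta charset='utf-8'></head></html>"
soup = BeautifulSoup(html, "html.parser")

children = soup.head.contents  # direct children of <head>, as a list
print(len(children))           # 2
print(children[0])             # <title>Title</title>
print([child.name for child in children])  # ['title', 'meta']
```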
You can also get a tag's children with .children, but print(soup.head.children) does not print a list. .children is an iterator, so printing it only shows an iterator object. Wrap it in list() to turn it into a list, or traverse the children with a for loop.
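The difference between .children and .contents, sketched on a hypothetical list (bs4 assumed installed):

```python
from bs4 import BeautifulSoup

html = "<ul><li>a</li><li>b</li><li>c</li></ul>"
soup = BeautifulSoup(html, "html.parser")

it = soup.ul.children
print(it)          # an iterator object, not a list
items = list(it)   # convert to a list (this consumes the iterator)
print(len(items))  # 3

# Or simply loop: each iteration yields one direct child
for li in soup.ul.children:
    print(li.string)
```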
The .string attribute returns a tag's text when the tag has exactly one child. print(soup.title.string) prints The Dormouse's story. If the tag has more than one child, .string returns None. In that case use .strings, which yields every string inside the tag.
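A sketch contrasting .string, .strings and .stripped_strings on hypothetical markup (bs4 assumed installed):

```python
from bs4 import BeautifulSoup

html = "<p>Hello <b>world</b></p><p><i>only</i></p>"
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("p")

print(first.string)   # None: this <p> has two children ("Hello " and <b>)
print(second.string)  # only: a single child, so its .string is used
print(list(first.strings))           # ['Hello ', 'world']
print(list(first.stripped_strings))  # ['Hello', 'world'] (whitespace trimmed)
```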
To move up the tree, use the .parent attribute. To get all ancestors, iterate over .parents.
To get the next sibling, use .next_sibling; for the previous sibling, use .previous_sibling. To get all of them, add an s: .next_siblings and .previous_siblings.
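A minimal sketch of moving up and sideways in the tree, assuming bs4 is installed. The snippet is hypothetical and deliberately has no whitespace between tags, because whitespace text would also count as a sibling:

```python
from bs4 import BeautifulSoup

html = "<div><p>one</p><p id='mid'>two</p><p>three</p></div>"
soup = BeautifulSoup(html, "html.parser")
mid = soup.find(id="mid")

print(mid.parent.name)                # div
print([p.name for p in mid.parents])  # ['div', '[document]'] (the soup itself)
print(mid.next_sibling)               # <p>three</p>
print(mid.previous_sibling)           # <p>one</p>
print([s.get_text() for s in mid.next_siblings])      # ['three']
print([s.get_text() for s in mid.previous_siblings])  # ['one']
```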

How do I search the tree?

Use the find_all function. Its signature is:

The code is as follows:


find_all(name, attrs, recursive, string, limit, **kwargs)

To illustrate:

The code is as follows:


print(soup.find_all('title'))
print(soup.find_all('p', 'title'))  # <p> tags whose class is "title"
print(soup.find_all('a'))
print(soup.find_all(id="link2"))
print(soup.find_all(id=True))       # every tag that has an id attribute

The output is:



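[<title>The Dormouse's story</title>]
[<p class="title"><b>The Dormouse's story</b></p>]
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]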
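The recursive parameter in the signature is not shown above. This sketch uses a hypothetical nested snippet (bs4 assumed installed) to show it, along with passing a list of tag names as name:

```python
from bs4 import BeautifulSoup

html = "<div><p>top</p><section><p>nested</p></section></div>"
soup = BeautifulSoup(html, "html.parser")

# recursive=True (the default) searches all descendants
print(len(soup.div.find_all("p")))                   # 2
# recursive=False only looks at direct children
print(len(soup.div.find_all("p", recursive=False)))  # 1
# name may also be a list of tag names; results come back in document order
print([t.name for t in soup.find_all(["p", "section"])])  # ['p', 'section', 'p']
```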

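To search by CSS class, pass class_ (with a trailing underscore, because class is a Python keyword). To search with a CSS selector, use select():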

The code is as follows:


print(soup.find_all("a", class_="sister"))
print(soup.select("p.title"))
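A sketch of class_ matching and of select() versus select_one() on a hypothetical snippet. It assumes a recent bs4, where CSS selectors are handled by the soupsieve package installed alongside it:

```python
from bs4 import BeautifulSoup

html = ('<p class="story"><a class="sister big" id="link1">Elsie</a>'
        '<a class="sister" id="link2">Lacie</a></p>')
soup = BeautifulSoup(html, "html.parser")

# class_ matches if the value is any one of the tag's classes
print(len(soup.find_all("a", class_="sister")))  # 2
# select() takes a CSS selector and always returns a list
print([a["id"] for a in soup.select("p.story > a.big")])  # ['link1']
# select_one() returns only the first match (or None)
print(soup.select_one("#link2").string)  # Lacie
```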

To search by attribute, pass a dictionary as attrs:

The code is as follows:


print(soup.find_all("a", attrs={"class": "sister"}))
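attrs is especially useful for attribute names that cannot be written as Python keyword arguments. Examples are hyphenated names such as data-role, or name, which find_all already uses for the tag name. Here is a sketch on hypothetical markup (bs4 assumed installed):

```python
from bs4 import BeautifulSoup

html = '<div data-role="nav">menu</div><div>body</div><input name="email">'
soup = BeautifulSoup(html, "html.parser")

# data-role is not a valid keyword argument name, so pass it via attrs
found = soup.find_all("div", attrs={"data-role": "nav"})
print([d.get_text() for d in found])  # ['menu']

# name= would filter on the tag name, so the name attribute also goes in attrs
inputs = soup.find_all(attrs={"name": "email"})
print(inputs[0]["name"])  # email
```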

To search by text, use the string argument (older code uses text=, its former name):

The code is as follows:


print(soup.find_all(string="Elsie"))
print(soup.find_all(string=["Tillie", "Elsie", "Lacie"]))
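A text search returns the matching strings, not the tags. This sketch on hypothetical markup (bs4 assumed installed) shows how to get back to the tag with .parent, or how to combine string= with a tag name:

```python
from bs4 import BeautifulSoup

html = '<p><a id="link1">Elsie</a> and <a id="link2">Lacie</a></p>'
soup = BeautifulSoup(html, "html.parser")

hits = soup.find_all(string="Elsie")
print(hits)            # ['Elsie'] (strings, not tags)
print(hits[0].parent)  # <a id="link1">Elsie</a>

# With a tag name, string= returns the tags whose .string matches
print(soup.find_all("a", string="Lacie"))  # [<a id="link2">Lacie</a>]
```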

To limit the number of results, pass limit:

The code is as follows:


print(soup.find_all("a", limit=2))
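A sketch of limit next to find(), which behaves like find_all(..., limit=1) but returns the tag itself (or None) instead of a list. The markup is hypothetical and bs4 is assumed installed:

```python
from bs4 import BeautifulSoup

html = "<a>1</a><a>2</a><a>3</a>"
soup = BeautifulSoup(html, "html.parser")

print(len(soup.find_all("a", limit=2)))  # 2: the search stops after two matches
print(soup.find_all("a", limit=1))       # [<a>1</a>] (a list)
print(soup.find("a"))                    # <a>1</a> (the tag itself)
print(soup.find("table"))                # None: no match
```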

The result is:



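[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
[<p class="title"><b>The Dormouse's story</b></p>]
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
['Elsie']
['Elsie', 'Lacie', 'Tillie']
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]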

In short, these functions allow you to find what you want.
