Detailed usage of BeautifulSoup in Python

Source: Internet
Author: User

Let's go straight to an example.

The code is as follows:

#!/usr/bin/python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://www.jb51.net" class="sister" id="link1">Elsie</a>,
<a href="http://www.jb51.net" class="sister" id="link2">Lacie</a> and
<a href="http://www.jb51.net" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Name the parser explicitly so that bs4 does not have to guess one
soup = BeautifulSoup(html_doc, "html.parser")
print(soup.title)             # the <title> tag
print(soup.title.name)        # the tag's name
print(soup.title.string)      # the text inside the tag
print(soup.p)                 # the first <p> tag
print(soup.a)                 # the first <a> tag
print(soup.find_all('a'))     # every <a> tag
print(soup.find(id='link3'))  # the tag whose id is link3
print(soup.get_text())        # all text in the document

The results are:

<title>The Dormouse's story</title>
title
The Dormouse's story
<p class="title"><b>The Dormouse's story</b></p>
<a class="sister" href="http://www.jb51.net" id="link1">Elsie</a>
[<a class="sister" href="http://www.jb51.net" id="link1">Elsie</a>, <a class="sister" href="http://www.jb51.net" id="link2">Lacie</a>, <a class="sister" href="http://www.jb51.net" id="link3">Tillie</a>]
<a class="sister" href="http://www.jb51.net" id="link3">Tillie</a>
The Dormouse's story
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Elsie,
Lacie and
Tillie;
and they lived at the bottom of a well.
...

As you can see, soup is the object BeautifulSoup produces after parsing the string. soup.title returns the title tag, and soup.p returns the first p tag in the document. To get every matching tag, use the find_all function. find_all returns a list-like sequence that you can loop over to process each match in turn. get_text() returns the text content, and it works on every tag object BeautifulSoup produces. For example, try print(soup.p.get_text()).
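To make the looping concrete, here is a minimal sketch. It assumes a shortened copy of the sample document and the built-in "html.parser" parser; the variable names link_texts and story_text are only illustrative.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample document, repeated so the block runs on its own
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://www.jb51.net" class="sister" id="link1">Elsie</a>,
<a href="http://www.jb51.net" class="sister" id="link2">Lacie</a> and
<a href="http://www.jb51.net" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find_all returns a list-like result, so a plain for loop visits each match
link_texts = []
for link in soup.find_all("a"):
    link_texts.append(link.get_text())
print(link_texts)

# get_text() works on any tag, not only on the whole soup
story_text = soup.p.get_text()
print(story_text)
```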
You can also read a tag's attributes. For example, to get the value of the href attribute of an a tag, use print(soup.a['href']). Other attributes work the same way; class, for instance, is available as soup.a['class'].
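A short sketch of attribute access follows. The one-line snippet and the variable names are assumptions made for illustration; the get method and the attrs dictionary are additional bs4 features not mentioned in the text above.

```python
from bs4 import BeautifulSoup

snippet = '<a href="http://www.jb51.net" class="sister" id="link1">Elsie</a>'
soup = BeautifulSoup(snippet, "html.parser")
first_link = soup.a

href = first_link["href"]          # a single string
classes = first_link["class"]      # class can hold several values, so bs4 returns a list
missing = first_link.get("title")  # get() returns None instead of raising KeyError
all_attrs = first_link.attrs       # every attribute as a dictionary

print(href, classes, missing)
```

Indexing with a missing attribute name raises KeyError, so get is the safer choice when an attribute may be absent.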
In particular, special tags such as the head tag can be reached through soup.head, in the same way as described above.
How do you get the contents of a tag as a list? Use the contents attribute: print(soup.head.contents) returns all the direct children of head as a list. An individual child can be fetched with the [num] index form, and its tag name is available through .name.
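As a small illustration of contents, assuming a one-line document written only for this sketch:

```python
from bs4 import BeautifulSoup

html_doc = "<html><head><title>The Dormouse's story</title></head><body></body></html>"
soup = BeautifulSoup(html_doc, "html.parser")

head_children = soup.head.contents  # a plain Python list of direct children
first_child = head_children[0]      # index into it like any other list
print(len(head_children), first_child.name)
```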
A tag's children are also available through children, but print(soup.head.children) does not print a list; it prints an iterator object such as <listiterator object at 0x108e6d150>. You can convert it to a list with list(), or simply iterate over the children with a for statement.
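The difference between children and a list can be sketched as follows. The two-paragraph body is an assumption made so that the block is self-contained.

```python
from bs4 import BeautifulSoup

html_doc = (
    "<body><p class=\"title\"><b>The Dormouse's story</b></p>"
    "<p class=\"story\">...</p></body>"
)
soup = BeautifulSoup(html_doc, "html.parser")

children_iter = soup.body.children  # an iterator, not a list
as_list = list(soup.body.children)  # convert it when indexing is needed

# Or loop over the children directly
child_names = [child.name for child in soup.body.children]
print(child_names)
```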
About the string attribute: if a tag contains more than one child, string returns None; otherwise it returns the actual string. For example, print(soup.title.string) returns The Dormouse's story. When a tag has more than one child, try strings instead.
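The following sketch contrasts string with strings. The paragraph used here is invented for illustration, and stripped_strings is a related bs4 attribute that the text above does not mention.

```python
from bs4 import BeautifulSoup

html_doc = (
    "<html><head><title>The Dormouse's story</title></head>"
    "<body><p>Once upon a time <b>three</b> sisters</p></body></html>"
)
soup = BeautifulSoup(html_doc, "html.parser")

title_string = soup.title.string            # one text child: returned directly
p_string = soup.p.string                    # several children: None
p_strings = list(soup.p.strings)            # every text fragment, in document order
p_stripped = list(soup.p.stripped_strings)  # the same, with surrounding whitespace removed

print(title_string, p_string, p_strings)
```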
To look upward in the tree, use the parent attribute; to get all ancestors, use parents.
To find the next sibling, use next_sibling; to find the previous sibling, use previous_sibling. To get all of them, add an s to the corresponding name: next_siblings and previous_siblings.
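Here is a hedged sketch of moving up and sideways in the tree, using a compact one-line document assumed for this example. Note that a sibling is often a piece of text rather than a tag; find_next_sibling is an additional bs4 method that skips straight to the next matching tag.

```python
from bs4 import BeautifulSoup

html_doc = (
    '<html><body><p class="story">'
    '<a id="link1">Elsie</a>, <a id="link2">Lacie</a> and <a id="link3">Tillie</a>'
    '</p></body></html>'
)
soup = BeautifulSoup(html_doc, "html.parser")
link2 = soup.find(id="link2")

parent_name = link2.parent.name                 # the enclosing tag
ancestor_names = [t.name for t in link2.parents]  # all ancestors, nearest first

next_node = link2.next_sibling                  # the text " and ", not a tag
prev_node = link2.previous_sibling              # the text ", "
later_nodes = list(link2.next_siblings)         # everything after link2 inside <p>
next_link = link2.find_next_sibling("a")        # skips text, returns the next <a>

print(parent_name, ancestor_names, next_link["id"])
```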

How do I traverse a tree?

Use the find_all function. Its signature is:

find_all(name, attrs, recursive, text, limit, **kwargs)

Some examples:

The code is as follows:

print(soup.find_all('title'))       # search by tag name
print(soup.find_all('p', 'title'))  # <p> tags with the CSS class "title"
print(soup.find_all('a'))
print(soup.find_all(id="link2"))    # search by id value
print(soup.find_all(id=True))       # every tag that has an id attribute

The return values are:

[<title>The Dormouse's story</title>]
[<p class="title"><b>The Dormouse's story</b></p>]
[<a class="sister" href="http://www.jb51.net" id="link1">Elsie</a>, <a class="sister" href="http://www.jb51.net" id="link2">Lacie</a>, <a class="sister" href="http://www.jb51.net" id="link3">Tillie</a>]
[<a class="sister" href="http://www.jb51.net" id="link2">Lacie</a>]
[<a class="sister" href="http://www.jb51.net" id="link1">Elsie</a>, <a class="sister" href="http://www.jb51.net" id="link2">Lacie</a>, <a class="sister" href="http://www.jb51.net" id="link3">Tillie</a>]
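The find_all signature also lists a recursive parameter that the examples do not exercise. As a minimal sketch, assuming a tiny document written for this purpose: by default find_all searches all descendants, while recursive=False restricts the search to direct children.

```python
from bs4 import BeautifulSoup

html_doc = (
    "<html><head><title>The Dormouse's story</title></head>"
    "<body><p>...</p></body></html>"
)
soup = BeautifulSoup(html_doc, "html.parser")

# <title> is a grandchild of <html>, so only the default deep search finds it
deep = soup.html.find_all("title")
shallow = soup.html.find_all("title", recursive=False)

print(len(deep), len(shallow))
```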

To search by CSS class, again straight to an example.

The code is as follows:

print(soup.find_all("a", class_="sister"))  # class_ avoids the Python keyword class
print(soup.select("p.title"))               # CSS selector syntax
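A few more selector forms, as a hedged sketch. It assumes a cut-down version of the sample document and a bs4 installation in which select and select_one are available; the selectors themselves are ordinary CSS.

```python
from bs4 import BeautifulSoup

html_doc = (
    '<html><body><p class="title"><b>The Dormouse\'s story</b></p>'
    '<p class="story">'
    '<a class="sister" id="link1">Elsie</a>, '
    '<a class="sister" id="link2">Lacie</a> and '
    '<a class="sister" id="link3">Tillie</a>'
    '</p></body></html>'
)
soup = BeautifulSoup(html_doc, "html.parser")

by_class = soup.find_all("a", class_="sister")  # keyword form
by_selector = soup.select("a.sister")           # equivalent CSS selector
title_paragraphs = soup.select("p.title")
direct_links = soup.select("p.story > a")       # <a> tags directly inside p.story
second_link = soup.select_one("#link2")         # first match only, or None

print(len(by_class), len(by_selector), second_link.get_text())
```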

To search by attribute:

The code is as follows:

print(soup.find_all("a", attrs={"class": "sister"}))

To search by text:

The code is as follows:

# Newer bs4 releases also accept this argument under the name string
print(soup.find_all(text="Elsie"))
print(soup.find_all(text=["Tillie", "Elsie", "Lacie"]))
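Searching by text returns the matching strings themselves, not the tags around them. The sketch below shows two ways to get back to the tag. It assumes a bs4 version that accepts the string keyword (the newer name for text) and uses a short snippet invented for illustration.

```python
import re
from bs4 import BeautifulSoup

html_doc = (
    '<p class="story">'
    '<a id="link1">Elsie</a>, <a id="link2">Lacie</a> and <a id="link3">Tillie</a>'
    '</p>'
)
soup = BeautifulSoup(html_doc, "html.parser")

matches = soup.find_all(string="Elsie")            # text nodes, not tags
owner = matches[0].parent                          # the tag that contains the text
tags = soup.find_all("a", string="Elsie")          # tags whose text is "Elsie"
ending_ie = soup.find_all(string=re.compile("ie$"))  # a regular expression also works

print(matches, owner.name, tags[0]["id"], ending_ie)
```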

To limit the number of results:

The code is as follows:

print(soup.find_all("a", limit=2))
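For comparison, here is a small sketch of limit next to find, which returns only the first match. The snippet and variable names are assumptions made for this example.

```python
from bs4 import BeautifulSoup

html_doc = (
    '<p><a id="link1">Elsie</a>, <a id="link2">Lacie</a> and '
    '<a id="link3">Tillie</a></p>'
)
soup = BeautifulSoup(html_doc, "html.parser")

first_two = soup.find_all("a", limit=2)  # stops after two matches
first_only = soup.find("a")              # a single tag rather than a list
no_tag = soup.find("table")              # None when nothing matches
no_tags = soup.find_all("table")         # an empty list when nothing matches

print([t["id"] for t in first_two], first_only["id"], no_tag, no_tags)
```

Because find returns None on failure while find_all returns an empty list, code that chains further calls onto find should check for None first.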

The results of the examples above, in order, are as follows (the u prefix marks Python 2 unicode strings; Python 3 prints the strings without it):

[<a class="sister" href="http://www.jb51.net" id="link1">Elsie</a>, <a class="sister" href="http://www.jb51.net" id="link2">Lacie</a>, <a class="sister" href="http://www.jb51.net" id="link3">Tillie</a>]
[<p class="title"><b>The Dormouse's story</b></p>]
[<a class="sister" href="http://www.jb51.net" id="link1">Elsie</a>, <a class="sister" href="http://www.jb51.net" id="link2">Lacie</a>, <a class="sister" href="http://www.jb51.net" id="link3">Tillie</a>]
[u'Elsie']
[u'Elsie', u'Lacie', u'Tillie']
[<a class="sister" href="http://www.jb51.net" id="link1">Elsie</a>, <a class="sister" href="http://www.jb51.net" id="link2">Lacie</a>]

In short, you can find what you want by using these functions.
