BeautifulSoup find(), find_all(), and select() functions

Source: Internet
Author: User

find() function: returns the first matching object, that is, find_all()[0] when at least one tag matches. When nothing matches, find() returns None.
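The relationship between the two calls can be sketched as follows. This is a minimal illustration that assumes BeautifulSoup 4 (the bs4 package) with the built-in html.parser; the two-paragraph markup is a hypothetical sample, not taken from the original article.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for this illustration.
doc = '<p id="firstpara">one</p><p id="secondpara">two</p>'
soup = BeautifulSoup(doc, "html.parser")

first = soup.find("p")        # the first matching Tag
all_p = soup.find_all("p")    # a list-like result holding every match
print(first is all_p[0])      # True

# When nothing matches, find() returns None while find_all() returns an
# empty result, so indexing find_all(...)[0] would raise IndexError.
missing = soup.find("table")
print(missing, len(soup.find_all("table")))
```

In practice this means find() is the safer choice when a match is optional, since the None result can be tested directly.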
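find_all() function (the following is adapted from the official documentation, which uses the older spelling findAll; BeautifulSoup 4 still accepts findAll as a legacy alias of find_all):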

findAll(name=None, attrs={}, recursive=True, text=None, limit=None, **kwargs) returns a list. These parameters appear repeatedly in this document. The most important are the name parameter and the keyword parameters (**kwargs).
The name parameter restricts the result set by tag name. There are several ways to match name; the simplest is to pass a tag name as a string.
1. The following code finds all the <b> tags in the document: soup.findAll('b')
2. You can pass a regular expression. The following code finds all tags whose names start with b:

import re
tagsStartingWithB = soup.findAll(re.compile('^b'))

Output:

[tag.name for tag in tagsStartingWithB]
[u'body', u'b', u'b']
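The same regular-expression match can be sketched with BeautifulSoup 4. The sample document is an assumption, reconstructed from the outputs quoted in this article, and the bs4 spelling find_all is used with the built-in html.parser.

```python
import re
from bs4 import BeautifulSoup

# Assumed sample document, reconstructed from the outputs quoted in the article.
doc = (
    '<html><head><title>Page title</title></head>'
    '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.</p>'
    '<p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>'
    '</body></html>'
)
soup = BeautifulSoup(doc, "html.parser")

# A compiled pattern is searched against each tag name.
tags_starting_with_b = soup.find_all(re.compile("^b"))
names = [tag.name for tag in tags_starting_with_b]
print(names)  # ['body', 'b', 'b']
```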
3. You can pass a list or a dictionary. The following two calls both find all the <title> and <p> tags; they return the same result, but the second is faster:

soup.findAll(['title', 'p'])

Output:

[<title>Page title</title>,
 <p id="firstpara" align="center">This is paragraph <b>one</b>.</p>,
 <p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>]

soup.findAll({'title': True, 'p': True})

Output:

[<title>Page title</title>,
 <p id="firstpara" align="center">This is paragraph <b>one</b>.</p>,
 <p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>]

4. You can pass the special value True, which matches every tag name, that is, it matches every tag.

allTags = soup.findAll(True)

Output:

[tag.name for tag in allTags]
[u'html', u'head', u'title', u'body', u'p', u'b', u'p', u'b']

This may not seem very useful on its own, but True becomes useful when you restrict attribute values.
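A short sketch of the list and True forms of the name parameter, assuming BeautifulSoup 4 with html.parser. The sample document is an assumption reconstructed from the outputs quoted in this article; the dictionary form is left out because this sketch only covers calls checked against the bs4 spelling find_all.

```python
from bs4 import BeautifulSoup

# Assumed sample document, reconstructed from the outputs quoted in the article.
doc = (
    '<html><head><title>Page title</title></head>'
    '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.</p>'
    '<p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>'
    '</body></html>'
)
soup = BeautifulSoup(doc, "html.parser")

# A list matches any tag whose name is one of the listed names.
names = [tag.name for tag in soup.find_all(["title", "p"])]
print(names)      # ['title', 'p', 'p']

# True matches every tag, in document order.
all_names = [tag.name for tag in soup.find_all(True)]
print(all_names)  # ['html', 'head', 'title', 'body', 'p', 'b', 'p', 'b']
```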
5. You can pass a callable object that takes a Tag object as its only argument and returns a Boolean.
findAll passes every Tag object it encounters to the callable, and if the call returns True, the tag matches.
6. The following finds tags that have exactly two attributes:

soup.findAll(lambda tag: len(tag.attrs) == 2)

Output:

[<p id="firstpara" align="center">This is paragraph <b>one</b>.</p>,
 <p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>]

7. The following finds tags whose names are a single character and that have no attributes:

soup.findAll(lambda tag: len(tag.name) == 1 and not tag.attrs)

Output:

[<b>one</b>, <b>two</b>]
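The two callable filters above can be tried with BeautifulSoup 4 as sketched below. The sample document is an assumption reconstructed from the outputs quoted in this article, and has_two_attrs is a hypothetical helper name introduced only for readability.

```python
from bs4 import BeautifulSoup

# Assumed sample document, reconstructed from the outputs quoted in the article.
doc = (
    '<html><head><title>Page title</title></head>'
    '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.</p>'
    '<p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>'
    '</body></html>'
)
soup = BeautifulSoup(doc, "html.parser")

def has_two_attrs(tag):
    """Hypothetical helper: True for tags with exactly two attributes."""
    return len(tag.attrs) == 2

two_attr_tags = soup.find_all(has_two_attrs)
print([tag["id"] for tag in two_attr_tags])  # ['firstpara', 'secondpara']

# One-character tag names with no attributes: only the two <b> tags qualify,
# because both <p> tags carry attributes.
short_bare = soup.find_all(lambda tag: len(tag.name) == 1 and not tag.attrs)
print([str(tag) for tag in short_bare])      # ['<b>one</b>', '<b>two</b>']
```

A named function is often easier to reuse and test than a lambda when the condition grows beyond a single expression.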

8. Keyword parameters filter on a tag's attributes. The following example finds all tags whose align attribute has the value center:

soup.findAll(align="center")

Output:

[<p id="firstpara" align="center">This is paragraph <b>one</b>.</p>]

As with the name parameter, you can pass different kinds of objects as a keyword parameter's value to specify the match more flexibly (but a Python reserved word such as class cannot be used as a keyword parameter name).
9. You can pass a string to match the attribute's value. You can also pass a regular expression, a list, a hash table (dictionary), the special values True or None, or a callable that takes the attribute value as its argument (note: this value may be None). Some examples:

soup.findAll(id=re.compile("para$"))

Output:

[<p id="firstpara" align="center">This is paragraph <b>one</b>.</p>,
 <p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>]

soup.findAll(align=["center", "blah"])

Output:

[<p id="firstpara" align="center">This is paragraph <b>one</b>.</p>,
 <p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>]

soup.findAll(align=lambda value: value and len(value) < 5)

Output:

[<p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>]

10. The special values True and None are more interesting. True matches a tag that has any value for the given attribute, and None matches a tag that has no value for it. Some examples:

soup.findAll(align=True)

Output:

[<p id="firstpara" align="center">This is paragraph <b>one</b>.</p>,
 <p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>]

[tag.name for tag in soup.findAll(align=None)]
[u'html', u'head', u'title', u'body', u'b', u'b']
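The attribute filters from steps 8 to 10 can be sketched together with BeautifulSoup 4. The sample document is an assumption reconstructed from the outputs quoted in this article, and ids is a hypothetical helper. For the "attribute is absent" case this sketch uses a callable with Tag.has_attr rather than align=None, as a cautious choice in case the handling of None differs between library versions.

```python
import re
from bs4 import BeautifulSoup

# Assumed sample document, reconstructed from the outputs quoted in the article.
doc = (
    '<html><head><title>Page title</title></head>'
    '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.</p>'
    '<p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>'
    '</body></html>'
)
soup = BeautifulSoup(doc, "html.parser")

def ids(tags):
    """Hypothetical helper: collect the id attribute of each matched tag."""
    return [tag["id"] for tag in tags]

exact = ids(soup.find_all(align="center"))               # exact string
by_regex = ids(soup.find_all(id=re.compile("para$")))    # regular expression
by_list = ids(soup.find_all(align=["center", "blah"]))   # any value in a list
# The callable receives the attribute value, which may be None.
by_callable = ids(soup.find_all(align=lambda value: value and len(value) < 5))
any_value = ids(soup.find_all(align=True))               # attribute present

# Tags without an align attribute, expressed as a callable on the Tag.
no_align = [tag.name for tag in soup.find_all(lambda tag: not tag.has_attr("align"))]

print(exact, by_callable)  # ['firstpara'] ['secondpara']
print(no_align)            # ['html', 'head', 'title', 'body', 'b', 'b']
```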
If you need to impose more complex or interlocking restrictions on a tag's attributes, pass a callable as the name parameter, as shown above, and work with the Tag object directly.

You may notice a problem here: what if the document has a tag that defines an attribute called name? You cannot use name as a keyword parameter, because the Beautiful Soup search methods already define a name parameter. Nor can you use a Python reserved word such as for as a keyword parameter. Beautiful Soup provides a special parameter, attrs, for these situations. attrs is a dictionary that is used just like the keyword parameters:

soup.findAll(id=re.compile("para$"))

Output:

[<p id="firstpara" align="center">This is paragraph <b>one</b>.</p>,
 <p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>]

soup.findAll(attrs={'id': re.compile("para$")})

Output:

[<p id="firstpara" align="center">This is paragraph <b>one</b>.</p>,
 <p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>]

You can use attrs to match attributes whose names are Python reserved words, such as class, for, and import, or attributes whose names cannot be used as keyword parameters because the Beautiful Soup search methods already use them, such as name, recursive, limit, text, and attrs itself.

from BeautifulSoup import BeautifulStoneSoup
xml = '<person name="Bob"><parent rel="mother" name="Alice">'
xmlSoup = BeautifulStoneSoup(xml)
xmlSoup.findAll(name="Alice")

Output:

[]

xmlSoup.findAll(attrs={"name": "Alice"})

Output:

[<parent rel="mother" name="Alice"></parent>]
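The same behaviour can be sketched with BeautifulSoup 4, which replaces the BeautifulSoup 3 import used above. Two things are assumptions here: the markup is parsed with the built-in html.parser instead of an XML parser, and the closing tags are written out explicitly. The paragraph document is likewise reconstructed from the outputs quoted in this article.

```python
import re
from bs4 import BeautifulSoup

# name="Alice" is taken as the tag-name filter, so nothing matches;
# attrs={"name": ...} filters on the attribute called "name" instead.
markup = '<person name="Bob"><parent rel="mother" name="Alice"></parent></person>'
people = BeautifulSoup(markup, "html.parser")

by_tag_name = people.find_all(name="Alice")
by_attr = people.find_all(attrs={"name": "Alice"})
print(len(by_tag_name), [tag.name for tag in by_attr])  # 0 ['parent']

# For ordinary attribute names, the keyword form and the attrs form agree.
doc = (
    '<p id="firstpara" align="center">This is paragraph <b>one</b>.</p>'
    '<p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>'
)
soup = BeautifulSoup(doc, "html.parser")
by_keyword = soup.find_all(id=re.compile("para$"))
by_attrs_dict = soup.find_all(attrs={"id": re.compile("para$")})
print(by_keyword == by_attrs_dict)  # True
```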

Searching by CSS class
The attrs parameter is especially convenient for CSS classes, because class is both the name of the CSS attribute and a Python reserved word. You can search by CSS class with soup.find("tagName", {"class": "cssClass"}), but since this operation is so common, you can also pass a plain string as attrs instead of a dictionary. The string is then matched against the tag's CSS class.

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup("""Bob's <b>Bold</b> Barbeque available in <b class="hickory">Hickory</b> and <b class="lime">Lime</b>""")
soup.find("b", {"class": "lime"})

Output:

<b class="lime">Lime</b>

soup.find("b", "hickory")

Output:

<b class="hickory">Hickory</b>
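A minimal sketch of the three equivalent ways to search by CSS class in BeautifulSoup 4, assuming the bs4 package and the built-in html.parser. The class_ keyword (with a trailing underscore) is the bs4 way around the reserved word and is not part of the older API shown above.

```python
from bs4 import BeautifulSoup

markup = (
    "Bob's <b>Bold</b> Barbeque available in "
    '<b class="hickory">Hickory</b> and <b class="lime">Lime</b>'
)
soup = BeautifulSoup(markup, "html.parser")

by_dict = soup.find("b", {"class": "lime"})   # attrs given as a dictionary
by_string = soup.find("b", "hickory")         # a plain string means "CSS class"
by_keyword = soup.find("b", class_="lime")    # bs4 keyword form

print(by_dict)     # <b class="lime">Lime</b>
print(by_string)   # <b class="hickory">Hickory</b>
print(by_keyword is by_dict)  # True
```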

The select() function returns the elements that match a CSS selector.
For example:

import requests
from bs4 import BeautifulSoup

html_sample = ' \

1. Use select() to find the element whose id is title (an id is prefixed with #):

alink = soup.select('#title')
print(alink)

The output is:

[

2. Use select() to find all elements whose class is link (a class is prefixed with .):

alink = soup.select('.link')
print(alink)

The output is:

[<a class="link" href="#">this is link1!</a>, <a class="link" href="#">this is link2!</a>, <a class="link" href="#">this is link3!</a>, <a class="link" href="#">this is link4!</a>]
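Because the sample HTML did not survive in the text above, the following self-contained sketch uses a hypothetical html_sample of its own (the heading text and the two links are assumptions) to show both selectors with BeautifulSoup 4 and html.parser. select() relies on the soupsieve package, which is normally installed together with bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; the article's own html_sample is not available.
html_sample = """
<html><body>
<h1 id="title">Hello World</h1>
<a class="link" href="#">This is link1!</a>
<a class="link" href="#">This is link2!</a>
</body></html>
"""
soup = BeautifulSoup(html_sample, "html.parser")

by_id = soup.select("#title")     # '#' selects by id
by_class = soup.select(".link")   # '.' selects by class

print(by_id)  # [<h1 id="title">Hello World</h1>]
print([a.get_text() for a in by_class])  # ['This is link1!', 'This is link2!']
```

select() always returns a list, even for an id selector that can match at most one element, so the single match is by_id[0].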
