BeautifulSoup supports the most commonly used CSS selectors through the .select() method, which takes a selector string and is available on Tag objects as well as on the BeautifulSoup object itself. The HTML document used throughout this article is:

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""
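As a quick, self-contained sketch of what .select() accepts (using the document above; the built-in html.parser is assumed here):

```python
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

print(soup.select("title"))    # select by tag name
print(soup.select(".sister"))  # select by CSS class
print(soup.select("#link1"))   # select by id
print(soup.select("p a"))      # descendant combinator: <a> inside <p>
```

Each call returns a list of matching tags, so the usual list operations (indexing, len(), iteration) apply directly.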
See examples directly:
The code is as follows:
#!/usr/bin/python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

soup = BeautifulSoup(html_doc)
print(soup.title)
print(soup.title.name)
print(soup.title.string)
print(soup.p)
print(soup.a)
print(soup.find_all('a'))
First, finding tags

(1) Find all <a> tags:

>>> for x in soup.find_all('a'):
...     print(x)
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>

(2) Find all <a> tags whose href attribute matches a regular expression (attribute filters are passed as keyword arguments):

>>> import re
>>> for x in soup.find_all('a', href=re.compile('lacie')):
...     print(x)
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
You can also pass a function to find_all. The function must take a single tag argument and return True or False; the tags for which it returns True are selected. A regular expression or a list of tag names can be passed as well:

soup.find_all('b')
# [<b>The Dormouse's story</b>]

import re
for tag in soup.find_all(re.compile("^b")):
    print(tag.name)
# body
# b

soup.find_all(["a", "b"])
# [<b>The Dormouse's story</b>,
#  <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

def has_class_but_no_id(tag):
    return tag.has_attr('class') and not tag.has_attr('id')
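The three filter styles above can be restated as one self-contained, runnable sketch (the sample HTML here is a trimmed stand-in for the Dormouse document, and html.parser is assumed):

```python
import re
from bs4 import BeautifulSoup

html_doc = """
<html><body>
<p class="title"><b>The Dormouse's story</b></p>
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
</body></html>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# A regular expression matches tag *names*: here, names starting with "b"
starts_with_b = [tag.name for tag in soup.find_all(re.compile("^b"))]

# A list matches any of several tag names, in document order
a_and_b = [tag.name for tag in soup.find_all(["a", "b"])]

# A function receives each tag and returns True/False
def has_class_but_no_id(tag):
    return tag.has_attr("class") and not tag.has_attr("id")

selected = [tag.name for tag in soup.find_all(has_class_but_no_id)]

print(starts_with_b)  # ['body', 'b']
print(a_and_b)        # ['b', 'a']
print(selected)       # ['p']  (the <p> has a class but no id; the <a> has both)
```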
The result of the code execution is as follows, one line per <a> tag found:

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
[Finished in 0.2s]

The same document can also be navigated directly through the soup object (the results below match the website's content):

soup.title
# <title>The Dormouse's story</title>
soup.title.name
# u'title'
soup.title.string
# u'The Dormouse's story'
# Print the name of the title tag
# print(soup.title.name)
# title

# Print the contents of the title tag
# print(soup.title.string)
# The Dormouse's story

# Print the <p> tag in soup; only the first one found is returned
# print(soup.p)
# <p class="title"><b>The Dormouse's story</b></p>

# Print the class of the first <p> tag, and its type
# print(soup.p['class'], type(soup.p['class']))
# ['title'] <type 'list'>

# Print the <a> tag in soup; only the first one found is returned
# print(soup.a)
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

# Print all the <a> tags
# print(soup.find_all('a'))
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, ...]
Once upon a time there were three little sisters; and their names were Elsie, Lacie and Tillie; and they lived at the bottom of a well.
...

You can see: soup is the parsed, formatted document produced by BeautifulSoup. soup.title returns the title tag, and soup.p returns only the first <p> tag in the document; to get all matching tags you must use the find_all function. find_all returns a sequence, so you loop through it and handle each tag in turn. get_text() returns the text, that is, the content of the label.
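The pattern just described, looping over find_all and pulling text out with get_text(), can be sketched as follows (a trimmed sample of the document is assumed):

```python
from bs4 import BeautifulSoup

html_doc = (
    '<p class="story">their names were '
    '<a href="http://example.com/elsie" id="link1">Elsie</a>, '
    '<a href="http://example.com/lacie" id="link2">Lacie</a> and '
    '<a href="http://example.com/tillie" id="link3">Tillie</a>;</p>'
)
soup = BeautifulSoup(html_doc, "html.parser")

# soup.a only reaches the first <a>; find_all returns every match
first = soup.a
names = [a.get_text() for a in soup.find_all("a")]

print(first.get_text())  # Elsie
print(names)             # ['Elsie', 'Lacie', 'Tillie']
```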
Searching by exact class attribute values:

css_soup.find_all("p", class_="body strikeout")
# [<p class="body strikeout"></p>]

If the order of the CSS class names does not match the actual value of the class attribute, nothing is found. Attributes can also be matched through the attrs dictionary:

soup.find_all("a", attrs={"class": "sister"})
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

The text parameter

The text parameter lets you search the strings of a document. Like the name parameter, it accepts a string, a regular expression, a list, or True. See the example:

soup.find_all(text="Elsie")
# [u'Elsie']
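The class-ordering caveat above is easy to demonstrate; the following is a runnable sketch of both lookups (html.parser assumed):

```python
from bs4 import BeautifulSoup

css_soup = BeautifulSoup('<p class="body strikeout"></p>', "html.parser")

# An exact-string match against a multi-valued class works in document order...
hits = css_soup.find_all("p", class_="body strikeout")
# ...but the same classes in a different order find nothing
misses = css_soup.find_all("p", class_="strikeout body")

# attrs= takes a dict mapping attribute name -> value
soup = BeautifulSoup(
    '<a class="sister" href="http://example.com/elsie">Elsie</a>', "html.parser"
)
sisters = soup.find_all("a", attrs={"class": "sister"})

print(len(hits), len(misses), len(sisters))  # 1 0 1
```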
Using the bs4 library:

# 07-urllib2_beautipulsoup_prettify
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>
"""

# Create a Beautiful Soup object
soup = BeautifulSoup(html)

# You can also open a local HTML file to create the object:
# soup = BeautifulSoup(open('index.html'))

# Pretty-print the contents of the soup object
print(soup.prettify())

The result of running this is the same document, re-indented with one tag per line.
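A minimal, runnable prettify() sketch (a one-line document is assumed to keep the output short):

```python
from bs4 import BeautifulSoup

html = "<html><body><p class='title'><b>The Dormouse's story</b></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# prettify() returns the document re-serialized with one tag per line,
# indented by nesting depth
pretty = soup.prettify()
print(pretty)
```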
soup.find_all(text=["Tillie", "Elsie", "Lacie"])
soup.find_all(text=re.compile("Dormouse"))

def is_the_only_string_within_a_tag(s):
    return (s == s.parent.string)

soup.find_all(text=is_the_only_string_within_a_tag)

Although the text parameter is used to search for strings, it can be combined with the other parameters to filter tags: Beautiful Soup will find every tag whose .string matches the text value. The following code searches for the <a> tags whose content is "Elsie":

soup.find_all("a", text="Elsie")
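The four forms of this parameter, restated as one runnable sketch (in bs4 4.4+ the parameter is also spelled string=, which is used below; text= behaves the same):

```python
import re
from bs4 import BeautifulSoup

html_doc = (
    '<p><a id="link1">Elsie</a>, <a id="link2">Lacie</a> and '
    '<a id="link3">Tillie</a>; The Dormouse</p>'
)
soup = BeautifulSoup(html_doc, "html.parser")

exact = soup.find_all(string="Elsie")                        # exact string
any_of = soup.find_all(string=["Tillie", "Elsie", "Lacie"])  # any of a list
regex = soup.find_all(string=re.compile("Dormouse"))         # regular expression

# Mixed with a tag search: <a> tags whose .string is "Elsie"
tags = soup.find_all("a", string="Elsie")

print(exact)                    # ['Elsie']
print(any_of)                   # ['Elsie', 'Lacie', 'Tillie'] (document order)
print([t["id"] for t in tags])  # ['link1']
```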
See examples directly:
The code is as follows:
#!/usr/bin/python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>
"""

soup = BeautifulSoup(html_doc)
print(soup.title)
print(soup.title.name)
print(soup.title.string)
print(soup.p)
print(soup.a)
print(soup.find_all('a'))
print(soup.find(id='link3'))
print(soup.get_text())
The results are:
The output is as follows:
<title>The Dormouse's story</title>
title
The Dormouse's story
In addition, you can use the get_text() function to output the contents of a label:
pid = soup.find(href=re.compile("^http:"))  # match with an re regular expression; more on this below
p1 = soup.p.get_text()
# The Dormouse's story
The attributes of a label can be obtained through the get function:
soup = BeautifulSoup(html, 'html.parser')
pid = soup.find_all('a', {'class': 'sister'})
for i in pid:
    print(i.get('href'))  # use the get function on each item to fetch the tag attribute value

http://example.com/elsie
http://example.com/lacie
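The lookup above, as a self-contained sketch (sample HTML assumed); note that unlike tag['attr'], .get() returns None for a missing attribute instead of raising KeyError:

```python
from bs4 import BeautifulSoup

html = (
    '<a class="sister" href="http://example.com/elsie">Elsie</a>'
    '<a class="sister" href="http://example.com/lacie">Lacie</a>'
)
soup = BeautifulSoup(html, "html.parser")

hrefs = []
for a in soup.find_all("a", {"class": "sister"}):
    hrefs.append(a.get("href"))  # fetch the attribute value with get()

print(hrefs)            # ['http://example.com/elsie', 'http://example.com/lacie']
print(soup.a.get("id"))  # None: the tag has no id attribute
```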
A node's name, attributes, and text can all be accessed. Let us illustrate with the links on the webpage:

from bs4 import BeautifulSoup
import re

html_doc = """..."""
soup = BeautifulSoup(html_doc, 'html.parser')

print('get all the links')
links = soup.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

print('get a link to Lacie')
link_node = soup.find('a', href='http://example.com/lacie')
print(link_node.name, link_node['href'], link_node.get_text())
You can also freely manipulate a tag's attributes:

tag['class'] = 'verybold'
tag['id'] = 1
tag
# <blockquote class="verybold" id="1">Extremely bold</blockquote>
del tag['class']
del tag['id']
tag
# <blockquote>Extremely bold</blockquote>
tag['class']
# KeyError: 'class'
print(tag.get('class'))
# None
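Those mutations as one runnable sketch (a simple <b> tag is assumed as the target here):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<b id="1" class="boldest">Extremely bold</b>', "html.parser")
tag = soup.b

tag["class"] = "verybold"  # overwrite the class attribute
tag["id"] = "2"            # overwrite the id attribute
del tag["class"]           # delete attributes entirely
del tag["id"]

print(tag)                 # <b>Extremely bold</b>
print(tag.get("class"))    # None (tag["class"] would raise KeyError)
```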
You can also locate DOM elements in a number of flexible ways, as in the following example.
1. Build a document
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
"""
Example:

HTML file:

html_doc = """..."""  # the Dormouse's story document shown above

Code:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc)

Next you can start using the various features.

soup.X (where X is any tag name) returns the entire first matching tag, including the tag's attributes and contents:

soup.title
# <title>The Dormouse's story</title>
soup.p
# <p class="title"><b>The Dormouse's story</b></p>
soup.a  (note: only the first result is returned)
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
soup.find_all('a')  (find_all returns all of them)
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

find can also look tags up by attribute:

soup.find(id="link3")
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>

To fetch an attribute of a tag, use the get function shown earlier.
# -*- coding: utf-8 -*-
import os
import re
from bs4 import BeautifulSoup

html_doc = """..."""

print('get all the a links:')
soup = BeautifulSoup(html_doc, 'html.parser', from_encoding='utf-8')
links = soup.find_all('a')
for link in links:
    print(link.name, link['href'], link.get_text())

print('get a link to Lacie:')
link_node1 = soup.find('a', href='http://example.com/lacie')
print(link_node1.name, link_node1['href'], link_node1.get_text())

print('use a regular expression to match:')
# -*- coding: utf-8 -*-
# Python 3 (urllib.request)
# Xiaodeng
# http://tieba.baidu.com/p/2460150866
# Label operations
from bs4 import BeautifulSoup
import urllib.request
import re

# If you start from a URL, you can read the page like this:
# html_doc = "http://tieba.baidu.com/p/2460150866"
# req = urllib.request.Request(html_doc)
# webpage = urllib.request.urlopen(req)
# html = webpage.read()

html = """..."""
soup = BeautifulSoup(html, 'html.parser')  # document object

# use re.compile to match the href addresses that need to be crawled
for k in soup.find_all('a', href=re.compile(...)):
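The crawling loop that the snippet above leaves unfinished can be sketched like this; the HTML and URL pattern are illustrative assumptions, standing in for a page fetched with urllib.request, so no network access is needed:

```python
import re
from bs4 import BeautifulSoup

# Stand-in for a fetched page (urllib.request.urlopen(...).read() in the real case)
html = (
    '<a href="http://tieba.baidu.com/p/2460150866">thread</a>'
    '<a href="http://example.com/other">elsewhere</a>'
)
soup = BeautifulSoup(html, "html.parser")

# re.compile narrows the crawl to hrefs matching the pattern
wanted = [k["href"] for k in soup.find_all("a", href=re.compile(r"^http://tieba"))]
print(wanted)  # ['http://tieba.baidu.com/p/2460150866']
```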