You can also manipulate a tag's attributes freely:

```python
tag['class'] = 'verybold'
tag['id'] = 1
tag
# <blockquote class="verybold" id="1">Extremely bold</blockquote>

del tag['class']
del tag['id']
tag
# <blockquote>Extremely bold</blockquote>

tag['class']
# KeyError: 'class'
print(tag.get('class'))
# None
```
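The snippet above assumes `tag` already exists; here is a minimal self-contained version, using the `<blockquote>` markup from the Beautiful Soup documentation's own example:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<blockquote>Extremely bold</blockquote>', 'html.parser')
tag = soup.blockquote

# Set attributes like dictionary keys
tag['class'] = 'verybold'
tag['id'] = 1
print(tag)  # <blockquote class="verybold" id="1">Extremely bold</blockquote>

# Delete attributes with del
del tag['class']
del tag['id']
print(tag)  # <blockquote>Extremely bold</blockquote>

# Accessing a missing attribute with [] raises KeyError; .get() returns None
print(tag.get('class'))  # None
```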
You can also search for DOM elements as needed. For example:

1. Build a document:

```python
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
```
0x03 Beautiful Soup in practice. Briefly, using Beautiful Soup is a three-step process: create a BeautifulSoup object, find nodes, and get the node content.

```python
from bs4 import BeautifulSoup
import re

html_doc = """..."""  # the example document shown above

soup = BeautifulSoup(html_doc, 'html.parser')

print('Get all links')
for link in soup.find_all('a'):
    print(link.name, link['href'], link.get_text())
```
Parsing yields a BeautifulSoup object, which can be output in a standard indented format:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.prettify())

# A few simple ways to browse the structured data:
soup.title
# <title>The Dormouse's story</title>

# Find all links:
for link in soup.find_all('a'):
    print(link.get('href'))
# http://example.com/elsie
# http://example.com/lacie
# http://example.com/tillie

# Get all the text from the document:
print(soup.get_text())
```
"name= "Dromouse">b>The Dormouse ' s storyb>P>Pclass= "Story">Once Upon a time there were three little sisters; and their names wereahref= "Http://example.com/elsie"class= "Sister"ID= "Link1">Elsie -a>,ahref= "Http://example.com/lacie"class= "Sister"ID= "Link2">Laciea> andahref= "Http://example.com/tillie"class= "Sister"ID= "Link3">Tilliea>; and they lived at the bottom of a well.P>Pclass= "Story">...P>The parsing code is as follows: from Import = Be
# Coding:utf8? from bs4 import beautifulsoup Import Re?Html_doc = "" "?And they lived at the bottom of a well.?"""Soup = beautifulsoup(html_doc,' Html.parser ', from_encoding=' utf-8 ' ) ?Print ' links ' Links = soup. find_all(' a ' ) for link in links : #print LinkPrint link. name, link[' href '],link. get_text( ) ? print ' Get a separate link " link_code = Soup . find ( ' a ' , href = ' Http://example.com/
Beautiful Soup is a Python library for extracting data from HTML and XML files. Working with your favorite parser, it provides idiomatic ways of navigating, searching, and modifying the parse tree. It can save you hours or even days of work.
Quick start: take the "three sisters" HTML above as an example. Parsing it with Beautiful Soup yields a BeautifulSoup object that can be output in a standard indented format:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.prettify())
```
Again take the "three sisters" HTML as input. We don't have to extract the entire document; we can pull out just the parts we want, such as the title. `soup.title` is the standard way to get the title in Python. Note that the HTML actually contains two "titles": the `<title>` element that sets the page title in the `<head>`, and the `<p class="title">` heading in the body. `soup.title` fetches the former. We can also extract other things, such as the URL at a particular location.
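A short sketch of the point above, with the "three sisters" document inlined so it runs on its own:

```python
from bs4 import BeautifulSoup

html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>"""
soup = BeautifulSoup(html_doc, 'html.parser')

# soup.title is the <title> element from <head>, not the <p class="title"> in the body
print(soup.title)         # <title>The Dormouse's story</title>
print(soup.title.string)  # The Dormouse's story

# Grab the URL at a particular location, e.g. the link with id="link2"
print(soup.find('a', id='link2')['href'])  # http://example.com/lacie
```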
You can also search for tags where an attribute merely exists. Example: find `<a>` tags that have an href attribute.

9. Search by the value of an attribute.
   - Example: select all `<a>` tags whose href attribute is exactly http://example.com/lacie.
   - Example: select all `<a>` tags whose href attribute begins with "http".
   - Example: select all `<a>` tags whose href attribute ends with "lie".
   - Example: select all `<a>` tags whose href attribute contains ".com".
10. Search by
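The attribute searches listed above can be sketched with regular expressions as filters; a minimal stand-in document is used here:

```python
import re
from bs4 import BeautifulSoup

html_doc = """<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>"""
soup = BeautifulSoup(html_doc, 'html.parser')

# Tags that merely have an href attribute:
print(len(soup.find_all('a', href=True)))  # 3

# Exact attribute value:
print(soup.find('a', href='http://example.com/lacie').get_text())  # Lacie

# href beginning with "http":
print(len(soup.find_all('a', href=re.compile('^http'))))  # 3

# href ending with "lie" (only the Tillie link qualifies):
print([a['id'] for a in soup.find_all('a', href=re.compile('lie$'))])  # ['link3']

# href containing ".com":
print(len(soup.find_all('a', href=re.compile(r'\.com'))))  # 3
```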
Beautiful Soup is a Python library that extracts data from HTML or XML files, using your favorite parser to navigate, search, and modify the document. Installation and use of Beautiful Soup: on Windows, install with pip install beautifulsoup4.

First, simple use of beautifulsoup4:

```python
from bs4 import BeautifulSoup
import re

html_doc = """..."""
```
3. Create a BeautifulSoup object:

```python
# Create a BeautifulSoup object from the string
soup = BeautifulSoup(html_doc, 'html.parser')

# If the HTML lives in a file a.html, create the object from the file instead
soup = BeautifulSoup(open('a.html'), 'html.parser')
```

4. Formatted output:

```python
# Pretty-print the parse tree
print(soup.prettify())
```
Searching the tree: Beautiful Soup defines many search methods. Here we will introduce two of them, find() and find_all(); the parameters and usage of the other methods are similar. The examples below use the same "three sisters" html_doc.
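A quick sketch of the difference between the two methods, using a minimal stand-in for the example document:

```python
from bs4 import BeautifulSoup

html_doc = """<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
</p>"""
soup = BeautifulSoup(html_doc, 'html.parser')

# find_all() returns a list of every match
print([a['id'] for a in soup.find_all('a')])  # ['link1', 'link2']

# find() returns only the first match, or None when nothing matches
print(soup.find('a')['id'])  # link1
print(soup.find('table'))    # None
```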
2. find_all(name, attrs, recursive, text, **kwargs)
2.1 name: the name parameter accepts any type of filter: a string, a regular expression, a list, a method, or True.

```python
print(soup.find_all(name=re.compile('^t')))
```
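Each filter type from the list above, sketched against a trimmed copy of the example document (the lambda filter is an illustrative assumption, not from the original):

```python
import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<html><head><title>The Dormouse\'s story</title></head>'
    '<body><p class="title"><b>The Dormouse\'s story</b></p></body></html>',
    'html.parser')

# String: match the tag name exactly
print([t.name for t in soup.find_all('b')])               # ['b']
# Regular expression: tag names starting with "t"
print([t.name for t in soup.find_all(re.compile('^t'))])  # ['title']
# List: match any name in the list, in document order
print([t.name for t in soup.find_all(['b', 'title'])])    # ['title', 'b']
# Method: match tags that have a class attribute
print([t.name for t in soup.find_all(lambda tag: tag.has_attr('class'))])  # ['p']
# True matches every tag in the document
print(len(soup.find_all(True)))  # 6
```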
Like the name parameter, the text parameter accepts a string, a regular expression, a list, or True:

```python
print(soup.find_all(text='Elsie'))
print(soup.find_all(text=['Tillie', 'Elsie', 'Lacie']))
```

17. Limit the number of matched tags with the limit parameter:

```python
print(soup.find_all('a', limit=2))
```

18. To search only a tag's direct children, pass recursive=False:

```python
doc = """..."""  # elided in the original
soup = BeautifulSoup(doc, 'html.parser')
print(soup.find_all('title', recursive=False))
```
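A runnable sketch of the text, limit, and recursive parameters together; the nested document here is an assumption, since the original's was elided:

```python
from bs4 import BeautifulSoup

html_doc = """<html><head><title>The Dormouse's story</title></head><body>
<a id="link1">Elsie</a> <a id="link2">Lacie</a> <a id="link3">Tillie</a>
</body></html>"""
soup = BeautifulSoup(html_doc, 'html.parser')

# text (aliased to `string` in newer Beautiful Soup versions) filters on tag text
print(soup.find_all(text='Elsie'))                       # ['Elsie']
print(soup.find_all(text=['Tillie', 'Elsie', 'Lacie']))  # document order

# limit caps the number of results
print(len(soup.find_all('a', limit=2)))  # 2

# recursive=False searches direct children only:
# <title> is a child of <head>, not of <html> itself
print(soup.html.find_all('title', recursive=False))  # []
print(soup.head.find_all('title', recursive=False))  # [<title>The Dormouse's story</title>]
```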
A standalone test module:

```python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import re

# find_all() and find() search only the current node's descendants
# (children, grandchildren, and so on).
# find_parents() and find_parent() search the current node's ancestors instead;
# the search syntax is the same as for an ordinary tag search.

html = """..."""  # the "three sisters" document again
```
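The contrast described in those comments can be sketched as follows, with a minimal fragment standing in for the full document:

```python
from bs4 import BeautifulSoup

html = """<p class="story">Once upon a time there were three little sisters;
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a></p>"""
soup = BeautifulSoup(html, 'html.parser')

a = soup.find('a', id='link1')

# find()/find_all() search downward from a node...
print(soup.find_all('a'))  # finds the link from the document root

# ...while find_parent()/find_parents() search upward through ancestors
print(a.find_parent('p')['class'])         # ['story']
print([p.name for p in a.find_parents()])  # ['p', '[document]']
```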