Every Tag has a dictionary of attributes, available through .attrs:

tag.attrs
# {u'class': u'boldest'}

Or you can retrieve an attribute directly, e.g. tag['class']. You can also operate on attributes freely: add, modify, and delete them.

tag['class'] = 'verybold'
tag['id'] = 1
tag
# <b class="verybold" id="1">Extremely bold</b>

del tag['class']
del tag['id']
tag
# <b>Extremely bold</b>

tag['class']
# KeyError: 'class'
print(tag.get('class'))
# None
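The operations above can be run end to end as follows (a minimal sketch, assuming BeautifulSoup 4 is installed via pip install beautifulsoup4; the sample markup is illustrative):

```python
from bs4 import BeautifulSoup

# Parse a tiny document and grab its <b> tag.
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
tag = soup.b

print(tag.attrs)             # class is multi-valued, so it comes back as a list

# Add and modify attributes like dictionary entries.
tag['class'] = 'verybold'
tag['id'] = 1
print(tag)                   # <b class="verybold" id="1">Extremely bold</b>

# Delete attributes with del.
del tag['class']
del tag['id']
print(tag)                   # <b>Extremely bold</b>

# .get() returns None instead of raising KeyError.
print(tag.get('class'))      # None
```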
You can also search for DOM elements as needed, for example:

1. Build a document

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
The third method adds cookie handling, so you can fetch pages that require signing in (the example URL here is "http://www.baidu.com"):

print 'The third method'
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
response3 = urllib2.urlopen(url)
print response3.getcode()
print cj
print len(response3.read())

Of course, using these methods requires importing urllib2; the third also requires importing cookielib.

0x03: The realization of BeautifulSoup

The following is a brief introduction to the usage of BeautifulSoup. It is basically three steps: create a BeautifulSoup object, find the node you need, and extract its content.
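The three steps can be sketched as a minimal, self-contained example (Python 3 here; bs4 is assumed installed, and the tiny stand-in document is illustrative):

```python
from bs4 import BeautifulSoup

html_doc = '<html><body><p class="title"><b>The Dormouse\'s story</b></p></body></html>'

# Step 1: create the BeautifulSoup object
soup = BeautifulSoup(html_doc, 'html.parser')

# Step 2: find the node you need
node = soup.find('p', class_='title')

# Step 3: extract its content
print(node.get_text())   # The Dormouse's story
```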
The following code will be used as the example. It is a piece of "Alice in Wonderland" (a document referred to below as the Alice document):

html_doc = """..."""

Using BeautifulSoup to parse this code, you get a BeautifulSoup object, which can be output in a standard indented format:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc)
print(soup.prettify())

# A few simple ways to browse the structured data:
soup.title

# Find all links
for link in soup.find_all('a'):
    print(link.get('href'))
proxies = {'http': '...'}
# r = requests.get('...', proxies=proxies)

Two. The BeautifulSoup library

An HTML example is as follows:

<html>
<head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="Dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body>
</html>
# coding: utf-8
from bs4 import BeautifulSoup
import re

html_doc = """..."""

soup = BeautifulSoup(html_doc, 'html.parser', from_encoding='utf-8')

print 'All links'
links = soup.find_all('a')
for link in links:
    # print link
    print link.name, link['href'], link.get_text()

print 'Get a separate link'
link_code = soup.find('a', href='http://example.com/...')
BeautifulSoup: represents the entire document; for most purposes you can treat it as a Tag object.
Comment: a special type of NavigableString that holds the content of comments.
Tag: an HTML tag together with its name and attributes:

print type(soup.a)
# <class 'bs4.element.Tag'>
print soup.p.attrs
# {'class': ['title'], 'name': 'Dromouse'}

css_soup = BeautifulSoup('<p class="body strikeout"></p>')
css_soup.p['class']
# ['body', 'strikeout']
NavigableString: the string inside a tag:

print soup.p.string
# The Dormouse's story
These attributes alone are often useful enough:

soup.title
# <title>The Dormouse's story</title>
soup.title.name
# u'title'
soup.title.string
# u'The Dormouse's story'
soup.title.parent.name
# u'head'
soup.p
# <p class="title" name="Dromouse"><b>The Dormouse's story</b></p>
Beautiful Soup is a Python library that can extract data from HTML or XML files. Through your favorite parser, it implements the usual ways of navigating, searching, and modifying a document. Beautiful Soup can save you hours or even days of work.
Quick Start. Use the following HTML as an example.
html_doc = """..."""
Use BeautifulSoup to parse this code and you get a BeautifulSoup object, which can be output in a standard indented format:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc)
print(soup.prettify())
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="Dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
"""
This is fictional text, but it can be understood. Just as learning from a book means extracting the essence of certain words rather than all of them, we can extract everything or only what we want, such as the title:

soup.title

In fact, soup.title is how you get the title in Python. In the HTML code you can see there are two "titles": the one above is the page title set in the HTML head, and the one below is a heading in the body; what soup.title crawls is the former.
...the id attribute link2, and the id attribute link3.
Example: select all tags whose class attribute is red, whose id attribute is link2, or whose id attribute is link3.

8. Find by whether an attribute exists
Example: look for a tags that have an href attribute.

9. Find by the value of an attribute
Example: select a tags whose href attribute is http://example.com/lacie.
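Points 8 and 9 can be sketched with find_all (a minimal example, assuming bs4 is installed; the snippet deliberately omits href on the third link to show attribute-existence filtering):

```python
from bs4 import BeautifulSoup

html = '''
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a class="sister" id="link3">Tillie</a>
'''
soup = BeautifulSoup(html, 'html.parser')

# 8. Find by whether an attribute exists: only link1 and link2 have href
print([a['id'] for a in soup.find_all('a', href=True)])   # ['link1', 'link2']

# 9. Find by attribute value
print(soup.find_all('a', href='http://example.com/lacie')[0].get_text())  # Lacie

# Find by id works the same way
print(soup.find_all('a', id='link2')[0].get_text())       # Lacie
```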
Beautiful Soup is a Python library that extracts data from HTML or XML files. It can use your favorite parser to implement idiomatic ways of navigating, searching, and modifying a document.

Installation and use of BeautifulSoup
Windows installation method: pip install beautifulsoup4

First, simple use of beautifulsoup4 (using the Alice document):

from bs4 import BeautifulSoup
import re

html = """
<p class="title">The Dormouse's story</p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
<p class="story">...</p>
"""

3. Create a BeautifulSoup object

# Create a BeautifulSoup object
soup = BeautifulSoup(html)

If the HTML content exists in a file a.html, you can instead create the object from the file:

soup = BeautifulSoup(open('a.html'))
7. Parent node and ancestor nodes
8. Sibling nodes
# Traversing the document tree: you can select nodes directly by tag name, which makes selection fast; however, if multiple identical tags exist, only the first one is returned.

4. Searching the document tree
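The traversal behaviors above, including parent and sibling access, can be sketched as follows (assuming bs4 is installed; the two-paragraph document is illustrative):

```python
from bs4 import BeautifulSoup

html = ('<html><head><title>T</title></head>'
        '<body><p id="p1">one</p><p id="p2">two</p></body></html>')
soup = BeautifulSoup(html, 'html.parser')

# Tag-name selection returns only the first match.
print(soup.p['id'])                           # p1

# Parent and ancestor nodes.
print(soup.title.parent.name)                 # head
print([t.name for t in soup.title.parents])   # ['head', 'html', '[document]']

# Sibling nodes (no whitespace between the <p> tags here,
# so next_sibling is the second <p> itself).
print(soup.p.next_sibling['id'])              # p2
```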
1. Five filters
# Searching the document tree: BeautifulSoup defines many search methods. Here we will introduce two of them: find() and find_all(). The parameters and usage of the other methods are similar.
2. find_all
print(soup.find_all("a", id="link3"))

# 15. Find tags with class_="sister" under the a tag
print(soup.find_all("a", class_="sister"))

# 16. The text parameter searches the document's string content. Like the optional values of the name parameter, text accepts a string, a regular expression, a list, or True.
print(soup.find_all(text="Elsie"))
print(soup.find_all(text=["Tillie", "Elsie", "Lacie"]))

# 17. Limit the number of search results
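Points 16 and 17 can be sketched on a small document (assuming bs4 is installed; recent versions also accept string= as the newer spelling of text=):

```python
from bs4 import BeautifulSoup
import re

html = ('<a class="sister" id="link1">Elsie</a>'
        '<a class="sister" id="link2">Lacie</a>'
        '<a class="sister" id="link3">Tillie</a>')
soup = BeautifulSoup(html, 'html.parser')

# text accepts a string, a list, a regular expression, or True;
# results come back in document order.
print(soup.find_all(text='Elsie'))             # ['Elsie']
print(soup.find_all(text=['Tillie', 'Elsie'])) # ['Elsie', 'Tillie']
print(soup.find_all(text=re.compile('cie')))   # ['Lacie']

# limit caps the number of tags returned.
print(len(soup.find_all('a', limit=2)))        # 2
```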