I. Installation of Beautiful Soup Library
Windows platform: open cmd with "Run as administrator"
Run: pip install beautifulsoup4
Installation test: from bs4 import BeautifulSoup
soup = BeautifulSoup('<p>data</p>', 'html.parser')
print(soup.prettify())
II. Basic elements of the Beautiful Soup Library
1. BeautifulSoup class
from bs4 import BeautifulSoup
soup2 = BeautifulSoup(open("d://demo.html"), "html.parser")
soup = BeautifulSoup("
2. Basic elements of the BeautifulSoup class, using <p class="title">...</p> as the example tag
Basic element | Description
Tag | A tag, the most basic unit of information organization; <> and </> mark its beginning and end
Name | The name of the tag; the name of <p>...</p> is 'p'. Format: <tag>.name
Attributes | The attributes of the tag, organized as a dictionary. Format: <tag>.attrs
NavigableString | The non-attribute string inside a tag, i.e. the text between <> and </>. Format: <tag>.string
Comment | The comment portion of a string inside a tag; a special Comment type
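The five element types listed above can be seen side by side in a minimal sketch. The HTML snippet and variable names here are invented for illustration and are not taken from demo.html.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Hypothetical snippet for illustration only (not the course's demo.html).
html = '<p class="title"><b>Hello</b></p><div><!--a comment--></div>'
soup = BeautifulSoup(html, "html.parser")

tag = soup.p                 # Tag: the first <p> element
tag_name = tag.name          # Name: the string 'p'
tag_attrs = tag.attrs        # Attributes: a dict; class may hold several values, so it is a list
text = tag.string            # NavigableString: the text inside <p><b>...</b></p>
comment = soup.div.string    # Comment: the text of <!-- ... --> without the markers

print(type(tag).__name__, tag_name, tag_attrs)
print(type(text).__name__, text)
print(type(comment).__name__, comment)
```

Note that .string hides the comment markers, so checking the type is the only way to tell a Comment apart from ordinary text.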
Review demo.html
>>> import requests
>>> r = requests.get("http://python123.io/ws/demo.html")
>>> demo = r.text
>>> demo
'
Tag: any tag that exists in HTML syntax can be accessed with soup.<tag>.
When an HTML document contains several elements with the same <tag>, soup.<tag> returns only the first one.
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(demo, "html.parser")
>>> soup.title
<title>This is a python demo page</title>
>>> soup.p
<p class="title"><b>The demo python introduces several python courses.</b></p>
>>> soup.a
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>
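The first-match behavior can also be checked offline with a small sketch. The two-link snippet and its example.com URLs are hypothetical stand-ins for the remote demo.html.

```python
from bs4 import BeautifulSoup

# Hypothetical two-link document, used here instead of the remote demo.html.
html = (
    '<p>'
    '<a href="http://example.com/first" id="link1">First</a> and '
    '<a href="http://example.com/second" id="link2">Second</a>'
    '</p>'
)
soup = BeautifulSoup(html, "html.parser")

first_link = soup.a           # only the first <a> in the document is returned
print(first_link["id"])       # attributes can be read with dictionary-style indexing
print(first_link.string)
print(soup.h1)                # a tag that does not occur in the document gives None
```

Because soup.<tag> gives None for a missing tag, it is worth checking the result before reading .name, .attrs, or .string from it.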
Python Web Crawler and Information Extraction: Introduction to the Beautiful Soup Library