Beautiful Soup is a Python library whose main purpose is to extract data from web pages.
Beautiful Soup provides simple, Pythonic functions for navigating, searching, and modifying the parse tree. It is a toolkit that parses a document and hands you the data you want to extract. Because it is simple, you can write a complete application with very little code.
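As a rough illustration of the three operations named above (navigating, searching, modifying), here is a minimal sketch. The sample markup and the replacement values are made up for this example, and the built-in "html.parser" backend is assumed so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, used only for illustration.
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">Once upon a time there were two sisters:
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a> and
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>.</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Navigating: attribute access walks down the tree to the first matching tag.
title_text = soup.head.title.string

# Searching: find_all returns every tag that matches the filters.
links = [a["href"] for a in soup.find_all("a", class_="sister")]

# Modifying: change an attribute and the text of a tag in place.
first_link = soup.find("a", id="link1")
first_link["href"] = "http://example.com/elsie-renamed"
first_link.string = "Elsie (renamed)"

print(title_text)
print(links)
print(first_link)
```

Note that links is collected before the modification, so it still holds the original URLs.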
Beautiful Soup automatically converts input documents to Unicode and output documents to UTF-8.
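The encoding behaviour can be observed with a small sketch like the one below. It assumes a hypothetical ISO-8859-1 input that declares its charset in a meta tag. Detection of undeclared encodings depends on the input and on which optional detection libraries are installed, so treat this as an illustration rather than a guarantee.

```python
from bs4 import BeautifulSoup

# Hypothetical input: markup encoded as ISO-8859-1 bytes, with the charset declared.
raw = (
    '<html><head><meta charset="iso-8859-1"></head>'
    "<body><p>caf\u00e9</p></body></html>"
).encode("iso-8859-1")

soup = BeautifulSoup(raw, "html.parser")

# Inside the tree, text is held as Unicode strings.
text = soup.p.string

# Serialising the document produces bytes, UTF-8 by default.
utf8_bytes = soup.encode()

print(soup.original_encoding)  # the encoding Beautiful Soup settled on for the input
print(text)
print(utf8_bytes)
```

If the guess is wrong for a particular page, the from_encoding argument of the BeautifulSoup constructor lets you state the input encoding explicitly.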
Beautiful Soup Object Types
Beautiful Soup transforms a complex HTML document into a tree structure. Each node is a Python object, and all objects fall into 4 types: Tag, NavigableString, BeautifulSoup, and Comment.
(1) Tag
A Tag corresponds to a tag in the HTML document, such as:
<title>The Dormouse's story</title>
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
The title and a tags above, together with the content they enclose, are each a Tag.
For example, use Beautiful Soup to get tags (soup is an already parsed BeautifulSoup object):
print(soup.title)
print(soup.head)
print(soup.a)
print(soup.p)
print(type(soup.a))
print(soup.name)
print(soup.head.name)
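The lines above assume that a soup object already exists. A self-contained sketch might look like the following. The sample markup is an assumption built from the two tags shown earlier, and "html.parser" is chosen only because it ships with Python.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample document built around the tags shown earlier.
html = """<html><head><title>The Dormouse's story</title></head>
<body><p class="title"><b>The Dormouse's story</b></p>
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

print(soup.title)             # first <title> tag, including its content
print(soup.a)                 # first <a> tag
print(isinstance(soup.a, Tag))
print(soup.name)              # the soup object itself reports the name [document]
print(soup.head.name)         # head

# A Tag also exposes its attributes as a dictionary.
print(soup.a.attrs)
print(soup.a["href"])
```

Attribute access such as soup.a returns only the first matching tag. To collect every match, use soup.find_all("a") instead.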
(2) NavigableString
Once you have a tag, how do you get the text inside it?
For example, you can use .string to get the inner text:
print(soup.p.string)
This makes it easy to get the contents of a tag, which would be much more complicated with regular expressions. The type of the result is NavigableString, that is, a string that can be navigated.
print(type(soup.p.string))
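A minimal sketch of .string in action follows. The two markup snippets are made up for illustration. The second one shows a case worth knowing about: when a tag has more than one child, .string returns None, and get_text() is one way to collect the text instead.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical markup: a tag whose only content is (nested) text.
soup = BeautifulSoup("<p class='title'><b>The Dormouse's story</b></p>", "html.parser")
text = soup.p.string
print(text)
print(isinstance(text, NavigableString))
print(isinstance(text, str))  # NavigableString is a subclass of str

# Hypothetical markup: a tag with several children.
mixed = BeautifulSoup("<p>one <b>two</b></p>", "html.parser")
print(mixed.p.string)         # None, because <p> has more than one child
print(mixed.p.get_text())     # concatenated text of all descendants
```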
(3) BeautifulSoup
The BeautifulSoup object represents the entire contents of a document. Most of the time you can treat it as a special Tag object.
For example, you can get the type of its name, its name, and its attributes:
print(type(soup.name))
print(soup.name)
print(soup.attrs)
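To make the "special Tag" idea concrete, here is a short sketch using a made-up one-paragraph document. It relies on the fact that the BeautifulSoup class is a subclass of Tag in bs4.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical minimal document.
soup = BeautifulSoup("<html><body><p>text</p></body></html>", "html.parser")

print(type(soup))
print(isinstance(soup, Tag))  # the soup object is itself a kind of Tag
print(soup.name)              # [document]
print(soup.attrs)             # the document object carries no attributes of its own
```

Because it behaves like a Tag, the same navigation and search methods (for example soup.find_all) work on the whole document as on any individual tag.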
(4) Comment
A Comment object is a special type of NavigableString. When it is output through .string, the content does not include the comment delimiters, so if it is not handled carefully it can cause unexpected trouble in text processing.
For example, inspect a tag whose content is a comment:
print(soup.a)
print(soup.a.string)
print(type(soup.a.string))
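One way to guard against this pitfall is to check the type before using the string. The sketch below assumes a hypothetical link whose only content is an HTML comment. The helper name visible_text is invented for this example and is not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Hypothetical markup: the link's only child is an HTML comment.
html = '<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>'
soup = BeautifulSoup(html, "html.parser")

content = soup.a.string
print(soup.a)         # the comment delimiters are visible in the tag itself
print(content)        # but .string yields only the text inside the comment
print(type(content))


def visible_text(tag):
    """Hypothetical helper: return tag.string unless it is a comment."""
    value = tag.string
    if isinstance(value, Comment):
        return None
    return value


print(visible_text(soup.a))
```

Without the isinstance check, the commented-out text would be indistinguishable from ordinary link text once it has been printed or stored.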
This article is from the "CAS Computer Training" blog. Reprinting is not permitted.
Python language learning: specific uses of the four Beautiful Soup object types