Python language learning: the four object types in Beautiful Soup

Source: Internet
Author: User

Beautiful Soup is a Python library whose main purpose is to extract data from web pages.


Beautiful Soup provides simple, Pythonic functions for navigating, searching, and modifying the parse tree. It is a toolkit that parses a document and hands users the data they want to scrape. Because it is simple, a complete application can be written with very little code.


Beautiful Soup automatically converts input documents to Unicode and output documents to UTF-8.
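The conversion described above can be illustrated with a minimal sketch. The sample bytes and the variable names (raw, text, encoded) are assumptions made for illustration, and the built-in "html.parser" backend is chosen only so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Raw bytes as they might arrive from the network (assumed sample);
# the <meta> tag declares the document encoding as UTF-8.
raw = b'<html><head><meta charset="utf-8"></head><body><p>caf\xc3\xa9</p></body></html>'

soup = BeautifulSoup(raw, "html.parser")

# Inside the tree, text is held as Unicode (a str subclass).
text = soup.p.string

# encode() returns bytes; its default encoding is UTF-8.
encoded = soup.p.encode()

print(repr(text))
print(encoded)
```

If a document does not declare its encoding, Beautiful Soup has to guess, so the result may differ for such input.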




Beautiful Soup Object Types


Beautiful Soup transforms a complex HTML document into a tree structure in which each node is a Python object. All of these objects fall into four types: Tag, NavigableString, BeautifulSoup, and Comment.
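As a rough illustration of the four types, the sketch below walks every node of a small tree and counts the class of each one. The one-line HTML sample and the names html and counts are assumptions made for this example.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Assumed sample: one line of HTML, so no whitespace-only text nodes appear.
html = "<html><head><title>T</title></head><body><p>Hi<!-- note --></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# soup.descendants yields every node below the root, depth first.
counts = Counter(type(node).__name__ for node in soup.descendants)

print(type(soup).__name__)  # the root object itself
print(dict(counts))
```

The root is the BeautifulSoup object, and every node below it in this sample is a Tag, a NavigableString, or a Comment.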


(1) Tag


A Tag corresponds to a tag in the HTML document, for example:

<title>The Dormouse's story</title>

<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>

An HTML tag such as title or a, together with the content it encloses, is a Tag.


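For example, use Beautiful Soup to get tags. The html_doc sample here is an assumption, built from the tags shown above so that the snippet can run on its own:

from bs4 import BeautifulSoup
# Assumed sample document for illustration
html_doc = "<html><head><title>The Dormouse's story</title></head><body><p class='title'><b>The Dormouse's story</b></p><a class='sister' href='http://example.com/elsie' id='link1'>Elsie</a></body></html>"
soup = BeautifulSoup(html_doc, "html.parser")
print(soup.title)      # the first title tag in the document
print(soup.head)
print(soup.a)          # only the first a tag is returned
print(soup.p)
print(type(soup.a))    # the class of a tag object
print(soup.name)       # the name of the BeautifulSoup object itself
print(soup.head.name)  # the name of the head tag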
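Besides its name, a Tag also carries the attributes written in the HTML. The following is a minimal sketch of reading them, assuming the a tag shown earlier as input; the variable names html and link are illustrative only.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Assumed sample: the <a> tag from the text above.
html = '<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>'
soup = BeautifulSoup(html, "html.parser")

link = soup.a  # attribute-style access returns the first matching tag

print(isinstance(link, Tag))
print(link.name)          # the tag name
print(link["href"])       # dictionary-style access to one attribute
print(link["class"])      # class can hold several values, so it is a list
print(link.get("title"))  # get() returns None when the attribute is missing
print(link.attrs)         # all attributes as a dictionary
```

Using get() instead of square brackets avoids a KeyError when an attribute may be absent.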




(2) NavigableString

Once you have a tag, how do you get the text inside it?

Use .string to get the text inside a tag, for example:

print(soup.p.string)

This makes it easy to get a tag's text, which would be much more complicated with regular expressions. The value's type is NavigableString, a string that can be navigated within the tree.

print(type(soup.p.string))
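A small sketch of how .string behaves, under the assumption of two hand-written p tags: it returns a NavigableString when a tag has a single string inside it (directly or through a single child tag), and None when the tag has several children. The variable names are illustrative.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Assumed sample: the first <p> wraps one <b>; the second <p> has two children.
html = "<p id='one'><b>The Dormouse's story</b></p><p id='two'>Once <b>upon</b></p>"
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("p")

text = first.string       # follows the single <b> child down to its string
ambiguous = second.string  # several children, so there is no single string

print(type(text).__name__, repr(text))
print(ambiguous)

# str() gives a plain Python string that no longer refers to the tree.
plain = str(text)
print(type(plain).__name__)
```

Because NavigableString is a subclass of str, it can be compared and sliced like any other string.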




(3) BeautifulSoup


The BeautifulSoup object represents the entire content of a document. Most of the time you can treat it as a special Tag object.

For example, you can get its type, name, and attributes:

print(type(soup.name))

print(soup.name)

print(soup.attrs)
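The "special Tag" idea can be checked directly. This is a minimal sketch with an assumed one-line document; it relies only on the BeautifulSoup class being a subclass of Tag.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Assumed sample document.
soup = BeautifulSoup("<html><head><title>Demo</title></head><body></body></html>", "html.parser")

print(type(soup).__name__)    # the class of the root object
print(isinstance(soup, Tag))  # the root can be used wherever a Tag is expected
print(soup.name)              # a placeholder name, since no real tag exists for the root
print(soup.attrs)             # the root has no HTML attributes
```

Since the root corresponds to no actual HTML tag, its name is the placeholder "[document]" and its attribute dictionary is empty.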




(4) Comment


A Comment object is a special type of NavigableString. When it is printed, the output does not include the comment delimiters, so a comment can be mistaken for ordinary text. If this is not handled carefully, it can cause unexpected problems in text processing.

For example, assume the content of the a tag is a comment, as in <a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>:

print(soup.a)

print(soup.a.string)

print(type(soup.a.string))
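One way to avoid treating a comment as ordinary text is to check the type before using the string. The sketch below assumes an a tag whose only content is a comment; visible_string is a hypothetical helper written for this example, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Assumed sample: the link text is wrapped in an HTML comment.
html = '<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>'
soup = BeautifulSoup(html, "html.parser")

content = soup.a.string


def visible_string(tag):
    """Hypothetical helper: return tag.string, or None if it is a comment."""
    value = tag.string
    if isinstance(value, Comment):
        return None
    return value


print(soup.a)                  # the markup keeps the comment delimiters
print(repr(content))           # the string itself has no delimiters
print(type(content).__name__)
print(visible_string(soup.a))
```

Without the isinstance check, the comment text would look exactly like normal link text to the rest of the program.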


This article is from the "CAS Computer Training" blog; reprinting is declined.
