Generally, the first option is preferable to html.parser as shipped with Python 3.2 and later.
Beautiful Soup basic classes
Tag: the most basic unit for organizing information; its beginning and end are marked with <> and </> respectively
from bs4 import BeautifulSoup
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
tag = soup.b
type(tag)
# <class 'bs4.element.Tag'>
Any tag that exists in the HTML syntax can be obtained with soup.<tag>
When the HTML document contains several matching <tag> elements, soup.<tag> returns the first one
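As a quick check of the "first match" behaviour, here is a minimal sketch. The two-paragraph HTML string and the variable names are made up for illustration, and html.parser is used only because it needs no extra install.

```python
from bs4 import BeautifulSoup

# Hypothetical two-paragraph document, just for this illustration
html = '<p>first</p><p>second</p>'
soup = BeautifulSoup(html, 'html.parser')

# soup.p only gives the first <p>; find_all() gives every match
first_text = soup.p.get_text()
all_p_count = len(soup.find_all('p'))

print(first_text)    # first
print(all_p_count)   # 2
```

If you need more than the first match, find_all is the usual way to go.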
Tag attributes
Name: the name of the tag; for <p>...</p> the name is 'p'. Format: <tag>.name
Each <tag> has its own name, obtained through <tag>.name, as a string
Attributes: the attributes of the tag, in dictionary format. Format: <tag>.attrs
A <tag> can have zero or more attributes; the type is a dictionary
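A small sketch of .name and .attrs, assuming a one-tag snippet of my own (the id="x1" attribute is invented for the example). Note that class is a multi-valued attribute in HTML, so Beautiful Soup stores it as a list.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet; the id attribute is added only for illustration
soup = BeautifulSoup('<b class="boldest" id="x1">Extremely bold</b>', 'html.parser')
tag = soup.b

tag_name = tag.name          # the tag name as a string
tag_attrs = tag.attrs        # dictionary of attributes
tag_id = tag['id']           # a tag can also be indexed like a dictionary
no_href = tag.get('href')    # .get() returns None for a missing attribute

print(tag_name)              # b
print(tag_attrs['class'])    # ['boldest']
print(tag_id)                # x1
print(no_href)               # None
```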
NavigableString: the non-attribute string inside a tag, that is, the string between <> and </>. Format: <tag>.string
NavigableString can span multiple levels: when a tag's only child is another tag, <tag>.string returns that child's string
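To make the "spans multiple levels" point concrete, here is a rough sketch. The snippets are my own, and the second one shows the case where .string gives up and returns None because the tag has more than one child.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

soup = BeautifulSoup('<p class="title"><b>The Dormouse\'s story</b></p>', 'html.parser')

inner = soup.b.string   # string directly inside <b>
outer = soup.p.string   # <p> has a single child <b>, so .string reaches through it

# Hypothetical counter-example: <p> has two children (a string and a tag)
mixed = BeautifulSoup('<p>one <b>two</b></p>', 'html.parser')
ambiguous = mixed.p.string

print(type(inner))   # <class 'bs4.element.NavigableString'>
print(outer)         # The Dormouse's story
print(ambiguous)     # None
```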
Comment: a Comment object is a special type of NavigableString object.
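A minimal sketch of how a comment shows up, assuming a tiny snippet of my own. Since .string on its own does not tell you whether you got a comment or ordinary text, an isinstance check is one way to tell them apart.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString

# Hypothetical snippet containing only an HTML comment inside <b>
soup = BeautifulSoup('<b><!--This is a comment--></b>', 'html.parser')
comment = soup.b.string

is_comment = isinstance(comment, Comment)
is_navigable = isinstance(comment, NavigableString)
comment_text = str(comment)   # the text without the <!-- --> markers

print(is_comment)     # True
print(is_navigable)   # True
print(comment_text)   # This is a comment
```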
Traversing HTML content with bs4
html_doc = """
<html>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
P.S. This example is used in the rest of the article.
Downward traversal:
.contents: list of child nodes; stores all of the <tag>'s children in a list
.children: iterator over the child nodes, similar to .contents, used to loop over the children
.descendants: iterator over all descendant nodes, used to loop over every node below the tag
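The difference between the three is easier to see side by side. This is only a sketch on a shortened, made-up paragraph (the <i> tag is added so that there is something nested); text between tags counts as a node too.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical shortened paragraph with one nested <i> tag
html = '<p class="story">Once <a id="link1"><i>Elsie</i></a> and <a id="link2">Lacie</a></p>'
soup = BeautifulSoup(html, 'html.parser')
p = soup.p

contents = p.contents                  # a real list of the direct children
children = list(p.children)            # same nodes, but produced by an iterator
descendants = list(p.descendants)      # children, grandchildren, and so on
descendant_tags = [d.name for d in descendants if isinstance(d, Tag)]

print(len(contents))      # 4
print(len(descendants))   # 7
print(descendant_tags)    # ['a', 'i', 'a']
```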
Sibling (parallel) traversal:
.next_sibling: returns the next sibling node in HTML text order
.previous_sibling: returns the previous sibling node in HTML text order
.next_siblings: iterator, returns all following sibling nodes in HTML text order
for sibling in soup.a.next_siblings:
    print(sibling)
.previous_siblings: iterator, returns all preceding sibling nodes in HTML text order
The output is too long to show here, so try it yourself. I'm too lazy to paste a screenshot for the last one ( ̄ _,  ̄)
for sibling in soup.a.previous_siblings:
    print(sibling)
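One thing that tends to surprise people: siblings are nodes, not only tags, so the text between two tags shows up as well. Below is a rough sketch on a cut-down version of the three links (the snippet and variable names are mine) that keeps only the Tag siblings.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical cut-down version of the three sister links
html = ('<p><a id="link1">Elsie</a>, <a id="link2">Lacie</a> and '
        '<a id="link3">Tillie</a>;</p>')
soup = BeautifulSoup(html, 'html.parser')

first = soup.a
next_node = first.next_sibling   # the text right after the first link, not a tag

# Keep only real tags when walking forwards from the first link
later_ids = [s['id'] for s in first.next_siblings if isinstance(s, Tag)]

# Walking backwards from the last link yields the nearest sibling first
last = soup.find('a', id='link3')
earlier_ids = [s['id'] for s in last.previous_siblings if isinstance(s, Tag)]

print(repr(next_node))   # ', '
print(later_ids)         # ['link2', 'link3']
print(earlier_ids)       # ['link2', 'link1']
```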
Upward traversal:
.parent: the node's parent tag
.parents: iterator over the node's ancestor tags, used to loop over the ancestors
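A short sketch of walking upwards, again on a made-up one-line document. The last ancestor is the BeautifulSoup object itself, whose name is '[document]'.

```python
from bs4 import BeautifulSoup

# Hypothetical minimal document with one link
soup = BeautifulSoup('<html><body><p><a>Elsie</a></p></body></html>', 'html.parser')
a = soup.a

parent_name = a.parent.name                     # direct parent only
ancestor_names = [p.name for p in a.parents]    # every ancestor, innermost first

print(parent_name)      # p
print(ancestor_names)   # ['p', 'body', 'html', '[document]']
```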
The key usage of the BeautifulSoup library is probably find and its keyword arguments. The official documentation is very detailed, so I won't list those keyword arguments here. Mainly because my roommates will kill me if I don't go to bed.
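To give a feel for those keyword arguments, here is a minimal sketch against the Dormouse document. The particular queries are my own picks, not a full tour; class_ has a trailing underscore because class is a reserved word in Python.

```python
from bs4 import BeautifulSoup

html_doc = """
<html>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html_doc, 'html.parser')

first_link = soup.find('a')                     # first <a>, same as soup.a
link2 = soup.find('a', id='link2')              # filter on an attribute
sisters = soup.find_all('a', class_='sister')   # every match, as a list
hrefs = [a['href'] for a in sisters]
missing = soup.find('table')                    # no match gives None

print(first_link.get_text())   # Elsie
print(link2.get_text())        # Lacie
print(len(sisters))            # 3
print(missing)                 # None
```

Because find returns None when nothing matches, it is worth checking the result before chaining further attribute access on it.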
Here is a small find example:
Output: