I. Tag lookup
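The sessions in this article use a soup object without showing how it was created. A minimal setup sketch follows. It assumes the sample document is the familiar "three sisters" HTML, which matches the outputs printed here, and it assumes the html.parser backend. The name html_doc and the parser choice are assumptions, not something the article states.

```python
from bs4 import BeautifulSoup

# Assumed sample document, chosen to match the outputs shown in this article.
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>
"""

# html.parser ships with Python; the article does not say which parser it used.
soup = BeautifulSoup(html_doc, "html.parser")

# Quick sanity check: count the a tags in the document.
print(len(soup.find_all("a")))
```

Any other parser supported by BeautifulSoup should work for these examples, though the exact whitespace handling may differ slightly.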
(1) Find all a tags
>>> for x in soup.find_all('a'):
...     print(x)
...
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
(2) Find all a tags whose href attribute value contains the keyword "lacie"
>>> import re
>>> for x in soup.find_all('a', href=re.compile('lacie')):
...     print(x)
...
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
(3) Find all a tags whose string content contains the keyword "Elsie"
>>> for x in soup.find_all('a', string=re.compile('Elsie')):
...     print(x)
...
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
(4) Find all child tags of the body tag and print them in a loop
>>> import bs4
>>> for x in soup.find('body').children:
...     if isinstance(x, bs4.element.Tag):  # use isinstance to filter out whitespace-only text nodes
...         print(x)
...
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
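The isinstance check matters because .children yields text nodes as well as tags, including the newlines between elements. The sketch below illustrates this on a small, made-up document; the HTML string and variable names are hypothetical and only serve the illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical two-paragraph document with newlines between the tags.
html = '<body>\n<p class="title">One</p>\n<p class="story">Two</p>\n</body>'
soup = BeautifulSoup(html, "html.parser")

# .children yields every direct child: tags and the text nodes between them.
children = list(soup.find("body").children)
kinds = [type(child).__name__ for child in children]

# Keep only real tags, dropping the whitespace-only text nodes.
tags_only = [child for child in children if isinstance(child, Tag)]

print(kinds)
print([tag.get_text() for tag in tags_only])
```

Without the filter, the loop would also print the blank text nodes, which show up as empty lines in the output.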
II. Information extraction (link extraction)
(1) Parse the tag structure, find all a tags, extract the value of the href attribute (that is, the link) from each a tag, and store it in an initially empty list:
>>> linklist = []
>>> for x in soup.find_all('a'):
...     link = x.get('href')
...     if link:
...         linklist.append(link)
...
>>> for x in linklist:  # verify: loop over linklist and print each link
...     print(x)
...
http://example.com/elsie
http://example.com/lacie
http://example.com/tillie
Summary: link extraction <---> attribute content extraction <---> x.get('href')
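The same extraction can be written more compactly. This is only a sketch of one possible variant, not the article's method: extract_links is a hypothetical helper, and it relies on find_all accepting href=True to select only tags that carry an href attribute.

```python
from bs4 import BeautifulSoup


def extract_links(soup):
    """Return the href value of every a tag that has one (hypothetical helper)."""
    # href=True keeps only a tags that actually carry an href attribute.
    return [a["href"] for a in soup.find_all("a", href=True)]


# Made-up snippet: the middle a tag has no href and should be skipped.
html = (
    '<p><a href="http://example.com/elsie">Elsie</a> '
    '<a name="anchor">no link</a> '
    '<a href="http://example.com/lacie">Lacie</a></p>'
)
links = extract_links(BeautifulSoup(html, "html.parser"))
print(links)
```

Using x.get('href') with an if check, as in the session above, gives the same result; the list comprehension just folds the filter into the search.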
(2) Parse the tag structure, find all a tags whose href contains the keyword "elsie", and store the links in an initially empty list:
>>> linklst = []
>>> for x in soup.find_all('a', href=re.compile('elsie')):
...     link = x.get('href')
...     if link:
...         linklst.append(link)
...
>>> for x in linklst:  # verify: loop over linklst and print each link
...     print(x)
...
http://example.com/elsie
Summary: when searching for a tags, add a regular-expression match on the href attribute value <---> href=re.compile('elsie')
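One thing to watch with the regular-expression filter is case: the pattern is matched against the attribute value as written, so a capitalized keyword does not match a lowercase URL. The sketch below shows the difference on a made-up snippet; using re.IGNORECASE is a suggestion here, not something the article does.

```python
import re
from bs4 import BeautifulSoup

# Made-up snippet with lowercase URLs and capitalized link text.
html = (
    '<a href="http://example.com/elsie">Elsie</a>'
    '<a href="http://example.com/lacie">Lacie</a>'
)
soup = BeautifulSoup(html, "html.parser")

# Case-sensitive pattern: "Elsie" does not occur in any href value.
exact = [a["href"] for a in soup.find_all("a", href=re.compile("Elsie"))]

# Case-insensitive pattern: matches the lowercase "elsie" in the URL.
relaxed = [
    a["href"]
    for a in soup.find_all("a", href=re.compile("elsie", re.IGNORECASE))
]

print(exact)
print(relaxed)
```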
(3) Parse the tag structure, find all a tags whose string content contains the keyword "Elsie", and store the results in an initially empty list. The session below prints the string content of every a tag with get_text():
>>> for x in soup.find_all('a'):
...     string = x.get_text()
...     print(string)
...
Elsie
Lacie
Tillie
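Item (3) describes keeping only the a tags whose text contains a keyword and collecting the result in a list. A minimal sketch of that step follows, assuming the string argument of find_all (available in recent BeautifulSoup 4 releases) and a made-up three-link snippet; the list name names is hypothetical.

```python
import re
from bs4 import BeautifulSoup

# Made-up snippet with three links.
html = (
    '<a href="http://example.com/elsie">Elsie</a>'
    '<a href="http://example.com/lacie">Lacie</a>'
    '<a href="http://example.com/tillie">Tillie</a>'
)
soup = BeautifulSoup(html, "html.parser")

names = []
# string=... restricts the search to a tags whose text matches the pattern.
for a in soup.find_all("a", string=re.compile("Elsie")):
    names.append(a.get_text())

print(names)
```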
Python's BeautifulSoup tag lookup and information extraction