from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
# Parser choice is an assumption: the built-in "html.parser" needs no extra install
soup = BeautifulSoup(html, "html.parser")
When writing CSS, tag names are written as-is, class names are prefixed with a dot (.), and id names are prefixed with a hash (#). We can filter elements with the same syntax using soup.select(), which returns a list.
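A minimal sketch of the three selector forms side by side, assuming bs4 4.7 or later (where select() is handled by the soupsieve package) and the "html.parser" parser. The HTML is a simplified, hypothetical copy of the sample document in which the Elsie link holds plain text. It shows that every call returns a list, even when there is only one match or none.

```python
from bs4 import BeautifulSoup

# Simplified copy of the sample document (hypothetical: Elsie has plain text here)
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
</p>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

by_tag = soup.select("a")          # bare tag name
by_class = soup.select(".sister")  # class name prefixed with "."
by_id = soup.select("#link2")      # id prefixed with "#"
no_match = soup.select("#link9")   # no element has this id

# Every call returns a list (a bs4 ResultSet), even with one or zero matches
for name, result in [("tag", by_tag), ("class", by_class), ("id", by_id), ("none", no_match)]:
    print(name, isinstance(result, list), len(result))
```

Because the result is always a list, code that expects a single element still has to index into it (for example by_id[0]) and should check for an empty list first.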
(1) Search by tag name
print(soup.select('title'))
# [<title>The Dormouse's story</title>]
print(soup.select('a'))
# [<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
print(soup.select('b'))
# [<b>The Dormouse's story</b>]
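Each item returned by a tag-name search is a Tag object, so attributes and text can be read from it directly. A minimal sketch, assuming the "html.parser" parser and a trimmed copy of the sample links:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the three links from the sample document
html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

links = []
for a in soup.select("a"):
    href = a.get("href")           # attribute value, or None if the attribute is missing
    text = a.get_text(strip=True)  # may be empty: the Elsie link only contains an HTML comment
    links.append((href, text))

for href, text in links:
    print(href, repr(text))
```

Using a.get("href") rather than a["href"] avoids a KeyError on links that have no href attribute.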
(2) Search by class name
print(soup.select('.sister'))
# [<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
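A class selector can also be attached to a tag name, and it still matches elements that carry several classes. The sketch below assumes a hypothetical snippet in which the second link has an extra class "eldest"; that class does not appear in the original document.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the second link carries two classes
html = """
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister eldest" id="link2">Lacie</a>
</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html, "html.parser")

stories = soup.select("p.story")        # tag name + class: <p> elements with class "story"
sisters = soup.select(".sister")        # matches even when the element has other classes too
eldest = soup.select(".sister.eldest")  # chained classes: the element must have both

print(len(stories), [a["id"] for a in sisters], [a["id"] for a in eldest])
```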
(3) Search by ID name
print(soup.select('#link1'))
# [<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
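Since an id is normally unique, it is often more convenient to get the element itself rather than a one-item list. A minimal sketch using select_one() (available in bs4 4.4 and later), which returns the first match or None:

```python
from bs4 import BeautifulSoup

html = """
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
"""
soup = BeautifulSoup(html, "html.parser")

link = soup.select_one("#link1")     # a single Tag
missing = soup.select_one("#link4")  # no such id, so None

print(link["href"])  # http://example.com/elsie
print(missing)       # None
```

Check for None before indexing into the result, because missing["href"] would raise a TypeError.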
(4) Combined search
Combined searches follow the same rules as combining tag names, class names, and id names in a CSS file. For example, to find the element whose id is link1 inside a p tag, separate the two parts with a space:
print(soup.select('p #link1'))
# [<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
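The space is CSS's descendant combinator: "A B" matches B anywhere inside A, at any depth. A short sketch, assuming a trimmed copy of the sample document:

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
</p>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

in_p = soup.select("p #link1")        # the link sits directly inside <p>
in_body = soup.select("body #link1")  # also matches: <p> is itself inside <body>
in_head = soup.select("head #link1")  # no match: the link is not inside <head>

print(len(in_p), len(in_body), len(in_head))
```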
Direct child tag lookup (use > between the parent and the child)
print(soup.select('head > title'))
# [<title>The Dormouse's story</title>]
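The difference between ">" and a space shows up as soon as an element is nested more than one level deep. A minimal sketch, assuming a trimmed copy of the sample document where the links sit inside a p tag inside body:

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

child = soup.select("body > a")        # empty: the links are children of <p>, not of <body>
descendant = soup.select("body a")     # all three links, at any depth under <body>
p_child = soup.select("p > a")         # all three links are direct children of <p>
title_child = soup.select("html > title")  # empty: <title> is a child of <head>

print(len(child), len(descendant), len(p_child), len(title_child))
```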
(5) Attribute lookup
You can also filter by attribute. Attributes go in square brackets directly after the tag they belong to: the attribute and the tag describe the same node, so there must be no space between them, otherwise nothing will match.
print(soup.select('a[href="http://example.com/elsie"]'))
# [<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
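Besides exact matches, CSS attribute selectors can test whether an attribute exists or how its value ends. The sketch below, assuming a trimmed copy of the sample links, also shows the pitfall described above: adding a space turns the bracket into a separate descendant step.

```python
from bs4 import BeautifulSoup

html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

exact = soup.select('a[href="http://example.com/elsie"]')  # exact value
has_id = soup.select("a[id]")                             # attribute present, any value
ends = soup.select('a[href$="tillie"]')                   # value ends with "tillie"

# With a space, the bracket means "an element *inside* <a> that has this href".
# No such element exists, so the result is empty.
spaced = soup.select('a [href="http://example.com/elsie"]')

print(len(exact), len(has_id), [a["id"] for a in ends], len(spaced))
```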
Attribute conditions can also be combined with the lookups above: parts that refer to different nodes are separated by spaces, while parts that refer to the same node are written together without spaces.
print(soup.select('p a[href="http://example.com/elsie"]'))
# [<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
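Putting the rules together, a minimal sketch assuming a trimmed copy of the sample document: conditions on one node are glued together, while spaces or ">" separate different nodes.

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# Tag, class, id and attribute all describe the same <a>, so no spaces between them
same_node = soup.select('a.sister#link2[href$="lacie"]')

# A space and ">" separate different nodes: body -> p.story -> direct child <a>
chained = soup.select('body p.story > a[href$="tillie"]')

# Here the space makes #link2 a descendant of a.sister, which never happens
broken = soup.select("a.sister #link2")

print([a.get_text() for a in same_node], [a["id"] for a in chained], len(broken))
```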
Python crawler: using BeautifulSoup's select() method