I used to write crawlers with urllib, but that is really troublesome. Below is how to crawl a simple page with requests + BeautifulSoup.
The details are all in the comments in the code, as you can see.
```python
# -*- coding: utf-8 -*-
"""
Created on Thu Jul 5 20:48:25 2018
@author: brave-man
blog: http://www.cnblogs.com/zrmw/

python3 + Anaconda (Spyder) + requests + BeautifulSoup
The environment here is the Anaconda Spyder setup described yesterday --
very convenient; those who use it know.
"""
import requests
from bs4 import BeautifulSoup
# from termcolor import colored
# colored controls the color of console text output. The network was poor
# here so termcolor was not installed, but it was tested at the office and
# the function parameters should be fine.
# print(colored("abc", "red"))

# Fetch the whole response page with the get method of the requests library
# and store it in res.
res = requests.get("https://www.cnblogs.com/zdong0103/p/8492779.html")
# (1) res.encoding = "utf-8"
soup = BeautifulSoup(res.text, "html.parser")
# print(soup) would dump the source of the whole response page to the console.
# If the printed output is garbled, add the line marked (1) above to set the
# encoding format; sometimes it is not necessary.

# Next, analyze the page source.
# Press F12 in the browser to view the page source. The article title is
# inside class="block_title". soup.select(".block_title") returns a list;
# take its first element, hence index 0. To get text from a tag, the .text
# attribute generally works. Likewise, the body is inside
# class="blogpost-body", and so on.
title = soup.select(".block_title")[0].text
texts = soup.select(".blogpost-body")[0].text
time = soup.select(".itemdesc span")[0].text
author = soup.select("#header")[0].text
print(title, author, time, texts)
```
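To see what `soup.select` does without touching the network, here is a minimal sketch against a made-up HTML snippet. The snippet itself is invented for illustration; only the selector style (`.class-name`, index `[0]`, then `.text`) mirrors the script above:

```python
from bs4 import BeautifulSoup

# A tiny, made-up page that mimics the structure used in the script above.
html = """
<div class="block_title">Hello, crawler</div>
<div class="blogpost-body">This is the article body.</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select() takes a CSS selector and returns a list of matching tags;
# [0] takes the first match, and .text extracts its inner text.
title = soup.select(".block_title")[0].text
body = soup.select(".blogpost-body")[0].text
print(title)  # Hello, crawler
print(body)   # This is the article body.
```

Because `select` always returns a list, an empty result means the selector matched nothing; checking the list length before indexing avoids an `IndexError` on pages whose layout differs.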
I have found that my writing ability is really poor; I will improve it gradually.
Python crawler (1): a requests + BeautifulSoup code example for crawling a simple page