1. Request the online resource:
import requests
res = requests.get('http://*******')
res.encoding = 'utf-8'
print(res.text)
This uses the get method of requests to fetch the HTML. Whether a page expects get, post, or another method can be checked in the page's request header information. Baidu's homepage, for example, can be fetched with get.
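As a small illustration of how the HTTP method shows up in a request, the sketch below builds (but does not send) a get and a post request with requests and inspects them; the URLs and parameters here are placeholder assumptions, not from the original post.

```python
import requests

# Hedged sketch: prepare a GET and a POST request without sending them,
# so we can inspect which HTTP method and URL each one would use.
# The example.com URLs and parameters are illustrative assumptions.
get_req = requests.Request('GET', 'http://example.com/search',
                           params={'q': 'python'}).prepare()
post_req = requests.Request('POST', 'http://example.com/login',
                            data={'user': 'a'}).prepare()

print(get_req.method, get_req.url)  # GET request with the query string appended
print(post_req.method)              # POST request carrying form data in the body
```

Preparing a request this way is handy when you want to verify what will actually go over the wire before calling `session.send()`.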
2. Parse the fetched page with BeautifulSoup:
from bs4 import BeautifulSoup
soup = BeautifulSoup(res.text, 'html.parser')
print(soup)  # you can see the contents of the web page
for news in soup.select('.news-item'):  # crawl some news information
    header = news.select('h1')[0].text  # news headline
    time = news.select('.time')[0].text  # time
    print(header, time)
Here you need to pay attention to the page's node structure: inspect the page source to see where each piece of information is stored, work through it step by step, and use for loops where appropriate.
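To check the selector logic without depending on a live page, the steps above can be exercised against a small in-memory HTML snippet; the `.news-item`, `h1`, and `.time` selectors follow the post, while the snippet itself is a made-up assumption.

```python
from bs4 import BeautifulSoup

# Hedged sketch: a tiny hand-written HTML fragment standing in for a real
# news page, so the selectors can be tested offline.
html = """
<div class="news-item"><h1>Headline A</h1><span class="time">10:00</span></div>
<div class="news-item"><h1>Headline B</h1><span class="time">11:30</span></div>
"""

soup = BeautifulSoup(html, 'html.parser')
items = []
for news in soup.select('.news-item'):      # one iteration per news block
    header = news.select('h1')[0].text      # headline text inside <h1>
    time = news.select('.time')[0].text     # time text inside .time
    items.append((header, time))
    print(header, time)
```

Working against a fixed snippet like this makes it easy to confirm the selectors match the page structure before pointing the crawler at the real site.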
A basic Python crawler (using requests and bs4)