Use Python to make the simplest crawler
--The Heart
# Method 1
import urllib2  # import the urllib2 library
response = urllib2.urlopen("http://www.baidu.com")  # send the request; the reply is wrapped in a response object
html = response.read()  # read the response body and assign it to the html variable
print html  # print it
# Method 2
import urllib2
req = urllib2.Request("http://www.baidu.com")  # build a Request object first
response = urllib2.urlopen(req)  # then pass the Request object to urlopen
html = response.read()
print html
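urllib2 exists only in Python 2; in Python 3 the same functionality lives in urllib.request. As a minimal sketch, assuming Python 3, the two methods above might look like this. The function names fetch_by_url and fetch_by_request and the 10-second timeout are illustrative choices, not part of the original code.

```python
from urllib.request import Request, urlopen


def fetch_by_url(url, timeout=10):
    """Method 1: pass the URL string straight to urlopen."""
    # timeout=10 is an assumed value; pick one that suits the target site.
    with urlopen(url, timeout=timeout) as response:
        return response.read()  # raw bytes of the response body


def fetch_by_request(url, timeout=10):
    """Method 2: wrap the URL in a Request object first."""
    req = Request(url)
    with urlopen(req, timeout=timeout) as response:
        return response.read()
```

Calling fetch_by_url("http://www.baidu.com") returns the page as bytes; decode it with the page's encoding before treating it as text. Both functions send the same request. The Request form becomes useful once extra information, such as headers, has to travel with the request.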
In general, a crawler like the ones above is likely to have its access restricted if it crawls a large number of pages, so it should disguise itself as a browser.
Here the request is disguised as a browser by sending a Chrome 39 User-Agent string.
import urllib2
# URL to request
url = "http://www.baidu.com"
# User-Agent header of the browser to impersonate
user_agent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
# Build a dictionary that maps the 'User-Agent' request header to the user_agent string
headers = {'User-Agent': user_agent}
# Create a new request whose headers are replaced by the ones defined above
req = urllib2.Request(url, headers=headers)
# Send the request to the server and get a response
response = urllib2.urlopen(req)
# Read the response content
the_page = response.read()
# Print the result
print the_page
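To see what the disguise changes, it helps to look at the headers before anything is sent. The sketch below assumes Python 3, where urllib2 became urllib.request. build_request and default_user_agent are hypothetical helper names, and the code runs without any network access.

```python
from urllib.request import Request, build_opener

# The Chrome 39 User-Agent string used in the example above.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36"
)


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries a browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def default_user_agent():
    """Return the User-Agent urllib sends when none is set explicitly."""
    # A fresh opener lists its default headers in `addheaders`.
    return dict(build_opener().addheaders)["User-agent"]


req = build_request("http://www.baidu.com")
# Request stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
print(default_user_agent())  # starts with "Python-urllib/"
```

The second printed line suggests why an undisguised crawler is easy to spot: by default urllib announces itself as Python-urllib, which a server may recognise and choose to restrict.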