I've dabbled in web crawling before, and in general Python is the best choice: it has a rich ecosystem of libraries for data collection, and its syntax is concise and readable, which makes crawlers easier to write and maintain.
Let's start with a few simple data-collection examples:
import requests

req = requests.get("http://www.baidu.com")
print(req.content)

This is a simple HTTP GET request.
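One detail worth knowing here: `req.content` is the raw bytes of the response, while `req.text` is the decoded string. A tiny offline sketch of the difference (the byte string below is a hand-made illustration, not a real HTTP response):

```python
# req.content holds raw bytes; req.text is those bytes decoded to str.
# Illustration with a hand-made byte string (not a real response body):
raw = "百度一下".encode("utf-8")   # what .content would look like
text = raw.decode("utf-8")         # what .text would give you
print(type(raw).__name__, type(text).__name__)  # → bytes str
```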
import urllib.request

req = urllib.request.urlopen("http://www.baidu.com")
print(req.read())

This is already a minimal crawler.
import requests

url = "http://www.baidu.com/login"
payload = {"username": "admin", "pwd": "mima123"}
req = requests.get(url, params=payload)
print(req.content)

This is an HTTP GET request with parameters; the result is the HTML of either the login-success page or the login-failure page.
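To tell those two outcomes apart programmatically, you can check the status code and look for a marker string in the HTML. A minimal sketch, assuming the success page contains the word "welcome" (the marker is hypothetical; use whatever your target page actually shows):

```python
def login_ok(status_code, html):
    """Heuristic success check: HTTP 200 plus a marker string in the page."""
    return status_code == 200 and "welcome" in html.lower()

print(login_ok(200, "<h1>Welcome, admin</h1>"))        # True
print(login_ok(200, "Invalid username or password"))   # False
```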
import time
import requests

req1 = requests.get("http://www.baidu.com")
print(req1.content)
time.sleep(5)  # wait 5 seconds before the next request
req2 = requests.get("http://www.sougou.com")
print(req2.content)
time.sleep(5)  # wait 5 seconds before the next request
# ...further requests can follow the same pattern

This fetches several pages in sequence, pausing 5 seconds between requests.
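The repeated fetch-then-sleep steps above can be folded into a loop. A sketch (the `fetch` callable is a placeholder: pass in something like `lambda u: requests.get(u).content` for real crawling):

```python
import time

def crawl(urls, fetch, delay=5):
    """Fetch each URL in turn, sleeping `delay` seconds between requests."""
    results = []
    for i, url in enumerate(urls):
        results.append(fetch(url))
        if i < len(urls) - 1:
            time.sleep(delay)  # pause between requests, not after the last one
    return results
```

Usage: `pages = crawl(["http://www.baidu.com", "http://www.sougou.com"], lambda u: requests.get(u).content)`.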
import urllib.parse
import urllib.request

url = "https://api.douban.com/v2/book/user/ahbei/collections"
data = {"status": "read", "rating": 3, "tag": "novel"}
data = urllib.parse.urlencode(data).encode("utf-8")
req = urllib.request.Request(url, data)
res = urllib.request.urlopen(req)
print(res.read())

This is a standard POST request; however, hitting the same site repeatedly this way makes it easy for your IP to get blocked.
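Two common ways to reduce the risk of a ban (a sketch, not a guarantee): pace your requests and vary the User-Agent header; `requests` also accepts a `proxies` dict if you have proxy servers available. The User-Agent strings and proxy address below are just examples:

```python
import random

# Example User-Agent strings; real crawlers keep a larger, up-to-date pool.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]

def polite_headers():
    """Rotate the User-Agent so requests look less uniform."""
    return {"User-Agent": random.choice(USER_AGENTS)}

# Hypothetical proxy address; substitute your own:
proxies = {"http": "http://10.10.1.10:3128"}
# requests.get(url, headers=polite_headers(), proxies=proxies)
```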
import urllib.parse
import urllib.request

url = "https://api.douban.com/v2/book/user/ahbei/collections"
data = {"status": "read", "rating": 3, "tag": "novel"}
query = urllib.parse.urlencode(data)
req = url + "?" + query  # unlike the POST above, the parameters go into the URL
res = urllib.request.urlopen(req)
print(res.read())

This is a GET request with parameters.
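The `requests` library does the same URL assembly for you via the `params` argument. A small sketch showing the manual query string and the `requests` equivalent:

```python
from urllib.parse import urlencode

data = {"status": "read", "rating": 3, "tag": "novel"}
query = urlencode(data)  # escapes values and joins them with "&"
url = "https://api.douban.com/v2/book/user/ahbei/collections" + "?" + query
print(url)
# With requests, the equivalent one-liner is:
# requests.get("https://api.douban.com/v2/book/user/ahbei/collections", params=data)
```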
import urllib.parse
import urllib.request

body_json = {
    # fill in the body fields here
}
req = urllib.request.Request("http://www.baidu.com/login",
                             data=urllib.parse.urlencode(body_json).encode("utf-8"))
res = urllib.request.urlopen(req)
print(res.read())

This is a POST request with a body.
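If the endpoint expects a JSON body rather than form-encoded fields, serialize the dict with `json.dumps` (with `requests` you can simply pass `json=body`, which also sets the Content-Type header). The field names below are reused from the earlier login example and are assumptions:

```python
import json

body = {"username": "admin", "pwd": "mima123"}
payload = json.dumps(body)  # the string that goes into the request body
print(payload)
# The requests equivalent, which sets Content-Type: application/json for you:
# requests.post("http://www.baidu.com/login", json=body)
```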
That's all for today; next time I'll share more articles about Python crawlers and interfaces.
Python web crawler