Python web crawler (1) -- url access and parameter settings, python -- url
Environment: Python2.7.9/Sublime Text 2/Chrome
1. Access the url and directly call the urllib library function.
import urllib2url='http://www.baidu.com/'response = urllib2.urlopen(url)html=response.read()print html
2. Access with parameters. The baidu search function is used as an example.
Use the Chrome browser to access the results. Set the Chrome search engine to baidu and enter test in the address bar. The results are as follows:
# Coding = utf-8import urllibimport urllib2 # url = 'https: // www.baidu.com/s'?#values= {'ie': 'utf-8', 'wd ': 'test'} # encapsulate the parameter data = urllib. urlencode (values) # assemble complete urlreq = urllib2.Request (url, data) # access complete urlresponse = urllib2.urlopen (req) html = response. read () print html
Run the Code. (If a Decode error occurs in Sublime Text, set Python. sublime-build to "encoding": "UTF-8 ".)
# Coding = utf-8import urllibimport urllib2 # url = 'https: // www.baidu.com/s'?#values= {'ie': 'utf-8', 'wd ': 'test'} # encapsulate the parameter data = urllib. urlencode (values) # assemble the complete url # req = urllib2.Request (url, data) url = url + '? '+ Data # access the complete url # response = urllib2.urlopen (req) response = urllib2.urlopen (url) html = response. read () print html
Run again and the result is
# Coding = utf-8import urllibimport urllib2 # url address # url = 'https: // encode {'ie': 'utf-8', 'wd ': 'test'} # encapsulate the parameter data = urllib. urlencode (values) # assemble the complete url # req = urllib2.Request (url, data) url = url + '? '+ Data # access the complete url # response = urllib2.urlopen (req) response = urllib2.urlopen (url) html = response. read () print html
Run again to enable normal access