In Python 3.x, we can get the content of a web page in two ways: with the standard urllib library or with the third-party requests library.
Target address: the National Geographic Chinese website
url = 'http://www.ngchina.com.cn/travel/'
urllib library
1. Import the library
from urllib import request
2. Get the content of the web page
with request.urlopen(url) as file:
    data = file.read()
    print(data)
Running this produces an error:
urllib.error.HTTPError: HTTP Error 403: Forbidden
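The error above can be caught instead of crashing the script. The following is a minimal sketch, not part of the original walkthrough: the helper name fetch and its opener parameter are hypothetical, introduced only so the error path can be exercised without touching the network.

```python
from urllib import error, request


def fetch(url, opener=request.urlopen):
    """Return (status, body) for url; body is None if the server refuses.

    `opener` is a hypothetical hook that defaults to request.urlopen.
    """
    try:
        with opener(url) as response:
            return response.status, response.read()
    except error.HTTPError as exc:
        # exc.code is the HTTP status (for example 403),
        # exc.reason is the accompanying message (for example 'Forbidden').
        print('Request failed:', exc.code, exc.reason)
        return exc.code, None
```

Calling fetch(url) on a site that rejects the default urllib client would then return the status code with a body of None rather than raising.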
The 403 error occurs mainly because the site blocks crawlers. The default urllib client identifies itself as a Python script, so the server refuses it. We can send header information with the request so that it looks like it comes from a browser.
To do that, we add a 'User-Agent' field to the request headers.
headers = {'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Mobile Safari/537.36'}
# Create the request
req = request.Request(url=url, headers=headers)
with request.urlopen(req) as response:
    # Read the response body and decode it
    data1 = response.read().decode('utf-8')  # decode() defaults to UTF-8
    print(data1)
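Decoding as UTF-8 works only if the page is actually served in UTF-8. As a hedged sketch, the charset can instead be read from the Content-Type header and used for decoding, falling back to UTF-8 when none is declared. The helper names charset_from_content_type and decode_body are hypothetical and not part of the original code.

```python
from email.message import Message


def charset_from_content_type(content_type, default='utf-8'):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg['Content-Type'] = content_type
    # get_content_charset() returns None when no charset is declared
    return msg.get_content_charset() or default


def decode_body(raw, content_type):
    """Decode raw bytes using the charset named in content_type."""
    return raw.decode(charset_from_content_type(content_type))


print(charset_from_content_type('text/html; charset=GBK'))
print(charset_from_content_type('text/html'))
```

With a real urlopen response, response.headers.get_content_charset() reads the same parameter directly from the response headers.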
To find a User-Agent value, we can open the developer tools in Google Chrome, capture a request in the Network panel, and look at its request headers.
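Before sending anything, it can be useful to confirm which headers a urllib request will carry. The sketch below only builds the Request object and inspects it, so no network access happens. One detail to be aware of: urllib stores header names in capitalized form, so the key is looked up as 'User-agent'.

```python
from urllib import request

url = 'http://www.ngchina.com.cn/travel/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/66.0.3359.139 Mobile Safari/537.36'
}

# Build the request without sending it
req = request.Request(url=url, headers=headers)

# urllib normalizes header names with str.capitalize(), hence 'User-agent'
print(req.get_header('User-agent'))
print(req.header_items())
```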
requests library
1. Import the library
import requests
2. Get the content of the web page
with requests.get(url=url, headers=headers) as response:
    # Read the response body and decode it
    data2 = response.content.decode()  # decode() defaults to UTF-8
    print(data2)
Note:
A requests response also exposes more information, including the cookies, headers, status code, and URL. For details, consult other references.
response.cookies
response.headers
response.status_code
response.url
For Python 2.x, you can refer to this article:
Summary of methods for opening a web page and getting its content in Python