Python Learning Notes 12.9: urllib

Source: Internet
Author: User

This is a learning note for Liao Xuefeng's Python tutorial.

1. Overview

urllib provides a set of functions for working with URLs.

The urllib package contains four modules:

    • urllib.request: sends a request and returns the response

    • urllib.error: defines the exceptions raised by urllib.request

    • urllib.parse: parses and builds URLs

    • urllib.robotparser: parses robots.txt files
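As a small illustration of the urllib.parse module listed above, the sketch below splits a URL into its parts and builds a query string. The URL and the query values are made up for the example and do not come from the tutorial.

```python
from urllib import parse

# Hypothetical URL, used only to show how the pieces are separated
url = 'https://www.example.com/search?q=python+urllib&page=2#results'

parts = parse.urlsplit(url)
print(parts.scheme)    # protocol part of the URL
print(parts.netloc)    # host (and port, if any)
print(parts.path)      # path on the server

# parse_qs() turns the query string into a dict of lists
query = parse.parse_qs(parts.query)
print(query)

# urlencode() goes the other way: from key/value pairs to a query string
encoded = parse.urlencode({'q': 'python urllib', 'page': 2})
print(encoded)
```

Everything here runs offline, so it is a convenient way to check how a URL will be encoded before sending any request.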

1.1 urllib.request

The request module of urllib makes it easy to fetch the content of a URL.

urlopen() sends a GET request to the specified page and returns the HTTP response.

1) Fetch a Douban URL and print the response

    from urllib import request

    # urlopen() sends a GET request to the URL and returns the HTTP response
    with request.urlopen('https://api.douban.com/v2/book/2129650') as f:
        data = f.read()  # body of the response, as bytes
        print('Status:', f.status, f.reason)
        for k, v in f.getheaders():
            print('%s: %s' % (k, v))
        print('Data:', data.decode('utf-8'))
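A request like the one above can fail, and urllib.error defines the exceptions that urlopen() raises in that case. The following is a minimal sketch of how they might be handled; the helper name safe_fetch and both test URLs are hypothetical and chosen so the example runs without network access (a data: URL that succeeds and a file: URL that points nowhere).

```python
from urllib import request, error

def safe_fetch(url, timeout=10):
    """Hypothetical helper: return (True, body_bytes) or (False, error_message)."""
    try:
        with request.urlopen(url, timeout=timeout) as f:
            return True, f.read()
    except error.HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        # HTTPError is a subclass of URLError, so it has to be caught first.
        return False, 'HTTP error %s: %s' % (e.code, e.reason)
    except error.URLError as e:
        # No usable response at all: DNS failure, refused connection, missing file
        return False, 'URL error: %s' % e.reason

# A data: URL carries its own content, so this call succeeds offline
print(safe_fetch('data:text/plain;charset=utf-8,hello'))

# A file: URL that does not exist ends up in the URLError branch
print(safe_fetch('file:///hypothetical/path/that/does/not/exist'))
```

Returning a tuple instead of re-raising is only one possible design; in a larger crawler it may be preferable to let the exception propagate and handle it at the call site.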

2) Simulate an iPhone 6 requesting the Douban homepage

To impersonate a browser when sending a GET request, use a Request object. By adding HTTP headers to the Request object, we can disguise the request as coming from a variety of browsers.

    from urllib import request

    req = request.Request('http://www.douban.com/')  # req is an instance of the Request class
    # Add the request header
    req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
    with request.urlopen(req) as f:  # pass the Request object in place of the URL
        print('Status:', f.status, f.reason)
        for k, v in f.getheaders():
            print('%s: %s' % (k, v))
        print('Data:', f.read().decode('utf-8'))
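Before sending a disguised request it can help to check what the Request object actually holds. The sketch below does that without any network access; the URL and the User-Agent string are placeholders, not values from the tutorial.

```python
from urllib import request

# Placeholder URL and User-Agent, used only for inspection
req = request.Request('http://www.example.com/')
req.add_header('User-Agent', 'Mozilla/5.0 (hypothetical test client)')

# add_header() stores the name via str.capitalize(), so the stored key
# is 'User-agent' and that is the spelling get_header() expects
print(req.get_header('User-agent'))
print(req.header_items())

# Without a data argument the request method is GET
print(req.get_method())
```

The capitalization only affects how the header is looked up on the Request object; HTTP header names are case-insensitive on the wire.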

1.2 POST

To send a POST request, pass the request body as bytes through the data parameter.

1) We simulate a Weibo login: read the login email and password first, then encode them as username=xxx&password=xxx, the format used by the weibo.cn login page.

    from urllib import request, parse

    print('Login to weibo.cn...')
    email = input('Email: ')
    passwd = input('Password: ')

    # urlencode() from urllib.parse encodes the form fields to be sent
    login_data = parse.urlencode([
        ('username', email),
        ('password', passwd),
        ('entry', 'mweibo'),
        ('client_id', ''),
        ('savestate', '1'),
        ('ec', ''),
        ('pagerefer', 'https://passport.weibo.cn/signin/welcome?entry=mweibo&r=http%3A%2F%2Fm.weibo.cn%2F')
    ])

    req = request.Request('https://passport.weibo.cn/sso/login')  # create the Request object
    req.add_header('Origin', 'https://passport.weibo.cn')
    req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
    req.add_header('Referer', 'https://passport.weibo.cn/signin/login?entry=mweibo&res=wel&wm=3349&r=http%3A%2F%2Fm.weibo.cn%2F')

    # data is the request body sent to the server; passing it makes this a POST
    with request.urlopen(req, data=login_data.encode('utf-8')) as f:
        print('Status:', f.status, f.reason)
        for k, v in f.getheaders():
            print('%s: %s' % (k, v))
        print('Data:', f.read().decode('utf-8'))
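The two steps that turn a form into a POST body, URL-encoding the fields and converting the result to bytes, can be tried offline. This is a sketch with made-up credentials and a placeholder login URL; nothing is sent.

```python
from urllib import request, parse

# Made-up form fields, standing in for real login data
form = [
    ('username', 'alice@example.com'),
    ('password', 's3cret'),
    ('entry', 'mweibo'),
]

# Step 1: URL-encode the fields; step 2: encode the string to bytes
body = parse.urlencode(form).encode('utf-8')
print(body)

# Attaching data to a Request switches its method from GET to POST
req = request.Request('https://www.example.com/login', data=body)
print(req.get_method())
```

Note that urlencode() percent-encodes reserved characters such as the @ in the email address, so the values do not need to be escaped by hand.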

1.3 Handler

1) For more complex control, such as accessing a site through a proxy, we use a ProxyHandler:

    import urllib.request

    # Replace the default ProxyHandler with one that uses a proxy URL supplied in code
    proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
    # Handler that takes care of Basic authentication against the proxy
    proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
    # 'realm' is the authentication realm, 'host' is the proxy address
    proxy_auth_handler.add_password('realm', 'host', 'username', 'password')
    opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)  # returns an OpenerDirector instance
    with opener.open('http://www.example.com/login.html') as f:  # open the URL through the opener
        pass
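To see what build_opener() produces, the sketch below builds an opener and inspects it without opening any URL. The proxy address and the User-Agent value are placeholders, and the commented-out install_opener() call is shown only as an option.

```python
import urllib.request

# Placeholder proxy address; no connection is made in this example
proxy_handler = urllib.request.ProxyHandler({'http': 'http://proxy.example.com:3128/'})
opener = urllib.request.build_opener(proxy_handler)

# Headers set here are added to every request made through this opener
opener.addheaders = [('User-Agent', 'hypothetical-client/0.1')]

print(isinstance(opener, urllib.request.OpenerDirector))
print(proxy_handler in opener.handlers)

# Optional: make this opener the default one used by urllib.request.urlopen()
# urllib.request.install_opener(opener)
```

Calling opener.open() keeps the proxy settings local to that opener, while install_opener() changes the behavior of every later urlopen() call in the process.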

1.4 Summary

urllib lets a program issue all kinds of HTTP requests. To simulate a browser for a specific task, disguise the request as one sent by a browser: first monitor the requests the browser sends, then copy its request headers. The User-Agent header is the one that identifies the browser.

1.5 Further reading

Python 3 web crawler: "Sending requests with urllib.request" (76067790)

Introduction to urlopen() in Python (https://www.cnblogs.com/zyq-blog/p/5606760.html)

Why Python crawlers use an opener object, and why to create a global default opener (https://www.cnblogs.com/cunyusup/p/7341829.html)

2. Examples

1) Use urllib to fetch JSON, then parse the JSON into a Python object:

    # -*- coding: utf-8 -*-
    from urllib import request
    import json

    def fetch_data(url):
        with request.urlopen(url) as f:
            # decode the response body, then json.loads() deserializes it into a Python object
            data = json.loads(f.read().decode('utf-8'))
        return data

    # test
    URL = 'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20weather.forecast%20where%20woeid%20%3D%202151330&format=json'
    data = fetch_data(URL)
    print(data)
    assert data['query']['results']['channel']['location']['city'] == 'Beijing'
    print('OK')
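The Yahoo endpoint used in the test may no longer be reachable, so the decoding step can also be tried on a local byte string. The sample payload below is invented to mirror the nesting the assertion expects; it is not a real API response.

```python
import json

# Invented payload with the same nesting as the assertion in the example
raw = b'{"query": {"results": {"channel": {"location": {"city": "Beijing"}}}}}'

# Same two steps as in fetch_data(): bytes -> str -> Python object
data = json.loads(raw.decode('utf-8'))
city = data['query']['results']['channel']['location']['city']
print(city)

# Since Python 3.6, json.loads() also accepts bytes directly
same = json.loads(raw)
print(same == data)
```

Decoding explicitly, as the original example does, keeps the choice of encoding visible, which matters when a server does not send UTF-8.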

