Urllib Module Usage

Source: Internet
Author: User
urllib2 Basics

1. Open a web page (urlopen)

Open a Web page

```python
import urllib2

response = urllib2.urlopen('http://www.baidu.com')
html = response.read()
print html
```

urlopen commonly takes three parameters:

urllib2.urlopen(url, data, timeout)  (in Python 3: urllib.request.urlopen(url, data, timeout))

Using the data parameter (GET)

```python
import urllib
import urllib2

uri = 'http://www.example.com/login'  # placeholder target URL
data = {'email': 'myemail', 'password': 'password'}
params = urllib.urlencode(data)
response = urllib2.urlopen("%s?%s" % (uri, params))
```

Using the data parameter (POST)

```python
import urllib
import urllib2

uri = 'http://www.example.com/login'  # placeholder target URL
data = {'email': 'myemail', 'password': 'password'}
response = urllib2.urlopen(uri, urllib.urlencode(data))
```

So if we pass the data argument, the request is sent as a POST; without a data argument, urlopen issues a GET.
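The same rule applies in Python 3's urllib.request, and it can be checked without touching the network: a Request object derives its HTTP method from whether data is present. A minimal sketch (www.example.com is just a placeholder address; nothing is actually sent):

```python
from urllib import parse, request

params = parse.urlencode({'email': 'myemail', 'password': 'password'})

# No data: urlopen would issue a GET with the query string in the URL
get_req = request.Request('http://www.example.com/login?' + params)
# With data: urlopen would issue a POST with the parameters in the body
post_req = request.Request('http://www.example.com/login',
                           data=params.encode('ascii'))

print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
```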

Using the timeout parameter

When the network is poor or the server misbehaves, a request can hang for a long time; setting a timeout bounds how long we wait.

```python
import urllib2

response = urllib2.urlopen('http://www.baidu.com', timeout=1)
print(response.read())
```
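When the timeout fires, urlopen raises an exception that the caller should handle. A self-contained Python 3 sketch: instead of a real site, it starts a deliberately slow local server (an assumption made so the example runs offline) and catches the timeout error:

```python
import http.server
import socketserver
import threading
import time
from urllib import request

# A deliberately slow local server: it waits 1 second before replying,
# longer than the client's 0.2 second timeout.
class SlowHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(1.0)
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'slow reply')
        except OSError:
            pass  # the client has already given up
    def log_message(self, fmt, *args):
        pass  # keep the demo output clean

server = socketserver.ThreadingTCPServer(('127.0.0.1', 0), SlowHandler)
server.daemon_threads = True
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_address[1]

try:
    request.urlopen(url, timeout=0.2)
    result = 'got a reply'
except OSError:  # URLError and the timeout error both subclass OSError
    result = 'request timed out'
print(result)
server.shutdown()
```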
2. Open a web page (Request)

Open a web page (note that this snippet uses the Python 3 urllib.request API):

```python
import urllib.request

request = urllib.request.Request('https://www.baidu.com')
response = urllib.request.urlopen(request)
print(response.read().decode('utf-8'))
```

Specifying the request headers

```python
import urllib2

url = 'http://www.baidu.com'
# Build the request headers
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64)"}
# Package the request
request = urllib2.Request(url=url, headers=headers)
response = urllib2.urlopen(request)
content = response.read().decode('utf-8')
print content
```
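In Python 3 the same header handling lives on urllib.request.Request, and the headers can be inspected before anything is sent. One detail worth knowing: Request normalizes stored header names to capitalized form ('User-agent', 'Accept-language'). A sketch (www.example.com is a placeholder; no request is made):

```python
from urllib import request

req = request.Request('http://www.example.com/',
                      headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64)'})
req.add_header('Accept-Language', 'en-US')

# Request stores header names as 'User-agent', 'Accept-language', ...
print(req.get_header('User-agent'))
print(req.has_header('Accept-language'))  # True
```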
3. Advanced

Adding a proxy

```python
import urllib2

url = 'http://www.dianping.com'
# Custom headers
headers = {
    'Host': 'www.dianping.com',
    'Cookie': 'JSESSIONID=F1C38C2F1A7F7BF3BCB0C4E3CCDBE245; aburl=1; cy=2',
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/532.5 (KHTML, like Gecko) Chrome/4.0.249.0 Safari/532.5',
}
proxy_handler = urllib2.ProxyHandler({'http': 'http://host:port'})
opener = urllib2.build_opener(proxy_handler)
urllib2.install_opener(opener)
request = urllib2.Request(url, headers=headers)
response = urllib2.urlopen(request)
content = response.read().decode('utf-8')
```
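The Python 3 equivalent uses urllib.request.ProxyHandler. Building and installing the opener makes no network call, so the wiring can be verified offline; 'http://host:port' is the same placeholder as above, not a real proxy:

```python
from urllib import request

# host:port is a placeholder; no request is actually made here
proxy_handler = request.ProxyHandler({'http': 'http://host:port'})
opener = request.build_opener(proxy_handler)

# the opener now carries a ProxyHandler alongside the default handlers
print(any(isinstance(h, request.ProxyHandler) for h in opener.handlers))  # True
```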

Manipulating cookies

```python
import urllib2
import cookielib

cookie = cookielib.CookieJar()
cookie_handler = urllib2.HTTPCookieProcessor(cookie)  # create the cookie processor
opener = urllib2.build_opener(cookie_handler)         # build the opener
urllib2.install_opener(opener)
response = urllib2.urlopen('http://www.dianping.com').read()  # read the page content
print response  # the page HTML
# Inspect the cookies
print cookie, type(cookie)
for item in cookie:
    print 'name:' + item.name + ' - value:' + item.value
```
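A Python 3 sketch of the same flow using http.cookiejar (the renamed cookielib). To keep it self-contained, it spins up a tiny local HTTP server that sets a cookie instead of calling www.dianping.com; that server is purely an assumption of this demo:

```python
import http.cookiejar
import http.server
import socketserver
import threading
from urllib import request

# Tiny local stand-in for a real site: it sets one cookie on every GET
class CookieHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Set-Cookie', 'session=abc123; Path=/')
        self.end_headers()
        self.wfile.write(b'hello')
    def log_message(self, fmt, *args):
        pass  # keep the demo output clean

server = socketserver.TCPServer(('127.0.0.1', 0), CookieHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_address[1]

jar = http.cookiejar.CookieJar()
opener = request.build_opener(request.HTTPCookieProcessor(jar))
opener.open(url).read()
# the jar now holds the cookie the server sent back
for c in jar:
    print('name:%s - value:%s' % (c.name, c.value))
server.shutdown()
```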

Saving cookies

```python
import urllib2
import cookielib

def save_cookie():
    # File that will hold the cookie
    filename = 'cookie.txt'
    # Declare a MozillaCookieJar object to hold the cookie, then write it to the file
    cookie = cookielib.MozillaCookieJar(filename)
    # Create a cookie processor
    handler = urllib2.HTTPCookieProcessor(cookie)
    # Build the opener
    opener = urllib2.build_opener(handler)
    # Make the request
    res = opener.open('http://www.baidu.com')
    # Save the cookies to the file.
    # ignore_discard: save cookies even if they are marked to be discarded.
    # ignore_expires: if the cookies already exist in the file, overwrite them.
    cookie.save(ignore_discard=True, ignore_expires=True)
```

Reading cookies from a file

```python
import urllib2
import cookielib

def get_cookie():
    # Create a MozillaCookieJar object
    cookie = cookielib.MozillaCookieJar()
    # Load the cookie contents from the file into the jar
    cookie.load('cookie.txt', ignore_discard=True, ignore_expires=True)
    # Print the cookies to show the load succeeded
    for item in cookie:
        print 'name:' + item.name + ' - value:' + item.value
    # Use the loaded cookies to build an opener
    handler = urllib2.HTTPCookieProcessor(cookie)
    opener = urllib2.build_opener(handler)
    res = opener.open('http://www.baidu.com')
    print res.read()
```
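The save-then-load round trip can be sketched end to end in Python 3 with http.cookiejar.MozillaCookieJar. As an assumption of the demo, a tiny local server stands in for www.baidu.com so the example runs offline:

```python
import http.cookiejar
import http.server
import os
import socketserver
import tempfile
import threading
from urllib import request

# Tiny local stand-in for the real site: it sets one cookie
class CookieHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Set-Cookie', 'session=abc123; Path=/')
        self.end_headers()
        self.wfile.write(b'hello')
    def log_message(self, fmt, *args):
        pass

server = socketserver.TCPServer(('127.0.0.1', 0), CookieHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_address[1]
filename = os.path.join(tempfile.mkdtemp(), 'cookie.txt')

# Fetch once and save the received cookies to disk
jar = http.cookiejar.MozillaCookieJar(filename)
opener = request.build_opener(request.HTTPCookieProcessor(jar))
opener.open(url).read()
jar.save(ignore_discard=True, ignore_expires=True)

# Load them back into a fresh jar to prove the file is usable
jar2 = http.cookiejar.MozillaCookieJar()
jar2.load(filename, ignore_discard=True, ignore_expires=True)
for c in jar2:
    print('name:%s - value:%s' % (c.name, c.value))
server.shutdown()
```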
An example:

```python
import urllib2
import cookielib
import json

def my_cookie_test():
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/532.5 (KHTML, like Gecko) Chrome/4.0.249.0 Safari/532.5',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6,zh-TW;q=0.4',
        'Connection': 'keep-alive',
        'Cookie': 'cy=2; _lxsdk_cuid=16000A1A16CC8-0629D2CA3B9F7-40544230-100200-16000A1A16DC8; _lxsdk=16000A1A16CC8-0629D2CA3B9F7-40544230-100200-16000A1A16DC8; _lxsdk_s=16000a1a16f-c56-870-2aa%7C%7C23; _hc.v=44792549-7147-7394-ac0a-eefed1fa19a2.1511839081; s_ViewType=10',
        'Host': 'www.dianping.com',
        'Referer': 'http://www.dianping.com/shop',
        'Upgrade-Insecure-Requests': 1,
    }
    # Cookie jar for the request
    cj = cookielib.CookieJar()
    cj_handler = urllib2.HTTPCookieProcessor(cj)
    proxy_handler = urllib2.ProxyHandler({'http': '0.0.0.0:8080'})
    opener = urllib2.build_opener(proxy_handler, cj_handler)
    urllib2.install_opener(opener)
    try:
        request = urllib2.Request('http://www.dianping.com/shop/000000/', headers=headers)
        response = urllib2.urlopen(request)
        content = response.read().decode('utf-8')  # the page HTML
        print content
        cookie_data = {}
        for item in cj:
            # print 'request: name:' + item.name + ' - value:' + item.value
            cookie_data[item.name] = item.value
        cookie_str = json.dumps(cookie_data)
        with open('cookie.txt', 'w') as f:
            f.write(cookie_str)
        print 'Cookie information has been saved locally'
    except Exception as e:
        print e
```
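Because my_cookie_test() stores the cookies as a JSON object rather than a cookie-jar file, they can be restored with json.load and rebuilt into a Cookie header value by hand. A minimal round-trip sketch (the dict contents and temp-file path are illustrative assumptions):

```python
import json
import os
import tempfile

# The same dict-shaped cookie data my_cookie_test() writes out
cookie_data = {'cy': '2', 's_ViewType': '10'}
path = os.path.join(tempfile.mkdtemp(), 'cookie.txt')
with open(path, 'w') as f:
    f.write(json.dumps(cookie_data))

# Read it back and rebuild a Cookie header value
with open(path) as f:
    restored = json.load(f)
header = '; '.join('%s=%s' % kv for kv in sorted(restored.items()))
print(header)  # cy=2; s_ViewType=10
```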

Web information extraction... coming in the next installment.
