Using the mechanize module in Python to simulate browser behavior
This article describes how to use the mechanize module in Python to simulate browser behavior, including cookie and proxy settings.
It is often useful to know how to quickly instantiate a browser from the command line or a Python script. Whenever I need to automate a web task, I use this Python code to simulate a browser:
import mechanize
import cookielib

# Browser
br = mechanize.Browser()

# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Follows refresh 0 but does not hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# Want debugging messages?
#br.set_debug_http(True)
#br.set_debug_redirects(True)
#br.set_debug_responses(True)

# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
Now you have a browser instance, the br object. With this object you can open a page using code like the following:
# Open some site, let's pick a random one, the first that pops in mind:
r = br.open('http://google.com')
html = r.read()

# Show the source
print html
# or
print br.response().read()

# Show the html title
print br.title()

# Show the response headers
print r.info()
# or
print br.response().info()

# Show the available forms
for f in br.forms():
    print f

# Select the first (index zero) form
br.select_form(nr=0)

# Let's search
br.form['q'] = 'weekend codes'
br.submit()
print br.response().read()

# Looking at some results in link format
for l in br.links(url_regex='stockrt'):
    print l
If the website you are visiting requires authentication (HTTP basic auth), then:
# If the protected site didn't receive the authentication data you would
# end up with a 401 error in your face
br.add_password('http://safe-site.domain', 'username', 'password')
br.open('http://safe-site.domain')
Because a CookieJar was set up earlier, you do not have to manage the website's login session yourself. That is, after you POST a username and password once, the site asks your browser to store a session cookie so you do not have to log in again, and your subsequent requests carry that cookie. The CookieJar handles all of this, storing the session cookie and resending it, as the sketch below shows.
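Here is a minimal sketch of that pattern with mechanize. The login URL and the form field names ('username', 'password') are hypothetical placeholders, not from the original article; a real site will use its own:

import mechanize
import cookielib

br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_robots(False)

# POST the credentials once; the server answers with a Set-Cookie header
br.open('http://example.com/login')      # hypothetical login page
br.select_form(nr=0)
br.form['username'] = 'user'             # assumed form field names
br.form['password'] = 'secret'
br.submit()

# The CookieJar now holds the session cookie and resends it automatically
print br.open('http://example.com/private').read()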
You can also navigate the browser history:
# Testing presence of link (if the link is not found you would have to
# handle a LinkNotFoundError exception)
br.find_link(text='weekend codes')

# Actually clicking the link
req = br.click_link(text='weekend codes')
br.open(req)
print br.response().read()
print br.geturl()

# Back
br.back()
print br.response().read()
print br.geturl()
Download an object:
# Download
f = br.retrieve('http://www.google.com.br/intl/pt-BR_br/images/logo.gif')[0]
print f
fh = open(f)
Setting an HTTP proxy:
# Proxy and user/password
br.set_proxies({"http": "joe:password@myproxy.example.com:3128"})
# Proxy
br.set_proxies({"http": "myproxy.example.com:3128"})
# Proxy password
br.add_proxy_password("joe", "password")
However, if you only want to open web pages and do not need all of the magic above, you can:
# Simple open?
import urllib2
print urllib2.urlopen('http://stockrt.github.com').read()

# With password?
import urllib
opener = urllib.FancyURLopener()
print opener.open('http://user:password@stockrt.github.com').read()
You can learn more on the official websites of mechanize and ClientForm.
From: http://reyoung.me/index.php/2012/08/08/%E7%BF%BB%E8%AF%91%E4%BD%BF%E7%94%A8python%E6%A8%A1%E4%BB%BF%E6%B5%8F%E8%A7%88%E5%99%A8%E8%A1%8C%E4%B8%BA/
------------------------------
Finally, let's talk about a very important concept and technique for accessing pages through code: cookies.
We all know that HTTP is a stateless protocol, yet the client and server often need to maintain some shared information, which is what cookies are for. With cookies, the server can tell that a user has just logged in to the website and can allow that client to access pages that require a login.
For example, to use Sina Weibo in a browser, you must first log in; only after a successful login can you visit the other pages. When you log in to Sina Weibo or any other site with authentication from a program, the key point is to save the cookie obtained at login and then send it along with subsequent requests to achieve the same effect.
Here we need Python's cookielib and urllib2 modules to cooperate: bind cookielib's cookie handling to urllib2 so that a cookie is attached to each page request.
The first step is to use the HttpFox plug-in for Firefox: browse to the Sina Weibo home page, log in, and observe the URL of the request sent at each step. Then simulate that process in Python: use urllib2 to send the username and password to the login page, obtain the cookie set after login, and then visit the other pages with that cookie to fetch the Weibo data.
The main job of the cookielib module is to provide objects that can store cookies, for use together with the urllib2 module when accessing Internet resources. For example, a CookieJar object from this module can capture cookies and resend them on subsequent requests. The cookielib module mainly provides the following objects: CookieJar, FileCookieJar, MozillaCookieJar, and LWPCookieJar; a small persistence sketch follows below.
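As an illustration of the difference between these classes, LWPCookieJar (unlike the plain CookieJar) can save cookies to a file and load them back, so a session can survive between runs of a script. This is only a sketch; the URL and file name are placeholders:

import cookielib
import urllib2

cj = cookielib.LWPCookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
opener.open('http://example.com/')  # the server may set cookies here

# Save the captured cookies to disk...
cj.save('cookies.txt', ignore_discard=True, ignore_expires=True)

# ...and load them back in a later run
cj2 = cookielib.LWPCookieJar()
cj2.load('cookies.txt', ignore_discard=True, ignore_expires=True)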
The urllib2 module is similar to the urllib module: both are used to open a URL and read data from it. Unlike urllib, however, urllib2 not only offers the urlopen() function but also lets you build a custom opener for accessing web pages. Note that the urlretrieve() function exists only in the urllib module, not in urllib2. In practice the two are hard to separate: when you use urllib2, you usually still need urllib, because POST data must be encoded with the urllib.urlencode() function. A small sketch contrasting the two follows below.
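To illustrate that division of labor, here is a small sketch (the URLs are placeholders, not from the original article): urllib.urlencode() prepares the POST body, urllib2 sends the request, and urllib.urlretrieve() handles a plain file download:

import urllib
import urllib2

# POST: urllib encodes the data, urllib2 sends the request
post_data = urllib.urlencode({'q': 'python'})
print urllib2.urlopen('http://example.com/search', post_data).read()

# Download to a local file: urlretrieve() exists only in urllib
urllib.urlretrieve('http://example.com/logo.gif', 'logo.gif')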
The cookielib module is generally used together with the urllib2 module: the CookieJar is wrapped in a urllib2.HTTPCookieProcessor, which is then passed as a parameter to the urllib2.build_opener() function. The following code logs in to Renren:
#!/usr/bin/env python
# coding=utf-8
import urllib2
import urllib
import cookielib

# Login username and password
data = {"email": "username", "password": "password"}
post_data = urllib.urlencode(data)

# Bind a CookieJar to the opener so the session cookie is kept
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

headers = {"User-agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"}
req = urllib2.Request("http://www.renren.com/PLogin.do", post_data, headers)
content = opener.open(req)
print content.read().decode("UTF-8").encode("gbk")
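Once the login above succeeds, the CookieJar inside the opener holds the session cookie, so the same opener can fetch pages that require a login. A minimal continuation of the script (the URL is a hypothetical placeholder):

# Reuse the same opener: the session cookie is attached automatically
page = opener.open("http://www.renren.com/home")  # hypothetical logged-in page
print page.read()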