Use the mechanize module in Python to simulate browser functions

Source: Internet
Author: User

This article describes how to use the mechanize module in Python to simulate browser functions, including cookie and proxy settings.

It is often useful to know how to quickly instantiate a browser from the command line or inside a Python script.

Whenever I need to automate a web task, I use this Python code to simulate a browser.

    import mechanize
    import cookielib

    # Browser
    br = mechanize.Browser()

    # Cookie jar
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)

    # Browser options
    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)

    # Follow refresh 0 but do not hang on refresh > 0
    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

    # Want debugging messages?
    # br.set_debug_http(True)
    # br.set_debug_redirects(True)
    # br.set_debug_responses(True)

    # User-Agent (this is cheating, OK?)
    br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

You now have a browser instance, the br object. With it you can open a page and work with it using code similar to the following:

    # Open some site, let's pick a random one, the first that pops in mind:
    r = br.open('http://google.com')
    html = r.read()

    # Show the source
    print html
    # or
    print br.response().read()

    # Show the html title
    print br.title()

    # Show the response headers
    print r.info()
    # or
    print br.response().info()

    # Show the available forms
    for f in br.forms():
        print f

    # Select the first (index zero) form
    br.select_form(nr=0)

    # Let's search
    br.form['q'] = 'weekend Code'
    br.submit()
    print br.response().read()

    # Looking at some results in link format
    for l in br.links(url_regex='stockrt'):
        print l

If the website you visit requires authentication (HTTP basic auth), then:

    # If the protected site didn't receive the authentication data you would
    # end up with a 401 error in your face
    br.add_password('http://safe-site.domain', 'username', 'password')
    br.open('http://safe-site.domain')
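For context on what travels over the wire with HTTP basic auth: the client sends an Authorization header whose value is the word Basic followed by the Base64 encoding of username:password. The sketch below builds that header by hand with the Python 3 standard library only. The helper name basic_auth_header is hypothetical and is not part of mechanize; it is meant to illustrate the header format, not how mechanize implements it internally.

```python
import base64


def basic_auth_header(username, password):
    """Return the value of an HTTP Basic Authorization header (hypothetical helper)."""
    credentials = "%s:%s" % (username, password)
    token = base64.b64encode(credentials.encode("utf-8"))
    return "Basic " + token.decode("ascii")


header = basic_auth_header("username", "password")
print("Authorization:", header)
```

Because Base64 is an encoding and not encryption, anyone who can read the request can recover the password, so basic auth is normally only used over HTTPS.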

Because a cookie jar was set up earlier, you do not need to manage the website's login session yourself, for example after logging in by POSTing a user name and password through a form.

In that case the website usually asks your browser to store a session cookie and expects the browser to send it back whenever you interact with the site again.

The cookie jar does all of this for you: it stores the session cookie and resends it.
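The store-and-resend behaviour can be observed without any network access. The following is a minimal sketch that assumes Python 3, where cookielib is named http.cookiejar and urllib2.Request lives in urllib.request. FakeResponse is a hypothetical stand-in for a real HTTP response, and the example.com URLs and the cookie value are made up for illustration.

```python
import http.cookiejar
import urllib.request
from email.message import Message


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; only info() is needed here."""

    def __init__(self, headers):
        self._headers = headers

    def info(self):
        return self._headers


jar = http.cookiejar.CookieJar()

# Step 1: the "server" answers the login request with a Set-Cookie header.
login_request = urllib.request.Request("http://example.com/login")
headers = Message()
headers["Set-Cookie"] = "sessionid=abc123; Path=/"
jar.extract_cookies(FakeResponse(headers), login_request)

# Step 2: the jar attaches the stored cookie to a later request to the same site.
next_request = urllib.request.Request("http://example.com/profile")
jar.add_cookie_header(next_request)
print(next_request.get_header("Cookie"))
```

In normal use you never call extract_cookies or add_cookie_header yourself; the browser object or opener that owns the jar does it on every request.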

At the same time, you can manage your browser history:

    # Testing presence of link (if the link is not found you would have to
    # handle a LinkNotFoundError exception)
    br.find_link(text='weekend Code')

    # Actually clicking the link
    req = br.click_link(text='weekend Code')
    br.open(req)
    print br.response().read()
    print br.geturl()

    # Back
    br.back()
    print br.response().read()
    print br.geturl()

Download an object:

    # Download
    f = br.retrieve('http://www.google.com.br/intl/pt-BR_br/images/logo.gif')[0]
    print f
    fh = open(f)

Set a proxy for HTTP:

    # Proxy and user/password
    br.set_proxies({"http": "joe:password@myproxy.example.com:3128"})

    # Proxy
    br.set_proxies({"http": "myproxy.example.com:3128"})
    # Proxy password
    br.add_proxy_password("joe", "password")

However, if you only want to open a web page and do not need all of these extra features, you can:

    # Simple open?
    import urllib2
    print urllib2.urlopen('http://stockrt.github.com').read()

    # With password?
    import urllib
    opener = urllib.FancyURLopener()
    print opener.open('http://user:password@stockrt.github.com').read()

You can learn more from the official mechanize website, the mechanize documentation, and the ClientForm documentation.

From: http://reyoung.me/index.php/2012/08/08/%E7%BF%BB%E8%AF%91%E4%BD%BF%E7%94%A8python%E6%A8%A1%E4%BB%BF%E6%B5%8F%E8%A7%88%E5%99%A8%E8%A1%8C%E4%B8%BA/

------------------------------

Finally, let's talk about a very important concept when accessing a page through code: cookies.

HTTP is a stateless protocol, but the client and server still need to keep some shared information between requests, and cookies serve that purpose. With cookies, the server can tell that the user has just logged in and can allow the client to access certain pages.
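To make the idea concrete, a cookie reaches the client as a Set-Cookie response header made of a name=value pair plus optional attributes. The sketch below parses such a header with http.cookies.SimpleCookie from the Python 3 standard library. The header text itself is an invented example, not something captured from a real site.

```python
from http.cookies import SimpleCookie

# An invented Set-Cookie header value, as a server might send after login.
raw_header = "sessionid=abc123; Path=/; Max-Age=3600"

cookie = SimpleCookie()
cookie.load(raw_header)

morsel = cookie["sessionid"]
print("name:   ", morsel.key)
print("value:  ", morsel.value)
print("path:   ", morsel["path"])
print("max-age:", morsel["max-age"])
```

On every later request to the same site the client sends back only the name=value part in a Cookie header, which is how the server recognises the session.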

For example, to use Sina Weibo in a browser you must log in first; only after a successful login can you access the other pages. When a program logs in to Sina Weibo or another site that requires authentication, the key point is to save the cookie and then send it with every later request.

For this we need Python's cookielib and urllib2 working together: binding cookielib to urllib2 attaches the cookie to each page request.

The first step is to use the HttpFox plug-in for Firefox while browsing the Sina Weibo homepage and logging in, and to look at the URL and data sent in each step. Then simulate that process in Python: use urllib2.urlopen to send the user name and password to the login page, obtain the cookie after login, and then visit other pages to get the Weibo data.

The main purpose of the cookielib module is to provide objects that store cookies, for use with the urllib2 module when accessing Internet resources. For example, a CookieJar object from this module can capture cookies and resend them on subsequent requests. The cookielib module mainly provides the following objects: CookieJar, FileCookieJar, MozillaCookieJar, and LWPCookieJar.
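The file-backed jars differ from the plain CookieJar in that they can be written to disk and read back, which lets a script reuse a login across runs. Here is a minimal sketch with LWPCookieJar, assuming Python 3 (http.cookiejar instead of cookielib). FakeResponse, the example.com URL, the cookie value, and the file name cookies.lwp are all placeholders chosen for illustration.

```python
import http.cookiejar
import os
import tempfile
import urllib.request
from email.message import Message


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; only info() is needed here."""

    def __init__(self, headers):
        self._headers = headers

    def info(self):
        return self._headers


# Capture a cookie that has a lifetime. Cookies without one are session
# cookies and are skipped by save() unless ignore_discard=True is passed.
jar = http.cookiejar.LWPCookieJar()
request = urllib.request.Request("http://example.com/")
headers = Message()
headers["Set-Cookie"] = "token=xyz; Path=/; Max-Age=3600"
jar.extract_cookies(FakeResponse(headers), request)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "cookies.lwp")
    jar.save(path)  # write the jar in libwww-perl format

    restored = http.cookiejar.LWPCookieJar()
    restored.load(path)  # read it back into a fresh jar

restored_cookies = [(c.name, c.value) for c in restored]
print(restored_cookies)
```

MozillaCookieJar offers the same save and load methods but uses the cookies.txt format instead.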

The urllib2 module is similar to the urllib module: both open a URL and read data from it. Unlike urllib, urllib2 can not only call urlopen() but also build a custom opener to access web pages. Note that the urlretrieve() function is in the urllib module and does not exist in urllib2. Even so, urllib2 is rarely used without urllib, because POST data must be encoded with the urllib.urlencode() function.
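The encoding step looks like this in a minimal sketch, assuming Python 3, where urllib.urlencode has moved to urllib.parse.urlencode and urllib2.Request to urllib.request.Request. The URL and form fields are placeholders, and no request is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder form fields; a real login form defines its own field names.
form = {"email": "username", "password": "password"}

# urlencode produces "key=value&key=value"; Python 3 wants the body as bytes.
body = urlencode(form).encode("ascii")

# Supplying data turns the request into a POST.
request = Request("http://example.com/login", data=body)

print(body)
print(request.get_method())
```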

The cookielib module is generally used together with urllib2, mainly by passing a cookie jar to urllib2.HTTPCookieProcessor, which in turn is passed to urllib2.build_opener(). The following code logs in to Renren:

    #!/usr/bin/env python
    # coding=utf-8
    import urllib2
    import urllib
    import cookielib

    data = {"email": "username", "password": "password"}  # login username and password
    post_data = urllib.urlencode(data)
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    headers = {"User-agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"}
    req = urllib2.Request("http://www.renren.com/PLogin.do", post_data, headers)
    content = opener.open(req)
    print content.read().decode("utf-8").encode("gbk")
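The same login-then-browse pattern can be tried without touching a real website. The sketch below assumes Python 3 (http.cookiejar and urllib.request in place of cookielib and urllib2) and starts a throwaway HTTP server on the local machine. DemoHandler, its /login and /home paths, and the cookie value are all invented for the demonstration and have nothing to do with Renren's actual login interface.

```python
import http.cookiejar
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Invented site: any POST sets a session cookie, any GET checks for it."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the form body; credentials are not checked
        self.send_response(200)
        self.send_header("Set-Cookie", "session=demo-token; Path=/")
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def do_GET(self):
        logged_in = "session=demo-token" in self.headers.get("Cookie", "")
        body = b"welcome" if logged_in else b"please log in"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


# Port 0 lets the operating system pick a free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = "http://127.0.0.1:%d" % server.server_port

jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # ignore any proxy configured in the environment
    urllib.request.HTTPCookieProcessor(jar),
)

# Log in: the response sets a cookie, which the processor stores in the jar.
post_data = urllib.parse.urlencode({"email": "username", "password": "password"}).encode("ascii")
opener.open(base_url + "/login", post_data, timeout=5).read()

# Visit another page: the processor resends the cookie automatically.
page = opener.open(base_url + "/home", timeout=5).read()
print(page)

server.shutdown()
server.server_close()
```

Requesting /home through a fresh opener that has no cookie processor should return the "please log in" body instead, which is the difference the cookie jar makes.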
