Python uses mechanize to simulate the browser

Source: Internet
Author: User

I used to simulate a browser with urllib2 to fetch web pages and perform other operations, but many sites returned errors or garbled content. After I switched to mechanize, these problems went away. It works really well, so I recommend it here.
mechanize replaces part of the functionality of urllib2. It simulates browser behavior more faithfully and gives more complete control over web access.

Let's start with the installation, taking Ubuntu as an example:

Most third-party Python packages are installed the standard way: download the package from the official website, unpack it into a folder, and run this command in that folder:

python setup.py install

Official website:
http://wwwsearch.sourceforge.net/mechanize/

A more convenient way is to install the easy_install tool first.

Normally, to install a third-party Python package, we have to download it, unpack it into a directory, open that directory in a terminal, and run
python setup.py install
to install it.
With easy_install, we can instead run a single command:
easy_install xxx
This downloads and installs the latest version of the package xxx.
In short, easy_install makes installing third-party packages easier.

Installation method:

First download the setuptools package, which provides easy_install:
http://pypi.python.org/pypi/setuptools

Download the version that matches your Python. On Windows, just run the EXE installer.

On Linux, the downloaded egg can be run directly, for example:
sh setuptools-0.6c9-py2.4.egg

Once installation completes, easy_install is copied into the bin directory, which is already on the PATH, so the easy_install command can be run directly in the terminal.

There is also a simpler way on Ubuntu.

The command to install easy_install is:

sudo apt-get install python-setuptools

Then install mechanize with easy_install:

sudo easy_install mechanize

Once it is installed, you are ready to go. First, the code that sets up a simulated browser:

    import mechanize
    import cookielib  # Python 2 name; on Python 3: import http.cookiejar as cookielib

    # Browser
    br = mechanize.Browser()

    # Cookie jar
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)

    # Browser options
    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)

    # Follow refresh 0, but do not hang on refresh > 0
    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

    # Want debugging messages?
    #br.set_debug_http(True)
    #br.set_debug_redirects(True)
    #br.set_debug_responses(True)

    # User-Agent (this is cheating, OK?)
    br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

This gives you a browser instance, the br object. With this object you can work with web pages:

    # Open some site; let's pick a random one, the first that comes to mind
    r = br.open('http://www.baidu.com')
    html = r.read()

    # Show the source
    print(html)
    # or
    print(br.response().read())

    # Show the HTML title
    print(br.title())

    # Show the response headers
    print(r.info())
    # or
    print(br.response().info())

    # Show the available forms
    for f in br.forms():
        print(f)

    # Select the first (index zero) form
    br.select_form(nr=0)

    # Let's search (Baidu's search field is 'wd'; Google's is 'q')
    br.form['wd'] = 'weekend codes'
    br.submit()
    print(br.response().read())

    # Look at some of the results as links
    for l in br.links(url_regex='stockrt'):
        print(l)

If the site you visit requires authentication (HTTP basic auth), then:

    # If the protected site does not receive the authentication data,
    # you end up with a 401 error in your face
    br.add_password('http://safe-site.domain', 'username', 'password')
    br.open('http://safe-site.domain')
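
To make clearer what HTTP basic auth involves, here is a minimal standard-library sketch of the Authorization header that a client sends for a username and password. The helper name basic_auth_header is made up for this illustration and is not part of mechanize; mechanize builds the header for you once add_password has been called.

```python
import base64


def basic_auth_header(username, password):
    """Return the Authorization header value for HTTP Basic auth.

    Hypothetical helper for illustration only; not a mechanize API.
    """
    # Basic auth is simply base64("username:password").
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return "Basic " + token


header = basic_auth_header("username", "password")
print("Authorization: " + header)
```

Because the credentials are only encoded, not encrypted, this scheme is normally used over HTTPS.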

Beyond this, the cookie jar added earlier already takes care of storing and re-sending session cookies, so sites that keep a login session need no extra handling. mechanize can also manage the browser history. There are many other uses as well, for example downloading a file:

    # Download: retrieve() returns a (local filename, headers) pair
    f = br.retrieve('http://www.google.com.br/intl/pt-BR_br/images/logo.gif')[0]
    print(f)
    fh = open(f, 'rb')
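
Coming back to the cookie jar: an LWPCookieJar can also be written to disk and loaded again, which lets a session survive between runs. The sketch below uses only the standard library. The cookie name, value and temporary file path are made up for the demonstration, and the import falls back to the Python 2 module name used in this article.

```python
import os
import tempfile

try:
    import http.cookiejar as cookielib  # Python 3
except ImportError:
    import cookielib  # Python 2


def make_cookie(name, value, domain):
    """Build a session cookie by hand (demonstration only)."""
    return cookielib.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=False,
        comment=None, comment_url=None, rest={},
    )


cookie_file = os.path.join(tempfile.mkdtemp(), "cookies.lwp")

# First run: put a cookie in the jar and save the jar to disk.
cj = cookielib.LWPCookieJar(cookie_file)
cj.set_cookie(make_cookie("sessionid", "abc123", "safe-site.domain"))
cj.save(ignore_discard=True, ignore_expires=True)

# Later run: load the saved cookies into a fresh jar.
restored = cookielib.LWPCookieJar(cookie_file)
restored.load(ignore_discard=True, ignore_expires=True)
names = [c.name for c in restored]
print(names)
```

A jar restored this way can be handed to br.set_cookiejar() exactly like the empty jar in the setup code.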

To set the proxy for http:

    # Proxy and user/password
    br.set_proxies({"http": "joe:password@myproxy.example.com:3128"})
    # Proxy
    br.set_proxies({"http": "myproxy.example.com:3128"})
    # Proxy password
    br.add_proxy_password("joe", "password")
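
A proxy value can carry credentials in the form user:password@host:port. As a rough illustration of how such a string breaks down, the following standard-library sketch splits it into its parts. The split_proxy helper is hypothetical and is not how mechanize parses the value internally.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2


def split_proxy(proxy):
    """Split 'user:password@host:port' into its parts (hypothetical helper)."""
    # Prefixing '//' makes urlsplit treat the string as a network location.
    parts = urlsplit("//" + proxy)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
    }


with_auth = split_proxy("joe:password@myproxy.example.com:3128")
without_auth = split_proxy("myproxy.example.com:3128")
print(with_auth)
print(without_auth)
```

When the string has no credentials, the user and password entries are None, which corresponds to the case where add_proxy_password is used separately.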

Going back:

Print the URL to verify that the browser went back:

    # Back
    br.back()
    print(br.geturl())

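Simulating a Google or Baidu search:

Print and select the forms, fill in the corresponding field, and submit the form to complete the operation:

    for f in br.forms():
        print(f)
    br.select_form(nr=0)

Searching Google for football:

    br.form['q'] = 'football'
    br.submit()
    print(br.response().read())

Searching Baidu for football:

    br.form['wd'] = 'football'
    br.submit()
    print(br.response().read())

The name of the field for each site can be found by printing the forms.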
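
To show what that field lookup amounts to, here is a small standard-library sketch that lists the name attributes of the input elements in a page. The sample HTML is made up for the illustration and is much simpler than a real search page. With mechanize you would normally just print br.forms() as above.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class InputNameParser(HTMLParser):
    """Collect the name attribute of every <input> tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attributes = dict(attrs)
            if "name" in attributes:
                self.names.append(attributes["name"])


# Made-up sample page: one search form with a text field named 'wd'.
SAMPLE_HTML = """
<form action="/s" method="get">
  <input type="text" name="wd">
  <input type="submit" value="Search">
</form>
"""

parser = InputNameParser()
parser.feed(SAMPLE_HTML)
print(parser.names)
```

The names collected here are the keys you would use in br.form[...] after selecting the form.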

For more information, see the official website.

In addition, because mechanize can visit a page over and over like a browser, it can be used to inflate the view count of blogs, including those on CSDN. To be clear, I only tested this with 10 visits and did not go any further. Inflating view counts is petty and pointless: writing a blog is about summarizing what you have learned and helping others by sharing experience, so chasing traffic and points is meaningless, and I advise you not to do it either. It is also easy to detect, and the consequences of being caught are serious. The simple script is as follows. It visits a page 100 times at 1-second intervals:

    #!/usr/bin/env python
    import mechanize
    import cookielib  # Python 2 name; on Python 3: import http.cookiejar as cookielib
    from time import ctime, sleep

    def run():
        print('start!')
        for i in range(100):
            browse()
            print("run %d times %s" % (i, ctime()))
            sleep(1)

    def browse():
        # Set up a fresh browser for every visit
        br = mechanize.Browser()
        cj = cookielib.LWPCookieJar()
        br.set_cookiejar(cj)
        br.set_handle_equiv(True)
        br.set_handle_gzip(True)
        br.set_handle_redirect(True)
        br.set_handle_referer(True)
        br.set_handle_robots(False)
        br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
        br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
        r = br.open('http://www.baidu.com')
        html = r.read()
        #print(html)

    run()
    print("!!!!!!!!!!!!!!!!!! all over !!!!!!!!!!!!!!!!!!\n%s" % ctime())

I am still a student, so please point out anything I got wrong.

If you repost this, please credit the source:

http://blog.csdn.net/sunmc1204953974
