Using mechanize in Python to simulate a browser

Source: Internet
Author: User

I used to simulate a browser with my own urllib2 code to visit web pages and perform other operations, but many sites returned errors or garbled content. After switching to mechanize to simulate the browser, these problems no longer appeared, so I recommend it here.
mechanize replaces part of urllib2's functionality; it simulates browser behavior better and gives more complete control over web access.
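
One common cause of "garbled" responses is a gzip-compressed body that is printed without being decompressed. The article does not say this was the cause here, so treat the following as a sketch under that assumption. It uses only the Python 3 standard library, and the helper name decode_body and the sample page are made up for illustration.

```python
import gzip


def decode_body(raw, content_encoding=None, charset="utf-8"):
    """Return response text, decompressing first if the body is gzip-encoded."""
    if content_encoding == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode(charset)


# Simulate a server that sent a gzip-compressed page (hypothetical content).
page = "<html><title>hello</title></html>"
compressed = gzip.compress(page.encode("utf-8"))

# Printing the compressed bytes directly is what looks "garbled".
print(compressed[:10])

# After decompression the original markup comes back.
print(decode_body(compressed, content_encoding="gzip"))
```

A browser-like client does this step automatically, which is presumably what a gzip-handling option is for.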

Let's start with installation, taking Ubuntu as an example.

Most third-party Python packages are installed the standard way: download the package from the official website, unpack it into a folder, and run this command inside that folder:

python setup.py install

Official website:
http://wwwsearch.sourceforge.net/mechanize/

A more convenient way is to install the easy_install tool first.

Normally, to install a third-party Python package we have to download the archive, unpack it into a folder, open that folder in a command line or terminal, and run
python setup.py install
to install it.

With easy_install, we can instead run
easy_install xxx
directly on the command line, and the latest version of the xxx package is downloaded and installed.
So easy_install simply makes installing third-party packages easier.

Installation method:

First download the easy_install installation package (setuptools):
http://pypi.python.org/pypi/setuptools

Download the version that matches your Python. On Windows, just run the EXE installer.

On Linux, the egg can be run directly, for example:
sh setuptools-0.6c9-py2.4.egg

After installation, easy_install is automatically copied to the bin folder, which is on our PATH, so the easy_install command can be run directly in a terminal.

There is also a simpler installation method on Ubuntu.

The command to install easy_install is:

sudo apt-get install python-setuptools

Then install mechanize with easy_install:

sudo easy_install mechanize
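
To confirm that the installation worked without running any browser code, one option is to ask Python whether the module can be found. This is a minimal sketch using the Python 3 standard library; the helper name is_installed is an assumption made for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Prints True once mechanize is installed for this interpreter, False otherwise.
print("mechanize installed:", is_installed("mechanize"))
```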

After installation you can start using it. First, the code that emulates a browser:

    import mechanize
    import cookielib  # Python 2 module; named http.cookiejar in Python 3

    # Browser
    br = mechanize.Browser()

    # Cookie jar
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)

    # Browser options
    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)

    # Follows refresh 0 but does not hang on refresh > 0
    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

    # Want debugging messages?
    #br.set_debug_http(True)
    #br.set_debug_redirects(True)
    #br.set_debug_responses(True)

    # User-Agent (this is cheating, ok?)
    br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

This gives us a browser instance, the br object.
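
As a side note on the cookie jar: an LWPCookieJar can also be written to disk and read back, which is one way to keep a session between runs. The sketch below uses the Python 3 standard library module http.cookiejar (the Python 3 name of cookielib) on its own, without mechanize. The cookie name, value, domain and the helper make_cookie are made up for illustration.

```python
import os
import tempfile
from http.cookiejar import Cookie, LWPCookieJar


def make_cookie(name, value, domain):
    """Build a simple session-style cookie for the given domain."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# Store a cookie, write the jar to disk, then read it back into a new jar.
path = os.path.join(tempfile.mkdtemp(), "cookies.lwp")
jar = LWPCookieJar(path)
jar.set_cookie(make_cookie("session", "abc123", "example.com"))
jar.save(ignore_discard=True)  # session cookies are skipped unless asked for

restored = LWPCookieJar(path)
restored.load(ignore_discard=True)
print([(c.name, c.value) for c in restored])
```

The ignore_discard flag matters because a cookie without an expiry date is treated as a session cookie and would otherwise not be saved.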

With the br object, you can manipulate web pages:

    # Open some site, let's pick a random one, the first that pops in mind:
    r = br.open('http://www.baidu.com')
    html = r.read()

    # Show the source
    print(html)
    # or
    print(br.response().read())

    # Show the html title
    print(br.title())

    # Show the response headers
    print(r.info())
    # or
    print(br.response().info())

    # Show the available forms
    for f in br.forms():
        print(f)

    # Select the first (index zero) form
    br.select_form(nr=0)

    # Let's search ('wd' is Baidu's search field; on Google it is 'q')
    br.form['wd'] = 'weekend codes'
    br.submit()
    print(br.response().read())

    # Looking at some results in link format
    for l in br.links(url_regex='stockrt'):
        print(l)

If the site you want to visit requires authentication (HTTP basic auth), then:

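    # If the protected site doesn't receive the credentials, you end up
    # with an HTTP error (typically 401 Unauthorized) in your face
    br.add_password('http://safe-site.domain', 'username', 'password')
    br.open('http://safe-site.domain')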
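
For background, HTTP basic auth simply sends the user name and password, joined by a colon and base64-encoded, in an Authorization header. The sketch below shows that encoding with the Python 3 standard library. The helper name basic_auth_header is an assumption made for this example and is not part of mechanize.

```python
import base64


def basic_auth_header(username, password):
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


print(basic_auth_header("username", "password"))
# -> Basic dXNlcm5hbWU6cGFzc3dvcmQ=
```

Since base64 is an encoding and not encryption, basic auth is only safe over HTTPS.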

In addition, the cookie jar set up earlier stores and re-sends session cookies, and the browser history can be managed as well. There are many other uses; for example, downloading a file:

    # Download
    f = br.retrieve('http://www.google.com.br/intl/pt-BR_br/images/logo.gif')[0]
    print(f)
    fh = open(f)

To set an HTTP proxy:

    # Proxy and user/password
    br.set_proxies({"http": "joe:password@myproxy.example.com:3128"})

    # Proxy
    br.set_proxies({"http": "myproxy.example.com:3128"})
    # Proxy password
    br.add_proxy_password("joe", "password")
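
The two proxy forms above differ only in whether the credentials are embedded in the proxy string. To make the pieces of such a string explicit, here is a small sketch that splits it with the Python 3 standard library. The helper name parse_proxy is an assumption made for this example, and mechanize does its own parsing internally.

```python
from urllib.parse import urlsplit


def parse_proxy(spec):
    """Split a scheme-less 'user:password@host:port' proxy string into its parts."""
    parts = urlsplit("//" + spec)  # the leading // makes urlsplit treat it as a netloc
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
    }


print(parse_proxy("joe:password@myproxy.example.com:3128"))
print(parse_proxy("myproxy.example.com:3128"))
```

When the string carries no credentials, user and password come back as None, which corresponds to the case where add_proxy_password supplies them separately.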

Going back in the history (back):

Print the URL to verify that the browser went back:

    # Back
    br.back()
    print(br.geturl())

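Simulating Google and Baidu queries:

That is, print the forms and select one, then fill in the corresponding key and submit the form to complete the operation.
    for f in br.forms():
        print(f)
    br.select_form(nr=0)
Query Google for football:
    br.form['q'] = 'football'
    br.submit()
    print(br.response().read())
Query Baidu for football:
    br.form['wd'] = 'football'
    br.submit()
    print(br.response().read())
The corresponding key names can be found by printing the forms.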
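
To illustrate what "finding the key names" means without any network access, the sketch below lists the field names of each form in a piece of HTML using the Python 3 standard library. The class FormFieldLister, the function list_form_fields and the sample markup are made up for this example; with mechanize itself, printing each form from br.forms() serves the same purpose.

```python
from html.parser import HTMLParser


class FormFieldLister(HTMLParser):
    """Collect the names of input fields, grouped by the form they belong to."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one list of field names per <form>
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append([])
            self._in_form = True
        elif tag in ("input", "select", "textarea") and self._in_form:
            name = attrs.get("name")
            if name:  # unnamed controls (e.g. a bare submit button) are skipped
                self.forms[-1].append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def list_form_fields(html):
    """Return a list with one list of field names for each form in the HTML."""
    parser = FormFieldLister()
    parser.feed(html)
    return parser.forms


# Hypothetical search page with a single form and one named field.
sample = '<form action="/s"><input name="wd"><input type="submit" value="Search"></form>'
print(list_form_fields(sample))
```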

For more information, see the official website.

Besides ordinary browsing, simulating a browser with mechanize to request a page repeatedly can also inflate the view counts of blogs, including those on CSDN. To be clear, I only ran one test of 10 visits and will not do more; inflating view counts is a petty thing to do. Writing a blog is meant to summarize what you have learned, help others, and share experience; chasing view counts or points is pointless, and I advise you not to mess around with it. It is also very easy to detect, and the consequences of being caught are serious. A simple script looks like the following; it requests a page 100 times at 1-second intervals:

    #!/usr/bin/env python
    import mechanize
    import cookielib  # Python 2 module; named http.cookiejar in Python 3
    from time import ctime, sleep

    def run():
        print('start!')
        for i in range(100):
            browse()
            print("run %d times %s" % (i, ctime()))
            sleep(1)

    def browse():
        br = mechanize.Browser()
        cj = cookielib.LWPCookieJar()
        br.set_cookiejar(cj)
        br.set_handle_equiv(True)
        br.set_handle_gzip(True)
        br.set_handle_redirect(True)
        br.set_handle_referer(True)
        br.set_handle_robots(False)
        br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
        br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
        r = br.open('http://www.baidu.com')
        html = r.read()
        #print(html)

    run()
    print("!!!!!!!!!!!!!!!!!! all over !!!!!!!!!!!!!!!!!!\n%s" % ctime())

I am still a student, so please point out anything I got wrong.

When reprinting, please cite the source:

http://blog.csdn.net/sunmc1204953974
