Using mechanize in Python to simulate a browser
I used to simulate a browser with my own urllib2 code for visiting web pages and similar operations, but many sites returned errors or garbled text. After switching to mechanize to simulate the browser, these problems no longer appeared. It is really useful, so I recommend it here.
mechanize replaces part of the functionality of urllib2; it simulates browser behavior better and gives more comprehensive control over web access.
Let's start with the installation and take the Ubuntu system as an example:
Most third-party Python packages are installed the standard way: download the package from the official website, unzip it to a folder, and run this command inside that folder:
python setup.py install
Official website:
http://wwwsearch.sourceforge.net/mechanize/
A more convenient way is to install the easy_install tool first.
Normally, to install a third-party extension package for Python, we have to download the compressed package, unzip it to a folder, open that folder in a command line or terminal, and then run
python setup.py install
to install it.
With easy_install, we can simply run
easy_install xxx
on the command line, and the latest version of the xxx package is downloaded and installed.
So easy_install simply makes installing third-party extension packages easier.
Installation method:
First download the easy_install installation package (setuptools):
http://pypi.python.org/pypi/setuptools
Download the version that matches your Python. On Windows, just run the EXE installer.
On Linux, the egg can be executed directly:
sh setuptools-0.6c9-py2.4.egg
After the installation completes, easy_install is automatically copied to the bin folder, which is on our PATH, so the easy_install command can be run directly in the terminal.
There is also a simpler way on Ubuntu.
The command for installing easy_install is:
sudo apt-get install python-setuptools
Then install mechanize with easy_install:
sudo easy_install mechanize
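A quick way to confirm that the installation worked is to check whether the package can be imported. The following is only a minimal sketch using the Python 3 standard library; the helper name is_installed is made up for illustration and is not part of mechanize or setuptools.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Should report True once mechanize has been installed successfully.
    print("mechanize installed:", is_installed("mechanize"))
```

The check only looks the package up; it does not import it, so it is safe to run even when mechanize is missing.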
After installation, you are ready to go. First, the code that sets up a simulated browser:
import mechanize
try:
    import cookielib                    # Python 2
except ImportError:
    import http.cookiejar as cookielib  # Python 3

# Browser
br = mechanize.Browser()

# Cookie jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Follows refresh 0 but does not hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# Want debugging messages?
#br.set_debug_http(True)
#br.set_debug_redirects(True)
#br.set_debug_responses(True)

# User-agent (this is cheating, OK?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
This gives us a browser instance, the br object.
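For comparison, roughly the same setup (a cookie jar plus a custom User-agent) can be sketched with the Python 3 standard library alone, urllib.request being the successor of urllib2. This is an illustration, not what mechanize does internally; build_opener_like_browser is a hypothetical helper name, and nothing is fetched from the network here.

```python
import http.cookiejar
import urllib.request

USER_AGENT = ("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) "
              "Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1")


def build_opener_like_browser():
    """Build an opener that keeps cookies and sends a browser User-agent."""
    cj = http.cookiejar.LWPCookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cj))
    # Replace the default "Python-urllib/x.y" header
    opener.addheaders = [("User-agent", USER_AGENT)]
    return opener, cj


opener, cj = build_opener_like_browser()
print(opener.addheaders)
print(len(cj))  # number of cookies stored so far
```

Forms, history and the refresh and robots.txt options shown above would all have to be written by hand on top of this, which is where mechanize saves the work.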
With this object, you can work with web pages:
# Open some site, let's pick a random one, the first that pops in mind:
r = br.open('http://www.baidu.com')
html = r.read()

# Show the source
print(html)
# or
print(br.response().read())

# Show the html title
print(br.title())

# Show the response headers
print(r.info())
# or
print(br.response().info())

# Show the available forms
for f in br.forms():
    print(f)

# Select the first (index zero) form
br.select_form(nr=0)

# Let's search (Baidu names its search field 'wd'; on Google it is 'q')
br.form['wd'] = 'weekend codes'
br.submit()
print(br.response().read())

# Looking at some results in link format
for l in br.links(url_regex='stockrt'):
    print(l)
If the site to be visited requires authentication (HTTP basic auth), then:
# Without credentials, a protected site would end with a 401 error in your face
br.add_password('http://safe-site.domain', 'username', 'password')
br.open('http://safe-site.domain')
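To see what HTTP basic auth actually sends, the sketch below builds the Authorization header by hand with the standard library. It only illustrates the scheme (user name and password joined by a colon and Base64-encoded); it is not mechanize's implementation, and basic_auth_header is a name invented for this example.

```python
import base64


def basic_auth_header(username, password):
    """Return the value of the Authorization header for HTTP basic auth."""
    raw = ("%s:%s" % (username, password)).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


print(basic_auth_header("username", "password"))
# -> Basic dXNlcm5hbWU6cGFzc3dvcmQ=
```

The credentials are only encoded, not encrypted, so basic auth is best used over HTTPS.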
Because a cookie jar is attached, session cookies are stored and re-sent automatically, and the browser history can be managed as well. There are many other uses, for example downloading a file:
# Download
f = br.retrieve('http://www.google.com.br/intl/pt-BR_br/images/logo.gif')[0]
print(f)  # path of the local file that was saved
fh = open(f, 'rb')
To set the proxy for http:
# Proxy and user/password
br.set_proxies({"http": "joe:password@myproxy.example.com:3128"})

# Proxy
br.set_proxies({"http": "myproxy.example.com:3128"})
# Proxy password
br.add_proxy_password("joe", "password")
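The first form of the proxy string packs user, password, host and port into one value. As an illustration of how such a value breaks down, here is a small sketch using urllib.parse from the Python 3 standard library; split_proxy is a hypothetical helper, the values are examples only, and mechanize may parse the string differently internally.

```python
from urllib.parse import urlsplit


def split_proxy(proxy):
    """Split 'user:password@host:port' into its four parts."""
    # A leading '//' makes urlsplit treat the value as a network location.
    parts = urlsplit("//" + proxy)
    return parts.username, parts.password, parts.hostname, parts.port


print(split_proxy("joe:password@myproxy.example.com:3128"))
# -> ('joe', 'password', 'myproxy.example.com', 3128)
print(split_proxy("myproxy.example.com:3128"))
# -> (None, None, 'myproxy.example.com', 3128)
```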
Going back:
Print the URL to verify that the browser went back.
# Back
br.back()
print(br.geturl())
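Simulating a Google and a Baidu query:
That is, print and select the forms, then fill in the corresponding key values and submit via POST to complete the operation.
for f in br.forms():
    print(f)
br.select_form(nr=0)
Google query for football:
br.form['q'] = 'football'
br.submit()
print(br.response().read())
Baidu query for football:
br.form['wd'] = 'football'
br.submit()
print(br.response().read())
The corresponding key names can be found by printing the forms.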
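If you would rather inspect the field names without mechanize, the same information can be pulled from the page's HTML. Below is a minimal sketch with html.parser from the Python 3 standard library; the class name InputNameCollector and the sample HTML are made up for the example and are not the real Baidu page.

```python
from html.parser import HTMLParser


class InputNameCollector(HTMLParser):
    """Collect the name attribute of every <input> tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            name = dict(attrs).get("name")
            if name:
                self.names.append(name)


# Made-up sample page, standing in for a real search form
SAMPLE_HTML = """
<form action="/s" method="get">
  <input type="text" name="wd">
  <input type="submit" value="Search">
</form>
"""

collector = InputNameCollector()
collector.feed(SAMPLE_HTML)
print(collector.names)  # -> ['wd']
```

The submit button has no name attribute, so only the text field shows up; that name is the key to use in br.form[...].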
More information is available on the official website.
Besides ordinary browsing, a mechanize-simulated browser that keeps visiting a page can also be used to inflate the view count of all kinds of blogs, CSDN included. To be clear, I only ran a test of 10 visits and did not go any further; after all, inflating view counts is a petty thing to do. It also makes no sense: a blog is written to summarize what you have learned, to help others and to share experience, so chasing view counts or points is pointless. I advise you not to fool around with this. It is also very easy to detect, and the consequences of being caught are serious. A simple script looks like the following; it visits a page 100 times at an interval of 1 second:
#!/usr/bin/env python
import mechanize
try:
    import cookielib                    # Python 2
except ImportError:
    import http.cookiejar as cookielib  # Python 3
from time import ctime, sleep

def run():
    print('start!')
    for i in range(100):
        browse()
        print("run %d times %s" % (i, ctime()))
        sleep(1)

def browse():
    # Build a fresh browser for every visit
    br = mechanize.Browser()
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)
    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)
    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
    br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
    r = br.open('http://www.baidu.com')
    html = r.read()
    #print(html)

run()
print("!!!!!!!!!!!!!!!!!! All over !!!!!!!!!!!!!!!!!!\n%s" % ctime())
I am still a student; if anything here is badly written, please correct me.
Please credit the source when reprinting:
http://blog.csdn.net/sunmc1204953974