I recently read about web crawlers and simulated login, and came across a package called mechanize [ˈmekənaɪz]. The name means "to mechanize", that is, to automate.
mechanize.Browser and mechanize.UserAgentBase implement the interface of urllib2.OpenerDirector, so:

Any URL can be opened, not just http:
mechanize.UserAgentBase offers easy dynamic configuration of user-agent features like protocol, cookie, redirection and robots.txt handling, without having to make a new OpenerDirector each time, e.g. by calling build_opener().
Easy HTML form filling.
Convenient link parsing and following.
Browser history (.back() and .reload() methods).
The Referer HTTP header is added properly (optional).
Automatic observance of robots.txt.
Automatic handling of HTTP-EQUIV and Refresh.
That is to say, both mechanize.Browser and mechanize.UserAgentBase implement urllib2.OpenerDirector, so any protocol can be opened, not just HTTP. In addition, mechanize provides a simpler way to configure behavior than creating a new OpenerDirector each time, and it supports operations on forms, browsing history and page reloading, link following, robots.txt checking, and so on.
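The robots.txt checking that mechanize does automatically is something you would otherwise code by hand. As a rough illustration of what that check involves, here is a sketch using the standard library's robot-exclusion parser (shown with Python 3's urllib.robotparser; the module was called robotparser in Python 2). The rules string is a made-up example, not from mechanize.

```python
import urllib.robotparser

# A made-up robots.txt body, for illustration only.
rules = """\
User-agent: *
Disallow: /private/
"""

rp = urllib.robotparser.RobotFileParser()
# Normally you would call rp.set_url(...) and rp.read(); parsing a literal
# string keeps this sketch offline.
rp.parse(rules.splitlines())

# mechanize performs an equivalent check before each fetch when
# robots.txt handling is enabled.
print(rp.can_fetch("*", "http://www.example.com/private/secret.html"))  # False
print(rp.can_fetch("*", "http://www.example.com/public/index.html"))    # True
```

When mechanize's robots.txt handling is on and a fetch is disallowed, the browser refuses the request instead of silently ignoring the rule.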
import re
import mechanize

(1) Instantiate a Browser object:

    br = mechanize.Browser()

(2) Open a URL:

    br.open("http://www.example.com/")

(3) Follow the second link on the page whose text matches text_regex:

    # follow second link with element text matching regular expression
    response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1)
    assert br.viewing_html()

(4) The page title:

    print br.title()

(5) Print the page's URL:

    print response1.geturl()

(6) The page headers:

    print response1.info()  # headers

(7) The page body:

    print response1.read()  # body

(8) Select the form named "order":

    br.select_form(name="order")
    # Browser passes through unknown attributes (including methods)
    # to the selected HTMLForm.

(9) Assign a value to the form control named "cheeses":

    br["cheeses"] = ["mozzarella", "caerphilly"]  # (the method here is __setitem__)
    # Submit current form.  Browser calls .close() on the current response
    # on navigation, so this closes response1.

(10) Submit:

    response2 = br.submit()
    # print currently selected form (don't call .submit() on this, use br.submit())
    print br.form

(11) Go back:

    response3 = br.back()  # back to cheese shop (same data as response1)
    # the history mechanism returns cached response objects
    # we can still use the response, even though it was .close()d
    response3.get_data()  # like .seek(0) followed by .read()

(12) Reload the page:

    response4 = br.reload()  # fetches from server

(13) List all the forms on the page:

    for form in br.forms():
        print form
    # .links() optionally accepts the keyword args of .follow_/.find_link()
    for link in br.links(url_regex="python.org"):
        print link
        br.follow_link(link)  # takes EITHER Link instance OR keyword args
        br.back()
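The text_regex and url_regex arguments in steps (3) and (13) are ordinary Python regular expressions applied to each link. As a standalone illustration (independent of mechanize) of how the pattern r"cheese\s*shop" from the example behaves:

```python
import re

pattern = re.compile(r"cheese\s*shop")

# \s* allows any amount of whitespace between the two words.
print(bool(pattern.search("visit the cheese shop today")))  # True
print(bool(pattern.search("visit the cheeseshop today")))   # True
print(bool(pattern.search("visit the Cheese Shop today")))  # False (case-sensitive)
```

If you need a case-insensitive match, compile the pattern with re.IGNORECASE.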
This is the example provided in the mechanize documentation; the basic explanation is given in the code comments.
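The form handling in steps (8)-(10) is the part mechanize adds over plain urllib2: it parses the page's HTML, finds the named form, and tracks each control's value for you. A minimal sketch of that idea using only the standard library's html.parser (the class and names here are my own, not mechanize's API):

```python
from html.parser import HTMLParser

class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of <input> controls -- a tiny slice of what
    mechanize's select_form()/__setitem__ machinery manages for you."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attrs = dict(attrs)
            if "name" in attrs:
                # Record the control's current value (empty if none given).
                self.fields[attrs["name"]] = attrs.get("value", "")

collector = FormFieldCollector()
collector.feed('<form name="order">'
               '<input name="cheeses" value="mozzarella">'
               '<input type="submit" name="go">'
               '</form>')
print(collector.fields)  # {'cheeses': 'mozzarella', 'go': ''}
```

mechanize goes much further, of course: it also handles select/checkbox/radio controls, encodes the submission, and issues the request.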
You may control the Browser's policy by using the methods of mechanize.Browser's base class, mechanize.UserAgent. The following code, also taken from the documentation, shows how:
br = mechanize.Browser()

# Explicitly configure proxies (Browser will attempt to set good defaults).
# Note the userinfo ("joe:[email protected]") and port number (":3128") are optional.
br.set_proxies({"http": "joe:[email protected]:3128",
                "ftp": "proxy.example.com",
                })

# Add HTTP Basic/Digest auth username and password for HTTP proxy access.
# (equivalent to using the "joe:[email protected]" form above)
br.add_proxy_password("joe", "password")

# Add HTTP Basic/Digest auth username and password for website access.
br.add_password("http://example.com/protected/", "joe", "password")

# Don't handle HTTP-EQUIV headers (HTTP headers embedded in HTML).
br.set_handle_equiv(False)

# Ignore robots.txt.  Do not do this without thought and consideration.
br.set_handle_robots(False)

# Don't add Referer (sic) header
br.set_handle_referer(False)

# Don't handle Refresh redirections
br.set_handle_refresh(False)

# Don't handle cookies
br.set_cookiejar()

# Supply your own mechanize.CookieJar (NOTE: cookie handling is ON by
# default: no need to do this unless you have some reason to use a
# particular cookiejar)
br.set_cookiejar(cj)

# Log information about HTTP redirects and Refreshes.
br.set_debug_redirects(True)

# Log HTTP response bodies (ie. the HTML, most of the time).
br.set_debug_responses(True)

# Print HTTP headers.
br.set_debug_http(True)

# To make sure you're seeing all debug output:
logger = logging.getLogger("mechanize")
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.INFO)

# Sometimes it's useful to process bad headers or bad HTML:
response = br.response()  # this is a copy of response
headers = response.info()  # currently, this is a mimetools.Message
headers["Content-type"] = "text/html; charset=utf-8"
response.set_data(response.get_data().replace("<!---", "<!--"))
br.set_response(response)
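The last few lines above patch up a server's bad output before handing it back to the Browser. The comment-repair step is plain Python string replacement; here is a self-contained sketch of just that step (the broken markup is invented for illustration):

```python
# A malformed comment opener "<!---" confuses some HTML parsers.
bad_html = "<html><!--- navigation --><body>hello</body></html>"

# Same fix as in the mechanize example: rewrite "<!---" to "<!--".
good_html = bad_html.replace("<!---", "<!--")
print(good_html)  # <html><!-- navigation --><body>hello</body></html>
```

In the mechanize example this cleaned-up body is pushed back into the browser with response.set_data() and br.set_response(), so later form and link parsing sees the repaired HTML.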
In addition, there are some web-page interaction modules similar to mechanize, and there are several wrappers around mechanize designed for functional testing of web applications. In the final analysis, they all encapsulate urllib2, so pick whichever module suits you best!
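To see what "they all encapsulate urllib2" means, here is a rough sketch of the manual equivalent of mechanize's cookie handling, written with Python 3's urllib.request (the successor of urllib2). mechanize saves you from wiring this up yourself; the user-agent string is a made-up example.

```python
import http.cookiejar
import urllib.request

# mechanize keeps a CookieJar for you; with plain urllib you attach one by hand.
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cj),
)
opener.addheaders = [("User-agent", "my-crawler/0.1")]

# The opener is itself an OpenerDirector -- the very interface the article
# says mechanize.Browser implements -- so this check needs no network access.
print(isinstance(opener, urllib.request.OpenerDirector))  # True

# opener.open("http://www.example.com/") would now send and store cookies
# automatically, much like br.open() does -- but with none of mechanize's
# form filling, link following, or history.
```

Everything mechanize adds (forms, links, history, robots.txt) sits on top of an OpenerDirector like this one.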