Automatically fetching and submitting web pages with Python

Source: Internet
Author: User
I recently looked into how to build an automatic posting tool. Finishing it is hard: the verification code (CAPTCHA) is a big problem, and I have not thought of a solution yet. In any case, the basic approach is to fetch a page, analyze it, and submit its form. The code below is written in Python and uses lxml for HTML parsing. From what I have read online, lxml is the fastest parser available, but I have not verified this myself. Okay, on to the code.

The code is as follows:


import urllib.parse
import urllib.request

import lxml.html


def url_with_query(url, values):
    # Replace the URL's query string with the encoded values; drop the fragment.
    parts = urllib.parse.urlparse(url)
    rest, (query, frag) = parts[:-2], parts[-2:]
    return urllib.parse.urlunparse(rest + (urllib.parse.urlencode(values), ""))


def make_open_http():
    # One opener is shared by all requests, so cookies persist between them.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor())
    opener.addheaders = []  # pretend we're a human -- don't do this

    def open_http(method, url, values=()):
        if method == "POST":
            # The request body must be bytes in Python 3.
            return opener.open(url, urllib.parse.urlencode(values).encode("ascii"))
        else:
            return opener.open(url_with_query(url, values))

    return open_http


open_http = make_open_http()
tree = lxml.html.fromstring(open_http("GET", "http://www.jb51.net").read())
form = tree.forms[0]  # assumes the first form on the page is the one we want
form.fields["q"] = "eplussoft"
form.action = "http://www.jb51.net/search"
response = lxml.html.submit_form(form, open_http=open_http)
html = response.read()
doc = lxml.html.fromstring(html)
lxml.html.open_in_browser(doc)
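
It may not be obvious what the url_with_query helper does to a URL that already has a query string. The sketch below restates the helper for Python 3 using only the standard library and runs it offline. The example.com URL and the field values are made up for illustration; they are not from the site used above.

```python
from urllib.parse import urlencode, urlparse, urlunparse


def url_with_query(url, values):
    """Return url with its query string replaced by the encoded values."""
    parts = urlparse(url)
    # Keep scheme, netloc, path and params; drop the old query and fragment.
    rest = tuple(parts[:-2])
    return urlunparse(rest + (urlencode(values), ""))


# Hypothetical URL and form values, for illustration only.
example = url_with_query(
    "http://example.com/search?old=1#top",
    {"q": "python lxml", "page": 2},
)
print(example)  # http://example.com/search?q=python+lxml&page=2
```

Note that the old query (old=1) and the fragment (#top) are discarded rather than merged, so for a GET request the form values become the whole query string.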


As I said, the verification code is a big problem. Today I looked at Baidu Post Bar, where it is even worse: its verification code is an image fetched via AJAX, which makes it more troublesome. It seems that most forums and blogs handle verification codes the same way. As a result, the page fetched the first time contains no verification code image at all, so there is nothing to analyze. There are still many problems to solve...
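
This does not solve the verification code problem, but the tool can at least tell whether the page it fetched contains a candidate image at all. Here is a rough sketch of that check with lxml. The function name find_captcha_images, the keyword list, and both HTML snippets are my own assumptions for illustration; real sites name these elements differently, so the keywords would need adjusting for each site.

```python
import lxml.html

# Hypothetical keywords; real sites name their captcha elements differently.
CAPTCHA_HINTS = ("captcha", "verify", "vcode")


def find_captcha_images(html):
    """Return the src values of <img> tags that look like captcha images."""
    doc = lxml.html.fromstring(html)
    hits = []
    for img in doc.iter("img"):
        # Search the attributes most likely to name the image's purpose.
        haystack = " ".join(
            img.get(attr, "") for attr in ("src", "id", "class", "alt")
        ).lower()
        if any(hint in haystack for hint in CAPTCHA_HINTS):
            hits.append(img.get("src", ""))
    return hits


# Made-up pages: one embeds the image directly, the other only has an
# empty placeholder that a script would fill in later.
static_page = '<form><img id="captcha" src="/captcha.png"><input name="code"></form>'
ajax_page = '<form><div id="captcha-box"></div><input name="code"></form>'

print(find_captcha_images(static_page))  # ['/captcha.png']
print(find_captcha_images(ajax_page))    # []
```

An empty result on a page that clearly asks for a code suggests the image is loaded later by a script, which is the case described above. The tool would then have to find and request the image URL separately.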
