Use cookielib and urllib2 in Python with PyQuery to capture web page information

This article introduces how to use cookielib and urllib2 in Python, together with PyQuery, to capture web page information; PyQuery does the HTML parsing. Refer to it if you need it. Having just learned this, I suddenly had the idea of building a course schedule, so off to Baidu I went.

At the beginning my thinking was: I had already used urllib2 to fetch a web page (two lines of code), so all that was left was parsing the HTML. So I searched Baidu for "python parse html" and found a good article about PyQuery.

PyQuery is an implementation of jQuery in Python: it lets you parse HTML documents with jQuery-style syntax. You need to install it before use; on a Mac:

sudo easy_install pyquery

OK! Installed!

Let's give it a try:

from pyquery import PyQuery as pq
html = pq(url=u'http://seam.ustb.edu.cn:8080/jwgl/index.jsp')  # now you have the html
classes = html('.haveclass')  # select elements by their class name
# If you are familiar with jQuery, you will appreciate how convenient PyQuery is.
# For more information, see the PyQuery API documentation.
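As a quick illustration of the jQuery-style selectors (the HTML snippet and class names below are invented for this sketch, not taken from the real page), PyQuery can also parse an HTML string directly, which is handy for experimenting offline:

from pyquery import PyQuery as pq

# Invented HTML snippet; only the selector syntax matters here.
doc = pq('<table><tr class="haveclass"><td>Math</td></tr>'
         '<tr><td>(free)</td></tr></table>')
print doc('.haveclass').text()   # jQuery-style class selector -> Math
print doc('td').eq(1).text()     # positional selection -> (free)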

It looks like we now know how to use PyQuery to grab the course table. However, if you run this code as-is, it is bound to fail, because you have not logged in yet!

So, before this snippet can fetch the right page, we need to simulate logging in to the course system. I remembered that urllib can simulate a POST request, so back to Baidu: urllib post.

This is the simplest example of simulating a POST request:

import urllib
import urllib2
import cookielib

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
opener.addheaders = [('User-agent', 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)')]
urllib2.install_opener(opener)
req = urllib2.Request("http://seam.ustb.edu.cn:8080/jwgl/Login",
                      urllib.urlencode({"username": "41255029",
                                        "password": "123456",
                                        "usertype": "student"}))
req.add_header("Referer", "http://xxoo.com")
resp = urllib2.urlopen(req)
# cookielib stores the session cookies returned by the server; I did not know much about it before.
# urllib and urllib2 are both used here; urllib2 is roughly an extension of urllib.
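Continuing from the snippet above (a small sketch that reuses resp, cj, and the installed opener, so it is not standalone), one way to check that the login worked is to look at the response and the cookies the server set; because urllib2.install_opener was called, any later urlopen call automatically sends the session cookie:

print resp.getcode()                 # HTTP status of the login request
print resp.read()[:200]              # first part of the response body
for cookie in cj:                    # cookies stored by cookielib
    print cookie.name, cookie.value

# Later requests go through the same opener and carry the session cookie:
page = urllib2.urlopen("http://seam.ustb.edu.cn:8080/jwgl/index.jsp").read()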

In this minimal example, I submit form data with my campus network account to the login URL to simulate logging in.

Now that we are logged in to the course system, we can parse the HTML with PyQuery and pull the course list out of the page.

html = pq(url=u'http://seam.ustb.edu.cn:8080/jwgl/index.jsp')
self.render("index.html", data=html('.haveclass'))
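One caveat worth noting (my own assumption, not part of the original code): pq(url=...) makes its own HTTP request and does not reuse the cookielib session built above. If the page only shows the course list when you are logged in, a safer sketch is to fetch the HTML through the installed opener and hand the string to PyQuery:

# Hedged alternative: fetch through the logged-in urllib2 opener, then parse.
raw = urllib2.urlopen(u'http://seam.ustb.edu.cn:8080/jwgl/index.jsp').read()
html = pq(raw)                   # parse the already-fetched HTML
classes = html('.haveclass')     # same selector as before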

Result Display

Finally:

I found that PyQuery is not only very convenient for parsing HTML, but can also be used as a tool for cross-origin data capture, as sketched below. Nice!
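To make the cross-origin point concrete, here is a minimal sketch (the URL and selector are placeholders, not from the course site): because PyQuery fetches the page server-side, the browser's same-origin policy simply does not apply.

from pyquery import PyQuery as pq

doc = pq(url='http://example.com/')   # server-side request, no same-origin restriction
print doc('title').text()             # extract data from another site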

I hope this helps.
