pyquery: A Python Library for Parsing HTML

Last Update:2014-02-08 Source: Internet

Author: User

PyQuery is a Python library modeled on jQuery; it can be described as a Python implementation of the jQuery API. It lets you query HTML documents with jQuery-style selectors, and it is easy to use and fast to parse.

For example, take this fragment of a Douban movie page, http://movie.douban.com/subject/3530403/:

Director: Tom Tykwer / Lana Wachowski / Andy Wachowski
Screenwriter: Tom Tykwer / Andy Wachowski / Lana Wachowski
Starring: Tom Hanks / Halle Berry / Jim Broadbent / Hugo Weaving / Jim Sturgess / Bae Doona / Ben Whishaw / James D'Arcy / Zhou Xun / Keith David / David Gyasi / Susan Sarandon / Hugh Grant
Type:
Official website: cloudatlas.warnerbros.com
Country/region of production: Germany / US / Hong Kong / Singapore
Language: English
Release date: (Mainland China) / 2012-10-26 (United States)
Running time: 134 minutes (Mainland China) / 172 minutes (United States)
IMDb link: tt1371111
Official site: Cloud Atlas (movie)

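from pyquery import PyQuery as pq

# Load the page and select every element whose class is "pl".
doc = pq(url='http://movie.douban.com/subject/3530403/')
data = doc('.pl')
for i in data:
    # Iteration yields plain lxml elements, so wrap each one again.
    print(pq(i).text())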

Output

Director
Screenwriter
Starring
Type:
Official website:
Country/region of production:
Language:
Release date:
Running time:
IMDb link:
Official site:

It looks like jQuery.

Usage

You can use the PyQuery class to load XML or HTML documents from strings, lxml objects, files, or URLs:

>>> from pyquery import PyQuery as pq
>>> from lxml import etree
>>> doc=pq("
You can then select elements just as you would in jQuery:
>>> doc('.pl')
[<span.pl>, <span.pl>, <span.pl>, <span.pl>, <span.pl>, <span.pl>, <span.pl>, <span.pl>, <span.pl>, <span.pl>, <span.pl>, <span#rateword.pl>, <span.pl>, <span.pl>, <span.pl>, <span.pl>, <span.pl>, <span.pl>, <span.pl>, <p.pl>]
This selects every element whose class is 'pl'.
However, iterating over the result yields plain lxml elements, so each one must be wrapped in pq() again before calling methods such as text():
>>> for para in doc('.pl'):
...     para = pq(para)
...     print(para.text())
...
Director
Screenwriter
Starring
Type:
Official website:
Country/region of production:
Language:
Release date:
Running time:
IMDb link:
Official site:
The text returned is a unicode string (in Python 2). To write it to a file, you must first encode it as a byte string, for example with .encode('utf-8').
You can also use some pseudo-classes that jQuery provides but standard CSS does not, such as :first:
>>> doc('.pl:first')
[<span.pl>]
>>> print(doc('.pl:first').text())
Director

Attributes
Get the attributes of an HTML element:
>>> p = pq('<p id="hello" class="hello"></p>')('p')
>>> p.attr('id')
'hello'
>>> p.attr.id
'hello'
>>> p.attr['id']
'hello'
Set attributes:
>>> p.attr.id = 'plop'
>>> p.attr.id
'plop'
>>> p.attr['id'] = 'ola'
>>> p.attr.id
'ola'
>>> p.attr(id='hello', class_='hello2')
[<p#hello.hello2>]

Traversing
Filter a selection with a selector or a function:
>>> d = pq('<p id="hello" class="hello"><a/>hello</p><p id="test"><a/>world</p>')
>>> d('p').filter('.hello')
[<p#hello.hello>]
>>> d('p').filter('#test')
[<p#test>]
>>> d('p').filter(lambda i: i == 1)
[<p#test>]
>>> d('p').filter(lambda i: i == 0)
[<p#hello.hello>]
>>> d('p').filter(lambda i: pq(this).text() == 'hello')
[<p#hello.hello>]
Select by index:
>>> d('p').eq(0)
[<p#hello.hello>]
>>> d('p').eq(1)
[<p#test>]
Select nested elements:
>>> d('p').eq(1).find('a')
[<a>]
Step back to the previous selection with end():
>>> d = pq('<p><span><em>Whoah!</em></span></p><p><em> there</em></p>')
>>> d('p').eq(1).find('em')
[<em>]
>>> d('p').eq(1).find('em').end()
[<p>]
>>> d('p').eq(1).find('em').end().text()
'there'
>>> d('p').eq(1).find('em').end().end()
[<p>, <p>]
　　
Download: http://pypi.python.org/pypi/pyquery
Documentation: http://packages.python.org/pyquery/
Selector summary: http://www.cnblogs.com/onlys/articles/jQuery.html

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
