How to use the Python library PyQuery to parse HTML

Source: Internet
Author: User
PyQuery is a Python library modeled on jQuery; you could say it is jQuery implemented in Python. It lets you parse HTML documents with jQuery syntax, and because it is built on lxml it is both easy to use and fast. For example, take the movie information block on a Douban film page (here, Cloud Atlas).

The relevant part of the page reads:



Director: Tom Tykwer/Lana Wachowski/Andy Wachowski

Scriptwriter: Tom Tykwer/Andy Wachowski/Lana Wachowski

Starring: Tom Hanks/Halle Berry/Jim Broadbent/Hugo Weaving/Jim Sturgess/Doona Bae/Ben Whishaw/James D'Arcy/Zhou Xun/Keith David/David Gyasi/Susan Sarandon/Hugh Grant

Type: Drama/Sci-Fi/Mystery

Official website: cloudatlas.warnerbros.com

Production country/region: Germany/United States/Hong Kong/Singapore

Language: English

Release date: (Mainland China)/(United States)

Title length: 134 minutes (Mainland China)/172 minutes (United States)

IMDb link: tt1371111

Official website: Cloud Atlas (movie)


The code is as follows:


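from pyquery import PyQuery as pq

# Download and parse the page
doc = pq(url='http://movie.douban.com/subject/415403/')
# Select every element with class "pl" (the field labels)
data = doc('.pl')
for i in data:
    # Each item is a raw lxml element, so wrap it in pq() to call text()
    print(pq(i).text())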

Output

The code is as follows:


Director
Scriptwriter
Starring
Type:
Official website:
Production country/region:
Language:
Release date:
Title length:
IMDb link:
Official website:

Usage

You can use the PyQuery class to load an XML or HTML document from a string, an lxml element, a file, or a URL:

The code is as follows:


>>> from pyquery import PyQuery as pq
>>> from lxml import etree
>>> doc = pq("<html></html>")
>>> doc = pq(etree.fromstring("<html></html>"))
>>> doc = pq(filename=path_to_html_file)
>>> doc = pq(url='http://movie.douban.com/subject/415403/')

You can select elements with CSS selectors, just as in jQuery:

The code is as follows:


>>> doc('.pl')
[<span.pl>, <span.pl>, <span.pl>, ...]

This selects every element whose class is 'pl'.

When you iterate over the result, however, each item is a plain lxml element, so you have to wrap it in pq() again before calling text():

The code is as follows:


>>> for para in doc('.pl'):
...     para = pq(para)  # re-wrap the lxml element
...     print(para.text())
...
Director
Scriptwriter
Starring
Type:
Official website:
Production country/region:
Language:
Release date:
Title length:
IMDb link:
Official website:

text() returns Unicode text. To write it to a file you need to encode it, for example as UTF-8; in Python 3, opening the file with encoding='utf-8' does this for you.
You can also use some pseudo-classes that jQuery provides but standard CSS does not support, for example:

The code is as follows:


>>> doc('.pl:first')
[<span.pl>]
>>> print(doc('.pl:first').text())
Director

Attributes
Get the attributes of an HTML element:

The code is as follows:


>>> p = pq('<p id="hello" class="hello"></p>')('p')
>>> p.attr('id')
'hello'
>>> p.attr.id
'hello'
>>> p.attr['id']
'hello'

Setting attributes

The code is as follows:


>>> p.attr.id = 'plop'
>>> p.attr.id
'plop'
>>> p.attr['id'] = 'ola'
>>> p.attr['id']
'ola'
>>> p.attr(id='hello', class_='hello2')
[<p#hello.hello2>]

Traversing
Filter

The code is as follows:


>>> d = pq('<p id="hello" class="hello"><a>hello</a></p><p id="test"><a>world</a></p>')
>>> d('p').filter('.hello')
[<p#hello.hello>]
>>> d('p').filter('#test')
[<p#test>]
>>> d('p').filter(lambda i: i == 1)
[<p#test>]
>>> d('p').filter(lambda i: i == 0)
[<p#hello.hello>]
>>> d('p').filter(lambda i: pq(this).text() == 'hello')
[<p#hello.hello>]

Select by index

The code is as follows:


>>> d('p').eq(0)
[<p#hello.hello>]
>>> d('p').eq(1)
[<p#test>]

Find descendant elements

The code is as follows:


>>> d('p').eq(1).find('a')
[<a>]

Return to the previous selection with end()

The code is as follows:


>>> d = pq('<p><span><em>Whoah!</em></span></p><p><em>there</em></p>')
>>> d('p').eq(1).find('em')
[<em>]
>>> d('p').eq(1).find('em').end()
[<p>]
>>> d('p').eq(1).find('em').end().text()
'there'
>>> d('p').eq(1).find('em').end().end()
[<p>, <p>]
