PyQuery is a Python library similar to jQuery. it can also be said that it is implemented by jQuery in Python. it can use jQuery syntax to parse HTML documents. it is easy to use and fast to parse. for example
The code is as follows:
Director: Tom tykwe/Larna vdrosky/Andy vdroskski
Scriptwriter: Tom ticwe/Andy wozowski/Larna wojosky
Starring: tom Hanks/Haly Berry/Jim Braud bunt/Hugo Vivin/Jim stgis/Pei Douna/Ben Wei Xiao/James Xi/Zhou Xun/case David/David giyaxi/ susan Jordan/Hugh Grant
Type: Plot/sci-fi/Suspense
Official website: cloudatlas.warnerbros.com
Production country/region: Germany/United States/Hong Kong/Singapore
Language: English
Release date: (Mainland China)/(United States)
Title length: 134 minutes (Mainland China)/172 minutes (United States)
IMDb link: tt1371111
Official website:
Movie cloud maps
The code is as follows:
From pyquery import PyQuery as pq
Doc = pq (url = 'http: // movie.douban.com/subject/415403 /')
Data = doc ('. pl ')
For I in data:
Print pq (I). text ()
Output
The code is as follows:
Director
Scriptwriter
Starring
Type:
Official website:
Production country/region:
Language:
Release date:
Title length:
IMDb link:
Official website:
Usage
You can use the PyQuery class to load xml documents from strings, lxml objects, files, or URLs:
The code is as follows:
>>> From pyquery import PyQuery as pq
>>> From lxml import etree
>>> Doc = pq ("")
>>> Doc = pq (etree. fromstring (""))
>>> Doc = pq (filename = path_to_html_file)
>>> Doc = pq (url = 'http: // movie.douban.com/subject/415403 /')
You can select an object like jQuery.
The code is as follows:
>>> Doc ('. pl ')
[,,,,,,,,,,,,,,,,,,, ]
In this way, all objects whose class is 'pl' are selected.
However, text needs to be re-encapsulated when iteration is used:
The code is as follows:
For para in doc ('. pl '):
Para = pq (para)
Print para. text ()
Director
Scriptwriter
Starring
Type:
Official website:
Production country/region:
Language:
Release date:
Title length:
IMDb link:
Official website:
The resulting text is a unicode code. if you want to write a file, you must encode it as a string.
You can use some pseudo classes provided by jquery (but css is not supported) for operations, such:
The code is as follows:
>>> Doc ('. pl: first ')
[]
>>> Print doc ('. pl: first'). text ()
Director
Attributes
Get attributes of html elements
The code is as follows:
>>> P = pq ('
') ('P ')
>>> P. attr ('id ')
'Hello'
>>> P. attr. id
'Hello'
>>> P. attr ['id']
'Hello'
Assignment
The code is as follows:
>>> P. attr. id = 'plop'
>>> P. attr. id
'Plop'
>>> P. attr ['id'] = 'Ola'
>>> P. attr. id
'Ola'
>>> P. attr (id = 'hello', class _ = 'hello2 ')
[ ]
Traversing
Filter
The code is as follows:
>>> D = pq ('
Hello
World
')
>>> D ('P'). filter ('. hello ')
[ ]
>>> D ('P'). filter ('# test ')
[ ]
>>> D ('P'). filter (lambda I: I = 1)
[ ]
>>> D ('P'). filter (lambda I: I = 0)
[ ]
>>> D ('P'). filter (lambda I: pq (this). text () = 'Hello ')
[ ]
Select in order
The code is as follows:
>>> D ('P'). eq (0)
[ ]
>>> D ('P'). eq (1)
[ ]
Select embedded element
The code is as follows:
>>> D ('P'). eq (1). find ('A ')
[]
Select parent element
The code is as follows:
>>> D = pq ('
Whoah!
There
')
>>> D ('P'). eq (1). find ('em ')
[
]
>>> D ('P'). eq (1). find ('em '). end ()
[]
>>> D ('P'). eq (1). find ('em '). end (). text ()
'There'
>>> D ('P'). eq (1). find ('em '). end (). end ()
[
,
]