For example
The code is as follows:
Director : Tom Tykwer/Lana Wachowski/Andy Vodoski
screenwriter : Tom Tykwer/Andy Vodoski/Lana Wachowski
starring : Tom Hanks/Halle Berry/Jim Braudbent/Hugo • Weaving/Jim Stergis/BAE Doona/Ben Wishow/James Dasi/Zhou Xun/Kess David/David Gijaci/Susan Sarandon/Hugh Grant /c4>
Type: plot/Sci Fi/Suspense
Official website:Cloudatlas.warnerbros.com
Production country/region:Germany/USA/HONG Kong/Singapore
Language:English
Release Date: 2013-01-31 (Mainland China)/2012-10-26 (USA)
length of the film: 134 mins (Mainland China)/172 minutes (USA)
IMDb Links:tt1371111
Official station:
The film "Cloud"
The code is as follows:
From pyquery import Pyquery as PQ
DOC=PQ (url= ' http://movie.douban.com/subject/3530403/')
Data=doc ('. pl ')
For I in data:
Print PQ (i). Text ()
Output
The code is as follows:
Director
Writers
Starring
Type:
Official website:
Production Country/region:
Language:
Release Date:
Length of the film:
IMDb Links:
Official station:
Usage
Users can use the Pyquery class to load XML documents from strings, lxml objects, files, or URLs:
The code is as follows:
>>> from pyquery import Pyquery as PQ
>>> from lxml import etree
>>> doc=pq ("")
>>> DOC=PQ (Etree.fromstring (""))
>>> DOC=PQ (Filename=path_to_html_file)
>>> doc=pq (url= ' http://movie.douban.com/subject/3530403/')
You can choose objects like jquery.
The code is as follows:
>>> doc ('. Pl ')
[ , , , , , , , , , , , , , , , , , , , ]
In this way, all objects of class ' pl ' are selected.
However, the text needs to be re-encapsulated when using iterations:
The code is as follows:
For Para in Doc ('. pl '):
PARA=PQ (para)
Print Para.text ()
Director
Writers
Starring
Type:
Official website:
Production Country/region:
Language:
Release Date:
Length of the film:
IMDb Links:
Official station:
The text that gets here is the Unicode code that needs to be encoded as a string if you want to write the file.
Users can use some of the pseudo-classes provided by jquery (but do not yet support CSS) to operate, such as:
The code is as follows:
>>> doc ('. Pl:first ')
[ ]
>>> Print doc ('. Pl:first '). Text ()
Director
Attributes
Get the attributes of an HTML element
The code is as follows:
>>> P=PQ ('
') (' P ')
>>> p.attr (' id ')
' Hello '
>>> p.attr.id
' Hello '
>>> p.attr[' id ']
' Hello '
Assign value
The code is as follows:
>>> p.attr.id= ' plop '
>>> p.attr.id
' Plop '
>>> p.attr[' id ']= ' ola '
>>> p.attr.id
' Ola '
>>> p.attr (id= ' Hello ', class_= ' Hello2 ')
[ ]
Traversing
Filter
The code is as follows:
>>> D=PQ ('
Hello
World
')
>>> d (' P '). Filter ('. Hello ')
[ ]
>>> d (' P '). Filter (' #test ')
[ ]
>>> d (' P '). Filter (lambda i:i==1)
[ ]
>>> d (' P '). Filter (lambda i:i==0)
[ ]
>>> d (' P '). Filter (lambda I:PQ (this). Text () = = ' Hello ')
[ ]
In order select
Code as follows:
>>> d (' P '). EQ (0)
[ ]
>>> d (' P '). EQ (1)
[ ]
Select inline elements
Code as follows:
>>> d (' P '). EQ (1). Find (' a ')
[]
Select parent element
code is as follows:
>>> d=pq ('
whoah!
there
')
>>> D (' P '). EQ (1). Find (' em ')
[
]
>>> d (' P '). EQ (1). Find (' em ' ). End ()
[]
>>> D (' P '). EQ (1). Find (' em '). End (). Text ()
' there '
>>> D (' P '). EQ (1). Find (' em '). End (). end ()
[
,
]