For example
Copy Code code as follows:
<div id= "Info" >
<span><span class= ' PL ' > Director </span>: <a href= "/celebrity/1047989/" rel= "V:directedby" > 汤姆·提克威 </a>/<a href= "/celebrity/1161012/" rel= "V:directedby" > Lana Vodoski </a>/<a href= "/celebrity/" 1013899/"rel=" V:directedby "> Andy Vodoski </a></span><br/>
<span><span class= ' pl ' > screenwriter </span>: <a href= "/celebrity/1047989/" > Tom Tikwi </a>/<a href= "/celebrity/1013899/" > Andy Vodoski </a>/<a href= "/celebrity/1161012/" > Lana Vodoski </a></span ><br/>
<span><span class= ' pl ' > starring </span>: <a href= "/celebrity/1054450/" rel= "v:starring" > Tom Hanks </a>/<a href= "/celebrity/1054415/" rel= "v:starring" > Halle Berry </a>/<a href= "/celebrity/1019049/" Rel= "v:starring" > Jim Braudbent </a>/<a href= "/celebrity/1040994/" rel= "v:starring" > Hugo Uighur </a>/< A href= "/celebrity/1053559/" rel= "v:starring" > Jim Stergis </a>/<a href= "/celebrity/1057004/" rel= "V: Starring "> Pei Douna </a>/<a href="/celebrity/1025149/"rel=" v:starring "> Ben Wishow </a>/<a href="/ celebrity/1049713/"rel=" v:starring "> James Dasi </a>/<a href="/celebrity/1027798/"rel=" v:starring "> Zhou Xun </a>/<a href= "/celebrity/1019012/" rel= "v:starring" > Kess David </a>/<a href= "/celebrity/1201851/" Rel= "v:starring" > David Gijaci </a>/<a href= "/celebrity/1054392/" rel= "v:starring" > Susan Sarandon Langdon </a>/<a href= "/celebrity/1003493/" rel= "v:starring" > Doldrums </a></span><br/>
<span class= "PL" > Type:</span> <span property= "V:genre" > Plot </span>/<span property= "V:genre" > Science fiction </span>/<span property= "v:genre" > Suspense </span><br/>
<span class= "PL" > official website:</span> <a href= "http://cloudatlas.warnerbros.com" rel= nofollow "target=" _ Blank ">cloudatlas.warnerbros.com</a><br/>
<span class= "PL" > Producer country/Region:</span> Germany/US/Hong Kong/Singapore <br/>
<span class= "PL" > Language:</span> English <br/>
<span class= "PL" > Release date:</span> <span property= "v:initialreleasedate" content= "2013-01-31 (Mainland China)" > 2013-01-31 (China Mainland) </span>/<span property= "v:initialreleasedate" content= "2012-10-26 (USA)" >2012-10-26 (USA ) </span><br/>
<span class= "PL" >:</span> <span property= "V:runtime" content= "134" >134 minutes (Mainland China) </span>/ 172 mins (USA) <br/>
<span class= "PL" >imdb link:</span> <a href= "http://www.imdb.com/title/tt1371111" target= "_blank" Nofollow ">tt1371111</a><br>
<span class= "PL" > Official station:</span>
<a href= "http://site.douban.com/202494/" target= "_blank" > film "Cloud Picture" </a>
</div>
Copy Code code as follows:
From pyquery import Pyquery as PQ
DOC=PQ (url= ' http://movie.douban.com/subject/3530403/')
Data=doc ('. pl ')
For I in data:
Print PQ (i). Text ()
Output
Copy Code code as follows:
Director
Writers
Starring
Type:
Official website:
Production Country/region:
Language:
Release Date:
Film Length:
IMDB Link:
Official station:
Usage
Users can use the Pyquery class to load an XML document from a string, lxml object, file, or URL:
Copy Code code as follows:
>>> from pyquery import Pyquery as PQ
>>> from lxml import etree
>>> DOC=PQ (">>> DOC=PQ (etree.fromstring (">>> DOC=PQ (Filename=path_to_html_file)
>>> doc=pq (url= ' http://movie.douban.com/subject/3530403/')
You can select objects like jquery.
Copy Code code as follows:
>>> doc ('. Pl ')
[<span.pl>, <span.pl>, <span.pl>, <span.pl>, <span.pl>, <span.pl>, <span.pl ", <span.pl>, <span.pl>, <span.pl>, <span.pl>, <span#rateword.pl>, <span.pl> <span.pl>, <span.pl>, <span.pl>, <span.pl>, <span.pl>, <span.pl>, <p.pl>]
In this way, the object class for ' pl ' is all selected.
However, when you use iterations, you need to encapsulate the text:
Copy Code code as follows:
For Para in Doc ('. pl '):
PARA=PQ (para)
Print Para.text ()
Director
Writers
Starring
Type:
Official website:
Production Country/region:
Language:
Release Date:
Film Length:
IMDB Link:
Official station:
The text you get here is a Unicode code, which you need to encode as a string if you want to write to the file.
Users can use some of the pseudo classes provided by jquery (but do not yet support CSS) to operate, such as:
Copy Code code as follows:
>>> doc ('. Pl:first ')
[<span.pl>]
>>> Print doc ('. Pl:first '). Text ()
Director
Attributes
Get Properties of HTML element
Copy Code code as follows:
>>> p=pq (' <p id= "Hello" class= "Hello" ></p> ") (' P ')
>>> p.attr (' id ')
' Hello '
>>> p.attr.id
' Hello '
>>> p.attr[' id ']
' Hello '
assigning values
Copy Code code as follows:
>>> p.attr.id= ' plop '
>>> p.attr.id
' Plop '
>>> p.attr[' id ']= ' ola '
>>> p.attr.id
' Ola '
>>> p.attr (id= ' Hello ', class_= ' Hello2 ')
[<p#hello.hell0>]
Traversing
Filter
Copy Code code as follows:
>>> d=pq (' <p id= "Hello" class= "Hello" ><a/>hello</p><p id= "test" ><a/>world </p> ')
>>> d (' P '). Filter ('. Hello ')
[<p#hello.hello>]
>>> d (' P '). Filter (' #test ')
[<p#test>]
>>> d (' P '). Filter (lambda i:i==1)
[<p#test>]
>>> d (' P '). Filter (lambda i:i==0)
[<p#hello.hello>]
>>> d (' P '). Filter (lambda I:PQ (this). Text () = = ' Hello ')
[<p#hello.hello>]
Select in order
Copy Code code as follows:
>>> d (' P '). EQ (0)
[<p#hello.hello>]
>>> d (' P '). EQ (1)
[<p#test>]
Select inline Element
Copy Code code as follows:
>>> d (' P '). EQ (1). Find (' a ')
[<a>]
Select parent Element
Copy Code code as follows:
>>> D=PQ (' <p><span><em>Whoah!</em></span></p><p><em> There</em></p> ')
>>> d (' P '). EQ (1). Find (' em ')
[<em>]
>>> d (' P '). EQ (1). Find (' em '). End ()
[<p>]
>>> d (' P '). EQ (1). Find (' em '). End (). Text ()
' There '
>>> d (' P '). EQ (1). Find (' em '). End (). End ()
[<p>, <p>]