How to Use PyQuery, a Python Library for Parsing HTML

Source: Internet
Author: User
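For example, consider the info block on the Douban page for the film Cloud Atlas, in which each field label is marked up with the class "pl":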



Director: Tom Tykwer/Lana Wachowski/Andy Wachowski

Writers: Tom Tykwer/Andy Wachowski/Lana Wachowski

Starring: Tom Hanks/Halle Berry/Jim Broadbent/Hugo Weaving/Jim Sturgess/Bae Doona/Ben Whishaw/James D'Arcy/Zhou Xun/Keith David/David Gyasi/Susan Sarandon/Hugh Grant

Type: Drama/Sci-Fi/Suspense

Official website: cloudatlas.warnerbros.com

Production country/region: Germany/USA/Hong Kong/Singapore

Language: English

Release Date: 2013-01-31 (Mainland China)/2012-10-26 (USA)

Length of the film: 134 minutes (Mainland China)/172 minutes (USA)

IMDb Links: tt1371111

Official station: The film "Cloud Atlas"

The code is as follows:


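from pyquery import PyQuery as pq

# Download and parse the Douban page for the film
doc = pq(url='http://movie.douban.com/subject/3530403/')
# Select every element with class "pl" (the field labels)
data = doc('.pl')
for i in data:
    # Iteration yields raw lxml elements; wrap each one in pq() to call .text()
    print(pq(i).text())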

The output is:


Director
Writers
Starring
Type:
Official website:
Production Country/region:
Language:
Release Date:
Length of the film:
IMDb Links:
Official station:

Usage

You can use the PyQuery class to load an XML or HTML document from a string, an lxml element, a file, or a URL:

The code is as follows:


>>> from pyquery import PyQuery as pq
>>> from lxml import etree
>>> doc = pq("<html></html>")
>>> doc = pq(etree.fromstring("<html></html>"))
>>> doc = pq(filename=path_to_html_file)
>>> doc = pq(url='http://movie.douban.com/subject/3530403/')

You can then select elements with CSS selectors, just as in jQuery:

The code is as follows:


>>> doc('.pl')
[<span.pl>, <span.pl>, <span.pl>, ...]

This selects all elements with the class "pl".

However, iterating over the selection yields raw lxml elements rather than PyQuery objects, so each one has to be wrapped in pq() again before its text can be read:

The code is as follows:


>>> for para in doc('.pl'):
...     para = pq(para)  # re-wrap the raw lxml element
...     print(para.text())
...
Director
Writers
Starring
Type:
Official website:
Production Country/region:
Language:
Release Date:
Length of the film:
IMDb Links:
Official station:

The text returned here is a Unicode string. If you want to write it to a file, encode it first (for example, as UTF-8) or open the file with an explicit encoding.
You can also use some of the pseudo-classes that jQuery provides but that are not part of standard CSS, such as :first, :last, :even, :odd, :eq(), :lt() and :gt():

The code is as follows:


>>> doc('.pl:first')
[<span.pl>]
>>> print(doc('.pl:first').text())
Director

Attributes
Get the attributes of an HTML element

The code is as follows:


>>> p = pq('<p id="hello" class="hello"></p>')('p')
>>> p.attr('id')
'hello'
>>> p.attr.id
'hello'
>>> p.attr['id']
'hello'

Set attribute values

The code is as follows:


>>> p.attr.id = 'plop'
>>> p.attr.id
'plop'
>>> p.attr['id'] = 'ola'
>>> p.attr.id
'ola'
>>> p.attr(id='hello', class_='hello2')
[<p#hello.hello2>]

Traversing
Filter

The code is as follows:


>>> d = pq('<p id="hello" class="hello">Hello</p><p id="test"><a>World</a></p>')
>>> d('p').filter('.hello')
[<p#hello.hello>]
>>> d('p').filter('#test')
[<p#test>]
>>> d('p').filter(lambda i: i == 1)
[<p#test>]
>>> d('p').filter(lambda i: i == 0)
[<p#hello.hello>]
>>> d('p').filter(lambda i: pq(this).text() == 'Hello')
[<p#hello.hello>]

Select by index

The code is as follows:


>>> d('p').eq(0)
[<p#hello.hello>]
>>> d('p').eq(1)
[<p#test>]

Find nested elements

The code is as follows:


>>> d('p').eq(1).find('a')
[<a>]

Return to the previous selection with end()

The code is as follows:


>>> d = pq('<p><span><em>whoah!</em></span></p><p><em> there</em></p>')
>>> d('p').eq(1).find('em')
[<em>]
>>> d('p').eq(1).find('em').end()
[<p>]
>>> d('p').eq(1).find('em').end().text()
'there'
>>> d('p').eq(1).find('em').end().end()
[<p>, <p>]
