PyQuery is a Python library that provides a jQuery-like API and can be used to parse HTML page content. It is imported as follows:
The code is as follows:
from pyquery import PyQuery as pq
1. Load an HTML string, an HTML file, or a URL, for example:
The code is as follows:
d = pq("<html></html>")
d = pq(filename=path_to_html_file)
d = pq(url='http://www.baidu.com')  # the URL must be written out in full
2. html() and text(): get the corresponding HTML block or text block, for example:
The code is as follows:
p = pq("<head><title>hello</title></head>")
p('head').html()  # returns <title>hello</title>
p('head').text()  # returns hello
3. Get elements by HTML tag, for example:
The code is as follows:
d = pq('<div><p>test 1</p><p>test 2</p></div>')
d('p')  # returns [<p>, <p>]
print(d('p'))  # prints <p>test 1</p><p>test 2</p>
print(d('p').html())  # prints test 1
Note: when more than one element matches, html() returns only the content of the first element, while text() returns the text of all matched elements joined by spaces.
4. eq(index): get the element at the given index
In the example above, to get the content of the second p tag:
The code is as follows:
print(d('p').eq(1).html())  # prints test 2
5. filter(): narrow the matched elements by class name or id, for example:
The code is as follows:
d = pq("<div><p id='1'>test 1</p><p class='two'>test 2</p></div>")
d('p').filter('#1')  # returns [<p#1>]
d('p').filter('.two')  # returns [<p.two>]
6. find(): find nested elements, for example:
The code is as follows:
d = pq("<div><p id='1'>test 1</p><p class='two'>test 2</p></div>")
d('div').find('p')  # returns [<p#1>, <p.two>]
d('div').find('p').eq(0)  # returns [<p#1>]
7. Get elements directly by class name or id, for example:
The code is as follows:
d = pq("<div><p id='1'>test 1</p><p class='two'>test 2</p></div>")
d('#1').html()  # returns test 1
d('.two').html()  # returns test 2
8. Get attribute values, for example:
The code is as follows:
d = pq("<p id='my_id'><a href='http://hello.com'>hello</a></p>")
d('a').attr('href')  # returns http://hello.com
d('p').attr('id')  # returns my_id
9. Modify an attribute value, for example:
The code is as follows:
d('a').attr('href', 'http://baidu.com')
10. addClass(value): add a class to the element, for example:
The code is as follows:
d = pq('<div></div>')
d.addClass('my_class')  # returns [<div.my_class>]
11. hasClass(name): check whether the element has the given class, for example:
The code is as follows:
d = pq("<div class='my_class'></div>")
d.hasClass('my_class')  # returns True
12. children(selector=None): get child elements, for example:
The code is as follows:
d = pq("<span><p id='1'>hello</p><p id='2'>world</p></span>")
d.children()  # returns [<p#1>, <p#2>]
d.children('#2')  # returns [<p#2>]
13. parents(selector=None): get the parent (ancestor) elements, for example:
The code is as follows:
d = pq("<span><p id='1'>hello</p><p id='2'>world</p></span>")
d('p').parents()  # returns [<span>]
d('#1').parents('span')  # returns [<span>]
d('#1').parents('p')  # returns []
14. clone(): return a copy of a node
15. empty(): remove the content of a node
16. nextAll(selector=None): return all following sibling elements, for example:
The code is as follows:
d = pq("<p id='1'>hello</p><p id='2'>world</p>")
d('p:first').nextAll()  # returns [<p#2>]
d('p:last').nextAll()  # returns []
17. not_(selector): return the elements that do not match the selector, for example:
The code is as follows:
d = pq("<p id='1'>test 1</p><p id='2'>test 2</p>")
d('p').not_('#2')  # returns [<p#1>]
This article originated from: http://www.jb51.net/article/50069.htm