The pyquery library is a Python implementation of the jQuery API that can be used to parse the content of HTML web pages. Import it as follows:
from pyquery import PyQuery as pq
1. It can load an HTML string, an HTML file, or a URL, for example:
d = pq("<html><head><title>hello</title></head></html>")
d = pq(filename=path_to_html_file)
d = pq(url='http://www.baidu.com')  # the URL must be written in full, including the http:// scheme
2. html() and text(): get the corresponding HTML block or block of text, for example:
p = pq("<head><title>hello</title></head>")
p('head').html()  # returns <title>hello</title>
p('head').text()  # returns hello
3. Get elements by HTML tag, for example:
d = pq('<div><p>test 1</p><p>test 2</p></div>')
d('p')  # returns [<p>, <p>]
print(d('p'))  # prints <p>test 1</p><p>test 2</p>
print(d('p').html())  # prints test 1
Note: when the selection contains more than one element, html() returns only the inner HTML of the first element, whereas text() joins the text of all matched elements with spaces (d('p').text() returns test 1 test 2).
4. eq(index): get the element at the given index
Continuing the example above, to get the content of the second p tag:
print(d('p').eq(1).html())  # prints test 2
5. filter(): select elements by class name or id, for example:
d = pq("<div><p id='1'>test 1</p><p class='c2'>test 2</p></div>")
d('p').filter('#1')  # returns [<p#1>]
d('p').filter('.c2')  # returns [<p.c2>]
6. find(): find nested elements, for example:
d = pq("<div><p id='1'>test 1</p><p class='c2'>test 2</p></div>")
d('div').find('p')  # returns [<p#1>, <p.c2>]
d('div').find('p').eq(0)  # returns [<p#1>]
7. Get elements directly by class name or id, for example:
d = pq("<div><p id='1'>test 1</p><p class='c2'>test 2</p></div>")
d('#1').html()  # returns test 1
d('.c2').html()  # returns test 2
8. Get attribute values, for example:
d = pq("<p id='my_id'><a href='http://hello.com'>hello</a></p>")
d('a').attr('href')  # returns http://hello.com
d('p').attr('id')  # returns my_id
9. Modify an attribute value, for example:
d('a').attr('href', 'http://baidu.com')  # sets href to http://baidu.com
10. addClass(value): add a class to the element, for example:
d = pq('<div></div>')
d.addClass('my_class')  # returns [<div.my_class>]
11. hasClass(name): return whether the element has the given class, for example:
d = pq("<div class='my_class'></div>")
d.hasClass('my_class')  # returns True
12. children(selector=None): get child elements, for example:
d = pq("<span><p id='1'>hello</p><p id='2'>world</p></span>")
d.children()  # returns [<p#1>, <p#2>]
d.children('#2')  # returns [<p#2>]
13. parents(selector=None): get the ancestor elements, optionally filtered by a selector, for example:
d = pq("<span><p id='1'>hello</p><p id='2'>world</p></span>")
d('p').parents()  # returns [<span>]
d('#1').parents('span')  # returns [<span>]
d('#1').parents('p')  # returns []
14. clone(): return a copy of the matched node(s)
15. empty(): remove the content (text and child elements) of the matched node(s)
16. nextAll(selector=None): return all following sibling elements, for example:
d = pq("<p id='1'>hello</p><p id='2'>world</p><img src='' />")
d('p:first').nextAll()  # returns [<p#2>, <img>]
d('p:last').nextAll()  # returns [<img>]
17. not_(selector): return the elements that do not match the selector, for example:
d = pq("<span><p id='1'>test 1</p><p id='2'>test 2</p></span>")
d('p').not_('#2')  # returns [<p#1>]
For more information, see the official documentation at http://packages.python.org/pyquery