pyquery is a Python library that implements a jQuery-like API and can be used to parse HTML page content. Usage:
from pyquery import PyQuery as pq
1. You can load an HTML string, an HTML file, or a URL, for example:
d = pq("<html></html>")
d = pq(filename=path_to_html_file)
d = pq(url='http://www.baidu.com')  # the URL must be written out in full
2. html() and text() -- get the corresponding HTML block or text block, for example:
p = pq("<head><title>hello</title></head>")
p('head').html()  # returns <title>hello</title>
p('head').text()  # returns hello
3. Get elements by HTML tag name, for example:
d = pq('<div><p>test 1</p><p>test 2</p></div>')
d('p')  # returns [<p>, <p>]
print(d('p'))  # prints <p>test 1</p><p>test 2</p>
print(d('p').html())  # prints test 1
Note: When more than one element is matched, html() returns only the content of the first element, whereas text() returns the text of all matched elements joined by spaces.
4. eq(index) -- get the element at the given index
For example, to get the content of the second p tag:
print(d('p').eq(1).html())  # prints test 2
5. filter() -- narrow the matched elements by class name or id, for example:
d = pq("<div><p id='1'>test 1</p><p class='2'>test 2</p></div>")
d('p').filter('#1')  # returns [<p#1>]
d('p').filter('.2')  # returns [<p.2>]
6. find() -- search for nested elements, for example:
d = pq("<div><p id='1'>test 1</p><p class='2'>test 2</p></div>")
d('div').find('p')  # returns [<p#1>, <p.2>]
d('div').find('p').eq(0)  # returns [<p#1>]
7. Get elements directly by class name or id, for example:
d = pq("<div><p id='1'>test 1</p><p class='2'>test 2</p></div>")
d('#1').html()  # returns test 1
d('.2').html()  # returns test 2
8. Get an attribute value, for example:
d = pq("<p id='my_id'><a href='http://hello.com'>hello</a></p>")
d('a').attr('href')  # returns http://hello.com
d('p').attr('id')  # returns my_id
9. Modify an attribute value, for example:
d('a').attr('href', 'http://baidu.com')
10. addClass(value) -- add a class to the element, for example:
d = pq('<div></div>')
d.addClass('my_class')  # returns [<div.my_class>]
11. hasClass(name) -- check whether the element has the given class, for example:
d = pq("<div class='my_class'></div>")
d.hasClass('my_class')  # returns True
12. children(selector=None) -- get the child elements, for example:
d = pq("<span><p id='1'>hello</p><p id='2'>world</p></span>")
d.children()  # returns [<p#1>, <p#2>]
d.children('#2')  # returns [<p#2>]
13. parents(selector=None) -- get the ancestor elements (parent, grandparent, and so on), for example:
d = pq("<span><p id='1'>hello</p><p id='2'>world</p></span>")
d('p').parents()  # returns [<span>]
d('#1').parents('span')  # returns [<span>]
d('#1').parents('p')  # returns []
14. clone() -- return a copy of the node
15. empty() -- remove the node's content
16. nextAll(selector=None) -- return all sibling elements that follow the matched element, for example:
d = pq("<p id='1'>hello</p><p id='2'>world</p>")
d('p:first').nextAll()  # returns [<p#2>]
d('p:last').nextAll()  # returns []
17. not_(selector) -- return the elements that do not match the selector, for example:
d = pq("<p id='1'>test 1</p><p id='2'>test 2</p>")
d('p').not_('#2')  # returns [<p#1>]
For more information, refer to the official website http://packages.python.org/pyquery