<class 'pyquery.pyquery.PyQuery'>
II. Common CSS Selectors
Print the tag whose id is container
print(doc('#container'))
print(type(doc('#container')))
Return
<ul id="container"> <li class="object-1"/> <li class="object-2"/> <li class="object-3"/></ul>
<class 'pyquery.pyquery.PyQuery'>
Print the tag whose class is object-1
print(doc('.object-1'))
Return
<li class="object-1"/>
Print the tag named body
print(doc('body'))
Return
<body> <ul id="container"> <li class="object-1"/> <li class="object-2"/> <li class="object-3"/> </ul></body>
Multiple CSS selectors
print(doc('html #container'))
Return
<ul id="container"> <li class="object-1"/> <li class="object-2"/> <li class="object-3"/></ul>
III. Pseudo-class Selectors
Positional pseudo-classes (nth-child, first-child, last-child)
# print the second li tag
print(pseudo_doc('li:nth-child(2)'))
# print the first li tag
print(pseudo_doc('li:first-child'))
# print the last li tag
print(pseudo_doc('li:last-child'))
Return
<li class="object-2">syntax</li>
<li class="object-1">Python</li>
<li class="object-6">fun</li>
contains
# find the li tags whose text contains 'Python'
print(pseudo_doc("li:contains('Python')"))
# find the li tags whose text contains 'hao'
print(pseudo_doc("li:contains('hao')"))
Return
<li class="object-1">Python</li>
<li class="object-3">good</li><li class="object-4">good</li><li class="object-6">fun</li>
4. Search for Tags
Search a PyQuery object for the tags that match a condition, similar to the find method in BeautifulSoup.
Print the tag whose id is container
print(doc.find('#container'))
Return
<ul id="container"> <li class="object-1"/> <li class="object-2"/> <li class="object-3"/></ul>
Print all li tags
print(doc.find('li'))
Return
<li class="object-1"/><li class="object-2"/><li class="object-3"/>
4.2 Child tags: the children method
# the tag whose id is container
container = doc.find('#container')
print(container.children())
Return
<li class="object-1"/><li class="object-2"/><li class="object-3"/>
4.3 Parent tag: the parent method
object_2 = doc.find('.object-2')
print(object_2.parent())
Return
<ul id="container"> <li class="object-1"/> <li class="object-2"/> <li class="object-3"/></ul>
4.4 Sibling tags: the siblings method
object_2 = doc.find('.object-2')
print(object_2.siblings())
Return
<li class="object-1"/><li class="object-3"/>
5. Obtain Tag Information
Once the target tag has been located, the next step is to extract the text or the attribute values inside it.
5.1 Extracting attribute values
.attr() takes the name of an attribute and returns that attribute's value.
object_2 = doc.find('.object-2')
print(object_2.attr('class'))
Return
object-2
5.2 Text inside the tag
.text()
html_text = "
Return
Simple and Easy to use PyQuery Hello World! Good Python syntax
object_1 = docs.find('.object-1')
print(object_1.text())
container = docs.find('#container')
print(container.text())
Return
Python
Hello World! Good Python syntax
Tip: to get only "Hello World!" and none of the other text, first remove the li tags with the remove method, then call the text method.
container = docs.find('#container')
container.remove('li')
print(container.text())
Return
Hello World!
Custom PyQuery Usage
Accessing a URL
Unlike BeautifulSoup, PyQuery can send the request to a website itself. For example:
from pyquery import PyQuery
PyQuery(url='https://www.baidu.com')
The opener Parameter
In the example above, PyQuery sends a request to Baidu and turns the response into a PyQuery object. By default pyquery fetches the page with requests when it is installed and falls back to urllib otherwise. To fetch pages some other way, for example with selenium, pass a custom opener parameter to PyQuery.
The opener parameter specifies what pyquery uses to send the request to the website, for example urllib, requests, or selenium. Here we define a selenium opener.
from pyquery import PyQuery
from selenium.webdriver import PhantomJS

# use selenium to access the url
def selenium_opener(url, **kwargs):
    # PhantomJS is not in my environment variables, so its path must be passed in every time it is used
    driver = PhantomJS(executable_path='phantomjs path')
    driver.get(url)
    html = driver.page_source
    driver.quit()
    return html

# note that the opener argument is the function name, without parentheses!
PyQuery(url='https://www.baidu.com/', opener=selenium_opener)
From here we can work with the PyQuery object to extract useful information, as described in the earlier sections. The pyquery documentation is not very detailed, but its API closely follows jQuery's, so the jQuery documentation is the place to look when you want to learn more.
Cookies and headers
When using requests, we usually disguise the request as coming from a browser so that it looks more authentic to the website. This normally means passing in headers and, when necessary, cookies. pyquery supports the same thing and can also pretend to be a browser.
from pyquery import PyQuery

cookies = {'cookies': 'Your cookies'}
headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
PyQuery(url='https://www.baidu.com/', headers=headers, cookies=cookies)
Giving selenium pyquery's Functionality
Convert the page that the driver has loaded directly into a PyQuery object, which makes it easier to extract data.
from pyquery import PyQuery
from selenium.webdriver import PhantomJS

class Browser(PhantomJS):
    # @property is a decorator: it lets the method that follows it be read
    # like an attribute, so browser.dom is the dom attribute of browser.
    @property
    def dom(self):
        return PyQuery(self.page_source)

browser = Browser(executable_path='phantomjs path')
browser.get(url='https://www.baidu.com/')
print(type(browser.dom))
Return
<class 'pyquery.pyquery.PyQuery'>
Summary
That is all for this article. I hope it is a useful reference for your study or work. If you have any questions, please leave a message. Thank you for your support.