Selenium is a complete web application testing system that covers test recording (Selenium IDE), writing and running tests (Selenium Remote Control), and parallel test execution (Selenium Grid). Selenium Core, which is based on JsUnit, is written entirely in JavaScript and can therefore run in any JavaScript-enabled browser.
Selenium is an automated testing tool that drives real browsers and supports several of them. In crawlers it is used mainly to handle pages whose content is rendered by JavaScript.
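When writing crawlers in Python, the part of Selenium you mainly use is Selenium WebDriver. The selenium.webdriver package contains one driver class per supported browser, so inspecting it (for example with help(webdriver)) shows which browsers are supported.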
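A minimal sketch of one way to inspect the package, assuming Selenium is already installed. The helper names public_classes and show_selenium_browsers are made up for this illustration. The sketch relies only on standard Python introspection, so the output also includes helper classes (options objects and the like) next to the browser drivers.

```python
import inspect


def public_classes(module):
    """Return the sorted names of the public classes a module exposes.

    Hypothetical helper for illustration: it uses the module's __all__
    when present and falls back to dir() otherwise.
    """
    names = getattr(module, "__all__", None) or dir(module)
    return sorted(
        name
        for name in names
        if not name.startswith("_")
        and inspect.isclass(getattr(module, name, None))
    )


def show_selenium_browsers():
    """Print the classes found in selenium.webdriver (needs Selenium installed)."""
    # Imported here so that public_classes() itself has no Selenium dependency.
    from selenium import webdriver

    print(public_classes(webdriver))
```

Calling show_selenium_browsers() should print names such as Chrome and Firefox among the results.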
Installing Selenium: if pip is already installed, run the command directly:
pip install -U selenium
Alternatively, download https://pypi.python.org/packages/source/s/selenium/selenium-2.52.0.tar.gz and unpack it. The steps below are for Windows; on Unix they are the same, except that you can download the archive with wget.
Then install it with setup.py, the usual way to install a third-party module from source:
cd c:\Python3\xxxx
python setup.py install
Selenium 2 merges Selenium and WebDriver. Download the driver that matches your browser (for example, ChromeDriver for Chrome), and Selenium can then open that browser directly.
Declaring browser objects
Selenium supports many browsers. To declare and launch one, create the corresponding driver object:
from selenium import webdriver

browser = webdriver.Chrome()     # launch Chrome
browser = webdriver.Firefox()    # or launch Firefox instead
Visiting a page:
browser.get('https://www.xxx.com')
Selenium can locate elements, and then perform actions on them, with any of the following methods, each of which is described in the official documentation:
- find_element_by_id
- find_element_by_name
- find_element_by_xpath
- find_element_by_link_text
- find_element_by_partial_link_text
- find_element_by_tag_name
- find_element_by_class_name
- find_element_by_css_selector
For example, to fill in and submit a login form:
from selenium import webdriver

username = "qun"
passwd = "passwd"

browser = webdriver.Firefox()
browser.get('https://www.xxx.com')
browser.implicitly_wait(10)    # wait up to 10 seconds for elements to appear

elem = browser.find_element_by_id("loginFormUserName")
elem.send_keys(username)
elem = browser.find_element_by_id("loginFormPassword")
elem.send_keys(passwd)
elem = browser.find_element_by_id("loginFormSubmit")
elem.click()
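The find_element_by_* helpers listed above were deprecated in Selenium 4.0 and removed in Selenium 4.3, where the equivalent call is find_element(strategy, value). The sketch below maps each legacy name to its strategy string. The strings are the values of the constants in selenium.webdriver.common.by.By (By.ID is "id", for instance). The names LEGACY_TO_STRATEGY and find are invented for this illustration, and driver is assumed to be any WebDriver instance.

```python
# Strategy strings as defined by selenium.webdriver.common.by.By
# (By.ID == "id", By.CSS_SELECTOR == "css selector", and so on).
LEGACY_TO_STRATEGY = {
    "find_element_by_id": "id",
    "find_element_by_name": "name",
    "find_element_by_xpath": "xpath",
    "find_element_by_link_text": "link text",
    "find_element_by_partial_link_text": "partial link text",
    "find_element_by_tag_name": "tag name",
    "find_element_by_class_name": "class name",
    "find_element_by_css_selector": "css selector",
}


def find(driver, legacy_name, value):
    """Locate an element with the call that replaces `legacy_name`.

    Hypothetical helper: `driver` is assumed to expose
    find_element(strategy, value), as WebDriver objects do.
    """
    return driver.find_element(LEGACY_TO_STRATEGY[legacy_name], value)
```

With a real driver, find(browser, "find_element_by_id", "loginFormUserName") is then equivalent to browser.find_element(By.ID, "loginFormUserName").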
Finding elements: the example below locates the same element in three different ways, first by ID, then by CSS selector, then by XPath. All three return the same element.
from selenium import webdriver

browser = webdriver.Chrome()
browser.get("http://www.taobao.com")
input_first = browser.find_element_by_id("q")
input_second = browser.find_element_by_css_selector("#q")
input_third = browser.find_element_by_xpath('//*[@id="q"]')
print(input_first)
print(input_second)
print(input_third)
browser.close()
Interactive actions
Actions are appended to an action chain and then executed one after another when perform() is called.
from selenium import webdriver
from selenium.webdriver import ActionChains

browser = webdriver.Chrome()
url = "http://www.runoob.com/try/try.php?filename=jqueryui-api-droppable"
browser.get(url)
browser.switch_to.frame('iframeResult')    # the demo runs inside this iframe
source = browser.find_element_by_css_selector('#draggable')
target = browser.find_element_by_css_selector('#droppable')
actions = ActionChains(browser)
actions.drag_and_drop(source, target)
actions.perform()
For more actions, see: http://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.common.action_chains
Execute JavaScript
This is a very useful method: execute_script() runs JavaScript directly in the page, which lets you perform operations that WebDriver has no dedicated call for.
The following example opens Zhihu, scrolls to the bottom of the page with JavaScript, and then shows an alert box.
from selenium import webdriver

browser = webdriver.Chrome()
browser.get("http://www.zhihu.com/explore")
browser.execute_script('window.scrollTo(0, document.body.scrollHeight)')
browser.execute_script('alert("To Bottom")')
Getting element attributes
get_attribute('class')
from selenium import webdriver

browser = webdriver.Chrome()
url = 'https://www.zhihu.com/explore'
browser.get(url)
logo = browser.find_element_by_id('zh-top-link-logo')
print(logo)
print(logo.get_attribute('class'))
Getting the text value
text
from selenium import webdriver

browser = webdriver.Chrome()
url = 'https://www.zhihu.com/explore'
browser.get(url)
element = browser.find_element_by_class_name('zu-top-add-question')
print(element.text)
Getting the ID, location, tag name, and size
id
location
tag_name
size
from selenium import webdriver

browser = webdriver.Chrome()
url = 'https://www.zhihu.com/explore'
browser.get(url)
element = browser.find_element_by_class_name('zu-top-add-question')
print(element.id)
print(element.location)
print(element.tag_name)
print(element.size)
Frame
Many pages contain frame tags, so crawling them involves switching into a frame and back out again, as the following example shows.
switch_to.frame() and switch_to.parent_frame() are the calls commonly used here.
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

browser = webdriver.Chrome()
url = 'http://www.runoob.com/try/try.php?filename=jqueryui-api-droppable'
browser.get(url)
browser.switch_to.frame('iframeResult')
source = browser.find_element_by_css_selector('#draggable')
print(source)
try:
    # The logo lives in the parent document, so it cannot be found from inside the frame.
    logo = browser.find_element_by_class_name('logo')
except NoSuchElementException:
    print('NO LOGO')
browser.switch_to.parent_frame()
logo = browser.find_element_by_class_name('logo')
print(logo)
print(logo.text)
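Because every switch into a frame has to be paired with a switch back out, the two calls can be wrapped in a context manager. This is a sketch rather than part of Selenium: the name inside_frame is invented here, and driver is assumed to be a WebDriver whose switch_to object provides frame() and parent_frame() as used in the example above.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into a frame for the duration of a with-block.

    Hypothetical helper: it switches back to the parent frame on exit,
    even if the body raises (for example NoSuchElementException).
    """
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        driver.switch_to.parent_frame()
```

Inside a block such as "with inside_frame(browser, 'iframeResult'):" lookups run against the frame; once the block ends, the driver is back in the parent document.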
Wait
With an implicit wait, if WebDriver does not find an element in the DOM immediately, it keeps polling the DOM for up to the configured time and raises a NoSuchElementException only after that time has elapsed. In other words, when an element is not available right away, the implicit wait gives the page some time before the lookup fails. The default wait time is 0.
Implicit wait
If the element has not loaded when it is looked up, WebDriver waits up to the time we specified. If the element still has not loaded when that time runs out, an exception is raised; if the element is already present, execution continues immediately without waiting.
from selenium import webdriver

browser = webdriver.Chrome()
browser.implicitly_wait(10)    # seconds; the argument is required, 10 is an example value
browser.get('https://www.zhihu.com/explore')
element = browser.find_element_by_class_name('zu-top-add-question')
print(element)
Explicit wait
Specify a wait condition and a maximum wait time. WebDriver checks the condition repeatedly during that time: as soon as the condition holds, the wait returns normally; if the condition is still not met when the maximum time is reached, an exception is raised.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Chrome()
browser.get('https://www.taobao.com/')
wait = WebDriverWait(browser, 10)    # wait at most 10 seconds
search_input = wait.until(EC.presence_of_element_located((By.ID, 'q')))
button = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.btn-search')))
print(search_input, button)
The conditions in the example above: EC.presence_of_element_located() confirms that the element is present in the DOM.
EC.element_to_be_clickable() confirms that the element can be clicked.
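The idea behind an explicit wait can be sketched in plain Python: call the condition repeatedly until it returns something truthy or the maximum time runs out. This is a simplified illustration, not Selenium's actual WebDriverWait code; the names wait_until and WaitTimeout and the default values are assumptions chosen for this example.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition is still falsy after the maximum wait."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll `condition()` until it returns a truthy value, then return it.

    Raises WaitTimeout if `timeout` seconds pass without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # condition met: return immediately
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

A condition that already holds returns on the first call without sleeping, which is the "return as soon as the condition holds" behaviour of an explicit wait.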
Commonly used conditions:
title_is: the title equals the given text
title_contains: the title contains the given text
presence_of_element_located: the element is present in the DOM; takes a locator tuple such as (By.ID, 'p')
visibility_of_element_located: the element is visible; takes a locator tuple
visibility_of: the element is visible; takes an element object
presence_of_all_elements_located: at least one element matching the locator is present; returns all matching elements
text_to_be_present_in_element: the element's text contains the given text
text_to_be_present_in_element_value: the element's value contains the given text
frame_to_be_available_and_switch_to_it: the frame is available; the driver switches to it
invisibility_of_element_located: the element is not visible
element_to_be_clickable: the element can be clicked
staleness_of: the element is no longer attached to the DOM; useful for detecting that the page has been refreshed
element_to_be_selected: the element is selected; takes an element object
element_located_to_be_selected: the element is selected; takes a locator tuple
element_selection_state_to_be: takes an element object and a state; returns True if the element's selection state equals it, otherwise False
element_located_selection_state_to_be: takes a locator tuple and a state; returns True if the selection state equals it, otherwise False
alert_is_present: an alert is present
For the full list of conditions, see: http://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.support.expected_conditions
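WebDriverWait.until() accepts any callable that takes the driver as its only argument, so a condition that is not in the list can be written by hand. The sketch below makes some assumptions: the name element_count_at_least is invented, the '.item' selector is a placeholder, and driver is assumed to provide find_elements(strategy, value), as WebDriver objects do.

```python
def element_count_at_least(locator, minimum):
    """Build a wait condition that succeeds once enough elements match.

    Hypothetical helper: `locator` is a (strategy, value) tuple such as
    ("css selector", ".item"). The returned callable takes the driver and
    returns the matching elements, or False while there are too few.
    """

    def condition(driver):
        elements = driver.find_elements(*locator)
        return elements if len(elements) >= minimum else False

    return condition
```

It would be used like the built-in conditions, for example wait.until(element_count_at_least((By.CSS_SELECTOR, '.item'), 5)).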
Browser back and forward
back()
forward()
import time
from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://www.baidu.com/')
browser.get('https://www.taobao.com/')
browser.get('https://www.python.org/')
browser.back()       # back to taobao.com
time.sleep(1)
browser.forward()    # forward to python.org again
browser.close()
Cookie manipulation
get_cookies()
delete_all_cookies()
add_cookie()
from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://www.zhihu.com/explore')
print(browser.get_cookies())
browser.add_cookie({'name': 'name', 'domain': 'www.zhihu.com', 'value': 'zhaofan'})
print(browser.get_cookies())
browser.delete_all_cookies()
print(browser.get_cookies())
Tab management
Run the JavaScript command window.open() to open a new tab.
The handles of all open tabs are available in the list browser.window_handles.
The first tab is browser.window_handles[0]; switching to a handle makes that tab the one being controlled.
import time
from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://www.baidu.com')
browser.execute_script('window.open()')
print(browser.window_handles)
# switch_to.window() replaces the older switch_to_window(), which newer Selenium releases no longer provide
browser.switch_to.window(browser.window_handles[1])
browser.get('https://www.taobao.com')
time.sleep(1)
browser.switch_to.window(browser.window_handles[0])
browser.get('https://python.org')
Exception handling
Selenium defines quite a few exceptions; the official reference is:
http://selenium-python.readthedocs.io/api.html#module-selenium.common.exceptions
Here is a simple demonstration that looks up an element that does not exist.
from selenium import webdriver
from selenium.common.exceptions import TimeoutException, NoSuchElementException

browser = webdriver.Chrome()
try:
    browser.get('https://www.baidu.com')
except TimeoutException:
    print('Time Out')
try:
    browser.find_element_by_id('hello')
except NoSuchElementException:
    print('No Element')
finally:
    browser.close()
Python + Selenium: implementing account login