Web Crawler: Selenium and PhantomJS

Source: Internet
Author: User
Tags: tag name, xpath

Selenium

Selenium is a web automation testing tool, originally developed for automated testing of websites. Much like a key-macro tool ("Key Wizard") used to script games, it carries out operations automatically according to the commands it is given. The difference is that Selenium runs directly in the browser, and it supports all major browsers (including PhantomJS, a headless browser with no graphical interface).

Following our instructions, Selenium can make the browser load a page automatically, extract the data we need, take screenshots, or check whether certain actions have taken place on the site.

Selenium does not ship with a browser of its own and does not implement browser functionality; it has to be paired with a third-party browser. Since we usually want it to run inside a program rather than in a visible window, we can use a tool called PhantomJS in place of a real browser.

You can download the Selenium library from PyPI (https://pypi.python.org/simple/selenium), or install it with pip, the package manager: pip install selenium

Selenium with Python reference documentation: http://selenium-python.readthedocs.io/index.html

PhantomJS

PhantomJS is a WebKit-based headless ("no interface") browser. It loads a site into memory and executes the JavaScript on the page; because it does not render a graphical interface, it runs more efficiently than a full browser.

Combining Selenium with PhantomJS gives us a very powerful web crawler, one that can handle JavaScript, cookies, headers, and anything else a real user's browser would do.

Note: PhantomJS has to be downloaded from its official website, http://phantomjs.org/download.html. Because PhantomJS is a fully functional (though headless) browser rather than a Python library, it is not installed with pip like other Python libraries; instead, we drive it through Selenium.

PhantomJS official reference documentation: http://phantomjs.org/documentation

Quick Start

The Selenium library provides an API called WebDriver. WebDriver behaves somewhat like a browser that can load a website, but it can also be used like BeautifulSoup or other selector objects to find page elements, interact with them (send text, click, and so on), and perform the other actions a web crawler needs.

    #!/usr/bin/python3
    # -*- coding: utf-8 -*-

    __author__ = 'Mayi'

    # Import webdriver
    from selenium import webdriver
    # Keys provides keyboard keys such as Ctrl and Enter
    from selenium.webdriver.common.keys import Keys
    import time

    # Create the browser object from the PhantomJS executable;
    # executable_path specifies where PhantomJS is installed
    driver = webdriver.PhantomJS(executable_path=r"D:\Program Files\phantomjs\bin\phantomjs")

    # get() waits until the page has fully loaded before the program continues
    driver.get("http://www.baidu.com/")

    # Get the text content of the element whose id is "wrapper"
    data = driver.find_element_by_id("wrapper").text

    # Print the captured text
    print(data)

    # Print the page title (the title of the Baidu home page)
    print(driver.title)

    # Take a snapshot of the current page and save it
    # (save_screenshot always writes PNG data)
    driver.save_screenshot("baidu.png")

    # Type "ant" into the Baidu search box
    driver.find_element_by_id("kw").send_keys("ant")

    # Simulate a click on the Baidu search button
    driver.find_element_by_id("su").click()

    # Wait 2 seconds to let the page load
    time.sleep(2)

    # Take a snapshot of the page after the search
    driver.save_screenshot("ant.png")

    # Print the rendered page source
    # print(driver.page_source)

    # Get the cookies of the current page
    print(driver.get_cookies())

    # Ctrl+A: select everything in the input box
    driver.find_element_by_id("kw").send_keys(Keys.CONTROL, "a")

    # Ctrl+X: cut the contents of the input box
    driver.find_element_by_id("kw").send_keys(Keys.CONTROL, "x")

    # Type new content into the input box
    driver.find_element_by_id("kw").send_keys("python")

    # Simulate pressing the Enter key
    driver.find_element_by_id("su").send_keys(Keys.ENTER)

    # Wait 2 seconds to let the page load
    time.sleep(2)

    # Clear the input box
    driver.find_element_by_id("kw").clear()

    # Take a new snapshot
    driver.save_screenshot("python.png")

    # Get the current URL
    print(driver.current_url)

    # Close the current page; if it is the only page, the browser closes too
    driver.close()

    # Quit the browser
    driver.quit()

Page actions

Selenium's WebDriver provides several ways to find elements. Suppose the page contains the following form input box:

    <input type="text" name="user-name" id="passwd-id" />

Then the element can be located in any of these ways:

    # Locate by id attribute
    element = driver.find_element_by_id("passwd-id")
    # Locate by name attribute
    element = driver.find_element_by_name("user-name")
    # Locate by tag name
    element = driver.find_element_by_tag_name("input")
    # Locate by XPath
    element = driver.find_element_by_xpath("//input[@id='passwd-id']")

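
To see how these lookups relate to one another, here is a minimal sketch that runs without a browser. It uses Python's standard xml.etree.ElementTree module rather than Selenium, and the surrounding form tag and the variable names are assumptions made for illustration. ElementTree supports only a subset of XPath, so the expression is written as .//input[@id='passwd-id'] instead of //input[@id='passwd-id'].

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment containing the input box from the text.
PAGE = '<form><input type="text" name="user-name" id="passwd-id" /></form>'

root = ET.fromstring(PAGE)

# Same idea as find_element_by_xpath("//input[@id='passwd-id']"):
# search the whole tree for an <input> whose id attribute matches.
by_xpath = root.find(".//input[@id='passwd-id']")

# Same idea as find_element_by_name("user-name").
by_name = root.find(".//input[@name='user-name']")

# Same idea as find_element_by_tag_name("input").
by_tag = root.find(".//input")

# All three lookups reach the same element object.
print(by_xpath is by_name is by_tag)
print(by_xpath.attrib)
```

The point of the sketch is only that id, name, tag name and XPath are different routes to the same node; which one to use depends on which attribute is stable on the real page.
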
Locating UI elements (WebElements)

The methods available for selecting elements:

    find_element_by_id
    find_elements_by_name
    find_elements_by_xpath
    find_elements_by_link_text
    find_elements_by_partial_link_text
    find_elements_by_tag_name
    find_elements_by_class_name
    find_elements_by_css_selector

1. By ID

    # Page content
    <div id="coolestWidgetEvah">...</div>
    # Implementation
    element = driver.find_element_by_id("coolestWidgetEvah")

2. By name

    # Page content
    <input name="cheese" type="text"/>
    # Implementation
    element = driver.find_element_by_name("cheese")

3. By XPath

    # Page content
    <input type="text" name="example" />
    <input type="text" name="other" />
    # Implementation (returns a list with both inputs)
    elements = driver.find_elements_by_xpath("//input")

4. By link text

    # Page content
    <a href="http://www.google.com/search?q=cheese">cheese</a>
    # Implementation
    element = driver.find_element_by_link_text("cheese")

5. By partial link text

    # Page content
    <a href="http://www.google.com/search?q=cheese">search for cheese</a>
    # Implementation
    element = driver.find_element_by_partial_link_text("cheese")

6. By tag name

    # Page content
    <iframe src="..."></iframe>
    # Implementation
    element = driver.find_element_by_tag_name("iframe")

7. By class name

    # Page content
    <div id="food"><span class="dairy">milk</span><span class="dairy aged">cheese</span></div>
    # Implementation (the argument is a class name, so "dairy" matches both spans)
    elements = driver.find_elements_by_class_name("dairy")

8. By CSS selector

    # Page content
    <div id="food"><span class="dairy">milk</span><span class="dairy aged">cheese</span></div>
    # Implementation (matches only the span that has both classes)
    elements = driver.find_elements_by_css_selector("#food span.dairy.aged")

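
The difference between a class-name lookup and the CSS selector span.dairy.aged can be sketched without a browser. The following is neither Selenium nor a real CSS engine: it is a hand-written illustration built on the standard html.parser module, it ignores the #food part of the selector, and SpanCollector and texts_with_classes are hypothetical names introduced here.

```python
from html.parser import HTMLParser

PAGE = ('<div id="food"><span class="dairy">milk</span>'
        '<span class="dairy aged">cheese</span></div>')


class SpanCollector(HTMLParser):
    """Collect [class tokens, text] for every <span> in the page."""

    def __init__(self):
        super().__init__()
        self.spans = []        # list of [set_of_classes, text]
        self._in_span = False

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = set((dict(attrs).get("class") or "").split())
            self.spans.append([classes, ""])
            self._in_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_span = False

    def handle_data(self, data):
        if self._in_span:
            self.spans[-1][1] += data


def texts_with_classes(spans, *required):
    """Return the text of spans carrying every class in `required`."""
    return [text for classes, text in spans if set(required) <= classes]


collector = SpanCollector()
collector.feed(PAGE)

# Like a class-name lookup for "dairy": one class token must be present.
print(texts_with_classes(collector.spans, "dairy"))
# Like the selector "span.dairy.aged": both class tokens must be present.
print(texts_with_classes(collector.spans, "dairy", "aged"))
```

A class attribute is a space-separated list of tokens, which is why the second span counts as "dairy" and as "aged" at the same time.
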
Mouse Action Chain

Sometimes we need to simulate mouse actions on the page, such as double-click, right-click, drag-and-drop, or click-and-hold. We can do this by importing the ActionChains class:

    # Import the ActionChains class
    from selenium.webdriver import ActionChains

    # Move the mouse to the position of ac
    ac = driver.find_element_by_xpath('element')
    ActionChains(driver).move_to_element(ac).perform()

    # Click at the position of ac
    ac = driver.find_element_by_xpath("elementA")
    ActionChains(driver).move_to_element(ac).click(ac).perform()

    # Double-click at the position of ac
    ac = driver.find_element_by_xpath("elementB")
    ActionChains(driver).move_to_element(ac).double_click(ac).perform()

    # Right-click at the position of ac
    ac = driver.find_element_by_xpath("elementC")
    ActionChains(driver).move_to_element(ac).context_click(ac).perform()

    # Press and hold the left button at the position of ac
    ac = driver.find_element_by_xpath('elementF')
    ActionChains(driver).move_to_element(ac).click_and_hold(ac).perform()

    # Drag ac1 to the position of ac2
    ac1 = driver.find_element_by_xpath('elementD')
    ac2 = driver.find_element_by_xpath('elementE')
    ActionChains(driver).drag_and_drop(ac1, ac2).perform()

Fill out the form

We already know how to enter text into a text box, but sometimes we come across a drop-down box built with the select tag. Clicking an option in the drop-down box directly does not always work.

    <select id="status" class="form-control valid" onchange="" name="status">
        <option value=""></option>
        <option value="0">Not audited</option>
        <option value="1">Initial review passed</option>
        <option value="2">Second review passed</option>
        <option value="3">Audit not approved</option>
    </select>

Selenium provides a dedicated Select class for handling drop-down boxes. It lives in selenium.webdriver.support.ui and wraps the select element:

    # Import the Select class
    from selenium.webdriver.support.ui import Select

    # Find the select element by its name attribute
    select = Select(driver.find_element_by_name('status'))

    # Choose an option by index, by value, or by visible text
    select.select_by_index(1)
    select.select_by_value("0")
    select.select_by_visible_text("Not audited")

These are the three ways to choose an option in a drop-down box: by index, by value, and by visible text. Note:

    • The index starts from 0
    • value is the value attribute of the option tag, not the text displayed in the drop-down box
    • visible_text is the text inside the option tag, which is what the drop-down box displays
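
The three notes above can be checked on a small copy of the drop-down box without starting a browser. This sketch does not use Selenium's Select class; it parses the HTML with the standard html.parser module and mimics the three lookups by hand. OptionCollector is a hypothetical helper, and the shortened option list is an assumption made to keep the example small.

```python
from html.parser import HTMLParser

# A shortened copy of the drop-down box from the text.
PAGE = """
<select id="status" name="status">
    <option value=""></option>
    <option value="0">Not audited</option>
    <option value="1">Initial review passed</option>
</select>
"""


class OptionCollector(HTMLParser):
    """Collect each <option> as a dict with its value and visible text."""

    def __init__(self):
        super().__init__()
        self.options = []
        self._in_option = False

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self.options.append({"value": dict(attrs).get("value"), "text": ""})
            self._in_option = True

    def handle_endtag(self, tag):
        if tag == "option":
            self._in_option = False

    def handle_data(self, data):
        if self._in_option:
            self.options[-1]["text"] += data


collector = OptionCollector()
collector.feed(PAGE)
options = collector.options

# By index: positions start at 0, so index 1 is the second option.
by_index = options[1]
# By value: compares the value attribute, not the displayed label.
by_value = next(o for o in options if o["value"] == "0")
# By visible text: compares the label shown in the drop-down box.
by_text = next(o for o in options if o["text"].strip() == "Not audited")

# All three lookups land on the same option.
print(by_index is by_value is by_text)
```

Note that index 1 and value "0" point at the same option here only because the first option is the empty placeholder; the index counts positions, while the value is whatever the page author wrote in the attribute.
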

To deselect everything (this only works on a select that allows multiple selections): select.deselect_all()

Handling pop-up windows

When an event triggers a pop-up prompt on the page, you can switch to it in order to handle the prompt or read its message:

    alert = driver.switch_to_alert()

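
Once the alert object is in hand, the usual next steps are to read its message and then accept or dismiss it. The sketch below wraps those steps in a hypothetical helper, handle_alert. It uses the driver.switch_to.alert property (the newer spelling of the call above) together with the alert's text attribute and its accept() and dismiss() methods. The Fake classes are stand-ins, not Selenium classes; they exist only so the sketch runs without a browser.

```python
def handle_alert(driver, accept=True):
    """Switch to the open alert, read its message, then accept or dismiss it."""
    alert = driver.switch_to.alert
    message = alert.text
    if accept:
        alert.accept()
    else:
        alert.dismiss()
    return message


# --- Stand-ins so the sketch runs without a browser (not Selenium classes) ---
class FakeAlert:
    def __init__(self, text):
        self.text = text
        self.result = None

    def accept(self):
        self.result = "accepted"

    def dismiss(self):
        self.result = "dismissed"


class FakeSwitchTo:
    def __init__(self, alert):
        self.alert = alert


class FakeDriver:
    def __init__(self, alert):
        self.switch_to = FakeSwitchTo(alert)


fake_alert = FakeAlert("Are you sure?")
print(handle_alert(FakeDriver(fake_alert), accept=False))
print(fake_alert.result)
```

With a real driver, the same helper would be called right after the action that triggers the pop-up.
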
Page switching

A browser session will often have several windows open, so we need a way to switch between them. To switch to a window:

    driver.switch_to.window("this is window name")

You can also use the window_handles property to get a handle for each window:

    for handle in driver.window_handles:
        driver.switch_to.window(handle)

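
Looping over the handles is usually a means to an end, such as finding the window that shows a particular page. A minimal sketch, assuming the window is identified by its title: switch_to_window_titled is a hypothetical helper that relies only on window_handles, switch_to.window() and title. FakeDriver and FakeSwitchTo are stand-ins (not Selenium classes) so the example runs without a browser.

```python
def switch_to_window_titled(driver, wanted_title):
    """Switch through every window handle until one has the wanted title.

    Returns the matching handle, or None if no window matches.
    """
    for handle in driver.window_handles:
        driver.switch_to.window(handle)
        if driver.title == wanted_title:
            return handle
    return None


# --- Stand-ins so the sketch runs without a browser (not Selenium classes) ---
class FakeSwitchTo:
    def __init__(self, driver):
        self._driver = driver

    def window(self, handle):
        self._driver.current_window_handle = handle


class FakeDriver:
    def __init__(self, titles_by_handle):
        self._titles = titles_by_handle
        self.window_handles = list(titles_by_handle)
        self.current_window_handle = self.window_handles[0]
        self.switch_to = FakeSwitchTo(self)

    @property
    def title(self):
        return self._titles[self.current_window_handle]


driver = FakeDriver({"w1": "Search", "w2": "Results"})
print(switch_to_window_titled(driver, "Results"))
```

If no window matches, the helper returns None and leaves the driver on the last window it visited, so a caller may want to remember the original handle and switch back.
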
Page forward and backward

To go forward and back in the page history:

    driver.forward()  # go forward
    driver.back()     # go back

Cookies

To get the value of each cookie on the page:

    for cookie in driver.get_cookies():
        print("%s,%s" % (cookie['name'], cookie['value']))
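
get_cookies() returns a list of dictionaries, one per cookie. A common follow-up in a crawler is to turn that list into a Cookie request header so another HTTP client can reuse the session. The sketch below shows one way to do it; cookies_to_header is a hypothetical helper, and sample_cookies is made-up data in the same shape, so no browser is needed.

```python
def cookies_to_header(cookies):
    """Join name/value pairs into a single Cookie request-header value."""
    return "; ".join("%s=%s" % (c["name"], c["value"]) for c in cookies)


# Hypothetical sample in the shape returned by driver.get_cookies():
# a list of dicts, each with at least "name" and "value" keys.
sample_cookies = [
    {"name": "sessionid", "value": "abc123", "domain": ".example.com", "path": "/"},
    {"name": "lang", "value": "en", "domain": ".example.com", "path": "/"},
]

# Same loop as in the text, run over the sample data.
for cookie in sample_cookies:
    print("%s,%s" % (cookie["name"], cookie["value"]))

print(cookies_to_header(sample_cookies))
```
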

To delete cookies:

    # Delete one cookie by name
    driver.delete_cookie("CookieName")
    # Delete all cookies
    driver.delete_all_cookies()

Page wait

Today's web pages increasingly use AJAX, so a program cannot be sure when a page has finished loading. If the page takes too long and a DOM element has not appeared yet when your code tries to use the WebElement, an exception such as NoSuchElementException is raised.

To avoid the exceptions that this difficulty in locating elements can cause, Selenium provides two kinds of waits: implicit and explicit.

An implicit wait waits for a fixed amount of time; an explicit wait names a condition and continues as soon as that condition holds.

Implicit wait

An implicit wait is simple: you just set a wait time, in seconds. The driver then keeps retrying element lookups for up to that long before giving up.

    from selenium import webdriver

    driver = webdriver.Chrome()
    # Wait up to 10 seconds when looking up elements
    driver.implicitly_wait(10)  # seconds
    driver.get("http://www.baidu.com/")
    myDynamicElement = driver.find_element_by_id("myDynamicElement")

Explicit wait

An explicit wait specifies a condition and a maximum wait time. If the condition becomes true within that time, execution continues; if it still has not become true when the time runs out, an exception (TimeoutException) is thrown.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    # WebDriverWait is responsible for the polling loop
    from selenium.webdriver.support.ui import WebDriverWait
    # expected_conditions provides the conditions to wait for
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()
    driver.get("http://www.baidu.com/")
    try:
        # Keep polling the page, for at most 10 seconds,
        # until the element with id="myDynamicElement" appears
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, "myDynamicElement"))
        )
    finally:
        driver.quit()

If you do not pass a polling interval, WebDriverWait checks every 0.5 seconds by default whether the element has loaded; if the element is already present, it returns immediately.
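
The polling behavior described above can be sketched in plain Python. This is not Selenium's implementation, only an illustration of the idea: check a condition, return at once if it holds, otherwise sleep for the polling interval and try again until the timeout runs out. wait_until, WaitTimeout and element_appeared are hypothetical names, and the condition is a stand-in for "the element has appeared on the page".

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition is still false after the timeout."""


def wait_until(condition, timeout, poll_frequency=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value          # condition already true: return at once
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %s seconds" % timeout)
        time.sleep(poll_frequency)


# Hypothetical condition that becomes true on the third check,
# standing in for "the element has appeared on the page".
calls = {"count": 0}


def element_appeared():
    calls["count"] += 1
    return "element" if calls["count"] >= 3 else None


print(wait_until(element_appeared, timeout=2, poll_frequency=0.1))
```

Because the condition is checked before the first sleep, a condition that is already true costs no waiting at all, which matches the "returns immediately" behavior noted above.
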

Below are some built-in wait conditions that you can call directly instead of writing your own:

    title_is
    title_contains
    presence_of_element_located
    visibility_of_element_located
    visibility_of
    presence_of_all_elements_located
    text_to_be_present_in_element
    text_to_be_present_in_element_value
    frame_to_be_available_and_switch_to_it
    invisibility_of_element_located
    element_to_be_clickable (the element is displayed and enabled)
    staleness_of
    element_to_be_selected
    element_located_to_be_selected
    element_selection_state_to_be
    element_located_selection_state_to_be
    alert_is_present

If no implicit wait is set, the default wait time is 0.
