Selenium + PhantomJS: Parsing JS

Last Update: 2017-01-21 Source: Internet

Author: User



Background knowledge:

PhantomJS is a WebKit-based, server-side JavaScript API. It renders web pages without needing a separate browser, and it has fast, native support for various web standards: DOM handling, CSS selectors, JSON, Canvas, and SVG. PhantomJS can be used for page automation, network monitoring, page screenshots, and headless (no-interface) testing.

Selenium is a tool for testing web applications. Selenium tests run directly in the browser, just as a real user would operate it. Supported browsers include IE (7, 8, 9), Mozilla Firefox, Mozilla Suite, and more. Its main uses include browser compatibility testing: checking whether your application works well on different browsers and operating systems.

PhantomJS renders the page and parses its JS, Selenium drives PhantomJS and connects it to Python, and Python handles the later processing.

Selenium 2 supports Python 2.7, 3.2, 3.3, and 3.4.

Selenium Server needs to be installed separately only if you require remote operation.

Installation:

First install Selenium 2. Any installation method works; I usually download the source archive and install it with the python setup.py install command. Selenium 2.42.1: https://pypi.python.org/pypi/selenium/2.42.1

Then download PhantomJS from https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-1.9.7-windows.zip and unzip it; the archive contains a phantomjs.exe file.

Example 1:

    # coding=utf-8
    from selenium import webdriver

    # executable_path points at the unpacked phantomjs.exe
    driver = webdriver.PhantomJS(executable_path=r'C:\Users\gentlyguitar\Desktop\phantomjs-1.9.7-windows\phantomjs.exe')
    driver.get("http://phperz.com/")
    driver.find_element_by_id('search_form_input_homepage').send_keys("Nirvana")
    driver.find_element_by_id("search_button_homepage").click()
    print(driver.current_url)
    driver.quit()

The executable_path argument is simply the path to phantomjs.exe. The result of running the script:

https://phperz.com/?q=Nirvana
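Backslashes in a Windows path are easy to lose or to misread as escape sequences, so it can help to build executable_path with os.path and to check that the file exists before starting the driver. The following is a minimal sketch: find_phantomjs is a hypothetical helper (not part of Selenium), and the candidate directories are assumptions to be adjusted.

```python
import os


def find_phantomjs(candidate_dirs, exe_name="phantomjs.exe"):
    """Return the first existing path to the PhantomJS binary, or None.

    Hypothetical helper for illustration; not part of Selenium.
    """
    for directory in candidate_dirs:
        path = os.path.join(directory, exe_name)
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    # Assumed locations; adjust to wherever the zip was unpacked.
    dirs = [
        os.path.join(os.path.expanduser("~"), "Desktop", "phantomjs-1.9.7-windows"),
        os.getcwd(),
    ]
    print(find_phantomjs(dirs))
```

The returned string can be passed as executable_path to webdriver.PhantomJS; a result of None means the binary was not found in any of the listed directories.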

Walkthrough of the example:

It is worth mentioning that:

The get method waits until the page is fully loaded before the program continues.

But for AJAX: "It's worth noting that if your page uses a lot of AJAX on load then WebDriver may not know when it has completely loaded."

send_keys types text into an input field.

Example 2:

    # coding=utf-8
    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver import ActionChains
    import time
    import sys

    driver = webdriver.PhantomJS(executable_path=r'C:\Users\gentlyguitar\Desktop\phantomjs-1.9.7-windows\phantomjs.exe')
    driver.get("http://www.zhihu.com/#signin")
    # Fill in the login form
    driver.find_element_by_name('email').send_keys('your email')
    driver.find_element_by_xpath('//input[@name="password"]').send_keys('your password')
    #driver.find_element_by_xpath('//input[@name="password"]').send_keys(Keys.RETURN)
    time.sleep(2)
    driver.get_screenshot_as_file('show.png')
    #driver.find_element_by_xpath('//button[@class="sign-button"]').click()
    driver.find_element_by_xpath('//form[@class="zu-side-login-box"]').submit()

    # Wait up to 5 seconds for the user-info link that only appears after login
    try:
        dr = WebDriverWait(driver, 5)
        dr.until(lambda the_driver: the_driver.find_element_by_xpath('//a[@class="zu-top-nav-userinfo"]').is_displayed())
    except Exception:
        print('Login failed')
        sys.exit(0)
    driver.get_screenshot_as_file('show.png')
    #user = driver.find_element_by_class_name('zu-top-nav-userinfo')
    #webdriver.ActionChains(driver).move_to_element(user).perform()  # move the mouse to my user name

    # Click the "more" link through an action chain
    loadmore = driver.find_element_by_xpath('//a[@id="zh-load-more"]')
    actions = ActionChains(driver)
    actions.move_to_element(loadmore)
    actions.click(loadmore)
    actions.perform()
    time.sleep(2)
    driver.get_screenshot_as_file('show.png')
    print(driver.current_url)
    print(driver.page_source)
    driver.quit()

This program logs in to Zhihu and then automatically clicks the "more" link at the bottom of the page to load more content.

Walkthrough of the example:

from selenium.webdriver.common.keys import Keys: the Keys class represents the keys on the keyboard. The send_keys(Keys.RETURN) call in the script presses the Enter key.

from selenium.webdriver.support.ui import WebDriverWait is needed for the wait operation later in the script.

from selenium.webdriver import ActionChains imports the action chain class. It took me a long time to find the right form of this import.

For find_element, I recommend the XPath method; it is very convenient.

Tutorial on XPath expression syntax: http://www.ruanyifeng.com/blog/2009/07/xpath_path_expressions.html

Note that you should avoid selecting on an attribute value that contains a space, such as class="country name". Otherwise you get an error, probably one about compound class names.
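One common way to sidestep space-separated class values is to match a single class token with XPath's contains() function. The following is a minimal sketch: xpath_for_class is a hypothetical helper, and the td and name values are only illustrative.

```python
def xpath_for_class(tag, class_name):
    """Build an XPath matching `tag` elements whose class list contains `class_name`.

    The class attribute is padded with spaces so that only whole
    class tokens match ("name" does not match "username").
    """
    return (
        "//%s[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]"
        % (tag, class_name)
    )


if __name__ == "__main__":
    # Intended to match <td class="country name"> as well as <td class="name">.
    print(xpath_for_class("td", "name"))
```

The resulting string can be passed to driver.find_element_by_xpath in the same way as the hand-written expressions in the example.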

A reliable way to check that the user name and password were entered correctly is to take a screenshot after filling them in.

To take a screenshot, use this line:

    driver.get_screenshot_as_file('show.png')

Note that the screenshot has no scroll bar: it captures the entire page from top to bottom.
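Example 2 writes every screenshot to show.png, so each one overwrites the previous one. A sketch of saving one file per step instead is shown here; screenshot_path and snap are hypothetical helpers, and driver is assumed to be the PhantomJS driver from the example.

```python
import os


def screenshot_path(number, label, directory="."):
    """Return a file name such as ./01-after-login.png for a numbered step."""
    return os.path.join(directory, "%02d-%s.png" % (number, label))


def snap(driver, number, label, directory="."):
    """Save a screenshot for the given step and return the path that was used.

    `driver` only needs a get_screenshot_as_file(path) method,
    which the Selenium driver in the example provides.
    """
    path = screenshot_path(number, label, directory)
    driver.get_screenshot_as_file(path)
    return path
```

For instance, snap(driver, 1, 'form-filled') followed by snap(driver, 2, 'after-login') keeps both images for comparison.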

    try:
        dr = WebDriverWait(driver, 5)
        dr.until(lambda the_driver: the_driver.find_element_by_xpath('//a[@class="zu-top-nav-userinfo"]').is_displayed())
    except Exception:
        print('Login failed')
        sys.exit(0)

This block checks whether an element has loaded in order to tell whether the login succeeded; I think this is a workable black-box check. About the 5: for up to 5 seconds, the page is checked every 500 milliseconds until the specified element appears.
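The polling behaviour described above can be illustrated without Selenium. The sketch below is a simplified stand-in for what WebDriverWait(driver, 5).until(...) does, not Selenium's actual implementation. wait_until is a hypothetical name, and as a simplifying assumption this version treats every exception as "not ready yet".

```python
import time


def wait_until(condition, timeout=5.0, poll=0.5):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Returns that value, or raises RuntimeError once `timeout` seconds
    have passed. Exceptions raised by condition() count as "not ready yet".
    """
    deadline = time.time() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except Exception:
            pass  # e.g. the element is not in the page yet
        if time.time() >= deadline:
            raise RuntimeError("condition not met within %.1f s" % timeout)
        time.sleep(poll)
```

With the defaults, the condition is tried about every half second for up to five seconds, which mirrors the numbers in the explanation.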

To submit a form, you can select the login button and call click(), or you can select the form itself and call submit(). The second approach also works when the page has no login button, so submit() is recommended.

For a single click, you can either call click() directly or use a chain of actions, as in the script:

    loadmore = driver.find_element_by_xpath('//a[@id="zh-load-more"]')
    actions = ActionChains(driver)
    actions.move_to_element(loadmore)
    actions.click(loadmore)
    actions.perform()

These five lines amount to a single "find the element and click it", but action chains cover a wider range of interactions. In this example the target is an a tag; I do not know why calling click() on it directly did not work.

    print(driver.current_url)
    print(driver.page_source)

These lines print two properties of the web page: its URL and its source.
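Once the rendered source is available, the later processing happens in plain Python. The following is a minimal sketch that pulls links out of a string such as driver.page_source, using the Python 3 standard library (on Python 2.7 the module is named HTMLParser). LinkCollector and extract_links are hypothetical names, and the sample HTML is made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(page_source):
    """Return the href values of all links in an HTML string."""
    parser = LinkCollector()
    parser.feed(page_source)
    return parser.links


if __name__ == "__main__":
    # Made-up sample standing in for driver.page_source.
    sample = ('<html><body><a href="/question/1">Q1</a>'
              '<a id="zh-load-more">more</a></body></html>')
    print(extract_links(sample))
```

In the example script, extract_links(driver.page_source) would be called before driver.quit().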

Reprinted from: http://www.phperz.com/article/15/0829/117337.html
