1. Installing Selenium
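pip/pip3 install selenium
Pay attention to version dependencies: newer Selenium releases deprecated and then dropped PhantomJS support, so this workflow needs an older release such as Selenium 3.x (for example pip install "selenium<4").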
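Before following the rest of these steps, it can help to check which Selenium release is installed. The sketch below is a minimal example that assumes Python 3.8+ (for importlib.metadata). It simplifies the version rule to "PhantomJS is unavailable from Selenium 4.x onward". The helpers phantomjs_supported and installed_selenium_version are hypothetical names.

```python
from importlib import metadata


def phantomjs_supported(version: str) -> bool:
    """Return True if a Selenium version string is older than 4.x.

    Assumption: the PhantomJS driver is not available from Selenium 4 onward.
    """
    major = int(version.split(".")[0])
    return major < 4


def installed_selenium_version():
    """Return the installed Selenium version string, or None if not installed."""
    try:
        return metadata.version("selenium")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_selenium_version()
    if version is None:
        print("selenium is not installed")
    elif phantomjs_supported(version):
        print(f"selenium {version}: PhantomJS driver should be available")
    else:
        print(f'selenium {version}: consider pip install "selenium<4"')
```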
2. PhantomJS for Windows
Download: http://phantomjs.org/download.html
phantomjs-2.1.1-windows supports only 64-bit systems.
phantomjs-1.9.7-windows supports 32-bit systems; earlier versions have not been tested.
Copy phantomjs.exe from the bin directory of the downloaded package into the Scripts directory under the Python installation directory.
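To confirm the copy worked, a small lookup can check the PATH first and then the Scripts directory. This is a minimal sketch. It assumes that directory lives at <sys.prefix>\Scripts, which is typical on Windows but not guaranteed. find_phantomjs is a hypothetical helper name.

```python
import os
import shutil
import sys


def find_phantomjs():
    """Return the path to the PhantomJS executable, or None if not found."""
    # shutil.which searches the PATH (and honours PATHEXT on Windows,
    # so "phantomjs" also matches phantomjs.exe).
    found = shutil.which("phantomjs")
    if found:
        return found
    # Assumption: the Python installation's Scripts directory is
    # <sys.prefix>\Scripts; layouts can differ between installs.
    candidate = os.path.join(sys.prefix, "Scripts", "phantomjs.exe")
    if os.path.isfile(candidate):
        return candidate
    return None


if __name__ == "__main__":
    print(find_phantomjs())
```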
3. Simulating browser operations
Import the module
from selenium import webdriver
Visit a page
wb = webdriver.PhantomJS()
url = "http://www.test.com"
wb.get(url)
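The driver used for visiting pages runs a separate PhantomJS process, so it should be shut down even if a later step fails. This is a minimal sketch assuming Selenium 3.x. browser_session is a hypothetical helper that accepts any zero-argument factory, such as webdriver.PhantomJS. On exit it calls the driver's quit() method, which ends the whole session rather than only the current window.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(factory):
    """Create a driver with factory(), yield it, and always quit it afterwards."""
    driver = factory()
    try:
        yield driver
    finally:
        # quit() ends the browser process; close() would only close a window.
        driver.quit()


# Usage with Selenium 3.x and PhantomJS on the PATH:
# from selenium import webdriver
# with browser_session(webdriver.PhantomJS) as wb:
#     wb.get("http://www.test.com")
#     print(wb.title)
```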
Log in
wb.find_element_by_name("username").send_keys("user")
wb.find_element_by_name("password").send_keys("123456")
wb.find_element_by_name("submit").click()
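The login steps can be wrapped in a reusable function. This is a minimal sketch using the Selenium 3.x find_element_by_name method. log_in is a hypothetical helper. Its default field names ("username", "password", "submit") are assumptions about the target page and will differ between sites.

```python
def log_in(driver, username, password,
           user_field="username", pass_field="password", submit="submit"):
    """Fill in a login form whose fields are located by their name attribute,
    then click the submit element."""
    driver.find_element_by_name(user_field).send_keys(username)
    driver.find_element_by_name(pass_field).send_keys(password)
    driver.find_element_by_name(submit).click()


# Usage with an existing Selenium 3.x driver:
# log_in(wb, "user", "123456")
```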
Switch frames/windows
wb.switch_to.frame('frame_name')
wb.switch_to.window('window_name')
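After switching into a frame, element lookups stay scoped to that frame until you switch back. This is a minimal sketch. run_in_frame is a hypothetical helper that always returns to the top-level document with switch_to.default_content(). Window switching works in a similar way, using the handles listed in wb.window_handles.

```python
def run_in_frame(driver, frame_name, action):
    """Switch into a frame, run action(driver), then return to the main page."""
    driver.switch_to.frame(frame_name)
    try:
        return action(driver)
    finally:
        # Restore the top-level document so later lookups are not
        # accidentally limited to the frame.
        driver.switch_to.default_content()
```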
Take a screenshot of the page (Selenium saves PNG data, so use a .png file name)
wb.get_screenshot_as_file("d:\\test.png")
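In Selenium, get_screenshot_as_file returns False instead of raising when the file cannot be written, so a failed save is easy to miss. This is a minimal sketch. save_screenshot is a hypothetical helper that forces a .png extension and turns that False into an exception.

```python
import os


def save_screenshot(driver, directory, name):
    """Save a screenshot as <directory>/<name>.png and return the path.

    Raises IOError if the driver reports that the file could not be written.
    """
    base, _ = os.path.splitext(name)  # drop any other extension, e.g. .jpg
    path = os.path.join(directory, base + ".png")
    if not driver.get_screenshot_as_file(path):
        raise IOError(f"could not write screenshot to {path}")
    return path
```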
HTML parsing and element location
Any element on the page that needs a mouse click can be activated by locating it and calling its click() method.
Select an option from a drop-down list
from selenium.webdriver.support.ui import Select
Select(wb.find_element_by_name("colour")).select_by_visible_text("Red")
Refresh the page
wb.refresh()
Close the page
wb.close()
4. Features
With the trend toward heavy JavaScript and HTML5, most sites now load their data through JavaScript, and the data is often loaded with a delay. We need to let the page's JavaScript finish rendering and loading the data before we start parsing. Using a third-party library for this is simpler, at the cost of some efficiency. Selenium acts like a large container: PhantomJS inside it handles the JavaScript rendering, and we work directly with the Selenium API.
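Because the data arrives after the initial page load, parsing should wait until it is present. Selenium provides WebDriverWait in selenium.webdriver.support.ui for this purpose. The sketch below is a hypothetical, dependency-free wait_until helper that shows the same polling idea. You supply the condition callable, for example a check that a results table has rows.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll condition() until it returns a truthy value or the timeout expires.

    Returns that truthy value; raises TimeoutError if it never appears.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

With a Selenium 3.x driver, a call such as wait_until(lambda: wb.find_elements_by_css_selector("table tr")) would block until rows exist. You could then parse wb.page_source.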
5. Precautions
When packaging with PyInstaller, the selenium library fails to load if the -F option is used to build a single standalone file. After packaging is complete, copy phantomjs.exe into the directory containing the generated EXE file.
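After phantomjs.exe has been copied next to the packaged EXE, the script still has to point Selenium at it. This is a minimal sketch. It relies on PyInstaller setting sys.frozen in built executables, where sys.executable is the EXE itself. phantomjs_executable is a hypothetical helper name. Outside a build, it falls back to the bare name so that the normal PATH lookup applies.

```python
import os
import sys


def phantomjs_executable(name="phantomjs.exe"):
    """Return where PhantomJS is expected for the current run."""
    if getattr(sys, "frozen", False):
        # Running from a PyInstaller build: look next to the packaged EXE.
        return os.path.join(os.path.dirname(sys.executable), name)
    # Running as a normal script: let the PATH lookup find it.
    return name


# Usage with Selenium 3.x (executable_path is a PhantomJS driver argument):
# from selenium import webdriver
# wb = webdriver.PhantomJS(executable_path=phantomjs_executable())
```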