Scrapy Learning - 16 - Dynamic Web Page Technology

Source: Internet
Author: User

Introduction to the Selenium browser automation testing framework
    • Selenium is a tool for testing web applications. Selenium tests run directly in the browser, just as a real user would operate it.
    • Supported browsers include IE (7, 8, 9, 10, 11), Mozilla Firefox, Safari, Google Chrome, Opera, and others.
    • Its main features include browser compatibility testing: verify that your application works well on different browsers and operating systems.
    • It can test system functions: create regression tests to verify software functionality and user requirements. It supports recording actions automatically and generating test scripts in different languages such as .NET, Java, and Perl.
Features
    • Under the hood, the framework uses JavaScript to simulate a real user's actions in the browser. When a script executes, the browser automatically clicks, types, opens pages, validates, and so on, just as a real user would, testing the application from the end user's perspective.
    • It makes automated browser compatibility testing possible, although subtle differences remain between browsers.
    • It is easy to use, and test scripts can be written in multiple languages, such as Java and Python.
Installation
pip install selenium

Official documentation
http://selenium-python.readthedocs.io/

Driver downloads
Chrome: https://sites.google.com/a/chromium.org/chromedriver/downloads
Edge: https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/
Firefox: https://github.com/mozilla/geckodriver/releases
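After downloading a driver, a minimal sketch like the following can confirm that Selenium can drive the browser (the driver path here is a placeholder; point it at the binary you unpacked):

from selenium import webdriver

# Placeholder path; adjust to wherever the chromedriver binary was unpacked
browser = webdriver.Chrome(executable_path="./chromedriver")
browser.get("https://www.baidu.com/")
print(browser.title)  # prints the page title if the driver and browser are talking
browser.quit()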

Basic usage: simulating a login to Zhihu
from selenium import webdriver
from scrapy.selector import Selector

browser = webdriver.Chrome(executable_path=r"E:\Python Project\scrapyproject\_articlespider\chromedriver_win32\chromedriver.exe")
browser.get("https://www.zhihu.com/#signin")
browser.find_element_by_css_selector(".view-signin input[name='account']").send_keys("18412542552")
browser.find_element_by_css_selector(".view-signin input[name='password']").send_keys("As15fqafa")
browser.find_element_by_css_selector(".view-signin button.sign-button").click()
# browser.quit()
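The hard-coded selectors above will fail if the page has not finished rendering when find_element runs. A more robust variant (a sketch, not part of the original code) waits explicitly for the form to appear:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Chrome(executable_path=r"E:\Python Project\scrapyproject\_articlespider\chromedriver_win32\chromedriver.exe")
browser.get("https://www.zhihu.com/#signin")
# Wait up to 10 seconds for the login form instead of assuming it has rendered
account = WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".view-signin input[name='account']"))
)
account.send_keys("18412542552")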

Basic usage: simulating a login to Weibo and scrolling the page
from selenium import webdriver
import time

browser = webdriver.Chrome(executable_path=r"E:\Python Project\scrapyproject\_articlespider\chromedriver_win32\chromedriver.exe")
browser.get("https://weibo.com/")
time.sleep(5)
browser.find_element_by_css_selector("#loginname").send_keys("<username>")
browser.find_element_by_css_selector(".info_list.password input[node-type='password']").send_keys("<password>")
browser.find_element_by_css_selector(".info_list.login_btn a[node-type='submitBtn']").click()
for i in range(3):
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight); var lenOfPage=document.body.scrollHeight; return lenOfPage;")
    time.sleep(3)
# browser.quit()
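The fixed three-pass loop above can stop before an infinite-scroll feed is exhausted. A common refinement (a sketch reusing the browser object from the block above, not part of the original) keeps scrolling until the page height stops growing:

import time

last_height = browser.execute_script("return document.body.scrollHeight")
while True:
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)  # allow new content to load
    new_height = browser.execute_script("return document.body.scrollHeight")
    if new_height == last_height:  # no new content appeared; stop scrolling
        break
    last_height = new_height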

Basic usage: disabling image loading to speed up page loads
from selenium import webdriver

chrome_opt = webdriver.ChromeOptions()
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_opt.add_experimental_option("prefs", prefs)
browser = webdriver.Chrome(executable_path=r"E:\Python Project\scrapyproject\_articlespider\chromedriver_win32\chromedriver.exe", chrome_options=chrome_opt)
browser.get("https://www.taobao.com/")
# browser.quit()
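In Chrome's content-settings preferences, the value 1 allows a content type and 2 blocks it, so the profile above never downloads images; this noticeably shortens load times on image-heavy pages such as Taobao.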

Basic usage: hiding the Chrome graphical interface
Note: the module below currently works only on Linux.
pip install pyvirtualdisplay
Installing related dependencies
sudo apt-get install xvfb
pip install xvfbwrapper
Usage steps
from pyvirtualdisplay import Display
from selenium import webdriver

display = Display(visible=0, size=(800, 600))  # the size in the original was garbled; (800, 600) is a typical value
display.start()

browser = webdriver.Chrome(executable_path=r"E:\Python Project\scrapyproject\_articlespider\chromedriver_win32\chromedriver.exe")
browser.get("https://www.taobao.com/")
# browser.quit()
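Since pyvirtualdisplay relies on Xvfb and therefore only helps on Linux, it is worth noting (as an aside, not from the original) that Chrome also offers a built-in headless mode that needs no virtual display:

from selenium import webdriver

chrome_opt = webdriver.ChromeOptions()
chrome_opt.add_argument("--headless")     # run Chrome with no visible window
chrome_opt.add_argument("--disable-gpu")  # recommended alongside headless on some platforms
# Placeholder path; adjust to your chromedriver location
browser = webdriver.Chrome(executable_path="./chromedriver", chrome_options=chrome_opt)
browser.get("https://www.taobao.com/")
print(browser.title)
browser.quit()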

Basic usage of PhantomJS
Characteristics
    • A headless (no-GUI) browser, highly efficient
    • Mostly used on Linux systems without a graphical interface
    • PhantomJS performance degrades severely when run with multiple processes
    • Multi-threaded execution is unstable
Download
http://phantomjs.org/download.html
Simple usage
from selenium import webdriver

browser = webdriver.PhantomJS(executable_path=r"E:\Python Project\scrapyproject\_articlespider\phantomjs-2.1.1-windows\bin\phantomjs.exe")
browser.get("https://item.taobao.com/item.htm?id=558638145403&ali_refid=a3_430673_1006:1109358544:N:%E6%89%8B%E6%9C%BA%E8%8B%B9%E6%9E%9C%E6%89%8B%E6%9C%BA:5d77c360cd1e64043b2f430be7531705&ali_trackid=1_5d77c360cd1e64043b2f430be7531705&spm=a2e15.8261149.07626516002.2")
print(browser.page_source)
browser.quit()
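Note that PhantomJS development has since been suspended, and newer Selenium releases first deprecated and then removed PhantomJS support; headless Chrome or Firefox is the usual replacement today.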

Integrating Selenium into the Scrapy framework
Creating a Chrome browser object for each spider
import scrapy
from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals
from selenium import webdriver

class JobboleSpider(scrapy.Spider):
    name = "jobbole"
    allowed_domains = ["blog.jobbole.com"]
    start_urls = ['http://blog.jobbole.com/all-posts/']

    def __init__(self):
        # One browser instance per spider, closed when the spider closes
        self.browser = webdriver.Chrome(executable_path=r"E:\Python Project\scrapyproject\_articlespider\chromedriver_win32\chromedriver.exe")
        super(JobboleSpider, self).__init__()
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def spider_closed(self, spider):
        self.browser.quit()

    def parse(self, response):
        pass
Writing a downloader middleware that opens web pages with Chrome
import time
from scrapy.http import HtmlResponse

class JSPageMiddleware(object):
    def process_request(self, request, spider):
        if spider.name == "jobbole":
            spider.browser.get(request.url)
            time.sleep(3)
            # Returning an HtmlResponse skips Scrapy's default downloader
            return HtmlResponse(url=spider.browser.current_url,
                                body=spider.browser.page_source,
                                encoding="utf-8",
                                request=request)
Configure settings.py
DOWNLOADER_MIDDLEWARES = {
    'ArticleSpider.middlewares.JSPageMiddleware': 1,
}
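When process_request returns an HtmlResponse, Scrapy skips the remaining downloader middlewares and its default downloader, so each matched page is fetched exactly once, through Chrome. The trade-off is that the call is synchronous: the whole crawl blocks while Chrome loads each page, which is the motivation for the asynchronous approach in the next section.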

Overriding the downloader so Selenium supports asynchronous requests
This requires familiarity with Scrapy's programming conventions; for reference, see:
https://github.com/flisky/scrapy-phantomjs-downloader

Other browser automation tools: Splash and Grid, lighter-weight options for loading dynamic pages
Characteristics
    • Better performance than Chrome and PhantomJS
    • Support for distributed crawlers
    • Stability is not as high as Chrome's
Splash GitHub project
https://github.com/scrapy-plugins/scrapy-splash
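For orientation, the scrapy-splash project wires into Scrapy roughly as below (a sketch assuming a Splash instance is running locally on port 8050; see the project README above for the authoritative settings):

# settings.py
SPLASH_URL = 'http://localhost:8050'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

A spider callback then renders pages through Splash with, for example:

from scrapy_splash import SplashRequest
yield SplashRequest(url, self.parse, args={'wait': 0.5})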
Selenium Grid extension
https://www.oschina.net/question/tag/selenium-grid

Another browser automation tool: Splinter (pure Python)
https://github.com/cobrateam/splinter
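As a taste of Splinter's higher-level API, a minimal sketch (assuming Chrome and its driver are installed; see the project docs above for supported drivers):

from splinter import Browser

with Browser('chrome') as browser:  # the context manager quits the browser on exit
    browser.visit('https://www.baidu.com/')
    print(browser.title)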
