Scrapy Learning - 16 - Dynamic Web Page Technology

Source: Internet
Author: User

Introduction to the Selenium browser automation testing framework
    • Selenium is a tool for testing web applications. Selenium tests run directly in the browser, just as a real user would operate it.
    • Supported browsers include IE (7, 8, 9, 10, 11), Mozilla Firefox, Safari, Google Chrome, Opera, and others.
    • Its main features include browser compatibility testing: verify that your application works well on different browsers and operating systems.
    • It can test system functions: create regression tests to verify software functionality and user requirements. It supports recording actions automatically and generating test scripts in different languages such as .NET, Java, and Perl.
Features
    • Under the hood, the framework uses JavaScript to simulate a real user's actions in the browser. When a script executes, the browser automatically clicks, types, opens pages, validates, and so on, just as a real user would, testing the application from the end user's perspective.
    • It makes automated browser compatibility testing possible, although subtle differences remain between browsers.
    • It is easy to use, and test scripts can be written in multiple languages, such as Java and Python.
Installation
pip install selenium

Official documentation
http://selenium-python.readthedocs.io/

Driver downloads
Chrome: https://sites.google.com/a/chromium.org/chromedriver/downloads
Edge: https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/
Firefox: https://github.com/mozilla/geckodriver/releases
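After downloading a driver, a minimal sketch like the following can confirm that Selenium can drive the browser (the driver path here is a placeholder; point it at the binary you unpacked):

from selenium import webdriver

# Placeholder path; adjust to wherever the chromedriver binary was unpacked
browser = webdriver.Chrome(executable_path="./chromedriver")
browser.get("https://www.baidu.com/")
print(browser.title)  # prints the page title if the driver and browser are talking
browser.quit()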

Basic usage: simulating a login to Zhihu
from selenium import webdriver
from scrapy.selector import Selector

browser = webdriver.Chrome(executable_path=r"E:\Python Project\scrapyproject\_articlespider\chromedriver_win32\chromedriver.exe")
browser.get("https://www.zhihu.com/#signin")
browser.find_element_by_css_selector(".view-signin input[name='account']").send_keys("18412542552")
browser.find_element_by_css_selector(".view-signin input[name='password']").send_keys("As15fqafa")
browser.find_element_by_css_selector(".view-signin button.sign-button").click()
# browser.quit()
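The hard-coded selectors above will fail if the page has not finished rendering when find_element runs. A more robust variant (a sketch, not part of the original code) waits explicitly for the form to appear:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Chrome(executable_path=r"E:\Python Project\scrapyproject\_articlespider\chromedriver_win32\chromedriver.exe")
browser.get("https://www.zhihu.com/#signin")
# Wait up to 10 seconds for the login form instead of assuming it has rendered
account = WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".view-signin input[name='account']"))
)
account.send_keys("18412542552")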

Basic usage: simulating a login to Weibo and scrolling the page
from selenium import webdriver
import time

browser = webdriver.Chrome(executable_path=r"E:\Python Project\scrapyproject\_articlespider\chromedriver_win32\chromedriver.exe")
browser.get("https://weibo.com/")
time.sleep(5)
browser.find_element_by_css_selector("#loginname").send_keys("<username>")
browser.find_element_by_css_selector(".info_list.password input[node-type='password']").send_keys("<password>")
browser.find_element_by_css_selector(".info_list.login_btn a[node-type='submitBtn']").click()
for i in range(3):
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight); var lenOfPage=document.body.scrollHeight; return lenOfPage;")
    time.sleep(3)
# browser.quit()
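The fixed three-pass loop above can stop before an infinite-scroll feed is exhausted. A common refinement (a sketch reusing the browser object from the block above, not part of the original) keeps scrolling until the page height stops growing:

import time

last_height = browser.execute_script("return document.body.scrollHeight")
while True:
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)  # allow new content to load
    new_height = browser.execute_script("return document.body.scrollHeight")
    if new_height == last_height:  # no new content appeared; stop scrolling
        break
    last_height = new_height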

Basic usage: disabling image loading to speed up page loads
from selenium import webdriver

chrome_opt = webdriver.ChromeOptions()
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_opt.add_experimental_option("prefs", prefs)
browser = webdriver.Chrome(executable_path=r"E:\Python Project\scrapyproject\_articlespider\chromedriver_win32\chromedriver.exe", chrome_options=chrome_opt)
browser.get("https://www.taobao.com/")
# browser.quit()
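In Chrome's content-settings preferences, the value 1 allows a content type and 2 blocks it, so the profile above never downloads images; this noticeably shortens load times on image-heavy pages such as Taobao.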

Basic usage: hiding the Chrome graphical interface
Note: the module below currently works only on Linux.
pip install pyvirtualdisplay
Installing related dependencies
sudo apt-get install xvfb
pip install xvfbwrapper
Usage steps
from pyvirtualdisplay import Display
from selenium import webdriver

display = Display(visible=0, size=(800, 600))  # the size in the original was garbled; (800, 600) is a typical value
display.start()

browser = webdriver.Chrome(executable_path=r"E:\Python Project\scrapyproject\_articlespider\chromedriver_win32\chromedriver.exe")
browser.get("https://www.taobao.com/")
# browser.quit()
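Since pyvirtualdisplay relies on Xvfb and therefore only helps on Linux, it is worth noting (as an aside, not from the original) that Chrome also offers a built-in headless mode that needs no virtual display:

from selenium import webdriver

chrome_opt = webdriver.ChromeOptions()
chrome_opt.add_argument("--headless")     # run Chrome with no visible window
chrome_opt.add_argument("--disable-gpu")  # recommended alongside headless on some platforms
# Placeholder path; adjust to your chromedriver location
browser = webdriver.Chrome(executable_path="./chromedriver", chrome_options=chrome_opt)
browser.get("https://www.taobao.com/")
print(browser.title)
browser.quit()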

Basic usage of PhantomJS
Characteristics
    • A headless (no-GUI) browser, highly efficient
    • Mostly used on Linux systems without a graphical interface
    • PhantomJS performance degrades severely when run with multiple processes
    • Multi-threaded execution is unstable
Download
http://phantomjs.org/download.html
Simple usage
from selenium import webdriver

browser = webdriver.PhantomJS(executable_path=r"E:\Python Project\scrapyproject\_articlespider\phantomjs-2.1.1-windows\bin\phantomjs.exe")
browser.get("https://item.taobao.com/item.htm?id=558638145403&ali_refid=a3_430673_1006:1109358544:N:%E6%89%8B%E6%9C%BA%E8%8B%B9%E6%9E%9C%E6%89%8B%E6%9C%BA:5d77c360cd1e64043b2f430be7531705&ali_trackid=1_5d77c360cd1e64043b2f430be7531705&spm=a2e15.8261149.07626516002.2")
print(browser.page_source)
browser.quit()
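Note that PhantomJS development has since been suspended, and newer Selenium releases first deprecated and then removed PhantomJS support; headless Chrome or Firefox is the usual replacement today.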

Integrating Selenium into the Scrapy framework
Creating a Chrome browser object for each spider
import scrapy
from scrapy.xlib.pydispatch import dispatcher
from scrapy import signals
from selenium import webdriver

class JobboleSpider(scrapy.Spider):
    name = "jobbole"
    allowed_domains = ["blog.jobbole.com"]
    start_urls = ['http://blog.jobbole.com/all-posts/']

    def __init__(self):
        # One browser instance per spider, closed when the spider closes
        self.browser = webdriver.Chrome(executable_path=r"E:\Python Project\scrapyproject\_articlespider\chromedriver_win32\chromedriver.exe")
        super(JobboleSpider, self).__init__()
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def spider_closed(self, spider):
        self.browser.quit()

    def parse(self, response):
        pass
Writing a downloader middleware that opens web pages with Chrome
import time
from scrapy.http import HtmlResponse

class JSPageMiddleware(object):
    def process_request(self, request, spider):
        if spider.name == "jobbole":
            spider.browser.get(request.url)
            time.sleep(3)
            # Returning an HtmlResponse skips Scrapy's default downloader
            return HtmlResponse(url=spider.browser.current_url,
                                body=spider.browser.page_source,
                                encoding="utf-8",
                                request=request)
Configure settings.py
DOWNLOADER_MIDDLEWARES = {
    'ArticleSpider.middlewares.JSPageMiddleware': 1,
}
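When process_request returns an HtmlResponse, Scrapy skips the remaining downloader middlewares and its default downloader, so each matched page is fetched exactly once, through Chrome. The trade-off is that the call is synchronous: the whole crawl blocks while Chrome loads each page, which is the motivation for the asynchronous approach in the next section.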

Overriding the downloader so Selenium supports asynchronous requests
This requires familiarity with Scrapy's programming conventions; for reference, see:
https://github.com/flisky/scrapy-phantomjs-downloader

Other browser automation tools: Splash and Grid, lighter-weight options for loading dynamic pages
Characteristics
    • Better performance than Chrome and PhantomJS
    • Support for distributed crawlers
    • Stability is not as high as Chrome's
Splash GitHub project
https://github.com/scrapy-plugins/scrapy-splash
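For orientation, the scrapy-splash project wires into Scrapy roughly as below (a sketch assuming a Splash instance is running locally on port 8050; see the project README above for the authoritative settings):

# settings.py
SPLASH_URL = 'http://localhost:8050'
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

A spider callback then renders pages through Splash with, for example:

from scrapy_splash import SplashRequest
yield SplashRequest(url, self.parse, args={'wait': 0.5})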
Selenium Grid extension
https://www.oschina.net/question/tag/selenium-grid

Another browser automation tool: Splinter (pure Python)
https://github.com/cobrateam/splinter
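As a taste of Splinter's higher-level API, a minimal sketch (assuming Chrome and its driver are installed; see the project docs above for supported drivers):

from splinter import Browser

with Browser('chrome') as browser:  # the context manager quits the browser on exit
    browser.visit('https://www.baidu.com/')
    print(browser.title)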
