Python crawler tutorial -26- Selenium + PhantomJS

Source: Internet
Author: User

  • Dynamic front-end pages:
      • JavaScript:
        JavaScript is an interpreted scripting language: dynamically typed, weakly typed, prototype-based, and with built-in types. Its interpreter, known as the JavaScript engine, is part of the browser. JavaScript is widely used as a client-side scripting language and was first used in HTML (an application of the Standard Generalized Markup Language) to add dynamic behavior to HTML pages
      • jQuery:
        jQuery is a fast, concise JavaScript library, a well-regarded JavaScript code library (or framework) that followed Prototype. Its design motto is "write less, do more". It wraps common JavaScript functionality, provides a simple JavaScript design pattern, and streamlines HTML document manipulation, event handling, animation, and Ajax interaction
      • Ajax:
        Ajax ("Asynchronous JavaScript and XML") is a web development technique for building interactive web applications.
        Ajax = Asynchronous JavaScript and XML (XML being a subset of the Standard Generalized Markup Language).
        Ajax is a technique for creating fast, dynamic web pages.
        Ajax updates parts of a web page without reloading the entire page, by exchanging data with the server in the background
      • DHTML:
        DHTML is short for Dynamic HTML. It is a way of building web pages that contrasts with traditional static HTML. DHTML is not a new language; it is an integration of HTML, CSS, and client-side scripting, so a page consists of HTML + CSS + JavaScript (or another client-side script), with the CSS and scripts written directly in the page rather than linked from separate files. DHTML is not a technology, standard, or specification in its own right, but a combination of existing web technologies and language standards: a web design concept in which page elements can still change in real time after the page has been downloaded
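Why dynamic pages matter for a crawler can be seen in a small sketch. The page below is hypothetical (the `price` element and its value are made up for illustration): the value is written into the page by a script, so a parser that only reads the downloaded markup finds no visible text at all.

```python
from html.parser import HTMLParser

# Hypothetical page: the price is not in the markup, a script fills it in later.
PAGE = """<html><body>
<div id="price"></div>
<script>document.getElementById("price").textContent = "42";</script>
</body></html>"""


class VisibleText(HTMLParser):
    """Collect the text a static parser can see, skipping <script> bodies."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Keep only non-empty text that is not JavaScript source.
        if not self.in_script and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# The static markup contains no visible text; "42" exists only inside the script.
print(repr(visible_text(PAGE)))
```

A browser would run the script and show 42; a plain HTTP download followed by HTML parsing would not.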
Collecting dynamic data with Python
    • Approach 1: start from the JavaScript code and collect the data it carries
    • Approach 2: use a third-party Python library to run the JavaScript and capture the page exactly as you see it in your browser
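The first approach, starting from the JavaScript code, might look like the sketch below. It assumes a hypothetical page whose script assigns a JSON array to a variable named `items`; real pages differ, and the regular expression would need to be adapted to the actual script.

```python
import json
import re

# Hypothetical page source: the data is embedded in a script as a JSON literal.
PAGE = """<html><body><ul id="list"></ul>
<script>
var items = [{"name": "apple", "price": 3}, {"name": "pear", "price": 5}];
render(items);
</script></body></html>"""


def extract_items(html):
    """Pull out the JSON array assigned to the (assumed) variable `items`."""
    match = re.search(r"var items\s*=\s*(\[.*?\]);", html, re.DOTALL)
    if match is None:
        return []
    return json.loads(match.group(1))


for item in extract_items(PAGE):
    print(item["name"], item["price"])
```

This avoids running a browser at all, but it only works when the data is present in the script source in a parseable form. When the data arrives later through Ajax, the second approach is usually the simpler one.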
Selenium + PhantomJS
    • Selenium: a web automated testing tool
    • Selenium official documentation: https://www.seleniumhq.org/docs/
    • Features of Selenium:
    • 1. Load pages automatically
    • 2. Get data
    • 3. Take screenshots
    • PhantomJS: a WebKit-based browser with no interface (headless)
      • Selenium drives PhantomJS
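The three Selenium features listed above can be sketched as one small helper. The function name `capture_page` and the returned dictionary are illustrative choices, not part of Selenium; the helper only relies on the WebDriver members `get()`, `title`, `page_source`, and `save_screenshot()`.

```python
def capture_page(driver, url, screenshot_path):
    """Load `url`, read the rendered page, and save a screenshot.

    `driver` is expected to be a Selenium WebDriver instance.
    """
    # 1. Load the page; the browser runs its JavaScript.
    driver.get(url)

    # 2. Get data from the rendered page.
    title = driver.title
    html = driver.page_source

    # 3. Take a screenshot; save_screenshot() returns True on success.
    saved = driver.save_screenshot(screenshot_path)

    return {"title": title, "html": html, "screenshot_saved": saved}
```

With Selenium 3.x and PhantomJS installed, a call such as capture_page(webdriver.PhantomJS(), "http://www.baidu.com", "baidu.png") would exercise all three features. The webdriver.PhantomJS class belongs to Selenium 3.x; it was removed in Selenium 4.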
Installation of Selenium
    • If you are using Anaconda:
      • Activate the environment you want to use (mine is named learn; if you only have the base environment, skip this step):

        activate learn

      • Install Selenium:

        conda install selenium

    • Selenium can also be installed directly from PyCharm:
      • "PyCharm" > "File" > "Settings" > "Project Interpreter" > "+" > "Selenium" > "Install"
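Whichever installation route is used, a quick check from Python can confirm that the package is visible to the interpreter that will run the crawler. This is a minimal sketch; `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be imported in the current environment."""
    return importlib.util.find_spec(package_name) is not None


print("selenium installed:", is_installed("selenium"))
```

If this prints False inside PyCharm, the project interpreter is probably not the environment in which Selenium was installed.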

Installation of PhantomJS
    • Download: http://phantomjs.org/download.html
    • Download the build for your operating system; it is ready to use once unzipped
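Once PhantomJS is unzipped, Python needs to know where the executable is. The sketch below shows one way to decide: use an explicitly given path if the file exists, otherwise search the directories in the PATH environment variable. `resolve_phantomjs` is a hypothetical helper, not part of Selenium.

```python
import os
import shutil


def resolve_phantomjs(explicit_path=None):
    """Return a usable path to the PhantomJS executable, or None if not found."""
    if explicit_path is not None:
        # A path was given: accept it only if the file is really there.
        return explicit_path if os.path.isfile(explicit_path) else None
    # No path given: search the directories listed in PATH.
    return shutil.which("phantomjs")


print(resolve_phantomjs())
```

A result of None with no argument means the PhantomJS bin directory is not on PATH, so the full path to the executable has to be passed when the driver is created.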
Use of Selenium
    • The Selenium library provides a WebDriver API
    • WebDriver can interact with the elements on a page, so it can be used for crawling
    • Note: webdriver.PhantomJS() finds the PhantomJS executable through the environment variables (PATH); if you have not configured them, pass the path to the executable as a parameter
    • Case code (file 28dhtml.py): https://xpwi.github.io/py/py%E7%88%AC%E8%99%AB/py28dhtml.py
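# Using Selenium
# Drive Baidu through WebDriver
from selenium import webdriver
import time

# Keys simulates keyboard input:
# pass in what should be typed, so nothing has to be entered by hand
from selenium.webdriver.common.keys import Keys

# Create an instance for whichever browser you want to drive; here it is PhantomJS.
# The executable is found through the environment variables;
# if they are not configured, pass its path as a parameter
driver = webdriver.PhantomJS(executable_path=r"D:\app\phantomjs-2.1.1-windows\bin\phantomjs.exe")
driver.get("http://www.baidu.com")

# Read the page's title tag
print("Title: {0}".format(driver.title))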
Run results

Note: If you have not configured the environment variable, pass the path to your own PhantomJS executable as a parameter.

The red text in the output is not an error; if the title is printed, Selenium and PhantomJS are working.

- This note may not be reprinted by any person or organization
