Recently, at my company's request, I had to crawl some related sites. I could not find the address of the data requests made by the page's JS, so I used Selenium to crawl the information instead. The write-up of that project is: Python crawler in action (1)--------China Crop Germplasm Information Network
I. Introduction to Selenium
What is Selenium? In a word, an automated testing tool. It supports a variety of browsers, including Chrome, Safari, Firefox and other mainstream GUI browsers; if you install a Selenium plug-in in these browsers, you can easily test web interfaces. In other words, Selenium drives these browsers through their browser drivers. Selenium can be used from multiple languages, such as Java, C# and Ruby. Does it support Python? Of course! To install it, just run pip install selenium in cmd.
II. Why do crawlers use Selenium?
Ordinary websites can be crawled with Scrapy, requests, BeautifulSoup and similar tools, but some information only appears after the page's JS has been executed. With Selenium, basically anything you can see with the naked eye can be crawled. I ran into this while studying, so I am writing it down for easy reference later.
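To see why a plain HTTP client is not enough, here is a minimal sketch that uses only the standard library. The page source is made up for illustration: it stands for what requests would download, where a script fills in the price only after the page has loaded in a browser. RAW_HTML, DivTextCollector and static_text are names I chose for this example, not part of any library.

```python
from html.parser import HTMLParser

# Hypothetical page source as a plain HTTP client would receive it: the
# <div id="price"> is empty because a script fills it in after page load.
RAW_HTML = """
<html><body>
  <div id="price"></div>
  <script>document.getElementById("price").textContent = "499";</script>
</body></html>
"""


class DivTextCollector(HTMLParser):
    """Collect the text found inside <div id=target_id>."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("id") == self.target_id:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text += data


def static_text(html, target_id):
    """Return the text of the target div as seen without running any JS."""
    parser = DivTextCollector(target_id)
    parser.feed(html)
    parser.close()
    return parser.text.strip()


# The script never ran, so the div is still empty.
print(repr(static_text(RAW_HTML, "price")))  # prints ''
```

A real browser driven by Selenium would run the script first, so the same div would contain the price when read from the rendered page.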
webdriver is a module in selenium:

    from selenium import webdriver
    driver = webdriver.PhantomJS()
    driver.get('url')
Here PhantomJS can be replaced by Chrome, Firefox, Ie and so on. PhantomJS, however, is a headless browser: no browser window pops up when it runs, and it runs relatively efficiently. While debugging you can switch to Chrome, which is easier to debug, and switch back to PhantomJS at the end.
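The switch between Chrome for debugging and PhantomJS for the final run can be sketched as a small factory. browser_name and make_driver are hypothetical helper names of my own. The sketch assumes a Selenium release that still ships webdriver.PhantomJS (it was removed in Selenium 4) and that the matching browser driver can be found on the PATH.

```python
def browser_name(debug):
    """Pick the webdriver class name: a visible Chrome window while
    debugging, headless PhantomJS otherwise."""
    return "Chrome" if debug else "PhantomJS"


def make_driver(debug=False):
    """Create the browser driver chosen by browser_name()."""
    # Imported here so this module still loads where Selenium is missing.
    from selenium import webdriver
    return getattr(webdriver, browser_name(debug))()


# Typical use (starts a real browser, so it is left commented out):
# driver = make_driver(debug=True)
# driver.get("http://hotel.qunar.com/")
# driver.quit()
```

Keeping the choice in one place means the rest of the crawler does not change when you move from debugging to the headless run.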
III. Introduction to PhantomJS
PhantomJS is a headless, WebKit-based browser that is scripted through a JavaScript API. It uses QtWebKit as its core browser engine and uses WebKit to compile, interpret and execute JavaScript code. Anything you can do in a WebKit-based browser, it can do too. It is not only an invisible browser: it supports CSS selectors, web standards, DOM manipulation, JSON, HTML5, Canvas, SVG and so on, and it also handles file I/O, so you can read and write files on the operating system. PhantomJS has a wide range of uses, such as automated front-end interface testing (in combination with Jasmine), network monitoring and web page screenshots.
PhantomJS official site: http://phantomjs.org/
PhantomJS official API: http://phantomjs.org/api/
PhantomJS official examples: http://phantomjs.org/examples/
PhantomJS GitHub: https://github.com/ariya/phantomjs/
IV. Installing PhantomJS
I am on Windows 7: move the downloaded phantomjs.exe into the Scripts folder of your Python installation and it is ready to use. (Download link for phantomjs-2.1.1-windows.zip: http://pan.baidu.com/s/1c8HeBc password: 2zm4)
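Putting the executable in the Scripts folder works when that folder is on the PATH, which I assume is the case here. A quick way to check, sketched with the standard library under Python 3 (find_phantomjs is a hypothetical helper name):

```python
import shutil


def find_phantomjs(name="phantomjs"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


location = find_phantomjs()
print(location if location else "phantomjs is not on PATH yet")
```

If the check prints a path, webdriver.PhantomJS() should be able to start the browser without an explicit executable path.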
A quick test:

    from selenium import webdriver
    driver = webdriver.PhantomJS()
    driver.get("http://hotel.qunar.com/")
    data = driver.title
    print(data)

    # Output
    # "Qunar Hotels" hotel reservation, hotel search - Qunar.com
V. Hands-on practice
    # -*- coding: utf-8 -*-
    from selenium import webdriver
    import time
    import win32api
    import re
    import win32con

    browser = webdriver.PhantomJS()
    # PhantomJS's screenshot is scrolled to the bottom of the page, while Chrome's is not

    browser.get("http://flight.qunar.com/")  # open the website
    a = browser.get_screenshot_as_file("e:/python27/test2.jpg")  # take a screenshot
    browser.find_element_by_id("searchTypeRnd").click()  # click "round trip"
    # clear the input box first, because it holds a default place
    browser.find_element_by_xpath('//*[@id="dfsForm"]/div[2]/div[1]/div/input').clear()
    browser.find_element_by_xpath('//*[@id="dfsForm"]/div[2]/div[1]/div/input').send_keys(u"Beijing")  # enter the departure city

    # For win32api, see the relevant manual. The following are keyboard operations.
    time.sleep(0.5)
    # press a key: win32api.keybd_event(key code, 0, 0, 0)
    win32api.keybd_event(108, 0, 0, 0)  # press the Enter key
    # release a key: win32api.keybd_event(key code, 0, win32con.KEYEVENTF_KEYUP, 0)
    win32api.keybd_event(108, 0, win32con.KEYEVENTF_KEYUP, 0)  # release the key

    browser.find_element_by_xpath('//*[@id="dfsForm"]/div[2]/div[2]/div/input').clear()
    browser.find_element_by_xpath('//*[@id="dfsForm"]/div[2]/div[2]/div/input').send_keys(u"Shanghai")  # enter the destination
    time.sleep(0.5)
    win32api.keybd_event(108, 0, 0, 0)  # press the Enter key
    win32api.keybd_event(108, 0, win32con.KEYEVENTF_KEYUP, 0)  # release the key

    browser.find_element_by_xpath('//*[@id="fromDate"]').clear()
    browser.find_element_by_xpath('//*[@id="fromDate"]').send_keys("2017-04-19")  # enter the departure date
    # browser.find_element_by_xpath('//*[@id="fromDate"]').click()
    browser.find_element_by_xpath('//*[@id="toDate"]').clear()
    browser.find_element_by_xpath('//*[@id="toDate"]').send_keys("2017-04-22")  # enter the return date
    # browser.find_element_by_xpath('//*[@id="toDate"]').click()

    # Method 2: set the places and dates
    # browser.find_element_by_name("name").send_keys("Beijing (BJS)")  # set the value
    # browser.find_element_by_name("pass").send_keys("Shanghai (SHA)")  # set the value
    # browser.find_element_by_id("txtAirplaneTime1").send_keys("2016-12-19")  # set the value

    browser.find_element_by_xpath('//*[@id="dfsForm"]/div[4]/button').click()  # click the button to submit the form
    browser.maximize_window()  # maximize the window

    # Save the current page
    print(browser.current_url)  # current URL
    # browser.get("http://www.ly.com/FlightQuery.aspx")  # cookies are kept in the object, so pages that require authentication can be accessed directly
    data = browser.page_source.encode("utf-8", "ignore")
    fh = open("e:/python27/qun.html", "wb")
    fh.write(data)
    fh.close()
    data2 = browser.page_source
    # print(data2)
    a = browser.get_screenshot_as_file("e:/python27/test.jpg")
    # print(browser.page_source)

    # From here on you can grab whatever you need.

    browser.quit()
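win32api.keybd_event sends the keystroke to whichever window has keyboard focus, and it only works on Windows. A possible alternative, sketched below, is to send the Enter key through Selenium itself. fill_field is a hypothetical helper of my own, not the method used in the script above; it relies only on driver.find_element(by, value) and on the element methods clear() and send_keys().

```python
# Same value as selenium.webdriver.common.keys.Keys.ENTER, written out so
# that this sketch does not need Selenium to be importable.
ENTER = "\ue007"


def fill_field(driver, xpath, text, press_enter=True):
    """Clear an input located by XPath, type text into it and optionally
    confirm with Enter, all through the browser driver."""
    field = driver.find_element("xpath", xpath)
    field.clear()
    field.send_keys(text)
    if press_enter:
        field.send_keys(ENTER)
    return field


# Example with the departure-date box (needs a live browser session):
# fill_field(browser, '//*[@id="fromDate"]', "2017-04-19", press_enter=False)
```

Because the key press goes to the element rather than to the desktop, this approach does not depend on which window is in the foreground.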
VI. Recommended learning material: the Chongshi (Insect Master) blog, which works through Selenium + WebDriver + Python
Relationship:
Selenium: an automated testing tool. Selenium 1.0 includes Selenium RC, Selenium IDE, Selenium Grid and Selenium Core.
WebDriver: Google's automated testing framework (or a set of API specifications).
WebDriver and Selenium each have their advantages, and the two teams felt that merging would make them even stronger, so:
Selenium 2.0 = Selenium RC + WebDriver
Selenium can be used from multiple languages: C#, Java, Python, Ruby, ...
Environment setup:
Selenium + Python automated test environment setup
Selenium WebDriver (Python) series catalog
Easy Automation---selenium-webdriver (python) (i)
Start our first script:
- Familiarity with selenium python code styles
- time.sleep() adds a pause
- print outputs information
Easy Automation---selenium-webdriver (python) (ii)
- Print URL
- Maximize your browser
- Set browser fixed width, height
- Navigate the browser forward and backward
Easy Automation---selenium-webdriver (python) (iii)
Simple object locating:
- id
- name
- class name
- link text
- partial link text
- tag name
- xpath
- css selector
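The eight locating strategies above can be sketched as one small helper. The strings are the values of the constants on selenium.webdriver.common.by.By (for example By.ID is "id"), and driver.find_element(by, value) accepts them directly. STRATEGIES and locate are names I made up for this example, and the locator values in the comments are hypothetical.

```python
# The eight strategies from the list above, as the strings Selenium uses.
STRATEGIES = (
    "id",
    "name",
    "class name",
    "link text",
    "partial link text",
    "tag name",
    "xpath",
    "css selector",
)


def locate(driver, strategy, value):
    """Find a single element, rejecting misspelled strategy names early."""
    if strategy not in STRATEGIES:
        raise ValueError("unknown locator strategy: %r" % (strategy,))
    return driver.find_element(strategy, value)


# Hypothetical examples (need a live browser session):
# locate(driver, "id", "kw")
# locate(driver, "link text", "News")
# locate(driver, "css selector", "form button")
```

Checking the strategy name first gives a clearer error than the one the browser driver would return for a typo.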
Easy Automation---selenium-webdriver (python) (iv)
- Positioning a group of elements
Easy Automation---selenium-webdriver (python) (v)
Easy Automation---selenium-webdriver (python) (vi)
Operating on objects:
- click: click the object
- send_keys: simulate key input on the object
- clear: clear the object's content, if possible
Other common WebElement methods:
- text: get the element's text
- submit: submit the form
- get_attribute: get an attribute value
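A minimal sketch of how these element methods fit together, assuming the element is a text input inside a form. describe and type_and_submit are hypothetical helper names; the element methods they call (clear, send_keys, submit, get_attribute) and the text property are the ones listed above.

```python
def describe(element):
    """Read the visible text and the current value attribute of an element."""
    return {"text": element.text, "value": element.get_attribute("value")}


def type_and_submit(element, text):
    """Clear the input, type into it, then submit the form it belongs to."""
    element.clear()
    element.send_keys(text)
    # Read the state before submitting: the page may reload afterwards and
    # the old element reference would then no longer be usable.
    summary = describe(element)
    element.submit()
    return summary
```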
Easy Automation---selenium-webdriver (python) (vii)
Multi-layered frame or window positioning:
- switch_to_frame()
- switch_to_window()
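A small sketch of frame switching, written with driver.switch_to.frame() and driver.switch_to.default_content(), the property-based spelling of switch_to_frame() in later Selenium releases. inside_frame is a hypothetical helper name, and the frame name in the comment is made up.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into a frame for the duration of a with-block, then return
    to the top-level document even if the block raises."""
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        driver.switch_to.default_content()


# Hypothetical use (needs a live browser session):
# with inside_frame(driver, "menu_frame"):
#     driver.find_element("id", "logout").click()
```

Returning to the default content in a finally clause avoids a common mistake: later lookups silently failing because the driver is still focused on the frame.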
Smart wait:
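Selenium has its own waiting tools (implicitly_wait and WebDriverWait). The idea behind them can be sketched in a few lines of plain Python 3: poll a condition until it returns something truthy or a timeout expires. wait_until is a hypothetical helper written for illustration, not Selenium's implementation.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value, and
    return that value. Raise TimeoutError after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Hypothetical use (needs a live browser session): wait for search results.
# rows = wait_until(lambda: driver.find_elements("css selector", ".result"))
```

Compared with a fixed time.sleep(), this returns as soon as the page is ready and fails loudly when it never becomes ready.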
Easy Automation---selenium-webdriver (python) (viii)
Calling JS:
- execute_script(script, *args)
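Two short sketches of execute_script(script, *args). Extra arguments are available inside the script as arguments[0], arguments[1] and so on, and a script that uses return passes its value back to Python. page_title and highlight are hypothetical helper names.

```python
def page_title(driver):
    """Ask the browser for the document title through JavaScript."""
    return driver.execute_script("return document.title;")


def highlight(driver, element, border="2px solid red"):
    """Draw a border around an element, which is handy while debugging.

    The element and the border string are passed as script arguments."""
    driver.execute_script(
        "arguments[0].style.border = arguments[1];", element, border
    )
```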
Easy Automation---selenium-webdriver (python) (ix)
Easy Automation---selenium-webdriver (python) (x)
- Handling drop-down boxes
- switch_to_alert()
- accept()
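Handling an alert can be sketched as follows, using driver.switch_to.alert, the property-based spelling of switch_to_alert() in later Selenium releases. accept_alert is a hypothetical helper name.

```python
def accept_alert(driver):
    """Switch to the open alert, remember its message, and click OK."""
    alert = driver.switch_to.alert
    message = alert.text
    alert.accept()
    return message
```

Reading the text before calling accept() matters, because the alert no longer exists once it has been accepted.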
Easy Automation---selenium-webdriver (python) (xi)
- Control scroll bar to bottom
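Scrolling to the bottom is usually done with a line of JS. The sketch below repeats the scroll until the page height stops growing, which helps on pages that load more content as you scroll. scroll_to_bottom, max_rounds and pause are names and defaults I chose for illustration.

```python
import time


def scroll_to_bottom(driver, max_rounds=10, pause=0.5):
    """Scroll down until document.body.scrollHeight stops changing.

    Returns the final page height in pixels."""
    last = driver.execute_script("return document.body.scrollHeight;")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded content time to arrive
        new = driver.execute_script("return document.body.scrollHeight;")
        if new == last:
            break
        last = new
    return last
```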
Easy Automation---selenium-webdriver (python) (xii)
- Keyboard key usage
- Keyboard key combination usage
- Errors when entering Chinese text with send_keys()
Selenium-webdriver (Python) (13)--Cookie processing
- driver.get_cookies(): get cookie information
- add_cookie(cookie_dict): add a cookie to the session
- delete_cookie(name): delete a specific cookie
- delete_all_cookies(): delete all cookies
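One common use of these calls is to save the cookies of a logged-in session and load them again later. A minimal sketch, assuming the cookies are plain JSON-serializable dictionaries as returned by get_cookies(); save_cookies and load_cookies are hypothetical helper names, and the file name in the comments is made up.

```python
import json


def save_cookies(driver, path):
    """Write the current session's cookies to a JSON file."""
    with open(path, "w") as fh:
        json.dump(driver.get_cookies(), fh)


def load_cookies(driver, path):
    """Add cookies from a JSON file to the session; return how many."""
    with open(path) as fh:
        cookies = json.load(fh)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)


# Hypothetical use (needs a live browser session). The browser must already
# be on a page of the cookie's domain before add_cookie() is called:
# save_cookies(driver, "cookies.json")
# driver.get("http://flight.qunar.com/")
# load_cookies(driver, "cookies.json")
```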
Selenium-webdriver (Python) (14)--Webdriver principle
Selenium-webdriver (Python) (15)--Mouse events
- context_click(): right-click
- double_click(): double-click
- drag_and_drop(): drag and drop
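These mouse events are methods of Selenium's ActionChains class: each call queues an action and perform() executes the queue. The sketch below wraps the three events from the list. mouse_action is a hypothetical helper, and its chains_cls parameter exists only so the helper can be tried out without a browser.

```python
MOUSE_EVENTS = ("context_click", "double_click", "drag_and_drop")


def mouse_action(driver, kind, *elements, chains_cls=None):
    """Queue one mouse event on an action chain and perform it.

    context_click and double_click take one element; drag_and_drop takes
    a source element and a target element."""
    if kind not in MOUSE_EVENTS:
        raise ValueError("unsupported mouse event: %r" % (kind,))
    if chains_cls is None:
        # Imported lazily so the helper can be defined without Selenium.
        from selenium.webdriver.common.action_chains import ActionChains
        chains_cls = ActionChains
    chain = chains_cls(driver)
    getattr(chain, kind)(*elements).perform()


# Hypothetical use (needs a live browser session):
# mouse_action(driver, "context_click", some_element)
# mouse_action(driver, "drag_and_drop", source_element, target_element)
```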
Selenium-webdriver (Python) (16)--unittest framework
- Analysis of the unittest test framework
Jinxiao
Source: http://www.cnblogs.com/jinxiao-pu
This article is copyrighted jointly by the author and cnblogs (Blog Park). Reprinting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be placed in a prominent position on the page.
Reference links: http://www.cnblogs.com/zzhzhao/p/5380376.html
http://www.cnblogs.com/BigFishFly/p/6380024.html
Python crawler notes (i)--------using Selenium + Python + PhantomJS