Selenium does not include a browser of its own; it has to be paired with a third-party browser. For example, you can run Selenium on Firefox.
PhantomJS is a "headless" browser. It loads a site into memory and executes the JavaScript on the page, but it does not display the page's graphical interface to the user. By combining Selenium and PhantomJS, you can run a very powerful web crawler that can handle cookies, JavaScript, headers, and anything else you need.
You can download the Selenium library from the PyPI website (https://pypi.python.org/simple/selenium) or install it with pip.
PhantomJS can be downloaded from its official website (http://phantomjs.org/download.html). PhantomJS is not a Python library and cannot be installed with pip.
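Before running a scraper, it can help to confirm that both pieces are in place. The following is a minimal sketch using only the Python standard library. It assumes the PhantomJS executable is named phantomjs and can be found on the system PATH; check_environment is a hypothetical helper name, not part of Selenium or PhantomJS.

```python
import importlib.util
import shutil


def check_environment(binary_name="phantomjs"):
    """Report whether the selenium package and a PhantomJS binary can be found.

    binary_name is an assumption: the executable name to look up on PATH.
    """
    return {
        # find_spec returns None when the package cannot be imported.
        "selenium_installed": importlib.util.find_spec("selenium") is not None,
        # shutil.which returns the full path to the executable, or None.
        "phantomjs_path": shutil.which(binary_name),
    }


status = check_environment()
print(status)
```

If phantomjs_path comes back as None, the executable is not on PATH and its location has to be passed to the driver explicitly.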
from selenium import webdriver
import time

driver = webdriver.PhantomJS(executable_path='')
driver.get("http://pythonscraping.com/pages/javascript/ajaxDemo.html")
time.sleep(3)
print(driver.find_element_by_id('content').text)
driver.close()
The executable_path argument is the path to the PhantomJS executable (phantomjs.exe on Windows), for example: executable_path = '/download/phantomjs-2.1.1-windows/bin/phantomjs'
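The fixed time.sleep(3) in the example above assumes the Ajax content arrives within three seconds. A common alternative is to poll until a condition holds, up to a timeout. The following is a minimal sketch of that idea in plain Python; wait_until and its parameters are hypothetical names, not part of Selenium (Selenium ships its own WebDriverWait class in selenium.webdriver.support.ui for the same purpose). The demonstration uses a simulated condition so that it runs without a browser.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have
    passed. Any exception raised by condition() (for example, an element
    that does not exist yet) is treated as "not ready" and retried.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception:
            pass  # not ready yet; try again after a short pause
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll_interval)


# Demonstration without a browser: the condition becomes true on the third call.
attempts = {"count": 0}


def ready_after_three_calls():
    attempts["count"] += 1
    return "loaded" if attempts["count"] >= 3 else None


result = wait_until(ready_after_three_calls, timeout=2.0, poll_interval=0.01)
print(result, attempts["count"])
```

In a scraper, the condition would be any zero-argument callable that inspects the driver and returns a truthy value once the page has reached the state you are waiting for.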
Selenium's selector methods have very straightforward names. The element in the example above could also be selected with either of the following:
driver.find_element_by_css_selector("#content")
driver.find_element_by_tag_name("div")
In addition, if you still want to use BeautifulSoup to parse the page content, you can use the WebDriver's page_source attribute, which returns the page's source code as a string.
from bs4 import BeautifulSoup

pageSource = driver.page_source
bsObj = BeautifulSoup(pageSource, "html.parser")
print(bsObj.find(id="content").get_text())
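Because page_source is just a string, the parsing step can be tried on its own before a browser is involved. The following is a minimal sketch, assuming the beautifulsoup4 package is installed; text_by_id and the sample HTML are made up for illustration, and in a real run the string would come from driver.page_source.

```python
from bs4 import BeautifulSoup  # third-party package: beautifulsoup4


def text_by_id(page_source, element_id):
    """Return the text of the element with the given id, or None if absent."""
    soup = BeautifulSoup(page_source, "html.parser")
    element = soup.find(id=element_id)
    return element.get_text() if element is not None else None


# Illustrative stand-in for driver.page_source (not taken from a real page).
sample_source = '<html><body><div id="content">Hello from the page</div></body></html>'

print(text_by_id(sample_source, "content"))
print(text_by_id(sample_source, "missing"))
```

Returning None for a missing id avoids the AttributeError that calling get_text() on a failed find() would otherwise raise.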
Executing JavaScript with Selenium in Python