First experience with Selenium + PhantomJS

Source: Internet
Author: User

Selenium + PhantomJS can be used to crawl web content that is loaded with AJAX.

1. pip install selenium
2. Download PhantomJS (it does not need to be installed with pip)

The homepage of Wuhan University of Science and Technology (www.wust.edu.cn) has a block of web content that is loaded asynchronously with JS.

The approach to grabbing this content is to first determine whether the block has finished loading, and then crawl it with Selenium.

To judge whether loading has completed, check whether a link containing 'School-Enterprise cooperation' has appeared.

(PS: Strictly speaking, it is more reasonable to wait for the last-loaded element inside the asynchronous content, but in this example that element has no distinguishing features to select it by.)

    # coding: utf-8
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Path to the PhantomJS executable downloaded earlier
    driver = webdriver.PhantomJS(executable_path='C://python27//scripts//phantomjs-2.1.1-windows//bin//phantomjs')
    driver.get("http://www.wust.edu.cn/default.html")

    try:
        # Wait up to 10 seconds for the asynchronously loaded link to appear.
        # The locator must be passed as a single (By, value) tuple.
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.PARTIAL_LINK_TEXT, 'School-Enterprise cooperation')))
    finally:
        ul = driver.find_element_by_id('infocont_137575764138965434_148645613741998292')
        status = 'False:'
        if ul is not None:
            lis = ul.find_elements_by_tag_name('li')
            if not lis:
                print('Query failed')
            for li in lis:
                text = li.find_element_by_tag_name('a').text
                # Skip li tags that are returned but not displayed on the page
                if text != '':
                    status = 'True:'
                    print(status + text)
    # quit() closes the browser and also ends the PhantomJS process
    driver.quit()

The program performs the following steps:

1. Wait until a link containing the string 'School-Enterprise cooperation' is present;
2. Find the ul tag with the ID infocont_137575764138965434_148645613741998292;
3. Find the li tags inside that ul tag;
4. Find the a tag in each li tag and extract the text of the a tag.
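The ul, li, a traversal in these steps can be tried offline, without a browser. The sketch below is only an illustration: it uses the standard library's xml.etree.ElementTree on a made-up, well-formed HTML fragment instead of Selenium, and the names SAMPLE, infocont_demo, and extract_link_texts are hypothetical. It also skips links whose text is empty, as the program does.

```python
import xml.etree.ElementTree as ET

# Made-up fragment standing in for the asynchronously loaded block (assumption).
SAMPLE = """
<div>
  <ul id="infocont_demo">
    <li><a href="/news/1">First headline</a></li>
    <li><a href="/news/2"></a></li>
    <li><a href="/news/3">Second headline</a></li>
  </ul>
</div>
"""


def extract_link_texts(markup, ul_id):
    """Return the non-empty link texts found under ul#ul_id > li > a."""
    root = ET.fromstring(markup.strip())
    # Step 1: find the ul tag by its id attribute.
    ul = root.find(".//ul[@id='%s']" % ul_id)
    if ul is None:
        return []
    texts = []
    # Step 2: walk the li tags that are direct children of the ul.
    for li in ul.findall("li"):
        # Step 3: take the a tag inside each li and read its text.
        a = li.find("a")
        text = (a.text or "").strip() if a is not None else ""
        # Skip entries that exist in the markup but carry no visible text.
        if text:
            texts.append(text)
    return texts


print(extract_link_texts(SAMPLE, "infocont_demo"))
# ['First headline', 'Second headline']
```

The second li is present in the markup but produces no output, which mirrors the reason the program checks for empty text.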

It is worth noting that:

On Windows, the encoding must be declared on the first line of the script (# coding: utf-8);

Use WebDriverWait to judge whether the page has loaded; it works better than a fixed time.sleep;

Asynchronous loading may return more li tags than are displayed. They are visible in the browser's Inspect Element view but not on the page, so the code needs to check that text != '';

Tags could not be found directly across levels of the hierarchy, so the code descends one level at a time: ul, then li, then a.
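As a rough illustration of why an explicit wait tends to work better than a fixed time.sleep, the sketch below polls a condition and returns as soon as it holds, giving up only after a timeout. It is plain Python, not Selenium: wait_until and block_loaded are hypothetical names made up for this example, and the 0.2 second delay merely simulates content that arrives late. WebDriverWait behaves similarly in spirit, re-checking its condition at a poll interval until the timeout expires.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Simulate a block of content that becomes available after 0.2 seconds.
loaded_at = time.monotonic() + 0.2


def block_loaded():
    """Return the link text once the simulated content has 'loaded'."""
    if time.monotonic() >= loaded_at:
        return "School-Enterprise cooperation"
    return None


# Returns shortly after 0.2 s instead of always sleeping the full 2 s.
print(wait_until(block_loaded, timeout=2.0, poll_interval=0.05))
```

A fixed sleep either wastes time when the content arrives early or fails when it arrives late; polling with a timeout handles both cases.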

Result:

[Screenshot of the program output]
