selenium+PhantomJS mini example: Python code to crawl all movies on Douban

Source: Internet
Uploader: User


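#coding=utf-8
# Crawl every movie on Douban's "more movies" listing with Selenium + PhantomJS.
# Requires Selenium 3.x: PhantomJS support and the find_element_by_* helpers
# were removed in Selenium 4.
import time

from selenium import webdriver


def crawMovie():
    driver = webdriver.PhantomJS()
    try:
        driver.get("https://movie.douban.com/")
        movie_list = []
        # Open the full listing through the first "more" link on the home page.
        more_btn = driver.find_element_by_xpath('(//a[@class="more-link"])[1]')
        more_btn.click()

        while True:
            # Select only the items that appeared after the ones already collected.
            start_index = len(movie_list)
            xpath_str = '//a[@class="item"][position()>%d]' % start_index
            item_tags = driver.find_elements_by_xpath(xpath_str)
            print("start_index:", start_index)
            print("number:", len(item_tags))
            for item_tag in item_tags:
                # Cover URL and title come from the poster <img>; the rating from <p><strong>.
                img_tag = item_tag.find_element_by_tag_name("img")
                cover = img_tag.get_attribute("src")
                title = img_tag.get_attribute("alt")
                rating = item_tag.find_element_by_xpath(".//p/strong").text

                movie = {"cover": cover, "title": title, "rating": rating}
                movie_list.append(movie)
            print("--" * 20)

            # Once every movie is shown, the page hides the "load more" button
            # by giving it a style attribute, so stop there.
            load_more_btn = driver.find_element_by_xpath('//a[@class="more"]')
            if load_more_btn.get_attribute("style"):
                break
            load_more_btn.click()
            # The next batch loads asynchronously; wait before querying again.
            # (A fixed delay is a simplification added here, not in the original.)
            time.sleep(2)
    finally:
        # Always shut down the PhantomJS process, even if a lookup fails.
        driver.quit()

    # One "{key:value,...}" line per movie; UTF-8 keeps Chinese titles intact.
    with open("e:\\movie_list.txt", "w", encoding="utf-8") as fp:
        for d in movie_list:
            fp.write("{" + ",".join(k + ":" + v for k, v in d.items()) + "}\n")


if __name__ == "__main__":
    crawMovie()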
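The core of crawMovie is an incremental loop: each pass asks only for the items beyond the ones already collected (the `position()>%d` XPath predicate), then clicks the "load more" button until the page hides it with a `style` attribute. The sketch below isolates that control flow from Selenium so it can run without a browser. `FakeMoviePage`, `items_after`, `more_button_hidden`, `click_more` and the batch size are hypothetical stand-ins for the real page, its XPath queries and its button click; they are not part of Selenium or of the original code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeMoviePage:
    """Hypothetical stand-in for the Douban listing page.

    `catalog` holds every movie the site has; `batch` is how many more
    items one click on "load more" reveals (an assumption for illustration).
    """
    catalog: List[Dict[str, str]]
    batch: int = 20
    shown: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        # The first batch is visible as soon as the page loads.
        self.shown = min(self.batch, len(self.catalog))

    def items_after(self, start_index: int) -> List[Dict[str, str]]:
        # Plays the role of '//a[@class="item"][position()>%d]' % start_index:
        # return only the visible items that have not been collected yet.
        return self.catalog[start_index:self.shown]

    def more_button_hidden(self) -> bool:
        # The real page sets a style attribute on the button once all items are shown.
        return self.shown >= len(self.catalog)

    def click_more(self) -> None:
        # Reveal the next batch, never past the end of the catalog.
        self.shown = min(self.shown + self.batch, len(self.catalog))


def crawl_incremental(page: FakeMoviePage) -> List[Dict[str, str]]:
    movie_list: List[Dict[str, str]] = []
    while True:
        # Start from len(movie_list) so no item is collected twice.
        movie_list.extend(page.items_after(len(movie_list)))
        if page.more_button_hidden():
            break
        page.click_more()
    return movie_list


if __name__ == "__main__":
    # Made-up sample data: 45 movies revealed 20 at a time.
    catalog = [{"cover": f"img{i}.jpg", "title": f"movie {i}", "rating": "7.0"}
               for i in range(45)]
    movies = crawl_incremental(FakeMoviePage(catalog, batch=20))
    print(len(movies))  # 45
```

Because every pass restarts from `len(movie_list)`, a pass that happens to return no new items only costs one extra iteration; it never duplicates entries.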

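The hand-built `{cover:...,title:...,rating:...}` lines that crawMovie writes are hard to read back reliably: every cover URL contains a colon after its scheme, and a title containing a comma would split into an extra field. One option, swapped in here and not what the original code does, is JSON Lines (one JSON object per line) via the standard `json` module. `save_movies` and `load_movies` are hypothetical helper names; `ensure_ascii=False` keeps Chinese titles human-readable in the file.

```python
import json
from typing import Dict, List


def save_movies(movie_list: List[Dict[str, str]], path: str) -> None:
    # One JSON object per line; ensure_ascii=False writes Chinese text as-is.
    with open(path, "w", encoding="utf-8") as fp:
        for movie in movie_list:
            fp.write(json.dumps(movie, ensure_ascii=False) + "\n")


def load_movies(path: str) -> List[Dict[str, str]]:
    # Parse each non-empty line back into a dict; commas and colons
    # inside values are safe because JSON quotes them.
    with open(path, encoding="utf-8") as fp:
        return [json.loads(line) for line in fp if line.strip()]
```

Calling `save_movies(movie_list, "e:\\movie_list.txt")` in place of the final `with open(...)` block would produce a file that `load_movies` can round-trip exactly.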
