The following code crawls LinkedIn for a scholar's work experience, for reference only. Note: do not run a large number of crawls or the account will be blocked — don't ask me how I know.
# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get('https://www.linkedin.com/')
# wait for the site to finish loading
time.sleep(1)

# simulated login
driver.find_element(By.ID, 'login-email').send_keys(username)
driver.find_element(By.ID, 'login-password').send_keys(password)
# press Enter to submit and jump to the home page
driver.find_element(By.ID, 'login-submit').send_keys(Keys.ENTER)
time.sleep(1)

# search for the scholar by name
driver.find_element(By.TAG_NAME, 'input').send_keys(scholar_name)
driver.find_element(By.TAG_NAME, 'input').send_keys(Keys.ENTER)
time.sleep(1)

# collect all candidate people on the current results page
soup = BeautifulSoup(driver.page_source, 'lxml')
results = soup.find_all('div', {'class': 'search-result__wrapper'})
n = 0
for result in results:
    n += 1
    # profile link of this candidate
    href = result.find('div', {'class': 'search-result__image-wrapper'}).find('a')['href']
    driver.get('https://www.linkedin.com' + href)
    time.sleep(3)
    soup = BeautifulSoup(driver.page_source, 'lxml')
    # print(soup)
    positions = soup.find_all('li', {'class': 'pv-profile-section__card-item pv-position-entity ember-view'})
    print(str(n) + ':')
    for position in positions:
        print(position.find('div', {'class': 'pv-entity__summary-info'}).get_text().replace('\n', ''))
driver.close()
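Since heavy crawling gets the account blocked (per the warning above), one simple mitigation is to randomize the delay between page loads instead of sleeping a fixed 3 seconds. A minimal sketch — the `polite_sleep` helper is hypothetical, not part of the original script:

import random
import time

def polite_sleep(base=3.0, jitter=2.0):
    """Sleep for base seconds plus a random jitter, so request timing looks less robotic."""
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay

# usage: call polite_sleep() between driver.get(...) requests
# (tiny values here only to demonstrate the delay range)
d = polite_sleep(base=0.01, jitter=0.01)

This only makes the timing less uniform; it does not make crawling allowed, and LinkedIn can still detect and block automated access.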
Simulated login + data crawling (Python + Selenium)