Douban is a better fit for the "honest people don't speak in riddles" principle, so Douban is the site we will scrape. There is not much to explain, so straight to the code:
# app is a project-local helper module that provides get_soup();
# it is not part of the Scrapy framework's API
from Scrapy import app
import re

header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
    'Host': 'movie.douban.com',
    'Accept-Language': 'zh-CN,zh;q=0.9'
}
movie_url = "https://movie.douban.com/subject/26985127/?from=showing"
m_id = re.search("[0-9]+", movie_url).group()

# Get the soup object
soup = app.get_soup(url=movie_url, headers=header, charset="utf-8")
content = soup.find(id="content")

# Grab the movie name and release year
m_name = content.find("h1").find("span").string
m_year = content.find(class_="year").string

# Grab the director
info = content.find(id="info")
m_directer = info.find(attrs={"rel": "v:directedBy"}).string

# Release date
m_date = info.find(attrs={"property": "v:initialReleaseDate"}).string

# Genres (at most two)
types = info.find_all(attrs={"property": "v:genre"}, limit=2)
m_types = []
for type_ in types:
    m_types.append(type_.string)

# Grab the starring actors, keeping only the first five
actors = info.find(class_="actor").find_all(attrs={"rel": "v:starring"}, limit=5)
m_actors = []
for actor in actors:
    m_actors.append(actor.string)

# Runtime
m_time = info.find(attrs={"property": "v:runtime"}).string

# m_adaptor = info.select()
print("ID", m_id, "name", m_name, "year", m_year, "director", m_directer, "starring", m_actors)
print("release date", m_date, "type", m_types, "runtime", m_time)
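The attribute lookups above (`rel="v:starring"`, `property="v:genre"`, and so on) can be tried offline. The following is a minimal sketch that uses only the standard library's `html.parser` instead of the soup object. The `SAMPLE_HTML` fragment, the `AttrTextCollector` class, and the `collect` function are hypothetical names made up for this illustration, and the fragment is hand-written rather than copied from Douban.

```python
from html.parser import HTMLParser

# Hand-written sample fragment (an assumption for illustration, not real Douban markup).
SAMPLE_HTML = """
<div id="info">
  <a rel="v:directedBy">Director A</a>
  <span class="actor">
    <a rel="v:starring">Actor 1</a>
    <a rel="v:starring">Actor 2</a>
    <a rel="v:starring">Actor 3</a>
  </span>
  <span property="v:genre">drama</span>
  <span property="v:genre">comedy</span>
  <span property="v:runtime">134 minutes</span>
</div>
"""


class AttrTextCollector(HTMLParser):
    """Collect the text of every element whose attribute `name` equals `value`.

    Only handles elements that contain plain text (no nested tags),
    which is enough for the flat fields used here.
    """

    def __init__(self, name, value):
        super().__init__()
        self.name = name
        self.value = value
        self.results = []
        self._capturing = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the value comparison is case-sensitive.
        if (self.name, self.value) in attrs:
            self._capturing = True
            self.results.append("")

    def handle_data(self, data):
        if self._capturing:
            self.results[-1] += data

    def handle_endtag(self, tag):
        # Stop capturing at the first closing tag after a match.
        self._capturing = False


def collect(html, name, value, limit=None):
    """Return the stripped text of matching elements, optionally capped at `limit`."""
    parser = AttrTextCollector(name, value)
    parser.feed(html)
    parser.close()
    texts = [text.strip() for text in parser.results]
    return texts if limit is None else texts[:limit]


print("starring", collect(SAMPLE_HTML, "rel", "v:starring", limit=2))
print("type", collect(SAMPLE_HTML, "property", "v:genre"))
```

Because attribute values are compared case-sensitively in this sketch, a lookup for `v:directedby` finds nothing while `v:directedBy` matches, so the exact casing of the attribute values is worth checking against the page source.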
Output:
ID 26985127 name A Play year (2018) director Huang Bo starring ['Huang Bo', 'Shu Qi', 'Wang Baoqiang', 'Zhang Yi', ...]
release date 2018-08-10 (Mainland China) type ['drama', 'comedy'] runtime 134 minutes
Simple and crude.
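One detail worth noting: the pattern `[0-9]+` returns the first run of digits anywhere in the URL, which works here only because the subject ID happens to be the first number. A slightly stricter variant, sketched below under the assumption that the ID always follows `/subject/` in the path, looks only at the path component. The function name `extract_subject_id` is hypothetical.

```python
import re
from urllib.parse import urlparse


def extract_subject_id(url):
    """Return the numeric ID that follows /subject/ in the URL path, or None if absent."""
    path = urlparse(url).path  # ignores the host name and the query string
    match = re.search(r"/subject/(\d+)", path)
    return match.group(1) if match else None


print(extract_subject_id("https://movie.douban.com/subject/26985127/?from=showing"))
# prints 26985127
```

Returning `None` instead of raising also avoids the `AttributeError` that `re.search(...).group()` produces when the URL contains no match.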
Python crawler from scratch: scraping Douban Movies