The desktop web version of Sina Weibo is harder to crawl, so this post crawls the mobile web pages instead.
The procedure is as follows:
1. Log in to Sina Weibo on the website
2. Open m.weibo.cn
3. Find the topic you are interested in and get the link to the corresponding data interface
4. Get the cookies and request headers
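For step 4, the cookies are often copied from the browser as one long "name=value; name=value" string, while requests expects a dict. A minimal sketch of that conversion follows; the helper name cookie_header_to_dict and the sample values are made up for illustration and are not part of the original script.

```python
def cookie_header_to_dict(cookie_header):
    """Turn a raw Cookie header such as 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for part in cookie_header.split(';'):
        part = part.strip()
        if not part or '=' not in part:
            continue  # skip empty or malformed fragments
        # Split on the first '=' only, since cookie values may contain '='.
        name, value = part.split('=', 1)
        cookies[name.strip()] = value.strip()
    return cookies


# Placeholder values (assumptions); replace them with what your own browser sent.
raw_cookie = 'name1=value1; name2=value2'
cookies = cookie_header_to_dict(raw_cookie)
headers = {'User-Agent': 'xxx'}
```

The resulting cookies and headers dicts can then be passed to requests.get(url, headers=headers, cookies=cookies).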
# -*- coding: utf-8 -*-
import csv
import os

import requests

# Comment interface of the mobile site; {page} is filled in for each request.
base_url = 'https://m.weibo.cn/api/comments/show?id=4131150395559419&page={page}'
# Placeholders: fill in the cookies and User-Agent from your own logged-in session.
cookies = {'Cookies': 'xxx'}
headers = {'User-Agent': 'xxx'}

# Append the results to weibo.csv in the current working directory.
path = os.getcwd() + "/weibo.csv"
csvfile = open(path, 'a+', encoding='utf-8', newline='')
writer = csv.writer(csvfile)
writer.writerow(('username', 'source', 'comment'))

for i in range(0, 83):
    try:
        url = base_url.format(page=i)
        resp = requests.get(url, headers=headers, cookies=cookies)
        jsondata = resp.json()

        # Each item in 'data' is one comment.
        data = jsondata.get('data')
        for d in data:
            created_at = d.get("created_at")
            source = d.get("source")
            username = d.get("user").get("screen_name")
            comment = d.get("text")
            print((username, source, comment))
            writer.writerow((username, source, comment))
    except Exception:
        # Mark the failed page in the output and move on to the next one.
        print('*' * 1000)

csvfile.close()
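The script relies on a broad try/except, so a page whose JSON has no 'data' list only shows up as a row of asterisks. As a possible refinement, the parsing step could be pulled into a small function that returns an empty list in that case. This is only a sketch: the function name extract_rows and the sample payload are hypothetical, and the field names are the ones the script already reads.

```python
def extract_rows(payload):
    """Return (username, source, comment) tuples from one decoded JSON page.

    Assumes the layout the script reads: a top-level 'data' list whose items
    carry 'source', 'text' and a nested 'user' dict with 'screen_name'.
    """
    rows = []
    data = payload.get('data') if isinstance(payload, dict) else None
    if not isinstance(data, list):
        return rows  # no comments on this page, or an error payload
    for d in data:
        user = d.get('user') or {}  # tolerate a missing 'user' entry
        rows.append((user.get('screen_name'), d.get('source'), d.get('text')))
    return rows


# Hypothetical payload, made up for illustration; not real Weibo output.
sample = {'data': [
    {'source': 'iPhone', 'text': 'hello', 'user': {'screen_name': 'alice'}},
    {'source': 'Android', 'text': 'hi'},
]}
for row in extract_rows(sample):
    print(row)
```

Inside the loop, writer.writerows(extract_rows(resp.json())) would then replace the inner for loop, and an empty result could be used as a signal to stop paging.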
The crawled data also contains non-Chinese text. To extract only the Chinese, please refer to: Filter out the Chinese from a piece of text
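One common way to keep only the Chinese characters is a regular expression over the CJK Unified Ideographs range. The sketch below is an assumption about what such a filter could look like, not necessarily the method of the referenced post; the name keep_chinese is made up here, and the range \u4e00-\u9fff covers only the basic block, so punctuation and rarer extension characters are dropped.

```python
import re

# Basic CJK Unified Ideographs block; an assumption about what counts as "Chinese".
CHINESE_RUN = re.compile(r'[\u4e00-\u9fff]+')


def keep_chinese(text):
    """Return only the Chinese characters of text, joined together."""
    return ''.join(CHINESE_RUN.findall(text))


print(keep_chinese('回复<a href="x">@user</a>: 好看 nice 123'))
```

Applied to the comment column before writer.writerow, this would strip HTML fragments, Latin text and digits from each comment.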
To be continued...
Crawling Sina Weibo comment data with Python and writing it to a CSV file