My graduation project was crawling QQ Zone data with Scrapy. Now that it is finished, here is a short summary.

First, the simulated-login problem:
Tencent puts many defenses in the way of simulated login, and my skills are limited, so I took the simplest route: log in manually in the browser, copy the cookie information, and send it along with every request.
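A minimal sketch of the "copy the cookie, then carry it" step, assuming the cookie is copied from the browser's developer tools as a single Cookie header string such as "uin=...; skey=...". The helper name cookie_header_to_dict is hypothetical and not part of Scrapy:

```python
def cookie_header_to_dict(raw: str) -> dict:
    """Hypothetical helper: turn a browser 'Cookie' header ("a=1; b=2") into a dict."""
    cookies = {}
    for pair in raw.split(";"):
        # partition on the first "=" so values that themselves contain "=" stay intact
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


if __name__ == "__main__":
    # Placeholder values; paste the real header copied after logging in manually.
    jar = cookie_header_to_dict("uin=o0123456; skey=@AbCdEfGh; p_uin=o0123456")
    # Assumption: this is the skey later passed to getOldGTK(self.skey).
    print(jar["skey"])
```

The resulting dict can then be attached to requests through Scrapy's cookies argument, for example scrapy.Request(url, cookies=jar).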
Next, the data interfaces:
Analyzing the QQ Zone web pages shows that most data is returned as JSON. I chose two interfaces for fetching data:
Detailed profile interface for each QQ number: "http://user.qzone.qq.com/p/base.s8/cgi-bin/user/cgi_userinfo_get_all?uin=" + str(self.currentQQ) + "&vuin=qq&fupdate=1&rd=0.007898919197098397&g_tk=" + GTK
Status-post (shuoshuo) list interface for each QQ number: "http://taotao.qq.com/cgi-bin/emotion_cgi_msglist_v6?uin=" + str(self.currentQQ) + "&ftype=0&sort=0&pos=0&num=40&replynum=100&g_tk=" + str(self.getOldGTK(self.skey)) + "&callback=_preloadCallback&code_version=1&format=json&need_private_comment=1"
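The two URLs above are easier to read and maintain when built from a parameter dict rather than by string concatenation. The sketch below rebuilds both with urllib.parse.urlencode, adds a g_tk token function, and adds a decoder for JSON that may come back wrapped in a callback call. It rests on several assumptions: getOldGTK is not shown in the post, so get_g_tk uses the hash that circulates widely in QZone crawler code; the post's GTK value for the profile interface is not shown either; reading pos and num as paging offset and page size is a guess; and every function name here is hypothetical.

```python
import json
import re
from urllib.parse import urlencode

USERINFO_API = "http://user.qzone.qq.com/p/base.s8/cgi-bin/user/cgi_userinfo_get_all"
MSGLIST_API = "http://taotao.qq.com/cgi-bin/emotion_cgi_msglist_v6"

# Matches a JSONP-style wrapper such as  _preloadCallback({...});
_JSONP_RE = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.S)


def get_g_tk(skey: str) -> int:
    """Hypothetical: widely circulated QZone g_tk hash, assumed to match getOldGTK."""
    h = 5381
    for ch in skey:
        h += (h << 5) + ord(ch)
    return h & 0x7FFFFFFF  # keep only the low 31 bits


def build_userinfo_url(qq: int, gtk: int, vuin: str = "qq",
                       rd: str = "0.007898919197098397") -> str:
    """Profile interface URL with the same parameters as in the post."""
    params = {"uin": qq, "vuin": vuin, "fupdate": 1, "rd": rd, "g_tk": gtk}
    return USERINFO_API + "?" + urlencode(params)


def build_msglist_url(qq: int, skey: str, pos: int = 0, num: int = 40) -> str:
    """Status-post interface URL; pos/num are assumed to be offset and page size."""
    params = {
        "uin": qq, "ftype": 0, "sort": 0, "pos": pos, "num": num,
        "replynum": 100, "g_tk": get_g_tk(skey),
        "callback": "_preloadCallback", "code_version": 1,
        "format": "json", "need_private_comment": 1,
    }
    return MSGLIST_API + "?" + urlencode(params)


def parse_qzone_json(text: str):
    """Decode a body that is either plain JSON or JSON wrapped in a callback call."""
    match = _JSONP_RE.match(text)
    return json.loads(match.group(1) if match else text)


if __name__ == "__main__":
    skey = "@AbCdEfGh"  # placeholder; use the skey cookie from the manual login
    # Assumption: the post's GTK is not shown; reusing get_g_tk here is only a guess.
    print(build_userinfo_url(12345, get_g_tk(skey)))
    for page in range(3):  # hypothetical paging: 40 posts per request
        print(build_msglist_url(12345, skey, pos=page * 40))
    print(parse_qzone_json('_preloadCallback({"code": 0});'))
```

Inside a Scrapy callback, the decoder would be applied to response.text.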
Finally, Scrapy handles the network requests, the data is stored in a database, and I computed some simple statistics from it:
[Figure: Crawling QQ Zone with Scrapy]
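The post does not say which database was used. As a hedged sketch, the Scrapy-style item pipeline below stores status posts in SQLite and computes one simple statistic, the number of posts per QQ number. The class name, table layout, item fields (qq, tid, content, created) and the file name qzone.db are all hypothetical:

```python
import sqlite3


class QzoneSQLitePipeline:
    """Hypothetical Scrapy item pipeline that stores status posts in SQLite."""

    def __init__(self, db_path: str = "qzone.db"):
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called by Scrapy when the spider starts.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS posts ("
            " qq TEXT NOT NULL,"
            " tid TEXT NOT NULL,"
            " content TEXT,"
            " created INTEGER,"
            " PRIMARY KEY (qq, tid))"
        )

    def process_item(self, item, spider):
        # INSERT OR IGNORE skips posts that are already stored.
        self.conn.execute(
            "INSERT OR IGNORE INTO posts (qq, tid, content, created) VALUES (?, ?, ?, ?)",
            (str(item["qq"]), str(item["tid"]), item.get("content"), item.get("created")),
        )
        return item

    def posts_per_user(self):
        """Simple statistic: stored posts per QQ number, most active first."""
        rows = self.conn.execute(
            "SELECT qq, COUNT(*) AS n FROM posts GROUP BY qq ORDER BY n DESC, qq"
        )
        return rows.fetchall()

    def close_spider(self, spider):
        # Called by Scrapy when the spider finishes.
        self.conn.commit()
        self.conn.close()
```

To enable it, a project would list it in ITEM_PIPELINES in settings.py, for example {"myproject.pipelines.QzoneSQLitePipeline": 300}, where myproject is a placeholder for the real project name.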