This post describes a simple way to scrape the content of Huawei's official Weibo account: first grab the cookie from a logged-in Weibo session, then use that cookie to authenticate the requests.
The specific code looks like this:
# -*- coding: utf-8 -*-
"""
Created on Sun Apr 14:16:32 2017
@author: Zch
"""
import requests
from bs4 import BeautifulSoup
import time
import pandas as pd

# Put in the cookie information copied from a logged-in session
cook = {"Cookie": "_t_wm= ..."}

# Crawl the content of the Huawei Device official Weibo
url = "https://weibo.cn/huaweidevice"
html = requests.get(url, cookies=cook).content

# Use BeautifulSoup to parse the page content
soup = BeautifulSoup(html, "html.parser")
r = soup.find_all('span', attrs={'class': 'ctt'})
for e in r:
    print(e.text)
Running the code prints the text of each post on the page.
Of course, the code above is very simple and can only crawl a small amount of content at a time. The next step is to solve the problems of crawling multiple pages continuously and storing the results in an organized way.
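The multi-page idea can be sketched as below. This is a minimal sketch, not the author's final solution: it assumes weibo.cn paginates a profile with a `?page=N` query parameter (verify against the actual site), and the helper names `parse_posts` and `crawl_pages` are my own.

```python
# -*- coding: utf-8 -*-
import time

import pandas as pd
import requests
from bs4 import BeautifulSoup


def parse_posts(html, page):
    """Extract the text of every <span class="ctt"> post on one page."""
    soup = BeautifulSoup(html, "html.parser")
    return [{"page": page, "text": span.text}
            for span in soup.find_all("span", attrs={"class": "ctt"})]


def crawl_pages(base_url, cookies, n_pages):
    """Fetch the first n_pages of a weibo.cn profile and collect posts.

    Assumes pagination via a "?page=N" query parameter.
    """
    posts = []
    for page in range(1, n_pages + 1):
        html = requests.get(base_url, params={"page": page},
                            cookies=cookies).content
        posts.extend(parse_posts(html, page))
        time.sleep(1)  # be polite: pause between requests
    return pd.DataFrame(posts)


# Usage (cook is the cookie dict from the script above):
# df = crawl_pages("https://weibo.cn/huaweidevice", cook, 5)
# df.to_csv("huaweidevice_posts.csv", index=False, encoding="utf-8")
```

Collecting the posts into a pandas DataFrame makes the "automatic storage" part easy: `to_csv` writes everything to one file, and the `page` column keeps track of where each post came from.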