2018/4/29, personal notes:
In the source code of the Bilibili (B站) home page, only the part circled in red in the figure below is exposed in the HTML, and it does not include pagination.
This is how that part appears in the source code.
Simply crawl this part and save the information to E:/bilibili/temp.txt:
# -*- coding: utf-8 -*-
import re
import requests
import os
import time
import datetime

messages = []  # links already written, used to skip duplicates

# Directory where the txt file is saved
filebags = os.path.exists('E:/bilibili')
if not filebags:
    os.makedirs('E:/bilibili')
os.chdir('E:/bilibili')  # set the working directory

url = 'https://www.bilibili.com'

while True:
    texts = requests.get(url)
    if texts.status_code == 200:
        contents = texts.text
        # Three capture groups: link, title, author
        pattern = re.compile('<div class="groom-module home-card"><a href="(.*?)" target="_blank" title="(.*?)"><p class="author">(.*?)</p><p class="play">', re.S)
        results = re.findall(pattern, contents)
        # Create the txt file if it does not exist yet
        if 'temp.txt' not in os.listdir(os.getcwd()):
            open('temp.txt', 'w', encoding='utf-8').close()
        for message in results:
            # print('address is: ' + url + message[0] + '\n title is: ' + message[1])
            if message[0] not in messages:  # check for duplicates
                print(datetime.datetime.now())
                messages.append(message[0])
                with open('temp.txt', 'a', encoding='utf-8') as f:
                    # message[2] is the last captured group (author);
                    # the pattern above has no fourth group, so message[3] would fail
                    f.write('\n address is: ' + url + message[0] + '\n title: ' + message[1] + '\n' + message[2])
    time.sleep(600)  # based on testing, a 10-minute delay works best
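The extraction and duplicate check can be tried without touching the network. The following is a minimal sketch, not part of the original script: the helper name `extract_new_cards` and the `SAMPLE` markup are hypothetical, and the sample is shaped only to fit the regular expression above rather than copied from the real page.

```python
import re

# Same pattern as the crawler: link, title, and author of each home-page card.
CARD_PATTERN = re.compile(
    r'<div class="groom-module home-card"><a href="(.*?)" target="_blank" '
    r'title="(.*?)"><p class="author">(.*?)</p><p class="play">',
    re.S,
)


def extract_new_cards(html, seen):
    """Return (href, title, author) tuples whose href is not yet in `seen`."""
    fresh = []
    for href, title, author in CARD_PATTERN.findall(html):
        if href not in seen:
            seen.add(href)  # remember the link so later polls skip it
            fresh.append((href, title, author))
    return fresh


# Hypothetical markup for illustration only; not taken from the real page.
SAMPLE = (
    '<div class="groom-module home-card"><a href="/video/av1" target="_blank" '
    'title="First"><p class="author">up1</p><p class="play">10</p></a></div>'
    '<div class="groom-module home-card"><a href="/video/av2" target="_blank" '
    'title="Second"><p class="author">up2</p><p class="play">20</p></a></div>'
)

seen = set()
first = extract_new_cards(SAMPLE, seen)   # both cards are new
second = extract_new_cards(SAMPLE, seen)  # nothing new on the second pass
print(first)
print(second)
```

Using a set for the links already seen keeps the duplicate check cheap even if the script runs for a long time; the list in the script above works the same way but scans every element on each lookup.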
I used datetime to record when the data refreshes:
The result is a refresh every 10 minutes.
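One way to read the interval off such a test is to take the timestamps printed when new items first appear and look at the gaps between them. This is a rough sketch under that assumption: the function name `refresh_intervals` is hypothetical, and the timestamps below are made up for illustration, not measured values.

```python
import datetime


def refresh_intervals(first_seen_times):
    """Gaps between consecutive moments at which new items first appeared."""
    ordered = sorted(first_seen_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Made-up timestamps for illustration; not real measurements.
observed = [
    datetime.datetime(2000, 1, 1, 12, 0, 5),
    datetime.datetime(2000, 1, 1, 12, 10, 5),
    datetime.datetime(2000, 1, 1, 12, 20, 6),
]

gaps = refresh_intervals(observed)
for gap in gaps:
    print(gap)
```

The gaps can only be as precise as the polling period, so a test like this needs to poll noticeably more often than the refresh it is trying to measure.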