Requirement:
Crawl the latest news from the following page, grouped by day.
http://blog.eastmoney.com/13102551638/bloglist_0_1.html
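Both implementations below rely on one regular expression to pull the link, title, and timestamp out of each list entry. The following is a minimal offline sketch of that step. The HTML fragment, URLs, and titles are made up for illustration and are only modelled on the pattern used in the scripts; the real page markup may differ, and `extract_entries` is a hypothetical helper name.

```python
import re

# Hypothetical sample markup, modelled on the pattern used in the scripts;
# the real page may be structured differently.
SAMPLE_HTML = (
    '<li><a href="http://example.com/a.html" target="_blank">First post</a>'
    '<span class="time">(2015-06-02 09:30)</span></li>'
    '<li><a href="http://example.com/b.html" target="_blank">Second post</a>'
    '<span class="time">(2015-06-01 18:05)</span></li>'
)

# Three non-greedy groups: link, title, timestamp
PATTERN = re.compile(
    r'<li><a href="(.+?)" target="_blank">(.+?)</a>'
    r'<span class="time">(.+?)</span></li>'
)


def extract_entries(html):
    """Return a list of (url, title, timestamp) tuples found in html."""
    return PATTERN.findall(html)


entries = extract_entries(SAMPLE_HTML)
for url, title, stamp in entries:
    print(stamp, title, url)
```

Because the pattern has three groups, `re.findall` returns one tuple per entry, which is why the scripts index each item as `item[1]` for the title and `item[2]` for the time.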
Implementation Method 1: Using recursion
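import re
import time
import urllib.request

# Read the page content; decoding as UTF-8 is an assumption, adjust it if the page uses another encoding
content = urllib.request.urlopen('http://blog.eastmoney.com/13102551638/bloglist_0_1.html').read().decode('utf-8', errors='ignore')
# print(content)
# Extract the link, title, and time of each entry
pre = re.compile('<li><a href="(.+?)" target="_blank">(.+?)</a><span class="time">(.+?)</span></li>')
new = re.findall(pre, content)
# print(new)

class News:
    # Current date as an integer, e.g. 20150601
    t = int(time.strftime("%Y%m%d", time.localtime()))

    def __init__(self, ct):
        self.ct = ct

    def search(self):
        # Nothing to search if no entries were matched
        if not self.ct:
            return None
        # Loop over the list of entries
        for item in self.ct:
            # Date of this entry as an integer, taken from its time string
            date = int(item[2][1:5] + item[2][6:8] + item[2][9:11])
            # If the entry was published on or after the day being checked
            if date >= News.t:
                # Return the title of this entry
                title = item[1]
                return title
        # Otherwise step back one day and call search recursively.
        # Python's default recursion limit (about 1000) caps how far back this can go.
        News.t -= 1
        return self.search()

aaa = News(new)
cc = aaa.search()
print(cc)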
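Method 1 counts days down as a plain integer, so it also steps through values such as 20150600 that are not real dates. As a possible variation, the sketch below keeps the recursive idea but steps back by calendar days with `datetime`, and it returns every title from the matching day rather than only the first one. The names `parse_day` and `latest_titles`, the `max_days` cap, and the sample data are all hypothetical; the timestamp layout `(YYYY-MM-DD HH:MM)` is an assumption inferred from the slicing used in the script.

```python
from datetime import date, datetime, timedelta


def parse_day(stamp):
    """Turn a timestamp such as '(2015-06-01 18:05)' into a date object."""
    return datetime.strptime(stamp[1:11], "%Y-%m-%d").date()


def latest_titles(items, day, max_days=365):
    """Return the titles published on `day`; if there are none,
    recurse on the previous calendar day, at most `max_days` times."""
    if not items or max_days < 0:
        return []
    titles = [title for _url, title, stamp in items if parse_day(stamp) == day]
    if titles:
        return titles
    return latest_titles(items, day - timedelta(days=1), max_days - 1)


# Made-up entries in the (url, title, timestamp) shape produced by re.findall
sample = [
    ("u1", "Newest post", "(2015-06-01 18:05)"),
    ("u2", "Same-day post", "(2015-06-01 09:00)"),
    ("u3", "Older post", "(2015-05-30 12:00)"),
]

result = latest_titles(sample, date(2015, 6, 3))
print(result)
```

The `max_days` cap keeps the recursion depth well below Python's default limit even when the newest entry is old; in a live script the starting day would typically be `date.today()`.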
Implementation Method 2: Using a while loop
import re
import urllib.request

# Read the page content; decoding as UTF-8 is an assumption, adjust it if the page uses another encoding
content = urllib.request.urlopen('http://blog.eastmoney.com/13102551638/bloglist_0_1.html').read().decode('utf-8', errors='ignore')
# print(content)
# Extract the link, title, and time of each entry
pre = re.compile('<li><a href="(.+?)" target="_blank">(.+?)</a><span class="time">(.+?)</span></li>')
new = re.findall(pre, content)
# print(new)

class Good:
    def __init__(self, ct):
        self.ct = ct

    def search(self):
        cc = self.ct
        # Nothing to print if no entries were matched
        if not cc:
            return
        i = 0
        # Compare the date of each entry with the date of the next one, and so on.
        # While they are the same, print the current title and continue the loop.
        while i + 1 < len(cc) and cc[i][2][0:11] == cc[i + 1][2][0:11]:
            print(cc[i][1])
            i += 1
        # Once they differ (or the list ends), print the title of the last entry compared
        else:
            print(cc[i][1])

aaa = Good(new)
aaa.search()
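The while loop in Method 2 stops at the first change of date, so it only covers the most recent day. To split the whole list by day, one option is `itertools.groupby`, sketched below. It assumes the entries are already ordered by time, as a listing page usually is, because `groupby` only merges consecutive items with the same key. The helper name `group_by_day` and the sample data are hypothetical.

```python
from itertools import groupby


def group_by_day(items):
    """Group consecutive (url, title, timestamp) tuples by the date part
    of the timestamp, e.g. '2015-06-01' from '(2015-06-01 18:05)'."""
    grouped = []
    for day, group in groupby(items, key=lambda item: item[2][1:11]):
        grouped.append((day, [title for _url, title, _stamp in group]))
    return grouped


# Made-up entries in the (url, title, timestamp) shape produced by re.findall
sample = [
    ("u1", "Newest post", "(2015-06-01 18:05)"),
    ("u2", "Same-day post", "(2015-06-01 09:00)"),
    ("u3", "Older post", "(2015-05-30 12:00)"),
]

days = group_by_day(sample)
for day, titles in days:
    print(day, titles)
```

The first group then corresponds to what Method 2 prints, and the remaining groups give the earlier days.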
"Python practice": extracting the latest news from a web page