PYTHON: News Aggregation

I had been sitting on this project for a while: I could not get it to run, and I did not understand NNTP particularly well. What follows is an analysis reprinted from code123.

Original address: http://www.code123.cc/1327.html

The fourth project in the book is a news aggregator. It is built on a kind of application I had never used before: Usenet. The program's main job is to collect information from specified sources (here, a Usenet newsgroup) and save it to specified destinations (two forms are used here: plain text and an HTML file). Its usefulness is somewhat similar to today's blog subscription tools or RSS readers.

First the code, then a piece-by-piece analysis:

from nntplib import NNTP
from time import strftime, time, localtime
from email import message_from_string
from urllib import urlopen
import textwrap
import re

day = 24 * 60 * 60  # Number of seconds in one day

def wrap(string, max=70):
    """
    Wraps a string to a maximum of max characters.
    """
    return '\n'.join(textwrap.wrap(string)) + '\n'


class NewsAgent:
    """
    An object that can distribute news items from news
    sources to news destinations.
    """

    def __init__(self):
        self.sources = []
        self.destinations = []

    def addSource(self, source):
        self.sources.append(source)

    def addDestination(self, dest):
        self.destinations.append(dest)

    def distribute(self):
        """
        Retrieve all news items from all sources, and
        distribute them to all destinations.
        """
        items = []
        for source in self.sources:
            items.extend(source.getItems())
        for dest in self.destinations:
            dest.receiveItems(items)


class NewsItem:
    """
    A simple news item consisting of a title and a body text.
    """
    def __init__(self, title, body):
        self.title = title
        self.body = body


class NNTPSource:
    """
    A news source that retrieves news items from an NNTP group.
    """
    def __init__(self, servername, group, window):
        self.servername = servername
        self.group = group
        self.window = window

    def getItems(self):
        start = localtime(time() - self.window * day)
        date = strftime('%y%m%d', start)
        hour = strftime('%H%M%S', start)

        server = NNTP(self.servername)
        ids = server.newnews(self.group, date, hour)[1]

        for id in ids:
            lines = server.article(id)[3]
            message = message_from_string('\n'.join(lines))

            title = message['subject']
            body = message.get_payload()
            if message.is_multipart():
                body = body[0]

            yield NewsItem(title, body)

        server.quit()


class SimpleWebSource:
    """
    A news source that extracts news items from a Web page
    using regular expressions.
    """
    def __init__(self, url, titlePattern, bodyPattern):
        self.url = url
        self.titlePattern = re.compile(titlePattern)
        self.bodyPattern = re.compile(bodyPattern)

    def getItems(self):
        text = urlopen(self.url).read()
        titles = self.titlePattern.findall(text)
        bodies = self.bodyPattern.findall(text)
        for title, body in zip(titles, bodies):
            yield NewsItem(title, wrap(body))


class PlainDestination:
    """
    A news destination that formats all its news items as
    plain text.
    """
    def receiveItems(self, items):
        for item in items:
            print item.title
            print '-' * len(item.title)
            print item.body


class HTMLDestination:
    """
    A news destination that formats all its news items as HTML.
    """
    def __init__(self, filename):
        self.filename = filename

    def receiveItems(self, items):
        out = open(self.filename, 'w')
        print >> out, """
        <html>
          <head>
            <title>Today's News</title>
          </head>
          <body>
          <h1>Today's News</h1>
        """

        print >> out, '<ul>'
        id = 0
        for item in items:
            id += 1
            print >> out, '<li><a href="#%i">%s</a></li>' % (id, item.title)
        print >> out, '</ul>'

        id = 0
        for item in items:
            id += 1
            print >> out, '<h2><a name="%i">%s</a></h2>' % (id, item.title)
            print >> out, '<pre>%s</pre>' % item.body

        print >> out, """
          </body>
        </html>
        """


def runDefaultSetup():
    """
    A default setup of sources and destinations. Modify to taste.
    """
    agent = NewsAgent()

    # A SimpleWebSource that retrieves news from the BBC news site:
    bbc_url = 'http://news.bbc.co.uk/text_only.stm'
    bbc_title = r'(?s)a href="[^"]*">\s*<b>\s*(.*?)\s*</b>'
    bbc_body = r'(?s)</a>\s*<br/>\s*(.*?)\s*<'
    bbc = SimpleWebSource(bbc_url, bbc_title, bbc_body)
    agent.addSource(bbc)

    # An NNTPSource that retrieves news from comp.lang.python.announce:
    clpa_server = 'news2.neva.ru'
    clpa_group = 'comp.lang.python.announce'
    clpa_window = 1
    clpa = NNTPSource(clpa_server, clpa_group, clpa_window)
    agent.addSource(clpa)

    # Add a plain-text destination and an HTML destination:
    agent.addDestination(PlainDestination())
    agent.addDestination(HTMLDestination('news.html'))

    # Distribute the news items:
    agent.distribute()


if __name__ == '__main__':
    runDefaultSetup()
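Note that the code targets Python 2 (print statements, urllib.urlopen, the old nntplib return values). A sketch of what changes under Python 3, assuming a version earlier than 3.13 (nntplib was deprecated in 3.11 and removed from the standard library in 3.13):

# Python 3 adjustments (a sketch, not a full port):
from nntplib import NNTP              # deprecated in 3.11, removed in 3.13
from urllib.request import urlopen    # urllib.urlopen moved here

# print is now a function, so for example:
#   print >> out, '<ul>'   becomes   print('<ul>', file=out)
#   print item.title       becomes   print(item.title)
#
# Also, in Python 3 NNTP.article() returns (response, info), where
# info.lines is a list of bytes objects, so the [3] indexing and the
# '\n'.join(lines) call in NNTPSource.getItems() need reworking, and
# urlopen(...).read() returns bytes that must be decoded before the
# regular expressions in SimpleWebSource can match it.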

Analyzing the program as a whole, the focus is NewsAgent: it stores the news sources and the destinations, then calls the source classes (NNTPSource and SimpleWebSource) and the classes that write the news out (PlainDestination and HTMLDestination) in turn. From this you can see that NNTPSource is specifically used to fetch items from an NNTP news server, while SimpleWebSource pulls data from a URL. The roles of PlainDestination and HTMLDestination are obvious: the former prints the collected items to the terminal, the latter writes them into an HTML file. Note that NewsAgent never depends on these concrete classes; the sketch below shows the small protocol it actually relies on.
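Because NewsAgent never checks types, anything with a getItems() method can serve as a source, and anything with a receiveItems() method can serve as a destination. A minimal sketch of that protocol, with made-up class names (StaticSource and CountingDestination are mine, not the book's), in the same Python 2 style as the code above:

# A hard-coded source and a trivial destination, to show the protocol
# NewsAgent relies on:

class StaticSource:
    def getItems(self):
        # Any iterable of NewsItem objects will do; distribute()
        # calls items.extend(source.getItems()).
        yield NewsItem('Test title', 'A hard-coded test body.')

class CountingDestination:
    def receiveItems(self, items):
        # distribute() passes the full list of collected items.
        print 'Received %d items' % len(items)

agent = NewsAgent()
agent.addSource(StaticSource())
agent.addDestination(CountingDestination())
agent.distribute()    # prints: Received 1 items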

With that analysis in hand, the main program is easy to read: it simply registers the information sources and the output destinations with a NewsAgent and calls distribute(). A hypothetical variation is sketched after this paragraph.
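To reuse the same machinery with different inputs and outputs, you only have to swap what gets registered. A hypothetical variation on runDefaultSetup(), where the URL and the two regular expressions are placeholders rather than a working scrape:

# A made-up setup, analogous to runDefaultSetup():
agent = NewsAgent()
agent.addSource(SimpleWebSource('http://example.com/news.html',
                                r'<h2>(.*?)</h2>',    # title pattern
                                r'<p>(.*?)</p>'))     # body pattern
agent.addDestination(PlainDestination())
agent.addDestination(HTMLDestination('mynews.html'))
agent.distribute()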

It really is a simple program, but it is nicely layered.
