Python Basic Tutorial Summary 15--4 News Aggregation

Source: Internet
Author: User

NNTP: Network News Transfer Protocol

Goals:

Collect news from a variety of different sources;

Users can easily add new news sources (even new types of news sources);

The program can send the compiled news report to various destinations, in various formats;

New destinations (even new kinds of destinations) can be added easily.

1. Simple News Agent Program

1) NNTP class: instantiated with the name of an NNTP server;

newnews method: returns the articles posted after a given date and time;

head method: provides various information about an article (mainly the subject);

body method: provides the body text of the article

2) time.localtime([secs]): converts secs (seconds since the epoch) to a time.struct_time object

time.strftime(format[, t]): format is the format string; the optional parameter t is a struct_time object. Returns the time as a readable string
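
To make the two time functions concrete, here is a minimal sketch (written for Python 3, standard library only) that builds the date and hour strings in the yymmdd and hhmmss form that the newnews call in the listings is given. The helper name nntp_timestamp is hypothetical and does not appear in the original program.

```python
from time import localtime, strftime, time

DAY = 24 * 60 * 60  # number of seconds in one day


def nntp_timestamp(seconds):
    """Return (date, hour) strings in yymmdd / hhmmss form for a Unix timestamp."""
    moment = localtime(seconds)           # seconds since the epoch -> time.struct_time
    date = strftime('%y%m%d', moment)     # two-digit year, month, day
    hour = strftime('%H%M%S', moment)     # 24-hour clock hour, minute, second
    return date, hour


date, hour = nntp_timestamp(time() - DAY)  # the same moment yesterday
print(date, hour)
```

Note that the format codes are case-sensitive: %H, %M and %S give hour, minute and second, while lowercase %m is the month.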

    # newsagent1.py (Python 2)
    from nntplib import NNTP
    from time import strftime, time, localtime

    day = 24 * 60 * 60                      # number of seconds in one day
    yesterday = localtime(time() - day)
    date = strftime('%y%m%d', yesterday)
    hour = strftime('%H%M%S', yesterday)

    servername = 'news.foo.bar'             # fictitious server name
    group = 'comp.lang.python.announce'
    server = NNTP(servername)

    # newnews returns a tuple; its second element is the list of article IDs
    ids = server.newnews(group, date, hour)[1]

    for id in ids:
        # the fourth element of the returned tuple is a list of strings (the data itself)
        head = server.head(id)[3]
        for line in head:
            if line.lower().startswith('subject:'):
                subject = line[9:]
                break

        body = server.body(id)[3]

        print subject
        print '-' * len(subject)
        print '\n'.join(body)

    server.quit()
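
The subject lookup in the loop can be tried without a news server. The following is a small Python 3 sketch of that step only; the function name find_subject and the sample header lines are made up for illustration.

```python
def find_subject(header_lines):
    """Return the text after 'Subject:' from a list of header lines, or None."""
    for line in header_lines:
        # Compare in lowercase so 'Subject:', 'SUBJECT:' and 'subject:' all match.
        if line.lower().startswith('subject:'):
            return line[len('subject:'):].strip()
    return None


# Made-up header lines standing in for what a server might return.
sample_head = [
    'From: someone@example.com',
    'Subject: An example announcement',
    'Date: Mon, 01 Jan 2024 00:00:00 +0000',
]

subject = find_subject(sample_head)
print(subject)
print('-' * len(subject))
```

Returning None when no subject line is present avoids reusing the subject of a previous article, which the simple loop would do if an article had no Subject header.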

2. Improvements

    from nntplib import NNTP
    from time import strftime, time, localtime
    from email import message_from_string
    from urllib import urlopen
    import textwrap
    import re

    day = 24 * 60 * 60                      # number of seconds in one day

    def wrap(string, max=70):
        """Wrap a string to a maximum line width."""
        return '\n'.join(textwrap.wrap(string, max)) + '\n'

    class NewsAgent:
        """An object that gets news items from news sources and publishes them to news destinations."""
        def __init__(self):
            self.sources = []
            self.destinations = []

        def addSource(self, source):
            self.sources.append(source)

        def addDestination(self, dest):
            self.destinations.append(dest)

        def distribute(self):
            """Get all news items from all sources and publish them to all destinations."""
            items = []
            for source in self.sources:
                items.extend(source.getItems())
            for dest in self.destinations:
                dest.receiveItems(items)

    class NewsItem:
        """A simple news item consisting of a title and a body text."""
        def __init__(self, title, body):
            self.title = title
            self.body = body

    class NNTPSource:
        """A news source that gets news items from an NNTP group."""
        def __init__(self, servername, group, window):
            self.servername = servername
            self.group = group
            self.window = window

        def getItems(self):
            start = localtime(time() - self.window * day)
            date = strftime('%y%m%d', start)
            hour = strftime('%H%M%S', start)

            server = NNTP(self.servername)
            ids = server.newnews(self.group, date, hour)[1]

            for id in ids:
                lines = server.article(id)[3]
                message = message_from_string('\n'.join(lines))

                title = message['subject']
                body = message.get_payload()
                if message.is_multipart():
                    body = body[0]

                yield NewsItem(title, body)

            server.quit()

    class SimpleWebSource:
        """A news source that uses regular expressions to extract news items from a Web page."""
        def __init__(self, url, titlePattern, bodyPattern):
            self.url = url
            self.titlePattern = re.compile(titlePattern)
            self.bodyPattern = re.compile(bodyPattern)

        def getItems(self):
            text = urlopen(self.url).read()
            titles = self.titlePattern.findall(text)
            bodies = self.bodyPattern.findall(text)
            for title, body in zip(titles, bodies):
                yield NewsItem(title, wrap(body))

    class PlainDestination:
        """A news destination that formats all news items as plain text."""
        def receiveItems(self, items):
            for item in items:
                print item.title
                print '-' * len(item.title)
                print item.body

    class HTMLDestination:
        """A news destination that formats all news items as HTML."""
        def __init__(self, filename):
            self.filename = filename

        def receiveItems(self, items):
            out = open(self.filename, 'w')
            print >> out, """
            <html>
              <body>
            """

            # Table of contents: one link per item
            print >> out, '<ul>'
            id = 0
            for item in items:
                id += 1
                print >> out, '<li><a href="#%i">%s</a></li>' % (id, item.title)
            print >> out, '</ul>'

            # The items themselves, each with an anchor the links point to
            id = 0
            for item in items:
                id += 1
                print >> out, '<h2><a name="%i">%s</a></h2>' % (id, item.title)
                print >> out, '<pre>%s</pre>' % item.body

            print >> out, """
              </body>
            </html>
            """
            out.close()

    def runDefaultSetup():
        """A default setup of sources and destinations; modify as needed."""
        agent = NewsAgent()

        # A SimpleWebSource that gets news from the BBC news site
        bbc_url = 'http://news.bbc.co.uk/text_only.stm'
        bbc_title = r'(?s)a href="[^"]*">\s*<b>\s*(.*?)\s*</b>'
        bbc_body = r'(?s)</a>\s*<br/>\s*(.*?)\s*<'
        bbc = SimpleWebSource(bbc_url, bbc_title, bbc_body)
        agent.addSource(bbc)

        # An NNTPSource (the clpa_ names refer to comp.lang.python.announce;
        # the server and group below are the ones used in this run)
        clpa_server = 'news2.neva.ru'
        clpa_group = 'alt.sex.telephone'
        clpa_window = 1
        clpa = NNTPSource(clpa_server, clpa_group, clpa_window)
        agent.addSource(clpa)

        # Add a plain text destination and an HTML destination
        agent.addDestination(PlainDestination())
        agent.addDestination(HTMLDestination('news.html'))

        # Publish the news items
        agent.distribute()

    if __name__ == '__main__':
        runDefaultSetup()
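
The message-parsing step in the NNTP source can be tried on its own. Below is a minimal Python 3 sketch using email.message_from_string on made-up article lines. As an assumption that differs slightly from the listing, it takes the payload of the first part of a multipart message rather than the part object itself.

```python
from email import message_from_string

# Made-up article lines: headers, a blank line, then the body.
raw_lines = [
    'From: someone@example.com',
    'Subject: Example item',
    '',
    'First line of the body.',
    'Second line of the body.',
]

message = message_from_string('\n'.join(raw_lines))
title = message['subject']          # header lookup is case-insensitive
body = message.get_payload()        # a str for a single-part message

if message.is_multipart():
    # For a multipart message get_payload() returns a list of Message objects;
    # take the first part and ask for its own payload.
    body = body[0].get_payload()

print(title)
print(body)
```

A deeply nested multipart message could still yield a list here, so a more careful version would probably walk the parts and pick the first text one.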

Looking at the program as a whole, the focus is NewsAgent. It stores the news sources and the destinations, then calls the source classes (NNTPSource and SimpleWebSource) and the classes that write out the news (PlainDestination and HTMLDestination) in turn. NNTPSource gets articles from a news server, and SimpleWebSource gets data from a URL. Of the two destinations, PlainDestination prints the retrieved items to the terminal, and HTMLDestination writes them to an HTML file.
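
How the Web source pairs titles with bodies may be easier to see on a small in-memory page. The Python 3 sketch below applies the two BBC patterns from the listing to invented markup; the page text is an assumption and is not what the real site returns.

```python
import re
import textwrap

# Invented markup, loosely shaped like a text-only news page.
page = '''
<a href="/story1"><b>First headline</b></a><br/>
Body of the first story.<p>
<a href="/story2"><b>Second headline</b></a><br/>
Body of the second story.<p>
'''

# (?s) lets '.' match newlines, so a title or body may span several lines.
title_pattern = re.compile(r'(?s)a href="[^"]*">\s*<b>\s*(.*?)\s*</b>')
body_pattern = re.compile(r'(?s)</a>\s*<br/>\s*(.*?)\s*<')

titles = title_pattern.findall(page)
bodies = body_pattern.findall(page)

# zip pairs the n-th title with the n-th body, as the Web source does.
items = [(title, textwrap.fill(body, 70)) for title, body in zip(titles, bodies)]
for title, body in items:
    print(title)
    print(body)
```

Because the two lists are matched purely by position, a page where one pattern finds an extra match would pair titles with the wrong bodies, which is one reason this kind of scraping is fragile.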

With that analysis in hand, the main program is easy to read: it adds the news sources and the output destinations to a NewsAgent and then calls distribute().
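
The same wiring works for any object that provides the expected method, which is what makes new kinds of sources and destinations easy to add. The following Python 3 sketch is a cut-down agent with one hypothetical source (ListSource) and one hypothetical destination (TitleCollector). Neither class is in the original program, and the snake_case method names are a choice made for this sketch.

```python
class NewsItem:
    def __init__(self, title, body):
        self.title = title
        self.body = body


class NewsAgent:
    def __init__(self):
        self.sources = []
        self.destinations = []

    def add_source(self, source):
        self.sources.append(source)

    def add_destination(self, dest):
        self.destinations.append(dest)

    def distribute(self):
        # Collect items from every source, then hand the full list to every destination.
        items = []
        for source in self.sources:
            items.extend(source.get_items())
        for dest in self.destinations:
            dest.receive_items(items)


class ListSource:
    """Hypothetical source that serves items from an in-memory list of pairs."""
    def __init__(self, pairs):
        self.pairs = pairs

    def get_items(self):
        for title, body in self.pairs:
            yield NewsItem(title, body)


class TitleCollector:
    """Hypothetical destination that only records the titles it receives."""
    def __init__(self):
        self.titles = []

    def receive_items(self, items):
        self.titles.extend(item.title for item in items)


agent = NewsAgent()
agent.add_source(ListSource([('First', 'Body one'), ('Second', 'Body two')]))
collector = TitleCollector()
agent.add_destination(collector)
agent.distribute()
print(collector.titles)
```

The agent never checks the type of a source or destination; it only calls get_items and receive_items, so no base class is needed.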
