Introductions to RSS
- RSS overview: https://wikipedia.org/wiki/RSS
- The XML syntax of RSS: http://www.w3school.com.cn/rss/rss_syntax.asp
Feedparser
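- Installing feedparser: pip install feedparser
- A simplified rss.xml (a cnblogs Atom feed; the entries in the middle are elided as <entry>...</entry>):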
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title type="text">博客园_默写年华</title>
  <subtitle type="text"></subtitle>
  <id>uuid:70a1ed00-25f2-44e5-b74e-7e9f1e384f1c;id=5134</id>
  <updated>2018-09-29T09:06:43Z</updated>
  <author>
    <name>默写年华</name>
    <uri>http://www.cnblogs.com/qiulinzhang/</uri>
  </author>
  <generator>feed.cnblogs.com</generator>
  <entry>
    <id>http://www.cnblogs.com/qiulinzhang/p/9724748.html</id>
    <title type="text">Pearson correlation coefficient 2018-09-29 - 默写年华</title>
    <summary type="text">Pearson correlation coefficient: the Pearson correlation coefficient is a statistic that reflects the linear correlation between two variables. The simple correlation coefficient of a sample is usually denoted r, where n is the sample size and the remaining symbols denote the observed and mean values of the two variables. r describes the degree of linear correlation between two variables. r ranges from -1 to +1; if r > 0, it indicates that the two</summary>
    <published>2018-09-29T09:07:00Z</published>
    <updated>2018-09-29T09:07:00Z</updated>
    <author>
      <name>默写年华</name>
      <uri>http://www.cnblogs.com/qiulinzhang/</uri>
    </author>
    <link rel="alternate" href="http://www.cnblogs.com/qiulinzhang/p/9724748.html"/>
    <link rel="alternate" type="text/html" href="http://www.cnblogs.com/qiulinzhang/p/9724748.html"/>
    <content type="html">[Summary] Pearson correlation coefficient: the Pearson correlation coefficient is a statistic that reflects the linear correlation between two variables. The simple correlation coefficient of a sample is usually denoted r, where n is the sample size and the remaining symbols denote the observed and mean values of the two variables. r describes the degree of linear correlation between two variables. r ranges from -1 to +1; if r > 0, it indicates that the two &lt;a href="http://www.cnblogs.com/qiulinzhang/p/9724748.html" target="_blank"&gt;Read the full text&lt;/a&gt;</content>
  </entry>
  <entry>...</entry>
  <entry>...</entry>
  <entry>...</entry>
  <entry>...</entry>
  <entry>...</entry>
  <entry>...</entry>
  <entry>...</entry>
  <entry>...</entry>
  <entry>...</entry>
  <entry>
    <id>http://www.cnblogs.com/qiulinzhang/p/9570867.html</id>
    <title type="text">sizeof() usage - 默写年华</title>
    <summary type="text">1. Definition: sizeof is an operator, not a function; it returns the number of bytes of memory that an object or type occupies \ 2. Syntax: sizeof object; // sizeof an object sizeof(object); sizeof(type_name); // e.g. sizeof(int) object</summary>
    <published>2018-09-01T08:53:00Z</published>
    <updated>2018-09-01T08:53:00Z</updated>
    <author>
      <name>默写年华</name>
      <uri>http://www.cnblogs.com/qiulinzhang/</uri>
    </author>
    <link rel="alternate" href="http://www.cnblogs.com/qiulinzhang/p/9570867.html"/>
    <link rel="alternate" type="text/html" href="http://www.cnblogs.com/qiulinzhang/p/9570867.html"/>
    <content type="html">[Summary] 1. Definition: sizeof is an operator, not a function; it returns the number of bytes of memory that an object or type occupies \ 2. Syntax: sizeof object; // sizeof an object sizeof(object); sizeof(type_name); // e.g. sizeof(int) object &lt;a href="http://www.cnblogs.com/qiulinzhang/p/9570867.html" target="_blank"&gt;Read the full text&lt;/a&gt;</content>
  </entry>
</feed>
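feedparser hides the Atom details, but the mapping from the XML above to feed['feed'] and feed['entries'] can be seen with the standard library alone. The following is a sketch using xml.etree.ElementTree (not how feedparser works internally) on a hypothetical, trimmed copy of the feed; the summarize_feed helper is illustrative only. Every Atom element lives in the http://www.w3.org/2005/Atom namespace, so lookups have to use that URI exactly.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this namespace; ElementTree stores tags as "{uri}name".
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical, trimmed copy of the rss.xml sample (two entries only).
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title type="text">博客园_默写年华</title>
  <entry>
    <id>http://www.cnblogs.com/qiulinzhang/p/9724748.html</id>
    <title type="text">Pearson correlation coefficient</title>
  </entry>
  <entry>
    <id>http://www.cnblogs.com/qiulinzhang/p/9570867.html</id>
    <title type="text">sizeof() usage</title>
  </entry>
</feed>"""


def summarize_feed(xml_text):
    """Illustrative helper: return (feed title, [(entry id, entry title), ...])."""
    root = ET.fromstring(xml_text)
    # "atom:title" matches only direct children of <feed>, not the entry titles.
    feed_title = root.findtext("atom:title", namespaces=ATOM_NS)
    entries = [
        (entry.findtext("atom:id", namespaces=ATOM_NS),
         entry.findtext("atom:title", namespaces=ATOM_NS))
        for entry in root.findall("atom:entry", ATOM_NS)
    ]
    return feed_title, entries


title, entries = summarize_feed(SAMPLE)
print(len(entries))    # 2
print(entries[0][0])   # http://www.cnblogs.com/qiulinzhang/p/9724748.html
```

Looking up a bare "title" without the namespace (ET.fromstring(SAMPLE).find("title")) returns None, which is why the xmlns URI has to match exactly.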
Then parse it with feedparser:
>>> import feedparser
>>> feed = feedparser.parse('rss.xml')
>>> print(feed['feed']['title'])
博客园_默写年华
>>> print(feed.feed.title)  # attribute-style access
博客园_默写年华
>>> print(feed.entries[0].id)  # id of the first <entry> above
http://www.cnblogs.com/qiulinzhang/p/9724748.html
>>> print(feed['entries'][-1]['summary'])  # summary of the last <entry>
1. Definition: sizeof is an operator, not a function; it returns the number of bytes of memory that an object or type occupies \ 2. Syntax: sizeof object; // sizeof an object sizeof(object); sizeof(type_name); // e.g. sizeof(int) object
>>> len(feed['entries'])
10
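The session reads the same value as feed['feed']['title'] and as feed.feed.title. feedparser returns its results as FeedParserDict objects that accept both styles. The sketch below is a hypothetical, simplified AttrDict (not feedparser's code) showing how a dict subclass can offer attribute access through __getattr__; the hand-built data only mimics the shape of the parsed feed.

```python
class AttrDict(dict):
    """Hypothetical dict subclass where d.key is the same as d['key'].

    A simplified illustration only; feedparser's FeedParserDict is more elaborate.
    """

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict methods such as
        # .keys() and .items() keep working (and win over keys with the same name).
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


def wrap(obj):
    """Recursively convert dicts (including dicts inside lists) to AttrDict."""
    if isinstance(obj, dict):
        return AttrDict((key, wrap(value)) for key, value in obj.items())
    if isinstance(obj, list):
        return [wrap(item) for item in obj]
    return obj


# Hypothetical, hand-built data shaped like the parsed rss.xml above.
feed = wrap({
    "feed": {"title": "博客园_默写年华"},
    "entries": [
        {"id": "http://www.cnblogs.com/qiulinzhang/p/9724748.html"},
        {"id": "http://www.cnblogs.com/qiulinzhang/p/9570867.html"},
    ],
})

print(feed.entries[0].id == feed["entries"][0]["id"])  # True
print(len(feed.entries))  # 2
```

A key that collides with a dict method name (for example "items") still has to be read with square brackets in this sketch.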
Note: the Chinese "garbled text" problem:
In Python 2, when a unicode string sits inside a container (a tuple, a list, or a dict such as feed['feed']), printing the container shows the repr() of each element: a u prefix followed by escape sequences such as u'\u535a\u5ba2\u56ed' instead of the Chinese characters. Printing an individual string, e.g. print(feed['feed']['title']), does not go through the container's repr(), so the Chinese is displayed correctly.
Python 2's default encoding is ASCII, whereas Python 3 strings are Unicode by default, so:
- In Python 2, print(feed['feed']) shows all the Chinese as Unicode escape sequences.
- In Python 3, print(feed['feed']) displays the Chinese correctly.
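To see the difference without a Python 2 interpreter, here is a small Python 3 sketch that shows the same data through repr(), ascii(), and as a single string. ascii() escapes non-ASCII characters as \uXXXX, which is close to what Python 2 printed for unicode values inside containers (minus the u prefix). The feed_info dict is a hypothetical stand-in for feed['feed'].

```python
import sys

# Hypothetical stand-in for feed['feed']; real feedparser output has many more keys.
feed_info = {"title": "博客园_默写年华"}

# Python 3: str is Unicode and the default encoding is UTF-8.
default_encoding = sys.getdefaultencoding()

# repr() of a container keeps printable non-ASCII characters as they are,
# which is why print(feed['feed']) shows readable Chinese in Python 3.
as_repr = repr(feed_info)

# ascii() escapes every non-ASCII character as \uXXXX, close to what Python 2
# printed for unicode values inside a dict, list, or tuple (minus the u prefix).
as_ascii = ascii(feed_info)

# A single string on its own holds the characters themselves.
as_text = feed_info["title"]

# as_ascii is pure ASCII, so it is safe to print on any terminal.
print(default_encoding)
print(as_ascii)
```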
Reference: http://blog.51cto.com/daimalaobing/2046659
RSS parsing: feedparser, 2018-10-02