R language: an example of scraping Kindle discount book lists, with tips for outputting HTML

Source: Internet
Author: User

Ever since I bought a Kindle, I have wanted to check regularly for cheap books. Amazon often runs specials with books at 1 or 2 yuan, but browsing those lists every time is tedious, and the lists cannot be sorted by price, so hunting for bargains gets tiring.

So I wrote a small program with R's rvest package that automatically groups the discounted books by price range.

It focuses on two Kindle lists: the new releases list and the list of fastest-climbing sellers.

Fastest-climbing sales list: http://www.amazon.cn/gp/movers-and-shakers/digital-text/

New releases list: http://www.amazon.cn/gp/new-releases/digital-text/

The data.table, dplyr and rvest packages need to be installed first.

The code is as follows:

install.packages("rvest")
install.packages("data.table")
install.packages("dplyr")
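When the script is rerun regularly, reinstalling the packages every time is unnecessary. A minimal sketch that only reports which of the three packages are still missing could look like this; `missing_pkgs` is a hypothetical helper name, not part of any package, and the install call is left commented out on purpose.

```r
# Packages the scraping script depends on.
needed <- c("rvest", "data.table", "dplyr")

# Hypothetical helper: return the subset of `pkgs` that is not installed yet.
missing_pkgs <- function(pkgs) {
  pkgs[!pkgs %in% rownames(installed.packages())]
}

to_install <- missing_pkgs(needed)
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to actually install them
}
```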

The main points shared here are:

1. A simple application example of rvest
2. How to output a data frame (data.frame or data.table) as an HTML file, that is, how to add the HTML markup
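For the first point, the basic rvest pattern can be tried without touching the network. The sketch below parses a small inline HTML fragment; the fragment and its class names (`item`, `name`, `price`) are invented for illustration and are not Amazon's real markup.

```r
library(rvest)

# Hypothetical HTML fragment standing in for a list page.
page <- read_html('
<div class="item"><a class="name" href="/dp/B000000001">Book A</a><span class="price">¥1.00</span></div>
<div class="item"><a class="name" href="/dp/B000000002">Book B</a><span class="price">¥4.99</span></div>
', encoding = "UTF-8")

# Select nodes with CSS selectors, then pull out their text or attributes.
titles <- page %>% html_nodes("div.item a.name") %>% html_text()
links  <- page %>% html_nodes("div.item a.name") %>% html_attr("href")

# Strip everything except digits and the decimal point before converting.
prices <- as.numeric(gsub("[^0-9.]", "", page %>% html_nodes("span.price") %>% html_text()))

books <- data.frame(title = titles, link = links, price = prices, stringsAsFactors = FALSE)
print(books)
```

The same three calls (`html_nodes`, `html_text`, `html_attr`) carry the whole scraper; only the selectors change.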

library(rvest)
library(data.table)
library(dplyr)

# Build the URLs here. Amazon's page numbering follows a simple pattern, so they can be generated directly.
id <- 1:5
url_increase_fast <- paste0("http://www.amazon.cn/gp/movers-and-shakers/digital-text/ref=zg_bsms_digital-text_pg_", id, "?ie=UTF8&pg=", id)
url_newest <- paste0("http://www.amazon.cn/gp/new-releases/digital-text/ref=zg_bsnr_digital-text_pg_", id, "?ie=UTF8&pg=", id)
url <- c(url_increase_fast, url_newest)

# Write the readdata function to read the page content. Some less-used fields are scraped,
# but not all of them are exported, to keep the final output tidy.
# Adapt it if you need more, such as the category or the rating. For me, the price and the title are enough.
readdata <- function(i) {
  # read_html() replaces html(), which older versions of rvest used
  web <- read_html(url[i], encoding = "UTF-8")
  title <- web %>% html_nodes("div.zg_title") %>% html_text()
  title_short <- substr(title, 1, 20)
  price <- as.numeric(gsub("¥", "", web %>% html_nodes("div.zg_itemPriceBlock_normal strong.price") %>% html_text()))
  ranking_movement <- web %>% html_nodes("span.zg_salesMovement") %>% html_text()
  rank_number <- as.numeric(gsub("\\.", "", web %>% html_nodes("span.zg_rankNumber") %>% html_text()))
  # The new releases list has no record of sales changes, so fill those fields with NA.
  if (length(ranking_movement) == 0) {
    ranking_movement <- rep(NA, length(title))
    rank_number <- rep(NA, length(title))
  }
  link <- gsub("\\\n", "", web %>% html_nodes("div.zg_title a") %>% html_attr("href"))
  ASIN <- sapply(strsplit(link, split = "/dp/"), function(e) e[2])
  img <- web %>% html_nodes("div.zg_itemImage_normal img") %>% html_attr("src")
  # Add the HTML markup here
  img_link <- paste0("<img src='", img, "'>")
  title_link <- paste0("<a href='", link, "'>", title_short, "</a>")
  # Merge the data
  combine <- data.table(img_link, title_link, price, ranking_movement)
  setnames(combine, c("Image", "Title", "Price", "Sales Changes"))
  # Wait 5 seconds between requests so that the IP does not get blocked
  Sys.sleep(5)
  combine
}

# Loop over the ten pages and start scraping
final <- data.table()
for (i in 1:10) {
  final <- rbind(final, readdata(i))
  print(i)
}

# Write a function that converts a data.table to an HTML table.
# For the key points see the table page on w3school: a table starts with <table>, header cells are <th>, and each row is a <tr>.
# The main tools are sapply, apply and paste: wrap every cell of the data frame in <td>,
# then wrap each row in <tr>, and finally wrap everything in <table>.
transfer_html_table <- function(rawdata) {
  title <- paste0("<th>", names(rawdata), "</th>")
  content <- sapply(rawdata, function(e) paste0("<td>", e, "</td>"))
  content <- apply(content, 1, function(e) paste0(e, collapse = ""))
  content <- paste0("<tr>", content, "</tr>")
  bbb <- c("<table border=1><tr>", title, "</tr>", content, "</table>")
  bbb
}

# Use transfer_html_table to output each price range as an HTML table.
final_less1 <- transfer_html_table(rawdata = final %>% filter(Price <= 1))
write(final_less1, "~//kindle-less than 1 yuan special books.html")
final_1_2 <- transfer_html_table(rawdata = final %>% filter(Price > 1 & Price <= 2))
write(final_1_2, "~//kindle_1-2 yuan special books.html")
final_2_5 <- transfer_html_table(rawdata = final %>% filter(Price > 2 & Price <= 5))
write(final_2_5, "~//kindle_2-5 yuan special books.html")
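One caveat with the sapply/apply approach: sapply only returns a matrix when the table has at least two rows, so a price range that matches a single book or none would make the apply step fail. A possible variant that pastes the columns together row by row is sketched below; `df_to_html_table` and the `demo` data are hypothetical names used only for this illustration.

```r
# Hypothetical variant of the conversion: works for any number of rows.
df_to_html_table <- function(df) {
  header <- paste0("<th>", names(df), "</th>", collapse = "")
  if (nrow(df) == 0) {
    rows <- character(0)
  } else {
    # Wrap each column's cells in <td>, then paste the columns together row by row.
    cells <- lapply(df, function(col) paste0("<td>", as.character(col), "</td>"))
    rows <- paste0("<tr>", do.call(paste0, unname(cells)), "</tr>")
  }
  c("<table border=1>", paste0("<tr>", header, "</tr>"), rows, "</table>")
}

# A single-row table, the case that trips up apply().
demo <- data.frame(Title = "<a href='/dp/X'>Book A</a>", Price = 1, stringsAsFactors = FALSE)
html_lines <- df_to_html_table(demo)
writeLines(html_lines)
```

Cell contents are inserted as-is, which is what lets the image and link markup render; text coming from an untrusted source would need escaping first.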

Finally, three HTML files appear in My Documents ("~//" points to the My Documents folder). Open them and you can happily pick books from lists that are already grouped by price. The Kindle store occasionally puts some good books on sale for 1 yuan, so I was already buying cheap books fairly often; with this little script I expect to buy even more of them.
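The three price ranges can also be produced in one step with `cut()`, which may be convenient if more ranges are added later. This is only a sketch on made-up data: the `books` table stands in for the scraped `final` table, and the bucket labels are arbitrary.

```r
# Hypothetical sample standing in for the scraped `final` table.
books <- data.frame(
  Title = c("Book A", "Book B", "Book C", "Book D"),
  Price = c(0.99, 1.00, 1.99, 4.50),
  stringsAsFactors = FALSE
)

# Right-closed intervals mirror the filters used above: <= 1, (1, 2], (2, 5].
breaks <- c(-Inf, 1, 2, 5)
labels <- c("under_1", "1-2", "2-5")
books$bucket <- cut(books$Price, breaks = breaks, labels = labels, right = TRUE)

# One data frame per price range; books above 5 yuan get NA and are dropped by split().
by_bucket <- split(books, books$bucket)
print(sapply(by_bucket, nrow))
```

Each element of `by_bucket` could then be passed to the table conversion function and written to its own file.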

If you are interested, you can also search for how to run R scripts in batch mode or on a schedule, run this code regularly, and save the accumulated results. Then you will know when the Kindle store adjusts prices the most. Amazon is also relatively easy to scrape, because its HTML is very tidy. The exception is the product detail page, where the product description is always wrapped in a script and is harder to scrape.
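For the accumulation step, one simple option is to stamp each run with its date and append it to a single CSV file. The sketch below assumes that approach; `append_snapshot`, the file location and the sample dates and prices are all hypothetical.

```r
# Hypothetical helper: append one run's results, stamped with the run date, to a CSV file.
append_snapshot <- function(df, path, run_date = Sys.Date()) {
  df$run_date <- as.character(run_date)
  new_file <- !file.exists(path)
  # Write the header only when the file is created, then append without it.
  write.table(df, path, sep = ",", row.names = FALSE,
              col.names = new_file, append = !new_file, qmethod = "double")
  invisible(path)
}

# Two made-up runs on consecutive days, written to a temporary file.
history_file <- tempfile(fileext = ".csv")
run1 <- data.frame(Title = "Book A", Price = 1.00, stringsAsFactors = FALSE)
run2 <- data.frame(Title = "Book A", Price = 4.99, stringsAsFactors = FALSE)
append_snapshot(run1, history_file, as.Date("2016-06-01"))
append_snapshot(run2, history_file, as.Date("2016-06-02"))

history <- read.csv(history_file, stringsAsFactors = FALSE)
print(history)
```

With the history in one file, price changes per book can be found by comparing rows with the same title across run dates.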

End

Reposted from: http://mp.weixin.qq.com/s?__biz=MzA3MTM3NTA5Ng==&mid=2651055375&idx=1&sn=5c9e12352eab84012bc26cb9851a96b2&chksm=84d9c498b3ae4d8e015575ae573d13c553a33ee08403e7a86853b426d6a7b06087fb02ab1bbc&scene=0#rd
