Tuesday, March 31, 2015
A while ago I used rvest to help someone write a small program that regularly crawls Amazon prices and inventory and compares them with the previous prices. It is the first complete program I have written recently, and it involves a fair amount of error handling.
My main references were the following questions and answers on StackOverflow:
- How to skip an error in a loop
- Skip to next value of loop upon error in R
For tryCatch, I looked for further information and found the following blog post: 1. Simple error handling with tryCatch in the R language
The following are code examples:
1) Use the tryCatch function to skip errors (illustrated with download.file).
Look at the following code. I needed to download a batch of Amazon product pages in bulk. If a product ID is wrong, or the IP is rate-limited, the page will not open and download.file raises an error. Here I use tryCatch to capture the error message when a page cannot be opened, so that the loop moves on to the next iteration.
    for (n in 1:length(productlink)) {
      tryCatch({
        download.file(productlink[n],
                      paste0(getwd(), "/html/", productid[n, ], ".html"),
                      cacheOK = TRUE)
      }, error = function(e) {
        cat("ERROR:", conditionMessage(e), "\n")
      })
      # Sys.sleep(seconds) makes each iteration of the loop pause for a while.
      # This slows the program down, but it is a good idea for sites that restrict access.
      Sys.sleep(0.5)
    }
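The same skip-and-continue pattern can be tried without network access. The following is only a minimal sketch, not the crawler itself: fetch_page is a hypothetical stand-in for download.file that fails when an ID contains a space, and the product IDs are made up.

```r
# Hypothetical stand-in for download.file(): fails when the ID contains a space.
fetch_page <- function(id) {
  if (grepl(" ", id, fixed = TRUE)) {
    stop("cannot open URL for id '", id, "'")
  }
  paste0("<html>", id, "</html>")
}

ids <- c("B001", "B0 02", "B003")        # made-up product IDs; the second is invalid
pages <- rep(NA_character_, length(ids))

for (n in seq_along(ids)) {
  pages[n] <- tryCatch(
    fetch_page(ids[n]),
    error = function(e) {
      # Report the problem and return NA so the loop carries on.
      cat("ERROR:", conditionMessage(e), "\n")
      NA_character_
    }
  )
}

# IDs whose page could not be fetched
failed <- ids[is.na(pages)]
```

Because the handler returns a value, the failed item leaves an NA in pages instead of stopping the loop, and failed can be retried later. seq_along(ids) is used instead of 1:length(ids) so that an empty vector produces no iterations.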
The example above relies on two important functions, tryCatch and cat.
Looking it up, tryCatch belongs to the base package and is part of R's condition system. The post "Simple error handling with tryCatch in the R language" gives a simple outline of tryCatch as follows:
    result = tryCatch({
      expr
    }, warning = function(w) {
      warning-handler-code
    }, error = function(e) {
      error-handler-code
    }, finally = {
      cleanup-code
    })
That is, a warning is passed to the warning handler and an error to the error handler. If no condition is raised, the value of expr is returned. If there is a finally clause, it always runs, so both the output of finally and the value of expr appear.
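The three outcomes described above can be checked with a small helper. This is a sketch under stated assumptions: run_case is a hypothetical name, and the handlers simply return a labelled string so that the branch taken is visible in the result.

```r
# Runs expr_fun() inside tryCatch and reports which branch produced the value.
# finally_ran shows that the finally clause is evaluated in every case.
run_case <- function(expr_fun) {
  finally_ran <- FALSE
  value <- tryCatch(
    {
      expr_fun()
    },
    warning = function(w) paste("warning:", conditionMessage(w)),
    error   = function(e) paste("error:", conditionMessage(e)),
    finally = {
      finally_ran <- TRUE
    }
  )
  list(value = value, finally_ran = finally_ran)
}

ok   <- run_case(function() 1 + 1)              # no condition: value of expr
warn <- run_case(function() as.integer("x"))    # coercion raises a warning
err  <- run_case(function() stop("boom"))       # explicit error
```

Note that in the warning case the value comes from the warning handler, not from the interrupted expression: once a handler set up by tryCatch is triggered, the rest of expr is abandoned.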
    tryCatch({
      a <- "c"
      b <- "c"
      b == a
    }, error = function(e) {
      cat("hahaha", conditionMessage(e), "\n\n")
    }, finally = {
      print("CCC")
    })
    [1] "CCC"
    [1] TRUE
    tryCatch({
      a <- "c"
      cc == a   # cc does not exist
    }, error = function(e) {
      cat("hahaha", conditionMessage(e), "\n")
    }, finally = {
      print("CCC")
    })
    hahaha object 'cc' not found
In the download example, a successful download returns the result of the download, and a failed one runs error = function(e) { cat("ERROR:", conditionMessage(e), "\n") } instead.
Then there is the cat function. cat is an output function: here it asks the system to print "ERROR:" followed by conditionMessage(e). After that, the loop continues with its next iteration.
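What the handler prints can be inspected on its own. The snippet below is a small sketch: the error object is built by hand with simpleError and its message text is made up for illustration.

```r
e <- simpleError("cannot open URL")   # build an error condition by hand
msg <- conditionMessage(e)            # the bare message text, without "Error in ..."

# cat() joins its arguments with a space and does not add a newline itself,
# which is why the handler passes "\n" explicitly.
printed <- capture.output(cat("ERROR:", msg, "\n"))
```

Using conditionMessage(e) rather than printing e itself keeps the output to a single readable line per failed item.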
In addition, there is a more interesting application in an answer by mmann1123 on StackOverflow.
    #!/usr/bin/env Rscript
    # tryCatch.r -- experiments with tryCatch

    # Get any arguments
    arguments <- commandArgs(trailingOnly = TRUE)
    a <- arguments[1]

    # Define a division function that can issue warnings and errors
    myDivide <- function(d, a) {
      if (a == 'warning') {
        return_value <- 'myDivide warning result'
        warning("myDivide warning message")
      } else if (a == 'error') {
        return_value <- 'myDivide error result'
        stop("myDivide error message")
      } else {
        return_value = d / as.numeric(a)
      }
      return(return_value)
    }

    # Evaluate the desired series of expressions inside of tryCatch
    result <- tryCatch({
      b <- 2
      c <- b^2
      d <- c + 2
      if (a == 'suppress-warnings') {
        e <- suppressWarnings(myDivide(d, a))
      } else {
        e <- myDivide(d, a)  # 6/a
      }
      f <- e + 100
    }, warning = function(war) {
      # warning handler picks up where the warning was generated
      print(paste("MY_WARNING:", war))
      b <- "changing 'b' inside the warning handler has no effect"
      e <- myDivide(d, 0.1)  # =60
      f <- e + 100
      return(f)
    }, error = function(err) {
      # error handler picks up where the error was generated
      print(paste("MY_ERROR:", err))
      b <- "changing 'b' inside the error handler has no effect"
      e <- myDivide(d, 0.01)  # =600
      f <- e + 100
      return(f)
    }, finally = {
      print(paste("a =", a))
      print(paste("b =", b))
      print(paste("c =", c))
      print(paste("d =", d))
      # NOTE: finally is evaluated in the context of the initial
      # NOTE: tryCatch block and 'e' will not exist if a warning
      # NOTE: or error occurred.
      # print(paste("e =", e))
    })  # END tryCatch

    print(paste("result =", result))
tryCatch demonstration
2) Use an if statement together with stop.
That is, if a condition does not hold, stop the program and print the message given to stop. I mainly use this to check whether the original product IDs were entered correctly.
    if (!sum(check) == length(productlink)) {
      productlink <- NULL
      productid <- NULL
      stop("Invalid ProductID. Double check if any space or else in, and resave the file or the script will not run")
    }
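A self-contained version of this check might look like the sketch below. The rule used here (an ID must be non-empty and contain no whitespace) and the name check_ids are assumptions for illustration; the real check vector in the program above may be computed differently.

```r
# Hypothetical validator: an ID is accepted only if it is non-empty
# and contains no whitespace.
check_ids <- function(ids) {
  ok <- nzchar(ids) & !grepl("[[:space:]]", ids)
  if (sum(ok) != length(ids)) {
    # stop() aborts the function and reports which IDs failed the check.
    stop("Invalid ProductID: ", paste(ids[!ok], collapse = ", "))
  }
  invisible(ids)
}

check_ids(c("B001", "B002"))   # all IDs valid: passes silently

# Wrap the failing call in tryCatch only to capture the message for inspection.
outcome <- tryCatch(
  check_ids(c("B001", "B0 02")),
  error = function(e) conditionMessage(e)
)
```

Failing early like this means a bad input file is reported before any downloading starts, rather than halfway through the loop.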
3) When assembling bulk-read data with data.frame, a missing element makes data.frame raise an error.
For example, if a is NULL, data.frame raises an error.
    a <- NULL
    b <- c("cc", "dd")
    data.frame(a, b)
    Error in data.frame(a, b) : arguments imply differing number of rows: 0, 2
Therefore, inside the loop, build a data.frame for each item separately and then combine them with rbind. You can also assign a placeholder value to abnormal fields. As in the code below, if the product name does not exist in the page I pulled, then length(productname) == 1 is FALSE and the field is set directly to "Product not download or not existing". The field is then neither NULL nor 2-3 rows long but exactly 1 row, so building the data.frame no longer raises an error.
    data <- function(n) {
      ### (code that obtains productname/price/category omitted)
      if (!length(productname) == 1) {
        productname = "Product not download or not existing"
      }
      if (!length(price) == 1) {
        price = NA
        category <- "Product not download or not existing"
      }
      # Here the data.frame is assembled. If the three fields do not have the same
      # number of rows (many fields are NULL, or a field has 2-3 rows), it fails.
      # The advantage of assigning placeholder values with if is that productname,
      # price and category are each guaranteed to be 1 row long, so they can be
      # combined with data.frame, and outliers still produce output.
      data.frame(productname, price, category)
    }
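The build-one-row-then-rbind approach can be sketched end to end as follows. This is an illustration under assumptions, not the original program: one_row and scraped_pages are hypothetical names, the scraped values are made up, and the category field is left out for brevity.

```r
# Hypothetical per-product extractor. `scraped` is a list holding whatever
# was pulled from one page; fields that were not found are NULL.
one_row <- function(scraped) {
  productname <- scraped$productname
  price <- scraped$price
  if (length(productname) != 1) {
    productname <- "Product not download or not existing"
  }
  if (length(price) != 1) {
    price <- NA_real_
  }
  # Both fields now have length 1, so data.frame() cannot fail on row counts.
  data.frame(productname = productname, price = price,
             stringsAsFactors = FALSE)
}

scraped_pages <- list(
  list(productname = "Widget", price = 9.99),
  list(productname = NULL, price = NULL),          # page failed to load
  list(productname = "Gadget", price = c(5, 6))    # price matched twice
)

# One single-row data.frame per page, stacked into one table.
result <- do.call(rbind, lapply(scraped_pages, one_row))
```

Every page contributes exactly one row, so the result lines up with the input list and the placeholder rows show which products need another look.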
I had not really understood the tryCatch function before, because I was mostly dealing with errors of the 2nd and 3rd kinds. Looking at it now, it seems tryCatch can do a lot more?
I am writing this down for reference when writing code later.
In addition, try/catch has close counterparts in Java and the C-family languages. It seems that R, in the final analysis, still cannot escape the underlying languages.
Next is my study plan for April. After finishing each item I will write a blog post to organize my thoughts and keep notes.
1) The RCurl package and its thick English manual. I hope I can finally learn to use it to crawl the pages that rvest cannot handle, such as script-driven pages and search-box pages.
2) Financial time series analysis with R (a Lianshu Chengjin (Dataguru) course)
3) Follow teacher Xiao Xing's MOOC to review financial analysis. After brushing up on the financial knowledge I learned in the past, look again at the stock data from the Forecaster site and try some digging and analysis, at least to get a macro view of the layout and characteristics of currently listed Chinese companies, as groundwork for studying stocks with R later.
----------
My blog: http://www.cnblogs.com/weibaar records my learning of R and everything about data analysis.
R language: three examples of handling outliers or errors