import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import ml.dmlc.xgboost4j.scala.spark.XGBoost

// Build the training set: column 0 is the label, the remaining columns are features
val trainData = data.map { line =>
  val label = line(0).toString.toDouble
  val value0 = (1 to ).map(i => line(i).toString.toDouble)
  val featureVector = Vectors.dense(value0.toArray)
  LabeledPoint(label, featureVector)
}
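The row-to-label/features split used above can be sketched in plain Scala, without Spark. This is only an illustration: the helper name `toLabelAndFeatures` is hypothetical, and it assumes that every column after the first is a numeric feature, since the listing does not show the upper bound of the feature range.

```scala
object RowParsingSketch {
  // Hypothetical helper, not part of the original listing.
  // Assumes column 0 holds the label and all remaining columns are numeric features.
  def toLabelAndFeatures(row: Seq[Any]): (Double, Array[Double]) = {
    val label = row(0).toString.toDouble
    // Convert every remaining column to Double via its string form,
    // mirroring the toString.toDouble conversion in the listing.
    val features = (1 until row.length).map(i => row(i).toString.toDouble).toArray
    (label, features)
  }
}
```

For example, `RowParsingSketch.toLabelAndFeatures(Seq(3, "1.5", 2.0))` yields the label `3.0` and the features `1.5` and `2.0`. In the Spark version, the feature array would then be passed to `Vectors.dense`.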
// Test start +1
val numRound = 800
val paramMap = List(
  "eta" -> 0.1f,
  "max_depth" -> 6, // Maximum depth of a tree. The default is 6; the valid range is [1, ∞]
  "silent" -> 0, // 0 prints run-time information, 1 runs silently. The default is 0
  "objective" -> "reg:linear", // Defines the learning task and the corresponding learning objective
  "eval_metric" -> "rmse", // Evaluation metric used on the validation data
  "nthread" -> 1 // Number of threads XGBoost uses at run time. The default is the maximum number of threads available on the current system
).toMap
val model = XGBoost.train(trainData, paramMap, numRound, nWorkers = , useExternalMemory = false)
// LabeledPoint construction
// Modify....
val testData = data1.map { line =>
  val label = line(0).toString.toDouble
  val value0 = (1 to ).map(i => line(i).toString.toDouble)
  val featureVector = Vectors.dense(value0.toArray)
  featureVector
}
val predtrain = model.predict(testData)
val s = predtrain.collect()(0)
s.length
// Actual values
val data2 = df1.select(df1("Masterhotel"), df1("Order_cii_notcancelcii"), df1("Rank1"), df1("OrderDate"))
val actual_frame = data2.toDF()
// Build a DataFrame-typed result set
case class ResultSet(masterhotel: Int, // Parent hotel ID
  quantity: Double, // Actual output
  rank: Int, // Rank
  date: String, // Date
  frcst_cii: Double // Forecast output
)
val ac_1 = actual_frame.collect()
val pr_1 = predtrain.collect()(0)
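The listing stops after collecting the actual rows and the predictions. One possible next step, sketched below in plain Scala, is to pair them into `ResultSet` records and compute the RMSE. The sketch makes several assumptions that the original does not confirm: the actual rows are modelled as tuples instead of Spark rows, each prediction is taken to be a one-element `Array[Float]`, both collections are in the same order, and the helper names `combine` and `rmse` are hypothetical.

```scala
object ResultSetSketch {
  // Same fields as the ResultSet case class in the listing.
  case class ResultSet(masterhotel: Int, quantity: Double, rank: Int, date: String, frcst_cii: Double)

  // Hypothetical helper: pairs each actual row with its prediction.
  // Assumes both sequences are aligned row by row.
  def combine(actual: Seq[(Int, Double, Int, String)],
              predicted: Seq[Array[Float]]): Seq[ResultSet] = {
    require(actual.length == predicted.length, "actual and predicted must have the same length")
    actual.zip(predicted).map { case ((hotel, qty, rank, date), pred) =>
      ResultSet(hotel, qty, rank, date, pred(0).toDouble)
    }
  }

  // Root mean squared error between actual and forecast output.
  def rmse(results: Seq[ResultSet]): Double = {
    require(results.nonEmpty, "results must not be empty")
    math.sqrt(results.map(r => math.pow(r.quantity - r.frcst_cii, 2)).sum / results.length)
  }
}
```

With made-up values such as actual quantities 10.0 and 20.0 and forecasts 12.0 and 18.0, each error is 2 in magnitude, so `rmse` returns 2.0. In the Spark version, the resulting sequence could be turned into a DataFrame once the fields are extracted from the collected rows.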