saveAsHadoopFile
def saveAsHadoopFile(path: String, keyClass: Class[_], valueClass: Class[_], outputFormatClass: Class[_ <: OutputFormat[_, _]], codec: Class[_ <: CompressionCodec]): Unit

def saveAsHadoopFile(path: String, keyClass: Class[_], valueClass: Class[_], outputFormatClass: Class[_ <: OutputFormat[_, _]], conf: JobConf = ..., codec: Option[Class[_ <: CompressionCodec]] = None): Unit
saveAsHadoopFile stores the RDD as files on HDFS and uses the old-version Hadoop API (org.apache.hadoop.mapred).
You can specify the output key class, the output value class, and the compression format.
One output file is written per partition.
```scala
var rdd1 = sc.makeRDD(Array(("A", 2), ("A", 1), ("B", 6), ("B", 3), ("B", 7)))

import org.apache.hadoop.mapred.TextOutputFormat
import org.apache.hadoop.io.Text
import org.apache.hadoop.io.IntWritable

// Write one text file per partition under /tmp/lxw1234.com/
rdd1.saveAsHadoopFile("/tmp/lxw1234.com/",
  classOf[Text], classOf[IntWritable],
  classOf[TextOutputFormat[Text, IntWritable]])

// The same write, compressed with LZO
rdd1.saveAsHadoopFile("/tmp/lxw1234.com/",
  classOf[Text], classOf[IntWritable],
  classOf[TextOutputFormat[Text, IntWritable]],
  classOf[com.hadoop.compression.lzo.LzopCodec])
```
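Because one file is produced per partition, reducing the partition count before saving reduces the number of output files. A minimal sketch (the output path here is illustrative):

```scala
// Coalesce to a single partition so that exactly one part file is written.
// Only advisable for small RDDs, since all data then flows through one task.
rdd1.coalesce(1).saveAsHadoopFile("/tmp/lxw1234.com/single/",
  classOf[Text], classOf[IntWritable],
  classOf[TextOutputFormat[Text, IntWritable]])
```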
saveAsHadoopDataset
def saveAsHadoopDataset(conf: JobConf): Unit
saveAsHadoopDataset is used to save an RDD to storage other than HDFS, such as HBase.
In the JobConf, you usually need to set five parameters:
the output path, the class type of the key, the class type of the value, the RDD's output format (OutputFormat), and any compression-related parameters.
## Using saveAsHadoopDataset to save an RDD to HDFS
```scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.hadoop.mapred.TextOutputFormat
import org.apache.hadoop.io.Text
import org.apache.hadoop.io.IntWritable
import org.apache.hadoop.mapred.JobConf

var rdd1 = sc.makeRDD(Array(("A", 2), ("A", 1), ("B", 6), ("B", 3), ("B", 7)))

// Configure the output path, key/value classes and OutputFormat in the JobConf
var jobConf = new JobConf()
jobConf.setOutputFormat(classOf[TextOutputFormat[Text, IntWritable]])
jobConf.setOutputKeyClass(classOf[Text])
jobConf.setOutputValueClass(classOf[IntWritable])
jobConf.set("mapred.output.dir", "/tmp/lxw1234/")

rdd1.saveAsHadoopDataset(jobConf)
```

Result:

```
hadoop fs -cat /tmp/lxw1234/part-00000
A 2
A 1
hadoop fs -cat /tmp/lxw1234/part-00001
B 6
B 3
B 7
```
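The compression-related parameters mentioned above can also be set on the JobConf. A minimal sketch using Gzip (the codec choice is illustrative):

```scala
import org.apache.hadoop.mapred.FileOutputFormat
import org.apache.hadoop.io.compress.GzipCodec

// Enable compressed output for this job via the JobConf
FileOutputFormat.setCompressOutput(jobConf, true)
FileOutputFormat.setOutputCompressorClass(jobConf, classOf[GzipCodec])
```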
## Saving data to HBase
First create the HBase table in the HBase shell:

```
create 'lxw1234', {NAME => 'f1', VERSIONS => 1}, {NAME => 'f2', VERSIONS => 1}, {NAME => 'f3', VERSIONS => 1}
```
```scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.hadoop.mapred.TextOutputFormat
import org.apache.hadoop.io.Text
import org.apache.hadoop.io.IntWritable
import org.apache.hadoop.mapred.JobConf
```
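The write itself can be sketched as follows, assuming the old-API TableOutputFormat from org.apache.hadoop.hbase.mapred and the HBase 1.x client API; the ZooKeeper quorum hosts, the sample data, and the column qualifier c1 are illustrative:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes

// HBase connection settings; the ZooKeeper quorum hosts below are placeholders
var conf = HBaseConfiguration.create()
var jobConf = new JobConf(conf)
jobConf.set("hbase.zookeeper.quorum", "zkNode1,zkNode2,zkNode3")
jobConf.set(TableOutputFormat.OUTPUT_TABLE, "lxw1234")
jobConf.setOutputFormat(classOf[TableOutputFormat])

var rdd1 = sc.makeRDD(Array(("A", 2), ("B", 6), ("C", 7)))

// Convert each (key, value) pair into an HBase Put:
// row key = key, column f1:c1 = value
rdd1.map { x =>
  val put = new Put(Bytes.toBytes(x._1))
  put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("c1"), Bytes.toBytes(x._2))
  (new ImmutableBytesWritable, put)
}.saveAsHadoopDataset(jobConf)
```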