RDD Action Operations (6): saveAsHadoopFile, saveAsHadoopDataset

saveAsHadoopFile

def saveAsHadoopFile(path: String, keyClass: Class[_], valueClass: Class[_], outputFormatClass: Class[_ <: OutputFormat[_, _]], codec: Class[_ <: CompressionCodec]): Unit

def saveAsHadoopFile(path: String, keyClass: Class[_], valueClass: Class[_], outputFormatClass: Class[_ <: OutputFormat[_, _]], conf: JobConf = ..., codec: Option[Class[_ <: CompressionCodec]] = None): Unit

saveAsHadoopFile stores the RDD as a file on HDFS, using the old-style Hadoop API (org.apache.hadoop.mapred).

You can specify the output key class, the output value class, and the compression format.

One output file is produced per partition.

```scala
var rdd1 = sc.makeRDD(Array(("A", 2), ("A", 1), ("B", 6), ("B", 3), ("B", 7)))

import org.apache.hadoop.mapred.TextOutputFormat
import org.apache.hadoop.io.Text
import org.apache.hadoop.io.IntWritable

// Save as (Text, IntWritable) pairs using TextOutputFormat
rdd1.saveAsHadoopFile("/tmp/lxw1234.com/", classOf[Text], classOf[IntWritable], classOf[TextOutputFormat[Text, IntWritable]])

// Same, with LZO compression (LzopCodec comes from the hadoop-lzo package)
rdd1.saveAsHadoopFile("/tmp/lxw1234.com/", classOf[Text], classOf[IntWritable], classOf[TextOutputFormat[Text, IntWritable]], classOf[com.hadoop.compression.lzo.LzopCodec])
```

saveAsHadoopDataset

def saveAsHadoopDataset(conf: JobConf): Unit

saveAsHadoopDataset saves the RDD to storage systems other than HDFS, such as HBase.

In the JobConf, five parameters usually need to be set: the output path, the class of the key, the class of the value, the output format of the RDD (OutputFormat), and compression-related parameters.

## Using saveAsHadoopDataset to save an RDD to HDFS

```scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import SparkContext._
import org.apache.hadoop.mapred.TextOutputFormat
import org.apache.hadoop.io.Text
import org.apache.hadoop.io.IntWritable
import org.apache.hadoop.mapred.JobConf

var rdd1 = sc.makeRDD(Array(("A", 2), ("A", 1), ("B", 6), ("B", 3), ("B", 7)))

var jobConf = new JobConf()
jobConf.setOutputFormat(classOf[TextOutputFormat[Text, IntWritable]])
jobConf.setOutputKeyClass(classOf[Text])
jobConf.setOutputValueClass(classOf[IntWritable])
jobConf.set("mapred.output.dir", "/tmp/lxw1234/")

rdd1.saveAsHadoopDataset(jobConf)
```

Result:

```
hadoop fs -cat /tmp/lxw1234/part-00000
A 2
A 1
hadoop fs -cat /tmp/lxw1234/part-00001
B 6
B 3
B 7
```
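Note that saveAsHadoopDataset takes no path argument: the output directory is carried inside the JobConf as "mapred.output.dir". An OutputFormat that does not write to a filesystem path, such as HBase's TableOutputFormat in the next example, takes its destination from other JobConf settings instead.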

## Saving data to HBase

Create the table in the HBase shell:

```
create 'lxw1234', {NAME => 'f1', VERSIONS => 1}, {NAME => 'f2', VERSIONS => 1}, {NAME => 'f3', VERSIONS => 1}
```
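This creates the table lxw1234 with three column families (f1, f2, and f3), each configured to keep a single version per cell.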

```scala
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import SparkContext._
import org.apache.hadoop.mapred.TextOutputFormat
import org.apache.hadoop.io.Text
import org.apache.hadoop.io.IntWritable
import org.apache.hadoop.mapred.JobConf
import org...
```
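The example above breaks off mid-import in the source. As a rough sketch of what the rest of such a job typically looks like, the code below writes the pairs to the lxw1234 table through the old-API org.apache.hadoop.hbase.mapred.TableOutputFormat; the qualifier name c1 and the Put-building logic are illustrative assumptions, not the original author's code.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf

// Sketch: point the JobConf at the HBase table instead of a filesystem path.
val jobConf = new JobConf(HBaseConfiguration.create())
jobConf.setOutputFormat(classOf[TableOutputFormat])
jobConf.set(TableOutputFormat.OUTPUT_TABLE, "lxw1234")

var rdd1 = sc.makeRDD(Array(("A", 2), ("A", 1), ("B", 6), ("B", 3), ("B", 7)))

// The old-API TableOutputFormat expects (ImmutableBytesWritable, Put) pairs.
// Here each value goes to column family f1, qualifier c1 (assumed names).
rdd1.map { case (k, v) =>
  val put = new Put(Bytes.toBytes(k))
  put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("c1"), Bytes.toBytes(v.toString))
  (new ImmutableBytesWritable(Bytes.toBytes(k)), put)
}.saveAsHadoopDataset(jobConf)
```

Put.addColumn is the HBase 1.0+ name; on older client versions the equivalent call is put.add.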
