saveAsTextFile
saveAsTextFile(path, compressionCodecClass=None)
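saveAsTextFile stores the RDD as text files in a file system, writing each element as a string (one line per element). Pairing it with a serializer's dumps and loads functions in Python (for example json.dumps and json.loads) makes it easy to save and restore structured records.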
Parameters:
- path – path to the output text file
- compressionCodecClass – (None by default) fully qualified class name of the compression codec, e.g. "org.apache.hadoop.io.compress.GzipCodec"
Example:
saveAsSequenceFile
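saveAsSequenceFile(path, compressionCodecClass=None)
The parameters below belong to the companion reader, SparkContext.sequenceFile(path, keyClass=None, valueClass=None, keyConverter=None, valueConverter=None, minSplits=None, batchSize=0), which loads a SequenceFile back into an RDD: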
Parameters:
- path – path to the SequenceFile
- keyClass – fully qualified class name of the key Writable class (e.g. "org.apache.hadoop.io.Text")
- valueClass – fully qualified class name of the value Writable class (e.g. "org.apache.hadoop.io.LongWritable")
- keyConverter – (None by default) fully qualified class name of a converter applied to the keys
- valueConverter – (None by default) fully qualified class name of a converter applied to the values
- minSplits – minimum number of splits in the dataset (default min(2, sc.defaultParallelism))
- batchSize – number of Python objects represented as a single Java object (default 0, which chooses the batch size automatically)
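saveAsSequenceFile saves an RDD of key-value pairs in SequenceFile format to a Hadoop-supported file system, usually HDFS.
A path without a scheme resolves against the default file system, which is HDFS on a typical cluster. Keys and values are converted to Hadoop Writable types, so basic types such as int and str come back unchanged when the file is read.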
Example:
Check the output files on HDFS, then download one and inspect its file format:
saveAsHadoopFile, saveAsHadoopDataset, saveAsNewAPIHadoopFile, saveAsNewAPIHadoopDataset
Spark programming--actions II