Creating a SparkContext

val conf = new SparkConf().setAppName("AppName")
val sc = new SparkContext(conf)
Reading an HDFS file

sc.textFile(path)
The textFile parameter is a path, which can be:
1. A file path, in which case only the specified file is loaded
2. A directory path, in which case all files directly under the directory are loaded (files in subdirectories are not)
3. A wildcard pattern, which loads all matching files or all files under multiple matching directories
4. Prefix the path with file:// to read from the local file system, or with hdfs:// to read from HDFS; by default the file is read from HDFS
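The path forms above can be tried without a cluster. A minimal local-mode sketch, assuming Spark is on the classpath; the directory layout and app name are made up for illustration:

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

// Build a throwaway directory with two small files to read back
val dir = Files.createTempDirectory("textfile-demo")
Files.write(dir.resolve("a.txt"), "line1\nline2".getBytes)
Files.write(dir.resolve("b.txt"), "line3".getBytes)

val sc = new SparkContext(new SparkConf().setAppName("textFileDemo").setMaster("local[1]"))

val oneCount  = sc.textFile("file://" + dir.resolve("a.txt")).count()  // 1. single file: 2 lines
val allCount  = sc.textFile("file://" + dir).count()                   // 2. whole directory: 3 lines
val globCount = sc.textFile("file://" + dir + "/*.txt").count()        // 3. wildcard: 3 lines
sc.stop()
```

The file:// prefix keeps the example self-contained; against a real cluster the hdfs:// form (or no prefix) would be used instead.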
Saving a file

saveAsTextFile(path)

def saveAsTextFile(path: String): Unit
def saveAsTextFile(path: String, codec: Class[_ <: CompressionCodec]): Unit

saveAsTextFile stores the RDD in the file system in text-file format.
The codec parameter can specify a compression codec class.

saveAsTextFile("hdfs:///tmp/test/", classOf[com.hadoop.compression.lzo.LzopCodec])
As with reading, prefix the path with file:// to write to the local file system, or with hdfs:// to write to HDFS; by default the file is written to HDFS.
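A local-mode round-trip sketch of saveAsTextFile, assuming Spark is on the classpath. The uncompressed overload is used here, since the LZO codec shown above is a third-party dependency; paths are temp directories invented for the example:

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("saveDemo").setMaster("local[1]"))

// saveAsTextFile refuses to overwrite, so point it at a not-yet-existing subdirectory
val out = Files.createTempDirectory("save-demo").resolve("out").toString
sc.parallelize(Seq("a", "b", "c"), 1).saveAsTextFile("file://" + out)

// Reading the directory back yields one line per original element
val back = sc.textFile("file://" + out).collect().sorted.toSeq
sc.stop()
```

Note that the output path names a directory: Spark writes one part-NNNNN file per partition inside it.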
Classification and function of Spark operators

Value-type transformation operators
Input and output partitions one-to-one
map
flatMap
mapPartitions
glom
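A local-mode sketch of the four one-to-one transformations, assuming Spark is on the classpath; data and app name are illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("oneToOneDemo").setMaster("local[2]"))
val rdd = sc.parallelize(1 to 4, 2)   // two partitions: [1, 2] and [3, 4]

val mapped  = rdd.map(_ * 10).collect().toSeq                           // Seq(10, 20, 30, 40)
val flat    = rdd.flatMap(n => Seq(n, -n)).collect().toSeq              // Seq(1, -1, 2, -2, 3, -3, 4, -4)
val perPart = rdd.mapPartitions(it => Iterator(it.sum)).collect().toSeq // one sum per partition: Seq(3, 7)
val glommed = rdd.glom().collect().map(_.toSeq).toSeq                   // Seq(Seq(1, 2), Seq(3, 4))
sc.stop()
```

mapPartitions and glom make the one-to-one partition relationship visible: each input partition produces exactly one output partition.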
Input and output partitions many-to-one
union
cartesian
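A minimal local-mode sketch of the two many-to-one transformations, assuming Spark is on the classpath:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("manyToOneDemo").setMaster("local[2]"))
val a = sc.parallelize(Seq(1, 2))
val b = sc.parallelize(Seq(3))

val unioned = a.union(b).collect().toSeq      // partitions concatenated: Seq(1, 2, 3)
val crossed = a.cartesian(b).collect().toSet  // every pairing: Set((1,3), (2,3))
sc.stop()
```

union concatenates without deduplicating (unlike SQL UNION); cartesian pairs every element of one RDD with every element of the other.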
Input and output partitions many-to-many
groupBy
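A local-mode groupBy sketch, assuming Spark is on the classpath; the parity key is invented for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("groupByDemo").setMaster("local[2]"))
val grouped = sc.parallelize(1 to 6)
  .groupBy(n => n % 2)             // key = remainder, value = all elements with that remainder
  .mapValues(_.toSeq.sorted)
  .collectAsMap()
sc.stop()
```

Every input partition can contribute to every output partition, which is why groupBy triggers a shuffle.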
Output partition is a subset of the input partition
filter
distinct
subtract
sample
takeSample
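A local-mode sketch of the subset transformations, assuming Spark is on the classpath. The sampling results are random, so only their sizes are commented:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("subsetDemo").setMaster("local[2]"))
val rdd = sc.parallelize(Seq(1, 2, 2, 3, 4))

val even = rdd.filter(_ % 2 == 0).collect().sorted.toSeq               // Seq(2, 2, 4)
val uniq = rdd.distinct().collect().sorted.toSeq                       // Seq(1, 2, 3, 4)
val rest = rdd.subtract(sc.parallelize(Seq(2))).collect().sorted.toSeq // Seq(1, 3, 4)

val sampled = rdd.sample(withReplacement = false, fraction = 0.5)      // transformation; random subset
val taken   = rdd.takeSample(withReplacement = false, num = 2)         // action; Array of exactly 2
sc.stop()
```

Note the asymmetry: sample is a transformation returning an RDD whose size only approximates the fraction, while takeSample is an action returning exactly num elements to the driver.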
Cache type
cache
persist
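A local-mode sketch of caching, assuming Spark is on the classpath. cache() is shorthand for persist(StorageLevel.MEMORY_ONLY); persist accepts other storage levels:

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("cacheDemo").setMaster("local[2]"))
val rdd = sc.parallelize(1 to 100).map(_ * 2)

rdd.cache()                 // equivalent to persist(StorageLevel.MEMORY_ONLY)
val first  = rdd.count()    // first action computes the map and caches the partitions
val second = rdd.count()    // answered from the cache, no recomputation
rdd.unpersist()

val spilled = sc.parallelize(1 to 10).persist(StorageLevel.MEMORY_AND_DISK)
val n = spilled.count()
sc.stop()
```

Both are lazy: nothing is stored until the first action runs.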
Key-value type transformation operators
Input and output partitions one-to-one
mapValues
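A minimal local-mode mapValues sketch, assuming Spark is on the classpath:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("mapValuesDemo").setMaster("local[2]"))
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2)))

// Keys are left untouched, so any existing partitioning is preserved
val bumped = pairs.mapValues(_ + 10).collect().toMap
sc.stop()
```

Because mapValues cannot change keys, Spark keeps the partitioner, which is why it is cheaper than an equivalent map over the whole pair.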
Aggregation within a single RDD or between two RDDs
Single-RDD aggregation
combineByKey
reduceByKey
partitionBy
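A local-mode sketch of single-RDD key-value aggregation, assuming Spark is on the classpath. combineByKey generalizes reduceByKey; here it builds a (sum, count) pair per key:

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("singleRddAggDemo").setMaster("local[2]"))
val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

val sums = pairs.reduceByKey(_ + _).collectAsMap()   // Map(a -> 3, b -> 3)

val sumCount = pairs.combineByKey(
  (v: Int) => (v, 1),                                          // createCombiner: first value seen for a key
  (acc: (Int, Int), v: Int) => (acc._1 + v, acc._2 + 1),       // mergeValue: within a partition
  (x: (Int, Int), y: (Int, Int)) => (x._1 + y._1, x._2 + y._2) // mergeCombiners: across partitions
).collectAsMap()

val repartitioned = pairs.partitionBy(new HashPartitioner(2))  // redistribute by key hash
val numParts = repartitioned.partitions.length                 // 2
sc.stop()
```

partitionBy does no aggregation itself, but co-locating equal keys up front makes the subsequent per-key operators shuffle-free.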
Aggregation of two RDDs
cogroup
Join
join
leftOuterJoin and rightOuterJoin
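A local-mode sketch of the two-RDD operators, assuming Spark is on the classpath; the keys are chosen so the three behaviors differ:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("twoRddAggDemo").setMaster("local[2]"))
val left  = sc.parallelize(Seq(("a", 1), ("b", 2)))
val right = sc.parallelize(Seq(("a", "x"), ("c", "y")))

val joined = left.join(right).collectAsMap()          // keys present on both sides only
val lefty  = left.leftOuterJoin(right).collectAsMap() // every left key; right side is an Option
val co     = left.cogroup(right)
  .mapValues { case (l, r) => (l.toSeq, r.toSeq) }    // all keys; grouped values from each side
  .collectAsMap()
sc.stop()
```

join keeps only "a"; leftOuterJoin keeps "b" with None; cogroup keeps all three keys with possibly empty groups.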
Action operators
No output
foreach
HDFS
saveAsTextFile
saveAsObjectFile
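A local-mode sketch of foreach and saveAsObjectFile, assuming Spark is on the classpath; the temp path is invented for the example:

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("noOutputDemo").setMaster("local[2]"))
val rdd = sc.parallelize(1 to 5)

// foreach runs on the executors; mutate an accumulator, not a driver-side var
val acc = sc.longAccumulator("sum")
rdd.foreach(n => acc.add(n))
val total = acc.value                   // 15

// saveAsObjectFile / objectFile round-trip through a temp directory
val out = Files.createTempDirectory("obj-demo").resolve("out").toString
rdd.saveAsObjectFile("file://" + out)
val back = sc.objectFile[Int]("file://" + out).collect().sorted.toSeq
sc.stop()
```

"No output" means no value is returned to the driver; the work happens as a side effect (here, the accumulator) or lands in the file system.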
Scala collections and data types
collect
collectAsMap
reduceByKeyLocally
lookup
count
top
reduce
fold
aggregate
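A local-mode sketch of the actions that return Scala collections or values to the driver, assuming Spark is on the classpath:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("actionsDemo").setMaster("local[2]"))
val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))
val nums  = sc.parallelize(Seq(3, 1, 4, 1, 5))

val asMap    = pairs.collectAsMap()             // one value kept per duplicate key
val localRed = pairs.reduceByKeyLocally(_ + _)  // reduces by key on the driver: Map(a -> 3, b -> 3)
val aValues  = pairs.lookup("a")                // all values for key "a": Seq(1, 2)
val howMany  = nums.count()                     // 5
val top2     = nums.top(2).toSeq                // largest first: Seq(5, 4)
val summed   = nums.reduce(_ + _)               // 14
val folded   = nums.fold(0)(_ + _)              // 14 (zero applied per partition, then once more)
val sumCount = nums.aggregate((0, 0))(
  (acc, v) => (acc._1 + v, acc._2 + 1),         // fold a value into a partition accumulator
  (x, y)   => (x._1 + y._1, x._2 + y._2)        // merge two partition accumulators
)                                               // (14, 5)
sc.stop()
```

aggregate is the most general of the three reductions: its result type can differ from the element type, so it needs both a per-partition and a cross-partition merge function.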