The main contents of this lesson:
1. Several ways to create an RDD
2. RDD creation in practice
3. RDD creation internals
There are many ways to create an RDD; the most common are:
1. From a collection in the program, mainly used for testing;
2. From the local file system, useful for testing against large data files;
3. From HDFS, the most common way in production;
4. From a relational database (DB);
5. From a NoSQL store such as HBase;
6. From S3;
7. From other data sources.
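As a quick aside on way 2: with Spark on the classpath, sc.textFile("file:///tmp/sample.txt") creates an RDD with one element per line of the file. The sketch below imitates that line-per-element behavior in plain Scala with a temporary file (no Spark dependency; the file name and contents are made up for this demo):

```scala
import java.nio.file.Files

object LocalFileLines {
  def main(args: Array[String]): Unit = {
    // Create a small sample file; with Spark this would instead be an
    // existing file read via sc.textFile("file:///...").
    val path = Files.createTempFile("sample", ".txt")
    Files.write(path, "hello spark\nhello rdd\n".getBytes("UTF-8"))

    // Plain-Scala analogue: one element per line,
    // the same contents the RDD would hold.
    val lines = scala.io.Source.fromFile(path.toFile, "UTF-8").getLines().toList
    println(lines.length) // 2
    println(lines.head)   // hello spark

    Files.delete(path)
  }
}
```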
Hands-on with RDDs:
Creating an RDD from a collection
val conf = new SparkConf().setAppName("RDDDemo").setMaster("local")
val sc = new SparkContext(conf)
// Create an RDD
val rdd = sc.parallelize(0 to 100)
// reduce sums the elements pairwise: 1+2=3, 3+3=6, 6+4=10, ...
val sum = rdd.reduce(_ + _)
println(sum)
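reduce on an RDD folds the elements pairwise exactly like reduce on a plain Scala collection, and 0 to 100 sums to 5050. A minimal Spark-free check of the same semantics:

```scala
object ReduceSum {
  def main(args: Array[String]): Unit = {
    // rdd.reduce(_ + _) combines elements pairwise just like
    // Scala's collection reduce; the sum of 0..100 is 5050.
    val sum = (0 to 100).reduce(_ + _)
    println(sum) // 5050
  }
}
```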
Creating an RDD from a file on HDFS
val conf = new SparkConf().setAppName("RDDDemo").setMaster("local")
val sc = new SparkContext(conf)
// Create an RDD from a file on HDFS
val lines = sc.textFile("hdfs://master:9000/data/readme.md")
// Split each line into words and pair each word with a count of 1
val words = lines.flatMap(line => line.split(" ")).map(word => (word, 1))
// Sum the counts per word
val wordCount = words.reduceByKey(_ + _)
wordCount.collect().foreach(println)
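The word-count pipeline (flatMap to split lines into words, map to emit (word, 1) pairs, reduceByKey to sum the counts) has a direct plain-Scala counterpart. The sketch below uses groupBy plus a sum in place of reduceByKey, with made-up input lines, so it runs without Spark:

```scala
object WordCountLocal {
  def main(args: Array[String]): Unit = {
    // Stand-in for the lines an RDD would read from HDFS
    val lines = List("spark is fast", "spark is fun")

    // Same shape as the RDD pipeline: flatMap splits lines into words,
    // map pairs each word with 1, and groupBy + sum plays the role
    // of reduceByKey.
    val counts = lines
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .groupBy(_._1)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

    counts.toList.sorted.foreach(println)
  }
}
```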
This article is from the "DT_Spark Big Data DreamWorks" blog; please keep this source: http://18610086859.blog.51cto.com/11484530/1773180
Lesson 15: A Thorough Look at RDD Creation Internals