Lesson 15: RDD Creation Internals, Thoroughly Decrypted

Source: Internet
Author: User

This lesson covers:

1. Several ways to create an RDD

2. Hands-on RDD creation

3. RDD internals


There are many ways to create an RDD; some common ones are:

1. From a collection in the program, mainly useful for testing;

2. From the local file system, useful for testing with large data files;

3. From HDFS, the most common way;

4. From a database (DB);

5. From a NoSQL store such as HBase;

6. From Amazon S3;

7. From a data source.


Hands-on RDD creation:

Creating an RDD from a collection

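import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("RDDDemo").setMaster("local")
val sc = new SparkContext(conf)
// Create an RDD from a Scala collection (the integers 0 to 100)
val rdd = sc.parallelize(0 to 100)
// reduce combines elements pairwise: 0+1=1, 1+2=3, 3+3=6, 6+4=10, ...
val sum = rdd.reduce(_ + _)
println(sum) // 5050
sc.stop()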
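To make the reduce step concrete without a cluster, the following plain-Scala sketch models what `sc.parallelize(0 to 100)` followed by `reduce(_ + _)` does conceptually: the collection is cut into contiguous slices (one per partition), each slice is reduced locally, and the partial results are then combined. The start/end arithmetic is modeled on the slicing Spark applies to local sequences in `ParallelCollectionRDD`, but `ReduceSketch`, `slice` and `reduceLikeRdd` are hypothetical names for illustration, not Spark API, and everything runs in a single process.

```scala
// Plain-Scala model (no Spark needed) of parallelize + reduce.
// ReduceSketch, slice and reduceLikeRdd are hypothetical names, not Spark API.
object ReduceSketch {

  // Split a sequence into numSlices contiguous slices, using start/end
  // arithmetic modeled on how Spark slices a local Seq into partitions.
  def slice[T](data: Seq[T], numSlices: Int): Seq[Seq[T]] = {
    require(numSlices >= 1, "numSlices must be positive")
    val length = data.length.toLong
    (0 until numSlices).map { i =>
      val start = ((i * length) / numSlices).toInt
      val end = (((i + 1) * length) / numSlices).toInt
      data.slice(start, end)
    }
  }

  // Reduce each slice locally (like one task per partition), then combine
  // the partial results (like the driver merging the task results).
  // Empty slices contribute nothing.
  def reduceLikeRdd[T](data: Seq[T], numSlices: Int)(op: (T, T) => T): T = {
    val partials = slice(data, numSlices).filter(_.nonEmpty).map(_.reduceLeft(op))
    require(partials.nonEmpty, "cannot reduce an empty collection")
    partials.reduceLeft(op)
  }

  def main(args: Array[String]): Unit = {
    val data = 0 to 100
    // first/last element of each slice: (0,24), (25,49), (50,74), (75,100)
    println(slice(data, 4).map(s => (s.head, s.last)))
    println(reduceLikeRdd(data, 4)(_ + _)) // prints 5050
  }
}
```

Because partial results from different partitions can be merged in any order, the function passed to `reduce` should be associative and commutative; integer addition is both, so the result is 5050 however many slices are used.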


Creating an RDD from a file on HDFS

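import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("RDDDemo").setMaster("local")
val sc = new SparkContext(conf)
// Create an RDD from a text file stored on HDFS
val lines = sc.textFile("hdfs://master:9000/data/readme.md")
// Split each line into words, then pair each word with a count of 1
val words = lines.flatMap(line => line.split(" ")).map(word => (word, 1))
// Sum the counts for each distinct word
val wordCount = words.reduceByKey(_ + _)
wordCount.collect().foreach(println)
sc.stop()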
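The word-count pipeline can be traced on a small in-memory sample, without HDFS or a cluster. The sketch below uses ordinary Scala collections, with `groupBy` plus a sum standing in for `reduceByKey`; `WordCountSketch` and the sample lines are hypothetical, and unlike Spark the work is done eagerly in one process rather than lazily across partitions.

```scala
// Plain-Scala model of the word-count pipeline; no Spark or HDFS needed.
// WordCountSketch and the sample lines are hypothetical.
object WordCountSketch {

  def wordCount(lines: Seq[String]): Map[String, Int] = {
    // flatMap: one line -> many words
    val words = lines.flatMap(line => line.split(" "))
    // map: word -> (word, 1)
    val pairs = words.map(word => (word, 1))
    // Stand-in for reduceByKey(_ + _): group the pairs by word and sum the 1s
    pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2).reduce(_ + _)) }
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq("spark makes rdds", "rdds are resilient", "spark")
    // prints (are,1), (makes,1), (rdds,2), (resilient,1), (spark,2), one per line
    wordCount(sample).toSeq.sortBy(_._1).foreach(println)
  }
}
```

As in the Spark version, `split(" ")` produces empty-string tokens when words are separated by more than one space; splitting on a regular expression such as `"\\s+"` is a common alternative.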


Note:

Information from: DT Big Data Dream Factory

For more exclusive content, follow the WeChat public account: Dt_spark

If you are interested in big data and Spark, you can attend the permanently free public Spark class taught by teacher Liaoliang every night at xx; address: YY, room number: 68917580

This article is from the "Dt_spark Big Data DreamWorks" blog; please be sure to keep this source: http://18610086859.blog.51cto.com/11484530/1773180
