Apache Spark RDD: What Is an RDD?

Source: Internet
Author: User
Tags: spark, rdd

What is an RDD?

RDD stands for Resilient Distributed Dataset, sometimes rendered as "elastic distributed dataset". An RDD is a read-only, partitioned collection of records. It can be created only through deterministic operations on data in stable physical storage or on other existing RDDs. These deterministic operations are called transformations; examples include map, filter, groupBy, and join.

An RDD does not have to be materialized. Instead, it carries information about how it was derived (that is, computed) from other RDDs. This information is called its lineage. If some of an RDD's partitions are lost, they can be recomputed from the data in physical storage by following the lineage.
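The lineage idea can be illustrated with a small toy model. The sketch below is plain Python, not Spark code: `ToyRDD`, `from_storage`, `lose_partition`, and the sample data are hypothetical names invented for this illustration. It assumes each dataset remembers only its parent and a function that derives one partition, which is enough to rebuild a lost partition from the stored data.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not the Spark API)."""

    def __init__(self, num_partitions: int,
                 compute: Callable[[int], List[int]],
                 parent: Optional["ToyRDD"] = None) -> None:
        self.num_partitions = num_partitions
        self._compute = compute      # how to derive one partition
        self.parent = parent         # lineage: the dataset this one came from
        self._materialized: Dict[int, List[int]] = {}  # partitions held in memory

    @classmethod
    def from_storage(cls, stored: List[List[int]]) -> "ToyRDD":
        # Source dataset: each partition is read from simulated stable storage.
        return cls(len(stored), lambda i: list(stored[i]))

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # A transformation returns a new dataset and computes nothing yet.
        return ToyRDD(self.num_partitions,
                      lambda i: [fn(x) for x in self.partition(i)],
                      parent=self)

    def filter(self, pred: Callable[[int], bool]) -> "ToyRDD":
        return ToyRDD(self.num_partitions,
                      lambda i: [x for x in self.partition(i) if pred(x)],
                      parent=self)

    def partition(self, i: int) -> List[int]:
        # Compute the partition on demand, walking up the lineage if needed.
        if i not in self._materialized:
            self._materialized[i] = self._compute(i)
        return self._materialized[i]

    def lose_partition(self, i: int) -> None:
        # Simulate a failure: drop partition i here and in every ancestor.
        rdd: Optional["ToyRDD"] = self
        while rdd is not None:
            rdd._materialized.pop(i, None)
            rdd = rdd.parent

    def collect(self) -> List[int]:
        return [x for i in range(self.num_partitions) for x in self.partition(i)]


stored = [[1, 2, 3], [4, 5, 6]]          # two partitions in "stable storage"
base = ToyRDD.from_storage(stored)
even_squares = base.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

before = even_squares.collect()          # materializes both partitions
even_squares.lose_partition(1)           # partition 1 is gone from memory
after = even_squares.collect()           # rebuilt from storage via lineage
```

In this toy, only the lost partition is recomputed; partition 0 stays in memory. Real Spark tracks lineage per RDD in a comparable spirit, but its scheduling and fault handling are far more involved than this sketch.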

All or part of the dataset can be kept in memory and reused across multiple computations.

Elasticity here refers to the ability to spill data to disk when memory is insufficient.

This design leads to another feature of the RDD: in-memory computation, that is, keeping data in memory. To deal with limited memory capacity, Spark gives the user a great deal of freedom: both whether to cache an RDD and how to cache it are under the user's control.
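The effect of choosing whether to cache can be shown with another toy model. This is plain Python rather than Spark code, and `ToyDataset` and `compute_calls` are hypothetical names used only for illustration. In Spark itself the corresponding calls are `cache()` and `persist()` on an RDD, where `persist()` accepts a storage level such as `StorageLevel.MEMORY_ONLY` or `StorageLevel.MEMORY_AND_DISK`; the sketch only mimics the "cache or not" decision, not the storage levels.

```python
from typing import Callable, List, Optional


class ToyDataset:
    """Toy dataset whose result is recomputed unless caching is requested."""

    def __init__(self, compute: Callable[[], List[int]]) -> None:
        self._compute = compute
        self._use_cache = False
        self._cached: Optional[List[int]] = None
        self.compute_calls = 0       # how many times the data was recomputed

    def cache(self) -> "ToyDataset":
        # Ask for the result to be kept in memory after the first computation.
        self._use_cache = True
        return self

    def unpersist(self) -> "ToyDataset":
        # Stop caching and release the stored result.
        self._use_cache = False
        self._cached = None
        return self

    def collect(self) -> List[int]:
        if self._use_cache and self._cached is not None:
            return list(self._cached)    # served from memory
        self.compute_calls += 1
        result = self._compute()
        if self._use_cache:
            self._cached = result
        return list(result)


uncached = ToyDataset(lambda: [x * x for x in range(5)])
uncached.collect()
uncached.collect()                       # computed again on the second use

cached = ToyDataset(lambda: [x * x for x in range(5)]).cache()
cached.collect()
cached.collect()                         # second use is served from memory
```

The trade-off the toy makes visible is the same one the text describes: caching saves recomputation at the cost of holding the data in memory, so the choice is left to the user.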
