Practical Guide | Apache Spark's three major APIs: RDDs, DataFrames, and Datasets, and how to choose among them

Source: Internet
Author: User

These slides are from Spark Summit Europe 2017. Other slide decks are still being collated; for updates, follow the Iteblog_hadoop public account or visit https://www.iteblog.com.


What developers appreciate most is a set of APIs that greatly improves their productivity and is easy to use, intuitive, and expressive. One important reason Apache Spark is popular with developers is its easy-to-use API, which makes it simple to manipulate large datasets in multiple languages, such as Scala, Java, Python, and R.


This article takes an in-depth look at the three APIs available in Apache Spark 2.2 and above: RDDs, DataFrames, and Datasets. It explains when you should choose each one and why, outlines their performance and optimization characteristics, and lists the scenarios in which you should use DataFrames and Datasets instead of RDDs. I will pay more attention to DataFrames and Datasets, because these two APIs were unified in Apache Spark 2.0.


The motivation behind this unification is that we want to make Spark easier to use by reducing the number of concepts you need to master and by providing a way to work with structured data. When working with structured data, Spark can offer high-level abstractions and APIs comparable to those of a domain-specific language.


The slides for this article can be downloaded from: https://www.iteblog.com/ A-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets-with-jules-damji-iteblog.pdf

or

http://cdn.iteblog.com/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets-with-jules-damji-iteblog.pdf


The video of this talk is as follows. Because of the public account's upload limits, only videos of up to 20 MB can be uploaded here; the HD video is still being uploaded, or you can watch it directly at HTTPS://WWW.YOUTUBE.COM/WATCH?V=OFK7G3GD9JK

