Spark Big Data Platform

Last updated: 2016-01-04. Source: Internet.

Uploaded by: User

Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write.

BDAS, the Berkeley Data Analytics Stack, is an open source software stack that integrates software components being built by the AMPLab to make sense of Big Data.

Spark Components    vs.    Hadoop Components
Spark Core          <->    Apache Hadoop MapReduce
Spark Streaming     <->    Apache Storm
Spark SQL           <->    Apache Hive
Spark GraphX        <->    MPI (Taobao)
Spark MLlib         <->    Apache Mahout

BlinkDB is a massively parallel, approximate query engine for running interactive SQL queries on large volumes of data. It allows users to trade off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars.
Two key ideas:

An adaptive optimization framework that builds and maintains a set of multi-dimensional samples from original data over time
A dynamic sample selection strategy that selects an appropriately sized sample based on a query’s accuracy and/or response time requirements.
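
The sampling idea can be illustrated with a small sketch in plain Scala. It is not BlinkDB's API: the names `ApproxQuery`, `approxMean` and `pickSample` are hypothetical, the sample is a simple uniform one rather than BlinkDB's multi-dimensional samples, and the error bar assumes a normal approximation (1.96 standard errors for roughly 95% confidence).

```scala
import scala.util.Random

// Toy illustration of sample-based approximate querying (not BlinkDB's API).
object ApproxQuery {
  /** Result of an approximate AVG: the estimate plus a symmetric error bar. */
  final case class Estimate(mean: Double, errorBar: Double)

  /** Estimates the mean of `data` from `sampleSize` rows drawn uniformly with
    * replacement. The error bar is the half-width of an approximate 95%
    * confidence interval (1.96 standard errors).
    */
  def approxMean(data: IndexedSeq[Double], sampleSize: Int, seed: Long): Estimate = {
    require(data.nonEmpty && sampleSize > 1, "need data and at least two sampled rows")
    val rng = new Random(seed)
    val sample = IndexedSeq.fill(sampleSize)(data(rng.nextInt(data.length)))
    val mean = sample.sum / sampleSize
    val variance = sample.map(x => (x - mean) * (x - mean)).sum / (sampleSize - 1)
    Estimate(mean, 1.96 * math.sqrt(variance / sampleSize))
  }

  /** Tries every candidate sample size and returns the smallest one whose
    * error bar is within `maxError`, or the largest one if none qualifies.
    */
  def pickSample(data: IndexedSeq[Double], sizes: Seq[Int],
                 maxError: Double, seed: Long): (Int, Estimate) = {
    require(sizes.nonEmpty, "need at least one candidate sample size")
    val tried = sizes.sorted.map(n => n -> approxMean(data, n, seed))
    tried.find(_._2.errorBar <= maxError).getOrElse(tried.last)
  }
}
```

Calling `ApproxQuery.pickSample(data, Seq(100, 1000, 10000), maxError = 50.0, seed = 42L)` returns the chosen sample size together with the estimate and its error bar. A tighter `maxError` pushes the choice toward a larger, slower sample, which is the accuracy versus response time trade described above.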

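Why Spark is fast:

In-memory computing: working sets are kept in memory across operations where possible, instead of being written to disk between steps.
Directed acyclic graph (DAG) engine: the scheduler sees the whole computation graph in advance, so it can optimize the execution plan.
Delay scheduling: a task waits briefly for a slot on a node that already holds its data instead of running somewhere else immediately.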
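
Delay scheduling can be reduced to a single decision rule. The sketch below is a toy in plain Scala, not Spark's scheduler: `DelayScheduling`, `Task`, `shouldLaunch` and the `localityWaitMs` parameter are hypothetical names chosen for illustration.

```scala
// Toy decision rule for delay scheduling (not Spark's scheduler code).
object DelayScheduling {
  /** A pending task and the hosts that already hold its input data. */
  final case class Task(id: Int, preferredHosts: Set[String])

  /** Decides whether `task` should start on `freeHost` right now.
    * A task with no locality preference starts anywhere. A task with a
    * preference starts on a preferred host at once, and on any other host
    * only after it has waited at least `localityWaitMs`.
    */
  def shouldLaunch(task: Task, freeHost: String,
                   waitedMs: Long, localityWaitMs: Long): Boolean =
    task.preferredHosts.isEmpty ||
      task.preferredHosts.contains(freeHost) ||
      waitedMs >= localityWaitMs
}
```

With `Task(1, Set("node-a"))` and a 3000 ms wait, an offer from `node-b` is declined at 100 ms and accepted at 3000 ms, while an offer from `node-a` is accepted immediately.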

Resilient Distributed Dataset (RDD)

Internally, each RDD is characterized by five main properties:

A list of partitions
A function for computing each split
A list of dependencies on other RDDs
Optionally, a Partitioner for key-value RDDs (e.g. to say that the RDD is hash-partitioned)
Optionally, a list of preferred locations to compute each split on (e.g. block locations for an HDFS file)
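
The five properties map naturally onto an interface. The following is a single-process toy model in plain Scala, not Spark's `RDD` class: `ToyRdd`, `Dataset`, `Split`, `Parallelized` and `Mapped` are hypothetical names, and the partitioner is simplified to an optional key-to-index function.

```scala
// Toy, single-process model of the five RDD properties (names are hypothetical).
object ToyRdd {
  final case class Split(index: Int)

  trait Dataset[T] {
    def splits: Seq[Split]                                   // 1. list of partitions
    def compute(split: Split): Iterator[T]                   // 2. function for computing each split
    def parents: Seq[Dataset[_]]                             // 3. dependencies on other datasets
    def partitioner: Option[Any => Int] = None               // 4. optional partitioner for key-value data
    def preferredLocations(split: Split): Seq[String] = Nil  // 5. optional locality hints

    /** Computes every split in order and concatenates the results. */
    def collect(): Seq[T] = splits.flatMap(s => compute(s))
  }

  /** Source dataset: an in-memory sequence dealt round-robin into `numSplits` splits. */
  final class Parallelized[T](data: Seq[T], numSplits: Int) extends Dataset[T] {
    require(numSplits > 0, "need at least one split")
    val splits: Seq[Split] = (0 until numSplits).map(Split(_))
    def compute(split: Split): Iterator[T] =
      data.iterator.zipWithIndex.collect { case (x, i) if i % numSplits == split.index => x }
    val parents: Seq[Dataset[_]] = Nil
  }

  /** Derived dataset: same splits as its parent, each computed from the parent's split. */
  final class Mapped[A, B](parent: Dataset[A], f: A => B) extends Dataset[B] {
    def splits: Seq[Split] = parent.splits
    def compute(split: Split): Iterator[B] = parent.compute(split).map(f)
    def parents: Seq[Dataset[_]] = Seq(parent)
  }
}
```

Because `Mapped` keeps only a reference to its parent and a function, a lost split can be rebuilt by calling `compute` again on that split alone. This is the sense in which the dependency list makes the dataset resilient.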

Storage Strategy

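    // Four flags describe how an RDD's blocks are stored.
    class StorageLevel private(
        private var useDisk_ : Boolean,
        private var useMemory_ : Boolean,
        private var deserialized_ : Boolean,
        private var replication_ : Int = 1)

    // Predefined level from the StorageLevel companion object:
    // no disk, in memory, stored as deserialized objects, one replica.
    val MEMORY_ONLY = new StorageLevel(false, true, true)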
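
To make the flag order easier to read, here is a small stand-alone mirror of those four fields in plain Scala. It is a sketch, not Spark's class: `ToyStorage`, `Level` and `describe` are hypothetical names, and the constants only follow the same argument order as the snippet above.

```scala
// Toy mirror of the four storage flags (not Spark's StorageLevel class).
object ToyStorage {
  final case class Level(useDisk: Boolean, useMemory: Boolean,
                         deserialized: Boolean, replication: Int = 1) {
    /** Human-readable summary of where and in what form blocks would be kept. */
    def describe: String = {
      val where = (useMemory, useDisk) match {
        case (true, true)   => "memory, spilling to disk"
        case (true, false)  => "memory"
        case (false, true)  => "disk"
        case (false, false) => "nowhere"
      }
      val form = if (deserialized) "deserialized objects" else "serialized bytes"
      s"$where, $form, replication $replication"
    }
  }

  // Same argument order as above: (useDisk, useMemory, deserialized, replication).
  val MemoryOnly    = Level(useDisk = false, useMemory = true, deserialized = true)
  val MemoryOnlySer = Level(useDisk = false, useMemory = true, deserialized = false)
  val MemoryAndDisk = Level(useDisk = true, useMemory = true, deserialized = true)
  val DiskOnly2     = Level(useDisk = true, useMemory = false, deserialized = false, replication = 2)
}
```

Reading `new StorageLevel(false, true, true)` against this mirror gives: not on disk, in memory, kept as deserialized objects, with the default single replica.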

RDD, transformation & action

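Lazy evaluation: a transformation (such as map or filter) only records how to derive a new RDD from its parent. Nothing is computed until an action (such as count or collect) asks for a result.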
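
The split between transformations and actions can be shown with a minimal sketch in plain Scala. It does not use Spark: `LazyDemo` and `Pipeline` are hypothetical names, and the "recipe" is just a function that produces a fresh iterator each time an action runs.

```scala
// Toy lazy pipeline: transformations wrap a recipe, actions execute it.
object LazyDemo {
  final class Pipeline[T] private (recipe: () => Iterator[T]) {
    // Transformations: return a new pipeline without touching any data.
    def map[U](f: T => U): Pipeline[U] = new Pipeline(() => recipe().map(f))
    def filter(p: T => Boolean): Pipeline[T] = new Pipeline(() => recipe().filter(p))

    // Actions: run the whole recipe and return a concrete result.
    def collect(): List[T] = recipe().toList
    def count(): Int = recipe().size
  }

  object Pipeline {
    /** Wraps an in-memory sequence as the start of a pipeline. */
    def from[T](data: Seq[T]): Pipeline[T] = new Pipeline(() => data.iterator)
  }
}
```

Building `Pipeline.from(Seq(1, 2, 3, 4)).map(_ * 2).filter(_ > 4)` runs neither function. Only `collect()` or `count()` does, and each action re-runs the recipe from the start, which is the repeated work that keeping a dataset in memory avoids.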
