Spark Big Data Platform

Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write.

BDAS, the Berkeley Data Analytics Stack, is an open source software stack that integrates software components being built by the AMPLab to make sense of Big Data.

Spark components vs. comparable systems
Spark Core      <------> Apache Hadoop MapReduce
Spark Streaming <------> Apache Storm
Spark SQL       <------> Apache Hive
Spark GraphX    <------> MPI (Taobao)
Spark MLlib     <------> Apache Mahout

BlinkDB is a massively parallel, approximate query engine for running interactive SQL queries on large volumes of data. It allows users to trade off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars.
Two key ideas:

  • An adaptive optimization framework that builds and maintains a set of multi-dimensional samples from original data over time.
  • A dynamic sample selection strategy that selects an appropriately sized sample based on a query’s accuracy and/or response time requirements.
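
The idea of answering from a sample and attaching an error bar can be sketched in plain Scala. This is a toy illustration, not BlinkDB code: the object `ApproxQuery` and all of its members are hypothetical names, the sample is drawn uniformly (BlinkDB's multi-dimensional samples are more elaborate), and the error bar is a simple 95% normal-approximation interval.

```scala
import scala.util.Random

// Toy illustration of sample-based approximate aggregation (not BlinkDB code).
object ApproxQuery {
  /** An estimated average together with the half-width of its error bar. */
  final case class Estimate(mean: Double, halfWidth: Double) {
    def low: Double = mean - halfWidth
    def high: Double = mean + halfWidth
  }

  /** Draw `n` values uniformly at random (with replacement) from `data`. */
  def uniformSample(data: IndexedSeq[Double], n: Int, seed: Long): IndexedSeq[Double] = {
    val rng = new Random(seed)
    IndexedSeq.fill(n)(data(rng.nextInt(data.length)))
  }

  /** Approximate AVG over a sample, with a roughly 95% confidence interval. */
  def approxAvg(sample: IndexedSeq[Double]): Estimate = {
    require(sample.length >= 2, "need at least two sampled rows")
    val n = sample.length
    val mean = sample.sum / n
    // Unbiased sample variance, then the standard error of the mean.
    val variance = sample.map(x => (x - mean) * (x - mean)).sum / (n - 1)
    val stdErr = math.sqrt(variance / n)
    Estimate(mean, 1.96 * stdErr)
  }

  /** Smallest sample size whose 95% error bar is at most `targetHalfWidth`. */
  def requiredSampleSize(stdDev: Double, targetHalfWidth: Double): Int =
    math.ceil(math.pow(1.96 * stdDev / targetHalfWidth, 2)).toInt
}
```

Under these assumptions, `requiredSampleSize` plays the role of the dynamic sample selection step: a tighter accuracy requirement asks for a larger sample, and a looser one lets the query run on fewer rows.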

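Why Spark is fast:

  • In-memory computing: intermediate data can be kept in memory and reused instead of being written to disk between steps.
  • Directed Acyclic Graph (DAG) engine: the engine sees the whole computation graph in advance, so it can optimize the execution plan.
  • Delay scheduling: a task waits briefly for a slot on a node that already holds its data, which improves data locality.

Resilient Distributed Dataset (RDD)
Each RDD is characterized by five main properties: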
  • A list of partitions
  • A function for computing each split
  • A list of dependencies on other RDDs
  • Optionally, a Partitioner for key-value RDDs (e.g. to say that the RDD is hash-partitioned)
  • Optionally, a list of preferred locations to compute each split on (e.g. block locations for an HDFS file)
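
The five properties above can be modelled in a few lines of plain Scala. This is a single-machine toy, not Spark's RDD class: `ToyRdd`, `Rdd`, `Parallelized` and `Mapped` are hypothetical names, and the round-robin split of the input is only an assumption made to keep the example small.

```scala
// Toy, single-machine model of the five properties (not Spark's RDD class).
object ToyRdd {
  final case class Partition(index: Int)

  trait Rdd[T] {
    def partitions: Seq[Partition]                  // 1. a list of partitions
    def compute(split: Partition): Iterator[T]      // 2. a function for computing each split
    def dependencies: Seq[Rdd[_]]                   // 3. dependencies on other RDDs
    def partitioner: Option[Any => Int] = None      // 4. optional partitioner (key-value RDDs)
    def preferredLocations(split: Partition): Seq[String] = Nil // 5. optional locality hints

    // A transformation: builds a child RDD that depends on this one.
    def map[U](f: T => U): Rdd[U] = new Mapped(this, f)
    // An action: computes every partition and gathers the results.
    def collect(): Seq[T] = partitions.flatMap(p => compute(p).toSeq)
  }

  /** Source RDD: splits an in-memory sequence round-robin into `numSlices` partitions. */
  final class Parallelized[T](data: Seq[T], numSlices: Int) extends Rdd[T] {
    val partitions: Seq[Partition] = (0 until numSlices).map(Partition(_))
    def compute(split: Partition): Iterator[T] =
      data.iterator.zipWithIndex.collect { case (x, i) if i % numSlices == split.index => x }
    val dependencies: Seq[Rdd[_]] = Nil
  }

  /** Child RDD: same partitions as the parent, computed by applying `f` to each element. */
  final class Mapped[T, U](parent: Rdd[T], f: T => U) extends Rdd[U] {
    def partitions: Seq[Partition] = parent.partitions
    def compute(split: Partition): Iterator[U] = parent.compute(split).map(f)
    val dependencies: Seq[Rdd[_]] = Seq(parent)
  }
}
```

Because `Mapped` keeps a reference to its parent and a function, a lost partition can be rebuilt by calling `compute` again on that split, which is the idea behind the "resilient" part of the name.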
Storage Strategy
class StorageLevel private(
    private var useDisk_ : Boolean,
    private var useMemory_ : Boolean,
    private var deserialized_ : Boolean,
    private var replication_ : Int = 1)

object StorageLevel {
  // Keep partitions in memory only, as deserialized objects, with one replica.
  val MEMORY_ONLY = new StorageLevel(false, true, true)
}
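
To see how the constructor flags combine, here is a standalone model of them. It is a sketch, not Spark's own class: `StorageSketch`, `Level` and `describe` are hypothetical names, and the listed values are assumed to mirror the flag combinations behind Spark's MEMORY_ONLY, MEMORY_ONLY_SER, MEMORY_AND_DISK, DISK_ONLY and MEMORY_ONLY_2 levels.

```scala
// Standalone model of the storage flags (hypothetical names; not Spark's own class).
object StorageSketch {
  final case class Level(useDisk: Boolean, useMemory: Boolean,
                         deserialized: Boolean, replication: Int = 1) {
    /** Human-readable summary of where and how a partition would be stored. */
    def describe: String = {
      val where = (useMemory, useDisk) match {
        case (true, true)   => "memory, spilling to disk"
        case (true, false)  => "memory only"
        case (false, true)  => "disk only"
        case (false, false) => "not stored"
      }
      val form = if (deserialized) "deserialized objects" else "serialized bytes"
      s"$where; $form; ${replication}x replicated"
    }
  }

  val MemoryOnly    = Level(useDisk = false, useMemory = true, deserialized = true)
  val MemoryOnlySer = Level(useDisk = false, useMemory = true, deserialized = false)
  val MemoryAndDisk = Level(useDisk = true, useMemory = true, deserialized = true)
  val DiskOnly      = Level(useDisk = true, useMemory = false, deserialized = false)
  val MemoryOnly2   = MemoryOnly.copy(replication = 2)
}
```

The trade-off the flags express is speed against footprint: deserialized objects in memory are the fastest to read, serialized bytes take less space, and disk or extra replicas trade speed for capacity or fault tolerance.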
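RDDs, transformations and actions

Lazy evaluation: a transformation (for example map or filter) only records how to derive a new RDD from its parent; nothing is computed until an action (for example count or collect) asks for a result.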
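
Lazy evaluation can be illustrated without Spark. The sketch below is a toy model, not Spark's API: `LazySketch`, `Lazy` and `fromSeq` are hypothetical names. Transformations only wrap a recipe in another recipe, and actions are the only methods that run it.

```scala
// Toy model of lazy transformations and eager actions (not Spark's API).
object LazySketch {
  /** A dataset described by a recipe; nothing runs until an action is called. */
  final class Lazy[T](recipe: () => Iterator[T]) {
    // Transformations: return a new description and do no work.
    def map[U](f: T => U): Lazy[U] = new Lazy(() => recipe().map(f))
    def filter(p: T => Boolean): Lazy[T] = new Lazy(() => recipe().filter(p))

    // Actions: run the whole chain and return a value.
    def collect(): List[T] = recipe().toList
    def count(): Int = recipe().size
  }

  /** Wrap an in-memory sequence as the start of a lazy chain. */
  def fromSeq[T](data: Seq[T]): Lazy[T] = new Lazy(() => data.iterator)
}
```

In this model every action re-runs the full chain from the source, which is also why keeping an intermediate result in memory pays off when several actions reuse it.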