Spark Big Data Platform

Source: Internet
Author: User
Tags: spark, mllib

Apache Spark is an open source cluster computing system that aims to make data analytics fast: both fast to run and fast to write.

BDAS, the Berkeley Data Analytics Stack, is an open source software stack that integrates software components being built by the AMPLab to make sense of Big Data.

Components
Spark vs. Hadoop
Spark Core <------> Apache Hadoop MR
Spark Streaming <------> Apache Storm
Spark SQL <------> Apache Hive
Spark GraphX <------> MPI (Taobao)
Spark MLlib <------> Apache Mahout

BlinkDB is a massively parallel, approximate query engine for running interactive SQL queries on large volumes of data. It allows users to trade off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars.
BlinkDB relies on two key ideas:

    • An adaptive optimization framework that builds and maintains a set of multi-dimensional samples from the original data over time.
    • A dynamic sample selection strategy that selects an appropriately sized sample based on a query's accuracy and/or response time requirements.
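
The idea of answering a query from a sample and attaching an error bar can be sketched in plain Scala. This is only an illustration under simplifying assumptions: it uses uniform sampling with replacement and a normal approximation for the 95% interval, whereas BlinkDB itself maintains multi-dimensional samples. The names `ApproxQuery`, `Estimate`, `approxMean` and `chooseSampleSize` are hypothetical and are not part of BlinkDB.

```scala
import scala.util.Random

object ApproxQuery {
  /** Approximate aggregate: the estimate plus a roughly 95% error bar. */
  final case class Estimate(value: Double, errorBar: Double)

  /** Estimates the mean of `data` from a uniform random sample (assumption: sampling with replacement). */
  def approxMean(data: IndexedSeq[Double], sampleSize: Int, rng: Random): Estimate = {
    require(data.nonEmpty, "data must not be empty")
    require(sampleSize > 1, "need at least two sampled rows to estimate variance")
    val sample = IndexedSeq.fill(sampleSize)(data(rng.nextInt(data.length)))
    val mean = sample.sum / sampleSize
    // Unbiased sample variance, then the standard error of the mean.
    val variance = sample.map(x => (x - mean) * (x - mean)).sum / (sampleSize - 1)
    val stdErr = math.sqrt(variance / sampleSize)
    Estimate(mean, 1.96 * stdErr)
  }

  /** Picks the smallest available sample size whose expected error bar meets `maxError`. */
  def chooseSampleSize(candidateSizes: Seq[Int], stdDev: Double, maxError: Double): Option[Int] =
    candidateSizes.sorted.find(n => 1.96 * stdDev / math.sqrt(n.toDouble) <= maxError)
}
```

A caller would first pick a sample size for the accuracy it needs, for example `ApproxQuery.chooseSampleSize(Seq(100, 1000, 10000), stdDev = 10.0, maxError = 1.0)`, and then run `approxMean` on a sample of that size. A tighter error bar costs a larger sample and therefore a longer response time.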

Why Spark is fast:

    • In-memory computing
    • Directed acyclic graph (DAG) engine: the engine can see the whole computation graph in advance, so it can optimize it.
    • Delay scheduling
Resilient Distributed Dataset (RDD)
Internally, each RDD is characterized by five main properties:
    • A list of partitions
    • A function for computing each split
    • A list of dependencies on other RDDs
    • Optionally, a partitioner for Key-value RDDs (e.g. to say that the RDD is hash-partitioned)
    • Optionally, a list of preferred locations to compute each split on (e.g. block locations for an HDFS file)
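
The five properties above can be mirrored in a small, self-contained Scala sketch. It is a simplified model for illustration only, not Spark's actual `RDD` class: `SimpleRdd`, `SeqRdd`, `MappedRdd` and `Partition` are hypothetical names, and everything runs in a single process.

```scala
object RddSketch {
  /** One slice of a dataset; `index` identifies it within its RDD. */
  final case class Partition(index: Int)

  /** Hypothetical, simplified mirror of the five RDD properties. */
  trait SimpleRdd[T] {
    def partitions: Seq[Partition]                  // 1. a list of partitions
    def compute(split: Partition): Iterator[T]      // 2. a function for computing each split
    def dependencies: Seq[SimpleRdd[_]]             // 3. a list of dependencies on other RDDs
    def partitioner: Option[Any => Int] = None      // 4. optional partitioner for key-value RDDs
    def preferredLocations(split: Partition): Seq[String] = Nil // 5. optional preferred locations

    /** Computes every partition in order and gathers the results locally. */
    def collect(): Seq[T] = partitions.flatMap(p => compute(p))
  }

  /** Source dataset backed by an in-memory sequence cut into `numSlices` chunks. */
  final class SeqRdd[T](data: Seq[T], numSlices: Int) extends SimpleRdd[T] {
    val partitions: Seq[Partition] = (0 until numSlices).map(Partition(_))
    def compute(split: Partition): Iterator[T] = {
      val start = split.index * data.length / numSlices
      val end = (split.index + 1) * data.length / numSlices
      data.slice(start, end).iterator
    }
    val dependencies: Seq[SimpleRdd[_]] = Nil
  }

  /** Derived dataset: reuses the parent's partitions and maps each split through `f`. */
  final class MappedRdd[T, U](parent: SimpleRdd[T], f: T => U) extends SimpleRdd[U] {
    def partitions: Seq[Partition] = parent.partitions
    def compute(split: Partition): Iterator[U] = parent.compute(split).map(f)
    val dependencies: Seq[SimpleRdd[_]] = Seq(parent)
  }
}
```

Because `MappedRdd` records its parent and the function to apply, any one of its partitions can be rebuilt from the parent's matching partition, which is the sense in which such a dataset is resilient.
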
Storage strategy
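    // Flags describing where and how an RDD's partitions are stored.
    class StorageLevel private(
        private var useDisk_ : Boolean,
        private var useMemory_ : Boolean,
        private var deserialized_ : Boolean,
        private var replication_ : Int = 1)

    object StorageLevel {
      // Memory only, kept as deserialized objects, with a single replica.
      val MEMORY_ONLY = new StorageLevel(false, true, true)
    }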
RDD, Transformation & Action

Lazy evaluation
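In Spark, transformations such as map and filter only describe how to derive a new RDD; nothing is computed until an action such as collect or count is called. The sketch below models that behaviour in plain Scala without Spark. `LazySketch`, `LazyData` and `fromSeq` are hypothetical names used for illustration and are not Spark APIs.

```scala
object LazySketch {
  /** A hypothetical lazy dataset: it holds a recipe for producing elements, not the elements. */
  final class LazyData[T](recipe: () => Iterator[T]) {
    // Transformations: return a new LazyData describing more work; nothing runs yet.
    def map[U](f: T => U): LazyData[U] = new LazyData(() => recipe().map(f))
    def filter(p: T => Boolean): LazyData[T] = new LazyData(() => recipe().filter(p))

    // Actions: run the whole recorded pipeline and return a concrete result.
    def collect(): List[T] = recipe().toList
    def count(): Long = recipe().size.toLong
  }

  /** Wraps an in-memory sequence as the start of a lazy pipeline. */
  def fromSeq[T](data: Seq[T]): LazyData[T] = new LazyData(() => data.iterator)
}
```

Building a pipeline with `LazySketch.fromSeq(Seq(1, 2, 3, 4)).map(_ * 2).filter(_ > 4)` runs none of the functions; they run only when `collect()` or `count()` is called. In this sketch every action replays the pipeline from the start, which is the cost that keeping an RDD at a storage level such as MEMORY_ONLY is meant to avoid.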