Spark Rapid Big Data Analysis: Learning Outline (I)

Source: Internet
Author: User

What is Spark?

Spark is a fast, general-purpose cluster computing platform.

In terms of speed, Spark extends the widely used MapReduce computing model to efficiently support more types of computation, including interactive queries and stream processing. One of Spark's main features is its ability to run computations in memory, which makes it fast. Even for complex applications that must run on disk, Spark is still more efficient than MapReduce.

Spark covers a range of workloads that previously required separate distributed systems, including batch processing, iterative algorithms, interactive queries, and stream processing. By supporting these workloads in a single framework, Spark makes it simple and inexpensive to combine different processing types, and it greatly reduces the burden of maintaining a separate platform for each one.

    1. Spark Core

      Spark Core implements the basic functionality of Spark, including task scheduling, memory management, fault recovery, and interaction with storage systems. Spark Core also contains the API that defines resilient distributed datasets (RDDs): an RDD represents a collection of elements distributed across multiple compute nodes that can be manipulated in parallel, and it is Spark's main programming abstraction.

    2. Spark SQL

      Spark SQL is Spark's package for working with structured data. With Spark SQL, we can query data using SQL or the Apache Hive dialect of SQL.

    3. Spark Streaming

      Spark Streaming is the Spark component that provides stream processing for real-time data. Examples of data streams include web server logs in a production environment and message queues carrying status updates submitted by users of a web service.

    4. MLlib

      Spark includes a library of common machine learning functionality, called MLlib. MLlib provides machine learning algorithms, including classification, regression, clustering, collaborative filtering, and decision trees, as well as supporting functionality such as model evaluation and data import.

    5. GraphX

      GraphX is a library for manipulating graphs, such as a social network's friend graph, and for performing parallel graph computations.
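
To make the RDD idea from the Spark Core item more concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: the class ToyRDD and its methods are hypothetical names chosen for illustration, the "nodes" are simulated with threads on one machine, and everything runs eagerly, whereas a real cluster framework would schedule the work across machines. It only shows the core idea of a collection split into partitions that are processed in parallel and then combined.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


class ToyRDD:
    """Toy stand-in for a partitioned collection (hypothetical, not Spark's API)."""

    def __init__(self, partitions):
        # Each partition plays the role of the slice of data held by one node.
        self.partitions = [list(p) for p in partitions]

    @classmethod
    def parallelize(cls, data, num_partitions=4):
        """Split data into contiguous chunks, one per simulated node."""
        data = list(data)
        size = max(1, -(-len(data) // num_partitions))  # ceiling division
        return cls([data[i:i + size] for i in range(0, len(data), size)])

    def map(self, fn):
        """Apply fn to every element, processing the partitions concurrently."""
        with ThreadPoolExecutor() as pool:
            mapped = list(pool.map(lambda part: [fn(x) for x in part],
                                   self.partitions))
        return ToyRDD(mapped)

    def reduce(self, fn):
        """Reduce each non-empty partition locally, then merge the partial results."""
        partials = [reduce(fn, part) for part in self.partitions if part]
        return reduce(fn, partials)

    def collect(self):
        """Gather all elements back into a single list, in partition order."""
        return [x for part in self.partitions for x in part]


numbers = ToyRDD.parallelize(range(1, 11), num_partitions=3)
squares = numbers.map(lambda x: x * x)
total = squares.reduce(lambda a, b: a + b)
print(squares.collect())
print(total)
```

The two-step reduce (first within each partition, then across the partial results) is the design point worth noticing: it only gives the right answer for operations that are associative, such as addition, which is the usual requirement for combining results computed on separate nodes.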
