What is Spark?
Spark is a fast, general-purpose cluster computing platform.
In terms of speed, Spark extends the widely used MapReduce computing model and efficiently supports more types of computation, including interactive queries and stream processing. One of Spark's main features is that it can run computations in memory, which makes it faster. Even for complex computations that must run on disk, Spark is still faster than MapReduce.
Spark is suited to a variety of scenarios that previously required several different distributed platforms, including batch processing, iterative algorithms, interactive queries, and stream processing. By supporting these workloads in a unified framework, Spark lets us combine different kinds of processing simply and at low cost, and it greatly reduces the burden of managing each platform separately.
Spark Core
Spark Core implements the basic functionality of Spark, including task scheduling, memory management, fault recovery, and interaction with storage systems. Spark Core also contains the API that defines resilient distributed datasets (RDDs). An RDD represents a collection of elements distributed across multiple compute nodes that can be manipulated in parallel, and it is Spark's main programming abstraction.
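The idea of a collection split across nodes and manipulated in parallel can be sketched in plain Python. This is only a toy model under stated assumptions, not Spark's RDD API: the helper names `partition`, `map_partitions`, and `reduce_partitions` are hypothetical, and threads on one machine stand in for separate compute nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(items, num_partitions):
    """Split a list into chunks, one per hypothetical node."""
    return [items[i::num_partitions] for i in range(num_partitions)]


def map_partitions(partitions, fn):
    """Apply fn to every element, processing the partitions concurrently."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        return list(pool.map(lambda part: [fn(x) for x in part], partitions))


def reduce_partitions(partitions, op):
    """Reduce each non-empty partition locally, then combine the partial results."""
    partials = [reduce(op, part) for part in partitions if part]
    return reduce(op, partials)


numbers = list(range(1, 11))
parts = partition(numbers, 3)                      # three "nodes"
squared = map_partitions(parts, lambda x: x * x)   # transform in parallel
total = reduce_partitions(squared, lambda a, b: a + b)
print(total)  # sum of the squares of 1..10
```

The point of the sketch is the shape of the computation: each partition is transformed and partially reduced on its own, and only the small partial results are combined at the end.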
Spark SQL
Spark SQL is Spark's package for working with structured data. With Spark SQL, we can query data using SQL or the Apache Hive variant of SQL.
Spark Streaming
Spark Streaming is the Spark component that provides stream processing of real-time data. Examples of data streams include web server logs in a production environment and message queues of status updates submitted by users of a web service.
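Spark Streaming handles a live stream as a sequence of small batches. The plain-Python sketch below imitates that idea on a handful of web server log lines. It is a toy model rather than Spark's API: the function names, the batch size, and the log format (status code as the last field) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import islice


def micro_batches(stream, batch_size):
    """Yield consecutive lists of up to batch_size records from an iterable."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def count_status(batch):
    """Count HTTP status codes, assumed to be the last field of each line."""
    return Counter(line.split()[-1] for line in batch)


# Hypothetical log records standing in for a live stream.
log_lines = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
    "GET /index.html 200",
    "GET /admin 500",
]

running = Counter()   # totals across all batches seen so far
per_batch = []        # result of each individual batch
for batch in micro_batches(log_lines, batch_size=2):
    counts = count_status(batch)
    per_batch.append(counts)
    running.update(counts)

print(dict(running))
```

Each batch is processed with the same logic a batch job would use, which is why a single framework can serve both batch and streaming workloads.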
MLlib
Spark also includes a library of common machine learning functionality, called MLlib. MLlib provides machine learning algorithms, including classification, regression, clustering, collaborative filtering, and decision trees, as well as supporting functionality such as model evaluation and data import.
GraphX
GraphX is a library for manipulating graphs, such as social network graphs, and for performing graph-parallel computations.
Spark Rapid Big Data Analysis learning outline (i)