Greenplum is a distributed database based on PostgreSQL. It uses a shared-nothing, massively parallel processing (MPP) architecture: each node has its own host, operating system, memory, and storage, and nothing is shared between nodes. In other words, each node is a separate database. Nodes exchange information over the network that interconnects them. Distributing data across multiple nodes makes it possible to store data at large scale, and processing queries in parallel improves query performance.
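To make the idea of distributing data across nodes concrete, here is a minimal sketch of hash-based row placement. It is only an illustration: the node count, the `customer_id` column, and the use of CRC32 are assumptions made for the example, and this is not the hash function Greenplum itself uses.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size, chosen for the example


def node_for(key, num_nodes=NUM_NODES):
    """Map a distribution key to a node id using a stable hash (CRC32 here)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, key_column, num_nodes=NUM_NODES):
    """Place every row on exactly one node; no row is stored twice."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key_column], num_nodes)].append(row)
    return nodes


# Hypothetical table: 1000 orders keyed by customer_id.
rows = [{"customer_id": i, "amount": 10 * i} for i in range(1, 1001)]
nodes = distribute(rows, "customer_id")

for node_id in sorted(nodes):
    print(f"node {node_id}: {len(nodes[node_id])} rows")
```

Because the hash is stable, rows with the same key always land on the same node, which is what lets each node later work on its own slice without consulting the others.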
In effect, this is like combining many small databases into one large database. The data is partitioned and stored across the nodes. Each node queries only its own data, and the master node then processes the partial results to produce the final result. The system expands linearly by adding more nodes.
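The query flow described above (each node scans only its own data, and the master combines the partial results) can be sketched roughly as follows. This is a toy model under stated assumptions: the per-node data is made up, threads stand in for separate machines, and the function names are hypothetical rather than part of any Greenplum API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the private slice held by one node.
node_data = [
    [12.0, 7.5, 3.0],
    [20.0, 1.5],
    [8.0, 8.0, 4.0, 6.0],
]


def local_aggregate(amounts):
    """Runs on one node: scans only that node's rows, returns (sum, count)."""
    return sum(amounts), len(amounts)


def master_average(partials):
    """Runs on the master: merges the partial (sum, count) pairs."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


def run_query(data):
    """Send the same aggregate to every node in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=max(1, len(data))) as pool:
        partials = list(pool.map(local_aggregate, data))
    return master_average(partials)


average = run_query(node_data)
print(average)
```

Note that the nodes return a sum and a count rather than a local average: averaging the per-node averages would give the wrong answer when nodes hold different numbers of rows.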
Summary: the value of the new MPP database
Technology: The new type of database, based on a column-store + MPP architecture, differs greatly from traditional databases in its core technology. It is designed and developed for structured data analysis and can handle petabytes of data effectively. On the technical side, it solves the data processing performance problem for users in many industries.
User value: The new database runs on x86 PC servers, which can greatly reduce the cost of data processing (by an order of magnitude).
Future trends: The new database will gradually be combined with the Hadoop ecosystem. MPP will process petabytes of high-quality structured data while providing applications with rich SQL and transaction support, and Hadoop will handle semi-structured and unstructured data. Together they can meet the needs of structured, semi-structured, and unstructured data processing.
Big data focuses on user behavior, group trends, and correlations between events, not just on past KPIs. This places new requirements and challenges on the analytical capability and performance of the data analysis platform.
An MPP (Massively Parallel Processing) system is made up of many loosely coupled processing units. Note that this refers to processing units, not processors. The CPUs in each unit have their own private resources, such as bus, memory, and hard disk. Each unit has its own operating system and its own instance of the database management system. The defining feature of this structure is that no resources are shared.
Features that an MPP-architecture database should have:
Parallel task execution;
Distributed (localized) data storage;
Distributed computing;
Private resources;
Horizontal scaling;
Shared-nothing architecture.