Deepgreen DB Introduction

Last Update:2017-09-18 Source: Internet

Author: User

Tags postgresql psql

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Original link

Deepgreen DB full name Vitesse Deepgreen DB, a scalable, massively parallel (often called MPP) data warehousing solution that originates from an open source Data Warehouse Project Greenplum DB (often referred to as GP or gpdb). So already familiar with GP's friend, can seamlessly switch to Deepgreen. It has almost all the features of GP and, based on all the advantages of GP, deepgreen the original query processing engine, and the next generation of query processing engines expands:

Superior connectivity and aggregation algorithms
New overflow processing Subsystem
JIT-based query optimization, vector scanning and data path optimization

The following is a brief introduction to the main features of Deepgreen (mainly in contrast to Greenplum): 1. 100% Gpdbdeepgreen and Greenplum almost 100% consistent, it is said almost, because Deepgreen also removed some of the greenplum on the chicken function, such as mapreduce support, it can be said to retain the essence. From SQL syntax, stored procedure syntax, to data storage formats, to components like gpstart/gpfdist, deepgreen minimizes migration impact for users who want to migrate from Greenplum. This is especially true in the following areas:

Other data does not need to be reloaded except for data that is compressed in Quicklz mode
DML and DDL statements do not change
UDF (user-defined function) syntax has no change
No changes to stored procedure syntax
No changes to connection and authorization agreements such as JDBC/ODBC
No changes to the run script (e.g. backup script)

So what's the difference between Deepgreen and Greenplum? summed up into a word is: quick! Fast! Fast! (important thing to say three times). Because most of the OLAP work is related to the CPU performance, so the CPU-optimized deepgreen in the performance test, can achieve more than the original Greenplum faster than the performance of many times. 2. The faster decimal type Deepgreen provides two more precise decimal types: Decimal64 and Decimal128, which are more efficient than Greenplum original decimal type (Numeric). Because they are more accurate, they are more suitable for business scenarios where data accuracy is required, such as banks, compared to the fload/double type. Installation:

这两个数据类型需要在数据库初始化以后，通过命令加载到需要的数据库中：[email protected]:~$ source deepgreendb/greenplum_path.sh[email protected]:~$ cd $GPHOME/share/postgresql/contrib/[email protected]:~/deepgreendb/share/postgresql/contrib$ psql postgres -f pg_decimal.sql

Test a piece:

Use statement: Select AVG (x), sum (2*x) from table data Volume: 1 million [email protected]:~$ psql-d postgrespsql (8.2.15) Type ' help ' for Hel p.postgres=# drop table if exists TT; Notice:table "tt" does not exist, Skippingdrop tablepostgres=# CREATE TABLE TT (Postgres (# II Bigint,postgres (# f64 Doub Le Precision,postgres (# d64 decimal64,postgres (# d128 Decimal128,postgres (# N Numeric (3)) postgres-# distributed randomly; CREATE tablepostgres=# INSERT INTO ttpostgres-# select i,postgres-# i + 0.123,postgres-# (i + 0.123)::d Ecimal64,po stgres-# (i + 0.123)::d ecimal128,postgres-# i + 0.123postgres-# from generate_series (1, 1000000) I;insert 0 100000  0postgres=# \timing ontiming is on.postgres=# select COUNT (*) from TT; Count---------1000000 (1 row) time:161.500 mspostgres=# set vitesse.enable=1;       settime:1.695 mspostgres=# Select AVG (f64), sum (2*F64) from TT;       Avg | Sum------------------+------------------500000.622996815 | 1000001245993.63 (1 row) time:45.368 mspostgres=# SeleCT avg (D64), sum (2*D64) from TT;        Avg | Sum------------+-------------------500000.623 |    1000001246000.000 (1 row) time:135.693 mspostgres=# select AVG (d128), sum (2*d128) from TT;        Avg | Sum------------+-------------------500000.623 | 1000001246000.000 (1 row) time:148.286 mspostgres=# set vitesse.enable=1;         settime:11.691 mspostgres=# Select AVG (n), sum (2*n) from TT;        Avg | Sum---------------------+-------------------500000.623000000000 | 1000001246000.000 (1 row) time:154.189 mspostgres=# set vitesse.enable=0;         settime:1.426 mspostgres=# Select AVG (n), sum (2*n) from TT;        Avg | Sum---------------------+-------------------500000.623000000000 | 1000001246000.000 (1 row) time:296.291 ms

Results list:

45ms - 64位float136ms - decimal64148ms - decimal128154ms - deepgreen numeric296ms - greenplum numeric

With the above test, the Decimal64 (136ms) type is faster than the Deepgreen numeric (154ms) type, twice times faster than the Greenplum numeric, and more than 5 times times faster in the production environment. 3. Support Jsondeepgreen supports JSON type, but it is not fully supported. Unsupported functions are: Json_each,json_each_text,json_extract_path,json_extract_path_text, Json_object_keys, Json_populate_ Record, Json_populate_recordset, Json_array_elements, and Json_agg. Installation: Execute the following command to extend JSON support:

[email protected]:~$ psql postgres -f $GPHOME/share/postgresql/contrib/json.sql

Test a piece:

[email protected]:~$ psql postgrespsql (8.2.15)Type "help" for help.postgres=# select ‘[1,2,3]‘::json->2; ?column?---------- 3(1 row)postgres=# create temp table mytab(i int, j json) distributed by (i);CREATE TABLEpostgres=# insert into mytab values (1, null), (2, ‘[2,3,4]‘), (3, ‘[3000,4000,5000]‘);INSERT 0 3postgres=#postgres=# insert into mytab values (1, null), (2, ‘[2,3,4]‘), (3, ‘[3000,4000,5000]‘);INSERT 0 3postgres=# select i, j->2 from mytab; i | ?column?---+---------- 2 | 4 2 | 4 1 | 3 | 5000 1 | 3 | 5000(6 rows)

4. The efficient compression algorithm Deepgreen continues the Greenplum zlib compression algorithm for storage compression. In addition to this, Deepgreen offers two compression formats that are more load-optimized for the database: ZSTD and LZ4. Select the ZSTD compression algorithm if the customer requires a better compression ratio when storing or only appending heap table storage. There is a better compression ratio than ZLIB,ZSTD, and the CPU can be used more efficiently. If the customer has a large number of read requirements, then you can choose the LZ4 compression algorithm, because it has an amazing decompression speed. Although lz4 on the compression ratio is not as outstanding as zlib and ZSTD, it is worthwhile to make some sacrifices to satisfy the high read load. For details on the two compression algorithms, see the home page:

ZSTD Home http://facebook.github.io/zstd/
LZ4 Home http://lz4.github.io/lz4/

Test A: Here only for non-compression/zlib/zstd/lz4 four, for simple testing, my machine performance is not high, all results are for reference only:

postgres=# Create temp Table Ttnone (postgres (# i int,postgres (# t Text,postgres (# Default column encoding (CO Mpresstype=none)) postgres-# with (Appendonly=true, Orientation=column) postgres-# distributed by (i); Create tablepostgres=# \timing ontiming is on.postgres=# create temp table Ttzlib (postgres (# i int,postgres (# t Te Xt,postgres (# Default column encoding (Compresstype=zlib, compresslevel=1)) postgres-# with (Appendonly=true, Orientati On=column) postgres-# distributed by (i);     Create tabletime:762.596 mspostgres=# Create temp Table ttzstd (postgres (# i int,postgres (# t Text,postgres (# Default column encoding (COMPRESSTYPE=ZSTD, compresslevel=1)) postgres-# with (Appendonly=true, Orientation=column) postgres-# distributed by (i);     Create tabletime:827.033 mspostgres=# Create temp Table ttlz4 (postgres (# i int,postgres (# t Text,postgres (# Default column encoding (COMPRESSTYPE=LZ4)) postgres-# with (Appendonly=true, Orientation=column) postgres-# Distributed by (i); CREATE tabletime:845.728 mspostgres=# INSERT INTO Ttnone select I, ' User ' | | I from Generate_series (1, 100000000) I;insert 0 100000000time:104641.369 mspostgres=# INSERT into ttzlib select I, ' User ' | | I from Generate_series (1, 100000000) I;insert 0 100000000time:99557.505 mspostgres=# INSERT into ttzstd select I, ' user ' || I from Generate_series (1, 100000000) I;insert 0 100000000time:98800.567 mspostgres=# INSERT into ttlz4 select I, ' user ' | |i from Generate_series (1, 100000000) i;insert 0 100000000time:96886.107 mspostgres=# Select Pg_size_pretty (pg_relation _size (' Ttnone ')); Pg_size_pretty----------------1708 MB (1 row) time:83.411 mspostgres=# Select Pg_size_pretty (pg_relation_size (' Ttzlib ‘)); Pg_size_pretty----------------374 MB (1 row) time:4.641 mspostgres=# Select Pg_size_pretty (pg_relation_size (' ttzstd ') ); Pg_size_pretty----------------325 MB (1 row) time:5.015 mspostgres=# Select Pg_size_pretty (pg_relation_size (' ttlz4 ')) ; Pg_size_pretty----------------7MB (1 row) time:4.483 mspostgres=# Select sum (Length (t)) from Ttnone;    Sum------------1288888898 (1 row) time:4414.965 mspostgres=# Select sum (Length (t)) from Ttzlib;    Sum------------1288888898 (1 row) time:4500.671 mspostgres=# Select sum (Length (t)) from TTZSTD;    Sum------------1288888898 (1 row) time:3849.648 mspostgres=# Select sum (Length (t)) from TTLZ4; Sum------------1288888898 (1 row) time:3160.477 ms

5. Data sampling from the Deepgreen 16.16 release, built-in support for data real-world sampling via SQL, you can sample by defining the number of rows or by defining a sample:

SELECT {select-clauses} LIMIT SAMPLE {n} ROWS;
SELECT {select-clauses} LIMIT SAMPLE {n} PERCENT;

Test a piece:

postgres=# Select COUNT (*) from TTLZ4;    Count-----------100000000 (1 row) time:903.661 mspostgres=# select * from TTLZ4 limit sample 0.00001 percent;       I | T----------+---------------3442917 | User 3442917 9182620 | User 9182620 9665879 | User 9665879 13791056 | User 13791056 15669131 | User 15669131 16234351 | User 16234351 19592531 | User 19592531 39097955 | User 39097955 48822058 | User 48822058 83021724 | User 83021724 1342299 | User 1342299 20309120 | User 20309120 34448511 | User 34448511 38060122 | User 38060122 69084858 | User 69084858 73307236 | User 73307236 95421406 |    User 95421406 (+ rows) time:4208.847 mspostgres=# select * FROM TTLZ4 limit sample rows;       I | T----------+---------------78259144 | User 78259144 85551752 | User 85551752 90848887 | User 90848887 53923527 | User 53923527 46524603 | User 46524603 31635115 | User 31635115 19030885 | User 19030885 97877732 | User 97877732 33238448 | User 33238448 20916240 | User 20916240 (rows) Time:3578.031 ms

6.tpc-h Performance Deepgreen and Greenplum performance comparison, please refer to my other two posts: "Deepgreen and Greenplum tpc-h performance test comparison (using the German script)" Deepgreen and Greenplum Tpc-h Performance test comparison (using Vitessedata script) "In addition deepgreen self-powered high-performance components xdrive, will be shared at a later stage ~ end~

Deepgreen db Introduction (RPM)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More