I. Introduction to CSTORE_FDW
HTTPS://GITHUB.COM/CITUSDATA/CSTORE_FDW, this external table extension is developed by Citusdata Corporation, which uses the Orc_file format to store data in columns.
Advantage 1: Because of the compression, so the storage on disk greatly reduced, compression ratio can reach 2-4 times
Advantage 2: Data internal block storage, the block data for the max and Min value of the record, in the query can be a jump block query
Advantage 3: In the query, not all the disk data is load to memory, but rather select the column according to the recorded skiplist offset to load the required data, reduce IO
Second, installation and use
[[email protected] ~]# git clone https://github.com/citusdata/cstore_fdw.git
After downloading, modify the Pgconfig in the makefile file to be specified in the installation directory for example:/usr/local/postgres/bin/pgconfig
[[email protected] ~]# make && make install
Configure the end of the postgres.conf file to add:
shared_preload_libraries = ' CSTORE_FDW '
Start the database:
[Email protected] ~]$ pg_ctl-d db1-l logfile start-m fast
[Email protected] ~]$ Psql
postgres=# Create extension CSTORE_FDW; Create extensionpostgres=# Create server Cstore_server foreign data wrapper cstore_fdw; Create serverpostgres=# Create FOREIGN TABLE customer_reviewspostgres-# (postgres (# customer_id text,postgres (# review_date date,postgres (# review_rating integer,postgres (# Review_votes integer,postgres (# Review_helpful_votes integer,postgres (# product_id CHAR (#), Postgres (# Product_title text,postgres (# Product_sales_rank bigint,postgres (# Product_group text,postgres (# product_category TEXT,postgres (# Product_subcategory textpostgres (#) postgres-# SERVER cstore_serverpostgres-# OPTIONS (compression ' pglz ');
PG native table takes up disk size:
postgres=# INSERT INTO Customer_reviews select * from Customer;insert 0 176774postgres=# Select Pg_relation_size (' Custome R '); Pg_relation_size------------------ 145489920 (1 row)
Disk size consumed after CSTORE_FDW external expansion compression:
[Email protected] 13056]$ ll/home/postgres/db1/cstore_fdw/13056
-RW-------1 postgres postgres 6236569 Dec 5 10:07 278237
-RW-------1 postgres postgres 5 10:07 278237.footer
After comparison, disk usage is reduced a lot!!
Third, source code analysis
The implementation of the external table in Postgres is equivalent to an engine that is implemented by hooking up the C-language function pointer
datumcstore_fdw_handler (Pg_function_args) {Fdwroutine *fdwRoutine = Makenode (fdwroutine); fdwroutine->getforeignrelsize = Cstoregetforeignrelsize;fdwroutine->getforeignpaths = Cstoregetforeignpaths;fdwroutine->getforeignplan = Cstoregetforeignplan;fdwroutine->explainforeignscan = Cstoreexplainforeignscan;fdwroutine->beginforeignscan = cstorebeginforeignscan;//1fdwroutine-> Iterateforeignscan = Cstoreiterateforeignscan;//2fdwroutine->rescanforeignscan = CStoreReScanForeignScan;// 3fdwroutine->endforeignscan = Cstoreendforeignscan;//4fdwroutine->analyzeforeigntable = Cstoreanalyzeforeigntable;fdwroutine->planforeignmodify = cstoreplanforeignmodify;//5fdwroutine-> beginforeignmodify = Cstorebeginforeignmodify;//6fdwroutine->execforeigninsert = CStoreExecForeignInsert;// 7fdwroutine->endforeignmodify = Cstoreendforeignmodify;//8pg_return_pointer (fdwRoutine);}
1, 2, 3, 4 constitute a query operation such as: SELECT * from Customer_reviews;
5, 6, 7, 8 constitute an insert operation, for example: INSERT INTO customer_reviews select * from Customer;
It is particularly important to note that the Tableentry->rtekind = = Rte_subquery is judged by cstoreplanforeignmodify in this function when inserting,
So insert into XX values xxx This insertion is not supported.
It is observed from the source that the flushstripe operation is performed in the cstoreendforeignmodify, that is, whether inserting a data or inserting the data in bulk, the flushstripe operation will be performed.
If you insert a piece of data, this data takes up a stripe of disk space
If it is bulk INSERT, then according to the default stripe size, block size to split, to meet the stripe to brush the disk, and then the remaining does not meet stripe as another stripe, if the data is a stripe, the query load data will be quite slow.
Finally, it is concluded that the compression efficiency is not obvious for a single insert or transactional database, and the compression ratio is quite significant for bulk inserts, and the query will be faster.
Rcfile format vs. orc format:
There is also for rcfile this format, the string type of compression is not very obvious processing, unlike the Orc format, orc with dictionary compression processing, and Rcfile did not
Https://github.com/gokhankici/orc_fdw
This external table extension simply reads the files in the ORC format and does not have a write operation, and the operation of the write file is developed using the Java language.
Installation and use of CSTORE_FDW and source code analysis